Professor Elsa Olivetti of MIT delivered an A3MD Distinguished Seminar entitled “Bridging the Gap Between Literature Data Extraction and Domain Specific Materials Informatics.”
Data has become a fundamental ingredient for accelerating and optimizing materials design and synthesis. Advances in applying natural language processing (NLP) to materials science text have greatly increased the size and acquisition speed of materials science data from the published literature. This presentation will describe work to extract information from peer-reviewed academic literature across a range of materials. Applying NLP pipelines to these types of materials science systems can be challenging due to the general schema and the noisiness of automatically extracted data. I will present data engineering techniques and discuss an optimal balance between automatic and manual data extraction.