Project: #40

Materials Data Extraction using Large Language Models (LLMs) from Scientific Literature

Available

The key idea is to explore how LLMs can be used to accurately extract materials data (e.g. chemical composition and mechanical properties) from journal publications in order to build a materials data repository.

Preliminary results show that even GPT-4, used in a rudimentary way, yields poor accuracy, so modifications (prompt engineering, retrieval-augmented generation (RAG), fine-tuning) are necessary to improve performance. Our group is developing an LLM pipeline that could be used to extract the materials knowledge buried in the literature and store it in an easily accessible relational database format.
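
To make the storage step concrete, here is a minimal sketch of how one extracted record might be loaded into a relational database. It is an illustration only, not the group's pipeline: the two-table schema, the JSON record layout, and the function name store_extraction are all assumptions, the model call itself is left out, and the example record uses placeholder values rather than real extracted data.

```python
import json
import sqlite3

# Hypothetical schema: one row per material, one row per reported property.
SCHEMA = """
CREATE TABLE IF NOT EXISTS material (
    id INTEGER PRIMARY KEY,
    composition TEXT NOT NULL,
    source_doi TEXT
);
CREATE TABLE IF NOT EXISTS property (
    id INTEGER PRIMARY KEY,
    material_id INTEGER NOT NULL REFERENCES material(id),
    name TEXT NOT NULL,
    value REAL NOT NULL,
    unit TEXT NOT NULL
);
"""


def store_extraction(conn, llm_output, source_doi=None):
    """Parse one JSON record (an assumed LLM output format) and insert it."""
    record = json.loads(llm_output)
    cur = conn.execute(
        "INSERT INTO material (composition, source_doi) VALUES (?, ?)",
        (record["composition"], source_doi),
    )
    material_id = cur.lastrowid
    for prop in record.get("properties", []):
        # float() raises ValueError if the model returned a non-numeric value.
        conn.execute(
            "INSERT INTO property (material_id, name, value, unit) "
            "VALUES (?, ?, ?, ?)",
            (material_id, prop["name"], float(prop["value"]), prop["unit"]),
        )
    conn.commit()
    return material_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder record standing in for a model response; not real data.
example_output = json.dumps({
    "composition": "AxBy",
    "properties": [{"name": "yield_strength", "value": 100, "unit": "MPa"}],
})
material_id = store_extraction(conn, example_output)

rows = conn.execute(
    "SELECT m.composition, p.name, p.value, p.unit "
    "FROM material m JOIN property p ON p.material_id = m.id"
).fetchall()
```

Keeping properties in their own table lets one material carry any number of measurements, and forcing each value through float() and a NOT NULL unit column rejects some malformed model outputs before they reach the repository.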