Generalizable descriptors for automatic titanium alloys design by learning from texts via large language model

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Descriptors are essential prerequisites for the success of machine learning-based materials design. However, the automatic construction of generalizable descriptors for various properties, together with their integration with design criteria, remains a long-standing challenge. Here, we overcome this obstacle by devising a framework that integrates a large language model, domain theory constraints ([Mo]eq and d-electron theory), and multi-objective global optimization. This framework fuses structured (tabular composition/processing) and unstructured (text) data to obtain enriched descriptors and facilitate materials design. Using titanium alloys as a model case, we first automatically obtain descriptors that generalize well across a variety of properties for optimized prediction models. Building on these descriptors, we propose a design pipeline that identifies several alloys with high promise for enhanced competing properties from a vast chemical and processing space, and we validate their reliability via experimental synthesis. The enriched descriptors, learned from ∼50,000 texts without apparent physics, are shown to capture expertise related to phase stability, alloying rules for specific properties, and more. Our proposed approach can be applied to the design of other materials for which descriptors are currently inadequate and structured data are limited.

Original language: English
Article number: 121275
Journal: Acta Materialia
Volume: 296
State: Published - 1 Sep 2025

Keywords

  • Descriptors
  • Large language model
  • Natural language processing
  • Titanium alloy
