TY - JOUR
T1 - Generalizable descriptors for automatic titanium alloys design by learning from texts via large language model
AU - Wang, Ping
AU - Jiang, Yuan
AU - Liao, Weijie
AU - Wang, Rong
AU - Lai, Minjie
AU - Kou, Hongchao
AU - Liang, Xiubing
AU - Li, Jinshan
AU - Lookman, Turab
AU - Yuan, Ruihao
N1 - Publisher Copyright: © 2025 Acta Materialia Inc.
PY - 2025/9/1
Y1 - 2025/9/1
N2 - Descriptors are essential prerequisites for the success of machine learning-based materials design. However, the automatic construction of generalizable descriptors for various properties, together with their integration with design criteria, remains a long-standing challenge. Here, we overcome this obstacle by devising a framework that integrates a large language model, domain theory constraints ([Mo]eq and d-electron theory), and multi-objective global optimization. This framework fuses structured (tabular composition/processing) and unstructured (text) data to obtain enriched descriptors and facilitate materials design. Using titanium alloys as a model case, we first automatically obtain descriptors that generalize well across a variety of properties and yield optimized prediction models. With these descriptors, we propose a design pipeline that identifies several alloys with high promise for enhanced competing properties from a vast chemical and processing space, and we validate their reliability via experimental synthesis. The enriched descriptors, learned from ~50,000 texts without apparent physics, are shown to capture expertise related to phase stability, alloying rules for specific properties, and more. The proposed approach can be applied to the design of other materials for which descriptors are currently inadequate and structured data are limited.
KW - Descriptors
KW - Large language model
KW - Natural language processing
KW - Titanium alloy
UR - https://www.scopus.com/pages/publications/105009657455
U2 - 10.1016/j.actamat.2025.121275
DO - 10.1016/j.actamat.2025.121275
M3 - Article
AN - SCOPUS:105009657455
SN - 1359-6454
VL - 296
JO - Acta Materialia
JF - Acta Materialia
M1 - 121275
ER -