Understanding LLMs: A comprehensive overview from training to inference

Yiheng Liu; Hao He; Tianle Han; Xu Zhang; Mengyuan Liu; Jiaming Tian; Yutong Zhang; Jiaqi Wang; Xiaohui Gao; Tianyang Zhong; Yi Pan; Shaochen Xu; Zihao Wu; Zhengliang Liu; Xin Zhang; Shu Zhang; Xintao Hu; Tuo Zhang; Ning Qiang; Tianming Liu; Bao Ge

doi:10.1016/j.neucom.2024.129190

Research output: Contribution to journal › Short survey › peer-review

11 Scopus citations

Abstract

The introduction of ChatGPT has led to a significant increase in the utilization of Large Language Models (LLMs) for addressing downstream tasks. There is an increasing focus on cost-efficient training and deployment within this context. Low-cost training and deployment of LLMs represent the future development trend. This paper reviews the evolution of LLMs training techniques and inference deployment technologies aligned with this emerging trend. The objective is to provide researchers with a guide for integrating LLMs into their work. The discussion on training includes various aspects, including data preprocessing, training architecture, pre-training tasks, parallel training, and relevant content related to model fine-tuning. On the inference side, the paper covers topics such as model compression, parallel computation, memory scheduling, and structural optimization. It also explores LLMs’ utilization and provides insights into their future development.

Original language	English
Article number	129190
Journal	Neurocomputing
Volume	620
DOIs	https://doi.org/10.1016/j.neucom.2024.129190
State	Published - 1 Mar 2025

Keywords

Inference
Large language models
Survey
Training

Access to Document

10.1016/j.neucom.2024.129190

Cite this

@article{c497366f2dfd4df198ae05f1a49c9152,

title = "Understanding LLMs: A comprehensive overview from training to inference",

abstract = "The introduction of ChatGPT has led to a significant increase in the utilization of Large Language Models (LLMs) for addressing downstream tasks. There is an increasing focus on cost-efficient training and deployment within this context. Low-cost training and deployment of LLMs represent the future development trend. This paper reviews the evolution of LLMs training techniques and inference deployment technologies aligned with this emerging trend. The objective is to provide researchers with a guide for integrating LLMs into their work. The discussion on training includes various aspects, including data preprocessing, training architecture, pre-training tasks, parallel training, and relevant content related to model fine-tuning. On the inference side, the paper covers topics such as model compression, parallel computation, memory scheduling, and structural optimization. It also explores LLMs{\textquoteright} utilization and provides insights into their future development.",

keywords = "Inference, Large language models, Survey, Training",

author = "Yiheng Liu and Hao He and Tianle Han and Xu Zhang and Mengyuan Liu and Jiaming Tian and Yutong Zhang and Jiaqi Wang and Xiaohui Gao and Tianyang Zhong and Yi Pan and Shaochen Xu and Zihao Wu and Zhengliang Liu and Xin Zhang and Shu Zhang and Xintao Hu and Tuo Zhang and Ning Qiang and Tianming Liu and Bao Ge",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",

year = "2025",

month = mar,

day = "1",

doi = "10.1016/j.neucom.2024.129190",

language = "English",

volume = "620",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Understanding LLMs

T2 - A comprehensive overview from training to inference

AU - Liu, Yiheng

AU - He, Hao

AU - Han, Tianle

AU - Zhang, Xu

AU - Liu, Mengyuan

AU - Tian, Jiaming

AU - Zhang, Yutong

AU - Wang, Jiaqi

AU - Gao, Xiaohui

AU - Zhong, Tianyang

AU - Pan, Yi

AU - Xu, Shaochen

AU - Wu, Zihao

AU - Liu, Zhengliang

AU - Zhang, Xin

AU - Zhang, Shu

AU - Hu, Xintao

AU - Zhang, Tuo

AU - Qiang, Ning

AU - Liu, Tianming

AU - Ge, Bao

PY - 2025/3/1

Y1 - 2025/3/1

AB - The introduction of ChatGPT has led to a significant increase in the utilization of Large Language Models (LLMs) for addressing downstream tasks. There is an increasing focus on cost-efficient training and deployment within this context. Low-cost training and deployment of LLMs represent the future development trend. This paper reviews the evolution of LLMs training techniques and inference deployment technologies aligned with this emerging trend. The objective is to provide researchers with a guide for integrating LLMs into their work. The discussion on training includes various aspects, including data preprocessing, training architecture, pre-training tasks, parallel training, and relevant content related to model fine-tuning. On the inference side, the paper covers topics such as model compression, parallel computation, memory scheduling, and structural optimization. It also explores LLMs’ utilization and provides insights into their future development.

KW - Inference

KW - Large language models

KW - Survey

KW - Training

UR - http://www.scopus.com/inward/record.url?scp=85212921025&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2024.129190

DO - 10.1016/j.neucom.2024.129190

M3 - Short survey

AN - SCOPUS:85212921025

SN - 0925-2312

VL - 620

JO - Neurocomputing

JF - Neurocomputing

M1 - 129190

ER -
