EarnHFT: Efficient Hierarchical Reinforcement Learning for High Frequency Trading

Molei Qin; Shuo Sun; Wentao Zhang; Haochong Xia; Xinrun Wang; Bo An

doi:10.1609/aaai.v38i13.29384

Nanyang Technological University

Research output: Contribution to journal › Conference article › peer-review

8 Scopus citations

Abstract

High-frequency trading (HFT) uses computer algorithms to make trading decisions on short time scales (e.g., second-level) and is widely used in the cryptocurrency (Crypto) market (e.g., Bitcoin). Reinforcement learning (RL) in financial research has shown stellar performance on many quantitative trading tasks. However, most methods focus on low-frequency trading (e.g., day-level) and cannot be directly applied to HFT because of two challenges. First, RL for HFT involves extremely long trajectories (e.g., 2.4 million steps per month), which are hard to optimize and evaluate. Second, the dramatic price fluctuations and market trend changes of Crypto cause existing algorithms to fail to maintain satisfactory performance. To tackle these challenges, we propose an Efficient hieArchical Reinforcement learNing method for High Frequency Trading (EarnHFT), a novel three-stage hierarchical RL framework for HFT. In stage I, we compute a Q-teacher, i.e., the optimal action-value based on dynamic programming, to enhance the performance and training efficiency of second-level RL agents. In stage II, we construct a pool of diverse RL agents for different market trends, distinguished by return rates: hundreds of RL agents are trained with different return-rate preferences, and only a tiny fraction of them are selected into the pool based on their profitability. In stage III, we train a minute-level router that dynamically picks a second-level agent from the pool to achieve stable performance across different markets. Through extensive experiments across various market trends on Crypto markets in a high-fidelity simulated trading environment, we demonstrate that EarnHFT significantly outperforms 6 state-of-the-art baselines on 3 popular financial criteria, exceeding the runner-up by 30% in profitability.
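The stage I idea of an optimal action-value computed by dynamic programming can be illustrated with a small sketch. The abstract does not give the state, action, or reward definitions, so everything below is an assumption made for illustration: the full price path is known in hindsight, the agent chooses a target position from a small discrete set, the reward is the holding profit minus a proportional fee, and the function name `optimal_action_values` is hypothetical. It is not the authors' implementation.

```python
def optimal_action_values(prices, positions=(0.0, 1.0), fee_rate=0.0):
    """Backward-induction table q[t][i][j] (illustrative assumptions only).

    q[t][i][j] is the best achievable future profit when, at step t, the
    agent holds positions[i] and moves to target positions[j], given that
    the whole price path is known in hindsight.
    """
    n_steps = len(prices) - 1
    n_pos = len(positions)
    next_value = [0.0] * n_pos      # nothing can be earned after the last step
    q = [None] * n_steps
    for t in range(n_steps - 1, -1, -1):
        price_change = prices[t + 1] - prices[t]
        q_t = []
        for current in positions:
            row = []
            for j, target in enumerate(positions):
                # assumed proportional fee on the traded amount
                trade_cost = fee_rate * prices[t] * abs(target - current)
                # profit from holding the target position over [t, t+1]
                holding_pnl = target * price_change
                row.append(holding_pnl - trade_cost + next_value[j])
            q_t.append(row)
        q[t] = q_t
        # best value for each position held at step t, used at step t-1
        next_value = [max(row) for row in q_t]
    return q


# Toy price path: up 1, down 2, up 3. With no fee, starting flat, the best
# plan is to hold during both up moves, for a total profit of 4.0.
q = optimal_action_values([100.0, 101.0, 99.0, 102.0])
print(max(q[0][0]))  # 4.0
```

A table like this could serve as a training target for a learning agent, since it ranks every action at every step; how the paper actually uses its Q-teacher during training is not described in the abstract.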

Original language	English
Pages (from-to)	14669-14676
Number of pages	8
Journal	Proceedings of the AAAI Conference on Artificial Intelligence
Volume	38
Issue number	13
DOIs	https://doi.org/10.1609/aaai.v38i13.29384
State	Published - 25 Mar 2024
Externally published	Yes
Event	38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada; Duration: 20 Feb 2024 → 27 Feb 2024

Cite this

@article{f65d809e2ea24427928efbf85947354c,

title = "EarnHFT: Efficient Hierarchical Reinforcement Learning for High Frequency Trading",

abstract = "High-frequency trading (HFT) uses computer algorithms to make trading decisions on short time scales (e.g., second-level) and is widely used in the cryptocurrency (Crypto) market (e.g., Bitcoin). Reinforcement learning (RL) in financial research has shown stellar performance on many quantitative trading tasks. However, most methods focus on low-frequency trading (e.g., day-level) and cannot be directly applied to HFT because of two challenges. First, RL for HFT involves extremely long trajectories (e.g., 2.4 million steps per month), which are hard to optimize and evaluate. Second, the dramatic price fluctuations and market trend changes of Crypto cause existing algorithms to fail to maintain satisfactory performance. To tackle these challenges, we propose an Efficient hieArchical Reinforcement learNing method for High Frequency Trading (EarnHFT), a novel three-stage hierarchical RL framework for HFT. In stage I, we compute a Q-teacher, i.e., the optimal action-value based on dynamic programming, to enhance the performance and training efficiency of second-level RL agents. In stage II, we construct a pool of diverse RL agents for different market trends, distinguished by return rates: hundreds of RL agents are trained with different return-rate preferences, and only a tiny fraction of them are selected into the pool based on their profitability. In stage III, we train a minute-level router that dynamically picks a second-level agent from the pool to achieve stable performance across different markets. Through extensive experiments across various market trends on Crypto markets in a high-fidelity simulated trading environment, we demonstrate that EarnHFT significantly outperforms 6 state-of-the-art baselines on 3 popular financial criteria, exceeding the runner-up by 30% in profitability.",

author = "Molei Qin and Shuo Sun and Wentao Zhang and Haochong Xia and Xinrun Wang and Bo An",

note = "Publisher Copyright: Copyright {\textcopyright} 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 38th AAAI Conference on Artificial Intelligence, AAAI 2024 ; Conference date: 20-02-2024 Through 27-02-2024",

year = "2024",

month = mar,

day = "25",

doi = "10.1609/aaai.v38i13.29384",

language = "English",

volume = "38",

pages = "14669--14676",

journal = "Proceedings of the AAAI Conference on Artificial Intelligence",

issn = "2159-5399",

publisher = "Association for the Advancement of Artificial Intelligence",

number = "13",

}

TY - JOUR

T1 - EarnHFT: Efficient Hierarchical Reinforcement Learning for High Frequency Trading

T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024

AU - Qin, Molei

AU - Sun, Shuo

AU - Zhang, Wentao

AU - Xia, Haochong

AU - Wang, Xinrun

AU - An, Bo

PY - 2024/3/25

Y1 - 2024/3/25

N2 - High-frequency trading (HFT) uses computer algorithms to make trading decisions on short time scales (e.g., second-level) and is widely used in the cryptocurrency (Crypto) market (e.g., Bitcoin). Reinforcement learning (RL) in financial research has shown stellar performance on many quantitative trading tasks. However, most methods focus on low-frequency trading (e.g., day-level) and cannot be directly applied to HFT because of two challenges. First, RL for HFT involves extremely long trajectories (e.g., 2.4 million steps per month), which are hard to optimize and evaluate. Second, the dramatic price fluctuations and market trend changes of Crypto cause existing algorithms to fail to maintain satisfactory performance. To tackle these challenges, we propose an Efficient hieArchical Reinforcement learNing method for High Frequency Trading (EarnHFT), a novel three-stage hierarchical RL framework for HFT. In stage I, we compute a Q-teacher, i.e., the optimal action-value based on dynamic programming, to enhance the performance and training efficiency of second-level RL agents. In stage II, we construct a pool of diverse RL agents for different market trends, distinguished by return rates: hundreds of RL agents are trained with different return-rate preferences, and only a tiny fraction of them are selected into the pool based on their profitability. In stage III, we train a minute-level router that dynamically picks a second-level agent from the pool to achieve stable performance across different markets. Through extensive experiments across various market trends on Crypto markets in a high-fidelity simulated trading environment, we demonstrate that EarnHFT significantly outperforms 6 state-of-the-art baselines on 3 popular financial criteria, exceeding the runner-up by 30% in profitability.

UR - http://www.scopus.com/inward/record.url?scp=85189642243&partnerID=8YFLogxK

U2 - 10.1609/aaai.v38i13.29384

DO - 10.1609/aaai.v38i13.29384

M3 - Conference article

AN - SCOPUS:85189642243

SN - 2159-5399

VL - 38

SP - 14669

EP - 14676

JO - Proceedings of the AAAI Conference on Artificial Intelligence

JF - Proceedings of the AAAI Conference on Artificial Intelligence

IS - 13

Y2 - 20 February 2024 through 27 February 2024

ER -
