TY - JOUR
T1 - A policy-based Monte Carlo tree search method for container pre-marshalling
AU - Wang, Ziliang
AU - Zhou, Chenhao
AU - Che, Ada
AU - Gao, Jingkun
N1 - Publisher Copyright:
© 2023 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2024
Y1 - 2024
N2 - The container pre-marshalling problem (CPMP) aims to minimise the number of reshuffling moves, ultimately achieving an optimised stacking arrangement in each bay based on the priority of containers during the non-loading phase. Given the sequential decision nature, we formulated the CPMP as a Markov decision process (MDP) model to account for the specific state and action of the reshuffling process. To address the challenge that the relocated container may trigger a chain effect on the subsequent reshuffling moves, this paper develops an improved policy-based Monte Carlo tree search (P-MCTS) to solve the CPMP, where eight composite reshuffling rules and modified upper confidence bounds are employed in the selection phases, and a well-designed heuristic algorithm is utilised in the simulation phases. Meanwhile, considering the effectiveness of reinforcement learning methods for solving the MDP model, an improved Q-learning is proposed as the compared method. Numerical results show that the P-MCTS outperforms all compared methods in scenarios where all containers have different priorities and scenarios where containers can share the same priority.
AB - The container pre-marshalling problem (CPMP) aims to minimise the number of reshuffling moves, ultimately achieving an optimised stacking arrangement in each bay based on the priority of containers during the non-loading phase. Given the sequential decision nature, we formulated the CPMP as a Markov decision process (MDP) model to account for the specific state and action of the reshuffling process. To address the challenge that the relocated container may trigger a chain effect on the subsequent reshuffling moves, this paper develops an improved policy-based Monte Carlo tree search (P-MCTS) to solve the CPMP, where eight composite reshuffling rules and modified upper confidence bounds are employed in the selection phases, and a well-designed heuristic algorithm is utilised in the simulation phases. Meanwhile, considering the effectiveness of reinforcement learning methods for solving the MDP model, an improved Q-learning is proposed as the compared method. Numerical results show that the P-MCTS outperforms all compared methods in scenarios where all containers have different priorities and scenarios where containers can share the same priority.
KW - Automated container terminal
KW - Container pre-marshalling problem
KW - Markov decision process
KW - Monte Carlo tree search
KW - Q-learning algorithm
UR - http://www.scopus.com/inward/record.url?scp=85176274068&partnerID=8YFLogxK
U2 - 10.1080/00207543.2023.2279130
DO - 10.1080/00207543.2023.2279130
M3 - 文章
AN - SCOPUS:85176274068
SN - 0020-7543
VL - 62
SP - 4776
EP - 4792
JO - International Journal of Production Research
JF - International Journal of Production Research
IS - 13
ER -