A policy-based Monte Carlo tree search method for container pre-marshalling

Ziliang Wang; Chenhao Zhou; Ada Che; Jingkun Gao

doi:10.1080/00207543.2023.2279130

School of Management

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

The container pre-marshalling problem (CPMP) aims to minimise the number of reshuffling moves, ultimately achieving an optimised stacking arrangement in each bay based on the priority of containers during the non-loading phase. Given the sequential decision nature, we formulated the CPMP as a Markov decision process (MDP) model to account for the specific state and action of the reshuffling process. To address the challenge that the relocated container may trigger a chain effect on the subsequent reshuffling moves, this paper develops an improved policy-based Monte Carlo tree search (P-MCTS) to solve the CPMP, where eight composite reshuffling rules and modified upper confidence bounds are employed in the selection phases, and a well-designed heuristic algorithm is utilised in the simulation phases. Meanwhile, considering the effectiveness of reinforcement learning methods for solving the MDP model, an improved Q-learning is proposed as the compared method. Numerical results show that the P-MCTS outperforms all compared methods in scenarios where all containers have different priorities and scenarios where containers can share the same priority.
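To make the MDP framing in the abstract concrete, here is a minimal sketch of one way to encode a bay as a state, a reshuffling move as an action, and a terminal test. It is not the authors' formulation: the names `misplaced`, `legal_moves`, `apply_move` and `ucb1`, the convention that a smaller number means a higher priority, and the height limit are all assumptions made for illustration. The `ucb1` function is the standard UCB1 score, not the modified upper confidence bound used in P-MCTS.

```python
import math
from typing import List, Tuple

# Assumed encoding: a bay is a list of stacks, each listed bottom to top.
# Assumed convention: a smaller number means the container leaves earlier.
Bay = List[List[int]]
Move = Tuple[int, int]  # (source stack index, destination stack index)


def misplaced(bay: Bay) -> int:
    """Count containers that sit above an earlier-leaving container,
    or above another misplaced container. Zero means the bay is sorted."""
    count = 0
    for stack in bay:
        blocked = False
        for i in range(1, len(stack)):
            if blocked or stack[i] > stack[i - 1]:
                blocked = True
                count += 1
    return count


def legal_moves(bay: Bay, max_height: int) -> List[Move]:
    """All moves of a top container to a different stack that is not full."""
    moves = []
    for src, s in enumerate(bay):
        if not s:
            continue
        for dst, d in enumerate(bay):
            if dst != src and len(d) < max_height:
                moves.append((src, dst))
    return moves


def apply_move(bay: Bay, move: Move) -> Bay:
    """Transition function: return a new bay, leaving the input unchanged."""
    src, dst = move
    nxt = [list(stack) for stack in bay]
    nxt[dst].append(nxt[src].pop())
    return nxt


def ucb1(mean_value: float, visits: int, parent_visits: int,
         c: float = math.sqrt(2)) -> float:
    """Standard UCB1 score for choosing a child in the selection phase."""
    if visits == 0:
        return float("inf")  # unvisited children are tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)
```

Under these assumptions, each move would carry a cost of one, an episode would end when `misplaced(bay) == 0`, and minimising the number of reshuffling moves corresponds to finding the shortest such episode.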

Original language	English
Pages (from-to)	4776-4792
Number of pages	17
Journal	International Journal of Production Research
Volume	62
Issue number	13
DOIs	https://doi.org/10.1080/00207543.2023.2279130
State	Published - 2024

Keywords

Automated container terminal
Container pre-marshalling problem
Markov decision process
Monte Carlo tree search
Q-learning algorithm

Access to Document

10.1080/00207543.2023.2279130

Cite this

@article{8b2815f0df0f49f98cf83304f5a19b57,

title = "A policy-based Monte Carlo tree search method for container pre-marshalling",

abstract = "The container pre-marshalling problem (CPMP) aims to minimise the number of reshuffling moves, ultimately achieving an optimised stacking arrangement in each bay based on the priority of containers during the non-loading phase. Given the sequential decision nature, we formulated the CPMP as a Markov decision process (MDP) model to account for the specific state and action of the reshuffling process. To address the challenge that the relocated container may trigger a chain effect on the subsequent reshuffling moves, this paper develops an improved policy-based Monte Carlo tree search (P-MCTS) to solve the CPMP, where eight composite reshuffling rules and modified upper confidence bounds are employed in the selection phases, and a well-designed heuristic algorithm is utilised in the simulation phases. Meanwhile, considering the effectiveness of reinforcement learning methods for solving the MDP model, an improved Q-learning is proposed as the compared method. Numerical results show that the P-MCTS outperforms all compared methods in scenarios where all containers have different priorities and scenarios where containers can share the same priority.",

keywords = "Automated container terminal, Container pre-marshalling problem, Markov decision process, Monte Carlo tree search, Q-learning algorithm",

author = "Ziliang Wang and Chenhao Zhou and Ada Che and Jingkun Gao",

note = "Publisher Copyright: {\textcopyright} 2023 Informa UK Limited, trading as Taylor \& Francis Group.",

year = "2024",

doi = "10.1080/00207543.2023.2279130",

language = "English",

volume = "62",

pages = "4776--4792",

journal = "International Journal of Production Research",

issn = "0020-7543",

publisher = "Taylor and Francis Ltd.",

number = "13",

}

TY - JOUR

T1 - A policy-based Monte Carlo tree search method for container pre-marshalling

AU - Wang, Ziliang

AU - Zhou, Chenhao

AU - Che, Ada

AU - Gao, Jingkun

PY - 2024

Y1 - 2024

N2 - The container pre-marshalling problem (CPMP) aims to minimise the number of reshuffling moves, ultimately achieving an optimised stacking arrangement in each bay based on the priority of containers during the non-loading phase. Given the sequential decision nature, we formulated the CPMP as a Markov decision process (MDP) model to account for the specific state and action of the reshuffling process. To address the challenge that the relocated container may trigger a chain effect on the subsequent reshuffling moves, this paper develops an improved policy-based Monte Carlo tree search (P-MCTS) to solve the CPMP, where eight composite reshuffling rules and modified upper confidence bounds are employed in the selection phases, and a well-designed heuristic algorithm is utilised in the simulation phases. Meanwhile, considering the effectiveness of reinforcement learning methods for solving the MDP model, an improved Q-learning is proposed as the compared method. Numerical results show that the P-MCTS outperforms all compared methods in scenarios where all containers have different priorities and scenarios where containers can share the same priority.

AB - The container pre-marshalling problem (CPMP) aims to minimise the number of reshuffling moves, ultimately achieving an optimised stacking arrangement in each bay based on the priority of containers during the non-loading phase. Given the sequential decision nature, we formulated the CPMP as a Markov decision process (MDP) model to account for the specific state and action of the reshuffling process. To address the challenge that the relocated container may trigger a chain effect on the subsequent reshuffling moves, this paper develops an improved policy-based Monte Carlo tree search (P-MCTS) to solve the CPMP, where eight composite reshuffling rules and modified upper confidence bounds are employed in the selection phases, and a well-designed heuristic algorithm is utilised in the simulation phases. Meanwhile, considering the effectiveness of reinforcement learning methods for solving the MDP model, an improved Q-learning is proposed as the compared method. Numerical results show that the P-MCTS outperforms all compared methods in scenarios where all containers have different priorities and scenarios where containers can share the same priority.

KW - Automated container terminal

KW - Container pre-marshalling problem

KW - Markov decision process

KW - Monte Carlo tree search

KW - Q-learning algorithm

UR - http://www.scopus.com/inward/record.url?scp=85176274068&partnerID=8YFLogxK

U2 - 10.1080/00207543.2023.2279130

DO - 10.1080/00207543.2023.2279130

M3 - Article

AN - SCOPUS:85176274068

SN - 0020-7543

VL - 62

SP - 4776

EP - 4792

JO - International Journal of Production Research

JF - International Journal of Production Research

IS - 13

ER -
