Adaptive Dynamic Programming for Multi-Driver Order Dispatching at Large-Scale

Kai Jiang; Yue Cao; Huan Zhou; Jie Wu; Zhao Zhang; Zhi Liu

doi:10.1109/TCCN.2023.3327578

Adaptive Dynamic Programming for Multi-Driver Order Dispatching at Large-Scale

Kai Jiang, Yue Cao, Huan Zhou, Jie Wu, Zhao Zhang, Zhi Liu

计算机学院

科研成果: 期刊稿件 › 文章 › 同行评审

3 引用（Scopus）

摘要

Order dispatching, which involves assigning orders to demand-matched vehicles, is an underlying issue for ride-sharing services. Previous works on order dispatching are often quasi-static and myopic 1, thus performing unsatisfactorily in the ride-sharing setting. To address these challenges, recent studies attempt to augment large-scale decision optimization from a data-driven perspective. Among them, Adaptive Dynamic Programming (ADP) has exhibited its particular potential for sequential decision-making with a long-term objective under uncertainty. In this paper, we investigate order dispatching with consideration of vehicle repositioning by exploiting ADP. We first formulate the optimization problem as a Markov Decision Process (MDP), where the dispatching decision is determined by a series of agents (the decision-making entity) under the time sequence model. Then, based on the generated available trips by a graph theory-based method, an ADP-based Multi-driver Order Dispatching method (AMOD) is proposed. In particular, AMOD reconstructs the Bellman update process around the post-decision states to avoid approximating the embedded expectations explicitly. As for non-linear function approximation, it converts the value function into a linear combination by a quadratic decomposition, and estimates the decomposed value function with neural network-based parameter approximation. In addition, vehicle repositioning is performed along with each batch dispatching to balance ride supply across geographic dimensions. Extensive simulations are conducted based on real-world data. Especially, AMOD can achieve 34.6% improvement at maximum and 15.9% on average compared with other baselines, when the capacity constraint is 10.

源语言	英语
页（从-至）	607-621
页数	15
期刊	IEEE Transactions on Cognitive Communications and Networking
卷	10
期	2
DOI	https://doi.org/10.1109/TCCN.2023.3327578
出版状态	已出版 - 1 4月 2024

访问文件

10.1109/TCCN.2023.3327578

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{2782b04d872141e59ef6720d2ce38116,

title = "Adaptive Dynamic Programming for Multi-Driver Order Dispatching at Large-Scale",

abstract = "Order dispatching, which involves assigning orders to demand-matched vehicles, is an underlying issue for ride-sharing services. Previous works on order dispatching are often quasi-static and myopic 1, thus performing unsatisfactorily in the ride-sharing setting. To address these challenges, recent studies attempt to augment large-scale decision optimization from a data-driven perspective. Among them, Adaptive Dynamic Programming (ADP) has exhibited its particular potential for sequential decision-making with a long-term objective under uncertainty. In this paper, we investigate order dispatching with consideration of vehicle repositioning by exploiting ADP. We first formulate the optimization problem as a Markov Decision Process (MDP), where the dispatching decision is determined by a series of agents (the decision-making entity) under the time sequence model. Then, based on the generated available trips by a graph theory-based method, an ADP-based Multi-driver Order Dispatching method (AMOD) is proposed. In particular, AMOD reconstructs the Bellman update process around the post-decision states to avoid approximating the embedded expectations explicitly. As for non-linear function approximation, it converts the value function into a linear combination by a quadratic decomposition, and estimates the decomposed value function with neural network-based parameter approximation. In addition, vehicle repositioning is performed along with each batch dispatching to balance ride supply across geographic dimensions. Extensive simulations are conducted based on real-world data. Especially, AMOD can achieve 34.6% improvement at maximum and 15.9% on average compared with other baselines, when the capacity constraint is 10.",

keywords = "adaptive dynamic programming, graph theory, Markov decision process, Order dispatching",

author = "Kai Jiang and Yue Cao and Huan Zhou and Jie Wu and Zhao Zhang and Zhi Liu",

note = "Publisher Copyright: {\textcopyright} 2015 IEEE.",

year = "2024",

month = apr,

day = "1",

doi = "10.1109/TCCN.2023.3327578",

language = "英语",

volume = "10",

pages = "607--621",

journal = "IEEE Transactions on Cognitive Communications and Networking",

issn = "2332-7731",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "2",

}

TY - JOUR

T1 - Adaptive Dynamic Programming for Multi-Driver Order Dispatching at Large-Scale

AU - Jiang, Kai

AU - Cao, Yue

AU - Zhou, Huan

AU - Wu, Jie

AU - Zhang, Zhao

AU - Liu, Zhi

PY - 2024/4/1

Y1 - 2024/4/1

N2 - Order dispatching, which involves assigning orders to demand-matched vehicles, is an underlying issue for ride-sharing services. Previous works on order dispatching are often quasi-static and myopic 1, thus performing unsatisfactorily in the ride-sharing setting. To address these challenges, recent studies attempt to augment large-scale decision optimization from a data-driven perspective. Among them, Adaptive Dynamic Programming (ADP) has exhibited its particular potential for sequential decision-making with a long-term objective under uncertainty. In this paper, we investigate order dispatching with consideration of vehicle repositioning by exploiting ADP. We first formulate the optimization problem as a Markov Decision Process (MDP), where the dispatching decision is determined by a series of agents (the decision-making entity) under the time sequence model. Then, based on the generated available trips by a graph theory-based method, an ADP-based Multi-driver Order Dispatching method (AMOD) is proposed. In particular, AMOD reconstructs the Bellman update process around the post-decision states to avoid approximating the embedded expectations explicitly. As for non-linear function approximation, it converts the value function into a linear combination by a quadratic decomposition, and estimates the decomposed value function with neural network-based parameter approximation. In addition, vehicle repositioning is performed along with each batch dispatching to balance ride supply across geographic dimensions. Extensive simulations are conducted based on real-world data. Especially, AMOD can achieve 34.6% improvement at maximum and 15.9% on average compared with other baselines, when the capacity constraint is 10.

AB - Order dispatching, which involves assigning orders to demand-matched vehicles, is an underlying issue for ride-sharing services. Previous works on order dispatching are often quasi-static and myopic 1, thus performing unsatisfactorily in the ride-sharing setting. To address these challenges, recent studies attempt to augment large-scale decision optimization from a data-driven perspective. Among them, Adaptive Dynamic Programming (ADP) has exhibited its particular potential for sequential decision-making with a long-term objective under uncertainty. In this paper, we investigate order dispatching with consideration of vehicle repositioning by exploiting ADP. We first formulate the optimization problem as a Markov Decision Process (MDP), where the dispatching decision is determined by a series of agents (the decision-making entity) under the time sequence model. Then, based on the generated available trips by a graph theory-based method, an ADP-based Multi-driver Order Dispatching method (AMOD) is proposed. In particular, AMOD reconstructs the Bellman update process around the post-decision states to avoid approximating the embedded expectations explicitly. As for non-linear function approximation, it converts the value function into a linear combination by a quadratic decomposition, and estimates the decomposed value function with neural network-based parameter approximation. In addition, vehicle repositioning is performed along with each batch dispatching to balance ride supply across geographic dimensions. Extensive simulations are conducted based on real-world data. Especially, AMOD can achieve 34.6% improvement at maximum and 15.9% on average compared with other baselines, when the capacity constraint is 10.

KW - adaptive dynamic programming

KW - graph theory

KW - Markov decision process

KW - Order dispatching

UR - http://www.scopus.com/inward/record.url?scp=85176304237&partnerID=8YFLogxK

U2 - 10.1109/TCCN.2023.3327578

DO - 10.1109/TCCN.2023.3327578

M3 - 文章

AN - SCOPUS:85176304237

SN - 2332-7731

VL - 10

SP - 607

EP - 621

JO - IEEE Transactions on Cognitive Communications and Networking

JF - IEEE Transactions on Cognitive Communications and Networking

IS - 2

ER -

Adaptive Dynamic Programming for Multi-Driver Order Dispatching at Large-Scale

摘要

访问文件

其它文件与链接

指纹

引用此