TY - JOUR
T1 - Distributed Deep Multi-Agent Reinforcement Learning for Cooperative Edge Caching in Internet-of-Vehicles
AU - Zhou, Huan
AU - Jiang, Kai
AU - He, Shibo
AU - Min, Geyong
AU - Wu, Jie
N1 - Publisher Copyright:
© 2002-2012 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
AB - Edge caching is a promising approach to reduce duplicate content transmission in Internet-of-Vehicles (IoVs). Several Reinforcement Learning (RL) based edge caching methods have been proposed to improve resource utilization and reduce the backhaul traffic load. However, they only obtain locally sub-optimal solutions, as they neglect the influence of other agents on the environment. This paper investigates edge caching strategies that jointly consider content delivery and cache replacement by exploiting distributed Multi-Agent Reinforcement Learning (MARL). A hierarchical edge caching architecture for IoVs is proposed, and the corresponding problem is formulated with the goal of minimizing the long-term content access cost of the system. Then, we extend the Markov Decision Process (MDP) of single-agent RL to the multi-agent setting, and tackle the corresponding combinatorial multi-armed bandit problem within the framework of a stochastic game. Specifically, we first propose a Distributed MARL-based Edge caching method (DMRE), where each agent can adaptively learn its best behaviour in conjunction with other agents for intelligent caching. Meanwhile, we reduce the computational complexity of DMRE through parameter approximation, which simplifies the training targets. However, DMRE represents and updates its parameters via a lookup table, making it an essentially tabular method that generally performs inefficiently in large-scale scenarios. To circumvent this issue and obtain more expressive parametric models, we incorporate the Deep Q-Network into DMRE and further develop a computationally efficient method (DeepDMRE) with neural network-based Nash equilibrium approximation. Extensive simulations are conducted to verify the effectiveness of the proposed methods. In particular, DeepDMRE outperforms DMRE, Q-learning, LFU, and LRU, improving the edge hit rate by roughly 5%, 19%, 40%, and 35%, respectively, when the cache capacity reaches 1,000 MB.
KW - Edge caching
KW - Internet-of-Vehicles
KW - cache replacement
KW - content delivery
KW - multi-agent reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85162891664&partnerID=8YFLogxK
U2 - 10.1109/TWC.2023.3272348
DO - 10.1109/TWC.2023.3272348
M3 - Article
AN - SCOPUS:85162891664
SN - 1536-1276
VL - 22
SP - 9595
EP - 9609
JO - IEEE Transactions on Wireless Communications
JF - IEEE Transactions on Wireless Communications
IS - 12
ER -