Momentum recursive DARTS

Benteng Ma; Yanning Zhang; Yong Xia

doi:10.1016/j.patcog.2024.110710

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › Peer-reviewed

1 citation (Scopus)

Abstract

DARTS has emerged as a popular method for neural architecture search (NAS) owing to its efficiency and simplicity. It employs gradient-based bi-level optimization to iteratively optimize the upper-level architecture parameters and lower-level super-network weights. The key challenge in DARTS is accurately estimating gradients for the two-level objective functions, where inaccuracy leads to significant errors in gradient approximation. To address this issue, we propose a new approach, MR-DARTS, that incorporates a momentum term and a recursive scheme to improve gradient estimation. Specifically, we leverage historical information by using a running average of past observed gradients to enhance the quality of the current gradient estimate in both the upper-level and lower-level functions. Our theoretical analysis shows that the variance of our estimated gradient decreases with each iteration. By utilizing momentum and a recursive scheme, MR-DARTS effectively controls the error in stochastic gradient updates that results from inaccurate gradient estimation. Furthermore, we utilize the Neumann series approximation and a Hessian-vector product scheme to reduce computational requirements and memory usage. We evaluate our proposed method on several benchmarks and demonstrate its effectiveness through comprehensive experiments.
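The abstract mentions combining a Neumann series approximation with Hessian-vector products. As a rough illustration of that general idea only (not the paper's implementation), the sketch below approximates an inverse-Hessian-vector product using the identity H^{-1} v = eta * sum_k (I - eta H)^k v, which holds for a symmetric H when the eigenvalues of eta * H lie in (0, 2). The function names, the toy 2x2 Hessian, the step size `eta`, and the number of terms are all assumptions chosen for the example.

```python
from typing import Callable, List

Vector = List[float]


def neumann_inverse_hvp(
    hvp: Callable[[Vector], Vector],
    v: Vector,
    eta: float,
    num_terms: int,
) -> Vector:
    """Approximate H^{-1} v as eta * sum_{k=0}^{num_terms} (I - eta*H)^k v.

    Only Hessian-vector products are needed; H itself is never formed
    or inverted, which is where the memory saving comes from.
    """
    term = list(v)   # current term (I - eta*H)^k v, starting at k = 0
    total = list(v)  # running sum of the series
    for _ in range(num_terms):
        h_term = hvp(term)
        term = [t - eta * h for t, h in zip(term, h_term)]
        total = [s + t for s, t in zip(total, term)]
    return [eta * s for s in total]


def example_hvp(x: Vector) -> Vector:
    # Hypothetical symmetric positive-definite Hessian [[2.0, 0.5], [0.5, 1.0]],
    # used only to make the sketch checkable by hand.
    return [2.0 * x[0] + 0.5 * x[1], 0.5 * x[0] + 1.0 * x[1]]


# eta = 0.4 keeps the eigenvalues of eta*H inside (0, 2), so the series converges.
approx = neumann_inverse_hvp(example_hvp, [1.0, 1.0], eta=0.4, num_terms=200)
```

For this toy Hessian the exact answer is H^{-1} [1, 1] = [2/7, 6/7], so `approx` can be compared against it directly. In a real NAS setting the `hvp` callable would come from automatic differentiation of the lower-level loss, and the series would typically be truncated after only a few terms.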

Original language	English
Article number	110710
Journal	Pattern Recognition
Volume	156
DOI	https://doi.org/10.1016/j.patcog.2024.110710
Publication status	Published - Dec 2024

Cite this

@article{804bec05488c4d128efa2cdbd00b55fa,

title = "Momentum recursive DARTS",

abstract = "DARTS has emerged as a popular method for neural architecture search (NAS) owing to its efficiency and simplicity. It employs gradient-based bi-level optimization to iteratively optimize the upper-level architecture parameters and lower-level super-network weights. The key challenge in DARTS is the accurate estimation of gradients for two-level object functions, leading to significant errors in gradient approximation. To address this issue, we propose a new approach, MR-DARTS, that incorporates a momentum term and a recursive scheme to improve gradient estimation. Specifically, we leverage historical information by using a running average of past observed gradients to enhance the quality of current gradient estimation in both upper-level and lower-level functions. Our theoretical analysis shows that the variance of our estimated gradient decreases with each iteration. By utilizing momentum and a recursive scheme, MR-DARTS effectively controls the error in stochastic gradient updates that result from inaccurate gradient estimation. Furthermore, we utilize the Neumann series approximation and Hessian Vector Product scheme to reduce computational requirements and memory usage. We evaluate our proposed method on several benchmarks and demonstrate its effectiveness through comprehensive experiments.",

keywords = "Gradient estimation, Image recognition, Neural architecture search",

author = "Benteng Ma and Yanning Zhang and Yong Xia",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier Ltd",

year = "2024",

month = dec,

doi = "10.1016/j.patcog.2024.110710",

language = "English",

volume = "156",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - Momentum recursive DARTS

AU - Ma, Benteng

AU - Zhang, Yanning

AU - Xia, Yong

PY - 2024/12

Y1 - 2024/12

N2 - DARTS has emerged as a popular method for neural architecture search (NAS) owing to its efficiency and simplicity. It employs gradient-based bi-level optimization to iteratively optimize the upper-level architecture parameters and lower-level super-network weights. The key challenge in DARTS is the accurate estimation of gradients for two-level object functions, leading to significant errors in gradient approximation. To address this issue, we propose a new approach, MR-DARTS, that incorporates a momentum term and a recursive scheme to improve gradient estimation. Specifically, we leverage historical information by using a running average of past observed gradients to enhance the quality of current gradient estimation in both upper-level and lower-level functions. Our theoretical analysis shows that the variance of our estimated gradient decreases with each iteration. By utilizing momentum and a recursive scheme, MR-DARTS effectively controls the error in stochastic gradient updates that result from inaccurate gradient estimation. Furthermore, we utilize the Neumann series approximation and Hessian Vector Product scheme to reduce computational requirements and memory usage. We evaluate our proposed method on several benchmarks and demonstrate its effectiveness through comprehensive experiments.

AB - DARTS has emerged as a popular method for neural architecture search (NAS) owing to its efficiency and simplicity. It employs gradient-based bi-level optimization to iteratively optimize the upper-level architecture parameters and lower-level super-network weights. The key challenge in DARTS is the accurate estimation of gradients for two-level object functions, leading to significant errors in gradient approximation. To address this issue, we propose a new approach, MR-DARTS, that incorporates a momentum term and a recursive scheme to improve gradient estimation. Specifically, we leverage historical information by using a running average of past observed gradients to enhance the quality of current gradient estimation in both upper-level and lower-level functions. Our theoretical analysis shows that the variance of our estimated gradient decreases with each iteration. By utilizing momentum and a recursive scheme, MR-DARTS effectively controls the error in stochastic gradient updates that result from inaccurate gradient estimation. Furthermore, we utilize the Neumann series approximation and Hessian Vector Product scheme to reduce computational requirements and memory usage. We evaluate our proposed method on several benchmarks and demonstrate its effectiveness through comprehensive experiments.

KW - Gradient estimation

KW - Image recognition

KW - Neural architecture search

UR - http://www.scopus.com/inward/record.url?scp=85198020758&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2024.110710

DO - 10.1016/j.patcog.2024.110710

M3 - Article

AN - SCOPUS:85198020758

SN - 0031-3203

VL - 156

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 110710

ER -
