γ-Razor: Hardness-Aware Dataset Pruning for Efficient Neural Network Training

Lei Liu, Peng Zhang, Yunji Liang, Junrui Liu, Lia Morra, Bin Guo, Zhiwen Yu, Yanyong Zhang, Daniel D. Zeng

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Training deep neural networks (DNNs) on large-scale datasets is often inefficient, demanding substantial computation and significant energy. Although great effort has been devoted to optimizing DNNs, few studies have focused on the inefficiency caused by data samples of little value for model training. In this article, we empirically demonstrate that sample complexity matters for model efficiency and that selecting representative samples improves it. In particular, we propose a hardness-aware dataset pruning method (γ-Razor) that selects representative samples from large-scale datasets and removes the less valuable ones for model training. γ-Razor is a two-stage framework comprising interclass sampling and intraclass sampling. For interclass sampling, we introduce an inverse self-paced learning strategy that learns hard samples and adaptively adjusts their weights according to the inverse frequency of effective samples in each class. For intraclass sampling, a hardness-aware cluster sampling algorithm is proposed to downsample easy samples within each class. To evaluate the performance of γ-Razor, we conducted extensive experiments on three large-scale datasets for image classification tasks. The results show that models trained on the pruned datasets achieve robustness and efficiency competitive with their counterparts trained on the original large-scale datasets. Furthermore, models trained on the pruned datasets converge faster and consume less energy.
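The abstract gives no implementation details, but the two-stage idea it describes can be sketched roughly: allocate per-class budgets weighted by the inverse frequency of effective (hard) samples, then downsample easy samples within each class via clustering. The sketch below is a hypothetical illustration, not the authors' method; the function names (interclass_budget, intraclass_prune), the use of per-sample training loss as the hardness score, the threshold tau, and k-means as the clustering step are all assumptions.

```python
# A minimal, hypothetical sketch of the two-stage idea described above --
# not the authors' implementation. Per-sample hardness scores (e.g.,
# training losses) are assumed to be precomputed, and k-means is used
# only as a stand-in for the intraclass cluster sampling step.
import numpy as np
from sklearn.cluster import KMeans


def interclass_budget(labels, hardness, total_budget, tau=1.0):
    """Allocate a per-class sampling budget, weighting each class by the
    inverse frequency of its 'effective' (hard) samples."""
    classes = np.unique(labels)
    n_effective = np.array(
        [(hardness[labels == c] > tau).sum() + 1 for c in classes]
    )
    weights = 1.0 / n_effective
    weights /= weights.sum()
    return {c: max(1, int(w * total_budget)) for c, w in zip(classes, weights)}


def intraclass_prune(features, hardness, budget, hard_frac=0.5, seed=0):
    """Within one class: keep the hardest samples outright, then cluster the
    remaining easy samples and keep one representative per cluster."""
    order = np.argsort(-hardness)                  # hardest first
    n_hard = min(int(hard_frac * budget), len(order))
    hard_idx, easy_idx = order[:n_hard], order[n_hard:]
    n_easy = min(budget - n_hard, len(easy_idx))
    if n_easy <= 0:
        return hard_idx
    km = KMeans(n_clusters=n_easy, n_init=10, random_state=seed)
    assign = km.fit_predict(features[easy_idx])
    reps = []
    for k in range(n_easy):
        members = easy_idx[assign == k]
        if len(members) == 0:
            continue
        dists = np.linalg.norm(features[members] - km.cluster_centers_[k], axis=1)
        reps.append(members[np.argmin(dists)])     # sample closest to centroid
    return np.concatenate([hard_idx, np.array(reps, dtype=int)])
```

In this sketch, interclass_budget would be called once over the full training set to fix per-class budgets, and intraclass_prune once per class on that class's feature matrix and hardness scores; the union of the returned indices forms the pruned dataset.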

Original language: English
Journal: IEEE Transactions on Computational Social Systems
DOI
Publication status: Accepted/In press - 2024
