γ-Razor: Hardness-Aware Dataset Pruning for Efficient Neural Network Training

Lei Liu, Peng Zhang, Yunji Liang, Junrui Liu, Lia Morra, Bin Guo, Zhiwen Yu, Yanyong Zhang, Daniel D. Zeng

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Training deep neural networks (DNNs) on large-scale datasets is often inefficient, demanding substantial computation and significant energy. Although great effort has been devoted to optimizing DNNs, few studies have focused on the inefficiency caused by data samples of little value for model training. In this article, we empirically demonstrate that sample complexity matters for model efficiency and that selecting representative samples improves it. In particular, we propose a hardness-aware dataset pruning method (γ-Razor) that selects representative samples from large-scale datasets and removes the less valuable ones for model training. γ-Razor is a two-stage framework comprising interclass sampling and intraclass sampling. For interclass sampling, we introduce an inverse self-paced learning strategy that learns hard samples and adaptively adjusts their weights according to the inverse frequency of effective samples in each class. For intraclass sampling, a hardness-aware cluster sampling algorithm is proposed to downsample easy samples within each class. To evaluate the performance of γ-Razor, we conducted extensive experiments on three large-scale datasets for image classification tasks. The results show that models trained on the pruned datasets achieve robustness and efficiency competitive with their counterparts trained on the original large-scale datasets. Furthermore, models trained on the pruned datasets converge faster and consume less energy.
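The abstract gives no implementation details, but the two-stage idea it describes can be sketched roughly: allocate per-class budgets weighted by the inverse frequency of effective (hard) samples, then downsample easy samples within each class via clustering. The sketch below is a hypothetical illustration, not the authors' method; the function names (interclass_budget, intraclass_prune), the use of per-sample training loss as the hardness score, the threshold tau, and k-means as the clustering step are all assumptions.

```python
# A minimal, hypothetical sketch of the two-stage idea described above --
# not the authors' implementation. Per-sample hardness scores (e.g.,
# training losses) are assumed to be precomputed, and k-means is used
# only as a stand-in for the intraclass cluster sampling step.
import numpy as np
from sklearn.cluster import KMeans


def interclass_budget(labels, hardness, total_budget, tau=1.0):
    """Allocate a per-class sampling budget, weighting each class by the
    inverse frequency of its 'effective' (hard) samples."""
    classes = np.unique(labels)
    n_effective = np.array(
        [(hardness[labels == c] > tau).sum() + 1 for c in classes]
    )
    weights = 1.0 / n_effective
    weights /= weights.sum()
    return {c: max(1, int(w * total_budget)) for c, w in zip(classes, weights)}


def intraclass_prune(features, hardness, budget, hard_frac=0.5, seed=0):
    """Within one class: keep the hardest samples outright, then cluster the
    remaining easy samples and keep one representative per cluster."""
    order = np.argsort(-hardness)                  # hardest first
    n_hard = min(int(hard_frac * budget), len(order))
    hard_idx, easy_idx = order[:n_hard], order[n_hard:]
    n_easy = min(budget - n_hard, len(easy_idx))
    if n_easy <= 0:
        return hard_idx
    km = KMeans(n_clusters=n_easy, n_init=10, random_state=seed)
    assign = km.fit_predict(features[easy_idx])
    reps = []
    for k in range(n_easy):
        members = easy_idx[assign == k]
        if len(members) == 0:
            continue
        dists = np.linalg.norm(features[members] - km.cluster_centers_[k], axis=1)
        reps.append(members[np.argmin(dists)])     # sample closest to centroid
    return np.concatenate([hard_idx, np.array(reps, dtype=int)])
```

In this sketch, interclass_budget would be called once over the full training set to fix per-class budgets, and intraclass_prune once per class on that class's feature matrix and hardness scores; the union of the returned indices forms the pruned dataset.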

Original language: English
Journal: IEEE Transactions on Computational Social Systems
DOI
Publication status: Accepted/In press - 2024
