γ-Razor: Hardness-Aware Dataset Pruning for Efficient Neural Network Training

  • Lei Liu
  • Peng Zhang
  • Yunji Liang
  • Junrui Liu
  • Lia Morra
  • Bin Guo
  • Zhiwen Yu
  • Yanyong Zhang
  • Daniel D. Zeng
  • Northwestern Polytechnical University Xian
  • Polytechnic University of Turin
  • University of Science and Technology of China
  • CAS - Institute of Automation

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Training deep neural networks (DNNs) on large-scale datasets is often inefficient, demanding substantial computation and energy. Although great effort has gone into optimizing DNNs, few studies have focused on the inefficiency caused by data samples that contribute little to model training. In this article, we empirically demonstrate that sample complexity matters for model efficiency and that selecting representative samples improves it. In particular, we propose a hardness-aware dataset pruning method (γ-Razor) that selects representative samples from large-scale datasets and removes the samples that are less valuable for model training. γ-Razor is a two-stage framework comprising interclass sampling and intraclass sampling. First, for interclass sampling, we introduce an inverse self-paced learning strategy that learns hard samples and adaptively adjusts their weights according to the inverse frequency of effective samples of each class. For intraclass sampling, we propose a hardness-aware cluster sampling algorithm that downsamples easy samples within each class. To evaluate γ-Razor, we conducted extensive experiments on three large-scale datasets for image classification tasks. The experimental results show that models trained on the pruned datasets perform competitively with their counterparts trained on the original large-scale datasets in terms of robustness and efficiency. Furthermore, models trained on the pruned datasets converge faster and consume less energy.
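The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give formulas, so the code below rests on stated assumptions. The weighting function borrows the "effective number of samples" form E_n = (1 - β^n) / (1 - β) from the class-balanced loss literature, and the pruning function replaces the hardness-aware cluster sampling with a simple median split on a per-sample hardness score. The names `effective_number_weights`, `prune_easy_samples`, `beta`, and `easy_keep_ratio` are all hypothetical.

```python
import random
from collections import defaultdict


def effective_number_weights(class_counts, beta=0.999):
    """Per-class weights proportional to 1 / E_n, E_n = (1 - beta**n) / (1 - beta).

    Assumption: this 'effective number' form is borrowed from the
    class-balanced loss literature; the abstract does not state a formula.
    Every class is assumed to contain at least one sample.
    """
    inverse = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in class_counts.items()}
    total = sum(inverse.values())
    # Normalise so the weights sum to the number of classes.
    return {c: len(inverse) * v / total for c, v in inverse.items()}


def prune_easy_samples(labels, hardness, easy_keep_ratio=0.5, seed=0):
    """Keep every hard sample and a random fraction of the easy ones, per class.

    Assumption: the easier half of each class (by hardness score) counts as
    'easy'; this median split stands in for the clustering step, which the
    abstract does not detail. easy_keep_ratio is assumed to lie in [0, 1].
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        ranked = sorted(indices, key=lambda i: hardness[i])  # easiest first
        n_easy = len(ranked) // 2
        easy, hard = ranked[:n_easy], ranked[n_easy:]
        n_keep = int(round(easy_keep_ratio * len(easy)))
        kept.extend(hard)                       # hard samples are always kept
        kept.extend(rng.sample(easy, n_keep))   # easy samples are downsampled
    return sorted(kept)
```

Under these assumptions, a class with fewer samples receives a larger weight, and the returned index list can be used to build the pruned training subset. The hardness scores themselves (for example, per-sample loss from a partially trained model) must be supplied by the caller.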

Original language: English
Pages (from-to): 957-971
Number of pages: 15
Journal: IEEE Transactions on Computational Social Systems
Volume: 12
Issue number: 3
State: Published - Jun 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Dataset pruning
  • energy efficiency
  • hard sample mining
  • inverse self-paced learning (SPL)
