Abstract
Highlights

What are the main findings?

- A novel filter-wise mask pruning (FMP) approach is proposed, which achieves the benefits of both unstructured and structured pruning. The newly introduced structural constraint on the filter dimension leads to more regularity, generating more hardware-friendly and performant models.
- An FMP-based acceleration architecture is proposed for real-time processing. The strategy for calculation parallelism and memory access is dedicatedly optimized to enhance workload balance and throughput.

What are the implications of the main findings?

- The proposed pruning method is proven on both classification networks and detection networks. The pruning rate reaches 75.1% for VGG-16 and 84.6% for ResNet-50 without accuracy compromise. The pruned YOLOv5s achieves a pruning rate of 53.43% with a slight accuracy degradation of 0.6%.
- The proposed acceleration architecture is implemented on FPGA to evaluate its practical execution performance. The throughput reaches up to 809.46 MOPS. The pruned networks achieve speedups of 2.23× and 4.4×, with compression rates of 2.25× and 4.5×, respectively, effectively converting model compression into execution speedup.

Pruning and acceleration have become essential and promising techniques for convolutional neural networks (CNNs) in remote sensing image processing, especially for deployment on resource-constrained devices. However, maintaining model accuracy while achieving satisfactory acceleration simultaneously remains a challenging and valuable problem. To break this limitation, we introduce a novel pruning pattern, the filter-wise mask, by enforcing extra filter-wise structural constraints on pattern-based pruning, which achieves the benefits of both unstructured and structured pruning. The newly introduced filter-wise mask enhances fine-grained sparsity with more hardware-friendly regularity.
We further design an acceleration architecture with optimized calculation parallelism and memory access, aiming to fully translate weight pruning into hardware performance gain. The proposed pruning method is first proven on classification networks: the pruning rate reaches 75.1% for VGG-16 and 84.6% for ResNet-50 without accuracy compromise. We then apply our method to the widely used object detection model, the you only look once (YOLO) CNN. On the aerial image dataset, the pruned YOLOv5s achieves a pruning rate of 53.43% with a slight accuracy degradation of 0.6%. Meanwhile, we implement the acceleration architecture on a field-programmable gate array (FPGA) to evaluate its practical execution performance. The throughput reaches up to 809.46 MOPS. The pruned networks achieve speedups of 2.23× and 4.4×, with compression rates of 2.25× and 4.5×, respectively, effectively converting model compression into execution speedup. The proposed pruning and acceleration approach provides crucial technology to facilitate the application of CNNs in remote sensing, especially in scenarios such as on-board real-time processing, emergency response, and low-cost monitoring.
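The core idea of the filter-wise mask — combining fine-grained, pattern-based sparsity with a structural constraint along the filter dimension — can be illustrated with a minimal NumPy sketch. Here the constraint is modeled as all kernels of one filter sharing a single k×k sparsity pattern; the function name, the magnitude-sum scoring, and the `keep` parameter are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np

def filter_wise_mask_prune(weights, keep):
    """Prune a conv weight tensor of shape (out_ch, in_ch, k, k) so that
    all kernels belonging to one filter share the same k*k sparsity mask.

    keep -- number of kernel positions retained per filter (assumed knob).
    """
    out_ch, in_ch, k, _ = weights.shape
    pruned = np.zeros_like(weights)
    for f in range(out_ch):
        # Score each of the k*k kernel positions for this filter by
        # aggregating weight magnitudes over the input-channel axis.
        score = np.abs(weights[f]).sum(axis=0)          # shape (k, k)
        keep_idx = np.argsort(score.ravel())[-keep:]    # top-`keep` positions
        mask = np.zeros(k * k)
        mask[keep_idx] = 1.0
        # Broadcast the shared mask across every kernel in the filter;
        # this is the filter-wise structural constraint.
        pruned[f] = weights[f] * mask.reshape(k, k)
    return pruned

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3))    # 4 filters, 3 input channels, 3x3
wp = filter_wise_mask_prune(w, keep=4)
```

Because every kernel in a filter keeps the same positions, the hardware can fetch one mask per filter and skip the pruned multiplications uniformly across input channels, which is what makes this pattern more accelerator-friendly than fully unstructured sparsity.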
| Original language | English |
|---|---|
| Article number | 3582 |
| Journal | Remote Sensing |
| Volume | 17 |
| Issue number | 21 |
| DOIs | |
| State | Published - Nov 2025 |
Keywords
- YOLO
- hardware acceleration
- network pruning
- object classification
- object detection
Title: Filter-Wise Mask Pruning and FPGA Acceleration for Object Classification and Detection