特征重排列注意力机制的双池化残差分类网络

Translated title of the contribution: Double-pooling residual classification network based on feature reordering attention mechanism

Yuan Heng, Liu Jie, Jiang Wentao, Liu Wanjun

Research output: Contribution to journal › Article › peer-review

Abstract

Objective A residual classification network is a deep convolutional neural network architecture that plays an important and influential role in deep learning and has become one of the most commonly used network structures for image classification tasks in computer vision. To solve the problem of network degradation in deep networks, residual networks depart from the traditional approach of simply stacking convolutional layers and innovatively introduce residual connections, which add the input features directly to the output features through skip connections and pass the original features straight to subsequent network layers. This forms a shortcut path that better preserves and utilizes feature information. Although the residual classification network effectively alleviates gradient explosion and gradient vanishing during deep network training, when the output dimension of a residual block does not match its input dimension, a convolutional mapping is needed to align the dimensions, which causes a large number of pixels in the channel matrices of the residual module to be skipped and leads to the loss of feature information. In addition, correlation exists between image channels, and a fixed channel order may introduce feature bias, making it difficult to fully exploit the information in other channels and limiting the model's ability to express key features. To address these issues, this article proposes a double-pooling residual classification network with a feature reordering attention mechanism (FDPRNet).

Method FDPRNet is based on the ResNet-34 residual network. First, the kernel size of the first convolutional layer is changed from 7 × 7 to 3 × 3, because for relatively small images a larger kernel enlarges the receptive field and captures too much useless contextual information. At the same time, the maximum pooling layer is removed to prevent the feature map from shrinking further, retain more image information, avoid the information loss caused by pooling, and make it easier for subsequent network layers to extract features. Then, a feature reordering attention module (FRAM) is proposed, which groups the feature map channels and reorders them between and within groups so that originally adjacent channels are no longer connected, with the intra-group channels arranged as an arithmetic sequence with a step size of 1. This operation disrupts the original front-to-back order of some channels while preserving the relationship between others, introducing a certain degree of randomness, allowing the model to comprehensively consider the interactions between different channels, and avoiding excessive dependence on specific channels. The features of each channel combination are extracted and concatenated by one-dimensional convolution, and the sigmoid activation function is then applied to obtain the weights of the reordered features, which are multiplied element by element with the input features to produce the output of the feature reordering attention mechanism; a sketch of this module is given below.
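As a concrete illustration, here is a minimal PyTorch sketch of the feature reordering attention idea described above. The class name FRAM, the channel-shuffle-style permutation, the global average pooling used to form per-channel descriptors, and the ECA-style one-dimensional convolution are assumptions made for this sketch; the paper's exact grouping rule and layer settings may differ.

```python
import torch
import torch.nn as nn


class FRAM(nn.Module):
    """Hypothetical sketch of a feature reordering attention module."""

    def __init__(self, channels: int, groups: int = 4, kernel_size: int = 3):
        super().__init__()
        assert channels % groups == 0, "channels must be divisible by groups"
        # Shuffle-style permutation: break most adjacent channel pairs while
        # keeping a regular stride pattern inside each group (assumed rule).
        perm = torch.arange(channels).reshape(groups, channels // groups).t().flatten()
        inv = torch.empty_like(perm)
        inv[perm] = torch.arange(channels)
        self.register_buffer("perm", perm)
        self.register_buffer("inv_perm", inv)
        # 1-D convolution over the reordered per-channel descriptors extracts
        # features of each channel combination (ECA-style, an assumption).
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        desc = x.mean(dim=(2, 3))            # (B, C) per-channel descriptors
        desc = desc[:, self.perm]            # inter- and intra-group reordering
        w = self.sigmoid(self.conv(desc.unsqueeze(1))).squeeze(1)  # (B, C) weights
        w = w[:, self.inv_perm]              # restore the original channel order
        return x * w.view(b, c, 1, 1)        # element-wise reweighting


# Example: reweight a 64-channel feature map.
fram = FRAM(64)
y = fram(torch.randn(2, 64, 32, 32))  # same shape as the input
```

The weights are mapped back to the original channel order before the element-wise multiplication, so the module can be dropped into a residual block without changing the meaning of each channel.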
Finally, a double-pooling residual (DPR) module is proposed, which applies maximum pooling and average pooling to the feature maps in parallel. This module captures both the salient and the typical features of the input images, enhancing the expressive power of the features and helping the network capture important information in the images, thereby improving model performance. Element-wise summation and a convolutional mapping are then performed on the pooled feature maps to extract key features, reduce the size of the feature maps, and ensure that the channel matrices support element-level summation in the residual connection; a sketch of this shortcut is given after the abstract.

Result On the CIFAR-100, CIFAR-10, SVHN, Flowers-102, and NWPU-RESISC45 datasets, compared with the original ResNet-34 model, the accuracy of ResNet-34 with FRAM added improves by 1.66%, 0.19%, 0.13%, 4.28%, and 2.00%, respectively; the accuracy of ResNet-34 with DPR added improves by 1.7%, 0.26%, 0.12%, 3.18%, and 1.31%, respectively; and the accuracy of FDPRNet, which combines the FRAM and DPR modules, improves by 2.07%, 0.3%, 0.17%, 8.31%, and 2.47%, respectively. Compared with four attention mechanisms (squeeze-and-excitation, efficient channel attention, coordinate attention, and the convolutional block attention module), the accuracy of FRAM improves by an average of 0.72%, 1.28%, and 1.46% on the CIFAR-100, Flowers-102, and STL-10 datasets. In summary, whether the dataset is small or large and has few or many categories, both the FRAM and DPR modules contribute to the improvement of recognition accuracy in the ResNet-34 network. The combination of the two modules, FDPRNet, improves the recognition rate the most and achieves a significant gain in accuracy over other image classification networks.

Conclusion The proposed FDPRNet enhances information exchange within the image channels and reduces feature loss. It not only achieves high classification accuracy but also effectively strengthens the network's feature learning ability and generalization ability. The main contributions of this article are as follows: 1) FRAM is proposed, which breaks the connections between the original channels and regroups them according to certain rules. Learning the weights of channel combinations in different orders ensures that channels in different groups interact without losing the front-to-back connections among all channels, achieving information exchange and channel crossing within the feature map, enhancing the interaction between features, better capturing the correlation between contextual information and features, and improving classification accuracy. 2) DPR is proposed, which replaces the skip connections in the original residual block with a double-pooling residual module, solving the problem of feature information loss caused by the large number of pixels skipped in the channel matrices during the skip connections of the residual module. Using dual pooling to obtain the salient and typical features of the input images not only enhances the expressive ability of the features but also helps the network capture important information in the images and improves classification performance. 3) The proposed FDPRNet inserts the two modules, FRAM and DPR, into the residual network to enhance channel interaction and feature expression, enabling the model to capture complex relationships and generalize well. It achieves high classification accuracy on several mainstream image classification datasets.
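For concreteness, the following is a minimal PyTorch sketch of a double-pooling shortcut in the spirit of the DPR module described in the Method section. The class name DPRShortcut, the pooling window tied to the stride, and the 1 × 1 convolution with batch normalization used for channel matching are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn


class DPRShortcut(nn.Module):
    """Hypothetical double-pooling shortcut: stands in for the strided 1x1
    convolution of a downsampling residual block (illustrative sketch)."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 2):
        super().__init__()
        # Parallel max and average pooling reduce the spatial size without
        # skipping pixels the way a strided 1x1 convolution does; max pooling
        # keeps salient features, average pooling keeps typical ones.
        self.max_pool = nn.MaxPool2d(kernel_size=stride, stride=stride)
        self.avg_pool = nn.AvgPool2d(kernel_size=stride, stride=stride)
        # 1x1 convolution matches the channel count so the element-wise
        # residual addition with the main branch is well defined.
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = self.max_pool(x) + self.avg_pool(x)  # element-wise summation
        return self.proj(pooled)                      # convolutional mapping


# Example: shortcut for a downsampling block going from 64 to 128 channels.
shortcut = DPRShortcut(64, 128, stride=2)
y = shortcut(torch.randn(1, 64, 32, 32))  # -> (1, 128, 16, 16)
```

In a ResNet-34 basic block, such a module would replace the shortcut branch whenever the spatial size or channel count changes, so that the residual addition remains element-wise compatible with the main branch.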

Translated title of the contribution: Double-pooling residual classification network based on feature reordering attention mechanism
Original language: Chinese (Traditional)
Pages (from-to): 110-129
Number of pages: 20
Journal: Journal of Image and Graphics
Volume: 30
Issue number: 1
State: Published - Jan 2025
Externally published: Yes
