Abstract
Varying the scale of the input image allows convolutional networks to extract different features and learn richer image representations. This serves as a form of data augmentation and helps address the challenges of few-shot learning. While earlier few-shot learning methods have focused on multi-scale feature fusion using techniques such as random resizing or feature pyramids, the differences between features at different scales have largely been overlooked. In contrast to these methods, we propose a novel few-shot learning approach, the Scale Parallax Network, which treats images at different resolutions as complementary sources of visual information. We adopt an image-pyramid-based structure to extract multi-scale feature representations and enhance the model's representational capacity. Experimental results demonstrate that our method achieves state-of-the-art performance on the miniImageNet and tieredImageNet datasets.
| Original language | English |
|---|---|
| Article number | 112504 |
| Journal | Pattern Recognition |
| Volume | 172 |
| DOIs | |
| State | Published - Apr 2026 |
Keywords
- Few-shot learning
- Image pyramid
- Representation learning
- Scale parallax