Abstract
The widespread presence of multimodal fake news on social media platforms has severely impacted public order, making the automatic detection and filtering of such content a pressing issue. Although existing studies have attempted to integrate multimodal data for this task, they often struggle to model cross-modal correlations effectively. Most approaches focus on the global features of each modality and compute scalar similarities, which limits the cross-modal information they can learn from each sample. To address this challenge, this paper introduces a novel cross-modal content correlation network. The method takes salient objects cropped from images and nouns extracted from the text as the multimodal content, and uses CLIP to extract generalizable features for similarity measurement, thereby enhancing cross-modal interaction. By applying convolution to the similarity matrix between nouns and image crops, the model captures learnable patterns of cross-modal content correlation that facilitate news classification, without relying on predefined scalar similarities or requiring supplementary information or auxiliary tasks. Experiments on two real-world datasets show that our method outperforms previous approaches, achieving gains of 3.1% and 1.9% in overall accuracy on Weibo and Twitter, respectively. The source code is available at https://github.com/cgao-comp/C3N.
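The core idea of the abstract, learning correlation patterns by convolving a noun-crop similarity matrix rather than reducing it to one scalar, can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the authors' released code (see the repository linked above): the class name `SimilarityConvClassifier`, the layer sizes, and the embedding dimension (512, as in CLIP ViT-B/32) are all hypothetical choices.

```python
# Hypothetical sketch of convolving a noun/image-crop similarity matrix,
# assuming precomputed CLIP embeddings. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityConvClassifier(nn.Module):
    """Classifies news from a noun-crop cosine-similarity matrix."""
    def __init__(self, num_classes=2):
        super().__init__()
        # Convolutions over the similarity matrix learn 2-D correlation
        # patterns instead of collapsing it into a single scalar score.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # tolerate varying matrix sizes
        )
        self.fc = nn.Linear(16 * 4 * 4, num_classes)

    def forward(self, noun_emb, crop_emb):
        # noun_emb: (B, N, d) CLIP text features for extracted nouns
        # crop_emb: (B, M, d) CLIP image features for salient-object crops
        noun_emb = F.normalize(noun_emb, dim=-1)
        crop_emb = F.normalize(crop_emb, dim=-1)
        sim = torch.bmm(noun_emb, crop_emb.transpose(1, 2))  # (B, N, M)
        feat = self.conv(sim.unsqueeze(1))  # add a channel dimension
        return self.fc(feat.flatten(1))     # class logits

# Usage with random stand-ins for CLIP features (d = 512 for ViT-B/32):
model = SimilarityConvClassifier()
nouns = torch.randn(2, 16, 512)
crops = torch.randn(2, 16, 512)
logits = model(nouns, crops)  # shape (2, 2)
```

The adaptive pooling stage is one plausible way to handle news items with different numbers of nouns and crops; in practice the counts could also be fixed by padding or truncation.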
| Field | Value |
|---|---|
| Original language | English |
| Article number | 104120 |
| Journal | Information Processing and Management |
| Volume | 62 |
| Issue number | 5 |
| DOIs | |
| State | Published - Sep 2025 |
Keywords
- Fake news detection
- Multimodal learning
- Neural network
- Social network