Does GCL Need a Large Number of Negative Samples? Enhancing Graph Contrastive Learning with Effective and Efficient Negative Sampling

Yongqi Huang; Jitao Zhao; Dongxiao He; Di Jin; Yuxiao Huang; Zhen Wang

doi:10.1609/aaai.v39i16.33925

School of Cybersecurity

Research output: Contribution to journal › Conference article › peer-review

Abstract

Graph Contrastive Learning (GCL) aims to learn low-dimensional graph representations in a self-supervised manner, primarily through instance discrimination: positive and negative pairs are manually mined from graphs, and the similarity of positive pairs is increased while that of negative pairs is decreased. Drawing on the success of Contrastive Learning (CL) in other domains, a consensus has been reached that the effectiveness of GCL models depends on a large number of negative pairs. As a result, despite the significant computational overhead, GCL models typically use as many negative node pairs as possible to improve performance. However, because nodes within a graph are interconnected, we argue that they cannot be treated as independent instances. We therefore challenge this consensus: does employing more negative nodes lead to a more effective GCL model? To answer this, we explore the role of negative nodes in the commonly used InfoNCE loss for GCL and observe that: (1) counterintuitively, a large number of negative nodes can actually hinder the model’s ability to distinguish nodes with different semantics; (2) a smaller number of high-quality, non-topologically coupled negative nodes is sufficient to enhance the discriminability of representations. Based on these findings, we propose a new method called GCL with Effective and Efficient Negative samples (E2Neg), which learns discriminative representations using only a very small set of representative negative samples. E2Neg significantly reduces computational overhead and speeds up model training. We demonstrate the effectiveness and efficiency of E2Neg across multiple datasets in comparison with other GCL methods.
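For readers unfamiliar with the InfoNCE loss mentioned in the abstract, the following is a minimal sketch of its standard per-anchor form, evaluated over a small subset of negatives instead of all other nodes. It is not the E2Neg method: the function names (`cosine`, `info_nce`, `sample_negatives`), the temperature value, and the uniform random sampling are assumptions chosen for illustration only, and the abstract does not describe how E2Neg actually selects its representative negatives.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, tau=0.5):
    """Standard InfoNCE loss for a single anchor embedding.

    loss = -log( exp(s(a, p) / tau) / (exp(s(a, p) / tau) + sum_n exp(s(a, n) / tau)) )
    where s is cosine similarity and tau is a temperature (0.5 is an assumed default).
    """
    pos_term = math.exp(cosine(anchor, positive) / tau)
    neg_term = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos_term / (pos_term + neg_term))


def sample_negatives(candidates, k, seed=0):
    """Hypothetical stand-in for a negative selection step: pick at most k
    candidates uniformly at random. This is NOT the selection rule of E2Neg."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    positive = [0.9, 0.1]
    candidates = [[0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]

    few = sample_negatives(candidates, k=1)
    print("loss with 1 negative:   ", info_nce(anchor, positive, few))
    print("loss with all negatives:", info_nce(anchor, positive, candidates))
```

Because the negatives enter the loss only through a sum in the denominator, the cost of evaluating it for one anchor grows linearly with the number of negatives, which is the computational overhead the abstract refers to.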

Original language	English
Pages (from-to)	17511-17518
Number of pages	8
Journal	Proceedings of the AAAI Conference on Artificial Intelligence
Volume	39
Issue number	16
DOIs	https://doi.org/10.1609/aaai.v39i16.33925
State	Published - 11 Apr 2025
Event	39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration	25 Feb 2025 → 4 Mar 2025

Cite this

@article{4a63b8ca566e46fa9aa5368ab4249bbc,

title = "Does GCL Need a Large Number of Negative Samples? Enhancing Graph Contrastive Learning with Effective and Efficient Negative Sampling",

abstract = "Graph Contrastive Learning (GCL) aims to self-supervised learn low-dimensional graph representations, primarily through instance discrimination, which involves manually mining positive and negative pairs from graphs, increasing the similarity of positive pairs while decreasing negative pairs. Drawing from the success of Contrastive Learning (CL) in other domains, a consensus has been reached that the effectiveness of GCLs depends on a large number of negative pairs. As a result, despite the significant computational overhead, GCLs typically leverage as many negative node pairs as possible to improve model performance. However, given that nodes within a graph are interconnected, we argue that nodes cannot be treated as independent instances. Therefore, we challenge this consensus: Does employing more negative nodes lead to a more effective GCL model? To answer this, we explore the role of negative nodes in the commonly used InfoNCE loss for GCL and observe that: (1) Counterintuitively, a large number of negative nodes can actually hinder the model{\textquoteright}s ability to distinguish nodes with different semantics. (2) A smaller number of high-quality and non-topologically coupled negative nodes are sufficient to enhance the discriminability of representations. Based on these findings, we propose a new method called GCL with Effective and Efficient Negative samples, E2Neg, which learns discriminative representations using only a very small set of representative negative samples. E2Neg significantly reduces computational overhead and speeds up model training. We demonstrate the effectiveness and efficiency of E2Neg across multiple datasets compared to other GCL methods.",

author = "Yongqi Huang and Jitao Zhao and Dongxiao He and Di Jin and Yuxiao Huang and Zhen Wang",

note = "Publisher Copyright: Copyright {\textcopyright} 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 ; Conference date: 25-02-2025 Through 04-03-2025",

year = "2025",

month = apr,

day = "11",

doi = "10.1609/aaai.v39i16.33925",

language = "English",

volume = "39",

pages = "17511--17518",

journal = "Proceedings of the AAAI Conference on Artificial Intelligence",

issn = "2159-5399",

publisher = "Association for the Advancement of Artificial Intelligence",

number = "16",

}

Does GCL Need a Large Number of Negative Samples? Enhancing Graph Contrastive Learning with Effective and Efficient Negative Sampling. / Huang, Yongqi; Zhao, Jitao; He, Dongxiao et al.
In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, No. 16, 11.04.2025, p. 17511-17518.

TY - JOUR

T1 - Does GCL Need a Large Number of Negative Samples? Enhancing Graph Contrastive Learning with Effective and Efficient Negative Sampling

AU - Huang, Yongqi

AU - Zhao, Jitao

AU - He, Dongxiao

AU - Jin, Di

AU - Huang, Yuxiao

AU - Wang, Zhen

PY - 2025/4/11

Y1 - 2025/4/11

N2 - Graph Contrastive Learning (GCL) aims to self-supervised learn low-dimensional graph representations, primarily through instance discrimination, which involves manually mining positive and negative pairs from graphs, increasing the similarity of positive pairs while decreasing negative pairs. Drawing from the success of Contrastive Learning (CL) in other domains, a consensus has been reached that the effectiveness of GCLs depends on a large number of negative pairs. As a result, despite the significant computational overhead, GCLs typically leverage as many negative node pairs as possible to improve model performance. However, given that nodes within a graph are interconnected, we argue that nodes cannot be treated as independent instances. Therefore, we challenge this consensus: Does employing more negative nodes lead to a more effective GCL model? To answer this, we explore the role of negative nodes in the commonly used InfoNCE loss for GCL and observe that: (1) Counterintuitively, a large number of negative nodes can actually hinder the model’s ability to distinguish nodes with different semantics. (2) A smaller number of high-quality and non-topologically coupled negative nodes are sufficient to enhance the discriminability of representations. Based on these findings, we propose a new method called GCL with Effective and Efficient Negative samples, E2Neg, which learns discriminative representations using only a very small set of representative negative samples. E2Neg significantly reduces computational overhead and speeds up model training. We demonstrate the effectiveness and efficiency of E2Neg across multiple datasets compared to other GCL methods.

UR - http://www.scopus.com/inward/record.url?scp=105003903724&partnerID=8YFLogxK

U2 - 10.1609/aaai.v39i16.33925

DO - 10.1609/aaai.v39i16.33925

M3 - Conference article

AN - SCOPUS:105003903724

SN - 2159-5399

VL - 39

SP - 17511

EP - 17518

JO - Proceedings of the AAAI Conference on Artificial Intelligence

JF - Proceedings of the AAAI Conference on Artificial Intelligence

IS - 16

T2 - 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025

Y2 - 25 February 2025 through 4 March 2025

ER -
