Abstract
Large graph datasets are becoming increasingly popular nowadays. For example, graphs like Web Graphs, Biological Networks, and Social Networks, are often at the scale of hundreds of billions or even a trillion edges, and they are continuously growing. How to mine and calculate these large-scale graph data efficiently is the primary task of studying complex network. Parallel computing technology is one of the most mature, widely used and feasible computing acceleration technologies. Graph partitioning is an effective way to improve the performance of parallel computing. The increasing popularity and ubiquity of various large graph datasets have caused renewed interest for graph partitioning. Existing graph partitioners either scale poorly against large graphs or disregard the impact of the underlying hardware topology. A few solutions have shown that the nonuniform network communication costs may affect the performance greatly. Since the cost of partitioning the entire graph is strictly prohibitive, there are some recent tentative works towards streaming graph partitioning which run faster, are easily parallelized, and can be incrementally updated. Most of the existing works on streaming partitioning assume that worker nodes within a cluster are homogeneous in nature. Unfortunately, this assumption does not always hold. Experiments show that these homogeneous algorithms suffer a significant performance degradation when running at heterogeneous environment. The research of graph partitioning is driven by the demand of practical application. Aiming at the distributed cluster in heterogeneous computing environment, we propose a streaming graph partitioning algorithm based on heterogeneous aware. The method not only considers the difference of network bandwidth and node compute ability in the cluster, but also considers the competition for shared resources between cores in high-speed network environment represented by InfiniBand. Taking BFS, SSSP and PageRank as examples, compared with the streaming algorithm without considering the heterogeneous environment, the efficiency of graph computing is improved by 38%, 45.7% and 61.8%, respectively. At the same time, in the process of streaming graph partitioning, aiming at the low efficiency of searching neighbor vertices in the cache, we design a cache searching algorithm with adjacent edge structure, which improves the efficiency of graph partitioning by 13.4% on average under the same conditions. Extensive experiments are conducted on a moderate sized computing cluster with real-world web and social network graphs. The results demonstrate that the proposed approach achieves significant improvement compared with the state-of-the-art solutions.
Translated title of the contribution | Research on Graph Partitioning in Heterogeneous Computing Environment |
---|---|
Original language | Chinese (Traditional) |
Pages (from-to) | 1751-1766 |
Number of pages | 16 |
Journal | Jisuanji Xuebao/Chinese Journal of Computers |
Volume | 44 |
Issue number | 8 |
DOIs | |
State | Published - Aug 2021 |
Externally published | Yes |