TY - GEN
T1 - Parallel Image Scaling Density-based Clustering
AU - Bi, Wenhao
AU - Zhang, An
AU - Gao, Fei
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/10/11
Y1 - 2020/10/11
AB - Clustering is one of the most important methods for discovering the intrinsic grouping in a set of unlabeled data. As data acquisition becomes easier and more varied, the amount of data to be processed is increasing exponentially, and the data are more likely to be distributed across different clients. Traditional clustering methods cannot process a large dataset all at once because of memory limits. In this paper, an Image Scaling Density-based Clustering (ISDC) algorithm is proposed. ISDC can run on a single client or in parallel on several clients to handle data located at different clients. The ISDC algorithm does not require any manually specified parameters; the parameters are determined by the algorithm from the statistical features of the dataset. In Parallel ISDC (PISDC), each data block, located at a different client, is clustered independently to form intermediate clusters. A border detection algorithm then forms representative clusters from the points at the edges of the intermediate clusters. Next, in global clustering, the server merges the representative clusters from all clients. The border detection algorithm reduces the communication cost between the clients and the server and increases the efficiency of global clustering. Finally, the server feeds the clustering information back to the clients to complete the clustering. Experimental results verify the effectiveness and efficiency of PISDC and ISDC.
KW - clustering
KW - image scaling
KW - large-size datasets
KW - parallel clustering algorithm
UR - https://www.scopus.com/pages/publications/85098862306
U2 - 10.1109/SMC42975.2020.9282985
DO - 10.1109/SMC42975.2020.9282985
M3 - Conference contribution
AN - SCOPUS:85098862306
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 2084
EP - 2091
BT - 2020 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2020
Y2 - 11 October 2020 through 14 October 2020
ER -