TY - JOUR
T1 - PurTreeClust
T2 - A clustering algorithm for customer segmentation from massive customer transaction data
AU - Chen, Xiaojun
AU - Fang, Yixiang
AU - Yang, Min
AU - Nie, Feiping
AU - Zhao, Zhou
AU - Huang, Joshua Zhexue
N1 - Publisher Copyright: © 1989-2012 IEEE.
PY - 2018
Y1 - 2018
N2 - Clustering of customer transaction data is an important procedure to analyze customer behaviors in retail and e-commerce companies. Note that products from companies are often organized as a product tree, in which the leaf nodes are goods to sell, and the internal nodes (except root node) could be multiple product categories. Based on this tree, we propose the 'personalized product tree', named purchase tree, to represent a customer's transaction records. So the customers' transaction data set can be compressed into a set of purchase trees. We propose a partitional clustering algorithm, named PurTreeClust, for fast clustering of purchase trees. A new distance metric is proposed to effectively compute the distance between two purchase trees. To cluster the purchase tree data, we first rank the purchase trees as candidate representative trees with a novel separate density, and then select the top k customers as the representatives of k customer groups. Finally, the clustering results are obtained by assigning each customer to the nearest representative. We also propose a gap statistic based method to evaluate the number of clusters. A series of experiments were conducted on ten real-life transaction data sets, and experimental results show the superior performance of the proposed method.
AB - Clustering of customer transaction data is an important procedure to analyze customer behaviors in retail and e-commerce companies. Note that products from companies are often organized as a product tree, in which the leaf nodes are goods to sell, and the internal nodes (except root node) could be multiple product categories. Based on this tree, we propose the 'personalized product tree', named purchase tree, to represent a customer's transaction records. So the customers' transaction data set can be compressed into a set of purchase trees. We propose a partitional clustering algorithm, named PurTreeClust, for fast clustering of purchase trees. A new distance metric is proposed to effectively compute the distance between two purchase trees. To cluster the purchase tree data, we first rank the purchase trees as candidate representative trees with a novel separate density, and then select the top k customers as the representatives of k customer groups. Finally, the clustering results are obtained by assigning each customer to the nearest representative. We also propose a gap statistic based method to evaluate the number of clusters. A series of experiments were conducted on ten real-life transaction data sets, and experimental results show the superior performance of the proposed method.
KW - Clustering transaction data
KW - Clustering trees
KW - Customer segmentation
KW - Purchase tree
UR - http://www.scopus.com/inward/record.url?scp=85032743244&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2017.2763620
DO - 10.1109/TKDE.2017.2763620
M3 - Article
AN - SCOPUS:85032743244
SN - 1041-4347
VL - 30
SP - 559
EP - 572
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 3
ER -