PurTreeClust: A clustering algorithm for customer segmentation from massive customer transaction data

Xiaojun Chen, Yixiang Fang, Min Yang, Feiping Nie, Zhou Zhao, Joshua Zhexue Huang

Research output: Contribution to journal › Article › peer-review

52 Scopus citations

Abstract

Clustering customer transaction data is an important step in analyzing customer behavior in retail and e-commerce companies. Companies often organize their products as a product tree, in which the leaf nodes are the goods for sale and the internal nodes (except the root node) can represent multiple product categories. Based on this tree, we propose a 'personalized product tree', named the purchase tree, to represent a customer's transaction records. The customers' transaction data set can thus be compressed into a set of purchase trees. We propose a partitional clustering algorithm, named PurTreeClust, for fast clustering of purchase trees. A new distance metric is proposed to effectively compute the distance between two purchase trees. To cluster the purchase tree data, we first rank the purchase trees as candidate representative trees using a novel separate density, and then select the top k customers as the representatives of k customer groups. Finally, the clustering results are obtained by assigning each customer to the nearest representative. We also propose a gap-statistic-based method to estimate the number of clusters. A series of experiments was conducted on ten real-life transaction data sets, and the experimental results show the superior performance of the proposed method.
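The three-step pipeline in the abstract (rank candidates, pick the top k as representatives, assign every customer to the nearest representative) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PurTreeClust implementation: the abstract does not define the purchase tree distance or the separate density, so the sketch substitutes a plain Jaccard distance on flat sets of purchased products (ignoring the tree structure) and a density-times-separation score. The names `jaccard_distance`, `cluster_by_representatives`, and the `radius` parameter are hypothetical.

```python
def jaccard_distance(a, b):
    """Stand-in distance between two product sets (NOT the paper's tree metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_representatives(purchases, k, radius=0.5):
    """Pick k representative customers and assign everyone to the nearest one.

    purchases: dict mapping customer id -> set of purchased product ids.
    The ranking score below is an assumed placeholder for the paper's
    'separate density', which the abstract does not specify.
    """
    ids = sorted(purchases)
    dist = {(i, j): jaccard_distance(purchases[i], purchases[j])
            for i in ids for j in ids}

    # Local density: number of other customers within `radius`.
    density = {i: sum(1 for j in ids if j != i and dist[i, j] <= radius)
               for i in ids}

    # Separation: distance to the nearest customer with strictly higher
    # density, or the largest distance from i if no such customer exists.
    def separation(i):
        denser = [dist[i, j] for j in ids if density[j] > density[i]]
        return min(denser) if denser else max(dist[i, j] for j in ids)

    score = {i: density[i] * separation(i) for i in ids}

    # Rank candidates by score (ties broken by id) and keep the top k.
    representatives = sorted(ids, key=lambda i: (-score[i], i))[:k]

    # Assign each customer to its nearest representative.
    assignment = {i: min(representatives, key=lambda r: (dist[i, r], r))
                  for i in ids}
    return representatives, assignment


# Toy data: two obvious groups (groceries and electronics).
example = {
    "c1": {"milk", "bread"},
    "c2": {"milk", "bread", "eggs"},
    "c3": {"bread", "eggs"},
    "c4": {"laptop", "mouse"},
    "c5": {"laptop", "mouse", "keyboard"},
    "c6": {"mouse", "keyboard"},
}
reps, groups = cluster_by_representatives(example, k=2)
print(reps, groups)
```

Because the representatives are chosen once from a ranking and never updated, the assignment step is a single pass over the customers, which matches the non-iterative flow the abstract describes.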

Original language: English
Pages (from-to): 559-572
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 30
Issue number: 3
State: Published - 2018

Keywords

  • Clustering transaction data
  • Clustering trees
  • Customer segmentation
  • Purchase tree
