TY - JOUR
T1 - Accelerated Frequent Closed Sequential Pattern Mining for uncertain data
AU - You, Tao
AU - Sun, Yue
AU - Zhang, Ying
AU - Chen, Jinchao
AU - Zhang, Peng
AU - Yang, Mei
N1 - Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2022/10/15
Y1 - 2022/10/15
N2 - Data uncertainty has been taken into consideration when mining hidden knowledge in a variety of applications. Because closed sequences are closely tied to possible-world semantics, recent studies on mining closed sequential patterns from uncertain data have usually been challenged by the explosion of possible worlds, whose number is exponential in the number of uncertain sequences in the database. The basic Probabilistic Frequent Closed Sequences Mining strategy (PFCSM-FF) offers a preliminary solution to this problem. However, the inclusion–exclusion rules and closure-checking methods it uses make the mining algorithm very inefficient. Building on PFCSM-FF, two improved algorithms, PFCSM-CF and PFCSM-CC, are designed to reduce the search space and simplify the candidate sequence database, which significantly reduces the computational scale. Extensive experiments on real and synthetic datasets demonstrate the efficiency improvements of the proposed PFCSM-CC and PFCSM-CF methods. Moreover, the high usability of the proposed PFCSM-CC algorithm is further demonstrated by its running time, which is similar to that of an existing probabilistic frequent sequence mining algorithm.
KW - Frequent closed sequences
KW - Possible world semantics
KW - Uncertain database
UR - http://www.scopus.com/inward/record.url?scp=85133934264&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2022.117254
DO - 10.1016/j.eswa.2022.117254
M3 - Literature review
AN - SCOPUS:85133934264
SN - 0957-4174
VL - 204
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 117254
ER -