SQL based frequent pattern mining without candidate generation

Xuequn Shang; Kai Uwe Sattler; Ingolf Geist

doi:10.1145/967900.968027

Otto von Guericke University Magdeburg

Research output: Conference contribution › Paper › Peer-reviewed

10 Citations (Scopus)

Abstract

Scalable data mining in large databases is one of the real challenges facing database research today. Integrating data mining with database systems is essential for any successful large-scale data mining application. A fundamental component of data mining tasks is finding frequent patterns in a given dataset. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation remains costly, especially when there are prolific patterns and/or long patterns. In this study we present an evaluation of SQL based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we evaluate performance on a commercial DBMS (IBM DB2 UDB EEE V8).
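The abstract does not give the SQL used in the paper, so the following is only a minimal sketch of the general idea of growing patterns without generating candidates. It assumes a hypothetical table `trans(tid, item)` and replaces the FP-tree with plain projected tables: for each frequent item, the transactions containing it are projected onto the items that sort before it, and the same query is applied recursively. The names `mine`, `trans`, and `proj_N` are illustrative and do not come from the paper, and SQLite stands in for the DB2 system the authors evaluated.

```python
import sqlite3


def mine(conn, table, suffix, min_support, out, depth=0):
    """Grow frequent patterns from `suffix` using only the rows in `table`."""
    # One GROUP BY pass finds the items that are frequent in this projection.
    frequent = conn.execute(
        f"SELECT item, COUNT(DISTINCT tid) FROM {table} "
        "GROUP BY item HAVING COUNT(DISTINCT tid) >= ?",
        (min_support,),
    ).fetchall()
    for item, support in frequent:
        pattern = suffix | {item}
        out[pattern] = support
        # Projected table: transactions that contain `item`, restricted to
        # items sorting before it, so every itemset is produced exactly once.
        proj = f"proj_{depth}"  # hypothetical temp-table naming scheme
        conn.execute(f"DROP TABLE IF EXISTS {proj}")
        conn.execute(f"CREATE TEMP TABLE {proj} (tid INTEGER, item TEXT)")
        conn.execute(
            f"INSERT INTO {proj} "
            f"SELECT t.tid, t.item FROM {table} t "
            f"WHERE t.item < ? AND t.tid IN "
            f"(SELECT tid FROM {table} WHERE item = ?)",
            (item, item),
        )
        mine(conn, proj, pattern, min_support, out, depth + 1)


# Toy data, invented for illustration: one row per (transaction, item) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trans (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO trans VALUES (?, ?)",
    [
        (1, "a"), (1, "b"), (1, "c"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "d"),
        (4, "b"), (4, "c"),
    ],
)

result = {}
mine(conn, "trans", frozenset(), 2, result)  # assumed absolute support of 2
for pattern, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), support)
```

Unlike an Apriori-style loop, no candidate itemsets are built and then tested here: each recursive call only counts the items actually present in its projected table. A real FP-growth implementation would additionally compress shared prefixes into an FP-tree, which this sketch omits.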

Original language	English
Pages	618-619
Number of pages	2
DOI	https://doi.org/10.1145/967900.968027
Publication status	Published - 2004
Externally published	Yes
Event	Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing - Nicosia, Cyprus. Duration: 14 Mar 2004 → 17 Mar 2004

Conference

Conference	Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing
Country/Territory	Cyprus
City	Nicosia
Period	14/03/04 → 17/03/04

Cite this

@conference{a8ecace6d009412e89f5b3950d701b48,

title = "SQL based frequent pattern mining without candidate generation",

abstract = "Scalable data mining in large databases is one of today's real challenges to database research area. The integration of data mining with database systems is an essential component for any successful large-scale data mining application. A fundamental component in data mining tasks is finding frequent patterns in a given dataset. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study we present an evaluation of SQL based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we have made performance evaluation on commercial DBMS (IBM DB2 UDB EEE V8).",

keywords = "Data mining, Database mining, Frequent pattern mining, Mining algorithms in SQL",

author = "Xuequn Shang and Sattler, {Kai Uwe} and Ingolf Geist",

year = "2004",

doi = "10.1145/967900.968027",

language = "English",

pages = "618--619",

note = "Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing ; Conference date: 14-03-2004 Through 17-03-2004",

}

TY - CONF

T1 - SQL based frequent pattern mining without candidate generation

AU - Shang, Xuequn

AU - Sattler, Kai Uwe

AU - Geist, Ingolf

PY - 2004

Y1 - 2004

N2 - Scalable data mining in large databases is one of today's real challenges to database research area. The integration of data mining with database systems is an essential component for any successful large-scale data mining application. A fundamental component in data mining tasks is finding frequent patterns in a given dataset. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study we present an evaluation of SQL based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we have made performance evaluation on commercial DBMS (IBM DB2 UDB EEE V8).

KW - Data mining

KW - Database mining

KW - Frequent pattern mining

KW - Mining algorithms in SQL

UR - http://www.scopus.com/inward/record.url?scp=2442600326&partnerID=8YFLogxK

U2 - 10.1145/967900.968027

DO - 10.1145/967900.968027

M3 - Paper

AN - SCOPUS:2442600326

SP - 618

EP - 619

T2 - Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing

Y2 - 14 March 2004 through 17 March 2004

ER -
