SQL based frequent pattern mining without candidate generation

Xuequn Shang; Kai Uwe Sattler; Ingolf Geist

doi:10.1145/967900.968027

Otto von Guericke University Magdeburg

Research output: Contribution to conference, Paper (peer-reviewed)

10 Scopus citations

Abstract

Scalable data mining in large databases is one of today's real challenges for database research. The integration of data mining with database systems is an essential component of any successful large-scale data mining application. A fundamental component of data mining tasks is finding frequent patterns in a given dataset. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation remains costly, especially when there are prolific patterns and/or long patterns. In this study we present an evaluation of SQL-based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine several techniques to improve performance. In addition, we evaluate performance on a commercial DBMS (IBM DB2 UDB EEE V8).
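To make the contrast with Apriori-style candidate generation concrete, here is a minimal in-memory Python sketch of the generic FP-growth idea: compress transactions into a prefix tree ordered by item support, then mine each item's conditional pattern base recursively, so no candidate itemsets are ever generated and tested. It is an illustration under stated assumptions, not the authors' SQL-based implementation. The names `FPNode`, `build_fp_tree`, and `fp_growth` are hypothetical, transactions are assumed to arrive as `(items, count)` pairs, `min_support` is assumed to be an absolute count, and items are assumed to be mutually comparable (e.g. strings) for tie-breaking.

```python
from collections import defaultdict


class FPNode:
    """A node in an FP-tree: an item, its count, its parent and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to the list of
    tree nodes carrying it; frequent maps each frequent item to its support.
    """
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep only frequent items, ordered by descending support (ties by item).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frequent itemset, support) pairs without candidate generation."""
    header, frequent = build_fp_tree(transactions, min_support)
    # Process items from least to most frequent.
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = (item,) + suffix
        yield frozenset(itemset), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        cond_base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        # Recursively mine the conditional database for longer patterns.
        yield from fp_growth(cond_base, min_support, itemset)
```

For example, `dict(fp_growth([(t, 1) for t in baskets], 3))` maps every itemset that occurs in at least three baskets to its support count. Each pattern is found exactly once, because an item's conditional pattern base only contains the more frequent items that precede it on tree paths.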

Original language	English
Pages	618-619
Number of pages	2
DOIs	https://doi.org/10.1145/967900.968027
State	Published - 2004
Externally published	Yes
Event	Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing, Nicosia, Cyprus (14 Mar 2004 → 17 Mar 2004)

Conference

Conference	Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing
Country/Territory	Cyprus
City	Nicosia
Period	14/03/04 → 17/03/04

Keywords

Data mining
Database mining
Frequent pattern mining
Mining algorithms in SQL


Cite this

@conference{a8ecace6d009412e89f5b3950d701b48,

title = "SQL based frequent pattern mining without candidate generation",

abstract = "Scalable data mining in large databases is one of today's real challenges to database research area. The integration of data mining with database systems is an essential component for any successful large-scale data mining application. A fundamental component in data mining tasks is finding frequent patterns in a given dataset. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study we present an evaluation of SQL based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we have made performance evaluation on commercial DBMS (IBM DB2 UDB EEE V8).",

keywords = "Data mining, Database mining, Frequent pattern mining, Mining algorithms in SQL",

author = "Xuequn Shang and Sattler, {Kai Uwe} and Ingolf Geist",

year = "2004",

doi = "10.1145/967900.968027",

language = "English",

pages = "618--619",

note = "Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing ; Conference date: 14-03-2004 Through 17-03-2004",

}

TY - CONF

T1 - SQL based frequent pattern mining without candidate generation

AU - Shang, Xuequn

AU - Sattler, Kai Uwe

AU - Geist, Ingolf

PY - 2004

Y1 - 2004

N2 - Scalable data mining in large databases is one of today's real challenges to database research area. The integration of data mining with database systems is an essential component for any successful large-scale data mining application. A fundamental component in data mining tasks is finding frequent patterns in a given dataset. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study we present an evaluation of SQL based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we have made performance evaluation on commercial DBMS (IBM DB2 UDB EEE V8).

AB - Scalable data mining in large databases is one of today's real challenges to database research area. The integration of data mining with database systems is an essential component for any successful large-scale data mining application. A fundamental component in data mining tasks is finding frequent patterns in a given dataset. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study we present an evaluation of SQL based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we have made performance evaluation on commercial DBMS (IBM DB2 UDB EEE V8).

KW - Data mining

KW - Database mining

KW - Frequent pattern mining

KW - Mining algorithms in SQL

UR - http://www.scopus.com/inward/record.url?scp=2442600326&partnerID=8YFLogxK

U2 - 10.1145/967900.968027

DO - 10.1145/967900.968027

M3 - Paper

AN - SCOPUS:2442600326

SP - 618

EP - 619

T2 - Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing

Y2 - 14 March 2004 through 17 March 2004

ER -