SQL based frequent pattern mining without candidate generation

Xuequn Shang, Kai Uwe Sattler, Ingolf Geist

Research output: Contribution to conference, Paper, peer-review

10 Scopus citations

Abstract

Scalable data mining in large databases is one of today's real challenges for the database research area. The integration of data mining with database systems is an essential component of any successful large-scale data mining application. A fundamental component of data mining tasks is finding frequent patterns in a given dataset. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there are prolific patterns and/or long patterns. In this study we present an evaluation of SQL-based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we have carried out a performance evaluation on a commercial DBMS (IBM DB2 UDB EEE V8).
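As a rough illustration of what frequent pattern mining expressed in SQL looks like at its simplest, the sketch below computes item supports with a single GROUP BY/HAVING query, which corresponds to the first scan of FP-growth (finding frequent items and ordering them by descending support). It is not the authors' implementation: it runs on SQLite through Python's standard sqlite3 module rather than on DB2, and the table name `transactions`, its columns `tid` and `item`, the sample rows, and the helper names are all assumptions made for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of (tid, item) rows; the data is made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
    rows = [
        (1, "a"), (1, "b"), (1, "c"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "d"),
        (4, "b"), (4, "c"),
    ]
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    return conn


def frequent_items(conn, min_support):
    """Return (item, support) pairs for items that occur in at least
    min_support distinct transactions, most frequent first."""
    return conn.execute(
        """
        SELECT item, COUNT(DISTINCT tid) AS support
        FROM transactions
        GROUP BY item
        HAVING COUNT(DISTINCT tid) >= ?
        ORDER BY support DESC, item ASC
        """,
        (min_support,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(frequent_items(conn, 2))
```

No candidate itemsets are enumerated here; the query only aggregates over the transaction table. A full FP-growth pipeline would go on to build and recursively mine conditional pattern structures, which this sketch does not attempt.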

Original language: English
Pages: 618-619
Number of pages: 2
State: Published - 2004
Externally published: Yes
Event: Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing - Nicosia, Cyprus
Duration: 14 Mar 2004 - 17 Mar 2004

Conference

Conference: Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing
Country/Territory: Cyprus
City: Nicosia
Period: 14/03/04 - 17/03/04

Keywords

  • Data mining
  • Database mining
  • Frequent pattern mining
  • Mining algorithms in SQL
