Frequent itemset mining with parallel RDBMS

Xuequn Shang, Kai Uwe Sattler

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Data mining on large relational databases has gained popularity, and its significance is well recognized. However, the performance of SQL-based data mining is known to fall behind that of specialized implementations. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Ppropad (Parallel PROjection PAttern Discovery). Ppropad successively projects the transaction table into frequent itemsets, which avoids making multiple passes over the large original transaction table and generating huge sets of candidates. We have built a parallel database system with DB2 and evaluated its performance. We show that data mining with SQL can achieve sufficient performance through database tuning.
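
The projection idea described in the abstract can be illustrated with a minimal, single-process sketch. This is not the authors' SQL or parallel DB2 implementation: the function name `mine_projected`, the `(tid, item)` row layout, and the sample data are assumptions made for illustration only. Each recursion step counts item supports in the current table (comparable to a GROUP BY ... HAVING query) and then builds a smaller projected table for each frequent item, so the original table is scanned only at the top level and no candidate sets are generated.

```python
from collections import Counter


def mine_projected(rows, min_support, prefix=()):
    """Return {itemset (tuple): support count} for all frequent itemsets.

    rows is a transaction table given as (tid, item) pairs, mirroring a
    two-column relational table. Hypothetical helper, not the paper's code.
    """
    rows = set(rows)  # drop duplicate (tid, item) rows
    # Support of each item = number of distinct transactions containing it.
    counts = Counter(item for _, item in rows)
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    frequent_set = set(frequent)

    results = {}
    for item in frequent:
        itemset = prefix + (item,)
        results[itemset] = counts[item]
        # Projected table: transactions containing `item`, restricted to
        # frequent items ordered after `item` so each itemset is found once.
        tids = {tid for tid, it in rows if it == item}
        projected = [
            (tid, it)
            for tid, it in rows
            if tid in tids and it in frequent_set and it > item
        ]
        if projected:
            results.update(mine_projected(projected, min_support, itemset))
    return results


# Hypothetical sample data: five transactions over items a, b, c.
SAMPLE_ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "c"),
    (4, "b"), (4, "c"),
    (5, "a"), (5, "b"), (5, "c"),
]

if __name__ == "__main__":
    for itemset, support in sorted(mine_projected(SAMPLE_ROWS, 3).items()):
        print(itemset, support)
```

With a minimum support of 3, the sample yields the three single items and the three pairs, while `('a', 'b', 'c')` is dropped because it occurs in only two transactions. In a relational setting each projected table would be materialized or expressed as a query rather than built as a Python list.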

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings
Publisher: Springer Verlag
Pages: 539-544
Number of pages: 6
ISBN (Print): 3540260765, 9783540260769
State: Published - 2005
Externally published: Yes
Event: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005 - Hanoi, Viet Nam
Duration: 18 May 2005 to 20 May 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3518 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005
Country/Territory: Viet Nam
City: Hanoi
Period: 18/05/05 to 20/05/05
