Frequent itemset mining with parallel RDBMS

Xuequn Shang; Kai Uwe Sattler

doi:10.1007/11430919_63

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Data mining on large relational databases has gained popularity, and its significance is well recognized. However, the performance of SQL-based data mining is known to fall behind that of specialized implementations. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Ppropad (Parallel PROjection PAttern Discovery). Ppropad successively projects the transaction table into frequent itemsets to avoid making multiple passes over the large original transaction table and generating huge sets of candidates. We built a parallel database system with DB2 and evaluated its performance. We show that data mining with SQL can achieve sufficient performance through database tuning.
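The projection idea described in the abstract can be illustrated with a small, single-node sketch. This is not the authors' Ppropad implementation: it assumes a transaction table with hypothetical columns `tid` and `item`, uses Python's built-in `sqlite3` as a stand-in for DB2, and omits parallelism and database tuning entirely. The helper names `frequent_items`, `project`, and `mine` are illustrative only. Each recursion step counts item supports with a `GROUP BY` query and then mines a smaller projected table rather than rescanning the original one.

```python
import sqlite3


def frequent_items(conn, table, min_support):
    """Return {item: support} for items found in at least min_support transactions."""
    # Table names are interpolated because SQL parameters cannot name tables;
    # they are internal identifiers here, never user input.
    rows = conn.execute(
        f"SELECT item, COUNT(DISTINCT tid) FROM {table} "
        "GROUP BY item HAVING COUNT(DISTINCT tid) >= ? ORDER BY item",
        (min_support,),
    ).fetchall()
    return dict(rows)


def project(conn, source, target, item):
    """Fill `target` with the rows of `source` that belong to transactions
    containing `item`, keeping only items that sort after `item`."""
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(f"CREATE TABLE {target} (tid INTEGER, item TEXT)")
    conn.execute(
        f"INSERT INTO {target} "
        f"SELECT t.tid, t.item FROM {source} AS t "
        f"JOIN {source} AS p ON p.tid = t.tid "
        "WHERE p.item = ? AND t.item > ?",
        (item, item),
    )


def mine(conn, table, min_support, prefix=(), depth=0):
    """Recursively collect {itemset_tuple: support} from projected tables."""
    result = {}
    for item, support in frequent_items(conn, table, min_support).items():
        itemset = prefix + (item,)
        result[itemset] = support
        # One scratch table per recursion depth (hypothetical naming scheme).
        projected = f"proj_{depth}"
        project(conn, table, projected, item)
        result.update(mine(conn, projected, min_support, itemset, depth + 1))
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tx (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO tx VALUES (?, ?)",
        [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
         (3, "a"), (3, "c"), (4, "b"), (4, "c"),
         (5, "a"), (5, "b"), (5, "c")],
    )
    for itemset, support in sorted(mine(conn, "tx", min_support=3).items()):
        print(itemset, support)
```

Restricting each projection to items that sort after the current item ensures every itemset is generated once, in sorted order, without a separate candidate-generation step. The original `tx` table is read only at the top level; deeper levels query the progressively smaller projected tables.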

Original language	English
Title of host publication	Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings
Publisher	Springer Verlag
Pages	539-544
Number of pages	6
ISBN (Print)	3540260765, 9783540260769
DOIs	https://doi.org/10.1007/11430919_63
State	Published - 2005
Externally published	Yes
Event	9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005 - Hanoi, Viet Nam; Duration: 18 May 2005 → 20 May 2005

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	3518 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005
Country/Territory	Viet Nam
City	Hanoi
Period	18/05/05 → 20/05/05

Access to Document

10.1007/11430919_63

Cite this

Shang, X., & Sattler, K. U. (2005). Frequent itemset mining with parallel RDBMS. In Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings (pp. 539-544). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3518 LNAI). Springer Verlag. https://doi.org/10.1007/11430919_63

@inproceedings{ff8a9b355e154832baf5cf697893891d,

title = "Frequent itemset mining with parallel RDBMS",

abstract = "Data mining on large relational databases has gained popularity, and its significance is well recognized. However, the performance of SQL-based data mining is known to fall behind that of specialized implementations. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Ppropad (Parallel PROjection PAttern Discovery). Ppropad successively projects the transaction table into frequent itemsets to avoid making multiple passes over the large original transaction table and generating huge sets of candidates. We built a parallel database system with DB2 and evaluated its performance. We show that data mining with SQL can achieve sufficient performance through database tuning.",

author = "Shang, Xuequn and Sattler, {Kai Uwe}",

year = "2005",

doi = "10.1007/11430919_63",

language = "English",

isbn = "3540260765",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "539--544",

booktitle = "Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings",

note = "9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005 ; Conference date: 18-05-2005 Through 20-05-2005",

}

Shang, X & Sattler, KU 2005, Frequent itemset mining with parallel RDBMS. in Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3518 LNAI, Springer Verlag, pp. 539-544, 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005, Hanoi, Viet Nam, 18/05/05. https://doi.org/10.1007/11430919_63

Frequent itemset mining with parallel RDBMS. / Shang, Xuequn; Sattler, Kai Uwe.
Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings. Springer Verlag, 2005. p. 539-544 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3518 LNAI).

TY - GEN

T1 - Frequent itemset mining with parallel RDBMS

AU - Shang, Xuequn

AU - Sattler, Kai Uwe

PY - 2005

Y1 - 2005

N2 - Data mining on large relational databases has gained popularity, and its significance is well recognized. However, the performance of SQL-based data mining is known to fall behind that of specialized implementations. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Ppropad (Parallel PROjection PAttern Discovery). Ppropad successively projects the transaction table into frequent itemsets to avoid making multiple passes over the large original transaction table and generating huge sets of candidates. We built a parallel database system with DB2 and evaluated its performance. We show that data mining with SQL can achieve sufficient performance through database tuning.

AB - Data mining on large relational databases has gained popularity, and its significance is well recognized. However, the performance of SQL-based data mining is known to fall behind that of specialized implementations. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Ppropad (Parallel PROjection PAttern Discovery). Ppropad successively projects the transaction table into frequent itemsets to avoid making multiple passes over the large original transaction table and generating huge sets of candidates. We built a parallel database system with DB2 and evaluated its performance. We show that data mining with SQL can achieve sufficient performance through database tuning.

UR - http://www.scopus.com/inward/record.url?scp=26944468268&partnerID=8YFLogxK

U2 - 10.1007/11430919_63

DO - 10.1007/11430919_63

M3 - Conference contribution

AN - SCOPUS:26944468268

SN - 3540260765

SN - 9783540260769

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 539

EP - 544

BT - Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings

PB - Springer Verlag

T2 - 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005

Y2 - 18 May 2005 through 20 May 2005

ER -
