Depth-first frequent itemset mining in relational databases

Xuequn Shang; Kai Uwe Sattler

doi:10.1145/1066677.1066928

Research output: Contribution to conference › Paper › peer-review

7 Scopus citations

Abstract

Data mining on large relational databases has gained popularity, and its significance is well recognized. However, SQL-based data mining is known to perform worse than specialized implementations, both because of the prohibitive cost of extracting knowledge and because of the lack of suitable declarative query language support. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Propad (Projection PAttern Discovery). Propad differs fundamentally from an Apriori-like candidate generation-and-test approach: it successively projects the transaction table into frequent itemsets, which avoids both multiple passes over the large original transaction table and the generation of a huge set of candidates. We evaluated performance on a DBMS (IBM DB2 UDB EEE V8) and compared the results with the K-Way join approach proposed in [11] and the SQL-based FP-tree approach proposed in [13]. The experimental results show that our algorithm performs efficiently.
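The projection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' Propad implementation and it does not use SQL: it is a generic depth-first, projection-based miner in plain Python, written under the assumption that the transaction table is a set of (tid, item) rows. The function name `mine`, the parameter `min_support`, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def mine(rows, min_support, prefix=()):
    """Depth-first frequent itemset mining by successive projection.

    rows        -- set of (tid, item) pairs, i.e. a relational-style
                   transaction table (assumed layout, not from the paper)
    min_support -- minimum number of transactions an itemset must appear in
    prefix      -- itemset that the current projected table is conditioned on
    """
    # Each (tid, item) pair is unique, so counting items counts transactions.
    counts = Counter(item for _, item in rows)
    found = {}
    for item in sorted(counts):
        if counts[item] < min_support:
            continue
        itemset = prefix + (item,)
        found[itemset] = counts[item]
        # Project the table: keep only transactions that contain `item`,
        # and only the items that sort after it, so that every itemset
        # is enumerated exactly once.
        tids = {tid for tid, i in rows if i == item}
        projected = {(tid, i) for tid, i in rows if tid in tids and i > item}
        # Recurse into the smaller projected table (depth-first).
        found.update(mine(projected, min_support, itemset))
    return found


# Hypothetical sample data: five transactions over the items a, b, c.
transactions = {
    1: ["a", "b", "c"],
    2: ["a", "b"],
    3: ["a", "c"],
    4: ["b", "c"],
    5: ["a", "b", "c"],
}
rows = {(tid, item) for tid, items in transactions.items() for item in items}
frequent = mine(rows, min_support=3)
```

Each recursive call works on a projected table that is no larger than its parent, so the original table is scanned only at the top level, and no candidate itemsets are generated and tested separately. In a relational setting the projected tables would presumably be materialized as temporary tables via SQL; that part is left out here.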

Original language	English
Pages	1112-1117
Number of pages	6
DOIs	https://doi.org/10.1145/1066677.1066928
State	Published - 2005
Externally published	Yes
Event	20th Annual ACM Symposium on Applied Computing - Santa Fe, NM, United States Duration: 13 Mar 2005 → 17 Mar 2005

Conference

Conference	20th Annual ACM Symposium on Applied Computing
Country/Territory	United States
City	Santa Fe, NM
Period	13/03/05 → 17/03/05

Keywords

Data mining
Database mining
Frequent pattern mining
Mining algorithms in SQL

Access to Document

10.1145/1066677.1066928

Cite this

@conference{8c6746aa818e4b418d9594375cfa96cd,

title = "Depth-first frequent itemset mining in relational databases",

abstract = "Data mining on large relational databases has gained popularity, and its significance is well recognized. However, SQL-based data mining is known to perform worse than specialized implementations, both because of the prohibitive cost of extracting knowledge and because of the lack of suitable declarative query language support. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Propad (Projection PAttern Discovery). Propad differs fundamentally from an Apriori-like candidate generation-and-test approach: it successively projects the transaction table into frequent itemsets, which avoids both multiple passes over the large original transaction table and the generation of a huge set of candidates. We evaluated performance on a DBMS (IBM DB2 UDB EEE V8) and compared the results with the K-Way join approach proposed in [11] and the SQL-based FP-tree approach proposed in [13]. The experimental results show that our algorithm performs efficiently.",

keywords = "Data mining, Database mining, Frequent pattern mining, Mining algorithms in SQL",

author = "Xuequn Shang and Sattler, {Kai Uwe}",

year = "2005",

doi = "10.1145/1066677.1066928",

language = "English",

pages = "1112--1117",

note = "20th Annual ACM Symposium on Applied Computing ; Conference date: 13-03-2005 Through 17-03-2005",

}

TY - CONF

T1 - Depth-first frequent itemset mining in relational databases

AU - Shang, Xuequn

AU - Sattler, Kai Uwe

PY - 2005

Y1 - 2005

AB - Data mining on large relational databases has gained popularity, and its significance is well recognized. However, SQL-based data mining is known to perform worse than specialized implementations, both because of the prohibitive cost of extracting knowledge and because of the lack of suitable declarative query language support. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Propad (Projection PAttern Discovery). Propad differs fundamentally from an Apriori-like candidate generation-and-test approach: it successively projects the transaction table into frequent itemsets, which avoids both multiple passes over the large original transaction table and the generation of a huge set of candidates. We evaluated performance on a DBMS (IBM DB2 UDB EEE V8) and compared the results with the K-Way join approach proposed in [11] and the SQL-based FP-tree approach proposed in [13]. The experimental results show that our algorithm performs efficiently.

KW - Data mining

KW - Database mining

KW - Frequent pattern mining

KW - Mining algorithms in SQL

UR - http://www.scopus.com/inward/record.url?scp=33644504485&partnerID=8YFLogxK

U2 - 10.1145/1066677.1066928

DO - 10.1145/1066677.1066928

M3 - Paper

AN - SCOPUS:33644504485

SP - 1112

EP - 1117

T2 - 20th Annual ACM Symposium on Applied Computing

Y2 - 13 March 2005 through 17 March 2005

ER -
