Depth-first frequent itemset mining in relational databases

Xuequn Shang; Kai Uwe Sattler

doi:10.1145/1066677.1066928

Research output: Conference contribution › Paper › Peer-reviewed

7 citations (Scopus)

Abstract

Data mining on large relational databases has gained popularity, and its significance is well recognized. However, SQL-based data mining is known to perform worse than specialized implementations, both because of the prohibitive cost of extracting knowledge and because of the lack of suitable declarative query language support. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Propad (Projection PAttern Discovery). Propad differs fundamentally from Apriori-like candidate generation-and-test approaches: it successively projects the transaction table into frequent itemsets, which avoids both making multiple passes over the large original transaction table and generating a huge set of candidates. We evaluated performance on a DBMS (IBM DB2 UDB EEE V8) and compared the results with the K-Way join approach proposed in [11] and the SQL-based FP-tree approach proposed in [13]. The experimental results show that our algorithm achieves efficient performance.
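The projection idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' SQL implementation of Propad, whose details are not given on this page; it only shows the general depth-first, projection-based pattern: count items in the current projected table, extend the prefix with each frequent item, and recurse on the smaller projection instead of rescanning the original table or generating candidate sets. The function name `mine_frequent_itemsets`, the absolute `min_support` threshold, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def mine_frequent_itemsets(transactions, min_support):
    """Depth-first, projection-based frequent itemset mining (illustrative sketch).

    transactions: iterable of item collections.
    min_support: absolute support threshold (hypothetical parameter).
    Returns a dict mapping frozenset(itemset) -> support count.
    """
    results = {}

    def grow(prefix, projected):
        # Each transaction holds distinct items, so this counts how many
        # projected transactions contain each item.
        counts = Counter(item for t in projected for item in t)
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            itemset = prefix + (item,)
            results[frozenset(itemset)] = support
            # Project on `item`: keep only transactions containing it,
            # restricted to items that sort after it, so every itemset
            # is enumerated exactly once.
            sub = [tuple(i for i in t if i > item) for t in projected if item in t]
            sub = [t for t in sub if t]
            if sub:
                grow(itemset, sub)

    # Normalise: distinct, sorted items per transaction.
    base = [tuple(sorted(set(t))) for t in transactions]
    grow((), base)
    return results


# Small made-up transaction table for demonstration only.
sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent = mine_frequent_itemsets(sample, min_support=3)
```

In a relational setting, each projected list would correspond to a smaller (transaction id, item) table derived by a query from its parent, which is what makes the approach attractive compared with repeatedly joining against the full transaction table.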

Original language	English
Pages	1112-1117
Number of pages	6
DOI	https://doi.org/10.1145/1066677.1066928
Publication status	Published - 2005
Externally published	Yes
Event	20th Annual ACM Symposium on Applied Computing - Santa Fe, NM, United States; Duration: 13 Mar 2005 → 17 Mar 2005

Conference

Conference	20th Annual ACM Symposium on Applied Computing
Country/Territory	United States
City	Santa Fe, NM
Period	13/03/05 → 17/03/05

Cite this

@conference{8c6746aa818e4b418d9594375cfa96cd,

title = "Depth-first frequent itemset mining in relational databases",

keywords = "Data mining, Database mining, Frequent pattern mining, Mining algorithms in SQL",

author = "Xuequn Shang and Sattler, {Kai Uwe}",

year = "2005",

doi = "10.1145/1066677.1066928",

language = "English",

pages = "1112--1117",

note = "20th Annual ACM Symposium on Applied Computing ; Conference date: 13-03-2005 Through 17-03-2005",

}

TY - CONF

T1 - Depth-first frequent itemset mining in relational databases

AU - Shang, Xuequn

AU - Sattler, Kai Uwe

PY - 2005

Y1 - 2005

KW - Data mining

KW - Database mining

KW - Frequent pattern mining

KW - Mining algorithms in SQL

UR - http://www.scopus.com/inward/record.url?scp=33644504485&partnerID=8YFLogxK

U2 - 10.1145/1066677.1066928

DO - 10.1145/1066677.1066928

M3 - Paper

AN - SCOPUS:33644504485

SP - 1112

EP - 1117

T2 - 20th Annual ACM Symposium on Applied Computing

Y2 - 13 March 2005 through 17 March 2005

ER -
