TY - GEN
T1 - Ontology-based automatic classification and ranking for web documents
AU - Fang, Jun
AU - Guo, Lei
AU - Wang, Xiao Dong
AU - Yang, Ning
PY - 2007
Y1 - 2007
AB - Web document classification involves calculating similarities between documents and categories using the information extracted from them. In recent years, ontology-based web document classification methods have been introduced to address two problems of traditional machine learning algorithms: the need for classifier training and the failure to consider semantic relations between words. However, previous work on ontology-based web document classification overlooks two important issues: automatic ontology construction and the ranking of classified documents. To solve these problems, this paper proposes an ontology-based web document classification and ranking method. First, weighted term sets are extracted from web documents, and an ontology is built using an effective ontology construction method that clarifies and augments an existing ontology; then, similarity scores between documents and the ontology are computed based on WordNet using the Earth Mover's Distance (EMD) method; finally, web documents are assigned to categories according to the similarity scores, and a simple ranking method is used to sort the documents within each category. Experimental results show that the classification algorithm achieves better precision and recall than the adaptive KNN method and is competitive with the SVM method; the ranking method also performs well.
UR - http://www.scopus.com/inward/record.url?scp=44049085245&partnerID=8YFLogxK
U2 - 10.1109/FSKD.2007.432
DO - 10.1109/FSKD.2007.432
M3 - Conference contribution
AN - SCOPUS:44049085245
SN - 0769528740
SN - 9780769528748
T3 - Proceedings - Fourth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007
SP - 627
EP - 631
BT - Proceedings - Fourth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007
T2 - 4th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007
Y2 - 24 August 2007 through 27 August 2007
ER -