TY - GEN
T1 - Probabilistic labeled semi-supervised SVM
AU - Qian, Mingjie
AU - Nie, Feiping
AU - Zhang, Changshui
PY - 2009
Y1 - 2009
AB - Semi-supervised learning has received increasing attention and is widely used in fields such as data mining, information retrieval, and knowledge management because it can exploit both labeled and unlabeled data. Laplacian SVM (LapSVM) is a classical method whose effectiveness has been validated by a large number of experiments. However, LapSVM is sensitive to the labeled data and has cubic computational complexity, which limits its application in large-scale scenarios. In this paper, we propose a multi-class method called Probabilistic labeled Semi-supervised SVM (PLSVM), in which the optimal decision surface is learned from probabilistic labels of all the training data, both labeled and unlabeled. We then propose a kernelized dual coordinate descent method that efficiently solves the dual problems of PLSVM and reduces its memory requirements. Experiments on synthetic data and several real-world benchmark datasets show that PLSVM is less sensitive to labeling and outperforms traditional methods such as SVM, LapSVM, and Transductive SVM (TSVM).
KW - Dual coordinate descent algorithm
KW - Multi-class classification
KW - Probabilistic label
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=77951153511&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2009.14
DO - 10.1109/ICDMW.2009.14
M3 - Conference contribution
AN - SCOPUS:77951153511
SN - 9780769539027
T3 - ICDM Workshops 2009 - IEEE International Conference on Data Mining
SP - 394
EP - 399
BT - ICDM Workshops 2009 - IEEE International Conference on Data Mining
T2 - 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009
Y2 - 6 December 2009
ER -