EMD-based metric for document semantic similarity

Xiao Dong Wang, Lei Guo, Jun Fang, Shu Fu Dong

Research output: Contribution to journal › Article › peer-review

1 Scopus citations

Abstract

EMD (Earth Mover's Distance)-based measures of document semantic similarity conflict with the metric axioms, which has kept EMD from being widely applied in information retrieval and data mining. To resolve this conflict, a novel EMD-based metric for document semantic similarity, named Mdss_EMD, is presented. First, based on an analysis of the drawbacks of EMD and its existing modifications, the concepts of document width and virtual term are proposed. Next, by adding a virtual term to the initial document vector, the approach aligns the total weights of the document vectors so that all of the metric axioms are satisfied. Finally, to improve the applicability and processing speed of the metric, the similarity distance of the virtual term is designed to be elastic, and the EMD algorithm is simplified. The proposed approach extends EMD to a metric space and substantially improves its indexing performance and accuracy. Experimental results show that Mdss_EMD generally outperforms the original EMD and other similar measures.
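The weight-alignment step described in the abstract can be illustrated with a minimal sketch. It is not the published Mdss_EMD algorithm: the abstract does not define document width, the elastic virtual-term distance, or the simplified solver. The sketch assumes that documents are term-to-count dictionaries with integer weights, that the virtual term absorbs the weight deficit of the lighter document, and that the virtual term lies at a constant distance `virtual_dist` from every real term. The names `pad_with_virtual_term`, `transport_cost`, `padded_emd`, `toy_ground_dist`, and the toy distance table are all hypothetical.

```python
VIRTUAL = "<virtual>"  # hypothetical placeholder name for the virtual term

def pad_with_virtual_term(p, q):
    """Return copies of p and q whose total weights are equal."""
    p, q = dict(p), dict(q)
    diff = sum(p.values()) - sum(q.values())
    if diff > 0:
        q[VIRTUAL] = diff      # q is lighter: it receives the virtual term
    elif diff < 0:
        p[VIRTUAL] = -diff     # p is lighter
    return p, q

def transport_cost(supply, demand, cost):
    """Minimum cost of moving supply onto demand (equal integer totals).

    Solved with successive shortest augmenting paths (Bellman-Ford),
    which is adequate for small examples only.
    """
    n, m = len(supply), len(demand)
    src, snk, size = n + m, n + m + 1, n + m + 2
    edges, adj = [], [[] for _ in range(size)]  # edge k pairs with edge k ^ 1

    def add_edge(u, v, cap, c):
        adj[u].append(len(edges)); edges.append([v, cap, c])
        adj[v].append(len(edges)); edges.append([u, 0, -c])

    for i, s in enumerate(supply):
        add_edge(src, i, s, 0.0)
    for j, d in enumerate(demand):
        add_edge(n + j, snk, d, 0.0)
    for i in range(n):
        for j in range(m):
            add_edge(i, n + j, min(supply[i], demand[j]), cost[i][j])

    total = 0.0
    while True:
        dist = [float("inf")] * size
        prev = [None] * size
        dist[src] = 0.0
        for _ in range(size - 1):              # Bellman-Ford rounds
            changed = False
            for u in range(size):
                if dist[u] == float("inf"):
                    continue
                for k in adj[u]:
                    v, cap, c = edges[k]
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + c, k, True
            if not changed:
                break
        if prev[snk] is None:                  # no augmenting path remains
            return total
        push, v = float("inf"), snk
        while v != src:                        # bottleneck capacity on the path
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = snk
        while v != src:                        # apply the augmentation
            k = prev[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            total += push * edges[k][2]
            v = edges[k ^ 1][0]

def padded_emd(p, q, ground_dist, virtual_dist):
    """Unnormalized transport cost between p and q after weight alignment."""
    p, q = pad_with_virtual_term(p, q)
    p_terms, q_terms = list(p), list(q)

    def d(a, b):
        if a == b:
            return 0.0
        if VIRTUAL in (a, b):
            return virtual_dist                # assumed constant distance
        return ground_dist(a, b)

    cost = [[d(a, b) for b in q_terms] for a in p_terms]
    return transport_cost([p[t] for t in p_terms],
                          [q[t] for t in q_terms], cost)

# Toy semantic distances between terms (invented for illustration).
TOY_DIST = {frozenset(("cat", "dog")): 0.4,
            frozenset(("cat", "car")): 1.0,
            frozenset(("dog", "car")): 0.9}

def toy_ground_dist(a, b):
    return TOY_DIST[frozenset((a, b))]

doc_a = {"cat": 2, "dog": 1}
doc_b = {"dog": 2}
doc_c = {"car": 1}
d_ab = padded_emd(doc_a, doc_b, toy_ground_dist, virtual_dist=1.0)
d_bc = padded_emd(doc_b, doc_c, toy_ground_dist, virtual_dist=1.0)
d_ac = padded_emd(doc_a, doc_c, toy_ground_dist, virtual_dist=1.0)
print(d_ab, d_bc, d_ac)
```

On this toy data the three pairwise distances are symmetric and satisfy the triangle inequality. That is a spot check on one example, not a proof; whether the axioms hold in general depends on the ground distance and on how the virtual-term distance is chosen, which is the part the paper addresses.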

Original language: English
Pages (from-to): 2156-2161
Number of pages: 6
Journal: Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology
Volume: 30
Issue number: 9
State: Published - Sep 2008

Keywords

  • Document similarity
  • EMD (Earth Mover's Distance)
  • Information retrieval
  • Match
  • Metric
  • Semantic distance
