TY - JOUR
T1 - MSClust
T2 - A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence
AU - Chen, Wei
AU - Cheng, Yongmei
AU - Zhang, Clarence
AU - Zhang, Shaowu
AU - Zhao, Hongyu
PY - 2013/9
Y1 - 2013/9
N2 - Recent developments of next generation sequencing technologies have led to rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inferences, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multi-seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust enjoys less memory usage, and better biological accuracy compared to existing heuristic clustering methods while preserving efficiency and scalability.
AB - Recent developments of next generation sequencing technologies have led to rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inferences, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multi-seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust enjoys less memory usage, and better biological accuracy compared to existing heuristic clustering methods while preserving efficiency and scalability.
KW - 16S rRNA reads
KW - Clustering algorithms
KW - Next-generation sequencing
KW - Operational taxonomic unit (OTU)
KW - Seeds-selection
UR - http://www.scopus.com/inward/record.url?scp=84882775202&partnerID=8YFLogxK
U2 - 10.1016/j.mimet.2013.07.004
DO - 10.1016/j.mimet.2013.07.004
M3 - Article
C2 - 23899776
AN - SCOPUS:84882775202
SN - 0167-7012
VL - 94
SP - 347
EP - 355
JO - Journal of Microbiological Methods
JF - Journal of Microbiological Methods
IS - 3
ER -