Abstract
With the development of high-throughput DNA sequencing technology, the volume of DNA sequencing data is growing rapidly. Compression is an important candidate solution to the storage and transmission challenges posed by these data. This paper surveys traditional DNA sequence compression methods, including substitutional and statistical methods, as well as reference-genome-based compression of high-throughput DNA sequencing data. State-of-the-art algorithms for re-sequencing data compression, de novo sequencing data compression, quality score compression, and compressed data indexing are introduced and compared. The challenges and future prospects of high-throughput DNA sequencing data compression are also discussed.
Original language | English |
---|---|
Pages (from-to) | 409-415 |
Number of pages | 7 |
Journal | Shenzhen Daxue Xuebao (Ligong Ban)/Journal of Shenzhen University Science and Engineering |
Volume | 30 |
Issue number | 4 |
State | Published - Jul 2013 |
Externally published | Yes |
Keywords
- Computer application
- Data compression
- De novo sequencing
- DNA sequencing
- High-throughput sequencing
- Next generation sequencing
- Resequencing