CSB2010 A Genome Compression Algorithm Supporting Manipulation


Lenwood S. Heath, Ao-ping Hou, Huadong Xia, Liqing Zhang*

Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA. lqzhang@vt.edu

Proc LSS Comput Syst Bioinform Conf. August 2010. Vol. 9, pp. 38-49.

*To whom correspondence should be addressed.


With the advent of the thousand-dollar genome, one can anticipate the need to store, communicate, and manipulate many human genomes. Data compression methods have been developed to store and communicate genomes efficiently. Unfortunately, these methods do not support efficient manipulation (e.g., subsequence retrieval) of the compressed genome. We develop a data compression scheme that achieves both efficient storage and efficient sequence manipulation. We demonstrate the practicality of the method on two databases of genomes, one for the human mitochondrion and one for the H3N2 virus. In both cases, we achieve high compression ratios and O(log n) subsequence retrieval times.
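The abstract does not describe the data structure behind the O(log n) retrieval. As a hedged illustration only, and not the authors' algorithm, the sketch below shows one common way to combine reference-based compression with logarithmic random access. It assumes a genome is stored as a sorted list of edits against a shared reference sequence. Those edits are expanded into a piece table (reference slices stored as offsets, plus literal inserted bases), and a binary search over the prefix sums of piece lengths locates the start of any requested subsequence. The `Edit` and `DiffGenome` names and the edit format are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record: at reference position `pos`, delete
    `deleted` bases and insert `inserted`. Edits must be sorted by `pos`
    and must not overlap."""
    pos: int
    deleted: int
    inserted: str


class DiffGenome:
    """A genome stored as pieces relative to a reference: unchanged
    reference stretches (kept as offsets, not copies) interleaved with
    literal inserted bases."""

    def __init__(self, reference: str, edits: list[Edit]):
        self.reference = reference
        self.pieces = []  # each piece: (kind, ref_start or literal, length)
        ref_cursor = 0
        for e in edits:
            if e.pos < ref_cursor or e.pos + e.deleted > len(reference):
                raise ValueError("edits must be sorted, non-overlapping, in range")
            if e.pos > ref_cursor:  # unchanged stretch of the reference
                self.pieces.append(("ref", ref_cursor, e.pos - ref_cursor))
            if e.inserted:  # literal bases replacing the deleted span
                self.pieces.append(("lit", e.inserted, len(e.inserted)))
            ref_cursor = e.pos + e.deleted
        if ref_cursor < len(reference):  # tail of the reference
            self.pieces.append(("ref", ref_cursor, len(reference) - ref_cursor))
        # starts[k] is the coordinate in the target genome where piece k begins.
        # No piece is empty, so starts is strictly increasing.
        self.starts = []
        total = 0
        for piece in self.pieces:
            self.starts.append(total)
            total += piece[2]
        self.length = total

    def subsequence(self, i: int, j: int) -> str:
        """Return target[i:j]. Locating the first piece is a binary search,
        O(log p) for p pieces; the rest is linear in the output length."""
        i, j = max(0, i), min(j, self.length)
        if i >= j:
            return ""
        k = bisect_right(self.starts, i) - 1  # piece containing position i
        out = []
        pos = i
        while pos < j:
            kind, data, length = self.pieces[k]
            lo = pos - self.starts[k]               # start offset inside piece k
            hi = min(length, j - self.starts[k])    # end offset inside piece k
            if kind == "ref":
                out.append(self.reference[data + lo : data + hi])
            else:
                out.append(data[lo:hi])
            pos = self.starts[k] + hi
            k += 1
        return "".join(out)


# Substitute G->T at 2, delete "CG" at 5-6, insert "GG" before position 9.
g = DiffGenome("ACGTACGTAC", [Edit(2, 1, "T"), Edit(5, 2, ""), Edit(9, 0, "GG")])
print(g.subsequence(1, 8))  # CTTATAG
```

In this sketch, storage grows with the number of edits rather than with genome length. That is what makes it compact when many genomes lie close to a shared reference. Subsequence retrieval never decompresses the whole genome.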

