CSB2008 Designing secondary structure profiles for fast ncRNA identification

Designing secondary structure profiles for fast ncRNA identification

Yanni Sun*, Jeremy Buhler

Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA. yanni@cse.wustl.edu

Proc LSS Comput Syst Bioinform Conf. August, 2008. Vol. 7, p. 145-156. Full-Text PDF

*To whom correspondence should be addressed.


Detecting non-coding RNAs (ncRNAs) in genomic DNA is an important part of annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for search. This cost can be reduced by using a filter to exclude sequence that is unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect nearly all ncRNA instances while excluding most irrelevant sequences remains challenging. This work proposes a systematic procedure to convert a CM for an ncRNA family to a secondary structure profile (SSP), which augments a conservation profile with secondary structure information but can still be efficiently scanned against long sequences. We use dynamic programming to estimate an SSP's sensitivity and FP rate, yielding an efficient, fully automated filter design algorithm. Our experiments demonstrate that designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families, including those with and without strong sequence conservation. For highly structured ncRNA families, including secondary structure conservation yields better performance than using primary sequence conservation alone.


[ CSB2008 Conference Home Page ] .... [ CSB2008 Online Proceedings ] .... [ Life Sciences Society Home Page ]