CSB2010 Approximating Conserved Regions of Protein Structures

Yi Wei, Mingfu Shao, Jishuang Yang, Chao Wang, Shuai Cheng Li, Dongbo Bu*

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. dbu@ict.ac.cn

Proc LSS Comput Syst Bioinform Conf. August, 2010. Vol. 9, p. 204-212.

*To whom correspondence should be addressed.

Motivation: We present an efficient algorithm for identifying conserved regions in multiple protein structures. Experience from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) suggests that, for a given target sequence, threading methods usually generate several structures (called decoys or models) whose conserved regions are similar to the native structure, and that identifying these conserved regions can help improve structure prediction. It is therefore important to detect conserved regions efficiently without requiring alignment information.

Results: Building on our previous work on approximating the bottleneck distance, we present an O(m² n² log n) time algorithm that identifies the maximum conserved regions from m decoys, where n is the number of amino acids per decoy. To measure the quality of the identified conserved regions, we performed two experiments. The first directly compares the identified conserved regions against native structures. The second is an indirect measure: it investigates whether these conserved regions help improve protein structure prediction. Experimental results demonstrate that for 16 out of 25 TBM (template-based modeling) targets in CASP7, our method simultaneously identifies over 70% of native-like regions and filters out over 90% of non-native-like regions. The algorithm also performs well for 10 out of 12 FM (free modeling) targets in CASP7, where it recovers more than half of the native-like regions and filters out over 80% of the non-native-like regions. Furthermore, we applied the identified conserved regions to improve fragment-assembly-based approaches to protein structure prediction. For 10 out of 12 FM targets in CASP7, our method achieves higher accuracy than ROSETTA. In particular, by identifying conserved regions, TM-scores improve significantly on four targets, from meaningless (< 0.4) to meaningful (> 0.4). This experiment provides indirect evidence of how well our algorithm identifies conserved regions.
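
The two percentages reported above (native-like regions identified, non-native-like regions filtered out) can be read as a pair of per-residue rates. The abstract does not state how a residue is judged native-like or how the rates were computed, so the following is only a minimal sketch under assumptions: each residue of a decoy carries a boolean label `native_like` (hypothetical, for example derived from a distance cutoff against the native structure) and a boolean `selected` marking whether the algorithm placed it in a conserved region. The function name `region_rates` and the toy data are illustrative and are not taken from the paper.

```python
from typing import Sequence, Tuple


def region_rates(native_like: Sequence[bool],
                 selected: Sequence[bool]) -> Tuple[float, float]:
    """Return (identified, filtered_out) for one decoy.

    identified   = fraction of native-like residues that were selected
    filtered_out = fraction of non-native-like residues that were NOT selected

    Both label sequences are assumptions of this sketch, one entry per residue.
    """
    if len(native_like) != len(selected):
        raise ValueError("label sequences must have the same length")

    # Native-like residues that the method kept in a conserved region.
    kept = sum(1 for n, s in zip(native_like, selected) if n and s)
    # Non-native-like residues that the method correctly left out.
    dropped = sum(1 for n, s in zip(native_like, selected) if not n and not s)

    n_native = sum(1 for n in native_like if n)
    n_non_native = len(native_like) - n_native

    # Guard against empty classes to avoid division by zero.
    identified = kept / n_native if n_native else 0.0
    filtered_out = dropped / n_non_native if n_non_native else 0.0
    return identified, filtered_out


# Toy decoy with 10 residues: 4 native-like, 6 non-native-like (made-up labels).
native_like = [True, True, True, True, False, False, False, False, False, False]
selected = [True, True, True, False, False, False, False, False, False, True]

identified, filtered_out = region_rates(native_like, selected)
print(f"identified: {identified:.2f}, filtered out: {filtered_out:.2f}")
```

Under this reading, a target counts toward the "16 out of 25" TBM figure when `identified` exceeds 0.7 and `filtered_out` exceeds 0.9 at the same time; whether the paper aggregates per residue, per region, or per decoy is not specified in the abstract.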
