Modeling Species-Genes Data for Efficient Phylogenetic Inference

Wenyuan Li, Ying Liu*

Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA. ying.liu@utdallas.edu

Proc LSS Comput Syst Bioinform Conf. August, 2007. Vol. 6, p. 429-440.

*To whom correspondence should be addressed.


In recent years, biclique methods have been proposed for constructing phylogenetic trees. A key step in these methods is finding complete sub-matrices (sub-matrices without missing entries) in a species-genes data matrix. Reference [17] described an exact algorithm that enumerates all complete sub-matrices, but its running time is exponential. Moreover, the algorithm generates a large number of complete sub-matrices, many of which may not be used for tree reconstruction. A better understanding of the characteristics of species-genes data may help in discovering complete sub-matrices. In this paper, we therefore study these characteristics quantitatively, with the aim of guiding the design of new algorithms for efficient phylogenetic inference. We construct a mathematical model to simulate real species-genes data. The results indicate that sequence-availability probability distributions follow a power law, which leads to the skewness and sparseness of real species-genes data. Moreover, we discover a special structure, called the "ladder structure", in real species-genes data. This ladder structure is used to identify complete sub-matrices and, more importantly, to reveal overlapping relationships among them. To discover the distinct ladder structure in real species-genes data, we propose an efficient evolutionary dynamical system called "generalized replicator dynamics". Two species-genes data sets from green plants are used to illustrate the effectiveness of our model. The empirical study shows that our model is effective and efficient for understanding species-genes data for phylogenetic inference.
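To illustrate how a replicator-style dynamical system can pull a complete sub-matrix out of a species-genes matrix, the sketch below uses the classical (not generalized) replicator update, adapted to a 0/1 matrix A whose rows are species and whose columns are genes (1 = sequence available). This is a stand-in chosen for illustration. It is not the authors' generalized replicator dynamics, whose update rule is not given in this abstract. The function names bipartite_replicator and is_complete, the toy matrix, the iteration count and the support threshold tol are all hypothetical. Species weights x and gene weights y are updated as x_i <- x_i (Ay)_i / (x^T A y) and y_j <- y_j (A^T x)_j / (x^T A y). This is a Baum-Eagon growth transform, so x^T A y never decreases, and x^T A y equals 1 exactly when the supports of x and y span a sub-matrix with no missing entries.

```python
from typing import List, Tuple

# Hypothetical representation: rows = species, columns = genes, 1 = sequence available.
Matrix = List[List[int]]


def is_complete(a: Matrix, rows: List[int], cols: List[int]) -> bool:
    """True if every (species, gene) cell of the sub-matrix has a sequence."""
    return all(a[i][j] == 1 for i in rows for j in cols)


def bipartite_replicator(a: Matrix, iters: int = 200, tol: float = 1e-6
                         ) -> Tuple[List[int], List[int], List[float]]:
    """Classical replicator updates coupled across the species and gene simplices.

    Illustrative stand-in only, not the paper's generalized replicator dynamics.
    Returns the species and gene indices whose final weights exceed `tol`,
    plus the trace of the objective f(x, y) = x^T A y before each update.
    """
    m, n = len(a), len(a[0])
    x = [1.0 / m] * m  # species weights, start at the simplex barycentre
    y = [1.0 / n] * n  # gene weights, start at the simplex barycentre
    trace: List[float] = []
    for _ in range(iters):
        ay = [sum(a[i][j] * y[j] for j in range(n)) for i in range(m)]   # (A y)_i
        atx = [sum(a[i][j] * x[i] for i in range(m)) for j in range(n)]  # (A^T x)_j
        f = sum(x[i] * ay[i] for i in range(m))                          # x^T A y
        if f == 0.0:
            break  # no available sequences under the current weights
        trace.append(f)
        # Both vectors are updated from the same (x, y); dividing by f keeps
        # each vector on its probability simplex.
        x = [x[i] * ay[i] / f for i in range(m)]
        y = [y[j] * atx[j] / f for j in range(n)]
    rows = [i for i in range(m) if x[i] > tol]
    cols = [j for j in range(n) if y[j] > tol]
    return rows, cols, trace


# Hypothetical toy species-genes matrix (5 species x 5 genes).
A = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
rows, cols, trace = bipartite_replicator(A)
print(rows, cols, is_complete(A, rows, cols))  # expected: [0, 1, 2] [0, 1, 2, 3] True
```

On this toy matrix the weights of species 3 and 4 and of gene 4 decay toward zero, leaving the 3 x 4 block of species 0-2 and genes 0-3, which has no missing entries. A single run returns only one complete sub-matrix, and the outcome depends on the starting weights. For an identity matrix, for example, uniform starting weights are already a fixed point whose support is not a complete block. Enumerating overlapping complete sub-matrices would therefore need repeated runs from different starting points or a different method.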

