CSB2008 A max-flow based approach to the identification of protein complexes using protein interaction and microarray data

A max-flow based approach to the identification of protein complexes using protein interaction and microarray data

Jianxing Feng*, Rui Jiang, Tao Jiang

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China. fengjx06@mails.tsinghua.edu.cn

Proc LSS Comput Syst Bioinform Conf. August, 2008. Vol. 7, p. 51-62. Full-Text PDF

*To whom correspondence should be addressed.

The emergence of high-throughput technologies leads to abundant protein-protein interaction (PPI) data and microarray gene expression profiles, and provides a great opportunity for the identification of novel protein complexes using computational methods. Although it has been demonstrated in the literature that methods using protein-protein interaction data alone can successfully predict a large number of protein complexes, the incorporation of gene expression profiles could help refine the putative complexes and hence improve the accuracy of the computational methods. By combining protein-protein interaction data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a protein-protein interaction network and then breaks each such subgraph into fragments iteratively by weighting its nodes appropriately in terms of their corresponding log fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our extensive tests on three widely used protein-protein interaction datasets and comparisons with the latest methods for protein complex identification demonstrate the superior performance of our method in terms of accuracy, efficiency, and capability in predicting novel protein complexes. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.

[ CSB2008 Conference Home Page ] .... [ CSB2008 Online Proceedings ] .... [ Life Sciences Society Home Page ]