CSB2010 Lex-SVM: Exploring the Potential of Exon Expression Profiling for Disease Classification

Xiongying Yuan, Yi Zhao, Changning Liu, Dongbo Bu*

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China. dbu@ict.ac.cn

Proc LSS Comput Syst Bioinform Conf. August, 2010. Vol. 9, pp. 180-191.

*To whom correspondence should be addressed.


Exon expression profiling technologies, including exon arrays and RNA-Seq, measure the abundance of every exon in a gene. Compared with gene expression profiling technologies such as the 3' array, exon expression profiling technologies can detect alterations in both transcription and alternative splicing, and are therefore expected to be more sensitive for diagnosis. However, exon expression profiling also brings higher dimensionality, more redundancy, and significant correlation among features. Popular classification methods such as L1-SVM ignore the correlation structure among the exons of a gene and select exons individually, which makes them vulnerable to noise. To overcome this limitation, we present in this paper Lex-SVM, a new variant of SVM that incorporates the correlation structure among exons and known splicing patterns to improve classification performance. Specifically, we construct a new norm, the ex-norm, which encodes our prior knowledge of exon correlation structure, and use it to regularize the coefficients of a linear SVM. Lex-SVM can be solved efficiently using standard linear programming techniques. The advantage of Lex-SVM is that it selects features group-wise, forces features within a subgroup to take equal weights, and excludes features that contradict the majority of their subgroup. Experimental results suggest that on exon expression profiles Lex-SVM is more accurate than existing methods. Lex-SVM also generates a more compact model and selects genes more consistently in cross-validation. Whereas L1-SVM selects only one exon per gene, Lex-SVM assigns equal weights to as many exons in a gene as possible, which makes its models easier to interpret.
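The contrast between L1-SVM picking a single exon and a grouped penalty spreading equal weight across a gene can be illustrated with a small sketch. The abstract does not define the ex-norm, so the code below swaps in a hypothetical stand-in. That stand-in is a group L-infinity penalty: the sum, over genes, of the largest absolute exon weight in each gene. It is NOT the paper's ex-norm. The functions `l1_norm`, `group_linf_norm`, and `decision_value` and the toy gene `geneA` are illustrative assumptions. The example assumes three perfectly correlated exons. Concentrating all weight on one exon or splitting it equally then gives the same decision value. It also gives the same L1 penalty, so L1 cannot tell the two apart. The grouped penalty is smaller for the equal split.

```python
from typing import Dict, List, Sequence


def l1_norm(w: Sequence[float]) -> float:
    """L1 penalty as used by L1-SVM: the sum of absolute weights."""
    return sum(abs(x) for x in w)


def group_linf_norm(w: Sequence[float], groups: Dict[str, List[int]]) -> float:
    """Hypothetical stand-in for a grouped norm (NOT the paper's ex-norm).

    For each gene, take the largest |weight| among its exons, then sum over genes.
    Raising the other exons of a gene up to that maximum adds no extra cost.
    """
    return sum(max(abs(w[j]) for j in idx) for idx in groups.values())


def decision_value(w: Sequence[float], x: Sequence[float], b: float = 0.0) -> float:
    """Linear SVM decision function w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


if __name__ == "__main__":
    # Toy sample: three exons of one gene with identical (perfectly correlated) values.
    x = [2.0, 2.0, 2.0]
    groups = {"geneA": [0, 1, 2]}

    w_single = [3.0, 0.0, 0.0]  # all weight on one exon (an L1-style vertex solution)
    w_equal = [1.0, 1.0, 1.0]   # equal weight on every exon of the gene

    print(decision_value(w_single, x), decision_value(w_equal, x))     # 6.0 6.0
    print(l1_norm(w_single), l1_norm(w_equal))                         # 3.0 3.0
    print(group_linf_norm(w_single, groups), group_linf_norm(w_equal, groups))  # 3.0 1.0
```

Because L1 assigns both weightings the same cost, an LP solver returns one vertex of the tied optimal set. With duplicated features, such a vertex typically places all weight on a single exon. A grouped penalty of this kind breaks the tie in favour of equal weights. It also stays linear-programmable: each per-gene maximum can be replaced by an auxiliary variable t_g with the constraints -t_g <= w_j <= t_g for every exon j of gene g. This is consistent with the abstract's statement that Lex-SVM is solved by linear programming, though the paper's exact formulation may differ.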


[ CSB2010 Conference Home Page ] .... [ CSB2010 Online Proceedings ] .... [ Life Sciences Society Home Page ]