From Protein Sequence to Protein Function via Multi-Label Linear Discriminant Analysis
Hua Wang, Lin Yan, Heng Huang, Chris Ding
TCBB - 2017
Sequence describes the primary structure of a protein, which contains important structural, characteristic, and genetic information and thereby motivates many sequence-based computational approaches to infer protein function. Among them, feature-based approaches attract increased attention because they make predictions from a set of transformed and more biologically meaningful sequence features. However, the original features extracted from a sequence are usually of high dimensionality and often compromised by irrelevant patterns; dimension reduction is therefore necessary prior to classification for efficient and effective protein function prediction. A protein usually performs several different functions within an organism, which makes protein function prediction a multi-label classification problem. In machine learning, multi-label classification deals with problems where each object may belong to more than one class. As a well-known feature reduction method, linear discriminant analysis (LDA) has been successfully applied in many practical applications. By nature, however, it is designed for single-label classification, in which each object belongs to exactly one class. Because directly applying LDA to multi-label classification causes ambiguity when computing the scatter matrices, we apply a new Multi-label Linear Discriminant Analysis (MLDA) approach to address this problem while preserving the powerful classification capability inherited from classical LDA. We further extend MLDA by ℓ1-normalization to overcome the problem of over-counting data points with multiple labels. In addition, we incorporate biological network data into our method using Laplacian embedding, and assess the reliability of the predicted putative functions. Extensive empirical evaluations demonstrate promising results for our methods.
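The abstract does not give the MLDA formulas, so the following is only a minimal sketch of one plausible reading: scatter matrices are accumulated class by class, each sample contributes to every class it is labeled with, and ℓ1-normalizing each sample's label row keeps a multi-label sample from being counted more than once in total. The function names (`multilabel_scatter`, `mlda_projection`), the ridge term `reg`, and the exact weighting scheme are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def multilabel_scatter(X, Y, l1_normalize=True):
    """Class-wise between/within scatter for multi-label data (illustrative).

    X: (n, d) feature matrix. Y: (n, K) binary label matrix.
    With l1_normalize=True each label row is divided by its sum, so a sample
    carrying several labels has total weight 1 (assumed normalization scheme).
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(Y, dtype=float)
    if l1_normalize:
        row_sums = W.sum(axis=1, keepdims=True)
        # Rows without any label keep zero weight instead of dividing by 0.
        W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    class_weight = W.sum(axis=0)                       # (K,)
    safe = np.where(class_weight > 0, class_weight, 1.0)
    class_means = (W.T @ X) / safe[:, None]            # (K, d)
    global_mean = (W.sum(axis=1) @ X) / W.sum()        # (d,)

    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for k in range(W.shape[1]):
        diff = (class_means[k] - global_mean)[:, None]
        Sb += class_weight[k] * (diff @ diff.T)
        centered = X - class_means[k]
        Sw += (centered * W[:, [k]]).T @ centered
    return Sb, Sw


def mlda_projection(X, Y, n_components, reg=1e-6, l1_normalize=True):
    """Leading eigenvectors of (Sw + reg*I)^{-1} Sb as projection directions.

    reg is a small ridge term (an assumption) that keeps Sw invertible.
    """
    Sb, Sw = multilabel_scatter(X, Y, l1_normalize)
    d = Sw.shape[0]
    M = np.linalg.solve(Sw + reg * np.eye(d), Sb)
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real
```

Under these assumptions, `X @ mlda_projection(X, Y, n_components)` yields the reduced features that a downstream multi-label classifier would consume. When every sample has exactly one label, the weights reduce to a one-hot matrix and the two scatter matrices coincide with those of classical LDA.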
Cite this paper
MLA
Wang, Hua, et al. "From protein sequence to protein function via multi-label linear discriminant analysis." IEEE/ACM transactions on computational biology and bioinformatics 14.3 (2016): 503-513.
BibTeX
@article{wang2016protein,
  title={From protein sequence to protein function via multi-label linear discriminant analysis},
  author={Wang, Hua and Yan, Lin and Huang, Heng and Ding, Chris},
  journal={IEEE/ACM transactions on computational biology and bioinformatics},
  volume={14},
  number={3},
  pages={503--513},
  year={2016},
  publisher={IEEE}
}