Chapter 10

Prediction of Human Protein Subcellular Locations with Feature Selection and Analysis

Bi-Qing Li, Tao Huang, Lei Chen, Kai-Yan Feng and Yu-Dong Cai

Abstract

In this paper, we propose a strategy for predicting the subcellular locations of human proteins using multi-step feature selection. Each protein is first encoded by features derived from KEGG and GO enrichment scores. After an initial feature reduction, 9958 features remain and are ranked by the Minimum Redundancy Maximum Relevance (mRMR) method. The ranked features are then filtered by an incremental feature selection (IFS) procedure to obtain a compact feature set. Random forest (RF) is used as the prediction model and achieves an overall prediction accuracy of 67.72%, as evaluated by ten-fold cross-validation. The KEGG pathways and GO terms corresponding to the selected features are analyzed in depth and are deemed the terms most relevant to human protein subcellular location.
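
The incremental feature selection step can be illustrated with a minimal sketch. It assumes the features have already been ranked (for example by mRMR) and that a scoring function is available; in the setting described above that score would be the ten-fold cross-validated accuracy of a random forest, but here a toy scorer stands in for it. The names incremental_feature_selection, toy_score and the feature indices are hypothetical and are not taken from the chapter.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked: Sequence[int],
    score: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Evaluate the top-1, top-2, ..., top-N prefixes of a ranked
    feature list and return the prefix with the highest score."""
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        current = score(subset)
        # Strict comparison: on ties, the smaller prefix is kept.
        if current > best_score:
            best_subset, best_score = subset, current
    return best_subset, best_score


def toy_score(subset: List[int]) -> float:
    """Hypothetical stand-in for cross-validated accuracy: rewards
    'informative' features and slightly penalizes subset size."""
    informative = {7, 2, 5}  # assumed for illustration only
    hits = sum(1 for f in subset if f in informative)
    return hits - 0.1 * len(subset)


# Feature indices in ranked order (assumed output of a ranking step).
ranked = [7, 2, 9, 5, 1]
best, best_s = incremental_feature_selection(ranked, toy_score)
print(best, round(best_s, 2))
```

With a real dataset, toy_score would be replaced by a function that trains and cross-validates the classifier on the given feature subset; the loop itself is unchanged.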

Total Pages: 206-225 (20)

