Effective identification of essential proteins based on priori knowledge, network topology and gene expressions.
Identifying essential proteins is important for understanding the minimal requirements for cellular life and is also necessary for practical applications such as drug design. Advances in high-throughput technologies have made a large number of protein-protein interactions available, which makes it possible to detect protein essentiality at the network level. Considering that most species already have a number of known essential proteins, we propose a new priori knowledge-based scheme to discover new essential proteins from protein interaction networks. Based on this scheme, two essential protein discovery algorithms, CPPK and CEPPK, were developed. CPPK predicts new essential proteins from network topology alone, whereas CEPPK integrates network topology with gene expression data. The performance of CPPK and CEPPK was validated on the protein interaction network of Saccharomyces cerevisiae. The experimental results showed that the priori knowledge of known essential proteins was effective for improving prediction precision. The prediction precision of both CPPK and CEPPK clearly exceeded that of 10 previously proposed essential protein discovery methods: Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), Bottle Neck (BN), Density of Maximum Neighborhood Component (DMNC), Local Average Connectivity-based method (LAC), and Network Centrality (NC). In particular, CPPK achieved a 40% improvement in precision over BC, CC, SC, EC, and BN, and CEPPK performed even better. CEPPK was also compared with four other methods that are not node centralities (EPC, ORFL, PeC, and CoEWC) and was shown to achieve the best results.
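To make the evaluation setting concrete, the following is a minimal sketch of how a centrality baseline such as Degree Centrality (DC) can be scored against a set of known essential proteins. It is not an implementation of CPPK or CEPPK, whose scoring rules are not given here. The function names, the toy interaction list, and the "essential" labels are all hypothetical and chosen only for illustration, and precision is assumed to mean the fraction of the top-ranked proteins that are known to be essential.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count the distinct interaction partners of each protein."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def precision_at_k(scores, essential, k):
    """Fraction of the k top-scoring proteins that are in the essential set."""
    # Sort by descending score; break ties by name so the ranking is deterministic.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for p in top if p in essential) / len(top)


# Toy network and labels, invented for illustration (not real yeast data).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
essential = {"P1", "P5"}

scores = degree_centrality(edges)
print(precision_at_k(scores, essential, 2))
```

Any other ranking method can be evaluated the same way by replacing `scores` with its own per-protein scores, so that methods are compared on the same top-k cutoff.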