Project Team Members
- Fadi Towfic
- Rui Yang
Project Description
Protein-RNA interactions play an important role in translation, RNA splicing, the replication of many viruses, and many other cellular processes. Predicting protein-RNA interfaces can aid the design of drug inhibitors for viruses and the down-regulation of unwanted genes, and can contribute to our basic understanding of the mechanisms involved in protein-RNA recognition [3, 4, 5, 6]. Multiple families of RNA-binding proteins have already been identified using sequence-based analyses of proteins [2].
The goal of this project is to evaluate how well structure-based predictions of protein-RNA binding sites (using PocketPicker and machine learning methods) perform compared to the more traditional sequence-based algorithms.
Project Tasks
Completed Tasks
- Obtain the dataset used for evaluating RNAbindr (RB147 dataset)
- Classify each protein chain in the dataset according to the type of RNA bound by the protein
- Create a script to run PocketPicker on each chain in the RB147 dataset and get the output
- Filter the output of PocketPicker such that only the largest pockets detected by PocketPicker remain as part of the dataset (to reduce the number of false positives)
- Convert the output into ARFF format
- Filter the results such that only the amino acid residues on the protein surface are fed to the classifier. This step is justified because only surface residues can be part of the protein-RNA interface, and it further reduces the number of false positives.
- Construct a Naive-Bayes classifier using WEKA based on the type of RNA bound by the protein chain and the buriedness-index calculated by PocketPicker for each of the residues in the protein chain.
- Evaluate the accuracy, running time, specificity, sensitivity and area under the Receiver Operating Characteristic curve of the predictions of the Naive-Bayes classifier compared to RNAbindr.
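The ARFF conversion step above can be sketched as follows. This is a minimal illustration, not the project's actual script: the record layout `(rna_type, buriedness, is_interface)`, the attribute names, the relation name, and the `yes`/`no` class labels are all assumptions chosen to match the two features the task list names (type of bound RNA and PocketPicker buriedness index). Only the ARFF keywords themselves (`@relation`, `@attribute`, `@data`) come from the WEKA file format.

```python
from typing import Iterable, List, Tuple

# Hypothetical per-residue record: (type of RNA bound by the chain,
# buriedness index from PocketPicker, whether the residue is in the interface).
Residue = Tuple[str, float, bool]


def to_arff(residues: Iterable[Residue], rna_types: List[str],
            relation: str = "rb147_pockets") -> str:
    """Render residue records as ARFF text that WEKA can load."""
    lines = [
        f"@relation {relation}",
        "",
        # Nominal attribute: the allowed RNA types are listed in braces.
        "@attribute rna_type {" + ",".join(rna_types) + "}",
        "@attribute buriedness numeric",
        # Class attribute (assumed labels): interface residue or not.
        "@attribute interface {yes,no}",
        "",
        "@data",
    ]
    for rna_type, buriedness, is_interface in residues:
        label = "yes" if is_interface else "no"
        lines.append(f"{rna_type},{buriedness:g},{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [("rRNA", 12.0, True), ("tRNA", 3.5, False)]
    print(to_arff(demo, ["rRNA", "tRNA"]))
```

Listing the class attribute last follows WEKA's default convention of treating the final attribute as the class when none is set explicitly.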
Project Report
The project report for CS 573 is available here
Source Code
The full source code and datasets for the project can be downloaded from here
Presentation
The presentation for CS 573 is available here
Classifier Results
| Running time | Accuracy | Specificity+ | Sensitivity+ | AUC |
| 0.35 seconds | 86% | 98.8% | 2% | 0.661 |
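To make the relationship between these measures concrete, the sketch below computes them from confusion-matrix counts. It uses the standard textbook definitions; this page does not define the "+" suffix in the table headers, so whether Specificity+ means TN/(TN+FP) or the positive-class precision TP/(TP+FP) is left open here and both are computed. The counts in the example are invented for illustration and are not the project's data; they only show how a classifier on an imbalanced dataset can reach high accuracy while detecting very few interface residues.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # interface residues predicted as interface
    fp: int  # non-interface residues predicted as interface
    tn: int  # non-interface residues predicted as non-interface
    fn: int  # interface residues predicted as non-interface


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: Confusion) -> float:
    # Fraction of true interface residues that were found (recall).
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Fraction of true non-interface residues that were correctly rejected.
    return c.tn / (c.tn + c.fp)


def precision(c: Confusion) -> float:
    # Fraction of predicted interface residues that are truly interface.
    return c.tp / (c.tp + c.fp)


if __name__ == "__main__":
    # Invented counts: 100 interface residues out of 1000 in total.
    example = Confusion(tp=2, fp=20, tn=880, fn=98)
    for name, fn in [("accuracy", accuracy), ("sensitivity", sensitivity),
                     ("specificity", specificity), ("precision", precision)]:
        print(f"{name}: {fn(example):.3f}")
```

Because non-interface residues dominate in this kind of data, accuracy is driven mostly by the negative class, which is why sensitivity and AUC are reported alongside it.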
Project Proposal
The project proposal is available in PDF format here
References
[1] H. M. Berman, J. Westbrook, Z. Feng, et al. The Protein Data Bank. Nucleic Acids Res, 28:235–242, 2000.
[2] Yu Chen and Gabriele Varani. Protein families and RNA recognition. FEBS Journal, 272(9):2088–2097, 2005.
[3] E. O. Freed and A. J. Mouland. The cell biology of HIV-1 and other retroviruses. Retrovirology, 3(77):10, 2006.
[4] M. S. Jurica and M. J. Moore. Pre-mRNA splicing: awash in a sea of proteins. Mol. Cell, 12:5–14, 2003.
[5] M. J. Moore. From birth to death: the complex lives of eukaryotic mRNAs. Science, 309:1514–1518, 2005.
[6] H. F. Noller. RNA structure: reading the ribosome. Science, 309:1508–1514, 2005.
[7] Michael Terribilini, Jeffry D. Sander, Jae-Hyung Lee, Peter Zaback, Robert L. Jernigan, Vasant Honavar, and Drena Dobbs. RNAbindr: a server for analyzing and predicting RNA-binding sites in proteins. Nucleic Acids Research, 35(2):1–7, 2007.
[8] Fadi Towfic, David C. Gemperline, Cornelia Caragea, Feihong Wu, Drena Dobbs, and Vasant Honavar. Structural characterization of RNA-binding sites of proteins: Preliminary results. In Proceedings of Computational Structural Bioinformatics Workshop, 2007.
[9] Martin Weisel, Ewgenij Proschak, and Gisbert Schneider. PocketPicker: analysis of ligand binding-sites with shape descriptors. Chemistry Central Journal, 1(7):17, 2007.
[10] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann, June 2005.