Previous Projects

    Knowledge-based entropies improve the identification of native protein structures. [Proc Natl Acad Sci USA (2017) 114 (11): 2928-2933].


    Figure: Comparison of the entropy matrix and potential matrix for a reduced four-letter alphabet.
    (A) Heatmap of the entropy matrix, expressed as the fraction of contact changes between pairs of amino acids using a reduced alphabet, where A denotes acidic, B basic, H hydrophobic, and P polar amino acids. Changes in interactions involving polar/charged residues are higher (red = high) than those involving hydrophobic residues (blue = low). (B) Heatmap of the MJ2h potential energy matrix (44) using the same reduced alphabet. Hydrophobic interactions yield the lowest energies (red = low), with charged interactions being relatively higher in energy (blue = high). The entropy and energy matrices are complementary in nature.

    Abstract: Evaluating protein structures requires reliable free energies with good estimates of both potential energies and entropies. Although there are many demonstrated successes from using knowledge-based potential energies, computing entropies of proteins has lagged far behind. Here we take an entirely different approach and evaluate knowledge-based conformational entropies of proteins based on the observed frequencies of contact changes between amino acids in a set of 167 diverse proteins, each of which has two alternative structures. The results show that charged and polar interactions break more often than hydrophobic pairs. This pattern correlates strongly with the average solvent exposure of amino acids in globular proteins, as well as with polarity indices and the sizes of the amino acids. Knowledge-based entropies are derived by using the inverse Boltzmann relationship, in a manner analogous to the way that knowledge-based potentials have been extracted. Including these new knowledge-based entropies almost doubles the performance of knowledge-based potentials in selecting the native protein structures from decoy sets. Beyond the overall energy-entropy compensation, a similar compensation is seen for individual pairs of interacting amino acids. The entropies in this report have immediate applications for 3D structure prediction, protein model assessment, and protein engineering and design.
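
    The inverse Boltzmann step mentioned in the abstract can be illustrated with a minimal sketch. It turns observed contact-change frequencies into relative, entropy-like scores per residue-pair type. The counts, the function name `contact_change_scores`, and the choice of reference state (the overall fraction of changed contacts) are assumptions made for illustration and are not taken from the paper.

```python
import math

def contact_change_scores(changed, total):
    """Convert contact-change counts into inverse-Boltzmann-style scores.

    changed[pair]: contacts of that pair type that differ between the two
                   alternative structures of a protein.
    total[pair]:   all contacts of that pair type that were observed.
    Returns ln(f_pair / f_ref) per pair, in units of the Boltzmann constant.
    ASSUMPTION: the reference state is the overall fraction of changed
    contacts, and every count is positive (no pseudocounts are applied).
    """
    f_ref = sum(changed.values()) / sum(total.values())
    scores = {}
    for pair, n_total in total.items():
        f_pair = changed[pair] / n_total
        scores[pair] = math.log(f_pair / f_ref)
    return scores

# HYPOTHETICAL counts for a reduced alphabet (H = hydrophobic, P = polar).
total = {("H", "H"): 1000, ("H", "P"): 800, ("P", "P"): 600}
changed = {("H", "H"): 100, ("H", "P"): 160, ("P", "P"): 220}

scores = contact_change_scores(changed, total)
for pair, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(pair, round(value, 3))
```

    With these made-up counts, hydrophobic pairs score below the reference and polar pairs above it, which mirrors the qualitative trend described in the abstract.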

    Coarse-grained free energy landscapes for protein conformational changes. [J. Chem. Phys. (2015) 143(24): 243153].


    Figure: Human serum albumin
    (a) Percentage of variance captured by PCA. (b) Free energy landscape. (c) Visualization of PC1. (d) Visualization of PC2.

    Abstract: In this work, we use principal component analysis of experimental structures of 50 diverse proteins to extract the most important directions of their motions, sample structures along these directions, and estimate their free energy landscapes by combining knowledge-based potentials and entropy computed from elastic network models. When these resulting motions are visualized upon their coarse-grained free energy landscapes, the basis for conformational pathways becomes readily apparent. Using three well-studied proteins, T4 lysozyme, serum albumin, and sarco-endoplasmic reticular Ca2+ adenosine triphosphatase (SERCA), as examples, we show that such free energy landscapes of conformational changes provide meaningful insights into the functional dynamics and suggest transition pathways between different conformational states. As a further example, we also show that Monte Carlo simulations on the coarse-grained landscape of HIV-1 protease can directly yield pathways for force-driven conformational changes.
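
    The PCA step described in the abstract can be sketched as follows, assuming the structures have already been superimposed and reduced to the same set of atoms. The function name `pca_of_structures` and the toy ensemble are hypothetical. The sketch covers only the extraction of principal components and the variance they capture, not the free energy estimation.

```python
import numpy as np

def pca_of_structures(coords):
    """PCA of an ensemble of aligned structures.

    coords: array of shape (n_structures, n_atoms, 3), already superimposed.
    Returns (fraction_of_variance, components, projections).
    """
    n_structures = coords.shape[0]
    X = coords.reshape(n_structures, -1)   # flatten to (n_structures, 3 * n_atoms)
    X = X - X.mean(axis=0)                 # subtract the mean structure
    # The right singular vectors of the centered data are the principal components.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    variances = s**2 / (n_structures - 1)
    fraction = variances / variances.sum()
    projections = X @ Vt.T                 # coordinates of each structure on the PCs
    return fraction, Vt, projections

# HYPOTHETICAL toy ensemble: 20 structures of a 5-atom chain that mostly
# moves along a single direction, plus a small amount of noise.
rng = np.random.default_rng(0)
mean_structure = rng.normal(size=(5, 3))
direction = rng.normal(size=(5, 3))
amplitudes = np.linspace(-1.0, 1.0, 20)
coords = mean_structure + amplitudes[:, None, None] * direction
coords = coords + 0.01 * rng.normal(size=coords.shape)

fraction, components, projections = pca_of_structures(coords)
print("fraction of variance captured by PC1:", round(float(fraction[0]), 3))
```

    Sampling structures along the leading components and scoring them with an energy and entropy model would be the next step toward a landscape; that part is outside this sketch.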

    The Use of Experimental Structures to Model Protein Dynamics. [Meth. Mol. Biol. (2014) 1215: 213-236].


    Figure: HIV-1 protease.
    (a) Cartoon representation of the HIV-1 protease structure with the two monomers in red and blue. (b) The first three PCs of HIV-1 protease, derived from a set of 329 structures, visualized on the structure.

    Abstract: The number of solved protein structures submitted to the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, we discuss two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENMs). PCA is a widely used statistical dimensionality reduction technique for classifying and visualizing high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two methods. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the dimeric HIV-1 protease structure by using a set of 329 PDB structures of this protein.
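
    As one concrete instance of an elastic network model, the sketch below builds a Gaussian Network Model (GNM) from C-alpha coordinates and derives relative residue fluctuations from its modes. The 7.0 angstrom cutoff, the function name `gnm_modes`, and the toy chain are assumptions for illustration; the improved ENM variant mentioned in the abstract is not reproduced here.

```python
import numpy as np

def gnm_modes(ca_coords, cutoff=7.0):
    """Gaussian Network Model for C-alpha coordinates of shape (n_residues, 3).

    Returns eigenvalues and eigenvectors of the Kirchhoff (connectivity)
    matrix, ordered from slowest to fastest mode, without the trivial zero mode.
    ASSUMPTION: the contact network is connected, so there is one zero mode.
    """
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    kirchhoff = -(dist <= cutoff).astype(float)           # -1 for each pair in contact
    np.fill_diagonal(kirchhoff, 0.0)                      # clear self-contacts
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))   # diagonal = number of contacts
    eigenvalues, eigenvectors = np.linalg.eigh(kirchhoff) # ascending eigenvalues
    return eigenvalues[1:], eigenvectors[:, 1:]

# HYPOTHETICAL toy chain: 10 pseudo-residues spaced 3.8 angstroms apart on a line.
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
eigenvalues, modes = gnm_modes(coords)

# Relative mean-square fluctuation of each residue, summed over all nonzero modes.
fluctuations = (modes**2 / eigenvalues).sum(axis=1)
print(np.round(fluctuations, 3))
```

    In this toy chain the terminal residues fluctuate more than the central ones, as expected for a linear network with free ends.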

    An analysis of conformational changes upon RNA-protein binding. [Proc. 5th ACM Conf. on Bioinf., Comp. Biol. and Health Inf. (2014) 592-593].


    Figure: ATP-dependent RNA helicase DDX48.
    (a) Error-scaled internal distance change matrices between RNA-bound and unbound forms of DDX48. Red (blue) indicates increase (decrease) in internal distance. (b) Superimposition of bound (violet) and unbound (green) forms of DDX48 showing flexible, invariant and interface residues.

    Abstract: RNA-binding proteins (RBPs) have myriad functions in transcription, translation, and post-transcriptional gene regulation, with central roles in normal development as well as in both genetic and infectious diseases. When a protein binds RNA, a conformational change often occurs. For RNA-protein complexes that have been characterized, conformational changes have been observed in the protein, the RNA, or both. These conformational changes have not been sufficiently characterized, however, in part due to the small number of structures of bound and unbound complexes of RNA-binding proteins available until recently. Here we systematically analyze a new dataset of 90 pairs of bound and unbound proteins to evaluate the conformational changes that occur upon RNA binding. Most of the conformational changes were observed in noninterfacial regions of the RNA-binding proteins. Detailed analyses of the modes of RNA binding and any associated conformational changes in proteins are critical for fully understanding the mechanisms of RNA-protein recognition, for developing better RNA-protein docking methods and methods for predicting interfacial residues, and for RNA-based drug design.
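
    The internal distance change matrix shown in the figure can be sketched in a few lines, assuming one representative coordinate per residue and identical residue ordering in both forms. The function names and coordinates are hypothetical, and the error scaling mentioned in the caption is omitted.

```python
import math

def distance_matrix(coords):
    """All pairwise distances between residue positions (sequence of (x, y, z))."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def distance_change_matrix(unbound, bound):
    """Internal distance change (bound minus unbound) for every residue pair.

    Positive entries mean the pair moves apart on binding; negative entries
    mean it moves closer. No superposition is needed, because internal
    distances do not depend on the coordinate frame.
    """
    d_unbound = distance_matrix(unbound)
    d_bound = distance_matrix(bound)
    n = len(unbound)
    return [[d_bound[i][j] - d_unbound[i][j] for j in range(n)] for i in range(n)]

# HYPOTHETICAL 4-residue example: the last residue swings into line with the chain.
unbound = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
bound = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

delta = distance_change_matrix(unbound, bound)
for row in delta:
    print([round(x, 2) for x in row])
```

    Rows or columns with large absolute values point to residues that move relative to the rest of the protein, whether or not they sit at the interface.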

    DOCKSCORE: A scoring scheme for scoring docked poses of protein-protein complexes. [PLoS ONE (2014) 9(2): e80255].


    Abstract: Molecular interactions are studied computationally using molecular docking, which employs search algorithms to predict the possible conformations of interacting partners and then calculates interaction energies. However, docking proposes a number of solutions as different docked poses, which makes it a serious challenge to identify the native (or near-native) structures from the pool of docked poses. We have proposed a rigorous scoring scheme called 'DockScore', which can be used to rank the docked poses and identify the best one among the many proposed by the docking algorithm employed. The scoring identifies the optimal interactions between the two protein partners using various features of the putative interface, such as area, short contacts, conservation, spatial clustering, and the presence of positively charged and hydrophobic residues. DockScore was first trained on a set of 30 protein-protein complexes to determine the weights for the different parameters. Subsequently, we tested the scoring scheme on 30 different protein-protein complexes, and native or near-native structures were assigned the top rank from a pool of docked poses in 26 of the tested cases. We also tested the ability of DockScore to discriminate likely dimer interactions that differ substantially within a homologous family, and we demonstrate that DockScore can distinguish the correct pose for all 10 recent CAPRI targets. The Sowdhamini Lab has made the algorithm publicly available as a web server: DOCKSCORE Server.
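
    The idea of ranking poses by a weighted combination of interface features can be sketched as below. This is not the published DockScore function: the feature names, the weights, the normalization to the range 0 to 1, and the treatment of short contacts as a penalty are all assumptions chosen for illustration.

```python
def rank_poses(poses, weights):
    """Rank docked poses by a weighted sum of interface features, best first."""
    def score(features):
        return sum(weights[name] * features[name] for name in weights)
    return sorted(poses, key=lambda pose: score(pose["features"]), reverse=True)

# HYPOTHETICAL weights; in a trained scheme these would be fitted on known complexes.
weights = {
    "area": 0.3,
    "conservation": 0.3,
    "clustering": 0.2,
    "hydrophobic": 0.2,
    "short_contacts": -0.5,  # ASSUMPTION: steric clashes are penalized
}

# HYPOTHETICAL feature values, each normalized to the range 0..1.
poses = [
    {"id": "pose_a", "features": {"area": 0.8, "conservation": 0.7, "clustering": 0.6,
                                  "hydrophobic": 0.5, "short_contacts": 0.1}},
    {"id": "pose_b", "features": {"area": 0.9, "conservation": 0.2, "clustering": 0.3,
                                  "hydrophobic": 0.4, "short_contacts": 0.6}},
    {"id": "pose_c", "features": {"area": 0.5, "conservation": 0.5, "clustering": 0.5,
                                  "hydrophobic": 0.5, "short_contacts": 0.0}},
]

ranked = rank_poses(poses, weights)
print([pose["id"] for pose in ranked])
```

    Keeping the weights in a plain dictionary makes it easy to refit them on a training set and reuse the same ranking code on a separate test set, as the abstract describes.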

    DOR: a Database of Olfactory Receptors from selected eukaryotic genomes. [Bioinf. & Biol. Insights (2014) 8: 147-158].


    Abstract: The database of olfactory receptors (DOR) is a repository that provides sequence and structural information on ORs of selected organisms (such as Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, and Homo sapiens). Users can download OR sequences, study predicted membrane topology, and obtain cross-genome sequence alignments and phylogeny, including three-dimensional (3D) structural models of 100 selected ORs and their predicted dimer interfaces. The database can be accessed online. Such a database should be helpful in designing point-mutation experiments to probe the possible dimerization modes of ORs, and even in understanding the evolutionary changes between different receptors.

    TM-MOTIF: An alignment viewer for transmembrane regions and motifs in G-Protein Coupled Receptors (GPCRs). [Bioinformation (2011) 7(5): 214-221].


    Abstract: Multiple sequence alignments become biologically meaningful only if conserved and functionally important residues and preserved secondary structural elements can be identified at equivalent positions. This is particularly important for transmembrane proteins like G-protein coupled receptors (GPCRs) with seven transmembrane helices. TM-MOTIF is a software package and an effective alignment viewer to identify and display conserved motifs and amino acid substitutions (AAS) at each position of the aligned set of homologous sequences of GPCRs. The key feature of the package is to display the predicted membrane topology for seven transmembrane helices in seven colours (VIBGYOR colouring scheme) and to map the identified motifs onto their respective helices or loop regions. It is an interactive package that lets the user submit a query or a pre-aligned set of GPCR sequences to align with a reference sequence, like rhodopsin, whose structure has been solved experimentally. It can also identify the nearest homologue from the inbuilt GPCR or olfactory receptor cluster dataset, in which the receptor type associated with each cluster is already known. The tool is available for download from the DOR homepage.
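
    Identifying conserved residues at equivalent alignment positions can be illustrated with a short sketch. It is not TM-MOTIF's own algorithm: the 0.8 threshold, the function name `conserved_positions`, and the aligned fragments are hypothetical.

```python
from collections import Counter

def conserved_positions(alignment, threshold=0.8):
    """Find alignment columns dominated by a single residue.

    alignment: equal-length aligned sequences, with '-' marking gaps.
    Returns (column index, residue, frequency) for each column whose most
    common symbol is a residue with frequency >= threshold.
    """
    n_seqs = len(alignment)
    conserved = []
    for col, column in enumerate(zip(*alignment)):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / n_seqs >= threshold:
            conserved.append((col, residue, count / n_seqs))
    return conserved

# HYPOTHETICAL pre-aligned fragments containing a DRY-like motif.
alignment = [
    "LAIDRYVA",
    "LSVDRYLA",
    "MAIERYIS",
    "LAADRY-A",
    "LTIDRYVA",
]

for col, residue, freq in conserved_positions(alignment):
    print(col, residue, freq)
```

    Runs of adjacent conserved columns, such as the three consecutive positions in this toy alignment, are natural candidates to report as motifs and to map onto a predicted helix or loop.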