PREvaIL, an integrative approach for inferring catalytic residues using
sequence, structural and network features in a machine learning framework

This web page provides the prediction model and local stand-alone executable associated with the paper titled above.

   The datasets used in this study were originally prepared by the Kurgan group and can be downloaded at

Prediction models

   The prediction model was developed using the random forest (RF) algorithm, as implemented in the randomForest R package.
   It can be downloaded from here:


   1. The source code of the PREvaIL model is written in R and Perl. To run PREvaIL, please make sure that both R and Perl are installed on your computer.

   2. Third-party software used by PREvaIL to extract the input features for the RF model:

            BioPython for calculating protein structural features;
            DSSP for protein secondary structure;
            PSI-BLAST for generating the position-specific scoring matrix (PSSM);
            NACCESS for calculating solvent accessibility.

   3. Instructions for performing prediction of catalytic residues using PREvaIL:

            1) After unzipping the file, you will find six subfolders in the PREvaIL folder. The `model` folder contains the RF model of PREvaIL,
   whereas the other five subfolders contain the extracted feature results. Please refer to the readme file in the PREvaIL folder for more details.

            2) Execute the following command to predict the catalytic residues of the example:

                      perl 1CRK A

            where `1CRK` is the PDB ID and `A` is the PDB chain ID.

            When the prediction finishes, the results are saved in the `prediction_results_1CRK_A.txt` file.

            The columns of the `prediction_results_1CRK_A.txt` file are described below.
            For each line:
            1. The first column denotes the PDB ID;
            2. The second column denotes the PDB chain;
            3. The third column denotes residue position;
            4. The fourth column denotes the residue name;
            5. The fifth column denotes the predicted outcome for the residue. "P" denotes a positive prediction, i.e., the residue is predicted as catalytic,
while "N" denotes a negative prediction, i.e., the residue is predicted as noncatalytic.
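
            The five-column layout described above can be read with a short script. The following Python sketch is only an illustration and is not part of the PREvaIL package: it assumes that the columns are separated by whitespace (the delimiter is not stated here), and the names `ResiduePrediction`, `parse_predictions` and `catalytic_residues` are hypothetical. The sample rows are made up to show the layout and are not real PREvaIL output.

```python
from typing import List, NamedTuple


class ResiduePrediction(NamedTuple):
    """One line of a PREvaIL results file (hypothetical helper type)."""
    pdb_id: str    # column 1: PDB ID
    chain: str     # column 2: PDB chain
    position: str  # column 3: residue position, kept as text
    residue: str   # column 4: residue name
    label: str     # column 5: "P" (catalytic) or "N" (noncatalytic)


def parse_predictions(text: str) -> List[ResiduePrediction]:
    """Parse results text, ASSUMING whitespace-separated columns."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not match the layout.
        if len(fields) != 5 or fields[4] not in ("P", "N"):
            continue
        records.append(ResiduePrediction(*fields))
    return records


def catalytic_residues(records: List[ResiduePrediction]) -> List[ResiduePrediction]:
    """Keep only the residues predicted as catalytic ("P")."""
    return [r for r in records if r.label == "P"]


# Made-up rows for illustration only; not real predictions.
SAMPLE = """\
XXXX A 10 HIS N
XXXX A 11 CYS P
XXXX A 12 GLY N
"""

if __name__ == "__main__":
    for r in catalytic_residues(parse_predictions(SAMPLE)):
        print(r.pdb_id, r.chain, r.position, r.residue)
```

            To use the sketch on a real run, read the contents of `prediction_results_1CRK_A.txt` into a string and pass it to `parse_predictions`. If the file turns out to use a different delimiter, the `line.split()` call needs to be adjusted accordingly.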


    If you find PREvaIL useful in your research, please cite: "PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework". Journal of Theoretical Biology. 2018 Apr 14;443:125-137.


Copyright © 2012-2018. Monash Bioinformatics Platform, School of Biomedical Sciences, Faculty of Medicine, Faculty of Information Technology, Monash University, Australia