Prediction algorithms for natural and synthetic antimicrobial peptides are incorporated in the database. These are based on Support Vector Machines (SVM), Random Forests (RF) and Artificial Neural Networks (ANN). Users can select the dataset (natural or synthetic) and the algorithm to use for prediction.
One or more peptide sequences in FASTA format can be pasted or uploaded for prediction. Clicking “Example” loads a sample peptide sequence. The results for RF, ANN and SVM are explained below:
AMP: The sequence is predicted to be antimicrobial.
NAMP: The sequence is predicted to be not antimicrobial.
RF, SVM and ANN each give a probability score (0 to 1) for the prediction. The higher the probability, the more likely the peptide is to be antimicrobial.
The prediction algorithm provides three options to the users:

  • Users can submit an entire protein to predict its antimicrobial activity.
  • Users can scan a protein sequence for antimicrobial regions within it.
  • Users can rationally design antimicrobial peptides by generating all possible single-residue mutations and selecting the sequences with the highest AMP probability.
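
The second and third options both amount to generating candidate sequences that are then scored. The sketch below only generates the candidates; it does not reproduce the RF, SVM or ANN models. The function names, the window length and the example peptide are assumptions made for illustration and are not taken from CAMPR4.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sliding_windows(protein, window):
    """Yield (start, fragment) for every fragment of length `window`."""
    for start in range(len(protein) - window + 1):
        yield start, protein[start:start + window]


def single_residue_mutants(peptide):
    """Return every sequence that differs from `peptide` at exactly one position."""
    mutants = []
    for i, original in enumerate(peptide):
        for residue in AMINO_ACIDS:
            if residue != original:
                mutants.append(peptide[:i] + residue + peptide[i + 1:])
    return mutants


peptide = "GLFDIVKKVV"  # hypothetical 10-residue peptide
mutants = single_residue_mutants(peptide)
print(len(mutants))  # 19 substitutions at each of 10 positions -> 190

# Hypothetical window length of 5 residues.
for start, fragment in sliding_windows(peptide, 5):
    print(start, fragment)
```

Each fragment or mutant would then be submitted for prediction and ranked by its AMP probability.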


Simple search in CAMPR4 accepts keywords such as "brevinin" or multi-word strings such as "human defensin". Users can restrict the search to a particular field descriptor. Boolean operators are available through the ‘Advanced search’ option. All searches are case-insensitive. The field descriptors are listed and described below:




Protein sequence in single-letter amino acid code.


Length of the antimicrobial peptide, given as a number.
E.g. 29


Scientific name of the source organism of the antimicrobial peptide. 
E.g. Phyllomedusa oreades


E.g. antibacterial, antifungal, antiviral, antimicrobial, anticancerous


E.g. E. coli


E.g. 12379643

GenInfo Identifier

GenInfo Identifier of NCBI. E.g. 41016983


E.g. Dermaseptin-01


E.g. P83637


E.g. 2JQ0


E.g. Dermaseptin


E.g. MIC=30


Secondary structure



Helical residues more than 80%


Beta residues more than 80%


Turn + bend residues more than 80%

Majorly Helical

( Helical residues > 60% and beta residues < 5% ) or ( helical residues > 50% and beta residues < 10% )

Majorly Strand

Beta residues > 30% and helical residues < 5%

Majorly Coil

Turn + bend residues > 50% and helical residues < 50% and beta residues < 30%


Helical residues < 50% and beta residues < 30% and turn+bend residues < 50%
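
The thresholds above can be read as a rule cascade. The following is a minimal sketch, assuming the rules are checked in the order they are listed (the source does not state an order) and that the inputs are percentages between 0 and 100. Classes whose label is not shown above are returned as a description of their criterion rather than under an invented name; the function name is hypothetical.

```python
def classify_secondary_structure(helix, beta, turn_bend):
    """Map residue percentages (0-100) to a class using the listed thresholds.

    Assumption: rules are evaluated top to bottom, first match wins.
    """
    if helix > 80:
        return "helical residues > 80%"
    if beta > 80:
        return "beta residues > 80%"
    if turn_bend > 80:
        return "turn + bend residues > 80%"
    if (helix > 60 and beta < 5) or (helix > 50 and beta < 10):
        return "Majorly Helical"
    if beta > 30 and helix < 5:
        return "Majorly Strand"
    if turn_bend > 50 and helix < 50 and beta < 30:
        return "Majorly Coil"
    if helix < 50 and beta < 30 and turn_bend < 50:
        return "helical < 50%, beta < 30%, turn + bend < 50%"
    # The listed rules do not cover every combination of percentages.
    return "no rule matched"


print(classify_secondary_structure(65, 2, 20))   # Majorly Helical
print(classify_secondary_structure(10, 10, 60))  # Majorly Coil
```

Some combinations, for example 55% helical with 20% beta residues, satisfy none of the listed criteria, which is why the sketch has an explicit fallback.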


Users can browse through the different AMP families. The page contains a table providing information about the AMP family and signatures captured using patterns or HMMs.

H: symbol H represents HMMs.

P: symbol P represents Patterns.

Description of Family:  This information has been obtained from Pfam, InterPro and/or published literature.

Signature IDs: A Signature ID consists of the family name followed by H (HMM) or P (Pattern). An integer directly after H or P gives the length of the sequences used to create the family signature; if no integer is present, the signature was created using all the sequences of the family. The integer after the underscore gives the number of sequences used to create the signature.

For example:
AureinH_21 is an HMM for the Aurein family created using 21 sequences.
AureinP16_9 is a pattern for the Aurein family derived from 9 input sequences that are 16 residues long.
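
The Signature ID convention can be unpacked with a regular expression. This is a sketch based only on the description and the two examples above; the function name and the returned field names are hypothetical, and it assumes family names contain no underscore.

```python
import re

# family name, then H or P, optional sequence length, underscore, sequence count
SIGNATURE_ID = re.compile(
    r"^(?P<family>.+?)(?P<kind>[HP])(?P<length>\d*)_(?P<count>\d+)$"
)


def parse_signature_id(signature_id):
    """Split a Signature ID such as 'AureinP16_9' into its parts."""
    match = SIGNATURE_ID.match(signature_id)
    if match is None:
        raise ValueError(f"not a recognised Signature ID: {signature_id!r}")
    return {
        "family": match["family"],
        "type": "HMM" if match["kind"] == "H" else "Pattern",
        # None means no length suffix: all sequences of the family were used
        "sequence_length": int(match["length"]) if match["length"] else None,
        "sequence_count": int(match["count"]),
    }


print(parse_signature_id("AureinH_21"))
print(parse_signature_id("AureinP16_9"))
```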


BLAST in CAMPR4 lets users choose which dataset to search: the entire database, or the sequence, structure, patent, experimentally validated, predicted, or signature-based predicted datasets.

  • Altschul, S. F. et al. (1997), Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25:3389-3402.

VAST is an algorithm that identifies similar three-dimensional protein structures based on geometric criteria, and can also identify distant homologs. The similar 3D structures identified by VAST are referred to as “structure neighbours”. Users can input a PDB or MMDB ID of interest.

  • Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol. 1996 Jun; 6(3): 377-85.
  • Madej T, Marchler-Bauer A, Lanczycki C, Zhang D, Bryant SH. Biological Assembly Comparison with VAST. Methods Mol Biol. 2020;2112:175-186. doi: 10.1007/978-1-0716-0270-6_13. [PubMed PMID: 32006286].

Clustal Omega
The Clustal Omega tool can be used for multiple sequence alignment. It uses seeded guide trees and HMM profile-profile techniques to generate a progressive alignment of three or more biological sequences. Users can paste their sequences or upload a text file containing sequences in FASTA format.

  • Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W., Lopez R., McWilliam H., Remmert M., Söding J., Thompson J.D., Higgins D.G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011 Oct 11;7:539. doi: 10.1038/msb.2011.75.
  • Goujon M., McWilliam H., Li W., Valentin F., Squizzato S., Paern J., Lopez R. (2010) A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 2010 Jul;38(Web Server issue):W695-9. doi: 10.1093/nar/gkq313. Epub 2010 May 3.
  • McWilliam H., Li W., Uludag M., Squizzato S., Park Y.M., Buso N., Cowley A.P., Lopez R.(2013) Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 2013 Jul;41(Web Server issue):W597-600. doi: 10.1093/nar/gkt376. Epub 2013 May 13.

The Pratt tool searches for patterns conserved in a set of protein sequences. Users can input their sequences in either FASTA or Swiss-Prot format. A multiple sequence alignment in FASTA format can also be used as input. Users can specify the minimum number of sequences a pattern must match in order to be reported.

  • Jonassen I., Collins J.F., Higgins D.G.(1995) Finding flexible patterns in unaligned protein sequences. Protein Sci. 1995 Aug; 4(8):1587-95.
  • Jonassen I. (1997) Efficient discovery of conserved patterns using a pattern graph. Comput Appl Biosci. 1997 Oct;13(5):509-22.

The ScanProsite tool can be used to scan protein sequences against the PROSITE collection of motifs, or to scan user-defined motifs against one or more protein sequences.

  • de Castro E., Sigrist C.J., Gattiker A., Bulliard V., Langendijk-Genevaux P.S., Gasteiger E., Bairoch A., Hulo N. (2006) ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W362-5.

Pattern Hit Initiated BLAST (PHI-BLAST) uses a regular-expression pattern to search a protein sequence database. It finds sequences that contain the pattern and are homologous to the query protein sequence. Users must provide a query protein sequence as well as the pattern associated with that sequence.

  • Zhang Z., Schäffer A.A., Miller W., Madden T.L., Lipman D.J., Koonin E.V., Altschul S.F. (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 1998 Sep 1;26(17):3986-90.

jackhmmer: This tool lets users iteratively search a protein sequence database using a sequence, HMM or multiple sequence alignment as the query.

  • Finn R.D., Clements J., Eddy S.R. HMMER web server: interactive sequence similarity searching. (2011) Nucleic Acids Res. 2011 Jul;39(Web Server issue):W29-37. doi: 10.1093/nar/gkr367. Epub 2011 May 18.
  • Eddy S.R.(1998) Profile hidden Markov models. Bioinformatics. 1998;14(9):755-63. Review.

Sequence formats:
Swiss-Prot: The first line starts with 'ID' followed by the name of the sequence. It is followed by an arbitrary number of annotation lines, then a line starting with 'SQ', then the sequence itself (on one or several lines), and finally a line starting with '//', which marks the end of the entry.
For example: 
ID   DB119_HUMAN             Reviewed;          84 AA.
AC   Q8N690; Q5GRG1; Q5JWP1; Q5TH42; Q8N689;
DT   06-DEC-2002, integrated into UniProtKB/Swiss-Prot.
DT   02-FEB-2004, sequence version 2.
DT   04-FEB-2015, entry version 95.
DE   RecName: Full=Beta-defensin 119;
DE   AltName: Full=Beta-defensin 120;
DE   AltName: Full=Beta-defensin 19;
SQ   SEQUENCE   84 AA;  9822 MW;  0C2828612A674AB1 CRC64;
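
The line structure described above (an 'ID' line, an 'SQ' line, sequence lines, then '//') is enough to pull the name and sequence out of an entry. The sketch below is a simplified reader, not a full Swiss-Prot parser; the function name is hypothetical, and the record used to exercise it is an invented placeholder, not a real database entry.

```python
def parse_swissprot(text):
    """Return (entry_name, sequence) from one Swiss-Prot flat-file entry."""
    name = None
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if in_sequence:
            if line.startswith("//"):
                break  # end of entry
            # sequence lines contain residues separated by spaces
            sequence_parts.append("".join(line.split()))
        elif line.startswith("ID"):
            name = line.split()[1]
        elif line.startswith("SQ"):
            in_sequence = True
    return name, "".join(sequence_parts)


# Hypothetical entry used only to show the layout.
record = """ID   EXAMPLE_HUMAN           Reviewed;          12 AA.
SQ   SEQUENCE   12 AA;
     GLFDIVKKVV GA
//
"""
print(parse_swissprot(record))  # ('EXAMPLE_HUMAN', 'GLFDIVKKVVGA')
```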

FASTA: A FASTA record begins with a greater-than ('>') symbol followed by a single-line description. The sequence data starts on the next line.
For example:
>sp|P80391|AMP1_MELGA Antimicrobial peptide THP1 OS=Meleagris gallopavo PE=1 SV=2
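
A file with several FASTA records can be split on the '>' description lines. The following is a minimal sketch of such a reader; the function name is hypothetical and the sequences in the usage example are invented placeholders rather than real entries.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:]
            chunks = []
        elif description is not None:
            chunks.append(line)  # sequence may span several lines
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Hypothetical input: two records, the first wrapped over two lines.
example = """>peptide_1 hypothetical example
GLFDIV
KKVV
>peptide_2 hypothetical example
RRWWRF
"""
for description, sequence in parse_fasta(example):
    print(description, len(sequence))
```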

Stockholm Format: The Stockholm format starts with a line that contains the format and version identifier, currently “# STOCKHOLM 1.0”. The alignment is written as the sequence name followed by the aligned sequence, with each sequence on a separate line, and “//” marks the end of the alignment. The Stockholm format can also contain mark-up lines, which carry features such as accession number, description and organism.
For example:
# STOCKHOLM 1.0
Sequence_1   --PGLGFY--
Sequence_2   ---RKKWFW-
Sequence_3   ----FRWWHR
Sequence_4   ----RRWWRF
//
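
An alignment in this layout can be read line by line. The sketch below is a simplified reader, assuming the “# STOCKHOLM 1.0” header and “//” terminator described above are present and that mark-up lines start with '#'; the function name is hypothetical, and mark-up lines are skipped rather than interpreted.

```python
def parse_stockholm(text):
    """Return {sequence_name: aligned_sequence} from one Stockholm alignment."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("missing '# STOCKHOLM 1.0' header line")
    alignment = {}
    for line in lines[1:]:
        if line.startswith("//"):
            break  # end of alignment
        if line.startswith("#"):
            continue  # mark-up line, ignored in this sketch
        name, aligned = line.split(None, 1)
        # append, so that alignments split into several blocks are joined
        alignment[name] = alignment.get(name, "") + aligned.strip()
    return alignment


example = """# STOCKHOLM 1.0
Sequence_1   --PGLGFY--
Sequence_2   ---RKKWFW-
Sequence_3   ----FRWWHR
Sequence_4   ----RRWWRF
//
"""
for name, aligned in parse_stockholm(example).items():
    print(name, aligned)
```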

© 2022, Biomedical Informatics Centre, ICMR-NIRRCH
ICMR-National Institute for Research in Reproductive and Child Health, Jehangir Merwanji Street, Parel, Mumbai-400012
Maharashtra, India