  • Review Article
Methods for predicting bacterial protein subcellular localization

A Corrigendum to this article was published on 01 November 2006

Key Points

  • The prediction of a bacterial protein's subcellular localization can be of considerable aid to microbiological research. It can be used to infer potential functions for a protein, to either design or support the results of particular experimental approaches and, in the case of surface-exposed proteins, to quickly identify potential drug or vaccine targets in a given pathogen genome, or potential diagnostic/detection targets in pathogen or environmental isolates.

  • Bacterial proteins contain sequence features that either directly influence the targeting of a protein to a particular cellular compartment or else are characteristic of proteins found at a specific localization site. These features are encoded in the protein's amino-acid sequence and can be identified computationally.

  • By analyzing a protein for the presence or absence of one or more of these features and integrating the results, a prediction of which compartment a protein is likely to reside in can be generated.

  • Since the 1991 release of the first comprehensive, web-based bacterial protein localization prediction method, PSORT I, seven other such tools have been released. This review summarizes the techniques implemented by each tool, their benefits, pitfalls and predictive performance.

  • The review also describes alternative methods for localization prediction, including similarity searches against localization databases and the use of predictive tools designed to identify individual sequence features. The performance of these methods is compared with that of the seven broad-spectrum localization prediction tools.

  • PSORTb and Proteome Analyst are the most precise predictive methods currently available, with other methods complementing them when higher sensitivity (a larger number of predictions) is required.

  • The precision of certain localization prediction tools has now surpassed the precision of some high-throughput laboratory methods for localization determination. We can now reliably assign potential localization sites to the majority of proteins encoded in a genome.

Abstract

The computational prediction of the subcellular localization of bacterial proteins is an important step in genome annotation and in the search for novel vaccine or drug targets. Since the 1991 release of PSORT I, the first comprehensive algorithm to predict bacterial protein localization, many other localization prediction tools have been developed. These methods offer significant improvements in predictive performance over PSORT I, and the accuracy of some methods now rivals that of certain high-throughput laboratory methods for protein localization identification.

Figure 1: Protein localization in bacteria.

Author information

Corresponding author

Correspondence to Fiona S. L. Brinkman.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1

test description (PDF 99 kb)

Related links

DATABASES

Entrez Genome Project

Bacillus subtilis

Pseudomonas aeruginosa

FURTHER INFORMATION

Fiona Brinkman's homepage

PSORT portal

BLAST

BOMP

CELLO

ConPredII

LipoP

LOCtree

PA-GOSUB

P-CLASSIFIER

Phobius

Prof-TMB

Proteome Analyst

PSLpred

PSORT I

PSORTb

SignalP

SubLoc

SwissProt

TatP

TMB-Hunt

TMHMM

Glossary

Type I secretion system

A protein export system spanning the bacterial cell envelope that transports newly synthesized proteins directly from the cytoplasm to the extracellular space.

Type II secretion system

A two-stage protein export system spanning both the bacterial cytoplasmic and outer membranes. Also known as the general secretory pathway.

Sec-dependent pathway

One of several possible first stages of the general secretory pathway protein export system in the cytoplasmic membrane that transports newly synthesized proteins into or across the cytoplasmic membrane.

SRP-dependent pathway

One of several possible first stages of the general secretory pathway protein export system that inserts membrane proteins into the cytoplasmic membrane.

Twin arginine translocation pathway

(TAT pathway). One of several possible first stages of the general secretory pathway protein export system in the cytoplasmic membrane that transports folded proteins across the cytoplasmic membrane.

Type III secretion system

A system that is used by many pathogenic bacteria to inject virulence proteins directly into host cells through needle-like structures. Ancestrally related to the system used by bacteria to export flagellum protein subunits.

Type IV secretion system

A syringe-like proteinaceous machinery that can transport bacterial protein or DNA effector molecules directly into a eukaryotic cell.

Type V secretion system

A system that involves autotransporter proteins, which are translocated across the outer membrane of Gram-negative bacteria through a transmembrane pore that is formed by a self-encoded β-barrel structure.

Signal peptide

A short sequence of mainly hydrophobic amino acids at the N terminus of some secreted proteins that directs the nascent protein to the first step of the general secretory pathway.
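The hydrophobic character mentioned in this definition can be illustrated with a sliding-window hydropathy profile. The following is a minimal sketch only, assuming the Kyte and Doolittle hydropathy scale and an arbitrary window of seven residues; it is not the algorithm used by SignalP or any other tool listed here, and the example sequence is hypothetical.

```python
# Kyte and Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Return the mean hydropathy of every window along the sequence.

    The window size is an illustrative assumption; sequences shorter
    than the window give an empty profile.
    """
    sequence = sequence.upper()
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


# Hypothetical sequence: a hydrophobic stretch followed by acidic residues.
example = "MKKLLLLLLLLAADDDEEE"
profile = hydropathy_profile(example)
print(max(profile), min(profile))
```

A strongly positive window near the N terminus is consistent with, but does not prove, the presence of a signal peptide; dedicated predictors also model the charged n-region and the cleavage site.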

k-nearest-neighbour classification technique

A method for classifying an unknown object based on its proximity in multidimensional space to neighbouring objects of known class.
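As a rough illustration of this definition, the sketch below classifies a query point by majority vote among its k closest labelled neighbours, using Euclidean distance. The two-dimensional feature vectors, the class labels and the choice of k = 3 are hypothetical and are not taken from any of the tools reviewed.

```python
import math
from collections import Counter


def knn_classify(query, labelled_points, k=3):
    """Return the majority class among the k labelled points nearest to query."""
    nearest = sorted(
        labelled_points,
        key=lambda item: math.dist(query, item[0]),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy feature vectors (hypothetical values, not real protein data).
training = [
    ((0.10, 0.20), "cytoplasmic"),
    ((0.20, 0.10), "cytoplasmic"),
    ((0.15, 0.25), "cytoplasmic"),
    ((0.90, 0.80), "outer membrane"),
    ((0.80, 0.90), "outer membrane"),
]

print(knn_classify((0.2, 0.2), training, k=3))
```

In practice the feature vectors would be derived from the protein sequence (for example, amino-acid composition), and k would be chosen by cross-validation.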

HMMTOP

(Hidden Markov model for topology prediction). An automatic server for predicting transmembrane helices and the topology of proteins. HMMTOP is based on the principle that the topology of transmembrane proteins is determined by the maximum divergence of the amino-acid composition of sequence segments.

Bayesian network

A statistical approach (named after Bayes' Theorem) for inferring the likelihood of an event given a series of prior events with known probabilities.

BLAST

(Basic local alignment search tool). A sequence comparison algorithm, optimized for speed, used to search sequence databases for regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.

PA-GOSUB

(Proteome Analyst-Gene Ontology Molecular Function and Subcellular Localization). A publicly available, web-based, searchable and downloadable database that contains the sequences, predicted molecular functions and predicted subcellular localizations of over 107,000 proteins from ten model organisms.

Matthews Correlation Coefficient

(MCC). A measure of predictive performance that incorporates both precision and recall into a single value between −1 and +1.
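For reference, the MCC is conventionally computed from the four cells of a binary confusion matrix: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The sketch below uses that standard definition alongside precision and recall; returning 0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math


def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true positives that are recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for one localization site.
print(precision(90, 10), recall(90, 30), matthews_corrcoef(90, 10, 870, 30))
```

A value of +1 indicates perfect prediction, 0 indicates performance no better than chance, and −1 indicates total disagreement between prediction and observation.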

PSI-BLAST

(Position-specific iterative BLAST). A feature of BLAST 2.0 in which a profile (or position-specific scoring matrix, PSSM) is constructed from a multiple alignment of the highest scoring hits in an initial BLAST search. The PSSM is generated by calculating position-specific scores for each position in the alignment. Highly conserved positions receive high scores and weakly conserved positions receive scores near zero. The profile is then used to do a second BLAST search and the results of each 'iteration' are used to refine the profile. This iterative searching strategy results in increased sensitivity.
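The idea of position-specific scores can be sketched with a simplified log-odds matrix built from an ungapped alignment. This is an illustration only: it assumes a uniform background frequency and a flat pseudocount, whereas PSI-BLAST itself uses sequence weighting and more elaborate pseudocount and background models. The alignment is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column.

    Assumes equal-length, ungapped sequences and a uniform background
    frequency of 1/20 (both simplifying assumptions).
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        pssm.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


# Hypothetical alignment: column 1 is fully conserved, column 3 is variable.
alignment = ["MKV", "MKL", "MRI", "MKA"]
pssm = build_pssm(alignment)
print(round(pssm[0]["M"], 2), round(pssm[2]["V"], 2))
```

In this toy matrix the conserved methionine in the first column scores higher than any residue in the variable third column, and residues never observed in a column receive negative scores.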

Expect value

(E value). This describes the likelihood that a sequence with a similar score will occur in the database by chance. The smaller the E value, the more significant the alignment. For example, if the first alignment has a low E value of 10⁻¹¹⁷, this indicates that there is a significant sequence alignment and that a sequence with a similar score is unlikely to occur simply by chance.
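For orientation, BLAST expect values follow the Karlin-Altschul relationship E = K·m·n·e^(−λS), where S is the raw alignment score, m and n are the query and database lengths, and K and λ depend on the scoring system. The sketch below evaluates that formula and its equivalent bit-score form. The parameter values are illustrative assumptions, and real BLAST additionally applies effective-length corrections that are omitted here.

```python
import math

# Illustrative statistical parameters (assumed; they depend on the
# scoring matrix and gap penalties in a real search).
LAMBDA = 0.267
K = 0.041


def expect_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value_from_bits(bits, query_len, db_len):
    """Equivalent form: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical search: 300-residue query against 10 million database residues.
print(expect_value(100, 300, 10_000_000))
print(expect_value_from_bits(bit_score(100), 300, 10_000_000))
```

Because E scales with the size of the search space, the same raw score yields a larger E value against a larger database.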

FASTA

A commonly used sequence format in bioinformatics starting with a '>' character and optional description, followed by a DNA or protein sequence.
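The format described above is simple enough to parse in a few lines. The following is a minimal sketch that assumes well-formed input and ignores blank lines; the record names and sequences are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record input.
example = """>protein_1 hypothetical outer membrane protein
MKKTAIAIAV
ALAGFATVAQA
>protein_2
MSTNPKPQRK
"""

for description, sequence in parse_fasta(example):
    print(description, len(sequence))
```

For production use, an established parser such as the one in Biopython is generally a safer choice than a hand-written one.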

BLASTp

This is used to compare an amino-acid query sequence with other protein sequences stored in databases.

About this article

Cite this article

Gardy, J., Brinkman, F. Methods for predicting bacterial protein subcellular localization. Nat Rev Microbiol 4, 741–751 (2006). https://doi.org/10.1038/nrmicro1494

  • DOI: https://doi.org/10.1038/nrmicro1494
