Ligand-based models for the isoform specificity of cytochrome P450 3A4, 2D6, and 2C9 substrates

J Chem Inf Model. 2007 Jul-Aug;47(4):1688-701. doi: 10.1021/ci700010t. Epub 2007 Jul 3.

Abstract

A data set of 379 drugs and drug analogs that are metabolized by the human cytochrome P450 (CYP) isoforms 3A4, 2D6, or 2C9 was studied. A series of descriptor sets directly calculable from the constitution of these drugs was systematically investigated for their power to classify a compound according to the CYP isoform that metabolizes it. In a four-step build-up process, a total of 303 different descriptor components were eventually investigated for the 146 compounds of a training set using various model-building methods, such as multinomial logistic regression, decision trees, and support vector machines (SVM). Automatic variable selection algorithms were used to reduce the number of descriptors. A comprehensive scheme of cross-validation (CV) experiments was applied to assess the robustness and reliability of the four models developed. In addition, the predictive power of the four models presented in this paper was assessed by predicting an external validation data set of 233 compounds. The best model has a leave-one-out (LOO) cross-validated predictivity of 89% and gives 83% correct predictions for the external validation data set. For our favored model, we showed that the way a data set is split into training and test sets strongly influences the predictivity.
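
The two evaluation schemes named in the abstract, leave-one-out cross-validation on a training set and prediction of a separate external set, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic two-descriptor points rather than the 303 descriptors of the study, and a simple nearest-centroid classifier stands in for the logistic regression, decision tree, and SVM models of the paper. The function names (`make_synthetic`, `fit_centroids`, `loo_accuracy`, `external_accuracy`) are hypothetical.

```python
import random

ISOFORMS = ["3A4", "2D6", "2C9"]


def make_synthetic(n_per_class, seed):
    """Generate toy compounds with two made-up descriptors (NOT real CYP data)."""
    rng = random.Random(seed)
    centers = {"3A4": (0.0, 0.0), "2D6": (4.0, 0.0), "2C9": (0.0, 4.0)}
    data = []
    for label in ISOFORMS:
        cx, cy = centers[label]
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data


def fit_centroids(train):
    """Compute the mean descriptor vector of each class (stand-in model)."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x, y = point
    return min(
        centroids,
        key=lambda lab: (x - centroids[lab][0]) ** 2 + (y - centroids[lab][1]) ** 2,
    )


def loo_accuracy(data):
    """Leave-one-out CV: refit on all compounds but one, predict the held-out one."""
    hits = 0
    for i, (point, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(fit_centroids(train), point) == label
    return hits / len(data)


def external_accuracy(train, test):
    """Fit once on the training set and score on compounds never used for fitting."""
    centroids = fit_centroids(train)
    return sum(predict(centroids, p) == lab for p, lab in test) / len(test)


if __name__ == "__main__":
    train = make_synthetic(20, seed=0)     # toy "training set"
    external = make_synthetic(30, seed=1)  # toy "external validation set"
    print("LOO accuracy:     ", loo_accuracy(train))
    print("External accuracy:", external_accuracy(train, external))
```

Rerunning the sketch with different seeds or a different assignment of compounds to `train` and `external` changes both accuracies, which is a toy analogue of the split dependence the abstract reports.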

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Cytochrome P-450 Enzyme System / metabolism*
  • Isoenzymes / metabolism*
  • Ligands
  • Models, Molecular*
  • Substrate Specificity

Substances

  • Isoenzymes
  • Ligands
  • Cytochrome P-450 Enzyme System