Machine learning approaches for predicting compounds that interact with therapeutic and ADMET related proteins
INTRODUCTION
Modern drug discovery efforts have focused primarily on searching for and optimizing agents that interact with a specific therapeutic target, possess desirable ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties, and exhibit minimal adverse drug reactions.1, 2, 3 Methods for predicting these pharmacodynamic and ADMET properties, particularly in early discovery stages, are highly useful for facilitating drug development and drug safety evaluation.1, …
DRUG DISCOVERY AND PREDICTION OF COMPOUNDS THAT INTERACT WITH THERAPEUTIC AND ADMET RELATED PROTEINS
Most drugs exert their therapeutic actions by inhibiting, antagonizing, blocking, agonizing, or activating a specific therapeutic target protein.6 For instance, many antidepressant drugs target proteins that modulate neurotransmission, particularly monoamine neurotransmission; these drugs include 5-HT reuptake inhibitors, adenosine receptor A agonists, alpha 2 blockers, CRF antagonists, dopamine D antagonists, dopamine reuptake inhibitors, HT agonists, HT antagonists, MAO inhibitors, and norepinephrine reuptake …
MOLECULAR DESCRIPTORS FOR REPRESENTING COMPOUNDS
Molecular descriptors represent the structural and physicochemical properties of compounds and are computed from their 1D, 2D, or 3D structures. The most widely used programs for deriving molecular descriptors are DRAGON,67 Molconn‐Z,68 JOELib,69 and the Xue descriptor set.25 Web servers such as MODEL (http://jing.cz3.nus.edu.sg/cgi‐bin/model/model.cgi) have also been developed to facilitate the computation of molecular descriptors. Over 3000 molecular descriptors can be derived from these …
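Programs such as DRAGON derive thousands of descriptors from full 2D or 3D structures. As a minimal sketch of the simplest class only (constitutional, or 1D, descriptors), the following computes a few atom counts and the molecular weight directly from a molecular formula. The function names, the small atomic-mass table, and the restriction to bracket-free formulas are assumptions made for illustration; this is not the interface of DRAGON, Molconn‐Z, JOELib, or MODEL.

```python
import re

# Average atomic masses (g/mol) for a small, illustrative subset of elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "S": 32.06, "Cl": 35.45}


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple molecular formula such as 'C9H8O4' (no brackets)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def constitutional_descriptors(formula: str) -> dict:
    """Compute a handful of 1D (constitutional) descriptors from a formula."""
    counts = parse_formula(formula)
    n_atoms = sum(counts.values())
    n_heavy = n_atoms - counts.get("H", 0)  # heavy atoms = all non-hydrogen atoms
    mw = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    n_hetero = sum(n for el, n in counts.items() if el not in ("C", "H"))
    return {
        "molecular_weight": round(mw, 3),
        "n_atoms": n_atoms,
        "n_heavy_atoms": n_heavy,
        "n_heteroatoms": n_hetero,
        "heteroatom_fraction": round(n_hetero / n_heavy, 3) if n_heavy else 0.0,
    }


print(constitutional_descriptors("C9H8O4"))  # aspirin
```

Real descriptor programs work from connection tables or 3D coordinates rather than formulas, which is what allows them to capture topology, shape, and electronic properties in addition to simple counts.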
COMMONLY USED MACHINE LEARNING METHODS
Several machine learning methods have been widely used to classify compounds according to properties of pharmaceutical relevance. These include logistic regression (LR), linear discriminant analysis (LDA), k‐nearest neighbor (kNN), binary kernel discrimination (BKD), decision tree (DT), artificial neural network (ANN), probabilistic neural network (PNN), and support vector machine (SVM). Websites offering freely downloadable code for some of these methods are given in Table 1.
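Of the methods listed, kNN is the simplest to state: a compound is assigned the majority class of its k nearest training compounds in descriptor space. The following is a minimal sketch, assuming Euclidean distance over already-computed descriptor vectors; the toy descriptor values and the "active"/"inactive" labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training compounds."""
    # Rank training compounds by their distance to the query in descriptor space.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    # Count labels among the k nearest; on a tie, Counter.most_common keeps the
    # label encountered first, i.e. the one held by the nearer neighbour.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],   # hypothetical actives
     [0.90, 0.80], [0.80, 0.90], [0.85, 0.95]]   # hypothetical inactives
y = ["active"] * 3 + ["inactive"] * 3
print(knn_predict(X, y, [0.2, 0.2]))  # prints "active"
```

In practice descriptors are usually scaled to comparable ranges first, since otherwise a descriptor with large numeric values dominates the distance.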
METHODS FOR TRAINING, TESTING AND ESTIMATING GENERALIZATION CAPABILITIES OF MACHINE LEARNING METHODS
Several validation methods based on a “re‐sampling” strategy have been used for training, testing, and estimating the generalization error of an ML model.94, 95 The commonly used methods include N‐fold cross‐validation, leave‐one‐out, leave‐v‐out, jack‐knifing, and bootstrapping. In N‐fold cross‐validation, compounds are randomly divided into N subsets of approximately equal size. N − 1 subsets are used as a training set for developing an ML model, and the remaining one is used as a …
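The N‐fold procedure can be sketched in a few lines: shuffle the compound indices, split them into N near-equal folds, and let each fold serve once as the held-out test set while the other N − 1 folds form the training set. This is a minimal sketch, assuming the model is supplied as a hypothetical `fit_predict(train_X, train_y, test_X)` callback and that accuracy is pooled over all held-out predictions; the toy 1-nearest-neighbour model and data are for illustration only.

```python
import random


def n_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs for N-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random division into subsets
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        # The other N - 1 folds together form the training set.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


def cross_validate(X, y, fit_predict, n_folds=5, seed=0):
    """Return the accuracy pooled over all N held-out folds."""
    correct = 0
    for train_idx, test_idx in n_fold_indices(len(X), n_folds, seed):
        preds = fit_predict([X[j] for j in train_idx], [y[j] for j in train_idx],
                            [X[j] for j in test_idx])
        correct += sum(p == y[j] for p, j in zip(preds, test_idx))
    return correct / len(X)


def one_nn(train_X, train_y, test_X):
    """Toy 1-nearest-neighbour model used as the fit_predict callback."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(zip(train_X, train_y), key=lambda t: sq_dist(t[0], x))[1] for x in test_X]


X = [[i * 0.01, 0.0] for i in range(5)] + [[1.0 + i * 0.01, 1.0] for i in range(5)]
y = [0] * 5 + [1] * 5
print(cross_validate(X, y, one_nn, n_folds=5))
print(cross_validate(X, y, one_nn, n_folds=len(X)))  # leave-one-out
```

Setting N equal to the number of compounds turns the same code into leave‐one‐out, which is why the two are often described together.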
SELECTION OF MOLECULAR DESCRIPTORS BY FEATURE SELECTION METHODS
Not all available molecular descriptors are needed to represent the features of a particular class of compounds. The descriptors most appropriate for representing compounds with a particular property can be selected either by intuition, as in QSAR and QSPR studies,11, 13, 14 or by feature selection methods. Commonly used feature selection methods include genetic algorithm‐based approaches,97 recursive feature elimination (RFE),98 and simulated annealing‐based approaches.99 Some …
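The idea behind RFE is to train a model, rank the descriptors by the magnitude of their weights, discard the weakest one, and retrain on the remainder until the desired number is left. RFE is often paired with a linear SVM; to keep this sketch dependency-free, a small logistic-regression model trained by plain gradient descent is swapped in as the weight-producing model. The learning rate, epoch count, and toy data (one informative descriptor `f0`, two uninformative ones) are assumptions for illustration.

```python
import math


def standardize(X):
    """Scale each descriptor column to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient-descent logistic regression; returns the feature weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def rfe(X, y, names, n_keep):
    """Recursive feature elimination: refit, drop the smallest |weight|, repeat."""
    Xs = standardize(X)
    active = list(range(len(names)))
    while len(active) > n_keep:
        w = fit_logistic([[row[j] for j in active] for row in Xs], y)
        weakest = min(range(len(active)), key=lambda i: abs(w[i]))
        del active[weakest]
    return [names[j] for j in active]


f0 = [0.10, 0.20, 0.15, 0.05, 0.90, 0.80, 0.95, 0.85]  # separates the classes
f1 = [0.5, 0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.6]          # noise
f2 = [1, 2, 3, 4, 1, 2, 3, 4]                          # noise
X = [list(r) for r in zip(f0, f1, f2)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(rfe(X, y, ["f0", "f1", "f2"], n_keep=1))
```

Retraining after every elimination is what distinguishes RFE from ranking descriptors once: the weights of the survivors are re-estimated without the removed descriptor, so correlated descriptors are not judged in isolation.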
PERFORMANCE MEASUREMENT
As with all discriminative methods,107 the performance of ML methods can be evaluated in terms of the numbers of true positives TP, true negatives TN, false positives FP (negatives misclassified as positives), and false negatives FN (positives misclassified as negatives). Here, positives are compounds with a particular pharmaceutical activity, such as inhibitors, agonists, or substrates of a protein, and negatives are compounds without that pharmaceutical property, such as …
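The four counts are usually condensed into summary statistics. As a sketch, the following computes sensitivity SE = TP/(TP + FN), specificity SP = TN/(TN + FP), overall accuracy Q = (TP + TN)/(TP + TN + FP + FN), and the Matthews correlation coefficient; these are common choices for classification studies of this kind, but treat this particular selection as an assumption rather than the review's definitive list. The counts in the example are hypothetical.

```python
import math


def performance(tp, tn, fp, fn):
    """Common summary statistics computed from confusion-matrix counts."""
    se = tp / (tp + fn) if tp + fn else 0.0   # sensitivity: positives correctly predicted
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity: negatives correctly predicted
    q = (tp + tn) / (tp + tn + fp + fn)       # overall prediction accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fn * fp) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"SE": se, "SP": sp, "Q": q, "MCC": mcc}


# Hypothetical test set: 100 positives and 100 negatives.
print(performance(tp=80, tn=90, fp=10, fn=20))
```

Reporting SE and SP separately matters when positives and negatives are unbalanced: a model that labels everything negative can reach a high Q on a mostly negative dataset while having an SE of zero, and the MCC exposes this by dropping to zero.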
Prediction of Inhibitors, Antagonists, Blockers, Agonists, Activators, and Substrates of Proteins Related to Specific Therapeutic and ADMET Property
Table 2 summarizes the reported performance of ML methods in predicting inhibitors, antagonists, blockers, agonists, activators, and substrates of pharmaceutical relevance. Many of the studies listed in Table 2 used hundreds or even thousands of compounds, substantially more than the tens of compounds typically used in QSAR and QSPR studies108 and closer to the numbers used for developing structure‐based21, 22 and ligand‐based23, 24 VS …
UNDERLYING DIFFICULTIES IN THE APPLICATION OF MACHINE LEARNING METHODS
The performance of ML methods depends critically on the diversity of compounds in the training dataset and on how appropriately the compounds are described. The datasets used in most of the ML models described in Tables 2, 3, and 4 are not expected to be fully representative of all compounds that do or do not interact with a particular therapeutic or ADMET related protein. This is particularly true for compounds not interacting with a therapeutic or ADMET related …
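One rough way to gauge how representative a training set is involves measuring its structural diversity with fingerprint similarity. A minimal sketch, assuming fingerprints are already available as sets of "on" bit positions (the bit sets below are hypothetical): it computes the Tanimoto coefficient, the mean pairwise Tanimoto distance of a set, and the highest similarity of a new compound to any training compound, which can flag queries that fall outside the chemical space the model was trained on.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def mean_pairwise_diversity(fingerprints):
    """Average Tanimoto distance (1 - similarity) over all pairs; higher is more diverse."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


def max_similarity_to_training(query, training):
    """Highest Tanimoto similarity between a query and any training fingerprint."""
    return max(tanimoto(query, fp) for fp in training)


training = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}]  # hypothetical bit sets
print(mean_pairwise_diversity(training))
print(max_similarity_to_training({1, 2, 3, 4}, training))
print(max_similarity_to_training({20, 21}, training))  # dissimilar to all training compounds
```

A low maximum similarity does not prove that a prediction is wrong, but it indicates that the model is extrapolating beyond the compounds it has seen.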
CONCLUSIONS AND PERSPECTIVES
ML methods have consistently shown promise for predicting compounds that span diverse structures and a wide variety of protein‐binding activities of pharmaceutical relevance. Regression‐based ML methods can be used to predict activity levels quantitatively if activity data are available for a sufficient number of compounds with the specific binding activity. Regression methods can also estimate the contribution of specific structural and physicochemical …
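In a linear regression model, each fitted coefficient can be read as the estimated contribution of one descriptor to the predicted activity, provided the descriptors are on comparable scales and not strongly collinear. A minimal sketch, assuming ordinary least squares (one regression method among many; the review does not prescribe it) solved through the normal equations in plain Python. The descriptor values and activities are made-up numbers generated from activity = 1 + 2·x1 − 0.5·x2, so the correct coefficients are known.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A must be nonsingular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_linear(X, y):
    """Least squares via the normal equations (Z^T Z) beta = Z^T y, with an intercept."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    d = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(d)] for i in range(d)]
    Zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
    return solve(ZtZ, Zty)  # [intercept, coefficient_1, coefficient_2, ...]


X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]  # two hypothetical descriptors
y = [1.0, 3.0, 0.5, 2.5, 3.5]                 # hypothetical activities
print(fit_linear(X, y))  # approximately [1.0, 2.0, -0.5]
```

The normal equations are the shortest route to a self-contained example but can be numerically ill-conditioned; `numpy.linalg.lstsq` is the more robust choice for real descriptor matrices.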
REFERENCES (128)
- et al. Advances in molecular toxicology—Towards understanding idiosyncratic drug toxicity. Toxicology (2000).
- et al. Present and future in vitro approaches for drug metabolism. J Pharmacol Toxicol Methods (2000).
- et al. QSAR and ADME. Bioorg Med Chem (2004).
- et al. Neural networks in drug discovery: Have they lived up to their promise? Eur J Med Chem (1999).
- et al. Drug design by machine learning: Support vector machines for pharmaceutical data analysis. Comput Chem (2001).
- et al. Quantitative structure–pharmacokinetic parameters relationships (QSPKR) analysis of antimicrobial agents in humans using simulated annealing k‐nearest‐neighbor and partial least‐square analysis methods. J Pharm Sci (2004).
- et al. Structure‐based virtual screening of chemical libraries for drug discovery. Curr Opin Chem Biol (2006).
- et al. Integrating virtual screening in lead discovery. Curr Opin Chem Biol (2004).
- et al. Novel technologies for virtual screening. Drug Discov Today (2004).
- et al. A neural network based virtual screening of cytochrome P450 3A4 inhibitors. Bioorg Med Chem Lett (2002).
- The clinical potential of chemokine receptor antagonists. Pharmacol Ther.
- Protein kinases as targets for anticancer agents: From inhibitors to useful drugs. Pharmacol Ther.
- The evolving role of estrogen therapy in prostate cancer. Clin Prostate Cancer.
- Screening for human ADME/Tox drug properties in drug discovery. Drug Discov Today.
- Substrates of human hepatic cytochrome P450 3A4. Toxicology.
- Properties of cytochrome P450 isoenzymes and their substrates. Part 1. Active site characteristics. Drug Discov Today.
- Pharmacophore modeling of cytochromes P450. Adv Drug Deliv Rev.
- Nuclear receptors and drug disposition gene regulation. J Pharm Sci.
- Predicting undesirable drug interactions with promiscuous proteins in silico. Drug Discov Today.
- Structure and mechanism of ABC transporters. Curr Opin Struct Biol.
- Deriving the 3D structure of organic molecules from their infrared spectra. Vibrational Spectrosc.
- Graph theoretical approach to local and overall aromaticity of benzenoid hydrocarbons. Tetrahedron.
- Probabilistic neural networks. Neural Netw.
- Understanding and using genetic algorithms. Part 1. Concepts, properties and context. Chemometr Intell Lab.
- Comparison of forward selection, backward elimination, and generalized simulated annealing for variable selection. Microchem J.
- Drug discovery: A historical perspective. Science.
- An introduction to drug disposition: the basic principles of absorption, distribution, metabolism, and excretion. Toxicol Pathol.
- High‐throughput screening in drug metabolism and pharmacokinetic support of drug discovery. Annu Rev Pharmacol Toxicol.
- Therapeutic targets: Progress of their exploration and investigation of their characteristics. Pharmacol Rev.
- Drug ADME‐associated protein database as a resource for facilitating pharmacogenomics research. Drug Dev Res.
- Drug adverse reaction target database (DART): Proteins related to adverse drug reactions. Drug Saf.
- QSPR as a means of predicting and understanding chemical and physical properties in terms of structure. Pure Appl Chem.
- ADMET in silico modelling: Towards prediction paradise? Nat Rev Drug Discov.
- Support vector machines for ADME property classification. QSAR Combinatorial Sci.
- QSAR and classification study of 1,4‐dihydropyridine calcium channel antagonists based on least squares support vector machines. Mol Pharm.
- Quantitative structure–pharmacokinetic relationships for drug distribution properties by using general regression neural network. J Pharm Sci.
- Prediction of biological activity for high‐throughput screening using binary kernel discrimination. J Chem Inf Comput Sci.
- Virtual screening of molecular databases using a support vector machine. J Chem Inf Model.
- Enrichment of high‐throughput screening data with increasing levels of noise using support vector machines, recursive partitioning, and laplacian‐modified naive bayesian classifiers. J Chem Inf Model.
- Virtual screening of chemical libraries. Nature.
- Effect of molecular descriptor feature selection in support vector machine classification of pharmacokinetic and toxicological properties of chemical agents. J Chem Inf Comput Sci.
- Effect of selection of molecular descriptors on the prediction of blood–brain barrier penetrating and nonpenetrating agents by statistical learning methods. J Chem Inf Model.
- Molecular descriptors influencing melting point and their role in classification of solid drugs. J Chem Inf Comput Sci.
- Managing molecular diversity. Chem Soc Rev.
- Chemical similarity searching. J Chem Inf Comput Sci.
- Random or rational design? Evaluation of diverse compound subsets from chemical structure databases. J Med Chem.
- Trends in the development of new antidepressants. Is there a light at the end of the tunnel? Curr Med Chem.
- The second generation of COX‐2 inhibitors: What advantages do the newest offer? Drugs.
- COX‐2 selective inhibitors in the treatment of arthritis: A rheumatologist perspective. Curr Top Med Chem.
- Pharmacology and clinical potential of direct thrombin inhibitors. Curr Pharm Des.