Visual Overview
Abstract
Volume of distribution at steady state (VD,ss) is one of the key pharmacokinetic parameters estimated during the drug discovery process. Despite considerable efforts to predict VD,ss, accuracy and choice of prediction methods remain a challenge, with evaluations constrained to a small set (<150) of compounds. To address these issues, a series of in silico methods for predicting human VD,ss directly from structure were evaluated using a large set of clinical compounds. Machine learning (ML) models were built to predict VD,ss directly and to predict input parameters required for mechanistic and empirical VD,ss predictions. In addition, log D, fraction unbound in plasma (fup), and blood-to-plasma partition ratio (BPR) were measured on 254 compounds to estimate the impact of measured data on predictive performance of mechanistic models. Furthermore, the impact of novel methodologies such as measuring partition (Kp) in adipocytes and myocytes (n = 189) on VD,ss predictions was also investigated. In predicting VD,ss directly from chemical structures, both mechanistic and empirical scaling using a combination of predicted rat and dog VD,ss demonstrated comparable performance (62%–71% within 3-fold). The direct ML model outperformed other in silico methods (75% within 3-fold, r2 = 0.5, AAFE = 2.2) when built from a larger data set. Scaling to human from predicted VD,ss of either rat or dog yielded poor results (<47% within 3-fold). Measured fup and BPR improved performance of mechanistic VD,ss predictions significantly (81% within 3-fold, r2 = 0.6, AAFE = 2.0). Adipocyte intracellular Kp showed good correlation to the VD,ss but was limited in estimating the compounds with low VD,ss.
SIGNIFICANCE STATEMENT This work advances the in silico prediction of VD,ss directly from structure and with the aid of in vitro data. Rigorous and comprehensive evaluation of various methods using a large set of clinical compounds (n = 956) is presented. The scale of techniques evaluated is far beyond any previously presented. The novel data set (n = 254) generated using a single protocol for each in vitro assay reported in this study could further aid in advancing VD,ss prediction methodologies.
Introduction
The current drug discovery path is a sequential, time-consuming process with a high attrition rate (Hinkson et al., 2020). Attrition of small-molecule drug candidates due to poor pharmacokinetic (PK) profiles has diminished significantly in recent years (Waring et al., 2015). This advancement can partly be attributed to the unprecedented emphasis on screening compounds based on PK parameters in the drug discovery phase (Ferreira and Andricopulo, 2019). PK is a well recognized and fundamental property that influences drug concentrations at target, which ultimately determines a drug’s efficacy and safety (Ferreira and Andricopulo, 2019). Volume of distribution at steady state (VD,ss) is a key PK parameter that describes the relationship between drug concentration measured in plasma or blood to the amount of drug in the body at equilibrium (Smith et al., 2015). Estimation of apparent VD,ss is of utmost importance because it influences Cmax and half-life in plasma and target tissues, which in turn determines dose and dosing regimen in the clinic (del Amo et al., 2013). Toward this end, VD,ss in humans is commonly predicted using preclinical in vivo and in vitro data in conjunction with various allometric scaling methods such as the Oie and Tozer method (Jones et al., 2011). Alternatively, VD,ss can be extrapolated from tissue-to-plasma partition coefficients (Kp) from preclinical species (generally rat) (Nigade et al., 2019). These experiments are resource-intensive and require the synthesis of compounds; these limitations further hinder the ability to predict human VD,ss early in drug discovery or during lead optimization. Thus, considerable effort has been undertaken to develop predictive in silico models to accelerate and reduce the cost of drug discovery processes (Wenzel et al., 2019). As VD,SS is dependent on the tissue partitioning of compounds, numerous studies have focused on developing in silico approaches to predict tissue partitioning based on physicochemical properties such as pKa and log P, plasma protein binding, and blood-to-plasma partition ratio (BPR) (Graham et al., 2012; del Amo et al., 2013). Poulin and Theil were some of the first to propose a mechanistic Kp prediction method (Poulin and Krishnan, 1995; Poulin and Theil, 2002). This method incorporates several important mechanisms, such as albumin binding, neutral lipid, and phospholipid binding. Berezhkovskiy (2004) is another method similar to Poulin and Theil. The Rogers and Rowland method (Rodgers et al., 2005; Rodgers and Rowland, 2006) is by far the most comprehensive Kp prediction method in terms of mechanisms captured. It includes all the mechanisms captured in previous published methods along with the addition of acidic phospholipid and cytosolic ion partitioning. A drawback for the Rogers and Rowland method is that there are two sets of equations based on the dissociation constant or pKa of the compounds, and the cutoff or switch between these equations was set at a pKa of 7. This results in a discontinuous relationship between the dissociation constant and plasma tissue partitioning. Finally, the method is also heavily dependent on accurate pKa predictions. To address these issues, a modified Rodgers and Rowland method was developed (Lukacova et al., 2008) that employs a single continuous combined equation for compounds regardless of pKa. Ion partitioning into acidic or basic intracellular compartments (lysosomes and mitochondria) was described by Trapp et al. (2008) and can be used as an aid to Kp prediction method for compounds for which ion trapping is expected. Key mechanisms that play a crucial role in partitioning itself between plasma and the specific organ tissue implemented by each prediction method is summarized in Table 1.
Comparison of mechanistic tissue partitioning (Kp) prediction methods
Accurately predicting VD,ss remains a challenge that has not been adequately solved (Smith et al., 2015). Few studies have evaluated the performance of various VD,ss prediction methods; however, these reports were either in preclinical species (Graham et al., 2012) or used a small set (<150) of clinical compounds (Jones et al., 2011; Korzekwa and Nagar, 2017; Chan et al., 2018; Nigade et al., 2019; Mayumi et al., 2020). Recently, Lombardo et al. (2018) published a manually curated data set of VD,ss for 1352 drugs after intravenous dosing, which presented an opportunity to evaluate the predictive performance of various VD,ss methodologies in determining human VD,ss. Therefore, we investigated the 1) performance of the most common VD,ss prediction strategies, 2) sensitivity of input parameters that influence VD,ss predictions, 3) impact of experimental data on mechanistic VD,ss predictions, and 4) whether novel methodologies such as using adipocyte and myocyte cell partitioning could improve VD,ss predictions.
Materials and Methods
Experimental Approaches
The VD,ss prediction strategies investigated are broadly categorized into two approaches based on the starting data for the analysis, which is either fully in silico (e.g., structural) or in vitro (experimental). Based on the compound availability, an initial in vitro experimental data set of 331 compounds (Lombardo et al., 2018) was identified. Predictive performances were assessed using 956 compounds for the in silico and 254 compounds for the in vitro experimental approaches, respectively.
For the in silico approach, VD,ss was predicted directly from chemical structure [using compound Simplified Molecular Input Line Entry System (SMILES) as input] by using the following four approaches: 1) mechanistic VD,ss prediction using predicted physicochemical properties from commercial software (ADMET Predictor 9.0) or 2) using machine learning (ML) models generated by the Accelerating Therapeutics for Opportunities in Medicine (ATOM) consortium, 3) allometric scaling from predicted VD,ss for preclinical species such as rat and dog ML models, and 4) direct human VD,ss predictions using an ML model built using clinical compounds (see schematic shown in Fig. 1).
Overview of human VD,ss prediction methods and input parameters (in silico and in vitro data) evaluated in this study.
In the Experimental Data approach, two distinct experimental data sets were generated. The first experimental data set included measurement of physicochemical properties under a single protocol for each in vitro experiment, which included log D, fraction unbound in plasma (fup), and BPR for 331 clinical compounds (Lombardo et al., 2018). The above experimental data were used as input parameters individually or in combination to predict mechanistic VD,ss (Lukacova et al., 2008). In addition, novel experiments were conducted to determine partition of compounds in human adipocytes and myocytes for 200 compounds that were a subset of the 331 compounds selected above. In silico and experimental methodologies are further described in detail below. The percentage of compounds that had accurately predicted VD,ss within 2-, 3-, or 10-fold; r2 (Pearson correlation coefficient); and absolute average fold error (AAFE) were used as key criteria for comparison of predictive performance of each method.
In Silico Methods
VD,ss of the clinical compounds data set (Lombardo et al., 2018) was subdivided based on whether experimental data were directly measured (331 compounds) or not (970 compounds). Evaluation of in silico methods was performed on both data sets. It is important to note that all the evaluations were performed on a complete hold-out set. For example, when predicting VD,ss for the experimental data set, none of the compounds in the experimental data set were a part of any of the ML model building data sets.
ADMET Mechanistic VD,ss Prediction.
ADMET Predictor (version 9.0) was used to predict pKa (S + Acidic_pKa, S + Basic_pKa), fraction unbound in plasma (hum_fup%, converted to fup), BPR, and log P/D (S + log D, S + log P) from chemical structure. These parameters were subsequently used as input parameters to predict mechanistic Kp and human VD,ss predictions (Lukacova et al., 2008). Predicted values of the input parameters were limited to typical assay limits for each of the input parameters (hum_fup%: 0.1%–100%, BPR: 0–200, log P and log D: −3 to 10).
ATOM Mechanistic, Allometry, and Direct ML Predictions
ATOM Mechanistic VD,ss Prediction.
Data sets generated by GlaxoSmithKline (Supplemental Table 1) containing molecular structure information and physicochemical parameters (log D, fup, BPR) were split into train, validation, and test subsets. Model training and evaluation was generally performed as previously described (Minnich et al., 2020). Briefly, a grid search hyperparameter optimization technique was employed to train several machine learning models (neural networks and random forests) with different hyperparameter combinations (learning rate, layer sizes, number of nodes, dropout rates for neural networks and maximum depth, number of trees for random forests), splitting strategies (random and scaffold), and featurization techniques [graph convolution, extended connectivity fingerprint (ECFP), molecular operating environment (MOE) descriptors, and Mordred descriptors]. Additional details related to data sets and model performances are described in Supplemental Table 1. Models with highest validation set R2 (coefficient of determination calculated using sklearn’s r2_score package) regression score function were selected to predict fup, BPR, and log D from chemical structures. These parameters were subsequently used to predict mechanistic Kp and human VD,ss predictions by the Lukacova method (Lukacova et al., 2008) as described in the ADMET Mechanistic VD,ss Prediction section above.
Allometric Scaling.
Rat fup, rat VD,ss, dog fup, dog VD,ss, and human fup values were predicted using ATOM ML models built on GlaxoSmithKline proprietary data sets as described in the ATOM Mechanistic VD,ss Prediction section (Supplemental Table 1). Subsequently, human VD,ss was predicted using the following three methods:
Single-species allometry scaling from rat (Jones et al., 2011)
Single-species allometry scaling from dog (Jones et al., 2011)
Predicted from rat and dog VD,ss using two species (Wajima et al., 2003)
Direct ML Models.
An alternative approach to mechanistic prediction of human VD,ss is to build ML models to predict volumes of distribution directly from chemical structures. For this approach, regression models based on molecular structure were fit to directly predict the log base 10 experimental human VD,ss values of clinical compounds (Lombardo et al., 2018). Compounds were clustered by Bemis-Murcko scaffold and subsequently divided into training, validation, and test sets, starting with the largest cluster size to the smallest cluster size. A train/validation/test split of 70%/10%/20% was used to train and evaluate random forest and neural network models as described for the in vitro parameter models (Minnich et al., 2020). Neural network models sampled different combinations of learning rates, layer sizes, and number of nodes. Random forest models sampled different maximum tree depth and number of trees. Several featurization approaches were used including DeepChem’s (https://github.com/deepchem/deepchem) graph convolution model, ECFP, and calculated MOE and Mordred descriptors. Models were selected by picking the model with the maximum validation set R2. Clinical compounds were grouped into two sets. The first set of compounds was the 287 compounds that were selected for experimental measurements (BPR, fup, and log D). The second set of compounds was the 970 additional compounds described in Lombardo et al. (2018) without further experimental measurements. These sets were used in two ways for fitting and prediction. 1) To compare predictive performance of the direct ML models against the other in vitro approaches, models were trained using the 970 human VD,ss of compounds without further experimental measurements. The VD,ss ML model was then used to predict VD,ss for the 287 compounds with new experimental measurements for comparison with in vitro methods. 2) A very challenging (due to the small size of the training set) external test set was used by inverting the previous approach. Models were developed using 287 compounds with new experimental measurements. Then, the fit model was used to predict VD,ss for the 970 compounds without further experimental measurements. In both approaches, the set of compounds used for model development was further split into training, validation, and internal test sets as previously described.
Experimental Data
Log D.
The chromatographic hydrophobicity index (CHI) (Valkó et al., 1997) values were measured using a reversed phase high-performance liquid chromatography (HPLC) column (50 × 2 mm 3 µM Gemini NX C18; Phenomenex, UK) with fast acetonitrile gradient at starting mobile phase of pH 2, 7.4, and 10.5. CHI values are derived directly from the gradient retention times using calibration parameters for standard compounds. The CHI value approximates to the volume percent organic concentration when the compound elutes. CHI is linearly transformed into ChromlogD (Young et al., 2011) by least-squares fitting of experimental CHI values to calculated ClogP values for over 20,000 research compounds using the following formula: ChromlogDpH=7.4 = 0.0857CHI-2.00.
Blood-to-Plasma Partition Ratio.
In vitro measurement of blood-to-plasma partition was conducted in human blood (K2EDTA as anticoagulant) obtained from a commercial source (BioReclamation IVT, Liverpool, NY). Hematocrit (the ratio of volume of red blood cells to total blood) was measured by centrifugation of the whole blood at 3000 rpm for 10 minutes using microhematocrit capillary tubes. Control plasma was prepared from a portion of the whole blood by centrifugation at 3000g for 10 minutes. Both whole blood and control plasma samples were warmed at 37°C in a water bath for 30 minutes. Subsequently, the test compounds (1 µM in the final concentration) and controls [methazolamide (BPR ∼1) and metoprolol (BPR ∼40)] were spiked into blood and incubated at 37°C (5% CO2) with shaking at 200 rpm for 60 minutes along with control samples. After incubation for 60 minutes, the incubated whole blood was removed from the water bath, and the plasma was separated by centrifugation at 1000g for 10 minutes. Aliquots of the control plasma were also removed. All plasma samples (50 µl) were treated with 400 µl of ice-cold acetonitrile containing an internal standard (100 ng/ml tolbutamide in acetonitrile). After the removal of protein by centrifugation at 1640g (3000 rpm) for 10 minutes at 4°C, the supernatants were transferred to HPLC autosampler plate. Test compounds and internal standard response (or peak area) ratio in whole blood and its resulting plasma were measured using liquid chromatography with tandem mass spectrometry (LC/MS/MS). Blood-to-plasma partition was calculated by ratio of mass spectrometric response of compounds in blood samples after 60 minutes of incubation to mass spectrometric response in plasma samples.
Fraction Unbound in Plasma.
In vitro measurement of fup was conducted using a rapid equilibrium dialysis (RED) device. The fup values of test compounds and a positive control (warfarin) were determined at a single time point of 4 hours postincubation. Considering high surface-to-volume ratio of the membrane compartment in a RED device, equilibrium is expected to be achieved within 4 hours of incubation (Waters et al., 2008). Stock solutions of test compounds and warfarin were prepared in DMSO at concentrations of 5 mM and subsequently diluted to a final concentration of 0.5 mM in DMSO:water (1:1, v/v). Incubation mixtures were prepared by diluting the stock solution into human plasma obtained from a commercial source (BioReclamation IVT). Final concentrations of compounds in incubation mixture were 5 µM. Human plasma was prewarmed in a water bath at 37°C prior to the experiment. In total, 400 µl of the stopping solution (100 ng/ml tolbutamide in acetonitrile) was added to a 96-well deep well sample collection plate on ice. In a RED device, 500 µl of PBS was added to the white chambers (receiver side), and aliquots (300 µl) of each incubation mixture were spiked into the red wells (donor side). A sample (40 µl) of the incubation mixture was transferred into the 0-minute wells on the sample collection plate. The device and remaining spiked plasma samples were incubated at 37°C for 4 hours with shaking at 150 rpm. After the incubation period, 40 µl of the remaining spiked plasma was transferred to the sample collection plate. All samples in the RED device were mixed by pipetting prior to aliquoting (40 µl) from each donor well into a well containing 160 µl of PBS buffer. A sample (160 µl) of each receiver well was aliquoted into a tube containing 40 µl of blank plasma. PBS (160 µl) was added to the 0-minute and 240-minute stability wells. Analysis of samples was performed using LC/MS/MS. For all samples, peak area ratios were used to determine percent unbound. Plasma proteins were precipitated with 400 μl of acetonitrile containing 100 ng/ml tolbutamide as a mass spectral internal standard. The resulting mixtures were vortex-mixed, followed by centrifugation for 15 minutes at >3500 rpm/min. A sample (100 µl) of the supernatant/well was transferred to a clean 96-well plate containing 100 µl of ultrapure water/well. The plate was vortexed for 1 minute at >1700 rpm/min. Aliquots (4 µl) of the resulting supernatant were injected onto the LC/MS/MS system to obtain peak area ratios for each compound to determine fraction unbound in plasma. Equilibrium dialysis method for measuring fup is amenable to automation and is generally accepted as the gold standard (Trainor, 2007).
Adipocyte and Myocyte Partition.
Intracellular partition of compounds in adipocytes and myocytes was determined using a protocol described previously (Treyer et al., 2018). Primary human adipocytes and myocytes were obtained from commercial sources (Lonza, MD). The test compounds and controls at a final concentration of 0.5 μM were incubated with fully differentiated myocytes and adipocytes plated in culture in triplicate at 37°C (5% CO2) with shaking at 100 rpm for 45 minutes. After the end of the incubation, the medium was transferred to a stop solution containing acetonitrile and internal standard (100 ng/ml tolbutamide in acetonitrile). The cell layer was washed with 200 µl of cold Hanks’ buffered salt solution and extracted with stop solution (100 ng/ml tolbutamide in acetonitrile). Both the intracellular and extracellular compound concentrations were analyzed using LC/MS/MS. The cell protein concentration was determined by the bicinchoninic acid assay. Intracellular drug accumulation (Kp) was calculated from the peak area ratios of the analyte to internal standard in the medium, cells, and protein concentration from the following equation. Protein content was quantified using the bicinchoninic acid assay in representative wells to calculate the cellular volume (
), assuming 6.5 μl/mg protein (Treyer et al., 2018). Amount of drug in the cells
) was estimated using peak area ratio and volume of cell lysate (area ratio × volume of cell lysate).
refers to corrected medium concentration. Intracellular accumulation was determined using cell lysate concentration × volume of cell lysate (150 µl). Subsequently, the
or
is calculated accounting from protein binding in plasma.
Predictions Based on Experimental Data
Mechanistic Models for Kp Prediction.
Experimental data (log D, fup, BPR) were used as input parameters individually or in combination to predict Kp (Lukacova et al., 2008) and subsequently were used to calculate VD,ss using the following relationship:where
is the volume of plasma;
is the volume of erythrocytes (
; E/P is the erythrocyte-to-plasma ratio, which is derived by the equation BPR + hematocrit − 1)/hematocrit; and
and
are the plasma tissue partition ratio and volume, respectively, for the
tissue (Nigade et al., 2019).
Tissue-Level Kp Prediction.
We used five strategies for predicting VD,ss using adipocytes and myocyte Kp values:
Adipocyte-only method: Adipocyte Kp values were used to calculate partitioning into fat (
). Kp for other organs was assumed to be 1 to predict VD,ss using the following equation:
Myocyte-only method: Myocyte Kp values were used to calculate partitioning into muscle tissue (
), and Kp for other organs was assumed to be 1 to predict VD,ss using the following equation:
Combined method: Both adipocyte and myocyte Kp values were used to calculate fat and muscle volumes, respectively. Kp for all nonfat and muscles organs was assumed to be 1 to predict VD,ss.
Average method: Average of adipocyte and myocyte Kp values were used as Kp for all nonfat and muscle tissues. Both adipocyte and myocyte Kp values were used to calculate fat and muscle volumes, respectively, to predict VD,ss.
Separate method: Mechanistic Kp (Lukacova et al., 2008) calculations were used for nonfat or nonmuscle organs. Both adipocyte and myocyte Kp values were used to calculate fat and muscle volumes, respectively. Both of the volumes were subsequently added to predict VD,ss as follows:
Results
As summarized in Fig. 1, we investigated the performance of the most common VD,ss prediction strategies, sensitivity of input parameters that influence VD,ss predictions, impact of experimental data on mechanistic VD,ss predictions, and whether adipocyte and myocyte cell partitioning could improve predictive performance by using a large compound data set. An in silico–only approach was applied using a set of 956 compounds (the ATOM in silico set) related to the Lombardo intravenous dosing drug set (n = 1352 drugs) in which VD,ss values were reported (Lombardo et al., 2018). A separate set of compounds, the ATOM experimental set (n = 254 compounds), had additional in vitro data collected under uniform experimental conditions (see Materials and Methods; Supplemental Table 2) and was used as a comparator against the purely in silico methods. Although the ATOM experimental data set was selected based on the compound availability from an initial set of 331 drugs, it represented chemical diversity of the clinical data set (Supplemental Fig. 1).
The comparative assessments of various in silico approaches evaluated to predict human VD,ss for two discrete sets of compounds are summarized in Fig. 2 and Table 2. Details of ATOM ML models used to predict input parameters for mechanistic VD,ss predictions are shown in (Supplemental Table 1). Model/featurization combination that resulted in the best models varied by data sets. MOE or graph convolution featurization with random forest or neural network models most frequently outperformed other featurization and models investigated in this study. Relative to other in silico methods, mechanistic VD,ss predictions (both by ATOM and ADMET ML models) and two-species allometry demonstrated superior predictive performance, with 62%–71% of compounds within 3-fold of observed VD,ss for both data sets (Table 2). In contrast, scaling from single species using allometric methods performed poorly, with only 38%–47% of compounds within 3-fold (Table 2). Trends in predictive performance (such as percentage within 2-, 3-, and 10-fold; AAFE; and Pearson’s r2) across various in silico models were comparable using either the smaller or larger data sets (Table 2, 283 and 956 compounds), with an exception for direct ML model. Predictive performance of the direct ML model to predict VD,ss increased significantly when the ML model was built using a larger data set (Fig. 3; Table 2). The percentage of compounds within 2-, 3-, and 10-fold increased to 58%, 75%, and 98% from 36%, 55%, and 88%, respectively (Fig. 2; Table 2). Similarly, there was significant improvement in r2 values (from 0.14 to 0.52) and AAFE (decreased from 3.3 to 2.2). The scatter plots of direct ML model predictions are shown in Fig. 3. Additional scatter plots of predicted VD,ss compared with reported (Lombardo et al., 2018) values across both data sets and various in silico methods are presented in Supplemental Fig. 2.
Summary of model performance of in silico VD,ss prediction methodologies: (A) ATOM in silico set (n = 956 compounds) and (B) ATOM experimental set (n = 254 compounds).
Summary of model performance of in silico VD,ss prediction methodologies for Lombardo intravenous dosing drug set (n = 1352 drugs) divided into two subsets: 1) ATOM in silico set (>940 compounds) and 2) ATOM experimental set (n > 280 compounds)
Predicted vs. observed VD,ss using direct ML models: (A) the ML model built was using a smaller data set (287 compounds), and predictions were tested on a large in silico set (956 compounds) and (B) vice versa. Crosslines indicate 2-, 3-, and 10-fold limits.
Experimentally measured log D, fup, and BPR in vitro assays for 254 compounds are summarized in Supplemental Table 2. Although 331 compounds were originally included, some of the compounds showed analytical or recovery issues in different assays and were removed from the data sets. Figure 4 and Table 3 summarize predictive performance of various combinations of experimental data (Supplemental Table 2) as input parameters. Scatter/kernel density estimation plots of mechanistic VD,ss predictions using various combinations of experimental data (fup, BPR, and log D) as input parameters are shown in Supplemental Fig. 4. The highest percentage of compounds within 3-fold of prediction error was observed when experimentally determined fup and BPR were used as input parameters, with 81% of the compounds within 3-fold of Lombardo reference values; a good correlation between predicted and observed values (r2 = 0.58) was seen.
Predictive performance of mechanistic Kp prediction methods using various combinations of experimental (Exp.) data.
Summary of mechanistic VD,ss predictive performance using experimental data (fup, BPR, and log D) as input parameters
Correlation between observed and predicted VD,ss for 254 compounds using experimental fup and BPR data as input parameters is shown in Fig. 5. Among the experimental parameters investigated, VD,ss predictions were sensitive to BPR. VD,ss predictions within 3-fold dropped to 73% from 81%, and r2 reduced from 0.58 to 0.42 when only fup was used instead of fup and BPR. In absence of experimental data, assuming BPR as 1 could be recommended, as better performance was observed when the BPR value was assumed to be 1 instead of inputting ML-predicted values (Table 3); 63% of the compounds were predicted within 2-fold when BPR was assumed to be 1, compared with 56% when BPR was predicted from ML models in combination with measured fup. This highlights that VD,ss predictions are sensitive to errors in BPR predictions from ML models and that the best performance across all the methods is with measured fup and BPR values. In contrast, complementing measured log D to mechanistic predictions with fup and BPR measured data did not improve predictive performance any further (Table 3). Since predicted values from log D ML models (both ADMET and ATOM) were in close agreement with measured values (Supplemental Fig. 3), it is not surprising to see that measurement of log D values did not improve VD,ss predictions. Figure 5A displays the correlation of predicted-to-observed VD,ss classified by ionization class (Lombardo et al., 2018). Anionic and zwitterionic compounds are the best-predicted classes compared with neutral compounds. The kernel density estimation (Seaborn Python library: https://seaborn.pydata.org/tutorial/distributions.html) plot in Fig. 5B demonstrates underlying distribution of the points in the Fig. 5A scatter plot. Figure 5B suggests that overall predictions using mechanistic predictions using measured fup and BPR are directly correlated, and a majority of the predictions are on the unity line, highlighting that there is no overall trend of overpredicting or underpredicting VD,ss.
(A) Scatter plot [colored by ionic state reported in Lombardo et al. (2018)]. (B) Kernel density plot showing correlation between observed and predicted VD,ss for 254 compounds using experimental (exp) fup and BPR data as input parameters. Crosslines indicate 2-, 3-, and 10-fold limits.
As fat and muscle contribute to 60% of body volume, the impact of experimental adipocyte and myocyte cell partition in improving VD,ss prediction was investigated. Measured intracellular partitioning of 189 compounds in adipocytes and myocytes is presented in Supplemental Table 3. The impact of adipocyte and myocyte cell partition on predictive performance for the same set of compounds was compared with that from the best predictive model (fup and BPR experimental data as input parameters; Fig. 6; Table 4). Good correlation between observed versus predicted VD,ss was noted when either adipocyte or myocyte or both Kp values were used (r2 of 0.41–0.48, Table 4). Although the percentage of compounds within 3-fold, r2, and AAFE were not significantly different using either adipocyte or myocyte partitioning, percentage of compounds within 2-fold was significantly higher when VD,ss was predicted using adipocyte Kp values (54% vs. 41%, Table 4). The combination of both adipocyte and myocyte partitioning with different strategies did not improve predictive performance any further (Table 4). For the same set of compounds, VD,ss predicted using only fup and BPR experimental data demonstrated higher percentage of compounds with 2- and 3-fold compared with predictions based on adipocyte or myocyte data (Fig. 6; Table 4).
Predictive performance using adipocyte (Kp fat) and myocyte (Kp muscle) partitioning experimental (exp) data.
Performance of VD,ss prediction methods utilizing adipocyte and myocyte Kp experimental data
Across all the prediction methods evaluated using different data sets, there was a good correlation between AAFE and percentage of compounds within 2- or 3-fold of observed. As anticipated, prediction methods in which lower AAFEs were observed demonstrated the highest percentage of compounds within 3-fold. Among all the methods investigated, mechanistic VD,ss predictions utilizing measured fup and BPR as input parameters demonstrated superior performance, with lowest AAFE, highest r2, and percentage of compounds within 3-fold.
Discussion
Mechanistic VD,ss Predictions.
Kp calculations use physiologic parameters of the tissue and physicochemical properties of the drug to ascertain how compounds partition themselves between plasma and tissue. Based on preliminary evaluations and other reports in the literature (Graham et al., 2012), the Lukacova method (Lukacova et al., 2008) was used as a method of choice for mechanistic VD,ss predictions. Key prerequisite input parameters to predict mechanistic VD,ss are pKa, log D, log P, fup, and BPR. Therefore, estimating these input parameters either by in silico methods or by experimental measurements, and impact of measured parameters on mechanistic VD,ss predictions have been explored.
Mechanistic VD,ss predictions using input parameters predicted by either ATOM ML models or ADMET Predictor demonstrated similar performance across data sets (Table 2). Therefore, either of the two ML models set (ATOM or ADMET Predictor) can be used to predict mechanistic VD,ss in silico. It is important to note that ML models for BPR [ATOM ML or ADMET Predictor (from user manual)] were built using very small data sets (Supplemental Table 1), and predictive performances of ML models to predict BPR are questionable. When predicted BPR values were replaced with experimental data, significant improvement in mechanistic VD,ss predictive performance was observed; r2 increased from 0.38 to 0.51 and percentage within 3-fold increased from 66% to 79%, highlighting the sensitivity of VD,ss predictions to BPR values (Fig. 4; Table 3). As BPR is a key parameter, particularly for calculation of intracellular acidic phospholipid binding of strongly basic drugs, it could be anticipated to improve the predictions. However, impact of BPR measurement was not definitely demonstrated in literature until recently (Yau et al., 2020). The current evaluations (Table 3) clearly demonstrate the importance of measuring BPR in predicting VD,ss and the need to fill the existing gaps in BPR data sets used to build predictive ML models. It is noteworthy that with only two in vitro measurements (fup and BPR), 81% of compounds are within 3-fold of observed VD,ss (Table 3), with AAFE of 2.0.
Because it can impact both the pharmacokinetics and pharmacodynamics of a drug, fup is measured routinely in drug discovery (Smith et al., 2010). On the other hand, BPR of compounds in the early discovery phase is relatively less routinely measured and might lead to missed opportunities not only in predicting VD,ss (as observed in this study) but also in predicting the impact on overall pharmacokinetics of a compound (Kalamaridis and DiLoreto, 2014). Comparable predictive performance was noted by Chan et al. (2018) using a smaller data set of 152 clinical compounds. They demonstrated that mechanistic VD,ss predictions were accurate or superior to empirical approaches based on the extrapolation of VD,ss from preclinical species (Chan et al., 2018). In addition to superior performance of mechanistic VD,ss prediction methods (using either ML-predicted or experimental input parameters), a mechanistic approach uniquely offers the ability to calculate partitioning (Kp) of compounds into various tissues.
Allometric Scaling.
Traditionally, prediction of human VD,ss has relied on scaling of VD,ss obtained from preclinical species using allometric equations (Jones et al., 2011). Although allometry has some limitations in predicting distribution of highly protein-bound drugs, it has been a valuable technique to predict human PK parameters to determine first-time-in-human dose (Choi et al., 2019). To leverage existing data from animal studies during early drug discovery, use of ML-predicted VD,ss employing allometric scaling from preclinical species was explored. Although there continue to be translational questions about interspecies scaling, it was hypothesized that deployment of this technique could allow for much wider chemical space coverage relative to human VD,ss trained models, as well as to provide insight into mechanisms not captured by mechanistic models such as transporter-driven tissue uptake. Although ML models to predict VD,ss and fup values in preclinical species have demonstrated good performance (Supplemental Table 1), single-species scaling performed poorly in predicting human VD,ss (Table 2, <50% were within 3-fold). This poor performance could be due to magnification of errors in predictions of VD,ss and/or fup values in addition to limitations of single-species scaling. Several studies have shown that plasma protein binding corrections significantly enhanced predictive performance of allometric scaling from preclinical VD,ss (Zou et al., 2012). As the VD,ss predictions are inversely proportional to fup in preclinical species (see Materials and Methods for equations), errors in the predictions of fup values will have a significant impact on VD,ss predictions. Therefore, we investigated VD,ss comparisons without fup corrections. Direct correlation of predicted dog VD,ss (without fup corrections) with human VD,ss demonstrated improved performance, with 48%, 65%, and 97% of compounds within 2-, 3-, and 10-fold of observed human VD,ss, respectively, when compared with fup accounting for the difference between dog and human (23%, 37%, and 75%, Table 2). This supports that the poor predictive accuracy of the dog fup model magnified the prediction errors. However, similar improved performance or correlations were not observed in the case extrapolating from rat VD,ss predictions. In contrast, human VD,ss scaled using both rat and dog by the Wajima method demonstrated predictive performance similar to mechanistic models (Table 2). Although overall predictive performance is not significantly different between the two methods, it is noteworthy that mechanistic models were relatively better at predicting anionic compounds within 2-fold compared with the Wajima method (Supplemental Fig. 7). VD,ss predictions classified by ionization class across various methods can be found in Supplemental Fig. 6.
Direct ML Models.
Previously, we observed that the data set size has a direct impact on model predictivity for several pharmacokinetic related data sets (Minnich et al., 2020). As anticipated, ML models built using smaller data sets, such as that for BPR, showed lower model performance statistics compared with models built using a larger data set (Supplemental Table 1). Furthermore, the direct ML model built on a larger data set (using 970 clinical compounds) outperformed other in silico methods, including the mechanistic VD,ss method (Table 2). When utilizing direct ML models built on a larger data set, 75% of compounds (Table 2) were predicted within 3-fold of observed VD,ss, with excellent correlation (Fig. 3B). It is important to highlight that the clinical data set is highly diverse across physicochemical, in vitro ADME, and in vivo PK properties (Lombardo et al., 2018). Models built on diverse data sets of chemical space have a greater applicability domain and generalizability (Simeon et al., 2019). Therefore, direct ML predictions of VD,ss might be the most computationally efficient and predictive way to process in silico predictions of VD,ss for de novo compounds. One limitation of the current model is the relatively small training set, possibly restricting the application of the model to certain chemotypes. In such cases, models that are limited to structurally related analogs may prove more predictive than global models built on a diverse set of compounds (Simeon et al., 2019). Despite some differences in hyperparameters and data set splits used relative to our study, Simeon et al. (2019) demonstrated similar predictive performance for a direct ML model built using a data set of 941 compounds. These independent studies provide promising evidence of improved performances of direct ML models with enhanced data sets of clinical compounds.
Predictions Using Adipocyte and Myocyte Cell Partitioning.
Muscle and fat are tissues with larger physiologic volumes (60% of tissue volume), and distribution of compounds to these tissues have a major impact on the VD,ss of compounds in human (Davies and Morris, 1993). Björkman (2002) evaluated relative contributions of various tissue partition coefficients (Kp, tissues) in predicting VD,ss in rat and observed an excellent linear correlation (>0.99) between VD,ss when calculated using only Kp values from muscle and fat. In this study, we hypothesized that intracellular partitioning of compounds into human adipocytes and myocytes in vitro could be used as a surrogate to determine fat and muscle Kp values and subsequently be used to estimate human VD,ss. In addition, measuring Kp values directly in human cells could improve translation to human tissues. Higher predictive performance was observed, but only when one of the adipocyte partition or myocyte partition values was included to predict VD,ss (Table 4). Adipocyte and myocyte partition values and predicted VD,ss were highly correlated (r2 > 0.7), suggesting that measurement of partition in only one cell type is adequate. Between the two measurements, adipocyte partition (only) showed better performance, particularly with respect to the percentage of compounds within 2-fold compared with myocyte partition (Kp muscle only). Combination of both adipocyte and myocyte partition in various combinations did not provide significant improvement in VD,ss predictions (Table 4). Although,
showed good correlation to human VD,ss, it failed to predict compounds with low VD,ss (<1 l/kg) because of volume contributions from other tissues (assumption of Kp = 1) (Supplemental Fig. 5a). Surprisingly, predictive performance was lower when fat and muscle volumes were predicted using both adipocyte and myocyte measured data, and the volume of the remaining tissues was predicted using mechanistic Kp prediction method. Only 56% of the compounds were within 3-fold compared with 63% when Kp was assumed to be 1 for other tissues (Table 4). However, it improved prediction of compounds with low VD,ss. Measured adipocyte and myocyte partition data provided in Supplemental Table 3 enable further exploration of VD,ss prediction methods.
Conclusions
One of the purposes of comparing various in silico VD,ss prediction methods was to establish the best in silico approaches to predict VD,ss for de novo compounds. Based on the extensive comparisons of results across the in silico methods (Table 2), we conclude that 1) the mechanistic VD,ss prediction methods using a combination of ML models for predicting physicochemical properties paired with mechanistic equations for Kp or 2) the Wajima method employing predicted rat and dog VD,ss are our recommended in silico approaches to predict human VD,ss. If a larger training data set of chemically diverse VD,ss experimental values is available, then direct ML predictions of VD,ss might be the most computationally efficient and predictive way to process in silico predictions of VD,ss for de novo compounds. Once these de novo compounds have been synthesized in discovery, it is most useful to experimentally measure BPR and fup to get to a more accurate estimation of human VD,ss. Based on our analysis, BPR is the most sensitive physicochemical property to determine VD,ss in silico. Further, we investigated the utility of adipocyte and myocyte partitioning in predicting VD,ss. If fat or muscle partition coefficients are being considered as part of the model, adipocyte Kp measurements may provide more predictive power than either myocyte Kp alone or adipocyte and myocyte combined. In summary, the scale of prediction strategies evaluated and size of data sets used in this study are novel and significantly larger than those presented in the literature thus far. In addition, we investigated novel methodologies such as adipocyte and myocyte partitioning in predicting VD,ss. Finally, we have provided several novel in vitro data sets (e.g., BPR, adipocyte Kp, myocyte Kp) generated using a single protocol for 254 clinical compounds that will enable the research community to further enhance VD,ss prediction methods.
Authorship Contributions
Participated in research design: Polli, Pasikanti, Weber, Murad, Crouch.
Conducted experiments: McComas, Pasikanti.
Performed data analysis: Murad, Madej, Pasikanti, Minnich.
Wrote or contributed to the writing of the manuscript: Pasikanti, Murad, Polli.
Footnotes
- Received September 30, 2020.
- Accepted November 3, 2020.
↵1 N.M. and K.K.P. contributed equally as co-first authors.
This work represents a multi-institutional effort. Funding sources include the following: Lawrence Livermore National Laboratory internal funds; the National Nuclear Security Administration; GlaxoSmithKline, LLC; and federal funds from National Institutes of Health National Cancer Institute and the Department of Health and Human Services [Contract No. 75N91019D00024]. This work was performed under the auspices of the US Department of Energy by Lawrence Livermore National Laboratory [Contract No. DE-AC52-07NA27344].
↵
This article has supplemental material available at dmd.aspetjournals.org.
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes. The authors declare no competing financial interest.
Abbreviations
- AAFE
- absolute average fold error
- ATOM
- Accelerating Therapeutics for Opportunities in Medicine
- BPR
- blood-to-plasma partition ratio
- CHI
- chromatographic hydrophobicity index
- ECFP
- extended connectivity fingerprint
- fup
- fraction unbound in plasma
- Kp
- tissue-to-plasma partition coefficient
- LC/MS/MS
- liquid chromatography with tandem mass spectrometry
- ML
- machine learning
- MOE
- molecular operating environment
- PK
- pharmacokinetic
- RED
- rapid equilibrium dialysis
- VD,ss
- volume of distribution at steady state
- Copyright © 2021 The Author(s).
This is an open access article distributed under the CC BY Attribution 4.0 International license.