Abstract
Predicting in vivo pharmacokinetic parameters such as clearance from in vitro data is a crucial part of the drug-development process. A commonly cited trend holds that drugs that are highly protein-bound and are substrates for hepatic uptake transporters yield the worst predictions. Given this trend, 11 data sets using human microsomes and hepatocytes were evaluated for relationships among prediction accuracy, extent of protein binding, and drug classification under the Biopharmaceutics Drug Disposition Classification System (BDDCS), which makes predictions about transporter effects. As previously reported, both in vitro systems (microsomes and hepatocytes) gave a large number of inaccurate results, defined as predictions falling more than 2-fold outside of in vivo values. The weighted average percentage of inaccurate predictions was 66.5%. BDDCS class 2 drugs, which unlike class 1 compounds are subject to transporter effects in vivo, had a higher percentage of inaccurate predictions and often had slightly larger bias. However, because the weighted average percentage of inaccuracy was high in both classes (81.9% for class 2 and 62.3% for class 1), BDDCS class currently appears to be of limited use for anticipating prediction accuracy. The results of this study emphasize the need for improved experimental methods for in vitro to in vivo extrapolation: physiologically based scaling is still not accurate, and BDDCS class cannot currently identify which predictions will be accurate.
Introduction
The current drug-development process is expensive, time-consuming, and inefficient due to compound attrition (Pammolli et al., 2011). Although failures due to pharmacokinetic parameters have decreased in recent years (Waring et al., 2015), continued improvement in pharmacokinetic predictions is crucial.
Metabolic stability studies are some of the earliest in vitro studies conducted during drug development; they determine the rate and extent to which a molecule is metabolized and can be useful for rank ordering candidates. After in vitro metabolic turnover, or intrinsic clearance (CLint), is measured, in vivo hepatic clearance can be predicted using in vitro to in vivo extrapolation (IVIVE) methods. A common approach is to apply physiologically based scaling factors to the raw in vitro data, such as hepatocellularity for hepatocyte studies or a factor accounting for incomplete microsomal recovery for microsomal studies, and then to apply a model of hepatic disposition, such as the well stirred model (Houston, 1994). Although these predictions are widely used in drug development, their reliability may be overemphasized.
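The scaling-plus-model approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any data set evaluated here: the scaling constants (microsomal protein per gram of liver, hepatocellularity, liver weight per kilogram, and hepatic blood flow) are assumed, illustrative values, and the function names are hypothetical. The optional `fu_inc` argument corrects CLint for binding in the incubation, and `fu_b` is the unbound fraction in blood.

```python
"""Sketch: physiologically based scaling of in vitro CLint plus the well stirred model.

ASSUMPTION: every constant below is an illustrative value for this example only;
published scaling factors vary between laboratories and data sets.
"""

MPPGL_MG_PER_G = 40.0           # assumed mg microsomal protein per g liver
HEPATOCYTES_1E6_PER_G = 120.0   # assumed hepatocellularity, 10^6 cells per g liver
LIVER_G_PER_KG = 21.4           # assumed g liver per kg body weight
Q_H_ML_MIN_KG = 20.7            # assumed hepatic blood flow, ml/min/kg


def scale_microsomal_clint(clint_ul_min_mg: float, fu_inc: float = 1.0) -> float:
    """Convert microsomal CLint (ul/min/mg protein) to whole-body ml/min/kg.

    Dividing by fu_inc gives unbound CLint; fu_inc=1.0 means no incubation binding term.
    """
    ul_min_kg = clint_ul_min_mg / fu_inc * MPPGL_MG_PER_G * LIVER_G_PER_KG
    return ul_min_kg / 1000.0  # ul -> ml


def scale_hepatocyte_clint(clint_ul_min_1e6: float, fu_inc: float = 1.0) -> float:
    """Convert hepatocyte CLint (ul/min/10^6 cells) to whole-body ml/min/kg."""
    ul_min_kg = clint_ul_min_1e6 / fu_inc * HEPATOCYTES_1E6_PER_G * LIVER_G_PER_KG
    return ul_min_kg / 1000.0  # ul -> ml


def well_stirred_clearance(clint_ml_min_kg: float, fu_b: float = 1.0,
                           q_h: float = Q_H_ML_MIN_KG) -> float:
    """Hepatic blood clearance: CL_H = Q_H * fu_b * CLint / (Q_H + fu_b * CLint)."""
    unbound_clint = fu_b * clint_ml_min_kg
    return q_h * unbound_clint / (q_h + unbound_clint)


if __name__ == "__main__":
    # Hypothetical compound: microsomal CLint 20 ul/min/mg, fu_inc 0.8, fu_b 0.1
    clint = scale_microsomal_clint(20.0, fu_inc=0.8)
    print(f"scaled CLint = {clint:.1f} ml/min/kg")
    print(f"predicted CL_H = {well_stirred_clearance(clint, fu_b=0.1):.2f} ml/min/kg")
```

Because CL_H cannot exceed Q_H in this model, predictions for high-clearance drugs are compressed toward hepatic blood flow, whereas for low-clearance drugs CL_H is approximately fu_b × CLint, so errors in CLint or fu_b carry through almost proportionally.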
The first part of the present study examined the current overall accuracy of hepatic clearance predictions in the field. Many groups have attempted IVIVE, developed new models to improve predictions from existing in vitro values, or investigated different experimental setups. A study published in 2006 collected and examined results for 85 compounds and concluded that literature data were scarce (Nagilla et al., 2006); however, much work has been done since then.
When the accuracy of these values has been examined, a prediction bias has been found that is not explained by human variability and experimental uncertainty (Hallifax and Houston, 2009). There is also a commonly cited trend that substrates for hepatic uptake transporters and highly protein-bound compounds yield the poorest predictions (Soars et al., 2007). The Biopharmaceutics Drug Disposition Classification System (BDDCS), which categorizes transporter effects on drug disposition, holds that class 1 compounds exhibit minimal clinically relevant transporter effects, whereas the disposition of class 2 compounds may be governed by transporter effects in the gut and liver (Wu and Benet, 2005). BDDCS has become an important part of early drug discovery for predicting routes of elimination, food effects, and potential drug interactions (Wu and Benet, 2005). Given this trend, the main objective of this study was to determine whether BDDCS class could be a determinant of accurate IVIVE results.
Materials and Methods
A literature search was conducted for previously described compounds for which both in vitro and in vivo clearance data were available. Studies using human microsomes as well as human hepatocytes were considered, as both systems are routinely used in the pharmaceutical industry. Search keywords included “in vitro-in vivo extrapolation,” “intrinsic clearance,” “microsomes,” “hepatocytes,” or combinations of these.
All of the studies considered here used the well stirred model in their predictions, with physiologically based rather than empirical or regression-based scaling factors. Each data set was examined separately, without re-analysis of the previously published data, because each used a different experimental setup (such as the inclusion of serum in incubations) and scaling approach (such as the inclusion of the unbound fractions in blood, fub, and in the incubation, fuinc, versus no binding terms). Similarly, drugs appearing in more than one data set were not removed, because their values differed among data sets. Overall evaluations were also tabulated. The data evaluated can be found in Supplemental Table 1.
Predictions were considered accurate if they fell within 2-fold of the observed in vivo values, a standard cutoff in previous studies (Zuegge et al., 2001; Blanchard et al., 2006; Fagerholm, 2007).
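The 2-fold criterion amounts to a check on the ratio of predicted to observed values. The sketch below assumes that a ratio of exactly 0.5 or 2.0 counts as accurate (the text defines inaccuracy as "more than 2-fold"); the function names and the clearance pairs in the example are hypothetical.

```python
from typing import Iterable, Tuple


def within_two_fold(predicted: float, observed: float) -> bool:
    """True if the predicted value lies within 2-fold of the observed value.

    Assumption: ratios of exactly 0.5 or 2.0 count as accurate.
    """
    if predicted <= 0 or observed <= 0:
        raise ValueError("clearance values must be positive")
    ratio = predicted / observed
    return 0.5 <= ratio <= 2.0


def percent_inaccurate(pairs: Iterable[Tuple[float, float]]) -> float:
    """Percentage of (predicted, observed) pairs falling outside 2-fold."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no predictions supplied")
    n_bad = sum(not within_two_fold(p, o) for p, o in pairs)
    return 100.0 * n_bad / len(pairs)


# Hypothetical (predicted, observed) clearances in ml/min/kg
example = [(4.0, 5.0), (1.0, 6.0), (12.0, 5.0), (9.5, 5.0)]
print(f"{percent_inaccurate(example):.1f}% inaccurate")
```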
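To measure bias, the average fold error (AFE) was calculated using the following equation (Obach et al., 1997):

$$\mathrm{AFE} = 10^{\frac{1}{n}\sum_{i=1}^{n}\log_{10}\left(\frac{\mathrm{predicted}_i}{\mathrm{observed}_i}\right)}$$

where $n$ is the number of predictions and $\mathrm{predicted}_i$ and $\mathrm{observed}_i$ are the predicted and observed in vivo values for drug $i$. An AFE less than 1 (net underprediction) was reported as its reciprocal, so that all reported AFE values are 1 or greater.

The precision was also calculated as the root mean squared error (RMSE) (Sheiner and Beal, 1981):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{predicted}_i - \mathrm{observed}_i\right)^{2}}$$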
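AFE (the antilog of the mean log ratio of predicted to observed values) and RMSE can be computed as follows. This is a sketch assuming paired, positive predicted and observed values; the reciprocal step mirrors the reporting convention stated above, and the function names and example numbers are hypothetical.

```python
import math
from typing import Sequence


def _check(predicted: Sequence[float], observed: Sequence[float]) -> int:
    """Validate inputs and return the number of paired predictions."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equal-length, non-empty sequences")
    return len(predicted)


def average_fold_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """AFE = 10 ** mean(log10(pred / obs)); a value < 1 means net underprediction."""
    n = _check(predicted, observed)
    mean_log = sum(math.log10(p / o) for p, o in zip(predicted, observed)) / n
    return 10 ** mean_log


def reported_afe(afe: float) -> float:
    """Report AFE values below 1 as their reciprocal, so every reported value is >= 1."""
    return afe if afe >= 1.0 else 1.0 / afe


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the inputs."""
    n = _check(predicted, observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Hypothetical predicted and observed clearances (ml/min/kg)
pred = [2.0, 3.0, 10.0]
obs = [5.0, 6.0, 12.0]
print(reported_afe(average_fold_error(pred, obs)), rmse(pred, obs))
```

Note that taking the reciprocal makes the reported AFE a magnitude of bias; the direction (under- versus overprediction) is visible only from the unreported raw value.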
To assign BDDCS classes, two publications categorizing more than 900 and more than 175 drugs, respectively, were consulted (Benet et al., 2011; Hosey et al., 2016). Five compounds were classified here for the first time (class 1: amobarbital, bufuralol, levoprotiline, and triprolidine; class 2: tenidap). Prediction accuracy was compared between class 1 and class 2 drugs, for both of which metabolism is the main route of elimination. Protein binding was also considered when the values used in the prediction calculations were available, as the interplay between protein binding, transporters, and enzymes is known to be important (Benet, 2009). Drugs with high protein binding were defined as having a free fraction less than or equal to 0.05.
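The stratification described here can be sketched as grouping each prediction by BDDCS class and binding category before applying the 2-fold criterion. The record layout, the placement of drugs without binding data in an "unknown" group, and the drug names are hypothetical assumptions for illustration; only the 0.05 free-fraction threshold comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

HIGH_BINDING_FU = 0.05  # free fraction <= 0.05 counts as highly bound (from the text)


@dataclass
class Prediction:
    drug: str
    bddcs_class: int          # 1-4
    fu: Optional[float]       # free fraction used in the prediction, if reported
    predicted: float          # predicted clearance
    observed: float           # observed in vivo clearance


def binding_group(fu: Optional[float]) -> str:
    if fu is None:
        return "unknown"  # assumption: no binding value reported for this drug
    return "high" if fu <= HIGH_BINDING_FU else "low"


def is_inaccurate(p: Prediction) -> bool:
    ratio = p.predicted / p.observed
    return ratio < 0.5 or ratio > 2.0


def percent_inaccurate_by_group(preds: List[Prediction]) -> Dict[Tuple[int, str], float]:
    """Percentage inaccurate for each (BDDCS class, binding group); classes 1 and 2 only."""
    counts = defaultdict(lambda: [0, 0])  # key -> [n_inaccurate, n_total]
    for p in preds:
        if p.bddcs_class not in (1, 2):
            continue
        key = (p.bddcs_class, binding_group(p.fu))
        counts[key][0] += int(is_inaccurate(p))
        counts[key][1] += 1
    return {key: 100.0 * bad / total for key, (bad, total) in counts.items()}


# Hypothetical records
data = [
    Prediction("drug_A", 1, 0.30, 4.0, 5.0),
    Prediction("drug_B", 1, 0.02, 1.0, 5.0),
    Prediction("drug_C", 2, 0.01, 0.5, 4.0),
    Prediction("drug_D", 2, 0.03, 3.0, 4.0),
    Prediction("drug_E", 2, None, 2.0, 9.0),
    Prediction("drug_F", 3, 0.50, 1.0, 1.0),
]
for key, pct in sorted(percent_inaccurate_by_group(data).items()):
    print(key, f"{pct:.0f}%")
```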
Results
Seven papers fit the criteria described above (Obach, 1999; McGinnity et al., 2004; Ito and Houston, 2005; Riley et al., 2005; Brown et al., 2007; Hallifax et al., 2010; Sohlenius-Sternbeck et al., 2010). Hallifax et al. (2010) compiled a large database of predictions from many of the papers also examined here; however, not all drugs from the original papers were included, and different in vivo clearance values were often used for comparison, so that the same drug could be accurately or inaccurately predicted depending on which values were chosen. Furthermore, although the more recent Hallifax et al. (2010) paper could be argued to provide refined values, its percentage of inaccuracy and AFE, both overall and for class 1 and class 2 drugs, are often comparable to or higher than those of the original papers. All papers were therefore examined to obtain a fuller picture of the relationship to BDDCS. Five human microsome data sets, some with multiple scaling options, were included in this evaluation for a total of 332 values, and six human hepatocyte data sets were also included for a total of 332 values. The percentage of inaccurate predictions (more than 2-fold difference) for each data set, as well as the AFE and RMSE, are shown in Table 1. Every data set examined has 41.0% or greater inaccuracy, and AFE values are as high as 21.7. Sohlenius-Sternbeck et al. (2010) provided individual prediction values only for a regression model, so further analysis of that data set could not be conducted; however, because it is the most recent paper examined, its reported summary statistics for the well stirred model with protein binding were still included in Table 1 for comparison. The weighted average percentage of inaccurate results is 66.8% for microsomes, 66.2% for hepatocytes, and 66.5% overall.
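Assuming, as the quoted totals suggest, that each data set's percentage of inaccuracy is weighted by its number of predictions, the weighted average is equivalent to pooling every prediction and recomputing the percentage. A small sketch with hypothetical data-set sizes and percentages (the function name is illustrative):

```python
from typing import Iterable, Tuple


def weighted_percent(datasets: Iterable[Tuple[float, int]]) -> float:
    """Weighted mean of per-data-set percentages.

    Each item is (percent_inaccurate, n_predictions).
    Assumption: the weight is the number of predictions in each data set.
    """
    datasets = list(datasets)
    total_n = sum(n for _, n in datasets)
    if total_n == 0:
        raise ValueError("no predictions")
    return sum(pct * n for pct, n in datasets) / total_n


# Hypothetical data sets: (percentage inaccurate, number of predictions)
microsome_sets = [(50.0, 10), (100.0, 30)]
print(f"{weighted_percent(microsome_sets):.1f}%")  # pooled: 35 of 40 predictions inaccurate
```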
The same papers and data sets were used to examine BDDCS trends. Class 1 and class 2 drugs were compiled from each set, and the inaccuracy of the predictions, AFE, and RMSE for each class were determined (Table 2). As expected, class 2 drugs have a higher percentage of inaccurate predictions than class 1 drugs in every data set except one, in which all predictions were inaccurate. The AFE was either slightly higher or almost identical for class 2 drugs compared with class 1 drugs. Across a total of 305 class 1 drug values, the weighted average percentage of inaccurate predictions is 62.3%. Across a total of 155 class 2 drug values, the weighted average percentage of inaccuracy is 81.9%. [The total number of class 1 and class 2 drugs is less than 644, since individual drugs are not enumerated in Sohlenius-Sternbeck et al. (2010) and some unapproved proprietary compounds are included in other data sets.] For class 1 drugs, microsomal studies have a weighted average of 63.3% inaccuracy, whereas hepatocyte studies have 66.2%. For class 2 drugs, microsomal studies have a weighted average prediction inaccuracy of 85.6%, whereas hepatocyte studies have 78.4%.
Finally, given that transporter substrates and highly bound drugs often have the poorest clearance predictions (Soars et al., 2007), protein-binding differences were examined between the two BDDCS classes. First, the percentage of inaccurately predicted drugs that are also highly protein-bound was determined for each class (Table 3). In every case examined, a larger share of inaccurately predicted class 2 drugs than of class 1 drugs was highly protein-bound: the weighted average was 19.8% for class 1 and 67.3% for class 2. Because class 2 drugs in general are often highly protein-bound (Broccatelli et al., 2012), the percentage of highly bound drugs in each class that were inaccurately predicted was also determined (Table 4). These results agree with several previous reports that highly protein-bound compounds are often poorly predicted. Highly protein-bound class 1 drugs were inaccurately predicted 81.3% of the time, and highly bound class 2 drugs had an 85.7% average inaccuracy rate. In four data sets, highly bound class 2 drugs had a higher percentage of inaccuracy than highly bound class 1 drugs; in one data set, the opposite was true; and in the remaining data set, all highly bound drugs were inaccurately predicted.
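Tables 3 and 4 answer different questions: Table 3 gives the share of inaccurate predictions that involve highly bound drugs, whereas Table 4 gives the share of highly bound drugs that are inaccurately predicted. The first is inflated whenever a class simply contains more highly bound drugs, which is why the second was also computed. The sketch below computes both from hypothetical (free fraction, inaccurate?) records; the function names are illustrative only.

```python
from typing import List, Tuple

HIGH_BINDING_FU = 0.05  # threshold from the Methods

Record = Tuple[float, bool]  # (free fraction, prediction inaccurate?)


def pct_high_among_inaccurate(records: List[Record]) -> float:
    """Table 3 style: of the inaccurate predictions, % that are highly bound."""
    inaccurate = [fu for fu, bad in records if bad]
    if not inaccurate:
        return float("nan")
    return 100.0 * sum(fu <= HIGH_BINDING_FU for fu in inaccurate) / len(inaccurate)


def pct_inaccurate_among_high(records: List[Record]) -> float:
    """Table 4 style: of the highly bound drugs, % that are inaccurately predicted."""
    high = [bad for fu, bad in records if fu <= HIGH_BINDING_FU]
    if not high:
        return float("nan")
    return 100.0 * sum(high) / len(high)


# Hypothetical class: four highly bound drugs, two weakly bound drugs
records = [(0.01, True), (0.02, True), (0.03, False), (0.04, False),
           (0.60, True), (0.70, False)]
print(pct_high_among_inaccurate(records), pct_inaccurate_among_high(records))
```

Here two of the three inaccurate predictions are highly bound, yet only half of the highly bound drugs are inaccurate, so the two percentages can diverge for the same data.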
Comparing the bias of high and low protein-binding drugs in the two classes (Table 5), no clear trend between classes is apparent; however, the bias is always higher for the high protein-binding drugs, except for the Obach (1999) data scaled with fub and fuinc and the Brown et al. (2007) data, where there are only two class 1 high protein-binding drugs and four class 2 low protein-binding drugs, which may skew the results.
Discussion
Being able to accurately predict pharmacokinetic parameters, especially clearance, early in the drug-development process is a key part of lead optimization. However, although some studies have reported success in predicting in vivo clearance from in vitro data, others have questioned its reliability (Masimirembwa et al., 2003). Underpredicting in vivo clearance may result in inefficiency in the drug-discovery pipeline or an ineffective therapeutic dosing regimen, whereas overpredicting in vivo clearance may lead to missed opportunities that were rejected early in the development process (Clarke and Jeffrey, 2001).
The goal of this study was to compile data to examine the accuracy of the prediction methods for in vivo clearance and relate this accuracy to BDDCS class. For the 11 data sets considered, the percentage of inaccurate predictions is large. To understand the true accuracy of in vitro methods, physiologically scaled in vitro estimates were compared directly with observed in vivo clearance, because incorporating established physiologic scaling factors, together with unbound fractions in blood and possibly in the in vitro matrix, should in theory give accurate predictions. This contrasts with the approach of some groups, who derive linear regression equations from reference compound data and then apply an empirical scaling factor to improve predictions further (Sohlenius-Sternbeck et al., 2012). That 66.5% of all predictions are inaccurate indicates that the sources of this inaccuracy must be understood mechanistically before IVIVE predictions can be fully trusted.
BDDCS class and protein binding were then examined to evaluate whether accurate results could be distinguished from inaccurate ones, which would help determine whether future predictions can be trusted. For class 1 drugs (extensively metabolized and highly soluble), transporter effects appear to be overwhelmed, whereas class 2 drugs (also extensively metabolized but poorly soluble) can be affected by efflux transporters in the gut and by both uptake and efflux transporters in the liver (Shugarts and Benet, 2009). Given the trend that poorly predicted compounds are often transporter substrates (Soars et al., 2007), class 1 drugs, which have no clinically relevant transporter effects, were expected to yield better predictions than class 2 drugs. The other part of the trend is that poorly predicted compounds are also often highly protein-bound, which is why protein binding was considered when data were available (Ring et al., 2011). Overall, the hypothesis was that class 2 drugs would be more poorly predicted because they are substrates for transporters, and that these poorly predicted class 2 drugs would also be highly protein-bound.
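The two BDDCS properties mentioned here, solubility and extent of metabolism, define the four classes. The sketch below is a hypothetical helper, not a published tool: it assumes a cutoff of at least 70% of the dose metabolized for "extensive" metabolism (adjustable), takes high or low solubility as an already-determined input, and encodes only the transporter expectations for classes 1 and 2 stated in the text.

```python
from typing import Dict

EXTENSIVE_METABOLISM_CUTOFF = 0.70  # assumption: fraction of dose metabolized


def bddcs_class(highly_soluble: bool, fraction_metabolized: float,
                cutoff: float = EXTENSIVE_METABOLISM_CUTOFF) -> int:
    """Return BDDCS class 1-4 from solubility and extent of metabolism."""
    if not 0.0 <= fraction_metabolized <= 1.0:
        raise ValueError("fraction_metabolized must be between 0 and 1")
    extensive = fraction_metabolized >= cutoff
    if extensive:
        return 1 if highly_soluble else 2
    return 3 if highly_soluble else 4


# Transporter expectations for the two classes compared in this study (from the text)
TRANSPORTER_EXPECTATION: Dict[int, str] = {
    1: "minimal clinically relevant transporter effects",
    2: "gut efflux; hepatic uptake and efflux may matter",
}

# Hypothetical compound: poorly soluble, 95% metabolized
cls = bddcs_class(highly_soluble=False, fraction_metabolized=0.95)
print(cls, TRANSPORTER_EXPECTATION.get(cls, "not examined here"))
```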
As expected, class 2 drugs yielded poorer predictions in every data set in which the two classes differed; however, inaccuracy was large for both class 1 and class 2 drugs. Class 2 drugs also often had a higher AFE, but the difference was too small (or sometimes absent) to indicate a class-dependent bias. AFE is a better measure of bias here than RMSE, which is strongly influenced by the marked differences in CLint values from study to study. For example, the values reported by Brown et al. (2007) for predicted and measured CLint for propofol were 2773 and 5052 ml/min/kg, respectively, whereas for the same drug McGinnity et al. (2004) reported 283 and 24 ml/min/kg. With current methodology, BDDCS class cannot reliably indicate whether predictions will be accurate. This agrees with Poulin et al. (2012), who found that predictivity was similar between classes for a human microsome data set of 42 drugs. Interestingly, microsomes and hepatocytes gave similar prediction accuracies for both class 1 and class 2 drugs. A bigger difference between the two systems would have been expected for class 2 drugs, for which transporters play a role, because the necessary uptake transporters are absent from microsomes. This again emphasizes that major determinants are likely missing when the interplay between protein binding, uptake, and metabolism is mimicked in vitro.
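The contrast between AFE and RMSE can be seen with the propofol values quoted above. Treating the two (predicted, measured) pairs as one small data set, purely as an illustration and not as an analysis performed in this study, the RMSE is dominated by the larger-magnitude Brown et al. (2007) pair, and multiplying every value by 10 multiplies the RMSE by 10 while leaving AFE unchanged, because AFE depends only on ratios.

```python
import math
from typing import List


def afe(pred: List[float], obs: List[float]) -> float:
    """Average fold error: 10 ** mean(log10(pred / obs))."""
    return 10 ** (sum(math.log10(p / o) for p, o in zip(pred, obs)) / len(pred))


def rmse(pred: List[float], obs: List[float]) -> float:
    """Root mean squared error in the units of the inputs."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


# Illustration only: pooling the two studies' propofol CLint values (ml/min/kg)
# First pair: Brown et al. (2007); second pair: McGinnity et al. (2004)
pred = [2773.0, 283.0]
obs = [5052.0, 24.0]

# Rescale every value by 10 to show which metric depends on magnitude
scaled_pred = [10 * p for p in pred]
scaled_obs = [10 * o for o in obs]

print(f"AFE  {afe(pred, obs):.2f} vs {afe(scaled_pred, scaled_obs):.2f}")
print(f"RMSE {rmse(pred, obs):.0f} vs {rmse(scaled_pred, scaled_obs):.0f}")

# Share of the squared error contributed by the Brown et al. pair
sq = [(p - o) ** 2 for p, o in zip(pred, obs)]
print(f"Brown et al. share of squared error: {sq[0] / sum(sq):.1%}")
```

The McGinnity et al. pair is the larger fold error (about 12-fold versus about 1.8-fold), yet it contributes only a small fraction of the squared error, which is why RMSE is a poor guide to bias when CLint magnitudes differ widely between studies.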
Poulin et al. (2012) also suggested that an approach involving determination of effective fraction unbound in plasma based on albumin-facilitated hepatic uptake of acidic/neutral drugs improved the prediction accuracy and precision for 25 high protein-binding drugs. Hallifax and Houston (2012) examined this approach for 107 drugs studied in hepatocytes and microsomes, also finding an increase in prediction accuracy but no change in precision, and reported that there was no evidence that prediction bias was associated with measured fraction unbound in plasma. These latter authors emphasized the need for further “mechanistic elucidation to improve prediction methodology rather than empirical correction of bias” (Hallifax and Houston, 2012).
Finally, protein binding was considered together with BDDCS class. Given current trends, class 2 drugs with high protein binding would have been expected to yield the poorest results. More inaccurately predicted class 2 drugs than class 1 drugs were highly protein-bound, but this may be because class 2 drugs generally have higher protein binding than class 1 drugs (Broccatelli et al., 2012). This, together with a possible slight dependence of bias on protein binding, observed both here and previously with hepatocytes by Hallifax et al. (2010), could explain some of the difference in inaccuracy between class 1 and class 2 drugs. However, on average, highly bound drugs in both classes had similarly high percentages of inaccuracy, and there were no clear trends between classes in the bias or precision of highly bound drugs.
This study emphasizes that in vitro to in vivo extrapolation of hepatic clearance needs to be improved through a better understanding of clearance mechanisms, as in vitro methods on their own are often inaccurate, and BDDCS class alone cannot identify which compounds will have accurate predictions.
Authorship Contributions
Participated in research design: Bowman, Benet.
Conducted experiments: Bowman.
Performed data analysis: Bowman.
Wrote or contributed to the writing of the manuscript: Bowman, Benet.
Footnotes
- Received May 11, 2016.
- Accepted August 11, 2016.
C.M.B. was supported by the National Science Foundation Graduate Research Fellowship Program [Grant 1144247].
This article has supplemental material available at dmd.aspetjournals.org.
Abbreviations
- AFE: average fold error
- BDDCS: Biopharmaceutics Drug Disposition Classification System
- CLint: intrinsic clearance
- IVIVE: in vitro to in vivo extrapolation
- RMSE: root mean squared error
Copyright © 2016 by The American Society for Pharmacology and Experimental Therapeutics