Abstract
The Innovation and Quality Induction Working Group presents an assessment of best practice for data interpretation of in vitro induction, specifically, response thresholds, variability, application of controls, and translation to clinical risk assessment with focus on CYP3A4 mRNA. Single concentration control data and Emax/EC50 data for prototypical CYP3A4 inducers were compiled from many human hepatocyte donors in different laboratories. Clinical CYP3A induction and in vitro data were gathered for 51 compounds, 16 of which were proprietary. A large degree of variability was observed in both the clinical and in vitro induction responses; however, analysis confirmed in vitro data are able to predict clinical induction risk. Following extensive examination of this large data set, the following recommendations are proposed. a) Cytochrome P450 induction should continue to be evaluated in three separate human donors in vitro. b) In light of empirically divergent responses in rifampicin control and most test inducers, normalization of data to percent positive control appears to be of limited benefit. c) With concentration dependence, 2-fold induction is an acceptable threshold for positive identification of in vitro CYP3A4 mRNA induction. d) To reduce the risk of false positives, in the absence of a concentration-dependent response, induction ≥ 2-fold should be observed in more than one donor to classify a compound as an in vitro inducer. e) If qualifying a compound as negative for CYP3A4 mRNA induction, the magnitude of maximal rifampicin response in that donor should be ≥ 10-fold. f) Inclusion of a negative control adds no value beyond that of the vehicle control.
Introduction
Regulatory agencies have issued guidelines and guidances for the conduct of drug-drug interaction (DDI) studies with specific sections focusing on human cytochrome P450 (P450) induction. The European Medicines Agency (EMA) 2012 guideline (http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2012/07/WC500129606.pdf), the Pharmaceutical and Medical Devices Agency (PMDA) 2014 guidance [Drug Interaction Guideline for Drug Development and Labeling Recommendations (The Japanese Ministry of Health, Labour, and Welfare MHLW), updated 2017, English translation not yet available], and the Food and Drug Administration (FDA) 2017 draft guidance (In Vitro Metabolism and Transporter Mediated Drug-Drug Interaction Studies Guidance for Industry https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM581965.pdf; http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm292362.pdf) specify that in vitro P450 induction assessment be conducted in human hepatocytes from three different donors using mRNA as the primary endpoint. All three agencies consider a 2-fold increase in mRNA the threshold for a positive in vitro induction signal. The EMA and PMDA also specify that this increase must be concentration dependent. The FDA states that a ≥2-fold increase and a response of ≥20% of positive control are interpreted as a positive finding. The EMA and PMDA state that an in vitro induction response of <100% (i.e., <2-fold) is only negative if it is also <20% of the positive control response. The agencies agree that evaluation should adequately explore clinically relevant drug concentrations for the maximum therapeutic dose, although the exact definition differs. EMA calls for 50-fold mean maximum steady-state (Cmax,ss) unbound (Cmax,ss,u) concentration for hepatic and 0.1 × dose/250 ml for intestinal induction assessment. The PMDA requests at least 10-fold Cmax,ss,u. 
The FDA asks that, if solubility allows, at least one concentration should be an order of magnitude greater than Cmax,ss,u, with the caveat that if protein binding is >99%, the fraction unbound in plasma should be set to 0.01. All three agencies agree that the in vitro donor providing the most sensitive, worst-case positive response be used to determine the clinical induction risk.
Once an in vitro induction assessment has been deemed positive, the agencies provide recommendations for subsequent assessment of whether a clinical DDI study is warranted. This step involves the use of mathematical models to predict the DDI risk based on the relevant clinical concentration and in vitro maximum fold increase (or induction) minus baseline of 1-fold (Emax) and EC50 values. Risk assessment falls into three general categories: 1) basic models or R values; 2) correlation methods, where extensive in vitro calibration is performed (Fahmi and Ripp, 2010); or 3) mechanistic models that use either static or dynamic concentrations of inducer to predict the area under the curve (AUC) ratio (AUCR). The latter two approaches use the clinical definitions of bioequivalence for DDI to flag induction risk, namely, a victim drug AUCR of 0.8 or less. The simplest calculation or R value approach (see equation A in Table 1), is recommended as a first step by the FDA and PMDA but not the EMA, where the concentration achieving 2-fold induction (F2) is considered the basic method (Table 1, equation B). Interestingly, the 2017 FDA draft guidance added a 10-fold multiplier to unbound drug concentration and changed the threshold from R < 0.9 to R < 0.8 as a trigger for further evaluation of DDI risk (Table 1, equation C). Common to all three agency recommendations are the static mechanistic model (Einolf, 2007; Einolf et al., 2014; Vieira et al., 2014), which considers induction at both the hepatic and intestinal level (for CYP3A inducers) in relation to the fraction of victim drug that is metabolized by a specific P450 (Table 1, equation D), and a correlation method, the relative induction score (RIS) (Fahmi and Ripp, 2010) (Table 1, equation E), which relies on calibration to known clinical inducers in that human hepatocyte donor. 
Notably, the FDA and PMDA (but not the EMA) guidances include an option of dynamic mechanistic assessment, such as physiologically based pharmacokinetic modeling, for induction DDI. Finally, when a test compound exhibits both in vitro P450 induction and inhibition (either reversible or time dependent), the FDA and EMA both caution against assessing the risk of induction and inhibition in a combined approach.
The International Consortium of Innovation and Quality (IQ) in Pharmaceutical Development Induction Working Group (IWG) recently highlighted several areas of regulatory recommendations that would benefit from further evaluation (Hariparsad et al., 2017). Recommendations from the IWG were provided on the evaluation of downregulation, in vitro assessment of CYP2C induction, and the use of CITCO (6-(4-chlorophenyl)imidazo[2,1-b][1,3]thiazole-5-carbaldehyde-O-(3,4-dichlorobenzyl)oxime) as a positive control for CYP2B6. Two other areas were highlighted by the IWG for further evaluation, namely, in vitro data interpretation and induction time course. This paper focuses on data interpretation; specifically, what constitutes a positive in vitro induction signal and how to assess whether this induction signal is clinically relevant.
IQ member companies shared blinded clinical induction data for proprietary compounds along with the corresponding in vitro data. The literature reports of clinical induction are dominated by CYP3A, with very few examples of CYP1A2 (Gabriel et al., 2016) and CYP2B6 (Fahmi et al., 2016). The data set gathered reflected this and all data were for CYP3A4, with the exception of one clinically relevant CYP1A2 DDI. Therefore, the following evaluation of in vitro P450 induction data interpretation, namely, response thresholds, variability, application of controls and translation to clinical risk assessment, and the subsequent recommendations are focused on induction of CYP3A4.
Materials and Methods
Proprietary Inducer Data from within IQ Member Companies
To allow for an assessment of induction by proprietary compounds from IQ consortium member companies, a template (https://iqconsortium.org/initiatives/working-groups/induction/) was developed to collate the necessary data and Supplemental Material. The survey was distributed by the IQ Secretariat to representatives of IQ Consortium member companies. It was stipulated that responses should be reflective of the company since only one response was permitted from each company. Surveys were returned to the IQ Secretariat, who then blinded the data as unnamed company and compound, for example “Company A compound 1.” This was then streamlined to compound (Cmpd) for Cmpds 1–16. Compound identity was further blinded by requiring both in vitro and in vivo data in molar concentrations and withholding the molecular weight. Companies were asked to provide regulatory quality data rather than discovery screening data, and where available to include data for positive and negative controls that were run in the same assay as the test compound. The template was built to be relatively exhaustive and to collect the majority of the data generated in an in vitro induction study. As with any survey, limitations do exist, including the expectation that all information requested in the template would not be provided by every company (Hariparsad et al., 2017). Different assay designs—and especially data from studies before the 2012 EMA and FDA regulatory guidances—would often result in less comprehensive data sets. Companies were also asked to provide any evidence of time-dependent inhibition and/or autoinduction, in vitro and in vivo.
In vitro parameters collected included time of incubation, cellular overlay (i.e., matrigel), plate layout (e.g., 96-well plates), media used, supplements added, any additional protein in the media, any viability method and viability cutoff values for cytotoxicity assessment, housekeeping gene used, method of mRNA analysis, probe substrates for P450 activity, enzyme(s) involved in the compound’s metabolism, and estimation of the fraction metabolized by P450 (i.e., the fraction of dose eliminated by a specific P450).
Clinical data requested included Cmax, average concentration, and AUC, at both single and multiple doses of the proprietary compound, along with the blood-to-plasma ratio and fraction unbound in plasma. For the DDI study, companies provided the identity of the probe drug, the dosing regimen, and the AUC, Cmax, and time to maximal concentration of the probe drug before and after administration of the potential inducer to steady state.
Prototypical Inducer Data from the Literature
In vivo DDI data used for this analysis were also gathered from the University of Washington (Seattle, WA) drug interaction database (www.druginteractioninfo.org). The objects (hereafter, called victim drugs) included in this assessment were those recommended by the FDA (https://www.fda.gov/Drugs/DevelopmentApprovalProcess/DevelopmentResources/DrugInteractionsLabeling/ucm093664.htm#table3-1). In addition to collecting the CYP3A clinical induction studies by considering the substrates recommended by regulatory agencies (designated as CYP3A sensitive), a second-tier data collection was employed. Here, the focus was to collect all positive and negative clinical induction studies for the perpetrators to build knowledge around the thresholds for true in vitro and in vivo negatives. When CYP3A was determined to contribute to the overall metabolism of the victim drug, the clinical study was included as part of the “all data” or complete analysis. Additionally, to account for perpetrators that exhibited both in vitro induction and inhibition mechanisms (reversible or time dependent) positive and negative clinical inhibition studies were also collected from the University of Washington drug interaction database and sorted in the same manner as described for the clinical induction studies. A minimum of 5 days of repeat dosing was selected as the threshold to include in clinical studies since this would likely establish steady-state conditions by taking into account the half-life of both the clinical inducer and CYP3A enzyme (reported to be 23–87 hours) (Ramsden et al., 2015). The clinical data set collected for rifampicin was limited to a dose level of 600 mg daily, which is the therapeutically relevant dose resulting in maximal in vivo induction (Kozawa et al., 2009). Additionally, the dose level for ritonavir was restricted to >100 mg daily to reflect both its clinical use as a boosting agent and earlier therapeutic doses (Ruane et al., 2007). 
Clinical induction data were collected for compounds with existing in vitro data made available from member companies and focused on identification of compounds with mild or no clinical induction. Therefore, not all clinically relevant inducers are captured within this data set (e.g., modafinil and avasimibe).
Median as well as worst-case clinical AUCR values were used to evaluate the ability of the in vitro parameters to predict the observed clinical effect. (The median is preferable to the mean in representing the center of a population because it is less susceptible to bias when non-normality or outliers are present). In the case of the in vitro parameters, both the worst-case donor and median induction parameters were used for modeling purposes. Using the complete set of in vitro data to fit a three-parameter sigmoidal dose-response model, a common Hill function model used in pharmacology (Table 1, equation F), correlation approaches were established using the slope and RIS. The RIS model was used as described previously (Fahmi and Ripp, 2010) by fitting the data using the Cmax,ss,u of inducers to generate a curve against known clinical induction response and then inputting the Cmax,ss,u of test compounds to predict the percentage of change in the AUC. The estimated portal concentration in the RIS model was also applied, as recommended in the EMA guideline. In the case of literature compounds, the gut concentration was estimated for evaluation of the F2 value (Table 1, equation B) and for inclusion into the mechanistic static models. The mechanistic static model was evaluated with input concentrations by using the estimated portal concentration and the estimated gut concentration, as recommended by regulatory guidances. In addition, the Cmax,ss,u was used for the hepatic portion and the calculated hepatic portal concentration was used as the input for the gut portion. The concentration resulting in 2-fold induction (F2) was used, as described in the EMA guideline, by considering 30- and 50-fold Cmax,ss,u as the inducer concentration. 
The R3 model, as described in the FDA DDI guidance from 2012, was evaluated using multiple approaches: total Cmax,ss and Cmax,ss,u with a cutoff value of 0.9 and a d value of 1 [R3 = 0.9 (total and unbound)]; total Cmax,ss and a cutoff value of 0.8 (R3 = 0.8, d = 1); gut concentration as the input (gut), cutoff value of 0.95 and the Cmax,ss,u (R3 = 0.95); applying a universal scaling factor value of 0.3 determined from empirical fitting of the full data set to varying d values with the goal of increasing the quantitative accuracy (R3 = 0.9, d = 0.3, with the total Cmax,ss as input); slope value with the total Cmax,ss and Cmax,ss,u as inputs [R3 = 0.9, slope (total), R3 = 0.9, slope (unbound)]; the average unbound or total concentration (average unbound, average total); and finally, limiting the minimum fraction unbound in plasma to 1%, i.e., capping plasma protein binding at 99% (fraction unbound in plasma ≥0.01). In addition, the recommended approach in the draft FDA and PMDA DDI guidance documents from 2017 was evaluated by using the R3 equation as described previously with a 10-fold multiplier for inducer concentration. Additionally, a 50-fold multiplier for inducer concentration was used to explore the impact on the number of false negative induction DDI predictions.
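As an illustration of the R3 variants listed above, the following minimal Python sketch assumes the form of the basic model given in the FDA guidance, R3 = 1/(1 + d × Emax × [I]/(EC50 + [I])); Table 1 is not reproduced in this section, so the exact form of equation A is an assumption here. The function names, the `multiplier` argument, and the numerical inputs are hypothetical and are not taken from the data set.

```python
def r3(emax, ec50, conc, d=1.0, multiplier=1.0):
    """Basic induction R value, assuming R3 = 1 / (1 + d*Emax*[I] / (EC50 + [I])).

    emax       -- maximal fold induction minus the 1-fold baseline
    ec50, conc -- same concentration units (e.g., µM)
    d          -- empirical scaling factor (1 by default; 0.3 was also explored)
    multiplier -- factor applied to the inducer concentration (e.g., 10 or 50)
    """
    inducer = multiplier * conc
    return 1.0 / (1.0 + d * emax * inducer / (ec50 + inducer))


def flags_induction(r_value, cutoff=0.9):
    """True when the R value falls below the chosen cutoff (e.g., 0.9 or 0.8)."""
    return r_value < cutoff


# Hypothetical inducer: Emax = 9 (10-fold maximum), EC50 = 1 µM, Cmax,ss,u = 0.1 µM
r_plain = r3(emax=9.0, ec50=1.0, conc=0.1)                      # d = 1, no multiplier
r_tenfold = r3(emax=9.0, ec50=1.0, conc=0.1, multiplier=10.0)   # 10-fold multiplier
r_scaled = r3(emax=9.0, ec50=1.0, conc=0.1, d=0.3)              # d = 0.3
print(r_plain, r_tenfold, r_scaled)
```

Because the cutoff and the input concentration are varied independently, the same function covers each variant in the list by changing only its arguments.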
Culture of Cryopreserved Human Hepatocytes for Induction
The in vitro data presented encompass data from member companies for proprietary and well-known or prototypical inducer compounds, data from the literature, and data generated by the IWG. Different conditions were employed by laboratories (Hariparsad et al., 2017) that reflect general protocols for generating in vitro induction data. Various lots of human cryopreserved hepatocytes, from both males and females of different ages and racial origin, were obtained from several commercial vendors: CellzDirect (Durham, NC), Bioreclamation In Vitro Technologies (Baltimore, MD), Corning Life Sciences (Woburn, MA), and XenoTech LLC (Kansas City, KS). As detailed in previous publications (Fahmi et al., 2010; Sane et al., 2016), cryopreserved human hepatocytes were thawed in hepatocyte thawing medium and seeded in collagen I coated 24- or 96-well plates at cell densities of 0.5–1 × 10^6 viable cells per well in hepatocyte plating medium. Viability, as determined by trypan blue exclusion or other methods, was 85% or better when cells were plated. The cells were initially maintained overnight at 37°C in a humidified incubator, with 95% atmospheric air and 5% CO2, in hepatocyte incubation media. Following overnight incubation, the cells were either treated with compounds or were overlaid with matrigel to form sandwich cultures, maintained for an additional 24 hours, and then treated with compounds. Compounds were dissolved in dimethylsulfoxide (DMSO) and added to the culture medium at various concentrations (final DMSO concentration, 0.1% or 0.5%). After daily treatment for 2 to 3 days, the medium was removed and the cells were washed with phosphate-buffered saline. The cells were lysed in lysis buffer and prepared for RNA isolation. Cell viability was assessed by visual inspection of the monolayer, checking for confluency and morphology. Different companies used different plating conditions and a representation of the conditions is shown in Supplemental Table 1.
mRNA Preparation and Analysis
Following the isolation of RNA with commercially available kits, cDNA was synthesized using standard polymerase chain reaction (PCR) protocols. The mRNA levels of the designated P450 enzymes and of an endogenous control [e.g., glyceraldehyde-3-phosphate dehydrogenase (GAPDH)] were quantified by real-time PCR. The gene-specific primer/probe sets were typically obtained from Applied Biosystems Incorporated (Foster City, CA). The quantity of the target cDNA relative to that of the housekeeping gene was determined by the delta delta cycle time (Ct) method (Livak and Schmittgen, 2001). This relative quantification measures the change in mRNA expression in a test sample relative to that in a vehicle control sample (final DMSO concentration, 0.1% or 0.5%). To reduce variability, Ct values >32 were excluded from the analysis, since such values are indicative of low expression.
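The delta delta Ct calculation can be sketched as follows. This is a minimal Python illustration of the Livak and Schmittgen (2001) relation, fold = 2^(−ΔΔCt), which assumes approximately 100% amplification efficiency. The function name and the example Ct values are hypothetical, and applying the Ct >32 exclusion to all four inputs is an assumption, since the text does not specify which Ct values the cutoff applies to.

```python
def fold_induction(ct_target_treated, ct_ref_treated,
                   ct_target_vehicle, ct_ref_vehicle, ct_cutoff=32.0):
    """Fold induction by the delta delta Ct method (fold = 2 ** -ddCt).

    Returns None when any Ct exceeds the cutoff (assumption: the exclusion
    rule is applied to target and housekeeping genes alike).
    """
    cts = (ct_target_treated, ct_ref_treated, ct_target_vehicle, ct_ref_vehicle)
    if any(ct > ct_cutoff for ct in cts):
        return None
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, treated sample
    delta_vehicle = ct_target_vehicle - ct_ref_vehicle   # dCt, vehicle control
    delta_delta = delta_treated - delta_vehicle          # ddCt
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: CYP3A4 falls from Ct 25 (vehicle) to Ct 22 (treated),
# with the housekeeping gene unchanged at Ct 18, giving ddCt = -3.
print(fold_induction(22.0, 18.0, 25.0, 18.0))
```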
CYP3A Enzyme Activity
Midazolam 1′-hydroxylase or testosterone 6β-hydroxylase activities were measured in situ with methods similar to those described by Zhang et al. (2010). Briefly, following the treatment period, cell culture medium was removed, hepatocytes were rinsed, and marker substrate reactions were started by the addition of either midazolam (30 µM) or testosterone (200 µM). Following 30-minute incubation at 37°C, marker substrate reactions were stopped by removal of an aliquot from each well and combining with acetonitrile containing internal standard (deuterated metabolite). Metabolite formation was quantified by liquid chromatography–tandem mass spectrometry.
In Vitro Human Hepatocyte Induction Assay for Clinically Weak Inducers
In vitro induction data for clinically weak inducers (defined as eliciting a clinical AUCR of 0.5–0.8 for a victim drug) were available for most compounds from literature resources or IWG member companies. In vitro induction parameters were generated for felbamate, rufinamide, oxcarbazepine, flucloxacillin, and lersivirine using four human hepatocyte donors in four laboratories since no published or IWG-derived values were available. The human hepatocyte donors were obtained from different commercial vendors, including Triangle Research Laboratories (Durham, NC), Bioreclamation In Vitro Technologies, Corning Life Sciences, and XenoTech LLC. The tested compounds were purchased from Sigma-Aldrich (St. Louis, MO) or MedChem Express (Monmouth Junction, NJ). The member companies followed their internal induction protocols to generate the data. Two companies used sandwich-cultured hepatocytes and two used monolayer-cultured hepatocytes. Top test concentrations were selected to cover the estimated gut exposure (0.1 × Dose/250 ml) and 50-fold Cmax,ss,u, with consideration of solubility and cytotoxicity limits. Compounds were dissolved in DMSO and added to the culture medium at seven or eight concentrations (final DMSO concentration, 0.1% or 0.5%).
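The selection of a top test concentration can be illustrated with a short Python sketch. It assumes the dose is given in milligrams and converted to micromolar using the molecular weight, and that the top concentration is the larger of the estimated gut exposure and 50-fold Cmax,ss,u, capped at a solubility limit when one is supplied. The function names, the capping rule, and the numerical inputs are hypothetical; cytotoxicity limits are not modeled.

```python
def gut_concentration_um(dose_mg, mol_weight):
    """Estimated gut concentration in µM, from 0.1 x dose / 250 ml."""
    grams_per_litre = 0.1 * dose_mg / 250.0    # mg/ml is numerically g/l
    return grams_per_litre / mol_weight * 1e6  # mol/l converted to µM


def top_test_concentration_um(dose_mg, mol_weight, cmax_ss_u_um,
                              solubility_limit_um=None):
    """Larger of gut exposure and 50-fold Cmax,ss,u, capped by solubility."""
    target = max(gut_concentration_um(dose_mg, mol_weight), 50.0 * cmax_ss_u_um)
    if solubility_limit_um is not None:
        target = min(target, solubility_limit_um)
    return target


# Hypothetical compound: 500 mg dose, molecular weight 500 g/mol, Cmax,ss,u 1 µM
print(gut_concentration_um(500.0, 500.0))
print(top_test_concentration_um(500.0, 500.0, 1.0, solubility_limit_um=300.0))
```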
In Vitro Reversible and Time-Dependent P450 Inhibition for Prototypical Inducers
Using the University of Washington drug interaction database, a literature review was conducted to evaluate whether the in vitro inducers were also in vitro reversible or time-dependent inhibitors. In cases where inhibition parameters were available from the literature, the data were scrutinized to ensure that the methodology for deriving the parameters was sound. Where information on the inhibition potential was not available, the inhibition potential was evaluated by the IWG and used to determine whether mixed mechanisms of DDI (inhibition and induction) could impact the in vitro in vivo extrapolation (IVIVE) (see the Supplemental Material for the methods).
Analysis of Basal Enzyme Levels and Single Point Data of Vehicle and Negative and Positive Controls
Member companies were invited to submit historical in vitro induction data sets obtained from multiple repeated experiments with single concentration negative and positive control inducers. Given the limited application of negative controls across participating laboratories, flumazenil was selected for further evaluation as a negative control. An additional consideration was the availability of in vitro CYP3A data sets with sufficient size to perform statistical analysis. Specifically, statistical analysis of intradonor variability was performed on CYP3A4 mRNA from flumazenil-treated hepatocyte donors, where there existed a minimum of 20 repeated experiments. Based on this selection criterion, subsequent data analysis was performed on 10 individual hepatocyte donors from a single participating laboratory. For data sets with positive control inducers using single concentrations, data analysis was performed on 15 individual hepatocyte donors from two participating laboratories, where a minimum of 10 repeated experiments were available. Both CYP3A4 mRNA and CYP3A enzyme activity were analyzed.
The intradonor variation in rifampicin fold-induction response was further interrogated in three hepatocyte donors, namely, H2, H4, and H12, which were selected based on variability observed in rifampicin-CYP3A4 mRNA response to be representative of low, mid, and high intradonor variability with large sample sizes. Where available, additional gene expression (reverse transcription PCR) data for CYP3A4 and the relevant housekeeping gene (18S or GAPDH) from the vehicle control (DMSO), positive control (rifampicin), and negative control (flumazenil) treatment groups were also analyzed. These data sets included the Ct, ΔCt (i.e., the change in Ct for the gene of interest relative to housekeeping gene), and fold induction ΔΔCt (i.e., the change in ΔCt for the test compound relative to vehicle control) values. Similarly, these laboratories supplied additional data for CYP3A activity, including enzymatic rates (midazolam 1′-hydroxylation or testosterone 6β-hydroxylation) for the vehicle control (DMSO) and rifampicin-treated groups.
Data Normalization as Percentage of Positive Control
Test compound maximum fold-induction data were expressed as percentage of positive control rifampicin response, where the total signal is the signal from the positive control (e.g., 10 μM rifampicin) and the blank signal is the signal from the solvent-treated wells (or 1-fold) (see Table 1, equation G) (Sinz et al., 2006). To maximize the available data for analysis, several sources of in vitro induction data were combined: IWG-generated data for weak clinical inducers, IWG-gathered member data for prototypical and proprietary compounds, and data published by Zhang et al. (2014). Data were normalized to the rifampicin-fitted maximal fold induction rather than the response at a given concentration (e.g., 10 μM rifampicin) since this was not available for all data sets.
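The normalization described above can be written as a one-line calculation. The Python sketch below assumes the usual form, percent of positive control = (test signal − blank)/(total signal − blank) × 100, with the blank fixed at 1-fold; Table 1, equation G is not reproduced in this section, so this form is an assumption, and the function name and example values are hypothetical.

```python
def percent_positive_control(fold_test, fold_positive, blank=1.0):
    """Test compound response as a percentage of the positive control response.

    fold_test     -- maximal fold induction of the test compound
    fold_positive -- fold induction of the positive control (e.g., rifampicin)
    blank         -- vehicle (solvent) response, 1-fold by definition
    """
    if fold_positive <= blank:
        raise ValueError("positive control shows no induction above vehicle")
    return (fold_test - blank) / (fold_positive - blank) * 100.0


# Hypothetical example: 3-fold test response against an 11-fold rifampicin response
print(percent_positive_control(3.0, 11.0))
```

In this example a 3-fold response exceeds the 2-fold threshold yet sits exactly at 20% of the positive control, which illustrates how the two criteria can diverge as the rifampicin response varies between donors.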
In Vitro Data Analysis: Curve Fitting, Emax, EC50, F2, and Slope Analysis
In vitro concentration-induction response data were collected from the literature or provided by IWG member companies. Data selected for analysis were considered to meet quality criteria if the tested concentration range included adequate points to define a baseline (no response) and a maximal effect response prior to fitting. Ideally, typical sigmoidal concentration-response data span no effect to full effect, with a minimum of 5 to 6 data points. Nonlinear regression analysis has been recommended for fitting concentration-dependent induction response, as described previously for a typical physiology or pharmacology response (Meddings et al., 1989). To remove data fitting as a source of variability, collated induction data were refit using the sigmoidal model described previously using GraphPad Prism versions 6.0 and 7.0 (GraphPad Software, La Jolla, CA). Induction parameters were determined by plotting the in vitro fold-induction data (mRNA and enzyme activity normalized to the control) against the nominal in vitro concentration using GraphPad Prism and two concentration-response models (Table 1, equations F and H). The baseline was set to 1, assuming that the vehicle control represents no change and equals a fold induction of 1. The best-fitting model was determined based on a sum of squares F-test and Akaike’s information criterion. Note that for IVIVE, the maximal fold induction was converted to Emax by subtracting the baseline of 1-fold. In the case of atypical or bell-shaped concentration-response curves, where the higher concentration gave a lower response than the preceding concentration by more than 20%, the higher concentration data were excluded from the fitting. In most of these cases cytotoxicity was a plausible explanation for decreased induction response at higher concentrations. 
Note that assessment of cytotoxicity was defined by the laboratory that generated the data; a summary of these methods was provided in a previous IWG publication (Hariparsad et al., 2017). No other data exclusion criteria were applied. The initial slope was also determined by fitting the data using linear regression in GraphPad Prism as a surrogate for full induction parameters in the cases where solubility or cytotoxicity may limit the ability to estimate the clinical risk from the in vitro data.
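Two of the steps above lend themselves to a brief sketch: the exclusion of high-concentration points from bell-shaped curves, and the derivation of F2 from fitted parameters. The Python below is illustrative only. It assumes a three-parameter model with a Hill slope of 1 and a baseline of 1, fold = 1 + Emax × C/(EC50 + C), and it assumes that all concentrations from the first >20% drop onward are excluded; neither detail is stated explicitly in this section, and the function names and data are hypothetical. The actual fits were performed in GraphPad Prism.

```python
def truncate_bell_shape(concs, folds, max_drop=0.20):
    """Drop points from the first response that falls more than max_drop
    below the preceding response (inputs sorted by ascending concentration)."""
    kept_c, kept_f = [concs[0]], [folds[0]]
    for conc, fold in zip(concs[1:], folds[1:]):
        if fold < kept_f[-1] * (1.0 - max_drop):
            break  # assumption: this and all higher concentrations are excluded
        kept_c.append(conc)
        kept_f.append(fold)
    return kept_c, kept_f


def fold_model(conc, emax, ec50):
    """Fold induction with the baseline fixed at 1 (Hill slope assumed to be 1)."""
    return 1.0 + emax * conc / (ec50 + conc)


def f2(emax, ec50):
    """Concentration giving 2-fold induction; None if 2-fold is never reached."""
    if emax <= 1.0:
        return None
    return ec50 / (emax - 1.0)


# Hypothetical bell-shaped data set (µM, fold induction)
concs, folds = truncate_bell_shape([0.1, 1.0, 10.0, 30.0], [1.2, 4.0, 9.0, 5.0])
print(concs, folds)
print(f2(emax=9.0, ec50=1.0), fold_model(0.125, emax=9.0, ec50=1.0))
```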
Rifampicin CYP3A4 mRNA concentration-induction response data, generated in 38 human hepatocyte donors and over a concentration range of 0.01–30 µM, were collated from IQ member companies using their preferred conditions. The data were fit in GraphPad Prism version 7.0 using a three-parameter log(agonist) versus response equation (as detailed in Table 1, equation F) to determine the fitted EC50 and Emax values.
A similar exercise was undertaken to summarize the fitted EC50 and Emax parameters for CYP3A4 mRNA compound data for the following: troglitazone (10 donors from three laboratories, concentration range of 0.01–20 µM); pioglitazone (12 donors from five laboratories, concentration range of 0.05–150 µM); ritonavir (18 donors from four laboratories, concentration range of 0.01–100 µM); nifedipine (21 donors from six laboratories, concentration range of 0.05–300 µM); phenobarbital (21 donors from seven laboratories, concentration range of 0.9–3000 µM); carbamazepine (25 donors from seven laboratories, concentration range of 0.01–500 µM); rosiglitazone (26 donors from seven laboratories, concentration range of 0.05–300 µM); and phenytoin (28 donors from seven laboratories, concentration range of 0.1–1000 µM). Mean, S.D., median, minimum, maximum, and %CV values for each compound data set were calculated using GraphPad Prism version 7.
To evaluate intradonor variability, three laboratories provided data for nine donors, where data were available from at least three separate experiments to determine EC50 and Emax values, on different days in the same donor, using standard company methods. Mean, S.D., median, minimum, maximum, and %CV values for each donor were calculated using GraphPad Prism version 7.
Clinical Risk Assessment
The clinical relevance of in vitro induction was assessed by considering the recommendations in regulatory guidance documents as described by equations A–E in Table 1. Since a degree of variability was observed in the clinical induction response, the median and worst-case in vitro induction parameters were compared with both the median and worst-case AUCR values. In addition, the substrate specificity was considered by binning clinical trials according to the contribution of CYP3A to the overall clearance. In cases where the magnitude of clinical induction was substrate dependent (e.g., for ritonavir), additional information on the metabolic pathways was obtained by a literature review (Supplemental Table 2). This review was helpful for evaluating whether the maximal induction response could be mediated through a coregulated induced enzyme (other than CYP3A), especially in cases where there were mixed mechanisms of DDI observed. Where the plasma free fraction was reported to be <1%, both the reported value and 1% (as recommended in the regulatory guidances) were used to estimate the Cmax,ss,u in the equations. All of the in vitro induction parameters were fit using each equation and the worst-case and median donor data were used to evaluate the IVIVE. The rates of false positive and false negative predictions were used to assess the utility of the various IVIVE methods. The equations are described in Table 1, equations J–M. Additionally, the ability of the equation to result in quantitative predictions was assessed by comparing the predictions from in vitro parameters with the clinically observed AUCR.
Statistical Analysis
Evaluation of Normality.
Normal quantile plots in the distribution platform of the JMP 12.0.1 software (SAS Institute Inc., Cary, NC) were employed to evaluate normality of per-donor distributions of fold induction of negative and positive controls. The distributions of negative controls were not systematically non-normal; therefore, probability estimates for negative controls assume that the data are normally distributed. The majority of distributions of positive controls were positively skewed, necessitating a log transformation of the positive control data prior to estimation of probabilities. Indeed, both the 2001 FDA (https://www.fda.gov/downloads/drugs/guidances/ucm070244.pdf) and 2010 EMA (http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2010/01/WC500070039.pdf) guidances on bioequivalence recommend a log transformation prior to data analysis. Data sets with a normal distribution were graphed on an arithmetic scale, whereas those exhibiting a non-normal distribution were graphed with a log scale y-axis.
Limit of Blank and Limit of Detection Values.
Calculations of the limit of blank (LoB) and limit of detection (LoD) values were adapted from equations published by Armbruster and Pry (2008). Briefly,

$$\mathrm{LoB} = \mathrm{mean}_{\mathrm{blank}} + 1.645 \times \mathrm{S.D.}_{\mathrm{blank}}$$

and

$$\mathrm{LoD} = \mathrm{LoB} + 1.645 \times \mathrm{S.D.}_{\mathrm{low\ concentration\ sample}}$$

where blank is the negative control (flumazenil), and variation (S.D.) of the low concentration sample is assumed to be equal to the variation in the blank response. The LoB represents the fold-induction value for which there is a 95% probability that a blank, or negative control, response falls below. The LoD represents the fold-induction value for which there is a 95% probability that a response above this value is a true positive response (i.e., 5% type I or II error).
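A minimal Python sketch of the LoB and LoD calculation follows, using the Armbruster and Pry (2008) expressions LoB = mean(blank) + 1.645 × S.D.(blank) and LoD = LoB + 1.645 × S.D.(low concentration sample), with the low concentration S.D. defaulting to the blank S.D. as assumed in the text. The function names and the example fold-induction values are hypothetical, and the sample S.D. (n − 1 denominator) is assumed.

```python
from statistics import mean, stdev


def limit_of_blank(blank_folds):
    """LoB = mean(blank) + 1.645 * SD(blank), using the sample SD."""
    return mean(blank_folds) + 1.645 * stdev(blank_folds)


def limit_of_detection(blank_folds, low_sample_sd=None):
    """LoD = LoB + 1.645 * SD(low concentration sample).

    When low_sample_sd is not given, the blank SD is used in its place.
    """
    sd = stdev(blank_folds) if low_sample_sd is None else low_sample_sd
    return limit_of_blank(blank_folds) + 1.645 * sd


# Hypothetical negative control fold-induction values from repeat experiments
blank = [0.8, 1.0, 1.2]
print(limit_of_blank(blank), limit_of_detection(blank))
```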
Estimation of Probability of Exceeding X-Fold Induction (per Donor).
For negative controls, the mean and S.D. of the fold-induction values for each donor (intradonor) were calculated by Excel 2010, and then the probability for that donor to exceed X-fold induction was estimated by the Excel function 1-NORM.DIST(X,Mean,StDev,True), where “X” is the fold induction of interest, “Mean” and “StDev” are the empirical intradonor mean and S.D. of the fold-induction data of each donor, and the flag “True” instructs the NORM.DIST function to provide the corresponding cumulative normal probability. For positive controls, each fold-induction value of each donor was first transformed by the natural logarithm function (LN) in Excel 2010, and then the mean and S.D. of the log-transformed values of each donor were calculated by Excel. Finally, the probability of exceeding X-fold induction for each donor was estimated by the Excel function 1-NORM.DIST(LN(X),Mean(LN induction),StDev(LN induction),True), where the terms within the NORM.DIST function are as defined previously, but now applied to the log-transformed induction data of each donor.
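The Excel calculation described above can be reproduced, as a sketch, with the Python standard library. `NormalDist(...).cdf` plays the role of NORM.DIST with the cumulative flag set to True. The function names and the use of the sample S.D. are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def prob_exceed_normal(fold_values, x):
    """P(fold induction > x), assuming normally distributed data
    (the approach described for negative controls)."""
    dist = NormalDist(mean(fold_values), stdev(fold_values))
    return 1.0 - dist.cdf(x)

def prob_exceed_lognormal(fold_values, x):
    """P(fold induction > x), assuming lognormally distributed data
    (the approach described for positive controls): the values and the
    threshold are both moved to the natural-log scale first."""
    logs = [math.log(v) for v in fold_values]
    dist = NormalDist(mean(logs), stdev(logs))
    return 1.0 - dist.cdf(math.log(x))

# Hypothetical intradonor data, not taken from the study
negative_control = [0.9, 1.1, 1.3, 1.0, 1.6]
positive_control = [8.0, 15.0, 30.0, 12.0, 22.0]
print(prob_exceed_normal(negative_control, 2.0))
print(prob_exceed_lognormal(positive_control, 10.0))
```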
Monte Carlo Simulation of the Probability That 0, 1, 2, or 3 of Three Randomly Selected Donors Will Exceed X-Fold Induction.
The variabilities observed in the 10 negative control donors and 15 positive control donors were assumed to be representative of their respective populations. For negative control donors, the @Risk 7.5.1 software (Palisade Corporation, Ithaca, NY) was employed, with an Excel worksheet, to randomly select three donors at a time from among the 10 available donors, and for each selected donor to simulate a fold-induction value from a normal distribution possessing that donor’s fold-induction mean and S.D. values. From each set of three donors, the number (0, 1, 2, or 3) of donors exceeding X-fold induction was counted and logged by @Risk 7.5.1. This process was repeated 100,000 times to determine the probability that 0, 1, 2, or 3 donors, among three randomly selected donors, would exceed X-fold induction. For positive control donors, the same calculation process was employed and repeated 100,000 times, except that for each donor a log-transformed fold-induction value was simulated from a normal distribution possessing that donor’s log-transformed fold-induction mean and S.D. For a positive control donor, X-fold induction is exceeded when the simulated log-transformed value exceeds LN(X).
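The simulation was run in @Risk. The following Python sketch illustrates the same logic under stated assumptions: three distinct donors are drawn per iteration (the text does not say whether selection was with or without replacement), and each donor contributes one value simulated from a normal distribution with that donor's mean and S.D. The function name and the donor parameters are hypothetical.

```python
import math
import random

def simulate_exceedance(donor_params, x, n_iter=100_000, log_scale=False, seed=0):
    """Estimate P(0, 1, 2, or 3 of three random donors exceed x-fold).

    donor_params: list of (mean, sd) per donor, on the arithmetic scale for
    negative controls or on the natural-log scale when log_scale=True.
    Returns a list [p0, p1, p2, p3].
    """
    rng = random.Random(seed)
    threshold = math.log(x) if log_scale else x
    counts = [0, 0, 0, 0]
    for _ in range(n_iter):
        chosen = rng.sample(donor_params, 3)  # three distinct donors (assumption)
        n_exceed = sum(rng.gauss(mu, sd) > threshold for mu, sd in chosen)
        counts[n_exceed] += 1
    return [c / n_iter for c in counts]

# Hypothetical negative-control donors: (mean, S.D.) of fold induction
donors = [(1.0, 0.2), (1.2, 0.4), (1.5, 0.6), (1.1, 0.3), (1.3, 0.5)]
p0, p1, p2, p3 = simulate_exceedance(donors, x=2.0, n_iter=20_000)
print(f"P(no donor >2-fold) = {p0:.3f}; P(at least one) = {1 - p0:.3f}")
```

For positive controls, the same function would be called with log-scale means and S.D. values and `log_scale=True`, so that the comparison is made against LN(X).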
Results
Establishing a Threshold for a Positive versus Negative In Vitro CYP3A4 mRNA Induction Response.
To evaluate potential thresholds for positive or negative in vitro induction response, the variability in in vitro human hepatocyte induction experiments was interrogated by analyzing CYP3A4 mRNA and activity data generated with a negative control compound, namely, flumazenil, repeated under the same experimental conditions. Fold-induction data for flumazenil were collected and analyzed from 10 hepatocyte donors, where data from ≥20 repeated experiments were available in each donor for CYP3A4 mRNA expression. In total, data were collected from 314 individual experiments for CYP3A4 mRNA (range: 23–54 experiments/donor) and from 111 individual experiments for CYP3A activity (range: 4–24 experiments/donor) (Table 2).
Individual flumazenil data for CYP3A4 mRNA and CYP3A activity, across the 10 hepatocyte donors, are illustrated in Fig. 1, A and B, respectively. Summarized data from statistical analyses are presented in Tables 2 and 3. Since mRNA is the recommended primary endpoint in most P450 induction experiments, subsequent data analyses focused on the variability observed in the CYP3A4 mRNA data sets. Flumazenil-CYP3A4 mRNA data demonstrated a normal distribution and, therefore, were plotted on an arithmetic y-axis (Fig. 1, A), and calculations of mean and S.D. values were performed without log transformation (Table 2). The majority (300/314; 95.5%) of individual experimental data points for flumazenil-CYP3A4 mRNA were within 2-fold (0.5- to 2-fold) of the vehicle control, DMSO. The mean fold-induction values for flumazenil-CYP3A4 mRNA ranged from 1.01- to 1.53-fold (overall mean of 1.20-fold), which tracked closely with the vehicle control (represented by 1-fold change) as expected with a true negative control; however, there was notable intradonor variability. In five out of 10 donors examined (50%) there were no reported responses outside the 2-fold range (i.e., <0.5- or >2-fold). In the other five donors, one or more values were outside the 2-fold range (1 <0.5-fold; 13 >2-fold). The calculated probabilities of a flumazenil-CYP3A4 mRNA response exceeding 2-fold within a single donor ranged from 0% to as high as 20.4% (donor H2).
The intradonor variability in the flumazenil-CYP3A4 mRNA response was explored further with two orthogonal methodologies. First, to better understand the magnitude of the intradonor variability for the negative control, the mean and S.D. values of the flumazenil responses, within each donor, were used to calculate the LoB and LoD values (Table 2). The LoB is the fold-induction value beneath which there is a 95% probability that the response is a true negative. Conversely, the LoD is the fold-induction value above which there is a 95% probability that the response is a true positive. Across the 10 hepatocyte donors examined, the calculated LoB or true negative value was <2-fold in 9 out of 10 donors (the mean of 10 donors was 1.86-fold). Therefore, a CYP3A4 mRNA fold-induction value ≤1.86-fold represents a true negative response, with 95% probability based on the data sets examined. The LoD, the fold-induction value indicative of a true positive response above background variation with 95% confidence, was calculated for the flumazenil-CYP3A4 mRNA data sets based on a mean and S.D. approach. This analysis resulted in LoD values ranging from 1.61- to 3.41-fold (i.e., >2-fold in five out of 10 donors). Across all data sets, the calculated threshold for a true positive response above background variation for CYP3A4 mRNA was 2.52-fold. Therefore, a CYP3A4 mRNA fold-induction value ≥2.52-fold represents a true positive response with 95% probability, based on the data sets examined.
The observation of negative control values for flumazenil-CYP3A4 mRNA exceeding 2-fold was confirmed with data from a second company. Briefly, CYP3A4 mRNA data were obtained from 23 experiments conducted across nine hepatocyte donors, following treatment with a single concentration of flumazenil (30 µM). In these experiments, the observed mean fold-induction value for flumazenil-CYP3A4 mRNA was 1.30 (minimum, 0.88-fold; maximum, 3.37-fold), with calculated LoB and LoD values of 2.12- and 2.95-fold, respectively.
The probability of a flumazenil-CYP3A4 mRNA response exceeding a 2-fold threshold in a single concentration negative control treatment group in three randomized human hepatocyte donors was assessed with Monte Carlo simulations (Table 3). The simulations incorporated variability parameters (i.e., mean and S.D. values) derived from data reported across the 10 donors (314 experiments). When simulated with 100,000 iterations of individual experiments containing three donors each, the probability of observing a flumazenil-CYP3A4 mRNA response <2-fold in all three donors was 91.9%. Conversely, there was a probability of 8.1% that flumazenil would produce a CYP3A4 mRNA response of ≥2-fold in one or more donors. Therefore, flumazenil is likely to cause a false positive response in approximately 8% of cases if a 2-fold increase in CYP3A4 mRNA defines the threshold between a positive and negative CYP3A4 mRNA in vitro response.
For CYP3A activity, less intradonor variability in the flumazenil response was observed compared with CYP3A4 mRNA. Across the 10 donors examined (n = 111 experiments), the mean fold-induction values for flumazenil-CYP3A ranged from 0.95- to 1.09-fold (mean, 1.03-fold). The calculated overall mean LoB and LoD values were 1.20-fold (range: 1.07- to 1.30-fold) and 1.37-fold (range: 1.14- to 1.50-fold), respectively. There were no observations of flumazenil-CYP3A activities >2-fold, and therefore the projected frequency of exceeding 2-fold was not determined.
Establishing Thresholds of Positive In Vitro Induction Response to Ensure Adequate Dynamic Range.
The results of rifampicin induction in 15 hepatocyte donors repeated on multiple occasions are shown in Fig. 1, C and D, for CYP3A4 mRNA and CYP3A activity, respectively. Summarized statistical analyses are presented in Tables 4 and 5. In total, data were collected from 581 individual experiments for rifampicin-CYP3A4 mRNA (range: 13–70 experiments/donor) and from 377 individual experiments for rifampicin-CYP3A activity (range: 13–70 experiments/donor). Subsequent data analyses, as with flumazenil, focused on the variability observed in the rifampicin-CYP3A4 mRNA data sets. In all cases, the rifampicin-CYP3A4 mRNA response was reported as fold induction compared with the vehicle control, DMSO. Rifampicin-CYP3A4 mRNA data sets demonstrated a non-normal distribution and are graphed on a log-based y-axis in Fig. 1, C and D, and calculation of probabilities assumed a lognormal distribution (Table 4). The median rifampicin-CYP3A4 mRNA fold-induction values ranged from 7.1- to 75-fold across the 15 donors. There was notable intradonor variability in response to rifampicin with dynamic response ranges (minimum/maximum fold-induction response) of 3.4- to 41.5-fold and %CV values ranging from 33.6% to 93.1%. The %CV values (or relative S.D. values) as an indicator of variability were not dependent on the magnitude of the rifampicin-CYP3A4 mRNA response; however, the S.D. values increased in proportion to the mean response. In this regard, a higher fold change would be expected to be more variable (i.e., larger S.D.).
Based on the observed intradonor variability in the rifampicin-CYP3A4 mRNA response, the likelihood of exceeding a predefined positive control threshold (i.e., 6-, 10-, or 20-fold) for each hepatocyte donor was evaluated (Table 4). The 6-fold positive control threshold was derived from EMA and FDA guidances, whereas the 10- and 20-fold thresholds were based on empirical cutoff values used by some consortium member companies. The 6-fold positive control threshold assumes that 1) the minimum positive in vitro induction signal is 2-fold (100% increase), 2) the minimum in vitro signal (2-fold) represents no more than 20% of the positive control response, and 3) a 6-fold response equates to 500% increase when the vehicle control is set equal to 1-fold. When the desired rifampicin-CYP3A4 mRNA positive control response was set to 6-fold, the probability of exceeding this threshold ranged from 70% to 100% across the 15 donors examined. As the desired positive control threshold increases, the probability of achieving the response decreases. The probability of achieving rifampicin-CYP3A4 mRNA responses of greater than 10- or 20-fold across all donors ranged from 34% to 100% or 4% to 94%, respectively.
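The arithmetic behind the 6-fold threshold can be laid out explicitly. In this worked derivation, F denotes fold induction relative to vehicle (vehicle = 1-fold) and F_PC the positive control fold induction. Both symbols are introduced here for illustration only.

```latex
\begin{align*}
\%\ \text{increase} &= (F - 1) \times 100\% \\
F = 2 &\;\Rightarrow\; \%\ \text{increase} = 100\% \\
100\% \le 0.20 \times (F_{\mathrm{PC}} - 1) \times 100\%
  &\;\Rightarrow\; F_{\mathrm{PC}} - 1 \ge 5
  \;\Rightarrow\; F_{\mathrm{PC}} \ge 6
\end{align*}
```

A positive control response of exactly 6-fold therefore corresponds to a 500% increase, of which the minimum 2-fold signal (100% increase) is 20%.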
The probability of a rifampicin-CYP3A4 mRNA response above a 6-, 10-, or 20-fold threshold was further examined with Monte Carlo simulations that incorporated variability parameters reported across the 15 donors (581 experiments) (Table 5). When simulated with 100,000 iterations the probability of observing a rifampicin-CYP3A4 mRNA response >6-fold in all three donors was 78.4%, such that rifampicin would produce a response above the desired threshold in all three donors in nearly four out of five experiments, which equates to a 21.6% fail rate. The probabilities of obtaining a rifampicin-CYP3A4 mRNA response >10- and 20-fold in all three donors were 40.9% and 4.94%, respectively.
As generally observed, the amplitude of the fold-induction response for rifampicin-induced CYP3A activity was lower than the rifampicin-CYP3A4 mRNA response (Fahmi et al., 2010). Also, there was less intradonor variability observed for rifampicin-induced CYP3A activity. Across all 15 hepatocyte donors examined, the median fold-induction values for rifampicin-CYP3A activity ranged from 3.6- to 18.1-fold (mRNA, 7.1- to 75-fold). The %CV values ranged from 18.2% to 61.7%, which were on average less than corresponding %CV values for CYP3A4 mRNA. Monte Carlo simulations were not performed for rifampicin-CYP3A activity.
Basal P450 Expression and Impact on Fold Induction.
The basis for the observed intradonor variability in the rifampicin-CYP3A4 mRNA response across repeat experiments was further explored in hepatocyte donor H2 by analysis of reverse transcription PCR data. Raw data (Ct values) were collected for CYP3A4 and a housekeeping gene (GAPDH) from multiple treatment groups, including the vehicle (DMSO), negative (flumazenil), and positive (rifampicin) controls (Fig. 2). Figure 2A shows housekeeping gene Ct values for all treatment groups plotted in chronological order of experimentation (>50 experiments). Among all three treatment groups GAPDH Ct values tracked similarly and there was a consistent interexperimental variation regardless of time (experiments conducted over ∼1.5 years). Figure 2B shows raw CYP3A4 Ct values for vehicle (DMSO) and negative (flumazenil) controls. Data were rank ordered by increasing CYP3A4 Ct values from the DMSO-treated samples. Since Ct values are inversely proportional to transcript levels, the experiments with the highest basal CYP3A4 transcript levels (lowest Ct values) are on the left-hand side of the graph. Flumazenil CYP3A4 Ct values tracked closely with the DMSO data. CYP3A4 Ct values were normalized to GAPDH Ct values and the resultant delta Ct (ΔCt) values are plotted in Fig. 2C. CYP3A4 ΔCt values for DMSO and flumazenil were generally similar. Across the experiments, the range of ΔCt values for DMSO-CYP3A4 was approximately 7, which equates to a 128-fold difference in basal CYP3A4 transcript levels (calculated as $2^{7}$). In all cases, the rifampicin-CYP3A4 ΔCt values were lower than the corresponding vehicle control values, denoting higher levels of CYP3A4 transcript, as expected.
In Fig. 2D, the resultant fold-induction values (ΔΔCt) for rifampicin-CYP3A4 mRNA are ranked based on basal CYP3A4 mRNA expression (highest basal expression on the left-hand side). Figure 2D also shows that the magnitude of the rifampicin-CYP3A4 mRNA response inversely correlates with basal CYP3A4 mRNA levels. This observation suggests that hepatocytes with low basal CYP3A4 mRNA levels may demonstrate high CYP3A4 mRNA fold-induction responses to rifampicin. Similar findings for CYP3A4 mRNA were observed in two additional donors (H4 and H12 in Supplemental Figs. 1 and 2, respectively). This effect was less pronounced for rifampicin-CYP3A activity response but was based on fewer experiments from donors H2, H4, and H12 (Supplemental Fig. 3).
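The Ct arithmetic used in this section can be sketched in Python. The sketch assumes ideal PCR efficiency (transcript doubling per cycle), which is the assumption implicit in converting a ΔCt range of 7 to a 128-fold difference. The function names and Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_housekeeping):
    """ΔCt: target gene Ct normalized to the housekeeping gene Ct."""
    return ct_target - ct_housekeeping

def fold_induction(dct_treated, dct_vehicle):
    """Fold induction by the ΔΔCt method: 2^-(ΔCt_treated - ΔCt_vehicle)."""
    return 2.0 ** (-(dct_treated - dct_vehicle))

def fold_difference(dct_range):
    """Fold difference in transcript level spanned by a range of ΔCt values."""
    return 2.0 ** dct_range

# A ΔCt range of 7 corresponds to a 128-fold difference in transcript level
print(fold_difference(7))  # 128.0

# Hypothetical Ct values: CYP3A4 and GAPDH in vehicle and rifampicin wells
dct_vehicle = delta_ct(ct_target=27.0, ct_housekeeping=18.0)     # 9.0
dct_rifampicin = delta_ct(ct_target=24.0, ct_housekeeping=18.0)  # 6.0
print(fold_induction(dct_rifampicin, dct_vehicle))  # 8.0
```

Because fold induction depends on the vehicle ΔCt, a higher vehicle ΔCt (lower basal expression) yields a larger fold value for the same treated ΔCt, consistent with the inverse relationship shown in Fig. 2D.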
The potential for assay noise to systematically affect the magnitude of the rifampicin-CYP3A4 mRNA response was evaluated by comparison with the corresponding intra-assay flumazenil response. This assessment was conducted across multiple repeated experiments within the same hepatocyte donor. As the fold induction for rifampicin increased in donor H2, there was no corresponding change in the negative control (flumazenil) response, confirming that the variability was not a function of assay noise (Fig. 2, B and D). Similar results were observed in other donors (data not shown).
The number of experiments that might be necessary to capture the range of variability in the rifampicin-CYP3A4 mRNA response described previously was evaluated, with data visualized based on chronological order of experimentation (Fig. 2, E and F). Figure 2E shows CYP3A4 ΔCt values for DMSO and rifampicin plotted by chronological experiment order and Fig. 2F illustrates the resultant rifampicin-CYP3A4 mRNA fold-induction values. There was no clear trend in the data with respect to time in either ΔCt or fold-induction values. Consequently, the number of repeat experiments required to capture variability in the rifampicin-CYP3A4 mRNA induction response in a single hepatocyte donor is considerable (e.g., ≥5 repeat experiments) and may vary between donors.
Normalizing In Vitro Induction Data to a Positive Control.
Multiple data sets, with maximum fold induction for rifampicin and test compound, were combined to explore the utility of normalizing data as the percentage of positive control. Figure 3A shows the percentage of positive control (rifampicin) data for CYP3A4 mRNA induction for 30 compounds in three donors. The untransformed fold-induction data are shown in Supplemental Fig. 4. Note that in some donors the rifampicin response was on the low side (∼6-fold). Since data for test compound indicated positive in vitro CYP3A4 mRNA induction (>2-fold), this data set holds value and was included. There were marked differences observed for compounds when looking at the percentage of rifampicin control response across donors. For example, the carbamazepine responses were 52%, 28% and 218%; the phenobarbital responses were 96%, 36% and 106%; and the phenytoin responses were 40%, 23%, and 44% (where rifampicin maximum induction results were 7-, 16-, and 7-fold for the first, second, and third donors, respectively, within the same laboratory). A similar trend was observed in a data set generated across laboratories using different donors. Here, the felbamate responses were 20%, 34%, and 33%; and the oxcarbazepine responses were 28%, 105%, and 88% of the rifampicin responses (where rifampicin maximum induction results were 19-, 13-, and 19-fold for the first, second, and third donors, respectively). Similarly, in a third data set, where each compound was tested in a single laboratory using multiple donors, Cmpd 1 responses were 64%, 121%, and 136% of the rifampicin responses, which were 25-, 24-, and 17-fold, respectively; Cmpd 4 responses were 15%, 18%, and 44% of the rifampicin responses, which were 41-, 133-, and 96-fold, respectively; and Cmpd 7 responses were 21%, 27%, and 20% of the rifampicin responses, which were 17-, 12-, and 11-fold, respectively.
Finally, the utility of normalization to a positive control response to address intradonor variability was explored. Figure 3B shows rosiglitazone and pioglitazone induction as percentage of positive control response (rifampicin CYP3A4 mRNA) in three different donors, in which experiments were repeated on five separate occasions within the same laboratory. As in the previous data set, normalization did not harmonize responses within a single donor over time; the percentage of positive control values spanned a wide range for each compound. For example, the rosiglitazone responses were 88%, 31%, and 61% and the pioglitazone responses were 47%, 41%, and 77% of the rifampicin response for each donor in the second experimental repeat. In the third experimental repeat, the rosiglitazone responses were 76%, 13%, and 129% and the pioglitazone responses were 31%, 100%, and 85% of the rifampicin response for each donor.
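For readers who wish to repeat this type of comparison, the following Python sketch shows one common convention for expressing a response as a percentage of positive control. The formula is an assumption, since this section does not state the exact expression used: both responses are taken as increases over vehicle (fold − 1), which is consistent with a 2-fold signal being 20% of a 6-fold control. The function name and the fold values are hypothetical.

```python
def percent_of_positive_control(fold_test, fold_positive):
    """Test compound response as % of positive control, with both expressed
    as increase over vehicle (vehicle = 1-fold). Assumed convention."""
    if fold_positive <= 1.0:
        raise ValueError("positive control shows no induction over vehicle")
    return 100.0 * (fold_test - 1.0) / (fold_positive - 1.0)

# Hypothetical (test compound, rifampicin) fold-induction pairs in three donors
donor_pairs = [(4.0, 7.0), (5.5, 16.0), (9.0, 7.0)]
percents = [percent_of_positive_control(t, p) for t, p in donor_pairs]
print([round(p) for p in percents])            # per-donor % of positive control
print(round(max(percents) / min(percents), 1))  # spread across donors
```

If normalization removed donor variability, the per-donor percentages would cluster tightly. A wide max/min spread indicates that the test and control responses do not track.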
In Vitro Induction Parameters and Reproducibility across Donors and Laboratories.
Following analysis of a large data set of single concentration data from two laboratories, the IWG extended analysis to concentration-response induction data generated in multiple laboratories under different conditions in multiple human donors. Rifampicin CYP3A4 mRNA EC50 and Emax values were collated from five literature sources (at least n = 3 unique donors for inclusion) and from multiple IQ member companies (Supplemental Table 3). Variability was observed for both the EC50 and Emax parameters calculated within the six data sets (as given by %CV) (EC50: 51.6%–144% CV; Emax: 28.6%–104% CV). Overall, the mean and median values across the data sets were within 2-fold of each other with the exception of the EC50 value for the IWG data, which was within 2.5-fold.
An additional rifampicin data set was collected to further examine this variability. The reproducibility within a donor under the same experimental conditions in the same laboratory was examined. Rifampicin CYP3A4 EC50 and Emax data were collated from three different companies (nine donors) where at least three experiments were available for each donor (Fig. 4; Supplemental Table 4). Variability, within each donor (expressed as %CV) ranged from 28.6% to 77.3% for EC50 and 22.9% to 125% for Emax values. The mean and median values of the data set were within 2-fold of one another. The spread in minimum-to-maximum values observed within each donor ranged from 1.1- to 9.5-fold for EC50 and 1.49- to 9.19-fold for Emax. The variability observed in CYP3A4 EC50 and Emax parameters was not unique to rifampicin (Fig. 5; Supplemental Table 5). Similar %CV values were noted for eight other CYP3A4 inducers (troglitazone, pioglitazone, ritonavir, nifedipine, phenobarbital, carbamazepine, rosiglitazone, and phenytoin) and ranged from 72% to 133% for EC50 and 59% to 119% for Emax values.
Data Set for DDI IVIVE.
In vitro CYP3A4 and clinical data were collected for 51 compounds whose clinical and in vitro responses ranged from inhibition, through no effect, to induction (Figs. 5 and 6; Supplemental Table 6).
In Vitro Data Set.
For most inducers, a minimum of three donors were available for generating median induction parameters. In the case of saquinavir, teriflunomide, and Cmpds 3, 8, 9, and 15, data were only available—or induction parameters could only be defined—from two donors. In the case of Cmpds 5 and 14, induction parameters could only be determined from one of the three donors investigated. For Cmpd 5, only one donor resulted in measurable increases in CYP3A4 mRNA (>2-fold) and two donors were negative. For Cmpd 14, while three donors were evaluated, only one donor included enough concentrations to characterize the concentration response profile. In both of these cases, the clinical observation was inhibition. The weak in vitro inducers, defined as those eliciting a <3-fold CYP3A4 mRNA induction in at least one of the donors, were aprepitant, omeprazole, pioglitazone, pleconaril, and terbinafine. In some cases, moderate-to-strong clinical inducers, including carbamazepine, Cmpd 7, and phenytoin, had at least one donor with an Emax value <4-fold. In general, the in vitro variability for all of the inducers was consistent with that observed for rifampicin (Fig. 5). There were some trends discernible for EC50, where moderate and strong clinical inducers generally exhibited much lower EC50 values compared with compounds that had weak or no clinical induction (Fig. 5A). However, an exception was noted for Cmpd 3, which showed moderate clinical induction due to its relatively high unbound circulating concentration (5.6 µM). As one might expect, there was no trend in EC50 values with clinical DDI magnitude for the compounds that exhibited both in vitro induction and inhibition (Fig. 5B). In general, the Emax values for rifampicin, while variable, were higher than those observed from weak or nonclinical inducers such as perampanel or lersivirine. The Emax values for compounds with in vitro induction only (Fig. 5C) generally trended down with increasing EC50 value. 
There were no discernible trends in Emax values for compounds that exhibited both in vitro induction and inhibition (Fig. 5D). In the case of rifapentine, nifedipine, and rosiglitazone, the Emax values were comparable to those determined for rifampicin, although these drugs resulted in no clinical induction (Fig. 5D).
Clinical Data Set.
The IWG collected data for 35 literature compounds and 16 proprietary compounds from the IQ member companies. When considering the median clinical AUCR and DDI category relative to the 2012 FDA guidance, there were eight compounds with clinical inhibition, 16 with no effect (AUCR: 0.8–1.25), 16 with weak induction (AUCR: 0.5–0.8), nine with moderate induction (AUCR: 0.2–0.5), and two with strong induction (AUCR: <0.2). When considering the worst-case (or greatest induction) clinical AUCR, there were six compounds with clinical inhibition, nine with no effect, 15 with weak induction, 16 with moderate induction, and five with strong induction (Supplemental Table 7). Of these compounds, 31 out of 51 (61%) exhibited mixed DDI mechanisms toward CYP3A (i.e., in vitro induction plus inhibition and/or inactivation).
Data from 1048 clinical trials were collected for in vitro CYP3A inducers (Supplemental Table 9). These trials included all substrates with some role of CYP3A in the overall metabolism, as determined by literature searches for in vitro or in vivo metabolism data. When the clinical data were refined to include only rifampicin doses 600 mg or greater and dosing regimens of 5 days or longer, there were 835 data sets remaining. This translated to a total of 181 clinical DDI data sets, when considering only the sensitive CYP3A substrates, and 74 studies that used the recommended index substrates, midazolam or triazolam (71 and three, respectively) (Fig. 6). All of the proprietary clinical data sets included midazolam as the probe substrate to assess induction of CYP3A. In general, the AUCR range was similar whether all data or only the sensitive CYP3A victim drugs were considered, with the exception of some potent mixed mechanism DDI compounds (e.g., ritonavir). The prevalence of induction (i.e., AUCR < 0.8) was determined to be 56% using median AUCR values and 72% using the worst-case AUCR values. Despite this refinement of the data, a reasonable degree of variability remained in the clinical induction response as can be visualized in the rifampicin and ritonavir data (Fig. 6).
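The AUCR categories used for the clinical data set can be expressed as a simple binning function. This is a sketch only: the category limits are those quoted above, whereas the handling of values falling exactly on a boundary is an assumption, and the function names and example AUCR values are hypothetical.

```python
def classify_aucr(aucr):
    """Bin a clinical AUC ratio (victim AUC with / without perpetrator)."""
    if aucr > 1.25:
        return "inhibition"
    if aucr >= 0.8:
        return "no effect"           # AUCR 0.8-1.25
    if aucr >= 0.5:
        return "weak induction"      # AUCR 0.5-0.8 (boundary handling assumed)
    if aucr >= 0.2:
        return "moderate induction"  # AUCR 0.2-0.5 (boundary handling assumed)
    return "strong induction"        # AUCR < 0.2

def induction_prevalence(aucr_values):
    """Fraction of compounds classed as clinical inducers (AUCR < 0.8)."""
    return sum(a < 0.8 for a in aucr_values) / len(aucr_values)

# Hypothetical median AUCR values for four compounds
example = [1.0, 0.7, 0.3, 1.5]
print([classify_aucr(a) for a in example])
print(induction_prevalence(example))  # 0.5
```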
Translating In Vitro Induction Data to Clinically Relevant Risk of Induction DDI.
The large data sets collected (Figs. 5 and 6) enabled evaluation of various simplistic models for predicting clinical induction risk. The potential for each method to provide meaningful risk assessment was considered based on the number of false negative or false positive compounds (Table 6).
High false positive rates (>35.7%) were observed when comparing the output from the recommended models and the median observed clinical AUCR, with the exception of the mechanistic static model that considered both induction and inhibition (16.7% false positive rate using the median in vitro donor induction data). The quantitative prediction accuracy, using the induction/inhibition mechanistic static model (17% within bioequivalence and 43% within 2-fold), was not as high as that of other methods such as the R3 using the unbound average concentration at steady state (31% within bioequivalence and 94% within 2-fold when using the median in vitro donor data) and the percentage of false negatives was higher with the inhibition/induction mechanistic static model than other methods (27%–36%). Compiling all in vitro data into the RIS or slope correlation curves enabled quantitative prediction and a minimal number of false negatives (Table 6). A noted limitation of this approach is that no test sets were available to evaluate true predictive performance since predictions were made for compounds that were used to build the correlation model. A similar observation was made using a d-value of 0.3, based on the large multidonor in vitro data set collected here. When an R3 cutoff value of 0.8 is used rather than 0.9, with the total Cmax,ss as the input and a d-value of 1, the percentage of true negatives was significantly improved from 3% to 17% with only a small effect on the false negatives (increased from one to two). Using the recommended equation in the draft FDA 2017 DDI guidance (Table 1, equation C), which incorporates a 10-fold multiplier to the Cmax,ss,u, resulted in two more false negatives (pleconaril and Cmpd 15, in addition to dexamethasone) than the 2012 guidance (dexamethasone). Applying a multiplier of 50-fold rather than 10-fold reduced the number of false negatives from three to zero. 
Using the gut concentration as the input for the R3 and F2 models also reduced false negatives. Limiting the input for unbound plasma protein binding to 1% resulted in fewer false negatives. However, those false negatives that remained (dexamethasone and oxcarbazepine) had only moderate plasma protein binding, and the inclusion of compounds with unbound plasma concentrations <1%, including Cmpd 13, efavirenz, rosiglitazone, and teriflunomide, resulted in appropriate binning when the reported unbound plasma protein value was used. Of all of the methodologies investigated, using the average Cmax,ss,u resulted in the fewest number of false positives but increased the number of false negatives (from one to six when using the median induction parameters). The average Cmax,ss,u also resulted in the highest number of predictions within 2-fold or bioequivalence, 94% and 31%, respectively. Using the Cmax,ss,u for the hepatic component and the portal concentration for the gut component resulted in two false negatives (dexamethasone and pleconaril) and improved the percentage of false positives over many of the other IVIVE methods.
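To make the role of the cutoff value, d-value, and concentration multiplier concrete, the following Python sketch implements the basic static induction model in the form commonly written in regulatory guidance, R3 = 1/(1 + d × Emax × C/(EC50 + C)). Table 1 is not reproduced in this section, so this form, the use of "≤" at the cutoff, and all names and parameter values are assumptions for illustration.

```python
def r3(emax, ec50, conc, d=1.0, multiplier=1.0):
    """Basic static model for induction (assumed form).

    emax: maximum induction effect; ec50 and conc in the same units;
    d: empirical scaling factor; multiplier: safety factor applied to
    the input concentration (e.g., 10 or 50).
    """
    c = multiplier * conc
    return 1.0 / (1.0 + d * emax * c / (ec50 + c))

def flags_induction(r3_value, cutoff=0.8):
    """True when the predicted AUC ratio falls at or below the cutoff."""
    return r3_value <= cutoff

# Hypothetical inducer: Emax = 10, EC50 = 1 uM, unbound Cmax,ss = 0.05 uM
for m in (1.0, 10.0, 50.0):
    value = r3(emax=10.0, ec50=1.0, conc=0.05, multiplier=m)
    print(m, round(value, 3), flags_induction(value))
```

Increasing the multiplier lowers R3 for a given compound, which is why a larger multiplier trades false negatives for false positives.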
When in vitro induction parameters cannot be defined, either due to solubility or cytotoxicity limitations, the F2 or slope values can often be estimated. The slope tended to overpredict the magnitude of induction compared with the EC50 and Emax values, while the F2 value resulted in four false negatives (dexamethasone, pleconaril, and Cmpds 2 and 15) compared with one false negative using the R3 equation with total Cmax, d = 1, and a cutoff value of 0.8. When the F2/Cmax,ss,u multiplier was reduced from 50- to 30-fold, there was no impact on the false negative rate. However, the false positive rate decreased from 83% to 70% using median data and from 87% to 78% using worst-case data. To evaluate the ability of the F2 value to predict induction at the gut level, the F2 equation was solved for the dose level of perpetrator using molecular weight and the equation in the EMA guideline (0.1 × Dose/250 ml). When applying a cutoff value of 0.25 for dose level = F2/therapeutic dose level, the only false negative observed was dexamethasone.
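The conversion between dose and the intestinal concentration term of the EMA guideline (0.1 × Dose/250 ml) requires the molecular weight. A minimal Python sketch of the forward and inverse calculation is given below. The function names and the example dose and molecular weight are hypothetical, and the inverse function is only one plausible reading of "solving for the dose level".

```python
def gut_concentration_um(dose_mg, mol_weight):
    """Intestinal concentration (uM) as 0.1 x Dose / 250 ml.

    dose_mg: oral dose in mg; mol_weight: molecular weight in g/mol.
    """
    dose_umol = dose_mg / mol_weight * 1000.0  # mg / (g/mol) = mmol -> umol
    return 0.1 * dose_umol / 0.250             # umol per litre = uM

def dose_for_gut_concentration_mg(conc_um, mol_weight):
    """Inverse: dose (mg) at which 0.1 x Dose / 250 ml equals conc_um."""
    dose_umol = conc_um * 0.250 / 0.1
    return dose_umol / 1000.0 * mol_weight

# Hypothetical compound: 250 mg dose, molecular weight 500 g/mol
print(gut_concentration_um(250.0, 500.0))           # approximately 200 uM
print(dose_for_gut_concentration_mg(200.0, 500.0))  # approximately 250 mg
```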
Discussion
The IWG compiled extensive in vitro and clinical induction data sets focusing on interpretation of in vitro induction data for CYP3A4 mRNA and its clinical relevance. Strikingly, there was a large degree of variability in both clinical and in vitro induction responses (Figs. 5 and 6). Variability occurred, irrespective of experimental conditions, laboratory, and test compound, and was not solely accounted for by differences in donor response as previously suspected. Importantly, despite being variable, in vitro induction data have utility in clinical DDI risk assessment and decision making. Six recommendations are derived from this analysis.
1. P450 Induction Should Continue to Be Evaluated in Three Separate Human Donors In Vitro.
In vitro CYP3A4 mRNA data for rifampicin included diverse sets of multiple repeats within a donor, from the same laboratory/experimental condition (Fig. 4). The intradonor variability was similar to that observed between donors and across laboratories (Fig. 5; Supplemental Table 3). Beyond rifampicin, variability existed across the compound data set (Fig. 5). Of note, CYP3A4 activity appeared to be less variable for the single concentration rifampicin data set (Table 4). However, there were insufficient EC50 and Emax data for further evaluation. Given the observed variability, in vitro P450 induction should continue to be evaluated in three separate human donors, thus supporting existing recommendations from regulators.
Why might this variability exist? It is possible that small differences in cell culture, temperature, plate agitation, pipetting speed, and media change times between each experiment could drive variability since all may impact efficient attachment, cell morphology, and basal P450 expression (Hamilton et al., 2001; Hewitt et al., 2007). Intradonor variability in basal P450 expression appeared to determine variability in fold-induction response for rifampicin (Fig. 2). Additionally, differences in intracellular drug concentration in response to changes in enzyme or transporter expression could contribute to a range of induction responses in inter- and intradonor experiments (Chu et al., 2013; Sun et al., 2017).
2. In Light of Empirically Divergent Responses in Rifampicin Control and Most Test Inducers, Normalization of Data to Percentage of Positive Control Appears to Be of Limited Benefit.
To account for donor variability in induction response, normalization to the percentage of positive control was previously suggested (Bjornsson et al., 2003). The assumption was that although the absolute fold-induction value might be different between donors, the relative magnitude of response for different compounds would be preserved within a donor. This is commonly used for reporter gene assays (Persson et al., 2006; Sinz et al., 2006), but reports in human hepatocytes are from smaller studies (Kamiguchi et al., 2010). The range of percentage of control response for each compound shown in Fig. 3A suggests that this does not normalize the induction response across donors or laboratories; nor does it normalize response within a donor over time (Fig. 3B). Thus, normalization to percentage of rifampicin response provides limited benefit in aiding data interpretation. Why is this normalization not successful? There is no mechanistic evaluation that explains the compelling data observed here. Do different metabolic pathways predominate in different donors for a compound (Richert et al., 2006; Heslop et al., 2017)? In this case, changes in metabolism between donors, resulting in different effective drug concentrations, might explain why test and control compound responses do not track. Several genetic variants of PXR and CAR exist and could contribute to interindividual variation in induction response (Lamba et al., 2005). Furthermore, if test and control compounds differ in regulation of PXR and CAR and donors differ in PXR and CAR expression, then test and control responses may not track (Faucette et al., 2006). Subtle differences in intracellular concentration between donors and compounds could also be confounding. 
This could occur due to multiple factors, such as small changes in amount of drug dosed in vitro, differences in seeding density and cell attachment (and thus changes in unbound fraction in the incubation), and different drivers of cellular uptake such as transporter expression and albumin concentration (Miyauchi et al., 2018).
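The normalization discussed above can be illustrated with a minimal sketch. It assumes the commonly used definition in which the increase over vehicle for the test compound is divided by the increase over vehicle for rifampicin. The function name and all donor values are hypothetical and are not taken from the IWG data set; they only show how the spread across donors can widen rather than narrow after normalization.

```python
def percent_of_positive_control(test_fold, control_fold):
    """Increase over vehicle for the test compound as a percentage of the
    increase over vehicle for the positive control.

    Both inputs are fold-induction relative to vehicle (1.0 = no change).
    """
    if control_fold <= 1.0:
        raise ValueError("positive control shows no induction over vehicle")
    return 100.0 * (test_fold - 1.0) / (control_fold - 1.0)


# Hypothetical fold-induction values for one test compound in three donors
# (illustrative numbers only, not from the IWG data set).
donors = {
    "donor_A": {"test": 4.0, "rifampicin": 10.0},
    "donor_B": {"test": 4.5, "rifampicin": 40.0},
    "donor_C": {"test": 3.0, "rifampicin": 15.0},
}

percent = {
    name: percent_of_positive_control(d["test"], d["rifampicin"])
    for name, d in donors.items()
}

# Ratio of the highest to the lowest donor value, before and after normalization.
test_folds = [d["test"] for d in donors.values()]
fold_spread = max(test_folds) / min(test_folds)
percent_spread = max(percent.values()) / min(percent.values())

for name, value in percent.items():
    print(f"{name}: {value:.1f}% of rifampicin response")
print(f"spread in fold induction: {fold_spread:.2f}x")
print(f"spread in percent of control: {percent_spread:.2f}x")
```

In this hypothetical case the test compound response varies 1.5-fold across donors, whereas the rifampicin response varies 4-fold, so dividing by the control response increases the apparent donor-to-donor spread.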
3. With Concentration Dependence, 2-Fold Induction Is an Acceptable Threshold for Positive Identification of In Vitro CYP3A4 mRNA Induction.
Regulatory agencies recommend >2-fold change relative to vehicle control to identify a positive in vitro inducer. This recommendation has evoked considerable discussion as being too stringent a threshold for clinical relevance (Fahmi et al., 2010), especially for changes in CYP3A4 mRNA. A large flumazenil CYP3A4 mRNA data set helped interrogate the appropriateness of a 2-fold cutoff. Flumazenil is not an inducer of CYP3A4 mRNA or activity in vitro (Fahmi et al., 2010), nor is it a CYP3A inducer clinically (Ma et al., 2009; Fahmi et al., 2010). A LoB and LoD analysis in 10 donors was used to understand thresholds based on assay signal-to-noise results. This defined a true negative as ≤1.86-fold and true positive as ≥2.52-fold (Table 2). This was consistent with a smaller data set (true negative ≤2.12-fold; true positive ≥2.95-fold). This statistical analysis supports 2-fold as the threshold to define positive induction of CYP3A4 mRNA. A compound can confidently be assigned as having no CYP3A4 induction if three donors all result in <2-fold induction at clinically relevant concentrations. It should be noted that identifying a compound as positive in vitro does not necessarily mean a clinical study is warranted; it only indicates that further evaluation of risk is required using mathematical DDI prediction models.
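For readers unfamiliar with LoB and LoD, the following sketch uses the conventional parametric definitions: LoB is the mean of the blank plus 1.645 SD, and LoD is the LoB plus 1.645 SD of a low-level sample. Whether the analysis summarized in Table 2 used exactly these formulas is an assumption. The negative control values are hypothetical, and reusing the blank SD as the low-level SD is a further simplification made only for illustration.

```python
from statistics import mean, stdev

Z_95 = 1.645  # one-sided 95% quantile of the standard normal distribution


def limit_of_blank(blank_values):
    """Upper bound expected for a sample that contains no true signal."""
    return mean(blank_values) + Z_95 * stdev(blank_values)


def limit_of_detection(blank_values, low_level_values):
    """Lowest signal expected to be reliably distinguished from the blank."""
    return limit_of_blank(blank_values) + Z_95 * stdev(low_level_values)


# Hypothetical negative control fold-induction values relative to vehicle
# (illustrative only; not the flumazenil data behind Table 2).
negative_control_fold = [0.8, 1.0, 1.1, 1.2, 1.3, 1.5, 0.9, 1.4, 1.0, 1.2]

lob = limit_of_blank(negative_control_fold)
# Simplifying assumption: the low-level sample has the same SD as the blank.
lod = limit_of_detection(negative_control_fold, negative_control_fold)

print(f"LoB = {lob:.2f}-fold, LoD = {lod:.2f}-fold")
```

Under these definitions, a response at or below the LoB is treated as indistinguishable from the negative control, and a response at or above the LoD is treated as a detectable signal.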
4. To Reduce the Risk of False Positives in the Absence of a Concentration-Dependent Response, Induction ≥ 2-fold Should Be Observed in More Than One Donor to Classify a Compound As an In Vitro Inducer.
Monte Carlo simulations of flumazenil (100,000 iterations; Table 3) indicate that the probability of observing a false positive of >2-fold response in one of three donors is ∼8%. Thus, a single donor with a weak CYP3A4 mRNA induction >2-fold is not sufficient to define a true positive. Two or more data points above 2-fold and concentration dependence are recommended to confidently define a positive. For weak induction, the IWG acknowledges that defining concentration dependence can be somewhat subjective. The following should be considered: evidence of concentration response (visual inspection), statistical significance (correlation and linear regression), and relevance (i.e., above 2-fold). If only one donor exhibits induction, adding a fourth donor could be considered to probe for a false positive. If the fourth donor is negative, this strongly suggests a false positive and may obviate the need for follow-up. To further avoid false positives, in the absence of concentration dependence (provided cytotoxicity or solubility is not limiting), if only a single point is >2-fold CYP3A4 mRNA in more than one donor, additional investigation is warranted to contextualize in vitro observations to clinical relevance.
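The type of simulation described above can be sketched as follows. This is not the IWG simulation: it assumes that fold-induction for a non-inducer is log-normally distributed, that donors are independent, and that each donor contributes one observation. The distribution parameters are hypothetical and were chosen only so that the result falls in the same range as the ∼8% quoted above. An analytic cross-check is included, because under independence the probability that at least one of n donors exceeds the threshold is 1 − (1 − p)^n, where p is the single-donor probability.

```python
import math
import random


def simulate_false_positive_rate(mu_log, sigma_log, n_donors=3,
                                 threshold=2.0, iterations=100_000, seed=0):
    """Fraction of simulated studies in which at least one donor exceeds
    the threshold although the compound is a true non-inducer."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        if any(rng.lognormvariate(mu_log, sigma_log) > threshold
               for _ in range(n_donors)):
            hits += 1
    return hits / iterations


def analytic_false_positive_rate(mu_log, sigma_log, n_donors=3, threshold=2.0):
    """Closed-form value of the same probability for independent donors."""
    z = (math.log(threshold) - mu_log) / sigma_log
    p_single = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(fold > threshold)
    return 1.0 - (1.0 - p_single) ** n_donors


# Hypothetical parameters on the natural-log scale (not fitted to Table 3).
MU_LOG = math.log(1.1)
SIGMA_LOG = 0.31

simulated = simulate_false_positive_rate(MU_LOG, SIGMA_LOG)
expected = analytic_false_positive_rate(MU_LOG, SIGMA_LOG)
print(f"simulated: {simulated:.3f}, analytic: {expected:.3f}")
```

The same relationship shows why a study-level rate near 8% corresponds to a single-donor rate of only about 3%, and why requiring a response in more than one donor lowers the false positive rate substantially.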
5. If Qualifying a Compound as Negative for CYP3A4 mRNA Induction, the Magnitude of Maximal Rifampicin Response in That Donor Should Be ≥ 10-Fold.
Applying a minimum rifampicin response threshold ensures that weak inducers are not overlooked. A ≥10-fold threshold provides sufficient dynamic range, giving confidence in a negative determination (<2-fold), and affords a window to determine weak but clinically relevant inducers (e.g., pleconaril, felbamate, and Cmpd 7, which had median Emax values of 3-, 4.7-, and 3.4-fold, respectively) (Supplemental Table 6). This threshold is only critical when a test compound has data <2-fold and the result is being interpreted as negative for in vitro induction. For compounds with clear concentration response and EC50 and Emax values readily determined, those data are of value, independent of the rifampicin response in that experiment. The selection of 10-fold is somewhat arbitrary but pragmatic and data driven. Using single concentration rifampicin (Table 5), setting the threshold at >20-fold would result in too many donors not passing. Indeed, Monte Carlo simulations indicate a ∼5% frequency of all three donors tested reaching >20-fold. Conversely, while setting the threshold at 6-fold would result in most data sets falling into range, there would not be a sufficient window to detect weak inducers between the true and false positive frequency since there is an 8% probability of false positives >2-fold. At the proposed >10-fold threshold, there is >40% probability of all three donors tested falling into range (Table 5). The potential for a high in vitro assay failure rate is naturally concerning. However, the additional in vitro investment becomes more palatable in contrast to potentially unnecessary clinical DDI trials due to insufficient confidence in defining negative in vitro induction. Finally, it should be noted that the 15 hepatocyte donors examined here were initially characterized to produce, at minimum, a >6-fold rifampicin-CYP3A4 mRNA response. Thus, the calculated probabilities could be biased based on this initial acceptance criterion. 
An alternative approach might be a weak inducer control to demonstrate confidence that clinically relevant inducers in the 3- to 4-fold range could be identified. However, there are insufficient data available to evaluate the utility of this approach.
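One simplified reading of recommendations 3 to 5 can be written as a decision function. This is an illustrative sketch and not a validated decision tree. The class name, field names, and category labels are hypothetical. Concentration dependence is reduced to a single yes/no flag per donor, although its assessment is acknowledged above to be partly subjective.

```python
from dataclasses import dataclass


@dataclass
class DonorResult:
    max_fold: float                # maximal CYP3A4 mRNA fold induction by test compound
    concentration_dependent: bool  # analyst's judgment of concentration dependence
    rifampicin_max_fold: float     # maximal rifampicin response in the same donor


def classify(donors, fold_threshold=2.0, rifampicin_minimum=10.0):
    """Return a hypothetical category label for a set of donor results."""
    responders = [d for d in donors if d.max_fold >= fold_threshold]
    # Recommendation 3: >= 2-fold with concentration dependence.
    if any(d.concentration_dependent for d in responders):
        return "positive"
    # Recommendation 4: without concentration dependence, more than one donor.
    if len(responders) > 1:
        return "positive"
    if len(responders) == 1:
        return "possible false positive: consider an additional donor"
    # Recommendation 5: a negative call needs >= 10-fold rifampicin per donor.
    if all(d.rifampicin_max_fold >= rifampicin_minimum for d in donors):
        return "negative"
    return "negative not qualified: rifampicin response below minimum"


# Hypothetical example: three donors, none reaching 2-fold.
example = [
    DonorResult(1.2, False, 15.0),
    DonorResult(1.4, False, 12.0),
    DonorResult(1.1, False, 30.0),
]
print(classify(example))
```

The rifampicin criterion is deliberately checked only on the negative branch, which mirrors the statement above that the threshold matters only when data <2-fold are being interpreted as negative.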
6. Inclusion of a Negative Control Adds No Value Beyond That of the Vehicle Control.
Vehicle and negative control data are superimposable (Fig. 2C). The flumazenil data were useful for determining false positive frequency. An appropriate vehicle control should be included.
The rate of false positive prediction of induction DDI was generally lower when using the in vitro donor median versus the worst-case parameters. Previously, various static and dynamic modeling methods were used to predict clinical CYP3A induction in 28 clinical trials for 13 compounds (Einolf et al., 2014). Expanding this, we evaluated data from over 1000 clinical trials for 51 compounds. However, dynamic modeling was out of scope. All prediction methods, which were variations of the five different approaches detailed by regulatory agencies (F2, RIS, R3, slope correlation, and static modeling), had some incidence of false positive prediction (Fig. 7) compared with the median AUCR (Table 6). However, the rate of false positive prediction was lower (19 out of 23 methods) when using in vitro donor median versus worst-case parameters. Conversely, the rate of false negative predictions was higher (10 out of 23 methods) using in vitro median versus worst case (Table 6; Supplemental Table 8), particularly with unbound concentrations. Using slope or F2 as in vitro induction input parameters served as a reasonable surrogate for EC50 and Emax when binning the clinical induction risk. Additionally, applying unbound concentrations generally lowered the false positive rate but increased false negatives. Quantitative accuracy, as assessed by percentage of predictions within 2-fold, was better when unbound concentrations were used. Thus, the situational preference for avoiding false negatives or false positives could drive selection of the prediction approach. Historically, regulatory agencies advocated total plasma concentration as a conservative estimate to avoid false negative results in I/Ki calculations (Zhang et al., 2009). Importantly, median donor data improve quantitative estimation of risk by increasing the number of predictions within bioequivalence and 2-fold of observed clinical data (Table 6). 
Thus, median data of three in vitro donors, rather than the worst-case donor, should be considered for induction DDI risk assessment.
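The difference between median and worst-case inputs can be illustrated with the basic static R3 equation from regulatory guidance, R3 = 1/(1 + d × Emax × [I]/(EC50 + [I])). The sketch below is not one of the 23 prediction methods evaluated here. It assumes a scaling factor d of 1, takes the medians of Emax and EC50 separately, defines worst case as the donor giving the lowest R3, and uses hypothetical donor parameters, inducer concentration, and observed AUCR. The choice of inducer concentration (total or unbound, with or without a multiplier) differs between guidances and is left as an input.

```python
from statistics import median


def r3(emax, ec50, conc, d=1.0):
    """Basic static induction model: predicted AUC ratio of a victim drug."""
    return 1.0 / (1.0 + d * emax * conc / (ec50 + conc))


def within_fold(predicted, observed, fold=2.0):
    """True if the prediction is within the given fold of the observation."""
    return 1.0 / fold <= predicted / observed <= fold


# Hypothetical Emax (fold) and EC50 (uM) values from three donors.
donors = [
    {"emax": 6.0, "ec50": 2.0},
    {"emax": 12.0, "ec50": 1.0},
    {"emax": 8.0, "ec50": 4.0},
]
conc = 0.5           # hypothetical inducer concentration (uM)
observed_aucr = 0.45  # hypothetical clinical AUC ratio

median_r3 = r3(median(d["emax"] for d in donors),
               median(d["ec50"] for d in donors), conc)
worst_case_r3 = min(r3(d["emax"], d["ec50"], conc) for d in donors)

print(f"median donor R3: {median_r3:.2f}, "
      f"within 2-fold: {within_fold(median_r3, observed_aucr)}")
print(f"worst-case R3:   {worst_case_r3:.2f}, "
      f"within 2-fold: {within_fold(worst_case_r3, observed_aucr)}")
```

In this hypothetical case the worst-case donor predicts a larger induction effect than the median parameters, which is the direction expected to raise false positive rates while lowering false negatives.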
The previous analysis and recommendations only pertain to CYP3A4 mRNA. It is possible that some findings are P450 isoform specific and additional work is necessary to evaluate CYP1A2 and CYP2B6. Since recent regulatory recommendations have focused on changes in mRNA, there were limited enzyme activity data to mine. A prevalence of CYP3A4 time-dependent inhibitors limits the value of P450 activity as an endpoint. However, in the absence of time-dependent inhibition, it retains significant value. Using activity, would the apparent decrease in variability of single concentration positive control (Fig. 1) extend to EC50 and Emax data? Additionally, attempting to avoid false positives, if CYP3A4 mRNA response is >2-fold but activity is increased <2-fold (without time-dependent inhibition), would there be more confidence in a negative induction definition? Another shortfall of the current analysis is the use of nominal in vitro concentrations since actual concentration data were not available. Not accounting for incubational binding or compound loss by metabolism may result in overestimation of EC50, subsequently impacting IVIVE (Sun et al., 2017). Finally, while physiologically based pharmacokinetic modeling was out of scope, dynamic simulation of inducer concentration could yield further improvements to IVIVE (Einolf et al., 2014; Almond et al., 2016; Ke et al., 2016). Given the incidence of complex DDI involving multiple mechanisms, predicting DDI should address both inhibition and induction (Kirby et al., 2011; Fukushima et al., 2013).
The IWG has presented a data-driven evaluation of in vitro human P450 induction with several recommendations (Fig. 8). The analysis supports the regulators’ recommendations to use three human donors in vitro to assess induction and application of a 2-fold CYP3A4 mRNA threshold, coupled with concentration dependency, to determine a positive in vitro induction signal. The IWG proposes several actions around use of controls to aid data interpretation, and showed that while both in vitro and in vivo induction data are somewhat variable, simple static models of clinical risk using in vitro data can be used for decision making.
Acknowledgments
We thank the IQ IWG for valuable discussions on the data and for manuscript review; the IQ Drug Metabolism Leadership Group for manuscript review; the IQ member companies for data; Dr. Scott Obach (Pfizer), Dr. Christopher Gibson (Merck), Dr. Michael Sinz (Bristol Myers Squibb), and Dr. Edward LeCluyse (LifeNet Health) for valuable feedback on the manuscript drafts; Sophie Mukadam (Genentech) for assistance in literature data mining; Philip Yates (Pfizer) for valuable statistical discussions; and Marina Slavsky (Sanofi), Amanda Moore (Vertex Pharmaceuticals), and Thuy Ho, Rob Clark, and Sarah Trisdale (Corning Life Sciences) for in vitro human hepatocyte induction studies for clinically weak inducers.
Authorship Contributions
Participated in research design: Kenny, Ramsden, Buckley, Dallas, Fung, Mohutsky, Einolf, Chen, Dekeyser, Fitzgerald, Goosen, Siu, Walsky, Zhang, Tweedie, Hariparsad.
Conducted experiments: Ramsden, Fitzgerald, Zhang, Hariparsad.
Performed data analysis: Kenny, Ramsden, Buckley, Dallas, Fung, Mohutsky, Hariparsad.
Wrote or contributed to the writing of the manuscript: Kenny, Ramsden, Buckley, Dallas, Fung, Mohutsky, Einolf, Tweedie, Hariparsad.
Footnotes
- Received April 12, 2018.
- Accepted June 18, 2018.
1 J.R.K., D.R., and D.B.B. contributed equally to this work.
This article has supplemental material available at dmd.aspetjournals.org.
Abbreviations
- AUC: area under the curve
- AUCR: area under the curve ratio
- Cmax,ss: maximum steady state concentration
- Cmax,ss,u: unbound maximum steady state concentration
- Cmpd: compound
- Ct: cycle time
- DDI: drug-drug interaction
- DMSO: dimethylsulfoxide
- EMA: European Medicines Agency
- Emax: maximum fold increase (or induction) minus baseline of 1-fold
- F2: concentration achieving 2-fold induction
- FDA: Food and Drug Administration
- GAPDH: glyceraldehyde-3-phosphate dehydrogenase
- IVIVE: in vitro in vivo extrapolation
- IQ: innovation and quality
- IWG: Induction Working Group
- LoB: limit of blank
- LoD: limit of detection
- P450: cytochrome P450
- PCR: polymerase chain reaction
- PMDA: Pharmaceutical and Medical Devices Agency
- RIS: relative induction score
- Copyright © 2018 The Author(s).
This is an open access article distributed under the CC BY Attribution 4.0 International license.