## Abstract

Translational and ADME Sciences Leadership Group Induction Working Group (IWG) presents an analysis on the time course for cytochrome P450 induction in primary human hepatocytes. Induction of CYP1A2, CYP2B6, and CYP3A4 was evaluated by seven IWG laboratories after incubation with prototypical inducers (omeprazole, phenobarbital, rifampicin, or efavirenz) for 6–72 hours. The effect of incubation duration and model-fitting approaches on induction parameters (E_{max} and EC_{50}) and drug-drug interaction (DDI) risk assessment was determined. Despite variability in induction response across hepatocyte donors, the following recommendations are proposed: 1) 48 hours should be the primary time point for in vitro assessment of induction based on mRNA level or activity, with no further benefit from 72 hours; 2) when using mRNA, 24-hour incubations provide reliable assessment of induction and DDI risk; 3) if validated using prototypical inducers (>10-fold induction), 12-hour incubations may provide an estimate of induction potential, including characterization as negative if <2-fold induction of mRNA and no concentration dependence; 4) atypical dose-response (“bell-shaped”) curves can be addressed by removing points outside an established confidence interval and %CV; 5) when maximum fold induction is well defined, the choice of nonlinear regression model has limited impact on estimated induction parameters; 6) when the maximum fold induction is not well defined, conservative DDI risk assessment can be obtained using sigmoidal three-parameter fit or constraining logistic three- or four-parameter fits to the maximum observed fold induction; 7) preliminary data suggest initial slope of the fold induction curve can be used to estimate E_{max}/EC_{50} and for induction risk assessment.

**Significance Statement** Regulatory agencies provide inconsistent guidance on the optimum length of time to evaluate cytochrome P450 induction in human hepatocytes, with EMA recommending 72 hours and FDA suggesting 48–72 hours. The Induction Working Group analyzed a large data set generated by seven member companies and determined that induction response and drug-drug interaction risk assessment determined after 48-hour incubations were representative of 72-hour incubations. Additional recommendations are provided on model-fitting techniques for induction parameter estimation and addressing atypical concentration-response curves.

## Introduction

Regulatory agencies continue to update and evolve their guidance for the conduct of in vitro studies to evaluate the propensity for induction-mediated drug-drug interactions (DDIs). The final guidance released by the Food and Drug Administration (FDA) in early 2020, in addition to the latest guidance (2012) from the European Medicines Agency (EMA) and Pharmaceuticals and Medical Devices Agency (2014; finalized in 2018), provided recommendations on the conduct, interpretation, and risk assessment for the likelihood of a clinical DDI arising from cytochrome P450 induction. The Translational and ADME Sciences Leadership Group Induction Working Group (IWG) has previously commented on cytochrome P450 induction and the regulatory guidance. Hariparsad et al. (2017) presented results from an industry survey related to induction evaluation and provided data-driven recommendations on the evaluation of cytochrome P450 downregulation, in vitro assessment of CYP2C induction, and the use of CITCO [6-(4-chlorophenyl)imidazo[2,1-b][1,3]thiazole-5-carbaldehyde-*O*-(3,4-dichlorobenzyl)oxime] as a positive control for CYP2B6. In a follow-up manuscript, Kenny et al. (2018) provided an extensive analysis of CYP3A4 induction response, thresholds, and variability and made key recommendations related to number of donors, criteria for characterizing positive and negative in vitro induction including the 2-fold cutoff, the value of negative controls, and indexing response to prototypical inducers. More recently, Ramsden et al. (2019) sought to identify contributors to variable outcomes in clinical DDI induction data and methods for understanding how these factors impact characterization of induction.

The appropriate duration of incubation of hepatocytes with test article was identified as an area requiring further exploration and remains a topic of high interest. Of note, recommendations from the regulatory agencies are inconsistent: the FDA recommends incubations of 48–72 hours, whereas the EMA requires clear justification for incubations less than 72 hours. To this end, the IWG sought to evaluate the appropriate incubation time with human hepatocytes to adequately assess cytochrome P450 induction. In 2007, a survey of the pharmaceutical industry indicated that assessment of cytochrome P450 induction is routinely conducted after 48 hours (73% of the respondents), with some investigators using even shorter incubations (Hewitt et al., 2007). This observation was replicated in a subsequent survey conducted by the IWG 10 years later, in which 71% of respondents indicated that they also used 48 hours as the primary incubation time despite the EMA guideline released in 2012, which proposed 72 hours (Hariparsad et al., 2017). Member companies justified this based on their historical data and the switch from enzyme activity to mRNA as the primary endpoint for evaluating induction potential. The utility of shorter incubations for the assessment of cytochrome P450 induction can be particularly advantageous in cases in which prolonged exposure of higher test article concentration results in cytotoxicity, which prevents reliable determination of the key induction parameters: EC_{50} and E_{max} (the maximum fold induction). This strategy was successfully applied by Sane et al. (2016), who monitored changes in mRNA levels after a short incubation (10 hours) in hepatocytes to assess the risk for induction-mediated DDIs caused by deleobuvir, which was cytotoxic after 24 or 48 hours.

Recognizing the value in establishing the relationship between incubation duration, induction response, and DDI risk assessment, the primary objective of the present study was to characterize the time course of cytochrome P450 induction in human hepatocytes after treatment with prototypical inducers (omeprazole, phenobarbital, efavirenz, or rifampicin) and determine the optimal incubation time for induction DDI risk assessment. To enable comparison of induction response across the seven IWG laboratories participating in this study, a secondary objective was to examine the feasibility of describing a consistent approach toward induction data processing and model fitting.

## Materials and Methods

#### Reagents.

Bupropion, efavirenz, phenobarbital, omeprazole, phenacetin, acetaminophen, hydroxybupropion, 1′OH midazolam, and rifampicin were purchased from Sigma-Aldrich (St. Louis, MO). Midazolam was purchased from Cerilliant (Round Rock, TX). Isotopically labeled internal metabolite standards were purchased from Corning Life Sciences (Woburn, MA). The RNeasy Mini Kit was from Qiagen (Valencia, CA), and the cDNA Reverse Transcription Kit was obtained from Applied Biosystems (Foster City, CA). All cell culture reagents were purchased from Life Technologies, BioIVT, Lonza, or Corning Life Sciences unless otherwise noted. Reagents were not standardized for consistency across laboratories but were of highest purity and chemical grade, and potential differences were not expected to affect the overall induction response.

#### Culture of Cryopreserved Human Hepatocytes.

Human cryopreserved hepatocytes from both males and females of different ages and racial origin were obtained from several commercial vendors (Supplemental Table 1): CellzDirect (Durham, NC), Thermo Fisher (Waltham, MA), Bioreclamation In Vitro Technologies (Baltimore, MD), Corning Life Sciences, and XenoTech LLC (Kansas City, KS). To better represent data provided during regulatory reviews, participating laboratories used their standard procedures, and no modifications or standardizations of procedures were suggested. A standardized approach for model fitting and estimation of induction parameters E_{max} and EC_{50}, however, was employed to ensure consistent DDI risk assessment across all seven member laboratories. As detailed in previous publications (Fahmi and Ripp, 2010; Zhang et al., 2014; Sane et al., 2016; Sun et al., 2017), cryopreserved human hepatocytes were thawed in hepatocyte thawing medium and were seeded in collagen I–coated 24- or 96-well plates at cell densities of 0.5–1 × 10^{6} viable cells/ml in hepatocyte plating medium. Viability, as determined by trypan blue dye exclusion, was at least 85% when cells were plated. Cells were initially maintained overnight at 37°C in a humidified incubator (with 95% atmospheric air and 5% CO_{2}) in hepatocyte incubation media. After individual laboratory standard acclimation periods (24–48 hours), the cells were treated with compounds. When sandwich-cultured hepatocytes were used, cells were overlaid with Matrigel between 4 and 24 hours postattachment, maintained for an additional 24 hours under incubated settings, and then treated with compounds. Compounds were dissolved in DMSO and added to the culture medium at various concentrations (final DMSO concentration was 0.1%; Supplemental Table 2) in triplicate. Wells containing 0.1% DMSO only were included as controls.
The concentration range was designed to adequately describe the induction parameters by considering published induction parameters, historical data within the IWG, and solubility/cytotoxicity limitations for the inducers. Media was aspirated and replaced with fresh media containing inducers every 24 hours for 24- to 72-hour treatment plates. Cell viability was assessed by visual inspection of the monolayer, checking for confluency and morphology. After the designated treatment time (6, 12, 24, 48, or 72 hours), the medium was removed, and the cells were washed with an appropriate buffer (i.e., PBS or HBSS).

#### Determination of Relative mRNA Levels.

The cells were lysed using lysis buffer and prepared for RNA isolation. After the isolation of RNA using commercially available kits, cDNA was synthesized using standard polymerase chain reaction (PCR) protocols. Cytochrome P450 (1A2, 2B6, 2C8, 2C9, 2C19, and 3A4) and an endogenous housekeeping gene control (e.g., glyceraldehyde-3-phosphate dehydrogenase, *β*-actin, or *β*2-microglobulin) were quantified by real-time PCR. The gene-specific primer/probe sets were obtained from Applied Biosystems, and real-time PCR was performed using cytochrome P450 (1A2, 2B6, 2C8, 2C9, 2C19, and 3A4) and the endogenous control target cDNAs. The relative quantity of the target cDNA compared with that of the endogenous control was determined by the *ΔΔ* threshold cycle method (Applied Biosystems User Bulletin 2). Threshold cycle values >32 were excluded from the analysis. Relative quantification measured the change in mRNA expression in test samples relative to that in vehicle control sample (0.1% DMSO).
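The *ΔΔ* threshold cycle calculation can be illustrated with a minimal Python sketch. It assumes 100% amplification efficiency (a doubling per cycle) and assumes that a sample is dropped when any contributing threshold cycle value exceeds 32; the function and argument names are hypothetical and do not come from any participating laboratory's pipeline.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_vehicle, ct_ref_vehicle, ct_cutoff=32.0):
    """Fold change in target mRNA relative to vehicle control via 2**(-ddCt).

    Returns None when any threshold cycle value exceeds the cutoff
    (assumed handling of the '>32 excluded' rule).
    """
    cts = (ct_target_treated, ct_ref_treated, ct_target_vehicle, ct_ref_vehicle)
    if any(ct > ct_cutoff for ct in cts):
        return None
    # Normalize the target gene to the endogenous control in each sample
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_vehicle = ct_target_vehicle - ct_ref_vehicle
    # Compare treated with vehicle, then convert cycles to fold change
    dd_ct = d_ct_treated - d_ct_vehicle
    return 2.0 ** (-dd_ct)
```

For example, a treated sample whose normalized threshold cycle is three cycles lower than that of the vehicle control corresponds to 8-fold induction.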

#### Determination of Relative Enzyme Activity.

After designated treatment times, hepatocyte cultures were washed and incubated with single or cocktail probe substrates (phenacetin, bupropion, and/or midazolam) according to individual company standard practices. All enzyme activities were determined by measuring the metabolite formation of the specific probe substrate for each enzyme. The standard curves with single metabolite or cocktail metabolites were prepared and analyzed by liquid chromatography with tandem mass spectrometry for single and cocktail assays, respectively. Note that CYP2C activity was not determined in the present study because of the low dynamic range in response (Hariparsad et al., 2017).

#### Model-Fitting Fold Induction Data.

A detailed description of the methods employed for model-fitting fold induction data is provided below, and the results of various approaches are reviewed in the *Results* and *Discussion* sections. Explicit guidelines for initial data inspection and quality control concerns are not provided. Consistent with the need to evaluate for potential outliers (reference United States Pharmacopeia 1032) and their effect on subsequent analyses, for the analyses presented here, triplicate samples were excluded from curve fitting if the CV was greater than 30% (approximately 10% for mRNA and 4% for activity). In general, traditional statistical outlier tests (e.g., Grubbs’ test), readily available in some commercial software tools, are not recommended to identify potential outliers from triplicate data because of the small sample size (Laio et al., 2010).
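A minimal sketch of the %CV exclusion rule is shown below. It assumes the CV is computed from the sample standard deviation (n − 1 denominator) of the replicate fold induction values at each concentration; the data layout and function names are hypothetical.

```python
from statistics import mean, stdev

def percent_cv(replicates):
    """Coefficient of variation (%) using the sample standard deviation."""
    return 100.0 * stdev(replicates) / mean(replicates)

def filter_by_cv(data, max_cv=30.0):
    """Keep only concentrations whose replicates have %CV <= max_cv.

    data maps inducer concentration -> list of replicate fold induction values.
    """
    kept = {}
    for conc, reps in data.items():
        # At least two replicates are needed to compute a standard deviation
        if len(reps) >= 2 and percent_cv(reps) <= max_cv:
            kept[conc] = reps
    return kept
```

With replicates of 5, 10, and 15 (CV of 50%), the concentration would be excluded from curve fitting, whereas replicates of 1.9, 2.0, and 2.1 (CV of 5%) would be retained.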

#### Confirmation of Concentration Dependence Prior to Model Fitting.

To reduce model fitting attempts for weak or highly variable fold induction data and its impact on the DDI risk assessment, an initial statistical procedure for demonstrating a concentration dependence is suggested. Two methods were used to evaluate concentration dependence: standard linear regression and Spearman’s nonparametric rank correlation coefficient. If either method indicated a statistically significant increase (i.e., a nonzero slope/correlation), then nonlinear regression model fitting was performed. This step was adopted from a decision tree to evaluate time-dependent inhibitors, another DDI risk component that can involve design or curve fitting challenges (Yates et al., 2012).
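As an illustration of the rank-correlation check, the sketch below computes Spearman's coefficient and a one-sided p-value for an increasing trend. Note that the p-value here comes from a Monte Carlo permutation test, which is a swapped-in technique chosen to keep the example dependency-free; it is not necessarily the test used in the study. The linear regression check is not shown, and all function names are hypothetical.

```python
import random
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = (sxx * syy) ** 0.5
    return sxy / denom if denom > 0 else 0.0

def spearman_rho(x, y):
    """Spearman's coefficient = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def increasing_trend_p(x, y, n_perm=5000, seed=0):
    """Return (rho, one-sided permutation p-value for rho > 0)."""
    rng = random.Random(seed)
    observed = spearman_rho(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if spearman_rho(x, y_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A data set would proceed to nonlinear regression only if the resulting p-value (or that of the regression slope) fell below the chosen significance level.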

#### Model Fitting Fold Induction Data and Estimation of Induction Parameters E_{max} and EC_{50}.

In vitro concentration-response data, based on mRNA or activity, were generated by seven IWG member companies. Each inducer-isoform pair was investigated in a single experiment at each laboratory, with each concentration of inducer evaluated in triplicate. Induction parameters E_{max} and EC_{50} were determined by plotting the in vitro fold induction data (mRNA or enzyme activity normalized to the vehicle control) against the nominal in vitro concentration and analyzed using nonlinear regression models in GraphPad Prism (version 8). To the best of our knowledge, only empirical models are available and routinely estimated for induction. A consistent feature of these models is the assumed presence of a fold induction plateau, i.e., E_{max}. Since different laboratories historically used a variety of different nonlinear regression models, five nonlinear regression models were evaluated:

Logistic 3P equation [Log(Agonist) versus response (three parameter)] (eq. 1)

Logistic 4P equation [Log(Agonist) versus response (four parameter)] (eq. 2)

Sigmoidal 3P (eq. 3)

E_{max} model (Hill model) (eq. 4)

Hyperbolic model (one site) (eq. 5)

For all models, *y* = relative fold induction, [*I*] is the test article concentration, *EC*_{50} is the concentration eliciting half-maximal induction, and *E*_{max} is the maximum fold induction. For eqs. 1–3, *bottom* is the lowest fold induction and is constrained to a value of 1.0 (i.e., induction response is normalized to vehicle control, where 1.0-fold induction is baseline, or no induction). For eqs. 2–4, *h* is the Hill slope. Model estimates were reported when the standard error (S.E.) of the estimate was less than 50% of the model estimate. Model fits were compared using the small-sample-size–corrected Akaike’s information criterion (AICc) (Burnham and Anderson, 2002) and AICc weight (Wagenmakers and Farrell, 2004). The default equation used for comparison of induction parameters determined at different time points was eq. 1 (logistic 3P equation); in cases in which the maximum fold induction was not achieved, eq. 3 (sigmoidal 3P equation) was used. See *Results* section (*Comparison of Nonlinear Regression Models For Estimation of E*_{max} *and EC*_{50} *from Fold Induction Data*) for additional details on this approach.
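The two logistic models and the AICc comparison can be sketched as follows. The model functions assume the forms GraphPad Prism documents for "log(agonist) vs. response" with three and four parameters, with *bottom* fixed at 1.0; eqs. 3–5 are not sketched because their exact forms are not given here. The AICc function assumes the least-squares form, and the caller supplies the parameter count `k` under whichever counting convention is in use. All names are hypothetical.

```python
import math

def logistic_3p(conc, emax, ec50, bottom=1.0):
    """Assumed form of eq. 1: Hill slope fixed at 1."""
    return bottom + (emax - bottom) / (
        1.0 + 10.0 ** (math.log10(ec50) - math.log10(conc)))

def logistic_4p(conc, emax, ec50, h, bottom=1.0):
    """Assumed form of eq. 2: variable Hill slope h."""
    return bottom + (emax - bottom) / (
        1.0 + 10.0 ** ((math.log10(ec50) - math.log10(conc)) * h))

def aicc(rss, n, k):
    """Small-sample corrected AIC from a residual sum of squares.

    n = number of data points, k = number of estimated parameters.
    """
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def akaike_weights(aicc_values):
    """Relative support for each model; the weights sum to 1."""
    best = min(aicc_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]
```

At [*I*] = EC_{50}, both functions return the midpoint between *bottom* and E_{max}, which is a quick sanity check on any fitted curve. A model whose AICc is 2 units lower than a competitor's receives an AICc weight of about 0.73 in a two-model comparison.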

#### Estimation of E_{max}/EC_{50} Using Linearization (Initial Slope Approaches).

The initial slope of the fold induction versus concentration curve at low concentrations (typically less than the EC_{30}) can provide an estimate of E_{max}/EC_{50} under the assumptions suitable for such a simplified model (Shou et al., 2008). It is important to note that this relationship is only applicable if in vivo concentrations of the inducer are low ([I] ≪ EC_{50}), and the initial slope is determined by linear regression of the fold induction versus (non-log) concentration. To investigate this approach to estimate E_{max}/EC_{50}, the following criteria were implemented: the slope was determined using at least four data points, with at least one yielding >2-fold induction response. Initial slopes were reported when *R*^{2} > 0.9 and the S.E. of the slope was less than 50% of the estimated slope.
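The reporting criteria for the initial slope can be expressed as a short sketch. It assumes ordinary least squares with a freely estimated intercept (whether the intercept was fixed at 1.0 in the study is not stated), and the function name and returned keys are hypothetical.

```python
import math

def initial_slope(concs, folds, min_points=4, min_r2=0.9, max_rel_se=0.5):
    """Linear regression of fold induction on (non-log) concentration.

    Returns the slope, R^2, S.E. of the slope, and whether the stated
    reporting criteria are met.
    """
    n = len(concs)
    if n < 3 or n != len(folds):
        raise ValueError("need >= 3 paired points to estimate a slope S.E.")
    mx = sum(concs) / n
    my = sum(folds) / n
    sxx = sum((x - mx) ** 2 for x in concs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concs, folds))
    syy = sum((y - my) ** 2 for y in folds)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(concs, folds))
    r2 = 1.0 - ss_res / syy if syy > 0 else 0.0
    se = math.sqrt(ss_res / (n - 2) / sxx)
    reportable = (n >= min_points            # at least four data points
                  and max(folds) > 2.0       # at least one >2-fold response
                  and r2 > min_r2            # R^2 > 0.9
                  and se < max_rel_se * abs(slope))  # S.E. < 50% of slope
    return {"slope": slope, "r2": r2, "se": se, "reportable": reportable}
```

The returned slope has units of fold induction per unit concentration and serves as the estimate of E_{max}/EC_{50} only when `reportable` is true.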

#### Comparison of Model Estimates Determined at 6, 12, 24, or 48 hours, with Model Estimates Determined at 72 hours.

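To examine the relationship between incubation duration and induction parameter estimates, fold error was determined by comparing the parameter estimates obtained at earlier time points with those obtained at 72 hours. Overall average fold error (AFE) between parameter estimates (i.e., E_{max} or EC_{50}) determined at earlier time points (6, 12, 24, or 48 hours) and 72 hours was calculated using eq. 6:

$$\mathrm{AFE} = 10^{\frac{1}{n}\sum_{i=1}^{n}\log_{10}\left(\frac{\mathrm{estimate}_{t,i}}{\mathrm{estimate}_{72\,\mathrm{h},i}}\right)} \tag{6}$$

where *n* is the number of paired estimates and *t* is the earlier time point.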

Calculation of AFE was paired such that parameter estimates of E_{max} or EC_{50} at 72 hours were compared with the parameter estimates generated by the same laboratory at early time points (6, 12, 24, or 48 hours).
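The paired AFE calculation and the "percentage within 2-fold" summary can be sketched as below. The sketch assumes the conventional geometric-mean definition of AFE, with each fold error taken as the earlier estimate divided by the paired 72-hour estimate so that values below 1.0 indicate underprediction; function names are hypothetical.

```python
import math

def fold_errors(early, late):
    """Paired fold errors: earlier-time-point estimate / 72-hour estimate."""
    return [e / l for e, l in zip(early, late)]

def average_fold_error(early, late):
    """Geometric mean of the paired fold errors."""
    logs = [math.log10(fe) for fe in fold_errors(early, late)]
    return 10.0 ** (sum(logs) / len(logs))

def percent_within_2fold(early, late):
    """Percentage of paired estimates with fold error between 0.5 and 2.0."""
    fes = fold_errors(early, late)
    return 100.0 * sum(0.5 <= fe <= 2.0 for fe in fes) / len(fes)
```

Because the mean is taken on the log scale, a 2-fold overprediction and a 2-fold underprediction cancel to an AFE of 1.0, which is why the percentage within 2-fold is reported alongside the AFE.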

#### Prediction of In Vivo DDI.

The change in exposure of a victim drug due to cytochrome P450 induction was predicted using the following steady-state approaches (eqs. 7–9):

$$\mathrm{AUCr} = \frac{1}{1 + \dfrac{E_{max} \times [I_u]}{EC_{50} + [I_u]}} \tag{7}$$

where [*I*_{u}] is the unbound inducer concentration.

Equation 8 eliminates the need for individually determined E_{max} and EC_{50} values and instead uses a combined term, E_{max}/EC_{50}. Note that this approximation is only valid when [*I*_{u}] ≪ EC_{50}.

$$\mathrm{AUCr} = \frac{1}{1 + \dfrac{E_{max}}{EC_{50}} \times [I_u]} \tag{8}$$

Equation 9 uses the slope of the fold induction versus concentration estimate (i.e., at concentrations lower than the EC_{50}) as an estimate of E_{max}/EC_{50}, as previously discussed by Shou et al. (2008).

$$\mathrm{AUCr} = \frac{1}{1 + \mathrm{slope} \times [I_u]} \tag{9}$$

DDI risk assessment, as recommended by the final FDA guidance, was determined using R3 (eqs. 10 and 11):

$$R3 = \frac{1}{1 + d \times \dfrac{E_{max} \times 10 \times I_{max,u}}{EC_{50} + 10 \times I_{max,u}}} \tag{10}$$

where *d* is the scaling factor and assumed to be 1; *I*_{max,u} is the maximal unbound plasma concentration of the inducer.

Using the initial slope to estimate E_{max}/EC_{50}:

$$R3 = \frac{1}{1 + d \times \mathrm{slope} \times 10 \times I_{max,u}} \tag{11}$$
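The static predictions can be sketched in a few lines of Python. The functions assume the basic steady-state form AUCr = 1/(1 + E_{max}·[*I*_{u}]/(EC_{50} + [*I*_{u}])) for eq. 7 and the R3 expression from the final FDA guidance with a 10-fold multiplier on *I*_{max,u} for eq. 10; the linearized versions replace the saturable term with E_{max}/EC_{50} or the initial slope. Function names are hypothetical, and concentrations must share the units of EC_{50}.

```python
def aucr_full(emax, ec50, iu):
    """Assumed form of eq. 7: saturable induction term."""
    return 1.0 / (1.0 + emax * iu / (ec50 + iu))

def aucr_linear(emax_over_ec50, iu):
    """Assumed form of eqs. 8 and 9: valid only when iu << EC50.

    Pass either Emax/EC50 or the initial slope as the first argument.
    """
    return 1.0 / (1.0 + emax_over_ec50 * iu)

def r3_full(emax, ec50, imax_u, d=1.0):
    """Assumed form of eq. 10 (FDA R3) with scaling factor d."""
    return 1.0 / (1.0 + d * emax * 10.0 * imax_u / (ec50 + 10.0 * imax_u))

def r3_slope(slope, imax_u, d=1.0):
    """Assumed form of eq. 11: R3 using the initial slope."""
    return 1.0 / (1.0 + d * slope * 10.0 * imax_u)
```

When the inducer concentration is far below EC_{50}, the full and linearized forms converge, which is the condition under which the E_{max}/EC_{50} ratio or the initial slope can stand in for separately estimated parameters.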

## Results

#### Time- and Concentration-Dependent Increases in Fold Induction of Cytochrome P450 mRNA and Activity.

Time-dependent increases in fold induction of cytochrome P450 isoforms after treatment with the prototypical cytochrome P450 inducers omeprazole, phenobarbital, efavirenz, or rifampicin are summarized in Fig. 1 (mRNA) and Fig. 2 (activity). Concentration-dependent increases in cytochrome P450 mRNA and activity after 72 hours of incubation with inducers are summarized in Fig. 3.

#### Induction of CYP1A2.

Omeprazole elicited robust increases in CYP1A2 mRNA (up to 33-fold in one laboratory) after 6 hours of incubation (Fig. 1A), which increased in a time-dependent manner, up to ∼16–72-fold, after 72 hours. Maximal fold induction of CYP1A2 mRNA by omeprazole occurred between 24 and 48 hours. In contrast, activity (Fig. 2A) achieved maximal fold induction between 48 and 72 hours. Phenobarbital caused concentration-dependent increases (i.e., nonzero slope) in CYP1A2 mRNA (Fig. 3A) and activity (Fig. 3B), but the overall maximal fold induction was low (only ∼2-fold for both mRNA level and activity) and highly variable between laboratories. This weak but concentration-dependent induction of CYP1A2 by phenobarbital lacked time dependence for mRNA (Fig. 1A), with median fold induction of 1.8–2.9 from 6 to 72 hours (Table 1). Phenobarbital-mediated increases in CYP1A2 activity, however, increased time-dependently to approximately 3.3-fold (median) after 48 hours, with minimal increase after 72 hours (Fig. 2A).

#### Induction of CYP2B6.

Phenobarbital, rifampicin, and efavirenz demonstrated time-dependent increases in CYP2B6 mRNA and activity, and consistent with CYP1A2, longer incubation times appeared to be required to achieve maximum fold induction of activity (Fig. 2B) compared with mRNA (Fig. 1B). CYP2B6 mRNA achieved near-maximal fold induction between 24 and 48 hours after phenobarbital or rifampicin treatment, whereas activity required at least 48 hours. Although efavirenz demonstrated concentration-dependent increases in CYP2B6 mRNA and activity, a marked decrease in fold induction was observed at the two highest concentrations tested (Fig. 3, C and D; characterized as a “bell-shaped” curve), which was likely due to cytotoxicity. Overall, rifampicin- and efavirenz-mediated induction of CYP2B6 demonstrated less time dependence than phenobarbital-mediated induction; in particular, minimal additional increases in mRNA were observed after 12–72 hours of incubation compared with 6 hours (Fig. 1B).

#### Induction of CYP2C Isoforms.

Phenobarbital and rifampicin elicited weak but concentration-dependent increases in CYP2C8, CYP2C9, and CYP2C19 mRNA (Fig. 3, A and E). Of these three isoforms, CYP2C8 appeared to be the most sensitive to induction, with average induction of up to ∼8-fold by phenobarbital and ∼5-fold by rifampicin after 72 hours of incubation (note that CYP2C activity was not determined). Although fold induction of CYP2C8 appeared to increase up to 24 hours, further time-dependent increases in mRNA were not as clear between 24 and 48 hours for both inducers (Fig. 1C). Overall induction of CYP2C isoforms by efavirenz was low, with no concentration-dependent (i.e., significant nonzero slope/correlation) increases observed for CYP2C19 or CYP2C9. In contrast, efavirenz caused concentration-dependent increases in CYP2C8 mRNA (up to an average of 3-fold at 10 μM), and although the average fold induction increased between 6 and 12 hours, there was only a weak trend for increasing fold induction between 24 and 48 hours. In addition, at the highest concentrations tested (20 and 30 μM), efavirenz elicited decreases in the mRNA for all CYP2C isoforms investigated, which was similar to what was observed for CYP2B6.

#### Induction of CYP3A4.

Rifampicin elicited a time-dependent increase in CYP3A4 mRNA (Fig. 1F), and maximal fold induction appeared to occur after 24 hours, with no marked increases in mRNA after additional incubation time (48 or 72 hours) in six of seven laboratories. Efavirenz also displayed time-dependent induction of CYP3A4 mRNA (Fig. 1F), with most laboratories achieving near-maximal induction between 12 and 24 hours and minimal increase after 48 or 72 hours. As was observed with CYP1A2 and CYP2B6, additional time was required to achieve maximal induction of CYP3A4 activity, with both rifampicin and efavirenz requiring at least 48 hours (Fig. 2C). Both inducers demonstrated clear concentration dependence of both mRNA (Fig. 3, C and E) and activity (Fig. 3, D and F), but the two highest concentrations of efavirenz (20 and 30 μM) caused marked decreases in CYP3A4 mRNA and activity (compared with the response at 10 μM), resulting in bell-shaped curves consistent with the decreases observed for CYP2B6 and CYP2C isoforms. In contrast, rifampicin did not elicit decreases in fold induction of mRNA or activity at the higher concentrations tested.

#### Effect of Incubation Time on Estimates of Induction Parameters E_{max} and EC_{50}.

The fold induction after treatment of human hepatocytes with the full concentration range (eight concentrations) of each inducer was determined at 6, 12, 24, 48, and 72 hours, which enabled estimation of the E_{max} and EC_{50} at each time point (Table 1). Box and whisker plots summarizing the estimated E_{max} and EC_{50} values for all inducers and cytochrome P450 isoforms are also shown in Supplemental Figs. 1 and 2. In general, the model estimated values for E_{max} demonstrated similar trends to the maximum observed fold induction. Notably, E_{max} based on activity required longer incubation times to reach maximum than those from mRNA. Qualitatively, phenobarbital, efavirenz, and rifampicin achieved maximum E_{max} values for CYP2B6 and CYP3A4 mRNA after 24 hours, with minimal additional increase in the median E_{max} values with increasing incubation time up to 72 hours. In contrast, omeprazole-mediated induction of CYP1A2 required additional time to reach maximal response; at 12 hours, the maximum E_{max} was only ∼20-fold, compared with 60- to 100-fold at 24–72 hours. E_{max} estimates based on activity displayed a greater time dependence than estimates based on mRNA, and most inducers required at least 48 hours to achieve E_{max}. In addition, overall variability in maximal activity response across laboratories was less compared with mRNA, which is likely due to the higher degree of donor variability in the basal expression of cytochrome P450 mRNA and the higher dynamic range to detect small changes in mRNA compared with determination of cytochrome P450 activity by probe substrate turnover.

In contrast to E_{max}, estimates of potency (EC_{50}) were insensitive to incubation time, with no clear trend to increase or decrease with time (Supplemental Fig. 2). However, variability in potency was observed, which varied as much as 10- or 100-fold across laboratories (i.e., individual donors, since each laboratory used a different donor), and unlike estimates of E_{max}, this variability was not higher after earlier incubation times compared with later ones. Based on previous work by the IWG, this variability in induction response is likely not due to differences in experimental procedure but to intrinsic differences in induction response by the different hepatocyte donors (Kenny et al., 2018). In addition, this variability (up to two orders of magnitude) observed in EC_{50} across the seven participating laboratories is consistent with a previous IWG publication indicating 130-fold range in EC_{50} (rifampicin-3A4) based on data from 38 donors and nine different participating laboratories (Supplemental Table 3, Kenny et al., 2018).

#### Sensitivity to Induction at Different Time Points.

To further examine the relationship between incubation time and induction, the percentage of hepatocyte donor lots demonstrating a positive response (defined as a maximal fold induction >2-fold) at 6, 12, 24, 48, or 72 hours after treatment with prototypical inducers is summarized in Table 2. Rifampicin required the shortest incubation time to elicit positive responses, with 100% of hepatocyte donors displaying >2-fold induction of CYP3A4 mRNA after 6 hours incubation. CYP1A2 mRNA appeared to be the least sensitive to induction, with only 40%–67% of donors achieving >2-fold induction after a 6-hour incubation with omeprazole or phenobarbital. As expected, activity-based determination of E_{max} at early time points was less sensitive than mRNA, and after 6 hours, none of the hepatocyte donors achieved >2-fold induction of CYP3A4 or CYP2B6 activity after treatment with rifampicin or efavirenz, respectively. The minimum incubation time required to achieve 100% positive induction response for CYP1A2, CYP2B6, and CYP3A4 activity with omeprazole, phenobarbital, or rifampicin, respectively, was 24 hours. In addition, it was of interest to determine the percentage of hepatocyte donors achieving >10-fold induction after incubations of 6–72 hours, and these values are included in the last row of Table 2. After 6 hours of incubation, only 20% of the donors achieved >10-fold induction of CYP3A4 mRNA, and incubations >12 hours were required to achieve this level of induction in >80% of donors.

#### Comparison of the AFE of E_{max}, EC_{50} and E_{max}/EC_{50} Estimates Relative to 72 hours.

Similar to previous reports (Kenny et al., 2018), large variability in induction response was observed between individual hepatocyte donors, complicating comparison of E_{max} and EC_{50} estimates across time points. Therefore, to summarize the relationship between induction response and time across laboratories, a paired approach was used in which estimates of E_{max} and EC_{50} generated within the same laboratory were compared and then aggregated across companies, isoforms, and/or compounds. AFE was determined for induction parameter estimates measured at earlier time points (6, 12, 24, or 48 hours) compared with estimates determined at 72 hours; individual fold error was determined for each specific inducer-isoform pair with data from the same laboratory. AFE and percentage within 2-fold of E_{max}, EC_{50}, or E_{max}/EC_{50} estimates at 6, 12, 24, or 48 hours compared with 72 hours are summarized in Fig. 4 and Table 3. Acceptable agreement between earlier time points and 72 hours was defined as AFE between 0.5 and 2.0, representing a 2-fold under- or overprediction, respectively. At 48 hours, all three parameters (E_{max}, EC_{50}, and E_{max}/EC_{50}) agreed with estimates at 72 hours, with AFE ranging from 0.79 to 1.1, indicating that incubations at 48 and 72 hours provided consistent induction responses with respect to maximal induction response and potency. In addition, >88% of E_{max} estimates (based on both mRNA and activity) determined at 48 hours were within 2-fold of estimates at 72 hours, further supporting good agreement between 48- and 72-hour endpoints.

In general, E_{max} determined at earlier time points based on mRNA demonstrated greater agreement with 72-hour determinations than E_{max} estimates based on activity. After 12- or 24-hour incubations, mRNA-based estimates were still within 2-fold of estimates determined at 72 hours (AFE = 0.58 and 0.74, respectively), whereas estimates based on activity were approximately 2- to 3-fold underpredictive of the 72-hour values, with AFE = 0.50 and 0.27, respectively. In addition, the number of laboratories reporting measurable cytochrome P450 activity at 6 and 12 hours was notably lower than for incubations of 24–72 hours, suggesting these short incubations may not be suitable for induction assessment. In contrast, at 24–72 hours, the majority of laboratories reported mRNA and activity values for the AFE calculation, increasing confidence in the 2-fold agreement observed.

Estimates of potency (EC_{50}) were relatively consistent across time points, remaining within 2-fold of the 72-hour estimates on average, with AFE values ranging from 1.1 to 1.3 (mRNA) or 0.73 to 1.9 (activity) over the incubation times investigated. However, considerable variability in the individual fold errors was observed for EC_{50} values, with only 42%–70% (mRNA) or 21%–44% (activity) falling within 2-fold.

#### Comparison Across Time Points for Specific Cytochrome P450 Isoforms.

To determine whether the relationship between parameter estimates and time was consistent across different cytochrome P450 isoforms, the AFE for individual cytochrome P450 isoforms was calculated and also included (in addition to the overall values) in Table 3. Overall, the AFE values for the individual cytochrome P450 isoform-inducer pairs were similar to the overall AFE calculated for all isoforms comparing 6, 12, 24, or 48 hours with 72 hours. However, one notable exception was at 6 hours, when E_{max} values for CYP1A2 and CYP2B6 activities underpredicted values at 72 hours by 10-fold (AFE = 0.10), compared with a 4.5-fold underprediction for CYP3A4 (AFE = 0.22). This discrepancy in AFE for different cytochrome P450 isoforms was not apparent at later time points (48 and 72 hours), when the AFE values for CYP1A2, 2B6, and 3A4 isoforms were relatively consistent, with good agreement (within 2-fold) for all three parameters investigated. Interestingly, at 48 hours, CYP1A2 (activity, E_{max}) exhibited an AFE of 0.81, compared with AFEs of 0.90 for CYP2B6 and CYP3A4, suggesting that CYP1A2 activity may continue to increase after incubations longer than 48 hours.

With respect to assessments determined at earlier time points, E_{max} estimates based on mRNA exhibited better agreement with 72-hour values than estimates based on activity for all isoforms. CYP2B6, in particular, had 5- to 10-fold underprediction of 72-hour E_{max} based on activity at 6 and 12 hours (AFE of 0.10–0.21); in contrast, E_{max} based on mRNA was underpredicted by 1.7- to 2.3-fold.

#### Comparison of CYP3A Drug-Drug Interaction Risk Assessment with Respect to Time.

To explore the relationship between incubation duration and DDI risk assessment, the magnitude of a rifampicin-mediated in vivo drug-drug interaction (i.e., the change in victim exposure, AUCr, calculated using eq. 7) was predicted for CYP3A using the E_{max} and EC_{50} estimates at 6, 12, 24, 48, or 72 hours (Fig. 5, A and B, for mRNA and activity, respectively). The corresponding AFE for AUCr determined at 6, 12, 24, or 48 hours compared with 72 hours is shown in Fig. 5, C and D. It should be noted that since the magnitude of induction-mediated DDI increases with decreasing AUCr, AFE values >1.0 represent underpredictions, unlike for E_{max}, for which AFE >1.0 represents overpredictions. As was observed for the individual E_{max} and EC_{50} estimates, there was good agreement for the predicted AUCr between 48 and 72 hours (AFE = 0.92 and 1.3, for mRNA and activity, respectively). Based on mRNA, AUCr determined at 48 hours was within 2-fold of the AUCr determined at 72 hours in seven of seven laboratories (100%); based on activity, the corresponding value was 85%, further supporting the use of 48-hour incubations. Predicted AUCr based on mRNA remained acceptable after 12- or 24-hour incubations (AFE = 1.5 or 0.92, respectively), but risk assessment using activity tended to underpredict at these earlier time points by 3.8-fold (12 hours) and 2.1-fold (24 hours). In addition, AUCr determined at 24 hours by six of seven laboratories (86%) based on mRNA was within 2-fold of AUCr determined at 72 hours, further emphasizing the utility of this shorter incubation for DDI risk assessment. Although encouraging for rifampicin-CYP3A4 induction, the use of 24-hour incubations for induction risk assessment should be considered carefully, since across all isoforms and inducers investigated, only 68% of mRNA-based determinations of E_{max} were within 2-fold of determinations from 72 hours. Additional work is required to further validate the use of 24-hour incubations for quantitative induction DDI risk assessment, in particular for non-CYP3A4 isoforms.

#### Guidelines on Model-Fitting Fold Induction Data.

Initial attempts to collate the large volume of induction time course data generated by seven IWG laboratories revealed inconsistencies in the approach toward the curve fitting of fold induction data. Individual laboratories had independently developed separate “best practices” around several key criteria, such as the choice of nonlinear regression model and suitable acceptance criteria for parameter estimates. In addition, different strategies were implemented to address incomplete concentration-response curves, in which the fold induction over the concentration range tested was insufficient to define E_{max}. Fitting nonlinear models to incomplete curves has been shown to generate biased and/or imprecise estimates (Dutta et al., 1996; Schoemaker et al., 1998; Kirby et al., 2011). Further, correlated parameter estimates (such as E_{max} and EC_{50}) complicate simple comparisons and have design implications (Sebaugh, 2011). Discriminating between (non)linear models is challenging in particular for standard induction experimental designs like those evaluated here [see Spiess and Neumeyer (2010) for a critique of R^{2} in the nonlinear setting and Kirby et al. (2011) and Brewer et al. (2016) for the use of information theoretical measures in a linear regression setting]. Kenakin (2009) suggests constraining one or more parameter estimates, if necessary, and suggests that the difference between the observed and estimated E_{max} be less than 25%.

Recognizing the need for consistent data processing and analysis for the present time course analysis, the large induction data set was reviewed with the following objectives in mind: 1) establish criteria for excluding data points for atypical dose-response curves, 2) recommend strategies to address incomplete (E_{max} not achieved) dose-response curves and risk assessment for DDI, and 3) determine the optimum model(s) for nonlinear regression and estimation of induction parameters.

#### Intrasample Assay Variability.

During the initial data review, the %CV was calculated for each set of triplicates. Despite varying a large number of conditions (e.g., donor, company, concentration, etc.), the average or median fold induction %CV appeared comparable across time, inducers, and isoforms (Supplemental Fig. 3). Considering the large range of measured fold inductions, this suggested the intrasample variance was proportional to the average, a common feature of in vitro assays. The average %CV differed between mRNA and activity, a potential intrinsic distinction between the two assays. For our analyses, these data suggested the use of a combined variability estimate. United States Pharmacopeia <1032> recommends the use of a robust pooled variance estimate when confronted with a need to identify, for example, “outliers.” Since the majority of the %CV for mRNA and activity was below 30%, an approximate average acceptable %CV for triplicate determinations for the present studies was set to 30%. Individual laboratories are encouraged to establish their own historical estimates. If a known intrasample variance is assumed, then a large-sample two-sided confidence interval for the true fold induction mean is x̄ ± z_{α/2} *σ*/√n. For the largest observed average fold induction based on a set of triplicates (a proxy for E_{max}), replacing *σ* with its estimate %CV × x̄ and dividing the resulting interval by the observed average yields 1 ± z_{α/2} × 0.30/√3. As an approximate illustration, for *α* = 0.10, the right-hand portion simplifies to ±0.28. Subject to the precise choice of *α* and of z_{α/2} versus t_{α/2}(df), where the degrees of freedom can be specified by the design, and to refinement of the %CV estimate, an approximate range of plausible average fold inductions, consistent with the observed maximum fold induction, can be calculated. Although we do not claim these intervals have optimal or desirable coverage properties, such an interval allows experimentalists to define a range of values consistent with the largest observed fold induction.
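
As a numerical check of the interval described above, the following minimal sketch computes the relative half-width z_{α/2} × %CV/√n and the corresponding range around an observed maximum fold induction. It assumes the normal (z) quantile rather than a t quantile; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def relative_half_width(cv, n, alpha):
    """Half-width of the two-sided interval, as a fraction of the observed mean.

    cv    : pooled coefficient of variation as a fraction (0.30 for 30%)
    n     : number of replicates behind the observed mean
    alpha : two-sided error rate (0.10 for a 90% interval)
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * cv / math.sqrt(n)


def plausible_range(observed_max, cv=0.30, n=3, alpha=0.10):
    """Range of mean fold inductions consistent with the observed maximum."""
    h = relative_half_width(cv, n, alpha)
    return observed_max * (1.0 - h), observed_max * (1.0 + h)


half = relative_half_width(cv=0.30, n=3, alpha=0.10)
low, high = plausible_range(100.0)
print(f"half-width = {half:.2f}; range for an observed 100-fold = {low:.0f} to {high:.0f}")
```

With 30% CV, triplicates, and α = 0.10, the half-width is about 0.28, in line with the approximate 70%–130% window used in the following sections. Substituting a t quantile with 2 degrees of freedom would widen the interval considerably.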

#### Special Considerations: Bell-Shaped and Incomplete Dose-Response Curves with Poorly Defined Maximum Fold Induction.

Typical fold induction versus concentration plots resemble classic sigmoidal dose-response curves (Fig. 6A, rifampicin). Importantly, conventional empirical models assume that fold induction approximately plateaus at high concentrations. Assuming a constant average at high concentration, although incorrect, is preferred to a decreasing nonmonotonic relationship. In contrast, experimentalists can be confronted with bell-shaped curves, which are characterized by a decrease in fold induction at concentrations higher than the concentration producing the observed maximum fold induction (Fig. 6B, efavirenz). The paradoxical decrease at high test article concentrations is usually due to cytotoxicity, and assays confirming this loss in hepatocyte viability can provide definitive evidence to exclude these data points from curve fitting and analysis. However, cytotoxic endpoints are typically indicative of near-terminal cell death and may not provide enough resolution to identify scenarios in which nonselective cellular injury obscures cytochrome P450 induction of mRNA or activity. In addition, reduced fold induction could be due to downregulation of cytochrome P450 (Hariparsad et al., 2017), or other complex or secondary pharmacology effects may be present. Therefore, cytotoxicity (as determined by standard assay endpoints) may not be detected at higher test article concentrations where the fold induction decreases. Incorporating data where fold induction decreases at higher concentrations into the curve fit, without a robust or weighted fitting strategy to mitigate their effect, can bias the resulting nonlinear estimates (e.g., underestimate E_{max}).

To establish a rationale for excluding (decreasing) fold induction data at higher concentrations without accompanying cytotoxicity data prior to curve fitting, the previously stated two-sided confidence interval is used to establish a range of acceptable fold induction above or below the highest observed fold induction.

Table 4 summarizes the plausible lowest and highest fold induction for several two-sided confidence levels (0.8–0.975) and %CV values (10%–30%), assuming an observed fold induction of 100. For example, assuming an overall %CV of 30% and a two-sided 90% confidence interval, a 30% decrease in fold induction is plausible given sampling variability and the statistical precision of the observed “average” E_{max}. Here, it is assumed that the observed E_{max} is representative of the actual unobserved E_{max} and that the potential for bias is acceptable. In this example, data at concentrations above that associated with the observed E_{max} are removed from the analysis when the fold induction is <70% of the observed E_{max}. This approach provides a potentially conservative criterion to exclude data from curve fitting based on a given confidence level and expected variability (%CV). Individual laboratories are therefore advised to establish their own expectations around variability and a suitable confidence interval when setting criteria to exclude points and adequately model nonmonotonic data. Removing data at higher concentrations using this approach typically results in an incomplete curve in which the plateau (E_{max}) is not adequately described; strategies to address this scenario are reviewed below.
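
The exclusion rule in this example can be sketched as a simple filter. This is an illustrative implementation only, assuming the 70% threshold from the example above and applying it solely to concentrations above the one giving the observed maximum; the function name and the concentration-response values are hypothetical.

```python
def exclude_post_peak_drop(concs, folds, min_fraction=0.70):
    """Split data into (kept, removed) points for curve fitting.

    A point is removed only if it lies at a concentration above the one
    giving the maximum observed fold induction AND its fold induction is
    below min_fraction * maximum observed fold induction.
    """
    pairs = sorted(zip(concs, folds))  # order by concentration
    peak_index = max(range(len(pairs)), key=lambda i: pairs[i][1])
    peak_fold = pairs[peak_index][1]
    kept, removed = [], []
    for i, (conc, fold) in enumerate(pairs):
        if i > peak_index and fold < min_fraction * peak_fold:
            removed.append((conc, fold))
        else:
            kept.append((conc, fold))
    return kept, removed


# Hypothetical bell-shaped curve: concentration (uM) vs. mean fold induction.
concs = [0.1, 0.3, 1, 3, 10, 30]
folds = [1.2, 2.5, 6.0, 15.0, 9.0, 4.0]
kept, removed = exclude_post_peak_drop(concs, folds)
print("kept:", kept)
print("removed:", removed)
```

In this hypothetical case the two highest concentrations fall below 70% of the peak (10.5-fold) and are dropped, leaving a curve without a defined plateau.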

In addition to bell-shaped curves, atypical or incomplete fold induction versus concentration-response curves may also arise when an apparent plateau of maximal induction response is not achieved, resulting in poor estimates of E_{max}. Failure to observe consistent maximal fold induction at high concentrations could be due to cytotoxicity or solubility limitations. Using the same confidence interval for the largest observed fold induction, E_{max} estimates that are appreciably larger than the observed maximum fold induction should be interpreted with caution or rejected. Confidence intervals/S.E. estimates for E_{max} from nonlinear regression models can be highly variable for small samples, especially when E_{max} is not achieved. As described above, based on a two-sided 90% confidence interval and a CV of 30%, E_{max} estimates greater than 130% of the observed maximum fold induction (i.e., >130 for an observed maximum of 100) should be treated with suspicion. Although unintentional, this recommendation agrees with the suggested guidelines from Kenakin (2009).

#### Comparison of Nonlinear Regression Models for Estimation of E_{max} and EC_{50} from Fold Induction Data.

Estimation of E_{max} and EC_{50} for rifampicin or efavirenz by fitting fold-induction data generated by a single laboratory to five commonly used nonlinear regression models is summarized in Table 5. The rifampicin concentration range tested (0–20 μM) displayed a classic sigmoidal concentration-response curve with well defined minimum and maximum fold induction, and all five models yielded similar estimates of E_{max} and EC_{50}. Review of the AICc weights for each model fit indicated that the logistic 4P model had the highest probability of being the best-fitting model (0.94), whereas all other models had very low probabilities (0.00497–0.0286). Remarkably, despite this large difference in AICc weight, the induction parameter estimates were very similar for all five models (17.7–19.5 for E_{max} and 0.287–0.340 μM for EC_{50}). Therefore, identifying the most probable model using the Akaike information criterion has minimal impact on the estimation of E_{max} and EC_{50} for induction curves that define the maximum fold induction. In addition, since all five models provided consistent estimates of E_{max} and EC_{50}, risk assessment for induction-mediated DDI is also unaffected by the choice of model for curves resembling this rifampicin dose-response.
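
For readers unfamiliar with AICc weights, the following minimal sketch shows one common way to compute them. It assumes least-squares fits with normally distributed residuals, so that AICc can be written in terms of the residual sum of squares (RSS); conventions differ on whether the error variance is counted as a fitted parameter, and the software used in the study may differ. The model labels, RSS values, and sample size are hypothetical.

```python
import math


def aicc(rss, n, k):
    """Small-sample corrected AIC for a least-squares fit.

    rss : residual sum of squares
    n   : number of data points
    k   : number of fitted parameters (counting convention is an assumption)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return n * math.log(rss / n) + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert {model: AICc} into {model: relative probability of being best}."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: v / total for m, v in rel.items()}


# Hypothetical fits of three models to the same 8-point curve: (RSS, k).
fits = {"model A (4 parameters)": (1.0, 4),
        "model B (3 parameters)": (2.4, 3),
        "model C (2 parameters)": (6.0, 2)}
scores = {name: aicc(rss, n=8, k=k) for name, (rss, k) in fits.items()}
for name, w in akaike_weights(scores).items():
    print(f"{name}: weight = {w:.3f}")
```

Because the weights depend only on differences in AICc, a model can receive a very low weight while still returning parameter estimates close to those of the top-ranked model, as observed for rifampicin.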

To further investigate the effect of model selection on induction parameter estimates for other cytochrome P450 inducer and cytochrome P450 isoforms, fold induction data determined by each laboratory were used to fit the five models, and the E_{max} and EC_{50} values were compared (Supplemental Fig. 4).

The absolute average fold error (AAFE) was determined for each model estimate of E_{max} and EC_{50} relative to the estimate from the best model, as determined by AICc weight. Overall, as was observed for rifampicin and CYP3A4, all five model fits produced similar estimates of E_{max}, with >80% of the model fits for all models within ±20% (AAFE = 1.2) of the best model fit (Supplemental Fig. 4A). Estimates of EC_{50} also showed good agreement across all models, with >75% within ±20%–50% of the best model estimate (Supplemental Fig. 4B; AAFE = 1.2–1.5). Therefore, as observed for rifampicin-CYP3A4, model selection has little overall impact on the estimated induction parameters E_{max} and EC_{50} for typical fold induction curves across the isoforms and inducers investigated.

The effect of model selection on induction parameter estimates was also investigated for atypical dose-response curves (Table 5). In contrast to rifampicin, the five models provided different induction parameter estimates characterizing the fold induction of CYP3A4 mRNA by efavirenz, which displays a bell-shaped curve (Fig. 6B). The two highest concentrations, at which the fold induction decreased more than 30% compared with the maximum fold induction, were excluded from the analysis. Consequently, the remaining efavirenz concentrations did not elicit adequate fold induction to define the E_{max} (i.e., an atypical dose-response in which E_{max} is not achieved). Three of the models (logistic 3P, E_{max}, and hyperbolic fits) estimated E_{max} values that were >2-fold higher than the observed maximal fold induction (15-fold) and were rejected based on the developed criteria (i.e., >130% of observed E_{max}; with a maximum observed fold induction of 15, E_{max} estimates >19.5, or 130% of 15, would be rejected). Interestingly, comparison of AICc weight suggests the best model was the sigmoidal 3P fit (AICc weight = 0.70), and the next best fit was the logistic 4P (AICc weight = 0.297); the estimated E_{max} values using these two models were within 130% of the observed maximum fold induction. In contrast, the logistic 3P, E_{max}, and hyperbolic models all exhibited low probability of being the best model (AICc weight < 0.00327) and provided very high estimates of E_{max} that were >2-fold higher than the observed maximum fold induction. Therefore, for this example of a curve with poorly defined E_{max}, models with higher probability of being the best-fitting model as determined by AICc weight also provide E_{max} estimates that are closer to the observed maximum fold induction.
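
The rejection criterion applied in this example reduces to a one-line comparison. The sketch below uses the observed maximum fold induction of 15 from the text together with the E_{max} estimates of 38.5 (logistic 3P) and 15.1 (sigmoidal 3P) quoted in this section; the function name and the 1.30 default are illustrative choices.

```python
def emax_is_acceptable(estimate, observed_max, upper_fraction=1.30):
    """True if a fitted E_max does not exceed upper_fraction * observed maximum."""
    return estimate <= upper_fraction * observed_max


observed_max = 15.0  # maximum observed fold induction for efavirenz (from the text)
for model, estimate in [("logistic 3P", 38.5), ("sigmoidal 3P", 15.1)]:
    verdict = "accept" if emax_is_acceptable(estimate, observed_max) else "reject"
    print(f"{model}: E_max = {estimate} -> {verdict} (limit {1.30 * observed_max:.1f})")
```

Laboratories using a different %CV or confidence level would replace the 1.30 factor with their own upper bound.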

#### Initial Slope as an Estimate of E_{max}/EC_{50}.

In cases in which nonlinear regression models fail to provide reliable induction parameter estimates, the initial slope of the fold induction versus (linear) concentration plot can provide an estimation of E_{max}/EC_{50} under the assumptions suitable for such a simplified model (Shou et al., 2008). To investigate the utility of this approach, the initial slopes for all isoform-inducer pairs were calculated by linear regression of the fold induction versus (non-log) concentration at each time point, and the AFE of the slope compared with the actual E_{max}/EC_{50} (determined at 48 hours) is summarized in Fig. 7. On average, the initial slope underpredicted the actual E_{max}/EC_{50} by about 2-fold after 48 hours of incubation, with AFE of 0.45 (mRNA) and 0.63 (activity). Overall, initial slopes determined at early time points (6–24 hours) based on mRNA or activity yielded poor estimates of E_{max}/EC_{50} (AFE = 0.11–0.29). In addition, this analysis was repeated (data not shown) by comparing initial slope to the E_{max}/EC_{50} determined at 72 hours, and the AFE was similar to that observed after 48 hours of incubation, with the initial slope (determined at 48 hours) underpredicting by approximately 2-fold.
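
The initial-slope calculation can be sketched with ordinary least squares on the low-concentration points. This is a minimal illustration under stated assumptions: fold induction is taken to follow a simple hyperbolic form, 1 + E_{max} × C/(EC_{50} + C), and the parameter values, concentrations, and cutoff are hypothetical rather than study data.

```python
def initial_slope(concs, folds, max_conc):
    """Least-squares slope of fold induction vs. linear concentration,
    using only points at or below max_conc."""
    pts = [(c, f) for c, f in zip(concs, folds) if c <= max_conc]
    if len(pts) < 2:
        raise ValueError("need at least two points to estimate a slope")
    n = len(pts)
    mean_c = sum(c for c, _ in pts) / n
    mean_f = sum(f for _, f in pts) / n
    sxx = sum((c - mean_c) ** 2 for c, _ in pts)
    sxy = sum((c - mean_c) * (f - mean_f) for c, f in pts)
    return sxy / sxx


def hyperbolic_fold(conc, emax, ec50):
    """Assumed fold-induction model used only to generate example data."""
    return 1.0 + emax * conc / (ec50 + conc)


# Hypothetical inducer: E_max = 15, EC50 = 3 uM, so E_max/EC50 = 5 per uM.
concs = [0.0, 0.1, 0.3, 1.0, 3.0, 10.0]
folds = [hyperbolic_fold(c, emax=15.0, ec50=3.0) for c in concs]
slope = initial_slope(concs, folds, max_conc=0.3)
print(f"initial slope = {slope:.2f} per uM (true E_max/EC50 = 5.00)")
```

Even with noise-free data, the fitted slope falls below E_{max}/EC_{50} because the curve already bends below a straight line at these concentrations; the shortfall grows as the concentrations used approach EC_{50}. This is one possible contributor to underprediction, although other factors may apply to the experimental data.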

#### Effect of Model-Fitting Approaches on Induction DDI Risk Assessment.

Because of the consistency in the fitted E_{max} and EC_{50}, DDI risk assessment for inducers displaying a typical sigmoidal dose-response (with well defined maximum fold induction) would be similar regardless of the model-fitting approach. However, for incomplete dose-response curves (such as efavirenz) with poorly defined maximal fold induction, DDI risk assessment would likely be dependent on the model selected to estimate E_{max} and EC_{50}. To further explore the relationship between the selection of model to fit induction data and DDI risk assessment for an atypical dose-response curve, Table 6 summarizes the model estimates for E_{max}, EC_{50}, the corresponding concentration resulting in 2-fold induction (F2), and R3 (eq. 10) using the logistic 3P, sigmoidal 3P, or a constrained logistic 3P fit for efavirenz-mediated induction of CYP3A4 mRNA. Note that this curve is also presented in Fig. 6B. Based on our criteria, the estimated E_{max} using the logistic 3P equation would be rejected, since it is >130% of the observed maximum fold induction, but the induction risk assessment using this estimate was included for comparison. Using the logistic 3P equation to fit the data yielded a less conservative estimate of R3 (0.27) compared with the other approaches, mainly because of the decrease in potency (EC_{50} = 16.5) that accompanied a much higher estimate of E_{max} (38.5). This example highlights the corresponding right shift in EC_{50} (underestimation of potency) when overestimating E_{max}. In addition, R3 appears to be more sensitive to EC_{50} than E_{max} in this example, since even though the logistic 3P fit estimated a >2-fold higher E_{max} than the other approaches, the overall predicted in vivo interaction was smaller, which was primarily driven by the decrease in potency (i.e., right shift and increased EC_{50}). The next approach explored was to constrain the E_{max} to the observed maximum fold induction using the logistic 3P fit, which yielded an E_{max} of 15.1 and an EC_{50} of 3.1 μM. These values were similar to the model estimates using the sigmoidal 3P fit (15.1 and 3.6 μM, respectively), and the corresponding predicted reduction in a victim drug’s exposure was similar for both approaches (R3 = 0.19–0.20).
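
The sensitivity of R3 to EC_{50} can be illustrated numerically. The sketch below assumes that eq. 10 (not reproduced in this section) has the form of the basic regulatory R3 model, R3 = 1/[1 + d × E_{max} × 10 × I_{u}/(EC_{50} + 10 × I_{u})], with scaling factor d = 1. The E_{max} and EC_{50} values are those quoted above; the unbound efavirenz concentration of 0.125 μM is a hypothetical input chosen for illustration and is not reported in this section.

```python
def r3_from_parameters(emax, ec50, unbound_conc, d=1.0, factor=10.0):
    """Assumed basic-model R3: predicted ratio of victim exposure with/without inducer.

    emax, ec50   : fitted induction parameters (ec50 in the same units as unbound_conc)
    unbound_conc : unbound inducer concentration (hypothetical input here)
    d            : empirical scaling factor (assumed 1)
    factor       : multiplier on the unbound concentration (assumed 10)
    """
    scaled = factor * unbound_conc
    return 1.0 / (1.0 + d * emax * scaled / (ec50 + scaled))


unbound = 0.125  # uM; hypothetical value, for illustration only
fits = {
    "logistic 3P": (38.5, 16.5),
    "sigmoidal 3P": (15.1, 3.6),
    "constrained logistic 3P": (15.1, 3.1),
}
for name, (emax, ec50) in fits.items():
    print(f"{name}: R3 = {r3_from_parameters(emax, ec50, unbound):.2f}")
```

Under these assumptions the logistic 3P fit gives the largest (least conservative) R3 despite its higher E_{max}, because 10 × I_{u} is well below EC_{50} and the prediction is therefore driven largely by the ratio E_{max}/EC_{50}.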

#### Use of Initial Slope for Estimation of E_{max}/EC_{50} and DDI Risk Assessment.

Despite a tendency to underpredict the E_{max}/EC_{50} by about 2-fold, the utility of using the initial slope to assess induction DDI risk for efavirenz was investigated (last row of Table 6). Using the initial linear slope of the fold induction versus concentration yielded an estimated E_{max}/EC_{50} of 1.5 and a corresponding predicted R3 (eq. 11) of 0.35, which is an approximately 2-fold underprediction compared with assessment based on the sigmoidal 3P fit (0.19). Comparison of these two approaches to determine R3 was further investigated using data from all participating laboratories, and the AFEs are summarized in Table 7 for omeprazole, efavirenz, and rifampicin. Interestingly, the R3 values determined using the initial slope estimation compared with the sigmoidal 3P fit exhibited good agreement for efavirenz-CYP2B6 and rifampicin-CYP3A4 (AFE = 0.955 and 0.823, respectively). Acceptable agreement (within 2-fold) was also observed for omeprazole-CYP1A2 and efavirenz-CYP3A4 (AFE = 0.685 for both). This agreement in estimated R3, despite an ∼2-fold underprediction of E_{max}/EC_{50} using the initial slope, is likely due to the conservative nature of the R3 equation, which applies a 10× factor to the input unbound inducer concentration and thereby effectively minimizes the impact of the underprediction of E_{max}/EC_{50}. Although preliminary, this limited data set indicates agreement (overall AFE = 0.83) between the R3 estimated using the initial slope and the R3 value calculated using E_{max} and EC_{50} determined using a sigmoidal 3P fit.
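
A slope-based R3 can be sketched in the same way. The form below is an assumption: eq. 11 is not reproduced in this section, and the sketch takes it to be the low-concentration limit of the basic R3 model, R3 = 1/(1 + d × slope × 10 × I_{u}), with d = 1. The slope of 1.5 is the value quoted above; the unbound concentration of 0.125 μM is a hypothetical input and is not reported in this section.

```python
def r3_from_slope(slope, unbound_conc, d=1.0, factor=10.0):
    """Assumed slope-based R3, with the initial slope standing in for E_max/EC50.

    slope        : initial slope of fold induction vs. concentration (per uM)
    unbound_conc : unbound inducer concentration in uM (hypothetical input here)
    d            : empirical scaling factor (assumed 1)
    factor       : multiplier on the unbound concentration (assumed 10)
    """
    return 1.0 / (1.0 + d * slope * factor * unbound_conc)


unbound = 0.125  # uM; hypothetical value, for illustration only
r3 = r3_from_slope(slope=1.5, unbound_conc=unbound)
print(f"slope-based R3 = {r3:.2f}")
```

Because the linear form has no saturation term, it is only meaningful when 10 × I_{u} lies in the near-linear part of the concentration-response curve; at higher concentrations it would overstate induction relative to a saturable model with the same initial slope.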

#### General Guidelines for Estimation of E_{max} and EC_{50} from Fold Induction Data.

A summary of the recommended approach toward model-fitting fold induction data is presented in Fig. 8.

## Discussion

Regulatory agencies recommend assessment of induction-mediated DDI using primary human hepatocyte cultures for up to 72 hours, which has been adopted in industry (Meunier et al., 2000; Chu et al., 2009). However, evaluating induction using shorter durations could be advantageous because of the potential for cytotoxicity after prolonged exposure to new chemical entities (NCEs), and therefore shorter incubation times could enable induction DDI risk assessment that would otherwise not be possible. The objective of the present study was to compare the induction response and corresponding DDI risk assessment in primary human hepatocytes after treatment with four prototypical cytochrome P450 inducers. This study was designed to systematically evaluate the time course of cytochrome P450 induction using mRNA and enzyme activity endpoints. Importantly, the study was conducted by seven different laboratories using independent hepatocyte donors and employing their own protocols.

Overall, maximal fold induction of mRNA occurred after shorter duration than that required for activity, which is consistent with literature reports. Li et al. (1997) and LeCluyse et al. (2000) previously demonstrated that CYP3A activity increases time-dependently up to 72 hours, whereas maximal CYP3A4 mRNA levels have been observed after 18–24 hours (Drocourt et al., 2001) and within 24 hours for CYP1A2 (Grover et al., 2007). Zhang et al. (2010) also reported that CYP1A2, 2B6, and 3A4 exhibited robust increases in mRNA after 6 hours and achieved maximal induction of mRNA levels within 24 hours. Faucette et al. (2004) reported maximal induction of CYP3A4 (by rifampicin) within 2–4 days, whereas CYP2B6 activity continued to increase until 4 days of treatment. In the present study, average maximal fold induction of CYP1A2, CYP2B6, and CYP3A4 mRNA occurred after 24- to 48-hour incubations (Fig. 1, A, B, and F), whereas 48 hours was required to achieve maximal activity (Fig. 2). Estimates of the E_{max} followed the same trend, with mRNA-based assessments determined at 24 and 48 hours exhibiting good agreement with 72 hours (AFE = 0.74 and 0.94, respectively, Fig. 4A), whereas assessments based on activity required at least 48 hours to fall within 2-fold of 72 hours.

The utility of earlier assessment of cytochrome P450 induction based on mRNA was reported by Sane et al. (2016), with maximal induction occurring between 8 and 12 hours and corresponding DDI risk assessment consistent with assessments at 48–72 hours. Zhang et al. (2010) also concluded that 24- or 48-hour assessments based on mRNA are suitable for routine testing without sacrificing assay robustness. An additional factor supporting shorter durations for assessment of induction is the reported loss in enzyme activity in cultured primary hepatocytes resulting in lower basal cytochrome P450 activity compared with freshly isolated hepatocytes (Hamilton et al., 2001; Rodríguez-Antona et al., 2002; Elaut et al., 2006). Hepatic dedifferentiation and the contribution of micro-RNA have been proposed drivers for this loss in activity (Bell et al., 2016; Lauschke et al., 2016), but the precise mechanism has yet to be determined. Nevertheless, these decreases in cytochrome P450 activity in hepatocyte culture further support the use of shorter incubations to assess the risk for induction-mediated DDI.

The time to achieve maximum fold induction can also be considered in the context of the reported degradation rates (k_{deg}) for cytochrome P450 protein or mRNA. The time to maximum induction is analogous to achieving a new steady-state concentration and is therefore intrinsically related to the k_{deg}. Recent reports (Ramsden et al., 2015; Takahashi et al., 2017) using long-term hepatocyte cocultures have estimated k_{deg} for CYP3A4 protein to be ∼0.02 h^{−1} (half-life = ∼30 hours), which is consistent with values determined in hepatocytes (Maurel, 1996; Yang et al., 2008) and liver slices (Renwick et al., 2000). A similar k_{deg} value has been reported for CYP3A4 mRNA (Yamashita et al., 2013). To achieve “true” steady-state kinetics and a maximum fold induction, five half-lives (150 hours) would be required, which is approximately 2-fold longer than the 3-day period reported by LeCluyse et al. (2000) and ∼3-fold longer than the 48 hours suggested as sufficient by our data. However, after a single half-life (∼30 hours), induction should reach approximately 50% of steady state, and after two half-lives, approximately 75%. Therefore, similar maximal fold induction after 48 and 72 hours is consistent with achieving between 50% and 75% of E_{max} after approximately 1.5 half-lives.
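
The half-life argument above can be made concrete with a short calculation. This minimal sketch assumes a first-order approach to the new steady state after a step change in synthesis rate, so that the fraction attained at time t is 1 − 0.5^{t/t_{1/2}}; the ∼30-hour half-life is the approximate value quoted in the text, and the function name is hypothetical.

```python
def fraction_of_steady_state(t_hours, half_life_hours):
    """Fraction of the new steady state reached after t_hours,
    assuming a first-order approach governed by the degradation half-life."""
    return 1.0 - 0.5 ** (t_hours / half_life_hours)


half_life = 30.0  # hours; approximate CYP3A4 value quoted in the text
for t in (24, 30, 48, 60, 72, 150):
    pct = 100.0 * fraction_of_steady_state(t, half_life)
    print(f"{t:>3} h: {pct:.0f}% of steady state")
```

Under this simple assumption, 48 hours (1.6 half-lives) corresponds to roughly two-thirds of steady state and 72 hours to about 80%, a modest difference relative to the donor-to-donor variability discussed above.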

Previously, the IWG reported weak and variable induction of CYP2C8, CYP2C9, and CYP2C19 (Hariparsad et al., 2017). In the present study, phenobarbital and rifampicin elicited concentration-dependent increases in CYP2C9 and CYP2C19 mRNA, but the maximum fold induction was weak (<2-fold; Fig. 3, C and E). Induction of CYP2C8 mRNA displayed minimal time dependence, with near-maximal induction occurring after 12–24 hours and minimal increases after 48 or 72 hours (Fig. 1D). These data suggest that longer incubation times are not required to achieve maximal fold induction of cytochrome P450 mRNA, even for weak induction (i.e., CYP2C8). Therefore, conclusions around agreement between E_{max} and EC_{50} estimates at 24 or 48 hours and estimates after 72 hours are likely applicable to both weak and strong inducers.

### General Recommendations on Incubation Duration for Induction Risk Assessment and Model Fitting

#### Incubations of 48 and 72 hours in Primary Hepatocytes Provide Equivalent Assessment of Cytochrome P450 Induction Based on mRNA or Cytochrome P450 Activity.

Across all donors, cytochrome P450 isoforms, and inducers, E_{max} and EC_{50} estimates demonstrated excellent agreement (AFE > 0.87) between 48- and 72-hour incubations based on both mRNA and activity. In addition, 72-hour incubations provided no improvement over 48 hours for predicting the likelihood of a rifampicin-mediated DDI (AFE = 0.92–1.3). The IWG recommends incubations of 48 hours for assessing the risk for induction-mediated DDI for cytochrome P450 inducers using mRNA or activity as the endpoint.

#### Assessment of Cytochrome P450 Induction After 24 hours Based on mRNA, but Not Activity, Provides Reliable Determination of Induction Parameter Estimates and DDI Risk.

Incubations for 24 hours yielded mRNA-based induction estimates of E_{max}/EC_{50} that were similar to 72 hours (AFE = 0.55, 1.8-fold underpredicted; Fig. 4), whereas activity-based assessments underpredicted by 2.8-fold. Risk assessment for rifampicin-mediated DDI based on CYP3A4 mRNA determined after 24 hours also exhibited good agreement with 72 hours (AFE = 0.92, Fig. 5, A and C), with six of seven laboratories (86%) predicting within 2-fold, supporting the utility of 24-hour incubations. However, when considering the variability in mRNA response across all isoforms and inducers (68% of E_{max} determinations were within 2-fold of 72 hours), induction assessment after 24 hours should be interpreted carefully, in particular for non-CYP3A4 interactions.

#### Compounds Causing Less Than 2-fold Induction of CYP3A4 mRNA and No Concentration Dependence after 12 hours Can Be Classified as Negative for Cytochrome P450 Induction.

Previously, IWG recommended that compounds demonstrating <2-fold induction of CYP3A4 mRNA (48 or 72 hours) and no evidence of concentration-response be classified as negative for induction when the corresponding rifampicin response in the same donor is ≥10-fold (Kenny et al., 2018). In the present study, six of seven laboratories reported >10-fold CYP3A4 induction with rifampicin after 12 hours, and all laboratories were >10-fold for durations ≥24 hours. Therefore, compounds demonstrating <2-fold induction (and no concentration-dependent response) of mRNA after 12 or 24 hours can be classified as negative for CYP3A4 induction.

#### Atypical Concentration Versus Fold Induction Responses (i.e., Bell-Shaped Curves or Failure to Achieve E_{max}) Can Be Addressed by Removing Points Outside an Established Confidence Interval with an Expected %CV.

With an expected estimate of variability (30% CV) and a two-sided 90% confidence interval, data at concentrations above that associated with the observed E_{max} that exhibit a paradoxical decrease in fold induction are removed from induction curve fitting when the fold induction is <70% of the observed E_{max}. Similarly, estimates of E_{max} that are >130% of the observed maximum fold induction are interpreted with caution, based on the same estimates of variability and assumed confidence interval. The estimate of 30% is based on the overall triplicate variability observed across all seven laboratories participating in this study; investigators are encouraged to leverage their own estimates of variability to establish individual guidelines.

#### When the Maximum and Minimum Fold Induction Are Well Defined by the Concentration Range Tested, Selection of the Best-Fitting Model Using AICc Criteria Has Little Impact on the Estimated E_{max} and EC_{50}.

For fold induction curves demonstrating classic sigmoidal kinetics, all five nonlinear regression models provided similar estimates of E_{max} and EC_{50}, with >75% falling within ±20%–50% of the best estimate as determined by the highest AICc weight. This consistency suggests that AICc criteria provide limited value to determine the best model with respect to the estimates of E_{max} and EC_{50} and DDI risk assessment.

#### When the Maximum Fold Induction Is Not Well Defined, Conservative Estimates of E_{max} and EC_{50} Can Be Obtained Using Sigmoidal 3P Fit or Constraining Logistic 3/4P Fits to the Maximum Observed Fold Induction.

The sigmoidal 3P fit and logistic three- or four-parameter fits constrained to the maximum observed fold induction underestimate the E_{max} and provide an overestimation of potency (left-shifted EC_{50}). Therefore, sensitivity analysis assuming E_{max} > the fitted E_{max} and EC_{50} < the fitted EC_{50} using the constrained logistic 3P/4P or sigmoidal 3P fits provides a conservative approach to DDI risk assessment.

#### Preliminary Data Suggest Initial Slope of Fold Induction Data May Approximate E_{max}/EC_{50} and Enable DDI Risk Assessment in Scenarios in which E_{max} and EC_{50} Cannot Be Reliably Estimated.

Importantly, the experimental design was not optimized to characterize the initial slope, and evaluation of induction response at concentrations below the EC_{50} would enable more accurate characterization. Recognizing these limitations, additional work is needed to further validate this approach for induction-based DDI assessment, as it could provide a valuable approach when induction parameters cannot be derived experimentally. Encouragingly, preliminary analysis indicated the initial slope estimate of R3 for omeprazole, efavirenz, or rifampicin was in good agreement (AFE = 0.83) with R3 determined using E_{max} and EC_{50} (using a sigmoidal 3P fit). Further investigation of this approach is ongoing.

The IWG has presented a data-driven evaluation of the time course for cytochrome P450 induction by four prototypical inducers (omeprazole, phenobarbital, efavirenz, and rifampicin). Although there was variability in induction response across different human hepatocyte donors and laboratories, consistent analysis using our model-fitting guidelines allowed several recommendations to be made regarding the appropriate incubation duration to obtain reliable induction parameter estimates. A key conclusion from this analysis is that 48-hour incubations provide equivalent assessment of induction response and corresponding in vivo DDI risk assessment compared with 72-hour incubations, indicating that longer incubations provide little benefit.

## Acknowledgments

The authors would like to thank the IQ IWG for valuable discussion of data and manuscript review; the IQ TALG for manuscript review; the IQ member companies for data; and Drs. Ronald S. Obach (Pfizer), Christopher Gibson (Merck), and Michael Sinz (BMS) for valuable feedback on draft manuscripts. We would also like to thank the following individuals who contributed to in vitro (wet laboratory) induction studies presented in the manuscript: Stephanie Piekos (Boehringer Ingelheim), Carlo Sensenhauser (Janssen), Amanda Moore and Hong Tsao (Vertex), Thuy Ho, Rob Clark and Sarah Trisdale (Corning Life Sciences), Xiaowei He (Eisai), and Kelly Nulick (Pfizer).

## Authorship Contributions

*Participated in research design:* Ramsden, Dallas, Einolf, Palamanda, Chen, Goosen, Siu, Zhang, Tweedie, Hariparsad, Jones.

*Performed data analysis:* Wong, Ramsden, Fung, Goosen, Yates.

*Wrote or contributed to the writing of the manuscript:* Wong, Ramsden, Dallas, Fung, Einolf, Chen, Goosen, Siu, Zhang, Tweedie, Hariparsad, Jones, Yates.

## Footnotes

- Received April 2, 2020.
- Accepted October 21, 2020.
- ¹Current affiliation: Pliant Therapeutics, South San Francisco, California.
- ²Current affiliation: Takeda, Cambridge, Massachusetts.
- ³Current affiliation: SK Life Science Inc, Paramus, New Jersey.
- ⁴Current affiliation: AbbVie, Worcester, Massachusetts.
- ⁵Current affiliation: DMPK, Research and Early Development, Oncology R&D, AstraZeneca, Waltham, Massachusetts.
- ⁶Current affiliation: Pharmaron, Rushden, United Kingdom.
- ⁷Current affiliation: Bristol Myers Squibb, New Brunswick, New Jersey.
- This article has supplemental material available at dmd.aspetjournals.org.

## Abbreviations

- AICc: Akaike’s information criteria
- AFE: average fold error
- AAFE: absolute average fold error
- AUCr: ratio of AUC (area under the curve) of victim substrate in the presence of the inducer compared to AUC in the absence of the inducer
- DDI: drug-drug interaction
- EMA: European Medicines Agency
- E_{max}: maximal fold induction (fitted)
- FDA: Food and Drug Administration
- IWG: Induction Working Group
- P: parameter
- PCR: polymerase chain reaction
- R3: R3 value (FDA), used to predict the magnitude of an induction-mediated drug-drug interaction

Copyright © 2020 by The American Society for Pharmacology and Experimental Therapeutics