Abstract
It is widely recognized that preclinical drug discovery can be improved via the parallel assessment of bioactivity, absorption, distribution, metabolism, excretion, and toxicity properties of molecules. Highthroughput computational methods may enable such assessment at the earliest, least expensive discovery stages, such as during screening compound libraries and the hittolead process. As an attempt to predict drug metabolism and toxicity, we have developed an approach for evaluation of the rate of Ndealkylation mediated by two of the most important human cytochrome P450s (P450), namely CYP3A4 and CYP2D6. We have taken a novel approach by using descriptors generated for the whole molecule, the reaction centroid, and the leaving group, and then applying neural network computations and sensitivity analysis to generate quantitative structuremetabolism relationship models. The quality of these models was assessed by using the crossvalidated correlation coefficients of 0.82 for CYP3A4 and 0.79 for CYP2D6 as well as external test molecules for each enzyme. The relative performance of different neural networks was also compared, and modular neural networks with two hidden layers provided the best predictive ability. Functional dependencies between the neural network input and output variables, generalization ability, and limitations of the described approach are also discussed. These models represent an initial approach to predicting the rate of P450mediated metabolism and may be applied and integrated with other models for P450 binding to produce a systemsbased approach for predicting drug metabolism.
Quantitative structuremetabolism relationship (QSMR) models allow the estimation of complex metabolismrelated phenomena from relatively simple calculated molecular properties or descriptors. Such models can be used for the design of structural analogs of bioactive compounds with improved pharmacokinetic properties (Bouska et al., 1997; Madsen et al., 2002; Humphreys et al., 2003), evaluation of excretion kinetics (Holmes et al., 1995; Bollard et al., 1996; Cupid et al., 1996), estimation of approximate rates of metabolic conversion for prodrugs or soft drug candidates (Buchwald and Bodor, 1999; Bodor, 1999), and assessment of potential toxic effects of novel compounds. (Di Carlo et al., 1986a,b; Di Carlo, 1990; Altomare et al., 1992).
The computational prediction of the metabolic fate of novel compounds is a nontrivial problem. First, an indiscriminate pooling of metabolic data from different species in the commercially available databases substantially distorts any attempt at generalization (Darvas, 1988). The metabolic pathways and corresponding networks can be very different even in close mammalian species, so that any use of such pooled data is problematic (Mulder, 1990). Second, in vitro and in vivo data may differ substantially even for the same species. The metabolic fate of a drug delivered to the human liver after intravenous administration is often quite different from that observed in the liver microsomal fraction in vitro. Third, metabolism of the same drug may vary substantially between individuals depending on the expression level of particular enzymes, polymorphisms in enzymeencoding and regulatory genes, and the presence of particular isoenzymes in normal (Hayashi et al., 1991) and disease states (Kato et al., 1995). In determining a QSMR, the key question is whether it is the complete molecular structure or its structural components that actually undergo metabolism. The answer is vital for choosing relevant descriptors for QSMR models and the subsequent prediction of metabolic fate for novel compounds. Finally, such QSMR algorithms should be highly effective in handling very large virtual and real discovery compound databases.
We have applied QSMR to the cytochrome P450 (P450) enzymes involved in the phase I metabolism of both exogenous and endogenous compounds (Ioannides and Parke, 1996). The recent reviews on P450s detail their chemistry, regulation, membrane topology, and molecular biology, and provide initial models for substratebinding sites (Ioannides and Parke, 1996; Ekins et al., 2003). Of over 50 human P450 genes cloned and described, only three P450 families, a halfdozen subfamilies, and fewer than a dozen isoenzymes have been shown to play any significant role in hepatic processing of drugs (Estabrook, 1996). These important P450 enzymes generally have broad and overlapping substrate specificity, which poses a serious challenge to the prediction of therapeutic or toxic outcomes of xenobiotic metabolism.
Specific P450 enzymesubstrate recognition interactions have been studied extensively, including several QSAR and pharmacophore models that have been built and analyzed in reviews (Smith et al., 1997a,b; Lewis et al., 1998, 2002; de Groot et al., 1999; Lewis, 2000; Szklarz et al., 2000; Ekins et al., 2001; de Groot and Ekins, 2002). Electronic models for P450mediated metabolism have been produced (Jones et al., 2002; Jones and Korzekwa, 1996), which have combined aliphatic and aromatic oxidation reactions to generate predictions for regioselectivities. More recently, this approach has been used to identify the major sites of human CYP3A4 metabolism (Singh et al., 2003). Although useful and thoughtfully designed, these various models used above have limitations, as they were generated with small sets of molecules for a selection of P450s. In contrast, another approach that can be taken is to carefully collate a large amount of substrateproduct reactions for human P450s and then categorize the metabolic reactions according to the particular type of chemical transformation [e.g., Ndealkylation, Odealkylation, and sulfur (II) oxidation] (Korolev et al., 2003). Recently, we have developed computational algorithms for assessment of the general probability of P450mediated transformation for druglike compounds (Korolev et al., 2003) and for prediction of the K_{m} of drugs to the active sites of P450 enzymes using this data base (Balakin et al., 2004).
In the present study we have evaluated the possibility of more accurately predicting the rates of P450mediated metabolic Ndealkylation reactions. The work focuses in particular on Ndealkylation mediated by two of the clinically relevant human P450 enzymes, CYP3A4 and CYP2D6. These models should serve as a starting point for the development of an integrated automated system for the prediction of metabolic and toxic effects of organic compounds in humans (Bugrim et al., 2004).
Materials and Methods
Data Bases. A total of 83 metabolic Ndealkylation reactions with experimental log V_{max} values for two major human P450s, CYP3A4 and CYP2D6, were studied in this work (Table 1). Together, both P450s are responsible for hepatic metabolism of ca. 80% of drugs in humans (Yan and Caldwell, 2001). For each reaction, structures of initial substrates and products were obtained from the MetaDrug data base (GeneGo, St. Joseph, MI). These Ndealkylation reactions in human enzyme assays generally followed MichaelisMenten kinetics allowing calculation of V_{max} values, which ranged from 1 × 10^{6} to 3.3 × 10^{3} pmol/min/pmol of enzyme.
To clean the input data for their subsequent use in QSAR modeling, we performed an analysis of the initial training data set obtained from the MetaDrug data base. The analysis is based on Sammon nonlinear mapping (NLM) (Sammon, 1969) of the initial substrates' property space. NLM is an advanced multivariate statistical technique that approximates local geometric relationships on a two or threedimensional plot. Sammon maps have previously been used for the visualization of protein sequence relationships in two dimensions and comparisons between large compound collections, represented by a set of molecular descriptors (Agrafiotis and Lobanov, 1997; Agrafiotis et al., 1999). In this work, we used NLM for analysis of heterogeneity of the initial data set of Ndealkylation reaction substrates. Five molecular descriptors, molecular weight, logarithm of 1octanol/water partition coefficient (log P), the number of Hbond donors and acceptors, and the number of rotatable bonds were calculated for the entire initial data set of CYP3A4 and CYP2D6 substrates. These descriptors encode the most significant molecular features, such as molecular size, lipophilicity, Hbonding capacity, and flexibility, and are commonly associated with molecular properties determining druglikeness of small molecule compounds. The Sammon NLM procedure allows the creation of a 2D image of the studied fivedimensional property space.
The Sammon map generation was conducted using a program developed internally at Chemical Diversity Labs as part of the ChemoSoft™ software suite (Chemical Diversity Labs, Inc. San Diego, CA.). The nonlinear map was built based on the following parameters: maximal number of iterations 300, optimization step 0.3; Euclidean distance was used as a similarity measure.
After the outliers were removed with this technique, we obtained two sets of metabolic Ndealkylation reactions mediated by CYP3A4 and CYP2D6 enzymes (Table 1). Twenty one molecules are common between these two enzymes, but are characterized with different logV_{max} values. The substrates are listed in Table 1 with their logV_{max} values for the corresponding CYP3A4 and CYP2D6mediated reactions.
Neural Network Modeling. The NeuroSolution 4.0 program (NeuroDimension, Inc. Gainesville, Fl.) was used for all neural network operations. Unless otherwise stated, the modular neural networks with two hidden layers were generated. Modular feedforward networks are a special class of multilayer perceptrons. These networks process their input using several parallel multilayer perceptrons, and then recombine the results. This action tends to create some structure within the topology, which will foster specialization of function in each submodule. Using modular networks, one needs a smaller number of weights for the same size network (i.e., the same number of input variables). This tends to speed up training times and reduce the number of required training examples. The training was performed over 1000 iterations. All the computations were performed using a personal computer workstation with the Pentium 1.8 GHz processor on a Windows 2000 platform.
Elements of the Substrate's Structural Organization. Three types of molecular fragments belonging to the initial substrate molecules were considered in this work Fig. 1. Thus, for each initial molecule for which the CYP3A4 or CYP2D6mediated Ndealkylation occurs, the whole structure (A), the centroid with topological radius equal to three bonds (B), and the cleaved leaving fragment (C) were studied. Such a dissection strategy arose from our basic theoretical assumption that these elements of a substrate's organization will be crucial for the metabolic Ndealkylation rate. Such an assumption has a solid theoretical basis. The significance of the wholemolecular level for the P450 substrate/nonsubstrate properties has been noted in the literature (Smith et al., 1997a,b; Lewis et al., 1998; de Groot et al., 1999; Szklarz et al., 2000; Ekins et al., 2001; de Groot and Ekins, 2002; Korolev et al., 2003), and subsequently, most of the reported QSMR studies utilize the properties of the molecule as a whole. On the other hand, it is evident that the properties of local fragments participating in the metabolic reaction can influence the transformation kinetics. For instance, it has been shown that the enzymatic hydrolysis of carboxylic esters depends on the steric hindrance of the reaction site (Buchwald, 2001). From the structuremetabolism relationship of a series of human immunodeficiency virus protease inhibitors, the compounds having a specific substituent pattern near the reaction site were found to be able to avoid glucuronidation (Mimoto et al., 2004). As we will show, this basic assumption resulting in the definition of the elements of a substrate's structural organization was confirmed by our QSMR modeling experiments.
Descriptors. Molecular descriptors were calculated for the three structural types A to C (Fig. 1) using the Cerius^{2} (Accelrys, San Diego, CA) and ChemoSoft (Chemical Diversity Labs, Inc., San Diego, CA) software tools. A wide range of molecular descriptors of different types were calculated for all initial substrates, including electronic, topological, spatial, structural, and thermodynamic descriptors. Electronic descriptors included polarizability and dipole moment. Topological descriptors included Wiener (Wiener, 1947) and Zagreb (Gutman et al., 1991) indices, Kier and Hall molecular connectivity indices (Hall and Kier, 1991), Kier's shape indices (Hall and Kier, 1991), the molecular flexibility index (Hall and Kier, 1991), and Balaban indices (Balaban, 1982). Spatial descriptors included radius of gyration, Jurs descriptors (Stanton and Jurs, 1990), shadow indices (Rohrbaugh and Jurs, 1987), area, density, principal moment of inertia, and molecular volume. The Jurs descriptors were calculated from threedimensional energyminimized molecular conformations by mapping partial charges on solventaccessible surface areas of particular atoms. Their utility for the analysis of selective recognition factors for various systems has been shown in several publications in the past decade (Wessel et al., 1998; Eldred at al., 1999). Structural descriptors included numbers of rotatable bonds, hydrogenbond acceptors, hydrogenbond donors, molecular weight, and aromatic density. Finally, the thermodynamic descriptors included log D (at pH 7.4) and molar refractivity. Topological, spatial, and structural descriptors were calculated for topological centroids and leaving fragments. This yielded a total of 120 initial descriptors for each metabolic reaction.
Selection of Molecular Descriptors. Reducing the number of independent variables is crucial when attempting to model small data sets. Generally, the smaller the data set, the greater the chance of overfitting the data when using a large number of descriptors. Therefore, predictive modeling usually involves two stages that can be concurrent or distinct: 1) a feature reduction stage and 2) a predictive modeling stage. After relevant features have been identified, the predictive modeling stage is initiated. Feature selection for the P4503A4mediated Ndealkylation in this work is based on a sensitivity analysis (LeCun et al., 1990; Bigus, 1996). The main objective of the sensitivity analysis is to determine the saliency of each of the features in a model and to reduce the number of features. Sensitivity analysis explores a trained machine learning model to determine the sensitivities for the descriptive features of the model and to make the feature selection based on these sensitivities. In other words, the testing process provides a measure of the relative importance among the inputs of the neural model.
All the descriptors were scaled between 1 and 1. The descriptor data set was extended with an additional random “phantom” variable to scale the sensitivities. The underlying assumption is that descriptors with sensitivities less than this random variable are not important for the model. This random variable can come from different distributions. In this case, the random variable was obtained from a normal distribution. A feedforward backpropagated neural network was generated and trained using the entire training set (31 objects) and 121 input variables, which included 120 calculated descriptors and one phantom variable. After the neural network had been trained, a sensitivity measure per feature was obtained, and the procedure was repeated three times. These sensitivities were then combined as the average of three runs to obtain the final sensitivity value for each feature. The sensitivities were then sorted in ascending order and all features with sensitivities smaller than or similar to the random phantom variable were dropped. This elimination process was done in successive iterations for feature reduction stages, constructing a new model based on the new reduced feature set.
This iterative feature elimination process with sensitivity analysis was halted when no more features could be dropped (there were no more features with a sensitivity below the sensitivity of the random scale variable). We carried out a systematic trainingtesting experiment based on the crossvalidation leaveoneout (LOO) procedure to further reduce the number of inputs, more accurately select descriptors, and find the optimal architecture of the modular neural network. The descriptors were sorted according to the average sensitivity coefficient (SC; the standard deviation of each output divided by the standard deviation of the input which was varied to create the output) as shown in Table 2, and several LOO crossvalidation cycles were performed with a gradually reduced number of input variables. In the first round, we used the entire 24descriptor set; in the second round, we removed one descriptor from the end of the sorted list and generated the models using the remaining 23 descriptors, and so on. The models with different numbers of input variables were assessed using the crossvalidated correlation coefficient q^{2}. Another parameter used for evaluation of the predictive ability of the generated models is σ (eq. 1), originally proposed by So and Karplus (1997).
where N is a number of compounds in the training set, n is a number of input variables, and y_{i,obs.} and y_{i,pred.} are observed and crossvalidated predicted activity. Using σ as a selection criterion, one can discriminate between models that give similar correlation coefficients but are different in the number of variables (n). Thus, a compromise between the quality of the model and the risk of overfitting the data can be reached. More specifically, when the neural network has too many adjustable weights compared with the number of training data, the network can memorize the training set.
After the relevant descriptors were found (Table 2), an optimal learning algorithm was identified. Several different neural networks were tested using the crossvalidation LOO procedure. It should be noted that we also tried to leave larger fractions out, but even in the case of leavetwoout models, the predictive ability of the networks (expressed as q^{2}) appeared to be reduced (data not shown). Different neural network architectures (Table 3) were automatically built as implemented in the NeuroSolution program and assessed using the LOO value. LOO works by leaving one data point out of the training set and giving the remaining instances (31 in the case of the P4503A4 reaction set) to the learning algorithms for training. The process was repeated 32 times so that each example is a part of the test set only once. The LOO procedure tests the model's generalization accuracy, whereas training set accuracy tests only the model's ability to memorize. A comparative LOO analysis was conducted on models trained using several different learning algorithms and the entire 24descriptor set. The resulting values for average training (r^{2}) and crossvalidation (q^{2}) coefficients are reported in Table 3. Among the neural networks tested, modular neural networks with 2 hidden layers provided the best predictive ability. This learning algorithm was used in all further experiments.
Model Testing. The developed models were validated using two external test sets, which were not used for training. Nine P4503A4mediated and five P4502D6mediated Ndealkylation reactions with known V_{max} values were collected from the literature. After all the necessary molecular descriptors were calculated as described previously, we separately tested the models for P4503A4 and P4502D6 to predict Ndealkylation V_{max} values.
Results
Molecule Selection. When we tried to model the entire initial data set, the quality of the models generated was low. Therefore we took steps to remove outlier molecules and produce a more homogeneous “local” model. The first Sammon map used for the training set molecule selection is shown in Fig. 2. Most of the compounds (ca. 90%) are located in a compact region of the map as a long, curved island. Outliers are depicted as black circles and exemplified by four arbitrary structures. In a typical case these outliers represent structurally dissimilar, generally nondruglike compounds. All such compounds were removed from the data set to ensure some degree of homogeneity of properties for the remaining training set selection. Such a process is an important factor for successful QSAR modeling in this case, when a relatively small number of objects with experimental log V_{max} values are available.
Molecule Feature Selection. The molecular descriptors predominantly selected in this study are related to the wholemolecular level and are related mainly to the family of Jurs descriptors (Stanton and Jurs, 1990). Other selected factors are the hydrogen bonding capacity, molecular geometry, and topological complexity. The significance of these molecular properties in this particular task is in an agreement with previous observations, where similar or closely related molecular features were considered as the key factors affecting the P450 active site binding affinity. The descriptors related to structural types B and C (Fig. 1), such as density, Zagreb index (the measure of topological complexity, branching) and radius of gyration encode the steric hindrance of the reaction center. In addition, they may relate to the ease of expulsion of the leaving group from the active site. These are all key factors affecting the rate of metabolic conversions. In general, it can be concluded that molecular descriptors selected with the use of the described statistical algorithm adequately describe the molecular properties determining metabolic behavior and, in this case, give a reasonable set of factors governing the rates of metabolic Ndealkylation.
Neural Network Architecture Selection. Along with the number of input variables, the number of hidden units in the neural network architecture is another important parameter, which is closely related to the generalization ability of the neural network. The principal difficulty here is that the optimal architecture depends strongly on the characteristics of the particular problem to be solved, and no a priori recommendations can usually be made to achieve the best predictive ability. To study the influence of the neural network architecture on its predictive ability, LOO crossvalidation computations with different numbers of input variables and hidden units were performed. The number of input neurons varied from 5 to 24, and the number of hidden units varied from 2 to 12 (Fig. 3, a and b). The threedimensional plot of dependence of q^{2} on the number of input (N) and hidden (n) units is shown in Fig. 3a. Thus, the change from 5 to 12 input units brought a meaningful shift in q^{2} value from 0.55 to 0.6 to 0.75 to 0.82. This effect leveled off at 12 to 14 descriptors, and then the q^{2} did not change significantly upon a further increase of N. Changing the number of hidden units does not cause such a dramatic change in the fitting performance of the neural network: sinusoidallike changes in q^{2} value were observed upon the increase of n from 2 to 12, with two maxima at 4 and 10 hidden units. This dependence is most pronounced for a small number of input variables (812), whereas for larger N, the number of hidden units seems to be of low importance for the generalization ability of the neural network. The importance of hidden units for this QSMR model indirectly implies the presence and importance of higher order terms in the QSMR equation modeled by neural network. The threedimensional plot of the dependence of the parameter σ on the number of input and hidden units (Fig. 3b) allows us to find the optimal solution. For clarity, only the area restricted by 10 and 15 input units is shown. There are two local minima on the surface shown here, corresponding to two optimal neural network architectures. The best σ value is achieved for 12 input and 4 hidden units. It is noteworthy that the neural network model using the subset of only 12 inputs provides similar predictive ability as compared with the network developed using 24 input variables. This could be the result of filtering out redundant, or nearly redundant, parameters from the set of independent variables.
The goodness of fit, as judged by the squared correlation coefficient for the training selection, r^{2}, can serve as an additional selection criterion for the best model (although there are some limitations to this if the model is overfitted). The closer this value is to 1, the better the model is. For reasonable regression models, q^{2} should be close to r^{2}, and is usually smaller (Wold, 1991). Figure 4 shows the dependence of q^{2} and average value of r^{2} on the number of input variables for LOO crossvalidation experiments using the modular neural network with four hidden units. The observed dependences are similar to each other: the change from 5 to 12 input units causes clear increases in r^{2} and q^{2} values; the predictive ability and the goodness of fit do not change significantly upon a further increase of N. It can be generally concluded from the plots shown that for the studied training set, 12 input independent variables (descriptors 112 in Table 2) and 4 hidden units make a good compromise between the generalization abilities of the modular neural network and the number of adjustable weights.
QSMR Modeling the CYP3A4 NDealkylation Set. The QSMR models with the lowest σ value were used for further analysis. Figure 5 shows the crossvalidated versus observed reaction rates for the best LOO crossvalidation experiment. There are no outliers in this model and the overall good conformity between the predicted and observed log V_{max} values resulted in comparable r^{2} and q^{2} values of 0.85 and 0.82, respectively.
Functional plots are useful tools when analyzing the explicit dependencies between input and output variables in the generated neural network models (So and Karplus, 1997). The functional dependence plot for an independent variable was generated by keeping all but one of the 12 descriptors fixed at a constant value (average value) while scanning the variation of log V_{max} with respect to changes in one descriptor between its mean ± standard deviation. We generated such functional dependencies for each of 12 descriptors and each of 31 QSMR models produced in the course of the LOO crossvalidation procedure. The averaged data with the standard deviation interval were generated for 31 training/crossvalidation cycles. Some interesting conclusions can be drawn from this type of analysis. All 12 descriptors can be divided into two categories according to their observed functional dependencies. The first category comprises six descriptors, for which the relatively stable dependencies between input and output variables were observed in these 31 trainingvalidation cycles. In the second category of the remaining six descriptors, the character of functional dependencies can be altered, sometimes substantially, throughout the models.
For instance, the density of centroids (B_density; Fig. 6) can serve as a measure of steric hindrance of the reaction site. The negative correlation of this latter descriptor with the CYP3A4mediated Ndealkylation rate is intuitively evident and is in good agreement with experimental observations of medicinal chemists in that molecules with sterically hindered reaction sites are usually poor substrates of enzymatic reactions (Buchwald, 2001). Figure 7 illustrates this well using two examples from our reference set. (R)Fluoxetine possesses a lowdensity, sterically unhindered reaction site that is a rapidly metabolized CYP3A4 substrate (Margolis et al., 2000) compared with the sterically hindered mephobarbital. By contrast, disopyramide with its bulky Nsubstituents can be characterized by a relatively low Ndealkylation rate (Echizen et al., 2000). Five other “stable” descriptors are related to the family of Jurs charged partial surface area parameters (Stanton and Jurs, 1990). These descriptors encode hybrid electronic and geometric information and capture the ability of the molecule to form hydrogen bonds. It seems these descriptors are important for selective substrateenzyme interactions in the CYP3A4 active site. However, it should be noted that such discussion of onedimensional sections through multidimensional surfaces is often only qualitative, whereas quantitative prediction of the metabolic reaction rates requires application of the neural network trained with the relevant input descriptors.
Each of the 12 descriptors used was left out of the set of input variables, and the remaining descriptors were provided to the learning algorithm for training. Figure 8 shows the calculated r^{2} and q^{2} values obtained in the course of the LOO crossvalidation experiment with the reduced set of 11 input variables. One can see that, whereas the training set correlation goodness of fit (r^{2}) is almost unaffected, the predictive ability of the models is very sensitive to the type and nature of the input variable removed. The most significant reduction in predictive ability is obtained after removal of B_density and A_PNSA1. Both descriptors belong to the first category, and the observed results are in full agreement with their importance noted above for metabolic Ndealkylation modeling. At the same time, removal of descriptors with an unstable functional dependence, such as A_DPSA3 and A_FNSA1, can also result in a substantial reduction of predictive ability (0.630.65 q^{2} values) as compared with the models based on the entire 12descriptor set. An effect similar to this was observed in another enzyme system (Andrea and Kalayeh, 1991; So and Richards, 1992).
Comparison of Descriptors Calculated for Different Substrate Elements. In this work, we have used descriptors derived from three different elements of the substrate's organization (Fig. 1): whole molecule (type A), topological centroid with r = 3 (type B), and leaving fragment (type C). The direct comparison based on calculated squared crossvalidation correlation coefficients q^{2} is problematic here due to insufficient numbers of descriptors available for type B and C fragments. Therefore, to evaluate their relative importance in the development of predictive QSMR models, we compared the average sensitivity coefficients for descriptors belonging to each particular category. For the entire 24descriptor set (Table 2) selected by the sensitivity analysis, we conducted a LOO crossvalidation experiment and determined the sensitivity coefficients for each descriptor. Then, we calculated the average sensitivity coefficients for each descriptor category based on these 31 runs (Fig. 9). It appears that the wholemolecular descriptors and topological centroids have the largest average SCs (0.80 and 0.83, correspondingly), followed by the leaving fragments (0.52). Such estimations, though based on indirect data, give a valuable insight into the nature of factors affecting the rates of metabolic Ndealkylation reactions.
QSMR Models for CYP2D6 NDealkylation Set. From the extensive QSMR studies described above, we have determined a set of molecular features responsible for Ndealkylation rates. It can be assumed that this set of molecular features should be able to model the rates of Ndealkylation reactions mediated by other P450s different from CYP3A4. In addition, from a practical point of view, the development of an automated program for prediction of metabolic reaction rates for different enzymes would ideally require the application of a unified set of descriptors for metabolic reactions belonging to one particular chemical type. The CYP2D6 enzyme represents another important member of the P450 superfamily responsible for hepatic metabolism of ca. 30% of drugs (Yan et al., 2004). For QSMR modeling of CYP2D6mediated Ndealkylation reactions (Table 1), we used the same 12descriptor set as generated and used for CYP3A4 Ndealkylation (Fig. 8).
For this CYP2D6 training set, we performed a LOO procedure, which generated 36 QSMR models. Figure 10 shows the crossvalidated versus observed log V_{max} values for this model. This plot demonstrates good prediction quality with good q^{2} and r^{2} values (0.79 and 0.80, correspondingly). The general conclusion that emerges from this experiment is that for the CYP2D6 Ndealkylation reaction set, the developed QSMR models based on the same 12 molecular descriptors as for CYP3A4 Ndealkylation provide reasonable generalization accuracy and predictive power.
Test Sets for CYP3A4 and CYP2D6. There are several ways to evaluate the predictive ability of a computational model; leaving groups out and scrambling the descriptors with the biological activity are perhaps the most widely used. The most valuable test is an external set of molecules that have been excluded from the modelbuilding process (Ekins, 2003). In this study, nine CYP3A4mediated and five CYP2D6mediated Ndealkylation reactions with known V_{max} values were collected from the literature and used to test the respective models. A comparison of the calculated and experimental data for the test set reactions (Table 4) demonstrates a good predictive power of the developed models with R^{2} values equal to 0.90 and 0.94 for CYP3A4 and CYP2D6, respectively.
Discussion
In this study, we have described a neural network QSMR analysis of metabolic Ndealkylation reaction rates for two major P450s, CYP3A4 and CYP2D6. This work is a continuation of numerous studies in the field of development of computational models for these P450s (Smith et al., 1997a,b; Lewis et al., 1998; de Groot et al., 1999. Korolev et al., 2003. Balakin et al., 2004; Szklarz et al., 2000; Ekins et al., 2001, 2003; de Groot and Ekins, 2002). To be an effective P450 substrate, molecules should possess a definite avidity to the active sites of P450 enzymes. Upon binding, the molecule can interact either with the heme prosthetic group or with the other regions of the active site. The intermolecular interactions involving polypeptide chains, such as hydrophobic and electrostatic interactions, van der Waals forces, and Hbond formation, are important for binding. The specific microenvironment of the active site of a particular P450 isoform determines the molecular features that a molecule should possess to effectively bind to that site. One of the main conclusions of this study is the requirement for a rigorous appraisal of the properties of a molecule as a whole (structural type A), rather than just relying on knowledge of isolated fragments and functional groups metabolized. In addition, our data indicate the significance of considering properties of topological centroids (type B), which encode the steric hindrance of the reaction site. Based on our calculations, the nature of leaving fragments (type C) is less important for CYP3A4 and is perhaps due to the proposed large volume and complexity of the active site(s) (Ekins et al., 2003). Nevertheless, the complete removal of descriptors belonging to this fragment type leads to a definite reduction of the predictive ability (Fig. 9). As shown by an extensive statistical experiment with diverse architectures of the neural network, the modular neural network with 12 input neurons and 2 hidden layers with 4 processing elements is an appropriate choice for the cases studied. Robust quantitative dependencies were found in these models, which incorporate higher order terms in the QSMR equation.
In summary, we have demonstrated the feasibility of constructing QSMR models for predicting the approximate human P450mediated Ndealkylation rates of prospective new medicinal agents. The models, developed from the available human metabolism data for CYP3A4 and CYP2D6, performed well, and good regression statistics were achieved, despite the inherent complexity of the systems involved. Using an external test set of molecules not included in the models for both enzymes, we were able to show good correlations (R^{2} values equal to 0.90 and 0.94 for CYP3A4 and CYP2D6, respectively) between the experimental and predicted V_{max} values. These neural network models can be readily used for scoring druglike compounds in drug discovery projects, assuming their molecular descriptors are within the range of these current models.
The limitations of the developed models are related to the experimental measurement of metabolismrelated parameters, which are inherently prone to errors. For instance, kinetic constants for the same compound vary substantially between studies, depending on the enzyme's source (recombinant P450s, human liver microsomes). In some cases, the reported V_{max} values for the same compound can vary by 2 to 3 orders of magnitude, which can seriously impact regressionbased QSMR modeling. We obviously cannot expect that the QSMR models based on such small training sets as described in these studies will be predictive for all available compounds that could be used for future testing. The refinement of the models is certainly possible with the availability and incorporation of more compound metabolism data. Future work in expanding the models is underway, alongside the investigation of other algorithms and descriptors for building P450 QSMR models. The latter is important since, presently, descriptors calculated for each enantiomer will be identical; hence pharmacophoretype approaches may be useful for differentiation between each isomer (Ekins et al., 2001).
The overall methodology described here can be extended to the analysis of other types of metabolic transformations (such as Odealkylation) and includes the following steps: 1) dissection of a substrate molecule into the topological centroid of the reaction site and the leaving fragment, 2) calculation of a specific set of descriptors for each element of the substrate's organization, followed by 3) neural network computations. This work illustrates an approach to mining the human P450 metabolism knowledge space using the information from a comprehensive commercially available data base. Further accumulation of experimental data and computational models in this area will pave the way for the development of an integrated and automated system for the prediction of metabolic and toxic effects of organic compounds in humans (Korolev et al., 2003; Balakin et al., 2004; Bugrim et al., 2004). Such an approach incorporates xenobiotic and endobiotic metabolic pathway data bases (along with the routes for regulation of these pathways) together with methods for their reconstruction, representing the application of systems biology (Ekins et al., 2002, 2004). The value of such a software system, incorporating P450 models like those described in this study, will be in the prioritization and selection of molecules for purchase or synthesis in drug discovery, alongside the calculation of other predicted physicochemical properties.
Footnotes

This work is supported by National Institutes of Health Grant 1R43GM06912401 “In Silico Assessment of Drug Metabolism and Toxicity”.

doi:10.1124/dmd.104.000364.

ABBREVIATIONS: QSMR, quantitative structuremetabolism relationship; P450, cytochrome P450; NLM, nonlinear mapping; LOO, leaveoneout; SC, sensitivity coefficient.
 Received April 21, 2004.
 Accepted July 12, 2004.
 The American Society for Pharmacology and Experimental Therapeutics