TY  - JOUR
T1  - Using Open Source Computational Tools for Predicting Human Metabolic Stability and Additional ADME/Tox Properties
JF  - Drug Metabolism and Disposition
JO  - Drug Metab Dispos
DO  - 10.1124/dmd.110.034918
SP  - dmd.110.034918
AU  - Rishi Gupta
AU  - Eric M Gifford
AU  - Ted Liston
AU  - Chris L Waller
AU  - Moses Hohman
AU  - Barry A Bunin
AU  - Sean Ekins
Y1  - 2010/08/06
UR  - http://dmd.aspetjournals.org/content/early/2010/08/06/dmd.110.034918.abstract
N2  - Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors (e.g., the Chemistry Development Kit, CDK) and modeling algorithms, as this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of SMARTS keys had good statistics (Kappa = 0.43, sensitivity = 0.57, specificity = 0.91, positive predictive value (PPV) = 0.64), equivalent to models built with commercial MOE2D and the same set of SMARTS keys (Kappa = 0.43, sensitivity = 0.58, specificity = 0.91, PPV = 0.63). Extending the dataset to ~193,000 molecules and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score, we observed a similar Kappa statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-gp efflux data with similar model testing statistics. In summary, open source tools demonstrated comparable predictive results to commercial software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
ER  - 