Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology

J Chem Inf Comput Sci. 2000 May;40(3):773-7. doi: 10.1021/ci9901338.

Abstract

An accurate and generally applicable method for estimating the aqueous solubility of organic compounds was developed using multilinear regression and artificial neural network modeling on a diverse set of 1297 compounds. Molecular connectivity, shape, and atom-type electrotopological state (E-state) indices were used as structural parameters. The data set was divided into a training set of 884 compounds and a randomly chosen test set of 413 compounds. The 30 structural parameters fed to a 30-12-1 artificial neural network comprised 24 atom-type E-state indices and six other topological indices; on the test set, the network achieved a predictive r² = 0.92 and s = 0.60. With the same parameters, multilinear regression gave r² = 0.88 and s = 0.71.
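The workflow summarized in the abstract (random hold-out split, a 30-12-1 network, and test-set statistics) can be illustrated with a minimal sketch. It is not the authors' implementation. All function names are hypothetical. The sigmoid hidden layer and linear output are assumptions. So are the two statistics as written here: r² as the squared Pearson correlation and s as the root-mean-square residual. The abstract does not give the exact definitions used.

```python
import math
import random

# Figures taken from the abstract.
N_TOTAL, N_TRAIN, N_TEST = 1297, 884, 413
N_IN, N_HIDDEN, N_OUT = 30, 12, 1  # the "30-12-1" architecture

def split_indices(n_total, n_test, seed=0):
    """Randomly hold out n_test indices; the remainder is the training set."""
    rng = random.Random(seed)
    idx = list(range(n_total))
    rng.shuffle(idx)
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def n_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected net with one hidden layer."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: sigmoid hidden units, linear output (assumed activations)."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of r^2)."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

def residual_sd(observed, predicted):
    """Root-mean-square residual (assumed stand-in for s)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

train_idx, test_idx = split_indices(N_TOTAL, N_TEST)
```

Under these assumptions, `n_parameters(30, 12, 1)` counts 385 adjustable weights and biases, which can be compared with the 884 training compounds when judging how heavily parameterized such a network is.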