Pharmacokinetic and toxicological problems were once the main cause of late-stage attrition of drug candidates. To address them, many pharmaceutical companies now run DMPK (drug metabolism and pharmacokinetics) and toxicological studies early in drug development. Such approaches, however, are difficult to replicate in academic drug discovery. We therefore launched the initiative “Development of a Drug Discovery Informatics System” in collaboration with several other research groups. Its main aim is to develop more accurate prediction systems for DMPK and toxicological properties, primarily for academic scientists. Our group focuses on developing a pharmacokinetics database and prediction models.

Any good prediction system depends on large, high-quality training datasets. We collected pharmacokinetic and physicochemical parameters from ChEMBL, a public bioactivity database. Because ChEMBL compiles data measured under different experimental conditions, we developed a curation workflow that selects data measured under compatible conditions and reformats the results for use in our prediction system.
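The curation step can be illustrated with a minimal sketch for one parameter, the unbound fraction in plasma. It assumes a simplified record type loosely modelled on ChEMBL activity fields (`standard_type`, `standard_value`, `standard_units`); the `species` field, the type codes "PPB" and "Fu", and the function names are hypothetical illustrations, not the actual workflow. Records from another species or with units that cannot be mapped are dropped, percent-bound values are converted to fraction unbound, and repeated measurements are summarised by the median.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Activity:
    """Hypothetical, simplified activity record (loosely modelled on ChEMBL fields)."""
    molecule_id: str
    standard_type: str              # assumed codes: "PPB" (% bound) or "Fu" (unbound)
    standard_value: float
    standard_units: Optional[str]   # "%" or None for a plain fraction
    species: str                    # assumed field; not a direct ChEMBL activity column


def to_fraction_unbound(rec):
    """Map one record to Fu,p in [0, 1], or return None if it cannot be mapped."""
    if rec.standard_type == "PPB" and rec.standard_units == "%":
        return 1.0 - rec.standard_value / 100.0    # % bound -> fraction unbound
    if rec.standard_type == "Fu" and rec.standard_units == "%":
        return rec.standard_value / 100.0          # % unbound -> fraction unbound
    if rec.standard_type == "Fu" and not rec.standard_units:
        return rec.standard_value                  # already a fraction
    return None                                    # incompatible endpoint or unit


def curate_fu_plasma(records, species="Human"):
    """Keep compatible records for one species; return the median Fu,p per compound."""
    values = {}
    for rec in records:
        if rec.species != species:
            continue                               # different experimental species
        fu = to_fraction_unbound(rec)
        if fu is None or not 0.0 <= fu <= 1.0:
            continue                               # unmappable or implausible value
        values.setdefault(rec.molecule_id, []).append(fu)
    return {mol: median(v) for mol, v in values.items()}
```

For example, two human records for the same compound reported as 95 % bound and Fu = 0.07 would be merged into a single curated value of 0.06, while a rat record or a value reported in nM would be excluded.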

In addition to the public data, we have acquired in vitro and in vivo experimental data under unified protocols. The in vitro data include physicochemical parameters, such as solubility and distribution coefficient, and pharmacokinetic parameters, such as metabolic stability, protein binding in plasma, protein binding in brain homogenate, and blood-to-plasma concentration ratio. We also collected the efflux ratio of P-glycoprotein (P-gp), a major efflux transporter in the gut and brain. The in vivo data include drug concentrations in plasma and tissues after oral or intravenous administration, and the pharmacokinetic parameters calculated from them.
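Two of the in vitro parameters are derived from raw assay readouts rather than measured directly. A minimal sketch, assuming a bidirectional transport assay run in P-gp-expressing cells and in the corresponding parental cells, and a substrate-depletion assay in liver microsomes at an assumed protein concentration of 0.5 mg/mL, could look like this. The net efflux ratio follows its common definition (efflux ratio in transporter-expressing cells divided by that in parental cells); the function names are hypothetical, and the unified protocols used for the database may differ in detail.

```python
import math


def efflux_ratio(papp_b_to_a, papp_a_to_b):
    """Efflux ratio ER = Papp(B->A) / Papp(A->B) measured in one cell line."""
    return papp_b_to_a / papp_a_to_b


def net_efflux_ratio(er_transfected, er_parental):
    """NER: ER in P-gp-expressing cells divided by ER in the parental cells."""
    return er_transfected / er_parental


def clint_microsomal(times_min, pct_remaining, protein_mg_per_ml=0.5):
    """In vitro CLint (uL/min/mg protein) from first-order substrate depletion.

    Fits ln(% remaining) against time by least squares; the negative slope is
    the depletion rate constant k (1/min). CLint = k * 1000 / [protein].
    """
    ys = [math.log(p) for p in pct_remaining]
    mt = sum(times_min) / len(times_min)
    my = sum(ys) / len(ys)
    slope = (sum((t - mt) * (y - my) for t, y in zip(times_min, ys))
             / sum((t - mt) ** 2 for t in times_min))
    k_dep = -slope                               # depletion rate constant (1/min)
    return k_dep * 1000.0 / protein_mg_per_ml    # 1000 uL per mL
```

With Papp(B->A) = 20 and Papp(A->B) = 2 (same units) in transfected cells, ER is 10; if the parental cells give ER = 2, the NER is 5. A compound depleted with k = 0.05 /min at 0.5 mg/mL protein gives CLint = 100 uL/min/mg.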

We are currently developing several prediction models from these data and intend to release them sequentially.

The current version contains the following compounds and activity data.

Number of records
All registered compounds 30,391
Free-base compounds 27,100
Free-base compounds with distinct connectivity 25,277
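One common way to count compounds that differ in connectivity, assuming that this is what the last row above counts, is to compare the first block of the standard InChIKey, which hashes only the connectivity layer; stereoisomers that share a skeleton then collapse into one entry. The sketch below assumes InChIKeys of the free-base forms are already available; the function name and the placeholder keys in the example are hypothetical.

```python
from collections import defaultdict


def group_by_connectivity(inchikeys):
    """Group standard InChIKeys by their first (connectivity) block.

    A standard InChIKey has the form XXXXXXXXXXXXXX-YYYYYYYYSA-P: the first
    14-character block encodes the molecular skeleton, while the second block
    carries stereochemistry and isotope information.
    """
    groups = defaultdict(list)
    for key in inchikeys:
        skeleton = key.split("-")[0]
        groups[skeleton].append(key)
    return dict(groups)
```

With three placeholder keys, two of which share the first block, `len(group_by_connectivity(keys))` is 2, i.e. two distinct connectivities among three records.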
Parameter name Type Species In-house data (current) In-house data (to be released) Curated public data (current) Curated public data (to be released) Predicted data (current)
Physicochemical parameters
Solubility (pH 7.4) Sol7.4 20 165 367 17,886
Solubility (pH 1.2) Sol1.2 20 165
Distribution coefficient (pH 7.4) logD7.4 20 120
In vitro parameters
Unbound fraction in plasma Fu,p Human 20 459 2,319 17,886
Rat 20 459
Unbound fraction in brain homogenate Fu,brain Rat 20 459
Mammal 253 17,886
Blood-to-plasma concentration ratio Rb Human
Rat 20 165
Permeability coefficient (LLC-PK1) Papp Human 468
Permeability coefficient (Caco-2) Papp Human 4,408 17,886
P-gp net efflux ratio NER Human 468
Metabolic stability in liver microsomes CLint Human 20 163 5,275
Rat 20 163
In vivo parameters
Drug concentration in plasma (p.o., i.v.) C Rat 20 100
Drug distribution in tissues (brain, CSF, heart, kidney, liver, lung, muscle, plasma) C Rat 20 100
Initial drug concentration in plasma C0 Rat 20 39
Maximum drug concentration Cmax Rat 20 96
Elimination half-life of a drug T1/2 Rat 20 100
Time to reach maximum drug concentration Tmax Rat 20 96
Area under the drug concentration-time curve AUC Rat 20 100
Mean residence time of a drug MRT Rat 20 100
Tissue-to-plasma concentration ratio Kp Rat 20 100
Apparent volume of distribution Vd Rat 20 39
Apparent volume of distribution after oral administration Vd/F Rat 20 96
Clearance CL Rat 20 39
Oral clearance CL/F Rat 20 96
Hepatic clearance CLh Human
Renal clearance CLr Human 401 17,886
Renal clearance ratio CR Human 17,886
Fraction absorbed Fa Human 945 17,886
Fraction excreted unchanged in urine Fe Human 343 17,886
Bioavailability F Rat 20 35
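The in vivo parameters listed above (C0, Cmax, Tmax, T1/2, AUC, MRT, CL, Vd and F) are typically derived from concentration-time profiles by non-compartmental analysis. The sketch below is one such calculation under stated assumptions: linear trapezoidal AUC, a log-linear fit of the last three samples for the terminal rate constant, log-linear back-extrapolation of C0 from the first two samples for i.v. data, and Vd taken as Vz = CL/lambda_z. These choices and the function names are hypothetical illustrations; the protocol actually used for the database may differ, and inputs must be in consistent units because no conversion is performed.

```python
import math


def _loglinear_slope(ts, cs):
    """Least-squares slope of ln(C) against t."""
    ys = [math.log(c) for c in cs]
    mt = sum(ts) / len(ts)
    my = sum(ys) / len(ys)
    num = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    den = sum((t - mt) ** 2 for t in ts)
    return num / den


def nca(times_h, conc, dose, route="iv", n_terminal=3):
    """Non-compartmental parameters from one plasma profile (sketch)."""
    t, c = list(times_h), list(conc)
    cmax = max(c)                                  # observed maximum
    tmax = t[c.index(cmax)]
    c0 = None
    if route == "iv":
        # back-extrapolate C0 from the first two samples (log-linear)
        k_initial = -_loglinear_slope(t[:2], c[:2])
        c0 = c[0] * math.exp(k_initial * t[0])
        if t[0] > 0:
            t, c = [0.0] + t, [c0] + c
    elif t[0] > 0:
        t, c = [0.0] + t, [0.0] + c                # oral: zero concentration at dosing
    pairs = list(zip(t, t[1:], c, c[1:]))
    auc_t = sum((t2 - t1) * (c1 + c2) / 2 for t1, t2, c1, c2 in pairs)
    aumc_t = sum((t2 - t1) * (t1 * c1 + t2 * c2) / 2 for t1, t2, c1, c2 in pairs)
    lambda_z = -_loglinear_slope(t[-n_terminal:], c[-n_terminal:])
    t_last, c_last = t[-1], c[-1]
    auc_inf = auc_t + c_last / lambda_z            # extrapolate to infinity
    aumc_inf = aumc_t + c_last * t_last / lambda_z + c_last / lambda_z ** 2
    cl = dose / auc_inf                            # CL, or CL/F for oral data
    return {
        "C0": c0, "Cmax": cmax, "Tmax": tmax,
        "T1/2": math.log(2) / lambda_z,
        "AUC": auc_inf, "MRT": aumc_inf / auc_inf,
        "CL": cl,
        "Vd": cl / lambda_z,                       # Vz, or Vz/F for oral data
    }


def bioavailability(auc_po, dose_po, auc_iv, dose_iv):
    """Absolute bioavailability F from dose-normalised AUCs."""
    return (auc_po / dose_po) / (auc_iv / dose_iv)
```

For an oral profile, calling `nca(..., route="po")` returns values that correspond to CL/F and Vd/F, and F can then be obtained with `bioavailability()` from matching i.v. data. In practice the number of terminal points is usually chosen per profile rather than fixed at three.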
Parameter name Type Current
Toxicity data
IC50 for hERG channel IC50 9,114
IC50 for Cav1.2 channel IC50 204
IC50 for Kv1.5 channel IC50 686
IC50 for Nav1.5 channel IC50 1,321
Link to Hepatotoxicity database 606