To address pharmacokinetic and toxicological problems in drug development, once the main cause of late-stage attrition of drug candidates, many pharmaceutical companies now run early DMPK (Drug Metabolism and Pharmacokinetics) and early toxicological studies. Such approaches, however, are difficult to replicate in an academic drug discovery environment. We therefore launched the initiative “Development of a Drug Discovery Informatics System” in collaboration with several other research groups. Its main aim is to develop more accurate prediction systems for DMPK and toxicological properties, intended primarily for academic scientists. Our group focuses on developing a pharmacokinetics database and prediction models.

Any good prediction system depends on large, high-quality training datasets. We collected pharmacokinetic and physicochemical parameters from the public bioactivity database ChEMBL. However, because ChEMBL compiles data obtained under different experimental conditions, we developed a curation workflow that selects data measured under compatible conditions and reformats the results for use in our prediction system.
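As a rough illustration of what selecting data measured under compatible conditions can involve, the sketch below filters records by pH, converts units, and aggregates replicates per compound. It is not the actual curation workflow: the `Record` fields, the unit table, the pH tolerance, and the use of the median are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, Iterable, Optional


@dataclass
class Record:
    """Hypothetical flattened measurement record (field names are assumptions)."""
    compound_id: str
    ph: Optional[float]   # pH of the assay buffer, if reported
    value: float          # reported value
    unit: str             # reported unit


# Assumed conversion factors to a common unit (ug/mL).
TO_UG_PER_ML = {"ug/mL": 1.0, "mg/L": 1.0, "mg/mL": 1000.0}


def curate(records: Iterable[Record], ph: float, ph_tol: float = 0.1) -> Dict[str, float]:
    """Keep records measured near the target pH, convert them to ug/mL,
    and return the median value per compound."""
    by_compound: Dict[str, list] = {}
    for r in records:
        # Drop records with missing or incompatible pH.
        if r.ph is None or abs(r.ph - ph) > ph_tol:
            continue
        # Drop records whose unit cannot be converted.
        factor = TO_UG_PER_ML.get(r.unit)
        if factor is None:
            continue
        by_compound.setdefault(r.compound_id, []).append(r.value * factor)
    return {cid: median(vals) for cid, vals in by_compound.items()}


if __name__ == "__main__":
    demo = [
        Record("C1", 7.4, 10.0, "ug/mL"),
        Record("C1", 7.4, 0.5, "mg/mL"),
        Record("C1", 2.0, 500.0, "ug/mL"),  # wrong pH, dropped
        Record("C2", 7.4, 5.0, "mol/L"),    # unknown unit, dropped
    ]
    print(curate(demo, ph=7.4))
```

Taking the median is only one way to reconcile replicate values; a real workflow would likely also check assay type, species, and measurement method before merging.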

In addition to the public data, we have acquired both in vitro and in vivo experimental data under unified protocols. The in vitro experiments cover physicochemical parameters such as solubility and distribution coefficient, and pharmacokinetic parameters such as metabolic stability, protein binding in plasma, protein binding in brain homogenate, and blood-to-plasma concentration ratio. We also collected efflux ratios for P-glycoprotein (P-gp), a major efflux transporter in the gut and brain. The in vivo data include drug concentrations in plasma and tissues after oral or intravenous administration, and the pharmacokinetic parameters calculated from them.
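To make concrete how pharmacokinetic parameters can be calculated from a concentration-time profile, here is a minimal non-compartmental sketch. It is an illustration only, not the method used to produce the database values: the function name, the linear trapezoidal rule, the choice of the last three samples for the terminal slope, and the use of AUC up to the last sample (without extrapolation) are all assumptions.

```python
import math
from typing import Dict, Optional, Sequence


def nca_summary(times: Sequence[float], concs: Sequence[float],
                dose: Optional[float] = None, n_terminal: int = 3) -> Dict[str, float]:
    """Non-compartmental summary of one concentration-time profile.

    times and concs must have equal length, times must be increasing,
    and the last n_terminal concentrations must be positive.
    """
    times, concs = list(times), list(concs)
    if len(times) != len(concs) or n_terminal < 2 or len(times) < n_terminal:
        raise ValueError("need matching inputs with at least n_terminal >= 2 samples")

    cmax = max(concs)
    tmax = times[concs.index(cmax)]  # time of the first occurrence of Cmax

    # Linear trapezoidal AUC and AUMC from the first to the last sample.
    auc = aumc = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        auc += dt * (concs[i] + concs[i - 1]) / 2
        aumc += dt * (times[i] * concs[i] + times[i - 1] * concs[i - 1]) / 2

    # Terminal rate constant: least-squares slope of ln(C) over the last samples.
    t = times[-n_terminal:]
    y = [math.log(c) for c in concs[-n_terminal:]]
    t_mean, y_mean = sum(t) / len(t), sum(y) / len(y)
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    lambda_z = -slope

    out = {
        "Cmax": cmax,
        "Tmax": tmax,
        "AUC": auc,
        "MRT": aumc / auc,               # based on AUC/AUMC to the last sample only
        "T1/2": math.log(2) / lambda_z,
    }
    if dose is not None:
        out["CL/F"] = dose / auc         # apparent clearance, no extrapolation to infinity
    return out


if __name__ == "__main__":
    print(nca_summary([0.5, 1, 2, 3, 4], [2, 8, 4, 2, 1], dose=26))
```

For intravenous data the same dose/AUC ratio gives CL rather than CL/F; in practice AUC is usually extrapolated to infinity using the terminal rate constant, which this sketch omits for brevity.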

We are currently developing several prediction models using these data and intend to release them sequentially. Questions and comments are welcome at drumap[at]nibiohn.go.jp (please replace [at] with @).

The current version contains the following compounds and activity data.

Number of records
All registered compounds 30,628
Freebase compounds 27,237
Freebase compounds ignoring stereo structure 25,387
Physicochemical parameters
Parameter name Parameter type Species Our experimental data Curated public data Predicted data
Solubility (pH 7.4) Sol7.4 None 163 367 27,237
Solubility (pH 1.2) Sol1.2 None 163
Distribution coefficient (pH 7.4) logD7.4 None 120
In vitro parameters
Parameter name Parameter type Species Our experimental data Curated public data Predicted data
Fraction unbound in plasma Fu,p Human 441 2,319 27,237 x 2
Rat 422 27,237 x 2
Fraction unbound in brain homogenate Fu,brain Rat 443
Mammal 253 27,237
Blood-to-plasma concentration ratio Rb Human 213
Hepatic intrinsic clearance in liver microsome CLint Human 166 5,275 27,237
Rat 167
Probability of metabolism by each cytochrome P-450 (1A2, 2C9, 2D6, 3A4) CYP Human 27,237 x 4
Site of metabolism by each cytochrome P-450 (1A2, 3A4) CYP Human 2,673 + 2,814
Apical-to-basolateral apparent permeability coefficient (Caco-2) Papp Human 4,408 27,237
Apical-to-basolateral apparent permeability coefficient (LLC-PK1) Papp Human 462 27,237
P-gp net efflux ratio (LLC-PK1) NER Human 446 27,237
In vivo parameters
Parameter name Parameter type Species Our experimental data Curated public data Predicted data
Drug concentration in 7 tissues*1 or plasma C Rat 100
Initial drug concentration in plasma C0 Rat 39
Peak drug concentration in 7 tissues*1 or plasma Cmax Rat 96
Elimination half-life of a drug in 7 tissues*1 or plasma T1/2 Rat 100
Time to reach peak drug concentration in 7 tissues*1 or plasma Tmax Rat 96
Area under the drug concentration-time curve in 7 tissues*1 or plasma AUC Rat 100
Mean residence time of a drug in plasma MRT Rat 100
Volume of distribution Vd Rat 39
Apparent volume of distribution after oral administration Vd/F Rat 96
Clearance CL Rat 39
Apparent clearance after oral administration CL/F Rat 96
Renal clearance CLr Human 401 27,237
Bioavailability F Rat 35
Fraction absorbed Fa Human 945 27,237
Fraction excreted in urine Fe Human 343 27,237
Excretion type in urine CR type Human 27,237
Toxicity data
Parameter name Parameter type Species Data provided by accompanying projects
IC50 for hERG channel IC50 Human 9,114
IC50 for Cav1.2 channel IC50 Human 204
IC50 for Kv1.5 channel IC50 Human 686
IC50 for Nav1.5 channel IC50 Human 1,321
Link to Hepatotoxicity database IC50 Human and rat 620