Drug/CD complexation

Current research has demonstrated that machine learning can be used to build highly accurate predictive models for drug/cyclodextrin (CD) systems. Molecular descriptors of the compounds and the experimental conditions were used as inputs, and the complexation free energy as the output. The light gradient boosting machine (LightGBM) gave significantly better predictive performance than random forest and deep learning, with a mean absolute error of 1.38 kJ/mol and a squared correlation coefficient of 0.86. The resulting models can quickly and accurately predict the solubilizing capacity of CD systems.
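To give a feel for what an error of 1.38 kJ/mol means in practice, the sketch below converts a complexation free energy into a binding constant with the standard thermodynamic relation dG = -RT ln K. It assumes that the predicted quantity is a standard binding free energy and that the temperature is 298.15 K; the example value of -15 kJ/mol is hypothetical and is not taken from the dataset.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def binding_constant(delta_g_kj_mol, temperature_k=298.15):
    """Binding constant K from a free energy via dG = -RT ln K."""
    return math.exp(-delta_g_kj_mol / (R_KJ * temperature_k))

def error_factor(error_kj_mol, temperature_k=298.15):
    """Multiplicative uncertainty in K implied by an error in dG."""
    return math.exp(error_kj_mol / (R_KJ * temperature_k))

# Hypothetical predicted free energy, for illustration only.
example_dg = -15.0
k_example = binding_constant(example_dg)

# 1.38 kJ/mol is the mean absolute error reported for the model.
factor = error_factor(1.38)

print(f"K = {k_example:.0f}, error factor in K = {factor:.2f}")
```

Under these assumptions, a free-energy error of this size corresponds to an uncertainty of less than a factor of two in the binding constant.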

Figure 1: Statistics of the dataset.
Figure 2: Performance of the model.
Table 1: Parameters and statistics.
Item                      Value
Training set size         2318
Validation set size       339
Test set size             335
Training set R2/RMSE      0.86/1.94
Validation set R2/RMSE    0.87/1.63
Test set R2/RMSE          0.86/1.83
Related publication

Zhao Q, Ye Z, Su Y, et al. Predicting complexation performance between cyclodextrins and guest molecules by integrated machine learning and molecular modeling techniques[J]. Acta Pharmaceutica Sinica B, 2019, 9(6): 1241-1252.
Solid Dispersion Stability

Amorphous solid dispersion (SD) is an effective solubilization technique for water-insoluble drugs. However, the physical stability of solid dispersions still heavily hinders the development of this technique. Traditional stability testing takes at least three to six months, which is time-consuming and unpredictable. In this research, a prediction model for the physical stability of solid dispersion formulations was developed with machine learning techniques. A total of 646 stability data points were collected and described by more than 20 molecular descriptors. The data were split into a training set (60%), a validation set (20%), and a test set (20%) by the improved maximum dissimilarity algorithm (MD-FIS). Eight machine learning approaches were compared, and the random forest (RF) model achieved the best prediction accuracy.
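The idea behind a dissimilarity-based split can be sketched as follows. This is a plain greedy maximum-dissimilarity (MaxMin) selection, not the improved MD-FIS algorithm used in the study. The Euclidean distance, the alternating assignment of picked samples to the test and validation sets, and the random descriptor vectors are all assumptions made for illustration.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def max_dissimilarity_pick(points, n_pick, start=0):
    """Greedy MaxMin: repeatedly pick the point farthest from all chosen points."""
    chosen = [start]
    # Distance from every point to its nearest chosen point.
    min_dist = [euclidean(p, points[start]) for p in points]
    min_dist[start] = -1.0  # mark as already chosen
    while len(chosen) < n_pick:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], euclidean(p, points[nxt]))
        min_dist[nxt] = -1.0
    return chosen

def split_60_20_20(points):
    """Hold out 20% + 20% of the data by dissimilarity; the rest is training data."""
    n = len(points)
    n_holdout = 2 * round(0.2 * n)
    picked = max_dissimilarity_pick(points, n_holdout)
    test_idx = picked[0::2]  # alternate picks so both subsets span the space
    val_idx = picked[1::2]
    held_out = set(picked)
    train_idx = [i for i in range(n) if i not in held_out]
    return train_idx, val_idx, test_idx

# Hypothetical descriptor vectors (50 samples, 3 descriptors each).
rng = random.Random(0)
points = [[rng.random() for _ in range(3)] for _ in range(50)]
train_idx, val_idx, test_idx = split_60_20_20(points)
print(len(train_idx), len(val_idx), len(test_idx))
```

Selecting the held-out samples by dissimilarity rather than at random is meant to make the validation and test sets cover the descriptor space more evenly.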

Figure 1: Statistics of the dataset.
Table 1: Parameters and statistics.
Item                  3 months   6 months
Training set size     407        407
Validation set size   121        121
Test set size         121        121
Training set ACC      0.951      0.941
Validation set ACC    0.833      0.817
Test set ACC          0.9        0.842
Related publication

Han R, Xiong H, Ye Z, et al. Predicting physical stability of solid dispersions by machine learning techniques[J]. Journal of Controlled Release, 2019, 311: 16-25.
Phospholipid Complex Combination

The drug-phospholipid complexation technique is an increasingly common method for improving the bioavailability of drugs. This project aims to predict the complexation rate of phospholipid complexes by machine learning. A total of 363 drug/phospholipid complex formulations were collected from the literature. Models were trained with several popular machine learning methods: random forest, SVM, KNN, naive Bayes, decision tree, XGBoost, and LightGBM. Of these, random forest performed best, with an accuracy/AUC of 0.772/0.824 on the training set (5-fold cross-validation) and 0.808/0.902 on the test set.
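For readers less familiar with the two metrics, the following minimal sketch shows how accuracy and AUC can be computed from predicted class probabilities. The labels and scores are made up for illustration, and the 0.5 decision threshold is an assumption rather than a setting taken from the study.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)

def roc_auc(y_true, y_prob):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(
        (1.0 if p > n else 0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = complex formed) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

acc = accuracy(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(f"ACC = {acc:.3f}, AUC = {auc:.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the scores rank positives above negatives across all thresholds, which is why the two are usually reported together.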

Figure 1: Statistics of the phospholipids in the dataset.
Table 1: Parameters and statistics.
Item                   Combination
Training set size      290
Test set size          73
Training set ACC/AUC   0.772/0.824
Test set ACC/AUC       0.808/0.902
Related publication

Gao H, Ye Z, Dong J, et al. Predicting drug/phospholipid complexation by the lightGBM method[J]. Chemical Physics Letters, 2020, 747: 137354.
Properties of drug nanocrystals

Nanocrystals have shown great advantages in solving the dissolution problems of water-insoluble drugs. In this study, machine learning models were built to predict nanocrystal particle size and polydispersity index (PDI) for three preparation methods. Random forest (RF) showed good predictive performance for ball wet milling (BWM), high-pressure homogenization (HPH), and antisolvent precipitation (ASP). By submitting a molecule, users can obtain the predicted size and PDI for all three methods.

Figure 1: Statistics of the antisolvent dataset.
Figure 2: Statistics of the HPH dataset.
Figure 3: Statistics of the BWM dataset.
Table 1: Statistics of the models.
Dataset       Size (Qcv2/Rt2)   PDI (Qcv2/Rt2)
Antisolvent   0.434/0.419       0.593/0.518
HPH           0.585/0.554       0.786/0.690
BWM           0.699/0.786       0.837/0.796
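The sketch below illustrates one common way to obtain the two statistics in the table, assuming that Qcv2 denotes the coefficient of determination computed from cross-validated (out-of-fold) predictions and Rt2 the coefficient of determination on a held-out test set. A straight-line fit stands in for the random forest model, and the data are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def q2_cv(xs, ys, k=5):
    """Q2 from the out-of-fold predictions of a k-fold cross-validation."""
    n = len(xs)
    oof = [0.0] * n
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved folds
        tr_x = [xs[i] for i in range(n) if i not in test_idx]
        tr_y = [ys[i] for i in range(n) if i not in test_idx]
        a, b = fit_line(tr_x, tr_y)
        for i in test_idx:
            oof[i] = a * xs[i] + b
    return r_squared(ys, oof)

# Synthetic data: a noisy line, for illustration only.
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + ((i * 7) % 5 - 2) * 0.5 for i, x in enumerate(xs)]
train = [i for i in range(30) if i % 6 != 5]
test = [i for i in range(30) if i % 6 == 5]
tr_x, tr_y = [xs[i] for i in train], [ys[i] for i in train]
te_x, te_y = [xs[i] for i in test], [ys[i] for i in test]

q2 = q2_cv(tr_x, tr_y, k=5)
a, b = fit_line(tr_x, tr_y)
rt2 = r_squared(te_y, [a * x + b for x in te_x])
print(f"Qcv2 = {q2:.3f}, Rt2 = {rt2:.3f}")
```

Reporting both values shows whether performance estimated inside the training data carries over to samples the model has never seen.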
Related publication

He Y, Ye Z, Liu X, et al. Can machine learning predict drug nanocrystals?[J]. Journal of Controlled Release, 2020, 322: 274-285.
Properties of Self-Emulsifying Drug Delivery Systems

Self-emulsifying drug delivery systems (SEDDS) are thermodynamically stable nanoformulations consisting of oil, surfactant, co-surfactant, and active pharmaceutical ingredients (APIs). For oral administration, the drug is dissolved in the SEDDS rather than present in the solid state, which benefits absorption in the gastrointestinal (GI) tract. Here, a model for predicting pseudo-ternary phase diagrams was built with the random forest algorithm; it predicts the self-emulsification status of a formulation under specified conditions.
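A predicted pseudo-ternary phase diagram can be assembled by evaluating a classifier on a grid of compositions. The sketch below only generates such a grid and maps it to plotting coordinates. It assumes the usual three axes of oil, surfactant/co-surfactant mixture (Smix), and water, and a 10% grid step; the classifier itself is not shown, and the function names are hypothetical.

```python
import math

def ternary_grid(step=10):
    """All (oil, smix, water) percentages on a regular grid that sum to 100."""
    points = []
    for oil in range(0, 101, step):
        for smix in range(0, 101 - oil, step):
            points.append((oil, smix, 100 - oil - smix))
    return points

def to_cartesian(oil, smix, water):
    """Map a composition to 2D coordinates inside an equilateral triangle."""
    total = oil + smix + water
    x = (smix + 0.5 * water) / total
    y = (math.sqrt(3) / 2) * water / total
    return x, y

grid = ternary_grid(step=10)
coords = [to_cartesian(*point) for point in grid]
print(len(grid), "composition points")
```

Each grid point would then be passed, together with the component descriptors, to the trained model, and the predicted self-emulsification status would be drawn at the corresponding coordinates.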

Figure 1: Statistics of the oil types in the dataset.
Figure 2: Statistics of the surfactant types in the dataset.
Figure 3: Statistics of the co-surfactant types in the dataset.
Table 1: Statistics of the models.
Metric   5-fold CV   Test
ACC      0.925       0.926
AUC      0.977       0.978
SE       0.922       0.927
SP       0.927       0.926
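The sketch below shows how ACC, SE, and SP can be derived from a confusion matrix, assuming that SE and SP in the table stand for sensitivity and specificity. The labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),  # overall accuracy
        "SE": tp / (tp + fn),                    # sensitivity: recall of positives
        "SP": tn / (tn + fp),                    # specificity: recall of negatives
    }

# Hypothetical labels (1 = self-emulsifying) and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Similar sensitivity and specificity values, as in the table, indicate that the model is not favoring one class over the other.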
Related publication

Gao H, Jia H, Dong J, et al. Integrated in silico formulation design of self-emulsifying drug delivery systems[J]. Acta Pharmaceutica Sinica B, 2021, 11(11): 3585-3594 (JCRC Q1, IF: 14.90).
Properties of drug liposomes

A liposome is a spherical vesicle consisting of a lipid bilayer surrounding an internal hydrophilic cavity, with particle sizes ranging from 30 nm to several micrometers. Owing to their good biocompatibility, biodegradability, and low toxicity, liposomes are regarded as a promising drug delivery system, not only for drug molecules but also for genes and biomacromolecules. Lipid composition, particle size, surface charge, and lamellarity determine the structure and morphology of liposome particles. Various preparation methods have been applied to liposome products. As liposome technology has developed, a number of liposome-based drug formulations have reached the market or entered clinical trials. In current research, size, PDI, zeta potential, and encapsulation are regarded as key parameters for formulation evaluation. Here, we provide a prediction platform for these key parameters.

Table 1: Statistics of the models.
Property         Qcv2/Rt2      RMSEcv/RMSEt      MAEcv/MAEt
Zeta potential   0.571/0.644   17.528/12.559     11.086/8.299
PDI              0.408/0.604   0.081/0.055       0.059/0.042
Size             0.707/0.823   145.271/116.002   75.263/67.198
Encapsulation    0.764/0.724   16.185/17.926     10.853/11.333
Solubility in different solvents

Drug solubility, especially in different solvents, is a basic property for formulation design. Here, we collected a dataset of more than 4000 data points, currently the largest of its kind, and used the LightGBM and random forest algorithms to build a robust model. The tool lets users predict drug solubility in 27 solvents.

Table 1: Performance of the model.
Item     Value
Qcv2     0.922
RMSEcv   0.380
MAEcv    0.180
Rt2      0.910
RMSEt    0.395
MAEt     0.177
Figure 1: Statistics of the solvents.
Related publication

Ye Z, Ouyang D. Prediction of small-molecule compound solubility in organic solvents by machine learning algorithms[J]. Journal of Cheminformatics, 2021, 13: 98.