EthanT-E/Crystalization-prediction-models

Description

$\Delta G_c = -\Delta H_m ( 1 - \frac{T_c}{T_m})$

Based on the work of Rand and co-workers [1], this repository uses a gradient boosted decision tree to predict the enthalpy of melting ($\Delta H_m$) and then approximates the driving force of crystallisation ($\Delta G_c$) of Fused Ring Electron Acceptors (FREAs) according to the equation above, where $\frac{T_c}{T_m}$ is approximated as 0.785.
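As a quick illustration of the equation above, the following minimal sketch converts a $\Delta H_m$ value into $\Delta G_c$ using the fixed ratio of 0.785. The function name `driving_force` and the example value are assumptions made for illustration and are not taken from the repository code.

```python
# Minimal sketch of the driving-force equation; names here are hypothetical,
# not the repository's own implementation.
T_RATIO = 0.785  # T_c / T_m approximation stated in the description


def driving_force(delta_h_m: float, t_ratio: float = T_RATIO) -> float:
    """Return Delta G_c in the same units as delta_h_m (kJ mol-1 in this repo)."""
    return -delta_h_m * (1.0 - t_ratio)


example_dhm = 40.0  # hypothetical enthalpy of melting in kJ mol-1
print(f"dG_c = {driving_force(example_dhm):.2f} kJ mol-1")
```

With the ratio fixed, $\Delta G_c$ is simply $-0.215\,\Delta H_m$, so a larger predicted enthalpy of melting gives a more negative driving force.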

Requirements

  • OS: Ubuntu 24.04.3 (on Windows, use WSL)
  • Conda: version 26.1.1
  • Python: 3.14, plus the packages listed in the requirements.yml file. To create and activate the environment, run:
  1. conda env create -f requirements.yml
  2. conda activate cryst_pred

How to use

This software is intended to be used from the terminal, and all commands should be run from within the Crystalization-prediction-models directory.

Formatting the input set and CAS file

The input set must be a CSV file whose first column is SMILES and whose second column is NAME, with each compound's SMILES string and name separated by a comma. For an example, see data/input_set.csv.

Optionally, the CAS numbers of the compounds can also be supplied. This is done in a separate file with a single column called CAS. The compounds must appear in the same order in the input file and the CAS file.
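The two files described above could be generated with a short script such as the sketch below. It assumes that both files start with a header row (SMILES,NAME and CAS respectively); check data/input_set.csv to confirm the exact layout. The compounds are simple placeholders rather than FREAs, and the file names and the helper `write_input_files` are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Placeholder compounds (SMILES, name, CAS number); replace with your own set.
compounds = [
    ("c1ccccc1", "benzene", "71-43-2"),
    ("c1ccc2ccccc2c1", "naphthalene", "91-20-3"),
]


def write_input_files(rows, input_path, cas_path):
    """Write the input set and the CAS file from one list, keeping the order identical."""
    with open(input_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["SMILES", "NAME"])  # assumed header row
        writer.writerows((smiles, name) for smiles, name, _ in rows)
    with open(cas_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["CAS"])  # column name required for the CAS file
        writer.writerows([cas] for _, _, cas in rows)


# Written to a temporary directory here; use a path under data/ in practice.
out_dir = Path(tempfile.mkdtemp())
input_path = out_dir / "my_input_set.csv"
cas_path = out_dir / "my_input_set_CAS.csv"
write_input_files(compounds, input_path, cas_path)
print(input_path.read_text())
```

Writing both files from a single list is one way to guarantee that the rows stay aligned, since the CAS file carries no names to match against.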

Using the premade model

To use the premade model, create an input file of the compounds you want to test (as described above), then run bash run.sh. The terminal will prompt for the path to the input set, e.g. data/input_set.csv, and offer the option to add CAS numbers, e.g. data/input_set_CAS.csv. The results are written to the output/results.csv file for inspection.

Making a model

To make a new model with different data, use the Model_maker.py file.

Changing the training data

To change the training data, edit the contents of the data/dHm_dataset.csv file without changing the column names or the file name. Remember that the units are kJ mol$^{-1}$. To change the descriptors used in the model, edit the descriptor names on the same line.

Making the new model

With the new data or changed descriptors in place, the next step is to optimize the hyperparameters. This can be done by running the optimize.py file: python ./Code/optimize.py <NUMBER OF TRIALS>. The values of the best hyperparameters are then displayed and need to be entered manually into the Model_maker.py file in the area below:

image

From there, run python ./Code/Model_maker.py and type yes when asked whether to download the model. Optionally, you can plot the residuals by running python ./Code/Model_maker.py 1

To run the new model, follow the same steps as in the "Using the premade model" section.

References

  1. H. M. Johnson, F. Gusev, J. T. Dull, Y. Seo, R. D. Priestley, O. Isayev and B. P. Rand, J. Am. Chem. Soc., 2024, 146, 21583-21590.
