Description

$\Delta G_c = -\Delta H_m ( 1 - \frac{T_c}{T_m})$

Based upon the work by Rand¹, this repository provides a gradient boosted decision tree that predicts the enthalpy of melting ($\Delta H_m$) of Fused Ring Electron Acceptors (FREAs) and then approximates their driving force of crystallisation ($\Delta G_c$) according to the equation above, where $\frac{T_c}{T_m}$ is approximated as 0.785.
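As a worked illustration of the equation above, the following minimal sketch converts an enthalpy of melting into a driving force of crystallisation. The function name `driving_force` and the example value of 40 kJ mol$^{-1}$ are hypothetical and are not taken from the repository's code.

```python
# Ratio T_c / T_m, approximated as 0.785 in the description above.
TC_OVER_TM = 0.785


def driving_force(delta_h_m, tc_over_tm=TC_OVER_TM):
    """Return Delta G_c = -Delta H_m * (1 - T_c/T_m), in the units of delta_h_m."""
    return -delta_h_m * (1.0 - tc_over_tm)


if __name__ == "__main__":
    # Hypothetical enthalpy of melting in kJ/mol, chosen only for illustration.
    example_dhm = 40.0
    print(f"Delta G_c = {driving_force(example_dhm):.2f} kJ/mol")
```

With the fixed ratio, $\Delta G_c$ is simply $-0.215\,\Delta H_m$, so the example value of 40 kJ mol$^{-1}$ gives about -8.6 kJ mol$^{-1}$.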

Requirements

OS: Ubuntu 24.04.3 (on Windows, use WSL)
Conda: version 26.1.1
Python: 3.14, plus the packages listed in the requirements.yml file. To create and activate the environment, use:

conda env create -f requirements.yml
conda activate cryst_pred

How to use

This tool is intended to be used in the terminal, with all commands run from within the Crystalization-prediction-models directory.

Formatting the input set and CAS file

The input set must be a CSV file whose first column is SMILES and whose second column is NAME, so that each row holds a compound's SMILES string and its name separated by a comma. For an example, see data/input_set.csv.

Optionally, the CAS numbers of the compounds can also be supplied. This is done in a separate file with a column called CAS. The compounds must appear in the same order in both the input file and the CAS file.
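The layout described above can be checked before running the model. The sketch below is not part of the repository: the function `check_input_files` is hypothetical, it assumes both files start with a header row, and the two example compounds are generic placeholders rather than FREAs.

```python
import csv
import io

# Hypothetical file contents; the real files live at paths such as
# data/input_set.csv and data/input_set_CAS.csv.
INPUT_SET = """SMILES,NAME
c1ccccc1,benzene
c1ccc2ccccc2c1,naphthalene
"""

CAS_SET = """CAS
71-43-2
91-20-3
"""


def check_input_files(input_text, cas_text=None):
    """Validate the column layout and pair each compound with its CAS number."""
    rows = list(csv.reader(io.StringIO(input_text)))
    header, body = rows[0], rows[1:]
    if [h.strip() for h in header[:2]] != ["SMILES", "NAME"]:
        raise ValueError("first two columns must be SMILES and NAME")
    compounds = [
        {"SMILES": r[0].strip(), "NAME": r[1].strip()} for r in body if len(r) >= 2
    ]
    if cas_text is not None:
        cas_rows = list(csv.DictReader(io.StringIO(cas_text)))
        if len(cas_rows) != len(compounds):
            raise ValueError("CAS file must have one row per compound")
        # Rows are matched by position, so both files must share the same order.
        for compound, cas_row in zip(compounds, cas_rows):
            compound["CAS"] = cas_row["CAS"].strip()
    return compounds


if __name__ == "__main__":
    for entry in check_input_files(INPUT_SET, CAS_SET):
        print(entry)
```

Because the two files are matched purely by row position, a row-count mismatch is the only ordering error this check can detect; a swapped pair of rows would pass silently.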

Using the premade model

To use the premade model, make an input file of the compounds you want to test (as described above), then run bash run.sh. The terminal will prompt for the path to the input set, e.g. data/input_set.csv, and offer the option to add CAS numbers, e.g. data/input_set_CAS.csv. The results are then written to the output/results.csv file for inspection.
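One way to inspect the results from Python is sketched below. The column names of output/results.csv are not documented here, so the hypothetical helper `preview_results` makes no assumption about them and simply returns the header and the first few rows.

```python
import csv
from pathlib import Path


def preview_results(path="output/results.csv", n_rows=5):
    """Return the header and the first n_rows data rows of a results CSV."""
    with Path(path).open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    results_path = Path("output/results.csv")
    if results_path.exists():
        header, rows = preview_results(results_path)
        print(header)
        for row in rows:
            print(row)
    else:
        print("No results file found; run `bash run.sh` first.")
```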

Making model

If you want to make a new model with different data, use the Model_maker.py file.

Changing the training data

To change the training data, edit the contents of the data/dHm_dataset.csv file without changing the column names or the file name. Remember that the units are kJ mol$^{-1}$. If you also want to change the descriptors used in the model, edit the descriptor names on the same line.

Making the new model

With the new data or changed descriptors, the next step is to optimize the hyperparameters. This can be done by running the optimize.py file: python ./Code/optimize.py <NUMBER OF TRIALS>. The values of the best hyperparameters are then displayed and must be entered manually into the Model_maker.py file.
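The contents of optimize.py are not reproduced here, and the search method it uses is not stated. To illustrate what a fixed number of trials means, the following is a generic random-search sketch, not the repository's implementation. The hyperparameter names `learning_rate` and `max_depth`, their ranges, and `toy_objective` are all hypothetical stand-ins; a real run would train and score a model inside the objective.

```python
import random
import sys


def toy_objective(params):
    """Stand-in for a validation error; a real run would train and score a model."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 4) ** 2 / 100


def random_search(n_trials, seed=0):
    """Sample n_trials hyperparameter sets and return the best one with its score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "max_depth": rng.randint(2, 8),
        }
        score = toy_objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Mirrors the command-line shape `python script.py <NUMBER OF TRIALS>`.
    has_count = len(sys.argv) > 1 and sys.argv[1].isdigit()
    n_trials = int(sys.argv[1]) if has_count else 20
    best_params, best_score = random_search(n_trials)
    print("best hyperparameters:", best_params)
    print("best score:", best_score)
```

More trials cost more time but can only keep or improve the best score found, which is the trade-off behind choosing the number of trials.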

Once the hyperparameters are set, run python ./Code/Model_maker.py and type yes when asked whether to download the model. Optionally, you can plot the residuals by running python ./Code/Model_maker.py 1 instead.

To run the new model, follow the same steps as in the Using the premade model section.
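For readers unfamiliar with residuals, the sketch below shows the usual definition (observed minus predicted) together with a root-mean-square error summary. It assumes this sign convention, which may differ from the one used in Model_maker.py, and the enthalpy values are hypothetical.

```python
import math


def residuals(observed, predicted):
    """Return observed minus predicted for each pair of values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [o - p for o, p in zip(observed, predicted)]


def rmse(observed, predicted):
    """Return the root-mean-square of the residuals."""
    res = residuals(observed, predicted)
    return math.sqrt(sum(r * r for r in res) / len(res))


if __name__ == "__main__":
    # Hypothetical enthalpies of melting in kJ/mol, for illustration only.
    observed = [30.0, 45.0, 52.0]
    predicted = [28.0, 47.0, 52.0]
    print(residuals(observed, predicted))
    print(round(rmse(observed, predicted), 3))
```

Residuals scattered evenly around zero suggest no systematic bias, whereas a trend with the predicted value points to a model that over- or under-predicts in part of the range.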

References

1. H. M. Johnson, F. Gusev, J. T. Dull, Y. Seo, R. D. Priestley, O. Isayev and B. P. Rand, J. Am. Chem. Soc., 2024, 146, 21583-21590.
