Based upon the work by Rand [1], this is a gradient boosted decision tree model for predicting the enthalpy of melting (ΔHm).
- OS: Ubuntu 24.04.3 (on Windows, use WSL)
- Conda: version 26.1.1
- Python: 3.14, with the packages listed in the requirements.yml file. To create and activate the environment, use:
conda env create -f requirements.yml
conda activate cryst_pred
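To confirm the setup before running anything, a small check like the one below may help. It is a sketch and not part of the repository: the function name `environment_problems` is hypothetical, it treats Python 3.14 as a minimum version (an assumption, since the list above only names 3.14), and it relies on conda setting the `CONDA_DEFAULT_ENV` variable to the name of the active environment.

```python
import os
import sys

# Values taken from the requirements above; the environment name comes from
# the `conda activate cryst_pred` command. Treating 3.14 as a minimum is an assumption.
REQUIRED_PYTHON = (3, 14)
EXPECTED_ENV = "cryst_pred"


def environment_problems(version_info=sys.version_info, environ=os.environ):
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    if tuple(version_info[:2]) < REQUIRED_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, "
            f"{REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]} expected"
        )
    # conda sets CONDA_DEFAULT_ENV to the name of the currently active environment.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != EXPECTED_ENV:
        problems.append(f"active conda env is {active!r}, expected {EXPECTED_ENV!r}")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print("WARNING:", problem)
```

The version and environment are passed in as arguments (defaulting to the live values) so the check can be exercised without switching interpreters.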
This software is intended to be used in the terminal, with all commands run from within the Crystalization-prediction-models directory.
The input set must be a CSV file with SMILES as the first column and NAME as the second, so that each row gives a compound's SMILES string and name separated by a comma. For an example, see data/input_set.csv.
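A minimal sketch of reading and checking an input set with Python's standard `csv` module, assuming the file has a header row whose first two columns are literally `SMILES` and `NAME` (as the description above suggests). `read_input_set` is a hypothetical helper, not a function shipped with the project.

```python
import csv


def read_input_set(lines):
    """Parse input-set CSV lines into a list of (smiles, name) pairs.

    `lines` is any iterable of text lines, e.g. an open file or a list of strings.
    Assumes a header row whose first two columns are SMILES and NAME.
    Raises ValueError on a wrong header or an incomplete row.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None or [h.strip() for h in header[:2]] != ["SMILES", "NAME"]:
        raise ValueError(f"expected header SMILES,NAME but got {header!r}")
    pairs = []
    for row_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) < 2 or not row[0].strip():
            raise ValueError(f"row {row_no}: need a SMILES string and a name")
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


example = "SMILES,NAME\nc1ccccc1,benzene\nCCO,ethanol\n"
print(read_input_set(example.splitlines()))
```

To check a real file, pass an open handle: `with open("data/input_set.csv", newline="") as f: read_input_set(f)`. Because this sketch uses the `csv` module, a name that itself contains a comma must be quoted (e.g. `"ethanol, absolute"`) to stay in the NAME column; whether run.sh accepts quoted fields is not documented here.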
Optionally, the CAS numbers of the compounds can also be added. These go in a separate file with a column called CAS. The compounds must be listed in the same order in both the input file and the CAS file.
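Because the CAS file is matched to the input set purely by row order, a quick length check catches the most common misalignment (a missing or extra row). The sketch below is hypothetical (`attach_cas` is not part of the project) and assumes the CAS file is a CSV with a header row containing a `CAS` column; it pairs rows by position, as the ordering rule above requires.

```python
import csv


def attach_cas(pairs, cas_lines):
    """Pair each (smiles, name) tuple with the CAS number on the same row.

    `cas_lines` is an iterable of CSV text lines with a column named CAS.
    Rows are matched only by position, so both files must use the same order.
    """
    reader = csv.DictReader(cas_lines)
    if reader.fieldnames is None or "CAS" not in reader.fieldnames:
        raise ValueError("the CAS file needs a header with a column called CAS")
    cas_values = [(row["CAS"] or "").strip() for row in reader]
    if len(cas_values) != len(pairs):
        raise ValueError(
            f"{len(pairs)} compounds but {len(cas_values)} CAS numbers; "
            "both files must list the same compounds in the same order"
        )
    return [(smiles, name, cas) for (smiles, name), cas in zip(pairs, cas_values)]


pairs = [("c1ccccc1", "benzene"), ("CCO", "ethanol")]
print(attach_cas(pairs, ["CAS", "71-43-2", "64-17-5"]))
```

A length check cannot detect two rows that are swapped, so spot-checking a few names against their CAS numbers is still worthwhile.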
To use the premade model, make an input file of the compounds you want to test (as described above), then run bash run.sh. The terminal will prompt you for the path to the input set (e.g. data/input_set.csv) and give you the option to add CAS numbers (e.g. data/input_set_CAS.csv).
The results are then written to the output/results.csv file for inspection.
To build a new model with different data, use the Model_maker.py file.
To change the training data, edit the contents of the data/dHm_dataset.csv file without changing its column names or file name. Remember that the units are kJ mol⁻¹. To change the descriptors used in the model, edit the descriptor names on the same line.
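Literature values of the enthalpy of melting are often reported in J mol⁻¹ or kcal mol⁻¹, so converting them before editing data/dHm_dataset.csv avoids silently mixing units. A small sketch using the thermochemical calorie (1 cal = 4.184 J); the helper name and unit labels are illustrative only.

```python
# Conversion factors to kJ mol^-1 (thermochemical calorie: 1 cal = 4.184 J).
TO_KJ_PER_MOL = {
    "kJ/mol": 1.0,
    "J/mol": 1e-3,
    "kcal/mol": 4.184,
    "cal/mol": 4.184e-3,
}


def to_kj_per_mol(value, unit):
    """Convert an enthalpy of melting to kJ mol^-1 before adding it to the dataset."""
    try:
        return value * TO_KJ_PER_MOL[unit]
    except KeyError:
        raise ValueError(
            f"unknown unit {unit!r}; expected one of {sorted(TO_KJ_PER_MOL)}"
        ) from None


print(to_kj_per_mol(1.0, "kcal/mol"))
print(to_kj_per_mol(9832.4, "J/mol"))
```

If a source uses the International Table calorie (4.1868 J) instead, adjust the two calorie factors accordingly.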
Once the data or descriptors have been changed, the next step is to optimize the hyperparameters by running optimize.py: python ./Code/optimize.py <NUMBER OF TRIALS>. The best hyperparameter values found will be displayed, and they must then be entered manually into the relevant area of the Model_maker.py file.
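This README does not say which search strategy optimize.py uses, so the block below only illustrates what a trial-based hyperparameter search does: each trial samples one combination, scores it, and the best combination is kept. It uses plain random search from the standard library; the hyperparameter names, candidate values and `toy_objective` (a stand-in for a cross-validated error) are all hypothetical and are not taken from optimize.py or Model_maker.py.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Minimise `objective` by sampling `n_trials` combinations from `space`.

    `space` maps each hyperparameter name to a list of candidate values.
    Returns (best_params, best_score).
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a gradient boosted tree model.
space = {
    "n_estimators": [100, 200, 400, 800],
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "max_depth": [3, 4, 6, 8],
}


def toy_objective(params):
    # Stand-in for a cross-validated error; lower is better.
    return abs(params["learning_rate"] - 0.1) + abs(params["max_depth"] - 6) / 10


best, score = random_search(toy_objective, space, n_trials=50)
print("best hyperparameters:", best, "score:", score)
```

In a real run each trial would train and evaluate the model, which is why the number of trials mainly trades run time against how thoroughly the space is explored.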
Then run python ./Code/Model_maker.py and type yes when asked whether to download the model. Optionally, you can plot the residuals by running python ./Code/Model_maker.py 1
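What Model_maker.py plots is not described beyond "residuals", but the quantity itself is simple: for each compound, the residual here is taken as the observed ΔHm minus the predicted ΔHm (the sign convention is an assumption). A stdlib-only sketch of that arithmetic, together with the mean absolute error and root-mean-square error often reported alongside it; `residual_summary` is a hypothetical name and the example values are arbitrary.

```python
import math


def residual_summary(observed, predicted):
    """Return residuals (observed - predicted), MAE and RMSE, all in kJ mol^-1."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return residuals, mae, rmse


residuals, mae, rmse = residual_summary([10.0, 20.0, 30.0], [12.0, 19.0, 30.0])
print(residuals, mae, rmse)
```

Residuals scattered evenly around zero suggest no systematic bias; a trend with the predicted value suggests the model under- or over-estimates a particular range.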
To run the new model, follow the same steps as in the premade model section.
- H. M. Johnson, F. Gusev, J. T. Dull, Y. Seo, R. D. Priestley, O. Isayev and B. P. Rand, J. Am. Chem. Soc., 2024, 146, 21583-21590.