A machine learning framework for training artificial neural networks to predict the yield strength of multi-principal element alloys (MPEAs) using cross-validation and automated feature engineering.
- Authors: Dishant Beniwal, Ashish Kaushik, Abhishek Tiwari, Pratik K. Ray
- Journal: Journal of Materials Research
- DOI: 10.1557/s43578-025-01753-x
The pre-trained yield strength model will soon be integrated into MAPAL alongside the existing phase selection and hardness prediction models.
- Automated Neural Network Training: Build and train ANN models with customizable architectures
- Cross-Validation: K-fold cross-validation with multiple runs for robust model evaluation
- Feature Engineering: Automatic calculation of alloy features, including:
  - Elemental properties (mean, variance, etc.)
  - Miedema enthalpy calculations (H_chem, H_el, H_mix, H_IM)
  - Phase fraction predictions (f_FCC, f_BCC, f_IM)
- Model Persistence: Automatic saving of trained models, predictions, and performance metrics
- Convergence Monitoring: Checks whether the training error has fallen below a threshold (check_error_value) after a set number of epochs (check_after_iterations)
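As a rough illustration of the "mean, variance" elemental-property features, the sketch below computes composition-weighted statistics of a single property. The helper names and the exact definitions (mean = Σ cᵢpᵢ and variance = Σ cᵢ(pᵢ − mean)², with normalized atomic fractions cᵢ) are assumptions for illustration only; MAPAL's own feature definitions may differ. The VEC values are the standard valence electron counts of these elements.

```python
# Hypothetical helpers for illustration; MAPAL's own feature definitions may differ.

def weighted_mean(composition: dict, prop: dict) -> float:
    """Composition-weighted mean: sum_i c_i * p_i with normalized atomic fractions c_i."""
    total = sum(composition.values())
    return sum(c * prop[el] for el, c in composition.items()) / total


def weighted_variance(composition: dict, prop: dict) -> float:
    """Composition-weighted variance: sum_i c_i * (p_i - mean)^2."""
    total = sum(composition.values())
    mu = weighted_mean(composition, prop)
    return sum(c * (prop[el] - mu) ** 2 for el, c in composition.items()) / total


# Valence electron concentration (VEC) of selected 3d transition metals
VEC = {"Co": 9, "Cr": 6, "Fe": 8, "Mn": 7, "Ni": 10}

# Equiatomic CoCrFeMnNi; molar amounts are normalized inside the helpers
cantor = {"Co": 1, "Cr": 1, "Fe": 1, "Mn": 1, "Ni": 1}

print(weighted_mean(cantor, VEC))      # 8.0
print(weighted_variance(cantor, VEC))  # 2.0
```

Passing molar amounts rather than fractions keeps the arithmetic exact for simple compositions; the helpers divide by the total at the end.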
| Package | Purpose |
|---|---|
| pandas | Data manipulation and analysis |
| numpy | Numerical computing |
| tensorflow | Neural network framework |
| scikit-learn | Machine learning utilities |
| mapal | Alloy featurization and pre-trained model transfer learning |
| argparse | Command-line argument parsing (part of the Python standard library) |
| openpyxl | Excel file reading |
| xlsxwriter | Excel file writing with multi-sheet support |
- Clone this repository.
- Install the required dependencies:

```bash
pip install pandas numpy tensorflow scikit-learn openpyxl xlsxwriter mapal
```

| Argument | Type | Description | Required |
|---|---|---|---|
| -input_filepath | String | Path to the Excel input file containing model configurations | Yes |
| -modIdStart | Integer | First model ID (model_id) in the input file to train | Yes |
| -modIdEnd | Integer | Last model ID in the input file to train (inclusive) | Yes |
| -saveDir | String | Directory to save trained models | No (default: "trained-models-raw") |
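For readers adapting the script, here is a minimal sketch of how the documented arguments could be declared with Python's `argparse`. It is not taken from `run-autoANN.py`, and `build_parser` and `selected_model_ids` are hypothetical names; the inclusive ID range is inferred from the single-model example, where `-modIdStart` and `-modIdEnd` are both 1.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of the documented command-line interface."""
    parser = argparse.ArgumentParser(
        description="Train ANN models listed in an Excel configuration file."
    )
    # Single-dash multi-character options are valid in argparse; dest becomes "input_filepath", etc.
    parser.add_argument("-input_filepath", type=str, required=True,
                        help="Path to the Excel input file containing model configurations")
    parser.add_argument("-modIdStart", type=int, required=True, help="First model_id to train")
    parser.add_argument("-modIdEnd", type=int, required=True, help="Last model_id to train (inclusive)")
    parser.add_argument("-saveDir", type=str, default="trained-models-raw",
                        help="Directory to save trained models")
    return parser


def selected_model_ids(args: argparse.Namespace) -> list:
    # Assumption: both ends are inclusive, so "-modIdStart 1 -modIdEnd 1" trains model 1 only
    return list(range(args.modIdStart, args.modIdEnd + 1))


args = build_parser().parse_args(
    ["-input_filepath", "models_config.xlsx", "-modIdStart", "1", "-modIdEnd", "5"]
)
print(args.saveDir)              # trained-models-raw
print(selected_model_ids(args))  # [1, 2, 3, 4, 5]
```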
- Single model training:

  ```bash
  python run-autoANN.py -input_filepath models_config.xlsx -modIdStart 1 -modIdEnd 1
  ```

- Multiple model training:

  ```bash
  python run-autoANN.py -input_filepath models_config.xlsx -modIdStart 1 -modIdEnd 5
  ```

- Custom save directory:

  ```bash
  python run-autoANN.py -input_filepath models_config.xlsx -modIdStart 1 -modIdEnd 5 -saveDir my_models
  ```

| Column | Data Type | Description | Example |
|---|---|---|---|
| model_id | Integer | Unique identifier for each model | 1, 2, 3... |
| database_path | String | Path to the training database within the _database directory | "db-mpea-ys-ac-t25-seed1.xlsx" |
| x | String | Semicolon-separated feature names, as defined in MAPAL | "(asymmetry,r_cov);(comp_avg,VEC);H_el" |
| y | String | Target variable name | "YS_MPa" |
| cols_to_keep | String | Comma-separated columns of the input database to retain in the output | "alloy_name,phases,phase_code" |
| layer_units | String | Comma-separated number of units in each layer | "64,32,16" |
| activation_functions | String | Comma-separated activation function for each layer | "relu,relu,sigmoid" |
| optimizer | String | Optimizer name | "Adam" |
| learning_rate | Float | Learning rate | 0.001 |
| loss_function | String | Loss function name | "mae" |
| Kfold_split | Integer | Number of cross-validation folds | 5 |
| n_runs | Integer | Number of training runs; each run reshuffles the data and re-initializes the model | 3 |
| patience_val | Integer | Early-stopping patience | 50 |
| max_epochs | Integer | Maximum number of training epochs | 1000 |
| check_error_value | Float | Error threshold used for the convergence check | 200 |
| check_after_iterations | Integer | Number of epochs after which convergence is checked | 100 |
| tensorboard_logs | Integer | Enable TensorBoard logging (1 or 'true') | 1 |
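The Kfold_split and n_runs columns describe repeated K-fold cross-validation. The sketch below shows one way such a loop could be organized with scikit-learn's `KFold`. It is not the repository's implementation: `cross_validate_runs` is a hypothetical helper, `LinearRegression` stands in for the ANN, and seeding each run with `random_state=run` and summarizing each run by its MAE are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def cross_validate_runs(X, y, kfold_split=5, n_runs=3, make_model=LinearRegression):
    """Repeated K-fold cross-validation returning out-of-fold predictions.

    Sketch only: `make_model` is a stand-in for building the ANN, and seeding
    each run with `random_state=run` is an assumption, not the repository's scheme.
    """
    preds = np.full((n_runs, len(y)), np.nan)  # one row of out-of-fold predictions per run
    run_mae = []
    for run in range(n_runs):
        # Reshuffle the samples differently in every run before splitting into K folds
        kf = KFold(n_splits=kfold_split, shuffle=True, random_state=run)
        for train_idx, val_idx in kf.split(X):
            model = make_model()  # new, untrained model for every fold
            model.fit(X[train_idx], y[train_idx])
            preds[run, val_idx] = model.predict(X[val_idx])
        # Every sample is predicted exactly once per run, so the MAE covers the whole dataset
        run_mae.append(np.mean(np.abs(preds[run] - y)))
    return preds, np.array(run_mae)


# Synthetic, noise-free data purely for demonstration
rng = np.random.default_rng(0)
X = rng.random((40, 3))
y = X @ np.array([100.0, 50.0, -20.0]) + 300.0

preds, run_mae = cross_validate_runs(X, y, kfold_split=5, n_runs=3)
print(preds.shape)  # (3, 40)
print(run_mae)      # per-run MAE; close to zero here because the data are exactly linear
```

Keeping one row of predictions per run makes it straightforward to report the spread of errors across runs, which is the reason for repeating the shuffled split.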
For each trained model, the following directory structure is created:
| Directory/File | Description |
|---|---|
| M-{model_id}/ | Main model directory |
| ├── M-{model_id}_inputParam.xlsx | Input parameters used for training |
| ├── M-{model_id}_xmin-feats.csv | Feature minimum values for normalization |
| ├── M-{model_id}_xmax-feats.csv | Feature maximum values for normalization |
| ├── M-{model_id}_CV-predictions-all-runs.xlsx | Cross-validation predictions |
| ├── M-{model_id}_CV-performance-all-runs.xlsx | Performance metrics summary |
| ├── M-{model_id}_model-checkpoints/ | Training checkpoints directory |
| ├── M-{model_id}_model-save-final/ | Final trained models directory |
| ├── M-{model_id}_csv-logs/ | Training logs in CSV format |
| └── M-{model_id}_tensorboard-logs/ | TensorBoard logs (if enabled) |
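The xmin/xmax files point to min-max normalization of the input features. The sketch below assumes the common form (x − xmin)/(xmax − xmin) and that each CSV can be loaded into a pandas Series indexed by feature name; neither the exact formula nor the CSV layout is documented here, and `minmax_scale` as well as the bound values are hypothetical. The same stored bounds would be reused to scale the features of new alloys before prediction.

```python
import pandas as pd


def minmax_scale(features: pd.DataFrame, xmin: pd.Series, xmax: pd.Series) -> pd.DataFrame:
    """Scale features with stored training minima and maxima.

    Assumes min-max scaling of the form (x - xmin) / (xmax - xmin); values inside
    the training range map to [0, 1].
    """
    span = xmax - xmin
    # Assumption: avoid division by zero for features that were constant in training
    span = span.where(span != 0, 1.0)
    # DataFrame-Series arithmetic aligns the Series index with the DataFrame columns
    return (features - xmin) / span


# Hypothetical stored bounds; in practice these would come from the
# M-{model_id}_xmin-feats.csv and M-{model_id}_xmax-feats.csv files
xmin = pd.Series({"VEC": 4.0, "H_el": 0.0})
xmax = pd.Series({"VEC": 10.0, "H_el": 20.0})

new_alloys = pd.DataFrame({"VEC": [8.0, 5.5], "H_el": [5.0, 15.0]})
scaled = minmax_scale(new_alloys, xmin, xmax)
print(scaled)  # VEC -> about 0.667 and 0.25; H_el -> 0.25 and 0.75
```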