Accepted at the ICML 2026 Workshop on Machine Learning for Audio
PianoKontext is a proof-of-concept model for variable-length expressive rendering of classical piano music. Given deadpan audio synthesized from a MIDI score, it generates multiple expressive renditions with different timing and dynamics.
Inspired by FLUX Kontext, PianoKontext is a flow matching model trained in the latent space of Music2Latent. It learns score-performance dependencies in context, solely through self-attention. Currently, it operates on segments of up to 11 seconds.
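As a rough illustration of the idea (not the repository's actual training code), the sketch below builds one flow matching training example with in-context conditioning. It assumes a linear (rectified flow) path between data and noise and assumes that the deadpan latents are appended to the noisy expressive latents along the sequence axis, so that self-attention can relate the two. The function name `make_training_example`, the `(T, D)` latent layout, and the toy shapes are all hypothetical.

```python
import numpy as np


def make_training_example(context, target, rng):
    """Build one flow matching example with in-context conditioning.

    context: (T_ctx, D) latents of the deadpan audio (assumed layout)
    target:  (T_tgt, D) latents of the expressive performance (assumed layout)
    The two sequences may have different lengths.
    """
    noise = rng.standard_normal(target.shape)
    t = rng.uniform()  # time in [0, 1): t=0 is data, t=1 would be pure noise

    # Assumed linear path between data and noise, with its constant velocity.
    x_t = (1.0 - t) * target + t * noise
    velocity = noise - target

    # Append the clean context tokens to the noisy target tokens.
    # A transformer would attend over the whole sequence.
    tokens = np.concatenate([x_t, context], axis=0)

    # The regression loss would only be computed on the target positions.
    loss_mask = np.concatenate(
        [np.ones(len(target), dtype=bool), np.zeros(len(context), dtype=bool)]
    )
    return tokens, velocity, loss_mask, t


rng = np.random.default_rng(0)
deadpan = rng.standard_normal((5, 4))     # toy context latents
expressive = rng.standard_normal((7, 4))  # toy target latents, different length
tokens, velocity, loss_mask, t = make_training_example(deadpan, expressive, rng)
print(tokens.shape, velocity.shape, int(loss_mask.sum()))
```

A model trained this way would predict `velocity` at the masked positions from `tokens` and `t`. Because the context is just extra tokens, the input and output lengths do not have to match.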
Try it on Google Colab!
This repository requires Python 3.10 or greater.
# Run in your environment
pip install -r requirements.txt
For training, you might need to install additional packages.
More details will be added soon.
- Download the MAESTRO and ASAP datasets.
- Install fluidsynth and a piano soundfont. Synthesize ASAP from MIDI to audio using the synthesize_asap.py script.
- Encode MAESTRO and ASAP with Music2Latent using the encode_audio.py script.
- Align the embeddings using the run_align_embeddings.py script. It produces the alignment files between ASAP and MAESTRO and a new metadata file.
- Calculate the embedding statistics. To this end, combine the rows from the ASAP and MAESTRO metadata files, then run the save_data_stats.py script. It produces the embedding statistics for a joint deadpan-expressive dataset.
- Run the run_flux_training.py script to train a PianoKontext model.
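
To make the statistics step more concrete, here is a minimal sketch of how per-channel mean and standard deviation could be computed over a joint set of variable-length latent sequences. It is not the code in save_data_stats.py: the function name `embedding_stats` and the `(T, D)` array layout are assumptions made for illustration.

```python
import numpy as np


def embedding_stats(sequences):
    """Per-channel mean and std over (T_i, D) arrays with varying T_i.

    Uses running sums so the sequences never need to be concatenated in memory.
    """
    count = 0
    total = None
    total_sq = None
    for seq in sequences:
        seq = np.asarray(seq, dtype=np.float64)
        if total is None:
            total = np.zeros(seq.shape[1])
            total_sq = np.zeros(seq.shape[1])
        count += seq.shape[0]
        total += seq.sum(axis=0)
        total_sq += (seq ** 2).sum(axis=0)
    if count == 0:
        raise ValueError("no frames to compute statistics from")
    mean = total / count
    # Clip tiny negative values caused by floating point rounding.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)


rng = np.random.default_rng(0)
deadpan_latents = [rng.standard_normal((10, 4)), rng.standard_normal((6, 4))]
expressive_latents = [rng.standard_normal((12, 4))]

# Joint statistics over both the deadpan and the expressive embeddings.
mean, std = embedding_stats(deadpan_latents + expressive_latents)
print(mean.shape, std.shape)
```

Computing the statistics over both datasets at once means the same normalization applies to the deadpan context and the expressive target.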
- Music2Latent for the Music2Latent model and pretrained checkpoints
- FLUX Kontext for implementation details
- KAD toolkit for evaluation metrics implementation
- DTAIDistance for fast DTW implementation
