Demo training layout #3

Merged
yordabayev merged 5 commits into main from demo-training-layout on May 19, 2026
Conversation

@yordabayev

Collaborator

Summary

  • Move the training script out of lock_gp and into demo/ so it is not installed with the package.
  • Move the example CR6261-H1 data directly under demo/ and update the training script/docs for the new location.
  • Keep one-hot encoding as demo-only utility code and update the e2e test to run python -m demo.train.
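
For readers unfamiliar with the demo-only utility mentioned in the last bullet, one-hot encoding of protein sequences can be sketched roughly as follows. This is a minimal illustration and not the code that lives in demo/: the function name `one_hot_encode`, its signature, and the 20-letter alphabet ordering are assumptions made here.

```python
import numpy as np

# Hypothetical alphabet ordering for illustration only; the demo takes its
# alphabet from get_blosum50_matrix(), whose ordering is not shown in this PR.
ALPHABET = list("ARNDCQEGHILKMFPSTWYV")


def one_hot_encode(sequences, alphabet):
    """Encode non-empty, equal-length sequences as an (n, length, len(alphabet)) array."""
    index = {letter: i for i, letter in enumerate(alphabet)}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    encoded = np.zeros((len(sequences), length, len(alphabet)), dtype=np.float64)
    for n, seq in enumerate(sequences):
        for pos, letter in enumerate(seq):
            if letter not in index:
                raise ValueError(f"unknown letter {letter!r} at position {pos}")
            # Exactly one entry per (sequence, position) is set to 1.
            encoded[n, pos, index[letter]] = 1.0
    return encoded


X = one_hot_encode(["ARN", "VYW"], ALPHABET)
print(X.shape)  # (2, 3, 20)
```

Because the column order of the encoding follows the alphabet passed in, any kernel that indexes a substitution matrix by the same columns needs the identical ordering.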

Keep demo training code and data outside the installed lock_gp package while preserving the e2e training path.
Move the example data files directly under demo and update the training path.
Comment thread demo/train.py
sequences = df["sequence"].astype(str).tolist()
y_np = df["fitness"].to_numpy(dtype=np.float64)

alphabet, _ = get_blosum50_matrix()

Collaborator

We should make sure the code has comments clarifying that alphabet must be the one returned by get_blosum50_matrix for both LOCK and Tanimoto. Or is there a better way to enforce this?

Collaborator Author

alphabet is now passed explicitly to TanimotoGP and LockGP. For now it must match the alphabet from get_blosum50_matrix, but this could later be relaxed to accept a different alphabet.

Thread the demo one-hot alphabet into BLOSUM-backed GP kernels and fail fast on alphabet order or size mismatches.
Enable Ruff's relative-import rule and convert package imports to absolute paths.
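
The fail-fast check on alphabet order and size described in the commit above could look something like the sketch below. The helper name `check_alphabet` and the error messages are hypothetical; the PR does not show how LockGP or TanimotoGP perform this validation.

```python
def check_alphabet(alphabet, expected):
    """Raise ValueError unless `alphabet` matches `expected` in size and order."""
    alphabet, expected = list(alphabet), list(expected)
    if len(alphabet) != len(expected):
        raise ValueError(
            f"alphabet size mismatch: got {len(alphabet)}, expected {len(expected)}"
        )
    for i, (got, want) in enumerate(zip(alphabet, expected)):
        if got != want:
            raise ValueError(
                f"alphabet order mismatch at index {i}: got {got!r}, expected {want!r}"
            )


# Toy alphabets for illustration only.
reference = ["A", "R", "N", "D"]
check_alphabet(["A", "R", "N", "D"], reference)  # passes silently

try:
    check_alphabet(["R", "A", "N", "D"], reference)  # same letters, wrong order
except ValueError as err:
    message = str(err)
    print(message)
```

Running a check like this at construction time means a mismatched encoding surfaces as an immediate error rather than as silently wrong kernel values.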
Comment thread demo/train.py
"""Fit Linear, LOCK, and Tanimoto GPs to CR6261-H1 dataset."""
parser = argparse.ArgumentParser(description="LOCK GP Demo")
- parser.add_argument("--num-training", type=int, default=256, help="Number of training points.")
+ parser.add_argument("--train-size", type=int, default=256, help="Number of training points.")

Collaborator Author

Renamed --num-training to --train-size.

Allow users to pass a custom CSV to the training demo while preserving the bundled data as the default.
Comment thread demo/train.py
parser = argparse.ArgumentParser(description="LOCK GP Demo")
parser.add_argument("--num-training", type=int, default=256, help="Number of training points.")
parser.add_argument(
"--data",

Collaborator Author

Users can now provide custom data via --data.

Collaborator

Do we really want that? This is a demo, not a harness.

Collaborator Author

It is a small change - maybe people will want to quickly try their own data?
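
For context, the two command-line options discussed in these threads might be wired up roughly as below. This is only a sketch: `DEFAULT_DATA` and the file name `example.csv` are hypothetical (the PR does not show the bundled file's name), and the `--data` help text is an assumption based on the `sequence` and `fitness` columns read in demo/train.py.

```python
import argparse
from pathlib import Path

# Hypothetical default path; the bundled file's real name is not shown in the PR.
DEFAULT_DATA = Path("demo") / "example.csv"


def build_parser():
    """Build a parser with the renamed --train-size flag and an optional --data flag."""
    parser = argparse.ArgumentParser(description="LOCK GP Demo")
    parser.add_argument("--train-size", type=int, default=256, help="Number of training points.")
    parser.add_argument(
        "--data",
        type=Path,
        default=DEFAULT_DATA,
        help="CSV with 'sequence' and 'fitness' columns (defaults to the bundled data).",
    )
    return parser


# Omitting --data falls back to the bundled default.
args = build_parser().parse_args(["--train-size", "64"])
print(args.train_size, args.data)
```

Keeping the bundled data as the default means the documented `python -m demo.train` invocation keeps working without arguments, while a custom CSV stays opt-in.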

@yordabayev yordabayev merged commit c7cbe0d into main May 19, 2026
1 check passed
@martinjankowiak martinjankowiak deleted the demo-training-layout branch June 7, 2026 00:36
2 participants