Demo training layout by yordabayev · Pull Request #3 · generatebio/lock_gp

yordabayev · 2026-05-16T04:44:17Z

Summary

Move the training script out of lock_gp and into demo/ so it is not installed with the package.
Move the example CR6261-H1 data directly under demo/ and update the training script/docs for the new location.
Keep one-hot encoding as demo-only utility code and update the e2e test to run python -m demo.train.

martinjankowiak · 2026-05-16T16:17:37Z

    sequences = df["sequence"].astype(str).tolist()
    y_np = df["fitness"].to_numpy(dtype=np.float64)

    alphabet, _ = get_blosum50_matrix()


We should make sure there are comments in the code clarifying that alphabet has to be the one returned by get_blosum50_matrix for both LOCK and Tanimoto. Or is there a better way to enforce this?

alphabet is now passed explicitly to TanimotoGP and LockGP. Currently it has to match the alphabet from get_blosum50_matrix, but this could later be relaxed to accept a different alphabet.
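
One way to enforce the match discussed in this thread is a check that rejects any alphabet whose size or order differs from the expected one. The sketch below is illustrative only: `check_alphabet` is a hypothetical name, and the five-letter `expected` list is a stand-in rather than the real ordering returned by get_blosum50_matrix.

```python
from typing import Sequence


def check_alphabet(alphabet: Sequence[str], expected: Sequence[str]) -> None:
    """Raise ValueError unless `alphabet` equals `expected` in size and order."""
    if len(alphabet) != len(expected):
        raise ValueError(
            f"alphabet has {len(alphabet)} symbols, expected {len(expected)}"
        )
    # Order matters: index i of the one-hot encoding must line up with
    # row/column i of the substitution matrix.
    for i, (got, want) in enumerate(zip(alphabet, expected)):
        if got != want:
            raise ValueError(
                f"alphabet mismatch at index {i}: got {got!r}, expected {want!r}"
            )


# Stand-in for the alphabet returned by get_blosum50_matrix (assumed ordering).
expected = list("ARNDC")
check_alphabet(list("ARNDC"), expected)  # passes silently
```

Calling a check like this in the model constructor would surface a mismatch at construction time instead of as silently wrong kernel values.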

yordabayev · 2026-05-18T15:53:52Z

    """Fit Linear, LOCK, and Tanimoto GPs to CR6261-H1 dataset."""
    parser = argparse.ArgumentParser(description="LOCK GP Demo")
-    parser.add_argument("--num-training", type=int, default=256, help="Number of training points.")
+    parser.add_argument("--train-size", type=int, default=256, help="Number of training points.")


Renamed --num-training to --train-size.

yordabayev · 2026-05-18T15:59:36Z

    sequences = df["sequence"].astype(str).tolist()
    y_np = df["fitness"].to_numpy(dtype=np.float64)

    alphabet, _ = get_blosum50_matrix()


alphabet is now passed explicitly to TanimotoGP and LockGP. Currently it has to match the alphabet from get_blosum50_matrix, but this could later be relaxed to accept a different alphabet.

yordabayev · 2026-05-18T21:38:06Z

    parser = argparse.ArgumentParser(description="LOCK GP Demo")
-    parser.add_argument("--num-training", type=int, default=256, help="Number of training points.")
+    parser.add_argument(
+        "--data",


Users can now provide custom data via --data.

Do we really want that? This is a demo, not a harness.

It is a small change; maybe people will want to quickly try their own data?
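
For reference, a configurable data path with a bundled default can stay small. The following is a sketch under assumptions, not the merged code: `DEFAULT_DATA` and its file name are hypothetical, and only the `--data` and `--train-size` flag names, the parser description, and the `sequence`/`fitness` column names come from the diff hunks in this PR.

```python
import argparse
from pathlib import Path

# Hypothetical location of the bundled example CSV; the real file name may differ.
DEFAULT_DATA = Path("demo") / "example.csv"


def build_parser() -> argparse.ArgumentParser:
    """Build the demo's argument parser with an overridable data path."""
    parser = argparse.ArgumentParser(description="LOCK GP Demo")
    parser.add_argument(
        "--train-size", type=int, default=256, help="Number of training points."
    )
    parser.add_argument(
        "--data",
        type=Path,
        default=DEFAULT_DATA,
        help="CSV file with 'sequence' and 'fitness' columns.",
    )
    return parser


# With no --data flag the bundled file is used; passing one overrides it.
args = build_parser().parse_args(["--data", "my_data.csv"])
print(args.data, args.train_size)
```

Because the default still points at the bundled file, running the demo without arguments behaves as before.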

yordabayev added 2 commits May 16, 2026 04:38

Move training demo out of package

d0dd366

Keep demo training code and data outside the installed lock_gp package while preserving the e2e training path.

Flatten demo data files

d2ed770

Move the example data files directly under demo and update the training path.

martinjankowiak reviewed May 16, 2026

yordabayev added 2 commits May 18, 2026 15:41

Validate GP alphabets explicitly

1f7d5da

Thread the demo one-hot alphabet into BLOSUM-backed GP kernels and fail fast on alphabet order or size mismatches.

Enforce absolute imports

6081905

Enable Ruff's relative-import rule and convert package imports to absolute paths.

yordabayev commented May 18, 2026

Make demo data path configurable

3469783

Allow users to pass a custom CSV to the training demo while preserving the bundled data as the default.

yordabayev commented May 18, 2026

martinjankowiak approved these changes May 19, 2026

yordabayev merged commit c7cbe0d into main May 19, 2026
1 check passed

martinjankowiak deleted the demo-training-layout branch June 7, 2026 00:36

yordabayev merged 5 commits into main from demo-training-layout
