Demo training layout #3
Keep demo training code and data outside the installed lock_gp package while preserving the e2e training path.
Move the example data files directly under demo and update the training path.
    sequences = df["sequence"].astype(str).tolist()
    y_np = df["fitness"].to_numpy(dtype=np.float64)

    alphabet, _ = get_blosum50_matrix()
We should make sure the code has comments clarifying that the alphabet must be the one returned by get_blosum50_matrix for both LOCK and Tanimoto. Or is there a better way to enforce this?
The alphabet is now passed explicitly to TanimotoGP and LockGP. Currently it has to match the alphabet from get_blosum50_matrix, but this could later be relaxed to accept a different alphabet.
Thread the demo one-hot alphabet into BLOSUM-backed GP kernels and fail fast on alphabet order or size mismatches.
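A minimal sketch of what a fail-fast alphabet check could look like, assuming the kernels receive the alphabet as a sequence of single-character symbols. The helper name `check_alphabet` and the 20-letter reference alphabet are hypothetical stand-ins; the actual check in `lock_gp` and the actual alphabet returned by `get_blosum50_matrix` are not shown in this thread.

```python
from typing import Sequence


def check_alphabet(alphabet: Sequence[str], reference: Sequence[str]) -> None:
    """Raise ValueError unless `alphabet` matches `reference` in size and order."""
    # Size mismatch: the one-hot width would not line up with the matrix.
    if len(alphabet) != len(reference):
        raise ValueError(
            f"alphabet has {len(alphabet)} symbols, expected {len(reference)}"
        )
    # Order mismatch: same symbols in a different order would silently
    # pair one-hot columns with the wrong substitution-matrix rows.
    for i, (got, want) in enumerate(zip(alphabet, reference)):
        if got != want:
            raise ValueError(
                f"alphabet order mismatch at index {i}: "
                f"got {got!r}, expected {want!r}"
            )


# Hypothetical stand-in for the alphabet returned by get_blosum50_matrix().
reference = list("ARNDCQEGHILKMFPSTWYV")

# A matching alphabet passes silently; any mismatch raises before training.
check_alphabet(list("ARNDCQEGHILKMFPSTWYV"), reference)
```

Running the check once in the model constructor, rather than only documenting the requirement in comments, would address the reviewer's question about enforcement.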
Enable Ruff's relative-import rule and convert package imports to absolute paths.
      """Fit Linear, LOCK, and Tanimoto GPs to CR6261-H1 dataset."""
      parser = argparse.ArgumentParser(description="LOCK GP Demo")
    - parser.add_argument("--num-training", type=int, default=256, help="Number of training points.")
    + parser.add_argument("--train-size", type=int, default=256, help="Number of training points.")
Renamed --num-training to --train-size.
Allow users to pass a custom CSV to the training demo while preserving the bundled data as the default.
    parser = argparse.ArgumentParser(description="LOCK GP Demo")
    parser.add_argument("--num-training", type=int, default=256, help="Number of training points.")
    parser.add_argument(
        "--data",
Users can now provide custom data via --data.
Do we really want that? This is a demo, not a harness.
It is a small change, and people may want to quickly try their own data.
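For reference, the command-line interface discussed in these threads could be sketched roughly as follows. The default file name `example_data.csv` and the `build_parser` helper are hypothetical; the thread only shows that a `--data` option exists, that `--train-size` defaults to 256, and that the CSV is read through `sequence` and `fitness` columns.

```python
import argparse
from pathlib import Path

# Hypothetical default location; the real bundled file name is not shown here.
DEFAULT_DATA = Path("demo") / "example_data.csv"


def build_parser() -> argparse.ArgumentParser:
    """Build the demo CLI: bundled data by default, custom CSV on request."""
    parser = argparse.ArgumentParser(description="LOCK GP Demo")
    parser.add_argument(
        "--train-size",
        type=int,
        default=256,
        help="Number of training points.",
    )
    parser.add_argument(
        "--data",
        type=Path,
        default=DEFAULT_DATA,
        help="CSV file with 'sequence' and 'fitness' columns.",
    )
    return parser


# With no arguments, the bundled data and the default training size are used.
args = build_parser().parse_args([])
```

Because `--data` only overrides a default, running the demo without arguments behaves as before, which keeps the change small.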
Summary
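- Move the demo training code and data out of `lock_gp` and into `demo/` so it is not installed with the package.
- Move the example data files directly under `demo/` and update the training script/docs for the new location.
- … `python -m demo.train`.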