Feature/breast cancer virchow#12
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Ray Serve FastAPI deployment that accepts LZ4-compressed tiles, obtains Virchow2 embeddings via a foundation-model handle, runs batched ONNX inference to produce scalar probabilities, and supplies Helm/RayService manifests to deploy the application. ChangesBreast Cancer Virchow2 Inference Service
Possibly related PRs
Suggested reviewers
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new Ray application breast-cancer-virchow2 along with its deployment configuration and Python implementation. The service decompresses input tiles, extracts embeddings using a remote Virchow2 foundation model, and runs a local ONNX model on CPU to generate predictions. Key feedback includes optimizing resource allocation by setting num_gpus to 0 since the deployment does not use a GPU, offloading CPU-bound image preprocessing to a thread pool using asyncio.to_thread to prevent blocking the event loop, and simplifying the pipeline by keeping the tile in HWC format to avoid redundant transpose operations.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
helm/rayservice/applications/breast-cancer-virchow2.yaml (1)
23-26: Monitor memory usage with the configured batch size.The deployment allocates 12 GiB memory with
max_batch_size: 128andtile_size: 224. With batch inference of 128 tiles at 224×224 pixels, plus model weights and intermediate tensors, verify this allocation is sufficient during load testing. If OOM errors occur, consider either reducingmax_batch_sizeor increasing the memory allocation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@helm/rayservice/applications/breast-cancer-virchow2.yaml` around lines 23 - 26, The deployment may be underprovisioned for batch inference: with ray_actor_options.memory set to 12884901888 (12 GiB) while using max_batch_size: 128 and tile_size: 224, run load tests and monitor peak memory; if you observe OOMs, either raise ray_actor_options.memory (increase the numeric value) or reduce max_batch_size and/or tile_size in the model config to lower per-batch memory usage, then re-run tests until memory headroom is acceptable.models/breast_cancer_virchow2.py (1)
98-100: 💤 Low valueReplace debug prints with proper logging.
These print statements appear to be debug artifacts. For production deployments, use Python's logging module or Ray's logger for proper observability with log levels.
🔧 Suggested refactor
+import logging + +logger = logging.getLogger(__name__) + # ... in reconfigure(): - print(f"Using ONNX model path: {model_path}") - print(f"ONNX model exists: {model_path.exists()}") - print(f"ONNX model is file: {model_path.is_file()}") + logger.info(f"Using ONNX model path: {model_path}")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/breast_cancer_virchow2.py` around lines 98 - 100, The three debug print statements that output model_path, model_path.exists(), and model_path.is_file() should be replaced with proper logging: import and configure a logger (e.g., logger = logging.getLogger(__name__) or use ray.get_logger()), then replace the prints with a single logger.debug or logger.info call that includes model_path and the boolean checks (referencing model_path and the surrounding function in models/breast_cancer_virchow2.py) so observability uses log levels instead of print statements.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@helm/rayservice/applications/breast-cancer-virchow2.yaml`:
- Around line 9-14: The pip dependency list leaves several packages unpinned
(onnxruntime, lz4, timm, pillow) which risks non-reproducible deployments;
update the pip block to pin each package to a tested exact version or a tight
range (e.g., onnxruntime==<tested-version> or onnxruntime>=x.y.z,<x+1.0.0),
likewise pin lz4, timm, and pillow, and keep/adjust the existing mlflow<3.0
constraint to a specific compatible range if needed; ensure you run your
build/tests with the chosen versions to verify compatibility before committing.
---
Nitpick comments:
In `@helm/rayservice/applications/breast-cancer-virchow2.yaml`:
- Around line 23-26: The deployment may be underprovisioned for batch inference:
with ray_actor_options.memory set to 12884901888 (12 GiB) while using
max_batch_size: 128 and tile_size: 224, run load tests and monitor peak memory;
if you observe OOMs, either raise ray_actor_options.memory (increase the numeric
value) or reduce max_batch_size and/or tile_size in the model config to lower
per-batch memory usage, then re-run tests until memory headroom is acceptable.
In `@models/breast_cancer_virchow2.py`:
- Around line 98-100: The three debug print statements that output model_path,
model_path.exists(), and model_path.is_file() should be replaced with proper
logging: import and configure a logger (e.g., logger =
logging.getLogger(__name__) or use ray.get_logger()), then replace the prints
with a single logger.debug or logger.info call that includes model_path and the
boolean checks (referencing model_path and the surrounding function in
models/breast_cancer_virchow2.py) so observability uses log levels instead of
print statements.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dedc0520-29d5-4cad-a99d-d3f677396038
📒 Files selected for processing (3)
helm/rayservice/applications/breast-cancer-virchow2.yamlhelm/rayservice/values.yamlmodels/breast_cancer_virchow2.py
| self.session = ort.InferenceSession( | ||
| str(model_path), | ||
| providers=["CPUExecutionProvider"], | ||
| ) |
There was a problem hiding this comment.
Is it intended to run it only on CPUs?
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@helm/rayservice/applications/breast-cancer-virchow2.yaml`:
- Line 5: The working_dir value currently points to a moving branch archive
("working_dir:
https://github.com/RationAI/model-service/archive/refs/heads/main.zip"); update
this to an immutable revision by replacing the branch archive with a
commit-specific archive URL or a tagged release archive (e.g., use the archive
URL that includes a specific commit SHA or a release tag) so deployments are
reproducible; locate the working_dir entry in breast-cancer-virchow2.yaml and
change the URL accordingly (or switch to a fixed artifact registry URL) to pin
the runtime code to a single immutable revision.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc403cd6-5ba6-429b-93ec-d2f71ad8c359
📒 Files selected for processing (2)
helm/rayservice/applications/breast-cancer-virchow2.yamlmodels/breast_cancer_virchow2.py
🚧 Files skipped from review as they are similar to previous changes (1)
- models/breast_cancer_virchow2.py
| virchow2 = timm.create_model( | ||
| "hf-hub:paige-ai/Virchow2", | ||
| pretrained=False, | ||
| num_classes=0, | ||
| mlp_layer=SwiGLUPacked, | ||
| act_layer=torch.nn.SiLU, | ||
| ) |
There was a problem hiding this comment.
Why are you creating full virchow?
There was a problem hiding this comment.
The internal Virchow2 predict method expects a preprocessed torch.Tensor, because the transform is only applied in the HTTP root endpoint. Since breast-cancer-virchow2 calls foundation_model.predict.remote(...) directly, it must apply the same transform before calling Virchow2, so I use this to get the exact same transformation
| return embedding.squeeze(0).cpu().numpy().astype(np.float32, copy=False) | ||
|
|
||
| @serve.batch | ||
| async def predict( |
There was a problem hiding this comment.
The predict method should take only a single sample and pass it to virchow for embeddings. After that you should pass it to a helper method that accepts batches and takes the embeddings and applies the head to it. Also it might be faster to use CPU than GPU if the head is only a single linear layer. But you should test that
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
models/breast_cancer_virchow2.py (1)
164-171:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn a 400 error for malformed payloads instead of bubbling 500s.
models/breast_cancer_virchow2.pyroot()currently callsself.lz4.decompress(...)and thennp.frombuffer(...).reshape(...)(lines 165-171) with no error handling; malformed client payloads will raise and return 500. Translate decompression/reshape failures into a400 Bad Request(catch the LZ4 decompression/frame error as well asValueErrorfromreshape).Suggested fix
-from fastapi import FastAPI, Request +from fastapi import FastAPI, HTTPException, Request ... `@fastapi.post`("/") async def root(self, request: Request) -> list[list[float]]: - data = await asyncio.to_thread(self.lz4.decompress, await request.body()) - - tile = np.frombuffer(data, dtype=np.uint8).reshape( - self.tile_size, - self.tile_size, - 3, - ) + try: + data = await asyncio.to_thread(self.lz4.decompress, await request.body()) + tile = np.frombuffer(data, dtype=np.uint8).reshape( + self.tile_size, + self.tile_size, + 3, + ) + except (RuntimeError, ValueError) as exc: + raise HTTPException(status_code=400, detail="Invalid LZ4 tile payload") from exc🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/breast_cancer_virchow2.py` around lines 164 - 171, The root handler currently calls self.lz4.decompress(...) and np.frombuffer(...).reshape(...) without handling bad input, causing 500s for malformed payloads; wrap the decompression (self.lz4.decompress) and the reshape (np.frombuffer(...).reshape) in a try/except that catches the lz4 decompression/frame error (catch the specific lz4 exception type if available, otherwise a broad exception around decompress) and ValueError from reshape, and return or raise a 400 Bad Request (e.g., raise fastapi.HTTPException(status_code=400, detail="malformed payload")) from the root function to translate decompression/reshape failures into HTTP 400 responses.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@models/breast_cancer_virchow2.py`:
- Around line 70-79: The code currently picks model_path = candidates[0] from
downloaded_path.rglob("model.onnx") which is nondeterministic and silently
accepts duplicates; change the logic around candidates to (1) convert rglob to a
list called candidates, (2) if not candidates raise the existing
FileNotFoundError, (3) if len(candidates) > 1 raise a clear error (e.g.,
RuntimeError or ValueError) that lists the duplicate paths to fail fast and
force the caller to resolve ambiguous artifacts, and (4) only then set
model_path to the single candidate; reference the variables downloaded_path,
candidates, rglob, and model_path when making the change.
---
Outside diff comments:
In `@models/breast_cancer_virchow2.py`:
- Around line 164-171: The root handler currently calls self.lz4.decompress(...)
and np.frombuffer(...).reshape(...) without handling bad input, causing 500s for
malformed payloads; wrap the decompression (self.lz4.decompress) and the reshape
(np.frombuffer(...).reshape) in a try/except that catches the lz4
decompression/frame error (catch the specific lz4 exception type if available,
otherwise a broad exception around decompress) and ValueError from reshape, and
return or raise a 400 Bad Request (e.g., raise
fastapi.HTTPException(status_code=400, detail="malformed payload")) from the
root function to translate decompression/reshape failures into HTTP 400
responses.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: db809bc8-035f-45e2-8f1c-4670b896c445
📒 Files selected for processing (1)
models/breast_cancer_virchow2.py
Deployment of breast cancer binary predictor using Virchow2 embeddings
Summary by CodeRabbit
New Features
Chores