Feature/breast cancer virchow by kahood23 · Pull Request #12 · RationAI/model-service

kahood23 · 2026-06-02T18:43:27Z

Deployment of breast cancer binary predictor using Virchow2 embeddings

Summary by CodeRabbit

New Features
- Added a new breast-cancer-virchow2 inference service with an HTTP POST endpoint that accepts compressed image tiles and returns a 1×1 probability map per tile.
- Uses Virchow2 foundation model embeddings plus a local model for per-tile prediction with GPU-accelerated batching and autoscaling.
Chores
- Registered the new application in deployment configuration for Helm/RayService.

coderabbitai · 2026-06-02T18:43:39Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a Ray Serve FastAPI deployment that accepts LZ4-compressed tiles, obtains Virchow2 embeddings via a foundation-model handle, runs batched ONNX inference to produce scalar probabilities, and supplies Helm/RayService manifests to deploy the application.

Changes

Breast Cancer Virchow2 Inference Service

Layer / File(s)	Summary
Config TypedDict and Ray Serve Deployment Structure `models/breast_cancer_virchow2.py`	`Config` TypedDict and module-level `FastAPI`; declares `BreastCancerVirchow2` Ray Serve deployment with FastAPI ingress.
Model and Transform Initialization `models/breast_cancer_virchow2.py`	`reconfigure()` binds the foundation-model handle, builds Virchow2 `timm` transform, locates `model.onnx` from provider output (raises FileNotFoundError if missing), creates ONNXRuntime CUDA session, captures tensor names, and sets batching params.
Tile Preparation for Virchow2 `models/breast_cancer_virchow2.py`	`_prepare_tile_for_virchow2()` converts CHW uint8 tile to HWC PIL image and applies the Virchow2 preprocessing transform to return a tensor.
Embedding Creation via Foundation Model `models/breast_cancer_virchow2.py`	`_create_embedding()` preprocesses a single tile, calls foundation-model `predict.remote`, normalizes embedding shapes, pools class and mean patch tokens, and returns float32 NumPy embeddings.
Batched ONNX Head Inference `models/breast_cancer_virchow2.py`	`_predict_head()` stacks embeddings into a float32 batch, runs ONNX session, and formats each scalar output as a 1×1 float32 array.
Predict Flow and HTTP Endpoint `models/breast_cancer_virchow2.py`	`predict()` coordinates embedding and head inference; `POST /` handler LZ4-decompresses request, reshapes to RGB tile using `tile_size`, transposes to CHW, calls `predict()`, and returns nested lists; exports `app = BreastCancerVirchow2.bind()`.
RayService Helm Deployment Configuration `helm/rayservice/applications/breast-cancer-virchow2.yaml`, `helm/rayservice/values.yaml`	New RayService manifest and Helm values entry for `breast-cancer-virchow2`: route `/breast-cancer-virchow2`, import path `models.breast_cancer_virchow2:app`, runtime env (GitHub ZIP, MLFLOW_TRACKING_URI, HF_HOME), `pip` deps, autoscaling (min 0/max 2, target 64), actor resources, and MLflow artifact provider settings.

Possibly related PRs

RationAI/model-service#2: Adds the Virchow2 foundation-model deployment and its provider/caching support used by this service.

Suggested reviewers

matejpekar
JakubPekar
Adames4

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble tiles with eager paws,
embeddings hum without a pause,
ONNX sings on GPU light,
a tiny model keeps watch at night.
Hooray — detection hops in sight!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Feature/breast cancer virchow' is partially related to the changeset—it mentions the application domain (breast cancer virchow), but is overly broad and lacks specificity about the primary change.	Revise the title to be more specific and descriptive of the main change, such as 'Add Virchow2-based breast cancer classification service' or 'Deploy BreastCancerVirchow2 Ray Serve model'.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/breast-cancer-virchow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a new Ray application breast-cancer-virchow2 along with its deployment configuration and Python implementation. The service decompresses input tiles, extracts embeddings using a remote Virchow2 foundation model, and runs a local ONNX model on CPU to generate predictions. Key feedback includes optimizing resource allocation by setting num_gpus to 0 since the deployment does not use a GPU, offloading CPU-bound image preprocessing to a thread pool using asyncio.to_thread to prevent blocking the event loop, and simplifying the pipeline by keeping the tile in HWC format to avoid redundant transpose operations.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

helm/rayservice/applications/breast-cancer-virchow2.yaml (1)
23-26: Monitor memory usage with the configured batch size.

The deployment allocates 12 GiB memory with max_batch_size: 128 and tile_size: 224. With batch inference of 128 tiles at 224×224 pixels, plus model weights and intermediate tensors, verify this allocation is sufficient during load testing. If OOM errors occur, consider either reducing max_batch_size or increasing the memory allocation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@helm/rayservice/applications/breast-cancer-virchow2.yaml` around lines 23 -
26, The deployment may be underprovisioned for batch inference: with
ray_actor_options.memory set to 12884901888 (12 GiB) while using max_batch_size:
128 and tile_size: 224, run load tests and monitor peak memory; if you observe
OOMs, either raise ray_actor_options.memory (increase the numeric value) or
reduce max_batch_size and/or tile_size in the model config to lower per-batch
memory usage, then re-run tests until memory headroom is acceptable.
models/breast_cancer_virchow2.py (1)
98-100: 💤 Low value

Replace debug prints with proper logging.

These print statements appear to be debug artifacts. For production deployments, use Python's logging module or Ray's logger for proper observability with log levels.
🔧 Suggested refactor
+import logging
+
+logger = logging.getLogger(__name__)
+
 # ... in reconfigure():
-        print(f"Using ONNX model path: {model_path}")
-        print(f"ONNX model exists: {model_path.exists()}")
-        print(f"ONNX model is file: {model_path.is_file()}")
+        logger.info(f"Using ONNX model path: {model_path}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/breast_cancer_virchow2.py` around lines 98 - 100, The three debug
print statements that output model_path, model_path.exists(), and
model_path.is_file() should be replaced with proper logging: import and
configure a logger (e.g., logger = logging.getLogger(__name__) or use
ray.get_logger()), then replace the prints with a single logger.debug or
logger.info call that includes model_path and the boolean checks (referencing
model_path and the surrounding function in models/breast_cancer_virchow2.py) so
observability uses log levels instead of print statements.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@helm/rayservice/applications/breast-cancer-virchow2.yaml`:
- Around line 9-14: The pip dependency list leaves several packages unpinned
(onnxruntime, lz4, timm, pillow) which risks non-reproducible deployments;
update the pip block to pin each package to a tested exact version or a tight
range (e.g., onnxruntime==<tested-version> or onnxruntime>=x.y.z,<x+1.0.0),
likewise pin lz4, timm, and pillow, and keep/adjust the existing mlflow<3.0
constraint to a specific compatible range if needed; ensure you run your
build/tests with the chosen versions to verify compatibility before committing.

---

Nitpick comments:
In `@helm/rayservice/applications/breast-cancer-virchow2.yaml`:
- Around line 23-26: The deployment may be underprovisioned for batch inference:
with ray_actor_options.memory set to 12884901888 (12 GiB) while using
max_batch_size: 128 and tile_size: 224, run load tests and monitor peak memory;
if you observe OOMs, either raise ray_actor_options.memory (increase the numeric
value) or reduce max_batch_size and/or tile_size in the model config to lower
per-batch memory usage, then re-run tests until memory headroom is acceptable.

In `@models/breast_cancer_virchow2.py`:
- Around line 98-100: The three debug print statements that output model_path,
model_path.exists(), and model_path.is_file() should be replaced with proper
logging: import and configure a logger (e.g., logger =
logging.getLogger(__name__) or use ray.get_logger()), then replace the prints
with a single logger.debug or logger.info call that includes model_path and the
boolean checks (referencing model_path and the surrounding function in
models/breast_cancer_virchow2.py) so observability uses log levels instead of
print statements.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dedc0520-29d5-4cad-a99d-d3f677396038

📥 Commits

Reviewing files that changed from the base of the PR and between 3d37a6e and 5102177.

📒 Files selected for processing (3)

helm/rayservice/applications/breast-cancer-virchow2.yaml
helm/rayservice/values.yaml
models/breast_cancer_virchow2.py

Jurgee · 2026-06-04T16:18:37Z

+        self.session = ort.InferenceSession(
+            str(model_path),
+            providers=["CPUExecutionProvider"],
+        )


Is it intended to run it only on CPUs?

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@helm/rayservice/applications/breast-cancer-virchow2.yaml`:
- Line 5: The working_dir value currently points to a moving branch archive
("working_dir:
https://github.com/RationAI/model-service/archive/refs/heads/main.zip"); update
this to an immutable revision by replacing the branch archive with a
commit-specific archive URL or a tagged release archive (e.g., use the archive
URL that includes a specific commit SHA or a release tag) so deployments are
reproducible; locate the working_dir entry in breast-cancer-virchow2.yaml and
change the URL accordingly (or switch to a fixed artifact registry URL) to pin
the runtime code to a single immutable revision.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc403cd6-5ba6-429b-93ec-d2f71ad8c359

📥 Commits

Reviewing files that changed from the base of the PR and between d35bd5b and b384791.

📒 Files selected for processing (2)

helm/rayservice/applications/breast-cancer-virchow2.yaml
models/breast_cancer_virchow2.py

🚧 Files skipped from review as they are similar to previous changes (1)

models/breast_cancer_virchow2.py

matejpekar · 2026-06-04T18:10:02Z

+        virchow2 = timm.create_model(
+            "hf-hub:paige-ai/Virchow2",
+            pretrained=False,
+            num_classes=0,
+            mlp_layer=SwiGLUPacked,
+            act_layer=torch.nn.SiLU,
+        )


Why are you creating full virchow?

The internal Virchow2 predict method expects a preprocessed torch.Tensor, because the transform is only applied in the HTTP root endpoint. Since breast-cancer-virchow2 calls foundation_model.predict.remote(...) directly, it must apply the same transform before calling Virchow2, so I use this to get the exact same transformation

matejpekar · 2026-06-04T18:18:07Z

+        return embedding.squeeze(0).cpu().numpy().astype(np.float32, copy=False)
+
+    @serve.batch
+    async def predict(


The predict method should take only a single sample and pass it to virchow for embeddings. After that you should pass it to a helper method that accepts batches and takes the embeddings and applies the head to it. Also it might be faster to use CPU than GPU if the head is only a single linear layer. But you should test that

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

models/breast_cancer_virchow2.py (1)

164-171: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return a 400 error for malformed payloads instead of bubbling 500s.

models/breast_cancer_virchow2.py root() currently calls self.lz4.decompress(...) and then np.frombuffer(...).reshape(...) (lines 165-171) with no error handling; malformed client payloads will raise and return 500. Translate decompression/reshape failures into a 400 Bad Request (catch the LZ4 decompression/frame error as well as ValueError from reshape).

Suggested fix

-from fastapi import FastAPI, Request
+from fastapi import FastAPI, HTTPException, Request
...
     `@fastapi.post`("/")
     async def root(self, request: Request) -> list[list[float]]:
-        data = await asyncio.to_thread(self.lz4.decompress, await request.body())
-
-        tile = np.frombuffer(data, dtype=np.uint8).reshape(
-            self.tile_size,
-            self.tile_size,
-            3,
-        )
+        try:
+            data = await asyncio.to_thread(self.lz4.decompress, await request.body())
+            tile = np.frombuffer(data, dtype=np.uint8).reshape(
+                self.tile_size,
+                self.tile_size,
+                3,
+            )
+        except (RuntimeError, ValueError) as exc:
+            raise HTTPException(status_code=400, detail="Invalid LZ4 tile payload") from exc

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/breast_cancer_virchow2.py` around lines 164 - 171, The root handler
currently calls self.lz4.decompress(...) and np.frombuffer(...).reshape(...)
without handling bad input, causing 500s for malformed payloads; wrap the
decompression (self.lz4.decompress) and the reshape (np.frombuffer(...).reshape)
in a try/except that catches the lz4 decompression/frame error (catch the
specific lz4 exception type if available, otherwise a broad exception around
decompress) and ValueError from reshape, and return or raise a 400 Bad Request
(e.g., raise fastapi.HTTPException(status_code=400, detail="malformed payload"))
from the root function to translate decompression/reshape failures into HTTP 400
responses.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/breast_cancer_virchow2.py`:
- Around line 70-79: The code currently picks model_path = candidates[0] from
downloaded_path.rglob("model.onnx") which is nondeterministic and silently
accepts duplicates; change the logic around candidates to (1) convert rglob to a
list called candidates, (2) if not candidates raise the existing
FileNotFoundError, (3) if len(candidates) > 1 raise a clear error (e.g.,
RuntimeError or ValueError) that lists the duplicate paths to fail fast and
force the caller to resolve ambiguous artifacts, and (4) only then set
model_path to the single candidate; reference the variables downloaded_path,
candidates, rglob, and model_path when making the change.

---

Outside diff comments:
In `@models/breast_cancer_virchow2.py`:
- Around line 164-171: The root handler currently calls self.lz4.decompress(...)
and np.frombuffer(...).reshape(...) without handling bad input, causing 500s for
malformed payloads; wrap the decompression (self.lz4.decompress) and the reshape
(np.frombuffer(...).reshape) in a try/except that catches the lz4
decompression/frame error (catch the specific lz4 exception type if available,
otherwise a broad exception around decompress) and ValueError from reshape, and
return or raise a 400 Bad Request (e.g., raise
fastapi.HTTPException(status_code=400, detail="malformed payload")) from the
root function to translate decompression/reshape failures into HTTP 400
responses.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db809bc8-035f-45e2-8f1c-4670b896c445

📥 Commits

Reviewing files that changed from the base of the PR and between b384791 and ba2eb95.

📒 Files selected for processing (1)

models/breast_cancer_virchow2.py

kahood23 added 8 commits June 2, 2026 08:46

feat: breast cancer virchow model added

0246465

feat: update bc model output from predict

5b5e286

fix: values format

b97d7ef

fix: pip dependencies

8b57b68

fix: load onnx as a whole folder

bf28420

fix: imports placement

064373f

fix: model path

f3375e1

feat: full sha

5102177

kahood23 requested review from Jurgee and matejpekar June 2, 2026 18:43

kahood23 self-assigned this Jun 2, 2026

kahood23 requested review from a team and ejdam87 June 2, 2026 18:43

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml

Comment thread models/breast_cancer_virchow2.py Outdated

Comment thread models/breast_cancer_virchow2.py

Comment thread models/breast_cancer_virchow2.py

fix: formatting and typing

71df259

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml

kahood23 added 4 commits June 3, 2026 06:13

fix: num gpus set to 0

9beddce

feat: updated sha

d4604cc

feat: return num gpu to 1

cc4092f

fix: update sha

378f605

Jurgee requested changes Jun 3, 2026

View reviewed changes

Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml Outdated

Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml Outdated

kahood23 added 5 commits June 3, 2026 07:00

feat: try to fasten heatmap builder after faster single tile prediction

3ea57f1

feat: update sha

a1ee087

feat: point heatmap and virchow to branch

d35bd5b

feat: point working dir to main

6b5b6c0

fix: revert changes in heatmap builder

9a88352

kahood23 requested review from Jurgee and removed request for ejdam87 June 4, 2026 11:37

Jurgee requested changes Jun 4, 2026

View reviewed changes

fix: review changes, move to gpu and remove min replicas

b384791

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml

fix: cuda instead of gpu naming inconsistency

3f89133

matejpekar requested changes Jun 4, 2026

View reviewed changes

matejpekar mentioned this pull request Jun 4, 2026

feat: 7-class tissue classifier on Virchow2 embeddings #13

Open

feat: batching update

ba2eb95

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread models/breast_cancer_virchow2.py

kahood23 requested a review from matejpekar June 5, 2026 08:33

Conversation

kahood23 commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jurgee Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

matejpekar Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

kahood23 Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

matejpekar Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kahood23 commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading