Skip to content

Feature/breast cancer virchow#12

Open
kahood23 wants to merge 21 commits into
mainfrom
feature/breast-cancer-virchow
Open

Feature/breast cancer virchow#12
kahood23 wants to merge 21 commits into
mainfrom
feature/breast-cancer-virchow

Conversation

@kahood23

@kahood23 kahood23 commented Jun 2, 2026

Copy link
Copy Markdown

Deployment of breast cancer binary predictor using Virchow2 embeddings

Summary by CodeRabbit

  • New Features

    • Added a new breast-cancer-virchow2 inference service with an HTTP POST endpoint that accepts compressed image tiles and returns a 1×1 probability map per tile.
    • Uses Virchow2 foundation model embeddings plus a local model for per-tile prediction with GPU-accelerated batching and autoscaling.
  • Chores

    • Registered the new application in deployment configuration for Helm/RayService.

@kahood23 kahood23 requested review from Jurgee and matejpekar June 2, 2026 18:43
@kahood23 kahood23 self-assigned this Jun 2, 2026
@kahood23 kahood23 requested review from a team and ejdam87 June 2, 2026 18:43
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Ray Serve FastAPI deployment that accepts LZ4-compressed tiles, obtains Virchow2 embeddings via a foundation-model handle, runs batched ONNX inference to produce scalar probabilities, and supplies Helm/RayService manifests to deploy the application.

Changes

Breast Cancer Virchow2 Inference Service

Layer / File(s) Summary
Config TypedDict and Ray Serve Deployment Structure
models/breast_cancer_virchow2.py
Config TypedDict and module-level FastAPI; declares BreastCancerVirchow2 Ray Serve deployment with FastAPI ingress.
Model and Transform Initialization
models/breast_cancer_virchow2.py
reconfigure() binds the foundation-model handle, builds Virchow2 timm transform, locates model.onnx from provider output (raises FileNotFoundError if missing), creates ONNXRuntime CUDA session, captures tensor names, and sets batching params.
Tile Preparation for Virchow2
models/breast_cancer_virchow2.py
_prepare_tile_for_virchow2() converts CHW uint8 tile to HWC PIL image and applies the Virchow2 preprocessing transform to return a tensor.
Embedding Creation via Foundation Model
models/breast_cancer_virchow2.py
_create_embedding() preprocesses a single tile, calls foundation-model predict.remote, normalizes embedding shapes, pools class and mean patch tokens, and returns float32 NumPy embeddings.
Batched ONNX Head Inference
models/breast_cancer_virchow2.py
_predict_head() stacks embeddings into a float32 batch, runs ONNX session, and formats each scalar output as a 1×1 float32 array.
Predict Flow and HTTP Endpoint
models/breast_cancer_virchow2.py
predict() coordinates embedding and head inference; POST / handler LZ4-decompresses request, reshapes to RGB tile using tile_size, transposes to CHW, calls predict(), and returns nested lists; exports app = BreastCancerVirchow2.bind().
RayService Helm Deployment Configuration
helm/rayservice/applications/breast-cancer-virchow2.yaml, helm/rayservice/values.yaml
New RayService manifest and Helm values entry for breast-cancer-virchow2: route /breast-cancer-virchow2, import path models.breast_cancer_virchow2:app, runtime env (GitHub ZIP, MLFLOW_TRACKING_URI, HF_HOME), pip deps, autoscaling (min 0/max 2, target 64), actor resources, and MLflow artifact provider settings.

Possibly related PRs

  • RationAI/model-service#2: Adds the Virchow2 foundation-model deployment and its provider/caching support used by this service.

Suggested reviewers

  • matejpekar
  • JakubPekar
  • Adames4

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble tiles with eager paws,
embeddings hum without a pause,
ONNX sings on GPU light,
a tiny model keeps watch at night.
Hooray — detection hops in sight!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Feature/breast cancer virchow' is partially related to the changeset—it mentions the application domain (breast cancer virchow), but is overly broad and lacks specificity about the primary change. Revise the title to be more specific and descriptive of the main change, such as 'Add Virchow2-based breast cancer classification service' or 'Deploy BreastCancerVirchow2 Ray Serve model'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/breast-cancer-virchow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Ray application breast-cancer-virchow2 along with its deployment configuration and Python implementation. The service decompresses input tiles, extracts embeddings using a remote Virchow2 foundation model, and runs a local ONNX model on CPU to generate predictions. Key feedback includes optimizing resource allocation by setting num_gpus to 0 since the deployment does not use a GPU, offloading CPU-bound image preprocessing to a thread pool using asyncio.to_thread to prevent blocking the event loop, and simplifying the pipeline by keeping the tile in HWC format to avoid redundant transpose operations.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml
Comment thread models/breast_cancer_virchow2.py Outdated
Comment thread models/breast_cancer_virchow2.py
Comment thread models/breast_cancer_virchow2.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
helm/rayservice/applications/breast-cancer-virchow2.yaml (1)

23-26: Monitor memory usage with the configured batch size.

The deployment allocates 12 GiB memory with max_batch_size: 128 and tile_size: 224. With batch inference of 128 tiles at 224×224 pixels, plus model weights and intermediate tensors, verify this allocation is sufficient during load testing. If OOM errors occur, consider either reducing max_batch_size or increasing the memory allocation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@helm/rayservice/applications/breast-cancer-virchow2.yaml` around lines 23 -
26, The deployment may be underprovisioned for batch inference: with
ray_actor_options.memory set to 12884901888 (12 GiB) while using max_batch_size:
128 and tile_size: 224, run load tests and monitor peak memory; if you observe
OOMs, either raise ray_actor_options.memory (increase the numeric value) or
reduce max_batch_size and/or tile_size in the model config to lower per-batch
memory usage, then re-run tests until memory headroom is acceptable.
models/breast_cancer_virchow2.py (1)

98-100: 💤 Low value

Replace debug prints with proper logging.

These print statements appear to be debug artifacts. For production deployments, use Python's logging module or Ray's logger for proper observability with log levels.

🔧 Suggested refactor
+import logging
+
+logger = logging.getLogger(__name__)
+
 # ... in reconfigure():
-        print(f"Using ONNX model path: {model_path}")
-        print(f"ONNX model exists: {model_path.exists()}")
-        print(f"ONNX model is file: {model_path.is_file()}")
+        logger.info(f"Using ONNX model path: {model_path}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/breast_cancer_virchow2.py` around lines 98 - 100, The three debug
print statements that output model_path, model_path.exists(), and
model_path.is_file() should be replaced with proper logging: import and
configure a logger (e.g., logger = logging.getLogger(__name__) or use
ray.get_logger()), then replace the prints with a single logger.debug or
logger.info call that includes model_path and the boolean checks (referencing
model_path and the surrounding function in models/breast_cancer_virchow2.py) so
observability uses log levels instead of print statements.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@helm/rayservice/applications/breast-cancer-virchow2.yaml`:
- Around line 9-14: The pip dependency list leaves several packages unpinned
(onnxruntime, lz4, timm, pillow) which risks non-reproducible deployments;
update the pip block to pin each package to a tested exact version or a tight
range (e.g., onnxruntime==<tested-version> or onnxruntime>=x.y.z,<x+1.0.0),
likewise pin lz4, timm, and pillow, and keep/adjust the existing mlflow<3.0
constraint to a specific compatible range if needed; ensure you run your
build/tests with the chosen versions to verify compatibility before committing.

---

Nitpick comments:
In `@helm/rayservice/applications/breast-cancer-virchow2.yaml`:
- Around line 23-26: The deployment may be underprovisioned for batch inference:
with ray_actor_options.memory set to 12884901888 (12 GiB) while using
max_batch_size: 128 and tile_size: 224, run load tests and monitor peak memory;
if you observe OOMs, either raise ray_actor_options.memory (increase the numeric
value) or reduce max_batch_size and/or tile_size in the model config to lower
per-batch memory usage, then re-run tests until memory headroom is acceptable.

In `@models/breast_cancer_virchow2.py`:
- Around line 98-100: The three debug print statements that output model_path,
model_path.exists(), and model_path.is_file() should be replaced with proper
logging: import and configure a logger (e.g., logger =
logging.getLogger(__name__) or use ray.get_logger()), then replace the prints
with a single logger.debug or logger.info call that includes model_path and the
boolean checks (referencing model_path and the surrounding function in
models/breast_cancer_virchow2.py) so observability uses log levels instead of
print statements.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dedc0520-29d5-4cad-a99d-d3f677396038

📥 Commits

Reviewing files that changed from the base of the PR and between 3d37a6e and 5102177.

📒 Files selected for processing (3)
  • helm/rayservice/applications/breast-cancer-virchow2.yaml
  • helm/rayservice/values.yaml
  • models/breast_cancer_virchow2.py

Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml
Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml Outdated
Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml Outdated
@kahood23 kahood23 requested review from Jurgee and removed request for ejdam87 June 4, 2026 11:37
Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml Outdated
Comment on lines +102 to +105
self.session = ort.InferenceSession(
str(model_path),
providers=["CPUExecutionProvider"],
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended to run it only on CPUs?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@helm/rayservice/applications/breast-cancer-virchow2.yaml`:
- Line 5: The working_dir value currently points to a moving branch archive
("working_dir:
https://github.com/RationAI/model-service/archive/refs/heads/main.zip"); update
this to an immutable revision by replacing the branch archive with a
commit-specific archive URL or a tagged release archive (e.g., use the archive
URL that includes a specific commit SHA or a release tag) so deployments are
reproducible; locate the working_dir entry in breast-cancer-virchow2.yaml and
change the URL accordingly (or switch to a fixed artifact registry URL) to pin
the runtime code to a single immutable revision.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc403cd6-5ba6-429b-93ec-d2f71ad8c359

📥 Commits

Reviewing files that changed from the base of the PR and between d35bd5b and b384791.

📒 Files selected for processing (2)
  • helm/rayservice/applications/breast-cancer-virchow2.yaml
  • models/breast_cancer_virchow2.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • models/breast_cancer_virchow2.py

Comment thread helm/rayservice/applications/breast-cancer-virchow2.yaml
Comment on lines +52 to +58
virchow2 = timm.create_model(
"hf-hub:paige-ai/Virchow2",
pretrained=False,
num_classes=0,
mlp_layer=SwiGLUPacked,
act_layer=torch.nn.SiLU,
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you creating full virchow?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal Virchow2 predict method expects a preprocessed torch.Tensor, because the transform is only applied in the HTTP root endpoint. Since breast-cancer-virchow2 calls foundation_model.predict.remote(...) directly, it must apply the same transform before calling Virchow2, so I use this to get the exact same transformation

Comment thread models/breast_cancer_virchow2.py Outdated
Comment thread models/breast_cancer_virchow2.py Outdated
return embedding.squeeze(0).cpu().numpy().astype(np.float32, copy=False)

@serve.batch
async def predict(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The predict method should take only a single sample and pass it to virchow for embeddings. After that you should pass it to a helper method that accepts batches and takes the embeddings and applies the head to it. Also it might be faster to use CPU than GPU if the head is only a single linear layer. But you should test that

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
models/breast_cancer_virchow2.py (1)

164-171: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return a 400 error for malformed payloads instead of bubbling 500s.

models/breast_cancer_virchow2.py root() currently calls self.lz4.decompress(...) and then np.frombuffer(...).reshape(...) (lines 165-171) with no error handling; malformed client payloads will raise and return 500. Translate decompression/reshape failures into a 400 Bad Request (catch the LZ4 decompression/frame error as well as ValueError from reshape).

Suggested fix
-from fastapi import FastAPI, Request
+from fastapi import FastAPI, HTTPException, Request
...
     `@fastapi.post`("/")
     async def root(self, request: Request) -> list[list[float]]:
-        data = await asyncio.to_thread(self.lz4.decompress, await request.body())
-
-        tile = np.frombuffer(data, dtype=np.uint8).reshape(
-            self.tile_size,
-            self.tile_size,
-            3,
-        )
+        try:
+            data = await asyncio.to_thread(self.lz4.decompress, await request.body())
+            tile = np.frombuffer(data, dtype=np.uint8).reshape(
+                self.tile_size,
+                self.tile_size,
+                3,
+            )
+        except (RuntimeError, ValueError) as exc:
+            raise HTTPException(status_code=400, detail="Invalid LZ4 tile payload") from exc
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/breast_cancer_virchow2.py` around lines 164 - 171, The root handler
currently calls self.lz4.decompress(...) and np.frombuffer(...).reshape(...)
without handling bad input, causing 500s for malformed payloads; wrap the
decompression (self.lz4.decompress) and the reshape (np.frombuffer(...).reshape)
in a try/except that catches the lz4 decompression/frame error (catch the
specific lz4 exception type if available, otherwise a broad exception around
decompress) and ValueError from reshape, and return or raise a 400 Bad Request
(e.g., raise fastapi.HTTPException(status_code=400, detail="malformed payload"))
from the root function to translate decompression/reshape failures into HTTP 400
responses.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/breast_cancer_virchow2.py`:
- Around line 70-79: The code currently picks model_path = candidates[0] from
downloaded_path.rglob("model.onnx") which is nondeterministic and silently
accepts duplicates; change the logic around candidates to (1) convert rglob to a
list called candidates, (2) if not candidates raise the existing
FileNotFoundError, (3) if len(candidates) > 1 raise a clear error (e.g.,
RuntimeError or ValueError) that lists the duplicate paths to fail fast and
force the caller to resolve ambiguous artifacts, and (4) only then set
model_path to the single candidate; reference the variables downloaded_path,
candidates, rglob, and model_path when making the change.

---

Outside diff comments:
In `@models/breast_cancer_virchow2.py`:
- Around line 164-171: The root handler currently calls self.lz4.decompress(...)
and np.frombuffer(...).reshape(...) without handling bad input, causing 500s for
malformed payloads; wrap the decompression (self.lz4.decompress) and the reshape
(np.frombuffer(...).reshape) in a try/except that catches the lz4
decompression/frame error (catch the specific lz4 exception type if available,
otherwise a broad exception around decompress) and ValueError from reshape, and
return or raise a 400 Bad Request (e.g., raise
fastapi.HTTPException(status_code=400, detail="malformed payload")) from the
root function to translate decompression/reshape failures into HTTP 400
responses.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db809bc8-035f-45e2-8f1c-4670b896c445

📥 Commits

Reviewing files that changed from the base of the PR and between b384791 and ba2eb95.

📒 Files selected for processing (1)
  • models/breast_cancer_virchow2.py

Comment thread models/breast_cancer_virchow2.py
@kahood23 kahood23 requested a review from matejpekar June 5, 2026 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants