Feature/breast cancer grading by melovskak · Pull Request #14 · RationAI/model-service

melovskak · 2026-06-05T17:38:34Z

Summary by CodeRabbit

New Features
- New breast cancer grading service available at /breast-grading-virchow2
- Tile-based medical image analysis with configurable tile size, channels, batching, and latency controls
- GPU-accelerated inference with autoscaling for optimized performance and resource use
- Model artifacts managed via MLflow-compatible storage for centralized model deployment

coderabbitai · 2026-06-05T17:39:09Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97f243da-c947-474d-9c96-b7b7ad1bb389

📥 Commits

Reviewing files that changed from the base of the PR and between 75a89cf and 2f0cf79.

📒 Files selected for processing (2)

helm/rayservice/applications/breast-grading-virchow2.yaml
models/breast_grading_virchow2.py

🚧 Files skipped from review as they are similar to previous changes (2)

helm/rayservice/applications/breast-grading-virchow2.yaml
models/breast_grading_virchow2.py

📝 Walkthrough


This PR adds a Ray Serve + FastAPI deployment that creates Virchow2 embeddings for input tiles and runs an ONNX linear head for breast cancer grading, plus Helm RayService manifests registering and configuring the deployment.
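A linear head maps the pooled embedding to one logit per class with a single affine transform. The dependency-free sketch below assumes a 4-class head, as described in the Gemini summary, and uses made-up weights. `linear_head` and `softmax` are illustrative names, not functions from the PR, whose head is an ONNX graph. The endpoint appears to return raw logits (`# returns 4 raw logits per tile`), so softmax is something a caller would apply if it wants probabilities.

```python
import math


def linear_head(embedding, weights, bias):
    """Return one logit per class: logits[k] = sum_i weights[k][i] * embedding[i] + bias[k]."""
    return [sum(w * x for w, x in zip(row, embedding)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 2-dimensional embedding and a hypothetical 4-class head (values are made up).
embedding = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0, 0.5]

logits = linear_head(embedding, weights, bias)  # raw scores, one per grade
probs = softmax(logits)                         # optional, caller-side normalisation
```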

Changes

Breast-Grading-Virchow2 Deployment

Layer	File(s)	Summary
Contracts and configuration schema	`models/breast_grading_virchow2.py`	Module imports, `Config` TypedDict with tile sizing, scaling, batching, and model fields; FastAPI app instance and Ray Serve deployment class declaration.
Initialization and resource acquisition	`models/breast_grading_virchow2.py`	`reconfigure()` loads LZ4, builds Virchow2 transform, acquires remote Virchow2 handle, resolves `_target_` provider to download `model.onnx`, creates ONNX Runtime CUDA session, and configures batch size/timeouts.
Tile preprocessing, embedding, and batched head inference	`models/breast_grading_virchow2.py`	`_prepare_tile_for_virchow2()` converts CHW→HWC and applies transform; `_create_embedding()` calls remote Virchow2 service and pools class + mean patch tokens; `_predict_head()` (`@serve.batch`) stacks embeddings, runs ONNX session, and reshapes logits; `predict()` wires single-tile flow.
API endpoint and response handling	`models/breast_grading_virchow2.py`	FastAPI `POST /` handler decompresses LZ4 body, validates and reconstructs `(tile_size, tile_size, 3)` uint8 tile, transposes to CHW, invokes `predict()`, returns `result.tolist()`, and binds deployment via `app = BreastCancerGradingVirchow2.bind()`.
Helm RayService deployment configuration	`helm/rayservice/applications/breast-grading-virchow2.yaml`, `helm/rayservice/values.yaml`	New Helm manifest and values entry registering `breast-grading-virchow2` with route `/breast-grading-virchow2`, runtime download URL, env/pip deps, Ray actor CPU/GPU/memory, autoscaling, and MLflow model artifact URI.
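The pooling step in `_create_embedding()` ("class + mean patch tokens") can be sketched without dependencies. This assumes the token layout documented on the public Virchow2 model card (one class token, four register tokens, then patch tokens) and that the PR follows it; `pool_tokens` and `num_register_tokens` are illustrative names, not the PR's code.

```python
def pool_tokens(tokens, num_register_tokens=4):
    """Concatenate the class token with the mean of the patch tokens.

    Assumed layout: tokens[0] is the class token, the next
    `num_register_tokens` entries are register tokens (skipped),
    and all remaining entries are patch tokens.
    """
    class_token = tokens[0]
    patch_tokens = tokens[1 + num_register_tokens:]
    if not patch_tokens:
        raise ValueError("no patch tokens to average")
    dim = len(class_token)
    # Element-wise mean over all patch tokens.
    mean_patch = [sum(tok[i] for tok in patch_tokens) / len(patch_tokens) for i in range(dim)]
    # The embedding is twice the token width: [class_token, mean_patch].
    return list(class_token) + mean_patch


# 1 class token, 4 register tokens, 2 patch tokens, each of width 2.
tokens = [[1.0, 2.0], [9.0, 9.0], [9.0, 9.0], [9.0, 9.0], [9.0, 9.0], [2.0, 4.0], [4.0, 8.0]]
embedding = pool_tokens(tokens)
```

With Virchow2's 1280-wide tokens, this concatenation yields a 2560-wide embedding, which is presumably what the ONNX head consumes.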

Sequence Diagram

```mermaid
sequenceDiagram
  participant Client as Client
  participant API as FastAPI Endpoint
  participant Deploy as BreastCancerGradingVirchow2
  participant Virchow2 as Virchow2 Service
  participant ONNX as ONNX Runtime

  Client->>API: POST / (LZ4-compressed tile)
  API->>API: Decompress LZ4 bytes
  API->>API: Reconstruct tile (tile_size, tile_size, 3)
  API->>Deploy: predict(tile_uint8)
  Deploy->>Deploy: _prepare_tile_for_virchow2 (CHW→HWC, apply transform)
  Deploy->>Virchow2: request embeddings (prepared tensor)
  Virchow2->>Deploy: return token embeddings
  Deploy->>Deploy: pool class + mean patch tokens -> embedding
  Deploy->>ONNX: _predict_head (batch embeddings) -> InferenceSession.run
  ONNX->>Deploy: return logits
  Deploy->>API: return result.tolist()
  API->>Client: 200 OK (result)
```
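The layout conversions in the diagram (the handler's HWC→CHW transpose and the CHW→HWC step in `_prepare_tile_for_virchow2`) are pure reindexing. Below is a stdlib-only sketch of the first one applied to the decompressed byte buffer, including the size check the reviewers asked for. `hwc_bytes_to_chw_planes` is a hypothetical helper; the PR does this with NumPy `reshape`/transpose.

```python
def hwc_bytes_to_chw_planes(data: bytes, height: int, width: int, channels: int = 3) -> list[bytes]:
    """Split an interleaved HWC uint8 buffer into one row-major plane per channel."""
    expected = height * width * channels
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    # In HWC order, channel c of pixel p sits at index p * channels + c,
    # so a strided slice starting at c collects that channel for every pixel.
    return [data[c::channels] for c in range(channels)]


# A 1x2 RGB tile: pixel 0 = (1, 2, 3), pixel 1 = (4, 5, 6).
planes = hwc_bytes_to_chw_planes(bytes([1, 2, 3, 4, 5, 6]), height=1, width=2)
```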

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

RationAI/model-service#2: BreastCancerGradingVirchow2 depends on the Virchow2 Ray Serve deployment introduced in that PR for remote embedding generation via serve.get_app_handle().

Suggested reviewers

JakubPekar
ejdam87

Poem

🐰 I hop through tiles with curious cheer,
Virchow2 whispers embeddings near,
ONNX hums a verdict bright,
Tiles to grades in morning light,
A rabbit cheers this model's flight.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Feature/breast cancer grading' is vague and generic, using the prefix 'Feature/' which is a branch naming convention rather than a descriptive PR title. It lacks specificity about what is being added or implemented.	Use a more descriptive title like 'Add breast cancer grading service with Virchow2 integration' that clearly explains the main change being introduced.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


gemini-code-assist

Code Review

This pull request introduces a new Ray Serve deployment, BreastCancerGradingVirchow2, which processes image tiles, extracts embeddings using a Virchow2 foundation model, and evaluates them with a 4-class linear head ONNX model. Feedback on the implementation highlights several key areas for improvement: running the synchronous ONNX session in a separate thread to prevent blocking the event loop, adding a CPU fallback provider for the ONNX session, performing array operations directly in NumPy to avoid unnecessary PyTorch overhead, and correcting the return type annotation of the root endpoint to match its actual 3D structure.
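For the CPU-fallback point, ONNX Runtime's `InferenceSession` accepts an ordered `providers` list, and `onnxruntime.get_available_providers()` reports what the installed build supports. The sketch below only builds the ordered list, so it runs without ONNX Runtime installed. `choose_providers` is a hypothetical helper, and the commented lines show how it might be wired into `reconfigure()`.

```python
PREFERRED_PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available, preferred=PREFERRED_PROVIDERS):
    """Keep the preferred providers this build supports, in priority order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"none of {preferred} available; got {list(available)}")
    return chosen


# Hypothetical wiring inside reconfigure():
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers())
#   self.session = ort.InferenceSession(model_path, providers=providers)
```

One trade-off: on a GPU-provisioned Ray actor, a silent CPU fallback can mask a broken CUDA setup, so logging the chosen providers is worth considering.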

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@helm/rayservice/applications/breast-grading-virchow2.yaml`:
- Line 5: The working_dir in the RayService manifest is pointing to the feature
branch archive; update the working_dir value to use the moving main archive URL
so the deployed image always references main.zip (replace the current
https://github.com/RationAI/model-service/archive/refs/heads/feature/breast-cancer-grading.zip
with https://github.com/RationAI/model-service/archive/refs/heads/main.zip in
the working_dir field of the manifest).

In `@models/breast_grading_virchow2.py`:
- Around line 82-85: The ONNX session is created with only CUDAExecutionProvider
which can fail startup; update the session creation in the model initializer
(self.session = ort.InferenceSession(...)) to include a CPU fallback provider
list (e.g., ["CUDAExecutionProvider","CPUExecutionProvider"]) so the runtime can
start if CUDA initialization fails. Also, in the root() function where you
decompress and reshape tiles, add a sanity check that the decompressed byte
length equals tile_size * tile_size * 3 before calling reshape and raise or
handle a clear error if it doesn't to avoid malformed/oversized inputs causing
allocation/reshape failures.
- Around line 155-164: In the root() handler, validate and guard the LZ4 payload
before blindly decompressing and reshaping: store the result of await request.body()
in a variable, then try to decompress with a bounded API if available (e.g.,
lz4.frame.LZ4FrameDecompressor.decompress with max_length, since lz4.frame.decompress
itself takes no output limit) or call self.lz4.decompress inside
a try/except; after decompression assert len(decompressed) == self.tile_size *
self.tile_size * 3 before np.frombuffer/reshape, and if the length is wrong or
decompress/reshape raises, return a 400 Bad Request (catch exceptions around
self.lz4.decompress and the np.frombuffer(...).reshape call and map them to a
400 response) so malformed/oversized payloads don’t cause 5xx errors.
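The validation these comments ask for can be sketched with the standard library. The decompressor is passed in as a callable, and `zlib` stands in for LZ4 only so the sketch runs without third-party packages. `BadPayload` and `decode_tile_payload` are hypothetical names; in the FastAPI handler, the error would be mapped with `raise HTTPException(status_code=400, detail=str(err)) from err`.

```python
import zlib
from typing import Callable


class BadPayload(ValueError):
    """Client-side payload problem; a handler would map it to HTTP 400."""


def decode_tile_payload(body: bytes, tile_size: int, decompress: Callable[[bytes], bytes]) -> bytes:
    """Decompress `body` and check it holds exactly one tile_size x tile_size x 3 uint8 tile."""
    expected = tile_size * tile_size * 3
    try:
        data = decompress(body)
    except Exception as err:  # decompressors raise library-specific error types
        raise BadPayload(f"could not decompress payload: {err}") from err
    if len(data) != expected:
        raise BadPayload(f"expected {expected} bytes after decompression, got {len(data)}")
    return data


# zlib stands in for LZ4 here; a 2x2 RGB tile is 12 bytes.
tile_bytes = decode_tile_payload(zlib.compress(bytes(12)), tile_size=2, decompress=zlib.decompress)
```

Checking the length after decompression does not bound memory use; capping the output requires a streaming decompressor that accepts an output limit.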


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 635b4c34-793c-42f7-a6a6-0e1eb944e941

📥 Commits

Reviewing files that changed from the base of the PR and between 3d37a6e and 75a89cf.

📒 Files selected for processing (3)

helm/rayservice/applications/breast-grading-virchow2.yaml
helm/rayservice/values.yaml
models/breast_grading_virchow2.py

Jurgee · 2026-06-09T18:32:32Z

```diff
+        # Capture raw incoming network request body bytes
+        body_bytes = await request.body()
+
+        try:
+            # 1. Unzip raw compressed image tile bytes asynchronously in a thread worker
+            data = await asyncio.to_thread(self.lz4.decompress, body_bytes)
+
+            # 2. Size check
+            expected_bytes = self.tile_size * self.tile_size * 3
+            if len(data) != expected_bytes:
+                raise ValueError(
+                    f"Decompressed payload byte length mismatch. "
+                    f"Expected exactly {expected_bytes} bytes, but got {len(data)}."
+                )
+
+            # 3. Reconstruct the raw pixel array
+            tile = np.frombuffer(data, dtype=np.uint8).reshape(
+                self.tile_size,
+                self.tile_size,
+                3,
+            )
+        except (RuntimeError, ValueError) as err:
+            # 4. Gracefully map decompression or reshape shape errors to a clean HTTP 400
+            raise HTTPException(
+                status_code=400,
+                detail=f"Malformed or invalid compressed tile image payload: {err!s}",
+            ) from err
```


Remove try-except block

Suggested change

```diff
-        # Capture raw incoming network request body bytes
-        body_bytes = await request.body()
-
-        try:
-            # 1. Unzip raw compressed image tile bytes asynchronously in a thread worker
-            data = await asyncio.to_thread(self.lz4.decompress, body_bytes)
-
-            # 2. Size check
-            expected_bytes = self.tile_size * self.tile_size * 3
-            if len(data) != expected_bytes:
-                raise ValueError(
-                    f"Decompressed payload byte length mismatch. "
-                    f"Expected exactly {expected_bytes} bytes, but got {len(data)}."
-                )
-
-            # 3. Reconstruct the raw pixel array
-            tile = np.frombuffer(data, dtype=np.uint8).reshape(
-                self.tile_size,
-                self.tile_size,
-                3,
-            )
-        except (RuntimeError, ValueError) as err:
-            # 4. Gracefully map decompression or reshape shape errors to a clean HTTP 400
-            raise HTTPException(
-                status_code=400,
-                detail=f"Malformed or invalid compressed tile image payload: {err!s}",
-            ) from err
+        data = await asyncio.to_thread(lz4.frame.decompress, await request.body())
+        image = np.frombuffer(data, dtype=np.uint8).reshape(
+            self.tile_size, self.tile_size, 3
+        )
```
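The suggestion keeps decompression off the event loop with `asyncio.to_thread`, which runs a blocking callable in a worker thread and awaits its result; Gemini's note about the synchronous ONNX `session.run` call is the same pattern. A minimal stdlib sketch: `blocking_decompress` is a stand-in that uses zlib instead of LZ4, and `handle` is a hypothetical handler, not the deployment's `root()`.

```python
import asyncio
import zlib


def blocking_decompress(body: bytes) -> bytes:
    # Stand-in for lz4.frame.decompress: synchronous and CPU-bound.
    return zlib.decompress(body)


async def handle(body: bytes) -> int:
    # Run the blocking call in a worker thread so the event loop keeps serving other requests.
    data = await asyncio.to_thread(blocking_decompress, body)
    return len(data)


payload = zlib.compress(b"\x00" * 12)
result = asyncio.run(handle(payload))
```

Without the removed try/except, a decompression or reshape error propagates out of the handler, which FastAPI generally reports as a 500 rather than a 400.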

Jurgee · 2026-06-09T18:33:41Z

```diff
+      MLFLOW_TRACKING_URI: http://mlflow-s3.rationai-mlflow
+      HF_HOME: /mnt/huggingface_cache
+    pip:
+      - timm
```


timm lib is already included in Dockerfile.gpu

Jurgee · 2026-06-09T18:36:42Z

```diff
+        return await self._predict_head(embedding)  # returns 4 raw logits per tile
+
+    @fastapi.post("/")
+    async def root(self, request: Request) -> list[list[list[float]]]:
```


Suggested change

```diff
-    async def root(self, request: Request) -> list[list[list[float]]]:
+    async def root(self, request: Request) -> Response:
```

melovskak added 2 commits June 5, 2026 15:36

feat: initial breast grading

2cbf684

fix: add model uri

0f4957c

melovskak self-assigned this Jun 5, 2026

gemini-code-assist Bot reviewed Jun 5, 2026


Comment thread models/breast_grading_virchow2.py Outdated

Comment thread models/breast_grading_virchow2.py

Comment thread models/breast_grading_virchow2.py

Comment thread models/breast_grading_virchow2.py Outdated

melovskak and others added 12 commits June 5, 2026 19:42

Update models/breast_grading_virchow2.py

8ab5b08

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update models/breast_grading_virchow2.py

eb394f1

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update models/breast_grading_virchow2.py

cb8fcd7

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update models/breast_grading_virchow2.py

5dd26d9

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

fix: n_channels set to 4

92a4e07

Merge branch 'feature/breast-cancer-grading' of https://github.com/RationAI/model-service into feature/breast-cancer-grading

965e832

fix: correct uri

dac55ab

fix: new onnx and updated model script

61efdf2

fix: lint

bfa25bd

test: correct dimensions

1f3224d

fix: lint

2ce6034

fix: lint

75a89cf

melovskak marked this pull request as ready for review June 6, 2026 23:16

melovskak requested review from a team, JakubPekar and ejdam87 June 6, 2026 23:16

coderabbitai Bot reviewed Jun 6, 2026


Comment thread helm/rayservice/applications/breast-grading-virchow2.yaml Outdated

Comment thread models/breast_grading_virchow2.py

Comment thread models/breast_grading_virchow2.py Outdated

melovskak marked this pull request as draft June 7, 2026 09:47

melovskak added 7 commits June 7, 2026 10:08

test: add guard and add channels for heatmap builder

46df233

test: switch softmax and raw logits in output

8496c32

test: fix raise error

4f8ef11

test: num channels 4 or 8

50909c8

test: num channels 4 or 8

f45d29d

test: num channels 4 or 8

c9dd0a2

fix: revert to 4 logits output

dfe47bb

fix: working dir set to master in config

2f0cf79

melovskak marked this pull request as ready for review June 7, 2026 12:33

Jurgee requested changes Jun 9, 2026
