Feature/breast cancer grading #14

Open
melovskak wants to merge 22 commits into main from feature/breast-cancer-grading

Feature/breast cancer grading#14
melovskak wants to merge 22 commits into
mainfrom
feature/breast-cancer-grading

melovskak commented Jun 5, 2026


Summary by CodeRabbit

  • New Features
    • New breast cancer grading service available at /breast-grading-virchow2
    • Tile-based medical image analysis with configurable tile size, channels, batching, and latency controls
    • GPU-accelerated inference with autoscaling for optimized performance and resource use
    • Model artifacts managed via MLflow-compatible storage for centralized model deployment

@melovskak melovskak self-assigned this Jun 5, 2026
coderabbitai Bot commented Jun 5, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97f243da-c947-474d-9c96-b7b7ad1bb389

📥 Commits

Reviewing files that changed from the base of the PR and between 75a89cf and 2f0cf79.

📒 Files selected for processing (2)
  • helm/rayservice/applications/breast-grading-virchow2.yaml
  • models/breast_grading_virchow2.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • helm/rayservice/applications/breast-grading-virchow2.yaml
  • models/breast_grading_virchow2.py

📝 Walkthrough


This PR adds a Ray Serve + FastAPI deployment that creates Virchow2 embeddings for input tiles and runs an ONNX linear head for breast cancer grading, plus Helm RayService manifests registering and configuring the deployment.

Changes

Breast-Grading-Virchow2 Deployment

  • Contracts and configuration schema (models/breast_grading_virchow2.py): module imports; a Config TypedDict with tile sizing, scaling, batching, and model fields; the FastAPI app instance and the Ray Serve deployment class declaration.
  • Initialization and resource acquisition (models/breast_grading_virchow2.py): reconfigure() loads LZ4, builds the Virchow2 transform, acquires a remote Virchow2 handle, resolves the _target_ provider to download model.onnx, creates an ONNX Runtime CUDA session, and configures batch size and timeouts.
  • Tile preprocessing, embedding, and batched head inference (models/breast_grading_virchow2.py): _prepare_tile_for_virchow2() converts CHW→HWC and applies the transform; _create_embedding() calls the remote Virchow2 service and pools the class token with the mean of the patch tokens; _predict_head() (@serve.batch) stacks embeddings, runs the ONNX session, and reshapes the logits; predict() wires up the single-tile flow.
  • API endpoint and response handling (models/breast_grading_virchow2.py): the FastAPI POST / handler decompresses the LZ4 body, validates and reconstructs a (tile_size, tile_size, 3) uint8 tile, transposes it to CHW, invokes predict(), returns result.tolist(), and binds the deployment via app = BreastCancerGradingVirchow2.bind().
  • Helm RayService deployment configuration (helm/rayservice/applications/breast-grading-virchow2.yaml, helm/rayservice/values.yaml): a new Helm manifest and values entry registering breast-grading-virchow2 with route /breast-grading-virchow2, runtime download URL, env/pip dependencies, Ray actor CPU/GPU/memory, autoscaling, and the MLflow model artifact URI.
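The walkthrough says _create_embedding() pools the class token together with the mean of the patch tokens. A minimal NumPy sketch of that pooling, assuming the token layout described on the public Virchow2 model card (class token at index 0, four register tokens at indices 1 to 4, patch tokens after that); the function name `pool_virchow2_tokens` is hypothetical and not taken from the PR:

```python
import numpy as np


def pool_virchow2_tokens(tokens: np.ndarray, num_register_tokens: int = 4) -> np.ndarray:
    """Pool a (batch, tokens, dim) Virchow2 output into a (batch, 2 * dim) embedding.

    Assumed layout (from the public model card, not verified against the PR):
    index 0 is the class token, the next `num_register_tokens` entries are
    register tokens, and the remaining entries are patch tokens.
    """
    if tokens.ndim != 3:
        raise ValueError(f"expected (batch, tokens, dim), got shape {tokens.shape}")
    class_token = tokens[:, 0]                          # (batch, dim)
    patch_tokens = tokens[:, 1 + num_register_tokens:]  # (batch, n_patches, dim); registers skipped
    # Concatenate the class token with the mean over all patch tokens.
    return np.concatenate([class_token, patch_tokens.mean(axis=1)], axis=-1)
```

With the 261 × 1280 token output the model card reports per tile, this gives a 2560-dimensional embedding, which is what a linear head would then consume.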

Sequence Diagram

sequenceDiagram
  participant Client as Client
  participant API as FastAPI Endpoint
  participant Deploy as BreastCancerGradingVirchow2
  participant Virchow2 as Virchow2 Service
  participant ONNX as ONNX Runtime

  Client->>API: POST / (LZ4-compressed tile)
  API->>API: Decompress LZ4 bytes
  API->>API: Reconstruct tile (tile_size, tile_size, 3)
  API->>Deploy: predict(tile_uint8)
  Deploy->>Deploy: _prepare_tile_for_virchow2 (CHW→HWC, apply transform)
  Deploy->>Virchow2: request embeddings (prepared tensor)
  Virchow2->>Deploy: return token embeddings
  Deploy->>Deploy: pool class + mean patch tokens -> embedding
  Deploy->>ONNX: _predict_head (batch embeddings) -> InferenceSession.run
  ONNX->>Deploy: return logits
  Deploy->>API: return result.tolist()
  API->>Client: 200 OK (result)
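The _predict_head step in the diagram relies on @serve.batch, which hands the decorated method a list of queued inputs and expects a list of results of the same length, in the same order. A minimal NumPy sketch of that contract for a 4-class linear head; the `weight`/`bias` arrays are hypothetical stand-ins for the ONNX session so the example runs without Ray or ONNX Runtime:

```python
import numpy as np


def run_linear_head_batch(
    embeddings: list[np.ndarray], weight: np.ndarray, bias: np.ndarray
) -> list[np.ndarray]:
    """Run one batched head call and split the logits back per request.

    `weight` (dim, num_classes) and `bias` (num_classes,) are hypothetical;
    in the deployment this matmul would be replaced by `session.run(...)`.
    """
    batch = np.stack(embeddings).astype(np.float32)  # (batch, dim)
    logits = batch @ weight + bias                   # (batch, num_classes)
    # One result per input, in input order, as @serve.batch expects.
    return list(logits)
```

Stacking once per batch amortizes the per-call overhead of the head; splitting the result back is what lets each caller of predict() receive only its own row of logits.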

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • RationAI/model-service#2: BreastCancerGradingVirchow2 depends on the Virchow2 Ray Serve deployment introduced in that PR for remote embedding generation via serve.get_app_handle().

Suggested reviewers

  • JakubPekar
  • ejdam87

Poem

🐰 I hop through tiles with curious cheer,
Virchow2 whispers embeddings near,
ONNX hums a verdict bright,
Tiles to grades in morning light,
A rabbit cheers this model's flight.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check: ❓ Inconclusive. The title 'Feature/breast cancer grading' is vague and generic, using the prefix 'Feature/' which is a branch naming convention rather than a descriptive PR title. It lacks specificity about what is being added or implemented. Resolution: use a more descriptive title like 'Add breast cancer grading service with Virchow2 integration' that clearly explains the main change being introduced.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new Ray Serve deployment, BreastCancerGradingVirchow2, which processes image tiles, extracts embeddings using a Virchow2 foundation model, and evaluates them with a 4-class linear head ONNX model. Feedback on the implementation highlights several key areas for improvement: running the synchronous ONNX session in a separate thread to prevent blocking the event loop, adding a CPU fallback provider for the ONNX session, performing array operations directly in NumPy to avoid unnecessary PyTorch overhead, and correcting the return type annotation of the root endpoint to match its actual 3D structure.
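The first suggestion, not blocking the event loop with the synchronous ONNX session, is commonly handled with asyncio.to_thread (Python 3.9+). A minimal stdlib-only sketch; `blocking_inference` is a hypothetical placeholder for a call such as `session.run(...)`:

```python
import asyncio
import time
from collections.abc import Callable
from typing import Any


def blocking_inference(inputs: list[float]) -> list[float]:
    """Hypothetical stand-in for a synchronous call such as session.run(...)."""
    time.sleep(0.05)  # simulate work that never yields to the event loop
    return [2.0 * x for x in inputs]


async def run_off_event_loop(fn: Callable[..., Any], *args: Any) -> Any:
    # asyncio.to_thread runs fn in the default thread pool, so the event
    # loop stays free to accept and schedule other requests meanwhile.
    return await asyncio.to_thread(fn, *args)


async def main() -> list[list[float]]:
    # Both calls run in worker threads and overlap instead of serializing.
    return await asyncio.gather(
        run_off_event_loop(blocking_inference, [1.0, 2.0]),
        run_off_event_loop(blocking_inference, [3.0]),
    )
```

`asyncio.run(main())` returns `[[2.0, 4.0], [6.0]]`; asyncio.gather preserves the order of its arguments regardless of which thread finishes first.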

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread models/breast_grading_virchow2.py Outdated
Comment thread models/breast_grading_virchow2.py
Comment thread models/breast_grading_virchow2.py
Comment thread models/breast_grading_virchow2.py Outdated
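For the CPU fallback provider mentioned in the review, one option is to build the provider list from what the installed ONNX Runtime reports via `onnxruntime.get_available_providers()`. A minimal sketch; `select_providers` is a hypothetical helper, and only the pure selection logic is shown so it runs without ONNX Runtime installed:

```python
def select_providers(
    available: list[str],
    preferred: tuple[str, ...] = ("CUDAExecutionProvider", "CPUExecutionProvider"),
) -> list[str]:
    """Return the preferred providers that are actually available.

    Preference order is kept, so CUDA is tried first and CPU is the fallback.
    """
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"none of {preferred} are available (got {available})")
    return chosen
```

In the deployment this would be passed as `ort.InferenceSession(model_path, providers=select_providers(ort.get_available_providers()))`, and `session.get_providers()` can be logged afterwards to confirm which provider was actually used.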
melovskak and others added 12 commits June 5, 2026 19:42
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@melovskak melovskak marked this pull request as ready for review June 6, 2026 23:16
@melovskak melovskak requested review from a team, JakubPekar and ejdam87 June 6, 2026 23:16

coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@helm/rayservice/applications/breast-grading-virchow2.yaml`:
- Line 5: The working_dir in the RayService manifest is pointing to the feature
branch archive; update the working_dir value to use the moving main archive URL
so the deployed image always references main.zip (replace the current
https://github.com/RationAI/model-service/archive/refs/heads/feature/breast-cancer-grading.zip
with https://github.com/RationAI/model-service/archive/refs/heads/main.zip in
the working_dir field of the manifest).

In `@models/breast_grading_virchow2.py`:
- Around line 82-85: The ONNX session is created with only CUDAExecutionProvider
which can fail startup; update the session creation in the model initializer
(self.session = ort.InferenceSession(...)) to include a CPU fallback provider
list (e.g., ["CUDAExecutionProvider","CPUExecutionProvider"]) so the runtime can
start if CUDA initialization fails. Also, in the root() function where you
decompress and reshape tiles, add a sanity check that the decompressed byte
length equals tile_size * tile_size * 3 before calling reshape and raise or
handle a clear error if it doesn't to avoid malformed/oversized inputs causing
allocation/reshape failures.
- Around line 155-164: In the root() handler, validate and guard the LZ4 payload
before blindly decompressing and reshaping: call await request.body() into a
variable, then try to decompress with a bounded API if available (e.g.,
lz4.frame.decompress with a max_output_size) or call self.lz4.decompress inside
a try/except; after decompression assert len(decompressed) == self.tile_size *
self.tile_size * 3 before np.frombuffer/reshape, and if the length is wrong or
decompress/reshape raises, return a 400 Bad Request (catch exceptions around
self.lz4.decompress and the np.frombuffer(...).reshape call and map them to a
400 response) so malformed/oversized payloads don’t cause 5xx errors.
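The size check both comments ask for can be isolated in a small pure function that is easy to unit-test. A minimal NumPy sketch operating on already-decompressed bytes, so it is independent of the LZ4 call; `decode_tile` is a hypothetical name, and the HWC to CHW transpose follows the walkthrough's description of the handler:

```python
import numpy as np


def decode_tile(data: bytes, tile_size: int) -> np.ndarray:
    """Validate decompressed RGB bytes and return a (3, tile_size, tile_size) uint8 tile.

    Raises ValueError on a length mismatch so the caller can map it to HTTP 400.
    """
    expected = tile_size * tile_size * 3
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    hwc = np.frombuffer(data, dtype=np.uint8).reshape(tile_size, tile_size, 3)
    # HWC -> CHW; ascontiguousarray copies, so the result is also writable.
    return np.ascontiguousarray(hwc.transpose(2, 0, 1))
```

In the handler, a ValueError from this helper (or the RuntimeError the existing code catches from decompression) would be re-raised as `HTTPException(status_code=400, ...)`.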

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 635b4c34-793c-42f7-a6a6-0e1eb944e941

📥 Commits

Reviewing files that changed from the base of the PR and between 3d37a6e and 75a89cf.

📒 Files selected for processing (3)
  • helm/rayservice/applications/breast-grading-virchow2.yaml
  • helm/rayservice/values.yaml
  • models/breast_grading_virchow2.py

Comment thread helm/rayservice/applications/breast-grading-virchow2.yaml Outdated
Comment thread models/breast_grading_virchow2.py
Comment thread models/breast_grading_virchow2.py Outdated
@melovskak melovskak marked this pull request as draft June 7, 2026 09:47
@melovskak melovskak marked this pull request as ready for review June 7, 2026 12:33
Comment on lines +154 to +180
        # Capture raw incoming network request body bytes
        body_bytes = await request.body()

        try:
            # 1. Decompress the LZ4-compressed tile bytes asynchronously in a thread worker
            data = await asyncio.to_thread(self.lz4.decompress, body_bytes)

            # 2. Size check
            expected_bytes = self.tile_size * self.tile_size * 3
            if len(data) != expected_bytes:
                raise ValueError(
                    f"Decompressed payload byte length mismatch. "
                    f"Expected exactly {expected_bytes} bytes, but got {len(data)}."
                )

            # 3. Reconstruct the raw pixel array
            tile = np.frombuffer(data, dtype=np.uint8).reshape(
                self.tile_size,
                self.tile_size,
                3,
            )
        except (RuntimeError, ValueError) as err:
            # 4. Gracefully map decompression or reshape shape errors to a clean HTTP 400
            raise HTTPException(
                status_code=400,
                detail=f"Malformed or invalid compressed tile image payload: {err!s}",
            ) from err


Remove try-except block

Suggested change
-        # Capture raw incoming network request body bytes
-        body_bytes = await request.body()
-        try:
-            # 1. Unzip raw compressed image tile bytes asynchronously in a thread worker
-            data = await asyncio.to_thread(self.lz4.decompress, body_bytes)
-            # 2. Size check
-            expected_bytes = self.tile_size * self.tile_size * 3
-            if len(data) != expected_bytes:
-                raise ValueError(
-                    f"Decompressed payload byte length mismatch. "
-                    f"Expected exactly {expected_bytes} bytes, but got {len(data)}."
-                )
-            # 3. Reconstruct the raw pixel array
-            tile = np.frombuffer(data, dtype=np.uint8).reshape(
-                self.tile_size,
-                self.tile_size,
-                3,
-            )
-        except (RuntimeError, ValueError) as err:
-            # 4. Gracefully map decompression or reshape shape errors to a clean HTTP 400
-            raise HTTPException(
-                status_code=400,
-                detail=f"Malformed or invalid compressed tile image payload: {err!s}",
-            ) from err
+        data = await asyncio.to_thread(lz4.frame.decompress, await request.body())
+        image = np.frombuffer(data, dtype=np.uint8).reshape(
+            self.tile_size, self.tile_size, 3
+        )

MLFLOW_TRACKING_URI: http://mlflow-s3.rationai-mlflow
HF_HOME: /mnt/huggingface_cache
pip:
- timm


timm lib is already included in Dockerfile.gpu

        return await self._predict_head(embedding)  # returns 4 raw logits per tile

    @fastapi.post("/")
    async def root(self, request: Request) -> list[list[list[float]]]:

Suggested change
-    async def root(self, request: Request) -> list[list[list[float]]]:
+    async def root(self, request: Request) -> Response:


Labels

None yet

Projects

None yet


2 participants