Feature/breast cancer grading#14
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR adds a Ray Serve + FastAPI deployment that creates Virchow2 embeddings for input tiles and runs an ONNX linear head for breast cancer grading, plus Helm RayService manifests registering and configuring the deployment. ChangesBreast-Grading-Virchow2 Deployment
Sequence DiagramsequenceDiagram
participant Client as Client
participant API as FastAPI Endpoint
participant Deploy as BreastCancerGradingVirchow2
participant Virchow2 as Virchow2 Service
participant ONNX as ONNX Runtime
Client->>API: POST / (LZ4-compressed tile)
API->>API: Decompress LZ4 bytes
API->>API: Reconstruct tile (tile_size, tile_size, 3)
API->>Deploy: predict(tile_uint8)
Deploy->>Deploy: _prepare_tile_for_virchow2 (CHW→HWC, apply transform)
Deploy->>Virchow2: request embeddings (prepared tensor)
Virchow2->>Deploy: return token embeddings
Deploy->>Deploy: pool class + mean patch tokens -> embedding
Deploy->>ONNX: _predict_head (batch embeddings) -> InferenceSession.run
ONNX->>Deploy: return logits
Deploy->>API: return result.tolist()
API->>Client: 200 OK (result)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new Ray Serve deployment, BreastCancerGradingVirchow2, which processes image tiles, extracts embeddings using a Virchow2 foundation model, and evaluates them with a 4-class linear head ONNX model. Feedback on the implementation highlights several key areas for improvement: running the synchronous ONNX session in a separate thread to prevent blocking the event loop, adding a CPU fallback provider for the ONNX session, performing array operations directly in NumPy to avoid unnecessary PyTorch overhead, and correcting the return type annotation of the root endpoint to match its actual 3D structure.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…tionAI/model-service into feature/breast-cancer-grading
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@helm/rayservice/applications/breast-grading-virchow2.yaml`:
- Line 5: The working_dir in the RayService manifest is pointing to the feature
branch archive; update the working_dir value to use the moving main archive URL
so the deployed image always references main.zip (replace the current
https://github.com/RationAI/model-service/archive/refs/heads/feature/breast-cancer-grading.zip
with https://github.com/RationAI/model-service/archive/refs/heads/main.zip in
the working_dir field of the manifest).
In `@models/breast_grading_virchow2.py`:
- Around line 82-85: The ONNX session is created with only CUDAExecutionProvider
which can fail startup; update the session creation in the model initializer
(self.session = ort.InferenceSession(...)) to include a CPU fallback provider
list (e.g., ["CUDAExecutionProvider","CPUExecutionProvider"]) so the runtime can
start if CUDA initialization fails. Also, in the root() function where you
decompress and reshape tiles, add a sanity check that the decompressed byte
length equals tile_size * tile_size * 3 before calling reshape and raise or
handle a clear error if it doesn't to avoid malformed/oversized inputs causing
allocation/reshape failures.
- Around line 155-164: In the root() handler, validate and guard the LZ4 payload
before blindly decompressing and reshaping: call await request.body() into a
variable, then try to decompress with a bounded API if available (e.g.,
lz4.frame.decompress with a max_output_size) or call self.lz4.decompress inside
a try/except; after decompression assert len(decompressed) == self.tile_size *
self.tile_size * 3 before np.frombuffer/reshape, and if the length is wrong or
decompress/reshape raises, return a 400 Bad Request (catch exceptions around
self.lz4.decompress and the np.frombuffer(...).reshape call and map them to a
400 response) so malformed/oversized payloads don’t cause 5xx errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 635b4c34-793c-42f7-a6a6-0e1eb944e941
📒 Files selected for processing (3)
helm/rayservice/applications/breast-grading-virchow2.yamlhelm/rayservice/values.yamlmodels/breast_grading_virchow2.py
| # Capture raw incoming network request body bytes | ||
| body_bytes = await request.body() | ||
|
|
||
| try: | ||
| # 1. Unzip raw compressed image tile bytes asynchronously in a thread worker | ||
| data = await asyncio.to_thread(self.lz4.decompress, body_bytes) | ||
|
|
||
| # 2. Size check | ||
| expected_bytes = self.tile_size * self.tile_size * 3 | ||
| if len(data) != expected_bytes: | ||
| raise ValueError( | ||
| f"Decompressed payload byte length mismatch. " | ||
| f"Expected exactly {expected_bytes} bytes, but got {len(data)}." | ||
| ) | ||
|
|
||
| # 3. Reconstruct the raw pixel array | ||
| tile = np.frombuffer(data, dtype=np.uint8).reshape( | ||
| self.tile_size, | ||
| self.tile_size, | ||
| 3, | ||
| ) | ||
| except (RuntimeError, ValueError) as err: | ||
| # 4. Gracefully map decompression or reshape shape errors to a clean HTTP 400 | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail=f"Malformed or invalid compressed tile image payload: {err!s}", | ||
| ) from err |
There was a problem hiding this comment.
Remove try-except block
| # Capture raw incoming network request body bytes | |
| body_bytes = await request.body() | |
| try: | |
| # 1. Unzip raw compressed image tile bytes asynchronously in a thread worker | |
| data = await asyncio.to_thread(self.lz4.decompress, body_bytes) | |
| # 2. Size check | |
| expected_bytes = self.tile_size * self.tile_size * 3 | |
| if len(data) != expected_bytes: | |
| raise ValueError( | |
| f"Decompressed payload byte length mismatch. " | |
| f"Expected exactly {expected_bytes} bytes, but got {len(data)}." | |
| ) | |
| # 3. Reconstruct the raw pixel array | |
| tile = np.frombuffer(data, dtype=np.uint8).reshape( | |
| self.tile_size, | |
| self.tile_size, | |
| 3, | |
| ) | |
| except (RuntimeError, ValueError) as err: | |
| # 4. Gracefully map decompression or reshape shape errors to a clean HTTP 400 | |
| raise HTTPException( | |
| status_code=400, | |
| detail=f"Malformed or invalid compressed tile image payload: {err!s}", | |
| ) from err | |
| data = await asyncio.to_thread(lz4.frame.decompress, await request.body()) | |
| image = np.frombuffer(data, dtype=np.uint8).reshape( | |
| self.tile_size, self.tile_size, 3 | |
| ) |
| MLFLOW_TRACKING_URI: http://mlflow-s3.rationai-mlflow | ||
| HF_HOME: /mnt/huggingface_cache | ||
| pip: | ||
| - timm |
There was a problem hiding this comment.
timm lib is already included in Dockerfile.gpu
| return await self._predict_head(embedding) # returns 4 raw logits per tile | ||
|
|
||
| @fastapi.post("/") | ||
| async def root(self, request: Request) -> list[list[list[float]]]: |
There was a problem hiding this comment.
async def root(self, request: Request) -> list[list[list[float]]]:
| async def root(self, request: Request) -> list[list[list[float]]]: | |
| async def root(self, request: Request) -> Response: |
Summary by CodeRabbit