
Add MIL benchmark tutorial (MixMIL, ABMIL, representation benchmark)#3

Open
Dinesh-Adhithya-H wants to merge 1 commit into lueckenlab:main from Dinesh-Adhithya-H:main


@Dinesh-Adhithya-H


Fully executed notebook demonstrating patient-level prediction from single-cell data using Multiple Instance Learning on the COMBAT dataset. Covers MIL benchmark (MixMIL + ABMIL), attention visualisations, and a representation benchmark comparing pseudobulk / composition / MIL bag embeddings with a linear probe.

@review-notebook-app


Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@review-notebook-app

review-notebook-app Bot commented May 10, 2026


View / edit / reply to this conversation on ReviewNB

VladimirShitov commented on 2026-05-10T15:54:44Z
----------------------------------------------------------------

Line #5.

Why not list(set(metadata_cols))?
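
For context, a small sketch of what this suggestion does, assuming metadata_cols is a plain Python list of column names that may contain duplicates (the values below are made up for illustration). Note that list(set(...)) removes duplicates but does not guarantee the original order; if column order matters, dict.fromkeys is an order-preserving alternative.

```python
# Hypothetical column names; the notebook's actual metadata_cols may differ.
metadata_cols = ["patient_id", "age", "sex", "age", "patient_id"]

# Reviewer's suggestion: deduplicate via a set (resulting order is not guaranteed).
unique_unordered = list(set(metadata_cols))

# Order-preserving alternative: dict keys keep first-seen insertion order.
unique_ordered = list(dict.fromkeys(metadata_cols))

print(sorted(unique_unordered))
print(unique_ordered)
```

Both variants yield the same three unique names; only the second is guaranteed to keep them in their first-seen order.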


@review-notebook-app

review-notebook-app Bot commented May 10, 2026


View / edit / reply to this conversation on ReviewNB

VladimirShitov commented on 2026-05-10T15:54:45Z
----------------------------------------------------------------

Line #12.    adata = adata[adata.obs[lk].notna()].copy()

This can also be simplified. Copying the data once per column can be crazy expensive for large datasets. Use something like:

cells_with_complete_data = adata.obs.dropna(subset=LABEL_COLS).index
adata = adata[cells_with_complete_data].copy()
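
The same idea on a toy pandas table, as a sketch only: a hypothetical obs DataFrame stands in for adata.obs, and the LABEL_COLS values are made-up label names. dropna(subset=...) finds the rows that are complete in all label columns in one pass, so the AnnData object would only need to be subset and copied once.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs; column names and values are illustrative only.
obs = pd.DataFrame(
    {
        "disease": ["covid", None, "healthy", "sepsis"],
        "severity": ["mild", "severe", np.nan, "critical"],
        "n_genes": [1200, 900, 1500, 1100],
    },
    index=["cell_0", "cell_1", "cell_2", "cell_3"],
)
LABEL_COLS = ["disease", "severity"]

# One pass: keep only cells with non-missing values in every label column.
cells_with_complete_data = obs.dropna(subset=LABEL_COLS).index

# With AnnData, this index would be used once:
#   adata = adata[cells_with_complete_data].copy()
filtered = obs.loc[cells_with_complete_data]
print(list(filtered.index))
```

Columns outside LABEL_COLS (here n_genes) do not affect which cells are kept.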

@review-notebook-app

review-notebook-app Bot commented May 10, 2026


View / edit / reply to this conversation on ReviewNB

VladimirShitov commented on 2026-05-10T15:54:46Z
----------------------------------------------------------------

This tutorial fully follows the supervised methods one. Update the existing tutorial instead of creating a new one.


@review-notebook-app

review-notebook-app Bot commented May 10, 2026


View / edit / reply to this conversation on ReviewNB

VladimirShitov commented on 2026-05-10T15:54:46Z
----------------------------------------------------------------

This should be handled at the level of patpy classes, not circumvented in the notebooks.


@review-notebook-app

review-notebook-app Bot commented May 10, 2026


View / edit / reply to this conversation on ReviewNB

VladimirShitov commented on 2026-05-10T15:54:47Z
----------------------------------------------------------------

This is a lot of code, and it looks very Claude-generated to me. The benchmarking results are interesting, but this is not the right way to write tutorials. Adapting this code to new data will take a lot of time, and there are so many plots that readers will tire quickly. Please simplify:

  1. Turn useful pieces of code into functions and add them to patpy.
  2. Reduce the number of plots; show only the essentials.
  3. Explain what is happening in the plots: how should they be read and interpreted, and what are the notable patterns?

There is no need to display everything. A better approach would be to:

  1. Explain what we want to achieve and how we are going to measure success.
  2. Focus on the most successful method and dive deeper (here, you can add plots).

But, as mentioned before, this should be an extension of the previous tutorial for supervised methods, not a separate one.


@VladimirShitov

Member

Sorry, this is not ready to be merged. The benchmarking results are interesting, but the tutorial contains too much technical information and AI-generated code, while not providing a consistent story for users. Moreover, it largely duplicates the existing tutorial. The added methods should extend the previous tutorial.

If you would like to give it another try, I'd highly appreciate it! Otherwise, focus on patpy PRs, and I'll update the tutorial.
