Add MIL benchmark tutorial (MixMIL, ABMIL, representation benchmark) #3
Dinesh-Adhithya-H wants to merge 1 commit into
Fully executed notebook demonstrating patient-level prediction from single-cell data using Multiple Instance Learning on the COMBAT dataset. Covers MIL benchmark (MixMIL + ABMIL), attention visualisations, and a representation benchmark comparing pseudobulk / composition / MIL bag embeddings with a linear probe.
VladimirShitov commented on 2026-05-10T15:54:44Z
Line #5. why not

VladimirShitov commented on 2026-05-10T15:54:45Z
Line #12. `adata = adata[adata.obs[lk].notna()].copy()`
It can also be simplified. Copying data for each column can be crazy expensive for large datasets. Use something like: `cells_with_complete_data = adata.obs.dropna(subset=LABEL_COLS).index` followed by `adata = adata[cells_with_complete_data].copy()`

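The suggestion above can be sketched on a plain pandas DataFrame standing in for `adata.obs`. This is a minimal illustration, not the notebook's code: the column names, cell IDs, and values are hypothetical, and only `LABEL_COLS`, `lk`, and `cells_with_complete_data` come from the comment. It checks that filtering once with `dropna(subset=...)` selects the same cells as filtering column by column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.obs: one row per cell, two label columns.
LABEL_COLS = ["label_a", "label_b"]
obs = pd.DataFrame(
    {
        "label_a": ["x", None, "y", "x"],
        "label_b": [1.0, 2.0, np.nan, 3.0],
    },
    index=["cell_0", "cell_1", "cell_2", "cell_3"],
)

# Per-column filtering: one subset and one copy per label column.
# On an AnnData object each .copy() would duplicate the whole dataset.
looped = obs
for lk in LABEL_COLS:
    looped = looped[looped[lk].notna()].copy()

# Single pass: find the cells with complete labels once, then subset once.
cells_with_complete_data = obs.dropna(subset=LABEL_COLS).index
single_pass = obs.loc[cells_with_complete_data].copy()

# Both approaches keep exactly the same cells.
assert looped.index.equals(single_pass.index)
print(list(cells_with_complete_data))
```

With an AnnData object, the same index would be used as `adata[cells_with_complete_data].copy()`, so the data is copied once regardless of how many label columns there are.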
VladimirShitov commented on 2026-05-10T15:54:46Z
This tutorial fully follows the supervised methods one. Update the existing tutorial instead of creating a new one.

VladimirShitov commented on 2026-05-10T15:54:46Z
This should be handled on the level of patpy classes, not circumvented in the notebooks.

VladimirShitov commented on 2026-05-10T15:54:47Z
This is a lot of code, which looks very Claude-generated to me. Benchmarking results are interesting, but this is not the right way to write tutorials. Modifying this code will take a lot of time for new data, and there are too many plots that will make readers tired very soon. Please simplify:
No need to display everything, a better way would be to:
But as mentioned before, this should be an extension of the previous tutorial for supervised methods, not a separate one.

Sorry, this is not ready to be merged. The benchmarking results are interesting, but the tutorial contains too much technical information and AI-generated code, while not providing a consistent story for users. Moreover, it largely duplicates the existing tutorial. The added methods should extend the previous tutorial. If you would like to give it another try, I'd highly appreciate it! Otherwise, focus on patpy PRs, and I'll update the tutorial.