[Question] Potential Routing Collapse observed in YOLO-Master-v0.1-N.pt via official MoE analysis script

Hi YOLO-Master authors,

Thank you for sharing this amazing work! I’ve been experimenting with the pre-trained weights for my project on small object segmentation, and I really appreciate the idea of instance-conditional adaptive computation.

However, while analyzing the expert utilization of the pre-trained weights (`YOLO-Master-v0.1-N.pt`) using the official script, I noticed some unusual statistics that look like potential **routing collapse**. I am a bit confused and would appreciate your clarification.

I used the official script provided at `ultralytics/nn/modules/moe/analysis.py` to diagnose the `YOLO-Master-v0.1-N.pt` model on the **MS COCO 2017 val dataset (5000 images)**.

The diagnosis report shows:
1. Total Tokens Processed: `15,003`. Since there are 3 router layers and 5001 forward passes (5000 val images + 1 warmup), this perfectly aligns with the instance-level routing design (1 token per image).
2. Static Expert Activation: For all 5001 images, each router selected exactly the same two experts, regardless of image content or scene complexity. Each of the two experts therefore receives exactly 50% of that layer's hits.
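For reference, the token count above can be reproduced with a few lines of arithmetic. This is only a sanity check of my reading of the report; the variable names are mine, and the "1 token per image per router" figure is my interpretation of the instance-level design rather than something I verified in the code.

```python
# Sanity check of the "Total Tokens Processed" figure in the report.
num_router_layers = 3      # model.6, model.9, model.12
num_val_images = 5000      # MS COCO 2017 val
num_warmup_passes = 1      # one extra forward pass before validation
tokens_per_image = 1       # assumption: instance-level routing, one token per image per router

forward_passes = num_val_images + num_warmup_passes
total_tokens = num_router_layers * forward_passes * tokens_per_image

print(forward_passes)  # 5001
print(total_tokens)    # 15003
```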

For example, in `model.12.routing`:
- Expert 6: 50.00% (5001 Hits)
- Expert 15: 50.00% (5001 Hits)
(Other layers like `model.6.routing` and `model.9.routing` exhibit the exact same 100% static behavior with their respective two experts).
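To make the numbers above concrete, here is a minimal sketch of how I would summarize such a hit table. It is not the code from `analysis.py`; the helper names are hypothetical, and `NUM_EXPERTS = 16` is an assumption based only on the highest expert index (15) that I observed.

```python
import math
from collections import Counter

def utilization(selections, num_experts):
    """Count hits per expert and convert them to shares of all selections.

    selections: one tuple of selected expert indices per image (top-k routing).
    """
    counts = Counter(e for sel in selections for e in sel)
    total = sum(counts.values())
    shares = [counts.get(e, 0) / total for e in range(num_experts)]
    return counts, shares

def normalized_entropy(shares):
    """Entropy of the utilization shares, scaled to [0, 1] (1 = perfectly uniform)."""
    h = -sum(p * math.log(p) for p in shares if p > 0)
    return h / math.log(len(shares))

NUM_EXPERTS = 16                   # assumption, not confirmed from the config
selections = [(6, 15)] * 5001      # what the report shows for model.12.routing

counts, shares = utilization(selections, NUM_EXPERTS)
print(counts[6], counts[15])       # 5001 5001
print(shares[6], shares[15])       # 0.5 0.5
print(round(normalized_entropy(shares), 4))
```

Under these assumptions the normalized entropy comes out at about 0.25 (log 2 / log 16), far from the value of 1 that a uniform use of all experts would give.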

Since the paper emphasizes that the ES-MoE block dynamically allocates computational resources according to scene complexity, I expected the expert distribution to vary across different images (e.g., crowded scenes vs. simple backgrounds).
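One simple way to test for that kind of variation is to count how many distinct expert combinations a router produces over the dataset. The sketch below uses a hypothetical helper of my own (it is not part of the repository); the `varied` list is made-up data that only illustrates what non-static routing would look like.

```python
def distinct_routing_patterns(selections):
    """Return the set of distinct expert combinations, ignoring selection order."""
    return {tuple(sorted(sel)) for sel in selections}

# What I observed: the same pair for every image.
static = [(6, 15)] * 5001

# Made-up example of content-dependent routing, for contrast only.
varied = [(6, 15), (15, 6), (2, 6), (2, 9)]

print(len(distinct_routing_patterns(static)))  # 1
print(len(distinct_routing_patterns(varied)))  # 3
```

A count of 1 over all 5001 images is what makes me suspect the routing is effectively static.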

Questions:
1. Is this static routing behavior expected for the `v0.1-N.pt` release? Or did the load-balancing loss fail during early training of this specific checkpoint, causing the routing to collapse into an effectively static network?
2. Regarding `MoEPruner`: I noticed the recent addition of the `MoEPruner` tool, which prunes experts with <15% utilization. Was this tool developed specifically to address the kind of routing redundancy observed in the current weights?
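To illustrate why I am asking about the pruner, here is a rough sketch of what a 15% utilization cutoff would do to the hit counts I observed. This is not the `MoEPruner` API; the function is hypothetical, "utilization" is assumed to mean an expert's share of all hits in a layer, and the total of 16 experts is again an assumption.

```python
def experts_below_threshold(hits, threshold=0.15):
    """Return indices of experts whose share of all hits is below the threshold."""
    total = sum(hits)
    if total == 0:
        # No routing statistics at all: every expert counts as unused.
        return list(range(len(hits)))
    return [i for i, h in enumerate(hits) if h / total < threshold]

NUM_EXPERTS = 16            # assumption, not confirmed from the config
hits = [0] * NUM_EXPERTS
hits[6] = 5001              # observed in model.12.routing
hits[15] = 5001

prunable = experts_below_threshold(hits)
print(len(prunable))                    # 14
print(6 in prunable, 15 in prunable)    # False False
```

If the real tool works along these lines, it would remove every expert except the two that are always selected, which is why I wonder whether it was designed with this behavior in mind.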

Environment:
- Weights: YOLO-Master-v0.1-N.pt (from assets)
- Dataset: MS COCO 2017 val
- Ultralytics Version: 8.3.240
- Device: CPU / GPU (both yield the same results)

Looking forward to your insights! Thank you again for the great contribution to the community.

[Question] Potential Routing Collapse observed in YOLO-Master-v0.1-N.pt via official MoE analysis script #36
