[Question] Potential Routing Collapse observed in YOLO-Master-v0.1-N.pt via official MoE analysis script #36

@YuSeanLuo

Hi YOLO-Master authors,

Thank you for sharing this amazing work! I’ve been experimenting with the pre-trained weights for my project on small-object segmentation, and I really appreciate the idea of instance-conditional adaptive computation.

However, while analyzing the expert utilization of the pre-trained weights (YOLO-Master-v0.1-N.pt) with the official script, I noticed some unusual statistics that look like routing collapse. I would appreciate your clarification.

I used the official script provided at ultralytics/nn/modules/moe/analysis.py to diagnose the YOLO-Master-v0.1-N.pt model on the MS COCO 2017 val dataset (5000 images).

The diagnosis report shows:

  1. Total Tokens Processed: 15,003. With 3 router layers and 5,001 forward passes (5,000 val images + 1 warmup), this matches the instance-level routing design (1 token per image per router layer): 3 × 5,001 = 15,003.
  2. Static Expert Activation: For all 5,001 forward passes, each router selected exactly the same two experts, giving an exact 50/50 split regardless of image content or scene complexity.
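The token count in item 1 can be reproduced with a small sanity check. This is only a sketch: the function name and its parameters are hypothetical and are not part of the official analysis script; the numbers are the ones reported above.

```python
def expected_router_tokens(num_images, num_router_layers,
                           warmup_passes=1, tokens_per_image=1):
    """Expected token count if every router sees one token per forward pass.

    Hypothetical helper for illustration; not part of analysis.py.
    """
    forward_passes = num_images + warmup_passes
    return forward_passes * num_router_layers * tokens_per_image


# 5,000 COCO val images + 1 warmup pass, 3 router layers
total = expected_router_tokens(num_images=5000, num_router_layers=3)
print(total)  # 15003
```

If the routers operated on spatial tokens instead of one token per image, the total would be far larger than 15,003, so the reported count is consistent with instance-level routing.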

For example, in model.12.routing:

  • Expert 6: 50.00% (5001 Hits)
  • Expert 15: 50.00% (5001 Hits)

Other layers, such as model.6.routing and model.9.routing, show the same fully static behavior, each with its own fixed pair of experts.
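To quantify how concentrated this is, here is a minimal sketch that turns the hit counts above into utilization shares and a normalized entropy (1.0 means perfectly uniform usage, values near 0 mean usage is concentrated on very few experts). The function name is hypothetical, and `num_experts=16` is an assumption: the report only shows that an expert with index 15 exists.

```python
import math


def summarize_utilization(hits, num_experts):
    """Summarize expert usage from a {expert_index: hit_count} mapping.

    Hypothetical helper for illustration; not part of analysis.py.
    """
    total = sum(hits.values())
    shares = {e: hits.get(e, 0) / total for e in range(num_experts)}
    active = [e for e, s in shares.items() if s > 0]
    # Shannon entropy of the usage distribution, scaled to [0, 1]
    entropy = -sum(s * math.log(s) for s in shares.values() if s > 0)
    return {
        "shares": shares,
        "active_experts": active,
        "normalized_entropy": entropy / math.log(num_experts),
    }


# Hit counts reported for model.12.routing; 16 experts is an assumption
report = summarize_utilization({6: 5001, 15: 5001}, num_experts=16)
print(report["active_experts"])      # [6, 15]
print(report["normalized_entropy"])  # about 0.25, i.e. ln(2) / ln(16)
```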

Since the paper emphasizes that the ES-MoE block dynamically allocates computational resources according to scene complexity, I expected the expert distribution to vary across images (e.g., crowded scenes vs. simple backgrounds).
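One way to state this expectation as a check: if routing were image-dependent, the set of selected experts should differ between at least some images. The sketch below assumes the per-image selections have already been collected as tuples of expert indices (how to extract them from the model is not shown, and the function names are hypothetical).

```python
from collections import Counter


def routing_patterns(selections):
    """Count how often each distinct set of selected experts occurs."""
    # frozenset ignores the order of experts within one selection
    return Counter(frozenset(s) for s in selections)


def is_static_routing(selections):
    """True if every input was routed to exactly the same expert set."""
    return len(routing_patterns(selections)) == 1


# What the report implies for model.12.routing: the same pair every time
observed = [(6, 15)] * 5001
print(is_static_routing(observed))  # True

# What content-dependent routing would look like (made-up selections)
varied = [(6, 15), (3, 6), (15, 6)]
print(is_static_routing(varied))  # False
```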

  1. Is this static routing behavior expected for the v0.1-N.pt release? Or did the routing in this specific checkpoint collapse during early training despite the load-balancing loss, so that it effectively fell back to a static network?
  2. Regarding MoEPruner: I noticed the recent addition of the MoEPruner tool, which prunes experts with <15% utilization. Was this tool developed specifically to address the kind of routing redundancy observed in the current weights?
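To make the second question concrete, here is a sketch of what a <15% utilization rule would flag given the statistics above. This does not call MoEPruner and does not claim to mirror its logic; the function name is hypothetical, and the 16-expert count is again an assumption based on the highest expert index in the report.

```python
def experts_below_threshold(shares, threshold=0.15):
    """Return expert indices whose utilization share is below the threshold."""
    return sorted(e for e, s in shares.items() if s < threshold)


# Observed shares for model.12.routing, assuming 16 experts in the layer
shares = {e: 0.0 for e in range(16)}
shares[6] = shares[15] = 0.5

flagged = experts_below_threshold(shares)
print(len(flagged))  # 14
print(flagged)       # every expert except 6 and 15
```

Under these assumptions, such a rule would keep only the two experts that are always selected.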

Environment:

  • Weights: YOLO-Master-v0.1-N.pt (from assets)
  • Dataset: MS COCO 2017 val
  • Ultralytics Version: 8.3.240
  • Device: CPU / GPU (both yield the same results)

Looking forward to your insights! Thank you again for the great contribution to the community.
