OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework

ICLR 2026 Poster | [Paper]

A plug-and-play dynamic data pruning framework with theoretical guarantees for lossless training acceleration.

Overview

Data pruning aims to reduce training cost by discarding samples that appear less informative, while preserving the accuracy of full-dataset training. Existing approaches often rely on heuristic importance scores, which can introduce biased gradient estimation and make their optimization behavior hard to characterize.

OrderDP addresses this issue with a simple two-stage strategy: it first samples a random subset of the training set, then keeps the top-q samples within that subset according to the current surrogate loss. This design yields a practical dynamic pruning rule with theoretical guarantees and strong empirical performance.
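The two-stage rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation. `orderdp_select` is a hypothetical helper, and the sketch rests on three assumptions: per-sample surrogate losses are available as a list, "top-q" means the q largest losses, and `top_q_ratio` is a fraction of the random subset. The CIFAR flag of the same name may be defined differently.

```python
import random
from typing import Sequence


def orderdp_select(losses: Sequence[float], random_len_ratio: float,
                   top_q_ratio: float, rng: random.Random) -> list[int]:
    """Return the indices kept for one epoch (hypothetical sketch).

    Stage 1: draw a uniformly random subset of the training indices.
    Stage 2: within that subset, keep the samples with the largest loss.
    Assumption: top_q_ratio is measured against the random subset.
    """
    n = len(losses)
    subset_size = round(n * random_len_ratio)
    subset = rng.sample(range(n), subset_size)  # stage 1: random subset
    q = round(subset_size * top_q_ratio)
    # Stage 2: rank the subset by current loss, largest first, keep the top q.
    subset.sort(key=lambda i: losses[i], reverse=True)
    return sorted(subset[:q])


# Example: ten samples with made-up losses, re-selected with a seeded RNG.
rng = random.Random(0)
losses = [0.1, 2.5, 0.7, 1.9, 0.05, 3.2, 0.4, 1.1, 0.9, 2.0]
print(orderdp_select(losses, random_len_ratio=0.8, top_q_ratio=0.375, rng=rng))
```

In this sketch the random first stage is redrawn every epoch, so a sample with a currently low loss can still be selected in later epochs.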

This repository contains the current PyTorch code release used for our CIFAR and ImageNet experiments, including:

full-data baselines,
InfoBatch comparison code used in the paper,
OrderDP implementations for CIFAR and ImageNet.

Repository Structure

OrderDP/
├── assets/
├── cifar/
│   ├── cifar_example.py
│   ├── orderDP_example.py
│   ├── model.py
│   ├── lars.py
│   ├── lamb.py
│   ├── infobatch/
│   └── orderDP/
└── imagenet/
    ├── prune_experiment_orderdp.py
    ├── prune_experiment_unsup.py
    ├── orderdp_dataloader.py
    ├── infobatch_dataloader.py
    ├── lars.py
    ├── r50_orderdp_90epoch.sh
    └── r50_unsup_90.sh

Environment

The codebase assumes a standard PyTorch training environment with:

Python 3.9+
PyTorch
torchvision
numpy
matplotlib

Install the missing dependencies in your own environment before running experiments.
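You can check which of these packages are still missing by asking Python's import system. This does not execute the heavy libraries themselves. The sketch below is standalone, and `missing_modules` is a hypothetical helper rather than part of the repository. It only checks that each top-level module can be found. It does not check versions.

```python
import importlib.util
import sys

# Import names of the dependencies listed above (PyTorch is imported as "torch").
REQUIRED_MODULES = ("torch", "torchvision", "numpy", "matplotlib")


def missing_modules(names=REQUIRED_MODULES) -> list:
    """Return the names in `names` that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 9):
        print("Python 3.9+ is expected; found", sys.version.split()[0])
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```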

CIFAR Experiments

Run all commands from the repository root. Full-data baseline:

python3 cifar/cifar_example.py \
  --model r50 --dataset cifar100 --optimizer sgd --max-lr 0.03 \
  --batch-size 128 --num_epoch 200

InfoBatch baseline:

python3 cifar/cifar_example.py \
  --model r50 --dataset cifar100 --optimizer sgd --max-lr 0.03 \
  --ratio 0.5 --batch-size 128 --num_epoch 200 \
  --is_anealing 0 --available_GPU 0 --use_info_batch
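For comparison, the InfoBatch baseline uses soft pruning rather than a top-q cut. The sketch below paraphrases the rule described in the InfoBatch paper. It is not the code under cifar/infobatch/. Under this rule, samples whose loss falls below the mean are dropped with probability `prune_ratio`, and low-loss samples that survive are up-weighted by 1 / (1 - prune_ratio). Two things here are assumptions: reading `--ratio 0.5` as that probability, and the hypothetical helper name `infobatch_prune`. The paper also switches back to full-data training in the final epochs (annealing), which this sketch omits.

```python
import random
from typing import Sequence


def infobatch_prune(losses: Sequence[float], prune_ratio: float,
                    rng: random.Random) -> tuple[list[int], list[float]]:
    """Soft-prune one epoch in the style of the InfoBatch paper (hypothetical sketch).

    Below-mean-loss samples are dropped with probability prune_ratio; survivors
    among them get weight 1 / (1 - prune_ratio) so that the expected gradient
    matches full-data training. Above-mean samples are always kept with weight 1.
    """
    mean_loss = sum(losses) / len(losses)
    kept, weights = [], []
    for i, loss in enumerate(losses):
        if loss < mean_loss:
            if rng.random() < prune_ratio:
                continue  # pruned for this epoch only
            kept.append(i)
            weights.append(1.0 / (1.0 - prune_ratio))  # rescale to stay unbiased
        else:
            kept.append(i)  # high-loss samples are never pruned
            weights.append(1.0)
    return kept, weights


rng = random.Random(0)
kept, weights = infobatch_prune([0.1, 0.2, 5.0, 6.0], prune_ratio=0.5, rng=rng)
```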

OrderDP:

python3 cifar/orderDP_example.py \
  --model r50 --dataset cifar100 --optimizer sgd --max-lr 0.03 \
  --random_len_ratio 0.8 --top_q_ratio 0.375 \
  --batch-size 128 --num_epoch 200 --available_GPU 0 --use_orderDP
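The arithmetic below shows what the example flags imply for the per-epoch budget on CIFAR-100, which has 50,000 training images. It is not specified here whether `--top_q_ratio` is a fraction of the random subset or of the full training set. The hypothetical helper `orderdp_budget` therefore computes both readings. Check cifar/orderDP/ for the definition the code actually uses.

```python
def orderdp_budget(num_samples: int, random_len_ratio: float,
                   top_q_ratio: float) -> dict[str, int]:
    """Per-epoch sample counts under two readings of top_q_ratio (hypothetical helper)."""
    subset = round(num_samples * random_len_ratio)
    return {
        "random_subset": subset,
        # Reading 1: top_q_ratio is a fraction of the random subset.
        "kept_if_relative_to_subset": round(subset * top_q_ratio),
        # Reading 2: top_q_ratio is a fraction of the full training set.
        "kept_if_relative_to_full": round(num_samples * top_q_ratio),
    }


# CIFAR-100 has 50,000 training images.
print(orderdp_budget(50_000, random_len_ratio=0.8, top_q_ratio=0.375))
# → {'random_subset': 40000, 'kept_if_relative_to_subset': 15000, 'kept_if_relative_to_full': 18750}
```

Under the first reading the example trains on 30% of the training set per epoch. Under the second it trains on 37.5%.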

The CIFAR scripts support both CIFAR-10 and CIFAR-100 through --dataset.

ImageNet Experiments

Prepare ImageNet in the standard layout:

IMAGENET_ROOT/
├── train/
└── val/
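Before launching, you can sanity-check the layout with a few lines of standard-library Python. This sketch assumes that both splits follow the torchvision `ImageFolder` convention, with one sub-directory per class. That convention is common for ImageNet pipelines, but treat it as an assumption and confirm it against the loaders. `check_imagenet_layout` is a hypothetical helper.

```python
from pathlib import Path


def check_imagenet_layout(root: str) -> dict[str, int]:
    """Count class folders under root/train and root/val (hypothetical helper).

    Raises FileNotFoundError if either split directory is missing.
    """
    counts = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"expected directory: {split_dir}")
        # ImageFolder-style layout: one sub-directory per class.
        counts[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
    return counts


# Replace the placeholder with your own IMAGENET_ROOT before running:
# print(check_imagenet_layout("/path/to/IMAGENET_ROOT"))
```

For ImageNet-1k in this layout, both counts should be 1000.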

Then run from the repository root:

bash imagenet/r50_orderdp_90epoch.sh

or

bash imagenet/r50_unsup_90.sh

If you prefer to launch training directly instead of through the shell scripts, the main entry points are:

imagenet/prune_experiment_orderdp.py
imagenet/prune_experiment_unsup.py

Update the ImageNet path in the shell scripts or pass your own dataset path on the command line.

Notes

The repository is self-contained and does not require an external InfoBatch checkout.
Some utility implementations are adapted from the InfoBatch open-source release for fair comparison and reproducibility.
Training logs, downloaded datasets, and checkpoints are excluded through .gitignore.

Citation

If you find OrderDP useful in your research, please consider citing:

@inproceedings{
  jin2026orderdp,
  title={Order{DP}: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework},
  author={Chenhan Jin and Shengze Xu and Qingsong Wang and Fan JIA and Dingshuo Chen and Tieyong Zeng},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=e77QyyRQPz}
}

Acknowledgements

This work builds on the open-source PyTorch ecosystem and prior research on data pruning and importance sampling. Our implementation is inspired in part by the InfoBatch codebase (https://github.com/NUS-HPC-AI-Lab/InfoBatch). We thank the InfoBatch authors and the broader community for releasing code that supported comparison and reproducibility.
