Yihao Wang*, Yang Miao*, Wenshuai Zhao, Wenyan Yang, Zihan Wang, Joni Pajarinen, Luc Van Gool, Danda Pani Paudel, Juho Kannala, Xi Wang†, Arno Solin
*Equal contribution †Co-advisor
Aalto University | INSAIT, Sofia University | ETH Zurich | TU Munich | MCML | ELLIS Institute Finland | University of Oulu
PAWS perceives object articulations from in-the-wild egocentric video using hand-interaction and geometric cues, enabling downstream applications such as articulation-model fine-tuning and robot manipulation.
We propose PAWS, a method that directly extracts object articulations from hand–object interactions in large-scale, in-the-wild egocentric videos. PAWS is an unsupervised articulation-detection pipeline that relies only on hand interactions and sparse 3D information, requiring no annotated data, and it produces scalable articulation labels covering a wide range of objects and environments. We evaluate the method on the HD-EPIC and Arti4D datasets, where it improves significantly over baselines, and we further demonstrate downstream applications in 3D articulation prediction and real-world robot manipulation.
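To give a flavour of the geometric idea behind articulation perception from motion cues, the minimal sketch below fits revolute and prismatic models to a sparse 3D trajectory of a hand contact point and keeps whichever fits better. This is an illustrative toy example under our own assumptions, not the PAWS pipeline itself (see the paper for the actual method); all function names and the fitting strategy are hypothetical.

```python
# Toy sketch (NOT the PAWS pipeline): given a sparse 3D trajectory of a hand
# contact point during an interaction, fit prismatic vs. revolute articulation
# models and keep the one with the lower residual.
import numpy as np

def fit_prismatic(points):
    """Fit a translation axis via PCA; return (axis, mean point-to-line residual)."""
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center)
    axis = vt[0]                              # dominant direction of motion
    proj = center + np.outer((points - center) @ axis, axis)
    return axis, np.linalg.norm(points - proj, axis=1).mean()

def fit_revolute(points):
    """Fit a plane via PCA, then a least-squares circle in that plane."""
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center)
    normal = vt[2]                            # rotation-axis direction (plane normal)
    uv = (points - center) @ vt[:2].T         # 2D coordinates in the fitted plane
    A = np.hstack([2 * uv, np.ones((len(uv), 1))])
    b = (uv ** 2).sum(axis=1)
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    radius = np.sqrt(c + cx**2 + cy**2)
    resid = np.abs(np.linalg.norm(uv - [cx, cy], axis=1) - radius).mean()
    return normal, resid

def classify_articulation(points):
    """Return ('prismatic' | 'revolute', axis direction) for a (N, 3) trajectory."""
    axis_p, err_p = fit_prismatic(points)
    axis_r, err_r = fit_revolute(points)
    return ("prismatic", axis_p) if err_p < err_r else ("revolute", axis_r)
```

For example, a trajectory traced while pulling a drawer would yield a much lower line-fit residual and be labelled prismatic, whereas one traced while opening a cabinet door would favour the circle fit and be labelled revolute.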
See the project website at https://aaltoml.github.io/PAWS/.
- HaWoR — World-Space Hand Motion Reconstruction from Egocentric Videos (CVPR 2025)
- VidBot — Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation (CVPR 2025)
- Articulate3D — Zero-Shot Text-Driven 3D Object Posing
@misc{wang2026pawsperceptionarticulationwild,
      title={PAWS: Perception of Articulation in the Wild at Scale from Egocentric Videos},
      author={Yihao Wang and Yang Miao and Wenshuai Zhao and Wenyan Yang and Zihan Wang and Joni Pajarinen and Luc Van Gool and Danda Pani Paudel and Juho Kannala and Xi Wang and Arno Solin},
      year={2026},
      eprint={2603.25539},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.25539},
}