A Multimodal Dataset for Force-Grounded, Cross-View Articulated Manipulation

Coupling what is seen, what is done, and what is felt during real human interaction with articulated objects.

ETH Zurich · TU Munich · U. Freiburg · Microsoft · U. Bonn
3048 Sequences · 381 Articulated Objects · 38 Environments · 4 Embodiments

Abstract

We present a dataset for force-grounded, cross-view articulated manipulation that couples what is seen with what is done and what is felt during real human interaction. The dataset contains 3048 sequences across 381 articulated objects in 38 environments. Each object is operated under four embodiments: (i) human hand, (ii) human hand with a wrist-mounted camera, (iii) handheld UMI gripper, and (iv) a custom Hoi! gripper. The tool embodiment provides synchronized end-effector forces and tactile sensing. The dataset offers a holistic view of interaction understanding from video, enabling researchers to evaluate how well methods transfer between human and robotic viewpoints and to investigate underexplored modalities such as force sensing and prediction.

The Dataset

4 Manipulation Schemes

Hoi! gripper, human hand, hand with wrist-camera, and UMI gripper — enabling cross-embodiment research.

Multi-View Capture

Egocentric (Aria glasses), manipulation-centric (wrist/gripper), and exocentric (iPhone RGB-D) viewpoints.

Force & Tactile Sensing

Synchronized force/torque and DIGIT tactile sensing through the custom Hoi! gripper.

Spatial Alignment

All recordings registered to a common frame via Leica laser scans, with articulated and static states.
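As a rough illustration of what registration to a common frame enables, the sketch below maps a point from a device frame into a shared scan frame using a 4x4 homogeneous transform. The transform values, the name `T_scan_from_device`, and the helper `apply_rigid_transform` are hypothetical; this page does not describe how the dataset stores its poses.

```python
def apply_rigid_transform(T, point):
    """Map a 3D point through a 4x4 homogeneous transform (row-major nested lists)."""
    x, y, z = point
    # Rotation part acts on (x, y, z); the last column adds the translation.
    return tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))


# Hypothetical transform: rotate 90 degrees about z, then translate by (1, 0, 0).
T_scan_from_device = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# A point one unit along the device x-axis, expressed in the scan frame.
p_scan = apply_rigid_transform(T_scan_from_device, (1.0, 0.0, 0.0))
```

Once every recording is expressed in the same frame, points observed from different viewpoints can be compared directly.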

381 Articulated Objects

Drawers, cabinets, dishwashers, fridges — across 38 real-world kitchens, bathrooms, and bedrooms.

Temporally Aligned

Nanosecond-resolution timestamps across all recording modules within each session.
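With shared nanosecond timestamps, samples from different modules can be paired by nearest timestamp. The sketch below is one simple way to do that, assuming each stream is a sorted list of integer nanosecond timestamps. The function names, the sample rates, and the `max_gap_ns` tolerance are illustrative assumptions, not part of the dataset's tooling.

```python
from bisect import bisect_left


def nearest_index(sorted_ts_ns, query_ns):
    """Return the index of the timestamp in sorted_ts_ns closest to query_ns."""
    if not sorted_ts_ns:
        raise ValueError("timestamp list is empty")
    pos = bisect_left(sorted_ts_ns, query_ns)
    if pos == 0:
        return 0
    if pos == len(sorted_ts_ns):
        return len(sorted_ts_ns) - 1
    before, after = sorted_ts_ns[pos - 1], sorted_ts_ns[pos]
    # Ties go to the earlier sample.
    return pos - 1 if query_ns - before <= after - query_ns else pos


def match_streams(frame_ts_ns, sensor_ts_ns, max_gap_ns):
    """Pair each frame with its nearest sensor sample, or None if the gap is too large."""
    matches = []
    for t in frame_ts_ns:
        i = nearest_index(sensor_ts_ns, t)
        matches.append(i if abs(sensor_ts_ns[i] - t) <= max_gap_ns else None)
    return matches


# Hypothetical streams: roughly 30 Hz video frames and 1 kHz force samples.
frame_ts = [0, 33_333_333, 66_666_666]
force_ts = [k * 1_000_000 for k in range(60)]  # covers 0 ms to 59 ms
pairs = match_streams(frame_ts, force_ts, max_gap_ns=2_000_000)
```

Keeping the timestamps as integers avoids the rounding that floating-point seconds would introduce at nanosecond resolution. Frames with no sensor sample within the tolerance are reported as `None` rather than silently paired with a distant sample.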

The Hoi! Gripper

The Hoi! gripper is a custom end-effector designed to bridge human and robotic manipulation. Worn like a handheld tool, it captures aligned force/torque and tactile sensing alongside egocentric video — making it the primary instrumented embodiment in the dataset.

  • F/T Sensing — 6-axis force/torque at the end-effector
  • Tactile Sensing — DIGIT sensor for contact texture and pressure
  • Stereo Camera — manipulation-centric stereo camera aligned with interaction
CAD Files — Soon

Paper Figures

Key figures from the paper. See the full paper for all figures and details.

Recording Pipeline. The four manipulation schemes and recording modules used for capture.
Dataset Statistics. Distribution of environments and articulated interaction categories in the Hoi! dataset.
Force Profiles. Force/torque measurements during articulated object manipulation.

Explore Samples

RGB image sequences from each recording modality and a 3D Leica point cloud. Select a scene to explore.

Views: Human Hand · Hoi! Gripper · Exocentric (iPhone) · UMI Gripper

Leica Point Cloud — Bedroom


Team

Contributors

Kavya Shankar
U. Bonn

Citation

@InProceedings{Engelbracht_2026_CVPR,
    author    = {Engelbracht, Tim and Zurbrügg, René and Wohlrapp, Matteo and Büchner, Martin and Valada, Abhinav and Pollefeys, Marc and Blum, Hermann and Bauer, Zuria},
    title     = {Hoi! - A Multimodal Dataset for Force-Grounded, Cross-View Articulated Manipulation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {8880-8890}
}

License

The Hoi! dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to share and adapt the material for any purpose, including commercially, as long as you give appropriate credit.
