Force-Grounded, Cross-View Articulated Manipulation
Bridging what is seen, what is done, and what is felt for multimodal, physically grounded human-robot interaction.
About the Workshop
Enabling robots to manipulate articulated objects (doors, drawers, tools, containers) remains a central challenge in robotics. A key bottleneck is the lack of large-scale, multimodal datasets that simultaneously capture what is seen, what is done, and what is felt during real physical interaction. Touch and force feedback in particular remain underrepresented in available datasets and methods, yet they are critical for robust robotic deployment.
This workshop brings together the manipulation, egocentric vision, and robot learning communities to discuss emerging challenges in force-grounded, cross-view articulated manipulation. As a focal point, we host a public challenge based on the Hoi! dataset, which directly addresses embodiment transfer and force grounding by providing synchronized visual, force, and tactile streams across human and robot embodiments.
Important Dates
All deadlines are 23:59 AoE (Anywhere on Earth).
Paper Submission
| Submission opens | June 16, 2026 |
| Submission deadline | August 1, 2026 |
| Notification to authors | August 7, 2026 |
| Camera-ready deadline | August 20, 2026 |
Competition
| Competition opens | July 1, 2026 |
| Submission deadline | August 21, 2026 |
| Decisions to participants | September 1, 2026 |
Call for Papers
We welcome submissions on topics related to force-grounded manipulation, tactile sensing, and interaction understanding. The workshop accepts full-length papers (8 pages) and extended abstracts (4 pages), excluding references, in ECCV 2026 format. Authors of accepted submissions will be invited to present at the poster session. Accepted full-length papers will be included in the workshop proceedings. Submissions must be anonymized for double-blind review.
Topics of Interest
- Human-Object Interaction & articulated objects
- Force & tactile sensing for manipulation
- Egocentric video understanding
- Dexterous grasping & in-hand manipulation
- Cross-view and cross-embodiment learning
- Robot learning from human demonstration
- Force and torque prediction from video
- Physics-informed video models
- Multimodal datasets & benchmarks for manipulation
- Foundation models for robotic manipulation
- Affordance, contact estimation & action anticipation
- Embodied AI & sim-to-real transfer
Submission Guidelines
- Follow the official ECCV 2026 author kit.
- Full papers: up to 8 pages of content + unlimited pages for references. Included in proceedings.
- Extended abstracts: up to 4 pages of content + references. Non-archival.
- Submissions must be anonymized (double-blind review).
- All accepted submissions will be invited for a poster presentation; top papers may be selected for oral spotlight talks.
- All accepted authors will be asked to provide a 5-minute spotlight video for the workshop website.
Invited Speakers
Speakers will be announced soon.
Dataset Challenge
The workshop challenge is centered around the Hoi! dataset, the first large-scale benchmark to jointly capture visual, force, and tactile information across human and robot embodiments during articulated object manipulation. It contains 3,048 sequences spanning 381 articulated objects across 38 indoor environments, recorded under four embodiments. The challenge evaluates how well methods leverage cross-view and multimodal signals to understand and predict articulated manipulation.
T1 — Cross-View Articulation Estimation
Participants develop methods to estimate the parameters governing the manipulation action of an articulated object part across embodiment viewpoints (e.g., third-person to egocentric, human to robot). Models may use any combination of provided visual streams.
- Method Input: Any Hoi! video stream, excluding depth or poses.
- Target Metrics: revolute/prismatic classification, motion axis estimation, and articulation limits estimation (in order of increasing difficulty).
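As an illustration of how motion axis estimation might be scored, the sketch below computes the angle between a predicted and a ground-truth axis direction. This is an assumption for illustration only: the official T1 evaluation protocol is not specified here, and the function name `axis_angle_error_deg` and the choice to ignore the sign of the axis are hypothetical.

```python
import math
from typing import Sequence


def axis_angle_error_deg(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Angle in degrees between two 3D axis directions.

    Assumption (not from the challenge rules): an axis and its negation
    describe the same line, so the error is sign-invariant and lies in [0, 90].
    """
    dot = sum(p * g for p, g in zip(pred, gt))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_gt = math.sqrt(sum(g * g for g in gt))
    if norm_pred == 0.0 or norm_gt == 0.0:
        raise ValueError("axis directions must be non-zero vectors")
    # Absolute value makes the measure sign-invariant; the clamp guards
    # against floating-point values slightly above 1.
    cos_angle = min(1.0, abs(dot) / (norm_pred * norm_gt))
    return math.degrees(math.acos(cos_angle))


# Example: a prediction pointing the opposite way along the same hinge line
# counts as a perfect match under this sign-invariant convention.
error = axis_angle_error_deg([0.0, 0.0, -2.0], [0.0, 0.0, 1.0])
```

A sign-sensitive variant would drop the absolute value, which matters if the opening direction of a joint is part of the target.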
T2 — Force Estimation
Participants develop methods to estimate contact forces from video alone. This track targets the underexplored problem of physical interaction understanding from vision.
- Method Input: Egocentric Interaction Video.
- Target Metric: RMSE of the estimated interaction force during the demonstrated interaction.
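For reference, a minimal sketch of an RMSE computation over a force trajectory is given below. It assumes that predictions and ground truth are time-aligned sequences of 3D force vectors and that the error per time step is the Euclidean distance between them; the official T2 evaluation may differ (for example, per-axis or magnitude-only RMSE), and the function name `force_rmse` is hypothetical.

```python
import math
from typing import Sequence


def force_rmse(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Root-mean-square error between two force trajectories.

    Assumption (not from the challenge rules): each element is one time step
    holding a force vector, and the squared error of a step is the squared
    Euclidean distance between predicted and measured force.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_error_sum = 0.0
    for p, t in zip(pred, target):
        squared_error_sum += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    # Mean over time steps, then square root.
    return math.sqrt(squared_error_sum / len(pred))


# Example: two time steps, one of which is off by a 3-4-5 force vector.
rmse = force_rmse(
    [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]],
    [[3.0, 4.0, 0.0], [1.0, 2.0, 3.0]],
)
```

The result is in the same unit as the input forces, so predictions and ground truth should share one unit before comparison.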
Workshop Schedule
Preliminary half-day program. Times will be finalized closer to the event.
Organizers










