Vision-and-Language Navigation (VLN) has been studied mainly in either discrete or continuous settings, with little attention to dynamic, crowded environments. We present HA-VLN 2.0, a unified benchmark that introduces explicit social-awareness constraints. Our contributions are: (i) a standardized task and metrics capturing both goal accuracy and personal-space adherence; (ii) the HAPS 2.0 dataset and simulators, which model multi-human interactions, outdoor contexts, and finer language–motion alignment; (iii) benchmarks of leading agents on 16,844 socially grounded instructions, revealing sharp performance drops under human dynamics and partial observability; and (iv) real-world robot experiments validating sim-to-real transfer, together with an open leaderboard enabling transparent comparison. Results show that explicit social modeling improves navigation robustness and reduces collisions, underscoring the need for human-centric approaches. By releasing datasets, simulators, baselines, and protocols, HA-VLN 2.0 provides a strong foundation for safe, socially responsible navigation research.
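The metric pairing in contribution (i), goal accuracy together with personal-space adherence, can be made concrete with a small scoring sketch. The snippet below is a minimal illustration, not the benchmark's official metric code: the success radius, personal-space threshold, array layout, and the function name `navigation_metrics` are all assumptions made for this example.

```python
import numpy as np

def navigation_metrics(agent_xy, humans_xy, goal_xy,
                       success_radius=3.0, personal_space=1.0):
    """Illustrative metrics: goal success and personal-space adherence.

    agent_xy  : (T, 2) agent positions over T timesteps
    humans_xy : (T, H, 2) positions of H humans at each timestep
    goal_xy   : (2,) goal position
    Thresholds (in meters) are assumed values, not HA-VLN's official ones.
    """
    agent_xy = np.asarray(agent_xy, dtype=float)
    humans_xy = np.asarray(humans_xy, dtype=float)
    goal_xy = np.asarray(goal_xy, dtype=float)

    # Goal accuracy: does the final position land within the success radius?
    success = np.linalg.norm(agent_xy[-1] - goal_xy) <= success_radius

    # Personal-space adherence: fraction of timesteps where the agent comes
    # closer than `personal_space` to any human.
    dists = np.linalg.norm(humans_xy - agent_xy[:, None, :], axis=-1)  # (T, H)
    violations = (dists < personal_space).any(axis=1)                  # (T,)
    violation_rate = violations.mean()

    return {"success": bool(success),
            "personal_space_violation_rate": float(violation_rate)}


# Example with synthetic trajectories (2 timesteps, 1 human).
if __name__ == "__main__":
    agent = [[0.0, 0.0], [2.5, 0.0]]
    humans = [[[5.0, 5.0]], [[2.8, 0.0]]]
    goal = [3.0, 0.0]
    print(navigation_metrics(agent, humans, goal))
```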
Overall View of Nine Annotated Scenarios from the HA-VLN Simulator (90 scans in total)
Single Humans with Movements (910 humans in total)
Visualization of the Agent's Trajectories
@misc{dong2025havlnbenchmarkhumanawarenavigation,
      title={HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard},
      author={Yifei Dong and Fengyi Wu and Qi He and Heng Li and Minghan Li and Zebang Cheng and Yuxuan Zhou and Jingdong Sun and Qi Dai and Zhi-Qi Cheng and Alexander G Hauptmann},
      year={2025},
      eprint={2503.14229},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2503.14229},
}