HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

Yifei Dong *

University of Washington

Fengyi Wu *

University of Washington

Qi He *

University of Washington

Heng Li

University of Washington

Minghan Li

Galbot

Zebang Cheng

University of Washington

Yuxuan Zhou

University of Mannheim

Jingdong Sun

Carnegie Mellon University

Qi Dai

Microsoft Research

Zhi-Qi Cheng

University of Washington

Alexander G. Hauptmann

Carnegie Mellon University

*Equal contribution. Work done during an internship at UW. Corresponding author.

HA-VLN Navigation Scenario: An agent navigates environments with dynamic human activities, avoiding collisions by adjusting its path based on instructions and observations. Agent positions (e.g., ①, ②) align with instruction segments that reference human movements. Agent decisions (e.g., A, B) represent actions taken in response to observed activities. For instance, at position ②, Decision A, the agent encounters a person on the phone and turns right to avoid a collision. RGB and depth observations (right side) show the agent's views preceding Decisions A, B, and C, capturing the agent's dynamic responses to human actions. (Zoom in for more details.)

Abstract

Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone, overlooking the complexities of human-populated, dynamic environments. We introduce a unified Human-Aware VLN (HA-VLN) benchmark that merges these paradigms under explicit social-awareness constraints. Our contributions include: (1) a standardized task definition that balances discrete-continuous navigation with personal-space requirements; (2) an enhanced human motion dataset (HAPS 2.0) and upgraded simulators capturing realistic multi-human interactions, outdoor contexts, and refined motion-language alignment; (3) extensive benchmarking on 16,844 human-centric instructions, revealing how multi-human dynamics and partial observability pose substantial challenges for leading VLN agents; (4) real-world robot tests validating sim-to-real transfer in crowded indoor spaces; and (5) a public leaderboard supporting transparent comparisons across discrete and continuous tasks. Empirical results show improved navigation success and fewer collisions when social context is integrated, underscoring the need for human-centric design. By releasing all datasets, simulators, agent code, and evaluation tools, we aim to advance safer, more capable, and socially responsible VLN research.
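To make the social-awareness constraints concrete, below is a minimal sketch of how per-episode human-aware metrics could be computed from logged agent and human positions. The function name, the 3 m success radius, and the 1 m personal-space threshold are illustrative assumptions following common VLN practice, not the benchmark's exact metric definitions.

    import numpy as np

    def evaluate_episode(agent_xy, human_xy_per_step, goal_xy,
                         success_radius=3.0, personal_space=1.0):
        # agent_xy: (T, 2) agent positions per step; goal_xy: (2,) goal position.
        # human_xy_per_step: list of length T, each entry an array of human positions.
        # All names and thresholds here are assumptions for illustration.
        agent_xy = np.asarray(agent_xy)
        goal_xy = np.asarray(goal_xy)
        # Closest distance to any human at each step (inf if no humans are present).
        min_dists = np.array([
            np.min(np.linalg.norm(np.asarray(humans) - pos, axis=1))
            if len(humans) else np.inf
            for pos, humans in zip(agent_xy, human_xy_per_step)
        ])
        violations = min_dists < personal_space    # steps inside personal space
        success = np.linalg.norm(agent_xy[-1] - goal_xy) < success_radius
        return {
            "success": bool(success),
            "violation_rate": float(violations.mean()),   # fraction of steps too close
            "had_collision": bool(violations.any()),
        }

Under this sketch, an episode that ends within 3 m of the goal but passes within 1 m of a pacing human would count as successful while still being flagged for a personal-space violation, which is the kind of trade-off the benchmark is designed to expose.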

HA-VLN Simulator

The HA-VLN-CE simulator incorporates dynamic human activities into photorealistic Habitat environments. The annotation process includes: (1) integrating the HAPS 2.0 dataset, with 172 activities and 486 detailed 3D motion models spanning 58,320 frames; (2) two-stage annotation (Stage 1: coarse-to-fine placement using a particle swarm optimization (PSO) algorithm and multi-view cameras; Stage 2: human-in-the-loop refinement of multi-human interactions and movements); (3) real-time rendering via a signaling mechanism; and (4) agent-environment interaction.
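For intuition, the sketch below shows what an agent-environment loop with continuously moving humans might look like. The `reset`/`step` method names and the `sim` object are placeholders assumed for illustration; the released simulator builds on Habitat and its actual interface may differ.

    import random
    from typing import Any, Dict

    class RandomAgent:
        """Toy policy: picks a discrete action regardless of the observation."""
        ACTIONS = ["MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT", "STOP"]

        def act(self, obs: Dict[str, Any]) -> str:
            return random.choice(self.ACTIONS)

    def run_episode(sim, instruction: str, max_steps: int = 500):
        """Roll out one episode; `sim` is any object exposing reset()/step()."""
        agent = RandomAgent()
        obs = sim.reset(instruction)             # RGB-D frames plus the instruction
        info = {}
        for _ in range(max_steps):
            action = agent.act(obs)
            obs, done, info = sim.step(action)   # humans keep moving every step
            if done or action == "STOP":
                break
        return info                              # e.g., success flag, collision count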

Explore the Simulator

We present several annotated human subjects from the proposed HAPS 2.0 dataset (overall scenes and single humans), showcasing a variety of well-aligned motions, movements, and interactions.

Single Humans with Movements (910 humans in total)
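As a rough illustration of how a single annotated human could be represented, the hypothetical record below groups an activity label, placement, waypoints, and motion frames. The field names are assumptions for exposition; the released HAPS 2.0 files define the actual schema.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class HumanMotion:
        human_id: int                        # one of the 910 annotated humans
        activity: str                        # e.g., "talking on the phone while pacing"
        region: str                          # scene region, e.g., "lounge"
        start_xyz: Tuple[float, float, float]
        waypoints: List[Tuple[float, float, float]] = field(default_factory=list)
        motion_file: str = ""                # path to the 3D motion sequence
        num_frames: int = 0                  # frames contributed to the 58,320 total

    def total_frames(humans: List[HumanMotion]) -> int:
        """Sanity check: per-human frames should sum to the dataset total."""
        return sum(h.num_frames for h in humans)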

Visualization of agent trajectories

Navigation Instruction: Start by moving forward in the lounge area, where an individual is engaged in a phone conversation while pacing back and forth. Navigate carefully to avoid crossing their path. As you proceed, you will pass by a television mounted on the wall. Continue your movement, observing people relaxing and watching the TV, some seated comfortably on sofas. Further along, notice a group of friends raising their glasses in a toast, enjoying cocktails together. Maintain a steady course, ensuring you do not disrupt their gathering. Finally, reach the end of your path where a potted plant is situated next to a door. Stop at this location, positioning yourself near the plant and door without obstructing access.
Navigation Instruction: Exit the room and make a left turn. Proceed down the hallway where an individual is ironing clothes, carefully smoothing out wrinkles on garments. Continue walking and make another left turn. Enter the next room, which is a bedroom. Inside, someone is comfortably seated in bed, engrossed in reading a book. Move past the bed, ensuring not to disturb the reader. Turn left again to enter the bathroom. Once inside, position yourself near the sink and wait there, observing the surroundings without interfering with any activities.

Datasets

Download Here!

Validation on Real-world Robots

To validate the performance of our navigation agents in real-world scenarios, we conducted experiments using a Unitree GO2-EDU quadruped robot.

Examples of the robot navigating in different real environments.
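For sim-to-real transfer, the agent's discrete actions have to be translated into continuous velocity commands on the robot. The sketch below shows one plausible mapping; the 0.25 m step, 15° turn, and speed values are assumptions following common VLN settings, and the actual GO2-EDU controller and SDK calls are not shown.

    import math

    # Assumed discrete-action geometry; not the paper's exact values.
    FORWARD_STEP_M = 0.25              # meters per MOVE_FORWARD
    TURN_ANGLE_RAD = math.radians(15)  # radians per turn action
    LINEAR_SPEED = 0.3                 # m/s, conservative indoor speed
    ANGULAR_SPEED = 0.5                # rad/s

    def action_to_velocity_command(action: str):
        """Return (vx, yaw_rate, duration) for one discrete VLN action."""
        if action == "MOVE_FORWARD":
            return LINEAR_SPEED, 0.0, FORWARD_STEP_M / LINEAR_SPEED
        if action == "TURN_LEFT":
            return 0.0, ANGULAR_SPEED, TURN_ANGLE_RAD / ANGULAR_SPEED
        if action == "TURN_RIGHT":
            return 0.0, -ANGULAR_SPEED, TURN_ANGLE_RAD / ANGULAR_SPEED
        return 0.0, 0.0, 0.0           # STOP or unknown action: hold still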

BibTeX citation

    @misc{dong2025havlnbenchmarkhumanawarenavigation,
      title={HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard},
      author={Yifei Dong and Fengyi Wu and Qi He and Heng Li and Minghan Li and Zebang Cheng and Yuxuan Zhou and Jingdong Sun and Qi Dai and Zhi-Qi Cheng and Alexander G Hauptmann},
      year={2025},
      eprint={2503.14229},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2503.14229},
    }