MerNav

MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation

Dekang Qi¹, Shuang Zeng¹,², Xinyuan Chang¹, Feng Xiong¹, Shichao Xie¹, Xiaolong Wu¹, Mu Xu¹

¹Amap, Alibaba Group, ²Xi'an Jiaotong University

Abstract

Visual Language Navigation (VLN) is one of the fundamental capabilities for embodied intelligence and a critical open challenge. However, existing methods remain unsatisfactory in terms of both success rate (SR) and generalization: Supervised Fine-Tuning (SFT) approaches typically achieve higher SR, while Training-Free (TF) approaches often generalize better, and it is difficult to obtain both at once. To this end, we propose a Memory-Execute-Review framework. It consists of three parts: a hierarchical memory module that provides information support, an execute module for routine decision-making and actions, and a review module that handles abnormal situations and corrects behavior. We validate the effectiveness of this framework on the Object Goal Navigation task. Across 4 datasets, our average SR achieves absolute improvements of 7% and 5% over all baseline methods under the TF and Zero-Shot (ZS) settings, respectively. On the most commonly used dataset, HM3D_v0.1, and the more challenging open-vocabulary dataset, HM3D_OVON, the SR improves by 8% and 6%, respectively, under the ZS setting. Furthermore, on the MP3D and HM3D_OVON datasets, our method not only outperforms all TF methods but also surpasses all SFT methods, leading in both SR (by 5% and 2%, respectively) and generalization. Additionally, we deployed the MerNav model on a humanoid robot and conducted experiments in the real world. The project address is: https://qidekang.github.io/MerNav.github.io/

Approach

The MerNav (Memory-Execute-Review) framework. The Memory module provides priors and informational support for decision-making. Under nominal conditions, the Execute module handles task completion. Meanwhile, the Review module continuously monitors the execution process from an independent perspective and, upon detecting anomalies or deviations, triggers the corresponding corrective mode to rectify behavior.
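The division of labor among the three modules can be illustrated with a toy control loop. This is only a minimal sketch under stated assumptions, not MerNav's implementation: the `Memory` class, the `execute`, `review`, and `navigate` functions, the string observations, the action names, and the "repeated observation means stuck" anomaly rule are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Memory:
    """Hypothetical stand-in for the memory module: a flat observation log."""
    observations: List[str] = field(default_factory=list)

    def add(self, obs: str) -> None:
        self.observations.append(obs)

    def recent(self, k: int) -> List[str]:
        return self.observations[-k:]


def execute(goal: str, obs: str) -> str:
    """Routine decision (toy rule): stop if the goal is visible, else go forward."""
    return "stop" if goal in obs else "forward"


def review(memory: Memory, window: int = 3) -> Optional[str]:
    """Toy anomaly check: if the last `window` observations are identical,
    assume the agent is stuck and return a corrective action; else None."""
    recent = memory.recent(window)
    if len(recent) == window and len(set(recent)) == 1:
        return "turn_left"
    return None


def navigate(goal: str, observations: List[str]) -> List[str]:
    """Run the loop: memory is updated, Execute proposes, Review may override."""
    memory = Memory()
    actions: List[str] = []
    for obs in observations:
        memory.add(obs)
        action = execute(goal, obs)
        if action != "stop":
            correction = review(memory)
            if correction is not None:
                action = correction  # Review overrides the routine action
        actions.append(action)
        if action == "stop":
            break
    return actions


if __name__ == "__main__":
    steps = ["wall", "wall", "wall", "door", "plants on a shelf"]
    print(navigate("plants", steps))
```

In this sketch the routine policy runs on every step, and the review check only intervenes when its anomaly rule fires, which mirrors the "nominal path plus independent monitor" split described above. A real system would replace the string matching and the stuck heuristic with model-based perception and anomaly detection.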

Experiment

Real-World Visualization

Real Case 1, Object Goal: Plants

MerNav's decision-making process for finding object goal 'plants', https://github.com/QiDekang/MerNav.github.io/blob/main/images/real_case/palnt/plants.png

Real Case 2, Object Goal: Football

MerNav's decision-making process for finding object goal 'football', https://github.com/QiDekang/MerNav.github.io/blob/main/images/real_case/football/football.png

BibTeX Citation

@article{qi2026mernav,
  title={MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation},
  author={Qi, Dekang and Zeng, Shuang and Chang, Xinyuan and Xiong, Feng and Xie, Shichao and Wu, Xiaolong and Xu, Mu},
  journal={arXiv preprint arXiv:2602.05467},
  year={2026}
}