MerNav
MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation
Abstract
Visual Language Navigation (VLN) is a fundamental capability for embodied intelligence and a critical open challenge. However, existing methods remain unsatisfactory in both success rate (SR) and generalization: Supervised Fine-Tuning (SFT) approaches typically achieve higher SR, while Training-Free (TF) approaches often generalize better, and it is difficult to obtain both at once. To this end, we propose a Memory-Execute-Review framework. It consists of three parts: a hierarchical memory module that provides information support, an execute module for routine decision-making and actions, and a review module that handles abnormal situations and corrects behavior. We validate the framework on the Object Goal Navigation task. Across four datasets, our average SR achieves absolute improvements of 7% and 5% over all baseline methods under the TF and Zero-Shot (ZS) settings, respectively. On the most commonly used dataset, HM3D_v0.1, and the more challenging open-vocabulary dataset, HM3D_OVON, SR improves by 8% and 6%, respectively, under the ZS setting. Furthermore, on the MP3D and HM3D_OVON datasets, our method not only outperforms all TF methods but also surpasses all SFT methods, leading in both SR (5% and 2%) and generalization. Additionally, we deployed the MerNav model on a humanoid robot and conducted experiments in the real world. Project page: https://qidekang.github.io/MerNav.github.io/
Approach
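As a rough illustration of how the three modules named in the abstract could fit together, the sketch below wires a memory, an execute step, and a review step into one control loop. It is not the MerNav implementation: every class name, function name, action string, and decision rule here (for example, treating repeated identical observations as the "abnormal situation") is a hypothetical stand-in chosen only to make the loop runnable.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchicalMemory:
    """Hypothetical two-level memory: a short recent window plus older history."""
    short_term: list = field(default_factory=list)
    long_term: list = field(default_factory=list)
    window: int = 3  # assumed size of the recent-observation window

    def store(self, observation: str) -> None:
        # Newest observation goes to short-term; overflow moves to long-term.
        self.short_term.append(observation)
        if len(self.short_term) > self.window:
            self.long_term.append(self.short_term.pop(0))


def execute(goal: str, memory: HierarchicalMemory) -> str:
    """Routine decision: stop if the goal appears in the latest observation."""
    latest = memory.short_term[-1]
    return "stop" if goal in latest else "move_forward"


def review(memory: HierarchicalMemory):
    """Abnormal-case check (assumed rule): a full window of identical
    observations is treated as being stuck, and a turn is suggested."""
    recent = memory.short_term
    if len(recent) == memory.window and len(set(recent)) == 1:
        return "turn_left"
    return None


def navigate(observations, goal):
    """Run the memory -> execute -> review loop over a list of observations."""
    memory = HierarchicalMemory()
    actions = []
    for obs in observations:
        memory.store(obs)
        action = execute(goal, memory)
        correction = review(memory)
        # The review step overrides routine actions, but never a stop.
        if action != "stop" and correction is not None:
            action = correction
        actions.append(action)
        if action == "stop":
            break
    return actions
```

For the toy input `navigate(["wall", "wall", "wall", "hallway", "plants on table"], "plants")`, the routine step moves forward twice, the review step replaces the third action with a turn once the window holds three identical observations, and the loop stops when the goal string appears. In a real system the string observations would be replaced by perception outputs and the rules by model calls.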
Experiment
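The abstract reports gains as absolute improvements in success rate (SR), meaning differences in percentage points rather than relative ratios. The minimal sketch below shows that arithmetic; the function names are hypothetical and the episode outcomes are made-up toy values, not results from the paper.

```python
def success_rate(outcomes):
    """Fraction of episodes that ended in success (True)."""
    return sum(outcomes) / len(outcomes)


def absolute_improvement(sr_new, sr_baseline):
    """Difference between two success rates, in percentage points."""
    return (sr_new - sr_baseline) * 100


def relative_improvement(sr_new, sr_baseline):
    """Gain expressed as a percentage of the baseline success rate."""
    return (sr_new - sr_baseline) / sr_baseline * 100


# Toy episode outcomes for illustration only (not data from the paper).
baseline_outcomes = [True, False, True, False]
new_outcomes = [True, True, True, False]

sr_baseline = success_rate(baseline_outcomes)  # 0.5
sr_new = success_rate(new_outcomes)            # 0.75
```

With these toy values the absolute improvement is 25 percentage points while the relative improvement is 50%, which is why the two should not be confused when comparing reported numbers.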
Real-World Visualization
Real Case 1, Object Goal: Plants
MerNav's decision-making process for finding object goal 'plants', https://github.com/QiDekang/MerNav.github.io/blob/main/images/real_case/palnt/plants.png
Real Case 2, Object Goal: Football
MerNav's decision-making process for finding object goal 'football', https://github.com/QiDekang/MerNav.github.io/blob/main/images/real_case/football/football.png
BibTeX Citation
@article{qi2026mernav,
title={MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation},
author={Qi, Dekang and Zeng, Shuang and Chang, Xinyuan and Xiong, Feng and Xie, Shichao and Wu, Xiaolong and Xu, Mu},
journal={arXiv preprint arXiv:2602.05467},
year={2026}
}