Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation

Nicanor Mayumu, Xiaoheng Deng, Patrick Mukala

Published May 19, 2026

Editorial review: 7.2

Relevance: 0.473

Freshness: 0.000

Why It Matters

What makes this one worth your time

Understanding the limitations of VLA models is crucial for improving their reliability in real-world applications, particularly in safety-critical environments.

The study finds that the rationales a VLA driving model produces often do not match the scene or the trajectory it outputs: in 37.9% of cases where the model claims it will stop, it continues instead.

Summary

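The paper systematically analyzes faithfulness in Vision-Language-Action (VLA) driving models, using 300 Alpamayo-R1-10B inferences across 100 PhysicalAI-AV scenarios. It reports low reasoning fidelity (42.5%), high trajectory fragility under mild visual perturbations (97.7%), and weak reasoning-action consistency (48.3% on average).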

Key contributions

A systematic study of reasoning fidelity in VLA models with quantitative metrics.
Definition of entity and action fidelity with verification criteria.
Outline of a four-component safety architecture based on empirical findings.
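
The summary does not give the paper's exact definitions of entity fidelity. As a rough illustration only, one plausible reading is sketched below: compare the entities a rationale mentions against the entities annotated in the scene. The function name, the inputs, and the precision-style score are all assumptions for this sketch, not the authors' metric.

```python
def entity_fidelity(mentioned, annotated):
    """Score one inference (hypothetical metric, not the paper's definition).

    mentioned: entity labels extracted from the rationale (assumed input)
    annotated: ground-truth entity labels for the scene (assumed input)

    Returns (precision, missed), where precision is the share of mentioned
    entities that are really in the scene (None if nothing was mentioned)
    and missed is the set of annotated entities the rationale never names.
    """
    mentioned = set(mentioned)
    annotated = set(annotated)
    correct = mentioned & annotated
    precision = len(correct) / len(mentioned) if mentioned else None
    missed = annotated - mentioned
    return precision, missed


# Example: the rationale names a truck that is not there and omits a cyclist.
precision, missed = entity_fidelity(
    ["pedestrian", "truck"], ["pedestrian", "cyclist"]
)
print(precision, missed)
```

Reporting the missed set separately from the precision score keeps hallucinated entities and overlooked entities apart, which matters when the overlooked entity is a pedestrian.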

Notable insights

Formalizing faithfulness in information-theoretic terms gives a new lens for evaluating model reliability.
Trajectories that change under mild visual perturbations indicate that the model's robustness is weak.
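
How the paper measures trajectory fragility is not stated in this summary. A minimal sketch, assuming fragility means that the average displacement between the clean and perturbed trajectories exceeds a fixed threshold, could look like the following. The threshold value and both function names are hypothetical.

```python
import math


def average_displacement(traj_a, traj_b):
    """Mean Euclidean distance between matching waypoints of two trajectories."""
    if not traj_a or len(traj_a) != len(traj_b):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(traj_a, traj_b)) / len(traj_a)


def fragility_rate(pairs, threshold_m=1.0):
    """Share of (clean, perturbed) trajectory pairs that differ by more
    than threshold_m on average. threshold_m is an assumed value."""
    fragile = sum(
        average_displacement(clean, perturbed) > threshold_m
        for clean, perturbed in pairs
    )
    return fragile / len(pairs)


# Example: one pair is unchanged, the other shifts sideways by 2 m.
pairs = [
    ([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 0.0)]),
    ([(0.0, 0.0), (1.0, 0.0)], [(0.0, 2.0), (1.0, 2.0)]),
]
print(fragility_rate(pairs))
```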

Possible limitations

The abstract does not address potential implications of the findings on model design or training methodologies.
Specific details on the methodology used for analysis are not provided.

Abstract

arXiv:2605.17268v1

We present the first systematic study of faithfulness in Vision-Language-Action (VLA) driving models, analyzing 300 Alpamayo-R1-10B inferences across 100 diverse PhysicalAI-AV scenarios. Our main finding is that output natural-language rationales with trajectories may be significantly unfaithful: (i) overall reasoning fidelity is only 42.5%, with Chain-of-Causation matching scene reality less than half the time; (ii) 94 missed pedestrians in one-third of pedestrian-relevant scenes; (iii) 97.7% trajectory fragility under mild visual perturbations; and (iv) only 48.3% mean reasoning-action consistency, with 53.3% of inferences exhibiting low consistency, including 37.9% of stop-claimed cases where the model continues instead. We formalize faithfulness information-theoretically, define entity and action fidelity with verification criteria, and outline a four-component safety architecture aligned with these results.
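
Finding (iv) describes cases where the rationale claims a stop but the trajectory keeps going. The abstract does not say how this is checked, so the sketch below is only one way such a check might be written: estimate the speed at the end of the predicted trajectory and flag the inference if a stop was claimed but that speed is above a threshold. The waypoint format, the time step `dt`, the threshold `stop_speed`, and the function names are assumptions.

```python
import math


def final_speed(waypoints, dt):
    """Speed over the last trajectory segment, in distance units per second.

    waypoints: list of (x, y) positions sampled every dt seconds (assumed format)
    """
    if len(waypoints) < 2:
        raise ValueError("need at least two waypoints")
    (x0, y0), (x1, y1) = waypoints[-2], waypoints[-1]
    return math.hypot(x1 - x0, y1 - y0) / dt


def stop_claim_violated(claims_stop, waypoints, dt=0.1, stop_speed=0.5):
    """True if the rationale claims a stop but the vehicle is still moving
    faster than stop_speed at the end of the trajectory."""
    return claims_stop and final_speed(waypoints, dt) > stop_speed


# Example: the rationale says "stop", but the car still covers 1 m per 0.1 s.
moving = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
braking = [(0.0, 0.0), (0.5, 0.0), (0.51, 0.0)]
print(stop_claim_violated(True, moving))
print(stop_claim_violated(True, braking))
```

Looking only at the final segment is a deliberate simplification; a fuller check would also need to confirm that the vehicle stops before the relevant stop line or obstacle.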