RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning
Steven A. Senczyszyn, Timothy C. Havens, Nathaniel Rice, Jason E. Summers, Benjamin D. Werner, Benjamin J. Schumeg
Why It Matters
What makes this one worth your time
As reinforcement learning is increasingly applied in safety-critical domains, RL-STPA offers a systematic approach to hazard analysis, aimed at loss scenarios that standard RL evaluations can miss.
Summary
The paper presents RL-STPA, a framework that adapts System-Theoretic Process Analysis for hazard analysis in safety-critical reinforcement learning applications, addressing challenges like black-box policies and distributional shifts.
Key contributions
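- RL-STPA, a framework that adapts conventional STPA's systematic hazard analysis to RL.
- Hierarchical subtask decomposition that uses both temporal phase analysis and domain expertise to capture emergent behaviors.
- Coverage-guided perturbation testing that explores the sensitivity of state-action spaces.
- Iterative checkpoints that feed identified hazards back into training through reward shaping and curriculum design.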
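To make the hazard-feedback idea concrete, here is a minimal sketch of how identified hazards might be turned into reward-shaping penalties. It is not the paper's implementation: the `Hazard` class, the `shaped_reward` function, the state keys (`altitude_m`, `descent_rate_mps`, `tilt_deg`), and all thresholds and penalty values are hypothetical, chosen only to echo the drone-landing test case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Hazard:
    """A hazard found during analysis (hypothetical structure, not from the paper)."""
    name: str
    predicate: Callable[[State], bool]  # returns True when the state is hazardous
    penalty: float                      # amount subtracted from the reward


def shaped_reward(base_reward: float, state: State, hazards: List[Hazard]) -> float:
    """Subtract the penalty of every hazard whose predicate holds in this state."""
    return base_reward - sum(h.penalty for h in hazards if h.predicate(state))


# Illustrative hazards for a landing task; thresholds are made-up assumptions.
hazards = [
    Hazard(
        "excessive_descent_rate_near_ground",
        lambda s: s["altitude_m"] < 2.0 and s["descent_rate_mps"] > 1.5,
        5.0,
    ),
    Hazard(
        "large_tilt_at_touchdown",
        lambda s: s["altitude_m"] < 0.5 and abs(s["tilt_deg"]) > 10.0,
        10.0,
    ),
]

if __name__ == "__main__":
    fast_descent = {"altitude_m": 1.0, "descent_rate_mps": 2.0, "tilt_deg": 0.0}
    print(shaped_reward(1.0, fast_descent, hazards))
```

At each checkpoint, newly identified hazards would be appended to the list before training resumes. Curriculum design, the other feedback channel named in the abstract, is not covered by this sketch.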
Notable insights
- Hierarchical subtask decomposition draws on both temporal phase analysis and domain expertise to capture emergent behaviors in RL systems.
- Coverage-guided perturbation testing introduces a method for exploring state-action space sensitivity, which is crucial for understanding RL policy robustness.
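One plausible reading of coverage-guided perturbation testing is sketched below. The abstract does not specify the algorithm, so everything here is an assumption: the grid discretization in `cell_of`, the fuzzing-style loop in `coverage_guided_test`, the scalar action, and the `tol` sensitivity threshold. The sketch perturbs known states, keeps perturbed states that reach unvisited grid cells, and records pairs of nearby states whose actions differ sharply.

```python
import random
from typing import Callable, List, Optional, Sequence, Set, Tuple

Vector = Tuple[float, ...]
Finding = Tuple[Vector, Vector, float]


def cell_of(state: Vector, low: Sequence[float], high: Sequence[float], bins: int) -> Tuple[int, ...]:
    """Map a state to the index of its grid cell (uniform bins per dimension)."""
    idx = []
    for x, lo, hi in zip(state, low, high):
        frac = (x - lo) / (hi - lo)
        idx.append(min(bins - 1, max(0, int(frac * bins))))
    return tuple(idx)


def coverage_guided_test(
    policy: Callable[[Vector], float],  # hypothetical policy with a scalar action
    seeds: List[Vector],
    low: Sequence[float],
    high: Sequence[float],
    bins: int = 5,
    budget: int = 200,
    noise: float = 0.2,
    tol: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[float, List[Finding]]:
    """Return (fraction of grid cells visited, list of sensitive state pairs)."""
    if rng is None:
        rng = random.Random(0)
    corpus = list(seeds)
    visited: Set[Tuple[int, ...]] = {cell_of(s, low, high, bins) for s in corpus}
    findings: List[Finding] = []
    for _ in range(budget):
        base = rng.choice(corpus)
        # Perturb each dimension by up to `noise` of its range, clamped to bounds.
        cand = tuple(
            min(hi, max(lo, x + rng.uniform(-noise, noise) * (hi - lo)))
            for x, lo, hi in zip(base, low, high)
        )
        delta = abs(policy(cand) - policy(base))
        if delta > tol:  # action changed sharply under a bounded perturbation
            findings.append((base, cand, delta))
        cell = cell_of(cand, low, high, bins)
        if cell not in visited:  # keep only inputs that expand coverage
            visited.add(cell)
            corpus.append(cand)
    return len(visited) / bins ** len(low), findings


if __name__ == "__main__":
    # Toy policy: full thrust below 2 m altitude, none above (illustrative only).
    toy_policy = lambda s: 1.0 if s[0] < 2.0 else 0.0
    coverage, findings = coverage_guided_test(
        toy_policy, seeds=[(5.0, 0.0)], low=(0.0, -3.0), high=(10.0, 3.0)
    )
    print(f"coverage={coverage:.2f}, sensitive pairs={len(findings)}")
```

The visited-cell fraction is one simple candidate for a safety coverage metric. A uniform grid scales poorly with state dimension, so a real analysis would likely need a coarser or learned abstraction.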
Possible limitations
- RL-STPA cannot provide formal guarantees for arbitrary neural policies.
Abstract
arXiv:2604.15201v1
As reinforcement learning (RL) deployments expand into safety-critical domains, existing evaluation methods fail to systematically identify hazards arising from the black-box nature of neural network enabled policies and distributional shift between training and deployment. This paper introduces Reinforcement Learning System-Theoretic Process Analysis (RL-STPA), a framework that adapts conventional STPA's systematic hazard analysis to address RL's unique challenges through three key contributions: hierarchical subtask decomposition using both temporal phase analysis and domain expertise to capture emergent behaviors, coverage-guided perturbation testing that explores the sensitivity of state-action spaces, and iterative checkpoints that feed identified hazards back into training through reward shaping and curriculum design. We demonstrate RL-STPA in the safety-critical test case of autonomous drone navigation and landing, revealing potential loss scenarios that can be missed by standard RL evaluations. The proposed framework provides practitioners with a toolkit for systematic hazard analysis, quantitative metrics for safety coverage assessment, and actionable guidelines for establishing operational safety bounds. While RL-STPA cannot provide formal guarantees for arbitrary neural policies, it offers a practical methodology for systematically evaluating and improving RL safety and robustness in safety-critical applications where exhaustive verification methods remain intractable.