AI Integrity: A New Paradigm for Verifiable AI Governance
Seulki Lee
Feedback
Why It Matters
This paper shifts the focus from outcome-based evaluation of AI systems to a procedural approach that emphasizes transparency and auditability of the reasoning process itself, which could make AI applications in critical sectors such as healthcare and law more trustworthy.
Contributions
- Introduction of AI Integrity as a new governance paradigm
- Development of the Authority Stack model
- Specification of the PRISM framework for measuring reasoning integrity
Insights
- AI Integrity emphasizes the importance of protecting the reasoning process from corruption and bias, rather than just evaluating outcomes.
Limitations
- The concept may face challenges in practical implementation due to the complexity of auditing AI systems' reasoning processes.
Tags
- alignment
- interpretability
- security
Abstract
arXiv:2604.11065v1

AI systems increasingly shape high-stakes decisions in healthcare, law, defense, and education, yet existing governance paradigms -- AI Ethics, AI Safety, and AI Alignment -- share a common limitation: they evaluate outcomes rather than verifying the reasoning process itself. This paper introduces AI Integrity, defined as a state in which the Authority Stack of an AI system -- its layered hierarchy of values, epistemological standards, source preferences, and data selection criteria -- is protected from corruption, contamination, manipulation, and bias, and is maintained in a verifiable manner. We distinguish AI Integrity from the three existing paradigms and define the Authority Stack as a 4-layer cascade model (Normative, Epistemic, Source, and Data Authority) grounded in established academic frameworks: Schwartz Basic Human Values for normative authority, Walton argumentation schemes with GRADE/CEBM evidence hierarchies for epistemic authority, and Source Credibility Theory for source authority. We characterize the distinction between legitimate cascading and Authority Pollution, and identify Integrity Hallucination as the central measurable threat to value consistency. We further specify the PRISM (Profile-based Reasoning Integrity Stack Measurement) framework as the operational methodology, defining six core metrics and a phased research roadmap. Unlike normative frameworks that prescribe which values are correct, AI Integrity is a procedural concept: it requires that the path from evidence to conclusion be transparent and auditable, regardless of which values a system holds.
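The 4-layer cascade described in the abstract can be sketched as a simple data structure. This is an illustrative sketch only: the paper does not publish an implementation, and all class and field names here are assumptions, not part of the PRISM specification.

```python
from dataclasses import dataclass
from enum import Enum

class Layer(Enum):
    """Top-down order of the Authority Stack (per the abstract)."""
    NORMATIVE = 1  # values, e.g. Schwartz Basic Human Values
    EPISTEMIC = 2  # evidence standards, e.g. GRADE/CEBM hierarchies
    SOURCE = 3     # source preferences (Source Credibility Theory)
    DATA = 4       # data selection criteria

@dataclass
class AuthorityLayer:
    layer: Layer
    criteria: list  # hypothetical: the criteria active at this layer

class AuthorityStack:
    """Ordered cascade in which each layer constrains the one below it."""

    def __init__(self, layers):
        # Enforce the fixed top-down order regardless of input order.
        self.layers = sorted(layers, key=lambda a: a.layer.value)

    def cascade(self):
        """Yield layers top-down; legitimate cascading flows this way,
        whereas Authority Pollution would violate this ordering."""
        yield from self.layers
```

For example, a stack built from layers supplied out of order still cascades from Normative down to Data, mirroring the legitimate top-down flow the paper contrasts with Authority Pollution.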