
AI Integrity: A New Paradigm for Verifiable AI Governance

Seulki Lee

Score: 8.500
LLM: n/a
Embedding: 0.479
Recency: n/a

Why It Matters

This paper matters because it shifts the focus from outcome-based evaluation to a procedural approach that emphasizes transparency and auditability of an AI system's reasoning, potentially enabling more trustworthy AI applications in high-stakes sectors such as healthcare, law, defense, and education.

Contributions

  • Introduction of AI Integrity as a new governance paradigm
  • Development of the Authority Stack model
  • Specification of the PRISM framework for measuring reasoning integrity

Insights

  • AI Integrity emphasizes the importance of protecting the reasoning process from corruption and bias, rather than just evaluating outcomes.

Limitations

  • The concept may face challenges in practical implementation due to the complexity of auditing AI systems' reasoning processes.

Tags

  • alignment
  • interpretability
  • security

Abstract

arXiv:2604.11065v1

AI systems increasingly shape high-stakes decisions in healthcare, law, defense, and education, yet existing governance paradigms -- AI Ethics, AI Safety, and AI Alignment -- share a common limitation: they evaluate outcomes rather than verifying the reasoning process itself. This paper introduces AI Integrity, a concept defined as a state in which the Authority Stack of an AI system -- its layered hierarchy of values, epistemological standards, source preferences, and data selection criteria -- is protected from corruption, contamination, manipulation, and bias, and maintained in a verifiable manner. We distinguish AI Integrity from the three existing paradigms, define the Authority Stack as a 4-layer cascade model (Normative, Epistemic, Source, and Data Authority) grounded in established academic frameworks -- Schwartz Basic Human Values for normative authority, Walton argumentation schemes with GRADE/CEBM hierarchies for epistemic authority, and Source Credibility Theory for source authority -- characterize the distinction between legitimate cascading and Authority Pollution, and identify Integrity Hallucination as the central measurable threat to value consistency. We further specify the PRISM (Profile-based Reasoning Integrity Stack Measurement) framework as the operational methodology, defining six core metrics and a phased research roadmap. Unlike normative frameworks that prescribe which values are correct, AI Integrity is a procedural concept: it requires that the path from evidence to conclusion be transparent and auditable, regardless of which values a system holds.
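
The abstract describes the Authority Stack only conceptually, so the following is a minimal sketch of the idea, assuming a strict top-down ordering of the four layers. The class, function, and layer-rank representation below are illustrative placeholders, not the paper's formalization. It shows one way to tell legitimate cascading (authority flowing downward through the stack) apart from Authority Pollution (a lower layer overriding a higher one).

  # Hypothetical sketch of the 4-layer Authority Stack as a cascade.
  # Layer names follow the abstract; everything else is an assumption.
  from dataclasses import dataclass, field

  LAYERS = ["normative", "epistemic", "source", "data"]  # top -> bottom

  @dataclass
  class Claim:
      content: str
      # The layer at which each supporting judgment was made, in order applied.
      provenance: list = field(default_factory=list)

  def legitimate_cascade(claim: Claim) -> bool:
      """A cascade is legitimate when authority flows strictly top-down:
      each judgment is made at the same layer as or below the previous one."""
      ranks = [LAYERS.index(layer) for layer in claim.provenance]
      return all(b >= a for a, b in zip(ranks, ranks[1:]))

  def authority_pollution(claim: Claim) -> bool:
      """Pollution: a lower layer (e.g. raw data selection) has overridden
      a higher layer (e.g. normative values) somewhere in the chain."""
      return not legitimate_cascade(claim)

  clean = Claim("treatment X is preferred",
                ["normative", "epistemic", "source", "data"])
  polluted = Claim("treatment X is preferred",
                   ["data", "normative"])  # data selection drove the values
  assert legitimate_cascade(clean) and authority_pollution(polluted)

Representing provenance as an ordered trace is what makes the check auditable in this toy model, which matches the paper's procedural framing: the path from evidence to conclusion is inspected, not the conclusion itself.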