Reinforcing privacy reasoning in LLMs via normative simulacra from fiction

Matt Franchi, Madiha Zahrah Choksi, Harold Triedman, Helen Nissenbaum

Published Apr 24, 2026

Editorial review: 6.8

Relevance: 0.465

Freshness: 0.000

Why It Matters

What makes this one worth your time

Improving LLMs' understanding of privacy norms is crucial for their safe deployment in real-world applications where privacy expectations vary by context.

Enhancing LLM privacy reasoning using normative simulacra from fiction.

Summary

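The paper proposes a method to strengthen privacy reasoning in large language models (LLMs) by extracting normative simulacra (structured representations of norms and information flows) from novels and using them for fine-tuning. Training consists of supervised fine-tuning (SFT) followed by GRPO reinforcement learning, with a composite reward that combines programmatic signals with an LLM judge checking whether the model's privacy reasoning is grounded in the held-out normative universe of the source text. The approach is evaluated across seven models on five benchmarks aligned with Contextual Integrity (CI), the framework that defines privacy as the appropriate flow of information under context-relative norms. SFT alone makes models more conservative about information flow, improving recognition of privacy-relevant situations but not the correctness of their judgments; GRPO with normative grounding achieves the highest score on a law compliance benchmark and the strongest correlation with crowdsourced human privacy expectations.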
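The abstract describes normative simulacra only as structured representations of norms and information flows. A minimal sketch of such a record, assuming a schema built on the five CI flow parameters (sender, recipient, subject, information type, transmission principle); the class names, fields, and exact-match lookup are hypothetical and are not the paper's schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class InformationFlow:
    """One information flow, described by the five CI parameters."""
    sender: str
    recipient: str
    subject: str
    information_type: str
    transmission_principle: str


@dataclass(frozen=True)
class Norm:
    """A context-relative norm: a flow pattern and whether that flow is appropriate."""
    context: str
    flow: InformationFlow
    permitted: bool


@dataclass
class NormativeSimulacrum:
    """Hypothetical record extracted from a passage of a novel."""
    source_title: str
    context: str
    norms: List[Norm] = field(default_factory=list)
    observed_flows: List[InformationFlow] = field(default_factory=list)

    def matching_norm(self, flow: InformationFlow) -> Optional[Norm]:
        # Exact match on all five parameters within this simulacrum's context;
        # norms extracted from real text would need fuzzier matching.
        for norm in self.norms:
            if norm.context == self.context and norm.flow == flow:
                return norm
        return None


disclosure = InformationFlow("physician", "patient", "patient", "diagnosis", "confidentiality")
leak = InformationFlow("physician", "employer", "patient", "diagnosis", "gossip")
sim = NormativeSimulacrum(
    source_title="(hypothetical novel)",
    context="medical care",
    norms=[Norm("medical care", disclosure, True)],
    observed_flows=[disclosure, leak],
)
print(sim.matching_norm(disclosure).permitted)  # True
print(sim.matching_norm(leak))  # None
```

Keeping flows as frozen dataclasses makes them hashable and comparable by value, so two flows with identical parameters are treated as the same flow.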

Key contributions

Introduction of normative simulacra from fiction for LLM fine-tuning.
Development of a composite reward function that combines programmatic signals with an LLM judge of normative grounding.
Evaluation on five CI-aligned benchmarks across seven models, with ablations of the contributions of RL and normative grounding.
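The reward components named in the abstract can be pictured as a weighted sum. This is a sketch under stated assumptions: every signal is taken to lie in [0, 1], task clarity is an unweighted mean of its three sub-signals, and the values in `WEIGHTS` are placeholders, since the abstract does not say how the components are combined.

```python
from dataclasses import dataclass


@dataclass
class ProgrammaticSignals:
    """Per-completion programmatic signals, each assumed normalized to [0, 1]."""
    schema_validity: float
    construct_discrimination: float
    extraction_confidence: float
    structural_completeness: float
    internal_consistency: float
    context_identification: float

    def task_clarity(self) -> float:
        # Task clarity subsumes three sub-signals; an unweighted mean is assumed here.
        return (self.schema_validity + self.construct_discrimination
                + self.extraction_confidence) / 3.0


# Placeholder weights (summing to 1); the actual weighting is not given in the abstract.
WEIGHTS = {
    "task_clarity": 0.2,
    "structural_completeness": 0.15,
    "internal_consistency": 0.15,
    "context_identification": 0.1,
    "judge": 0.4,
}


def composite_reward(signals: ProgrammaticSignals, judge_score: float) -> float:
    """Weighted sum of programmatic signals and the LLM judge's grounding score."""
    components = {
        "task_clarity": signals.task_clarity(),
        "structural_completeness": signals.structural_completeness,
        "internal_consistency": signals.internal_consistency,
        "context_identification": signals.context_identification,
        "judge": judge_score,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


perfect_structure = ProgrammaticSignals(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
print(composite_reward(perfect_structure, judge_score=0.0))
print(composite_reward(perfect_structure, judge_score=1.0))
```

With these placeholder weights, a structurally perfect completion whose reasoning the judge finds ungrounded still earns about 0.6, which shows why the judge term needs enough weight to keep the policy from optimizing format alone.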

Notable insights

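Fiction novels serve as a source of normative simulacra for teaching privacy reasoning.
Per-completion contrastive scoring mitigates overfitting to source-specific norms: each completion is judged against both the correct normative universe and a randomly selected wrong one.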
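Per-completion contrastive scoring can be sketched as the gap between the judge's score under the correct normative universe and its score under a randomly drawn wrong one. Taking the difference is an assumption (the abstract does not say how the two scores are combined), and `toy_judge` is a hypothetical keyword-overlap stand-in for the LLM judge:

```python
import random
from typing import Callable, Sequence

# Signature of the judge: (completion, normative_universe) -> score in [0, 1].
Judge = Callable[[str, str], float]


def toy_judge(completion: str, universe: str) -> float:
    """Hypothetical stand-in for the LLM judge: share of universe keywords mentioned."""
    keywords = set(universe.lower().split())
    words = set(completion.lower().split())
    return len(keywords & words) / len(keywords) if keywords else 0.0


def contrastive_score(
    completion: str,
    correct_universe: str,
    all_universes: Sequence[str],
    judge: Judge,
    rng: random.Random,
) -> float:
    """Judge score under the correct universe minus the score under a random wrong one."""
    wrong_pool = [u for u in all_universes if u != correct_universe]
    if not wrong_pool:
        raise ValueError("need at least one other normative universe to contrast against")
    wrong_universe = rng.choice(wrong_pool)
    # Reasoning that fits every universe equally well (generic or memorized norms)
    # scores near zero; reasoning that conditions on the right context scores higher.
    return judge(completion, correct_universe) - judge(completion, wrong_universe)


universes = [
    "medical confidentiality physician",
    "school records teacher",
    "royal court gossip",
]
score = contrastive_score(
    "the physician keeps the diagnosis under confidentiality",
    universes[0],
    universes,
    toy_judge,
    random.Random(0),
)
print(round(score, 3))  # 0.667
```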

Possible limitations

Not stated in the abstract

Abstract

arXiv:2604.20904v1

Information handling practices of LLM agents are broadly misaligned with the contextual privacy expectations of their users. Contextual Integrity (CI) provides a principled framework, defining privacy as the appropriate flow of information within context-relative norms. However, existing approaches either double inference cost via supervisor-assistant architectures, or fine-tune on narrow task-specific data. We propose extracting normative simulacra (structured representations of norms and information flows) from fiction novels and using them to fine-tune LLMs via supervised learning followed by GRPO reinforcement learning. Our composite reward function combines programmatic signals, including task clarity (subsuming schema validity, construct discrimination, and extraction confidence), structural completeness, internal consistency, and context identification, with an LLM judge that evaluates whether the model's privacy reasoning is grounded in the held-out normative universe of the source text. To mitigate overfitting, we introduce per-completion contrastive scoring: each completion is evaluated against both the correct normative universe and a randomly selected wrong one, teaching the model to condition on context rather than memorize source-specific norms. We evaluate on five CI-aligned benchmarks spanning distinct societal contexts and ablate the contributions of RL and normative grounding. Across seven models, SFT introduces a conservative prior toward restricting information flow, improving recognition of privacy-relevant situations but not the correctness of privacy judgments. GRPO with normative grounding achieves the highest score on a law compliance benchmark and strongest correlation with crowdsourced human privacy expectations, demonstrating that fiction-derived normative simulacra can teach contextual privacy reasoning that transfers to real-world domains.
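GRPO scores a group of sampled completions for the same prompt and standardizes each reward against the group's mean and standard deviation, so no learned value function is needed. The sketch below covers only that advantage step, assuming each completion already has a scalar reward such as the composite one described in the abstract. It omits the clipped policy-gradient update and KL penalty, and it uses the population standard deviation, a choice that varies between implementations.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize each completion's reward within its sampled group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ here
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Rewards for four sampled completions of the same prompt (illustrative values).
advantages = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
print([round(a, 2) for a in advantages])
```

Because advantages are relative within the group, a prompt whose completions all receive the same reward contributes no learning signal.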