EUDAIMONIA: Evaluating Undesirable Dynamics in AI

Jun Rui Huang, Wang Bill Zhu, Ziyi Liu, Nathanael Fast, Ravi Iyer, Robin Jia

Published Jun 1, 2026

Why It Matters

What makes this one worth your time

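LLMs are increasingly used for companionship, emotional disclosure, and interpersonal advice, and the social dynamics of these interactions can cause harms that capability-oriented and traditional safety evaluations do not capture.

The paper evaluates these risks with a new framework, the Social AI Design Code, and a benchmark, EUDAIMONIA, that operationalizes it.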

Summary

The paper introduces the Social AI Design Code, a framework for evaluating whether large language models (LLMs) align with user welfare in social interactions, including whether they encourage harmful intimacy, dependence, or prolonged engagement. It operationalizes the framework with EUDAIMONIA, a benchmark of 969 user inputs and 3,147 design-requirement violation checks built from WildChat. Across 22 evaluated LLMs, even the strongest models, Claude-Opus-4.7 and GPT-5.5, violate 30.7% and 27.2% of checks, respectively.

Key contributions

Introduction of the Social AI Design Code for evaluating LLMs' alignment with user welfare.
Development of the EUDAIMONIA benchmark for assessing social risks in LLM interactions.
Evaluation of 22 recent LLMs, showing that even the strongest models violate more than a quarter of the checks.

Notable insights

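The benchmark pairs natural user inputs drawn from WildChat with explicit design-requirement violation checks, targeting harms that capability-oriented and traditional safety evaluations do not capture.
Extended thinking does not reduce violation rates, which suggests the failures are persistent social-alignment problems rather than deficits that test-time reasoning alone can solve.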

Possible limitations

Not stated in the abstract

Abstract

arXiv:2605.30654v1

Large language models (LLMs) are increasingly used as conversational partners for companionship, emotional disclosure, and interpersonal advice, but the social dynamics of these interactions can create harms that are not captured by capability-oriented or traditional safety evaluations. We introduce the Social AI Design Code, a framework for evaluating whether LLMs align with user welfare in social interactions, including whether they encourage harmful intimacy, dependence, or prolonged engagement. To evaluate these risks in natural and diverse user-LLM interactions, we operationalize the code with EUDAIMONIA, a benchmark of 969 user inputs and 3,147 design-requirement violation checks built from WildChat through weak-to-strong filtration, multi-model relabeling, and controlled rewriting. Evaluating 22 recent LLMs, we find that even the strongest models, Claude-Opus-4.7 and GPT-5.5, violate 30.7% and 27.2% of checks, respectively. Extended thinking does not reduce violation rates, suggesting that these failures are persistent social-alignment problems rather than deficits solvable through test-time reasoning alone.
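
The abstract reports results as the percentage of design-requirement checks a model violates. A minimal sketch of that arithmetic follows, assuming the rate is pooled over all checks (the abstract does not spell out the aggregation). The `CheckResult` type, the requirement names, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """One design-requirement check applied to one model response (hypothetical schema)."""
    input_id: str       # which user input the response answers
    requirement: str    # illustrative requirement label, not from the paper
    violated: bool      # True if the response violates the requirement


def violation_rate(results):
    """Fraction of checks that were violated, pooled over all checks."""
    if not results:
        raise ValueError("no check results to aggregate")
    return sum(r.violated for r in results) / len(results)


# Toy data: two user inputs, four checks, one violation.
toy = [
    CheckResult("u1", "avoid_fostering_dependence", True),
    CheckResult("u1", "avoid_prolonging_engagement", False),
    CheckResult("u2", "avoid_harmful_intimacy", False),
    CheckResult("u2", "avoid_fostering_dependence", False),
]

rate = violation_rate(toy)
print(f"{rate:.1%}")  # 25.0%
```

For scale, 3,147 checks over 969 user inputs works out to roughly 3.2 checks per input on average. A per-input or per-requirement average would be an equally plausible aggregation and could give a different number on the same data.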