Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities

Nuan Wen, Xuezhe Ma

Published May 28, 2026

Editorial review: 7.2

Relevance: 0.482

Freshness: 0.000

Why It Matters

What makes this one worth your time

Understanding the limitations of LLMs in simulating community attitudes is crucial for improving their application in social analysis and ensuring more accurate representations of human behavior.

CARE benchmarks LLM discourse against real community reactions to assess alignment with sociolinguistic dynamics.

Summary

The paper introduces CARE, a framework for evaluating how well large language models simulate community reactions to real-world events by analyzing illocutionary tones and the attitudes behind them. It reveals a 'realism gap': steering models with explicit community prompts does not inherently improve simulation fidelity.

Key contributions

Introduction of the CARE framework for evaluating LLM discourse.
Characterization of a spectrum of illocutionary tones in community reactions.
Identification of a 'realism gap' in LLM simulations of community responses.

Notable insights

The framework is reaction-centered, which may yield more nuanced evaluations of LLM performance than approaches that reduce social identity to static labels.
The identification of divergent behavioral signatures among models suggests that alignment strategies need to be re-evaluated for better sociolinguistic fidelity.

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2605.27388v1

Large language models (LLMs) are increasingly utilized as proxies for computational social analysis; yet, their ability to faithfully represent the "thick descriptions" (Geertz, 1973) of human communities remains a critical challenge. Current evaluations often reduce social identity to static labels, sidelining how real-world groups navigate social shifts. To bridge this gap, we introduce CARE (Community-Aware Reaction Evaluation), a reaction-centered framework that benchmarks LLM-simulated discourse against the authentic, event-contingent responses of distinct communities to real-world news. By characterizing a fine-grained spectrum of illocutionary tones and the underlying attitudes they manifest, validated through human-AI collaboration, our diagnosis reveals a persistent "realism gap": steering LLMs with explicit community prompts fails to inherently improve simulation fidelity. Analysis further identifies divergent behavioral signatures among frontier models, suggesting that current alignment strategies remain insufficient for capturing the sociolinguistic dynamics of online groups.
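The abstract does not say how simulated and authentic reactions are scored against each other. As a rough illustration of what a reaction-centered comparison could look like, the sketch below turns tone labels into distributions and measures their Jensen-Shannon divergence. This is an assumption for illustration, not the paper's metric: the tone names, the toy label lists, and the helper names `tone_distribution` and `js_divergence` are all hypothetical.

```python
import math
from collections import Counter


def tone_distribution(labels, tones):
    """Convert a list of tone labels into a probability vector over `tones`."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return [counts[t] / len(labels) for t in tones]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), bounded between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical tone inventory and toy labels, invented for this example only.
TONES = ["sarcastic", "supportive", "outraged"]
real_reactions = ["sarcastic", "sarcastic", "outraged", "supportive"]
simulated_reactions = ["supportive", "supportive", "supportive", "outraged"]

real_dist = tone_distribution(real_reactions, TONES)
sim_dist = tone_distribution(simulated_reactions, TONES)

# A larger value means the simulated tone mix sits further from the real one.
gap = js_divergence(real_dist, sim_dist)
print(f"tone divergence: {gap:.3f}")
```

Under this reading, the "realism gap" would show up as a divergence that does not shrink when the model is given an explicit community prompt. A symmetric, bounded measure is convenient here because it stays defined when a tone is absent from one side.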