Are LLMs Bad at Moral Reasoning?
Menghang Zhu, Seth Lazar
Why It Matters
What makes this one worth your time
Understanding the moral reasoning capabilities of LLMs is crucial for deploying them safely in dynamic, open-ended environments, where they must recognize and respond to moral reasons for action.
Recent benchmarks have painted a pessimistic picture. This paper suggests that the picture changes substantially when LLMs are asked to write evaluation rubrics rather than having their open-ended responses graded.
Summary
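The paper argues that the MoReBench dataset, which pairs 1,000 moral cases with gold-standard human-authored rubrics, can be redeployed to show that large language models (LLMs) are more capable of moral reasoning than previously thought. The authors do not grade LLMs' open-ended responses against the human rubrics. Instead, they give the models the same task the human annotators had: writing a scoring rubric for the moral analysis of each case. The rubrics the models generate are better calibrated to the human rubrics than the models' open-ended responses are.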
Key contributions
- Proposes evaluating LLMs' moral reasoning by having them generate scoring rubrics, the same task given to human annotators, instead of scoring their open-ended responses.
- Shows that LLM-generated rubrics are better calibrated to the human rubrics than the models' open-ended responses, supporting a more optimistic view of their moral reasoning.
Notable insights
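- How moral reasoning is measured matters: the same models appear considerably stronger when they generate evaluation rubrics than when their open-ended responses are scored against human rubrics.
- Where LLM-generated rubrics diverge from human ones, the differences plausibly reflect the high dimensionality of most moral problems rather than model deficiencies. In some cases they also expose human rubrics that depart from the "rubric for creating rubrics".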
Possible limitations
- Not stated in the abstract
Abstract
arXiv:2606.11635v1
For highly capable AI systems to operate safely in dynamic, open-ended environments, they must be able to identify, understand, and respond to moral reasons for action, and constrain their behaviour accordingly. A growing body of research aims to evaluate this capacity -- moral competence -- in today's most capable AI systems, recently reaching broadly pessimistic conclusions. One of the most ambitious such papers collects gold-standard human-authored rubrics for evaluating moral reasoning in 1,000 cases, and benchmarks frontier AI models against those rubrics, with underwhelming results. In this paper, we argue that the MoReBench dataset can be redeployed to give a much more optimistic picture of LLMs' moral reasoning (an essential part of moral competence). We show that if, instead of scoring LLMs' responses to these cases against these rubrics, we instead give the LLMs the same task given to humans -- to generate scoring rubrics for the moral analysis of particular cases -- the rubrics they generate are both better calibrated to the human rubrics than their open-ended responses, and, where they differ, plausibly reflect nothing more than the vast dimensionality of most moral problems, as well as highlighting some human departures from the "rubric for creating rubrics". Taking these points into consideration, the MoReBench dataset suggests that LLMs are significantly more capable at moral reasoning than was previously believed.