Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability

Pedro Orvalho, Marta Kwiatkowska, Guillem Aleny\`a, Felip Many\`a

Published May 29, 2026Featured #8In the daily list May 30, 2026

Open on arXiv Read PDF

Daily score68.5

Editorial review7.5

Relevance0.454

Freshness0.722

Why It Matters

What makes this one worth your time

This research addresses a significant limitation in LLMs' ability to handle complex optimization tasks, making it relevant for applications in robotics and other constraint-heavy domains.

A novel method for enhancing LLMs' optimization capabilities through MaxSAT problem-solving.

Summary

The paper presents a hybrid reasoning approach that leverages large language models to generate Python code for solving optimization tasks framed as preference-based Maximum Satisfiability problems, which are then verified for correctness by an exact MaxSAT solver.

Key contributions

Introduction of a hybrid reasoning framework combining LLMs with MaxSAT solvers.
Demonstration of improved solution feasibility and optimality rates compared to existing baselines.
Evaluation across multiple families of preference-based reasoning tasks.

Notable insights

The use of preference-based MaxSAT allows for a structured approach to optimization that can handle user-defined constraints effectively.
The independent verification of solutions against canonical encodings enhances the reliability of the generated outputs.

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2605.29687v1 Announce Type: new Abstract: Large Language Models (LLMs) excel at understanding natural language but struggle with optimisation tasks involving multiple constraints and user-defined preferences, which commonly arise in domains such as robotics. We propose a hybrid reasoning approach in which LLMs externalise reasoning through code generation. Given a natural language problem description, an LLM generates Python code that encodes user-defined constraints and preferences as a preference-based Maximum Satisfiability (MaxSAT) problem, which is then solved by an exact MaxSAT solver. To ensure correctness, solutions returned by the model-generated code are independently verified for feasibility and optimality against a canonical MaxSAT encoding, allowing for different encodings and multiple optimal solutions. We evaluate our approach using both open-source and closed-access LLMs on three families of preference-based reasoning tasks, and compare it against direct-answer, chain-of-thought, and program-of-thought baselines using the same models. While these baselines rarely produce feasible solutions, the MaxSAT-based pipeline achieves substantially higher acceptance rates, in some cases exceeding 80%. Our results demonstrate that LLM-driven code generation combined with preference-based MaxSAT enables solver-verifiable optimisation with respect to generated encodings, and substantially improves correctness under independently verified reference semantics.