MolLingo: Molecule-Native Representations for LLM-Powered Scientific Agents
Thao Nguyen, Heng Ji
Why It Matters
What makes this one worth your time
This work is relevant to AI researchers and engineers applying LLMs to complex scientific tasks, particularly molecular design and drug discovery.
MolLingo pairs multi-agent coordination with chemically meaningful molecular representations, so that an LLM can reason about and edit molecules at the level of building blocks rather than raw SMILES strings.
Summary
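The paper introduces MolLingo, a multi-agent system that automates molecular design by emulating a chemist's reasoning process. It coordinates a Literature Agent, a Chemist Agent, and an Orchestrator through a shared memory module, with each agent equipped with domain-specific tools. It also introduces BRICS-based Fragment Enumeration (BFE), which decomposes molecules into building blocks that an LLM can reason about and edit. In a case study on early-stage therapeutic design, the Chemist Agent's reasoning is additionally grounded in binding site geometry and residue-level protein context from molecular docking. Across four benchmarks, MolLingo outperforms frontier LLMs and specialized baselines, including a fourfold docking score improvement over GPT-5.4 while using the same underlying model.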
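The coordination pattern described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `SharedMemory`, `literature_agent`, `chemist_agent`, and `orchestrate` are hypothetical names, and the agents are stubs that stand in for LLM calls with domain-specific tools.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SharedMemory:
    """Append-only log that every agent can read and write (hypothetical)."""
    entries: List[Dict[str, str]] = field(default_factory=list)

    def write(self, author: str, content: str) -> None:
        self.entries.append({"author": author, "content": content})

    def read(self, author: Optional[str] = None) -> List[str]:
        # Return all notes, or only the notes written by one agent.
        return [e["content"] for e in self.entries
                if author is None or e["author"] == author]


def literature_agent(memory: SharedMemory, task: str) -> None:
    # Stub: a real agent would query an LLM and literature-search tools.
    memory.write("literature", f"evidence for: {task}")


def chemist_agent(memory: SharedMemory, task: str) -> None:
    # Stub: reads the accumulated evidence before writing a new proposal.
    evidence = memory.read("literature")
    round_no = len(memory.read("chemist")) + 1
    memory.write("chemist",
                 f"proposal {round_no} using {len(evidence)} evidence note(s)")


def orchestrate(task: str, rounds: int = 2) -> SharedMemory:
    """Alternate the two agents over one shared memory (assumed schedule)."""
    memory = SharedMemory()
    for _ in range(rounds):
        literature_agent(memory, task)
        chemist_agent(memory, task)
    return memory


memory = orchestrate("improve binding", rounds=2)
print(memory.read("chemist")[-1])
```

The fixed alternation between agents is an assumption made for brevity; the abstract does not say how the Orchestrator schedules agents or what the memory stores.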
Key contributions
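- MolLingo, a multi-agent system for molecular design that coordinates a Literature Agent, a Chemist Agent, and an Orchestrator through shared memory.
- BRICS-based Fragment Enumeration (BFE), a synthesis-aware fragmentation method that represents molecules as block-based SMILES paired with common chemical names.
- Results on four benchmarks: a fourfold docking score improvement over GPT-5.4 with the same underlying model, consistent drug property optimization gains across multiple LLM backbones, and state-of-the-art results on TOMG-Bench, surpassing the RL-based method RePO.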
Notable insights
- BFE pairs synthesis-aware fragments with common chemical names, which bridges molecular structure and the LLM's semantic space and enables block-level reasoning and editing that is difficult with raw SMILES alone.
- Coordinating multiple agents with a shared memory module allows for iterative, evidence-driven reasoning in molecular design.
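To make the idea of block-level editing concrete, here is a rough sketch of a molecule held as named building blocks. The abstract does not give BFE's exact format, so everything here is an assumption: the `Block` type, the `render` and `replace_block` helpers, the `name: SMILES` layout, and the hand-written split of acetanilide (which is not computed by BRICS).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    smiles: str  # fragment SMILES; "*" marks an attachment point
    name: str    # common chemical name shown to the LLM


def render(blocks: List[Block]) -> str:
    """Format blocks as one 'name: SMILES' line each for a prompt."""
    return "\n".join(f"{b.name}: {b.smiles}" for b in blocks)


def replace_block(blocks: List[Block], old_name: str, new: Block) -> List[Block]:
    """Return a copy with the block called old_name swapped for new."""
    if all(b.name != old_name for b in blocks):
        raise ValueError(f"no block named {old_name!r}")
    return [new if b.name == old_name else b for b in blocks]


# Acetanilide, CC(=O)Nc1ccccc1, hand-split at the amide bond (illustrative).
acetanilide = [Block("CC(=O)*", "acetyl"), Block("*Nc1ccccc1", "anilino")]

# A block-level edit: swap the acetyl group for a benzoyl group.
edited = replace_block(acetanilide, "acetyl", Block("O=C(*)c1ccccc1", "benzoyl"))
print(render(edited))
```

An edit phrased as "replace acetyl with benzoyl" is easier to state and check than the equivalent character-level change to a raw SMILES string, which is the kind of advantage the abstract attributes to the block representation. Reassembling the edited blocks into a single valid SMILES string would need a cheminformatics toolkit and is left out of this sketch.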
Possible limitations
- Not stated in the abstract
Abstract
arXiv:2605.27853v1
We present MolLingo, a multi-agent system that emulates the reasoning process of a chemist to automate molecular design. Existing LLM-based approaches either operate as standalone generative models without access to external tools or lack the multi-agent coordination and shared memory needed for iterative, evidence-driven reasoning across the molecular design pipeline. MolLingo addresses this by coordinating a Literature Agent, a Chemist Agent, and an Orchestrator through a shared memory module, with each agent equipped with domain-specific tools. To enable effective molecular reasoning, we introduce BRICS-based Fragment Enumeration (BFE), a synthesis-aware molecular fragmentation method that decomposes molecules into chemically meaningful building blocks represented as block-based SMILES paired with common chemical names. This representation bridges molecular structure and LLM semantic space, enabling block-level reasoning and editing that is difficult with raw SMILES alone. As a case study in early-stage therapeutic design, MolLingo further grounds the Chemist Agent's reasoning in binding site geometry and residue-level protein context derived from molecular docking to optimize molecules for stronger target binding. Across four benchmarks, MolLingo consistently outperforms frontier LLMs and specialized baselines, including a fourfold docking score improvement over GPT-5.4 despite using the same underlying model, consistent drug property optimization gains across multiple LLM backbones, and state-of-the-art results on TOMG-Bench, surpassing both frontier LLMs and the RL-based optimization method RePO. Our results suggest that LLMs are already capable molecular design assistants when guided through chemically meaningful representations and biologically grounded structural context. Code is available at: https://anonymous.4open.science/status/MolLingo-7450.