A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models

Ching Chang, Yidan Shi, Defu Cao, Wei Yang, Jeehyun Hwang, Haixin Wang, Jiacheng Pang, Wei Wang, Yan Liu, Wen-Chih Peng, Tien-Fu Chen

Published Jun 11, 2026


Editorial review: 6.8

Relevance: 0.462

Freshness: 0.189

Why It Matters

What makes this one worth your time

This survey organizes the landscape of time series reasoning, giving researchers and practitioners a shared map for building robust systems that understand and act on dynamic data.

A comprehensive survey on reasoning in time series using large language models.

Summary

The paper surveys reasoning and agentic systems in time series analysis, categorizing existing literature by reasoning topology and objectives while providing insights into evaluation practices and future directions.

Key contributions

A detailed categorization of reasoning topologies in time series analysis.
Identification of key objectives and evaluation practices in the field.
Curated datasets and benchmarks that support further study and deployment.

Notable insights

The paper introduces a structured taxonomy for reasoning topologies that can guide future research and development in time series analysis.
It emphasizes the importance of balancing computational cost with the need for grounding and self-correction in reasoning systems.

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2509.11575v3

Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer. This survey defines the problem and organizes the literature by reasoning topology with three families: direct reasoning in one step, linear chain reasoning with explicit intermediates, and branch-structured reasoning that explores, revises, and aggregates. The topology is crossed with the main objectives of the field, including traditional time series analysis, explanation and understanding, causal inference and decision making, and time series generation, while a compact tag set spans these axes and captures decomposition and verification, ensembling, tool use, knowledge access, multimodality, agent loops, and LLM alignment regimes. Methods and systems are reviewed across domains, showing what each topology enables and where it breaks down in faithfulness or robustness, along with curated datasets, benchmarks, and resources that support study and deployment (https://github.com/blacksnail789521/Time-Series-Reasoning-Survey). Evaluation practices that keep evidence visible and temporally aligned are highlighted, and guidance is distilled on matching topology to uncertainty, grounding with observable artifacts, planning for shift and streaming, and treating cost and latency as design budgets. We emphasize that reasoning structures must balance capacity for grounding and self-correction against computational cost and reproducibility, while future progress will likely depend on benchmarks that tie reasoning quality to utility and on closed-loop testbeds that trade off cost and risk under shift-aware, streaming, and long-horizon settings. Taken together, these directions mark a shift from narrow accuracy toward reliability at scale, enabling systems that not only analyze but also understand, explain, and act on dynamic worlds with traceable evidence and credible outcomes.
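To make the three topology families in the abstract more concrete, here is a minimal toy sketch of their control flow on a one-step forecasting task. It is not taken from the survey: the function names (`direct`, `linear_chain`, `branch_structured`), the `trace` structure, and the simple arithmetic heuristics are all hypothetical stand-ins for what would be LLM calls in a real system. The only point is the shape of the computation: one step, a fixed sequence with recorded intermediates, or several explored candidates that are checked and then aggregated.

```python
from statistics import mean


def direct(series):
    """Direct topology: one step from input to answer, no intermediates kept."""
    return {"answer": series[-1], "trace": []}


def linear_chain(series):
    """Linear chain topology: a fixed sequence of steps, each intermediate recorded."""
    trace = []
    diffs = [b - a for a, b in zip(series, series[1:])]
    trace.append(("differences", diffs))
    slope = mean(diffs)  # needs at least two points in `series`
    trace.append(("mean_slope", slope))
    answer = series[-1] + slope
    trace.append(("extrapolate", answer))
    return {"answer": answer, "trace": trace}


def branch_structured(series, holdout=2):
    """Branch-structured topology: explore candidates, check each one against
    held-out points, then aggregate by selecting the lowest-error branch.
    Assumes len(series) >= holdout + 2 so every backtest slice has two points."""
    candidates = {
        "last_value": lambda s: direct(s)["answer"],
        "mean_slope": lambda s: linear_chain(s)["answer"],
    }
    trace, errors = [], {}
    for name, predict in candidates.items():
        # Backtest: predict each held-out point from the points before it.
        errs = [
            abs(predict(series[:i]) - series[i])
            for i in range(len(series) - holdout, len(series))
        ]
        errors[name] = mean(errs)
        trace.append((name, errors[name]))
    best = min(errors, key=errors.get)
    trace.append(("selected", best))
    return {"answer": candidates[best](series), "trace": trace}


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0]
    for topology in (direct, linear_chain, branch_structured):
        result = topology(series)
        print(topology.__name__, result["answer"], len(result["trace"]))
```

The length of `trace` grows with the topology's structure, which loosely mirrors the trade-off the abstract names: more visible evidence and room for self-correction, at higher computational cost.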