
Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu, Schahram Dustdar

Published May 14, 2026
Editorial review: 6.8
Relevance: 0.478
Freshness: 0.000

Why It Matters

What makes this one worth your time

Understanding how large language models can be safely and effectively integrated into NetOps and AIOps is crucial for improving operational efficiency and reliability in IT systems.

The paper examines the role of large language models in enhancing NetOps and AIOps with a focus on safety and evaluation.

Summary

The paper explores the use of large language models in network operations (NetOps) and AI for IT operations (AIOps), focusing on architectures, evaluation methods, and safety concerns. It organizes existing literature around autonomy, tool scope, evidence traces, and assurance contracts, emphasizing the importance of the surrounding machinery for operational reliability. The paper also discusses the need for workflow-centered evaluation and addresses security, privacy, and governance risks.

Key contributions

  • Organizes literature around autonomy and assurance contracts for NetOps and AIOps.
  • Proposes workflow-centered evaluation methods for agentic systems.
  • Highlights security, privacy, and governance risks in operational control.

Notable insights

  • Operational reliability depends more on the surrounding machinery than on the language model itself.
  • Evaluation should focus on workflow-centered metrics rather than static question answering.

Possible limitations

  • Not stated in the abstract

Abstract

arXiv:2605.12729v1

Large language models are increasingly being used to support network operations (NetOps) and artificial intelligence for IT operations (AIOps), including incident investigation, root-cause analysis, configuration synthesis, and limited self-healing. In both NetOps and AIOps, this shift is changing how tasks are managed. Agent-based operations work as workflows, from gathering evidence to taking action, following permissions, policies, and checks, and providing rollback options when necessary. This is crucial because operational decisions can have instant impacts. To make the argument concrete, we organise the relevant literature around the hierarchy of autonomy, tool scope, evidence traces, and assurance contracts. These contracts define what an agent may observe, propose, and execute. They also define the checks that must pass before any action is allowed. A consistent pattern appears across work on telemetry query recommendation, diagnosis, root-cause analysis, configuration synthesis, change planning, and limited self-healing. Operational reliability does not come chiefly from the model itself. It depends on the machinery around the model. We also argue that evaluation should go beyond static question answering. Agentic NetOps and AIOps systems require workflow-centred evaluation, including trace quality, bounded tool use, safe proposal generation, replay in sandboxed environments, and canary trials with rollback-aware scoring. Without these measures, a system may appear robust yet remain too fragile. Finally, we examine security, privacy, and governance risks that become acute when agents sit close to operational control surfaces. Taken together, the survey concludes that progress in intelligent NetOps and AIOps will depend on treating autonomy as a constrained operational control problem, whose outputs must be reliable, auditable, and securely deployable.
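The abstract describes assurance contracts only in prose: what an agent may observe, propose, and execute, plus the checks that must pass before any action. As a rough illustration of that idea, the Python sketch below models such a contract as a small data structure with a gate function. Every name here (AssuranceContract, decide, may_observe, the example data sources, actions, and check) is hypothetical and chosen for illustration; none of it is taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical type: a check receives the requested action name and
# returns True if the action is cleared to run.
Check = Callable[[str], bool]


@dataclass(frozen=True)
class AssuranceContract:
    """Illustrative contract; field names are assumptions, not the paper's."""
    observe: frozenset[str]          # data sources the agent may read
    propose: frozenset[str]          # actions the agent may suggest to a human
    execute: frozenset[str]          # actions the agent may carry out itself
    checks: tuple[Check, ...] = ()   # every check must pass before execution


def may_observe(contract: AssuranceContract, source: str) -> bool:
    """True if the agent is allowed to read the given data source."""
    return source in contract.observe


def decide(contract: AssuranceContract, action: str) -> str:
    """Return 'execute', 'propose', or 'deny' for a requested action."""
    cleared = all(check(action) for check in contract.checks)
    if action in contract.execute and cleared:
        return "execute"
    if action in contract.propose:
        # Allowed as a suggestion only; a human decides whether to act.
        return "propose"
    return "deny"


def change_window_open(action: str) -> bool:
    """Stand-in for a real pre-action check; always fails in this sketch."""
    return False


contract = AssuranceContract(
    observe=frozenset({"syslog", "bgp_state"}),
    propose=frozenset({"restart_bgp_session", "drain_link"}),
    execute=frozenset({"restart_bgp_session"}),
    checks=(change_window_open,),
)

print(decide(contract, "restart_bgp_session"))  # check fails, so it is only proposed
print(decide(contract, "reload_router"))        # outside the contract entirely
```

In this sketch an executable action whose checks fail falls back to a proposal rather than being dropped. That is one possible design choice, made here so that the model's suggestion is bounded by the contract rather than by the model's own judgement; a stricter deployment could deny such an action outright.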