Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu, Schahram Dustdar

Published Jun 17, 2026

Editorial review: 6.8

Relevance: 0.478

Freshness: 0.000

Why It Matters

What makes this one worth your time

Agents that act on live networks and IT systems can cause immediate operational impact, so knowing how to constrain, evaluate, and audit language models inside operational workflows matters for keeping those systems reliable and secure.

The paper examines the integration of large language models in NetOps and AIOps, emphasizing the need for robust evaluation and safety measures.

Summary

The paper explores the use of large language models in network operations (NetOps) and AI for IT operations (AIOps), focusing on architectures, evaluation methods, and safety measures. It organizes existing literature around autonomy, tool scope, evidence traces, and assurance contracts, emphasizing the importance of the surrounding machinery for operational reliability. The paper argues for workflow-centered evaluation and addresses security, privacy, and governance risks.

Key contributions

Organizes literature around autonomy and assurance contracts in NetOps and AIOps.
Proposes workflow-centered evaluation metrics for agentic systems.
Highlights security, privacy, and governance risks in operational control.

Notable insights

Operational reliability depends more on the machinery around the model (permissions, policies, checks, and rollback) than on the language model itself.
Evaluation should focus on workflow-centered metrics rather than static question answering.
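
To make the idea of workflow-centered scoring concrete, here is a minimal sketch in Python. It is not the paper's metric: the `Episode` fields, the outcome labels, the credit values, and the equal weighting are all assumptions chosen for illustration. It only shows how dimensions the abstract names (trace quality, bounded tool use, safe proposals, rollback-aware outcomes) could be combined into one per-episode score.

```python
from dataclasses import dataclass

# Hypothetical record of one agent episode; field names are illustrative only.
@dataclass
class Episode:
    trace_quality: float      # 0..1, how well the evidence trace supports the action
    tool_calls: int           # total tool invocations in the episode
    out_of_scope_calls: int   # invocations outside the agent's permitted tool scope
    proposal_safe: bool       # did the proposed change pass the safety checks?
    outcome: str              # "success", "rolled_back", or "unrecovered"

# Assumed credit per outcome: a clean rollback earns partial credit,
# an unrecovered failure earns none.
OUTCOME_CREDIT = {"success": 1.0, "rolled_back": 0.5, "unrecovered": 0.0}

def workflow_score(ep: Episode) -> float:
    """Combine trace quality and bounded tool use, scaled by outcome credit."""
    if not ep.proposal_safe:
        return 0.0  # an unsafe proposal zeroes the episode in this sketch
    # Fraction of tool calls that stayed in scope; no calls counts as fully bounded.
    bounded = 1.0 - ep.out_of_scope_calls / ep.tool_calls if ep.tool_calls else 1.0
    return OUTCOME_CREDIT[ep.outcome] * (ep.trace_quality + bounded) / 2

if __name__ == "__main__":
    print(workflow_score(Episode(1.0, 4, 0, True, "success")))
    print(workflow_score(Episode(1.0, 4, 0, True, "rolled_back")))
```

A static question-answering benchmark would see only the final answer. A score of this shape also penalizes out-of-scope tool calls and distinguishes a recovered failure from an unrecovered one.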

Possible limitations

Not stated in the abstract

Abstract

arXiv:2605.12729v2

Large language models are increasingly being used to support network operations (NetOps) and artificial intelligence for IT operations (AIOps), including incident investigation, root-cause analysis, configuration synthesis, and limited self-healing. In both NetOps and AIOps, this shift is changing how tasks are managed. Agent-based operations work as workflows, from gathering evidence to taking action, following permissions, policies, and checks, and providing rollback options when necessary. This is crucial because operational decisions can have instant impacts. To make the argument concrete, we organise the relevant literature around the hierarchy of autonomy, tool scope, evidence traces, and assurance contracts. These contracts define what an agent may observe, propose, and execute. They also define the checks that must pass before any action is allowed. A consistent pattern appears across work on telemetry query recommendation, diagnosis, root-cause analysis, configuration synthesis, change planning, and limited self-healing. Operational reliability does not come chiefly from the model itself. It depends on the machinery around the model. We also argue that evaluation should go beyond static question answering. Agentic NetOps and AIOps systems require workflow-centred evaluation, including trace quality, bounded tool use, safe proposal generation, replay in sandboxed environments, and canary trials with rollback-aware scoring. Without these measures, a system may appear robust yet remain too fragile. Finally, we examine security, privacy, and governance risks that become acute when agents sit close to operational control surfaces. Taken together, the survey concludes that progress in intelligent NetOps and AIOps will depend on treating autonomy as a constrained operational control problem, whose outputs must be reliable, auditable, and securely deployable.
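
The abstract describes assurance contracts as defining what an agent may observe, propose, and execute, plus the checks that must pass before any action is allowed. One way to picture such a contract is the Python sketch below. The class names, the `permits` method, and the example check are hypothetical and are not taken from the paper; the sketch assumes scopes are plain sets of target names and checks are simple predicates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration; none of these names come from the paper.
@dataclass(frozen=True)
class Action:
    kind: str     # "observe", "propose", or "execute"
    target: str   # e.g. a telemetry source or a device name

Check = Callable[[Action], bool]

@dataclass
class AssuranceContract:
    observe: set[str] = field(default_factory=set)   # targets the agent may read
    propose: set[str] = field(default_factory=set)   # targets it may draft changes for
    execute: set[str] = field(default_factory=set)   # targets it may change directly
    checks: list[Check] = field(default_factory=list)

    def permits(self, action: Action) -> bool:
        """Allow an action only if it is in scope and every check passes."""
        scopes = {"observe": self.observe, "propose": self.propose, "execute": self.execute}
        scope = scopes.get(action.kind)
        if scope is None or action.target not in scope:
            return False  # unknown action kind or target outside the granted scope
        return all(check(action) for check in self.checks)

def no_core_devices(action: Action) -> bool:
    # Example check (assumed policy): never touch devices named "core-*".
    return not action.target.startswith("core-")

contract = AssuranceContract(
    observe={"syslog", "edge-sw1"},
    propose={"edge-sw1"},
    execute={"edge-sw1", "core-rtr1"},
    checks=[no_core_devices],
)

if __name__ == "__main__":
    print(contract.permits(Action("execute", "edge-sw1")))   # in scope, check passes
    print(contract.permits(Action("execute", "core-rtr1")))  # in scope, check fails
```

In this sketch the scope sets and the checks are evaluated separately, so a target that was granted by mistake (here `core-rtr1`) is still blocked by a failing check. This mirrors the abstract's point that reliability comes from the machinery around the model.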