The Naming Problem
Every chatbot is now an "agent." Every RAG pipeline is "agentic." Every product with a submit button is "autonomous AI." The word has been stretched far enough that it's nearly meaningless, which is a problem because actual agents (systems that can plan, use tools, observe results, and iterate) are genuinely interesting and genuinely different from what most people are shipping.
Let me try to draw the line clearly.
What an LLM Actually Does
A large language model is a text completion engine. You give it tokens; it predicts the most likely tokens out. With the right prompting and training, this produces remarkably useful behavior: answering questions, writing code, summarizing documents. But the core operation is: one input, one output, done.
A chatbot is an LLM with a conversation loop. Each message gets appended to a growing context window and the model generates a response. There's memory within a session, but no action in the world beyond generating text.
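As a sketch, that entire chatbot pattern fits in a few lines. The `fake_llm` function below is a stand-in for a real completion API (its name and behavior are illustrative, not any vendor's SDK); the point is that all "memory" lives in the growing message list, not in the model:

```python
def fake_llm(messages):
    """Stub model: a real system would call a completion API here."""
    return f"(reply #{len(messages)})"

def chat_turn(history, user_message):
    """Append the user message, generate a reply, append that too."""
    history.append({"role": "user", "content": user_message})
    reply = fake_llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
chat_turn(history, "hello")
chat_turn(history, "what did I just say?")
# The whole conversation rides along in `history`; the model itself
# holds no state between calls.
```

Each turn replays the full history to a stateless model, which is why the session "remembers" and nothing else does.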
What an Agent Actually Does
An agent is a system where an LLM is embedded in a loop with three additional components:
- Tools: functions the model can call to interact with external systems (search the web, run code, read a file, call an API, send an email)
- Memory: persistent state beyond a single context window (vector databases, structured storage, previous action history)
- A planning loop: the model observes the result of each action and decides what to do next, rather than generating one response and stopping
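The first two components can be sketched as plain data structures. The tool names, the registry shape, and the `execute` helper below are illustrative assumptions, not any particular framework's API:

```python
# Tools: a registry of callables the model is allowed to invoke.
# The lambdas stand in for real shell commands or API calls.
TOOLS = {
    "run_command": lambda cmd: f"(output of {cmd})",
    "search_web": lambda q: f"(results for {q})",
}

# Memory: an action history that persists beyond any one context window.
memory = []

def execute(tool_name, argument):
    """Dispatch a model-chosen tool call and record it in memory."""
    if tool_name not in TOOLS:
        return f"error: unknown tool {tool_name}"
    result = TOOLS[tool_name](argument)
    memory.append({"tool": tool_name, "arg": argument, "result": result})
    return result
```

The planning loop is what ties these together, and it is the subject of the ReAct pattern below.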
The ReAct pattern (Reasoning + Acting) is the most common implementation. The model thinks through a step, takes an action, observes the result, thinks through the next step, and so on until it reaches a stopping condition.
Thought: I need to find the current BGP status on the edge router.
Action: run_command("show bgp summary")
Observation: Neighbor 203.0.113.1 is in Active state, not Established.
Thought: The neighbor is not establishing. I need to check the TCP connectivity.
Action: run_command("telnet 203.0.113.1 179")
Observation: Connection refused.
Thought: TCP port 179 is blocked. This is likely a firewall issue.
Action: check_firewall_rules("203.0.113.1", "179")
...
This is what makes agents categorically different from chatbots. They don't just answer questions; they work through problems over multiple steps, adjusting based on what they find.
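The control flow behind a trace like the one above can be sketched in a few lines. The `think` and `act` helpers here are hypothetical: a real agent would call an LLM for each Thought step, so a scripted stub plays that role to keep the loop itself visible:

```python
def run_react(think, act, max_steps=10):
    """Alternate Thought -> Action -> Observation until a stop or step cap."""
    observation = None
    trace = []
    for _ in range(max_steps):
        thought, action = think(observation)  # model decides the next step
        trace.append(("thought", thought))
        if action is None:                    # stopping condition
            break
        observation = act(*action)            # run the chosen tool
        trace.append(("observation", observation))
    return trace

# Scripted stand-in, loosely following the BGP example above.
steps = iter([
    ("check BGP", ("run_command", "show bgp summary")),
    ("neighbor Active, check TCP", ("run_command", "telnet 203.0.113.1 179")),
    ("port blocked, done for this sketch", None),
])

def think(observation):
    return next(steps)

def act(tool, arg):
    return f"(result of {tool} {arg})"

trace = run_react(think, act)
```

Swapping the stub for a real model call turns this skeleton into a working agent; the loop structure stays the same.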
Why It Took Until 2025 to Work Reliably
Agents have been theoretically possible since tool-use APIs were added to frontier models in 2023. The reason they didn't work reliably in production until recently comes down to a few things:
Tool-call reliability. Early models hallucinated tool calls, called tools with wrong arguments, or failed to stop when a task was complete. Models in late 2024 and 2025 became dramatically more reliable at structured tool use: they call the right tool, with the right arguments, at the right time.
Context length. Long agent runs accumulate a lot of context: tool results, previous steps, observations. Early context limits meant agents would "forget" what they'd done. 128K and 200K context windows changed this.
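Even with large windows, long runs eventually need pruning. A common mitigation, sketched below under the simplifying assumption that word count stands in for token count, is to keep the original task plus the most recent steps and drop the middle:

```python
def trim_history(steps, budget=50):
    """Keep the first step (the task) plus as many recent steps as fit."""
    def cost(step):
        return len(step.split())  # crude proxy for a real token count

    kept = [steps[0]]
    total = cost(steps[0])
    tail = []
    for step in reversed(steps[1:]):  # walk backward from the newest step
        if total + cost(step) > budget:
            break
        tail.append(step)
        total += cost(step)
    return kept + list(reversed(tail))
```

Production systems often summarize the dropped middle rather than discarding it, but the shape of the fix is the same.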
Better frameworks. LangGraph, CrewAI, and Anthropic's own agent documentation gave developers patterns that actually worked: state machines, interrupt-and-resume, human-in-the-loop checkpoints. The infrastructure matured alongside the models.
What Agents Are Actually Being Used For
The production use cases that proved out in 2025:
- Code agents: reading a codebase, understanding context, making changes, running tests, iterating. GitHub Copilot Workspace and similar tools productionized this.
- Research agents: searching multiple sources, synthesizing findings, citing sources, producing structured reports. Replaces hours of manual research on well-defined questions.
- Customer support agents: with access to CRM, ticketing, and knowledge base tools, handling multi-step resolution paths rather than just routing.
- Network operations agents: this is where my personal interest lies. An agent that can run diagnostic commands, interpret output, escalate appropriately, and document what it found is genuinely useful in an enterprise NOC.
The Gap That Still Exists
Agents still fail in specific ways worth understanding if you're building with them:
They get stuck in loops on novel situations. They can over-call tools (running the same check fifteen times instead of moving on). They sometimes confidently take wrong actions on ambiguous instructions. They require careful prompt engineering to define stopping conditions clearly.
The solution isn't to avoid agents; it's to build human-in-the-loop checkpoints at decision points that matter, constrain the action space to what's appropriate for the use case, and test failure modes before deploying.
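Those guardrails can be sketched as a single gate in front of every proposed action. The `RISKY` set, the `approve` callback, and the repeat threshold below are illustrative assumptions about one particular deployment, not a general recipe:

```python
# Actions that require a human sign-off before execution (assumed split).
RISKY = {"check_firewall_rules", "send_email"}

def guarded_step(action, arg, seen, approve, max_repeats=3):
    """Return (allowed, reason) for a proposed (action, arg) pair.

    `seen` counts identical calls to catch loops; `approve` is a callback
    that asks a human reviewer about risky actions.
    """
    key = (action, arg)
    seen[key] = seen.get(key, 0) + 1
    if seen[key] > max_repeats:
        return False, "loop detected: same call repeated too often"
    if action in RISKY and not approve(action, arg):
        return False, "human reviewer declined"
    return True, "ok"
```

Running every tool call through a gate like this converts the failure modes above from silent misbehavior into explicit, loggable decisions.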
The gap between a well-designed agent and a poorly-designed one is larger than the gap between models. The architecture matters more than the model choice.