Claude Agent SDK is Anthropic’s official framework for building long-running AI agents. It differs from the plain Claude API by providing mechanisms needed in real business processes out of the box: session persistence between calls, validated tool use, sub-agents, prompt caching, and streaming responses. This article shows when to reach for the SDK instead of writing your own API integration, which patterns we deploy with SMB clients, and how this stack maps onto our AI Assistant V0.1 package.
What is the Claude Agent SDK
Claude Agent SDK is a library provided by Anthropic in two language variants: Python and TypeScript. It is documented officially in the Anthropic documentation and acts as a layer on top of the raw Claude API. That layer means the SDK makes architectural decisions for us that we would otherwise need to design from scratch: when to start a new session, how to cache long contexts, how to handle tool errors.
In practice the SDK provides five key mechanisms. The first is prompt caching, which reduces the cost of repeated queries by roughly 90 percent. The second is tool use: calling application functions inside the model’s conversation with the user. The third is sub-agents: delegating tasks to separate Claude instances with their own context window. The fourth is session state, which keeps conversation history between calls without manually stitching context. The fifth is response streaming, essential for conversational interfaces such as Telegram or WhatsApp.
The SDK is not required to work with Claude. The model can be called over plain REST API just as well. The difference is that for multi-step agents (which most SMB businesses build) the SDK saves weeks of code and errors. For one-off calls like “summarise this document” the SDK is overkill.
When to use Claude Agent SDK vs the plain Claude API
The decision depends on four variables: the number of turns in the conversation, the frequency of tool calls, the weight of token cost, and the need to orchestrate several agents in parallel. The more of these apply to our case, the stronger the argument for the SDK.
We choose Claude Agent SDK when an agent runs multi-step conversations with persistent state (for example an operational assistant responding to related client queries throughout the week). We also pick it when we call tools frequently (Gmail, Calendar, CRM, an internal database) and need consistent error handling. The third scenario is projects where token cost matters: prompt caching cuts the bill by around 90 percent, which at 10,000 queries per month translates into a difference between 1,200 and 120 PLN per month on input tokens. The fourth is the need to run parallel sub-agents for specialised tasks such as research, validation, or generation.
We pick the plain Claude API when we want a one-off call (“translate this text”, “classify this message”), when we have our own agent orchestration (for example written in n8n or in a Next.js application layer), or when we build simple prototypes where a cost below 50 PLN per month does not justify caching optimisation. More on AI cost in SMBs is in our guide to AI implementation in business.
Four SMB patterns where the SDK pays off
From our practice deploying AI assistants in 5 to 30 person companies, four repeatable patterns have emerged in which Claude Agent SDK delivers the most value. Each one uses at least two key SDK mechanisms (prompt caching plus tool use, or sub-agents plus session state).
Pattern one: a Telegram assistant per client with persona. This is the stack of the AI Assistant V0.1 package. A bot for a single client with a dedicated persona (matter-of-fact style, company history embedded in the system prompt). The SDK handles conversation state between messages, prompt caching keeps the 8,000 tokens of persona in cache for 5 minutes, and tools connect to Gmail (read inbox, draft replies), Calendar (check available slots, create meetings), and CRM (fetch client data, update notes).
Pattern two: a workflow orchestrator with multiple tools. A single business command (for example “book a meeting with Jan, send him yesterday’s offer, log a note in CRM”) requires 4 to 8 tool calls in one cycle. The SDK manages that chain, handles errors per tool, and writes an audit trail for every action.
Pattern three: a document processing pipeline. PDF invoices, contracts, HR documents. The SDK with sub-agents lets us split the task: one sub-agent extracts fields, the second validates against the tax ID database, the third classifies cost type. Each one runs in isolated context, with no cross-contamination.
Pattern four: a customer support agent with escalation. Standard questions (order status, opening hours, return policy) are handled by the agent using RAG against the client’s knowledge base. Questions outside the rules are escalated to a human with full conversation context. An evaluation sub-agent decides on escalation in parallel with the main conversation. We described detailed customer support automation patterns in our article on B2B customer service automation.
Prompt caching: why it is critical for cost
Anthropic introduced prompt caching in 2024 and it is one of the most important economic features of Claude for business use. The mechanism is simple: we declare fragments of the prompt as cacheable (system prompt, tool definitions, long document context), and Anthropic keeps them in memory for 5 minutes. Every subsequent query in that time window costs 10 percent of the standard input token rate.
Concretely: Claude Sonnet costs 3 USD per million input tokens. Without cache, an agent with an 8,000 token system prompt running 1,000 queries per day uses 8 million tokens, that is 24 USD per day or 720 USD per month. With cache after the first call, the next 999 queries cost 10 percent of the rate, that is 2.40 USD per day or 72 USD per month. The difference is 648 USD per month on a single agent.
A real reference point from our practice: our internal agent orchestration system has an 8,000 token system prompt. After the first call inside the 5 minute window, subsequent queries cost 5 percent of the full rate. At 50 to 200 queries per day the difference is 90 to 95 PLN per month. Scaling to 5 agents per client times 10 clients is the difference between 4,500 and 450 PLN per month just on input tokens.
The SDK does caching by default. Without the SDK we would have to set cache_control headers manually on every API call and make sure the fragments are exactly identical (any tab change invalidates the cache). This is one of the strongest arguments for the SDK in long-running agents.
Tool use in the SDK: the calling pattern
Tool use is the mechanism that lets Claude call functions in our application mid-conversation. The schema of each tool is described in JSON Schema: name, parameters, types, description of behaviour. In response Claude can return a structure “call function X with parameters Y”, which our application interprets, executes, and returns the result back to the model.
Three aspects matter in business deployment. First: synchronous versus asynchronous tools. Fast operations (database read, file fetch) can be sync. Slow operations (sending an email, generating a PDF, calling a high-latency external API) should be async so they do not block the user conversation.
Second: error handling. Every tool can fail (Gmail API rate limit, missing permissions, invalid input). The SDK lets us define retry strategies per tool and communicates errors to Claude in a way that allows the model to decide what to do (try differently, escalate to a human, ask the user to fill in missing data).
Third: idempotency of tools with side effects. Sending an email, writing to CRM, charging a payment. Every such tool should have an idempotency identifier so a retry does not duplicate the action. This is more a design question than an SDK feature, but the SDK makes consistent use of the dispatcher pattern with a central audit trail easier.
Sub-agents: when to delegate to a separate agent
Sub-agents are the second mechanism, alongside prompt caching, that we most often pick from the entire SDK stack. They let us spawn a second Claude instance with its own context window, its own system prompt, and its own toolset. The parent agent can launch a sub-agent through the Task tool, receive its final report, and continue working.
Four situations in which a sub-agent is worth it: research at scale (a sub-agent reads 20 pages of documentation, the parent gets a 200 word report), parallelisation (5 sub-agents analyse 5 competitor offers in parallel), specialisation (a sub-agent with a lawyer system prompt reviews a contract, a sub-agent with an accountant system prompt reviews an invoice), and context isolation (a sub-agent debugs an error without polluting the main client conversation context).
Sub-agent is a pattern we describe in detail in our article on MCP, where we show how to connect sub-agents to external tools through the Model Context Protocol.
AI Assistant V0.1: our package built on Claude Agent SDK
None of the above is theory. From June 2026 we are launching the AI Assistant V0.1 package as a Hanse Studio sub-product. A Telegram bot per client with a dedicated persona, Gmail/Calendar/one CRM integrations, and 4 to 8 dedicated business commands. Setup 3,000 PLN one-time (kickoff, voice extraction from 30 to 50 client emails, configuration, deployment, 2 week onboarding). Retainer 800 PLN per month (hosting, monitoring, monthly improvements, up to 2 hours of small tweaks).
The technology stack under the hood: Claude Agent SDK as the orchestrator, n8n as the integration layer for tools that change slowly (Brevo, Mailchimp, simple webhooks), SQLite for state per user, Telegram Bot API as the input channel. Hosting on Mac Mini or Hetzner VPS. We compared this stack in our article on automation platforms.
If a concrete quote for your case is what you need, we invite you to a conversation. The full list of AI packages is in our offer.
FAQ
How is Claude Agent SDK different from LangChain or LlamaIndex?
The SDK is native to Claude and gives the best performance plus the simplest access to features like prompt caching and sub-agents. LangChain is a multi-LLM framework (Claude, GPT, Gemini, local models) at the cost of higher complexity and slight performance loss. LlamaIndex is a framework for RAG (retrieval augmented generation) on your own documents. For SMBs with a budget up to 5,000 PLN per month for AI we recommend the SDK plus a simple RAG layer, without LangChain.
How much does using Claude Agent SDK cost?
The SDK itself is free and open source. We pay only for the Claude API tokens consumed: Sonnet costs 3 USD per million input tokens and 15 USD per million output tokens. For a typical SMB assistant (1,000 to 3,000 queries per month with prompt caching) the real cost ranges between 50 and 250 PLN per month for tokens. Plus hosting, which on a Hetzner VPS starts from 30 PLN per month.
Is the SDK suitable for real-time applications?
Yes. The SDK supports response streaming: Claude sends tokens one by one rather than waiting for the full response. For conversational interfaces (Telegram, on-site chat, WhatsApp) this is the standard. The first token typically appears within 200 to 500 ms, the full response to a simple question within 2 to 5 seconds.
Can I self-host the SDK or does it only work in the cloud?
The SDK is a Python or TypeScript library. We host it anywhere we have a runtime: VPS, a dedicated server at the client’s office, AWS Lambda, Vercel, Cloudflare Workers, Mac Mini. Claude itself runs on the Anthropic side, but the agent logic with the SDK is 100 percent under our control. This matters for DACH clients who require data to stay within European infrastructure (Anthropic operates data centres in the EU).
