Build Your First AI Agent with n8n + Claude (2026 Step-by-Step)

A 2026-current, no-fluff walkthrough of building a production-ready AI agent. Trigger, tool-calling, evaluation, and the safety rails most tutorials skip.

By Shee Tze Jin 2026-05-08 11 min read

  • ~45 min: first working agent (no prior n8n experience)
  • RM 0–80: monthly tool cost for low-volume workloads
  • 3 layers: Trigger → Reasoning → Tools, the agent stack you actually need
  • 5 rails: safety controls before this touches production

Most "build your first AI agent" tutorials end at the demo and skip the parts that matter in production: how the agent fails, what it costs, and how you stop it from doing something embarrassing. This walkthrough is for the engineer or operator who wants the agent to still be running well next quarter.

We will build a customer support triage agent that reads incoming emails, classifies them by intent, drafts a reply, looks up the customer in a CRM, and posts a summary into Slack for human review before anything is sent. By the end you will have a real, defensible workflow — not a screenshot.

The architecture (in plain English)

Figure: Agent architecture (Trigger → Reasoning → Tools). An Email Trigger (Gmail / IMAP) feeds the Claude Agent (reason + decide), which can call three tools: CRM Lookup, Draft Reply, and Slack Notify. The reasoning layer (Claude) decides which tools to call. The tools are deterministic n8n nodes you already understand.

The mental model: the LLM is the reasoning engine; n8n is the nervous system; the tools are deterministic actions the agent is allowed to take. You design the system by deciding what the agent is allowed to do, then giving it just enough context to make sensible decisions.
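To make the mental model concrete, here is a minimal Python sketch of the three layers. It is not how n8n implements the AI Agent node; `fake_reasoner` is a hypothetical stand-in for Claude, and the two tool functions are placeholders for n8n nodes. The point it illustrates is that the reasoning layer only requests actions, and the system only executes requests that name a tool you registered.

```python
from typing import Callable

# Deterministic tools: plain functions with fixed behaviour (stand-ins for n8n nodes).
def crm_lookup(email: str) -> dict:
    return {"email": email, "tier": "standard"}  # hypothetical fixed response

def slack_notify(text: str) -> dict:
    return {"posted": text}

# The only actions the agent is allowed to take.
TOOLS: dict[str, Callable[..., dict]] = {
    "crm_lookup": crm_lookup,
    "slack_notify": slack_notify,
}

def fake_reasoner(email: dict) -> list[dict]:
    """Hypothetical stand-in for Claude: returns the tool calls it wants made."""
    return [
        {"tool": "crm_lookup", "args": {"email": email["from"]}},
        {"tool": "slack_notify", "args": {"text": f"New email: {email['subject']}"}},
    ]

def run_agent(email: dict, reasoner=fake_reasoner) -> list[dict]:
    """Trigger -> Reasoning -> Tools: run only requests that name a registered tool."""
    results = []
    for call in reasoner(email):
        tool = TOOLS.get(call["tool"])
        if tool is None:
            # Anything outside the registry is refused, never executed.
            results.append({"tool": call["tool"], "error": "not allowed"})
            continue
        results.append({"tool": call["tool"], "result": tool(**call["args"])})
    return results
```

Calling `run_agent({"from": "a@example.com", "subject": "Help"})` returns one result per requested tool. A request for a tool that is not in `TOOLS` comes back as an error entry instead of running.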

Step 1: Set up n8n

You have two practical options in 2026. Use n8n Cloud if you want zero ops overhead — it costs around USD 24/month for the Starter tier and is more than enough for early workflows. Use self-hosted n8n on a small VPS (DigitalOcean, Hetzner, or a Malaysian provider like Exabytes or Shinjiru) if you need data residency or want to control costs at scale.

Either way, create a new workflow and call it cs-triage-v1. Version your workflow names from day one: when you iterate (you will), v2 sits alongside v1 instead of overwriting it.

Step 2: Get a Claude API key

Go to console.anthropic.com, generate an API key, and store it in n8n under Credentials → Anthropic API. We recommend Claude Sonnet for triage agents in 2026 — it is fast enough for real-time email handling and accurate enough for classification work. Use Claude Opus only when the reasoning is genuinely hard.

Set a hard monthly spend cap in the Anthropic console. Pick a number you would not panic about if usage ran away; RM 200 for a starter project is sensible. This is your circuit breaker.
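If you want a rough number to compare against the cap, the arithmetic is simple enough to sketch. The function below is illustrative only: the token counts and per-million-token prices in the usage note are placeholders, not quoted rates, so substitute the current figures from your provider's pricing page. The `headroom` factor is our own assumption for this sketch.

```python
def estimate_monthly_cost(
    emails_per_month: int,
    input_tokens_per_email: int,
    output_tokens_per_email: int,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Estimated monthly API cost, in the same currency as the prices passed in."""
    input_cost = emails_per_month * input_tokens_per_email * price_in_per_million / 1_000_000
    output_cost = emails_per_month * output_tokens_per_email * price_out_per_million / 1_000_000
    return round(input_cost + output_cost, 2)

def cap_leaves_headroom(estimated_cost: float, spend_cap: float, headroom: float = 2.0) -> bool:
    """True if the cap is at least `headroom` times the expected spend (assumed factor)."""
    return spend_cap >= estimated_cost * headroom
```

For example, 1,200 emails a month at 3,000 input and 500 output tokens each, with placeholder prices of 3.0 and 15.0 per million tokens, gives `estimate_monthly_cost(1200, 3000, 500, 3.0, 15.0)` of 19.8 in whatever currency the prices are in. Convert before comparing against a cap set in ringgit.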

Step 3: Wire up the email trigger

Add a Gmail node (or IMAP if you are using a non-Google mailbox) and configure it as a trigger. Filter aggressively — the agent should only see what you want it to see. A good starter filter: emails to support@yourcompany.com, not in spam, less than 30 KB. The size cap is a quiet but important rail; it prevents long-thread emails from blowing up your context window and your bill.
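In n8n you would express this filter through the trigger node's options or an IF node, but the logic is easier to reason about when written out. A minimal sketch, assuming a simplified `Email` record of our own invention (the field names are hypothetical, not n8n's):

```python
from dataclasses import dataclass

MAX_BYTES = 30 * 1024  # the 30 KB size cap
SUPPORT_ADDRESS = "support@yourcompany.com"

@dataclass
class Email:
    to: str
    body: str
    is_spam: bool = False

def should_process(email: Email) -> bool:
    """Return True only for mail the agent is meant to see."""
    if email.to.strip().lower() != SUPPORT_ADDRESS:
        return False
    if email.is_spam:
        return False
    # Measure encoded bytes, not characters, so non-ASCIItext counts at full size.
    return len(email.body.encode("utf-8")) < MAX_BYTES
```

Anything that fails the check never reaches the agent, so it costs no tokens.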

Step 4: Add the Claude AI Agent node

n8n's native AI Agent node (in 2026 this is under AI → AI Agent) accepts three things: a system prompt, a user message, and a list of tools. The system prompt is where you set the agent's job description, constraints, and output format. The user message is the email body. The tools are other n8n nodes wrapped as callable functions.

A starter system prompt that we have refined over dozens of deployments:

You are a customer support triage agent for [COMPANY]. For each incoming email, you will:
1. Classify the email into exactly one of: billing, technical, refund, general inquiry, spam.
2. Look up the sender in the CRM using the lookupCustomer tool.
3. Draft a polite, professional reply that addresses the question. Do not promise anything specific about pricing or refunds — escalate those to a human.
4. Output a JSON object with fields: classification, customer_tier, draft_reply, requires_human_review (boolean), reasoning.
Refuse to answer anything outside customer support. Never include personal data of other customers in the reply.

Notice what this prompt does and does not do. It is specific about the output format. It is explicit about scope. It tells the agent when to escalate. It does not try to be clever. Boring prompts win.
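Because the prompt pins down the output format, you can check every response against it before anything downstream trusts it. The sketch below validates the five fields and the five allowed classifications named in the prompt. Two things are assumptions on our part: that `customer_tier` is always a string, and that a malformed response should raise rather than be silently repaired.

```python
import json

ALLOWED_CLASSES = {"billing", "technical", "refund", "general inquiry", "spam"}
REQUIRED_FIELDS = {
    "classification": str,
    "customer_tier": str,  # assumption: always a string, never null
    "draft_reply": str,
    "requires_human_review": bool,
    "reasoning": str,
}

def parse_agent_output(raw: str) -> dict:
    """Parse and validate the agent's JSON; raise ValueError on any violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    if data["classification"] not in ALLOWED_CLASSES:
        raise ValueError(f"unknown classification: {data['classification']}")
    return data
```

A response that fails validation is a good candidate for automatic escalation to a human, since it means the agent has stepped outside the contract you gave it.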

Step 5: Wrap your tools

The agent should not have direct database access. Wrap each capability as a tool with a narrow contract:

  • lookupCustomer(email): calls your CRM API, returns name, tier, and recent ticket count. Nothing else.
  • logTriage(classification, requires_human_review, reasoning): writes a row to a Google Sheet or database for audit and evaluation.

Keep the tool surface small. Every additional tool is a place the agent can fail in a way you did not expect.
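One way to enforce a narrow contract is to project the CRM response down to the allowed fields inside the wrapper, so extra data never reaches the model even if the CRM returns it. A minimal sketch, assuming a hypothetical `fetch_crm_record` callable that stands in for your real CRM client; the snake_case field names are also our own choice:

```python
from typing import Callable

# The only fields the agent may see: name, tier, and recent ticket count.
ALLOWED_FIELDS = ("name", "tier", "recent_ticket_count")

def make_lookup_customer(fetch_crm_record: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap a CRM fetch function so the tool returns only the allowed fields."""
    def lookup_customer(email: str) -> dict:
        record = fetch_crm_record(email.strip().lower())
        # Missing fields come back as None; everything else is dropped.
        return {field: record.get(field) for field in ALLOWED_FIELDS}
    return lookup_customer
```

If the CRM later starts returning phone numbers or billing addresses, the tool's output does not change, which keeps the agent's view of customer data stable and auditable.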

Step 6: Add the human-in-the-loop step

Route the agent's draft to Slack with two buttons: "Send" and "Edit and Send". Do not let the agent send anything without a human pressing a button, at least for the first 90 days. This single rail is what separates "demo" from "I would let this run my mailbox."

Use n8n's Wait for Webhook pattern (a Wait node set to resume on a webhook call) to pause the workflow until a Slack action returns. When the human presses "Send", the workflow continues into a Gmail send node. When they press "Edit and Send", the draft opens in a small internal form, and the edited version goes out instead. Both branches log the outcome to your evaluation sheet.
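The branching after the human responds can be sketched as a small pure function. The action identifiers (`send`, `edit_and_send`) and the outcome labels are hypothetical names for this illustration, not values Slack or n8n define for you. The property worth copying is that anything unrecognised results in nothing being sent.

```python
from typing import Optional

def resolve_outbound(action: str, draft: str, edited: Optional[str] = None) -> dict:
    """Decide what, if anything, goes out after a human responds in Slack."""
    if action == "send":
        return {"send": True, "body": draft, "outcome": "sent_as_drafted"}
    if action == "edit_and_send":
        if not edited or not edited.strip():
            # No edited text arrived: do not fall back to the unreviewed draft.
            return {"send": False, "body": None, "outcome": "edit_missing"}
        return {"send": True, "body": edited, "outcome": "sent_after_edit"}
    # Unknown or missing action: fail closed.
    return {"send": False, "body": None, "outcome": "unknown_action"}
```

The `outcome` value is what you would write to the evaluation sheet, so you can later see how often drafts went out unchanged versus edited.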

Step 7: Build evaluation from day one

Before the agent goes live, set up an evaluation routine. Twice a week, sample 20 random outputs from the log sheet and grade them on three dimensions: classification accuracy, draft quality, and escalation appropriateness. Track the score weekly. If the trend goes the wrong way, you know before the customer does.

This is the single biggest difference between a real production agent and a demo. The demo runs once and looks great. The production agent has a feedback loop that catches drift before it becomes a complaint.
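The sampling and scoring routine is small enough to sketch. This version assumes each log row is a dict and each human grade is a number between 0 and 1 per dimension. The dimension keys, the 0 to 1 scale, and the 0.05 tolerance are our assumptions for illustration; use whatever rubric your reviewers can apply consistently.

```python
import random
from statistics import mean

DIMENSIONS = ("classification_accuracy", "draft_quality", "escalation_appropriateness")

def sample_for_review(log_rows: list, k: int = 20, seed=None) -> list:
    """Pick up to k random rows from the log without replacement."""
    rng = random.Random(seed)
    return rng.sample(log_rows, min(k, len(log_rows)))

def score_batch(grades: list) -> dict:
    """Average each dimension across the graded samples."""
    return {dim: round(mean(g[dim] for g in grades), 3) for dim in DIMENSIONS}

def is_regressing(weekly_scores: list, tolerance: float = 0.05) -> bool:
    """True if the latest score dropped more than `tolerance` below the previous one."""
    if len(weekly_scores) < 2:
        return False
    return weekly_scores[-2] - weekly_scores[-1] > tolerance
```

Random sampling matters here: if reviewers pick which outputs to grade, they tend to pick the interesting ones, and the trend line stops reflecting what customers actually receive.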

The five safety rails

Before this touches production:

  • Hard spend cap on the LLM provider, set below your panic threshold.
  • Allowlist of tool actions: the agent can only call functions you have explicitly allowed, never arbitrary code.
  • Human-in-the-loop on outbound actions — for at least the first 90 days, no message goes out without human approval.
  • Audit logging of every input, decision, and output. PDPA-aware: store only what you need, with a retention policy.
  • Kill switch — a single config flag that disables the workflow without deleting it. You will need this the first time something goes wrong.
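Two of these rails, the kill switch and the retention policy, reduce to a few lines of logic. The sketch below assumes a config dict with a hypothetical `agent_enabled` flag and audit rows that carry a timezone-aware `logged_at` timestamp. Both names are illustrative, and the right retention period is a decision for your own PDPA review rather than something this code settles.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def workflow_enabled(config: dict) -> bool:
    """Kill switch: run only when the flag is explicitly True (fail closed)."""
    return config.get("agent_enabled") is True

def prune_audit_log(rows: list, retention_days: int, now: Optional[datetime] = None) -> list:
    """Keep only audit rows logged within the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [row for row in rows if row["logged_at"] >= cutoff]
```

Checking for an explicit `True` means a missing or mistyped flag disables the workflow instead of silently enabling it, which is the behaviour you want the first time something goes wrong.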

Where to go next

Once this triage agent is stable, the natural next steps are: add a second classification dimension (urgency); route urgent items to a different Slack channel with a faster SLA; integrate with your ticketing system so the agent's classification populates ticket metadata. Each step is incremental — you keep the rails, change one thing, evaluate.

If you want the in-depth, instructor-led version of this — building 6 production-grade agents over two days, with HRDC funding — our AI Agentic Automation programme is built around exactly this stack: n8n + Claude, with the safety rails in place from day one.

Frequently Asked Questions

Do I need coding experience to build this agent?

No. n8n's AI Agent node and Claude's tool-calling API are designed to be configured through forms and prompts rather than code. You will need to be comfortable with concepts like API keys, JSON, and HTTP webhooks, but you do not need to write Python or JavaScript for a basic triage agent. For more sophisticated tools (custom data transformations, complex business logic), light scripting in n8n's Code node helps.

How much does it cost to run?

For a low-volume workflow processing a few hundred items a week, expect roughly USD 24/month for n8n Cloud plus RM 30–100 in Claude API usage. A small self-hosted n8n on a basic VPS plus Claude API can come in under RM 80/month total. Costs scale with volume: high-volume agents handling thousands of items a day typically spend RM 500–3,000/month, still far below the cost of the staff time they replace.

Why Claude rather than another model?

Three reasons we see most often in 2026 deployments: Claude's tool-calling reliability is consistently higher in production, particularly for multi-step workflows; the long context window makes it easier to give the agent enough situational awareness without engineering tricks; and the built-in safety behaviours reduce the rate of awkward outputs. That said, the n8n AI Agent node works with multiple providers, so choose based on your own evaluation results, not the brand.

What safety rails does an agent need before production?

Five rails, in order of importance: keep a hard spend cap on the LLM provider; restrict the agent to a small allowlist of tools; require human approval on any outbound action for at least 90 days; log everything for audit; and have a kill switch that disables the workflow instantly. Most of the AI agent failures we have seen in Malaysian deployments would have been caught by one of these five rails.

Is there HRDC-claimable training for this?

Yes. AITraining2U's AI Agentic Automation programme, the in-depth, hands-on version of this tutorial, is HRDC SBL-KHAS claimable. Eligible Malaysian employers can fund the training at near-zero net cost. The programme covers the full stack including evaluation, safety rails, and integration patterns we use with corporate clients.
