78% of tickets resolved without a human — it answers in seconds, checks itself, and knows when to escalate.
An autonomous support agent built as a LangGraph state machine. It classifies each ticket, retrieves a precise answer from your knowledge base behind a three-layer quality gate, and scores its own response. It sends the reply only when the score passes; otherwise it escalates to a human.
The impact
Six numbers that move when a self-checking agent handles the first response instead of a queue.
The brief
The problem
The same answer, typed by hand, for the 50th time today.
Every support team hits the same wall: 70–80% of incoming tickets are repetitive questions already answered in the docs — yet customers still wait 5–15 minutes (or hours) for a human to retype the same reply. Scale the team and payroll doubles; don't, and customers leave. Even when agents do respond, quality is a coin flip — some answers land, others miss entirely, with no quality gate and no way to know which actually help. Meanwhile your most expensive agents are stuck on “What's your return policy?” instead of the cases that genuinely need a human brain.
The solution
An agent that answers, checks its own work, and escalates honestly.
I built an autonomous AI support agent that processes tickets end-to-end: it classifies by type and urgency, retrieves precise answers from the knowledge base via RAG, drafts a response, runs it through a quality gate — and only then sends it. If the score is too low, it automatically retries with a refined search. If the knowledge base can't answer, it escalates to a human immediately with a full context package — no hallucinations, no guessing, no “I'm sorry, I don't understand.”
For complaints, the agent switches to empathy-first mode, acknowledging the customer's frustration before solving. For vague messages like “help me pls,” it asks a clarifying question and waits for the reply before proceeding. Every path is covered, every edge case handled.
The result: ~78% of tickets resolved automatically with quality-checked responses, while human agents focus exclusively on the 22% that actually require judgment.
How it works
A LangGraph state machine — 8 nodes, conditional routing, retry cycles, and a human-in-the-loop interrupt.
Key features
3-layer RAG quality gate
Retrieval passes three gates: a cosine SimilarityPostprocessor (cutoff 0.5), an LLMRerank scoring each chunk 1–10, and a final relevance check for a direct answer — not just related text. Any failure escalates instead of guessing.
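The three gates can be sketched in plain Python to show the order of the checks. This is a minimal sketch, not the project's LlamaIndex pipeline: `Chunk`, `quality_gate`, `rerank_score`, and `answers_directly` are hypothetical names, the two callables stand in for the LLM calls, and the `min_rerank` threshold of 7 is an assumption (the text gives the 1–10 scale but not the pass mark). Only the 0.5 cosine cutoff comes from the description above.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Chunk:
    text: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def quality_gate(
    query_embedding: Sequence[float],
    chunks: List[Chunk],
    rerank_score: Callable[[Chunk], int],       # stand-in for an LLM rerank, 1-10
    answers_directly: Callable[[Chunk], bool],  # stand-in for the relevance check
    similarity_cutoff: float = 0.5,             # cutoff stated in the text
    min_rerank: int = 7,                        # assumed threshold, not from the source
) -> Optional[List[Chunk]]:
    """Return the chunks that pass all three gates, or None to signal escalation."""
    # Gate 1: drop chunks below the cosine similarity cutoff.
    similar = [c for c in chunks if cosine(query_embedding, c.embedding) >= similarity_cutoff]
    if not similar:
        return None

    # Gate 2: score each surviving chunk and keep the strong ones, best first.
    scored = sorted(((rerank_score(c), c) for c in similar), key=lambda p: p[0], reverse=True)
    reranked = [c for score, c in scored if score >= min_rerank]
    if not reranked:
        return None

    # Gate 3: keep only chunks that answer the question directly.
    direct = [c for c in reranked if answers_directly(c)]
    return direct or None
```

A `None` result is the signal to escalate rather than draft from weak context.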
Auto-retry with quality scoring
Every draft gets a 0.0–1.0 quality score. Pass and it sends; fail and the agent retries with a refined search — up to twice — before handing off. Weak answers never reach the customer.
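The retry loop can be sketched as below. This is an illustration under stated assumptions, not the agent's actual node code: `answer_with_retries`, `draft_fn`, `score_fn`, and `refine_fn` are hypothetical names, the 0.7 pass threshold is assumed (the text does not state it), and "up to twice" is read as two retries after the first attempt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    action: str    # "send" or "escalate"
    draft: str
    score: float
    attempts: int


def answer_with_retries(
    question: str,
    draft_fn: Callable[[str], str],           # retrieves and drafts for a search query
    score_fn: Callable[[str, str], float],    # scores a draft from 0.0 to 1.0
    refine_fn: Callable[[str, str], str],     # rewrites the search query after a weak draft
    threshold: float = 0.7,                   # assumed pass mark, not from the source
    max_retries: int = 2,                     # "up to twice"
) -> Outcome:
    query = question
    draft, score = "", 0.0
    # One initial attempt plus max_retries refined attempts.
    for attempt in range(1, max_retries + 2):
        draft = draft_fn(query)
        score = score_fn(question, draft)
        if score >= threshold:
            return Outcome("send", draft, score, attempt)
        query = refine_fn(query, draft)
    # Every attempt scored too low: hand off instead of sending a weak answer.
    return Outcome("escalate", draft, score, max_retries + 1)
```

The last draft and score travel with the escalation, so the human sees what the agent tried.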
Smart escalation
The agent knows what it doesn't know. When the KB can't answer or quality stays low, it escalates to a human immediately — with category, priority, the draft, and the reason it bailed.
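One way to picture the context package is as a small record with the four fields named above. The sketch below is hypothetical: `EscalationPackage`, `escalate`, the `ticket_id` field, the reason wording, and the 0.7 threshold are all assumptions for illustration, not the project's schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EscalationPackage:
    ticket_id: str   # assumed field, for routing the handoff
    category: str
    priority: str
    draft: str       # last draft, possibly empty if nothing was retrieved
    reason: str      # why the agent handed off


def escalate(
    ticket_id: str,
    category: str,
    priority: str,
    draft: str,
    kb_answered: bool,
    score: float,
    threshold: float = 0.7,  # assumed pass mark, not from the source
) -> EscalationPackage:
    """Build the handoff record, recording which of the two triggers fired."""
    if not kb_answered:
        reason = "knowledge base had no direct answer"
    else:
        reason = f"quality score {score:.2f} stayed below {threshold:.2f}"
    return EscalationPackage(ticket_id, category, priority, draft, reason)


def to_json(package: EscalationPackage) -> str:
    """Serialize the package for a ticketing system or message queue."""
    return json.dumps(asdict(package))
```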
Empathy engine for complaints
Complaints route through an empathy node that sets tone instructions and an acknowledging preamble before the same RAG pipeline runs — so frustrated customers feel heard first, then helped.
Human-in-the-loop
Vague messages like “it doesn't work” trigger a clarifying question. The graph interrupts, waits for the customer's reply, and resumes from classification — no guessing at intent.
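The pause-and-resume flow can be sketched without LangGraph to show the control flow: classify, pause with a question if intent is unclear, then re-enter classification once the customer replies. Everything here is hypothetical (`TicketState`, `run`, `resume`, the `"vague"` label, and the wording of the question); the real graph relies on LangGraph's own interrupt and checkpointing, which this sketch does not use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TicketState:
    messages: List[str] = field(default_factory=list)
    category: Optional[str] = None
    pending_question: Optional[str] = None  # set while the run is paused


def run(state: TicketState, classify: Callable[[str], str]) -> TicketState:
    """Classify the ticket; pause with a clarifying question if intent is unclear."""
    state.category = classify(" ".join(state.messages))
    if state.category == "vague":
        # Paused: the caller shows this question and stores the state.
        state.pending_question = "Could you tell me a bit more about what went wrong?"
    else:
        state.pending_question = None
    return state


def resume(state: TicketState, reply: str, classify: Callable[[str], str]) -> TicketState:
    """Add the customer's reply and restart from classification."""
    state.messages.append(reply)
    return run(state, classify)
```

A caller would persist the paused `TicketState`, then call `resume` when the reply arrives; if the combined messages are still vague, the state simply pauses again.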
Full observability & cost tracking
Every node is traced in Langfuse — per-step timings, token counts, per-ticket cost. The Streamlit dashboard surfaces automation rate, average latency, and spend live.
Under the hood
- Model split: Gemini 2.5 Flash for classify + quality (cheap); 3.0 Flash for generate + empathy (customer-facing quality).
- HTTP boundary: UI and API fully decoupled via REST, so each is deployable independently.
- State machine: 8 nodes with conditional routing, retry cycles, and interrupt/resume for human-in-the-loop.
- No synthesis: RAG returns raw source nodes; the LLM generates from context, not LlamaIndex summaries.
- Fail-safe design: Verification errors fail-open (proceed); RAG errors fail-closed (escalate).
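The fail-open versus fail-closed split in the last bullet can be sketched as two small wrappers. This is a minimal illustration, assuming the retrieval and verification steps are plain callables that may raise; `retrieve_or_escalate` and `verify_or_proceed` are hypothetical names, not functions from the project.

```python
from typing import Any, Callable, Tuple


def retrieve_or_escalate(retrieve: Callable[[str], Any], query: str) -> Tuple[str, Any]:
    """Fail-closed: if retrieval raises, escalate rather than answer without sources."""
    try:
        return ("ok", retrieve(query))
    except Exception as exc:
        return ("escalate", f"RAG error: {exc}")


def verify_or_proceed(verify: Callable[[str], bool], draft: str) -> bool:
    """Fail-open: if the verifier itself raises, let the draft proceed."""
    try:
        return verify(draft)
    except Exception:
        # A broken checker should not block a draft that passed the other gates.
        return True
```

The asymmetry follows the cost of each failure: a retrieval outage leaves the model with nothing grounded to say, while a verifier outage only removes one of several checks.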
Drowning in tickets you've already answered?
This agent clears the repetitive 78% with quality-checked answers and routes the rest to a human with full context — so your team only touches what needs a brain. I scope it, build it end to end, and wire in the observability to prove it works.