Case study 03 · Lab — Agents & Bots

78% of tickets resolved without a human — it answers in seconds, checks itself, and knows when to escalate.

An autonomous support agent built as a LangGraph state machine. It classifies each ticket, retrieves a precise answer from your knowledge base behind a three-layer quality gate, and scores its own response. It sends the reply only when the score passes; otherwise it retries or escalates to a human.

Role: Solo, design & build
Domain: Customer support automation
Core stack: LangGraph · LlamaIndex · Gemini
Resolved: ~78% auto · quality-checked
01

The impact

Six numbers that move when a self-checking agent handles the first response instead of a queue.

Response time: ~3 sec (was 5–15 min)
Automation rate: ~78% (was 0%, all manual)
Availability: 24/7 (was business hours)
Quality control: 100% (was random checks)
Languages: Auto-detect (was 1 per agent)
Cost per ticket: ~$0.003 (was $5–8)
02

The brief

The problem: the same answer, typed by hand, for the 50th time today.

Every support team hits the same wall: 70–80% of incoming tickets are repetitive questions already answered in the docs — yet customers still wait 5–15 minutes (or hours) for a human to retype the same reply. Scale the team and payroll doubles; don't, and customers leave. Even when agents do respond, quality is a coin flip — some answers land, others miss entirely, with no quality gate and no way to know which actually help. Meanwhile your most expensive agents are stuck on “What's your return policy?” instead of the cases that genuinely need a human brain.

The solution: an agent that answers, checks its own work, and escalates honestly.

I built an autonomous AI support agent that processes tickets end-to-end: it classifies by type and urgency, retrieves precise answers from the knowledge base via RAG, drafts a response, runs it through a quality gate — and only then sends it. If the score is too low, it automatically retries with a refined search. If the knowledge base can't answer, it escalates to a human immediately with a full context package — no hallucinations, no guessing, no “I'm sorry, I don't understand.”

For complaints, the agent switches to empathy-first mode, acknowledging the customer's frustration before solving. For vague messages like “help me pls,” it asks a clarifying question and waits for the reply before proceeding. Each of the three routes (question, complaint, unclear) ends in either a quality-checked reply or an escalation to a human.

The result: ~78% of tickets resolved automatically with quality-checked responses, while human agents focus exclusively on the 22% that actually require judgment.


03

How it works

A LangGraph state machine — 8 nodes, conditional routing, retry cycles, and a human-in-the-loop interrupt.

Customer ticket
↓
Classify (decision): category · priority · language
routes to one of:
  • complaint → Empathy: tone + preamble
  • question → straight to RAG search
  • unclear → Clarify: wait for user reply ↺ user reply → re-classify
complaint & question converge ↓
RAG search: 3-layer quality filter
  • no answer → Escalate
↓
Generate response: grounded in retrieved context
↓
Quality check: score 0.0 – 1.0
  • retry ×2 ↺ RAG search
  • fail → Escalate
  • pass → Send reply: quality-checked & delivered
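The routing above can be read as a transition table. The following is a minimal plain-Python sketch of that table, not the LangGraph code behind the project; the node names and outcome labels ("found", "done", "reply") are hypothetical and chosen only to mirror the diagram.

```python
# Transition table for the eight nodes in the flow above.
# Outcome labels ("found", "done", "reply", ...) are hypothetical names.
EDGES = {
    ("classify", "complaint"): "empathy",
    ("classify", "question"): "rag_search",
    ("classify", "unclear"): "clarify",
    ("clarify", "reply"): "classify",        # user reply -> re-classify
    ("empathy", "done"): "rag_search",       # complaint path converges here
    ("rag_search", "found"): "generate",
    ("rag_search", "no_answer"): "escalate",
    ("generate", "done"): "quality_check",
    ("quality_check", "pass"): "send_reply",
    ("quality_check", "retry"): "rag_search",
    ("quality_check", "fail"): "escalate",
}

TERMINAL = {"send_reply", "escalate"}


def walk(outcomes: list[str], start: str = "classify") -> list[str]:
    """Follow the table from `start`, one outcome per step, and return the path."""
    node, path = start, [start]
    for outcome in outcomes:
        if node in TERMINAL:
            raise ValueError(f"{node} is terminal; no edge for {outcome!r}")
        node = EDGES[(node, outcome)]
        path.append(node)
    return path


# A complaint that is answered on the first attempt.
complaint_path = walk(["complaint", "done", "found", "done", "pass"])
```

Writing the edges down as data makes it easy to check that every non-terminal node has a way out and that both terminal nodes are reachable.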

04

Key features

3-layer RAG quality gate

Retrieval passes three gates: a cosine SimilarityPostprocessor (cutoff 0.5), an LLMRerank scoring each chunk 1–10, and a final relevance check for a direct answer — not just related text. Any failure escalates instead of guessing.
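As a rough illustration of the three gates, here is a plain-Python sketch. It does not call LlamaIndex: `rerank_score` and `answers_directly` are hypothetical stand-ins for the LLM reranker and the final relevance check, and `top_n=5` is taken from the "rerank top-5" step in the request flow. Only the 0.5 cutoff and the order of the layers come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    text: str
    similarity: float  # cosine similarity reported by the vector store


def three_layer_gate(
    chunks: list[Chunk],
    rerank_score: Callable[[Chunk], float],            # stand-in: LLM score, 1-10
    answers_directly: Callable[[list[Chunk]], bool],   # stand-in: relevance check
    cutoff: float = 0.5,
    top_n: int = 5,
) -> Optional[list[Chunk]]:
    """Return the surviving chunks, or None when the ticket should escalate."""
    # Layer 1: drop anything below the similarity cutoff.
    kept = [c for c in chunks if c.similarity >= cutoff]
    if not kept:
        return None
    # Layer 2: rerank the survivors and keep the best top_n.
    ranked = sorted(kept, key=rerank_score, reverse=True)[:top_n]
    # Layer 3: require a direct answer, not just related text.
    if not answers_directly(ranked):
        return None
    return ranked
```

Returning `None` rather than an empty list keeps the "no answer, escalate" case explicit for the caller.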

Auto-retry with quality scoring

Every draft gets a 0.0–1.0 quality score. Pass and it sends; fail and the agent retries with a refined search — up to twice — before handing off. Weak answers never reach the customer.
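The retry loop might look roughly like the sketch below. The pass mark of 0.7 is an assumption (the write-up does not state the threshold), and `search`, `generate`, and `score` are hypothetical callables standing in for the RAG, generation, and quality-check nodes.

```python
from typing import Callable, Optional

PASS_THRESHOLD = 0.7  # assumption: the actual pass mark is not stated
MAX_RETRIES = 2       # from the text: retries "up to twice"


def answer_with_retries(
    question: str,
    search: Callable[[str, int], Optional[str]],  # (query, attempt) -> context or None
    generate: Callable[[str, str], str],          # (query, context) -> draft
    score: Callable[[str, str], float],           # (query, draft) -> 0.0-1.0
) -> tuple[str, Optional[str]]:
    """Return ("send", draft) on a pass, else ("escalate", last_draft)."""
    draft: Optional[str] = None
    for attempt in range(MAX_RETRIES + 1):  # first attempt plus up to two retries
        context = search(question, attempt)
        if context is None:                 # knowledge base cannot answer
            return ("escalate", draft)
        draft = generate(question, context)
        if score(question, draft) >= PASS_THRESHOLD:
            return ("send", draft)
    return ("escalate", draft)              # quality stayed low after all retries
```

Passing the attempt number to `search` is one simple way to let the retrieval step refine its query on each retry.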

Smart escalation

The agent knows what it doesn't know. When the KB can't answer or quality stays low, it escalates to a human immediately — with category, priority, the draft, and the reason it bailed.
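One way to picture the hand-off is as a small record carrying the four items listed above. This is an illustrative sketch only: the class name, the `ticket_id` field, and the example values are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EscalationPackage:
    ticket_id: str         # hypothetical identifier, not named in the write-up
    category: str          # e.g. "question" or "complaint"
    priority: str
    draft: Optional[str]   # last draft, or None if nothing was generated
    reason: str            # why the agent handed off


def build_escalation(ticket_id: str, category: str, priority: str,
                     draft: Optional[str], reason: str) -> dict:
    """Bundle everything a human agent needs into one serialisable dict."""
    return asdict(EscalationPackage(ticket_id, category, priority, draft, reason))


package = build_escalation(
    "T-1", "question", "high", None, "knowledge base returned no direct answer"
)
```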

Empathy engine for complaints

Complaints route through an empathy node that sets tone instructions and an acknowledging preamble before the same RAG pipeline runs — so frustrated customers feel heard first, then helped.
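A minimal sketch of what such a node could return, assuming it only adjusts tone settings and leaves retrieval untouched. The wording of the instructions and the preamble is invented for illustration.

```python
from typing import TypedDict


class ToneSettings(TypedDict):
    tone_instructions: str  # passed to the generation prompt
    preamble: str           # placed before the answer


# Hypothetical defaults for non-complaint tickets.
NEUTRAL: ToneSettings = {
    "tone_instructions": "Be concise and factual.",
    "preamble": "",
}


def empathy_node(category: str) -> ToneSettings:
    """Return tone settings; only complaints get the acknowledging preamble."""
    if category != "complaint":
        return NEUTRAL
    return {
        "tone_instructions": "Acknowledge the frustration first, then solve.",
        "preamble": "I'm sorry about the trouble this has caused.",
    }


def compose_reply(settings: ToneSettings, answer: str) -> str:
    """Prefix the grounded answer with the preamble, if there is one."""
    return f"{settings['preamble']} {answer}".strip()
```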

Human-in-the-loop

Vague messages like “it doesn't work” trigger a clarifying question. The graph interrupts, waits for the customer's reply, and resumes from classification — no guessing at intent.
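The pause-and-resume behaviour can be illustrated with a plain Python generator. This is an analogy for the interrupt, not LangGraph's own mechanism; `toy_classify` and the wording of the clarifying question are hypothetical.

```python
from typing import Callable, Generator


def handle_ticket(
    message: str, classify: Callable[[str], str]
) -> Generator[str, str, str]:
    """Yield clarifying questions until the ticket is classifiable, then return the category."""
    category = classify(message)
    while category == "unclear":
        # Execution pauses here until the caller sends the customer's reply back in.
        reply = yield "Could you tell me a bit more about what went wrong?"
        category = classify(reply)  # resume from classification
    return category


def toy_classify(text: str) -> str:
    # Stand-in classifier: very short messages count as unclear.
    return "unclear" if len(text.split()) < 4 else "question"


run = handle_ticket("it doesn't work", toy_classify)
question = next(run)  # the graph is now waiting for the customer
try:
    run.send("The export button gives a 500 error")
    category = None   # still unclear; another question was asked
except StopIteration as done:
    category = done.value
```

The generator holds its state between the question and the reply, which is the same property the interrupt relies on.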

Full observability & cost tracking

Every node is traced in Langfuse — per-step timings, token counts, per-ticket cost. The Streamlit dashboard surfaces automation rate, average latency, and spend live.
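Per-ticket cost and latency can be derived from per-step token counts and timings along these lines. This sketch does not use the Langfuse SDK, and the two prices are placeholders, not real Gemini rates.

```python
from dataclasses import dataclass

# Placeholder prices in USD per one million tokens. NOT real Gemini pricing.
PRICE_PER_M_INPUT = 0.10
PRICE_PER_M_OUTPUT = 0.40


@dataclass
class StepUsage:
    node: str
    input_tokens: int
    output_tokens: int
    seconds: float


def ticket_cost(steps: list[StepUsage]) -> float:
    """Sum token spend across every traced step of one ticket, in USD."""
    total = sum(
        s.input_tokens * PRICE_PER_M_INPUT + s.output_tokens * PRICE_PER_M_OUTPUT
        for s in steps
    )
    return total / 1_000_000


def ticket_latency(steps: list[StepUsage]) -> float:
    """Total wall-clock seconds, assuming the steps ran one after another."""
    return sum(s.seconds for s in steps)


def automation_rate(auto_resolved: int, total: int) -> float:
    """Share of tickets closed without a human; 0.0 when there are no tickets."""
    return auto_resolved / total if total else 0.0
```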


05

Under the hood

Tech stack
Python 3.11 · LangGraph · LlamaIndex · Qdrant · Gemini 2.5 / 3.0 Flash · FastAPI · Streamlit · Langfuse · Pydantic · Docker · pytest
Design principles
  • Model split: Gemini 2.5 Flash for classify + quality (cheap); 3.0 Flash for generate + empathy (customer-facing quality).
  • HTTP boundary: UI and API fully decoupled via REST, each deployable independently.
  • State machine: 8 nodes with conditional routing, retry cycles, and interrupt/resume for human-in-the-loop.
  • No synthesis: RAG returns raw source nodes; the LLM generates from context, not LlamaIndex summaries.
  • Fail-safe design: Verification errors fail-open (proceed); RAG errors fail-closed (escalate).
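The fail-safe principle can be sketched as two small wrappers. This is an illustration under the assumption that "verification" means a boolean check and that an "error" means the step raised an exception; the function names are hypothetical.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def verify_fail_open(check: Callable[[], bool]) -> bool:
    """Run a verification step; if the step itself errors, proceed as if it passed."""
    try:
        return check()
    except Exception:
        return True


def retrieve_fail_closed(search: Callable[[], T]) -> tuple[str, Optional[T]]:
    """Run retrieval; if it errors, escalate rather than answer without context."""
    try:
        return ("continue", search())
    except Exception:
        return ("escalate", None)
```

The asymmetry is deliberate: a crashed checker should not block an otherwise grounded answer, while a crashed retrieval leaves nothing to ground the answer in.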
support_agent / request flow
Streamlit UI (chat + metrics)
↓ REST API (httpx)
FastAPI: 5 endpoints · /ticket /clarify /trace /metrics /health
LangGraph state machine (8 nodes · 14-field state)
Classify (2.5 Flash) → category · priority · language
RAG: Qdrant top-k 10 → rerank top-5 → relevance check
Generate (3.0 Flash) → Quality check → pass / retry / escalate
Langfuse tracing: per-node spans · token counts · cost

Drowning in tickets you've already answered?

This agent clears the repetitive 78% with quality-checked answers and routes the rest to a human with full context — so your team only touches what needs a brain. I scope it, build it end to end, and wire in the observability to prove it works.