Case study 03 · Lab — Agents & Bots

A portfolio build — modeled on a real use case and taken end to end. It actually runs.

Built to resolve ~78% of tickets without a human — it answers in seconds, checks itself, and knows when to escalate.

An autonomous support agent built as a LangGraph state machine. It classifies each ticket, retrieves a precise answer from your knowledge base behind a three-layer quality gate, scores its own response — and only sends it, or escalates to a human, when it's sure.


Role

Solo — design & build

Domain

Customer support automation

Core stack

LangGraph · LlamaIndex · Gemini

Resolved

~78% projected · quality-checked

The impact

Six numbers that move when a self-checking agent handles the first response instead of a queue.

Response time

~3 sec

was 5–15 min

Automation rate

~78%

projected · 70–80% repeat tickets

Availability

24/7

was business hours

Quality control

100%

was random checks

Languages

Auto-detect

was 1 per agent

Cost per ticket

~$0.003

was $5–8

How the ~$0.003 works out: about 6K input tokens (the question + retrieved context) and a ~300-token reply, run mostly on Gemini Flash at list pricing ($0.30–0.50 in / $2.50–3.00 out per 1M) — well under a cent, retries included.
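The per-ticket figure is simple arithmetic. This is a minimal sketch that uses only the token counts and the list-price range quoted above. The constant names are illustrative, and it prices a single pass with no retries:

```python
# Per-ticket cost at Gemini Flash list prices (USD per 1M tokens), using the
# token counts stated above: ~6,000 input tokens and ~300 output tokens.
INPUT_TOKENS = 6_000
OUTPUT_TOKENS = 300

# (input $/1M, output $/1M): low and high ends of the quoted price range
PRICE_LOW = (0.30, 2.50)
PRICE_HIGH = (0.50, 3.00)


def ticket_cost(input_tokens: int, output_tokens: int, price: tuple[float, float]) -> float:
    """Return the USD cost of one ticket for a given (input, output) per-1M price."""
    in_price, out_price = price
    return input_tokens * in_price / 1_000_000 + output_tokens * out_price / 1_000_000


low = ticket_cost(INPUT_TOKENS, OUTPUT_TOKENS, PRICE_LOW)
high = ticket_cost(INPUT_TOKENS, OUTPUT_TOKENS, PRICE_HIGH)
print(f"${low:.5f} to ${high:.5f} per ticket")
```

At these prices, one pass costs about $0.0026 at the low end and $0.0039 at the high end. That range brackets the ~$0.003 figure.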

The brief

The problem

The same answer, typed by hand, for the 50th time today.

Every support team hits the same wall: 70–80% of incoming tickets are repetitive questions already answered in the docs — yet customers still wait 5–15 minutes (or hours) for a human to retype the same reply. Scale the team and payroll doubles; don't, and customers leave. Even when agents do respond, quality is a coin flip — some answers land, others miss entirely, with no quality gate and no way to know which actually help. Meanwhile your most expensive agents are stuck on “What's your return policy?” instead of the cases that genuinely need a human brain.

The solution

An agent that answers, checks its own work, and escalates honestly.

I built an autonomous AI support agent that processes tickets end to end. It classifies each ticket by type and urgency, retrieves precise answers from the knowledge base via RAG, drafts a response, and runs that draft through a quality gate before sending it. If the quality score is too low, it retries with a refined search, up to twice. If the knowledge base can't answer, it escalates to a human immediately with a full context package. It does not guess or fall back on “I'm sorry, I don't understand.”

For complaints, the agent switches to empathy-first mode and acknowledges the customer's frustration before solving the problem. For vague messages like “help me pls,” it asks a clarifying question and waits for the reply before proceeding. Each of the classifier's three outcomes (complaint, question, unclear) has its own explicit route through the graph.

Because 70–80% of incoming tickets are repeat questions, the agent is projected to resolve ~78% of all tickets automatically with quality-checked responses. Human agents are then free to focus on the remaining ~22% that actually require judgment.

How it works

A LangGraph state machine — 8 nodes, conditional routing, retry cycles, and a human-in-the-loop interrupt.

Customer ticket

↓

Classify (decision): category · priority · language → routes to one of
  complaint → Empathy: tone + preamble → RAG search
  question → straight to RAG search
  unclear → Clarify: wait for user reply ↺ user reply → re-classify

↓ complaint & question converge

RAG search: 3-layer quality filter → no answer → Escalate

↓

Generate response: grounded in retrieved context

↓

Quality check: score 0.0 – 1.0
  pass → Send reply: quality-checked & delivered
  fail → ↺ RAG search (retry ×2), then Escalate

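Most of the branching in the diagram reduces to two routing decisions. The following is a framework-free sketch of those routers. The node names and state fields are hypothetical. In LangGraph, functions like these are what you would pass to `add_conditional_edges` to choose the next node. The fixed hops (Empathy → RAG search, and Clarify → Classify after the reply) would be plain edges.

```python
from typing import Literal, TypedDict


class TicketState(TypedDict, total=False):
    # Hypothetical subset of the agent's state (the real one has 14 fields).
    category: Literal["complaint", "question", "unclear"]
    context: list[str]  # chunks that survived the RAG quality filter


def route_after_classify(state: TicketState) -> str:
    """Pick the next node from the classifier's category."""
    category = state.get("category")
    if category == "complaint":
        return "empathy"      # set tone + preamble, then RAG search
    if category == "question":
        return "rag_search"   # straight to retrieval
    return "clarify"          # unclear (or missing): ask the customer first


def route_after_rag(state: TicketState) -> str:
    """Escalate when retrieval found nothing usable; otherwise draft a reply."""
    return "generate" if state.get("context") else "escalate"
```

Keeping the routers as pure functions of the state makes each branch testable without calling a model.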

Key features

3-layer RAG quality gate

Retrieval passes three gates. First, a SimilarityPostprocessor applies a cosine-similarity cutoff of 0.5. Second, an LLMRerank scores each chunk from 1 to 10. Third, a final relevance check confirms that the surviving chunks contain a direct answer, not just related text. Any failure escalates instead of guessing.
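The three layers compose into one filter that either returns context or signals escalation. The sketch below is a plain-Python approximation. The two LLM steps are passed in as callables, standing in for what LlamaIndex's `SimilarityPostprocessor(similarity_cutoff=0.5)` and `LLMRerank(top_n=5)` plus a custom relevance prompt would do. `Chunk`, `quality_gate` and both callables are hypothetical names, not the project's code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    text: str
    similarity: float  # cosine similarity from the vector search


def quality_gate(
    chunks: list[Chunk],
    rerank_score: Callable[[str], int],             # stand-in for an LLM scoring 1-10
    answers_question: Callable[[list[str]], bool],  # stand-in for the final LLM check
    cutoff: float = 0.5,
    top_n: int = 5,
) -> Optional[list[str]]:
    """Return the chunk texts to generate from, or None to signal escalation."""
    # Layer 1: drop anything below the cosine-similarity cutoff.
    kept = [c for c in chunks if c.similarity >= cutoff]
    if not kept:
        return None
    # Layer 2: rerank the survivors by relevance score and keep the top_n.
    ranked = sorted(kept, key=lambda c: rerank_score(c.text), reverse=True)[:top_n]
    texts = [c.text for c in ranked]
    # Layer 3: require a direct answer, not just related text.
    return texts if answers_question(texts) else None
```

Returning `None` rather than an empty answer keeps "no answer" an explicit signal that the graph can route to Escalate.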

Auto-retry with quality scoring

Every draft gets a 0.0–1.0 quality score. Pass and it sends; fail and the agent retries with a refined search — up to twice — before handing off. Weak answers never reach the customer.
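The retry policy amounts to a bounded loop: one initial attempt plus up to two refined retries, then hand-off. This is a minimal sketch. The source gives the 0.0–1.0 scale but not the pass mark, so the 0.7 threshold below is an assumption. `draft_and_score` is a hypothetical stand-in for one RAG search + generate + quality-check round.

```python
from typing import Callable

PASS_THRESHOLD = 0.7  # assumption: the source states the scale, not the cutoff
MAX_RETRIES = 2


def answer_with_retries(
    draft_and_score: Callable[[int], tuple[str, float]],
) -> tuple[str, str]:
    """Run attempt 0 plus up to MAX_RETRIES retries.

    draft_and_score(attempt) returns (draft, score); later attempts are expected
    to use a refined search. Returns ("send", draft) or ("escalate", last draft).
    """
    draft = ""
    for attempt in range(MAX_RETRIES + 1):
        draft, score = draft_and_score(attempt)
        if score >= PASS_THRESHOLD:
            return "send", draft
    return "escalate", draft
```

Passing the attempt number in lets the search step widen or rephrase its query on each retry.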

Smart escalation

The agent knows what it doesn't know. When the KB can't answer or quality stays low, it escalates to a human immediately — with category, priority, the draft, and the reason it bailed.
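The hand-off is most useful as a small structured record rather than free text. The following sketches such a package with a frozen dataclass. The field names mirror the list above, while the reason codes and the `to_json` transport are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationPackage:
    """Context handed to a human when the agent bails (hypothetical field names)."""
    ticket_text: str
    category: str
    priority: str
    draft: Optional[str]  # None when the KB had no answer to draft from
    reason: str           # e.g. "no_answer_in_kb" or "low_quality_after_retries"

    def to_json(self) -> str:
        """Serialize for a helpdesk webhook or queue (hypothetical transport)."""
        return json.dumps(asdict(self))


def escalate(state: dict, reason: str) -> EscalationPackage:
    """Build the hand-off package from the current graph state."""
    return EscalationPackage(
        ticket_text=state["ticket"],
        category=state.get("category", "unknown"),
        priority=state.get("priority", "normal"),
        draft=state.get("draft"),
        reason=reason,
    )
```

Including the failed draft lets the human edit it instead of starting from scratch.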

Empathy engine for complaints

Complaints route through an empathy node that sets tone instructions and an acknowledging preamble before the same RAG pipeline runs — so frustrated customers feel heard first, then helped.
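The empathy node only adds instructions to the state, and the shared generation step picks them up. The sketch below shows this pattern. The tone wording, the `tone`/`preamble`/`customer_name` fields and `build_prompt` are all hypothetical. Returning a partial dict mirrors how LangGraph nodes return state updates.

```python
EMPATHY_TONE = (
    "Acknowledge the customer's frustration in one sentence before solving. "
    "Stay calm and specific; do not blame the customer."
)  # hypothetical wording


def empathy_node(state: dict) -> dict:
    """Return a partial state update; the graph then continues to RAG search."""
    name = state.get("customer_name", "there")
    return {
        "tone": EMPATHY_TONE,
        "preamble": f"Hi {name}, I'm sorry this has been frustrating. Let's sort it out.",
    }


def build_prompt(state: dict, context: list[str]) -> str:
    """Assemble the generation prompt; plain questions skip the tone block."""
    parts = []
    if state.get("tone"):
        parts.append(f"Tone instructions: {state['tone']}")
    if state.get("preamble"):
        parts.append(f"Start the reply with: {state['preamble']}")
    parts.append("Answer only from this context:\n" + "\n".join(context))
    parts.append(f"Customer message: {state['ticket']}")
    return "\n\n".join(parts)
```

Because complaints and questions share the same retrieval and generation path, the empathy branch stays a thin layer rather than a second pipeline.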

Human-in-the-loop

Vague messages like “it doesn't work” trigger a clarifying question. The graph interrupts, waits for the customer's reply, and resumes from classification — no guessing at intent.
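In LangGraph, pause-and-resume is handled by a checkpointer plus an interrupt. The control flow can be illustrated with a plain Python generator: it yields the clarifying question, waits for `.send(reply)`, and re-classifies. This is an analogy, not the project's code. `run_ticket` and the `classify` callable are hypothetical.

```python
from typing import Callable, Generator


def run_ticket(
    message: str,
    classify: Callable[[str], str],
) -> Generator[str, str, str]:
    """Yield a clarifying question while the message is unclear; return the category.

    The caller resumes with .send(reply), mimicking an interrupt/resume cycle.
    """
    while classify(message) == "unclear":
        reply = yield "Could you tell me what isn't working and what you expected?"
        message = f"{message}\n{reply}"  # keep earlier context, then re-classify
    return classify(message)
```

Appending the reply to the original message keeps the earlier context available when classification runs again.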

Full observability & cost tracking

Every node is traced in Langfuse — per-step timings, token counts, per-ticket cost. The Streamlit dashboard surfaces automation rate, average latency, and spend live.
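Per-node timing and cost tracking can be expressed as a decorator around each node. This stand-in does not use the Langfuse SDK. It assumes nodes report token usage under a hypothetical `usage` key and prices tokens at the low end of the quoted range. Spans are collected in an in-memory list rather than sent to a tracing backend.

```python
import time
from functools import wraps
from typing import Callable

PRICE_PER_1M = {"input": 0.30, "output": 2.50}  # assumption: low end of the quoted range
TRACE: list[dict] = []


def traced(node_name: str) -> Callable:
    """Record wall time, token counts and cost for each call of a node."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @wraps(fn)
        def wrapper(state: dict) -> dict:
            start = time.perf_counter()
            update = fn(state)
            usage = update.get("usage", {})
            cost = sum(usage.get(k, 0) * PRICE_PER_1M[k] / 1_000_000 for k in ("input", "output"))
            TRACE.append({
                "node": node_name,
                "seconds": time.perf_counter() - start,
                "tokens": usage,
                "cost_usd": cost,
            })
            return update
        return wrapper
    return decorator
```

Summing `cost_usd` over a ticket's spans gives the per-ticket spend that a dashboard could display.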

Under the hood

Tech stack

Python 3.11 LangGraph LlamaIndex Qdrant Gemini 2.5 / 3.0 Flash FastAPI Streamlit Langfuse Pydantic Docker pytest

Design principles

Model split: Gemini 2.5 Flash for classify + quality (cheap); 3.0 Flash for generate + empathy (customer-facing quality).
HTTP boundary: UI and API fully decoupled via REST, so each can be deployed independently.
State machine: 8 nodes with conditional routing, retry cycles, and interrupt/resume for human-in-the-loop.
No synthesis: RAG returns raw source nodes; the LLM generates from that context, not from LlamaIndex summaries.
Fail-safe design: verification errors fail open (proceed); RAG errors fail closed (escalate).
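The fail-safe principle amounts to two wrappers with opposite error policies. The sketch below implements them. The function names are hypothetical, and `check`/`search` stand in for whatever verification and retrieval calls the agent makes. Under this scheme, an empty retrieval result is what routes a ticket to Escalate.

```python
import logging
from typing import Callable

log = logging.getLogger("support_agent")


def verify_fail_open(check: Callable[[str], bool], draft: str) -> bool:
    """Verification errors fail open: if the checker crashes, let the draft proceed."""
    try:
        return check(draft)
    except Exception:
        log.warning("verification failed; proceeding (fail-open)", exc_info=True)
        return True


def retrieve_fail_closed(search: Callable[[str], list[str]], query: str) -> list[str]:
    """RAG errors fail closed: an exception yields no context, which routes to escalate."""
    try:
        return search(query)
    except Exception:
        log.warning("retrieval failed; escalating (fail-closed)", exc_info=True)
        return []
```

The asymmetry reflects the risk involved. A missing context can only produce a guess, so retrieval fails closed. A crashed auxiliary check should not block an otherwise grounded answer, so verification fails open.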

support_agent / request flow

→ Streamlit UI (chat + metrics)

↓ REST API (httpx)

FastAPI → 5 endpoints · /ticket /clarify /trace /metrics /health

↓

LangGraph state machine (8 nodes · 14-field state)

↓

Classify (2.5 Flash) → category · priority · language

↓

RAG: Qdrant top-k 10 → similarity cutoff 0.5 → rerank top-5 → relevance check

↓

Generate (3.0 Flash) → Quality check → pass / retry / escalate

↓

Langfuse tracing → per-node spans · token counts · cost

Drowning in tickets you've already answered?

This agent is built to clear the repetitive ~78% with quality-checked answers and routes the rest to a human with full context — so your team only touches what needs a brain. I scope it, build it end to end, and wire in the observability to prove it works.
