Internal Helpdesk Agent
A RAG-powered internal helpdesk that answers policy questions, routes tickets to the right team, and files bugs to Linear. Every response is scored by an LLM judge before users see it.
Product rationale for account-based workflows
The hardest part of enterprise AI isn't the LLM; it's making output trustworthy enough for employees to rely on. RAG addresses the hallucination problem by grounding answers in retrieved documents. LLM-as-judge addresses the quality visibility problem. Tool use addresses the 'now what?' problem. This agent is a working proof-of-concept for all three.
What I built
A production-grade internal helpdesk agent that answers employee policy questions using retrieval-augmented generation, routes support tickets to the right Slack channel, and logs bug reports directly to Linear. It is built with a full observability stack, so every response is scored by an LLM judge before the user sees it.
Why it matters
Most AI chatbots hallucinate because they rely on model weights alone. This agent is grounded — every response is based on retrieved policy documents, not model memory. The LLM-as-judge layer catches quality issues automatically, before they reach users. It's a working reference for how to build AI agents you can actually trust in a business context.
How it works
- 24 company policy sections indexed into Pinecone as vector embeddings
- Search-first: retrieves the 4 most relevant sections before generating any response
- LLM-as-judge scores each response across relevance (94%), accuracy (93%), completeness (88%), and citation quality (90%)
- Tool calling routes tickets to Slack and logs bug/feature requests to Linear
- Full observability via LangSmith and Arize: every call is traced and scored in real time
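The search-first step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: an in-memory list stands in for the Pinecone index, the embeddings are tiny hand-written vectors, and the names `Section`, `retrieve`, `build_prompt`, and `INDEX` are all hypothetical. Only the top-4 cut-off comes from the description above.

```python
import math
from dataclasses import dataclass
from typing import List

TOP_K = 4  # the write-up states that 4 sections are retrieved per query


@dataclass
class Section:
    section_id: str
    text: str
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: List[float], index: List[Section], k: int = TOP_K) -> List[Section]:
    """Return the k sections most similar to the query, best match first."""
    ranked = sorted(index, key=lambda s: cosine(query_embedding, s.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, sections: List[Section]) -> str:
    """Put the retrieved sections in front of the question so the answer stays grounded."""
    context = "\n\n".join(f"[{s.section_id}] {s.text}" for s in sections)
    return (
        "Answer using only the policy sections below and cite section IDs.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Toy index: placeholder texts and made-up 3-d embeddings, for illustration only.
INDEX = [
    Section("pto", "(placeholder) paid time off policy", [1.0, 0.0, 0.0]),
    Section("sick", "(placeholder) sick leave policy", [0.9, 0.1, 0.0]),
    Section("expenses", "(placeholder) expense policy", [0.0, 1.0, 0.0]),
    Section("travel", "(placeholder) travel policy", [0.1, 0.9, 0.0]),
    Section("security", "(placeholder) security policy", [0.0, 0.0, 1.0]),
    Section("remote", "(placeholder) remote work policy", [0.5, 0.5, 0.0]),
]

top_sections = retrieve([1.0, 0.0, 0.0], INDEX)
print(build_prompt("How much PTO do I get?", top_sections))
```

In a real deployment the sort over an in-memory list would be replaced by a vector-store query, and the query embedding would come from the same embedding model used at indexing time.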
What it demonstrates
- RAG pipeline: retrieves the 4 most relevant policy sections before every response
- LLM-as-judge scoring across relevance, accuracy, completeness, and citation quality
- Tool use: Linear for bug logging, Slack for ticket routing
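The judge gate and tool routing listed above could look something like the sketch below. It makes several assumptions: the 0.8 threshold, the fallback message, and the function names (`passes_judge`, `release`, `dispatch`) are hypothetical, and the Slack and Linear handlers are plain stand-ins that return strings rather than calling either service. The four rubric dimensions are the ones named in this write-up, and the sample scores reuse the figures quoted under "How it works" purely as example input.

```python
from typing import Callable, Dict

# The four rubric dimensions come from the write-up; the threshold is an assumption.
DIMENSIONS = ("relevance", "accuracy", "completeness", "citation_quality")
MIN_SCORE = 0.8  # assumed cut-off, not taken from the original project

FALLBACK = "I couldn't find a reliable answer. Routing this to a human."  # assumed wording


def passes_judge(scores: Dict[str, float], min_score: float = MIN_SCORE) -> bool:
    """True only if every rubric dimension meets the threshold (missing scores fail)."""
    return all(scores.get(d, 0.0) >= min_score for d in DIMENSIONS)


def release(draft: str, scores: Dict[str, float]) -> str:
    """Show the draft to the user only if the judge's scores pass the gate."""
    return draft if passes_judge(scores) else FALLBACK


def route_to_slack(request: str) -> str:
    # Stand-in for a Slack API call; returns a tag so the routing can be inspected.
    return f"slack:{request}"


def log_to_linear(request: str) -> str:
    # Stand-in for creating a Linear issue.
    return f"linear:{request}"


# Tickets go to Slack; bugs and feature requests go to Linear, as described above.
TOOLS: Dict[str, Callable[[str], str]] = {
    "ticket": route_to_slack,
    "bug": log_to_linear,
    "feature": log_to_linear,
}


def dispatch(kind: str, request: str) -> str:
    """Route a classified request to its tool, rejecting unknown kinds."""
    handler = TOOLS.get(kind)
    if handler is None:
        raise ValueError(f"no tool registered for request kind {kind!r}")
    return handler(request)


sample_scores = {
    "relevance": 0.94,
    "accuracy": 0.93,
    "completeness": 0.88,
    "citation_quality": 0.90,
}
print(release("Draft answer citing [pto].", sample_scores))
print(dispatch("bug", "Login page returns 500"))
```

Requiring every dimension to pass, rather than averaging them, means one weak dimension such as poor citations blocks the response on its own; an averaged score would be a reasonable alternative if that proves too strict.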
Stack and tools
What I'd improve next
- User feedback loop to improve retrieval ranking over time
- Multi-tenant support with per-company knowledge bases
- Analytics dashboard for common query categories and escalation rates