
Agentic RAG · LangGraph · FastAPI · Next.js · ChromaDB
Upload an EPUB or PDF and chat with a "sage" grounded in the book. Each turn runs through a bounded agentic LangGraph workflow: intent classification → human-in-the-loop clarification → question decomposition with parallel fan-out → a ReAct retrieval agent → grounded synthesis. Facts in the answer carry citations you can open to verify against the text, and a Reading Map shows which passages your answers actually drew on.
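The turn pipeline can be pictured as a fixed sequence of plain Python functions, with the sub-questions researched in parallel through `concurrent.futures`. This is a minimal sketch under stated assumptions. Every function name (`classify_intent`, `decompose`, `research`, `synthesize`) is hypothetical, the stubs stand in for LLM calls, and the clarification interrupt is left out. The project itself builds this as a LangGraph graph.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """Hypothetical per-turn state handed from node to node."""
    question: str
    intent: str = ""
    sub_questions: list[str] = field(default_factory=list)
    findings: dict[str, list[str]] = field(default_factory=dict)
    answer: str = ""


def classify_intent(state: TurnState) -> TurnState:
    # Stub for an LLM classifier: treat " and " as the mark of a compound question.
    state.intent = "compound" if " and " in state.question else "simple"
    return state


def decompose(state: TurnState) -> TurnState:
    # Decompose once: split compound questions, pass simple ones through unchanged.
    if state.intent == "compound":
        state.sub_questions = [q.strip() for q in state.question.split(" and ")]
    else:
        state.sub_questions = [state.question]
    return state


def research(sub_question: str) -> list[str]:
    # Stub for the retrieval agent: returns fake chunk ids for one sub-question.
    return [f"chunk-for:{sub_question}"]


def fan_out(state: TurnState) -> TurnState:
    # Research every sub-question in parallel; Executor.map preserves input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        state.findings = dict(
            zip(state.sub_questions, pool.map(research, state.sub_questions))
        )
    return state


def synthesize(state: TurnState) -> TurnState:
    # Synthesize once over all findings (stub for the grounded prompt).
    chunks = [c for found in state.findings.values() for c in found]
    state.answer = f"answer grounded in {len(chunks)} chunk(s)"
    return state


PIPELINE = [classify_intent, decompose, fan_out, synthesize]


def run_turn(question: str) -> TurnState:
    state = TurnState(question=question)
    for node in PIPELINE:  # fixed order: every turn runs the same nodes
        state = node(state)
    return state


if __name__ == "__main__":
    result = run_turn("Who is Ahab and why does he hunt the whale?")
    print(result.sub_questions)
    print(result.answer)
```

Keeping the nodes in a fixed list is what makes per-turn cost predictable: the only variable part is how many sub-questions the decomposer produces.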
The project contrasts architectures on purpose. The chat turn is a fixed graph: each node has one job, so cost is predictable and each step is debuggable. Retrieval inside it is a bounded ReAct agent that picks its own tools (semantic search with HyDE + multi-query, SQLite FTS5 keyword search, structural chapter lookup, neighbour expansion) and loops until it has enough evidence, capped at five iterations. Compound questions use plan-and-execute fan-out: decompose once, research each sub-question in parallel, synthesize once. The agent decides routing; generation always runs the same grounded prompt.
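The bounded ReAct loop reduces to a small pattern: a policy chooses the next tool (or decides to finish), the tool's results are added to the evidence, and a hard cap stops the loop at five iterations no matter what the policy wants. In this sketch the two tools and the rule-based `choose_next` policy are hypothetical stand-ins for the project's LLM-driven tool choice; chapter lookup and neighbour expansion are omitted.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ITERATIONS = 5  # the cap stated in the README


@dataclass
class Step:
    tool: str   # name of the chosen tool, or "finish"
    query: str


def semantic_search(query: str) -> list[str]:
    return [f"sem:{query}"]  # stub for embedding search


def keyword_search(query: str) -> list[str]:
    return [f"fts:{query}"]  # stub for a full-text keyword search


TOOLS: dict[str, Callable[[str], list[str]]] = {
    "semantic_search": semantic_search,
    "keyword_search": keyword_search,
}


def choose_next(question: str, evidence: list[str]) -> Step:
    # Stub policy standing in for the LLM: semantic first, then keyword, then stop.
    if not evidence:
        return Step("semantic_search", question)
    if len(evidence) < 2:
        return Step("keyword_search", question)
    return Step("finish", "")


def react_retrieve(
    question: str,
    policy: Callable[[str, list[str]], Step] = choose_next,
    max_iterations: int = MAX_ITERATIONS,
) -> list[str]:
    evidence: list[str] = []
    for _ in range(max_iterations):  # hard bound, even if the policy never finishes
        step = policy(question, evidence)  # reason: pick the next action
        if step.tool == "finish":
            break
        evidence.extend(TOOLS[step.tool](step.query))  # act, then observe
    return evidence
```

With the default policy, `react_retrieve("whale")` stops on its own after two tool calls; a policy that never says "finish" is still cut off after five.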
Grounding: a batched LLM mapper routes each fact to its supporting chunk(s), backed by a deterministic quote-guard; facts that no chunk supports get no citation. Reading progress counts what answers cited, not what retrieval merely fetched.
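One way a deterministic quote-guard could work: the mapper proposes (fact, chunk, quote) triples, and a citation survives only if its quote literally appears in the cited chunk after whitespace and case normalization. Reading progress is then computed from the surviving citations alone. The triple shape, the normalization rule, and all names here are assumptions for illustration, not the project's code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """One mapping proposed by the (hypothetical) LLM mapper."""
    fact_id: int
    chunk_id: str
    quote: str


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so line wraps in the book don't cause misses.
    return re.sub(r"\s+", " ", text).strip().lower()


def guard_citations(
    proposals: list[Proposal], chunks: dict[str, str]
) -> dict[int, list[str]]:
    """Keep a citation only if its quote occurs verbatim in the cited chunk."""
    cited: dict[int, list[str]] = {}
    for p in proposals:
        chunk_text = chunks.get(p.chunk_id)
        if chunk_text is None or not p.quote.strip():
            continue  # unknown chunk or empty quote: reject
        if normalize(p.quote) in normalize(chunk_text):
            cited.setdefault(p.fact_id, []).append(p.chunk_id)
    return cited  # facts absent from this dict get no citation


def cited_chunks(citations: dict[int, list[str]]) -> set[str]:
    # Reading progress counts only chunks that an answer actually cited.
    return {cid for ids in citations.values() for cid in ids}
```

Because the check is a plain substring test, a mapper that paraphrases instead of quoting loses the citation rather than producing a misleading one.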
Human-in-the-loop: ambiguous questions pause the turn via LangGraph's interrupt() and resume the same thread when the user replies. A pending clarification has a 30-minute TTL; once it expires, the turn degrades to a broad answer instead of breaking.
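The TTL fallback can be sketched without LangGraph. Store each pending clarification with a timestamp, keyed by session. When the user replies, either resume with their answer or, if the pause is older than 30 minutes, fall back to a broad answer to the original question. In the project itself the pause and resume go through interrupt() on the checkpointed thread. The `PendingStore` class, its methods, and the injectable clock are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

CLARIFY_TTL_SECONDS = 30 * 60  # the 30-minute TTL from the README


@dataclass
class Pending:
    question: str
    options: list[str]
    created_at: float


@dataclass
class PendingStore:
    """Hypothetical stand-in for an interrupted thread awaiting a reply."""
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    _pending: dict[str, Pending] = field(default_factory=dict, init=False)

    def pause(self, session_id: str, question: str, options: list[str]) -> None:
        # Record that this thread is waiting for the user to disambiguate.
        self._pending[session_id] = Pending(question, options, self.clock())

    def resume(self, session_id: str, reply: str) -> str:
        pending = self._pending.pop(session_id, None)
        if pending is None:
            return f"new turn: {reply}"  # nothing was waiting: treat as a fresh turn
        if self.clock() - pending.created_at > CLARIFY_TTL_SECONDS:
            # Expired: degrade to a broad answer rather than raising an error.
            return f"broad answer: {pending.question}"
        return f"answer: {pending.question} [{reply}]"
```

Injecting the clock lets a test move time forward instead of sleeping, which fits a suite that runs offline.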
State & testing: a SqliteSaver checkpoints each turn (thread_id = session_id), bounded per thread; a 535-test suite runs with no network or API key.
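"Bounded per thread" can be illustrated with a keep-last-N pruning rule on an in-memory SQLite table, which also fits a test suite that needs no network. This sketch is not SqliteSaver's schema or API. The `checkpoints` table, the `save_checkpoint` helper, and the bound of 3 are all hypothetical.

```python
import sqlite3

KEEP_PER_THREAD = 3  # hypothetical bound on checkpoints kept per thread


def open_store() -> sqlite3.Connection:
    # Toy table keyed by thread_id (= session_id) and a per-thread sequence number.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checkpoints ("
        " thread_id TEXT NOT NULL,"
        " seq INTEGER NOT NULL,"
        " payload BLOB,"
        " PRIMARY KEY (thread_id, seq))"
    )
    return conn


def save_checkpoint(
    conn: sqlite3.Connection, thread_id: str, payload: bytes, keep: int = KEEP_PER_THREAD
) -> None:
    with conn:  # one transaction: insert the new checkpoint, then prune
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) + 1 FROM checkpoints WHERE thread_id = ?",
            (thread_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO checkpoints (thread_id, seq, payload) VALUES (?, ?, ?)",
            (thread_id, next_seq, payload),
        )
        # seq is contiguous per thread, so this keeps exactly the newest `keep` rows
        # of this thread and never touches other threads.
        conn.execute(
            "DELETE FROM checkpoints WHERE thread_id = ? AND seq <= ?",
            (thread_id, next_seq - keep),
        )
```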