
Document Intelligence Agent

Multi-modal document Q&A, structured data extraction from specification PDFs, and on-demand technical image generation, all routed through a single decision graph.

Enterprise project
LangGraph · Multi-modal · Structured Output
At a glance
Route → Select → Analyze → Answer

Why it matters

Engineering teams have thousands of pages of specifications, datasheets, drawings, and flow diagrams, and almost no time to read them during an incident. The document agent answers natural-language questions over the whole corpus, but it also does something most document assistants don't: it extracts structured performance data from specification PDFs into a shape downstream analysis can actually consume, and generates technical imagery on request.

Capabilities

  • Intelligent routing between three paths (document analysis, direct conversation, and image generation), chosen by a combination of keyword signals and LLM classification.
  • Priority-based document selection from multiple categories (design docs rank above datasheets, datasheets above CAD), so the most authoritative source wins when answers disagree.
  • Structured extraction mode triggered by upstream agents, producing typed performance-curve data instead of free text.
  • Image generation with automatic prompt refinement on failure, including an image-to-image mode for variations.
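
The first two capabilities can be sketched in a few lines of plain Python. This is an illustrative sketch, not the project's code: the keyword lists, the `classify` callback standing in for the LLM classifier, the `Candidate` fields, and the rule that only an unambiguous keyword hit bypasses the classifier are all assumptions. Only the three routes and the category order (design docs, then datasheets, then CAD) come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Route = Literal["document", "conversation", "image"]

# Hypothetical keyword signals; the real lists are not given in the writeup.
KEYWORD_SIGNALS: dict[Route, tuple[str, ...]] = {
    "image": ("draw", "generate an image", "sketch", "render"),
    "document": ("datasheet", "specification", "according to"),
}

# Lower number = more authoritative, as stated in the capability list.
PRIORITY = {"design_doc": 0, "datasheet": 1, "cad": 2}


@dataclass(frozen=True)
class Candidate:
    name: str
    category: str
    relevance: float  # assumed retrieval score, higher is better


def route(query: str, classify: Callable[[str], Route]) -> Route:
    """Use keyword signals when exactly one route matches, else ask the classifier."""
    text = query.lower()
    hits = [
        name
        for name, words in KEYWORD_SIGNALS.items()
        if any(word in text for word in words)
    ]
    if len(hits) == 1:
        return hits[0]
    # No signal, or conflicting signals: defer to the (assumed) LLM classifier.
    return classify(query)


def select(candidates: list[Candidate]) -> Candidate:
    """Pick the most authoritative category first, then the highest relevance."""
    return min(
        candidates,
        key=lambda c: (PRIORITY.get(c.category, len(PRIORITY)), -c.relevance),
    )
```

With this ordering, a design document with a modest retrieval score still wins over a highly relevant CAD file, which matches the "most authoritative source wins" rule. Unknown categories sort last rather than raising.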

What makes it hold up

Retrieval was never the bottleneck; ranking was. Most of the design effort went into deciding which document matters when several could plausibly answer the question, and into making the extraction output typed and validated rather than a blob the next agent has to re-parse. The graph is intentionally small; the hard work sits inside the routing prompt and the schema.
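
To make "typed and validated" concrete, here is a minimal sketch of what such an extraction schema might look like, using only the standard library. The field names (`source_document`, `x_unit`, `y_unit`, `points`) and the validation rules are assumptions chosen for illustration; the actual schema is not published, and a real build might use a validation library instead.

```python
import json
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CurvePoint:
    x: float
    y: float


@dataclass(frozen=True)
class PerformanceCurve:
    # All field names here are hypothetical.
    source_document: str
    x_unit: str
    y_unit: str
    points: tuple[CurvePoint, ...]

    def __post_init__(self) -> None:
        # Reject output that a downstream agent could not safely consume.
        if not self.points:
            raise ValueError("curve has no points")
        if any(not math.isfinite(v) for p in self.points for v in (p.x, p.y)):
            raise ValueError("curve contains non-finite values")
        xs = [p.x for p in self.points]
        if any(b <= a for a, b in zip(xs, xs[1:])):
            raise ValueError("x values must be strictly increasing")


def parse_curve(raw: str) -> PerformanceCurve:
    """Turn the model's JSON output into a validated, typed curve."""
    data = json.loads(raw)
    points = tuple(CurvePoint(float(p["x"]), float(p["y"])) for p in data["points"])
    return PerformanceCurve(
        source_document=str(data["source_document"]),
        x_unit=str(data["x_unit"]),
        y_unit=str(data["y_unit"]),
        points=points,
    )
```

The point of the design is that a malformed extraction fails loudly at the boundary (`ValueError`, `KeyError`, or a JSON decode error) instead of reaching the next agent as free text that has to be re-parsed.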

Enterprise project. Official writeup and demo link will be added once online.

© 2026 Dr. Bin Liu