A brownfield extension of `3gpp-ai` focused on three things: high-fidelity extraction of complex 3GPP PDFs into deterministic structured artifacts, summarize workflows that build on those artifacts, and future-compatible metadata for wiki/retrieval systems.
## Core Value
Extract technically accurate, traceable document structure and meaning from complex PDFs into deterministic Markdown and canonical JSON artifacts.
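As an illustration of what "deterministic" and "canonical" can mean for a JSON artifact, here is a minimal sketch using only the Python standard library. The function names and the sample fields (`title`, `page`, `section`) are hypothetical and not taken from the project; the actual artifact schema and serializer may differ.

```python
import hashlib
import json


def canonical_json(artifact: dict) -> str:
    """Serialize an artifact so that equal content always yields identical text."""
    return json.dumps(
        artifact,
        sort_keys=True,         # stable key order regardless of insertion order
        ensure_ascii=False,     # keep symbols and non-ASCII text readable
        separators=(",", ":"),  # no incidental whitespace
    )


def artifact_digest(artifact: dict) -> str:
    """SHA-256 of the canonical form, usable as a regression fingerprint."""
    return hashlib.sha256(canonical_json(artifact).encode("utf-8")).hexdigest()


# Hypothetical sample records: same content, different key order.
a = {"title": "Scope", "page": 7, "section": "4.1"}
b = {"section": "4.1", "page": 7, "title": "Scope"}

assert canonical_json(a) == canonical_json(b)
assert artifact_digest(a) == artifact_digest(b)
```

Comparing digests of canonical output between runs is one simple way to detect unintended changes in extraction results.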
## Current State
- Latest shipped milestone: **v1.0 Advanced PDF Extraction Pipeline**
- Last activity: 2026-04-18 - v1.0 milestone archived and roadmap reset for next cycle
## Project Reference
See: `.planning/PROJECT.md` (updated 2026-04-17)
**Core value:** Extract technically accurate, traceable document structure and meaning from complex PDFs into deterministic Markdown and JSON artifacts.
**Current focus:** Milestone v1.0 Advanced PDF Extraction Pipeline
## Accumulated Context
- Current extractor stack: Docling + optional VLM + artifact persistence under `.ai`
- Milestone scope excludes embedding/RAG changes
- Summarize command must consume and benefit from structured extraction artifacts
- Future direction includes an LLM-wiki approach; extraction metadata should support that model
### Roadmap Evolution
- Phase 6 added: In preparation for the next milestone (the RAG/information retrieval system), remove all existing/following modules that perform embedding.
- Phase 6 planned: 06-01-PLAN.md and 06-02-PLAN.md added for embedding module decommission and clean baseline handoff.
- Phase 4 executed: additive table/figure/equation fidelity contracts implemented with deterministic provenance normalization and regression coverage.
- Phase 5 executed: summarize now uses structured-first prompt context with deterministic fallback and CLI output-mode compatibility for wiki-ready rendering.
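
The "structured-first prompt context with deterministic fallback" pattern from Phase 5 can be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: `build_prompt_context`, the `sections` field, and its `title`/`text` keys are all hypothetical names.

```python
import json
from pathlib import Path


def build_prompt_context(artifact_path: Path, raw_text: str) -> tuple[str, str]:
    """Return (mode, context): structured if a usable artifact exists, else raw text."""
    try:
        artifact = json.loads(artifact_path.read_text(encoding="utf-8"))
        sections = artifact["sections"]  # hypothetical artifact field
        parts = [f"## {s['title']}\n{s['text']}" for s in sections]
        if parts:
            return "structured", "\n\n".join(parts)
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, unreadable, or malformed artifact: fall through
    # Deterministic fallback: the same raw text always yields the same context.
    return "fallback", raw_text
```

Returning the mode alongside the context lets the caller log or test which path was taken, which keeps the fallback behavior observable rather than silent.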
## Deferred Items
Items acknowledged and deferred at v1.0 close on 2026-04-18:
| Category | Item | Status |
|----------|------|--------|
| process | milestone audit artifact (`v1.0-MILESTONE-AUDIT.md`) | missing at close |