Commit 4f291b69 authored by Jan Reimes

docs: update documentation to reflect removal of 3gpp-ai package

* Deprecate AI Document Processing section in ai.md
* Update configuration references from TDocCrawlerConfig to ThreeGPPConfig
* Revise usage instructions in convert-lo-usage.md regarding AI extraction
* Modify query.md to indicate removal of AI RAG query functionality
parent 485c901f
+9 −11
@@ -35,7 +35,6 @@ Notes:
|-----------|--------------|
| Core | Python 3.14, typer, rich, pydantic, pydantic-sqlite, requests, hishel |
| Specs Crawling | beautifulsoup4, lxml, xlsxwriter, zipinspect |
-| AI Module | 3gpp-ai (LanceDB, sentence-transformers, Docling, litellm) |
| Conversion | convert-lo (LibreOffice headless) |
| Database | SQLite via pydantic-sqlite |

@@ -49,7 +48,6 @@ Notes:
| `src/tdoc_crawler/specs/` | Specification operations |
| `src/tdoc_crawler/meetings/` | Meeting data handling |
| `src/tdoc_crawler/parsers/` | Parsing logic (Excel, HTML, etc.) |
-| `packages/3gpp-ai/` | AI embeddings, graphs, search |
| `packages/convert-lo/` | LibreOffice document conversion |
| `packages/pool-executors/` | Serial/parallel executor utilities |
| `tests/` | Test suite (see tests/AGENTS.md) |
@@ -60,9 +58,9 @@ Notes:
|-----|-----------|--------------|
| CLI command | `src/tdoc_crawler/cli/tdoc_app.py` | Typer app, Rich console |
| Pydantic model | `src/tdoc_crawler/models/` | Data validation, serialization |
-| HTTP caching | `src/tdoc_crawler/http_client.py` | `create_cached_session()` |
-| Path management | `src/tdoc_crawler/config/__init__.py` | `CacheManager`, `resolve_cache_manager()` |
-| Configuration | `src/tdoc_crawler/config/settings.py` | `TDocCrawlerConfig`, pydantic-settings |
+| HTTP caching | `src/tdoc_crawler/http_client/` | `create_cached_session()` |
+| Path management | `src/tdoc_crawler/config/` | `CacheManager`, `resolve_cache_manager()` |
+| Configuration | `src/tdoc_crawler/config/settings.py` | `ThreeGPPConfig`, pydantic-settings |
| Test structure | `tests/test_crawler.py` | Fixtures, mocking, isolation |

## Heuristics (quick decisions)
@@ -114,25 +112,25 @@ Notes:
| TSG | Technical Specification Group (SA, RAN, CT) |
| Portal | 3GPP EOL authenticated portal |

-## Configuration System (NEW in v1.0)
+## Configuration System

**Two complementary systems:**

-1. **`TDocCrawlerConfig`** (pydantic-settings) — Type-safe configuration from files/env vars
+1. **`ThreeGPPConfig`** (pydantic-settings, alias `TDocCrawlerConfig`) — Type-safe configuration from files/env vars
2. **`CacheManager`** (runtime paths) — File system path resolution
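
The alias relationship between `ThreeGPPConfig` and `TDocCrawlerConfig` can be pictured as a plain module-level rebinding. The following is a minimal sketch under that assumption: the real definition in `settings.py` is not shown in this diff, and the dataclasses below (`PathConfig`, the default `cache_dir` value) are hypothetical stand-ins for the pydantic-settings models. Only the `path.cache_dir` attribute appears in the source.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class PathConfig:
    # Hypothetical nested section; the default value is illustrative only.
    cache_dir: Path = Path(".cache")


@dataclass
class ThreeGPPConfig:
    # Stand-in for the real pydantic-settings class (assumption).
    path: PathConfig = field(default_factory=PathConfig)


# Backward-compatible alias: both names refer to the same class object,
# so existing code that uses the old name keeps working.
TDocCrawlerConfig = ThreeGPPConfig

config = TDocCrawlerConfig()
print(type(config).__name__)  # the class keeps its new name
```

If the alias works this way, `isinstance` checks and type hints written against either name would behave identically, because no subclass is involved.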

-### TDocCrawlerConfig (Settings)
+### ThreeGPPConfig (Settings)

Use for **all configurable behavior** (timeouts, credentials, limits, etc.):

```python
-from tdoc_crawler.config import TDocCrawlerConfig
+from tdoc_crawler.config import ThreeGPPConfig

# Load with automatic discovery (3gpp-crawler.toml, env vars)
-config = TDocCrawlerConfig.from_settings()
+config = ThreeGPPConfig.from_settings()

# Or with explicit config file
-config = TDocCrawlerConfig.from_settings(config_file=Path("./my-config.toml"))
+config = ThreeGPPConfig.from_settings(config_file=Path("./my-config.toml"))

# Access nested config
config.path.cache_dir      # Path to cache directory
+4 −12
@@ -15,7 +15,6 @@ A command-line tool for crawling the 3GPP FTP server, caching 3GPP document meta
- **Case-Insensitive Queries**: Search for TDocs regardless of case
- **Multiple Output Formats**: Export results as table, JSON, or YAML
- **Incremental Updates**: Only fetch new data on subsequent crawls
-- **AI Document Processing** - Semantic search, knowledge graphs, and AI-powered summarization (optional, install with `3gpp-crawler[ai]`)
- **Rich CLI**: Beautiful terminal output with progress indicators

## Installation
@@ -30,16 +29,9 @@ uvx tdoc-crawler --help
### Using uv

```bash
-# Install from PyPI (publication pending)
-uv add 3gpp-crawler
-
-# Install with AI features (optional)
-uv add 3gpp-crawler[ai]
-
-# AI features are provided by the optional `3gpp-ai` extension package
-# and installed automatically via the extra above.
-
-# Or install from source
+# Install from source
+cd 3gpp-crawler
+uv sync
```

### Using pip (not recommended)
@@ -119,7 +111,7 @@ Gather metadata from 3GPP and WhatTheSpec:
tdoc-crawler crawl-meetings

# Crawl TDoc metadata (RAN, SA, CT)
-tdoc-crawler crawl
+tdoc-crawler crawl-tdocs

# Populate spec catalog
spec-crawler crawl-specs
+2 −0
# AI Document Processing

+> **⚠️ Deprecated:** The `3gpp-ai` package has been removed from this repository. This documentation is kept for historical reference only. AI features (semantic search, knowledge graphs, summarization) are no longer available in the current codebase.
+
The AI module provides intelligent document processing capabilities for 3GPP document data, including semantic search, knowledge graph construction, and AI-powered summarization.

**Key Features:**
+1 −1
@@ -6,7 +6,7 @@ This guide covers the configuration system for 3gpp-crawler, including config fi

The 3gpp-crawler uses a composable configuration system with two complementary parts:

-1. **TDocCrawlerConfig** (pydantic-settings) — Type-safe configuration from files/env vars
+1. **ThreeGPPConfig** (pydantic-settings) — Type-safe configuration from files/env vars (also available as `TDocCrawlerConfig` alias)
1. **CacheManager** (runtime paths) — File system path resolution

Configuration can be provided via:
+1 −12
@@ -142,15 +142,4 @@ converter.convert(

## Relationship to AI Conversion Artifacts

-`convert-lo` handles format conversion only. Structured AI extraction artifacts are produced by the AI pipeline commands:
-
-```bash
-3gpp-ai convert <tdoc_id> --output <file>.md
-3gpp-ai workspace process --workspace <workspace_name>
-```
-
-When structured extraction is enabled, these AI commands may emit sidecars next to markdown output:
-
-- `*_tables.json`
-- `*_figures.json`
-- `*_equations.json`
+`convert-lo` handles format conversion only. Structured AI extraction artifacts were previously produced by the `3gpp-ai` package (now removed from this repository).