feat(client, http_client, models, parser, server): implement TEDDI search client (7eddd618) · Commits · Jan Reimes / 3gpp-crawler

.gitignore

+1 −0

		@@ -246,3 +246,4 @@ skills-lock.json
		.dolt/
		*.db
		.beads/dolt-.
		src/teddi-mcp/uv.lock

README.md

+6 −0

		@@ -34,6 +34,9 @@ uv add tdoc-crawler
		# Install with AI features (optional)
		uv add tdoc-crawler[ai]

		# Install with TEDDI MCP features (optional)
		uv add tdoc-crawler[teddi]

		# AI features are provided by the optional `tdoc-ai` extension package
		# and installed automatically via the extra above.

		@@ -44,6 +47,9 @@ uv sync

		# Enable optional AI extension in source checkout
		uv sync --extra ai

		# Enable optional TEDDI MCP extension in source checkout
		uv sync --extra teddi
		```

		### Using pip (not recommended)

pyproject.toml

+4 −0

		@@ -40,6 +40,9 @@ dependencies = [
		ai = [
		"tdoc-ai>=0.0.0",
		]
		teddi = [
		"teddi-mcp>=0.1.0",
		]

		[project.urls]
		Repository = "https://forge.3gpp.org/rep/reimes/tdoc-crawler"
		@@ -114,3 +117,4 @@ style = "semver"
		[tool.uv.sources]
		specify-cli = { git = "https://github.com/github/spec-kit.git" }
		tdoc-ai = { path = "src/tdoc-ai", editable = true }
		teddi-mcp = { path = "src/teddi-mcp", editable = true }

src/teddi-mcp/AGENTS.md

0 → 100644

+146 −0

		# TEDDI-MCP: FastMCP 3.0 Server for ETSI TEDDI Search

		## Quick Start

		FastMCP 3.0 server wrapping ETSI's TEDDI (TErms and Definitions Database Interactive) for AI agent integration.

		Key Files:
		- `models.py` — Pydantic models (SearchIn, SearchPattern, TechnicalBody, TermResult, DocumentRef)
		- `client.py` — TeddiClient with Protocol-based abstraction
		- `server.py` — FastMCP 3.0 server (stdio transport)
		- `parser.py` — HTML parsing with TB grouping
		- `cli.py` — Typer CLI (non-MCP interface)

		```bash
		uv pip install -e .
		teddi-mcp search-term --term "QoS" --search-pattern exactmatch
		teddi-mcp serve # Start MCP server
		```

		## Key Design Patterns

		### Protocol-Based Abstraction

		`TeddiSource` protocol enables mocking and swapping TEDDI sources:

		```python
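		from __future__ import annotations  # SearchRequest, SearchResponse, TechnicalBody are package models
		from typing import Protocol

		class TeddiSource(Protocol):
		    async def search_terms(self, request: SearchRequest) -> SearchResponse: ...
		    async def get_available_technical_bodies(self) -> list[TechnicalBody]: ...
		    async def fetch_document(self, url: str) -> bytes: ...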
		```
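
As an illustration of why the protocol helps with testing, here is a minimal sketch of an in-memory fake. It assumes simplified, hypothetical `SearchRequest`/`SearchResponse` shapes (the real models likely carry more fields) and trims the protocol to `search_terms()`; `FakeTeddiSource` and `lookup()` are illustrative names, not part of the package.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol


# Hypothetical, simplified request/response shapes (the real models may differ).
@dataclass
class SearchRequest:
    term: str


@dataclass
class SearchResponse:
    terms: list[str] = field(default_factory=list)


class TeddiSource(Protocol):
    async def search_terms(self, request: SearchRequest) -> SearchResponse: ...


class FakeTeddiSource:
    """In-memory stand-in that satisfies TeddiSource structurally (no subclassing)."""

    def __init__(self, known_terms: list[str]) -> None:
        self.known_terms = known_terms
        self.calls: list[SearchRequest] = []  # recorded so tests can inspect requests

    async def search_terms(self, request: SearchRequest) -> SearchResponse:
        self.calls.append(request)
        # Case-insensitive exact match instead of a real HTTP + HTML round trip.
        hits = [t for t in self.known_terms if t.lower() == request.term.lower()]
        return SearchResponse(terms=hits)


async def lookup(source: TeddiSource, term: str) -> list[str]:
    # Written against the protocol, so any conforming source can be passed in.
    response = await source.search_terms(SearchRequest(term=term))
    return response.terms


fake = FakeTeddiSource(["QoS", "QCI"])
print(asyncio.run(lookup(fake, "qos")))  # ['QoS']
```

Because `Protocol` uses structural typing, `FakeTeddiSource` never inherits from `TeddiSource`; a type checker accepts it wherever the protocol is expected as long as the method signatures match.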

		### HTTP Caching

		All requests use `create_cached_teddi_session()`:
		- SQLite backend: `.cache/teddi_http.sqlite3`
		- TTL: 2 hours (refresh on access)
		- Auto-retries: 429, 500, 502, 503, 504
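
The rules above are configured through hishel inside `create_cached_teddi_session()`. The sketch below does not use hishel at all; it is a stdlib-only illustration, under that assumption, of what a 2-hour TTL with refresh-on-access and the listed retryable status codes mean. `SlidingTTLCache` and `should_retry()` are hypothetical names.

```python
from __future__ import annotations

import time
from typing import Callable

RETRY_STATUSES = frozenset({429, 500, 502, 503, 504})
TTL_SECONDS = 2 * 60 * 60  # 2 hours


def should_retry(status_code: int) -> bool:
    """True for the status codes listed as auto-retried."""
    return status_code in RETRY_STATUSES


class SlidingTTLCache:
    """Toy cache: every successful read pushes the entry's expiry forward."""

    def __init__(self, ttl: float = TTL_SECONDS, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._store: dict[str, tuple[bytes, float]] = {}

    def set(self, key: str, value: bytes) -> None:
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> bytes | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        now = self.clock()
        if now >= expires_at:
            del self._store[key]  # expired: drop the entry
            return None
        self._store[key] = (value, now + self.ttl)  # refresh on access
        return value
```

With refresh-on-access, an entry that keeps being read never expires; only entries left untouched for a full TTL are evicted.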

		### TB Grouping in Parser

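		TEDDI results contain nested sub-tables in which the technical body (TB) cell is left empty on continuation rows. The parser lets such rows inherit the preceding TB: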

		```
		HTML: [TB="3GPP", Doc=""], [TB="", Doc="TS 24.008"]
		→ Parsed: DocumentRef(tb="3GPP", spec="TS 24.008", url=...)
		```
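
A minimal sketch of the inheritance rule, assuming the HTML has already been reduced to `(technical_body, specification, url)` cell tuples (the BeautifulSoup extraction step is omitted). Treating a row that has a TB but no document as a group header is an assumption drawn from the example above; `group_rows()` is a hypothetical helper, not the parser's actual function.

```python
from dataclasses import dataclass


@dataclass
class DocumentRef:
    technical_body: str
    specification: str
    url: str


def group_rows(rows: list[tuple[str, str, str]]) -> list[DocumentRef]:
    """Convert (tb, spec, url) cell tuples into DocumentRefs.

    An empty TB cell inherits the most recent non-empty TB; rows without a
    specification (TB-only header rows) produce no DocumentRef.
    """
    refs: list[DocumentRef] = []
    current_tb = ""
    for tb, spec, url in rows:
        tb, spec = tb.strip(), spec.strip()
        if tb:
            current_tb = tb  # a new TB group starts here
        if spec:
            refs.append(DocumentRef(current_tb, spec, url))
    return refs


# Mirrors the example above; the URL is a placeholder.
rows = [("3GPP", "", ""), ("", "TS 24.008", "https://example.invalid/24008")]
refs = group_rows(rows)  # one DocumentRef with technical_body='3GPP'
```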

		### Dual API: MCP + CLI

		The MCP server and the CLI expose the same search tools (`fetch_document()` is MCP-only):
		- `search_term()` — query with filters
		- `list_technical_bodies()` — list TBs
		- `fetch_document()` — retrieve content (MCP-only)

		## Data Models

		### Enums

		```python
		from enum import Enum

		class SearchIn(str, Enum):
		    ABBREVIATIONS = "abbreviations"
		    DEFINITIONS = "definitions"
		    BOTH = "both"

		class SearchPattern(str, Enum):
		    ALL_OCCURRENCES = "alloccurrences"
		    EXACT_MATCH = "exactmatch"
		    STARTING_WITH = "startingwith"
		    ENDING_WITH = "endingwith"

		class TechnicalBody(str, Enum):
		    ALL = "all"
		    THREE_GPP = "3gpp"
		    ETSI = "etsi"
		    IETF = "ietf"
		    # ... more as discovered
		```

		### Data Classes

		```python
		from dataclasses import dataclass

		@dataclass
		class DocumentRef:
		    technical_body: str
		    specification: str
		    url: str

		@dataclass
		class TermResult:
		    term: str
		    description: str
		    documents: list[DocumentRef]
		```
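
As listed here the result types are plain dataclasses, so a nested result can be serialized with `dataclasses.asdict()`, for example to produce JSON output. This is a sketch under that assumption; if the real models are Pydantic v2 classes, `model_dump_json()` would be the natural equivalent. `to_json()` and the sample values are illustrative.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DocumentRef:
    technical_body: str
    specification: str
    url: str


@dataclass
class TermResult:
    term: str
    description: str
    documents: list[DocumentRef]


def to_json(results: list[TermResult]) -> str:
    # asdict() recurses into nested dataclasses and lists of dataclasses.
    return json.dumps([asdict(r) for r in results], indent=2)


result = TermResult(
    term="QoS",
    description="Quality of Service",
    documents=[DocumentRef("3GPP", "TS 24.008", "https://example.invalid/24008")],
)
print(to_json([result]))
```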

		## TEDDI Endpoint

		POST `https://webapp.etsi.org/Teddi/search`

		Parameters:
		- `term` — search string
		- `searchin` — "abbreviations" \| "definitions" \| "both"
		- `searchpattern` — "alloccurrences" \| "exactmatch" \| "startingwith" \| "endingwith"
		- `technicaldiebody` — "all" or comma-separated TBs

		Response: HTML table (parsed by `parser.py`)
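
The sketch below shows one way to map the typed enums onto those form fields. Field names are copied verbatim from the parameter list above; `build_form()` is a hypothetical helper, and the resulting dict could be passed as form data to an HTTP client such as httpx (`client.post(url, data=form)`).

```python
from __future__ import annotations

from enum import Enum
from urllib.parse import urlencode


class SearchIn(str, Enum):
    ABBREVIATIONS = "abbreviations"
    DEFINITIONS = "definitions"
    BOTH = "both"


class SearchPattern(str, Enum):
    ALL_OCCURRENCES = "alloccurrences"
    EXACT_MATCH = "exactmatch"
    STARTING_WITH = "startingwith"
    ENDING_WITH = "endingwith"


def build_form(
    term: str,
    search_in: SearchIn,
    pattern: SearchPattern,
    bodies: list[str] | None = None,
) -> dict[str, str]:
    """Map typed options onto the POST form fields (hypothetical helper)."""
    return {
        "term": term,
        "searchin": search_in.value,
        "searchpattern": pattern.value,
        # "all" when no filter is given, otherwise a comma-separated list
        "technicaldiebody": ",".join(bodies) if bodies else "all",
    }


form = build_form("QoS", SearchIn.BOTH, SearchPattern.EXACT_MATCH, ["3gpp", "etsi"])
print(urlencode(form))  # URL-encoded body, as sent with a form POST
```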

		## Testing

		```bash
		uv run pytest tests/teddi_mcp/ -v
		uv run pytest tests/teddi_mcp/ --cov=teddi_mcp --cov-report=term-missing
		```

		Test structure:
		- `test_models.py` — Enum/dataclass validation
		- `test_parser.py` — HTML parsing with fixtures
		- `test_client.py` — TeddiClient (mocked HTTP)
		- `test_http_client.py` — Session creation & caching
		- `test_cli.py` — CLI argument parsing
		- `test_server.py` — FastMCP tool invocation

		## Implementation Notes

		1. TEDDI HTML-Driven: The endpoint was reverse-engineered from the TEDDI web interface, so the parser may need updates if the TEDDI HTML changes.

		2. TB Grouping Logic: An empty TB cell inherits the value of the preceding non-empty TB cell.

		3. Async-First: All core methods are async. Use `asyncio.run()` to build synchronous wrappers.

		4. Cache Validation: Tests cache HTTP responses automatically in `tests/.cache/teddi_http.sqlite3`.
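
Note 3 can be sketched as follows: an async method plus a blocking wrapper built on `asyncio.run()`. `TeddiClientSketch` and `search_terms_sync()` are hypothetical stand-ins, and the `asyncio.sleep(0)` line only marks where the real HTTP request would be awaited.

```python
import asyncio


class TeddiClientSketch:
    """Hypothetical stand-in for TeddiClient; performs no network access."""

    async def search_terms(self, term: str) -> list[str]:
        await asyncio.sleep(0)  # marks where the real HTTP request would be awaited
        return [f"result for {term}"]


def search_terms_sync(term: str) -> list[str]:
    """Blocking wrapper for callers that have no running event loop."""
    # asyncio.run() creates a new event loop, runs the coroutine, then closes the loop.
    return asyncio.run(TeddiClientSketch().search_terms(term))


print(search_terms_sync("QoS"))  # ['result for QoS']
```

Because `asyncio.run()` refuses to start while another event loop is running in the same thread, such a wrapper suits plain command-line entry points but must not be called from inside a coroutine.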

		## Adding Features

		### New Search Filter

		1. Define enum in `models.py`
		2. Update `SearchRequest` dataclass
		3. Update HTTP call in `client.py`
		4. Update CLI in `cli.py`
		5. Update MCP server in `server.py`
		6. Add tests

		## Dependencies

		- FastMCP, Pydantic v2, httpx, hishel, BeautifulSoup4, typer, pytest

src/teddi-mcp/README.md

0 → 100644

+92 −0

		# TEDDI-MCP: FastMCP Server for ETSI TEDDI

		A FastMCP 3.0 server that wraps ETSI's TErms and Definitions Database Interactive (TEDDI) for AI agent integration and command-line usage.

		## Features

		- FastMCP 3.0 Server: Expose TEDDI search as an MCP tool for AI agents (Claude, etc.)
		- CLI Interface: Search TEDDI from the command line with rich table/JSON output
		- HTTP Caching: Automatic hishel-based caching of TEDDI responses (2-hour TTL)
		- TB Grouping: Parsing of sub-table results in which rows with an empty technical body (TB) cell inherit the TB of the preceding row
		- Type-Safe: Full Pydantic models and type hints throughout
		- Async-First: Built on asyncio for performance

		## Quick Start

		### Installation

		```bash
		cd src/teddi-mcp
		uv pip install -e .
		```

		### CLI Usage

		```bash
		# Search for a term
		teddi-mcp search-term --term "QoS" --search-pattern exactmatch

		# List available technical bodies
		teddi-mcp list-technical-bodies

		# JSON output
		teddi-mcp search-term --term "QoS" --output json

		# Filter by technical bodies
		teddi-mcp search-term --term "test" --technical-bodies "3gpp,etsi"
		```
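
The `--technical-bodies` option takes a comma-separated string. Below is a minimal sketch of validating it against the `TechnicalBody` enum, assuming a hypothetical `parse_technical_bodies()` helper; how `cli.py` actually parses the option is not shown here.

```python
from enum import Enum


class TechnicalBody(str, Enum):
    ALL = "all"
    THREE_GPP = "3gpp"
    ETSI = "etsi"
    IETF = "ietf"


def parse_technical_bodies(raw: str) -> list[TechnicalBody]:
    """Split e.g. '3gpp,etsi' into enum members; unknown names raise ValueError."""
    # Trim whitespace, normalize case, and skip empty segments such as a trailing comma.
    names = [part.strip().lower() for part in raw.split(",") if part.strip()]
    # TechnicalBody("3gpp") looks a member up by its value.
    return [TechnicalBody(name) for name in names]


print(parse_technical_bodies("3gpp,etsi"))
```

Unknown names raise `ValueError` from the enum lookup, which a CLI can report as a usage error.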

		### MCP Server

		```bash
		# Start the MCP server (stdio)
		teddi-mcp serve
		```

		Then configure your AI agent client (e.g., Claude) to use this server:

		```json
		{
		  "mcpServers": {
		    "teddi": {
		      "command": "teddi-mcp",
		      "args": ["serve"]
		    }
		  }
		}
		```

		## Architecture

		- models.py: Pydantic data models (SearchIn, SearchPattern, TechnicalBody enums)
		- client.py: Core TeddiClient with Protocol-based abstraction
		- parser.py: HTML parsing with TB grouping logic
		- http_client.py: HTTP session manager with hishel caching
		- cli.py: Typer CLI interface
		- server.py: FastMCP 3.0 server implementation

		## Testing

		```bash
		# Run all tests
		uv run pytest tests/teddi_mcp/ -v

		# Run with coverage
		uv run pytest tests/teddi_mcp/ --cov=teddi_mcp --cov-report=term-missing

		# Run specific test
		uv run pytest tests/teddi_mcp/test_parser.py -v
		```

		## Development

		See [AGENTS.md](AGENTS.md) for detailed development guidelines including:
		- Protocol-based abstraction patterns
		- HTTP caching with hishel
		- Sub-table parsing with TB grouping
		- Dual API design (MCP + CLI)
		- Adding new features

		## License

		MIT