Initial commit: Migrate teddi-mcp from tdoc-crawler (08a29848) · Commits · Jan Reimes / teddi-mcp

.gitignore

0 → 100644

+35 −0

Original line number	Diff line number	Diff line
		# Byte-compiled / optimized / DLL files
		__pycache__/
		*.py[cod]
		*$py.class

		# Distribution / packaging
		build/
		dist/
		*.egg-info/

		# Virtual environments
		.venv/
		venv/

		# IDE
		.idea/
		.vscode/
		*.swp
		*.swo

		# Testing
		.pytest_cache/
		.coverage
		htmlcov/

		# Cache
		.ruff_cache/
		.mypy_cache/

		# OS
		.DS_Store
		Thumbs.db

		# Local cache
		.cache/

AGENTS.md

0 → 100644

+92 −0

		# teddi-mcp

		FastMCP 3.0 server for ETSI TEDDI (TErms and Definitions Database Interactive).

		## Quick Start

		```bash
		uv pip install -e .
		teddi-mcp search-term --term "QoS" --search-pattern exactmatch
		teddi-mcp serve # Start MCP server
		```

		Key Files:
		- `models.py` — Pydantic models
		- `client.py` — `TeddiClient` with Protocol abstraction
		- `server.py` — FastMCP 3.0 server (stdio)
		- `parser.py` — HTML parsing with TB grouping
		- `cli.py` — Typer CLI (non-MCP)

		## Key Design Patterns

		### Protocol-Based Abstraction

		`TeddiSource` protocol enables mocking/swapping TEDDI sources:

		```python
		class TeddiSource(Protocol):
		    async def search_terms(self, request: SearchRequest) -> SearchResponse: ...
		    async def get_available_technical_bodies(self) -> list[TechnicalBody]: ...
		    async def fetch_document(self, url: str) -> bytes: ...
		```
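
To illustrate why the protocol helps with mocking, here is a minimal sketch of an in-memory fake. `FakeTeddiSource` and `count_hits` are hypothetical names, the model classes are simplified stand-ins for those in `models.py`, and the protocol is trimmed to two methods for brevity.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol


# Simplified stand-ins for the real models (assumed shapes, not the actual ones).
@dataclass
class SearchRequest:
    term: str


@dataclass
class SearchResponse:
    query: SearchRequest
    results: list[str] = field(default_factory=list)
    total_count: int = 0


class TeddiSource(Protocol):
    async def search_terms(self, request: SearchRequest) -> SearchResponse: ...
    async def fetch_document(self, url: str) -> bytes: ...


class FakeTeddiSource:
    """Hypothetical in-memory source; it satisfies TeddiSource structurally."""

    def __init__(self, canned: dict[str, list[str]]) -> None:
        self._canned = canned

    async def search_terms(self, request: SearchRequest) -> SearchResponse:
        hits = self._canned.get(request.term, [])
        return SearchResponse(query=request, results=hits, total_count=len(hits))

    async def fetch_document(self, url: str) -> bytes:
        return b"fake document body"


async def count_hits(source: TeddiSource, term: str) -> int:
    """Depends only on the protocol, so any conforming source can be passed in."""
    response = await source.search_terms(SearchRequest(term=term))
    return response.total_count


fake = FakeTeddiSource({"QoS": ["Quality of Service"]})
hits = asyncio.run(count_hits(fake, "QoS"))
```

Because the protocol is structural, the fake needs no inheritance from `TeddiSource`; a test can pass it wherever a real source is expected.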

		### HTTP Caching

		All requests use `create_cached_teddi_session()`:
		- SQLite: `.cache/teddi_http.sqlite3`
		- TTL: 2 hours (refresh on access)
		- Auto-retries: 429, 500, 502, 503, 504

		### TB Grouping in Parser

		TEDDI results have nested tables. Parser handles TB inheritance:
		- Empty TB cell inherits previous TB value
		- Results grouped by technical body
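
The inheritance rule can be sketched as a forward fill over already-extracted rows. This is only an illustration of the rule, not the code in `parser.py`: the function name and the `(tb_cell, specification)` row shape are assumptions, and `SPEC-B` / `SPEC-C` are placeholder values.

```python
from collections import defaultdict


def group_by_technical_body(rows):
    """Group specifications by TB; an empty TB cell inherits the previous value."""
    grouped = defaultdict(list)
    current_tb = None
    for tb_cell, specification in rows:
        tb = tb_cell.strip()
        if tb:
            current_tb = tb  # a non-empty cell starts a new TB group
        if current_tb is None:
            continue  # assumption: rows before the first TB are skipped
        grouped[current_tb].append(specification)
    return dict(grouped)


rows = [("3GPP", "TS 24.008"), ("", "SPEC-B"), ("ETSI", "SPEC-C")]
grouped = group_by_technical_body(rows)
```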

		### Dual API: MCP + CLI

		Both interfaces expose the same tools, except `fetch_document()`, which is MCP-only:
		- `search_term()` — query with filters
		- `list_technical_bodies()` — list TBs
		- `fetch_document()` — retrieve content (MCP-only)

		## Data Models

		See `models.py` for complete definitions:

		- Enums: `SearchIn`, `SearchPattern`, `TechnicalBody`
		- Data Classes: `DocumentRef`, `TermResult`

		## TEDDI Endpoint

		POST `https://webapp.etsi.org/Teddi/search`

		Parameters: `term`, `searchin`, `searchpattern`, `technicalbody`

		Response: HTML table (parsed by `parser.py`)

		## Testing

		```bash
		uv run pytest tests/teddi_mcp/ -v
		uv run pytest tests/teddi_mcp/ --cov=teddi_mcp
		```

		Test structure: `test_models.py`, `test_parser.py`, `test_client.py`, `test_http_client.py`, `test_cli.py`, `test_server.py`

		## Implementation Notes

		1. TEDDI HTML-Driven: Endpoint reverse-engineered. Parser may need updates if TEDDI UI changes.
		2. Async-First: All core methods async. Use `asyncio.run()` for sync wrappers.
		3. Cache Validation: Tests auto-cache in `tests/.cache/teddi_http.sqlite3`.
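
A sync wrapper as mentioned in note 2 might look like the sketch below. `search_terms_async` is a hypothetical stand-in for an async core method; the real client methods take a `SearchRequest`.

```python
import asyncio


async def search_terms_async(term: str) -> list[str]:
    """Stand-in for an async core method (hypothetical)."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return [term.upper()]


def search_terms_sync(term: str) -> list[str]:
    """Blocking wrapper for synchronous callers such as a CLI command."""
    return asyncio.run(search_terms_async(term))


result = search_terms_sync("qos")
```

`asyncio.run()` raises `RuntimeError` when called from a thread that already has a running event loop, so a wrapper like this belongs only at synchronous entry points.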

		## Adding Features

		### New Search Filter

		1. Define enum in `models.py`
		2. Update `SearchRequest` dataclass
		3. Update HTTP call in `client.py`
		4. Update CLI in `cli.py`
		5. Update MCP server in `server.py`
		6. Add tests

README.md

0 → 100644

+98 −0

		# TEDDI-MCP: FastMCP Server for ETSI TEDDI

		A FastMCP 3.0 server that wraps ETSI's TErms and Definitions Database Interactive (TEDDI) for AI agent integration and command-line usage.

		## Features

		- FastMCP 3.0 Server: Expose TEDDI search as an MCP tool for AI agents (Claude, etc.)
		- CLI Interface: Search TEDDI from the command line with table, JSON, ISON, and TOON output
		- HTTP Caching: Automatic hishel-based caching of TEDDI responses (2-hour TTL)
		- TB Grouping: Smart parsing of sub-table results with technical body grouping logic
		- Type-Safe: Full Pydantic models and type hints throughout
		- Async-First: Built on asyncio for performance

		## Quick Start

		### Installation

		```bash
		cd src/teddi-mcp
		uv pip install -e .
		```

		### CLI Usage

		```bash
		# Search for a term
		teddi-mcp search term "QoS" --search-pattern exactmatch

		# List available technical bodies
		teddi-mcp search list-bodies

		# JSON output
		teddi-mcp search term "QoS" --output json

		# ISON output (token-optimized)
		teddi-mcp search term "QoS" --output ison

		# TOON output (token-optimized)
		teddi-mcp search term "QoS" --output toon

		# Filter by technical bodies
		teddi-mcp search term "test" --technical-bodies "3gpp,etsi"
		```

		### MCP Server

		```bash
		# Start the MCP server (stdio)
		teddi-mcp server
		```

		Then configure your AI agent client (e.g., Claude) to use this server:

		```json
		{
		  "mcpServers": {
		    "teddi": {
		      "command": "teddi-mcp",
		      "args": ["server"]
		    }
		  }
		}
		```

		## Architecture

		- models.py: Pydantic data models (SearchIn, SearchPattern, TechnicalBody enums)
		- client.py: Core TeddiClient with Protocol-based abstraction
		- parser.py: HTML parsing with TB grouping logic
		- http_client.py: HTTP session manager with hishel caching
		- cli.py: Typer CLI interface
		- server.py: FastMCP 3.0 server implementation

		## Testing

		```bash
		# Run all tests
		uv run pytest tests/teddi_mcp/ -v

		# Run with coverage
		uv run pytest tests/teddi_mcp/ --cov=teddi_mcp --cov-report=term-missing

		# Run specific test
		uv run pytest tests/teddi_mcp/test_parser.py -v
		```

		## Development

		See [AGENTS.md](AGENTS.md) for detailed development guidelines including:
		- Protocol-based abstraction patterns
		- HTTP caching with hishel
		- Sub-table parsing with TB grouping
		- Dual API design (MCP + CLI)
		- Adding new features

		## License

		MIT

docs/api.md

0 → 100644

+176 −0

		# API Reference

		## Enums

		### `OutputFormat`

		Output format for CLI results.

		```python
		class OutputFormat(StrEnum):
		    JSON = "json"
		    ISON = "ison"
		    TOON = "toon"
		    TABLE = "table"
		```

		### `SearchIn`

		Search scope: where to search for the term.

		```python
		class SearchIn(StrEnum):
		    ABBREVIATIONS = "abbreviations"
		    DEFINITIONS = "definitions"
		    BOTH = "both"
		```

		### `SearchPattern`

		Search pattern: how to match the term.

		```python
		class SearchPattern(StrEnum):
		    ALL_OCCURRENCES = "alloccurrences"
		    EXACT_MATCH = "exactmatch"
		    STARTING_WITH = "startingwith"
		    ENDING_WITH = "endingwith"
		```

		### `TechnicalBody`

		ETSI and standardization body identifiers.

		```python
		class TechnicalBody(StrEnum):
		    ALL = "all"
		```

		## Data Classes

		### `DocumentRef`

		Reference to a specification document from TEDDI results.

		```python
		@dataclass
		class DocumentRef:
		    technical_body: str  # e.g., '3GPP'
		    specification: str  # e.g., 'TS 24.008'
		    url: str  # Full HTTP URL to the specification
		```

		### `TermResult`

		A single term found in TEDDI with its definition and document references.

		```python
		@dataclass
		class TermResult:
		    term: str  # The term/abbreviation found
		    description: str  # Definition or description
		    documents: list[DocumentRef]  # Documents grouped by technical body
		```

		### `SearchRequest`

		Request parameters for TEDDI search.

		```python
		@dataclass
		class SearchRequest:
		    term: str  # Term to search for
		    search_in: SearchIn = SearchIn.BOTH  # Scope
		    search_pattern: SearchPattern = SearchPattern.ALL_OCCURRENCES  # Pattern
		    technical_bodies: list[TechnicalBody] | None = None  # Filter by TBs
		```

		### `SearchResponse`

		Response from TEDDI search containing all matching terms.

		```python
		@dataclass
		class SearchResponse:
		    query: SearchRequest  # Original search request
		    results: list[TermResult]  # Terms found matching the criteria
		    total_count: int  # Total number of matching results
		```

		## Client

		### `TeddiClient`

		Main client for TEDDI search operations.

		```python
		class TeddiClient:
		    def __init__(self, client: httpx.AsyncClient | None = None) -> None:
		        """Initialize client with optional custom HTTP client."""

		    async def __aenter__(self) -> "TeddiClient":
		        """Async context manager entry."""

		    async def __aexit__(self, *args: object) -> None:
		        """Async context manager exit."""

		    async def search_terms(self, request: SearchRequest) -> SearchResponse:
		        """Search for terms in TEDDI."""

		    async def get_available_technical_bodies(self) -> list[str]:
		        """Get list of available technical bodies."""

		    async def fetch_document(self, url: str) -> bytes:
		        """Fetch document content from URL."""
		```

		## HTTP Client

		### `create_cached_teddi_async_client`

		Creates an async HTTP client with caching.

		```python
		def create_cached_teddi_async_client(
		    cache_dir: Path | None = None,
		    ttl_seconds: int = 7200,
		) -> httpx.AsyncClient:
		    """
		    Create async HTTP client with hishel caching.

		    - SQLite cache in `.cache/teddi_http.sqlite3`
		    - TTL: 2 hours (7200 seconds) by default
		    - Auto-retries on 429, 500, 502, 503, 504
		    """
		```

		## Parser

		### `parse_teddi_response`

		Parse TEDDI HTML response into structured results.

		```python
		def parse_teddi_response(html: str) -> list[TermResult]:
		    """
		    Parse HTML table response from TEDDI search.

		    Handles:
		    - Nested tables for document references
		    - TB grouping (empty cells inherit from previous)
		    - Relative URL resolution
		    """
		```
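
The relative URL resolution step could be sketched with the standard library as follows. `resolve_href` is a hypothetical helper, the example paths are made up, and using the search endpoint as the base URL is an assumption; the actual parser may resolve against a different base.

```python
from urllib.parse import urljoin

# Assumption: links are resolved against the search endpoint URL.
BASE_URL = "https://webapp.etsi.org/Teddi/search"


def resolve_href(href: str, base: str = BASE_URL) -> str:
    """Turn an href taken from a result cell into an absolute URL."""
    return urljoin(base, href.strip())


root_relative = resolve_href("/docs/example.pdf")
page_relative = resolve_href("example.pdf")
already_absolute = resolve_href("https://example.org/spec.pdf")
```

`urljoin` leaves absolute URLs untouched, so the helper can be applied to every link without checking its form first.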

		## TEDDI Endpoint

		POST `https://webapp.etsi.org/Teddi/search`

		Parameters:
		- `qWhatToSearch`: Term to search
		- `qWhereOption`: Search scope (1=abbreviations, 3=definitions, 2=both)
		- `qWhatOption`: Match pattern (1=alloccurrences, 2=startingwith, 3=endingwith, 4=exactmatch)
		- `qShowTBs`: Technical body filter (default: all)
		- `btnSearch.x`, `btnSearch.y`: Search button coordinates

		Response: HTML table parsed by `parse_teddi_response()`
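
As a rough sketch of how the enum values might map onto the numeric form codes listed above, consider the helper below. `build_form` and the two lookup tables are hypothetical names; `qShowTBs` and the `btnSearch` coordinates are left out because their concrete values are not documented here.

```python
from urllib.parse import urlencode

# Codes copied from the parameter list above.
WHERE_CODES = {"abbreviations": 1, "both": 2, "definitions": 3}
WHAT_CODES = {"alloccurrences": 1, "startingwith": 2, "endingwith": 3, "exactmatch": 4}


def build_form(term, search_in="both", search_pattern="alloccurrences"):
    """Build the POST form fields for a search (partial; see the note above)."""
    return {
        "qWhatToSearch": term,
        "qWhereOption": WHERE_CODES[search_in],
        "qWhatOption": WHAT_CODES[search_pattern],
    }


form = build_form("QoS", search_pattern="exactmatch")
encoded = urlencode(form)  # URL-encoded body, as an HTML form submission would send
```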

docs/development.md

0 → 100644

+111 −0

		# Development Guide

		## Setup

		1. Clone the repository:
		```bash
		git clone https://forge.3gpp.org/rep/reimes/teddi-mcp.git
		cd teddi-mcp
		```

		2. Sync dependencies:
		```bash
		uv sync --all-extras
		```

		3. Install pre-commit hooks:
		```bash
		uv run pre-commit install
		```

		## Running Tests

		```bash
		# Run all tests
		uv run pytest tests/ -v

		# Run with coverage
		uv run pytest tests/ --cov=src.teddi_mcp --cov-report=term-missing

		# Run specific test file
		uv run pytest tests/test_parser.py -v
		```

		## Code Quality

		```bash
		# Format and lint
		uv run ruff format
		uv run ruff check --fix

		# Type checking
		uv run pyright
		```

		## Project Structure

		```
		teddi-mcp/
		├── src/
		│   └── teddi_mcp/         # Main package
		│       ├── models.py      # Pydantic models
		│       ├── client.py      # TeddiClient with Protocol abstraction
		│       ├── parser.py      # HTML parsing with TB grouping
		│       ├── http_client.py # HTTP session with hishel caching
		│       ├── cli.py         # Typer CLI
		│       └── server.py      # FastMCP 3.0 server
		├── tests/                 # Test suite
		├── docs/                  # Documentation
		├── pyproject.toml         # Project config
		└── README.md              # Overview
		```

		## Key Design Patterns

		### Protocol-Based Abstraction

		`TeddiSource` protocol enables mocking/swapping TEDDI sources:

		```python
		class TeddiSource(Protocol):
		    async def search_terms(self, request: SearchRequest) -> SearchResponse: ...
		    async def get_available_technical_bodies(self) -> list[TechnicalBody]: ...
		    async def fetch_document(self, url: str) -> bytes: ...
		```

		### HTTP Caching

		All requests use `create_cached_teddi_session()`:
		- SQLite: `.cache/teddi_http.sqlite3`
		- TTL: 2 hours (refresh on access)
		- Auto-retries: 429, 500, 502, 503, 504
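
"Refresh on access" is read here as a sliding expiration: each cache hit pushes the expiry another TTL into the future. The sketch below shows that behavior with a plain dictionary. It is not how the hishel-backed cache is implemented, and `SlidingTtlCache` is a hypothetical name.

```python
import time


class SlidingTtlCache:
    """In-memory cache whose entries expire ttl_seconds after their last access."""

    def __init__(self, ttl_seconds=7200.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop the entry and report a miss
            return None
        self.set(key, value)  # hit: restart the TTL window
        return value


cache = SlidingTtlCache(ttl_seconds=7200)
cache.set("search:QoS", "<html>...</html>")
cached = cache.get("search:QoS")
```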

		### TB Grouping in Parser

		TEDDI results have nested tables. Parser handles TB inheritance:
		- Empty TB cell inherits previous TB value
		- Results grouped by technical body

		### Dual API: MCP + CLI

		Both interfaces expose the same tools, except `fetch_document()`, which is MCP-only:
		- `search_term()` — query with filters
		- `list_technical_bodies()` — list TBs
		- `fetch_document()` — retrieve content (MCP-only)

		## Adding Features

		### New Search Filter

		1. Define enum in `models.py`
		2. Update `SearchRequest` dataclass
		3. Update HTTP call in `client.py`
		4. Update CLI in `cli.py`
		5. Update MCP server in `server.py`
		6. Add tests

		## Implementation Notes

		1. TEDDI HTML-Driven: Endpoint reverse-engineered. Parser may need updates if TEDDI UI changes.
		2. Async-First: All core methods async. Use `asyncio.run()` for sync wrappers.
		3. Cache Validation: Tests auto-cache in `tests/.cache/teddi_http.sqlite3`.