Commit 7eddd618 authored by Jan Reimes

feat(client, http_client, models, parser, server): implement TEDDI search client

* Add TeddiClient for interacting with the TEDDI search engine.
* Create HTTP client with caching using hishel for efficient requests.
* Define Pydantic models and enums for search requests and responses.
* Implement HTML parser to extract search results from TEDDI responses.
* Develop FastMCP server to expose term search and the list of technical bodies.
* Add tests for client, models, and parser to ensure functionality and correctness.
parent 2342a994
+1 −0
@@ -246,3 +246,4 @@ skills-lock.json
.dolt/
*.db
.beads/dolt-*.*
src/teddi-mcp/uv.lock
+6 −0
@@ -34,6 +34,9 @@ uv add tdoc-crawler
# Install with AI features (optional)
uv add tdoc-crawler[ai]

# Install with TEDDI MCP features (optional)
uv add tdoc-crawler[teddi]

# AI features are provided by the optional `tdoc-ai` extension package
# and installed automatically via the extra above.

@@ -44,6 +47,9 @@ uv sync

# Enable optional AI extension in source checkout
uv sync --extra ai

# Enable optional TEDDI MCP extension in source checkout
uv sync --extra teddi
```

### Using pip (not recommended)
+4 −0
@@ -40,6 +40,9 @@ dependencies = [
ai = [
    "tdoc-ai>=0.0.0",
]
teddi = [
    "teddi-mcp>=0.1.0",
]

[project.urls]
Repository = "https://forge.3gpp.org/rep/reimes/tdoc-crawler"
@@ -114,3 +117,4 @@ style = "semver"
[tool.uv.sources]
specify-cli = { git = "https://github.com/github/spec-kit.git" }
tdoc-ai = { path = "src/tdoc-ai", editable = true }
teddi-mcp = { path = "src/teddi-mcp", editable = true }
+146 −0
# TEDDI-MCP: FastMCP 3.0 Server for ETSI TEDDI Search

## Quick Start

FastMCP 3.0 server wrapping ETSI's TEDDI (TErms and Definitions Database Interactive) for AI agent integration.

**Key Files:**
- `models.py` — Pydantic models and enums (SearchIn, SearchPattern, TechnicalBody, TermResult, DocumentRef)
- `client.py` — TeddiClient with Protocol-based abstraction
- `server.py` — FastMCP 3.0 server (stdio transport)
- `parser.py` — HTML parsing with technical body (TB) grouping
- `http_client.py` — HTTP session manager with hishel caching
- `cli.py` — Typer CLI (non-MCP interface)

```bash
uv pip install -e .
teddi-mcp search-term --term "QoS" --search-pattern exactmatch
teddi-mcp serve  # Start MCP server
```

## Key Design Patterns

### Protocol-Based Abstraction

`TeddiSource` protocol enables mocking and swapping TEDDI sources:

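```python
from typing import Protocol

# Model types defined in models.py (import path assumed from the package name).
from teddi_mcp.models import SearchRequest, SearchResponse, TechnicalBody


class TeddiSource(Protocol):
    async def search_terms(self, request: SearchRequest) -> SearchResponse: ...
    async def get_available_technical_bodies(self) -> list[TechnicalBody]: ...
    async def fetch_document(self, url: str) -> bytes: ...
```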
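Because `TeddiSource` is a `typing.Protocol`, any object with matching async methods satisfies it structurally, without subclassing. The following is a minimal sketch of a test double. `SearchRequest` and `SearchResponse` here are hypothetical, simplified stand-ins for the real models in `models.py`, technical bodies are plain strings instead of the `TechnicalBody` enum, and `FakeTeddiSource` and `lookup` are illustrative names, not part of the package.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol


# Hypothetical, simplified stand-ins for the real models in models.py.
@dataclass
class SearchRequest:
    term: str


@dataclass
class SearchResponse:
    results: list[str] = field(default_factory=list)


class TeddiSource(Protocol):
    async def search_terms(self, request: SearchRequest) -> SearchResponse: ...
    async def get_available_technical_bodies(self) -> list[str]: ...
    async def fetch_document(self, url: str) -> bytes: ...


class FakeTeddiSource:
    """In-memory source that satisfies TeddiSource structurally (no subclassing)."""

    def __init__(self, canned: dict[str, list[str]]) -> None:
        self.canned = canned
        self.calls: list[str] = []  # records every searched term for assertions

    async def search_terms(self, request: SearchRequest) -> SearchResponse:
        self.calls.append(request.term)
        return SearchResponse(results=self.canned.get(request.term, []))

    async def get_available_technical_bodies(self) -> list[str]:
        return ["3gpp", "etsi"]

    async def fetch_document(self, url: str) -> bytes:
        return b""


async def lookup(source: TeddiSource, term: str) -> list[str]:
    # Depends only on the protocol, so a real client or the fake can be passed in.
    response = await source.search_terms(SearchRequest(term=term))
    return response.results


fake = FakeTeddiSource({"QoS": ["Quality of Service"]})
print(asyncio.run(lookup(fake, "QoS")))  # ['Quality of Service']
```

A static type checker accepts `FakeTeddiSource` wherever a `TeddiSource` is expected, which keeps tests free of network access.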

### HTTP Caching

All requests use `create_cached_teddi_session()`:
- SQLite backend: `.cache/teddi_http.sqlite3`
- TTL: 2 hours (refresh on access)
- Automatic retries on HTTP status 429, 500, 502, 503, 504
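hishel implements the caching itself; the sketch below only illustrates the "2-hour TTL, refresh on access" policy using the standard library's `sqlite3`. It is not hishel's API or storage schema, `SlidingTTLCache` is a hypothetical name, and retries are not modelled. Each cache hit pushes the entry's expiry forward by another TTL, so frequently used entries stay warm while idle ones age out.

```python
import sqlite3
import time
from typing import Callable, Optional

TTL_SECONDS = 2 * 60 * 60  # 2 hours, matching the documented TTL


class SlidingTTLCache:
    """Illustrative SQLite-backed cache; entries expire TTL seconds after their last access."""

    def __init__(self, path: str = ":memory:", ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl
        self.clock = clock
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, body BLOB, last_access REAL)"
        )

    def get(self, key: str) -> Optional[bytes]:
        row = self.db.execute(
            "SELECT body, last_access FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        body, last_access = row
        now = self.clock()
        if now - last_access > self.ttl:
            # Expired: drop the stale entry so the caller refetches from TEDDI.
            self.db.execute("DELETE FROM cache WHERE key = ?", (key,))
            self.db.commit()
            return None
        # Refresh on access: a hit pushes the expiry forward by another TTL.
        self.db.execute("UPDATE cache SET last_access = ? WHERE key = ?", (now, key))
        self.db.commit()
        return body

    def put(self, key: str, body: bytes) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, body, last_access) VALUES (?, ?, ?)",
            (key, body, self.clock()),
        )
        self.db.commit()
```

Injecting `clock` keeps the expiry logic testable without sleeping for two hours.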

### TB Grouping in Parser

TEDDI results contain nested sub-tables in which the technical body (TB) cell may be empty. The parser carries the last non-empty TB forward to the following rows:

```text
HTML: [TB="3GPP", Doc=""], [TB="", Doc="TS 24.008"]
→ Parsed: DocumentRef(technical_body="3GPP", specification="TS 24.008", url=...)
```
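The inheritance rule can be sketched independently of HTML parsing. This assumes the parser has already extracted each row into a hypothetical `(tb_cell, doc_cell, url)` tuple (for example with BeautifulSoup); `group_rows` is an illustrative name, and the URL below is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class DocumentRef:
    technical_body: str
    specification: str
    url: str


def group_rows(rows: list[tuple[str, str, str]]) -> list[DocumentRef]:
    """Turn (tb_cell, doc_cell, url) rows into DocumentRefs, carrying the TB forward."""
    refs: list[DocumentRef] = []
    current_tb = ""
    for tb_cell, doc_cell, url in rows:
        tb_cell, doc_cell = tb_cell.strip(), doc_cell.strip()
        if tb_cell:
            # A non-empty TB cell starts a new group.
            current_tb = tb_cell
        if not doc_cell:
            # TB-only row: remember the TB, but there is no document to emit.
            continue
        refs.append(DocumentRef(current_tb, doc_cell, url))
    return refs


rows = [
    ("3GPP", "", ""),  # TB row without a document
    ("", "TS 24.008", "https://example.org/ts-24.008"),  # inherits "3GPP"
]
print(group_rows(rows))
# [DocumentRef(technical_body='3GPP', specification='TS 24.008', url='https://example.org/ts-24.008')]
```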

### Dual API: MCP + CLI

Both interfaces expose the same core operations:
- `search_term()` — query with filters
- `list_technical_bodies()` — list TBs
- `fetch_document()` — retrieve content (MCP-only)

## Data Models

### Enums

```python
from enum import Enum


class SearchIn(str, Enum):
    ABBREVIATIONS = "abbreviations"
    DEFINITIONS = "definitions"
    BOTH = "both"

class SearchPattern(str, Enum):
    ALL_OCCURRENCES = "alloccurrences"
    EXACT_MATCH = "exactmatch"
    STARTING_WITH = "startingwith"
    ENDING_WITH = "endingwith"

class TechnicalBody(str, Enum):
    ALL = "all"
    THREE_GPP = "3gpp"
    ETSI = "etsi"
    IETF = "ietf"
    # ... more as discovered
```
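Mixing `str` into the enums means members compare equal to their raw values, and `TechnicalBody("etsi")` looks a member up by value. That makes it straightforward to validate a CLI value such as `--technical-bodies "3gpp,etsi"`. A minimal sketch, where `parse_technical_bodies` is a hypothetical helper and falling back to `ALL` on empty input is an assumption:

```python
from enum import Enum


class TechnicalBody(str, Enum):
    ALL = "all"
    THREE_GPP = "3gpp"
    ETSI = "etsi"
    IETF = "ietf"


def parse_technical_bodies(raw: str) -> list[TechnicalBody]:
    """Parse a comma-separated CLI value like "3gpp,etsi" into enum members."""
    bodies: list[TechnicalBody] = []
    for part in raw.split(","):
        part = part.strip().lower()
        if part:
            # Lookup by value; raises ValueError for an unknown technical body.
            bodies.append(TechnicalBody(part))
    # Assumption: an empty value means "search all technical bodies".
    return bodies or [TechnicalBody.ALL]


print(parse_technical_bodies("3gpp, ETSI"))
```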

### Data Classes

```python
from dataclasses import dataclass


@dataclass
class DocumentRef:
    technical_body: str
    specification: str
    url: str

@dataclass
class TermResult:
    term: str
    description: str
    documents: list[DocumentRef]
```
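Because nested dataclasses convert cleanly with `dataclasses.asdict()`, producing JSON output (as with the CLI's `--output json`) can be as simple as the sketch below. The sample values and the placeholder URL are illustrative only. If the models are Pydantic `BaseModel`s rather than dataclasses, Pydantic v2's `model_dump_json()` plays the same role.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DocumentRef:
    technical_body: str
    specification: str
    url: str


@dataclass
class TermResult:
    term: str
    description: str
    documents: list[DocumentRef]


result = TermResult(
    term="QoS",
    description="Quality of Service",
    documents=[DocumentRef("3GPP", "TS 24.008", "https://example.org/ts-24.008")],
)
# asdict() recurses into nested dataclasses and lists, yielding plain dicts for json.dumps().
payload = json.dumps(asdict(result), indent=2)
print(payload)
```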

## TEDDI Endpoint

**POST** `https://webapp.etsi.org/Teddi/search`

**Parameters:**
- `term` — search string
- `searchin` — "abbreviations" | "definitions" | "both"
- `searchpattern` — "alloccurrences" | "exactmatch" | "startingwith" | "endingwith"
- `technicaldiebody` — "all" or comma-separated TBs

**Response:** HTML table (parsed by `parser.py`)
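A sketch of how the four parameters might be assembled into a form-encoded POST body. `build_form_data` is hypothetical, its defaults (`both`, `alloccurrences`, `all`) are assumptions, and the field names are copied verbatim from the parameter list above.

```python
from typing import Optional
from urllib.parse import urlencode

TEDDI_SEARCH_URL = "https://webapp.etsi.org/Teddi/search"


def build_form_data(term: str, search_in: str = "both",
                    search_pattern: str = "alloccurrences",
                    technical_bodies: Optional[list[str]] = None) -> dict[str, str]:
    """Map search options onto the TEDDI form fields (hypothetical helper)."""
    bodies = technical_bodies or ["all"]
    return {
        "term": term,
        "searchin": search_in,
        "searchpattern": search_pattern,
        # Field name as documented; multiple TBs are comma-separated.
        "technicaldiebody": ",".join(bodies),
    }


form = build_form_data("QoS", search_pattern="exactmatch", technical_bodies=["3gpp", "etsi"])
print(urlencode(form))
# term=QoS&searchin=both&searchpattern=exactmatch&technicaldiebody=3gpp%2Cetsi
```

With httpx, the same dict could be passed as `data=form` to `AsyncClient.post()`, which form-encodes it.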

## Testing

```bash
uv run pytest tests/teddi_mcp/ -v
uv run pytest tests/teddi_mcp/ --cov=teddi_mcp --cov-report=term-missing
```

**Test structure:**
- `test_models.py` — Enum/dataclass validation
- `test_parser.py` — HTML parsing with fixtures
- `test_client.py` — TeddiClient (mocked HTTP)
- `test_http_client.py` — Session creation & caching
- `test_cli.py` — CLI argument parsing
- `test_server.py` — FastMCP tool invocation

## Implementation Notes

1. **TEDDI HTML-Driven:** The endpoint was reverse-engineered from the TEDDI web interface, so the parser may need updates if the TEDDI HTML changes.

2. **TB Grouping Logic:** An empty TB cell inherits the TB value of the preceding row.

3. **Async-First:** All core methods are async; use `asyncio.run()` to call them from synchronous code.

4. **Cache Validation:** Tests cache HTTP responses automatically in `tests/.cache/teddi_http.sqlite3`.
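Note 3 can be captured in a small helper. `asyncio.run()` refuses to start when an event loop is already running in the current thread, so the sketch checks for that first and fails with a clearer message. `run_sync` and the `search_term` coroutine are hypothetical names used for illustration.

```python
import asyncio
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro_fn: Callable[..., Coroutine[Any, Any, T]], *args: Any, **kwargs: Any) -> T:
    """Run an async function to completion from synchronous code, e.g. a CLI command."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run() creates one, runs, and closes it.
        return asyncio.run(coro_fn(*args, **kwargs))
    # A loop is already running (e.g. inside the MCP server): the caller should await instead.
    raise RuntimeError("run_sync() called from a running event loop; await the coroutine instead")


async def search_term(term: str) -> list[str]:
    # Hypothetical stand-in for an async TeddiClient method.
    await asyncio.sleep(0)
    return [term.upper()]


print(run_sync(search_term, "qos"))  # ['QOS']
```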

## Adding Features

### New Search Filter

1. Define enum in `models.py`
2. Update `SearchRequest` dataclass
3. Update HTTP call in `client.py`
4. Update CLI in `cli.py`
5. Update MCP server in `server.py`
6. Add tests

## Dependencies

- FastMCP, Pydantic v2, httpx, hishel, BeautifulSoup4, typer, pytest
+92 −0
# TEDDI-MCP: FastMCP Server for ETSI TEDDI

A FastMCP 3.0 server that wraps ETSI's TErms and Definitions Database Interactive (TEDDI) for AI agent integration and command-line usage.

## Features

- **FastMCP 3.0 Server**: Expose TEDDI search as an MCP tool for AI agents (Claude, etc.)
- **CLI Interface**: Search TEDDI from the command line with rich table/JSON output
- **HTTP Caching**: Automatic hishel-based caching of TEDDI responses (2-hour TTL)
- **TB Grouping**: Parses nested result sub-tables, carrying each technical body (TB) forward to rows whose TB cell is empty
- **Type-Safe**: Full Pydantic models and type hints throughout
- **Async-First**: Built on asyncio for performance

## Quick Start

### Installation

```bash
cd src/teddi-mcp
uv pip install -e .
```

### CLI Usage

```bash
# Search for a term
teddi-mcp search-term --term "QoS" --search-pattern exactmatch

# List available technical bodies
teddi-mcp list-technical-bodies

# JSON output
teddi-mcp search-term --term "QoS" --output json

# Filter by technical bodies
teddi-mcp search-term --term "test" --technical-bodies "3gpp,etsi"
```

### MCP Server

```bash
# Start the MCP server (stdio)
teddi-mcp serve
```

Then configure your AI agent client (e.g., Claude) to use this server:

```json
{
  "mcpServers": {
    "teddi": {
      "command": "teddi-mcp",
      "args": ["serve"]
    }
  }
}
```

## Architecture

- **models.py**: Pydantic data models and enums (SearchIn, SearchPattern, TechnicalBody)
- **client.py**: Core TeddiClient with Protocol-based abstraction
- **parser.py**: HTML parsing with TB grouping logic
- **http_client.py**: HTTP session manager with hishel caching
- **cli.py**: Typer CLI interface
- **server.py**: FastMCP 3.0 server implementation

## Testing

```bash
# Run all tests
uv run pytest tests/teddi_mcp/ -v

# Run with coverage
uv run pytest tests/teddi_mcp/ --cov=teddi_mcp --cov-report=term-missing

# Run specific test
uv run pytest tests/teddi_mcp/test_parser.py -v
```

## Development

See [AGENTS.md](AGENTS.md) for detailed development guidelines including:
- Protocol-based abstraction patterns
- HTTP caching with hishel
- Sub-table parsing with TB grouping
- Dual API design (MCP + CLI)
- Adding new features

## License

MIT