Commit 3685ea61 authored by Jan Reimes's avatar Jan Reimes

refactor(docs): rename tdoc-crawler to 3GPP Crawler throughout documentation

* Update AGENTS.md to reflect new project name.
* Modify README.md for consistent naming and repository links.
* Revise AI documentation to use 3GPP Crawler terminology.
* Adjust configuration and caching documentation for new paths.
parent 98a076be
+2 −2
-# TDoc-Crawler
+# 3GPP Crawler

-CLI tool for querying structured 3GPP TDoc data.
+CLI tool for querying structured 3GPP document data.

## Commands

+12 −12
-# tdoc-crawler
+# 3GPP Crawler

![3GPP Crawler Logo](docs/images/logo.jpg)

-A command-line tool for crawling the 3GPP FTP server, caching TDoc metadata in a local database, and querying structured data via JSON/YAML output.
+A command-line tool for crawling the 3GPP FTP server, caching 3GPP document metadata in a local database, and querying structured data via JSON/YAML output.

-**Github repository**: <https://forge.3gpp.org/rep/reimes/tdoc-crawler/>
+**Github repository**: <https://forge.3gpp.org/rep/reimes/3gpp-crawler/>

## Features

@@ -15,7 +15,7 @@ A command-line tool for crawling the 3GPP FTP server, caching TDoc metadata in a
- **Case-Insensitive Queries**: Search for TDocs regardless of case
- **Multiple Output Formats**: Export results as table, JSON, or YAML
- **Incremental Updates**: Only fetch new data on subsequent crawls
-- **AI Document Processing** - Semantic search, knowledge graphs, and AI-powered summarization (optional, install with `tdoc-crawler[ai]`)
+- **AI Document Processing** - Semantic search, knowledge graphs, and AI-powered summarization (optional, install with `3gpp-crawler[ai]`)
- **Rich CLI**: Beautiful terminal output with progress indicators

## Installation
@@ -23,7 +23,7 @@ A command-line tool for crawling the 3GPP FTP server, caching TDoc metadata in a
### Install as uv tool (recommended)

```bash
-uv tool install https://forge.3gpp.org/rep/reimes/tdoc-crawler.git
+uv tool install https://forge.3gpp.org/rep/reimes/3gpp-crawler.git
uvx tdoc-crawler --help
```

@@ -31,10 +31,10 @@ uvx tdoc-crawler --help

```bash
# Install from PyPI (publication pending)
-uv add tdoc-crawler
+uv add 3gpp-crawler

# Install with AI features (optional)
-uv add tdoc-crawler[ai]
+uv add 3gpp-crawler[ai]

# AI features are provided by the optional `tdoc-ai` extension package
# and installed automatically via the extra above.
@@ -45,7 +45,7 @@ uv add tdoc-crawler[ai]
### Using pip (not recommended)

```bash
-pip install tdoc-crawler
+pip install 3gpp-crawler
```

## Configuration
@@ -122,7 +122,7 @@ tdoc-crawler crawl-meetings
tdoc-crawler crawl

# Populate spec catalog
-tdoc-crawler crawl-specs
+spec-crawler crawl-specs
```

### 2. Query Metadata
@@ -134,7 +134,7 @@ Search and filter stored information:
tdoc-crawler query R1-2400001

# Query specifications
-tdoc-crawler query-specs 23.501
+spec-crawler query-specs 23.501

# List recent meetings
tdoc-crawler query-meetings --limit 10
@@ -149,13 +149,13 @@ Open documents, batch download (checkout), and check database status:
tdoc-crawler open R1-2400001

# Download and open latest version of a spec
-tdoc-crawler open-spec 23.501
+spec-crawler open-spec 23.501

# Batch download (checkout) TDocs to local folder
tdoc-crawler checkout R1-2400001 S2-2400567

# Batch checkout specifications
-tdoc-crawler checkout-spec 26130-26140
+spec-crawler checkout-spec 26130-26140

# View database statistics
tdoc-crawler stats
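After this rename the documentation references three separate entry points (`tdoc-crawler`, `spec-crawler`, `3gpp-ai`). A small POSIX-shell check — a sketch, not part of the commit — to see which of them are actually on `PATH` before following the examples:

```shell
# Report which of the documented entry points are installed locally.
for cmd in tdoc-crawler spec-crawler 3gpp-ai; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: not installed"
  fi
done
```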
+47 −47
# AI Document Processing

-The AI module provides intelligent document processing capabilities for TDoc data, including semantic search, knowledge graph construction, and AI-powered summarization.
+The AI module provides intelligent document processing capabilities for 3GPP document data, including semantic search, knowledge graph construction, and AI-powered summarization.

**Key Features:**

@@ -30,18 +30,18 @@ ______________________________________________________________________
The AI module is available as an optional dependency. Install it with:

```bash
-# Install tdoc-crawler with AI support
-uv add tdoc-crawler[ai]
+# Install 3gpp-crawler with AI support
+uv add 3gpp-crawler[ai]

# Or install from source
-git clone https://forge.3gpp.org/rep/reimes/tdoc-crawler.git
-cd tdoc-crawler
+git clone https://forge.3gpp.org/rep/reimes/3gpp-crawler.git
+cd 3gpp-crawler
uv sync --extra ai
```

All required dependencies (Kreuzberg, LiteLLM, sentence-transformers, LanceDB) are installed automatically.

-Internally, AI capabilities are provided by the optional `tdoc-ai` package, which is pulled in by `tdoc-crawler[ai]`.
+Internally, AI capabilities are provided by the optional `tdoc-ai` package, which is pulled in by `3gpp-crawler[ai]`.

______________________________________________________________________

@@ -103,10 +103,10 @@ The AI module follows a workspace-based workflow for organizing and querying you

```bash
# Create a new workspace for your project
-tdoc-crawler ai workspace create my-project
+3gpp-ai workspace create my-project

# Activate it so you don't need --workspace for other commands
-tdoc-crawler ai workspace activate my-project
+3gpp-ai workspace activate my-project
```

Once activated, all workspace commands use the active workspace by default. No need to pass `-w` every time.
@@ -117,13 +117,13 @@ After adding TDocs to your workspace, process them to generate RAG/GraphRAG embe

```bash
# Add TDocs to the active workspace
-tdoc-crawler ai workspace add-members S4-251971 S4-251972
+3gpp-ai workspace add-members S4-251971 S4-251972

# Process all TDocs in workspace (only new ones)
-tdoc-crawler ai workspace process -w my-project
+3gpp-ai workspace process -w my-project

# Force reprocess all TDocs
-tdoc-crawler ai workspace process -w my-project --force
+3gpp-ai workspace process -w my-project --force
```

Note: If you created the workspace with `--auto-build`, documents are processed automatically when added.
@@ -134,10 +134,10 @@ Once you have a workspace with documents, query using semantic search and knowle

```bash
# Query the active workspace
-tdoc-crawler ai query "your query here"
+3gpp-ai query "your query here"

# Or specify a workspace explicitly
-tdoc-crawler ai query -w my-project "your query here"
+3gpp-ai query -w my-project "your query here"
```

Note: Uses active workspace if `-w` is not provided. Results combine vector embeddings (RAG) and knowledge graph (GraphRAG).
@@ -148,16 +148,16 @@ Keep your workspace clean and manage artifacts:

```bash
# Get detailed workspace information (member counts by type)
-tdoc-crawler ai workspace info my-project
+3gpp-ai workspace info my-project

# Remove invalid/inactive members
-tdoc-crawler ai workspace clear-invalid -w my-project
+3gpp-ai workspace clear-invalid -w my-project

# Clear all AI artifacts (embeddings, summaries) while preserving members
-tdoc-crawler ai workspace clear -w my-project
+3gpp-ai workspace clear -w my-project

# After clearing, re-process to regenerate artifacts
-tdoc-crawler ai workspace process -w my-project --force
+3gpp-ai workspace process -w my-project --force
```

### 5. Single TDoc Operations
@@ -165,7 +165,7 @@ tdoc-crawler ai workspace process -w my-project --force
Process a single TDoc through the pipeline (classification, extraction, embeddings, graph). Use `--accelerate` to choose the sentence-transformers backend.

```bash
-tdoc-crawler ai process --tdoc-id SP-240001 --accelerate onnx
+3gpp-ai process --tdoc-id SP-240001 --accelerate onnx
```

______________________________________________________________________
@@ -176,7 +176,7 @@ ______________________________________________________________________

````bash
# Create a new workspace
-tdoc-crawler ai workspace create <name> [--auto-build]
+3gpp-ai workspace create <name> [--auto-build]

Options:
- `name`: Workspace name
@@ -184,38 +184,38 @@ Options:

# List all workspaces
# Shows (*) next to the active workspace
-tdoc-crawler ai workspace list
+3gpp-ai workspace list

# Activate a workspace (sets as default for workspace commands)
-tdoc-crawler ai workspace activate <name>
+3gpp-ai workspace activate <name>

# Deactivate the active workspace
-tdoc-crawler ai workspace deactivate
+3gpp-ai workspace deactivate

# Get workspace details (name, status, member counts)
-tdoc-crawler ai workspace info <name>
+3gpp-ai workspace info <name>

# Remove invalid/inactive members from workspace
-tdoc-crawler ai workspace clear-invalid [-w <name>]
+3gpp-ai workspace clear-invalid [-w <name>]

# Clear all AI artifacts while preserving members
-tdoc-crawler ai workspace clear [-w <name>]
+3gpp-ai workspace clear [-w <name>]

# Delete a workspace
-tdoc-crawler ai workspace delete <name>
+3gpp-ai workspace delete <name>
### Querying

Query the knowledge base using semantic embeddings and knowledge graph (RAG + GraphRAG).

```bash
# Query the active workspace
-tdoc-crawler ai query "your query here"
+3gpp-ai query "your query here"

# Query a specific workspace
-tdoc-crawler ai query -w <workspace_name> "your query here"
+3gpp-ai query -w <workspace_name> "your query here"

# Specify number of results
-tdoc-crawler ai query "your query here" -k 10
+3gpp-ai query "your query here" -k 10
````

Note: Uses active workspace if `-w` is not provided. Combines vector embeddings (RAG) and knowledge graph (GraphRAG). The query is a **positional argument** (no `--query` flag needed).
@@ -225,7 +225,7 @@ Note: Uses active workspace if `-w` is not provided. Combines vector embeddings
Summarize a single TDoc with specified word count.

```bash
-tdoc-crawler ai summarize <tdoc_id> [--words N] [--format markdown|json|yaml] [--json-output]
+3gpp-ai summarize <tdoc_id> [--words N] [--format markdown|json|yaml] [--json-output]
```

Options:
@@ -240,7 +240,7 @@ Options:
Convert a single TDoc to markdown format.

```bash
-tdoc-crawler ai convert <tdoc_id> [--output FILE.md] [--json-output]
+3gpp-ai convert <tdoc_id> [--output FILE.md] [--json-output]
```

Options:
@@ -255,31 +255,31 @@ Add TDocs to workspaces and process them to generate embeddings and knowledge gr

```bash
# Add members to the active workspace
-tdoc-crawler ai workspace add-members S4-251971 S4-251972
+3gpp-ai workspace add-members S4-251971 S4-251972

# Add members to a specific workspace
-tdoc-crawler ai workspace add-members -w my-project S4-251971 S4-251972
+3gpp-ai workspace add-members -w my-project S4-251971 S4-251972

# List members in the active workspace
-tdoc-crawler ai workspace list-members
+3gpp-ai workspace list-members

# List members including inactive ones
-tdoc-crawler ai workspace list-members --include-inactive
+3gpp-ai workspace list-members --include-inactive

# Process all TDocs in the active workspace
-tdoc-crawler ai workspace process
+3gpp-ai workspace process

# Process with options
-tdoc-crawler ai workspace process -w my-project --force
+3gpp-ai workspace process -w my-project --force

# Get workspace information with member counts
-tdoc-crawler ai workspace info my-project
+3gpp-ai workspace info my-project

# Remove invalid members (failed checkouts, etc.)
-tdoc-crawler ai workspace clear-invalid -w my-project
+3gpp-ai workspace clear-invalid -w my-project

# Clear AI artifacts (keep members, remove embeddings/summaries)
-tdoc-crawler ai workspace clear -w my-project
+3gpp-ai workspace clear -w my-project
```

______________________________________________________________________
@@ -438,7 +438,7 @@ ______________________________________________________________________
**Solution:** Install the AI optional dependencies:

```bash
-uv add tdoc-crawler[ai]
+uv add 3gpp-crawler[ai]
```

**Problem:** `lancedb not available`
@@ -516,14 +516,14 @@ uv add sentence-transformers
**Solution:** Create the workspace first:

```bash
-tdoc-crawler ai workspace create my-project
+3gpp-ai workspace create my-project
```

**Solution:** Use `ai summarize` or `ai convert` to work with individual TDocs directly. These commands fetch content from configured sources:

```bash
-tdoc-crawler ai summarize SP-240001
-tdoc-crawler ai convert SP-240001 --output SP-240001.md
+3gpp-ai summarize SP-240001
+3gpp-ai convert SP-240001 --output SP-240001.md
```

### Query Errors
@@ -533,7 +533,7 @@ tdoc-crawler ai convert SP-240001 --output SP-240001.md
**Solution:** Ensure the TDoc exists in your workspace or use `ai summarize`/`ai convert` which fetch from external sources:

```bash
-tdoc-crawler ai summarize SP-240001 --format markdown
+3gpp-ai summarize SP-240001 --format markdown
```

**Problem:** `LLM API timeout`
@@ -572,10 +572,10 @@ export TDC_AI_LLM_MAX_TOKENS=1000

```bash
# Backup first!
-cp -r ~/.tdoc-crawler/.ai/lancedb ~/.tdoc-crawler/.ai/lancedb.backup
+cp -r ~/.3gpp-crawler/.ai/lancedb ~/.3gpp-crawler/.ai/lancedb.backup

# Delete and let it recreate
-rm -rf ~/.tdoc-crawler/.ai/lancedb
+rm -rf ~/.3gpp-crawler/.ai/lancedb
```

**Note:** This will delete all processed embeddings and summaries. You'll need to re-process your documents.
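As an aside on the backup-then-delete sequence above: when the backup stays on the same filesystem, renaming the directory does both steps at once. A sketch, using the same path as the commands above:

```shell
# Sketch: back up the LanceDB directory by renaming it rather than
# cp -r followed by rm -rf; mv on the same filesystem is atomic, so
# there is no window where neither the backup nor the original exists.
set -eu
db="$HOME/.3gpp-crawler/.ai/lancedb"
if [ -d "$db" ]; then
  mv "$db" "$db.backup"   # an empty database is recreated on next use
  echo "backed up to $db.backup"
fi
```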
+2 −2
@@ -98,8 +98,8 @@ Crawl technical specification (TS/TR) metadata.

```bash
# Crawl all specs from all sources
-tdoc-crawler crawl-specs
+spec-crawler crawl-specs

# Crawl only RAN specs from whatthespec
-tdoc-crawler crawl-specs -w RAN -s whatthespec
+spec-crawler crawl-specs -w RAN -s whatthespec
```
+3 −3
# Development Guide

-This guide describes how to set up your environment for contributing to `tdoc-crawler`.
+This guide describes how to set up your environment for contributing to `3gpp-crawler`.

## Setup

@@ -9,8 +9,8 @@ This guide describes how to set up your environment for contributing to `tdoc-cr
1. Clone the repository:

   ```bash
-   git clone https://forge.3gpp.org/rep/reimes/tdoc-crawler.git
-   cd tdoc-crawler
+   git clone https://forge.3gpp.org/rep/reimes/3gpp-crawler.git
+   cd 3gpp-crawler
   ```

1. Sync dependencies: