Commit 91b6024a authored by Jan Reimes

docs(ai, convert-lo, index, query): update CLI commands to use 3gpp-ai

* Replace `tdoc-crawler ai` with `3gpp-ai` in AI documentation.
* Update conversion examples in convert-lo usage guide.
* Modify index to reflect new CLI entrypoint for AI commands.
* Clarify AI RAG query command usage in query documentation.
parent 4549c368
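Most of the edits below follow a few mechanical substitutions: `tdoc-crawler ai` becomes `3gpp-ai`, the leading `ai` is dropped from inline command names such as `ai workspace query`, and `query-tdocs`/`query-specs` become `query`. The following is a minimal sketch of those substitutions using `sed -E`. The `migrate_cli_line` function is hypothetical and is not part of this commit or the repository. It does not cover the non-mechanical edits, such as the new "CLI Integration" intro, the renamed headings, or the changed index anchor.

```bash
# Hypothetical helper, not part of the repository. It reads stdin and writes
# stdout, applying these rules in order:
#   1. "tdoc-crawler ai "              -> "3gpp-ai "
#   2. inline "`ai <cmd>"              -> "`<cmd>"  (e.g. `ai workspace query`)
#   3. "tdoc-crawler query-tdocs"      -> "tdoc-crawler query"
#   4. "spec-crawler query-specs"      -> "spec-crawler query"
migrate_cli_line() {
  sed -E \
    -e 's/tdoc-crawler ai /3gpp-ai /g' \
    -e 's/`ai /`/g' \
    -e 's/tdoc-crawler query-tdocs/tdoc-crawler query/g' \
    -e 's/spec-crawler query-specs/spec-crawler query/g'
}

# Example: convert one legacy command line
printf '%s\n' 'tdoc-crawler ai workspace create my-project' | migrate_cli_line
# prints: 3gpp-ai workspace create my-project
```

Because the rules are plain text substitutions, running them over a whole Markdown file would still need a manual review of anything they do not cover.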
+42 −42
@@ -100,16 +100,16 @@ ______________________________________________________________________

The AI module follows a workspace-based workflow for organizing and querying your document collection:

All examples below use the current CLI entrypoint: `tdoc-crawler ai ...`.
All examples below use the `3gpp-ai` CLI entrypoint. AI commands are provided by the standalone `3gpp-ai` package (installed via `3gpp-crawler[ai]`).

### 1. Create and Activate Workspace

```bash
# Create a new workspace for your project
tdoc-crawler ai workspace create my-project
3gpp-ai workspace create my-project

# Activate it so you don't need --workspace for other commands
tdoc-crawler ai workspace activate my-project
3gpp-ai workspace activate my-project
```

Once activated, all workspace commands use the active workspace by default. No need to pass `-w` every time.
@@ -120,13 +120,13 @@ After adding TDocs to your workspace, process them to generate RAG/GraphRAG embe

```bash
# Add TDocs to the active workspace
tdoc-crawler ai workspace add-members --kind tdoc S4-251971 S4-251972
3gpp-ai workspace add-members --kind tdoc S4-251971 S4-251972

# Process all TDocs in workspace (only new ones)
tdoc-crawler ai workspace process -w my-project
3gpp-ai workspace process -w my-project

# Force reprocess all TDocs
tdoc-crawler ai workspace process -w my-project --force
3gpp-ai workspace process -w my-project --force
```

Note: If you created the workspace with `--auto-build`, documents are processed automatically when added.
@@ -137,14 +137,14 @@ Once you have a workspace with documents, query using the single RAG command tha

```bash
# Query a workspace
tdoc-crawler ai workspace query --workspace my-project "What are the bit rates in Table 3?"
3gpp-ai workspace query --workspace my-project "What are the bit rates in Table 3?"

# Same command for figure/equation questions
tdoc-crawler ai workspace query --workspace my-project "Describe the architecture figure"
tdoc-crawler ai workspace query --workspace my-project "What is the throughput equation?"
3gpp-ai workspace query --workspace my-project "Describe the architecture figure"
3gpp-ai workspace query --workspace my-project "What is the throughput equation?"
```

Note: `ai workspace query` is the only query entrypoint. Do not use separate table/figure/equation query commands.
Note: `workspace query` is the only query entrypoint. Do not use separate table/figure/equation query commands.

### 4. Workspace Maintenance

@@ -152,16 +152,16 @@ Keep your workspace clean and manage artifacts:

```bash
# Get detailed workspace information (member counts by type)
tdoc-crawler ai workspace info my-project
3gpp-ai workspace info my-project

# Remove invalid/inactive members
tdoc-crawler ai workspace clear-invalid -w my-project
3gpp-ai workspace clear-invalid -w my-project

# Clear all AI artifacts (embeddings, summaries) while preserving members
tdoc-crawler ai workspace clear -w my-project
3gpp-ai workspace clear -w my-project

# After clearing, re-process to regenerate artifacts
tdoc-crawler ai workspace process -w my-project --force
3gpp-ai workspace process -w my-project --force
```

### 5. Single TDoc Operations
@@ -169,8 +169,8 @@ tdoc-crawler ai workspace process -w my-project --force
Process a single TDoc through the pipeline (classification, extraction, embeddings, graph). Use `--accelerate` to choose the sentence-transformers backend.

```bash
tdoc-crawler ai convert SP-240001 --output ./SP-240001.md
tdoc-crawler ai summarize SP-240001 --words 200
3gpp-ai convert SP-240001 --output ./SP-240001.md
3gpp-ai summarize SP-240001 --words 200
```

When structured extraction is enabled, conversion and workspace processing may generate sidecars next to markdown artifacts:
@@ -200,10 +200,10 @@ Use the `--vlm` flag with the workspace process command:

```bash
# Process with VLM features enabled
tdoc-crawler ai workspace process -w my-project --vlm
3gpp-ai workspace process -w my-project --vlm

# Force reprocess with VLM
tdoc-crawler ai workspace process -w my-project --vlm --force
3gpp-ai workspace process -w my-project --vlm --force
```

When `--vlm` is specified, both `enable_picture_description` and `enable_formula_enrichment` are activated.
@@ -226,7 +226,7 @@ ______________________________________________________________________

````bash
# Create a new workspace
tdoc-crawler ai workspace create <name> [--auto-build]
3gpp-ai workspace create <name> [--auto-build]

Options:
- `name`: Workspace name
@@ -234,42 +234,42 @@ Options:

# List all workspaces
# Shows (*) next to the active workspace
tdoc-crawler ai workspace list
3gpp-ai workspace list

# Activate a workspace (sets as default for workspace commands)
tdoc-crawler ai workspace activate <name>
3gpp-ai workspace activate <name>

# Deactivate the active workspace
tdoc-crawler ai workspace deactivate
3gpp-ai workspace deactivate

# Get workspace details (name, status, member counts)
tdoc-crawler ai workspace info <name>
3gpp-ai workspace info <name>

# Remove invalid/inactive members from workspace
tdoc-crawler ai workspace clear-invalid [-w <name>]
3gpp-ai workspace clear-invalid [-w <name>]

# Clear all AI artifacts while preserving members
tdoc-crawler ai workspace clear [-w <name>]
3gpp-ai workspace clear [-w <name>]

# Delete a workspace
tdoc-crawler ai workspace delete <name>
3gpp-ai workspace delete <name>
### Querying

Query the knowledge base using semantic embeddings and knowledge graph (RAG + GraphRAG).

```bash
# Query a specific workspace (single query command)
tdoc-crawler ai workspace query --workspace <workspace_name> "your query here"
3gpp-ai workspace query --workspace <workspace_name> "your query here"
````

Note: Keep `ai workspace query` as the single query interface. The query is a positional argument (no `--query` flag).
Note: Keep `workspace query` as the single query interface. The query is a positional argument (no `--query` flag).

#### Summarize a TDoc

Summarize a single TDoc with specified word count.

```bash
tdoc-crawler ai summarize <tdoc_id> [--words N] [--format markdown|json|yaml] [--json-output]
3gpp-ai summarize <tdoc_id> [--words N] [--format markdown|json|yaml] [--json-output]
```

Options:
@@ -284,7 +284,7 @@ Options:
Convert a single TDoc to markdown format.

```bash
tdoc-crawler ai convert <tdoc_id> [--output FILE.md] [--json-output]
3gpp-ai convert <tdoc_id> [--output FILE.md] [--json-output]
```

Options:
@@ -299,34 +299,34 @@ Add TDocs to workspaces and process them to generate embeddings and knowledge gr

```bash
# Add members to the active workspace
tdoc-crawler ai workspace add-members --kind tdoc S4-251971 S4-251972
3gpp-ai workspace add-members --kind tdoc S4-251971 S4-251972

# Add members to a specific workspace
tdoc-crawler ai workspace add-members -w my-project --kind tdoc S4-251971 S4-251972
3gpp-ai workspace add-members -w my-project --kind tdoc S4-251971 S4-251972

# List members in the active workspace
tdoc-crawler ai workspace list-members
3gpp-ai workspace list-members

# List members including inactive ones
tdoc-crawler ai workspace list-members --include-inactive
3gpp-ai workspace list-members --include-inactive

# Process all TDocs in the active workspace
tdoc-crawler ai workspace process
3gpp-ai workspace process

# Process with options
tdoc-crawler ai workspace process -w my-project --force
3gpp-ai workspace process -w my-project --force

# Process with VLM features (requires GPU)
tdoc-crawler ai workspace process -w my-project --vlm
3gpp-ai workspace process -w my-project --vlm

# Get workspace information with member counts
tdoc-crawler ai workspace info my-project
3gpp-ai workspace info my-project

# Remove invalid members (failed checkouts, etc.)
tdoc-crawler ai workspace clear-invalid -w my-project
3gpp-ai workspace clear-invalid -w my-project

# Clear AI artifacts (keep members, remove embeddings/summaries)
tdoc-crawler ai workspace clear -w my-project
3gpp-ai workspace clear -w my-project
```

______________________________________________________________________
@@ -512,7 +512,7 @@ uv add sentence-transformers
3gpp-ai workspace create my-project
```

**Solution:** Use `ai summarize` or `ai convert` to work with individual TDocs directly. These commands fetch content from configured sources:
**Solution:** Use `summarize` or `convert` to work with individual TDocs directly. These commands fetch content from configured sources:

```bash
3gpp-ai summarize SP-240001
@@ -523,7 +523,7 @@ uv add sentence-transformers

**Problem:** `TDoc 'SP-240001' not found`

**Solution:** Ensure the TDoc exists in your workspace or use `ai summarize`/`ai convert` which fetch from external sources:
**Solution:** Ensure the TDoc exists in your workspace or use `summarize`/`convert` which fetch from external sources:

```bash
3gpp-ai summarize SP-240001 --format markdown
+2 −2
@@ -145,8 +145,8 @@ converter.convert(
`convert-lo` handles format conversion only. Structured AI extraction artifacts are produced by the AI pipeline commands:

```bash
tdoc-crawler ai convert <tdoc_id> --output <file>.md
tdoc-crawler ai workspace process --workspace <workspace_name>
3gpp-ai convert <tdoc_id> --output <file>.md
3gpp-ai workspace process --workspace <workspace_name>
```

When structured extraction is enabled, these AI commands may emit sidecars next to markdown output:
+2 −2
@@ -21,10 +21,10 @@ PQ|- [**Query Documentation**](query.md) – How to search and display stored me

- [**Crawl-Meetings**](crawl.md#crawl-meetings) (`cm`)
- [**Crawl-TDocs**](crawl.md#crawl-tdocs) (`ct`)
- [**Query-TDocs**](query.md#query-tdocs) (`qt`)
- [**Query-TDocs**](query.md#query-tdocs-alias-qt) (`qt`)
- [**Open TDoc**](utils.md#open)
  #KK|- [**Checkout Specs**](utils.md#checkout-spec)
  #TQ|- **AI Commands**
  #TQ|- **AI Commands** (via `3gpp-ai` CLI)
  #KM|- [**AI Workspace**](ai.md#workspace-management) - Create and manage workspaces
  #RD|- [**AI Query**](ai.md#querying) - Semantic search over TDocs
  #YQ|- [**AI Summarize/Convert**](ai.md#single-tdoc-operations) - Single TDoc operations
+14 −14
@@ -4,28 +4,28 @@ Query commands allow you to search and display metadata stored in your local dat

## AI RAG Query

Use a single command for AI-assisted retrieval across text, tables, figures, and equations:
Use a single command for AI-assisted retrieval across text, tables, figures, and equations (requires `3gpp-ai` package):

```bash
tdoc-crawler ai workspace query --workspace <workspace_name> "your query here"
3gpp-ai workspace query --workspace <workspace_name> "your query here"
```

Examples:

```bash
tdoc-crawler ai workspace query --workspace test-rag-elements "What are the bit rates in Table 3?"
tdoc-crawler ai workspace query --workspace test-rag-elements "Describe the architecture figure"
tdoc-crawler ai workspace query --workspace test-rag-elements "What is the throughput equation?"
3gpp-ai workspace query --workspace test-rag-elements "What are the bit rates in Table 3?"
3gpp-ai workspace query --workspace test-rag-elements "Describe the architecture figure"
3gpp-ai workspace query --workspace test-rag-elements "What is the throughput equation?"
```

Notes:

- Keep `ai workspace query` as the single query entrypoint (no separate table/figure/equation query commands).
- Keep `workspace query` as the single query entrypoint (no separate table/figure/equation query commands).
- Retrieval uses enriched chunk content and element-aware metadata when available.

## Commands

### `query-tdocs` (alias: `qt`)
### `query` (TDocs) (alias: `qt`)

Search for TDocs by ID, working group, or date range.

@@ -46,20 +46,20 @@ If you query a TDoc ID that is not in your database, `tdoc-crawler` automaticall

```bash
# Query specific TDoc (auto-fetches if missing)
tdoc-crawler query-tdocs R1-2400001
tdoc-crawler query R1-2400001

# Filter by working group and limit
tdoc-crawler query-tdocs --working-group RAN --limit 20
tdoc-crawler query --working-group RAN --limit 20

# Export results as JSON
tdoc-crawler query-tdocs R1-2400001 --output json
tdoc-crawler query R1-2400001 --output json
```

______________________________________________________________________

### `query-meetings` (alias: `qm`)

Browse meeting schedules and metadata. Unlike `query-tdocs`, this command only searches already-crawled meetings.
Browse meeting schedules and metadata. Unlike `query` (TDocs), this command only searches already-crawled meetings.

**Options:**

@@ -85,7 +85,7 @@ tdoc-crawler query-meetings --output yaml

______________________________________________________________________

### `query-specs` (alias: `qs`)
### `query` (Specs)

Query technical specification metadata from the local catalog.

@@ -101,8 +101,8 @@ Query technical specification metadata from the local catalog.

```bash
# Query specific spec
spec-crawler query-specs 23.501
spec-crawler query 23.501

# Filter by status
spec-crawler query-specs --status "Under change control"
spec-crawler query --status "Under change control"
```
+7 −4
@@ -137,12 +137,15 @@ Optionally use pg0 for PostgreSQL-backed storage.

## CLI Integration

Exposed via `tdoc-crawler` commands:
The `3gpp-ai` package provides its own standalone CLI entrypoint:

```bash
tdoc-crawler ai workspace process
tdoc-crawler ai workspace query "your query"
tdoc-crawler ai workspace status
3gpp-ai workspace process
3gpp-ai workspace query "your query"
3gpp-ai workspace status
3gpp-ai summarize <tdoc_id>
3gpp-ai convert <tdoc_id>
3gpp-ai providers list
```

## Import Guidelines