Commit 6bcbd702 authored by Jan Reimes

docs(README, QUICK_REFERENCE): update documentation for spec crawling and querying

- Add spec catalog feature to README
- Include commands for crawling and querying specifications in QUICK_REFERENCE
- Update command categories and aliases for clarity
parent c0139d05
+19 −3
@@ -14,11 +14,12 @@ A command-line tool for crawling the 3GPP FTP server, caching TDoc metadata in a
## Features

- **Crawl 3GPP FTP Server**: Automatically retrieve TDoc links from RAN, SA, and CT working groups
- **Local SQLite Database**: Store TDoc metadata for fast querying
- **Spec Catalog**: Crawl and query normalized 3GPP Technical Specifications (TS/TR) from multiple sources
- **Local SQLite Database**: Store TDoc and Spec metadata for fast querying
- **Persistent HTTP Caching**: 50-90% faster incremental crawls with automatic request caching
- **Case-Insensitive Queries**: Search for TDocs regardless of case
- **Multiple Output Formats**: Export results as table, JSON, or YAML
- **Incremental Updates**: Only fetch new TDocs on subsequent crawls
- **Incremental Updates**: Only fetch new data on subsequent crawls
- **Rich CLI**: Beautiful terminal output with progress indicators

## Installation
@@ -158,7 +159,22 @@ tdoc-crawler query R1-2301234 --format json --output results.json
tdoc-crawler query --working-group SA --format yaml
```

### 4. View Database Statistics
### 4. Crawl and Query Specifications

Populate and search the spec catalog:

```bash
# Crawl spec metadata from all sources
tdoc-crawler crawl-specs

# Query specific specifications
tdoc-crawler query-specs 23.501 38.331

# Open latest document for a spec
tdoc-crawler open-spec 23.501
```

### 5. View Database Statistics

```bash
tdoc-crawler stats
+136 −3
@@ -10,9 +10,9 @@ TDoc-Crawler provides two categories of commands: **Crawling Commands** for gath

| Category | Commands | Purpose |
|----------|----------|---------|
| **Crawling** | `crawl-tdocs`, `crawl-meetings` | Fetch metadata from 3GPP servers and portal |
| **Querying** | `query-tdocs`, `query-meetings` | Search and display stored metadata |
| **Utilities** | `open`, `checkout`, `stats` | File access and database inspection |
| **Crawling** | `crawl-tdocs`, `crawl-meetings`, `crawl-specs` | Fetch metadata from 3GPP servers and portal |
| **Querying** | `query-tdocs`, `query-meetings`, `query-specs` | Search and display stored metadata |
| **Utilities** | `open`, `checkout`, `open-spec`, `checkout-spec`, `stats` | File access and database inspection |

### Short Aliases

@@ -20,8 +20,12 @@ TDoc-Crawler provides two categories of commands: **Crawling Commands** for gath
|---------|-------|
| `crawl-tdocs` | `ct` |
| `crawl-meetings` | `cm` |
| `crawl-specs` | `cs` |
| `query-tdocs` | `qt` |
| `query-meetings` | `qm` |
| `query-specs` | `qs` |
| `checkout-spec` | `cos` |
| `open-spec` | `os` |

Use aliases for faster typing: `tdoc-crawler ct` instead of `tdoc-crawler crawl-tdocs`.

@@ -66,6 +70,9 @@ tdoc-crawler query-tdocs R1-2400001 S2-2400567
# Query all TDocs from a working group
tdoc-crawler query-tdocs --working-group RAN

# Query technical specifications
tdoc-crawler query-specs 23.501 38.331

# Query meetings
tdoc-crawler query-meetings
```
@@ -217,6 +224,43 @@ tdoc-crawler crawl-meetings --eol-username myuser --eol-password mypass
tdoc-crawler crawl-meetings -w RAN -s R1 -s R2
```

### `crawl-specs` (alias: `cs`)

```bash
tdoc-crawler crawl-specs [OPTIONS]
```

Crawl normalized technical specification (TS/TR) metadata from both 3GPP and community sources.

**When to Use:**

- Populating the specs catalog for searching/viewing specs
- Synchronizing latest spec versions and titles
- Checking for metadata discrepancies between sources

**Options:**

| Option | Description |
|--------|-------------|
| `-c, --cache-dir PATH` | Database location (default: `~/.tdoc-crawler`) |
| `-w, --working-group WG` | Working groups to crawl (repeatable) |
| `-s, --source SOURCE` | Metadata sources to use (`3gpp`, `whatthespec`). Default: both |
| `--full` | Force update of existing records |
| `-v, --verbose` | Enable verbose logging |

**Examples:**

```bash
# Crawl all specs from all sources
tdoc-crawler crawl-specs

# Crawl only RAN specs from whatthespec
tdoc-crawler crawl-specs -w RAN -s whatthespec

# Crawl with verbose logging
tdoc-crawler crawl-specs -v
```

## Query Commands

### `query-tdocs` (alias: `qt`)
@@ -295,6 +339,55 @@ tdoc-crawler query-tdocs -w SA --start-date 2024-01-01 --end-date 2024-06-30
tdoc-crawler query-tdocs R1-2400001 --no-fetch
```

### `query-specs` (alias: `qs`)

```bash
tdoc-crawler query-specs [SPEC_NUMBERS] [OPTIONS]
```

Query technical specification metadata from the local catalog. Supports filtering by number, working group, and status.

**When to Use:**

- Search for specs by number (e.g., `23.501`)
- Verify spec titles or responsible working groups
- Inspect metadata discrepancies across sources

**Arguments:**

| Argument | Description |
|----------|-------------|
| `SPEC_NUMBERS` | Optional spec numbers to query (repeatable). Example: `23.501` |

**Options:**

| Option | Description |
|--------|-------------|
| `-c, --cache-dir PATH` | Database cache location (default: `~/.tdoc-crawler`) |
| `-w, --group WG` | Filter by working group (repeatable) |
| `-s, --status STATUS` | Filter by status (e.g., `Under change control`) |
| `-o, --output FORMAT` | Output format: `table`, `json`, `yaml` (default: `table`) |
| `-v, --verbose` | Show per-source discrepancy details |

**Examples:**

```bash
# Query specific spec
tdoc-crawler query-specs 23.501

# Query multiple specs
tdoc-crawler query-specs 23.501 38.331

# Filter by working group
tdoc-crawler query-specs --group RAN

# Filter by status
tdoc-crawler query-specs --status "Under change control"

# Export as JSON
tdoc-crawler query-specs 23.501 -o json
```
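
The crawl and query steps can be chained in a small wrapper. The following is a minimal sketch, not part of tdoc-crawler: `run`, `refresh_and_query`, and the `DRY_RUN` variable are hypothetical names, and it assumes `tdoc-crawler` is on `PATH`. It uses only the `crawl-specs` and `query-specs` invocations documented in this reference.

```bash
# Sketch only: the function names and the DRY_RUN switch are illustrative
# and are not provided by tdoc-crawler.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Refresh the spec catalog, then show metadata for the given specs as JSON.
refresh_and_query() {
  run tdoc-crawler crawl-specs || return 1
  run tdoc-crawler query-specs "$@" -o json
}
```

Calling `DRY_RUN=1 refresh_and_query 23.501 38.331` prints the two commands instead of executing them, which is a cheap way to check the arguments before starting a real crawl.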

### `query-meetings` (alias: `qm`)

```bash
@@ -424,6 +517,46 @@ tdoc-crawler checkout R1-2400001 --force
tdoc-crawler checkout R1-2400001 --cache-dir /path/to/cache
```

### `open-spec` (alias: `os`)

```bash
tdoc-crawler open-spec <SPEC_NUMBER> [OPTIONS]
```

Download and open the latest document for a specification.

**Options:**

| Option | Description |
|--------|-------------|
| `-r, --release RELEASE` | Specify 3GPP release (e.g., `18`) |
| `--doc-only` | Download only Word/PDF (skip zip) |

**Examples:**

```bash
# Open latest version of 23.501
tdoc-crawler open-spec 23.501

# Open release 17 version
tdoc-crawler open-spec 23.501 -r 17
```

### `checkout-spec` (alias: `cos`)

```bash
tdoc-crawler checkout-spec <SPEC_NUMBERS...> [OPTIONS]
```

Batch download specification documents to the checkout folder.

**Examples:**

```bash
# Checkout multiple specs
tdoc-crawler checkout-spec 23.501 38.331
```
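
Because `checkout-spec` accepts several spec numbers in one call, a list kept in a text file can be passed through in a single invocation. This is a hedged sketch: `checkout_from_list`, the list file format (one spec number per line, `#` for comments), and the `DRY_RUN` variable are assumptions made for illustration and are not features of tdoc-crawler.

```bash
# Sketch only: helper name, file format, and DRY_RUN are hypothetical.
# Reads spec numbers (one per line) from a file and passes them to a single
# checkout-spec call. Blank lines and lines starting with '#' are skipped.
checkout_from_list() {
  list_file=$1
  set --                      # reuse the positional parameters as the spec list
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;
    esac
    set -- "$@" "$line"
  done < "$list_file"

  if [ "$#" -eq 0 ]; then
    echo "no spec numbers found in $list_file" >&2
    return 1
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "tdoc-crawler checkout-spec $*"
  else
    tdoc-crawler checkout-spec "$@"
  fi
}
```

For example, with a file containing `23.501` and `38.331` on separate lines, `DRY_RUN=1 checkout_from_list specs.txt` prints the command that would be run without downloading anything.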

### `stats`

```bash