Commit 3f1064c8 authored by Jan Reimes

fix(tests): resolve ANN001/ANN201/ANN202 and F821 linter issues in tests

parent 3f32e82b
+38 −0
@@ -550,11 +550,49 @@ tdoc-crawler checkout-spec <SPEC_NUMBERS...> [OPTIONS]

Batch download specification documents to the checkout folder. Automatically crawls missing spec metadata before downloading.

**Features:**

- **Progress bar** automatically displayed for multi-spec operations (especially useful with range syntax like `26130-26.140`)
- **Clean error handling** - failed downloads show single-line warnings instead of full tracebacks
- **Safe extraction** - each spec version extracted to its own subfolder to prevent file conflicts
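A minimal Python sketch of the clean-error-handling and safe-extraction behavior listed above. It assumes each downloaded archive is unpacked into a sibling folder named after the zip file, matching the directory structure shown for this command. `extract_to_subfolder` and `extract_all` are hypothetical helpers for illustration, not tdoc-crawler's actual implementation.

```python
import logging
import zipfile
from pathlib import Path

logger = logging.getLogger(__name__)


def extract_to_subfolder(zip_path: Path) -> Path:
    """Unpack e.g. 26131-j00.zip into a sibling folder 26131-j00/."""
    target = zip_path.with_suffix("")  # drop ".zip" to get the folder name
    with zipfile.ZipFile(zip_path) as archive:  # raises BadZipFile for corrupt files
        target.mkdir(parents=True, exist_ok=True)
        archive.extractall(target)
    return target


def extract_all(zip_paths: list[Path]) -> list[Path]:
    """Extract every archive; a failure is logged as one warning line, not a traceback."""
    extracted: list[Path] = []
    for zip_path in zip_paths:
        try:
            extracted.append(extract_to_subfolder(zip_path))
        except (OSError, zipfile.BadZipFile) as exc:
            logger.warning("Skipping %s: %s", zip_path.name, exc)
    return extracted
```

Because every version gets its own folder, two versions of the same spec (for example `26131-i00` and `26131-j00`) never overwrite each other's extracted files.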

**Options:**

| Option | Description |
|--------|-------------|
| `-r, --release RELEASE` | Specify 3GPP release (e.g., `18`) |
| `--doc-only` | Download only Word/PDF (skip zip) |
| `--checkout-dir PATH` | Custom checkout directory (default: `<cache-dir>/checkout`) |

**Directory Structure:**

Downloaded specs are organized hierarchically:

```
checkout/
└── Specs/
    └── archive/
        └── 26_series/
            └── 26.131/
                ├── 26131-j00.zip           # Original zip file
                └── 26131-j00/              # Extracted contents
                    └── 26131-j00.docx
```
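The layout above boils down to a path computation. This Python sketch assumes the spec number is given in dotted form (`26.131`) and that the version code is the suffix used in the file name (`j00`); `spec_checkout_paths` is a hypothetical helper, not part of the CLI.

```python
from pathlib import Path


def spec_checkout_paths(checkout_dir: Path, spec_number: str, version_code: str) -> tuple[Path, Path]:
    """Return (zip_path, extract_dir) for e.g. spec "26.131", version "j00"."""
    series, dot, number = spec_number.partition(".")
    if not dot:
        raise ValueError(f"expected dotted spec number like '26.131', got {spec_number!r}")
    spec_dir = checkout_dir / "Specs" / "archive" / f"{series}_series" / spec_number
    stem = f"{series}{number}-{version_code}"  # 26.131 + j00 -> 26131-j00
    return spec_dir / f"{stem}.zip", spec_dir / stem


zip_path, extract_dir = spec_checkout_paths(Path("checkout"), "26.131", "j00")
print(zip_path.as_posix())     # checkout/Specs/archive/26_series/26.131/26131-j00.zip
print(extract_dir.as_posix())  # checkout/Specs/archive/26_series/26.131/26131-j00
```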

**Examples:**

```bash
# Checkout multiple specs
tdoc-crawler checkout-spec 23.501 38.331

# Checkout spec range (with progress bar)
tdoc-crawler checkout-spec 26130-26.140

# Checkout specific release
tdoc-crawler checkout-spec 23.501 -r 17

# Checkout to custom directory
tdoc-crawler checkout-spec 23.501 --checkout-dir /path/to/docs
```

### `stats`
+16 −5
@@ -18,7 +18,9 @@ The following beads issues have been created with the recommended dependencies t
## Dependency Definitions

### Without Dependencies (Can start in parallel)

These issues can be worked on immediately as they don't depend on other tasks:

- `tdc-lst` - Remove duplicate fetch_tdoc()
- `tdc-oot` - Move normalize_portal_meeting_name()
- `tdc-5uc` - Move resolve_meeting_id()
@@ -26,7 +28,9 @@ These issues can be worked on immediately as they don't depend on other tasks:
- `tdc-72h` - Move database_path()

### With Dependencies

These issues must wait for their dependencies to complete:

- `tdc-n80` - Depends on `tdc-6ts` (download_to_path)
- `tdc-lmu` - Depends on all refactoring tasks (#1-#6)
- `tdc-z37` - Depends on all refactoring tasks (#1-#6)
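The dependency rules above form a small DAG. As a sketch, Python's standard `graphlib.TopologicalSorter` can group the issues into batches whose members may run in parallel. The mapping of "#1-#6" to `tdc-lst`, `tdc-oot`, `tdc-5uc`, `tdc-6ts`, `tdc-n80` and `tdc-72h` is an assumption taken from the work queue and the refactoring plan's issue order.

```python
from graphlib import TopologicalSorter

# Assumed mapping of "refactoring tasks (#1-#6)" to beads issue IDs.
REFACTORING = {"tdc-lst", "tdc-oot", "tdc-5uc", "tdc-6ts", "tdc-n80", "tdc-72h"}

# Each issue maps to the set of issues it must wait for.
DEPENDENCIES: dict[str, set[str]] = {
    "tdc-lst": set(),
    "tdc-oot": set(),
    "tdc-5uc": set(),
    "tdc-6ts": set(),
    "tdc-72h": set(),
    "tdc-n80": {"tdc-6ts"},
    "tdc-lmu": set(REFACTORING),
    "tdc-z37": set(REFACTORING),
}


def phases(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group issues into batches whose dependencies are all satisfied."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(phases(DEPENDENCIES), start=1):
    print(f"Phase {number}: {', '.join(batch)}")
```

The three batches it produces line up with Phase 1 (no dependencies), Phase 2 (`tdc-n80`) and Phase 3 (`tdc-lmu`, `tdc-z37`) of the recommended work queue.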
@@ -34,6 +38,7 @@ These issues must wait for their dependencies to complete:
## Manual Setup Instructions

The `bd` issue tracker's dependency flag syntax is:

```bash
bd create <title> --type <type> --priority <priority> --deps <dependencies>

@@ -44,16 +49,19 @@ bd create "Move prepare_tdoc_file()" --type task --priority 3 --deps tdc-6ts
## Beads Command Reference

Create issue:

```bash
bd create <title> [flags]
```

Close issue:

```bash
bd close <id>
```

Add dependencies to existing issue:

```bash
bd deps add <id> <type>:<dependency-id>
```
@@ -61,17 +69,20 @@ bd deps add <id> <type>:<dependency-id>
## Recommended Work Queue

### Phase 1: Foundation (No dependencies)

1. `tdc-lst` - Fix fetch_tdoc duplication (15 min)
2. `tdc-oot` - Move normalize_portal_meeting_name (15 min)
3. `tdc-5uc` - Move resolve_meeting_id (30 min)
4. `tdc-6ts` - Move download_to_path (15 min)
5. `tdc-72h` - Move database_path (20 min)

### Phase 2: Integration (Depends on Phase 1)

6. `tdc-n80` - Move prepare_tdoc_file (30 min) - depends on `tdc-6ts`

### Phase 3: Documentation & Verification (Depends on all)

7. `tdc-lmu` - Update AGENTS.md (10 min)
8. `tdc-z37` - Run full test suite (10 min)

## Estimated Total Time: ~2.5 hours
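The total is the sequential sum of the per-issue estimates in the work queue. A quick check in Python, with the figures copied from that list:

```python
# Per-issue estimates (minutes) copied from the Recommended Work Queue.
ESTIMATES_MIN = {
    "tdc-lst": 15,
    "tdc-oot": 15,
    "tdc-5uc": 30,
    "tdc-6ts": 15,
    "tdc-72h": 20,
    "tdc-n80": 30,
    "tdc-lmu": 10,
    "tdc-z37": 10,
}

total_min = sum(ESTIMATES_MIN.values())
hours, minutes = divmod(total_min, 60)
print(f"{total_min} min = {hours} h {minutes} min")  # 145 min = 2 h 25 min
```

Working the Phase 1 issues in parallel would shorten wall-clock time, so the ~2.5 hour figure presumably assumes one person working through the queue in order.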
+59 −36
# CLI Refactoring Implementation Plan

## Overview

Refactor `src/tdoc_crawler/cli/` to contain only CLI-specific functionality, moving library functions to the core package. This enables `tdoc_crawler` to be used as a standalone library.

## Phase 1: Fix `fetching.py` Duplication (CRITICAL)

### Issue #1: Remove Duplicate fetch_tdoc() from cli/fetching.py

**Priority:** High
**Complexity:** Low

**Steps:**

1. Remove `fetch_tdoc()` function from `src/tdoc_crawler/cli/fetching.py`
2. Import `fetch_tdoc` from `tdoc_crawler.fetching` at the top of the file
3. Update imports in `src/tdoc_crawler/cli/app.py` if needed
4. Run tests to verify functionality

**Files Changed:**

- `src/tdoc_crawler/cli/fetching.py`
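One way to guard the final "run tests" step against regressions is a small AST-based check that `cli/fetching.py` no longer defines `fetch_tdoc()` and instead imports it from `tdoc_crawler.fetching`. This is a hypothetical sketch; the helper names are illustrative and not part of the project. It inspects source text, so it runs without importing the package.

```python
import ast


def top_level_functions(source: str) -> set[str]:
    """Names of functions defined at module level."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def names_imported_from(source: str, module: str) -> set[str]:
    """Local names bound by absolute `from <module> import ...` statements."""
    tree = ast.parse(source)
    return {
        alias.asname or alias.name
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module == module
        for alias in node.names
    }


def duplicate_problems(source: str, name: str, core_module: str) -> list[str]:
    """List what still differs from 'import <name> from the core module'."""
    problems = []
    if name in top_level_functions(source):
        problems.append(f"{name}() is still defined locally")
    if name not in names_imported_from(source, core_module):
        problems.append(f"{name} is not imported from {core_module}")
    return problems
```

In a test, `source` would be the text of `src/tdoc_crawler/cli/fetching.py`, and an empty list means the duplication is gone.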

---

## Phase 2: Move Library Functions from cli/helpers.py

### Issue #2: Move normalize_portal_meeting_name() to specs/normalization.py

**Priority:** Medium
**Complexity:** Low

**Steps:**

1. Add `normalize_portal_meeting_name()` function to `src/tdoc_crawler/specs/normalization.py`
2. Update import in `src/tdoc_crawler/cli/helpers.py` to import from core
3. Update any other files that import from `cli.helpers`
4. Run tests to verify

**Files Changed:**

- `src/tdoc_crawler/specs/normalization.py`
- `src/tdoc_crawler/cli/helpers.py`

---

### Issue #3: Move resolve_meeting_id() to database module

**Priority:** Medium
**Complexity:** Medium

**Steps:**

1. Add `resolve_meeting_id()` function to `src/tdoc_crawler/database/__init__.py` or a new helper module
2. Update `src/tdoc_crawler/cli/fetching.py` to import from database module
3. Remove function from `src/tdoc_crawler/cli/helpers.py`
4. Run tests to verify

**Files Changed:**

- `src/tdoc_crawler/database/__init__.py` (or new file)
- `src/tdoc_crawler/cli/helpers.py`
- `src/tdoc_crawler/cli/fetching.py`

---

### Issue #4: Move download_to_path() to http_client module

**Priority:** Medium
**Complexity:** Low

**Steps:**

1. Add `download_to_path()` function to `src/tdoc_crawler/http_client.py`
2. Update `src/tdoc_crawler/cli/helpers.py` to import from core
3. Update `src/tdoc_crawler/checkout.py` to import from core (it currently imports this function from `cli.helpers`)
4. Run tests to verify

**Files Changed:**

- `src/tdoc_crawler/http_client.py`
- `src/tdoc_crawler/cli/helpers.py`
- `src/tdoc_crawler/checkout.py`

---

### Issue #5: Move prepare_tdoc_file() to checkout module

**Priority:** Medium
**Complexity:** Medium

**Steps:**

1. Add `prepare_tdoc_file()` function to `src/tdoc_crawler/checkout.py`
2. Update `src/tdoc_crawler/cli/helpers.py` to import from checkout module
3. Update `src/tdoc_crawler/cli/app.py` if needed
4. Run tests to verify

**Files Changed:**

- `src/tdoc_crawler/checkout.py`
- `src/tdoc_crawler/cli/helpers.py`
- `src/tdoc_crawler/cli/app.py`

---

### Issue #6: Move database_path() to database module

**Priority:** Low
**Complexity:** Low

**Steps:**

1. Add `database_path()` function to `src/tdoc_crawler/database/connection.py` or `__init__.py`
2. Update all files importing from `cli.helpers` to import from database module
3. Remove function from `src/tdoc_crawler/cli/helpers.py`
4. Run tests to verify

**Files Changed:**

- `src/tdoc_crawler/database/connection.py`
- `src/tdoc_crawler/cli/helpers.py`
- All files that import `database_path` from `cli.helpers`
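Step 2 needs the full list of modules that import `database_path` from `cli.helpers`, including relative imports such as `from .helpers import database_path` inside the `cli` package. A hypothetical AST-based sketch (not project code), assuming a `src/` layout in which `src/tdoc_crawler/...` maps to the `tdoc_crawler` package:

```python
import ast
from pathlib import Path


def module_name(path: Path, src_root: Path) -> str:
    """Dotted module name, e.g. src/tdoc_crawler/cli/app.py -> tdoc_crawler.cli.app."""
    parts = list(path.relative_to(src_root).with_suffix("").parts)
    if parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


def absolute_module(node: ast.ImportFrom, current: str, is_package: bool) -> str:
    """Resolve a possibly relative `from X import ...` to an absolute module name."""
    if node.level == 0:
        return node.module or ""
    base = current.split(".")
    # In a plain module one leading dot means "my package"; in __init__.py it means "this package".
    drop = node.level if not is_package else node.level - 1
    if drop:
        base = base[:-drop]
    return ".".join(base + ([node.module] if node.module else []))


def find_importers(src_root: Path, module: str, name: str) -> list[Path]:
    """Files under src_root containing `from <module> import <name>` (absolute or relative)."""
    hits: list[Path] = []
    for path in sorted(src_root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        current = module_name(path, src_root)
        is_package = path.name == "__init__.py"
        for node in ast.walk(tree):
            if (
                isinstance(node, ast.ImportFrom)
                and absolute_module(node, current, is_package) == module
                and any(alias.name == name for alias in node.names)
            ):
                hits.append(path)
                break
    return hits
```

Calling `find_importers(Path("src"), "tdoc_crawler.cli.helpers", "database_path")` would list the files to update. It does not catch `from tdoc_crawler.cli import helpers` followed by `helpers.database_path`, so a text search is still worth a second look.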

---

## Phase 3: Final Cleanup

### Issue #7: Update AGENTS.md with Final Classification

**Priority:** Low
**Complexity:** Low

**Steps:**

1. Update `src/tdoc_crawler/cli/AGENTS.md` to reflect completed refactoring
2. Document any remaining functions in `cli/helpers.py` and their classification

---

### Issue #8: Run Full Test Suite

**Priority:** Critical
**Complexity:** Low

**Steps:**

1. Run full test suite: `uv run pytest -v`
2. Verify all tests pass
3. Fix any regressions introduced by refactoring

---

## Dependency Order

1. **Issue #1** (fetch_tdoc duplication) - Can be done independently
2. **Issue #4** (download_to_path) - `checkout.py` depends on it
3. **Issue #5** (prepare_tdoc_file) - Depends on download_to_path
4. **Issue #2** (normalize_portal_meeting_name) - Independent
5. **Issue #3** (resolve_meeting_id) - Independent
6. **Issue #6** (database_path) - Requires first locating every file that imports it from `cli.helpers`
7. **Issue #7** (Update AGENTS.md) - After all refactoring
8. **Issue #8** (Full test suite) - After all refactoring

## Estimated Effort

+1 −1
@@ -17,8 +17,8 @@ from urllib.parse import urlparse

import requests

from tdoc_crawler.http_client import download_to_path
from tdoc_crawler.models import TDocMetadata

logger = logging.getLogger(__name__)

+2 −4
@@ -15,10 +15,10 @@ from dotenv import load_dotenv
from rich.progress import BarColumn, MofNCompleteColumn, Progress, SpinnerColumn, TextColumn
from rich.table import Table

from tdoc_crawler.checkout import checkout_tdoc, prepare_tdoc_file
from tdoc_crawler.crawlers import MeetingCrawler, TDocCrawler
from tdoc_crawler.credentials import set_credentials
from tdoc_crawler.database import TDocDatabase, database_path
from tdoc_crawler.models import MeetingCrawlConfig, MeetingQueryConfig, OutputFormat, QueryConfig, SortOrder, TDocCrawlConfig
from tdoc_crawler.specs import SpecCatalog
from tdoc_crawler.specs.downloads import SpecDownloads
@@ -78,8 +78,6 @@ from .printing import (
    spec_query_to_dict,
    tdoc_to_dict,
)

load_dotenv()
