fix(tests): resolve ANN001/ANN201/ANN202 and F821 linter issues in tests (3f1064c8) · Commits · Jan Reimes / 3gpp-crawler

docs/QUICK_REFERENCE.md

+38 −0

Original line number	Diff line number	Diff line
		@@ -550,11 +550,49 @@ tdoc-crawler checkout-spec <SPEC_NUMBERS...> [OPTIONS]

		Batch download specification documents to the checkout folder. Automatically crawls missing spec metadata before downloading.

		Features:

		- Progress bar automatically displayed for multi-spec operations (especially useful with range syntax like `26130-26.140`)
		- Clean error handling - failed downloads show single-line warnings instead of full tracebacks
		- Safe extraction - each spec version extracted to its own subfolder to prevent file conflicts

		Options:

		| Option | Description |
		|--------|-------------|
		| `-r, --release RELEASE` | Specify 3GPP release (e.g., `18`) |
		| `--doc-only` | Download only Word/PDF (skip zip) |
		| `--checkout-dir PATH` | Custom checkout directory (default: `<cache-dir>/checkout`) |

		Directory Structure:

		Downloaded specs are organized hierarchically:

		```
		checkout/
		└── Specs/
		    └── archive/
		        └── 26_series/
		            └── 26.131/
		                ├── 26131-j00.zip      # Original zip file
		                └── 26131-j00/         # Extracted contents
		                    └── 26131-j00.docx
		```
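The layout above can be expressed as a small path helper. This is only a sketch under stated assumptions: the function name `spec_checkout_paths`, its parameters, and the rule that the series folder is the part of the spec number before the dot are inferred from the single example tree, not taken from the tdoc-crawler source.

```python
from pathlib import Path


def spec_checkout_paths(checkout_dir: Path, spec_number: str, version_code: str) -> tuple[Path, Path]:
    """Return (zip_path, extract_dir) for one spec version.

    Hypothetical helper that mirrors the example tree: spec "26.131" with
    version code "j00" lands under Specs/archive/26_series/26.131/.
    """
    series = spec_number.split(".")[0]      # "26.131" -> "26"
    compact = spec_number.replace(".", "")  # "26.131" -> "26131"
    stem = f"{compact}-{version_code}"      # "26131-j00"
    spec_dir = checkout_dir / "Specs" / "archive" / f"{series}_series" / spec_number
    # Each version gets its own subfolder, so extracting two versions of
    # the same spec cannot overwrite each other's files.
    return spec_dir / f"{stem}.zip", spec_dir / stem


zip_path, extract_dir = spec_checkout_paths(Path("checkout"), "26.131", "j00")
print(zip_path.as_posix())
print(extract_dir.as_posix())
```

Keeping the zip and its extraction folder side by side under the same stem makes it easy to tell which archive produced which files.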

		Examples:

		```bash
		# Checkout multiple specs
		tdoc-crawler checkout-spec 23.501 38.331

		# Checkout spec range (with progress bar)
		tdoc-crawler checkout-spec 26130-26.140

		# Checkout specific release
		tdoc-crawler checkout-spec 23.501 -r 17

		# Checkout to custom directory
		tdoc-crawler checkout-spec 23.501 --checkout-dir /path/to/docs
		```

		### `stats`

docs/history/2025-11-01_SUMMARY_01_CLI_REFACTORING_BEADS_DEPENDENCIES.md

+16 −5

		@@ -18,7 +18,9 @@ The following beads issues have been created with the recommended dependencies t
		## Dependency Definitions

		### Without Dependencies (Can start in parallel)

		These issues can be worked on immediately as they don't depend on other tasks:

		- `tdc-lst` - Remove duplicate fetch_tdoc()
		- `tdc-oot` - Move normalize_portal_meeting_name()
		- `tdc-5uc` - Move resolve_meeting_id()
		@@ -26,7 +28,9 @@ These issues can be worked on immediately as they don't depend on other tasks:
		- `tdc-72h` - Move database_path()

		### With Dependencies

		These issues must wait for their dependencies to complete:

		- `tdc-n80` - Depends on `tdc-6ts` (download_to_path)
		- `tdc-lmu` - Depends on all refactoring tasks (#1-#6)
		- `tdc-z37` - Depends on all refactoring tasks (#1-#6)
		@@ -34,6 +38,7 @@ These issues must wait for their dependencies to complete:
		## Manual Setup Instructions

		The `bd` issue tracker's dependency flag syntax is:

		```bash
		bd create <title> --type <type> --priority <priority> --deps <dependencies>

		@@ -44,16 +49,19 @@ bd create "Move prepare_tdoc_file()" --type task --priority 3 --deps tdc-6ts
		## Beads Command Reference

		Create issue:

		```bash
		bd create <title> [flags]
		```

		Close issue:

		```bash
		bd close <id>
		```

		Add dependencies to existing issue:

		```bash
		bd deps add <id> <type>:<dependency-id>
		```
		@@ -61,17 +69,20 @@ bd deps add <id> <type>:<dependency-id>
		## Recommended Work Queue

		### Phase 1: Foundation (No dependencies)

		1. `tdc-lst` - Fix fetch_tdoc duplication (15 min)
-		2. `tdc-oot` - Move normalize_portal_meeting_name (15 min)
-		3. `tdc-5uc` - Move resolve_meeting_id (30 min)
-		4. `tdc-6ts` - Move download_to_path (15 min)
-		5. `tdc-72h` - Move database_path (20 min)
+		1. `tdc-oot` - Move normalize_portal_meeting_name (15 min)
+		1. `tdc-5uc` - Move resolve_meeting_id (30 min)
+		1. `tdc-6ts` - Move download_to_path (15 min)
+		1. `tdc-72h` - Move database_path (20 min)

		### Phase 2: Integration (Depends on Phase 1)

		6. `tdc-n80` - Move prepare_tdoc_file (30 min) - depends on `tdc-6ts`

		### Phase 3: Documentation & Verification (Depends on all)

		7. `tdc-lmu` - Update AGENTS.md (10 min)
-		8. `tdc-z37` - Run full test suite (10 min)
+		1. `tdc-z37` - Run full test suite (10 min)

		## Estimated Total Time: ~2.5 hours
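The work queue above is effectively a topological ordering of the issue graph. As a rough cross-check, the sketch below feeds the stated dependencies to the standard-library `graphlib.TopologicalSorter` and sums the per-issue estimates. The mapping of "all refactoring tasks (#1-#6)" to the six `tdc-*` ids is my reading of the lists in this file, not something the tracker output confirms.

```python
from graphlib import TopologicalSorter

# The six refactoring issues, assumed to be what "#1-#6" refers to.
refactoring = ["tdc-lst", "tdc-oot", "tdc-5uc", "tdc-6ts", "tdc-72h", "tdc-n80"]

# Issue id -> set of issue ids that must be finished first.
deps: dict[str, set[str]] = {issue: set() for issue in refactoring}
deps["tdc-n80"] = {"tdc-6ts"}       # prepare_tdoc_file needs download_to_path
deps["tdc-lmu"] = set(refactoring)  # update AGENTS.md after all refactoring
deps["tdc-z37"] = set(refactoring)  # run the full test suite after all refactoring

# Time estimates in minutes, copied from the work queue.
minutes = {
    "tdc-lst": 15, "tdc-oot": 15, "tdc-5uc": 30, "tdc-6ts": 15,
    "tdc-72h": 20, "tdc-n80": 30, "tdc-lmu": 10, "tdc-z37": 10,
}

order = list(TopologicalSorter(deps).static_order())
total_minutes = sum(minutes.values())

print(order)
print(f"{total_minutes} min total")
```

The estimates add up to 145 minutes, which is consistent with the "~2.5 hours" heading. `static_order()` may return any valid ordering, so the exact sequence can differ from the phases listed above while still respecting every dependency.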

docs/history/2025-11-01_SUMMARY_02_CLI_REFACTORING_PLAN.md

+59 −36

		# CLI Refactoring Implementation Plan

		## Overview

		Refactor `src/tdoc_crawler/cli/` to contain only CLI-specific functionality, moving library functions to the core package. This enables `tdoc_crawler` to be used as a standalone library.

		## Phase 1: Fix Fetching.py Duplication (CRITICAL)

		### Issue #1: Remove Duplicate fetch_tdoc() from cli/fetching.py

		Priority: High
		Complexity: Low

		Steps:

		1. Remove `fetch_tdoc()` function from `src/tdoc_crawler/cli/fetching.py`
-		2. Import `fetch_tdoc` from `tdoc_crawler.fetching` at the top of the file
-		3. Update imports in `src/tdoc_crawler/cli/app.py` if needed
-		4. Run tests to verify functionality
+		1. Import `fetch_tdoc` from `tdoc_crawler.fetching` at the top of the file
+		1. Update imports in `src/tdoc_crawler/cli/app.py` if needed
+		1. Run tests to verify functionality

		Files Changed:

		- `src/tdoc_crawler/cli/fetching.py`
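The steps above amount to keeping one definition in the core package and importing it into the CLI module. The sketch below illustrates that idea with throwaway stand-in modules so it runs without the real package. The module names `core_fetching` and `cli_fetching` and the body of `fetch_tdoc` are placeholders; the real function's signature is not shown in this plan.

```python
import sys
import types

# Stand-in for the core module (plays the role of tdoc_crawler.fetching).
core = types.ModuleType("core_fetching")


def fetch_tdoc(tdoc_id: str) -> str:
    """Placeholder implementation, used only for this illustration."""
    return f"fetched {tdoc_id}"


core.fetch_tdoc = fetch_tdoc
sys.modules["core_fetching"] = core

# Stand-in for the CLI module: no local definition, only an import from core.
cli = types.ModuleType("cli_fetching")
exec("from core_fetching import fetch_tdoc", cli.__dict__)

# Both names refer to the same function object, so there is a single
# implementation to maintain and existing CLI call sites keep working.
print(cli.fetch_tdoc is core.fetch_tdoc)
```

In the real code base the `exec` line would simply be a normal `from tdoc_crawler.fetching import fetch_tdoc` statement at the top of the CLI module.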

-		---
+		______________________________________________________________________

		## Phase 2: Move Library Functions from cli/helpers.py

		### Issue #2: Move normalize_portal_meeting_name() to specs/normalization.py

		Priority: Medium
		Complexity: Low

		Steps:

		1. Add `normalize_portal_meeting_name()` function to `src/tdoc_crawler/specs/normalization.py`
-		2. Update import in `src/tdoc_crawler/cli/helpers.py` to import from core
-		3. Update any other files that import from `cli.helpers`
-		4. Run tests to verify
+		1. Update import in `src/tdoc_crawler/cli/helpers.py` to import from core
+		1. Update any other files that import from `cli.helpers`
+		1. Run tests to verify

		Files Changed:

		- `src/tdoc_crawler/specs/normalization.py`
		- `src/tdoc_crawler/cli/helpers.py`

-		---
+		______________________________________________________________________

		### Issue #3: Move resolve_meeting_id() to database module

		Priority: Medium
		Complexity: Medium

		Steps:

		1. Add `resolve_meeting_id()` function to `src/tdoc_crawler/database/__init__.py` or a new helper module
-		2. Update `src/tdoc_crawler/cli/fetching.py` to import from database module
-		3. Remove function from `src/tdoc_crawler/cli/helpers.py`
-		4. Run tests to verify
+		1. Update `src/tdoc_crawler/cli/fetching.py` to import from database module
+		1. Remove function from `src/tdoc_crawler/cli/helpers.py`
+		1. Run tests to verify

		Files Changed:

		- `src/tdoc_crawler/database/__init__.py` (or new file)
		- `src/tdoc_crawler/cli/helpers.py`
		- `src/tdoc_crawler/cli/fetching.py`

-		---
+		______________________________________________________________________

		### Issue #4: Move download_to_path() to http_client module

		Priority: Medium
		Complexity: Low

		Steps:

		1. Add `download_to_path()` function to `src/tdoc_crawler/http_client.py`
-		2. Update `src/tdoc_crawler/cli/helpers.py` to import from core
-		3. Update `src/tdoc_crawler/checkout.py` to import from core (it already imports from cli.helpers for this function)
-		4. Run tests to verify
+		1. Update `src/tdoc_crawler/cli/helpers.py` to import from core
+		1. Update `src/tdoc_crawler/checkout.py` to import from core (it already imports from cli.helpers for this function)
+		1. Run tests to verify

		Files Changed:

		- `src/tdoc_crawler/http_client.py`
		- `src/tdoc_crawler/cli/helpers.py`
		- `src/tdoc_crawler/checkout.py`

-		---
+		______________________________________________________________________

		### Issue #5: Move prepare_tdoc_file() to checkout module

		Priority: Medium
		Complexity: Medium

		Steps:

		1. Add `prepare_tdoc_file()` function to `src/tdoc_crawler/checkout.py`
-		2. Update `src/tdoc_crawler/cli/helpers.py` to import from checkout module
-		3. Update `src/tdoc_crawler/cli/app.py` if needed
-		4. Run tests to verify
+		1. Update `src/tdoc_crawler/cli/helpers.py` to import from checkout module
+		1. Update `src/tdoc_crawler/cli/app.py` if needed
+		1. Run tests to verify

		Files Changed:

		- `src/tdoc_crawler/checkout.py`
		- `src/tdoc_crawler/cli/helpers.py`
		- `src/tdoc_crawler/cli/app.py`

-		---
+		______________________________________________________________________

		### Issue #6: Move database_path() to database module

		Priority: Low
		Complexity: Low

		Steps:

		1. Add `database_path()` function to `src/tdoc_crawler/database/connection.py` or `__init__.py`
-		2. Update all files importing from `cli.helpers` to import from database module
-		3. Remove function from `src/tdoc_crawler/cli/helpers.py`
-		4. Run tests to verify
+		1. Update all files importing from `cli.helpers` to import from database module
+		1. Remove function from `src/tdoc_crawler/cli/helpers.py`
+		1. Run tests to verify

		Files Changed:

		- `src/tdoc_crawler/database/connection.py`
		- `src/tdoc_crawler/cli/helpers.py`
		- All files that import `database_path` from `cli.helpers`

-		---
+		______________________________________________________________________

		## Phase 3: Final Cleanup

		### Issue #7: Update AGENTS.md with Final Classification

		Priority: Low
		Complexity: Low

		Steps:

		1. Update `src/tdoc_crawler/cli/AGENTS.md` to reflect completed refactoring
-		2. Document any remaining functions in `cli/helpers.py` and their classification
+		1. Document any remaining functions in `cli/helpers.py` and their classification

-		---
+		______________________________________________________________________

		### Issue #8: Run Full Test Suite

		Priority: Critical
		Complexity: Low

		Steps:

		1. Run full test suite: `uv run pytest -v`
-		2. Verify all tests pass
-		3. Fix any regressions introduced by refactoring
+		1. Verify all tests pass
+		1. Fix any regressions introduced by refactoring

-		---
+		______________________________________________________________________

		## Dependency Order

		1. Issue #1 (fetch_tdoc duplication) - Can be done independently
-		2. Issue #4 (download_to_path) - Checkout.py depends on it
-		3. Issue #5 (prepare_tdoc_file) - Depends on download_to_path
-		4. Issue #2 (normalize_portal_meeting_name) - Independent
-		5. Issue #3 (resolve_meeting_id) - Independent
-		6. Issue #6 (database_path) - Depends on understanding all imports
-		7. Issue #7 (Update AGENTS.md) - After all refactoring
-		8. Issue #8 (Full test suite) - After all refactoring
+		1. Issue #4 (download_to_path) - Checkout.py depends on it
+		1. Issue #5 (prepare_tdoc_file) - Depends on download_to_path
+		1. Issue #2 (normalize_portal_meeting_name) - Independent
+		1. Issue #3 (resolve_meeting_id) - Independent
+		1. Issue #6 (database_path) - Depends on understanding all imports
+		1. Issue #7 (Update AGENTS.md) - After all refactoring
+		1. Issue #8 (Full test suite) - After all refactoring

		## Estimated Effort

src/tdoc_crawler/checkout.py

+1 −1

		@@ -17,8 +17,8 @@ from urllib.parse import urlparse

		import requests

-		from tdoc_crawler.models import TDocMetadata
		from tdoc_crawler.http_client import download_to_path
+		from tdoc_crawler.models import TDocMetadata

		logger = logging.getLogger(__name__)

src/tdoc_crawler/cli/app.py

+2 −4

		@@ -15,10 +15,10 @@ from dotenv import load_dotenv
		from rich.progress import BarColumn, MofNCompleteColumn, Progress, SpinnerColumn, TextColumn
		from rich.table import Table

-		from tdoc_crawler.checkout import checkout_tdoc
+		from tdoc_crawler.checkout import checkout_tdoc, prepare_tdoc_file
		from tdoc_crawler.crawlers import MeetingCrawler, TDocCrawler
		from tdoc_crawler.credentials import set_credentials
-		from tdoc_crawler.database import TDocDatabase
+		from tdoc_crawler.database import TDocDatabase, database_path
		from tdoc_crawler.models import MeetingCrawlConfig, MeetingQueryConfig, OutputFormat, QueryConfig, SortOrder, TDocCrawlConfig
		from tdoc_crawler.specs import SpecCatalog
		from tdoc_crawler.specs.downloads import SpecDownloads
		@@ -78,8 +78,6 @@ from .printing import (
		spec_query_to_dict,
		tdoc_to_dict,
		)
-		from tdoc_crawler.checkout import prepare_tdoc_file
-		from tdoc_crawler.database import database_path

		load_dotenv()