📝 docs: update test organization guidelines and clarify import patterns (55fc242a) · Commits · Jan Reimes / 3gpp-crawler

src/tdoc_crawler/cli/AGENTS.md

+23 −58

Original line number	Diff line number	Diff line
		# CLI Submodule Guidelines

		This document provides guidelines for development in the `tdoc_crawler.cli` submodule.

		## Design Principle

		The `cli/` submodule should contain only CLI-related functionality. The core `tdoc_crawler` package should be usable as a standalone library without depending on the CLI. Think of `cli/` as an optional extras package (installable as `tdoc_crawler[cli]`).
		The `cli/` submodule contains only CLI-related functionality. The core `tdoc_crawler` package is a standalone library. Think of `cli/` as an optional extras package.

		STRICT RULE: NEVER duplicate logic from the core library in the CLI. If you need functionality that is partially implemented in the CLI but belongs in the core, move it to the core and have the CLI import it.
		STRICT RULE: NEVER duplicate core library logic in CLI. If functionality belongs in the core, move it there and have CLI import it.

		## Classification Rules

		When deciding whether code belongs in `cli/` or the core library, ask:

		### Is this clearly CLI code?
		### CLI Code (belongs in `cli/`)

		- Typer command definitions (`@app.command()`)
		- Typer argument/option types (`Annotated[...]` with `typer.Option/Argument`)
		- Typer argument/option types (`Annotated[...]`)
		- Rich console output and formatting
		- CLI-specific parsing of user input strings
		- System calls for opening files (`os.startfile`, `xdg-open`)
		- CLI-specific input parsing
		- System calls for opening files

		### Is this clearly library code?
		### Library Code (belongs in core)

		- Data models and schemas
		- Database operations
		- HTTP fetching and caching
		- Data normalization and transformation
		- Data normalization/transformation
		- File I/O operations

		### When in doubt...

		Assume `cli/` could be separated as an optional package. If a function would be useful to a Python developer using `tdoc_crawler` as a library, it belongs in the core package.
		Rule of thumb: If useful to a Python developer using `tdoc_crawler` as a library, it belongs in the core.

		## Module Responsibilities

		\| Module \| Responsibility \|
		\|--------\|----------------\|
		\| `app.py` \| Typer command definitions and CLI entry points \|
		\| `args.py` \| Typer Annotated types for arguments and options \|
		\| `console.py` \| Rich Console singleton for CLI output \|
		\| `utils.py` \| Helper functions - check classification rules above \|
		\| `printing.py` \| Table and output formatting for CLI \|
		\| `app.py` \| Typer command definitions \|
		\| `args.py` \| Typer Annotated types \|
		\| `console.py` \| Rich Console singleton \|
		\| `utils.py` \| CLI helpers (check classification rules) \|
		\| `printing.py` \| Table and output formatting \|

		## Lessons Learned

		### Circular Import Resolution

		During the CLI refactoring, a circular import was discovered between `database/` and `specs/` modules:

		```
		database/__init__.py → database/connection.py → specs/query.py → specs/__init__.py → specs/catalog.py → database
		```

		Solution: Created a neutral `models/specs.py` layer for shared types (`SpecQueryFilters`, `SpecQueryResult`). Both `database/` and `specs/` import from `models/` without circularity.

		Key Insight: Circular imports always indicate a structural problem. Never use TYPE_CHECKING or local imports to work around them - refactor the module organization instead.
		Circular import between `database/` and `specs/` was resolved by creating neutral `models/specs.py` layer for shared types. Both modules import from `models/` without circularity.

		### Domain-Oriented Refactoring (Steps 1-14)
		Key Insight: Circular imports indicate structural problems. Never use `TYPE_CHECKING` or local imports as workarounds—refactor module organization instead.

		The project underwent a complete restructuring to eliminate the legacy `crawlers/` package:
		### Domain-Oriented Refactoring

		Before: Mixed orchestration and domain logic in `crawlers/`
		Legacy `crawlers/` package was eliminated:

		Before:
		```python
		from tdoc_crawler.crawlers import TDocCrawler, MeetingCrawler
		from tdoc_crawler.crawlers.whatthespec import resolve_via_whatthespec
		from tdoc_crawler.crawlers.meeting_doclist import fetch_meeting_document_list
		```

		After: Clean domain packages with clear responsibilities

		After:
		```python
		from tdoc_crawler.tdocs import TDocCrawler
		from tdoc_crawler.tdocs.sources.whatthespec import resolve_via_whatthespec
		from tdoc_crawler.tdocs.sources.doclist import fetch_meeting_document_list
		from tdoc_crawler.meetings import MeetingCrawler
		```

		Result: The `crawlers/` folder no longer exists. All functionality lives in domain packages.
		Result: `crawlers/` folder no longer exists. All functionality lives in domain packages.

		### Import Pattern

		The correct import direction is:

		- CLI (`cli/`) → Core (`tdoc_crawler/`)
		- Core submodules (`database/`, `specs/`, etc.) → Common layer (`models/`)

		Never: Core modules importing from CLI modules.

		### Before and After

		Before refactoring:

		- `cli/helpers.py` contained 5 library functions
		- `cli/fetching.py` duplicated `fetch_tdoc()` from core
		- Core modules could not be used independently

		After refactoring:

		- CLI contains only CLI-specific code
		- All library functions in appropriate core modules
		- Core package is a standalone library (installable without CLI extras)
		- Core submodules → Common layer (`models/`)
		- Never: Core importing from CLI

tests/AGENTS.md

+23 −101

Original line number	Diff line number	Diff line
		# Test Organization Guidelines for Coding Assistants

		This document provides guidance for organizing and extending tests in the TDoc-Crawler project.
		# Test Organization Guidelines

		## Quick Start

		1. General TDD patterns: Use the `test-driven-development` skill for pytest, mocking, fixtures, and coverage guidelines
		2. Project-specific patterns: This file covers test organization, pool executor tests, and domain-specific imports
		3. Standard locations: Cache: `./tests/test-cache`, Database: `./tests/test-cache/tdoc_crawler.db`

		## Pool Executor Tests (`tests/pool_executor/`)

		The `pool_executor/` directory contains all tests related to the `pool_executors` package, which provides executor pool extensions including serial execution support.

		### Why a Separate Directory?

		1. Package Isolation: The `pool_executors` package is designed as a standalone package that could be extracted from this repository in the future. Keeping its tests in a dedicated directory mirrors this separation.
		1. General TDD patterns: Use the `test-driven-development` skill for pytest, mocking, fixtures, and coverage
		2. Standard locations: Cache `./tests/test-cache`, DB `./tests/test-cache/tdoc_crawler.db`
		3. Run tests: `uv run pytest -v`

		1. Clear Ownership: Tests for executor functionality are clearly grouped together, making it easier to find and maintain them.
		## Pool Executor Tests

		1. Import Clarity: When working on pool executor tests, imports use the package path `pool_executors.pool_executors` rather than `tdoc_crawler` imports, reinforcing the package's independence.
		The `pool_executor/` directory contains tests for the standalone `pool_executors` package.

		### Test Files

		- test_executor_adapter.py: Tests for the `Runner` class that provides aiointerpreters-compatible API using pool_executors
		- test_serial_executor.py: Comprehensive tests for `SerialPoolExecutor`, `SerialFuture`, and the `create_executor` factory
		- test_executor_adapter.py: Tests for `Runner` class (aiointerpreters-compatible API)
		- test_serial_executor.py: Tests for `SerialPoolExecutor`, `SerialFuture`, `create_executor`

		### Running Pool Executor Tests
		### Coverage Target

		>90% code coverage required for this core component.

		```bash
		# Run all pool executor tests
		uv run pytest tests/pool_executor/ -v

		# Run specific test file
		uv run pytest tests/pool_executor/test_serial_executor.py -v

		# Run with coverage
		uv run pytest tests/pool_executor/ --cov=pool_executors --cov-report=term-missing
		```

		### Coverage Target

		Pool executor tests should maintain >90% code coverage to ensure reliability of this core component.

		## Adding New Tests

		### For Pool Executor Features

		When adding new features to the `pool_executors` package:

		1. Add tests to `tests/pool_executor/test_serial_executor.py` for SerialPoolExecutor changes
		1. Add tests to `tests/pool_executor/test_executor_adapter.py` for Runner adapter changes
		1. Ensure all new tests follow the existing patterns:
		- Use pytest
		- Include docstrings
		- Test both success and failure cases
		- Aim for comprehensive coverage

		### For Other Features

		Follow the existing pattern in the appropriate test file in `tests/`. If adding a substantial new feature, consider creating a new test file in the appropriate location.

		## Test Fixtures

		Shared fixtures are defined in `conftest.py`. Key fixtures include:

		- `test_db_path`: Path to test database
		- `credentials`: Mock PortalCredentials for testing
		- `mock_session`: Mock HTTP session

		## Import Guidelines

		### Pool Executor Tests

		Use package-level imports:

		```python
		from pool_executors.pool_executors import SerialPoolExecutor, create_executor
		```

		### Domain-Specific Imports

		Since the refactoring (Steps 1-14), use domain-specific imports:
		Use domain-specific imports (legacy `crawlers/` package was removed):

		```python
		# TDoc functionality
		from tdoc_crawler.tdocs import TDocCrawler, HybridTDocCrawler
		from tdoc_crawler.tdocs import TDocCrawler
		from tdoc_crawler.tdocs.sources.whatthespec import resolve_via_whatthespec
		from tdoc_crawler.tdocs.sources.doclist import fetch_meeting_document_list

		# Meeting functionality
		from tdoc_crawler.meetings import MeetingCrawler

		# Spec functionality
		from tdoc_crawler.specs import SpecDatabase

		# Constants (patterns, URLs)
		from tdoc_crawler.constants.patterns import EXCLUDED_DIRS, TDOC_PATTERN
		from tdoc_crawler.constants.urls import TDOC_DOWNLOAD_URL

		# Portal client
		from tdoc_crawler.clients.portal import PortalClient, create_portal_client

		# Parsers
		from tdoc_crawler.parsers.portal import parse_tdoc_portal_page
		```

		NEVER use: `from tdoc_crawler.crawlers import ...` (this package was removed)

		## Anti-Duplication in Tests

		### Use Fixtures
		CRITICAL: Do not re-implement or copy logic from `src/` into `tests/`. Tests should import and verify actual implementation. If code is hard to test, refactor it to be more testable.

		Avoid duplicating setup code or data loading in tests. Use `conftest.py` and pytest fixtures.

		### No Library Logic in Tests

		Do not re-implement or copy logic from `src/` into `tests/` for the sake of mocking or testing. Tests should verify the actual implementation by importing it. If the code is hard to test, refactor the code to be more testable rather than duplicating logic.
		Use fixtures from `conftest.py` to avoid duplicating setup code:
		- `test_db_path`: Path to test database
		- `credentials`: Mock PortalCredentials
		- `mock_session`: Mock HTTP session

		## Best Practices

		1. Test Isolation: Each test should be independent and not rely on state from other tests
		1. Descriptive Names: Use clear test method names that describe what is being tested
		1. Minimal Assertions: Each test should verify one specific behavior
		1. Use Fixtures: Leverage shared fixtures to reduce duplication
		1. Mock External Systems: Use mocking to isolate tests from external dependencies
		1. Document Edge Cases: Add tests for edge cases and error conditions

		## Continuous Integration

		Pool executor tests are run as part of the full test suite:

		```bash
		# Full test suite
		uv run pytest tests/ -v

		# Quick smoke test
		uv run pytest tests/pool_executor/test_serial_executor.py::TestSerialFuture::test_serial_future_immediate_execution -v
		```
		- Test isolation: Each test independent
		- Descriptive names: Clear method names describing behavior
		- Minimal assertions: One specific behavior per test
		- Mock external systems: Isolate tests from dependencies
		- Document edge cases: Test error conditions