This document provides guidelines for development in the `tdoc_crawler.cli` submodule.
## Design Principle
The `cli/` submodule should contain **only CLI-related functionality**. The core `tdoc_crawler` package should be usable as a standalone library without depending on the CLI. Think of `cli/` as an optional extras package (installable as `tdoc_crawler[cli]`).
**STRICT RULE: NEVER duplicate logic from the core library in the CLI.** If you need functionality that is partially implemented in the CLI but belongs in the core, move it to the core and have the CLI import it.
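The split described by this rule can be sketched as follows. This is a minimal illustration, not project code: `normalize_tdoc_id` is a hypothetical core helper, and `argparse` from the standard library stands in for Typer so the sketch runs without extra dependencies.

```python
import argparse
from typing import Optional, Sequence


# --- core library (would live in the core package, no CLI imports) ---
def normalize_tdoc_id(raw: str) -> str:
    """Hypothetical core helper: canonicalize a TDoc identifier."""
    return raw.strip().upper()


# --- CLI layer (would live in cli/): argument parsing and printing only ---
def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="tdoc-demo")
    parser.add_argument("tdoc_id", help="TDoc identifier as typed by the user")
    args = parser.parse_args(argv)
    # Delegate to the core helper instead of re-implementing the logic here.
    print(normalize_tdoc_id(args.tdoc_id))
    return 0
```

Calling `main(["s4-251234"])` prints the normalized identifier, while a library user can call `normalize_tdoc_id` directly without importing anything CLI-related.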
## Classification Rules
When deciding whether code belongs in `cli/` or the core library, ask:
### Is this clearly CLI code?
- Typer command definitions (`@app.command()`)
- Typer argument/option types (`Annotated[...]` with `typer.Option/Argument`)
- Rich console output and formatting
- CLI-specific parsing of user input strings
- System calls for opening files (`os.startfile`, `xdg-open`)
### Is this clearly library code?
- Data models and schemas
- Database operations
- HTTP fetching and caching
- Data normalization and transformation
- File I/O operations
### When in doubt...
Assume `cli/` could be separated as an optional package. If a function would be useful to a Python developer using `tdoc_crawler` as a library, it belongs in the core package.
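One way to keep that separation honest is an automated check that core modules never import CLI frameworks. The sketch below is an assumption about how such a check could look, not an existing project test; the set of CLI-only modules and the sample sources are illustrative.

```python
import ast

# Assumption: these are the CLI-only frameworks that the core must not import.
CLI_ONLY_MODULES = {"typer", "rich"}


def find_cli_imports(source: str) -> set[str]:
    """Return the CLI-only top-level modules imported by the given source."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in CLI_ONLY_MODULES:
                found.add(top_level)
    return found


# Illustrative module sources; a real check would read files from the core package.
core_module = "import sqlite3\nfrom pathlib import Path\n"
cli_module = "import typer\nfrom rich.console import Console\n"
```

A test could run `find_cli_imports` over every file outside `cli/` and fail if the result is non-empty.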
A circular import between `database/` and `specs/` was resolved by creating a neutral `models/specs.py` layer for shared types (`SpecQueryFilters`, `SpecQueryResult`). Both `database/` and `specs/` import from `models/` without circularity.
**Key Insight:** Circular imports always indicate a structural problem. Never use `TYPE_CHECKING` or local imports to work around them; refactor the module organization instead.
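The neutral-layer approach can be sketched in a single file, with comments marking where each part would live. Only the type names `SpecQueryFilters` and `SpecQueryResult` come from the project; the fields, the `query_specs` and `titles_for` functions, and the assumption that `specs/` calls into `database/` are hypothetical.

```python
from dataclasses import dataclass


# --- models/specs.py: shared types, imports nothing from database/ or specs/ ---
@dataclass(frozen=True)
class SpecQueryFilters:
    # Field names are illustrative assumptions, not the project's schema.
    spec_numbers: tuple[str, ...] = ()


@dataclass(frozen=True)
class SpecQueryResult:
    spec_number: str
    title: str


# --- database/: depends only on models/ ---
def query_specs(rows: list[SpecQueryResult], filters: SpecQueryFilters) -> list[SpecQueryResult]:
    """Return rows matching the filters; an empty filter matches everything."""
    if not filters.spec_numbers:
        return list(rows)
    return [row for row in rows if row.spec_number in filters.spec_numbers]


# --- specs/: depends on models/ (and, in this sketch, on database/) ---
def titles_for(rows, spec_numbers):
    """Build a filter from plain input and return the matching titles."""
    filters = SpecQueryFilters(spec_numbers=tuple(spec_numbers))
    return [result.title for result in query_specs(rows, filters)]
```

Because the shared dataclasses sit in a layer that imports from neither consumer, the dependency arrows only point toward `models/`, and no import cycle can form.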
### Domain-Oriented Refactoring (Steps 1-14)
The project underwent a complete restructuring to eliminate the legacy `crawlers/` package:
**Before:** Mixed orchestration and domain logic in `crawlers/`
1. **General TDD patterns**: Use the `test-driven-development` skill for pytest, mocking, fixtures, and coverage
2. **Standard locations**: Cache `./tests/test-cache`, DB `./tests/test-cache/tdoc_crawler.db`
3. **Run tests**: `uv run pytest -v`
## Pool Executor Tests
The `pool_executor/` directory contains all tests related to the standalone `pool_executors` package, which provides executor pool extensions including serial execution support.
### Why a Separate Directory?
1. **Package Isolation**: The `pool_executors` package is designed as a standalone package that could be extracted from this repository in the future. Keeping its tests in a dedicated directory mirrors this separation.
1. **Clear Ownership**: Tests for executor functionality are grouped together, making them easier to find and maintain.
1. **Import Clarity**: Pool executor tests import from the package path `pool_executors.pool_executors` rather than from `tdoc_crawler`, reinforcing the package's independence.
### Test Files
- **test_executor_adapter.py**: Tests for the `Runner` class, which provides an aiointerpreters-compatible API using `pool_executors`
- **test_serial_executor.py**: Comprehensive tests for `SerialPoolExecutor`, `SerialFuture`, and the `create_executor` factory
### Running Pool Executor Tests
```bash
# Run all pool executor tests
uv run pytest tests/pool_executor/ -v
# Run specific test file
uv run pytest tests/pool_executor/test_serial_executor.py -v
# Run with coverage
uv run pytest tests/pool_executor/ --cov=pool_executors --cov-report=term-missing
```
### Coverage Target
Pool executor tests should maintain **>90% code coverage** to ensure reliability of this core component.
## Adding New Tests
### For Pool Executor Features
When adding new features to the `pool_executors` package:
1. Add tests to `tests/pool_executor/test_serial_executor.py` for SerialPoolExecutor changes
1. Add tests to `tests/pool_executor/test_executor_adapter.py` for Runner adapter changes
1. Ensure all new tests follow the existing patterns:
   - Use pytest
   - Include docstrings
   - Test both success and failure cases
   - Aim for comprehensive coverage
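The pattern above (docstrings, one success case, one failure case) might look like the following sketch. `DemoSerialExecutor` is a hypothetical stand-in built on `concurrent.futures`; it is not the `SerialPoolExecutor` from `pool_executors`, whose API is not shown here.

```python
from concurrent.futures import Executor, Future


class DemoSerialExecutor(Executor):
    """Hypothetical stand-in that runs each task inline in the calling thread."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:  # surface the error through the future
            future.set_exception(exc)
        return future


def test_submit_returns_result():
    """A successful call resolves the future with the return value."""
    with DemoSerialExecutor() as executor:
        assert executor.submit(pow, 2, 5).result() == 32


def test_submit_captures_exception():
    """A failing call stores the exception instead of raising from submit()."""
    with DemoSerialExecutor() as executor:
        future = executor.submit(int, "not a number")
    assert isinstance(future.exception(), ValueError)
```

Both functions follow pytest's `test_` naming convention, so pytest would collect them without further configuration.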
### For Other Features
Follow the existing pattern in the appropriate test file in `tests/`. If adding a substantial new feature, consider creating a new test file in the appropriate location.
## Test Fixtures
Shared fixtures are defined in `conftest.py`. Key fixtures include:
- `test_db_path`: Path to test database
- `credentials`: Mock `PortalCredentials` for testing
- `mock_session`: Mock HTTP session
**NEVER use:** `from tdoc_crawler.crawlers import ...` (this package was removed)
## Anti-Duplication in Tests
### Use Fixtures
Avoid duplicating setup code or data loading in tests. Use `conftest.py` and pytest fixtures.
### No Library Logic in Tests
**CRITICAL:** Do not re-implement or copy logic from `src/` into `tests/` for the sake of mocking or testing. Tests should verify the actual implementation by importing it. If the code is hard to test, refactor the code to be more testable rather than duplicating logic.
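The rule against copying library logic into tests is easiest to see side by side. In this sketch `normalize_meeting_name` is a hypothetical helper defined inline so the example runs on its own; in a real test it would be imported from `src/`.

```python
# Stand-in for a function that would be imported from src/ in a real test.
def normalize_meeting_name(raw: str) -> str:
    """Hypothetical library helper: collapse whitespace and upper-case."""
    return " ".join(raw.split()).upper()


def test_normalize_meeting_name_bad():
    """Anti-pattern: the expected value repeats the implementation."""
    raw = "  sa4  #130 "
    expected = " ".join(raw.split()).upper()  # copied logic proves nothing
    assert normalize_meeting_name(raw) == expected


def test_normalize_meeting_name_good():
    """Preferred: compare the real function against a literal expectation."""
    assert normalize_meeting_name("  sa4  #130 ") == "SA4 #130"
```

The first test keeps passing even if the implementation and the copied expression are wrong in the same way; the second fails as soon as the behavior changes.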
## Best Practices
1. **Test Isolation**: Each test should be independent and not rely on state from other tests
1. **Descriptive Names**: Use clear test method names that describe what is being tested
1. **Minimal Assertions**: Each test should verify one specific behavior
1. **Use Fixtures**: Leverage shared fixtures to reduce duplication
1. **Mock External Systems**: Use mocking to isolate tests from external dependencies
1. **Document Edge Cases**: Add tests for edge cases and error conditions
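As an illustration of mocking an external system, the sketch below replaces an HTTP session with `unittest.mock.Mock`. The `fetch_status` function and the URL are hypothetical; the point is only that the test never touches the network.

```python
from unittest.mock import Mock


def fetch_status(session, url: str) -> int:
    """Hypothetical code under test: return the HTTP status for a URL."""
    response = session.get(url, timeout=10)
    return response.status_code


def test_fetch_status_uses_session():
    """The mocked session replaces the network; no request leaves the test."""
    session = Mock()
    session.get.return_value.status_code = 200

    assert fetch_status(session, "https://example.invalid/doc") == 200
    session.get.assert_called_once_with("https://example.invalid/doc", timeout=10)
```

The test stays isolated and verifies a single behavior: the function calls the session once with the expected arguments and returns its status code.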
## Continuous Integration
Pool executor tests are run as part of the full test suite:
```bash
# Full test suite
uv run pytest tests/ -v
# Quick smoke test
uv run pytest tests/pool_executor/test_serial_executor.py::TestSerialFuture::test_serial_future_immediate_execution -v