docs(cli): add guidelines for CLI submodule development (187f46cd) · Commits · Jan Reimes / 3gpp-crawler

AGENTS.md

+69 −5

		@@ -243,8 +243,72 @@ The project maintains three levels of documentation:
		- `docs/QUICK_REFERENCE.md` MUST always be up to date and reflect the current state of ALL commands
		- When adding or modifying commands, BOTH the history file AND `docs/QUICK_REFERENCE.md` must be updated

		## Reviews of AGENTS.md

		After several implementation steps, the present file (`AGENTS.md`) might need an update. When explicitly asked, use/update the file `docs/REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md` for that purpose.

		The actual update of `AGENTS.md` will be done only after explicit user confirmation.
		## AGENTS.md File Design Guidelines

		Each AGENTS.md file serves as long-term memory for the project/submodule, providing coding assistants with essential guidelines, conventions, and context. Below are the principles for designing, writing, and updating these documents.

		### Purpose and Scope

		| AGENTS.md Location | Purpose |
		|--------------------|---------|
		| Root `AGENTS.md` | Project-wide guidelines, quick start, coding standards, tools and skills |
		| `tests/AGENTS.md` | Test organization, fixtures, patterns, and best practices |
		| `src/tdoc_crawler/cli/AGENTS.md` | CLI submodule guidelines, separation of concerns, architecture decisions |

		### Design Principles

		1. Living Documentation, Not Plans
		- AGENTS.md should capture established patterns and decisions, not active checklists
		- Completed work belongs in git history, not in AGENTS.md checklists
		- Use AGENTS.md to document how to work, not what needs to be done

		2. Information Hierarchy
		- Quick Start (5-10 lines): What to read first, key files to examine
		- Conventions: Coding standards, patterns, and style guidelines
		- Reference Material: Detailed guidance for specific domains
		- Lessons Learned: Post-mortem insights to avoid repeating mistakes

		3. What to Include
		- Project structure and module responsibilities
		- Coding conventions and style guidelines
		- Import patterns and dependency rules
		- Tool preferences and usage patterns
		- Domain-specific knowledge (e.g., 3GPP terminology)
		- Lessons learned from refactoring sessions

		4. What NOT to Include
		- Checklists of completed items (belongs in git commit messages)
		- Active TODO lists (use beads issues instead)
		- Step-by-step implementation plans (use Prometheus/momus for planning)
		- Temporary debugging notes or workarounds

		5. Updates
		- After refactoring sessions, update AGENTS.md with architectural insights
		- When patterns emerge, document them for future reference
		- When anti-patterns are discovered, add warnings to prevent recurrence
		- When explicitly requested, review and update using `docs/REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md`

		### CLI Submodule Example

		For `src/tdoc_crawler/cli/AGENTS.md`, the focus should be on:

		- Separation of Concerns: What belongs in CLI vs. core library
		- Classification Rules: How to distinguish CLI code from library code
		- Module Responsibilities: What each CLI module should contain
		- Import Patterns: Ensuring CLI can import from core, but not vice versa
		- Lessons Learned: Architectural insights from refactoring (e.g., circular import resolution)

		### Writing Style

		- Be concise and factual
		- Use consistent formatting throughout
		- Include code examples where helpful
		- Cross-reference related sections when appropriate
		- Update proactively when new patterns emerge

		### Review and Updates

		After several implementation steps, AGENTS.md files might need updates. When explicitly asked:
		1. Review the current state using `docs/REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md`
		2. Propose changes based on accumulated knowledge
		3. Update only after explicit user confirmation

src/tdoc_crawler/cli/AGENTS.md

0 → 100644

+113 −0

		# CLI Submodule Guidelines

		This document provides guidelines for development in the `tdoc_crawler.cli` submodule.

		## Design Principle

		The `cli/` submodule should contain only CLI-related functionality. The core `tdoc_crawler` package should be usable as a standalone library without depending on the CLI. Think of `cli/` as an optional extras package (installable as `tdoc_crawler[cli]`).
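
One way to keep the CLI optional at runtime is to check for its dependencies before importing anything from `cli/`. The sketch below is only an illustration: the helper name `missing_cli_dependencies` is hypothetical, and the assumption that the extras consist of `typer` and `rich` is inferred from the tools named in this document, not from the project's packaging metadata.

```python
from importlib.util import find_spec


def missing_cli_dependencies(required: tuple[str, ...] = ("typer", "rich")) -> list[str]:
    """Return the top-level packages from `required` that cannot be imported.

    The default tuple is an assumption about what the CLI extras contain.
    Only top-level package names are supported here.
    """
    return [name for name in required if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_cli_dependencies()
    if missing:
        print(f"CLI extras not installed, missing: {', '.join(missing)}")
    else:
        print("CLI extras available")
```

A check like this would sit at the CLI entry point only, so that importing the core package never touches CLI dependencies.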

		## Classification Rules

		When deciding whether code belongs in `cli/` or the core library, ask:

		### Is this clearly CLI code?

		- Typer command definitions (`@app.command()`)
		- Typer argument/option types (`Annotated[...]` with `typer.Option/Argument`)
		- Rich console output and formatting
		- CLI-specific parsing of user input strings
		- System calls for opening files (`os.startfile`, `xdg-open`)

		### Is this clearly library code?

		- Data models and schemas
		- Database operations
		- HTTP fetching and caching
		- Data normalization and transformation
		- File I/O operations

		### When in doubt...

		Assume `cli/` could be separated as an optional package. If a function would be useful to a Python developer using `tdoc_crawler` as a library, it belongs in the core package.
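
The rule above can be sketched as a thin CLI wrapper around a library function. All names here (`select_by_prefix`, `split_cli_list`, `render_selection`) are hypothetical and do not exist in the project; the point is only where each kind of logic would live.

```python
# Library side (would live in the core package): typed values in, data out, no printing.
def select_by_prefix(tdoc_ids: list[str], prefixes: list[str]) -> list[str]:
    """Return the IDs that start with any of the given prefixes (case-insensitive)."""
    wanted = tuple(p.upper() for p in prefixes)
    return [i for i in tdoc_ids if i.upper().startswith(wanted)]


# CLI side (would live in cli/): raw user strings in, display text out.
def split_cli_list(raw: str) -> list[str]:
    """Parse a comma-separated option value such as "R1, s4" into clean tokens."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def render_selection(raw_prefixes: str, tdoc_ids: list[str]) -> str:
    """Thin wrapper: parse the user's string, delegate to the library, format the result."""
    selected = select_by_prefix(tdoc_ids, split_cli_list(raw_prefixes))
    return "\n".join(selected) if selected else "no matches"


if __name__ == "__main__":
    print(render_selection("R1, s4", ["R1-2400001", "S4-240002", "C1-240003"]))
```

A Python developer using the package as a library would call `select_by_prefix` directly with a list; only the string parsing and text formatting are specific to the command line.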

		## Module Responsibilities

		| Module | Responsibility |
		|--------|----------------|
		| `app.py` | Typer command definitions and CLI entry points |
		| `args.py` | Typer Annotated types for arguments and options |
		| `console.py` | Rich Console singleton for CLI output |
		| `helpers.py` | Helper functions - check Classification Rules above |
		| `fetching.py` | TDoc fetching - check Classification Rules above |
		| `printing.py` | Table and output formatting for CLI |

		## Function Classification

		### `helpers.py`

		CLI Functions (stay in cli/):

		- `parse_working_groups()` - CLI argument parsing
		- `parse_subgroups()` - CLI argument parsing
		- `collect_spec_numbers()` - CLI stdin/file input handling
		- `build_limits()` - CLI config builder wrapper
		- `launch_file()` - System calls for opening files
		- `resolve_http_cache_config()` - CLI/env var configuration parsing
		- `infer_working_groups_from_ids()` - CLI string inference

		Library Functions (moved to core):

		- `normalize_portal_meeting_name()` → `tdoc_crawler.specs.normalization`
		- `resolve_meeting_id()` → `tdoc_crawler.database`
		- `download_to_path()` → `tdoc_crawler.http_client`
		- `prepare_tdoc_file()` → `tdoc_crawler.checkout`
		- `database_path()` → `tdoc_crawler.database`

		### `fetching.py`

		CLI Functions (stay in cli/):

		- `fetch_missing_tdocs()` - Uses CLI console output
		- `_fetch_via_whatthespec()` - Uses CLI console output
		- `maybe_fetch_missing_tdocs()` - CLI console and flag handling

		Library Functions:

		- `fetch_tdoc()` - Import from `tdoc_crawler.fetching` (not duplicated in CLI)

		## Lessons Learned

		### Circular Import Resolution

		During the CLI refactoring, a circular import was discovered between the `database/` and `specs/` modules:

		```
		database/__init__.py → database/connection.py → specs/query.py → specs/__init__.py → specs/catalog.py → database
		```

		Solution: Created a neutral `models/specs.py` layer for shared types (`SpecQueryFilters`, `SpecQueryResult`). Both `database/` and `specs/` import from `models/` without circularity.

		Key Insight: Circular imports always indicate a structural problem. Never use `TYPE_CHECKING` guards or function-local imports to work around them; refactor the module organization instead.
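
The neutral-layer approach can be sketched in a single file, with comments marking where each part would live. The type names `SpecQueryFilters` and `SpecQueryResult` come from the text above; their fields and the functions `query_specs` and `titles_for` are illustrative assumptions, not the project's actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass


# --- models/specs.py: shared types only; imports nothing from database/ or specs/ ---
@dataclass(frozen=True)
class SpecQueryFilters:
    # Field name is an assumption made for this sketch.
    spec_numbers: tuple[str, ...] = ()


@dataclass(frozen=True)
class SpecQueryResult:
    # Field names are assumptions made for this sketch.
    spec_number: str
    title: str


# --- database/: depends on models/ only ---
def query_specs(rows: dict[str, str], filters: SpecQueryFilters) -> list[SpecQueryResult]:
    """Hypothetical stand-in for a database query over {spec_number: title}."""
    return [
        SpecQueryResult(number, title)
        for number, title in rows.items()
        if not filters.spec_numbers or number in filters.spec_numbers
    ]


# --- specs/: depends on models/ and may call into database/, never the reverse ---
def titles_for(rows: dict[str, str], numbers: list[str]) -> list[str]:
    filters = SpecQueryFilters(spec_numbers=tuple(numbers))
    return [result.title for result in query_specs(rows, filters)]
```

Because the shared dataclasses have no dependencies of their own, both sides can import them without re-creating the cycle shown above.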

		### Import Pattern

		The correct import direction is:

		- CLI (`cli/`) → Core (`tdoc_crawler/`)
		- Core submodules (`database/`, `specs/`, etc.) → Common layer (`models/`)

		Never: Core modules importing from CLI modules.
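
The import direction could be checked mechanically, for example in a test. The following is a minimal sketch using the standard-library `ast` module; the function name `imports_from_cli` is hypothetical, and relative imports are deliberately not resolved.

```python
import ast


def imports_from_cli(source: str, cli_package: str = "tdoc_crawler.cli") -> list[str]:
    """Return the imported names in `source` that refer to `cli_package` or its submodules.

    Limitation: relative imports (``from . import x``) are skipped.
    """
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Cover both `from tdoc_crawler.cli import x` and `from tdoc_crawler import cli`.
            names = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        found += [n for n in names if n == cli_package or n.startswith(cli_package + ".")]
    return found


if __name__ == "__main__":
    core_source = "from tdoc_crawler.models import specs\nfrom tdoc_crawler.cli import console\n"
    print(imports_from_cli(core_source))
```

Running such a check over the source of every core module and asserting an empty result would flag a violation as soon as it is introduced.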

		### Before and After

		Before refactoring:

		- `cli/helpers.py` contained 5 library functions
		- `cli/fetching.py` duplicated `fetch_tdoc()` from core
		- Core modules could not be used independently

		After refactoring:

		- CLI contains only CLI-specific code
		- All library functions live in the appropriate core modules
		- Core package is a standalone library (installable without CLI extras)