Commit 187f46cd authored by Jan Reimes

docs(cli): add guidelines for CLI submodule development

- Define design principles for CLI-related functionality.
- Establish classification rules for code organization.
- Outline module responsibilities and function classification.
- Document lessons learned from refactoring, including circular import resolution.
parent c1749592
+69 −5
@@ -243,8 +243,72 @@ The project maintains three levels of documentation:
- `docs/QUICK_REFERENCE.md` **MUST** always be up to date and reflect the current state of ALL commands
- When adding or modifying commands, **BOTH** the history file AND `docs/QUICK_REFERENCE.md` must be updated

## Reviews of AGENTS.md

After several implementation steps, the present file (`AGENTS.md`) might need an update. When explicitly asked, use/update the file `docs/REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md` for that purpose.

The actual update of `AGENTS.md` will be done only after explicit user confirmation.
## AGENTS.md File Design Guidelines

Each AGENTS.md file serves as long-term memory for the project/submodule, providing coding assistants with essential guidelines, conventions, and context. Below are the principles for designing, writing, and updating these documents.

### Purpose and Scope

| AGENTS.md Location | Purpose |
|--------------------|---------|
| Root `AGENTS.md` | Project-wide guidelines, quick start, coding standards, tools and skills |
| `tests/AGENTS.md` | Test organization, fixtures, patterns, and best practices |
| `src/tdoc_crawler/cli/AGENTS.md` | CLI submodule guidelines, separation of concerns, architecture decisions |

### Design Principles

1. **Living Documentation, Not Plans**
   - AGENTS.md should capture established patterns and decisions, not active checklists
   - Completed work belongs in git history, not in AGENTS.md checklists
   - Use AGENTS.md to document *how* to work, not *what* needs to be done

2. **Information Hierarchy**
   - **Quick Start** (5-10 lines): What to read first, key files to examine
   - **Conventions**: Coding standards, patterns, and style guidelines
   - **Reference Material**: Detailed guidance for specific domains
   - **Lessons Learned**: Post-mortem insights to avoid repeating mistakes

3. **What to Include**
   - Project structure and module responsibilities
   - Coding conventions and style guidelines
   - Import patterns and dependency rules
   - Tool preferences and usage patterns
   - Domain-specific knowledge (e.g., 3GPP terminology)
   - Lessons learned from refactoring sessions

4. **What NOT to Include**
   - Checklists of completed items (belongs in git commit messages)
   - Active TODO lists (use beads issues instead)
   - Step-by-step implementation plans (use Prometheus/momus for planning)
   - Temporary debugging notes or workarounds

5. **Updates**
   - After refactoring sessions, update AGENTS.md with architectural insights
   - When patterns emerge, document them for future reference
   - When anti-patterns are discovered, add warnings to prevent recurrence
   - When explicitly requested, review and update using `docs/REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md`

### CLI Submodule Example

For `src/tdoc_crawler/cli/AGENTS.md`, the focus should be on:

- **Separation of Concerns**: What belongs in CLI vs. core library
- **Classification Rules**: How to distinguish CLI code from library code
- **Module Responsibilities**: What each CLI module should contain
- **Import Patterns**: Ensuring CLI can import from core, but not vice versa
- **Lessons Learned**: Architectural insights from refactoring (e.g., circular import resolution)

### Writing Style

- Be concise and factual
- Use consistent formatting throughout
- Include code examples where helpful
- Cross-reference related sections when appropriate
- Update proactively when new patterns emerge

### Review and Updates

After several implementation steps, AGENTS.md files might need updates. When explicitly asked:

1. Review the current state using `docs/REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md`
2. Propose changes based on accumulated knowledge
3. Update only after explicit user confirmation
+113 −0
# CLI Submodule Guidelines

This document provides guidelines for development in the `tdoc_crawler.cli` submodule.

## Design Principle

The `cli/` submodule should contain **only CLI-related functionality**. The core `tdoc_crawler` package should be usable as a standalone library without depending on the CLI. Think of `cli/` as an optional extras package (installable as `tdoc_crawler[cli]`).
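
One way to keep the CLI optional at runtime is to check for the CLI-only dependencies before starting the app. The sketch below assumes the `cli` extra provides `typer` and `rich` (both are named in this document, but the actual extras definition is not shown here). `missing_modules` and `ensure_cli_extras` are hypothetical helper names, not existing project functions.

```python
import importlib.util

# Assumption: these are the third-party packages provided by the `cli` extra.
CLI_DEPENDENCIES = ("typer", "rich")


def missing_modules(required):
    """Return the top-level module names in `required` that cannot be found."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def ensure_cli_extras():
    """Exit with an install hint if any CLI-only dependency is absent."""
    missing = missing_modules(CLI_DEPENDENCIES)
    if missing:
        raise SystemExit(
            f"Missing CLI dependencies: {', '.join(missing)}. "
            "Install them with: pip install 'tdoc_crawler[cli]'"
        )
```

A CLI entry point could call `ensure_cli_extras()` first, while library users who never import `cli/` are unaffected.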

## Classification Rules

When deciding whether code belongs in `cli/` or the core library, ask:

### Is this clearly CLI code?

- Typer command definitions (`@app.command()`)
- Typer argument/option types (`Annotated[...]` with `typer.Option/Argument`)
- Rich console output and formatting
- CLI-specific parsing of user input strings
- System calls for opening files (`os.startfile`, `xdg-open`)

### Is this clearly library code?

- Data models and schemas
- Database operations
- HTTP fetching and caching
- Data normalization and transformation
- File I/O operations

### When in doubt...

Assume `cli/` could be separated as an optional package. If a function would be useful to a Python developer using `tdoc_crawler` as a library, it belongs in the core package.
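
The rule of thumb above can be illustrated with a minimal sketch. All names here (`MeetingSummary`, `summarize_meeting`, `parse_id_list`, `render_summary`) are hypothetical and not part of `tdoc_crawler`. Plain string formatting stands in for Typer and Rich so the example runs with the standard library only.

```python
from dataclasses import dataclass


# --- Library side: useful to any Python caller, so it belongs in core ------
@dataclass(frozen=True)
class MeetingSummary:
    """Plain data that can be consumed without any CLI dependency."""
    name: str
    tdoc_count: int


def summarize_meeting(name, tdoc_ids):
    """Normalize the name and count unique IDs; no printing, no prompts."""
    return MeetingSummary(name=name.strip().upper(), tdoc_count=len(set(tdoc_ids)))


# --- CLI side: only meaningful for terminal users, so it stays in cli/ -----
def parse_id_list(raw):
    """Split a user-supplied comma-separated string into clean IDs."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def render_summary(summary):
    """Format a library result for terminal display."""
    return f"{summary.name}: {summary.tdoc_count} TDoc(s)"


if __name__ == "__main__":
    ids = parse_id_list("A-001, A-002,,A-001")
    print(render_summary(summarize_meeting(" example-meeting ", ids)))
```

The library function returns data and leaves presentation to the caller; the CLI functions only translate between user-facing strings and that data.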

## Module Responsibilities

| Module | Responsibility |
|--------|----------------|
| `app.py` | Typer command definitions and CLI entry points |
| `args.py` | Typer Annotated types for arguments and options |
| `console.py` | Rich Console singleton for CLI output |
| `helpers.py` | Helper functions - check classification rules above |
| `fetching.py` | TDoc fetching - check classification rules above |
| `printing.py` | Table and output formatting for CLI |

## Function Classification

### `helpers.py`

**CLI Functions (stay in cli/):**

- `parse_working_groups()` - CLI argument parsing
- `parse_subgroups()` - CLI argument parsing
- `collect_spec_numbers()` - CLI stdin/file input handling
- `build_limits()` - CLI config builder wrapper
- `launch_file()` - System calls for opening files
- `resolve_http_cache_config()` - CLI/env var configuration parsing
- `infer_working_groups_from_ids()` - CLI string inference
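
As an illustration of why a function like `launch_file()` counts as CLI code, here is a hedged sketch of a platform-dependent file opener. The real signature of `launch_file()` is not shown in this document, so `build_open_command` and `open_with_default_app` are hypothetical stand-ins. The macOS `open` command is an addition beyond the two mechanisms named above.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_open_command(path: Path, platform: str = sys.platform) -> Optional[List[str]]:
    """Return the external command that opens `path`.

    Returns None on Windows, where os.startfile is used instead of a subprocess.
    """
    if platform.startswith("win"):
        return None
    if platform == "darwin":
        return ["open", str(path)]
    # Assumption: other platforms are Linux-like desktops with xdg-open.
    return ["xdg-open", str(path)]


def open_with_default_app(path: Path) -> None:
    """Open `path` in the user's default application (interactive side effect)."""
    command = build_open_command(path)
    if command is None:
        os.startfile(str(path))  # only available on Windows
    else:
        subprocess.run(command, check=False)
```

Separating the pure command selection from the side effect keeps the platform logic testable without actually opening a file.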

**Library Functions (moved to core):**

- `normalize_portal_meeting_name()` → `tdoc_crawler.specs.normalization`
- `resolve_meeting_id()` → `tdoc_crawler.database`
- `download_to_path()` → `tdoc_crawler.http_client`
- `prepare_tdoc_file()` → `tdoc_crawler.checkout`
- `database_path()` → `tdoc_crawler.database`

### `fetching.py`

**CLI Functions (stay in cli/):**

- `fetch_missing_tdocs()` - Uses CLI console output
- `_fetch_via_whatthespec()` - Uses CLI console output
- `maybe_fetch_missing_tdocs()` - CLI console and flag handling

**Library Functions:**

- `fetch_tdoc()` - Import from `tdoc_crawler.fetching` (not duplicated in CLI)

## Lessons Learned

### Circular Import Resolution

During the CLI refactoring, a circular import was discovered between `database/` and `specs/` modules:

```
database/__init__.py → database/connection.py → specs/query.py → specs/__init__.py → specs/catalog.py → database
```

**Solution:** Created a neutral `models/specs.py` layer for shared types (`SpecQueryFilters`, `SpecQueryResult`). Both `database/` and `specs/` import from `models/` without circularity.

**Key Insight:** Circular imports always indicate a structural problem. Never use `TYPE_CHECKING` or local imports to work around them; refactor the module organization instead.
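
The neutral layer described in the solution above might look roughly like the following. Only the type names `SpecQueryFilters` and `SpecQueryResult` come from this document. Their fields, the `matches` helper, and the exact import path `tdoc_crawler.models.specs` are assumptions made for illustration.

```python
# Sketch of models/specs.py. It imports only the standard library, so both
# database/ and specs/ can depend on it without depending on each other.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SpecQueryFilters:
    """Criteria for a spec query (fields are hypothetical)."""
    spec_numbers: Tuple[str, ...] = ()
    title_contains: Optional[str] = None


@dataclass(frozen=True)
class SpecQueryResult:
    """One row returned by a spec query (fields are hypothetical)."""
    spec_number: str
    title: str


def matches(result: SpecQueryResult, filters: SpecQueryFilters) -> bool:
    """Hypothetical helper showing both shared types used together."""
    if filters.spec_numbers and result.spec_number not in filters.spec_numbers:
        return False
    if filters.title_contains and filters.title_contains.lower() not in result.title.lower():
        return False
    return True


# Both sides would then import the shared types from the neutral layer:
#   database/connection.py: from tdoc_crawler.models.specs import SpecQueryFilters
#   specs/query.py:         from tdoc_crawler.models.specs import SpecQueryFilters
```

Because the shared module has no imports from `database/` or `specs/`, the dependency graph stays acyclic regardless of which package is imported first.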

### Import Pattern

The correct import direction is:

- CLI (`cli/`) → Core (`tdoc_crawler/`)
- Core submodules (`database/`, `specs/`, etc.) → Common layer (`models/`)

**Never:** Core modules importing from CLI modules.
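
The import direction can be checked automatically. The following is a minimal sketch, not an existing project tool: `find_cli_imports` is a hypothetical helper that uses the standard `ast` module to list absolute imports of `tdoc_crawler.cli` in a piece of source code. Relative imports are not resolved in this sketch.

```python
import ast

CLI_PACKAGE = "tdoc_crawler.cli"


def _is_cli(name):
    """True if `name` is the CLI package or one of its submodules."""
    return name == CLI_PACKAGE or name.startswith(CLI_PACKAGE + ".")


def find_cli_imports(source):
    """Return a sorted list of CLI modules imported by `source`."""
    offenders = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if _is_cli(node.module):
                candidates = [node.module]
            else:
                # Catches forms like `from tdoc_crawler import cli`.
                candidates = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import
        offenders.update(name for name in candidates if _is_cli(name))
    return sorted(offenders)
```

A test could run this over every file in the core package and assert that the result is empty, which would fail as soon as a core module starts importing from `cli/`.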

### Before and After

**Before refactoring:**

- `cli/helpers.py` contained 5 library functions
- `cli/fetching.py` duplicated `fetch_tdoc()` from core
- Core modules could not be used independently

**After refactoring:**

- CLI contains only CLI-specific code
- All library functions in appropriate core modules
- Core package is a standalone library (installable without CLI extras)