🧑‍💻 instructions: update AGENTS.md to reflect recent codebase changes (0e0656a1) · Commits · Jan Reimes / 3gpp-crawler

AGENTS.md

+102 −13

Original line number	Diff line number	Diff line
		@@ -222,19 +222,20 @@ src/tdoc_crawler/
		├── crawlers/ # Web scraping and HTTP crawling logic
		│ ├── __init__.py # Re-exports all public symbols (includes TDOC_PATTERN, EXCLUDED_DIRS, TDOC_SUBDIRS)
		│ ├── tdocs.py # TDocCrawler - HTTP directory traversal, TDoc discovery, subdirectory detection
		│ ├── meetings.py # MeetingCrawler - HTML parsing, date extraction
		│ ├── meetings.py # MeetingCrawler - HTML parsing, date extraction, alias normalization
		│ ├── ...
		│ └── portal.py # PortalSession - 3GPP portal authentication, TDoc metadata fetching
		├── database/ # Database schema and operations (modular)
		│ ├── __init__.py # Re-exports TDocDatabase and connection utilities
		│ ├── connection.py # TDocDatabase context manager and facade
		│ ├── connection.py # TDocDatabase context manager and facade, bulk operations with progress
		│ ├── tdocs.py # TDoc-specific queries and operations
		│ ├── ...
		│ └── statistics.py # Statistics and crawl log queries
		├── cli/ # CLI commands and helpers (modular)
		│ ├── app.py # Typer application and command registration
		│ ├── __init__.py # Package initialization
		│ ├── app.py # Typer application and command registration (6 commands)
		│ ├── helpers.py # Path resolution, credential handling, working group inference
		│ ├── fetching.py # Targeted fetch logic and portal orchestration
		│ ├── helpers.py # Path/credentials resolution, fuzzy matching
		│ └── printing.py # Output formatting (table, JSON, YAML, CSV)
		├── __init__.py # Package initialization
		└── __main__.py # Entry point for `python -m tdoc_crawler`
		@@ -314,6 +315,7 @@ metadata = fetch_tdoc_metadata("R1-2301234", credentials)
		### File Size Guidelines

		When splitting modules:

		- Base utilities: 20-70 lines
		- Model files: 80-150 lines
		- Crawler files: 150-350 lines
		@@ -482,6 +484,7 @@ def query_tdocs(
		```

		Key Features:

		- Accepts multiple TDoc IDs (case-insensitive)
		- Supports filtering by working group(s)
		- Output formats: `table`, `json`, `yaml`, `csv`
		@@ -503,6 +506,7 @@ def crawl_tdocs(
		    limit_meetings: Annotated[int | None, typer.Option(...)] = None,
		    limit_tdocs: Annotated[int | None, typer.Option(...)] = None,
		    force_revalidate: Annotated[bool, typer.Option("--force-revalidate")] = False,
		    clear_tdocs: Annotated[bool, typer.Option("--clear-tdocs")] = False,
		    cache_dir: Annotated[Path | None, typer.Option(...)] = None,
		    db_file: Annotated[Path | None, typer.Option(...)] = None,
		    eol_username: Annotated[str | None, typer.Option(...)] = None,
		@@ -511,9 +515,11 @@ def crawl_tdocs(
		```

		Key Features:

		- Filters: working group, subgroup, meeting IDs, date range
		- Limits: meetings and TDocs per crawl
		- Force revalidation: Re-check existing TDocs
		- Clear TDocs: Delete all TDoc records before crawling
		- Requires meetings DB to be populated first
		- Command alias: `ct`

		@@ -528,6 +534,7 @@ def crawl_meetings(
		    limit_meetings: Annotated[int | None, typer.Option(...)] = None,
		    limit_meetings_per_wg: Annotated[int | None, typer.Option(...)] = None,
		    force_update: Annotated[bool, typer.Option("--force-update")] = False,
		    clear_db: Annotated[bool, typer.Option("--clear-db")] = False,
		    cache_dir: Annotated[Path | None, typer.Option(...)] = None,
		    db_file: Annotated[Path | None, typer.Option(...)] = None,
		    eol_username: Annotated[str | None, typer.Option(...)] = None,
		@@ -536,9 +543,11 @@ def crawl_meetings(
		```

		Key Features:

		- Filter by working group(s)
		- Limit total meetings or per working group
		- Incremental updates: Skip existing unless `--force-update`
		- Clear database: Delete all meetings and TDocs before crawling
		- Prerequisite for `crawl-tdocs` command
		- Command alias: `cm`

		@@ -563,6 +572,7 @@ def query_meetings(
		```

		Key Features:

		- Filters: working group, subgroup, meeting IDs, date range
		- Sorting: By any field, ascending/descending
		- Output formats: `table`, `json`, `yaml`, `csv`
		@@ -584,6 +594,7 @@ def open_tdoc(
		```

		Key Features:

		- Downloads TDoc from FTP if not cached
		- Unzips to cache directory (deletes .zip after)
		- Opens in system default application
		@@ -603,6 +614,7 @@ def stats(
		```

		Key Features:

		- Shows: Total TDocs, validated TDocs, meetings, working groups
		- Displays breakdown by working group
		- Shows recent crawling activity
		@@ -620,15 +632,32 @@ def stats(
		| `output_format` | `table` | - |

		Helper Functions:

		- `resolve_cache_dir()`: Resolves cache directory from CLI/env/default
		- `resolve_db_file()`: Resolves database file path
		- `get_credentials()`: Gets credentials from CLI/env/prompt
		- `infer_working_groups_from_subgroups()`: Infers working groups from subgroup codes
		- `parse_working_groups()`: Normalizes working group names and handles inference
		- `parse_subgroups()`: Normalizes subgroup aliases to canonical forms

		Credential Handling:

		1. Check CLI parameters (`--eol-username`, `--eol-password`)
		2. Check environment variables (`EOL_USERNAME`, `EOL_PASSWORD`)
		3. Prompt user interactively if not found

		Working Group Inference from Subgroups:

		When only subgroups are specified (without explicit `--working-group`), the CLI should infer the working groups:

		- `S*` subgroups (S4, S1, SP, etc.) → WorkingGroup.SA
		- `R*` subgroups (R1, R2, RP, etc.) → WorkingGroup.RAN
		- `C*` subgroups (C1, C2, CP, etc.) → WorkingGroup.CT

		This enables intuitive filtering like `-s S4 --limit-meetings 3` to crawl only SA, not all three working groups.

		Implementation Location: `src/tdoc_crawler/cli/helpers.py`
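A minimal sketch of the prefix rule described above, assuming short-form or long-form subgroup codes as input. The `WorkingGroup` enum here is a stand-in with assumed member values, and the function signature is a guess at the shape of the real helper in `cli/helpers.py`, which is not shown in this diff.

```python
from enum import Enum


class WorkingGroup(Enum):
    """Stand-in for the project's enum; the member values are assumed."""

    RAN = "RAN"
    SA = "SA"
    CT = "CT"


# First letter of a subgroup code -> working group (S* -> SA, R* -> RAN, C* -> CT).
_PREFIX_TO_GROUP = {"S": WorkingGroup.SA, "R": WorkingGroup.RAN, "C": WorkingGroup.CT}


def infer_working_groups_from_subgroups(subgroups: list[str]) -> list[WorkingGroup]:
    """Return the working groups implied by subgroup codes, in first-seen order."""
    inferred: list[WorkingGroup] = []
    for code in subgroups:
        cleaned = code.strip().upper()
        group = _PREFIX_TO_GROUP.get(cleaned[:1])
        # Skip unknown prefixes and avoid duplicates.
        if group is not None and group not in inferred:
            inferred.append(group)
    return inferred
```

With this rule, `-s S4` resolves to SA only, so the crawl does not fan out to RAN and CT.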

		Helper Function Implementations:

		```python
		@@ -709,13 +738,37 @@ def normalize_working_group_alias(value: str) -> str:
		    upper = value.upper().strip()
		    return PLENARY_ALIASES.get(upper, upper)

		def normalize_subgroup_alias(value: str) -> str:
		    """Expand plenary alias and validate against known subgroups."""
		    normalized = normalize_working_group_alias(value)
		    # If it's a plenary alias, append "P" (RP→RP, SP→SP, CP→CP)
		    if value.upper() in PLENARY_ALIASES:
		        return value.upper()  # Keep original form
		    return normalized
		def normalize_subgroup_alias(alias: str) -> list[str]:
		    """Normalize subgroup aliases to canonical short-form names.

		    Transforms long-form names to short form (SA4→S4, RAN1→R1, CT3→C3).
		    Returns list of matching canonical subgroup codes.
		    Returns the normalized alias in a list if no direct match found.
		    """
		    alias_upper = alias.strip().upper()
		    matches: list[str] = []

		    # Normalize long form to short form
		    normalized_alias = alias_upper
		    if alias_upper.startswith("SA") and len(alias_upper) > 2:
		        normalized_alias = "S" + alias_upper[2:]  # SA4 → S4
		    elif alias_upper.startswith("RAN") and len(alias_upper) > 3:
		        normalized_alias = "R" + alias_upper[3:]  # RAN1 → R1
		    elif alias_upper.startswith("CT") and len(alias_upper) > 2:
		        normalized_alias = "C" + alias_upper[2:]  # CT1 → C1

		    # Check all registries for matches
		    for _working_group, codes in MEETING_CODE_REGISTRY.items():
		        for code, subgroup in codes:
		            if code.upper() == normalized_alias or code.upper() == alias_upper:
		                if subgroup:
		                    matches.append(subgroup)
		            elif subgroup and subgroup.upper() == alias_upper:
		                matches.append(subgroup)

		    if not matches:
		        matches.append(alias_upper)
		    return matches
		```

		Usage in CLI:
		@@ -724,10 +777,35 @@ Usage in CLI:
		# User types: --working-group RP
		# Parsed as: WorkingGroup.RAN

		# User types: --sub-group RP
		# Stored as: "RP" (RAN Plenary code)
		# User types: --sub-group SA4
		# Normalized to: ["S4"] (canonical short form)
		```

		### Progress Bar Implementation

		Progress bars provide real-time feedback during long-running crawl operations. The implementation pattern is as follows:

		Database Layer (`database/connection.py`):

		- `bulk_upsert_meetings()` and `bulk_upsert_tdocs()` methods accept an optional `progress_callback: Callable[[float, float], None]` parameter
		- Convert the input `Iterable` to a `list` to determine total count before processing
		- Invoke the callback after each item with `(completed, total)` where both are floats
		- This ensures accurate progress tracking tied to actual database operations, not just collection phases

		CLI Layer (`cli/app.py`):

		- Use Rich's `Progress` context manager with `SpinnerColumn`, `BarColumn`, and `MofNCompleteColumn` for visual progress
		- Create a callback function that updates the progress task: `def update_progress(completed: float, total: float) -> None: progress.update(task, completed=completed, total=total)`
		- Pass this callback to the crawler's `crawl()` method
		- Result: Users see deterministic progress bars showing "N/M" format (e.g., "365/365 meetings", "1745/1745 TDocs")

		Benefits:

		- Crawler remains UI-agnostic (no Rich dependency in crawler layer)
		- Callback optional for non-interactive use cases
		- Progress updates reflect completed database operations, not just items collected
		- Provides accurate estimations of remaining work

		### HTTP Directory Crawling and File Detection

		HTTP Session Management:
		@@ -788,6 +866,7 @@ EXCLUDED_DIRS_NORMALIZED = {entry.upper() for entry in EXCLUDED_DIRS}
		```

		Pattern Explanation:

		- First character: `[RSC]` - Working group (R=RAN, S=SA, C=CT)
		- Second character: `[1-6P]` - Subgroup (1-6) or Plenary (P)
		- Following: `.{4,10}` - 4 to 10 characters (any valid filename characters)
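As a rough illustration of the three components listed above, the following regex is reconstructed from this explanation only. It is not the project's `TDOC_PATTERN`, which is not visible in this hunk and may anchor differently or also constrain the file extension. Case-insensitive matching and the name `looks_like_tdoc` are assumptions.

```python
import re

# Assumed reconstruction: working group letter, subgroup digit or "P",
# then 4 to 10 further characters.
TDOC_STEM_SKETCH = re.compile(r"[RSC][1-6P].{4,10}", re.IGNORECASE)


def looks_like_tdoc(stem: str) -> bool:
    """Return True if the whole stem matches the sketched pattern."""
    return TDOC_STEM_SKETCH.fullmatch(stem) is not None
```

Using `fullmatch` rather than `search` keeps an unrelated filename that merely contains a TDoc-like substring from being accepted.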
		@@ -866,6 +945,7 @@ def _crawl_meeting(
		```

		Key Points:

		- Always check for subdirectories first
		- Handle both full URLs and relative paths
		- Ensure all URLs end with "/"
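The two URL-handling points can be sketched with `urllib.parse.urljoin`. The helper name `normalize_directory_url` and the `example.org` URLs are hypothetical; the real `_crawl_meeting` logic is not shown in this hunk.

```python
from urllib.parse import urljoin


def normalize_directory_url(base_url: str, href: str) -> str:
    """Hypothetical helper: resolve href against base_url and force a trailing slash."""
    # Treat the base as a directory so relative links resolve beneath it.
    base = base_url if base_url.endswith("/") else base_url + "/"
    # urljoin returns href unchanged when it is already an absolute URL.
    url = urljoin(base, href)
    return url if url.endswith("/") else url + "/"
```

Forcing the trailing slash on the base matters: without it, `urljoin` would replace the last path segment instead of appending to it.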
		@@ -1076,6 +1156,7 @@ The database consists of five tables with proper foreign key relationships (no S
		#### 2. Meetings Table: `meetings`

		Key Fields:

		- `meeting_id`: 3GPP's unique meeting identifier (integer)
		- `sub_tb`: Foreign key to subworking_groups
		- `files_url`: HTTP URL to FTP directory containing TDocs
		@@ -1141,12 +1222,19 @@ The `TDocDatabase` class is derived from `pydantic_sqlite.DataBase` and provides
		- `insert_tdoc()` / `get_tdoc()`: TDoc CRUD operations with case-insensitive lookup
		- `mark_tdoc_validated()`: Update validation status after portal check
		- `query_tdocs()`: Complex queries with filters (working group, meeting, date range)
		- `query_meetings()`: Complex queries with filters (working group, subgroup, meeting IDs, date range)
		- `clear_tdocs()`: Delete all TDoc records (returns count deleted)
		- `clear_meetings()`: Delete all meeting records (returns count deleted)
		- `clear_all_data()`: Delete all TDocs and meetings (returns tuple of counts)
		- `get_stats()`: Aggregated statistics for CLI `stats` command
		- `bulk_upsert_meetings()`: Insert/update multiple meetings with optional progress callback
		- `bulk_upsert_tdocs()`: Insert/update multiple TDocs with optional progress callback

		Critical Patterns:

		- Return Pydantic models, not raw tuples
		- Handle case-insensitive TDoc IDs via `.upper()` normalization
		- Progress callbacks should use `Callable[[float, float], None]` signature with (completed, total) parameters

		## Testing

		@@ -1312,6 +1400,7 @@ def test_crawl_collects_tdocs(
		```

		Key Differences from FTP Mocking:

		- Mock `requests.Session` not `FTP`
		- Mock `session.get()` to return HTML content
		- Use BeautifulSoup-parseable HTML in mock responses
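A minimal sketch of this mocking style, kept to the standard library so it runs without `requests` or `bs4` installed. `list_links` and `_LinkCollector` are hypothetical stand-ins for the crawler code under test and its BeautifulSoup parsing, and the URL is a placeholder.

```python
from html.parser import HTMLParser
from unittest.mock import MagicMock


class _LinkCollector(HTMLParser):
    """Stdlib stand-in for the BeautifulSoup parsing used by the real crawler."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def list_links(session, url: str) -> list[str]:
    """Hypothetical code under test: fetch a directory listing and return its hrefs."""
    response = session.get(url, timeout=30)
    parser = _LinkCollector()
    parser.feed(response.text)
    return parser.hrefs


# Mock the session object itself; no network access and no FTP client involved.
mock_session = MagicMock()
mock_session.get.return_value.text = (
    '<a href="Docs/">Docs</a><a href="R1-2301234.zip">R1-2301234.zip</a>'
)
links = list_links(mock_session, "https://example.org/ftp/meeting/")
```

In a real test, `MagicMock(spec=requests.Session)` would additionally catch calls to methods that `requests.Session` does not have.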

docs/REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md

+106 −40

Original line number	Diff line number	Diff line
		# Review and Improvements for AGENTS.md



		Date: 2025-10-23

		Summary: This document outlines proposed changes to `AGENTS.md` to align it with the current codebase. The recent implementations of progress bars, subgroup normalization, and CLI helper functions have introduced patterns that are not yet reflected in the instructions.

		Date: 2025-10-22 (Addendum)
		Reviewer: AI Assistant
		Purpose: Update and extend earlier review to incorporate latest codebase refactoring (schema v2, modular CLI/database) and identify any remaining misalignments between `AGENTS.md` and current implementation.

		## 1. Update CLI Helper Functions and Structure

		---

		## 1. Executive Summary
		### Current State in `AGENTS.md`

		The `AGENTS.md` file currently places helper function implementations (`resolve_cache_dir`, `get_credentials`, `_infer_working_groups_from_ids`) inside a single `cli.py` code block. This does not match the current, more modular project structure.



		The previous review (earlier part of this file) highlighted missing documentation on HTTP crawling, subdirectory detection, fuzzy meeting name matching, and portal authentication. Those concerns have since been largely addressed inside `AGENTS.md` (the document now contains sections titled “HTTP Directory Crawling and File Detection” and subdirectory logic, plus portal module references). However, substantial new divergences have appeared after the recent refactor:
		### Current Implementation

		The project has been refactored to use a dedicated `src/tdoc_crawler/cli/helpers.py` module, which contains these helper functions and more. A new helper, `infer_working_groups_from_subgroups`, was also introduced to improve CLI usability.

		High‑impact mismatches:
		1. Database schema in `AGENTS.md` describes an older (pre‑v2) table layout – omits columns now present and still lists non‑existent ones (e.g. `for_value`, `last_validated`, missing `url`, `file_size`, `for_purpose`, `document_type`, `checksum`, `source_path`, `date_updated`, `validation_failed`).
		2. Project structure still assumes monolithic `cli.py` and `database.py` whereas code is now split (e.g. `src/tdoc_crawler/cli/app.py`, `cli/helpers.py`, `cli/fetching.py`, `cli/printing.py` and database submodules `database/schema.py`, `database/connection.py`, `database/tdocs.py`, `database/statistics.py`).
		3. The removal of redundant columns (`working_group`, `subgroup`, `meeting`) from the `tdocs` table (schema v2) is not explicitly documented as a completed normalization step nor its consequences (JOIN-based derivation from `meetings`).
		4. Testing guidance does not warn that foreign key integrity now requires inserting meetings before TDocs (was root cause for earlier failing tests).
		5. Naming consistency: The live schema uses `for_purpose` but `AGENTS.md` still refers to `for_value`; this is a potential source of regenerated incorrect code.

		### Proposed Changes to `AGENTS.md`

		1. Update Project Structure: Modify the `Project Structure` section to show that `cli/` is a submodule containing `app.py` and `helpers.py`.

		2. Relocate Helper Functions: Move the `Helper Function Implementations` section to describe the contents of `cli/helpers.py`.

		3. Add New Helper Function: Document the `infer_working_groups_from_subgroups` function and its purpose.

		4. Update `parse_working_groups` Logic: Explain that `parse_working_groups` in `helpers.py` should now accept an optional `subgroups` list to enable inference, and that CLI commands should parse subgroups before working groups.

		6. Statistics and helper queries in current code derive working group counts via JOIN; `AGENTS.md` still suggests grouping directly on a removed column.
		7. Crawl log structure changed (new fields: `crawl_type`, `start_time`, `end_time`, `incremental`, `items_added`, `items_updated`, `errors_count`, `status`), but the document lists an older minimal form (`timestamp`, `tdocs_discovered`, etc.).

		If left uncorrected, a coding assistant following the existing `AGENTS.md` would regenerate obsolete schema, reintroduce denormalized columns, misname fields, and write incompatible queries/tests.
		## 2. Enhance Subgroup Alias Normalization Logic



		### Current State in `AGENTS.md`

		The `Working Group Alias Handling` section provides an inaccurate and overly simplistic implementation for `normalize_subgroup_alias`. It suggests the function just calls `normalize_working_group_alias`, which is incorrect.

		---

		## 2. Recommended Structural Updates to AGENTS.md
		### Current Implementation

		The actual `normalize_subgroup_alias` function in `src/tdoc_crawler/crawlers/meetings.py` is more sophisticated. It correctly transforms long-form subgroup names to their canonical short-form equivalents (e.g., `SA4` → `S4`, `RAN1` → `R1`).



		### Proposed Changes to `AGENTS.md`

		- Replace `normalize_subgroup_alias`: Update the example in the `Working Group Alias Handling` section to reflect the current logic, which includes transforming prefixes (SA→S, RAN→R, CT→C) and returning a list of matching canonical names. This ensures the assistant generates the correct, more robust function.

		| Area | Current Doc State | Required Update | Rationale |
		|------|-------------------|-----------------|-----------|
		| Project Structure | Single `cli.py`, `database.py` mentioned | Replace with granular module list (CLI and database subpackages) | Align with refactored modular design, aids navigation |
		| Database Schema (TDocs) | Legacy column set | Provide complete schema v2 definition + highlight removed columns | Prevent regeneration of stale schema |

		## 3. Refine Progress Bar Implementation Pattern

		| Database Schema (Meetings/Subworking Groups) | Partially aligned | Add indices and schema_meta description | Document versioning & optimization |
		| Crawl Log Table | Old minimal fields | Update to new expanded audit fields | Ensure logging features are represented |
		| Normalization Rationale | Not explicit | Add subsection “Schema v2 Normalization Decisions” | Preserve architectural intent |

		### Current State in `AGENTS.md`

		The `AGENTS.md` file does not contain instructions for implementing progress bars. The recent work introduced a specific, reusable pattern that should be documented.

		| Field Naming | Mixed (`for_value`) | Standardize on `for_purpose` | Consistency & correctness |
		| Working Group Derivation | Implicit | Add explicit JOIN pattern description | Prevent reintroduction of removed columns |
		| Statistics Queries | Assume `working_group` column exists | Show JOIN-based aggregation example | Match current implementation |
		| Test Patterns | Missing FK insertion guidance | Add “Foreign Key Preparation” note | Avoid common test setup errors |
		| Migration Guidance | Absent | Add minimal instructions for upgrading from schema v1→v2 | Helps future contributors |

		---
		### Current Implementation

		## 3. Detailed Change Proposals
		Progress is tracked at the database level, not the collection level.

		### 3.1 Project Structure Section
		Replace the existing tree with:
		- `bulk_upsert_*` methods in `database/connection.py` accept a `Callable[[float, float], None]` callback.

		- They convert the input `Iterable` to a `list` to get a `total` count.

		- The callback is invoked with `(completed, total)` on each iteration.

		- The CLI uses Rich's `Progress` with `BarColumn` and `MofNCompleteColumn` to display a deterministic progress bar.



		### Proposed Changes to `AGENTS.md`

		- Add a New Section: Create a new section under `Implementation Patterns` titled "Progress Bar Implementation".

		- Document the Pattern:

		  - Explain that progress should be handled in the database layer for accuracy.

		  - Provide the `Callable[[float, float], None]` signature.

		  - Show a conceptual example of the `bulk_upsert_*` method and the corresponding `Progress` block in the CLI.

		  - Emphasize that this pattern provides a much better user experience than an indeterminate spinner.

		```text
		src/tdoc_crawler/
		cli/
		app.py # Typer application & command registration

		## 4. Add New CLI Flags

		fetching.py # Targeted fetch & portal orchestration
		helpers.py # Path/credentials resolution, fuzzy resolution helpers
		printing.py # Output formatting (table/json/yaml/csv)

		### Current State in `AGENTS.md`

		The `CLI Commands Implementation` section is missing the `--clear-db` and `--clear-tdocs` flags that were recently added to the `crawl-meetings` and `crawl-tdocs` commands, respectively.

		crawlers/
		tdocs.py # HTTP traversal, subdirectory detection, pattern filtering
		meetings.py # Meeting list parsing & date extraction
		portal.py # PortalSession authentication + metadata fetch
		database/

		### Current Implementation

		- `crawl-meetings` has a `--clear-db` flag.

		- `crawl-tdocs` has a `--clear-tdocs` flag.

		- The `TDocDatabase` class has corresponding `clear_all_data()` and `clear_tdocs()` methods.

		schema.py # SCHEMA_VERSION, table DDL, reference population
		connection.py # TDocDatabase facade/context manager
		tdocs.py # TDoc CRUD & query helpers (JOIN logic)
		statistics.py # Aggregated reporting & crawl log queries
		models/ # Pydantic models & enums

		### Proposed Changes to `AGENTS.md`

		- Update Command Signatures: Add the new boolean flags to the `crawl_meetings` and `crawl_tdocs` function signatures in the `CLI Commands Implementation` section.

		- Update Key Features: Add a bullet point to the "Key Features" list for each command, explaining what the new flag does (e.g., "Clear all TDoc records before crawling").

		- Document Database Methods: Briefly mention the `clear_*` methods in the `Database Helper Methods` section.

		__init__.py
		__main__.py

		By incorporating these changes, `AGENTS.md` will be more aligned with the current state of the project, enabling a coding assistant to reproduce the existing functionality more accurately.

		```

		Add an explicit note: “Monolithic `cli.py` and `database.py` referenced elsewhere are legacy; new contributions MUST use the modular structure above.”