Commit 9bc68f1e authored by Jan Reimes

feat(specs): update constitution and specifications for crawl/query

- Clarify JSON output as optional for initial release of crawl/query specs.
- Update version to 1.2.0 in constitution.
- Modify implementation plan to reflect text output only for spec commands.
- Adjust feature requirements to specify text output and logging practices.
- Add task for logging crawl/download outcomes in tasks list.
parent 10ef1080
+4 −1
@@ -27,6 +27,9 @@ must accept text input via stdin, arguments, or files, and must emit text output
stdout. Every CLI must support a JSON output mode for structured data exchange; errors
go to stderr.

+Exception: For the `crawl-specs`, `query-specs`, `checkout-spec`, and `open-spec`
+commands, JSON output is optional for the initial release of the specs feature.

### III. Test-Driven Development (Non-Negotiable)
Implementation code must not be written until unit tests are written, reviewed by the
user, and explicitly approved. The approved tests must be executed and verified to fail
@@ -72,4 +75,4 @@ across the toolchain.
- Compliance is verified in specs, plans, task lists, and code reviews; non-compliant
  changes must be blocked or accompanied by an approved amendment.

-**Version**: 1.1.0 | **Ratified**: 2026-02-05 | **Last Amended**: 2026-02-05
+**Version**: 1.2.0 | **Ratified**: 2026-02-05 | **Last Amended**: 2026-02-05
+6 −8
@@ -3,7 +3,7 @@
**Branch**: `001-specs-crawl-query` | **Date**: 2026-02-05 | **Spec**: [spec](spec.md)
**Input**: Feature specification from `/specs/001-specs-crawl-query/spec.md`

-**Note**: This plan follows the updated constitution (library-first, CLI JSON output,
+**Note**: This plan follows the updated constitution (library-first, CLI text output,
TDD gates, and Python standards).

## Summary
@@ -23,7 +23,7 @@ beautifulsoup4, lxml, pandas, python-calamine, xlsxwriter, zipinspect, hishel
**Target Platform**: Cross-platform CLI (Windows, macOS, Linux)
**Project Type**: single
**Performance Goals**: Query known spec in <2s; crawl success >=95% for known specs
-**Constraints**: JSON output for all spec commands; no `print`; use `pathlib` and
+**Constraints**: Text output only for spec commands; no `print`; use `pathlib` and
logging; Ruff and Ty clean; doc-only fallback to full zip on mismatch
**Scale/Scope**: 10k+ specs, four new commands, two metadata sources

@@ -32,7 +32,7 @@ logging; Ruff and Ty clean; doc-only fallback to full zip on mismatch
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

- [x] Library-first boundary documented (standalone library + integration points).
-- [x] CLI contract defined (text input/output + JSON mode).
+- [x] CLI contract defined (text input/output with JSON optional for spec commands).
- [x] TDD evidence planned (tests written/approved + red phase before implementation).
- [x] Python standards planned (type hints, logging, uv/pyproject, Ruff, Ty, pathlib,
  dataclasses where appropriate, Typer CLI).
@@ -52,8 +52,8 @@ It encapsulates parsing/normalization, source fetching, and download orchestrati

**CLI contract**: Add commands in `tdoc_crawler/cli/app.py` with arguments wired via
`tdoc_crawler/cli/args.py` using Typer `Annotated` patterns. Input supports `--spec`,
-`--spec-file`, or stdin (`--spec -`). Output defaults to Rich tables and supports
-`--output json|yaml` for structured output. Errors are reported on stderr.
+`--spec-file`, or stdin (`--spec -`). Output defaults to Rich tables. Errors are
+reported on stderr.
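
The input side of the CLI contract above (`--spec`, `--spec-file`, or stdin via `--spec -`) could be handled by a small library helper that the Typer command then calls. The sketch below is only an illustration under assumptions: the function name `resolve_spec_inputs`, the one-spec-per-line file format, and the de-duplication step are hypothetical and do not appear in the plan. The Typer wiring itself is left out.

```python
from __future__ import annotations

import io
from pathlib import Path
from typing import TextIO


def resolve_spec_inputs(
    spec: list[str] | None,
    spec_file: Path | None,
    stdin: TextIO,
) -> list[str]:
    """Collect spec numbers from --spec values, a --spec-file, or stdin.

    Hypothetical helper: the name and behavior are assumptions, not project code.
    """
    collected: list[str] = []
    for value in spec or []:
        if value == "-":
            # "--spec -" is treated as: read one spec number per line from stdin.
            collected.extend(stdin.read().splitlines())
        else:
            collected.append(value)
    if spec_file is not None:
        # Assumed file format: one spec number per line, UTF-8 encoded.
        collected.extend(spec_file.read_text(encoding="utf-8").splitlines())

    # Strip whitespace, drop blank entries, and keep first-seen order.
    seen: dict[str, None] = {}
    for item in collected:
        cleaned = item.strip()
        if cleaned:
            seen.setdefault(cleaned, None)
    return list(seen)


# Usage: an explicit value plus values piped in on stdin.
specs = resolve_spec_inputs(["23.501", "-"], None, io.StringIO("38.331\n"))
```

Keeping this logic outside the command function fits the library-first rule, and it can be unit tested without invoking the CLI.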

## TDD Evidence

@@ -66,7 +66,7 @@ Planned tests (initial red phase):
- `tests/test_specs_normalization.py`: dotted vs undotted normalization rules
- `tests/test_specs_database.py`: upsert/query behavior for new spec tables
- `tests/test_specs_downloads.py`: doc-only selection and fallback to full zip
-- `tests/test_specs_cli.py`: CLI parsing, JSON output, and stdin/file input
+- `tests/test_specs_cli.py`: CLI parsing and stdin/file input
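
As an illustration of the "dotted vs undotted normalization rules" listed above, a normalizer might look roughly like the following. The plan does not state the actual rules, so everything here is an assumption: the function name `normalize_spec_number`, the choice of the dotted form as canonical, and the rule that the first two digits form the series.

```python
import re

# Assumed shapes: "23.501" (dotted) or "23501" (undotted); two-digit series first.
_DOTTED = re.compile(r"^(\d{2})\.(\d{2,3})$")
_UNDOTTED = re.compile(r"^(\d{2})(\d{2,3})$")


def normalize_spec_number(raw: str) -> str:
    """Return the dotted form of a spec number, e.g. "23501" -> "23.501".

    Hypothetical helper: the real rules live in the project's own tests.
    """
    text = raw.strip()
    match = _DOTTED.match(text) or _UNDOTTED.match(text)
    if match is None:
        raise ValueError(f"unrecognized spec number: {raw!r}")
    series, number = match.groups()
    return f"{series}.{number}"
```

A red-phase test in `tests/test_specs_normalization.py` could then assert that both spellings of the same spec map to one canonical string, and that malformed input raises `ValueError`.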

## Project Structure

@@ -89,8 +89,6 @@ src/tdoc_crawler/
├── cli/
│   ├── app.py
│   └── args.py
-├── crawlers/
-│   └── specs.py
├── database/
│   └── connection.py
├── models/
+2 −2
@@ -113,9 +113,9 @@ As a maintainer, I want to see when metadata differs between 3GPP.org and whatth
- **FR-021**: The system MUST support querying a spec with multiple source records and expose any differences.
- **FR-022**: The system MUST record crawl and download outcomes, including success or failure, for auditing and troubleshooting.
- **FR-023**: Feature functionality MUST be implemented as a standalone library module before CLI integration.
-- **FR-024**: The CLI MUST support text output and a JSON output mode for structured results; errors go to stderr.
+- **FR-024**: The CLI MUST support text output for results; errors go to stderr.
- **FR-025**: Unit tests MUST be written, user-approved, and verified failing before implementation begins.
-- **FR-026**: Python implementations MUST use `pyproject.toml` with `uv`, include type hints and Google-style docstrings for public code, use `logging` instead of `print`, rely on `pathlib` for file paths, and keep Ruff and Ty checks clean without suppressions unless explicitly approved.
+- **FR-026**: Python implementations MUST use `pyproject.toml` with `uv`, include type hints and Google-style docstrings for public code, use `logging` instead of `print`, rely on `pathlib` for file paths, keep Ruff and Ty checks clean without suppressions unless explicitly approved, and prefer `dataclasses` for data entries with `__rich_console__` for console rendering. CLI user output MUST use `typer.echo`.
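
The revised FR-026 asks for `dataclasses` with a `__rich_console__` hook and for `logging` instead of `print`. A minimal sketch of that combination follows. The class `SpecRow`, its fields, and `log_outcome` are hypothetical names chosen for illustration. The hook follows Rich's console protocol (a method taking `console` and `options` that yields renderables, which may be plain strings), so the block itself needs no Rich import.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import Any, Iterator

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class SpecRow:
    """Hypothetical data entry for one spec; field names are illustrative."""

    spec_number: str
    title: str
    source: str

    def __rich_console__(self, console: Any, options: Any) -> Iterator[str]:
        # Rich calls this hook when the object is passed to Console.print();
        # yielding plain strings is enough for simple text rendering.
        yield f"{self.spec_number}  {self.title}"
        yield f"  source: {self.source}"


def log_outcome(row: SpecRow, success: bool) -> None:
    """Report a crawl/download outcome through logging rather than print."""
    level = logging.INFO if success else logging.WARNING
    status = "ok" if success else "failed"
    logger.log(level, "spec %s from %s: %s", row.spec_number, row.source, status)
```

With Rich installed, `Console().print(SpecRow(...))` would render the two yielded lines. User-facing CLI messages would go through `typer.echo`, as the requirement states.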

### Key Entities *(include if feature involves data)*

+1 −0
@@ -40,6 +40,7 @@ description: "Task list for crawl and query specs feature"
- [ ] T005 Implement spec number normalization utilities in src/tdoc_crawler/specs/normalization.py
- [ ] T006 Implement spec Pydantic models and enums in src/tdoc_crawler/models/specs.py
- [ ] T007 Implement specs database tables and upsert/query helpers in src/tdoc_crawler/database/connection.py
+- [ ] T007a Implement crawl/download outcome logging for specs in src/tdoc_crawler/database/connection.py
- [ ] T008 Implement spec source fetchers (3GPP + whatthespec) in src/tdoc_crawler/specs/sources/
- [ ] T009 Implement SpecCatalog facade in src/tdoc_crawler/specs/catalog.py
- [ ] T010 Implement specs query filters and result shaping in src/tdoc_crawler/specs/query.py