feat(specs): update constitution and specifications for crawl/query (9bc68f1e) · Commits · Jan Reimes / 3gpp-crawler

.specify/memory/constitution.md

+4 −1

Original line number	Diff line number	Diff line
		@@ -27,6 +27,9 @@ must accept text input via stdin, arguments, or files, and must emit text output
		stdout. Every CLI must support a JSON output mode for structured data exchange; errors
		go to stderr.

		+Exception: For the `crawl-specs`, `query-specs`, `checkout-spec`, and `open-spec`
		+commands, JSON output is optional for the initial release of the specs feature.

		### III. Test-Driven Development (Non-Negotiable)
		Implementation code must not be written until unit tests are written, reviewed by the
		user, and explicitly approved. The approved tests must be executed and verified to fail
		@@ -72,4 +75,4 @@ across the toolchain.
		- Compliance is verified in specs, plans, task lists, and code reviews; non-compliant
		changes must be blocked or accompanied by an approved amendment.

		-Version: 1.1.0 | Ratified: 2026-02-05 | Last Amended: 2026-02-05
		+Version: 1.2.0 | Ratified: 2026-02-05 | Last Amended: 2026-02-05

specs/001-specs-crawl-query/plan.md

+6 −8

		@@ -3,7 +3,7 @@
		Branch: `001-specs-crawl-query` | Date: 2026-02-05 | Spec: [spec](spec.md)
		Input: Feature specification from `/specs/001-specs-crawl-query/spec.md`

		-Note: This plan follows the updated constitution (library-first, CLI JSON output,
		+Note: This plan follows the updated constitution (library-first, CLI text output,
		TDD gates, and Python standards).

		## Summary
		@@ -23,7 +23,7 @@ beautifulsoup4, lxml, pandas, python-calamine, xlsxwriter, zipinspect, hishel
		Target Platform: Cross-platform CLI (Windows, macOS, Linux)
		Project Type: single
		Performance Goals: Query known spec in <2s; crawl success >=95% for known specs
		-Constraints: JSON output for all spec commands; no `print`; use `pathlib` and
		+Constraints: Text output only for spec commands; no `print`; use `pathlib` and
		logging; Ruff and Ty clean; doc-only fallback to full zip on mismatch
		Scale/Scope: 10k+ specs, four new commands, two metadata sources

		@@ -32,7 +32,7 @@ logging; Ruff and Ty clean; doc-only fallback to full zip on mismatch
		GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

		- [x] Library-first boundary documented (standalone library + integration points).
		-- [x] CLI contract defined (text input/output + JSON mode).
		+- [x] CLI contract defined (text input/output with JSON optional for spec commands).
		- [x] TDD evidence planned (tests written/approved + red phase before implementation).
		- [x] Python standards planned (type hints, logging, uv/pyproject, Ruff, Ty, pathlib,
		dataclasses where appropriate, Typer CLI).
		@@ -52,8 +52,8 @@ It encapsulates parsing/normalization, source fetching, and download orchestrati

		CLI contract: Add commands in `tdoc_crawler/cli/app.py` with arguments wired via
		`tdoc_crawler/cli/args.py` using Typer `Annotated` patterns. Input supports `--spec`,
		-`--spec-file`, or stdin (`--spec -`). Output defaults to Rich tables and supports
		-`--output json|yaml` for structured output. Errors are reported on stderr.
		+`--spec-file`, or stdin (`--spec -`). Output defaults to Rich tables. Errors are
		+reported on stderr.

		## TDD Evidence

		@@ -66,7 +66,7 @@ Planned tests (initial red phase):
		- `tests/test_specs_normalization.py`: dotted vs undotted normalization rules
		- `tests/test_specs_database.py`: upsert/query behavior for new spec tables
		- `tests/test_specs_downloads.py`: doc-only selection and fallback to full zip
		-- `tests/test_specs_cli.py`: CLI parsing, JSON output, and stdin/file input
		+- `tests/test_specs_cli.py`: CLI parsing and stdin/file input

		## Project Structure

		@@ -89,8 +89,6 @@ src/tdoc_crawler/
		├── cli/
		│ ├── app.py
		│ └── args.py
		-├── crawlers/
		-│ └── specs.py
		├── database/
		│ └── connection.py
		├── models/
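
Review note on the plan.md CLI contract: input may arrive via `--spec`, `--spec-file`, or stdin (`--spec -`). The following is a minimal sketch of how those three sources could be merged before any lookup happens. The helper name `resolve_spec_inputs`, the one-spec-per-line file format, and the de-duplication step are assumptions for illustration and are not part of this commit; the Typer wiring in `tdoc_crawler/cli/args.py` is left out.

```python
from __future__ import annotations

import io
from pathlib import Path
from typing import Iterable, TextIO


def _clean(lines: Iterable[str]) -> list[str]:
    """Strip surrounding whitespace and drop empty lines."""
    return [stripped for stripped in (line.strip() for line in lines) if stripped]


def resolve_spec_inputs(
    specs: list[str] | None,
    spec_file: Path | None,
    stdin: TextIO,
) -> list[str]:
    """Collect spec numbers from --spec values, --spec-file, and stdin.

    Hypothetical helper: a --spec value of "-" means "read one spec per line
    from stdin", mirroring the `--spec -` form in the plan.
    """
    collected: list[str] = []
    for value in specs or []:
        if value == "-":
            collected.extend(_clean(stdin))
        else:
            collected.extend(_clean([value]))
    if spec_file is not None:
        # Assumed file format: one spec number per line, UTF-8 encoded.
        collected.extend(_clean(spec_file.read_text(encoding="utf-8").splitlines()))
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys(collected))


if __name__ == "__main__":
    demo = resolve_spec_inputs(["23.501", "-"], None, io.StringIO("38.331\n\n23.501\n"))
    assert demo == ["23.501", "38.331"]
```

Passing the stream in as a parameter, rather than reading `sys.stdin` inside the helper, keeps the function testable from `tests/test_specs_cli.py` without patching globals.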

specs/001-specs-crawl-query/spec.md

+2 −2

		@@ -113,9 +113,9 @@ As a maintainer, I want to see when metadata differs between 3GPP.org and whatth
		- FR-021: The system MUST support querying a spec with multiple source records and expose any differences.
		- FR-022: The system MUST record crawl and download outcomes, including success or failure, for auditing and troubleshooting.
		- FR-023: Feature functionality MUST be implemented as a standalone library module before CLI integration.
		-- FR-024: The CLI MUST support text output and a JSON output mode for structured results; errors go to stderr.
		+- FR-024: The CLI MUST support text output for results; errors go to stderr.
		- FR-025: Unit tests MUST be written, user-approved, and verified failing before implementation begins.
		-- FR-026: Python implementations MUST use `pyproject.toml` with `uv`, include type hints and Google-style docstrings for public code, use `logging` instead of `print`, rely on `pathlib` for file paths, and keep Ruff and Ty checks clean without suppressions unless explicitly approved.
		+- FR-026: Python implementations MUST use `pyproject.toml` with `uv`, include type hints and Google-style docstrings for public code, use `logging` instead of `print`, rely on `pathlib` for file paths, keep Ruff and Ty checks clean without suppressions unless explicitly approved, and prefer `dataclasses` for data entries with `__rich_console__` for console rendering. CLI user output MUST use `typer.echo`.

		### Key Entities (include if feature involves data)
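
Review note on the amended FR-026: it asks for `dataclasses` with `__rich_console__` for console rendering. The sketch below shows roughly what such a data entry could look like. The class name `SpecRow` and its fields are hypothetical and do not come from the spec. Rich is imported only under `TYPE_CHECKING`, so the block runs even where Rich is not installed; the method relies on Rich's console protocol, in which `__rich_console__` yields renderables and plain strings count as renderables.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING, Iterator

if TYPE_CHECKING:  # Only needed by the type checker in this sketch.
    from rich.console import Console, ConsoleOptions


@dataclass(frozen=True)
class SpecRow:
    """Hypothetical data entry for one query result (field names are assumptions)."""

    spec_number: str
    title: str
    source: str

    def __rich_console__(self, console: Console, options: ConsoleOptions) -> Iterator[str]:
        """Yield the lines Rich should render for this entry."""
        yield f"{self.spec_number}  {self.title}"
        yield f"  source: {self.source}"


if __name__ == "__main__":
    row = SpecRow(spec_number="23.501", title="Example title", source="3gpp")
    # Calling the method directly exercises the rendering logic without a Console.
    assert list(row.__rich_console__(None, None)) == [  # type: ignore[arg-type]
        "23.501  Example title",
        "  source: 3gpp",
    ]
```

With Rich installed, passing a `SpecRow` to `Console().print(...)` would invoke this method. How that combines with the `typer.echo` requirement in the same FR is not settled by the spec text.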

specs/001-specs-crawl-query/tasks.md

+1 −0

		@@ -40,6 +40,7 @@ description: "Task list for crawl and query specs feature"
		- [ ] T005 Implement spec number normalization utilities in src/tdoc_crawler/specs/normalization.py
		- [ ] T006 Implement spec Pydantic models and enums in src/tdoc_crawler/models/specs.py
		- [ ] T007 Implement specs database tables and upsert/query helpers in src/tdoc_crawler/database/connection.py
		+- [ ] T007a Implement crawl/download outcome logging for specs in src/tdoc_crawler/database/connection.py
		- [ ] T008 Implement spec source fetchers (3GPP + whatthespec) in src/tdoc_crawler/specs/sources/
		- [ ] T009 Implement SpecCatalog facade in src/tdoc_crawler/specs/catalog.py
		- [ ] T010 Implement specs query filters and result shaping in src/tdoc_crawler/specs/query.py
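
Review note on T005: the commit does not spell out the normalization rules, only that dotted and undotted spec numbers must be reconciled (see `tests/test_specs_normalization.py` in the plan). As a rough sketch, assuming a two-digit series followed by a two- or three-digit number, an optional `TS`/`TR` prefix, and the dotted form as the canonical output, a normalizer could look like this. The function name and every rule in it are assumptions, not the project's implementation.

```python
import re

# Assumed shape: optional "TS"/"TR" prefix, 2-digit series, optional dot,
# then a 2- or 3-digit number. Real rules may differ (e.g. multi-part specs).
_SPEC_RE = re.compile(r"(?:T[SR]\s*)?(\d{2})\.?(\d{2,3})", re.IGNORECASE)


def normalize_spec_number(raw: str) -> str:
    """Return the dotted form of a spec number, e.g. "23501" -> "23.501".

    Raises:
        ValueError: If the input does not match the assumed pattern.
    """
    match = _SPEC_RE.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a recognisable spec number: {raw!r}")
    series, number = match.groups()
    return f"{series}.{number}"


if __name__ == "__main__":
    assert normalize_spec_number("23501") == "23.501"
    assert normalize_spec_number("TS 23.501") == "23.501"
```

Raising on unrecognised input, rather than returning the string unchanged, makes bad entries in a `--spec-file` visible early; whether the real utility should be that strict is a design decision for the approved tests.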
