feat(workspaces): normalize workspace names and update performance goals (36b2f5a4) · Commits · Jan Reimes / 3gpp-crawler

specs/001-graphrag-workspaces/plan.md

+1 −1

		@@ -19,7 +19,7 @@ pipeline stages and removing/adjusting single-workspace assumptions already pres
		Testing: pytest (`uv run pytest`), focused AI tests under `tests/ai/`
		Target Platform: Cross-platform CLI (Windows/Linux/macOS)
		Project Type: Single Python repository (library + CLI)
		-Performance Goals: Workspace resolution and membership filtering add negligible overhead (target <10% vs current single-scope flow)
		+Performance Goals: Workspace resolution and membership filtering add negligible overhead (target \<10% vs current single-scope flow)
		Constraints: Maintain backward-compatible default behavior via `default` workspace; avoid breaking completed AI pipeline stages
		Scale/Scope: Multiple workspaces per repository, each with selected subsets of TDocs/specs/other files
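One way to check the <10% overhead goal is to time the current single-scope flow and the workspace-aware flow on the same corpus and compare medians. The helpers below (`median_runtime`, `relative_overhead`, `meets_overhead_target`) are a hypothetical sketch, not part of `tdoc_crawler`; the callables passed in stand for whatever pipeline entry points are actually benchmarked.

```python
import statistics
import time
from collections.abc import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run `fn` several times and return the median wall-clock duration in seconds."""
    samples: list[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_overhead(baseline_s: float, candidate_s: float) -> float:
    """Fractional slowdown of the candidate (workspace-aware) flow vs. the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline duration must be positive")
    return (candidate_s - baseline_s) / baseline_s


def meets_overhead_target(baseline_s: float, candidate_s: float, target: float = 0.10) -> bool:
    """True when the workspace-aware flow is less than `target` (10%) slower."""
    return relative_overhead(baseline_s, candidate_s) < target
```

For example, a 1.80 s baseline against 1.95 s is about 8.3% slower and passes, while 2.00 s (about 11.1%) fails. Medians are used so that a single outlier run, such as one with a cold cache, does not decide the result.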

specs/001-graphrag-workspaces/spec.md

+13 −0

		@@ -33,6 +33,19 @@ As a user building GraphRAG knowledge bases, I want to create named workspaces t
		1. Given a repository containing multiple TDocs/specs/files, When a workspace is created with a selected subset, Then only that subset is registered in the workspace corpus.
		1. Given two workspaces with different file selections, When each workspace is queried for its corpus members, Then no file appears in a workspace unless it was selected for that workspace.

		+### User Story 1.1 - Workspace Name Normalization (Priority: P1)
		+
		+As a user, I want workspace names to be normalized consistently regardless of casing, so I can reliably reference workspaces without worrying about exact capitalization.
		+
		+Normalization Rule: All workspace names MUST be converted to lowercase before storage and comparison. For example, "MyProject", "myproject", and "MYPROJECT" all refer to the same workspace.
		+
		+Independent Test: Can be tested by creating workspaces with different casing variants and verifying they resolve to the same lowercase identifier.
		+
		+Acceptance Scenarios:
		+
		+1. Given a workspace name with mixed-case characters (e.g., "MyWorkspace"), When the workspace is created, Then it is stored as "myworkspace".
		+1. Given a request for workspace "MyWorkspace", When querying for workspace "myworkspace", Then both resolve to the same normalized identifier.
		+
		______________________________________________________________________

		### User Story 2 - Use Default Workspace Automatically (Priority: P2)
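The normalization rule in User Story 1.1 amounts to lowercasing a name once, at the boundary, and using only the normalized form as the storage and lookup key. The sketch below assumes a hypothetical in-memory registry; `normalize_workspace_name` and `WorkspaceRegistry` are illustrative names, not the project's API. It uses `str.lower()` because the rule says "lowercase"; `str.casefold()` would be stricter for non-ASCII names, but the spec does not ask for it. Whether creating an existing name should fail is also unspecified, so this sketch treats it as a no-op.

```python
def normalize_workspace_name(name: str) -> str:
    """Apply the spec's rule: workspace names are lowercased before storage and comparison."""
    return name.lower()


class WorkspaceRegistry:
    """Hypothetical registry keyed only by normalized workspace names."""

    def __init__(self) -> None:
        self._workspaces: dict[str, set[str]] = {}

    def create(self, name: str) -> str:
        key = normalize_workspace_name(name)
        # Assumption: creating "MYPROJECT" after "MyProject" reuses the same entry.
        self._workspaces.setdefault(key, set())
        return key

    def resolve(self, name: str) -> str:
        key = normalize_workspace_name(name)
        if key not in self._workspaces:
            raise KeyError(f"unknown workspace: {name!r}")
        return key
```

Because only `normalize_workspace_name` produces keys, "MyWorkspace", "myworkspace" and "MYWORKSPACE" all create and resolve the same entry, which is what both acceptance scenarios require.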

specs/001-graphrag-workspaces/tasks.md

+8 −6

		@@ -53,8 +53,9 @@ ______________________________________________________________________
		- [ ] T017 [US1] Implement create/list/get workspace operations in `src/tdoc_crawler/ai/operations/workspaces.py`
		- [ ] T018 [US1] Implement add/remove/list workspace members in `src/tdoc_crawler/ai/operations/workspaces.py`
		- [ ] T019 [US1] Persist and query workspace/member data in `src/tdoc_crawler/ai/storage.py`
		-- [ ] T020 [US1] Add `ai workspace` command group and member management delegation in `src/tdoc_crawler/cli/ai.py`
		+- [ ] T020 [US1] Add `ai workspace` command group and member management delegation in `src/tdoc_crawler/cli/ai.py`, including `--json` output mode per FR-009
		- [ ] T021 [US1] Export workspace management API in `src/tdoc_crawler/ai/__init__.py`
		+- [ ] T021a [US1] Implement workspace deletion with artifact preservation (deleting workspace X must not remove artifacts from workspace Y) in `src/tdoc_crawler/ai/operations/workspaces.py` and `src/tdoc_crawler/ai/storage.py`

		Checkpoint: User Story 1 is independently functional and testable.
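T018 and T021a together imply that members and derived artifacts are scoped by workspace, so deleting one workspace may only touch entries carrying its own key. A minimal in-memory sketch of that invariant follows; `WorkspaceStore` and its methods are hypothetical stand-ins for whatever `src/tdoc_crawler/ai/storage.py` ends up persisting, and names are lowercased per User Story 1.1.

```python
from collections import defaultdict


class WorkspaceStore:
    """Hypothetical store: members and artifacts are always keyed by workspace."""

    def __init__(self) -> None:
        self.members: dict[str, set[str]] = defaultdict(set)
        self.artifacts: dict[str, dict[str, bytes]] = defaultdict(dict)

    def add_member(self, workspace: str, source_id: str) -> None:
        self.members[workspace.lower()].add(source_id)

    def remove_member(self, workspace: str, source_id: str) -> None:
        self.members[workspace.lower()].discard(source_id)

    def list_members(self, workspace: str) -> list[str]:
        # .get() avoids creating an empty entry for unknown workspaces.
        return sorted(self.members.get(workspace.lower(), set()))

    def put_artifact(self, workspace: str, name: str, payload: bytes) -> None:
        self.artifacts[workspace.lower()][name] = payload

    def delete_workspace(self, workspace: str) -> None:
        key = workspace.lower()
        # Only entries owned by `key` are dropped; other workspaces keep their
        # members and artifacts, even when they share source files or artifact names.
        self.members.pop(key, None)
        self.artifacts.pop(key, None)
```

The same source file can be a member of several workspaces; isolation comes from the workspace key, not from copying files.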

		@@ -120,6 +121,7 @@ ______________________________________________________________________
		- [ ] T043 [P] Sync finalized contract examples and non-functional descriptions in `specs/001-graphrag-workspaces/contracts/workspace-api.openapi.yaml`
		- [ ] T044 Run Ruff/Ty fixes for touched modules in `src/tdoc_crawler/ai/models.py`, `src/tdoc_crawler/ai/storage.py`, `src/tdoc_crawler/ai/operations/pipeline.py`, `src/tdoc_crawler/ai/operations/workspaces.py`, and `src/tdoc_crawler/cli/ai.py`
		- [ ] T045 [P] Execute and stabilize focused AI tests in `tests/ai/test_ai_workspaces.py`, `tests/ai/test_ai_workspace_contract.py`, `tests/ai/test_ai_pipeline.py`, `tests/ai/test_ai_storage.py`, and `tests/ai/test_ai_cli.py`
		+- [ ] T046 [P] Validate SC-003 performance and scale: generate test dataset (30+ source items across 3+ workspaces, 8+ items per workspace, mixed docx/pdf/md/txt), measure workspace creation + corpus registration time, verify completion under 2 minutes per SC-003

		______________________________________________________________________
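The T046 thresholds interact: with exactly three workspaces, eight items each yields only 24 items, so at least ten per workspace are needed to reach 30. The sketch below plans a dataset and reports which constraints it violates before any timing is done. `plan_dataset`, `check_plan` and `within_budget` are hypothetical, the file names are placeholders rather than real TDocs or specs, and "mixed docx/pdf/md/txt" is assumed to mean all four formats must appear.

```python
from itertools import cycle

EXTENSIONS = ("docx", "pdf", "md", "txt")
TIME_BUDGET_S = 120.0  # SC-003: creation + corpus registration under 2 minutes


def plan_dataset(workspaces: int = 3, per_workspace: int = 10) -> dict[str, list[str]]:
    """Assign placeholder file names to workspaces, cycling through the four formats."""
    ext = cycle(EXTENSIONS)
    return {
        f"ws{w}": [f"item{w}_{i}.{next(ext)}" for i in range(per_workspace)]
        for w in range(1, workspaces + 1)
    }


def check_plan(plan: dict[str, list[str]]) -> list[str]:
    """Return the T046 constraints the plan violates (an empty list means OK)."""
    problems = []
    items = [name for names in plan.values() for name in names]
    if len(items) < 30:
        problems.append(f"only {len(items)} items, need 30+")
    if len(plan) < 3:
        problems.append(f"only {len(plan)} workspaces, need 3+")
    if any(len(names) < 8 for names in plan.values()):
        problems.append("some workspace has fewer than 8 items")
    missing = set(EXTENSIONS) - {name.rsplit(".", 1)[1] for name in items}
    if missing:
        problems.append(f"missing formats: {sorted(missing)}")
    return problems


def within_budget(elapsed_s: float) -> bool:
    """Elapsed time (e.g. from time.perf_counter()) must be strictly under the budget."""
    return elapsed_s < TIME_BUDGET_S
```

`check_plan(plan_dataset(3, 8))` reports `["only 24 items, need 30+"]`, while the default of ten items per workspace passes.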

		@@ -179,15 +181,15 @@ ______________________________________________________________________
		### MVP First (US1 only)

		1. Complete Phase 1 and Phase 2.
		-2. Deliver Phase 3 (US1) end-to-end.
		-3. Validate workspace isolation independently before moving on.
		+1. Deliver Phase 3 (US1) end-to-end.
		+1. Validate workspace isolation independently before moving on.

		### Incremental Delivery

		1. Add US1 (workspace isolation).
		-2. Add US2 (`default` fallback).
		-3. Add US3 (workspace-scoped knowledge-base construction).
		-4. Finish with Phase 6 polish and quality gates.
		+1. Add US2 (`default` fallback).
		+1. Add US3 (workspace-scoped knowledge-base construction).
		+1. Finish with Phase 6 polish and quality gates.

		### Existing Code Consideration

specs/001-specs-crawl-query/data-model.md

+2 −2

		@@ -61,6 +61,6 @@

		## Relationships

		-- Specification 1---* SpecificationSourceRecord
		-- Specification 1---* SpecificationVersion
		+- Specification 1---\* SpecificationSourceRecord
		+- Specification 1---\* SpecificationVersion
		- SpecificationVersion 1---0..1 SpecificationDownload

specs/001-specs-crawl-query/plan.md

+1 −1

		@@ -22,7 +22,7 @@ beautifulsoup4, lxml, pandas, python-calamine, xlsxwriter, zipinspect, hishel
		Testing: pytest, pytest-asyncio
		Target Platform: Cross-platform CLI (Windows, macOS, Linux)
		Project Type: single
		-Performance Goals: Query known spec in <2s; crawl success >=95% for known specs
		+Performance Goals: Query known spec in \<2s; crawl success >=95% for known specs
		Constraints: Text output only for spec commands; no `print`; use `pathlib` and
		logging; Ruff and Ty clean; doc-only fallback to full zip on mismatch
		Scale/Scope: 10k+ specs, four new commands, two metadata sources
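Both performance goals are plain thresholds, but with different boundaries: the crawl target is inclusive (>=95%) while the query target is strict (<2s). A sketch of checking them against measured results follows; `crawl_success_rate` and `meets_goals` are hypothetical helpers, and the outcome list is assumed to hold one boolean per known spec crawled.

```python
def crawl_success_rate(outcomes: list[bool]) -> float:
    """Fraction of known specs whose crawl succeeded."""
    if not outcomes:
        raise ValueError("no crawl outcomes recorded")
    return sum(outcomes) / len(outcomes)


def meets_goals(outcomes: list[bool], query_latency_s: float) -> bool:
    # Crawl goal is inclusive (>= 95%); query goal is strict (< 2 s).
    return crawl_success_rate(outcomes) >= 0.95 and query_latency_s < 2.0
```

With 19 of 20 crawls succeeding the rate is exactly 95% and passes; a query taking exactly 2.0 s does not.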
