Commit 36b2f5a4 authored by Jan Reimes

feat(workspaces): normalize workspace names and update performance goals

* Added user story for consistent workspace name normalization.
* Updated performance goals to clarify overhead expectations.
* Enhanced task descriptions for better clarity and organization.
parent 448c71a1
+1 −1
@@ -19,7 +19,7 @@ pipeline stages and removing/adjusting single-workspace assumptions already pres
 **Testing**: pytest (`uv run pytest`), focused AI tests under `tests/ai/`
 **Target Platform**: Cross-platform CLI (Windows/Linux/macOS)
 **Project Type**: Single Python repository (library + CLI)
-**Performance Goals**: Workspace resolution and membership filtering add negligible overhead (target <10% vs current single-scope flow)
+**Performance Goals**: Workspace resolution and membership filtering add negligible overhead (target \<10% vs current single-scope flow)
 **Constraints**: Maintain backward-compatible default behavior via `default` workspace; avoid breaking completed AI pipeline stages
 **Scale/Scope**: Multiple workspaces per repository, each with selected subsets of TDocs/specs/other files

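The `<10%` overhead target above can be checked by comparing median wall-clock timings of the current single-scope flow against the workspace-aware flow. A minimal sketch, assuming both flows can be wrapped as zero-argument callables; `median_runtime`, `relative_overhead`, and `within_overhead_budget` are hypothetical helper names, not part of `tdoc_crawler`.

```python
import statistics
import time
from collections.abc import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the median wall-clock duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings: list[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def relative_overhead(baseline_s: float, candidate_s: float) -> float:
    """Fractional slowdown of candidate vs. baseline (0.05 means 5% slower)."""
    if baseline_s <= 0:
        raise ValueError("baseline duration must be positive")
    return (candidate_s - baseline_s) / baseline_s


def within_overhead_budget(baseline_s: float, candidate_s: float, budget: float = 0.10) -> bool:
    """True if the candidate flow stays under the overhead budget (default 10%)."""
    return relative_overhead(baseline_s, candidate_s) < budget
```

With measured medians of 2.0 s (single-scope) and 2.1 s (workspace-aware), the overhead is about 5%, inside the target. The median is used instead of the mean so that one-off outliers, such as a cold cache on the first run, do not dominate the comparison.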
+13 −0
@@ -33,6 +33,19 @@ As a user building GraphRAG knowledge bases, I want to create named workspaces t
 1. **Given** a repository containing multiple TDocs/specs/files, **When** a workspace is created with a selected subset, **Then** only that subset is registered in the workspace corpus.
 1. **Given** two workspaces with different file selections, **When** each workspace is queried for its corpus members, **Then** no file appears in a workspace unless it was selected for that workspace.

+### User Story 1.1 - Workspace Name Normalization (Priority: P1)
+
+As a user, I want workspace names to be normalized consistently regardless of casing, so I can reliably reference workspaces without worrying about exact capitalization.
+
+**Normalization Rule**: All workspace names MUST be converted to lowercase before storage and comparison. For example, "MyProject", "myproject", and "MYPROJECT" all refer to the same workspace.
+
+**Independent Test**: Can be tested by creating workspaces with different casing variants and verifying they resolve to the same lowercase identifier.
+
+**Acceptance Scenarios**:
+
+1. **Given** a workspace name with mixed-case characters (e.g., "MyWorkspace"), **When** the workspace is created, **Then** it is stored as "myworkspace".
+1. **Given** a request for workspace "MyWorkspace", **When** querying for workspace "myworkspace", **Then** both resolve to the same normalized identifier.
+
 ______________________________________________________________________

 ### User Story 2 - Use Default Workspace Automatically (Priority: P2)
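The normalization rule in User Story 1.1 amounts to lowercasing a name at every entry point, so creation, lookup, and comparison all use the same key. A minimal sketch, assuming an in-memory registry; `normalize_workspace_name` and `WorkspaceRegistry` are hypothetical names, not the API in `src/tdoc_crawler/ai/operations/workspaces.py`.

```python
def normalize_workspace_name(name: str) -> str:
    """Return the canonical workspace key: the name converted to lowercase."""
    if not name:
        raise ValueError("workspace name must not be empty")
    return name.lower()


class WorkspaceRegistry:
    """Hypothetical in-memory registry keyed by normalized workspace name."""

    def __init__(self) -> None:
        # normalized workspace name -> set of member identifiers
        self._workspaces: dict[str, set[str]] = {}

    def create(self, name: str) -> str:
        """Create the workspace if missing (idempotent here) and return its normalized key."""
        key = normalize_workspace_name(name)
        self._workspaces.setdefault(key, set())
        return key

    def members(self, name: str) -> set[str]:
        """Look up a workspace by any casing variant of its name."""
        return self._workspaces[normalize_workspace_name(name)]
```

Here `registry.create("MyWorkspace")` returns `"myworkspace"`, and `registry.members("MYWORKSPACE")` returns the same set as `registry.members("myworkspace")`. Whether creating an existing name should fail is not specified, so the sketch simply treats it as idempotent. `str.lower()` follows the rule literally; `str.casefold()` is more aggressive for some Unicode input (`"Straße".casefold()` is `"strasse"`, while `"Straße".lower()` stays `"straße"`), so switching to it would be a spec decision.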
+8 −6
@@ -53,8 +53,9 @@ ______________________________________________________________________
 - [ ] T017 [US1] Implement create/list/get workspace operations in `src/tdoc_crawler/ai/operations/workspaces.py`
 - [ ] T018 [US1] Implement add/remove/list workspace members in `src/tdoc_crawler/ai/operations/workspaces.py`
 - [ ] T019 [US1] Persist and query workspace/member data in `src/tdoc_crawler/ai/storage.py`
-- [ ] T020 [US1] Add `ai workspace` command group and member management delegation in `src/tdoc_crawler/cli/ai.py`
+- [ ] T020 [US1] Add `ai workspace` command group and member management delegation in `src/tdoc_crawler/cli/ai.py`, including `--json` output mode per FR-009
 - [ ] T021 [US1] Export workspace management API in `src/tdoc_crawler/ai/__init__.py`
+- [ ] T021a [US1] Implement workspace deletion with artifact preservation (deleting workspace X must not remove artifacts from workspace Y) in `src/tdoc_crawler/ai/operations/workspaces.py` and `src/tdoc_crawler/ai/storage.py`

 **Checkpoint**: User Story 1 is independently functional and testable.

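Task T021a requires that deleting one workspace leave every other workspace's artifacts intact. If artifacts are stored once per source item and shared between workspaces (an assumption; the layout in `src/tdoc_crawler/ai/storage.py` is not shown here), deletion only needs to drop the workspace's membership and, optionally, remove artifacts that no remaining workspace references. A minimal in-memory sketch; `InMemoryWorkspaceStore` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryWorkspaceStore:
    """Hypothetical store: workspace memberships plus artifacts shared per source item."""

    # workspace name -> ids of source items selected for that workspace
    members: dict[str, set[str]] = field(default_factory=dict)
    # source item id -> artifact (e.g. extracted text), shared across workspaces
    artifacts: dict[str, str] = field(default_factory=dict)

    def add_member(self, workspace: str, source_id: str, artifact: str) -> None:
        self.members.setdefault(workspace, set()).add(source_id)
        # Keep the existing artifact if another workspace already produced one.
        self.artifacts.setdefault(source_id, artifact)

    def delete_workspace(self, workspace: str) -> set[str]:
        """Remove a workspace; delete only artifacts no other workspace still references."""
        removed = self.members.pop(workspace, set())
        still_referenced: set[str] = set().union(*self.members.values())
        orphaned = removed - still_referenced
        for source_id in orphaned:
            self.artifacts.pop(source_id, None)
        return orphaned
```

If workspace `x` holds items `a` and `b` and workspace `y` holds `b`, deleting `x` removes only the artifact for `a`; `b` survives because `y` still references it. Cleaning up orphaned artifacts is a design choice of the sketch: keeping them would also satisfy T021a.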
@@ -120,6 +121,7 @@ ______________________________________________________________________
 - [ ] T043 [P] Sync finalized contract examples and non-functional descriptions in `specs/001-graphrag-workspaces/contracts/workspace-api.openapi.yaml`
 - [ ] T044 Run Ruff/Ty fixes for touched modules in `src/tdoc_crawler/ai/models.py`, `src/tdoc_crawler/ai/storage.py`, `src/tdoc_crawler/ai/operations/pipeline.py`, `src/tdoc_crawler/ai/operations/workspaces.py`, and `src/tdoc_crawler/cli/ai.py`
 - [ ] T045 [P] Execute and stabilize focused AI tests in `tests/ai/test_ai_workspaces.py`, `tests/ai/test_ai_workspace_contract.py`, `tests/ai/test_ai_pipeline.py`, `tests/ai/test_ai_storage.py`, and `tests/ai/test_ai_cli.py`
+- [ ] T046 [P] Validate SC-003 performance and scale: generate test dataset (30+ source items across 3+ workspaces, 8+ items per workspace, mixed docx/pdf/md/txt), measure workspace creation + corpus registration time, verify completion under 2 minutes per SC-003

 ______________________________________________________________________

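The dataset shape required by T046 (30+ source items, 3+ workspaces, 8+ items per workspace, mixed docx/pdf/md/txt) and the 2-minute limit from SC-003 can be encoded as a plan generator plus a timing harness. A minimal sketch; `build_dataset_plan`, `check_dataset_shape`, and `timed_setup` are hypothetical, and `setup` stands in for whatever performs workspace creation and corpus registration. The plan only lists file names; real docx/pdf fixtures would still be needed if the pipeline parses file contents.

```python
import itertools
import time
from collections.abc import Callable

EXTENSIONS = ("docx", "pdf", "md", "txt")
TIME_LIMIT_S = 120.0  # SC-003: complete in under 2 minutes

Plan = dict[str, list[str]]


def build_dataset_plan(workspaces: int = 3, items_per_workspace: int = 10) -> Plan:
    """Assign uniquely named source items to workspaces, cycling through file types."""
    ext_cycle = itertools.cycle(EXTENSIONS)
    ids = itertools.count(1)
    return {
        f"perf-ws-{w}": [f"item-{next(ids):03d}.{next(ext_cycle)}" for _ in range(items_per_workspace)]
        for w in range(1, workspaces + 1)
    }


def check_dataset_shape(plan: Plan) -> None:
    """Raise ValueError if the plan is smaller or less varied than T046 requires."""
    total = sum(len(items) for items in plan.values())
    extensions = {name.rsplit(".", 1)[1] for items in plan.values() for name in items}
    if len(plan) < 3 or total < 30:
        raise ValueError(f"need 3+ workspaces and 30+ items, got {len(plan)} and {total}")
    if any(len(items) < 8 for items in plan.values()):
        raise ValueError("every workspace needs at least 8 items")
    if extensions != set(EXTENSIONS):
        raise ValueError(f"missing file types: {set(EXTENSIONS) - extensions}")


def timed_setup(setup: Callable[[Plan], None], plan: Plan) -> float:
    """Run setup(plan) and fail if it reaches the SC-003 time limit."""
    start = time.perf_counter()
    setup(plan)
    elapsed = time.perf_counter() - start
    if elapsed >= TIME_LIMIT_S:
        raise AssertionError(f"setup took {elapsed:.1f}s, limit is {TIME_LIMIT_S:.0f}s")
    return elapsed
```

The default plan yields 3 workspaces with 10 disjoint items each (30 in total), and every workspace receives a mix of all four file types because the extension cycle runs across workspace boundaries.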
@@ -179,15 +181,15 @@ ______________________________________________________________________
 ### MVP First (US1 only)

 1. Complete Phase 1 and Phase 2.
-2. Deliver Phase 3 (US1) end-to-end.
-3. Validate workspace isolation independently before moving on.
+1. Deliver Phase 3 (US1) end-to-end.
+1. Validate workspace isolation independently before moving on.

 ### Incremental Delivery

 1. Add US1 (workspace isolation).
-2. Add US2 (`default` fallback).
-3. Add US3 (workspace-scoped knowledge-base construction).
-4. Finish with Phase 6 polish and quality gates.
+1. Add US2 (`default` fallback).
+1. Add US3 (workspace-scoped knowledge-base construction).
+1. Finish with Phase 6 polish and quality gates.

 ### Existing Code Consideration

+2 −2
@@ -61,6 +61,6 @@

 ## Relationships

-- Specification 1---* SpecificationSourceRecord
-- Specification 1---* SpecificationVersion
+- Specification 1---\* SpecificationSourceRecord
+- Specification 1---\* SpecificationVersion
 - SpecificationVersion 1---0..1 SpecificationDownload
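The cardinalities in the Relationships list map directly onto container types: `1---*` becomes a list on the parent, and `1---0..1` becomes an optional field. A minimal sketch using dataclasses; the entity names come from the data model, but every field other than the relationship links (`spec_number`, `version`, `source`, `path`) is a hypothetical placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpecificationDownload:
    path: str  # hypothetical placeholder field


@dataclass
class SpecificationVersion:
    version: str  # hypothetical placeholder field
    # SpecificationVersion 1---0..1 SpecificationDownload
    download: SpecificationDownload | None = None


@dataclass
class SpecificationSourceRecord:
    source: str  # hypothetical placeholder field


@dataclass
class Specification:
    spec_number: str  # hypothetical placeholder field
    # Specification 1---* SpecificationSourceRecord
    source_records: list[SpecificationSourceRecord] = field(default_factory=list)
    # Specification 1---* SpecificationVersion
    versions: list[SpecificationVersion] = field(default_factory=list)
```

Using `field(default_factory=list)` gives each `Specification` its own lists, so appending a version to one specification never leaks into another.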
+1 −1
@@ -22,7 +22,7 @@ beautifulsoup4, lxml, pandas, python-calamine, xlsxwriter, zipinspect, hishel
 **Testing**: pytest, pytest-asyncio
 **Target Platform**: Cross-platform CLI (Windows, macOS, Linux)
 **Project Type**: single
-**Performance Goals**: Query known spec in <2s; crawl success >=95% for known specs
+**Performance Goals**: Query known spec in \<2s; crawl success >=95% for known specs
 **Constraints**: Text output only for spec commands; no `print`; use `pathlib` and
 logging; Ruff and Ty clean; doc-only fallback to full zip on mismatch
 **Scale/Scope**: 10k+ specs, four new commands, two metadata sources