Commit 756aae1d authored by Jan Reimes

feat(graphrag-workspaces): add workspace scoping feature and documentation

* Introduce workspace management operations in the AI pipeline.
* Implement CLI commands for workspace creation and member management.
* Ensure default workspace behavior for omitted workspace arguments.
* Validate workspace isolation in processing and querying operations.
* Document user scenarios, requirements, and tasks for implementation.
* Create tests for workspace functionality and ensure backward compatibility.
* Establish clear decision-making rationale for workspace design choices.
parent 4d5784b9
+6 −1
@@ -5,6 +5,10 @@ Auto-generated from all feature plans. Last updated: 2026-02-05
## Active Technologies
- Python 3.14 + typer, rich, pydantic, pydantic-sqlite, requests, (001-specs-crawl-query)
- SQLite via pydantic-sqlite (001-specs-crawl-query)
- Python 3.14 (`requires-python = ">=3.14,<4.0"`) + Docling (DOCX-to-Markdown), sentence-transformers (embeddings), LanceDB (vector + graph storage), litellm (LLM abstraction for Ollama/OpenAI-compatible endpoints) (002-ai-document-processing)
- LanceDB file-based tables for embeddings, summaries, graph nodes/edges, and processing status; existing tdoc-crawler SQLite DB is read-only from the AI module (002-ai-document-processing)
- Python 3.14 + typer, rich, pydantic, lancedb, pyarrow, sentence-transformers (001-graphrag-workspaces)
- LanceDB tables under `.ai/lancedb` + workspace file references (001-graphrag-workspaces)

- Python 3.14 + typer, rich, requests, beautifulsoup4, lxml, pydantic, pydantic-sqlite, hishel, zipinspect (001-specs-crawl-query)

@@ -24,9 +28,10 @@ cd src; pytest; ruff check .
Python 3.14: Follow standard conventions

## Recent Changes
- 001-graphrag-workspaces: Added Python 3.14 + typer, rich, pydantic, lancedb, pyarrow, sentence-transformers
- 002-ai-document-processing: Added Python 3.14 (`requires-python = ">=3.14,<4.0"`) + Docling (DOCX-to-Markdown), sentence-transformers (embeddings), LanceDB (vector + graph storage), litellm (LLM abstraction for Ollama/OpenAI-compatible endpoints)
- 001-specs-crawl-query: Added Python 3.14 + typer, rich, pydantic, pydantic-sqlite, requests,

- 001-specs-crawl-query: Added Python 3.14 + typer, rich, requests, beautifulsoup4, lxml, pydantic, pydantic-sqlite, hishel, zipinspect

<!-- MANUAL ADDITIONS START -->
<!-- MANUAL ADDITIONS END -->
+38 −0
@@ -494,3 +494,41 @@ Most eager imports in all __init__.py files are unnecessary, as these are just t
Eager imports may only make sense for:
- *very* relevant types that are also used by consumers of this API.
- for constants and very simple types (like enums or types without additional dependencies)

<skills_system priority="1">

## Available Skills

<!-- SKILLS_TABLE_START -->
<usage>
When users ask you to perform tasks, check if any of the available skills
below can help complete the task more effectively.

How to use skills:
- Invoke: Bash("skilz read <skill-name> --agent universal")
- The skill content will load with detailed instructions
- Base directory provided in output for resolving bundled resources

Step-by-step process:
1. Identify a skill from <available_skills> that matches the user's request
2. Run the command above to load the skill's SKILL.md content
3. Follow the instructions in the loaded skill content
4. Skills may include bundled scripts, templates, and references

Usage notes:
- Only use skills listed in <available_skills> below
- Do not invoke a skill that is already loaded in your context
</usage>

<available_skills>

<skill>
<name>visual-explainer</name>
<description>Generate beautiful, self-contained HTML pages that visually explain systems, code changes, plans, and data. Use when the user asks for a diagram, architecture overview, diff review, plan review, project recap, comparison table, or any visual explanation of technical concepts. Also use proactively when you are about to render a complex ASCII table (4+ rows or 3+ columns) — present it as a styled HTML page instead.</description>
<location>.skilz\skills\visual-explainer/SKILL.md</location>
</skill>

</available_skills>
<!-- SKILLS_TABLE_END -->

</skills_system>
+36 −0
# Specification Quality Checklist: GraphRAG Workspace Scoping

**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-02-25
**Feature**: [spec.md](../spec.md)

## Content Quality

- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed

## Requirement Completeness

- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified

## Feature Readiness

- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification

## Notes

- Validation pass 1: all checklist items pass.
- Clarified behavior: if no workspace/project is provided, the system uses `default` and auto-creates it when missing.
- Scope boundary: knowledge-base construction is always workspace-scoped to selected corpus files.
+220 −0
openapi: 3.1.0
info:
  title: GraphRAG Workspace API Contract
  version: 0.1.0
  description: Workspace/project scoping logical contract for AI GraphRAG pipeline JSON payloads.
x-contract-mode: logical-non-http
servers:
  - url: /
paths:
  /ai/workspaces:
    get:
      summary: List workspaces
      responses:
        "200":
          description: Workspace list
          content:
            application/json:
              schema:
                type: object
                properties:
                  workspaces:
                    type: array
                    items:
                      $ref: "#/components/schemas/Workspace"
    post:
      summary: Create workspace
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                workspace:
                  type: string
              required: [workspace]
      responses:
        "201":
          description: Workspace created
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Workspace"

  /ai/workspaces/{workspace}/members:
    get:
      summary: List workspace members
      parameters:
        - $ref: "#/components/parameters/workspace"
      responses:
        "200":
          description: Member list
          content:
            application/json:
              schema:
                type: object
                properties:
                  workspace:
                    type: string
                  members:
                    type: array
                    items:
                      $ref: "#/components/schemas/WorkspaceMember"
    post:
      summary: Add members to workspace
      parameters:
        - $ref: "#/components/parameters/workspace"
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                members:
                  type: array
                  items:
                    $ref: "#/components/schemas/WorkspaceMemberInput"
              required: [members]
      responses:
        "200":
          description: Members registered
          content:
            application/json:
              schema:
                type: object
                properties:
                  workspace:
                    type: string
                  added_count:
                    type: integer

  /ai/process:
    post:
      summary: Process workspace corpus
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                workspace:
                  type: string
                  description: Optional; defaults to "default" when omitted or blank.
                new_only:
                  type: boolean
                  default: false
                force:
                  type: boolean
                  default: false
      responses:
        "200":
          description: Processing results
          content:
            application/json:
              schema:
                type: object
                properties:
                  workspace:
                    type: string
                  processed:
                    type: integer
                  skipped:
                    type: integer

  /ai/status:
    get:
      summary: Get workspace-scoped status
      parameters:
        - name: workspace
          in: query
          required: false
          schema:
            type: string
          description: Optional workspace; defaults to "default".
      responses:
        "200":
          description: Status list
          content:
            application/json:
              schema:
                type: object
                properties:
                  workspace:
                    type: string
                  statuses:
                    type: array
                    items:
                      $ref: "#/components/schemas/ProcessingStatus"

components:
  parameters:
    workspace:
      name: workspace
      in: path
      required: true
      schema:
        type: string
      description: Workspace name. Caller may pass "default" explicitly.

  schemas:
    Workspace:
      type: object
      required: [workspace, is_default, created_at, updated_at]
      properties:
        workspace:
          type: string
        is_default:
          type: boolean
        created_at:
          type: string
          format: date-time
        updated_at:
          type: string
          format: date-time

    WorkspaceMemberInput:
      type: object
      required: [source_item_id, source_path, source_kind]
      properties:
        source_item_id:
          type: string
        source_path:
          type: string
        source_kind:
          type: string
          enum: [tdoc, spec, other]

    WorkspaceMember:
      allOf:
        - $ref: "#/components/schemas/WorkspaceMemberInput"
        - type: object
          required: [workspace, added_at, is_active]
          properties:
            workspace:
              type: string
            added_at:
              type: string
              format: date-time
            is_active:
              type: boolean

    ProcessingStatus:
      type: object
      required: [workspace, source_item_id, current_stage]
      properties:
        workspace:
          type: string
        source_item_id:
          type: string
        current_stage:
          type: string
        completed_at:
          type: ["string", "null"]
          format: date-time
        error_message:
          type: ["string", "null"]
+75 −0
# Data Model: GraphRAG Workspace Scoping

## Terminology

- Canonical term: `WorkspaceMember`
- Narrative alias: `corpus entry`

## Entity: Workspace

- **Description**: Logical project/workspace boundary for GraphRAG processing.
- **Fields**:
  - `workspace_name` (string, required, normalized)
  - `created_at` (datetime, required)
  - `updated_at` (datetime, required)
  - `is_default` (bool, required)
  - `status` (enum: `active`, `archived`)
- **Validation rules**:
  - Empty/whitespace workspace names resolve to `default`.
  - Name normalization is case-insensitive (`Default` and `default` map to the same identity).
  - `default` is reserved and cannot be deleted while fallback behavior is enabled.
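
The first two validation rules can be sketched as a small helper. This is a minimal illustration, not the project's implementation: the function name `normalize_workspace_name` is hypothetical, and choosing lower case as the canonical form is an assumption (the rules only require case-insensitive identity).

```python
from __future__ import annotations

DEFAULT_WORKSPACE = "default"


def normalize_workspace_name(name: str | None) -> str:
    """Return the canonical identity for a user-supplied workspace name.

    Hypothetical helper: omitted, empty, or whitespace-only names resolve
    to the reserved default workspace.
    """
    if name is None:
        return DEFAULT_WORKSPACE
    cleaned = name.strip()
    if not cleaned:
        return DEFAULT_WORKSPACE
    # Assumption: lower-casing is the canonical form, so "Default" and
    # "default" map to the same identity.
    return cleaned.lower()
```

Resolving the name once at the boundary (for example in the CLI layer) would let later stages compare workspace names with plain equality.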

## Entity: WorkspaceMember

- **Description**: One selected source item assigned to one workspace corpus.
- **Fields**:
  - `workspace_name` (string, required)
  - `source_item_id` (string, required, stable identifier)
  - `source_path` (string/path, required)
  - `source_kind` (enum: `tdoc`, `spec`, `other`)
  - `added_at` (datetime, required)
  - `added_by` (string, optional)
  - `is_active` (bool, required)
- **Validation rules**:
  - `source_path` must exist and be readable at registration time, or else be explicitly marked invalid.
  - Duplicate active membership (`workspace_name`, `source_item_id`) is disallowed.
  - Same `source_item_id` may exist in multiple workspaces.
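
The membership uniqueness rules above might be enforced roughly as follows. This is a sketch under stated assumptions: it keeps members in an in-memory list rather than a LanceDB table, omits the `source_path` readability check, and the names `register_member` and `DuplicateMembershipError` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceMember:
    """Subset of the WorkspaceMember fields needed for the uniqueness check."""

    workspace_name: str
    source_item_id: str
    source_path: str
    source_kind: str = "other"
    is_active: bool = True


class DuplicateMembershipError(ValueError):
    """Raised when an active (workspace_name, source_item_id) pair already exists."""


def register_member(
    members: list[WorkspaceMember], new: WorkspaceMember
) -> list[WorkspaceMember]:
    """Return a new member list including `new`, rejecting active duplicates."""
    for existing in members:
        same_key = (
            existing.workspace_name == new.workspace_name
            and existing.source_item_id == new.source_item_id
        )
        # Only active rows block registration; soft-removed rows do not.
        if same_key and existing.is_active:
            raise DuplicateMembershipError(
                f"{new.source_item_id!r} is already active in {new.workspace_name!r}"
            )
    return [*members, new]
```

Because the key is the pair (`workspace_name`, `source_item_id`), the same source item can still be registered in several workspaces.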

## Entity: ArtifactScope

- **Description**: Links generated pipeline artifacts to one workspace.
- **Fields**:
  - `workspace_name` (string, required)
  - `artifact_type` (enum: `status`, `classification`, `chunk`, `summary`, `graph_node`, `graph_edge`)
  - `artifact_id` (string, required)
  - `source_item_id` (string, optional)
  - `created_at` (datetime, required)
- **Validation rules**:
  - Every generated artifact must map to exactly one workspace.
  - Cross-workspace artifact references are invalid.
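
One way to read these two rules is as an index from `artifact_id` to a single workspace, with a check before any artifact references another (for example a graph edge pointing at two graph nodes). The class below is a hypothetical sketch; `ArtifactScopeIndex` and its methods are not part of the specification.

```python
from __future__ import annotations


class ArtifactScopeIndex:
    """Maps each artifact_id to exactly one workspace_name (illustrative only)."""

    def __init__(self) -> None:
        self._workspace_by_artifact = {}

    def register(self, artifact_id: str, workspace_name: str) -> None:
        """Record the owning workspace; an artifact cannot change workspace."""
        existing = self._workspace_by_artifact.get(artifact_id)
        if existing is not None and existing != workspace_name:
            raise ValueError(
                f"artifact {artifact_id!r} already belongs to {existing!r}"
            )
        self._workspace_by_artifact[artifact_id] = workspace_name

    def check_reference(self, from_id: str, to_id: str) -> str:
        """Return the shared workspace, or raise on a cross-workspace reference."""
        ws_from = self._workspace_by_artifact[from_id]
        ws_to = self._workspace_by_artifact[to_id]
        if ws_from != ws_to:
            raise ValueError(
                f"cross-workspace reference: {from_id!r} ({ws_from}) -> "
                f"{to_id!r} ({ws_to})"
            )
        return ws_from
```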

## Backward-Compatible Existing Entities (extended)

### ProcessingStatus (existing)

- **Current key**: `tdoc_id`
- **Planned extension**: add `workspace_name` dimension for scoped reads/writes.
- **Compatibility**: operations with omitted workspace map to `workspace_name='default'`.
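
The compatibility rule could look like the following scoped read. It is a sketch only: status rows are modeled as plain dictionaries, `statuses_for` is a hypothetical name, and treating legacy rows that lack a `workspace_name` field as belonging to `default` is an assumption rather than something this document states.

```python
from __future__ import annotations

DEFAULT_WORKSPACE = "default"


def statuses_for(rows: list[dict], workspace: str | None = None) -> list[dict]:
    """Return status rows visible in the requested workspace.

    An omitted or blank workspace maps to workspace_name='default'.
    """
    target = (workspace or "").strip().lower() or DEFAULT_WORKSPACE
    return [
        row
        for row in rows
        # Assumption: rows written before the extension have no
        # workspace_name and are read as part of the default workspace.
        if row.get("workspace_name", DEFAULT_WORKSPACE) == target
    ]
```

With this shape, callers that never pass a workspace keep seeing the rows they saw before the extension.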

### DocumentClassification / DocumentChunk / DocumentSummary (existing)

- **Current key context**: primarily `tdoc_id`
- **Planned extension**: add workspace association for filtering and artifact isolation.
- **Compatibility**: keep current `tdoc_id` field semantics for existing pipeline identity.

## Relationships

- Workspace `1 -> N` WorkspaceMember
- Workspace `1 -> N` ArtifactScope
- WorkspaceMember `1 -> N` ArtifactScope (optional linkage by source item)

## State Transitions

- Workspace: `active -> archived`
- WorkspaceMember: `is_active=true -> is_active=false` (soft removal)
- ArtifactScope: immutable after creation; superseded by newer artifacts if reprocessed.
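
A minimal sketch of these transitions, assuming frozen dataclasses stand in for the stored records. The helper names `archive_workspace` and `soft_remove` are hypothetical, and only the fields needed for the transitions are modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

# The only workspace transition listed in this data model.
ALLOWED_WORKSPACE_TRANSITIONS = {("active", "archived")}


@dataclass(frozen=True)
class Workspace:
    workspace_name: str
    status: str = "active"


@dataclass(frozen=True)
class WorkspaceMember:
    workspace_name: str
    source_item_id: str
    is_active: bool = True


def archive_workspace(workspace: Workspace) -> Workspace:
    """Apply active -> archived; any other starting state is rejected."""
    if (workspace.status, "archived") not in ALLOWED_WORKSPACE_TRANSITIONS:
        raise ValueError(f"cannot archive workspace in state {workspace.status!r}")
    return replace(workspace, status="archived")


def soft_remove(member: WorkspaceMember) -> WorkspaceMember:
    """Soft removal: keep the record and flip is_active to False."""
    return replace(member, is_active=False)
```

Returning new frozen instances instead of mutating in place mirrors the immutability stated for `ArtifactScope`, where reprocessing produces newer artifacts rather than edits.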