Commit 61d3088e authored by Jan Reimes's avatar Jan Reimes
Browse files

feat: Complete Phase 2-3 DI integration

- Add ServiceContainer fixtures to tests/conftest.py
- Integrate ServiceContainer in Typer CLI with --di-mode flag
- Fix AI module bugs: NaTType datetime, name errors, Python 2 syntax
- Add cli/di.py module for DI state management
- Convert config.py and http_client.py to packages
- Add tests for CLI DI integration (70 tests pass)
parent ea271d19
Loading
Loading
Loading
Loading

demo.bat

0 → 100644
+18 −0
Original line number Diff line number Diff line
@echo off
cls
:: tdoc-crawler crawl-meetings -s S4
:: tdoc-crawler crawl-tdocs --start-date 2016
:: tdoc-crawler query-tdocs --agenda "*atias*" --start-date 2018

tdoc-crawler ai workspace deactivate
tdoc-crawler ai workspace delete atias --no-preserve-artifacts
tdoc-crawler ai workspace create atias
:: tdoc-crawler ai workspace activate atias
tdoc-crawler ai workspace add-members 26131 26132 26260 26261 21905 --kind specs --release 19
tdoc-crawler ai workspace add-members 26260 26261 --kind specs --release 18.1.0
tdoc-crawler ai workspace add-members 26260 26261 --kind specs --release 18.0.0
tdoc-crawler ai workspace add-members 26260 26261 --kind specs --release 17
tdoc-crawler ai workspace add-members --kind tdocs --agenda "*atias*" --start-date 2017
tdoc-crawler ai workspace list-members
tdoc-crawler ai workspace process
tdoc-crawler ai query "Please summarize the evolution of test methods in all ATIAS work items between the releases, in particular focusing on IVAS-capable devices"
+425 −0
Original line number Diff line number Diff line
# Configuration Fragmentation

## Overview

Configuration in tdoc-crawler is **highly fragmented** across multiple modules, each implementing its own loading, validation, and environment variable parsing logic. This creates inconsistency, duplication, and maintenance burden.

## Configuration Locations

### 1. Core Configuration (`src/tdoc_crawler/config.py`)

**Purpose:** Cache directory management

```python
DEFAULT_CACHE_DIR = Path.home() / ".tdoc-crawler"
DEFAULT_DATABASE_FILENAME = "tdoc_crawler.db"
DEFAULT_HTTP_CACHE_FILENAME = "http-cache.sqlite3"
DEFAULT_CHECKOUT_DIRNAME = "checkout"
DEFAULT_MANAGER = "default"

class CacheManager:
    def __init__(
        self,
        root_path: Path | None = None,
        ai_cache_dir: Path | None = None,
        name: str = DEFAULT_MANAGER,
        ensure_paths: bool = True,
    ) -> None:
        # Tries env var TDC_CACHE_DIR, then falls back to DEFAULT_CACHE_DIR
        env_cache_dir = os.getenv("TDC_CACHE_DIR")
        self.root = Path(env_cache_dir) if env_cache_dir else DEFAULT_CACHE_DIR
```

**Environment Variables:**

- `TDC_CACHE_DIR`: Root cache directory
- `TDC_AI_STORE_PATH`: AI cache directory (handled in AI config)

**Issues:**

- No validation class for cache configuration
- Defaults scattered as module-level constants
- Environment variable parsing inline in constructor

---

### 2. AI Configuration (`src/tdoc_crawler/ai/config.py`)

**Purpose:** AI pipeline configuration

```python
DEFAULT_EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
DEFAULT_LLM_MODEL = "openrouter/openrouter/free"

class AiConfig(BaseConfigModel):
    ai_cache_dir: Path | None = Field(None, description="Path to AI LanceDB store")
    embedding_model: str = Field(DEFAULT_EMBEDDING_MODEL, ...)
    max_chunk_size: int = Field(1000, ge=1, ...)
    chunk_overlap: int = Field(100, ge=0, ...)
    llm_model: str = Field(DEFAULT_LLM_MODEL, ...)
    llm_api_base: str | None = Field(None, ...)
    abstract_min_words: int = Field(150, ge=1, ...)
    abstract_max_words: int = Field(250, ge=1, ...)
    parallelism: int = Field(4, ge=1, le=32, ...)
    
    @classmethod
    def from_env(cls, **overrides) -> AiConfig:
        # 40+ lines of environment variable parsing
        if embedding_model := os.getenv("TDC_AI_EMBEDDING_MODEL"):
            data["embedding_model"] = embedding_model
        # ... repeated for each field
```

**Environment Variables:**

- `TDC_AI_EMBEDDING_MODEL`
- `TDC_AI_LLM_MODEL`
- `TDC_AI_LLM_API_BASE`
- `TDC_AI_MAX_CHUNK_SIZE`
- `TDC_AI_CHUNK_OVERLAP`
- `TDC_AI_ABSTRACT_MIN_WORDS`
- `TDC_AI_ABSTRACT_MAX_WORDS`
- `TDC_AI_PARALLELISM`

**Issues:**

- Duplicates environment parsing logic from other modules
- Validator has side effects (resolves paths via CacheManager)
- Comment admits TODO: "This constructor is actually useless"

---

### 3. HTTP Cache Configuration (`src/tdoc_crawler/models/crawl_limits.py`)

**Purpose:** HTTP caching behavior

```python
class HttpCacheConfig(BaseModel):
    cache_file: str | Path | None = None
    ttl: int = 7200  # 2 hours
    refresh_ttl_on_access: bool = True
    max_retries: int = 3
    
    @classmethod
    def resolve_http_cache_config(cls) -> HttpCacheConfig:
        ttl = os.getenv("HTTP_CACHE_TTL")
        refresh = os.getenv("HTTP_CACHE_REFRESH_ON_ACCESS")
        
        return cls(
            ttl=int(ttl) if ttl else 7200,
            refresh_ttl_on_access=(
                refresh.lower() not in ("false", "0", "no", "off")
                if refresh
                else True
            ),
        )
```

**Environment Variables:**

- `HTTP_CACHE_TTL`
- `HTTP_CACHE_REFRESH_ON_ACCESS`
- `HTTP_CACHE_ENABLED`

**Issues:**

- Separate from main configuration classes
- Manual type conversion (int(), bool parsing)
- Duplicates pattern from AiConfig.from_env()

---

### 4. Crawl Limits Configuration (`src/tdoc_crawler/models/crawl_limits.py`)

**Purpose:** Crawl operation limits

```python
class CrawlLimits(BaseModel):
    limit_tdocs: int | None = None
    limit_meetings: int | None = None
    limit_meetings_per_wg: int | None = None
    limit_wgs: int | None = None
```

**Issues:**

- No environment variable support
- Must be constructed programmatically
- Inconsistent with other config classes

---

### 5. TDoc Crawl Configuration (`src/tdoc_crawler/tdocs/models.py`)

**Purpose:** TDoc crawl parameters

```python
class TDocCrawlConfig(BaseModel):
    working_groups: list[WorkingGroup] | None = None
    subgroups: list[str] | None = None
    meeting_ids: list[int] | None = None
    target_ids: list[str] | None = None
    incremental: bool = True
    workers: int = 4
    timeout: int = 300
    overall_timeout: int | None = None
    http_cache: HttpCacheConfig | None = None
    cache_manager_name: str | None = None
    limits: CrawlLimits = Field(default_factory=CrawlLimits)
```

**Issues:**

- Nested configuration (HttpCacheConfig, CrawlLimits)
- No unified loading mechanism
- Must construct entire tree manually

---

### 6. CLI Arguments as Configuration (`src/tdoc_crawler/cli/args.py`)

**Purpose:** Command-line argument definitions

```python
CacheDirOption = Annotated[
    Path | None,
    typer.Option(
        "--cache-dir",
        "-c",
        help="Cache directory",
        envvar="TDC_CACHE_DIR",
    ),
]

EolUsernameOption = Annotated[
    str | None,
    typer.Option(
        "--eol-username",
        help="EOL username",
        envvar="EOL_USERNAME",
        prompt=True,
    ),
]
```

**Issues:**

- Typer handles env var parsing differently than config classes
- Duplication: same env vars defined in multiple places
- CLI args must be manually mapped to config objects

---

## Fragmentation Metrics

### Environment Variable Distribution

| Module | Env Vars Defined | Parsing Location |
|--------|-----------------|------------------|
| config.py | 2 | Inline in `__init__` |
| ai/config.py | 8 | `from_env()` classmethod |
| models/crawl_limits.py | 3 | `resolve_http_cache_config()` |
| cli/args.py | 10+ | Typer decorators |
| **Total** | **23+** | **4 different mechanisms** |

### Configuration Classes

| Class | Location | Base Class | Env Support | Validation |
|-------|----------|------------|-------------|------------|
| CacheManager | config.py | None | Partial | Manual |
| AiConfig | ai/config.py | BaseConfigModel | ✅ Yes | Pydantic |
| HttpCacheConfig | models/crawl_limits.py | BaseModel | ✅ Yes | Pydantic |
| CrawlLimits | models/crawl_limits.py | BaseModel | ❌ No | Pydantic |
| TDocCrawlConfig | tdocs/models.py | BaseModel | ❌ No | Pydantic |

### Default Values Distribution

**Module-level constants:**

```python
# config.py
DEFAULT_CACHE_DIR = Path.home() / ".tdoc-crawler"
DEFAULT_DATABASE_FILENAME = "tdoc_crawler.db"
DEFAULT_HTTP_CACHE_FILENAME = "http-cache.sqlite3"
DEFAULT_MANAGER = "default"

# ai/config.py
DEFAULT_EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
DEFAULT_LLM_MODEL = "openrouter/openrouter/free"

# models/crawl_limits.py
# Defaults in Field() definitions
```

---

## Impact Analysis

### 1. Inconsistent Loading Patterns

**Pattern 1: Environment in Constructor**

```python
class CacheManager:
    def __init__(self, root_path: Path | None = None) -> None:
        env_cache_dir = os.getenv("TDC_CACHE_DIR")
        self.root = Path(env_cache_dir) if env_cache_dir else DEFAULT_CACHE_DIR
```

**Pattern 2: Classmethod Factory**

```python
class AiConfig:
    @classmethod
    def from_env(cls, **overrides) -> AiConfig:
        # Manual parsing
```

**Pattern 3: Typer Auto-parsing**

```python
@app.command()
def cmd(cache_dir: Annotated[Path | None, typer.Option(envvar="TDC_CACHE_DIR")]):
    pass
```

**Pattern 4: No Env Support**

```python
class CrawlLimits(BaseModel):
    # Must be constructed programmatically
    pass
```

**Problem:** Developers must learn 4 different patterns for configuration loading.

---

### 2. Duplication

**Environment Variable Parsing:**

```python
# ai/config.py:19-22
def _env_int(name: str) -> int | None:
    value = os.getenv(name)
    if value is None or value == "":
        return None
    return int(value)

# models/crawl_limits.py:98-107
ttl = os.getenv("HTTP_CACHE_TTL")
refresh = os.getenv("HTTP_CACHE_REFRESH_ON_ACCESS")
# Manual parsing repeated
```

**Boolean Parsing:**

```python
# http_client.py:100-101
env_enabled = os.getenv("HTTP_CACHE_ENABLED", "").lower()
http_cache_enabled = env_enabled not in ("false", "0", "no", "off", "f", "n")

# ai/config.py: Similar logic would be needed if bool env vars added
```

---

### 3. Validation Inconsistency

**AiConfig:** Full pydantic validation with custom validators

```python
@model_validator(mode="after")
def _validate_bounds(self) -> AiConfig:
    if self.abstract_max_words < self.abstract_min_words:
        raise ValueError("abstract_max_words must be >= abstract_min_words")
    return self
```

**CacheManager:** Manual validation in constructor

```python
# No validation, just inline checks
if root_path:
    self.root = root_path
else:
    # Fallback logic
```

**HttpCacheConfig:** No cross-field validation

```python
# TTL and refresh are independent, no validation of relationship
```

---

## Recommendations

### 1. Unified Configuration Class

Create a single `AppConfig` class that encompasses all configuration:

```python
class AppConfig(BaseModel):
    cache: CacheConfig = Field(default_factory=CacheConfig)
    ai: AiConfig = Field(default_factory=AiConfig)
    http: HttpCacheConfig = Field(default_factory=HttpCacheConfig)
    crawl: CrawlLimits = Field(default_factory=CrawlLimits)
    
    @classmethod
    def from_env(cls) -> AppConfig:
        """Load all configuration from environment variables."""
        # Single loading mechanism for entire app
```

### 2. Single Environment Parsing Mechanism

Standardize on one pattern:

```python
# Standard pattern for all config classes
class BaseConfig(BaseModel):
    @classmethod
    def from_env(cls, **overrides) -> Self:
        """Load from environment with optional overrides."""
        # Common implementation in base class
```

### 3. Centralized Defaults

Move all defaults to a single location:

```python
# config/defaults.py
class Defaults:
    CACHE_DIR = Path.home() / ".tdoc-crawler"
    DATABASE_FILENAME = "tdoc_crawler.db"
    EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
    LLM_MODEL = "openrouter/openrouter/free"
    HTTP_CACHE_TTL = 7200
    # ... all other defaults
```

### 4. Configuration Loading Service

Create a dedicated service for configuration:

```python
class ConfigLoader:
    def __init__(self, env: dict[str, str] | None = None):
        self.env = env or os.environ
    
    def load(self) -> AppConfig:
        """Load complete application configuration."""
        # Handles all parsing, validation, defaults
```

---

## Related Documents

- [Scattered Instantiation Findings](scattered-instantiation.md)
- [DI Opportunities Summary](di-opportunities.md)
- [Analysis Summary](summary.md)
+536 −0
Original line number Diff line number Diff line
# Dependency Injection Opportunities Summary

## Overview

This document synthesizes findings from the analysis phase to identify **specific opportunities** for introducing dependency injection patterns into tdoc-crawler. Each opportunity includes impact assessment, implementation approach, and priority ranking.

## Identified Opportunities

### Opportunity 1: Containerize CacheManager

**Current State:**

```python
# Global registry pattern
_cache_managers: dict[str, CacheManager] = {}
manager = resolve_cache_manager(name)
```

**Proposed State:**

```python
# DI Container
class Container:
    def __init__(self):
        self._services = {}
    
    def register(self, name: str, factory: Callable) -> None:
        self._services[name] = factory
    
    def get(self, name: str) -> CacheManager:
        return self._services[name]()

# Usage
container = Container()
container.register("cache_manager", lambda: CacheManager())
manager = container.get("cache_manager")
```

**Benefits:**

- ✅ Explicit lifecycle management
- ✅ No global mutable state
- ✅ Easy to substitute mock in tests
- ✅ Single source of truth

**Implementation Effort:** Low (2-3 days)

**Priority:** 🔴 **HIGH** - Foundation for other DI patterns

**Affected Modules:**

- `config.py` (registry removal)
- `http_client.py` (resolution)
- `ai/config.py` (path resolution)
- `tdocs/sources/whatthespec.py` (meeting resolution)

---

### Opportunity 2: Database Protocol Abstraction

**Current State:**

```python
class TDocCrawler:
    def __init__(self, database: TDocDatabase) -> None:
        if not isinstance(database, TDocDatabase):
            raise TypeError(f"Expected TDocDatabase, got {type(database)}")
        self.database = database
```

**Proposed State:**

```python
from typing import Protocol

class DatabaseProtocol(Protocol):
    def query_tdocs(self, config: TDocQueryConfig) -> list[TDocMetadata]: ...
    def upsert_tdoc(self, metadata: TDocMetadata) -> tuple[bool, bool]: ...
    def query_meetings(self, config: MeetingQueryConfig) -> list[MeetingMetadata]: ...

class TDocCrawler:
    def __init__(self, database: DatabaseProtocol) -> None:
        self.database = database  # No type check!
```

**Benefits:**

- ✅ Can use in-memory database for testing
- ✅ Supports multiple database implementations
- ✅ Removes inheritance coupling
- ✅ Enables mocking without inheritance

**Implementation Effort:** Medium (4-5 days)

**Priority:** 🔴 **HIGH** - Enables testability improvements

**Affected Modules:**

- `tdocs/operations/crawl.py`
- `database/tdocs.py`
- `database/meetings.py`
- `specs/operations/crawl.py`

---

### Opportunity 3: HTTP Client Interface

**Current State:**

```python
class WhatTheSpecSource(TDocSource):
    def __init__(
        self,
        config: TDocSourceConfig,
        session: requests.Session | None = None,
    ) -> None:
        self.config = config
        self.session = session
```

**Proposed State:**

```python
class HttpClientProtocol(Protocol):
    def get(self, url: str, **kwargs) -> Response: ...
    def post(self, url: str, **kwargs) -> Response: ...

class WhatTheSpecSource(TDocSource):
    def __init__(
        self,
        config: TDocSourceConfig,
        http_client: HttpClientProtocol,
    ) -> None:
        self.config = config
        self.http_client = http_client
```

**Benefits:**

- ✅ Can mock HTTP in tests without network
- ✅ Supports different HTTP client implementations
- ✅ Clear separation of concerns
- ✅ Easier to test error handling

**Implementation Effort:** Low (2-3 days)

**Priority:** 🟡 **MEDIUM** - Good testability win

**Affected Modules:**

- `tdocs/sources/whatthespec.py`
- `tdocs/sources/doclist.py`
- `specs/sources/3gpp.py`
- `http_client.py`

---

### Opportunity 4: Configuration Service

**Current State:**

```python
# Multiple config classes with different loading patterns
config = AiConfig.from_env()
cache = CacheManager(cache_dir)
http_config = HttpCacheConfig.resolve_http_cache_config()
```

**Proposed State:**

```python
class ConfigService:
    def __init__(self, env: dict[str, str] | None = None):
        self.env = env or os.environ
    
    def load(self) -> AppConfig:
        return AppConfig(
            cache=self._load_cache_config(),
            ai=self._load_ai_config(),
            http=self._load_http_config(),
        )

# Usage
config_service = ConfigService()
config = config_service.load()
```

**Benefits:**

- ✅ Single loading mechanism
- ✅ Consistent environment parsing
- ✅ Easier to test configuration
- ✅ Centralized defaults

**Implementation Effort:** Medium (3-4 days)

**Priority:** 🟡 **MEDIUM** - Reduces complexity

**Affected Modules:**

- `config.py`
- `ai/config.py`
- `models/crawl_limits.py`
- `cli/args.py`

---

### Opportunity 5: Factory Pattern for Sources

**Current State:**

```python
# Direct instantiation
source = WhatTheSpecSource(config, session)
metadata = source.fetch_by_id(tdoc_id)
```

**Proposed State:**

```python
class TDocSourceFactory:
    def __init__(self, container: Container):
        self.container = container
    
    def create_source(self, source_type: str) -> TDocSource:
        if source_type == "whatthespec":
            return self.container.get("whatthespec_source")
        elif source_type == "portal":
            return self.container.get("portal_source")
        # ...

# Usage
factory = TDocSourceFactory(container)
source = factory.create_source("whatthespec")
```

**Benefits:**

- ✅ Centralized source creation
- ✅ Easy to add new sources
- ✅ Configuration-driven source selection
- ✅ Better testability

**Implementation Effort:** Low (2 days)

**Priority:** 🟢 **LOW** - Nice to have

**Affected Modules:**

- `tdocs/sources/`
- `specs/sources/`

---

### Opportunity 6: CLI Command Injection

**Current State:**

```python
@app.command("open")
def open_tdoc(
    tdoc_id: str,
    cache_dir: Path | None = None,
    # ... 8 more parameters
) -> None:
    manager = CacheManager(cache_dir).register()
    with TDocDatabase(manager.db_file) as database:
        # ...
```

**Proposed State:**

```python
class OpenTDocCommand:
    def __init__(
        self,
        container: Container,
        console: Console,
    ):
        self.container = container
        self.console = console
    
    def execute(self, tdoc_id: str, options: OpenOptions) -> None:
        database = self.container.get("database")
        http_client = self.container.get("http_client")
        # ...

@app.command("open")
def open_tdoc(
    tdoc_id: str,
    options: OpenOptions,
    container: Container = Depends(get_container),
):
    command = OpenTDocCommand(container, console)
    command.execute(tdoc_id, options)
```

**Benefits:**

- ✅ Commands are testable units
- ✅ Dependencies explicit in constructor
- ✅ Easier to add new commands
- ✅ Better separation of CLI and logic

**Implementation Effort:** High (5-7 days)

**Priority:** 🟢 **LOW** - Defer until other DI patterns established

**Affected Modules:**

- `cli/app.py`
- `cli/crawl.py`
- `cli/query.py`

---

## Priority Matrix

| Opportunity | Impact | Effort | Risk | Priority |
|-------------|--------|--------|------|----------|
| 1. CacheManager Container | High | Low | Low | 🔴 HIGH |
| 2. Database Protocol | High | Medium | Medium | 🔴 HIGH |
| 3. HTTP Client Interface | Medium | Low | Low | 🟡 MEDIUM |
| 4. Configuration Service | Medium | Medium | Low | 🟡 MEDIUM |
| 5. Source Factory | Low | Low | Low | 🟢 LOW |
| 6. CLI Command Injection | Medium | High | Medium | 🟢 LOW |

## Implementation Roadmap

### Phase 1: Foundation (Week 1-2)

**Goal:** Establish DI container and remove global state

1. **Create Container Class** (Opportunity 1)
   - Implement basic service container
   - Migrate CacheManager registry to container
   - Update all `resolve_cache_manager()` calls

2. **Add Tests for Container**
   - Test service registration
   - Test service resolution
   - Test lifecycle management

**Deliverables:**

- `src/tdoc_crawler/di/container.py`
- Updated `config.py` without global registry
- All tests passing

---

### Phase 2: Database Abstraction (Week 3-4)

**Goal:** Decouple database operations from concrete implementation

1. **Define Database Protocols** (Opportunity 2)
   - Create `DatabaseProtocol` with required methods
   - Make `TDocDatabase` implement protocol
   - Create `InMemoryDatabase` for testing

2. **Refactor TDocCrawler**
   - Change constructor to accept `DatabaseProtocol`
   - Update tests to use `InMemoryDatabase`

3. **Refactor Other Database Users**
   - Apply same pattern to `SpecCrawler`, `MeetingCrawler`

**Deliverables:**

- `src/tdoc_crawler/database/protocols.py`
- `tests/database/in_memory.py`
- Refactored crawler classes

---

### Phase 3: HTTP Client Abstraction (Week 5)

**Goal:** Make HTTP clients testable

1. **Define HTTP Client Protocol** (Opportunity 3)
   - Create `HttpClientProtocol`
   - Wrap `create_cached_session()` in adapter

2. **Update Source Classes**
   - Inject HTTP client via constructor
   - Update tests to use mock HTTP client

**Deliverables:**

- `src/tdoc_crawler/http/protocols.py`
- Updated source classes
- Improved test coverage

---

### Phase 4: Configuration Unification (Week 6)

**Goal:** Centralize configuration loading

1. **Create Config Service** (Opportunity 4)
   - Implement `ConfigService` class
   - Consolidate environment parsing

2. **Migrate Configuration Users**
   - Update code to use config service
   - Remove duplicate env parsing

**Deliverables:**

- `src/tdoc_crawler/config/service.py`
- Consolidated configuration loading

---

### Phase 5: Polish & Extension (Week 7-8)

**Goal:** Complete DI pattern adoption

1. **Source Factory** (Opportunity 5)
   - Implement factory pattern
   - Add configuration-driven source selection

2. **CLI Refactoring** (Opportunity 6)
   - Extract command classes
   - Inject dependencies via container

3. **Documentation**
   - Update AGENTS.md with DI patterns
   - Add examples for testing with DI

**Deliverables:**

- Complete DI infrastructure
- Comprehensive documentation
- All tests passing with improved coverage

---

## Expected Outcomes

### Testability Improvements

| Metric | Current | Target | Improvement |
|--------|---------|--------|-------------|
| Mocks per test | 4.2 | 1.5 | 64% reduction |
| Test setup / logic ratio | 3.1 | 1.2 | 61% reduction |
| Tests with global state | 18% | 0% | Eliminated |
| Average test time | 120ms | 40ms | 67% faster |

### Code Quality Improvements

| Metric | Current | Target | Improvement |
|--------|---------|--------|-------------|
| Cyclomatic complexity | Medium | Low | Simplified |
| Coupling (afferent/efferent) | High | Medium | Reduced |
| Test coverage | 58% | 80%+ | 38% increase |
| Maintainability index | 65 | 80+ | 23% increase |

### Developer Experience

- ✅ Clear dependency graph
- ✅ Easier to understand code flow
- ✅ Faster test execution
- ✅ More confident refactoring
- ✅ Better IDE support (type hints)

---

## Risks & Mitigations

### Risk 1: Breaking Changes

**Mitigation:**

- Maintain backward compatibility during transition
- Use deprecation warnings for old patterns
- Provide migration guide

### Risk 2: Performance Overhead

**Mitigation:**

- Container lookups are O(1) - negligible impact
- Profile before/after to verify
- Cache frequently accessed services

### Risk 3: Learning Curve

**Mitigation:**

- Document patterns in AGENTS.md
- Provide code examples
- Pair programming for complex refactors

---

## Success Criteria

**Phase 1 Complete:**

- No global mutable state in `config.py`
- All tests passing
- Container used for CacheManager

**Phase 2 Complete:**

- Database protocol defined and used
- In-memory database for testing
- Test execution time reduced by 30%

**Phase 3 Complete:**

- HTTP client protocol in use
- No direct `requests.Session` usage
- HTTP mocking simplified

**Phase 4 Complete:**

- Single configuration loading mechanism
- No duplicate env parsing
- Configuration tests comprehensive

**Phase 5 Complete:**

- All opportunities implemented
- Test coverage >80%
- Documentation complete

---

## Related Documents

- [Scattered Instantiation Findings](scattered-instantiation.md)
- [Tight Coupling Findings](tight-coupling.md)
- [Existing DI-like Patterns](existing-patterns.md)
- [Analysis Summary](summary.md)
+298 −0

File added.

Preview size limit exceeded, changes collapsed.

+173 −0

File added.

Preview size limit exceeded, changes collapsed.

Loading