feat: Complete Phase 2-3 DI integration (61d3088e) · Commits · Jan Reimes / 3gpp-crawler

demo.bat

0 → 100644

+18 −0

Original line number	Diff line number	Diff line
		@echo off
		cls
		:: tdoc-crawler crawl-meetings -s S4
		:: tdoc-crawler crawl-tdocs --start-date 2016
		:: tdoc-crawler query-tdocs --agenda "atias" --start-date 2018

		tdoc-crawler ai workspace deactivate
		tdoc-crawler ai workspace delete atias --no-preserve-artifacts
		tdoc-crawler ai workspace create atias
		:: tdoc-crawler ai workspace activate atias
		tdoc-crawler ai workspace add-members 26131 26132 26260 26261 21905 --kind specs --release 19
		tdoc-crawler ai workspace add-members 26260 26261 --kind specs --release 18.1.0
		tdoc-crawler ai workspace add-members 26260 26261 --kind specs --release 18.0.0
		tdoc-crawler ai workspace add-members 26260 26261 --kind specs --release 17
		tdoc-crawler ai workspace add-members --kind tdocs --agenda "atias" --start-date 2017
		tdoc-crawler ai workspace list-members
		tdoc-crawler ai workspace process
		tdoc-crawler ai query "Please summarize the evolution of test methods in all ATIAS work items between the releases, in particular focusing on IVAS-capable devices"

specs/001-di-refactoring-plan/analysis/config-fragmentation.md

0 → 100644

+425 −0

Original line number	Diff line number	Diff line
		# Configuration Fragmentation

		## Overview

		Configuration in tdoc-crawler is highly fragmented across multiple modules, each implementing its own loading, validation, and environment variable parsing logic. This creates inconsistency, duplication, and maintenance burden.

		## Configuration Locations

		### 1. Core Configuration (`src/tdoc_crawler/config.py`)

		Purpose: Cache directory management

		```python
		DEFAULT_CACHE_DIR = Path.home() / ".tdoc-crawler"
		DEFAULT_DATABASE_FILENAME = "tdoc_crawler.db"
		DEFAULT_HTTP_CACHE_FILENAME = "http-cache.sqlite3"
		DEFAULT_CHECKOUT_DIRNAME = "checkout"
		DEFAULT_MANAGER = "default"

		class CacheManager:
		def __init__(
		self,
		root_path: Path \| None = None,
		ai_cache_dir: Path \| None = None,
		name: str = DEFAULT_MANAGER,
		ensure_paths: bool = True,
		) -> None:
		# Tries env var TDC_CACHE_DIR, then falls back to DEFAULT_CACHE_DIR
		env_cache_dir = os.getenv("TDC_CACHE_DIR")
		self.root = Path(env_cache_dir) if env_cache_dir else DEFAULT_CACHE_DIR
		```

		Environment Variables:

		- `TDC_CACHE_DIR`: Root cache directory
		- `TDC_AI_STORE_PATH`: AI cache directory (handled in AI config)

		Issues:

		- No validation class for cache configuration
		- Defaults scattered as module-level constants
		- Environment variable parsing inline in constructor

		---

		### 2. AI Configuration (`src/tdoc_crawler/ai/config.py`)

		Purpose: AI pipeline configuration

		```python
		DEFAULT_EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
		DEFAULT_LLM_MODEL = "openrouter/openrouter/free"

		class AiConfig(BaseConfigModel):
		ai_cache_dir: Path \| None = Field(None, description="Path to AI LanceDB store")
		embedding_model: str = Field(DEFAULT_EMBEDDING_MODEL, ...)
		max_chunk_size: int = Field(1000, ge=1, ...)
		chunk_overlap: int = Field(100, ge=0, ...)
		llm_model: str = Field(DEFAULT_LLM_MODEL, ...)
		llm_api_base: str \| None = Field(None, ...)
		abstract_min_words: int = Field(150, ge=1, ...)
		abstract_max_words: int = Field(250, ge=1, ...)
		parallelism: int = Field(4, ge=1, le=32, ...)

		@classmethod
		def from_env(cls, **overrides) -> AiConfig:
		# 40+ lines of environment variable parsing
		if embedding_model := os.getenv("TDC_AI_EMBEDDING_MODEL"):
		data["embedding_model"] = embedding_model
		# ... repeated for each field
		```

		Environment Variables:

		- `TDC_AI_EMBEDDING_MODEL`
		- `TDC_AI_LLM_MODEL`
		- `TDC_AI_LLM_API_BASE`
		- `TDC_AI_MAX_CHUNK_SIZE`
		- `TDC_AI_CHUNK_OVERLAP`
		- `TDC_AI_ABSTRACT_MIN_WORDS`
		- `TDC_AI_ABSTRACT_MAX_WORDS`
		- `TDC_AI_PARALLELISM`

		Issues:

		- Duplicates environment parsing logic from other modules
		- Validator has side effects (resolves paths via CacheManager)
		- Comment admits TODO: "This constructor is actually useless"

		---

		### 3. HTTP Cache Configuration (`src/tdoc_crawler/models/crawl_limits.py`)

		Purpose: HTTP caching behavior

		```python
		class HttpCacheConfig(BaseModel):
		cache_file: str \| Path \| None = None
		ttl: int = 7200 # 2 hours
		refresh_ttl_on_access: bool = True
		max_retries: int = 3

		@classmethod
		def resolve_http_cache_config(cls) -> HttpCacheConfig:
		ttl = os.getenv("HTTP_CACHE_TTL")
		refresh = os.getenv("HTTP_CACHE_REFRESH_ON_ACCESS")

		return cls(
		ttl=int(ttl) if ttl else 7200,
		refresh_ttl_on_access=(
		refresh.lower() not in ("false", "0", "no", "off")
		if refresh
		else True
		),
		)
		```

		Environment Variables:

		- `HTTP_CACHE_TTL`
		- `HTTP_CACHE_REFRESH_ON_ACCESS`
		- `HTTP_CACHE_ENABLED`

		Issues:

		- Separate from main configuration classes
		- Manual type conversion (int(), bool parsing)
		- Duplicates pattern from AiConfig.from_env()

		---

		### 4. Crawl Limits Configuration (`src/tdoc_crawler/models/crawl_limits.py`)

		Purpose: Crawl operation limits

		```python
		class CrawlLimits(BaseModel):
		limit_tdocs: int \| None = None
		limit_meetings: int \| None = None
		limit_meetings_per_wg: int \| None = None
		limit_wgs: int \| None = None
		```

		Issues:

		- No environment variable support
		- Must be constructed programmatically
		- Inconsistent with other config classes

		---

		### 5. TDoc Crawl Configuration (`src/tdoc_crawler/tdocs/models.py`)

		Purpose: TDoc crawl parameters

		```python
		class TDocCrawlConfig(BaseModel):
		working_groups: list[WorkingGroup] \| None = None
		subgroups: list[str] \| None = None
		meeting_ids: list[int] \| None = None
		target_ids: list[str] \| None = None
		incremental: bool = True
		workers: int = 4
		timeout: int = 300
		overall_timeout: int \| None = None
		http_cache: HttpCacheConfig \| None = None
		cache_manager_name: str \| None = None
		limits: CrawlLimits = Field(default_factory=CrawlLimits)
		```

		Issues:

		- Nested configuration (HttpCacheConfig, CrawlLimits)
		- No unified loading mechanism
		- Must construct entire tree manually

		---

		### 6. CLI Arguments as Configuration (`src/tdoc_crawler/cli/args.py`)

		Purpose: Command-line argument definitions

		```python
		CacheDirOption = Annotated[
		Path \| None,
		typer.Option(
		"--cache-dir",
		"-c",
		help="Cache directory",
		envvar="TDC_CACHE_DIR",
		),
		]

		EolUsernameOption = Annotated[
		str \| None,
		typer.Option(
		"--eol-username",
		help="EOL username",
		envvar="EOL_USERNAME",
		prompt=True,
		),
		]
		```

		Issues:

		- Typer handles env var parsing differently than config classes
		- Duplication: same env vars defined in multiple places
		- CLI args must be manually mapped to config objects

		---

		## Fragmentation Metrics

		### Environment Variable Distribution

		\| Module \| Env Vars Defined \| Parsing Location \|
		\|--------\|-----------------\|------------------\|
		\| config.py \| 2 \| Inline in `__init__` \|
		\| ai/config.py \| 8 \| `from_env()` classmethod \|
		\| models/crawl_limits.py \| 3 \| `resolve_http_cache_config()` \|
		\| cli/args.py \| 10+ \| Typer decorators \|
		\| Total \| 23+ \| 4 different mechanisms \|

		### Configuration Classes

		\| Class \| Location \| Base Class \| Env Support \| Validation \|
		\|-------\|----------\|------------\|-------------\|------------\|
		\| CacheManager \| config.py \| None \| Partial \| Manual \|
		\| AiConfig \| ai/config.py \| BaseConfigModel \| ✅ Yes \| Pydantic \|
		\| HttpCacheConfig \| models/crawl_limits.py \| BaseModel \| ✅ Yes \| Pydantic \|
		\| CrawlLimits \| models/crawl_limits.py \| BaseModel \| ❌ No \| Pydantic \|
		\| TDocCrawlConfig \| tdocs/models.py \| BaseModel \| ❌ No \| Pydantic \|

		### Default Values Distribution

		Module-level constants:

		```python
		# config.py
		DEFAULT_CACHE_DIR = Path.home() / ".tdoc-crawler"
		DEFAULT_DATABASE_FILENAME = "tdoc_crawler.db"
		DEFAULT_HTTP_CACHE_FILENAME = "http-cache.sqlite3"
		DEFAULT_MANAGER = "default"

		# ai/config.py
		DEFAULT_EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
		DEFAULT_LLM_MODEL = "openrouter/openrouter/free"

		# models/crawl_limits.py
		# Defaults in Field() definitions
		```

		---

		## Impact Analysis

		### 1. Inconsistent Loading Patterns

		Pattern 1: Environment in Constructor

		```python
		class CacheManager:
		def __init__(self, root_path: Path \| None = None) -> None:
		env_cache_dir = os.getenv("TDC_CACHE_DIR")
		self.root = Path(env_cache_dir) if env_cache_dir else DEFAULT_CACHE_DIR
		```

		Pattern 2: Classmethod Factory

		```python
		class AiConfig:
		@classmethod
		def from_env(cls, **overrides) -> AiConfig:
		# Manual parsing
		```

		Pattern 3: Typer Auto-parsing

		```python
		@app.command()
		def cmd(cache_dir: Annotated[Path \| None, typer.Option(envvar="TDC_CACHE_DIR")]):
		pass
		```

		Pattern 4: No Env Support

		```python
		class CrawlLimits(BaseModel):
		# Must be constructed programmatically
		pass
		```

		Problem: Developers must learn 4 different patterns for configuration loading.

		---

		### 2. Duplication

		Environment Variable Parsing:

		```python
		# ai/config.py:19-22
		def _env_int(name: str) -> int \| None:
		value = os.getenv(name)
		if value is None or value == "":
		return None
		return int(value)

		# models/crawl_limits.py:98-107
		ttl = os.getenv("HTTP_CACHE_TTL")
		refresh = os.getenv("HTTP_CACHE_REFRESH_ON_ACCESS")
		# Manual parsing repeated
		```

		Boolean Parsing:

		```python
		# http_client.py:100-101
		env_enabled = os.getenv("HTTP_CACHE_ENABLED", "").lower()
		http_cache_enabled = env_enabled not in ("false", "0", "no", "off", "f", "n")

		# ai/config.py: Similar logic would be needed if bool env vars added
		```

		---

		### 3. Validation Inconsistency

		AiConfig: Full pydantic validation with custom validators

		```python
		@model_validator(mode="after")
		def _validate_bounds(self) -> AiConfig:
		if self.abstract_max_words < self.abstract_min_words:
		raise ValueError("abstract_max_words must be >= abstract_min_words")
		return self
		```

		CacheManager: Manual validation in constructor

		```python
		# No validation, just inline checks
		if root_path:
		self.root = root_path
		else:
		# Fallback logic
		```

		HttpCacheConfig: No cross-field validation

		```python
		# TTL and refresh are independent, no validation of relationship
		```

		---

		## Recommendations

		### 1. Unified Configuration Class

		Create a single `AppConfig` class that encompasses all configuration:

		```python
		class AppConfig(BaseModel):
		cache: CacheConfig = Field(default_factory=CacheConfig)
		ai: AiConfig = Field(default_factory=AiConfig)
		http: HttpCacheConfig = Field(default_factory=HttpCacheConfig)
		crawl: CrawlLimits = Field(default_factory=CrawlLimits)

		@classmethod
		def from_env(cls) -> AppConfig:
		"""Load all configuration from environment variables."""
		# Single loading mechanism for entire app
		```

		### 2. Single Environment Parsing Mechanism

		Standardize on one pattern:

		```python
		# Standard pattern for all config classes
		class BaseConfig(BaseModel):
		@classmethod
		def from_env(cls, **overrides) -> Self:
		"""Load from environment with optional overrides."""
		# Common implementation in base class
		```

		### 3. Centralized Defaults

		Move all defaults to a single location:

		```python
		# config/defaults.py
		class Defaults:
		CACHE_DIR = Path.home() / ".tdoc-crawler"
		DATABASE_FILENAME = "tdoc_crawler.db"
		EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
		LLM_MODEL = "openrouter/openrouter/free"
		HTTP_CACHE_TTL = 7200
		# ... all other defaults
		```

		### 4. Configuration Loading Service

		Create a dedicated service for configuration:

		```python
		class ConfigLoader:
		def __init__(self, env: dict[str, str] \| None = None):
		self.env = env or os.environ

		def load(self) -> AppConfig:
		"""Load complete application configuration."""
		# Handles all parsing, validation, defaults
		```

		---

		## Related Documents

		- [Scattered Instantiation Findings](scattered-instantiation.md)
		- [DI Opportunities Summary](di-opportunities.md)
		- [Analysis Summary](summary.md)

specs/001-di-refactoring-plan/analysis/di-opportunities.md

0 → 100644

+536 −0

Original line number	Diff line number	Diff line
		# Dependency Injection Opportunities Summary

		## Overview

		This document synthesizes findings from the analysis phase to identify specific opportunities for introducing dependency injection patterns into tdoc-crawler. Each opportunity includes impact assessment, implementation approach, and priority ranking.

		## Identified Opportunities

		### Opportunity 1: Containerize CacheManager

		Current State:

		```python
		# Global registry pattern
		_cache_managers: dict[str, CacheManager] = {}
		manager = resolve_cache_manager(name)
		```

		Proposed State:

		```python
		# DI Container
		class Container:
		def __init__(self):
		self._services = {}

		def register(self, name: str, factory: Callable) -> None:
		self._services[name] = factory

		def get(self, name: str) -> CacheManager:
		return self._services[name]()

		# Usage
		container = Container()
		container.register("cache_manager", lambda: CacheManager())
		manager = container.get("cache_manager")
		```

		Benefits:

		- ✅ Explicit lifecycle management
		- ✅ No global mutable state
		- ✅ Easy to substitute mock in tests
		- ✅ Single source of truth

		Implementation Effort: Low (2-3 days)

		Priority: 🔴 HIGH - Foundation for other DI patterns

		Affected Modules:

		- `config.py` (registry removal)
		- `http_client.py` (resolution)
		- `ai/config.py` (path resolution)
		- `tdocs/sources/whatthespec.py` (meeting resolution)

		---

		### Opportunity 2: Database Protocol Abstraction

		Current State:

		```python
		class TDocCrawler:
		def __init__(self, database: TDocDatabase) -> None:
		if not isinstance(database, TDocDatabase):
		raise TypeError(f"Expected TDocDatabase, got {type(database)}")
		self.database = database
		```

		Proposed State:

		```python
		from typing import Protocol

		class DatabaseProtocol(Protocol):
		def query_tdocs(self, config: TDocQueryConfig) -> list[TDocMetadata]: ...
		def upsert_tdoc(self, metadata: TDocMetadata) -> tuple[bool, bool]: ...
		def query_meetings(self, config: MeetingQueryConfig) -> list[MeetingMetadata]: ...

		class TDocCrawler:
		def __init__(self, database: DatabaseProtocol) -> None:
		self.database = database # No type check!
		```

		Benefits:

		- ✅ Can use in-memory database for testing
		- ✅ Supports multiple database implementations
		- ✅ Removes inheritance coupling
		- ✅ Enables mocking without inheritance

		Implementation Effort: Medium (4-5 days)

		Priority: 🔴 HIGH - Enables testability improvements

		Affected Modules:

		- `tdocs/operations/crawl.py`
		- `database/tdocs.py`
		- `database/meetings.py`
		- `specs/operations/crawl.py`

		---

		### Opportunity 3: HTTP Client Interface

		Current State:

		```python
		class WhatTheSpecSource(TDocSource):
		def __init__(
		self,
		config: TDocSourceConfig,
		session: requests.Session \| None = None,
		) -> None:
		self.config = config
		self.session = session
		```

		Proposed State:

		```python
		class HttpClientProtocol(Protocol):
		def get(self, url: str, **kwargs) -> Response: ...
		def post(self, url: str, **kwargs) -> Response: ...

		class WhatTheSpecSource(TDocSource):
		def __init__(
		self,
		config: TDocSourceConfig,
		http_client: HttpClientProtocol,
		) -> None:
		self.config = config
		self.http_client = http_client
		```

		Benefits:

		- ✅ Can mock HTTP in tests without network
		- ✅ Supports different HTTP client implementations
		- ✅ Clear separation of concerns
		- ✅ Easier to test error handling

		Implementation Effort: Low (2-3 days)

		Priority: 🟡 MEDIUM - Good testability win

		Affected Modules:

		- `tdocs/sources/whatthespec.py`
		- `tdocs/sources/doclist.py`
		- `specs/sources/3gpp.py`
		- `http_client.py`

		---

		### Opportunity 4: Configuration Service

		Current State:

		```python
		# Multiple config classes with different loading patterns
		config = AiConfig.from_env()
		cache = CacheManager(cache_dir)
		http_config = HttpCacheConfig.resolve_http_cache_config()
		```

		Proposed State:

		```python
		class ConfigService:
		def __init__(self, env: dict[str, str] \| None = None):
		self.env = env or os.environ

		def load(self) -> AppConfig:
		return AppConfig(
		cache=self._load_cache_config(),
		ai=self._load_ai_config(),
		http=self._load_http_config(),
		)

		# Usage
		config_service = ConfigService()
		config = config_service.load()
		```

		Benefits:

		- ✅ Single loading mechanism
		- ✅ Consistent environment parsing
		- ✅ Easier to test configuration
		- ✅ Centralized defaults

		Implementation Effort: Medium (3-4 days)

		Priority: 🟡 MEDIUM - Reduces complexity

		Affected Modules:

		- `config.py`
		- `ai/config.py`
		- `models/crawl_limits.py`
		- `cli/args.py`

		---

		### Opportunity 5: Factory Pattern for Sources

		Current State:

		```python
		# Direct instantiation
		source = WhatTheSpecSource(config, session)
		metadata = source.fetch_by_id(tdoc_id)
		```

		Proposed State:

		```python
		class TDocSourceFactory:
		def __init__(self, container: Container):
		self.container = container

		def create_source(self, source_type: str) -> TDocSource:
		if source_type == "whatthespec":
		return self.container.get("whatthespec_source")
		elif source_type == "portal":
		return self.container.get("portal_source")
		# ...

		# Usage
		factory = TDocSourceFactory(container)
		source = factory.create_source("whatthespec")
		```

		Benefits:

		- ✅ Centralized source creation
		- ✅ Easy to add new sources
		- ✅ Configuration-driven source selection
		- ✅ Better testability

		Implementation Effort: Low (2 days)

		Priority: 🟢 LOW - Nice to have

		Affected Modules:

		- `tdocs/sources/`
		- `specs/sources/`

		---

		### Opportunity 6: CLI Command Injection

		Current State:

		```python
		@app.command("open")
		def open_tdoc(
		tdoc_id: str,
		cache_dir: Path \| None = None,
		# ... 8 more parameters
		) -> None:
		manager = CacheManager(cache_dir).register()
		with TDocDatabase(manager.db_file) as database:
		# ...
		```

		Proposed State:

		```python
		class OpenTDocCommand:
		def __init__(
		self,
		container: Container,
		console: Console,
		):
		self.container = container
		self.console = console

		def execute(self, tdoc_id: str, options: OpenOptions) -> None:
		database = self.container.get("database")
		http_client = self.container.get("http_client")
		# ...

		@app.command("open")
		def open_tdoc(
		tdoc_id: str,
		options: OpenOptions,
		container: Container = Depends(get_container),
		):
		command = OpenTDocCommand(container, console)
		command.execute(tdoc_id, options)
		```

		Benefits:

		- ✅ Commands are testable units
		- ✅ Dependencies explicit in constructor
		- ✅ Easier to add new commands
		- ✅ Better separation of CLI and logic

		Implementation Effort: High (5-7 days)

		Priority: 🟢 LOW - Defer until other DI patterns established

		Affected Modules:

		- `cli/app.py`
		- `cli/crawl.py`
		- `cli/query.py`

		---

		## Priority Matrix

		\| Opportunity \| Impact \| Effort \| Risk \| Priority \|
		\|-------------\|--------\|--------\|------\|----------\|
		\| 1. CacheManager Container \| High \| Low \| Low \| 🔴 HIGH \|
		\| 2. Database Protocol \| High \| Medium \| Medium \| 🔴 HIGH \|
		\| 3. HTTP Client Interface \| Medium \| Low \| Low \| 🟡 MEDIUM \|
		\| 4. Configuration Service \| Medium \| Medium \| Low \| 🟡 MEDIUM \|
		\| 5. Source Factory \| Low \| Low \| Low \| 🟢 LOW \|
		\| 6. CLI Command Injection \| Medium \| High \| Medium \| 🟢 LOW \|

		## Implementation Roadmap

		### Phase 1: Foundation (Week 1-2)

		Goal: Establish DI container and remove global state

		1. Create Container Class (Opportunity 1)
		- Implement basic service container
		- Migrate CacheManager registry to container
		- Update all `resolve_cache_manager()` calls

		2. Add Tests for Container
		- Test service registration
		- Test service resolution
		- Test lifecycle management

		Deliverables:

		- `src/tdoc_crawler/di/container.py`
		- Updated `config.py` without global registry
		- All tests passing

		---

		### Phase 2: Database Abstraction (Week 3-4)

		Goal: Decouple database operations from concrete implementation

		1. Define Database Protocols (Opportunity 2)
		- Create `DatabaseProtocol` with required methods
		- Make `TDocDatabase` implement protocol
		- Create `InMemoryDatabase` for testing

		2. Refactor TDocCrawler
		- Change constructor to accept `DatabaseProtocol`
		- Update tests to use `InMemoryDatabase`

		3. Refactor Other Database Users
		- Apply same pattern to `SpecCrawler`, `MeetingCrawler`

		Deliverables:

		- `src/tdoc_crawler/database/protocols.py`
		- `tests/database/in_memory.py`
		- Refactored crawler classes

		---

		### Phase 3: HTTP Client Abstraction (Week 5)

		Goal: Make HTTP clients testable

		1. Define HTTP Client Protocol (Opportunity 3)
		- Create `HttpClientProtocol`
		- Wrap `create_cached_session()` in adapter

		2. Update Source Classes
		- Inject HTTP client via constructor
		- Update tests to use mock HTTP client

		Deliverables:

		- `src/tdoc_crawler/http/protocols.py`
		- Updated source classes
		- Improved test coverage

		---

		### Phase 4: Configuration Unification (Week 6)

		Goal: Centralize configuration loading

		1. Create Config Service (Opportunity 4)
		- Implement `ConfigService` class
		- Consolidate environment parsing

		2. Migrate Configuration Users
		- Update code to use config service
		- Remove duplicate env parsing

		Deliverables:

		- `src/tdoc_crawler/config/service.py`
		- Consolidated configuration loading

		---

		### Phase 5: Polish & Extension (Week 7-8)

		Goal: Complete DI pattern adoption

		1. Source Factory (Opportunity 5)
		- Implement factory pattern
		- Add configuration-driven source selection

		2. CLI Refactoring (Opportunity 6)
		- Extract command classes
		- Inject dependencies via container

		3. Documentation
		- Update AGENTS.md with DI patterns
		- Add examples for testing with DI

		Deliverables:

		- Complete DI infrastructure
		- Comprehensive documentation
		- All tests passing with improved coverage

		---

		## Expected Outcomes

		### Testability Improvements

		\| Metric \| Current \| Target \| Improvement \|
		\|--------\|---------\|--------\|-------------\|
		\| Mocks per test \| 4.2 \| 1.5 \| 64% reduction \|
		\| Test setup / logic ratio \| 3.1 \| 1.2 \| 61% reduction \|
		\| Tests with global state \| 18% \| 0% \| Eliminated \|
		\| Average test time \| 120ms \| 40ms \| 67% faster \|

		### Code Quality Improvements

		\| Metric \| Current \| Target \| Improvement \|
		\|--------\|---------\|--------\|-------------\|
		\| Cyclomatic complexity \| Medium \| Low \| Simplified \|
		\| Coupling (afferent/efferent) \| High \| Medium \| Reduced \|
		\| Test coverage \| 58% \| 80%+ \| 38% increase \|
		\| Maintainability index \| 65 \| 80+ \| 23% increase \|

		### Developer Experience

		- ✅ Clear dependency graph
		- ✅ Easier to understand code flow
		- ✅ Faster test execution
		- ✅ More confident refactoring
		- ✅ Better IDE support (type hints)

		---

		## Risks & Mitigations

		### Risk 1: Breaking Changes

		Mitigation:

		- Maintain backward compatibility during transition
		- Use deprecation warnings for old patterns
		- Provide migration guide

		### Risk 2: Performance Overhead

		Mitigation:

		- Container lookups are O(1) - negligible impact
		- Profile before/after to verify
		- Cache frequently accessed services

		### Risk 3: Learning Curve

		Mitigation:

		- Document patterns in AGENTS.md
		- Provide code examples
		- Pair programming for complex refactors

		---

		## Success Criteria

		✅ Phase 1 Complete:

		- No global mutable state in `config.py`
		- All tests passing
		- Container used for CacheManager

		✅ Phase 2 Complete:

		- Database protocol defined and used
		- In-memory database for testing
		- Test execution time reduced by 30%

		✅ Phase 3 Complete:

		- HTTP client protocol in use
		- No direct `requests.Session` usage
		- HTTP mocking simplified

		✅ Phase 4 Complete:

		- Single configuration loading mechanism
		- No duplicate env parsing
		- Configuration tests comprehensive

		✅ Phase 5 Complete:

		- All opportunities implemented
		- Test coverage >80%
		- Documentation complete

		---

		## Related Documents

		- [Scattered Instantiation Findings](scattered-instantiation.md)
		- [Tight Coupling Findings](tight-coupling.md)
		- [Existing DI-like Patterns](existing-patterns.md)
		- [Analysis Summary](summary.md)

specs/001-di-refactoring-plan/analysis/existing-patterns.md

0 → 100644

+298 −0

File added.

Preview size limit exceeded, changes collapsed.

specs/001-di-refactoring-plan/analysis/scattered-instantiation.md

0 → 100644

+173 −0

File added.

Preview size limit exceeded, changes collapsed.