@@ -147,6 +165,48 @@ Each table row contains several meeting-specific data and URLs:
- The column "Files" contains an entry "Files" with a link to the TDoc FTP directory.
- If the column "Files" is empty, the meeting is not yet set up and no TDocs are available for it. Skip these meetings when crawling TDocs later.
## Project Structure
The project follows a modular architecture with clear separation of concerns:
### Source Code Organization
```
src/tdoc_crawler/
├── models/ # Data models and configuration
│ ├── __init__.py # Re-exports all public symbols for backward compatibility
# Database uses COLLATE NOCASE for case-insensitive matching
cursor = self.conn.execute(
"SELECT * FROM tdocs WHERE tdoc_id = ?",
(normalized_id,)
)
# ...
```
## Usage of uv and project management
- Use `uv` for creating isolated Python environments instead of `virtualenv` or `venv`. This ensures consistency across different development setups and simplifies dependency management.
@@ -302,6 +650,8 @@ All other fields are optional and may be added as needed.
## Database Guidelines
### General Database Principles
- Use SQLite as the database for storing TDoc and meeting metadata.
- Design the database schema to efficiently store and query TDoc and meeting metadata.
- Use appropriate indexing to optimize query performance.
@@ -310,6 +660,160 @@ All other fields are optional and may be added as needed.
- Use `pydantic` dataclasses to define the database schema and ensure data integrity.
- Use `pydantic` models to represent database entities and ensure data integrity.
### Complete Database Schema
The database consists of five tables with proper foreign key relationships:
#### 1. Reference Tables: `working_groups` and `subworking_groups`
CREATE UNIQUE INDEX IF NOT EXISTS idx_subworking_groups_tbid_name
ON subworking_groups(tbid, name);
```
**Purpose**: Store the static hierarchy of 3GPP working groups and their subgroups.
**Initialization**: These tables are populated at application startup from the `WorkingGroup` enum and the `SUBWORKING_GROUPS` list, defined in `models/working_groups.py` and `models/subworking_groups.py` respectively.
uv run pytest --cov=src/tdoc_crawler --cov-report=term-missing
# Run specific test file
uv run pytest tests/test_cli.py -v
# Run specific test
uv run pytest tests/test_cli.py::TestQueryCommand::test_query_specific_tdoc -v
```
## Documentation
### Code Documentation
@@ -410,5 +1057,5 @@ Document your review findings in the file `docs/REVIEW_AND_IMPROVEMENTS_AGENTS_M
The actual update of AGENTS.md will be done only after explicit user confirmation and after a prompt similar to this one:
```markdown
Based on the review findings in the file `docs/REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md`, please update the coding instruction file AGENTS.md accordingly. Make sure to incorporate all relevant suggestions from the review document, ensuring that the updated AGENTS.md reflects the best practices and guidelines for coding assistants to (re-)generate the code basis as close as possible.
Based on the review findings in the file #file:REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md (`docs/REVIEW_AND_IMPROVEMENTS_AGENTS_MD.md`), please update the coding instruction file AGENTS.md accordingly. Make sure to incorporate all relevant suggestions from the review document, ensuring that the updated AGENTS.md reflects the best practices and guidelines for coding assistants to (re-)generate the code basis as close as possible. You might move the current section regarding "Reviews of AGENTS.md" to a different place, but keep it unchanged.