Commit bfcaad10 authored by Jan Reimes

Fix MustDowngradeError: disable HTTP/3 on all niquests sessions

The 3GPP portal advertises Alt-Svc: h3 but can't actually handle
HTTP/3, causing niquests to retry 3 times with a 10s timeout per
request. Fix: add disable_http3=True to ALL Session() creation
sites, not just create_cached_session().

Affected files:
- meetings/sources/portal.py: plain niquests.Session()
- tdocs/operations/checkout.py: 3 x requests.Session() context managers

Documented the rule in src/tdoc_crawler/AGENTS.md under a dedicated
'HTTP/3 MustDowngradeError' section with exact rationale, scope
table, and enforcement instructions.
parent e1fd4257
+27 −0
@@ -32,6 +32,33 @@ with create_cached_session() as session:
    response = session.get(url)
```

## HTTP/3 MustDowngradeError (CRITICAL)

When creating a **raw** `requests.Session()` or `niquests.Session()` (not via `create_cached_session`), **ALWAYS** pass `disable_http3=True`:

```python
# CORRECT — disables HTTP/3 to avoid MustDowngradeError from 3GPP servers
session = requests.Session(disable_http3=True)
# or
with requests.Session(disable_http3=True) as session:
    ...

# WRONG — will crash with MustDowngradeError on 3GPP portal
session = requests.Session()  # NEVER this
```

**Why:** `portal.3gpp.org` and `www.3gpp.org` advertise HTTP/3 via the `Alt-Svc` header but cannot actually serve it. niquests tries to upgrade, fails with `MustDowngradeError`, and wastes up to 10 seconds retrying per request.
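
To check whether a host advertises HTTP/3 at all, the `Alt-Svc` response header can be inspected directly. The following is a minimal sketch, not part of this codebase: `advertises_http3` is a hypothetical helper, the header values in the usage lines are illustrative rather than captured from the 3GPP servers, and the parsing assumes the simple comma-separated form of the header without commas inside quoted values.

```python
def advertises_http3(alt_svc: str) -> bool:
    """Return True if an Alt-Svc header value lists an HTTP/3 alternative.

    Hypothetical helper: splits the header on commas and looks at the
    protocol id in front of the first '=' of each entry.
    """
    for entry in alt_svc.split(","):
        protocol_id = entry.split("=", 1)[0].strip().lower()
        # "h3" is the final HTTP/3 token; "h3-NN" covers draft versions.
        if protocol_id == "h3" or protocol_id.startswith("h3-"):
            return True
    return False


# Illustrative header values (assumed, not taken from a real response).
print(advertises_http3('h3=":443"; ma=86400'))
print(advertises_http3('h2=":443"; ma=60'))
```

A `True` result only means the server claims HTTP/3 support; as described above, the 3GPP hosts make that claim without honoring it, which is why the upgrade has to be disabled on the client side.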

**Scope:** Applies to ALL naked `Session()` calls outside `create_cached_session()`; the factory's own call is listed for completeness:

| File | Session call | Fixed? |
|------|--------------|--------|
| `meetings/sources/portal.py` | `niquests.Session(...)` | ✅ |
| `http_client/session.py` | `requests.Session(...)` (via `create_cached_session`) | ✅ |
| `tdocs/operations/checkout.py` | `requests.Session(...)` (3 locations) | ✅ |

**Enforcement:** If you ever create a raw `niquests.Session()` or `requests.Session()` (aliased from niquests), add `disable_http3=True` without exception. The `create_cached_session()` factory already handles this correctly.
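
One way to back this rule with an automated check is a small static scan. The sketch below is an assumption, not something this commit adds: `find_unsafe_session_calls` is a hypothetical helper that uses the standard-library `ast` module to report every `Session(...)` call that does not pass the literal keyword `disable_http3=True`. It matches on the callee name only, so it would also flag unrelated classes named `Session`.

```python
import ast


def find_unsafe_session_calls(source: str) -> list[int]:
    """Return line numbers of Session(...) calls lacking disable_http3=True.

    Hypothetical lint helper: matches `Session(...)` and `<module>.Session(...)`
    by name only and requires the literal keyword `disable_http3=True`.
    """
    unsafe = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # `niquests.Session` is an Attribute node, a bare `Session` is a Name node.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "Session":
            continue
        has_flag = any(
            kw.arg == "disable_http3"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not has_flag:
            unsafe.append(node.lineno)
    return sorted(unsafe)


sample = (
    "import niquests\n"
    "bad = niquests.Session()\n"
    "good = niquests.Session(disable_http3=True)\n"
)
print(find_unsafe_session_calls(sample))  # [2]
```

Run over each `.py` file in the package from a unit test, a non-empty result would fail the build before a naked `Session()` reaches the 3GPP portal.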

## Anti-Duplication (DRY)

Search before implementing. Check relevant domain package (`tdocs/`, `meetings/`, `specs/`). Refactor rather than duplicate.
+1 −1
@@ -114,7 +114,7 @@ def fetch_meetings(

    # Use a plain session — the cached session (hishel) doesn't handle
    # POST endpoints with JSON bodies correctly.
-    session = niquests.Session()
+    session = niquests.Session(disable_http3=True)
    try:
        while True:
            payload = {
+2 −2
@@ -186,7 +186,7 @@ def checkout_tdoc(
            corrected_url = _resolve_corrected_url(metadata)
            if corrected_url and corrected_url != metadata.url:
                logger.info("Retrying %s with corrected URL: %s", metadata.tdoc_id, corrected_url)
-                with requests.Session() as plain_session:
+                with requests.Session(disable_http3=True) as plain_session:
                    download_to_file(corrected_url, temp_zip_file, session=plain_session)
            else:
                raise
@@ -197,7 +197,7 @@ def checkout_tdoc(
            corrected_url = _resolve_corrected_url(metadata)
            if corrected_url and corrected_url != metadata.url:
                logger.info("Retrying %s with corrected URL: %s", metadata.tdoc_id, corrected_url)
-                with requests.Session() as plain_session:
+                with requests.Session(disable_http3=True) as plain_session:
                    download_to_file(corrected_url, temp_zip_file, session=plain_session)
            else:
                raise