Commit 1ccfc4a7 authored by Jan Reimes

refactor(server): remove unoserver management and related tests

* Delete ServerManager and ServerStatus classes for unoserver management.
* Remove related functions and tests for server detection and management.
* Update pyproject.toml to remove unoserver dependency.
* Refactor tests to remove server-related tests and adjust fixture names.
* Modify integration tests to focus on CLI mode without server dependency.
* Ensure all tests pass without the need for a running unoserver instance.
parent 0dbc163d
+141 −0
# convert-lo Usage Guide

## Basic Conversion

```python
from pathlib import Path
from convert_lo import Converter, LibreOfficeFormat

converter = Converter()
result = converter.convert(
    input_file=Path("document.docx"),
    output_format=LibreOfficeFormat.PDF,
    output_dir=Path("./output"),
)
print(f"Created: {result.output_path}")
```

## String Format

Formats can be specified as strings for convenience:

```python
from pathlib import Path
from convert_lo import Converter

converter = Converter()
result = converter.convert(
    input_file=Path("document.docx"),
    output_format="pdf",  # instead of LibreOfficeFormat.PDF
    output_dir=Path("./output"),
)
```
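This commit doesn't show how the string shorthand maps onto the enum. Below is a minimal sketch of one way to do it. It assumes `LibreOfficeFormat` is a `str`-valued `Enum` whose values are file extensions, which is consistent with the converter building output names from `output_format.value`. The `Fmt` enum and the `coerce_format` helper are hypothetical stand-ins, not the library's code:

```python
from enum import Enum


class Fmt(str, Enum):
    """Hypothetical stand-in for LibreOfficeFormat (values are file extensions)."""

    PDF = "pdf"
    DOCX = "docx"
    HTML = "html"
    ODT = "odt"
    TXT = "txt"


def coerce_format(value: "Fmt | str") -> Fmt:
    """Accept an enum member or a string such as "pdf", "PDF" or ".pdf"."""
    if isinstance(value, Fmt):
        return value
    # Normalize: trim whitespace, lowercase, drop a leading dot
    key = value.strip().lower().lstrip(".")
    try:
        # Enum lookup by value, e.g. Fmt("pdf") -> Fmt.PDF
        return Fmt(key)
    except ValueError:
        supported = ", ".join(member.value for member in Fmt)
        raise ValueError(f"Unsupported format {value!r}; expected one of: {supported}") from None
```

Failing early with the list of accepted values makes a typo like `"pfd"` easier to diagnose than an error from LibreOffice itself.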

## Supported Formats

| Format | Enum | Extensions |
|--------|------|------------|
| PDF | `LibreOfficeFormat.PDF` | .pdf |
| Word | `LibreOfficeFormat.DOCX` | .docx, .doc |
| HTML | `LibreOfficeFormat.HTML` | .html |
| Markdown | `LibreOfficeFormat.MD` | .md |
| ODT | `LibreOfficeFormat.ODT` | .odt |
| Text | `LibreOfficeFormat.TXT` | .txt |

The full list is defined in `convert_lo.formats.LibreOfficeFormat`.

## Batch Conversion

Convert multiple files sequentially:

```python
from pathlib import Path
from convert_lo import Converter, LibreOfficeFormat

converter = Converter()

files = [
    Path("doc1.docx"),
    Path("doc2.docx"),
    Path("doc3.docx"),
]

results = converter.convert_batch(
    files,
    LibreOfficeFormat.PDF,
    Path("./output"),
)

print(f"Converted {len(results)} files")
```
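`convert_batch` accepts any iterable of paths, so the input list can come from a directory scan. The sketch below also shows one way to keep going when a single file fails by collecting errors instead of stopping. `find_inputs` and `convert_all` are hypothetical helpers, and `convert_one` stands in for a call to `converter.convert(...)`:

```python
from collections.abc import Callable, Iterable
from pathlib import Path
from typing import TypeVar

R = TypeVar("R")


def find_inputs(folder: Path, pattern: str = "*.docx") -> list[Path]:
    """Hypothetical helper: matching files in a stable (sorted) order."""
    return sorted(p for p in folder.glob(pattern) if p.is_file())


def convert_all(
    files: Iterable[Path],
    convert_one: Callable[[Path], R],
) -> tuple[list[R], dict[Path, Exception]]:
    """Hypothetical helper: convert sequentially, recording failures instead of aborting."""
    results: list[R] = []
    failures: dict[Path, Exception] = {}
    for path in files:
        try:
            results.append(convert_one(path))
        except Exception as exc:  # keep going; inspect failures afterwards
            failures[path] = exc
    return results, failures
```

With the real converter, a call might look like `convert_all(find_inputs(Path("docs")), lambda p: converter.convert(p, LibreOfficeFormat.PDF, Path("./output")))`.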

## Configuration

### Custom Timeout

```python
# Increase the per-file timeout for large files (default: 300 seconds)
converter = Converter(timeout=600)
```

### Custom LibreOffice Path

```python
from pathlib import Path
from convert_lo import Converter

converter = Converter(
    soffice_path=Path("C:/Program Files/LibreOffice/program/soffice.exe")
)
```

Alternatively, set the `LIBREOFFICE_PATH` environment variable:

```bash
export LIBREOFFICE_PATH="C:/Program Files/LibreOffice/program/soffice.exe"
```

## Error Handling

```python
from pathlib import Path
from convert_lo import Converter, LibreOfficeFormat
from convert_lo.exceptions import (
    ConversionError,
    UnsupportedConversionError,
    SofficeNotFoundError,
)

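try:
    # Converter() auto-detects soffice when no path is given, so create it inside the try
    converter = Converter()
    result = converter.convert(
        Path("document.docx"),
        LibreOfficeFormat.PDF,
        Path("./output"),
    )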
except UnsupportedConversionError as e:
    print(f"Format not supported: {e}")
except ConversionError as e:
    print(f"Conversion failed: {e}")
except SofficeNotFoundError as e:
    print(f"LibreOffice not found: {e}")
```

## Performance Notes

- **Single file**: ~2-5 seconds depending on file size and complexity
- **Batch**: Sequential conversion, no parallelization
- **Large files**: Increase timeout for documents over 10MB
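One way to act on the large-file note is to choose `timeout` from the input size before constructing the converter. The constants below reuse this guide's numbers: the 300 s default, the 600 s example, and the 10 MB threshold, read here as 10 MiB. `pick_timeout` is a hypothetical helper. Since `timeout` is set per `Converter`, this approach suits single-file calls or batches grouped by size:

```python
from pathlib import Path

DEFAULT_TIMEOUT_S = 300  # Converter default per this guide
LARGE_TIMEOUT_S = 600  # value used in the "Custom Timeout" example
LARGE_FILE_BYTES = 10 * 1024 * 1024  # "over 10MB", interpreted as 10 MiB


def pick_timeout(path: Path) -> int:
    """Hypothetical helper: per-file timeout in seconds based on size on disk."""
    size = path.stat().st_size
    return LARGE_TIMEOUT_S if size > LARGE_FILE_BYTES else DEFAULT_TIMEOUT_S
```

For example: `converter = Converter(timeout=pick_timeout(Path("big_report.docx")))`.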

## CLI Usage

From the command line:

```bash
uv run python -c "
from pathlib import Path
from convert_lo import Converter, LibreOfficeFormat

converter = Converter()
converter.convert(
    Path('input.docx'),
    LibreOfficeFormat.PDF,
    Path('output_dir'),
)
"
```
+9 −28
# convert-lo

LibreOffice document conversion with CLI and server modes.
LibreOffice document conversion using headless CLI mode.

## Architecture

### Conversion Modes

| Mode | Description | Performance |
|------|-------------|-------------|
| **CLI** (default fallback) | `soffice --headless --convert-to` | Baseline |
| **Server** (unoserver) | Persistent listener | 2-4x faster batch |

### Hybrid Detection

Default (`server_mode="auto"`):
1. Checks for running unoserver
2. Uses server if available
3. Falls back to CLI silently
Uses `soffice --headless --convert-to` for reliable, cross-platform conversion.
No server or daemon required.

### Key Components

| Module | Purpose |
|--------|---------|
| `converter.py` | `Converter` class with hybrid mode |
| `server.py` | `ServerManager` lifecycle |
| `converter.py` | `Converter` class with CLI conversion |
| `locator.py` | LibreOffice executable discovery |
| `formats.py` | `LibreOfficeFormat` enum |
| `benchmark.py` | Performance comparison |

## Configuration

@@ -34,9 +21,8 @@ Default (`server_mode="auto"`):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `server_mode` | `"auto"` | `"auto"`, `"server"`, or `"cli"` |
| `server_port` | `2003` | Unoserver XML-RPC port |
| `auto_start_server` | `False` | Auto-start if needed |
| `soffice_path` | `None` | Path to soffice (auto-detected) |
| `timeout` | `300` | Seconds per file conversion |

### Environment Variables

@@ -69,11 +55,6 @@ uv run pytest tests/convert-lo/ --cov=convert_lo

## Implementation Notes

1. **Thread Safety**: LibreOffice is NOT thread-safe. Conversions run sequentially.
2. **Server Lifecycle**: Server persists after conversion. Use context manager for cleanup.
3. **Fallback Strategy**: Server failures trigger CLI retry.
4. **LSP Warnings**: `unoserver.client` import may show errors (optional import pattern).

## Usage Examples

See `docs/convert-lo-usage.md` for detailed usage patterns.
1. **Thread Safety**: Conversions run sequentially (LibreOffice is not thread-safe)
2. **Timeout**: Default 5 minute timeout per file
3. **No Server Required**: CLI mode is always available
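Suppose an application calls conversion from several threads, such as a web server's worker pool. One way to honor the sequential-only note is to route every call through a single lock. `SerializedConverter` below is a hypothetical wrapper around any object with a `convert` method. It is a sketch, not part of convert-lo:

```python
import threading
from typing import Any


class SerializedConverter:
    """Hypothetical wrapper: allows only one conversion at a time across threads."""

    def __init__(self, converter: Any) -> None:
        self._converter = converter
        self._lock = threading.Lock()

    def convert(self, *args: Any, **kwargs: Any) -> Any:
        # Only one thread at a time may reach LibreOffice; others block here
        with self._lock:
            return self._converter.convert(*args, **kwargs)
```

Threads still queue up behind the lock, so this buys safety rather than speed.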
+15 −110
# convert-lo

LibreOffice document conversion helpers with **hybrid mode** support for optimal performance.
LibreOffice document conversion using headless CLI mode.

## Features

- **Hybrid conversion**: Auto-detects running unoserver, falls back to CLI mode
- **2-4x faster** batch conversions with server mode
- **50-75% lower CPU load** when using persistent server
- **Graceful degradation**: Always works, even without server
- **Context manager**: Easy server lifecycle management
- **Simple and reliable**: Uses LibreOffice's built-in headless CLI mode
- **Cross-platform**: Works on Windows, macOS, and Linux
- **No server required**: Single file conversions without any daemon
- **Format detection**: Automatically handles format pairs

## Installation

@@ -16,19 +15,16 @@ LibreOffice document conversion helpers with **hybrid mode** support for optimal
uv add convert-lo
```

Requires:
- LibreOffice (any recent version)
- `unoserver>=3.6` (optional, enables server mode)
Requires LibreOffice (any recent version).

## Quick Start

### Basic Conversion (Auto-Detect)
### Basic Conversion

```python
from pathlib import Path
from convert_lo import Converter, LibreOfficeFormat

# Automatically uses server if available, otherwise CLI
converter = Converter()
result = converter.convert(
    input_file=Path("report.docx"),
@@ -46,70 +42,17 @@ results = converter.convert_batch(files, LibreOfficeFormat.PDF, Path("output"))
print(f"Converted {len(results)} files")
```

### Auto-Start Server (Recommended for Batches)
### String Format

```python
from convert_lo import Converter

# Automatically starts server, converts, then stops
with Converter(auto_start_server=True) as converter:
    results = converter.convert_batch(files, LibreOfficeFormat.PDF, Path("output"))
```

## Server Modes

| Mode | Behavior | Use Case |
|------|----------|----------|
| `"auto"` (default) | Detect server, fallback to CLI | General purpose |
| `"server"` | Require server, raise if unavailable | Dedicated server setups |
| `"cli"` | Always use CLI mode | Simple scripts, no server |

### Force Server Mode

```python
# Require server (raises error if unavailable)
converter = Converter(server_mode="server")

# Or auto-start if needed
converter = Converter(
    server_mode="auto",
    auto_start_server=True,
# Formats can be specified as strings
result = converter.convert(
    input_file=Path("report.docx"),
    output_format="pdf",  # instead of LibreOfficeFormat.PDF
    output_dir=Path("output"),
)
```

### Force CLI Mode

```python
# Skip server detection entirely
converter = Converter(server_mode="cli")
```

## Manual Server Management

```python
from convert_lo import ServerManager, Converter

# Start server explicitly
manager = ServerManager(port=2003)
manager.start()

# Use converter with specific server
converter = Converter(server_mode="server", server_port=2003)
results = converter.convert_batch(files, LibreOfficeFormat.PDF, Path("output"))

# Stop server
manager.stop()
```

### Check Server Status

```python
from convert_lo import is_server_running

if is_server_running("127.0.0.1", 2003):
    print("Server is available")
```

## Supported Formats

**Text**: ODT, DOC, DOCX, RTF, PDF, HTML, TXT, MD, EPUB  
@@ -120,41 +63,12 @@ if is_server_running("127.0.0.1", 2003):
```python
from convert_lo import LibreOfficeFormat

# All format examples
LibreOfficeFormat.PDF
LibreOfficeFormat.DOCX
LibreOfficeFormat.MARKDOWN  # or LibreOfficeFormat.MD
LibreOfficeFormat.HTML
```

## Performance

### Benchmark Script

Compare CLI vs server mode performance:

```bash
uv run python -m convert_lo.benchmark --file-count 20 --size medium
```

**Options:**
- `-n, --file-count`: Number of test files (default: 10)
- `-s, --size`: Document size: small/medium/large (default: medium)
- `-f, --format`: Output format (default: pdf)

### Expected Performance

| Scenario | CLI Mode | Server Mode | Speedup |
|----------|----------|-------------|---------|
| Single file | ~2-3s | ~2-3s | 1.0x |
| 10 files | ~20-30s | ~8-12s | **2-3x** |
| 50 files | ~100-150s | ~30-50s | **3-4x** |

Server mode benefits increase with:
- Larger file counts
- Complex documents
- Repeated conversions

## Configuration

### Converter Parameters
@@ -162,10 +76,7 @@ Server mode benefits increase with:
```python
Converter(
    soffice_path=None,  # Auto-detect LibreOffice
    server_host="127.0.0.1",  # Unoserver host
    server_port=2003,         # Unoserver XML-RPC port
    server_mode="auto",       # "auto" | "server" | "cli"
    auto_start_server=False,  # Auto-start if needed
    timeout=300,         # Timeout per file in seconds
)
```

@@ -195,12 +106,6 @@ except SofficeNotFoundError as e:
    print(f"LibreOffice not found: {e}")
```

### Fallback Behavior

- **Auto mode**: Server unavailable → CLI (silent)
- **Auto mode**: Server fails → CLI retry (warning logged)
- **Server mode**: Unavailable → raises `ConversionError`

## Testing

```bash
+0 −5
@@ -3,7 +3,6 @@
from convert_lo.converter import ConversionResult, Converter
from convert_lo.exceptions import ConversionError, SofficeNotFoundError, UnsupportedConversionError
from convert_lo.formats import LibreOfficeFormat
from convert_lo.server import ServerManager, ServerStatus, get_server_mode, is_server_running

__all__ = [
    "ConversionError",
@@ -11,9 +10,5 @@ __all__ = [
    "Converter",
    "LibreOfficeFormat",
    "SofficeNotFoundError",
    "ServerManager",
    "ServerStatus",
    "UnsupportedConversionError",
    "get_server_mode",
    "is_server_running",
]
+29 −149
@@ -7,20 +7,10 @@ import subprocess
from collections.abc import Iterable
from dataclasses import dataclass
from pathlib import Path
from typing import Literal

from convert_lo.exceptions import ConversionError, UnsupportedConversionError
from convert_lo.formats import UNSUPPORTED_CONVERSIONS, LibreOfficeFormat
from convert_lo.locator import find_soffice
from convert_lo.server import ServerManager, is_server_running

try:
    from unoserver.client import UnoClient

    HAS_UNOSERVER = True
except ImportError:
    UnoClient = None  # type: ignore[assignment, misc]
    HAS_UNOSERVER = False

logger = logging.getLogger(__name__)

@@ -35,116 +25,31 @@ class ConversionResult:


class Converter:
    """Convert documents using LibreOffice.
    """Convert documents using LibreOffice CLI.

    Supports two conversion modes:
    - **CLI mode**: Uses headless LibreOffice CLI (default, always available)
    - **Server mode**: Uses unoserver for faster batch conversions (requires running server)

    The converter automatically detects and uses a running unoserver when available,
    falling back to CLI mode otherwise.
    Uses headless LibreOffice CLI for reliable, cross-platform conversion.
    This is the most straightforward and reliable method.

    Example:
        ```python
        # Auto-detect server, fallback to CLI
        converter = Converter()
        result = converter.convert(input_file, "pdf", output_dir)

        # Force server mode (raises error if server unavailable)
        converter = Converter(server_mode="server")
        result = converter.convert(input_file, "pdf", output_dir)

        # Force CLI mode
        converter = Converter(server_mode="cli")
        result = converter.convert(input_file, "pdf", output_dir)
        ```
    """

    def __init__(
        self,
        soffice_path: Path | None = None,
        server_host: str = "127.0.0.1",
        server_port: int = 2003,
        server_mode: Literal["auto", "server", "cli"] = "auto",
        auto_start_server: bool = True,
        timeout: int = 300,
    ) -> None:
        """Initialize the converter.

        Args:
            soffice_path: Path to soffice executable. If None, auto-detects.
            server_host: Host for unoserver connection (default: 127.0.0.1).
            server_port: Port for unoserver connection (default: 2003).
            server_mode: Conversion mode:
                - "auto": Use server if running, else CLI (default)
                - "server": Require server (raises error if unavailable)
                - "cli": Always use CLI mode
            auto_start_server: If True and server_mode is "auto" or "server",
                attempt to start unoserver if not running.

        Raises:
            ConversionError: If server mode is required but server unavailable.
            timeout: Timeout in seconds per file conversion (default: 300).
        """
        self._soffice_path = soffice_path or find_soffice()
        self._server_host = server_host
        self._server_port = server_port
        self._server_mode = server_mode
        self._auto_start_server = auto_start_server
        self._server_manager: ServerManager | None = None
        self._client: object | None = None

        # Initialize client if server is available
        self._init_server_client()

    def _init_server_client(self) -> None:
        """Initialize unoserver client if server is available."""
        if not HAS_UNOSERVER:
            logger.debug("unoserver not installed, using CLI mode")
            return

        if self._server_mode == "cli":
            logger.debug("CLI mode forced, skipping server initialization")
            return

        # Check if server is running
        server_available = is_server_running(self._server_host, self._server_port)

        if not server_available and self._auto_start_server:
            try:
                self._server_manager = ServerManager(
                    host=self._server_host,
                    port=self._server_port,
                    soffice_path=self._soffice_path,
                )
                self._server_manager.start()
                server_available = True
                logger.info("Auto-started unoserver on %s:%d", self._server_host, self._server_port)
            except ConversionError as exc:
                logger.warning("Failed to auto-start server: %s", exc)
                if self._server_mode == "server":
                    raise

        if server_available:
            self._client = UnoClient(
                server=self._server_host,
                port=str(self._server_port),
                host_location="local",
            )
            logger.info("Using unoserver on %s:%d", self._server_host, self._server_port)
        elif self._server_mode == "server":
            msg = f"Unoserver required but not available at {self._server_host}:{self._server_port}"
            raise ConversionError(msg)
        else:
            logger.info("No server available, using CLI mode")

    @property
    def is_using_server(self) -> bool:
        """Check if converter is currently using unoserver."""
        return self._client is not None

    @property
    def server_mode(self) -> Literal["auto", "server", "cli"]:
        """Return the configured server mode."""
        return self._server_mode
        self._timeout = timeout

    def convert(
        self,
@@ -189,9 +94,6 @@ class Converter:

        try:
            logger.info("Converting %s to %s", input_file, output_format.value)
            if self._client is not None:
                self._convert_via_server(input_file, output_format.value, output_dir)
            else:
            self._run_conversion(input_file, output_format.value, output_dir)
        except subprocess.CalledProcessError as exc:
            msg = f"LibreOffice conversion failed for {input_file}: {exc.stderr or exc}"
@@ -201,41 +103,32 @@ class Converter:
            raise ConversionError(msg) from exc

        output_path = output_dir / f"{input_file.stem}.{output_format.value}"
        return ConversionResult(input_path=input_file, output_path=output_path, output_format=output_format)

    def _convert_via_server(self, input_file: Path, output_format: str, output_dir: Path) -> None:
        """Convert using unoserver.

        Args:
            input_file: Path to input file.
            output_format: Target format (e.g., 'pdf', 'docx').
            output_dir: Output directory.
        """
        if self._client is None:
            msg = "unoserver client not initialized"
            raise ConversionError(msg)

        output_path = output_dir / f"{input_file.stem}.{output_format}"

        try:
            self._client.convert(
                inpath=str(input_file),
                outpath=str(output_path),
                convert_to=output_format,
        return ConversionResult(
            input_path=input_file,
            output_path=output_path,
            output_format=output_format,
        )
        except Exception as exc:
            logger.warning("Server conversion failed, falling back to CLI: %s", exc)
            self._run_conversion(input_file, output_format, output_dir)

    def _run_conversion(self, input_file: Path, output_format: str, output_dir: Path) -> None:
        """Execute the LibreOffice conversion command.

        Args:
            input_file: Path to input file.
            input_file: Path to input file (must exist and be absolute).
            output_format: Target format (e.g., 'pdf', 'docx').
            output_dir: Output directory.
            output_dir: Output directory (must exist and be absolute).
        """
        cmd = [
        # Ensure absolute paths for security - validate before passing to subprocess
        input_file = input_file.resolve()
        output_dir = output_dir.resolve()

        if not input_file.is_file():
            msg = f"Input file is not a valid file: {input_file}"
            raise ConversionError(msg)
        if not output_dir.is_dir():
            msg = f"Output directory is not valid: {output_dir}"
            raise ConversionError(msg)

        cmd: list[str] = [
            str(self._soffice_path),
            "--headless",
            "--convert-to",
@@ -245,11 +138,12 @@ class Converter:
            str(input_file),
        ]

        result = subprocess.run(
        result = subprocess.run(  # noqa: S603
            cmd,
            capture_output=True,
            text=True,
            timeout=300,  # 5 minute timeout per file
            timeout=self._timeout,
            check=False,
        )

        if result.returncode != 0:
@@ -263,9 +157,6 @@ class Converter:
    ) -> list[ConversionResult]:
        """Convert multiple documents sequentially.

        When using unoserver, this is significantly faster than CLI mode
        as LibreOffice stays loaded between conversions.

        Args:
            input_files: Iterable of input file paths.
            output_format: Desired output format.
@@ -279,14 +170,3 @@ class Converter:
            result = self.convert(input_file, output_format, output_dir)
            results.append(result)
        return results

    def __enter__(self) -> Converter:
        """Start server on context entry if auto_start_server is True."""
        if self._auto_start_server and self._server_manager is not None:
            self._server_manager.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        """Stop server on context exit if auto_start_server is True."""
        if self._auto_start_server and self._server_manager is not None:
            self._server_manager.stop()
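Without the class state, the new CLI path does three things: it validates its inputs, builds an argument list, and runs `soffice` with a per-file timeout. The standalone sketch below follows that flow. `build_command` and `run_soffice` are hypothetical names, and `ValueError` stands in for the package's `ConversionError` so the block stays self-contained. The diff elides the middle of the `cmd` list, so the `--outdir` argument here comes from LibreOffice's documented command-line options, not from this commit:

```python
import subprocess
from pathlib import Path


def build_command(soffice: Path, input_file: Path, output_format: str, output_dir: Path) -> list[str]:
    """Hypothetical helper: validate paths and return argv for a headless conversion."""
    # Resolve to absolute paths before anything reaches subprocess
    input_file = input_file.resolve()
    output_dir = output_dir.resolve()
    if not input_file.is_file():
        raise ValueError(f"Input file is not a valid file: {input_file}")
    if not output_dir.is_dir():
        raise ValueError(f"Output directory is not valid: {output_dir}")
    return [
        str(soffice),
        "--headless",
        "--convert-to",
        output_format,
        "--outdir",
        str(output_dir),
        str(input_file),
    ]


def run_soffice(cmd: list[str], timeout: int = 300) -> subprocess.CompletedProcess[str]:
    """Run without a shell; raises subprocess.TimeoutExpired if the timeout elapses."""
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=False)
```

Passing a list (no `shell=True`) means file names with spaces or shell metacharacters reach `soffice` as single arguments.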