main
Main entry point for the text extraction service.
Classes
DocumentProcessor
Processes documents from Kafka messages.
Constructor:
def __init__(self) -> None
Methods
process_document
def process_document(self, message_data: dict[str, Any]) -> None
Process a single document with idempotency checks.
Args:
message_data: Kafka message data with the following keys:
- document_id: str (required) - Document UUID
- storage_path: str (required) - Relative path within bucket (e.g., "path/to/file.ext")
- organization_id: str (required)
- project_id: str (optional)
- filename: str (optional)
- trace_id: str (optional) - End-to-end trace ID from workspace-api
- user_id: str (optional)
- metadata: dict (optional) - JSON metadata to store in database
- file_size_bytes: int (optional)
- content_type: str (optional)
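The key list above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not part of the service: the helper `missing_required_keys`, the class `InMemoryIdempotencyGuard`, and all sample values are hypothetical. The real idempotency check is not described here and may well be backed by a database rather than an in-memory set.

```python
from typing import Any

# Keys marked "(required)" in the process_document argument list.
REQUIRED_KEYS = ("document_id", "storage_path", "organization_id")


def missing_required_keys(message_data: dict[str, Any]) -> list[str]:
    """Return the required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not message_data.get(key)]


class InMemoryIdempotencyGuard:
    """Hypothetical stand-in for an idempotency check, keyed on document_id."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def should_process(self, document_id: str) -> bool:
        """Return True the first time a document_id is seen, False afterwards."""
        if document_id in self._seen:
            return False
        self._seen.add(document_id)
        return True


# Illustrative message; every value below is made up for the example.
example_message: dict[str, Any] = {
    "document_id": "3f2b6c1e-8d4a-4e0b-9a57-2c1d5e7f9a10",
    "storage_path": "path/to/file.ext",
    "organization_id": "org-123",
    "filename": "file.ext",
    "content_type": "application/pdf",
    "file_size_bytes": 20480,
    "metadata": {"source": "upload"},
}
```

A caller might check `missing_required_keys(example_message)` before handing the message to `DocumentProcessor.process_document`, and skip any message for which `should_process` returns False, so that a redelivered Kafka message does not trigger a second extraction.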
Functions
main
def main() -> None
Main entry point.