embedding
Granite Embedding Client for the doc-extractor service.
Generates 768-dimensional embeddings using the IBM Granite Embedding model served through the vLLM OpenAI-compatible API. Compatible with the existing pgvector schema.
Classes
GraniteEmbeddingClient
Client for IBM Granite Embedding model via vLLM.
Self-hosted embedding solution using IBM Granite served on vLLM. Maintains 768-dimension compatibility with the existing pgvector schema.
Constructor:
def __init__(self, config: Any) -> None
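A minimal construction sketch, assuming the client is pointed at a vLLM OpenAI-compatible server. The config field names, the import path, and the model identifier below are illustrative assumptions, not taken from this module; the constructor only requires an object it can read its settings from.

```python
from types import SimpleNamespace

from embedding import GraniteEmbeddingClient  # import path is illustrative

# Field names on `config` are assumptions for this sketch.
config = SimpleNamespace(
    embedding_base_url="http://localhost:8000/v1",  # vLLM OpenAI-compatible server
    embedding_model="ibm-granite/granite-embedding-278m-multilingual",  # one of the 768-dim Granite checkpoints
    embedding_dimensions=768,
    request_timeout=30.0,
)

client = GraniteEmbeddingClient(config)
```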
Methods
generate_embeddings
def generate_embeddings(self, texts: list[str], batch_size: int = 32) -> list[list[float]]
Generate embeddings for a list of texts.
Args:
  texts: List of text strings to embed.
  batch_size: Maximum number of texts per API request.
Returns: List of embedding vectors (768 dimensions each)
Raises:
  Exception: If embedding generation fails after retries.
  ValueError: If embedding dimensions don't match the expected 768.
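A minimal usage sketch of the method above; the sample texts are illustrative and the import path is an assumption.

```python
from embedding import get_embedding_client  # import path is illustrative

client = get_embedding_client()

chunks = [
    "Invoice total is due within 30 days of receipt.",
    "Late payments accrue interest at 1.5% per month.",
]

vectors = client.generate_embeddings(chunks, batch_size=32)

# One 768-dimension vector per input text, matching the pgvector column.
assert len(vectors) == len(chunks)
assert all(len(v) == 768 for v in vectors)
```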
health_check
def health_check(self) -> bool
Check if the embedding endpoint is healthy.
Returns: True if the endpoint is responsive, False otherwise
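The implementation is not shown here; a plausible sketch, assuming the check probes the vLLM server's lightweight /health route, could look like the following. The helper name and base-URL handling are assumptions.

```python
import httpx


def _probe_health(server_root: str, timeout: float = 5.0) -> bool:
    # vLLM's OpenAI-compatible server exposes a bare /health route that
    # returns HTTP 200 when the engine is up.
    try:
        response = httpx.get(f"{server_root}/health", timeout=timeout)
        return response.status_code == 200
    except httpx.HTTPError:
        return False
```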
close
def close(self) -> None
Close the HTTP client and release resources.
Functions
get_embedding_client
def get_embedding_client() -> GraniteEmbeddingClient
Get or create the singleton Granite embedding client.
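The accessor likely follows the usual module-global singleton pattern; a sketch under that assumption is shown below. `load_config` is a hypothetical helper, and the actual module's internals may differ.

```python
from __future__ import annotations

_client: GraniteEmbeddingClient | None = None


def get_embedding_client() -> GraniteEmbeddingClient:
    global _client
    if _client is None:
        # load_config() is hypothetical; it stands in for however the
        # service builds its configuration object.
        _client = GraniteEmbeddingClient(load_config())
    return _client
```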
close_embedding_client
def close_embedding_client() -> None
Close and release the singleton embedding client.
Call this during shutdown to properly release HTTP connections.
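Illustrative shutdown wiring; the doc-extractor service's actual web framework, hook names, and import path are assumptions here.

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

from embedding import close_embedding_client  # import path is illustrative


@asynccontextmanager
async def lifespan(app: FastAPI):
    yield  # application runs between startup and shutdown
    close_embedding_client()  # release the singleton's pooled HTTP connections


app = FastAPI(lifespan=lifespan)
```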