embedding
Granite Embedding Client for the doc-extractor service.
Generates 768-dimensional embeddings using the IBM Granite Embedding model served through the vLLM OpenAI-compatible API. Compatible with the existing pgvector schema.
Classes
GraniteEmbeddingClient
Client for IBM Granite Embedding model via vLLM.
Self-hosted embedding solution using IBM Granite served on vLLM. Maintains 768-dimension compatibility with the existing pgvector schema.
Constructor:
def __init__(self, config: Any) -> None
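A minimal construction sketch, assuming the client is pointed at a vLLM OpenAI-compatible server. The config field names, the import path, and the model identifier below are illustrative assumptions, not taken from this module; the constructor only requires an object it can read its settings from.

```python
from types import SimpleNamespace

from embedding import GraniteEmbeddingClient  # import path is illustrative

# Field names on `config` are assumptions for this sketch.
config = SimpleNamespace(
    embedding_base_url="http://localhost:8000/v1",  # vLLM OpenAI-compatible server
    embedding_model="ibm-granite/granite-embedding-278m-multilingual",  # one of the 768-dim Granite checkpoints
    embedding_dimensions=768,
    request_timeout=30.0,
)

client = GraniteEmbeddingClient(config)
```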
Methods
generate_embeddings
def generate_embeddings(self, texts: list[str], batch_size: int = 32) -> list[list[float]]
Generate embeddings for a list of texts.
Args:
  texts: List of text strings to embed.
  batch_size: Maximum number of texts per API request.
Returns: List of embedding vectors (768 dimensions each)
Raises:
  Exception: If embedding generation fails after retries.
  ValueError: If embedding dimensions don't match the expected 768.
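A minimal usage sketch of the method above; the sample texts are illustrative and the import path is an assumption.

```python
from embedding import get_embedding_client  # import path is illustrative

client = get_embedding_client()

chunks = [
    "Invoice total is due within 30 days of receipt.",
    "Late payments accrue interest at 1.5% per month.",
]

vectors = client.generate_embeddings(chunks, batch_size=32)

# One 768-dimension vector per input text, matching the pgvector column.
assert len(vectors) == len(chunks)
assert all(len(v) == 768 for v in vectors)
```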
health_check
def health_check(self) -> bool
Check if the embedding endpoint is healthy.
Returns: True if the endpoint is responsive, False otherwise
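The implementation is not shown here; a plausible sketch, assuming the check probes the vLLM server's lightweight /health route, could look like the following. The helper name and base-URL handling are assumptions.

```python
import httpx


def _probe_health(server_root: str, timeout: float = 5.0) -> bool:
    # vLLM's OpenAI-compatible server exposes a bare /health route that
    # returns HTTP 200 when the engine is up.
    try:
        response = httpx.get(f"{server_root}/health", timeout=timeout)
        return response.status_code == 200
    except httpx.HTTPError:
        return False
```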
close
def close(self) -> None
Close the HTTP client and release resources.
Functions
get_embedding_client
def get_embedding_client() -> GraniteEmbeddingClient
Get or create the singleton Granite embedding client.
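The accessor likely follows the usual module-global singleton pattern; a sketch under that assumption is shown below. `load_config` is a hypothetical helper, and the actual module's internals may differ.

```python
from __future__ import annotations

_client: GraniteEmbeddingClient | None = None


def get_embedding_client() -> GraniteEmbeddingClient:
    global _client
    if _client is None:
        # load_config() is hypothetical; it stands in for however the
        # service builds its configuration object.
        _client = GraniteEmbeddingClient(load_config())
    return _client
```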
close_embedding_client
def close_embedding_client() -> None
Close and release the singleton embedding client.
Call this during shutdown to properly release HTTP connections.
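Illustrative shutdown wiring; the doc-extractor service's actual web framework, hook names, and import path are assumptions here.

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

from embedding import close_embedding_client  # import path is illustrative


@asynccontextmanager
async def lifespan(app: FastAPI):
    yield  # application runs between startup and shutdown
    close_embedding_client()  # release the singleton's pooled HTTP connections


app = FastAPI(lifespan=lifespan)
```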