
model_tokenizer

Shared tokenizer utilities aligned with the embedding model.

Uses the same tokenizer as the Granite embedding model to ensure accurate token counting for chunk sizing. The tokenizer is downloaded from Hugging Face and cached locally for fast, in-process tokenization. The tokenizer files are also saved to and loaded from an infrastructure bucket, so new service instances start faster.
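The loading path described above can be sketched as a lazily initialized, process-wide singleton. The model id, function name, and caching strategy below are assumptions for illustration, not confirmed details of this module:

```python
from functools import lru_cache

# Assumed model id; the source only says "the Granite embedding model".
GRANITE_MODEL_ID = "ibm-granite/granite-embedding-125m-english"


@lru_cache(maxsize=1)
def get_tokenizer():
    """Download the tokenizer once, then serve it from the in-process cache."""
    # Imported lazily so that merely importing this module stays cheap.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(GRANITE_MODEL_ID)
```

The bucket save/load step would wrap `from_pretrained` with a check of the shared bucket before falling back to Hugging Face; that logic is omitted here.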

Functions

count_tokens

def count_tokens(text: str) -> int

Return token count using the model's tokenizer.

encode

def encode(text: str) -> list[int]

Encode text into token ids without adding special tokens.

decode

def decode(tokens: Iterable[int]) -> str

Decode token ids back into text, skipping the cleanup step that would strip tokenization spaces.
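Together, encode and decode form a lossless round trip: no special tokens are inserted on the way in and no spaces are stripped on the way out. The sketch below illustrates that contract with a trivial whitespace tokenizer standing in for the real Granite tokenizer (an assumption made only so the example runs without a model download):

```python
# Stand-in vocabulary: each whitespace-delimited word maps to an integer id.
_vocab: dict[str, int] = {}
_words: list[str] = []


def encode(text: str) -> list[int]:
    """Encode text into token ids; no special tokens are added."""
    ids = []
    for word in text.split():
        if word not in _vocab:
            _vocab[word] = len(_words)
            _words.append(word)
        ids.append(_vocab[word])
    return ids


def decode(tokens) -> str:
    """Decode ids back to text; spacing is preserved, not cleaned up."""
    return " ".join(_words[t] for t in tokens)


ids = encode("hello tokenizer world")
assert decode(ids) == "hello tokenizer world"  # round trip is lossless
```

With the real Hugging Face tokenizer, the same properties come from passing `add_special_tokens=False` to encode and `clean_up_tokenization_spaces=False` to decode.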

split_by_tokens

def split_by_tokens(text: str, max_tokens: int) -> list[str]

Split text into segments that are each <= max_tokens.