model_tokenizer
Shared tokenizer utilities aligned with the embedding model.
Uses the same tokenizer as the Granite embedding model so that token counts used for chunk sizing match what the model actually sees. The tokenizer is downloaded from Hugging Face and cached locally for fast, in-process tokenization; it is also saved to and loaded from the infrastructure bucket so new service instances start faster.
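As a rough sketch, the loader might look like the following. The model id, cache directory, and lazy-singleton pattern are illustrative assumptions, not the module's confirmed implementation, and the bucket sync mentioned above is elided.

```python
from pathlib import Path

from transformers import AutoTokenizer

_MODEL_ID = "ibm-granite/granite-embedding-125m-english"  # assumed model id
_CACHE_DIR = Path("/tmp/model_tokenizer")  # assumed local cache location

_tokenizer = None  # shared, lazily initialized tokenizer instance


def _get_tokenizer():
    """Load the tokenizer once per process, preferring the local cache."""
    global _tokenizer
    if _tokenizer is None:
        if (_CACHE_DIR / "tokenizer_config.json").exists():
            # Cache hit: skip the Hugging Face download entirely.
            _tokenizer = AutoTokenizer.from_pretrained(_CACHE_DIR)
        else:
            _tokenizer = AutoTokenizer.from_pretrained(_MODEL_ID)
            _tokenizer.save_pretrained(_CACHE_DIR)  # warm the local cache
            # The real module also syncs the cache to the infrastructure
            # bucket; that step is omitted here for brevity.
    return _tokenizer
```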
Functions
count_tokens
```python
def count_tokens(text: str) -> int
```
Return token count using the model's tokenizer.
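A minimal sketch, assuming a Hugging Face tokenizer and the hypothetical `_get_tokenizer` helper sketched above; special tokens are excluded so the count matches `encode`:

```python
def count_tokens(text: str) -> int:
    # add_special_tokens=False keeps the count consistent with encode().
    return len(_get_tokenizer().encode(text, add_special_tokens=False))
```

For example, `count_tokens("hello world")` returns the number of subword tokens the embedding model would see for that string.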
encode
```python
def encode(text: str) -> list[int]
```
Encode text into token ids without adding special tokens.
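Under the same assumptions, `encode` is likely a thin wrapper:

```python
def encode(text: str) -> list[int]:
    # No BOS/EOS markers: callers get raw content ids they can
    # slice and recombine freely.
    return _get_tokenizer().encode(text, add_special_tokens=False)
```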
decode
```python
def decode(tokens: Iterable[int]) -> str
```
Decode token ids back into text, skipping the cleanup step that strips tokenization spaces.
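A sketch, again assuming a Hugging Face tokenizer; `clean_up_tokenization_spaces=False` is the flag that disables the space-stripping cleanup:

```python
from typing import Iterable


def decode(tokens: Iterable[int]) -> str:
    # Skipping cleanup preserves original spacing, so
    # decode(encode(text)) round-trips as closely as possible.
    return _get_tokenizer().decode(
        list(tokens), clean_up_tokenization_spaces=False
    )
```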
split_by_tokens
```python
def split_by_tokens(text: str, max_tokens: int) -> list[str]
```
Split text into segments of at most max_tokens tokens each.
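One plausible implementation in terms of the `encode`/`decode` sketches above: encode once, slice the ids into fixed-size windows, and decode each window back to text. Note that windows break on token boundaries, so a segment edge can fall mid-word.

```python
def split_by_tokens(text: str, max_tokens: int) -> list[str]:
    # Encode once, then slice the ids into windows of at most max_tokens.
    ids = encode(text)
    return [
        decode(ids[i : i + max_tokens])
        for i in range(0, len(ids), max_tokens)
    ]
```

Concatenating the returned segments reproduces the original text up to tokenizer round-trip differences. Each segment is built from at most max_tokens ids, though re-encoding a decoded segment can occasionally differ by a token for some tokenizers.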