A general wrapper class for Hugging Face Transformers models that generates text embeddings.
The HuggingFaceVectoriser accepts most encoder-based models from the Hugging Face Transformers library and provides a simple interface for generating embeddings from text data. Additional configuration options, such as trust_remote_code or a Hugging Face API token, can be passed via the tokenizer_kwargs and model_kwargs parameters.
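To make the role of the two keyword-argument dictionaries concrete, here is a minimal sketch of how a wrapper of this kind might forward them to the Transformers loaders. It is not the HuggingFaceVectoriser implementation: the names shared_load_kwargs, LoadedEncoder, load_encoder and embed are hypothetical, forwarding both dictionaries to from_pretrained is an assumption, and so is the use of mean pooling over the last hidden state to obtain one vector per text. The token keyword assumes a reasonably recent Transformers release.

```python
from dataclasses import dataclass
from typing import Any, Optional


def shared_load_kwargs(token: Optional[str] = None,
                       trust_remote_code: bool = False) -> dict:
    """Hypothetical helper: build kwargs understood by `from_pretrained`."""
    kwargs: dict = {"trust_remote_code": trust_remote_code}
    if token is not None:
        kwargs["token"] = token  # access token for gated or private models
    return kwargs


@dataclass
class LoadedEncoder:
    """Hypothetical container mirroring the documented attributes."""
    model_name: str
    tokenizer: Any
    model: Any
    device: Any
    tokenizer_kwargs: dict


def load_encoder(model_name: str,
                 tokenizer_kwargs: Optional[dict] = None,
                 model_kwargs: Optional[dict] = None) -> LoadedEncoder:
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer_kwargs = dict(tokenizer_kwargs or {})
    model_kwargs = dict(model_kwargs or {})
    # Prefer a GPU when one is visible, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Assumption: both dictionaries are forwarded to `from_pretrained`.
    tokenizer = AutoTokenizer.from_pretrained(model_name, **tokenizer_kwargs)
    model = AutoModel.from_pretrained(model_name, **model_kwargs).to(device)
    model.eval()
    return LoadedEncoder(model_name, tokenizer, model, device, tokenizer_kwargs)


def embed(encoder: LoadedEncoder, texts: list):
    """Return one mean-pooled vector per input text (assumed pooling strategy)."""
    import torch

    batch = encoder.tokenizer(
        texts, padding=True, truncation=True, return_tensors="pt"
    ).to(encoder.device)
    with torch.no_grad():
        hidden = encoder.model(**batch).last_hidden_state
    # Zero out padding positions before averaging over the sequence axis.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return pooled.cpu()
```

With torch and transformers installed, calling load_encoder with a model identifier and model_kwargs=shared_load_kwargs(token=...) would download the weights on first use, after which embed(encoder, ["some text"]) returns a tensor with one row per input string.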
Attributes
| Name | Type | Description |
| --- | --- | --- |
| model_name | str | The name of the Hugging Face model to use. |
| tokenizer | transformers.PreTrainedTokenizer | The tokenizer for the specified model. |
| model | transformers.PreTrainedModel | The Hugging Face model instance. |
| device | torch.device | The device (CPU or GPU) on which the model is loaded. |
| tokenizer_kwargs | dict | Additional keyword arguments passed to the tokenizer. |