indexers
indexers
This module provides functionality for creating a VectorStore from a CSV (text) file. It defines the VectorStore class, which is used to model and create vector databases from CSV text files using a Vectoriser object.
This class requires a Vectoriser object from the vectorisers submodule, to convert the CSV’s text data into vector embeddings which are then stored in the VectorStore objects.
Key Features:
- Batch processing of input files to handle large datasets.
- Support for CSV file format (additional formats may be added in future updates).
- Integration with a custom embedder for generating vector embeddings.
- Logging for tracking progress and handling errors during processing.
VectorStore Class:
- The
VectorStoreclass is initialized with aVectoriserobject and a CSV knowledgebase. - Additional columns in the CSV may be specified as metadata to be included in the vector database.
- Upon creation, the
VectorStoreis saved in parquet format for efficient, and quick reloading via theVectorStore’s.from_filespace()method. - A new piece of text data (or label) can be queried against the
VectorStorein the following ways:.search(): to find the most semantically similar pieces of text in the vector database..reverse_search(): to find all examples in the knowledgebase that have a given label..embed(): to generate a vector embedding for a given piece of text data.
- ‘Hook’ methods may be specified to perform pre-processing on input data before embedding, and post-processing on the output of the search methods.