Vector Database SDK
VecML provides an easy-to-use, fast, and accurate vector database SDK toolkit.
Creating a Fluffy Interface
To use all features of the SDK, include the following header:
#include "fluffy_interface.h"
First, we create a Fluffy Interface instance (fluffyInterface).
A Fluffy Interface manages one data collection and all the indices built for that collection.
std::string fluffy_license_path = "license.txt";
std::string base_path = "path/to/your/project";
fluffy::FluffyInterface fluffyInterface(base_path, fluffy_license_path);
A fluffy interface is created at base_path. Data and indices will be stored in this directory.
Adding Vectors to the Interface
We can now add a data vector to the data collection. Each vector is stored in a fluffy::Vector object with the following components:
- string_id (required, str): a unique string identifier of the vector.
- vector (required): the vector data (the supported formats are introduced below).
- attributes (optional, str): extra properties of the vector, e.g., date, location, title.
Currently the VecML SDK supports the following vector types:
- dense: the standard float32 dense vector. For example, [0, 1.2, 2.4, -10, 5.7]. Standard embedding vectors from language or vision models can be saved as this type.
- dense8Bit: uint8 dense vectors, with integer elements in [0, 255]. For example, [0, 3, 76, 255, 152]. 8-bit quantized embedding vectors can be saved as this type to save storage.
- dense4Bit: 4-bit quantized dense vectors, with integer elements in [0, 15].
- dense2Bit: 2-bit quantized dense vectors, with integer elements in [0, 3].
- dense1Bit: 1-bit quantized dense vectors, with binary elements.
- sparse: sparse vectors formatted as a set of index:value pairs (e.g., the libsvm data format). This is useful for high-dimensional sparse vectors.
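As a rough illustration of how float32 embeddings might be prepared for the dense8Bit type, the sketch below applies min-max scalar quantization. The function name quantize_to_uint8 and the clipping range [lo, hi] are assumptions made for this example. The SDK does not prescribe a quantization scheme here, so treat this as one possible preprocessing step rather than the SDK's own method.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the SDK): maps each float in [lo, hi]
// linearly onto the integer range [0, 255]; values outside are clipped.
std::vector<std::uint8_t> quantize_to_uint8(const std::vector<float>& values,
                                            float lo, float hi) {
    assert(hi > lo);  // the clipping range must be non-empty
    std::vector<std::uint8_t> out;
    out.reserve(values.size());
    for (float v : values) {
        float clipped = std::min(std::max(v, lo), hi);
        float scaled = (clipped - lo) / (hi - lo) * 255.0f;  // now in [0, 255]
        out.push_back(static_cast<std::uint8_t>(std::lround(scaled)));
    }
    return out;
}
```

The resulting std::vector<uint8_t> has the same shape as the uint8_t input that the X-Bit Dense example passes to build_vector_bit with bits_per_element = 8.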
Given a C++ vector, we first need to build the fluffy::Vector with the proper type:
// for Dense type
std::vector<float> cpp_vector = {0.1, 0.5, -0.7}; // float32 c++ vector
std::unique_ptr<fluffy::Vector> vector;
fluffyInterface.build_vector_dense(cpp_vector, vector);
// for X-Bit Dense type
std::vector<uint8_t> cpp_vector = {3, 217, 56}; // uint8_t c++ vector
std::unique_ptr<fluffy::Vector> vector;
int bits_per_element = 8; // or 4, 2, 1
fluffyInterface.build_vector_bit(cpp_vector, bits_per_element, vector);
// for Sparse type
std::vector<std::pair<fluffy::idx_t, float>> cpp_vector = {{0, 0.1}, {7, 0.5}, {102948, -0.3}};
std::unique_ptr<fluffy::Vector> vector;
int dim = 0;
fluffyInterface.build_vector_sparse(cpp_vector, vector, dim);
For sparse vectors, dim can be set to zero.
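If your sparse data is stored as text in an index:value layout such as libsvm, it first has to be turned into the list of pairs used above. The following is a minimal sketch of such a parser. The helper parse_sparse_line and the idx_t alias are stand-ins introduced for this example (the real fluffy::idx_t comes from the SDK header), and malformed numbers will throw from std::stoull or std::stof.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Stand-in for fluffy::idx_t; the real type is defined by the SDK header.
using idx_t = std::uint64_t;

// Hypothetical helper: parses whitespace-separated "index:value" tokens,
// e.g. "0:0.1 7:0.5 102948:-0.3". Tokens without ':' are skipped, which
// also drops a leading libsvm label if one is present.
std::vector<std::pair<idx_t, float>> parse_sparse_line(const std::string& line) {
    std::vector<std::pair<idx_t, float>> pairs;
    std::istringstream stream(line);
    std::string token;
    while (stream >> token) {
        std::size_t colon = token.find(':');
        if (colon == std::string::npos) continue;
        idx_t index = std::stoull(token.substr(0, colon));
        float value = std::stof(token.substr(colon + 1));
        pairs.emplace_back(index, value);
    }
    return pairs;
}
```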
We can also set attributes of the vector by
std::string date = "2025-03-11";
std::string sentence = "hello, world";
vector->set_attribute("date", reinterpret_cast<const uint8_t*>(date.data()), date.size());
vector->set_attribute("sentence", reinterpret_cast<const uint8_t*>(sentence.data()), sentence.size());
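Because set_attribute takes a raw byte pointer and a length, non-string values need to be serialized to bytes first. The sketch below shows one way to do that for trivially copyable types such as integers or floats. The helpers to_bytes and from_bytes are hypothetical and not part of the SDK, and the bytes are in the host's native byte order, so they are not portable across machines with different endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of the SDK): copies a trivially copyable
// value into a byte buffer, using the host's native byte order.
template <typename T>
std::vector<std::uint8_t> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    std::vector<std::uint8_t> bytes(sizeof(T));
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

// Inverse of to_bytes: rebuilds the value from a buffer of exactly sizeof(T) bytes.
template <typename T>
T from_bytes(const std::vector<std::uint8_t>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    assert(bytes.size() == sizeof(T));  // size must match the stored type
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

The buffer's data() and size() can then be passed to set_attribute in the same way as the string examples above.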
After building the fluffy::Vector object, insert it into the Fluffy Interface:
std::string string_id = "unique_id";
fluffyInterface.add_data(string_id, vector);
NOTE: Within one vector collection, all vectors must have the same type and dimensionality.
Building an Index
After vectors are inserted into the data collection (managed by fluffyInterface), you can call attach_index() to create an ANN index for nearest neighbor search.
Distance Types. The VecML SDK supports the following distance function types to build the index:
- Dense (float32) and sparse vectors: Euclidean (DistanceFunctionType::Euclidean), cosine (DistanceFunctionType::NegativeCosineSimilarity), inner product (DistanceFunctionType::NegativeInnerProduct)
- Quantized dense vectors (8-bit, 4-bit, 2-bit, 1-bit): Euclidean (DistanceFunctionType::Euclidean), Hamming (DistanceFunctionType::Hamming)
NOTE: For cosine similarity and inner product, the reported value is by design the negative of the true value (e.g., a cosine similarity of 0.91 is reported as -0.91). Negate the reported value to recover the original similarity; taking the absolute value gives the wrong answer when the true similarity is negative.
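To make the sign convention concrete, here is a small standalone sketch that computes a cosine similarity, negates it so that smaller values mean closer vectors, and converts a reported value back. The function names are illustrative only and do not come from the SDK, and the SDK's internal computation may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two equal-length, non-zero vectors.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Negated form: smaller means more similar, so it sorts like a distance.
float negative_cosine(const std::vector<float>& a, const std::vector<float>& b) {
    return -cosine_similarity(a, b);
}

// Recovers the original similarity from a reported (negated) value.
float to_similarity(float reported) { return -reported; }
```

Negation recovers the original value for both positive and negative similarities, which is why it is used here instead of an absolute value.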
Standard Index
API signature:
ErrorCode attach_index(dim_t dim, const std::string vector_type, DistanceFunctionType dist_type, std::string index_name = "", const int num_threads = 1)
Parameters:
- dim (fluffy::dim_t): the dimensionality of the vector.
- vector_type (string): vector type of the index; must match the actual vector type of the vector collection. Supported: "dense", "dense1Bit", "dense2Bit", "dense4Bit", "dense8Bit", "sparse".
- dist_type (fluffy::DistanceFunctionType): distance function type. Supported: fluffy::DistanceFunctionType::Euclidean, fluffy::DistanceFunctionType::NegativeCosineSimilarity, fluffy::DistanceFunctionType::NegativeInnerProduct.
- index_name (string): name of the index.
- num_threads (int): number of threads for parallel indexing.
Below is an example of building a standard vector index after the vectors are inserted into the data collection.
std::string index_name = "standard_index";
int num_threads = 4;
fluffy::DistanceFunctionType distance_type = fluffy::DistanceFunctionType::Euclidean;
int dim = ...; // this is the dimension of your vector
// for dense vector
fluffyInterface.attach_index(dim, "dense", distance_type, index_name, num_threads);
// for quantized dense vector
fluffyInterface.attach_index(dim, "dense4Bit", distance_type, index_name, num_threads); // similar for 8-bit, 2-bit, 1-bit
// for sparse vector
fluffyInterface.attach_index(0, "sparse", distance_type, index_name, num_threads); // for sparse vectors, we can set dim = 0
Fast Index
The VecML SDK supports an alternative index type called "fast index". Indexing can be an order of magnitude faster than with the standard index, at the cost of slightly slower query latency. The fast index is also memory efficient during indexing, so it is recommended for resource-constrained applications.
Currently, fast index only supports dense vectors with Euclidean distance or cosine similarity.
API signature:
ErrorCode attach_soil_index(dim_t dim, const std::string vector_type, DistanceFunctionType dist_type, float shrink_rate = 0.4, std::string index_name = "", int max_num_samples = 100000, int num_threads = 1)
Parameters:
- dim (fluffy::dim_t): the dimensionality of the vector.
- vector_type (string): vector type of the index; must match the actual vector type of the vector collection. Supported: "dense".
- dist_type (fluffy::DistanceFunctionType): distance function type. Supported: fluffy::DistanceFunctionType::Euclidean, fluffy::DistanceFunctionType::NegativeCosineSimilarity.
- shrink_rate (float): a parameter for fast indexing in (0, 1]. A larger value results in slightly slower indexing but potentially higher recall. The default value is 0.4, which should achieve a good utility-speed tradeoff in most applications.
- index_name (string): name of the index.
- max_num_samples (int): an approximate estimate of how many vectors will be indexed. For example, it can be set to the total number of vectors if this number is known. Some intrinsic parameters are determined by this value for balanced performance, so we recommend setting it to a value that reflects the data scale. The default value is 100000.
- num_threads (int): number of threads for parallel indexing.
Example:
std::string index_name = "fast_index";
int num_threads = 4;
fluffy::DistanceFunctionType distance_type = fluffy::DistanceFunctionType::NegativeCosineSimilarity;
int dim = ...; // this is the dimension of your vector
// for dense vector
fluffyInterface.attach_soil_index(dim, "dense", distance_type, 0.4, index_name, 10000000, num_threads);
Alternative Approach
You can create the vector collection and build the index simultaneously. Simply:
- Attach the index to fluffy interface before adding data to the vector collection.
- As you add vectors to the collection, they will also be indexed in real-time.
Here's an example workflow:
fluffyInterface.attach_index(1024, "dense", fluffy::DistanceFunctionType::Euclidean, "index_1");
std::vector<float> cpp_vector = ...;
std::unique_ptr<fluffy::Vector> vector;
fluffyInterface.build_vector_dense(cpp_vector, vector);
std::string string_id = "unique_id";
fluffyInterface.add_data(string_id, vector); // the vector is stored and indexed in the same call
Batch Insertion for Indexing (Multi-Thread)
As with the attach_index endpoints, the VecML SDK also supports multi-threaded batch insertion, which inserts vectors into the indices in parallel.
fluffyInterface.attach_index(3, "dense", distance_type, index_name); // attach index first
std::vector<std::pair<std::string, std::unique_ptr<fluffy::Vector>>> batch_data;
for (int i = 0; i < 100; ++i) {
    // as illustration, we construct some synthetic vectors
    // (float literals avoid a narrowing conversion from double inside the braces)
    std::vector<float> cpp_vector = {0.1f + i, 0.5f + i, -0.7f + i};
    std::unique_ptr<fluffy::Vector> vector;
    fluffyInterface.build_vector_dense(cpp_vector, vector);
    batch_data.push_back({std::to_string(i), std::move(vector)});
}
int num_threads = 4;
fluffyInterface.add_data_batch(batch_data, num_threads); // batch insert to index
num_threads: the number of threads used for parallel data insertion.
The batch size is flexible. For example, 1000 vectors per batch is a reasonable choice.
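When the full dataset is larger than one batch, the insertion loop needs to walk over it in fixed-size chunks. The helper below is a minimal sketch of that bookkeeping. The name batch_ranges is hypothetical and not part of the SDK; it only computes index ranges and leaves the actual add_data_batch calls to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, total) into consecutive half-open ranges
// [begin, end) that each cover at most batch_size items.
std::vector<std::pair<std::size_t, std::size_t>> batch_ranges(std::size_t total,
                                                              std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t begin = 0; begin < total; begin += batch_size) {
        ranges.emplace_back(begin, std::min(begin + batch_size, total));
    }
    return ranges;
}
```

For each returned range, you would build the vectors with indices in [begin, end), collect them in a batch_data container as in the example above, and pass that container to add_data_batch.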
Thread Safety
If batch insertion is used, do not combine it with your own multithreading (e.g., std::thread in C++ or OpenMP).
Deleting an Index
To delete an index, call the delete_index function:
std::string index_name = "index_1";
fluffyInterface.delete_index(index_name);
Performing a Nearest Neighbor Search
After building an index, you can search for the top-k nearest neighbors by creating a Query object.
API signature:
ErrorCode search(Query& query, InterfaceQueryResults& result, const std::string index_name = "", float search_intensity = 0.3, float search_budget = 0.1)
Parameters:
- query (fluffy::Query&): the query object.
- result (fluffy::InterfaceQueryResults&): output; the retrieved results as (string_id, distance) pairs, ordered from top-1 (nearest) to top-k.
- index_name (string): name of the index to search.
- search_intensity (float): controls search thoroughness. Range: [0, 1]. Only used by the fast index. Default: 0.3.
- search_budget (float): controls the computational budget for the search. Range: [0, 1]. Used by both the standard index and the fast index. Default: 0.1.
Note:
- Increasing search_intensity and search_budget improves search accuracy but increases latency. Tune these values to balance accuracy and performance for your application.
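Tuning these parameters is easier with a number to optimize. A common choice is recall at k: the fraction of the true nearest neighbors that the index actually returned. The sketch below assumes you already have the exact neighbor ids for a query (for instance from a brute-force scan) and the ids returned by the search. The function recall_at_k is a hypothetical helper, not an SDK call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Fraction of the exact top-k ids that also appear in the approximate result.
double recall_at_k(const std::vector<std::string>& approx_ids,
                   const std::vector<std::string>& exact_ids) {
    assert(!exact_ids.empty());
    std::unordered_set<std::string> approx(approx_ids.begin(), approx_ids.end());
    std::size_t hits = 0;
    for (const std::string& id : exact_ids) {
        if (approx.count(id) > 0) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(exact_ids.size());
}
```

Averaging this value over a sample of queries at several parameter settings, alongside the measured latency, gives a simple picture of the tradeoff for your data.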
Searching a Standard Index
Tune search_budget to balance the accuracy-latency tradeoff. Below is an example of searching a standard index:
// Create and configure the query
fluffy::Query query;
query.top_k = 3; // Number of nearest neighbors to return
query.similarity_measure = fluffy::DistanceFunctionType::Euclidean;
// Build the query vector (same process as adding a vector)
std::vector<float> cpp_query_vector = {0.0, -0.2, 0.6};
std::unique_ptr<fluffy::Vector> query_data;
fluffyInterface.build_vector_dense(cpp_query_vector, query_data);
query.vector = query_data.get(); // Assign the raw pointer
// Perform the search
fluffy::InterfaceQueryResults result;
fluffyInterface.search(query, result, "index_1", 0, 0.4); // search_intensity is not used by standard index, so we can set to 0
// Access and print the results
for (size_t i = 0; i < result.results.size(); ++i) {
    const auto& res = result.results[i];
    std::cout << "Neighbor " << (i + 1) << ": "
              << res.string_id << " "
              << res.dis << std::endl;
}
Example output when k=3:
Neighbor 1: "string_id_83" 0.52
Neighbor 2: "string_id_47203" 0.69
Neighbor 3: "string_id_2751" 0.72
The returned result is a vector of (string_id, distance) pairs, ordered by similarity. The first result is the most similar to the query vector.
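To sanity-check the ordering of results on a small dataset, it can help to compare against an exact brute-force scan. The sketch below is such a reference implementation for Euclidean distance, written without the SDK. The names euclidean and exact_top_k are hypothetical, and the sketch reports the true Euclidean distance (with the square root), which is an assumption: the SDK's reported values may use a different convention, such as squared distance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Euclidean distance between two equal-length vectors.
float euclidean(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Hypothetical reference search: scans every row of data and returns up to k
// (row index, distance) pairs, nearest first.
std::vector<std::pair<std::size_t, float>> exact_top_k(
        const std::vector<std::vector<float>>& data,
        const std::vector<float>& query, std::size_t k) {
    typedef std::pair<std::size_t, float> Scored;
    std::vector<Scored> scored;
    scored.reserve(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        scored.emplace_back(i, euclidean(data[i], query));
    }
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end(),
                      [](const Scored& lhs, const Scored& rhs) {
                          return lhs.second < rhs.second;
                      });
    scored.resize(k);  // keep only the k nearest
    return scored;
}
```

The scan costs time proportional to the collection size per query, so it is only practical as a ground truth for small samples.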
Retrieving vectors by string_id:
You can retrieve the corresponding vector from the data collection using its string_id:
// Retrieve a dense vector by its string_id
std::vector<std::vector<float>> retrieved_vector;
std::string string_id = "string_id_83";
fluffyInterface.to_vector_dense(string_id, retrieved_vector);
Searching a Fast Index
For fast indexes, you can tune the search_intensity and search_budget parameters to control the accuracy-latency tradeoff:
Example:
// Create and configure the query
fluffy::Query query;
query.top_k = 3;
query.similarity_measure = fluffy::DistanceFunctionType::Euclidean;
// Build the query vector
std::vector<float> cpp_query_vector = {0.0, -0.2, 0.6};
std::unique_ptr<fluffy::Vector> query_data;
fluffyInterface.build_vector_dense(cpp_query_vector, query_data);
query.vector = query_data.get();
// Perform the search with query-time parameters
fluffy::InterfaceQueryResults result;
fluffyInterface.search(query, result, "fast_index_1", 0.5, 0.5); // search_intensity = 0.5, search_budget = 0.5
Batch Search (Multi-threaded Search for Multiple Queries)
For efficient processing of multiple queries, use the search_batch API, which leverages multi-threading to process queries in parallel.
API signature:
std::vector<ErrorCode> search_batch(std::vector<Query>& queries, std::vector<InterfaceQueryResults>& results, const std::string index_name = "", float search_intensity = 0.3, float search_budget = 0.1, int num_threads = 1)
Parameters:
- queries (vector<Query>&): input vector of query objects.
- results (vector<InterfaceQueryResults>&): output vector of results (one per query).
- index_name (string): name of the index to search.
- search_intensity (float): controls search thoroughness. Range: [0, 1]. Only used by the fast index. Default: 0.3.
- search_budget (float): controls the computational budget for the search. Range: [0, 1]. Used by both the standard index and the fast index. Default: 0.1.
- num_threads (int): number of parallel threads for batch processing. Default: 1.
Batch Search for Standard Index
Tune search_budget to balance the accuracy-latency tradeoff.
// Prepare multiple queries
std::vector<fluffy::Query> queries;
std::vector<std::unique_ptr<fluffy::Vector>> query_vectors; // Keep vectors alive
for (int i = 0; i < 100; ++i) {
    // Build query vector (float literals avoid a narrowing conversion from double)
    std::vector<float> cpp_query_vector = {0.1f * i, -0.2f * i, 0.6f};
    std::unique_ptr<fluffy::Vector> query_data;
    fluffyInterface.build_vector_dense(cpp_query_vector, query_data);
    // Create query
    fluffy::Query query;
    query.top_k = 10;
    query.similarity_measure = fluffy::DistanceFunctionType::Euclidean;
    query.vector = query_data.get(); // raw pointer; ownership stays with query_data
    queries.push_back(query);
    query_vectors.push_back(std::move(query_data)); // keeps the pointed-to vector alive
}
// Perform batch search with 8 threads
std::vector<fluffy::InterfaceQueryResults> results;
std::vector<fluffy::ErrorCode> error_codes = fluffyInterface.search_batch(queries, results, "index_1", 0.0, 0.5, 8); // search_intensity is not used by standard index
// Check for errors and process results
for (size_t i = 0; i < error_codes.size(); ++i) {
    if (error_codes[i] != fluffy::ErrorCode::Success) {
        std::cerr << "Query " << i << " failed with error code: "
                  << static_cast<int>(error_codes[i]) << std::endl;
        continue;
    }
    std::cout << "Query " << i << " results:" << std::endl;
    for (size_t j = 0; j < results[i].results.size(); ++j) {
        const auto& res = results[i].results[j];
        std::cout << "  Neighbor " << (j + 1) << ": "
                  << res.string_id << " " << res.dis << std::endl;
    }
}
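The example above stores raw pointers in the queries while the owning std::unique_ptr objects are moved into query_vectors. This is safe because moving a unique_ptr transfers ownership without relocating the heap object it points to. The standalone sketch below demonstrates that property with a stand-in Payload type; both Payload and build_with_raw_pointers are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Stand-in payload type; in the SDK examples this role is played by fluffy::Vector.
struct Payload {
    int value;
};

// Builds n payloads, records a raw pointer to each, then moves ownership into
// owners. Returns the raw pointers so the caller can check they stay valid.
std::vector<const Payload*> build_with_raw_pointers(
        int n, std::vector<std::unique_ptr<Payload>>& owners) {
    std::vector<const Payload*> raw;
    for (int i = 0; i < n; ++i) {
        std::unique_ptr<Payload> item(new Payload{i});
        raw.push_back(item.get());          // non-owning view of the heap object
        owners.push_back(std::move(item));  // ownership moves; the object does not
    }
    return raw;
}
```

The raw pointers remain valid for as long as the owners vector (and its elements) are alive, even if that vector reallocates as it grows. They dangle as soon as an owning unique_ptr is destroyed or reset.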
Batch Search for Fast Index
For fast indexes, you can tune the search_intensity and search_budget parameters to control the accuracy-latency tradeoff across all queries:
// Prepare multiple queries (same as above)
std::vector<fluffy::Query> queries;
std::vector<std::unique_ptr<fluffy::Vector>> query_vectors;
for (int i = 0; i < 100; ++i) {
    // float literals avoid a narrowing conversion from double
    std::vector<float> cpp_query_vector = {0.1f * i, -0.2f * i, 0.6f};
    std::unique_ptr<fluffy::Vector> query_data;
    fluffyInterface.build_vector_dense(cpp_query_vector, query_data);
    fluffy::Query query;
    query.top_k = 10;
    // fast index supports Euclidean distance or cosine similarity only
    query.similarity_measure = fluffy::DistanceFunctionType::NegativeCosineSimilarity;
    query.vector = query_data.get();
    queries.push_back(query);
    query_vectors.push_back(std::move(query_data));
}
// Perform batch search with tuned parameters
std::vector<fluffy::InterfaceQueryResults> results;
std::vector<fluffy::ErrorCode> error_codes = fluffyInterface.search_batch(queries, results, "fast_index_1", 0.5, 0.5, 8);
// Process results
for (size_t i = 0; i < results.size(); ++i) {
    if (error_codes[i] == fluffy::ErrorCode::Success) {
        std::cout << "Query " << i << " found " << results[i].results.size()
                  << " neighbors" << std::endl;
    }
}
Search with Filter / Condition Constraints
VecML SDK supports nearest neighbor search under filter/condition constraints. For example, you can search for the top-k neighbors of a query under the constraint that the vector attribute date is within a range. Fluffy allows very general and flexible conditions: a condition is simply a user-defined C++ lambda function on a fluffy::Vector that returns a bool, and it can depend on both the vector data and its attributes. An example is given below, where the filter is a date range.
fluffy::Condition condition;
std::string start_attr = "2024-03-11";
std::string end_attr = "2024-05-26";
auto eval_func = [start_attr, end_attr](const fluffy::Vector& vector) -> bool {
    std::string date;
    if (vector.get_attribute("date", date) == fluffy::ErrorCode::Success) { // retrieve date from vector
        return date >= start_attr && date <= end_attr; // true if the date is within the start and end range
    }
    return false;
};
condition.set_eval(eval_func);
// Then add the condition to query and search
query.condition = &condition;
fluffy::InterfaceQueryResults result;
fluffyInterface.search(query, result, "index_1"); // or use search_batch
The retrieved results are guaranteed to satisfy the user-defined condition, and it is guaranteed that exactly k nearest neighbors will be returned, unless there are in total fewer than k vectors in the data collection that satisfy the condition.
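The date filter above compares strings, which matches chronological order only when every date uses the same zero-padded YYYY-MM-DD layout. The standalone sketch below adds a format check before the comparison, so that a value such as "2024-4-01" is rejected instead of being compared incorrectly. The helpers is_iso_date and in_date_range are hypothetical and not part of the SDK; the same logic could be placed inside the lambda passed to set_eval.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True when s has the exact layout "YYYY-MM-DD" (digits and two dashes).
// This checks the layout only, not whether the calendar date exists.
bool is_iso_date(const std::string& s) {
    if (s.size() != 10) return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i == 4 || i == 7) {
            if (s[i] != '-') return false;
        } else if (!std::isdigit(static_cast<unsigned char>(s[i]))) {
            return false;
        }
    }
    return true;
}

// Hypothetical predicate: true when date lies in [start, end], inclusive.
// Lexicographic comparison is valid because all inputs share one layout.
bool in_date_range(const std::string& date, const std::string& start,
                   const std::string& end) {
    if (!is_iso_date(date)) return false;
    return date >= start && date <= end;
}
```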
Managing Storage and Performance
Flushing Data to Disk
To ensure that all changes are persisted to disk, call flush():
fluffy::ErrorCode status = fluffyInterface.flush();
if (status == fluffy::ErrorCode::Success) {
    std::cout << "Data successfully flushed to disk." << std::endl;
}
flush() prevents data loss in case of system crashes.
Saving Memory
Efficient memory management is crucial when working with large vector datasets. Our SDK provides an offload() function that allows offloading memory to disk to reduce runtime memory usage.
Using Offload to Reduce Memory Usage
The FluffyInterface class provides the following method:
fluffyInterface.offload();
Calling offload() frees up memory by offloading unused or cached data to disk. The system will transparently reload data as needed, meaning you do not need to manage reloading manually.
When to Call Flush/Offload?
While flush() and offload() can be called at any time, frequent calls may slightly slow down the system due to repeated disk operations. Consider using them strategically:
Best Practices:
- Call flush() after a large batch of data is inserted/indexed, or periodically after a relatively long time window.
- Call offload() after indexing large datasets to free memory used during index creation.
- Call offload() after performing queries if you have limited memory.
- Do not call flush() or offload() in rapid succession (e.g., in a loop), as it may slow down the system. If necessary, call them after a batch of operations.
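One simple way to avoid flushing inside a tight loop is to count pending writes and flush only once a threshold is reached. The class below is a minimal sketch of that idea. FlushPolicy and its threshold are hypothetical and not part of the SDK, and a suitable threshold depends on your workload.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts pending writes and reports when a flush is due.
class FlushPolicy {
public:
    explicit FlushPolicy(std::size_t threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Records count new writes. Returns true once the threshold is reached
    // and resets the counter, so the caller flushes exactly once per batch.
    bool record(std::size_t count) {
        pending_ += count;
        if (pending_ < threshold_) return false;
        pending_ = 0;
        return true;
    }

    std::size_t pending() const { return pending_; }

private:
    std::size_t threshold_;
    std::size_t pending_ = 0;
};
```

In an insertion loop, you would call record() with the number of vectors just added and call flush() on the interface only when it returns true, followed by one final flush() after the loop.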
Example Usage
// Reduce memory usage after indexing
fluffy::ErrorCode status = fluffyInterface.offload();
if (status == fluffy::ErrorCode::Success) {
    std::cout << "Memory offloaded successfully." << std::endl;
} else {
    std::cerr << "Offload failed with error: " << static_cast<int>(status) << std::endl;
}
Memory Budget Considerations
If your application has a low memory budget, using offload() after indexing and querying is recommended. This approach ensures that only actively used data remains in memory while older or less frequently used data is offloaded.
By using offload(), you can efficiently manage memory usage without sacrificing functionality, as data will be transparently reloaded when needed.
Remove a Vector
To remove a vector (both from data collection and the indices), simply call remove_data with the string ID of the vector:
std::string string_id = "unique_id";
fluffyInterface.remove_data(string_id);
To remove several vectors at once, pass a vector of string IDs to the same remove_data endpoint:
std::vector<std::string> string_ids = {"id0", "id1"};
fluffyInterface.remove_data(string_ids);
The removed vector(s) will no longer be searchable.
Disk Space Management
While remove_data guarantees that a removed vector will no longer appear in any retrieval results, it does not immediately remove the vector from the vector collection and indices on disk. To free up the disk space, call
fluffyInterface.shrink_to_fit();
shrink_to_fit() takes some time to run because the database on disk needs to be reorganized. Therefore, calling it too frequently is not recommended. Call it after a large number of vectors have been removed and the disk space is needed immediately.