Document Database SDK 📄
The VecML Document Database SDK provides high-performance, scalable document storage and retrieval for applications that need efficient document indexing, querying, and storage management. You can build on it to create fast, memory-efficient document storage and retrieval systems. 🚀
🛠 Creating a Fluffy Document Interface
To begin using the Fluffy Document Database, initialize an instance of FluffyDocumentInterface. This class manages document storage, indexing, and retrieval.
Initialization
#include "fluffy_document_interface.h"
std::string database_path = "path/to/database";
std::string license_path = "license.txt";
// Create a document collection instance
fluffy::FluffyDocumentInterface docInterface(database_path, license_path);
- database_path: Directory where document data and indices are stored.
- license_path: Path to a valid license file.
About License
Please contact sales@vecml.com to obtain a valid license.txt. The license file is required to initialize and use the VecML SDK. Without a valid license, functionality is restricted or unavailable. To avoid initialization errors, ensure that license.txt is located at the path you pass as license_path and is readable by your application.
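Because initialization depends on the license file being readable, it can help to check for the file before constructing the interface. The following is a minimal sketch using only the C++17 standard library. The helper name license_file_is_readable is hypothetical and not part of the SDK, and it only checks that the file exists and can be opened, not that the license itself is valid.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper (not part of the SDK): returns true if `license_path`
// names an existing regular file that can be opened for reading.
// It does not inspect or validate the license contents.
inline bool license_file_is_readable(const std::string& license_path) {
    std::error_code ec;  // non-throwing overload: errors are reported as `false`
    if (!std::filesystem::is_regular_file(license_path, ec)) {
        return false;
    }
    std::ifstream in(license_path);
    return in.is_open();
}
```

A caller could run this check on license_path first and report a clear error message if it returns false, instead of relying on the SDK's initialization failure.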
📥 Adding Documents
You can add a document and its attributes to the database using the add_document() method. Each document is identified by a unique string ID.
fluffy::idx_t document_id = "unique_id";
std::string text = "This is an example document.";
std::string path = "path/to/the/file"; // you can specify the file path if needed
std::string title = "document_title";
std::string category = "news";
std::unordered_map<std::string, std::string> attributes = {{"title", title}, {"category", category}};
// Insert the document into the collection
fluffy::ErrorCode status = docInterface.add_document(document_id, text, path, attributes);
if (status == fluffy::ErrorCode::Success) {
    std::cout << "Document added successfully!" << std::endl;
} else {
    std::cerr << "Failed to add document. Error code: " << static_cast<int>(status) << std::endl;
}
🔍 Searching Documents
Full-text Search
To perform full-text search, use search_documents(). The function returns relevant documents based on the provided query.
std::string query = "example";
fluffy::InterfaceDocumentQueryResults results;
// Execute search
int top_k = 10;
fluffy::ErrorCode status = docInterface.search_documents(query, top_k, results);
if (status == fluffy::ErrorCode::Success) {
    std::cout << "Full-text search with keyword " << query << ":" << std::endl;
    for (auto& doc : results.results) {
        std::string doc_id = doc.document_id;
        std::string raw_text;
        docInterface.get_text(doc_id, raw_text); // fetch the full text for this ID
        std::cout << doc_id << " " << raw_text << std::endl;
    }
} else {
    std::cerr << "Search failed. Error code: " << static_cast<int>(status) << std::endl;
}
get_text() can be used to retrieve the full text of a document specified by its ID.
Results Structure (InterfaceDocumentQueryResults)
- A list of matching document string IDs, sorted by similarity from high to low in terms of full-text keyword (fuzzy) matching.
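Since the results are already ordered from most to least similar, the IDs can be copied out in that order for later processing. The sketch below assumes only what the search example shows: a results member that can be iterated and whose elements expose document_id. StubHit and StubResults are hypothetical stand-ins so the code compiles without the SDK, and collect_document_ids is not an SDK function.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins (not SDK types), shaped like the usage in the
// search example: `results.results` is iterable, elements have `document_id`.
struct StubHit {
    std::string document_id;
};
struct StubResults {
    std::vector<StubHit> results;
};

// Hypothetical helper: copies document IDs out of a results object,
// preserving the ranking order in which they are stored.
template <typename Results>
std::vector<std::string> collect_document_ids(const Results& query_results) {
    std::vector<std::string> ids;
    for (const auto& hit : query_results.results) {
        ids.push_back(hit.document_id);
    }
    return ids;
}
```

Because the helper is a template, it should also accept the SDK's own results type, provided that type matches the shape assumed here.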
Attribute Search (e.g., by Title, Author)
To conduct keyword search on a specific attribute (for example, by document title or author), follow the two steps below:
- Create an index for the attribute using attach_attribute_index():
docInterface.attach_attribute_index("title"); // create an index for attribute "title"
After attaching the index to the document interface, all documents added in the future will be added to the index automatically.
- Use the search_attribute() function to perform keyword search on an attribute:
std::string title_keyword = "Animali";
fluffy::InterfaceDocumentQueryResults attr_results;
int top_k = 10;
docInterface.search_attribute(title_keyword, "title", top_k, attr_results);
std::cout << "Title search with keyword " << title_keyword << ":" << std::endl;
for (auto& doc : attr_results.results) {
    std::string doc_id = doc.document_id;
    std::string title;
    docInterface.get_document_attribute(doc_id, "title", title); // fetch the "title" attribute for this ID
    std::cout << doc_id << " " << title << std::endl;
}
get_document_attribute() can be used to retrieve an attribute from a document specified by its ID.
Results Structure (InterfaceDocumentQueryResults)
- A list of matching document string IDs, sorted by similarity from high to low in terms of attribute keyword (fuzzy) matching.
🗑 Removing a Document
To remove a document, call remove_document().
std::string document_id = "id_to_remove";
// Remove the document
fluffy::ErrorCode status = docInterface.remove_document(document_id);
if (status == fluffy::ErrorCode::Success) {
    std::cout << "Document removed successfully!" << std::endl;
} else {
    std::cerr << "Failed to remove document. Error code: " << static_cast<int>(status) << std::endl;
}
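When several documents need to be removed, the same call can be repeated in a loop while keeping track of which removals failed. The following sketch is independent of the SDK: remove_documents is a hypothetical helper, and the remove_one callback is assumed to wrap remove_document() and return true on success.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not part of the SDK). `remove_one` should wrap the
// SDK call and return true on success, for example:
//   [&](const std::string& id) {
//       return docInterface.remove_document(id) == fluffy::ErrorCode::Success;
//   }
// Returns the IDs whose removal failed, in input order.
inline std::vector<std::string> remove_documents(
    const std::vector<std::string>& ids,
    const std::function<bool(const std::string&)>& remove_one) {
    std::vector<std::string> failed;
    for (const auto& id : ids) {
        if (!remove_one(id)) {
            failed.push_back(id);  // remember the failure and keep going
        }
    }
    return failed;
}
```

Continuing past failures, rather than stopping at the first one, lets the caller retry or log the remaining IDs in a single pass.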
💾 Managing Storage and Performance
Flushing Data to Disk
To ensure that all changes are persisted to disk, call flush():
fluffy::ErrorCode status = docInterface.flush();
if (status == fluffy::ErrorCode::Success) {
    std::cout << "Data successfully flushed to disk." << std::endl;
}
Calling flush() protects the changes made so far against loss if the system crashes.
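One way to bound how much unflushed work a crash could lose is to flush after a fixed number of writes. The sketch below is an illustration only: PeriodicFlusher is a hypothetical helper, not an SDK class, and the flush_fn callback is assumed to wrap docInterface.flush(). A suitable interval depends on your workload and is not specified by the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper (not part of the SDK): invokes `flush_fn` once every
// `interval` recorded writes, so at most `interval - 1` writes are unflushed.
class PeriodicFlusher {
public:
    PeriodicFlusher(std::size_t interval, std::function<void()> flush_fn)
        : interval_(interval), flush_fn_(std::move(flush_fn)) {
        assert(interval_ > 0);  // an interval of zero would never be meaningful
    }

    // Call after each successful write. Returns true if a flush was triggered.
    bool note_write() {
        ++pending_;
        if (pending_ < interval_) {
            return false;
        }
        flush_fn_();
        pending_ = 0;
        return true;
    }

    // Number of writes recorded since the last flush.
    std::size_t pending() const { return pending_; }

private:
    std::size_t interval_;
    std::function<void()> flush_fn_;
    std::size_t pending_ = 0;
};
```

For example, constructing it with an interval of 1000 and a lambda such as [&] { docInterface.flush(); }, then calling note_write() after each successful add_document(), would flush once per 1000 inserts.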
Optimizing Memory Usage
To reduce memory usage, use offload(), which moves unused data from memory to disk.
fluffy::ErrorCode status = docInterface.offload();
if (status == fluffy::ErrorCode::Success) {
    std::cout << "Memory offloaded successfully." << std::endl;
}
Key Notes:
- The system will automatically reload necessary data when required.
- Calling offload() too frequently may impact performance.
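To avoid offloading too often, calls can be rate-limited on the application side. The following is a minimal sketch under stated assumptions: OffloadThrottle is a hypothetical helper, not an SDK class, the offload_fn callback is assumed to wrap docInterface.offload(), and the minimum interval is a value you would tune yourself.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper (not part of the SDK): runs `offload_fn` only if at
// least `min_interval` has passed since the last time it actually ran.
class OffloadThrottle {
public:
    using Clock = std::chrono::steady_clock;

    OffloadThrottle(Clock::duration min_interval, std::function<void()> offload_fn)
        : min_interval_(min_interval), offload_fn_(std::move(offload_fn)) {}

    // Returns true if the offload ran, false if it was skipped as too soon.
    // `now` is a parameter so the behavior can be tested deterministically.
    bool maybe_offload(Clock::time_point now = Clock::now()) {
        if (has_run_ && now - last_run_ < min_interval_) {
            return false;
        }
        offload_fn_();
        last_run_ = now;
        has_run_ = true;
        return true;
    }

private:
    Clock::duration min_interval_;
    std::function<void()> offload_fn_;
    Clock::time_point last_run_{};
    bool has_run_ = false;
};
```

The first call always runs. Later calls are skipped until the interval has elapsed, so the application can call maybe_offload() freely after each query batch.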
Fine-Grained Offloading
If you need more control over offloading, use:
- offload_data(): Moves document text data to disk.
- offload_index(): Moves indexing structures to disk.
docInterface.offload_data(); // Offload document text data
docInterface.offload_index(); // Offload indexing structures
Use these selectively when memory is constrained.
🚀 Best Practices
✅ Use flush() regularly to persist changes.
✅ Use offload() after indexing and querying to free memory.
✅ Batch insert documents to improve performance.
✅ Ensure unique document IDs to avoid overwriting data.
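The batching and unique-ID recommendations can be combined in one application-side loop. The sketch below is one possible interpretation, not the SDK's own batching mechanism (which may differ or may offer a dedicated call): it skips IDs repeated within the batch, adds each remaining document, and flushes once at the end. PendingDocument, BatchReport, and add_batch are hypothetical names. The add_one and flush_fn callbacks are assumed to wrap add_document() and flush().

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical types and helper (not part of the SDK).
struct PendingDocument {
    std::string id;
    std::string text;
};

struct BatchReport {
    std::size_t added = 0;
    std::vector<std::string> duplicate_ids;  // repeated within this batch
    std::vector<std::string> failed_ids;     // add_one returned false
};

// Adds each document via `add_one` (true means success), skipping IDs already
// seen earlier in the same batch, then calls `flush_fn` once if anything was
// added. Duplicates against documents already in the database are NOT detected.
inline BatchReport add_batch(
    const std::vector<PendingDocument>& docs,
    const std::function<bool(const PendingDocument&)>& add_one,
    const std::function<void()>& flush_fn) {
    BatchReport report;
    std::unordered_set<std::string> seen;
    for (const auto& doc : docs) {
        if (!seen.insert(doc.id).second) {
            report.duplicate_ids.push_back(doc.id);
            continue;
        }
        if (add_one(doc)) {
            ++report.added;
        } else {
            report.failed_ids.push_back(doc.id);
        }
    }
    if (report.added > 0) {
        flush_fn();  // a single flush for the whole batch
    }
    return report;
}
```

Flushing once per batch instead of once per document trades a slightly larger window of unpersisted data for fewer disk writes.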