Vector DB CLI 🚀
The VecML Command Line Tool (CLI) provides an easy-to-use interface to your cloud VecML database. It lets you manage projects, datasets, and indices, and perform searches using various methods from the command line.
Downloads, Setup, and Authentication (API key)
Download the CLI Executables:
Please download the executable for your system architecture and version.
Platform | Architecture | Version | Download Link |
---|---|---|---|
Linux | x86_64 | Ubuntu 22 | Download |
Linux | x86_64 | Ubuntu 20 | Download |
Windows | x86_64 | Windows 10, 11 | Download |
MacOS | ARM | MacOS | Download |
To use VecML CLI, simply run the executable.
Authentication (API key):
VecML CLI authenticates with your API key. The API key can be generated as follows:
- Go to https://account.vecml.com/user-api-keys and sign up for a free VecML account.
- After registration, you can get an API key by clicking "Create New API Key". Now you can log in to VecML CLI with this API key.
- This unique API key is shown only once, at creation time, so please store it somewhere safe. If you need to re-generate an API key, simply delete the previous one and then create a new key.
Recommended: Set the VecML API key as an environment variable so that you do not have to copy it every time. Once configured, the CLI detects the key automatically when logging in.
Linux:
- For bash:
echo 'export VECML_API_KEY="your_vecml_api_key"' >> ~/.bashrc
source ~/.bashrc
- For Zsh:
echo 'export VECML_API_KEY="your_vecml_api_key"' >> ~/.zshrc
source ~/.zshrc
Windows:
setx VECML_API_KEY "your_vecml_api_key"
MacOS:
- For bash:
echo 'export VECML_API_KEY="your_vecml_api_key"' >> ~/.bash_profile
source ~/.bash_profile
- For Zsh:
echo 'export VECML_API_KEY="your_vecml_api_key"' >> ~/.zprofile
source ~/.zprofile
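A script that wraps the CLI may want to confirm that the variable is visible before launching it. The following is a minimal Python sketch that assumes only the variable name VECML_API_KEY from the steps above; the helper name is illustrative and not part of VecML.

```python
import os


def get_vecml_api_key(env=None):
    """Return the key stored in VECML_API_KEY, or None if it is unset or blank.

    `env` defaults to the real process environment; passing a dict makes
    the helper easy to test. (Hypothetical helper, not part of the VecML CLI.)
    """
    env = os.environ if env is None else env
    key = env.get("VECML_API_KEY", "").strip()
    return key or None


if __name__ == "__main__":
    key = get_vecml_api_key()
    if key is None:
        print("VECML_API_KEY is not set in this shell")
    else:
        # Show only a short prefix so the full key never lands in logs.
        print("VECML_API_KEY found, starts with:", key[:4])
```

Note that on Windows, setx affects only new terminal sessions, so the check should be run from a freshly opened terminal.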
Commands Overview
- HELP: Display a list of valid commands or command-specific help.
- ATTACH_INDEX: Attach an index to a dataset within a project.
- CREATE_PROJECT: Create a new project with a specified application.
- DELETE_PROJECT: Delete an existing project (confirmation required).
- LIST_PROJECTS: Retrieve a list of all projects.
- DELETE_DATASET: Delete a dataset from a project (confirmation required).
- DELETE_INDEX: Delete an index from a dataset (confirmation required).
- LIST_DATASETS: Retrieve the list of all datasets in a project.
- INIT_DATASET: Initialize/create a dataset.
- INSERT_VECTOR: Insert an individual vector into a dataset (and update attached indices).
- INSERT_VECTOR_BATCH: Insert a batch of vectors in one request. Highly recommended when inserting multiple vectors.
- DELETE_VECTOR: Delete vectors from a dataset and all its attached indices.
- LIST_INDEX: List all indices attached to a dataset along with their information.
- SEARCH: Perform a nearest neighbor search using different query types.
- UPLOAD: Upload a file to add vectors to a dataset.
Lower-case commands and naming
The CLI commands are called "reserved words", e.g., CREATE_PROJECT. For convenience, lower-case reserved words are also accepted, e.g., create_project can also be used to create a new project. Please avoid including two reserved words in one command: do not use reserved words as project, dataset, or index names, as this may cause a parsing error.
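Names can be checked against the reserved words before they are used. The sketch below takes its word list from the Commands Overview; the case-insensitive exact-match rule is an assumption, since the CLI's actual parsing rules are not documented here.

```python
# Reserved words copied from the Commands Overview section.
RESERVED_WORDS = {
    "HELP", "ATTACH_INDEX", "CREATE_PROJECT", "DELETE_PROJECT",
    "LIST_PROJECTS", "DELETE_DATASET", "DELETE_INDEX", "LIST_DATASETS",
    "INIT_DATASET", "INSERT_VECTOR", "INSERT_VECTOR_BATCH",
    "DELETE_VECTOR", "LIST_INDEX", "SEARCH", "UPLOAD",
}


def is_reserved(name):
    """True if `name` matches a reserved word in any letter case.

    Assumption: the match is case-insensitive and exact. The real parser
    may be stricter or looser than this check.
    """
    return name.upper() in RESERVED_WORDS


for candidate in ("ProjectAlpha", "search", "Upload"):
    status = "reserved, pick another name" if is_reserved(candidate) else "ok"
    print(candidate, "->", status)
```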
Rate limit
Due to resource allocation constraints and service stability, VecML cloud has a limit on the number of operations per second. Please avoid high frequency operations. When inserting multiple vectors, please use INSERT_VECTOR_BATCH with a reasonably large batch size or UPLOAD with multiple data files/shards.
Batch insertion and file upload data size
To keep network transfers stable:
- The size of a single UPLOAD request cannot exceed 1GB.
- The size of a single INSERT_VECTOR_BATCH request cannot exceed 200MB.
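One way to respect the batch limit is to split the data on the client side before issuing INSERT_VECTOR_BATCH. The sketch below estimates request size from the text form of each ID and vector. Both that estimate and the reading of 200MB as 200,000,000 bytes are assumptions, so a lower budget leaves a safety margin.

```python
MAX_BATCH_BYTES = 200 * 1000 * 1000  # assumption: "200MB" read as 2e8 bytes


def split_into_batches(ids, vectors, max_bytes=MAX_BATCH_BYTES):
    """Yield (ids, vectors) chunks whose estimated text size fits max_bytes.

    The size of an entry is estimated as the UTF-8 length of its ID plus
    the length of the vector written as "[a, b, ...]". A single entry that
    is larger than the budget is still yielded, alone, so nothing is lost.
    """
    batch_ids, batch_vecs, size = [], [], 0
    for vid, vec in zip(ids, vectors):
        entry = len(vid.encode("utf-8")) + len(str(list(vec)).encode("utf-8"))
        if batch_ids and size + entry > max_bytes:
            yield batch_ids, batch_vecs
            batch_ids, batch_vecs, size = [], [], 0
        batch_ids.append(vid)
        batch_vecs.append(list(vec))
        size += entry
    if batch_ids:
        yield batch_ids, batch_vecs


# Tiny budget so the split is visible: each entry is estimated at 7 bytes.
for chunk_ids, chunk_vecs in split_into_batches(
        ["a", "b", "c"], [[1, 2], [3, 4], [5, 6]], max_bytes=14):
    print(chunk_ids, chunk_vecs)
```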
Managing Unique IDs for vectors
Vectors are identified by UNIQUE string IDs. While VecML provides a way to maintain auto-generated string IDs internally, we highly encourage users to maintain and specify unique IDs for their vectors, as this makes database operations more convenient. Multiple ways are available to specify the IDs; please check the data insertion commands for details.
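One of those ways is the --string_ids_file_path option of UPLOAD, which expects a whitespace- or newline-separated .txt file of unique IDs. Below is a minimal sketch for validating IDs and producing that file's text. The helper names are illustrative. IDs containing whitespace are rejected because they would split into several tokens in such a file.

```python
def find_id_problems(ids):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    seen = set()
    for vid in ids:
        if not vid:
            problems.append("empty ID")
        elif any(ch.isspace() for ch in vid):
            problems.append(f"ID contains whitespace: {vid!r}")
        if vid in seen:
            problems.append(f"duplicate ID: {vid!r}")
        seen.add(vid)
    return problems


def ids_file_text(ids):
    """Return newline-separated text for an IDs file, one ID per line."""
    problems = find_id_problems(ids)
    if problems:
        raise ValueError("; ".join(problems))
    return "".join(vid + "\n" for vid in ids)


print(ids_file_text(["vec_0001", "vec_0002", "vec_0003"]), end="")
```

The returned text can be written to a file (for example with pathlib.Path.write_text) and passed to UPLOAD. The number of IDs should equal the number of vectors in the data file.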
An Example Workflow
With the VecML CLI, managing a vector database and running search queries is straightforward. Below is a standard workflow:
- Call CREATE_PROJECT to create a new project.
- Create a dataset within the project. Two convenient options are:
  (i) Call INIT_DATASET to initialize/create a new dataset. Then use INSERT_VECTOR_BATCH to (iteratively) add batches of vectors to the dataset.
  (ii) Upload a data file (supported types: csv, json, binary format, libsvm) as a dataset using UPLOAD. You can iteratively upload files to a data collection.
- Build an index for the dataset using ATTACH_INDEX.
- Conduct approximate nearest neighbor search using SEARCH.
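The workflow can be written out as an ordered list of CLI commands. The sketch below follows option (ii), uploading a CSV file, and only assembles command strings in the syntax documented under Detailed Command Usage; it does not run the CLI. The function name is illustrative, and using row number 0 as the query is an assumption, since the row indexing convention is not stated here.

```python
def workflow_commands(project, dataset, index, data_file,
                      dist_type="L2", top_k=5):
    """Return the four workflow commands, in the order they should run."""
    return [
        # 1. Create the project ("search" is the application type).
        f'CREATE_PROJECT "{project}" "search"',
        # 2. Create the dataset by uploading a dense CSV file with a header.
        f'UPLOAD "{project}" "{data_file}" --dataset_name "{dataset}" '
        f'--file_format csv --has_field_names true --vector_type dense',
        # 3. Build an index (large_index left at false for small data).
        f'ATTACH_INDEX {project} {dataset} {index} {dist_type} false',
        # 4. Query with an existing row (row 0 is an assumed valid row).
        f'SEARCH "{project}" "{dataset}" "{index}" "row_number" '
        f'--row_number 0 --top_k {top_k}',
    ]


for command in workflow_commands("ProjectAlpha", "Dataset1", "IndexL2",
                                 "data/covtype_small.csv"):
    print(command)
```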
Detailed Command Usage
HELP Commands
- Usage:
HELP
<command> -h
- Description:
Displays a list of all valid commands or, if used with a command flag, shows detailed instructions for that specific command.
ATTACH_INDEX
- Usage:
ATTACH_INDEX <project_name> <dataset_name> <index_name> <dist_type> [<large_index>]
- Description:
Attaches an index to a dataset within the specified project.
- Valid Values for dist_type:
  - "L2"
  - "Cosine"
  - "InnerProduct"
  - "Hamming"
- large_index: (bool, optional) Whether to use an advanced approach that optimizes performance for very large indices. For large indices, we provide techniques to balance query efficiency, accuracy, and memory usage. It is highly recommended to set large_index to true if your data collection has more than ~2M vectors.
- Example:
ATTACH_INDEX ProjectAlpha Dataset1 IndexCosine Cosine false
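The large_index recommendation can be applied automatically when the vector count is known. This sketch validates dist_type against the four documented values and switches large_index on above 2,000,000 vectors. That exact cutoff is an assumption derived from the "~2M" guidance, and the function name is illustrative.

```python
VALID_DIST_TYPES = ("L2", "Cosine", "InnerProduct", "Hamming")
LARGE_INDEX_THRESHOLD = 2_000_000  # assumed cutoff for the "~2M" guidance


def attach_index_command(project, dataset, index, dist_type, num_vectors):
    """Build an ATTACH_INDEX command, choosing large_index from the size."""
    if dist_type not in VALID_DIST_TYPES:
        raise ValueError(f"dist_type must be one of {VALID_DIST_TYPES}")
    large_index = "true" if num_vectors > LARGE_INDEX_THRESHOLD else "false"
    return f"ATTACH_INDEX {project} {dataset} {index} {dist_type} {large_index}"


print(attach_index_command("ProjectAlpha", "Dataset1", "IndexCosine",
                           "Cosine", num_vectors=50_000))
print(attach_index_command("ProjectAlpha", "BigData", "IndexL2",
                           "L2", num_vectors=5_000_000))
```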
CREATE_PROJECT
- Usage:
CREATE_PROJECT <project_name> <application>
- Description:
Creates a new project with the given project name and application type. The application type is either "search", "autoML", or "both". Currently, VecML CLI only supports search functionalities.
- Example:
CREATE_PROJECT "ProjectAlpha" "search"
DELETE_PROJECT
- Usage:
DELETE_PROJECT <project_name>
- Description:
Deletes the specified project after a confirmation prompt. All the datasets and indices in the project will be deleted. This action cannot be undone.
- Example:
DELETE_PROJECT "ProjectAlpha"
LIST_PROJECTS
- Usage:
LIST_PROJECTS
- Description:
Retrieves a list of all existing projects.
- Example:
LIST_PROJECTS
DELETE_DATASET
- Usage:
DELETE_DATASET <project_name> <dataset_name>
- Description:
Deletes the specified dataset from a project. All the indices of the dataset will also be deleted. This action cannot be undone.
- Example:
DELETE_DATASET "ProjectAlpha" "Dataset1"
DELETE_INDEX
- Usage:
DELETE_INDEX <project_name> <dataset_name> <index_name>
- Description:
Deletes the specified index from a dataset.
- Example:
DELETE_INDEX "ProjectAlpha" "Dataset1" "IndexCosine"
LIST_DATASETS
- Usage:
LIST_DATASETS <project_name>
- Description:
Retrieves a list of all datasets within the specified project.
- Example:
LIST_DATASETS "ProjectAlpha"
INIT_DATASET
- Usage:
INIT_DATASET <project_name> <dataset_name> <vector_type> [<vector_dim>]
- Description:
Creates a new dataset with the given project name, dataset name, and vector type.
- Supported vector types:
  - "dense": dense vectors with each entry a float32 value
  - "dense8Bit", "dense4Bit", "dense2Bit", "dense1Bit": dense vectors with each entry a low-bit integer
  - "sparse": sparse vector format where each vector contains multiple key:value pairs
- Note: vector_dim is required for non-sparse vector types.
- Example:
INIT_DATASET "ProjectAlpha" "ImageClassification" "dense" "784"
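The rule that vector_dim is required for non-sparse types can be checked before the command is issued. This sketch uses the documented vector types. The function name is illustrative, and rejecting non-positive dimensions is an assumption rather than documented CLI behavior.

```python
DENSE_TYPES = {"dense", "dense8Bit", "dense4Bit", "dense2Bit", "dense1Bit"}


def init_dataset_command(project, dataset, vector_type, vector_dim=None):
    """Build an INIT_DATASET command, enforcing the vector_dim rule."""
    if vector_type not in DENSE_TYPES and vector_type != "sparse":
        raise ValueError(f"unsupported vector type: {vector_type!r}")
    if vector_type in DENSE_TYPES and vector_dim is None:
        raise ValueError("vector_dim is required for non-sparse vector types")
    parts = [project, dataset, vector_type]
    if vector_dim is not None:
        if int(vector_dim) <= 0:  # assumed sanity check, not from the docs
            raise ValueError("vector_dim must be a positive integer")
        parts.append(str(int(vector_dim)))
    # Every argument is quoted, matching the style of the example above.
    return "INIT_DATASET " + " ".join(f'"{p}"' for p in parts)


print(init_dataset_command("ProjectAlpha", "ImageClassification", "dense", 784))
print(init_dataset_command("ProjectAlpha", "Documents", "sparse"))
```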
INSERT_VECTOR
- Usage:
INSERT_VECTOR <project_name> <dataset_name> <vector_string_id> [a_1, a_2, ..., a_n] [options]
INSERT_VECTOR <project_name> <dataset_name> <vector_string_id> [idx_1:a_1, idx_2:a_2, ..., idx_n:a_n] [options]
- Dense and Sparse Vectors: Use the first format to insert dense vectors and the second format to insert sparse vectors.
- Description:
Inserts an individual vector into a dataset and updates all indices attached to that dataset.
- Options:
  - --vector_attributes [attr1:value1, attr2:value2, ...]: (Optional) Specify vector attributes.
- Example:
INSERT_VECTOR "ProjectAlpha" "Dataset1" "vector_001" [2, 1.3, 0, -2, -2.1111] --vector_attributes [Cover_Type:"Forest"]
INSERT_VECTOR "ProjectAlpha" "Dataset2" "vector_001" [0:2, 997:-5.3, 7:-0.002]
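When vectors come from a program, the dense and sparse literals can be generated rather than typed. In this sketch a list is treated as a dense vector and a dict mapping index to value as a sparse one. That mapping and the helper names are illustrative choices. Values are written with Python's repr, so very small or large floats appear in scientific notation, which the CLI may or may not accept.

```python
def dense_literal(values):
    """[a_1, a_2, ..., a_n] for a dense vector."""
    return "[" + ", ".join(repr(v) for v in values) + "]"


def sparse_literal(entries):
    """[idx_1:a_1, idx_2:a_2, ...] for a sparse vector given as a dict."""
    return "[" + ", ".join(f"{int(i)}:{v!r}" for i, v in entries.items()) + "]"


def insert_vector_command(project, dataset, vector_id, vector):
    """Build INSERT_VECTOR; dicts become sparse, other sequences dense."""
    if isinstance(vector, dict):
        literal = sparse_literal(vector)
    else:
        literal = dense_literal(vector)
    return f'INSERT_VECTOR "{project}" "{dataset}" "{vector_id}" {literal}'


print(insert_vector_command("ProjectAlpha", "Dataset1", "vector_001",
                            [2, 1.3, 0, -2, -2.1111]))
print(insert_vector_command("ProjectAlpha", "Dataset2", "vector_001",
                            {0: 2, 997: -5.3, 7: -0.002}))
```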
INSERT_VECTOR_BATCH
- Usage:
INSERT_VECTOR_BATCH <project_name> <dataset_name> [string_id1, string_id2, ..., string_idn] [[<v_1>, <v_2>, ..., <v_m>], [<v_1>, <v_2>, ..., <v_m>], ..., [<v_1>, <v_2>, ..., <v_m>]] [options]
INSERT_VECTOR_BATCH <project_name> <dataset_name> [string_id1, string_id2, ..., string_idn] [[<idx_1>:<v_1>, <idx_2>:<v_2>, ..., <idx_m>:<v_m>], [<idx_1>:<v_1>, <idx_2>:<v_2>, ..., <idx_m>:<v_m>], ..., [<idx_1>:<v_1>, <idx_2>:<v_2>, ..., <idx_m>:<v_m>]] [options]
- Dense and Sparse Vectors: Use the first format to insert dense vectors and the second format to insert sparse vectors.
- Description:
Inserts multiple vectors into a dataset and updates all indices attached to that dataset. The size of the batched vectors should be smaller than 200MB.
- Options:
  - --vector_attributes [[attr1:value1, attr2:value2, ...], [attr1:value1, attr2:value2, ...], ...]: (Optional) Specify the attributes of the vectors, one list per vector.
- Example:
INSERT_VECTOR_BATCH projA another_test [test_id6, test_id7] [[1, 2, 3], [4, 5, 6]] --vector_attributes [[attribute:value, something:nothing], []]
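A batch command has three parallel lists (IDs, vectors, attributes) that must line up. The sketch below builds the dense form and checks the lengths first. The function name is illustrative, and the requirement that all vectors share one dimension is an assumption consistent with dense datasets having a fixed vector_dim.

```python
def bracket(items):
    """Join already-formatted items as [item1, item2, ...]."""
    return "[" + ", ".join(items) + "]"


def insert_batch_command(project, dataset, ids, vectors, attributes=None):
    """Build a dense INSERT_VECTOR_BATCH command from parallel lists."""
    if len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")
    if len(set(ids)) != len(ids):
        raise ValueError("vector IDs must be unique")
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("all dense vectors must share one dimension")
    id_list = bracket(ids)
    vec_list = bracket(bracket(repr(x) for x in v) for v in vectors)
    command = f"INSERT_VECTOR_BATCH {project} {dataset} {id_list} {vec_list}"
    if attributes is not None:
        if len(attributes) != len(ids):
            raise ValueError("need one attribute dict per vector")
        # An empty dict becomes [], meaning "no attributes for this vector".
        attr_list = bracket(
            bracket(f"{k}:{v}" for k, v in attrs.items())
            for attrs in attributes
        )
        command += f" --vector_attributes {attr_list}"
    return command


print(insert_batch_command(
    "projA", "another_test", ["test_id6", "test_id7"],
    [[1, 2, 3], [4, 5, 6]],
    [{"attribute": "value", "something": "nothing"}, {}],
))
```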
DELETE_VECTOR
- Usage:
DELETE_VECTOR <project_name> <dataset_name> [string_id1, string_id2, ..., string_idn]
- Description:
Deletes vectors from the specified dataset and from all the indices attached to the dataset.
- Example:
DELETE_VECTOR ProjectAlpha Dataset1 [id1, id2, id3]
LIST_INDEX
- Usage:
LIST_INDEX <project_name> <dataset_name>
- Description:
Lists all indices attached to the specified dataset, along with details such as index name and distance type.
- Example:
LIST_INDEX "ProjectAlpha" "Dataset1"
SEARCH
- Usage:
SEARCH <project_name> <dataset_name> <index_name> <query_type> [search_options]
- Description:
Performs a nearest neighbor search on the specified index using the given query parameters.
- Query types and their options:
  - vectors: Search using input query vectors.
    - Option: --query_vectors [[a_11, a_12, ..., a_1n], [a_21, a_22, ..., a_2n], ...]
  - file: Search using vectors provided in a file.
    - Options: --file_path "path/to/file", --file_type "csv" (or other valid file type)
  - row_number: Search using an existing vector identified by its row number in the dataset.
    - Option: --row_number <number>
  - dataset: Search using vectors from another dataset.
    - Option: --query_dataset_name "QueryDataset"
NOTE: For the "file" query type, please make sure the uploaded file has the same format (vector dimension, attributes) as the dataset in the database.
- Additional Search Options:
  - --top_k <number>: Set the number of nearest neighbors to return (default is 5).
  - --filter_attribute "attribute": Filter based on a specified attribute.
  - --filter_type "range" or "equality": Type of filtering to apply.
  - --start_value "value" and --end_value "value": For range filtering, define the lexicographical bounds.
  - --target_values [value1, value2, ...]: For equality filtering, define the values to match exactly.
- Example (vector search):
SEARCH "ProjectAlpha" "Dataset1" "IndexCosine" "vectors" --query_vectors [[0.1, 0.2, 0.3, 0.4, 0.5]] --top_k 10
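Because range filtering uses lexicographical bounds, numeric-looking values compare as strings: "10" sorts before "9". The sketch below builds the filter options to append to a SEARCH command and flags inverted bounds. It assumes the server orders strings the way Python does (by code point), which is not confirmed here, and the function names are illustrative.

```python
def lexicographic_bounds_ok(start_value, end_value):
    """True if start <= end under string (not numeric) comparison."""
    return str(start_value) <= str(end_value)


def filter_options(attribute, filter_type, start_value=None,
                   end_value=None, target_values=None):
    """Return the filter part of a SEARCH command as one string."""
    opts = ["--filter_attribute", f'"{attribute}"',
            "--filter_type", f'"{filter_type}"']
    if filter_type == "range":
        if start_value is None or end_value is None:
            raise ValueError("range filtering needs start_value and end_value")
        if not lexicographic_bounds_ok(start_value, end_value):
            raise ValueError("start_value sorts after end_value as a string")
        opts += ["--start_value", f'"{start_value}"',
                 "--end_value", f'"{end_value}"']
    elif filter_type == "equality":
        if not target_values:
            raise ValueError("equality filtering needs target_values")
        opts += ["--target_values",
                 "[" + ", ".join(str(v) for v in target_values) + "]"]
    else:
        raise ValueError('filter_type must be "range" or "equality"')
    return " ".join(opts)


print(filter_options("Cover_Type", "equality", target_values=["Forest", "Lake"]))
print(filter_options("year", "range", "2001", "2010"))
print(lexicographic_bounds_ok("9", "10"))   # False: "9" sorts after "10"
print(lexicographic_bounds_ok("09", "10"))  # True once values are zero-padded
```

Zero-padding numeric attribute values to a fixed width is one way to make string order agree with numeric order.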
UPLOAD
- Usage:
UPLOAD <project_name> <file_path> [upload_options]
- Description:
Uploads a file to the database. The file's content is processed and added to a dataset. If the specified dataset does not exist, it can be created.
- Mandatory Options:
  - --dataset_name "name": The dataset to which the file will be added.
  - --file_format "csv" | "json" | "libsvm" | "binary": Specifies the file format.
  - --vector_type "sparse" | "dense" | "dense1Bit" | "dense2Bit" | "dense4Bit" | "dense8Bit": Specifies the type of vector encoding.
- Conditional Options:
  - For csv or json files:
    - --has_field_names true|false: Indicates whether the file includes header field names (default is true).
    - --vector_data_field_name "field_name": Required for JSON files or when the data contains field names.
  - For binary files:
    - --binary_dim <number>: Specifies the dimension of vectors in the file.
    - --binary_dtype "uint8" | "float32": Specifies the data type in the binary file.
- Optional Options:
  - --attributes [attr1, attr2, ..., attrN]: List of attribute columns/fields associated with the file’s vectors.
  - --vector_id_field_name "field_name": Specifies the column/field that should be used as the vector ID.
  - --string_ids_file_path "file_path": Path to an extra whitespace- or newline-separated .txt file that specifies the UNIQUE string IDs of the vectors. The number of string IDs should equal the number of vectors in the data file. If string_ids_file_path is provided, it overrides all other ID sources and becomes the exclusive source for unique IDs.
- Example:
UPLOAD "ProjectAlpha" "data/covtype_small.csv" --dataset_name "NewDataset" --file_format csv --has_field_names true --vector_type dense --attributes [Cover_Type, Num_Type]
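The conditional options depend on the file format, so a small pre-flight check can catch omissions before a large file is sent. This sketch treats --binary_dim and --binary_dtype as required for binary files and --vector_data_field_name as required for JSON files. It does not enforce the field-name rule for CSV, because the example above omits that option. All of this is an interpretation of the text rather than confirmed CLI behavior.

```python
FILE_FORMATS = ("csv", "json", "libsvm", "binary")


def missing_upload_options(file_format, options):
    """Return the names of format-specific options that are absent.

    `options` is a dict of option name (without leading dashes) to value.
    """
    if file_format not in FILE_FORMATS:
        raise ValueError(f"file_format must be one of {FILE_FORMATS}")
    required = []
    if file_format == "binary":
        required = ["binary_dim", "binary_dtype"]
    elif file_format == "json":
        required = ["vector_data_field_name"]
    return [name for name in required if name not in options]


def upload_command(project, file_path, dataset_name, file_format,
                   vector_type, **options):
    """Build an UPLOAD command after checking format-specific options."""
    missing = missing_upload_options(file_format, options)
    if missing:
        raise ValueError("missing options: " + ", ".join(missing))
    parts = [f'UPLOAD "{project}" "{file_path}"',
             f'--dataset_name "{dataset_name}"',
             f"--file_format {file_format}",
             f"--vector_type {vector_type}"]
    parts += [f"--{name} {value}" for name, value in options.items()]
    return " ".join(parts)


print(upload_command("ProjectAlpha", "data/vectors.bin", "BinData",
                     "binary", "dense", binary_dim=128, binary_dtype="float32"))
```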
Additional Notes
- Confirmation Prompts:
Deletion commands (DELETE_DATASET, DELETE_INDEX, DELETE_PROJECT) will prompt you to confirm the operation. Type y or Y to proceed.
- Options Formatting:
Options are specified in the format --option_name value (or with quotes if needed). Lists are denoted using square brackets (e.g., [value1, value2, ...]).
- Error Handling:
If required options are missing or if provided values (such as dimensions) are not in the expected format, the CLI will output an error message and abort the command.