Vector DB CLI 🚀

The VecML Command Line Tool (CLI) provides an easy-to-use interface to your VecML cloud database. From the command line, you can manage projects, datasets, and indices, and run searches with several query methods.

Downloads, Setup, and Authentication (API key)

Download the CLI Executables:

Download the executable that matches your operating system, architecture, and version.

Platform	Architecture	Version	Download Link
Linux	x86_64	Ubuntu 22	Download
Linux	x86_64	Ubuntu 20	Download
Windows	x86_64	Windows 10, 11	Download
MacOS	ARM	MacOS	Download

To use VecML CLI, simply run the executable.

Authentication (API key):

VecML CLI authenticates users with an API key. To generate one:

Go to https://account.vecml.com/user-api-keys and sign up for a free VecML account.
After registering, click "Create New API Key" to get an API key. You can now log in to VecML CLI with this key.
The key is shown only once, at creation time, so store it securely. To regenerate a key, delete the previous one and create a new key.

Recommended: Set the VecML API key as an environment variable so you don't have to paste it every time. Once it is configured, the CLI detects the key automatically when you log in.

Linux:

For bash:

```
echo 'export VECML_API_KEY="your_vecml_api_key"' >> ~/.bashrc
source ~/.bashrc
```

For Zsh:

```
echo 'export VECML_API_KEY="your_vecml_api_key"' >> ~/.zshrc
source ~/.zshrc
```

Windows (setx stores the variable for future sessions only, so open a new terminal afterwards):

```
setx VECML_API_KEY "your_vecml_api_key"
```

MacOS:

For bash:

```
echo 'export VECML_API_KEY="your_vecml_api_key"' >> ~/.bash_profile
source ~/.bash_profile
```

For Zsh:

```
echo 'export VECML_API_KEY="your_vecml_api_key"' >> ~/.zprofile
source ~/.zprofile
```
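To confirm that a newly opened terminal actually sees the key, you can read the variable from any process. A minimal Python sketch; the helper `masked_api_key` is hypothetical and only inspects the environment, so it neither contacts VecML nor validates the key:

```python
import os
from typing import Optional


def masked_api_key(var_name: str = "VECML_API_KEY") -> Optional[str]:
    """Return the key with all but its last 4 characters masked, or None if unset/empty."""
    key = os.environ.get(var_name)
    if not key:
        return None
    return "*" * max(len(key) - 4, 0) + key[-4:]


if __name__ == "__main__":
    masked = masked_api_key()
    if masked is None:
        print("VECML_API_KEY is not set in this shell")
    else:
        print(f"VECML_API_KEY is set: {masked}")
```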

Commands Overview

HELP: Display a list of valid commands or command-specific help.
ATTACH_INDEX: Attach an index to a dataset within a project.
CREATE_PROJECT: Create a new project with a specified application.
DELETE_PROJECT: Delete an existing project (confirmation required).
LIST_PROJECTS: Retrieve a list of all projects.
DELETE_DATASET: Delete a dataset from a project (confirmation required).
DELETE_INDEX: Delete an index from a dataset (confirmation required).
LIST_DATASETS: Retrieve the list of all datasets in a project.
INIT_DATASET: Initialize/create a dataset.
INSERT_VECTOR: Insert an individual vector into a dataset (and update attached indices).
INSERT_VECTOR_BATCH: Insert a batch of vectors in one request. Highly recommended when inserting multiple vectors.
DELETE_VECTOR: Delete vectors from a dataset and from all of its attached indices.
LIST_INDEX: List all indices attached to a dataset along with their information.
SEARCH: Perform a nearest neighbor search using different query types.
UPLOAD: Upload a file to add vectors to a dataset.

Lower-case commands and naming

CLI command names, e.g., CREATE_PROJECT, are "reserved words". For convenience, lower-case reserved words are also accepted; e.g., create_project also creates a new project. Avoid using reserved words as project, dataset, or index names: a command that contains two reserved words may cause a parsing error.
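A small pre-flight check can reject risky names before you create anything. This sketch assumes the reserved words are exactly the command names in the Commands Overview (the CLI may reserve others), and it compares case-insensitively to be conservative, since both upper- and lower-case forms are accepted. `is_safe_name` is a hypothetical helper:

```python
# Command names from the Commands Overview; assumed to be the full reserved-word set.
RESERVED_WORDS = frozenset({
    "HELP", "ATTACH_INDEX", "CREATE_PROJECT", "DELETE_PROJECT", "LIST_PROJECTS",
    "DELETE_DATASET", "DELETE_INDEX", "LIST_DATASETS", "INIT_DATASET",
    "INSERT_VECTOR", "INSERT_VECTOR_BATCH", "DELETE_VECTOR", "LIST_INDEX",
    "SEARCH", "UPLOAD",
})


def is_safe_name(name: str) -> bool:
    """Return False if `name` equals a reserved word, ignoring case."""
    return name.upper() not in RESERVED_WORDS


for candidate in ["ProjectAlpha", "search", "Upload"]:
    print(candidate, "ok" if is_safe_name(candidate) else "reserved, pick another name")
```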

Rate limit

To keep the service stable within its resource constraints, VecML cloud limits the number of operations per second. Avoid issuing requests at high frequency: to insert many vectors, use INSERT_VECTOR_BATCH with a reasonably large batch size, or UPLOAD the data as one or more files (shards).

Batch insertion and file upload data size

For network stability, request sizes are limited:

The size of a single UPLOAD request cannot exceed 1GB.
The size of a single INSERT_VECTOR_BATCH request cannot exceed 200MB.
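When you script batch inserts, you can split a large collection into requests that stay under the cap. The sketch below renders dense INSERT_VECTOR_BATCH commands in the syntax documented under INSERT_VECTOR_BATCH and greedily packs vectors into each one. It assumes the limit applies to the UTF-8 length of the command text and reads 200MB conservatively as 200,000,000 bytes; the CLI's actual wire encoding may differ, so leave headroom. `batch_commands` is a hypothetical helper, not part of VecML:

```python
from typing import Iterator, List, Sequence

# Assumption: "200MB" read conservatively as 200,000,000 bytes of command text.
MAX_BATCH_BYTES = 200_000_000


def _render_vector(vec: Sequence[float]) -> str:
    # repr() of a float is its shortest round-trip form, e.g. 0.1 -> "0.1".
    return "[" + ", ".join(repr(float(x)) for x in vec) + "]"


def _command(prefix: str, ids: List[str], vecs: List[str]) -> str:
    return prefix + "[" + ", ".join(ids) + "] [" + ", ".join(vecs) + "]"


def batch_commands(project: str, dataset: str, ids: Sequence[str],
                   vectors: Sequence[Sequence[float]],
                   max_bytes: int = MAX_BATCH_BYTES) -> Iterator[str]:
    """Yield dense INSERT_VECTOR_BATCH commands, each at most max_bytes long."""
    if len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")
    prefix = f"INSERT_VECTOR_BATCH {project} {dataset} "
    # Fixed cost of every command: the prefix plus the brackets in "[...] [...]".
    overhead = len(prefix.encode("utf-8")) + len("[] []")
    batch_ids: List[str] = []
    batch_vecs: List[str] = []
    size = overhead
    for sid, vec in zip(ids, vectors):
        rendered = _render_vector(vec)
        cost = len(sid.encode("utf-8")) + len(rendered.encode("utf-8"))
        if batch_ids:
            cost += 4  # one ", " in the ID list and one in the vector list
        if batch_ids and size + cost > max_bytes:
            # The current batch is full: emit it and start a new one with this vector.
            yield _command(prefix, batch_ids, batch_vecs)
            batch_ids, batch_vecs, size = [], [], overhead
            cost -= 4  # the first vector in a batch needs no separators
        if size + cost > max_bytes:
            raise ValueError(f"vector {sid!r} alone exceeds max_bytes")
        batch_ids.append(sid)
        batch_vecs.append(rendered)
        size += cost
    if batch_ids:
        yield _command(prefix, batch_ids, batch_vecs)
```

Sending the yielded commands one after another also keeps the request rate low, in line with the rate limit above.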

Managing Unique IDs for vectors

Vectors are identified by UNIQUE string IDs. Although VecML can generate and maintain string IDs internally, we strongly encourage you to assign and track your own unique IDs, which makes later database operations (such as DELETE_VECTOR) more convenient. IDs can be specified in several ways; see the data insertion commands (INSERT_VECTOR, INSERT_VECTOR_BATCH, UPLOAD) for details.
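If your data has no natural keys, one simple scheme is a fixed prefix plus a zero-padded counter; remember the last counter value so later inserts continue the sequence without collisions. A minimal Python sketch, with `make_string_ids` as a hypothetical helper:

```python
from typing import List


def make_string_ids(prefix: str, count: int, start: int = 0, width: int = 8) -> List[str]:
    """Return `count` IDs like 'doc_00000000', 'doc_00000001', ... starting at `start`.

    Zero-padding to a fixed width keeps string order identical to numeric order.
    """
    if count < 0 or start < 0:
        raise ValueError("count and start must be non-negative")
    if count and len(str(start + count - 1)) > width:
        raise ValueError("width is too small for the requested range")
    return [f"{prefix}_{i:0{width}d}" for i in range(start, start + count)]
```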

An Example WorkFlow

VecML CLI makes managing a vector database and running queries straightforward. A typical workflow looks like this:

1. Call CREATE_PROJECT to create a new project.
2. Create a dataset within the project. Two convenient options are:

   (i) Call INIT_DATASET to create an empty dataset, then use INSERT_VECTOR_BATCH to add vectors to it, one batch at a time.

   (ii) Upload a data file (supported formats: csv, json, binary, libsvm) as a dataset using UPLOAD. You can upload additional files to the same dataset later.
3. Build an index for the dataset using ATTACH_INDEX.
4. Run approximate nearest neighbor searches using SEARCH.
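To see how the commands in workflow option (i) fit together, the sketch below assembles them for a toy dataset. It only builds command text, mirroring the syntax and quoting of the examples in the command reference; `workflow_commands` and all names and values in it are hypothetical, and the L2 distance type is an arbitrary choice:

```python
from typing import List, Sequence


def _vec(v: Sequence[float]) -> str:
    return "[" + ", ".join(repr(float(x)) for x in v) + "]"


def workflow_commands(project: str, dataset: str, index: str, ids: Sequence[str],
                      vectors: Sequence[Sequence[float]], query: Sequence[float],
                      top_k: int = 5) -> List[str]:
    """Return the CLI commands for: create project, init dataset, insert, index, search."""
    dim = len(query)
    if len(ids) != len(vectors) or any(len(v) != dim for v in vectors):
        raise ValueError("need one ID per vector, all with the query's dimension")
    return [
        f'CREATE_PROJECT "{project}" "search"',
        f'INIT_DATASET "{project}" "{dataset}" "dense" "{dim}"',
        f"INSERT_VECTOR_BATCH {project} {dataset} [{', '.join(ids)}] "
        f"[{', '.join(_vec(v) for v in vectors)}]",
        f"ATTACH_INDEX {project} {dataset} {index} L2 false",
        f'SEARCH "{project}" "{dataset}" "{index}" "vectors" '
        f"--query_vectors [{_vec(query)}] --top_k {top_k}",
    ]


for line in workflow_commands("ProjectAlpha", "Dataset1", "IndexL2", ["v1", "v2"],
                              [[0, 1], [1, 0]], [0.5, 0.5], top_k=1):
    print(line)
```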

Detailed Command Usage

HELP Commands

Usage:
```
HELP
<command> -h
```
Description:
Displays a list of all valid commands or, when called as <command> -h, detailed instructions for that specific command.

ATTACH_INDEX

Usage:

```
ATTACH_INDEX <project_name> <dataset_name> <index_name> <dist_type> [<large_index>]
```

Description:
Attaches an index to a dataset within the specified project.
Valid Values for dist_type:
- "L2"
- "Cosine"
- "InnerProduct"
- "Hamming"
large_index: (bool, optional) Whether to use an advanced approach that optimizes performance for very large indices. For large indices, we provide techniques that balance query efficiency, accuracy, and memory usage. Setting large_index to true is highly recommended if your dataset has more than ~2M vectors.

Example:

```
ATTACH_INDEX ProjectAlpha Dataset1 IndexCosine Cosine false
```
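For intuition about dist_type, here are the textbook definitions in plain Python. Whether VecML ranks by exactly these quantities (for example squared L2, or 1 - cosine as a distance) isn't stated here, so treat this as an illustration of what each metric measures rather than of the service's scoring. Hamming is typically used with binary vectors such as dense1Bit; that pairing is an assumption, not a documented requirement:

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: smaller means closer."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: larger means more similar."""
    _check(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the normalized vectors, in [-1, 1]: larger means more similar."""
    na, nb = math.sqrt(inner_product(a, a)), math.sqrt(inner_product(b, b))
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return inner_product(a, b) / (na * nb)


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where the entries differ: smaller means closer."""
    _check(a, b)
    return sum(1 for x, y in zip(a, b) if x != y)
```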

CREATE_PROJECT

Usage:

CREATE_PROJECT <project_name> <application>

Description:
Creates a new project with the given project name and application type. The application type is one of "search", "autoML", or "both". Currently, VecML CLI supports only search functionality.
Example:
```
CREATE_PROJECT "ProjectAlpha" "search"
```

DELETE_PROJECT

Usage:
```
DELETE_PROJECT <project_name>
```
Description:
Deletes the specified project after a confirmation prompt. All the datasets and indices in the project will be deleted. This action cannot be undone.
Example:
```
DELETE_PROJECT "ProjectAlpha"
```

LIST_PROJECTS

Usage:
```
LIST_PROJECTS
```
Description:
Retrieves a list of all existing projects.
Example:
```
LIST_PROJECTS
```

DELETE_DATASET

Usage:

```
DELETE_DATASET <project_name> <dataset_name>
```

Description:
Deletes the specified dataset from a project. All the indices of the dataset will also be deleted. This action cannot be undone.

Example:

```
DELETE_DATASET "ProjectAlpha" "Dataset1"
```

DELETE_INDEX

Usage:

```
DELETE_INDEX <project_name> <dataset_name> <index_name>
```

Description:
Deletes the specified index from a dataset.

Example:

```
DELETE_INDEX "ProjectAlpha" "Dataset1" "IndexCosine"
```

LIST_DATASETS

Usage:
```
LIST_DATASETS <project_name>
```
Description:
Retrieves a list of all datasets within the specified project.
Example:
```
LIST_DATASETS "ProjectAlpha"
```

INIT_DATASET

Usage:

```
INIT_DATASET <project_name> <dataset_name> <vector_type> [<vector_dim>]
```

Description:
Creates a new dataset with the given project name, dataset name, and vector type.
Supported vector types:
- "dense": dense vectors with each entry a float32 value
- "dense8Bit", "dense4Bit", "dense2Bit", "dense1Bit": dense vectors with each entry a low-bit integer
- "sparse": sparse vector format where each vector contains multiple key:value pairs

Note: vector_dim is required for all non-sparse vector types.

Example:

```
INIT_DATASET "ProjectAlpha" "ImageClassification" "dense" "784"
```
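Choosing a vector_type trades precision for space. Assuming entries are tightly bit-packed, the raw payload of one vector is vector_dim times the bits per entry, divided by 8 and rounded up to whole bytes. This ignores IDs, attributes, and index overhead, and VecML's actual storage layout isn't documented here, so treat it as a rough estimate. `raw_bytes_per_vector` is a hypothetical helper:

```python
# Bits per entry for each non-sparse vector_type ("dense" is float32 per the docs).
BITS_PER_ENTRY = {
    "dense": 32,
    "dense8Bit": 8,
    "dense4Bit": 4,
    "dense2Bit": 2,
    "dense1Bit": 1,
}


def raw_bytes_per_vector(vector_type: str, vector_dim: int) -> int:
    """Raw payload size of one vector, assuming entries are tightly bit-packed."""
    if vector_type not in BITS_PER_ENTRY:
        raise ValueError(f"no fixed size for vector_type {vector_type!r}")
    bits = BITS_PER_ENTRY[vector_type] * vector_dim
    return (bits + 7) // 8  # round up to whole bytes


for vt in BITS_PER_ENTRY:
    print(f"{vt:>10}: {raw_bytes_per_vector(vt, 784)} bytes per 784-dim vector")
```

For the 784-dimensional example above, this gives 3136 bytes per vector for dense and 98 bytes for dense1Bit.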

INSERT_VECTOR

Usage:

```
INSERT_VECTOR <project_name> <dataset_name> <vector_string_id> [a_1, a_2, ..., a_n] [options]
INSERT_VECTOR <project_name> <dataset_name> <vector_string_id> [idx_1:a_1, idx_2:a_2, ..., idx_n:a_n] [options]
```

Dense and Sparse Vectors: Use the first format to insert dense vectors and use the second format to insert sparse vectors.
Description:
Inserts an individual vector into a dataset and updates all indices attached to that dataset.
Options:
- --vector_attributes [attr1:value1, attr2:value2, ...]: (Optional) Specify vector attributes.

Example:

```
INSERT_VECTOR "ProjectAlpha" "Dataset1" "vector_001" [2, 1.3, 0, -2, -2.1111] --vector_attributes [Cover_Type:"Forest"]
INSERT_VECTOR "ProjectAlpha" "Dataset2" "vector_001" [0:2, 997:-5.3, 7:-0.002]
```
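If your vectors start out as Python lists, both INSERT_VECTOR formats can be produced mechanically. This sketch converts a dense list to index:value pairs by dropping zeros and renders either form. `insert_vector_command` and `sparse_from_dense` are hypothetical helpers; quoting attribute values follows the first example above, and writing integers as floats (e.g. 2.0) assumes the CLI accepts that notation:

```python
import collections.abc
from typing import Dict, Mapping, Optional, Sequence, Union

Vector = Union[Sequence[float], Mapping[int, float]]


def sparse_from_dense(values: Sequence[float]) -> Dict[int, float]:
    """Keep only the non-zero entries as index -> value pairs."""
    return {i: float(v) for i, v in enumerate(values) if v != 0}


def insert_vector_command(project: str, dataset: str, vector_id: str, vector: Vector,
                          attributes: Optional[Mapping[str, str]] = None) -> str:
    """Render INSERT_VECTOR in the dense [a_1, ...] or sparse [idx:a, ...] form."""
    if isinstance(vector, collections.abc.Mapping):
        # Sparse: indices sorted for readability (the example shows any order works).
        body = ", ".join(f"{i}:{float(v)!r}" for i, v in sorted(vector.items()))
    else:
        body = ", ".join(repr(float(v)) for v in vector)
    cmd = f'INSERT_VECTOR "{project}" "{dataset}" "{vector_id}" [{body}]'
    if attributes:
        attrs = ", ".join(f'{k}:"{v}"' for k, v in attributes.items())
        cmd += f" --vector_attributes [{attrs}]"
    return cmd
```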

INSERT_VECTOR_BATCH

Usage:

```
INSERT_VECTOR_BATCH <project_name> <dataset_name> [string_id1, string_id2, ..., string_idn] [[<v_1>, <v_2>, ..., <v_m>], [<v_1>, <v_2>, ..., <v_m>], ..., [<v_1>, <v_2>, ..., <v_m>]] [options]
INSERT_VECTOR_BATCH <project_name> <dataset_name> [string_id1, string_id2, ..., string_idn] [[<idx_1>:<v_1>, <idx_2>:<v_2>, ..., <idx_m>:<v_m>], [<idx_1>:<v_1>, <idx_2>:<v_2>, ..., <idx_m>:<v_m>], ..., [<idx_1>:<v_1>, <idx_2>:<v_2>, ..., <idx_m>:<v_m>]] [options]
```

Dense and Sparse Vectors: Use the first format to insert dense vectors and use the second format to insert sparse vectors.
Description:
Inserts multiple vectors into a dataset and updates all indices attached to that dataset. A single request must not exceed 200MB.
Options:
- --vector_attributes [[attr1:value1, attr2:value2, ...], [attr1:value1, attr2:value2, ...], ...]: (Optional) Specify the attributes of the vectors.

Example:

```
INSERT_VECTOR_BATCH projA another_test [test_id6, test_id7] [[1, 2, 3], [4, 5, 6]] --vector_attributes [[attribute:value, something: nothing], []]
```
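The example above uses the dense format. For sparse vectors with per-vector attributes, a small builder keeps the nested brackets straight, including an empty `[]` for vectors without attributes. This sketch assumes unquoted attribute values as in the example above (values containing spaces or commas may need different handling); `sparse_batch_command` is hypothetical:

```python
from typing import Mapping, Optional, Sequence


def _sparse(vec: Mapping[int, float]) -> str:
    # Indices are sorted for readability; the INSERT_VECTOR example shows any order works.
    return "[" + ", ".join(f"{i}:{float(v)!r}" for i, v in sorted(vec.items())) + "]"


def _attrs(attrs: Mapping[str, str]) -> str:
    return "[" + ", ".join(f"{k}:{v}" for k, v in attrs.items()) + "]"


def sparse_batch_command(project: str, dataset: str, ids: Sequence[str],
                         vectors: Sequence[Mapping[int, float]],
                         attributes: Optional[Sequence[Mapping[str, str]]] = None) -> str:
    """Render one sparse INSERT_VECTOR_BATCH command with optional per-vector attributes."""
    if len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")
    cmd = (f"INSERT_VECTOR_BATCH {project} {dataset} [{', '.join(ids)}] "
           f"[{', '.join(_sparse(v) for v in vectors)}]")
    if attributes is not None:
        if len(attributes) != len(ids):
            raise ValueError("pass one attribute mapping (possibly empty) per vector")
        cmd += f" --vector_attributes [{', '.join(_attrs(a) for a in attributes)}]"
    return cmd
```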

DELETE_VECTOR

Usage:

```
DELETE_VECTOR <project_name> <dataset_name> [string_id1, string_id2, ..., string_idn]
```

Description:
Delete vectors from the specified dataset, and from all the indices attached to the dataset.

Example:

```
DELETE_VECTOR ProjectAlpha Dataset1 [id1, id2, id3]
```

LIST_INDEX

Usage:

```
LIST_INDEX <project_name> <dataset_name>
```

Description:
Lists all indices attached to the specified dataset, along with details such as index name and distance type.
Example:
```
LIST_INDEX "ProjectAlpha" "Dataset1"
```

SEARCH

Usage:

```
SEARCH <project_name> <dataset_name> <index_name> <query_type> [search_options]
```

Description:
Performs a nearest neighbor search on the specified index using given query parameters.
Query types and their options:
- vectors: Search using input query vectors.
  - Option: --query_vectors [[a_11, a_12, ..., a_1n], [a_21, a_22, ..., a_2n], ...]
- file: Search using vectors provided in a file.
  - Options: --file_path "path/to/file", --file_type "csv" (or other valid file type)
- row_number: Search using an existing vector identified by its row number in the dataset.
  - Option: --row_number <number>
- dataset: Search using vectors from another dataset.
  - Option: --query_dataset_name "QueryDataset"

NOTE: for the "file" query type, make sure the file has the same format (vector dimension, attributes) as the dataset in the database.

Additional Search Options:
- --top_k <number>: Set the number of nearest neighbors to return (default is 5).
- --filter_attribute "attribute": Filter based on a specified attribute.
- --filter_type "range" or "equality": Type of filtering to apply.
- --start_value "value" and --end_value "value": For range filtering, define the lexicographical bounds.
- --target_values [value1, value2, ...]: For equality filtering, list the values to match exactly.
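Because range bounds are compared lexicographically, numeric attributes can filter in surprising ways. The sketch below models both filter types on string attribute values. Treating both range bounds as inclusive is an assumption, since the options above don't say, and `in_range` and `matches_any` are hypothetical helpers:

```python
from typing import Sequence


def in_range(value: str, start_value: str, end_value: str) -> bool:
    """Lexicographic range check (bounds assumed inclusive)."""
    return start_value <= value <= end_value


def matches_any(value: str, target_values: Sequence[str]) -> bool:
    """Equality filter: keep the vector if its attribute equals one of the targets."""
    return value in target_values


print(in_range("10", "1", "9"))    # True: as strings, "1" <= "10" <= "9"
print(in_range("10", "01", "09"))  # False: zero-padded values sort like numbers
```

The first check shows the pitfall: compared as strings, "10" sorts between "1" and "9". Zero-padding numeric attributes to a fixed width (e.g. "01" to "10") makes lexicographic order match numeric order.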

Example (vector search):

```
SEARCH "ProjectAlpha" "Dataset1" "IndexCosine" "vectors" --query_vectors [[0.1, 0.2, 0.3, 0.4, 0.5]] --top_k 10
```
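SEARCH returns approximate nearest neighbors from an index. For a small dataset, an exact brute-force search is a handy reference for sanity-checking results or estimating recall@k. The sketch uses L2 distance (substitute the metric that matches your index's dist_type), assumes you have extracted the returned IDs from the CLI output yourself, and does not reflect how VecML implements search. `exact_top_k` and `recall_at_k` are hypothetical helpers; `math.dist` needs Python 3.8+:

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def exact_top_k(query: Sequence[float], vectors: Dict[str, Sequence[float]],
                k: int = 5) -> List[Tuple[str, float]]:
    """Brute-force k nearest neighbors by L2 distance, nearest first."""
    scored = ((math.dist(query, vec), vid) for vid, vec in vectors.items())
    return [(vid, dist) for dist, vid in heapq.nsmallest(k, scored)]


def recall_at_k(returned_ids: Sequence[str], exact_ids: Sequence[str]) -> float:
    """Fraction of the exact neighbors that the approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(returned_ids) & set(exact_ids)) / len(exact_ids)
```

The default `k=5` mirrors the documented --top_k default.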

UPLOAD

Usage:

```
UPLOAD <project_name> <file_path> [upload_options]
```

Description:
Uploads a file to the database. The file's contents are parsed and added to a dataset; if the specified dataset does not exist yet, it is created.
Mandatory Options:
- --dataset_name "name": The dataset to add the file's vectors to.
- --file_format "csv" | "json" | "libsvm" | "binary": Specifies the file format.
- --vector_type "sparse" | "dense" | "dense1Bit" | "dense2Bit" | "dense4Bit" | "dense8Bit": Specifies the type of vector encoding.
Conditional Options:
- For csv or json files:
  - --has_field_names true|false: Indicates whether the file includes header field names (default is true).
  - --vector_data_field_name "field_name": Required for JSON files or when the data contains field names.
- For binary files:
  - --binary_dim <number>: Specifies the dimension of vectors in the file.
  - --binary_dtype "uint8" | "float32": Specifies the data type in the binary file.
Optional Options:
- --attributes [attr1, attr2, ..., attrN]: List of attribute columns/fields associated with the file’s vectors.
- --vector_id_field_name "field_name": Specifies the column/field that should be used as the vector ID.
- --string_ids_file_path "file_path": Path to an additional .txt file that lists the UNIQUE string IDs of the vectors, separated by whitespace or newlines. The number of string IDs must equal the number of vectors in the data file. If string_ids_file_path is provided, it overrides all other ID sources and becomes the exclusive source of unique IDs.
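Because the string IDs file must satisfy several constraints at once (unique IDs, whitespace or newline separators, exactly one ID per vector), it can help to validate before writing. A minimal sketch; `write_string_ids_file` is a hypothetical helper, and writing one ID per line is just one of the accepted separator choices:

```python
from pathlib import Path
from typing import Sequence, Union


def write_string_ids_file(path: Union[str, Path], ids: Sequence[str], num_vectors: int) -> None:
    """Write one ID per line after checking the constraints of --string_ids_file_path."""
    if len(ids) != num_vectors:
        raise ValueError(f"{len(ids)} IDs given for {num_vectors} vectors")
    if len(set(ids)) != len(ids):
        raise ValueError("string IDs must be unique")
    for sid in ids:
        # IDs are separated by whitespace/newlines, so an ID cannot contain any.
        if not sid or any(ch.isspace() for ch in sid):
            raise ValueError(f"invalid string ID {sid!r}")
    Path(path).write_text("\n".join(ids) + "\n", encoding="utf-8")
```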

Example:

```
UPLOAD "ProjectAlpha" "data/covtype_small.csv" --dataset_name "NewDataset" --file_format csv --has_field_names true --vector_type dense --attributes [Cover_Type, Num_Type]
```
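The example above uploads a CSV file. For the binary format, the options suggest a raw file described only by --binary_dim and --binary_dtype. The sketch below writes such a file under stated assumptions (vectors stored back to back in row-major order, no header, little-endian float32); check these against a known-good file before relying on it. `write_binary_vectors` is hypothetical:

```python
import struct
from pathlib import Path
from typing import Sequence, Union


def write_binary_vectors(path: Union[str, Path], vectors: Sequence[Sequence[float]],
                         binary_dtype: str) -> int:
    """Write vectors back to back with no header and return their dimension.

    Assumptions: row-major layout, no header, little-endian byte order.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    codes = {"float32": "f", "uint8": "B"}  # struct codes: 4-byte float, unsigned byte
    if binary_dtype not in codes:
        raise ValueError("binary_dtype must be 'uint8' or 'float32'")
    fmt = f"<{dim}{codes[binary_dtype]}"
    with open(path, "wb") as f:
        for v in vectors:
            # For uint8, struct.pack raises struct.error unless each entry is an int in 0..255.
            f.write(struct.pack(fmt, *v))
    return dim
```

The returned dimension is the value to pass as --binary_dim, and `binary_dtype` is the value to pass as --binary_dtype.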

Additional Notes

Confirmation Prompts:
Deletion commands (DELETE_DATASET, DELETE_INDEX, DELETE_PROJECT) will prompt you to confirm the operation. Type y or Y to proceed.
Options Formatting:
Options are specified in the format --option_name value (or with quotes if needed). Lists are denoted using square brackets (e.g., [value1, value2, ...]).
Error Handling:
If required options are missing or if provided values (such as dimensions) are not in the expected format, the CLI will output an error message and abort the command.