Vector DB API
This document describes each endpoint of the VecML RESTful API, including:
- Endpoint URL
- Expected Request Body (JSON)
- Response Format (JSON)
- Relevant Details or Constraints
- Example Requests / Responses
All RESTful API calls to the VecML cloud database should be sent to: https://aidb.vecml.com/api. All endpoints require a JSON body unless otherwise noted.
Rate and request size limit
For resource allocation and server stability, the server enforces a rate limit on the number of API calls per second. If exceeded, the server responds with:
400 Bad Request
Request limit reached, please try again later
Please avoid high-frequency API calls. For example, when inserting multiple vectors, please use /add_data_batch instead of iteratively calling /add_data.
The size of each request cannot exceed 200 MB.
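For example, a minimal Python sketch of batching (assuming an existing project "RAG-example" and collection "embeddings", and using the same request pattern as the workflow example below) might look like:

import requests
import numpy as np

API_KEY = "replace_this_with_your_api_key"
BASE_URL = "https://aidb.vecml.com/api"

vectors = np.random.randn(10000, 128)   # 10k vectors to insert
BATCH_SIZE = 1000                        # keep each request well under the 200 MB limit

for start in range(0, len(vectors), BATCH_SIZE):
    chunk = vectors[start:start + BATCH_SIZE]
    payload = {
        "user_api_key": API_KEY,
        "project_name": "RAG-example",
        "collection_name": "embeddings",
        "string_ids": [f"ID_{i}" for i in range(start, start + len(chunk))],
        "data": chunk.tolist(),
    }
    # One batched call per chunk instead of thousands of /add_data calls
    resp = requests.post(f"{BASE_URL}/add_data_batch", json=payload)
    resp.raise_for_status()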
Managing Unique IDs for vectors
Vectors are identified by UNIQUE string IDs. While VecML provides a way to maintain auto-generated string IDs internally, we highly encourage users to maintain and specify their own unique IDs for the vectors, which makes later database operations more convenient. Multiple ways are available to specify the IDs; please check the data insertion endpoints for details.
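For illustration only (this is a suggestion, not a VecML requirement), one way to maintain your own unique IDs is to derive them deterministically from the record content, so that re-ingesting the same record produces the same ID:

import hashlib

def make_string_id(text: str) -> str:
    # Deterministic ID derived from the record content; truncated to keep IDs short.
    # Re-ingesting the same text yields the same ID, which helps with de-duplication.
    return "doc_" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

print(make_string_id("example text 0"))   # e.g., 'doc_<16 hex chars>'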
Authentication (API key):
VecML RESTful API requests are authenticated through the user's API key. An API key can be generated as follows:
- Go to https://account.vecml.com/user-api-keys and sign up for a free VecML account.
- After registration, you can get an API key by clicking "Create New API Key".
- This unique API key is shown only once, at creation time, so please store it safely. If you need to regenerate an API key, simply delete the previous one and then create a new key.
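Because the key is shown only once, a common practice (shown here as a suggestion, not a VecML requirement) is to keep it in an environment variable instead of hard-coding it; for example, using a hypothetical VECML_API_KEY variable:

import os
import requests

# Read the key from an environment variable instead of committing it to code.
API_KEY = os.environ["VECML_API_KEY"]

# Simple authenticated call: list the user's projects (see /fetch_projects below).
resp = requests.post("https://aidb.vecml.com/api/fetch_projects",
                     json={"user_api_key": API_KEY})
print(resp.status_code, resp.json())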
An Example Workflow
With the VecML API, managing a vector database and running queries is easy. Below is a standard workflow using the VecML RESTful API:
1. Call /create_project to create a new project.
2. Create a dataset (or collection, interchangeably) within the project. Two convenient options are:
   (i) Call /init to initialize/create a new dataset, then use /add_data_batch to (iteratively) add batches of vectors to the dataset.
   (ii) Upload a data file (supported types: csv, json, binary format, libsvm) as a dataset using /upload. You can iteratively upload files to a data collection.
3. Build an index for the dataset using /attach_index. Note: before indexing, please call /get_upload_data_status to confirm that the data insertion has finished. Otherwise, indexing an actively growing dataset may cause unexpected behavior.
4. Conduct approximate nearest neighbor search using /search. Note: before searching, please call /get_attach_index_status to confirm that the index construction has finished. Otherwise, search might not return full results.
The following example code demonstrates the workflow, in Python, JavaScript, and cURL.
import requests
import json
import numpy as np
import time
# Configuration
API_KEY = "replace_this_with_your_api_key"
BASE_URL = "https://aidb.vecml.com/api"
def make_request(endpoint, data): # helper function to make API calls
url = f"{BASE_URL}/{endpoint}"
response = requests.post(url, json=data)
print(f"Request to {endpoint}: HTTP {response.status_code}")
if response.text:
json_response = response.json()
return response.status_code, json_response
else:
return response.status_code, None
# 1. Create a project
project_data = {"user_api_key": API_KEY, "project_name": "RAG-example", "application": "search"}
status, response = make_request("create_project", project_data)
# 2. Initialize dataset
init_data = {"user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
"vector_type": "dense", "vector_dim": 128 }
status, response = make_request("init", init_data)
# 3. Generate and add 100 random vectors with /add_data_batch
vectors = np.random.randn(100, 128).tolist() # some random vectors
string_ids = ['ID_' + str(i) for i in range(100)] # dummy ids for each vector
attributes = [{"text": f"example text {i}"} for i in range(100)] # attributes of each vector, which can be arbitrary
batch_data = {"user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
"string_ids": string_ids, "data": vectors, "attributes": attributes }
status, response = make_request("add_data_batch", batch_data)
# 4. Attach index with L2 distance
index_data = {"user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
"index_name": "L2_index", "dist_type": "L2" }
status, response = make_request("attach_index", index_data)
index_job_id = response["job_id"]
print(f"Got index job ID: {index_job_id}")
# 5. Checking the indexing job status every 1 sec, query after the job is done
print("Waiting for indexing to complete...")
max_wait_time = 20
start_time = time.time()
while True:
status_data = {"user_api_key": API_KEY, "job_id": index_job_id}
status, status_response = make_request("get_attach_index_status", status_data)
if status_response.get("status") == "finished":
print("Indexing completed successfully")
break
if time.time() - start_time > max_wait_time:
print("Server busy - indexing timeout after 20 seconds")
exit(1)
print("Indexing in progress, checking again...")
time.sleep(1)
# 6. Search with a random query vector
query_vectors = np.random.randn(2, 128).tolist() # two random query vectors
search_data = {
"user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
"index_name": "L2_index", "query_type": "vectors",
"top_k": 5, "query_vectors": query_vectors
}
status, search_results = make_request("search", search_data)
print("Search results:")
for result in search_results["results"]:
matches = result["matches"]
query_id = result["query_vector_id"]
print("Query id: ", query_id)
for pair in matches:
print(f"ID: {pair['idx']}, L2 Distance: {pair['distance']}")
const fetch = require('node-fetch');
// Configuration
const API_KEY = "replace_this_with_your_api_key";
const BASE_URL = "https://aidb.vecml.com/api";
// Helper function to make API calls
async function makeRequest(endpoint, data) {
const url = `${BASE_URL}/${endpoint}`;
const response = await fetch(url, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(data)
});
console.log(`Request to ${endpoint}: HTTP ${response.status}`);
// If the response is not empty (204 = No Content), parse JSON
if (response.status !== 204) {
return [response.status, await response.json()];
}
return [response.status, null];
}
// Main function to run the workflow
async function runWorkflow() {
try {
// 1. Create a project
const projectData = { "user_api_key": API_KEY, "project_name": "RAG-example", "application": "search" };
let [status, response] = await makeRequest("create_project", projectData);
// 2. Initialize dataset
const initData = { "user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
"vector_type": "dense", "vector_dim": 128 };
[status, response] = await makeRequest("init", initData);
// 3. Generate and add 100 random vectors with /add_data_batch
// (uniform in [-1, 1], but that's okay for demonstration)
const vectors = Array(100).fill().map(() =>
Array(128).fill().map(() => (Math.random() * 2 - 1))
);
const stringIds = Array(100).fill().map((_, i) => `ID_${i}`);
const attributes = Array(100).fill().map((_, i) => ({"text": `example text ${i}`}));
const batchData = { "user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
"string_ids": stringIds, "data": vectors, "attributes": attributes };
[status, response] = await makeRequest("add_data_batch", batchData);
// 4. Attach index with L2 distance
const indexData = { "user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
"index_name": "L2_index", "dist_type": "L2" };
[status, response] = await makeRequest("attach_index", indexData);
const indexJobId = response.job_id;
console.log(`Got index job ID: ${indexJobId}`);
// 5. Checking the indexing job status every 1 sec, query after the job is done
console.log("Waiting for indexing to complete...");
const maxWaitTime = 20; // seconds
const startTime = Date.now();
while (true) {
const statusData = {
"user_api_key": API_KEY,
"job_id": indexJobId
};
[status, response] = await makeRequest("get_attach_index_status", statusData);
if (response && response.status === "finished") {
console.log("Indexing completed successfully");
break;
}
if ((Date.now() - startTime) / 1000 > maxWaitTime) {
console.log("Server busy - indexing timeout after 20 seconds");
process.exit(1);
}
console.log("Indexing in progress, checking again...");
await new Promise(resolve => setTimeout(resolve, 1000));
}
// 6. Search with a random query vector
    const queryVector = Array(128).fill().map(() => (Math.random() * 2 - 1));
    const searchData = { "user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
      "index_name": "L2_index", "query_type": "vectors", "top_k": 5, "query_vectors": [queryVector] };
    let [_, searchResults] = await makeRequest("search", searchData);
    console.log("Search results:");
    searchResults.results.forEach(result => {
      console.log(`Query id: ${result.query_vector_id}`);
      result.matches.forEach(pair => {
        console.log(`ID: ${pair.idx}, L2 Distance: ${pair.distance}`);
      });
    });
} catch (error) {
console.error("Error in workflow:", error);
}
}
runWorkflow();
#!/bin/bash
# Configuration
API_KEY="replace_this_with_your_api_key"
BASE_URL="https://aidb.vecml.com/api"
# Helper function to make requests
make_request() {
local endpoint=$1
local data=$2
echo "Request to $endpoint"
response=$(curl -s -X POST "$BASE_URL/$endpoint" \
-H "Content-Type: application/json" \
-d "$data")
echo "$response"
echo "$response" > last_response.json
}
# 1. Create a project
project_data='{ "user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "application": "search" }'
make_request "create_project" "$project_data"
# 2. Initialize dataset (8-dimensional for brevity)
init_data='{ "user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "collection_name": "embeddings",
"vector_type": "dense", "vector_dim": 8 }'
make_request "init" "$init_data"
############################################
# 3. Add vectors (3 vectors, each 8D).
# You can expand to more vectors if needed.
############################################
# Example of 3 distinct 8D vectors with placeholder floats.
# Make sure each vector has exactly vector_dim (here 8) elements!
batch_data='{
"user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "collection_name": "embeddings",
"string_ids": ["ID_0", "ID_1", "ID_2"],
"data": [
[0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80],
[-0.10, -0.20, -0.30, -0.40, -0.50, -0.60, -0.70, -0.80],
[0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.05, -0.05]
],
"attributes": [
{ "text": "example text 0" },
{ "text": "example text 1" },
{ "text": "example text 2" }
]
}'
make_request "add_data_batch" "$batch_data"
# 4. Attach index with L2 distance
index_data='{ "user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "collection_name": "embeddings",
"index_name": "L2_index", "dist_type": "L2" }'
make_request "attach_index" "$index_data"
# Extract job ID from the response
index_job_id=$(grep -o '"job_id":"[^"]*"' last_response.json | cut -d'"' -f4)
echo "Got index job ID: $index_job_id"
# 5. Check indexing status with a 20-second timeout
echo "Waiting for indexing to complete..."
start_time=$(date +%s)
max_wait_time=20
while true; do
status_data='{ "user_api_key": "'"$API_KEY"'", "job_id": "'"$index_job_id"'" }'
make_request "get_attach_index_status" "$status_data"
current_status=$(grep -o '"status":"[^"]*"' last_response.json | cut -d'"' -f4)
if [ "$current_status" = "finished" ]; then
echo "Indexing completed successfully"
break
fi
current_time=$(date +%s)
elapsed=$((current_time - start_time))
if [ $elapsed -gt $max_wait_time ]; then
echo "Server busy - indexing timeout after $max_wait_time seconds"
exit 1
fi
echo "Indexing in progress, checking again..."
sleep 1
done
# 6. Search with an 8D query vector
# (Here we just pick some arbitrary 8D vector.)
search_data='{ "user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "collection_name": "embeddings",
  "index_name": "L2_index", "query_type": "vectors", "top_k": 5,
  "query_vectors": [[0.12, 0.20, 0.33, -0.15, 0.40, 0.11, 0.05, 0.06]] }'
make_request "search" "$search_data"
# Parse and display search results from last_response.json
echo "Search results:"
# Extract all IDs and distances into arrays
ids=($(grep -o '"idx":"[^"]*"' last_response.json | cut -d'"' -f4))
distances=($(grep -o '"distance":[0-9.\-e]*' last_response.json | cut -d':' -f2))
# Process them using array indexing
for i in ${!ids[@]}; do
echo "ID: ${ids[$i]}, L2 Distance: ${distances[$i]}"
done
1. /add_data
Description
Attaches a vector and any attributes to the dataset under a specified string_id.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"string_id": "string",
"data": // For "dense" or bit-packed dense => array of floats or array of uint8
// For "sparse" => array of [ idx, val ] pairs
// Example: [0.1, 0.2, ...] or [[1, 2.0], [5, -1.2]]
"attributes": {
"key1": "value1",
"key2": "value2",
...
}
}
attributes is optional. If present, each key is an attribute name and its value is a string.
Response Success Response (200 OK):
{
"error_code": "Success"
}
Error Responses: 400 status code with different error messages.
Notes
- If the dataset is not yet initialized, this call attempts to initialize it (provided the dataset metadata is known).
- Fails if a dimension/type mismatch occurs.
Example
POST /add_data
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset",
"string_id": "Sample_123",
"data": [0.25, 0.75, 1.25, 2.00],
"attributes": {
"Label": "CatA",
"Timestamp": "2025-01-01 10:00:00"
}
}
Response:
{
"error_code": "Success"
}
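For illustration, a sparse vector is passed as [idx, val] pairs in the same data field, as described in the request body above. A minimal Python sketch (the collection name is a placeholder and is assumed to have been initialized with vector_type "sparse"):

import requests

payload = {
    "user_api_key": "api_key_123",
    "project_name": "ProjectA",
    "collection_name": "MySparseDataset",   # assumed vector_type "sparse"
    "string_id": "Sparse_001",
    # Sparse vectors are given as [index, value] pairs.
    "data": [[1, 2.0], [5, -1.2], [42, 0.7]],
    "attributes": {"Label": "CatB"},
}
resp = requests.post("https://aidb.vecml.com/api/add_data", json=payload)
print(resp.json())   # {"error_code": "Success"} on success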
2. /add_data_batch
Description
Inserts many vectors at once; a batched version of /add_data. Expects arrays of string_ids, data, and attributes with the same length.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"string_ids": [
"id_1",
"id_2",
...
],
"data": [
// for each index i, this is a dense array or sparse pairs
[...],
...
],
"attributes": [
// array of objects, parallel to data
{"attr1": "val1", ...},
{"attr2": "val2", ...},
...
]
}
- "attributes"
is optional. If present, each key is an attribute name and its value is a string.
- All arrays must have the same length. The
i
-th entry instring_id
matches thei
-th entry indata
andattributes
.
Response
{
"success": true,
"job_id": "string",
"checksum_server": "string",
"error_message": "none"
}
The checksum_server is computed for the (dumped) data field.
Example
POST /add_data_batch
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "BatchDS",
"string_ids": ["row1", "row2"],
"data": [
[0.25, 0.75],
[0.6, 0.3]
],
"attributes": [
{"date": "A"},
{"category": "B"}
]
}
3. /attach_index
Description
Creates an ANN (Approximate Nearest Neighbor) index for the given dataset. The indexing job runs asynchronously; you receive a job_id for tracking progress via /get_attach_index_status. Once attached, whenever new data are added to or removed from the data collection, the index will be updated automatically.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"index_name": "string", // must be a valid name, <=128 chars, no invalid symbols
"dist_type": "string" // e.g., "L2", "Cosine", "Inner Product", "Hamming"
"large_index" : bool
}
large_index: (bool, optional) whether to use an advanced approach to optimize performance for a very large index. For large indices, we provide techniques to balance query efficiency, accuracy, and memory usage. It is highly recommended to set large_index to true if your data collection has more than ~2M vectors.
Response
{
"job_id": "string",
"success": true
}
job_id can be passed to /get_attach_index_status to see if the index build is finished.
Example
POST /attach_index
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset",
"index_name": "index1",
"dist_type": "Cosine"
}
Response:
{
"job_id": "api_key_123_ProjectA_MyDataset_index1_AttachIndexJob",
"success": true
}
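For a very large collection, the same call can enable the large_index option and then poll the job status. A minimal Python sketch (project, collection, and index names are placeholders):

import time
import requests

BASE_URL = "https://aidb.vecml.com/api"
payload = {
    "user_api_key": "api_key_123",
    "project_name": "ProjectA",
    "collection_name": "BigDataset",   # e.g., a collection with more than ~2M vectors
    "index_name": "big_index",
    "dist_type": "Cosine",
    "large_index": True,               # enable the large-index optimization
}
job_id = requests.post(f"{BASE_URL}/attach_index", json=payload).json()["job_id"]

# Poll until the asynchronous indexing job finishes or fails.
while True:
    status = requests.post(f"{BASE_URL}/get_attach_index_status",
                           json={"user_api_key": "api_key_123", "job_id": job_id}).json()
    if status.get("status") in ("finished", "failed"):
        print("Index build:", status["status"])
        break
    time.sleep(1)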
4. /create_project
Description Creates a new project for a given user.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"application": "string" // e.g., "autoML", "both", or any descriptive text
}
- The project_name must be unique and valid.
- application is an internal field describing how the project will be used; select from search, autoML, or both. Currently, only the search functionality is available through the RESTful API.
Response
{
"success": true
}
Example
POST /create_project
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"application": "search"
}
Response:
{
"success": true
}
5. /delete_dataset
Description Deletes an existing dataset from a project. This action cannot be undone.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"dataset_name": "string"
}
Response
{
"success": true,
"error_code": "Success"
}
Example
POST /delete_dataset
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"dataset_name": "ObsoleteData"
}
6. /delete_index
Description Deletes a specific index from the dataset. This action cannot be undone.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"dataset_name": "string",
"index_name": "string"
}
Response
{
"success": true,
"error_code": "Success"
}
Example
POST /delete_index
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"dataset_name": "MyDataset",
"index_name": "index1"
}
7. /delete_project
Description Deletes an existing project and all of its datasets. This action cannot be undone.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string"
}
Response
{
"success": true,
"error_code": "Success" // or another code if failure
}
Example
POST /delete_project
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA"
}
8. /fetch_datasets
Description Fetches metadata for all datasets within a given project.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string"
}
Response
{
"success": true,
"datasets": [
{
"name": "...",
"vector_type": "...",
"num_vectors": 1000,
"vector_dim": 128,
"bytes": 1234567,
"create_time": "2025/01/02, 15:30",
...
},
...
]
}
Example
POST /fetch_datasets
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA"
}
9. /fetch_projects
Description Returns a list of all project(s) belonging to a user.
Method - POST
Request Body
{
"user_api_key": "string"
}
Response
{
"success": true,
"projects": [
{
"project_name": "...",
"application": "...",
...
},
...
]
}
Example
POST /fetch_projects
Content-Type: application/json
{
"user_api_key": "api_key_123"
}
Response:
{
"success": true,
"projects": [
{
"project_name": "ProjectA",
"application": "search"
},
{
"project_name": "ProjectB",
"application": "autoML"
}
]
}
10. /get_attach_index_status
Description Returns the current status of an index-building job.
Method - POST
Request Body
{
"user_api_key": "string",
"job_id": "string"
}
Response
{
"success": true,
"status": "pending" | "in_progress" | "finished" | "failed",
"error": "Indexing job failed" // (optional if status == "failed")
}
Example
POST /get_attach_index_status
Content-Type: application/json
{
"user_api_key": "api_key_123",
"job_id": "my_job_id"
}
Response:
{
"success": true,
"status": "in_progress"
}
11. /get_upload_data_status
Description
Checks the status of a data-upload job initiated by /upload or /add_data_batch.
Method - POST
Request Body
{
"user_api_key": "string",
"job_id": "string"
}
Response
{
"success": true | false,
"status": "pending" | "in_progress" | "finished" | "failed",
"error_message": "..."
}
Example
POST /get_upload_data_status
Content-Type: application/json
{
"user_api_key": "api_key_123",
"job_id": "my_job_id"
}
Response:
For a data insertion job initiated by /upload:
{
"success": true,
"status": "in_progress",
"error_message": ""
}
For a data insertion job initiated by /add_data_batch:
If all vectors are added to the data collection successfully:
{
"success": true,
"status": "in_progress",
"error_message": ""
}
Otherwise (for example, if two vectors failed to be inserted):
{
"success": true,
"status": "in_progress",
"error_message": "The following vectors failed to be inserted: [
{"body":"{"error_code":"VectorAlreadyExists"}","string_id":"ID_0"}
{"body":"{"error_code":"VectorAlreadyExists"}","string_id":"ID_1"}
]"
}
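As noted in the workflow section, it is safest to wait for the insertion job to finish before attaching an index. A small Python polling helper (a sketch mirroring the index-status loop in the workflow example) might look like:

import time
import requests

BASE_URL = "https://aidb.vecml.com/api"

def wait_for_upload(api_key: str, job_id: str, timeout_s: float = 60.0) -> bool:
    """Poll /get_upload_data_status until the job finishes, fails, or times out."""
    start = time.time()
    while time.time() - start < timeout_s:
        resp = requests.post(f"{BASE_URL}/get_upload_data_status",
                             json={"user_api_key": api_key, "job_id": job_id}).json()
        if resp.get("status") == "finished":
            return True
        if resp.get("status") == "failed":
            print("Insertion failed:", resp.get("error_message"))
            return False
        time.sleep(1)
    return False   # timed out; the job may still be running server-side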
12. /init
Description Initializes (or re-initializes) an empty dataset in the user's project. If the dataset does not exist, it is created. If the dataset already exists and the specified parameters match the existing configuration, it re-initializes that dataset in memory.
Methods - POST
Request Body Parameters
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"vector_type": "string", // "dense" or "sparse" (no other values supported)
"vector_dim": int // Required if vector_type != "sparse"; must be > 0
}
- user_api_key: The user's API key.
- project_name: The name of the project to which the dataset belongs. Must already exist.
- collection_name: The name of the dataset/collection to initialize.
- vector_type: "dense", "dense8Bit", "dense4Bit", "dense2Bit", "dense1Bit", or "sparse".
- vector_dim: Positive integer dimension for "dense", "dense8Bit", "dense4Bit", "dense2Bit", or "dense1Bit". Not used for "sparse".
Response Success Response (200 OK):
{
"error_code": "Success"
}
Error Responses: 400 status code with different error messages.
Example
POST /init
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset",
"vector_type": "dense",
"vector_dim": 100
}
Response:
{
"error_code": "Success"
}
or a 400 status code with an error message, for example:
"The dataset already exists, but the input vector dim does not match the existing vector dim."
13. /list_all_items
Description Retrieves a complete hierarchical structure of all projects, datasets, and indices for a user. Returns organized data showing the relationship between projects, their datasets, and associated indices.
Method - POST
Request Body
{
"user_api_key": "string"
}
Request Fields
- user_api_key
: Valid API key for authentication
Response Success Response (200 OK):
{
"success": true,
"projects": [
{
"project_name": "string",
"project_id": "string",
"application": "string",
"datasets": [
{
"name": "string",
"type": "string",
"size": 1024,
"create_time": "string",
"last_insert_time": "string",
"dimension": "(1000, 128)",
"indices": [
{
"name": "string",
"distance_type": "string",
"type": "string"
}
]
}
]
}
]
}
Response Fields
- success: Boolean indicating operation success
- projects: Array of project objects containing:
  - project_name: Name of the project
  - project_id: Internal ID of the project
  - application: Application type or category
  - datasets: Array of dataset objects containing:
    - name: Dataset name
    - type: Vector type (e.g., "float32", "int8")
    - size: Dataset size in bytes
    - create_time: When the dataset was created
    - last_insert_time: Last time data was inserted
    - dimension: Format "(num_vectors, vector_dimension)"
    - indices: Array of index objects containing:
      - name: Index name
      - distance_type: Distance metric used (e.g., "cosine", "euclidean")
Notes
- Returns complete hierarchy in a single request
- Useful for dashboard views and project overview
- Dataset size is returned in bytes; convert to KB/MB/GB as needed
- Empty arrays are returned for projects/datasets with no children
- Rate limiting applies to this endpoint
Example
POST /list_all_items
Content-Type: application/json
{
"user_api_key": "api_key_123"
}
Response:
{
"success": true,
"projects": [
{
"project_name": "RecommendationEngine",
"project_id": "proj_456",
"application": "Search",
"datasets": [
{
"name": "user_embeddings",
"type": "float32",
"size": 2048576,
"create_time": "2025-01-15 10:30:00",
"last_insert_time": "2025-01-20 14:45:00",
"dimension": "(10000, 128)",
"indices": [
{
"name": "user_similarity_index",
"distance_type": "cosine",
}
]
},
{
"name": "item_features",
"type": "float32",
"size": 1572864,
"create_time": "2025-01-15 11:00:00",
"last_insert_time": "2025-01-19 09:30:00",
"dimension": "(5000, 256)",
"indices": []
}
]
}
]
}
14. /list_index_infos
Description Lists all indexes that have been attached to a particular dataset.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"dataset_name": "string"
}
Response
{
"success": true,
"index_infos": [
[ "index_name", "index_type", "distance_type" ],
...
]
}
Example
POST /list_index_infos
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"dataset_name": "MyDataset"
}
Response:
{
"success": true,
"index_infos": [
["index1", "Cosine"]
]
}
15. /remove_data
Description Removes vectors from the dataset and its indices based on a list of string IDs.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"string_ids": ["id1", "id2"]
}
Response
Success Response (200 OK):
{
"error_code": "Success"
}
Error Responses: 400 status code with detailed error messages.
Example
POST /remove_data
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset",
"string_ids": ["Sample_1", "Sample_2"]
}
16. /search
Description Performs a nearest-neighbor vector search using a built index, with optional filter conditions. This endpoint supports:
- Searching by providing a list of raw query vectors (dense or sparse).
- Searching by referencing an existing row in the dataset (a row_number).
- Searching via a file containing multiple query vectors. Please make sure your file has the same format (vector dimension, attributes) as the database dataset.
- Searching an entire dataset (query_dataset_name) for batch queries.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"query_type": "vectors" | "row" | "file" | "dataset",
"index_name": "string",
"top_k": int,
"filter": {
// optional filter structure, see below
},
// If query_type == "vectors":
"query_vectors": [[0.1, -0.2, 0.3], [0, 1, 10]], // dense-type vectors
[[0:0.1, 4:5.2], [2:0, 9:0.3, 20:0.4]] // sparse vectors
// If query_type == "row":
"row_number": 42,
// If query_type == "file":
"file_data": "base64-encoded data",
"file_name": "string",
"file_format": "json" | "csv" | "libsvm" | "binary",
// server uses dataset config to parse (field names, etc.)
// If query_type == "dataset":
"query_dataset_name": "string"
}
Filter Structure
"filter": {
"type": "equality" | "range",
"attribute": "attrName",
// equality case:
"target_values": ["someValue1", "someValue2", ...]
// or range case:
"start_value": "A",
"end_value": "Z"
}
- If filter is null or incomplete, no filtering is applied.
- An equality filter returns items whose attribute is in the set of target values.
- A range filter returns items whose attribute is between start_value and end_value (inclusive). For strings, lexical ordering is used. For numeric attributes, the system uses numeric comparison if both bounds are parseable as numbers. (See the sketch below.)
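To make the filter structure concrete, here is a minimal Python sketch of a range-filtered search (the attribute name and values are placeholders, reusing the Timestamp attribute from the /add_data example):

import requests

search_payload = {
    "user_api_key": "api_key_123",
    "project_name": "ProjectA",
    "collection_name": "MyDataset",
    "index_name": "index1",
    "query_type": "vectors",
    "top_k": 5,
    "query_vectors": [[0.25, 0.75, 1.25, 2.00]],
    # Range filter: keep only matches whose "Timestamp" attribute falls
    # between the two bounds (inclusive; lexical ordering for strings).
    "filter": {
        "type": "range",
        "attribute": "Timestamp",
        "start_value": "2025-01-01 00:00:00",
        "end_value": "2025-01-31 23:59:59",
    },
}
results = requests.post("https://aidb.vecml.com/api/search", json=search_payload).json()
for hit in results["results"]:
    print(hit["query_vector_id"], hit["matches"])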
Response
- For query_type: "row"
:
{
"error_code": "Success",
"results": [
{"idx": "string_id_of_match", "distance": floatValue},
...
]
}
- For query_type: "vectors"
or "dataset"
or "file"
:
{
"error_code": "Success",
"results": [
{
"query_vector_id": "...", // Typically the row index or custom ID from the file
"matches": [
{"idx": "string_id_of_match", "distance": floatValue},
...
]
},
...
]
}
Examples
- Query with a direct vector
POST /search
Content-Type: application/json
{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "MyDataset",
  "query_type": "vectors",
  "index_name": "index1",
  "top_k": 5,
  "query_vectors": [[0.25, 0.75, 1.25, 2.00]], // use one vector in the example
  "filter": { "type": "equality", "attribute": "Category", "target_values": ["CatA","CatC"] }
}
Response:
{
  "error_code": "Success",
  "results": [
    {
      "query_vector_id": "0",
      "matches": [
        {"idx": "Sample_01", "distance": 0.3},
        {"idx": "Sample_07", "distance": 0.8},
        ...
      ]
    }
  ]
}
- Query referencing an existing row
{ "query_type": "row", "row_number": 123, ... }
- Batch file-based query
{ "query_type": "file", "file_data": "base64encodedFile", "file_name": "Queries.json", "file_format": "json", ... }
- Dataset-to-dataset query
{ "query_type": "dataset", "query_dataset_name": "QueryVectorsDS", ... }
17. /upload
Description
Uploads a file containing dataset vectors (CSV, JSON, libsvm, binary format, etc.) and creates a new dataset with those vectors. The upload runs asynchronously; you receive a job_id to query via /get_upload_data_status.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"dataset_name": "string", // The new dataset's name
"file_data": "string", // base64-encoded data (optionally gzip compressed)
"file_format": "string", // "csv", "json", "libsvm", "binary"
"has_field_names": true | false,
"vector_type": "dense" | "dense1Bit" | "dense2Bit" | "dense4Bit" | "dense8Bit" | "sparse",
// If JSON with field names:
"vector_data_field_name": "string",
"vector_id_field_name": "string",
"vector_attributes": ["attr1", "attr2", ...],
// If "binary" file_format:
"binary_dtype": "uint8" | "float32", // for binary format data, you can set "vector_type" to be dense. The actuall dtype uses "binary_dtype"
"binary_dim": 123,
// Optional string ID txt file
"string_id_txt": "string" // base64-encoded external string ID .txt file, optional
}
Required:
- project_name: The project that the uploaded dataset belongs to.
- dataset_name: The name of the data collection in VecML DB.
- file_data: The Base64-encoded raw data file.
- file_format: "csv", "json", "libsvm" or "binary".
- vector_type: the type of the vector. Supported types:
  - dense: the standard float32 dense vector. For example, [0, 1.2, 2.4, -10, 5.7]. Standard embedding vectors from language or vision models can be saved as this type.
  - dense8Bit: uint8 dense vectors, with integer vector elements ranging in [0, 255]. For example, [0, 3, 76, 255, 152]. 8-bit quantized embedding vectors can be saved as this type to save storage.
  - dense4Bit: 4-bit quantized dense vectors, with integer vector elements ranging in [0, 15].
  - dense2Bit: 2-bit quantized dense vectors, with integer vector elements ranging in [0, 3].
  - dense1Bit: 1-bit quantized dense vectors, with binary vector elements.
  - sparse: sparse vectors formatted as a set of index:value pairs. Please use this for the libsvm file format. This is useful for high-dimensional sparse vectors.
Conditional:
- For csv and json format data:
  - has_field_names (required): whether the data file contains column headers (csv) or field names (json).
  - vector_data_field_name (if file_format == json and has_field_names == true): the json field that contains the vector data.
  - vector_attributes (optional, csv and json only): [attr1, attr2, ..., attrN], a list of attribute columns/fields associated with the file's vectors.
  - vector_id_field_name (optional, csv and json only): specifies the column/field that should be used as the unique vector IDs of the vectors.
Optional:
- string_id_txt: A Base64-encoded extra whitespace- or newline-separated .txt file that specifies the UNIQUE string IDs of the vectors. The number of string IDs should equal the number of vectors in the data file. If string_id_txt is provided, it overrides all other ID sources and becomes the exclusive source of unique IDs.
Auto-generation of string ID and column names (for CSV and JSON)
For libsvm and binary data, we automatically generate 0-based IDs for the vectors, e.g., "0", "1", ... For csv and json, if has_field_names == False or vector_id_field_name is not provided, 0-based IDs are also auto-generated.
Moreover, for csv files, if has_field_names == False, we still allow the user to specify the ID and attribute columns by column number (1-based), such as "vector_data_field_name": "column 1", "vector_attributes": ["column 59", "column 60"]. The column name must strictly follow the format "column XX".
Response
{
"success": true,
"job_id": "string",
"checksum_server": "string",
"error_message": "none"
}
The checksum_server is computed for the file_data field.
Example
POST /upload
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"dataset_name": "MyDataset",
"file_data": "BASE64_ENCODED_DATA==",
"file_format": "csv",
"has_field_names": true,
"vector_type": "dense",
"vector_id_field_name": "unique_ID",
"vector_attributes": ["date", "region", "category"]
}
Response:
{
"success": true,
"job_id": "api_key_123_ProjectA_MyDataset_UploadDataJob",
"error_message": "none"
}
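To show how file_data is typically prepared, below is a minimal Python sketch (the file name and field names are placeholders matching the example above) that base64-encodes a local CSV, submits it to /upload, and polls /get_upload_data_status:

import base64
import time
import requests

BASE_URL = "https://aidb.vecml.com/api"
API_KEY = "replace_this_with_your_api_key"

# Read and base64-encode the raw file bytes.
with open("embeddings.csv", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "user_api_key": API_KEY,
    "project_name": "ProjectA",
    "dataset_name": "MyDataset",
    "file_data": file_b64,
    "file_format": "csv",
    "has_field_names": True,
    "vector_type": "dense",
    "vector_id_field_name": "unique_ID",
    "vector_attributes": ["date", "region", "category"],
}
job_id = requests.post(f"{BASE_URL}/upload", json=payload).json()["job_id"]

# The upload runs asynchronously; poll until it finishes or fails.
while True:
    status = requests.post(f"{BASE_URL}/get_upload_data_status",
                           json={"user_api_key": API_KEY, "job_id": job_id}).json()
    if status.get("status") in ("finished", "failed"):
        print("Upload:", status.get("status"), status.get("error_message"))
        break
    time.sleep(1)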