Vector DB API

This document describes each endpoint of VecML RESTful API, including:

  • Endpoint URL
  • Expected Request Body (JSON)
  • Response Format (JSON)
  • Relevant Details or Constraints
  • Example Requests / Responses

All RESTful API calls to the VecML cloud database should be sent to https://aidb.vecml.com/api. All endpoints require a JSON body unless otherwise noted.

Rate and request size limit

For resource allocation and server stability, the server enforces a rate limit on the number of API calls per second. If exceeded, the server responds with:

400 Bad Request
Request limit reached, please try again later

Please avoid high-frequency API calls. For example, when inserting multiple vectors, use /add_data_batch instead of calling /add_data repeatedly.

Each request body cannot exceed 200 MB.
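
If your application may hit the rate limit, a simple client-side retry with exponential backoff is usually enough. Below is a minimal Python sketch (the endpoint name and payload are placeholders; only the documented 400 status and rate-limit message are assumed):

import time
import requests

BASE_URL = "https://aidb.vecml.com/api"

def post_with_retry(endpoint, payload, max_retries=5):
    # POST to the API; back off and retry when the rate limit is hit.
    delay = 1.0
    for attempt in range(max_retries):
        response = requests.post(f"{BASE_URL}/{endpoint}", json=payload)
        # A 400 response whose body mentions the request limit indicates rate limiting.
        if response.status_code == 400 and "Request limit" in response.text:
            time.sleep(delay)
            delay *= 2  # exponential backoff: 1s, 2s, 4s, ...
            continue
        return response
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")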

Managing Unique IDs for vectors

Vectors are identified by UNIQUE string IDs. While VecML can maintain auto-generated string IDs internally, we strongly encourage users to maintain and specify their own unique IDs for the vectors, as this makes subsequent database operations more convenient. Several ways to specify IDs are available; see the data-insertion endpoints for details.
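
One convenient convention (not required by the API) is to derive each vector's string ID deterministically from its source content, so re-inserting the same record always maps to the same ID. A minimal sketch:

import hashlib

def make_string_id(source_text: str) -> str:
    # Derive a stable string ID from the record's content.
    return "doc-" + hashlib.sha256(source_text.encode("utf-8")).hexdigest()[:16]

print(make_string_id("example text 0"))   # the same text always yields the same ID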


Authentication (API key):

VecML RESTful API requests are authenticated with the user's API key. An API key can be generated as follows:

  1. Go to https://account.vecml.com/user-api-keys and sign up for a free VecML account.
  2. After registration, create an API key by clicking "Create New API Key".
  3. The API key is shown only once, at creation time, so store it securely. If you need to regenerate an API key, simply delete the previous one and then create a new key.
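
Because the key is displayed only once, avoid hard-coding it in scripts. A common pattern is to read it from an environment variable; the variable name VECML_API_KEY below is just an example:

import os

# Fails fast with a KeyError if the key is not set in the environment.
API_KEY = os.environ["VECML_API_KEY"]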

An Example Workflow

With the VecML API, managing a vector database and running searches is straightforward. Below is a standard workflow using the VecML RESTful API:

  1. Call /create_project to create a new project.

  2. Create a dataset (also called a collection; the two terms are used interchangeably) within the project. Two convenient options are:

    (i) Call /init to initialize/create a new dataset, then use /add_data_batch to (iteratively) add batches of vectors to the dataset.

    (ii) Upload a data file (supported formats: csv, json, binary, libsvm) as a dataset using /upload. You can upload files to a data collection iteratively.

  3. Build an index for the dataset using /attach_index. Note: Before indexing, please call /get_upload_data_status to confirm that data insertion has finished. Otherwise, indexing a dataset that is still receiving data may cause unexpected behavior.

  4. Conduct approximate nearest neighbor search using /search. Note: Before searching, please call /get_attach_index_status to confirm that index construction has finished. Otherwise, the search might not return complete results.

The following example code demonstrates the workflow in Python, JavaScript, and Bash (using cURL).

import requests
import numpy as np
import sys
import time

# Configuration
API_KEY = "replace_this_with_your_api_key"
BASE_URL = "https://aidb.vecml.com/api"

def make_request(endpoint, data):            # helper function to make API calls
    url = f"{BASE_URL}/{endpoint}"
    response = requests.post(url, json=data)
    print(f"Request to {endpoint}: HTTP {response.status_code}")
    if response.text:
        json_response = response.json()
        return response.status_code, json_response
    else:
        return response.status_code, None

# 1. Create a project
project_data = {"user_api_key": API_KEY, "project_name": "RAG-example", "application": "search"}
status, response = make_request("create_project", project_data)

# 2. Initialize dataset
init_data = {"user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
             "vector_type": "dense", "vector_dim": 128 }
status, response = make_request("init", init_data)

# 3. Generate and add 100 random vectors with /add_data_batch
vectors = np.random.randn(100, 128).tolist()                             # some random vectors
string_ids = ['ID_' + str(i) for i in range(100)]                        # dummy ids for each vector
attributes = [{"text": f"example text {i}"} for i in range(100)]         # attributes of each vector, which can be arbitrary

batch_data = {"user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
             "string_ids": string_ids, "data": vectors, "attributes": attributes }
status, response = make_request("add_data_batch", batch_data)

# 4. Attach index with L2 distance
index_data = {"user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
              "index_name": "L2_index", "dist_type": "L2" }
status, response = make_request("attach_index", index_data)
index_job_id = response["job_id"]
print(f"Got index job ID: {index_job_id}")

# 5. Checking the indexing job status every 1 sec, query after the job is done
print("Waiting for indexing to complete...")
max_wait_time = 20
start_time = time.time()
while True:
    status_data = {"user_api_key": API_KEY, "job_id": index_job_id}
    status, status_response = make_request("get_attach_index_status", status_data)

    if status_response.get("status") == "finished":
        print("Indexing completed successfully")
        break

    if time.time() - start_time > max_wait_time:
        print("Server busy - indexing timeout after 20 seconds")
        sys.exit(1)

    print("Indexing in progress, checking again...")
    time.sleep(1)

# 6. Search with a random query vector
query_vectors = np.random.randn(2, 128).tolist()     # two random query vectors
search_data = {
    "user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
    "index_name": "L2_index", "query_type": "vectors",
    "top_k": 5, "query_vectors": query_vectors
}
status, search_results = make_request("search", search_data)

print("Search results:")
for result in search_results["results"]:
    matches = result["matches"]
    query_id = result["query_vector_id"]

    print("Query id: ", query_id)
    for pair in matches:
      print(f"ID: {pair['idx']}, L2 Distance: {pair['distance']}")

const fetch = require('node-fetch');

// Configuration
const API_KEY = "replace_this_with_your_api_key";
const BASE_URL = "https://aidb.vecml.com/api";

// Helper function to make API calls
async function makeRequest(endpoint, data) {
    const url = `${BASE_URL}/${endpoint}`;
    const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(data)
    });
    console.log(`Request to ${endpoint}: HTTP ${response.status}`);
    // If the response is not empty (204 = No Content), parse JSON
    if (response.status !== 204) {
        return [response.status, await response.json()];
    }
    return [response.status, null];
}

// Main function to run the workflow
async function runWorkflow() {
    try {
        // 1. Create a project
        const projectData = { "user_api_key": API_KEY, "project_name": "RAG-example", "application": "search" };
        let [status, response] = await makeRequest("create_project", projectData);

        // 2. Initialize dataset
        const initData = { "user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
                           "vector_type": "dense", "vector_dim": 128 };
        [status, response] = await makeRequest("init", initData);

        // 3. Generate and add 100 random vectors with /add_data_batch
        //    (uniform in [-1, 1], but that's okay for demonstration)
        const vectors = Array(100).fill().map(() =>
            Array(128).fill().map(() => (Math.random() * 2 - 1))
        );
        const stringIds = Array(100).fill().map((_, i) => `ID_${i}`);
        const attributes = Array(100).fill().map((_, i) => ({"text": `example text ${i}`}));

        const batchData = { "user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
                            "string_ids": stringIds, "data": vectors, "attributes": attributes };
        [status, response] = await makeRequest("add_data_batch", batchData);

        // 4. Attach index with L2 distance
        const indexData = { "user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
                            "index_name": "L2_index", "dist_type": "L2" };
        [status, response] = await makeRequest("attach_index", indexData);
        const indexJobId = response.job_id;
        console.log(`Got index job ID: ${indexJobId}`);

        // 5. Checking the indexing job status every 1 sec, query after the job is done
        console.log("Waiting for indexing to complete...");
        const maxWaitTime = 20; // seconds
        const startTime = Date.now();
        while (true) {
            const statusData = {
                "user_api_key": API_KEY,
                "job_id": indexJobId
            };
            [status, response] = await makeRequest("get_attach_index_status", statusData);

            if (response && response.status === "finished") {
                console.log("Indexing completed successfully");
                break;
            }
            if ((Date.now() - startTime) / 1000 > maxWaitTime) {
                console.log("Server busy - indexing timeout after 20 seconds");
                process.exit(1);
            }
            console.log("Indexing in progress, checking again...");
            await new Promise(resolve => setTimeout(resolve, 1000));
        }

        // 6. Search with a random query vector
        const queryVectors = [Array(128).fill().map(() => (Math.random() * 2 - 1))];
        const searchData = { "user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
                             "index_name": "L2_index", "query_type": "vectors", "top_k": 5, "query_vectors": queryVectors };
        let [_, searchResults] = await makeRequest("search", searchData);

        console.log("Search results:");
        searchResults.results.forEach(result => {
            console.log(`Query id: ${result.query_vector_id}`);
            result.matches.forEach(pair => {
                console.log(`ID: ${pair.idx}, L2 Distance: ${pair.distance}`);
            });
        });
    } catch (error) {
        console.error("Error in workflow:", error);
    }
}

runWorkflow();

#!/bin/bash

# Configuration
API_KEY="replace_this_with_your_api_key"
BASE_URL="https://aidb.vecml.com/api"

# Helper function to make requests
make_request() {
    local endpoint=$1
    local data=$2
    echo "Request to $endpoint"
    response=$(curl -s -X POST "$BASE_URL/$endpoint" \
        -H "Content-Type: application/json" \
        -d "$data")
    echo "$response"
    echo "$response" > last_response.json
}

# 1. Create a project
project_data='{ "user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "application": "search" }'
make_request "create_project" "$project_data"

# 2. Initialize dataset (8-dimensional)
init_data='{ "user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "collection_name": "embeddings",
             "vector_type": "dense", "vector_dim": 8 }'
make_request "init" "$init_data"

############################################
# 3. Add vectors (3 vectors, each 8-dimensional).
#    You can expand to more vectors if needed.
############################################

# Example of 3 distinct 8-dimensional vectors with placeholder floats.
# Make sure each vector has exactly vector_dim (here 8) elements!

batch_data='{
  "user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "collection_name": "embeddings",
  "string_ids": ["ID_0", "ID_1", "ID_2"],
  "data": [
    [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80],
    [-0.10, -0.20, -0.30, -0.40, -0.50, -0.60, -0.70, -0.80],
    [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.05, -0.05]
  ],
  "attributes": [
    { "text": "example text 0" },
    { "text": "example text 1" },
    { "text": "example text 2" }
  ]
}'
make_request "add_data_batch" "$batch_data"

# 4. Attach index with L2 distance
index_data='{ "user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "collection_name": "embeddings",
              "index_name": "L2_index", "dist_type": "L2" }'
make_request "attach_index" "$index_data"

# Extract job ID from the response
index_job_id=$(grep -o '"job_id":"[^"]*"' last_response.json | cut -d'"' -f4)
echo "Got index job ID: $index_job_id"

# 5. Check indexing status with a 20-second timeout
echo "Waiting for indexing to complete..."
start_time=$(date +%s)
max_wait_time=20
while true; do
    status_data='{ "user_api_key": "'"$API_KEY"'", "job_id": "'"$index_job_id"'" }'
    make_request "get_attach_index_status" "$status_data"
    current_status=$(grep -o '"status":"[^"]*"' last_response.json | cut -d'"' -f4)
    if [ "$current_status" = "finished" ]; then
        echo "Indexing completed successfully"
        break
    fi
    current_time=$(date +%s)
    elapsed=$((current_time - start_time))
    if [ $elapsed -gt $max_wait_time ]; then
        echo "Server busy - indexing timeout after $max_wait_time seconds"
        exit 1
    fi
    echo "Indexing in progress, checking again..."
    sleep 1
done

# 6. Search with an 8-dimensional query vector
# (Here we just pick some arbitrary vector.)
search_data='{ "user_api_key": "'"$API_KEY"'", "project_name": "RAG-example", "collection_name": "embeddings",
               "index_name": "L2_index", "query_type": "vectors", "top_k": 5,
               "query_vectors": [[0.12, 0.20, 0.33, -0.15, 0.40, 0.11, 0.05, 0.06]] }'
make_request "search" "$search_data"

# Parse and display search results from last_response.json
echo "Search results:"
# Extract all IDs and distances into arrays
ids=($(grep -o '"idx":"[^"]*"' last_response.json | cut -d'"' -f4))
distances=($(grep -o '"distance":[0-9.\-e]*' last_response.json | cut -d':' -f2))

# Process them using array indexing
for i in ${!ids[@]}; do
    echo "ID: ${ids[$i]}, L2 Distance: ${distances[$i]}"
done


1. /add_data

Description Adds a single vector (with optional attributes) to the dataset under a specified string_id.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "string_id": "string",
  "data":   // For "dense" or bit-packed dense => array of floats or array of uint8
           // For "sparse" => array of [ idx, val ] pairs
           // Example: [0.1, 0.2, ...] or [[1, 2.0], [5, -1.2]]
  "attributes": {
    "key1": "value1",
    "key2": "value2",
    ...
  }
}

  • attributes is optional. If present, each key is an attribute name and its value is a string.

Response Success Response (200 OK):

{
  "error_code": "Success"
}
Error Responses: 400 status code with different error messages.

Notes

  • If the dataset is not yet initialized, this call attempts to initialize it (provided the dataset metadata is known).

  • Fails if dimension/type mismatch occurs.

Example

POST /add_data
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "MyDataset",
  "string_id": "Sample_123",
  "data": [0.25, 0.75, 1.25, 2.00],
  "attributes": {
    "Label": "CatA",
    "Timestamp": "2025-01-01 10:00:00"
  }
}
Response:
{
  "error_code": "Success"
}


2. /add_data_batch

Description Inserts many vectors at once; a batched version of /add_data. Expects parallel arrays string_ids, data, and attributes of the same length.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "string_ids": [
    "id_1",
    "id_2",
    ...
  ],
  "data": [
    // for each index i, this is a dense array or sparse pairs
    [...],
    ...
  ],
  "attributes": [
    // array of objects, parallel to data
    {"attr1": "val1", ...},
    {"attr2": "val2", ...},
    ...
  ]
}
- "attributes" is optional. If present, each key is an attribute name and its value is a string.

  • All arrays must have the same length. The i-th entry in string_id matches the i-th entry in data and attributes.

Response

{
  "success": true,
  "job_id": "string",
  "checksum_server": "string",
  "error_message": "none"
}

The checksum_server is computed over the serialized (JSON-dumped) data field.

Example

POST /add_data_batch
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "BatchDS",
  "string_ids": ["row1", "row2"],
  "data": [
    [0.25, 0.75],
    [0.6, 0.3]
  ],
  "attributes": [
    {"date": "A"},
    {"category": "B"}
  ]
}
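
Since each request is capped at 200 MB, large collections should be split across several /add_data_batch calls. A minimal Python sketch (the chunk size of 1,000 vectors is an arbitrary example; pick one that keeps each payload well under the limit):

import numpy as np
import requests

API_KEY = "replace_this_with_your_api_key"
BASE_URL = "https://aidb.vecml.com/api"

vectors = np.random.randn(10000, 128).tolist()
string_ids = [f"ID_{i}" for i in range(len(vectors))]

CHUNK = 1000  # vectors per request; keep each payload well under 200 MB
for start in range(0, len(vectors), CHUNK):
    payload = {"user_api_key": API_KEY, "project_name": "RAG-example", "collection_name": "embeddings",
               "string_ids": string_ids[start:start + CHUNK], "data": vectors[start:start + CHUNK]}
    response = requests.post(f"{BASE_URL}/add_data_batch", json=payload)
    response.raise_for_status()   # raise on any non-2xx response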


3. /attach_index

Description Creates an ANN (Approximate Nearest Neighbor) index for the given dataset. The indexing job runs asynchronously. You receive a job_id for tracking progress via /get_attach_index_status. Once attached, whenever new data are added to or removed from the data collection, the index will be updated automatically.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "index_name": "string",  // must be a valid name, <=128 chars, no invalid symbols
  "dist_type": "string"    // e.g., "L2", "Cosine", "Inner Product", "Hamming"
  "large_index" : bool
}

  • large_index: (bool, optional) whether to use an advanced approach that optimizes performance for very large indexes. For large indexes, we provide techniques to balance query efficiency, accuracy, and memory usage. Setting large_index to true is highly recommended if your data collection has more than ~2M vectors (see the example request below).
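
For example, for a collection with several million vectors, the request body would simply include the flag (a minimal sketch; the names are placeholders):

index_data = {
    "user_api_key": "api_key_123",
    "project_name": "ProjectA",
    "collection_name": "LargeDataset",   # e.g., a collection with >2M vectors
    "index_name": "L2_index_large",
    "dist_type": "L2",
    "large_index": True                  # enable the large-index optimizations
}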

Response

{
  "job_id": "string",
  "success": true
}

  • job_id can be passed to /get_attach_index_status to see if the index build is finished.

Example

POST /attach_index
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "MyDataset",
  "index_name": "index1",
  "dist_type": "Cosine"
}
Response:
{
  "job_id": "api_key_123_ProjectA_MyDataset_index1_AttachIndexJob",
  "success": true
}


4. /create_project

Description Creates a new project for a given user.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "application": "string" // e.g., "autoML", "both", or any descriptive text
}

  • The project_name must be unique and valid.
  • application describes how the project will be used; select from "search", "autoML", or "both". Currently, the RESTful API supports only the search functionality.

Response

{
  "success": true
}

Example

POST /create_project
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "application": "search"
}
Response:
{
  "success": true
}


5. /delete_dataset

Description Deletes an existing dataset from a project. This action cannot be undone.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string"
}

Response

{
  "success": true,
  "error_code": "Success"
}

Example

POST /delete_dataset
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "dataset_name": "ObsoleteData"
}


6. /delete_index

Description Deletes a specific index from the dataset. This action cannot be undone.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string",
  "index_name": "string"
}

Response

{
  "success": true,
  "error_code": "Success"
}

Example

POST /delete_index
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "dataset_name": "MyDataset",
  "index_name": "index1"
}


7. /delete_project

Description Deletes an existing project and all of its datasets. This action cannot be undone.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string"
}

Response

{
  "success": true,
  "error_code": "Success" // or another code if failure
}

Example

POST /delete_project
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA"
}


8. /fetch_datasets

Description Fetches metadata for all datasets within a given project.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string"
}

Response

{
  "success": true,
  "datasets": [
    {
      "name": "...",
      "vector_type": "...",
      "num_vectors": 1000,
      "vector_dim": 128,
      "bytes": 1234567,
      "create_time": "2025/01/02, 15:30",
      ...
    },
    ...
  ]
}

Example

POST /fetch_datasets
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA"
}


9. /fetch_projects

Description Returns a list of all projects belonging to a user.

Method - POST

Request Body

{
  "user_api_key": "string"
}

Response

{
  "success": true,
  "projects": [
    {
      "project_name": "...",
      "application":  "...",
      ...
    },
    ...
  ]
}

Example

POST /fetch_projects
Content-Type: application/json

{
  "user_api_key": "api_key_123"
}
Response:
{
  "success": true,
  "projects": [
    {
      "project_name": "ProjectA",
      "application": "search"
    },
    {
      "project_name": "ProjectB",
      "application": "autoML"
    }
  ]
}


10. /get_attach_index_status

Description Returns the current status of an index-building job.

Method - POST

Request Body

{
  "user_api_key": "string", 
  "job_id": "string"
}

Response

{
  "success": true,
  "status": "pending" | "in_progress" | "finished" | "failed",
  "error": "Indexing job failed"  // (optional if status == "failed")
}

Example

POST /get_attach_index_status
Content-Type: application/json

{ 
  "user_api_key": "api_key_123",
  "job_id": "my_job_id"
}
Response:
{
  "success": true,
  "status": "in_progress"
}


11. /get_upload_data_status

Description Checks the status of a data-upload job initiated by /upload or /add_data_batch.

Method - POST

Request Body

{
  "user_api_key": "string",
  "job_id": "string"
}

Response

{
  "success": true | false,
  "status": "pending" | "in_progress" | "finished" | "failed",
  "error_message": "..."
}

Example

POST /get_upload_data_status
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "job_id": "my_job_id"
}
Response:

For a data-insertion job initiated by /upload:

{
  "success": true,
  "status": "in_progress",
  "error_message": ""
}

For a data-insertion job initiated by /add_data_batch:

If all vectors are added to the data collection successfully:

{
  "success": true,
  "status": "in_progress",
  "error_message": ""
}

Otherwise (for example, if two vectors failed to be inserted):

{
  "success": true,
  "status": "in_progress",
  "error_message": "The following vectors failed to be inserted: [
                         {"body":"{"error_code":"VectorAlreadyExists"}","string_id":"ID_0"}
                         {"body":"{"error_code":"VectorAlreadyExists"}","string_id":"ID_1"}
                         ]" 
}
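
As recommended in the workflow above, confirm that insertion has finished before attaching an index. A minimal Python polling helper (the 60-second cap is an arbitrary choice; job_id is the value returned by /upload or /add_data_batch):

import time
import requests

BASE_URL = "https://aidb.vecml.com/api"

def wait_for_upload(api_key, job_id, timeout_s=60):
    # Poll /get_upload_data_status until the job finishes, fails, or times out.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        r = requests.post(f"{BASE_URL}/get_upload_data_status",
                          json={"user_api_key": api_key, "job_id": job_id})
        body = r.json()
        if body.get("status") == "finished":
            return body
        if body.get("status") == "failed":
            raise RuntimeError(body.get("error_message", "data insertion failed"))
        time.sleep(1)
    raise TimeoutError("data insertion did not finish in time")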


12. /init

Description Initializes (or re-initializes) an empty dataset in the user's project. If the dataset does not exist, it is created. If the dataset already exists and the specified parameters match the existing configuration, it re-initializes that dataset in memory.

Methods - POST

Request Body Parameters

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "vector_type": "string",   // "dense" or "sparse" (no other values supported)
  "vector_dim": int          // Required if vector_type != "sparse"; must be > 0
}
- user_api_key: The user's API key.

  • project_name: The name of the project to which the dataset belongs. Must already exist.

  • collection_name: The name of the dataset/collection to initialize.

  • vector_type: "dense", "dense8Bit", "dense4Bit", "dense2Bit", "dense1Bit", or "sparse".

  • vector_dim: Positive integer dimension for "dense", "dense8Bit", "dense4Bit", "dense2Bit", or "dense1Bit". Not used for "sparse".

Response Success Response (200 OK):

{
  "error_code": "Success"
}
Error Responses: 400 status code with different error messages.

Example

POST /init
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "MyDataset",
  "vector_type": "dense",
  "vector_dim": 100
}
Response:
{
  "error_code": "Success"
}
or, code 400 with (an example failure message)
"The dataset already exists, but the input vector dim does not match the existing vector dim."


13. /list_all_items

Description Retrieves a complete hierarchical structure of all projects, datasets, and indices for a user. Returns organized data showing the relationship between projects, their datasets, and associated indices.

Method - POST

Request Body

{
  "user_api_key": "string"
}

Request Fields - user_api_key: Valid API key for authentication

Response Success Response (200 OK):

{
  "success": true,
  "projects": [
    {
      "project_name": "string",
      "project_id": "string", 
      "application": "string",
      "datasets": [
        {
          "name": "string",
          "type": "string",
          "size": 1024,
          "create_time": "string",
          "last_insert_time": "string",
          "dimension": "(1000, 128)",
          "indices": [
            {
              "name": "string",
              "distance_type": "string",
              "type": "string"
            }
          ]
        }
      ]
    }
  ]
}

Response Fields

  • success: Boolean indicating operation success
  • projects: Array of project objects containing:
    • project_name: Name of the project
    • project_id: Internal ID of the project
    • application: Application type or category
    • datasets: Array of dataset objects containing:
      • name: Dataset name
      • type: Vector type (e.g., "float32", "int8")
      • size: Dataset size in bytes
      • create_time: When the dataset was created
      • last_insert_time: Last time data was inserted
      • dimension: Format "(num_vectors, vector_dimension)"
      • indices: Array of index objects containing:
        • name: Index name
        • distance_type: Distance metric used (e.g., "cosine", "euclidean")

Notes

  • Returns complete hierarchy in a single request
  • Useful for dashboard views and project overview
  • Dataset size is returned in bytes; convert to KB/MB/GB as needed
  • Empty arrays are returned for projects/datasets with no children
  • Rate limiting applies to this endpoint

Example

POST /list_all_items
Content-Type: application/json

{
  "user_api_key": "api_key_123"
}

Response:

{
  "success": true,
  "projects": [
    {
      "project_name": "RecommendationEngine",
      "project_id": "proj_456",
      "application": "Search",
      "datasets": [
        {
          "name": "user_embeddings",
          "type": "float32",
          "size": 2048576,
          "create_time": "2025-01-15 10:30:00",
          "last_insert_time": "2025-01-20 14:45:00",
          "dimension": "(10000, 128)",
          "indices": [
            {
              "name": "user_similarity_index",
              "distance_type": "cosine",
            }
          ]
        },
        {
          "name": "item_features",
          "type": "float32", 
          "size": 1572864,
          "create_time": "2025-01-15 11:00:00",
          "last_insert_time": "2025-01-19 09:30:00",
          "dimension": "(5000, 256)",
          "indices": []
        }
      ]
    }
  ]
}
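
As a sketch of how the returned hierarchy might be consumed (field names follow the response schema above; the conversion of size to MB is just for display):

import requests

BASE_URL = "https://aidb.vecml.com/api"

resp = requests.post(f"{BASE_URL}/list_all_items",
                     json={"user_api_key": "api_key_123"}).json()
for project in resp.get("projects", []):
    print(project["project_name"])
    for dataset in project.get("datasets", []):
        size_mb = dataset["size"] / (1024 * 1024)   # size is reported in bytes
        print(f"  {dataset['name']}: {size_mb:.1f} MB, dimension {dataset['dimension']}, "
              f"{len(dataset.get('indices', []))} index(es)")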


14. /list_index_infos

Description Lists all indexes that have been attached to a particular dataset.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string"
}

Response

{
  "success": true,
  "index_infos": [
    [ "index_name", "index_type", "distance_type" ],
    ...
  ]
}

Example

POST /list_index_infos
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "dataset_name": "MyDataset"
}
Response:
{
  "success": true,
  "index_infos": [
    ["index1", "Cosine"]
  ]
}


15. /remove_data

Description Removes vectors from the dataset and its indices based on a list of string IDs.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "string_ids": ["id1", "id2"]
}
Response Success Response (200 OK):
{
  "error_code": "Success"
}
Error Responses: 400 status code with detailed error messages.

Example

POST /remove_data
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "MyDataset",
  "string_ids": ["Sample_1", "Sample_2"]
}


16. /search

Description Performs a nearest-neighbor vector search using a built index, with optional filter conditions. This endpoint supports:

  • Searching by providing a list of raw query vectors (dense or sparse).
  • Searching by referencing an existing row in the dataset (a row_number).
  • Searching via a file containing multiple query vectors. Please make sure your file has the same format (vector dimension, attributes) as the dataset in the database.
  • Searching an entire dataset (query_dataset_name) for batch queries.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "query_type": "vectors" | "row" | "file" | "dataset",
  "index_name": "string",
  "top_k": int,
  "filter": {
     // optional filter structure, see below
  },

  // If query_type == "vectors":
  "query_vectors": [[0.1, -0.2, 0.3], [0, 1, 10]],                 // dense-type vectors
                   [[0:0.1, 4:5.2], [2:0, 9:0.3, 20:0.4]]          // sparse vectors

  // If query_type == "row":
  "row_number": 42,

  // If query_type == "file":
  "file_data": "base64-encoded data",
  "file_name": "string",
  "file_format": "json" | "csv" | "libsvm" | "binary",
  // server uses dataset config to parse (field names, etc.)

  // If query_type == "dataset":
  "query_dataset_name": "string"
}

Filter Structure

"filter": {
  "type": "equality" | "range",
  "attribute": "attrName",

  // equality case:
  "target_values": ["someValue1", "someValue2", ...]

  // or range case:
  "start_value": "A",
  "end_value": "Z"
}

  • If filter is null or incomplete, no filtering is applied.
  • An equality filter returns items whose attribute is in the set of target values.
  • A range filter returns items whose attribute is between start_value and end_value (inclusive). For strings, lexical ordering is used. For numeric attributes, the system uses numeric comparison if both are parseable as numbers.
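
For illustration, the following Python dicts show one equality filter and one range filter as they would appear in the request body (the attribute names Category and Timestamp are examples taken from the /add_data and /search samples in this document):

# Equality filter: keep only matches whose "Category" attribute is CatA or CatC.
equality_filter = {
    "type": "equality",
    "attribute": "Category",
    "target_values": ["CatA", "CatC"]
}

# Range filter: keep matches whose "Timestamp" attribute falls within January 2025
# (string attributes are compared lexically, so ISO-style timestamps order correctly).
range_filter = {
    "type": "range",
    "attribute": "Timestamp",
    "start_value": "2025-01-01 00:00:00",
    "end_value": "2025-01-31 23:59:59"
}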

Response - For query_type: "row":

{
  "error_code": "Success",
  "results": [
    {"idx": "string_id_of_match", "distance": floatValue},
    ...
  ]
}
- For query_type: "vectors" or "dataset" or "file":
{
  "error_code": "Success",
  "results": [
    {
      "query_vector_id": "...",   // Typically the row index or custom ID from the file
      "matches": [
        {"idx": "string_id_of_match", "distance": floatValue},
        ...
      ]
    },
    ...
  ]
}

Examples

  • Query with a direct vector

    POST /search
    Content-Type: application/json
    
    {
      "user_api_key": "api_key_123",
      "project_name": "ProjectA",
      "collection_name": "MyDataset",
      "query_type": "vectors",
      "index_name": "index1",
      "top_k": 5,
      "query_vectors": [[0.25, 0.75, 1.25, 2.00]],       // use one vector in the example
      "filter": {
        "type": "equality",
        "attribute": "Category",
        "target_values": ["CatA","CatC"]
      }
    }
    Response:
    {
      "error_code": "Success",
      "results": [
        "query_vector_id": "0",
        "matches": [
          {"idx": "Sample_01", "distance": 0.3},
          {"idx": "Sample_07", "distance": 0.8},
          ...
        ]
      ]
    }

  • Query referencing an existing row

    {
      "query_type": "row",
      "row_number": 123,
      ...
    }

  • Batch file-based query

    {
      "query_type": "file",
      "file_data": "base64encodedFile",
      "file_name": "Queries.json",
      "file_format": "json",
      ...
    }

  • Dataset-to-dataset query

    {
      "query_type": "dataset",
      "query_dataset_name": "QueryVectorsDS",
      ...
    }


17. /upload

Description Uploads a file containing dataset vectors (CSV, JSON, libsvm, binary format, etc.) and creates a new dataset with those vectors. The upload runs asynchronously; you receive a job_id to query via /get_upload_data_status.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string",   // The new dataset's name
  "file_data": "string",   // base64-encoded data (optionally gzip compressed)
  "file_format": "string", // "csv", "json", "libsvm", "binary"
  "has_field_names": true | false,
  "vector_type": "dense" | "dense1Bit" | "dense2Bit" | "dense4Bit" | "dense8Bit" | "sparse",

  // If JSON with field names:
  "vector_data_field_name": "string",
  "vector_id_field_name": "string",
  "vector_attributes": ["attr1", "attr2", ...],

  // If "binary" file_format:
  "binary_dtype": "uint8" | "float32",   // for binary format data, you can set "vector_type" to be dense. The actuall dtype uses "binary_dtype"
  "binary_dim": 123,

  // Optional string ID txt file
  "string_id_txt": "string"     // base64-encoded external string ID .txt file, optional
}

Required:

  • project_name: The project that the uploaded dataset belongs to.
  • dataset_name: The name of the data collection in VecML DB.
  • file_data: The Base64 encoded raw data file.
  • file_format: "csv", "json", "libsvm" or "binary".
  • vector_type: the type of the vector. Supported types:
    • dense: the standard float32 dense vector. For example, [0, 1.2, 2.4, -10 ,5.7]. Standard embedding vectors from language or vision models can be saved as this type.
    • dense8Bit: uint8 dense vectors, with integer vector elements ranging in [0, 255]. For example, [0, 3, 76, 255, 152]. 8-bit quantized embedding vectors can be saved as this type for storage saving.
    • dense4Bit: 4-bit quantized dense vectors, with integer vector elements ranging in [0, 15].
    • dense2Bit: 2-bit quantized dense vectors, with integer vector elements ranging in [0, 3].
    • dense1Bit: 1-bit quantized dense vectors, with binary vector elements.
    • sparse: sparse vector formatted as a set of index:value pairs. Please use this for libsvm file format. This is useful for high-dimensional sparse vectors.

Conditional:

  • For csv and json format data:
    • has_field_names (required): whether the data file contains column headers (csv) or field names (json).
    • vector_data_field_name (required if file_format == json and has_field_names == true): the json field that contains the vector data.
    • vector_attributes (optional, csv and json only): [attr1, attr2, ..., attrN], List of attribute columns/fields associated with the file's vectors.
    • vector_id_field_name (optional, csv and json only): Specifies the column/field that should be used as the unique vector IDs of the vectors.

Optional:

  • string_id_txt: A Base64-encoded .txt file of whitespace- or newline-separated UNIQUE string IDs for the vectors. The number of string IDs should equal the number of vectors in the data file. If string_id_txt is provided, it overrides all other ID sources and becomes the exclusive source of unique IDs.

Auto-generation of string ID and column names (for CSV and JSON)

For libsvm and binary data, we will automatically generate 0-based IDs for the vectors, e.g., "0", "1", ... For csv and json, if has_field_names == False or vector_id_field_name is not provided, the 0-based IDs are also auto-generated.

Moreover, for csv files, if has_field_names == False, the user can still specify the ID and attribute columns by their (1-based) column numbers, such as "vector_data_field_name": "column 1", "vector_attributes": ["column 59", "column 60"]. The column name must strictly follow the format "column XX".

Response

{
  "success": true,
  "job_id": "string",
  "checksum_server": "string",
  "error_message": "none"
}
The checksum_server is computed for the file_data field.

Example

POST /upload
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "dataset_name": "MyDataset",
  "file_data": "BASE64_ENCODED_DATA==",
  "file_format": "csv",
  "has_field_names": true,
  "vector_type": "dense",
  "vector_id_field_name": "unique_ID",
  "vector_attributes": ["date", "region", "category"]
}
Response:
{
  "success": true,
  "job_id": "api_key_123_ProjectA_MyDataset_UploadDataJob",
  "error_message": "none"
}
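
To illustrate the full upload flow end to end, here is a minimal Python sketch that base64-encodes a local CSV file, submits it, and then polls /get_upload_data_status (the file name and field names mirror the example above and are placeholders):

import base64
import time
import requests

API_KEY = "replace_this_with_your_api_key"
BASE_URL = "https://aidb.vecml.com/api"

# Read and base64-encode the local CSV file.
with open("embeddings.csv", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("ascii")

upload_body = {
    "user_api_key": API_KEY,
    "project_name": "ProjectA",
    "dataset_name": "MyDataset",
    "file_data": file_b64,
    "file_format": "csv",
    "has_field_names": True,
    "vector_type": "dense",
    "vector_id_field_name": "unique_ID",
    "vector_attributes": ["date", "region", "category"]
}
resp = requests.post(f"{BASE_URL}/upload", json=upload_body).json()
job_id = resp["job_id"]

# Poll until the asynchronous upload job finishes or fails.
while True:
    status = requests.post(f"{BASE_URL}/get_upload_data_status",
                           json={"user_api_key": API_KEY, "job_id": job_id}).json()
    if status.get("status") in ("finished", "failed"):
        print(status)
        break
    time.sleep(1)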