AutoML API

This document describes each endpoint of the VecML RESTful API for Automated Machine Learning (AutoML), including:

Endpoint URL
Expected Request Body (JSON)
Response Format (JSON)
Relevant Details or Constraints
Example Requests / Responses

All RESTful API calls to the VecML cloud database should be sent to: https://aidb.vecml.com/api. All endpoints require a JSON body unless otherwise noted.

Rate and request size limit

For resource allocation and server stability, the server enforces a rate limit on the number of API calls per second. If exceeded, the server responds with:

400 Bad Request
Request limit reached, please try again later

Please avoid high-frequency API calls. For example, when inserting multiple vectors, please use /add_data_batch instead of iteratively calling /add_data.

The size of each request cannot exceed 200 MB.
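The rate limit described above suggests retrying after a delay rather than failing outright. The following is a minimal sketch, not part of the VecML API: `call_with_backoff` and its parameters are hypothetical names, and it assumes a rate-limited reply can be recognised by the 400 status together with the message text quoted above.

```python
import time

RATE_LIMIT_TEXT = "Request limit reached"  # message text quoted in this section


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until the reply is not rate limited or retries run out.

    send is any zero-argument callable returning (status_code, body_text),
    for example a thin wrapper around requests.post (hypothetical helper).
    """
    for attempt in range(max_retries + 1):
        status_code, body_text = send()
        rate_limited = status_code == 400 and RATE_LIMIT_TEXT in (body_text or "")
        if not rate_limited or attempt == max_retries:
            return status_code, body_text
        # Exponential backoff: base_delay, 2x, 4x, ... between attempts.
        sleep(base_delay * (2 ** attempt))
```

The delay values are illustrative only; the document does not state the actual limit, so tune them to your workload.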

Managing Unique IDs for vectors

Vectors are identified by UNIQUE string IDs. While VecML provides a way to maintain auto-generated string IDs internally, we highly encourage users to maintain and specify unique IDs for their vectors, for the convenience of database operations. Multiple ways are available to specify the IDs; please check the data insertion endpoints for details.

Authentication (API key):

VecML RESTful API requests are authenticated with the user's API key. The API key can be generated as follows:

Go to https://account.vecml.com/user-api-keys and sign up for a free VecML account.
After registration, you can get an API key by clicking "Create New API Key".
The API key is shown only once, at creation time, so please store it securely. If you need to regenerate an API key, simply delete the previous one and create a new key.
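Because the key is shown only once, it is usually safer to keep it out of source code. The sketch below reads it from an environment variable and adds it to a request body as the user_api_key field. The variable name VECML_API_KEY and both helper names are assumptions made for illustration, not something VecML defines.

```python
import os


def load_api_key(env_var="VECML_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your VecML API key")
    return key


def with_api_key(payload, api_key):
    """Return a copy of payload with the user_api_key field filled in."""
    return {**payload, "user_api_key": api_key}
```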

An Example Workflow

VecML Database provides fast and performant built-in ML tools for your project and business. Below is a standard workflow using the VecML RESTful API for AutoML model training:

Call /create_project to create a new project.
Create a dataset (or collection; the terms are used interchangeably) within the project. Two convenient options are:

(i) Call /init to initialize/create a new dataset. Then use /add_data_batch to (iteratively) add batches of vectors to the dataset.

(ii) Upload a data file (supported types: csv, json, binary format, libsvm) as a dataset using /upload. You can iteratively upload files to a data collection.

When uploading data, attributes are required to train AutoML models. In this context, "attributes" should include the label (for classification) or response (for regression), and any other categorical features that you want to include in the model. We provide easy ways to specify the attributes along with the data vectors; see the endpoint descriptions for details.

Train an AutoML model for the dataset using /train_automl_model. Note: Before starting model training, please call /get_upload_data_status to confirm that the data insertion has finished. Otherwise, model training may raise an error because data is still being inserted.
Make predictions on a prediction/test dataset using /automl_predict. Note: Before predicting, please call /get_automl_training_status to confirm that training has finished.

The following example Python code demonstrates the API workflow.

Python

import requests
import json
import numpy as np
import time

# Configuration
API_KEY = "replace_this_with_your_api_key"
BASE_URL = "https://aidb.vecml.com/api"

def make_request(endpoint, data):
    """Helper function to make API calls"""
    url = f"{BASE_URL}/{endpoint}"
    response = requests.post(url, json=data)
    print(f"Request to {endpoint}: HTTP {response.status_code}")

    if response.text:
        try:
            json_response = response.json()
            print(f"Response: {json_response}")
            return response.status_code, json_response
        except requests.exceptions.JSONDecodeError:
            print(f"Response: {response.text}")
            return response.status_code, {"error": "Not JSON", "message": response.text}
    else:
        print("Response: Empty")
        return response.status_code, None

def wait_for_job_completion(job_id, status_endpoint, max_wait_time=60):
    """Wait for an async job to complete"""
    start_time = time.time()

    while True:
        status_data = {"user_api_key": API_KEY, "job_id": job_id}
        status, status_response = make_request(status_endpoint, status_data)

        if status_response and status_response.get("status") == "finished":
            return True
        elif status_response and status_response.get("status") == "failed":
            return False

        if time.time() - start_time > max_wait_time:
            return False

        time.sleep(2)

def generate_dataset(num_samples, vector_dim, id_prefix, seed=2025):
    """Generate dataset with linear decision boundary"""
    np.random.seed(seed)
    vectors = np.random.randn(num_samples, vector_dim).tolist()
    categories = [np.random.choice(['A', 'B', 'C']) for _ in range(num_samples)]

    labels = []
    for vec, category in zip(vectors, categories):
        # Linear combination of first few components plus category weight
        score = sum(vec[:20]) + {'A': 1.0, 'B': -0.5, 'C': 0.0}[category]
        label = '1' if score > 0 else '0'
        labels.append(label)

    # Generate IDs and attributes
    ids = [f"{id_prefix}_{i:03d}" for i in range(num_samples)]
    attributes = [{"label": str(label), "category": category} for label, category in zip(labels, categories)]

    return vectors, ids, attributes

# Clean up any existing project
status, response = make_request("delete_project", {"user_api_key": API_KEY, "project_name": "AutoML-Demo"})

# 1. Create a project
project_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "application": "Machine Learning"}
status, response = make_request("create_project", project_data)

# 2. Initialize training dataset
init_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "training_data",
             "vector_type": "dense", "vector_dim": 64}
status, response = make_request("init", init_data)

# 3. Generate and add training data using add_data_batch
vectors, ids, attributes = generate_dataset(num_samples=1000, vector_dim=64, id_prefix="train", seed=2025)

# Add training data in batch
batch_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "training_data",
              "string_ids": ids, "data": vectors, "attributes": attributes}
status, response = make_request("add_data_batch", batch_data)
train_upload_job_id = response["job_id"]

# Wait for training data upload to complete
if not wait_for_job_completion(train_upload_job_id, "get_upload_data_status", max_wait_time=30):
    exit(1)

# 4. Train AutoML model
train_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "dataset_name": "training_data",
              "model_name": "model1", "training_mode": "high_speed", "task_type": "classification",
              "label_attribute": "label", "categorical_features": ["category"]}
status, response = make_request("train_automl_model", train_data)
train_job_id = response["job_id"]

# Wait for training to complete
if not wait_for_job_completion(train_job_id, "get_automl_training_status", max_wait_time=60):
    exit(1)

# 5. Initialize prediction dataset
pred_init_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "prediction_data",
                  "vector_type": "dense", "vector_dim": 64}
status, response = make_request("init", pred_init_data)

# 6. Generate and add prediction data
prediction_vectors, prediction_ids, prediction_attributes = generate_dataset(num_samples=100, vector_dim=64, id_prefix="pred", seed=2026)

# Add prediction data in batch
pred_batch_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "prediction_data",
                   "string_ids": prediction_ids, "data": prediction_vectors, "attributes": prediction_attributes}
status, response = make_request("add_data_batch", pred_batch_data)
pred_upload_job_id = response["job_id"]

# Wait for prediction data upload to complete
if not wait_for_job_completion(pred_upload_job_id, "get_upload_data_status"):
    exit(1)

# 7. Make predictions using the existing dataset
predict_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "dataset_name": "training_data",
                "model_name": "model1", "prediction_dataset": "prediction_data"}
status, prediction_results = make_request("automl_predict", predict_data)

1. `/add_data`

Description Attaches a vector and any attributes to the dataset under a specified string_id.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "string_id": "string",
  "data":   // For "dense" or bit-packed dense => array of floats or array of uint8
           // For "sparse" => array of [ idx, val ] pairs
           // Example: [0.1, 0.2, ...] or [[1, 2.0], [5, -1.2]]
  "attributes": {
    "key1": "value1",
    "key2": "value2",
    ...
  }
}

attributes is optional. If present, each key is an attribute name and its value is a string.

Response Success Response (200 OK):

{
  "error_code": "Success"
}

Error Responses: 400 status code with different error messages.

Notes

If the dataset is not yet initialized, this call attempts to initialize it (provided the dataset metadata is known).
Fails if a dimension or type mismatch occurs.

Example

POST /add_data
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "MyDataset",
  "string_id": "Sample_123",
  "data": [0.25, 0.75, 1.25, 2.00],
  "attributes": {
    "Label": "CatA",
    "Timestamp": "2025-01-01 10:00:00"
  }
}

Response:

{
  "error_code": "Success"
}
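The request body above accepts either a dense array or a list of [idx, val] pairs for sparse vectors. As a rough sketch, the helpers below build such a body from Python values. Both function names are hypothetical, sorting the pairs by index is only for readability (the document does not require it), and attribute values are converted to strings because the endpoint expects string values.

```python
def to_sparse_pairs(values):
    """Convert {index: value} into the [[idx, val], ...] form, sorted by index."""
    return [[int(idx), float(val)] for idx, val in sorted(values.items())]


def build_add_data_payload(api_key, project, collection, string_id, data, attributes=None):
    """Assemble a /add_data request body; attributes are optional."""
    payload = {
        "user_api_key": api_key,
        "project_name": project,
        "collection_name": collection,
        "string_id": string_id,
        "data": data,
    }
    if attributes:
        # Attribute values are documented as strings, so convert them here.
        payload["attributes"] = {str(k): str(v) for k, v in attributes.items()}
    return payload
```

For example, `build_add_data_payload(key, "ProjectA", "MyDataset", "Sample_123", to_sparse_pairs({5: -1.2, 1: 2.0}))` produces a body whose data field is `[[1, 2.0], [5, -1.2]]`.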

2. `/add_data_batch`

Description Inserts many vectors at once; a batched version of /add_data. Expects arrays of string_ids, data, and attributes with the same length.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "string_ids": [
    "id_1",
    "id_2",
    ...
  ],
  "data": [
    // for each index i, this is a dense array or sparse pairs
    [...],
    ...
  ],
  "attributes": [
    // array of objects, parallel to data
    {"attr1": "val1", ...},
    {"attr2": "val2", ...},
    ...
  ]
}

- "attributes" is optional. If present, each key is an attribute name and its value is a string.

All arrays must have the same length. The i-th entry in string_ids matches the i-th entry in data and attributes.

Response

{
  "success": true,
  "job_id": "string",
  "checksum_server": "string",
  "error_message": "none"
}

The checksum_server is computed over the serialized (dumped) data field.

Example

POST /add_data_batch
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "BatchDS",
  "string_ids": ["row1", "row2"],
  "data": [
    [0.25, 0.75],
    [0.6, 0.3]
  ],
  "attributes": [
    {"date": "A"},
    {"category": "B"}
  ]
}
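Since all arrays must be parallel and a single request cannot exceed the size limit, large inserts are typically split into several /add_data_batch calls. The generator below is a minimal sketch of that splitting; `iter_batches` and the default batch_size of 1000 are assumptions for illustration, not values prescribed by VecML.

```python
def iter_batches(string_ids, data, attributes=None, batch_size=1000):
    """Yield (ids, vectors, attrs) slices after checking the parallel-array rule."""
    if len(string_ids) != len(data):
        raise ValueError("string_ids and data must have the same length")
    if attributes is not None and len(attributes) != len(data):
        raise ValueError("attributes must have the same length as data")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(data), batch_size):
        end = start + batch_size
        attrs = attributes[start:end] if attributes is not None else None
        yield string_ids[start:end], data[start:end], attrs
```

Each yielded triple can be placed into the string_ids, data, and attributes fields of one request; remember that every call returns its own job_id.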

3. `/automl_predict`

Description Use a trained AutoML model to make predictions on new data. Can accept data as a file upload or reference an existing dataset.

Method - POST

Request Body

Option 1: Using file upload

{
  "user_api_key": "string",
  "project_name": "string", 
  "dataset_name": "string",
  "model_name": "string",
  "file_data": "string",          // base64 encoded file
  "file_format": "string",        // "csv", "json", or "libsvm"
  "has_field_names": boolean,     // required for CSV files
  "compression_type": "string"    // optional: "gzip"
}

Option 2: Using existing dataset

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string", 
  "model_name": "string",
  "prediction_dataset": "string"
}

Request Fields

dataset_name: Name of the dataset the model was trained on
model_name: Name of the trained model to use for prediction
file_data: (Option 1) Base64 encoded file containing prediction data
file_format: (Option 1) Format of uploaded file: "csv", "json", or "libsvm"
has_field_names: (Option 1, CSV only) Whether the CSV file contains column headers
compression_type: (Option 1, optional) Set to "gzip" if file is gzip compressed
prediction_dataset: (Option 2) Name of existing dataset to make predictions on

Response For Classification Models:

{
  "success": true,
  "num_samples": 1000,
  "predictions": [0.0, 1.0, 2.0, ...],
  "prediction_metric": {
    "accuracy": 0.85,
    "micro_precision": 0.86, 
    "macro_precision": 0.84,
    "micro_recall": 0.85,
    "macro_recall": 0.83,
    "micro_f1": 0.855,
    "macro_f1": 0.835,
    "auc": 0.92
  }
}

For Regression Models:

{
  "success": true,
  "num_samples": 1000,
  "predictions": [12.5, 8.3, 15.7, ...],
  "prediction_metric": 0.045  // MSE value
}

Error Response:

{
  "success": false,
  "error_message": "Error description"
}

Response Fields

num_samples: Number of samples that have a label and are used to compute the prediction metrics
predictions: Array of prediction values (class labels for classification, numeric values for regression)
prediction_metric: Performance metrics if true labels are available in the data
- For classification: detailed metrics object
- For regression: single MSE (Mean Squared Error) value

Notes

The prediction data must have the same vector dimensions and format as the training data
If using categorical features, all categorical attributes must be present in the prediction data
When uploading files, supported formats are CSV, JSON, and LibSVM
Prediction metrics are only calculated if the true labels are present in the prediction data
Temporary datasets created from file uploads are automatically cleaned up after prediction

Example

POST /automl_predict
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "MLProject",
  "dataset_name": "customer_data", 
  "model_name": "customer_segmentation_v1",
  "prediction_dataset": "new_customers"
}

Response:

{
  "success": true,
  "num_samples": 393,
  "predictions": [0, 1, 2, 1, 0, 2, 1, 0, ...],
  "prediction_metric": {
    "accuracy": 0.89,
    "micro_precision": 0.91,
    "macro_precision": 0.88,
    "micro_recall": 0.89,
    "macro_recall": 0.87,
    "micro_f1": 0.90,
    "macro_f1": 0.875,
    "auc": 0.94
  }
}
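For Option 1 (file upload), the file has to be Base64 encoded before it is placed in file_data. The sketch below shows one way to prepare such a body using only the standard library. The function names are hypothetical, and it assumes that gzip compression, when used, is applied to the raw file before Base64 encoding; the document does not state the order explicitly.

```python
import base64
import gzip


def encode_file(raw_bytes, compress=False):
    """Return (file_data, compression_type) for an Option 1 request body.

    Assumption: gzip compression is applied before Base64 encoding.
    """
    if compress:
        return base64.b64encode(gzip.compress(raw_bytes)).decode("ascii"), "gzip"
    return base64.b64encode(raw_bytes).decode("ascii"), None


def build_file_predict_payload(api_key, project, dataset, model, raw_bytes,
                               file_format="csv", has_field_names=True, compress=False):
    """Assemble an /automl_predict body that carries the data as a file."""
    file_data, compression_type = encode_file(raw_bytes, compress)
    payload = {
        "user_api_key": api_key,
        "project_name": project,
        "dataset_name": dataset,
        "model_name": model,
        "file_data": file_data,
        "file_format": file_format,
    }
    if file_format == "csv":
        payload["has_field_names"] = has_field_names  # required for CSV files
    if compression_type:
        payload["compression_type"] = compression_type
    return payload
```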

4. `/create_project`

Description Creates a new project for a given user.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "application": "string" // e.g., "autoML", "both", or any descriptive text
}

The project_name must be unique and valid.
application is an internal field describing how the project will be used; select from search, autoML, or both. Currently, only search functionalities are available through the RESTful API.

Response

{
  "success": true
}

Example

POST /create_project
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "application": "search"
}

Response:

{
  "success": true
}

5. `/delete_automl_model`

Description Deletes a specific AutoML model from a dataset. This action cannot be undone.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string", 
  "dataset_name": "string",
  "model_name": "string"
}

Response Success Response (200 OK):

{
  "success": true
}

Example

POST /delete_automl_model
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "dataset_name": "MyDataset",
  "model_name": "old_classification_model"
}

Response:

{
  "success": true
}

6. `/delete_dataset`

Description Deletes an existing dataset from a project. This action cannot be undone.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string"
}

Response

{
  "success": true,
  "error_code": "Success"
}

Example

POST /delete_dataset
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "dataset_name": "ObsoleteData"
}

7. `/delete_project`

Description Deletes an existing project and all of its datasets. This action cannot be undone.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string"
}

Response

{
  "success": true,
  "error_code": "Success" // or another code if failure
}

Example

POST /delete_project
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA"
}

8. `/fetch_datasets`

Description Fetches metadata for all datasets within a given project.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string"
}

Response

{
  "success": true,
  "datasets": [
    {
      "name": "...",
      "vector_type": "...",
      "num_vectors": 1000,
      "vector_dim": 128,
      "bytes": 1234567,
      "create_time": "2025/01/02, 15:30",
      ...
    },
    ...
  ]
}

Example

POST /fetch_datasets
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA"
}

9. `/fetch_projects`

Description Returns a list of all project(s) belonging to a user.

Method - POST

Request Body

{
  "user_api_key": "string"
}

Response

{
  "success": true,
  "projects": [
    {
      "project_name": "...",
      "application":  "...",
      ...
    },
    ...
  ]
}

Example

POST /fetch_projects
Content-Type: application/json

{
  "user_api_key": "api_key_123"
}

Response:

{
  "success": true,
  "projects": [
    {
      "project_name": "ProjectA",
      "application": "search"
    },
    {
      "project_name": "ProjectB",
      "application": "autoML"
    }
  ]
}

10. `/get_automl_training_status`

Description Retrieves the current status and progress of an AutoML training job.

Method - POST

Request Body

{
  "user_api_key": "string",
  "job_id": "string"
}

Response For In-Progress Jobs:

{
  "success": true,
  "task_type": "classification",
  "model_name": "model_123",
  "label_attribute": "category",
  "status": "in_progress",
  "start_time": "2025/01/15 14:30:25",
  "duration": "00:05:30"
}

For Completed Jobs:

{
  "success": true,
  "task_type": "classification",
  "model_name": "model_123", 
  "label_attribute": "category",
  "status": "finished",
  "start_time": "2025/01/15 14:30:25",
  "duration": "00:12:45",
  "validation_metric": {
    "accuracy": 0.85,
    "micro_precision": 0.86,
    "macro_precision": 0.84,
    "micro_recall": 0.85,
    "macro_recall": 0.83,
    "micro_f1": 0.855,
    "macro_f1": 0.835,
    "auc": 0.92
  }
}

For Failed Jobs:

{
  "success": true,
  "task_type": "regression",
  "model_name": "model_456",
  "label_attribute": "price", 
  "status": "failed",
  "start_time": "2025/01/15 15:00:10",
  "duration": "00:02:15",
  "error": "Training failed"
}

Notes

Status values: "pending", "in_progress", "finished", "failed"
For classification tasks, validation_metric contains detailed metrics
For regression tasks, validation_metric contains a single MSE float value
Duration is formatted as "HH:MM:SS"

Example

POST /get_automl_training_status
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "job_id": "training_job_789"
}

Response:

{
  "success": true,
  "task_type": "classification",
  "model_name": "customer_segment_model",
  "label_attribute": "segment",
  "status": "finished", 
  "start_time": "2025/01/15 14:30:25",
  "duration": "00:08:42",
  "validation_metric": {
    "accuracy": 0.91,
    "micro_precision": 0.92,
    "macro_precision": 0.90,
    "micro_recall": 0.91,
    "macro_recall": 0.89,
    "micro_f1": 0.915,
    "macro_f1": 0.895,
    "auc": 0.96
  }
}
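When polling this endpoint, it helps to know which status values end the wait and how long the job has been running. The helpers below are a small sketch based on the status values and the "HH:MM:SS" duration format listed in the notes above; the function names are hypothetical.

```python
TERMINAL_STATUSES = {"finished", "failed"}  # "pending" and "in_progress" keep polling


def duration_to_seconds(duration):
    """Convert an "HH:MM:SS" duration string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def is_terminal(status_response):
    """Return True once the job has either finished or failed."""
    return status_response.get("status") in TERMINAL_STATUSES
```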

11. `/get_model_validation_metric`

Description Retrieves validation metrics for a specific trained AutoML model.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string", 
  "model_name": "string"
}

Response: For Classification Models:

{
  "success": true,
  "validation_metric": {
    "accuracy": 0.85,
    "micro_precision": 0.86,
    "macro_precision": 0.84,
    "micro_recall": 0.85,
    "macro_recall": 0.83,
    "micro_f1": 0.855,
    "macro_f1": 0.835,
    "auc": 0.92
  }
}

For Regression Models:

{
  "success": true,
  "validation_metric": 0.045  // MSE value
}

Example

POST /get_model_validation_metric
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA", 
  "dataset_name": "MyDataset",
  "model_name": "classification_model_1"
}

Response:

{
  "success": true,
  "validation_metric": {
    "accuracy": 0.89,
    "micro_precision": 0.90,
    "macro_precision": 0.88,
    "micro_recall": 0.89,
    "macro_recall": 0.87,
    "micro_f1": 0.895,
    "macro_f1": 0.875,
    "auc": 0.94
  }
}
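Because validation_metric is an object for classification models and a single MSE number for regression models, client code has to handle both shapes. A minimal sketch follows; `describe_validation_metric` is a hypothetical helper, and the three-decimal formatting is an arbitrary choice.

```python
def describe_validation_metric(metric):
    """Format validation_metric for display, whichever shape it has."""
    if isinstance(metric, dict):
        # Classification: an object of named metrics.
        return ", ".join(f"{name}={value:.3f}" for name, value in sorted(metric.items()))
    # Regression: a single MSE value.
    return f"mse={float(metric):.3f}"
```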

12. `/get_upload_data_status`

Description Checks the status of a data-upload job initiated by /upload or /add_data_batch.

Method - POST

Request Body

{
  "user_api_key": "string",
  "job_id": "string"
}

Response

{
  "success": true | false,
  "status": "pending" | "in_progress" | "finished" | "failed",
  "error_message": "..."
}

Example

POST /get_upload_data_status
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "job_id": "my_job_id"
}

Response:

For a data insertion job initiated by /upload:

{
  "success": true,
  "status": "in_progress",
  "error_message": ""
}

For a data insertion job initiated by /add_data_batch:

If all vectors are added to the data collection successfully:

{
  "success": true,
  "status": "in_progress",
  "error_message": ""
}

Otherwise (for example, if two vectors failed to be inserted):

{
  "success": true,
  "status": "in_progress",
  "error_message": "The following vectors failed to be inserted: [
                         {"body":"{"error_code":"VectorAlreadyExists"}","string_id":"ID_0"}
                         {"body":"{"error_code":"VectorAlreadyExists"}","string_id":"ID_1"}
                         ]" 
}
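As the examples above show, a reply can report success while error_message still lists vectors that failed to be inserted. The sketch below condenses a reply into three values. It is only an illustration: the helper name is hypothetical, and it treats both an empty string and "none" as "no error" because both appear in this document's example responses.

```python
def summarize_upload_status(response):
    """Return (done, ok, message) from a /get_upload_data_status reply."""
    status = response.get("status")
    message = response.get("error_message") or ""
    has_error = message not in ("", "none")  # assumption: both mean "no error"
    done = status in ("finished", "failed")
    ok = bool(response.get("success")) and status != "failed" and not has_error
    return done, ok, message
```

A caller would keep polling while done is False and inspect message whenever ok is False.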

13. `/init`

Description Initializes (or re-initializes) an empty dataset in the user's project. If the dataset does not exist, it is created. If the dataset already exists and the specified parameters match the existing configuration, it re-initializes that dataset in memory.

Method - POST

Request Body Parameters

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "vector_type": "string",   // "dense", "dense8Bit", "dense4Bit", "dense2Bit", "dense1Bit", or "sparse"
  "vector_dim": int          // Required if vector_type != "sparse"; must be > 0
}

user_api_key: The user's API key.
project_name: The name of the project to which the dataset belongs. Must already exist.
collection_name: The name of the dataset/collection to initialize.
vector_type: "dense", "dense8Bit", "dense4Bit", "dense2Bit", "dense1Bit", or "sparse".
vector_dim: Positive integer dimension for "dense", "dense8Bit", "dense4Bit", "dense2Bit", or "dense1Bit". Not used for "sparse".
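The field rules above can be checked on the client before sending the request. The following is a hedged sketch: `build_init_payload` is a hypothetical helper, and the accepted vector_type values are taken from the field list in this section.

```python
DENSE_TYPES = {"dense", "dense8Bit", "dense4Bit", "dense2Bit", "dense1Bit"}


def build_init_payload(api_key, project, collection, vector_type, vector_dim=None):
    """Assemble an /init body, enforcing the vector_type and vector_dim rules."""
    if vector_type not in DENSE_TYPES and vector_type != "sparse":
        raise ValueError(f"unsupported vector_type: {vector_type!r}")
    payload = {
        "user_api_key": api_key,
        "project_name": project,
        "collection_name": collection,
        "vector_type": vector_type,
    }
    if vector_type in DENSE_TYPES:
        # vector_dim is only needed for the dense types and must be positive.
        if not isinstance(vector_dim, int) or vector_dim <= 0:
            raise ValueError("vector_dim must be a positive integer for dense types")
        payload["vector_dim"] = vector_dim
    return payload
```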

Response Success Response (200 OK):

{
  "error_code": "Success"
}

Error Responses: 400 status code with different error messages.

Example

POST /init
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "MyDataset",
  "vector_type": "dense",
  "vector_dim": 100
}

Response:

{
  "error_code": "Success"
}

or status code 400 with a failure message, for example:

"The dataset already exists, but the input vector dim does not match the existing vector dim."

14. `/list_automl_model_infos`

Description Retrieves metadata for all AutoML models within a specific dataset.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string"
}

Response Success Response (200 OK):

{
  "success": true,
  "model_infos": [
    [
      "model_name",
      "task_type",
      "training_mode", 
      "label_attribute",
      "create_time"
    ],
    ...
  ]
}

Example

POST /list_automl_model_infos
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "dataset_name": "MyDataset"
}

Response:

{
  "success": true,
  "model_infos": [
    [
      "classification_model_1",
      "classification",
      "high_speed",
      "category",
      "2025/01/15, 14:30"
    ],
    [
      "regression_model_1", 
      "regression",
      "balanced",
      "price",
      "2025/01/16, 09:15"
    ]
  ]
}
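Each entry of model_infos is a positional array, so it can be convenient to turn it into a dictionary keyed by field name. A minimal sketch, assuming the field order shown in the response format above; `model_infos_to_dicts` is a hypothetical helper name.

```python
MODEL_INFO_FIELDS = ("model_name", "task_type", "training_mode",
                     "label_attribute", "create_time")


def model_infos_to_dicts(model_infos):
    """Map each positional model_infos row to a {field: value} dictionary."""
    return [dict(zip(MODEL_INFO_FIELDS, row)) for row in model_infos]
```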

15. `/remove_data`

Description Removes vectors from the dataset and its indices based on a list of string IDs.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "collection_name": "string",
  "string_ids": ["id1", "id2"]
}

Response Success Response (200 OK):

{
  "error_code": "Success"
}

Error Responses: 400 status code with detailed error messages.

Example

POST /remove_data
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "collection_name": "MyDataset",
  "string_ids": ["Sample_1", "Sample_2"]
}

16. `/train_automl_model`

Description Initiates training of an AutoML model on a specified dataset. This is an asynchronous operation that returns a job ID for tracking progress.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string", 
  "model_name": "string",
  "training_mode": "string",
  "task_type": "string",
  "label_attribute": "string",
  "categorical_features": ["string", ...],  // optional
  "validation_dataset": "string"            // optional
}

Request Fields

model_name: Unique name for the model (max 128 characters, no special characters).
training_mode: Training mode configuration: "high_speed", "balanced", "high_accuracy".
task_type: Type of ML task - either "classification" or "regression".
label_attribute: Name of the attribute/column to use as the target label.
categorical_features: (Optional) Array of attribute names to treat as categorical features.
validation_dataset: (Optional) Name of separate dataset to use for validation. If not set, default cross-validation will be used. If validation_dataset is provided, it must have the same vector dimensions as the training dataset.

Response Success Response (200 OK):

{
  "success": true,
  "job_id": "string"
}

Notes

Model names must be unique within a dataset.
Training is asynchronous - use /get_automl_training_status with the returned job_id to monitor progress.
The training dataset must contain the specified label_attribute. Data samples without a label are excluded from the final training set.

Example

POST /train_automl_model
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "MLProject",
  "dataset_name": "customer_data",
  "model_name": "customer_segmentation_v1",
  "training_mode": "balanced",
  "task_type": "classification", 
  "label_attribute": "customer_segment",
  "categorical_features": ["region", "subscription_type"],
  "validation_dataset": "customer_validation_data"
}

Response:

{
  "success": true,
  "job_id": "user123_MLProject_customer_data_customer_segmentation_v1_TrainJob"
}
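The documented constraints on the request fields can be validated locally before a training job is submitted. The sketch below checks only what this section states: the 128-character limit and the allowed training_mode and task_type values. The helper name is hypothetical, and the "no special characters" rule is not checked because the document does not define which characters are allowed.

```python
TRAINING_MODES = {"high_speed", "balanced", "high_accuracy"}
TASK_TYPES = {"classification", "regression"}


def build_train_payload(api_key, project, dataset, model_name, training_mode,
                        task_type, label_attribute,
                        categorical_features=None, validation_dataset=None):
    """Assemble a /train_automl_model body after basic field checks."""
    if not model_name or len(model_name) > 128:
        raise ValueError("model_name must be 1 to 128 characters long")
    if training_mode not in TRAINING_MODES:
        raise ValueError(f"training_mode must be one of {sorted(TRAINING_MODES)}")
    if task_type not in TASK_TYPES:
        raise ValueError(f"task_type must be one of {sorted(TASK_TYPES)}")
    payload = {
        "user_api_key": api_key,
        "project_name": project,
        "dataset_name": dataset,
        "model_name": model_name,
        "training_mode": training_mode,
        "task_type": task_type,
        "label_attribute": label_attribute,
    }
    # Optional fields are included only when provided.
    if categorical_features:
        payload["categorical_features"] = list(categorical_features)
    if validation_dataset:
        payload["validation_dataset"] = validation_dataset
    return payload
```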

17. `/upload`

Description Uploads a file containing dataset vectors (CSV, JSON, libsvm, binary format, etc.) and creates a new dataset with those vectors. The upload runs asynchronously; you receive a job_id to query via /get_upload_data_status.

Method - POST

Request Body

{
  "user_api_key": "string",
  "project_name": "string",
  "dataset_name": "string",   // The new dataset's name
  "file_data": "string",   // base64-encoded data (optionally gzip compressed)
  "file_format": "string", // "csv", "json", "libsvm", "binary"
  "has_field_names": true | false,
  "vector_type": "dense" | "dense1Bit" | "dense2Bit" | "dense4Bit" | "dense8Bit" | "sparse",

  // If JSON with field names:
  "vector_data_field_name": "string",
  "vector_id_field_name": "string",
  "vector_attributes": ["attr1", "attr2", ...],

  // If "binary" file_format:
  "binary_dtype": "uint8" | "float32",   // for binary format data, you can set "vector_type" to "dense"; the actual dtype is taken from "binary_dtype"
  "binary_dim": 123,

  // Optional string ID txt file
  "string_id_txt": "string"     // base64-encoded external string ID .txt file, optional
}

Required:

project_name: The project that the uploaded dataset belongs to.
dataset_name: The name of the data collection in VecML DB.
file_data: The Base64 encoded raw data file.
file_format: "csv", "json", "libsvm" or "binary".
vector_type: the type of the vector. Supported types:
- dense: the standard float32 dense vector. For example, [0, 1.2, 2.4, -10, 5.7]. Standard embedding vectors from language or vision models can be saved as this type.
- dense8Bit: uint8 dense vectors, with integer vector elements ranging in [0, 255]. For example, [0, 3, 76, 255, 152]. 8-bit quantized embedding vectors can be saved as this type to reduce storage.
- dense4Bit: 4-bit quantized dense vectors, with integer vector elements ranging in [0, 15].
- dense2Bit: 2-bit quantized dense vectors, with integer vector elements ranging in [0, 3].
- dense1Bit: 1-bit quantized dense vectors, with binary vector elements.
- sparse: sparse vector formatted as a set of index:value pairs. Please use this for libsvm file format. This is useful for high-dimensional sparse vectors.

Conditional:

For csv and json format data:
- has_field_names (required): whether the data file contains column headers (csv) or field names (json).
- vector_data_field_name (if file_format == json and has_field_names == true): the json field that contains the vector data.
- vector_attributes (optional, csv and json only): [attr1, attr2, ..., attrN], List of attribute columns/fields associated with the file's vectors.
- vector_id_field_name (optional, csv and json only): Specifies the column/field that should be used as the unique vector IDs of the vectors.

Optional:

string_id_txt: A Base64-encoded .txt file, separated by whitespace or newlines, that specifies the UNIQUE string IDs of the vectors. The number of string IDs should equal the number of vectors in the data file. If string_id_txt is provided, it overrides all other ID sources and becomes the exclusive source of unique IDs.

Auto-generation of string ID and column names (for CSV and JSON)

For libsvm and binary data, we will automatically generate 0-based IDs for the vectors, e.g., "0", "1", ... For csv and json, if has_field_names == False or vector_id_field_name is not provided, the 0-based IDs are also auto-generated.

Moreover, for csv files with has_field_names == False, the ID and attribute columns can still be specified by their 1-based column number, for example "vector_id_field_name": "column 1", "vector_attributes": ["column 59", "column 60"]. The column name must strictly follow the format "column XX".

Response

{
  "success": true,
  "job_id": "string",
  "checksum_server": "string",
  "error_message": "none"
}

The checksum_server is computed for the file_data field.

Example

POST /upload
Content-Type: application/json

{
  "user_api_key": "api_key_123",
  "project_name": "ProjectA",
  "dataset_name": "MyDataset",
  "file_data": "BASE64_ENCODED_DATA==",
  "file_format": "csv",
  "has_field_names": true,
  "vector_type": "dense",
  "vector_id_field_name": "unique_ID",
  "vector_attributes": ["date", "region", "category"]
}

Response:

{
  "success": true,
  "job_id": "api_key_123_ProjectA_MyDataset_UploadDataJob",
  "error_message": "none"
}
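To illustrate how file_data and the optional string_id_txt fit together, the sketch below builds a CSV upload body with the standard library. The helper names are hypothetical, the vector type is fixed to "dense" for brevity, and the IDs are joined with newlines, which is one of the separators the field description allows.

```python
import base64


def b64(raw_bytes):
    """Base64-encode bytes and return the result as an ASCII string."""
    return base64.b64encode(raw_bytes).decode("ascii")


def build_csv_upload_payload(api_key, project, dataset, csv_bytes,
                             has_field_names=True, vector_attributes=None,
                             string_ids=None):
    """Assemble a /upload body for a CSV file of dense vectors."""
    payload = {
        "user_api_key": api_key,
        "project_name": project,
        "dataset_name": dataset,
        "file_data": b64(csv_bytes),
        "file_format": "csv",
        "has_field_names": has_field_names,
        "vector_type": "dense",
    }
    if vector_attributes:
        payload["vector_attributes"] = list(vector_attributes)
    if string_ids is not None:
        # One ID per vector, newline separated, then Base64 encoded.
        payload["string_id_txt"] = b64("\n".join(string_ids).encode("utf-8"))
    return payload
```

The job_id returned by the server can then be passed to /get_upload_data_status to wait for the insertion to finish.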
