AutoML API
This document describes each endpoint of the VecML RESTful API for Automated Machine Learning (AutoML), including:
- Endpoint URL
- Expected Request Body (JSON)
- Response Format (JSON)
- Relevant Details or Constraints
- Example Requests / Responses
All RESTful API calls to the VecML cloud database should be sent to https://aidb.vecml.com/api. All endpoints require a JSON body unless otherwise noted.
Rate and request size limit
For resource allocation and server stability, the server enforces a rate limit on the number of API calls per second. If exceeded, the server responds with:
400 Bad Request
Request limit reached, please try again later
To stay under the limit, insert vectors in batches with /add_data_batch and avoid calling the API too frequently.
The size of each request cannot exceed 200 MB.
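If your client may hit the rate limit, a simple client-side retry with exponential backoff is usually sufficient. Below is a minimal sketch (the helper name post_with_retry and the retry parameters are illustrative; it assumes the 400 response body contains the message shown above):
import time
import requests

def post_with_retry(url, payload, max_retries=5):
    """POST with exponential backoff when the rate limit is hit (illustrative helper)."""
    delay = 1.0
    response = None
    for _ in range(max_retries):
        response = requests.post(url, json=payload)
        # Rate-limited requests return 400 with a "Request limit reached" message
        if response.status_code == 400 and "Request limit reached" in response.text:
            time.sleep(delay)
            delay *= 2  # back off before retrying
            continue
        break
    return response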
Managing Unique IDs for vectors
Vectors are identified by unique string IDs. While VecML provides a way to maintain auto-generated string IDs internally, we strongly encourage users to maintain and specify their own unique IDs for vectors, as this makes database operations more convenient. Several ways to specify IDs are available; please check the data insertion functions for details.
Authentication (API key):
VecML RESTful API requests are authenticated through the user's API key. An API key can be generated as follows:
- Go to https://account.vecml.com/user-api-keys and sign up for a free VecML account.
- After registration, you can get an API key by clicking "Create New API Key".
- The API key is shown only once, at creation time, so store it safely. If you need to re-generate an API key, simply delete the previous one and then create a new key.
Getting Started: Project and Dataset Management
To use the AutoML APIs, the user first needs to upload/insert a vector collection (dataset) to the VecML cloud.
See the vector DB API documentation for Project Management and Dataset Management. Also check Job Management for managing the async job status for data insertion/upload.
An Example Workflow
VecML Database provides fast, performant built-in ML tools for your projects and business. Below is a standard workflow for AutoML model training using the VecML RESTful API:
1. Call /create_project to create a new project.
2. Create a dataset (or collection, interchangeably) within the project and upload the data matrix X. The recommended way is to upload a data file (supported types: csv, json, binary format, libsvm) as a dataset using /upload_automl_X. You can iteratively upload files to a data collection. Note: when uploading data, the user needs to specify the categorical features that will be included when training the model.
3. Attach the label (response) to the dataset. Use /attach_automl_label to upload Y to the dataset. Now (X, Y) form the AutoML training problem.
4. Train an AutoML model on the dataset using /train_automl_model. Note: before starting model training, please call /get_upload_data_status to confirm that X and Y have been uploaded successfully; otherwise, the model training result might be incorrect.
5. Make predictions on a prediction/test dataset using /automl_predict.
The following example Python code demonstrates the API workflow.
import requests
import json
import numpy as np
import time
# Configuration
API_KEY = "replace_this_with_your_api_key"
BASE_URL = "https://aidb.vecml.com/api"
def make_request(endpoint, data):
"""Helper function to make API calls"""
url = f"{BASE_URL}/{endpoint}"
response = requests.post(url, json=data)
print(f"Request to {endpoint}: HTTP {response.status_code}")
if response.text:
try:
json_response = response.json()
print(f"Response: {json_response}")
return response.status_code, json_response
except requests.exceptions.JSONDecodeError:
print(f"Response: {response.text}")
return response.status_code, {"error": "Not JSON", "message": response.text}
else:
print("Response: Empty")
return response.status_code, None
def wait_for_job_completion(job_id, status_endpoint, max_wait_time=60):
"""Wait for an async job to complete"""
start_time = time.time()
while True:
status_data = {"user_api_key": API_KEY, "job_id": job_id}
status, status_response = make_request(status_endpoint, status_data)
if status_response and status_response.get("status") == "finished":
return True
elif status_response and status_response.get("status") == "failed":
return False
if time.time() - start_time > max_wait_time:
return False
time.sleep(2)
def generate_dataset(num_samples, vector_dim, id_prefix, seed=2025):
"""Generate dataset with linear decision boundary"""
np.random.seed(seed)
vectors = np.random.randn(num_samples, vector_dim).tolist()
categories = [np.random.choice(['A', 'B', 'C']) for _ in range(num_samples)]
labels = []
for vec, category in zip(vectors, categories):
# Linear combination of first few components plus category weight
score = sum(vec[:20]) + {'A': 1.0, 'B': -0.5, 'C': 0.0}[category]
label = '1' if score > 0 else '0'
labels.append(label)
# Generate IDs and attributes
ids = [f"{id_prefix}_{i:03d}" for i in range(num_samples)]
attributes = [{"label": str(label), "category": category} for label, category in zip(labels, categories)]
return vectors, ids, attributes
# Clean up any existing project
status, response = make_request("delete_project", {"user_api_key": API_KEY, "project_name": "AutoML-Demo"})
# 1. Create a project
project_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "application": "Machine Learning"}
status, response = make_request("create_project", project_data)
# 2. Initialize training dataset
init_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "training_data",
"vector_type": "dense", "vector_dim": 64}
status, response = make_request("init", init_data)
# 3. Generate and add training data using add_data_batch
vectors, ids, attributes = generate_dataset(num_samples=1000, vector_dim=64, id_prefix="train", seed=2025)
# Add training data in batch
batch_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "training_data",
"string_ids": ids, "data": vectors, "attributes": attributes}
status, response = make_request("add_data_batch", batch_data)
train_upload_job_id = response["job_id"]
# Wait for training data upload to complete
if not wait_for_job_completion(train_upload_job_id, "get_upload_data_status", max_wait_time=30):
exit(1)
# 4. Train AutoML model
train_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "training_data",
"model_name": "model1", "training_mode": "high_speed", "task_type": "classification",
"label_attribute": "label", "train_categorical_features": ["category"]}
status, response = make_request("train_automl_model", train_data)
train_job_id = response["job_id"]
# Wait for training to complete
if not wait_for_job_completion(train_job_id, "get_automl_training_status", max_wait_time=60):
exit(1)
# 5. Initialize prediction dataset
pred_init_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "prediction_data",
"vector_type": "dense", "vector_dim": 64}
status, response = make_request("init", pred_init_data)
# 6. Generate and add prediction data
prediction_vectors, prediction_ids, prediction_attributes = generate_dataset(num_samples=100, vector_dim=64, id_prefix="pred", seed=2026)
# Add prediction data in batch
pred_batch_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "prediction_data",
"string_ids": prediction_ids, "data": prediction_vectors, "attributes": prediction_attributes}
status, response = make_request("add_data_batch", pred_batch_data)
pred_upload_job_id = response["job_id"]
# Wait for prediction data upload to complete
if not wait_for_job_completion(pred_upload_job_id, "get_upload_data_status"):
exit(1)
# 7. Make predictions using the existing dataset
predict_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "training_data",
"model_name": "model1", "prediction_dataset": "prediction_data"}
status, prediction_results = make_request("automl_predict", predict_data)
Upload AutoML Datasets
/upload_automl_X
Description:
Uploads a file containing dataset vectors (CSV, JSON, libsvm, binary format, etc.) and creates a new dataset with those vectors. The upload runs asynchronously; you receive a job_id to query via /get_upload_data_status.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string", // The new dataset's name
"X": "string", // base64-encoded data (optionally gzip compressed)
"file_format": "string", // "csv", "json", "libsvm", "binary"
"has_field_names": true | false,
"vector_type": "dense" | "dense1Bit" | "dense2Bit" | "dense4Bit" | "dense8Bit" | "sparse",
// If JSON with field names:
"vector_data_field_name": "string",
// If JSON or CSV with field names
"categorical_features": ["attr1", "attr2", ...],
// If "binary" file_format:
"binary_dtype": "uint8" | "float32", // for binary format data, you can set "vector_type" to be dense. The actuall dtype uses "binary_dtype"
"binary_dim": 123,
// Optional compression method
"compression_type": "string", // now only "gzip" is supported
// Optional checksum for data integrity check
"checksum": "string" // this should be the SHA256 hash computed on "file_data" field
}
Required:
- project_name: The project that the uploaded dataset belongs to.
- collection_name: The name of the data collection in VecML DB.
- X: The Base64-encoded raw data matrix (set of vectors) for AutoML model training.
- file_format: "csv", "json", "libsvm", or "binary".
- vector_type: The type of the vector. Supported types:
  - dense: the standard float32 dense vector. For example, [0, 1.2, 2.4, -10, 5.7]. Standard embedding vectors from language or vision models can be saved as this type.
  - dense8Bit: uint8 dense vectors, with integer vector elements ranging in [0, 255]. For example, [0, 3, 76, 255, 152]. 8-bit quantized embedding vectors can be saved as this type to save storage.
  - dense4Bit: 4-bit quantized dense vectors, with integer vector elements ranging in [0, 15].
  - dense2Bit: 2-bit quantized dense vectors, with integer vector elements ranging in [0, 3].
  - dense1Bit: 1-bit quantized dense vectors, with binary vector elements.
  - sparse: sparse vector formatted as a set of index:value pairs. Please use this for the libsvm file format. This is useful for high-dimensional sparse vectors.
Conditional:
- For csv and json format data:
  - has_field_names (required): whether the data file contains column headers (csv) or field names (json).
  - vector_data_field_name (if file_format == "json" and has_field_names == true): the JSON field that contains the vector data.
  - vector_attributes (optional, csv and json only): [attr1, attr2, ..., attrN], a list of attribute columns/fields associated with the file's vectors.
Optional:
Auto-generation of column names (for CSV)
For csv files, if has_field_names == false, the user may specify categorical feature columns by column number (1-based), e.g., "categorical_features": ["column 59", "column 60"]. The column name must strictly follow the format "column XX".
- compression_type: to further reduce the file size and speed up communication, the user can gzip the file before converting it to Base64 format. If gzip is applied, set this argument to "gzip". Currently, only "gzip" is supported.
- checksum: the SHA256 hash of the X field, in string format.
Response
{
"success": true,
"job_id": "string",
"error_message": "none",
"checksum_server": "string" // if "checksum" is provided in the API request
}
- job_id: Use this ID to query the status of the upload job via /get_upload_data_status.
- checksum_server: the checksum computed for the data file X on the server side for the data integrity check, returned if the user request contains the checksum field.
Example
POST /upload_automl_X
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset",
"X": "BASE64_ENCODED_DATA==",
"file_format": "csv",
"has_field_names": true,
"vector_type": "dense",
"categorical_features": ["date", "region", "category"]
}
{
"success": true,
"job_id": "api_key_123||ProjectA||MyDataset||UploadDataJob",
"error_message": "none"
}
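To construct the request above in Python, the X field can be prepared by reading the file, optionally gzip-compressing it, Base64-encoding the bytes, and hashing the resulting string. A minimal sketch (the file name train.csv is illustrative; the checksum is assumed to be computed over the Base64 string, per the checksum field description):
import base64
import gzip
import hashlib
import requests

with open("train.csv", "rb") as f:          # illustrative input file
    raw = f.read()

encoded = base64.b64encode(gzip.compress(raw)).decode("utf-8")
payload = {
    "user_api_key": "api_key_123",
    "project_name": "ProjectA",
    "collection_name": "MyDataset",
    "X": encoded,
    "file_format": "csv",
    "has_field_names": True,
    "vector_type": "dense",
    "compression_type": "gzip",
    # Assumed: SHA256 is computed over the Base64 string sent in the "X" field
    "checksum": hashlib.sha256(encoded.encode("utf-8")).hexdigest(),
}
response = requests.post("https://aidb.vecml.com/api/upload_automl_X", json=payload)
print(response.json())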
/attach_automl_label
Description:
Attaches label data to an existing dataset by uploading a file containing the label values. The upload runs asynchronously; you receive a job_id to query via /get_upload_data_status.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string", // The existing dataset's name
"Y": "string", // base64-encoded label data file (optionally gzip compressed)
"label_name": "string", // The name of the label attribute to be added
// Optional compression method
"compression_type": "string" // now only "gzip" is supported
}
Required:
- user_api_key: Your VecML API key for authentication.
- project_name: The project that contains the dataset.
- collection_name: The name of the existing data collection in VecML DB.
- Y: The Base64-encoded label data file. The file should be a plain text (or csv) file in which each line contains the string label value; labels are assigned sequentially, starting from the first vector in the collection.
- label_name: The name to assign to this label attribute in the dataset.
Optional:
compression_type: To further reduce the file size and speed up communication, the user can gzip the file before converting to Base64 format. If gzip is applied, set this argument to "gzip". Currently, only "gzip" is supported.
Response
{
"success": true,
"job_id": "string",
"error_message": "none",
"num_vectors_parsed": 123
}
- job_id: Use this ID to query the status of the upload job via /get_upload_data_status.
- num_vectors_parsed: The number of ID-label pairs successfully parsed from the uploaded file.
Example
POST /attach_automl_label
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset",
"Y": "BASE64_ENCODED_LABEL_DATA==",
"label_name": "sentiment",
"compression_type": "gzip"
}
Response:
{
"success": true,
"job_id": "api_key_123||ProjectA||MyDataset||AddAttributesFromFileJob",
"error_message": "none",
"num_vectors_parsed": 1000
}
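Since the label file is plain text with one label per line, it can be built directly from a Python list. A minimal sketch (the label values and names are illustrative):
import base64
import requests

labels = ["positive", "negative", "positive"]   # one label per vector, in insertion order
label_bytes = "\n".join(labels).encode("utf-8")

payload = {
    "user_api_key": "api_key_123",
    "project_name": "ProjectA",
    "collection_name": "MyDataset",
    "Y": base64.b64encode(label_bytes).decode("utf-8"),
    "label_name": "sentiment",
}
response = requests.post("https://aidb.vecml.com/api/attach_automl_label", json=payload)
print(response.json())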
AutoML API Endpoints
/train_automl_model
Description: Initiates training of an AutoML model on a specified dataset. This is an asynchronous operation that returns a job ID for tracking progress.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"model_name": "string",
"training_mode": "string",
"task_type": "string",
"label_attribute": "string",
"train_categorical_features": ["string", ...], // optional
"validation_dataset": "string", // optional
"data_augmentation": "string" // optional
}
Request Fields
- model_name: Unique name for the model (max 128 characters, no special characters).
- training_mode: Training mode configuration: "linear_model", "high_speed", "balanced", "high_accuracy".
- task_type: Type of ML task, either "classification" or "regression".
- label_attribute: Name of the attribute/column to use as the target label.
- train_categorical_features: (Optional) Array of attribute names to treat as categorical features when training the model. The training categorical features must be a subset of the categorical features specified when uploading the dataset.
- validation_dataset: (Optional) Name of a separate dataset to use for validation. If not set, default cross-validation is used. If validation_dataset is provided, it must have the same vector dimensions as the training dataset.
- data_augmentation: (Optional) Whether to enable data augmentation for better performance. Data augmentation takes some extra time but usually gives better performance. Accepts three values: "off", "low", "high".
Response Success Response (200 OK):
{
"success": true,
"job_id": "string"
}
Notes
- Model names must be unique within a dataset.
- Training is asynchronous; use /get_automl_training_status with the returned job_id to monitor progress.
- The training dataset must contain the specified label_attribute. Data samples without a label will not be included in the final training set used to train the model.
Example
POST /train_automl_model
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "MLProject",
"collection_name": "customer_data",
"model_name": "customer_segmentation_v1",
"training_mode": "auto",
"task_type": "classification",
"label_attribute": "customer_segment",
"train_categorical_features": ["region", "subscription_type"],
"validation_dataset": "customer_validation_data"
}
Response:
{
"success": true,
"job_id": "user123_MLProject_customer_data_customer_segmentation_v1_TrainJob"
}
/automl_predict
Description: Use a trained AutoML model to make predictions on new data. Can accept data as a file upload or reference an existing dataset.
Method - POST
Request Body
Option 1: Using file upload
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"model_name": "string",
"file_data": "string", // base64 encoded file
"file_format": "string", // "csv", "json", or "libsvm"
"has_field_names": boolean, // required for CSV files
"compression_type": "string" // optional: "gzip"
}
Option 2: Using existing dataset
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"model_name": "string",
"prediction_dataset": "string"
}
Request Fields
- collection_name: Name of the dataset the model was trained on
- model_name: Name of the trained model to use for prediction
- file_data: (Option 1) Base64-encoded file containing prediction data
- file_format: (Option 1) Format of the uploaded file: "csv", "json", or "libsvm"
- has_field_names: (Option 1, CSV only) Whether the CSV file contains column headers
- compression_type: (Option 1, optional) Set to "gzip" if the file is gzip compressed
- prediction_dataset: (Option 2) Name of an existing dataset to make predictions on
Response For Classification Models:
{
"success": true,
"num_samples": 1000,
"predictions": [0.0, 1.0, 2.0, ...],
"prediction_metric": {
"accuracy": 0.85,
"micro_precision": 0.86,
"macro_precision": 0.84,
"micro_recall": 0.85,
"macro_recall": 0.83,
"micro_f1": 0.855,
"macro_f1": 0.835,
"auc": 0.92
}
}
For Regression Models:
{
"success": true,
"num_samples": 1000,
"predictions": [12.5, 8.3, 15.7, ...],
"prediction_metric": 0.045 // MSE value
}
Error Response:
{
"success": false,
"error_message": "Error description"
}
Response Fields
- num_samples: Number of samples that have a label and are used to compute the prediction metrics
- predictions: Array of prediction values (class labels for classification, numeric values for regression)
- prediction_metric: Performance metrics, present if true labels are available in the data
  - For classification: detailed metrics object
  - For regression: single MSE (Mean Squared Error) value
Notes
- The prediction data must have the same vector dimensions and format as the training data
- When uploading files, supported formats are CSV, JSON, LibSVM, and raw binary (concatenated float32 or uint8 vectors)
- Prediction metrics are only calculated if at least one true label is present in the prediction data
- Temporary datasets created from file uploads are automatically cleaned up after prediction
Example
POST /automl_predict
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "MLProject",
"collection_name": "customer_data",
"model_name": "customer_segmentation_v1",
"prediction_dataset": "new_customers"
}
Response:
{
"success": true,
"num_samples": 393,
"predictions": [0, 1, 2, 1, 0, 2, 1, 0, ...],
"prediction_metric": {
"accuracy": 0.89,
"micro_precision": 0.91,
"macro_precision": 0.88,
"micro_recall": 0.89,
"macro_recall": 0.87,
"micro_f1": 0.90,
"macro_f1": 0.875,
"auc": 0.94
}
}
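The example above uses Option 2 (an existing dataset). For Option 1, the prediction file is Base64-encoded and sent inline; a minimal sketch (the file name new_customers.csv is illustrative):
import base64
import requests

with open("new_customers.csv", "rb") as f:   # illustrative prediction file
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "user_api_key": "api_key_123",
    "project_name": "MLProject",
    "collection_name": "customer_data",      # dataset the model was trained on
    "model_name": "customer_segmentation_v1",
    "file_data": encoded,
    "file_format": "csv",
    "has_field_names": True,
}
result = requests.post("https://aidb.vecml.com/api/automl_predict", json=payload).json()
print(result.get("predictions", [])[:10])    # first few predicted labels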
/get_feature_importance
Description: Retrieves the feature importance scores for a trained AutoML model. Returns the top features ranked by their contribution to the model's predictions.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"model_name": "string"
}
Required Fields:
- user_api_key: Your VecML API key for authentication
- project_name: The project containing the model
- collection_name: The dataset name used to train the model
- model_name: The name of the trained model
Response Success Response (200 OK):
{
"success": true,
"feature_importance": [
{
"feature": "string",
"importance": 0.0
},
...
]
}
Response Fields:
- success: Boolean indicating if the request was successful
- feature_importance: Array of feature objects, sorted by importance (descending order)
  - feature: Name of the feature (column name from the dataset)
  - importance: Numerical importance score (higher values indicate more important features)
Notes:
- Returns the top 50 most important features by default
- Feature names correspond to the column names in the original training dataset
Example
POST /get_feature_importance
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset",
"model_name": "classification_model_1"
}
Response:
{
"success": true,
"feature_importance": [
{
"feature": "income",
"importance": 0.2543
},
{
"feature": "age",
"importance": 0.1876
},
{
"feature": "credit_score",
"importance": 0.1432
},
{
"feature": "employment_length",
"importance": 0.0987
},
{
"feature": "debt_ratio",
"importance": 0.0654
}
]
}
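Because feature_importance is returned already sorted in descending order, a ranked report is a short loop. A minimal sketch:
import requests

payload = {"user_api_key": "api_key_123", "project_name": "ProjectA",
           "collection_name": "MyDataset", "model_name": "classification_model_1"}
result = requests.post("https://aidb.vecml.com/api/get_feature_importance", json=payload).json()
for rank, entry in enumerate(result.get("feature_importance", []), start=1):
    # Entries arrive sorted by importance, so rank == position in the array
    print(f"{rank}. {entry['feature']}: {entry['importance']:.4f}")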
/delete_automl_model
Description: Deletes a specific AutoML model from a dataset. This action cannot be undone.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"model_name": "string"
}
Response Success Response (200 OK):
{
"success": true
}
Example
POST /delete_automl_model
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset",
"model_name": "old_classification_model"
}
Response:
{
"success": true
}
/get_automl_training_status
Description: Retrieves the current status and progress of an AutoML training job.
Method - POST
Request Body
{
"user_api_key": "string",
"job_id": "string"
}
Response For In-Progress Jobs:
{
"success": true,
"task_type": "classification",
"model_name": "model_123",
"label_attribute": "category",
"status": "in_progress",
"start_time": "2025/01/15 14:30:25",
"duration": "00:05:30"
}
For Completed Jobs:
{
"success": true,
"task_type": "classification",
"model_name": "model_123",
"label_attribute": "category",
"status": "finished",
"start_time": "2025/01/15 14:30:25",
"duration": "00:12:45",
"validation_metric": {
"accuracy": 0.85,
"micro_precision": 0.86,
"macro_precision": 0.84,
"micro_recall": 0.85,
"macro_recall": 0.83,
"micro_f1": 0.855,
"macro_f1": 0.835,
"auc": 0.92
}
}
For Failed Jobs:
{
"success": true,
"task_type": "regression",
"model_name": "model_456",
"label_attribute": "price",
"status": "failed",
"start_time": "2025/01/15 15:00:10",
"duration": "00:02:15",
"error": "Training failed"
}
Notes
- Status values: "pending", "in_progress", "finished", "failed"
- For classification tasks, validation_metric contains detailed metrics
- For regression tasks, validation_metric contains a single MSE float value
- Duration is formatted as "HH:MM:SS"
Example
POST /get_automl_training_status
Content-Type: application/json
{
"user_api_key": "api_key_123",
"job_id": "training_job_789"
}
Response:
{
"success": true,
"task_type": "classification",
"model_name": "customer_segment_model",
"label_attribute": "segment",
"status": "finished",
"start_time": "2025/01/15 14:30:25",
"duration": "00:08:42",
"validation_metric": {
"accuracy": 0.91,
"micro_precision": 0.92,
"macro_precision": 0.90,
"micro_recall": 0.91,
"macro_recall": 0.89,
"micro_f1": 0.915,
"macro_f1": 0.895,
"auc": 0.96
}
}
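A training job can be polled until it reaches a terminal status ("finished" or "failed"). Below is a minimal sketch (the function name and polling parameters are illustrative; the workflow example earlier uses a similar helper):
import time
import requests

def poll_training_status(api_key, job_id, interval=5, timeout=600):
    """Poll /get_automl_training_status until the job finishes or fails."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.post(
            "https://aidb.vecml.com/api/get_automl_training_status",
            json={"user_api_key": api_key, "job_id": job_id},
        ).json()
        if resp.get("status") in ("finished", "failed"):
            return resp
        time.sleep(interval)   # job is "pending" or "in_progress"; wait and retry
    raise TimeoutError("Training did not reach a terminal status in time")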
/get_model_validation_metric
Description: Retrieves validation metrics for a specific trained AutoML model.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string",
"model_name": "string"
}
Response: For Classification Models:
{
"success": true,
"validation_metric": {
"accuracy": 0.85,
"micro_precision": 0.86,
"macro_precision": 0.84,
"micro_recall": 0.85,
"macro_recall": 0.83,
"micro_f1": 0.855,
"macro_f1": 0.835,
"auc": 0.92
}
}
For Regression Models:
{
"success": true,
"validation_metric": 0.045 // MSE value
}
Example
POST /get_model_validation_metric
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset",
"model_name": "classification_model_1"
}
Response:
{
"success": true,
"validation_metric": {
"accuracy": 0.89,
"micro_precision": 0.90,
"macro_precision": 0.88,
"micro_recall": 0.89,
"macro_recall": 0.87,
"micro_f1": 0.895,
"macro_f1": 0.875,
"auc": 0.94
}
}
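Since validation_metric is an object for classification models but a single float for regression models, callers should branch on its type. A minimal sketch:
import requests

payload = {"user_api_key": "api_key_123", "project_name": "ProjectA",
           "collection_name": "MyDataset", "model_name": "classification_model_1"}
resp = requests.post("https://aidb.vecml.com/api/get_model_validation_metric", json=payload).json()
metric = resp["validation_metric"]
if isinstance(metric, dict):       # classification: detailed metrics object
    print("accuracy:", metric["accuracy"])
else:                              # regression: a single MSE value
    print("MSE:", metric)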
/list_automl_model_infos
Description: Retrieves metadata for all AutoML models within a specific dataset.
Method - POST
Request Body
{
"user_api_key": "string",
"project_name": "string",
"collection_name": "string"
}
Response Success Response (200 OK):
{
"success": true,
"model_infos": [
{
"model_name": "string",
"task_type": "string",
"training_mode": "string",
"label_attribute": "string",
"data_augmentation": bool,
"validation_dataset": "string",
"create_time": "string"
},
...
]
}
Response Fields:
- model_name: Name of the trained model
- task_type: Type of machine learning task ("classification" or "regression")
- training_mode: Training mode used ("linear_model", "high_speed", "balanced", or "high_accuracy")
- label_attribute: Name of the label/target attribute used for training
- data_augmentation: Whether data augmentation is used
- validation_dataset: Name of the validation dataset used ("cross_validation" if auto-split was used)
- create_time: Timestamp when the model was created
Example
POST /list_automl_model_infos
Content-Type: application/json
{
"user_api_key": "api_key_123",
"project_name": "ProjectA",
"collection_name": "MyDataset"
}
Response:
{
"success": true,
"model_infos": [
{
"model_name": "classification_model_1",
"task_type": "classification",
"training_mode": "high_speed",
"label_attribute": "category",
"data_augmentation": "true",
"validation_dataset": "",
"create_time": "2025/01/15, 14:30"
},
{
"model_name": "regression_model_1",
"task_type": "regression",
"training_mode": "balanced",
"label_attribute": "price",
"data_augmentation": "false",
"validation_dataset": "validation_set",
"create_time": "2025/01/16, 09:15"
}
]
}