Welcome! This guide is the fastest way to go from “what is dbTwin?” to “I’m generating high-quality synthetic data in seconds.” It covers concepts, endpoints, request/response shapes, limits, examples, and common errors—everything a new user needs.
What is dbTwin?
dbTwin generates privacy-preserving synthetic tabular data that retains the statistical properties and column-to-column relationships of your real data—without exposing sensitive records.
30× faster than deep-learning alternatives
No training, no GPUs: use our API or generate in the cloud in minutes
Two algorithms: Core and Flagship. Use Core for lightweight, quick applications and Flagship when hyperrealism and high fidelity are needed. Flagship is ideal for machine-learning and analytics use cases.
Cost: Dev-tier accounts get 30 FREE credits per month. 1 credit ≈ 10,000 Core records or 5,000 Flagship records, so the free tier covers roughly 300,000 Core or 150,000 Flagship records per month.
Base URL
https://api.dbtwin.com
Endpoints
GET /health
Returns a simple OK so you can quickly check service liveness.
Use as: https://api.dbtwin.com/health
Response: 200 with JSON {"status":"ok"}.
POST /generate
Create a synthetic version of your uploaded table.
Request
Headers
api-key (required): your dbTwin API key
rows (required): positive integer, requested synthetic row count
algo (optional): core (default) or flagship
Body: multipart/form-data with exactly one file field
Supported file types: .csv or .parquet
Request size guard: rejects overly large payloads with a message that references a 0.5GB limit (see Errors).
Quickstart (Python)
import pandas as pd
import requests
from io import BytesIO

key_ = 'YOUR_API_KEY'
num_rows = 10000
file_url = "https://media.githubusercontent.com/media/dbTwin-Inc/public_demo/refs/heads/main/adult_income.csv"

# Fetch sample input
r = requests.get(file_url)
file_obj = BytesIO(r.content)
file_obj.seek(0)

files = {"file": ("adult_income.csv", file_obj)}
# HTTP header values must be strings, so cast the row count
headers = {"rows": str(num_rows), "algo": "flagship", "api-key": key_}
url = 'https://api.dbtwin.com'

# Check health using a GET request to /health
print(requests.get(url + "/health"))  # should be HTTP 200

# POST request to /generate
resp = requests.post(url + "/generate", headers=headers, files=files)
if resp.status_code == 200:  # on success, save the file locally
    with open("synthetic.csv", "wb") as f:
        f.write(resp.content)
    df_synth = pd.read_csv("synthetic.csv")  # load synthetic data
    print(df_synth.head())
    print(resp.headers['distribution-similarity-error'])  # QA metric 1
    print(resp.headers['association-similarity'])  # QA metric 2
else:
    print(resp.json())  # print the error if generation is unsuccessful
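The same request works with Parquet input; because the output format mirrors the input, the response body will then be a synthetic .parquet file. A minimal sketch, assuming you already have a pandas DataFrame df_real in memory and pyarrow (or fastparquet) installed for Parquet I/O:

import pandas as pd
import requests
from io import BytesIO

df_real = pd.read_csv("your_data.csv")  # illustrative source table

# Serialize to an in-memory Parquet buffer (no temporary file needed)
buf = BytesIO()
df_real.to_parquet(buf, index=False)
buf.seek(0)

files = {"file": ("your_data.parquet", buf)}
headers = {"rows": "10000", "algo": "core", "api-key": "YOUR_API_KEY"}
resp = requests.post("https://api.dbtwin.com/generate", headers=headers, files=files)

if resp.status_code == 200:
    # Output format matches the input format, so read the body back as Parquet
    df_synth = pd.read_parquet(BytesIO(resp.content))
    print(df_synth.head())
else:
    print(resp.json())

For large tables, Parquet's compact encoding also helps keep the upload under the 0.5GB limit described below.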
cURL example
curl -X POST "https://api.dbtwin.com/generate" \
-H "api-key: YOUR_API_KEY" \
-H "rows: 50000" \
-H "algo: flagship" \
-F "file=@/path/to/your_data.csv" \
-o synthetic.csv -D headers.txt
Output is written to synthetic.csv
Response headers (quality metrics) are captured in headers.txt
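To pull the QA metrics out of headers.txt programmatically, a small sketch (header names as documented below):

# Extract the two QA headers captured by curl's -D flag
with open("headers.txt") as f:
    for line in f:
        if line.lower().startswith(("distribution-similarity-error", "association-similarity")):
            print(line.strip())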
Behavior & limits
Files larger than 0.5GB are not allowed. For large files, we highly recommend Parquet because of its storage efficiency.
Input data must have more than 40 rows and more than 2 columns (after parsing), or synthesis fails; see the pre-flight check below.
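You can validate both limits client-side before spending a request. A minimal sketch for a local CSV (the file path is illustrative):

import os
import pandas as pd

path = "your_data.csv"  # illustrative path
# Guard against the 0.5GB upload limit
assert os.path.getsize(path) <= 0.5 * 1024**3, "file exceeds the 0.5GB limit"

df = pd.read_csv(path)
n_rows, n_cols = df.shape
assert n_rows > 40, "dbTwin needs more than 40 rows of real data"
assert n_cols > 2, "dbTwin needs more than 2 columns of real data"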
Response
Status: 200 OK on success
File: same format as input
Input CSV files return a synthetic .csv file (and likewise for .parquet)
Synthetic data QA: the dbTwin API returns two quality headers (Distribution-similarity-error, Association-similarity). Distribution-similarity-error quantifies the error between the probability distributions of the real and synthetic columns. Association-similarity measures the similarity of the correlation matrices computed for real and synthetic data.
High-fidelity synthetic data is characterized by low Distribution-similarity-error values (typically less than 0.1) and high Association-similarity values (typically between 0.9 and 1).
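Using the resp object from the Python quickstart, you can turn that rule of thumb into a simple quality gate. The thresholds below are the guideline values above, not hard API limits:

# Rule-of-thumb quality gate on the two QA headers
dist_err = float(resp.headers["distribution-similarity-error"])
assoc_sim = float(resp.headers["association-similarity"])
if dist_err < 0.1 and assoc_sim >= 0.9:
    print("High-fidelity synthetic data")
else:
    print(f"Review quality: error={dist_err:.3f}, similarity={assoc_sim:.3f}")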
Credits & billing:
After synthetic data generation, your dbTwin credits are updated. If your dbTwin API key is invalid or you lack sufficient credits, the request fails with an error.
Mixed-type data models
Internally, dbTwin auto-detects common column types and handles them appropriately (see the example after this list). Currently, dbTwin supports:
ID columns
String columns with 100% cardinality are treated as identifiers; synthetic identifiers are sampled independently.
Categorical & String: strings with less than 100% cardinality are encoded as categorical values.
Numeric & Boolean: standard int/float values; True/False columns are handled natively.
Missing values are handled natively. Each synthetic column preserves the fraction of missing values.
Date-time values: date-times in common formats without time-zone information are supported natively. Synthetic columns will contain date-time values in the same format.
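To see the type handling end to end, you can build and synthesize a small mixed-type table. A sketch with one column of each kind (the column names are illustrative):

import numpy as np
import pandas as pd

n = 100  # comfortably above the row minimum
df = pd.DataFrame({
    "user_id": [f"id_{i}" for i in range(n)],           # 100% cardinality -> treated as an ID column
    "segment": np.random.choice(["a", "b", "c"], n),    # low cardinality -> categorical
    "amount": np.random.normal(100.0, 15.0, n),         # numeric (float)
    "active": np.random.choice([True, False], n),       # boolean
    "signup": pd.date_range("2024-01-01", periods=n),   # date-time without a time zone
})
df.loc[df.sample(frac=0.1).index, "amount"] = np.nan    # missing values; their fraction is preserved
df.to_csv("mixed_types.csv", index=False)               # ready to upload to /generate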
Request & Response Details
| Aspect | Value |
| --- | --- |
| Endpoint | /generate (POST) |
| Auth | dbTwin API key (api-key header) |
| Real data | multipart/form-data (CSV, Parquet) |
| Required headers | rows (int > 0) |
| Optional headers | algo (core or flagship; default core) |
| Output format | CSV/Parquet, matching the input format |
| Output QA headers | Distribution-similarity-error, Association-similarity |
Error Codes & How to Fix
| HTTP | Message (typical) | What it means / How to fix |
| --- | --- | --- |
| 400 | Missing 'rows' header | Add rows to headers; must be a positive integer. |
| 400 | 'rows' must be an integer | Send numeric text (e.g., "5000"), not words. |
| 400 | 'rows' must be > 0 | Use a positive value. |
| 400 | Missing 'file' in multipart form-data | Ensure you POST multipart/form-data with a file field. |
| 400 | Failed to read dbTwin api-key from supplied header | Include the api-key header with a valid key. |
| 413 | Payload too large (limit 0.5GB) | Reduce upload size; split or compress input (Parquet is smaller). |
| 415 | Unsupported file type. | Use .csv or .parquet only. |
| 422 | Failed to parse input .csv/.parquet: … | Check file integrity/encoding; can pandas read your file? |
| 500 | Synthesis failed: dbTwin API needs more than 40 rows and more than 2 columns of real data | Provide a larger/wider dataset. |
| 500 | Synthesis failed: … | A downstream generation error occurred; try fewer rows, switch algorithms, or clean the data. |
| 500 | Failed to update credits for your dbTwin API key | Ensure your api-key is active and has sufficient credits; contact support if the problem persists. |
| 500 | Failed to serialize output: … | Rare; retry generation, or try CSV input if Parquet fails (or vice versa). |
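A small wrapper can map these codes to clearer exceptions in client code. A sketch (not an official client; the generate function name is ours):

import requests

def generate(url, headers, files):
    """POST to /generate and translate common failure codes."""
    resp = requests.post(url + "/generate", headers=headers, files=files)
    if resp.status_code == 200:
        return resp.content  # synthetic file bytes, same format as the input
    if resp.status_code == 413:
        raise ValueError("Payload too large: keep uploads under 0.5GB (try Parquet)")
    if resp.status_code == 415:
        raise ValueError("Unsupported file type: use .csv or .parquet")
    # 400 / 422 / 500: surface the API's JSON error message
    raise RuntimeError(f"dbTwin error {resp.status_code}: {resp.json()}")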
FAQs
Q: Does dbTwin preserve my schema?
Yes—column names and types are preserved in the output (subject to CSV/Parquet typing conventions).
Q: Can I request more rows than exist in my source file?
Yes. Synthetic size is independent of source size, and the real data can be as small as just over 40 rows. Naturally, the larger your real dataset, the more realistic your synthetic data will be.
Q: How do I measure the quality of my synthetic data?
The dbTwin API returns QA headers (Distribution-similarity-error, Association-similarity). Distribution-similarity-error quantifies the error between the probability distributions of the real and synthetic columns; lower values are better. This error is measured by the Total Variation Distance for each real/synthetic column pair and averaged over columns.
Association-similarity measures the similarity of the correlation matrices computed for real and synthetic data. Thus, a high value such as 0.98 implies that the inter-column relationships in the real and synthetic data are very similar.
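As a rough local sanity check (not dbTwin's exact metric), you can compare the correlation matrices of the numeric columns yourself:

import numpy as np
import pandas as pd

def corr_gap(df_real: pd.DataFrame, df_synth: pd.DataFrame) -> float:
    # Mean absolute difference between the correlation matrices of the
    # numeric columns (a local analogue, NOT the API's Association-similarity)
    num_cols = df_real.select_dtypes("number").columns
    c_real = df_real[num_cols].corr().to_numpy()
    c_synth = df_synth[num_cols].corr().to_numpy()
    return float(np.abs(c_real - c_synth).mean())

Values close to 0 indicate that inter-column relationships were preserved.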
© 2025 dbTwin, Inc. All rights reserved.