dbTwin API — Quickstart & Reference


Welcome! This guide is the fastest way to go from “what is dbTwin?” to “I’m generating high-quality synthetic data in seconds.” It covers concepts, endpoints, request/response shapes, limits, examples, and common errors—everything a new user needs.


What is dbTwin?


dbTwin generates privacy-preserving synthetic tabular data that retains the statistical properties and column-to-column relationships of your real data—without exposing sensitive records.
  • 30× faster than deep-learning alternatives 
  • No training & No GPUs: Use our API or generate on cloud in minutes 
  • Two algorithms: Core and Flagship. Use Core for lightweight, quick applications; use Flagship when hyperrealism and high fidelity are needed. Flagship is ideal for machine-learning and analytics use cases.
  • Cost: Dev-tier accounts get 30 FREE credits per month. 1 credit ~ 10,000 Core records or 5,000 Flagship records (see the estimation sketch below).
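
As a rough illustration of the rates above (not an official billing formula), you could estimate credit usage like this; the estimate_credits helper below is hypothetical:

# Illustrative only: estimates credits from the published rates
# (1 credit ~ 10,000 Core records or 5,000 Flagship records).
def estimate_credits(rows: int, algo: str = "core") -> float:
    records_per_credit = {"core": 10_000, "flagship": 5_000}
    return rows / records_per_credit[algo]

print(estimate_credits(50_000, "flagship"))  # -> 10.0 credits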

Base URL


https://api.dbtwin.com 


Endpoints
 


GET /health 

Returns a simple OK so you can quickly check service liveness. 

Use as: https://api.dbtwin.com/health 
  • Response: 200 with JSON {"status":"ok"}.  
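For example, a quick liveness check from Python (using requests):

import requests

print(requests.get("https://api.dbtwin.com/health").json())  # expect {"status": "ok"}
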
POST /generate 

Create a synthetic version of your uploaded table. 


Request
 
  • Headers
    • api-key (required): your dbTwin API key
    • rows (required): positive integer, requested synthetic row count
    • algo (optional): core (default) or flagship
  • Body: multipart/form-data with exactly one file field
    • Supported file types: .csv or .parquet
    • Request size guard: overly large payloads are rejected with a message that references a 0.5 GB limit (see Errors).

Quickstart (Python)

import pandas as pd
import requests
from io import BytesIO

key_ = 'YOUR_API_KEY'
num_rows = 10000
file_url = "https://media.githubusercontent.com/media/dbTwin-Inc/public_demo/refs/heads/main/adult_income.csv"

# Fetch sample input
r = requests.get(file_url)
file_obj = BytesIO(r.content)
file_obj.seek(0)
files = {"file": ("adult_income.csv", file_obj)}

# Header values must be strings, so cast the row count
headers = {"rows": str(num_rows), "algo": "flagship", "api-key": key_}
url = 'https://api.dbtwin.com'

# Check health with a GET request to /health
print(requests.get(url + "/health"))  # should be HTTP 200

# POST request to /generate
resp = requests.post(url + "/generate", headers=headers, files=files)
if resp.status_code == 200:  # on success, save the file locally
    with open("synthetic.csv", "wb") as f:
        f.write(resp.content)
    df_synth = pd.read_csv("synthetic.csv")  # load synthetic data
    print(df_synth.head())
    print(resp.headers['distribution-similarity-error'])  # QA metric 1
    print(resp.headers['association-similarity'])  # QA metric 2
else:
    print(resp.json())  # print the error if generation is unsuccessful

CURL example 

curl -X POST "https://api.dbtwin.com/generate" \
  -H "api-key: YOUR_API_KEY" \
  -H "rows: 50000" \
  -H "algo: flagship" \
  -F "file=@/path/to/your_data.csv" \
  -o synthetic.csv -D headers.txt
  • Output is written to synthetic.csv 
  • Response headers (quality metrics) are captured in headers.txt 

Behavior & limits
 
  • Files larger than 0.5 GB are not allowed. For large files, we highly recommend Parquet because of its storage efficiency (a conversion sketch follows below).
  • Input data must have more than 40 rows and more than 2 columns (after parsing), or synthesis fails.
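
If your CSV is close to the 0.5 GB limit, converting it to Parquet before upload usually shrinks it substantially. A minimal sketch (file paths are illustrative; pandas to_parquet requires pyarrow or fastparquet):

import os
import pandas as pd

MAX_BYTES = int(0.5 * 1024**3)  # 0.5 GB request size guard

df = pd.read_csv("your_data.csv")      # illustrative input path
df.to_parquet("your_data.parquet")     # Parquet is typically much smaller than CSV

size = os.path.getsize("your_data.parquet")
print(f"{size / 1024**2:.1f} MB, under limit: {size < MAX_BYTES}")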

Response
 
  • Status: 200 OK on success 
  • File: same format as input 
  • Input CSV files return a synthetic .csv file (and likewise for .parquet)
  • Synthetic data QA: the dbTwin API returns two quality headers, Distribution-similarity-error and Association-similarity. Distribution-similarity-error quantifies the error between the probability distributions of the real and synthetic columns. Association-similarity measures the similarity of the correlation matrices computed for the real and synthetic data.
High-fidelity synthetic data is characterized by low Distribution-similarity-error values (typically less than 0.1) and high Association-similarity values (typically between 0.9 and 1).
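
As a quick sanity check, you can compare these headers against the typical thresholds above. A minimal sketch, assuming resp is the successful response object from the quickstart:

# 'resp' is a successful requests.Response from POST /generate
dist_err = float(resp.headers["distribution-similarity-error"])
assoc_sim = float(resp.headers["association-similarity"])

# The thresholds below mirror the typical values quoted above
if dist_err < 0.1 and assoc_sim > 0.9:
    print(f"High fidelity: error={dist_err:.3f}, association={assoc_sim:.3f}")
else:
    print(f"Review output: error={dist_err:.3f}, association={assoc_sim:.3f}")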


Credits & billing:

After each synthetic data generation, your dbTwin credits are updated. If your dbTwin API key is invalid or you lack sufficient credits, the request returns an error.


Mixed-type data models
 


Internally, dbTwin auto-detects common column types and handles them appropriately (see the sketch after this list). Currently, dbTwin supports:
  1. ID columns: String columns with 100% cardinality are treated as identifiers; synthetic identifiers are sampled independently.
  2. Categorical & String: Strings with less than 100% cardinality are encoded as categorical values.
  3. Numeric & Boolean: Standard int/float values; True/False columns are handled natively.
  4. Missing values: Handled natively; each synthetic column preserves the fraction of missing values.
  5. Date-time values: Date-time values in common formats without time-zone information are supported natively. Synthetic columns will contain date-time values in the same format.
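
For illustration, here is a small pandas DataFrame that exercises each of the supported column types before upload (column names and values are made up):

import numpy as np
import pandas as pd

# Hypothetical mixed-type table covering the supported column types
df = pd.DataFrame({
    "user_id": [f"u-{i:04d}" for i in range(100)],                  # ID: 100% cardinality strings
    "segment": np.random.choice(["A", "B", "C"], size=100),         # categorical strings
    "income": np.random.normal(50_000, 12_000, size=100),           # numeric (float)
    "is_active": np.random.choice([True, False], size=100),         # boolean
    "signup": pd.date_range("2023-01-01", periods=100, freq="D"),   # date-time, no time zone
})
df.loc[df.sample(frac=0.1).index, "income"] = np.nan                # ~10% missing values

df.to_parquet("mixed_example.parquet")  # ready to send to /generate (requires pyarrow)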

Request & Response Details 
Aspect               Value
Endpoint             /generate (POST)
Auth                 dbTwin API key (header)
Real data            multipart/form-data (CSV, Parquet)
Required headers     rows (int > 0)
Optional headers     algo (core | flagship)
Output format        CSV/Parquet, depending on input format
Output QA headers    Distribution-similarity-error, Association-similarity

Error Codes & How to Fix
HTTP  Message (typical)                                    What it means / How to fix
400   Missing 'rows' header                                Add rows to headers; must be a positive integer.
400   'rows' must be an integer                            Send numeric text (e.g., "5000"), not words.
400   'rows' must be > 0                                   Use a positive value.
400   Missing 'file' in multipart form-data                Ensure you POST multipart/form-data with a file field.
400   Failed to read dbTwin api-key from supplied header   Include the api-key header with a valid key.
413   Payload too large (limit 0.5GB)                      Reduce upload size; split or compress input (Parquet is smaller).
415   Unsupported file type.                               Use .csv or .parquet only.
422   Failed to parse input .csv/.parquet: …               Check file integrity/encoding; can your file be read by pandas?
500   Synthesis failed: dbTwin API needs more than 40 rows and more than 2 columns of real data   Provide a larger/wider dataset.
500   Synthesis failed: …                                  A downstream generation error occurred; try fewer rows, switch algorithms, or clean the data.
500   Failed to update credits for your dbTwin API key     Ensure your api-key is active and has sufficient credits; contact support if the problem persists.
500   Failed to serialize output: …                        Rare; retry generation, or try CSV output if Parquet fails (or vice versa).
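
A minimal error-handling pattern around the quickstart call could look like the sketch below (illustrative only; it reuses url, headers, and files from the quickstart):

# Illustrative error handling around POST /generate
resp = requests.post(url + "/generate", headers=headers, files=files)

if resp.status_code == 200:
    with open("synthetic.csv", "wb") as f:
        f.write(resp.content)
elif resp.status_code in (400, 413, 415, 422):
    # Client-side problems: fix headers, file size, file type, or file contents
    print("Request problem:", resp.json())
else:
    # 5xx: synthesis, credits, or serialization issue; clean the data, request fewer rows, or retry
    print("Server-side error:", resp.json())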

FAQs
 


Q: Does dbTwin preserve my schema? 
Yes—column names and types are preserved in the output (subject to CSV/Parquet typing conventions). 

Q: Can I request more rows than exist in my source file? 
Yes. The synthetic row count is independent of the source size; the real data can be as small as roughly 40 rows (see Behavior & limits). Naturally, the larger your real dataset, the more realistic your synthetic data will be.

Q: How do I measure the quality of my synthetic data?  
dbTwin API returns QA headers (Distribution-similarity-error, Association-similarity). Distribution-similarity-error quantifies the error between the probability distributions of the real and synthetic columns; lower values are better. This error is measured by the total variation distance between each real/synthetic column pair and averaged over columns.

Association-similarity measures the similarity of the correlation matrices computed for the real and synthetic data. Thus, a high value such as 0.98 implies that the inter-column relationships in the real and synthetic data are very similar.
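
For intuition only (this is not dbTwin's exact implementation), the two metrics can be approximated for numeric columns with pandas and numpy along these lines:

import numpy as np
import pandas as pd

def tv_distance(real: pd.Series, synth: pd.Series, bins: int = 20) -> float:
    # Approximate total variation distance between a real and a synthetic numeric column
    edges = np.histogram_bin_edges(pd.concat([real, synth]).dropna(), bins=bins)
    p, _ = np.histogram(real.dropna(), bins=edges)
    q, _ = np.histogram(synth.dropna(), bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * float(np.abs(p - q).sum())

def association_similarity(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    # Illustrative comparison of correlation matrices for numeric columns (not the official formula)
    cols = real.select_dtypes("number").columns
    diff = real[cols].corr() - synth[cols].corr()
    return 1.0 - float(np.abs(diff.values).mean())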

Engineered for any use case, with blazing-fast speed.

© 2025 dbTwin, Inc. All rights reserved.