Welcome! This guide is the fastest way to go from “what is dbTwin?” to “I’m generating high-quality synthetic data in seconds.” It covers concepts, endpoints, request/response shapes, limits, examples, and common errors—everything a new user needs.
What is dbTwin?
dbTwin generates privacy-preserving synthetic tabular data that retains the statistical properties and column-to-column relationships of your real data—without exposing sensitive records.
30× faster than deep-learning alternatives
No training, no GPUs: use our API or generate in the cloud in minutes
Two algorithms: Core and Flagship. Use Core for lightweight, quick applications and Flagship when hyperrealism and high fidelity are needed. Flagship is ideal for machine-learning and analytics use cases.
Cost: Dev-tier accounts get 30 FREE credits per month. 1 credit ≈ 10,000 Core records or 5,000 Flagship records, so the free tier covers roughly 300,000 Core or 150,000 Flagship records per month.
Base URL
https://api.dbtwin.com
Endpoints
GET /health
Returns a simple OK so you can quickly check service liveness.
Use as: https://api.dbtwin.com/health
Response: 200 with JSON {"status":"ok"}.
POST /generate
Create a synthetic version of your uploaded table.
Request
Headers
api-key (required): your dbTwin API key
rows (required): positive integer, requested synthetic row count
algo (optional): core (default) or flagship
Body: multipart/form-data with exactly one file field
Supported file types: .csv or .parquet
Request size guard: rejects overly large payloads with a message that references a 0.5GB limit (see Errors).
Quickstart (Python)
import pandas as pd
import requests
from io import BytesIO

key_ = 'YOUR_API_KEY'
num_rows = 10000
file_url = "https://media.githubusercontent.com/media/dbTwin-Inc/public_demo/refs/heads/main/adult_income.csv"

# Fetch sample input
r = requests.get(file_url)
file_obj = BytesIO(r.content)
file_obj.seek(0)

files = {"file": ("adult_income.csv", file_obj)}
# HTTP header values must be strings, so cast the row count
headers = {"rows": str(num_rows), "algo": "flagship", "api-key": key_}
url = 'https://api.dbtwin.com'

# Check health using a GET request to /health
print(requests.get(url + "/health"))  # should be HTTP 200

# POST request to /generate
resp = requests.post(url + "/generate", headers=headers, files=files)
if resp.status_code == 200:  # on success, save the file locally
    with open("synthetic.csv", "wb") as f:
        f.write(resp.content)
    df_synth = pd.read_csv("synthetic.csv")  # load synthetic data
    print(df_synth.head())
    print(resp.headers['distribution-similarity-error'])  # QA metric 1
    print(resp.headers['association-similarity'])  # QA metric 2
else:
    print(resp.json())  # print the error if generation is unsuccessful
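The same request works with Parquet input; because the output format mirrors the input, the response body will then be a synthetic .parquet file. A minimal sketch, assuming you already have a pandas DataFrame df_real in memory and pyarrow (or fastparquet) installed for Parquet I/O:

import pandas as pd
import requests
from io import BytesIO

df_real = pd.read_csv("your_data.csv")  # illustrative source table

# Serialize to an in-memory Parquet buffer (no temporary file needed)
buf = BytesIO()
df_real.to_parquet(buf, index=False)
buf.seek(0)

files = {"file": ("your_data.parquet", buf)}
headers = {"rows": "10000", "algo": "core", "api-key": "YOUR_API_KEY"}
resp = requests.post("https://api.dbtwin.com/generate", headers=headers, files=files)

if resp.status_code == 200:
    # Output format matches the input format, so read the body back as Parquet
    df_synth = pd.read_parquet(BytesIO(resp.content))
    print(df_synth.head())
else:
    print(resp.json())

For large tables, Parquet's compact encoding also helps keep the upload under the 0.5GB limit described below.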
cURL example
curl -X POST "https://api.dbtwin.com/generate" \
-H "api-key: YOUR_API_KEY" \
-H "rows: 50000" \
-H "algo: flagship" \
-F "file=@/path/to/your_data.csv" \
-o synthetic.csv -D headers.txt
Output is written to synthetic.csv
Response headers (quality metrics) are captured in headers.txt
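To pull the QA metrics out of headers.txt programmatically, a small sketch (header names as documented below):

# Extract the two QA headers captured by curl's -D flag
with open("headers.txt") as f:
    for line in f:
        if line.lower().startswith(("distribution-similarity-error", "association-similarity")):
            print(line.strip())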
Behavior & limits
Files larger than 0.5GB are not allowed. For large files, we highly recommend Parquet because of its storage efficiency.
Input data must have more than 40 rows and more than 2 columns (after parsing), or synthesis fails; see the pre-flight check below.
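You can validate both limits client-side before spending a request. A minimal sketch for a local CSV (the file path is illustrative):

import os
import pandas as pd

path = "your_data.csv"  # illustrative path
# Guard against the 0.5GB upload limit
assert os.path.getsize(path) <= 0.5 * 1024**3, "file exceeds the 0.5GB limit"

df = pd.read_csv(path)
n_rows, n_cols = df.shape
assert n_rows > 40, "dbTwin needs more than 40 rows of real data"
assert n_cols > 2, "dbTwin needs more than 2 columns of real data"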
Response
Status: 200 OK on success
File: same format as input
Input CSV files return a synthetic .csv file (and likewise for .parquet)
Synthetic data QA: the dbTwin API returns two quality headers (Distribution-similarity-error, Association-similarity). Distribution-similarity-error quantifies the error between the probability distributions of the real and synthetic columns. Association-similarity measures the similarity of the correlation matrices computed for real and synthetic data.
High-fidelity synthetic data is characterized by low Distribution-similarity-error values (typically less than 0.1) and high Association-similarity values (typically between 0.9 and 1).
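Using the resp object from the Python quickstart, you can turn that rule of thumb into a simple quality gate. The thresholds below are the guideline values above, not hard API limits:

# Rule-of-thumb quality gate on the two QA headers
dist_err = float(resp.headers["distribution-similarity-error"])
assoc_sim = float(resp.headers["association-similarity"])
if dist_err < 0.1 and assoc_sim >= 0.9:
    print("High-fidelity synthetic data")
else:
    print(f"Review quality: error={dist_err:.3f}, similarity={assoc_sim:.3f}")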
Credits & billing:
After synthetic data generation, your dbTwin credits are updated. If your dbTwin API key is invalid or you lack sufficient credits, the request fails with an error.
Mixed-type data models
Internally, dbTwin auto-detects common column types and handles them appropriately (see the example after this list). Currently, dbTwin supports:
ID columns
String columns with 100% cardinality are treated as identifiers; synthetic identifiers are sampled independently.
Categorical & String: strings with less than 100% cardinality are encoded as categorical values.
Numeric & Boolean: standard int/float values; True/False columns are handled natively.
Missing values are handled natively. Each synthetic column preserves the fraction of missing values.
Date-time values: date-times in common formats without time-zone information are supported natively. Synthetic columns will contain date-time values in the same format.
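To see the type handling end to end, you can build and synthesize a small mixed-type table. A sketch with one column of each kind (the column names are illustrative):

import numpy as np
import pandas as pd

n = 100  # comfortably above the row minimum
df = pd.DataFrame({
    "user_id": [f"id_{i}" for i in range(n)],           # 100% cardinality -> treated as an ID column
    "segment": np.random.choice(["a", "b", "c"], n),    # low cardinality -> categorical
    "amount": np.random.normal(100.0, 15.0, n),         # numeric (float)
    "active": np.random.choice([True, False], n),       # boolean
    "signup": pd.date_range("2024-01-01", periods=n),   # date-time without a time zone
})
df.loc[df.sample(frac=0.1).index, "amount"] = np.nan    # missing values; their fraction is preserved
df.to_csv("mixed_types.csv", index=False)               # ready to upload to /generate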
Request & Response Details
| Aspect | Value |
| --- | --- |
| Endpoint | /generate (POST) |
| Auth | dbTwin API key (api-key header) |
| Real data | multipart/form-data (CSV, Parquet) |
| Required headers | rows (int > 0) |
| Optional headers | algo (core or flagship; default core) |
| Output format | CSV/Parquet, matching the input format |
| Output QA headers | Distribution-similarity-error, Association-similarity |
Error Codes & How to Fix
| HTTP | Message (typical) | What it means / How to fix |
| --- | --- | --- |
| 400 | Missing 'rows' header | Add rows to headers; must be a positive integer. |
| 400 | 'rows' must be an integer | Send numeric text (e.g., "5000"), not words. |
| 400 | 'rows' must be > 0 | Use a positive value. |
| 400 | Missing 'file' in multipart form-data | Ensure you POST multipart/form-data with a file field. |
| 400 | Failed to read dbTwin api-key from supplied header | Include the api-key header with a valid key. |
| 413 | Payload too large (limit 0.5GB) | Reduce upload size; split or compress input (Parquet is smaller). |
| 415 | Unsupported file type. | Use .csv or .parquet only. |
| 422 | Failed to parse input .csv/.parquet: … | Check file integrity/encoding; can pandas read your file? |
| 500 | Synthesis failed: dbTwin API needs more than 40 rows and more than 2 columns of real data | Provide a larger/wider dataset. |
| 500 | Synthesis failed: … | A downstream generation error occurred; try fewer rows, switch algorithms, or clean the data. |
| 500 | Failed to update credits for your dbTwin API key | Ensure your api-key is active and has sufficient credits; contact support if the problem persists. |
| 500 | Failed to serialize output: … | Rare; retry generation, or try CSV input if Parquet fails (or vice versa). |
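A small wrapper can map these codes to clearer exceptions in client code. A sketch (not an official client; the generate function name is ours):

import requests

def generate(url, headers, files):
    """POST to /generate and translate common failure codes."""
    resp = requests.post(url + "/generate", headers=headers, files=files)
    if resp.status_code == 200:
        return resp.content  # synthetic file bytes, same format as the input
    if resp.status_code == 413:
        raise ValueError("Payload too large: keep uploads under 0.5GB (try Parquet)")
    if resp.status_code == 415:
        raise ValueError("Unsupported file type: use .csv or .parquet")
    # 400 / 422 / 500: surface the API's JSON error message
    raise RuntimeError(f"dbTwin error {resp.status_code}: {resp.json()}")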
FAQs
Q: Does dbTwin preserve my schema?
Yes—column names and types are preserved in the output (subject to CSV/Parquet typing conventions).
Q: Can I request more rows than exist in my source file?
Yes. Synthetic size is independent of source size, and the real data can be as small as just over 40 rows. Naturally, the larger your real dataset, the more realistic your synthetic data will be.
Q: How do I measure the quality of my synthetic data?
The dbTwin API returns QA headers (Distribution-similarity-error, Association-similarity). Distribution-similarity-error quantifies the error between the probability distributions of the real and synthetic columns; lower values are better. This error is measured by the Total Variation Distance for each real/synthetic column pair and averaged over columns.
Association-similarity measures the similarity of the correlation matrices computed for real and synthetic data. Thus, a high value such as 0.98 implies that the inter-column relationships in the real and synthetic data are very similar.
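As a rough local sanity check (not dbTwin's exact metric), you can compare the correlation matrices of the numeric columns yourself:

import numpy as np
import pandas as pd

def corr_gap(df_real: pd.DataFrame, df_synth: pd.DataFrame) -> float:
    # Mean absolute difference between the correlation matrices of the
    # numeric columns (a local analogue, NOT the API's Association-similarity)
    num_cols = df_real.select_dtypes("number").columns
    c_real = df_real[num_cols].corr().to_numpy()
    c_synth = df_synth[num_cols].corr().to_numpy()
    return float(np.abs(c_real - c_synth).mean())

Values close to 0 indicate that inter-column relationships were preserved.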
© 2025 dbTwin, Inc. All rights reserved.