Upcoming product change: API rate limits effective June 30, 2026
LILT is committed to delivering fast, reliable translation performance to every organization on our platform. To protect that shared experience and make sure no single workload can degrade service for others, we’re adding per-organization rate limits to our document pretranslation and file-translation APIs, beginning June 30, 2026. These endpoints aren’t changing; we’re simply introducing fair-use limits so capacity stays balanced across all our customers.
Overview
Starting June 30, 2026, LILT is introducing per-organization rate limits on the following endpoints:
- `POST /v2/documents/pretranslate`
- `POST /v2/translate/file`
- `POST /v2/documents/files`
Two independent limits apply to every organization:
| Limit | Threshold |
|---|---|
| Request rate | 300 requests per minute |
| Character throughput | 2,500,000 characters per minute |
Requests that exceed either limit receive an HTTP 429 Too Many Requests response. This guide explains how to read the 429 response headers and restructure your integration to stay comfortably within these thresholds.
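It helps to know which limit your workload will hit first, because the two interact. Divide the character allowance by the request allowance: 2,500,000 / 300 ≈ 8,333 characters per request. If your requests average fewer characters than that, the request-rate limit binds first. If they average more, character throughput binds first.

The minimal sketch below makes that arithmetic explicit. It makes no LILT API calls, and the function names are illustrative only.

```python
# Per-organization limits from the table above.
REQUESTS_PER_MIN = 300
CHARS_PER_MIN = 2_500_000


def max_requests_per_minute(avg_chars_per_request: int) -> int:
    """Largest request rate that respects BOTH limits for a given average request size."""
    if avg_chars_per_request <= 0:
        return REQUESTS_PER_MIN
    by_chars = CHARS_PER_MIN // avg_chars_per_request
    return min(REQUESTS_PER_MIN, by_chars)


def binding_limit(avg_chars_per_request: int) -> str:
    """Name the limit you would hit first when sending at full speed."""
    # Break-even point: 2,500,000 / 300 is about 8,333 characters per request.
    if avg_chars_per_request * REQUESTS_PER_MIN > CHARS_PER_MIN:
        return "characters"
    return "requests"
```

Take an integration that averages 20,000 characters per request. It can send at most 125 requests per minute, well below the 300-request ceiling. For that workload, batching alone will not avoid 429s; you also need to spread characters across windows.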
Monitoring mode until June 30
Until June 30, 2026, limits run in monitoring mode only — no requests will be blocked. Use this window to audit your current usage patterns and adopt batching before enforcement begins.
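One way to audit usage during this window is to replay your own request logs against both thresholds. The sketch below assumes a hypothetical log format of `(unix_timestamp, characters_sent)` pairs that your integration records itself. It buckets traffic into fixed wall-clock minutes, which only approximates LILT's one-minute window, and reports every minute that would have exceeded either limit.

```python
from collections import defaultdict
from typing import Iterable

REQUESTS_PER_MIN = 300
CHARS_PER_MIN = 2_500_000


def audit_minutes(records: Iterable[tuple[float, int]]) -> list[tuple[int, int, int]]:
    """Return (minute, request_count, char_count) for each minute over a limit.

    `records` holds (unix_timestamp, characters_sent) pairs from your own
    logs (hypothetical format). Minutes are fixed wall-clock buckets.
    """
    requests_per_min: dict[int, int] = defaultdict(int)
    chars_per_min: dict[int, int] = defaultdict(int)
    for ts, chars in records:
        minute = int(ts // 60)
        requests_per_min[minute] += 1
        chars_per_min[minute] += chars

    over = []
    for minute in sorted(requests_per_min):
        n, c = requests_per_min[minute], chars_per_min[minute]
        # A minute is flagged if it exceeds EITHER limit, since they are independent.
        if n > REQUESTS_PER_MIN or c > CHARS_PER_MIN:
            over.append((minute, n, c))
    return over
```

Any minute this returns is a candidate for batching or staggering before enforcement begins.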
Understanding the 429 Response
When a request is rate-limited, LILT returns a 429 with three headers that describe your current allowance, how much of it remains, and how many seconds until you can safely retry:
| Header | Meaning |
|---|---|
| `X-RateLimit-Limit` | Your per-minute allowance (requests or characters, depending on which limit was hit) |
| `X-RateLimit-Remaining` | Requests or characters still available in the current one-minute window |
| `X-RateLimit-Reset` | Seconds until the current window resets and your full quota is restored |
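The two limits are independent: you can hit the character-throughput ceiling without exhausting your request count, or vice versa. When handling a 429, check `X-RateLimit-Limit` to see which limit was hit. A value of 300 indicates the request-rate limit, and 2,500,000 indicates the character-throughput limit. The error message also names the limit.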
Example 429 response
```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 300
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 38

{
  "error": "rate_limit_exceeded",
  "message": "Request rate limit reached. Retry after 38 seconds."
}
```
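If your client needs to act on these headers in several places, it can help to parse them once into a small structure. The sketch below does that. `RateLimitStatus` and `parse_rate_limit` are hypothetical names, not part of a LILT SDK.

The `kind` property infers which limit was hit from the allowance value, using 300 for requests and 2,500,000 for characters. That mapping is an inference from the header table above, not a documented contract. If your organization has negotiated custom limits, it returns `"unknown"`.

```python
from dataclasses import dataclass
from typing import Mapping

REQUESTS_PER_MIN = 300
CHARS_PER_MIN = 2_500_000


@dataclass
class RateLimitStatus:
    limit: int
    remaining: int
    reset_seconds: int

    @property
    def kind(self) -> str:
        """Guess which limit these headers describe from the allowance value."""
        if self.limit == REQUESTS_PER_MIN:
            return "requests"
        if self.limit == CHARS_PER_MIN:
            return "characters"
        return "unknown"  # e.g. a custom limit agreed with LILT


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitStatus:
    """Read the three X-RateLimit-* headers; missing values fall back to 0."""
    return RateLimitStatus(
        limit=int(headers.get("X-RateLimit-Limit", 0)),
        remaining=int(headers.get("X-RateLimit-Remaining", 0)),
        reset_seconds=int(headers.get("X-RateLimit-Reset", 0)),
    )
```

With the `requests` library you can pass `resp.headers` directly. It is a case-insensitive mapping, which matches how HTTP treats header names. A plain `dict` is case-sensitive.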
Handling a 429: Retry with Back-Off
The simplest fix for an occasional 429 is to wait for the window to reset before retrying. Do not tight-loop or immediately re-send — this wastes quota and keeps triggering 429s.
Recommended retry pattern
On receiving a 429:
- Read the `X-RateLimit-Reset` value from the response headers.
- Sleep for that many seconds (plus a small jitter to avoid synchronized retries across parallel workers).
- Re-send the original request.
Python example
```python
import random
import time

import requests


def pretranslate(payload, headers, max_attempts=5):
    """POST to the pretranslate endpoint, retrying on HTTP 429."""
    url = "https://api.lilt.com/v2/documents/pretranslate"
    for attempt in range(max_attempts):
        resp = requests.post(url, json=payload, headers=headers, timeout=60)
        if resp.status_code == 429:
            # Wait until the window resets, plus jitter so parallel
            # workers don't all retry at the same moment.
            reset_in = int(resp.headers.get("X-RateLimit-Reset", 60))
            wait = reset_in + random.uniform(0.5, 2.0)
            print(f"Rate limited. Retrying in {wait:.1f}s (attempt {attempt + 1}/{max_attempts})")
            time.sleep(wait)
            continue
        resp.raise_for_status()  # surface any other HTTP error
        return resp.json()
    raise RuntimeError("Exceeded retry limit")
```
Node.js example
```javascript
// Uses the built-in fetch API (Node.js 18+).
async function pretranslate(payload, headers, maxAttempts = 5) {
  const url = "https://api.lilt.com/v2/documents/pretranslate";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url, {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
    if (res.status === 429) {
      // Wait for the window to reset, plus 0.5 to 2 s of jitter.
      const resetIn = parseInt(res.headers.get("X-RateLimit-Reset") ?? "60", 10);
      const jitter = Math.random() * 1.5 + 0.5;
      const wait = (resetIn + jitter) * 1000;
      console.log(`Rate limited. Retrying in ${(wait / 1000).toFixed(1)}s (attempt ${attempt + 1}/${maxAttempts})`);
      await new Promise((r) => setTimeout(r, wait));
      continue;
    }
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  }
  throw new Error("Exceeded retry limit");
}
```
Batching Requests
If your integration sends many small, individual translation calls in rapid succession, batching — combining multiple documents or files into fewer API calls — is the most effective way to stay well under the rate limits.
What to batch
Each affected endpoint supports multiple documents or files per call:
| Endpoint | Batching approach |
|---|---|
| `POST /v2/documents/pretranslate` | Pass an array of document IDs in a single request body |
| `POST /v2/translate/file` | Send multiple source files in one `multipart/form-data` request |
| `POST /v2/documents/files` | Upload a batch of files together rather than one at a time |
Character throughput tip: Batching reduces your request count, but each batch's characters still count toward the 2,500,000 characters-per-minute throughput limit. If you're working with very large documents, spread batches across multiple windows rather than sending all characters at once.
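Spreading batches across windows amounts to a simple scheduling problem: assign batches, in order, to consecutive one-minute windows without letting any window's character total exceed a budget. A minimal greedy sketch follows.

It assumes you already know (or can estimate) each batch's character count. `schedule_windows` is a hypothetical helper, and its default budget of 2,000,000 is an illustrative safety margin below the 2,500,000 limit, not a LILT recommendation.

```python
def schedule_windows(batch_chars: list[int], budget: int = 2_000_000) -> list[list[int]]:
    """Assign batches (by index) to consecutive one-minute windows.

    Greedy and order-preserving: a new window starts whenever the next
    batch would push the current window's total above `budget`.
    """
    windows: list[list[int]] = []
    current: list[int] = []
    used = 0
    for i, chars in enumerate(batch_chars):
        if chars > budget:
            raise ValueError(f"batch {i} ({chars} chars) exceeds the per-window budget; split it first")
        if current and used + chars > budget:
            windows.append(current)  # close the current window
            current, used = [], 0
        current.append(i)
        used += chars
    if current:
        windows.append(current)
    return windows
```

To use the result, submit the batches in each window, then wait about a minute before starting the next window.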
Sizing your batches
There is no fixed rule for batch size — it depends on your document sizes and submission cadence. Use these guidelines as a starting point:
- Keep each batch well under 2,500,000 characters to leave headroom for concurrent jobs from other parts of your organization.
- If `X-RateLimit-Remaining` is consistently close to zero, reduce batch frequency or split large batches into smaller ones with a brief pause between them.
- For burst workloads (e.g., end-of-sprint file exports), schedule submissions in staggered windows rather than all at once.
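The sizing guidelines above can be turned into a packing step. Group documents into batches that each stay under a character cap and a document-count cap. The sketch below uses first-fit decreasing, a standard bin-packing heuristic that places the largest documents first. It assumes you have a per-document character count from your own records.

`pack_batches` is a hypothetical name. Its defaults (1,000,000 characters, 50 documents) are illustrative values chosen to sit "well under" the 2,500,000 limit.

```python
def pack_batches(docs: list[tuple[int, int]], max_chars: int = 1_000_000,
                 max_docs: int = 50) -> list[list[int]]:
    """Group (doc_id, char_count) pairs into batches of document IDs.

    First-fit decreasing: documents are sorted largest first, and each goes
    into the first batch with room for both its characters and one more doc.
    """
    batches: list[list[int]] = []  # document IDs per batch
    totals: list[int] = []         # character count per batch
    for doc_id, chars in sorted(docs, key=lambda d: d[1], reverse=True):
        for b, ids in enumerate(batches):
            if len(ids) < max_docs and totals[b] + chars <= max_chars:
                ids.append(doc_id)
                totals[b] += chars
                break
        else:
            # No existing batch has room (or the document alone exceeds
            # max_chars): start a new batch for it.
            batches.append([doc_id])
            totals.append(chars)
    return batches
```

Each resulting list of IDs can be sent as one `{'id': [...]}` body to `POST /v2/documents/pretranslate`.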
Pretranslation: batching document IDs
The POST /v2/documents/pretranslate endpoint accepts an array of document IDs. Instead of issuing one request per document, collect IDs and submit them together:
```python
import requests

BASE_URL = 'https://api.lilt.com'
# `document_ids` and `headers` are assumed to be defined elsewhere.

# ✗ One request per document (inefficient)
for doc_id in document_ids:
    requests.post(f'{BASE_URL}/v2/documents/pretranslate', json={'id': [doc_id]}, headers=headers)

# ✓ All documents in a single request (efficient)
requests.post(
    f'{BASE_URL}/v2/documents/pretranslate',
    json={'id': document_ids},  # e.g. [1001, 1002, 1003, ...]
    headers=headers,
)
```
If you have hundreds of documents to pretranslate, split them into chunks and submit one chunk per window:
```python
import time

CHUNK_SIZE = 50  # documents per request
WINDOW_SEC = 62  # slightly more than 60 s, so each chunk lands in a fresh window


def batch_pretranslate(doc_ids, headers):
    """Submit document IDs in chunks, one chunk per rate-limit window."""
    chunks = [doc_ids[i:i + CHUNK_SIZE] for i in range(0, len(doc_ids), CHUNK_SIZE)]
    for i, chunk in enumerate(chunks):
        print(f'Submitting chunk {i + 1}/{len(chunks)} ({len(chunk)} docs)')
        pretranslate({'id': chunk}, headers)  # retry helper from the Python example above
        if i < len(chunks) - 1:
            time.sleep(WINDOW_SEC)  # no pause needed after the last chunk
```
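This pacing is very light on request count, so it helps to estimate how long a backlog will take. With the constants above, 500 documents become only 10 requests, far below 300 per minute. Even so, the run spends 9 × 62 = 558 seconds (about 9.3 minutes) pausing between chunks.

The pause mainly protects the character-throughput limit. If your documents are small, you may be able to raise `CHUNK_SIZE` or shorten `WINDOW_SEC`. The helper below, `estimate_chunked_run`, is hypothetical and simply reproduces that arithmetic.

```python
import math


def estimate_chunked_run(n_docs: int, chunk_size: int = 50, window_sec: float = 62) -> tuple[int, float]:
    """Return (number_of_requests, seconds_spent_pausing) for one-chunk-per-window pacing."""
    chunks = math.ceil(n_docs / chunk_size)
    pauses = max(chunks - 1, 0)  # no pause after the final chunk
    return chunks, pauses * window_sec
```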
File translation: batching with multipart uploads
The POST /v2/translate/file and POST /v2/documents/files endpoints accept multiple files in a single multipart/form-data request. Bundle related files together to minimize your request count:
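```python
import os
from contextlib import ExitStack

import requests


def upload_files(file_paths, memory_id, source_lang, target_lang, headers):
    """Upload several files in a single multipart/form-data request."""
    with ExitStack() as stack:
        # Register every open file with the stack so all handles are closed afterwards.
        files = [
            ('file', (os.path.basename(path),
                      stack.enter_context(open(path, 'rb')),
                      'application/octet-stream'))
            for path in file_paths
        ]
        data = {
            'memory_id': memory_id,
            'source_lang': source_lang,
            'target_lang': target_lang,
        }
        resp = requests.post(
            'https://api.lilt.com/v2/documents/files',
            files=files, data=data, headers=headers,
        )
    resp.raise_for_status()
    return resp.json()
```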
Proactive Throttling
Rather than reacting to 429s, you can read X-RateLimit-Remaining and X-RateLimit-Reset on every successful response and slow down before you hit the ceiling.
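```python
import time


def check_and_throttle(response, low_watermark=20):
    """Pause proactively when remaining quota is low.

    low_watermark is in the same unit as X-RateLimit-Remaining
    (requests or characters, depending on the limit reported).
    """
    # Missing headers are treated as "plenty left" and "no wait".
    remaining = int(response.headers.get("X-RateLimit-Remaining", 999))
    reset_in = int(response.headers.get("X-RateLimit-Reset", 0))
    if remaining <= low_watermark and reset_in > 0:
        print(f'Quota low ({remaining} left). Pausing {reset_in}s.')
        time.sleep(reset_in + 1)
```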
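Reading headers tells you about your quota only after a response arrives. A client-side sliding-window limiter complements this by delaying a request before it is sent. It tracks how many requests and characters the process has sent in the last 60 seconds.

The class below is a generic technique, not part of any LILT SDK, so `SlidingWindowLimiter`, `record`, and `acquire` are hypothetical names. It makes three assumptions:

- The character count you pass in is your own estimate of what LILT will count.
- A sliding 60-second window is a reasonable stand-in for LILT's one-minute window.
- Each process sees only its own traffic. The limits are per organization, so give each worker a share of the budget if several run at once.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Track requests and characters sent in the last `window` seconds."""

    def __init__(self, max_requests: int = 300, max_chars: int = 2_500_000,
                 window: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.max_chars = max_chars
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent: deque[tuple[float, int]] = deque()  # (timestamp, chars), oldest first
        self.chars_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop entries that have aged out of the window.
        while self.sent and now - self.sent[0][0] >= self.window:
            _, chars = self.sent.popleft()
            self.chars_in_window -= chars

    def _fits(self, count: int, total: int, chars: int) -> bool:
        return count < self.max_requests and total + chars <= self.max_chars

    def wait_time(self, chars: int) -> float:
        """Seconds until a request of `chars` characters fits both limits (0.0 if now)."""
        if chars > self.max_chars:
            raise ValueError("a single request exceeds the character limit")
        now = self.clock()
        self._evict(now)
        count, total = len(self.sent), self.chars_in_window
        if self._fits(count, total, chars):
            return 0.0
        # Simulate entries expiring oldest first until the request fits.
        for ts, c in self.sent:
            count -= 1
            total -= c
            if self._fits(count, total, chars):
                return ts + self.window - now
        return self.window  # not reached when chars <= max_chars

    def record(self, chars: int) -> None:
        """Record a request that is being sent now."""
        now = self.clock()
        self._evict(now)
        self.sent.append((now, chars))
        self.chars_in_window += chars

    def acquire(self, chars: int) -> None:
        """Block until the request fits, then record it."""
        delay = self.wait_time(chars)
        if delay > 0:
            time.sleep(delay)
        self.record(chars)
```

In an integration, you would call `limiter.acquire(estimated_chars)` immediately before each POST. Keep `check_and_throttle` (or the 429 retry helper) as a backstop for traffic the limiter cannot see.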
Pre-Launch Checklist
Before June 30, verify that your integration:
- Sends arrays of document IDs to `/v2/documents/pretranslate` rather than one ID at a time
- Groups related files into a single `multipart/form-data` request where possible
- Handles HTTP 429 by sleeping for `X-RateLimit-Reset` seconds (plus jitter) before retrying
- Does not tight-loop on 429 responses
- Monitors `X-RateLimit-Remaining` and throttles proactively when quota is low
- Schedules large burst workloads across multiple one-minute windows
Need More Headroom?
The default thresholds are designed to sit well above typical usage patterns. If your workload genuinely requires higher limits, reply to the rate-limits notification email and the LILT team will work with you to find a configuration that fits your needs without impacting the shared platform.
Contact support: Reach out via your rate-limits notification email, or contact LILT support at support.lilt.com. Please include your organization ID and a brief description of your workload volume when you get in touch.