Upcoming product change: API rate limits effective June 30, 2026

LILT is committed to delivering fast, reliable translation performance to every organization on our platform. To protect that shared experience and make sure no single workload can degrade service for others, we’re adding per-organization rate limits to our document pretranslation and file-translation APIs, beginning June 30, 2026. These endpoints aren’t changing; we’re simply introducing fair-use limits so capacity stays balanced across all our customers.

Overview

Starting June 30, 2026, LILT is introducing per-organization rate limits on the following endpoints:
  • POST /v2/documents/pretranslate
  • POST /v2/translate/file
  • POST /v2/documents/files
Two independent limits apply to every organization:
| Limit | Threshold |
| --- | --- |
| Request rate | 300 requests per minute |
| Character throughput | 2,500,000 characters per minute |
Requests that exceed either limit receive an HTTP 429 Too Many Requests response. This guide explains how to read the 429 response headers and restructure your integration to stay comfortably within these thresholds.

Monitoring mode until June 30
Until June 30, 2026, limits run in monitoring mode only; no requests will be blocked. Use this window to audit your current usage patterns and adopt batching before enforcement begins.
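
To make the two thresholds concrete, the sketch below estimates the fewest one-minute windows a planned workload needs under each limit. The function name `min_windows` and its inputs are illustrative and not part of the LILT API. It also assumes requests and characters can be spread evenly across windows, which a real workload may not allow.

```python
import math

REQUESTS_PER_MINUTE = 300        # request-rate threshold listed above
CHARS_PER_MINUTE = 2_500_000     # character-throughput threshold listed above

def min_windows(num_requests: int, total_chars: int) -> int:
    """Return the fewest one-minute windows that satisfy both limits.

    Assumes the workload can be spread evenly across windows.
    """
    by_requests = math.ceil(num_requests / REQUESTS_PER_MINUTE)
    by_chars = math.ceil(total_chars / CHARS_PER_MINUTE)
    return max(by_requests, by_chars, 1)

# 900 small requests: bounded by the request-rate limit.
print(min_windows(900, 1_000_000))    # 3
# 40 large requests: bounded by character throughput.
print(min_windows(40, 6_000_000))     # 3
```

Because the limits are independent, the larger of the two estimates decides how far a workload has to be spread out.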

Understanding the 429 Response

When a request is rate-limited, LILT returns a 429 with three headers that tell you exactly when you can safely retry:
| Header | Meaning |
| --- | --- |
| X-RateLimit-Limit | Your per-minute allowance (requests or characters, depending on which limit was hit) |
| X-RateLimit-Remaining | Requests or characters still available in the current one-minute window |
| X-RateLimit-Reset | Seconds until the current window resets and your full quota is restored |
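The two limits are independent. You can hit the character throughput ceiling without exhausting your request count, or vice versa. When handling a 429, check X-RateLimit-Limit to see which limit you hit, since it reports the allowance of whichever limit was exceeded, and use X-RateLimit-Reset to decide how long to wait.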
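
One way to act on this is to parse the three headers into a small structure and infer which limit was hit. The sketch below is hypothetical: `RateLimitInfo` and `parse_rate_limit` are illustrative names, and the inference assumes your organization uses the default thresholds (300 requests, 2,500,000 characters). It takes a plain dict, so header names must match exactly. A `requests` response's `headers` mapping is case-insensitive and can be passed in directly.

```python
from dataclasses import dataclass
from typing import Mapping

REQUEST_LIMIT = 300          # default requests per minute
CHAR_LIMIT = 2_500_000       # default characters per minute

@dataclass
class RateLimitInfo:
    limit: int
    remaining: int
    reset_seconds: int
    kind: str   # "requests", "characters", or "unknown"

def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitInfo:
    """Read the three X-RateLimit-* headers from a response.

    Assumption: the limit that was hit is inferred by comparing
    X-RateLimit-Limit with the default thresholds; organizations with
    custom limits would need different constants.
    """
    limit = int(headers.get("X-RateLimit-Limit", 0))
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    reset_seconds = int(headers.get("X-RateLimit-Reset", 60))
    if limit == REQUEST_LIMIT:
        kind = "requests"
    elif limit == CHAR_LIMIT:
        kind = "characters"
    else:
        kind = "unknown"
    return RateLimitInfo(limit, remaining, reset_seconds, kind)

info = parse_rate_limit({
    "X-RateLimit-Limit": "300",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "38",
})
print(info.kind, info.reset_seconds)   # requests 38
```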

Example 429 response

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 300
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 38

{
  "error": "rate_limit_exceeded",
  "message": "Request rate limit reached. Retry after 38 seconds."
}

Handling a 429: Retry with Back-Off

The simplest fix for an occasional 429 is to wait for the window to reset before retrying. Do not tight-loop or immediately re-send; doing so wastes quota and keeps triggering 429s.

On receiving a 429:
  1. Read the X-RateLimit-Reset value from the response headers.
  2. Sleep for that many seconds (plus a small jitter to avoid synchronized retries across parallel workers).
  3. Re-send the original request.

Python example

import time, random, requests

def pretranslate(payload, headers):
    url = "https://api.lilt.com/v2/documents/pretranslate"
    for attempt in range(5):
        resp = requests.post(url, json=payload, headers=headers)
        if resp.status_code == 429:
            reset_in = int(resp.headers.get("X-RateLimit-Reset", 60))
            jitter   = random.uniform(0.5, 2.0)
            wait     = reset_in + jitter
            print(f"Rate limited. Retrying in {wait:.1f}s (attempt {attempt+1}/5)")
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Exceeded retry limit")

Node.js example

async function pretranslate(payload, headers) {
  const url = "https://api.lilt.com/v2/documents/pretranslate";
  for (let attempt = 0; attempt < 5; attempt++) {
    const res = await fetch(url, {
      method: "POST",
      headers: { ...headers, 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
    if (res.status === 429) {
      const resetIn = parseInt(res.headers.get("X-RateLimit-Reset") ?? "60", 10);
      const jitter  = Math.random() * 1.5 + 0.5;
      const wait    = (resetIn + jitter) * 1000;
      console.log(`Rate limited. Retrying in ${(wait/1000).toFixed(1)}s`);
      await new Promise(r => setTimeout(r, wait));
      continue;
    }
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  }
  throw new Error("Exceeded retry limit");
}

Batching Requests

If your integration sends many small, individual translation calls in rapid succession, batching — combining multiple documents or files into fewer API calls — is the most effective way to stay well under the rate limits.

What to batch

Each affected endpoint supports multiple documents or files per call:
| Endpoint | Batching approach |
| --- | --- |
| POST /v2/documents/pretranslate | Pass an array of document IDs in a single request body |
| POST /v2/translate/file | Send multiple source files in one multipart/form-data request |
| POST /v2/documents/files | Upload a batch of files together rather than one at a time |

Character throughput tip
Batching reduces your request count, but each batch’s characters still count toward the 2,500,000 character-per-minute throughput limit. If you’re working with very large documents, spread batches across multiple windows rather than sending all characters at once.

Sizing your batches

There is no fixed rule for batch size — it depends on your document sizes and submission cadence. Use these guidelines as a starting point:
  • Keep each batch well under 2,500,000 characters to leave headroom for concurrent jobs from other parts of your organization.
  • If X-RateLimit-Remaining is consistently close to zero, reduce batch frequency or split large batches into smaller ones with a brief pause between them.
  • For burst workloads (e.g., end-of-sprint file exports), schedule submissions in staggered windows rather than all at once.
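
The guidelines above can be sketched as a simple greedy planner that groups document IDs into batches under a character budget. This is an illustration only: `plan_batches` is a hypothetical helper, it assumes you already know each document's character count, and the 1,000,000-character and 50-document caps are arbitrary headroom values, not limits imposed by LILT.

```python
from typing import Dict, List

def plan_batches(doc_chars: Dict[int, int], max_chars: int = 1_000_000,
                 max_docs: int = 50) -> List[List[int]]:
    """Group document IDs into batches, preserving input order.

    A batch is closed when adding the next document would exceed
    max_chars or max_docs. A single document larger than max_chars
    gets a batch of its own.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_chars = 0
    for doc_id, chars in doc_chars.items():
        if current and (current_chars + chars > max_chars or len(current) >= max_docs):
            batches.append(current)
            current, current_chars = [], 0
        current.append(doc_id)
        current_chars += chars
    if current:
        batches.append(current)
    return batches

# Hypothetical document IDs mapped to their character counts.
sizes = {1001: 400_000, 1002: 500_000, 1003: 300_000, 1004: 1_200_000, 1005: 50_000}
print(plan_batches(sizes))
# [[1001, 1002], [1003], [1004], [1005]]
```

Each resulting batch can then be submitted in its own request, with pauses between batches when their combined characters approach the per-minute throughput limit.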

Pretranslation: batching document IDs

The POST /v2/documents/pretranslate endpoint accepts an array of document IDs. Instead of issuing one request per document, collect IDs and submit them together:
import requests

URL = 'https://api.lilt.com/v2/documents/pretranslate'

# ✗  One request per document (inefficient)
for doc_id in document_ids:
    requests.post(URL, json={'id': [doc_id]}, headers=headers)

# ✓  All documents in a single request (efficient)
requests.post(
    URL,
    json={'id': document_ids},   # e.g. [1001, 1002, 1003, ...]
    headers=headers,
)
If you have hundreds of documents to pretranslate, split them into chunks and submit one chunk per window:
import time

CHUNK_SIZE = 50   # documents per request
WINDOW_SEC = 62   # slightly more than 60 s to be safe

def batch_pretranslate(doc_ids, headers):
    chunks = [doc_ids[i:i+CHUNK_SIZE] for i in range(0, len(doc_ids), CHUNK_SIZE)]
    for i, chunk in enumerate(chunks):
        print(f'Submitting chunk {i+1}/{len(chunks)} ({len(chunk)} docs)')
        pretranslate({'id': chunk}, headers)   # uses retry helper above
        if i < len(chunks) - 1:
            time.sleep(WINDOW_SEC)

File translation: batching with multipart uploads

The POST /v2/translate/file and POST /v2/documents/files endpoints accept multiple files in a single multipart/form-data request. Bundle related files together to minimize your request count:
import os
from contextlib import ExitStack

import requests

def upload_files(file_paths, memory_id, source_lang, target_lang, headers):
    data = {
        'memory_id':   memory_id,
        'source_lang': source_lang,
        'target_lang': target_lang,
    }
    # ExitStack closes every opened file once the upload finishes or fails.
    with ExitStack() as stack:
        files = [
            ('file', (os.path.basename(path),
                      stack.enter_context(open(path, 'rb')),
                      'application/octet-stream'))
            for path in file_paths
        ]
        resp = requests.post(
            'https://api.lilt.com/v2/documents/files',
            files=files, data=data, headers=headers,
        )
    resp.raise_for_status()
    return resp.json()

Proactive Throttling

Rather than reacting to 429s, you can read X-RateLimit-Remaining and X-RateLimit-Reset on every successful response and slow down before you hit the ceiling.
import time

def check_and_throttle(response, low_watermark=20):
    """Pause proactively when remaining quota is low."""
    remaining = int(response.headers.get("X-RateLimit-Remaining", 999))
    reset_in  = int(response.headers.get("X-RateLimit-Reset", 0))
    if remaining <= low_watermark and reset_in > 0:
        print(f'Quota low ({remaining} left). Pausing {reset_in}s.')
        time.sleep(reset_in + 1)
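
Header-based throttling can be complemented by tracking your own usage locally. The sketch below keeps a client-side estimate of requests and characters sent in the trailing 60 seconds and reports how long to wait before the next request fits. `LocalBudget` is a hypothetical helper, not part of any LILT SDK, and its window accounting is an assumption that may not match how the server counts, so the X-RateLimit-* headers remain authoritative.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple

class LocalBudget:
    """Client-side estimate of usage over the trailing 60 seconds.

    This is a local approximation only; the server's own accounting
    (reported via the X-RateLimit-* headers) remains authoritative.
    """

    def __init__(self, max_requests: int = 300, max_chars: int = 2_500_000,
                 window: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.max_chars = max_chars
        self.window = window
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()   # (timestamp, chars)

    def _expire(self, now: float) -> None:
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def wait_time(self, chars: int) -> float:
        """Seconds to wait before a request of `chars` characters fits."""
        now = self.clock()
        self._expire(now)
        used_chars = sum(c for _, c in self.events)
        count = len(self.events)
        wait = 0.0
        # Walk the oldest events until enough budget would be freed.
        for ts, c in self.events:
            if count < self.max_requests and used_chars + chars <= self.max_chars:
                break
            wait = ts + self.window - now
            count -= 1
            used_chars -= c
        return max(wait, 0.0)

    def record(self, chars: int) -> None:
        """Note that a request of `chars` characters was just sent."""
        self.events.append((self.clock(), chars))

budget = LocalBudget()
delay = budget.wait_time(chars=120_000)   # character count of the next batch
if delay > 0:
    time.sleep(delay)
# ... send the request here, then record it:
budget.record(120_000)
```

The injectable `clock` argument exists only to make the class easy to test. In production the default monotonic clock is sufficient.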

Pre-Launch Checklist

Before June 30, verify that your integration:
  • Sends arrays of document IDs to /v2/documents/pretranslate rather than one ID at a time
  • Groups related files into a single multipart/form-data request where possible
  • Handles HTTP 429 by sleeping for X-RateLimit-Reset seconds (plus jitter) before retrying
  • Does not tight-loop on 429 responses
  • Monitors X-RateLimit-Remaining and throttles proactively when quota is low
  • Schedules large burst workloads across multiple one-minute windows

Need More Headroom?

The default thresholds are designed to sit well above typical usage patterns. If your workload genuinely requires higher limits, reply to the rate-limits notification email and the LILT team will work with you to find a configuration that fits your needs without impacting the shared platform.
Contact support
Reach out via your rate-limits notification email, or contact LILT support at support.lilt.com. Please include your organization ID and a brief description of your workload volume when you get in touch.