Configuring the Internal OCR Provider

Options

Value	Engine	Description
`tesseract`	Tesseract	Traditional OCR using the Tesseract engine bundled in the service container. Requires a local `tessdata` directory.
`neural`	Neural Multimodal OCR	Uses the neural multimodal OCR service for higher-accuracy extraction, particularly on complex layouts and non-Latin scripts. Requires the neural OCR service to be deployed.

Configuration

Helm values

Set the flag in your install_dir/lilt/environments/lilt/values.yaml file under the file-job service configuration:

      internalOcrProvider: NEURAL   # or TESSERACT
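
As a pre-deployment sanity check, the flag can be read back from the values file and compared against the two documented options. The following is a minimal sketch, not part of the product: the function name `read_ocr_provider` is hypothetical, it scans the file text with a regular expression instead of parsing YAML, and treating the value as case-insensitive is an assumption.

```python
import re

# Values documented on this page; accepting either letter case is an assumption.
ALLOWED_PROVIDERS = {"TESSERACT", "NEURAL"}

# Matches a line such as "      internalOcrProvider: NEURAL   # or TESSERACT".
_PROVIDER_LINE = re.compile(
    r"^[ \t]*internalOcrProvider:[ \t]*([A-Za-z_]+)[ \t]*(?:#.*)?$",
    re.MULTILINE,
)


def read_ocr_provider(values_text: str) -> str:
    """Return the configured provider in upper case, or raise ValueError."""
    match = _PROVIDER_LINE.search(values_text)
    if match is None:
        raise ValueError("internalOcrProvider is not set")
    provider = match.group(1).upper()
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported internalOcrProvider: {provider}")
    return provider
```

Passing the contents of your values.yaml file to `read_ocr_provider` either returns `"TESSERACT"` or `"NEURAL"` or raises a `ValueError` naming the problem.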

Behavior by provider

Tesseract (default)

Uses the Tesseract engine bundled in the service container.

Requires the tessdata directory to be present (defaults to /usr/share/tesseract-ocr/4.00/tessdata/).

OCR is performed synchronously during file processing.
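
The tessdata requirement above can be checked before switching to this provider. The sketch below is illustrative only: `installed_languages` is a hypothetical helper, and it assumes it runs somewhere the container's filesystem is visible. It relies on Tesseract's convention of storing each language model as a `<code>.traineddata` file.

```python
from pathlib import Path

# Default location stated above; override it if your image differs.
DEFAULT_TESSDATA = Path("/usr/share/tesseract-ocr/4.00/tessdata/")


def installed_languages(tessdata_dir: Path = DEFAULT_TESSDATA) -> list[str]:
    """Return the sorted language codes found in tessdata_dir."""
    if not tessdata_dir.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {tessdata_dir}")
    # Each language model is a file named <code>.traineddata.
    return sorted(p.stem for p in tessdata_dir.glob("*.traineddata"))
```

An empty list means the directory exists but contains no language models, which would leave Tesseract unable to recognize text.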

Neural

Delegates OCR to the neural multimodal OCR service, which must be deployed separately.

OCR is performed asynchronously: file processing completes only once the neural service returns results.

When to use each provider

Tesseract is appropriate for environments that do not have the neural OCR service deployed, or where self-contained OCR processing is preferred.

Neural is recommended when higher OCR accuracy is needed, particularly for complex layouts or non-Latin scripts, and the neural OCR service is deployed in the environment.

The neural OCR service relies on a GPU-accelerated LLM. Ensure your environment has sufficient GPU resources available before enabling the neural option. In environments without dedicated GPU capacity, use tesseract instead.
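
The guidance in this section reduces to a simple rule, sketched here for illustration. The function `recommend_provider` and its two boolean inputs are hypothetical; how you determine that the neural service is deployed, or that GPU capacity is sufficient, depends on your environment and is not shown.

```python
def recommend_provider(neural_service_deployed: bool, gpu_available: bool) -> str:
    """Return "NEURAL" only when both prerequisites hold, else "TESSERACT"."""
    if neural_service_deployed and gpu_available:
        return "NEURAL"
    # Without the neural service or dedicated GPU capacity, use Tesseract.
    return "TESSERACT"
```

The rule deliberately falls back to Tesseract whenever either prerequisite is missing, since that provider is self-contained.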
