# Configuring the Internal OCR Provider

## Overview

The `internalOcrProvider` flag selects the OCR engine used to process image files and scanned PDFs: either the bundled Tesseract engine or the neural multimodal OCR service.

The default value is `tesseract`.

## Options

| Value       | Engine                | Description                                                                                                                                                                   |
| ----------- | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `tesseract` | Tesseract             | Traditional OCR using the Tesseract engine bundled in the service container. Requires a local `tessdata` directory.                                                           |
| `neural`    | Neural Multimodal OCR | Uses the neural multimodal OCR service for higher-accuracy extraction, particularly on complex layouts and non-Latin scripts. Requires the neural OCR service to be deployed. |

## Configuration

### Helm values

Set the flag in your `install_dir/lilt/environments/lilt/values.yaml` file under the file-job service configuration:

```yaml
      internalOcrProvider: NEURAL   # or TESSERACT
```
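
A typo in this value is easy to miss, so a small pre-deploy check can help. The sketch below is illustrative only: `normalize_ocr_provider` is a hypothetical helper, not part of the product, and treating the value as case-insensitive is an assumption (the options table lists lowercase values while the example above uses uppercase). Confirm the expected casing against your deployment.

```python
from typing import Optional

# Hypothetical pre-deploy lint; the allowed values come from the Options table.
ALLOWED_PROVIDERS = {"tesseract", "neural"}


def normalize_ocr_provider(value: Optional[str]) -> str:
    """Return the lowercase provider name, or raise ValueError if unrecognized.

    Assumption: the value is compared case-insensitively. A missing or blank
    value falls back to the documented default, "tesseract".
    """
    if value is None or not value.strip():
        return "tesseract"
    normalized = value.strip().lower()
    if normalized not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown internalOcrProvider: {value!r}")
    return normalized
```

For example, `normalize_ocr_provider("NEURAL")` yields `"neural"`, while an unrecognized string raises `ValueError` before the value reaches `values.yaml`.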

### Behavior by provider

**Tesseract (default)**

* Uses the Tesseract engine bundled in the service container.
* Requires the `tessdata` directory to be present (defaults to `/usr/share/tesseract-ocr/4.00/tessdata/`).
* OCR is performed synchronously during file processing.
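
Because the Tesseract provider depends on the `tessdata` directory, it can be useful to confirm the directory exists and see which language models it holds. The following is a minimal sketch, assuming it runs inside the service container (or anywhere the path is mounted); `find_traineddata` is a hypothetical helper name, and it relies on Tesseract's convention of storing each language model as `<lang>.traineddata`.

```python
from pathlib import Path
from typing import List

# Default location quoted on this page; pass another path if yours differs.
DEFAULT_TESSDATA = "/usr/share/tesseract-ocr/4.00/tessdata/"


def find_traineddata(tessdata_dir: str = DEFAULT_TESSDATA) -> List[str]:
    """Return sorted language codes that have a *.traineddata file.

    Raises FileNotFoundError if the directory itself is missing.
    Only the top level of the directory is scanned.
    """
    root = Path(tessdata_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {root}")
    return sorted(p.stem for p in root.glob("*.traineddata"))
```

An empty list means the directory is present but contains no language models at its top level, which would also prevent Tesseract from recognizing text.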

**Neural**

* Delegates OCR to the neural multimodal OCR service, which must be deployed separately.
* OCR is performed asynchronously: file processing completes once the neural service returns results.

## When to use each provider

* **Tesseract** is appropriate for environments that do not have the neural OCR service deployed, or where self-contained OCR processing is preferred.
* **Neural** is recommended when higher OCR accuracy is needed, particularly for complex layouts or non-Latin scripts, provided the neural OCR service is deployed in the environment.

<Note>
  The neural OCR service relies on a GPU-accelerated LLM. Ensure your environment has sufficient GPU resources available before enabling the `neural` option. In environments without dedicated GPU capacity, use `tesseract` instead.
</Note>
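
The guidance above can be condensed into a small decision rule. This is a sketch under stated assumptions, not product behavior: `choose_ocr_provider` is a hypothetical helper, and both inputs are supplied by the caller, since how you detect a deployed neural service or available GPU capacity depends on your environment.

```python
def choose_ocr_provider(neural_service_deployed: bool, gpu_available: bool) -> str:
    """Pick a provider value following the guidance on this page.

    "neural" is returned only when the neural OCR service is deployed AND
    GPU capacity is available; otherwise the default "tesseract" is used.
    """
    if neural_service_deployed and gpu_available:
        return "neural"
    return "tesseract"
```

Falling back to `tesseract` in every other case mirrors the documented default and keeps OCR self-contained when the neural prerequisites are not met.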