Uploading memory files to a Data Source allows LILT to better find translation matches and provide translation suggestions within LILT Translate. This article walks through uploading memory files to new or existing Data Sources. Files can only be uploaded to a Data Source that already exists, so to upload a memory file to a new Data Source, first create it by clicking the New Data Source button in the upper-right corner of the Data > Sources page. See Managing Data Sources for more details on creating new Data Sources.
Once you have located the Data Source you want to upload your memory files to, click the Data Source card or its Edit button to open up the data source management view. Navigate to the Manage resources page. This page displays all documents that have been uploaded to the Data Source. To add files to the Data Source, click the Upload files button in the upper-right and select the type of files you want to upload. Selecting one of the dropdown options will bring up a window for you to locate and select the files you want to upload. After the selected files are loaded into LILT, they will be available to view on the Manage resources page.
TM supported file types: JSON, SDLTM, TBM, TMQ, TMX, TMX.ZIP
TB supported file types: CSV, TBX, TSV, XLSX
File limitations:
  • Data Source files can contain as many entries as you like, so long as the file adheres to the TM size limits. In particular, individual files cannot exceed 200 MB. If a file exceeds this size, zip it and give the archive a .tmx.zip extension (for example, filename.tmx.zip) before uploading. Once uploaded, the file will be parsed into individual, editable entries.
  • See the Data Source Maintenance Best Practices article for information on how to structure CSV files for uploading Termbases.
  • Termbase column entries cannot contain more than 10,000 characters. If any column entry in an uploaded file exceeds 10,000 characters, LILT will not process the file and will display a warning.
  • When importing JSON files as TM entries into LILT, use the format shown below to ensure your memory entries are properly imported:
    [
      {
        "srclang": "es",
        "creationdate": "2019-04-04T11:24:22Z",
        "text": "Introducción[editar]",
        "units": [
          { "trglang": "en", "text": "Introduction" }
        ]
      },
      {
        "srclang": "es",
        "creationdate": "2019-04-04T11:24:22Z",
        "text": "Aumentar",
        "units": [
          { "trglang": "en", "text": "Increase" }
        ]
      }
    ]
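As a minimal sketch, a JSON memory file in the layout shown above could be generated from a list of source/target pairs with a short Python script. The function name build_tm_entries and the sample pairs are illustrative assumptions, not part of LILT; only the field names (srclang, creationdate, text, units, trglang) come from the format above.

```python
import json
from datetime import datetime, timezone


def build_tm_entries(pairs, srclang, trglang, created):
    """Return TM entries in the JSON layout shown in this article.

    `build_tm_entries` is a hypothetical helper, not a LILT API.
    """
    # Normalize to UTC so the trailing "Z" is accurate
    # (naive datetimes are interpreted as local time by astimezone).
    stamp = created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        {
            "srclang": srclang,
            "creationdate": stamp,
            "text": source,
            "units": [{"trglang": trglang, "text": target}],
        }
        for source, target in pairs
    ]


entries = build_tm_entries(
    [("Introducción", "Introduction"), ("Aumentar", "Increase")],
    srclang="es",
    trglang="en",
    created=datetime(2019, 4, 4, 11, 24, 22, tzinfo=timezone.utc),
)

# ensure_ascii=False keeps accented characters readable in the output.
payload = json.dumps(entries, ensure_ascii=False, indent=2)
```

The resulting payload can be written to a UTF-8 encoded .json file and uploaded with one of the Memory options.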
Data Source entry types:
  • Memory (TM): Choose this option if you want your memory files to be indexed for Concordance, used to train the Contextual AI model, and used as TM results. The Contextual AI model learns from uploaded data immediately upon upload. Note that deleting documents from a Data Source does not affect the Contextual AI model (i.e. the Contextual AI model does not unlearn the deleted resources). However, there is a recency bias, meaning the most recent documents have a stronger influence on the translation output.
  • Memory (TM, concordance only): Choose this option if you want your memory files to be indexed for Concordance only, without being used to train the Contextual AI model or returned as TM results.
  • Termbase (TB): Choose this option if your Termbase document does not have a header and you want all entries to be added to the Termbase entries of the Data Source.
  • Termbase (TB, with header): Choose this option if your Termbase document has a header at the top of the file that you want to exclude from adding to the Termbase entries of the Data Source.
Metadata: If you upload a file with metadata, LILT creates and populates custom fields for each TM/TB entry as the file is added to the Data Source. Metadata can be useful for providing context about translations. Metadata fields for each TM/TB entry can be modified from within LILT by opening the entry for editing. More details on this can be found in the Managing Termbase and Translation Memory Entries article.
When uploading a file with metadata fields, you will be presented with a popup form to map the metadata fields to existing metadata fields or new metadata fields.

Deleting Data Source files

  1. Select the files you want to delete by clicking the checkbox next to the resource name. Alternatively, you can select all files with the Select all button. Once any file is selected, this button changes to Deselect, which clears the current selection.
  2. Click the Delete button in the upper-right to bring up a popup to confirm you want to permanently delete the selected resources. Deleting a resource permanently removes all that resource’s TM/TB entries from the Data Source.

Data Source Maintenance Best Practices

This section describes practices that project managers can use to keep Data Sources well organized and to get more out of them in localization workflows.

TMX File Uploads

  • TM files under 200 MB can be directly uploaded to LILT.
  • For TM files over 200 MB, compress the TMX file and upload it as .tmx.zip. Currently, LILT supports only TMX file format for zipped memory files.
For additional information on file support, refer to the Supported File Formats documentation.
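The size check and zipping step above can be sketched in Python as follows. This is an illustration under stated assumptions: the helper name prepare_tmx_for_upload is hypothetical, and "200 MB" is read here as 200 × 1024 × 1024 bytes, which the article does not specify.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: "200 MB" is interpreted as 200 MiB.
SIZE_LIMIT_BYTES = 200 * 1024 * 1024


def prepare_tmx_for_upload(tmx_path, limit=SIZE_LIMIT_BYTES):
    """Return the file to upload: the TMX itself, or a .tmx.zip beside it."""
    tmx_path = Path(tmx_path)
    if tmx_path.stat().st_size <= limit:
        return tmx_path
    # memory.tmx -> memory.tmx.zip
    zip_path = tmx_path.with_name(tmx_path.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(tmx_path, arcname=tmx_path.name)
    return zip_path


# Demo: a tiny limit forces the zip branch on a small sample file.
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "memory.tmx"
    sample.write_text("<tmx version='1.4'><header/><body/></tmx>", encoding="utf-8")
    small_name = prepare_tmx_for_upload(sample).name
    zipped = prepare_tmx_for_upload(sample, limit=10)
    zipped_name = zipped.name
    with zipfile.ZipFile(zipped) as archive:
        zipped_members = archive.namelist()
```

In the demo, the file is returned unchanged under the default limit and zipped to memory.tmx.zip when the limit is lowered to 10 bytes.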

Termbase (TB) CSV File Formatting

Prior to uploading CSV files, consider the following:
  • For immediate visibility to translators, set the Default TB Entry Status to Reviewed.
  • For entries requiring linguist review before use, set the Default TB Entry Status to Unreviewed or Draft.
CSV files should be formatted with the first two columns for source and target text, respectively. Rows lacking either will not be imported. Termbase CSV files may include an optional header row.
Note:
  • Wrap text containing commas in quotes to ensure correct parsing. Without quotes, the text is split at the first comma.
  • For CSV files with a header row, use the Termbase (TB, with header) import option.
  • Metadata columns in CSV files, such as status or dates, give additional context about the translation content.
  • Make sure the CSV file contains no unwanted spaces, because spaces after commas are included in the imported text.
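The formatting rules above can be applied before upload with Python's standard csv module. The sketch below is an assumption-laden illustration: the helper names and the header labels (source, target, status) are hypothetical, and the 10,000-character limit is the per-column limit stated earlier in this article.

```python
import csv
import io

MAX_CELL_CHARS = 10_000  # per-column limit stated in this article


def clean_termbase_rows(rows):
    """Strip stray spaces and drop rows that lack a source or target."""
    cleaned = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if len(cells) < 2 or not cells[0] or not cells[1]:
            continue  # rows lacking source or target are not imported
        if any(len(cell) > MAX_CELL_CHARS for cell in cells):
            raise ValueError(f"cell exceeds {MAX_CELL_CHARS} characters: {cells[0][:40]!r}")
        cleaned.append(cells)
    return cleaned


def to_csv_text(rows, header=None):
    """Serialize rows; csv.writer quotes any field that contains a comma."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    if header:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = clean_termbase_rows([
    ["sign in ", " iniciar sesión", "Reviewed"],
    ["save, then close", "guardar y cerrar", "Draft"],
    ["orphan source", "", ""],
])
csv_text = to_csv_text(rows, header=["source", "target", "status"])
```

Because this output includes a header row, it would be uploaded with the Termbase (TB, with header) option.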

TMX File Preparation

Before upload:
  • Remove outdated TM entries by date.
  • Eliminate duplicate entries with identical source text but differing target texts.
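Both cleanup steps can be sketched in Python. The example assumes the TM entries have already been loaded into the JSON layout shown earlier in this article (it does not parse TMX itself), and it resolves conflicting targets by keeping the most recent one, which is one possible policy rather than a LILT rule. The function name prune_entries and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def prune_entries(entries, cutoff):
    """Drop entries created before `cutoff`; keep the newest target per source."""
    newest = {}
    for entry in entries:
        created = datetime.strptime(
            entry["creationdate"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if created < cutoff:
            continue  # outdated entry
        key = (entry["srclang"], entry["text"])
        kept = newest.get(key)
        if kept is None or created > kept[0]:
            newest[key] = (created, entry)
    return [entry for _, entry in newest.values()]


def make_entry(source, target, stamp):
    # Small helper for the demo data below.
    return {
        "srclang": "es",
        "creationdate": stamp,
        "text": source,
        "units": [{"trglang": "en", "text": target}],
    }


sample = [
    make_entry("Aumentar", "Increase", "2015-01-10T09:00:00Z"),
    make_entry("Guardar", "Save", "2019-04-04T11:24:22Z"),
    make_entry("Guardar", "Store", "2021-06-01T08:30:00Z"),
]
kept = prune_entries(sample, cutoff=datetime(2018, 1, 1, tzinfo=timezone.utc))
```

Here the 2015 entry is dropped as outdated, and of the two conflicting translations of "Guardar" only the 2021 one is kept.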

TMX File Naming

Use informative labels, including the date, content type, and project or product name, to facilitate easier tracking and organization. Naming Convention Example: [DATE]-[CONTENT TYPE]-[NAME]
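As an illustration, a small Python helper can generate names that follow this convention consistently. The ISO date format and the replacement of spaces with underscores are assumptions made for this sketch; the article only specifies the [DATE]-[CONTENT TYPE]-[NAME] order.

```python
import re
from datetime import date


def tmx_filename(day, content_type, name):
    """Build a [DATE]-[CONTENT TYPE]-[NAME].tmx file name (hypothetical helper)."""
    def slug(text):
        # Collapse anything other than letters and digits into underscores.
        return re.sub(r"[^A-Za-z0-9]+", "_", text.strip()).strip("_")

    return f"{day.isoformat()}-{slug(content_type)}-{slug(name)}.tmx"


example_name = tmx_filename(date(2024, 3, 15), "Marketing Emails", "Acme App")
```

With these inputs, example_name is 2024-03-15-Marketing_Emails-Acme_App.tmx.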

Managing TM Repetition

LILT Translate adds translated segments to the Data Source and stacks duplicate source/target pairs for clarity. Project managers decide whether and how often to remove duplicates, which can be done from the Manage entries page in the Sources tab.

Archiving Projects

To preserve TM entries while removing a project from the list, archive the project instead of deleting it. Archiving retains TM entries within the Data Source, whereas deletion removes them. Before deleting a project, consider downloading its TM and TB entries for future use.

TMX Files and Machine Translation

The Contextual AI model of the associated Data Source learns from uploaded TMX files immediately, and deleting those files later does not undo that learning. Although the most recent documents have the strongest influence on translation output, it is still advisable to keep older translations that remain relevant. For outdated TMX files that still hold value, upload them with the Memory (TM, concordance only) option so that they are indexed for Concordance without affecting Contextual AI training or TM results.