Data Source Maintenance Best Practices
This document describes practices that project managers can use to keep data sources clean and improve localization workflows.
TMX File Uploads
TM files under 200 MB can be directly uploaded to LILT.
For TM files over 200 MB, compress the TMX file and upload it as .tmx.zip. Currently, LILT supports only the TMX file format for zipped memory files.
For additional information on file support, refer to the Supported File Formats documentation.
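The size check and compression step can be scripted before upload. The following is a minimal sketch using only the Python standard library. The function name prepare_tmx_for_upload and the limit_bytes parameter are hypothetical, and reading "200 MB" as 200 * 1024 * 1024 bytes is an assumption. The upload itself is not shown.

```python
import zipfile
from pathlib import Path

# Assumption: "200 MB" is interpreted as 200 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 200 * 1024 * 1024


def prepare_tmx_for_upload(tmx_path: str, limit_bytes: int = SIZE_LIMIT_BYTES) -> Path:
    """Return the file to upload: the TMX itself, or a .tmx.zip if it exceeds the limit."""
    source = Path(tmx_path)
    if source.stat().st_size <= limit_bytes:
        return source  # small enough to upload directly

    # memory.tmx -> memory.tmx.zip, written next to the original file
    archive = source.with_name(source.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(source, arcname=source.name)
    return archive
```

Calling prepare_tmx_for_upload("memory.tmx") returns either the original path or the path of the new memory.tmx.zip archive, which can then be uploaded as usual.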
Termbase (TB) CSV File Formatting
Prior to uploading CSV files, consider the following:
For immediate visibility to translators, set the Default TB Entry Status to Reviewed.
For entries requiring linguist review before use, set the Default TB Entry Status to Unreviewed or Draft.
CSV files should be formatted with the first two columns for source and target text, respectively. Rows lacking either will not be imported. Termbase CSV files may include an optional header row.
Note: Wrap text containing commas in quotes to ensure correct parsing. Without quotes, the text is split at the first comma.
For CSV files with a header row, use the Termbase (TB, with header) import option. Metadata columns in CSV files, such as status or dates, give additional context about each entry.
Ensure no unwanted spaces exist in CSV files, as spaces after commas are included in the imported text.
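The formatting rules above can be applied automatically when a termbase CSV is generated. The following is a minimal sketch using Python's standard csv module. The function name write_termbase_csv is hypothetical, and the sample terms are illustrative only. The sketch quotes fields that contain commas, strips stray spaces, and skips rows missing a source or target.

```python
import csv
import io


def write_termbase_csv(rows, header=None):
    """Serialize (source, target, *metadata) rows to CSV text ready for import."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL wraps a field in quotes only when needed, e.g. when it contains a comma.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    if header:
        writer.writerow(header)
    for row in rows:
        cleaned = [str(cell).strip() for cell in row]  # drop leading/trailing spaces
        if len(cleaned) < 2 or not cleaned[0] or not cleaned[1]:
            continue  # rows lacking source or target text would not be imported
        writer.writerow(cleaned)
    return buffer.getvalue()


if __name__ == "__main__":
    print(write_termbase_csv(
        [("Save, then close", "Guardar y cerrar"), (" file ", "archivo")],
        header=["source", "target"],
    ))
```

If a header is passed, the resulting file is the kind that would be imported with the Termbase (TB, with header) option.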
TMX File Preparation
Before upload:
Remove outdated TM entries by date.
Eliminate duplicate entries with identical source text but differing target texts.
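Both preparation steps can be sketched with Python's standard xml.etree.ElementTree module. This sketch makes several assumptions: dates are stored in the changedate or creationdate attribute of each tu element in the TMX format YYYYMMDDThhmmssZ, the first tuv in each unit is the source and the second is the target, and the function name clean_tmx is hypothetical. It removes units older than a cutoff and only reports sources with conflicting targets, leaving the choice of which target to keep to a reviewer.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

TMX_DATE_FORMAT = "%Y%m%dT%H%M%SZ"  # e.g. 20230101T000000Z


def _seg_text(tuv):
    """Return the plain text of a <tuv>'s <seg>, ignoring inline markup."""
    seg = tuv.find("seg")
    return "".join(seg.itertext()).strip() if seg is not None else ""


def clean_tmx(tmx_xml: str, cutoff: datetime):
    """Drop units older than cutoff; report sources that have differing targets."""
    root = ET.fromstring(tmx_xml)
    body = root.find("body")
    targets_by_source = defaultdict(set)

    for tu in list(body.findall("tu")):  # copy the list so removal is safe
        stamp = tu.get("changedate") or tu.get("creationdate")
        if stamp and datetime.strptime(stamp, TMX_DATE_FORMAT) < cutoff:
            body.remove(tu)
            continue
        tuvs = tu.findall("tuv")
        if len(tuvs) >= 2:
            # Assumption: first <tuv> is the source, second is the target.
            targets_by_source[_seg_text(tuvs[0])].add(_seg_text(tuvs[1]))

    conflicts = {src: sorted(tgts) for src, tgts in targets_by_source.items() if len(tgts) > 1}
    return ET.tostring(root, encoding="unicode"), conflicts
```

The returned conflicts dictionary maps each source text to its competing targets, so a linguist can decide which entry to keep before the file is uploaded.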
TMX File Naming
Use informative labels, including the date, content type, and project or product name, to facilitate easier tracking and organization.
Naming Convention Example: [DATE]-[CONTENT TYPE]-[NAME]
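For teams that export TMX files from scripts, the convention can be generated rather than typed by hand. The following is a minimal sketch. The function name tmx_file_name is hypothetical, and the YYYYMMDD date format and the replacement of spaces with underscores are assumptions, chosen so that the hyphen remains an unambiguous separator between the three parts.

```python
import re
from datetime import date


def tmx_file_name(day: date, content_type: str, name: str) -> str:
    """Build a [DATE]-[CONTENT TYPE]-[NAME].tmx file name."""

    def slug(text: str) -> str:
        # Collapse anything that is not a letter or digit into a single underscore.
        return re.sub(r"[^A-Za-z0-9]+", "_", text.strip()).strip("_")

    return f"{day.strftime('%Y%m%d')}-{slug(content_type)}-{slug(name)}.tmx"


if __name__ == "__main__":
    print(tmx_file_name(date(2024, 3, 15), "Marketing", "Acme App"))
    # prints 20240315-Marketing-Acme_App.tmx
```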
Managing TM Repetition
LILT Translate adds translated segments to the Data Source, stacking duplicate source/target pairs for clarity. Project managers decide whether and how often duplicates should be removed. Duplicate removal is available on the Manage entries page in the Sources tab.
Archiving Projects
To preserve TM entries while removing a project from the list, opt for archiving over deletion. Archiving retains TM entries within the Data Source, whereas deletion removes them. Before deleting a project, consider downloading its TM and TB entries for future use.
TMX Files and Machine Translation
The Contextual AI model of the associated Data Source learns from uploaded TMX files immediately, and what it has learned is not affected by subsequent deletions. The most recent documents have the strongest influence on translation output, but retaining older, relevant translations is still advisable.
For outdated TMX files that still hold value, upload them for Concordance only. This indexes the files without affecting Contextual AI training or TM results.