Introduction

This document describes best practices for managing Data Sources and terminology, including suggestions for organization and practices to avoid.

Data Source Management

Data Source Organization Best Practices

  • When creating Data Sources, default to combining content into a smaller number of Data Sources. Splitting Data Sources too aggressively can reduce the benefit of adaptation.
  • Style affects translation quality. Prefer to split Data Sources by the style the audience expects, because mixing writing styles or registers (e.g., academic journals vs. social media content or text messages) may produce translations that are “unfit” for the task.
  • Including multiple types of content in one Data Source may improve the translation results for each type of content, so combining content is often the better choice.

Separating Data Sources

There are reasons you may want to separate Data Sources:
  • Different topics or content domains: Data Sources containing completely unrelated content domains may result in conflicting machine translation suggestions.
  • Access control: Some Data Sources may contain information that is accessible only to certain groups or roles and cannot be shared with other audiences.
  • Confidential project information: Some content is sensitive and should not be freely accessible to users on other projects.

Terminology Management

Terminology management is a core part of the overall quality management process for translation. Terminology management is often a complex decision-making process involving many stakeholders in an organization. For example:
  • When dealing with terminology in public-facing materials (such as a website), the marketing, public relations, and/or branding department may need to get involved.
  • When dealing with terminology in a product, the product team and technical writing team typically make the final decision.

Creating Terminology

When team members suggest a termbase (TB) entry, the term does not immediately appear in LILT Translate for translators. The term must be accepted by a reviewer for it to show up as a TB suggestion. Once accepted, the TB entry will be highlighted within any document in a project sharing the same Data Source. Below are common best practices and things to avoid when submitting terminology:

Best practices:

  • Check with your LILT services team to understand requirements before adding terms (some programs have external terminology systems).
  • Use the Segment context sidebar to make sure the associated Data Source doesn’t already contain the term you want to add in the same form (singular, plural, lowercase, capitalized, or as part of combined words). If the term should be matched in multiple forms (e.g., singular and plural), you will need to create multiple entries.
  • Perform a quick search to ensure the translation you want to add is widely accepted in the locale.
  • Check a trustworthy dictionary for your language (e.g., the Oxford Dictionary for English) to make sure your translation doesn’t conflict with the source term’s definition or connotation.
  • Include appropriate metadata with each entry you add:
    • Part of speech
    • Context
    • Source

Things to avoid:

When adding terms, be careful to avoid:
  • Typos
  • General terminology (a termbase is not a dictionary)
  • Terms that appear infrequently in the document
  • Terms you are unsure of and want someone else to confirm
  • New translations for an existing term (unless you add metadata to indicate specific and distinct use)

Don’t add terms that are:
  • A number
  • Context-sensitive
  • More than four words
  • Variants of existing terms:
    • Differences in capitalization
    • Inflected forms of a word
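
Some of the checks above are mechanical and can be run before a term is submitted for review. The following is a minimal sketch in Python, not part of LILT: the function name `check_term` and the plain set of existing terms are assumptions made for illustration. It covers only the rules that can be tested automatically (numbers, length, exact duplicates, capitalization variants). Inflected forms and context sensitivity still need human judgment.

```python
import re

MAX_WORDS = 4  # guideline above: avoid terms longer than four words


def check_term(term: str, existing_terms: set[str]) -> list[str]:
    """Return the guideline violations for a candidate term (empty list if none).

    `existing_terms` is a hypothetical stand-in for the source terms
    already present in the Data Source's termbase.
    """
    cleaned = term.strip()
    if not cleaned:
        return ["term is empty"]

    problems = []

    # A number: only digits, separators, and spaces, with at least one digit.
    if re.fullmatch(r"[\d.,\s]*\d[\d.,\s]*", cleaned):
        problems.append("term is a number")

    if len(cleaned.split()) > MAX_WORDS:
        problems.append("term has more than four words")

    # Exact duplicates and capitalization-only variants of existing terms.
    existing_folded = {t.casefold() for t in existing_terms}
    if cleaned in existing_terms:
        problems.append("term already exists")
    elif cleaned.casefold() in existing_folded:
        problems.append("term differs from an existing term only in capitalization")

    return problems
```

For example, `check_term("Data Source", {"data source"})` reports a capitalization-only variant, while `check_term("termbase", {"glossary"})` returns an empty list.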

Terminology Organization

Terms are created within a Data Source and are not recognized globally across multiple Data Sources. If you need terms in multiple Data Sources, the entries will need to be added to each of those Data Sources.
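
Because terms are scoped to a single Data Source, sharing a term means repeating the entry once per Data Source. The sketch below models this with plain Python dictionaries. It does not call any LILT API, and the function name, the Data Source names, and the sample translation are all hypothetical. Existing entries are left untouched, in line with the advice above to avoid adding new translations for an existing term.

```python
def add_term_to_data_sources(termbases, data_source_names, source_term, target_term):
    """Add one entry to each named Data Source's termbase.

    `termbases` maps a Data Source name to a dict of {source term: target term}.
    Returns the names of the Data Sources that received a new entry.
    """
    updated = []
    for name in data_source_names:
        entries = termbases.setdefault(name, {})
        # Skip Data Sources that already define this source term.
        if source_term not in entries:
            entries[source_term] = target_term
            updated.append(name)
    return updated


# Illustrative data only: two Data Sources, one of which already has the term.
termbases = {
    "Marketing": {"Data Source": "Datenquelle"},
    "Product": {},
}
updated = add_term_to_data_sources(
    termbases, ["Marketing", "Product"], "Data Source", "Datenquelle"
)
```

Here only "Product" receives a new entry, because "Marketing" already contains the term.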