Engines are trained from corpora and are used to translate text. They are unidirectional, i.e. they have one source language and one target language.

Types of engines

  • Custom
    • Using master corpora only
    • Domain-adapted (using auxiliary and/or stock corpora), AI-boosted
    • Stock+
  • Stock

Custom engines

All engines trained in Globalese are custom engines. There are, however, degrees of customization, depending on which combination of resources is used to create them.

...

Domain-adapted engines contain both master and auxiliary/stock corpora, and are trained using Globalese’s proprietary automated domain adaptation technology. When you select your important in-domain TM(s) as “Master” training data, the engine focuses on the style and terminology of those TM(s). You can add generic stock data to extend the engine if the volume of in-domain data is not sufficient. You also have the option to add your own auxiliary data.

AI-boosted engines

The option to train AI-boosted engines is available from Globalese V5. These engines combine the terminology and style accuracy of a domain-adapted engine with the linguistic capabilities of large language models. In Globalese V5, AI-boosted engines currently use GPT models from OpenAI. The AI-boosting option is currently available for domain-adapted engines. AI-boosted engines typically provide the best results, especially because they support dynamic keyword lists.

Please note that GPT models from MS Azure are currently available as a beta service. AI-boosted engines can only be used in the Cloud Text translation scenario.

Use case

The typical use case for domain-adapted engines is where adhering to a particular terminology and style is important. Some examples: product documentation, end-user manuals or software documentation, where it is essential to use the right terminology and style consistently.

...

The following table shows the minimum and recommended number of segments.

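| Includes stock corpora? | Minimum volume (segments)      | Recommended volume (segments)       |
|-------------------------|--------------------------------|-------------------------------------|
| Yes                     | 15,000 master                  | 100,000+ master                     |
| No                      | 15,000 master, 200,000 total   | 100,000+ master, 1,000,000+ total   |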
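
The thresholds in the table above can be expressed as a small pre-training check. The following is a minimal sketch only: the names `VolumeThresholds`, `THRESHOLDS` and `assess_volume` are hypothetical and not part of Globalese, and the segment counts are assumed to be known before training. Only the numbers themselves come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeThresholds:
    """Segment thresholds for one row of the volume table (hypothetical helper)."""
    min_master: int
    rec_master: int
    min_total: int = 0  # 0 means the table gives no threshold on the total
    rec_total: int = 0


# Keyed by "includes stock corpora?"; values copied from the table.
THRESHOLDS = {
    True: VolumeThresholds(min_master=15_000, rec_master=100_000),
    False: VolumeThresholds(
        min_master=15_000,
        rec_master=100_000,
        min_total=200_000,
        rec_total=1_000_000,
    ),
}


def assess_volume(master_segments: int, total_segments: int, includes_stock: bool) -> str:
    """Classify a planned training set against the documented thresholds."""
    t = THRESHOLDS[includes_stock]
    if master_segments < t.min_master or total_segments < t.min_total:
        return "below minimum"
    if master_segments < t.rec_master or total_segments < t.rec_total:
        return "meets minimum"
    return "meets recommended"
```

For example, `assess_volume(20_000, 150_000, includes_stock=False)` reports that the set is below the minimum, because the total falls short of 200,000 segments even though the master volume is sufficient.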

Typical training time

The typical training time for domain-adapted engines is between 10 and 28 hours.

Stock+ engines

Stock+ engines are customized stock engines, i.e. engines trained by extending a pre-trained stock engine with your own master data. The selected master data becomes part of the engine. If there is new content in the master corpora, the engine will learn it. However, you should not expect the added master data to change the engine's terminology and style preferences.

...

You can also use pre-trained stock engines for certain language combinations.