Creating an engine

To create an engine:

  1. Go to Engines.

  2. Click Create new.

  3. Specify the name, the languages and the group the new engine will belong to. Unlike corpora, an engine must belong to exactly one group. We recommend including the language direction of the engine in its name, because not all CAT tools display this information.

  4. Choose an engine type. The default is Domain-adapted, but for language combinations where Globalese offers stock engines, you can choose Stock+ as well. The engine type cannot be changed once an engine has been trained.

  5. Select the Master corpora you would like to use in your engine. See required corpus volumes here.

  6. If you have selected a Stock+ engine, click Save.

  7. If you have selected a Domain-adapted engine, the Boost engine option will appear.

  8. Select whether you want to train an AI-boosted, domain-adapted engine. This attribute cannot be changed once an engine has been trained.

  9. If you have selected a Domain-adapted engine, the Auxiliary corpora option will appear.

  10. Select whether you would like to apply Auxiliary corpora filtering.

  11. If Globalese provides stock corpora for the selected language combination and the Auxiliary corpora filtering option is activated, you can choose Stock corpora provided by Globalese as auxiliary data for your engine.

  12. Select Own corpora to include your own training data as auxiliary corpora in the engine. See required corpus volumes here.

  13. If you have selected a domain-adapted, AI-boosted engine, the Keyword list and Custom prompt options will appear. You can select corpora marked as Keyword list to prioritize the terminology contained in the keyword list(s). You can add instructions to the custom prompt to influence the behavior of the engine.

  14. Click Save.
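
The dependencies between the options in the steps above can be summarized as a small consistency check. The following Python sketch is only an illustration of those rules: the EngineSettings class, its field names and the validate function are hypothetical and are not part of Globalese or of any Globalese API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EngineSettings:
    """Hypothetical model of the choices made when creating an engine."""
    name: str
    group: str                       # an engine belongs to exactly one group
    engine_type: str                 # "domain-adapted" (default) or "stock+"
    master_corpora: List[str] = field(default_factory=list)
    boosted: bool = False            # the Boost engine option
    aux_filtering: bool = True       # filtering is applied by default
    use_stock_aux: bool = False      # Stock corpora as auxiliary data
    own_aux_corpora: List[str] = field(default_factory=list)
    keyword_lists: List[str] = field(default_factory=list)
    custom_prompt: Optional[str] = None


def validate(s: EngineSettings) -> List[str]:
    """Return a list of rule violations; an empty list means consistent."""
    problems = []
    if s.engine_type not in ("domain-adapted", "stock+"):
        problems.append("unknown engine type")
    domain_adapted = s.engine_type == "domain-adapted"
    if s.boosted and not domain_adapted:
        problems.append("Boost engine applies only to Domain-adapted engines")
    if (s.use_stock_aux or s.own_aux_corpora) and not domain_adapted:
        problems.append("Auxiliary corpora apply only to Domain-adapted engines")
    if s.use_stock_aux and not s.aux_filtering:
        problems.append("Stock auxiliary corpora require filtering to be activated")
    if (s.keyword_lists or s.custom_prompt) and not (domain_adapted and s.boosted):
        problems.append("Keyword lists and custom prompts require an AI-boosted engine")
    return problems
```

For example, a Stock+ engine with a custom prompt would be reported as inconsistent, because the Custom prompt option only appears for domain-adapted, AI-boosted engines.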

Master corpora

Master corpora are the core of the engine. They are always included in the engine in full. Globalese uses the master corpora as a reference when selecting data from the Auxiliary corpora during training. Master corpora always take priority over the data in the Auxiliary corpora. This means that even if parts of the Master corpora conflict with the Auxiliary corpora, the content, terminology and style of the Master corpora will be applied during translation. Therefore, you should avoid using unvetted or irrelevant data as Master corpora.

Auxiliary corpora

Auxiliary corpora are used to extend the engine to a sufficient size. You can use Stock corpora provided by Globalese, your own data, or both. A bigger pool of auxiliary corpora means a bigger selection base for the training process. For auxiliary corpora, you can use any training data of good linguistic quality, regardless of the terminology, style or domain it contains.

Auxiliary corpora filtering

By default, Globalese applies a filter to the auxiliary corpora as part of the Automated Domain Adaptation process. In this automated process, Globalese uses only the content most closely related to the master corpora for training the engine. As a result, not all of the data selected as auxiliary data will be part of the engine. However, if you have a sufficient amount of auxiliary data of your own and you are confident about its content, you can disable the auxiliary corpora filtering. In that case, all data selected as auxiliary data will be part of the engine, and no automated filtering will be applied to the auxiliary corpora during training.
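
To make the effect of the filter more concrete, here is a deliberately simplified Python sketch. It is not how Globalese implements Automated Domain Adaptation, which is not described in this article. The sketch assumes a plain word-overlap score, and the select_auxiliary function, its parameters and the threshold value are all hypothetical.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Split a segment into a set of lowercased, whitespace-separated words."""
    return set(text.lower().split())


def select_auxiliary(master: List[str], auxiliary: List[str],
                     filtering: bool = True, threshold: float = 0.5) -> List[str]:
    """Toy stand-in for auxiliary corpora filtering (an assumption, not the real method).

    With filtering on, keep only auxiliary segments in which at least
    `threshold` of the words also occur somewhere in the master corpora.
    With filtering off, keep every auxiliary segment.
    """
    if not filtering:
        return list(auxiliary)
    master_vocab: Set[str] = set()
    for segment in master:
        master_vocab |= tokens(segment)
    kept = []
    for segment in auxiliary:
        words = tokens(segment)
        if not words:
            continue  # skip empty segments
        overlap = len(words & master_vocab) / len(words)
        if overlap >= threshold:
            kept.append(segment)
    return kept
```

With filtering enabled, an auxiliary segment that shares no vocabulary with the master corpora is dropped. With filtering disabled, every auxiliary segment is kept, which matches the behavior described above.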

Keyword lists

You can use keyword lists to dynamically change the translations of words or terms based on the entries in the lists. Make sure the lists contain only genuine keywords, and avoid generic words and words that can have more than one meaning.
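
Before uploading a keyword list, it can help to check it for the two problems mentioned above. The Python sketch below is one possible way to do that outside Globalese. It assumes that a keyword list can be represented as source/target pairs and that you supply your own set of generic words. Both assumptions, and the review_keywords function itself, are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def review_keywords(entries: Iterable[Tuple[str, str]],
                    generic_words: Set[str]) -> Dict[str, List[str]]:
    """Flag keyword entries that are generic or have conflicting targets."""
    targets: Dict[str, Set[str]] = {}
    for source, target in entries:
        # Group all target terms under the normalized source term.
        targets.setdefault(source.strip().lower(), set()).add(target.strip())
    flagged: Dict[str, List[str]] = {"generic": [], "ambiguous": []}
    for source, found in sorted(targets.items()):
        if source in generic_words:
            flagged["generic"].append(source)
        if len(found) > 1:
            # The same source term maps to more than one target term.
            flagged["ambiguous"].append(source)
    return flagged
```

Entries reported as generic or ambiguous are candidates for removal from the list.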

Custom prompts

You can use custom prompts to influence the behavior of an engine. For example, you can ask for a translation in a formal or an informal tone of voice.