To create an engine:

  1. Go to Engines.

  2. Click Create new.

  3. Specify the name, languages and the group the new engine will belong to. Unlike corpora, engines must belong to one group only.

  4. Choose an engine type. The default is Domain-adapted, but for language combinations where Globalese offers stock engines, you can choose Stock+ as well.

  5. Select the Master corpora you would like to use in your engine. See required corpus volumes here.

  6. If you have selected a Stock+ engine, click Save.

  7. If you have selected a Domain-adapted engine, the Auxiliary corpora option will appear.

  8. Choose whether to apply Auxiliary corpora filtering.

  9. If Globalese provides stock corpora for the selected language combination, and the Auxiliary corpora filtering option is activated, you can choose Stock corpora provided by Globalese as auxiliary data in your engine.

  10. Select Own corpora to include your own training data as auxiliary corpora in the engine. See required corpus volumes here.

  11. Click Save.
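
The dependencies between the options in the steps above can be summarized in a short sketch. This is only an illustrative model of the form's rules as described on this page; the class, field and function names are hypothetical and do not correspond to any Globalese API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EngineSettings:
    """Hypothetical model of the engine form; not a Globalese API."""
    name: str
    source_lang: str
    target_lang: str
    group: str                              # an engine belongs to one group only
    engine_type: str = "Domain-adapted"     # or "Stock+"
    master_corpora: List[str] = field(default_factory=list)
    auxiliary_filtering: bool = True        # filtering is on by default
    use_stock_corpora: bool = False
    own_auxiliary_corpora: List[str] = field(default_factory=list)


def validate(settings: EngineSettings, stock_available: bool) -> List[str]:
    """Return a list of rule violations; an empty list means the form is consistent."""
    problems = []
    if not settings.group:
        problems.append("an engine must belong to exactly one group")
    if settings.engine_type not in ("Domain-adapted", "Stock+"):
        problems.append("unknown engine type")
    if settings.engine_type == "Stock+":
        if not stock_available:
            problems.append("Stock+ needs a stock engine for this language combination")
        if settings.use_stock_corpora or settings.own_auxiliary_corpora:
            problems.append("auxiliary corpora options apply to Domain-adapted engines only")
    elif settings.use_stock_corpora and not (stock_available and settings.auxiliary_filtering):
        problems.append("stock corpora need stock data for the language pair and active filtering")
    return problems
```

For example, a Domain-adapted engine that requests stock corpora while filtering is switched off would be reported as inconsistent, which mirrors the condition in the stock corpora step.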

Master corpora

Master corpora are the core of the engine and are always included in it completely. Globalese uses the master corpora as a reference when selecting from the Auxiliary corpora during training. The training process gives segment pairs from the auxiliary and/or stock corpora that come from the same domain as the master corpora a higher weight, and others a lower weight. Master corpora always take priority over the data in the Auxiliary corpora: even if parts of the Master corpora and the Auxiliary corpora conflict, the content, terminology and style of the Master corpora will be applied during translation. Therefore, you should avoid using unvetted or irrelevant data as Master corpora.

Auxiliary corpora

Auxiliary corpora are used to enrich the master corpora and extend the engine to a sufficient size. You can use Stock corpora provided by Globalese, your own data, or both. A bigger pool of auxiliary corpora means a bigger selection base for the training process. For auxiliary corpora, feel free to use any training data of good linguistic quality, regardless of the terminology, style or domain it contains.

Auxiliary corpora filtering

By default, Globalese applies a filter to the auxiliary corpora as part of the Automated Domain Adaptation process. In this automated process, Globalese uses only the content most closely related to the master corpora for training the engine. As a result, not all of the data selected as auxiliary data will be part of the engine. However, if you have a sufficient amount of auxiliary data of your own and you are confident about its content, you can disable auxiliary corpora filtering. In that case, all data selected as auxiliary data will be part of the engine, and no automated filtering is applied to the auxiliary corpora during training.
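
To make the effect of the filtering switch concrete, here is a toy sketch. It is not how Globalese selects data (this page does not describe the actual method); it assumes a simple word-overlap score on source-side segments, and the function names and the threshold value are hypothetical.

```python
def tokens(text):
    """Lowercased whitespace tokens of a segment, as a set."""
    return set(text.lower().split())


def filter_auxiliary(master, auxiliary, threshold=0.3, filtering=True):
    """Keep auxiliary segments whose share of words seen in the master data
    reaches the threshold. With filtering disabled, keep everything."""
    if not filtering:
        return list(auxiliary)
    master_vocab = set()
    for segment in master:
        master_vocab |= tokens(segment)
    kept = []
    for segment in auxiliary:
        words = tokens(segment)
        # Share of this segment's words that also occur in the master corpora.
        score = len(words & master_vocab) / len(words) if words else 0.0
        if score >= threshold:
            kept.append(segment)
    return kept


master = ["the pump valve must be closed", "check the pump pressure"]
auxiliary = [
    "close the valve before checking pressure",
    "the recipe needs two eggs",
    "pump pressure is too high",
]

print(filter_auxiliary(master, auxiliary))                   # the recipe segment is dropped
print(filter_auxiliary(master, auxiliary, filtering=False))  # all three segments are kept
```

With filtering on, only part of the auxiliary data reaches the engine; with filtering off, everything selected is used, which is why disabling it only makes sense when you trust the content of your auxiliary corpora.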