values.yaml and redeploy the app
--datastore (required): csv
--base-path: Location to store the results. Should be similar to the value of the wpaDataStoreBasePath values file parameter. Local pod folders can be used as the base path by referring to them as file://<path>
--src_lang: Only results for this source language will be used. Each language is given as a two-letter code according to ISO 639-1, e.g., de for German. More details here.
--tgt_lang: Only results for this target language will be used. The format is the same as for --src_lang
/path/to/root/directory), create 2 directories as below
kubectl proxy or another method of your choice
/path/to/root/directory/configs/download_user_data.yaml
Meta_data.json contains the number of segments
segments.SRC/TGT_LANG contains the source and target segments
TRAIN_SPLIT_PROPORTION is a floating-point number between 0 and 1 that indicates what fraction of the downloaded segments is used for training. For example, if it is set to 0.7, 70% of the segments downloaded into /path/to/root/directory/data/db_segments are used for training and the remaining 30% are used for testing. For the sample provided in sample_data/db_segments, a TRAIN_SPLIT_PROPORTION of 0.7 yields 7 train segments and 3 test segments, as provided in sample_data/db_segments/splits/{train,test}
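The arithmetic behind the split can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function name split_segments is hypothetical, and the sketch assumes an order-preserving split with the train count rounded to the nearest integer. The source does not say whether the real split shuffles or how it rounds.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (source segment, target segment)


def split_segments(pairs: Sequence[Pair], train_proportion: float) -> Tuple[List[Pair], List[Pair]]:
    """Split segment pairs into (train, test), preserving their order.

    Hypothetical helper: the rounding and ordering here are assumptions,
    not the documented behaviour of the evaluation tool.
    """
    if not 0.0 <= train_proportion <= 1.0:
        raise ValueError("train_proportion must be between 0 and 1")
    # Number of pairs assigned to the train split (assumed: nearest integer).
    n_train = round(len(pairs) * train_proportion)
    return list(pairs[:n_train]), list(pairs[n_train:])


# Ten placeholder segment pairs, mirroring the size of the provided sample.
sample = [(f"source {i}", f"target {i}") for i in range(10)]
train, test = split_segments(sample, 0.7)
print(len(train), len(test))  # prints "7 3"
```

With ten segments and a proportion of 0.7 this reproduces the 7/3 split described above.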
The data in /path/to/root/directory/data/splits/train is used to run a batch updater operation. Once that completes, the data in /path/to/root/directory/data/splits/test is used to run a batch translate operation on the newly updated memory. The results are downloaded, and the computed BLEU scores are persisted in /path/to/root/directory/data/evaluation.lilt_api.json
Sample output
splits directory, except for the ones that are configured to be used for the batch updater. If the inference section in the config is updated to operations: [ "translate" ], the train directory is no longer used for the batch updater, and inference is run on it too. To prevent this, delete the train directory from the splits directory so that inference is run only on test.
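The selection rule described here can be sketched in a few lines of Python. The function inference_dirs and its arguments are hypothetical names for illustration only; how the tool reads the config and which directories it reserves for the batch updater are assumptions, not documented behaviour.

```python
from typing import Iterable, List


def inference_dirs(split_dirs: Iterable[str], updater_dirs: Iterable[str]) -> List[str]:
    """Return the split directories that inference would run on.

    Hypothetical helper: every directory under splits is a candidate,
    minus those reserved for the batch updater.
    """
    reserved = set(updater_dirs)
    return sorted(d for d in split_dirs if d not in reserved)


# The batch updater uses train, so inference runs only on test.
print(inference_dirs(["train", "test"], ["train"]))  # prints "['test']"

# With only "translate" configured, nothing is reserved for the updater,
# so inference would also run on train unless that directory is deleted.
print(inference_dirs(["train", "test"], []))  # prints "['test', 'train']"
```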