Evaluating multilingual language models

Spring semester, 2020–2021

Language Technology (Nyelvtechnológia)

Topic description

Large multilingual language models, particularly the multilingual version of BERT, are trained on over 100 languages, but they are evaluated on only a handful of languages and with limited scope. Probing is a popular way of evaluating language models: a probe extracts a sentence or word representation from the language model and tries to recover linguistic information from that representation by training a small classifier on it. In this project we use morphological probes in 39 languages. The probes are automatically generated from the Universal Dependencies treebanks.
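To illustrate the idea, here is a minimal sketch of a probing experiment. The data is entirely synthetic: the random vectors stand in for representations extracted from multilingual BERT, and the binary labels stand in for a morphological feature (e.g. singular vs. plural) taken from a treebank; in the actual project both would come from the model and from Universal Dependencies.

```python
# Minimal probing sketch. The "representations" and labels below are
# synthetic stand-ins: a real probe would extract vectors from a
# multilingual language model and read labels from a UD treebank.
import numpy as np

rng = np.random.default_rng(0)

n, dim = 200, 32
labels = rng.integers(0, 2, size=n)          # fake morphological feature
reps = rng.normal(size=(n, dim))             # fake word representations
reps[:, 0] += 2.0 * labels                   # inject the feature weakly

# The probe itself: a small linear classifier (logistic regression)
# trained by plain gradient descent on the representations.
w = np.zeros(dim)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(reps @ w + b)))  # predicted probabilities
    w -= 0.5 * reps.T @ (p - labels) / n       # gradient step on weights
    b -= 0.5 * np.mean(p - labels)             # gradient step on bias

# High accuracy suggests the representations encode the probed feature.
accuracy = np.mean(((reps @ w + b) > 0) == labels)
print(f"probe accuracy: {accuracy:.2f}")
```

If the classifier recovers the labels well above chance, the representation is taken to encode the feature; the project repeats this kind of test across many morphological features and 39 languages.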

Prerequisites

  • Python
  • machine learning basics
  • English reading skills

Maximum number of participants: 5