Reading Group - GDR TAL
Posted on Fri, 01 Jan 2021 in misc
October 30, 2020
A multilingual view of unsupervised MT
Garcia, Foret, Sellam and Parikh
The authors devise a model for multilingual unsupervised MT. The main idea is as follows: consider \(L\) languages and model the joint distribution \(P(x, y, z, \ldots)\) (let us assume \(L=3\) for the sake of the argument) based on a multiplicity of monolingual or bilingual corpora. The translation parameters only appear in conditional models, so the main objective is a sum of marginal log-likelihoods of the form \(\log P_{\theta}(x) = \log \sum_{y,z} P_{\theta}(x|y,z)\,P(y,z)\), where the unobserved source sentences are handled as latent variables in the model. A major assumption is that we do not need both \(y\) and \(z\) to generate the translation \(x\); either one suffices, hence
\[
P_{\theta}(x|y,z) = P_{\theta}(x|y) = P_{\theta}(x|z) = \sqrt{P_{\theta}(x|y)\,P_{\theta}(x|z)}.
\]
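A one-line consequence worth spelling out (the rewriting is mine, not the paper's): taking logs, this assumption splits the generation term evenly between the two auxiliary languages,
\[
\log P_{\theta}(x|y,z) = \tfrac{1}{2}\log P_{\theta}(x|y) + \tfrac{1}{2}\log P_{\theta}(x|z),
\]
which is where the \(\tfrac{1}{2}\) factors in the bound below come from.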
Each term in the summation is lower bounded using Jensen's inequality, yielding for instance for the first term (writing \(Q(y,z|x)\) for the variational posterior):
\[
\log P_{\theta}(x) \;\ge\; \mathbb{E}_{Q(y,z|x)}\!\left[\tfrac{1}{2}\log P_{\theta}(x|y) + \tfrac{1}{2}\log P_{\theta}(x|z)\right] \;-\; \mathrm{KL}\!\left(Q(y,z|x)\,\|\,P(y,z)\right).
\]
It is interesting to see the first two terms as reconstruction terms after a back-translation.
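Concretely (my notation, not the paper's): if \(Q\) concentrates on the model's own translation \(\widehat{y} \approx \arg\max_y P_{\theta}(y|x)\), the first term reduces to a round-trip likelihood,
\[
\mathbb{E}_{Q(y,z|x)}\!\left[\tfrac{1}{2}\log P_{\theta}(x|y)\right] \approx \tfrac{1}{2}\log P_{\theta}(x\,|\,\widehat{y}),
\]
i.e. translate \(x\) into the \(y\)-language and score the reconstruction of \(x\) from \(\widehat{y}\); this is exactly the quantity the approximate E-step described below plugs in.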
As all these terms are expectations, one can try to use EM to maximise this bound: during the E-step, one must compute the posterior of \(y|x\), which is approximated by its \(\mathrm{argmax}\) \(\widehat{y}\) (itself computed approximately, by decoding); during the M-step, one should optimise over \(\theta\), but the authors instead propose to perform a single gradient-update step.
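To make this concrete, here is a minimal sketch of such an approximate EM loop on a toy single-token "translation" task. The models, names, and setup are my own illustration, not the authors' code (in the paper, the directions are parameterised by a single shared multilingual model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V = 8                          # toy vocabulary size
p_y_given_x = nn.Linear(V, V)  # "forward" model, used for the E-step decode
p_x_given_y = nn.Linear(V, V)  # "backward" model, trained in the M-step
opt = torch.optim.SGD(p_x_given_y.parameters(), lr=0.1)

def one_hot(i: int) -> torch.Tensor:
    return F.one_hot(torch.tensor(i), V).float()

for step in range(100):
    x = int(torch.randint(V, ()))  # a monolingual sample in language X

    # E-step (approximate): the posterior over y|x is collapsed to its
    # argmax, itself approximated by greedy decoding with the forward model.
    with torch.no_grad():
        y_hat = int(p_y_given_x(one_hot(x)).argmax())

    # M-step (approximate): rather than optimising theta to convergence,
    # take a single gradient step on the back-translation reconstruction
    # term -log P_theta(x | y_hat).
    loss = F.cross_entropy(p_x_given_y(one_hot(y_hat)).unsqueeze(0),
                           torch.tensor([x]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the full method the loop also alternates over the other languages and the other terms of the bound; the sketch only shows the reconstruction of \(x\) through \(\widehat{y}\).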