Resources for Machine Translation

English-French translation of Cochrane's systematic reviews

This corpus contains French translations of abstracts of systematic reviews published by Cochrane. It has three subparts. The largest consists of translations performed by human translators. The two smaller subparts were produced via post-editing: one contains post-edited output of Google's (S)MT engine as it was at the time, the other post-edited output of our in-house biomedical SMT engine.

This corpus is fully documented in this paper: Diagnosing High-Quality Statistical Machine Translation Using Traces of Post-Edition Operations (Julia Ive, Aurélien Max, François Yvon, Philippe Ravaud), In International Conference on Language Resources and Evaluation - Workshop on Translation Evaluation: From Fragmented Tools and Data Sets to an Integrated Ecosystem (MT Eval 2016), 2016.

You can get this corpus here.

The Trace corpus of translation errors

This corpus was developed during the French ANR/Trace project. It contains almost 7,000 French-to-English and 7,000 English-to-French translations and their post-editions by professional translators. Download

This corpus is described in this paper: Design and Analysis of a Large Corpus of Post-Edited Translations: Quality Estimation, Failure Analysis and the Variability of Post-Edition (Guillaume Wisniewski, Anil Kumar Singh, Natalia Segal, François Yvon), In Machine Translation Summit (MT Summit), 2013.