Parliamentary corpora

From Clarin K-Centre

We currently have no specific parliamentary corpora available, but are working on this topic in the framework of ParlaMint, a project that aims to bring together as many parliamentary corpora of different European languages as possible.

To this end, the different datasets must be converted to a uniform format and enriched with linguistic annotation. The INT (Instituut voor de Nederlandse Taal) will do this for the proceedings of the bilingual Belgian Federal Parliament (French and Dutch). The aim of the project is to provide research data suitable for targeted observation of trends, opinions and decision-making. This will be tested in a case study of the debate on the COVID-19 epidemic.

The data will be made available.

European Parliament data

The Europarl corpus is a parallel corpus extracted from the European Parliament web site by Philipp Koehn (University of Edinburgh). Its main intended use is to aid statistical machine translation research.