Resources for Language Technologies
-
DGT-Translation Memory
DGT-TM is a translation memory (sentences and their manually produced translations) in 24 languages. It contains segments from the Community law in force –...
PDF ZIP (45005 views) (4502 Downloads)
-
COVID-19 multilingual terminology in IATE
The dataset is a collection of multilingual entries related to the SARS-CoV-2 virus and the COVID-19 pandemic, available in IATE, the European Union terminology database. It is a...
Excel XLSX (1490 views) (122 Downloads)
-
Romanian – English parallel wordlists
English and Romanian lemmatized wordlists extracted from various resources (including RO-EN Wordnets, the Romanian – English news corpus, the Romanian – English literature corpus, and...
ZIP (885 views) (765 Downloads)
-
EJTN Handbook (Processed)
Handbook on judicial training (Processed). This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated...
ZIP (373 views) (266 Downloads)
-
Hallituskausi 2007-2011 fi-en
The "Hallituskausi 2007–2011" translation memory is intended for those translating administrative texts between Finnish and English. It includes key policy reports published by the...
XML PDF ZIP (414 views) (317 Downloads)
-
EUIPO - IP case law Italian-English (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action SMART...
ZIP (274 views) (175 Downloads)
-
Bilingual English-Norwegian parallel corpus from Norwegian Maritime Authority website
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action SMART...
ZIP (453 views) (340 Downloads)
-
Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed)
Letter of rights for persons arrested on the basis of a European Arrest Warrant (EAW), 1 page (Processed). This dataset has been created within the framework of the European Language...
ZIP (666 views) (557 Downloads)
-
National Health Fund Dataset (Processed)
The dataset is a 274K-token Polish-English parallel resource in XLIFF format, created from the "Diagnosis-Related Groups in Europe" publication of the Polish National Health Fund....
ZIP (345 views) (231 Downloads)
-
Letter of rights for persons arrested and/or detained
Police form, 12 pages. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation...
ZIP (413 views) (296 Downloads)
-
DA-EN Danish Ministry of Higher Education and Science 2
Danish-English parallel texts from the Danish Ministry of Higher Education and Science. Size: 115,000 words. Topic: research policy. This dataset has been created within the framework of...
ZIP (333 views) (209 Downloads)
-
The Coimisineir Teanga Bilingual Corpus of Reference Documents
General reference content from the Language Commissioner's Office. Size: 6 bilingual Word documents and 44 parallel Word documents. This dataset has been created within the framework...
ZIP (339 views) (248 Downloads)
-
The Gaois bilingual corpus of English-Irish legislation
Bilingual corpus of English-Irish legislation provided by the Department of Justice, in two parallel .txt files. Contains 98,758 parallel sentences. This dataset has been created within...
ZIP (432 views) (317 Downloads)
-
Corpus of State-related content from the Latvian Web (Processed)
Pages on the Latvian Web (home pages of ministries, state public services, the army, etc.) were crawled and parallel Latvian-English content was collected (Processed). This dataset has been created...
ZIP (451 views) (346 Downloads)
-
English-Slovak parallel corpus of texts from The Ministry of Culture of the Slovak Republic
Dataset of various English-Slovak legal texts within the Ministry's area of responsibility, in plain-text format aligned at the sentence level. Size: 105,791 words. This dataset has been created within...
ZIP (357 views) (249 Downloads)
-
Convention on the transfer of sentenced persons (English - Greek) (Processed)
Convention, additional protocol to the convention, recommendation R (84) 11 of the Council of Europe, templates on the approval/rejection of transfer requests regarding the convention on...
ZIP (498 views) (383 Downloads)
-
Romanian – English literature corpus
Bilingual Romanian-English literature corpus built from a small set of freely available literary works (drama, sci-fi, etc.). The texts are positionally aligned, i.e. the sentence on...
ZIP (411 views) (321 Downloads)
-
Translation memories from The Ministry of Foreign Affairs of Norway
Translation memories containing translations of EU legislative acts from English to Norwegian Bokmål.
XML PDF ZIP (663 views) (540 Downloads)
-
English-Estonian corpus from Finnish Information Bank (Processed)
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland This dataset has been created within the framework of the European...
ZIP (288 views) (186 Downloads)
-
English-Swedish corpus from Finnish Information Bank (Processed)
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland This dataset has been created within the framework of the European...
ZIP (432 views) (327 Downloads)