-
The UCD Bórd na Gaeilge Corpus of bilingual PDFs and Word documents
Parallel data provided by the language office at UCD (University College Dublin). Size: 3 Word documents, 67 PDFs. This dataset has been created within the framework of the European...
ZIP (294 views) (196 Downloads)
-
Parallel corpus from Parliament of Estonia
Parallel corpus compiled from the contents of the website of the Parliament of Estonia. This dataset has been created within the framework of the European Language Resource Coordination (ELRC)...
ZIP (407 views) (307 Downloads)
-
Belgian parallel corpus about Belgium and the justice system
An automatically aligned parallel corpus of well-translated Belgian texts in Dutch and French. The corpus contains texts about Belgium and the Belgian justice system, with over 100,000...
ZIP (596 views) (488 Downloads)
-
Corpus of Icelandic texts from the Central Bank of Iceland
Corpus of Icelandic texts from the Central Bank of Iceland. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe...
ZIP (440 views) (329 Downloads)
-
Croatian monolingual corpus of the Official journal of the Republic of Croatia
The Croatian monolingual corpus of the Official journal of the Republic of Croatia is formatted as a verticalized corpus with a line structure that resembles the simplified CoNLL...
ZIP (360 views) (273 Downloads)
-
Monolingual Polish corpus in the public administration domain (Processed)
Monolingual Polish corpus, containing 22372690 tokens and 1805280 lexical types in the public administration domain. This dataset has been created within the framework of the European...
ZIP (212 views) (115 Downloads)
-
OROSSIMO Corpus - Computer Science
A corpus of academic discourse texts belonging to the Computer Science domain (according to the Dewey Decimal classification, DDC00 - Computer science, knowledge & systems), annotated...
ZIP (368 views) (256 Downloads)
-
The Coimisineir Teanga Bilingual Web Corpus
Web content from the Language Commissioner's Office. Two TXT files containing 6808 words of parallel data. This dataset has been created within the framework of the European Language...
ZIP (268 views) (180 Downloads)
-
Czech Banking Association Terminology
Czech-English terms relating to finance. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility -...
XML PDF ZIP (578 views) (488 Downloads)
-
Corpus on Finance and Economics from Bank of Latvia (Processed)
Contents of the websites https://makroekonomika.lv/ (Latvian) and https://www.macroeconomics.lv/ (English), aligned as a parallel corpus. This dataset has been created within the...
ZIP (285 views) (184 Downloads)
-
2015 Calls for Tenders for Translation
Contains monolingual Netherlands Dutch texts with the 2015 calls for tenders for translation work for the child welfare office, the office for prisoner rehabilitation and for the ministry...
ZIP (562 views) (458 Downloads)
-
Polish Ministry of Foreign Affairs Regional Dataset
A collection of Polish-English whitepapers published by the Polish Ministry of Foreign Affairs, including "Eastern Partnership" (10K words in 492 segments) and "Poland's 10 years in the...
XML PDF ZIP (547 views) (437 Downloads)
-
The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th edition (TSK 49)
The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th edition (TSK 49) contains information on more than 500 concepts in term records and concept diagrams. The concepts...
XML PDF ZIP (511 views) (421 Downloads)
-
Documents concerning Federal Constitutional Law in Austria
Alignment documents concerning Austrian Federal Constitutional Law. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting...
ZIP (458 views) (360 Downloads)
-
Guidelines - Judicial maps in Bulgarian
Guidelines on the establishment of judicial mapping, in Bulgarian. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe...
ZIP (283 views) (189 Downloads)
-
English-Danish EASTIN-CL Multilingual Ontology of Assistive Technology (Processed)
The EASTIN-CL Multilingual Ontology of Assistive Technology was created within the EASTIN-CL project, which aimed at applying language technologies to a portal of assistive technologies...
ZIP (424 views) (316 Downloads)
-
Translation of the Luxembourg.lu web site
Translation of the Luxembourg.lu web site, consisting of 90293 translation units in French, German and English. This dataset has been created within the framework of the European Language...
XML PDF ZIP (389 views) (290 Downloads)
-
DA-EN Danish Ministry of Higher Education and Science 3
Parallel Danish-English texts from the Danish Ministry of Higher Education and Science. Size: 110,000 words. Topic: research policy. This dataset has been created within the framework of...
ZIP (366 views) (264 Downloads)
-
Secretariat-General parallel corpus SL-EN and EN-SL (part 2)
English-Slovenian parallel corpus in TMX format from the Secretariat-General of the Government of the Republic of Slovenia, in the legal domain. This dataset has been created within the...
XML PDF ZIP (156 views) (128 Downloads)
-
Convention against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment - United Nations (French-English-Greek)
English text of the Convention against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment (United Nations) and the ratifying bilingual (French - Greek) Greek law...
ZIP (408 views) (300 Downloads)