-
DGT-Translation Memory
DGT-TM is a translation memory (sentences and their manually produced translations) in 24 languages. It contains segments from the Acquis Communautaire, the body of European legislation,...
ZIP (45005 views) (4502 Downloads)
-
EuroVoc
EuroVoc is a multilingual, multidisciplinary thesaurus covering the activities of the EU. It contains terms in 24 EU languages (Bulgarian, Croatian, Czech, Danish, Dutch,...
XML HTML RDF XML ZIP (37050 views) (319 Downloads)
-
[DEPRECATED] Official Journals of the European Union (Irish)
This dataset has been deprecated and is now replaced by the following datasets: Official Journals of the European Union 2021, Official Journals of the European Union 2020...
PDF HTML Formex 4 ZIP Excel XLS (1308 views) (1177 Downloads)
-
The Coimisineir Teanga Bilingual Corpus of Reference Documents
General reference content from the Language Commissioner's Office. Size: 6 bilingual Word documents and 44 parallel Word documents. This dataset has been created within the framework...
ZIP (339 views) (248 Downloads)
-
The Gaois bilingual corpus of English-Irish legislation
Bilingual corpus of English-Irish legislation provided by the Department of Justice, in two parallel .txt files. Contains 98,758 parallel sentences. This dataset has been created within...
ZIP (432 views) (317 Downloads)
-
Irish Monolingual Corpus from contents of health.gov.ie web site
Irish monolingual corpus built from the contents of the health.gov.ie website. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting...
ZIP (252 views) (163 Downloads)
-
Citizens Information Bilingual Web-Corpus (Processed)
A web corpus crawled from http://www.citizensinformation.ie. Contains 10,297 parallel sentences of English/Irish that have undergone manual cleaning. May be reproduced and/or re-used free...
ZIP (243 views) (156 Downloads)
-
English-Irish website parallel corpus (Processed)
A parallel corpus of bilingual texts crawled from multilingual websites, containing 1134 translation units (TUs). Manual validation has been performed on a sample of the data. This dataset...
ZIP (214 views) (126 Downloads)
-
Legal acts of Ireland as Irish Monolingual Corpus
Irish monolingual corpus of the legal acts of Ireland, collected from documents on the http://acts.ie/ website. This dataset has been created within the framework of the European Language...
ZIP (248 views) (157 Downloads)
-
The Coimisineir Teanga Bilingual Web Corpus (Processed)
Web content from the Language Commissioner's Office. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility -...
ZIP (288 views) (188 Downloads)
-
The UCD Bord na Gaeilge Corpus of bilingual PDFs and Word documents (Processed)
Parallel data provided by the language office at UCD (University College Dublin). This dataset has been created within the framework of the European Language Resource Coordination (ELRC)...
ZIP (291 views) (189 Downloads)
-
The Udáras na Gaeltachta Corpus of bilingual PDFs and Word documents (Processed)
Information brochures and leaflets. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated...
ZIP (256 views) (167 Downloads)
-
The UCD Bord na Gaeilge Corpus of bilingual PDFs and Word documents
Parallel data provided by the language office at UCD (University College Dublin). Size: 3 Word documents, 67 PDFs. This dataset has been created within the framework of the European...
ZIP (294 views) (196 Downloads)
-
The Coimisineir Teanga Bilingual Web Corpus
Web content from the Language Commissioner's Office. Two TXT files containing 6808 words of parallel data. This dataset has been created within the framework of the European Language...
ZIP (268 views) (180 Downloads)
-
The Udáras na Gaeltachta Corpus of bilingual PDFs and Word documents
Word documents and PDF files of information brochures and leaflets. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting...
ZIP (292 views) (219 Downloads)
-
The Coimisineir Teanga Bilingual Corpus of Reports and Press Releases
Reports and Press Release data from the Language Commissioner's Office. 19 parallel Word documents. This dataset has been created within the framework of the European Language Resource...
ZIP (258 views) (156 Downloads)