Resources for Language Technologies
-
DGT-Translation Memory
DGT-TM is a translation memory (sentences and their manually produced translations) in 24 languages. It contains segments from the Acquis Communautaire, the body of European legislation,...
PDF ZIP (45005 views) (4502 Downloads)
-
COVID-19 multilingual terminology in IATE
The dataset is a collection of multilingual entries related to the SARS-CoV-2 virus and the COVID-19 pandemic, available in IATE, the European Union terminology database. It is a...
Excel XLSX (1490 views) (122 Downloads)
-
Monolingual corpus from Minutes of the Plenary Sessions of the Croatian Parliament (2016-2018) (Processed)
Minutes of the Plenary Sessions of the Croatian Parliament (2016-2018) were downloaded from http://edoc.sabor.hr. This dataset has been created within the framework of the European...
ZIP (169 views) (85 Downloads)
-
English-Croatian translation memory from the Ministry of Regional Development and EU Funds (Processed)
A translation memory in tmx format with source texts from the Ministry of Regional Development and EU Funds and translations in Croatian by Ciklopea d.o.o. This dataset has been created...
ZIP (323 views) (211 Downloads)
-
English-Croatian translation memory from the Ministry of Agriculture (Processed)
A translation memory in tmx format with source texts from the Ministry of Agriculture and translations in Croatian by Ciklopea d.o.o. This dataset has been created within the framework...
ZIP (288 views) (174 Downloads)
-
Croatian-English translation memory from the Ministry of Agriculture (Part 2) (Processed)
A translation memory in tmx format with source texts from the Ministry of Agriculture and translations in English by Ciklopea d.o.o. This dataset has been created within the framework of...
ZIP (235 views) (129 Downloads)
-
Croatian-English translation memory from the Ministry of Regional Development and EU Funds (Part 1) (Processed)
A translation memory in tmx format with source texts from the Ministry of Regional Development and EU Funds and translations in English by Ciklopea d.o.o. This dataset has been created...
ZIP (301 views) (181 Downloads)
-
Croatian-English glossary of statistical terms (Processed)
Croatian-English glossary of statistical terms from the Croatian Bureau of Statistics website. In its unprocessed form, it comprises circa 1760 terms, of which some include alternatives....
ZIP (218 views) (134 Downloads)
-
Croatian-English parallel corpus from the website of the Embassy of Finland, Zagreb (Processed)
Croatian-English parallel corpus from the website of the Embassy of Finland, Zagreb (http://www.finland.hr). This dataset has been created within the framework of the European Language...
ZIP (363 views) (272 Downloads)
-
Croatian-English corpus with studies on the challenges to the Croatian Accession to the European Union from the Croatian Institute of Public Finance website (Processed)
Croatian-English corpus with studies on the challenges to the Croatian Accession to the European Union from the Croatian Institute of Public Finance website. This dataset has been...
ZIP (336 views) (241 Downloads)
-
Croatian-English translation memory from the Ministry of Agriculture (Part 1) (Processed)
A translation memory in tmx format with source texts from the Ministry of Agriculture and translations in English by Ciklopea d.o.o. This dataset has been created within the framework of...
ZIP (256 views) (161 Downloads)
-
Croatian-English parallel corpus from the website of the Croatian Journal of Fisheries (Processed)
Croatian-English parallel corpus from the website of the Croatian Journal of Fisheries (https://ribarstvo.agr.hr/). This dataset has been created within the framework of the European...
ZIP (374 views) (263 Downloads)
-
University of Vienna Termbanks
Three termbanks covering Risk Management, Austrian Asylum Law, and University Law/Education Administration. This dataset has been created within the framework of the European Language Resource...
XML PDF ZIP (596 views) (487 Downloads)
-
ZZP. Cisco Academy terminology (Processed)
IT terminology created as part of Cisco Academy. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility -...
ZIP (275 views) (181 Downloads)
-
Croatian-English corpus with the Rural Development Programme for the Period 2014-2020 from the Croatian Rural Development Programme website (Processed)
Croatian-English corpus with the Rural Development Programme for the Period 2014-2020 from the Croatian Rural Development Programme website. This dataset has been created within the...
ZIP (370 views) (267 Downloads)
-
Croatian-English translation memory from the Ministry of Regional Development and EU Funds (Part 2) (Processed)
A translation memory in tmx format with source texts from the Ministry of Regional Development and EU Funds and translations in English by Ciklopea d.o.o. This dataset has been created...
ZIP (307 views) (191 Downloads)
-
English - Croatian parallel corpus from texts of the Swedish Crime Victim Compensation and Support Authority (Brottsoffermyndigheten) web site (Processed)
A collection of information sheets from the Swedish Crime Victim Compensation and Support Authority (Brottsoffermyndigheten). Each folder contains a set of parallel texts, all of them...
ZIP (288 views) (177 Downloads)
-
Parallel texts from the Swedish Migration Board - Migrationsverket (English-Croatian part) (Processed)
All texts have been collected from the website of the Swedish Migration Board. This dataset has been created within the framework of the European Language Resource Coordination (ELRC)...
ZIP (198 views) (112 Downloads)
-
Croatian monolingual corpus of the Official journal of the Republic of Croatia (Processed)
The corpus comprises texts published in the Official journal of the Republic of Croatia from 1992 to 2013. This dataset has been created within the framework of the European Language...
ZIP (276 views) (182 Downloads)
-
Bilingual hr-en parallel corpus from the National and University Library in Zagreb website (Processed)
The contents of http://www.nsk.hr were crawled, aligned at document and sentence level, and converted into a parallel corpus. This dataset has been created within the framework of the European...
ZIP (384 views) (290 Downloads)