Resources for Language Technologies
-
Bilingual English-Danish parallel corpus from The Danish Medicines Agency website
Contents of https://laegemiddelstyrelsen.dk were crawled, aligned at the document and sentence level, and converted into a parallel corpus. This dataset has been created within the framework...
ZIP (340 views) (233 downloads)
-
ENGLISH/POLISH PHRASE BOOK FOR ADMINISTRATIVE STAFF of LOCAL GOVERNMENT UNITS (Processed)
An English/Polish phrase book for the administrative staff of local government units (LGUs). This dataset has been created within the framework of the European Language Resource...
ZIP (612 views) (514 downloads)
-
Compendium The Social Insurance Institution (Processed)
A compendium on the Polish Social Insurance Institution (ZUS), covering the following issues: short presentation of ZUS, its history, tasks, organizational structure, employees, Social...
ZIP (323 views) (203 downloads)
-
English-Slovak parallel corpus of texts from The Ministry of Culture of the Slovak Republic (Processed)
Dataset of various English-Slovak legal texts within the agenda of the Ministry, in plain text format aligned at the sentence level; size: 105791 words. It is converted into a 2609-TUs...
ZIP (512 views) (400 downloads)
-
Romanian - English literature corpus (Processed)
Bilingual Romanian – English literature corpus built from a small set of freely available literature books (drama, sci-fi, etc.). The texts are positionally aligned, i.e. the sentence on...
ZIP (594 views) (474 downloads)
-
Central Statistical Office Dataset (Processed)
Two Polish-English publications of the Polish Central Statistical Office in the XLIFF format: 1. "Statistical Yearbook of the Republic of Poland 2015" is the main summary publication...
ZIP (431 views) (326 downloads)
-
The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th edition (TSK 49)
The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th edition (TSK 49) contains information on more than 500 concepts in term records and concept diagrams. The concepts...
XML PDF ZIP (511 views) (421 downloads)
-
English-Danish EASTIN-CL Multilingual Ontology of Assistive Technology (Processed)
EASTIN-CL Multilingual Ontology of Assistive Technology was created within the EASTIN-CL project aimed at applying language technologies to portal of assistive technologies...
ZIP (424 views) (316 downloads)
-
Monolingual documents from the Government of Lithuania
Monolingual documents received from the Government of the Republic of Lithuania. This dataset has been created within the framework of the European Language Resource Coordination (ELRC)...
ZIP (299 views) (185 downloads)
-
Parallel texts from Swedish Labour market agency
Parallel texts, all as PDF files, have been gathered from Arbetsförmedlingen. The language of each document is indicated in its title. The original version is always in Swedish (with...
ZIP (357 views) (249 downloads)
-
OROSSIMO Corpus - Medicine & health
A corpus of academic discourse texts belonging to the Medicine & health domain (according to the Dewey Decimal classification, DDC61 - Medicine & health), annotated at structural...
ZIP (568 views) (481 downloads)
-
English-Danish Parallel corpus from Tatoeba project (Processed)
Parallel corpus of English-Danish translations from the tatoeba.org website. This dataset has been created within the framework of the European Language Resource Coordination (ELRC)...
ZIP (566 views) (466 downloads)
-
Expression of interest
International call for expression of interest for the selection of the President of the Hellenic Statistical Authority (EL.STAT.). This dataset has been created within the framework of...
ZIP (528 views) (422 downloads)
-
English-Danish Parallel corpus from Tatoeba project
Parallel corpus of English-Danish translations from the tatoeba.org website. This dataset has been created within the framework of the European Language Resource Coordination (ELRC)...
XML PDF ZIP (958 views) (815 downloads)
-
English-Estonian EASTIN-CL Multilingual Ontology of Assistive Technology
EASTIN-CL Multilingual Ontology of Assistive Technology was created within the EASTIN-CL project aimed at applying language technologies to portal of assistive technologies...
XML PDF ZIP (530 views) (427 downloads)
-
Slovak corpus of texts from the Ministry of Culture of the Slovak Republic
Dataset of Slovak legal texts within the agenda of the Ministry, in plain text format; size: 108448 words. This dataset has been created within the framework of the European Language...
ZIP (560 views) (438 downloads)
-
Collection of Greek National Spatial Plans
Dataset, 268 KB, 5 txt files, national spatial plans (general, aquaculture, tourism, industry, RES, detention facilities). This dataset has been created within the framework of the...
ZIP (211 views) (170 downloads)
-
Central Statistical Office Dataset
Two Polish-English publications of the Polish Central Statistical Office in the XLIFF format: 1. "Statistical Yearbook of the Republic of Poland 2015" is the main summary publication...
XML PDF ZIP (663 views) (565 downloads)
-
English-Lithuanian EASTIN-CL Multilingual Ontology of Assistive Technology (Processed)
EASTIN-CL Multilingual Ontology of Assistive Technology was created within the EASTIN-CL project aimed at applying language technologies to portal of assistive technologies...
ZIP (368 views) (256 downloads)
-
OROSSIMO Corpus - Photography, film & video
A corpus of academic discourse texts belonging to the Photography, film & video domain (according to the Dewey Decimal classification, DDC77 - Photography, computer art, film &...
ZIP (572 views) (447 downloads)