Resources for Language Technologies
-
Romanian – English parallel wordlists
English and Romanian lemmatized wordlists extracted from various resources (including RO-EN Wordnets, the Romanian – English news corpus, the Romanian – English literature corpus, and...
ZIP (885 visninger) (765 Downloads)
-
National Health Fund Dataset (Processed)
The dataset is a 274K-token Polish-English parallel resource in XLIFF format created on the basis of "Diagnosis-Related Groups in Europe" publication of the Polish National Health Fund....
ZIP (345 visninger) (231 Downloads)
-
English-Slovak parallel corpus of texts from The Ministry of Culture of the Slovak Republic
Dataset of various English-Slovak legal texts within agenda of the Ministry, plain text format alligned at the sentence level, the size: 105791 words This dataset has been created within...
ZIP (357 visninger) (249 Downloads)
-
Romanian – English literature corpus
Bilingual Romanian - English literature corpus built from a small set of freely available literature books (drama, sci-fi, etc.). The texts are positionally aligned, i.e. the sentence on...
ZIP (411 visninger) (321 Downloads)
-
English-Estonian corpus from Finnish Information Bank (Processed)
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland This dataset has been created within the framework of the European...
ZIP (288 visninger) (186 Downloads)
-
English-Swedish corpus from Finnish Information Bank (Processed)
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland This dataset has been created within the framework of the European...
ZIP (432 visninger) (327 Downloads)
-
English-Finnish corpus from Finnish Information Bank (Processed)
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland This dataset has been created within the framework of the European...
ZIP (496 visninger) (378 Downloads)
-
English-Estonian corpus from Finnish Information Bank
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland This dataset has been created within the framework of the European...
XML PDF ZIP (439 visninger) (337 Downloads)
-
English-Swedish corpus from Finnish Information Bank
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland This dataset has been created within the framework of the European...
XML PDF ZIP (641 visninger) (524 Downloads)
-
English-Finnish corpus from Finnish Information Bank
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland This dataset has been created within the framework of the European...
XML PDF ZIP (850 visninger) (724 Downloads)
-
Parallel texts from Swedish Labour market agency (Processed)
Parallel texts, all in pdf files, have been gathered from Arbetsförmedlingen. The language of each document is indicated in its title. The original version is always in Swedish (with...
ZIP (334 visninger) (237 Downloads)
-
English-Estonian EASTIN-CL Multilingual Ontology of Assistive Technology (Processed)
EASTIN-CL Multilingual Ontology of Assistive Technology was created within the EASTIN-CL project aimed at applying language technologies to portal of assistive technologies...
ZIP (534 visninger) (420 Downloads)
-
Term lists and Dictionaries from Swedish Authorities
This resource also includes a Dictionary from the ELMN that has a set of terms translated from English to all the EU languages. The list of languages that is indicated with this resource...
ZIP (478 visninger) (365 Downloads)
-
Expression of interest (Processed)
International call for expression of interest for the selection of the President of the Hellenic Statistical Authority (EL.STAT.) This dataset has been created within the framework of...
ZIP (426 visninger) (309 Downloads)
-
Parallel texts from Swedish Work environment Authority (Processed)
Parallel texts from the Swedish Work Environment authority, all in pdf format. Original in Swedish, all the other texts are translations. One original with translations per folder....
ZIP (615 visninger) (487 Downloads)
-
Parallel texts from Swedish Labour market agency. Part 2 (Processed)
Same as part 1, but with the Readme-file. (Processed) This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility...
ZIP (439 visninger) (332 Downloads)
-
Romanian-English corpus with studies, reports and statistical data in the field of culture from the National Institute for Cultural Research and Training website (Processed)
Romanian-English corpus with studies, reports and statistical data in the field of culture from the National Institute for Cultural Research and Training website This dataset has been...
ZIP (362 visninger) (254 Downloads)
-
English-Swedish parallel corpus from the web site of the Swedish Migration Board - Migrationsverket (Processed)
All texts have been collected from their website of the Swedish Migration Board. The original text is always in Swedish, the other texts are translations from Swedish. This dataset has...
ZIP (312 visninger) (221 Downloads)
-
Orossimo Terminological Resource - Medicine & health
A bilingual terminological glossary extracted from academic discourse texts belonging to the Medicine & health domain. This dataset has been created within the framework of the...
XML PDF ZIP (766 visninger) (651 Downloads)
-
Bilingual English-Danish parallel corpus from Aarhus 2017 - European Capital of Culture website
Contents of http://www.aarhus2017.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of the...
ZIP (508 visninger) (391 Downloads)