Resources for Language Technologies
-
COVID-19 multilingual terminology in IATE
The dataset is a collection of multilingual entries related to the SARS-CoV-2 virus and the COVID-19 pandemic, available in IATE, the European Union terminology database. It is a...
Excel XLSX (1490 views) (122 downloads)
-
Romanian – English parallel wordlists
English and Romanian lemmatized wordlists extracted from various resources (including RO-EN Wordnets, the Romanian – English news corpus, the Romanian – English literature corpus, and...
ZIP (885 views) (765 downloads)
-
National Health Fund Dataset (Processed)
The dataset is a 274K-token Polish-English parallel resource in XLIFF format created on the basis of the "Diagnosis-Related Groups in Europe" publication of the Polish National Health Fund....
ZIP (345 views) (231 downloads)
-
English-Slovak parallel corpus of texts from The Ministry of Culture of the Slovak Republic
Dataset of various English-Slovak legal texts within the agenda of the Ministry, in plain text format aligned at the sentence level; size: 105791 words. This dataset has been created within...
ZIP (357 views) (249 downloads)
-
Romanian – English literature corpus
Bilingual Romanian - English literature corpus built from a small set of freely available literature books (drama, sci-fi, etc.). The texts are positionally aligned, i.e. the sentence on...
ZIP (411 views) (321 downloads)
-
Corpus RIZIV
Corpus of Dutch and French texts from the national institute for illness and invalidity insurance.
ZIP (625 views) (545 downloads)
-
English-Estonian corpus from Finnish Information Bank (Processed)
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland. This dataset has been created within the framework of the European...
ZIP (288 views) (186 downloads)
-
English-Swedish corpus from Finnish Information Bank (Processed)
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland. This dataset has been created within the framework of the European...
ZIP (432 views) (327 downloads)
-
English-Finnish corpus from Finnish Information Bank (Processed)
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland. This dataset has been created within the framework of the European...
ZIP (496 views) (378 downloads)
-
English-Estonian corpus from Finnish Information Bank
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland. This dataset has been created within the framework of the European...
XML PDF ZIP (439 views) (337 downloads)
-
English-Swedish corpus from Finnish Information Bank
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland. This dataset has been created within the framework of the European...
XML PDF ZIP (641 views) (524 downloads)
-
English-Finnish corpus from Finnish Information Bank
http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland. This dataset has been created within the framework of the European...
XML PDF ZIP (850 views) (724 downloads)
-
English-Estonian EASTIN-CL Multilingual Ontology of Assistive Technology (Processed)
EASTIN-CL Multilingual Ontology of Assistive Technology was created within the EASTIN-CL project aimed at applying language technologies to portal of assistive technologies...
ZIP (534 views) (420 downloads)
-
Romanian-English corpus with studies, reports and statistical data in the field of culture from the National Institute for Cultural Research and Training website (Processed)
Romanian-English corpus with studies, reports and statistical data in the field of culture from the National Institute for Cultural Research and Training website. This dataset has been...
ZIP (362 views) (254 downloads)
-
English-Swedish parallel corpus from the web site of the Swedish Migration Board - Migrationsverket (Processed)
All texts have been collected from the website of the Swedish Migration Board. The original text is always in Swedish; the other texts are translations from Swedish. This dataset has...
ZIP (312 views) (221 downloads)
-
Orossimo Terminological Resource - Medicine & health
A bilingual terminological glossary extracted from academic discourse texts belonging to the Medicine & health domain. This dataset has been created within the framework of the...
XML PDF ZIP (766 views) (651 downloads)
-
Bilingual English-Danish parallel corpus from Aarhus 2017 - European Capital of Culture website
Contents of http://www.aarhus2017.dk were crawled, aligned at the document and sentence level, and converted into a parallel corpus. This dataset has been created within the framework of the...
ZIP (508 views) (391 downloads)
-
Polish-English parallel corpus from the website of Public Employment Services in Poland (member of EURES network) (Processed)
Polish-English parallel corpus from the website of Public Employment Services in Poland (member of EURES network, https://eures.praca.gov.pl). This dataset has been created within the...
ZIP (549 views) (440 downloads)
-
Bilingual Icelandic-English parallel corpus from Statistics Iceland website
Contents of the https://www.statice.is and https://hagstofa.is/ websites were downloaded, aligned and converted into a parallel corpus. This dataset has been created within the framework of the...
ZIP (416 views) (317 downloads)
-
Monolingual documents from the Government of Lithuania (Processed)
Monolingual documents received from the Government of the Republic of Lithuania. (Processed) This dataset has been created within the framework of the European Language Resource...
ZIP (490 views) (376 downloads)