Resources for Language Technologies
-
Translation Memory of the Directorate-General for Translation (DGT-TM)
DGT-TM is a translation memory (sentences and their translations) available in 24 languages. It contains segments from the Acquis Communautaire, the body of legislation of the...
PDF ZIP (45005 views) (4502 downloads)
-
COVID-19 multilingual terminology in IATE
The dataset is a collection of multilingual entries related to the SARS-CoV-2 virus and the COVID-19 pandemic, available in IATE, the European Union terminology database. It is a...
Excel XLSX (1490 views) (122 downloads)
-
Romanian – English parallel wordlists
English and Romanian lemmatized wordlists extracted from various resources (including RO-EN Wordnets, the Romanian – English news corpus, the Romanian – English literature corpus, and...
ZIP (885 views) (765 downloads)
-
Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed)
Letter of rights for persons arrested on the basis of a European Arrest Warrant (EAW), 1 page (Processed). This dataset has been created within the framework of the European Language...
ZIP (666 views) (557 downloads)
-
Letter of rights for persons arrested and or detained
Police form, 12 pages. This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation...
ZIP (413 views) (296 downloads)
-
Romanian – English literature corpus
Bilingual Romanian - English literature corpus built from a small set of freely available literature books (drama, sci-fi, etc.). The texts are positionally aligned, i.e. the sentence on...
ZIP (411 views) (321 downloads)
-
Romanian – English parallel wordlists (Processed)
English and Romanian lemmatized wordlists extracted from various resources (including RO-EN Wordnets, the Romanian – English news corpus, the Romanian – English literature corpus, and...
ZIP (297 views) (198 downloads)
-
EIR Romanian-English TM (ECHR-33234/12) (Processed)
Converted ECHR translation memory EN-RO (CASE OF AL NASHIRI v. ROMANIA - Application no. 33234/12). This dataset has been created within the framework of the European Language Resource...
ZIP (276 views) (169 downloads)
-
EIR Romanian-English Newsletter (2009-March 2011) (Processed)
Translation units were extracted from a collection of 392 files (386 Word and 6 Excel files) in the domain of European affairs (EIR's 4 main key areas: studies, training, translation...
ZIP (280 views) (180 downloads)
-
Parallel Global Voices (English - Romanian) (Processed)
Parallel Global Voices EN-RO is a parallel corpus generated from the Global Voices multilingual group of websites (http://globalvoices.org/), where volunteers publish and translate news...
ZIP (280 views) (181 downloads)
-
Monolingual Romanian corpus in the public administration domain (Processed)
Monolingual Romanian corpus, containing 360833 sentences (9064764 words) in the public administration domain. This dataset has been created within the framework of the European Language...
ZIP (295 views) (186 downloads)
-
Romanian Parliament Transcripts 1996-2018 (Processed)
The data was obtained from the cdep.ro website and contains 500k+ instances of speech from the parliament podium from 1996 to 2018. Sentence splitting and deduplication at sentence level have...
ZIP (210 views) (127 downloads)
-
Parallel texts from Swedish Labour market agency (Processed)
Parallel texts, all in PDF files, have been gathered from Arbetsförmedlingen. The language of each document is indicated in its title. The original version is always in Swedish (with...
ZIP (334 views) (237 downloads)
-
EUIPO - Trade mark Guidelines (October 2017) (English-Romanian) (Processed)
The EUIPO Guidelines are the main point of reference for users of the European Union trade mark system and professional advisers who want to make sure they have the latest information on...
ZIP (239 views) (141 downloads)
-
EIR terminology (banking) (RO-EN) (Processed)
Banking terms (RO, EN). This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation...
ZIP (261 views) (173 downloads)
-
Parallel texts from Swedish Work environment Authority (Processed)
Parallel texts from the Swedish Work Environment Authority, all in PDF format. The original is in Swedish; all the other texts are translations. One original with translations per folder....
ZIP (615 views) (487 downloads)
-
Letter of rights for persons arrested and or detained (Processed)
Collection of translation units (1906 in total) in 21 language pairs extracted from 7 Police forms (one form 12 pages long in each of the following languages: BG, EL, EN, FR, LV, PL, RO)....
ZIP (452 views) (338 downloads)
-
Parallel texts from Swedish Labour market agency. Part 2 (Processed)
Same as part 1, but with the Readme-file. (Processed) This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility...
ZIP (439 views) (332 downloads)
-
EIR terminology (legal) (RO-EN) (Processed)
Legal terminology (CJUE: legal glossary and entries extracted from the Treaty of Lisbon; RO, EN). This dataset has been created within the framework of the European Language...
ZIP (191 views) (121 downloads)
-
EIR Romanian-English SPOS (2011-2017) (Processed)
Translation units were extracted from 18 Word files (9 Romanian and 9 English) in the field of European Affairs - Strategy and Policy Studies (SPOS); 101 849 words (in Romanian). This...
ZIP (251 views) (150 downloads)