Maltese-English website parallel corpus (Processed)
Description
This is a parallel corpus of bilingual texts crawled from multilingual websites; it contains 26,622 translation units (TUs). Date of crawling: 16/12/2016.

A strict validation process has been followed, which resulted in discarding:
- TUs from crawled websites that do not comply with the PSI directive,
- TUs identified during the manual validation process, and
- all TUs from websites whose error rate in the sample extracted for manual validation is strictly above any of the following thresholds: 50% of TUs with language identification errors, 50% of TUs with alignment errors, 50% of TUs with tokenization errors, 20% of TUs identified as machine-translated content, 50% of TUs with translation errors.
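The discard rules above can be sketched as a small filter. This is a minimal illustration under stated assumptions, not the actual ELRC validation tooling: the error-type keys, the per-site `error_rates` dictionary of sample fractions, and the `"id"` field on each TU are hypothetical names chosen for the example. The thresholds and the "strictly above" comparison come from the description.

```python
from typing import Dict, Iterable, List, Set

# Site-level thresholds from the description. A website is dropped if the
# error rate in its validation sample is strictly above any of them.
# The dictionary keys are hypothetical names, not ELRC identifiers.
THRESHOLDS: Dict[str, float] = {
    "language_identification": 0.50,
    "alignment": 0.50,
    "tokenization": 0.50,
    "machine_translated": 0.20,
    "translation": 0.50,
}


def keep_website(psi_compliant: bool, error_rates: Dict[str, float]) -> bool:
    """Return True if a website's TUs pass the site-level filters."""
    if not psi_compliant:
        return False  # sites not complying with the PSI directive are dropped
    for error_type, threshold in THRESHOLDS.items():
        # "Strictly above": a rate exactly equal to the threshold is accepted.
        if error_rates.get(error_type, 0.0) > threshold:
            return False
    return True


def drop_flagged_tus(tus: Iterable[dict], flagged_ids: Set[str]) -> List[dict]:
    """Remove individual TUs flagged during manual validation.

    Assumes each TU is a dict with a hypothetical "id" key.
    """
    return [tu for tu in tus if tu["id"] not in flagged_ids]
```

For example, a PSI-compliant site whose sample shows exactly 50% alignment errors is kept, while one with 25% machine-translated TUs is discarded, because only the machine-translation threshold is set at 20%.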
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) actions SMART 2014/1074 and SMART 2015/1091. For further information on the project: http://lr-coordination.eu.
Eurovoc domains
- Identifier
- ELRC_806
- Landing page
- http://data.europa.eu/euodp/en/data/dataset/elrc_806
- Date of last modification
- 2018-09-28
- Language
- Maltese, English
- Catalogue
- European Union Open Data Portal