AI and Quality of Data
Data scientists spend roughly 80 percent of their time cleaning and preparing data to make it usable, leaving only 20 percent for actual analysis. As policymakers pursue national strategies to increase their competitiveness in Artificial Intelligence (AI), they should recognize that any country that wants to lead in AI must also lead in data quality. In our previous news piece on ‘the rise of AI and open data’, we noted that data is the lifeblood of AI: it provides the information from which algorithms learn. Policymakers should therefore treat increasing the supply of high-quality data as a valuable opportunity to accelerate AI development and adoption.
Policymakers can increase the amount of high-quality data available for AI in three ways:
- Require the public sector to provide high-quality data
- Promote the voluntary provision of high-quality data from the private and non-profit sectors
- Accelerate efforts to digitize all sectors of the economy to support comprehensive data collection
In recent years, policymakers have emphasized the importance of making data available for AI. Open government data can be a valuable platform for innovation, but these datasets also suffer from data quality problems (e.g. a lack of standard identifiers and inconsistent definitions) that make analysis difficult. Thus, policymakers should invest in efforts to improve the government’s existing data, as well as direct government agencies to develop shared pools of high-quality, application-specific training and validation data.
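To see why a lack of standard identifiers makes open datasets hard to combine, consider a minimal sketch. The agency names, figures, and normalization rules below are invented for illustration, not drawn from any real dataset: two datasets describe the same agencies, but a naive join on their raw names matches nothing until the names are mapped to a shared identifier.

```python
# Hypothetical illustration: two open-government datasets describing the
# same agencies under inconsistent identifiers (names and figures invented).
budget = {"Dept. of Health": 120, "dept of education": 95}
staff = {"Department of Health": 3400, "Department of Education": 2100}

def normalize(name: str) -> str:
    """Map free-text agency names to a shared canonical identifier."""
    key = name.lower().replace(".", "").replace("department", "dept")
    return " ".join(key.split())  # collapse extra whitespace

def join(a: dict, b: dict) -> dict:
    """Inner-join two datasets on the normalized identifier."""
    a_norm = {normalize(k): v for k, v in a.items()}
    b_norm = {normalize(k): v for k, v in b.items()}
    return {k: (a_norm[k], b_norm[k]) for k in a_norm if k in b_norm}

# Raw names share no exact matches, so a naive join is empty;
# after normalization, both records line up.
naive = {k: (budget[k], staff[k]) for k in budget if k in staff}
print(naive)                 # {}
print(join(budget, staff))
```

In practice this cleaning step is exactly where the "80 percent" of data scientists' time goes; publishing data with standard identifiers in the first place removes the need for ad hoc normalization rules like the one sketched here.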