Local open data: Why and where to start?
When Brett Goldstein started his job as the City of Chicago’s Chief Data Officer in 2011, the city had already taken its first steps in publishing open data. It had an open data portal with some datasets and an open data community interested in engaging with the data, but overall not much progress was being made. Everything changed when, in the same year, a new mayor took office for whom data-driven decision-making and opening up the city’s vast data resources to the public were priorities. Just a few years later, Chicago’s open data program had become a success story that has since attracted worldwide attention.
Although Estonia’s local administrations cannot really be compared to Chicago (not least in terms of resources), several lessons from the city’s experience still apply. In a story summarizing his experience, Goldstein describes the steps that helped the city take its open data program to new heights. First, the city leadership’s interest in open data secured sufficient resources for the agenda, including the recruitment of a Chief Data Officer (CDO) whose portfolio included open data. Second, the city made a real effort to quickly release some of the most requested datasets as machine-readable open data. It will probably come as no surprise that crime data was one of them. As Goldstein describes, making crime data available in an easily reusable format that updates automatically every 24 hours was quite a headache, but it was worth the pain: it attracted a lot of public attention and activated the user community. This quickly set off a snowball effect that led to further successes.
The former CDO of Chicago stresses the clear benefits the city has gained from publishing open data. First, local open data help people plan their daily lives. For example, street sweeping data enable car owners to move their cars in time and avoid getting a ticket. Second, open data are often used by researchers and students, sparing public officials lengthy correspondence and repetitive requests for information. Third, in the long term, open data save the local administration’s resources: opening up valuable data enables citizens and companies to build exactly the kind of services and applications that best serve the community, as a form of ‘self-service’, without the local government having to go into the app business.
Closer to home in Estonia, only five local municipalities are among the more than 100 data publishers on the Estonian national open data portal. Apart from the cities of Tallinn and Tartu, which have published dozens of datasets, only three entries can be found, each linking to the public document register of one of the three remaining municipalities. Yet there is no doubt that every local municipality holds at least some data that are of interest to local citizens, and not only to locals! Some of the most useful data for the community are quite mundane and related to people’s daily errands: public transportation schedules, locations of bus stops and bike parking stations, information about traffic and road conditions, public events, but also public maintenance information such as potholes, piping works and fallen trees, data that can often also be crowdsourced from citizens through FixMyStreet-type applications. For example, Tallinn’s open data enable everyone moving around the city to follow the location of public transport vehicles in real time and plan their trips accordingly. Tartu’s data on the location of bike share stations, on the other hand, allow both locals and visitors to choose healthy and environmentally friendly ways of getting around.
Another key category is spatial data of all kinds, which, among other benefits, help people decide where to live: locations of schools, kindergartens, playgrounds, shops and clinics. For people with reduced mobility, data on shops and buildings with wheelchair ramps may be valuable. Tourists may want data on restaurants, hotels, tourist information points, sights and entertainment facilities, which can feed into local map applications but also into those of global tech giants such as Google.
A third group of valuable data covers everything that sheds light on the local administration’s own activities and use of public money. This includes records and minutes of the meetings of local councils and governments, and their decisions. On the one hand, these inform the public about their municipality’s work; on the other hand, such text corpora can be surprisingly useful for developing language technology, since the more such data are available for training machine translation systems, the better the translations municipalities will eventually get. This category of course also includes data on the quality of public services, which the government already collects centrally and shares through the minuomavalitsus.fin.ee portal.
Within this category, data on the local budget and spending are particularly important. If budget data are released in machine-readable formats, anyone can fairly easily create a web application visualizing the key cost and revenue categories for citizens; see, for example, the budget visualization of the Viimsi municipality. Beyond the budget figures, it also makes sense to publish the outputs funded from the public budget. At a recent workshop on local municipalities’ data management, local administrations were kindly reminded of their obligation under the Public Information Act to publish studies and analyses funded from their budgets. The user base of such information is often much broader than just the local community.
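To illustrate why machine-readable budget data make visualization easy, here is a minimal Python sketch of the aggregation step such a web application would need. It assumes a hypothetical CSV file with one row per budget line and the columns `type` (cost or revenue), `category` and `amount_eur`; the column names and figures are invented and do not reflect Viimsi’s actual data format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of a machine-readable budget file; the column names
# (type, category, amount_eur) and the figures are invented for illustration.
SAMPLE_CSV = """type,category,amount_eur
cost,Education,1200000
cost,Roads,450000
cost,Education,300000
revenue,Income tax,1800000
revenue,Land tax,250000
"""


def totals_by_category(csv_text: str) -> dict[str, dict[str, float]]:
    """Sum amounts per (type, category), e.g. {'cost': {'Education': ...}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["type"]][row["category"]] += float(row["amount_eur"])
    # Convert the nested defaultdicts to plain dicts for printing or JSON export
    return {kind: dict(categories) for kind, categories in totals.items()}


if __name__ == "__main__":
    for kind, categories in totals_by_category(SAMPLE_CSV).items():
        # List the largest categories first, as a chart would show them
        for category, amount in sorted(categories.items(), key=lambda kv: -kv[1]):
            print(f"{kind:8} {category:12} {amount:>12,.0f}")
```

The resulting totals could be passed to any charting library; the point is that once the budget is published in a structured format, turning it into a citizen-friendly overview takes only a few lines of processing.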
However, open data should not be regarded merely as a gesture of goodwill towards the public. High-quality data also support the local administration’s own decision-making. Machine-readable data can be used to build various real-time dashboards and web applications (such as the Smart Tartu portal), which even public officials will find more convenient than digging through dry Excel spreadsheets. An excellent example of the utility of open data in a local government’s own work also comes from the City of Chicago. There, open data were used to build a model for optimizing inspections of food-serving establishments, which helped discover critical food safety violations on average 7.4 days earlier than was previously possible. Test runs of the model showed that the probability of violations could be predicted using nine key datasets, all of which were freely available on the city’s open data portal, including, for instance, data on the establishments’ previous violations, type of facility, three-day average high temperature, nearby garbage and sanitation complaints, whether the establishment held a tobacco or alcohol consumption license, and the time elapsed since the last inspection. Based on these data, the model estimated the likelihood of serious food safety violations and ranked establishments by risk, presenting the results to local officials through a simple Shiny app so that inspections could be targeted accordingly. After the model’s source code was published on GitHub, voluntary contributors helped improve its accuracy further.
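To make the ranking idea more concrete, here is a minimal Python sketch that scores establishments with a logistic function and sorts them by estimated risk. It is not the Chicago model, which was built in R and fitted on historical inspection outcomes: the feature names loosely follow some of the datasets listed above, while the weights, intercept and sample establishments are invented purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Establishment:
    name: str
    previous_critical_violations: int   # count from past inspection records
    nearby_sanitation_complaints: int   # recent garbage/sanitation complaints nearby
    has_tobacco_or_alcohol_license: bool
    days_since_last_inspection: int
    avg_high_temp_3day_c: float         # three-day average high temperature, °C


# Hypothetical, hand-picked weights for illustration only; a real model
# would estimate them from historical inspection outcomes.
INTERCEPT = -4.0
WEIGHTS = {
    "previous_critical_violations": 0.8,
    "nearby_sanitation_complaints": 0.3,
    "has_tobacco_or_alcohol_license": 0.4,
    "days_since_last_inspection": 0.004,
    "avg_high_temp_3day_c": 0.05,
}


def violation_probability(e: Establishment) -> float:
    """Logistic score: estimated probability of a critical violation."""
    # Each weight multiplies the dataclass field of the same name (bool -> 0/1)
    z = INTERCEPT + sum(
        weight * float(getattr(e, feature)) for feature, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_risk(establishments: list[Establishment]) -> list[tuple[str, float]]:
    """Return (name, probability) pairs, highest estimated risk first."""
    scored = [(e.name, violation_probability(e)) for e in establishments]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented sample establishments
SAMPLE = [
    Establishment("Cafe A", 3, 5, True, 400, 28.0),
    Establishment("Bistro B", 0, 1, False, 90, 20.0),
    Establishment("Diner C", 1, 2, True, 200, 25.0),
]

if __name__ == "__main__":
    for name, p in rank_by_risk(SAMPLE):
        print(f"{name:10} {p:.2f}")
```

With these invented numbers, Cafe A ranks first, followed by Diner C and Bistro B; inspectors would then visit establishments in that order. The design choice that matters is that every input is an openly available dataset, so anyone can reproduce or improve the scoring.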
So, where to start, if your municipality has decided to enter the world of open data?
- As experts stressed at the recent workshop, most municipalities need to start by creating an overview of the data they actually hold. The first step, therefore, would be to plan a general review of data to identify what data are stored in which databases and how timely and accurate they are, and to make sure that important data are not stuck on the personal computers of officials who have long since left their jobs. Such an audit lays the groundwork for publishing open data, but also makes it possible to identify datasets that should not be published and require special security measures.
- Second, as the case of Chicago shows, it is worth making an effort to **publish a few particularly valuable datasets in highly usable machine-readable formats**. In addition to the key data categories listed above, datasets could be selected based on the most frequent freedom of information requests. This can reduce the number of repetitive requests and the officials’ workload.
- Once the first open datasets have been published, it is important to make them easy for users to find by linking the data to the national open data portal. However, the work should not stop there. A simple way to attract users’ interest would be to organize a small virtual hackathon inviting anyone interested to build useful applications based on the data. Apart from some staff time, such an event may cost little more than a symbolic prize, but it certainly pays off: the public sees what the data can be used for, and the municipality gets to see who its users are. As the coordinator of Estonia’s open data policy, the Ministry of Economic Affairs and Communications offers local municipalities advice and support on all open data issues and will soon launch a pilot project supporting data publication in selected municipalities. Until then, the ministry’s data advisor Sigrit Siht (email@example.com) will be happy to answer municipalities’ questions and discuss their concerns and proposals.
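The data review recommended as the first step can start as a simple inventory. The following Python sketch assumes a hypothetical record format (dataset name, storage location, last update date and a flag for personal or otherwise restricted data) and a one-year staleness threshold, both chosen purely for illustration. It sorts datasets into publication candidates and restricted ones, and flags data that look out of date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    name: str
    location: str                    # database, system or file where the data live
    last_updated: date
    contains_restricted_data: bool   # e.g. personal data needing protection


def review_inventory(
    inventory: list[DatasetRecord], today: date, max_age_days: int = 365
) -> dict[str, list[str]]:
    """Sort datasets into publication candidates, restricted ones and stale ones."""
    result: dict[str, list[str]] = {"publishable": [], "restricted": [], "stale": []}
    for record in inventory:
        # Restricted data stay closed and need special security measures
        key = "restricted" if record.contains_restricted_data else "publishable"
        result[key].append(record.name)
        # Independently of publication, flag data that have not been updated recently
        if (today - record.last_updated).days > max_age_days:
            result["stale"].append(record.name)
    return result


# Hypothetical inventory entries for illustration
INVENTORY = [
    DatasetRecord("Bus stops", "municipal GIS database", date(2024, 1, 10), False),
    DatasetRecord("Social benefit recipients", "social services register", date(2024, 3, 1), True),
    DatasetRecord("Playground locations", "spreadsheet on a former official's laptop", date(2019, 5, 20), False),
]

if __name__ == "__main__":
    for category, names in review_inventory(INVENTORY, today=date(2024, 6, 1)).items():
        print(f"{category}: {', '.join(names) or '-'}")
```

In this invented example, the playground data are a publication candidate but also flagged as stale, which is exactly the kind of dataset worth rescuing from an old laptop and bringing up to date before release.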