Session number: 12
Participants: Data journalists, data scientists and those with an interest in finding new data sources
Length: 2-3 hours
Web based exercises: Yes
What to bring: Slides, Web-Connected Laptop
- Defining hidden data - The facilitator should define hidden data for the participants through a discussion of the difference between data on the web (downloadable or ‘traditional’ Open Data) and data in the web (data found in the code of websites or otherwise embedded). The discussion should look at how data in the web is often made visible through pages but is more difficult to access through traditional means.
- Techniques for extracting hidden data - The facilitator should lead participants through an interactive exploration of some key methods for obtaining data found in the web. Techniques should include adding format extensions to URLs, reading RSS feeds, inspecting source code, using content negotiation, calling APIs and web scraping.
- Tools for data extraction - The participants will undertake directed exploration of data extraction tools such as those listed below as well as any introduced by the facilitator. Example tools may include the ‘Hidden Data Extractor’, pdftables.com and magic.import.io.
- Finding Open Data FAQ - Walkthrough about finding Open Data on the web
- magic.import.io - Powerful free webpage scraping tool
- Hidden Data Extractor - Simple data extractor for use on certain dynamic data sites
- pdftables.com - Easily extract data from tables in PDF documents
- enigma.io - Data aggregator with Open Data from across the web
- Transport API - Harmonized portal for transport Open Data with API feed
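Facilitators who want a hands-on illustration of two of the techniques named in the session outline (inspecting source code or web scraping, and content negotiation) could adapt the minimal Python sketch below. It uses only the standard library. The sample HTML, the station figures, the example.org URL and the names TableExtractor, extract_rows and negotiated_request are all hypothetical and exist only for illustration; whether a real site honours an Accept header depends entirely on that site.

```python
import urllib.request
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def negotiated_request(url, media_type="application/json"):
    """Build (but do not send) a request asking the server for a given format."""
    return urllib.request.Request(url, headers={"Accept": media_type})


# Hypothetical page source, standing in for what "view source" would show.
SAMPLE_HTML = """
<table>
  <tr><th>Station</th><th>Departures</th></tr>
  <tr><td>Leeds</td><td>12</td></tr>
  <tr><td>York</td><td>7</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)

# Hypothetical URL; pass the request to urllib.request.urlopen to actually send it.
request = negotiated_request("https://example.org/dataset/1")
```

The scraping part works on a string so that it can be run without a network connection; in a live session the string would be replaced by the source of a page the participants are exploring.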
Companion eLearning Modules:
When running this session, we recommend that participants complete the following eLearning module before attending:
Finding hidden data on the web
Completion of the module will help your learners develop a shared understanding of the material before the course and allow you to focus in greater depth on those topics of most interest to the trainees.