We’re pleased to announce the initial version of OCDS Downloads — a new website that provides access to data published using the Open Contracting Data Standard in analyst-friendly tabular formats.
For open data to be usable, it needs to be available in a form that people can readily work with. OCDS Downloads builds on what we’ve learned from working across a range of open data standards and projects, and on conversations we’ve had with data analysts about how they want to use the data. Through this project, we’re exploring how we can help people get working with OCDS data quickly, without requiring knowledge of the intricacies of OCDS tools and OCDS JSON data before they get started.
Data published using the OCDS is often only available in JSON format. However, in our experience, most analysts prefer to work with data in tabular formats using SQL databases, spreadsheet software or other analytical tools. As a result, they need to spend time transforming OCDS data into a usable tabular format before they can begin any analysis.
Apart from being time-consuming, transforming data into a tabular format can also be a barrier for analysts who lack the skills required to work with the necessary command-line tools.
To reduce barriers to working with data and to save time for analysts, OCDS Downloads provides access to OCDS data in a range of tabular formats that are familiar to analysts. The tool also seeks to address other key challenges faced by analysts working with OCDS data: understanding which fields are included in a dataset, joining data across one-to-many relationships, and compiling data on changes to get the latest values of each field.
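To make the one-to-many challenge concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`main`, `awards`), the column names and the data are all hypothetical and chosen only for illustration; they are not necessarily the names used in the OCDS Downloads tables. The point is that one contracting process can have several awards, so child rows need to be aggregated per process before they are reported alongside parent fields.

```python
import sqlite3

# In-memory database with made-up tables: one row per contracting process
# in "main", and zero or more rows per process in "awards".
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE main (ocid TEXT PRIMARY KEY, buyer_name TEXT);
    CREATE TABLE awards (ocid TEXT, award_id TEXT, amount REAL);

    INSERT INTO main VALUES
        ('ocds-example-0001', 'Example Department'),
        ('ocds-example-0002', 'Example Agency');

    INSERT INTO awards VALUES
        ('ocds-example-0001', 'award-1', 1000),
        ('ocds-example-0001', 'award-2', 2500),
        ('ocds-example-0002', 'award-1', 400);
    """
)

# Join parent to children on ocid, then aggregate so that each
# contracting process appears exactly once in the result.
rows = conn.execute(
    """
    SELECT m.buyer_name, COUNT(a.award_id) AS award_count, SUM(a.amount) AS total
    FROM main AS m
    LEFT JOIN awards AS a ON a.ocid = m.ocid
    GROUP BY m.ocid, m.buyer_name
    ORDER BY m.ocid
    """
).fetchall()

for buyer_name, award_count, total in rows:
    print(buyer_name, award_count, total)
```

Without the `GROUP BY`, the first process would appear twice in the result, once per award, which is an easy way to double count parent-level values.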
How does it work?
OCDS Downloads collects data from OCDS publishers on a weekly basis, converts it into a tabular structure, and provides access in different formats, including Jupyter Notebooks, database dumps, and CSV files.
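As a rough illustration of what converting OCDS JSON into a tabular structure involves, the sketch below splits a single release into a parent row and one row per award, then writes the award rows as CSV. It is a simplified example under stated assumptions, not the OCDS Downloads implementation: the release is made-up data, the helper names (`flatten_release`, `to_csv`) and output column names are hypothetical, and only a handful of OCDS fields are handled.

```python
import csv
import io

# A tiny, made-up OCDS-style release with two awards (illustrative data only).
release = {
    "ocid": "ocds-example-0001",
    "id": "release-1",
    "date": "2023-01-15T00:00:00Z",
    "buyer": {"name": "Example Department"},
    "awards": [
        {"id": "award-1", "value": {"amount": 1000, "currency": "AUD"}},
        {"id": "award-2", "value": {"amount": 2500, "currency": "AUD"}},
    ],
}


def flatten_release(release):
    """Split one release into a parent row and one row per award."""
    main_row = {
        "ocid": release["ocid"],
        "release_id": release["id"],
        "date": release["date"],
        "buyer_name": release.get("buyer", {}).get("name"),
    }
    award_rows = [
        {
            "ocid": release["ocid"],  # key for joining back to the parent row
            "award_id": award["id"],
            "amount": award.get("value", {}).get("amount"),
            "currency": award.get("value", {}).get("currency"),
        }
        for award in release.get("awards", [])
    ]
    return main_row, award_rows


def to_csv(rows):
    """Serialise a non-empty list of uniform dicts as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


main_row, award_rows = flatten_release(release)
awards_csv = to_csv(award_rows)
print(awards_csv)
```

Nested objects such as `buyer` become columns on the parent row, while arrays such as `awards` become separate tables that carry the `ocid` so they can be joined back.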
To show you how it works, we’ll use the data from AusTender, the Australian Government’s procurement information system, as an example.
The OCDS Downloads homepage lists the available data sources along with summary information on the country and publisher of the data, the number of contracting processes in the data and the available download formats. The homepage also shows when the data was last fetched and whether any errors were encountered. OCDS Downloads’ processing pipeline is fully transparent: you can check the Airflow dashboard for more information on the status of the pipeline for each source.
You can see that there are two data sources available for Australia. For this example, we’re interested in the AusTender data, which covers more than 400,000 contracting processes.
On the source page, you can find links to download the data in different formats, along with instructions on how to use each download format…
…a list of the tables and fields included in the data, including counts, types and documentation for each field…
…and a summary of data collection and processing errors, and links to detailed logs and statistics from the processing pipeline.
To get started analysing data quickly using your web browser, you can either open a Jupyter Notebook using Google Colaboratory, or you can open the Google BigQuery console to query the data in Google’s cloud data warehouse. The Jupyter Notebooks include example queries to help you get started.
While OCDS Downloads isn’t aimed at complete beginners, by providing notebooks and example queries, we hope it is a starting point for people who want to learn data analysis using “real” data.
You can read more about how to use the data in the documentation.
What about the tech?
OCDS Downloads uses a number of tools by the Open Contracting Partnership, which stewards the OCDS, including: Kingfisher Collect to collect data, OCDS Merge to compile data, and the OCDS Extension Registry Python Package to support each OCDS publisher’s extensions to OCDS. It otherwise uses a combination of Python scripts and a PostgreSQL database for data processing. Apache Airflow manages the processing pipeline and handles scheduling. The data, metadata, logs and the website are held in publicly accessible object storage to make sure they will be quick to access and stay available even if the processing pipeline server has issues.
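To give a feel for what compiling data means, the sketch below merges several releases for the same contracting process so that the most recent value of each field wins. This is a deliberately simplified stand-in and not OCDS Merge's algorithm: it ignores the merge rules defined in the OCDS schema (for example, how arrays of objects are matched by `id`), and the function names and sample releases are hypothetical.

```python
def deep_merge(base, update):
    """Return a copy of base with values from update applied on top.

    Nested dicts are merged key by key; any other value is overwritten.
    """
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def compile_latest(releases):
    """Apply releases in date order so later values replace earlier ones.

    Assumes every release has a "date" in the same ISO 8601 format,
    so that sorting the strings also sorts them chronologically.
    """
    compiled = {}
    for release in sorted(releases, key=lambda r: r["date"]):
        compiled = deep_merge(compiled, release)
    return compiled


# Two made-up releases for one contracting process, listed out of order.
releases = [
    {
        "ocid": "ocds-example-0001",
        "id": "r2",
        "date": "2023-03-01T00:00:00Z",
        "tender": {"status": "complete"},
    },
    {
        "ocid": "ocds-example-0001",
        "id": "r1",
        "date": "2023-01-15T00:00:00Z",
        "tender": {"title": "Office supplies", "status": "active"},
    },
]

compiled = compile_latest(releases)
print(compiled["tender"])
```

Here the compiled record keeps the tender title from the earlier release and takes the status from the later one, which is the kind of "latest values" view an analyst usually wants.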
What’s next?
While we’ve chosen to focus on the challenges of using OCDS data to begin with, similar challenges impact how people are able to use many different open data sets — so this project is informing our co-operative’s work across a range of data access initiatives.
Next, we’ll be working to add more examples and content to help users understand and explore how they can use OCDS data to meet their needs.
If you’d like to use OCDS Downloads in your own organisation or project, or if you have any feedback or suggestions, please get in touch. If you find a problem or want to request a new feature, you can also open an issue on our public issue tracker.