Open Data Services Logo

Introducing our new Python library: Spreadsheet Forms

Have you ever been in a situation where you wanted to collect some data from people? Was the best thing to do to send them a spreadsheet, ask them to fill it in and then send it back? We have seen situations like this and we wanted to make a library to make this easier to work with.

And thus we introduce our new Python library — Spreadsheet Forms.

How it works

Alice wants to collect data about Impact Bonds.

Alice starts with what we call a “guide spreadsheet”. This is a spreadsheet with as much extra information, style and formatting as Alice wants to include. Crucially Alice should also mark special cells where she wants people to put in information.

In the example below, Alice is using one spreadsheet with two tabs — she is collecting general information in one tab, and information about organisations in another tab.

impact_bonds_general_info.webp

impact_bonds_organisations.webp

As we work through this example, you’ll see that the SINGLE keyword is turned into a field, but the DOWN row is turned into a list. The DOWN table can have as many rows as needed.

The library then generates a blank spreadsheet based on Alice’s guide:

impact_bonds_blank_spreadsheet.webp

Alice can send this blank spreadsheet to Bob, Cian and Derek and ask them to fill it in with their data. Bob is involved with an impact bond and can provide some information:

impact_bonds_excel.webp

impact_bonds_excel_organisation_info.webp

At this point Bob is just working with a spreadsheet, something they are probably already very familiar with. This means Alice isn’t putting a burden on her data contributors to learn a new system.

People then send the spreadsheets back to Alice. Now Alice can give the library these spreadsheets and the original “guide spreadsheet” and the library will extract their answers and put them into a JSON object for Alice to easily process:

impact_bonds_json.webp

Alice’s system can then manage this JSON data in whatever way she needs, while her data contributors don’t have to do anything more high tech than edit a spreadsheet — something most of them will already be familiar with.

(This example shows Alice starting with no data; but maybe she has existing data that she wants people to check and update? For that she could take the existing data in JSON form and the “guide spreadsheet” and the library will make a spreadsheet with existing data for her to send to people.)

spreadsheet_forms.webp

Wait, why?

This means that people you contact can edit data in a medium they are very familiar with — a simple spreadsheet. But when it comes back to you, you don’t have to manually process the data results.

For example we have used this for our work for the INDIGO project from the Government Outcomes Lab (GO Lab) at the University of Oxford’s Blavatnik School of Government. The INDIGO project is building up a knowledge base of data on Impact Bonds. They regularly ask external people to contribute data and this is where this really helps.

INDIGO staff can download spreadsheets and send them out to external contributors. These external parties often have to check and approve the data for release carefully — they can send the spreadsheet around among themselves as much as they want until they are confident it is finished and correct. INDIGO staff then upload the spreadsheet to the website, where it is checked before data is extracted and stored.

Is this really a good way of doing data entry?

There are advantages over a web form — someone working with a single spreadsheet can already do things like work with tables, save progress as they go, make copies, or show it to other people for approval before submitting it. If the data entry was in a web form, these features would have to be specially programmed in.

This also allows people to work offline. I regularly have trouble with internet access when taking a train between Scotland and England, for instance, and as we work on international data standards that are used around the world we have to consider all kinds of situations.

But there are disadvantages — some data models are hard to set up well. For instance having one data object refer to another one gets tricky.

You also have to be careful to do type checking well; you may be expecting a number for a value, but you may get back a number in a string. The library has an option to format dates a certain way in the JSON, for instance.

Also, people may be tempted to change the form if their circumstances don’t quite fit — maybe add extra fields or add notes. This is worth paying attention to as that may reveal that your current form design is not good enough. But however you respond, you want your library to catch this and be resilient. We can think of new features to add to help with this.

Whether this is the right tool for you really depends on your circumstances.

PS. Flatten Tool

By the way you may know that we also maintain Flatten Tool, another Python library. So what’s the difference? That tool also converts spreadsheet data to JSON and back, but crucially it responds to what you put in your spreadsheet. If you have added an extra column in your spreadsheet, in Flatten Tool we automatically make extra fields in the JSON (and vice versa). That is great for many situations and we use Flatten Tool heavily.

However in Spreadsheet Forms the point is that the structure of the data is set and people should not be entering data outside that structure. In Spreadsheet Forms if people enter extra data that will produce an error, whereas in Flatten Tool we try to accommodate extra data.

And finally, Flatten Tool only lets you add limited instructions and is stricter about the form it expects. With Spreadsheet Forms, you can arrange the spreadsheet however you want so it looks like a user-friendly form, and you can add as much extra information as you want to help your data contributors.

Have a look

You’ll find both Tools on GitHub and PyPi — have a look and get in touch!

Spreadsheet Forms:

Flatten Tool: