How open data standards can help mitigate climate and disaster risk

Between 2009 and 2019, around two billion people were affected by disasters. As the climate emergency increases the frequency and intensity of these events, it’s more important than ever to have access to better data, so that governments and humanitarian organisations can effectively prevent, mitigate and manage their impact.

Assessing risk is an expensive and data-intensive task. When information is created in closed systems, the limited resources available for managing disasters are absorbed into creating, finding and using data. By making it easier to find open data about disaster risk, we can help free up resources that could be better spent managing exposure and impact.

Open Data Services have worked with the Global Facility for Disaster Reduction and Recovery (GFDRR) to do just that, by supporting the development of the Risk Data Library Standard, a global open data standard that helps risk data publishers to describe their datasets, in turn making it easier for risk analysts to identify data that might be useful to their work.

The standard refresh officially launches this week — so to mark the occasion, we’re exploring:

the state of climate and disaster risk data
why publishing metadata matters
how the Risk Data Library Standard can help make climate and disaster risk management and financing easier, cheaper and more effective.

The history of climate and disaster risk management data

Collecting and using data about hazards isn’t a new endeavour. Before computers, insurance companies would use pins on a map to visualise the location of underwritten properties, and minimise their exposure to risk.

As technology has advanced, so have the methodologies we use to assess and understand risks caused by natural hazards. In the 1960s and 1970s, insurance companies developed catastrophe models (also known as cat models) to understand their financial exposure to disasters. Initially focusing on hurricanes and earthquakes, cat models use data to quantify, price and manage exposure to extreme events.

Cat models use four types of risk data: hazard, exposure, vulnerability and loss.

Hazard data describes the processes or phenomena that may result in impacts such as loss of life, property damage and social and economic disruption, for example an earthquake.
Exposure data describes the situation of people, infrastructure and other tangible assets in hazard-prone areas, for example the number of people living near an earthquake fault zone.
Vulnerability data describes the susceptibility of exposed people and assets to the impacts of hazards, for example how likely homes are to collapse following an earthquake.
Loss data describes the damage caused when a hazard occurs. Losses can be physical (for example, how much it would cost to repair buildings after an earthquake) or social (for example, the impact of an earthquake on the health, wellbeing and livelihoods of people living in a fault zone).

Cat models now incorporate a whole range of disasters, from floods and tsunamis, to droughts and wildfires. They’ve also reached beyond the insurance industry and its focus on purely financial losses — governments use their outputs to understand where to direct infrastructure and spending, and humanitarian organisations use their outputs to plan responses to actual and potential events. As a result, cat models, and the data that supports them, are essential digital infrastructure.

Making it easy to find disaster risk metadata

In the past decade, the disaster risk management community has recognised the value of opening up the models and data that make risk analysis possible.

Opening up risk management data provides a number of benefits, including increased confidence, maximised return on investment, and the ability to find and fill data gaps. This is particularly important given the dynamic nature of risk assessment — in a world where built environments, infrastructure and populations constantly change, open data provides more opportunities to adapt and improve the data we use to model risk.

But it’s not enough to just publish risk datasets under an open licence. To be truly useful, this data needs to be easy to find, understand and use.

Risk datasets are produced by many different organisations, through many different types of projects. There are a number of risk data catalogues that bring together risk management datasets, for example the World Bank’s Risk Data Library Catalog and the Pacific Risk Information System. But without a consistent way to describe the kind of information a dataset holds (for example, the geographical areas covered by a dataset, or the type of calculation used to model a hazard), it can be difficult for analysts to find data that is relevant to them.

The Risk Data Library Standard aims to solve this problem by creating a common structure and format for publishing metadata about risk datasets that both people and machines can interpret.

Developing the Risk Data Library Standard

The Risk Data Library Standard was created by the Global Facility for Disaster Reduction and Recovery in 2021. It’s a standardised JSON schema that can be used to describe datasets used in climate and disaster risk assessment across the globe. As catastrophe modelling uses the four common components we described earlier — hazard, exposure, vulnerability and loss — they’re used as the building blocks of the standard.
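To make the idea of machine-readable metadata more concrete, here is a minimal sketch in Python of how an analyst's tooling might filter metadata records by risk component and country. The field names (`risk_data_type`, `countries`), the records and the `find_datasets` helper are all illustrative assumptions made for this example. They are not taken from the published schema.

```python
import json

# The four components named in the article.
RISK_DATA_TYPES = {"hazard", "exposure", "vulnerability", "loss"}

# Hypothetical metadata records. Field names and values are illustrative
# assumptions, not the fields defined by the actual standard.
records = [
    {
        "id": "example-flood-hazard",
        "title": "Example river flood hazard maps",
        "risk_data_type": ["hazard"],
        "countries": ["FJI"],
    },
    {
        "id": "example-building-exposure",
        "title": "Example residential building exposure",
        "risk_data_type": ["exposure", "vulnerability"],
        "countries": ["FJI", "TON"],
    },
]


def find_datasets(records, risk_data_type, country):
    """Return the records that cover the given component and country."""
    if risk_data_type not in RISK_DATA_TYPES:
        raise ValueError(f"unknown risk data type: {risk_data_type}")
    return [
        record
        for record in records
        if risk_data_type in record["risk_data_type"]
        and country in record["countries"]
    ]


matches = find_datasets(records, "exposure", "TON")
print(json.dumps([record["id"] for record in matches]))
```

Because every record describes its contents in the same way, the same query works across datasets from different publishers.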

Our team joined the project in November 2022 as technical leads, bringing our experience building sustainable and impactful open data standards. By working closely with the risk experts from GFDRR, we’ve laid the foundations for a robust data standard that can be used by anyone working with disaster risk information.

Clarifying use cases, terminology and scope

We worked with GFDRR to improve the RDLS data model, clarifying the purpose and meaning of concepts and fields in the standard, and the relationships between them. For example, a key concept in hazard data is how often a hazard is likely to occur. This can be based on records of real events, or calculated as a probability from a mathematical model. The value can be expressed in a number of ways, with different domains within disaster risk modelling favouring different terms.
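As an illustration of how the same concept can be expressed in different ways, the sketch below converts between two widely used expressions of hazard frequency: a return period in years and an annual probability of occurrence. It uses the common convention that the annual probability is one divided by the return period, and it assumes that years are independent of each other. The function names are our own for this example, and the sketch does not describe how the standard itself models this concept.

```python
def annual_probability(return_period_years: float) -> float:
    """Annual probability of occurrence for a given return period.

    Uses the common convention p = 1 / T. Return periods shorter than
    one year are rejected because this simple convention breaks down.
    """
    if return_period_years < 1:
        raise ValueError("return period must be at least one year")
    return 1.0 / return_period_years


def probability_over_horizon(return_period_years: float, horizon_years: int) -> float:
    """Probability of at least one occurrence within the horizon.

    Assumes each year is independent, which is a simplification.
    """
    p = annual_probability(return_period_years)
    return 1.0 - (1.0 - p) ** horizon_years


# A "1-in-100-year" event has a 1% chance of occurring in any given year,
# and roughly a 26% chance of occurring at least once in 30 years.
print(annual_probability(100))
print(round(probability_over_horizon(100, 30), 3))
```

A metadata field that records which of these expressions a dataset uses lets analysts compare datasets from domains that favour different terms.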

We clarified the use cases and terminology for this concept through an openly documented discussion with both GFDRR and other experts from the Risk Data Library steering committee. This open and asynchronous way of working allowed us to collaboratively draft a refined data model that can capture this concept in a wide range of scenarios.

We also worked to simplify the RDLS data model and to clarify its scope by focusing on standardising metadata, i.e. data that describes risk datasets, rather than standardising the content of the datasets themselves. This clarity of purpose reduces repetition and the associated risk of inconsistencies, easing the publication and use of RDLS metadata and risk datasets. We worked closely with the GFDRR team to review each field in the data model, to reach consensus on whether it related to metadata or data, and to remove non-metadata fields. Through this process, we also identified and resolved ambiguous semantics in field titles and descriptions.

Building a robust schema

With a robust data model in place, we then worked to refine the JSON schema, which describes the format and structure of RDLS metadata. We made sure the schema was technically valid, by fixing broken references and missing definitions. We also ensured that the schema was fully documented, with titles, descriptions and type information for all fields in the standard. To ensure the schema meets best practice for interoperability, we aligned it with existing standards for data catalogue metadata, such as the W3C’s Data Catalog Vocabulary (DCAT).
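One way to detect broken references of the kind described above is to walk the schema, collect every `$ref`, and check that each local reference points at something that exists. The following is a minimal sketch using only the Python standard library. The example schema and the function names are hypothetical, and this is not the tooling used on the project.

```python
def iter_refs(node):
    """Yield every "$ref" string found anywhere in a schema fragment."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def resolve_pointer(schema, ref):
    """Follow a local reference such as "#/$defs/Period".

    Raises KeyError or TypeError if any step of the path is missing.
    Only object keys are followed, which is enough for this sketch.
    """
    node = schema
    for part in ref.lstrip("#").strip("/").split("/"):
        if part:
            # Undo JSON Pointer escaping (~1 is "/", ~0 is "~").
            part = part.replace("~1", "/").replace("~0", "~")
            node = node[part]
    return node


def broken_refs(schema):
    """Return the local references that do not resolve within the schema."""
    broken = []
    for ref in iter_refs(schema):
        if not ref.startswith("#"):
            continue  # remote references are out of scope for this sketch
        try:
            resolve_pointer(schema, ref)
        except (KeyError, TypeError):
            broken.append(ref)
    return broken


# Hypothetical schema with one missing definition.
schema = {
    "type": "object",
    "properties": {
        "publisher": {"$ref": "#/$defs/Entity"},
        "period": {"$ref": "#/$defs/Period"},
    },
    "$defs": {"Entity": {"type": "object"}},
}

print(broken_refs(schema))
```

Running this against the example reports `#/$defs/Period`, because `Period` is referenced but never defined.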

The result is a standard that is technically sound, easier to use and clearer to understand. The improvements we made meant we could build a single source of truth for the data standard. This allows us to build automated tests that ensure the schema, codelists, documentation and supporting files are valid and consistent with each other. It also means we can make sure that documentation is kept up to date, even when we make changes to the schema.
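A consistency test of this kind could, for example, compare the values a schema allows for a field with the codes listed in the matching codelist file. The sketch below shows the general idea. The CSV layout, the `code` column name and the enum values are assumptions made for illustration, not the project's actual files or test suite.

```python
import csv
import io

# Hypothetical codelist file contents, with a "code" column.
CODELIST_CSV = """code,title
hazard,Hazard
exposure,Exposure
vulnerability,Vulnerability
loss,Loss
"""

# Hypothetical enum from a schema that has drifted out of sync.
schema_enum = ["hazard", "exposure", "vulnerability"]


def codelist_codes(csv_text):
    """Read the codes from a codelist CSV."""
    return [row["code"] for row in csv.DictReader(io.StringIO(csv_text))]


def enum_mismatches(enum_values, codes):
    """Report codes that appear on only one side."""
    return {
        "missing_from_schema": sorted(set(codes) - set(enum_values)),
        "missing_from_codelist": sorted(set(enum_values) - set(codes)),
    }


print(enum_mismatches(schema_enum, codelist_codes(CODELIST_CSV)))
```

In an automated test, any non-empty list in the result would fail the build, flagging the drift before it reaches publishers.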

Making it easy to use the Risk Data Library Standard

In order to be impactful, data initiatives need to be supported by the documentation, tools and software that help people work with data. With a robust data model and schema in place, it’s much easier to build and maintain these tools.

We’ve built tools that help people to publish metadata using the Risk Data Library Standard. The RDLS spreadsheet template gives publishers a way to create metadata using familiar and widely available tools, instead of having to write JSON. Then, by using the Risk Data Library Standard Converter, publishers can convert the spreadsheets into JSON and get feedback on the validity of their data. As a result, we’ve made it easier for people creating risk datasets to publish metadata about them to the Risk Data Library.
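To give a feel for what converting a spreadsheet into JSON involves, here is a minimal sketch that turns rows with path-like column headers into nested JSON and reports missing fields. It is not the Risk Data Library Standard Converter. The column names, the `/` path separator, the `;` list separator and the list of required fields are all assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical spreadsheet export. Headers use "/" to express nesting.
SHEET = """id,title,publisher/name,spatial/countries
example-1,Example flood hazard maps,Example Agency,FJI;TON
"""

# Columns whose cells hold several values separated by ";" (an assumption).
ARRAY_COLUMNS = {"spatial/countries"}


def unflatten_row(row):
    """Turn one spreadsheet row into a nested dictionary."""
    record = {}
    for header, cell in row.items():
        if not cell:
            continue  # skip empty cells rather than emitting empty values
        value = cell.split(";") if header in ARRAY_COLUMNS else cell
        *parents, leaf = header.split("/")
        node = record
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return record


def check_required(record, required=("id", "title")):
    """Return simple feedback messages for missing top-level fields."""
    return [f"missing required field: {name}" for name in required if name not in record]


rows = list(csv.DictReader(io.StringIO(SHEET)))
datasets = [unflatten_row(row) for row in rows]
print(json.dumps(datasets, indent=2))
for dataset in datasets:
    print(check_required(dataset))
```

The publisher only ever edits the flat spreadsheet, while the nested structure that the schema expects is produced for them.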