
Getting the best outcomes from integrated datasets

An astronomical 64.2 zettabytes of data is estimated to have been created, consumed and stored in 2020. By 2025, global data creation is projected to exceed 180 zettabytes.

To put the current volume of data, and its extraordinary projected growth over the next five years, into perspective: one zettabyte is one sextillion (a 1 followed by 21 zeroes) bytes. If each byte were represented by one star, the data generated in 2020 would fall within the range of estimates for the number of stars in the observable universe.
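As a rough worked check of these figures, the short Python sketch below converts zettabytes to bytes, then derives the average daily volume for 2020 and the compound annual growth rate implied by the two estimates. It assumes a 366-day year for 2020 and treats 180 zettabytes as the 2025 figure, although the text gives it only as a lower bound. The variable names are illustrative.

```python
# Rough arithmetic on the figures quoted above (illustrative only).
ZETTABYTE = 10**21  # one sextillion bytes
EXABYTE = 10**18

data_2020_zb = 64.2   # estimated zettabytes created, consumed and stored in 2020
data_2025_zb = 180.0  # projected 2025 figure; the text says "more than", so a lower bound
years = 5

bytes_2020 = data_2020_zb * ZETTABYTE

# Average volume per day in 2020 (a leap year, so 366 days), in exabytes.
exabytes_per_day_2020 = bytes_2020 / 366 / EXABYTE

# Compound annual growth rate implied by going from 64.2 ZB to 180 ZB in five years.
implied_cagr = (data_2025_zb / data_2020_zb) ** (1 / years) - 1

print(f"2020 total: {bytes_2020:.3e} bytes")
print(f"2020 daily average: about {exabytes_per_day_2020:.0f} exabytes")
print(f"Implied annual growth: at least {implied_cagr:.1%}")
```

On these assumptions, the 2020 daily average comes to roughly 175 exabytes, and the implied growth rate is a little under 23 per cent a year.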

In 2002, about five exabytes of digital data were generated globally. Creating that amount of data today takes two days, most likely less.

Government agencies collect data by administering policies and programs and use this data to support policy evaluation. However, legislative, privacy and governance constraints often limit the scope of impacts that can be measured and evaluated; secondary effects often remain unobserved. A housing policy evaluated on housing stability for low-income families may not consider the impacts of housing stability on employment prospects, education outcomes, and lifetime tax contributions which fall outside the remit of the policy.

As members of the Australian public, we are more than just taxpayers, Medicare numbers or welfare recipients. Our interactions with government services are multi-faceted. Though unique, our experiences through the different systems share a commonality that can be used to inform policy development. For example, changes to health policy may have different impacts on people depending on their health and education history, employment status, income, location, and other factors.

The Productivity Commission’s 2017 inquiry report on Data Availability and Use highlighted the increasing importance of integrated data to ‘…facilitate the development of new products and services, enhance consumer and business outcomes, better inform decision making and policy development, and facilitate greater efficiency and innovation in the economy.’

The challenge was connecting nationally significant population datasets to maximise the value of existing data for policy and research in a way that allowed for open but safe sharing. How can the government best tap into this powerful resource to unpack the complex challenges of the modern world?

Enter integrated data

Just as astronomers of old saw connections between stars and used them to tell a story, integrated data allows a policy outcome to be analysed in the context of broader explanatory data. By linking datasets at the level of the individual, we can build a holistic picture and support citizen-centric policy reform.
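To make the idea of linking at the level of the individual concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the records are invented, `person_key` is a hypothetical de-identified linkage key, and the exact-match join is a stand-in for the far more careful linkage and privacy protections that real integrated data assets rely on.

```python
from collections import defaultdict

# Invented example records; "person_key" is a hypothetical de-identified linkage key.
housing = [
    {"person_key": "p1", "stable_housing": True},
    {"person_key": "p2", "stable_housing": False},
    {"person_key": "p3", "stable_housing": True},
]
employment = [
    {"person_key": "p1", "employed": True},
    {"person_key": "p2", "employed": False},
    {"person_key": "p4", "employed": True},
]


def link_on_key(left, right, key="person_key"):
    """Join two lists of records, keeping only people present in both."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


def employment_rate_by_housing(rows):
    """Share of linked people who are employed, grouped by housing stability."""
    counts = defaultdict(lambda: [0, 0])  # stable_housing -> [employed, total]
    for row in rows:
        tally = counts[row["stable_housing"]]
        tally[0] += int(row["employed"])
        tally[1] += 1
    return {status: employed / total for status, (employed, total) in counts.items()}


linked = link_on_key(housing, employment)
rates = employment_rate_by_housing(linked)
print(linked)
print(rates)
```

Only people who appear in both sources survive the join, which is why coverage and linkage quality matter: a secondary effect such as employment can be examined only for the individuals the two datasets have in common.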

Across the Tasman, New Zealand’s Social Wellbeing Agency (SWA) uses the Integrated Data Infrastructure (IDI) to assess social reform policy and quantify the social return on policy. With the IDI established as the common evidence base for whole-of-government policy research, SWA can centralise social science research from across government, promoting innovation and connecting researchers and policy-makers.

Back home in Australia, several integrated datasets are available or under development that safely and securely integrate data across the Commonwealth and, in some cases, state jurisdictions, including:

  • The Multi-Agency Data Integration Project (MADIP). The Australian Bureau of Statistics (ABS) has developed the MADIP, which combines information on health, education, government payments, income and taxation, employment, and population demographics over time.
  • The National Integrated Health Services Information Analytical Asset (NIHSI). The Australian Institute of Health and Welfare (AIHW) has developed the NIHSI, which combines state and territory hospital data with national health administrative datasets, including the Medicare Benefits Schedule, the Pharmaceutical Benefits Scheme, residential aged care data and the National Death Index.
  • The National Disability Data Asset (NDDA). The Department of Social Services has worked with the ABS and AIHW to create the NDDA, which aims to provide a complete picture of the supports and services used by people with disability.
  • The Business Longitudinal Analysis Data Environment (BLADE). The Department of Industry, Science and Resources has developed BLADE, a statistical resource containing information on Australian businesses.
  • The Linked Employer-Employee Dataset (LEED). The ABS has created LEED, which links employer information from BLADE with employee information from Personal Income Tax data.

These datasets are already being used to address critical policy and research questions. The Department of Education has been working with the MADIP to better understand the drivers of early childhood developmental risks, identify socioeconomic factors affecting student pathways to employment, and improve fairness in non-government school funding models.

Access to large and detailed datasets allows researchers to conduct robust investigations without the ethical implications of a randomised controlled trial on the Australian population.

While some agencies have long used integrated data to support policy research and evaluation, others are just beginning their journey. The challenge for these agencies is how to approach the data to get the best value from it.

Sailing by the stars

As in celestial navigation, method, observation and measurement are critical to success. Agencies are at different levels of maturity in using integrated datasets effectively, and there is a need for a structured approach. Those seeking to use integrated data for research and policy development should consider the following:

  • Is data available in a format and structure we can use for analysis and reporting?
  • Can decision-makers access trusted information products that provide actionable insights?
  • Do we have the capability to build and maintain predictive or prescriptive data models over the long term?
  • Are analyses and insights integrated into business decision-making, with a clear understanding of the applications and limitations?
  • Are accountabilities, policies and quality standards clear and well-understood?

For example, one agency wanted to conduct research across several broad policy themes. Our task was to develop medium-term scoping plans that would inform future research, support evidence-based policy decisions, and generate a shared knowledge base of findings, research considerations and analytical methods.

A structured and repeatable process was needed to ensure shared understanding across the various stakeholder groups of the critical questions to be addressed. Scoping plans were developed for each theme, encompassing the project’s scope, literature and data scans, proposed analytical methodology, project management, and anticipated research outcomes. Cross-theme knowledge-sharing workshops were held to ensure better practice and increased learning opportunities.

The agency project lead noted that: ‘…access to integrated unit record data allows us to delve into the relationships between measures and apply statistical methods to organise the breadth of data into digestible insights for our stakeholders. These types of insights can support targeted policy development and evaluation.’

While the steps may seem basic, this structured and repeatable approach resulted in accelerated knowledge and skill development, transparent and defensible research methodologies, and iteratively refined data requirements to support policy development and decision-making.

Clarity of scope sets the research team up for robust data governance, creating greater trust and validity in the research outcomes and establishing an evidence base for evaluating policy.

Commonwealth data in integrated datasets can improve the lives of all Australians. The agencies developing and maintaining these datasets continue to improve the quality and breadth of data available to policymakers and researchers. However, data-driven insights are only as good as the people extracting them. There is an ongoing need to invest in data and analytic skills across the Australian Public Service so that the datasets are approached with research methodologies that maximise the value of the data and bring new insight to policy.

David Lim is an executive director and co-lead of Synergy Group’s data and analytics community of practice. He has extensive experience working in private and public sectors with all levels of government, in Australia and abroad.
