Getting the best outcomes from integrated datasets
An astronomical 64.2 zettabytes of data was estimated to have been created, consumed and stored in 2020. By 2025, global data creation is projected to grow to more than 180 zettabytes.
To put the current volume of data, and its extraordinary growth over the next five years, into perspective: one zettabyte is one sextillion (1 followed by 21 zeroes) bytes. If each byte were represented by a star, the data generated in 2020 (about 6.4 × 10²² bytes) would fall within the range of estimates for the number of stars in the observable universe.
In 2002, by comparison, the amount of new digital data generated globally was estimated at about five exabytes. Creating that amount of data today takes far less than two days: at 2020 rates, it would take under an hour.
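The scale figures above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch, assuming decimal (SI) units where one zettabyte is 10^21 bytes; it derives two numbers the estimates imply but do not state directly: the compound annual growth rate between 2020 and 2025, and the average daily volume at 2020 rates. Both inherit the uncertainty of the underlying estimates, and because the 2025 figure is "more than 180 zettabytes", the growth rate is a lower bound.

```python
# Decimal (SI) byte units, as used in the figures above.
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte: one sextillion bytes

created_2020 = 64.2 * ZB   # estimated data created in 2020
projected_2025 = 180 * ZB  # projected global data creation in 2025

# Compound annual growth rate implied by the two estimates over five years.
cagr = (projected_2025 / created_2020) ** (1 / 5) - 1

# Average volume per day at 2020 rates, expressed in exabytes.
daily_2020_eb = created_2020 / 365 / EB

print(f"1 ZB = {ZB:.0e} bytes")
print(f"2020 total: {created_2020:.2e} bytes")
print(f"Implied annual growth, 2020-2025: {cagr:.1%}")
print(f"Average per day in 2020: {daily_2020_eb:.0f} EB")
```

On these figures, global data creation grows by roughly 23% a year, and the world averaged around 176 exabytes a day in 2020.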
Government agencies collect data as they administer policies and programs, and use this data to support policy evaluation. However, legislative, privacy and governance constraints often limit the scope of impacts that can be measured and evaluated; secondary effects often remain unobserved. For example, a housing policy evaluated on housing stability for low-income families may not capture the flow-on effects of that stability on employment prospects, education outcomes and lifetime tax contributions, which fall outside the policy's remit.
As members of the Australian public, we are more than just taxpayers, Medicare numbers or welfare recipients. Our interactions with government services are multi-faceted. Though each person's experience is unique, our paths through the different systems share common threads that can be used to inform policy development. For example, changes to health policy may have different impacts on people depending on their health and education history, employment status, income, location and other factors.
The Productivity Commission's 2017 inquiry report, Data Availability and Use, highlighted the increasing importance of integrated data to ‘…facilitate the development of new products and services, enhance consumer and business outcomes, better inform decision making and policy development, and facilitate greater efficiency and innovation in the economy.’
The challenge was to connect nationally significant population datasets, maximising the value of existing data for policy and research, in a way that allowed open but safe sharing. How can the government best tap into this powerful resource to unpack the complex challenges of the modern world?
Enter integrated data
Just as astronomers of old saw connections between stars and told stories through them, integrated data allows policy outcomes to be analysed in the context of broader explanatory data. By linking datasets to the individual, we can build a holistic picture and support citizen-centric policy reform.
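At its simplest, linking datasets to the individual means combining records that share a person-level key, while keeping gaps visible where a person does not appear in a source. The sketch below is illustrative only: the three extracts, the `link_id` field and the `link_to_individual` function are hypothetical, and it assumes the hard part (assigning a consistent, de-identified key to the same person across agencies) has already been done upstream.

```python
from collections import defaultdict

# Hypothetical extracts from three agencies. "link_id" is an assumed,
# already-assigned de-identified key shared across all sources.
health = [
    {"link_id": "P001", "gp_visits": 4},
    {"link_id": "P002", "gp_visits": 11},
]
education = [
    {"link_id": "P001", "highest_qual": "Year 12"},
    {"link_id": "P003", "highest_qual": "Bachelor"},
]
tax = [
    {"link_id": "P001", "taxable_income": 62_000},
    {"link_id": "P002", "taxable_income": 48_500},
    {"link_id": "P003", "taxable_income": 91_000},
]


def link_to_individual(**sources):
    """Combine records from named sources into one profile per link_id.

    Each field is prefixed with its source name so analysts can see
    where every value came from.
    """
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            key = record["link_id"]
            for field, value in record.items():
                if field != "link_id":
                    profiles[key][f"{source_name}.{field}"] = value
    return dict(profiles)


linked = link_to_individual(health=health, education=education, tax=tax)
for person, profile in sorted(linked.items()):
    print(person, profile)
```

In this toy example P002 has no education record and P003 no health record. A real analysis would need to decide whether a missing record means "no service use" or simply "not captured", which is one reason integrated datasets come with documentation on coverage and linkage rates.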
Across the Tasman, New Zealand’s Social Wellbeing Agency (SWA) assesses social policy reform using the Integrated Data Infrastructure (IDI) to quantify the social return on policy. With the IDI established as the common evidence base for whole-of-government policy research, SWA can bring social science research from across government together in one central place, promoting innovation and connecting researchers and policy-makers.
Back home in Australia, several integrated datasets are available or under development that safely and securely integrate data across the Commonwealth and, in some cases, state jurisdictions, including:
- The Multi-Agency Data Integration Project (MADIP). The Australian Bureau of Statistics (ABS) has developed the MADIP, which combines information on health, education, government payments, income and taxation, employment, and population demographics over time.
- The National Integrated Health Services Information Analytical Asset (NIHSI). The Australian Institute of Health and Welfare (AIHW) has developed the NIHSI, which combines state and territory hospital data with national health administrative datasets, including the Medicare Benefits Schedule, the Pharmaceutical Benefits Scheme, residential aged care data and the National Death Index.
- The National Disability Data Asset (NDDA). The Department of Social Services has worked with the ABS and AIHW to develop the NDDA, which aims to provide a more complete picture of the supports and services used by people with disability.
- The Business Longitudinal Analysis Data Environment (BLADE). The Department of Industry, Science and Resources has developed BLADE, a statistical resource containing information on Australian businesses.
- The Linked Employer-Employee Dataset (LEED). The ABS has created LEED, which links employer information from BLADE with employee information from Personal Income Tax data.
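A linked employer-employee dataset has a many-to-one shape: many jobs (from tax records) point to one employer (from business records). The sketch below shows that shape with made-up records; the identifiers, field names and the `link_employer_employee` function are hypothetical and do not reflect the actual BLADE or Personal Income Tax schemas. Unmatched jobs are set aside rather than silently dropped, since the match rate is itself a quality measure.

```python
from statistics import median

# Hypothetical firm-level records keyed by a made-up identifier "firm_id".
firms = {
    "F1": {"industry": "Manufacturing", "turnover": 4_200_000},
    "F2": {"industry": "Retail", "turnover": 1_100_000},
}

# Hypothetical employee-level records: each job links a person to a firm.
jobs = [
    {"person_id": "P001", "firm_id": "F1", "wages": 62_000},
    {"person_id": "P002", "firm_id": "F2", "wages": 48_500},
    {"person_id": "P003", "firm_id": "F1", "wages": 91_000},
    {"person_id": "P004", "firm_id": "F9", "wages": 30_000},  # no matching firm
]


def link_employer_employee(firms, jobs):
    """Attach firm attributes to each job, then summarise jobs by firm.

    Returns (linked_jobs, unmatched_jobs, firm_summary).
    """
    linked, unmatched = [], []
    for job in jobs:
        firm = firms.get(job["firm_id"])
        if firm is None:
            unmatched.append(job)
        else:
            linked.append({**job, **firm})

    wages_by_firm = {}
    for job in linked:
        wages_by_firm.setdefault(job["firm_id"], []).append(job["wages"])

    summary = {
        firm_id: {"headcount": len(wages), "median_wage": median(wages)}
        for firm_id, wages in wages_by_firm.items()
    }
    return linked, unmatched, summary


linked, unmatched, summary = link_employer_employee(firms, jobs)
print(summary)
print(f"{len(unmatched)} job(s) could not be matched to a firm")
```

Once jobs carry firm attributes, questions can be asked from either side, for example how wages vary by industry or how a firm's workforce changes over time.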
These datasets are already being used to address critical policy and research questions. The Department of Education has been working with the MADIP to better understand the drivers of early childhood developmental risk, identify socioeconomic factors affecting student pathways to employment, and improve fairness in non-government school funding models.
Access to large and detailed datasets allows researchers to conduct robust investigations without the ethical implications of running a randomised controlled trial on the Australian population.
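Without randomisation, comparisons drawn from integrated data have to account for differences between groups that a trial would otherwise balance. One of the simplest adjustments is stratification: compare outcomes within groups that share a confounding characteristic, then combine the within-group gaps. The sketch below uses invented records and a single hypothetical confounder (region); the function names are illustrative. Real studies typically adjust for many characteristics at once, for example with regression or propensity-score methods, and even then cannot rule out unmeasured confounding.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical unit records: (region, received_program, employed_a_year_later).
# Recipients are concentrated in metro areas, where employment is higher
# anyway, so region confounds a naive comparison.
rows = (
    [("metro", True, y) for y in (1, 1, 1, 0)]
    + [("metro", False, y) for y in (1, 0)]
    + [("regional", True, y) for y in (1, 0)]
    + [("regional", False, y) for y in (0, 0, 0, 1)]
)


def naive_difference(rows):
    """Treated minus untreated mean outcome, ignoring region."""
    treated = [y for _, t, y in rows if t]
    untreated = [y for _, t, y in rows if not t]
    return mean(treated) - mean(untreated)


def stratified_difference(rows):
    """Within-region treated-minus-untreated gaps, averaged with weights
    equal to each region's share of all records."""
    strata = defaultdict(lambda: {True: [], False: []})
    for region, treated, outcome in rows:
        strata[region][treated].append(outcome)

    estimate = 0.0
    for groups in strata.values():
        if not groups[True] or not groups[False]:
            raise ValueError("every stratum needs treated and untreated records")
        gap = mean(groups[True]) - mean(groups[False])
        weight = (len(groups[True]) + len(groups[False])) / len(rows)
        estimate += weight * gap
    return estimate


print(f"Naive difference:      {naive_difference(rows):.3f}")
print(f"Stratified difference: {stratified_difference(rows):.3f}")
```

Here the naive comparison (about 0.33) overstates the program's association with employment, because recipients are concentrated where employment is higher anyway; within each region the gap is 0.25.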
While some agencies have long used integrated data to support policy research and evaluation, others are just beginning their journey. The challenge for these agencies is knowing how to approach the data to get the best value from it.
Sailing by the stars
As in celestial navigation, method, observation and measurement are critical to success. Agencies are at different levels of maturity in using integrated datasets effectively, and a structured approach is needed. Those seeking to use integrated data for research and policy development should consider the following:
- Is data available in a format and structure we can use for analysis and reporting?
- Can decision-makers access trusted information products that provide actionable insights?
- Do we have the capability to build and maintain predictive or prescriptive data models over the long term?
- Are analyses and insights integrated into business decision-making, with a clear understanding of the applications and limitations?
- Are accountabilities, policies and quality standards clear and well-understood?
For example, one agency wanted to conduct research across several broad policy themes. Our task was to develop medium-term scoping plans that would inform future research, support evidence-based policy decisions, and build a shared knowledge base covering findings, research considerations and analytical methods.
A structured and repeatable process was needed to give the various stakeholder groups a shared understanding of the critical questions to be addressed. Scoping plans were developed for each theme, covering the project’s scope, literature and data scans, proposed analytical methodology, project management, and anticipated research outcomes. Cross-theme knowledge-sharing workshops were held to promote better practice and increase learning opportunities.
The agency project lead noted: ‘…access to integrated unit record data allows us to delve into the relationships between measures and apply statistical methods to organise the breadth of data into digestible insights for our stakeholders. These types of insights can support targeted policy development and evaluation.’
While the steps may seem basic, this structured and repeatable approach resulted in accelerated knowledge and skill development, transparent and defensible research methodologies, and iteratively refined data requirements to support policy development and decision-making.
Clarity of scope sets the research team up for robust data governance, which builds trust in the research outcomes, strengthens their validity and establishes an evidence base for evaluating policy.
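Turning unit record data into digestible insights usually means aggregating it, and with sensitive integrated data, good governance means confidentiality checks before any table leaves the secure environment. The sketch below cross-tabulates hypothetical records and suppresses cells below a minimum size. The threshold of 10, the field names and the `summarise_with_suppression` function are assumptions for illustration, not any data custodian's actual rule.

```python
from collections import Counter

# Hypothetical minimum cell size; real confidentiality rules vary by
# data custodian and are not specified here.
MIN_CELL_SIZE = 10


def summarise_with_suppression(records, row_key, col_key, min_cell=MIN_CELL_SIZE):
    """Count unit records by two characteristics and suppress small cells.

    Cells with fewer than `min_cell` records are replaced with None so
    that small groups cannot be singled out in a released table.
    """
    counts = Counter((r[row_key], r[col_key]) for r in records)
    return {
        cell: (n if n >= min_cell else None)
        for cell, n in sorted(counts.items())
    }


# Hypothetical unit records: 25 metro/employed, 12 metro/not employed,
# 14 regional/employed, 3 regional/not employed.
records = (
    [{"region": "metro", "status": "employed"}] * 25
    + [{"region": "metro", "status": "not employed"}] * 12
    + [{"region": "regional", "status": "employed"}] * 14
    + [{"region": "regional", "status": "not employed"}] * 3
)

table = summarise_with_suppression(records, "region", "status")
for (region, status), n in table.items():
    print(f"{region:<9} {status:<13} {'suppressed' if n is None else n}")
```

Suppressing one cell is not enough on its own if row or column totals are also published, because the hidden value can be recovered by subtraction; this is why custodians layer on further techniques such as secondary suppression or perturbation.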
Commonwealth data in integrated datasets can improve the lives of all Australians. The agencies developing and maintaining these datasets continue to improve the quality and breadth of data available to policymakers and researchers. However, these data-driven insights are only as good as the people who extract them. There is an ongoing need to invest in data and analytic skills across the Australian Public Service, so that the datasets are approached with sound research methodologies that maximise the value of the data and bring new insight to policy.
David Lim is an executive director and co-lead of Synergy Group’s data and analytics community of practice. He has extensive experience working in the private and public sectors with all levels of government, in Australia and abroad.