Service provider IDs unmasked in open health data, investigation underway

By Stephen Easton

September 29, 2016


The Department of Health has removed a set of Pharmaceutical Benefits Scheme and Medicare data from the federal open data portal after computer security experts were able to decrypt the health service provider identification numbers it contained.

Information commissioner Timothy Pilgrim has been informed and is investigating the matter as well as “providing independent oversight”, says the department, which announced the decision this morning. While it has not been confirmed what mistakes were made when the dataset was uploaded, the agency has taken the right steps by moving swiftly to remove the information and make a public disclosure.

Health reports it is also “undertaking a full, independent audit of the process of compiling, reviewing and publishing” the dataset, which will only be restored when the privacy concerns are resolved. But it added that personal information about patients and service providers was not involved:

“The dataset does not include names or addresses of service providers and no patient information was identified. However, as a result of the potential to extract some doctor and other service provider ID numbers, the Department of Health immediately removed the dataset from the website to ensure the security and integrity of the data is maintained.

“No patient information has been compromised, and no information about the health service providers has been publicly identified or released.”

The department also moved to reassure the researchers who use the data that it would be restored to data.gov.au as soon as possible:

“The Australian Government Department of Health makes the high-value datasets it holds publicly available to enable researchers, the not-for-profit sector and health industries to extract the most value from government data. This is helping to improve health outcomes for all Australians and is popular in the university, research and health-related communities.”

Computer scientists Vanessa Teague, Christopher Culnane and Benjamin Rubinstein of the University of Melbourne’s Department of Computing and Information Systems raised the alert. In its own report on the matter, the team says it first notified the department on September 12 and confirms the public servants have done a lot of things right in this case:

“The Department of Health then immediately removed the dataset from the website, opened lines of communication with us to further understand the issue and conducted their own careful investigation of the problem.

“The MBS 10% sample dataset does not include names or addresses, but the implications of exposing individual provider IDs are still serious.

“The encryption algorithm was described online at data.gov.au. That was the right thing to do, because it made it possible for us to identify weaknesses in the encryption method. Leaving out some of the algorithmic details didn’t keep the data secure – if we can reverse-engineer the details in a few days, then there is a risk that others could do so too.

“Security through obscurity doesn’t work – keeping the algorithm secret wouldn’t have made the encryption secure, it just would have taken longer for security researchers to identify the problem. It is much better for such problems to be found and addressed than to remain unnoticed.”

They acknowledge the value of open data to the community, but also warn that the privacy risks are great and addressing them is not always a simple task:

“The mathematical details matter: it’s a technically challenging task to understand whether a particular algorithm securely encrypts data or not. Datasets containing sensitive information about individuals clearly deserve more caution than others, and may not always be suitable for open public release. …

“Details about the privacy protections should be published long in advance. They can then be subject to empirical testing, scientific analysis, and open public review, before they are used on real data. Then we can make sound, evidence-based decisions about how to benefit from open data without sacrificing individual privacy.”
