Sharing data and protecting privacy in low- and middle-income countries
“Switzerland doesn’t have a system for sharing open data in the health sciences,” says statistician Matthias Templ of the Zürich University of Applied Sciences (ZHAW). “It’s basically not practiced here, but it’s done by researchers in Malawi. And I think we can learn from them.”
Published: 01.06.2021, Author: Jeannie Wurz
Sharing and protecting data has become a hot topic for individuals, businesses, NGOs, and countries around the world. A 2020 grant of CHF 10,000 from ESTHER Switzerland is supporting a collaboration between the Malawi Epidemiology and Intervention Research Unit (MEIRU) in Malawi and the Zürich University of Applied Sciences (ZHAW) in Switzerland. The partnership is led in Malawi by data documentalist Chifundo Kanjala and in Zürich by Matthias Templ.
MEIRU is a Malawian NGO conducting population health research in urban and rural Malawi. MEIRU studies are regulated by the Malawi National Commission for Science and Technology, and ethical approval is sought for each research project individually. The ethics clearance stipulates the data sharing conditions.
The goal of the six-month project is to improve the methods used to anonymise health and demographic data collected in low-income countries. The partnership came about as Kanjala was preparing MEIRU datasets for sharing.
“I went to the International Household Survey Network website to look for resources about preparing data for sharing, and I found some information describing a software that Matthias and others had prepared for data anonymisation, called sdcMicro. That got me started. But I quickly ran into questions that I couldn’t easily find answers for, and so I wrote an email to Matthias, and we started corresponding.”
Templ is the author of the book Statistical Disclosure Control for Microdata: Methods and Applications in R, published in 2017. “There were already some books available, but there were none that explained the methods mathematically, as well as providing some software applications,” he says. “I tried to give a theoretical basis but also show in practical examples, with software, how the methods work.”
The initial email correspondence led to a joint project designed to strengthen capacity in the area of ethical sharing of data produced in Malawi, using MEIRU data as a prototype. Kanjala and Templ named their project “Better De-Identification Using Statistical Anonymisation (BISA) for Malawi Population Health Data”. Kanjala chose the acronym BISA, which means “hide” in Chichewa, Malawi’s national language. “I chose it because when you de-identify you’re trying to hide people’s identities,” he says.
The collaboration involves the creation of a template for anonymising data coming out of the health and demographic surveillance systems (HDSS) and population health research groups. It will bring an immediate benefit for MEIRU, says Kanjala, in that the researchers can be confident they’re sharing data that are properly anonymised and meet international standards.
According to Templ’s book, the demand for and volume of data from surveys, registers or other sources containing sensitive information on persons or enterprises has increased significantly over the last several years. At the same time, privacy protection principles and regulations have imposed restrictions on the access to and use of individual data.
Why share data?
There are advantages to sharing data, says Kanjala. “It’s an efficient use of resources, because the second person doesn’t have to collect the data again.” Transparency in science is another very important issue. “It’s good science to share data. The other person can verify what I did,” Kanjala says. “Otherwise I could just release findings. If I share my data, someone can confirm or refute what I’ve claimed.”
Funders of research are increasingly requiring that data be shared. However, although there is great value in sharing data for advancing science and deriving maximum benefits from funding, say the partners, “one of the major concerns is the need to preserve the confidentiality of study participants.”
In 2016, in the open-access journal Scientific Data, stakeholders representing academia, industry, funding organisations and scholarly publishing created and endorsed the ‘FAIR Guiding Principles for scientific data management and stewardship’, which call for research data to be Findable, Accessible, Interoperable and Reusable (FAIR).
Over the past decade, the expectations of funders have led to more emphasis on data anonymisation, says Kanjala. “We’ve gone beyond working within our small group of researchers or with close associates. Today we are sharing the data we produce with a wider group of researchers. You may not know them at all. You aren’t aware who is actually using your data and who isn’t. So you want to make the anonymisation of the data as strong as possible.”
Growing interest in data protection
The core concept of data anonymisation consists of transforming data in a way that reduces the re-identification risk for persons in the data set. “In statistics we want to say something about a group of people,” says Templ. “The goal is not to analyse a particular person. We want to analyse some general indicators about health status, for example, from the population.”
Data protection scandals in the 1990s highlighted the fact that removing or pseudonymising directly identifying attributes such as names, addresses and social insurance numbers is generally not sufficient to prevent data protection violations.
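To illustrate why removing names alone is not enough, here is a minimal Python sketch of one common risk measure, k-anonymity: the size of the smallest group of records that share the same combination of indirectly identifying attributes (quasi-identifiers). It is not the sdcMicro implementation and does not use MEIRU data; the field names, the toy records and the 20-year age bands are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values on all quasi-identifiers."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())


def age_band(age, width=20):
    """Generalise an exact age into a band, e.g. 34 -> '20-39'."""
    low = age // width * width
    return f"{low}-{low + width - 1}"


# Hypothetical toy records: names are already removed, yet every
# combination of age, sex and village is still unique.
records = [
    {"age": 34, "sex": "F", "village": "A"},
    {"age": 36, "sex": "F", "village": "A"},
    {"age": 71, "sex": "M", "village": "B"},
    {"age": 78, "sex": "M", "village": "B"},
]

k_raw = k_anonymity(records, ["age", "sex", "village"])

# Replace the exact age with a coarser age band and measure again.
generalised = [
    {"age_band": age_band(r["age"]), "sex": r["sex"], "village": r["village"]}
    for r in records
]
k_generalised = k_anonymity(generalised, ["age_band", "sex", "village"])

print(k_raw, k_generalised)
```

In this toy example each person is unique on the raw attributes (k = 1), while after grouping ages into bands every person shares their combination with one other record (k = 2). The trade-off is that coarser data are also less detailed for analysis.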
In May 2016 the European Union approved sweeping new data privacy legislation governing “protection of natural persons with regard to the processing of personal data and on the free movement of such data.” The General Data Protection Regulation (GDPR) went into effect in May 2018.
Meanwhile, in Switzerland the revised Federal Act on Data Protection (FADP) was passed by parliament in autumn 2020 and is expected to come into force in mid-2022. Among other things, the new FADP should reinforce citizens’ rights to data protection and privacy and safeguard them in the longer term.
Differences in data collection
Malawi – a low-income country – doesn’t have the resources to organise access to healthcare data for everyone who wants to do research, says Chifundo Kanjala. Guidelines and open source tools such as those implemented in the BISA project can lower the barrier to individual-level health data sharing for research and decision support.
Kanjala works within a team of database programmers and data scientists who manage the collection, validation, processing, and integration of data which researchers can then analyse to understand and support policy and decision-making. The population-based research cohorts they work with produce very complicated data, Kanjala says, and preparing those data for analysis by researchers requires a team.
“When you do a cross-sectional survey,” he says, “you are just interviewing the people one time. You get your information, then you get a dataset out of it. But the population cohorts that we work with involve longitudinal follow-up of study participants over a period of time. So you repeatedly go back to the population. You’re trying to link the data across time.”
There are additional differences in how health research data are collected in low- and middle-income countries (LMICs), says Kanjala. “Not everyone is literate. You can’t post a questionnaire online and expect someone in rural Malawi to complete it and send it back to you. Quite often you will have to send interviewers who are going to sit down with the participants in your study and talk with them (see title photo of blue-shirted interviewer with participant). You will need someone to interview the patient in their local language, and then you will have to translate it into English.”
Benefits of the ESTHER partnership
In contrast to the situation in many low- and middle-income countries in Africa, MEIRU has access to a rich source of health research data collected in a rural area of Malawi over a period of more than 30 years, and an urban research site in the capital which has existed since 2013. It operates a health and demographic surveillance system (HDSS) which combines field and computing procedures for collecting data on demographics, health risks, exposure and outcomes. The results are used to inform national and regional dialogue on preparation of open data in a manner that is sensitive to the confidentiality and privacy of those consenting to participate in research studies.
Currently, according to the project proposal, the important topic of privacy of health data has not been explored sufficiently, in LMICs in general and in HDSS in particular. HDSS are currently sharing data with no clear indication of the levels of risk that the identity of persons in a data set could be disclosed.
Switzerland is far behind Malawi in terms of open-access data sharing in health research, says Matthias Templ. “Our ESTHER project is the first to think about how we can share health research data in Switzerland.”
The partners expect there to be great interest in the results of the ESTHER project. “A lot of countries will be looking at what we’re doing in the area of anonymisation and sharing health data,” says Templ. “A lot of people are waiting to see the outcome.”
The HDSS and medical science research communities in LMIC settings will be the primary beneficiaries of the resulting publications and software, but the results will be useful for anyone working on anonymising longitudinal datasets for purposes of sharing, Templ says.
The partnership began with an ESTHER start-up grant, and now Kanjala and Templ are thinking about how to proceed. To give data managers and data scientists an overview of the issues involved, their project included a four-hour webinar at the end of April 2021. Data specialists from Malawi, South Africa and nearby countries attended.
Traditionally, there has always been someone coming from a university in a high-income country to lead the analysis of the data in north-south health research partnerships, says Kanjala. But there are more and more established African researchers leading groups in LMICs. “The landscape is changing,” he says, “but it does take time.”
The researchers are off to a good start. “Our partnership is an example of where sharing data worked very well,” Templ says. “Chifundo used the software because he could use it, because it’s open-source and free, and that led to our international collaboration. This is the success story of open research.”