But what about the data in aggregate? The simplest way to combine data from multiple users is to average them. For example, the most popular period tracking app, Flo, has an estimated 230 million users. Imagine three cases: a single user, the average of 230 million users, and the average of 230 million users plus 3.5 million users submitting junk data.
An individual’s data may be noisy, but averaging over many users smooths out that noise and makes the underlying trend clearer. Junk data is just another type of noise. The difference between the clean and the fouled data is noticeable, but the overall trend is still obvious.
This simple example illustrates three problems with poisoning the data. First, people who submit junk data are unlikely to affect predictions for any individual app user. Second, it would take an extraordinary amount of work to shift the underlying signal across the whole population. Third, even if that happened, poisoning the data risks making the app useless for the people who need it.
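The averaging argument can be sketched in a few lines of Python. This is a toy simulation, not Flo’s data or model: it assumes each genuine user reports a single cycle length drawn from a normal distribution around 28 days, that junk submissions are uniformly random between 1 and 90 days, and it scales the population down 1,000-fold (230,000 genuine users and 3,500 junk users) while keeping the roughly 1.5% junk share from the example. The names `simulate` and `expected_shift` are hypothetical helpers.

```python
import random
import statistics

# All parameters below are illustrative assumptions, not real app data.
GENUINE_MEAN = 28.0              # assumed average cycle length in days
GENUINE_SD = 4.0                 # assumed person-to-person noise in days
JUNK_LOW, JUNK_HIGH = 1.0, 90.0  # assumed range of random junk entries


def simulate(n_genuine, n_junk, seed=0):
    """Return (one user's value, genuine-only average, average including junk)."""
    rng = random.Random(seed)
    genuine = [rng.gauss(GENUINE_MEAN, GENUINE_SD) for _ in range(n_genuine)]
    junk = [rng.uniform(JUNK_LOW, JUNK_HIGH) for _ in range(n_junk)]
    single_user = genuine[0]
    genuine_avg = statistics.fmean(genuine)
    pooled_avg = statistics.fmean(genuine + junk)
    return single_user, genuine_avg, pooled_avg


def expected_shift(n_genuine, n_junk):
    """Approximate change in the average caused by mixing in junk entries:
    the junk share times the gap between the junk mean and the genuine mean."""
    junk_share = n_junk / (n_genuine + n_junk)
    junk_mean = (JUNK_LOW + JUNK_HIGH) / 2
    return junk_share * (junk_mean - GENUINE_MEAN)


if __name__ == "__main__":
    single, genuine_avg, pooled_avg = simulate(230_000, 3_500)
    print(f"single user:           {single:.2f} days")
    print(f"average, genuine only: {genuine_avg:.2f} days")
    print(f"average, with junk:    {pooled_avg:.2f} days")
    print(f"predicted shift:       {expected_shift(230_000, 3_500):.2f} days")
```

Under these assumptions the junk moves the population average by only about a quarter of a day, because the shift scales with the junk share (about 1.5%). Moving it meaningfully would require junk data from a much larger share of users.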
Other approaches to protecting privacy
In response to people’s concerns about their period app data being used against them, some period apps made public statements about creating an anonymous mode, using end-to-end encryption, and following European privacy laws.
The security of any “anonymous mode” hinges on what it actually does. Flo’s statement says that the company will de-identify data by removing names, email addresses, and technical identifiers. Removing names and email addresses is a good start, but the company doesn’t define what it means by technical identifiers.
With Texas paving the way for lawsuits against anyone who helps someone seeking an abortion, and with 87% of people in the U.S. identifiable from minimal demographic information such as ZIP code, gender, and date of birth, any demographic data or identifier has the potential to harm people seeking reproductive health care. A massive market for user data, driven primarily by targeted advertising, makes it possible to learn a frightening amount about nearly anyone in the U.S.
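To see why removing names and email addresses is not enough, here is a minimal sketch in Python using made-up records. It assumes a hypothetical app that stores ZIP code, gender, and date of birth alongside cycle data, and it treats `device_id` as a stand-in for the undefined “technical identifiers.” After the direct identifiers are stripped, it counts how many records are still unique on the ZIP code, gender, and date-of-birth combination, and reports the smallest group size (the k in k-anonymity). The helper names are illustrative, not any app’s actual code.

```python
from collections import Counter

# Made-up records; no real user data. "device_id" is a hypothetical
# stand-in for the unspecified "technical identifiers".
RAW_RECORDS = [
    {"name": "A", "email": "a@example.com", "device_id": "d1",
     "zip": "47401", "gender": "F", "dob": "1994-03-12", "days_since_period": 14},
    {"name": "B", "email": "b@example.com", "device_id": "d2",
     "zip": "47401", "gender": "F", "dob": "1994-03-12", "days_since_period": 3},
    {"name": "C", "email": "c@example.com", "device_id": "d3",
     "zip": "47405", "gender": "F", "dob": "1988-07-30", "days_since_period": 21},
    {"name": "D", "email": "d@example.com", "device_id": "d4",
     "zip": "78701", "gender": "F", "dob": "2001-11-02", "days_since_period": 45},
    {"name": "E", "email": "e@example.com", "device_id": "d5",
     "zip": "78701", "gender": "F", "dob": "1999-05-17", "days_since_period": 9},
]

DIRECT_IDENTIFIERS = ("name", "email", "device_id")
QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def strip_direct_identifiers(record, direct=DIRECT_IDENTIFIERS):
    """Drop direct identifiers; every other field is kept unchanged."""
    return {k: v for k, v in record.items() if k not in direct}


def group_sizes(records, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Fraction of records whose combination of `keys` appears exactly once."""
    sizes = group_sizes(records, keys)
    unique = sum(1 for r in records if sizes[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


def k_anonymity(records, keys=QUASI_IDENTIFIERS):
    """Smallest group size; k = 1 means at least one record stands alone."""
    return min(group_sizes(records, keys).values())


if __name__ == "__main__":
    records = [strip_direct_identifiers(r) for r in RAW_RECORDS]
    print("unique on ZIP + gender + DOB:", unique_fraction(records))
    print("unique on ZIP alone:", unique_fraction(records, ("zip",)))
    print("k-anonymity:", k_anonymity(records))
```

With these toy rows, three of the five records are unique once ZIP code, gender, and date of birth are combined, even though no name or email remains; on ZIP code alone, only one is.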
End-to-end encryption and the European General Data Protection Regulation (GDPR) can help protect your data from legal inquiries. Unfortunately, none of these measures addresses the digital footprints everyone leaves behind through everyday use of technology. Even search histories can reveal how far along a user is in a pregnancy.
What do we really need?
Instead of brainstorming ways to circumvent technology to reduce potential harm and legal trouble, we believe that people should advocate for digital privacy protections and for restrictions on how data is used and shared. Companies should clearly tell people how their data is being used, what their risk of exposure to potential harm is, and what their data is worth to the company, and they should seek people’s feedback on these practices.
People have been concerned about digital data collection for years. In a post-Roe world, however, more people can be placed at legal risk simply for routine health tracking.
Katie Siek is a professor and the chair of informatics at Indiana University. Alexander L. Hayes and Zaidat Ibrahim are Ph.D. students in health informatics at Indiana University.