Researchers question Census Bureau’s new approach to privacy

In an age of rapidly advancing computer power, the U.S. Census Bureau recently undertook an experiment to see whether its published statistics could be used to compromise the privacy of the people who fill out the questionnaires.

The agency went back to the last national headcount, in 2010, and reconstructed individual profiles from thousands of publicly available tables. It then matched those records against other public population data. The result: Officials were able to infer the identities of 52 million Americans.
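The bureau has not released the code behind that experiment, but the basic idea can be sketched with a toy example. Given only a few published summary statistics for one small census block, a program can search for the combination of individual records that satisfies every statistic at once; in the hypothetical tables below, exactly one combination fits.

```python
# Toy sketch of a "database reconstruction" attack. Every table below is
# invented; the point is only that published aggregates can pin down
# individual records when a block is small enough.
from itertools import combinations_with_replacement, product

AGES = range(18, 90)                      # assume adults only, for brevity
SEXES = ("F", "M")

# Hypothetical published tables for a block with 3 residents:
BLOCK_POPULATION = 3
COUNT_BY_SEX = {"F": 2, "M": 1}
MEDIAN_AGE = 30
MEAN_AGE_BY_SEX = {"F": 29.5, "M": 62.0}

def consistent(records):
    """True if a candidate set of (sex, age) records matches every published table."""
    for sex in SEXES:
        ages = [age for s, age in records if s == sex]
        if len(ages) != COUNT_BY_SEX[sex]:
            return False
        if ages and abs(sum(ages) / len(ages) - MEAN_AGE_BY_SEX[sex]) > 1e-9:
            return False
    all_ages = sorted(age for _, age in records)
    return all_ages[len(all_ages) // 2] == MEDIAN_AGE

# Enumerate every possible set of residents and keep only the consistent ones.
people = list(product(SEXES, AGES))
solutions = [
    recs for recs in combinations_with_replacement(people, BLOCK_POPULATION)
    if consistent(recs)
]
print(solutions)  # -> [(('F', 29), ('F', 30), ('M', 62))]: the block is fully reconstructed
```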

Confronted with that discovery, the bureau announced that it would add statistical “noise” to the 2020 data, essentially tinkering with its own numbers to preserve privacy. But that idea creates its own problems, and social scientists, redistricting experts and others worry that it will make next year’s census less accurate. They say the bureau’s response is overkill.

“This is a brand new, radically more conservative definition of privacy,” University of Minnesota demographer Steven Ruggles said.

Federal law bars census officials from disclosing any individual’s responses. But data-crunching computers can tease out likely identities from the broader census results when combined with other personal information.

Some critics fear the agency’s changes could make it harder to draw new congressional and legislative districts accurately. Others worry that research on immigration, demographics, the opioid epidemic and declining life expectancy will be hindered, particularly when it involves less populated areas.

If the change had been in place four years ago, Ruggles said, he would not have been able to conduct a 2015 study on the impact of declines in young men’s incomes on marriage.

With more and more data sets available to the public with a quick download, it has become easier than ever to match information with real names. That means aggregated answers to census questions involving race, housing and relationships could be traced back to specific individuals.

The fear is that advertisers, market researchers or anybody with know-how and curiosity could use data to reconstruct the identities of census respondents.

When the bureau went back to the 2010 census, it matched the reconstructed records against commercial databases. More than 1 in 6 respondents were identified by name and neighborhood, as well as by information about their race, ethnicity, sex and age.
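That matching step is, at bottom, a database join on quasi-identifiers such as location, sex and age. The sketch below is a deliberately simplified illustration using invented records; the bureau has not published its matching methodology in this form.

```python
# Minimal sketch of the linkage step (all records here are invented).
# A reconstructed census record carries no name, but joining it to a
# commercial file on block, sex and age can attach one.
reconstructed = [
    {"block": "1001-A", "sex": "F", "age": 30, "race": "Asian"},
    {"block": "1001-A", "sex": "M", "age": 62, "race": "White"},
]

commercial = [  # e.g. a marketing or voter file keyed by the same quasi-identifiers
    {"name": "J. Doe", "block": "1001-A", "sex": "F", "age": 30},
    {"name": "R. Roe", "block": "1001-A", "sex": "M", "age": 62},
]

def link(census_rows, outside_rows):
    """Join the two files on (block, sex, age); unique matches become putative identifications."""
    index = {}
    for row in outside_rows:
        index.setdefault((row["block"], row["sex"], row["age"]), []).append(row)
    for rec in census_rows:
        matches = index.get((rec["block"], rec["sex"], rec["age"]), [])
        if len(matches) == 1:  # only unambiguous matches count as re-identifications
            yield matches[0]["name"], rec

for name, rec in link(reconstructed, commercial):
    print(f"{name} is likely the {rec['age']}-year-old {rec['race']} resident on block {rec['block']}")
```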

Since the last census, “the data world has changed dramatically,” Ron Jarmin, deputy director of the census agency, wrote earlier this year. “Much more personal information is available online and from commercial providers, and the technology to manipulate that data is more powerful than ever.”

The Trump administration’s unsuccessful effort to add a citizenship question to the 2020 questionnaire heightened fears about how census information would be used. But privacy concerns are nothing new for the bureau.

Historians have found evidence that census data helped identify Japanese Americans who were rounded up and confined to camps during World War II. That revelation led to an apology from then-Census Bureau Director Kenneth Prewitt in 2000.

Jewish groups and some liberal organizations raised privacy concerns when the bureau was lobbied to add a religion question to the 1960 census. Some noted that the Nazis had used government and church records to identify and round up Jews. The idea never went anywhere.

During the legal battle over the citizenship question, advocates worried that the information could be used to target residents in the country illegally. Some say lingering concerns could have a chilling effect on the 2020 census.

To address those worries, the bureau has adopted a technique called “differential privacy,” which injects carefully calibrated statistical noise into the published figures to protect the identities of individual respondents while preserving the core findings.

It’s analogous to pixelating the data, a technique commonly used to blur certain images on television, said Michael Hawes, senior adviser for data access and privacy at the Census Bureau.
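The bureau’s production system is far more elaborate than any short example, but the textbook version of the idea, the Laplace mechanism, fits in a few lines: add random noise, scaled by a privacy parameter called epsilon, to each count before it is published. The counts and epsilon values below are purely illustrative.

```python
# Minimal sketch of the Laplace mechanism, the textbook form of differential
# privacy. The Census Bureau's production system is far more complex; the
# epsilon values and counts here are invented for illustration.
import numpy as np

rng = np.random.default_rng(seed=0)

def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Return a noisy version of a count; smaller epsilon means more noise (more privacy)."""
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

true_block_count = 37  # hypothetical population of one census block
for epsilon in (0.1, 1.0, 10.0):
    print(f"epsilon={epsilon:>4}: published count ~= {laplace_count(true_block_count, epsilon):.1f}")
```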

Redistricting experts, who rely on precise block-level counts to draw congressional, state and local legislative districts, say the mathematical blurring could cause problems. They also worry that it could dilute minority voting power and violate the Voting Rights Act.

“The numbers might be off by five, 10, 20 people, and if you’re dealing with exact percentages, that could mean something. That could mean a lot,” said Jeffrey M. Wice, a national redistricting attorney. “That’s why we care about it so much.”

In the past, the bureau has used “swapping” and other methods to protect confidentiality. Swapping involves taking similar households in different geographic areas and exchanging demographic characteristics.
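The bureau has never published its exact swapping rules, but the general mechanics can be sketched: find households that agree on a few key variables yet sit in different places, and trade their geographic labels, so published totals stay roughly right while no record can be tied to its true location. The matching rule and data below are invented for illustration.

```python
# Rough sketch of record swapping (invented data and matching rule).
# Households that agree on "key" variables but live in different counties
# exchange geographic identifiers.
import random

households = [
    {"id": 1, "county": "A", "size": 4, "minority": True},
    {"id": 2, "county": "B", "size": 4, "minority": True},
    {"id": 3, "county": "A", "size": 2, "minority": False},
    {"id": 4, "county": "C", "size": 2, "minority": False},
]

def swap(records, key=("size", "minority"), rate=0.5, seed=0):
    """Randomly swap counties between pairs of households that match on the key variables."""
    rng = random.Random(seed)
    records = [dict(r) for r in records]          # work on a copy
    by_key = {}
    for r in records:
        by_key.setdefault(tuple(r[k] for k in key), []).append(r)
    for group in by_key.values():
        rng.shuffle(group)
        for a, b in zip(group[::2], group[1::2]):
            if a["county"] != b["county"] and rng.random() < rate:
                a["county"], b["county"] = b["county"], a["county"]
    return records

print(swap(households))
```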

Census data does not need to be exact for most purposes, “as long as we know it’s really pretty close,” said Justin Levitt, an election law professor at Loyola Law School in Los Angeles. But “there’s certainly a point where blurry becomes too blurry.”

The bureau has not decided precisely how much blurring will take place, but researchers have already presented academic papers and organized a petition, signed by more than 4,000 scholars, planners and journalists, asking the bureau to include the research community in its discussions.

Michael McDonald, a University of Florida redistricting expert, said people must be assured their data will be kept confidential or they may not respond at all. If respondents do not answer questions for the once-a-decade census in a timely manner, census workers must try to interview them in person.

“We need high response rates to the census,” McDonald said. “If we don’t get them, whatever noise will be moot because we won’t have good data to start with.”