The Census ‘Time Bomb’ Awaiting Joe Biden

The Bureau’s well-intentioned efforts to keep 2020 data confidential are making the numbers unusable, experts say.

When Utah lawmakers had the chance to kick the tires on a new privacy program the Census Bureau is planning for the 2020 data, what they found was troubling.

A test in February on a version of the program that was run on 2010 census numbers produced a major distortion of what cities and towns in rural Utah look like. The populations in the tiny towns of Tabiona and Alton were shrunk to less than half their actual sizes. Twenty cities overall saw 20 percent or more of their inhabitants wiped away, and the state incurred a net population loss of nearly 15,000. The smallest town in Utah, the former mining town of Scofield, saw its inhabitants more than double, from 24 to 52.

The Census Bureau has since made some tweaks to the system that have mitigated some of these sledgehammer effects. Yet an intense but under-the-radar debate over the program, known as differential privacy, has continued to roil the larger census community. The chaos that the Trump administration brought to other aspects of the decennial survey has overshadowed the controversy around the Bureau’s decision, led by career experts, to subject the 2020 data to the privacy tool. As the release of the data gets closer, the Bureau’s failure so far to prove that the tool can be implemented without a severe cost to the data’s accuracy is causing major anxiety across a broad coalition of census data users.

The decision to use the tool is not related to President Trump’s hijacking of the survey for political purposes. In fact, it’s been championed internally by the career census expert who was also the Bureau’s most vocal public critic of the Trump administration’s attempt to add a census citizenship question.

But the privacy system has come under fire from census data users of varying stripes who say they fear that, in the name of privacy, the tool will distort the data to the point that the numbers will be unusable for myriad purposes. In their worst-case scenarios, the populations of small towns will appear much larger or smaller than they actually are. An Alaska Native village will look like it’s up to 50 percent white, instead of majority indigenous. A local government’s ability to administer a Meals on Wheels program, or keep track of how a disease is spreading through its elderly population, will be made much more difficult by the way that age data is skewed when the privacy system is used.

As these fears take hold, skeptics of the program are currently plotting more aggressive pushback. They are making their concerns known to the incoming Joe Biden team, in the hopes that his Commerce Department will intervene in how the bureau has been handling the program’s rollout. There have been discussions about seeking congressional intervention. And they are also floating the possibility of lawsuits if the bureau is unable to fix the issues that they have identified. 

The critics are keenly aware of the unprecedented difficulties the Census Bureau has faced with the 2020 census, from the Trump administration’s attempts to hijack the survey for partisan ends to the pandemic, wildfires and other natural disasters that complicated the count itself.

But it’s precisely because of the other threats to the census’s accuracy, and the significant work by stakeholders to help the Bureau overcome them, that the critics say the risks of differential privacy are just too great to take on right now.

“The fact that we have spent billions of taxpayer dollars to collect good data, taken millions of hours — of community hours, of local government hours, state hours — to come up with a good count, and then the Census Bureau is going to screw it all up? Frankly, that’s just not fair,” said Colorado’s state demographer Elizabeth Garner. 

For this story, TPM spoke to more than a dozen experts who rely on the census data, including state demographers, redistricting consultants, voting rights attorneys and private researchers. 

Some asked to speak anonymously, for fear of jeopardizing their working relationships with the bureau. Almost everyone acknowledged that the bureau was pursuing differential privacy with good intentions — the bureau is under legal obligations to protect the confidentiality of its census respondents — but all agreed that there were serious flaws in how the bureau has tried to implement the program. The Census Bureau did not respond to TPM’s request for comment.

“We don’t know what it is going to screw up, but no one is watching the cooks here,” one census expert outside the bureau, who asked not to be named, told TPM. “This is a time bomb waiting to go off.”

Injecting ‘Noise’

For decades, the Census Bureau has used different tactics so that identifying information about the individuals who take the survey isn’t inadvertently disclosed with the release of the data. But according to the Bureau, advances in big data technology have required the bureau to go to new lengths for the 2020 census to stay in compliance with its confidentiality obligations. Differential privacy, the approach the Bureau is planning on taking, injects “noise” into the data — in effect, making the data less accurate — to make it more difficult for outsiders to match the characteristics the data capture to specific individuals.
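To make the idea of injected noise concrete, here is a minimal sketch of the textbook Laplace mechanism, a standard way of achieving differential privacy for a simple count. It is an illustration only, not the Bureau’s actual algorithm, which this article does not describe in technical detail. The function names, the `epsilon` privacy parameter and the assumption that one person changes a count by at most 1 are all choices made for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Return a noise-protected version of a population count.

    Assumption for this sketch: adding or removing one person changes the
    count by at most 1, so the noise scale is 1 / epsilon. A smaller
    epsilon means stronger privacy and a noisier published number.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    noisy = true_count + laplace_noise(1.0 / epsilon, rng)
    # A published count cannot be fractional or negative: round and clamp.
    return max(0, round(noisy))


if __name__ == "__main__":
    rng = random.Random(2020)
    # Hypothetical town of 24 people, published five times with epsilon = 0.1.
    print([noisy_count(24, 0.1, rng) for _ in range(5)])
```

Running the example prints five different published values for the same town of 24, which is the trade-off the critics describe: the noise that hides any one resident also moves the total.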

But how much noise is really necessary is a touchy topic, and outside census data users say their perspectives have not been represented among those the Bureau tasked with spearheading differential privacy’s implementation.

“The team that’s responsible for implementing differential privacy is really a team of computer scientists,” said Ethan Sharygin, the director of the Population Research Center at Portland State University. “They don’t have a strong voice on the team to say the results that are produced are usable or not usable.”

The critics say the bureau, which only publicly unveiled the work it was doing on differential privacy in 2018, did not give itself enough time to work out the kinks. And the difficulties the bureau continues to face are evident in the demonstrations it has released to preview how the algorithm will work.

“Where we are right now with them should have been two years ago,” Garner said.

The Problems Identified

If the Bureau cannot get the balance between privacy and accuracy just right, the consequences could be devastating for the larger community of census users.

State demographers are fretting about how public policy makers will react when they’re told that projecting life expectancy is not doable using 2020 census data. Civil rights advocates fear that Voting Rights Act enforcement will be hobbled when racial dynamics in communities are made fuzzier. Some experts wonder if the confusion will create a cottage industry of consultants that governments will have to pay to help them clean up the numbers, and others speculate that the shortcomings will fuel arguments for privatizing the census.

The smaller the subset of data, the more problematic the distortions are, experts say.

“Half of the school districts are under 10,000 total population, and those are the places that get the most distorted data,” said Bill O’Hare, a demographer who advises the census advocacy group Count All Kids. As part of an analysis of a demonstration product the Census Bureau released in May 2020, he looked at the amount of error that differential privacy injected into the 2010 census count of 0-to-4-year-olds. He found that on average, for school districts of that size or smaller, those numbers were off by 24 percent.
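Why small places fare worst can be seen with a little arithmetic: if the noise added to a count has roughly the same typical size regardless of the population, the same absolute error is a far larger share of a small number. The sketch below simulates that effect. It assumes Laplace noise and a noise scale of 10 people, both chosen purely for illustration; these are not the Bureau’s parameters, and the function name is hypothetical.

```python
import random


def mean_abs_percent_error(true_count: int, noise_scale: float,
                           trials: int, seed: int = 0) -> float:
    """Estimate the average percent error when Laplace noise of a fixed
    scale is added to a count of `true_count` people.

    `noise_scale` is a hypothetical value for illustration only.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Difference of two exponential draws gives Laplace(0, noise_scale).
        noise = (rng.expovariate(1.0 / noise_scale)
                 - rng.expovariate(1.0 / noise_scale))
        total += abs(noise) / true_count
    return 100.0 * total / trials


if __name__ == "__main__":
    # Same noise scale, very different populations.
    for population in (24, 10_000, 100_000):
        error = mean_abs_percent_error(population, noise_scale=10.0, trials=20_000)
        print(f"population {population:>7}: average error {error:.3f}%")
```

Because the expected size of Laplace noise equals its scale, the average percent error here is roughly the scale divided by the population, so it falls in direct proportion as the population grows.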

Of particular concern is how characteristics like age, race and ethnicity are skewed. That lack of precision has implications for the effectiveness of public programs: how do governments know how to direct resources to a community that needs them if they cannot know for sure what the makeup of the community is or where those people are?

Voting rights attorneys and map drawers are also scrutinizing the effect that differential privacy has on the 2010 census data to gauge what it would mean for Voting Rights Act enforcement. (The demonstrations the Bureau has released allow users to compare the data as released in 2010 with how it would look under differential privacy.)

What they have found so far is that the system makes it harder to identify majority-minority districts as they exist on the ground. For African American communities in particular, differential privacy will also make it more difficult to suss out the distinct political preferences those voters have.

“The biggest impacts are likely to come in analyses of racially polarized voters,” Justin Levitt, an election law professor at Loyola Marymount, told TPM.

A Mess For The Biden Administration

The Census Bureau is running out of time to overcome these obstacles, and if it doesn’t, the backlash will be a mess that the Biden administration will have to clean up.

The partisan directives President Trump and his appointees imposed on the census — like the Trump policy to exclude undocumented immigrants from the apportionment count or his order for the collection of citizenship data that states could use to cut non-citizens out of redistricting — will likely be quickly reversed. What Biden can do on differential privacy is a more difficult question.

“You’re at a stage where you’re on the runway and the engines are gearing up and you’re suddenly finding information that tells you: is the engine really good?” said Kim Brace, a redistricting guru. “How can you end up raising some of these issues when the plane’s about ready to take off?”

Nevertheless, public lobbying of the Biden transition has been coupled with private outreach to the beachhead team, which includes two well-regarded census experts who are said to be in information-gathering mode.

There are discussions of ramping up the pressure on Congress to engage as well. Beyond oversight of the program’s implementation, Congress could also pass tweaks to census privacy laws to clarify that the Bureau need not go so overboard on protecting confidentiality that it sacrifices the data’s usefulness. But there is skepticism that such a legislative fix could be done in time.

Without legislation, legal questions about those privacy obligations could quickly come into focus in court instead, if civil rights groups aren’t satisfied with how the Bureau has implemented the program and sue over its use.

“These are legal problems,” Thomas Saenz, the president of the Mexican American Legal Defense and Educational Fund, told TPM. “Because if you have a clash between Voting Rights Act compliance and what the Census Bureau would describe as compliance with Title 13 and confidentiality provisions, it is that kind of statutory conflict that ends up having to go to court to be resolved.”
