Potential Privacy Lapse Found In Americans’ 2010 Census Data

FILE - This March 23, 2018, file photo shows an envelope containing a 2018 census letter mailed to a U.S. resident as part of the nation's only test run of the 2020 Census. The Supreme Court will decide whether the 2020 census can include a question about citizenship that could affect the allocation of seats in the House of Representatives and the distribution of billions of dollars in federal money. (AP Photo/Michelle R. Smith, File)

WASHINGTON (AP) — An internal team at the Census Bureau found that basic personal information collected from more than 100 million Americans during the 2010 head count could be reconstructed from encrypted data, but with lots of mistakes, a top agency official disclosed Saturday.

The age, gender, location, race and ethnicity for 138 million people were potentially vulnerable. So far, however, only internal hacking teams have discovered such details at possible risk, and no outside groups are known to have grabbed data intended to remain private for 72 years, chief scientist John Abowd told a scientific conference.

The Census Bureau is now scrapping its old data shielding technique for a state-of-the-art method that Abowd claimed is far better than Google’s or Apple’s.

Some former agency chiefs fear the potential privacy problem will add to worries that people will avoid answering, or lie on, the once-a-decade survey because of the Trump administration’s attempt to add a much-debated citizenship question.

The Supreme Court on Friday announced that it would rule on that proposed question, which has been criticized for being political and not properly tested in the field. The census count is hugely important, helping with the allocation of seats in the House of Representatives and distribution of billions of dollars in federal money.

The 8 billion pieces of statistics in census data are supposed to be jumbled so that what is released publicly for research cannot identify individuals for more than seven decades. In 2010, the Census Bureau did this by swapping similar household information from one city to another, according to Duke University statistics professor Jerome Reiter.

In the internal tests, Abowd said, officials were able to match 45 percent of the people who answered the 2010 census with information from public and commercial data sets such as Facebook. But errors in this technique meant that only data for 52 million people would be completely correct, little more than 1-in-6 of the U.S. population.
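The figures in this passage can be checked against each other with a little arithmetic. The sketch below is illustrative only: the 2010 resident population of 308,745,538 is the official count but is not stated in the article, and the function names are hypothetical.

```python
# Assumption: official 2010 census resident population (not given in the article).
US_POPULATION_2010 = 308_745_538


def matched_people(match_rate: float, population: int = US_POPULATION_2010) -> float:
    """People matched to outside data sets at the given match rate."""
    return match_rate * population


def one_in_n(people: float, population: int = US_POPULATION_2010) -> float:
    """Express a head count as '1 in N' of the population."""
    return population / people


if __name__ == "__main__":
    # 45 percent matched, per the article.
    print(f"matched: about {matched_people(0.45) / 1e6:.0f} million people")
    # 52 million completely correct, per the article.
    print(f"completely correct: about 1 in {one_in_n(52_000_000):.1f}")
```

Under that population assumption, a 45 percent match rate lands close to the 138 million people the article cites, and 52 million is a little more than one-sixth of the country.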

He said the 2010 census used the best possible privacy protection available, but hackers since then have become more skilled in reconstructing data. To counter their growing abilities, the agency has completely changed the system for 2020 and will offer the “gold standard” of privacy regardless of the fate of the citizenship question, Abowd said.

“We got ahead of it. That was our goal,” Abowd said at the American Association for the Advancement of Science’s annual meeting.

Georgetown University provost Robert Groves, who headed the 2010 census, said the count had the proper privacy and that every census improves. He lauded the new steps.

Former agency chief Kenneth Prewitt, a professor of policy at Columbia University, said the basic information such as age and ethnicity, even if publicly revealed, isn’t as big a deal as other data breaches.

“There is a widespread privacy anxiety out there that is very much related to Facebook and Google and so forth,” Prewitt said. “I’m much more worried about the fact that my iPhone follows me around every day” and that Apple sells that information to companies.

The new system involves complex mathematical algorithms that inject “noise” into the data, making it harder to get accurate information and providing “a very strong guarantee” of privacy, said Duke University computer sciences professor Ashwin Machanavajjhala.

This increases privacy while lowering the accuracy for researchers who use the statistics. Think of it as one set of knobs being dialed up while a second is dialed down at the same time.
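The article does not say which algorithm the bureau uses. As a rough illustration of the trade-off described above, the sketch below uses the textbook Laplace mechanism, a swapped-in stand-in and not the Census Bureau's actual method. The function names and the `epsilon` privacy parameter are assumptions made for this example; a smaller `epsilon` means more noise, so more privacy and less accuracy.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as a difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with noise added.

    Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count: int, epsilon: float,
                   trials: int = 10_000, seed: int = 0) -> float:
    """Average distance between the released count and the true count."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    # Hypothetical block with 1,000 residents, released at two privacy settings.
    for eps in (0.1, 1.0):
        print(f"epsilon={eps}: mean error {mean_abs_error(1000, eps):.2f}")
```

In this toy setup the stricter setting (`epsilon=0.1`) produces counts that are off by about 10 people on average, while the looser one (`epsilon=1.0`) is off by about 1. That is the dial policy officials would be choosing.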

The decision on the official privacy/accuracy setting for 2020 hasn’t been set. Abowd said policy officials, not engineers or scientists, will make that call.

The Census Bureau tried this system in a 2018 survey using an ultra-strict privacy setting that, while not directly comparable to Google or Apple, is hundreds if not thousands of times more secure for privacy than what’s now being used on data from searches using Google Chrome or Apple’s iPhone, Duke’s Reiter said.

Prewitt suggested the public might not understand the extra efforts underway for the 2020 count but would be spooked by the disclosure about the privacy vulnerability, making people more reluctant to comply with the next census.

If the administration succeeds in adding the citizenship question, “there will be a huge evasion of it (the census) and some selective misuse of it,” Prewitt said.

Whether some avoid the survey because of it or lie, neither is a good outcome, making the data less usable, Prewitt said.

Groves said technical experts have serious problems with the citizenship question because it hasn’t been tested in the field, as all census questions usually are. He compared it to putting a new drug on the market before the necessary testing.

“Very subtle wording and positional changes in a thing like the Census can have enormous impact way beyond what we as humans can predict,” Groves said.
___

Follow Seth Borenstein on Twitter: @borenbears
___

The Associated Press Health & Science Department receives support from the Howard Hughes Medical Institute’s Department of Science Education. The AP is solely responsible for all content.


Notable Replies

  1. No worries … it’ll be okee-dokee … now …

    Are you now … or have you ever been …
    in the US-of-A while being brown? …

  2. Ha!! Dodged another assault on my info - somehow I never got counted in 2010 - nothing in the mail, no one coming to the door, nada. So they missed an old white guy, regular voter, driver’s license holder, still working in those days at 70. At the time I thought, what the hell is wrong with these people; now I don’t feel so bad about it.

  3. I do a lot of genealogical research, so I look at a lot of old census forms. There are a lot of mistakes on those forms. People’s names are misspelled, their ages are often wrong, and even the place they or their parents were born is wrong pretty often too. I think part of the mistakes are because a census taker used to come to people’s homes and ask the questions and maybe the person who answered didn’t really know the answers to some of the questions about the other people or maybe the census taker didn’t understand what the person said. These mistakes are less likely to happen with the current census forms because they can be filled out over time by all household occupants, but I would suspect that people might intentionally lie if they felt their privacy was at risk or if they were worried about immigration status. When a person can’t trust the government, they will lie to protect themselves.

  4. tpr says:

    Something like this is a key element in the GOP plan to weaponize the census.

    Remember that the reason for adding a citizenship question is to scare minorities and non-citizens away from responding to the census, primarily over fears that they’ll have trouble with INS.

    To give weight to that threat, the GOP and its wealthy owners have a strong interest in introducing evidence into the public consciousness showing that those data-privacy fears have merit.

    • They will search high and low for evidence of abuses that already occur, and publicize them.
    • They’ll deliberately encourage such abuses in the future.
    • As the census approaches, they will suddenly make a lot of noisy attempts to access past census data in ways that harm respondents. It’s unimportant whether any of those attempts succeed because the message will be clear: responding to the census irrevocably exposes non-whites to potential harm.

    Expect to see at least one non-citizen get deported or jailed based on their past census responses, prior to the 2020 census. Even better (for census suppression) would be a citizen with a “foreign-sounding name” being wrongly identified and having to endure some unpleasant hardship until the error is corrected. If it hasn’t already happened, they will make it happen.

  5. Karma.

    So, good luck, Republicans, and don’t worry.

    Everything’s just been going your way lately.
