Data Confidentiality Workshop
WORKSHOP ON DATA CONFIDENTIALITY

September 6-7, 2007 in Arlington, VA

White Paper & Bio


In early July, I purchased an iPhone and went online to activate my AT&T wireless account. I had to provide my Social Security number in order to enter the information (there was no alternative), and after a short delay my account was approved and my iPhone activated. I presumed my Social Security number was used as part of a credit check via a major data warehouse. A few days later I tried to activate international roaming by telephone for a trip to the UK, and through a series of questions and answers it became apparent that my credit report was filled with information about someone else: the questions I was answering incorrectly related to former residences in Ohio. Fixing these errors appears to be impossible. Others can tell similar stories with far more disastrous consequences.


In the UK I heard a lecture by Hans Rosling, of the Karolinska Institute, on how we are moving from closely held microdata to web-accessible data through such tools as Swivel, Mapping Worlds, Many Eyes (IBM), and Trendalyzer (Google), a list that features several of the co-sponsors of this workshop. Rosling envisions, as do many companies, a tremendous new information frontier driven by large integrated shared databases.


These two stories lead me to the challenges that large databases pose for our research communities, especially databases assembled by the private sector, which often include government-collected data, often gathered under pledges of confidentiality. There are at least three interlocking components:


• As individual data are selected for release, as in my credit report, and extracts are shared across companies and with the Department of Homeland Security, what guarantees do we have on the preservation of privacy and confidentiality, if any? What is the technical basis of any such guarantees? [Remember that I was in effect forced to surrender my Social Security number to activate the AT&T account, even though the law nominally precludes such use.]
• In order to achieve the goal of large integrated databases we need to merge data from disparate sources, using record linkage methods. Much rests on the accuracy of the individual data components, on the quality of the matching, and on the “resolution” of discrepancies due to measurement and other forms of error. What methods are used for such record linkage? What are their formal properties? And what are the implications for other people's use of the data?
• How useful are the merged integrated databases? Do we have correct methods for their analyses, especially in light of the measurement error and confidentiality protection techniques (e.g., addition of noise or other forms of perturbation) that may have been applied?


If we can address some of these challenges with new and creative research, scaling up to the giant databases that now exist and are envisioned for our future, we are still left with the question of how to communicate to users and the public what we have done and how their private lives are protected and enhanced.

Stephen E. Fienberg

Carnegie Mellon University

 

 

Biographical Data

Stephen E. Fienberg is Maurice Falk University Professor of Statistics and Social Science at Carnegie Mellon University, with appointments in the Department of Statistics, the Machine Learning Department, and Cylab (a Center for Computer and Communications Security). He was founding co-editor of Chance and served as the Coordinating and Applications Editor of the Journal of the American Statistical Association. He is one of the founding editors of the Annals of Applied Statistics, whose first issue appeared this summer, and a founder of the new electronic Journal of Privacy and Confidentiality. His research includes the development of statistical methods, especially tools for categorical data analysis and data mining, confidentiality and disclosure limitation, and research data access, as well as applications in a broad array of areas. He is a member of the U.S. National Academy of Sciences, a fellow of the American Academy of Arts and Sciences, a fellow of the Royal Society of Canada, and Thorsten Sellin Fellow of the American Academy of Political and Social Science. He has been serving as a member of the National Research Council Committee on Technical and Policy Dimensions of Information for Terrorism Prevention and Other National Goals.