Data Confidentiality Workshop
Home Workshop Agenda Participants Travel Information

 

Contact

 


WORKSHOP ON DATA CONFIDENTIALITY

September 6-7, 2007 in Arlington, VA

White Paper & Bio


Data privacy has become a fundamental problem of the modern information infrastructure. Increasing volumes of information are collected and archived by health networks, financial organizations, search engines, intrusion detection systems, social networking systems, retailers and other enterprizes. The potential social benefits of analyzing these databases are enormous. An important problem is to learn the properties of the databases as a whole while protecting the privacy of individual contributors. There is a vast body of work on the database privacy problem, both in statistics and computer science. Until recently, there were two nearly disjoint fields studying the database privacy problem: ``statistical disclosure limitation'', initiated by the statistics community in 1960s, and what is now called ``privacy-preserving data mining'' which was active in the database community during the 1980's and was rekindled at the turn of the 21st century by researchers in data mining. In 2003, the database privacy problem caught attention of cryptographers. The resulting line of work acquired the name ``private data analysis''. Private data analysis advocates formal privacy guarantees. Each proposed scheme for releasing information should come with a clear statement of what it achieves not only in terms of utility, but also in terms of privacy. Most currently utilized schemes either have no formal privacy guarantees or ensure security only against a specific suite of attacks. This leaves them potentially vulnerable to unforeseen attacks, and makes it difficult to compare different schemes because each of them is solving a different problem. For example, one commonly overlooked issue is that people reading the released statistics might have access to side information, that is, information that comes from other sources. In the light of the ubiquity of cheap collections of marketing data and other personal information, ignoring this issue is dangerous. Another problem is that privacy guarantees are most commonly stated as conditions on the output ( i.e., at least 3 individuals in the table should have the same attributes), and do not take the procedures that were used to obtain the output into account. For example, the fact the output is this particular L-diverse table, and not another, might leak information. Private data analysis focuses on pinning down precise mathematical definitions of database privacy, that provide meaningful privacy in realistic settings. The notion of differential privacy that emerged from this line of work makes assumptions neither about what kind of attack might be perpetrated against the released statistics nor about what side information the attacker might possess. It limits the incremental information the attacker might learn in addition to what he knew before seeing the released statistics. Several works proposed frameworks for computing and releasing information about a database while provably satisfying differential privacy. One of the major challenges is to design automatable protocols that release a variety of statics about a database while provably preserving privacy. Such protocols would relieve the Census Bureau from having to manually do a disclosure review on each product, and enable organizations that collect enormous amounts of data, such as search engines, to release global information about their data without violating confidentiality. The first step towards that goal is to give algorithmic formulations to as many statistical tasks as possible. This workshop is a wonderful opportunity for researchers from different communities to exchange ideas and work towards a common understanding of the problem. Two challenges that I would like to see addressed are meaningful definitions of privacy and automatable protocols that satisfy them.

Sofya Raskhodnikova

Penn State University

 

 

Biographical Data

Sofya Raskhodnikova is an assistant professor of Computer Science and Engineering at Penn State University. She received Ph.D. in Computer Science and Electrical Engineering from MIT in 2003. Her main area of research is randomized and approximation algorithms and, in particular, algorithms for massive data sets. Together with Kobbi Nissim and Adam Smith, she is a co-author of Smooth Sensitivity and Sampling in Private Data Analysis, Proceedings of the 39th ACM STOC, 75--84, 2007.