Data Confidentiality Workshop
Home Workshop Agenda Participants Travel Information

 

Contact

 


WORKSHOP ON DATA CONFIDENTIALITY

September 6-7, 2007 in Arlington, VA

White Paper & Bio


Coming from a theoretical computer science background, my interest is in data analyses that provably do not reveal private information of individuals. One would like to give such privacy guarantees even when the "attacker" may have access to some prior knowledge about individuals, possibly by access to a different database also containing the individual under attack; often it may be easy to deduce private information of individuals from two sanitized versions each of which hides the individual's information on its own. "Diferential privacy" is one such definition, that is robust against any prior knowledge that the attacker might have and against composition. Specifically, an attacker dealing with a differential privacy mechanism cannot distinguish the case when a particular individual is in the database, from the case when this individual is removed from the database. Somewhat surprisingly, even with such a strong definition of privacy, one can design privacy mechanisms satisfying the definition that give very strong utility guarantees. Several useful queries--means, variances, correlations, etc.--can be answered very accurately, in fact with error much smaller than the sampling error, while guaranteeing differential privacy. The only constraint is that the number of queries be not too large. While it is useful to think of this approach as an interactive one where a curator accepts queries and returns approximate answers to them, this is by no means inherent in the approach. In recent joint work with Barak, Chaudhuri, Dwork, Kale and McSherry, we showed how consistent contingency tables can be published in a privacy preserving manner while giving strong guarantees on accuracy. As mentioned above, the curator cannot answer a large number of queries accurately. This limitation is in fact inherent. Any mechanism that gives approximately accurate answer to a large number of queries can be attacked in such a way that the attacker can reconstruct essentially the whole database. This strong impossibility result argues that one must reason about privacy mechanisms in an interactive setting: any non-interactive mechanism that answers a large number of questions with reasonable accuracy must be inherently non-private. Thus one can get significantly better accuracy guarantees if one answers queries interactively, or as in the contingency table setting, knows the set of desired queries in advance and worries only about answering them accurately. While we now know how to handle a large number of common data analyses in a privacy preserving manner, much work remains to be done. Understanding the full power of differential privacy mechanisms is a major open question, and progress on it would involve both better mechanisms and stronger lower bounds. Bringing the known techniques to applications is another very fruitful direction of further work. Moreover, there are settings, such as social networks, where more definitional work still needs to be done.

Kunal Talwar

Microsoft Research

 

 

Biographical Data

Kunal Talwar received his Ph.D. from the University of California at Berkeley working in the areas of approximation algorithms, metric embeddings and algorithmic game theory. He currently works as a researcher at Microsoft Research's Silicon Valley Campus in Mountain View, CA, where he works on randomized and approximation algorithms, privacy and metric embeddings. He has recently worked on provably private, and accurate release of consistent contingency tables (joint work with Barak, Chaudhuri, Dwork, Kale and McSherry), proving inherent limitations of non-interactive privacy mechanisms (joint work with Dwork and Mcsherry), and on new privacy mechanisms and their connections to game theory and to machine learning (joint work with McSherry).