|
White Paper & Bio
Coming from a theoretical computer science background, my interest
is in data analyses that provably do not reveal private information
of individuals. One would like to give such privacy guarantees even
when the "attacker" may have access to some prior knowledge about
individuals, possibly by access to a different database also containing
the individual under attack; often it may be easy to deduce private
information of individuals from two sanitized versions each of which
hides the individual's information on its own. "Diferential privacy"
is one such definition, that is robust against any prior knowledge
that the attacker might have and against composition. Specifically,
an attacker dealing with a differential privacy mechanism cannot distinguish
the case when a particular individual is in the database, from the
case when this individual is removed from the database. Somewhat surprisingly,
even with such a strong definition of privacy, one can design privacy
mechanisms satisfying the definition that give very strong utility
guarantees. Several useful queries--means, variances, correlations,
etc.--can be answered very accurately, in fact with error much smaller
than the sampling error, while guaranteeing differential privacy.
The only constraint is that the number of queries be not too large.
While it is useful to think of this approach as an interactive one
where a curator accepts queries and returns approximate answers to
them, this is by no means inherent in the approach. In recent joint
work with Barak, Chaudhuri, Dwork, Kale and McSherry, we showed how
consistent contingency tables can be published in a privacy preserving
manner while giving strong guarantees on accuracy. As mentioned above,
the curator cannot answer a large number of queries accurately. This
limitation is in fact inherent. Any mechanism that gives approximately
accurate answer to a large number of queries can be attacked in such
a way that the attacker can reconstruct essentially the whole database.
This strong impossibility result argues that one must reason about
privacy mechanisms in an interactive setting: any non-interactive
mechanism that answers a large number of questions with reasonable
accuracy must be inherently non-private. Thus one can get significantly
better accuracy guarantees if one answers queries interactively, or
as in the contingency table setting, knows the set of desired queries
in advance and worries only about answering them accurately. While
we now know how to handle a large number of common data analyses in
a privacy preserving manner, much work remains to be done. Understanding
the full power of differential privacy mechanisms is a major open
question, and progress on it would involve both better mechanisms
and stronger lower bounds. Bringing the known techniques to applications
is another very fruitful direction of further work. Moreover, there
are settings, such as social networks, where more definitional work
still needs to be done.
|
|
Kunal Talwar
Microsoft
Research
|
|
|
Biographical Data
Kunal Talwar received his Ph.D. from the University of California
at Berkeley working in the areas of approximation algorithms, metric
embeddings and algorithmic game theory. He currently works as a
researcher at Microsoft Research's Silicon Valley Campus in Mountain
View, CA, where he works on randomized and approximation algorithms,
privacy and metric embeddings. He has recently worked on provably
private, and accurate release of consistent contingency tables (joint
work with Barak, Chaudhuri, Dwork, Kale and McSherry), proving inherent
limitations of non-interactive privacy mechanisms (joint work with
Dwork and Mcsherry), and on new privacy mechanisms and their connections
to game theory and to machine learning (joint work with McSherry).
|
|