Data Confidentiality Workshop


WORKSHOP ON DATA CONFIDENTIALITY

September 6-7, 2007 in Arlington, VA

White Paper & Bio


There is a long history in the United States of distinguishing between routine governmental surveillance activities and those that require either a court order or “probable cause.” General public acceptance of certain practices as routine rests, at least in part, on the assumption that the gathered information is put to very specific and limited use. The nearly universal acceptance of relatively invasive airport screening procedures is due primarily to the great danger from hijackings. However, I believe that acceptance also rests on the expectation that information about what specific passengers carry on or pack is discarded once the material is deemed safe for travel.

Technological advances now make it much easier to combine arguably innocent bits of information into an ominous whole. If I visit my local library, a drug store across town, and then a park three towns away, I would have no expectation of anonymity for any of those visits. However, I would not expect anyone to know about all three. Yet the increasing prevalence of high-resolution cameras in public places (think Great Britain), advances in face recognition software, and plummeting costs of computer hardware may soon allow law enforcement to follow us automatically. Combined with data sources like E-ZPass and some of the ubiquitous privately collected databases, one could imagine governmental authorities fishing for people judged likely to engage in troubling activities like building bombs, dealing illegal drugs, using illegal drugs, gambling on sports, cross-dressing, eating too much junk food, attending church X (stop me when I’ve crossed the line).

I propose two research questions:

1. How should we assess the risk that arises from combining otherwise non-invasive bits of information, given the potential to discover a multitude of unanticipated relationships? I contrast this question with the more common one that addresses the disclosure risk of otherwise unidentified data through record linkage methods. In this case, individuals might already be identified, but the increased risk comes from the potential to derive new information from the combination of existing fields.

2. How can technology facilitate legitimate uses of databases while limiting the risk of either intentional or inadvertent discoveries that breach publicly accepted boundaries? Government certainly needs to be able to “connect the dots” to discover certain suspicious patterns, but too much data in one place raises serious civil liberties concerns (witness Adm. Poindexter’s Total Information Awareness program).

Dr. Robert Bell

AT&T Labs


Biographical Data

Dr. Robert Bell has been a member of the Statistics Research Department at AT&T Labs-Research since 1998. He previously worked at RAND doing public policy analysis. His current research interests include machine learning methods, analysis of data from complex samples, and record linkage methods. He has served on several National Research Council panels advising the Census Bureau and chairs a current panel on coverage measurement for the 2010 census. He is a member of the board of the National Institute of Statistical Sciences and a fellow of the American Statistical Association.