Data Confidentiality Workshop

WORKSHOP ON DATA CONFIDENTIALITY

September 6-7, 2007 in Arlington, VA

White Paper & Bio


 

Driven by a changing mix of public, political, media, special interest, and moral beliefs, the issue of data confidentiality for federal statistical agencies has changed over the past 20 years. It has evolved from an inexpensive, practical, ethical, response-motivating concept, one that required minimal added precautions to ensure that the paper data stream was protected (a few locking file cabinets, some employee consciousness-raising, and some laws to put force behind the promise), into an expensive, fundamental business requirement that deals with constant and serious business and legal risks to agency operations.

The technical environment has made confidentiality harder to protect and easier to break. In addition, the potential seriousness of breaches of confidentiality has raised the stakes of even what were once minor or rare potential breaches. Changes in the managerial and technical processes for collecting and handling data have opened the door to brand new unintended consequences that could allow breaches to happen in unforeseen ways. Finally, new tracking and reporting requirements concerning the patency of the data stream have caused a recent rise in the number of potential breaches on record, simply because more of them are now reported.

Prior to the ubiquitous information technology revolution and the relentless exponential increase in computing power, most agencies saw confidentiality as a quid pro quo for collecting respondent data and encouraging respondent participation. As the technology-driven information economy began to find uses for marketing and statistical information in business, and governments began to need and use more and more timely data to set and evaluate policies, a somewhat schizophrenic American public became ever more sensitive to the issues and consequences of confidentiality practices among dataset creators and users: willing to give intimate personal and financial information to commercial interests just for the asking, yet deeply suspicious of giving the same to a government that might (and now could) use new technology to assemble detailed dossiers from the many pieces of administrative and survey data collected for specific operational purposes.

Humans are notoriously bad at weighing risk, and the common-sense belief that data were more dangerous in the government’s hands than in the hands of commercial interests has turned out to be backwards, largely because of differences in data sharing practices and concern for data security, and an increase in criminal use of commercial data. Since the craft of ID theft embraced computer and Internet technology, the danger from commercial use of critical information has continued to mushroom. The fear of an abusive government (largely demonstrated abroad, and a particular phobia of the U.S. since its violent independence from Britain) has seen some validation in wartime and intelligence gathering activities that have been revealed despite their deep secret classification. Still, the main current concern is the inadvertent loss of commercial and government data that could allow individuals to be identified and perhaps subjected to all manner of mistreatment or criminal victimization.

These latest attitudinal forces have galvanized the political system to require more and more efforts to prevent data loss and re-identification, and bigger and bigger reductions in data availability and access.

Now, all government agencies, including the statistical and research agencies, are required to file a report within one hour of suspecting an identifiable-data loss. This has brought new light to parts of data streams where losses (which increase the risk of breaches) may occur. Good examples are losses of large volumes of easily accessible data (on CD-ROM media) by third-party transportation providers. Internal data handling procedures have also been tested and weaknesses found. Field data collection has continued to suffer from the traditional losses that come with using a somewhat transient workforce. Assignment sheets and follow-up lists with survey identifiers on them have been lost, tossed, and destroyed, intentionally and unintentionally. Now that laptops are in widespread use, some have been lost or stolen with small and large amounts of identifiable data on them.

In turn, methods have been changed to address these weaknesses, and the new methods may turn out to have their own, different vulnerabilities. Eliminating paper by, for example, transmitting data over the internet in encrypted form to secure servers may eliminate losses in the mail; however, the data streams are now subject to more and more sophisticated interception and decryption programs. Unlike with missing paper, which gets noticed, it is now possible that you might not realize that the data have been compromised until they are used in a way sufficiently egregious to attract real attention.

The changes in technology and methods come with real benefits and real costs. Unlikely to be able to simply adopt existing solutions from the intelligence and military world, civilian government data collectors and users are having to scramble to invent and institute their own solutions simply to stay in business and to maximize their effectiveness as statistical agencies. Some methods yield legal protection but may not actually prevent all re-identification. Data swapping methods can have this problem because, for optimal usefulness, only small percentages of data elements are swapped. This allows one to say that there is no way anyone could be 100 percent sure that an apparent re-identification was real. Yet a simple criminal test using identity theft methods might prove that it was indeed a real breach. Should the value of re-identification increase, computers may soon filter swapped data sets to separate real from swapped re-identifications.
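
The swapping idea described above can be illustrated with a minimal sketch. It is not any agency's actual procedure: the function name `swap_attribute`, the record layout, the 10 percent swap rate, and the choice to pair records purely at random are all assumptions made for illustration. Production swapping schemes typically pair records that are similar on selected characteristics, which this sketch does not attempt.

```python
import random


def swap_attribute(records, field, fraction, seed=0):
    """Swap the value of `field` between randomly paired records.

    Hypothetical helper for illustration only. Roughly `fraction` of the
    records are selected (rounded down to an even count), paired off, and
    each pair exchanges its value of `field`.
    Returns the modified copy and the sorted indices that were touched.
    """
    rng = random.Random(seed)
    swapped = [dict(r) for r in records]  # copy so the input is left untouched
    n_selected = int(len(swapped) * fraction)
    n_selected -= n_selected % 2  # an even number is needed to form pairs
    chosen = rng.sample(range(len(swapped)), n_selected)
    for i, j in zip(chosen[0::2], chosen[1::2]):
        swapped[i][field], swapped[j][field] = swapped[j][field], swapped[i][field]
    return swapped, sorted(chosen)


# Toy data set (invented values): 100 records, each with a distinct ZIP code.
records = [{"id": i, "zip": f"{20000 + i}", "income": 1000 * i} for i in range(100)]
masked, touched = swap_attribute(records, "zip", fraction=0.10, seed=42)

unchanged = sum(1 for a, b in zip(records, masked) if a["zip"] == b["zip"])
print(f"{len(touched)} records swapped, {unchanged} left as collected")
```

In this toy setup, nine out of ten records keep their true value, so an intruder who matches on the swapped field is probably looking at a genuine record even though no single match can be called certain. That is the gap between legal protection and actual protection that the paragraph above points to.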

These relatively constant changes, and the adaptations to new technology or to newly found data loss and breach pathways, concern me the most. If we effectively guard against all of the minor losses, that will guarantee that only major losses will occur in the future. What can statistical and computer science practitioners do to make sure that lost data are useless in the wrong hands, or are destroyed before they can be misused if in the wrong hands?


Andrew White

NCES

 

 

Biographical Data

 

Andrew White is Special Assistant to the U.S. Commissioner of Education Statistics at the National Center for Education Statistics (NCES). Dr. White manages forward-looking, outside-expert task forces reviewing the Center’s data collection efforts, including maintaining confidentiality, improving the effectiveness and efficiency of major national longitudinal surveys, and the use of computer adaptive testing for student assessment in a survey context. He advises the commissioner on a broad range of federal statistical issues and practices and served as Deputy Director of Science for the IES in 2006, managing ongoing peer review and quality control for the Institute’s products, and establishing the Institute’s annual research conference.
Before coming to NCES in 2005, White served for a number of years as the deputy director and director of the Committee on National Statistics (CNSTAT) at the National Academies. He is a former executive staff member, research staff chief, and senior survey designer for the National Center for Health Statistics (NCHS). His career started before and during graduate school at the Census Bureau and the Michigan Department of Public Health. He holds a B.A. (political science) and an M.P.H. and Ph.D. in Biostatistics from the University of Michigan.
At CNSTAT, White oversaw and contributed to over 35 major science methods and policy studies commissioned by Congress, the Executive Office of the President, foundations, and various executive agencies. These multidisciplinary, collaborative studies covered many areas including: small-area estimates of school children, confidentiality and research data access, the census, performance measurement, welfare reform, disability measurement and evaluation, the economy, eliminating disparities, software engineering and survey automation, transportation, and homeland security.
At NCHS he managed and led a multidisciplinary research staff in statistical technology, data mapping, and cognitive aspects of data presentation. He directed the agency’s early research in computerized data access systems for access and analysis of NCHS data sets, engaged with academic experts to advance cognitive aspects of data presentation and mapping, and published the first Atlas of United States Mortality. He served as senior statistician for several national surveys including the National Health Interview Survey and the National Hispanic Health and Nutrition Examination Survey, and provided advice to other countries through the NCHS international statistics program.
Dr. White has written numerous articles and technical reports, lectured on a wide variety of statistical and survey related topics, taught in a variety of settings, refereed for prominent academic journals, served on dissertation committees, and consulted for industry and non-profit research organizations as well as state, federal, United Nations, and other international agencies.
He is a Fellow of the American Statistical Association and an elected member of the International Statistical Institute. White has served as president of the Washington Statistical Society and chair of the APHA Statistics Section, and holds many awards for accomplishments and contributions to various organizations. He has led and helped organize numerous professional conferences as well as multidisciplinary panels to conduct and review scientific work, served as principal investigator on NSF grants, and contributed to successful grant proposals to a variety of private foundations.