Driven by a changing mix of public, political, media, special interest,
and moral beliefs, the issue of data confidentiality for federal statistical
agencies has changed over the past 20 years. It has evolved from an inexpensive,
practical, ethical, response-motivating concept (which required minimal
added precautions to ensure that the paper data stream was protected:
a few locking file cabinets, some employee consciousness-raising,
and some laws to put force behind the promise) into an expensive
fundamental business requirement dealing with constant and serious
business and legal risks to agency operations.
The technical environment has made confidentiality harder to protect
and easier to break. In addition, the potential seriousness of breaches
of confidentiality has raised the stakes of even what were once minor
and/or rare potential breaches. Changes in the managerial and technical
processes for collecting and handling data have opened the door to
brand new unintended consequences that could allow breaches to happen
in unforeseen ways. Finally, new tracking and reporting requirements
concerning the patency of the data stream have caused a recent rise
in reported potential breaches simply by increasing reporting.
Prior to the ubiquitous information technology revolution and the
relentless exponential increase in computing power, most agencies
saw confidentiality as a quid pro quo for collecting respondent data
and encouraging respondent participation. As the technology-driven
information economy began to find uses for marketing and statistical
information in business, and governments began to need and use more
and more timely data to set and evaluate policies, a somewhat schizophrenic
American public has become ever more sensitive to the issues and consequences
of confidentiality practices among dataset creators and users: willing
to give intimate personal and financial information to commercial
interests just for the asking, yet deeply suspicious of giving the
same to a government that might (and now could) use new technology
to assemble detailed dossiers from the many pieces of administrative
and survey data collected for specific operational purposes.
Humans are notoriously bad at weighing risk, and the common-sense
belief that data were more dangerous in the government’s hands
than in the hands of commercial interests has turned out to be
backwards, largely due to differences in data sharing practices and
concern for data security, and an increase in criminal use of commercial
data. Since the craft of ID theft embraced computer and Internet technology,
the danger from commercial use of critical information has continued
to mushroom. The fear of an abusive government (largely demonstrated
abroad – and a particular phobia of the U.S. since its violent
independence from Britain) has seen some validation in some wartime
and intelligence gathering activities that have been revealed despite
their deep secret classification. But still the main current concern
is the inadvertent loss of commercial and government data that could
allow individuals to be identified and perhaps subjected to all manner
of mistreatment or criminal victimization.
These latest attitudinal forces have galvanized the political system
to require more and more efforts to prevent data loss and re-identification,
and bigger and bigger reductions in data availability and access.
Now, all government agencies, including the statistical and research
agencies, are required to file a report within one hour of suspecting
an identifiable-data loss. This has brought new light to parts of
data streams where losses (increasing the risk of breaches) may occur.
Good examples are losses of large volumes of easily accessible data
(on CD-ROM media) by third-party transportation providers. Internal
data handling procedures have also been tested and weaknesses found.
Field data collection has continued to suffer the traditional losses
that come from using a somewhat transient workforce. Assignment sheets or follow-up
lists with survey identifiers on them have been lost, tossed, and
destroyed intentionally and unintentionally. Now that laptops are
in widespread use, some have been lost or stolen with small and large
amounts of identifiable data on them.
In turn, methods have been changed to address these losses, and the new
methods may turn out to have their own, different vulnerabilities. Eliminating paper
by, for example, transmitting data over the internet in encrypted
form to secure servers may eliminate losses in the mail; however,
the data streams are now subject to more and more sophisticated interception
and decryption programs. Now, unlike with missing paper, it is
possible that you might not realize that the data have been compromised
until they are used in a way sufficiently egregious to attract real
attention.
The changes in technology and methods come with real benefits and
real costs. Unlikely to be able to simply adopt existing solutions
from the intelligence and military world, the civilian and non-military
government data collectors and users are having to scramble to invent
and institute their own solutions simply to stay in business and to
maximize their effectiveness as statistical agencies. Some methods
yield legal protection but may not actually prevent all re-identification.
Data swapping methods can have this problem because, for optimal usefulness,
only small percentages of data elements are swapped. This allows one to say
that no one could be 100 percent sure that an apparent
re-identification was real. Yet a simple criminal test using identity
theft methods might prove that it was indeed a real breach. Should
the value of re-identification increase, computers may soon be used to
filter swapped data sets for real versus swapped re-identifications.
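To make the swapping idea concrete, here is a minimal Python sketch, not any agency's actual procedure. It exchanges one attribute between randomly chosen pairs of records so that only a small fraction of records is altered. The function name swap_attribute, the record layout, and the 10 percent rate are all assumptions made for illustration; real disclosure-limitation methods typically choose swap partners more carefully, for example within matching geographic or demographic cells.

```python
import random


def swap_attribute(records, key, fraction, seed=0):
    """Return a copy of `records` in which the value stored under `key`
    has been exchanged between randomly chosen pairs of records.

    Roughly `fraction` of the records take part in a swap. The input
    list and its dictionaries are left unmodified.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]           # copy each record
    n_pairs = round(len(out) * fraction) // 2  # number of pairs to swap
    chosen = rng.sample(range(len(out)), 2 * n_pairs)  # distinct indices
    for i, j in zip(chosen[::2], chosen[1::2]):
        out[i][key], out[j][key] = out[j][key], out[i][key]
    return out


if __name__ == "__main__":
    # Hypothetical microdata: each record has an id and an income value.
    data = [{"id": i, "income": 1000 * i} for i in range(100)]
    released = swap_attribute(data, "income", fraction=0.10, seed=1)
    altered = sum(a["income"] != b["income"] for a, b in zip(data, released))
    print(f"{altered} of {len(data)} records carry a swapped income")
```

In this sketch roughly 90 percent of the released records are untouched. An intruder who links a released record to a person therefore cannot be certain the attribute is genuine, yet will usually be right, and an outside check of the kind described above could confirm the match.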
These relatively constant changes and adaptation to new technology
or to newly found data loss and breach pathways concern me the most.
If we effectively guard against all of the minor losses, that will
guarantee that only major losses will occur in the future. What can
statistical and computer science practitioners do to make sure that
lost data are useless in the wrong hands, or are destroyed before
they can be misused if in the wrong hands?