First, a rant:
A supermarket I regularly shop at recently added beer and wine to
its selection. When I went to purchase some beer, I was asked to
present my license as proof of age. The clerk took the identification
and entered my birth date into the register. This is one of the worst
cases of "collect more data" I have ever encountered: no one asked
for consent before adding data entry to a routine social interaction.
In a subsequent transaction, I requested that my birth date not be
entered, and as I suspected, the transaction could not be completed
without the data entry. The best I could achieve was for the clerk
to volunteer to enter false data, which she did with some zeal.
That experience touched on a number of my interests:
Fair information practices
Social interactions involving privacy
Proliferation of data
It also prepared the ground for a few more:
Techniques that compensate for false data
Checking a certain company’s responsiveness to privacy complaints
To continue the saga, the customer service department assured me
that the data is entered but not captured (as evidenced by having to
re-enter it each time). Should I trust them? More importantly, why
should I have to trust them? What happens if the policy changes? I
wouldn’t be able to tell the difference. If ownership changes, will
the new owners honor the same policy? Is there any benefit to the data
entry? If it is meant to guarantee the social interaction (she really
did check my age!), does it really help? … It seems more likely that
the problem of not really looking at the ID isn’t fixed in the long
term (eventually you stop really looking and just enter the data).
In short, the intent seems reasonable, if probably flawed, and the
implementation was seriously flawed.
Professional interests:
Since a good chunk of my job goes toward making the Census Bureau’s
microdata publications safe, the proliferation of data is a major
concern. My current research project is secure regression analysis.
Can statistical modeling be done without examination of record-level
data, without running into differencing problems, without disclosing
low-level counts, and with synthetic diagnostic information (e.g.,
residuals)? This is partly software development, partly understanding
the basis of statistical perception (what really constitutes “playing
with the data”?), and partly developing a good set of data protection rules.
Philip Steel
Disclosure Avoidance Group
Statistical Research Division
Census Bureau