White Paper & Bio
Microdata X should be of high quality (Winkler 2004 Info. Sys) if
they are used in modeling and analyses. Due to confidentiality concerns,
public-use microdata X1 needs to minimize the chance of re-identification
while still supporting, at least approximately, one or two sets of analyses
(models) that are possible with the original, confidential microdata X. Some authors
(Palley & Simonoff 1987 TDBS; Lambert 1993 JOS; Fienberg 1997
CNSTAT) have demonstrated that some re-identification can occur based
only on analytic properties (even with synthetic data generated from
accurate models M on original microdata X). Other authors (Mera 1998;
Moore & Lee 1998 JAIR; DuMouchel et al. 1999 KDD) have demonstrated
(sometimes approximately) that if there are sufficient analytic constraints
on microdata X1, then the microdata X1 must be nearly identical to
original microdata X.
Once one has ensured that the released microdata X1 has one or two valid
analytic properties, one can evaluate disclosure risk by attempting
re-identification using a variety of analytic and record linkage techniques
(Yancey, Winkler, & Creecy 2002; Evfimievski 2004).
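As a rough illustration of such a re-identification attempt, the sketch below links a released file to an external, identified file by exact agreement on a set of quasi-identifiers and counts the combinations that are unique in both files. This is a deliberately simple stand-in, not the probabilistic record linkage of the cited papers; the function name `reidentify`, the field names, and the toy records are all hypothetical.

```python
from collections import defaultdict


def reidentify(released, external, keys):
    """Link records that agree exactly on `keys` (hypothetical helper).

    Returns {released_index: external_index} for key combinations that
    occur exactly once in each file.
    """
    # Index both files by their quasi-identifier combination.
    ext_index = defaultdict(list)
    for j, rec in enumerate(external):
        ext_index[tuple(rec[k] for k in keys)].append(j)
    rel_index = defaultdict(list)
    for i, rec in enumerate(released):
        rel_index[tuple(rec[k] for k in keys)].append(i)

    matches = {}
    for combo, rel_ids in rel_index.items():
        ext_ids = ext_index.get(combo, [])
        # A combination unique in both files is a candidate re-identification.
        if len(rel_ids) == 1 and len(ext_ids) == 1:
            matches[rel_ids[0]] = ext_ids[0]
    return matches


# Toy data (assumed for illustration only).
released = [
    {"age": "30-39", "sex": "F", "zip3": "207"},
    {"age": "30-39", "sex": "M", "zip3": "207"},
    {"age": "30-39", "sex": "M", "zip3": "207"},
    {"age": "60-69", "sex": "F", "zip3": "208"},
]
external = [
    {"name": "A", "age": "30-39", "sex": "F", "zip3": "207"},
    {"name": "B", "age": "60-69", "sex": "F", "zip3": "208"},
    {"name": "C", "age": "30-39", "sex": "M", "zip3": "207"},
]

matches = reidentify(released, external, ["age", "sex", "zip3"])
# Share of released records with a unique candidate match.
risk = len(matches) / len(released)
```

The share of uniquely linked records is only a crude risk indicator; a realistic assessment would also allow for typographical error and partial agreement between the files.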
The first issue is: How does one create a modeling framework and software
that can be used on a variety of microdata X to assure that certain
analytic properties are satisfied and can be used to verify the analytic
validity of masked, public-use microdata X1? For discrete data, Winkler
(2007a) has created an edit/imputation/modeling framework that allows
altering models/data so that the models are approximately preserved while
additional constraints are satisfied. The new methods pull together and
enhance earlier work (Winkler 1990 Ann. Prob., 1993, 1997, 2003, 2006;
Meng & Rubin 1993 Biometrika; Little & Rubin 2002; D’Orazio, Di Zio
& Scanu 2006 JOS).
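One simple way to picture "verifying analytic validity" for discrete data is to compare low-order marginal tables of the original and masked files. The sketch below is not the edit/imputation/modeling framework referred to above; it only assumes that the analytic property of interest is the set of two-way margins (the sufficient statistics of a loglinear model with all two-factor interactions). The function names and toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_margin(records, a, b):
    """Relative frequencies of the two-way table for variables a and b."""
    counts = Counter((r[a], r[b]) for r in records)
    n = len(records)
    return {cell: c / n for cell, c in counts.items()}


def max_margin_distance(original, masked, variables):
    """Largest total-variation distance over all two-way margins."""
    worst = 0.0
    for a, b in combinations(variables, 2):
        p = pair_margin(original, a, b)
        q = pair_margin(masked, a, b)
        cells = set(p) | set(q)
        tv = 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)
        worst = max(worst, tv)
    return worst


# Toy data (assumed for illustration only).
original = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]
# Same as `original` except that the second record has b changed from 1 to 0.
masked = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]

distance = max_margin_distance(original, masked, ["a", "b", "c"])
```

A distance near zero indicates that the masked file reproduces the chosen margins; it says nothing about higher-order interactions or about re-identification risk, which must be assessed separately.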
The second issue is: For analytically valid public-use microdata X1,
how does one alter the microdata X1 to produce microdata X2 where
X2 has significantly reduced risk of re-identification and allows
nearly the same modeling (analytic properties) as X1 (or X)? For discrete
data, Winkler (2007b) shows how to create models M2 (and generate
synthetic microdata X2 from them) that approximate models from original
microdata X while reducing the risk of re-identification.
William E. Winkler
U.S. Census Bureau
Biographical Data
B.S. Mathematics, Phi Beta Kappa
Ph.D. Probability Theory
Fellow, American Statistical Association
Principal Researcher, U.S. Census Bureau
Expertise: Record Linkage, Edit/Imputation,
Multi-way and Multi-purpose Sampling, and Microdata Confidentiality
Author or co-author of 130+ papers and of the book Data Quality and
Record Linkage Techniques (2007, with T. Herzog & F. Scheuren)
Author or co-author of more than 12 generalized
computer systems, some of which are used for production in the largest
survey situations