The National Center for Education Statistics (NCES) is the primary
federal entity for collecting, analyzing, and reporting data related
to education in the United States and other nations. NCES is congressionally
mandated to perform these functions under the Education Sciences Reform
Act of 2002 (ESRA). ESRA also includes the confidentiality provisions
that protect individually identifiable data about students, their
families, and their schools from disclosure. Any such data collected
by, or on behalf of, NCES are restricted by law to statistical uses,
and are immune from legal process or any other nonstatistical use,
without the consent of the individual involved, except in the case
of terrorism. A violation of this law is a Class E felony (punishable
by a fine of up to $250,000, a prison term of up to 5 years, or
both).
Like other Federal statistical agencies, NCES is tasked with balancing
the protection of the data provided in confidence against its
responsibility to report full and complete data on the condition of
education in the United States, and to make those data available to
the policy and research community. On the side of protecting the data,
NCES has a Disclosure Review Board that reviews all data files that
include any individually identifiable data prior to the release of
public or restricted use data files. No NCES micro record files (public-
or restricted-use) include any direct identifiers; where direct identifiers
exist, they are stored separately. All NCES data are protected by the introduction
of data perturbations (almost always swapping) prior to any analysis.
In addition, public use data are matched against extant related universe
data files, and various data coarsening techniques are employed to
protect against potential disclosures. Finally, when data are
made available through online data analysis systems, cell size limitations
are imposed, and the functionality of the analysis allowed is restricted.
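Two of the protections described above, swapping and cell size limitations,
can be illustrated with a minimal Python sketch. This is not the NCES
procedure: the function names, the swap rate, the random pairing of records,
and the minimum cell size are all assumptions made for illustration, and
production swapping is typically more targeted than the uniform random
pairing shown here.

```python
import random


def swap_values(records, key, rate, seed=0):
    """Return a copy of `records` in which roughly `rate` of the records
    have exchanged their `key` value with a randomly chosen partner.

    Hypothetical helper: the rate and the random pairing are assumptions.
    """
    rng = random.Random(seed)
    swapped = [dict(r) for r in records]        # leave the input untouched
    n_pairs = int(len(swapped) * rate) // 2     # each swap involves two records
    chosen = rng.sample(range(len(swapped)), 2 * n_pairs)
    for i, j in zip(chosen[::2], chosen[1::2]):
        swapped[i][key], swapped[j][key] = swapped[j][key], swapped[i][key]
    return swapped


def suppress_small_cells(counts, min_cell_size):
    """Replace any cell count below `min_cell_size` with None (suppressed).

    Hypothetical helper: the threshold is chosen by the caller.
    """
    return {cell: (n if n >= min_cell_size else None)
            for cell, n in counts.items()}


records = [{"id": i, "region": r}
           for i, r in enumerate(["N", "S", "E", "W", "N", "S"])]
perturbed = swap_values(records, "region", rate=0.5)
table = suppress_small_cells({"N": 2, "S": 2, "E": 1, "W": 1}, min_cell_size=2)
```

Because values are only exchanged between records, the overall distribution
of the swapped variable is unchanged, while any single record's value can no
longer be taken as certain.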
On the data access side, NCES has historically released public use
data whenever possible, and has used data analysis systems to give
the public access to restricted data that have no public use
version. One of the issues in providing such access to restricted
data is the amount of functionality that can be included in the analysis
system. A related issue that NCES is currently weighing is
whether to make all restricted NCES data available to the
public through a data analysis system (DAS). This would result in
the discontinuation of public use files for those data collections
that include any confidential data items. (The logic here is that
a data snooper could combine an anonymized public version with
repeated requests against the restricted information in the DAS
in an attempt to identify individuals.)
This move would have the advantage of allowing a potentially broader
group of data users to have access to basic analytic tools for the
tabular display of distributions calculated from the data, with statistical
tests of comparisons that use appropriate adjustments to reflect the
complex sample designs of most NCES sample surveys. The disadvantage
of moving all restricted data to a public access DAS is the possibility
that this allows too much access to restricted data. To guard against
this possibility, NCES uses a number of data protections in the DAS.
No unweighted counts are included in the DAS, and rounding is required
for all unweighted counts published from micro records of restricted
data. In addition, NCES currently allows the DAS user to collapse
across categories within an individual variable, but adds further
protection by not allowing the computation of new composite variables
or interactions between variables. NCES has recently added limited
regression capabilities to the DAS, but continues to ban interactions
and does not provide residuals from the regression analyses. NCES
currently has a panel of experts examining the trade-offs between
data protection and data access as operationalized by NCES; a subgroup
of the panel is looking specifically at the NCES use of data analysis
systems for restricted use data.
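The rounding and category-collapsing protections described above can be
sketched as follows. This is a hedged illustration only: the rounding base of
10, the function names, and the example categories are assumptions, since the
text does not specify how the DAS implements either protection.

```python
from collections import Counter


def round_count(n, base=10):
    """Round a non-negative unweighted count to the nearest multiple of
    `base`, with halves rounded up. The base of 10 is an assumed value."""
    return base * ((n + base // 2) // base)


def collapse_categories(values, mapping):
    """Recode one variable by merging categories according to `mapping`.

    Values not listed in `mapping` are kept as they are. Only a single
    variable is recoded; no composite variable or interaction is built.
    """
    return [mapping.get(v, v) for v in values]


education = ["HS", "BA", "MA", "PhD", "BA"]
collapsed = collapse_categories(education, {"MA": "Graduate", "PhD": "Graduate"})
cell_counts = Counter(collapsed)   # counts per collapsed category
published = {cell: round_count(n) for cell, n in cell_counts.items()}
```

Restricting the user to recodes of a single variable, as in
`collapse_categories`, keeps the set of possible table cells fixed by the
original variables rather than by user-defined combinations.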
NCES also uses licenses to allow qualified researchers (agents) to
have access to micro level restricted use data files. To obtain a
license, the researcher must agree to the terms of the license with
regard to the use, handling, and protection of the data and the publication
of any results. As part of the license, the primary applicant and
each authorized user must sign and submit notarized affidavits of
nondisclosure that bind each of them to the confidentiality provisions
contained in ESRA. In addition, a senior representative, who has the
authority to legally bind the primary user’s organization, must
be a signatory to the license.
NCES faces several new challenges associated with the licensing program.
We have just converted the application and amendment process to an
online application, with only the signature pages being transmitted
in hard copy to NCES. This will allow us both to provide more efficient
service to this group of users and to maintain an improved electronic
database for managing the program. The database will also help NCES
meet new OMB requirements for reporting on the use of agents and for
the annual training of such agents.
Also, in response to new requirements for the use of encryption when
transferring restricted data, NCES is in the process of converting
its distribution system from sending unprotected restricted
files by certified mail to the use of encryption software (PGP) to
protect restricted files prior to distribution.
Another aspect of confidentiality that has received increased attention
in the past year concerns the management, monitoring, and reporting
of any potential confidentiality breaches. In a data collection agency
with a budget of approximately $200 million, at any point in time
there are a number of data collections in the field that involve the
collection and transmission of individually identifiable data and
thus the potential for confidentiality breaches. If a computer or
hard drive with confidential information is missing, or if a computer
system with confidential information is hacked and compromised, a
potential confidentiality breach has occurred. At the extreme, however,
a potential breach has also occurred any time an envelope or folder
with hard copy, or an electronic file of confidential data, is not
where it is supposed to be or does not arrive as expected from a
transmission. Sorting out the identification and
management of these different types of potential security breaches
is another challenge facing data collection agencies.