Data Confidentiality Workshop

WORKSHOP ON DATA CONFIDENTIALITY

September 6-7, 2007 in Arlington, VA

White Paper & Bio


The National Center for Education Statistics (NCES) is the primary federal entity for collecting, analyzing, and reporting data related to education in the United States and other nations. NCES is congressionally mandated to perform these functions under the Education Sciences Reform Act of 2002 (ESRA). ESRA also includes the confidentiality provisions that protect individually identifiable data about students, their families, and their schools from disclosure. Any such data collected by, or on behalf of, NCES are restricted by law to statistical uses and, without the consent of the individual involved, are immune from legal process or any other nonstatistical use, except in the case of terrorism. A violation of this law is a Class E felony, punishable by a fine of up to $250,000, a prison term of up to 5 years, or both.

Like other Federal statistical agencies, NCES is tasked with balancing the protection of the data provided in confidence against its responsibility to report full and complete data on the condition of education in the United States, and to make those data available to the policy and research community. On the side of protecting the data, NCES has a Disclosure Review Board that reviews all data files that include any individually identifiable data prior to the release of public or restricted use data files. No NCES micro record files (public- or restricted-use) include any direct identifiers; where direct identifiers exist, they are stored separately. All NCES data are protected by the introduction of data perturbations (almost always swapping) prior to any analysis. In addition, public use data are matched against extant related universe data files, and various data coarsening techniques are employed to protect against potential disclosures. Finally, when data are made available through online data analysis systems, cell size limitations are imposed, and the functionality of the analysis allowed is restricted.
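
As a rough illustration of the swapping idea mentioned above, the following Python sketch exchanges the values of one field between randomly paired records. It is not a description of the NCES procedure, whose details are not given here: the function name swap_values, the rate parameter, and the purely random pairing are all assumptions made for the example. Production swapping methods typically pair records more carefully than this.

```python
import random


def swap_values(records, field, rate, seed=0):
    """Return a copy of `records` with `field` exchanged between random pairs.

    Hypothetical sketch: `rate` is the approximate share of records
    (between 0 and 1) whose value is swapped. Pairing is purely random here.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]           # leave the input untouched
    n_pairs = int(len(out) * rate / 2)         # each swap involves two records
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for a, b in zip(chosen[::2], chosen[1::2]):
        out[a][field], out[b][field] = out[b][field], out[a][field]
    return out


if __name__ == "__main__":
    # Made-up records for demonstration only.
    recs = [{"id": i, "zip": 22200 + i} for i in range(10)]
    for row in swap_values(recs, "zip", rate=0.4, seed=1):
        print(row)
```

Because values are only exchanged, never altered, the overall distribution of the swapped field is unchanged, while the link between any one record and its original value becomes uncertain.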

On the data access side, NCES has historically made public use data available whenever possible, and has used data analysis systems to give the public access to restricted data that have no public use version. One of the issues in providing such access to restricted data is the amount of functionality that can be included in the analysis system. A related issue that NCES is currently struggling with is whether to make all restricted NCES data available to the public through a data analysis system (DAS). This would result in the discontinuation of public use files for those data collections that include any confidential data items. (The logic here is that a data snooper holding an anonymized public use version could compare it, through repeated requests, against the restricted information in the DAS in an attempt to identify individuals.) This move would have the advantage of giving a potentially broader group of data users access to basic analytic tools for the tabular display of distributions calculated from the data, with statistical tests of comparisons that use appropriate adjustments to reflect the complex sample designs of most NCES sample surveys. The disadvantage of moving all restricted data to a public access DAS is the possibility that this allows too much access to restricted data. To guard against this possibility, NCES uses a number of data protections in the DAS. No unweighted counts are included in the DAS, and rounding is required for all unweighted counts published from micro records of restricted data. In addition, NCES currently allows the DAS user to collapse across categories within an individual variable, but adds protection by not allowing the computation of new composite variables or interactions between variables. NCES has recently added limited regression capabilities to the DAS, but continues to ban interactions and does not provide residuals from the regression analyses.

NCES currently has a panel of experts examining the trade-offs between data protection and data access as operationalized by NCES; a subgroup of the panel is looking specifically at the NCES use of data analysis systems for restricted use data.
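
Two of the protections described above, rounding of unweighted counts and cell size limitations, can be sketched in a few lines of Python. The rounding base of 10 and the minimum cell size of 30 are placeholder values chosen for illustration, and the function names are hypothetical; the actual NCES rules are not stated in this paper.

```python
def round_count(n, base=10):
    """Round an unweighted count to the nearest multiple of `base`.

    Halves round up. The base of 10 is an assumed value for illustration.
    """
    return base * ((n + base // 2) // base)


def publishable_estimate(weighted_estimate, unweighted_n, min_n=30):
    """Return the weighted estimate, or None when the cell is too small.

    `min_n` is an assumed threshold, not a documented NCES value.
    """
    return weighted_estimate if unweighted_n >= min_n else None


if __name__ == "__main__":
    # Made-up cells: (label, weighted estimate, unweighted record count)
    cells = [("Group A", 41250.0, 212), ("Group B", 1830.0, 12)]
    for label, estimate, n in cells:
        print(label, publishable_estimate(estimate, n), round_count(n))
```

In this sketch a suppressed cell is returned as None, so the calling code has to decide how to display it, for example as a footnoted blank.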

NCES also uses licenses to allow qualified researchers (agents) to have access to micro level restricted use data files. To obtain a license, the researcher must agree to the terms of the license with regard to the use, handling, and protection of the data and the publication of any results. As part of the license, the primary applicant and each authorized user must sign and submit notarized affidavits of nondisclosure that bind each of them to the confidentiality provisions contained in ESRA. In addition, a senior representative who has the authority to legally bind the primary user’s organization must be a signatory to the license.

NCES faces several new challenges associated with the licensing program. We have just converted the application and amendment process to an online application, with only the signature pages being transmitted in hard copy to NCES. This will allow us both to provide more efficient service to this group of users and to maintain an improved electronic database for managing the program. The database will also help NCES meet new OMB requirements for reporting on the use of agents and for the annual training of such agents. Also, in response to new requirements for the use of encryption when transferring restricted data, NCES is in the process of converting its distribution system from sending unprotected restricted files by certified mail to using encryption software (PGP) to protect restricted files prior to distribution.

Another aspect of confidentiality that has received increased attention in the past year is the management, monitoring, and reporting of potential confidentiality breaches. In a data collection agency with a budget of approximately $200 million, at any point in time there are a number of data collections in the field that involve the collection and transmission of individually identifiable data, and thus the potential for confidentiality breaches. If a computer or hard drive with confidential information is missing, or if a computer system with confidential information is hacked and compromised, a potential confidentiality breach has occurred. At the extreme, however, a potential breach has also occurred any time an envelope or folder with hard copy, or an electronic file of confidential data, is not where it is supposed to be or does not arrive as expected from a transmission. Sorting out the identification and management of these different types of potential security breaches is another challenge facing data collection agencies.


Marilyn Seastrom

National Center for Education Statistics

 

Biographical Data

 

Marilyn is the Chief Statistician at NCES. Her responsibilities as the Director of the Statistical Standards Program include both ensuring the technical quality of NCES products and administering the NCES Data Security Program. She is also the Chair of a government-wide, OMB-sponsored FCSM Working Group on Data Privacy that is examining matters related to informed consent, definitions of personally identifiable information, and the management of confidentiality breaches.