Data Confidentiality Workshop

WORKSHOP ON DATA CONFIDENTIALITY

September 6-7, 2007 in Arlington, VA

White Paper & Bio


I worry that possibly misplaced concerns about confidentiality, coupled with a reluctance to deploy anonymization tools, stymie important public health research. Here are two examples worthy of discussion.

The FDA's Adverse Event Reporting System (AERS) is the primary U.S. data source for post-marketing surveillance of drug safety. Pharmaceutical companies, medical professionals, and members of the public submit reports of adverse drug reactions to the system. The FDA does make a version of the AERS data available via the web. However, two serious flaws render this "Freedom of Information" (FOI) version essentially useless for methodological research. First, for confidentiality reasons the FOI version does not include the original adverse event narratives; without the narratives, the validity of the adverse event codes cannot be assessed. Second, the drug identifiers in the FOI version include generic names, brand names, dose levels, and ingredient names in countless combinations, with and without misspellings. In fact, the drug dictionary contains over 330,000 verbatim terms, whereas the actual number of unique drugs is closer to 10,000. I believe that solutions to both flaws are within reach.

Major clinical trials of medical products generate high-quality data on efficacy and safety. Typically these data remain under the tight control of the trial sponsor (often a pharmaceutical company), and many important secondary analyses remain undone or, at the very least, under wraps. Furthermore, I contend that published reports of these trials, often featuring "ghost-writers," sometimes cherry-pick results. A mechanism for making trial data available more broadly would serve the public well. Some progress has been made in recent years, but fundamental barriers remain.

David Madigan

Columbia University

Biographical Data

David Madigan is Professor of Statistics at Columbia University. His previous appointments include the University of Washington and Rutgers University, as well as Soliloquy Inc., AT&T, KPMG, and SmartForce Inc. His current research interests include large-scale Bayesian statistics, text mining, Monte Carlo methods, and drug safety. He is a Fellow of the American Statistical Association and of the Institute of Mathematical Statistics.