Data Confidentiality Workshop

WORKSHOP ON DATA CONFIDENTIALITY

September 6-7, 2007 in Arlington, VA

White Paper & Bio


Consider the problem of concealing the identity of a patient who has AIDS. That person's medical records are to be combined with other records for use by researchers and insurance companies. An attacker, who wishes to determine the identity of the individual, gains access to the sanitized records.

For the moment, we assume that the sanitization is effective. The attacker cannot determine the identity from the information given. So the attacker waits, and examines subsequent records handled in a similar manner. She obtains access to additional datasets. Individually, the datasets reveal nothing. But over a period of time, combining information from the datasets enables the attacker to identify the individual with AIDS.
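The aggregation attack can be pictured as successive intersection of candidate sets. The sketch below is illustrative only. It assumes the attacker can enumerate a population and can recognize the target's record in each released dataset (for instance, by its diagnosis). The roster `POPULATION`, the attribute names, and the helpers `candidates` and `aggregate` are hypothetical. Each release alone leaves two or three candidates, but their intersection leaves one.

```python
from typing import Dict, List, Set

# Hypothetical population the attacker can enumerate (for example, a local roster).
# Each person has attributes that might appear in released, sanitized datasets.
POPULATION: Dict[str, Dict[str, str]] = {
    "P1": {"zip": "95616", "age_band": "30-39", "sex": "F"},
    "P2": {"zip": "95616", "age_band": "30-39", "sex": "M"},
    "P3": {"zip": "95616", "age_band": "40-49", "sex": "F"},
    "P4": {"zip": "95618", "age_band": "30-39", "sex": "M"},
}


def candidates(release: Dict[str, str]) -> Set[str]:
    """People consistent with every attribute one release reveals about the target."""
    return {
        person
        for person, attrs in POPULATION.items()
        if all(attrs.get(key) == value for key, value in release.items())
    }


def aggregate(releases: List[Dict[str, str]]) -> Set[str]:
    """Intersect the candidate sets from successive releases."""
    remaining = set(POPULATION)
    for release in releases:
        remaining &= candidates(release)
    return remaining


# Assumption: each dataset, released at a different time, reveals one attribute
# of the record the attacker knows belongs to the target.
releases = [{"zip": "95616"}, {"age_band": "30-39"}, {"sex": "F"}]

for release in releases:
    print(release, sorted(candidates(release)))  # alone, each leaves 2 or 3 candidates
print(aggregate(releases))  # {'P1'}
```

No single release is identifying, so a per-dataset check would approve all three.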

The trivial answer to this problem is to determine in advance what information each dataset will contain, and ensure the aggregation of that information will not enable an attacker to determine any individual identity. This solution is unsatisfying for at least three reasons, one organizational and two analytical.

First, the above solution assumes that the contents of the datasets will be known in advance. However, if the datasets consist of information gathered over time, those who generate them cannot know beforehand what data they will contain. Further, if the datasets come from many organizations, coordinating their analysis poses management problems. If two datasets taken together enable the identification of an individual, but neither one alone does, which one is to be released? Which one is to be withheld?

Second, there is an implicit assumption that the attacker only has the information in the datasets available. But external knowledge may enable the attacker to determine the individual. For example, if the records identify the pharmacy at which the patients purchased their medication, the attacker can correlate dates of visits to the pharmacy (external knowledge) with dates prescriptions were filled (information taken from the records). Unless the organizations know what the attacker knows, it seems unlikely the above trivial approach would protect against this attack.
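The pharmacy example amounts to a join between released data and outside observations. The sketch below uses hypothetical pseudonyms, dates, and helper names (`FILL_DATES`, `OBSERVED_VISITS`, `link`). It assumes that a prescription is filled only during a visit and that the attacker observed every visit the patient made. Under those assumptions, a person is a match when their observed visit dates cover all of a record's fill dates.

```python
from datetime import date
from typing import Dict, List, Set

# Hypothetical sanitized records: pseudonym -> dates a prescription was filled
# at the pharmacy named in the record. No names or addresses are released.
FILL_DATES: Dict[str, Set[date]] = {
    "rec-17": {date(2007, 3, 3), date(2007, 4, 2), date(2007, 5, 1)},
    "rec-42": {date(2007, 3, 10), date(2007, 4, 9)},
}

# Hypothetical external knowledge: dates the attacker saw each person at that pharmacy.
OBSERVED_VISITS: Dict[str, Set[date]] = {
    "P1": {date(2007, 3, 3), date(2007, 3, 20), date(2007, 4, 2), date(2007, 5, 1)},
    "P2": {date(2007, 3, 3), date(2007, 4, 9)},
    "P3": {date(2007, 3, 10), date(2007, 4, 2), date(2007, 4, 9)},
}


def link(fills: Dict[str, Set[date]], visits: Dict[str, Set[date]]) -> Dict[str, List[str]]:
    """For each pseudonym, list the people seen at the pharmacy on every fill date.

    Assumes the attacker observed all of the patient's visits; with partial
    observation one would rank people by overlap instead of requiring a subset.
    """
    return {
        pseudonym: sorted(person for person, seen in visits.items() if dates <= seen)
        for pseudonym, dates in fills.items()
    }


print(link(FILL_DATES, OBSERVED_VISITS))  # {'rec-17': ['P1'], 'rec-42': ['P3']}
```

Nothing in the released records is identifying on its own; the identification comes entirely from what the attacker brings.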

Third, the attacker's goal may not require identification of an individual. Identifying a small set of people that contains the individual (the setting k-anonymity addresses) may be enough. As an example, if an insurance company learns that one of three people has AIDS and therefore will require expensive treatment, it may refuse to cover all three.
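A release is k-anonymous when every record shares its quasi-identifier values (here, generalized ZIP code and age) with at least k-1 other records. The sketch below computes k for a hypothetical release and lists the quasi-identifier groups containing an AIDS record. The data and the helper names `k_of` and `groups_containing` are assumptions for illustration. The release is 3-anonymous, yet it still tells an insurer which group of three contains the patient.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]

# Hypothetical release: ZIP codes and ages generalized, diagnosis kept.
RELEASE: List[Row] = [
    {"zip": "956**", "age": "30-39", "diagnosis": "AIDS"},
    {"zip": "956**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "956**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "957**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "957**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "957**", "age": "40-49", "diagnosis": "diabetes"},
]
QUASI_IDS = ("zip", "age")


def k_of(rows: List[Row], quasi_ids: Sequence[str]) -> int:
    """Smallest number of rows sharing one combination of quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(groups.values())


def groups_containing(
    rows: List[Row], quasi_ids: Sequence[str], attr: str, value: str
) -> List[Tuple[str, ...]]:
    """Quasi-identifier groups in which at least one row has row[attr] == value."""
    return sorted({tuple(row[q] for q in quasi_ids) for row in rows if row[attr] == value})


print(k_of(RELEASE, QUASI_IDS))  # 3
print(groups_containing(RELEASE, QUASI_IDS, "diagnosis", "AIDS"))  # [('956**', '30-39')]
```

An insurer that knows applicants' ZIP codes and ages needs nothing more than the second result to act against everyone in that group.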

A solution to the above problem requires that a threat model be articulated: what does the attacker know? It also requires a precise delineation of what is considered a valid solution: must the attacker identify a single individual, or is it enough to identify the individual as one of a set of possible people? These questions make clear that the environment in which the problem is posed, and in which the solution is determined, plays a critical role. Extending the problem a bit: as environments change, so will the problems and their solutions, so sanitization must protect data in multiple environments, including ones the sanitizers may not foresee.


Matt Bishop

University of California at Davis

Biographical Data

Matt Bishop received his Ph.D. in computer science from Purdue University in 1984, specializing in computer security. He was a research scientist at the Research Institute of Advanced Computer Science and was on the faculty at Dartmouth College before joining the Department of Computer Science at the University of California at Davis.

His main research area is the analysis of vulnerabilities in computer systems, including modeling them, building tools to detect vulnerabilities, and ameliorating or eliminating them. This includes detecting and handling all types of malicious logic. He is active in the areas of network security, the study of denial of service attacks and defenses, policy modeling, software assurance testing, and formal modeling of access control. He also studies the issue of trust as an underpinning for security policies, procedures, and mechanisms.

He is active in information assurance education, is a charter member of the Colloquium on Information Systems Security Education, and led a project to gather and make available many unpublished seminal works in computer security. His textbook, Computer Security: Art and Science, was published in December 2002 by Addison-Wesley Professional.

He also teaches software engineering, machine architecture, operating systems, programming, and (of course) computer security.

His web site is http://seclab.cs.ucdavis.edu/~bishop