Large, multi-institute scientific collaborations typically publish
data sets consisting of terabytes of information; they also
replicate these data sets to improve performance and fault tolerance
and to make them available to scientists at different sites. A number
of tools currently used in scientific Grid environments provide
data replication capabilities, efficient data transfer, and catalog
support for registration and discovery of data sets. However, most
of these tools provide limited support for protecting replicated data
from the risks of tampering or unauthorized access.

To address the risks that scientific applications face
when sharing their data sets, we are investigating how to enhance existing
data management tools to protect users’ data. These enhancements
would make scientific applications less vulnerable to data security
threats that could jeopardize multi-institute collaborations and the
integrity of scientific results.

These improvements would also satisfy the needs of additional application
domains, such as medical applications that require more stringent
security and protection. In the medical domain, we are particularly
interested in using Grid tools to store and retrieve radiology images
and other medical data sets associated with a patient. Protecting
the confidentiality of this patient information is a key requirement.
In addition, a health Grid of this type could be used by researchers
conducting clinical trials. These researchers could query the Grid
to obtain images with a specified set of characteristics relevant
to the research study. Such studies would require that patient data
be sufficiently anonymized so that researchers can examine collections
of images without danger of discovering patient identities.