Document Type

Book Chapter

Publication Date



Biobanks represent an opportunity for the use of big data to drive translational medicine. Precision medicine demands data to shape treatments to individual patient characteristics; large datasets can also suggest new uses for old drugs or relationships between previously unlinked conditions. But these tasks can be stymied when data are siloed in different datasets, smaller biobanks, or completely proprietary private resources. Such fragmentation hampers not only analysis of the data themselves, but also efforts to translate data-based insights into actionable recommendations and to transfer the discovered technology into a commercialization pipeline. Cross-project technological innovation, development, and validation are all more difficult when data are divided between different biobanks and other data repositories.

One way to conceive of biobanks and the big medical datasets they create and embody is through the lens of infrastructure: how can biobanks and their data serve as infrastructure to support later innovation? Some efforts already fit this model; for example, the United States’ Precision Medicine Cohort (now renamed All of Us) aims to create a large, uniform dataset to be used for widespread future research. Other biobank-related data efforts, like Myriad’s dataset on BRCA1/2 genetic variations, still function as entirely private resources. Treating medical big data as infrastructure has implications for how they should be governed, and suggests advantages to centralized control and relatively broad access. More broadly, viewing biobank-related data as infrastructure would place them at a distinctly earlier point in the commercialization pipeline, serving to facilitate later steps in translational medicine rather than being viewed as potentially commercializable products themselves.

This chapter is divided into two parts. In the first, I briefly describe big data in medicine: the sources of medical data, the promises of medical big data, and a key challenge, data fragmentation. In the second, I discuss the role of biobanks in medical big data, focusing on their role as infrastructure for innovation and their potential for facilitating translational research.


Reproduced with permission.