Genetics data available for secondary analysis
Looking for genetic, health, and medical data to use in your research? Are you thinking of investigating genetic risks and influences on health conditions, particularly those related to aging, but wondering how to get the data?
For the first time, qualified researchers can access data from one of the United States’ largest and most diverse genomics projects: the Genetic Epidemiology Research on Aging (GERA). The GERA cohort, at the Kaiser Permanente Northern California system, has data on 78,000 members. You can apply to use these data in your research.
What makes these data so special?
The genetic information in the GERA cohort translates into more than 55 billion bits of genetic data.
These genetic data are combined with information derived from Kaiser Permanente’s comprehensive longitudinal electronic medical records, as well as extensive survey data on participants’ health habits and backgrounds. NIH funding helped to make this resource available. I think the data give researchers an unparalleled resource. Read more about the cohort and exactly what is available.
The researchers who put it together conducted genome-wide scans to rapidly identify single nucleotide polymorphisms (SNPs) in the genomes of the people in the GERA cohort. These data will form the basis of genome-wide association studies that can look at hundreds of thousands to millions of SNPs at the same time.
Another exciting aspect of this data set is that it will be continuously updated with additional patient information.
Get started with the GERA data set here.
What kinds of studies could use this new data resource?
Researchers studying diseases and conditions traditionally associated with aging, such as cardiovascular disease, cancer and osteoarthritis, may be interested in these data.
In addition, researchers can explore the potential genetic underpinnings of a variety of diseases that affect people in adulthood, including:
- certain eye diseases
- and many others representing a variety of disease domains
You might already have realized that you could use the database to confirm or refute findings from other studies that used data from relatively small numbers of people. Or you could increase the size and power of your samples by adding participants from GERA to meta-analyses. The large cohort will also serve as a reference source of controls that you could compare with individuals who have the different conditions you have studied.
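To make the point about power concrete, here is a minimal sketch of a fixed-effect, inverse-variance meta-analysis, one common way to pool a small study's estimate with an estimate from a large cohort. The function name and all of the numbers are hypothetical and chosen only for illustration; they are not GERA results, and a real analysis would also need to check for heterogeneity between studies.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (effect, standard_error) pairs by inverse-variance weighting.

    Each study is weighted by 1 / SE^2, so more precise studies
    contribute more to the pooled estimate.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_effect = sum(
        w * effect for w, (effect, _) in zip(weights, estimates)
    ) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_effect, pooled_se


# Hypothetical log-odds-ratio estimates for a single SNP (illustrative only).
small_study = (0.20, 0.10)   # (effect, standard error) from a small sample
large_cohort = (0.12, 0.03)  # (effect, standard error) from a large cohort

pooled_effect, pooled_se = fixed_effect_meta([small_study, large_cohort])
print(f"pooled effect = {pooled_effect:.3f}, pooled SE = {pooled_se:.3f}")
```

With these made-up inputs, the pooled standard error comes out smaller than that of either study alone, which is the sense in which adding a large cohort to a meta-analysis increases power.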
Where do the data come from?
Information and DNA were collected and compiled collaboratively by the Research Program on Genes, Environment, and Health (RPGEH) at Kaiser Permanente (Catherine Schaefer, PhD, co-principal investigator) and the University of California, San Francisco (Neil Risch, PhD, co-principal investigator). The addition of the data to the NIH database of Genotypes and Phenotypes (dbGaP) was made possible with $24.9 million in support from the National Institute on Aging, the National Institute of Mental Health, and the Office of the Director, all at NIH. The RPGEH database was made possible largely through early support from the Robert Wood Johnson Foundation to accelerate such health research.
Read our NIH press release about this new data resource for scientists.
How do I get access to the data for my research?
The GERA data are available through dbGaP, an online genetics database from NIH. dbGaP was developed and is managed by the National Center for Biotechnology Information, part of the National Library of Medicine.
Interested in applying for access to this database? Follow the procedures on the dbGaP website.
Have questions for me about scientific resources for research like this database or NIA funding for these resources? Comment below.