Some of the 79 million reasons to use the HRS genomics data

Jonathan King,
Senior Scientific Advisor to the Division Director,
Division of Behavioral and Social Research (DBSR)

In 2009, NIH received its first year of funding through the American Recovery and Reinvestment Act (ARRA). NIA received $275 million over two years in ARRA funds. Overall, these funds were used to intensify and expand scientific study and support the research infrastructure in aging and age-related cognitive change, including Alzheimer’s disease, through a series of grants and initiatives.

Among the many important projects NIA supported using ARRA funds was the genotyping of DNA samples collected from almost 20,000 participants in the Health and Retirement Study (HRS). Many of you will be familiar with the HRS, a nationally representative longitudinal panel study that collects data from 20,000 Americans age 50+ every two years, exploring changes in health, wealth, and well-being. Its unique sampling frame makes it especially valuable when nationally representative samples are needed. In 2007, HRS was crucial for estimating dementia prevalence in the U.S.; in 2013, it provided the key data that confirmed the enormous monetary costs of dementia in America.

Wide participation and quality publications

Although HRS was not the only cohort study genotyped thanks to ARRA funds, it has turned out to be an especially important one, giving us the best picture we have of the genetic diversity of older adults in the U.S. HRS genomic and phenotype data have also been included in over 200 genome-wide association studies across 39 different projects, and HRS is a partner in 6 different consortia. Initially, many of the papers using HRS genomic data focused on biomedical and disease phenotypes; more recently, the data have been used to answer behavioral and social science research questions. One such study suggests that more than mere chance dictates marriage: married couples turn out to be more genetically similar than would be expected by chance.

Using HRS genomic data can definitely put you in good company; publications using these data have appeared in several leading journals, including Nature, Nature Genetics, Science, and PNAS. Dozens more papers are in the works from the 152 research groups that have sought and received approval to download and use the data from dbGaP.

More data will be available soon

Those of you who just clicked that link might feel a bit cheated, because as of today, data from only 12,507 participants are available on dbGaP. We anticipate that the completed sample of 18,764 unrelated individuals will be available in the first half of 2016. Particularly important in this full dataset is the genotype information from more than 6,000 HRS participants from racial and ethnic minorities, including over 3,000 African American participants. Indeed, it was the rich racial and ethnic diversity of the cohort that drove the decision to genotype, at significantly higher cost, on the densest genotyping array then available (Illumina Omni2.5), which yields over 2 million single nucleotide polymorphisms (SNPs) per individual. These 2 million SNPs have in turn been used to impute 79 million variants using the 1000 Genomes Phase 3 reference panel. That’s 79 million possibilities just waiting for a question to be asked.

Of course, not every research question requires quite that much data. HRS also provides a range of genomic products, including data better suited for candidate gene analyses and, in the near future, the full genetic relatedness matrix for HRS participants and polygenic risk scores derived from GWAS for a variety of traits.

With such an abundance of phenotype and genotype data, it can be difficult to know where to start, but now that you know at least some of what is available, please feel free to send us an e-mail and let us help you get started.