Alzheimer's Disease Sequencing Project (ADSP): Description of Discovery Phase and Follow-Up Study
The ADSP Discovery Phase
The initial phase of the ADSP research plan is called the Discovery Phase. Samples were selected from well-characterized study cohorts of individuals with or without a diagnosis of Alzheimer's disease (AD) and with or without known risk factor genes. Details about the samples are available at the NIA Genetics of Alzheimer's Disease Data Storage Site (NIAGADS). The ADSP generated three sets of genome sequence data for these samples as part of the Discovery Phase: (1) Whole Genome Sequence (WGS) for 584 samples from 113 multiplex families, (2) Whole Exome Sequence (WES) for 5,096 AD cases and 4,965 controls, and (3) WES of an Enriched sample set composed of 853 AD cases from multiply affected families and 171 Hispanic controls. Sequence data are available by application to the database of Genotypes and Phenotypes (dbGaP). Applicants can obtain: (1) cleaned, quality control (QC) checked sequence data, (2) information on the composition of the study cohorts (e.g., case-control, family-based, and epidemiology cohorts), (3) descriptions of the study cohorts included in the analysis, (4) accompanying phenotypic information such as age at disease onset, gender, diagnostic status, and cognitive measures, and (5) epidemiological information such as educational level and certain demographic data available on the subjects genotyped.
As part of the Discovery Phase, the NIA ADSP genetics investigators funded under PAR-12-183 and HG-15-001 are conducting analysis of sequence data, including quality assessments and variant calling. Analysis of the Discovery Phase sequence data is revealing many new variations in the genome that may be implicated as new genetic risk or protective factors in older adults at risk for AD. Additional information on ADSP activities can be found in RFA-AG-16-001 and RFA-AG-16-002.
Under funding provided by HG-15-001, the Large Scale Sequencing and Analysis Centers (LSACs) transitioned to the Centers for Common Disease Genomics (CCDGs) and are referred to as such in the rest of this document. In February 2016, ADSP consultants recommended that subsequent sequencing and analysis be done on whole genomes in lieu of whole exome or targeted sequencing.
The ADSP Discovery Family-Based Extension Study: To further assess the genomes in multiply affected families, under funding provided by NHGRI, an additional 428 samples were whole genome sequenced by the CCDGs. These comprised 107 additional samples from families studied under the Discovery Phase, 207 samples from 77 new families, and 114 Hispanic controls. This portion of the study is called the Discovery Extension Phase.
The ADSP Discovery Case-Control Based Extension Study: Also under funding provided by NHGRI, an additional 3,000 subjects were whole genome sequenced: 1,466 cases and 1,534 controls. Of these subjects, 1,000 each were of Non-Hispanic White (NHW), Caribbean Hispanic (CH), and African American (AA) descent. A total of 739 of the samples sequenced were autopsy samples: 568 cases (500 NHW and 68 AA) and 171 controls (164 NHW and 7 AA).
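The sample counts reported for the two Discovery Extension studies can be cross-checked with simple arithmetic. The following Python sketch only re-adds the figures stated above; the variable names are illustrative and are not part of any ADSP data dictionary.

```python
# Counts copied from the two Extension Study descriptions above.
# All variable names are illustrative assumptions, not ADSP field names.
family_extension = {
    "additional samples from Discovery Phase families": 107,
    "samples from 77 new families": 207,
    "Hispanic controls": 114,
}
case_control_extension = {"cases": 1466, "controls": 1534}
autopsy_cases = {"NHW": 500, "AA": 68}
autopsy_controls = {"NHW": 164, "AA": 7}

# Sum each breakdown so it can be compared with the stated totals.
family_total = sum(family_extension.values())
case_control_total = sum(case_control_extension.values())
autopsy_case_total = sum(autopsy_cases.values())
autopsy_control_total = sum(autopsy_controls.values())
autopsy_total = autopsy_case_total + autopsy_control_total

print("Family-Based Extension samples:", family_total)
print("Case-Control Extension subjects:", case_control_total)
print("Autopsy cases:", autopsy_case_total)
print("Autopsy controls:", autopsy_control_total)
print("Autopsy samples overall:", autopsy_total)
```

Each computed sum matches the corresponding total given in the text (428 samples, 3,000 subjects, and 739 autopsy samples).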
Plans for analysis of data from the ADSP Discovery Extension Studies should be included in analytical approaches under the present FOA.
The ADSP Follow-Up Study (FUS)
The present FOA is designed to create an avenue to maintain and leverage existing ADSP infrastructure and collaborations; to ensure continuity of ADSP analysis as funding for the ADSP Discovery Phase ends in 2018; and to provide a funding stream for the continuing analysis and sharing of ADSP data generated on a large number of samples from individuals affected by AD.
ADSP Follow-up Study Sequencing, Quality Control, and Data Sharing
Under PAR-16-406 (the companion to this FOA), entitled "Additional Sequencing for the Alzheimer's Disease Sequencing Project", separate funds will be provided for: (1) ADSP investigators to identify, assemble, and send up to 10,000 DNA samples from well-phenotyped subjects affected with AD to the National Cell Repository for Alzheimer's Disease (NCRAD) for WGS; (2) NCRAD to receive and prepare DNA, perform quality control (QC) checks, retain aliquots of DNA, plate and ship samples to sequencing centers, and track samples through the sequencing process; (3) NCRAD to acquire and archive appropriate documentation for compliance of sample and data handling with NIH policy and ensure that standard operating procedures for sample handling are followed; (4) sequencing centers to perform GWAS and WGS and to process sequence data; (5) NIAGADS to receive and manage the WGS and GWAS data sets and coordinate ADSP phenotype and GWAS data collection, sequence data production, and delivery to the database of Genotypes and Phenotypes (dbGaP) for public data release; (6) the NIA Genetics and Genomics Center for Alzheimer's Disease (GCAD) to receive and process data and perform QC checks, variant calling, and harmonization with other ADSP data; and (7) GCAD to provide these processed data to the NIA Genetics of Alzheimer's Disease Data Storage Site (NIAGADS).
The ADSP Replication Work Group recommended criteria for sample selection:
- The study must have completed genomic data sharing documentation (GDS).
- Racial/ethnic diversity continues to be a priority.
- There will be preferential selection of autopsy-confirmed cases/controls.
- For non-European ancestry samples, there are limited numbers of individuals with autopsy, but when available, these samples should be prioritized.
- Longitudinal data continues to be an important further selection criterion in samples of European ancestry.
- No previous WES for European Americans (EA). Given the large numbers of cohorts/samples of European ancestry, priority will continue to be given to subjects without previous WES.
- Previous WES for African American (AA) or Hispanic cohorts is allowed. Numbers are more limited for these cohorts, so whenever possible, subjects without previous WES will be prioritized.
- There will be no age limit for cases (either age of onset or current age). Data from individuals 89 or older must be grouped into ≥ 89 years; however, this does not preclude their inclusion in this new sequencing effort.
- Cases should be unrelated to each other. Families and related individuals were sequenced earlier in the ADSP efforts. This new effort can focus on unrelated individuals.
- Availability of appropriately matched controls. The availability of controls that are appropriately matched to cases is a criterion for cohort selection.
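The criteria above could be applied to a list of candidate samples roughly as follows. This is a minimal Python sketch under stated assumptions, not the Work Group's actual procedure: the `Candidate` record and all of its field names are hypothetical, and the relative ordering of the priorities (autopsy confirmation first, then no previous WES, then longitudinal data for EA samples) is an assumption made only for illustration, since the text does not rank the criteria against each other. Cohort-level criteria such as racial/ethnic diversity and matched controls are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate case; all fields are assumptions."""
    sample_id: str
    ancestry: str                 # e.g. "EA", "AA", "Hispanic"
    gds_complete: bool            # study has completed GDS documentation
    autopsy_confirmed: bool
    has_longitudinal_data: bool
    previous_wes: bool
    related_to_selected_case: bool
    age: int


def is_eligible(c: Candidate) -> bool:
    # GDS documentation is required and cases should be unrelated.
    # No age limit is applied, matching the criteria above.
    return c.gds_complete and not c.related_to_selected_case


def priority_key(c: Candidate) -> tuple:
    # Tuples sort ascending and False sorts before True, so preferred
    # samples (autopsy-confirmed, no previous WES) come first.
    # The ordering of these three keys is an illustrative assumption.
    return (
        not c.autopsy_confirmed,
        c.previous_wes,
        c.ancestry == "EA" and not c.has_longitudinal_data,
    )


def reported_age(age: int) -> str:
    # Ages of 89 or older are reported only as the grouped value.
    return "≥89" if age >= 89 else str(age)


candidates = [
    Candidate("A", "EA", True, True, True, False, False, 92),
    Candidate("B", "AA", True, False, False, True, False, 75),
    Candidate("C", "EA", False, True, True, False, False, 81),
    Candidate("D", "EA", True, False, False, False, False, 80),
]

ranked = sorted((c for c in candidates if is_eligible(c)), key=priority_key)
for c in ranked:
    print(c.sample_id, c.ancestry, reported_age(c.age))
```

In this toy input, candidate C is excluded for lacking GDS documentation, and the remaining candidates are listed in priority order with ages of 89 or older shown only as the grouped value.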
Sequence data will be available through NIAGADS, which works in partnership with the database of Genotypes and Phenotypes (dbGaP). In order to meet time constraints, financial considerations, and the milestones provided under the National Alzheimer's Project Act (NAPA), sequence data from unaffected subjects will be drawn from existing WGS data from sequencing projects performed in large, well-characterized, age-matched cohorts with documented appropriate cognitive function testing, such as the Trans-Omics for Precision Medicine (TOPMed) program. Samples from AD subjects will be selected from existing cohorts and sample sets where possible. Samples may come from all types of epidemiology study designs, existing case/control, family-based, and other sample sets where AD is the underlying form of dementia. Collection of samples, genetic sequencing, quality control, variant calling, and data harmonization are supported under RFA-AG-16-001 and PAR-16-406.
ADSP FUS WGS and GWAS data will be generated under PAR-16-406 by one or more NIA-approved sequencing centers. The NIA GCAD was funded in June 2016 to perform QC on, harmonize, and meta-analyze all ADSP data. The AD Genome Center has implemented the ADSP SNP genotype calling and QC workflow. Data will be provided by sequencing centers to GCAD, whose pipelines are ready for whole genome sequence (WGS) data. Data will then be provided by GCAD to NIAGADS and dbGaP, which work in partnership for the sharing of ADSP data. NIAGADS is an NIA-approved data repository for ADSP analysis data. NIAGADS also works in partnership with GCAD on data analysis and data sharing. NIAGADS performs final ADSP quality checks and genotype calling, and will submit data to dbGaP. Download and QC of BAMs are managed by NIAGADS. Both the sequencing data provided by sequencing centers and the outcome data derived from the analyses will be stored at an NIA-approved data repository. Data will be provided to GCAD for harmonization, meta-analysis, and sharing with the research community at large.