The main goal for 2010-2012 is to perform a genome-wide association study (GWAS) of 2,100 asthma cases and 10,000 unselected controls from Australia. There will be seven main stages for this period:
(1) Genotype 2,100 doctor-diagnosed asthmatics. The AAGC will first select 2,100 unrelated individuals who have been diagnosed with asthma by a physician as part of six studies conducted across Australia in previous years. These studies are described in more detail in the cohorts section. DNA from these patients will be sent for genotyping on Illumina arrays in Dr Matt Brown's lab at the Diamantina Institute (University of Queensland). As a result, we will have information on just over 600,000 genetic markers dispersed throughout the genome for each asthmatic patient (informally called a "case").
(2) Combine data with 10,000 previously genotyped controls. In order to identify genetic variants ("mutations") that increase the risk of developing asthma, we also need access to genetic information for a group of individuals who do not have asthma or who are representative of the general population (informally called "controls"). We will then compare how often certain "mutations" occur in the asthmatic group and in the control group. Given the high cost of the genotyping method we use ($410/person) and considering that other studies in Australia and overseas have already genotyped a large number of control samples, we will not genotype any new controls as part of this study. Instead, we will collaborate with other research groups that have generously agreed to share their genetic data with us. In total, we expect to be able to use genetic data for about 10,000 controls. These data will be combined with the genetic data from the asthma cases and prepared for analysis.
(3) Data cleaning & Primary analyses. Once genotype data from cases and controls have been combined, data analyses begin. The first and very important component of the analysis will be data cleaning. In a perfect world, no errors would be made when collecting samples, handling them or shipping them for genotyping, and genotyping arrays would be error free. However, the reality is that there are many sources of error that can lead to incorrect data being generated, in one form or another. For example, DNA samples may have accidentally been swapped between two individuals, or the data for particular genetic markers may be unreliable for technical reasons. Although these events are infrequent, they do take place in almost every study. Our first analytical goal is therefore to identify and correct these errors. Once this has been done, we are left with high quality data that are ready for the primary analyses. The simplest of these will identify, out of the 600,000 genetic variants tested, any "mutations" that occur at a significantly different frequency in the group of asthma patients when compared to the control group - this is called a case-control genetic association analysis.
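To make the case-control comparison above concrete, here is a minimal Python sketch of an allelic chi-square test for a single marker. It is an illustration only: the function name and the allele counts are hypothetical, and the software and exact test used for the AAGC analysis are not specified in this description. Each person carries two copies of each marker, so 2,100 cases contribute 4,200 alleles and 10,000 controls contribute 20,000 alleles.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns the chi-square statistic, its p-value and the allelic odds ratio.
    Hypothetical helper for illustration, not the study's actual pipeline.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele frequency were equal in both groups
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio


# Made-up counts: the variant makes up 30% of case alleles and 25% of control alleles
chi2, p_value, odds_ratio = allelic_chi_square(1260, 2940, 5000, 15000)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, odds ratio = {odds_ratio:.2f}")
```

In a genome-wide analysis a test of this kind is repeated for every one of the roughly 600,000 markers, which is why the multiple-testing issue discussed in stage (5) matters.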
(4) Identify top associated SNPs and CNVs. At present, a typical whole-genome case-control association analysis tests two types of genetic variants: single nucleotide polymorphisms (SNPs) and copy number variants (CNVs). The former are single base substitutions of the DNA sequence and as such are considered sequence variants. In contrast, CNVs are considered structural variants in that they correspond to sections of the DNA that have been duplicated, deleted, translocated or inverted. In this stage, out of all variants tested, we will identify those that are either significantly more common (i.e. they predispose to asthma) or significantly less common (i.e. they protect against asthma) in cases when compared to controls. These variants are said to be associated with asthma risk and are likely to be located in or near genes that play an important role in the development of asthma.
(5) Replication of SNPs in 1,500 cases and 1,500 controls. One of the drawbacks of testing hundreds of thousands of genetic markers for association is that, by chance alone, some variants will appear to be associated with asthma when in reality they are not. This is called a false-positive finding. To minimize this risk, we will test the SNPs selected in stage (4) in an independent group of cases and controls. If a selected SNP shows a consistent, significant association in this independent group as well, then our original finding is very likely to be a true positive, i.e. we are very likely to have found a gene that influences asthma risk.
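The scale of the false-positive problem can be seen with some simple arithmetic, sketched below in Python. The function names are hypothetical, and the Bonferroni correction shown is one common way of setting a stricter significance threshold; it is included only as an illustration and is not stated to be the threshold used in this study. The calculation also assumes, for simplicity, that the 600,000 tests are independent.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of markers passing threshold `alpha` by chance alone,
    assuming none of the markers is truly associated with the disease."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-marker p-value threshold that keeps the chance of one or more
    false positives across all tests at or below `family_alpha`."""
    return family_alpha / n_tests


n_markers = 600_000  # approximate number of markers on the array (from the text)

# With the conventional p < 0.05 cut-off, chance alone yields about 30,000 "hits"
print(expected_false_positives(n_markers, 0.05))

# A Bonferroni-style threshold is far stricter, on the order of 1e-7
print(bonferroni_threshold(n_markers))
```

Even with a strict threshold, some chance findings can remain, which is why testing the top SNPs in an independent sample is a useful second safeguard.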
(6) Validation of CNVs. In this stage we will first use standard molecular biology methods to confirm that the CNVs identified in stage (4) are real structural events and not artifacts of the statistical methods used to identify them. Secondly, we will genotype the CNVs that are confirmed to be real in the independent group of 1,500 cases and 1,500 controls to identify those that are robustly associated with asthma risk.
(7) Join meta-analysis of asthma GWAS. It is currently thought that there may be many mutations that significantly increase asthma risk, but only very weakly. For example, some mutations may increase asthma risk by only 0.1% when compared to the general population. To identify these mutations, a study needs to analyse much bigger samples than the 2,100 cases and 10,000 controls that we plan to analyse. Given the costs involved, it is unlikely that any single study can analyse such large datasets. The alternative approach currently used is to establish large international collaborations between groups such as ours, which have collected their own samples but are willing to share their results to promote more rapid progress towards identifying the genetic causes of asthma. We will actively engage with other national and international groups to achieve this goal.
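As an illustration of how results shared between groups can be combined, the Python sketch below implements a fixed-effect, inverse-variance weighted meta-analysis for a single SNP. This is one widely used approach and is shown here as an assumption; the method that the collaborating groups will actually use is not specified in this description. The function name and the effect sizes are hypothetical.

```python
import math


def fixed_effect_meta(results):
    """Combine per-study results for one SNP.

    `results` is a list of (beta, standard_error) pairs, where beta is the
    log odds ratio estimated in each study. Returns the pooled beta, its
    standard error, the z statistic and the two-sided p-value.
    """
    # Each study is weighted by the inverse of its variance, so larger,
    # more precise studies contribute more to the pooled estimate
    weights = [1.0 / se ** 2 for _, se in results]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, results)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Made-up results from three studies reporting a similar, weak effect
studies = [(0.10, 0.05), (0.08, 0.04), (0.12, 0.06)]
beta, se, z, p = fixed_effect_meta(studies)
print(f"pooled beta = {beta:.3f}, se = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

The pooled standard error is smaller than that of any single study, which is how combining studies makes it possible to detect effects that are too weak for one study to identify on its own.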