David L. Duffy, MBBS PhD.
QIMR Berghofer Medical Research Institute,
300 Herston Road,
Herston, Queensland 4029, Australia.
Email: davidD@qimr.edu.au
Version 1: 2007-01-10
Human pigmentation is a genetically complex phenotype, but not intractably so, because the major genes have large effects. Pigmentation is therefore a good test case for methodology, as well as being intrinsically interesting.
Eye (iris) colour is one of the classical examples of a genetic trait.
| green | pheomelanin |
| blue, blue-green | eumelanin |
| brown, brown-green | mixed |
Eye colour is also of biomedical interest:
| Traits associated with dark brown iris colour | increased cataracts |
| | decreased age-related macular degeneration |
| | increased glaucoma |
| | increased speed on reaction-time tasks (?) |
| | decreased risk of melanoma |
| Response to drugs | tropicamide |
| Side effects of drugs | prostaglandins |
| | beta-adrenergic agonists |
| Forensics | prediction of eye and hair colour based on DNA samples |
"Usual" genetic model is two major loci
Brown | B | b |
Green | G | g |
| GEY \ BEY | bb | Bb | BB |
|---|---|---|---|
| gg | blue | brown | brown |
| Gg | green | brown | brown |
| GG | green | brown | brown |
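The two-locus table can be restated as a small lookup rule: any B allele gives brown, otherwise any G allele gives green, otherwise blue. The following is a minimal Python sketch of that rule only. The function name and the two-letter genotype strings are assumptions made for illustration, and the model is fully penetrant, which the real data are not.

```python
def predict_eye_colour(bey: str, gey: str) -> str:
    """Predict eye colour from BEY and GEY genotypes under the two-locus model.

    Genotypes are two-letter strings such as "Bb" or "gg"
    (a hypothetical encoding chosen for this sketch).
    """
    if "B" in bey:      # brown allele is dominant and masks the GEY locus
        return "brown"
    if "G" in gey:      # green allele is dominant over blue at GEY
        return "green"
    return "blue"       # only bb gg individuals are blue


# Reproduce the genotype grid row by row (rows: GEY, columns: BEY).
for gey in ("gg", "Gg", "GG"):
    print(gey, [predict_eye_colour(bey, gey) for bey in ("bb", "Bb", "BB")])
```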
We have collected a large number of twin families, where eye colour and other pigmentary phenotypes are recorded.
First we will see if eye colour segregates according to the model shown above.
>> include bluetwin.in
The describe command gives the proportion of blue-eyed children from the different mating types. This is consistent with a major recessive gene of intermediate penetrance.
Before we do any further analysis, we will remove the data for one person from each MZ twin pair. Most of Sib-pair's commands do not know about MZ twins, and this is also true of packages such as FBAT. Leaving both MZ twins in your data gives spurious linkage and spurious association if the analysis does not recognize that they are genetically identical.
>> twin blue
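The idea of keeping a single member of each MZ pair can be illustrated outside Sib-pair. This is a minimal Python sketch, not Sib-pair's implementation: the record layout and the field names `id` and `mz_pair` are hypothetical, and the sketch simply keeps whichever twin appears first.

```python
from typing import Dict, List, Optional

Person = Dict[str, Optional[str]]


def drop_second_mz_twin(people: List[Person]) -> List[Person]:
    """Keep the first-listed member of each MZ pair and all other people."""
    seen_pairs = set()
    kept = []
    for person in people:
        pair = person.get("mz_pair")   # None for anyone who is not an MZ twin
        if pair is not None:
            if pair in seen_pairs:
                continue               # co-twin already kept, skip this one
            seen_pairs.add(pair)
        kept.append(person)
    return kept


# Hypothetical records: 101 and 102 are an MZ pair, 103 is a singleton.
sample = [
    {"id": "101", "mz_pair": "A"},
    {"id": "102", "mz_pair": "A"},
    {"id": "103", "mz_pair": None},
]
print([p["id"] for p in drop_second_mz_twin(sample)])
```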
Next, we will run a nonparametric linkage analysis of the microsatellite markers using Merlin's NPL command.
>> keep blue D15S*
This does not look particularly impressive. Perhaps if we try brown eye colour?
>> set locus brown aff; brown=(eyecol==3)
An alternative "nonparametric" analysis is the WROD ("wrong LOD") score approach. We will fit two parametric models, an intermediate dominant and an intermediate recessive model with a phenocopy rate, and take the better of the two. Because this amounts to two tests, we discount the LOD score by subtracting 0.3 (approximately log10 of 2).
>> keep blue D15S*
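The discounting step is simple arithmetic: take the larger of the two model LOD scores and subtract log10 of the number of models tried. A minimal Python sketch follows. The function name is hypothetical, the LOD values in the usage line are made up for illustration, and reading the 0.3 as log10(2) is an interpretation of the text rather than something taken from any package's source.

```python
import math


def wrod_score(lod_dominant: float, lod_recessive: float) -> float:
    """Best of two parametric LOD scores, discounted for having fitted two models."""
    n_models = 2
    penalty = math.log10(n_models)          # about 0.301
    return max(lod_dominant, lod_recessive) - penalty


# Made-up LOD scores at a single map position.
print(round(wrod_score(3.0, 2.0), 3))
```

Applied along a chromosome, the same calculation would be repeated at each map position.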
For completeness, we can look at the Sib-pair result for an equivalent analysis.
>> undrop
Between D15S1002 and D15S165, OCA2 (Oculocutaneous Albinism 2) is an extremely good candidate. Rare mutations in this gene cause Type 2 oculocutaneous albinism, and the mouse homologue is the Pink-eye dilution factor. The gene spans 344 kbp of sequence (0.34 cM), so finding causative variants by sequencing is daunting (these variants may not be coding variants).
Because of this, we have moved to fine mapping by allelic association analysis. We have genotyped 70 single nucleotide polymorphisms (SNPs) right through the gene.
>> undrop
>> drop where monomorphic
There is at least one SNP we should check. Since we are doing 58 statistical tests, the Bonferroni correction for multiple testing is to multiply the P-values by 58, so most of these are fine. Since we thought that BEY3 is recessive for blue, we can perform a type of association-based homozygosity mapping.
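As a reminder of what the Bonferroni adjustment does, here is a minimal Python sketch: each P-value is multiplied by the number of tests and capped at 1. The function name is hypothetical and the P-values are made up for illustration. They are not results from this dataset.

```python
from typing import List


def bonferroni(p_values: List[float], n_tests: int) -> List[float]:
    """Multiply each P-value by the number of tests, capping the result at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


# Made-up nominal P-values, adjusted for 58 tests.
print(bonferroni([0.0001, 0.001, 0.02], n_tests=58))
```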
>> hwe rs728404 ; # check the odd SNP incl parents and offspring
>> set iter 0 ; # a quick preliminary scan
>> homozygosity blue
>> keep blue rs3794604 to rs7495174
>> set iter 1000 ; # Obtain simulated P-values
>> homozygosity blue
The simulation-based P-values are generated by gene dropping, and so are correct for the observed structure of the pedigrees. Let's now look at the penetrances of these SNPs:
>> table eyecol { $m } ;
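A penetrance here is just the proportion of people with a given genotype who have the trait. The cross-tabulation can be sketched in Python as below. This is not Sib-pair's implementation: the function name, the genotype labels and the records are all made up for illustration, and the sketch ignores the relatedness of family members.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def penetrance_by_genotype(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the proportion of blue-eyed people within each genotype."""
    totals: Counter = Counter()
    blue: Counter = Counter()
    for genotype, is_blue in records:
        totals[genotype] += 1
        if is_blue:
            blue[genotype] += 1
    return {g: blue[g] / totals[g] for g in totals}


# Made-up (genotype, blue-eyed?) records, not taken from the twin data.
toy = [
    ("A/A", True), ("A/A", True), ("A/A", False),
    ("A/G", True), ("A/G", False),
    ("G/G", False),
]
print(penetrance_by_genotype(toy))
```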
These look very impressive, but the next question is which of these SNPs is causative, and which are associated purely because of linkage disequilibrium with a causative variant. We need to look at the pattern of LD among these SNPs.
To obtain a nice graphical representation, we can turn to another program, Haploview. Here is the Haploview plot of LD across these SNPs. If we are happy looking at a matrix of disequilibrium coefficients, we can:
>> disequilibrium
>> disequilibrium all
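For two biallelic SNPs, the usual pairwise disequilibrium coefficients follow directly from the haplotype and allele frequencies: D = p_AB - p_A p_B, |D'| is |D| scaled by its maximum possible value, and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The Python sketch below uses these textbook definitions. The function name is hypothetical, the frequencies are made up, and the sketch assumes the haplotype frequency is already known, whereas in practice it has to be estimated from unphased genotypes.

```python
from typing import Tuple


def ld_coefficients(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for alleles A and B at two biallelic loci.

    p_ab is the frequency of the A-B haplotype; p_a and p_b are allele frequencies.
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / denom
    return d, d_prime, r2


# Made-up frequencies: complete LD, then no LD.
print(ld_coefficients(0.5, 0.5, 0.5))
print(ld_coefficients(0.25, 0.5, 0.5))
```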
There is strong linkage disequilibrium among these SNPs, so we will carry out a haplotype-based association analysis to further resolve causation.
Before this, we will look at the results from some family based tests of association.
The main benefit of performing a family based test of association on the present dataset is to avoid discarding information. We have genotyped all the family members (so that we can look for genotyping errors and utilize the SNP markers for linkage), and "ordinary" association analysis would discard related individuals.
We need sophisticated methods of familial association analysis because other pigmentation genes also segregate through the family, and so there may be a residual familial correlation not captured by the measured SNPs. If not allowed for, this gives rise to false positive associations.
Nevertheless, we will start with a relatively simple analysis again, the TDT:
>> tdt blue
>> tdt eyecolour
>> schaid blue { rs11855019 rs6497268 rs7495174 }
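For a biallelic marker, the basic TDT compares how often heterozygous parents transmit a given allele to affected offspring (b) with how often they do not (c), using the McNemar-type statistic (b - c)² / (b + c) on one degree of freedom. A minimal Python sketch follows. The function name is hypothetical and the counts are made up. It shows only the textbook statistic, not how Sib-pair handles multiple affected siblings or missing parents.

```python
def tdt_chi_square(transmitted: int, untransmitted: int) -> float:
    """Textbook TDT statistic from heterozygous-parent transmission counts."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


# Made-up counts: allele transmitted 30 times, not transmitted 10 times.
print(tdt_chi_square(30, 10))   # (30 - 10)^2 / 40 = 10.0
```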
We'll now export our data to the FBAT format, and look at the results from that test.
>> undrop; # get all the SNPs
In FBAT, we try:
>> load pedigree bluefbat.ped
We'll now look at those three intron 1 SNPs showing the greatest association.
>> hapfreq rs11855019 rs6497268 rs7495174
We'll now run MENDEL on the same data. The first test we will look at compares haplotypes. As usual, MENDEL is not quite as friendly as some of the other packages, but it is fairly flexible and offers analyses that other packages do not.
>> include bluetwin.in
To write MENDEL scripts, the easiest approach is to look for the chapter in the manual that covers the task you wish to carry out, then find the corresponding script ("Control3a.in" etc.) in the Example_Input directory. Copy that script and then edit it appropriately. Once you have the hang of this, you can write computer programs to automate these tasks for you.
Setting up haplotypes is covered in Chapter 18. We need to write a "SNP file" listing the SNPs to be combined into haplotypes. One shortcoming of MENDEL is that a haplotype can contain no more than four SNPs at a time. We can specify a sliding window approach, so that N-L+1 sets of SNPs will be combined (N = total number of SNPs, L = haplotype width). Another annoying limitation is that locus names are usually restricted to 8 characters; fortunately, Sib-pair truncates the locus names intelligently for us.
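The sliding-window idea can be checked with a few lines of Python: a contiguous window of width L moved one SNP at a time over N ordered SNPs yields N-L+1 sets. This sketch is illustrative only. The function name and SNP labels are hypothetical, and it does not produce a MENDEL SNP file.

```python
from typing import List


def sliding_windows(snps: List[str], width: int) -> List[List[str]]:
    """Return every run of `width` consecutive SNPs, in map order."""
    if width < 1 or width > len(snps):
        raise ValueError("width must be between 1 and the number of SNPs")
    return [snps[i:i + width] for i in range(len(snps) - width + 1)]


# Hypothetical ordered SNP labels: N = 5, L = 3 gives 5 - 3 + 1 = 3 windows.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
for window in sliding_windows(snps, 3):
    print(window)
```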
For our purposes today, we will just combine our 3 intron 1 SNPs together, as we did in FBAT. The script will look like makehaplo.con.
> mendel -c makehaplo.con
Carrying out our association analysis is not much harder. Because we have pedigrees, we will use PENETRANCES (Chapter 14) to find a template.
For a binary trait such as blue eye colour, we will use a Binomial distribution. The script will look like blueass.con. We have rewritten the map file (now blue2.map) to contain only those loci we wish to run today -- "030201" is the automatically created name of the haplotype.