Exercise 3: Fine mapping of Eye Colour


David L. Duffy


David L. Duffy, MBBS PhD.
QIMR Berghofer Medical Research Institute,
300 Herston Road,
Herston, Queensland 4029, Australia.
Email: davidD@qimr.edu.au

Version 1: 2007-01-10


Eye Colour

Human pigmentation is a genetically complex phenotype, but not excessively so, and the major genes have large effects. Pigmentation is therefore a good test case for methodology, as well as being intrinsically interesting.

Eye (iris) colour is one of the classical examples of a genetic trait.

green = pheomelanin
blue, blue-green = eumelanin
brown, brown-green = mixed

Eye colour is also of biomedical interest:

  Traits associated with dark brown iris colour:
    increased cataracts
    decreased age-related macular degeneration
    increased glaucoma
    increased speed on reaction-time tasks (?)
    decreased risk of melanoma
  Response to drugs: tropicamide
  Side effects of drugs: prostaglandins, beta-adrenergic agonists
  Forensics: prediction of eye and hair colour based on DNA samples

Genetics of Eye Colour

"Usual" genetic model is two major loci


[Table: eye colour by two-locus genotype; surviving cells: GEY, gg, blue, brown, brown]

Twin-family data on eye colour

We have collected a large number of twin families, where eye colour and other pigmentary phenotypes are recorded.

First we will see if eye colour segregates according to the model shown above.

>> include bluetwin.in
>> describe blue

The describe command gives the proportion of blue-eyed children from each of the different mating types. This is consistent with a major recessive gene of intermediate penetrance.
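
To see what a recessive model with incomplete penetrance predicts, the following is a minimal sketch in Python. It works from parental genotypes, whereas Sib-pair tabulates matings by parental phenotype, so it illustrates the model rather than reproducing the describe output. The genotype coding and the penetrance values are assumptions chosen for illustration, not estimates from these data.

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(mother, father):
    """Mendelian offspring genotype distribution.

    Genotypes are coded as the number of copies (0, 1 or 2) of a
    hypothetical recessive "blue" allele.
    """
    def gametes(genotype):
        # Probability that a parent transmits the blue allele.
        p_blue = Fraction(genotype, 2)
        return {1: p_blue, 0: 1 - p_blue}

    probs = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    for (a, pa), (b, pb) in product(gametes(mother).items(),
                                    gametes(father).items()):
        probs[a + b] += pa * pb
    return probs


def expected_blue(mother, father, penetrance):
    """Expected proportion of blue-eyed children for one mating type."""
    probs = offspring_genotype_probs(mother, father)
    return sum(probs[g] * penetrance[g] for g in probs)


# Illustrative penetrances P(blue | copies of the blue allele); assumed values.
penetrance = {0: Fraction(1, 20), 1: Fraction(1, 10), 2: Fraction(9, 10)}

for mating in [(2, 2), (1, 2), (1, 1), (0, 1)]:
    print(mating, float(expected_blue(*mating, penetrance)))
```

Setting the penetrances to 0, 0 and 1 recovers the familiar fully penetrant recessive ratios, such as one quarter affected from a carrier by carrier mating.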

Before we do any further analysis, we will remove the data for one person from each MZ twin pair. Most of Sib-pair's commands do not know about MZ twins, and this is also true of packages such as FBAT. Leaving both MZ twins in your data gives spurious linkage and spurious association if the analysis does not recognize this fact.

>> twin blue
>> mztwin drop

Next, we will carry out a nonparametric linkage analysis of the microsatellite markers using Merlin's NPL option.

>> keep blue D15S*
>> ls $a $m; # Just checking
>> merlin --npl

This does not look particularly impressive. Perhaps if we try brown eye colour?

>> set locus brown aff; brown=(eyecol==3)
>> keep brown D15S*
>> merlin --npl

  1. Why did I think brown eye colour might be a better trait?
  2. Was it any better?
  3. Why is an affected-only analysis not very useful here?

An alternative "nonparametric" analysis is a WROD score approach ("Wrong LOD"). We will fit two parametric models, an intermediate dominant and an intermediate recessive model with a phenocopy rate, and take the best of the two models. We will discount the lod score for two tests by taking off 0.3.

  1. Why 0.3?

>> keep blue D15S*
>> merlin --model blue4.model
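
The scoring rule described above (best of the two parametric models, less a penalty of 0.3) can be sketched in Python as follows. The map positions and lod scores are invented purely for illustration; they are not Merlin output.

```python
def wrod(lod_dominant, lod_recessive, penalty=0.3):
    """Best of two parametric lod scores, discounted for making two tests."""
    return max(lod_dominant, lod_recessive) - penalty


# Hypothetical (position in cM, dominant lod, recessive lod) triples.
scan = [
    (0.0, 0.4, 0.9),
    (10.0, 1.1, 2.6),
    (20.0, 0.7, 1.8),
]

scores = [(position, wrod(dom, rec)) for position, dom, rec in scan]
best_position, best_score = max(scores, key=lambda item: item[1])

for position, score in scores:
    print(f"{position:5.1f} cM  corrected lod = {score:.2f}")
print("highest corrected lod at", best_position, "cM")
```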

  1. How could we improve on this analysis?
  2. What is the most likely location for the BEY3 locus?

We can also treat eye colour as a quantitative trait.

  1. What caveats would we place on the results from a variance components linkage analysis of the eyecol variable?
  2. How could we test the lod scores from a VC linkage analysis?

For completeness, we can look at the Sib-pair result for an equivalent analysis.

>> undrop
>> keep eyecol D15S*
>> sib eyecol

The OCA2 locus

Between D15S1002 and D15S165, OCA2 (Oculocutaneous Albinism 2) is an extremely good candidate. Rare mutations in this gene cause Type 2 oculocutaneous albinism, and the mouse homologue is the Pink-eye dilution factor. The gene spans 344 kbp of sequence (0.34 cM), so finding causative variants by sequencing is daunting (these variants may not be coding variants).

Because of this, we have moved to fine mapping by allelic association analysis. We have genotyped 70 single nucleotide polymorphisms (SNPs) right through the gene.

>> undrop
>> keep blue brown eyecol rs989869 to rs7495174
>> describe snps

  1. How many SNPs are monomorphic? Why were they genotyped?

>> drop where monomorphic
>> hwe founders
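
For readers who want to see what a Hardy-Weinberg test computes for one biallelic SNP, here is a minimal Python sketch of the one degree of freedom chi-square version. It is not necessarily the test Sib-pair applies (an exact test may be used instead), and the genotype counts are made up.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for a biallelic SNP.

    Returns (chi-square statistic, P-value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected cells, which arise for monomorphic SNPs.
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return chisq, p_value


# Hypothetical founder genotype counts (AA, AB, BB).
print(hwe_chisq(25, 50, 25))   # counts exactly in HWE proportions
print(hwe_chisq(40, 20, 40))   # marked deficit of heterozygotes
```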

There is at least one SNP we should check. Since we are doing 58 statistical tests, the Bonferroni correction for multiple testing is to multiply the P-values by 58, so most of these SNPs are fine. Since we think that BEY3 is recessive for blue, we can perform a type of association-based homozygosity mapping.

>> hwe rs728404 ; # check the odd SNP incl parents and offspring
>> set iter 0 ; # a quick preliminary scan
>> homozygosity blue
>> keep blue rs3794604 to rs7495174
>> set iter 1000 ; # Obtain simulated P-values
>> homozygosity blue
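
The `set iter 1000` step requests simulation-based P-values. As a rough illustration of the idea, the Python sketch below drops genes through a single nuclear family under the null hypothesis and converts the simulated statistics into an empirical P-value. The statistic (number of homozygous children), the allele frequency and the add-one P-value convention are all assumptions made for the example; this is not Sib-pair's implementation.

```python
import random


def drop_genes(n_children, allele_freq, rng):
    """Simulate one nuclear family: sample founder alleles, then transmit."""
    def founder():
        return (int(rng.random() < allele_freq),
                int(rng.random() < allele_freq))

    mother, father = founder(), founder()
    # Each child receives one randomly chosen allele from each parent.
    children = [(rng.choice(mother), rng.choice(father))
                for _ in range(n_children)]
    return mother, father, children


def count_homozygotes(children):
    """Toy statistic: number of children homozygous for allele 1."""
    return sum(1 for child in children if child == (1, 1))


def empirical_p(observed, simulated):
    """Proportion of replicates at least as extreme (add-one convention)."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


rng = random.Random(2007)   # fixed seed so the run is repeatable
observed = 3                # hypothetical observed statistic
simulated = [count_homozygotes(drop_genes(4, 0.7, rng)[2])
             for _ in range(1000)]
print(empirical_p(observed, simulated))
```

Because every replicate keeps the same family structure, the null distribution reflects the relatedness of the individuals being compared.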

The simulation-based P-values are generated by gene dropping, and so are correct for the observed structure of the pedigrees. Let's now look at the penetrances of these SNPs:

>> table eyecol { $m } ; 
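
A genotype-by-phenotype table of this kind can be read as a set of penetrances, that is, the proportion of each genotype class showing the trait. A minimal Python sketch, using invented records rather than the course data:

```python
from collections import Counter


def penetrance_table(records):
    """Proportion of blue-eyed individuals within each genotype class.

    records: iterable of (genotype, is_blue) pairs.
    """
    totals = Counter()
    blue = Counter()
    for genotype, is_blue in records:
        totals[genotype] += 1
        if is_blue:
            blue[genotype] += 1
    return {g: blue[g] / totals[g] for g in sorted(totals)}


# Hypothetical genotypes at one SNP with eye colour coded as blue / not blue.
records = [
    ("A/A", True), ("A/A", True), ("A/A", True), ("A/A", False),
    ("A/G", True), ("A/G", False), ("A/G", False), ("A/G", False),
    ("G/G", False), ("G/G", False),
]

for genotype, proportion in penetrance_table(records).items():
    print(genotype, f"{proportion:.2f}")
```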

  1. Which SNP is most strongly associated with eye colour?

These look very impressive, but the next question is which of these SNPs is causative, and which are associated purely because of linkage disequilibrium with a causative variant. We need to look at the pattern of LD among these SNPs.

To obtain a nice graphical representation, we can turn to another program, Haploview. Here is the Haploview plot of LD across these SNPs. If we are happy looking at a matrix of disequilibrium coefficients, we can use:

>> disequilibrium 
>> disequilibrium all
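
The pairwise disequilibrium coefficients in such a matrix are usually some combination of D, D' and r-squared. A minimal Python sketch of their standard definitions for two biallelic SNPs follows; it assumes the two-locus haplotype frequency is already known (in practice it has to be estimated from unphased genotypes), and the frequencies are illustrative only.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(ld_measures(0.5, 0.5, 0.5))    # complete LD
print(ld_measures(0.25, 0.5, 0.5))   # linkage equilibrium
print(ld_measures(0.18, 0.2, 0.6))   # strong but incomplete LD
```

D' reaches 1 whenever at least one of the four haplotypes is absent, while r-squared reaches 1 only when the two SNPs carry the same information, which is why the two measures can disagree for SNPs with different allele frequencies.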

There is strong linkage disequilibrium among these SNPs, so we will carry out a haplotype-based association analysis to further resolve causation.

Before this, we will look at the results from some family based tests of association.

Family based association tests for OCA2 data

The main benefit of performing a family based test of association on the present dataset is to avoid discarding information. We have genotyped all the family members (so that we can look for genotyping errors and utilize the SNP markers for linkage), and "ordinary" association analysis would discard related individuals.

We need sophisticated methods of familial association analysis because other pigmentation genes also segregate through the family, and so there may be a residual familial correlation not captured by the measured SNPs. If not allowed for, this gives rise to false positive associations.

Nevertheless, we will start with a relatively simple analysis again, the TDT:

>> tdt blue
>> tdt eyecol
>> schaid blue { rs11855019 rs6497268 rs7495174 }
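
For a biallelic marker and a binary trait, the allelic TDT reduces to a McNemar-type comparison of transmissions from heterozygous parents to affected children. A minimal Python sketch with invented transmission counts is given below; it illustrates the classical statistic, not Sib-pair's output format or any refinements it may apply.

```python
import math


def tdt(transmitted, untransmitted):
    """Allelic TDT from heterozygous parents of affected children.

    transmitted:   times the test allele was passed on
    untransmitted: times the other allele was passed on instead
    Returns (chi-square statistic, P-value on 1 degree of freedom).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        return 0.0, 1.0
    chisq = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return chisq, p_value


# Hypothetical counts: the allele is transmitted 30 times and not 10 times.
chisq, p_value = tdt(30, 10)
print(f"chi-square = {chisq:.1f}, P = {p_value:.4f}")
```

Only heterozygous parents contribute, which is what protects the test against population stratification.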

  1. Are these results consistent with those above?
  2. Schaid and Sommer suggested a genotypic TDT, which is implemented in Sib-pair. What does the pattern of transmission suggest?

We'll now export our data to the FBAT format, and look at the results from that test.

>> undrop; # get all the SNPs
>> drop D15S*
>> drop where monomorphic
>> select where eyecol^=x and anytyp
>> keep blue $m
>> write fbat bluefbat.ped
>> undrop
>> keep eyecol
>> standardize eyecol
>> write pheno bluefbat.phe

In FBAT, we try:

>> load pedigree bluefbat.ped
>> load phenotype bluefbat.phe
>> fbat -e
>> model g
>> fbat -e
>> model a
>> trait eyecol
>> fbat -e

  1. How do you interpret the results of the genotypic model?
  2. Are they in agreement with the Schaid and Sommer analysis?

FBAT haplotype analysis of OCA2

We'll now look at those three intron 1 SNPs showing the greatest association.

>> hapfreq rs11855019 rs6497268 rs7495174
>> hbat -e rs11855019 rs6497268 rs7495174
>> model r
>> hbat -e rs11855019 rs6497268 rs7495174
>> model a
>> trait eyecol
>> hbat -e rs11855019 rs6497268 rs7495174

  1. Are these results consistent with those above?
  2. Which haplotype is most strongly associated with eye colour?

MENDEL haplotype analysis of OCA2

We'll now run MENDEL on the same data. The first test we will look at will compare haplotypes. As usual, MENDEL is not quite as friendly as some of the other packages. It is fairly flexible, and offers analyses other packages do not.

  1. Prepare MENDEL format data files using Sib-pair
  2. Prepare a MENDEL job to set up haplotypes
  3. Prepare a MENDEL job to carry out haplotype association analysis
  4. Prepare a MENDEL job to carry out haplotype TDT-like analysis

>> include bluetwin.in
>> set loc bluecol qua
>> bluecol=blue
>> keep blue bluecol eyecol rs989869 to rs7495174
>> drop where monomorphic
>> write mendel bluemendel.ped
>> write locus mendel bluemendel.loc
>> write map mendel bluemendel.map
>> write var mendel bluemendel.var
$ echo "3" > bluemendel.snp
$ echo "rs749517rs649726rs118550" >> bluemendel.snp
>> quit

To write MENDEL scripts, the easiest approach is to look for the chapter in the manual that covers the task you wish to carry out, then find the corresponding script ("Control3a.in" etc) in the Example_Input directory. Copy that script and then edit it appropriately. Once you have the hang of this, you can write computer programs to automate these tasks for you.

Setting up haplotypes is in Chapter 18. We need to write a "SNP file" listing the SNPs to be combined into haplotypes. One shortcoming of MENDEL is that the haplotype can contain no more than four SNPs at a time. We can specify a sliding window approach, so that N-L+1 sets of SNPs will be combined (N=total, L=haplotype width). Another annoying limitation is that locus names are usually restricted to 8 characters -- fortunately Sib-pair truncates the locus names intelligently for us.
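
The sliding window idea can be sketched in a few lines of Python. This only enumerates which adjacent SNPs would be grouped; it does not write a MENDEL SNP file, whose exact layout should be taken from the manual. The SNP order shown is illustrative, and the width limit of four is the one quoted in the text. Counting contiguous windows directly gives N-L+1 sets for N SNPs and width L.

```python
MAX_WIDTH = 4  # haplotype width limit quoted in the text


def sliding_windows(snps, width):
    """Return every run of `width` adjacent SNPs, in map order."""
    if not 1 <= width <= MAX_WIDTH:
        raise ValueError(f"width must be between 1 and {MAX_WIDTH}")
    return [snps[i:i + width] for i in range(len(snps) - width + 1)]


# SNP names taken from this exercise; their order here is only illustrative.
snps = ["rs989869", "rs728404", "rs3794604",
        "rs11855019", "rs6497268", "rs7495174"]

for window in sliding_windows(snps, 3):
    print(" ".join(window))
```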

For our purposes today, we will just combine our 3 intron 1 SNPs together, as we did in FBAT. The script will look like makehaplo.con.

> mendel -c makehaplo.con

  1. Examine the new files. How does the locus file work?

Carrying out our association analysis is not much harder. Because we have pedigrees, we will use PENETRANCES (which is Chapter 14) to find a template.

  1. In passing, examine table 14.1 (page 131 of the manual). Which link function would one use for a count phenotype such as mole count?
  2. Chapter 15 of the manual describes how to generate an ethnic admixture estimate for each person in the pedigree. This index can then be included in the association analysis as a covariate. Would this be a good idea for analysis of this trait?

For a binary trait such as blue eye colour, we will use a Binomial distribution. The script will look like blueass.con. We have rewritten the map file (now blue2.map) to contain only those loci we wish to run today -- "030201" is the automatically created name of the haplotype.

  1. Why is bluecol the trait and not blue?
  2. If I wished to add sex as a covariate, what line would I add to this script (look at Page 132 of the manual)?
  3. Run the script. Are the results qualitatively different from the earlier results?
  4. Look at the example file Control8a.in. Change it so that the analysis is of eye colour versus our haplotype. You will need to change the FILES and AFFECTED. Run it. What is it giving you?

Our Eye Colour paper