Exercise 2: Nonparametric linkage and family based association


David L. Duffy


David L. Duffy, MBBS PhD.
QIMR Berghofer Medical Research Institute,
300 Herston Road,
Herston, Queensland 4029, Australia.
Email: davidD@qimr.edu.au

Version 1: 2007-01-10


Nonparametric Linkage Analysis using MERLIN

The "--npl" option in MERLIN is straightforward to use, and gives a Kong and Cox style lod score nonparametric analysis. The Kong and Cox exponential model is the default, and is thought to be more robust than the original linear model. As noted before, the exponential model is equivalent (at least for sib pairs) to a parametric recessive model where the lod score is maximized by changing the penetrances and risk allele frequency in a systematic way.
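When judging whether a lod score is "significant", it can help to convert it to an approximate pointwise p-value. The sketch below uses the usual large-sample argument for a one-parameter lod score whose parameter is constrained to be non-negative: 2 ln(10) x lod is treated as a 50:50 mixture of zero and a chi-square with one degree of freedom. The function name is ours, and this is a back-of-envelope check, not a description of how MERLIN computes the p-values it reports.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise p-value for a one-parameter lod score.

    Assumption: asymptotic theory applies, so 2*ln(10)*lod behaves like a
    50:50 mixture of 0 and a chi-square on 1 df.
    """
    lod = max(lod, 0.0)  # a constrained lod score cannot be negative
    chi2 = 2.0 * math.log(10.0) * lod
    # Upper tail of a chi-square(1) at x is erfc(sqrt(x / 2)); halve it
    # because the test is one-sided.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    for lod in (0.5, 1.0, 2.0, 3.0):
        print(f"lod {lod:.1f} -> approximate pointwise p = {lod_to_pointwise_p(lod):.2g}")
```

On this scale the traditional threshold of lod 3 corresponds to a pointwise p-value of about 0.0001.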

The Lander and Green multipoint algorithm limits the use of MERLIN to small and moderate sized pedigrees, but can handle large numbers of markers (including thousands of SNPs at once).
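To see why pedigree size is the limiting factor, recall that the Lander and Green algorithm works through inheritance vectors. A commonly quoted measure of pedigree complexity is 2n - f "bits", where n is the number of non-founders and f the number of founders, and the number of inheritance vectors doubles with every extra bit. The sketch below is only that arithmetic; the function names and the example pedigree (14 members, 4 of them founders) are hypothetical.

```python
def lander_green_bits(n_founders, n_nonfounders):
    """Pedigree complexity in bits, using the 2n - f rule of thumb."""
    if n_founders < 0 or n_nonfounders < 0:
        raise ValueError("counts must be non-negative")
    return 2 * n_nonfounders - n_founders


def inheritance_vectors(bits):
    """Number of inheritance vectors to consider at each marker."""
    return 2 ** max(bits, 0)


if __name__ == "__main__":
    # Hypothetical 14-member pedigree with 4 founders and 10 non-founders.
    bits = lander_green_bits(n_founders=4, n_nonfounders=10)
    print(f"{bits} bits, {inheritance_vectors(bits)} inheritance vectors")
```

Marker number, in contrast, enters the running time only roughly linearly, which is why thousands of SNPs are feasible.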

Linkage mapping of BRCA1

Hall et al [1990] present densely affected breast cancer pedigrees. These pedigrees seem consistent with an autosomal dominant high penetrance risk locus (and see segregation analysis).

In breastex.in, we have marker information for one marker D17S74. The pedigrees contain an average of 14 members.

>> include breastex.in
>> show pedigrees

Because the Sib-pair macros are set up, it is trivial to now perform a linkage analysis:

>> merlin --npl

MERLIN reports that some families are larger than the default maximum size for analysis, and have been excluded. We can force their inclusion using the "--bits" option (and because this is a two-point analysis, it still runs quickly).

>> merlin --bits 34 --npl --perfamily

  1. Does this nonparametric linkage analysis find significant evidence of linkage?
  2. Examine the output in merlin.lod. What do you notice?
  3. What result do you get rerunning the analysis using the "--pairs" option?


>> write locus linkage hall.loc
>> write linkage hall.pre

  1. What format file have we written?
  2. What model have we written? Should we change any of the values?
  3. What size parametric lod score can be obtained for these pedigrees?
  4. How do these compare to the NPL lod scores?

Sib-pair TDT analysis of LINCL

A major advantage of association analysis over linkage analysis is power: it is able to detect trait loci of far smaller effect size than a linkage study, provided that appropriate markers are used.

Linkage disequilibrium acts over much shorter distances than linkage, so this means localization is more precise, but more markers are required.

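The main benefit of performing a family based test of association is robustness to population stratification: alleles transmitted to affected offspring are compared with the untransmitted alleles of the same parents, not with unrelated controls.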

One of the Finnish Disease Heritage genetic diseases is Finnish variant LINCL (late infantile neuronal ceroid lipofuscinosis, CLN5), occurring at rates of 1 in 1500 live births in Western Finland. Savukoski et al [1994] describe fine mapping of this locus, and pedigree data from that paper is in linclex.in.

Given the population history underlying the FDH, we would expect most of the disease cases will carry the same mutation that arose 16-20 generations earlier in a single founder. If we could reconstruct that deep pedigree, a linkage analysis would have great power, and the large number of recombination events would guarantee a narrow critical interval.

The linkage disequilibrium between the disease locus and nearby marker loci carries some of the information about that deep pedigree -- this is the link between linkage analysis and association analysis.
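A small calculation shows how much of that founder association should survive. Under the textbook model, the disequilibrium between two loci separated by recombination fraction theta shrinks by a factor (1 - theta) each generation. The sketch below applies that formula over the 16-20 generations mentioned above; the recombination fractions are illustrative values, not estimates for the CLN5 region, and the function name is ours.

```python
def ld_retained(theta, generations):
    """Fraction of the initial disequilibrium left after some generations.

    Assumes the standard decay model D_t = D_0 * (1 - theta) ** t, ignoring
    drift, selection and new mutation.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - theta) ** generations


if __name__ == "__main__":
    for theta in (0.001, 0.01, 0.05, 0.20):  # illustrative values only
        kept = [ld_retained(theta, g) for g in (16, 20)]
        print(f"theta = {theta:<5}: {kept[0]:.2f} (16 gen), {kept[1]:.2f} (20 gen)")
```

Markers very close to the disease locus keep most of the association, while those further away lose it, which is what makes disequilibrium useful for fine mapping.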

>> clear; # Clear the program memory
>> include linclex.in
>> show pedigrees
>> head
>> count lincl
>> describe
>> davie lincl

The families as included here do not look like they will be hugely informative for linkage analysis.

  1. Why did I say that?
  2. What mode of inheritance are these families consistent with?

>> merlin --npl --model lincl.model

We could increase the power of the linkage analysis if we correctly modelled any linkage disequilibrium (this can be done in ILINK). But we will instead try a simpler test first (the transmission disequilibrium test, TDT).
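For a marker allele of interest, the classical TDT is just a McNemar test on heterozygous parents: if b parents transmit the allele to an affected child and c do not, the statistic (b - c)^2 / (b + c) is approximately chi-square with one degree of freedom under the null hypothesis. The sketch below is a minimal stand-alone version of that calculation, with made-up counts; it is not the Sib-pair implementation, which also handles multiallelic markers and offers other options.

```python
import math


def tdt(transmitted, untransmitted):
    """Biallelic TDT from heterozygous-parent transmission counts.

    Returns (chi-square on 1 df, asymptotic p-value).
    """
    b, c = transmitted, untransmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need at least one informative transmission")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square(1) at x is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 20 transmissions versus 8 non-transmissions.
    chi2, p = tdt(20, 8)
    print(f"TDT chi-square = {chi2:.2f}, p = {p:.3f}")
```

With small counts the asymptotic p-value is only a rough guide, and an exact binomial or permutation p-value is preferable.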

>> disequilibrium all
>> tdt oneperfam
>> set plevel 1
>> tdt oneperfam
>> set plevel 0

  1. What is the most likely location of CLN5?
  2. What happens if we analyse the trait lincl instead of oneperfam?

FBAT analysis of LINCL

A deficiency of the traditional TDT is that it requires both parents to be genotyped. The missing parental alleles can be inferred in only a subset of families, and relying on this subset biases the TDT statistic.

It is possible to correct for this ascertainment bias in a fashion that does not rely on estimates of the population allele frequencies for that marker. This is implemented in the FBAT (and PBAT) programs.
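The idea behind this kind of test can be sketched for the simplest case: one affected child per family, both parents genotyped, a biallelic marker, and additive coding. The child's allele count is compared with its expectation under Mendelian transmission given the parental genotypes, and the differences are summed over families. The code below is an illustrative toy with names of our own choosing, not FBAT's implementation; in particular it does not cover the missing-parent case, where FBAT conditions on a sufficient statistic for the parental genotypes instead.

```python
from itertools import product
from math import sqrt


def offspring_distribution(father, mother, allele="A"):
    """Mendelian distribution of the child's count of `allele`."""
    dist = {}
    for paternal, maternal in product(father, mother):
        count = (paternal == allele) + (maternal == allele)
        dist[count] = dist.get(count, 0.0) + 0.25
    return dist


def family_based_z(trios, allele="A"):
    """Z statistic summed over (father, mother, affected child) trios."""
    u = 0.0  # sum of observed minus expected allele counts
    v = 0.0  # sum of conditional variances
    for father, mother, child in trios:
        dist = offspring_distribution(father, mother, allele)
        mean = sum(x * p for x, p in dist.items())
        var = sum((x - mean) ** 2 * p for x, p in dist.items())
        observed = sum(a == allele for a in child)
        u += observed - mean
        v += var
    if v == 0.0:
        raise ValueError("no informative families")
    return u / sqrt(v)


if __name__ == "__main__":
    # Hypothetical data: 8 trios, heterozygous father, homozygous mother,
    # and the A allele transmitted to the affected child every time.
    trios = [(("A", "a"), ("a", "a"), ("A", "a"))] * 8
    z = family_based_z(trios)
    print(f"Z = {z:.3f}, Z squared = {z * z:.3f}")
```

In this complete-trio setting the squared statistic coincides with the TDT chi-square (here 8 transmissions versus 0), and families with two homozygous parents contribute nothing.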

FBAT also offers tests of haplotype transmission, and can perform multitrait analyses.

>> keep lincl $m
>> write fbat fbatlincl.ped
>> quit
> fbat

[FBAT startup banner]
Xin Xu C1999-2006 v1.7.2
Program for Population Genetics
Harvard School of Public Health

>> ?

So FBAT is another command-line driven program. The ? command lists the available commands, and documentation is distributed with the program. We will carry out simple analyses first.

>> load ped fbatlincl.ped
>> afreq
>> fbat
>> fbat -e
>> mode m
>> fbat -e
>> minsize 2
>> fbat -e

  1. What is the most likely location of CLN5?
  2. What does the mode command do?
  3. What does the minsize command do?
  4. Why would minsize normally be set to 10? Why is it OK to change it for this example?
  5. What is the difference in results between fbat and fbat -e?

>> hapfreq
>> mode b
>> hbat -e

  1. Does the haplotype based analysis refine the most likely location?

The FBAT tutorial datasets

FBAT comes with the beginnings of a tutorial, which analyses three moderately large real datasets.