Variance Components Analysis
Exercise 1 : serum ACE

by

David L. Duffy

(2007)

David L. Duffy, MBBS PhD.
QIMR Berghofer Medical Research Institute,
300 Herston Road,
Herston, Queensland 4029, Australia.
Email: davidD@qimr.edu.au

Version 1: 2007-01-06

CONTENTS

Angiotensin Converting Enzyme (ACE)

Angiotensin-I converting enzyme (ACE) hydrolyses angiotensin I to angiotensin II, (as well as inactivating bradykinin). It is present in two forms: bound to vascular endothelial cell membranes, or circulating in a free form.

In normal individuals, serum ACE levels can show as much as a 5-fold differences between individuals, but intra-individual variation is much less. Most circulating ACE derives from the lung vascular beds.

Although ACE is critical to control of blood pressure, serum levels do not reflect any disease states (except sarcoidosis).

OMIM Entry on ACE
Entrez Gene entry for ACE

Familial Correlations

We will examine data from two collections of families:

To determine if sACE level is under genetic control, we will examine the correlations between various classes of relatives.

Keavney et al Study of sACE levels

We'll start by using my program called Sib-pair. The most recent version of this program is called sib-pair on Unix or sib-pair.exe on Windows. First move to the directory where the ACE example datasets are, then start Sib-pair:


davidD@moonboom:~$ cd Example1
davidD@moonboom:~$ sib-pair
|||| SIB-PAIR: A program for simple genetic analysis
|\/| Version : F95-DEV (06-Jan-2007)
|/\| Author  : David L Duffy (c) 1995-2007
|||| Job run : Sat Jan  6 18:50:39 2007

>>

Another document usingsib-pair gives a gentle introduction to using Sib-pair, and a more detailed description is found in the manual. For this exercise we will use just a few commands.

The first of these is the include command, that reads a series of commands into Sib-pair from a file. The contents of the file (called twinace.in) are here. We'll have a look at them now.


>> include aceex.in

Have a preliminary look at the data:


>> write
>> head
>> tail
>> show
>> show pedigrees

The write printed the entire dataset -- all 553 records. The head and tail commands show just the first and last 10 records. The show command prints summary information about the dataset, and the show pedigrees command wrote one line of data for each of the 69 pedigrees.

>> tab
>> histogram sace

  1. What statistical distribution do you think best matches the distribution of serum ACE level in this sample?
  2. Have you heard of the Filliben correlation? If not, search the Sib-pair manual, or perhaps use Google to search the Internet. What do you think the P-value means?

The table command without any arguments prints a one line summary for each variable (sace and aceins). The histogram command gives a simple histogram of the quantitative trait.

>> describe sace

The describe command gives us more genetically oriented information about serum ACE level. The first table now divides the individuals in the study into pedigree founders and nonfounders.

------------------------------------------------
Summary statistics for trait "sace"
------------------------------------------------

Descriptive Stats       All     Founders  Nonfounders
-----------------------------------------------------
Means                0.0025       0.1164      -0.0283
Variances            0.9975       0.9585       1.0066
Stand Devs           0.9987       0.9790       1.0033
Maxima               3.0270       2.1870       3.0270
Minima              -2.2070      -2.0280      -2.2070
No. obs            404           86          318
No. missing        149           98           51

The second table gives us familial correlations. Sib-pair doesn't give us P-values or standard errors here. Approximately, the standard error of a correlation coefficient is the reciprocal of the square-root of the number of pairs. If a correlation is more than two standard errors greater than zero, we can say it is "significantly" different from zero (note that Sib-pair can be used as a calculator).

-------------- Familial correlations (pairwise) --------------
Rel 1   Rel 2    Std Dev 1    Std Dev 2   Correlation  N Pairs
--------------------------------------------------------------
Husband Wife         1.0658       1.0559       0.0689       32
Gparent Gchild       0.9015       1.1713       0.1775       58
Halfsib Hsib         0.7056                   -0.4541        7
Parent  Off          0.9956       1.0165       0.3371      303
Fullsib Fsib         0.9928                    0.3943      386

Father  Son          0.9832       1.0427       0.1671       67
Father  Dau          0.9241       0.9455       0.6356       64
Mother  Son          0.9726       0.9864       0.3311       85
Mother  Dau          1.0796       1.0801       0.3028       87

Brothers             0.8978                    0.3752       77
Sisters              1.0132                    0.3035      119
Brother-Sister       1.0137       0.9837       0.4398      190

  1. Why are there sometimes two standard deviations (the 3rd and 4th columns), and sometimes only one value?
  2. Are any of these correlations significantly greater than zero?
  3. What conclusion do you draw from the size of the marital correlation?
  4. What conclusion do you draw from the size of the halfsib correlation?
  5. Based on the fullsib correlation, what is your best guess for what the halfsib correlation should be?
  6. Do you think there are significant differences between the correlations according to sex?

The next command carries out a maximum likelihood based variance components analysis of serum ACE level.

>> varcom sace

  1. What should the estimates of the additive genetic variance and the environmental variance add up to?
  2. Based on this study, do you think serum ACE level is under genetic control?
  3. Is there a significant effect of dominance on this trait?

The effects of the ACE Indel on sACE levels

There is an insertion/deletion polymorphism (287 base pair Alu repeat) in intron 16 on the ACE gene that has been extensively studied over the past 14 years. A major focus of this work has been testing for an association between this polymorphism and a range of cardiovascular and neurological diseases.

The initial interest in this polymorphism was whether it was associated with serum ACE level. Since I have already used this as an example of a major gene, you will not be surprised by our results. But first, we will obtain some some descriptive results about this polymorphism, that can be used to check data integrity.

>> describe aceins

------------------------------------------------
Allele frequencies for locus "aceins"
------------------------------------------------
   Allele  Frequency   Count  Histogram
       1     0.5022      460  **********
       2     0.4978      456  **********

Number of alleles    =    2
Heterozygosity (Hu)  =    0.5005
Poly. Inf. Content   =    0.3750
4 Neff mu (SSMM)     =    1.78587527
Number persons typed =    458 ( 82.8%)

>> hwe

--------------------------------------------------
Hardy-Weinberg equilibrium for marker loci
--------------------------------------------------

Marker     Typed  Genos  Chi-square Asy P  Emp P  Iters
---------- ------ ------ ---------- ------ ------ ------
aceins        458      3        1.5 0.2619 0.2667     75 HWE .

>> set plevel 1
>> hwe
>> set plevel 0

  1. The plevel controls the amount of output printed by each command. What does the additional hwe output show you?
  2. Are there any unusual or inconsistent results?

We will temporarily increase the plevel again to look at the next results:

>> set ple 1
>> assoc sace genotypic
>> set ple 0

  1. What does the first test show?
  2. Work out the genetic variance due to this polymorphism, remembering that the formula for this looks like:
    VG=Proportion(1/1)*[mean(1/1)-{overall mean}]2 + ...
    (Don't forget the calculator function).
  3. Does this agree with results of the earlier variance components analysis?
  4. What is the predicted parent-offspring correlation based on your calculation?

I am interested by the last result. The Sib-pair varcom command can include the effects of the genotype within the analysis. I first create two new dummy variables that represent the ACE indel genotypes, and include them as covariates, that is as fixed effects. This means that the genetic variance due to this indel will be removed from the random effects part of the model:

>> set locus a1 qua
>> set locus a2 qua
>> a1 = (aceins=="1/2")
>> a2 = (aceins=="2/2")
>> varcom sace covariates a1 a2

  1. Why did I only create two dummy variables?
  2. What do the regression coefficients for these variables mean? Shouldn't they be the same as in our earlier analysis?
  3. What do the results show?
  4. How many QTLs for serum ACE level are there? Possibly a Pubmed search may be useful here. Here is one recent paper I found.

Twin Study of sACE levels

We'll now look at the results from the twin study.

>> clear; # clear the last dataset from program memory
>> include twinace.in

We will jump in and do a quick twin analysis of serum ACE level.

>> twin ace zyg == 1

  1. Are these in agreement with your understanding of the genetics of ACE level?
  2. Do they make sense?

Perhaps we should be more cautious.

>> hist ace
>> set locus sx aff; sx=male
>> kruskal ace sx
>> reg ace = sx

  1. Does this distribution look similar to that from the Keavney et al data?
  2. Are these data standardized?
  3. Are there any unusual values?

Perhaps we need to clean the data.

>> if (ace>600) then ace=x
>> twin ace zyg == 1
>> residuals ace = sx
>> twin ace zyg == 1

  1. Are these results more sensible?
  2. Are these results more compatible with the Keavney results?
  3. What differences are there?

A multipoint linkage analysis of sACE levels using the twins

We'll now use the Australian DZ twins (remembering that MZ twins don't contribute any linkage information) to carry out a multipoint linkage analysis using another program, MERLIN. Sib-pair can be used to prepare the necessary data files for MERLIN.

Here are further links:

Merlin also several useful ancillary programs, notably PEDSTATS and MERLIN-REGRESS.

The main disadvantage with MERLIN is the limitation on pedigree size. For most studies, however, this is not a problem. An slight advantage for the present problem is that MERLIN automatically deals with MZ twins in the dataset.

Returning now to our data, we will enter the Sib-pair commands to create the Merlin data files:

>> keep ace $m
>> write merlin example1.ped
>> write locus merlin example1.dat
>> write map merlin example1.map
>> quit

Open these files with an editor and confirm that they are the correct format.

We are now ready to run some of the different Merlin commands.

> pedstats -d example1.dat -p example1.ped
> merlin
> merlin --information -m example1.map -d example1.dat -p example1.ped
> merlin --error -m example1.map -d example1.dat -p example1.ped
> merlin --vc --pdf -m example1.map -d example1.dat -p example1.ped
> merlin --vc --start 90 --stop 120 --grid 1 --prefix hires -m example1.map -d example1.dat -p example1.ped

Examine the contents of "merlin.err", "merlin.pdf" and "hires.ped".

  1. Is there linkage evidence of a QTL for serum ACE level on chromosome 17?
  2. What is the strength of this evidence?
  3. What is your best estimate for the location of such a QTL?
  4. How wide is the credible interval for its location?
  5. What is the equivalent interval on the physical map?
  6. Where is the ACE gene? Is it within this interval?

We can use the Merlin simulate command to generate empirical P-values. These have the advantage of:

> merlin --simulate --save --rerun 5 -m example1.map -d example1.dat -p example1.ped
> merlin --vc  -m merlin-00123456-00001-replicate.map \
               -d merlin-00123456-00001-replicate.dat \
               -p merlin-00123456-00001-replicate.ped
> merlin --vc  -m merlin-00123456-00002-replicate.map \
               -d merlin-00123456-00002-replicate.dat \
               -p merlin-00123456-00002-replicate.ped
> merlin --vc  -m merlin-00123456-00003-replicate.map \
               -d merlin-00123456-00003-replicate.dat \
               -p merlin-00123456-00003-replicate.ped
> merlin --vc  -m merlin-00123456-00004-replicate.map \
               -d merlin-00123456-00004-replicate.dat \
               -p merlin-00123456-00004-replicate.ped
> time merlin --vc  -m merlin-00123456-00005-replicate.map \
                    -d merlin-00123456-00005-replicate.dat \
                    -p merlin-00123456-00005-replicate.ped

  1. How many times did you observe a lod score greater than the true maximum lod score for this chromosome?
  2. If I required 1000 replicates of a genome scan (23 chromosomes), how long would this take to run?
  3. What strategies could you use to reduce that time?

Merlin has a lot other options for linkage analysis of quantitative traits. Some of these are more experimental, and I would regard the variance components result as the gold standard to check these against. The advantage of these other methods is they are much faster than the Merlin VC implemention, and so make generating simulation based P-values more attractive.

> merlin --qtl -m example1.map -d example1.dat -p example1.ped
> merlin --deviates -m example1.map -d example1.dat -p example1.ped
> merlin-regress -m example1.map -d example1.dat -p example1.ped

Sib-pair Utility Functions

Sib-pair can be used, among other things, as a calculator:

>> 1/sqrt(303)
=> 0.0574484989621426
>>  0.25*(2-0.1)^2 + 0.50*(3-0.1)^2 + 0.25*(-1-0.1)^2
=>  5.41
>> sml 0.4978  -0.9038  0.0672   0.8285 

Have a look at the results of the "help Operators" command:

Basic Language Objects
0,1,0.23 Numbers
pi y n x Named constants
"<all>/<all>" Genotypes
Alleles are 1..999, A-Z, a-z
Operators
( ) ; Grouping and precedence
if <expr> then <expr> else <expr> Conditional Branching
* / + - ^ = Arithmetic operations
not and or Logical operation
< > >= <= ^= == Comparisons
ge le ne eq Comparisons
Mathematical Functions
neg pos abs sqrt log exp Basic functions
sin cos tan asin acos atan inht Trigonometric functions
int round Truncating functions
rand rnorm Uniform or Gaussian random numbers
Data Functions
istyp | untyp <mar> genotype available this person?
ishom | ishet <mar> hom/het genotype?
alla | allb <mar> first/second allele of genotype
anytyp alltyp numtyp available active markers? no. typed
commar max markers typed common to rels
isfou isnon founder/nonfounder?
female male male/female?
num nfoun family size/no. founders
famnum index family no. 1..n, person no. 1..n