||||
|\/|
|/\|
||||

Using the SIB-PAIR program
to analyse genetic data

by

David L. Duffy

(2012)

David L. Duffy, MBBS PhD.
Queensland Institute of Medical Research,
300 Herston Road,
Herston, Queensland 4029, Australia.
Email: davidD@qimr.edu.au

Last Updated: 2012-09-11

ACKNOWLEDGMENTS

Gabriella Duffy has done a lot of editing and improvement of this document.

CONTENTS

INTRODUCTION

Sib-pair is a computer program for the manipulation and statistical analysis of genetic datasets. It implements a simple interpreted language in which the user writes commands. These can be entered interactively, or submitted as a batch from a text file in the usual way.

I have developed the program over a number of years in the open software way of "scratching an itch". That is, Sib-pair carries out practical procedures which I have required in the day-to-day handling of genetic data, and were not available using other computer programs and/or were interesting as a research or educational question.

These include:

There are also a variety of standard statistical procedures, such as multiple regression analysis (including unconditional and conditional logistic regression), principal components analysis, contingency table analysis, survival analysis, random number generators, quantiles for assorted statistical distributions.

Finally, there are a few utilities that give expectations for specified genetic models eg recurrence risks for a single major locus model, and can carry out file manipulations.

In this document, I try to give a gentle introduction to using Sib-pair. The other sources of information about Sib-pair are the manual, the extended help pages for each command, and reading the Fortran 95 source code (it does have some comments!).

A very first session

Sib-pair is not difficult to use. In a Windows environment, double-click the icon or open a DOS box in your working directory (folder) and call Sib-pair from the command line. The latter assures that Sib-pair will find your pedigree files and write output in the same directory. In a Unix environment, run Sib-pair from the command line.

davidD@moonboom:~$ sib-pair 
|||| SIB-PAIR: A program for simple genetic analysis
|\/| Version : Version 1.00.beta g95 (17-Aug-2012)
|/\| Author  : David L Duffy (c) 1995-2012
|||| Job run : Tue Sep 11 12:15:00 2012 (orpheus.qimr.edu.au)

Type "help" for help, "quit" to quit, "ctrl-C" to interrupt.


>>

You see much the same on Windows.

DOS box session

Now type in "help" on the command line (followed by <ENTER>):

>> help
Keywords can be shortened to the first 3 letters.
Help prints a brief description of a command or group of commands:

  help [ <search string> | All | Globals | Operators | Data | Analysis | Examples]

For full online help:

  $ your_favourite_HTML_browser sib-pair.html


Now try "help Examples"

Entering "help All" gives a long list of all the commands (see Appendix).

To list just the commands mentioning "merlin":

>> help merlin
rea loc mer <fil> {read Merlin locus file}
wri mer [dum] <fil> {write Merlin pedigree file}
wri loc mer <fil> {write Merlin locus file}
wri map mer <fil> {write Merlin map file}

The latter three commands can produce the files needed to run a multipoint linkage analysis in MERLIN.

Here is the list of commands mentioning "risk". Then we will try the "sml" (short for single major locus) command.

>> help risk
sml <pA> <penAA> <penAB> <penBB> {recurrence risks}
grr <prev> <pA> <GRR> [<add|dom|rec>] {recurrence risks}
hrr <tra> [<c_op> <thr>] {haplotype relative risk}

>> sml 0.17 0.21 0.08 0.03

------------------------------------------------
Single Major Locus Recurrence Risk Calculation
------------------------------------------------

 Frequency(A): 0.170000; Pen(AA): 0.210; Pen(AB): 0.080; Pen(BB): 0.030
 Trait Prev  : 0.049312; Pop AR:  39.2%; Var(Add): 0.001141; Var(Dom): 0.000127

 Measure      MZ Twin       Sib-Sib        Par-Off      Second    
----------   ----------    ----------     ---------    ----------
 Rec Risk         0.075         0.062         0.061         0.055
 Rel Risk         1.564         1.264         1.250         1.124
 Odds Rat         1.610         1.281         1.266         1.131
 PRR              1.522         1.248         1.235         1.117
 Tet Corr         0.106         0.054         0.051         0.026
 ibd|A-A          1.000         0.552         0.500         0.276
 ibd|A-U          1.000         0.497         0.500         0.248

 Freq of A if Affected: 0.351983 (0.123,0.458,0.419) 
 Freq of A if Unaffctd: 0.160561 (0.024,0.273,0.703)

 Mating       Proportion    Risk to offspring
----------   -----------   ------------------ 
UnA x UnA         0.904         0.048
Aff x UnA         0.094         0.060
Aff x Aff         0.002         0.075

The "sml" results show that a gene with the given allele frequency and penetrances (APOE*E4 and Alzheimer's disease by age 75) give rise to a recurrence risk of 6.2% in siblings of cases, a 1.3-fold increase in risk compared to the baseline population risk of 4.9%. The frequency of the risk allele (here APOE*4) is doubled in cases when compared to controls, and IBD sharing (Identity-by-Descent) in affected sib pairs is expected to be 55%.

Simultaneously pressing the combination of the "Ctrl" and "c" keys interrupts whatever calculation Sib-pair is currently doing and returns you back to the Sib-pair command prompt. Usually Sib-pair will first try to complete a part of the calculation and give a result, so there may be a small delay. Pressing "Ctrl-C" multiple times causes the Sib-pair to stop completely, and returns you to the operating system.

Two last things:

>> 1+1 ; 2+2
=>  2.
=>  4.

>> quit

This job took  28.1 minutes

A genetic analysis from scratch

In this case, I will go through the analysis of some pedigree data from beginning to end. I won't describe the commands in too much detail, as they will be covered later.

Usually, I have:

  1. A description of the pedigree, such as a pedigree diagram
  2. Trait information about individuals in the pedigree, such as disease status, or age or body mass index.
  3. Genotype information about individuals in the pedigree, which will usually be for codominant loci, but can be mitochondrial or Y-chromosomal.

Here is a diagram of a pedigree described in a paper by Lawler and Sandler (1954) where members are affected by the condition familial elliptocytosis.

The diagram in Morton (1956):

Familial elliptocytosis pedigree diagram

And here is the same diagram amended to include some additional necessary family members, and recoding the alleles to 1, 2, and 3 (Sib-pair only allows single letter or numerical allele codes):

                             
                               [101]   (102)
                                 |       |
                                 +---+---+
                                     |
              +----------------------+-----------------------------------------------+
              |                                                                      |
      [201] (202)                                                            (203) [204]
        |     |                                                                |     |
        +--+--+                                                                +--+--+
           |                                                                      |
  +--------+--+------------+-----------+----------------------+             +-----+-----+
  |           |            |           |                      |             |     |     |
(301) [302] (303)  [304] [305]       (306) [307]      [308] (309) [311]   (310) (312) [314]
 Aff   UnA   Aff     ?    UnA         Aff   UnA         ?    Aff   UnA     Aff   UnA   Aff
 1 3   x x   1 3    x x   3 3         1 3   x x        x x   1 3   x x     1 2   2 3   3 3
  |     |     |      |                 |     |          |     |     |       |
  +--+--+     +--+---+                 +--+--+          +----++     +------++
     |           |                        |                  |             |
     |     +-----+-----+-----+           ++----+-----+     +-+-------+     |
     |     |     |     |     |           |     |     |     |         |     |
   [405] [407] [408] [409  [410] [412] (411) [413] (414) [415](417)(418) (419)
    UnA   UnA   Aff   Aff   Aff    ?    Aff   Aff   UnA   Aff   ?   Aff   UnA
    3 3   2 3   1 1   1 2   1 1   x x   1 1   1 3   3 3   1 3  x x  1 3   2 2
                              |    |                            |    |    
                              +--+-+                            +--+-+    
                                 |                                 |    
                                 |                           +-----+-----+
                                 |                           |     |     |
                               (504)                       [505] [506] (507)
                                Aff                         Aff   Aff   Aff
                                1 3                         1 1   1 1   1 1

Note every pedigree member has been given a unique identifier. In the later generations, there is information as to whether individuals are affected or unaffected (by elliptocytosis), and their genotype at the Rhesus blood group locus. Each genotype is given as the two alleles.

From this, I will make a single plain text file that contains all this information in the form that Sib-pair reads. The pedigree is written as one line per pedigree member. Each line contains the same number of items separated by spaces. The columns are:

ColumnName
1 Pedigree name
2 Individual identifier
3 Father's identifier
4 Mother's identifier
5 Sex (m or f)
6 Affected or unaffected by elliptocytosis (y or n)
7 First allele of Rhesus genotype (1, 2 or 3)
8 Second allele of Rhesus genotype (1, 2 or 3)

Because columns are "free format" white-space separated, if a value is missing, there must be a placeholder to identify the column (Sib-pair uses an "x" for this purpose).

I use the gvim text editor, but you can use NotePad or any other editor. We open a new document. Now we go through the pedigree diagram person by person, adding one line to our document for each person. The first record (for person 101) will be:

ped5 101 x x m x x x

The parent identifiers are set to "x" because they are not given. Furthermore the trait status and genotype are also unknown. Where sex is not given in the diagram, we will assign them so that the marriages in the drawing are consistent.

The first "nonfounder" (that is a person with parents included in the diagram) is written as:

ped5 202 101 102 f x x x

Skipping forward, the last pedigree record will be:

ped5 507 417 418 f y 1 1

The pedigree columns now need to be identified in such a fashion that Sib-pair can recognize them. At the start of the file, I first prepend a comment, so I know what this file does. Comment lines are ignored by Sib-pair and are prefixed by a "#" at the start of the line:

#
# Elliptocytosis pedigree 5 from Lawler & Sandler 1954, Morton 1956
# Zmax of 3.31 at c=0.05
#

Following this, we declare the columns. The first five columns are standard and so don't need to be named. The sixth column is disease state, so the first line of the declarations (using the "set locus" command) is:

set locus ellipto affection

The seventh and eight columns represent the alleles of the genotype, so the second declaration line is:

set locus rhesus marker

To show where the pedigree data starts, we add a line:

read pedigree inline

The end of the pedigree data is marked by ";;;;" on its own line. After this we add the "run" command, which tells Sib-pair to process the pedigree. We save the completed script to a file "ellipto5ex.in". Its contents looks like this.

So we can start up Sib-pair, and "include" the contents of this file into our session:

>> include ellipto5ex.in

If you got the message

>> include ellipto5ex.in

ERROR:  Unable to open "ellipto5ex.in".

Then you are not in the correct directory (folder). The "pwd" command in Sib-pair shows you which directory you are in, and can be used to change directory as well. Alternatively, you can try again, using just

>> include

which will give you some type of file chooser menu. This will allow you to navigate to the correct directory and include the file. You should now get:

Reading commands from "ellipto5ex.in".
!
! Elliptocytosis pedigree 5 from Lawler & Sandler 1954, Morton 1956
! Zmax of 3.31 at c=0.05
!
Pedigree file        = inline.ped   
Number of loci       =     2

Locus           Type Position
--------------- ---- ----------------
ellipto          a         6
rhesus           m         7--      8

Number of marker loci=       1
Bonferroni corr. 5%  =    0.050000
Bonferroni corr. 1%  =    0.010000
Bonferroni corr. 0.1%=    0.001000

 Max record length is  25  characters
 Screened  1  pedigrees,  36  records (0.00 s).
 Read in   1  pedigrees,  36  individuals (0.01 s).
 Dataset occupies 0.002 Mb.

No sex-informative markers.
Nuclear family error checking.

Nuclear family error checking completed.

Number of data problems    =     0
Starting values for missing genotypes generated.


Total number of pedigrees  =       1
Number with only 1 member  =       0
Total number of sibships   =      10
Total number of subjects   =      36
Total subjects genotyped   =      23 (63.9%)
Total number of genotypes  =      23
Largest pedigree (members) =      36

Mean size of pedigrees     =      36.0

Closing include file "ellipto5ex.in".

>>

So I have successfully read in the pedigree data, and there are no messages about errors of various types. The "describe" command tells me about the phenotype (ellipto) and genotypes (rhesus):

>> describe

------------------------------------------------
Segregation ratios for trait "ellipto   "
------------------------------------------------

Total sample   All       Fndrs     Nonfndrs
-------------------------------------------
   Aff/Tot    17/  23    0/   0   17/  23
   Prop Aff     0.739     0.000     0.739
   Missing         13        11         2

Mating Type     UxU       UxA       AxA
-------------------------------------------
   Matings          0         0         0
   Aff/Tot     0/   0    0/   0    0/   0
   Prop Aff     0.000     0.000     0.000

Relative pair   RecRisk     Aff-Aff     Aff-UnA  Tetrachoric r
-----------------------------------------------  -------------
   Marital        0.000           0           0        0.000
   Gparent        1.000           4           0        1.000
   Halfsib        0.000           0           0        0.000
   Par-Off        0.846          11           4        0.629
   Fullsib        0.732          15          11        0.000
   MZ twin        0.000           0           0        0.000

------------------------------------------------
Allele frequencies for locus "rhesus"
------------------------------------------------
   Allele  Frequency   Count  Histogram
       1     0.4783       22  **********
       2     0.1304        6  ***
       3     0.3913       18  ********

Number of alleles    =    3
Heterozygosity (Hu)  =    0.6145
Poly. Inf. Content   =    0.5181
4 Neff mu (SSMM)     =    2.44529900
Number persons typed =     23 ( 63.9%)

I can see that 17 pedigree members are affected by the condition, and that there are 23 individuals genotyped. The "hwe" command doesn't suggest marker problems.

>> hwe

--------------------------------------------------
Hardy-Weinberg equilibrium for marker loci
--------------------------------------------------

Marker         Typed  Genos  Chi-square Asy P  Emp P  Iters
-------------- ------ ------ ---------- ------ ------ ------
rhesus             23      6        1.2 0.7496 0.8696     23 HWE .

Is there genetic linkage between familial elliptocytosis and Rhesus blood group? Sib-pair offers a nonparametric identity-by-descent test of linkage ("apm"):

>> apm ellipto ibd

------------------------------------------------
APM for trait "   ellipto" v. all markers
------------------------------------------------

NOTE:  Identity-by-descent based statistic used.


Marker         NFams  NAff   Z-value    Asy P  Emp P  Iters
-------------- ------ ------ ---------- ------ ------ ------
    rhesus          1     17        1.7 0.0443 0.0700    200 APM-IBD +
    rhesus          1     17        4.1 0.0000 0.0050    200 GPM-IBD ***

There are results from two tests here, one using affected pedigree members only (APM), and the other including information from unaffecteds as well (GPM). The latter is more impressive, with a Z-score of 4.1 being the equivalent of a lod score of

>> 4.1^2 / (2*log(10))
=>  3.650245120396831

This score seems a bit high perhaps (I know that the parametric lod score is only 3.3). Like many Sib-pair tests, this test is simulation based, so it is worthwhile increasing the number of simulations used to refine the estimates:

>> set iterations 10000
>> apm ellipto ibd

------------------------------------------------
APM for trait "   ellipto" v. all markers
------------------------------------------------

NOTE:  Identity-by-descent based statistic used.

Marker         NFams  NAff   Z-value    Asy P  Emp P  Iters
-------------- ------ ------ ---------- ------ ------ ------
    rhesus          1     17        1.4 0.0757 0.0852  10000 APM-IBD +
    rhesus          1     17        4.3 0.0000 0.0024  10000 GPM-IBD ***

So, elliptocytosis seems to be linked to this marker, but the empirical P-value is equivalent to a lod of 2. We can look at some other linkage tests, including the original Penrose sib-pair linkage test:

>> pen ellipto rhesus

---------------------------------------------------------------
Penrose Sib Pair Linkage Analysis for "ellipto" v. "rhesus"
---------------------------------------------------------------

                     rhesus
ellipto    Concordant  Discordant
Concordant         11           4
Discordant          0          11

No. of sib pairs   =    26
No. of sibships    =     6

    No. complete observations =    26
    LR contingency chi-square =  18.03
           Degrees of freedom =   1
           Asymptotic P-value =   0.0000
              Empiric P-value =   0.0001 ( 2600000 MCMC iterations)
             Trend chi-square =  13.44   (P=0.0002)
                    Agreement =   0.846  (   22/   26)
                Cohen's Kappa =   0.6994

We can also look at this using the assoc test:

>> set plevel 1
>> assoc ellipto

--------------------------------------------------
Allelic association testing for trait "ellipto"
--------------------------------------------------

  ---- Association Analysis for "rhesus    "------
    Allele  Affected    Unaffected    Total    Dev
  ------------------------------------------------
     1       22 (.647)      0 (.000)      22    8.1
     2        2 (.059)      4 (.333)       6    0.4
     3       10 (.294)      8 (.667)      18    2.9
  ------------------------------------------------
   Total     34            12             23

       No. trait(+) marker(-) =     0
       No. trait(+) marker(+) =    23
                Fis, Fit, Fst =   0.0818   0.4125   0.3602
   Contingency Pearson chi-sq =  16.0
   Nominal degrees of freedom =   2
              Nominal P-value =   0.0003
      Equalled or exceeded by =   1/10001 simulated values (0.0001)
 Mean (Var) simulated chi-sqs =   1.7 (   2.7)

 -------- Combined transmission test for "    rhesus" --------
   Allele   Affected   Unaffected   Total   E(Aff)    Z    P
 -------------------------------------------------------------
     1       13 (.59)      0 (.00)     13    7.9    2.8 0.0053
     2        2 (.09)      2 (.25)      4    2.4   -0.4 0.6653
     3        7 (.32)      6 (.75)     13   11.7   -2.6 0.0094
 -------------------------------------------------------------
   Total     22            8           30

                    marker(-) =     1
       No. trait(+) marker(+) =    15
          No. useful sibships =     4
 Global association statistic =   2.5
           Degrees of freedom =   2
      Equalled or exceeded by =  20/ 1019 simulated values (0.0196)
 Mean (Var) simulated chi-sqs =   0.5 (   0.3)

Because the assoc empirical P-value is generated by gene dropping, in a single pedigree like this it gives a valid test of cosegregation of alleles at the Rhesus locus with elliptocytosis. The "1" allele was never seen in an unaffected individual. The FBAT (combined transmission test) confirms children with elliptocytosis received the "1" allele more often than expected.

Now I'll use Sib-pair to output the files that Superlink requires for a parametric linkage analysis. I'll model elliptocytosis as due to a rare fully penetrant dominant locus:

>> set sml 0.0001 1 1 0
SML model: P(A)=0.000100 Pen(AA)=1.000 Pen(AB)=1.000 Pen(BB)=0.000

>> write locus linkage el.loc

Writing LINKAGE type locus file: el.loc

>> write ppd el.ppd

Writing post-Makeped Linkage style pedigree file: el.ppd

I'll then change a couple of parameters (the starting recombination distance between ellipto and rhesus, and the evaluation step size) in the locus file "el.loc" using my favourite text editor. Then I'll run Superlink (for this to work you must have already installed Superlink into your directory path):

>> $ gvim el.loc
>> $ superlink el.loc el.ppd

....
                -6.391669        -51.787565     3.265655
                -5.268026        -51.752765     3.280768
                -4.169080        -51.760457     3.277427
                -3.093770        -51.834167     3.245415
                -2.041100        -52.026618     3.161835
                -1.010135        -52.505211     2.953985
   rhesus
                0.000000         -56.912210     1.040050
--------------------------------------------------------------
  MAX           -5.268026        -51.752765     3.280768
--------------------------------------------------------------

The maximum lod is 3.28 at 5.3 cM distant from the marker. The Rh-linked elliptocytosis gene (EPB41) is now known to lie at 29.2 Mbp from the 1p telomere (1p33-p32), and the Rhesus complex of genes at about 25.5 Mbp. They are separated by approximately 3.9 cM on the Rutgers linkage map.

We can actually fit the equivalent model in Sib-pair by creating a marker to represent elliptocytosis. and using the lod command.

>> set locus traitlocus marker
>> if (ellipto) then traitlocus="1/2"
>> if (not ellipto) then traitlocus="1/1"
>> set freq traitlocus 0.9999 0.0001

NOTE:  The marker "traitlocus" has prespecified allele frequencies:
       1=0.9999 2=0.0001

>> lod traitlocus rhesus

------------------------------------
Two-point lod score linkage analysis
------------------------------------

NOTE:  Population allele frequencies for "traitlocus" are prespecified as:
       1=0.9999 2=0.0001

"traitlocus" (2 alleles) v. "rhesus" (3 alleles).

 LogLikelihood   LOD       Theta
--------------  ---------  -------
    -59.3092        0.000   0.5000
    -56.2217        1.341   0.0001
    -52.5057        2.955   0.0100
    -51.9104        3.213   0.0250
    -51.7533        3.282   0.0500
    -51.8939        3.220   0.0750
    -52.1640        3.103   0.1000
    -52.9107        2.779   0.1500
    -53.8264        2.381   0.2000
    -55.9736        1.449   0.3000
    -58.2181        0.474   0.4000

>> quit

This job took   8.8 minutes

LANGUAGE ELEMENTS

There are over two hundred different commands in Sib-pair now (see Appendix). The language has an irregular grammar (like all good computer languages), with a mixture of three sorts of statements:

These act upon two types of data:

Data

A scalar constant is either a number or a genotype value. A genotype is made up of two alleles separated by a forward slash and enclosed in a set of double quotes. Each allele can take the values 1...999 or a single letter A...Z and a...z (noting that "x" is reserved as a missing allele).


>> 10
=>  10.
>> "a/b"
=>    a/b   

There are a few named constants:


>> pi
=>  3.14159265359
>> y
=> 1.
>> n
=> 0.
>> x
=> MISS
>> .
=> MISS

A variable is a column or vector of values in the dataset, and is identified by a unique variable name. The dataset is a rectangular matrix of numbers or genotypes (stored in computer memory), each row representing data for an individual. A variable can be of eight types:

marker autosomal genotype
xmarker X-chromosome genotype
ymarker Y genotype
mitchondrialmitochondrial
haploid Y or mitochondrial genotype
affection dichotomous trait value
categorical categorical trait value
quantitativecontinuous trait value

The name and column location of a variable is declared by using the set locus command. Usually the values for a variable will be read into Sib-pair from a file (see below). Each row of the dataset is associated with a pedigree and individual ID, IDs of parents of the individual, sex of the individual.

The "head" command prints out the first 10 values of each variable. One can get an equivalent result using the "print" command:


>> head

!
!               S                                               p
!               e A                                             r
!   Per Fat Mot x D   onset      age    D14S52  D14S43  D14S53  o
!
AM  101 x   x   m y   43.0000     x       x/x     x/x     x/x   n
AM  102 x   x   f n   77.0000   77.0000  83/87  183/183 151/151 n
AM  203 x   x   f n     x         x      83/93  171/181 151/157 n
AM  201 101 102 m y   41.0000     x       x/x     x/x     x/x   y
AM  202 101 102 m y   43.0000     x       x/x     x/x     x/x   n
AM  204 101 102 m n   63.0000   63.0000   x/x     x/x     x/x   n
AM  205 101 102 m y   46.0000     x      83/87  183/183 151/155 n
AM  206 101 102 m y   41.0000     x      83/87  183/183 151/155 n
AM  207 101 102 m n   52.0000   52.0000   x/x     x/x     x/x   n
AM  208 101 102 m y   41.0000     x      83/87  183/183 151/155 n

>> set print observed
>> print index <= 10

Print where "index < = 10":

ped=AM id=101 fa=x mo=x sex=m AD=y onset=43.0000 proband=n
ped=AM id=102 fa=x mo=x sex=f AD=n onset=77.0000 age=77.0000 D14S52=83/87 D14S43=183/183 D14S53=151/151 proband=n
ped=AM id=203 fa=x mo=x sex=f AD=n D14S52=83/93 D14S43=171/181 D14S53=151/157 proband=n
ped=AM id=201 fa=101 mo=102 sex=m AD=y onset=41.0000 proband=y
ped=AM id=202 fa=101 mo=102 sex=m AD=y onset=43.0000 proband=n
ped=AM id=204 fa=101 mo=102 sex=m AD=n onset=63.0000 age=63.0000 proband=n
ped=AM id=205 fa=101 mo=102 sex=m AD=y onset=46.0000 D14S52=83/87 D14S43=183/183 D14S53=151/155 proband=n
ped=AM id=206 fa=101 mo=102 sex=m AD=y onset=41.0000 D14S52=83/87 D14S43=183/183 D14S53=151/155 proband=n
ped=AM id=207 fa=101 mo=102 sex=m AD=n onset=52.0000 age=52.0000 proband=n
ped=AM id=208 fa=101 mo=102 sex=m AD=y onset=41.0000 D14S52=83/87 D14S43=183/183 D14S53=151/155 proband=n


There are no vector constants. One way to specify new values for an existing variable is to use the "edit" command


>> edit Smith John Chol to 6.85
Changing Smith--John at locus "Chol" from   5.5000 to   6.8500

The alternative is to read in new values from a file using the "update" command.

Quantitative traits can also be dates, taking the form of 8 digit integers (YYYYmmdd). These can be converted to and from Julian days (number of days since an epoch, such as 01-Jan-1970 - the usual computing epoch; 16-Nov-1858 - the "reduced" Julian day or 01-Jan-4713 BCE - standard Julian day).


>> date 19000101
Date: 19000101 =    -25567
>> date onset
Converting dates at "onset" from Gregorian to Julian (epoch="1970-01-01").

Commands

Entering a command in Sib-pair causes the program to apply an algorithm to data. This leads either to a manipulation of the variables in the dataset "in-place", or the printing of a result to the standard output.

One issues one command per statement. The statement will start with the command name, that can usually be shortened to the first three letters, followed by white-space separated positional parameters: either numerical constants, modifier keywords or variable names.


>> help Globals
>> chi 2 2
>> pchisq 3.84 1
>> set locus Chol quantitative
>> set locus age quantitative
>> read pedigree cholesterol.ped  
>> run
>> varcom Chol ae covariates age

These commands respectively printed out the on-line help summary for all global commands, performed a contingency chi-square analysis of a two-by-two table to be entered from the keyboard, printed the P-value for a specified chi-square value and degree of freedom, defined the names of two variables in the dataset for analysis, declared the file to read the dataset from, read that dataset into memory ("run"), and fitted a biometrical genetic linear mixed model to the "Chol" variable.

Algebraic expressions

The algebraic operations (which include if/then type statements) involve a set of standard operations applied either to constants, or to the vector of values for a given variable in the dataset. They return a value of the same size. If the result is a dataset variable, this results in that variable being updated, otherwise a result is written to the standard output.

A new variable in a dataset can be created by a "set locus" declaration if a dataset has already been read into memory by the "run" command.


>> 10^2 + sqrt 7
=>   102.64575131106459
>> "a/b" + 1
=> b/c
>> set locus logChol quantitative
>> logChol = log(Chol+1)
>> set locus hichol affection
>> if (male and Chol > 7) then hichol=y else hichol=n  

Note that because command names can be shortened, there can be a clash between commands and the name of a variable starting an algebraic expression. This can be avoided by enclosing the variable name in brackets, forcing it to be evaluated as (part of) an expression (or using the let command).


>> set loc var qua    
>> (var) = 1
# or equivalently
>> (var = 1)
# or equivalently
>> let var = 1

A sequence of expressions can be evaluated together, speeding up execution and extending usefulness of the if-then construction.

>> (a = index) (b=a^2-2*a+1)
>> if (b<0) then (a = b) (b=1/b)

Algebraic expressions cannot be passed directly to commands, so

 

>> var (log(Chol+1))       

does not work. However, a number of analysis commands allow simple comparison operators eg

 

>> tdt Chol > 6.5   
>> ass Group odd
 

are legal.

Macros

Macros (by definition) can contain any type of the last two types of command. Macro functions take positional parameters like analytic commands do.


>> macro add    
add> %1 + %2
add> ;;;;
>> add 2 2
=> 4

The macro declaration above consists of the macro keyword followed by the name of the new macro. The lines up until the final line (either ";;;;" or an empty line) are the body of the macro.

The macro parameters are represented by "%1", "%2", etc. When the macro is called, the "%1" in the body will be replaced by the first token (word, number) after the macro name, and so forth, and the new string will then be evaluated as either a command or algebraic expression.

In the example above, the newly defined add command adds together the first two arguments following the command name.

Macros can contain multiple commands and expressions.

# Comment lines are prefixed by "#" or "!"
# Draw a pedigree (need to have dot and gv):
# First set up macro
>> macro draw
draw> select pedigree %1
draw> write dot %%.dot %2 %3
draw> $ dot -Tps -o %%.ps %%.dot
draw> $ gv --scale=-1 %%.ps
draw> $ rm %%.dot %%.ps
draw> unselect
draw>
# Call macro
>> draw SNW AD D14S43
Example Pedigree Drawing

This macro first selects a pedigree or pedigrees named by the first argument of the macro (select temporarily subsets the dataset, and allows wild-card searches). It then calls the "write dot" command to create a file in the dot graphical language to draw the pedigree as a marriage node diagram. The circle (female) or square (male) representing an individual in the drawing will be shaded if that person is affected at the trait (the second argument to the macro). The "$" command allows two other programs to be called by Sib-pair, dot and gv (ghostview). Finally, the intermediate files are deleted, and all the other pedigrees returned to the list of active pedigrees.

The intermediate file names were created using the %% variable, which contains a (random) character string uniquely identifying the current macro call.

Macros can also be used as variables. The value of the macro is set using the macro command, but the contents are accessed using %<macro_name>". A macro variable can appear anywhere within a normal command.


>> macro a=1
>> macro b=+
>> %a%b%a
=> 2.
>> macro a=D14S52 D14S43
>> tab %a
------------------------------------------------
Cross-tabulation of "D14S52" ... "D14S43"
------------------------------------------------
[...]


When a macro is evaluated as a macro variable, macro positional parameters (%1, %2 etc) within the macro body are not evaluated.

Macro iteration

This is another recent expansion of the language. Many Sib-pair commands automatically iterate over a class of eligible variables, for example the list of all active marker loci. In this case, the "keep" and "drop" commands can be used refine the selection.

For any command, a list of tokens enclosed in braces causes the command to be called repeatedly, substituting each member of the list into the command line at that location.


>> {1 2} + 1
=> 2.
=> 3.
>> tab { CHD carrier } ldl

------------------------------------------------
Cross-tabulation of "CHD" ... "ldl"
------------------------------------------------
                      ldl
CHD               1/1          1/2          2/2     Allele Freq  Exact HWE-P
           -----------------------------------------------------------------
     n         12 (.462)    11 (.423)     3 (.115)  0.6731  0.3269    1.0000
     y          0 (.000)     1 (1.00)     0 (.000)  0.5000  0.5000    1.0000
[...]
------------------------------------------------
Cross-tabulation of "carrier" ... "ldl"
------------------------------------------------
                      ldl
carrier           1/1          1/2          2/2     Allele Freq  Exact HWE-P
           -----------------------------------------------------------------
     n         24 (.774)     6 (.194)     1 (.032)  0.8710  0.1290    0.4027
     y          0 (.000)    13 (.867)     2 (.133)  0.4333  0.5667    0.0079

>> set loc a{1 2 3} aff; ls
a1* a2* a3*
 3  active traits;  0  active markers.


As can be seen in the last example, iteration is implemented in a macro fashion, and usually precedes evaluation of the other token. The exception to this is the handling of class summary tokens such as "$m" (all active autosomal markers) when they are the only value in the list. Multiple iteration and nested iteration are implemented.


>> 1{1 2} + {1 2}
=> 12.
=> 13.
=> 13.
=> 14.

>> 1{1 2 {3 4}}
=>  11.
=>  12.
=>  13.
=>  11.
=>  12.
=>  14.

DATASETS

Sib-pair can read in datasets in a variety of formats:

In addition, it can also merge data into existing pedigrees as a: