||||
|\/|
|/\|
||||

SIB-PAIR can automate
the process of using other programs

by

David L. Duffy

(2010-2012)

David L. Duffy, MBBS PhD.
Queensland Institute of Medical Research,
300 Herston Road,
Herston, Queensland 4029, Australia.
Email: davidD@qimr.edu.au

Last Updated: 2012-07-25

CONTENTS

INTRODUCTION

Sib-pair is a computer program for the manipulation and statistical analysis of genetic datasets. It implements a simple interpreted language in which the user writes commands. These can be entered interactively, or submitted as a batch from a text file in the usual way.

As well as implementing a number of analysis methods, Sib-pair can export data for analysis by a number of other programs, including APM, Arlequin, ASPEX, CRIMAP, FISHER, GAS, GDA, Genehunter, LINKAGE, LOKI, MENDEL, MERLIN, MORGAN, PAP, PLINK, SAGE, STRUCTURE, Superlink.

The combination of programmability and prepackaged data export commands allows one to use Sib-pair to automate analysis using a variety of other computer packages. Some of this could be done using other programming languages (Unix shell, Python etc), but Sib-pair's domain specific language is quick and compact.

PROVIDED MACROS

I have written a library of Sib-pair macros that call other programs to perform linkage, association, and segregation analysis. Most are straightforward to use. These are usually distributed in a single Sib-pair script file.

Command Program Analysis
gdt GDT familial association
lamp LAMP joint linkage and association
menasslink Mendel joint linkage and association
mendelass Mendel association
mendelfreq Mendel MLE allele frequencies
mendelvc Mendel variance components
merlin merlin linkage and association
merlinpar merlin lod score linkage
mqls32 mqls32 MQLS
MQLStest mqls MQLS
qtdt QTDT QTDT
relpair RELPAIR pedigree checking
simwalkhap Simwalk2 haplotyping
superlink Superlinkmultipoint linkage
supertwo Superlinktwopoint linkage
varass MERLIN association
wombat WOMBAT variance components

Most of these commands take zero, one or two arguments.

mendelvc

This uses mendel to carry out a univariate variance components analysis To use, issue the command keyword along with the name of the quantitative trait followed by the names of any covariates:


>> mendelvc mytrait covariate1 covariate2


merlin

This uses merlin to carry out any of its analyses using the currently active traits and markers. To use, select the trait and markers using the "keep" command. Then issue the command:


>> keep myqtrait mar1 mar2 mar4
>> merlin --vc


merlinpar

This uses merlin to carry out a multipoint linkage analysis currently a single active binary trait and multiple markers. Two models are fitted automatically: one recessive, one dominant. To use, select the trait and markers using the "keep" command. Then issue the command:


>> keep mytrait mar1 mar2 mar4
>> merlinpar


superlink

This uses Superlink to carry out a "Genehunter-style" multipoint linkage analysis using the (single) currently active binary trait and markers. To use, select the trait and markers using the "keep" command, and the trait locus model using the set sml command. Then issue the command:


>> set sml 0.01 1 0 0
>> keep mytrait mar1 mar2 mar4
>> superlink


The results are written to the screen.

supertwo

This uses Superlink to carry out a two-point linkage analysis. Set the trait locus model using the set sml command. Then issue the command:


>> supertwo mytrait mar1


CREATING YOUR OWN MACROS

Sib-pair actually has two mechanisms for writing such programs: the simple macro system in the main analysis language, and the embedded Scheme (LISP) interpreter. For most jobs, the simple macro facility will be sufficient.


macro superlink
The macro command declares a new macro. The text below down to the ";;;;" is the body of the macro.

 write locus superlink %%.dat
 write ppd %%.ppd
The locus information is written in the format used by Superlink, and the pedigrees are written to a "post-Makeped" Linkage style pedigree file. The files are given unique names using the the macro's unique 5 letter ID ("%%").

 $ superlink %%.dat %%.ppd
The $ command calls the operating system to run Superlink.

 file delete %%.dat %%.ppd
;;;;
The file delete command deletes the two files we created.

I'll go through a more complicated example, the MQLStest command. If you have used this macro, you will have realized that it has some side effects on your original dataset (it renumbers the IDs for all the records). Many of these steps could be carried out in other languages (such as the unix shell), but a few, such as ID renumbering and writing Linkage pedigree files, require a bit more programming, so having them already set up in Sib-pair makes life a lot easier.


#
# Calling the original MQLS program
# 
macro prev=0.05
macro MQLStest

This declares a new macro, with the name MQLStest. Unlike core Sib-pair commands, macros must be called by their full name. The lines starting with "#" are comments. Before we declare it, we set the trait prevalence for the MQLS analysis to 5%.

macro ple <- ple
set ple -1 silent

These commands save the current print level, and then change the print level so the macro runs quietly.

unique_ids sequential
write linkage mqls%%.dat

the unique_ids sequential command generates sequential numerical IDs for all active individuals. This is required by the KinInbcoef and MQLStest programs. The phenotypes and genotypes are written to a Linkage style pedigree file to be read by the MQLStest program. The file is called "mqls<STRING>.dat", where <STRING> is the macro's unique 5 letter ID ("%%").

out mqls%%.prev
echo %prev
out

This writes the prevalence to a file to be read by the MQLStest program.

drop
set mis 0
set pri 11110
write kinped%%
set pri 11000
write kinlist%%

This writes two files required by KinInbcoef. One contains only the pedigree ID, individual ID, father ID and mother ID for each active individual. The second contains only the pedigree ID and individual ID. The set printstyle command controls the first five fields of any pedigree file output (ped,id,fa,mo,sex).

$ KinInbcoef kinped%% kinlist%% mqls%%.kin
$ MQLStest mqls%%.dat mqls%%.kin mqls%%.prev 1

Finally we can actually run the two external programs.

echo ==============================================
echo Results from MQLS Program
echo ==============================================
echo 
file cat MQLStest.top
Here we write out summary output from the MQLStest program.

file delete kinped%% kinlist%% mqls%%.kin mqls%%.dat mqls%%.prev
undrop
set ple %ple silent
;;;;

Finally, we clean up our work files.

An extended scripting example

Here is an example combining Sib-pair commands, macros, and external calls to generate files for a bivariate model to be fitted by the WOMBAT package. There are no preset commands to write data files in the format that WOMBAT requires, but we can easily produce them using existing commands.


#
# A mixed linear model of association using WOMBAT
# First read in data (not provided in this case)
#
eval (memory-allocate 10000)
read bin ../mt20090531.bin.gz
#
# Create variables
#
select cmm
if (cmm==x) then cmm=n
unselect
macro traits=tnc molenum
macro covariates=age sx cmm gene agegeno sxgeno
set loc tnc qua . cube-root transformed total mole count
set loc sx  qua .
set loc gene qua . candidate SNP dosage score
set loc traitno qua .
set loc agegeno qua .
set loc sxgeno qua .
tnc=V1_NT^(1/3)
sx=male+1
recast cmm
cmm=cmm+1
#
# Estimate gene dose and generate interaction terms
#
gpe rs12203592 gene
agegeno=age*gene
sxgeno=sx*gene
mztwin drop
#
# Write pedigree triples (unique ID, father and mother IDs).
# The printstyle command controls which columns from ped,id,fa,mo,sex
# are printed. The unique_id command gives each individual a unique ID.
#
set printstyle 01110
uniq seq
file delete wombat2.ped
drop
set ple -2
set mis "0"
output wombat2.ped
print 1
output
undrop  
order traitno %traits %covariates
#
# write one record per trait
# 
set printstyle 01000
file delete wombat2.dat
traitno=1
drop molenum
out wombat2.tmp
print where tnc ^= x and age ^=x
out
traitno=2
undrop molenum
drop tnc
out wombat2.tmp
print where molenum ^= x and age ^=x
out
$ sort -n -k1,1 -k2,2 wombat2.tmp > wombat2.dat
file delete wombat2.tmp
#
# generate parameter file
#
file delete wombat2.par
out wombat2.par
echo COM Association Analysis of %traits v. rs12203592 
set ple 0
eval (let ((nargs (length (string-split traits)))) \
       (begin \
         (if (= nargs 1) \
           (display "ANAL UNI") \
           (begin (display "ANAL MUV ") (display nargs))) \
         (newline)))
set ple -2 silent
echo PEDS wombat2.ped
echo
echo DATA wombat2.dat
echo tr1 { traitno animal tnc %covariates } 
echo tr2 { traitno animal molenum %covariates } 
echo END DATA
echo
echo MODEL
echo RAN animal nrm
echo COV age(1)
echo FIX sx
echo FIX cmm
echo COV gene(1)
echo COV agegeno(1)
echo COV sxgeno(1)
echo trait tnc 1
echo trait molenum 2
echo END MODEL
echo
echo SPECIAL
echo COVZER gene(1) FIT
echo END SPECIAL
echo
echo VAR animal 2 2
echo 0.55 .23 0.1
echo      .45 0.1
echo VAR residuals 2 2
echo 0.05  0.005
echo       0.003
echo 
out