|||| |\/| |/\| ||||
David L. Duffy, MBBS PhD.
Queensland Institute of Medical Research,
300 Herston Road,
Herston, Queensland 4029, Australia.
Email: davidD@qimr.edu.au
Last Updated: 2012-07-25
Sib-pair is a computer program for the manipulation and statistical analysis of genetic datasets. It implements a simple interpreted language in which the user writes commands. These can be entered interactively, or submitted as a batch from a text file in the usual way.
As well as implementing a number of analysis methods, Sib-pair can export data for analysis by a number of other programs, including APM, Arlequin, ASPEX, CRIMAP, FISHER, GAS, GDA, Genehunter, LINKAGE, LOKI, MENDEL, MERLIN, MORGAN, PAP, PLINK, SAGE, STRUCTURE, Superlink.
The combination of programmability and prepackaged data export commands allows one to use Sib-pair to automate analysis using a variety of other computer packages. Some of this could be done using other programming languages (Unix shell, Python etc), but Sib-pair's domain specific language is quick and compact.
I have written a library of Sib-pair macros that call other programs to perform linkage, association, and segregation analysis. Most are straightforward to use. These are usually distributed in a single Sib-pair script file.
Command | Program | Analysis |
---|---|---|
gdt | GDT | familial association |
lamp | LAMP | joint linkage and association |
menasslink | Mendel | joint linkage and association |
mendelass | Mendel | association |
mendelfreq | Mendel | MLE allele frequencies |
mendelvc | Mendel | variance components |
merlin | merlin | linkage and association |
merlinpar | merlin | lod score linkage |
mqls32 | mqls32 | MQLS |
MQLStest | mqls | MQLS |
qtdt | QTDT | QTDT |
relpair | RELPAIR | pedigree checking |
simwalkhap | Simwalk2 | haplotyping |
superlink | Superlink | multipoint linkage |
supertwo | Superlink | twopoint linkage |
varass | MERLIN | association |
wombat | WOMBAT | variance components |
Most of these commands take zero, one or two arguments.
This uses mendel to carry out a univariate variance components analysis To use, issue the command keyword along with the name of the quantitative trait followed by the names of any covariates:
>> mendelvc mytrait covariate1 covariate2 |
This uses merlin to carry out any of its analyses using the currently active traits and markers. To use, select the trait and markers using the "keep" command. Then issue the command:
>> keep myqtrait mar1 mar2 mar4 >> merlin --vc |
This uses merlin to carry out a multipoint linkage analysis currently a single active binary trait and multiple markers. Two models are fitted automatically: one recessive, one dominant. To use, select the trait and markers using the "keep" command. Then issue the command:
>> keep mytrait mar1 mar2 mar4 >> merlinpar |
This uses Superlink to carry out a "Genehunter-style" multipoint linkage analysis using the (single) currently active binary trait and markers. To use, select the trait and markers using the "keep" command, and the trait locus model using the set sml command. Then issue the command:
>> set sml 0.01 1 0 0 >> keep mytrait mar1 mar2 mar4 >> superlink |
The results are written to the screen.
This uses Superlink to carry out a two-point linkage analysis. Set the trait locus model using the set sml command. Then issue the command:
>> supertwo mytrait mar1 |
Sib-pair actually has two mechanisms for writing such programs: the simple macro system in the main analysis language, and the embedded Scheme (LISP) interpreter. For most jobs, the simple macro facility will be sufficient.
macro superlink | The macro command declares a new macro. The text below down to the ";;;;" is the body of the macro. |
write locus superlink %%.dat write ppd %%.ppd | The locus information is written in the format used by Superlink, and the pedigrees are written to a "post-Makeped" Linkage style pedigree file. The files are given unique names using the the macro's unique 5 letter ID ("%%"). |
$ superlink %%.dat %%.ppd | The $ command calls the operating system to run Superlink. |
file delete %%.dat %%.ppd ;;;; | The file delete command deletes the two files we created. |
I'll go through a more complicated example, the MQLStest command. If you have used this macro, you will have realized that it has some side effects on your original dataset (it renumbers the IDs for all the records). Many of these steps could be carried out in other languages (such as the unix shell), but a few, such as ID renumbering and writing Linkage pedigree files, require a bit more programming, so having them already set up in Sib-pair makes life a lot easier.
# # Calling the original MQLS program # macro prev=0.05 macro MQLStest |
This declares a new macro, with the name MQLStest. Unlike core Sib-pair commands, macros must be called by their full name. The lines starting with "#" are comments. Before we declare it, we set the trait prevalence for the MQLS analysis to 5%. |
macro ple <- ple set ple -1 silent | These commands save the current print level, and then change the print level so the macro runs quietly. |
unique_ids sequential write linkage mqls%%.dat |
the unique_ids sequential command generates sequential numerical IDs for all active individuals. This is required by the KinInbcoef and MQLStest programs. The phenotypes and genotypes are written to a Linkage style pedigree file to be read by the MQLStest program. The file is called "mqls<STRING>.dat", where <STRING> is the macro's unique 5 letter ID ("%%"). |
out mqls%%.prev echo %prev out |
This writes the prevalence to a file to be read by the MQLStest program. |
drop set mis 0 set pri 11110 write kinped%% set pri 11000 write kinlist%% |
This writes two files required by KinInbcoef. One contains only the pedigree ID, individual ID, father ID and mother ID for each active individual. The second contains only the pedigree ID and individual ID. The set printstyle command controls the first five fields of any pedigree file output (ped,id,fa,mo,sex). |
$ KinInbcoef kinped%% kinlist%% mqls%%.kin $ MQLStest mqls%%.dat mqls%%.kin mqls%%.prev 1 |
Finally we can actually run the two external programs. |
echo ============================================== echo Results from MQLS Program echo ============================================== echo file cat MQLStest.top |
Here we write out summary output from the MQLStest program. |
file delete kinped%% kinlist%% mqls%%.kin mqls%%.dat mqls%%.prev undrop set ple %ple silent ;;;; |
Finally, we clean up our work files. |
Here is an example combining Sib-pair commands, macros, and external calls to generate files for a bivariate model to be fitted by the WOMBAT package. There are no preset commands to write data files in the format that WOMBAT requires, but we can easily produce them using existing commands.
# # A mixed linear model of association using WOMBAT # First read in data (not provided in this case) # eval (memory-allocate 10000) read bin ../mt20090531.bin.gz # # Create variables # select cmm if (cmm==x) then cmm=n unselect macro traits=tnc molenum macro covariates=age sx cmm gene agegeno sxgeno set loc tnc qua . cube-root transformed total mole count set loc sx qua . set loc gene qua . candidate SNP dosage score set loc traitno qua . set loc agegeno qua . set loc sxgeno qua . tnc=V1_NT^(1/3) sx=male+1 recast cmm cmm=cmm+1 # # Estimate gene dose and generate interaction terms # gpe rs12203592 gene agegeno=age*gene sxgeno=sx*gene mztwin drop # # Write pedigree triples (unique ID, father and mother IDs). # The printstyle command controls which columns from ped,id,fa,mo,sex # are printed. The unique_id command gives each individual a unique ID. # set printstyle 01110 uniq seq file delete wombat2.ped drop set ple -2 set mis "0" output wombat2.ped print 1 output undrop order traitno %traits %covariates # # write one record per trait # set printstyle 01000 file delete wombat2.dat traitno=1 drop molenum out wombat2.tmp print where tnc ^= x and age ^=x out traitno=2 undrop molenum drop tnc out wombat2.tmp print where molenum ^= x and age ^=x out $ sort -n -k1,1 -k2,2 wombat2.tmp > wombat2.dat file delete wombat2.tmp # # generate parameter file # file delete wombat2.par out wombat2.par echo COM Association Analysis of %traits v. rs12203592 set ple 0 eval (let ((nargs (length (string-split traits)))) \ (begin \ (if (= nargs 1) \ (display "ANAL UNI") \ (begin (display "ANAL MUV ") (display nargs))) \ (newline))) set ple -2 silent echo PEDS wombat2.ped echo echo DATA wombat2.dat echo tr1 { traitno animal tnc %covariates } echo tr2 { traitno animal molenum %covariates } echo END DATA echo echo MODEL echo RAN animal nrm echo COV age(1) echo FIX sx echo FIX cmm echo COV gene(1) echo COV agegeno(1) echo COV sxgeno(1) echo trait tnc 1 echo trait molenum 2 echo END MODEL echo echo SPECIAL echo COVZER gene(1) FIT echo END SPECIAL echo echo VAR animal 2 2 echo 0.55 .23 0.1 echo .45 0.1 echo VAR residuals 2 2 echo 0.05 0.005 echo 0.003 echo out