Utilities for use with Sib-pair

These are a number of small programs that manipulate GAS type pedigree files or Sib-pair output. To run most of them you need a version of awk. Awk is standard on Unix, and there are several good free versions available for DOS, such as mawk and the DJGPP port of GNU gawk. I strongly suggest learning some elementary awk programming, as it offers a very flexible, easy and quick method for manipulating ASCII-based data files of the type Sib-pair uses. For example:

awk '$1=="0001"' file.ped

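prints out all records where the first field equals 0001 (ie extracts pedigree 0001 from the file). If using awk under DOS, the whole program has to be enclosed in double quotation marks instead, and any double quotation marks inside it escaped with a backslash (awk itself does not accept single quotation marks around a string). With most DOS ports this looks like:

mawk "$1==\"0001\" {print $2,$5, log($6)}" file.ped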

which prints out the second and fifth fields (ie id and sex), and the log transformed value of the sixth field (the first trait) for members of pedigree 0001. It has the classic form of an awk command:

pattern_to_match { action_to_be_taken }
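As a small illustration of this form, the sketch below runs one pattern and one action over a made-up pedigree file supplied on standard input from a Unix shell. The records, the m/f sex codes and the use of x for a missing trait are assumptions invented for the example, not taken from any real dataset.

```shell
# Made-up GAS type records: pedigree, id, father, mother, sex, trait.
# Pattern: pedigree 0001 with a non-missing trait.
# Action:  print id, sex and the log of the trait.
out=$(awk '$1 == "0001" && $6 != "x" { printf "%s %s %.3f\n", $2, $5, log($6) }' <<'EOF'
0001 1 x x m 2.0
0001 2 x x f x
0001 3 1 2 f 8.0
0002 1 x x m 4.0
EOF
)
echo "$out"
```

Only persons 1 and 3 of pedigree 0001 are printed; person 2 is dropped by the pattern because the trait is missing.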

Finally, an awk program is called as in this example:

awk -f addpar.awk file.ped > newfile.ped

which applies addpar.awk to file.ped, writing the resulting new pedigree file to newfile.ped. The -v option sets the value of a variable used by the program, say the column number of the trait to be analysed:

awk -f describe.awk -v var=6 file.ped

giving the mean, median etc for the 6th column of file.ped.


Add missing but necessary parents to a pedigree file.
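The idea can be sketched in a few lines of awk. This is not the distributed script, only a minimal illustration under stated assumptions: columns are pedigree, id, father, mother, sex and then data; the missing-value code is x (passed in as the hypothetical variable miss); and added parents get sex m or f according to their role.

```shell
out=$(awk -v miss=x '
# Remember every record and every (pedigree, id) pair that is present.
{ line[NR] = $0; seen[$1 SUBSEP $2] = 1 }
END {
  for (i = 1; i <= NR; i++) {
    n = split(line[i], f, " ")
    addparent(f[1], f[3], "m", n)   # father, if named but absent
    addparent(f[1], f[4], "f", n)   # mother, if named but absent
    print line[i]
  }
}
# Print a founder record (parents and all data missing) for an absent parent.
function addparent(ped, id, sex, n,   j, rec) {
  if (id == miss || ((ped SUBSEP id) in seen)) return
  seen[ped SUBSEP id] = 1
  rec = ped " " id " " miss " " miss " " sex
  for (j = 6; j <= n; j++) rec = rec " " miss
  print rec
}
' <<'EOF'
0001 3 1 2 f 8.0
0001 4 1 2 m 3.5
0002 1 x x m 4.0
EOF
)
echo "$out"
```

Each new parent record is written just before the first child that refers to it, so parents precede their offspring in the output.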


Count the number of full and half sib-pairs in a GAS type pedigree file. Can also give number of affected sib-pairs.
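One way such a count could be done is sketched below; it is an illustration rather than the distributed script. It assumes father and mother are in columns 3 and 4 and that x marks a missing parent. A group of n children contributes n(n-1)/2 pairs, so full sib-pairs are counted per couple, and half sib-pairs are the pairs sharing a father or a mother that are not already full pairs.

```shell
out=$(awk -v miss=x '
function pairs(n) { return n * (n - 1) / 2 }
# Only individuals with both parents recorded can form sib-pairs here.
$3 != miss && $4 != miss {
  both[$1 SUBSEP $3 SUBSEP $4]++   # children per couple
  fa[$1 SUBSEP $3]++               # children per father
  mo[$1 SUBSEP $4]++               # children per mother
}
END {
  for (k in both) full += pairs(both[k])
  for (k in fa) shared += pairs(fa[k])
  for (k in mo) shared += pairs(mo[k])
  # Each full pair was counted once via the father and once via the mother.
  printf "full %d half %d\n", full, shared - 2 * full
}
' <<'EOF'
0001 1 x x m
0001 2 x x f
0001 3 x x f
0001 4 1 2 m
0001 5 1 2 f
0001 6 1 2 f
0001 7 1 3 m
EOF
)
echo "$out"
```

In the made-up pedigree, persons 4, 5 and 6 form three full sib-pairs, and person 7 shares only a father with each of them, giving three half sib-pairs.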


Emulates cpp (the C preprocessor) where access to it is restricted or it is unavailable. You will need to add a #define line to the file to be analysed, eg #define SUN.


Comments out listed pedigrees within a GAS type pedigree file.
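A rough sketch of the technique follows. The variable names drop and mark are hypothetical, and the use of ! as the comment character is an assumption that should be checked against the Sib-pair documentation.

```shell
out=$(awk -v drop="0002,0004" -v mark='!' '
# Build a lookup table from the comma-separated list of pedigree IDs.
BEGIN { n = split(drop, d, ","); for (i = 1; i <= n; i++) skip[d[i]] = 1 }
# Prefix records of listed pedigrees with the comment marker.
($1 in skip) { print mark $0; next }
{ print }
' <<'EOF'
0001 1 x x m 2.0
0002 1 x x m 4.0
0003 1 x x f 1.5
EOF
)
echo "$out"
```

Because the records are only prefixed and not deleted, removing the marker restores the original file.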


Generate descriptive statistics for Nth column of numbers. Ignores comment lines.
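A cut-down sketch of this kind of summary is given below, reporting only the count, mean, minimum and maximum. It assumes that comment lines start with ! or #, that non-numeric values such as x are missing and should be skipped, and that the column number is passed in as var.

```shell
out=$(awk -v var=6 '
/^[!#]/ { next }                 # skip assumed comment lines
$var ~ /^-?[0-9.]+$/ {           # use numeric values only
  x = $var + 0
  n++; sum += x
  if (n == 1 || x < min) min = x
  if (n == 1 || x > max) max = x
}
END {
  if (n > 0) printf "n=%d mean=%.2f min=%.1f max=%.1f\n", n, sum / n, min, max
}
' <<'EOF'
! trait measured in 1996
0001 1 x x m 2.0
0001 2 x x f x
0001 3 1 2 f 8.0
0002 1 x x m 5.0
EOF
)
echo "$out"
```

A median would need the values to be stored and sorted, which is omitted here to keep the example short.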


A UNIX shell script that attempts to pinpoint the individual in a pedigree whose genotype is causing a Mendelian inconsistency. It sets each typed person in the pedigree to missing in turn, then runs Sib-pair to see if the inconsistency has gone away. Called as:

ksh dropone.sh pedigree_ID first_column_of_genotype pedigree_file


Counts the number of fields in each line, flags any line whose count differs from that of the previous line (excluding comment lines), and writes an (editable) awk script (list.awk) to write out the dataset. With this it is very easy to reorder traits etc.
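The field-count check on its own can be sketched as follows (generation of list.awk is not attempted). The comment characters ! and # are an assumption.

```shell
out=$(awk '
/^[!#]/ { next }                 # assumed comment lines are ignored
# Report a record whose field count differs from the previous record.
prev != "" && NF != prev {
  printf "line %d: %d fields, previous had %d\n", NR, NF, prev
}
{ prev = NF }
' <<'EOF'
0001 1 x x m 2.0
! a comment line
0001 2 x x f
0001 3 1 2 f 8.0
EOF
)
echo "$out"
```

Note that a single short record is reported twice: once itself, and once when the following record returns to the usual length.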


Converts a GAW10 dataset to GAS type pedigree file.


Produce GAS type pedigree file of dummy unrelated individuals with specified gene frequencies at a marker. Since the released version of Sib-pair does not read in gene frequencies (other versions check for a named marker in a database of frequencies), this allows analysis using "known" population frequencies. Since such figures are often based on 50-100 subjects (as big as your own study really), why not obtain the raw datafile and include it in your own pedigree file?


Writes a pedigree file for the Pedview program. This is a freely available program that draws a "marriage node" like graph. Can only write one phenotype per individual at a time, but is designed for screen use -- mouse can select a particular person, so that only he/she and offspring are shown etc.


The awk version of Hinds' perl script to summarize sib_ibd output (Aspex).


This awk script reorders the loci in a pedigree file, where the appropriate Sib-pair control file was file1, so that they can be read in by control file file2. If a locus is present in file1 but not file2, it will be deleted from the new pedigree file. It would be called as:

awk -v file1=old.in -v file2=new.in -f reorder.awk old.ped > new.ped

re-sort.bat and re-sort.sh

These rewrite a GAS type pedigree file so that individuals appear in the order specified by a key file (they need awk, paste and sort). The default ordering produced by Sib-pair, though logical, is often different from that originally chosen by the user.
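The general approach can be sketched for a Unix shell as below. This is an illustration, not the contents of re-sort.sh: the file names are made up, the key file is assumed to hold one "pedigree id" pair per line in the wanted order, and cut is used here in place of paste.

```shell
dir=$(mktemp -d)
# Hypothetical key file (wanted order) and pedigree file (current order).
printf '%s\n' '0001 2' '0001 1' '0001 3' > "$dir/key.txt"
printf '%s\n' '0001 1 x x m 2.0' '0001 2 x x f x' '0001 3 1 2 f 8.0' > "$dir/old.ped"

# First file: note the rank of each person. Second file: prefix each record
# with its rank, then sort on the rank and strip it off again.
out=$(awk 'NR == FNR { rank[$1 SUBSEP $2] = NR; next }
           { print rank[$1 SUBSEP $2] "\t" $0 }' "$dir/key.txt" "$dir/old.ped" |
      sort -n | cut -f2-)
echo "$out"
rm -r "$dir"
```

Records that do not appear in the key file get an empty rank and so sort to the top, which makes them easy to spot.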


Summarize Sib-pair output as an HTML document. This searches out TDT, Haseman-Elston, association and APM test statistics and P-values and tabulates them by locus, followed by the detailed output.


Summarize Sib-pair output. This searches out TDT, Haseman-Elston, association and APM test statistics and P-values and tabulates them by locus.


This summarizes CRI-MAP chrompic output, to highlight double recombinants.