Utilities for use with Sib-pair
These are a number of small programs that manipulate GAS type pedigree files
or Sib-pair output. To run most of them you need a version of awk. Awk is
standard on Unix -- there are several good free versions available for DOS,
such as mawk and the DJGPP port of GNU gawk. I strongly suggest learning
some elementary awk programming, as it offers a very flexible, easy and
quick method for manipulating ASCII-based data files of the type Sib-pair
uses. For example:
awk '$1=="0001"' file.ped
prints out all records where the first field equals 0001 (i.e. extracts pedigree
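0001 from the file). If using awk under DOS, the program has to be enclosed
in double quotes, with any double quotes inside it escaped by a backslash
(awk itself does not accept single quotes around strings), as in this command:
mawk "$1==\"0001\" {print $2,$5, log($6)}" file.ped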
which prints out the second and fifth fields (i.e. ID and sex), and the
log-transformed value of the sixth field (the first trait) for members of
pedigree 0001. It has the classic form of an awk command:
pattern_to_match { action_to_be_taken }
Finally, an awk program stored in a file is run as in this example:
awk -f addpar.awk file.ped > newfile.ped
which applies addpar.awk to file.ped, writing the resulting new pedigree file
to newfile.ped. The value of a variable used by the program, say the column
number of the trait to be analysed, can be set on the command line with -v:
awk -f describe.awk -v var=6 file.ped
giving the mean, median etc for the 6th column of file.ped.
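As a rough illustration of how a pattern, an action and a -v variable fit together, here is a minimal sketch run from a Unix shell. The records are made up, the missing-parent code "x" and the sex codes are assumptions, and the variable name col is only illustrative.

```shell
# Made-up records: pedigree, ID, father, mother, sex, first trait
ped_data='0001 1 x x m 2.0
0001 2 x x f 4.0
0001 3 1 2 m 8.0
0002 1 x x m 16.0'

# Pattern: first field equals the string "0001".
# Action: print the ID and the column chosen on the command line with -v.
out=$(printf '%s\n' "$ped_data" | awk -v col=6 '$1 == "0001" { print $2, $col }')
printf '%s\n' "$out"
```

Changing col=6 to another number picks a different column without editing the program.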
addpar.awk
Add missing but necessary parents to a pedigree file.
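The idea can be sketched as follows. This is not the code of addpar.awk itself, only a minimal illustration that assumes the column order pedigree, ID, father, mother, sex, that "x" is the missing code, and that sex is coded m/f. Trait columns for the added parents are left out, and the added records are simply printed first.

```shell
# Made-up records in which parents 1 and 2 have no record of their own
ped_data='0001 3 1 2 m
0001 4 1 2 f'

fixed=$(printf '%s\n' "$ped_data" | awk '
  # Store every record and remember which (pedigree, ID) pairs exist
  { line[NR] = $0; ped[NR] = $1; fa[NR] = $3; mo[NR] = $4; seen[$1, $2] = 1 }
  END {
    for (i = 1; i <= NR; i++) {
      # Emit a founder record for each parent who is named but absent
      if (fa[i] != "x" && !((ped[i], fa[i]) in seen)) {
        print ped[i], fa[i], "x", "x", "m"; seen[ped[i], fa[i]] = 1
      }
      if (mo[i] != "x" && !((ped[i], mo[i]) in seen)) {
        print ped[i], mo[i], "x", "x", "f"; seen[ped[i], mo[i]] = 1
      }
    }
    for (i = 1; i <= NR; i++) print line[i]
  }')
printf '%s\n' "$fixed"
```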
countsp.awk
Count the number of full and half sib-pairs in a GAS type pedigree file. Can
also give the number of affected sib-pairs.
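Counting full sib-pairs amounts to grouping children by pedigree, father and mother, and summing n(n-1)/2 over sibships of size n. A minimal sketch of that step (not the script itself; half sib-pairs and affection status are ignored, and the missing code "x" is an assumption):

```shell
# Made-up records: pedigree, ID, father, mother, sex
ped_data='0001 1 x x m
0001 2 x x f
0001 3 1 2 m
0001 4 1 2 f
0001 5 1 2 f
0002 1 x x m
0002 2 x x f
0002 3 1 2 m'

npairs=$(printf '%s\n' "$ped_data" | awk '
  # Skip founders; count children in each (pedigree, father, mother) sibship
  $3 != "x" && $4 != "x" { nsibs[$1 SUBSEP $3 SUBSEP $4]++ }
  END {
    for (key in nsibs) total += nsibs[key] * (nsibs[key] - 1) / 2
    print total + 0
  }')
echo "$npairs"
```

Here the sibship of three in pedigree 0001 contributes three pairs and the single child in pedigree 0002 contributes none.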
cpp.awk
Emulate cpp (the C preprocessor) where access to it is restricted or it is
unavailable. A #define line has to be added to the file to be analysed,
e.g. #define SUN.
delped.sh
Comments out listed pedigrees within a GAS type pedigree file.
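A minimal sketch of the same idea in awk, assuming that a leading "!" marks a comment line in the pedigree file and that the pedigrees to drop are given as a comma-separated list in a variable (here called drop, a name chosen only for this example):

```shell
# Made-up records: pedigree, ID, father, mother, sex
ped_data='0001 1 x x m
0002 1 x x m
0003 1 x x f'

commented=$(printf '%s\n' "$ped_data" | awk -v drop="0002" '
  # Build a lookup table of the pedigree IDs to be commented out
  BEGIN { n = split(drop, ids, ","); for (i = 1; i <= n; i++) skip[ids[i]] = 1 }
  ($1 in skip) { print "! " $0; next }
  { print }')
printf '%s\n' "$commented"
```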
describe.awk
Generate descriptive statistics for the Nth column of numbers. Ignores
comment lines.
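The core of such a summary can be sketched in a few lines. This is only an illustration giving the count and mean, not the script itself. It assumes that comment lines start with "!" and that non-numeric entries such as "x" are missing values to be skipped.

```shell
# Made-up records; the sixth column is the trait
ped_data='! a comment line
0001 1 x x m 2.0
0001 2 x x f 4.0
0001 3 1 2 m x
0001 4 1 2 f 9.0'

stats=$(printf '%s\n' "$ped_data" | awk -v var=6 '
  /^!/ { next }                                  # skip comment lines
  $var ~ /^[-+]?[0-9]+\.?[0-9]*$/ { n++; sum += $var }   # numeric values only
  END { if (n > 0) printf "n=%d mean=%.2f\n", n, sum / n }')
echo "$stats"
```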
dropone.sh
A UNIX shell script that attempts to pinpoint the individual in a pedigree
whose genotype is causing a Mendelian inconsistency. It sequentially sets
every typed person in the pedigree to missing, then runs Sib-pair to see
if the inconsistency has gone away. Called as:
ksh dropone.sh pedigree_ID first_column_of_genotype pedigree_file
fields.awk
Count the number of fields in each line, flag any inconsistency with the
number in the previous line (excluding comment lines), and write an (editable)
awk script (list.awk) that writes out the dataset. With this it is very easy
to reorder traits etc.
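The field-count check can be sketched as below. It covers only the consistency check, not the generation of list.awk, and again assumes that comment lines start with "!".

```shell
# Made-up records; the second record is one field short
ped_data='0001 1 x x m 2.0
0001 2 x x f
! a comment line
0001 3 1 2 m 9.0'

report=$(printf '%s\n' "$ped_data" | awk '
  /^!/ { next }                                  # comment lines are not counted
  prev != "" && NF != prev {
    printf "line %d: %d fields (previous had %d)\n", NR, NF, prev
  }
  { prev = NF }')
printf '%s\n' "$report"
```

Each line whose field count differs from the previous data line is reported, so a single short record is flagged twice: once on the line itself and once on the line that follows.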
gawprep.awk
Converts a GAW10 dataset to a GAS type pedigree file.
genpop.awk
Produce a GAS type pedigree file of dummy unrelated individuals with specified
gene frequencies at a marker. Since the released version of Sib-pair does
not read in gene frequencies (other versions check for a named marker in a
database of frequencies), this allows analysis using "known" population
frequencies. Since such figures are often based on 50-100 subjects (as big
as your own study really), why not obtain the raw datafile and include it in
your own pedigree file?
pedview.awk
Writes a pedigree file for the Pedview program. This is a freely available
program that draws a "marriage node" like graph. Can only write one phenotype
per individual at a time, but is designed for screen use: the mouse can be
used to select a particular person, so that only he/she and offspring are
shown, etc.
rec_dist.awk
An awk version of Hinds' Perl script to summarize sib_ibd output (Aspex).
reorder.awk
This awk script reorders the loci in a pedigree file that was written for
the Sib-pair control file file1, so that they can be read in by
control file file2. If a locus is present in file1 but not in file2,
it is deleted from the new pedigree file. It would be called as:
awk -v file1=old.in -v file2=new.in -f reorder.awk old.ped > new.ped
re-sort.bat and re-sort.sh
These rewrite a GAS type pedigree file so that individuals appear in
the order specified by a key file (they need awk, paste and sort). The default
ordering produced by Sib-pair, though logical, is often different from that
originally chosen by the user.
sp2html
Summarize Sib-pair output as an HTML document. This searches out TDT,
Haseman-Elston, association and APM test statistics and P-values and
tabulates them by locus, followed by the detailed output.
sumtdt.awk
Summarize Sib-pair output. This searches out TDT, Haseman-Elston, association
and APM test statistics and P-values and tabulates them by locus.
sumpic.awk
This summarizes CRI-MAP chrompic output, to highlight double recombinants.