Sib-pair Command: file

Arguments:
rename <file_name> <new_file_name>
delete <file_name> [... <file_nameN>]
query <file_name> [... <file_nameN>]
cat <file_name> [... <file_nameN>]
head <file_name> [... <file_nameN>]
hex <file_name> [<last> | (<start> +<span>|<end>)]
print [/<search string>/] [(<fortran format>)] [NR] [+ [+..]] [<col_no_1> ... <col_no_N>] <file_name>
wc|fields <file_name> [... <file_nameN>]
transpose <file_name> [... <file_nameN>]
inverse <file_name> [... <file_nameN>]
metaanalyse <file_name> [<first_column>]
fasta <FASTA_file_name>
vcf <VCF_file_name> [ann] [(<loc1>..<locN>|<start_pos>..<end_pos>)]
vcf order <input_VCF_file> <output_VCF_file>
vcf liftover <input_VCF_file> <output_VCF_file>
tbi <tabix_index_file_name> [<position>|<loc_name>] [ann]

Performs a number of standard UNIX-type file manipulations, as well as utility operations such as transposing a rectangular text file. All of these operations can read gzip-compressed files.

The rename modifier renames the named file in the current directory to a new name.

The delete modifier deletes the named file(s) in the current directory.

The query modifier tests whether the named files are present in the current directory, and reports their sizes and whether they are gzip-compressed.
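
The check described above can be sketched in Python as follows. This is an illustration, not Sib-pair's code: it assumes gzip status is detected from the two-byte gzip magic number (0x1f 0x8b), and the function name `query_file` is hypothetical.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def query_file(path):
    """Hypothetical helper: return (exists, size_in_bytes, is_gzipped) for one file."""
    if not os.path.isfile(path):
        return (False, None, None)
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        # assumption: "gzipped" means the file starts with the gzip magic number
        is_gz = fh.read(2) == GZIP_MAGIC
    return (True, size, is_gz)
```

Calling `query_file` once per named file gives one report line per file, in the spirit of `file query a.ped b.ped`.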

The cat modifier sends the named file(s) in the current directory to the Sib-pair standard output.

The head modifier sends the first 10 lines of the named file(s) in the current directory to the Sib-pair standard output.

The hex modifier reads a file and gives output similar to that of the UNIX hexdump -C command. By default the first 96 bytes are shown, but a range, or a start position and span, can be specified instead.
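
A minimal Python sketch of `hexdump -C`-style output follows. The names `hex_lines` and `hex_file` are hypothetical; the sketch uses 0-based, half-open byte ranges (an assumption, since the exact Sib-pair range semantics are not stated), so a `<start> +<span>` request would correspond to `hex_file(path, start, start + span)`. Unlike `hexdump`, it does not collapse repeated lines into `*`.

```python
def hex_lines(data, offset=0):
    """Format bytes like `hexdump -C`: offset, 16 hex bytes split 8+8, printable ASCII."""
    lines = []
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        hex_part = f"{left}  {right}" if right else left
        # non-printable bytes are shown as '.', as hexdump does
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset + i:08x}  {hex_part:<48}  |{text}|")
    return lines

def hex_file(path, start=0, end=96):
    """Hypothetical helper: dump bytes [start, end) of a file (96-byte default as in Sib-pair)."""
    with open(path, "rb") as fh:
        fh.seek(start)
        data = fh.read(max(0, end - start))  # shorter files just return fewer bytes
    return hex_lines(data, start)
```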

The print modifier prints selected columns from the named file to the Sib-pair standard output. If no columns are selected, it has the same effect as cat. If the NR keyword is present, each line is prefixed by its line number. If the first or second modifier keyword is enclosed in parentheses, it is taken to be a Fortran format statement used to print the columns. If a modifier is enclosed in forward slashes ("/"), it is taken to be a search string, so that only lines containing that string are printed; the search string may need to be protected by quotes. A + modifies the search so that the line following each matching line is printed instead; each additional + moves the printed line one further down from the match.

The columns are read as strings, so the only allowable format descriptors are X (for spacing), I[<w>] for the line number, and A[<w>] for the column data.
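
The selection logic of the print modifier might be sketched in Python as below. The function `print_columns` and its parameters are hypothetical; the Fortran format option is not modelled, the number printed with NR is assumed to be that of the printed line, and columns are assumed to be whitespace-delimited.

```python
def print_columns(lines, columns=None, search=None, skip=0, number=False):
    """Hypothetical sketch of `file print` line selection (not Sib-pair's code).

    columns: 1-based column numbers to keep (None or empty = whole line, as for cat)
    search:  only print lines containing this substring (the /search string/)
    skip:    number of '+' signs: print the line this many lines after each match
    number:  prefix each printed line with its 1-based line number (NR)
    """
    lines = [ln.rstrip("\n") for ln in lines]
    if search is None:
        selected = range(len(lines))
    else:
        selected = [i + skip for i, ln in enumerate(lines)
                    if search in ln and i + skip < len(lines)]
    out = []
    for i in selected:
        if columns:
            fields = lines[i].split()
            # columns beyond the end of a short line are silently skipped
            text = " ".join(fields[c - 1] for c in columns if 1 <= c <= len(fields))
        else:
            text = lines[i]
        out.append(f"{i + 1} {text}" if number else text)
    return out
```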

The wc or fields modifier counts the total number of lines and words in a file. It also counts the number of words in each line, flagging those lines where the count changes from that of the line above. This is useful for checking errors in pedigree and data files, which should be rectangular.
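
A Python sketch of the same counts follows, assuming whitespace-delimited words and gzip detection by magic number. The names `field_counts` and `open_text` are hypothetical, and the result dictionary only loosely mirrors the Sib-pair report.

```python
import gzip

def open_text(path):
    """Open plain or gzip-compressed text, detected from the gzip magic bytes (assumption)."""
    with open(path, "rb") as fh:
        is_gz = fh.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gz else open(path, "r")

def field_counts(path):
    """Hypothetical helper: count lines and words, and flag lines whose
    word count differs from the line above."""
    n_lines = n_words = max_words = longest = 0
    changes = []
    prev = None
    with open_text(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            nf = len(line.split())
            n_lines += 1
            n_words += nf
            max_words = max(max_words, nf)
            longest = max(longest, len(line))
            if prev is not None and nf != prev:
                changes.append(lineno)  # word count changed relative to previous line
            prev = nf
    return {"lines": n_lines, "longest_line": longest, "words": n_words,
            "max_words": max_words, "rectangular": not changes, "changed_at": changes}
```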

The transpose modifier swaps the rows and columns of the input text file(s), writing the result to the Sib-pair standard output. If a file is not rectangular, missing cells are shown as "x".
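
A short Python sketch of transposition, assuming whitespace-delimited input and the "x" placeholder for missing cells that appears in the transpose example below. The helper `transpose` is hypothetical, not Sib-pair's routine.

```python
def transpose(rows, missing="x"):
    """Swap rows and columns of whitespace-delimited text lines,
    padding ragged rows with `missing`."""
    table = [line.split() for line in rows]
    width = max((len(r) for r in table), default=0)
    # output line j collects field j from every input line
    return [" ".join(r[j] if j < len(r) else missing for r in table)
            for j in range(width)]
```

Applied to the four lines of mat2 in the example, this yields the same four transposed lines, "1 1 1 1" through "x 4 x x".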

The inverse modifier inverts a symmetric numeric matrix represented in a file in a sparse format: each line of text contains the row and column indices followed by the element value. Output is in the same format.
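
One way to read, invert and write such a file is sketched below in Python. Assumptions: indices are 1-based, only one triangle of the symmetric matrix need be supplied (the mirror element is filled in), and the output lists the non-zero upper-triangle elements. Gauss-Jordan elimination with partial pivoting is used for brevity; Sib-pair's own algorithm is not described here, and all function names are hypothetical.

```python
def read_sparse(lines):
    """Parse 'row col value' lines (1-based indices, assumed) into a dense symmetric matrix."""
    entries, n = [], 0
    for line in lines:
        if not line.strip():
            continue
        i, j, v = line.split()[:3]
        i, j, v = int(i), int(j), float(v)
        entries.append((i, j, v))
        n = max(n, i, j)
    a = [[0.0] * n for _ in range(n)]
    for i, j, v in entries:
        a[i - 1][j - 1] = v
        a[j - 1][i - 1] = v  # fill the mirror element: the matrix is symmetric
    return a

def invert(a):
    """Gauss-Jordan elimination with partial pivoting; raises ValueError if singular."""
    n = len(a)
    # augment each row with the matching row of the identity matrix
    m = [row[:] + [1.0 if k == r else 0.0 for k in range(n)] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]

def write_sparse(inv, tol=1e-12):
    """Emit the non-zero upper-triangle (i <= j) elements as 'row col value' lines."""
    return [f"{i + 1} {j + 1} {inv[i][j]:.6g}"
            for i in range(len(inv)) for j in range(i, len(inv))
            if abs(inv[i][j]) > tol]
```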

The metaanalyse modifier combines the P-values on each line of a file using Fisher's method. Optionally, the first field to be read can be specified.
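
Fisher's method sums -2 ln p over the k P-values on a line; if the P-values are independent and uniform under the null hypothesis, this statistic follows a chi-square distribution with 2k degrees of freedom. Because the degrees of freedom are even, the upper-tail probability has a closed form, which the Python sketch below uses. The function names, the 1-based `first_column` argument and the handling of invalid values are assumptions, not Sib-pair's documented behaviour.

```python
import math

def fisher_combine(pvalues):
    """Fisher's method: X2 = -2 * sum(ln p) ~ chi-square with 2k df under the joint null.

    For even df = 2k the chi-square upper tail has the closed form
    P = exp(-X2/2) * sum_{i=0}^{k-1} (X2/2)**i / i!
    """
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need one or more P-values in (0, 1]")
    x2 = -2.0 * sum(math.log(p) for p in pvalues)
    half = x2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # builds (X2/2)**i / i! incrementally
        total += term
    return x2, min(1.0, math.exp(-half) * total)

def metaanalyse_lines(lines, first_column=1):
    """Hypothetical helper: combine the P-values on each line from first_column (1-based) on."""
    results = []
    for line in lines:
        fields = line.split()[first_column - 1:]
        if fields:
            results.append(fisher_combine([float(f) for f in fields]))
    return results
```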

The fasta modifier prints summary information from the .fai index file for a FASTA file. The FASTA file must be indexed, i.e. it must have a .fai file produced by the samtools faidx program.
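
The `.fai` index written by `samtools faidx` is a tab-separated text file whose first five columns are the sequence name, length, byte offset of the first base, bases per line and bytes per line. A Python sketch of a summary in that spirit follows; `read_fai` and `summarise_fasta` are hypothetical, and the printed layout is illustrative rather than Sib-pair's.

```python
def read_fai(path):
    """Parse a .fai index: NAME LENGTH OFFSET LINEBASES LINEWIDTH (tab-separated)."""
    records = []
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
            records.append((name, int(length), int(offset), int(linebases), int(linewidth)))
    return records

def summarise_fasta(fasta_path):
    """Hypothetical helper: print each indexed sequence's name and length, then the total."""
    records = read_fai(fasta_path + ".fai")  # index sits next to the FASTA file
    for name, length, *_ in records:
        print(f"{name:<20} {length:>12}")
    total = sum(r[1] for r in records)
    print(f"{'Total':<20} {total:>12}")
    return len(records), total
```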

The vcf modifier prints locus information from a VCF file. A subset of loci can be specified, either by map interval or by a list of locus names; see read locus vcf for more details. If the ann (ANNOVAR) keyword is added, the file is expected to be in ANNOVAR format, with genotype data starting after column 51.
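
Extracting locus information from a plain VCF (not the ANNOVAR variant) can be sketched in Python as below, using the fixed first five VCF columns CHROM, POS, ID, REF and ALT. The function `vcf_loci`, its `names` and `region` parameters and the inclusive interval are assumptions for illustration. Python's `gzip` module also reads bgzip-compressed VCFs, since BGZF is a series of gzip members.

```python
import gzip

def vcf_loci(path, names=None, region=None):
    """Hypothetical helper: yield (chrom, pos, id, ref, alt) for VCF data lines.

    names:  optional set of locus IDs to keep
    region: optional (chrom, start, end) map interval, inclusive (assumption)
    """
    with open(path, "rb") as fh:
        is_gz = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gz else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and column header lines
            chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
            pos = int(pos)
            if names is not None and vid not in names:
                continue
            if region is not None:
                rchrom, start, end = region
                if chrom != rchrom or not start <= pos <= end:
                    continue
            yield chrom, pos, vid, ref, alt
```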

The vcf order modifier opens a VCF file, reorders the matching loci using the existing Sib-pair map, and writes out a new, sorted VCF file.
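
The reordering step might look like the Python sketch below, where `map_order` is a hypothetical stand-in for the Sib-pair map, giving each locus ID a (chromosome rank, position) sort key. The treatment of loci absent from the map (appended in their original order) is an assumption, not documented Sib-pair behaviour.

```python
def order_vcf(lines, map_order):
    """Hypothetical sketch: reorder VCF data lines by map_order[locus_id] -> (chrom_rank, pos).

    Header lines stay first; loci found in the map are sorted by map order;
    loci not in the map are appended afterwards in their original order (assumption).
    """
    header, matched, unmatched = [], [], []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        vid = line.split("\t")[2]  # ID is the third VCF column
        if vid in map_order:
            matched.append((map_order[vid], line))
        else:
            unmatched.append(line)
    matched.sort(key=lambda item: item[0])  # stable sort by (chrom_rank, position)
    return header + [line for _, line in matched] + unmatched
```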

The vcf liftover modifier opens a VCF file, modifies the map positions for matching loci using the existing Sib-pair map, and writes out a new VCF file. This might be used after running read chain on the map read in from the target VCF file.
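
A comparable Python sketch of the liftover step follows, where `new_positions` is a hypothetical mapping from locus ID to its updated position in the current map. Only the POS column is rewritten, and the output is not re-sorted, so records whose positions move may end up out of order.

```python
def liftover_vcf(lines, new_positions):
    """Hypothetical sketch: rewrite POS for loci whose ID is in new_positions[id] -> pos.

    Header lines and loci not in the map are passed through unchanged.
    """
    out = []
    for line in lines:
        if line.startswith("#"):
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] in new_positions:
            fields[1] = str(new_positions[fields[2]])  # POS is the second VCF column
        out.append("\t".join(fields) + "\n")
    return out
```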

The tbi modifier either prints a summary of a tabix index file (the list of indexed sequences and the range of map positions), or prints the record from the indexed VCF file that matches the given coordinates (specified directly, or taken from the current map position of a named locus). If the ann (annotation) keyword is added, the values of the INFO variables for that record are printed, one variable-value pair per line.
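
Splitting the INFO column into one variable-value pair per line, as in the annotation listing in the tbi example below, can be sketched in Python as follows. It assumes the matching record has already been fetched via the tabix index (that lookup is not shown), and the function name `info_table` is hypothetical. Flag variables, which have no `=`, are listed with an empty value.

```python
def info_table(vcf_line):
    """Hypothetical helper: format the INFO column of one VCF record, one variable per row."""
    info = vcf_line.rstrip("\n").split("\t")[7]  # INFO is the eighth VCF column
    rows = ["INFO variable       Value", "-" * 20 + " " + "-" * 10]
    if info == ".":
        return rows  # '.' means no INFO values
    for item in info.split(";"):
        key, _, value = item.partition("=")  # flag variables have no '=' and no value
        rows.append(f"{key:>20} {value}".rstrip())
    return rows
```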

Examples:

>> file delete bad.ped
Deleted file "bad.ped".

>> file print /ERROR:/ bad.out

>> out cleanhash.out
>> file print NR 2 5 hash.out
>> out

>> output mat1
>> echo 1 2 3 4 5 6 7 8
>> echo 1 2 3 4 5 6 7 8
>> output
>> output mat2
>> echo 1 2 3
>> echo 1 2 3 4
>> echo 1 2
>> echo 1
>> output
>> file transpose mat1 mat2

1 1 
2 2 
3 3 
4 4 
5 5 
6 6 
7 7 
8 8 
1 1 1 1 
2 2 2 x 
3 3 x x 
x 4 x x 

>> file delete mat1 mat2

>> file wc do22.out.gz

Field counts for "do22.out.gz":

L 1 Len 52 NFields 8: "|||| SIB-PAIR: A program for simple genetic analys"
L 2 Len 61 NFields 9: "|\/| Version : Version 1.00.b [64 bit] gfortran (0"
L 3 Len 42 NFields 8: "|/\| Author  : David L Duffy (c) 1995-2011"
L 4 Len 63 NFields 10: "|||| Job run : Sat Jan  7 16:55:38 2012 (gb-r35n20"
L 5 Len 0 NFields 0: ""
L 6 Len 60 NFields 10: "Type "help" for help, "quit" to quit, "ctrl-C" to"
L 7 Len 0 NFields 0: ""
L 8 Len 30 NFields 4: "-> macro CHROM=22;inc"
L 9 Len 0 NFields 0: ""

Number of lines               = 34019
Length of longest line        = 80 chars
Total number of words         = 305512
Maximum words per line        = 12
Constant word count per line? = F
Changes in word count/line    = 103
Counts changed at lines       =  2 3 4 5 6 7 8 9 10 ...

>> set timer on
>> file wc All_hair5.assoc.txt.gz

Field counts for "All_hair5.assoc.txt.gz":

L 1 Len 253 NFields 22: "chr     rs      ps      n_miss  allele1 allele0 af"

Number of lines               = 9130946
Length of longest line        = 296750 chars
Total number of words         = 200880812
Maximum words per line        = 22
Constant word count per line? = T
Length of longest word        = 32745 chars
[   69.75 s]

$ time zcat All_hair5.assoc.txt.gz | wc
9130946 200880812 2177418095

real	0m40.135s
user	1m3.215s
sys	0m1.002s

>> file tbi  /home/davidD/Genetics/Maps/UK10K_COHORT.20160215.sites.vcf.gz rs4668106 ann

Reading tabix index file "/home/davidD/Genetics/Maps/UK10K_COHORT.20160215.sites.vcf.gz.tbi".
rs4668106 -> 2:169608348                             
2	169608348	rs4668106	T	G	999	PASS	DP=33829;VQSLOD=8.1333;AN=7562;AC=437

INFO variable       Value
-------------------- ----------
                  DP 33829
              VQSLOD 8.1333
                  AN 7562
                  AC 4373
                  AF 0.578286
          AN_TWINSUK 3708
          AC_TWINSUK 2143
          AF_TWINSUK 0.57794
           AN_ALSPAC 3854
           AC_ALSPAC 2230
           AF_ALSPAC 0.57862
              AF_AFR 0.711382
              AF_AMR 0.654696
              AF_ASN 0.426573
              AF_EUR 0.579156
              AF_MAX 0.711382
                 CSQ ENST00000305747:CERS6:intron_variant+ENST00000392687:CERS6:intron_variant+GERP,-5.05
    AF_TWINSUK_NODUP 0.579183

See also:

dir: file listing for current directory
pwd: print or change current directory
read locus vcf: read VCF file locus information
