Class | Data Declaration command |
Name | read statistics |
Arguments | vcf <vcf_file> <INFO_variable>[:<subvariable>] |
<data_file> [:<column_number>]) |
Read values for each active locus from a file into the Sib-pair locstat array. These are then accessible using the summary and drop commands. Computing on these values is possible in Scheme using the (locstat) and (locstat-set!) procedures.
If a VCF file is specified, matching is on locus name, falling back to map position. The INFO variable is selected by name, and may include a subvariable, using ":" as the separator. If just a file name is given, then the file is expected to have a locus name as the first field. The default behaviour is to read the last field of each line, but this can be changed by giving a column number.
Example:
>> read loc vcf topHRC.sites.vcf.gz
No. INFO vars = 7 (...AA)
Locus Position (bp)
--------------- ----------------
rs571093408 1:13380 C G AC=5;AN=64940;AF=7.69941e-05;AC_EXCLUDING_1000G=0;AN_EXCLUDING_1000G=59950;AF_EXCLUDING_1000G=0;AA=. .
rs541172944 1:16071 G A AC=8;AN=64940;AF=0.000123191;AC_EXCLUDING_1000G=0;AN_EXCLUDING_1000G=59950;AF_EXCLUDING_1000G=0;AA=G .
rs529651976 1:16141 C T AC=9;AN=64940;AF=0.000138589;AC_EXCLUDING_1000G=4;AN_EXCLUDING_1000G=59950;AF_EXCLUDING_1000G=6.67223e-05
...
VCF file = topHRC.sites.vcf.gz
Number of subjects = 0 (with genotype data)
Number of markers = 950
Number passing QC = 0 (0.000)
Number indels = 0
Number SVs = 0
Total genotypes = 0
Chromosomes = 1
Map range (bp) = 13380 -- 771063
Declaring 950 loci.
>> keep 1 -- 100
Keeping 100 active loci.
>> show annotations topHRC.sites.vcf.gz
"AC": Alternate allele count in called genotypes
"AN": Total number of alleles in called genotypes
"AF": Alternate allele frequency from called genotypes
"AC_EXCLUDING_1000G": Alternate allele count in called genotypes excluding 1000G samples
"AN_EXCLUDING_1000G": Total number of alleles in called genotypes excluding 1000G samples
"AF_EXCLUDING_1000G": Alternate allele frequency from called genotypes excluding 1000G samples
"AA": Ancestral allele
No. INFO vars = 7
>> read statistics topHRC.sites.vcf.gz AC
*FOUND* "AC": Alternate allele count in called genotypes
No. INFO vars = 7
Total number of loci scanned = 950
Number of annotations updated = 100 (10.5%)
>> summary
Test: "AC".
Total number of tests = 100
Locus Chr Position Statistic
------------------ --- ---------- -----------
rs62641298 1 0.079033 0.6472E+05 [A/G] AC=64721;AN=64940;AF=0.996628;AC_E
rs200943160 1 0.049298 0.4157E+05 [T/C] AC=41571;AN=64940;AF=0.640145;AC_E
rs2462492 1 0.054676 0.2418E+05 [C/T] AC=24175;AN=64940;AF=0.372267;AC_E
rs114608975 1 0.086028 5949. [T/C] AC=5949;AN=64940;AF=0.0916076;AC_E
rs3107975 1 0.055326 591.0 [T/C] AC=591;AN=64940;AF=0.00910071;AC_E
>> summary table
Test: "AC".
Intvl Midpt Count Histogram (Nobs=100)
---------------------------------------------------------
3240.8000 97 + **************************************************
9712.4000 0
16184.0000 0
24175.0000 1 *
29127.2000 0
35598.8000 0
41571.0000 1 *
48542.0000 0
55013.6000 0
61485.2000 0
64721.0000 1 *
Median (IQR) = 10.0000 ( 7.0000 -- 20.2500)
Symmetry test J(.02) = 924.8272 (P=0.000)
>> keep whe test > 5000
Selecting markers where test statistic > 5000.0000000000000
Keeping 4 active loci.
>> list $m
Locus Type Position
------------ ---- ----------------
rs200943160 ms 5 [T/C] AC=41571;AN=64940;AF=0.640145;AC_E
rs2462492 ms 9 [C/T] AC=24175;AN=64940;AF=0.372267;AC_E
rs62641298 ms 60 [A/G] AC=64721;AN=64940;AF=0.996628;AC_E
rs114608975 ms 82 [T/C] AC=5949;AN=64940;AF=0.0916076;AC_E
Number of marker loci= 4
See also:
show annotations | show or tabulate VCF INFO variables |
<< (read chain) | Up to index | >> (set fre) |