Sib-pair Command: read locus vcf


Class: Data Declaration command
Name: read locus vcf
Arguments: <VCF file name> [<start_position> [<end_position>]]

Read locus names, types and map positions from a VCF file. Reading can be restricted to the subset of loci lying within an interval (specified in base pairs). The related file vcf command prints the same locus information without reading it into Sib-pair, and can also search on locus names; it is designed to help the user refine the interval to be read in.

A VCF-style file contains one record (line) per locus. The records are preceded by meta-information lines (beginning with "##") that identify and document the key-value pairs used within each field. After the meta-information comes a header line (beginning with "#CHROM") naming the columns (fields) of data. The first eight fields are compulsory:

CHROM: chromosome (number, X or Y)
POS: physical map position (bp)
ID: marker name
REF: reference allele/sequence
ALT: other alleles/sequences
QUAL: Phred-scaled quality score for the ALT allele(s)
FILTER: "PASS" if the locus passed all filters, otherwise the name(s) of the failed filter(s)
INFO: other locus information, e.g. number genotyped, allele count
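The layout above can be illustrated with a minimal Python sketch (not Sib-pair's own code). It classifies a line as meta-information ("##"), the column header ("#CHROM") or a data record, then splits a record into the eight compulsory fields and unpacks INFO into key-value pairs. The helper names line_kind and parse_fixed are hypothetical; the sample record is the first data line of the example file below, shortened to two genotype columns.

```python
# Hypothetical helpers illustrating the VCF layout; not part of Sib-pair.
FIXED_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def line_kind(line):
    """Classify a VCF line as meta-information, column header or data record."""
    if line.startswith("##"):
        return "meta"
    if line.startswith("#"):
        return "header"
    return "record"


def parse_fixed(line):
    """Split a tab-separated data record into the eight compulsory fields."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(FIXED_FIELDS, cols[:8]))
    rec["POS"] = int(rec["POS"])
    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info = {}
    if rec["INFO"] != ".":
        for item in rec["INFO"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # flags have no "=value"
    rec["INFO"] = info
    return rec


record = parse_fixed("9\t10023\t.\tCCAA\tC\t15\tPASS\tAN=758;NS=379;AC=74\tGT\t0|0\t0|0")
print(record["POS"], record["ID"], record["INFO"]["AC"])  # 10023 . 74
```

Values are left as strings except POS; a fuller reader would consult the ##INFO meta-information lines (Number=, Type=) to convert each INFO value to the right type.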

A ninth field is almost always present, and is followed by the individual IDs for the genotypes.

FORMAT: genotype format type, e.g. GT = phased or unphased genotype, DS = dosage score
<ID1>: ID of the first individual, heading the first genotype column
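How the FORMAT field governs each genotype column can be sketched as follows. This is a minimal Python illustration, assuming colon-separated sub-fields named by FORMAT, "|" between phased alleles and "/" between unphased ones, and "." for a missing value. The function parse_genotypes and the GT:DS sample values are hypothetical, and only the GT and DS sub-fields are decoded.

```python
def parse_genotypes(format_field, sample_fields):
    """Decode per-individual genotype columns using the FORMAT keys.

    Hypothetical helper for illustration; handles only GT and DS.
    """
    keys = format_field.split(":")
    decoded = []
    for sample in sample_fields:
        values = dict(zip(keys, sample.split(":")))
        gt = values.get("GT", ".")
        # "|" separates phased alleles, "/" unphased ones.
        phased = "|" in gt
        alleles = [None if a == "." else int(a)
                   for a in gt.replace("|", "/").split("/")]
        ds = values.get("DS")
        decoded.append({
            "phased": phased,
            "alleles": alleles,  # 0 = REF, 1 = first ALT, ...; None = missing
            "dosage": float(ds) if ds not in (None, ".") else None,
        })
    return decoded


genotypes = parse_genotypes("GT:DS", ["0|1:0.98", "1/1:2.0", "./.:."])
```

Here the first individual is a phased heterozygote with dosage 0.98, the second an unphased ALT homozygote, and the third entirely missing.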

Since the VCF format is designed for genome-wide sequence data, files are commonly large, so this command (unlike other Sib-pair locus commands) can read just a subset of loci, specified by interval start and end positions in base pairs. Locus names are read from the ID field; if the ID is missing ("."), a name is generated from the chromosome and map position (e.g. chr9:10023).
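Interval-restricted reading and the naming rule can be outlined in a short Python sketch; it is not Sib-pair's implementation. It streams a plain or gzip-compressed VCF line by line, so the whole file never has to be held in memory. Several assumptions are made here: the interval endpoints are inclusive, the records are sorted by position on a single chromosome (which allows an early stop once POS passes the end), and generated names follow the chr9:10023 pattern seen in the example output. The names open_vcf and loci_in_interval are hypothetical.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def loci_in_interval(lines, start=None, end=None):
    """Yield (name, position) for records with start <= POS <= end.

    Assumes records are sorted by position on one chromosome. A missing
    ID (".") is replaced by a chromosome:position name such as chr9:10023.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        chrom, pos, locus_id = line.split("\t", 3)[:3]
        pos = int(pos)
        if start is not None and pos < start:
            continue
        if end is not None and pos > end:
            break  # sorted input: nothing later can fall in the interval
        if locus_id == ".":
            prefix = chrom if chrom.startswith("chr") else "chr" + chrom
            locus_id = f"{prefix}:{pos}"
        yield locus_id, pos
```

For the example below, something like `with open_vcf("chr9.refpanel.EUR.vcf.gz") as fh: names = list(loci_in_interval(fh, 1, 100000))` would be the equivalent call. Note that this still scans the file from the top; skipping directly to an interval would need an index such as a tabix index, which this sketch does not use.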

Example:

>> file head chr9.refpanel.EUR.vcf.gz

##fileformat=VCFv4.1
##INFO=<ID=LDAF,Number=1,Type=Float,Description="MLE Allele Frequency Accounting
##INFO=<ID=AVGPOST,Number=1,Type=Float,Description="Average posterior probabilit
##INFO=<ID=RSQ,Number=1,Type=Float,Description="Genotype imputation quality from
##INFO=<ID=ERATE,Number=1,Type=Float,Description="Per-marker Mutation rate from
##INFO=<ID=THETA,Number=1,Type=Float,Description="Per-marker Transition rate fro
##INFO=<ID=SNPSOURCE,Number=.,Type=String,Description="indicates if a snp was ca
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	HG00096	HG00097
9	10023	.	CCAA	C	15	PASS	AN=758;NS=379;AC=74	GT	0|0	0|0	0|0	0|0	0|0	0|0	0|0


>> read loc vcf chr9.refpanel.EUR.vcf.gz 1 100000

Locus           Position (bp)
--------------- ----------------
chr9:10023           10023  AN=758;NS=379;AC=74 PASS CCAA C
chr9:10097           10097  AN=758;NS=379;AC=6 PASS CCA C
rs185444096          10177  AN=758;NS=379;AC=47 PASS C T
rs190296880          10192  AN=758;NS=379;AC=44 PASS T A
chr9:10288           10288  AN=758;NS=379;AC=11 PASS A AC
chr9:10299           10299  AN=758;NS=379;AC=56 PASS TA T
rs56377469           10469  AN=758;NS=379;AC=410 PASS G C
rs143946323          10690  AN=758;NS=379;AC=3 PASS G A
rs7341907            10869  AN=758;NS=379;AC=354 PASS C G
rs149305563          14665  AN=758;NS=379;AC=86 PASS G A
rs149079262          14690  AN=758;NS=379;AC=165 PASS C G
rs141156662          15883  AN=758;NS=379;AC=132 PASS A G
chr9:17614           17614  AN=758;NS=379;AC=606 PASS CT C
rs184525769          33204  AN=758;NS=379;AC=13 PASS C T
chr9:33302           33302  AN=758;NS=379;AC=19 PASS G GAT
chr9:33311           33311  AN=758;NS=379;AC=28 PASS TC T
rs2492179            39037  AN=758;NS=379;AC=489 PASS A C
rs9408135            39043  AN=758;NS=379;AC=532 PASS T C
rs188840810          39457  AN=758;NS=379;AC=25 PASS G A
...

VCF file           = chr9.refpanel.EUR.vcf.gz
Number of markers  = 174 (out of total 583105)
Number passing QC  = 174 (1.000)
Number indels      = 24
Number SVs         = 0
Number of subjects = 379
Total genotypes    = 65946
Chromosome         = 9    
Map range (bp)     = 10023 -- 99384


Declaring 174 loci.
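The summary block in the example can be reproduced in outline. "Total genotypes" is the number of markers times the number of subjects (174 × 379 = 65946), and the figure in brackets after "Number passing QC" appears to be the passing fraction (174/174 = 1.000). The Python sketch below tallies these quantities from (position, REF, ALT, FILTER) tuples. Its definitions of an indel (REF and an ALT allele differ in length) and an SV (a symbolic ALT allele such as <DEL>) are assumptions, not necessarily the rules Sib-pair applies, and summarise is a hypothetical name. The test data are the first six loci listed above.

```python
def summarise(records, n_subjects):
    """Tally summary statistics for (pos, ref, alt, filter) tuples.

    Assumed definitions, not necessarily Sib-pair's own: an SV is a record
    with a symbolic ALT allele such as <DEL>; an indel is any other record
    whose REF and an ALT allele differ in length.
    """
    n_markers = n_pass = n_indel = n_sv = 0
    positions = []
    for pos, ref, alt, filt in records:
        n_markers += 1
        if filt == "PASS":
            n_pass += 1
        alts = alt.split(",")
        if any(a.startswith("<") for a in alts):
            n_sv += 1
        elif any(len(a) != len(ref) for a in alts):
            n_indel += 1
        positions.append(pos)
    return {
        "markers": n_markers,
        "passing": n_pass,
        "pass_fraction": n_pass / n_markers if n_markers else 0.0,
        "indels": n_indel,
        "svs": n_sv,
        "genotypes": n_markers * n_subjects,  # one genotype per subject per marker
        "map_range": (min(positions), max(positions)) if positions else None,
    }


first_six = [
    (10023, "CCAA", "C", "PASS"),
    (10097, "CCA", "C", "PASS"),
    (10177, "C", "T", "PASS"),
    (10192, "T", "A", "PASS"),
    (10288, "A", "AC", "PASS"),
    (10299, "TA", "T", "PASS"),
]
summary = summarise(first_six, n_subjects=379)
```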

See also:

merge vcf: merge VCF files

