Class | Data Declaration command |
Name | read locus vcf |
Arguments | <VCF file name> [<start_position> [<end_position>]]. |
Read locus names, types and map positions from a VCF file. Reading can be restricted to a subset of loci within an interval (specified in base pairs). The related file vcf command prints locus information without reading it into Sib-pair, and can additionally search on locus names. This allows the user to refine the interval to be read in.
A VCF style file contains one record per locus. These are preceded by meta-information about the data fields, identifying and documenting the key-value pairs within each field. After the meta-information, there is a header line describing the columns (fields) of data. The first eight fields are compulsory and are:
CHROM: chromosome number, X, Y |
POS: physical map position (bp) |
ID: marker name |
REF: reference allele/sequence |
ALT: other alleles/sequences |
QUAL: quality metric for genotypes |
FILTER: usually "PASS" or "FAIL" |
INFO: other locus information eg number genotyped, allele count |
A ninth field is almost always present, and is followed by the individual IDs for the genotypes.
FORMAT: genotype format type, e.g. GT=phased or unphased genotype, DS=dosage score etc |
<ID1>: first genotype column ID header |
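The column layout described above can be sketched in a few lines of Python. This is an illustration only, not Sib-pair's own parser: the function name split_record and the returned dictionary are hypothetical, and the sample line is the first data record from the example further down, truncated to two genotype columns and assumed to be tab-delimited.

```python
# Names of the eight compulsory VCF fields, in file order.
FIXED_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def split_record(line):
    """Split one tab-delimited VCF data line into its named parts.

    Hypothetical helper for illustration; not part of Sib-pair.
    """
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_FIELDS, cols[:8]))
    record["POS"] = int(record["POS"])            # map position in bp
    # The ninth field (FORMAT) is optional; genotype columns follow it.
    record["FORMAT"] = cols[8] if len(cols) > 8 else None
    record["genotypes"] = cols[9:]
    return record


# First data record of the example file, truncated to two genotype columns.
line = "9\t10023\t.\tCCAA\tC\t15\tPASS\tAN=758;NS=379;AC=74\tGT\t0|0\t0|0\n"
rec = split_record(line)
print(rec["CHROM"], rec["POS"], rec["ID"], rec["REF"], rec["ALT"])
print(rec["FORMAT"], rec["genotypes"])
```

A record with no FORMAT field yields FORMAT set to None and an empty genotype list, which mirrors the "almost always present" wording above.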
Since the VCF format is designed for genome-wide sequence data, files are commonly large, so the command (unlike other Sib-pair locus commands) offers the option of reading just a subset of loci, by specifying the start and end positions of an interval in base pairs. Locus IDs are read from the "ID" field; if the ID is missing, a name is generated from the map position (as with chr9:10023 in the example below).
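The interval restriction and the fallback naming can be sketched as follows in Python. This is a rough illustration under stated assumptions, not Sib-pair's implementation: the function names are hypothetical, both interval end points are assumed to be inclusive, and the chr<chromosome>:<position> pattern is inferred from the example output rather than from a documented rule. QUAL is given as "." (missing) for records whose quality value is not shown in the example.

```python
def locus_name(chrom, pos, vcf_id):
    """Use the ID field when present, else build a name from the position.

    The "chr<CHROM>:<POS>" pattern is an assumption based on the example output.
    """
    if vcf_id in (".", ""):
        return f"chr{chrom}:{pos}"
    return vcf_id


def loci_in_interval(lines, start=None, end=None):
    """Yield (name, position) for data lines with start <= POS <= end.

    Inclusive bounds are an assumption; None leaves that side unbounded.
    """
    for line in lines:
        if line.startswith("#"):          # meta-information or header line
            continue
        chrom, pos, vcf_id = line.split("\t", 3)[:3]
        pos = int(pos)
        if start is not None and pos < start:
            continue
        if end is not None and pos > end:
            continue
        yield locus_name(chrom, pos, vcf_id), pos


# Three records taken from the example; QUAL "." is a placeholder where unknown.
sample = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "9\t10023\t.\tCCAA\tC\t15\tPASS\tAN=758;NS=379;AC=74",
    "9\t10177\trs185444096\tC\tT\t.\tPASS\tAN=758;NS=379;AC=47",
    "9\t33204\trs184525769\tC\tT\t.\tPASS\tAN=758;NS=379;AC=13",
]
selected = list(loci_in_interval(sample, 1, 20000))
print(selected)
```

With the interval 1 to 20000, the third record (position 33204) is skipped, and the first record, which has no ID, receives a position-based name.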
Example:
>> file head chr9.refpanel.EUR.vcf.gz
##fileformat=VCFv4.1
##INFO=<ID=LDAF,Number=1,Type=Float,Description="MLE Allele Frequency Accounting
##INFO=<ID=AVGPOST,Number=1,Type=Float,Description="Average posterior probabilit
##INFO=<ID=RSQ,Number=1,Type=Float,Description="Genotype imputation quality from
##INFO=<ID=ERATE,Number=1,Type=Float,Description="Per-marker Mutation rate from
##INFO=<ID=THETA,Number=1,Type=Float,Description="Per-marker Transition rate fro
##INFO=<ID=SNPSOURCE,Number=.,Type=String,Description="indicates if a snp was ca
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097
9 10023 . CCAA C 15 PASS AN=758;NS=379;AC=74 GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0

>> read loc vcf chr9.refpanel.EUR.vcf.gz 1 100000

Locus           Position (bp)
--------------- ----------------
chr9:10023      10023  AN=758;NS=379;AC=74   PASS  CCAA  C
chr9:10097      10097  AN=758;NS=379;AC=6    PASS  CCA   C
rs185444096     10177  AN=758;NS=379;AC=47   PASS  C     T
rs190296880     10192  AN=758;NS=379;AC=44   PASS  T     A
chr9:10288      10288  AN=758;NS=379;AC=11   PASS  A     AC
chr9:10299      10299  AN=758;NS=379;AC=56   PASS  TA    T
rs56377469      10469  AN=758;NS=379;AC=410  PASS  G     C
rs143946323     10690  AN=758;NS=379;AC=3    PASS  G     A
rs7341907       10869  AN=758;NS=379;AC=354  PASS  C     G
rs149305563     14665  AN=758;NS=379;AC=86   PASS  G     A
rs149079262     14690  AN=758;NS=379;AC=165  PASS  C     G
rs141156662     15883  AN=758;NS=379;AC=132  PASS  A     G
chr9:17614      17614  AN=758;NS=379;AC=606  PASS  CT    C
rs184525769     33204  AN=758;NS=379;AC=13   PASS  C     T
chr9:33302      33302  AN=758;NS=379;AC=19   PASS  G     GAT
chr9:33311      33311  AN=758;NS=379;AC=28   PASS  TC    T
rs2492179       39037  AN=758;NS=379;AC=489  PASS  A     C
rs9408135       39043  AN=758;NS=379;AC=532  PASS  T     C
rs188840810     39457  AN=758;NS=379;AC=25   PASS  G     A
...

VCF file          = chr9.refpanel.EUR.vcf.gz
Number of markers = 174 (out of total 583105)
Number passing QC = 174 (1.000)
Number indels     = 24
Number SVs        = 0
Number of subjects = 379
Total genotypes   = 65946
Chromosome        = 9
Map range (bp)    = 10023 -- 99384

Declaring 174 loci.
See also:
merge vcf | merge VCF file |