Sib-pair Command: update

Class      Analysis and data manipulation command
Name       update | merge
Arguments  [<locus1>...<locusN>] <phenotype_file_name> [lines_to_skip]
           fimpute <locus_file_name> <genotype_file_name> [ped_id]
           genotypes <genotype_file_name>
           probabilities <mergekey>|pedid|id |
           mac <datfile> <pedfile> [<thresh>]
           plink <file_name_root> [join|compress] [id] [pos]
           vcf <file_name> [ped_id] [<line_length>]

Updates phenotype/genotype data in the current dataset using values read from a file. The first line of the default file format gives the names of the variables that are included in the subsequent lines. Usually, the first column is the pedigree_ID and is named ped[igree]; the second column is the individual_ID, and is named id.

Alternatively, the pedigree ID can be omitted, in which case the merge assumes that the individual IDs are all unique. If non-unique IDs are present in the pedigree then a warning is given, even if those IDs are not going to be updated. If duplicate IDs are present, only the first matching record is updated.

The remaining columns should have names that match locus names in the current dataset (data for nonmatching names are skipped, and it is assumed each of these takes up only one field).

ped id locus_name1 locus_name2 locus_name3 ...
1    1  A/A         12.434      y ...
1    2  A/B           x         n ...
...

Where the pedigree and individual IDs for a record in the update file match those of an active individual in the current dataset, the corresponding phenotype and genotype values for that individual are updated using the values read from the file.

If merge was called, only missing values in the current dataset are updated. All values are overwritten if the call was to update.

By specifying locus names on the command line, you can further control the loci that are updated from the new file.
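
The difference between update and merge, and the effect of naming loci on the command line, can be sketched in Python. This is only an illustration of the rules described above, not Sib-pair's own code: the function name apply_records, the dictionary layout, and the treatment of "x" and "x/x" as the missing-value codes (taken from the example on this page) are all assumptions.

```python
# Illustrative sketch only; names and data layout are hypothetical.
MISSING_CODES = {"x", "x/x"}  # assumed missing codes, as in the example on this page


def apply_records(dataset, records, mode="update", loci=None):
    """Apply file records to a dataset.

    dataset and records both map (ped_id, ind_id) -> {locus_name: value}.
    mode is "update" (overwrite every value) or "merge" (fill missing only).
    loci, if given, restricts the change to the named loci.
    """
    for key, new_values in records.items():
        current = dataset.get(key)
        if current is None:
            continue  # no matching individual in the current dataset
        for locus, value in new_values.items():
            if locus not in current:
                continue  # names that match no locus are skipped
            if loci is not None and locus not in loci:
                continue  # locus not requested on the command line
            if mode == "merge" and current[locus] not in MISSING_CODES:
                continue  # merge leaves observed values untouched
            current[locus] = value
    return dataset
```

With mode="merge", an individual whose genotype at a is already A/A keeps it, whereas with mode="update" the value from the file replaces it.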

The merge genotypes option reads files containing one record per individual per locus genotype (i.e. the fields are individual_ID, locus name, genotype):

1    locus1  A/A
1    locus2  1/2
...
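
As a rough illustration of this one-record-per-genotype layout, the following Python sketch groups such records by individual ID. The function name read_genotype_records is hypothetical, and skipping lines that do not have exactly three fields is an assumption of the sketch, not documented Sib-pair behaviour.

```python
# Illustrative sketch only; not Sib-pair's own parser.
def read_genotype_records(lines):
    """Group 'individual_ID locus_name genotype' records by individual ID."""
    by_id = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # assumption: ignore blank or malformed lines
        ind_id, locus, genotype = fields
        by_id.setdefault(ind_id, {})[locus] = genotype
    return by_id
```

Applied to the two records shown above, this yields a single entry for individual 1 holding genotypes for locus1 and locus2.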

The merge probabilities option reads SNP genotypic probability files from Beagle and Impute, converting to the most likely genotype. The merge can be on values of a numeric key (stored as a trait value), or ID.
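
Converting genotypic probabilities to the most likely genotype amounts to picking the genotype with the largest probability. The Python sketch below shows that step for one SNP. The function name, the argument order (P(AA), P(AB), P(BB)) and the tie-breaking rule (first maximum wins) are assumptions made for illustration; Sib-pair's actual handling of ties or low-confidence calls is not described here.

```python
# Illustrative sketch only; tie-breaking and allele labels are assumptions.
def most_likely_genotype(probs, allele_a="A", allele_b="B"):
    """Return the genotype with the highest probability.

    probs is a sequence of three values: P(AA), P(AB), P(BB).
    """
    labels = (
        f"{allele_a}/{allele_a}",
        f"{allele_a}/{allele_b}",
        f"{allele_b}/{allele_b}",
    )
    best = max(range(3), key=lambda i: probs[i])  # first maximum wins on ties
    return labels[best]
```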

The merge mach option reads SNP allelic dosages from a pedigree file prepared using MaCH or minimac. Both the pedigree file and the datfile that gives the locus names must be named on the command line. A "hard" threshold is used to convert dosages to genotypes, and can be set by appending a <thresh> value to the command. The default threshold value is 0.5, giving cutpoints of 0.5 and 1.5 (the MaCH dosage scores take values 0-2).
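
The hard-threshold conversion can be sketched as follows. With the default of 0.5 the cutpoints are 0.5 and 1.5, as stated above. How other threshold values map to cutpoints is not spelled out here, so the sketch assumes cutpoints of thresh and 2 - thresh, and assumes that a dosage lying exactly on a cutpoint is called as a heterozygote. Both choices, and the function name, are hypothetical.

```python
# Illustrative sketch only; cutpoint rule and boundary handling are assumptions.
def dosage_to_copies(dosage, thresh=0.5):
    """Convert an allelic dosage in the range 0-2 to a copy count of 0, 1 or 2."""
    lower, upper = thresh, 2.0 - thresh  # 0.5 and 1.5 for the default threshold
    if dosage < lower:
        return 0  # homozygous for the other allele
    if dosage <= upper:
        return 1  # heterozygote
    return 2  # homozygous for the counted allele
```

For example, a dosage of 0.2 is called as 0 copies, 1.0 as 1 copy, and 1.8 as 2 copies.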

The merge plink option reads the supplied PLINK .bed, .bim and .fam files and appends the new SNP genotypes to the existing pedigree data. If the join modifier is present, all new loci in the .bim file are automatically declared using the .bim data. The id argument specifies whether the merge is done on pedigree and individual IDs or just individual ID. The compress option activates 4-bit-per-genotype storage for the data being read in (the additional bits over those used by PLINK mark whether a genotype is observed or imputed). If the pos argument is present, the loci in the .bim file are matched by map position to those in the current dataset.

The merge vcf option reads SNP genotypes from a VCF file. The ped_id specifies whether the merge is done on pedigree and individual IDs or just individual ID. To compare genotypes in the present dataset to those in a VCF file, use the test vcf command.

The merge fimpute option merges in genotypes from an FImpute genotype file (both the FImpute locus file and genotype file need to be specified). The ped_id specifies whether the merge is done on pedigree and individual IDs or just individual ID. To compare genotypes in the present dataset to those in an FImpute file, use the test fimpute command.

Example:

>> set loc a mar
>> read cases inline
1 x/x
2 A/A
3 A/B
4 B/B
5 x/x
;;;;
>> run
>> set loc b qua
>> #
>> # Make a suitable phenotypes file for updating
>> #
>> output update.dat
>> echo #
>> echo # Example update dataset
>> echo # 
>> echo ped id b a z
>> echo 1 1 1 C/C 9
>> echo 2 2 2 C/C 9
>> echo #
>> echo # Person 3
>> echo #
>> echo 3 3 3 A/B x
>> echo 4 4 4 B/B x
>> echo 5 5 5 C/C 9
>> echo 6 6 6 Z/Z 9
>> output
>> 
>> echo Contents of Update file:
>> file cat update.dat
>> 
>> echo
>> echo Original file
>> echo
>> write
>> update a update.dat
>> echo
>> echo After updating a
>> echo
>> write
!
!       S
!       e
!       x    a        b
!
1 1 x x x   x/x       x
2 2 x x x   A/A       x
3 3 x x x   A/B       x
4 4 x x x   B/B       x
5 5 x x x   x/x       x

>> update a b update.dat
>> echo
>> echo After updating a and b
>> echo
>> write
!
!       S
!       e
!       x    a        b
!
1 1 x x x   C/C      1.0000
2 2 x x x   C/C      2.0000
3 3 x x x   A/B      3.0000
4 4 x x x   B/B      4.0000
5 5 x x x   C/C      5.0000

>> set locus z qua
>> update z update.dat
>> file delete update.dat

