dose2plink (download)

dose2plink (download zip file if link above doesn't work)

What is it?

dose2plink is a tool for converting genotypic imputation data from Mach or Minimac dosage format to plink-dosage format for analysis in plink.
(If you want to convert mach/minimac probs files, or have imputed data in other formats, consider using fcgene or prob2plink.)

The code is written in Perl and is adapted from prob2plink.

If you use this program please cite the following:
Medland, SE dose2plink "http://genepi.qimr.edu.au/staff/sarahMe/dose2plink"

Three arguments are required:

• the name and location of a dose/mldose file (-dose or -d)
• the name and location of an info/mlinfo file (-info or -i)
      •  the dose and info files can be gzipped

• the output prefix (-out or -o)

One argument is optional:

• The plink dosage file is gzipped by default
    To produce non-gzipped output use (-gz 0)
    If you get an error saying the program can't create the pdat file, try the -gz 0 option

dose2plink produces two output files:

• pfam file eg plink_dosage.pfam
       •  Contains a plink format fam file

• pdat file eg plink_dosage.pdat.gz (or plink_dosage.pdat if you used the -gz 0 flag)
       •  Contains a plink format dosage file

What's the difference between minimac and plink dosage files?

minimac format                     plink format
-no header-                        SNP A1 A2 F1 I1 F2 I2 F3 I3
F1->I1 DOSE 0.00 0.00 1.99         rs0001 A C 0.00 1.01 0.00
F2->I2 DOSE 1.01 0.00 0.99         rs0002 G A 0.00 0.00 0.94
F3->I3 DOSE 0.00 0.94 0.00         rs0003 A C 1.99 0.99 0.00
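
The layout change in the table above (one row per individual becomes one row per SNP) can be sketched in a few lines of Python. This is only an illustration of the transposition, not the dose2plink Perl code itself: the function name `transpose_dose` and the in-memory inputs are hypothetical, and the allele labels, which do not appear in the dose rows, are passed in by hand.

```python
def transpose_dose(dose_lines, snps):
    """Turn minimac-style dose rows into plink-dosage-style rows.

    dose_lines: iterable of strings like 'F1->I1 DOSE 0.00 0.00 1.99'
                (one row per individual, one dosage per SNP).
    snps:       list of (snp_name, a1, a2) tuples in the same order as
                the dosage columns (hypothetical input for this sketch).
    """
    ids = []                       # (family ID, individual ID) for each row
    per_snp = [[] for _ in snps]   # dosages collected SNP by SNP

    for line in dose_lines:
        fields = line.split()
        fid, iid = fields[0].split("->")   # 'F1->I1' -> ('F1', 'I1')
        doses = fields[2:]                 # skip the ID and the 'DOSE' label
        if len(doses) != len(snps):
            raise ValueError(
                f"{fields[0]}: expected {len(snps)} dosages, got {len(doses)}"
            )
        ids.append((fid, iid))
        for column, dose in zip(per_snp, doses):
            column.append(dose)

    # Header: SNP, two alleles, then one FID/IID pair per individual.
    header = ["SNP", "A1", "A2"] + [x for pair in ids for x in pair]
    rows = [" ".join(header)]
    for (name, a1, a2), column in zip(snps, per_snp):
        rows.append(" ".join([name, a1, a2] + column))
    return rows


# The three individuals and three SNPs from the table above.
example_dose = [
    "F1->I1 DOSE 0.00 0.00 1.99",
    "F2->I2 DOSE 1.01 0.00 0.99",
    "F3->I3 DOSE 0.00 0.94 0.00",
]
example_snps = [("rs0001", "A", "C"), ("rs0002", "G", "A"), ("rs0003", "A", "C")]

for row in transpose_dose(example_dose, example_snps):
    print(row)
```

Holding every dosage in memory keeps the sketch short; for real imputation chunks a streaming or chunked approach would likely be needed.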

Example usage:

./dose2plink.pl -dose chunk1.21.imputed.dose.gz -info chunk1.21.imputed.info.gz -out chunk1.21
       •  Will produce chunk1.21.pfam and chunk1.21.pdat.gz

./dose2plink.pl -dose chunk1.21.imputed.dose.gz -info chunk1.21.imputed.info.gz -gz 0 -out chunk1.21
       •  Will produce chunk1.21.pfam and chunk1.21.pdat
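
If you have many chunks, the same command can be built in a loop. The following is a minimal Python sketch, assuming the input files follow the `<prefix>.imputed.dose.gz` / `<prefix>.imputed.info.gz` naming used in the examples above; the helper name `dose2plink_command` and the chunk prefixes are hypothetical. Only the flags described on this page are used.

```python
import shlex


def dose2plink_command(prefix, gz=True, script="./dose2plink.pl"):
    """Build the argument list for one dose2plink run.

    Assumes inputs named <prefix>.imputed.dose.gz and
    <prefix>.imputed.info.gz (an assumption taken from the examples).
    """
    cmd = [
        script,
        "-dose", f"{prefix}.imputed.dose.gz",
        "-info", f"{prefix}.imputed.info.gz",
    ]
    if not gz:
        cmd += ["-gz", "0"]        # ask for a non-gzipped pdat file
    cmd += ["-out", prefix]
    return cmd


chunks = ["chunk1.21", "chunk2.21"]   # hypothetical chunk prefixes
for prefix in chunks:
    # Print the command; to run it instead, pass the list to
    # subprocess.run(..., check=True).
    print(shlex.join(dose2plink_command(prefix)))
```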

Using these files in plink:

For gzipped files:
  plink --noweb --dosage chr21.pdat.gz format=1 Z --fam chr21.pfam --score example.score --out example

For non-gzipped files:
  plink --noweb --dosage chr21.pdat format=1 --fam chr21.pfam --score example.score --out example

Merging files across chunks:

Plink can merge files on the fly:
plink --dosage myfile.lst list --fam mydata.fam
where myfile.lst is a list of file names (full paths can be specified if the dosage files are in different directories), e.g.
       chr1.dose
       chr2.dose
       chr3.dose

Alternatively, if the same individuals are in all files and in the same order, you can cat the files together, remembering to remove the headers from all but the first file.
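
Concatenating while dropping the repeated headers could look like the Python sketch below. It assumes every file starts with a single header line and refuses to continue if a later header differs from the first, which is a simple guard against files whose individuals are in a different order. The function names `open_text` and `concat_pdat` are hypothetical.

```python
import gzip


def open_text(path, mode):
    """Open a plain or gzipped file in text mode, based on the .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def concat_pdat(paths, out_path):
    """Concatenate dosage files, keeping only the first file's header."""
    first_header = None
    with open_text(out_path, "w") as out:
        for path in paths:
            with open_text(path, "r") as handle:
                header = handle.readline()
                if first_header is None:
                    first_header = header
                    out.write(header if header.endswith("\n") else header + "\n")
                elif header.split() != first_header.split():
                    # Different individuals or a different order: do not merge.
                    raise ValueError(f"{path}: header differs from the first file")
                for line in handle:
                    # Guard against a missing newline at the end of a file.
                    out.write(line if line.endswith("\n") else line + "\n")


# Example call (file names are placeholders):
# concat_pdat(["chunk1.21.pdat.gz", "chunk2.21.pdat.gz"], "chr21.pdat.gz")
```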

A couple of words of caution about using plink dosage files to make polygenic risk scores:

1. When using hard-calls/genotyped SNPs as input, plink will issue a warning if the allele referred to in the score file is not present in the observed data (i.e. if there is a strand flip on a non-ambiguous SNP).
When using dosage as input, plink does not issue these warnings, and all individuals will receive a score of 0 for these SNPs.
Thus, the user needs to manually strand-align the score files with the genotypes before calculating the polygenic risk scores.

2. When using hard-calls/genotyped SNPs as input, plink calculates the score and then divides it by the number of SNPs used to create that score (on a per-individual level).
When using dosage as input, plink does not divide the total score by the number of SNPs. Thus the scores look much larger than those from hard-call data.
If you imputed using mach/minimac this should not be a problem, as their dose files do not contain any missingness (i.e. you would be dividing by the same number for all individuals).
If you used IMPUTE/IMPUTE2 or BEAGLE you may have missingness in your dosage data and you should proceed with great caution.