dose2plink is a tool for converting imputed genotype data from Mach or Minimac dosage format to plink dosage format for analysis in plink.
(If you want to convert mach/minimac probs files or have other imputed data formats, consider using fcgene or prob2plink.)
The code is written in Perl and is adapted from prob2plink.
If you use this program please cite the following:
Medland, SE dose2plink "http://genepi.qimr.edu.au/staff/sarahMe/dose2plink"
The program requires:
• the name and location of a dose/mldose file (-dose or -d)
• the name and location of an info/mlinfo file (-info or -i)
      •  the dose and info files can be gzipped
• the output prefix (-out or -o)
• The output plink dosage file is gzipped by default
If you want to produce non-gzipped files, use -gz 0.
If you get an error saying the program can't create the pdat file, try using the -gz 0 option.
The program produces two files:
• pfam file, e.g. plink_dosage.pfam
       •  Contains a plink format fam file
• pdat file, e.g. plink_dosage.pdat.gz (or plink_dosage.pdat if you used the -gz 0 flag)
       •  Contains a plink format dosage file
minimac format                         plink format
-no header-                              SNP A1 A2 F1 I1 F2 I2 F3 I3
F1->I1 DOSE 0.00 0.00 1.99         rs0001 A C 0.00 1.01 0.00
F2->I2 DOSE 1.01 0.00 0.99         rs0002 G A 0.00 0.00 0.94
F3->I3 DOSE 0.00 0.94 0.00         rs0003 A C 1.99 0.99 0.00
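The conversion shown above is essentially a transposition: one row per individual becomes one row per SNP. The following Python sketch illustrates the idea only; it is not the dose2plink Perl code. The function name dose_to_plink is hypothetical, and the SNP names and alleles (which dose2plink reads from the info file) are assumed to be passed in as a simple list.

```python
def dose_to_plink(dose_lines, snps):
    """Transpose minimac-style dose rows (one per individual) into
    plink dosage rows (one per SNP).

    dose_lines: iterable of strings such as "F1->I1 DOSE 0.00 0.00 1.99"
    snps: list of (snp_id, allele1, allele2) tuples in dose-column order
          (assumed here; the real tool takes these from the info file)
    """
    ids = []      # (family ID, individual ID) pairs, in input order
    columns = []  # one list of dosage strings per individual
    for line in dose_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fid, iid = fields[0].split("->")
        dosages = fields[2:]  # drop the ID field and the "DOSE" label
        if len(dosages) != len(snps):
            raise ValueError(f"{fields[0]}: expected {len(snps)} dosages")
        ids.append((fid, iid))
        columns.append(dosages)

    # Header: SNP A1 A2 followed by one FID IID pair per individual
    header = ["SNP", "A1", "A2"] + [x for pair in ids for x in pair]
    rows = [" ".join(header)]
    # One output row per SNP, with one dosage per individual
    for j, (snp, a1, a2) in enumerate(snps):
        rows.append(" ".join([snp, a1, a2] + [col[j] for col in columns]))
    return rows


if __name__ == "__main__":
    dose = [
        "F1->I1 DOSE 0.00 0.00 1.99",
        "F2->I2 DOSE 1.01 0.00 0.99",
        "F3->I3 DOSE 0.00 0.94 0.00",
    ]
    snps = [("rs0001", "A", "C"), ("rs0002", "G", "A"), ("rs0003", "A", "C")]
    print("\n".join(dose_to_plink(dose, snps)))
```

Run on the three example rows, this reproduces the plink format column of the table. Dosages are kept as strings so the values are copied through unchanged.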
./dose2plink.pl -dose chunk1.21.imputed.dose.gz -info chunk1.21.imputed.info.gz -out chunk1.21
       •  Will produce chunk1.21.pfam and chunk1.21.pdat.gz
./dose2plink.pl -dose chunk1.21.imputed.dose.gz -info chunk1.21.imputed.info.gz -gz 0 -out chunk1.21
       •  Will produce chunk1.21.pfam and chunk1.21.pdat
For gzipped files:
  plink --noweb --dosage chr21.pdat.gz format=1 Z --fam chr.21.pfam --score example.score --out example
For non-gzipped files:
  plink --noweb --dosage chr21.pdat format=1 --fam chr.21.pfam --score example.score --out example
Plink can merge files on the fly:
plink --dosage myfile.lst list --fam mydata.fam
where myfile.lst is a list of file names (full paths can be specified if the dosage files are in different directories), e.g.
       chr1.dose
       chr2.dose
       chr3.dose
Alternatively, if the same individuals are in all files and the order of individuals is the same, you can cat the files together, remembering to remove the headers on the subsequent files.
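If you prefer to script the concatenation, a minimal Python sketch might look like the following. It is an illustration, not part of dose2plink: the function name concat_dosage_files is hypothetical, each file is assumed to start with a single header line, and files ending in .gz are assumed to be gzipped. The header comparison is a simple check that the individuals are the same and in the same order.

```python
import gzip


def concat_dosage_files(paths, out_path):
    """Concatenate plink dosage files, keeping only the first header."""

    def open_text(path, mode):
        # Read or write gzipped text when the name ends in .gz
        if path.endswith(".gz"):
            return gzip.open(path, mode + "t")
        return open(path, mode)

    first_header = None
    with open_text(out_path, "w") as out:
        for path in paths:
            with open_text(path, "r") as src:
                header = src.readline().split()
                if first_header is None:
                    first_header = header
                    out.write(" ".join(header) + "\n")
                elif header != first_header:
                    # Different individuals, or a different order
                    raise ValueError(f"{path}: header differs from first file")
                for line in src:
                    # Guard against a missing final newline
                    out.write(line if line.endswith("\n") else line + "\n")
```

For example, concat_dosage_files(["chr1.pdat.gz", "chr2.pdat.gz"], "all.pdat.gz") would write one gzipped file with a single header (the file names here are placeholders).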
1. When using hard-calls/genotyped SNPs as input, plink will issue a warning if the allele referred to in the score file is not present in the observed data (i.e. if there is a strand flip on a non-ambiguous SNP).
When using dosage as input, plink does not issue these warnings and all individuals will receive a score of 0 for these SNPs.
Thus, the user needs to manually strand-align the score files with the genotypes before calculating the polygenic risk scores.
2. When using hard-calls/genotyped SNPs as input, plink calculates the score and then divides it by the number of SNPs used to create that score (on a per-individual level).
When using dosage as input, plink does not divide the total score by the number of SNPs. Thus the scores look much larger than those you get from hard-call data.
If you imputed using mach/minimac this should not be a problem, as their dose files do not contain any missingness (i.e. you would be dividing by the same number for all individuals).
If you used IMPUTE/IMPUTE2 or BEAGLE you may have missingness in your dosage data and you should proceed with caution.
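Both caveats can be checked outside plink before scoring. The Python sketch below is illustrative only and is not part of dose2plink or plink. The helper names align_score_allele and dosage_score are hypothetical, alleles are assumed to be single A/C/G/T letters, dosages are assumed to count the scored allele, and a missing dosage is represented as None.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def align_score_allele(score_allele, a1, a2):
    """Return (allele, status) for one SNP in the score file.

    status is "ok", "flipped", "ambiguous", "mismatch" or "unsupported".
    """
    alleles = {a1.upper(), a2.upper()}
    allele = score_allele.upper()
    if any(x not in COMPLEMENT for x in alleles | {allele}):
        return None, "unsupported"  # e.g. indels or missing codes
    if alleles == {COMPLEMENT[x] for x in alleles}:
        # A/T and C/G SNPs look the same on both strands
        return (allele if allele in alleles else None), "ambiguous"
    if allele in alleles:
        return allele, "ok"
    flipped = COMPLEMENT[allele]
    if flipped in alleles:
        return flipped, "flipped"  # strand flip on a non-ambiguous SNP
    return None, "mismatch"


def dosage_score(weights, dosages):
    """Return (total, n_used, mean) for one individual.

    weights: dict of SNP -> score weight
    dosages: dict of SNP -> dosage, or None if missing
    """
    total = 0.0
    n_used = 0
    for snp, weight in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            continue  # missing dosages do not contribute
        total += weight * dosage
        n_used += 1
    mean = total / n_used if n_used else float("nan")
    return total, n_used, mean
```

With no missingness, n_used is the same for every individual, so the total and the mean differ only by a constant factor. With missing dosages, n_used varies between individuals and the two are no longer interchangeable.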