Usage

Assuming basic knowledge of PLINK, the multivariate association test can be run using the following commands:

(1) Population-based design

plink.multivariate --file ex1 --mult-pheno ex1.phen --mqfam

where the file ex1.phen contains the phenotypes to be analysed. For details on the format of this file, see the input files section. Running this analysis will create an output file named plink.mqfam.total, with the following fields:

CHR, SNP, BP, NFAM, NIND, F, P, LOADINGS

These correspond, respectively, to: chromosome code; SNP identifier; base-pair position; number of families used; number of individuals used; F-test statistic and its associated P-value; and the correlation between each individual trait and the score given by the linear combination of traits that has maximum correlation with the marker. The loadings can be used to identify the trait(s) with stronger evidence for association with the marker.
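As an illustration of how the output might be post-processed, the following Python sketch reads a file with the fields listed above and reports the SNPs below a P-value cut-off. It assumes the file is whitespace-delimited with a header row, and that every value after the seven fixed fields is a loading (one per trait); neither assumption is confirmed here, so check it against your own output. The function names and the sample rows are made up for the example.

```python
import math
from io import StringIO

FIXED_FIELDS = ["CHR", "SNP", "BP", "NFAM", "NIND", "F", "P"]

def to_float(text):
    """Convert a field to float; non-numeric entries such as NA become nan."""
    try:
        return float(text)
    except ValueError:
        return math.nan

def read_mqfam_total(handle):
    """Parse rows; anything after the seven fixed fields is treated as loadings (assumption)."""
    header = handle.readline().split()
    if header[:len(FIXED_FIELDS)] != FIXED_FIELDS:
        raise ValueError("unexpected header: %r" % header)
    rows = []
    for line in handle:
        parts = line.split()
        if len(parts) < len(FIXED_FIELDS):
            continue  # skip blank or truncated lines
        row = dict(zip(FIXED_FIELDS, parts))
        row["F"] = to_float(row["F"])
        row["P"] = to_float(row["P"])
        row["LOADINGS"] = [to_float(x) for x in parts[len(FIXED_FIELDS):]]
        rows.append(row)
    return rows

def snps_below(rows, alpha):
    """Return SNP names with P < alpha, smallest P first (nan never passes)."""
    hits = [r for r in rows if r["P"] < alpha]
    return [r["SNP"] for r in sorted(hits, key=lambda r: r["P"])]

# Invented sample rows, for illustration only.
sample = StringIO(
    "CHR SNP BP NFAM NIND F P LOADINGS\n"
    "1 snpA 1000 90 90 3.10 0.02 0.80 -0.30\n"
    "1 snpB 2000 90 90 0.40 0.75 0.10 0.20\n"
    "1 snpC 3000 90 90 NA NA NA NA\n"
)
rows = read_mqfam_total(sample)
print(snps_below(rows, 0.05))
```

To read a real file, pass open("plink.mqfam.total") in place of the StringIO object.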

To suppress the LOADINGS field from the output file (useful when there are many traits being analysed), use

plink.multivariate --file ex1 --mult-pheno ex1.phen --mqfam --mqfam-noloadings

The sign of individual LOADINGS can be used to assess whether the SNP reference allele influences different traits in the same or opposite direction, but does not necessarily indicate the right direction of effect for a given trait. For example, if phenotype T1 has a negative loading and phenotype T2 a positive loading, this indicates that the reference allele either increases T1 AND decreases T2 or, alternatively, decreases T1 AND increases T2. Both situations lead to the same canonical correlation. A SNP that shows an interesting multivariate association should be confirmed by testing it against the individual phenotypes with stronger loadings (typically an absolute value above 0.2 or 0.3) through standard univariate analyses; the sign of the regression coefficient (or the OR) from these analyses will confirm the right direction of effect for each trait.
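The interpretation rule above can be sketched in a few lines of Python. The helper names, the trait names and the loading values are hypothetical; the 0.2 threshold is the lower of the two typical values mentioned above. Note that flipping the sign of every loading gives the same answer, which is why only the relative direction between traits can be read from the loadings.

```python
def strong_traits(names, loadings, threshold=0.2):
    """Traits whose absolute loading exceeds the threshold, as candidates for univariate follow-up."""
    return [(n, l) for n, l in zip(names, loadings) if abs(l) > threshold]

def relative_direction(loading_a, loading_b):
    """Whether two traits move in the same or opposite direction with the reference allele."""
    if loading_a == 0 or loading_b == 0:
        return "undetermined"
    return "same" if (loading_a > 0) == (loading_b > 0) else "opposite"

# Invented loadings for three traits.
names = ["T1", "T2", "T3"]
loadings = [-0.6, 0.45, 0.05]

candidates = strong_traits(names, loadings)
print(candidates)
print(relative_direction(loadings[0], loadings[1]))
# Negating all loadings leaves the relative direction unchanged.
print(relative_direction(-loadings[0], -loadings[1]))
```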

By default, individuals with data missing for >50% of phenotypes are automatically removed from the analysis. This behavior can be changed with the --mqfam-find option:

plink.multivariate --file ex1 --mqfam --mult-pheno ex1.phen --mqfam-find 0

which in this case would exclude individuals with data missing for >0% of phenotypes (i.e., this is the strictest setting, in which only individuals with complete data are used).

On the other hand,

plink.multivariate --file ex1 --mqfam --mult-pheno ex1.phen --mqfam-find 0.1

would exclude individuals with data missing for >10% of phenotypes and

plink.multivariate --file ex1 --mqfam --mult-pheno ex1.phen --mqfam-find 1

would not exclude any individuals (i.e., this is the most permissive setting).
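The effect of the threshold can be illustrated with a short Python sketch. This is not the program's own code: it assumes an individual is dropped only when the fraction of missing phenotypes is strictly greater than the threshold (which matches the behaviour described for 0 and 1), and it uses -9 as the missing value code. The function names and phenotype values are made up.

```python
MISSING_CODE = -9.0  # default missing phenotype code

def missing_fraction(values, missing_code=MISSING_CODE):
    """Fraction of an individual's phenotypes that carry the missing code."""
    return sum(1 for v in values if v == missing_code) / len(values)

def keep_individual(values, max_missing=0.5, missing_code=MISSING_CODE):
    """Keep unless the missing fraction is strictly above the threshold (assumed rule)."""
    return missing_fraction(values, missing_code) <= max_missing

# Invented individual with two of four phenotypes missing.
phenos = [1.2, -9, 0.4, -9]

for threshold in (0, 0.1, 0.5, 1):
    print(threshold, keep_individual(phenos, max_missing=threshold))
```

With half of the phenotypes missing, this individual is kept at thresholds 0.5 and 1 and dropped at 0 and 0.1.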

It is often useful to know exactly which individuals were used for the analysis. To write out a file with the family and personal identifiers for all individuals used in the analysis of a specific SNP, try:

plink.multivariate --file ex1 --mqfam --mult-pheno ex1.phen --mqfam-ids marker_name

Finally, if you only want to analyse a subset of the phenotypes supplied in the phenotype file, you can either select these by name:

plink.multivariate --file ex1 --mqfam --mult-pheno ex1.phen --pheno-name T1,T3,T4

which in this case would analyse phenotypes T1, T3 and T4 supplied in the ex1.phen file or, alternatively, you can choose phenotypes based on their position in the phenotype file, for example:

plink.multivariate --file ex1 --mqfam --mult-pheno ex1.phen --pheno-number 2,5,6

which would analyse the second, fifth and sixth phenotype in the ex1.phen file. Note, however, that the order of phenotypes in the LOADINGS column (or WEIGHTS in some earlier versions) of the *.mqfam.total output file corresponds to the order of phenotypes in the phenotype file and not the order in which they were selected in the --pheno-name or --pheno-number options. For example, requesting --pheno-name T1,T3,T4 would produce loadings in the same order as --pheno-name T3,T4,T1.
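When labelling loadings in a script, it can help to derive the label order from the phenotype file rather than from the command line. The Python sketch below does this; the function names and the column list are hypothetical, and positions are taken to be 1-based counts of phenotype columns, as in the example above.

```python
def loading_order(file_columns, selected_names):
    """Selected phenotype names, returned in phenotype-file order."""
    wanted = set(selected_names)
    unknown = wanted - set(file_columns)
    if unknown:
        raise ValueError("not in phenotype file: %s" % sorted(unknown))
    return [name for name in file_columns if name in wanted]

def loading_order_by_number(file_columns, positions):
    """Same, for 1-based positions among the phenotype columns."""
    return [file_columns[i - 1] for i in sorted(set(positions))]

# Invented phenotype column names, in file order.
columns = ["T1", "T2", "T3", "T4", "T5", "T6"]

print(loading_order(columns, ["T3", "T4", "T1"]))
print(loading_order_by_number(columns, [6, 2, 5]))
```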

NOTE: it's important to make sure that PLINK knows what the missing value code is in the phenotype file. By default, this is assumed to be -9, but this can be modified using the --missing-phenotype option.

(2) Family-based design

The same options apply to the analysis of family data, except that permutation testing is required to correct for family structure. This analysis can be slow when analysing many traits and a large number of SNPs.

plink.multivariate --bfile ex1 --mult-pheno ex1.phen --mqfam --perm

This will use an adaptive permutation procedure to speed up the analysis. An additional output file named plink.mqfam.total.perm contains the empirical P-values for each SNP. When analysing family data, ignore the P-value column in the plink.mqfam.total file, as this is not corrected for family structure.
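To give an intuition for why an adaptive procedure saves time, here is a generic Python sketch of a permutation test that stops early. It is not the stopping rule used by the program, which is not described here: it is a simple sequential scheme that gives up on a SNP once a fixed number of permuted statistics have matched or exceeded the observed one. The function name, its parameters and the null generator are hypothetical, and for family data the permuted statistics would have to come from a scheme that respects family structure, which is not shown.

```python
import random

def adaptive_empirical_p(observed, draw_null_statistic, max_perms=1000, stop_after_hits=10):
    """Empirical P-value with early stopping once enough null draws reach the observed value."""
    hits = 0
    done = 0
    while done < max_perms and hits < stop_after_hits:
        done += 1
        if draw_null_statistic() >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (done + 1), done

rng = random.Random(1)
null_draw = rng.random  # stand-in null generator: uniform values in [0, 1)

# Clearly unremarkable statistic: every draw reaches it, so the loop stops after 10 permutations.
p_weak, n_weak = adaptive_empirical_p(-1.0, null_draw)
# Extreme statistic: no draw reaches it, so all 1000 permutations are spent.
p_strong, n_strong = adaptive_empirical_p(2.0, null_draw)

print(p_weak, n_weak)
print(p_strong, n_strong)
```

Most SNPs in a scan resemble the first case, so the bulk of the permutations is spent only on the few SNPs that look promising.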

With family data, instead of a Total test of association, one can also choose a Within

plink.multivariate --file ex1 --mult-pheno ex1.phen --mqfam-within --perm

or Between test

plink.multivariate --file ex1 --mult-pheno ex1.phen --mqfam-between --perm

The Within test should be robust to population stratification.