After downloading the R source code, open R and load MULTILINK using:
To run any analysis, you need to call the loaded multilink() function, which requires 3 mandatory arguments, the name of the .obs file, the name of the .null file and what type of analysis to perform. The following 7 types of analyses are available:
input Read data in, no analysis performed
pwise For each marker and trait, estimate empirical univariate pointwise significance
gwide For each marker and trait, estimate empirical univariate genome-wide significance
(ie. corrected for number of markers tested)
pwise-corr Same as pwise, but corrected for the number of traits tested
gwide-corr Same as gwide, but corrected for the number of traits tested
pwise-sum For each marker, estimate empirical multivariate pointwise significance
gwide-sum For each marker, estimate empirical multivariate genome-wide significance
For example, using the files available for download in the input section, we can run the following command to check that the input files are correctly formatted:
After the program banner, the log file would have the following output:
[1] "Analysis started on Mon Mar 16 17:34:44 2009" [1] "Options in effect:" [1] " analysis: input" [1] " obsfile: multilink.obs" [1] " nullfile: multilink.null" [1] " analysis: input" [1] " prefix: multilink" [1] "Loading obs and null linkage results" [1] "Read observed linkage results for 7 traits and 1796 markers from [ multilink.obs ]" [1] "Read null linkage results for 100 simulation replicates from [ multilink.null ]" [1] "There are 1796 markers in each replicate" [1] "Analysis finished on Mon Mar 16 17:34:45 2009"
If there are any issues with how the files are formatted, MULTILINK should give an error. If not, one can proceed to estimate an empirical pointwise (ie. not corrected for the number of markers tested) P-value for each marker and trait, based on the supplied results obtained under the null hypothesis of no linkage. This can be requested with the "pwise" analysis option:
By default, all markers in the .obs file will be analysed, with results written to multilink.pwise. The output file contains empirical P-values for each marker and trait:
[first 5 lines of multilink.pwise] CHR MARKER POSITION T1 T2 T3 T4 T5 T6 T7 1 M1 0 0.62619 0.94766 0.59483 0.24552 0.04275 0.29179 1 1 M2 2 0.62619 0.94766 0.59483 0.23572 0.04275 0.29179 1 1 M3 4 0.62619 0.94766 0.59483 0.23572 0.04275 0.29179 1 1 M4 6 0.62619 0.946 0.62484 0.24552 0.04238 0.2867 1 1 M5 8 0.62619 0.93927 0.62484 0.24552 0.04135 0.27068 0.46724
To analyse only a particular chromosome and change the prefix of the output file, use for example:
In this case, analysis would be restricted to chromosome 17 markers, and results would be written to the file chr17.pwise. Analysis can also be restricted to a specific marker, for example:
Inspecting the corresponding output for this example:
CHR MARKER POSITION T1 T2 T3 T4 T5 T6 T7 20 M1724 102 0.08749 0.00319 0.00041 2e-05 0.00266 0.10998 1
suggests that T4 shows some evidence for linkage with marker M1724 (P=2e-5), as do T1, T2, T3, T5 and T6, although to a less extent.
The output file for the gwide, pwise-corr or gwide-corr analyses looks identical, except that P-values for each trait are corrected for the number of markers tested and/or the number of traits tested. For example, to correct the P-values at marker M1724 for the number of traits tested, we would run:
Which would give the following results:
CHR MARKER POSITION T1 T2 T3 T4 T5 T6 T7 20 M1724 102 0.4082 0.02031 0.00271 0.00013 0.0171 0.48474 1
In other words, after correcting for the number of traits tested, the evidence for linkage between trait 5 and M1724 dropped from P=0.00002 to P=0.00013.
In this example, there was some suggestion that up to 6 traits could be linked to M1724, albeit to a different degree. One may want to assess whether such multi-phenotype linkage was due to a potential underlying pleiotropic locus or simply due to chance. The pwise-sum and gwide-sum options can be used to try to distinguish between these two alternatives. Specifically, they will aggregate the evidence for linkage at a given marker across the 2 best traits, 3 best traits, etc, and estimate how often such aggregate signals are observed by chance alone, taking into account the number of and correlation between the traits tested. For example, to aggregate the evidence for linkage at marker M1724 across the 7 traits tested, we would run:
For this analysis, MULTILINK would first combine the evidence for linkage (ie. P-value) when considering only the best trait at this marker [S1=-log10(2e-5)], the 2 best traits [S2=-log10(2e-5) + -log10(0.00041)], the 3 best traits [S3=-log10(2e-5) + -log10(0.00041) + -log10(0.00266)], etc. Then, it would estimate how often the computed S1, S2, S3, ..., S7 sum statistcs would occur by chance alone, using results from the simulated datasets. If the observed linkage between T1-T6 and M1724 was coincidental (eg. as a result of a their residual phenotypic correlation), then we would expect to see similarly high Sk statistics when analysing the datasets simulated under the null hypothesis of no linkage. On the other hand, if the significance of these Sk statistics decreases as more traits are considered, then it provides some evidence that their aggregate linkage is less likely to occur by chance alone, which is more consistent with the hypothesis of an underlying pleiotropic QTL.
Results for this analysis would be:
CHR MARKER POSITION PMIN KMIN S1 S2 S3 S4 S5 S6 S7 20 M1724 102 6e-05 6 0.00016 6e-05 7e-05 5e-05 5e-05 4e-05 4e-05
The output shows the evidence for linkage when considering only the best trait (S1, P=0.00016), the 2 best traits (S2, P=0.00006), 3 best traits (S3, P=0.00007), etc. In this example, the overall evidence for linkage improves as more traits are added to the aggregate signal, reaching a minimum (P=0.00004) when considering the best 6 traits (ie KMIN = 6). Since 7 sum statistics were calculated to identify the most significant multivariate combination (S6), the significance of this (P=0.00004) needs to be corrected for multiple testing. The corresponding corrected P-value for S6 is Pmin = 0.00006, which is the P-value one would report for the multivariate analysis.