MULTILINK will not perform linkage analyses for you. What it will do is correct linkage results generated by dedicated linkage software (eg Merlin) for the number of traits tested, or combine results across different traits to obtain overall evidence for linkage between a marker and multiple traits.
Therefore, before you run MULTILINK, you first need to analyse your phenotypes individually with appropriate linkage programs. There are two steps you need to carry out prior to running MULTILINK.
First, analyse each individual phenotype with an appropriate linkage method, as you would normally do for a standard univariate analysis. For example, for a disease trait, you may want to run the --npl analysis in Merlin, whereas for a quantitative trait you may want to use Merlin-regress or variance components analysis.
Second, repeat the exact same analysis for each trait, but use a simulated marker set rather than the real dataset. The simulated genome-scan matches the real one in number of individuals, number of markers, marker spacing, information content, etc. However, because the markers were simulated under the null hypothesis of no linkage and then "gene-dropped" through each pedigree, there is no linkage between any marker and your traits. You can use Merlin to simulate such datasets, as is normally done to estimate empirical significance in standard univariate analyses.
Once these steps have been performed, two types of files are needed:
(1) OBS file (download example: multilink.obs)
This file should contain the results for the univariate linkage analysis of the real dataset performed outside MULTILINK. There are three mandatory columns, followed by a column for each trait tested:
CHR: chromosome number
MARKER: marker name
POSITION: marker position
TRAIT_1: univariate linkage analysis statistic for TRAIT_1
TRAIT_2: univariate linkage analysis statistic for TRAIT_2
...
TRAIT_N: univariate linkage analysis statistic for TRAIT_N
This file *must* have a header row, starting with "CHR MARKER POSITION", followed by the name of each phenotype tested. MULTILINK checks that the first three names match these headers exactly (the comparison is not case sensitive). Phenotype names cannot include spaces.
The results for each trait should be supplied as a LOD score, chi-square, -log10(P-value), etc, ie. a statistic for which a larger value implies stronger evidence for linkage. If you supply results as P-values (ie. lower means stronger), you must use the option inputP (see usage).
The first five data lines of the multilink.obs file (see link above) look like this:
[contents of multilink.obs]
CHR MARKER POSITION T1 T2 T3 T4 T5 T6 T7
1 M1 1 -0.03 -0.58 -0.02 0.1 0.8724 0.0534 0
1 M2 2 -0.03 -0.58 -0.02 0.11 0.8724 0.0534 0
1 M3 3 -0.03 -0.58 -0.02 0.11 0.8724 0.0534 0
1 M4 4 -0.03 -0.57 -0.03 0.1 0.8754 0.0563 0
1 M5 5 -0.03 -0.53 -0.03 0.1 0.8879 0.0659 0.0008
This example file has results for the 7 asthma traits analysed in Ferreira et al AJHG 2005. Following the 3 mandatory fields, the file contains Kong & Cox LOD scores for 4 disease traits (Asthma, BHR, Atopy and Dpter) and variance components LOD scores for 3 quantitative traits (FEV1, FEV1/FVC and total IgE). Results are available for 1,796 markers across the 22 autosomes.
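Before running MULTILINK it can be useful to check an OBS file against the format rules above. The following Python sketch is not part of MULTILINK; the function name read_obs is hypothetical, and the sketch assumes fields are separated by whitespace and that every statistic is a plain number. It uses the first two data lines of the example shown above.

```python
REQUIRED = ["CHR", "MARKER", "POSITION"]

def read_obs(text):
    """Parse OBS-format text and return (trait_names, rows).

    Hypothetical helper, not part of MULTILINK. Assumes
    whitespace-separated fields and numeric statistics.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    # The first three header names are compared without regard to case.
    if [h.upper() for h in header[:3]] != REQUIRED:
        raise ValueError("header must start with CHR MARKER POSITION")
    traits = header[3:]
    if not traits:
        raise ValueError("no trait columns found")
    rows = []
    for n, ln in enumerate(lines[1:], start=2):
        fields = ln.split()
        # A trait name containing a space would shift the columns,
        # so every line must have exactly as many fields as the header.
        if len(fields) != len(header):
            raise ValueError(
                f"line {n}: expected {len(header)} fields, got {len(fields)}"
            )
        stats = [float(x) for x in fields[3:]]
        rows.append((fields[0], fields[1], float(fields[2]), stats))
    return traits, rows

example = """CHR MARKER POSITION T1 T2 T3 T4 T5 T6 T7
1 M1 1 -0.03 -0.58 -0.02 0.1 0.8724 0.0534 0
1 M2 2 -0.03 -0.58 -0.02 0.11 0.8724 0.0534 0
"""
traits, rows = read_obs(example)
print(len(traits), "traits,", len(rows), "markers")
```

A file that passes this check has a usable header and a complete row of statistics for every marker; it says nothing about whether the statistics themselves are sensible.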
(2) NULL file (download example: multilink.null)
This file should contain the results for the univariate linkage analysis of the simulated datasets performed outside MULTILINK (step two above). Each simulated dataset should have the same number of individuals, markers, etc, as the real dataset, except that markers were simulated under the null hypothesis of no linkage (see above). The univariate analysis of each trait in the simulated dataset should also be identical to the analysis used for the real dataset. There is an extra mandatory column in this file, which indicates the simulated dataset to which each line of results belongs:
SIM: simulation number
CHR: chromosome number
MARKER: marker name
POSITION: marker position
TRAIT_1: univariate linkage analysis statistic for TRAIT_1 in the simulated dataset number SIM
TRAIT_2: univariate linkage analysis statistic for TRAIT_2 in the simulated dataset number SIM
...
TRAIT_N: univariate linkage analysis statistic for TRAIT_N in the simulated dataset number SIM
This file *must* also have a header row, with names matching those used in the OBS file.
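A quick way to confirm that the two header rows agree is sketched below in Python. This is not part of MULTILINK: the function name null_header_problems is hypothetical, and the sketch assumes that trait names must match exactly and in the same order, which is stricter than MULTILINK may require.

```python
def null_header_problems(obs_header, null_header):
    """Return a list of mismatches between OBS and NULL header rows.

    Hypothetical helper. Assumes whitespace-separated names and an
    exact, same-order match for trait names.
    """
    obs = obs_header.split()
    null = null_header.split()
    problems = []
    if not null or null[0].upper() != "SIM":
        problems.append("NULL header must start with SIM")
    # After SIM, the next three names should be CHR MARKER POSITION.
    if [n.upper() for n in null[1:4]] != [o.upper() for o in obs[:3]]:
        problems.append("mandatory columns differ from the OBS file")
    if null[4:] != obs[3:]:
        problems.append("trait names differ from the OBS file")
    return problems

obs_header = "CHR MARKER POSITION T1 T2 T3 T4 T5 T6 T7"
null_header = "SIM CHR MARKER POSITION T1 T2 T3 T4 T5 T6 T7"
print(null_header_problems(obs_header, null_header))
```

An empty list means the NULL header is the OBS header with SIM added in front.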
The first few lines of the example multilink.null file look like this:
[contents of multilink.null]
SIM CHR MARKER POSITION T1 T2 T3 T4 T5 T6 T7
10001 1 M1 0 -0.08 0.35 -0.06 0 0 0 0
10001 1 M2 2 -0.08 0.35 -0.06 0 0 0 0
10001 1 M3 4 -0.08 0.35 -0.06 0 0 0 0
10001 1 M4 6 -0.08 0.34 -0.06 0 0 0 0
In this example, the first data line shows the univariate results for the seven traits for marker M1 in simulation number 10001 (the file contains results for the first 100 of the 1000 simulations conducted for this analysis). It is desirable to run at least a few hundred simulations, but this can be limited by computer time if there are many phenotypes and markers (eg. a genome-scan) to be analysed.
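Because the NULL file stacks the results of many simulations, it is worth counting how many simulations it holds and whether each one covers every marker. The Python sketch below is one way to do this and is not part of MULTILINK; the function names are hypothetical, and the lines for simulation 10002 are made up for illustration.

```python
from collections import Counter

def rows_per_simulation(null_text):
    """Count result lines for each SIM value in NULL-format text."""
    lines = [ln for ln in null_text.splitlines() if ln.strip()]
    # Skip the header row; the first field of every other line is SIM.
    return Counter(ln.split()[0] for ln in lines[1:])

def check_simulations(null_text, n_markers):
    """Return (number of simulations, simulations with a wrong row count)."""
    counts = rows_per_simulation(null_text)
    incomplete = sorted(sim for sim, n in counts.items() if n != n_markers)
    return len(counts), incomplete

# Simulation 10001 follows the example above (first two traits only);
# simulation 10002 is invented and deliberately missing marker M2.
example = """SIM CHR MARKER POSITION T1 T2
10001 1 M1 0 -0.08 0.35
10001 1 M2 2 -0.08 0.35
10002 1 M1 0 0.12 0.00
"""
n_sims, incomplete = check_simulations(example, n_markers=2)
print(n_sims, incomplete)
```

For a real NULL file, n_markers would be the number of data lines in the matching OBS file.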