Sib-pair Command: regress


ClassAnalysis and data manipulation command
Nameregress
Arguments <ytrait> = <x1>...[to]... <xN> [offset <offset>]
[poisson|
(exponential|weibull|fixedweibull [<censoring_trait>]
[shape <shape>])]
[sim] [rep <imputations>].

Performs linear or logistic or poisson or weibull regression of trait ytrait on set of loci x1...xN. If an x variable is a marker genotype, that independent variable is the mean allele size in the genotype, with the exception of the first marker locus encountered in the list, which is fully allelic effect coded.

The offset option reads an offset for the linear predictor from the specified trait.

Addition of a binary trait name to the end of the keyword list when the regression is weibull or exponential declares this as the censoring indicator. The shape keyword declares a starting value for the solution of the Weibull distribution shape parameter. If fixedweibull is chosen as the distribution, the shape is fixed to the value given by the shape option.

The sim keyword gives a gene-dropped P-value for the first marker locus in the list. The rep keyword specifies a number of replicates for multiple imputation of the test marker locus genotypes, and is usually used when set analysis imputed has already been issued. Each imputed replicate is generated by running the MCMC genotype sampler for a number of iterations (based on the number of unobserved genotypes for that pedigree).

The models are fitted by the usual IRLS algorithms (using AS 164, [Stirling 1981]). The exponential and Weibull regressions are implemented as Poisson regressions (with log time as offset) as per Aitken and Clayton [1980].

Note in the survival analysis example below that the Weibull parameterization is the "usual" one, not that used by for example the survreg command in the R survival package (where the scale is the reciprocal of the Sib-pair shape parameter, and the regression coefficients are scaled by that value). Similarly, the Weibull "scale" parameter is obtained as:

exp(Intercept/Shape)

Specifically, the Weibull parameterization is:

f(t) = a L ta-1 exp(-L ta) ,
S(t) = exp(-L ta) ,
h(t) = a L ta-1,

where a is the shape parameter, and L the scale parameter. The linear model is for:

log(L) = Bx.

Example:

#
# Survival analysis: gold standard results from R
#
# survreg(formula = Surv(time, status) ~ rx, data = rats)
#              Value Std. Error     z        p
# (Intercept)  4.983     0.0833 59.81 0.00e+00
# rx          -0.239     0.0891 -2.68 7.42e-03
# Log(scale)  -1.333     0.1439 -9.26 2.01e-20
#
# *and*
#
# weibreg(formula = Surv(time, status) ~ rx, data = rats)
# 
# Covariate           Mean       Coef Exp(Coef)  se(Coef)    Wald p
# rx                  0.325     0.904     2.470     0.317     0.004 
# 
# log(scale)                    4.983   145.931     0.083     0.000 
# log(shape)                    1.333     3.791     0.144     0.000 
# 
#
>> loc ratsurv.in
>> regress time = rx weibull status

------------------------------------------------
Weibull  regression analysis of trait "time"
------------------------------------------------
Censoring variable: status.

    Variable         Beta    Stand Error        t-Value
  -----------------------------------------------------
  Intercept       18.8907         0.2294        82.3425 ***
  rx              -0.9042         0.3166         2.8557 *

No. usable observations =    150      ( 60.0%)
No. of uncensored times =     40      ( 26.7%)

Weibull shape parameter =      3.7909
Number of iterations    =     84
Model LR Chi-square     =    166.4316 (df=   148)
Akaike Inf. Criterion   =    170.4316

>> log 3.7909
=>  1.332603457921975
>> 18.8907/3.7909
=>  4.983170223429792
>> -0.9042/3.7909
=>  -0.23851855759845947

>> regress uti on age ocp vic vicl vis dia

------------------------------------------------
Binomial regression analysis of trait "uti"
------------------------------------------------

    Variable             Beta    Stand Error        t-Value
  ---------------------------------------------------------
  Intercept            0.1283         0.4910         0.2614 .  
  age                 -1.1644         0.4345         2.6797 *  
  ocp                 -0.0736         0.4497         0.1636 .  
  vic                  2.4059         0.5695         4.2244 ***
  vicl                -2.2462         0.5643         3.9803 ***
  vis                 -0.8201         0.4216         1.9453 +  
  dia                 15.5777       448.3780         0.0347 .  

No. usable observations =    239      (100.0%)
Number of affecteds     =    130

Null deviance           =    329.4768
Number of iterations    =     15
Model LR Chi-square     =     53.5274 (df=   6)
Akaike Inf. Criterion   =    289.9494

>> regress Employed on GNP.deflat GNP Unemployed Armed.Forc Population Year

------------------------------------------------
Linear regression analysis of trait "Employed"
------------------------------------------------

    Variable             Beta    Stand Error        t-Value
  ---------------------------------------------------------
  Intercept       -3482.2586       890.4204         3.9108 ***
  GNP.deflat           0.0151         0.0849         0.1774 .  
  GNP                 -0.0358         0.0335         1.0695 .  
  Unemployed          -0.0202         0.0049         4.1364 ***
  Armed.Forc          -0.0103         0.0021         4.8220 ***
  Population          -0.0511         0.2261         0.2261 .  
  Year                 1.8292         0.4555         4.0159 ***

No. usable observations =     16      ( 100.0%)
Model Mean Square       =     30.6954 (df=   6)
Mean Square Error       =      0.0929 (df=   9)
Multiple R**2           =      0.9955
Akaike Inf. Criterion   =     -1.6258

See also:

gpe Genotype probability estimates.
set analysis Includes imputed genotypes in GLMs
set model allelic or genotypic marker encoding


<< (kruskal-wallis)Up to index>> (clreg)