Loglin: A program for loglinear analysis of complete and incomplete count data

Written by David L. Duffy (1994)

CONTENTS

  • Introduction
  • Methods
  • Usage
  • References
  • Examples from literature

    INTRODUCTION

    Program LOGLIN performs generalised log-linear modelling of categorical data. It can fit any of the log-linear models available in standard packages such as GLIM, SAS, BMDP or SPSS, including models with structural zeros (as in PROC CATMOD). In addition, it can fit models for missing data and/or unobserved data. Although it can fit the more general latent variable models described by Haberman (1980), Goodman (1981) or Hagenaars (1990a, 1990b), these can be cumbersome and slow to converge (David Rindskopf was very helpful in pointing out how to fit these in the present log-linear framework).

    LOGLIN can be used for:

    1. Models where imprecise measures have been calibrated using a "perfect" gold standard, and the true association between imperfectly measured variables is to be estimated.
    2. Models where data are missing for a subsample of the population (formally the same situation as (1)).
    3. Latent variable models where latent variables are "errorless" functions of observed variables - eg ML gene frequency estimation from counts of observed phenotypes.
    4. Specialised measurement models eg where observed counts are mixtures due to perfect measures and error prone measures.
    5. Standard models which are difficult to fit in some packages, such as symmetry and quasi-symmetry models.

    METHODS

    The general framework underlying these models is summarised by Espeland (1986), and Espeland & Hui (1987), and is originally due to Thompson & Baker (1981). An observed contingency table y, which will be treated as a vector, is modelled as arising from an underlying complete table z, where observed count y(j) is the sum of a number of elements of z, such that each z(i) contributes to no more than one y(j). Therefore one can write y=F'z, where F is made up of orthogonal columns of ones and zeros.

    We then specify a loglinear model for z, so that log(E(z))=X'b, where X is a design matrix and b a vector of loglinear parameters. The loglinear model for z, and thus for y, can be fitted using two methods, both of which are available in LOGLIN. The first was presented as AS 207 by Michael Haber (1984) and combines an iterative proportional fitting algorithm for b and z with an EM fitting for y, z and b. The second is a Fisher scoring approach, presented in Espeland (1986).

    Each iteration of the Fisher scoring algorithm is

    b(t+1) = b(t) + I^(-1) (PX')' (F(F'F)^(-1) y - m) ,

    where,

    b(t) is the estimate of b for the tth iteration,

    m = exp(X'b) ,

    P = F (F' diag(m) F)^(-1) F' diag(m) ,

    and

    I = (PX')' diag(m) (PX').
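    One scoring step can be sketched in a few lines of numpy using these definitions. The data, design and F below are a hypothetical toy (an independence model for a 2x2 complete table whose first row is observed only as a margin), not taken from the program source:

```python
import numpy as np

# Complete 2x2 table z; independence model log E(z) = X'b with
# columns intercept, row1, col1 (hypothetical toy example).
Xt = np.array([[1., 1., 1.],   # cell (1,1)
               [1., 1., 0.],   # cell (1,2)
               [1., 0., 1.],   # cell (2,1)
               [1., 0., 0.]])  # cell (2,2)

# F has orthogonal 0/1 columns: y1 = z1 + z2 (row 1 seen only as a
# margin), y2 = z3, y3 = z4, so that y = F'z.
F = np.array([[1., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])
y = np.array([140., 17., 122.])

# F(F'F)^-1 y spreads each observed count evenly over its complete cells
scatter = F @ np.linalg.solve(F.T @ F, y)

b = np.zeros(3)
b[0] = np.log(scatter.mean())           # crude starting value
for _ in range(100):
    m = np.exp(Xt @ b)                  # fitted complete table
    D = np.diag(m)
    P = F @ np.linalg.solve(F.T @ D @ F, F.T @ D)
    PXt = P @ Xt
    info = PXt.T @ D @ PXt              # I = (PX')' diag(m) (PX')
    step = np.linalg.solve(info, PXt.T @ (scatter - m))
    b = b + step
    if np.abs(step).max() < 1e-10:
        break

mu = F.T @ np.exp(Xt @ b)               # fitted observed counts
```

    In this toy problem there are as many parameters as observed counts, so the fitted observed counts mu converge to y exactly.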

    The default option provided by the program is to use the EM algorithm to provide starting values for the scoring algorithm, thus gaining a modest improvement in speed. However, each method can be called in isolation. The EM algorithm needs to call the scoring algorithm to get the information matrix for the loglinear parameters in any case. In the case of missing data, one is usually interested in collapsing the complete table to give expected counts for subtables, and often summary measures for these subtables. Standard errors of collapsed counts, and of summary measures based on them, can be calculated from the covariance matrix of the loglinear parameters of the complete table using the delta method.
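    A minimal sketch of the delta-method calculation (the design matrix and covariance values below are hypothetical, purely for illustration):

```python
import numpy as np

def delta_se(grad, cov):
    """Delta-method SE of a scalar g(b): sqrt(grad' Cov grad) at b-hat."""
    grad = np.asarray(grad, dtype=float)
    return float(np.sqrt(grad @ cov @ grad))

# Textbook check: for g(b) = exp(b) with Var(b) = v, SE = exp(b)*sqrt(v)
b_hat, v = 2.0, 0.04
se = delta_se([np.exp(b_hat)], np.array([[v]]))

# For a collapsed count c = sum_i m(i) with m = exp(X'b), the chain rule
# gives dc/db = sum_i m(i) x(i), x(i) being the i-th row of X'.
Xt = np.array([[1.0, 1.0],
               [1.0, 0.0]])                       # hypothetical design
b = np.array([1.0, 0.5])
grad_c = Xt.T @ np.exp(Xt @ b)                    # dc/db for c = m1 + m2
cov_b = np.array([[0.02, -0.01], [-0.01, 0.03]])  # hypothetical Cov(b)
se_c = delta_se(grad_c, cov_b)
```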

    As an alternative, LOGLIN allows (nonparametric) bootstrap estimates of standard errors to be obtained. These are currently only for Poisson models, and will differ if sampling is constrained - eg product-multinomial - for incomplete tables. Espeland (1985) discusses approaches for this and other situations. Bootstrap percentiles for the model LR chi-square are also produced.
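    The idea under unconstrained Poisson sampling can be sketched as follows; note this simplified version resamples the observed counts directly and recomputes only a summary statistic, whereas LOGLIN refits the model to each replicate (counts and seed are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# A 2x2 table of hypothetical counts and its sample odds ratio
y = np.array([31, 109, 17, 122])

def odds_ratio(t):
    return t[0] * t[3] / (t[1] * t[2])

# Under Poisson sampling each cell of a replicate is an independent
# Poisson count with mean equal to the observed (or fitted) cell count.
boot = np.array([odds_ratio(rng.poisson(y)) for _ in range(2000)])
se = boot.std(ddof=1)                      # bootstrap standard error
lo, hi = np.percentile(boot, [2.5, 97.5])  # percentile 95% limits
```

    Constrained sampling (eg product-multinomial) would instead hold the relevant margins fixed when generating each replicate.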

    USAGE

    The program reads commands from standard input, and writes to standard output. The commands are made up of the following keywords and data (note that the parser usually reads only the first two to four characters of a keyword, and will usually accept a long-form keyword as well, eg bootstrap|boot|bs):

      COMPULSORY

    1. da <nj> where nj is the number of cells in the observed table. Followed by (on the next line): y(1..nj) the nj cell counts read in free format.
    2. mo <nid> <nk> where nid is the number of counts the model is to be fitted to, and nk is the number of loglinear parameters to be fitted. Followed by: the design matrix C(1..nid,1..nk) read in free format.

      OPTIONAL

    3. ce <ni> where ni is the number of cells in the underlying complete table that gives rise to the observed counts. Followed by: ji(1..ni) the ni elements of the scatter matrix that maps y onto x, the complete table. Each y(j) is a sum of one or more x(i)'s. ji is read free format. ji can be replaced mathematically by S(1..ni,1..nj), made up of 1's and 0's such that y=S'x.
    4. se <nkk> where nkk is a number of loglinear parameters selected from the design matrix C. This allows easy selection of hierarchical models. Followed by: csel(1..nkk) the number of each column of the original design matrix selected for fitting, read in free format.
    5. cl <ncoll> where the first ncoll cells of x are to be collapsed over (so ncoll is at most ni). This is useful in missing data models to give mean counts for variables unobserved in a given subtable. Followed by: coll(1..ncoll) a scatter vector giving, for each x(i), the cell of the collapsed table to which it contributes. If coll(j)=0 then the jth cell does not contribute to the resulting collapsed table.
    6. fi em|sc|hy [<it>] determines which algorithm the program will use to fit the model: either EM/Iterative Proportional Fitting, Fisher scoring algorithm or both - the latter where the EM algorithm runs for it iterations (default it=3) to provide starting values for the scoring algorithm. The default is hy[brid].
    7. bs <bs> [em] controls whether bootstrap standard errors for collapsed tables and summary measures for these tables will be calculated. bs is the number of bootstrap samples to be generated. The default fitting algorithm for each bootstrap sample is the scoring algorithm, but the keyword em forces the use of the EM algorithm. This is considerably slower in some circumstances, but will converge when the scoring algorithm does not.
    8. pr <t> <b> calculates the proportion x(t)/(x(t)+x(b)) from the collapsed table, along with a bootstrapped standard error if the bs option is active.
    9. cw <t> <b> calculates the proportion 2*x(t)/(2*x(t)+x(b)) from the collapsed table, along with a bootstrapped standard error if the bs option is active.
    10. or <c1> <c2> <c3> <c4> calculates the odds ratio x(c1)*x(c4)/(x(c2)*x(c3)) from the collapsed table, along with a bootstrapped standard error if the bs option is active.
    11. la attaches labels to the nk loglinear parameters. Followed by: term(1..nk) the nk labels maximum length 10 characters. Terminating a line with ":" allows the list of labels to extend over to the next line.
    12. ou [print=1|2] [co] [de] controls the amount of output. print controls whether estimates are printed each iteration, where print=1 gives EM and score estimates for x each iteration and print=2 prints the IPF estimates as well. co prints out the covariance matrix for the loglinear parameters. de prints out the normalised design matrix used by the EM algorithm.
    13. st leads to starting values for the loglinear parameters being read. Followed by: pars(1..nk) the starting values read free format.
    14. conv <conv> sets the convergence criterion. Note that this value is divided by 100 to act as the criterion for change in the loglinear parameters in the scoring algorithm, and is used unchanged as the criterion for change in counts in the EM algorithm.
    15. au <aug> adds a constant aug to each count. This is appropriate for models with sampling zeros and/or small counts; in the 2x2 case at least, it reduces bias in the odds ratio estimate.
    16. ! | rem | c denotes a comment. The line is copied to output.

    REFERENCES

    EXAMPLES

    The following jobs fit a variety of loglinear models.

    Complete Data

    Incomplete Data

    Latent Class Analysis

    Example 1

    This example fits to a 2x2 table, and bootstraps the standard error of the odds ratio.
    ! simplest table
    data 4
    31 109 17 122
    ! intercept row and col, odds ratio
    mo 4 4
    1 1 1 1
    1 0 1 0
    1 1 0 0
    1 0 0 0
    ! labels for loglinear terms
    la
    intercept row col oddsr
    ! fit saturated model, reversing the order of parameter printing
    se 4
    4 3 2 1
    or 1 2 3 4
    bs 200
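    The odds ratio requested by the "or 1 2 3 4" line can be checked directly against the raw counts:

```python
# Sample odds ratio for the 2x2 table in Example 1: cells are
# x(1)=31, x(2)=109, x(3)=17, x(4)=122, and OR = x1*x4/(x2*x3).
x = [31, 109, 17, 122]
or_hat = x[0] * x[3] / (x[1] * x[2])   # roughly 2.04
```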
    

    Example 2

    This example is slightly more complex and looks for effects of zygosity on concordance in twins. The prevalence of the condition is constrained to be equal for the first and second twins, and the second and third order term weights adjusted to produce the (smoothed) OR(DZ) and OR(MZ)/OR(DZ).

    ! DZ 2x2 table then MZ 2x2 table
    data 8
    12 12 10 1335
    5 12 24 1506
    mo 8 6
      1  2  1  2   2  2 
      1  1  1  0.5 1  0.5
      1  1  1  0.5 1  0.5 
      1  0  1  0   0  0 
      1  2  0  2   0  0 
      1  1  0  0.5 0  0 
      1  1  0  0.5 0  0 
      1  0  0  0   0  0 
    !--------------------
    ! 1  2  3  4   5  6 
    ! i  a  z  a1  a  a1 
    !          a2  z  a2 
    !                 z  
    la
    i a z aa az aaz
    

    Example 3

    This job estimates the true prevalence of asthma from an imperfect proxy measure - cross-reporting by cotwin. Sensitivity and specificity are obtained from cross-reporting versus self report in pairs where both twins returned a questionnaire. The chi-square compares prevalence of proxy asthma in the two groups.

    !
    ! Adjust cross-reported asthma in singles using data from complete pairs 
    !
     cells 8
     1 1 2 2 
     3 4 5 6
    !
    ! One 2x1 table and one 2x2 table giving sens and spec 
    !
     data 6
     116 
     540
     451 91
     168 2075
     model 8 5
       1   0   0   0  1
       1   0   0   1  0
       1   0   1   0  0
       1   0   1   1  0
       1   1   0   0  1
       1   1   0   1  0
       1   1   1   0  0
       1   1   1   1  0
    !  i   L   T   A  AT
    la
    i L T A AT
    conv 0.001
    cl 4
    1 2 1 2
    pr 1 2
    bs 200 
    ou 
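    The back-correction idea underlying this job can be sketched with the standard sensitivity/specificity formula (a round-trip check with made-up values; LOGLIN instead estimates everything jointly by ML):

```python
# If a proxy measure has known sensitivity and specificity, the observed
# proxy prevalence p_star satisfies
#   p_star = sens * p + (1 - spec) * (1 - p),
# which inverts to the corrected true prevalence
#   p = (p_star - (1 - spec)) / (sens + spec - 1).
def true_prevalence(p_star, sens, spec):
    return (p_star - (1.0 - spec)) / (sens + spec - 1.0)

# Round-trip check with hypothetical values
p, sens, spec = 0.20, 0.90, 0.95
p_star = sens * p + (1 - spec) * (1 - p)   # prevalence the proxy reports
```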
    

    Example 4

    This very similar job estimates the population cumulative incidence of asthma and a standard error from a stratified random sample. Stratum 1 is a sample of probands with a history of childhood asthma (C+), and stratum 2 those without such a history (C-). Because the sampling fraction is dependent on C, the model chi-square is zero. The bootstrap standard error for the weighted risk agrees with the analytic asymptotic standard error to three decimal places (cumulative incidence=0.231; SE=0.012).

    !
    ! Look at Mark Jenkins' asthma data - Brit Med J 1994;309:90-3. 
    ! compare delta estimator of SE for stratified sample to that in LOGLIN
    !
     cells 8
     1 2 3 4 
     5 5 6 6
    !
    !  2x2 table for the sampled probands (A+,A- in C+, then C-).  
    !  One 2x1 table for unsampled subjects, giving therefore the sampling fraction.
    !
     data 6
     414 327
     127 626 
     608
     6240
     model 8 6
       1  0  0  0  0  0
       1  0  0  1  0  0
       1  0  1  0  0  0
       1  0  1  1  0  1
       1  1  0  0  0  0
       1  1  0  1  0  0
       1  1  1  0  1  0
       1  1  1  1  1  1
    !
    !  i  S  C  A SC CA
    !
    ! S=sampled; C=childhood asthma; A=adult asthma
    !
    la
    i S C A SC CA
    conv 0.001
    cl 8
    1 2 1 2 1 2 1 2
    pr 1 2
    bs 500 
    ou 
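    The quoted cumulative incidence can be reproduced by directly weighting the stratum-specific risks (assuming the counts 608 and 6240 are the unsampled remainders of the C+ and C- strata):

```python
# Weighted cumulative incidence from the stratified sample
a_cpos, n_cpos = 414, 414 + 327          # A+ among sampled C+ probands
a_cneg, n_cneg = 127, 127 + 626          # A+ among sampled C- probands
tot_cpos = n_cpos + 608                  # C+ stratum total
tot_cneg = n_cneg + 6240                 # C- stratum total
risk = (tot_cpos * a_cpos / n_cpos +
        tot_cneg * a_cneg / n_cneg) / (tot_cpos + tot_cneg)
# roughly 0.2317, in line with the quoted cumulative incidence of 0.231
```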
    

    Example 5

    Here, we estimate the gene frequencies for the AB0 system by ML scoring methods. The resulting parameter estimates and confidence limits have to be rescaled by N^(-.5) to give the actual proportions. This approach is easily generalised to larger AB0-like systems such as the HLA system, where some types may not yet be identified ("blanks").

    !
    ! Estimation AB0 frequencies Elandt-Johnson, 1971, p 401, Ex 14.1
    ! A  B  AB  0
    !
    data 4
     725 258 72 1073
    ce 9
     1 3 1 3 2 2 1 2 4
    model 9 3
 2 0 0
 1 1 0
 1 0 1
 1 1 0
 0 2 0
 0 1 1
 1 0 1
 0 1 1
 0 0 2
    la
     A B 0    
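    For comparison, the classical gene-counting EM algorithm gives the allele frequencies for these data directly as proportions (a sketch of the same ML problem, not of LOGLIN's parameterisation):

```python
# Gene-counting (EM) estimate of AB0 allele frequencies from the
# phenotype counts used in Example 5.
nA, nB, nAB, nO = 725, 258, 72, 1073
N = nA + nB + nAB + nO
p, q, r = 0.3, 0.3, 0.4                  # starting frequencies for A, B, 0
for _ in range(200):
    # E-step: split phenotypes A and B into their component genotypes
    nAA = nA * p * p / (p * p + 2 * p * r)   # AA versus A0
    nAO = nA - nAA
    nBB = nB * q * q / (q * q + 2 * q * r)   # BB versus B0
    nBO = nB - nBB
    # M-step: simple allele counting over the imputed genotypes
    p = (2 * nAA + nAO + nAB) / (2 * N)
    q = (2 * nBB + nBO + nAB) / (2 * N)
    r = 1.0 - p - q
```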
    

    Example 6

    Test Hardy-Weinberg equilibrium in two samples typed at the ApoE locus, and whether gene frequencies are the same.

    !
    ! Test for HWE ApoE Cauley et al 1993 across two age cohorts
    !
    ! 2-2, 3-2, 4-2, 3-3, 4-3, 4-4
    !
    data 12
     2 47  5  315 98  6 
     5 126 11 581 135 12
    
    ce 18
     1 2 3
     2 4 5
     3 5 6
     7 8  9
     8 10 11
     9 11 12
    !
    ! e2 e3 e4 age
    !
    model 18 7
     2 0 0 0 0 0 0 
     1 1 0 0 0 0 0 
     1 0 1 0 0 0 0 
     1 1 0 0 0 0 0 
     0 2 0 0 0 0 0 
     0 1 1 0 0 0 0 
     1 0 1 0 0 0 0 
     0 1 1 0 0 0 0 
     0 0 2 0 0 0 0 
     2 0 0 1 2 0 0 
     1 1 0 1 1 1 0 
     1 0 1 1 1 0 1 
     1 1 0 1 1 1 0 
     0 2 0 1 0 2 0 
     0 1 1 1 0 1 1 
     1 0 1 1 1 0 1 
     0 1 1 1 0 1 1 
     0 0 2 1 0 0 2 
    !
    !1 2 3 4 5 6 7 
    !
    la
    e2 e3 e4 age e2*age e3*age e4*age 
    !
    ! se 4      Comparing LR for full model versus no interaction
    ! 1 2 3 4   model tests for gene frequencies conditional on 
    !           HWE
    

    Example 7

    Test for linkage disequilibrium discussed by Aston and Wilson (1986). This is their "easy" two-locus example, also evaluated by Ott (1985).

    ! gametic (pair) frequency   gamma        two alleles A(ij), B(kl)
    ! allelic    "               alpha        two gametes G1(ik), G2(jl)
    ! deviation from HWE         phi
    ! intragametic allelic assoc epsilon
    ! intergametic allelic assoc delta
    !
    ! ln g(ijkl) = mu + a(i) + a(j)  + a(k) + a(l) + p(ij) + p(kl)
    !                 + e(ik) + e(jl) + d(il) + d(jk)
    !
    ! a(i) and a(j) are represented by a combined parameter in the model below,
    ! as is a(k) & a(l) and e(ik) and e(jl).  
    ! epsilon and delta are confounded and cannot be simultaneously estimated.
    ! Locus B 3 alleles versus Locus H three alleles.
    
     data 36
      2   2   1   7   3   3   
      6   11  10  18  30  15  
      6   9   12  22  45  45
      14  19  11  31  23  19
      31  66  37  110 93  72
      37  57  15  53  43  22
     cells   81
      1  2  4  2  3  5  4  5  6
      7  8 10  8  9 11 10 11 12
     19 20 22 20 21 23 22 23 24
      7  8 10  8  9 11 10 11 12
     13 14 16 14 15 17 16 17 18
     25 26 28 26 27 29 28 29 30
     19 20 22 20 21 23 22 23 24
     25 26 28 26 27 29 28 29 30
     31 32 34 32 33 35 34 35 36
     model  81 21
 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 1 1 0 2 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0
 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
 1 1 0 1 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0
 1 1 0 0 2 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0
 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
 1 0 1 2 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0
 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0
 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
 1 0 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
 1 0 1 0 2 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1
 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
 1 1 0 2 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
 1 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0
 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
 1 1 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0
 1 1 0 0 2 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0
 1 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 2 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
 1 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
 1 2 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
 1 2 0 2 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0
 1 2 0 1 1 1 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0
 1 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
 1 2 0 1 1 1 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0
 1 2 0 0 2 1 0 0 0 0 0 0 1 0 0 2 0 0 0 2 0
 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0
 1 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0
 1 1 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0
 1 1 1 2 0 0 0 1 0 1 0 0 0 1 1 0 0 1 1 0 0
 1 1 1 1 1 0 0 1 0 0 0 1 0 1 0 0 1 0 1 1 0
 1 1 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1
 1 1 1 1 1 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 1
 1 1 1 0 2 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 1
 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 1 0 1 2 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0
 1 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1
 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 1 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0
 1 0 1 0 2 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1
 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0
 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1
 1 1 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0
 1 1 1 2 0 0 1 0 0 1 0 0 0 1 1 0 0 1 1 0 0
 1 1 1 1 1 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 1
 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0
 1 1 1 1 1 0 1 0 0 0 1 0 0 1 0 0 1 0 1 1 0
 1 1 1 0 2 0 1 0 0 0 0 0 1 0 0 1 1 0 0 1 1
 1 0 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 2 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0
 1 0 2 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1
 1 0 2 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0
 1 0 2 2 0 0 0 0 1 1 0 0 0 0 2 0 0 0 2 0 0
 1 0 2 1 1 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0 1
 1 0 2 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1
 1 0 2 1 1 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1
 1 0 2 0 2 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2
    !------------------------------------------
    ! 1 2 3 4 5 6 7 8 9101112131415161718192021
    ! i a a a a p p p p p p p p e e e e d d d d
    !
    ! Allelic association and deviation from HWE
    ! Since epsilon and delta terms are confounded - one set (delta's) is zeroed
    ! ie assume no intergametic association
     se 17
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
    !
    ! No allelic association - deviation from HWE
    ! se 13
    ! 1 2 3 4 5 6 7 8 9 10 11 12 13
    !
    ! HWE; no allelic association
    ! se 5 
    ! 1 2 3 4 5 
    ou 
    

    Example 8

    This job fits a model of errors in rating X-rays of dental caries to data for two observers. It assumes that one group of X-rays is easy to read and gives rise to no disagreement, while the remainder are difficult and give rise to a number of disagreements.
    ! Fit teeth from Espeland et al 1986
     cells 12
     1 5 9 1 2 3 4 5 6 7 8 9
    !  3x3 table of rating of caries 3 point scale 2 observers
     data 9
     1450 55 74
       99 35 33
       22 11 64
     model 12 8
     1 0 0 0 0 0 0 0
     0 1 0 0 0 0 0 0
     0 0 1 0 0 0 0 0
     0 0 0 1 1 0 1 0
     0 0 0 1 1 0 0 1
     0 0 0 1 1 0 0 0
     0 0 0 1 0 1 1 0
     0 0 0 1 0 1 0 1
     0 0 0 1 0 1 0 0
     0 0 0 1 0 0 1 0
     0 0 0 1 0 0 0 1
     0 0 0 1 0 0 0 0
    

    Example 9

    Hochberg (1977) presents a double sampling experiment in which a smaller subsample of subjects was measured using a gold standard, while all subjects were measured using "cheap", unreliable measures.

    ! Fit Hochberg 1977 double sampling data
     cells 32
     1 1 2 2 1 1  2  2  3  3  4  4  3  3  4  4 
     5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
    !  2x2 table of imprecise measures and 2x2x2x2 reliability data
     data 20
     1196 13562 
     7151 58175
     17  3  10  258
     3   4  4   25
     16  3  25  197
     100 13 107 1014
    
    !
    ! model is AA*BB* + L (dummy study variable)
    ! so vars are intercept, A, A*, B, B*, L, A.A*, A.B, A.B*, A*.B, A*.B*
    !             B.B*, A.A*.B, A.A*.B*, A.B.B*, A*.B.B*, A.A*.B.B*
    
     model 32 17
     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
     1  0  1  1  1  1  0  0  0  1  1  1  0  0  0  1  0 
     1  1  0  1  1  1  0  1  1  0  0  1  0  0  1  0  0 
     1  0  0  1  1  1  0  0  0  0  0  1  0  0  0  0  0 
     1  1  1  0  1  1  1  0  1  0  1  0  0  1  0  0  0 
     1  0  1  0  1  1  0  0  0  0  1  0  0  0  0  0  0 
     1  1  0  0  1  1  0  0  1  0  0  0  0  0  0  0  0 
     1  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  1  0  1  1  1  0  1  0  0  1  0  0  0  0 
     1  0  1  1  0  1  0  0  0  1  0  0  0  0  0  0  0 
     1  1  0  1  0  1  0  1  0  0  0  0  0  0  0  0  0 
     1  0  0  1  0  1  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  0  0  1  1  0  0  0  0  0  0  0  0  0  0 
     1  0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0 
     1  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0 
     1  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1 
     1  0  1  1  1  0  0  0  0  1  1  1  0  0  0  1  0 
     1  1  0  1  1  0  0  1  1  0  0  1  0  0  1  0  0 
     1  0  0  1  1  0  0  0  0  0  0  1  0  0  0  0  0 
     1  1  1  0  1  0  1  0  1  0  1  0  0  1  0  0  0 
     1  0  1  0  1  0  0  0  0  0  1  0  0  0  0  0  0 
     1  1  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0 
     1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  1  0  0  1  1  0  1  0  0  1  0  0  0  0 
     1  0  1  1  0  0  0  0  0  1  0  0  0  0  0  0  0 
     1  1  0  1  0  0  0  1  0  0  0  0  0  0  0  0  0 
     1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0 
     1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
     1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
     1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
    !
    ! recover estimated A.B collapsed table for entire sample
     collapse 32
     1 2 1 2 3 4 3 4 1 2 1 2 3 4 3 4 1 2 1 2 3 4 3 4 1 2 1 2 3 4 3 4
     odds_ratio 1 2 3 4
    !
    ! get bootstrapped standard errors of mean values collapsed table
     bootstrap 150
    
    The edited output from example (9) is:
       +---------------------------------+ 
       |             LOGLIN              | 
       |   General Log-linear Modelling  | 
       |   Using AS 207 (Haber, 1984)    | 
       +---------------------------------+ 
          Written by David L Duffy 1992    
                 QIMR Australia            
                HP Fortran version         
    
      Program LOGLIN run at 14:31:52 on  8-Apr-92
      The following input lines were read:
    .
    . [as above]
    .
      Output: 
    
      No. cells complete table=  32
      No. cells observed table=  20
      No. parameters estimated=  17
      Convergence criterion   =  .100E-02
    
      Fitting via Fisher score algorithm
    
      Mean observed cell size = 4094.00
    
      Rank of design matrix   =  17
    
       Gibbs Chi-square =    6.49 P= .09
     Pearson Chi-square =    6.18 P= .10
                     df =    3.
    
      Observed Table ------------------------- 
           Observed    Fitted      F-T Deviate 
     [ 1]   1196.00  1196.13          .00
     .
     .
     [20]   1014.00   987.30          .85
    
    
    
    
      Full Table ---- 
            Fitted    
    [ 1]   753.12
    .
    [32]   987.30
    
      Full Table ------------------------------- 
         Parameter      S.E.  exp(Par)   95% Confidence Limits   Term 
    [ 1]    6.895       .028    987.297    933.776   1043.885    
    [ 2]   -2.249       .103       .106       .086       .129    
    [ 3]   -4.138       .240       .016       .010       .026    
    .
    [16]   -2.794       .987       .061       .009       .423    
    [17]    3.991      1.349     54.099      3.846    761.043    
    
      Collapsed table ------------ 
    
    [ 1]   3227.39
    [ 2]  21071.03
    [ 3]  10581.91
    [ 4]  47002.67
     --------------
      OR       .68
     --------------
    
      Bootstrap mean    S.E.   95% CL-----------
    
    [ 1]   3198.97    376.18   2461.66   3936.28 
    [ 2]  21125.60    654.63  19842.53  22408.66 
    [ 3]  10601.53    596.20   9432.98  11770.08 
    [ 4]  46957.00    826.98  45336.11  48577.89 
     -------------------------------------------
    logOR     -.40       .16      -.72      -.09 
       OR      .67                 .49       .92 
     -------------------------------------------
    
      No. of bootstrap samples=  150
    
      Job completed in    153.0 seconds.
                            2.5 minutes.
    
    
    Espeland and Hui (1987) give their results for the same model. The overall model goodness-of-fit was G2 = 6.49 on 3 df. The standard errors are calculated using the delta method.
    ----------------------------------------------------------------------------
    Precise Injury  Precise belt use       Fitted Estimate        Standard Error
    ----------------------------------------------------------------------------
    Yes             Yes                     3227.4                344.9
    Yes             No                     21071.0                660.0
    No              Yes                    10581.9                527.2
    No              No                     47002.7                787.6
    ----------------------------------------------------------------------------
    Odds ratio from collapsed table           0.68                  0.16
    ----------------------------------------------------------------------------
    

    Example 10

    This job performs a latent class analysis of an example from the LEM manual (Vermunt 1997). Four manifest binary variables are taken as indicators of a single underlying binary latent variable. It is essential that the EM fitting algorithm be used, because the scoring algorithm fails in this example.
    !
    ! Example data from Lem manual
    !
    data 16
    59 56 14 36 7 15 4 23
    75 162 22 115 8 68 22 123
    ce 32
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
    !
    ! X A B C D XA XB XC XD
    !
     design  32 16
      1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
      1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
      1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
      1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
      1 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0
      1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0
      1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1
      1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
      1 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0
      1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0
      1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 1
      1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0
      1 0 1 1 0 1 0 0 0 0 1 0 1 0 1 0
      1 0 1 1 1 0 0 0 0 0 1 1 0 1 0 0
      1 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1
      1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0
      1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0
      1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1
      1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0
      1 1 0 1 0 1 0 1 0 1 0 0 0 0 1 0
      1 1 0 1 1 0 0 1 1 0 0 0 0 1 0 0
      1 1 0 1 1 1 0 1 1 1 0 0 0 1 1 1
      1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0
      1 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0
      1 1 1 0 1 0 1 0 1 0 0 1 0 0 0 0
      1 1 1 0 1 1 1 0 1 1 0 1 1 0 0 1
      1 1 1 1 0 0 1 1 0 0 1 0 0 0 0 0
      1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0
      1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 0
      1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    !--------------------------------
    ! 1 2 3 4 5 6 7 8 910111213141516
    !--------------------------------
    !i X2A2B2C2D2X2X2X2X2A2A2A2B2B2C2
    !            A2B2C2D2B2C2D2C2D2D2
    !                                
    !                                
    la
    i X A B C D XA XB XC XD AB AC AD BC BD CD
    se 10
    1 2 3 4 5 6 7 8 9 10
    fi em
    conv 1e-6