Loglin: A program for loglinear analysis of complete and incomplete count data

Written by David L. Duffy (1994)

CONTENTS

  • Introduction
  • Methods
  • Usage
  • References
  • Examples from literature

    INTRODUCTION

    Program LOGLIN performs generalised log-linear modelling of categorical data. It can fit any of the log-linear models available in standard packages such as GLIM, SAS, BMDP or SPSS, including models with structural zeros (as in PROC CATMOD). In addition, it can fit models for missing data and/or unobserved data. Although it can fit the more general latent variable models described by Haberman (1980), Goodman (1981) or Hagenaars (1990a, 1990b), these can be cumbersome and slow to converge (David Rindskopf was very helpful in pointing out how to fit these in the present log-linear framework).

    LOGLIN can be used for:

    1. Models where imprecise measures have been calibrated using a "perfect" gold standard, and the true association between imperfectly measured variables is to be estimated.
    2. Models where data are missing for a subsample of the population (formally the same situation as (1)).
    3. Latent variable models where latent variables are "errorless" functions of observed variables - eg ML gene frequency estimation from counts of observed phenotypes.
    4. Specialised measurement models eg where observed counts are mixtures due to perfect measures and error prone measures.
    5. Standard models which are difficult to fit in some packages, such as symmetry and quasi-symmetry models.

    METHODS

    The general framework underlying these models is summarised by Espeland (1986), and Espeland & Hui (1987), and is originally due to Thompson & Baker (1981). An observed contingency table y, which will be treated as a vector, is modelled as arising from an underlying complete table z, where observed count y(j) is the sum of a number of elements of z, such that each z(i) contributes to no more than one y(j). Therefore one can write y=F'z, where F is made up of orthogonal columns of ones and zeros.

    We then specify a loglinear model for z, so that log(E(z))=X'b, where X is a design matrix and b a vector of loglinear parameters. The loglinear model for z, and thus for y, can be fitted using two methods, both of which are available in LOGLIN. The first was presented as AS 207 by Michael Haber (1984) and combines an iterative proportional fitting algorithm for b and z with an EM fitting for y, z and b. The second is a Fisher scoring approach, presented in Espeland (1986).

    Each iteration of the Fisher scoring algorithm is

    b(t+1) = b(t) + I^(-1) (PX')' (F(F'F)^(-1) y - m) ,

    where,

    b(t) is the estimate of b for the tth iteration,

    m = exp(X'b) ,

    P = F (F' diag(m) F)^(-1) F' diag(m) ,

    and

    I = (PX')' diag(m) (PX').
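    One scoring step can be sketched in a few lines of numpy using these definitions. The data, design and F below are a hypothetical toy (an independence model for a 2x2 complete table whose first row is observed only as a margin), not taken from the program source:

```python
import numpy as np

# Complete 2x2 table z; independence model log E(z) = X'b with
# columns intercept, row1, col1 (hypothetical toy example).
Xt = np.array([[1., 1., 1.],   # cell (1,1)
               [1., 1., 0.],   # cell (1,2)
               [1., 0., 1.],   # cell (2,1)
               [1., 0., 0.]])  # cell (2,2)

# F has orthogonal 0/1 columns: y1 = z1 + z2 (row 1 seen only as a
# margin), y2 = z3, y3 = z4, so that y = F'z.
F = np.array([[1., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])
y = np.array([140., 17., 122.])

# F(F'F)^-1 y spreads each observed count evenly over its complete cells
scatter = F @ np.linalg.solve(F.T @ F, y)

b = np.zeros(3)
b[0] = np.log(scatter.mean())           # crude starting value
for _ in range(100):
    m = np.exp(Xt @ b)                  # fitted complete table
    D = np.diag(m)
    P = F @ np.linalg.solve(F.T @ D @ F, F.T @ D)
    PXt = P @ Xt
    info = PXt.T @ D @ PXt              # I = (PX')' diag(m) (PX')
    step = np.linalg.solve(info, PXt.T @ (scatter - m))
    b = b + step
    if np.abs(step).max() < 1e-10:
        break

mu = F.T @ np.exp(Xt @ b)               # fitted observed counts
```

    In this toy problem there are as many parameters as observed counts, so the fitted observed counts mu converge to y exactly.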

    The default option provided by the program is to use the EM algorithm to provide starting values for the scoring algorithm, thus gaining a modest improvement in speed. However, each method can be called in isolation. The EM algorithm needs to call the scoring algorithm to get the information matrix for the loglinear parameters in any case. In the case of missing data, one is usually interested in collapsing the complete table to give expected counts for subtables, and often summary measures for these subtables. Standard errors of collapsed counts, and of summary measures based on them, can be calculated from the covariance matrix of the loglinear parameters of the complete table using the delta method.
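    A minimal sketch of the delta-method calculation (the design matrix and covariance values below are hypothetical, purely for illustration):

```python
import numpy as np

def delta_se(grad, cov):
    """Delta-method SE of a scalar g(b): sqrt(grad' Cov grad) at b-hat."""
    grad = np.asarray(grad, dtype=float)
    return float(np.sqrt(grad @ cov @ grad))

# Textbook check: for g(b) = exp(b) with Var(b) = v, SE = exp(b)*sqrt(v)
b_hat, v = 2.0, 0.04
se = delta_se([np.exp(b_hat)], np.array([[v]]))

# For a collapsed count c = sum_i m(i) with m = exp(X'b), the chain rule
# gives dc/db = sum_i m(i) x(i), x(i) being the i-th row of X'.
Xt = np.array([[1.0, 1.0],
               [1.0, 0.0]])                       # hypothetical design
b = np.array([1.0, 0.5])
grad_c = Xt.T @ np.exp(Xt @ b)                    # dc/db for c = m1 + m2
cov_b = np.array([[0.02, -0.01], [-0.01, 0.03]])  # hypothetical Cov(b)
se_c = delta_se(grad_c, cov_b)
```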

    As an alternative, LOGLIN allows (nonparametric) bootstrap estimates of standard errors to be obtained. These are currently only for Poisson models, and will differ if sampling is constrained - eg product-multinomial - for incomplete tables. Espeland (1985) discusses approaches for this and other situations. Bootstrap percentiles for the model LR chi-square are also produced.
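    The idea under unconstrained Poisson sampling can be sketched as follows; note this simplified version resamples the observed counts directly and recomputes only a summary statistic, whereas LOGLIN refits the model to each replicate (counts and seed are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# A 2x2 table of hypothetical counts and its sample odds ratio
y = np.array([31, 109, 17, 122])

def odds_ratio(t):
    return t[0] * t[3] / (t[1] * t[2])

# Under Poisson sampling each cell of a replicate is an independent
# Poisson count with mean equal to the observed (or fitted) cell count.
boot = np.array([odds_ratio(rng.poisson(y)) for _ in range(2000)])
se = boot.std(ddof=1)                      # bootstrap standard error
lo, hi = np.percentile(boot, [2.5, 97.5])  # percentile 95% limits
```

    Constrained sampling (eg product-multinomial) would instead hold the relevant margins fixed when generating each replicate.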

    USAGE

    The program reads commands from standard input, and writes to standard output. The commands are made up of the following keywords and data (note that the parser usually reads only the first two to four characters of a keyword, and will usually accept a long-form keyword as well, eg bootstrap|boot|bs):

      COMPULSORY

    1. da <nj> where nj is the number of cells in the observed table. Followed by (on the next line): y(1..nj) the nj cell counts read in free format.
    2. mo <nid> <nk> where nid is the number of counts the model is to be fitted to, and nk is the number of loglinear parameters to be fitted. Followed by: the design matrix C(1..nid,1..nk) read in free format.

      OPTIONAL

    3. ce <ni> where ni is the number of cells in the underlying complete table that gives rise to the observed counts. Followed by: ji(1..ni) the ni elements of the scatter matrix that maps y onto x, the complete table. Each y(j) is a sum of one or more x(i)'s. ji is read free format. ji can be replaced mathematically by S(1..ni,1..nj), made up of 1's and 0's such that y=S'x.
    4. se <nkk> where nkk is a number of loglinear parameters selected from the design matrix C. This allows easy selection of hierarchical models. Followed by: csel(1..nkk) the number of each column of the original design matrix selected for fitting, read in free format.
    5. cl <ncoll> where the first ncoll cells of x are to be collapsed over (so ncoll is at most ni). This is useful in missing data models to give mean counts for variables unobserved in a given subtable. Followed by: coll(1..ncoll) a scatter vector giving, for each x(i), the cell of the collapsed table to which it contributes. If coll(j)=0 then the jth cell does not contribute to the resulting collapsed table.
    6. fi em|sc|hy [<it>] determines which algorithm the program will use to fit the model: either EM/Iterative Proportional Fitting, Fisher scoring algorithm or both - the latter where the EM algorithm runs for it iterations (default it=3) to provide starting values for the scoring algorithm. The default is hy[brid].
    7. bs <bs> [em] controls whether bootstrap standard errors for collapsed tables and summary measures for these tables will be calculated. bs is the number of bootstrap samples to be generated. The default fitting algorithm for each bootstrap sample is the scoring algorithm, but the keyword em forces the use of the EM algorithm. This is considerably slower in some circumstances, but will converge when the scoring algorithm does not.
    8. pr <t> <b> calculates the proportion x(t)/(x(t)+x(b)) from the collapsed table, along with a bootstrapped standard error if the bs option is active.
    9. cw <t> <b> calculates the proportion 2*x(t)/(2*x(t)+x(b)) from the collapsed table, along with a bootstrapped standard error if the bs option is active.
    10. or <c1> <c2> <c3> <c4> calculates the odds ratio x(c1)*x(c4)/(x(c2)*x(c3)) from the collapsed table, along with a bootstrapped standard error if the bs option is active.
    11. la attaches labels to the nk loglinear parameters. Followed by: term(1..nk) the nk labels maximum length 10 characters. Terminating a line with ":" allows the list of labels to extend over to the next line.
    12. ou [print=1|2] [co] [de] controls the amount of output. print controls whether estimates are printed each iteration, where print=1 gives EM and score estimates for x each iteration and print=2 prints the IPF estimates as well. co prints out the covariance matrix for the loglinear parameters. de prints out the normalised design matrix used by the EM algorithm.
    13. st leads to starting values for the loglinear parameters being read. Followed by: pars(1..nk) the starting values read free format.
    14. conv <conv> sets the convergence criterion. Note that this value is divided by 100 to act as the criterion for change in the loglinear parameters in the scoring algorithm, and is used unchanged as the criterion for change in counts in the EM algorithm.
    15. au <aug> adds a constant aug to each count. This is appropriate for models with sampling zeros and/or small counts; in the 2x2 case at least, it reduces bias in the odds ratio estimate.
    16. ! | rem | c denotes a comment. The line is copied to output.

    REFERENCES

    EXAMPLES

    The following jobs fit a variety of loglinear models.

    Complete Data

    Incomplete Data

    Latent Class Analysis

    Example 1

    This example fits to a 2x2 table, and bootstraps the standard error of the odds ratio.
    ! simplest table
    data 4
    31 109 17 122
    ! intercept row and col, odds ratio
    mo 4 4
    1 1 1 1
    1 0 1 0
    1 1 0 0
    1 0 0 0
    ! labels for loglinear terms
    la
    intercept row col oddsr
    ! fit saturated model, reversing the order of parameter printing
    se 4
    4 3 2 1
    or 1 2 3 4
    bs 200
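    The odds ratio requested by the "or 1 2 3 4" line can be checked directly against the raw counts:

```python
# Sample odds ratio for the 2x2 table in Example 1: cells are
# x(1)=31, x(2)=109, x(3)=17, x(4)=122, and OR = x1*x4/(x2*x3).
x = [31, 109, 17, 122]
or_hat = x[0] * x[3] / (x[1] * x[2])   # roughly 2.04
```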
    

    Example 2

    This example is slightly more complex and looks for effects of zygosity on concordance in twins. The prevalence of the condition is constrained to be equal for the first and second twins, and the second and third order term weights adjusted to produce the (smoothed) OR(DZ) and OR(MZ)/OR(DZ).

    ! DZ 2x2 table then MZ 2x2 table
    data 8
    12 12 10 1335
    5 12 24 1506
    mo 8 6
      1  2  1  2   2  2 
      1  1  1  0.5 1  0.5
      1  1  1  0.5 1  0.5 
      1  0  1  0   0  0 
      1  2  0  2   0  0 
      1  1  0  0.5 0  0 
      1  1  0  0.5 0  0 
      1  0  0  0   0  0 
    !--------------------
    ! 1  2  3  4   5  6 
    ! i  a  z  a1  a  a1 
    !          a2  z  a2 
    !                 z  
    la
    i a z aa az aaz
    

    Example 3

    This job estimates the true prevalence of asthma from an imperfect proxy measure - cross-reporting by cotwin. Sensitivity and specificity are obtained from cross-reporting versus self report in pairs where both twins returned a questionnaire. The chi-square compares prevalence of proxy asthma in the two groups.

    !
    ! Adjust cross-reported asthma in singles using data from complete pairs 
    !
     cells 8
     1 1 2 2 
     3 4 5 6
    !
    ! One 2x1 table and one 2x2 table giving sens and spec 
    !
     data 6
     116 
     540
     451 91
     168 2075
     model 8 5
       1   0   0   0  1
       1   0   0   1  0
       1   0   1   0  0
       1   0   1   1  0
       1   1   0   0  1
       1   1   0   1  0
       1   1   1   0  0
       1   1   1   1  0
    !  i   L   T   A  AT
    la
    i L T A AT
    conv 0.001
    cl 4
    1 2 1 2
    pr 1 2
    bs 200 
    ou 
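    The back-correction idea underlying this job can be sketched with the standard sensitivity/specificity formula (a round-trip check with made-up values; LOGLIN instead estimates everything jointly by ML):

```python
# If a proxy measure has known sensitivity and specificity, the observed
# proxy prevalence p_star satisfies
#   p_star = sens * p + (1 - spec) * (1 - p),
# which inverts to the corrected true prevalence
#   p = (p_star - (1 - spec)) / (sens + spec - 1).
def true_prevalence(p_star, sens, spec):
    return (p_star - (1.0 - spec)) / (sens + spec - 1.0)

# Round-trip check with hypothetical values
p, sens, spec = 0.20, 0.90, 0.95
p_star = sens * p + (1 - spec) * (1 - p)   # prevalence the proxy reports
```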
    

    Example 4

    This very similar job estimates the population cumulative incidence of asthma and a standard error from a stratified random sample. Stratum 1 is a sample of probands with a history of childhood asthma (C+), and stratum 2 those without such a history (C-). Because the sampling fraction is dependent on C, the model chi-square is zero. The bootstrap standard error for the weighted risk agrees with the analytic asymptotic standard error to three decimal places (cumulative incidence=0.231; SE=0.012).

    !
    ! Look at Mark Jenkins' asthma data - Brit Med J 1994;309:90-3. 
    ! compare delta estimator of SE for stratified sample to that in LOGLIN
    !
     cells 8
     1 2 3 4 
     5 5 6 6
    !
    !  2x2 table for the sampled probands (A+,A- in C+, then C-).  
    !  One 2x1 table for unsampled subjects, giving therefore the sampling fraction.
    !
     data 6
     414 327
     127 626 
     608
     6240
     model 8 6
       1  0  0  0  0  0
       1  0  0  1  0  0
       1  0  1  0  0  0
       1  0  1  1  0  1
       1  1  0  0  0  0
       1  1  0  1  0  0
       1  1  1  0  1  0
       1  1  1  1  1  1
    !
    !  i  S  C  A SC CA
    !
    ! S=sampled; C=childhood asthma; A=adult asthma
    !
    la
    i S C A SC CA
    conv 0.001
    cl 8
    1 2 1 2 1 2 1 2
    pr 1 2
    bs 500 
    ou 
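    The quoted cumulative incidence can be reproduced by directly weighting the stratum-specific risks (assuming the counts 608 and 6240 are the unsampled remainders of the C+ and C- strata):

```python
# Weighted cumulative incidence from the stratified sample
a_cpos, n_cpos = 414, 414 + 327          # A+ among sampled C+ probands
a_cneg, n_cneg = 127, 127 + 626          # A+ among sampled C- probands
tot_cpos = n_cpos + 608                  # C+ stratum total
tot_cneg = n_cneg + 6240                 # C- stratum total
risk = (tot_cpos * a_cpos / n_cpos +
        tot_cneg * a_cneg / n_cneg) / (tot_cpos + tot_cneg)
# roughly 0.2317, in line with the quoted cumulative incidence of 0.231
```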
    

    Example 5

    Here, we estimate the gene frequencies for the AB0 system by ML scoring methods. The resulting parameter estimates and confidence limits have to be rescaled by N^(-.5) to give the actual proportions. This approach is easily generalised to larger AB0-like systems such as the HLA system, where some types may not yet be identified ("blanks").

    !
    ! Estimation AB0 frequencies Elandt-Johnson, 1971, p 401, Ex 14.1
    ! A  B  AB  0
    !
    data 4
     725 258 72 1073
    ce 9
     1 3 1 3 2 2 1 2 4
    model 9 3
 2 0 0
 1 1 0
 1 0 1
 1 1 0
 0 2 0
 0 1 1
 1 0 1
 0 1 1
 0 0 2
    la
     A B 0    
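    For comparison, the classical gene-counting EM algorithm gives the allele frequencies for these data directly as proportions (a sketch of the same ML problem, not of LOGLIN's parameterisation):

```python
# Gene-counting (EM) estimate of AB0 allele frequencies from the
# phenotype counts used in Example 5.
nA, nB, nAB, nO = 725, 258, 72, 1073
N = nA + nB + nAB + nO
p, q, r = 0.3, 0.3, 0.4                  # starting frequencies for A, B, 0
for _ in range(200):
    # E-step: split phenotypes A and B into their component genotypes
    nAA = nA * p * p / (p * p + 2 * p * r)   # AA versus A0
    nAO = nA - nAA
    nBB = nB * q * q / (q * q + 2 * q * r)   # BB versus B0
    nBO = nB - nBB
    # M-step: simple allele counting over the imputed genotypes
    p = (2 * nAA + nAO + nAB) / (2 * N)
    q = (2 * nBB + nBO + nAB) / (2 * N)
    r = 1.0 - p - q
```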
    

    Example 6

    Test Hardy-Weinberg equilibrium in two samples typed at the ApoE locus, and whether gene frequencies are the same.

    !
    ! Test for HWE ApoE Cauley et al 1993 across two age cohorts
    !
    ! 2-2, 3-2, 4-2, 3-3, 4-3, 4-4
    !
    data 12
     2 47  5  315 98  6 
     5 126 11 581 135 12
    
    ce 18
     1 2 3
     2 4 5
     3 5 6
     7 8  9
     8 10 11
     9 11 12
    !
    ! e2 e3 e4 age
    !
    model 18 7
     2 0 0 0 0 0 0 
     1 1 0 0 0 0 0 
     1 0 1 0 0 0 0 
     1 1 0 0 0 0 0 
     0 2 0 0 0 0 0 
     0 1 1 0 0 0 0 
     1 0 1 0 0 0 0 
     0 1 1 0 0 0 0 
     0 0 2 0 0 0 0 
     2 0 0 1 2 0 0 
     1 1 0 1 1 1 0 
     1 0 1 1 1 0 1 
     1 1 0 1 1 1 0 
     0 2 0 1 0 2 0 
     0 1 1 1 0 1 1 
     1 0 1 1 1 0 1 
     0 1 1 1 0 1 1 
     0 0 2 1 0 0 2 
    !
    !1 2 3 4 5 6 7 
    !
    la
    e2 e3 e4 age e2*age e3*age e4*age 
    !
    ! se 4      Comparing LR for full model versus no interaction
    ! 1 2 3 4   model tests for gene frequencies conditional on 
    !           HWE
    

    Example 7

    Test for linkage disequilibrium discussed by Aston and Wilson (1986). This is their "easy" two-locus example, also evaluated by Ott (1985).

    ! gametic (pair) frequency   gamma        two alleles A(ij), B(kl)
    ! allelic    "               alpha        two gametes G1(ik), G2(jl)
    ! deviation from HWE         phi
    ! intragametic allelic assoc epsilon
    ! intergametic allelic assoc delta
    !
    ! ln g(ijkl) = mu + a(i) + a(j)  + a(k) + a(l) + p(ij) + p(kl)
    !                 + e(ik) + e(jl) + d(il) + d(jk)
    !
    ! a(i) and a(j) are represented by a combined parameter in the model below,
    ! as is a(k) & a(l) and e(ik) and e(jl).  
    ! epsilon and delta are confounded and cannot be simultaneously estimated.
    ! Locus B 3 alleles versus Locus H three alleles.
    
     data 36
      2   2   1   7   3   3   
      6   11  10  18  30  15  
      6   9   12  22  45  45
      14  19  11  31  23  19
      31  66  37  110 93  72
      37  57  15  53  43  22
     cells   81
      1  2  4  2  3  5  4  5  6
      7  8 10  8  9 11 10 11 12
     19 20 22 20 21 23 22 23 24
      7  8 10  8  9 11 10 11 12
     13 14 16 14 15 17 16 17 18
     25 26 28 26 27 29 28 29 30
     19 20 22 20 21 23 22 23 24
     25 26 28 26 27 29 28 29 30
     31 32 34 32 33 35 34 35 36
     model  81 21
 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 1 1 0 2 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0
 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
 1 1 0 1 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0
 1 1 0 0 2 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0
 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
 1 0 1 2 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0
 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0
 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
 1 0 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
 1 0 1 0 2 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1
 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
 1 1 0 2 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
 1 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0
 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
 1 1 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0
 1 1 0 0 2 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0
 1 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 2 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
 1 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
 1 2 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
 1 2 0 2 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0
 1 2 0 1 1 1 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0
 1 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
 1 2 0 1 1 1 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0
 1 2 0 0 2 1 0 0 0 0 0 0 1 0 0 2 0 0 0 2 0
 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0
 1 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0
 1 1 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0
 1 1 1 2 0 0 0 1 0 1 0 0 0 1 1 0 0 1 1 0 0
 1 1 1 1 1 0 0 1 0 0 0 1 0 1 0 0 1 0 1 1 0
 1 1 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1
 1 1 1 1 1 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 1
 1 1 1 0 2 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 1
 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 1 0 1 2 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0
 1 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1
 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 1 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0
 1 0 1 0 2 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1
 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0
 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1
 1 1 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0
 1 1 1 2 0 0 1 0 0 1 0 0 0 1 1 0 0 1 1 0 0
 1 1 1 1 1 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 1
 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0
 1 1 1 1 1 0 1 0 0 0 1 0 0 1 0 0 1 0 1 1 0
 1 1 1 0 2 0 1 0 0 0 0 0 1 0 0 1 1 0 0 1 1
 1 0 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 2 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0
 1 0 2 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1
 1 0 2 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0
 1 0 2 2 0 0 0 0 1 1 0 0 0 0 2 0 0 0 2 0 0
 1 0 2 1 1 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0 1
 1 0 2 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1
 1 0 2 1 1 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1
 1 0 2 0 2 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2
    !------------------------------------------
    ! 1 2 3 4 5 6 7 8 9101112131415161718192021
    ! i a a a a p p p p p p p p e e e e d d d d
    !
    ! Allelic association and deviation from HWE
    ! Since epsilon and delta terms are confounded - one set (delta's) is zeroed
    ! ie assume no intergametic association
     se 17
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
    !
    ! No allelic association - deviation from HWE
    ! se 13
    ! 1 2 3 4 5 6 7 8 9 10 11 12 13
    !
    ! HWE; no allelic association
    ! se 5 
    ! 1 2 3 4 5 
    ou 
    

    Example 8

    This job fits a model of errors in rating X-rays of dental caries to data for two observers. It assumes that one group of X-rays is easy to read and gives rise to no disagreement, while the remainder are difficult and give rise to a number of disagreements.
    ! Fit teeth from Espeland et al 1986
     cells 12
     1 5 9 1 2 3 4 5 6 7 8 9
    !  3x3 table of rating of caries 3 point scale 2 observers
     data 9
     1450 55 74
       99 35 33
       22 11 64
     model 12 8
     1 0 0 0 0 0 0 0
     0 1 0 0 0 0 0 0
     0 0 1 0 0 0 0 0
     0 0 0 1 1 0 1 0
     0 0 0 1 1 0 0 1
     0 0 0 1 1 0 0 0
     0 0 0 1 0 1 1 0
     0 0 0 1 0 1 0 1
     0 0 0 1 0 1 0 0
     0 0 0 1 0 0 1 0
     0 0 0 1 0 0 0 1
     0 0 0 1 0 0 0 0
    

    Example 9

    Hochberg (1977) presents a double sampling experiment in which a smaller subsample of subjects was measured using a gold standard, while all subjects were measured using "cheap", unreliable measures.

    ! Fit Hochberg 1977 double sampling data
     cells 32
     1 1 2 2 1 1  2  2  3  3  4  4  3  3  4  4 
     5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
    !  2x2 table of imprecise measures and 2x2x2x2 reliability data
     data 20
     1196 13562 
     7151 58175
     17  3  10  258
     3   4  4   25
     16  3  25  197
     100 13 107 1014
    
    !
    ! model is AA*BB* + L (dummy study variable)
    ! so vars are intercept, A, A*, B, B*, L, A.A*, A.B, A.B*, A*.B, A*.B*
    !             B.B*, A.A*.B, A.A*.B*, A.B.B*, A*.B.B*, A.A*.B.B*
    
     model 32 17
     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
     1  0  1  1  1  1  0  0  0  1  1  1  0  0  0  1  0 
     1  1  0  1  1  1  0  1  1  0  0  1  0  0  1  0  0 
     1  0  0  1  1  1  0  0  0  0  0  1  0  0  0  0  0 
     1  1  1  0  1  1  1  0  1  0  1  0  0  1  0  0  0 
     1  0  1  0  1  1  0  0  0  0  1  0  0  0  0  0  0 
     1  1  0  0  1  1  0  0  1  0  0  0  0  0  0  0  0 
     1  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  1  0  1  1  1  0  1  0  0  1  0  0  0  0 
     1  0  1  1  0  1  0  0  0  1  0  0  0  0  0  0  0 
     1  1  0  1  0  1  0  1  0  0  0  0  0  0  0  0  0 
     1  0  0  1  0  1  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  0  0  1  1  0  0  0  0  0  0  0  0  0  0 
     1  0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0 
     1  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0 
     1  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1 
     1  0  1  1  1  0  0  0  0  1  1  1  0  0  0  1  0 
     1  1  0  1  1  0  0  1  1  0  0  1  0  0  1  0  0 
     1  0  0  1  1  0  0  0  0  0  0  1  0  0  0  0  0 
     1  1  1  0  1  0  1  0  1  0  1  0  0  1  0  0  0 
     1  0  1  0  1  0  0  0  0  0  1  0  0  0  0  0  0 
     1  1  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0 
     1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  1  0  0  1  1  0  1  0  0  1  0  0  0  0 
     1  0  1  1  0  0  0  0  0  1  0  0  0  0  0  0  0 
     1  1  0  1  0  0  0  1  0  0  0  0  0  0  0  0  0 
     1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0 
     1  1  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0 
     1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
     1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
     1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
    !
    ! recover estimated A.B collapsed table for entire sample
     collapse 32
     1 2 1 2 3 4 3 4 1 2 1 2 3 4 3 4 1 2 1 2 3 4 3 4 1 2 1 2 3 4 3 4
     odds_ratio 1 2 3 4
    !
    ! get bootstrapped standard errors of mean values collapsed table
     bootstrap 150
    
    The edited output from example (9) is:
       +---------------------------------+ 
       |             LOGLIN              | 
       |   General Log-linear Modelling  | 
       |   Using AS 207 (Haber, 1984)    | 
       +---------------------------------+ 
          Written by David L Duffy 1992    
                 QIMR Australia            
                HP Fortran version         
    
      Program LOGLIN run at 14:31:52 on  8-Apr-92
      The following input lines were read:
    .
    . [as above]
    .
      Output: 
    
      No. cells complete table=  32
      No. cells observed table=  20
      No. parameters estimated=  17
      Convergence criterion   =  .100E-02
    
      Fitting via Fisher score algorithm
    
      Mean observed cell size = 4094.00
    
      Rank of design matrix   =  17
    
       Gibbs Chi-square =    6.49 P= .09
     Pearson Chi-square =    6.18 P= .10
                     df =    3.
    
      Observed Table ------------------------- 
           Observed    Fitted      F-T Deviate 
     [ 1]   1196.00  1196.13          .00
     .
     .
     [20]   1014.00   987.30          .85
    
    
    
    
      Full Table ---- 
            Fitted    
    [ 1]   753.12
    .
    [32]   987.30
    
      Full Table ------------------------------- 
         Parameter      S.E.  exp(Par)   95% Confidence Limits   Term 
    [ 1]    6.895       .028    987.297    933.776   1043.885    
    [ 2]   -2.249       .103       .106       .086       .129    
    [ 3]   -4.138       .240       .016       .010       .026    
    .
    [16]   -2.794       .987       .061       .009       .423    
    [17]    3.991      1.349     54.099      3.846    761.043    
    
      Collapsed table ------------ 
    
    [ 1]   3227.39
    [ 2]  21071.03
    [ 3]  10581.91
    [ 4]  47002.67
     --------------
      OR       .68
     --------------
    
      Bootstrap mean    S.E.   95% CL-----------
    
    [ 1]   3198.97    376.18   2461.66   3936.28 
    [ 2]  21125.60    654.63  19842.53  22408.66 
    [ 3]  10601.53    596.20   9432.98  11770.08 
    [ 4]  46957.00    826.98  45336.11  48577.89 
     -------------------------------------------
    logOR     -.40       .16      -.72      -.09 
       OR      .67                 .49       .92 
     -------------------------------------------
    
      No. of bootstrap samples=  150
    
      Job completed in    153.0 seconds.
                            2.5 minutes.
    
    
    Espeland and Hui (1987) give their results for the same model. The overall model goodness-of-fit was G2 = 6.49 on 3 df. The standard errors are calculated using the delta method.
    ----------------------------------------------------------------------------
    Precise Injury  Precise belt use       Fitted Estimate        Standard Error
    ----------------------------------------------------------------------------
    Yes             Yes                     3227.4                344.9
    Yes             No                     21071.0                660.0
    No              Yes                    10581.9                527.2
    No              No                     47002.7                787.6
    ----------------------------------------------------------------------------
    Odds ratio from collapsed table           0.68                  0.16
    ----------------------------------------------------------------------------
    

    Example 10

    This job performs a latent class analysis of an example from the LEM manual (Vermunt 1997). Four manifest binary variables are taken as indicators of a single underlying binary latent variable. It is essential that the EM fitting algorithm be used, because the scoring algorithm fails in this example.
    !
    ! Example data from Lem manual
    !
    data 16
    59 56 14 36 7 15 4 23
    75 162 22 115 8 68 22 123
    ce 32
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
    !
    ! X A B C D XA XB XC XD
    !
     design  32 16
      1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
      1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
      1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
      1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
      1 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0
      1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0
      1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1
      1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
      1 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0
      1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0
      1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 1
      1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0
      1 0 1 1 0 1 0 0 0 0 1 0 1 0 1 0
      1 0 1 1 1 0 0 0 0 0 1 1 0 1 0 0
      1 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1
      1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0
      1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0
      1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1
      1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0
      1 1 0 1 0 1 0 1 0 1 0 0 0 0 1 0
      1 1 0 1 1 0 0 1 1 0 0 0 0 1 0 0
      1 1 0 1 1 1 0 1 1 1 0 0 0 1 1 1
      1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0
      1 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0
      1 1 1 0 1 0 1 0 1 0 0 1 0 0 0 0
      1 1 1 0 1 1 1 0 1 1 0 1 1 0 0 1
      1 1 1 1 0 0 1 1 0 0 1 0 0 0 0 0
      1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0
      1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 0
      1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    !--------------------------------
    ! 1 2 3 4 5 6 7 8 910111213141516
    !--------------------------------
    !i X2A2B2C2D2X2X2X2X2A2A2A2B2B2C2
    !            A2B2C2D2B2C2D2C2D2D2
    !                                
    !                                
    la
    i X A B C D XA XB XC XD AB AC AD BC BD CD
    se 10
    1 2 3 4 5 6 7 8 9 10
    fi em
    conv 1e-6