# Loglin: A program for loglinear analysis of complete and incomplete count data

## CONTENTS

• Introduction
• Methods
• Usage
• References
• Examples

## INTRODUCTION

Program LOGLIN performs generalised log-linear modelling of categorical data. It can fit any of the log-linear models available in standard packages such as GLIM, SAS, BMDP or SPSS, including models with structural zeros (as in PROC CATMOD). In addition, it can fit models for missing and/or unobserved data. Although it can fit the more general latent variable models described by Haberman (1980), Goodman (1981) or Hagenaars (1990a, 1990b), these can be cumbersome and slow to converge (David Rindskopf was very helpful in pointing out how to fit these in the present log-linear framework).

LOGLIN can be used for:

1. Models where imprecise measures have been calibrated using a "perfect" gold standard, and the true association between the imperfectly measured variables is to be estimated.
2. Models where data are missing for a subsample of the population (formally equivalent to (1)).
3. Latent variable models where latent variables are "errorless" functions of observed variables - eg ML gene frequency estimation from counts of observed phenotypes.
4. Specialised measurement models - eg where observed counts are mixtures arising from perfect and error-prone measures.
5. Standard models that are difficult to fit in some packages, such as symmetry and quasi-symmetry models.

## METHODS

The general framework underlying these models is summarised by Espeland (1986) and Espeland & Hui (1987), and is originally due to Thompson & Baker (1981). An observed contingency table y, which will be treated as a vector, is modelled as arising from an underlying complete table z, where observed count y(j) is the sum of a number of elements of z, such that each z(i) contributes to no more than one y(j). One can therefore write y=F'z, where F is made up of orthogonal columns of ones and zeros.
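For illustration, the relation y = F'z and the scatter-vector representation used later by the `ce` command can be sketched as follows (a minimal NumPy sketch; the vector shown is the ABO mapping used in Example 5):

```python
import numpy as np

# Scatter-vector form of F as used by the `ce` command: complete-table
# cell i contributes to observed cell ji[i] (1-based); this is the ABO
# mapping of Example 5.
ji = np.array([1, 3, 1, 3, 2, 2, 1, 2, 4]) - 1  # 0-based
ni, nj = len(ji), ji.max() + 1

F = np.zeros((ni, nj))
F[np.arange(ni), ji] = 1.0     # one 1 per row, so the 0/1 columns are orthogonal

z = np.arange(1.0, ni + 1)     # any complete table z
y = F.T @ z                    # observed table y = F'z: grouped sums of z
```

Because each complete cell feeds exactly one observed cell, F'F is diagonal, with the group sizes on the diagonal.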

We then specify a loglinear model for z, so that log(E(z))=X'b, where X is a design matrix, and b a vector of loglinear parameters. The loglinear model for z and thus y, can be fitted using two methods, both of which are available in LOGLIN. The first was presented as AS207 by Michael Haber (1984) and combines an iterative proportional fitting algorithm for b and z, with an EM fitting for y, z and b. The second is a Fisher scoring approach, presented in Espeland (1986).

Each iteration of the Fisher scoring algorithm is

b(t+1) = b(t) + I^(-1) (PX')' (m - F(F'F)^(-1) y) ,

where

b(t) is the estimate of b at the t-th iteration,

m = exp(X'b) ,

P = F (F' diag(m) F)^(-1) F' diag(m) ,

and

I = (PX')' diag(m) (PX') .

The default option is to use the EM algorithm to provide starting values for the scoring algorithm, which gives a modest improvement in speed; each method can, however, be called in isolation. (The EM algorithm calls the scoring algorithm in any case, to obtain the information matrix for the loglinear parameters.) In the case of missing data, one is usually interested in collapsing the complete table to give expected counts for subtables, and often in summary measures for these subtables. Standard errors of collapsed counts and of these measures can be calculated from the covariance matrix of the loglinear parameters of the complete table using the delta method.
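The E-step/M-step cycle can be sketched in a few lines. The sketch below is a simplified illustration, not the AS207 IPF implementation: the E-step splits each observed count over its contributing complete-table cells in proportion to the current fitted means, and the M-step here is a single Newton step of a Poisson loglinear fit. The data and design are those of the ABO Example 5:

```python
import numpy as np

def em_loglin(y, ji, X, n_em=500):
    """EM fit of log E[z] = X b where the complete table z is observed only
    through grouped sums y; ji[i] is the 1-based observed cell fed by
    complete cell i (the `ce` scatter vector). A sketch, not AS207: the
    M-step is a single Newton step of the Poisson loglinear fit."""
    y = np.asarray(y, dtype=float)
    ji = np.asarray(ji, dtype=int) - 1
    X = np.asarray(X, dtype=float)            # one row per complete-table cell
    size = np.bincount(ji, minlength=len(y))  # cells feeding each observed count
    z = y[ji] / size[ji]                      # uniform initial split
    b = np.linalg.lstsq(X, np.log(z + 0.5), rcond=None)[0]
    for _ in range(n_em):
        m = np.exp(X @ b)                     # current fitted complete table
        tot = np.bincount(ji, weights=m, minlength=len(y))
        z = m * y[ji] / tot[ji]               # E-step: split y proportional to m
        b = b + np.linalg.solve(X.T @ (m[:, None] * X),  # M-step: one Newton step
                                X.T @ (z - m))
    return b

# ABO gene frequencies (Example 5): phenotype counts A, B, AB, O
y = [725, 258, 72, 1073]
ji = [1, 3, 1, 3, 2, 2, 1, 2, 4]              # genotypes AA,AB,AO,BA,BB,BO,OA,OB,OO
X = [[2, 0, 0], [1, 1, 0], [1, 0, 1],
     [1, 1, 0], [0, 2, 0], [0, 1, 1],
     [1, 0, 1], [0, 1, 1], [0, 0, 2]]         # copies of alleles A, B, O
b = em_loglin(y, ji, X)
p = np.exp(b) / np.exp(b).sum()               # allele frequency estimates
```

The normalisation exp(b)/sum(exp(b)) is equivalent to the N^(-.5) rescaling described under Example 5.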

As an alternative, LOGLIN can produce (nonparametric) bootstrap estimates of standard errors. These are currently correct only for Poisson sampling models, and will differ if sampling is constrained (eg product-multinomial) for incomplete tables; Espeland & Odoroff (1985) discuss approaches for this and other situations. Bootstrap percentiles for the model LR chi-square are also produced.
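The flavour of the bootstrap computations for a summary measure can be illustrated on the 2x2 table of Example 1. This is a simplified sketch only: the table is resampled as a single multinomial and the saturated-model odds ratio recomputed, whereas LOGLIN refits the specified model to each resample:

```python
import numpy as np

rng = np.random.default_rng(20240101)
y = np.array([31, 109, 17, 122])          # the 2x2 table of Example 1
n = y.sum()

B = 2000
log_or = np.empty(B)
for i in range(B):
    yb = rng.multinomial(n, y / n) + 0.5  # resample the table; +0.5 guards zeros
    log_or[i] = np.log(yb[0] * yb[3] / (yb[1] * yb[2]))

se = log_or.std(ddof=1)                        # bootstrap SE of the log OR
lo, hi = np.percentile(log_or, [2.5, 97.5])    # percentile interval
```

For this table the bootstrap SE of the log odds ratio lands close to the asymptotic value sqrt(1/31 + 1/109 + 1/17 + 1/122).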

## USAGE

The program reads commands from standard input, and writes to standard output. Commands are made up of the following keywords and data (note that the parser usually reads only the first two to four characters of a keyword, and usually accepts the long form of a keyword as well, eg bootstrap|boot|bs):

#### COMPULSORY

1. da <nj> where nj is the number of cells in the observed table. Followed by (on the next line): y(1..nj) the nj cell counts read in free format.
2. mo <nid> <nk> where nid is the number of counts the model is to be fitted to, and nk is the number of loglinear parameters to be fitted. Followed by: the design matrix C(1..nid,1..nk) read in free format.

#### OPTIONAL

3. ce <ni> where ni is the number of cells in the underlying complete table that gives rise to the observed counts. Followed by: ji(1..ni), the ni elements of the scatter vector that maps the complete table x onto y. Each y(j) is a sum of one or more x(i)'s. ji is read in free format. Mathematically, ji can be replaced by S(1..ni,1..nj), made up of 1's and 0's, such that y=S'x (S corresponds to the matrix F of the METHODS section).
4. se <nkk> where nkk is a number of loglinear parameters selected from the design matrix C. This allows easy selection of hierarchical models. Followed by: csel(1..nkk) the number of each column of the original design matrix selected for fitting, read in free format.
5. cl <ncoll> where the first ncoll cells of x are to be collapsed over (maximum therefore ni). This is useful in missing data models to give mean counts for variables unobserved in a given subtable. Followed by: coll(1..ncoll), a scatter vector giving, for each x(i), the cell of the collapsed table that x(i) contributes to. If coll(j)=0 then the jth cell does not contribute to the collapsed table.
6. fi em|sc|hy [<it>] determines which algorithm the program will use to fit the model: either EM/Iterative Proportional Fitting, Fisher scoring algorithm or both - the latter where the EM algorithm runs for it iterations (default it=3) to provide starting values for the scoring algorithm. The default is hy[brid].
7. bs <bs> [em] controls whether bootstrap standard errors for collapsed tables and summary measures for these tables will be calculated. bs is the number of bootstrap samples to be generated. The default fitting algorithm for each bootstrap sample is the scoring algorithm, but the keyword em forces the use of the EM algorithm. This is considerably slower in some circumstances, but will converge when the scoring algorithm does not.
8. pr <t> <b> calculates the proportion x(t)/(x(t)+x(b)) from the collapsed table, along with a bootstrapped standard error if the bs option is active.
9. cw <t> <b> calculates the proportion 2*x(t)/(2*x(t)+x(b)) from the collapsed table, along with a bootstrapped standard error if the bs option is active.
10. or <c1> <c2> <c3> <c4> calculates the odds ratio x(c1)*x(c4)/(x(c2)*x(c3)) from the collapsed table, along with a bootstrapped standard error if the bs option is active.
11. la attaches labels to the nk loglinear parameters. Followed by: term(1..nk) the nk labels maximum length 10 characters. Terminating a line with ":" allows the list of labels to extend over to the next line.
12. ou [print=1|2] [co] [de] controls the amount of output. print controls whether estimates are printed each iteration, where print=1 gives EM and score estimates for x each iteration and print=2 prints the IPF estimates as well. co prints out the covariance matrix for the loglinear parameters. de prints out the normalised design matrix used by the EM algorithm.
13. st leads to starting values for the loglinear parameters being read. Followed by: pars(1..nk) the starting values read free format.
14. conv <conv> sets the convergence criterion. Note that this is divided by 100 when used as the criterion for change in the loglinear parameters in the scoring algorithm, and used unchanged as the criterion for change in counts in the EM algorithm.
15. au <aug> adds a constant aug to each count. This is appropriate for models with sampling zeros and/or small counts; in the 2x2 case at least, it reduces bias in the odds ratio estimate.
16. ! | rem | c denotes a comment. The line is copied to output.
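For illustration, the cl scatter-vector convention can be mimicked in a couple of lines (a sketch with made-up values for x and coll; a zero entry drops the cell, as described above):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # hypothetical fitted complete table
coll = np.array([1, 2, 1, 0, 2])              # `cl` scatter vector; 0 drops the cell

keep = coll > 0
collapsed = np.bincount(coll[keep] - 1, weights=x[keep])
```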

## REFERENCES

• Aston CE, Wilson SR (1986): Log-linear model analysis of allelic associations. Genet Epidemiol 3: 187-194.
• Cauley JA, Eichner JE, Kamboh MI, Ferrell RE, Kuller LH (1993): Apo E allele frequencies in younger (age 42-50) vs older (age 65-90) women. Genet Epidemiol 10: 27-34.
• Elandt-Johnson R (1971): Probability models and statistical methods in genetics. New York: Wiley.
• Espeland MA, Odoroff CL (1985): Log-linear models for doubly sampled categorical data fitted by the EM algorithm. J Am Statist Ass 80:663-670.
• Espeland MA (1986): A general class of models for discrete multivariate data. Commun. Statist.-Simula 15:405-424.
• Espeland MA, Hui SL (1987): A general approach to analyzing epidemiologic data that contains misclassification errors. Biometrics 43:1001-1012.
• Haber M (1984): AS207: Fitting a general log-linear model. Appl Statist 33:358-362.
• Hagenaars JA (1990a): Categorical longitudinal data: log-linear panel, trend, and cohort analysis. Newbury Park, Calif.: Sage Publications.
• Hagenaars J, Luijkx R (1990b): LCAG: a program to estimate latent class models and other loglinear models with latent variables with and without missing data. Version 2.1. Groningen: Tilburg University Department of Sociology.
• Hochberg Y (1977): On the use of double sampling schemes in analyzing categorical data with misclassification errors. J Am Statist Ass 72:914-921.
• Jenkins MA, Hopper JL, Bowes G, Carlin JB, Flander LB, Giles GG (1994): Factors in childhood as predictors of asthma in adult life. Brit Med J 309: 90-93.
• Ott J (1985): A chi-square test to distinguish allelic association from other causes of phenotypic association between two loci. Genet Epidemiol 2: 79-84.
• Thompson R, Baker RJ (1981): Composite link functions in generalized linear models. Appl Stat 30: 125-131.
• Vermunt JK (1997): LEM: a general program for the analysis of categorical data. Tilburg: Tilburg University.

## EXAMPLES

The following jobs fit a variety of loglinear models.

## Example 1

This example fits to a 2x2 table, and bootstraps the standard error of the odds ratio.
```! simplest table
data 4
31 109 17 122
! intercept row and col, odds ratio
mo 4 4
1 1 1 1
1 0 1 0
1 1 0 0
1 0 0 0
! labels for loglinear terms
la
intercept row col oddsr
! fit saturated model, reversing the order of parameter printing
se 4
4 3 2 1
or 1 2 3 4
bs 200
```

## Example 2

This example is slightly more complex: it looks for effects of zygosity on concordance in twins. The prevalence of the condition is constrained to be equal for the first and second twins, and the second- and third-order term weights are adjusted to produce the (smoothed) OR(DZ) and OR(MZ)/OR(DZ).

```! DZ 2x2 table then MZ 2x2 table
data 8
12 12 10 1335
5 12 24 1506
mo 8 6
1  2  1  2   2  2
1  1  1  0.5 1  0.5
1  1  1  0.5 1  0.5
1  0  1  0   0  0
1  2  0  2   0  0
1  1  0  0.5 0  0
1  1  0  0.5 0  0
1  0  0  0   0  0
!--------------------
! 1  2  3  4   5  6
! i  a  z  a1  a  a1
!          a2  z  a2
!                 z
la
i a z aa az aaz
```

## Example 3

This job estimates the true prevalence of asthma from an imperfect proxy measure - cross-reporting by cotwin. Sensitivity and specificity are obtained from cross-reporting versus self report in pairs where both twins returned a questionnaire. The chi-square compares prevalence of proxy asthma in the two groups.

```!
! Adjust cross-reported asthma in singles using data from complete pairs
!
cells 8
1 1 2 2
3 4 5 6
!
! One 2x1 table and one 2x2 table giving sens and spec
!
data 6
116
540
451 91
168 2075
model 8 5
1   0   0   0  1
1   0   0   1  0
1   0   1   0  0
1   0   1   1  0
1   1   0   0  1
1   1   0   1  0
1   1   1   0  0
1   1   1   1  0
!  i   L   T   A  AT
la
i L T A AT
conv 0.001
cl 4
1 2 1 2
pr 1 2
bs 200
ou
```
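The logic behind this adjustment is the standard misclassification back-correction: an apparent prevalence q measured with sensitivity sens and specificity spec satisfies q = p*sens + (1-p)*(1-spec), so the true prevalence p can be recovered in closed form (the Rogan-Gladen estimator). A minimal sketch with illustrative values only, not the counts above; LOGLIN instead estimates everything jointly by ML, which also yields standard errors:

```python
def corrected_prevalence(q, sens, spec):
    """Back-correct an apparent prevalence q measured with known
    sensitivity and specificity (Rogan-Gladen estimator)."""
    return (q - (1.0 - spec)) / (sens + spec - 1.0)

# illustrative (made-up) values, not taken from the deck above
p = corrected_prevalence(q=0.20, sens=0.90, spec=0.95)
```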

## Example 4

This very similar job estimates the population cumulative incidence of asthma and a standard error from a stratified random sample. Stratum 1 is a sample of probands with a history of childhood asthma (C+), and stratum 2 those without such a history (C-). Because the sampling fraction is dependent on C, the model chi-square is zero. The bootstrap standard error for the weighted risk agrees with the analytic asymptotic standard error to three decimal places (cumulative incidence=0.231; SE=0.012).

```!
! Look at Mark Jenkins' asthma data - Brit Med J 1994;309:90-3.
! compare delta estimator of SE for stratified sample to that in LOGLIN
!
cells 8
1 2 3 4
5 5 6 6
!
!  2x2 table for the sampled probands (A+,A- in C+, then C-).
!  One 2x1 table for unsampled subjects, giving therefore the
!  sampling fraction.
!
data 6
414 327
127 626
608
6240
model 8 6
1  0  0  0  0  0
1  0  0  1  0  0
1  0  1  0  0  0
1  0  1  1  0  1
1  1  0  0  0  0
1  1  0  1  0  0
1  1  1  0  1  0
1  1  1  1  1  1
!
!  i  S  C  A SC CA
!
! S=sampled; C=childhood asthma; A=adult asthma
!
la
i S C A SC CA
conv 0.001
cl 8
1 2 1 2 1 2 1 2
pr 1 2
bs 500
ou
```
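As a check on the figures quoted above, the weighted cumulative incidence and its analytic stratified-sampling standard error can be computed directly from the counts in the deck (a back-of-envelope sketch outside LOGLIN; finite-population corrections are ignored):

```python
import math

# Stratum counts from the deck: sampled (A+, A-) and unsampled subjects,
# for the C+ stratum then the C- stratum.
n1, a1, extra1 = 414 + 327, 414, 608   # C+: sampled size, A+ among sampled, unsampled
n2, a2, extra2 = 127 + 626, 127, 6240  # C-
N1, N2 = n1 + extra1, n2 + extra2      # stratum population sizes
N = N1 + N2

p1, p2 = a1 / n1, a2 / n2              # within-stratum cumulative incidences
w1, w2 = N1 / N, N2 / N                # stratum weights
p = w1 * p1 + w2 * p2                  # weighted cumulative incidence
se = math.sqrt(w1**2 * p1 * (1 - p1) / n1 + w2**2 * p2 * (1 - p2) / n2)
```

This reproduces the quoted cumulative incidence of 0.231 and standard error of 0.012.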

## Example 5

Here, we estimate the gene frequencies for the ABO system by ML scoring methods. The resulting parameter estimates and confidence limits have to be rescaled by N^(-.5) to give the actual allele proportions. This approach is easily generalised to larger ABO-like systems such as the HLA system, where some types may not yet be identified ("blanks").

```!
! Estimating ABO frequencies Elandt-Johnson, 1971, p 401, Ex 14.1
! A  B  AB  O
!
data 4
725 258 72 1073
ce 9
1 3 1 3 2 2 1 2 4
model 9 3
2 0 0
1 1 0
1 0 1
1 1 0
0 2 0
0 1 1
1 0 1
0 1 1
0 0 2
la
A B O
```

## Example 6

Test Hardy-Weinberg equilibrium in two samples typed at the ApoE locus, and whether gene frequencies are the same.

```!
! Test for HWE ApoE Cauley et al 1993 across two age cohorts
!
! 2-2, 3-2, 4-2, 3-3, 4-3, 4-4
!
data 12
2 47  5  315 98  6
5 126 11 581 135 12

ce 18
1 2 3
2 4 5
3 5 6
7 8  9
8 10 11
9 11 12
!
! e2 e3 e4 age
!
model 18 7
2 0 0 0 0 0 0
1 1 0 0 0 0 0
1 0 1 0 0 0 0
1 1 0 0 0 0 0
0 2 0 0 0 0 0
0 1 1 0 0 0 0
1 0 1 0 0 0 0
0 1 1 0 0 0 0
0 0 2 0 0 0 0
2 0 0 1 2 0 0
1 1 0 1 1 1 0
1 0 1 1 1 0 1
1 1 0 1 1 1 0
0 2 0 1 0 2 0
0 1 1 1 0 1 1
1 0 1 1 1 0 1
0 1 1 1 0 1 1
0 0 2 1 0 0 2
!
!1 2 3 4 5 6 7
!
la
e2 e3 e4 age e2*age e3*age e4*age
!
! se 4      Comparing LR for full model versus no interaction
! 1 2 3 4   model tests for gene frequencies conditional on
!           HWE
```

## Example 7

Test for linkage disequilibrium discussed by Aston and Wilson (1986). This is their "easy" two-locus example, also evaluated by Ott (1985).

```! gametic (pair) frequency   gamma        two alleles A(ij), B(kl)
! allelic    "               alpha        two gametes G1(ik), G2(jl)
! deviation from HWE         phi
! intragametic allelic assoc epsilon
! intergametic allelic assoc delta
!
! ln g(ijkl) = mu + a(i) + a(j)  + a(k) + a(l) + p(ij) + p(kl)
!                 + e(ik) + e(jl) + d(il) + d(jk)
!
! a(i) and a(j) are represented by a combined parameter in the model below,
! as is a(k) & a(l) and e(ik) and e(jl).
! epsilon and delta are confounded and cannot be simultaneously estimated.
! Locus B 3 alleles versus Locus H three alleles.

data 36
2   2   1   7   3   3
6   11  10  18  30  15
6   9   12  22  45  45
14  19  11  31  23  19
31  66  37  110 93  72
37  57  15  53  43  22
cells   81
1  2  4  2  3  5  4  5  6
7  8 10  8  9 11 10 11 12
19 20 22 20 21 23 22 23 24
7  8 10  8  9 11 10 11 12
13 14 16 14 15 17 16 17 18
25 26 28 26 27 29 28 29 30
19 20 22 20 21 23 22 23 24
25 26 28 26 27 29 28 29 30
31 32 34 32 33 35 34 35 36
model  81 21
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   1 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   1 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 0 0 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0   1 0 0 1 1 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   1 0 0 1 1 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0
1 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0   1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0   1 1 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0   1 1 0 2 0 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0
1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0   1 1 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0
1 1 0 1 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0   1 1 0 0 2 0 0 0 0 0 0 0 1 0
0 1 0 0 0 1 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   1 0 1 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0   1 0 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0
1 0 1 2 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0   1 0 1 1 1 0 0 0 0 0 0 1 0 0
0 0 1 0 1 0 0
1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1   1 0 1 1 1 0 0 0 0 0 1 0 0 0
1 0 0 0 0 0 1
1 0 1 0 2 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1   1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0   1 1 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0
1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0   1 1 0 2 0 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0
1 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0   1 1 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0
1 1 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0   1 1 0 0 2 0 0 0 0 0 0 0 1 0
0 1 0 0 0 1 0
1 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   1 2 0 1 0 1 0 0 0 0 0 0 0 1
0 0 0 1 0 0 0
1 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0   1 2 0 1 0 1 0 0 0 0 0 0 0 1
0 0 0 1 0 0 0
1 2 0 2 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0   1 2 0 1 1 1 0 0 0 0 0 1 0 1
0 1 0 1 0 1 0
1 2 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0   1 2 0 1 1 1 0 0 0 0 1 0 0 1
0 1 0 1 0 1 0
1 2 0 0 2 1 0 0 0 0 0 0 1 0 0 2 0 0 0 2 0   1 1 1 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0
1 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0   1 1 1 0 1 0 0 1 0 0 0 0 0 0
0 0 1 0 0 1 0
1 1 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0   1 1 1 2 0 0 0 1 0 1 0 0 0 1
1 0 0 1 1 0 0
1 1 1 1 1 0 0 1 0 0 0 1 0 1 0 0 1 0 1 1 0   1 1 1 0 1 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 1
1 1 1 1 1 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 1   1 1 1 0 2 0 0 1 0 0 0 0 1 0
0 1 1 0 0 1 1
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   1 0 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0
1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1   1 0 1 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 1 2 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0   1 0 1 1 1 0 0 0 0 0 0 1 0 0
1 0 0 0 0 0 1
1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0   1 0 1 1 1 0 0 0 0 0 1 0 0 0
0 0 1 0 1 0 0
1 0 1 0 2 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1   1 1 1 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 1 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0   1 1 1 0 1 0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 1
1 1 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0   1 1 1 2 0 0 1 0 0 1 0 0 0 1
1 0 0 1 1 0 0
1 1 1 1 1 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 1   1 1 1 0 1 0 1 0 0 0 0 0 0 0
0 0 1 0 0 1 0
1 1 1 1 1 0 1 0 0 0 1 0 0 1 0 0 1 0 1 1 0   1 1 1 0 2 0 1 0 0 0 0 0 1 0
0 1 1 0 0 1 1
1 0 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0   1 0 2 1 0 0 0 0 1 0 0 0 0 0
1 0 0 0 1 0 0
1 0 2 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1   1 0 2 1 0 0 0 0 1 0 0 0 0 0
1 0 0 0 1 0 0
1 0 2 2 0 0 0 0 1 1 0 0 0 0 2 0 0 0 2 0 0   1 0 2 1 1 0 0 0 1 0 0 1 0 0
1 0 1 0 1 0 1
1 0 2 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1   1 0 2 1 1 0 0 0 1 0 1 0 0 0
1 0 1 0 1 0 1
1 0 2 0 2 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2
!------------------------------------------
! 1 2 3 4 5 6 7 8 9101112131415161718192021
! i a a a a p p p p p p p p e e e e d d d d
!
! Allelic association and deviation from HWE
! Since epsilon and delta terms are confounded - one set (delta's) is zeroed
! ie assume no intergametic association
se 17
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
!
! No allelic association - deviation from HWE
! se 13
! 1 2 3 4 5 6 7 8 9 10 11 12 13
!
! HWE; no allelic association
! se 5
! 1 2 3 4 5
ou
```

## Example 8

This job fits a model of errors in rating X-rays of dental caries to data for two observers. It assumes that one group of X-rays is easy to read and gives rise to no disagreement, while the remainder are difficult to read and give rise to a number of disagreements.
```! Fit teeth from Espeland et al 1986
cells 12
1 5 9 1 2 3 4 5 6 7 8 9
!  3x3 table of rating of caries 3 point scale 2 observers
data 9
1450 55 74
99 35 33
22 11 64
model 12 8
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 1 0 1 0
0 0 0 1 1 0 0 1
0 0 0 1 1 0 0 0
0 0 0 1 0 1 1 0
0 0 0 1 0 1 0 1
0 0 0 1 0 1 0 0
0 0 0 1 0 0 1 0
0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 0
```

## Example 9

Hochberg (1977) presents a double sampling experiment in which a small subsample of subjects was measured using a gold standard, while all subjects were measured using "cheap", unreliable measures.

```! Fit Hochberg 1977 double sampling data
cells 32
1 1 2 2 1 1  2  2  3  3  4  4  3  3  4  4
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
!  2x2 table of imprecise measures and 2x2x2x2 reliability data
data 20
1196 13562
7151 58175
17  3  10  258
3   4  4   25
16  3  25  197
100 13 107 1014

!
! model is AA*BB* + L (dummy study variable)
! so vars are intercept, A, A*, B, B*, L, A.A*, A.B, A.B*, A*.B, A*.B*
!             B.B*, A.A*.B, A.A*.B*, A.B.B*, A*.B.B*, A.A*.B.B*

model 32 17
1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
1  0  1  1  1  1  0  0  0  1  1  1  0  0  0  1  0
1  1  0  1  1  1  0  1  1  0  0  1  0  0  1  0  0
1  0  0  1  1  1  0  0  0  0  0  1  0  0  0  0  0
1  1  1  0  1  1  1  0  1  0  1  0  0  1  0  0  0
1  0  1  0  1  1  0  0  0  0  1  0  0  0  0  0  0
1  1  0  0  1  1  0  0  1  0  0  0  0  0  0  0  0
1  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0
1  1  1  1  0  1  1  1  0  1  0  0  1  0  0  0  0
1  0  1  1  0  1  0  0  0  1  0  0  0  0  0  0  0
1  1  0  1  0  1  0  1  0  0  0  0  0  0  0  0  0
1  0  0  1  0  1  0  0  0  0  0  0  0  0  0  0  0
1  1  1  0  0  1  1  0  0  0  0  0  0  0  0  0  0
1  0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0
1  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0
1  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0
1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1
1  0  1  1  1  0  0  0  0  1  1  1  0  0  0  1  0
1  1  0  1  1  0  0  1  1  0  0  1  0  0  1  0  0
1  0  0  1  1  0  0  0  0  0  0  1  0  0  0  0  0
1  1  1  0  1  0  1  0  1  0  1  0  0  1  0  0  0
1  0  1  0  1  0  0  0  0  0  1  0  0  0  0  0  0
1  1  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0
1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
1  1  1  1  0  0  1  1  0  1  0  0  1  0  0  0  0
1  0  1  1  0  0  0  0  0  1  0  0  0  0  0  0  0
1  1  0  1  0  0  0  1  0  0  0  0  0  0  0  0  0
1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
1  1  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0
1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
!
! recover estimated A.B collapsed table for entire sample
collapse 32
1 2 1 2 3 4 3 4 1 2 1 2 3 4 3 4 1 2 1 2 3 4 3 4 1 2 1 2 3 4 3 4
odds_ratio 1 2 3 4
!
! get bootstrapped standard errors of mean values collapsed table
bootstrap 150
```
The edited output from this job is:
```   +---------------------------------+
|             LOGLIN              |
|   General Log-linear Modelling  |
|   Using AS 207 (Haber, 1984)    |
+---------------------------------+
Written by David L Duffy 1992
QIMR Australia
HP Fortran version

Program LOGLIN run at 14:31:52 on  8-Apr-92
The following input lines were read:
.
. [as above]
.
Output:

No. cells complete table=  32
No. cells observed table=  20
No. parameters estimated=  17
Convergence criterion   =  .100E-02

Fitting via Fisher score algorithm

Mean observed cell size = 4094.00

Rank of design matrix   =  17

Gibbs Chi-square =    6.49 P= .09
Pearson Chi-square =    6.18 P= .10
df =    3.

Observed Table -------------------------
Observed    Fitted      F-T Deviate
[ 1]   1196.00  1196.13          .00
.
.
[20]   1014.00   987.30          .85

Full Table ----
Fitted
[ 1]   753.12
.
[32]   987.30

Full Table -------------------------------
Parameter      S.E.  exp(Par)   95% Confidence Limits   Term
[ 1]    6.895       .028    987.297    933.776   1043.885
[ 2]   -2.249       .103       .106       .086       .129
[ 3]   -4.138       .240       .016       .010       .026
.
[16]   -2.794       .987       .061       .009       .423
[17]    3.991      1.349     54.099      3.846    761.043

Collapsed table ------------

[ 1]   3227.39
[ 2]  21071.03
[ 3]  10581.91
[ 4]  47002.67
--------------
OR       .68
--------------

Bootstrap mean    S.E.   95% CL-----------

[ 1]   3198.97    376.18   2461.66   3936.28
[ 2]  21125.60    654.63  19842.53  22408.66
[ 3]  10601.53    596.20   9432.98  11770.08
[ 4]  46957.00    826.98  45336.11  48577.89
-------------------------------------------
logOR     -.40       .16      -.72      -.09
OR      .67                 .49       .92
-------------------------------------------

No. of bootstrap samples=  150

Job completed in    153.0 seconds.
2.5 minutes.

```
Espeland and Hui (1987) give their results for the same model. The overall model goodness-of-fit was G2 = 6.49 on 3 df. Their standard errors are calculated using the delta method.
```----------------------------------------------------------------------------
Precise Injury  Precise belt use       Fitted Estimate        Standard Error
----------------------------------------------------------------------------
Yes             Yes                     3227.4                344.9
Yes             No                     21071.0                660.0
No              Yes                    10581.9                527.2
No              No                     47002.7                787.6
----------------------------------------------------------------------------
Odds ratio from collapsed table           0.68                  0.16
----------------------------------------------------------------------------
```

## Example 10

This job performs a latent class analysis of an example from the LEM manual (Vermunt 1997). Four manifest binary variables are taken as indicators of a single underlying binary latent variable. The EM fitting algorithm must be used here, because the scoring algorithm fails for this example.
```!
! Example data from Lem manual
!
data 16
59 56 14 36 7 15 4 23
75 162 22 115 8 68 22 123
ce 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
!
! X A B C D XA XB XC XD
!
design  32 16
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0
1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0
1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0
1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0
1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 1
1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0
1 0 1 1 0 1 0 0 0 0 1 0 1 0 1 0
1 0 1 1 1 0 0 0 0 0 1 1 0 1 0 0
1 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0
1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0
1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1
1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0
1 1 0 1 0 1 0 1 0 1 0 0 0 0 1 0
1 1 0 1 1 0 0 1 1 0 0 0 0 1 0 0
1 1 0 1 1 1 0 1 1 1 0 0 0 1 1 1
1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0
1 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0
1 1 1 0 1 0 1 0 1 0 0 1 0 0 0 0
1 1 1 0 1 1 1 0 1 1 0 1 1 0 0 1
1 1 1 1 0 0 1 1 0 0 1 0 0 0 0 0
1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0
1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
!--------------------------------
! 1 2 3 4 5 6 7 8 910111213141516
!--------------------------------
!i X2A2B2C2D2X2X2X2X2A2A2A2B2B2C2
!            A2B2C2D2B2C2D2C2D2D2
!
!
la
i X A B C D XA XB XC XD AB AC AD BC BD CD
se 10
1 2 3 4 5 6 7 8 9 10
fi em
conv 1e-6
```