SOME NOTES ON GENETIC EPIDEMIOLOGY

By David L Duffy

A definition: a science that deals with the aetiology, distribution, and control of disease in groups of relatives, and with inherited causes of disease in populations (Morton, 1982). Note that this is double-barrelled, but information about populations is inferred via correlations observed among samples of relatives.

Study materials

(1) Populations

Closest to population genetics.

(a) "General" population surveys eg gene frequency studies.

(b) Population isolates eg Amish, Aland Islanders.

(d) Migration of populations into different environments eg Tokelauan islanders and hypertension; American blacks and sickle cell anaemia.

(2) Families (nuclear or multigenerational)

Examining aggregation of disease in families which may be due to heredity or shared exposure to environmental risk factors.

(a) "General" population samples

(b) Ascertained through cases (probands) including family case-control studies. NB complex sampling theory for some types of analysis.

(3) Pairs of relatives

Often the most convenient sample.

(a) "General" population samples eg degree of relatedness of pairs of cancer cases compared with that of pairs of controls: prostate cancer in Utah (expressed as the mean coefficient of relationship, so that clustering is measured in terms of genetic distance); all cancers and shared surnames in the UK.

(b) Incomplete families eg pairs of sibs or twins, sampled as in 2(a) and 2(b).

Study types

(1) Phenotypic

("Phenometric") - based on expressed traits/diseases

(a) Conventional epidemiological studies of diseases of known genetic aetiology eg Down syndrome versus maternal age.

(b) Association of disease within pairs of relatives eg the relative risk of disease in a relative given the known affection status of the index person. A variety of genetic models can be fitted to such data, such as Mendelian major gene models, or polygenic models (eg path models) that, given certain assumptions, can estimate the effects of unmeasured familial environmental factors.

(c) Association of disease within families - classical and complex segregation analysis. This can test major gene and polygenic models more effectively. Complex segregation analysis (for traits that are not classically Mendelian eg hypertension) is computationally intensive. The mixed model, which is fitted using computer packages such as POINTER, PAP, and SAGE, allows fitting of models that include associations due to a major gene and residual associations due to polygenes or shared environment.

(2) Genotypic/phenotypic ("Genometric")

(a) Population/evolutionary genetics of disease - examines occurrence of disease and different genotypes in populations. Hemoglobinopathies offer a classic example of this area.

(b) Linkage studies - if a marker gene is close on a chromosome to a disease gene (locus), then the two will rarely be separated by recombination, and particular marker gene alleles will cosegregate with disease within particular families. Analysis is usually performed by specialised programs such as LINKAGE or LIPED.
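As a rough illustration of the linkage idea, the evidence for cosegregation in the simplest case (phase-known, fully informative meioses) is commonly summarised as a LOD score: the log10 likelihood ratio comparing a recombination fraction theta against free recombination (theta = 0.5). The sketch below assumes that simple case only; it is not how LINKAGE or LIPED handle general pedigrees, and the function names and counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta for phase-known meioses.

    LOD(theta) = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ]
    for k recombinants observed in n informative meioses.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        # Any observed recombinant is impossible under complete linkage.
        return float("-inf")
    log_lik = (meioses - recombinants) * math.log10(1.0 - theta)
    if recombinants > 0:
        log_lik += recombinants * math.log10(theta)
    return log_lik - meioses * math.log10(0.5)


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Return (theta_hat, LOD at theta_hat), with theta_hat capped at 0.5."""
    if meioses <= 0:
        raise ValueError("need at least one informative meiosis")
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


if __name__ == "__main__":
    # Hypothetical data: 2 recombinants among 20 informative meioses.
    print(max_lod(2, 20))
```

A maximum LOD of 3 or more is the conventional threshold for declaring linkage.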

(c) Association studies - if a putative disease gene is identified, or a gene marker is in linkage disequilibrium with a disease gene, then particular gene alleles will be associated with disease within the general population. Analysis is by the standard methods of epidemiology for case-control or cohort data. Confounding due to other genetic or environmental risk factors can be dealt with by analysing family data and including family of origin as a covariate. These methods extend down to the molecular level.
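The "standard methods of epidemiology" for a case-control association study can be sketched with an odds ratio for carrying a given allele, with an approximate 95% confidence interval from the usual log-odds standard error (Woolf's method). This is a minimal unadjusted example; the counts and function name are hypothetical, and it does not address the confounding discussed above.

```python
import math


def allele_odds_ratio(case_carriers: int, case_noncarriers: int,
                      control_carriers: int, control_noncarriers: int,
                      z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and approximate 95% CI from a 2x2 case-control table."""
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if any(n <= 0 for n in cells):
        raise ValueError("all four cells must be positive for this estimator")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 30/100 cases and 15/100 controls carry the allele.
    print(allele_odds_ratio(30, 70, 15, 85))
```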

Terminology

(a) Alleles - alternative forms of a gene, polymorphisms. Usually only one allele of a disease gene actually increases risk (at least within a given family).

(b) Gene frequency - the proportion (rather than count) of different alleles of a gene in the population.
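For a gene with two alleles, gene frequencies can be obtained from genotype counts by simple allele counting, since each person carries two alleles. A minimal sketch, with hypothetical allele labels A and B and invented counts:

```python
def allele_frequencies(n_AA: int, n_AB: int, n_BB: int) -> tuple[float, float]:
    """Return (freq of A, freq of B) from genotype counts by allele counting."""
    n_people = n_AA + n_AB + n_BB
    if n_people == 0:
        raise ValueError("no genotypes supplied")
    total_alleles = 2 * n_people
    # Each AA person contributes two A alleles, each AB person one.
    freq_A = (2 * n_AA + n_AB) / total_alleles
    return freq_A, 1.0 - freq_A


if __name__ == "__main__":
    # Hypothetical survey of 1000 people.
    print(allele_frequencies(360, 480, 160))
```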

(d) Penetrance - risk of expressing trait associated with particular genotype. For continuous traits such as height, the mean effect of a genotype is the equivalent. An additive gene effect means that the penetrance or effect increases linearly across genotypes containing 0, 1 or 2 copies of a given allele. A dominant gene effect is nonlinear and thus includes both classical Mendelian dominant and recessive types of transmission.

(e) Phenocopy - a case of the trait/disease of interest not due to the action of genes eg nonfamilial versus familial breast cancer. A sporadic case of a "genetic" disease occurs in the absence of family history either because of environmental causes or via a new gene mutation. To minimise "uninteresting" sporadic cases, often a family will only be included in a genetic study if two cases have occurred within it.

(f) Segregation - transmission of genes from parents to offspring. The Mendelian transmission probability for allele A is unity from a parent of genotype AA, a half from a parent of genotype AB, and zero from a parent of genotype BB.

(g) Segregation ratios - ratios of different genotypes in offspring arising from particular matings.
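The transmission probabilities in (f) and the segregation ratios in (g) can be enumerated directly for a two-allele autosomal locus. The sketch below assumes Mendelian transmission (each parental allele passed on with probability one half); the function names are hypothetical and genotypes are written as two-letter strings such as "AB".

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def transmission(genotype: str) -> dict[str, Fraction]:
    """Probability that a parent of this genotype transmits each allele."""
    probs: Counter = Counter()
    for allele in genotype:
        # Each of the parent's two alleles is transmitted with probability 1/2.
        probs[allele] += Fraction(1, 2)
    return dict(probs)


def segregation_ratios(mother: str, father: str) -> dict[str, Fraction]:
    """Expected offspring genotype proportions for a given mating."""
    offspring: Counter = Counter()
    pairs = product(transmission(mother).items(), transmission(father).items())
    for (m_allele, m_prob), (f_allele, f_prob) in pairs:
        # Sort so that "BA" and "AB" are counted as the same genotype.
        genotype = "".join(sorted(m_allele + f_allele))
        offspring[genotype] += m_prob * f_prob
    return dict(offspring)


if __name__ == "__main__":
    print(transmission("AB"))
    print(segregation_ratios("AB", "AB"))
    print(segregation_ratios("AA", "AB"))
```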

(h) Identity by descent - two individuals share the same (allele of a) gene inherited from a common ancestor.

(i) Coefficient of relationship - probability that two individuals share one randomly selected allele at a locus identical by descent (twice the coefficient of kinship).

(j) Coefficient of identity - probability that two individuals share both alleles at a locus identical by descent. Using the coefficients of relationship and identity allows genetic disease risk of any relative of a proband to be calculated. Risch (1990) uses these to show that genetic risk will usually halve with each decrease in the degree of relationship (identical twin to parent or sibling to second degree relative).
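As a sketch of how these two coefficients give relative risks, the standard single-locus result (no epistasis, no inbreeding) is that the recurrence risk ratio for a relative is 1 + (r*V_A + d*V_D)/K^2, where r is the coefficient of relationship, d the coefficient of identity, V_A and V_D the additive and dominance variances of the trait, and K the population prevalence. The symbols, table of relatives and numerical values below are assumptions made for illustration; they are not taken from Risch (1990) verbatim.

```python
# (coefficient of relationship, coefficient of identity) for common relatives,
# following the definitions in (i) and (j) above.
COEFFICIENTS = {
    "MZ twin": (1.0, 1.0),
    "parent-offspring": (0.5, 0.0),
    "full sib": (0.5, 0.25),
    "second degree": (0.25, 0.0),   # eg half sib, grandparent, uncle/aunt
    "third degree": (0.125, 0.0),   # eg first cousin
}


def recurrence_risk_ratio(relative: str, v_additive: float,
                          v_dominance: float, prevalence: float) -> float:
    """Risk to a relative of a case, divided by the population risk."""
    r, d = COEFFICIENTS[relative]
    covariance = r * v_additive + d * v_dominance
    return 1.0 + covariance / prevalence ** 2


if __name__ == "__main__":
    # Hypothetical trait: prevalence 5%, V_A = 0.002, V_D = 0.0005.
    for relative in COEFFICIENTS:
        ratio = recurrence_risk_ratio(relative, 0.002, 0.0005, 0.05)
        print(f"{relative:18s} risk ratio = {ratio:.3f}")
```

For relatives who cannot share both alleles (parent-offspring, second and third degree), the excess risk ratio (the ratio minus one) halves with each step, which is the pattern described above.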

(k) Polygenic traits - traits due to effects of multiple genes, usually thought of as all having small effects on risk. Polygenes may interact (multiplicatively) to affect risk - epistasis. In this case, Risch (1990) has suggested that risk falls off as the square root with each decrease in the degree of relationship. Many complex traits and diseases are thought to be polygenic eg height, psoriasis.

(l) Genetic heterogeneity - a trait or disease due to different genes acting in different individuals/populations (usually the gene frequency of each gene is low, and each gene is a sufficient cause). Cystic fibrosis and familial Alzheimer's disease are two examples of allelic and genotypic heterogeneity respectively.

(m) Sex-limitation - penetrance of a disease gene depends on sex of individual eg familial breast cancer.

(n) Genetic imprinting - penetrance of a disease gene depends on sex of parent of origin of gene eg Huntington's disease, asthma.

(o) Linkage disequilibrium (allelic association) - particular alleles at different but physically close genes cosegregate (appear together in individuals) more frequently than expected. This may imply that one allele has arisen recently, or that the particular haplotype carries a selective advantage. Disequilibrium is common in the HLA system, and particular HLA haplotypes are associated with particular diseases.
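"More frequently than expected" can be made concrete with the usual disequilibrium coefficient D, the haplotype frequency minus the product of the two allele frequencies, together with its common rescalings |D'| and r^2. A minimal sketch for two biallelic loci follows; the function name and the frequencies used are hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> dict[str, float]:
    """D, |D'| and r^2 from a haplotype frequency and two allele frequencies.

    p_ab is the frequency of the haplotype carrying allele a at the first
    locus and allele b at the second; p_a and p_b are the allele frequencies.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # zero when alleles associate at random
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return {"D": d, "D_prime": d_prime, "r_squared": r_squared}


if __name__ == "__main__":
    # Hypothetical frequencies: haplotype 0.40, each allele 0.50.
    print(linkage_disequilibrium(0.40, 0.50, 0.50))
```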

(p) Gene-environment interaction - represents interaction between particular genotypes and environmental exposures, and is ubiquitous eg phenylalanine hydroxylase deficiency, dietary phenylalanine and mental retardation; debrisoquine metabolism, smoking and lung cancer; fair skin, sun exposure and skin cancer; Lewis blood group, alcohol intake and coronary atherosclerosis; APOE genotype, head injury and Alzheimer's disease.

(q) Selection - "natural selection" by environmental factors of genotypes that increase likelihood of viable offspring. Classic examples are many recessive diseases, such as cystic fibrosis, sickle cell anaemia and thalassemia, where heterozygous carriers have increased overall reproductive fitness despite the fact that a proportion of their offspring suffer lethal disease.

(r) Cultural inheritance/domesticity/shared environment - environmental risk factors that aggregate within a family eg passive smoking, diet, infection etc. These exposures can be incorporated in genetic models if measured (see GxE interaction), and estimated if unmeasured in twin studies.

(s) Path models - Sewall Wright introduced this extension of regression analysis to genetics in the 1920's, where predictors in one regression equation are dependent variables in other equations. Familial aggregation of continuous variables under polygenic control is especially suited to these methods. Alternating or dichotomous traits such as diseases can also be modelled using probit models, which have been traditionally (since the 1960's) used for complex (polygenic) traits such as psoriasis and pyloric stenosis. Broad heritability is the R² of a trait due to genetic factors in such a regression model. An environmental index is sometimes used as a measure of family environment and is the result of the regression of all measured risk factors for that family.

(t) Classical twin design - This also dates from the 1920's, when diagnosis of identical (MZ) and nonidentical (DZ) twins became reliable. It is assumed that twins in the same household are exposed to shared environmental risk factors to the same extent whether they are MZ or DZ. Incidence (or prevalence) of disease in a cotwin of a case can then be interpreted as follows:

I_popul = I_DZ = I_MZ	No familial factors involved
I_popul < I_DZ = I_MZ	Family environmental factors
I_popul < I_DZ < I_MZ	Genetic factors ± family environment
I_popul ≤ I_DZ << I_MZ	Epistatic polygenes and/or gene-environment interaction

For continuous traits, there are tests for genetical dominance, interaction between twins' phenotypes, GxE interaction and covariation, and even direction of causation, though these require large samples of twins.

(u) Adoption design - In the case where the effects of genetic and family environment on disease expression cannot be easily differentiated in standard designs (eg complex phenotypes), the examination of the trait correlation between parents and offspring, and caregivers and children can be useful. Note that selective placement of children (eg by social class) for adoption weakens some studies.

(v) Transmission-disequilibrium test (TDT) - This refers to a 1:1 or 1:3 matched case:control analysis of a measured gene and a trait, where the controls are synthetic, being the other offspring genotypes that the parents of the "case" could have produced.
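In its simplest 1:1 form, the TDT is usually computed as a McNemar-type statistic on heterozygous parents of affected offspring: with b transmissions and c non-transmissions of the allele of interest, the statistic (b - c)^2 / (b + c) is referred to a chi-square distribution with 1 degree of freedom. The sketch below assumes that form only; the counts and function name are hypothetical.

```python
import math


def tdt(transmitted: int, not_transmitted: int) -> tuple[float, float]:
    """McNemar-type TDT statistic and its 1-df chi-square p-value.

    Counts are taken over heterozygous parents of affected offspring:
    how often the allele of interest was, or was not, passed to the case.
    """
    b, c = transmitted, not_transmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need at least one informative transmission")
    chi_square = (b - c) ** 2 / (b + c)
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


if __name__ == "__main__":
    # Hypothetical data: allele transmitted 60 times, not transmitted 40 times.
    print(tdt(60, 40))
```

Under the null hypothesis of no linkage or association, transmission and non-transmission are equally likely, so b and c should be similar.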