Elementary classical genetics

Introduction

Most textbooks start with Mendel's work. This description follows Strickberger, Chapters 6-7, and Srb, Chapter 1.

Mendel, an Augustinian monk and teacher, had training in physics and mathematics, as well as natural sciences. In the 1850s, he studied one experimental organism -- the garden pea -- using a systematic experimental approach and performing a statistical analysis of his data.

Mendel chose seven binary traits for his analysis, such as whether the peas were wrinkled or smooth, or yellow or green, and whether the plant was short or tall. For each trait he had varieties that bred true for each phenotype; that is, self-fertilization and propagation from the resulting seeds consistently gave rise to the same phenotype in the offspring. Each such variety is described as homozygous for the trait.

When plants from two different varieties were crossed (parental generation P1), all the resulting offspring of the first filial generation (F1) appeared to be of only one type. For example, crossing Wrinkled x Smooth gives rise to hybrid Smooth offspring.

When these F1 plants self-fertilized (or were fertilized by other F1 plants), both Wrinkled and Smooth plants were observed among the offspring in this F2 generation.

Critically, Mendel counted the seeds of each type in the F2 generation. In the Wrinkled x Smooth cross F2, there were

5474 Smooth, 1850 Wrinkled.

When Mendel self-fertilized F2 plants, the offspring (F3) of the Wrinkled plants were always Wrinkled. Of the 565 Smooth plants tested, 193 consistently gave rise to Smooth offspring, while the other 372 gave rise to a mixture of Wrinkled and Smooth.

Similarly, for the Yellow x Green F2, there were

6002 Yellow, 2001 Green.

For all seven traits, the F2s exhibited ratios of 3:1. Similarly, the F3 families from those selfed Smooth or Yellow F2 plants that did not breed true also exhibited ratios of 3:1.

Table 6-1 of Strickberger: a meta-analysis of the wrinkled-seed trait.

Study          Smooth   Wrinkled   Percent wrinkled
F2
  Mendel         5474       1850   25.2
  Tschermak       884        288   24.6
  Bateson       10793       3542   24.8
  Hurst          1335        420   23.9
  Lock            620        197   24.1
F3
  Tschermak      2087        661   24.0
  Lock            769        259   25.2
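As a rough check that the F2 data fit the 3:1 expectation, one can pool the F2 rows of Table 6-1 and compare the observed proportion of wrinkled seeds with 0.25 using a normal approximation to the binomial. This is a minimal sketch: the function name `pooled_wrinkled_test` is hypothetical, and pooling assumes the studies are comparable. With only two classes, this z-test is equivalent to a chi-square goodness-of-fit test (the chi-square statistic is z squared).

```python
import math

# F2 counts (Smooth, Wrinkled) from Table 6-1
F2_COUNTS = {
    "Mendel": (5474, 1850),
    "Tschermak": (884, 288),
    "Bateson": (10793, 3542),
    "Hurst": (1335, 420),
    "Lock": (620, 197),
}

def pooled_wrinkled_test(counts, expected=0.25):
    """Pool all studies; return (proportion wrinkled, z-score against `expected`)."""
    smooth = sum(s for s, w in counts.values())
    wrinkled = sum(w for s, w in counts.values())
    n = smooth + wrinkled
    p_hat = wrinkled / n
    # Standard error of a proportion under the null hypothesis p = expected
    se = math.sqrt(expected * (1 - expected) / n)
    return p_hat, (p_hat - expected) / se

p_hat, z = pooled_wrinkled_test(F2_COUNTS)
print(f"pooled proportion wrinkled = {p_hat:.4f}, z = {z:.2f}")
```

A |z| well below 2 means the pooled counts give no evidence against the 3:1 model.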

Mendel's first law: principle of segregation

Mendel interpreted these consistent ratios as being due to discrete, transmissible factors controlling each trait, which segregate into the gametes (pollen and eggs). There seemed to be two such factors in each plant for each trait, and two types of factor in each case, one of which was dominant over the other. For example, the Smooth allelomorph or allele is dominant over the Wrinkled allele. The two alleles carried by each plant constitute its genotype; the single allele carried by a gamete is its haplotype.

Let S be the Smooth allele, and s the Wrinkled allele.

Genotype Phenotype
SS Smooth
Ss Smooth
ss Wrinkled

In segregation, each parent passes one of its two alleles, "chosen" at random, to the offspring via the gamete.
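The segregation rule can be sketched by enumerating every pairing of one paternal and one maternal allele, each pairing equally likely; this is what the Punnett squares for the F1 and backcross matings tabulate. The helper names `cross`, `phenotype` and `phenotype_ratios` are hypothetical, and alleles are assumed to be single characters, with the uppercase letter dominant.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(father, mother):
    """Offspring genotype probabilities at one locus, e.g. cross('Ss', 'Ss').

    Each parent transmits one of its two alleles with probability 1/2,
    independently of the other parent.
    """
    counts = Counter()
    for a, b in product(father, mother):
        # Write genotypes with the dominant (uppercase) allele first: 'Ss', not 'sS'
        counts["".join(sorted(a + b))] += 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

def phenotype(genotype):
    # Complete dominance: a single S allele is enough for Smooth
    return "Smooth" if "S" in genotype else "Wrinkled"

def phenotype_ratios(father, mother):
    ratios = Counter()
    for g, p in cross(father, mother).items():
        ratios[phenotype(g)] += p
    return dict(ratios)

print(cross("Ss", "Ss"))             # F1 selfed
print(phenotype_ratios("Ss", "Ss"))
print(phenotype_ratios("Ss", "ss"))  # backcross to the Wrinkled parent
```

Selfing the F1 gives SS, Ss and ss with probabilities 1/4, 1/2 and 1/4, hence 3/4 Smooth; the backcross to ss gives Smooth and Wrinkled in equal proportions. Exact fractions are used so that the ratios come out without rounding.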

Segregation in the F1 generation

                                   Paternal genotype = Ss (Smooth)
                                   S (50%)             s (50%)
Maternal genotype = Ss (Smooth)
    S (50%)                        SS (25%) Smooth     Ss (25%) Smooth
    s (50%)                        Ss (25%) Smooth     ss (25%) Wrinkled

Backcross

The Mendelian model can be tested by backcrossing an F1 hybrid to one of the P1 types.

Segregation in backcross matings

                                   Paternal genotype = Ss (Smooth)
                                   S (50%)             s (50%)
Maternal genotype = SS (Smooth)
    S (50%)                        SS (25%) Smooth     Ss (25%) Smooth
    S (50%)                        SS (25%) Smooth     Ss (25%) Smooth
Maternal genotype = ss (Wrinkled)
    s (50%)                        Ss (25%) Smooth     ss (25%) Wrinkled
    s (50%)                        Ss (25%) Smooth     ss (25%) Wrinkled



Codominance

The genotype-to-phenotype relationship (the gene penetrance) does not have to be one of complete dominance, as it was for all the traits Mendel chose. If one crosses frizzled chickens with normal chickens, the F1 offspring are all slightly frizzled. The F2 generation is a mixture of frizzled, slightly frizzled, and normal birds in a ratio of 1:2:1. For example, in one experiment, Landauer and Dunn (1930) obtained 23 frizzled, 50 slightly frizzled, and 20 normal birds.
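A Pearson chi-square goodness-of-fit statistic is a standard way to ask whether counts like Landauer and Dunn's are consistent with 1:2:1. The sketch below computes it by hand (the function name `chi_square` is hypothetical). With three classes there are 2 degrees of freedom, for which the 5% critical value is about 5.99.

```python
def chi_square(observed, ratio):
    """Pearson chi-square statistic for observed counts against an expected ratio."""
    n = sum(observed)
    total_ratio = sum(ratio)
    expected = [n * r / total_ratio for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Landauer and Dunn (1930): frizzled, slightly frizzled, normal
observed = [23, 50, 20]
stat = chi_square(observed, [1, 2, 1])
print(f"chi-square = {stat:.2f} on 2 df")
```

The statistic is far below 5.99, so these counts give no reason to doubt the 1:2:1 model.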

Segregation in the F1 generation

                                              Paternal genotype = Ff (Slightly frizzled)
                                              F (50%)                      f (50%)
Maternal genotype = Ff (Slightly frizzled)
    F (50%)                                   FF (25%) Frizzled            Ff (25%) Slightly frizzled
    f (50%)                                   Ff (25%) Slightly frizzled   ff (25%) Normal



Segregation in backcross matings

                                              Paternal genotype = Ff (Slightly frizzled)
                                              F (50%)                      f (50%)
Maternal genotype = FF (Frizzled)
    F (50%)                                   FF (25%) Frizzled            Ff (25%) Slightly frizzled
    F (50%)                                   FF (25%) Frizzled            Ff (25%) Slightly frizzled
Maternal genotype = ff (Normal)
    f (50%)                                   Ff (25%) Slightly frizzled   ff (25%) Normal
    f (50%)                                   Ff (25%) Slightly frizzled   ff (25%) Normal



In this case of codominance, there is one phenotype for each genotype, so, unlike the case of dominance, we can reliably infer the genotype by examining the organism. The marker genes used by modern geneticists are usually codominant.

Mendel's second law: principle of independent assortment

Mendel also carried out experiments involving two traits simultaneously. For example, crossing

P1: Smooth,Yellow x Wrinkled,Green

gave the dihybrid

F1: Smooth,Yellow

which, when selfed, gave:

Counts in the F2 generation (Mendel)
            Smooth   Wrinkled   Total
Yellow         315        101     416
Green          108         32     140
Total          423        133     556

He concluded that each trait was segregating independently of the other, giving a 9:3:3:1 joint ratio and a 3:1 marginal ratio for each trait.
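The 9:3:3:1 ratio follows from treating the two loci as independent: the F1 dihybrid produces four gamete types in equal frequency, and combining two such gametes gives 16 equally likely zygotes. The sketch below enumerates them and compares the expected counts with Mendel's 556 seeds. The function names are hypothetical, and so is the symbol pair Y/y for Yellow/Green, which the text does not define; an uppercase letter is assumed dominant.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """Gametes of a two-locus genotype like 'SsYy', assuming independent assortment."""
    locus1, locus2 = genotype[:2], genotype[2:]
    return [a + b for a, b in product(locus1, locus2)]

def phenotype(zygote):
    # A zygote is a pair of gametes, e.g. ('Sy', 'sY')
    seed = "Smooth" if "S" in zygote[0][0] + zygote[1][0] else "Wrinkled"
    colour = "Yellow" if "Y" in zygote[0][1] + zygote[1][1] else "Green"
    return seed + "," + colour

# Self the F1 dihybrid: all 4 x 4 gamete combinations are equally likely
counts = Counter(phenotype(z) for z in product(gametes("SsYy"), repeat=2))
print(counts)  # out of 16 zygotes

observed = {"Smooth,Yellow": 315, "Wrinkled,Yellow": 101,
            "Smooth,Green": 108, "Wrinkled,Green": 32}
n = sum(observed.values())
for pheno, k in counts.items():
    print(f"{pheno:16s} expected {n * k / 16:7.2f}  observed {observed[pheno]}")
```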

Dihybrid backcross

We can perform backcrosses as before. This dihybrid backcross is crucial to understanding and detecting linkage. Srb gives data for two traits in the tomato plant: tall v. dwarf, and cut leaf v. potato leaf.

Tall is dominant to dwarf, and Cut to Potato, so the F1 from Tall,Cut x Dwarf,Potato is Tall,Cut. The backcross with the Dwarf,Potato parental line gave

Counts in the backcross generation (MacArthur 1931)
            Tall   Dwarf   Total
Cut           77      72     149
Potato        62      73     135
Total        139     145     284

Only the backcross to the double-recessive parental line gives this 1:1:1:1 ratio, in which each offspring phenotype directly reflects the genotype of the gamete contributed by the F1; this backcross is therefore called the testcross.

Segregation in testcross for two dominant traits.

                                   Paternal genotype = ddcc (Dwarf,Potato)
                                   dc (100%)
Maternal genotype = DdCc (Tall,Cut)
    DC (25%)                       DdCc  Tall,Cut       (25%)
    Dc (25%)                       Ddcc  Tall,Potato    (25%)
    dC (25%)                       ddCc  Dwarf,Cut      (25%)
    dc (25%)                       ddcc  Dwarf,Potato   (25%)


LINKAGE

Introduction

It was not until 1900 that Mendel's results were replicated and his work rediscovered. Shortly after this, numerous exceptions to Mendel's second law were observed; these were not fully understood until the work of Morgan. Srb gives a (later) example:

Hutt (1931) carried out dihybrid testcrosses for frizzle and white in chickens, where

frizzled is dominant over normal (if one combines slightly and extremely frizzled), and

white is dominant over coloured.

P1: White,Normal x Coloured,Frizzle

F1: White,Frizzle

Testcross: White,Frizzle (F1) x Coloured,Normal

Counts in testcross 1 (Hutt 1931)
            White   Coloured   Total
Frizzled       18         63      81
Normal         63         13      76
Total          81         76     157

Note that the marginal counts are close to the expected 1:1 ratio, but the body of the table deviates markedly from 1:1:1:1. This deviation is due to linkage between the two genes. The percent recombination is 100*(18+13)/157 = 19.7%. Under independent assortment the percent recombination should be 50%.

However, a testcross of another set of chickens with exactly the same genotypes gave the following counts:

Counts in testcross 2 (Hutt 1933)
            White   Coloured   Total
Frizzled       15          2      17
Normal          4         12      16
Total          19         14      33

In the first testcross, the Frizzled and Coloured phenotypes seemed to cosegregate, whereas in the second cross Frizzled and White cosegregate. This is referred to as repulsion of the dominant traits (frizzled and white) in the first case, and coupling in the second. The deviation from 1:1:1:1 is about the same size in each table, but in opposite directions. Since the direction only reflects the phase, we ignore it and count the two minority (recombinant) classes, so the recombination in this table is 100*(4+2)/33 = 18.2%.
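The recombination estimate is simply the proportion of testcross offspring in the two recombinant classes. Treating each offspring as an independent binomial draw, the standard error sqrt(c(1 - c)/n) gives a rough idea of its precision. The sketch below (hypothetical function name) applies this to both of Hutt's testcrosses; which classes count as recombinant must be chosen according to the F1 phase.

```python
import math

def recombination_estimate(parental, recombinant):
    """Estimate c from testcross class counts; return (c, binomial standard error)."""
    n = sum(parental) + sum(recombinant)
    c = sum(recombinant) / n
    se = math.sqrt(c * (1 - c) / n)
    return c, se

# Hutt (1931), repulsion: parental classes White,Normal and Coloured,Frizzled
c1, se1 = recombination_estimate(parental=[63, 63], recombinant=[18, 13])
# Hutt (1933), coupling: parental classes White,Frizzled and Coloured,Normal
c2, se2 = recombination_estimate(parental=[15, 12], recombinant=[2, 4])
print(f"testcross 1: c = {100 * c1:.1f}% +/- {100 * se1:.1f}%")
print(f"testcross 2: c = {100 * c2:.1f}% +/- {100 * se2:.1f}%")
```

The two estimates are similar, but the much smaller second cross has roughly twice the standard error.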

If one examines a large number of genes in this fashion in any organism, one always finds sets of genes that are linked together, while assorting independently (recombination 50%) of members of other linkage groups. It was realised in the 1920s that each linkage group corresponds to a chromosome.

The mechanism underlying linkage in the examples looks like this:

P1:          White,Normal    x    Coloured,Frizzle

                 I  f                 i  F
                 ----                 ----
                 I  f                 i  F

             emit sperm I f     and eggs i F

F1:          White,Frizzle

                 I  f
                 ----
                 i  F

Testcross:   White,Frizzle (F1)    x    Coloured,Normal

                 I  f                   i  f
                 ----                   ----
                 i  F                   i  f

             emit eggs I F, i F, I f, i f,   and sperm i f

Now, if white and frizzle were unlinked, we would expect each egg type to be equally frequent (25%). The observed frequencies would arise if the alleles above the bar, and those below it, tended to stay together, recombining less often than the 50% we would expect under independence.

That is, recombination turns each parental combination into a new one on c% of occasions:

I f  -->  I F   on c% of occasions

i F  -->  i f   on c% of occasions

So, if c is zero, then we would only see Iiff (white, normal) and iiFf (coloured, frizzled) offspring from this testcross. This is in fact the situation in male Drosophila, which show zero recombination within linkage groups and free (50%) recombination between linkage groups.

Exactly the same genotype could be arranged differently on the two homologous chromosomes (a different phase):

    I  F            i  f
    ----            ----
    i  f            i  f

If c were zero in this case, we would only see IiFf (white, frizzled) and iiff (coloured, normal) offspring. These two arrangements correspond to coupling and repulsion of the observed phenotypes.

Coupling        Repulsion

  I  F            I  f
  ----            ----
  i  f            i  F

Based on our definition of c, the gametic frequencies are:

Gametic frequencies
F1 genotype (phase)    IF        If        iF        if
IF/if (coupling)       (1-c)/2   c/2       c/2       (1-c)/2
If/iF (repulsion)      c/2       (1-c)/2   (1-c)/2   c/2

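If c is 50%, this gives 25% for each haplotype, as under independent assortment. The table also justifies the estimator for c given above: in a testcross, each offspring's phenotype reveals which gamete it received from the heterozygous parent, and the two recombinant classes together have frequency c/2 + c/2 = c.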
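The gametic-frequency table can be written as a small function of c and phase. This sketch (hypothetical function name; loci I/i and F/f as in the text) returns the four haplotype frequencies from an IiFf double heterozygote.

```python
def gamete_frequencies(c, phase):
    """Haplotype frequencies from an IiFf double heterozygote.

    c     -- recombination fraction between the two loci (0 <= c <= 0.5)
    phase -- "coupling" (IF/if) or "repulsion" (If/iF)
    """
    if not 0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    parental, recombinant = (1 - c) / 2, c / 2
    if phase == "coupling":
        return {"IF": parental, "If": recombinant, "iF": recombinant, "if": parental}
    if phase == "repulsion":
        return {"IF": recombinant, "If": parental, "iF": parental, "if": recombinant}
    raise ValueError("phase must be 'coupling' or 'repulsion'")

print(gamete_frequencies(0.2, "repulsion"))
print(gamete_frequencies(0.5, "coupling"))  # independent assortment: 25% each
```

In a testcross the ii ff parent contributes only i f gametes, so the offspring phenotype frequencies are exactly these gamete frequencies.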

We can estimate c from designs other than the testcross, although they are slightly to considerably more complicated. In humans, of course, testcrosses cannot be arranged experimentally, so more sophisticated methods of analysis are required.

Mapping

If one can arrange testcrosses between triple (or higher-order) heterozygotes and the corresponding recessive homozygotes (a three-point cross), the recombination can be calculated for each of the three pairs of genes. The data will look like this example (Strickberger, Problem 17-3):

Trait A is controlled by a gene with alleles A and a, A dominant to a

Trait B is controlled by a gene with alleles B and b, B dominant to b

Trait C is controlled by a gene with alleles C and c, C dominant to c

Testcross is AaBbCc x abc/abc

Data from a three-point cross of corn (colourless, shrunken, waxy) due to Stadler.

Progeny class   Phenotype   Count
1               A B C       17959
2               a b c       17699
3               A b c         509
4               a B C         524
5               A B c        4455
6               a b C        4654
7               A b C          20
8               a B c          12
Total tested                45832

The counts deviate drastically from the 1:1:1:1:1:1:1:1 ratio expected under independent assortment, so linkage is being observed.

ABC and abc are the two commonest phenotypes and are "reciprocal classes", so the heterozygous parent's phase was ABC/abc, rather than AbC/aBc, etc. Recombination between A and B is calculated from the marginal AB table, and similarly for the other pairs:

Marginal AB table created by collapsing across the two levels of C.
            A         a        Total
B       22414       536        22950
b         529     22353        22882
Total   22943     22889        45832

$$c_{AB} = 100 \times \frac{529 + 536}{45832} = 2.3\%$$

$$c_{BC} = 100 \times \frac{4455 + 12 + 4654 + 20}{45832} = 19.9\%$$

$$c_{AC} = 100 \times \frac{509 + 4455 + 524 + 4654}{45832} = 22.1\%$$

$$c_{AC} \approx c_{AB} + c_{BC}$$
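Collapsing the eight progeny classes into the three pairwise marginal tables can be automated: two loci have recombined in an offspring whenever their alleles are in a different combination from the parental ABC/abc phase. This is a sketch with hypothetical names; each phenotype is written as a three-letter string, which identifies the gamete from the heterozygote because the testcross parent is abc/abc.

```python
# Stadler's three-point testcross (progeny phenotype: count)
COUNTS = {
    "ABC": 17959, "abc": 17699,
    "Abc": 509,   "aBC": 524,
    "ABc": 4455,  "abC": 4654,
    "AbC": 20,    "aBc": 12,
}
LOCI = "ABC"

def recombinant(gamete, i, j):
    """True if loci i and j are in non-parental phase (the parent was ABC/abc)."""
    # Parental gametes carry both dominant or both recessive alleles at i and j
    return gamete[i].isupper() != gamete[j].isupper()

def pairwise_recombination(counts):
    """Percent recombination for every pair of loci."""
    total = sum(counts.values())
    result = {}
    for i in range(3):
        for j in range(i + 1, 3):
            k = sum(n for g, n in counts.items() if recombinant(g, i, j))
            result[LOCI[i] + LOCI[j]] = 100 * k / total
    return result

for pair, c in pairwise_recombination(COUNTS).items():
    print(f"c{pair} = {c:.1f}%")
```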

When similar experiments are carried out involving larger numbers of loci from the same linkage group, it becomes clear that the pairwise recombination percentages strongly suggest that the genes are arranged in a linear order, with recombination acting as a measure of the distance between them.

The linkage map that one constructs using recombination distance turns out to correspond to the physical map of genes along the linear structure of the chromosome. Recombination is the "phenotypic" effect of crossover or chiasma formation between homologous chromosomes, whereby they exchange segments of DNA.

Abbreviated linkage map of maize chromosome 9 (Brookhaven National Laboratory 1996).

Locus          Description                      Coord    Three-point cross locus
csu95a                                           0.00
c1             colored aleurone1               27.90    A
sh1            shrunken1                       31.60    B
bz1            bronze1                         35.20
wx1            waxy1                           55.30    C
acp1           acid phosphatase1               64.30
sus1           sucrose synthase1               75.40
hsp18a         18 kDa heat shock protein18a    78.00
csh2c(cdc2)                                   144.60

Positions on a linkage map are called loci. Since "gene" can refer either to the different forms of a gene (alleles) or to the factor controlling a phenotype, geneticists often refer to the latter as a locus, e.g. the locus sh1 rather than the gene sh1.

Double Crossovers

I have glossed over a few possibilities in the three-point cross in the last section. If all three markers are in the same linkage group, that is, on the same chromosome, then an ABC/abc heterozygote can undergo two recombination events, one between A and B and another between B and C, to give AbC/aBc. This is what has happened in progeny classes 7 and 8 of the earlier example. If we had looked only at a dihybrid testcross for A and C, we would not have been able to detect these double recombinants, since A and C remain in their parental combination.

Double recombinants are not very common, so their effect on the estimates of percent recombination is not large; put the other way round, most chromosomes experience zero or one recombination event between the loci being studied. Nevertheless, because double recombinants are not counted as recombinants between the more distant loci (A and C in the example), they add to our estimate of the distance between those loci. Trow's formula, with recombination expressed as fractions, states that

$$c_{AC} = c_{AB} + c_{BC} - 2\,c_{AB}\,c_{BC}$$

Interference

For given recombination fractions in two adjacent intervals, one can calculate the number of double recombinants one would expect if crossovers in the two intervals occurred independently. As a trivial example, imagine three loci, with adjacent loci each 10% recombination apart. Then we would expect a double recombinant (one recombination in each interval) in 10% x 10% = 1% of cases. The observed rate of double recombinants is usually less than this expected value. The term interference refers to the fact that recombination seems to be suppressed close to a first recombination event. The coincidence coefficient is the ratio of the observed number of double recombinants to the expected number, and interference is often quantified as 1 minus the coincidence coefficient.
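Applied to Stadler's data, the expected frequency of double recombinants without interference is c_AB x c_BC, while the observed frequency comes from progeny classes 7 and 8. This sketch (hypothetical names) computes the coincidence coefficient and the interference, taken as 1 minus the coincidence.

```python
# Stadler's three-point testcross; classes 7 and 8 (AbC, aBc) are double recombinants
TOTAL = 45832
AB_RECOMBINANTS = 509 + 524 + 20 + 12
BC_RECOMBINANTS = 4455 + 4654 + 20 + 12
DOUBLE_RECOMBINANTS = 20 + 12

def coincidence(c_ab, c_bc, observed_double):
    """Ratio of observed to expected double recombinants (expected = c_ab * c_bc)."""
    return observed_double / (c_ab * c_bc)

c_ab = AB_RECOMBINANTS / TOTAL
c_bc = BC_RECOMBINANTS / TOTAL
d_obs = DOUBLE_RECOMBINANTS / TOTAL
coc = coincidence(c_ab, c_bc, d_obs)
print(f"expected doubles: {c_ab * c_bc * TOTAL:.0f}, observed: {DOUBLE_RECOMBINANTS}")
print(f"coincidence = {coc:.2f}, interference = {1 - coc:.2f}")
```

Only about 32 double recombinants are seen where roughly 212 would be expected, so interference in this region is strong.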

Mapping functions

The presence of double recombination and interference means that recombination percentage is only roughly additive. A mapping function adjusts for one or both of these phenomena.

We have already seen the Morgan mapping function, x=c, where x is the distance in map units. This assumes complete interference.

The Haldane mapping function (with x in Morgans and c as a fraction) is:

$$x = -\tfrac{1}{2}\ln(1 - 2c), \qquad c = \tfrac{1}{2}\left(1 - e^{-2x}\right)$$

and adjusts for multiple recombination only, assuming no interference. Trow's formula (above) assumes the Haldane mapping function. The Kosambi mapping function also allows for interference:

$$x = \tfrac{1}{4}\ln\frac{1 + 2c}{1 - 2c}, \qquad c = \tfrac{1}{2}\,\frac{e^{4x} - 1}{e^{4x} + 1}$$

There are various problems with all of these mapping functions when applied to particular purposes. When c is small, however, x ≈ c.
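The mapping functions are easy to evaluate numerically. This sketch (hypothetical function names) implements the Haldane and Kosambi conversions in both directions, with x in Morgans and c as a fraction. It also shows that adding Haldane distances reproduces Trow's formula, and that all three functions agree closely when c is small.

```python
import math

def haldane_x(c):
    """Map distance (Morgans) from recombination fraction c, assuming no interference."""
    return -0.5 * math.log(1 - 2 * c)

def haldane_c(x):
    return 0.5 * (1 - math.exp(-2 * x))

def kosambi_x(c):
    """Kosambi map distance (Morgans), which allows for some interference."""
    return 0.25 * math.log((1 + 2 * c) / (1 - 2 * c))

def kosambi_c(x):
    return 0.5 * math.tanh(2 * x)  # same as 0.5 * (e^{4x} - 1) / (e^{4x} + 1)

for c in (0.01, 0.1, 0.2, 0.4):
    print(f"c = {c:.2f}: Morgan {c:.3f}, Haldane {haldane_x(c):.3f}, Kosambi {kosambi_x(c):.3f}")

# Haldane distances are additive, which reproduces Trow's formula
c_ab, c_bc = 0.1, 0.2
c_ac = haldane_c(haldane_x(c_ab) + haldane_x(c_bc))
print(f"Haldane c_AC = {c_ac:.4f}, Trow c_AC = {c_ab + c_bc - 2 * c_ab * c_bc:.4f}")
```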

An additional problem

All the loci described so far have two alleles. Many loci have more than two alleles, but the same Mendelian principles hold. In which of the following families has there been a mistake in genotyping at the codominant marker locus?

(a) Father's genotype A/B; mother's genotype A/C;

children's genotypes: A/A, A/B, A/C.

(b) Father's genotype A/B; mother's genotype A/D;

children's genotypes: A/A, B/B, B/D.

(c) Father's genotype A/B; mother's genotype unknown;

children's genotypes: A/A, A/C, A/D, B/C.

(d) Father's genotype A/B; mother's genotype unknown;

children's genotypes: A/A, A/C, A/B, B/C.

(e) Father's genotype unknown; mother's genotype unknown;

children's genotypes: A/A, A/B, B/C.

(f) Father's genotype unknown; mother's genotype unknown;

children's genotypes: A/B, B/C, C/D.

(g) Father's genotype unknown; mother's genotype unknown;

children's genotypes: A/B, B/B, B/C.
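One way to check answers to problems like this is a brute-force search: a family is consistent with Mendelian inheritance if some choice of parental genotypes lets every child receive one allele from each parent. The sketch below (hypothetical names) tries every genotype for an unknown parent that can be built from the alleles seen in the family. Alleles that no child carries need not be considered, because replacing an untransmitted allele with an observed one cannot make any child incompatible. It assumes all children share both parents and ignores mutation.

```python
from itertools import combinations_with_replacement, product

def parse(genotype):
    """'A/B' -> ('A', 'B')"""
    return tuple(genotype.split("/"))

def compatible(father, mother, child):
    """Can the child receive one allele from each parent?"""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

def consistent(father, mother, children):
    """True if some parental genotypes explain all children (None = unknown parent)."""
    kids = [parse(c) for c in children]
    known = [parse(p) for p in (father, mother) if p is not None]
    alleles = sorted({a for g in kids + known for a in g})
    candidates = list(combinations_with_replacement(alleles, 2))
    fathers = [parse(father)] if father is not None else candidates
    mothers = [parse(mother)] if mother is not None else candidates
    return any(all(compatible(f, m, k) for k in kids)
               for f, m in product(fathers, mothers))

# Family (a) from the problem above
print(consistent("A/B", "A/C", ["A/A", "A/B", "A/C"]))
```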