Most textbooks start with Mendel's work. This description follows Strickberger, Chapters 6-7, and Srb, Chapter 1.
Mendel, an Augustinian monk and teacher, had training in physics and mathematics, as well as natural sciences. In the 1850s, he studied one experimental organism -- the garden pea -- using a systematic experimental approach and performing a statistical analysis of his data.
Mendel chose seven binary traits for his analysis, such as whether the peas on the plant were wrinkled or smooth, yellow or green, the plant short or tall. In each case, he had varieties that bred true for each phenotype, that is, self-fertilization and propagation from the resulting seeds consistently gave rise to the same phenotype in the offspring. Each such variety is described as homozygous for the trait.
When plants from two different varieties were crossed (parental generation P1), all the resulting offspring of the first filial generation (F1) appeared to be of only one type. For example, crossing
P1: Wrinkled x Smooth
gives rise to the hybrid
F1: Smooth
When these F1 plants self-fertilized (or were fertilized by other F1 plants), both Wrinkled and Smooth plants were observed among the offspring in this F2 generation.
Critically, Mendel counted the seeds of each type in the F2 generation. In the Wrinkled x Smooth cross F2, there were 5474 Smooth and 1850 Wrinkled seeds (see the table below), a ratio of 2.96:1.
When Mendel self-fertilized F2 plants, the offspring (F3) of the Wrinkled plants were always Wrinkled. Of the 565 Smooth plants, 193 gave rise consistently to Smooth offspring, while the other 372 gave rise to a mixture of Wrinkled and Smooth.
Similarly, for the Yellow x Green F2, there were 6022 Yellow and 2001 Green seeds, a ratio of 3.01:1.
For all seven traits, the F2s exhibited ratios of 3:1. The F3s of those selfed Smooths or Yellows that did not breed true also exhibited ratios of 3:1.
Generation | Study | Smooth | Wrinkled | Percent (Wrinkled) |
F2 | Mendel | 5474 | 1850 | 25.2 |
F2 | Tschermak | 884 | 288 | 24.6 |
F2 | Bateson | 10793 | 3542 | 24.8 |
F2 | Hurst | 1335 | 420 | 23.9 |
F2 | Lock | 620 | 197 | 24.1 |
F3 | Tschermak | 2087 | 661 | 24.0 |
F3 | Lock | 769 | 259 | 25.2 |
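How close the F2 counts in the table come to 3:1 can be checked with a Pearson chi-square statistic. The following is a minimal sketch, not part of the original analysis; the function name `chi_square_3_to_1` is made up for illustration, and the comparison value 3.84 is the usual 5% critical value for one degree of freedom.

```python
# F2 counts (Smooth, Wrinkled) copied from the table above.
f2_counts = {
    "Mendel": (5474, 1850),
    "Tschermak": (884, 288),
    "Bateson": (10793, 3542),
    "Hurst": (1335, 420),
    "Lock": (620, 197),
}


def chi_square_3_to_1(smooth, wrinkled):
    """Pearson chi-square statistic against a 3:1 Smooth:Wrinkled expectation."""
    total = smooth + wrinkled
    observed = (smooth, wrinkled)
    expected = (0.75 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


for study, (smooth, wrinkled) in f2_counts.items():
    ratio = smooth / wrinkled
    stat = chi_square_3_to_1(smooth, wrinkled)
    # A statistic below 3.84 is compatible with 3:1 at the 5% level (1 df).
    print(f"{study:10s} ratio {ratio:.2f}:1  chi-square {stat:.2f}")
```

None of the five F2 data sets gives a statistic near 3.84, so none of them contradicts the 3:1 expectation.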
Mendel interpreted these consistent ratios as being due to the segregation of transmissible discrete factors that control the observed trait into the gametes (pollen and eggs). There seemed to be two such factors in each plant for each trait, and two types of factor in each case, one of which was dominant over the other factor. For example, the Smooth allelomorph or allele is dominant over the Wrinkled allele. The pair of alleles carried by each plant is its genotype. The single allele present in a gamete is the haplotype.
Let S be the Smooth allele, and s the Wrinkled allele.
Genotype | Phenotype |
SS | Smooth |
Ss | Smooth |
ss | Wrinkled |
In segregation, one allele from each parent is "chosen" at random, and passed in the gamete onto the offspring.
Paternal genotype = Ss Smooth; maternal genotype = Ss Smooth
Maternal gamete | Paternal S (50%) | Paternal s (50%) |
S (50%) | SS (25%) Smooth | Ss (25%) Smooth |
s (50%) | Ss (25%) Smooth | ss (25%) Wrinkled |
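The square above can be reproduced by enumerating every pairing of one paternal and one maternal allele. This is a minimal sketch under the stated model (each allele chosen with probability 1/2); the helper names `cross`, `phenotype` and `phenotype_ratios` are invented for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(paternal, maternal):
    """Genotype probabilities for the offspring of two single-locus genotypes."""
    offspring = Counter()
    for p_allele, m_allele in product(paternal, maternal):
        # Sorting puts the uppercase (dominant) allele first, so "sS" becomes "Ss".
        genotype = "".join(sorted(p_allele + m_allele))
        offspring[genotype] += Fraction(1, 4)
    return dict(offspring)


def phenotype(genotype):
    """Smooth is dominant: one S allele is enough."""
    return "Smooth" if "S" in genotype else "Wrinkled"


def phenotype_ratios(genotype_probs):
    """Collapse genotype probabilities into phenotype probabilities."""
    totals = Counter()
    for genotype, prob in genotype_probs.items():
        totals[phenotype(genotype)] += prob
    return dict(totals)


f2 = cross("Ss", "Ss")
print(f2)                    # genotype proportions 1:2:1
print(phenotype_ratios(f2))  # phenotype proportions 3:1
```

Using exact fractions rather than floats keeps the 1:2:1 and 3:1 proportions visible in the output.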
The Mendelian model can be tested by backcrossing an F1 hybrid to one of the P1 types.
Paternal genotype = Ss Smooth
Maternal genotype | Maternal gamete | Paternal S (50%) | Paternal s (50%) |
SS Smooth | S (50%) | SS (25%) Smooth | Ss (25%) Smooth |
SS Smooth | S (50%) | SS (25%) Smooth | Ss (25%) Smooth |
ss Wrinkled | s (50%) | Ss (25%) Smooth | ss (25%) Wrinkled |
ss Wrinkled | s (50%) | Ss (25%) Smooth | ss (25%) Wrinkled |
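The random "choice" of one allele per parent can also be imitated by simulation, which gives a feel for the sampling noise around the expected backcross ratios. The sketch below is illustrative only; `simulate_cross` and its seed argument are hypothetical names, and the offspring number is arbitrary.

```python
import random


def gamete(genotype, rng):
    """Pick one of the two alleles at random, as in Mendelian segregation."""
    return rng.choice(genotype)


def simulate_cross(paternal, maternal, n_offspring, seed=0):
    """Count Smooth and Wrinkled offspring from simulated fertilisations."""
    rng = random.Random(seed)
    counts = {"Smooth": 0, "Wrinkled": 0}
    for _ in range(n_offspring):
        alleles = gamete(paternal, rng) + gamete(maternal, rng)
        # Smooth (S) is dominant over Wrinkled (s).
        counts["Smooth" if "S" in alleles else "Wrinkled"] += 1
    return counts


print(simulate_cross("Ss", "SS", 10_000))  # backcross to the Smooth parent
print(simulate_cross("Ss", "ss", 10_000))  # backcross to the Wrinkled parent
```

The backcross to the Smooth parent can never produce a Wrinkled plant, while the backcross to the Wrinkled parent should give counts close to 1:1.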
The genotype to phenotype relationship (the gene penetrance) does not have to be completely dominant, as in all the examples Mendel chose. If one crosses frizzled chickens with normal chickens, the F1 generation are all slightly frizzled. The F2 generations are a mixture of frizzled, slightly frizzled, and normal in a ratio of 1:2:1. For example, in one experiment, Landauer and Dunn (1930) obtained 23 frizzled, 50 slightly frizzled, 20 normal.
Paternal genotype = Ff Slightly frizzled; maternal genotype = Ff Slightly frizzled
Maternal gamete | Paternal F (50%) | Paternal f (50%) |
F (50%) | FF (25%) Frizzled | Ff (25%) Sl. Frizzled |
f (50%) | Ff (25%) Sl. Frizzled | ff (25%) Normal |
Paternal genotype = Ff Slightly frizzled
Maternal genotype | Maternal gamete | Paternal F (50%) | Paternal f (50%) |
FF Frizzled | F (50%) | FF (25%) Frizzled | Ff (25%) Sl. Frizzled |
FF Frizzled | F (50%) | FF (25%) Frizzled | Ff (25%) Sl. Frizzled |
ff Normal | f (50%) | Ff (25%) Sl. Frizzled | ff (25%) Normal |
ff Normal | f (50%) | Ff (25%) Sl. Frizzled | ff (25%) Normal |
In this case of codominance, there is one phenotype for each genotype. As opposed to the case of dominance, we can reliably infer the genotype by examination of the organism. The marker genes used by modern geneticists are usually codominant.
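The difference between the two situations can be made concrete by inverting the genotype-to-phenotype tables. The sketch below uses the tables from the text; the function name `genotypes_by_phenotype` is made up for illustration. It also lists the counts expected under 1:2:1 for the 93 chicks of Landauer and Dunn.

```python
# Genotype-to-phenotype tables taken from the text.
pea = {"SS": "Smooth", "Ss": "Smooth", "ss": "Wrinkled"}
chicken = {"FF": "Frizzled", "Ff": "Slightly frizzled", "ff": "Normal"}


def genotypes_by_phenotype(table):
    """Invert a genotype -> phenotype table into phenotype -> list of genotypes."""
    inverse = {}
    for genotype, phenotype in table.items():
        inverse.setdefault(phenotype, []).append(genotype)
    return inverse


# With dominance, "Smooth" maps back to two genotypes (ambiguous).
print(genotypes_by_phenotype(pea))
# With one phenotype per genotype, every phenotype maps back to a single genotype.
print(genotypes_by_phenotype(chicken))

# Observed F2 counts (Landauer and Dunn, 1930) against the 1:2:1 expectation.
observed = {"Frizzled": 23, "Slightly frizzled": 50, "Normal": 20}
total = sum(observed.values())
expected = {"Frizzled": total / 4, "Slightly frizzled": total / 2, "Normal": total / 4}
for phenotype_name in observed:
    print(phenotype_name, observed[phenotype_name], expected[phenotype_name])
```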
Mendel also carried out experiments involving two traits simultaneously. For example, crossing,
P1: Smooth,Yellow x Wrinkled,Green
gave the dihybrid,
F1: Smooth,Yellow
which selfed gave,
 | Smooth | Wrinkled | Total |
Yellow | 315 | 101 | 416 |
Green | 108 | 32 | 140 |
Total | 423 | 133 | 556 |
He concluded that each trait was segregating independently from the other, giving a 9:3:3:1 joint ratio, and 3:1 marginal ratios for each trait.
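Under independent segregation the joint proportions are simply the products of the two 3:1 marginals, which is where 9:3:3:1 comes from. The following sketch computes the expected counts for Mendel's 556 seeds and a chi-square statistic; the variable names are illustrative, and 7.81 is the usual 5% critical value for three degrees of freedom.

```python
from fractions import Fraction

# Marginal 3:1 probabilities for each trait.
shape = {"Smooth": Fraction(3, 4), "Wrinkled": Fraction(1, 4)}
colour = {"Yellow": Fraction(3, 4), "Green": Fraction(1, 4)}

# Observed dihybrid F2 counts from the table above.
observed = {
    ("Smooth", "Yellow"): 315,
    ("Wrinkled", "Yellow"): 101,
    ("Smooth", "Green"): 108,
    ("Wrinkled", "Green"): 32,
}
total = sum(observed.values())

# Independence: joint probability = product of the marginals (9/16, 3/16, 3/16, 1/16).
expected = {
    (s, c): float(p_s * p_c * total)
    for s, p_s in shape.items()
    for c, p_c in colour.items()
}

chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

for key in observed:
    print(key, observed[key], expected[key])
print(f"chi-square = {chi_square:.2f} (compare with 7.81 for 3 df)")
```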
We can perform backcrosses as before. This dihybrid backcross is crucial to the understanding and detection of linkage. Srb gives data for two traits in the potato plant: tall v. dwarf, and cut leaf v. potato cut leaf.
Tall is dominant to dwarf, and cut dominant to potato cut, so the Tall,Cut x Dwarf,Potato F1 is Tall,Cut. The backcross with the Dwarf,Potato parental line gave
 | Tall | Dwarf | Total |
Cut | 77 | 72 | 149 |
Potato | 62 | 73 | 135 |
Total | 139 | 145 | 284 |
Only the backcross with the double-recessive parental line gives this nice 1:1:1:1 ratio, which directly reflects the underlying genotypes, so this is denoted the testcross.
Paternal genotype = ddcc Dwarf,Potato; maternal genotype = DdCc Tall,Cut
Maternal gamete | Paternal dc (100%) |
DC (25%) | DdCc Tall,Cut (25%) |
Dc (25%) | Ddcc Tall,Potato (25%) |
dC (25%) | ddCc Dwarf,Cut (25%) |
dc (25%) | ddcc Dwarf,Potato (25%) |
It was not until 1900 that Mendel's work was replicated, and then rediscovered. Shortly after this, numerous exceptions to Mendel's second law were observed. These were not fully understood until Morgan. Srb gives a (later) example:
In dihybrid testcrosses for frizzle and white in chickens, Hutt (1931) obtained:
Frizzled is dominant over normal (if one combines slightly and extremely frizzled).
White is dominant over coloured.
P1: White,Normal x Coloured,Frizzle
F1: White,Frizzle
Testcross: White,Frizzle (F1) x Coloured,Normal
 | White | Coloured | Total |
Frizzled | 18 | 63 | 81 |
Normal | 63 | 13 | 76 |
Total | 81 | 76 | 157 |
Note the marginal counts are in the 1:1 ratio we expect, but there is deviation in the main table from 1:1:1:1. This deviation is due to linkage between the two genes. The percent recombination is 100*(18+13)/157 = 19.7%. Under independent assortment the percent recombination should be 50%.
After mating another set of chickens of exactly the same genotypes however, the following counts were made,
 | White | Coloured | Total |
Frizzled | 15 | 2 | 17 |
Normal | 4 | 12 | 16 |
Total | 19 | 14 | 33 |
In the first testcross, the Frizzled and Coloured phenotypes seemed to cosegregate, but the reverse is seen in the second cross. This is what is referred to as repulsion of the dominant traits (frizzled and white) in the first case, and coupling in the second. The percentage deviation from 1:1:1:1 seems to be about the same in each table, but in opposite directions. Actually, we always ignore the sign, and calculate the recombination in this table as 100*(4+2)/33=18.2%.
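Both calculations follow the same recipe: the rarer pair of reciprocal classes is taken as the recombinants, whichever diagonal it lies on. A minimal sketch follows; the function name is hypothetical, and it assumes the true recombination is below 50% so that the rarer pair really is the recombinant one.

```python
def recombination_percent(table):
    """Percent recombination from a 2x2 table of dihybrid testcross counts.

    table = [[a, b], [c, d]], rows for one trait and columns for the other.
    """
    (a, b), (c, d) = table
    diagonal = a + d
    off_diagonal = b + c
    # Assumption: recombinants are the less frequent pair of reciprocal classes,
    # which handles coupling and repulsion alike ("ignoring the sign").
    recombinants = min(diagonal, off_diagonal)
    return 100.0 * recombinants / (a + b + c + d)


first_cross = [[18, 63], [63, 13]]   # repulsion: recombinants on the diagonal
second_cross = [[15, 2], [4, 12]]    # coupling: recombinants off the diagonal

print(f"{recombination_percent(first_cross):.1f}%")
print(f"{recombination_percent(second_cross):.1f}%")
```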
If one examines a large number of genes in such a fashion in any organism, the genes fall into sets (linkage groups): genes within a set are linked to one another, while assorting independently (recombination 50%) with respect to members of other linkage groups. It was realised in the 1920s that each linkage group corresponds to a chromosome.
The mechanism underlying linkage in the examples looks like this:
P1: White,Normal x Coloured,Frizzle
    I f      i F
    ----  x  ----
    I f      i F
emit sperm I f and egg i F
F1: White,Frizzle
    I f
    ----
    i F
Testcross: White,Frizzle (F1) x Coloured,Normal
    I f      i f
    ----  x  ----
    i F      i f
emit eggs I F, i F, I f, i f, and sperm i f
Now, if white and frizzle were unlinked, we would expect each egg type to be equally frequent (25%). One way to give rise to the observed frequencies would be if the pairs above and below the bar stick together, only recombining a smaller amount than the 50% we would expect under independence.
That is,
I f --> I F c% of occasions
i F --> i f c% of occasions
So, if c is zero, then we would only see Iiff (white, normal) and iiFf (coloured, frizzled) offspring from this testcross. There is in fact zero recombination in male Drosophila flies within linkage groups, and free (50%) recombination between linkage groups.
Exactly the same genotype could be arranged in a different original fashion (different phase):
    I F      i f
    ----  x  ----
    i f      i f
If c were zero in this case, we would only see IiFf (white, frizzled) and iiff (coloured, normal) offspring. These two arrangements correspond to coupling and repulsion of the observed phenotypes.
Coupling    Repulsion
  I F         I f
  ----        ----
  i f         i F
Based on our definition of c, the gametic frequencies are:
Gametic frequencies
Phase | IF | If | iF | if |
IF/if (coupling) | (1-c)/2 | c/2 | c/2 | (1-c)/2 |
If/iF (repulsion) | c/2 | (1-c)/2 | (1-c)/2 | c/2 |
If c is 50%, then this gives 25% for each haplotype as before. Furthermore, it justifies the estimator for c given above.
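The table translates directly into a small function. This is a sketch in which c is given as a fraction (0 to 0.5) rather than a percentage; the function name and the phase labels passed as strings are choices made for the example.

```python
def gametic_frequencies(c, phase):
    """Gamete frequencies from a double heterozygote with recombination fraction c."""
    parental = (1 - c) / 2
    recombinant = c / 2
    if phase == "coupling":    # IF/if
        return {"IF": parental, "If": recombinant, "iF": recombinant, "if": parental}
    if phase == "repulsion":   # If/iF
        return {"IF": recombinant, "If": parental, "iF": parental, "if": recombinant}
    raise ValueError("phase must be 'coupling' or 'repulsion'")


# Free recombination: every haplotype at 25%, whatever the phase.
print(gametic_frequencies(0.5, "coupling"))
# Tight linkage in repulsion: only If and iF gametes are produced.
print(gametic_frequencies(0.0, "repulsion"))

# The two recombinant classes together have frequency c, which is why the
# proportion of recombinant testcross offspring estimates c.
freqs = gametic_frequencies(0.2, "coupling")
print(freqs["If"] + freqs["iF"])
```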
We can estimate c from designs other than the testcross, although they are slightly to considerably more complicated. In humans, of course, test crosses are difficult to arrange, so sophisticated methods of analysis are mandatory.
If one can arrange testcrosses for triple (or higher order) heterozygotes and recessives (a three-point cross), the recombination can be calculated for the three pairs of genes. The data will look like this example (Strickberger, Problem 17-3):
Trait A is controlled by a gene with alleles A and a, A dominant to a
Trait B is controlled by a gene with alleles B and b, B dominant to b
Trait C is controlled by a gene with alleles C and c, C dominant to c
Testcross is AaBbCc x abc/abc
Data from three-point cross of corn (colourless, shrunken, waxy) due to Stadler.
Class | Progeny phenotype | Count |
1 | A B C | 17959 |
2 | a b c | 17699 |
3 | A b c | 509 |
4 | a B C | 524 |
5 | A B c | 4455 |
6 | a b C | 4654 |
7 | A b C | 20 |
8 | a B c | 12 |
Total tested | | 45832 |
The table deviates drastically from the expected 1:1:1:1:1:1:1:1, so linkage is being observed.
ABC and abc are the two commonest phenotypes, and are "reciprocal classes", so the heterozygote parent's phase was ABC/abc, rather than AbC/aBc etc. Recombination events between A and B are calculated from the marginal AB table and so forth,
 | A | a | Total |
B | 22414 | 536 | 22950 |
b | 529 | 22353 | 22882 |
Total | 22943 | 22889 | 45832 |
$c_{AB} = 100 \times (529+536)/45832 = 2.3\%$
$c_{BC} = 100 \times (4455+12+4654+20)/45832 = 19.9\%$
$c_{AC} = 100 \times (509+4455+524+4654)/45832 = 22.1\%$
so that $c_{AC} \approx c_{AB} + c_{BC}$.
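The three pairwise values can be computed straight from the eight class counts, without building each marginal table by hand. The sketch below assumes the parental phase ABC/abc inferred above, so a progeny class is recombinant for a pair of loci when exactly one of the two shows the dominant (uppercase) form; the function name is illustrative.

```python
# Three-point testcross counts from the table above.
counts = {
    "ABC": 17959, "abc": 17699,
    "Abc": 509, "aBC": 524,
    "ABc": 4455, "abC": 4654,
    "AbC": 20, "aBc": 12,
}


def pairwise_recombination_percent(counts, i, j):
    """Percent recombination between loci i and j (0 = A, 1 = B, 2 = C).

    Assumes the heterozygous parent had phase ABC/abc.
    """
    total = sum(counts.values())
    recombinants = sum(
        n for phenotype, n in counts.items()
        if phenotype[i].isupper() != phenotype[j].isupper()
    )
    return 100.0 * recombinants / total


for name, (i, j) in {"AB": (0, 1), "BC": (1, 2), "AC": (0, 2)}.items():
    print(name, round(pairwise_recombination_percent(counts, i, j), 1))
```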
When similar experiments are carried out involving larger numbers of loci from the same linkage group, it becomes obvious that the set of pairwise recombination percentages suggest strongly that the genes are ordered in a linear fashion, with recombination acting as the distance between them.
The linkage map that one constructs using recombination distance turns out to correspond to the physical map of genes along the linear structure of the chromosome. Recombination is the "phenotypic" effect of crossover or chiasma formation between homologous chromosomes, whereby they exchange segments of DNA.
Locus | Coord | Locus in three-point cross |
---|---|---|
csu95a | 0.00 | |
c1 colored aleurone1 | 27.90 | A |
sh1 shrunken1 | 31.60 | B |
bz1 bronze1 | 35.20 | |
wx1 waxy1 | 55.30 | C |
acp1 acid phosphatase1 | 64.30 | |
sus1 sucrose synthase1 | 75.40 | |
hsp18a 18 kda heat shock protein18a | 78.00 | |
csh2c(cdc2) | 144.60 | |
Positions on a linkage map are loci. Since "gene" can be taken to mean the different gene forms (alleles), or the factor controlling a phenotype, geneticists often refer to the latter as the locus sh1, rather than the gene sh1.
I have glossed over a few possibilities in the three-point cross in the last section. If all three markers are in the same linkage group, that is, on the same chromosome, then we can observe an ABC/abc undergoing two recombination events, one between A and B, and another between B and C, to give AbC/aBc. This is what is going on in cells 7 and 8 of the earlier example. If we had been looking only at dihybrid test cross data, then we would not be able to detect these double recombinants.
One notices that double recombinants are not very common, so their effect on the estimates of the percent recombination is not large. The corollary of this is that most chromosomes will experience only zero or one recombination events. The estimated double recombination rate does add to our estimate of the distance between the more distant loci (A and C in the example). Trow's formula states that
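$c_{AC} = c_{AB} + c_{BC} - 2\,c_{AB}\,c_{BC}$
where each recombination value is expressed as a fraction rather than a percentage.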
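As a rough check, Trow's formula can be applied to the corn example, assuming the gene order A, B, C and using recombination fractions rather than percentages. The sketch below also shows the exact bookkeeping identity, in which the product term is replaced by the observed double recombinant fraction; the names `trow`, `c_ab`, `c_bc` and `c_ac` are illustrative.

```python
def trow(c_ab, c_bc):
    """Recombination fraction between the outer loci when crossovers in the
    two intervals occur independently (no interference)."""
    return c_ab + c_bc - 2 * c_ab * c_bc


total = 45832
c_ab = (509 + 524 + 20 + 12) / total        # classes 3, 4, 7, 8
c_bc = (4455 + 4654 + 20 + 12) / total      # classes 5, 6, 7, 8
c_ac = (509 + 524 + 4455 + 4654) / total    # classes 3, 4, 5, 6
doubles = (20 + 12) / total                 # classes 7, 8

print(f"observed c_AC          {c_ac:.4f}")
print(f"Trow's prediction      {trow(c_ab, c_bc):.4f}")
# Exact identity: each double recombinant is counted in both intervals
# but is not a recombinant between A and C.
print(f"c_AB + c_BC - 2*doubles {c_ab + c_bc - 2 * doubles:.4f}")
```

The prediction falls a little below the observed value because fewer double recombinants occurred than independence would imply.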
Interference
For a given distance between two loci, one can estimate the number of double recombinants that one would expect. At a trivial level, imagine three loci, each 10% recombination distance apart. Then we would expect in 1% of cases that a double recombinant would occur (one in each interval). The rate of double recombinants is usually less than this expected value. The term interference refers to the fact that recombination seems to be suppressed close to a first recombination event. The coincidence coefficient is the ratio of the observed number of double recombinants to the expected number.
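For the corn three-point cross described earlier, the coincidence coefficient can be worked out as follows. This is a sketch assuming the gene order A, B, C and the class counts from that example; the quantity `interference`, taken here as one minus the coincidence, is a commonly used summary that the text itself does not define.

```python
# Three-point testcross counts from the corn example.
counts = {
    "ABC": 17959, "abc": 17699,
    "Abc": 509, "aBC": 524,
    "ABc": 4455, "abC": 4654,
    "AbC": 20, "aBc": 12,
}
total = sum(counts.values())

# Recombination fractions for the two adjacent intervals.
c_ab = (counts["Abc"] + counts["aBC"] + counts["AbC"] + counts["aBc"]) / total
c_bc = (counts["ABc"] + counts["abC"] + counts["AbC"] + counts["aBc"]) / total

observed_doubles = counts["AbC"] + counts["aBc"]
# Expected number if recombination in the two intervals were independent.
expected_doubles = c_ab * c_bc * total

coincidence = observed_doubles / expected_doubles
interference = 1 - coincidence  # assumption: the usual 1 - coincidence measure

print(f"observed doubles {observed_doubles}, expected {expected_doubles:.1f}")
print(f"coincidence {coincidence:.2f}, interference {interference:.2f}")
```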
The presence of double recombination and interference means that recombination percentage is only roughly additive. A mapping function adjusts for one or both of these phenomena.
We have already seen the Morgan mapping function, x=c, where x is the distance in map units. This assumes complete interference.
The Haldane mapping function is:
$x = -\tfrac{1}{2}\ln(1-2c)$
$c = \tfrac{1}{2}\left(1-e^{-2x}\right)$
and adjusts for double recombination only. Trow's formula (above) assumes the Haldane mapping function. The Kosambi mapping function also allows for interference,
$x = \tfrac{1}{4}\ln\left[(1+2c)/(1-2c)\right]$
$c = \tfrac{1}{2}\,(e^{4x}-1)/(e^{4x}+1)$
There are various problems with both these mapping functions when applied to particular purposes. When c is small, however, $x \approx c$.
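The two mapping functions and their inverses are easy to compare numerically. The sketch below takes c as a fraction (below 0.5) and x in Morgans, uses the standard Haldane form x = -0.5 ln(1-2c), and writes the Kosambi inverse as 0.5 tanh(2x), which is algebraically the same as 0.5 (e^(4x) - 1)/(e^(4x) + 1). The function names are invented for the example.

```python
import math


def haldane_distance(c):
    """Map distance x (Morgans) from recombination fraction c, no interference."""
    return -0.5 * math.log(1 - 2 * c)


def haldane_recombination(x):
    """Recombination fraction c from map distance x (Morgans)."""
    return 0.5 * (1 - math.exp(-2 * x))


def kosambi_distance(c):
    """Map distance x (Morgans) from recombination fraction c, with interference."""
    return 0.25 * math.log((1 + 2 * c) / (1 - 2 * c))


def kosambi_recombination(x):
    """Recombination fraction c from map distance x (Morgans)."""
    return 0.5 * math.tanh(2 * x)


for c in (0.01, 0.10, 0.20, 0.40):
    print(f"c = {c:.2f}  Morgan {c:.4f}  "
          f"Haldane {haldane_distance(c):.4f}  Kosambi {kosambi_distance(c):.4f}")
```

For small c all three distances nearly coincide; as c approaches 0.5 the Haldane and Kosambi distances grow without bound while the Morgan distance cannot exceed 0.5.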
All the loci described so far have two alleles. Many systems have more than two alleles, but the same Mendelian principles hold. In which of the following families has there been a mistake in genotyping at the codominant marker locus?
(a) Father's genotype A/B; mother's genotype A/C;
children's genotypes: A/A, A/B, A/C.
(b) Father's genotype A/B; mother's genotype A/D;
children's genotypes: A/A, B/B, B/D.
(c) Father's genotype A/B; mother's genotype unknown;
children's genotypes: A/A, A/C, A/D, B/C.
(d) Father's genotype A/B; mother's genotype unknown;
children's genotypes: A/A, A/C, A/B, B/C.
(e) Father's genotype unknown; mother's genotype unknown;
children's genotypes: A/A, A/B, B/C.
(f) Father's genotype unknown; mother's genotype unknown;
children's genotypes: A/B, B/C, C/D.
(g) Father's genotype unknown; mother's genotype unknown;
children's genotypes: A/B, B/B, B/C.
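Answers to exercises of this kind can be checked mechanically: a family is consistent if some pair of parental genotypes lets every child take one allele from each parent. The following brute-force sketch writes genotypes as two-letter strings such as "AB"; the function names are hypothetical, and unknown parents are searched only over alleles seen in the family, since an allele carried by no child can never be needed to explain one.

```python
from itertools import combinations_with_replacement, product


def compatible(child, father, mother):
    """True if the child could have received one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def consistent(children, father=None, mother=None):
    """True if some parental genotypes explain every child.

    A parent given as None is unknown and every candidate genotype is tried.
    """
    alleles = sorted(
        {allele for genotype in children for allele in genotype}
        | set(father or "")
        | set(mother or "")
    )
    candidates = ["".join(pair) for pair in combinations_with_replacement(alleles, 2)]
    fathers = [father] if father else candidates
    mothers = [mother] if mother else candidates
    return any(
        all(compatible(child, f, m) for child in children)
        for f, m in product(fathers, mothers)
    )


# Example call for family (a); the other families can be tested the same way.
print(consistent(["AA", "AB", "AC"], father="AB", mother="AC"))
```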