

This LaTeX document is available as postscript or as Adobe PDF.

General Genetics

Much of the older literature on quantitative genetics was based on the knowledge available at the time. A gene was known to control a particular chemical function or reaction, and a gene was a gene. The exact number of genes was not known, but was thought to be very large. Genes were known to be composed of DNA, but most people probably assumed that genes were equal in length, and perhaps evenly spaced on the chromosomes.

Today, much more is known about the molecular aspects of genes and chromosomes. The human genome has been mapped, and regions of DNA are known to be shared between species. DNA has two complementary strands which are linear arrangements of nucleotides (A = adenine, T = thymine, G = guanine, and C = cytosine). A base pair is A-T or G-C, and there are roughly 3 billion base pairs in a haploid mammalian genome (about 6 billion in a diploid cell). Sets of three consecutive nucleotides (triplets) code for an amino acid. There are start and stop triplets. Sequences of amino acids make up a polypeptide, which in turn forms an enzyme or other protein. A gene is now defined as all of the nucleotides that translate into a polypeptide.

A gene contains a promoter region (where transcription of the gene is initiated), a number of exon regions (expressed regions), and a number of intron regions (intervening regions) which appear to be extraneous DNA that does not code for anything. A gene can be 1,000 to 2 million base pairs in length, with the average gene being 100,000 base pairs. Different forms of the gene (differences in base pairs within the specific location) are known as alleles. If the base pair difference occurs in an intron region, then there may not be any difference in gene function, but differences in the exon and promoter regions could result in the production of a different polypeptide or lack of production of the usual polypeptide.

Much of quantitative genetics theory assumes genes have only two possible alleles. However, judging from the genes controlling blood antigens and blood proteins in dairy cattle, it is likely that many genes have several alleles. Some alleles are more prevalent than others. Some alleles are lethal if they occur together, such as in the case of white coat colour in horses (Ww = white colour, ww = non-white colour, and WW = lethal).

Blood Antigens        Blood Proteins
Locus    Alleles      Protein                  Locus           Alleles
A        10           Hemoglobin               Hb              5
B        >300         Albumin                  Al              3
C        >35          Post-albumin             Pa              2
FV       4            Transferrin              Tf              7
J        >4           $\alpha^{2}$ globulin    $S_{\alpha}$    2
L        2            Alkaline phos.           F               2
M        3            Amylase                  Am              3
S        >10          Milk Proteins
R'-S'    2            $\beta$ lactoglobulin    Lg              6
Z        3            $\alpha$ lactalbumin     $\alpha-$La     1
                      $\beta$ casein           $\beta-$Cn      7
                      $\alpha$ casein          $\alpha-$Cn     4
                      $\kappa$ casein          $\kappa-$Cn     2
                      $\gamma$ casein          $\gamma-$Cn     4

The site where a gene is located in the genome is known as a locus. In humans, there are at least 547 genes that control the functioning of the eye, and over 1,200 involved with the heart. Thus, many genes are involved in controlling each trait that we observe in most livestock species. Genes also interact with each other, and so if we try to change one gene to have a particular allele, then we will be changing the effects of other genes that depended on the gene we have changed.

The total number of genes in mammals is now estimated to be around 30,000 to 50,000. Work is underway to determine which polypeptides are produced by each gene, which organs or functions these polypeptides regulate, and where the genes are located.

The following notes start with a single locus, a gene having two alleles. As you read, notice that there is a population perspective about a locus, and also an individual perspective. Later, the notes progress to considering two loci simultaneously, and from there make a big jump to an infinite number of loci. Keep this progression in mind.

Single Locus Genetic Model
L. R. Schaeffer, March 1999

Acknowledgement: These notes are largely based upon material prepared by Dr. Brian Kennedy, which he used in his Quantitative Genetics course. In turn, his notes were based on many sources of information too.

1. Introduction

An understanding of genetic improvement of animals begins with an understanding of inheritance at a single locus. Even the action of a single locus is being further divided by molecular geneticists into sub-sections, with sites for turning the locus on or off, or controlling the activity of that locus. However, the molecular level of genetics will not be covered. Only diploid species will be considered.

2. Hardy-Weinberg Equilibrium

The conditions under which Hardy-Weinberg Equilibrium applies are

1. Large population size,
2. Random mating population,
3. No selection of animals,
4. No migration of animals, and
5. No mutation of genetic material.

Assume only two alleles at locus A, A1 and A2, with gene frequencies p and q, respectively. Under the above conditions, the gene frequencies and genotypic frequencies are constant from generation to generation. Genotypic frequencies are derived from gametic frequencies, which arise from gene frequencies. Assuming the gene frequencies are the same in males and females, the following table gives the genotypic frequencies in offspring (which represent the general population).

                           Males
                      A1           A2
                      p            q
Females   A1   p   |  A1A1         A1A2
                   |  $p^{2}$      $pq$
                   |
          A2   q   |  A2A1         A2A2
                   |  $pq$         $q^{2}$

The results can be summarized as

Genotype    Frequency
A1A1        $p^{2}$
A1A2        $2pq$
A2A2        $q^{2}$

Note that the frequencies of genotypes come from expanding $(p+q)^{2} = p^{2} + 2pq + q^{2}$. The frequency of the A1A1 genotype is greatest when p=1, and the frequency of the A1A2 genotype is greatest when p=q=0.5.

In general, with n alleles, $A_{i}$, with frequency $p_{i}$, the genotypic frequency of $A_{i}A_{j}$ in a population under Hardy-Weinberg equilibrium will be $2p_{i}p_{j}$ for $i \neq j$, and $p_{i}^{2}$ for $i=j$.
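The general rule can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function name `hardy_weinberg` and the example frequencies are illustrative assumptions.

```python
from itertools import combinations_with_replacement


def hardy_weinberg(freqs):
    """Return {(i, j): frequency} for unordered genotypes A_i A_j with i <= j.

    freqs holds the allele frequencies p_1 ... p_n (0-based indices here).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        # Homozygotes have frequency p_i^2, heterozygotes 2 p_i p_j.
        if i == j:
            genotypes[(i, j)] = freqs[i] ** 2
        else:
            genotypes[(i, j)] = 2 * freqs[i] * freqs[j]
    return genotypes


# Illustrative (assumed) frequencies for two and three alleles.
two_alleles = hardy_weinberg([0.3, 0.7])
three_alleles = hardy_weinberg([0.5, 0.3, 0.2])
print(two_alleles)
print(sum(three_alleles.values()))
```

Whatever the number of alleles, the genotypic frequencies should sum to one, which is a quick sanity check on any set of input frequencies.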

2.1 Sex Differences in Frequencies

If the gene frequencies are different in the sexes, say $p_{i}^{m}$ is the frequency of the $A_{i}$ allele in males, and $p_{i}^{f}$ is the frequency of the $A_{i}$ allele in females, then the genotypic frequency of $A_{i}A_{j}$ in the next generation of offspring will be $p_{i}^{m}p_{j}^{f} + p_{j}^{m}p_{i}^{f}$, for $i \neq j$. To illustrate, assume n=2, $p_{1}^{m}=0.2$, $p_{2}^{m}=0.8$, $p_{1}^{f}=0.4$, and $p_{2}^{f}=0.6$, then the genotypic outcome is

                            Males
                       A1          A2
                       0.2         0.8
Females   A1   0.4  |  A1A1        A1A2
                    |  0.08        0.32
                    |
          A2   0.6  |  A2A1        A2A2
                    |  0.12        0.48

The frequency of the A1 allele in the offspring is

\begin{eqnarray*}p_{1} & = & 0.08 + 0.5*(0.32 + 0.12) \\ & = & 0.30 \\
& = & 0.5*p_{1}^{m} + 0.5*p_{1}^{f}.
\end{eqnarray*}


If the offspring are allowed to mate randomly, then the next generation will be in Hardy-Weinberg equilibrium, with the same frequency of alleles for both sexes.
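The numerical example of this section can be reproduced with a short Python sketch. The function names `offspring_genotypes` and `allele_freqs` are hypothetical helpers introduced only for illustration.

```python
def offspring_genotypes(p_male, p_female):
    """Genotypic frequencies in offspring when allele frequencies differ by sex.

    p_male[i] and p_female[i] are the frequencies of allele A_i in each sex.
    """
    n = len(p_male)
    geno = {}
    for i in range(n):
        for j in range(i, n):
            if i == j:
                geno[(i, j)] = p_male[i] * p_female[i]
            else:
                # Either parent may contribute either allele of the pair.
                geno[(i, j)] = p_male[i] * p_female[j] + p_male[j] * p_female[i]
    return geno


def allele_freqs(geno, n):
    """Allele frequencies implied by a set of genotypic frequencies."""
    freqs = [0.0] * n
    for (i, j), f in geno.items():
        # Each genotype carries two alleles, each counted with weight 0.5.
        freqs[i] += 0.5 * f
        freqs[j] += 0.5 * f
    return freqs


geno = offspring_genotypes([0.2, 0.8], [0.4, 0.6])
p_offspring = allele_freqs(geno, 2)
print(geno)
print(p_offspring)
```

With the frequencies used in the text, the heterozygote frequency is 0.32 + 0.12 = 0.44 and the frequency of A1 in the offspring is the simple average of the two parental frequencies.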

2.2 Sex-Linked Locus

With a sex-linked locus, the male usually carries the Y-chromosome, which is considered to be genetically inactive at that locus. Assuming n=2 alleles, males will be genotypically either A1- with frequency $p_{1}^{m}(t)$, or A2- with frequency $p_{2}^{m}(t)$. Females, on the other hand, will be $A_{i}A_{j}$ with frequency $p_{i}^{f}(t)p_{j}^{f}(t)$ for i and j going from 1 to 2. The argument t refers to generation number.

As an example, suppose the frequencies of alleles in males and females are unequal in generation 0. Let $p_{1}^{m}(0) = 0.2$ and $p_{1}^{f}(0) = 0.4$; then the genotypic frequencies in the progeny will be as follows:

                                 Males
                       -           A1          A2
                       0.5         0.1         0.4
Females   A1   0.4  |  A1-         A1A1        A1A2
                    |  0.2         0.04        0.16
                    |
          A2   0.6  |  A2-         A2A1        A2A2
                    |  0.3         0.06        0.24

The frequency of A1 allele in the male progeny is equal to

\begin{eqnarray*}p_{1}^{m}(1) & = & 0.2 / 0.5 \\
& = & 0.4 \\
& = & p_{1}^{f}(0)
\end{eqnarray*}


and for the female progeny is equal to

\begin{eqnarray*}p_{1}^{f}(1) & = & (0.04 + 0.5*(0.16 + 0.06)) / 0.5 \\
& = & 0.30 \\
& = & 0.5* (p_{1}^{m}(0) + p_{1}^{f}(0) )
\end{eqnarray*}


The average frequency of A1 across males and females in generation 1 is

\begin{eqnarray*}p_{1}(1) & = & \frac{1}{3} p_{1}^{m}(1) + \frac{2}{3} p_{1}^{f}(1) \\
& = & \frac{1}{3} (0.4) + \frac{2}{3} (0.3) \\
& = & \frac{1}{3}
\end{eqnarray*}


In general,

\begin{eqnarray*}p_{1}^{m}(t+1) & = & p_{1}^{f}(t) \\
p_{1}^{f}(t+1) & = & \frac{1}{2} (p_{1}^{m}(t) + p_{1}^{f}(t) )
\end{eqnarray*}


and $p_{1}(t)$ is constant across all generations. To prove this, note that

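\begin{eqnarray*}p_{i}(t+1) & = & \frac{1}{3} p_{i}^{m}(t+1) + \frac{2}{3} p_{i}^{f}(t+1) \\
& = & \frac{1}{3} p_{i}^{f}(t) + \frac{2}{3} \cdot \frac{1}{2} ( p_{i}^{m}(t) + p_{i}^{f}(t) ) \\
& = & \frac{1}{3} p_{i}^{m}(t) + \frac{2}{3} p_{i}^{f}(t) \\
& = & p_{i}(t)
\end{eqnarray*}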


As the number of generations of random mating increases, the allele frequencies in males and females converge to a common value, and the population approaches Hardy-Weinberg equilibrium. Let $d_{i}$ be the initial difference in allele frequencies between males and females, and let $p_{i}$ denote the constant pooled frequency $p_{i}(t)$. Then for t=1,

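\begin{eqnarray*}d_{i} & = & p_{i}^{m}(0) - p_{i}^{f}(0) \\
& & \\
p_{i}^{m}(1) & = & p_{i}^{f}(0) \\
& = & \frac{1}{3} p_{i}^{m}(0) + \frac{2}{3} p_{i}^{f}(0) - \frac{1}{3} [ p_{i}^{m}(0) - p_{i}^{f}(0) ] \\
& = & p_{i}(0) - \frac{1}{3} d_{i} \\
& = & p_{i} - \frac{1}{3} d_{i}.
\end{eqnarray*}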


and similarly, for females,

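\begin{eqnarray*}p_{i}^{f}(1) & = & \frac{1}{2} p_{i}^{m}(0) + \frac{1}{2} p_{i}^{f}(0) \\
& = & \frac{1}{3} p_{i}^{m}(0) + \frac{2}{3} p_{i}^{f}(0) + \frac{1}{6} [ p_{i}^{m}(0) - p_{i}^{f}(0) ] \\
& = & p_{i}(0) + \frac{1}{6} d_{i} \\
& = & p_{i} + \frac{1}{6} d_{i}.
\end{eqnarray*}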


For t=2,

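\begin{eqnarray*}p_{i}^{m}(2) & = & p_{i}^{f}(1) \\
& = & p_{i} + \frac{1}{6} d_{i} \\
p_{i}^{f}(2) & = & \frac{1}{2} ( p_{i}^{m}(1) + p_{i}^{f}(1) ) \\
& = & \frac{1}{2} ( p_{i} - \frac{1}{3} d_{i} + p_{i} + \frac{1}{6} d_{i}) \\
& = & p_{i} - \frac{1}{12} d_{i}.
\end{eqnarray*}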


For general t,

\begin{eqnarray*}p_{i}^{m}(t) & = & p_{i} + \frac{2}{3} ( - \frac{1}{2})^{t} d_{i} \\
p_{i}^{f}(t) & = & p_{i} - \frac{1}{3} ( - \frac{1}{2})^{t} d_{i}
\end{eqnarray*}


and as $t \rightarrow \infty$ then

\begin{eqnarray*}p_{i}^{m}(t) & \rightarrow & p_{i} \\
p_{i}^{f}(t) & \rightarrow & p_{i}
\end{eqnarray*}
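The recursion and the closed-form expressions for a sex-linked locus can be compared numerically. The Python sketch below is illustrative only; `sex_linked_trajectory` and `closed_form` are hypothetical names, and the starting frequencies are those of the example (0.2 in males, 0.4 in females).

```python
def sex_linked_trajectory(pm0, pf0, generations):
    """Iterate p^m(t+1) = p^f(t) and p^f(t+1) = 0.5 * (p^m(t) + p^f(t))."""
    pm, pf = pm0, pf0
    history = [(pm, pf)]
    for _ in range(generations):
        pm, pf = pf, 0.5 * (pm + pf)
        history.append((pm, pf))
    return history


def closed_form(pm0, pf0, t):
    """Closed-form male and female frequencies in generation t."""
    p_bar = pm0 / 3 + 2 * pf0 / 3      # pooled frequency, constant over time
    d = pm0 - pf0                      # initial male minus female difference
    factor = (-0.5) ** t
    return p_bar + (2 / 3) * factor * d, p_bar - (1 / 3) * factor * d


history = sex_linked_trajectory(0.2, 0.4, 10)
for t, (pm, pf) in enumerate(history):
    pooled = pm / 3 + 2 * pf / 3
    print(t, round(pm, 5), round(pf, 5), round(pooled, 5))
```

The printed pooled frequency stays at 1/3 in every generation, while the male and female frequencies oscillate around it with a difference that halves and changes sign each generation.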


3. Means and Variances

Assume two alleles at a locus and the frequency of A1 is p, and of A2 is q. Genotypic values are assigned to each genotype as shown in the table below.

Genotype    Frequency    Value
A1A1        $p^{2}$      a
A1A2        $2pq$        d
A2A2        $q^{2}$      -a

d is the value of the heterozygous genotype, and can take on many different values.

1. When d=0, then this locus is completely additive. The genotypes can be viewed as being equally spaced. The difference from A1A1 to A1A2 is a, and the difference from A1A2 to A2A2 is also a.

2. When d=a, then the values of A1A1 and A1A2 are the same and the two genotypes cannot be distinguished on the basis of genotypic value alone. This situation is called complete dominance.

3. When d > a, then the heterozygote has a greater value than the best homozygote. This situation is called overdominance.

4. Any other value of d between 0 and a gives a situation of incomplete dominance.

Within a genome are thousands of loci and most likely all possible values of d occur within a species.

3.1 Genotypic Mean

The mean of the genotypes, $\mu_{G}$, is calculated by weighting the genotypic values by the genotypic frequencies.

\begin{eqnarray*}\mu_{G} & = & p^{2}a + 2pqd + q^{2}(-a) \\
& = & a(p^{2}-q^{2}) + 2pqd \\
& = & a[(p+q)(p-q)] + 2pqd \\
& = & a(p-q) + 2pqd
\end{eqnarray*}


For each of the possible situations for values of d, determine the frequency p that will maximize $\mu_{G}$. For example, when d=0, $\mu_{G}$ will be maximum when p=1.
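One way to approach this exercise is sketched here, assuming a > 0 (which the notes imply but do not state). Substituting q = 1-p and differentiating with respect to p gives

\begin{eqnarray*}\mu_{G} & = & a(2p-1) + 2p(1-p)d \\
\frac{\partial \mu_{G}}{\partial p} & = & 2a + 2d(1-2p) \\
& = & 2[a + d(q-p)]
\end{eqnarray*}

For $0 \leq d \leq a$ the derivative is non-negative for every p between 0 and 1, so $\mu_{G}$ is greatest at p=1. For d > a (overdominance), setting the derivative to zero gives

\begin{displaymath}p = \frac{a+d}{2d}, \end{displaymath}

which lies between 0.5 and 1, so the maximum occurs at an intermediate frequency at which both alleles remain in the population.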

3.2 Genotypic Variance

The variance of genotypes, $\sigma^{2}_{G}$, is calculated by weighting the square of the genotypic values by the genotypic frequencies and subtracting the square of the genotypic mean.

\begin{eqnarray*}\sigma^{2}_{G} & = & p^{2}a^{2} + 2pqd^{2} + q^{2}a^{2} - \mu_{G}^{2} \\
& = & 2pq[a^{2} + (1-2pq)d^{2} + 2ad(q-p)] \\
& = & 2pq[a^{2} + 2pqd^{2} + (q-p)^{2}d^{2}+2ad(q-p)] \\
& = & (2pqd)^{2} +2pq[a + (q-p)d]^{2}
\end{eqnarray*}


When d=0, then

\begin{displaymath}\sigma^{2}_{G} = 2pqa^{2} \end{displaymath}

in an additive model. Note that if p=1, then q=0 and $\sigma^{2}_{G}$ is zero.

3.3 Partitions of Genotypic Variance

The genotypic variance can be partitioned into an additive genetic variance and a dominance genetic variance. To determine the additive genetic variance, the breeding values of the genotypes must be determined. Consider the effects that are transmitted from a parent to its offspring (i.e. either one allele or the other is transmitted). If an A1 allele is transmitted, then under random mating a proportion p of the offspring will have genotype A1A1 with genotypic value of a, and a proportion q will have genotype A1A2 with genotypic value of d. The total effect of the A1 allele is then

pa + qd.

The average effect of the A1 allele is the total effect deviated from the population mean. Let

\begin{eqnarray*}\alpha_{1} & = & pa + qd - [a(p-q)+2pqd] \\
& = & q[a+d(q-p)]
\end{eqnarray*}


Similarly, if an A2 allele is transmitted, then p offspring will have genotype A1A2 with value d, and q offspring will have genotype A2A2 with value -a, for a total effect of A2 allele of pd-qa. The average effect of the A2 allele is

\begin{eqnarray*}\alpha_{2} & = & pd-qa - [a(p-q)+2pqd] \\
& = & -p [a + d(q-p)]
\end{eqnarray*}


The average effect of substituting A1 for A2 is

\begin{eqnarray*}\alpha & = & \alpha_{1} - \alpha_{2} \\
& = & q [a + d(q-p)] - (-p) [a + d(q-p)] \\
& = & [ a + d(q-p)]
\end{eqnarray*}


Note that $\alpha$ depends on p, q, a, and d.

The breeding values (BV) can be summarized as follows:

Genotype    Frequency    BV
A1A1        $p^{2}$      $\alpha_{1}+\alpha_{1} = 2q \alpha$
A1A2        $2pq$        $\alpha_{1}+\alpha_{2} = (q-p)\alpha$
A2A2        $q^{2}$      $\alpha_{2}+\alpha_{2} = -2p \alpha$

Note that the BV, weighted by the genotypic frequencies, sum to zero. The BV represent only the additive genetic portion of the genotypic value. The additive genetic variance is obtained by squaring the BV and weighting by the genotypic frequencies.

\begin{eqnarray*}\sigma^{2}_{A} & = & p^{2}(2q\alpha)^{2} +2pq((q-p)\alpha)^{2}
+ q^{2}(-2p\alpha)^{2} \\
& = & 2pq \alpha^{2} [ 2pq + (q-p)^{2} + 2pq ] \\
& = & 2pq \alpha^{2} \\
& = & 2pq [a+(q-p)d]^{2}
\end{eqnarray*}
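The average effects, breeding values, and additive variance can be verified numerically. The following Python sketch is only an illustration; the function name `breeding_values` and the values p=0.3, a=2, d=1 are assumptions chosen for the example, not values from the notes.

```python
def breeding_values(p, a, d):
    """Average effects, breeding values and additive variance at one locus."""
    q = 1.0 - p
    mu = a * (p - q) + 2 * p * q * d      # genotypic mean
    alpha1 = p * a + q * d - mu           # average effect of A1
    alpha2 = p * d - q * a - mu           # average effect of A2
    alpha = alpha1 - alpha2               # effect of substituting A1 for A2
    freq = {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}
    bv = {"A1A1": 2 * alpha1, "A1A2": alpha1 + alpha2, "A2A2": 2 * alpha2}
    # Frequency-weighted mean and variance of the breeding values.
    mean_bv = sum(freq[g] * bv[g] for g in freq)
    var_a = sum(freq[g] * bv[g] ** 2 for g in freq)
    return {"alpha": alpha, "bv": bv, "mean_bv": mean_bv, "var_a": var_a}


# Assumed example values: p = 0.3, a = 2, d = 1.
result = breeding_values(p=0.3, a=2.0, d=1.0)
print(result)
```

For these assumed values $\alpha = 2 + (0.7-0.3)(1) = 2.4$, the weighted mean of the breeding values is zero, and the additive variance agrees with $2pq\alpha^{2}$.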


To obtain the dominance genetic variance, the dominance deviations for each genotype need to be calculated. If

\begin{displaymath}G = \mu_{G} + A + D, \end{displaymath}

then

\begin{displaymath}D = G - \mu_{G} - A. \end{displaymath}

For genotype A1A1,

\begin{eqnarray*}G & = & a \\
\mu_{G} & = & a(p-q)+2pqd \\
A & = & 2q\alpha \\
D & = & a - [a(p-q)+2pqd] - 2q\alpha \\
& = & 2q [ a - pd - \alpha ] \\
& = & 2q [ \alpha - (q-p)d -pd - \alpha ] \\
& = & 2q [ -qd ] \\
& = & -2q^{2}d
\end{eqnarray*}


Similarly, for genotype A1A2,

\begin{eqnarray*}D & = & d - [a(p-q)+2pqd] - (q-p)\alpha \\
& = & 2pqd
\end{eqnarray*}


and for A2A2,

\begin{eqnarray*}D & = & -a - [a(p-q)+2pqd] + 2p\alpha \\
& = & -2p^{2}d
\end{eqnarray*}


Summarizing,
Genotype    Frequency    Dominance
A1A1        $p^{2}$      $-2q^{2}d$
A1A2        $2pq$        $2pqd$
A2A2        $q^{2}$      $-2p^{2}d$
Then the dominance genetic variance is

\begin{eqnarray*}\sigma^{2}_{D} & = & p^{2}(-2q^{2}d)^{2} + 2pq(2pqd)^{2}
+ q^{2}(-2p^{2}d)^{2} \\
& = & 4p^{2}q^{4}d^{2} + 8p^{3}q^{3}d^{2} + 4p^{4}q^{2}d^{2} \\
& = & (2pqd)^{2} [ q^{2} +2pq + p^{2}] \\
& = & (2pqd)^{2}
\end{eqnarray*}


Note that the dominance genetic variance does not involve a.

Recall that

\begin{displaymath}\sigma^{2}_{G} = 2pq\alpha^{2} + (2pqd)^{2}, \end{displaymath}

and now it has been shown that

\begin{displaymath}\sigma^{2}_{A} = 2pq \alpha^{2}, \end{displaymath}

and

\begin{displaymath}\sigma^{2}_{D} = (2pqd)^{2}, \end{displaymath}

so that

\begin{displaymath}\sigma^{2}_{G} = \sigma^{2}_{A} + \sigma^{2}_{D}. \end{displaymath}

This result implies that there is a zero covariance between the additive and dominance deviations. This can be shown by calculating the covariance between additive and dominance deviations,

\begin{eqnarray*}Cov(A,D) & = & p^{2}(2q\alpha)(-2q^{2}d) \\
& & + 2pq(q-p)\alpha(2pqd) \\
& & + q^{2}(-2p\alpha)(-2p^{2}d) \\
& = & 0
\end{eqnarray*}


The covariance is zero under a large, random mating population without selection.
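The partition of the genotypic variance can be checked numerically by computing each component directly from the genotypic values, breeding values, and dominance deviations. This Python sketch is illustrative; `variance_components` is a hypothetical helper and the inputs p=0.3, a=2, d=1 are assumed values.

```python
def variance_components(p, a, d):
    """Return (var_G, var_A, var_D, cov_AD) for one locus with two alleles."""
    q = 1.0 - p
    alpha = a + d * (q - p)                    # average effect of substitution
    mu = a * (p - q) + 2 * p * q * d           # genotypic mean
    freq = [p * p, 2 * p * q, q * q]           # A1A1, A1A2, A2A2
    value = [a, d, -a]                         # genotypic values G
    bv = [2 * q * alpha, (q - p) * alpha, -2 * p * alpha]
    dom = [g - mu - b for g, b in zip(value, bv)]   # D = G - mu_G - A
    var_g = sum(f * (g - mu) ** 2 for f, g in zip(freq, value))
    var_a = sum(f * b ** 2 for f, b in zip(freq, bv))
    var_d = sum(f * x ** 2 for f, x in zip(freq, dom))
    cov_ad = sum(f * b * x for f, b, x in zip(freq, bv, dom))
    return var_g, var_a, var_d, cov_ad


# Assumed example values: p = 0.3, a = 2, d = 1.
var_g, var_a, var_d, cov_ad = variance_components(p=0.3, a=2.0, d=1.0)
print(var_g, var_a, var_d, cov_ad)
```

Trying other values of p, a, and d in the same function is a simple way to see that the covariance remains zero and that the dominance variance does not change with a.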

Gametic Relationships Between Individuals at a Single Locus

Generally, the additive genetic relationship between two individuals is the proportion of genes that they share in common. However, if only a single locus is considered, the possibility exists that none of the alleles are shared between two individuals, or that both of them are shared. The average over all genes in the genome should equal the additive genetic relationship.

Now that markers, and eventually QTLs, will be used, relationships need to be computed for a single locus, and gametic relationships are the most straightforward approach to this problem.

Consider two individuals X and Y, whose genotypes are A1A1 and A1A2, respectively. They have an offspring, Z, that has genotype A1A2. Clearly, the A2 allele in Z has come from parent Y, and the A1 allele in Z could be either A1 allele of parent X with equal probability. This can be illustrated in the following table. The diagonals of this table are always equal to unity. Let the rows and columns of this table be numbered from 1 to 6.

Gametic Relationship Table

                    X             Y             Z
                                              X      Y
                 A1     A1     A1     A2     A1     A2
   X    A1       1      0      0      0      a      b
        A1       0      1      0      0      c      d
   Y    A1       0      0      1      0      e      f
        A2       0      0      0      1      g      h
   Z    A1       a      c      e      g      1      i
        A2       b      d      f      h      i      1

Situations 1 and 2

In the above example, the parental source of both alleles in animal Z is known. That is, we know with absolute certainty ( P = 1.0 ) which parent provided the A1 allele and which parent provided the A2 allele. For the A2 allele, the exact parental allele is also known, because Y carries only one A2 allele. This is situation 1. Thus, the elements in column 6 should be identical to those of column 4.

In the case of the A1 allele, parent X has two such alleles and it could be either of these alleles with equal probability of .5. This is situation 2. Thus, the elements in column 5 are an average of columns 1 and 2 (the possible sources of A1). Therefore, the missing values are computed as follows:


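\begin{eqnarray*}a & = & .5 [ (1,1)+(1,2)], \\
c & = & .5 [ (2,1)+(2,2)], \\
e & = & .5 [ (3,1)+(3,2)], \\
g & = & .5 [ (4,1)+(4,2)], \\
b & = & 1 [ (1,4) ], \\
d & = & 1 [ (2,4) ], \\
f & = & 1 [ (3,4) ], \\
h & = & 1 [ (4,4) ], \\
i & = & 1 [ (5,4) ], \ \ \mbox{or} \\
& = & .5 [ (6,1)+(6,2)].
\end{eqnarray*}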


The end result is

Gametic Relationship Table

                    X             Y             Z
                                              X      Y
                 A1     A1     A1     A2     A1     A2
   X    A1       1      0      0      0      .5     0
        A1       0      1      0      0      .5     0
   Y    A1       0      0      1      0      0      0
        A2       0      0      0      1      0      1
   Z    A1       .5     .5     0      0      1      0
        A2       0      0      0      1      0      1

Situation 3

Suppose now that X and Y are both heterozygous A1A2, and their offspring is also heterozygous.

Gametic Relationship Table

                    X             Y             Z
                                              X,Y    X,Y
                 A1     A2     A1     A2     A1     A2
   X    A1       1      0      0      0      a      b
        A2       0      1      0      0      c      d
   Y    A1       0      0      1      0      e      f
        A2       0      0      0      1      g      h
   Z    A1       a      c      e      g      1      i
        A2       b      d      f      h      i      1

Now the parental source of the alleles cannot be determined with absolute certainty. The A1 allele could be from either X or Y with equal probability (.5), and the A2 allele could be from either parent with equal probability (.5). Column 5 in the table above is therefore an average of columns 1 and 3 (the sources for the A1 allele) and column 6 is the average of columns 2 and 4 (the sources for the A2 allele).


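\begin{eqnarray*}a & = & .5 [ (1,1) + (1,3)], \\
c & = & .5 [ (2,1) + (2,3)], \\
e & = & .5 [ (3,1) + (3,3)], \\
g & = & .5 [ (4,1) + (4,3)], \\
b & = & .5 [ (1,2) + (1,4)], \\
d & = & .5 [ (2,2) + (2,4)], \\
f & = & .5 [ (3,2) + (3,4)], \\
h & = & .5 [ (4,2) + (4,4)], \\
i & = & .5 [ (5,2) + (5,4)], \ \ \mbox{or} \\
& = & .5 [ (6,1) + (6,3)].
\end{eqnarray*}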


The completed table is shown below.

Gametic Relationship Table

                    X             Y             Z
                                              X,Y    X,Y
                 A1     A2     A1     A2     A1     A2
   X    A1       1      0      0      0      .5     0
        A2       0      1      0      0      0      .5
   Y    A1       0      0      1      0      .5     0
        A2       0      0      0      1      0      .5
   Z    A1       .5     0      .5     0      1      0
        A2       0      .5     0      .5     0      1

There can be various combinations of these situations, but just keep clear, for a particular allele, the possible sources of that allele from the two parents. For example, suppose that X and Y were both homozygous A1A1, which would make Z also homozygous. Each A1 allele in Z would have a .25 probability of being one of the four A1 alleles in the parents.
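The averaging rule used in situations 1 to 3 can be written as one small routine: a new gamete's row and column are the average of the rows and columns of its possible source gametes, and its diagonal is 1. The Python sketch below is an illustration under that reading of the rule; `base_table` and `add_gamete` are hypothetical names, and indices are 0-based, so column 1 of the text is index 0.

```python
def base_table(n_gametes):
    """Identity table for the gametes of unrelated base animals."""
    return [[1.0 if r == c else 0.0 for c in range(n_gametes)]
            for r in range(n_gametes)]


def add_gamete(table, sources):
    """Append one offspring gamete to the table (modified in place).

    sources lists the 0-based indices of the parental gametes that could be
    the origin of this allele, each assumed equally likely.
    """
    n = len(table)
    weight = 1.0 / len(sources)
    new_row = [weight * sum(table[s][k] for s in sources) for k in range(n)]
    for k in range(n):
        table[k].append(new_row[k])      # new column, by symmetry
    table.append(new_row + [1.0])        # diagonal is always 1
    return table


# Situations 1 and 2: X = A1A1 (gametes 0, 1), Y = A1A2 (gametes 2, 3).
sit12 = base_table(4)
add_gamete(sit12, [0, 1])   # A1 of Z: either A1 allele of X
add_gamete(sit12, [3])      # A2 of Z: the A2 allele of Y

# Situation 3: X = A1A2 (gametes 0, 1), Y = A1A2 (gametes 2, 3).
sit3 = base_table(4)
add_gamete(sit3, [0, 2])    # A1 of Z: A1 of X or A1 of Y
add_gamete(sit3, [1, 3])    # A2 of Z: A2 of X or A2 of Y

for row in sit12:
    print(row)
```

Gametes have to be added in pedigree order, parents before offspring, because each new row is built from rows that are already in the table.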

Example Problem

Below are ten animals, their genotype at the A-locus, and their parents (with their genotypes).

Animal   Genotype   Sire   Genotype   Dam   Genotype
X        12         -                 -
Y        12         -                 -
W        11         X      12         Y     12
U        12         X      12         Y     12
V        22         X      12         Y     12
T        22         U      12         V     22
S        12         U      12         V     22
R        12         U      12         V     22
P        22         U      12         V     22
Q        12         S      12         T     22

Construct the gametic relationship table for this locus and these individuals. Which animals are inbred? Which pairs of animals have a non-zero dominance relationship?

Suppose we have a cell (between two animals) as follows:

                 X
              A1     A2
   Y    A1    e      b
        A2    c      f

Additive relationships are computed by summing the four numbers within a cell and multiplying by .5. That is,

\begin{displaymath}a_{XY} = .5 (e + b + c + f). \end{displaymath}

Dominance relationships are computed by

\begin{displaymath}d_{XY} = (e \times f)+(b \times c). \end{displaymath}
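These two formulas can be applied to any 2 by 2 cell of a completed gametic relationship table. The Python sketch below uses the completed table from situations 1 and 2 (X = A1A1, Y = A1A2, Z = A1A2) as input; the helper names `cell`, `additive_relationship`, and `dominance_relationship` are hypothetical, and indices are 0-based.

```python
# Completed gametic relationship table for situations 1 and 2.
# Gamete order: X(A1), X(A1), Y(A1), Y(A2), Z(A1), Z(A2).
TABLE = [
    [1.0, 0.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
]
GAMETES = {"X": [0, 1], "Y": [2, 3], "Z": [4, 5]}


def cell(table, animal_1, animal_2):
    """Extract the 2 x 2 cell between two animals: [[e, b], [c, f]]."""
    rows, cols = GAMETES[animal_1], GAMETES[animal_2]
    return [[table[r][c] for c in cols] for r in rows]


def additive_relationship(block):
    """a = .5 (e + b + c + f)."""
    return 0.5 * (block[0][0] + block[0][1] + block[1][0] + block[1][1])


def dominance_relationship(block):
    """d = (e x f) + (b x c)."""
    return block[0][0] * block[1][1] + block[0][1] * block[1][0]


a_xz = additive_relationship(cell(TABLE, "X", "Z"))
d_xz = dominance_relationship(cell(TABLE, "X", "Z"))
a_zz = additive_relationship(cell(TABLE, "Z", "Z"))
print(a_xz, d_xz, a_zz)
```

An animal's cell with itself gives its own additive relationship, so a value above 1 on that cell would point to an inbred animal, which is one way to approach the questions in the example problem.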



Larry Schaeffer
2001-10-22