Effects of Non-Random Mating
L. R. Schaeffer, April 1999
Updated April 2000

The models that have been discussed in this course have assumed an infinitesimal animal model, and that animals are mating randomly. What are the effects when these assumptions are not true?

There are many forms of selection that can nullify or render meaningless any statistical analysis if it is ignored. In this set of notes, only selection of animals to produce the next generation, or non-random mating, will be considered.

Non-random mating produces the following consequences.

1. Causes changes to gene frequencies at all loci.

2. Changes in gene frequencies cause a change in genetic variance. Recall from quantitative genetics that

\begin{displaymath}\sigma^{2}_{G} = 2pq[a+d(q-p)]^{2}+[2pqd]^{2}. \end{displaymath}

3. In finite populations, non-random mating causes a reduction in the effective population size, which subsequently causes an increase in levels of inbreeding.

4. Joint equilibrium becomes joint disequilibrium, and therefore, non-zero covariances between additive and dominance genetic effects are created.

5. If the pedigrees of animals are not complete nor traceable to the base generation, then non-random mating causes genetic evaluations by BLUP to be biased, and causes estimates of genetic variances, by any method, to be biased.

6. Because the genetic variance decreases due to joint disequilibrium and inbreeding, response to selection is generally lower than expected by selection index over several generations.
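The dependence of the genetic variance on gene frequency (item 2) can be illustrated numerically. The following is a minimal sketch for a single locus with two alleles; the values of a and d and the sequence of frequencies are hypothetical and chosen only for illustration.

```python
def genetic_variance(p, a, d):
    """Genetic variance at one locus: 2pq[a + d(q - p)]^2 + (2pqd)^2."""
    q = 1.0 - p
    alpha = a + d * (q - p)              # average effect of a gene substitution
    additive = 2.0 * p * q * alpha ** 2  # additive genetic variance
    dominance = (2.0 * p * q * d) ** 2   # dominance genetic variance
    return additive + dominance


# Hypothetical locus with a = 1 and d = 0.5; selection moves p away from 0.5.
for p in (0.5, 0.7, 0.9):
    print(p, round(genetic_variance(p, 1.0, 0.5), 4))
```

In this hypothetical case the variance shrinks as the favoured allele moves towards fixation, which is the kind of change item 2 refers to.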

Effect on Inbreeding

Recall that under random mating and finite population size, the change in inbreeding per generation was defined as

\begin{displaymath}\Delta F = \frac{1}{2N_{e}}, \end{displaymath}

where $N_{e}$ is the effective population size, or effective number of breeding individuals. Wright (1940) and Crow (1954) showed that if the variance of family size is $v$, then

\begin{displaymath}N_{e} = \frac{4N}{v+2}, \end{displaymath}

where $N$ is the number of mating males and females. An idealized population that is able to maintain itself has a mean family size of 2 with variance $v=2$, thus $N_{e}=N$. Belonsky and Kennedy (1985) simulated a population of 100 females and 5 males per year over ten generations with a single record per animal and discrete generations. Parents were either randomly selected, selected on the basis of phenotype, or selected on the basis of BLUP EBVs. The changes in inbreeding under the different selection criteria are given in the table below.

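\begin{tabular}{lccc}
\multicolumn{4}{c}{Change in Inbreeding} \\ \hline
Selection & \multicolumn{3}{c}{Heritability} \\
Criteria & .1 & .3 & .6 \\ \hline
Random & .152 & .152 & .152 \\
Phenotypic & .169 & .206 & .215 \\
BLUP EBV & .288 & .299 & .274 \\ \hline
\end{tabular}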

Under random selection of mates, the rate of inbreeding over 10 years is the same for all heritabilities, and is lower than the other forms of mate selection. Inbreeding accumulates due simply to the small population size. Phenotypic selection of mates gives higher rates of inbreeding which increase with increasing heritabilities. Selection of mates using BLUP EBV created the highest rates of inbreeding. Note that there was a decline in inbreeding rate at heritability equal to .6. Why? BLUP EBVs make use of information from relatives. At low heritabilities an animal's EBV is more heavily influenced by the parent average, and many progeny are needed to overcome this influence. Selection on the parent average results in more related animals being selected together, and hence more inbreeding. At the higher heritability of .6, the animal's own record has more influence relative to the parent average, resulting in fewer half-sibs and full-sibs being selected as parents of the next generation, and consequently lower inbreeding levels.
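The two formulas at the start of this section can be combined to show how quickly the rate of inbreeding rises as the effective population size falls. This is only a sketch: the population size of 100 parents and the values of v are hypothetical, and the function names are made up for this illustration.

```python
def effective_size(n_parents, v):
    """N_e = 4N / (v + 2), with v the family-size parameter defined in the text."""
    return 4.0 * n_parents / (v + 2.0)


def delta_f(n_e):
    """Per-generation change in inbreeding, Delta F = 1 / (2 N_e)."""
    return 1.0 / (2.0 * n_e)


# Hypothetical population of N = 100 parents and three values of v.
for v in (2.0, 6.0, 18.0):
    n_e = effective_size(100, v)
    print(f"v = {v:4.1f}  N_e = {n_e:6.1f}  Delta F = {delta_f(n_e):.4f}")
```

With v=2 the effective size equals the actual number of parents; larger values of v, as expected when a few families contribute most of the selected parents, reduce the effective size and raise the rate of inbreeding.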

Effect on Genetic Evaluation

The effects of non-random mating on genetic evaluation are minimal if

If the above conditions hold, then application of BLUP does not lead to bias in EBVs, but selection increases the variance of prediction error over populations that are randomly mating. However, in animal breeding, the practical situation is that complete pedigrees seldom exist. Thus, bias can creep into estimates of fixed effects and EBVs.

Recall that HMME for a simple animal model are

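\begin{displaymath}\left( \begin{array}{ccc}
{\bf X}'{\bf R}^{-1}{\bf X} & {\bf X}'{\bf R}^{-1}{\bf Z} & {\bf0} \\
{\bf Z}'{\bf R}^{-1}{\bf X} & {\bf Z}'{\bf R}^{-1}{\bf Z}+{\bf A}^{nn}k_{a} & {\bf A}^{no}k_{a} \\
{\bf0} & {\bf A}^{on}k_{a} & {\bf A}^{oo}k_{a} \end{array} \right)
\left( \begin{array}{c} \hat{\bf b} \\ \hat{\bf a}_{n} \\ \hat{\bf a}_{o} \end{array} \right) =
\left( \begin{array}{c} {\bf X}'{\bf R}^{-1}{\bf y} \\ {\bf Z}'{\bf R}^{-1}{\bf y} \\
{\bf0} \end{array} \right), \end{displaymath}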

where $k_{a} = \sigma^{-2}_{a}$. A generalized inverse of the coefficient matrix can be represented as

\begin{displaymath}\left( \begin{array}{ccc}
{\bf C}_{xx} & {\bf C}_{xn} & {\bf C}_{xo} \\
{\bf C}_{nx} & {\bf C}_{nn} & {\bf C}_{no} \\
{\bf C}_{ox} & {\bf C}_{on} & {\bf C}_{oo} \end{array} \right). \end{displaymath}

Then remember that

\begin{displaymath}Var \left( \begin{array}{c} \hat{\bf a}_{n} - {\bf a}_{n} \\
\hat{\bf a}_{o} - {\bf a}_{o} \end{array} \right) =
\left( \begin{array}{cc} {\bf C}_{nn} & {\bf C}_{no} \\
{\bf C}_{on} & {\bf C}_{oo} \end{array} \right), \end{displaymath}

and that

\begin{eqnarray*}Cov(\hat{\bf b},\hat{\bf a}_{n}) & = & {\bf0}, \\
Cov(\hat{\bf b},{\bf a}_{n}) & = & -{\bf C}_{xn}.
\end{eqnarray*}


These results indicate that HMME forces the covariance between estimates of the fixed effects and estimates of additive genetic effects to be null. However, there is a non-zero covariance between estimates of the fixed effects and the true additive genetic values of animals. Hence, if there is any problem with the true additive genetic values, there will also be problems with the estimates of fixed effects.

Consider the equation for $\hat{\bf b}$,

\begin{displaymath}\hat{\bf b} = ({\bf X}'{\bf R}^{-1}{\bf X})^{-}
({\bf X}'{\bf R}^{-1}{\bf y} - {\bf X}'{\bf R}^{-1}{\bf Z}
\hat{\bf a}_{n}), \end{displaymath}

and the expectation of this vector is

\begin{displaymath}E(\hat{\bf b}) = ({\bf X}'{\bf R}^{-1}{\bf X})^{-}
({\bf X}'{\bf R}^{-1}{\bf Xb} - {\bf X}'{\bf R}^{-1}{\bf Z}
E(\hat{\bf a}_{n})). \end{displaymath}

The fixed effects solution vector contains a function of the expectation of the additive genetic solution vector. Normally, because the BLUP methodology requires

\begin{displaymath}E(\hat{\bf a}_{n}) = E({\bf a}_{n}) = {\bf0}, \end{displaymath}

then the fixed effects solution vector is also unbiased. Due to selection, however,

\begin{displaymath}E({\bf a}_{n}) \neq {\bf0}, \end{displaymath}

and therefore, the expectation of the fixed effects solution vector contains a function of $E({\bf a}_{n})$ and is consequently biased. If $\hat{\bf b}$ is biased, then this will cause a bias in $\hat{\bf a}$.
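The dependence of the fixed effects solutions on the animal solutions can be seen in the simplest possible case. The sketch below assumes, purely for illustration, a model with only an overall mean, one record per animal, and equal residual variances (${\bf X}={\bf 1}$, ${\bf Z}={\bf I}$, ${\bf R}={\bf I}\sigma^{2}_{e}$), so that the equation for $\hat{\bf b}$ reduces to the mean of the records minus the mean of the animal solutions. The records and animal solutions are hypothetical numbers, not output from a real evaluation.

```python
def mean_solution(y, a_hat):
    """Solve the first mixed model equation for mu when X = 1, Z = I, R = I*sigma2_e.

    The equation n*mu + sum(a_hat) = sum(y) gives mu = mean(y) - mean(a_hat).
    """
    n = len(y)
    return (sum(y) - sum(a_hat)) / n


y = [48.0, 52.0, 55.0, 45.0]              # hypothetical records
centred = [-1.0, 1.0, 2.0, -2.0]          # animal solutions that average zero
shifted = [a + 1.5 for a in centred]      # same solutions with a non-zero mean

print(mean_solution(y, centred))   # mean of y, no adjustment
print(mean_solution(y, shifted))   # mean of y minus the average animal solution
```

Any non-zero average in the animal solutions is carried directly into the solution for the mean, which is the mechanism by which a non-null expectation of the additive genetic effects reaches the fixed effects.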

Alternative Methods

Re-state the model (in general terms) as

\begin{displaymath}{\bf y} = {\bf Xb} + {\bf Zu} + {\bf e}, \end{displaymath}

where

\begin{displaymath}E \left( \begin{array}{c} {\bf u} \\ {\bf e} \end{array} \right) =
\left( \begin{array}{c} {\bf u} \\ {\bf0} \end{array} \right), \end{displaymath}

and therefore,

\begin{displaymath}E({\bf y}) = {\bf Xb}+{\bf Zu}. \end{displaymath}

To simplify, assume that ${\bf G}=Var({\bf u})$ and ${\bf R}=Var({\bf e})$ and that neither is drastically affected by non-random mating.

The prediction problem is the same as before. Predict a function of ${\bf K'}{\bf b}+{\bf M}'{\bf u}$ by a linear function of the observation vector, ${\bf L}'{\bf y}$, such that

\begin{eqnarray*}E({\bf K'}{\bf b}+{\bf M}'{\bf u}) = E({\bf L}'{\bf y}),
\end{eqnarray*}


and such that $Var({\bf K'}{\bf b}+{\bf M}'{\bf u}-{\bf L}'{\bf y})$ is minimized. Form the variance of prediction errors and add a Lagrange multiplier to ensure the unbiasedness condition, then differentiate with respect to the unknown ${\bf L}$ and the matrix of Lagrange multipliers and equate to zero. The solution gives the following equations.

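\begin{displaymath}\left( \begin{array}{cc}
{\bf X}'{\bf V}^{-1}{\bf X} & {\bf X}'{\bf V}^{-1}{\bf Z} \\
{\bf Z}'{\bf V}^{-1}{\bf X} & {\bf Z}'{\bf V}^{-1}{\bf Z} \end{array} \right)
\left( \begin{array}{c} \hat{\bf b} \\ \hat{\bf u} \end{array} \right) =
\left( \begin{array}{c} {\bf X}'{\bf V}^{-1}{\bf y} \\ {\bf Z}'{\bf V}^{-1}{\bf y}
\end{array} \right). \end{displaymath}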

Because ${\bf V} = {\bf ZGZ}'+{\bf R}$, and

\begin{displaymath}{\bf V}^{-1} = {\bf R}^{-1}-{\bf R}^{-1}{\bf ZTZ}'{\bf R}^{-1}, \end{displaymath}

for ${\bf T}=({\bf Z}'{\bf R}^{-1}{\bf Z}+{\bf G}^{-1})^{-1}$, then it can be shown that the following equations give the exact same solutions as the previous equations.

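\begin{displaymath}\left( \begin{array}{cc}
{\bf X}'{\bf R}^{-1}{\bf X} & {\bf X}'{\bf R}^{-1}{\bf Z} \\
{\bf Z}'{\bf R}^{-1}{\bf X} & {\bf Z}'{\bf R}^{-1}{\bf Z} \end{array} \right)
\left( \begin{array}{c} \hat{\bf b} \\ \hat{\bf u} \end{array} \right) =
\left( \begin{array}{c} {\bf X}'{\bf R}^{-1}{\bf y} \\ {\bf Z}'{\bf R}^{-1}{\bf y}
\end{array} \right). \end{displaymath}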
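The expression for ${\bf V}^{-1}$ quoted above can be checked by multiplying it by ${\bf V}$. The following short verification uses only the definitions ${\bf V} = {\bf ZGZ}'+{\bf R}$ and ${\bf T}=({\bf Z}'{\bf R}^{-1}{\bf Z}+{\bf G}^{-1})^{-1}$, and assumes that ${\bf G}$ and ${\bf R}$ are nonsingular.

\begin{eqnarray*}
{\bf V}({\bf R}^{-1}-{\bf R}^{-1}{\bf ZTZ}'{\bf R}^{-1})
 & = & ({\bf ZGZ}'+{\bf R})({\bf R}^{-1}-{\bf R}^{-1}{\bf ZTZ}'{\bf R}^{-1}) \\
 & = & {\bf I} + {\bf ZGZ}'{\bf R}^{-1}
       - {\bf Z}({\bf GZ}'{\bf R}^{-1}{\bf Z}+{\bf I}){\bf TZ}'{\bf R}^{-1} \\
 & = & {\bf I} + {\bf ZGZ}'{\bf R}^{-1}
       - {\bf ZG}({\bf Z}'{\bf R}^{-1}{\bf Z}+{\bf G}^{-1}){\bf TZ}'{\bf R}^{-1} \\
 & = & {\bf I} + {\bf ZGZ}'{\bf R}^{-1} - {\bf ZGZ}'{\bf R}^{-1} \\
 & = & {\bf I}.
\end{eqnarray*}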

If a generalized inverse to the above coefficient matrix is represented as

\begin{displaymath}\left( \begin{array}{cc} {\bf C}_{xx} & {\bf C}_{xz} \\
{\bf C}_{zx} & {\bf C}_{zz} \end{array} \right), \end{displaymath}

then some properties of these equations are

\begin{eqnarray*}Cov(\hat{\bf b},{\bf u}) & = & {\bf0}, \\
E(\hat{\bf b}) & = &...
...1}{\bf Xb}, \\
Cov(\hat{\bf b},\hat{\bf u}) & = & {\bf C}_{xz}.
\end{eqnarray*}


Firstly, these results suggest that if non-random mating has occurred and has changed the expectation of the random vector, then an appropriate set of equations is the generalized least squares equations. However, we have seen earlier that such equations give a lower correlation with true values and larger mean squared errors (when matings are at random). Secondly, the estimates of the fixed effects have null covariances with the true random effects, and the covariances between estimates of the fixed effects and estimates of the random effects are non-zero, which is the opposite of the results from BLUP. The regressed least squares procedure could subsequently be applied to the least squares solutions to give EBVs.

There is another problem with these equations. If ${\bf u}={\bf a}$ as in an animal model, then ${\bf Z}={\bf I}$, and the generalized least squares equations do not have a solution unless $\hat{\bf a}={\bf0}$. This is not very useful for genetic evaluation purposes.

An Alternative Model

Earlier in these notes, the Mendelian sampling variance was assumed to be unaffected by non-random mating, but could be reduced by the accumulation of inbreeding. The animal model equation is

\begin{displaymath}{\bf y} = {\bf Xb} + {\bf Za} + {\bf Zp} + {\bf e}. \end{displaymath}

The animal additive genetic effect can be written as

\begin{displaymath}{\bf a} = {\bf T}_{s}{\bf s} + {\bf T}_{d}{\bf d}
+ {\bf m}, \end{displaymath}

where ${\bf T}_{s}$ and ${\bf T}_{d}$ are matrices of ones and zeros, such that each row has an element that is 1 and all others are 0, and these indicate the sire and dam of the animal, respectively, and ${\bf m}$ is the Mendelian sampling effect. Due to non-random mating then,

\begin{displaymath}E({\bf a}) = {\bf T}_{s}{\bf s} + {\bf T}_{d}{\bf d}, \end{displaymath}

which is not a null vector, in general. Let

\begin{eqnarray*}{\bf Z}_{s} & = & {\bf ZT}_{s}, \\
{\bf Z}_{d} & = & {\bf ZT}_{d},
\end{eqnarray*}


then the model becomes

\begin{displaymath}{\bf y} = {\bf Xb} + {\bf Z}_{s}{\bf s} + {\bf Z}_{d}{\bf d}
+ {\bf Zm} + {\bf Zp} + {\bf e}. \end{displaymath}

Also,

\begin{eqnarray*}E({\bf y}) & = & {\bf Xb}+{\bf Z}_{s}{\bf s}+{\bf Z}_{d}{\bf d}, \\
E({\bf m}) & = & {\bf0}, \\
E({\bf p}) & = & {\bf0}, \\
E({\bf e}) & = & {\bf0},
\end{eqnarray*}


and

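\begin{displaymath}Var \left( \begin{array}{c} {\bf m} \\ {\bf p} \\ {\bf e}
\end{array} \right) = \left( \begin{array}{ccc}
{\bf B}\sigma^{2}_{a} & {\bf0} & {\bf0} \\
{\bf0} & {\bf I}\sigma^{2}_{p} & {\bf0} \\
{\bf0} & {\bf0} & {\bf I}\sigma^{2}_{e} \end{array} \right), \end{displaymath}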

where ${\bf B}$ is from

\begin{displaymath}{\bf A} = {\bf TBT}' . \end{displaymath}

If all animals were non-inbred, then all of the diagonals of ${\bf B}$ would be equal to .5.
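How inbreeding changes the diagonals of ${\bf B}$ can be sketched with the standard result for an animal with both parents known, $b_{ii} = .5 - .25(F_{s}+F_{d})$, where $F_{s}$ and $F_{d}$ are the inbreeding coefficients of the sire and dam. This formula is not written out in this section and is used here as an assumption; the inbreeding coefficients in the example are hypothetical.

```python
def mendelian_diagonal(f_sire=0.0, f_dam=0.0):
    """Diagonal of B for an animal with both parents known: 0.5 - 0.25*(F_s + F_d)."""
    return 0.5 - 0.25 * (f_sire + f_dam)


# Non-inbred parents, then a sire with a hypothetical F of 0.25.
for f_s, f_d in ((0.0, 0.0), (0.25, 0.0)):
    b_ii = mendelian_diagonal(f_s, f_d)
    print(f_s, f_d, b_ii, 1.0 / b_ii)   # diagonal of B and of its inverse
```

With non-inbred parents the diagonal is .5 and the corresponding diagonal of ${\bf B}^{-1}$ is 2; inbred parents give a smaller Mendelian sampling variance and therefore a larger diagonal in ${\bf B}^{-1}$.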

Note that the matrix ${\bf A}$ or its inverse are not necessary in this model, and that sires and dams (resulting from selection) are fixed effects in this model. The equations to solve are

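\begin{displaymath}\left( \begin{array}{ccccc}
{\bf X}'{\bf X} & {\bf X}'{\bf Z}_{s} & {\bf X}'{\bf Z}_{d} & {\bf X}'{\bf Z} & {\bf X}'{\bf Z} \\
{\bf Z}_{s}'{\bf X} & {\bf Z}_{s}'{\bf Z}_{s} & {\bf Z}_{s}'{\bf Z}_{d} & {\bf Z}_{s}'{\bf Z} & {\bf Z}_{s}'{\bf Z} \\
{\bf Z}_{d}'{\bf X} & {\bf Z}_{d}'{\bf Z}_{s} & {\bf Z}_{d}'{\bf Z}_{d} & {\bf Z}_{d}'{\bf Z} & {\bf Z}_{d}'{\bf Z} \\
{\bf Z}'{\bf X} & {\bf Z}'{\bf Z}_{s} & {\bf Z}'{\bf Z}_{d} & {\bf Z}'{\bf Z}+{\bf B}^{-1}\sigma^{2}_{e}/\sigma^{2}_{a} & {\bf Z}'{\bf Z} \\
{\bf Z}'{\bf X} & {\bf Z}'{\bf Z}_{s} & {\bf Z}'{\bf Z}_{d} & {\bf Z}'{\bf Z} & {\bf Z}'{\bf Z}+{\bf I}\sigma^{2}_{e}/\sigma^{2}_{p}
\end{array} \right)
\left( \begin{array}{c} \hat{\bf b} \\ \hat{\bf s} \\ \hat{\bf d} \\ \hat{\bf m} \\ \hat{\bf p} \end{array} \right) =
\left( \begin{array}{c} {\bf X}'{\bf y} \\ {\bf Z}_{s}'{\bf y} \\ {\bf Z}_{d}'{\bf y} \\ {\bf Z}'{\bf y} \\
{\bf Z}'{\bf y} \end{array} \right). \end{displaymath}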

Thus, for each animal with a record we need to know both parents, but we do not need to be able to follow pedigrees back to the base generation, except for calculating inbreeding coefficients of all animals.
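The incidence matrices ${\bf T}_{s}$ and ${\bf T}_{d}$ defined earlier need only the sire and dam of each animal with a record. A minimal sketch of how they could be built from a pedigree list is given below; the three-animal pedigree and the transmitting abilities are hypothetical (they are not the Lesson 9 data), and the helper names are made up for this illustration.

```python
def parent_incidence(pairs, parents):
    """Return one 0/1 row per animal, with a 1 in the column of its listed parent.

    pairs   : list of (animal, parent) tuples
    parents : ordered list of parent identifiers (the columns)
    """
    return [[1 if parent == col else 0 for col in parents]
            for _, parent in pairs]


def matvec(matrix, vector):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical pedigree: (animal, sire, dam).
pedigree = [(5, 1, 2), (6, 1, 4), (7, 3, 2)]
sires, dams = [1, 3], [2, 4]

t_s = parent_incidence([(a, s) for a, s, _ in pedigree], sires)
t_d = parent_incidence([(a, d) for a, _, d in pedigree], dams)

s = [-2.0, 1.0]   # hypothetical sire transmitting abilities
d = [0.5, 3.0]    # hypothetical dam transmitting abilities

# E(a) = T_s s + T_d d, one element per animal with a record.
expected_a = [x + y for x, y in zip(matvec(t_s, s), matvec(t_d, d))]
print(t_s, t_d, expected_a)
```

Each row of either matrix contains a single 1, so no relationship matrix and no ancestors beyond the parents are needed to set up the sire and dam parts of the model.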

This model was applied to the example data which was simulated in Lesson 9. There were 12 animals with records, four sires and four dams. The solutions for the sires and dams are shown below, after forcing their sum to be zero.

\begin{eqnarray*}\hat{\mu} & = & 49.558333, \\
\hat{s}_{1} & = & -7.079167, \\ ...
...5, \\
\hat{d}_{6} & = & 3.9375, \\
\hat{d}_{8} & = & -5.7625,
\end{eqnarray*}


The solutions for sires and dams represent estimated transmitting abilities and should be multiplied by 2 to give EBV. The estimates of the Mendelian sampling effects for animals 5 through 16 were

\begin{displaymath}\hat{\bf m} = \left( \begin{array}{r}
-1.254878 \\ -1.763415 ...
...5 \\ 0.5085366 \\
0.0219512 \\ -2.246341 \end{array} \right). \end{displaymath}

These solutions sum to zero, automatically, but the general property would be ${\bf 1}'{\bf B}^{-1}\hat{\bf m}=0$. In this example all of the diagonal elements of ${\bf B}^{-1}$ were equal to 2, but with inbred individuals this would not be the case.

EBV are created by summing sire and dam solutions with the Mendelian sampling estimates. The results for animals 5 through 16 were

\begin{displaymath}EBV = \left( \begin{array}{r}
-6.60 \\ 5.61 \\ 10.28 \\ -5.25...
...20.66 \\ -3.63 \\ 2.03 \\
-16.04 \\ 7.93 \end{array} \right). \end{displaymath}

The correlation of EBV with the true breeding values (shown in Lesson 9) was .8009 which is greater than the correlation obtained with BLUP (.7547). This model appears to be superior to the simple animal model (based on only one example). However, from this model it is possible to have two EBV for some animals. Animal 5, for example, had an EBV of -6.60 based on its own record plus its sire and dam solutions, and based on its progeny as a sire has an EBV of -14.16. Which evaluation is correct or better to use? If the progeny of animal 5 are random progeny, and if animal 5 has a chance to have many more progeny, then EBV=-14.16 is probably the better result to use. If the progeny are not a random sample of progeny, then the other EBV may be better. The correlation of sire and dam solutions with their true breeding values for animals 1 to 8 was .5573. For animals 1 to 4, the sire and dam solutions are the only information available for these animals because they did not have records.
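The two kinds of EBV described above can be written as two small functions. This is only a sketch: the function names are made up, and apart from the transmitting ability of -7.08 (half of the progeny-based EBV of -14.16 quoted for animal 5) the input values are hypothetical.

```python
def ebv_from_own_record(s_hat, d_hat, m_hat):
    """EBV of an animal with a record: sire solution + dam solution + Mendelian sampling."""
    return s_hat + d_hat + m_hat


def ebv_from_progeny(ta_hat):
    """EBV of a parent: twice its estimated transmitting ability."""
    return 2.0 * ta_hat


# Progeny-based EBV for a sire solution of -7.08.
print(round(ebv_from_progeny(-7.08), 2))

# Record-based EBV from hypothetical sire, dam and Mendelian sampling solutions.
print(ebv_from_own_record(-7.0, 1.5, -1.0))
```

The two functions use disjoint pieces of information, which is why an animal that has both a record and progeny can receive two different EBVs from this model.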

The simple animal model combines information from data, from the parent average, and from progeny. The above model computes estimated breeding values based on progeny only, and based on parent average plus data. This difference in concept is due to the fact that parents are obtained by selection. Non-random mating is taken into account because the sire and dam of each animal with a record are included in the model. The solutions for sires and dams from this model are valid estimates of transmitting abilities provided that the progeny are a random sample of their progeny. The Mendelian sampling estimates for animals provide a means of estimating the additive genetic variance. Inbreeding should still be taken into account in the matrix ${\bf B}$. This model also avoids the problem of forming phantom parent groups for animals with missing parent information. If an animal with a record has an unknown dam (or sire), then a phantom dam (sire) can be created which has this animal as its only progeny. If both parents are unknown, then both a phantom sire and phantom dam need to be assumed, with this animal as their only progeny. Further study on this model is warranted.


Larry Schaeffer
2000-04-03