

This LaTeX document is available as postscript or as Adobe PDF.

Inbreeding
L. R. Schaeffer, March 1999

Any positive definite, symmetric matrix can be written as the product of a matrix times its transpose. Henderson noticed that

\begin{displaymath}{\bf A} \mbox{ = } {\bf LL'}, \end{displaymath}

where ${\bf L}$ is a lower triangular matrix which can be obtained by a Cholesky decomposition of ${\bf A}$. In general, the diagonal elements of ${\bf L}$ are not necessarily equal to 1, so Henderson further wrote

\begin{displaymath}{\bf L} \mbox{ = } {\bf TB}^{.5}, \end{displaymath}

where ${\bf B}^{.5}$ is a diagonal matrix containing the diagonal elements of ${\bf L}$, and ${\bf T}$ is a lower triangular matrix whose diagonal elements are all equal to one. Then

\begin{eqnarray*}{\bf A} & = & {\bf TB}^{.5}{\bf B}^{.5}{\bf T'} \\
& = & {\bf TBT'}.
\end{eqnarray*}


Henderson (1976) showed that in situations where animals are not inbred, the diagonal elements of ${\bf B}$ have only three possible values, i.e. 0.5 when both parents of an animal are known, 0.75 when one of the parents is unknown, and 1 when both parents are unknown.

When animals are inbred, the elements of ${\bf B}$ can take many different values. Quaas (1976) noted that the diagonals of ${\bf B}$, say $B_{ii}$, are

\begin{displaymath}B_{ii} \mbox{ = } ( .5 - .25(F_{s}+F_{d})), \end{displaymath}

where $F_{s}$ and $F_{d}$ are the inbreeding coefficients of the sire and dam, respectively, of the $i$th individual. If one parent is unknown, then

\begin{displaymath}B_{ii} \mbox{ = } ( .75 - .25 F_{p}), \end{displaymath}

where $F_{p}$ is the inbreeding coefficient of the parent that is known. Lastly, if neither parent is known, then $B_{ii}=1$.
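The three cases above can be collected into one small function. This is only a sketch: the name `b_diagonal` and the convention of passing `None` for an unknown parent are assumptions made for illustration, not part of the original notes.

```python
from fractions import Fraction

def b_diagonal(f_sire=None, f_dam=None):
    """Return B_ii from the parents' inbreeding coefficients.

    None marks an unknown parent (hypothetical convention for this sketch).
    """
    known = [Fraction(f) for f in (f_sire, f_dam) if f is not None]
    if len(known) == 2:
        # both parents known: .5 - .25 (F_s + F_d)
        return Fraction(1, 2) - Fraction(1, 4) * sum(known)
    if len(known) == 1:
        # one parent known: .75 - .25 F_p
        return Fraction(3, 4) - Fraction(1, 4) * known[0]
    # neither parent known
    return Fraction(1)

print(b_diagonal(0, 0))               # both parents known, neither inbred
print(b_diagonal(Fraction(1, 8), 0))  # sire inbred with F = 1/8
print(b_diagonal(None, 0))            # sire unknown
print(b_diagonal())                   # both parents unknown
```

Exact fractions are used so that values such as 15/32 appear in the same form as in the worked example.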

One of the more efficient algorithms for calculating inbreeding coefficients is that of Meuwissen and Luo (1992). Animals should be in chronological order, as for the Tabular Method. To illustrate, consider the example given in the Tabular Method section. The corresponding elements of ${\bf B}$ for animals A to F would be

\begin{displaymath}\left( \begin{array}{cccccc} 1 & 1 & 1 & .5 & .5 & .5
\end{array} \right). \end{displaymath}

Now consider a new animal, G, with parents F and B. The first step is to set up three vectors, where the first vector contains the identification of animals in the pedigree of animal G, the second vector will contain the elements of the matrix ${\bf T}$, and the third vector will contain the corresponding $B_{ii}$ for each animal.

Step 1
Add animal G to the ID vector, a 1 to the T-vector, and

\begin{displaymath}B_{GG} \mbox{ = } .5 - .25 (.125+0) = 15/32 \end{displaymath}

to the B-vector, giving
ID vector T-vector B-vector
G 1 15/32
Step 2
Add the parents of G to the ID vector, and because they are one generation back, add .5 to the T-vector for each parent. In the B-vector, animal B has $B_{BB}=1$, and animal F has $B_{FF}=.5$. The vectors now appear as
ID vector T-vector B-vector
G 1 15/32
F .5 .5
B .5 1
Step 3
Add the parents of F and B to the ID vector, add .25 (.5 times the T-vector value of the individual, F or B) to the T-vector for each of them, and add their corresponding $B_{ii}$ values to the B-vector. The parents of F were E and D, and the parents of B were unknown. These give
ID vector T-vector B-vector
G 1 15/32
F .5 .5
B .5 1
E .25 .5
D .25 .5
Step 4
Add the parents of E and D to the ID vector, .125 to the T-vector, and the appropriate values to the B-vector. The parents of E were A and C, and the parents of D were A and B.
ID vector T-vector B-vector
G 1 15/32
F .5 .5
B .5 1
E .25 .5
D .25 .5
A .125 1
C .125 1
A .125 1
B .125 1
The vectors are complete because the parents of A, B, and C are unknown and no further ancestors can be added to the pedigree of animal G.
Step 5
Accumulate the values in the T-vector for each animal ID. For example, animals A and B appear twice in the ID vector. Accumulating their T-vector values gives
ID vector T-vector B-vector
G 1 15/32
F .5 .5
B .5+.125=.625 1
E .25 .5
D .25 .5
A .125+.125=.25 1
C .125 1
It is important not to do the accumulation until all pathways in the pedigree have been processed, otherwise a coefficient may be missed and the wrong inbreeding coefficient could be calculated.
Step 6
The diagonal of the ${\bf A}$ matrix for animal G is calculated as the sum of the squares of the values in the T-vector times the corresponding value in the B-vector, hence

\begin{eqnarray*}a_{GG} & = & (1)^{2}(15/32) + (.5)^{2}(.5) + (.625)^{2} \\
& & + \ (.25)^{2}(.5) + (.25)^{2}(.5) + (.25)^{2} + (.125)^{2} \\
& = & 72/64 \\
& = & 1 \frac{1}{8}
\end{eqnarray*}


The inbreeding coefficient for animal G is $F_{G} = a_{GG} - 1$, which is one-eighth.
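Steps 1 to 6 can be traced in a few lines of code. The sketch below follows the vector procedure as written in these notes, not the published implementation of Meuwissen and Luo (1992). The function name `diagonal_of_a`, the dictionary layout of the pedigree, and the use of `None` for an unknown parent are assumptions made for illustration. The B-vector values are the ones quoted in the text.

```python
from fractions import Fraction

# Pedigree of the worked example; None marks an unknown parent.
pedigree = {
    "A": (None, None), "B": (None, None), "C": (None, None),
    "D": ("A", "B"), "E": ("A", "C"), "F": ("E", "D"), "G": ("F", "B"),
}
half = Fraction(1, 2)
# B_ii values from the text (B_GG uses F_F = 1/8 and F_B = 0).
b = {"A": 1, "B": 1, "C": 1, "D": half, "E": half, "F": half,
     "G": Fraction(15, 32)}

def diagonal_of_a(animal, pedigree, b):
    # Steps 1-4: one (ID, T) entry per pathway; each parent receives
    # half of the T-value of its progeny.
    entries = [(animal, Fraction(1))]
    position = 0
    while position < len(entries):
        ident, t = entries[position]
        for parent in pedigree[ident]:
            if parent is not None:
                entries.append((parent, t / 2))
        position += 1
    # Step 5: accumulate T-values only after all pathways are listed.
    totals = {}
    for ident, t in entries:
        totals[ident] = totals.get(ident, 0) + t
    # Step 6: sum of squared T-values times the matching B-values.
    a_ii = sum(t * t * b[ident] for ident, t in totals.items())
    return a_ii, totals

a_gg, totals = diagonal_of_a("G", pedigree, b)
print("a_GG =", a_gg, " F_G =", a_gg - 1)
print("accumulated T for B and A:", totals["B"], totals["A"])
```

The list `entries` is visited in the same order as the ID vector in Step 4 (G, F, B, E, D, A, C, A, B), so the intermediate tables can be compared directly.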

The efficiency of this algorithm depends on the number of generations in each pedigree. If each pedigree is 10 generations deep, then each of the vectors above could have over 1000 elements for a single animal. To obtain greater efficiency, animals with the same parents could be processed together, and each would receive the same inbreeding coefficient, so that it only needs to be calculated once. For situations with only 3 or 4 generation pedigrees, this algorithm would be very fast and the amount of computer memory required would be low compared to other algorithms (Golden et al. (1991), Tier (1990)).

Additive Matrix

Consider the pedigrees in the table below:

Animal Sire Dam
1 - -
2 - -
3 1 -
4 1 2
5 3 4
6 1 4
7 5 6

Animals with unknown parents may or may not be selected individuals, but their parents (which are unknown) are assumed to belong to a base generation of animals, i.e. a large, random mating population of unrelated individuals. Animal 3 has one parent known and one parent unknown. Animal 3 and its sire do not belong to the base generation, but its unknown dam is assumed to belong to the base generation. If these assumptions are not valid, then the concept of phantom parent groups needs to be utilized (covered later in these notes). Using the tabular method, the ${\bf A}$ matrix for the above seven animals is given below.

  -,- -,- 1,- 1,2 3,4 1,4 5,6
  1 2 3 4 5 6 7
1 1 0 .5 .5 .5 .75 .625
2 0 1 0 .5 .25 .25 .25
3 .5 0 1 .25 .625 .375 .5
4 .5 .5 .25 1 .625 .75 .6875
5 .5 .25 .625 .625 1.125 .5625 .84375
6 .75 .25 .375 .75 .5625 1.25 .90625
7 .625 .25 .5 .6875 .84375 .90625 1.28125
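The table above can be reproduced with a short version of the tabular method. This is a sketch under two assumptions that are not in the original notes: unknown parents are coded as 0, and a dummy row and column of zeros at index 0 stands in for them. The function name `tabular_a` is hypothetical.

```python
from fractions import Fraction

# (sire, dam) for each animal; 0 marks an unknown parent.
# Animals are numbered so that parents precede their progeny.
pedigree = {1: (0, 0), 2: (0, 0), 3: (1, 0), 4: (1, 2),
            5: (3, 4), 6: (1, 4), 7: (5, 6)}

def tabular_a(pedigree):
    n = len(pedigree)
    # Row and column 0 stay at zero and represent unknown parents.
    a = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, d = pedigree[i]
        for j in range(1, i):
            # relationship with an older animal: average over i's parents
            a[i][j] = a[j][i] = (a[j][s] + a[j][d]) / 2
        # diagonal: one plus half the relationship between the parents
        a[i][i] = 1 + a[s][d] / 2
    return a

a = tabular_a(pedigree)
for row in a[1:]:
    print([float(x) for x in row[1:]])
```

The inbreeding coefficient of animal $i$ is then the diagonal element minus one, for example `a[7][7] - 1` for animal 7.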

Now partition ${\bf A}$ into ${\bf T}$ and ${\bf B}$ giving:

Sire Dam Animal 1 2 3 4 5 6 7 ${\bf B}$
- - 1 1 0 0 0 0 0 0 1.0
- - 2 0 1 0 0 0 0 0 1.0
1 - 3 .5 0 1 0 0 0 0 .75
1 2 4 .5 .5 0 1 0 0 0 .50
3 4 5 .5 .25 .5 .5 1 0 0 .50
1 4 6 .75 .25 0 .5 0 1 0 .50
5 6 7 .625 .25 .25 .5 .5 .5 1 .40625
Note that the rows of ${\bf T}$ account for the direct relationships, that is, the direct transfer of genes from parents to offspring.
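The partition can be checked numerically by following the route described at the start of this section: take the Cholesky factor ${\bf L}$ of ${\bf A}$, square its diagonal to obtain ${\bf B}$, and divide each column of ${\bf L}$ by its diagonal element to obtain ${\bf T}$. The sketch below uses a plain Cholesky routine written out by hand; the function name `cholesky` and the variable names are assumptions for illustration.

```python
import math

A = [
    [1, 0, .5, .5, .5, .75, .625],
    [0, 1, 0, .5, .25, .25, .25],
    [.5, 0, 1, .25, .625, .375, .5],
    [.5, .5, .25, 1, .625, .75, .6875],
    [.5, .25, .625, .625, 1.125, .5625, .84375],
    [.75, .25, .375, .75, .5625, 1.25, .90625],
    [.625, .25, .5, .6875, .84375, .90625, 1.28125],
]

def cholesky(a):
    """Lower triangular factor with a = low * low' (a positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

n = len(A)
L = cholesky(A)
# L = T B^.5, so B holds the squared diagonal of L ...
B = [L[i][i] ** 2 for i in range(n)]
# ... and T is L with each column scaled to a unit diagonal.
T = [[L[i][j] / L[j][j] for j in range(n)] for i in range(n)]

print([round(x, 5) for x in B])
print([round(x, 5) for x in T[6]])
```

Because square roots are taken in floating point, the results agree with the table only up to rounding error.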

The breeding value of any animal with both parents known can be written as a linear function of the breeding values of animals with unknown parents in the list of pedigrees, plus Mendelian sampling effects. Take animal 6 as an example,

\begin{eqnarray*}a_{6} & = & .5 \ a_{1} \ + \ .5 \ a_{4} \ + \ m_{6} \\
& = & .5 \ a_{1} \ + \ .5 \ ( .5 \ a_{1} \ + \ .5 \ a_{2} \ + \ m_{4} ) \ + \ m_{6} \\
& = & .75 \ a_{1} \ + \ .25 \ a_{2} \ + \ .5 \ m_{4} \ + \ m_{6}
\end{eqnarray*}


where $m_{i}$ is the Mendelian sampling term associated with animal $i$. The variance of Mendelian sampling terms is not affected by non random mating, and if the unknown parents can be assumed to be members of the base population, then the variance of breeding values in the base population is also not affected by later selection. Therefore,

\begin{eqnarray*}Var(a_{6}) & = & \frac{9}{16}Var(a_{1}) + \frac{1}{16}Var(a_{2}) + \frac{1}{4}Var(m_{4}) + Var(m_{6}) \\
& = & ( \frac{9}{16} + \frac{1}{16} + \frac{1}{8} + \frac{1}{2})\sigma^{2}_{a} \\
& = & 1.25 \ \sigma^{2}_{a}.
\end{eqnarray*}


The 1.25 is due to inbreeding of animal 6, but please note that there is no effect of non random mating (i.e. joint disequilibrium) on the variance of animal 6's breeding value.
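The same coefficient can be read off from row 6 of ${\bf T}$ and the diagonal of ${\bf B}$, since the diagonal of ${\bf TBT'}$ is the sum of squared row elements weighted by ${\bf B}$. A small arithmetic check, with variable names chosen only for this sketch:

```python
from fractions import Fraction

# Row 6 of T and the diagonal of B, copied from the table for the seven animals.
t_row_6 = [Fraction(3, 4), Fraction(1, 4), 0, Fraction(1, 2), 0, 1, 0]
b_diag = [1, 1, Fraction(3, 4), Fraction(1, 2), Fraction(1, 2),
          Fraction(1, 2), Fraction(13, 32)]

# Coefficient of sigma^2_a in Var(a_6).
var_coefficient = sum(t * t * b for t, b in zip(t_row_6, b_diag))
print(var_coefficient, float(var_coefficient))
```

The nonzero terms are 9/16, 1/16, 1/8 and 1/2, matching the four terms of the variance expression.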

Take another example using animal 3:

\begin{displaymath}a_{3} = .5 \ a_{1} \ + \ .5 \ a_{-} \ + \ m_{3}. \end{displaymath}

The one unknown parent seems to cause a problem. If the unknown parent is from the base population and is independent of the other parent (animal 1), then the variance of $a_{3}$ is not affected. On the other hand, if the unknown parent is a progeny of previous non random matings among animals in the base population, then there could be a non zero covariance between $a_{1}$ and $a_{-}$. Most pedigree files contain many recently born animals with one or both parents unknown, often after many years of non random mating prior to their birth, and in that case it is invalid to assume that the unknown parents are from the base population.

The following conclusion can be made and it is important to keep in mind the provisions under which it is true. The additive genetic numerator relationship matrix properly accounts for the variances of breeding values under non random mating provided that both parents for all animals are known, except for parents of animals originating from the base population.

The Inverse of Relationship Matrix

The inverse of the additive genetic numerator relationship matrix is needed for the prediction of breeding values in some methods that will be studied. Henderson (1975) made an important (landmark) discovery that allows the inverse of this matrix to be calculated very quickly from a list of animals and their parents. This discovery allowed animal models to be applied to large populations of animals for genetic evaluation.

Recall that the relationship matrix could be written as the product of triangular matrices and a diagonal matrix as

\begin{displaymath}{\bf A} = {\bf TBT}', \end{displaymath}

and then the inverse is

\begin{displaymath}{\bf A}^{-1} = {\bf T}'^{-1}{\bf B}^{-1}{\bf T}^{-1}. \end{displaymath}

Henderson (1976) showed that the diagonals of ${\bf T}^{-1}$ were all equal to one, and that in the ith row the elements were zero except for the positions corresponding to the parents of the ith animal, which were equal to -.5. From Quaas (1976), the diagonal elements of ${\bf B}$ are

\begin{displaymath}b_{ii} = (.5 - .25 ( F_{s} \ + \ F_{d})) \end{displaymath}

when both parents are known, or

\begin{displaymath}b_{ii} = (.75 - .25 \ F_{p}) \end{displaymath}

when only one parent is known, or $b_{ii}=1$ when both parents are unknown, and where $F_{s}$ and $F_{d}$ are the inbreeding coefficients of the sire and dam, respectively, and $F_{p}$ is the inbreeding coefficient of the known parent. The algorithm of Meuwissen and Luo (1992) can be used to determine $b_{ii}$ readily for all animals.

If none of the animals are inbred, then $b_{ii}$ can have only three possible values, i.e. .5, .75, or 1.0.

The inverse of the relationship matrix can be constructed very readily by a set of easy rules. Recall the previous example of seven animals with the following values for $b_{ii}$.

Animal Sire Dam $b_{ii}$ $b_{ii}^{-1}$
1 - - 1.00 1.00
2 - - 1.00 1.00
3 1 - 0.75 1.33333
4 1 2 0.50 2.00
5 3 4 0.50 2.00
6 1 4 0.50 2.00
7 5 6 0.40625 2.4615385
Let $\delta = b_{ii}^{-1}$, then if both parents are known the following constants are added to the appropriate elements in the inverse matrix:
  animal sire dam
animal $\delta$ $-.5 \delta$ $-.5 \delta$
sire $-.5 \delta$ $.25 \delta$ $.25 \delta$
dam $-.5 \delta$ $.25 \delta$ $.25 \delta$
If one parent is unknown, then delete the appropriate row and column from the rules above, and if both parents are unknown then just add $\delta$ to the animal's diagonal element of the inverse.

Each animal in the pedigree is processed one at a time, and the animals can be taken in any order. Let's start with animal 6 as an example. The sire is animal 1 and the dam is animal 4. In this case, $\delta = 2.0$. Following the rules and starting with an inverse matrix that is empty, after handling animal 6 the inverse matrix should appear as follows:

  1 2 3 4 5 6 7
1 .5 0 0 .5 0 -1 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
4 .5 0 0 .5 0 -1 0
5 0 0 0 0 0 0 0
6 -1 0 0 -1 0 2 0
7 0 0 0 0 0 0 0

After processing all of the animals, then the inverse of the relationship matrix for these seven animals should be as follows:

  1 2 3 4 5 6 7
1 2.33333 .5 -.66667 -.5 0 -1 0
2 .5 1.5 0 -1.00000 0 0 0
3 -.66667 0 1.83333 .5 -1 0 0
4 -.5 -1 .5 3.0000 -1 -1 0
5 0 0 -1 -1 2.61538 .61538 -1.23077
6 -1 0 0 -1 .61538 2.61538 -1.23077
7 0 0 0 0 -1.23077 -1.23077 2.46154

The reader should verify that the product of the above matrix with the original relationship matrix, ${\bf A}$, gives an identity matrix.
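The rules and the suggested verification can both be sketched in code. The function name `inverse_by_rules` and the coding of unknown parents as 0 are assumptions made for illustration. The $b_{ii}$ values and the matrix ${\bf A}$ are copied from the tables in these notes. Giving the animal a weight of 1 and each known parent a weight of -.5, the contribution to any pair of positions is the product of the two weights times $\delta$, which reproduces the table of constants.

```python
from fractions import Fraction

# (sire, dam) for each animal; 0 marks an unknown parent.
pedigree = {1: (0, 0), 2: (0, 0), 3: (1, 0), 4: (1, 2),
            5: (3, 4), 6: (1, 4), 7: (5, 6)}
b = {1: 1.0, 2: 1.0, 3: 0.75, 4: 0.5, 5: 0.5, 6: 0.5, 7: 0.40625}

def inverse_by_rules(pedigree, b):
    n = len(pedigree)
    ainv = [[Fraction(0)] * n for _ in range(n)]
    for animal, parents in pedigree.items():
        delta = 1 / Fraction(b[animal])
        # weight 1 for the animal, -1/2 for each known parent;
        # unknown parents are simply left out (row and column deleted)
        members = [(animal, Fraction(1))]
        members += [(p, Fraction(-1, 2)) for p in parents if p != 0]
        for i, wi in members:
            for j, wj in members:
                ainv[i - 1][j - 1] += wi * wj * delta
    return ainv

A = [
    [1, 0, .5, .5, .5, .75, .625],
    [0, 1, 0, .5, .25, .25, .25],
    [.5, 0, 1, .25, .625, .375, .5],
    [.5, .5, .25, 1, .625, .75, .6875],
    [.5, .25, .625, .625, 1.125, .5625, .84375],
    [.75, .25, .375, .75, .5625, 1.25, .90625],
    [.625, .25, .5, .6875, .84375, .90625, 1.28125],
]

ainv = inverse_by_rules(pedigree, b)
n = len(A)
# Exact product of the inverse with A, using fractions throughout.
product = [[sum(ainv[i][k] * Fraction(A[k][j]) for k in range(n))
            for j in range(n)] for i in range(n)]
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

for row in ainv:
    print([round(float(x), 5) for x in row])
print("product is the identity:", product == identity)
```

All entries of ${\bf A}$ and all $b_{ii}$ in this example are exact binary fractions, so converting them with `Fraction` loses nothing and the identity check is exact.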

Phantom Parents

In situations where unknown parents could have resulted from non random matings or prior selection of some type, then it is not appropriate to assume that they belong to the base population. However, when their identity is unknown assumptions are still needed. Westell (1984) and Robinson (1986) assigned phantom parents in place of real parents, or today one could think of them as virtual parents. Each phantom parent is assumed to have only one progeny and all phantom parents are assumed to be unrelated to all other real or phantom animals.

The next assumption was that phantom parents of animals that were born in a particular time period probably underwent the same degree of selection intensity, but perhaps differently for phantom sires versus phantom dams. Thus, the phantom parents were assigned to phantom parent genetic groups depending on whether they were sires or dams and on the year of birth of their (real and only) progeny. In application, genetic groups may also be formed depending on breed composition and/or regions within a country. The basis for further groups depends on the belief in the existence of different selection intensities involved in arriving at those particular phantom parents.

Phantom parent genetic groups are best handled by considering them as additional animals in the pedigree. Then the inverse of the relationship matrix can be constructed using the same rules as before. These results are due to Quaas (1984). To illustrate, use the same seven animals as before. Assign the unknown sires of animals 1 and 2 to genetic group 1 (G1) and the unknown dams to genetic group 2 (G2). Assign the unknown dam of animal 3 to genetic group 3 (G3). The resulting matrix will be of order 10 by 10 :

\begin{displaymath}{\bf A}^{-1}_{*} = \left( \begin{array}{cc}
{\bf A}^{rr} & {\bf A}^{rp} \\
{\bf A}^{pr} & {\bf A}^{pp} \end{array} \right), \end{displaymath}

where ${\bf A}^{rr}$ is a 7 by 7 matrix corresponding to the elements among the real animals; ${\bf A}^{rp}$ and its transpose are of order 7 by 3 and 3 by 7, respectively, corresponding to elements of the inverse between real animals and genetic groups, and ${\bf A}^{pp}$ is of order 3 by 3 and contains inverse elements corresponding to genetic groups. ${\bf A}^{rr}$ will be exactly the same as ${\bf A}^{-1}$ given in the previous section. The other matrices are

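\begin{eqnarray*}{\bf A}^{rp} & = & \left( \begin{array}{rrr}
-.5 & -.5 & .33333 \\ -.5 & -.5 & 0 \\ 0 & 0 & -.66667 \\
0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0
\end{array} \right), \\
{\bf A}^{pp} & = & \left( \begin{array}{rrr}
.5 & .5 & 0 \\ .5 & .5 & 0 \\ 0 & 0 & .33333
\end{array} \right)
\end{eqnarray*}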


In this formulation, genetic groups (according to Quaas (1984)) are additional fixed factors and there is a rank dependency between genetic groups 1 and 2. This singularity can cause problems in deriving solutions for genetic evaluation. The dependency can be removed by adding an identity matrix to ${\bf A}^{pp}$. When genetic groups have many animals assigned to them, then adding the identity matrix to ${\bf A}^{pp}$ does not result in any significant re-ranking of animals in genetic evaluation and aids in getting faster convergence of the iterative system of equations.
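The blocks ${\bf A}^{rp}$ and ${\bf A}^{pp}$, and the dependency between groups 1 and 2, can be examined by treating the groups as extra parents in the same accumulation rules. This is a sketch under stated assumptions: the labels `"G1"`, `"G2"`, `"G3"` and the ordering of rows (seven real animals followed by three groups) are chosen for illustration, the $\delta$ values are the same $b_{ii}^{-1}$ as in the previous section, and the groups contribute no rows of their own as animals.

```python
from fractions import Fraction

labels = [1, 2, 3, 4, 5, 6, 7, "G1", "G2", "G3"]
index = {label: k for k, label in enumerate(labels)}

# Unknown parents replaced by their phantom parent groups.
pedigree = {1: ("G1", "G2"), 2: ("G1", "G2"), 3: (1, "G3"), 4: (1, 2),
            5: (3, 4), 6: (1, 4), 7: (5, 6)}
b = {1: 1.0, 2: 1.0, 3: 0.75, 4: 0.5, 5: 0.5, 6: 0.5, 7: 0.40625}

size = len(labels)
ainv = [[Fraction(0)] * size for _ in range(size)]
for animal, parents in pedigree.items():
    delta = 1 / Fraction(b[animal])
    # weight 1 for the animal, -1/2 for each parent (real or group)
    members = [(animal, Fraction(1))] + [(p, Fraction(-1, 2)) for p in parents]
    for x, wx in members:
        for y, wy in members:
            ainv[index[x]][index[y]] += wx * wy * delta

a_rp = [row[7:] for row in ainv[:7]]   # real animals by groups
a_pp = [row[7:] for row in ainv[7:]]   # groups by groups

print("rows for G1 and G2 identical:", ainv[7] == ainv[8])

# Adding an identity matrix to the group block removes the dependency.
for g in range(7, size):
    ainv[g][g] += 1
print("still identical after adding I:", ainv[7] == ainv[8])
```

Identical rows for G1 and G2 arise here because both groups are assigned to exactly the same animals (1 and 2), so their contributions cannot be separated.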

There is another potential problem with phantom parent genetic groups, and that is in the variance of breeding values of all animals. Take animal 4 from the example, and represent its breeding value as

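\begin{eqnarray*}a_{4} & = & .5 \ a_{1} \ + \ .5 \ a_{2} \ + \ m_{4} \\
& = & .5 \ ( .5 \ G1 \ + \ .5 \ G2 \ + \ m_{1} ) \ + \ .5 \ ( .5 \ G1 \ + \ .5 \ G2 \ + \ m_{2} ) \ + \ m_{4} \\
& = & .5 \ G1 \ + \ .5 \ G2 \ + \ .5 m_{1} \ + \ .5 m_{2}
\ + \ m_{4}
\end{eqnarray*}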


If G1 and G2 are fixed quantities, then they have no variance, and

\begin{displaymath}Var(a_{4}) = \sigma^{2}_{a}. \end{displaymath}

This assumes that $m_{1}$ and $m_{2}$ are Mendelian sampling effects, but they will also include errors involved in the estimation of the G1 and G2 effects, so that

\begin{displaymath}Var(m_{1}) = .5 \ \sigma^{2}_{a} + c \ \sigma^{2}_{e} . \end{displaymath}

In some cases the variance of an animal's breeding value could actually be less than $\sigma^{2}_{a}$. These issues will be discussed further in topics on genetic evaluation.

Phantom parent genetic groups are used in many genetic evaluation systems today. The phantom parents that are assigned to a genetic group are assumed to be the outcome of non random mating and similar selection differentials on their parents. This assumption, while limiting, is not as severe as assuming that all phantom parents belong to one base population. The effects on variances of breeding values need to be explored further.

Other Genetic Matrices

Dominance Genetic

Let ${\bf D}$ represent the matrix of dominance genetic relationships among animals. ${\bf D}$ may be constructed from the gametic relationship matrix given earlier. This may or may not be a simple problem. Usually, researchers have only looked at dominance relationships within herds of dairy cattle, for example, and have ignored dominance relationships between herds. Other researchers have changed the model to include a sire-dam interaction and a sire by maternal grandsire interaction (J. Dairy Sci. 74:557), but the procedure only works for noninbred populations.

The problem is that while it may be possible to construct a general ${\bf D}$ with some effort, no simple method has been discovered for obtaining ${\bf D}^{-1}$ as easily as ${\bf A}^{-1}$. Jamrozik and Schaeffer looked at possibly using the full gametic relationship matrix to account for both additive and dominance relationships simultaneously, but further work is needed.

Additive by Additive Genetic

VanRaden and Hoeschele (1991) published a method to invert the epistatic matrix for additive by additive genetic effects (J. Dairy Sci. 74:570). The additive by additive genetic relationship matrix is given by

\begin{displaymath}{\bf A} \# {\bf A} = \{ a^{2}_{ij} \}. \end{displaymath}

The reader is directed to the VanRaden and Hoeschele (1991) paper for details on the inverse of this matrix.
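Forming the additive by additive relationship matrix itself is simple, since it only squares each element of ${\bf A}$. It is the inverse that requires the method in the cited paper. A sketch using the seven-animal ${\bf A}$ from the earlier section, with the variable name `A_by_A` chosen for illustration:

```python
A = [
    [1, 0, .5, .5, .5, .75, .625],
    [0, 1, 0, .5, .25, .25, .25],
    [.5, 0, 1, .25, .625, .375, .5],
    [.5, .5, .25, 1, .625, .75, .6875],
    [.5, .25, .625, .625, 1.125, .5625, .84375],
    [.75, .25, .375, .75, .5625, 1.25, .90625],
    [.625, .25, .5, .6875, .84375, .90625, 1.28125],
]

# Element-by-element (Hadamard) product of A with itself.
A_by_A = [[a * a for a in row] for row in A]

print(A_by_A[4][5])   # squared relationship between animals 5 and 6
print(A_by_A[5][5])   # diagonal element for animal 6
```

Note that diagonal elements exceed one for inbred animals, for example $(1.25)^{2}$ for animal 6.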

Other Epistatic Effects

Except for possibly small experimental situations, there have not been any attempts to go beyond dominance genetic and additive by additive genetic effects. Part of the reason is that estimates of dominance genetic variances have been small in many cases (but not all), and the variances of higher order epistatic effects are expected to be even smaller, making the complicated process of estimating them less appealing. Also, the assumptions needed (in the VanRaden and Hoeschele papers, for example) require a noninbred population, and with real field data there have often been many years of selection, so inbreeding cannot be ignored. Finally, with selection and inbreeding there is joint disequilibrium which creates nonzero covariances between different types of genetic effects where previously these covariances have been assumed to be zero (i.e. joint equilibrium was assumed). Research into this area of study gets very complicated very quickly.



Larry Schaeffer
1999-02-26