next up previous


This LaTeX document is available as postscript or asAdobe PDF.

Repeated Records Animal Model
L. R. Schaeffer, March 1999

Often animals are observed more than once for a particular trait. Examples might be

Besides an animal's additive genetic value for a trait, there is a common permanent environmental (PE) effect which is a non-genetic effect that is common to all observations on the same animal. The model is written as

\begin{displaymath}{\bf y} = {\bf Xb}+\left( \begin{array}{cc} {\bf0} & {\bf Z}
...
...{0} \\
{\bf a}_{r} \end{array} \right) + {\bf Zp} + {\bf e}, \end{displaymath}

where

\begin{eqnarray*}{\bf b} & = & \mbox{ vector of fixed effects, } \\
\left( \beg...
...$ , and } \\
{\bf e} & = & \mbox{ vector of residual effects. }
\end{eqnarray*}


The matrices ${\bf X}$ and ${\bf Z}$ are design matrices that associate observations to particular levels of fixed effects and to additive genetic and PE effects. In a repeated records model, ${\bf Z}$ is not equal to an identity matrix. Also,

\begin{eqnarray*}{\bf a} \mid {\bf A},\sigma^{2}_{a} & \sim & N({\bf0},{\bf A}
...
...
{\bf0} \\ {\bf0} & {\bf I}\sigma^{2}_{p} \end{array} \right).
\end{eqnarray*}


Repeatability is a measure of the likelihood of an animal to make a similar record as a previous record, and is defined as a ratio of variances as

\begin{displaymath}r = \frac{\sigma^{2}_{a} + \sigma^{2}_{p}}
{\sigma^{2}_{a} + \sigma^{2}_{p} + \sigma^{2}_{e}}, \end{displaymath}

which is logically always going to be greater than or equal to heritability, because

\begin{displaymath}h^{2} = \frac{\sigma^{2}_{a}}
{\sigma^{2}_{a} + \sigma^{2}_{p} + \sigma^{2}_{e}}. \end{displaymath}

Simulating some records may help to understand this type of model. Let

\begin{eqnarray*}\sigma^{2}_{a} & = & 36 \\
\sigma^{2}_{p} & = & 16 \ \mbox{and} \\
\sigma^{2}_{e} & = & 48
\end{eqnarray*}


Thus,

\begin{displaymath}h^{2} = \frac{36}{36+16+48} = .36, \end{displaymath}

and

\begin{displaymath}r = \frac{36+16}{36+16+48} = .52. \end{displaymath}

We will generate records according to the following data structure.

Animal Sire Dam Year 1 Year 2 Year 3
7 1 2 $\surd$ $\surd$ $\surd$
8 3 4 $\surd$ $\surd$  
9 5 6 $\surd$   $\surd$
10 1 4   $\surd$ $\surd$
11 3 6     $\surd$
12 1 2   $\surd$  
None of the animals are inbred, so that the inverse of the additive genetic relationship matrix is


\begin{displaymath}{\bf A}^{-1} = \frac{1}{2} \left( \begin{array}{rrrrrrrrrrrr}...
... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 4
\end{array} \right). \end{displaymath}

The first step is to generate the additive genetic values of each animal, proceeding in chronological order.

Animal Parent Ave. RND (36*Bii).5 TBV
1 0.0 -2.5038 6.0 -15.0228
2 0.0 -.3490 6.0 -2.0940
3 0.0 -.2265 6.0 -1.3590
4 0.0 -.3938 6.0 -2.3628
5 0.0 1.4786 6.0 8.8716
6 0.0 2.3750 6.0 14.2500
7 -8.5584 -.8166 4.2426 -12.0229
8 -1.8609 1.0993 4.2426 2.8030
9 11.5608 1.5388 4.2426 18.0893
10 -8.6928 .0936 4.2426 -8.2957
11 6.4455 1.3805 4.2426 12.3024
12 -8.5584 -1.2754 4.2426 -13.9694

A PE effect can be generated for all 12 animals. Because relationships have no bearing on PE effects, then just generate a RND and multiply by $\sigma_{p} = 4$. These are shown in the table below. The records are generated according to the model equation,

\begin{eqnarray*}y_{ijk} & = & t_{i} + a_{j} + p_{j} + e_{ijk}, \\
& = & t_{i} + a_{j} + RND_{j}*\sigma_{p} + RND_{ijk}*\sigma_{e}.
\end{eqnarray*}


where ti is a year effect. Thus, we need three year means. Let t1=53.0, t2=59.0, and t3=65.0. Note that $\sigma_{p} = 4$, and $\sigma_{e}=6.9282$.

      Year 1 Year 2 Year 3
Animal TBV PE e1jk y1jk e2jk y2jk e3jk y3jk
1 -15.02 2.97            
2 -2.09 -9.04            
3 -1.36 4.44            
4 -2.36 -4.16            
5 8.87 -5.68            
6 14.25 6.85            
7 -12.02 1.38 -3.62 38.74 2.90 51.26 7.44 61.80
8 2.80 7.02 -15.03 47.79 2.80 71.62    
9 18.09 5.94 -6.21 70.82     6.98 96.01
10 -8.30 -5.03     10.07 55.74 -4.72 46.95
11 12.30 -1.06         9.63 85.87
12 -13.97 -2.69     3.47 45.81    

An interesting point to observe from the simulation is that the PE effects are present even for animals with only one record. Also, the same PE value is present in all records. Is this what happens in real life? Probably not. The PE effects probably accumulate as the animal ages. Something happens to the animal between the first and second records, for example, and this affects all subsequent records on that animal, but did not affect the first record.

Another assumption is that the records have a genetic correlation of one, which is true in the way that the above records were simulated because the same TBV was used for each record. However, in real life one might anticipate that the genes affecting the trait might change as the animal ages, and therefore, the genetic correlation between records could be less than unity. This is handled later when we look at multiple trait models.

HMME

We will concentrate mainly on Henderson's Mixed Model equations and BLUP for the remainder of the course. Let

\begin{displaymath}{\bf W} = \left[ \begin{array}{ccc}
{\bf X} & \left( \begin{...
...} & {\bf Z} \end{array} \right) & {\bf Z} \end{array} \right], \end{displaymath}

then

\begin{displaymath}{\bf W}'{\bf W} = \left( \begin{array}{cccc}
{\bf X}'{\bf X}...
...0} \\ {\bf Z}'{\bf y} \\ {\bf Z}'{\bf y}
\end{array} \right), \end{displaymath}

and

\begin{displaymath}\Sigma = \left( \begin{array}{cccc}
{\bf0} & {\bf0} & {\bf0}...
...{\bf0} & {\bf0} & {\bf0} & {\bf I}k_{p}
\end{array} \right), \end{displaymath}

where ${\bf A}^{ij}$ are corresponding elements of the inverse of the additive genetic relationship matrix (given earlier) partitioned according to animals without and with records. In this example, each submatrix is of order 6. Also,

\begin{displaymath}k_{a} = \sigma^{2}_{e}/\sigma^{2}_{a} = 1.33333, \ \ {\rm and}
k_{p} = \sigma^{2}_{e}/\sigma^{2}_{p} = 3. \end{displaymath}

HMME are therefore,

\begin{eqnarray*}({\bf W}'{\bf W} + \Sigma)\beta & = & {\bf W}'{\bf y} \\
({\bf...
...a}_{0} \\ \hat{\bf a}_{r} \\ \hat{\bf p}
\end{array} \right).
\end{eqnarray*}


Let a generalized inverse of the coefficient matrix be represented as

\begin{displaymath}({\bf W}'{\bf W}+\Sigma)^{-} = \left( \begin{array}{ccc}
- &...
...{\bf C}_{aa} & - \\ - & - & {\bf C}_{pp}
\end{array} \right), \end{displaymath}

where ${\bf C}_{aa}$ is of order 12 in this case, and ${\bf C}_{pp}$is of order 6.

The full HMME are too large to present here as a whole, so parts of the matrix are given as follows.

\begin{displaymath}{\bf X}'{\bf X} = \left( \begin{array}{rrr}
3 & 0 & 0 \\ 0 &...
...gin{array}{r}
157.35 \\ 224.43 \\ 290.63 \end{array} \right), \end{displaymath}


\begin{displaymath}{\bf X}'{\bf Z} = \left( \begin{array}{rrrrrr}
1 & 1 & 1 & 0...
...& 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 & 1 & 0 \end{array} \right),\end{displaymath}


\begin{displaymath}{\bf Z}'{\bf Z} = diag( 3 \ \ 2 \ \ 2 \ \ 2 \ \ 1 \ \ 1 ), \end{displaymath}

and

\begin{displaymath}{\bf Z}'{\bf y} = \left( \begin{array}{r} 151.80 \\ 119.41 \\
166.83 \\ 102.69 \\ 85.87 \\ 45.81 \end{array} \right). \end{displaymath}

The solutions for animals are given in the table below. Solutions for year effects were

\begin{eqnarray*}\hat{t}_{1} & = & 49.85, \\
\hat{t}_{2} & = & 63.83, \\
\hat{t}_{3} & = & 71.94.
\end{eqnarray*}


Animal TBV PE $\hat{\bf a}$ $\hat{\bf p}$
1 -15.02 2.97 -7.9285  
2 -2.09 -9.04 -4.4339  
3 -1.36 4.44 2.8289  
4 -2.36 -4.16 -2.6365  
5 8.87 -5.68 5.0996  
6 14.25 6.85 7.0703  
7 -12.02 1.38 -8.0155 -1.6305
8 2.80 7.02 0.9544 0.7628
9 18.09 5.94 11.1845 4.5329
10 -8.30 -5.03 -8.7771 -3.1063
11 12.30 -1.06 6.9204 1.7518
12 -13.97 -2.69 -8.7807 -2.3107
The correlation between $\hat{\bf a}$ and TBV was .964, and between $\hat{\bf p}$ and true PE was .721.

Estimation of Variances

The necessary items for REML estimation of variances are

\begin{eqnarray*}\hat{\bf a}'{\bf A}^{-1}\hat{\bf a} & = & 279.38988 \\
\hat{\b...
...({\bf y}'{\bf y} - \hat{\beta}'{\bf W}'{\bf y}) & = & 919.31608
\end{eqnarray*}


then the residual variance estimator is

\begin{eqnarray*}\hat{\sigma}^{2}_{e} & = & ({\bf y}'{\bf y} -
\hat{\beta}'{\b...
...({\bf X})) \\
& = & 919.31608 / (11 -3) \\
& = & 114.91451.
\end{eqnarray*}


For the additive genetic component,

\begin{eqnarray*}\hat{\sigma}^{2}_{a} & = & ( \hat{\bf a}'{\bf A}^{-1}\hat{\bf a...
...72.18528 ) / 12 \\
& = & 1,151.57516 / 12 \\
& = & 95.96459.
\end{eqnarray*}


For the PE component,

\begin{eqnarray*}\hat{\sigma}^{2}_{p} & = & ( \hat{\bf p}'\hat{\bf p}
+ \hat{\s...
... \ 191.550778) / 6 \\
& = & 233.396178 / 6 \\
& = & 38.8994.
\end{eqnarray*}


The new variance ratios after this iteration of REML are

\begin{eqnarray*}k_{a} & = & 114.91451/95.96459 = 1.19746784 \\
k_{p} & = & 114.91451/38.899363 = 2.9541.
\end{eqnarray*}


Remember that REML is iterative, and the new variance ratios should be used to set up a new set of HMME, and all calculations repeated until convergence is reached.

Bayesian Estimation

The procedures are similar to that given in Lesson 11. Prior distributions must be assumed for all random variables. In addition to those already stated,

\begin{displaymath}\sigma^{2}_{a} \sim v_{a}S^{2}_{a} / \chi^{2}_{v_{a}}, \end{displaymath}


\begin{displaymath}\sigma^{2}_{p} \sim v_{p}S^{2}_{p} / \chi^{2}_{v_{p}}, \end{displaymath}

and

\begin{displaymath}\sigma^{2}_{e} \sim v_{e}S^{2}_{e} / \chi^{2}_{v_{e}}. \end{displaymath}

The joint posterior distribution is formed and then each of the conditional posterior distributions are derived. As before,

\begin{displaymath}\beta_{i} \mid \beta_{-i}, \sigma^{2}_{a}, \sigma^{2}_{p},
\...
... {\bf y} \sim N(\hat{\beta}_{i}, C^{-1}_{i,i}
\sigma^{2}_{e}) \end{displaymath}

which takes care of all of the levels of factors in the model and in the MME, and the conditional posterior distributions for the variances are

\begin{displaymath}\sigma^{2}_{a} \mid {\bf b},{\bf a}, {\bf p},
\sigma^{2}_{e}...
...im
\tilde{v}_{a} \tilde{S}^{2}_{a} \chi^{-2}_{\tilde{v}_{a}} \end{displaymath}

for $\tilde{v}_{a} = q + v_{a}$, and

\begin{displaymath}\sigma^{2}_{a} = ({\bf a}'{\bf A}^{-1}{\bf a}
+ v_{a}S^{2}_{a} ) / CHI(\tilde{v}_{a}). \end{displaymath}

For the PE variance,

\begin{displaymath}\sigma^{2}_{p} \mid {\bf b},{\bf a}, {\bf p},
\sigma^{2}_{e}...
...im
\tilde{v}_{p} \tilde{S}^{2}_{p} \chi^{-2}_{\tilde{v}_{p}} \end{displaymath}

for $\tilde{v}_{p} = n_{r} + v_{a}$, and

\begin{displaymath}\sigma^{2}_{p} = ({\bf p}'{\bf p}
+ v_{p}S^{2}_{p} ) / CHI(\tilde{v}_{p}). \end{displaymath}

For the residual variance,

\begin{displaymath}\sigma^{2}_{e} \mid {\bf b},{\bf a}, {\bf p},
\sigma^{2}_{a}...
...sim
\tilde{v}_{e} \tilde{S}^{2}_{e} \chi^{-2}_{\tilde{v}_{e}} \end{displaymath}

for $\tilde{v}_{e} = N + v_{e}$, and

\begin{displaymath}\sigma^{2}_{e} = ({\bf e}'{\bf e}
+ v_{e}S^{2}_{e} ) / CHI(\tilde{v}_{e}), \end{displaymath}

and ${\bf e} = {\bf y}-{\bf Xb}-{\bf Za}_{r} -{\bf Zp}$.

Computationally, the scheme is virtually the same as with the simple animal model except for the additional variance component for PE effects. There is no need to illustrate the calculations for this model.

Reliability

In practical animal breeding, industry personnel are very interested in Estimated Breeding Values (EBV), and the main question is about its reliability or accuracy. The variance-covariance matrix of prediction errors is given by ${\bf C}_{aa} \sigma^{2}_{e}$. Reliability, R, of the ith animal is defined as

\begin{displaymath}R = ( a_{ii}\sigma^{2}_{a} - c_{ii} \sigma^{2}_{e} ) / \sigma^{2}_{a},
\end{displaymath}

where aii is the diagonal of ${\bf A}$ for animal i, and cii is the diagonal of ${\bf C}_{aa}$ for animal i. Note that this is equivalent to

R = aii - cii ka.

This procedure does not work when phantom groups are included in the formation of ${\bf A}^{-1}$ because then it is possible that

cii ka > aii

for some situations. Below is a table of the reliabilities for the twelve animals in the example analysis.

Animal $\hat{\bf a}$ cii R
1 -7.9285 .6472 .1371
2 -4.4339 .6524 .1301
3 2.8289 .6476 .1365
4 -2.6365 .6398 .1469
5 5.0996 .6807 .0924
6 7.0703 .6653 .1129
7 -8.0155 .4740 .3680
8 .9544 .5012 .3317
9 11.1845 .4985 .3353
10 -8.7771 .5180 .3093
11 6.9204 .5656 .2459
12 -8.7807 .5592 .2544

Note that animals with records have a higher reliability than animals that have only progeny. Also, animal 7 had a higher reliability because it had three records while animals 11 and 12 had only one record. The Reliability reflects the years in which the records were made and the number of contemporaries within a year, and specifically who the contemporaries actually were. Reliability also includes the fact that animals were related.

In the analysis of very large numbers of animals, the calculation of ${\bf C}_{aa}$ is virtually impossible. Thus, animal breeders have devised many ways of approximating the diagonals of ${\bf C}_{aa}$. The following method is due to Schaeffer and Jansen (1997).

Step 1
Account for contemporary group size and PE effects. Take animal 7 as an example. Animal 7's first record in year 1 was made with two other contemporaries, calculate

\begin{displaymath}d_{7} = 1 - \frac{1*1}{3} = \frac{2}{3}. \end{displaymath}

Animal 7's second and third records were made with three contemporaries each, so accumulate the following:

\begin{displaymath}d_{7} = d_{7} + (1 - \frac{1}{4}) + (1 - \frac{1}{4}) \end{displaymath}

or

\begin{displaymath}d_{7} = \frac{2}{3} + \frac{3}{4} + \frac{3}{4} = 2.1666667. \end{displaymath}

Now adjust for the fact that we must also estimate PE effects for this animal. The adjustment is

\begin{eqnarray*}d_{7} & = & d_{7} - \frac{d_{7}*d_{7}}{d_{7}+k_{p}} \\
& = & 2.16667 - \frac{4.694444}{5.1666667} \\
& = & 1.25806452.
\end{eqnarray*}


Finally, add aiika, diagonal element from ${\bf A}^{-1}k_{a}$, to give

d7 = 1.25806 + 2 (1.33333) = 3.92473.

This is done for all animals with records. Animals without records have di=ka. For animals 1 through 12 the results are

\begin{eqnarray*}d_{1} & = & 1.33333 \\
d_{2} & = & 1.33333 \\
d_{3} & = & 1.3...
... & = & 3.66667 \\
d_{11} & = & 3.26667 \\
d_{12} & = & 3.26667
\end{eqnarray*}


Step 2
Convert the above numbers into a number that would represent an equivalent number of progeny, ni, by

ni = (di - ka)/ .5 ka .

This gives

\begin{eqnarray*}n_{1} & = & 0.00000 \\
n_{2} & = & 0.00000 \\
n_{3} & = & 0.0...
... & = & 3.50000 \\
n_{11} & = & 2.90000 \\
n_{12} & = & 2.90000
\end{eqnarray*}


Step 3
Add contributions to parents. Animals must be processed from youngest to oldest. Let $\alpha = (4-h^{2})/h^{2} = 10.11111.$ The contribution to a parent is

\begin{displaymath}q_{i} = .25 \alpha t_{i} / (1 - .25 t_{i}), \end{displaymath}

where

\begin{displaymath}t_{i} = n_{i}/ (n_{i}+\alpha), \end{displaymath}

or

\begin{displaymath}q_{i} = .25 \alpha * n_{i} / (.75 n_{i} + \alpha ). \end{displaymath}

The value qi is added to the ns of the sire of i and to the nd of the dam of i. The qi values of animals 7 through 12 are given below.

Animal nei ti qi
7 3.88710 .277686 .754293
8 3.44339 .254040 .685706
9 3.44339 .254040 .685706
10 3.50000 .257143 .694657
11 2.90000 .222886 .596653
12 2.90000 .222886 .596653
For animal 7, for example, qi=.754293 is added to n1 and n2 because the parents of 7 are animals 1 and 2. Animal 1 receives contributions from animals 7, 10, and 12, or

ne1 = 0.0 + .754293 + .694657 + .596653 = 2.045603

Similarly for all parents,

\begin{eqnarray*}ne_{2} & = & 0.0 + .754293 + .596653 = 1.350946 \\
ne_{3} & = ...
...6 = .685706 \\
ne_{6} & = & 0.0 + .685706 + .596653 = 1.282359
\end{eqnarray*}


In real datasets, animals could very likely have both records and progeny.
Step 4
Set up selection index equations with progeny on the animal, on its sire, and on its dam. Assume the sire and dam are unrelated. Use ni as the number of progeny for animal i. Take animal 7 as an example with sire equal to animal 1 and dam equal to animal 2, and calculate

\begin{eqnarray*}m_{1} & = & (n_{1}-q_{7})/(n_{1}-q_{7}+\alpha) \\
& = & (2.04...
...q_{1,2} & = & \alpha t_{1,2} / (1 - t_{1,2}) \\
& = & .445958
\end{eqnarray*}


Now q1,2 is the contribution (in progeny equivalents) to the number of effective progeny for animal 7. Thus,

\begin{eqnarray*}R & = & (n_{7}+q_{1,2})/(n_{7}+q_{1,2}+\alpha) \\
& = & (3.88710 + .445958)/(4.333058 + 10.111111) \\
& = & .299987.
\end{eqnarray*}


This approximation is much less than the value of .3680 derived from the diagonal of ${\bf C}_{aa}$. The small number of records in this example could be a cause for the disagreement.

For animal 1, the parents are unknown and so

\begin{eqnarray*}R & = & n_{1}/( n_{1} + \alpha) \\
& = & 2.045603 / ( 2.045603 + 10.111111 ) \\
& = & .168269,
\end{eqnarray*}


which is greater than the value of .1371 from ${\bf C}_{aa}$and shows that approximations do not work in all cases. For animal 12,with sire equal to animal 1 and dam equal to animal 2, and calculate

\begin{eqnarray*}m_{1} & = & (n_{1}-q_{12})/(n_{1}-q_{12}+\alpha) \\
& = & (2....
...q_{1,2} & = & \alpha t_{1,2} / (1 - t_{1,2}) \\
& = & .654562
\end{eqnarray*}


Now q1,2 is the contribution (in progeny equivalents) to the number of effective progeny for animal 12. Thus,

\begin{eqnarray*}R & = & (n_{12}+q_{1,2})/(n_{12}+q_{1,2}+\alpha) \\
& = & (2.9 + .654562)/(3.554562 + 10.111111) \\
& = & .2601,
\end{eqnarray*}


which is only slightly higher than .2544 given by ${\bf C}_{aa}$.

Reliabilities are required by industry people to determine the 'official' status of an animal's EBV. The approximations that are used should be on the conservative side for safety reasons.

There may also be a method of determining approximate reliabilities by using Gibbs sampling, but not allowing the variances to change in each round. The starting values would be the solutions to the MME and the known variances. Then about 200 rounds of sampling should give a good estimate of the prediction error variance of the EBVs for each animal, which can then be used to arrive at reliability.

Selection

Often with a repeated records animal model, the records are taken over time and based on the first or second record an animal may be culled. Thus, selection could affect the mean and variance of later repeated records. This type of selection is taken into account in HMME provided that all animals have a first record before any culling has occurred and that all pedigrees are known for all animals. If these provisions are not met, then EBV and estimates of fixed effects from a repeated records analysis could be biased by culling, with the magnitude determined by the severity of the culling.


next up previous

This LaTeX document is available as postscript or asAdobe PDF.

Larry Schaeffer
1999-04-09