This LaTeX document is available as postscript or asAdobe PDF.
Assumptions of the Model
A simple animal model will be utilized in this lesson.
The equation of the model is
Genotypic Values of Animals
One way to understand a model is to generate some fictitious data that correspond to the model exactly. Below are pedigrees of 16 animals. The first four were randomly chosen from a large population and subsequently mated at random.
Animal | Sire | Dam | Animal | Sire | Dam |
1 | 0 | 0 | 9 | 5 | 6 |
2 | 0 | 0 | 10 | 7 | 8 |
3 | 0 | 0 | 11 | 5 | 8 |
4 | 0 | 0 | 12 | 7 | 6 |
5 | 1 | 2 | 13 | 1 | 6 |
6 | 3 | 4 | 14 | 3 | 8 |
7 | 3 | 2 | 15 | 5 | 4 |
8 | 1 | 4 | 16 | 7 | 8 |
The Relationship Matrix
The matrix of additive genetic relationships among the sixteen
individuals is
(times 16) given below:
One method to generate additive genetic values of the 16 animals would
be to partition
into the product of a lower triangular
matrix times its transpose, obtain a vector of 16 pseudo-random normal
deviates, and pre-multiply this vector by the lower triangular
matrix.
The Cholesky Decomposition is the procedure to compute the lower
triangular matrix.
In SAS IML, write
HALF(). The theory is
that the vector of pseudo-random normal deviates, say ,
has
By Following Pedigrees
A second and slightly more efficient method of generating true breeding values is to following the pedigree file chronologically. Sort the pedigrees so that the oldest animals appear before their progeny (as already given in the above table). Note that there are base population animals with unknown parents and animals with known parents.
Base population animals, by definition,
are unrelated to each other and are non-inbred.
Let
represent the vector of base population animals,
in this case animals 1 to 4.
Assume that
The other animals, 5 to 16, with known parents, are represented
by
.
The additive genetic value of any animal in
this vector can be written as
Example
Assume heritability is 0.36, and that the variance of phenotypic
records is to be 100. Then,
,
and
.
A base population animal's additive genetic value is created
by obtaining a pseudo-random normal deviate, RND, and multiplying
it by
.
For animals 1 to 4,
Residual and Phenotypic Values
To create phenotypic records, the equation of the model must be
specified. Assume that
i | a_{i} | RND | = | e_{i} | =y_{i} | |||
5 | 50 | -5.2 | -0.7905 | *8 | = | -6.324 | = | 38.5 |
6 | 50 | 3.4 | -0.5634 | *8 | = | -4.507 | = | 48.9 |
7 | 50 | 2.9 | 1.4213 | *8 | = | 11.370 | = | 64.3 |
8 | 50 | -10.0 | 1.3096 | *8 | = | 10.476 | = | 50.5 |
9 | 50 | -2.4 | -1.4456 | *8 | = | -11.564 | = | 36.0 |
10 | 50 | 3.3 | 1.6360 | *8 | = | 13.088 | = | 66.4 |
11 | 50 | -11.4 | -1.2174 | *8 | = | -9.739 | = | 28.9 |
12 | 50 | 3.6 | 2.4276 | *8 | = | 19.420 | = | 73.0 |
13 | 50 | -0.9 | -0.6109 | *8 | = | -4.887 | = | 44.2 |
14 | 50 | -6.7 | 1.2639 | *8 | = | 10.111 | = | 53.4 |
15 | 50 | -5.4 | -1.3777 | *8 | = | -11.021 | = | 33.6 |
16 | 50 | 2.6 | -0.3930 | *8 | = | -3.144 | = | 49.5 |
The variance
of this sample of e_{i} values was 109.83, which is
greater than the population value of 64.
However, such differences can be expected in small samples. In fact, the
variance of the estimate of the sample variance is
Selection Index
Assume a single record per animal, then
Animal | TBV | I | I-TBV | ||
5 | -5.2 | -3.756 | 1.444 | -6.67733 | |
6 | 3.4 | -0.012 | -3.412 | -.02133 | |
7 | 2.9 | 5.532 | 2.632 | 9.83467 | |
8 | -10.0 | 0.564 | 10.564 | 1.00267 | |
9 | -2.4 | -4.656 | -2.256 | -8.27733 | |
10 | 3.3 | 6.288 | 2.988 | 11.17867 | |
11 | -11.4 | -7.212 | 4.188 | -12.82133 | |
12 | 3.6 | 8.664 | 5.064 | 15.40267 | |
13 | -0.9 | -1.704 | -0.804 | -3.02933 | |
14 | -6.7 | 1.608 | 8.308 | 2.85867 | |
15 | -5.4 | -5.520 | -0.120 | -9.81333 | |
16 | 2.6 | 0.204 | -2.396 | 0.36267 | |
Var. | 30.0906 | 24.4841 | 70.93335 | ||
Cov. | 17.8476 | ||||
Cor. | 0.6575 | ||||
MSE | 17.3062 |
Note that base animals were not evaluated by this index because they did not have records. Also, relationships among animals were ignored. Animal 7, for example, had a record and 3 progeny, but the progeny information was not included in the evaluation. The MSE, mean squared error, is the average squared difference between the index and TBV. The MSE criterion is often used to compare different types of estimators. Another criterion is the variance of the estimated residuals (last column). The method giving the highest correlation between estimated and true values, or the smallest MSE, or the lowest variance of estimated residuals should be the preferred method.
The selection index method, although not a bad first attempt, comes pretty close to the true breeding values. The following methods are attempts to improve upon the selection index. Generalized Least Squares
Let a general model be written as
For the animal model, then
Thus, GLS in this situation, gives estimated breeding values that are the observations deviated from the overall mean. The correlation of with TBV is 0.6575, and the ranking of animals by are identical to the results from the selection index. Thus, in terms of genetic change, exactly the same animals would be selected for breeding. Note that the MSE for the GLS estimator is much larger than for the selection index estimator. Thus, MSE is not a perfect criterion for distinguishing between two procedures.
Animal | TBV | -TBV | ||
5 | -5.2 | -10.4333 | -5.2333 | |
6 | 3.4 | -0.0333 | -3.4333 | |
7 | 2.9 | 15.3667 | 12.4667 | |
8 | -10.0 | 1.5667 | 11.5667 | |
9 | -2.4 | -12.9333 | -10.5333 | |
10 | 3.3 | 17.4667 | 14.1667 | |
11 | -11.4 | -20.0667 | -8.6333 | |
12 | 3.6 | 24.0667 | 20.4667 | |
13 | -0.9 | -4.7333 | -3.8333 | |
14 | -6.7 | 4.4667 | 11.1667 | |
15 | -5.4 | -15.3333 | -9.9333 | |
16 | 2.6 | 0.5667 | -2.0333 | |
Var. | 30.0906 | 188.9206 | ||
Cov. | 49.5766 | |||
Cor. | 0.6575 | |||
MSE | 109.8697 |
The estimated residuals in this case are all equal to zero, and therefore, the variance of the estimated residuals is also zero. However, this does not imply that GLS is the best method because the error is actually becoming part of the estimated breeding value. Regressed Least Squares
This method consists of 'shrinking' the GLS estimator,
,
towards its expected value. If animals were unrelated, then the
shrunken estimator would be
which is identical to the
selection index estimator. However, because animals are indeed
related, the procedure is formally written as
For the animal model example, then
Animal | TBV | I | RLS | ||
5 | -5.2 | -3.76 | -7.17 | -3.2633 | |
6 | 3.4 | -0.01 | 2.11 | -2.1433 | |
7 | 2.9 | 5.53 | 8.41 | 6.9567 | |
8 | -10.0 | 0.56 | -2.19 | 3.7567 | |
9 | -2.4 | -4.66 | -4.81 | -8.1233 | |
10 | 3.3 | 6.29 | 6.26 | 11.2067 | |
11 | -11.4 | -7.21 | -8.07 | -11.9633 | |
12 | 3.6 | 8.66 | 9.39 | 14.6767 | |
13 | -0.9 | -1.70 | -2.08 | -2.6533 | |
14 | -6.7 | 1.61 | 2.40 | 2.0667 | |
15 | -5.4 | -5.52 | -6.76 | -8.5733 | |
16 | 2.6 | 0.20 | 2.55 | -1.9833 | |
Cor. | 0.66 | 0.75 | 59.7168 | ||
MSE | 17.31 | 20.18 |
The RLS estimator gave a higher correlation than selection index, but the MSE was also greater. The variance of estimated residuals was smaller than for selection index. Best Linear Unbiased Prediction
Prediction refers to the estimation of the realized value of a random
variable (from data) that has been sampled from a population with a
known variance-covariance structure.
The general mixed model is written as
The elements of are considered to be fixed effects while the elements of are random factors from populations of random effects with known variance-covariance structures. Both and may be partitioned into one or more factors depending on the situation.
The expectations of the random variables are
The prediction problem involves both and . A few definitions are needed.
The derivation of BLUP begins by equating the expectations of the
predictor and the predictand to determine what needs to be true
in order for unbiasedness to hold. That is,
Minimization of the diagonals of
is achieved by differentiating
with respect to the unknowns,
and ,
and equating the partial derivatives to null matrices.
Let
Derivation of MME
Take the first and second partial derivatives of , the variance-covariance matrix of prediction errors plus the LaGrange Multiplier to force unbiasedness, and write them in matrix notation as
These final equations are known as Henderson's Mixed Model Equations or HMME. Notice that these equations are of order equal to the number of elements in and , which is usually less than the number of elements in , and therefore, are more practical to obtain than the original BLUP formulation. Also, these equations require the inverse of rather than , both of which are of the same order, but is usually diagonal or has a more simple structure than . Also, the inverse of is needed, which is of order equal to the number of elements in . The ability to compute the inverse of depends on the model and the definition of .
The variances of the predictors and prediction errors can be expressed in
terms of the generalized inverse of the coefficient matrix of
HMME. Recall that
Without loss of generality, assume that the coefficient matrix of
HMME is full rank (to
simplify the presentation of results), then
As the number of observations in the analysis increases, two things can be noted from these results:
Application to Example Data
For the simple animal model, HMME can be formed by noting that
Animal | TBV | -TBV | |||
1 | -10.8 | -4.74 | 6.06 | ||
2 | 3.9 | 0.37 | -3.53 | ||
3 | 4.8 | 5.86 | 1.06 | ||
4 | -2.5 | -1.50 | 1.00 | ||
5 | -5.2 | -7.14 | -1.94 | -3.2486 | |
6 | 3.4 | 2.14 | -1.26 | -2.1286 | |
7 | 2.9 | 8.44 | 5.54 | 6.9714 | |
8 | -10.0 | -2.16 | 7.84 | 3.7714 | |
9 | -2.4 | -4.78 | -2.38 | -8.1086 | |
10 | 3.3 | 6.30 | 3.00 | 11.2114 | |
11 | -11.4 | -8.04 | 3.36 | -11.9486 | |
12 | 3.6 | 9.42 | 5.82 | 14.6914 | |
13 | -0.9 | -2.04 | -1.14 | -2.6486 | |
14 | -6.7 | 2.44 | 9.14 | 2.0714 | |
15 | -5.4 | -6.73 | -1.33 | -8.5586 | |
16 | 2.6 | 2.59 | -0.01 | -1.9786 | |
Var(5-16) | 30.0906 | 37.2250 | 59.7024 | ||
Var(1-16) | 32.6700 | 31.2449 | |||
Cov(5-16) | 25.2577 | ||||
Cov(1-16) | 24.1477 | ||||
Cor(5-16) | 0.7547 | ||||
Cor(1-16) | 0.7558 | ||||
MSE(5-16) | 20.3286 | ||||
MSE(1-16) | 18.4532 |
Thus, BLUP analysis of the animal model gave a higher correlation with TBV than the simple selection index or the GLS estimator, and the result was very similar to the regressed least squares estimator except that the MSE was slightly higher for BLUP, but the variance of the estimated residuals was slightly smaller for BLUP. The BLUP analysis provided estimators for the base generation animals, and utilized the relationships among all animals in the data.
Partitioning BLUP Solutions
Partitioning solutions from HMME became useful in explaining results from BLUP to dairy breeders in terms that they could understand. Partitioning also helps the researcher to understand what is contributing to the solutions.
From HMME, animal solutions have contributions from three basic
sources of information, i.e. their own records, their parent average
breeding value, and the average of their progeny EBV deviated from
one half the mate's EBV. Take, for example, the equation for
animal 7 in the example. The equation is
From this small example, the weight on the parent average, PA, w_{2}, is greater than the weights on either the animal's own record or on the average of 3 progeny, even at a heritability of .36. Dairy producers believe that w_{2} should be smaller than either w_{1} or w_{3}, even though this could lead to a lower correlation between TBV and EBV. Below is a table of the weights for animal 7 if the heritability were different values.
h^{2} | (n+2k+.5pk) | w_{1} | w_{2} | w_{3} |
Data | PA | Progeny | ||
0.10 | 32.5 | .03077 | .55385 | .41538 |
0.20 | 15.0 | .06667 | .53333 | .40000 |
0.36 | 7.2222 | .13846 | .49231 | .36923 |
0.50 | 4.5 | .22222 | .44444 | .33333 |
As the heritability increases, the weight on an animal's own records increases, the weight on the parent average decreases, and the weight on the average of three progeny decreases. However, in all cases the weight on the parent average is still greater than the other two weights. In general, at least 4 progeny are needed to have the same weight on the parent average as on the progeny average. Thus, the parent average is equivalent to the average of four progeny, in terms of importance. At the same time, an animal needs records to have more importance than the parent average. At h^{2}=0.5, this means that n must be greater than 2, but at h^{2}=0.1, n must be greater than 18.
If animal 7 had 100 progeny, n=1, and h^{2}=0.36, then w_{1}=0.01070, w_{2}=0.03805, and w_{3}=0.95125, so that the progeny average becomes the most important piece of information in the long run. This is because the progeny average represents what an animal transmits to its offspring, while an animal's own record can be affected by many factors, and the parent average also does not have much relevance when an animal has many progeny. Robertson and Dempfle
This methodology was first given by Alan Robertson (1955) and was
later revived by Leo Dempfle (1989) in a symposium in honour of
Alan Robertson.
The method is Bayesian in philosophy and consists of combining
prior information about
with information coming
from the data.
Let
Application to Example Data
For the animal model example, the RMME would be
More realistically, the expected values of the base population animals can be assumed to be zero, i.e. , but due to selection or non random mating, the expected value of offspring may not be a null vector. Assume that , which says that the offspring are below the average of the base population animals. Also, assume that nothing is known about , so that s^{-1}=0. The results from RMME under these assumptions and from HMME are given in the following table.
Animal | TBV | HMME | RMME |
5 | -5.2 | -7.14 | -9.14 |
6 | 3.4 | 2.14 | 0.14 |
7 | 2.9 | 8.44 | 6.44 |
8 | -10.0 | -2.16 | -4.16 |
9 | -2.4 | -4.78 | -6.78 |
10 | 3.3 | 6.30 | 4.30 |
11 | -11.4 | -8.04 | -10.04 |
12 | 3.6 | 9.42 | 7.42 |
13 | -0.9 | -2.04 | -4.04 |
14 | -6.7 | 2.44 | 0.44 |
15 | -5.4 | -6.73 | -8.73 |
16 | 2.6 | 2.59 | 0.59 |
Cor. | 0.75 | 0.75 | |
MSE | 20.33 | 15.45 |
The correlation for RMME is the same as for HMME because the solutions for RMME are exactly 2 units lower than for HMME. However, notice that the MSE is smaller for RMME, which says that the solutions for offspring are closer to their true values.
The outcome from the use of RMME is dependent upon the Prior Info that goes into RMME. With good Prior Info the results are better than if the priors are poor. If the Prior Info is in error, then RMME could be worse than HMME or other methods.
RMME have not been used very often in animal breeding research, at least in terms of using Prior Info. With selection and non random mating being more important in animal breeding, use of RMME should increase in the future because it allows a means to account for these biases.
This LaTeX document is available as postscript or asAdobe PDF.
Larry Schaeffer