This LaTeX document is available as postscript or asAdobe PDF.
1. Maximum Likelihood
Hartley and Rao (1967) described the maximum likelihood approach for the estimation of variance components.
1.1 Derivation
Assume that
is normally distributed with mean
and
variance-covariance matrix
and follows the basic linear
mixed model. Except for a constant, the log likelihood function of
is
1.2 The EM Algorithm
EM stands for Expectation Maximization. From
Searle, Casella, and McCulloch (1992) the following explanation
is given.
The procedure alternates between calculating conditional expected values
and maximizing simplified likelihoods. The actual data
are
called the incomplete data in the EM algorithm, and the complete
data are considered to be
and the unobservable random
effects,
.
If we knew the realized values of the
unobservable random effects then their variance would be the average
of their squared values, i.e.,
The steps of the EM algorithm are as follows:
The EM algorithm can be applied to REML estimation as well. Just
replace
with
and replace
with
and note that
2. Restricted Maximum Likelihood
Restricted maximum likelihood (REML), was suggested by
Thompson (1962), and was described formally by Patterson and Thompson
(1971). The term 'restricted' is perhaps not the most correct
to use to describe the method. Residual ML has also been used
because the likelihood of a set of error contrasts is
maximized rather than the likelihood of the observations.
The procedure requires that
have a multivariate normal
distribution. The method is translation invariant. The maximum
likelihood approach automatically keeps the estimator within
the allowable parameter space(i.e. zero to plus infinity),
and therefore, REML is a biased procedure.
REML was proposed because ML did not account for the
degrees of freedom in the estimation of variance components.
2.1 The Likelihood Function
The likelihood function of
is assumed to have a multivariate normal
distribution which is
Various alternative forms of L2 can be derived. Some basic results on determinants are necessary to follow the derivations.
First,
if
and
are square matrices of the same order, then
Secondly,
for a general matrix
below with
and
being square and non-singular, then
Combining the above results gives
2.2 Derivatives of the Log Likelihood
Recall that the variance-covariance matrix of y is
To derive formulas for estimating the variance components take the
derivative of L2with respect to the unknown components.
The other terms,
and
,
were simplified by Henderson (1973) to show
that they could be calculated from solutions to his Mixed Model
Equations.
To simplify these quadratic forms notice that
Now equate the derivatives to zero to obtain
Use of the above formulas is known as the EM algorithm, (Expectation
Maximization). Also note that the EM algorithm is iterative. One
must take the new estimates, reform the matrix
,
recalculate
new solutions to Henderson's MME, and obtain new estimates of the
variances. The process continues until the change in new versus old
estimates is smaller than a pre-specified amount, like .000001%.
The main problems with the EM algorithm are that convergence is very,
very slow, if it occurs and may not converge, but instead diverge.
The more components there are to estimate, the longer will be the
number of iterates to reach convergence, depending upon the starting
values in the iterative process.
Another major problem is the requirement for
which
is a submatrix of the inverse to the MME. In practice, the inverse
of the MME is not explicitly derived and solutions to MME are
obtained by Gauss-Seidel iteration. Thus, there have been many
attempts to approximate the value of
,
and most have
not been suitable.
The above problems have led to attempts to discover other computational algorithms for REML. One of those has been Derivative Free REML which is described in the next section.
2.3 Derivative Free Method
Derivative Free REML was proposed by Smith and Graser(1986) and Graser, Smith and Tier(1987) and has been expanded upon by Meyer (1987,91) who has developed a set of programs for computing estimates of variance components for a whole range of univariate and multivariate models. The description given below is a very simplified version of the method for basic understanding of the technique.
In essence, one can imagine an s dimensional array containing the values of the likelihood function for every possible set of values of the ratios of the components to the residual variance. The technique is to search this array and find the set of ratios for which the likelihood function is maximized. There is more than one way to conduct this search. Care must be taken to find the 'global' maximum rather than one of possibly many 'local' maxima. At the same time the number of likelihood evaluations to be computed must also be minimized.
One begins by looking at alternative forms of the log likelihood
function given by L2, that is,
2.4 Algorithms to Maximize a Nonlinear Function
Hofer (1998, Variance component estimation in animal breeding: a review,
in the Journal of Animal Breeding and Genetics, 115:247-266) gives a
general description of the algorithms used with REML. A gradient
(change in some quantity as a result of a change in the parameters),
called
determines the direction of the search in the next
step. The gradient is modified by a matrix
and by a
scalar, c. The new estimate is given by
For the Newton-Raphson algorithm, evaluated at
,
For the Method of Scoring algorithm, evaluated at
,
The traces in the Newton-Raphson algorithm and in the Method of
Scoring algorithm both involve the inverse elements, in complex
forms, of the mixed model equations, which makes them difficult
to attack computationally. Johnson and Thompson (1995) suggested
averaging the observed and expected information matrices and
coined the term Average Information REML, or AI REML. They defined
as the inverse of the following matrix, which does
not require elements of the inverse of the mixed model equations.
Hofer (1998) compared the various algorithms for various models. The AI REML algorithm seemed to be advantageous over the other methods. DF REML was advantageous over EM REML when the number of parameters to be estimated was small, but EM REML became more efficient than DF REML when the number of parameters was large as in multiple trait models. Animal breeders, in general, need to be able to access and utilize REML programs written by experts in order to save time from writing their own programs, and to avoid the duplication of effort that would be needed to get new programs to work correctly and efficiently. However, for some models there may not be programs available to handle them, and then writing your own programs would be necessary. 3. More Genetic Models
3.1 Sire Model
For many years a simple sire model was used to estimate heritabilities. This is also known as a half-sib model. Sires are assumed to be random from the population and to be randomly mated to dams. Each dam is assumed to have only one progeny, and each progeny has only one record. In the following example, sires are also assumed to be randomly associated with herds. The data are as follows:
| Herd | Sire | Progeny |
| Record | ||
| 1 | 1 | 22 |
| 1 | 2 | 28 |
| 1 | 3 | 19 |
| 2 | 1 | 27 |
| 2 | 2 | 31 |
| 2 | 3 | 25 |
| 2 | 1 | 20 |
| 2 | 2 | 26 |
The total sum of squares,
was 5,020.
The model equation was
In matrix form,
In a sire model, the sire variance is equal to the
half-sib variance, which is equal to one quarter of the
additive genetic variance. Therefore, heritability is
given by
Let
and
.
Then
The residual variance is estimated as
3.2 Full-Sib Analysis
In this situation, both the sire and dam are known, but neither have
any records of their own. Dams are nested
within sires. There can be more than one progeny per dam, but only
one record per offspring. The equation of the model is
The model can be written in matrix notation as
Heritability in the narrow sense is estimated as four times the sire
variance divided by the phenotypic variance. If dominance genetic
effects and other maternal components are not important, then
another possible estimator is
3.3 Sire-Dam-Mendelian Sampling Model
In most animal breeding applications, sires and dams have been
selected to produce progeny. In such a case, sires and dams should
not be assumed to be random effects, but should be treated as fixed.
In the following model, sires and dams are fixed, they may have any
number of non-inbred progeny with one record each, and dams can be
mated to more than one sire.
Progeny may not appear as a sire or dam.
The equation is
3.4 Animal Models
The model commonly applied to estimation of variance components in
livestock genetics since 1989 has been an animal model.
The animal model assumes a large,
random mating population, only additive genetic effects are of major
importance, and all relationships among animals are known and
tracible to an unselected base population (somewhere in the past).
Animals may have more than one record each.
The equation of the model is
The MME for this model are
The REML procedure gives
This LaTeX document is available as postscript or asAdobe PDF.
Larry Schaeffer