next up previous


This LaTeX document is available as postscript or asAdobe PDF.

Estimation of Genetic Variances - Part 2

1. Maximum Likelihood

Hartley and Rao (1967) described the maximum likelihood approach for the estimation of variance components.

1.
ML uses the log likelihood function of ${\bf y}$.
2.
ML uses the exact same quadratic forms as REML and MIVQUE.
3.
ML is translation invariant, but biased.

1.1 Derivation

Assume that ${\bf y}$ is normally distributed with mean ${\bf Xb}$ and variance-covariance matrix ${\bf V}$ and follows the basic linear mixed model. Except for a constant, the log likelihood function of ${\bf y}$ is

\begin{displaymath}L({\bf y}) = -.5 \ \ln \mid {\bf V}\mid - .5 \ ({\bf y} - {\bf Xb})'
{\bf V}^{-1}({\bf y} - {\bf Xb}) \end{displaymath}

The derivatives of $ L({\bf y})$ with respect to ${\bf b}$ and to $\sigma_{i}^{2}$ for $i=0,1, \ \ldots \, s$ are

\begin{eqnarray*}\partial L({\bf y})/ \partial{\bf b} & = &
{\bf X}'{\bf V}^{-1...
...{\bf Xb})'
{\bf V}^{-1}{\bf V}_{i}{\bf V}^{-1}({\bf y}-{\bf Xb})
\end{eqnarray*}


Equating the derivatives to zero gives

\begin{eqnarray*}\hat{\bf b} & = & ({\bf X}'{\bf V}^{-1}{\bf X})^{-}
{\bf X}'{\...
... V}^{-1}{\bf V}_{i}{\bf V}^{-1}
({\bf y} - {\bf X}\hat{\bf b}).
\end{eqnarray*}


Recall that ${\bf Py} = {\bf V}^{-1}({\bf y} - {\bf X}\hat{\bf b})$, where ${\bf P}$ is the projection matrix, and that ${\bf V}_{i} = {\bf Z}_{i}
{\bf Z}'_{i}$, then the latter equation becomes

\begin{eqnarray*}tr[{\bf V}^{-1}{\bf Z}_{i}{\bf Z}'_{i}] & = &
{\bf y}'{\bf PV}...
...f Py} \\
& = & \hat{\bf u}'_{i}\hat{\bf u}_{i}/\sigma_{i}^{4}.
\end{eqnarray*}


The left hand can be manipulated as

\begin{eqnarray*}tr[{\bf V}^{-1}{\bf V}_{i}] & = & tr[({\bf R}^{-1} -
{\bf R}^{-...
...{\bf G}^{-1})^{-1}
{\bf Z}'{\bf R}^{-1}){\bf Z}_{i}{\bf Z}'_{i}]
\end{eqnarray*}


and if

\begin{eqnarray*}{\bf T} & = & ({\bf Z}'{\bf R}^{-1}{\bf Z} + {\bf G}^{-1})^{-1}...
...2}, \\
{\rm and} \ {\bf G} & = & \sum^{+}{\bf I}\sigma_{i}^{2},
\end{eqnarray*}


then

\begin{eqnarray*}tr[{\bf V}^{-1}{\bf V}_{i}] & = & tr({\bf Z}'_{i}
{\bf Z}_{i})...
...2} \ - \
tr({\bf Z}'_{i}{\bf ZTZ}'{\bf Z}_{i}) \sigma_{0}^{-4}.
\end{eqnarray*}


If ${\bf T}$ can be partitioned into submatrices for each random factor, then

\begin{eqnarray*}{\bf T}\sigma_{0}^{-2}({\bf Z}'{\bf Z} + \sum^{+}{\bf I}\alpha_{i})
& = & {\bf I},
\end{eqnarray*}


and

\begin{eqnarray*}{\bf T}{\bf Z}'{\bf Z}\sigma_{0}^{-2} & = & {\bf I} - {\bf T}
(\sum^{+}{\bf I}\sigma_{i}^{-2}).
\end{eqnarray*}


then

\begin{eqnarray*}tr({\bf Z}'_{i}{\bf ZTZ}'{\bf Z}_{i})\sigma_{0}^{-4} & = &
tr(...
...-2} - tr({\bf I}
- {\bf T}_{ii}\sigma_{i}^{-2})\sigma_{i}^{-2}.
\end{eqnarray*}


Finally,

\begin{eqnarray*}tr[{\bf V}^{-1}{\bf V}_{i}] & = & tr({\bf I} - {\bf T}_{ii}
\s...
...\\
& = & q_{i}\sigma_{i}^{-2} - tr{\bf T}_{ii}\sigma_{i}^{-4}.
\end{eqnarray*}


Combining results gives

\begin{displaymath}\hat{\sigma}_{i}^{2} = (\hat{\bf u}'_{i}\hat{\bf u}_{i} +
tr{\bf T}_{ii}\hat{\sigma}^{2}_{0})/q_{i} \end{displaymath}

for $i = 1, 2, \ \ldots \ ,s$, and for i=0 gives

\begin{displaymath}\hat{\sigma}_{0}^{2} = ({\bf y}'{\bf y} - \hat{\bf b}'{\bf X}'{\bf y} -
\hat{\bf u}'{\bf Z}'{\bf y})/ \ N. \end{displaymath}

1.2 The EM Algorithm

EM stands for Expectation Maximization. From Searle, Casella, and McCulloch (1992) the following explanation is given. The procedure alternates between calculating conditional expected values and maximizing simplified likelihoods. The actual data ${\bf y}$ are called the incomplete data in the EM algorithm, and the complete data are considered to be ${\bf y}$ and the unobservable random effects, ${\bf u}_{i}$. If we knew the realized values of the unobservable random effects then their variance would be the average of their squared values, i.e.,

\begin{displaymath}\hat{\sigma}^{2}_{i} = {\bf u}'_{i}{\bf u}_{i} / q_{i}. \end{displaymath}

However, in real life we do not know the realized values of the random effects.

The steps of the EM algorithm are as follows:

Step 0.
Decide on starting values for the variances and set m= 0.
Step 1.(E-step)
Calculate the conditional expectation of the sufficient statistics, conditional on the incomplete data.

\begin{eqnarray*}E({\bf u}'_{i}{\bf u}_{i} \mid {\bf y}) & = & \sigma_{i}^{4(m)}...
...
({\bf V}^{(m)})^{-1}{\bf Z}_{i}) \\
& = & \hat{t}^{(m)}_{i}
\end{eqnarray*}


Step 2.(M-step)
Maximize the likelihood of the complete data,

\begin{displaymath}\sigma^{2(m+1)}_{i} = \hat{t}^{(m)}_{i} / q_{i}, \ \ \
i=0,1,2, \ldots , s. \end{displaymath}

Step 3.
If convergence is reached, set \(
\hat{\mbox{\boldmath$\sigma$ }}
= \mbox{\boldmath$\sigma$ }^{(m+1)} \), otherwise increase m by one and return to Step 1.
This is equivalent to constructing and solving the mixed model equations with a given set of variances, $\mbox{\boldmath$\sigma^{m}$ }$, and then

\begin{eqnarray*}\sigma^{2(m+1)}_{0} & = & ({\bf y}'{\bf y} - \hat{{\bf b}}'{\bf...
...at{{\bf u}}_{i} +
\sigma^{2(m+1)}_{0} tr {\bf T}_{ii}) / q_{i}
\end{eqnarray*}


The EM algorithm can be applied to REML estimation as well. Just replace ${\bf y}$ with ${\bf K}'{\bf y}$ and replace ${\bf V}$with ${\bf K}'{\bf VK}$ and note that

\begin{eqnarray*}{\bf P} & = & {\bf K}({\bf K}'{\bf VK})^{-1}{\bf K}' \\
& = &...
...}{\bf X}({\bf X}'{\bf V}^{-1}
{\bf X})^{-}{\bf X}'{\bf V}^{-1}.
\end{eqnarray*}


Then the conditional expectation in the E-step becomes

\begin{displaymath}\hat{t}^{(m)}_{i} = \sigma^{4(m)}_{i} {\bf y}'{\bf P}^{(m)}
...
...I} - \sigma^{4(m)}_{i}{\bf Z}'_{i}
{\bf P}^{(m)}{\bf Z}_{i}]. \end{displaymath}

2. Restricted Maximum Likelihood

Restricted maximum likelihood (REML), was suggested by Thompson (1962), and was described formally by Patterson and Thompson (1971). The term 'restricted' is perhaps not the most correct to use to describe the method. Residual ML has also been used because the likelihood of a set of error contrasts is maximized rather than the likelihood of the observations. The procedure requires that ${\bf y}$ have a multivariate normal distribution. The method is translation invariant. The maximum likelihood approach automatically keeps the estimator within the allowable parameter space(i.e. zero to plus infinity), and therefore, REML is a biased procedure. REML was proposed because ML did not account for the degrees of freedom in the estimation of variance components.

2.1 The Likelihood Function

The likelihood function of ${\bf y}$ is assumed to have a multivariate normal distribution which is

\begin{displaymath}L({\bf y}) \mbox{ = } (2\pi)^{-.5N} \mid {\bf V} \mid^{-.5}
\exp(-.5({\bf y}-{\bf Xb})'{\bf V}^{-1}({\bf y}-{\bf Xb})). \end{displaymath}

In REML, however, the likelihood function of a set of error contrasts is used, denoted by \( {\bf K}'{\bf y} \), where \( {\bf K}'{\bf X} = 0 \), and ${\bf K}'$ has rank equal to $N-r({\bf X})$. That is,

\begin{displaymath}L({\bf K}'{\bf y}) = (2\pi)^{-.5(N-r({\bf X}))} \mid
{\bf K}'...
...5({\bf K}'{\bf y})'({\bf K}'
{\bf VK})^{-1}({\bf K}'{\bf y})). \end{displaymath}

The next step is to take the natural log of the likelihood function to give

\begin{displaymath}L_{1} = -.5(N-r({\bf X}))\ln(2\pi)-.5\ln \mid {\bf K}'{\bf VK}
\mid -.5{\bf y}'{\bf K}({\bf K}'{\bf VK})^{-1}{\bf K}'{\bf y}. \end{displaymath}

Notice that \(-.5(N-r({\bf X}))\ln(2\pi)\) is a constant that does not depend on the unknown variance components or factors in the model, and therefore, can be ignored. Searle (1979) showed that

\begin{displaymath}\ln \mid {\bf K}'{\bf VK} \mid \mbox{ = } \ln \mid {\bf V} \mid
+ \ln \mid {\bf X}'{\bf V}^{-1}{\bf X}
\mid \end{displaymath}

and that

\begin{displaymath}{\bf y}'{\bf K}({\bf K}'{\bf VK})^{-1}{\bf K}'{\bf y} =
{\bf...
... X}\hat{{\bf b}})'
{\bf V}^{-1}({\bf y}-{\bf X}\hat{{\bf b}}) \end{displaymath}

for any \({\bf K}'\) as long as ${\bf K}'{\bf X}={\bf0}$ and $r({\bf K}')=N-r({\bf X})$. Hence, L1 can be re-written as

\begin{displaymath}L_{2} = -.5\ln \mid {\bf V} \mid -.5\ln \mid {\bf X}'
{\bf V}...
...f X}\hat{{\bf b}})'{\bf V}^{-1}({\bf y}-{\bf X}
\hat{{\bf b}}).\end{displaymath}

Various alternative forms of L2 can be derived. Some basic results on determinants are necessary to follow the derivations.

First, if ${\bf A}$ and ${\bf D}$ are square matrices of the same order, then

\begin{displaymath}\mid {\bf AD} \mid \mbox{ = } \mid {\bf A} \mid \mbox{ } \mid
{\bf D} \mid .\end{displaymath}

This can be used to show that

\begin{eqnarray*}\mid {\bf V} \mid & = & \mid {\bf R}+{\bf ZGZ}' \mid \\
& = &...
...G}^{-1}+{\bf Z}'{\bf R}^{-1}
{\bf Z} \mid \ \mid {\bf G} \mid .
\end{eqnarray*}


Then it follows that

\begin{displaymath}\ln \mid {\bf V} \mid = \ln \mid {\bf R} \mid + \ln \mid
{\bf...
...\mid
+ \ln \mid {\bf G}^{-1}+{\bf Z}'{\bf R}^{-1}{\bf Z} \mid. \end{displaymath}

Secondly, for a general matrix below with \({\bf A}\) and \({\bf D}\) being square and non-singular, then

\begin{displaymath}\left\vert \begin{array}{rr}
{\bf A} & -{\bf B} \\ {\bf Q} & ...
...d {\bf D} \mid \mbox{ } \mid {\bf A}+{\bf BD}^{-1}{\bf Q}
\mid \end{displaymath}

which can be used to show that if \({\bf A}={\bf I}\)and \({\bf D}={\bf I}\), then

\begin{eqnarray*}\mid {\bf I}+{\bf QB} \mid & = & \mid {\bf I}+{\bf BQ} \mid \\ ...
...f B}'{\bf Q}' \mid \\
& = & \mid {\bf I}+{\bf Q}'{\bf B}' \mid
\end{eqnarray*}


Also, if \(\theta\) is a scalar constant, then

\begin{displaymath}\mid \theta {\bf A} \mid \mbox{ = } \theta^{m} \mid {\bf A} \mid \end{displaymath}

where m is the order of A. These relationships are used to show that if

\begin{displaymath}{\bf C} \mbox{ = } \left( \begin{array}{ll}
{\bf X}'{\bf R}^{...
...& {\bf Z}'{\bf R}^{-1}{\bf Z}+{\bf G}^{-1}
\end{array} \right) \end{displaymath}

then

\begin{eqnarray*}\mid {\bf C} \mid & = & \mid {\bf X}'{\bf R}^{-1}{\bf X} \mid \...
...-1}{\bf Z}
+{\bf G}^{-1})^{-1}{\bf Z}'{\bf R}^{-1}){\bf X} \mid
\end{eqnarray*}


and letting

\begin{displaymath}{\bf S} \mbox{ = } {\bf R}^{-1}-{\bf R}^{-1}
{\bf X}({\bf X}'{\bf R}^{-1}{\bf X})^{-}{\bf X}'{\bf R}^{-1}, \end{displaymath}

then

\begin{eqnarray*}\mid {\bf C} \mid & = & \mid {\bf X}'{\bf R}^{-1}{\bf X} \mid \...
...f Z}+{\bf G}^{-1} \mid \
\mid {\bf X}'{\bf V}^{-1}{\bf X} \mid .
\end{eqnarray*}


From the last equality, then

\begin{displaymath}\ln \mid {\bf X}'{\bf V}^{-1}{\bf X} \mid = \ln \mid {\bf C} \mid
- \ln \mid {\bf Z}'{\bf R}^{-1}{\bf Z}+{\bf G}^{-1} \mid. \end{displaymath}

Combining the above results gives

\begin{displaymath}L_{2} \mbox{ = } -.5\ln \mid {\bf R} \mid -.5\ln \mid {\bf G} \mid
-.5\ln \mid {\bf C} \mid -.5{\bf y}'{\bf Py} .\end{displaymath}

Now note that

\begin{eqnarray*}\ln \mid {\bf R} \mid & = & \ln \mid {\bf I} \sigma_{0}^{2} \mi...
...\sigma_{0}^{-2}({\bf Z}'{\bf MZ}+{\bf G}^{-1}
\sigma_{0}^{2}) .
\end{eqnarray*}


Then

\begin{displaymath}\ln \mid {\bf C} \mid = \ln \mid {\bf X}'{\bf X} \mid
-r({\bf...
... + \ln \mid {\bf Z}'{\bf MZ}+{\bf G}^{-1}
\sigma_{0}^{2} \mid ,\end{displaymath}

and finally, the log-likelihood function becomes

\begin{eqnarray*}L_{2} & = & -.5(N-r({\bf X})-q)\ln \sigma_{0}^{2} -.5 \sum_{i=1...
...} \\
& & -.5\ln \mid {\bf C}^{\star} \mid -.5 {\bf y}'{\bf Py},
\end{eqnarray*}


where

\begin{displaymath}{\bf C}^{\star} \mbox{ = } \left( \begin{array}{ll}
{\bf X}'{...
...\bf Z}'{\bf Z}+{\bf G}^{-1}\sigma_{0}^{2}
\end{array} \right) .\end{displaymath}

2.2 Derivatives of the Log Likelihood

Recall that the variance-covariance matrix of y is

\begin{displaymath}{\bf V} \mbox{ = } \sum_{i=1}^{s} {\bf Z}_{i}{\bf Z}'_{i}
\sigma_{i}^{2} + {\bf I}\sigma_{0}^{2}
\end{displaymath}

and a projection matrix is

\begin{displaymath}{\bf P} = {\bf V}^{-1} - {\bf V}^{-1}{\bf X} ({\bf X}'{\bf V}^{-1}
{\bf X})^{-} {\bf X}'{\bf V}^{-1} . \end{displaymath}

From Searle (1979), the following results about derivatives hold:
(1)

\begin{displaymath}\frac{\partial {\bf V}^{-1} }{\partial \sigma_{i}^{2} } \mbox...
...\frac{\partial {\bf V}}{\partial \sigma_{i}^{2} }
{\bf V}^{-1}
\end{displaymath}

(2)

\begin{displaymath}\frac{\partial \ln \mid {\bf V} \mid}{\partial \sigma_{i}^{2}...
...{-1} \frac{\partial {\bf V}}{\partial \sigma_{i}^{2} }
\right)
\end{displaymath}

(3)

\begin{displaymath}\frac{\partial {\bf P}}{\partial \sigma_{i}^{2} } \mbox{ = }
...
...bf P} \frac{\partial {\bf V}}{\partial \sigma_{i}^{2}} {\bf P} \end{displaymath}

(4)

\begin{displaymath}\frac{\partial {\bf V} }{\partial \sigma_{i}^{2} } \mbox{ = }
{\bf Z}_{i}{\bf Z}'_{i} \end{displaymath}

and
(5)

\begin{displaymath}\frac{\partial \ln \mid {\bf X}'{\bf V}^{-1}{\bf X} \mid }
{\...
...tial {\bf V} }{\partial
\sigma_{i}^{2} } {\bf V}^{-1}{\bf X}. \end{displaymath}

To derive formulas for estimating the variance components take the derivative of L2with respect to the unknown components.

\begin{eqnarray*}\frac{\partial L_{2}}{\partial \sigma_{i}^{2}} & = &
-.5 tr{\bf...
...rtial \sigma_{i}^{2}}
{\bf V}^{-1}({\bf y}-{\bf X}\hat{{\bf b}})
\end{eqnarray*}


Combine the two terms involving the traces and note that

\begin{displaymath}{\bf V}^{-1}({\bf y}-{\bf X}\hat{{\bf b}}) \mbox{ = } {\bf Py} ,\end{displaymath}

then

\begin{displaymath}\frac{\partial L_{2}}{\partial \sigma_{i}^{2}} =
-.5 tr({\bf...
...bf P}
\frac{\partial {\bf V}}{\partial \sigma_{i}^{2}}{\bf Py} \end{displaymath}


\begin{displaymath}= -.5 tr{\bf PZ}_{i}{\bf Z}'_{i} +.5{\bf y}'{\bf PZ}_{i}
{\bf Z}'_{i}{\bf Py} \end{displaymath}

for \(i = 1,\dots,s\) or

\begin{displaymath}= -.5 tr{\bf P} + .5{\bf y}'{\bf PPy} \end{displaymath}

for i=0 for the residual component. Using ${\bf P}$ and the fact that

\begin{displaymath}{\bf V}^{-1} = {\bf R}^{-1} - {\bf R}^{-1}{\bf Z}
({\bf Z}'{\bf R}^{-1}{\bf Z}+{\bf G}^{-1})^{-1}{\bf Z}'{\bf R}^{-1} \end{displaymath}

then

\begin{displaymath}tr{\bf PZ}_{i}{\bf Z}'_{i} = q_{i}/\sigma_{i}^{2} -
tr{\bf C}_{ii}\sigma_{0}^{2}/ \sigma_{i}^{4} \end{displaymath}

and

\begin{displaymath}tr{\bf P} = (N-r({\bf X}))-
\sum_{i=1}^{s}\hat{{\bf u}}'_{i}\hat{{\bf u}}_{i}/\sigma^{2}_{i}. \end{displaymath}

The other terms, ${\bf y}'{\bf PZ}_{i}{\bf Z}'_{i}{\bf Py}$ and ${\bf y}'{\bf PPy}$, were simplified by Henderson (1973) to show that they could be calculated from solutions to his Mixed Model Equations. To simplify these quadratic forms notice that

\begin{eqnarray*}{\bf Py} & = & ({\bf V}^{-1} - {\bf V}^{-1}{\bf X}({\bf X}'
{\b...
... ({\bf X}'{\bf V}^{-1}{\bf X})^{-}
{\bf X}'{\bf V}^{-1}{\bf y}.
\end{eqnarray*}


Therefore,

\begin{displaymath}{\bf y}'{\bf PZ}_{i}{\bf Z}'_{i}{\bf Py} = ({\bf y} -
{\bf X}...
...Z}_{i}{\bf Z}'_{i}
{\bf V}^{-1}({\bf y} - {\bf X}\hat{\bf b}).
\end{displaymath}

Henderson noticed that

\begin{displaymath}\hat{\bf u}_{i} = {\bf G}_{i}{\bf Z}'_{i}{\bf Py} \end{displaymath}

where ${\bf G}_{i}$ is the assumed variance covariance matrix of ${\bf u}_{i}$(in this case ${\bf G}_{i} = {\bf I} \ {\theta}_{i})$, and $\hat{\bf u}_{i}$ is BLUP of ${\bf u}_{i}$ and the solution of $\hat{\bf u}_{i}$ from his MME, then

\begin{eqnarray*}{\bf y}'{\bf PZ}_{i}{\bf Z}'_{i}{\bf Py}
& = & {\bf y}'{\bf PZ}...
... \ \
{\rm when} \ {\bf G}_{i} = {\bf I} \ {\sigma}_{i}^{2}. \\
\end{eqnarray*}


Similarly, Henderson showed that

\begin{displaymath}{\bf y}'{\bf PPy} = ({\bf y}'{\bf y} \ - \
\hat{\bf b}'{\bf...
...t{\bf u}_{i}^{'}\hat{\bf u}_{i}{\alpha}_{i})/
{\sigma}^{2}_{0}
\end{displaymath}

where ${\alpha}_{i} = {\sigma}^{2}_{0}/{\sigma}^{2}_{i}.$

Now equate the derivatives to zero to obtain

\begin{displaymath}q_{i}/\sigma_{i}^{2}- tr{\bf C}_{ii}\sigma_{0}^{2}/
\sigma_{i}^{4}
= \hat{{\bf u}}'_{i}\hat{{\bf u}}_{i}/\sigma_{i}^{4} \end{displaymath}

which leads to

\begin{displaymath}\hat{\sigma}_{i}^{2} = (\hat{{\bf u}}'_{i}\hat{{\bf u}}_{i}
+ tr{\bf C}_{ii}\sigma_{0}^{2})/q_{i} \end{displaymath}

and

\begin{displaymath}(N-r({\bf X}))\sigma_{0}^{2}-\sum_{i=1}^{s}\hat{{\bf u}}'_{i}...
...-
\sum_{i=1}^{s}\hat{{\bf u}}'_{i}\hat{{\bf u}}_{i}
\alpha_{i} \end{displaymath}

which leads to

\begin{displaymath}\hat{\sigma}_{0}^{2} = {\bf y}'{\bf Py}/(N-r({\bf X})). \end{displaymath}

Use of the above formulas is known as the EM algorithm, (Expectation Maximization). Also note that the EM algorithm is iterative. One must take the new estimates, reform the matrix ${\bf V}$, recalculate new solutions to Henderson's MME, and obtain new estimates of the variances. The process continues until the change in new versus old estimates is smaller than a pre-specified amount, like .000001%. The main problems with the EM algorithm are that convergence is very, very slow, if it occurs and may not converge, but instead diverge. The more components there are to estimate, the longer will be the number of iterates to reach convergence, depending upon the starting values in the iterative process.

Another major problem is the requirement for $tr{\bf C}_{ii}$ which is a submatrix of the inverse to the MME. In practice, the inverse of the MME is not explicitly derived and solutions to MME are obtained by Gauss-Seidel iteration. Thus, there have been many attempts to approximate the value of $tr{\bf C}_{ii}$, and most have not been suitable.

The above problems have led to attempts to discover other computational algorithms for REML. One of those has been Derivative Free REML which is described in the next section.

2.3 Derivative Free Method

Derivative Free REML was proposed by Smith and Graser(1986) and Graser, Smith and Tier(1987) and has been expanded upon by Meyer (1987,91) who has developed a set of programs for computing estimates of variance components for a whole range of univariate and multivariate models. The description given below is a very simplified version of the method for basic understanding of the technique.

In essence, one can imagine an s dimensional array containing the values of the likelihood function for every possible set of values of the ratios of the components to the residual variance. The technique is to search this array and find the set of ratios for which the likelihood function is maximized. There is more than one way to conduct this search. Care must be taken to find the 'global' maximum rather than one of possibly many 'local' maxima. At the same time the number of likelihood evaluations to be computed must also be minimized.

One begins by looking at alternative forms of the log likelihood function given by L2, that is,

\begin{eqnarray*}L_2 & = & -.5( (N-r({\bf X})-q)\ln \sigma_{0}^{2}
+ \sum_{i=1}^...
...2} \\
& & +\ln \mid {\bf C}^{\star} \mid + {\bf y}'{\bf Py}) .
\end{eqnarray*}


Note that

\begin{eqnarray*}q_{i} \ln \sigma_{i}^{2} & = & q_{i} \ln \sigma_{0}^{2}/
\alpha_{i} \\
& = & q_{i}( \ln \sigma_{0}^{2} -\ln \alpha_{i})
\end{eqnarray*}


so that

\begin{eqnarray*}L_{2} & = & -.5( (N-r({\bf X}))\ln \sigma_{0}^{2}
- \sum_{i=1}...
...
& & +\ln \mid {\bf C}^{\star}
\mid + {\bf y}'{\bf P}{\bf y} .
\end{eqnarray*}


The computations are achieved by constructing the following matrix,

\begin{displaymath}\left( \begin{array}{lll}
{\bf X}'{\bf X} & {\bf X}'{\bf Z} &...
...f y} \\ {\bf y}'{\bf W} & {\bf y}'{\bf y}
\end{array} \right), \end{displaymath}

then by Gaussian elimination of one row at a time, the sum of the log of the non-zero pivots (using the same ordering for each evaluation of the likelihood) gives

\begin{eqnarray*}\ln \mid {\bf C}^{\star} \mid & \ {\rm and} \ &
{\bf y}'{\bf Py}.
\end{eqnarray*}


Gaussian elimination, using sparse matrix techniques, requires less computing time than inverting the coefficient matrix of the mixed model equations. The ordering of factors within the equations could be critical to the computational process and some experimentation may be necessary to determine the best ordering. The likelihood function can be evaluated without the calculation of solutions to the mixed model equations, without inverting the coefficient matrix of the mixed model equations, and without computing any of the \(\sigma_{i}^{2}\). The formulations for more general models and multiple trait models are more complex, but follow the same ideas.

2.4 Algorithms to Maximize a Nonlinear Function

Hofer (1998, Variance component estimation in animal breeding: a review, in the Journal of Animal Breeding and Genetics, 115:247-266) gives a general description of the algorithms used with REML. A gradient (change in some quantity as a result of a change in the parameters), called ${\bf d}$ determines the direction of the search in the next step. The gradient is modified by a matrix ${\bf M}$ and by a scalar, c. The new estimate is given by

\begin{displaymath}\sigma^{(r+1)} = \sigma^{(r)} + c^{(r)}{\bf M}^{(r)}{\bf d}^{(r)}, \end{displaymath}

where

\begin{displaymath}{\bf d}^{(r)} = \partial \ln L_{r}/ \partial \sigma \end{displaymath}

evaluated at $\sigma = \sigma^{(r)}$.

For the Newton-Raphson algorithm, evaluated at $\sigma = \sigma^{(r)}$,

\begin{eqnarray*}c^{(r)} & = & 1, \\
{\bf M}^{(r)} & = & \left( - (\partial^{2}...
...}}
{\bf P}\frac{\partial {\bf V}}{\partial \sigma_{j}}{\bf Py}
\end{eqnarray*}


For the Method of Scoring algorithm, evaluated at $\sigma = \sigma^{(r)}$,

\begin{eqnarray*}c^{(r)} & = & 1, \\
{\bf M}^{(r)} & = & \left( E[ -(\partial^{...
...\sigma_{i}}
{\bf P}\frac{\partial {\bf V}}{\partial \sigma_{j}}
\end{eqnarray*}


The traces in the Newton-Raphson algorithm and in the Method of Scoring algorithm both involve the inverse elements, in complex forms, of the mixed model equations, which makes them difficult to attack computationally. Johnson and Thompson (1995) suggested averaging the observed and expected information matrices and coined the term Average Information REML, or AI REML. They defined ${\bf M}^{(r)}$ as the inverse of the following matrix, which does not require elements of the inverse of the mixed model equations.

\begin{eqnarray*}{ {\rm AI}(\sigma_{i},\sigma_{j}) } & = &
\frac{1}{2} \left( - ...
...i}}
{\bf P}\frac{\partial {\bf V}}{\partial \sigma_{j}}{\bf Py}
\end{eqnarray*}


These last quantities are calculated by augmenting some previous equations as follows:

\begin{displaymath}\left( \begin{array}{lll}
{\bf C}^{\star} & {\bf W}'{\bf y} ...
...f}'_{i}{\bf y} & {\bf f}'_{i}{\bf f}_{i}
\end{array} \right), \end{displaymath}

where

\begin{displaymath}{\bf f}_{i} = \frac{\partial {\bf V}}{\partial \sigma_{i}}
{\bf Py}. \end{displaymath}

There would be one ${\bf f}_{i}$ vector for each variance component to be estimated. All of them may be added to the above matrix at one time, before Gaussian elimination is applied. The calculation of each ${\bf f}_{i}$ involves solutions to the mixed model equations.

Hofer (1998) compared the various algorithms for various models. The AI REML algorithm seemed to be advantageous over the other methods. DF REML was advantageous over EM REML when the number of parameters to be estimated was small, but EM REML became more efficient than DF REML when the number of parameters was large as in multiple trait models. Animal breeders, in general, need to be able to access and utilize REML programs written by experts in order to save time from writing their own programs, and to avoid the duplication of effort that would be needed to get new programs to work correctly and efficiently. However, for some models there may not be programs available to handle them, and then writing your own programs would be necessary. 3. More Genetic Models

3.1 Sire Model

For many years a simple sire model was used to estimate heritabilities. This is also known as a half-sib model. Sires are assumed to be random from the population and to be randomly mated to dams. Each dam is assumed to have only one progeny, and each progeny has only one record. In the following example, sires are also assumed to be randomly associated with herds. The data are as follows:

Herd Sire Progeny
    Record
1 1 22
1 2 28
1 3 19
2 1 27
2 2 31
2 3 25
2 1 20
2 2 26

The total sum of squares, ${\bf y}'{\bf y}$ was 5,020. The model equation was

\begin{displaymath}y_{ijk} = \mu + h_{i} + s_{j} + e_{ijk}, \end{displaymath}

where yijk is a progeny record, $\mu$ is an overall mean, hi is a random herd effect ( also known as a contemporary group ), sj is a random sire effect, and eijk is a random residual effect. Also,

\begin{eqnarray*}E(y_{ijk}) & = & \mu, \\
E(h_{i}) & = & 0, \\
E(s_{j}) & = & ...
...k}) & = & \sigma^{2}_{h} + \sigma^{2}_{s}
+ \sigma^{2}_{e}. \\
\end{eqnarray*}


In matrix form,

\begin{displaymath}{\bf y} = {\bf Xb} + {\bf Z}_{1}{\bf h} + {\bf Z}_{2}
{\bf s} + {\bf e}, \end{displaymath}

and

\begin{eqnarray*}E\left( \begin{array}{c} {\bf y} \\ {\bf h} \\ {\bf s} \\
{\bf...
...\\
{\bf0} & {\bf0} & {\bf I}\sigma^{2}_{e} \end{array} \right).
\end{eqnarray*}


In a sire model, the sire variance is equal to the half-sib variance, which is equal to one quarter of the additive genetic variance. Therefore, heritability is given by

\begin{displaymath}h^{2} = 4 \sigma^{2}_{s} / \sigma^{2}_{y}. \end{displaymath}

Starting values for the variances need to be assumed to get the REML algorithm started. Let

\begin{eqnarray*}\sigma^{2}_{h} & = & \frac{1}{8} \sigma^{2}_{e}, \\
\mbox{and} & & \\
\sigma^{2}_{s} & = & \frac{1}{12} \sigma^{2}_{e}. \\
\end{eqnarray*}


Let \( {\bf W} = ( {\bf X} \ \ {\bf Z}_{1} \ \ {\bf Z}_{2} ) \)and \( {\bf d}' = ( {\bf b}' \ \ {\bf h}' \ \ {\bf s}' ) \). Then

\begin{displaymath}{\bf W}'{\bf W} = \left( \begin{array}{rrrrrr}
8 & 3 & 5 & 3 ...
... & 2 & 0 & 3 & 0 \\ 2 & 1 & 1 & 0 & 0 & 2
\end{array} \right), \end{displaymath}

and the complete MME are

\begin{displaymath}\left( \begin{array}{rrrrrr}
8 & 3 & 5 & 3 & 3 & 2 \\ 3 & 3+8...
...ay}{r}
198 \\ 69 \\ 129 \\ 69 \\ 85 \\ 44 \end{array} \right). \end{displaymath}

Let ${\bf C}$ represent the inverse of the coefficient matrix of the MME, then the solutions are

\begin{eqnarray*}\hat{\bf d} & = & {\bf CW}'{\bf y} \\
& = & \left( \begin{arr...
... \\
.4350 \\ -.3480 \\ .7187 \\ -.3707 \end{array} \right). \\
\end{eqnarray*}


The residual variance is estimated as

\begin{eqnarray*}\hat{\sigma}^{2}_{e} & = & ( {\bf y}'{\bf y}
- \hat{\bf d}'{\...
... - 4916.6589)/(8-1), \\
& = & 103.3411/7, \\
& = & 14.7630.
\end{eqnarray*}


The other variances are given by

\begin{displaymath}\hat{\sigma}^{2}_{i} = ( \hat{\bf u}'_{i}\hat{\bf u}_{i}
+ tr{\bf C}_{ii} \hat{\sigma}^{2}_{e}) / q_{i}. \end{displaymath}

For the herd variance,

\begin{eqnarray*}\hat{\sigma}^{2}_{h} & = & ( \hat{\bf h}'\hat{\bf h}
+ tr{\bf C...
...\ .2102(14.7639)) / 2, \\
& = & 3.4818 / 2, \\
& = & 1.7409.
\end{eqnarray*}


Similarly, for the sire variance,

\begin{eqnarray*}\hat{\sigma}^{2}_{s} & = & ( \hat{\bf s}'\hat{\bf s}
+ tr{\bf C...
...\ .2202(14.7639)) / 3, \\
& = & 4.0261 / 3, \\
& = & 1.3420.
\end{eqnarray*}


Therefore, the new variance ratios should be

\begin{displaymath}\sigma^{2}_{e} = 8.4806 \sigma^{2}_{h}, \end{displaymath}

and

\begin{displaymath}\sigma^{2}_{e} = 11.0014 \sigma^{2}_{s}. \end{displaymath}

REML is re-iterated until it reaches convergence.

3.2 Full-Sib Analysis

In this situation, both the sire and dam are known, but neither have any records of their own. Dams are nested within sires. There can be more than one progeny per dam, but only one record per offspring. The equation of the model is

\begin{displaymath}y_{ijk} = \mu + s_{i} + d_{ij} + e_{ijk}, \end{displaymath}

such that

\begin{displaymath}Var(y_{ijk}) = \sigma^{2}_{s} + \sigma^{2}_{d} + \sigma^{2}_{e} =
\sigma^{2}_{y}. \end{displaymath}

The dam effect, dij, contains one half the additive genetic effect plus a dominance effect and a correlated environmental effect (common to all progeny of that dam). The dam is assumed to provide the uterine environment and an early growth environment for the offspring.

The model can be written in matrix notation as

\begin{displaymath}{\bf y} = {\bf Xb}+{\bf Zs}+{\bf Td} + {\bf e}, \end{displaymath}

such that

\begin{displaymath}Var \left( \begin{array}{c} {\bf s} \\ {\bf d} \\ {\bf e}
\en...
...
{\bf0} & {\bf0} & {\bf I}\sigma^{2}_{e} \end{array} \right). \end{displaymath}

The mixed model equations are

\begin{displaymath}\left( \begin{array}{ccc}
{\bf X}'{\bf X} & {\bf X}'{\bf Z} &...
...y} \\ {\bf Z}'{\bf y} \\
{\bf T}'{\bf y} \end{array} \right), \end{displaymath}

and the solutions can be represented as

\begin{displaymath}\left( \begin{array}{c} \hat{\bf b} \\
\hat{\bf s} \\ \hat{\...
...y} \\ {\bf Z}'{\bf y} \\
{\bf T}'{\bf y} \end{array} \right), \end{displaymath}

The REML procedure would utilize

\begin{eqnarray*}\hat{\sigma}^{2}_{e} & = & ({\bf y}'{\bf y} - \hat{\bf b}'{\bf ...
... d}'\hat{\bf d} + tr({\bf C}_{dd})
\hat{\sigma}^{2}_{e})/n_{d}.
\end{eqnarray*}


Heritability in the narrow sense is estimated as four times the sire variance divided by the phenotypic variance. If dominance genetic effects and other maternal components are not important, then another possible estimator is

\begin{displaymath}\hat{h}^{2} = 2(\hat{\sigma}^{2}_{s} + \hat{\sigma}^{2}_{d})/
\sigma^{2}_{y}. \end{displaymath}

Usually, however, the dam variance can not be used to estimate the heritability.

3.3 Sire-Dam-Mendelian Sampling Model

In most animal breeding applications, sires and dams have been selected to produce progeny. In such a case, sires and dams should not be assumed to be random effects, but should be treated as fixed. In the following model, sires and dams are fixed, they may have any number of non-inbred progeny with one record each, and dams can be mated to more than one sire. Progeny may not appear as a sire or dam. The equation is

\begin{displaymath}{\bf y} = {\bf Xb} + {\bf Zs} + {\bf Td} + {\bf m} + {\bf e}, \end{displaymath}

where

\begin{displaymath}E({\bf y}) = {\bf Xb}+{\bf Zs}+{\bf Td}, \end{displaymath}

and ${\bf m}$ is a vector of random Mendelian sampling effects associated with each progeny. Keep in mind that the variance of Mendelian sampling effects is equal to one half the additive genetic variance, $\sigma^{2}_{m} = .5 \sigma^{2}_{a}$. The MME for this model is

\begin{displaymath}\left( \begin{array}{cccc}
{\bf X}'{\bf X} & {\bf X}'{\bf Z} ...
...Z}'{\bf y} \\
{\bf T}'{\bf y} \\ {\bf y} \end{array} \right), \end{displaymath}

The REML procedure calls for

\begin{eqnarray*}\hat{\sigma}^{2}_{e} & = & ({\bf y}'{\bf y} - \hat{\bf d}'{\bf ...
...at{\bf m}'\hat{\bf m} + tr{\bf C}_{mm}
\hat{\sigma}^{2}_{e})/N.
\end{eqnarray*}


This model has not been applied in practice yet, nor studied in any detail.

3.4 Animal Models

The model commonly applied to estimation of variance components in livestock genetics since 1989 has been an animal model. The animal model assumes a large, random mating population, only additive genetic effects are of major importance, and all relationships among animals are known and tracible to an unselected base population (somewhere in the past). Animals may have more than one record each. The equation of the model is

\begin{displaymath}{\bf y} = {\bf Xb} +{\bf Za} + {\bf Zp} + {\bf e}, \end{displaymath}

where ${\bf a}$ is the vector of animal additive genetic effects (one per animal), and ${\bf p}$ is a vector of permanent environmental (p.e.) effects associated with each animal.

\begin{eqnarray*}E({\bf y}) & = & {\bf Xb}, \\
Var \left( \begin{array}{c} {\bf...
...\
{\bf0} & {\bf0} & {\bf I}\sigma^{2}_{e} \end{array} \right).
\end{eqnarray*}


The matrix ${\bf A}$ is called the numerator relationship matrix. Wright defined relationships among animals as correlations, but ${\bf A}$is essentially relationships defined as covariances (the numerators of the correlation coefficients). Also, these only represent the additive genetic relationships between animals.

The MME for this model are

\begin{displaymath}\left( \begin{array}{ccc} {\bf X}'{\bf X} & {\bf X}'{\bf Z} &...
...y} \\ {\bf Z}'{\bf y} \\
{\bf Z}'{\bf y} \end{array} \right). \end{displaymath}

Note that ka is the ratio of residual to additive genetic variances, and kp is the ratio of residual to permanent environmental variances. Also, in MME the inverse of ${\bf A}$ is required.

The REML procedure gives

\begin{eqnarray*}\hat{\sigma}^{2}_{e} & = & ({\bf y}'{\bf y} - \hat{\bf b}'{\bf ...
...\bf p}'\hat{\bf p} +
tr {\bf C}_{pp} \hat{\sigma}^{2}_{e})/n,
\end{eqnarray*}


where n is the total number of animals, N is the total number of records, and ${\bf C}_{aa}$ are the inverse elements of the MME for the animal additive genetic effects, and ${\bf C}_{pp}$ are the inverse elements of the MME for the animal permanent environmental effects. An example of this model will be given in later notes.


next up previous

This LaTeX document is available as postscript or asAdobe PDF.

Larry Schaeffer
2001-11-20