


Henderson Method 1
L. R. Schaeffer, April 27, 1999

Method 1 was derived by Dr. C. R. Henderson in 1949 as a result of his Ph.D. thesis work at Iowa State University. The method was not published until 1953, along with Methods 2 and 3, in Biometrics in one of Henderson's landmark papers. Method 1 applies the analysis of variance (AOV) procedure for balanced data to unbalanced data. Method 1 is only applicable to models that are completely random, i.e., ${\bf Xb} = {\bf 1} \mu$. Method 1 is theoretically derived to be unbiased and translation invariant, but only under a model of completely random mating and no selection. Method 1 is probably the easiest of all methods to calculate.

Definition of Quadratic Forms

Three types of quadratic forms are defined for Henderson's Method 1. They are

\begin{eqnarray*}{\bf Q}_{1} & = & {\bf I}, \ \ \mbox{which gives} \ \
{\bf y}'{\bf Q}_{1}{\bf y} = {\bf y}'{\bf y}, \\
{\bf Q}_{2} & = & \frac{1}{N}{\bf 11}', \ \ \mbox{which gives} \ \
{\bf y}'{\bf Q}_{2}{\bf y} = (1/N){\bf y}'{\bf 11}'{\bf y}, \ \ \mbox{and} \\
{\bf Q}_{i+2} & = & {\bf Z}_{i}({\bf Z}'_{i}{\bf Z}_{i})^{-1}
{\bf Z}'_{i}
\end{eqnarray*}


for i = 1 to s, where s is the number of random factors in the model. The corresponding quadratic form is

\begin{displaymath}{\bf y}'{\bf Z}_{i}({\bf Z}'_{i}{\bf Z}_{i})^{-1}
{\bf Z}'_{i}{\bf y}. \end{displaymath}

Note that ${\bf Z}'_{i}{\bf Z}_{i}$ is a diagonal matrix whose elements are the numbers of observations in each level of the ith random factor, and that the elements of ${\bf Z}'_{i}{\bf y}$ are the sums of observations within each level of the ith random factor. Let $n_{ik}$ represent the kth diagonal element of ${\bf Z}'_{i}{\bf Z}_{i}$; then ${\bf y}'{\bf Q}_{i+2}
{\bf y} = \sum_{k=1}^{{q}_{i}}(y_{ik.}^{2}/{n}_{ik})$, where $q_{i}$ is the number of levels of the ith random factor.
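As a small computational sketch (not part of the original notes; the data, level assignments, and variable names below are hypothetical), the following Python/numpy code confirms that the matrix form of the factor quadratic form equals the subclass-total computing formula.

\begin{verbatim}
import numpy as np

# Hypothetical toy data: 6 observations on one random factor with 3 levels.
y = np.array([4.0, 6.0, 5.0, 7.0, 3.0, 8.0])
levels = np.array([0, 0, 1, 1, 1, 2])        # factor level of each record
Z = np.zeros((6, 3))
Z[np.arange(6), levels] = 1.0                # design matrix Z_i

# Matrix definition of the quadratic form: y' Z (Z'Z)^{-1} Z' y
Q = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
qf_matrix = y @ Q @ y

# Equivalent computing formula: sum over levels of (level total)^2 / (level count)
totals = Z.T @ y                             # y_{ik.}
counts = Z.sum(axis=0)                       # n_{ik}
qf_totals = np.sum(totals ** 2 / counts)

print(qf_matrix, qf_totals)                  # the two values agree
\end{verbatim}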

Each of the ${\bf Q}$-matrices in Method 1 has the following properties:

1.

\begin{displaymath}{\bf Q}_{m}{\bf 1} = {\bf 1}\end{displaymath}

for m = 1 to s+2. That is, each ${\bf Q}$-matrix has elements which add to 1 within each row, and
2.

\begin{displaymath}{\bf Q}_{m}{\bf Q}_{m} = {\bf Q}_{m}\end{displaymath}

for m = 1 to s+2. That is, each ${\bf Q}$-matrix is idempotent.
The first property is necessary for Method 1 to give translation invariant estimates of the variance components, and the second property allows the expectations of the quadratic forms to simplify to easy computing formulas.
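Both properties are easy to verify numerically. The short check below (a sketch only, reusing the hypothetical design matrix from the earlier code) confirms that each ${\bf Q}$-matrix has row sums of one and is idempotent.

\begin{verbatim}
import numpy as np

Z = np.zeros((6, 3))
Z[np.arange(6), [0, 0, 1, 1, 1, 2]] = 1.0
N = Z.shape[0]

Q_matrices = [
    np.eye(N),                               # Q1 = I
    np.ones((N, N)) / N,                     # Q2 = (1/N) 11'
    Z @ np.linalg.inv(Z.T @ Z) @ Z.T,        # Q3 for the random factor
]
ones = np.ones(N)
for Q in Q_matrices:
    assert np.allclose(Q @ ones, ones)       # property 1: Q 1 = 1
    assert np.allclose(Q @ Q, Q)             # property 2: Q Q = Q (idempotent)
print("row sums of one and idempotency verified")
\end{verbatim}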

Expectations of Quadratic Forms

The expectations of the Method 1 quadratic forms are easy to derive. Recall that

\begin{eqnarray*}Var({\bf y}) & = & {\bf V} = \sum_{i=0}^{s}
{\bf V}_{i}\sigma_{i}^{2} \ = \ \sum_{i=1}^{s}{\bf Z}_{i}{\bf Z}'_{i}
\sigma_{i}^{2} \ + \ {\bf I}\sigma_{0}^{2}, \ \ {\rm and} \\
E({\bf y}'{\bf Qy}) & = & tr{\bf QV} + {\bf b}'{\bf X}'
{\bf QXb}.
\end{eqnarray*}


For Method 1 we have ${\bf Xb} = {\bf 1} \mu$ so that

\begin{displaymath}{\bf b}'{\bf X}'{\bf QXb} = {\mu}^{2}{\bf 1}'{\bf Q1} \end{displaymath}

and because ${\bf Q1} = {\bf 1}$ for all of the Method 1 quadratic forms, then

\begin{displaymath}\mu^{2}{\bf 1}'{\bf Q1} = \mu^{2}{\bf 1}'{\bf 1} =
N\mu^{2}. \end{displaymath}

Hence, all quadratic forms in Method 1 have $N\mu^{2}$ as part of their expectations. Consequently, when differences are taken among these quadratic forms, $N\mu^{2}$ is eliminated, and the estimators are translation invariant.

Now concentrate on $tr{\bf QV}$ for each of the three types of quadratic forms. For the Total Sum of Squares,

\begin{eqnarray*}E({\bf y}'{\bf y}) & = & tr{\bf V} + N \mu^{2} \\
tr {\bf V} & = & \sum_{i=1}^{s} tr({\bf Z}_{i}{\bf Z}'_{i})\sigma_{i}^{2}
 \ + \ tr({\bf I})\sigma_{0}^{2} \\
 & = & \sum_{i=1}^{s} N \sigma_{i}^{2} \ + \
N \sigma_{0}^{2} \ = \
\sum_{i=0}^{s} \ N \sigma_{i}^{2}.
\end{eqnarray*}


The trace of ${\bf Z}'_{i}{\bf Z}_{i}$ is the sum of the $n_{ik}$ over the $q_{i}$ levels of factor $i$, with $\sum_{k=1}^{{q}_{i}}{n}_{ik} = N.$ Hence, the expectation of ${\bf y}'{\bf y}$ in Method 1 is always

\begin{displaymath}E({\bf y}'{\bf y}) = N \ \mu^{2} + \sum_{i=0}^{s} N \ \sigma_{i}^{2}.
\end{displaymath}

For the Sum of Squares due to the Mean,

\begin{eqnarray*}tr {\bf Q}_{2}{\bf V} & = & \sum_{i=1}^{s} \ tr{\bf Q}_{2}{\bf Z}_{i}
{\bf Z}'_{i}\sigma_{i}^{2} \ + \ tr{\bf Q}_{2}\sigma_{0}^{2} \\
 & = & \sum_{i=1}^{s}(1/N)(\sum_{k=1}^{{q}_{i}}n_{ik}^{2})
\sigma_{i}^{2} \ + \ \sigma_{0}^{2}.
\end{eqnarray*}


Thus, the expectation of the sum of squares due to the mean in Method 1 is always

\begin{displaymath}E[(1/N){\bf y}'{\bf 11}'{\bf y}] \ = \ N \mu^{2} \ + \
\sum_{i=1}^{s}(1/N)(\sum_{k=1}^{{q}_{i}}{n}_{ik}^{2})
\sigma_{i}^{2} \ + \ \sigma_{0}^{2}. \end{displaymath}

For the Sum of Squares for each random factor,

\begin{eqnarray*}tr {\bf Q}_{m}{\bf V} & = & \sum_{i=1}^{s} \ tr {\bf Q}_{m}
{\bf Z}_{i}{\bf Z}'_{i}\sigma_{i}^{2} \ + \ tr{\bf Q}_{m}\sigma_{0}^{2} \\
 & = & \sum_{i=1}^{s} \left[ \sum_{k=1}^{q_{j}}
\frac{\sum_{h}n_{kh}^{2}}{n_{k.}} \right]\sigma_{i}^{2} \ + \ tr{\bf I}_{q_{j}}\sigma_{0}^{2}
\end{eqnarray*}


where $n_{kh}$ is the number of observations within the $kh$th subclass between factors $i$ and $j$. When $i$ equals $j$ in the summations, $n_{kh}$ is nonzero only for $h = k$ with $n_{kk} = n_{k.}$, and the entire coefficient simplifies to $N$. The coefficient of $\sigma_{0}^{2}$ simplifies to $q_{j}$.
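This coefficient is easily computed from a two-way table of subclass counts. The sketch below (Python/numpy, not part of the original notes) uses the herd-by-sire counts from the worked example later in this section and reproduces the coefficients that appear in the expectations there.

\begin{verbatim}
import numpy as np

# Subclass counts n_kh: rows are herds, columns are sires (from the example below).
n = np.array([[3, 4, 2, 3],
              [3, 3, 2, 0],
              [0, 3, 2, 3]])

def coef(counts):
    # sum over levels k of the quadratic-form factor: (sum_h n_kh^2) / n_k.
    return np.sum((counts ** 2).sum(axis=1) / counts.sum(axis=1))

print(coef(n))    # herd quadratic form: coefficient of the sire component (8.6667)
print(coef(n.T))  # sire quadratic form: coefficient of the herd component (11.4)
\end{verbatim}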

Obtaining Estimates

Let ${\bf w}$ be the vector of quadratic forms, i.e.

\begin{displaymath}{\bf w}' = [ \ {\bf y}'{\bf y} \ \ (1/N){\bf y}'{\bf 11}'
{\bf y} \ \ {\bf y}'{\bf Z}_{1}({\bf Z}'_{1}{\bf Z}_{1})^{-1}{\bf Z}'_{1}{\bf y}
 \ \ldots \ {\bf y}'{\bf Z}_{s}({\bf Z}'_{s}
{\bf Z}_{s})^{-1}{\bf Z}'_{s}{\bf y} \ ] \end{displaymath}

and let ${\mbox{\boldmath$\sigma$ }}'$ = $[ \ \sigma_{0}^{2} \ \
\sigma_{1}^{2} \ \ldots \ \sigma_{s}^{2} \ ]$, then

\begin{displaymath}{\rm E}({\bf w}) \ = \ {\bf F} \ {\mbox{\boldmath$\sigma$ }} \ + \
{\bf 1} \ N\mu^{2} \end{displaymath}

where a typical element of ${\bf F}$ is $tr {\bf Q}_{j}
{\bf V}_{i}$ for the jth quadratic form and the ith component.

Traditionally with balanced AOV tables, the sum of squares due to the mean was usually subtracted from the other sums of squares. Similarly, find any matrix ${\bf K}'$ such that ${\bf K}'{\bf 1} = 0$ and ${\bf K}'$ has full row rank; then

\begin{eqnarray*}E({\bf K}'{\bf w})
& = & {\bf K}'{\bf F} \ {\mbox{\boldmath$\sigma$ }} \ + \ {\bf K}'{\bf 1} \ N\mu^{2} \\
& = & {\bf K}'{\bf F} \ {\mbox{\boldmath$\sigma$ }} \\
& & {\rm because} \ {\bf K}'{\bf 1} = 0.
\end{eqnarray*}


To estimate ${\mbox{\boldmath$\sigma$ }}$ then equate ${\bf K}'
{\bf w}$ to its expectation so that

\begin{eqnarray*}{\bf K}'{\bf F} \ \hat{\mbox{\boldmath$\sigma$ }}
& = & {\bf K}'{\bf w}, \ \ {\rm or} \\
\hat{\mbox{\boldmath$\sigma$ }}
& = & ({\bf K}'{\bf F})^{-1}{\bf K}'{\bf w}.
\end{eqnarray*}


However, we could have obtained estimates without using ${\bf K}'$ by

\begin{displaymath}\left( \begin{array}{l}
\hat{\mbox{\boldmath$\sigma$ }}\\
\hat{\mu}^{2}
\end{array} \right) = \left[ \begin{array}{lr}
{\bf F} & {\bf 1}N
\end{array} \right]^{-1}{\bf w}. \end{displaymath}

An Example

Consider the case of milk yields by daughters of dairy bulls in several different herds. Let s, the number of random factors in the model, be two (herds and sires); thus, there are three components of variance to estimate. Data for the analysis appear in the table below. Milk yields are given in percentage units relative to a breed mean for cows of a certain age and month of calving. The total sum of squares is 390,729.

Example dairy data for Method 1 analysis.
Herd Sire Daughter Milk Yields Subclass totals
             
1    1    157, 160, 138        455 (3)
     2    96, 110, 115, 120    441 (4)
     3    82, 65               147 (2)
     4    120, 130, 110        360 (3)

2    1    140, 142, 145        427 (3)
     2    122, 117, 98         337 (3)
     3    70, 94               164 (2)

3    2    112, 125, 105        342 (3)
     3    100, 92              192 (2)
     4    116, 129, 133        378 (3)
             

Totals          Number    Sum
By Sire    1         6    882
           2        10   1120
           3         6    503
           4         6    738

By Herd    1        12   1403
           2         8    928
           3         8    912

Overall             28   3243

Quadratic Forms


\begin{eqnarray*}{\bf y}'{\bf y} & = & 390,729. \\
(1/N){\bf y}'{\bf 11}'{\bf y} & = & (3243)^{2}/28 \ = \ 375,608.8929 \\
{\bf y}'{\bf Z}_{1}({\bf Z}'_{1}{\bf Z}_{1})^{-1}{\bf Z}'_{1}{\bf y}
 & = & (1403)^{2}/12 + (928)^{2}/8 + (912)^{2}/8 \\
 & = & 375,650.0833 \\
{\bf y}'{\bf Z}_{2}({\bf Z}'_{2}{\bf Z}_{2})^{-1}{\bf Z}'_{2}{\bf y}
 & = & (882)^{2}/6 + (1120)^{2}/10 + (503)^{2}/6 \\
& & +(738)^{2}/6 \\
& = & 388,036.166667
\end{eqnarray*}


Expectations


\begin{eqnarray*}E({\bf y}'{\bf y}) & = &
N \ \mu^{2} + N \ \sigma_{0}^{2} +
N \ \sigma_{1}^{2} + N \ \sigma_{2}^{2} \\
E[(1/N){\bf y}'{\bf 11}'{\bf y}] & = & 28 \ \mu^{2} \ + \ \sigma_{0}^{2}
 \ + \ 9.714286 \ \sigma_{1}^{2} \ + \ 7.428571 \ \sigma_{2}^{2} \\
E({\bf y}'{\bf Q}_{3}{\bf y}) & = & 28 \ \mu^{2} \ + \ 3 \ \sigma_{0}^{2}
 \ + \ 28 \ \sigma_{1}^{2} \ + \ 8.666667 \ \sigma_{2}^{2} \\
E({\bf y}'{\bf Q}_{4}{\bf y}) & = & 28 \ \mu^{2} \ + \ 4 \ \sigma_{0}^{2}
 \ + \ 11.4 \ \sigma_{1}^{2} \ + \ 28 \ \sigma_{2}^{2}, \\
{\rm where} \ \ 11.4 & = & [ \ 3^{2} + 3^{2} \ ]/6 + [ \ 4^{2} + 3^{2} + 3^{2} \ ]/10
 + [ \ 2^{2} + 2^{2} + 2^{2} \ ]/6 + [ \ 3^{2} + 3^{2} \ ]/6 \\
& = & 18/6 + 34/10 + 12/6 + 18/6 \\
& = & 11.4.
\end{eqnarray*}


Solving for Estimates


\begin{displaymath}\left( \begin{array}{lrll}
28 & 28 & 28 & 28 \\
28 & 1 & \ 9.714286 & \ 7.428571 \\
28 & 3 & 28 & \ 8.666667 \\
28 & 4 & 11.4 & 28
\end{array} \right) \left( \begin{array}{l}
\hat{\mu}^{2} \\ \hat{\sigma}_{0}^{2} \\ \hat{\sigma}_{1}^{2} \\
\hat{\sigma}_{2}^{2}
\end{array} \right) = \left( \begin{array}{r}
390,729 \\ 375,608.8929 \\ 375,650.0834 \\ 388,036.1667
\end{array} \right)
\end{displaymath}

Premultiply both sides of the above by ${\bf K}'$ where

\begin{displaymath}{\bf K}' = \left( \begin{array}{lrrr}
1 & -1 & 0 & 0 \\
1 & 0 & -1 & 0 \\
1 & 0 & 0 & -1
\end{array} \right)
\end{displaymath}

Ignoring the column for $\mu^{2}$ which becomes all zeros, then we have

\begin{displaymath}\left( \begin{array}{lll}
27 & 18.285714 & 20.571429 \\
25 & \ 0. & 19.333333 \\
24 & 16.6 & \ 0.
\end{array} \right) \left( \begin{array}{l}
\hat{\sigma}_{0}^{2} \\ \hat{\sigma}_{1}^{2} \\ \hat{\sigma}_{2}^{2}
\end{array} \right) = \left( \begin{array}{r}
15,120.1071 \\ 15,078.9166 \\ 2,692.8334
\end{array} \right)
\end{displaymath}

The resulting estimates are

\begin{eqnarray*}\hat{\sigma}_{0}^{2} & = & 149.4246, \\
\hat{\sigma}_{1}^{2} & = & -53.8167, \\
\hat{\sigma}_{2}^{2} & = & 586.7225.
\end{eqnarray*}
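As a quick numerical check (a minimal Python/numpy sketch, not part of the original notes), the estimates can be reproduced by assembling the expectation matrix ${\bf F}$ from the coefficients derived above and solving ${\bf K}'{\bf F}\hat{\mbox{\boldmath$\sigma$}} = {\bf K}'{\bf w}$.

\begin{verbatim}
import numpy as np

# Columns of F: sigma0^2, sigma1^2 (herds), sigma2^2 (sires); rows: TSS, MSS, HSS, SSS.
F = np.array([[28.0, 28.0,      28.0],
              [ 1.0,  9.714286,  7.428571],
              [ 3.0, 28.0,       8.666667],
              [ 4.0, 11.4,      28.0]])
w = np.array([390729.0, 375608.8929, 375650.0834, 388036.1667])

# K' has rows summing to zero, which removes the N*mu^2 term from E(w).
K = np.array([[1.0, -1.0,  0.0,  0.0],
              [1.0,  0.0, -1.0,  0.0],
              [1.0,  0.0,  0.0, -1.0]])

sigma_hat = np.linalg.solve(K @ F, K @ w)
print(sigma_hat)   # approximately [149.42, -53.82, 586.72]
\end{verbatim}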


Sampling Variances of Estimates

Let TSS, MSS, HSS, and SSS represent the quadratic forms for the total sum of squares, mean, herds, and sires, respectively, then the coefficients of the unknown parameters for computing the variances and covariances of the corresponding quadratic forms are in the following table.

  ${\sigma}_{0}^{4}$ ${\sigma}_{0}^{2}{\sigma}_{1}^{2}$ ${\sigma}_{0}^{2}{\sigma}_{2}^{2}$ ${\sigma}_{1}^{4}$ ${\sigma}_{1}^{2}{\sigma}_{2}^{2}$ ${\sigma}_{2}^{4}$
             
TSS 56. 112. 112. 544. 328. 416.
TSS,MSS 2. 38.8571 29.7143 196.5714 288. 117.7143
TSS,HSS 6. 112. 34.6667 544. 328. 132.6667
TSS,SSS 8. 45.6 112. 224.5333 328. 416.
MSS 2. 38.8571 29.7143 188.7347 288.6531 110.3673
MSS,HSS 2. 38.8571 29.7143 196.5714 288. 110.3810
MSS,SSS 2. 38.8571 29.7143 188.8762 288. 117.7143
HSS 6. 112. 34.6667 544. 328. 112.9514
HSS,SSS 2.4111 45.6 34.6667 224.5333 328. 132.6667
SSS 8. 45.6 112. 193.5867 328. 416.
             
Each line above would also have a function, $4 \ {\bf b}'{\bf X}'
{\bf QVQXb}$, associated with it which will be identical in each term (variance or covariance). These terms have been ignored since they will disappear when differences between pairs of quadratic forms are taken.

Now assume the true components of variance have the following values: ${\sigma}_{0}^{2} = 5, \ \ {\sigma}_{1}^{2} = .1,$ and $ \ {\sigma}_{2}^{2} = 1.$ Post-multiply the table of coefficients above by the vector $(25 \ .5 \; 5 \ .01 \ .1 \; 1)$, which corresponds to the columns of the table above, and store the results in matrix form as

\begin{displaymath}Var({\bf w}) =
\left( \begin{array}{rllr}
2470.2400 & 366.4800 & 550.2400 & 1233.8453 \\
366.4800 & 359.1200 & 359.1467 & 366.4030 \\
550.2400 & 359.1467 & 530.5247 & 424.1231 \\
1233.8453 & 366.4030 & 424.1231 & 1233.5359
\end{array} \right).
\end{displaymath}

Using the same ${\bf K}'$ as before for taking differences in quadratic forms, then

\begin{displaymath}Var({\bf K}'{\bf w}) =
\left( \begin{array}{lll}
2096.4000 & 1912.6667 & 1236.3177 \\
1912.6667 & 1900.2847 & 1110.2778 \\
1236.3177 & 1110.2778 & 1236.0852
\end{array} \right),
\end{displaymath}

and if we let

\begin{displaymath}{\bf P} = ({\bf K}'{\bf F})^{-1} =
\left( \begin{array}{lll}
27 & 18.285714 & 20.571429 \\
25 & \ 0. & 19.333333 \\
24 & 16.6 & \ 0.
\end{array} \right)^{-1}
\end{displaymath}

then $\hat{\sigma} = {\bf PK}'{\bf w}$ and

\begin{displaymath}Var({\bf PK}'{\bf w}) =
\left( \begin{array}{rrr}
2.2956 & -.2959 & -.3218 \\
-.2959 & .5428 & .0157 \\
-.3218 & .0157 & 2.0778
\end{array} \right).
\end{displaymath}

The variance of $\hat{\sigma}_{0}^{2}$ is then 2.2956 provided the true variances are ${\sigma}_{0}^{2} = 5$, ${\sigma}_{1}^{2} = .1$, and ${\sigma}_{2}^{2} = 1$. The covariance between $\hat{\sigma}_{0}^{2}$ and $\hat{\sigma}_{2}^{2}$ is -.3218, etc. For this small example, the variances of the estimates are quite large. Another set of true values could be used to derive $Var(\hat{\sigma})$ if desired. Few researchers ever compute the sampling variances of their estimates due to the complexity and number of the traces that must be computed.

Least Squares Equations

The quadratic forms (except for TSS) and their expectations can also be obtained from ordinary least squares (OLS) equations. For the example data, the OLS equations are

\begin{displaymath}\left( \begin{array}{rrrrrrrr}
28 & 6 & 10 & 6 & 6 & 12 & 8 & 8 \\
6 & 6 & 0 & 0 & 0 & 3 & 3 & 0 \\
10 & 0 & 10 & 0 & 0 & 4 & 3 & 3 \\
6 & 0 & 0 & 6 & 0 & 2 & 2 & 2 \\
6 & 0 & 0 & 0 & 6 & 3 & 0 & 3 \\
12 & 3 & 4 & 2 & 3 & 12 & 0 & 0 \\
8 & 3 & 3 & 2 & 0 & 0 & 8 & 0 \\
8 & 0 & 3 & 2 & 3 & 0 & 0 & 8
\end{array} \right) \hat{\mbox{\boldmath$\beta$ }} =
\left( \begin{array}{r}
3243 \\ 882 \\ 1120 \\ 503 \\ 738 \\ 1403 \\ 928 \\ 912
\end{array} \right). \end{displaymath}

The first step is to calculate the variance-covariance matrix of the right hand sides. Note that, for herds,

\begin{displaymath}{\bf W}'{\bf Z}_{1} = \left( \begin{array}{rrr}
12 & 8 & 8 \\
3 & 3 & 0 \\ 4 & 3 & 3 \\ 2 & 2 & 2 \\ 3 & 0 & 3 \\
12 & 0 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 8 \end{array} \right), \end{displaymath}

so that

\begin{eqnarray*}{\bf U}_{1} & = & {\bf W}'{\bf Z}_{1}{\bf Z}'_{1}{\bf W} \\
& = & \left( \begin{array}{rrrrrrrr}
272 & 60 & 96 & 56 & 60 & 144 & 64 & 64 \\
60 & 18 & 21 & 12 & 9 & 36 & 24 & 0 \\
96 & 21 & 34 & 20 & 21 & 48 & 24 & 24 \\
56 & 12 & 20 & 12 & 12 & 24 & 16 & 16 \\
60 & 9 & 21 & 12 & 18 & 36 & 0 & 24 \\
144 & 36 & 48 & 24 & 36 & 144 & 0 & 0 \\
64 & 24 & 24 & 16 & 0 & 0 & 64 & 0 \\
64 & 0 & 24 & 16 & 24 & 0 & 0 & 64 \end{array} \right),
\end{eqnarray*}


and similarly for sires,

\begin{displaymath}{\bf W}'{\bf Z}_{2} = \left( \begin{array}{rrrr}
6 & 10 & 6 & 6 \\
6 & 0 & 0 & 0 \\ 0 & 10 & 0 & 0 \\ 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & 6 \\
3 & 4 & 2 & 3 \\ 3 & 3 & 2 & 0 \\ 0 & 3 & 2 & 3
\end{array} \right), \end{displaymath}

so that

\begin{eqnarray*}{\bf U}_{2} & = & {\bf W}'{\bf Z}_{2}{\bf Z}'_{2}{\bf W} \\
& = & \left( \begin{array}{rrrrrrrr}
208 & 36 & 100 & 36 & 36 & 88 & 60 & 60 \\
36 & 36 & 0 & 0 & 0 & 18 & 18 & 0 \\
100 & 0 & 100 & 0 & 0 & 40 & 30 & 30 \\
36 & 0 & 0 & 36 & 0 & 12 & 12 & 12 \\
36 & 0 & 0 & 0 & 36 & 18 & 0 & 18 \\
88 & 18 & 40 & 12 & 18 & 38 & 25 & 25 \\
60 & 18 & 30 & 12 & 0 & 25 & 22 & 13 \\
60 & 0 & 30 & 12 & 18 & 25 & 13 & 22 \end{array} \right).
\end{eqnarray*}


Then

\begin{displaymath}Var({\bf W}'{\bf y}) = {\bf U}_{1} \sigma_{1}^{2} \ + \
{\bf U}_{2} \sigma_{2}^{2} \ + \ {\bf W}'{\bf W} \sigma_{0}^{2}. \end{displaymath}

The ${\bf Q}$-matrices for ${\bf W}'{\bf y}$ are diagonal matrices. For the mean sum of squares, MSS,

\begin{displaymath}{\bf Q}_{2} = {\rm diag} \left( \begin{array}{rrrrrrrr}
\frac{1}{28} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right), \end{displaymath}

and for the herd and sire sum of squares,

\begin{displaymath}{\bf Q}_{3} = {\rm diag} \left( \begin{array}{rrrrrrrr}
0 & 0 & 0 & 0 & 0 & \frac{1}{12} & \frac{1}{8} & \frac{1}{8}
\end{array} \right), \end{displaymath}

and

\begin{displaymath}{\bf Q}_{4} = {\rm diag} \left( \begin{array}{rrrrrrrr}
0 & \frac{1}{6} & \frac{1}{10} & \frac{1}{6} & \frac{1}{6} &
0 & 0 & 0 \end{array} \right). \end{displaymath}

The quadratic forms are

\begin{eqnarray*}{\bf y}'{\bf WQ}_{2}{\bf W}'{\bf y} & = & 375,608.8929 \\
{\bf y}'{\bf WQ}_{3}{\bf W}'{\bf y} & = & 375,650.0834 \\
{\bf y}'{\bf WQ}_{4}{\bf W}'{\bf y} & = & 388,036.1667
\end{eqnarray*}


The expectations of these quadratic forms are given by

\begin{displaymath}\left( \begin{array}{rrr}
tr{\bf Q}_{2}{\bf U}_{1} & tr{\bf Q}_{2}{\bf U}_{2} & tr{\bf Q}_{2}{\bf W}'{\bf W} \\
tr{\bf Q}_{3}{\bf U}_{1} & tr{\bf Q}_{3}{\bf U}_{2} & tr{\bf Q}_{3}{\bf W}'{\bf W} \\
tr{\bf Q}_{4}{\bf U}_{1} & tr{\bf Q}_{4}{\bf U}_{2} & tr{\bf Q}_{4}{\bf W}'{\bf W}
\end{array} \right) \left( \begin{array}{l}
\sigma_{1}^{2} \\ \sigma_{2}^{2} \\
\sigma_{0}^{2} \end{array} \right) \ + \ N \ \mu^{2}{\bf 1}. \end{displaymath}

Then you need to add the total sum of squares and its expectation and proceed as before.
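The mechanics of this route are easy to sketch in code. The block below (Python/numpy; the tiny design and data are hypothetical, not the example above) builds ${\bf W}$, the matrices ${\bf U}_{i}$, and the diagonal ${\bf Q}$-matrices, and prints each quadratic form of the right hand sides together with the coefficients of the variance components in its expectation.

\begin{verbatim}
import numpy as np

# Hypothetical design: W = [1, Z1, Z2] for two random factors.
Z1 = np.array([[1,0],[1,0],[0,1],[0,1],[0,1],[0,1]], dtype=float)
Z2 = np.array([[1,0,0],[0,1,0],[0,1,0],[0,0,1],[1,0,0],[0,0,1]], dtype=float)
y  = np.array([5.0, 3.0, 6.0, 4.0, 7.0, 2.0])
N  = len(y)
W  = np.hstack([np.ones((N, 1)), Z1, Z2])

r  = W.T @ y                      # right hand sides of the OLS equations
U1 = W.T @ Z1 @ Z1.T @ W          # contribution of factor 1 to Var(W'y)
U2 = W.T @ Z2 @ Z2.T @ W          # contribution of factor 2
U0 = W.T @ W                      # residual contribution

# Diagonal Q-matrices pick out the mean row and the factor rows of W'y.
Q2 = np.diag(np.r_[1.0 / N, np.zeros(5)])
Q3 = np.diag(np.r_[0.0, 1.0 / Z1.sum(axis=0), np.zeros(3)])
Q4 = np.diag(np.r_[0.0, 0.0, 0.0, 1.0 / Z2.sum(axis=0)])

for Q in (Q2, Q3, Q4):
    qf = r @ Q @ r                # quadratic form of the right hand sides
    coefs = [np.trace(Q @ U1), np.trace(Q @ U2), np.trace(Q @ U0)]
    print(qf, coefs)              # value and the sigma_1^2, sigma_2^2, sigma_0^2 coefficients
\end{verbatim}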

Problem

Apply Method 1 to the following data using a model with three random factors. Compute the sampling variances of the estimates.

\begin{displaymath}{\bf y} =
\left( \begin{array}{r} 32 \\ 44 \\ 116 \\ 94 \\ ...
... 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0
\end{array} \right), \end{displaymath}


\begin{displaymath}{\bf Z}_{2} = \left( \begin{array}{cccc} 1 & 0 & 0 & 0 \\
0...
...\\
1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0
\end{array} \right). \end{displaymath}

Henderson Method 2
L. R. Schaeffer, April 27, 1999

The main drawback of Method 1 was that only completely random models could be handled. Methods 2 and 3 accommodate mixed models. Method 2 is an unbiased, translation invariant procedure, and can be used with mixed models provided that interactions between fixed and random factors, or nesting of random factors within fixed factors, do not exist in the model.

Method 2 requires the least squares (LS) equations for all factors in the model. A solution vector is obtained by taking a generalized inverse of the coefficient matrix of the LS equations. Because there are an infinite number of possible solution vectors to the LS equations, Searle (1968) maintained for many years that Method 2 was not uniquely defined. That is, different solution vectors could lead to different estimates of the variance components. This was later shown to be incorrect, and Method 2 was proven to be unique provided Henderson's directions are followed exactly (Henderson, Searle, and Schaeffer, 1974).

The Calculations

Henderson's Method 2 involves the following steps:

Step 1.
Construct LS equations for all factors (fixed and random) in the model. Let ${\bf W} = ( \ {\bf X} \ \
{\bf Z} \ )$ and ${\mbox{\boldmath$\beta$ }}' =
( \ {\bf b}' \ \ {\bf u}' \ )$, then solve

\begin{displaymath}{\bf W}'{\bf W} \ \hat{\mbox{\boldmath$\beta$ }} =
{\bf W}'{\bf y}, \ \ {\rm or} \ \
\hat{\mbox{\boldmath$\beta$ }} =
({\bf W}'{\bf W})^{-}{\bf W}'{\bf y}.
\end{displaymath}

Step 2.
Adjust ${\bf y}$ for $\hat{\bf b}$ from the OLS equations in step 1. Let

\begin{displaymath}{\bf y}^{0} = {\bf y} - {\bf X}\hat{\bf b}. \end{displaymath}

Step 3.
Apply Method 1 to ${\bf y}^{0}$ and adjust the coefficients of $\sigma_{0}^{2}$ in the expectations of all quadratic forms to allow for the change from $V({\bf y})$ to $V({\bf y}^{0})$. A small code sketch of these three steps is given below.
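The sketch below (Python/numpy, not part of the original notes) runs the three steps mechanically on an invented data set. For brevity it uses a Moore-Penrose generalized inverse; Henderson's prescription in the theoretical development fixes exactly which rows and columns are zeroed, and the expectations of the Method 1 quadratic forms must still be corrected as described there.

\begin{verbatim}
import numpy as np

# Hypothetical mixed model: y = mu + fixed factor A (2 levels) + random B (3 levels) + e.
y  = np.array([10.0, 12.0, 9.0, 14.0, 11.0, 13.0, 8.0, 15.0])
X  = np.column_stack([np.ones(8),
                      np.array([1, 1, 1, 1, 0, 0, 0, 0.0]),   # A1
                      np.array([0, 0, 0, 0, 1, 1, 1, 1.0])])  # A2
Zb = np.zeros((8, 3)); Zb[np.arange(8), [0, 1, 2, 0, 1, 2, 0, 1]] = 1.0
W  = np.hstack([X, Zb])

# Step 1: solve the OLS equations with a generalized inverse.
beta  = np.linalg.pinv(W.T @ W) @ W.T @ y
b_hat = beta[:3]                               # fixed-effect part of the solution

# Step 2: adjust the observations (or the right hand sides) for the fixed effects.
y0 = y - X @ b_hat

# Step 3: apply the Method 1 quadratic forms to y0 (their sigma_0^2 coefficients
# must then be adjusted, as shown in the theoretical development).
N   = len(y)
mss = (y0.sum() ** 2) / N
bss = np.sum((Zb.T @ y0) ** 2 / Zb.sum(axis=0))
print(mss, bss)
\end{verbatim}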

Theoretical Development

Let

\begin{displaymath}{\bf W} = ( \ {\bf X} \ \ {\bf Z} \ ) =
( \ {\bf X}_{b} \ \ {\bf X}_{a} \ \ {\bf Z}_{a} \ \
{\bf Z}_{b} \ ) \end{displaymath}

such that the following rank requirements are met,

\begin{eqnarray*}{\rm r}( \ {\bf X}_{a} \ \ {\bf Z}_{a} \ ) & = &
{\rm r}( \ {\bf X} \ \ {\bf Z} \ ), \\
{\rm r}( \ {\bf X}_{a} \ ) & = & {\rm r}( \ {\bf X} \ ) - 1, \ \ {\rm and} \\
{\rm r}( \ {\bf Z}_{a} \ ) & = & {\rm r}( \ {\bf Z} \ ).
\end{eqnarray*}


Thus, \({\rm r}({\bf X} \ \ {\bf Z} \ ) = {\rm r}({\bf X}) +
{\rm r}({\bf Z}) - 1.\) This implies that the fewest possible rows and columns of ${\bf Z}$ are set to zero in obtaining a generalized inverse of the OLS coefficient matrix. Then

\begin{displaymath}{\bf W}'{\bf W} = \left( \begin{array}{llll}
{\bf X}_{b}^{'}{\bf X}_{b} & {\bf X}_{b}^{'}{\bf X}_{a} & {\bf X}_{b}^{'}{\bf Z}_{a} & {\bf X}_{b}^{'}{\bf Z}_{b} \\
{\bf X}_{a}^{'}{\bf X}_{b} & {\bf X}_{a}^{'}{\bf X}_{a} & {\bf X}_{a}^{'}{\bf Z}_{a} & {\bf X}_{a}^{'}{\bf Z}_{b} \\
{\bf Z}_{a}^{'}{\bf X}_{b} & {\bf Z}_{a}^{'}{\bf X}_{a} & {\bf Z}_{a}^{'}{\bf Z}_{a} & {\bf Z}_{a}^{'}{\bf Z}_{b} \\
{\bf Z}_{b}^{'}{\bf X}_{b} & {\bf Z}_{b}^{'}{\bf X}_{a} & {\bf Z}_{b}^{'}{\bf Z}_{a} & {\bf Z}_{b}^{'}{\bf Z}_{b} \\
\end{array} \right). \end{displaymath}

A generalized inverse is obtained by setting the rows and columns corresponding to ${\bf X}_{b}$ and ${\bf Z}_{b}$ to zero and inverting the remaining elements. Let

\begin{displaymath}({\bf W}'{\bf W})^{-} =
\left( \begin{array}{llll}
{\bf0} & {\bf0} & {\bf0} & {\bf0} \\
{\bf0} & {\bf P} & {\bf S} & {\bf0} \\
{\bf0} & {\bf S}' & {\bf M} & {\bf0} \\
{\bf0} & {\bf0} & {\bf0} & {\bf0}
\end{array} \right), \end{displaymath}

then

\begin{displaymath}\left( \begin{array}{lr}
{\bf P} & {\bf S} \\
{\bf S}' & {\bf M}
\end{array} \right) \left( \begin{array}{lr}
{\bf X}_{a}^{'}{\bf X}_{a} & {\bf X}_{a}^{'}{\bf Z}_{a} \\
{\bf Z}_{a}^{'}{\bf X}_{a} & {\bf Z}_{a}^{'}{\bf Z}_{a}
\end{array} \right) = \left( \begin{array}{lr}
{\bf I} & {\bf0} \\
{\bf0} & {\bf I}
\end{array} \right). \end{displaymath}

The solutions are

\begin{displaymath}\left( \begin{array}{l}
\hat{\bf b}_{b} \\
\hat{\bf b}_{a} \\
\hat{\bf u}_{a} \\
\hat{\bf u}_{b}
\end{array} \right) = \left( \begin{array}{l}
{\bf0} \\
{\bf PX}_{a}^{'}{\bf y} + {\bf SZ}_{a}^{'}{\bf y} \\
{\bf S}'{\bf X}_{a}^{'}{\bf y} + {\bf MZ}_{a}^{'}{\bf y} \\
{\bf0}
\end{array} \right). \end{displaymath}

An unbiased estimate of the residual variance must be obtained, such as

\begin{displaymath}\hat{\sigma}_{0}^{2} = ( \ {\bf y}'{\bf y} - \hat{\bf b}_{a}^{'}{\bf X}_{a}^{'}{\bf y} - \hat{\bf u}_{a}^{'}{\bf Z}_{a}^{'}{\bf y} \ ) \
/ \ ( \ N - {\rm r}({\bf W}) \ ). \end{displaymath}

Each observation must now be adjusted for $\hat{\bf b}$, as

\begin{eqnarray*}{\bf y}^{0} & = & {\bf y} - {\bf X}\hat{\bf b} \\
& = & {\bf y} - {\bf X}_{b}\hat{\bf b}_{b} - {\bf X}_{a} \hat{\bf b}_{a} \\
& = & {\bf y} -{\bf X}_{a}\hat{\bf b}_{a}.
\end{eqnarray*}


Each observation does not need to be adjusted separately. Note that

\begin{displaymath}{\bf Z}_{j}^{'}{\bf y}^{0} = {\bf Z}_{j}^{'}{\bf y} -
{\bf Z}_{j}^{'}{\bf X}_{a}\hat{\bf b}_{a}, \end{displaymath}

for the jth factor in the model, and that ${\bf Z}_{j}^{'}{\bf y}$ and ${\bf Z}_{j}^{'}{\bf X}_{a}$ are available from the LS equations. Hence, only the right hand sides of the LS equations need be adjusted.

To apply Method 1 to ${\bf y}^{0}$ requires that the model for ${\bf y}^{0}$ be completely random except for a common mean effect. The proof of this is given by Henderson et al. (1974). The model equation for ${\bf y}^{0}$ is

\begin{displaymath}{\bf y}^{0} = {\bf y} - {\bf X}_{a}\hat{\bf b}_{a} = {\bf Ty} \end{displaymath}

where

\begin{displaymath}{\bf T} = ( \ {\bf I} - {\bf X}_{a}{\bf PX}_{a}^{'} -
{\bf X}_{a}{\bf SZ}_{a}^{'} \ ). \end{displaymath}

Fortunately,

\begin{eqnarray*}{\bf TZ} & = & ( \ {\bf I} - {\bf X}_{a}{\bf PX}_{a}^{'} -
{\bf X}_{a}{\bf SZ}_{a}^{'} \ ) \ {\bf Z} \ = \ {\bf Z}, \\
{\bf TX}_{a} & = & {\bf0}, \ \ \ {\rm and} \\
{\bf TX}_{b} & = & c{\bf 1}
\end{eqnarray*}


So that the model for ${\bf y}^{0}$ is

\begin{eqnarray*}{\bf y}^{0} & = & {\bf Ty} = {\bf TXb} + {\bf TZu} + {\bf Te} \\
& = & c{\bf 1} + {\bf Zu} + {\bf Te}.
\end{eqnarray*}


Proof that ${\bf TXb}$ = $c{\bf 1}$ is also given in Henderson et al. (1974), and this is the key to showing that Method 2 gives unique estimates of variance components regardless of the generalized inverse used to solve the OLS equations.

Consequently,

\begin{eqnarray*}Var({\bf y}^{0}) & = & Var( \ {\bf Zu} + {\bf Te} \ ) \\
& = & \sum_{i=1}^{s}{\bf Z}_{i}{\bf Z}_{i}^{'} \
\sigma_{i}^{2} + {\bf TT}'\sigma_{0}^{2}.
\end{eqnarray*}


The variance-covariance matrix of ${\bf y}^{0}$ is slightly different from that of the original observation vector ${\bf y}$. This difference causes changes to the expectations of the Method 1 quadratic forms where $\sigma_{0}^{2}$ is involved.

The coefficient of $\sigma_{0}^{2}$ in the expectation of $(1/N){\bf y}^{0'}{\bf 11}'{\bf y}^{0}$ is

\begin{eqnarray*}(1/N) \ tr({\bf 11}'{\bf TT}') & = &
(1/N) \ tr({\bf 1}'{\bf T...
.../N) \ tr[{\bf P}({\bf X}_{a}^{'}{\bf 1})
({\bf 1}'{\bf X}_{a})]
\end{eqnarray*}


Thus, we have the usual coefficient plus an additional term. Similarly, the coefficient of $\sigma_{0}^{2}$ in the expectation of ${\bf y}^{0'}{\bf Z}_{j}({\bf Z}_{j}^{'}{\bf Z}_{j})^{-1}
{\bf Z}_{j}^{'}{\bf y}^{0}$ where ${\bf Z}_{j}$ is the design matrix of the jth random factor of the model, is

\begin{eqnarray*}tr[{\bf Z}_{j}({\bf Z}_{j}^{'}{\bf Z}_{j})^{-1}
{\bf Z}_{j}^{'}...
...\bf Z}_{j}^{'}{\bf Z}_{j})^{-1}
{\bf Z}_{j}^{'}{\bf X}_{a}] \\
\end{eqnarray*}


The coefficients of the other variance components in the expectations are the same as in the usual Method 1.

An Example

Below are the LS equations for a two-way classification model with one fixed factor and one random factor. Let factor A be the fixed factor and let factor B be random. The total sum of squares of the observations was 1,979.

\begin{displaymath}\left( \begin{array}{r\vert rrr\vert rrrr}
16 & 4 & 6 & 6 & 3...
...
49\\ 66\\ 58\\ \hline
32\\ 51\\ 51\\ 39
\end{array} \right)
\end{displaymath}

Using the partitioning given in the previous section, then let

\begin{eqnarray*}\hat{\bf b}_{b}^{'} & = & [ \ \hat{\mu} \ \ \ \hat{A}_{1} \ ], \\
\hat{\bf b}_{a}^{'} & = & [ \ \hat{A}_{2} \ \ \hat{A}_{3} \ ], \ \ {\rm and} \\
\hat{\bf u}_{a}^{'} & = & [ \ \hat{B}_{1} \ \ \hat{B}_{2} \ \
\hat{B}_{3} \ \ \hat{B}_{4} \ ]
\end{eqnarray*}


and similarly for the design matrices X and Z. The solutions are

\begin{eqnarray*}\left( \begin{array}{l}
\hat{\mu}\\
\hat{A}_{1} \\
\hat{A}_{2...
...{l}
11.8253 \\ 14.2980 \\ 11.4384 \\ 10.9988
\end{array} \right)
\end{eqnarray*}


The solutions times their corresponding right hand side elements gives

\begin{displaymath}\hat{\mbox{\boldmath$\beta$ }}'{\bf W}'{\bf y} = 1,912.2422, \end{displaymath}

and an estimate of the residual variance component is

\begin{eqnarray*}\hat{\sigma}_{0}^{2}
& = & ({\bf y}'{\bf y} \ - \ \hat{\mbox{\boldmath$\beta$ }}'{\bf W}'{\bf y})
 \ / \ (N \ - \ {\rm r}({\bf W}))\\
& = & (1,979 \ - \ 1,912.2422) \ / \ (16 \ - \ 6)\\
& = & 6.675782.
\end{eqnarray*}


This estimate of $\hat{\sigma}_{0}^{2}$ is the same for any generalized inverse of ${\bf W}'{\bf W}$.

Now adjust ${\bf W}'{\bf y}$ for the estimates of the fixed effects to give

\begin{eqnarray*}{\bf W}'{\bf y}^{0} & = & {\bf W}'{\bf y} \ - \
{\bf W}'{\bf ...
...68\\ 35.4758\\ 57.1919\\
57.1919\\ 43.9951 \end{array} \right)
\end{eqnarray*}


The necessary quadratic forms are

\begin{eqnarray*}\hat{\sigma}_{0}^{2} & = & 6.675782, \\
(1/N){\bf y}^{0'}{\bf 11}'{\bf y}^{0} & = & (193.8547)^{2}/16 \ = \ 2,348.7276, \\
{\bf y}^{0'}{\bf Z}_{B}({\bf Z}_{B}^{'}{\bf Z}_{B})^{-1}{\bf Z}_{B}^{'}{\bf y}^{0}
& = & (35.4758)^{2}/3 + (57.1919)^{2}/4 \\
& & + (57.1919)^{2}/5 + (43.9951)^{2}/4 = 2,375.3139,
\end{eqnarray*}


where ${\bf Z}_{B}$ is the design matrix for factor B. The expectations of the next two quadratic forms are

\begin{eqnarray*}E[ (1/N){\bf y}^{0'}{\bf 11}'{\bf y}^{0}] & = &
16 \ c^{2} \ + \ 4.125 \ {\sigma}_{B}^{2} \ + \
(1 + k_{1}) \ {\sigma}_{0}^{2} \\
E[ {\bf y}^{0'}{\bf Z}_{B}({\bf Z}_{B}^{'}{\bf Z}_{B})^{-1}
{\bf Z}_{B}^{'}{\bf y}^{0}] & = &
16 \ c^{2} \ + \ 16 \ {\sigma}_{B}^{2} \ + \
(4 + k_{2}) \ {\sigma}_{0}^{2}
\end{eqnarray*}


where

\begin{eqnarray*}{k}_{1} & = & (1/N) \ {\rm tr}({\bf P} \
{\bf X}_{a}^{'}{\bf 11...
...{B}
({\bf Z}_{B}^{'}{\bf Z}_{B})^{-1}{\bf Z}_{B}^{'}{\bf X}_{a})
\end{eqnarray*}


In this example,

\begin{eqnarray*}{\bf P} &=&
\left( \begin{array}{rr}
.532189 & .291845 \\
.291...
...\\ 3&1
\end{array} \right)
\end{array} \right)\\
& = & 3.81239.
\end{eqnarray*}


The resulting equations to solve are

\begin{displaymath}\left( \begin{array}{rrr}
0& 0.0000 &1.0000 \\
16& 4.1250 &...
...rray}{r}
6.6758\\ 2,348.7276\\ 2,375.3139
\end{array} \right) \end{displaymath}

Let

\begin{displaymath}{\bf K}' \ = \ \left( \begin{array}{lrr}
1 & 0 & 0\\ 0 &-1 & 1
\end{array} \right), \end{displaymath}

then ${\bf K}'{\bf F}\hat{\mbox{\boldmath$\sigma$ }} \ = \
{\bf K}'{\bf w},$ which gives

\begin{displaymath}\left( \begin{array}{rr}
0.0000 & 1.0000 \\ 11.8750 & 3.3194
\end{array} \right) \left( \begin{array}{l}
\hat{{\sigma}}_{B}^{2} \\ \hat{{\sigma}}_{0}^{2}
\end{array} \right) =
\left( \begin{array}{r}
6.6758\\ 26.5863 \end{array} \right), \end{displaymath}

giving estimates of $\hat{{\sigma}}_{0}^{2} = 6.6758$, and $\hat{{\sigma}}_{B}^{2} = .37275$.

Suppose that $\hat{{\mu}}$ and $\hat{A}_{3}$ were set to zero rather than $\hat{{\mu}}$ and $\hat{A}_{1}$ in obtaining solutions to the OLS equations. The reader should verify the following results.

\begin{eqnarray*}{\bf P} & = &
\left( \begin{array}{rr}
.43654 & .14470 \\
.144...
...{0}^{2} & = & 6.6758, \ \ \
\hat{{\sigma}}_{B}^{2} \ = \ .37275.
\end{eqnarray*}


Problems

1.
Below are the OLS equations for a 2-way model with factors A (4 levels) and B (5 levels). Consider factor A as fixed and factor B as random. The total sum of squares of the observations was 43,000. Estimate the variance components by Method 2.

\begin{displaymath}\left( \begin{array}{rrrrrrrrrr}
202 & 46 & 34 & 80 & 42 & 2...
... 510 \\ 398 \\ 334 \\ 553 \\ 739 \\
772 \end{array} \right). \end{displaymath}

2.
Prove that ${\bf TXb}=c{\bf 1}$. (Not Easy)

3.
Method 2 was often done incorrectly in the literature from 1953 to 1970. Researchers would estimate the fixed effects as

\begin{displaymath}{\bf b}^{o} = ({\bf X}'{\bf X})^{-} {\bf X}'{\bf y}, \end{displaymath}

and then use this estimate to adjust the observation vector, ${\bf y}$. Derive the correct expectations of \( (1/N){\bf y}^{a'}
{\bf 11}'{\bf y}^{a} \) and \( {\bf y}^{a'}{\bf Z}_{1}
({\bf Z}'_{1}{\bf Z}_{1})^{-1}{\bf Z}'_{1}{\bf y}^{a} \) where

\begin{displaymath}{\bf y}^{a} = {\bf y} - {\bf Xb}^{o}, \end{displaymath}

and s=1.

Henderson Method 3
L. R. Schaeffer, April 27, 1999

Method 2 does not allow models to contain interactions between fixed and random factors or nesting of random factors within fixed factors. Henderson's Method 3 is capable of handling general mixed models, but computationally the method is more demanding than Methods 1 or 2. Method 3 has been called the 'fitting constants' method because reductions in sums of squares due to fitting submodels of the full model are used. The method is unbiased and translation invariant, but it is not uniquely defined in the sense that more reductions can be computed than are necessary to estimate the variance components.

Method 3 has been shown in some cases to provide more accurate estimates of variance components than Methods 1 or 2. In special design and model situations, Method 3 is equivalent to Methods 1 or 2. The reason that Method 3 gives greater accuracy seems to be due to fitting submodels such that two or more random factors are simultaneously considered rather than singly as would be the case in Methods 1 or 2.

The Calculations

For a general model with s random factors there are $2^{s}$ possible submodels and reductions that could be computed, but only s + 2 reductions are needed to estimate the unknown variances. Specific guidelines for choosing the best (most accurate) set of s + 2 reductions out of the $2^{s}$ possible reductions are not available. The following rules are recommended to uniquely define Method 3 in all situations and to hopefully provide the most accurate set of reductions. The necessary quadratic forms are a) ${\bf y}'{\bf y}$, the total sum of squares, b) the reduction due to the full model, where the least squares equations are

\begin{displaymath}{\bf W}'{\bf W} \ \hat{{\mbox{\boldmath$\beta$ }}}
= {\bf W}'{\bf y}, \end{displaymath}

then

\begin{eqnarray*}{\rm R}({\bf b}, \ {\bf u}_{1}, \ \ldots \ ,{\bf u}_{s})
& = & {\rm R}({\mbox{\boldmath$\beta$ }}) \ = \
\hat{{\mbox{\boldmath$\beta$ }}}'{\bf W}'{\bf y}, \ \ {\rm and} \\
\hat{\sigma}_{0}^{2} & = & ({\bf y}'{\bf y} - {\rm R}
({\mbox{\boldmath$\beta$ }}))/(N - r ({\bf W})),
\end{eqnarray*}


as in Method 2, and c) reductions due to omitting one random factor at a time from the model. Let

\begin{displaymath}{\mbox{\boldmath$\beta$ }}_{j} =
({\bf b}, \ {\bf u}_{1}, \ \ldots \ , {\bf u}_{j-1}, \
{\bf u}_{j+1}, \ \ldots \ , {\bf u}_{s})\end{displaymath}

with ${\bf u}_{j}$ omitted from the model and let ${\bf W}_{j}$ be the corresponding design matrix with ${\bf Z}_{j}$ omitted. Then the reduction due to this submodel is

\begin{displaymath}{\rm R}({\mbox{\boldmath$\beta$ }}_{j}) =
\hat{{\mbox{\boldmath$\beta$ }}}_{j}'{\bf W}_{j}^{'}
{\bf y}.\end{displaymath}

There are s such reductions with one random factor omitted at a time.
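The reductions themselves are ordinary least squares computations. The sketch below (Python/numpy; the small design, level assignments, and data are invented for illustration) computes the s + 2 quadratic forms for a model with one fixed factor and two random factors.

\begin{verbatim}
import numpy as np

def reduction(y, *design_blocks):
    """R(.) = solution' * RHS for the submodel whose design matrices are given."""
    W = np.hstack(design_blocks)
    sol = np.linalg.pinv(W.T @ W) @ (W.T @ y)
    return sol @ (W.T @ y)

y  = np.array([12.0, 9.0, 15.0, 11.0, 8.0, 14.0, 10.0, 13.0])
X  = np.column_stack([np.ones(8), [1, 1, 1, 1, 0, 0, 0, 0.0]])
Z1 = np.zeros((8, 3)); Z1[np.arange(8), [0, 1, 2, 0, 1, 2, 0, 1]] = 1.0
Z2 = np.zeros((8, 2)); Z2[np.arange(8), [0, 1, 0, 1, 0, 1, 0, 1]] = 1.0

tss       = y @ y                       # total sum of squares
r_full    = reduction(y, X, Z1, Z2)     # R(b, u1, u2): full model
r_drop_u1 = reduction(y, X, Z2)         # R(b, u2): first random factor omitted
r_drop_u2 = reduction(y, X, Z1)         # R(b, u1): second random factor omitted
sigma0_hat = (tss - r_full) / (len(y) -
              np.linalg.matrix_rank(np.hstack([X, Z1, Z2])))
print(tss, r_full, r_drop_u1, r_drop_u2, sigma0_hat)
\end{verbatim}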

To illustrate, assume a model with three random factors A, B, and C, then the following reductions should be used. Notice that each reduction contains all of the fixed factors in the model.

\begin{eqnarray*}{\bf y}'{\bf y}, & & \\
{\rm R}({\bf b}, \ A, \ B, \ C), & & \\
{\rm R}({\bf b}, \ A, \ B), & & \\
{\rm R}({\bf b}, \ A, \ C), & & \\
{\rm R}({\bf b}, \ B, \ C). & &
\end{eqnarray*}


Now assume a model with two random factors and interaction, then use

\begin{eqnarray*}{\bf y}'{\bf y}, & & \\
{\rm R}({\bf b}, \ A, \ B, \ AB), & & \\
{\rm R}({\bf b}, \ A, \ B), & & \\
{\rm R}({\bf b}, \ A), & & \\
{\rm R}({\bf b}, \ B). & &
\end{eqnarray*}


Note that when interactions are in the model then

\begin{eqnarray*}{\rm R({\bf b}, \ A, \ AB)} & = & {\rm R({\bf b}, \ B, \ AB)} \\
& = & {\rm R({\bf b}, \ A, \ B, \ AB)} \\
& = & {\rm R({\bf b}, \ AB)},
\end{eqnarray*}


and therefore some slight differences in the choice of reductions are necessary when interactions are present in the model.

Suppose that in a two random factor model, factor D is a fixed factor, but that there is an interaction between D and random factor A, but not with random factor B. Then use

\begin{eqnarray*}{\bf y}'{\bf y}, & & \\
{\rm R}({\bf b}, \ A, \ B, \ AD), & & \\
{\rm R}({\bf b}, \ A, \ B), & & \\
{\rm R}({\bf b}, \ A, \ AD), & & \\
{\rm R}({\bf b}, \ B). & & \end{eqnarray*}


In this case

\begin{displaymath}{\rm R}({\bf b}, B, AD) = {\rm R}({\bf b}, A, B, AD) \end{displaymath}

and

\begin{displaymath}{\rm R}({\bf b}, A, AD) = {\rm R}({\bf b}, \ AD). \end{displaymath}

Expectations of Reductions

Let

\begin{eqnarray*}{\bf C}_{j} & = & ({\bf W}_{j}^{'}{\bf W}_{j})^{-}, \ {\rm then} \\
{\rm R}({\mbox{\boldmath$\beta$ }}_{j}) & = &
\hat{{\mbox{\boldmath$\beta$ }}}_{j}'{\bf W}_{j}^{'}{\bf y} \\
& = & {\bf y}'{\bf W}_{j}{\bf C}_{j}{\bf W}_{j}^{'}{\bf y}.
\end{eqnarray*}


The expectation is then

\begin{displaymath}E[{\rm R}({\mbox{\boldmath$\beta$ }}_{j})] = {\bf b}'{\bf X}'
{\bf W}_{j}{\bf C}_{j}{\bf W}_{j}^{'}{\bf Xb} \ + \ \sum_{i=0}^{s}
tr({\bf W}_{j}{\bf C}_{j}{\bf W}_{j}^{'}{\bf V}_{i}){\sigma}_{i}^{2}.
\end{displaymath}

From the theory of generalized inverses,

\begin{displaymath}{\bf W}_{j}{\bf C}_{j}{\bf W}_{j}^{'}{\bf W}_{j} = {\bf W}_{j} \end{displaymath}

and consequently,

\begin{displaymath}{\bf b}'{\bf X}'{\bf W}_{j}{\bf C}_{j}{\bf W}_{j}^{'}{\bf Xb} =
{\bf b}'{\bf X}'{\bf Xb} \end{displaymath}

and

\begin{eqnarray*}tr({\bf W}_{j}{\bf C}_{j}{\bf W}_{j}^{'}{\bf Z}_{i}{\bf Z}_{i}^{'})
& = & tr({\bf Z}_{i}^{'}{\bf Z}_{i}) \ = \ N, \ \
{\rm if} \ {\bf Z}_{i} \ {\rm was \ in} \ {\bf W}_{j}, \ {\rm and} \\
& = & tr[{\bf C}_{j}({\bf W}_{j}^{'}{\bf Z}_{i})({\bf Z}_{i}^{'}{\bf W}_{j})], \ \
{\rm if} \ {\bf Z}_{i} \ {\rm was \ not \ in} \
{\bf W}_{j}. \\
\end{eqnarray*}


Also, the coefficient of $\sigma_{0}^{2}$ is

\begin{displaymath}tr({\bf W}_{j}{\bf C}_{j}{\bf W}_{j}^{'}) =
tr({\bf C}_{j}{\bf W}_{j}^{'}{\bf W}_{j}), \end{displaymath}

but we know that ${\bf C}_{j}{\bf W}_{j}^{'}{\bf W}_{j}$ is idempotent and the trace of an idempotent matrix is equal to the rank of the matrix, so that $tr({\bf C}_{j}{\bf W}_{j}^{'}{\bf W}_{j}) =
{\rm r}({\bf W}_{j})$. Therefore,

\begin{displaymath}E[{\rm R}({\mbox{\boldmath$\beta$ }}_{j})] = {\bf b}'{\bf X}'{\bf Xb}
 \ + \ {\rm r}({\bf W}_{j}) \ {\sigma}_{0}^{2} \ + \ \sum_{i=1}^{s}
tr[{\bf C}_{j}({\bf W}'_{j}{\bf Z}_{i})({\bf Z}'_{i}{\bf W}_{j})]
{\sigma}_{i}^{2}. \end{displaymath}

An Example

Consider a three factor model with factors A, B, and C where A is a fixed factor and B and C are random factors, and the LS equations are as given below:

\begin{eqnarray*}\left( \begin{array}{r\vert rr\vert rrr\vert rrrrr}
71&40&31&28...
...\\ 115 \\ \hline
67 \\ 36 \\ 62 \\ 73 \\ 56
\end{array} \right)
\end{eqnarray*}


The reductions and their expectations are

\begin{eqnarray*}{\bf y}'{\bf y} & = & 2,834, \\
E({\bf y}'{\bf y}) & = & {\bf...
...^{2} \\
& & + {\rm r}({\bf X} \ {\bf Z}_{C}) \
\sigma_{0}^{2}.
\end{eqnarray*}


Summarizing the above,

\begin{displaymath}E \left( \begin{array}{l}
{\bf y}'{\bf y} \\ {\rm R}({\bf b},...
...}^{2} \\ \sigma_{C}^{2} \\
\sigma_{0}^{2}
\end{array} \right)
\end{displaymath}

Taking differences from ${\bf y}'{\bf y}$ eliminates the ${\bf b}'{\bf X}'{\bf Xb}$ term from all quadratic forms and gives

\begin{displaymath}\left( \begin{array}{llr}
0. & 0. & 63. \\
0. & 48.9905 & ...
...88.880352 \\ 1,506.449070 \\ 1,547.091675
\end{array} \right),
\end{displaymath}

and the solutions are $\hat{\sigma}_{B}^{2} = 0.245302$, $\hat{\sigma}_{C}^{2} = -1.570986$, and $\hat{\sigma}_{0}^{2} = 23.633021$. Notice the zeros in the above expectation matrix as a result of taking differences from ${\bf y}'{\bf y}$. This helps to reduce the sampling variances of the estimated variance components over other possible sets of reductions that could be used in Method 3.

Problems

1.
Below are feed consumption data on the progeny of 3 sires and 10 dams over a ten-day period after weaning. Age at weaning is known to influence feed consumption during this period.

Table of feed consumption data.
Sire   Dam   Sex   Age at Weaning (d)   Feed Intake (mouthfuls)
1 1 1 185 775
1 1 1 185 690
1 2 2 190 618
1 2 1 190 631
1 3 2 200 615
1 3 1 200 704
1 4 2 210 573
         
2 5 1 215 662
2 5 1 215 600
2 6 1 170 526
2 6 2 170 485
2 6 2 170 581
2 7 2 175 494
         
3 8 1 206 660
3 8 2 206 511
3 9 2 180 500
3 9 1 180 450
3 10 2 194 470
3 10 1 194 505
3 10 1 194 516
3 10 2 194 448

Assume the model is

\begin{displaymath}y_{ijkl} = \mu \ + \ {\rm b(Age)} \ + \ X_{i} \ + \ S_{j} \ + \
D_{jk} \ + \ e_{ijkl} \end{displaymath}

where $y_{ijkl}$ is feed intake, $\mu$ is an overall mean, b(Age) is a covariate of feed intake on age at weaning, $X_{i}$ is a sex of offspring effect (fixed), $S_{j}$ is a random sire effect, $D_{jk}$ is a random dam effect, and $e_{ijkl}$ is a random residual effect.

Estimate the variance components by Henderson's Method 3. You might want to take deviations of age at weaning from the overall average weaning age of 200 d to avoid rounding errors.

2.
Suppose we have data to be analyzed by the following model

\begin{displaymath}y_{ijklm} = \mu \ + \ A_{i} \ + \ B_{ij} \ + \ C_{k} \ + \
D_{kl} \ + \ e_{ijklm}, \end{displaymath}

where the fixed factors of the model are $\mu$ and $A_{i}$. Write all of the possible reductions for Method 3 that could be used with this model, and mark those that you would use to estimate the variances.

Other Simple Unbiased Methods
L. R. Schaeffer, April 27, 1999

Absorption of Fixed Effects

The absorption of all fixed effects in the LS equations into the equations for the random factors essentially adjusts the right hand sides of the random effects for the fixed effects, and consequently any quadratic forms involving the absorbed right hand sides are translation invariant. Numerous methods may therefore be defined by using quadratic forms of the absorbed right hand sides. The LS equations are

\begin{displaymath}\left( \begin{array}{lr}
{\bf X}'{\bf X} & {\bf X}'{\bf Z} \\
{\bf Z}'{\bf X} & {\bf Z}'{\bf Z}
\end{array} \right) \left( \begin{array}{c}
\hat{\bf b}\\
\hat{\bf u}
\end{array} \right) = \left( \begin{array}{c}
{\bf X}'{\bf y}\\
{\bf Z}'{\bf y}
\end{array} \right). \end{displaymath}

Let

\begin{displaymath}{\bf M} = {\bf I} - {\bf X}({\bf X}'{\bf X})^{-}{\bf X}', \end{displaymath}

and partition Z into $[ \ {\bf Z}_{1} \ \ {\bf Z}_{2} \ \ldots \
{\bf Z}_{s} \ ]$, then the absorbed random effects equations are

\begin{displaymath}\left( \begin{array}{cccc}
{\bf Z}_{1}^{'}{\bf MZ}_{1} & {\bf Z}_{1}^{'}{\bf MZ}_{2} & \ldots & {\bf Z}_{1}^{'}{\bf MZ}_{s} \\
{\bf Z}_{2}^{'}{\bf MZ}_{1} & {\bf Z}_{2}^{'}{\bf MZ}_{2} & \ldots & {\bf Z}_{2}^{'}{\bf MZ}_{s} \\
\vdots & \vdots & & \vdots \\
{\bf Z}_{s}^{'}{\bf MZ}_{1} & {\bf Z}_{s}^{'}{\bf MZ}_{2} & \ldots & {\bf Z}_{s}^{'}{\bf MZ}_{s}
\end{array} \right) \left( \begin{array}{c}
\hat{\bf u}_{1} \\ \hat{\bf u}_{2} \\ \vdots \\ \hat{\bf u}_{s}
\end{array} \right) = \left( \begin{array}{c}
{\bf Z}_{1}^{'}{\bf My} \\ {\bf Z}_{2}^{'}{\bf My} \\
\vdots \\ {\bf Z}_{s}^{'}{\bf My} \\
\end{array} \right). \end{displaymath}
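Absorption is simple to express in code. The sketch below (Python/numpy; the design and data are hypothetical and not related to the example that follows) forms ${\bf M}$, the absorbed coefficient matrix ${\bf Z}'{\bf MZ}$, and the absorbed right hand sides ${\bf Z}'{\bf My}$.

\begin{verbatim}
import numpy as np

# Hypothetical design: one fixed factor (columns of X) and two random factors.
X  = np.column_stack([np.ones(8), [1, 1, 1, 1, 0, 0, 0, 0.0]])
Z1 = np.zeros((8, 3)); Z1[np.arange(8), [0, 1, 2, 0, 1, 2, 0, 1]] = 1.0
Z2 = np.zeros((8, 2)); Z2[np.arange(8), [0, 1, 0, 1, 0, 1, 0, 1]] = 1.0
y  = np.array([12.0, 9.0, 15.0, 11.0, 8.0, 14.0, 10.0, 13.0])
Z  = np.hstack([Z1, Z2])

# M = I - X (X'X)^- X' projects out the fixed effects; MX = 0.
M = np.eye(len(y)) - X @ np.linalg.pinv(X.T @ X) @ X.T

ZMZ = Z.T @ M @ Z        # absorbed coefficient matrix
r   = Z.T @ M @ y        # absorbed right hand sides; E(r) = 0
print(np.allclose(M @ X, 0.0))   # True, so quadratic forms of r are translation invariant
print(ZMZ.round(3))
print(r.round(3))
\end{verbatim}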

To illustrate various methods, suppose we have the following LS equations for a model with two random factors (C and D) and two fixed factors (A and B).

\begin{displaymath}\left( \begin{array}{rrrrr\vert rrrrrrrrr}
200 & 150 & 50 & 7...
... \\ 2230 \\ 9040 \\ 8135 \\ 2405 \\ 1545
\end{array} \right). \end{displaymath}

Absorbing the fixed effects $( {\mu} \ {A}_{1} \ {A}_{2} \
{B}_{1} \ {B}_{2} )$ gives (each element is multiplied by 11)

\begin{displaymath}\left( \begin{array}{rrrrrrrrr}
196.5 & -39.5 & -52.0 & -69....
...\ \hat{D}_{2}\\ \hat{D}_{3}\\
\hat{D}_{4}
\end{array} \right) \end{displaymath}


\begin{displaymath}= \left( \begin{array}{r}
4325 \\ 3030 \\ 5485 \\ -3365 \\ -10575 \\ 23255 \\ 7355 \\
-13005 \\ -17605 \end{array} \right). \end{displaymath}

Quadratic Forms Of Absorbed RHS

Let

\begin{eqnarray*}{\bf r}_{j} & = & {\bf Z}'_{j}{\bf My}, \\
\mbox{then} \ \ E({\bf r}_{j}) & = & {\bf Z}'_{j}{\bf M}E({\bf y})
 \ = \ {\bf Z}'_{j}{\bf MXb}, \\
\mbox{but} \ \ {\bf MX} & = & {\bf0}, \\
\mbox{so} \ E({\bf r}_{j}) & = & 0, \mbox{for} \ j = 1 \ \mbox{to} \ s.
\end{eqnarray*}


The variance-covariance matrix of ${\bf Z}'{\bf My}$ is

\begin{eqnarray*}Var({\bf Z}'{\bf My}) & = &
\sum_{i=0}^{s}{\bf Z}'{\bf MV}_{i}{\bf MZ} \ \sigma_{i}^{2} \\
& = & \sum_{i=1}^{s}({\bf Z}'{\bf MZ}_{i})({\bf Z}'_{i}{\bf MZ}) \
\sigma_{i}^{2} \ + \ {\bf Z}'{\bf MZ} \ \sigma_{0}^{2}.
\end{eqnarray*}


Let ${\bf r}' = [ \ {\bf r}_{1}^{'} \ \ {\bf r}_{2}^{'} \ \ldots \
{\bf r}_{s}^{'} \ ]$, then a quadratic form of ${\bf r}$ has expectation

\begin{displaymath}E({\bf r}'{\bf Qr}) =
\sum_{i=1}^{s} tr[{\bf Q}({\bf Z}'{\bf MZ}_{i})({\bf Z}'_{i}{\bf MZ})] \
\sigma_{i}^{2} \ + \ tr[{\bf QZ}'{\bf MZ}] \ \sigma_{0}^{2}. \end{displaymath}

These expectations are easy to calculate when ${\bf Q}$ is a diagonal matrix, in which case only the diagonals of ${\bf Z}'{\bf MZ}$ and $({\bf Z}'{\bf MZ}_{i})({\bf Z}_{i}^{'}{\bf MZ})$ are needed, and these are multiplied by the diagonal elements of ${\bf Q}$. Below are the diagonal elements of the matrices needed to compute the expectations of quadratic forms of the absorbed right hand sides for the example data.
Element of Model $({\bf Z}'{\bf MZ}_{C})({\bf Z}'_{C}{\bf MZ})$ $({\bf Z}'{\bf MZ}_{D})({\bf Z}_{D}^{'}{\bf MZ})$ ${\bf Z}'{\bf MZ}$
C1 404.685950 6.640496 17.863636
C2 1,068.809917 12.400826 28.772727
C3 1,636.016529 34.016529 35.454545
C4 2,309.239669 28.739669 42.590909
C5 801.561984 67.243802 24.954545
D1 13.057851 2,573.921488 42.590909
D2 70.198347 2,664.925620 43.454545
D3 28.611570 731.838843 23.318182
D4 37.173554 863.537190 25.272727
       

MIVQUE-0

The simplest set of quadratic forms of ${\bf r}$ that could be used are ${\bf r}'{\bf Q}_{1}{\bf r}$ and ${\bf r}'{\bf Q}_{2}{\bf r}$, where

\begin{eqnarray*}{\bf Q}_{1} & = & {\rm {\bf diag}}( \ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \ 0 \ 0 \ 0 \ ), \ \ {\rm and} \\
{\bf Q}_{2} & = & {\rm {\bf diag}}( \ 0 \ 0 \ 0 \ 0 \ 0 \ 1 \ 1 \ 1 \ 1 \ ), \\
\end{eqnarray*}


where diag indicates a diagonal matrix whose diagonals are the elements within the parentheses.

Assume that ${\bf y}'{\bf My}$ = 668,160.62 for this example, with ${\rm r}({\bf M})$ = 200 - 3 = 197 degrees of freedom; then

\begin{eqnarray*}{\bf r}'{\bf Q}_{1}{\bf r} & = &
(1/121) \ [ \ 4325^{2} + 3030^{2} + 5485^{2} + (-3365)^{2} + (-10575)^{2} \ ] \\
& = & 1,496,905.8, \\
{\bf r}'{\bf Q}_{2}{\bf r} & = &
(1/121) \ [ \ 23255^{2} + 7355^{2} + (-13005)^{2} + (-17605)^{2} \ ] \\
& = & 8,875,678.5, \\
E({\bf r}'{\bf Q}_{1}{\bf r}) & = & 6220.314 \ \sigma_{C}^{2} + 149.041 \ \sigma_{D}^{2} +
149.636 \ \sigma_{0}^{2}, \\
E({\bf r}'{\bf Q}_{2}{\bf r}) & = & 149.041 \ \sigma_{C}^{2} + 6834.223 \ \sigma_{D}^{2} +
134.636 \ \sigma_{0}^{2}, \\
{\rm and} \ \ E({\bf y}'{\bf My}) & = & 149.636 \ \sigma_{C}^{2} + 134.636 \ \sigma_{D}^{2} +
197 \ \sigma_{0}^{2}. \\
\end{eqnarray*}


The resulting equations are

\begin{displaymath}\left( \begin{array}{rrl}
6220.314 & 149.041 & 149.636 \\
149.041 & 6834.223 & 134.636 \\
149.636 & 134.636 & 197
\end{array} \right) \left( \begin{array}{l}
\hat{\sigma}_{C}^{2} \\ \hat{\sigma}_{D}^{2} \\ \hat{\sigma}_{0}^{2}
\end{array} \right) = \left( \begin{array}{r}
1,496,905.8 \\
8,875,678.5 \\
668,160.62 \end{array} \right) \end{displaymath}

which gives $\hat{\sigma}_{C}^{2} = 152.46, \ \
\hat{\sigma}_{D}^{2} = 1247.65$, and $\hat{\sigma}_{0}^{2} = 2423.19.$

The above procedure is known as MIVQUE-0. MIVQUE-0 stands for Minimum Variance Quadratic Unbiased Estimation using zeros for prior values of the variance components except $\sigma_{0}^{2}$. Calling this procedure MIVQUE is very misleading because the estimates from MIVQUE-0 have been shown to have very large sampling variances. However, because the method is very easy to apply, MIVQUE-0 estimates are frequently used as starting values for other methods of estimation. The estimates from MIVQUE-0 are unbiased and translation invariant, but are not minimum variance.
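The whole MIVQUE-0 recipe is short enough to sketch end to end. The block below (Python/numpy; the small design and data are invented and unrelated to the example above) absorbs the fixed effects, forms the two block-indicator ${\bf Q}$-matrices, computes the quadratic forms and their expectations, and solves for the three components.

\begin{verbatim}
import numpy as np

# Hypothetical design: one fixed factor (X) and two random factors (Z1, Z2).
X  = np.column_stack([np.ones(10), [1, 1, 1, 1, 1, 0, 0, 0, 0, 0.0]])
Z1 = np.zeros((10, 3)); Z1[np.arange(10), [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]] = 1.0
Z2 = np.zeros((10, 2)); Z2[np.arange(10), [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]] = 1.0
y  = np.array([12.0, 9.0, 15.0, 11.0, 8.0, 14.0, 10.0, 13.0, 9.0, 16.0])
Z  = np.hstack([Z1, Z2])
q1, q2 = Z1.shape[1], Z2.shape[1]

M   = np.eye(10) - X @ np.linalg.pinv(X.T @ X) @ X.T
r   = Z.T @ M @ y                        # absorbed right hand sides
yMy = y @ M @ y                          # y'My, with N - r(X) degrees of freedom

# Q1 and Q2 pick out the factor-1 and factor-2 blocks of r.
Q1 = np.diag(np.r_[np.ones(q1), np.zeros(q2)])
Q2 = np.diag(np.r_[np.zeros(q1), np.ones(q2)])

# E(r'Qr) = tr[Q (Z'MZ1)(Z1'MZ)] s1 + tr[Q (Z'MZ2)(Z2'MZ)] s2 + tr[Q Z'MZ] s0
V1 = Z.T @ M @ Z1 @ Z1.T @ M @ Z
V2 = Z.T @ M @ Z2 @ Z2.T @ M @ Z
V0 = Z.T @ M @ Z
def exp_row(Q):
    return [np.trace(Q @ V1), np.trace(Q @ V2), np.trace(Q @ V0)]

F = np.array([exp_row(Q1),
              exp_row(Q2),
              [np.trace(Z1.T @ M @ Z1), np.trace(Z2.T @ M @ Z2),
               10 - np.linalg.matrix_rank(X)]])     # row for E(y'My)
w = np.array([r @ Q1 @ r, r @ Q2 @ r, yMy])
print(np.linalg.solve(F, w))   # MIVQUE-0 estimates [sigma1^2, sigma2^2, sigma0^2]
\end{verbatim}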

Quadratics Using Diagonals Of LS Equations

Many methods could be described by simply changing the definition of ${\bf Q}$. For example, define ${\bf Q}$ as a diagonal matrix with corresponding elements equal to the diagonals of $({\bf Z}_{j}^{'}{\bf Z}_{j})^{-1}$ and the rest to zero. For the example data, let

\begin{eqnarray*}{\bf Q}_{3} & = & {\rm diag}( \ .05 \ .028571 \ .02 \ .015385 \...
....028571)13005^{2} + (.033333)17605^{2} \ ] \\
& = & 200,464.95.
\end{eqnarray*}


The expectations are

\begin{eqnarray*}E({\bf r}'{\bf Q}_{3}{\bf r}) & = &
[ \ (.05)404.69 + (.028571...
...gma_{C}^{2} + 127.363 \ \sigma_{D}^{2} +
2.785 \ \sigma_{0}^{2}
\end{eqnarray*}


The resulting estimates using the above quadratics with ${\bf y}'{\bf My}$ are

\begin{displaymath}\hat{\sigma}_{C}^{2} = 191.38, \ \ \hat{\sigma}_{D}^{2} = 1495.44, \ \ \
{\rm and} \ \ \hat{\sigma}_{0}^{2} = 3366.74. \end{displaymath}

Henderson's Method 4

In 1980 Henderson presented a particular set of ${\bf Q}$ matrices for the preceding methodology, which was formally described in his book (1984). This method has been given various names such as Henderson's New Method, Diagonal MIVQUE, and Henderson's Method 4. Dempfle et al. (1983) showed that the sampling variances of Method 4 estimates were similar to and sometimes better than those of Method 3 estimates.

Method 4 uses a specific set of quadratic forms. The motivation for these particular forms is that they more closely approximate the quadratic forms used with MIVQUE (which will be discussed later). The specific quadratic forms are, for the example data:

${\bf Q}_{5}$
= diagonals of $({\bf Z}_{C}^{'}{\bf MZ}_{C}
+ {\bf I}{k}_{C})^{-2}$, plus zeros for the rows corresponding to other random factors, where kC = a guess of the value of $\sigma_{0}^{2}/\sigma_{C}^{2}$, and
${\bf Q}_{6}$
= diagonals of $({\bf Z}_{D}^{'}{\bf MZ}_{D}
+ {\bf I}{k}_{D})^{-2}$, plus zeros for the rows corresponding to other random factors, where kD = a guess of the value of $\sigma_{0}^{2}/\sigma_{D}^{2}$.
Assume that kC = 16 and that kD = 2, then

\begin{eqnarray*}{\bf Q}_{5} & = & (10^{-4})
[8.720 \ 4.989 \ 3.777 \ 2.913 \ 5.962 \ \ 0 \ \ 0 \ \ 0 \ \ 0], \ \ {\rm and} \\
{\bf Q}_{6} & = & (10^{-4})
[0 \ \ 0 \ \ 0 \ \ 0 \ \ 0 \ \ 5.029 \ 4.84 \ 15.6 \ 13.444].
\end{eqnarray*}


Completing the calculations,

\begin{eqnarray*}{\bf r}'{\bf Q}_{5}{\bf r} & = & 844.85 \\
{\rm and} \ \ {\bf r}'{\bf Q}_{6}{\bf r} & = & 8,088.17, \\
{\rm with} \ \ E({\bf r}'{\bf Q}_{5}{\bf r}) & = & 2.6546 \
\sigma_{C}^{2} + .0733 \ \sigma_{D}^{2} + .0706 \ \sigma_{0}^{2}, \\
{\rm and} \ \ E({\bf r}'{\bf Q}_{6}{\bf r}) & = & .1352 \
\sigma_{C}^{2} + 4.8869 \ \sigma_{D}^{2} + .1128 \
\sigma_{0}^{2}.
\end{eqnarray*}


The estimates, using ${\bf y}'{\bf My}$ and the above quadratics, are

\begin{displaymath}\hat{\sigma}_{C}^{2} = 184.68, \ \ \
\hat{\sigma}_{D}^{2} = 1571.71, \ \ {\rm and} \ \
\hat{\sigma}_{0}^{2} = 3390.71. \end{displaymath}

In all of the methods in this section, ${\bf y}'{\bf My}$ can be replaced by any unbiased estimate of ${\sigma}_{0}^{2}.$
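One reading of the ${\bf Q}_{5}$ and ${\bf Q}_{6}$ definitions, applied element by element to the diagonals of ${\bf Z}'{\bf MZ}$, reproduces the values shown above. The sketch below (Python/numpy, not part of the original notes) uses the diagonal elements from the earlier table with kC = 16 and kD = 2.

\begin{verbatim}
import numpy as np

# Diagonals of Z'MZ for factors C and D, taken from the table above.
diag_ZMZ_C = np.array([17.863636, 28.772727, 35.454545, 42.590909, 24.954545])
diag_ZMZ_D = np.array([42.590909, 43.454545, 23.318182, 25.272727])

def method4_diag(block, k, offset, total):
    """Diagonal of Q: (d + k)^(-2) within the factor's block, zeros elsewhere."""
    q = np.zeros(total)
    q[offset:offset + len(block)] = (block + k) ** -2
    return q

Q5 = np.diag(method4_diag(diag_ZMZ_C, 16.0, 0, 9))
Q6 = np.diag(method4_diag(diag_ZMZ_D,  2.0, 5, 9))
print(np.diag(Q5))   # approx. 1e-4 * [8.72, 4.99, 3.78, 2.91, 5.96, 0, 0, 0, 0]
print(np.diag(Q6))   # approx. 1e-4 * [0, 0, 0, 0, 0, 5.03, 4.84, 15.6, 13.44]
\end{verbatim}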

Problem

Below are parts of the OLS equations for a model with five factors. The first three factors are fixed and the last two factors are random (each with 6 levels). The total sum of squares was 3,143,224 based on 41 observations.

\begin{displaymath}{\bf X}'{\bf X} = \left( \begin{array}{rrrrrrrrrr}
11 & 0 & ...
...\
2 & 5 & 1 & 4 & 4 & 0 & 0 & 0 & 0 & 8 \end{array} \right), \end{displaymath}


\begin{displaymath}{\bf X}'{\bf Z} = \left( \begin{array}{rrrrrrrrrrrr}
4 & 4 &...
...1 & 3 & 1 & 3 & 0 & 0 & 2 & 1 & 2 & 3 & 0
\end{array} \right), \end{displaymath}


\begin{eqnarray*}{\bf Z}'_{1}{\bf Z}_{1} & = & {\rm diag}(7 \ 7 \ 7 \ 7 \ 7 \ 6 ...
...\bf Z}'_{2}{\bf Z}_{2} & = & {\rm diag}(6 \ 7 \ 6 \ 7 \ 8 \ 7 ),
\end{eqnarray*}



\begin{displaymath}{\bf Z}'_{1}{\bf Z}_{2} = \left( \begin{array}{rrrrrr}
4 & 2...
...& 1 & 1 & 1 & 2 \\ 0 & 0 & 0 & 3 & 1 & 2
\end{array} \right), \end{displaymath}

and the right hand sides are

\begin{displaymath}{\bf X}'{\bf y} = \left( \begin{array}{r}
3338 \\ 3365 \\ 40...
...\\ 2441 \\ 2087 \\ 2295 \\
2195 \\ 1776 \end{array} \right), \end{displaymath}


\begin{displaymath}{\bf Z}'_{1}{\bf y} = \left( \begin{array}{r}
2541 \\ 1954 \...
... \\ 2146 \\ 1674 \\ 1669 \\ 1624 \\ 1480
\end{array} \right). \end{displaymath}

Estimate the variances using MIVQUE-0, Henderson's Method 4, and another set of quadratic forms of your own choice.

MIVQUE
L. R. Schaeffer, April 27, 1999

During the late 1960s, statisticians began to develop methods of variance component estimation that had more desirable properties than simply unbiasedness and translation invariance. Two researchers, working independently, derived minimum variance quadratic unbiased estimators (MIVQUE) in 1970. C. R. Rao published four papers in his newly formed journal on MIVQUE for ${\bf y}$ being normally distributed. Rao also derived a method called MINQUE (minimum norm quadratic unbiased estimation) for cases when ${\bf y}$ does not have a normal distribution; when ${\bf y}$ is normally distributed, minimizing the Euclidean norm, as Rao does, is equivalent to minimizing the variance of the quadratic form, which gives MIVQUE. In order to minimize the variance of the quadratic forms, the true unknown variances must be known.

At the same time, L. R. LaMotte, at the University of Kentucky, also derived MIVQUE in bulletins which were not formally published in a statistical journal. Henderson (1973) took the MIVQUE method and derived easier computing formulas based upon solutions to the mixed model equations (MME) and upon the elements of the generalized inverse of the coefficient matrix of the MME.

MIVQUE assumes that ${\bf V}$, the variance-covariance matrix of ${\bf y}$, is known. The variances of the quadratic forms are minimized for this ${\bf V}$. Since ${\bf V}$ is not known in reality, some prior information about ${\bf V}$ is utilized, and consequently the variance of the quadratic forms is only minimized for this prior ${\bf V}$. MIVQUE is unbiased and translation invariant, and if the prior ${\bf V}$ is the same as the true ${\bf V}$, then it also has minimum variance. Thus, MIVQUE is generally not minimum variance in practice, but is only as good as the prior values that are used.

Quadratic Forms

Assume that ${\bf y}$ has a multivariate normal distribution with mean vector ${\bf Xb}$, and variance-covariance matrix ${\bf V}$. Let ${\theta}_{i}$ be the prior information about ${\sigma}_{i}^{2}$ for i = 0 to s, then let ${\bf H}$ be our prior ${\bf V}$ which is

\begin{displaymath}{\bf H} = \sum_{j=1}^{s}{\bf Z}_{j}{\bf Z}'_{j}{\theta}_{j}
+ {\bf I}{\theta}_{0}. \end{displaymath}

A projection matrix is constructed as

\begin{displaymath}{\bf P} = {\bf H}^{-1} - {\bf H}^{-1}{\bf X}({\bf X}'{\bf H}^{-1}
{\bf X})^{-}{\bf X}'{\bf H}^{-1}. \end{displaymath}

A projection matrix is simply a way of absorbing the fixed effects and at the same time accounting for information about ${\bf V}$. Note that ${\bf PX} = {\bf0}$ and that ${\bf PH}$ is idempotent. The MIVQUE quadratic forms are then

\begin{displaymath}{\bf y}'{\bf PZ}_{i}{\bf Z}'_{i}{\bf Py} \ {\rm for} \ i = 1 \
{\rm to} \ s \ {\rm and} \ {\bf y}'{\bf PPy}.
\end{displaymath}

Note that these are quadratic forms of ${\bf Py}$ and since $E({\bf Py})$ = 0, the quadratic forms are translation invariant. Also,

\begin{eqnarray*}\sum_{i=1}^{s}{\bf y}'{\bf PZ}_{i}{\bf Z}'_{i}{\bf Py} \
{\theta}_{i} \ + \ {\bf y}'{\bf PPy} \ {\theta}_{0} & = & {\bf y}'{\bf PHPy} \\
& = & {\bf y}'{\bf Py}.
\end{eqnarray*}


To simplify these quadratic forms notice that

\begin{eqnarray*}{\bf Py} & = & ({\bf H}^{-1} - {\bf H}^{-1}{\bf X}({\bf X}'
{\bf H}^{-1}{\bf X})^{-}{\bf X}'{\bf H}^{-1}){\bf y} \\
& = & {\bf H}^{-1}({\bf y} - {\bf X}\hat{\bf b}), \ \ {\rm where} \\
\hat{\bf b} & = & ({\bf X}'{\bf H}^{-1}{\bf X})^{-}
{\bf X}'{\bf H}^{-1}{\bf y}.
\end{eqnarray*}


Therefore,

\begin{displaymath}{\bf y}'{\bf PZ}_{i}{\bf Z}'_{i}{\bf Py} = ({\bf y} -
{\bf X}\hat{\bf b})'{\bf H}^{-1}{\bf Z}_{i}{\bf Z}'_{i}
{\bf H}^{-1}({\bf y} - {\bf X}\hat{\bf b}).
\end{displaymath}

Henderson (1973) noticed that

\begin{displaymath}\hat{\bf u}_{i} = {\bf G}_{i}{\bf Z}'_{i}{\bf Py} \end{displaymath}

where ${\bf G}_{i}$ is the assumed variance-covariance matrix of ${\bf u}_{i}$ (in this case ${\bf G}_{i} = {\bf I} \ {\theta}_{i})$, and $\hat{\bf u}_{i}$ is the BLUP of ${\bf u}_{i}$, i.e., the solution for ${\bf u}_{i}$ from the MME; then

\begin{eqnarray*}{\bf y}'{\bf PZ}_{i}{\bf Z}'_{i}{\bf Py}
& = & {\bf y}'{\bf PZ}_{i}{\bf G}_{i}{\bf G}_{i}^{-1}{\bf G}_{i}^{-1}
{\bf G}_{i}{\bf Z}'_{i}{\bf Py} \\
& = & \hat{\bf u}'_{i}{\bf G}_{i}^{-1}{\bf G}_{i}^{-1}\hat{\bf u}_{i} \\
& = & \hat{\bf u}'_{i}\hat{\bf u}_{i}/{\theta}_{i}^{2} \ \
{\rm when} \ {\bf G}_{i} = {\bf I} \ {\theta}_{i}. \\
\end{eqnarray*}


Similarly, Henderson showed that

\begin{displaymath}{\bf y}'{\bf PPy} = ({\bf y}'{\bf y} \ - \
\hat{\bf b}'{\bf X}'{\bf y} \ - \
\sum_{i=1}^{s}\hat{\bf u}_{i}^{'}{\bf Z}'_{i}{\bf y} \ - \
\sum_{i=1}^{s}\hat{\bf u}_{i}^{'}\hat{\bf u}_{i}{\alpha}_{i})/
{\theta}_{0}^{2}
\end{displaymath}

where ${\alpha}_{i} = {\theta}_{0}/{\theta}_{i}.$

Thus, under the basic linear model for variance component estimation the necessary quadratic forms for MIVQUE are

\begin{displaymath}\hat{\bf u}'_{i}\hat{\bf u}_{i} \ \ {\rm and} \ \ {\bf y}'{\bf y} \ - \
\hat{\bf b}'{\bf X}'{\bf y} \ - \
\sum_{i=1}^{s}\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf y}.
\end{displaymath}
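These quadratic forms come directly from the mixed model equations. The sketch below (Python/numpy; a hypothetical one-random-factor model with an invented prior ratio) builds the MME, solves them with a generalized inverse, and computes $\hat{\bf u}'\hat{\bf u}$ and ${\bf y}'{\bf y} - \hat{\bf b}'{\bf X}'{\bf y} - \hat{\bf u}'{\bf Z}'{\bf y}$.

\begin{verbatim}
import numpy as np

# Hypothetical model: one fixed factor (X) and one random factor (Z).
X = np.column_stack([np.ones(8), [1, 1, 1, 1, 0, 0, 0, 0.0]])
Z = np.zeros((8, 3)); Z[np.arange(8), [0, 1, 2, 0, 1, 2, 0, 1]] = 1.0
y = np.array([12.0, 9.0, 15.0, 11.0, 8.0, 14.0, 10.0, 13.0])
alpha = 2.0                            # prior ratio theta0/theta1
p, q = X.shape[1], Z.shape[1]

# Mixed model equations: add alpha to the random-factor diagonal block.
W   = np.hstack([X, Z])
lhs = W.T @ W
lhs[p:, p:] += alpha * np.eye(q)
rhs = W.T @ y
C   = np.linalg.pinv(lhs)              # generalized inverse of the MME matrix
sol = C @ rhs
b_hat, u_hat = sol[:p], sol[p:]

quad_u   = u_hat @ u_hat                                  # u_hat' u_hat
quad_res = y @ y - b_hat @ (X.T @ y) - u_hat @ (Z.T @ y)  # y'y - b'X'y - u'Z'y
print(quad_u, quad_res)
\end{verbatim}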

Expectations

The MME are

\begin{displaymath}\left( \begin{array}{lllcl}
{\bf X}'{\bf X} & {\bf X}'{\bf Z}_{1} & {\bf X}'{\bf Z}_{2} & \ldots & {\bf X}'{\bf Z}_{s}\\
{\bf Z}'_{1}{\bf X} & {\bf Z}'_{1}{\bf Z}_{1}+{\bf I}\alpha_{1} & {\bf Z}'_{1}{\bf Z}_{2} & \ldots & {\bf Z}'_{1}{\bf Z}_{s}\\
{\bf Z}'_{2}{\bf X} & {\bf Z}'_{2}{\bf Z}_{1} & {\bf Z}'_{2}{\bf Z}_{2}+{\bf I}\alpha_{2} & \ldots & {\bf Z}'_{2}{\bf Z}_{s}\\
\vdots & \vdots & \vdots & & \vdots \\
{\bf Z}'_{s}{\bf X} & {\bf Z}'_{s}{\bf Z}_{1} & {\bf Z}'_{s}{\bf Z}_{2} & \ldots &
{\bf Z}'_{s}{\bf Z}_{s}+{\bf I}\alpha_{s}\\
\end{array} \right)
\left( \begin{array}{c}
\hat{\bf b} \\ \hat{\bf u}_{1} \\ \hat{\bf u}_{2} \\ \vdots \\ \hat{\bf u}_{s}
\end{array} \right) = \left( \begin{array}{c}
{\bf X}'{\bf y}\\ {\bf Z}'_{1}{\bf y}\\ {\bf Z}'_{2}{\bf y}\\
\vdots \\ {\bf Z}'_{s}{\bf y}
\end{array} \right), \end{displaymath}

and let a generalized inverse of the coefficient matrix be represented as

\begin{displaymath}{\bf C} =
\left( \begin{array}{ccccc}
{\bf C}_{xx} & {\bf C}_{x1} & {\bf C}_{x2} & \ldots & {\bf C}_{xs} \\
{\bf C}_{1x} & {\bf C}_{11} & {\bf C}_{12} & \ldots & {\bf C}_{1s} \\
{\bf C}_{2x} & {\bf C}_{21} & {\bf C}_{22} & \ldots & {\bf C}_{2s} \\
\vdots & \vdots & \vdots & & \vdots \\
{\bf C}_{sx} & {\bf C}_{s1} & {\bf C}_{s2} & \ldots & {\bf C}_{ss}
\end{array} \right) =
\left( \begin{array}{c}
{\bf C}_{x} \\ {\bf C}_{1} \\ {\bf C}_{2} \\ \vdots \\
{\bf C}_{s}
\end{array} \right). \end{displaymath}

Now let ${\bf r} = {\bf W}'{\bf y}$, the right hand sides of the MME, then

\begin{eqnarray*}Var({\bf r}) & = & {\bf W}'Var({\bf y}){\bf W} \\
& = & \sum_{j=1}^{s}{\bf W}'{\bf Z}_{j}{\bf Z}'_{j}{\bf W} \ \sigma_{j}^{2}
 \ + \ {\bf W}'{\bf W} \ \sigma_{0}^{2} \\
& = & \sum_{j=0}^{s}{\bf U}_{j} \sigma_{j}^{2}.
\end{eqnarray*}


Also,

\begin{displaymath}\hat{\bf u}_{i} = {\bf C}_{i}{\bf r}, \end{displaymath}

and,

\begin{displaymath}\hat{\bf u}'_{i}\hat{\bf u}_{i} = {\bf r}'{\bf C}'_{i}
{\bf C}_{i}{\bf r}. \end{displaymath}

The expectation of $\hat{\bf u}'_{i}\hat{\bf u}_{i}$ is

\begin{displaymath}E(\hat{\bf u}'_{i}\hat{\bf u}_{i}) = \ \sum_{j=0}^{s}
tr({\bf C}'_{i}{\bf C}_{i}{\bf U}_{j}) \ \sigma_{j}^{2}.
\end{displaymath}

This expectation can be simplified by noting that (for ${\bf X}$ having full rank)

\begin{displaymath}{\bf CW}'{\bf W} =
\left( \begin{array}{cccc}
{\bf I}_{x} & -{\bf C}_{x1}\alpha_{1} & \ldots & -{\bf C}_{xs}\alpha_{s} \\
{\bf0} & {\bf I}-{\bf C}_{11}\alpha_{1} & \ldots & -{\bf C}_{1s}\alpha_{s} \\
\vdots & \vdots & & \vdots \\
{\bf0} & -{\bf C}_{s1}\alpha_{1} &
\ldots & {\bf I}-{\bf C}_{ss}\alpha_{s}
\end{array} \right) \end{displaymath}

then, for example, for i not equal to j,

\begin{eqnarray*}tr({\bf C}'_{i}{\bf C}_{i}{\bf U}_{j})
& = & tr({\bf C}'_{i}{\bf C}_{i}{\bf W}'{\bf Z}_{j}{\bf Z}'_{j}{\bf W}) \\
& = & tr[ \ ({\bf C}_{i}{\bf W}'{\bf Z}_{j})'({\bf C}_{i}{\bf W}'{\bf Z}_{j}) \ ] \\
& = & tr({\bf C}_{ji}{\bf C}_{ij}) \alpha_{j}^{2}.
\end{eqnarray*}


The expected value of $\hat{\bf u}'_{i}\hat{\bf u}_{i}$ can be written as

\begin{eqnarray*}E(\hat{\bf u}'_{i}\hat{\bf u}_{i})
& = & (q_{i} - 2 \ tr{\bf C}_{ii}\alpha_{i} +
tr{\bf C}_{ii}{\bf C}_{ii}\alpha_{i}^{2}) \ \sigma_{i}^{2} \\
& & + \ \sum_{j \neq i}^{s} tr{\bf C}'_{ij}{\bf C}_{ij}\alpha_{j}^{2}
 \ \sigma_{j}^{2} \\
& & + \ ( \ tr{\bf C}_{ii} - \sum_{j=1}^{s}
tr{\bf C}'_{ij}{\bf C}_{ij}
\alpha_{j} \ ) \ \sigma_{0}^{2}
\end{eqnarray*}


In the same way,

\begin{displaymath}E({\bf y}'{\bf y}) = {\bf b}'{\bf X}'{\bf Xb} + N \
\sum_{j=0}^{s}\sigma_{j}^{2}, \end{displaymath}

and

\begin{displaymath}\hat{\bf b}'{\bf X}'{\bf y} + \sum_{i=1}^{s}\hat{\bf u}'_{i}
{\bf Z}'_{i}{\bf y} \ = \ {\bf r}'{\bf Cr}. \end{displaymath}

Then

\begin{displaymath}E({\bf r}'{\bf Cr}) \ = \ {\bf b}'{\bf X}'{\bf WCW}'{\bf Xb} +
\sum_{j=0}^{s} tr{\bf CU}_{j}\sigma_{j}^{2}, \end{displaymath}

but

\begin{displaymath}{\bf X}'{\bf WCW}'{\bf X} = {\bf X}'{\bf X} \end{displaymath}

so that

\begin{eqnarray*}E({\bf y}'{\bf y}-{\bf r}'{\bf Cr}) & = & \sum_{j=0}^{s}
(N - tr{\bf CU}_{j}) \ \sigma_{j}^{2} \\
& = & \sum_{j=1}^{s}\alpha_{j}(q_{j} - tr{\bf C}_{jj}\alpha_{j}) \ \sigma_{j}^{2} \\
& & + \ (N - r({\bf X}) - \sum_{i=1}^{s}(q_{i} - tr{\bf C}_{ii}
\alpha_{i})) \ \sigma_{0}^{2}.
\end{eqnarray*}


MIVQUE requires the inverse elements of the MME in order to calculate the expectations of the quadratic forms. In most animal breeding applications, obtaining an inverse of the coefficient matrix could be impossible due to the large order of the equations.

Example

Assume a model with one fixed factor and one random factor. The least squares equations are given below, and ${\bf y}'{\bf y}$ = 770.

\begin{displaymath}\left( \begin{array}{rrrrr}
12 & 0 & 4 & 3 & 5 \\ 0 & 8 & 0 & 6 & 2 \\
4 & 0 & 4 & 0 & 0 \\ 3 & 6 & 0 & 9 & 0 \\
5 & 2 & 0 & 0 & 7 \end{array} \right)
\left( \begin{array}{c}
\hat{A}_{1} \\ \hat{A}_{2} \\ \hat{D}_{1} \\ \hat{D}_{2} \\ \hat{D}_{3}
\end{array} \right) =
\left( \begin{array}{r}
63 \\ 52 \\ 28 \\ 57 \\ 30 \end{array} \right). \end{displaymath}

Let factor D be random and assume that $\alpha_{D} = \theta_{0} / \theta_{D} = 2$, then to form the MME add 2 to the diagonals of the equations for factor D. The variance-covariance matrix of the right hand sides is

\begin{displaymath}V({\bf r}) = {\bf U}_{0}\sigma_{0}^{2} + {\bf U}_{D}\sigma_{D}^{2} \end{displaymath}

where ${\bf U}_{0} = {\bf W}'{\bf W}$, the LS equations, and ${\bf U}_{D} = {\bf W}'{\bf Z}_{D}{\bf Z}'_{D}{\bf W}$, that is

\begin{eqnarray*}{\bf U}_{D} & = &
\left( \begin{array}{rrr}
4 & 3 & 5 \\ 0 & 6 & 2 \\ 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 7
\end{array} \right) \left( \begin{array}{rrrrr}
4 & 0 & 4 & 0 & 0 \\ 3 & 6 & 0 & 9 & 0 \\ 5 & 2 & 0 & 0 & 7
\end{array} \right) \\
& = & \left( \begin{array}{rrrrr}
50 & 28 & 16 & 27 & 35 \\
28 & 40 & 0 & 54 & 14 \\
16 & 0 & 16 & 0 & 0 \\
27 & 54 & 0 & 81 & 0\\
35 & 14 & 0 & 0 & 49\\ \end{array} \right).
\end{eqnarray*}


With $\alpha_{D} \ = \ 2$ then

\begin{eqnarray*}{\bf C} & = &
\left( \begin{array}{rrrrr}
.25158 & .16139 & -....
...-.17563 & -.16456 & .11709 & .13766 & .24525
\end{array} \right)
\end{eqnarray*}


and the solutions to the mixed model equations are

\begin{displaymath}{\bf Cr} = \left( \begin{array}{c}
\hat{A}_{1} \\ \hat{A}_{2} \\ \hat{D}_{1} \\ \hat{D}_{2} \\ \hat{D}_{3}
\end{array} \right) = \left( \begin{array}{r}
5.3481 \\ 6.7563 \\ 1.1013 \\ .0380 \\ -1.1392
\end{array} \right). \end{displaymath}

The quadratic forms are

\begin{eqnarray*}\hat{\bf u}'_{D}\hat{\bf u}_{D} & = & \hat{D}_{1}^{2} + \hat{D}_{2}^{2} + \hat{D}_{3}^{2} \\
& = & (1.1013)^{2} + (.0380)^{2} + (-1.1392)^{2} \\
& = & 2.5121 \\
& = & {\bf r}'{\bf C}'_{D}{\bf C}_{D}{\bf r}
\end{eqnarray*}


where

\begin{eqnarray*}{\bf C}_{D} & = &
\left( \begin{array}{rrrrr}
-.16772 & -.10759...
....08428 & -.08432 & .07570 & .08149 & .09281
\end{array} \right).
\end{eqnarray*}


The other quadratic form is

\begin{eqnarray*}\hat{\bf b}'{\bf X}'{\bf y} + \hat{\bf u}'_{D}{\bf Z}'_{D}{\bf y}
& = & 5.3481(63) + 6.7563(52) + 1.1013(28) \\
& & \mbox{ } + .0380(57)-1.1392(30) \\
& = & 687.0823 \\
& = & {\bf r}'{\bf Cr}.
\end{eqnarray*}


Thus,

\begin{displaymath}{\bf y}'{\bf y} - {\bf r}'{\bf Cr} = 770 - 687.0823 = 82.9177. \end{displaymath}

The expectations of these quadratic forms are as follows:

\begin{eqnarray*}E({\bf r}'{\bf C}'_{D}{\bf C}_{D}{\bf r})
& = & E(\hat{\bf u}'_...
...} \\
& = & 1.038816 \ \sigma_{D}^{2} + .198946 \ \sigma_{0}^{2}
\end{eqnarray*}


also note that

\begin{eqnarray*}tr({\bf C}'_{D}{\bf C}_{D}{\bf U}_{D})
& = & (q_{D} - 2 \ {\rm ...
... \\
& = & 20 \ - \ 2 - \ (3 - .78164(2)) \\
& = & 16.563291.
\end{eqnarray*}


Equate the quadratic forms to their expectations and solve for the estimates of variance components.

\begin{eqnarray*}\left( \begin{array}{l}
\hat{\sigma}_{D}^{2} \\
\hat{\sigma}_{...
...\left( \begin{array}{r}
1.509653\\ 4.744218
\end{array} \right).
\end{eqnarray*}


Problem

Below are the OLS equations for some data for a model with two random factors (B and C) and one fixed factor (A). The total number of observations was 200, and the total sum of squares was 1,000,000. Assume that the prior variance ratios were

\begin{displaymath}\alpha_{B} = 9, \ {\rm and} \ \alpha_{C} = 5. \end{displaymath}

Apply MIVQUE and one of the previous methods to this example data.

\begin{displaymath}\left( \begin{array}{rrrrrrrrrrrr}
25 & 0 & 0 & 5 & 10 & 10 ...
...300 \\ 2200 \\
800 \\ 50 \\ 1600 \\ 350 \end{array} \right). \end{displaymath}

REML
L. R. Schaeffer, April 28, 1999

Restricted maximum likelihood (REML) was first suggested by Thompson (1962), but was described formally by Patterson and Thompson (1971). The main features of the method are as follows.

1.
${\bf y}$ must have a multivariate normal distribution.
2.
The method is translation invariant.
3.
The estimator is restricted to the allowable parameter space.
4.
REML is a biased procedure.
5.
The exact same quadratic forms as MIVQUE are utilized.
There are several methods of obtaining REML estimators. One method is to take the MIVQUE quadratic forms and derive new expectations assuming that the prior values are equal to the true parameter values. REML estimates can also be obtained by iterating on the MIVQUE quadratic forms and their expectations: the MIVQUE estimates are used to form new prior values of the variance ratios, and MIVQUE is applied over and over until the estimates and the prior values are equal. The final estimates are REML if the variance estimates remain positive. If any of the estimates become negative, then the iteration process must be stopped, and REML estimates cannot be obtained in that manner.

Derivation of REML from MIVQUE

From MIVQUE, the expectations of the quadratic forms of the solutions to the mixed model equations were as follows:

\begin{eqnarray*}E(\hat{\bf u}'_{i}\hat{\bf u}_{i})
& = & (q_{i} - 2 \ tr{\bf C}_{ii}{\alpha}_{i} +
tr{\bf C}_{ii}{\bf C}_{ii}{\alpha}_{i}^{2}) \ \sigma_{i}^{2} \\
& & + \ \sum_{j \neq i}^{s} tr{\bf C}'_{ij}{\bf C}_{ij}{\alpha}_{j}^{2}
 \ \sigma_{j}^{2} \\
& & + \ (tr{\bf C}_{ii} - \sum_{j=1}^{s} tr{\bf C}'_{ij}
{\bf C}_{ij}{\alpha}_{j})\sigma_{o}^{2}
\end{eqnarray*}


where ${\alpha}_{i} = {\theta}_{o}/{\theta}_{i}$ and ${\theta}_{o}$ and ${\theta}_{i}$ are assumed prior values of $\sigma_{o}^{2}$ and ${\sigma}_{i}^{2}$. To derive REML, assume that ${\alpha}_{i} = {\sigma}_{o}^{2}/{\sigma}_{i}^{2}$ and substitute into the above expectation to obtain

\begin{displaymath}E(\hat{\bf u}'_{i}\hat{\bf u}_{i}) = q_{i}{\sigma}_{i}^{2} \ - \
tr{\bf C}_{ii}{\sigma}_{o}^{2}, \end{displaymath}

which gives

\begin{displaymath}\hat{\sigma}_{i}^{2} \ = \ (\hat{\bf u}'_{i}\hat{\bf u}_{i} \ + \
tr{\bf C}_{ii}\hat{\sigma}_{o}^{2})/q_{i} \end{displaymath}

Notice that $\hat{\bf u}'_{i}\hat{\bf u}_{i}$ is a sum of squares and therefore always positive, and that $tr{\bf C}_{ii}$ is always positive because the diagonals of ${\bf C}_{ii}$ represent the variances of prediction error (i.e., $ Var(\hat{\bf u}_{i} - {\bf u}_{i})$). Hence the estimator $\hat{\sigma}_{i}^{2}$ is always positive, i.e., within the parameter space.

Likewise,

\begin{displaymath}E({\bf y}'{\bf y} \ - \ \hat{\bf b}'{\bf X}'{\bf y} \ - \
\sum_{i=1}^{s}\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf y}) \ = \
(N - r({\bf X}))\sigma_{o}^{2} \end{displaymath}

under the assumption that $\alpha_{i} = \sigma_{o}^{2}/\sigma_{i}^{2}.$ Then

\begin{displaymath}\hat{\sigma}_{o}^{2} \ = \ ({\bf y}'{\bf y} \ - \
\hat{\bf b}'{\bf X}'{\bf y} \ - \
\sum_{i=1}^{s}\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf y})/
(N - r({\bf X})). \end{displaymath}

The above pseudo expectations give formulas for $\hat{\sigma}_{i}^{2}$ and $\hat{\sigma}_{o}^{2}$ that are known as the EM algorithm. EM stands for Expectation Maximization, a procedure described by Dempster et al. (1977). REML estimates are obtained by using $\hat{\sigma}_{o}^{2}$ and $\hat{\sigma}_{i}^{2}$ to form new ratios, $\hat{\alpha}_{i}$; the new ratios are then used as priors to calculate new $\hat{\bf u}_{i}$, and the process is repeated until the ratios used as priors and the estimated ratios are equal (a small code sketch of this iteration follows the list below). With the REML EM algorithm,
1.
The user must iterate until convergence.
2.
Convergence is not guaranteed.
3.
The final estimate may not maximize the likelihood function over the entire range of parameter values. This would be convergence to a local maximum rather than a global maximum.
4.
Convergence will normally occur if the number of observations is large.
5.
With few observations and several random factors there may be problems in reaching convergence.
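The following is a minimal computational sketch of one round of this EM algorithm, assuming dense NumPy design matrices; the function name and argument layout are illustrative only and are not part of the original notes.

\begin{verbatim}
import numpy as np

def em_reml_round(X, Zs, y, s2e, s2u):
    """One round of the EM algorithm for REML (a sketch, not the course code).
    X  : fixed-effects design matrix (N x p), floating point
    Zs : list of design matrices Z_i for the s random factors
    s2e, s2u : current residual variance and list of random-factor variances
    Returns the updated (s2e, s2u)."""
    N, p = X.shape
    rX = np.linalg.matrix_rank(X)
    q = [Zi.shape[1] for Zi in Zs]
    Z = np.hstack(Zs)
    W = np.hstack([X, Z])
    # mixed model equations: (W'W + alpha_i * I on diagonal blocks) [b; u] = W'y
    C = (W.T @ W).astype(float)
    off = p
    for qi, s2i in zip(q, s2u):
        C[off:off + qi, off:off + qi] += (s2e / s2i) * np.eye(qi)
        off += qi
    Cinv = np.linalg.pinv(C)                 # generalized inverse of the MME
    sol = Cinv @ (W.T @ y)
    b, u = sol[:p], sol[p:]
    # residual: (y'y - b'X'y - u'Z'y) / (N - r(X))
    s2e_new = (y @ y - b @ (X.T @ y) - u @ (Z.T @ y)) / (N - rX)
    # each factor: (u_i'u_i + tr(C_ii) * s2e_new) / q_i
    s2u_new, off = [], p
    for qi in q:
        Cii = Cinv[off:off + qi, off:off + qi]
        ui = sol[off:off + qi]
        s2u_new.append((ui @ ui + np.trace(Cii) * s2e_new) / qi)
        off += qi
    return s2e_new, s2u_new
\end{verbatim}

Each call builds the mixed model equations with the current ratios, solves them, and applies the two update formulas above; repeating the call gives the EM REML iteration.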

An Example

Below are the mixed model equations for an example with three factors, two of which are random.


\begin{displaymath}\left( \begin{array}{rrrrrrrrr}
50 & 0 & 5 & 15 & 30 & 5 & 10...
...860 \\ 3140 \\ 700 \\ 1320 \\ 2400 \\ 1160
\end{array} \right) \end{displaymath}

The total sum of squares is \({\bf y}'{\bf y} = 356,000\).

To demonstrate EM REML, let \(\alpha_{A} = 10\) and \(\alpha_{B} = 5\) be the starting values of the ratios for factors A and B, respectively. The solution vector is

\begin{displaymath}\left( \begin{array}{r}
F_{1} \\ F_{2} \\ A_{1} \\ A_{2} \\ A...
...\ 5.1064 \\
2.6402 \\ -2.6433 \\ -5.1034 \end{array} \right) .\end{displaymath}

Then

\begin{displaymath}{\bf y}'({\bf X}\hat{{\bf b}}+{\bf Z}\hat{{\bf u}})
\mbox{ = } 347,871.2661 \end{displaymath}

and from the inverse of the coefficient matrix,

\begin{displaymath}tr{\bf C}_{AA} = .16493, \mbox{ and }
tr{\bf C}_{BB} = .3309886 \end{displaymath}

which give rise to the following estimates,

\begin{eqnarray*}\hat{\sigma}_{0}^{2} & = & (356,000 - 347,871.2661) / 88 \\
&...
... = & (66.0771576+.3309886(92.371976))/4 \\
& = & 24.16280774.
\end{eqnarray*}


New ratios are formed as

\begin{displaymath}\alpha_{A} = 92.371976/7.575855 = 12.192944, \end{displaymath}

and

\begin{displaymath}\alpha_{B} = 92.371976/24.16280774 = 3.822899 \end{displaymath}

and these are used to form the mixed model equations again, new solutions and traces are calculated, and so on, until the estimated ratios and the prior values of the ratios are equal. The estimates in this example converge to

\begin{displaymath}\hat{\sigma}_{0}^{2} \mbox{ = } 91.8639, \end{displaymath}


\begin{displaymath}\hat{\sigma}_{A}^{2} \mbox{ = } 2.5692, \end{displaymath}

and

\begin{displaymath}\hat{\sigma}_{B}^{2} \mbox{ = } 30.5190. \end{displaymath}
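Under the same assumptions, the iteration just described can be sketched as a small driver around the em_reml_round function given earlier; here X, Zs, and y stand for the example design matrices and data vector, which are not reproduced in these notes.

\begin{verbatim}
def iterate_em_reml(X, Zs, y, s2e, s2u, tol=1e-6, max_rounds=1000):
    """Repeat EM REML rounds until prior and estimated ratios agree (sketch)."""
    for _ in range(max_rounds):
        s2e_new, s2u_new = em_reml_round(X, Zs, y, s2e, s2u)
        change = max(abs(s2e / a - s2e_new / b)
                     for a, b in zip(s2u, s2u_new))
        s2e, s2u = s2e_new, s2u_new
        if change < tol:
            break
    return s2e, s2u
\end{verbatim}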

Derivation of REML from the likelihood function and a description of derivative free REML are given in the regular notes for the course 10-638 Animal Models.

Maximum Likelihood
L. R. Schaeffer, April 28, 1999

Hartley and Rao (1967) described the maximum likelihood (ML) approach for the estimation of variance components. The main features of ML are:

1.
ML uses the log likelihood function of ${\bf y}$.
2.
ML uses the exact same quadratic forms as REML and MIVQUE.
3.
ML is translation invariant, but biased.
Henderson (1973) derived ML estimates from the mixed model equations. ML is a more biased method than REML, especially for the estimator of the residual variance; estimates of the residual variance are generally smaller with ML than with REML due to the difference in degrees of freedom. REML was proposed in order to account for the loss of degrees of freedom in ML caused by estimating the fixed effects. The main disadvantages of ML are the same as for REML: the procedure is iterative and convergence is not guaranteed. An advantage of ML over REML is that in some models the traces that need to be calculated are easier than those needed for REML (Schaeffer 1976).

Derivation

Assume that ${\bf y}$ is normally distributed with mean ${\bf Xb}$ and variance-covariance matrix ${\bf V}$ and follows the basic linear mixed model. Except for a constant, the log likelihood function of ${\bf y}$ is

\begin{displaymath}L({\bf y}) = -.5 \ \ln \mid {\bf V}\mid - .5 \ ({\bf y} - {\bf Xb})'
{\bf V}^{-1}({\bf y} - {\bf Xb}) \end{displaymath}

The derivatives of $ L({\bf y})$ with respect to ${\bf b}$ and to ${\sigma}_{i}^{2}$ for $i = 0, 1, \ldots, s$ are

\begin{eqnarray*}\partial L({\bf y})/ \partial{\bf b} & = &
{\bf X}'{\bf V}^{-1}{\bf y} \ - \ {\bf X}'{\bf V}^{-1}{\bf Xb} \\
\partial L({\bf y})/ \partial{\sigma}_{i}^{2} & = &
-.5 \ tr({\bf V}^{-1}{\bf V}_{i}) \ + \ .5 \ ({\bf y}-{\bf Xb})'
{\bf V}^{-1}{\bf V}_{i}{\bf V}^{-1}({\bf y}-{\bf Xb})
\end{eqnarray*}


Equating the derivatives to zero gives

\begin{eqnarray*}\hat{\bf b} & = & ({\bf X}'{\bf V}^{-1}{\bf X})^{-}
{\bf X}'{\bf V}^{-1}{\bf y} \\
tr({\bf V}^{-1}{\bf V}_{i}) & = & ({\bf y} - {\bf X}\hat{\bf b})'
{\bf V}^{-1}{\bf V}_{i}{\bf V}^{-1}
({\bf y} - {\bf X}\hat{\bf b}).
\end{eqnarray*}


Recall that ${\bf Py} = {\bf V}^{-1}({\bf y} - {\bf X}\hat{\bf b})$, where ${\bf P}$ is the projection matrix, and that ${\bf V}_{i} = {\bf Z}_{i}
{\bf Z}'_{i}$, then the latter equation becomes

\begin{eqnarray*}tr[{\bf V}^{-1}{\bf Z}_{i}{\bf Z}'_{i}] & = &
{\bf y}'{\bf PV}_{i}{\bf Py} \ = \ {\bf y}'{\bf P}{\bf Z}_{i}
{\bf Z}'_{i}{\bf Py} \\
& = & \hat{\bf u}'_{i}\hat{\bf u}_{i}/\sigma_{i}^{4}.
\end{eqnarray*}


Notice that the right-hand side of the above equation is the same as in the REML and MIVQUE derivations. However, the difference among the methods is in the left-hand side of this equation. For REML we have $ tr[{\bf PV}_{i}]$, for MIVQUE it is $ tr[{\bf PVPV}_{i}]$, and for ML it is $ tr[{\bf V}^{-1}{\bf V}_{i}]$. For ML, the left-hand side can be manipulated as

\begin{eqnarray*}tr[{\bf V}^{-1}{\bf V}_{i}] & = & tr[({\bf R}^{-1} -
{\bf R}^{-1}{\bf Z}({\bf Z}'{\bf R}^{-1}{\bf Z} + {\bf G}^{-1})^{-1}
{\bf Z}'{\bf R}^{-1}){\bf Z}_{i}{\bf Z}'_{i}]
\end{eqnarray*}


and if

\begin{eqnarray*}{\bf T} & = & ({\bf Z}'{\bf R}^{-1}{\bf Z} + {\bf G}^{-1})^{-1}, \\
{\bf R} & = & {\bf I}\sigma_{0}^{2}, \\
{\rm and} \ {\bf G} & = & \sum^{+}{\bf I}\sigma_{i}^{2},
\end{eqnarray*}


then

\begin{eqnarray*}tr[{\bf V}^{-1}{\bf V}_{i}] & = & tr({\bf Z}'_{i}
{\bf Z}_{i})\sigma_{0}^{-2} \ - \
tr({\bf Z}'_{i}{\bf ZTZ}'{\bf Z}_{i}) \sigma_{0}^{-4}.
\end{eqnarray*}


If ${\bf T}$ can be partitioned into submatrices for each random factor, then

\begin{eqnarray*}{\bf T}\sigma_{0}^{-2}({\bf Z}'{\bf Z} + \sum^{+}{\bf I}\alpha_{i})
& = & {\bf I},
\end{eqnarray*}


and

\begin{eqnarray*}{\bf T}{\bf Z}'{\bf Z}\sigma_{0}^{-2} & = & {\bf I} - {\bf T}
(\sum^{+}{\bf I}\sigma_{i}^{-2}).
\end{eqnarray*}


then

\begin{eqnarray*}tr({\bf Z}'_{i}{\bf ZTZ}'{\bf Z}_{i})\sigma_{0}^{-4} & = &
tr({\bf Z}'_{i}{\bf Z}_{i})\sigma_{0}^{-2} - tr({\bf I}
- {\bf T}_{ii}\sigma_{i}^{-2})\sigma_{i}^{-2}.
\end{eqnarray*}


Finally,

\begin{eqnarray*}tr[{\bf V}^{-1}{\bf V}_{i}] & = & tr({\bf I} - {\bf T}_{ii}
\sigma_{i}^{-2})\sigma_{i}^{-2} \\
& = & q_{i}\sigma_{i}^{-2} - tr{\bf T}_{ii}\sigma_{i}^{-4}.
\end{eqnarray*}


Combining results gives

\begin{displaymath}\hat{\sigma}_{i}^{2} = (\hat{\bf u}'_{i}\hat{\bf u}_{i} +
tr{\bf T}_{ii}\hat{\sigma}^{2}_{0})/q_{i} \end{displaymath}

for $i = 1, 2, \ldots, s$, and for $i = 0$ gives

\begin{displaymath}\hat{\sigma}_{0}^{2} = ({\bf y}'{\bf y} - \hat{\bf b}'{\bf X}'{\bf y} -
\hat{\bf u}'{\bf Z}'{\bf y})/ \ N. \end{displaymath}

There are two major differences in these formulas from those for REML, namely,
1.
$ tr{\bf T}_{ii}\hat{\sigma}^{2}_{0}$ in ML versus $ tr{\bf C}_{ii}\hat{\sigma}_{0}^{2}$ in REML in the $\hat{\sigma}_{i}^{2}$ formula.
2.
N in ML versus $(N-r({\bf X}))$ in REML as the denominator of $({\bf y}'{\bf y} - \hat{\bf b}'{\bf X}'{\bf y} - \hat{\bf u}'{\bf Z}'{\bf y})$ in the $\hat{\sigma}_{0}^{2}$ formula.
Note that

\begin{displaymath}{\bf T} = ({\bf Z}'{\bf Z} + \sum^{+}{\bf I}\alpha_{i})^{-1} \end{displaymath}

and

\begin{displaymath}{\bf C} = ({\bf Z}'{\bf MZ} + \sum^{+}{\bf I}\alpha_{i})^{-1}. \end{displaymath}

ML, therefore, does not account for the degrees of freedom needed to estimate ${\bf b}$, as reflected by ${\bf T}$ and by N in the above formulas.
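For comparison with the REML sketch given earlier, the following is a minimal sketch of one round of these ML updates; again the names are illustrative assumptions and dense NumPy matrices are assumed.

\begin{verbatim}
import numpy as np

def ml_round(X, Zs, y, s2e, s2u):
    """One round of EM-type ML updates, paralleling the REML sketch above.
    Two differences from REML: the trace comes from
    T = (Z'Z + sum+ I*alpha_i)^{-1}, which ignores the fixed effects, and
    the residual divisor is N rather than N - r(X).  A sketch only."""
    N, p = X.shape
    q = [Zi.shape[1] for Zi in Zs]
    Z = np.hstack(Zs)
    W = np.hstack([X, Z])
    C = (W.T @ W).astype(float)            # mixed model equations (as for REML)
    off = p
    for qi, s2i in zip(q, s2u):
        C[off:off + qi, off:off + qi] += (s2e / s2i) * np.eye(qi)
        off += qi
    sol = np.linalg.pinv(C) @ (W.T @ y)
    b, u = sol[:p], sol[p:]
    s2e_new = (y @ y - b @ (X.T @ y) - u @ (Z.T @ y)) / N   # divisor N for ML
    T = (Z.T @ Z).astype(float)            # T^{-1} = Z'Z + sum+ I*alpha_i
    off = 0
    for qi, s2i in zip(q, s2u):
        T[off:off + qi, off:off + qi] += (s2e / s2i) * np.eye(qi)
        off += qi
    Tinv = np.linalg.inv(T)
    s2u_new, off = [], 0
    for qi in q:                           # (u_i'u_i + tr(T_ii)*s2e_new) / q_i
        Tii = Tinv[off:off + qi, off:off + qi]
        ui = u[off:off + qi]
        s2u_new.append((ui @ ui + np.trace(Tii) * s2e_new) / qi)
        off += qi
    return s2e_new, s2u_new
\end{verbatim}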

Example With One Random Factor

The computations for ML become fairly simple with only one random factor in the model. In that case, ${\bf T}$ is a diagonal matrix and its elements can be readily computed. To illustrate, consider the least squares equations below for a model with fixed herd effects, H, and random sire effects, S, in an analysis of milking speed observations.

\begin{displaymath}\left( \begin{array}{rrrrrrr}
8 & 0 & 2 & 3 & 1 & 2 & 0\\ 0 &...
...4\\ 67.2\\ 26.5\\ 30.0\\ 14.1\\ 26.2\\ 6.8
\end{array} \right) \end{displaymath}

Also let ${\bf y}'{\bf y} = 500$ and $\alpha_{s} = 6$, then the solutions are

\begin{eqnarray*}\begin{array}{rrlrlr}
\hat{H}_{1} = & 4.44037, & \hat{H}_{2} = ...
...3, \\
{\rm and} \ \ \hat{S}_{5} = & -.34670, &&&&\\
\end{array}\end{eqnarray*}


The following quantities are easily calculated.

\begin{eqnarray*}{\bf y}'{\bf y}-\hat{\bf b}'{\bf X}'{\bf y}-\hat{\bf u}'{\bf Z}...
...igma}_{s}^{2} & = & (.268595 + .568493 \ (.421104))/5 = .101598.
\end{eqnarray*}


In this example, ${\bf T}_{ss} = ({\bf Z}'_{s}{\bf Z}_{s} + {\bf I}\alpha_{s})^{-1}$, and therefore its trace must be multiplied by $\hat{\sigma}_{0}^{2}$ in the computation of $\hat{\sigma}_{s}^{2}$. This gives a new variance ratio of $\hat{\alpha}_{s} = 4.144807$. These steps would be repeated, many times, until the estimates converge to the values that maximize the likelihood function. ML calculations for a model with only one random factor are very simple even when the random factor has many levels.
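The arithmetic quoted above can be checked directly (values rounded as printed):

\begin{verbatim}
# one random factor (sires), alpha_s = 6, q_s = 5
s2_0   = 0.421104                 # residual estimate
tr_Tss = 0.568493                 # trace of (Z_s'Z_s + I*alpha_s)^{-1}
uu     = 0.268595                 # u_s'u_s from the sire solutions
s2_s   = (uu + tr_Tss * s2_0) / 5
print(s2_s)                       # approximately 0.101598
print(s2_0 / s2_s)                # new ratio, approximately 4.144807
\end{verbatim}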

One Random Factor Nested Within Another

In a two random factor model we have

\begin{displaymath}{\bf T} = \left( \begin{array}{cc}
{\bf Z}'_{1}{\bf Z}_{1} + {\bf I}\alpha_{1} & {\bf Z}'_{1}{\bf Z}_{2} \\
{\bf Z}'_{2}{\bf Z}_{1} & {\bf Z}'_{2}{\bf Z}_{2} + {\bf I}\alpha_{2}
\end{array} \right)^{-1} =
\left( \begin{array}{cc}
{\bf T}_{11} & {\bf T}_{12} \\
{\bf T}_{21} & {\bf T}_{22}
\end{array} \right) \end{displaymath}

and both $tr{\bf T}_{11}$ and $tr{\bf T}_{22}$ are needed by ML. However, if factor 2 is nested within factor 1, then the columns of ${\bf Z}_{1}$ can be formed by adding together appropriate columns of ${\bf Z}_{2}$. Thus, ${\bf T}$ can be written entirely in terms of ${\bf Z}_{2}$ and the matrix that describes which columns of ${\bf Z}_{2}$ to add together to give ${\bf Z}_{1}$. Call this matrix ${\bf F}$, where ${\bf Z}_{1} = {\bf Z}_{2}{\bf F}$. Now ${\bf T}$ can be rewritten as

\begin{displaymath}{\bf T} = \left( \begin{array}{cc}
{\bf F}'{\bf Z}'_{2}{\bf Z}_{2}{\bf F} + {\bf I}\alpha_{1} &
{\bf F}'{\bf Z}'_{2}{\bf Z}_{2} \\
{\bf Z}'_{2}{\bf Z}_{2}{\bf F} &
{\bf Z}'_{2}{\bf Z}_{2} + {\bf I}\alpha_{2} \\
\end{array} \right)^{-1} \end{displaymath}

and by using partitioned matrix techniques, then

\begin{eqnarray*}{\bf T}_{11} & = & \left[ \ {\bf F}'({\bf Z}'_{2}{\bf Z}_{2} -
{\bf Z}'_{2}{\bf Z}_{2}({\bf Z}'_{2}{\bf Z}_{2} + {\bf I}\alpha_{2})^{-1}
{\bf Z}'_{2}{\bf Z}_{2}){\bf F} \ + \ {\bf I} \ \alpha_{1} \ \right]^{-1}, \\
{\rm a \ diagonal \ matrix \ with \ elements} \ \
p_{i} & = & [ \ \sum_{j}(n_{ij} - n_{ij}^{2}/(n_{ij} + \alpha_{2})) \ + \
\alpha_{1} \ ]^{-1}, \\
{\rm so \ that} \ \ tr{\bf T}_{11} & = & \sum_{i} \ p_{i},
\end{eqnarray*}


where the subscript $i$ indexes levels of factor 1, the subscript $j$ indexes levels of factor 2 within factor 1, and $n_{ij}$ is the number of observations in a particular subclass. Likewise,

\begin{eqnarray*}{\bf T}_{22} & = & {\bf D} + {\bf DZ}'_{2}{\bf Z}_{2}{\bf FT}_{11}
{\bf F}'{\bf Z}'_{2}{\bf Z}_{2}{\bf D}, \ \ \ {\rm where} \ \
{\bf D} \ = \ ({\bf Z}'_{2}{\bf Z}_{2} + {\bf I}\alpha_{2})^{-1}, \\
tr{\bf T}_{22} & = & \sum_{i} \left( \begin{array}{l}
\sum_{j}(n_{ij} + \alpha_{2})^{-1} \ + \
p_{i} \ \sum_{j}(n_{ij}^{2}/(n_{ij} + \alpha_{2})^{2})
\end{array} \right)
\end{eqnarray*}


Both traces can be computed readily by proper sorting of the data files by levels of factor 2 within levels of factor 1. Schaeffer (1976) gives the details for these calculations.
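A small sketch of these trace calculations, using only the subclass counts $n_{ij}$, might look as follows (the function name and the dictionary layout are illustrative assumptions):

\begin{verbatim}
def nested_traces(n, alpha1, alpha2):
    """Traces of T_11 and T_22 when factor 2 is nested within factor 1 (sketch).
    n is a dict {level of factor 1: [n_ij for its levels of factor 2]}."""
    tr11 = tr22 = 0.0
    for counts in n.values():
        p_i = 1.0 / (sum(nij - nij**2 / (nij + alpha2) for nij in counts)
                     + alpha1)
        tr11 += p_i
        tr22 += sum(1.0 / (nij + alpha2) for nij in counts) \
                + p_i * sum(nij**2 / (nij + alpha2)**2 for nij in counts)
    return tr11, tr22
\end{verbatim}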

To illustrate ML for this type of model, consider the previous milking speed example expanded to include cow effects (i.e. daughters within sires in the table below). Also, some cows are allowed to have more than one observation.

Table 2.2 Example 2-minute milk yield data on dairy cows
Cow   Sire   Herd   Number of Observations   Total 2-min yield (kg)
A 1 1 2 9.9
B 1 1 3 20.0
C 2 1 2 9.5
D 2 1 2 8.8
E 2 1 1 5.6
F 3 1 1 4.7
G 4 1 3 12.4
H 4 1 2 8.9
I 1 2 1 5.5
J 1 2 2 10.1
K 1 2 1 2.2
L 2 2 2 10.2
M 2 2 2 10.3
N 2 2 1 5.6
O 3 2 2 8.6
P 3 2 2 10.2
Q 4 2 2 7.8
R 4 2 2 8.1
S 4 2 2 8.2
T 4 2 1 4.3
U 5 2 2 7.0
V 5 2 2 8.6
The least squares equations for this example are of order 29. Assume that ${\bf y}'{\bf y} = 1,000$ and that $\alpha_{s} = 5$ and $\alpha_{c} = 2,$ for sires and cows, respectively. To compute the traces that are needed, for each sire we have the following:

\begin{eqnarray*}p_{i} & = & [ \ \sum_{j}(n_{ij} - n_{ij}^{2}/(n_{ij} + \alpha_{c}))
\ + \ \alpha_{s} \ ]^{-1}, \\
{\rm and} \ \ tr{\bf T}_{11} \ = \ \sum_{i} p_{i}
& = & .104895 + .096774 + .130435 + .092025 + .142857 \\
& = & .566986.
\end{eqnarray*}


Now for each sire, compute

\begin{eqnarray*}\lambda_{i} & = & \sum_{j}(n_{ij} + \alpha_{c})^{-1} + p_{i} \
\sum_{j}[ \ n_{ij}/(n_{ij} + \alpha_{c}) \ ]^{2}, \\
{\rm which \ gives} \ \ \lambda_{1} & = & 1.480187, \\
\lambda_{2} & = & 1.784946, \\
\lambda_{3} & = & .913044, \\
\lambda_{4} & = & 1.668712, \\ {\rm and} \ \
\lambda_{5} & = & .571429.
\end{eqnarray*}


Summing gives

\begin{eqnarray*}tr{\bf T}_{22} & = &
1.480187 + 1.784946 + .913044 + 1.668712 + .571429 \\
& = & 6.418318.
\end{eqnarray*}
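As a check, the nested_traces sketch given in the previous section reproduces both traces from the subclass counts in Table 2.2 with $\alpha_{s} = 5$ and $\alpha_{c} = 2$:

\begin{verbatim}
n = {1: [2, 3, 1, 2, 1], 2: [2, 2, 1, 2, 2, 1], 3: [1, 2, 2],
     4: [3, 2, 2, 2, 2, 1], 5: [2, 2]}      # observations per cow, by sire
print(nested_traces(n, alpha1=5, alpha2=2)) # approximately (0.566986, 6.418318)
\end{verbatim}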


The solutions for herds and sires from the mixed model equations were

\begin{eqnarray*}\begin{array}{rrlrlr}
\hat{H}_{1} = & 4.914415, &
\hat{H}_{2} =...
..., \\
{\rm and} \ \ \hat{S}_{5} = & -.160912. &&&&\\
\end{array}\end{eqnarray*}


Solutions for cows are not shown. Other calculations gave

\begin{eqnarray*}{\bf y}'{\bf y} - \hat{\bf b}'{\bf X}'{\bf y} -
\hat{\bf u}'{\b...
... & = & (2.4662926+ 6.418318 \ (2.8255421))/22 \\
& = & .936433.
\end{eqnarray*}


The new variance ratios were then ${\alpha}_{s} = 7.97721$ and ${\alpha}_{c} = 3.017346.$ These new ratios would be used to begin another iteration.

The EM Algorithm

EM stands for Expectation Maximization. From Searle, Casella, and McCulloch (1992) the following explanation is given. The procedure alternates between calculating conditional expected values and maximizing simplified likelihoods. The actual data ${\bf y}$ are called the incomplete data in the EM algorithm, and the complete data are considered to be ${\bf y}$ and the unobservable random effects, ${\bf u}_{i}$. If we knew the realized values of the unobservable random effects then their variance would be the average of their squared values, i.e.,

\begin{displaymath}\hat{\sigma}^{2}_{i} = {\bf u}'_{i}{\bf u}_{i} / q_{i}. \end{displaymath}

However, in real life we do not know the realized values of the random effects.

The steps of the EM algorithm are as follows:

Step 0.
Decide on starting values for the variances and set m= 0.
Step 1.(E-step)
Calculate the expectation of the sufficient statistics conditional on the incomplete data.

\begin{eqnarray*}E({\bf u}'_{i}{\bf u}_{i} \mid {\bf y}) & = & \sigma_{i}^{4(m)}
({\bf y} - {\bf Xb}^{(m)})'({\bf V}^{(m)})^{-1}{\bf Z}_{i}{\bf Z}'_{i}
({\bf V}^{(m)})^{-1}({\bf y} - {\bf Xb}^{(m)}) \\
& & \mbox{ } + \ tr(\sigma_{i}^{2(m)}{\bf I} \ - \ \sigma_{i}^{4(m)}{\bf Z}'_{i}
({\bf V}^{(m)})^{-1}{\bf Z}_{i}) \\
& = & \hat{t}^{(m)}_{i}
\end{eqnarray*}


Step 2.(M-step)
Maximize the likelihood of the complete data,

\begin{displaymath}\sigma^{2(m+1)}_{i} = \hat{t}^{(m)}_{i} / q_{i}, \ \ \
i=0,1,2, \ldots , s. \end{displaymath}

Step 3.
If convergence is reached, set \(
\hat{\mbox{\boldmath$\sigma$ }}
= \mbox{\boldmath$\sigma$ }^{(m+1)} \), otherwise increase m by one and return to Step 1.
This is equivalent to constructing and solving the mixed model equations with a given set of variances, $\mbox{\boldmath$\sigma$}^{(m)}$, and then

\begin{eqnarray*}\sigma^{2(m+1)}_{0} & = & ({\bf y}'{\bf y} - \hat{{\bf b}}'{\bf...
...at{{\bf u}}_{i} +
\sigma^{2(m+1)}_{0} tr {\bf T}_{ii}) / q_{i}
\end{eqnarray*}


The EM algorithm can be applied to REML estimation as well. Just replace ${\bf y}$ with ${\bf K}'{\bf y}$ and replace ${\bf V}$ with ${\bf K}'{\bf VK}$, where ${\bf K}'{\bf X} = {\bf 0}$, and note that

\begin{eqnarray*}{\bf P} & = & {\bf K}({\bf K}'{\bf VK})^{-1}{\bf K}' \\
& = & {\bf V}^{-1} - {\bf V}^{-1}{\bf X}({\bf X}'{\bf V}^{-1}
{\bf X})^{-}{\bf X}'{\bf V}^{-1}.
\end{eqnarray*}


Then the conditional expectation in the E-step becomes

\begin{displaymath}\hat{t}^{(m)}_{i} = \sigma^{4(m)}_{i} {\bf y}'{\bf P}^{(m)}
{\bf Z}_{i}{\bf Z}'_{i}{\bf P}^{(m)}{\bf y} \ + \
tr[\sigma^{2(m)}_{i}{\bf I} - \sigma^{4(m)}_{i}{\bf Z}'_{i}
{\bf P}^{(m)}{\bf Z}_{i}]. \end{displaymath}

Pseudo Expectation Method
L. R. Schaeffer, April 28, 1999

In the process of making up an exam question for students in my Variance Component Estimation course in 1986, I discovered a quadratic form whose pseudo expectation (following the method of deriving REML from MIVQUE expectations of quadratic forms) did not involve the inverse elements of the mixed model equations. The estimators were also translation invariant because they were based on the absorbed right hand sides of the mixed model equations. The biggest advantage of these estimators was their computational ease. Unfortunately, with more than one random factor in the model, the 'quadratic forms' can be shown to be bilinear forms and hence are not guaranteed to be positive. The method, like most other methods, was found to give severely biased estimates when data were selected, as in typical animal breeding situations (Ouweltjes et al. 1988).

Van Raden (1986) expanded the Schaeffer (1986) quadratic forms for the case with relationships among animals, which could also be used in the basic linear model for variance components.

Derivation

Absorb the fixed effects into the equations for the random factors in the mixed model equations. The absorbed right hand side for the ith random factor is then

\begin{displaymath}{\bf Z}'_{i}{\bf My} \end{displaymath}

where ${\bf M} = {\bf I} - {\bf X}({\bf X}'{\bf X})^{-}{\bf X}'$. The expected value is ${\bf0}$ and the variance-covariance matrix is

\begin{displaymath}V({\bf Z}'{\bf My}) = \sum_{j=1}^{s}({\bf Z}'{\bf MZ}_{j})
({\bf Z}'_{j}{\bf MZ})\sigma_{j}^{2} + {\bf Z}'{\bf MZ} \
\sigma_{o}^{2}. \end{displaymath}

The solutions for the random factors are

\begin{eqnarray*}\left( \begin{array}{c}
\hat{\bf u}_{1} \\ \hat{\bf u}_{2} \\ \vdots \\ \hat{\bf u}_{s}
\end{array} \right) & = &
\left( {\bf Z}'{\bf MZ} \ + \ \sum^{+}{\bf I}\alpha_{i} \right)^{-1}
\left( \begin{array}{c}
{\bf Z}'_{1}{\bf My}\\
{\bf Z}'_{2}{\bf My}\\
\vdots \\
{\bf Z}'_{s}{\bf My}\\
\end{array} \right)
\end{eqnarray*}


The proposed quadratic forms are

\begin{eqnarray*}\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf My} & & \\
{\bf y}'{\bf y} \ - \ \hat{\bf b}'{\bf X}'{\bf y} \ - \
\sum_{i=1}^{s}\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf y} & = &
{\bf y}'{\bf My} \ - \
\sum_{i=1}^{s}\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf My}.
\end{eqnarray*}


The latter form is the same as in REML and has pseudo expectation equal to

\begin{eqnarray*}E({\bf y}'{\bf y} - \hat{\bf b}'{\bf X}'{\bf y} -
\hat{\bf u}'{\bf Z}'{\bf y}) & = & (N - r({\bf X}))\sigma_{o}^{2}, \\
{\rm so \ that} \ \ \hat{\sigma}_{o}^{2} & = & ({\bf y}'{\bf My} -
\hat{\bf u}'{\bf Z}'{\bf My}) / (N - r({\bf X})).
\end{eqnarray*}


The usual expectation of $\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf My}$ is

\begin{eqnarray*}E(\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf My}) & = &
( tr{\bf Z}'_{i}{\bf MZ}_{i} \ - \ tr{\bf Z}'_{i}{\bf MZ}_{i}{\bf C}_{ii}
{\alpha}_{i})\sigma_{i}^{2} \ - \
\sum_{j \neq i} tr({\bf Z}'_{i}{\bf MZ}_{j}{\bf C}_{ji}){\alpha}_{j}
\sigma_{j}^{2} \\
& & \mbox{ } + \ (q_{i} - tr{\bf C}_{ii}
{\alpha}_{i})\sigma_{o}^{2},
\end{eqnarray*}


which, when the ${\alpha}_{i}$ are taken as the ratios of the true values, reduces to

\begin{eqnarray*}E(\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf My}) & = &
tr{\bf Z}'_{i}{\bf MZ}_{i}\sigma_{i}^{2}
\end{eqnarray*}


which gives

\begin{displaymath}\hat{\sigma}_{i}^{2} = \hat{\bf u}'_{i}{\bf Z}'_{i}{\bf My} \ / \
tr{\bf Z}'_{i}{\bf MZ}_{i}. \end{displaymath}

The above quadratic form is always positive when there is only one random factor in the model, and therefore the estimator of ${\sigma}_{i}^{2}$ is always positive. However, with more than one random factor in the model, the quadratic forms are really bilinear forms and as such can take on negative values. Also, $tr({\bf Z}'_{i}{\bf MZ}_{i})$ does not depend on ${\alpha}_{i}$ and therefore is a constant that only needs to be computed once. In addition, only the diagonals of ${\bf Z}'_{i}{\bf MZ}_{i}$ need to be computed. This can be readily accomplished for most models.
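A minimal sketch of the whole procedure, assuming dense NumPy matrices and illustrative function names, is given below.

\begin{verbatim}
import numpy as np

def pseudo_expectation(X, Zs, y, alphas):
    """Pseudo-expectation estimates (sketch).
    M = I - X(X'X)^- X' absorbs the fixed effects, and
      s2_i = u_i'Z_i'My / tr(Z_i'MZ_i),
      s2_0 = (y'My - u'Z'My) / (N - r(X))."""
    N, rX = y.size, np.linalg.matrix_rank(X)
    M = np.eye(N) - X @ np.linalg.pinv(X.T @ X) @ X.T
    Z = np.hstack(Zs)
    q = [Zi.shape[1] for Zi in Zs]
    rhs = Z.T @ M @ y                        # absorbed right hand sides Z'My
    C = Z.T @ M @ Z                          # Z'MZ
    off = 0
    for qi, ai in zip(q, alphas):            # add I*alpha_i to diagonal blocks
        C[off:off + qi, off:off + qi] += ai * np.eye(qi)
        off += qi
    u = np.linalg.solve(C, rhs)
    s2_0 = (y @ M @ y - u @ rhs) / (N - rX)
    s2, off = [], 0
    for qi, Zi in zip(q, Zs):
        ui, ri = u[off:off + qi], rhs[off:off + qi]
        s2.append((ui @ ri) / np.trace(Zi.T @ M @ Zi))
        off += qi
    return s2_0, s2
\end{verbatim}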

Van Raden's Method

Van Raden (1986) proposed the following quadratic form, ${\bf w}'_{i}\hat{\bf u}_{i}$, where ${\bf w}_{i} = {\bf D}_{i}{\bf Z}'_{i}{\bf My}$ and ${\bf D}_{i} = diag\{ \ ({\bf Z}'_{i}{\bf MZ}_{i} + {\bf I}
\alpha_{i}) \ \}^{-1}$, then the pseudo expectation is

\begin{eqnarray*}E({\bf w}'_{i}\hat{\bf u}_{i}) & = &
tr({\bf D}_{i}{\bf Z}'_{i}{\bf MZ}_{i})\sigma_{i}^{2}
\end{eqnarray*}


which gives

\begin{displaymath}\hat{\sigma}_{i}^{2} = {\bf w}'_{i}\hat{\bf u}_{i}/
tr({\bf D}_{i}{\bf Z}'_{i}{\bf MZ}_{i}). \end{displaymath}

Because ${\bf D}_{i}$ is diagonal, then $tr({\bf D}_{i}{\bf Z}'_{i}{\bf MZ}_{i})$ is easily obtained, but now depends on ${\alpha}_{i}$ and must be re-computed each iteration. Let $\lambda_{ik}$ be the kth diagonal of ${\bf Z}'_{i}{\bf MZ}_{i}$ then

\begin{displaymath}tr({\bf D}_{i}{\bf Z}'_{i}{\bf MZ}_{i}) =
\sum_{k=1}^{q_{i}}[ \ \lambda_{ik}/(\lambda_{ik} + \alpha_{i})] \end{displaymath}

which is the sum of values that are sometimes used as approximate accuracies of elements of $\hat{\bf u}_{i}$. Hence $tr({\bf D}_{i}{\bf Z}'_{i}{\bf MZ}_{i})$ will always be less than or equal to $q_{i}$. Recall that Henderson's Method 4 uses ${\bf w}'_{i}{\bf w}_{i}$ as the quadratic form, but ${\bf w}'_{i}\hat{\bf u}_{i}$ is likely closer to $\hat{\bf u}'_{i}\hat{\bf u}_{i}$ than is $\hat{\bf u}'_{i}{\bf Z}'_{i}{\bf My}$ or ${\bf w}'_{i}{\bf w}_{i}$.
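Continuing the sketch above, Van Raden's estimator needs only the diagonals $\lambda_{ik}$ of ${\bf Z}'_{i}{\bf MZ}_{i}$; the helper below is illustrative only.

\begin{verbatim}
import numpy as np

def van_raden_update(ui, lam, ZiMy, alpha_i):
    """Van Raden's estimate of sigma_i^2 (sketch).
    ui   : current solutions for factor i
    lam  : diagonals lambda_ik of Z_i'MZ_i
    ZiMy : absorbed right hand sides for factor i."""
    d = 1.0 / (lam + alpha_i)           # diagonals of D_i
    w = d * ZiMy                        # w_i = D_i Z_i'My
    return (w @ ui) / np.sum(d * lam)   # w_i'u_i / tr(D_i Z_i'MZ_i)
\end{verbatim}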

An Example

Assume a model with two random factors and let ${\bf y}'{\bf y} = 401,000$. As prior values let $\alpha_{1} = 10$ and $\alpha_{2} = 8$. The rank of the fixed effects matrix, ${\bf X}$, is 4 and the total number of observations is 18. The first random factor has 3 levels and the second random factor has 6 levels. The absorbed coefficient matrix is


\begin{eqnarray*}\left( \begin{array}{lrrrrrrrr}
2.6071&-1.2143&-1.3929&-.2857& ...
...10&-.4286 \\
& & & & & & & & 1.6071 \\
\end{array} \right)\\
\end{eqnarray*}


The absorbed right hand sides, solutions, and ${\bf w}_{i}$ are

Factor ${\bf Z}'_{i}{\bf My}$ $\hat{\bf u}_{i}$ ${\bf w}_{i}$
1 128.8214 9.6695 10.2182
  155.3571 9.0836 12.5000
  -284.1786 -18.7531 -22.5412
       
2 6.8905 .0750 .6745
  .5000 1.5094 .0526
  174.2500 14.8447 17.8718
  -157.6786 -12.1682 -16.4127
  49.7976 5.1919 4.4738
  -73.6786 -9.4528 -7.6692
and ${\bf y}'{\bf My} = 62,796.881$. The following can be calculated from the above:

\begin{eqnarray*}\hat{\bf u}'_{1}{\bf Z}'_{1}{\bf My} & = & 7,986.0696, \\
\hat...
...{1} & = & 7.6429, \\
tr{\bf Z}'_{2}{\bf MZ}_{2} & = & 11.6905,
\end{eqnarray*}


The estimates are

\begin{eqnarray*}\hat{\sigma}_{o}^{2} & = & ({\bf y}'{\bf My} -
\hat{\bf u}'{\bf...
...{2}{\bf Z}'_{2}{\bf My}/
tr{\bf Z}'_{2}{\bf MZ}_{2} = 467.1870.
\end{eqnarray*}


Using Van Raden's proposed quadratic forms, the results are

\begin{eqnarray*}{\bf w}'_{1}\hat{\bf u}_{1} & = & 635.0673, \\
{\bf w}'_{2}\ha...
...\\
\hat{\sigma}_{2}^{2} & = & 560.8674 \ / \ 1.1608 = 483.1732.
\end{eqnarray*}
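As a quick cross-check, the quantities quoted in this example can be recomputed from the table entries as printed (rounded values):

\begin{verbatim}
import numpy as np
r1 = np.array([128.8214, 155.3571, -284.1786])                 # Z_1'My
u1 = np.array([9.6695, 9.0836, -18.7531])
w1 = np.array([10.2182, 12.5000, -22.5412])
r2 = np.array([6.8905, 0.5000, 174.2500, -157.6786, 49.7976, -73.6786])
u2 = np.array([0.0750, 1.5094, 14.8447, -12.1682, 5.1919, -9.4528])
w2 = np.array([0.6745, 0.0526, 17.8718, -16.4127, 4.4738, -7.6692])
print(u1 @ r1)            # approximately 7,986.07 = u_1'Z_1'My
print(u2 @ r2 / 11.6905)  # approximately 467.19, pseudo-expectation sigma_2^2
print(w1 @ u1, w2 @ u2)   # approximately 635.07 and 560.87
print(w2 @ u2 / 1.1608)   # approximately 483.17, Van Raden sigma_2^2
\end{verbatim}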


Problem

Below are data following a model with one fixed factor (A) and two random factors (T and R). Apply EM REML, DFREML, ML, I-MIVQUE (if the estimates stay positive), the Pseudo-Expectation method, and Van Raden's method to these data and compare the results.

$A_{i}$ $T_{j}$ $R_{k}$ $y_{ijkl}$   $A_{i}$ $T_{j}$ $R_{k}$ $y_{ijkl}$   $A_{i}$ $T_{j}$ $R_{k}$ $y_{ijkl}$
1 1 1 50 2 1 1 65 3 1 1 80
1 1 1 54 2 1 2 76 3 1 1 83
1 1 2 59 2 1 2 73 3 1 1 78
1 1 2 61 2 1 2 75 3 1 2 95
1 1 3 68 2 1 3 84 3 1 3 102
1 1 4 79 2 1 3 87 3 1 3 97
1 1 4 82 2 1 4 98 3 1 4 109
1 2 1 41 2 1 4 93 3 1 4 112
1 2 2 52 2 2 1 53 3 2 2 81
1 2 3 64 2 2 3 77 3 2 2 79
1 2 4 69 2 2 3 74 3 2 4 101
1 2 4 73 2 2 4 86 3 3 1 63
1 2 4 70 2 3 2 57 3 3 1 57
1 3 3 47 2 3 2 54 3 3 3 75
1 3 3 46 2 3 2 51 3 3 3 84
1 3 4 62 2 3 3 66 3 3 4 93
1 3 4 57         3 3 4 86
1 3 4 66                

Method R
L. R. Schaeffer, April 28, 1999

This method was first reported through the Animal Geneticist's Discussion Group in 1993. A paper was subsequently published in the Journal of Animal Science, 1994, 72:2247-2253, by A. Reverter, B. L. Golden, R. M. Bourdon, and J. S. Brinks. The method came about as a result of a study on the change in genetic evaluations as animals acquire more progeny, as in routine beef cattle evaluations. The method is based on the linear regression of recent solutions (based on all the data) on previous solutions (based on a subset of all data). The expected value of the regression equals 1 regardless of the distribution of observations or predictions. If the wrong ratio of variance components is used in the mixed model equations, then the computed value of the regression deviates from the expected value of one. If the computed value is greater than 1, then the assumed $h^{2}$ was too low. The method is iterative, in that the variance ratio is changed until the regression equals one.

As with the pseudo expectation method, Method R is ad hoc in nature and the properties of the method can only be established through simulation or empirical evidence. Schenkel (1998) has shown that Method R also suffers from bias due to non-random mating and when pedigree files are not complete, as does nearly every method.

Development

To simplify presentation assume only one random factor in the model. Denote the data and model for subset j by

\begin{displaymath}{\bf y}_{(j)} = {\bf X}_{(j)}{\bf b}_{(j)} + {\bf Z}_{(j)}
{\bf u}_{(j)} + e_{(j)} , \end{displaymath}

and denote the data and model for all data by the subscript (t). Construct and solve MME for ${\bf y}_{(j)}$ and ${\bf y}_{(t)}$.

\begin{displaymath}\left( \begin{array}{cc} {\bf X}'_{(k)}{\bf X}_{(k)} &
{\bf ...
... y}_{(k)} \\
{\bf Z}'_{(k)}{\bf y}_{(k)} \end{array} \right) \end{displaymath}

for k=j, and for k=t. Note that the same variance ratio, $\alpha$, is used for both data sets. If an animal model is assumed where ${\bf G}={\bf A}$, then the dimension of ${\bf G}$ must be the same for both the subset and the total data sets. That is, all animals must be evaluated from each subset and the complete data. Note that ${\bf y}_{(t)}$ contains all of the data in ${\bf y}_{(j)}$ and more.

The jth subset is constructed by randomly choosing one half of the data. This should probably be a random half of the contemporary groups, so that animals within a contemporary group stay together. The idea of Method R is to obtain a number of subsets from the same complete data set, say 10 or more, and to estimate the variances from each subset. The end result is the average of the 10 estimates, and their SD gives an idea of the variability of the estimates due to sample size.

Now consider the ith element of $\hat{{\bf u}}$ from the two analyses above.

\begin{eqnarray*}Cov( \hat{u}_{i(j)}, \hat{u}_{i(t)} ) & = &
Cov( \hat{u}_{i(j)}, \ \hat{u}_{i(j)} + d_{i} ) \\
& = & V( \hat{u}_{i(j)}) + Cov( \hat{u}_{i(j)}, d_{i} ) \\
& = & V( \hat{u}_{i(j)})
\end{eqnarray*}


because the change, $d_{i} = \hat{u}_{i(t)} - \hat{u}_{i(j)}$, from subset j to complete data t should be uncorrelated with the solution in subset j. The regression of $\hat{u}_{i(t)}$ on $\hat{u}_{i(j)}$ is then

\begin{eqnarray*}\Re_{c} & = & \frac{Cov (\hat{u}_{i(j)}, \hat{u}_{i(t)})}
{V( \hat{u}_{i(j)})} \\
& = & \hat{{\bf u}}'_{(j)} \hat{{\bf u}}_{(t)} /
\hat{{\bf u}}'_{(j)} \hat{{\bf u}}_{(j)}
\end{eqnarray*}


which should equal one.
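In code, the regression is just a ratio of two inner products of the solution vectors; a minimal sketch (illustrative name) is:

\begin{verbatim}
import numpy as np

def method_r_ratio(u_subset, u_total):
    """Regression of complete-data solutions on subset solutions (sketch)."""
    return (u_subset @ u_total) / (u_subset @ u_subset)
\end{verbatim}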

An Example

Illustrating Method R really requires a fairly large data set, in order to achieve convergence. This example merely shows the calculations that should be followed, but the example does not converge to a good final estimate. Below are data on progeny of ten sires, for a model with just a mean and random (unrelated) sire and residual effects. The left half of the table represents the data from one subset and the right half represents the complete data.

Sire   Subset 1   Complete Data
Number   $n_{i}$   $y_{i.}$   $n_{i}$   $y_{i.}$
1 85 11,462 213 29,218
2 54 6,819 182 23,745
3 112 15,153 121 16,456
4 90 11,115 191 23,637
5 144 16,625 307 35,666
6 192 26,203 298 40,720
7 136 17,957 237 30,944
8 83 10,151 226 27,899
9 174 20,700 338 39,865
10 161 21,346 274 36,483
The total sum of squares of the observations in Subset 1 was 20,715,797, and for the complete data it was 39,977,646.

The solutions for sires within each data set are given below, for a variance ratio of 10.

Sire Number   Subset 1   Complete
1 6.3132 8.2960
2 -1.2768 1.8758
3 6.8886 6.9384
4 -3.8619 -4.4988
5 -11.5384 -11.9239
6 8.2531 7.8913
7 3.9549 1.9931
8 -4.8995 -4.8277
9 -8.3459 -10.2414
10 4.5125 4.4973

From these solutions, then

\begin{eqnarray*}\hat{{\bf u}}'_{(1)}\hat{{\bf u}}_{(1)} & = & 434.7651 \\
\hat{{\bf u}}'_{(1)}\hat{{\bf u}}_{(t)} & = & 455.1629 \\
\Re & = & 1.04692
\end{eqnarray*}


which suggests that the ratio for this sample of data should be lower than 10. To get the new ratio, divide 10 by $\Re$ to give 9.55183. Use the new ratio to construct the MME again for the subset and the complete data sets. Continue until $\Re=1$. The authors suggest that a binary search algorithm could be faster. Start with heritability equal to .5, and depending on the value of $\Re$ go to heritability of .75 or .25. Continue reducing the interval by one half until $\Re=1$.
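Using the sire solutions from the table above, the computation of $\Re$ and of the next ratio can be checked directly:

\begin{verbatim}
u1 = [6.3132, -1.2768, 6.8886, -3.8619, -11.5384,
      8.2531, 3.9549, -4.8995, -8.3459, 4.5125]    # Subset 1 solutions
ut = [8.2960, 1.8758, 6.9384, -4.4988, -11.9239,
      7.8913, 1.9931, -4.8277, -10.2414, 4.4973]   # Complete-data solutions
R = sum(a * b for a, b in zip(u1, ut)) / sum(a * a for a in u1)
print(R)         # approximately 1.04692
print(10 / R)    # new ratio, approximately 9.55183
\end{verbatim}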

The calculations need to be repeated for other subsets of the same data set. The Method R estimate is then the average of the estimates from each subset. Hence there is a lot of computing that must be done. However, the only requirement is a program that computes genetic evaluations for a particular model, and these are readily available. Computationally, Method R is very attractive, and for randomly mating populations Method R will give estimates close to the true values. One question is the number of subsets that should be chosen. This depends on the size of the complete data set and the amount of time to solve the equations. The more subsets one has, the better the average of the estimates should be. The authors claim that, from their experience, the estimates are similar to REML estimates, but the proof is rather loose in a technical sense. The covariances and variances in the regression utilize the same quadratic forms and involve the same inverse elements of the mixed model equations as in EM REML. Method R could be a method for obtaining an initial prior value for use in Bayesian or REML methodology, rather than a stand-alone method, because of its problems with incomplete pedigrees and non-random matings, and because its properties cannot be proven except empirically.

Convergence of Iterative Methods
L. R. Schaeffer, April 28, 1999

These notes are terribly outdated now, with the use of DFREML and Bayesian methods, but they may be of interest to those who work with problems of convergence of estimation systems.

Iterative methods of variance component estimation, such as REML and ML via the EM algorithm, are known to converge at very slow rates towards a final solution. For large datasets, the rate of convergence could affect computing costs substantially. Rates of convergence seem to be very data dependent and even specific to particular traits within a dataset. Iterative MIVQUE is known to converge faster than the REML EM algorithm. Convergence is also adversely affected by the number of random factors in the model. Then there is always the question of whether the system will converge at all. Hence the researcher needs to know (a) whether the system will converge, and (b) whether the rate of convergence can be improved if it does converge.

Below are least squares equations for a model with two fixed factors and one random factor

\begin{displaymath}\left( \begin{array}{rrrrrrrrrrrr}
226 & 60 & 72 & 53 & 41 & ...
...600 \\ 800
\\ 2720 \\ 1450 \\ 630 \\ 1000 \end{array} \right). \end{displaymath}

Let factor C be the random factor and assume that ${\bf y}'{\bf y} =
2,250,000.$ Let the best guess of the ratio of residual to factor C variances be 15. Both iterative MIVQUE and the REML EM algorithm were applied to these data and the results by iteration are in Tables 1 and 2. Both computing algorithms were slow to converge, but iterative MIVQUE was faster than REML EM.

Table 1. Results of iterative MIVQUE.
Iteration Number (i)   $\hat{\sigma}_{o}^{2}$   $\hat{\sigma}_{C}^{2}$   ${\alpha}_{C}$   ${\Delta}_{i-1}$   ${\phi}_{i-1}$
1 9138.59 659.45 13.85793 -1.14207 ----
2 9131.43 709.03 12.87875 -0.97918 1.16635
3 9124.82 757.03 12.05338 -0.82537 1.18635
4 9118.87 802.14 11.36816 -0.68522 1.20453
5 9113.66 843.32 10.80684 -0.56132 1.22073
6 9109.19 879.92 10.35226 -0.45458 1.23481
7 9105.44 911.67 9.98767 -0.36459 1.24683
8 9102.33 938.62 9.69760 -0.29007 1.25690
9 9099.80 961.08 9.46833 -0.22927 1.26519
10 9097.76 979.51 9.28809 -0.18024 1.27203
20 9090.85 1044.20 8.70603 -0.01422 1.29466
30 9090.30 1049.49 8.66166 -0.00106 1.29245
40 9090.26 1049.88 8.65838 -0.00008 1.29750
50 9090.26 1049.91 8.65813    

Table 2. Results for REML.
Iteration Number (i)   $\hat{\sigma}_{o}^{2}$   $\hat{\sigma}_{C}^{2}$   $\hat{\alpha}_{C}$   ${\Delta}_{i-1}$   ${\phi}_{i-1}$
1 9145.94 622.81 14.68508 -0.31492 ----
2 9143.95 635.96 14.37814 -0.30694 1.02600
3 9141.94 649.31 14.07958 -0.29856 1.02807
4 9139.94 662.81 13.78974 -0.28984 1.03009
5 9137.94 676.44 13.50875 -0.28079 1.03223
6 9135.95 690.16 13.23748 -0.27147 1.03433
7 9133.97 703.94 12.97557 -0.26191 1.03650
8 9132.01 717.73 12.72341 -0.25216 1.03867
9 9130.08 731.51 12.48112 -0.24229 1.04074
10 9128.17 745.23 12.24881 -0.23231 1.04296
20 9111.59 870.89 10.46233    
30 9100.81 958.84 9.49152    
40 9095.07 1007.74 9.02521    
50 9092.60 1027.79 8.8468    

The final estimates were \( \hat{\sigma}^{2}_{0} = 9090.260284 \), \( \hat{\sigma}^{2}_{C} = 1049.912327 \), and \( \hat{\alpha}_{C} = 8.65811 \).

Convergence Curves

The following results are due mainly to Dr. Ignacy Misztal. A plot of the estimated variance ratio against iteration number would show a nonlinear relationship. The curve would appear to approach an asymptote at a final value if the system were converging, as in the example. This curve may be characterized by a model, and therefore the asymptotic value could be predicted if the parameters of that model were known. Let

$\alpha_{o}$
= initial starting value of variance ratio,
${\alpha}_{i}$
= value of variance ratio after the ith iteration,
$\Delta_{i}$
= $\alpha_{i+1} - \alpha_{i}$, the difference in values of the variance ratio after one iteration, and
$\phi_{i}$
= $\Delta_{i-1}/\Delta_{i}$, ratio of differences in consecutive iterations.
Assume ${\phi}_{i} = {\phi}$, a constant for all iterations. For REML this assumption is reasonable since $\phi_{i}$ varied from 1.026 to 1.043 over the first ten iterations (Table 2), but for iterative MIVQUE the values ranged from 1.17 to 1.272 in the first ten iterations (Table 1). If a constant value can be assumed, then

\begin{eqnarray*}{\Delta}_{1} & = & {\Delta}_{o}/{\phi}, \ \ {\rm or \ more \ generally,} \\
{\Delta}_{2} & = & {\Delta}_{1}/{\phi} \ = \ {\Delta}_{o}/{\phi}^{2} \\ {\rm or} \ \
{\Delta}_{i} & = & {\Delta}_{o}/{\phi}^{i}.
\end{eqnarray*}


Then

\begin{eqnarray*}{\alpha}_{1} & = & {\alpha}_{o} + {\Delta}_{o}, \\
{\alpha}_{2} & = & {\alpha}_{o} + {\Delta}_{o} + {\Delta}_{o}/{\phi}, \\
{\rm or} \ \ {\alpha}_{i} & = & {\alpha}_{o} + {\Delta}_{o}
(\sum_{j=o}^{i-1} \ {\phi}^{-j}).
\end{eqnarray*}


Suppose $i = \infty$, then

\begin{eqnarray*}{\alpha}_{\infty} & = & {\alpha}_{o} + {\Delta}_{o}(\sum_{j=o}^{\infty}
\ {\phi}^{-j}), \\
{\rm or} \ \ {\alpha}_{\infty} & = & {\alpha}_{o} + {\Delta}_{o}
(1 - {\phi}^{-1})^{-1}.
\end{eqnarray*}


For REML in Table 2, if ${\phi} = 1.035$, ${\Delta}_{o} = -.31492$, and ${\alpha}_{o} = 15$, then

\begin{eqnarray*}{\alpha}_{\infty} & = & 15 + (-.31492) \ [1 - (1.035)^{-1}]^{-1} \\
& = & 15 + (-.31492) / .03382 = 15 \ - \ 9.31263 = 5.68737.
\end{eqnarray*}


Obviously, 5.68 is greatly different from the converged value of 8.65, and this is due to the fact that $\phi_{i}$ is not a constant over iterations. However, if ${\phi}$ had been 1.05218, then ${\alpha}_{\infty}$ would equal 8.65. Hence a small difference in ${\phi}$ can make a big difference in the predicted ratio.

For MIVQUE let

\begin{eqnarray*}{\phi} & = & 1.2975, \ {\alpha}_{o} = 15, \ {\rm and} \
{\Delta}_{o} = -1.14207, \ {\rm then} \\
{\alpha}_{\infty} & = & 15 + (-1.14207) \ [1 - (1.2975)^{-1}]^{-1} \\
& = & 15 - 1.14207 / .22929 = 15 \ - \ 4.98096 = 10.01904.
\end{eqnarray*}
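Both predictions can be reproduced with a one-line function implementing the geometric-series result above:

\begin{verbatim}
def extrapolate(alpha0, delta0, phi):
    """Predicted asymptote: alpha_0 + delta_0 / (1 - 1/phi)."""
    return alpha0 + delta0 / (1.0 - 1.0 / phi)

print(extrapolate(15, -0.31492, 1.035))   # approximately 5.69  (REML, Table 2)
print(extrapolate(15, -1.14207, 1.2975))  # approximately 10.02 (MIVQUE, Table 1)
\end{verbatim}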


Common Intercept Approach

The Common Intercept Approach (CIA) was suggested in 1979 by Dr. Lynn Johnson (then a postdoctoral research associate with Dr. Burnside) and has been derived from the nonlinear model of the previous section. The purpose of CIA was to reduce the number of iterations needed to attain convergence in any iterative procedure. In addition, CIA can be used to determine if an iterative procedure will converge for a given data set.

Let ${\alpha}_{xo}$ represent one starting value and let ${\alpha}_{zo}$ represent a different starting value. The corresponding changes in value after one iteration are ${\Delta}_{xo}$ and ${\Delta}_{zo}$. Then

\begin{eqnarray*}{\alpha}_{x\infty} & = & {\alpha}_{xo} + {\Delta}_{xo}/(1 - {\phi}^{-1}), \\
{\rm so \ that} \ \ (1 - {\phi}^{-1})^{-1} & = & ({\alpha}_{x\infty} -
{\alpha}_{xo}){\Delta}_{xo}^{-1}
\end{eqnarray*}


which can be substituted into

\begin{eqnarray*}{\alpha}_{z\infty} & = & {\alpha}_{zo} + {\Delta}_{zo}/(1 - {\phi}^{-1}) \\
& = & {\alpha}_{zo} + {\Delta}_{zo}({\alpha}_{x\infty} -
{\alpha}_{xo}){\Delta}_{xo}^{-1}
\end{eqnarray*}


Assuming that

\begin{eqnarray*}{\alpha}_{z\infty} & = & {\alpha}_{x\infty} = {\alpha}_{\infty}, \ {\rm then} \\
{\alpha}_{\infty} & = & {\alpha}_{zo} + {\Delta}_{zo}
({\alpha}_{\infty} - {\alpha}_{xo}){\Delta}_{xo}^{-1}
\end{eqnarray*}


which can be re-arranged to give

\begin{eqnarray*}{\alpha}_{\infty} & = & ({\Delta}_{xo}{\alpha}_{zo} -
{\Delta}_{zo}{\alpha}_{xo}) \ / \ ({\Delta}_{xo} - {\Delta}_{zo}).
\end{eqnarray*}


Using the example data and REML, from Table 2, let

\begin{eqnarray*}{\alpha}_{xo} & = & 15, \\
{\Delta}_{xo} & = & -.31492, \\
{\alpha}_{zo} & = & 5, \\
{\rm and} \ \ {\Delta}_{zo} & = & +.40208,
\end{eqnarray*}


then CIA gives

\begin{eqnarray*}{\alpha}_{\infty} & = & (-.31492(5) \ - \ .40208(15)) / (-.31492
\ - \ .40208) \\
& = & -7.60580 \ / \ -.717 \\
& = & 10.60781.
\end{eqnarray*}


With MIVQUE, let

\begin{eqnarray*}{\alpha}_{xo} & = & 15, \\
{\Delta}_{xo} & = & -1.14207, \\
{\alpha}_{zo} & = & 5, \\
{\rm and} \ {\Delta}_{zo} & = & .96246,
\end{eqnarray*}


then

\begin{eqnarray*}{\alpha}_{\infty} & = & (-1.14207(5) \ - \ .96246(15)) \ / \ (-1.14207
\ - \ .96246) \\
& = & -20.14725 \ / \ -2.10453 \\
& = & 9.57328.
\end{eqnarray*}


Thus, two iterations with different starting values are needed to obtain a prediction of the converged value. The closer the two starting values are to the actual converged estimate, the better CIA will work. For example, using REML with ${\alpha}_{xo} = 9$ and ${\alpha}_{zo} = 8.5$, then

\begin{eqnarray*}{\Delta}_{xo} & = & -.02746 \ \ {\rm and} \ \ {\Delta}_{zo} \ = \
.013217, \ {\rm so \ that} \\
{\alpha}_{\infty} & = & (-.02746(8.5) \ - \ .013217(9)) \ / \ (-.02746
\ - \ .013217) \\
& = & -.35236 \ / \ -.04068 = 8.66233.
\end{eqnarray*}


With MIVQUE, the result was 8.66014.
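The CIA formula is easy to verify with the values used above:

\begin{verbatim}
def cia(alpha_x0, delta_x0, alpha_z0, delta_z0):
    """Common Intercept Approach prediction of the converged ratio."""
    return (delta_x0 * alpha_z0 - delta_z0 * alpha_x0) / (delta_x0 - delta_z0)

print(cia(15, -0.31492, 5, 0.40208))     # approximately 10.61 (REML)
print(cia(15, -1.14207, 5, 0.96246))     # approximately 9.57  (MIVQUE)
print(cia(9, -0.02746, 8.5, 0.013217))   # approximately 8.66  (REML, closer starts)
\end{verbatim}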

If the estimate of ${\alpha}_{\infty}$ from CIA is negative, then this indicates that convergence may not be possible. Select new starting ratios that are closer to each other and retry the CIA. If the estimate of the converged value remains negative, then the system will likely not converge. Use of an unbiased, non-iterative procedure on the same data will likely produce negative estimates of one or more variance components. In any event, the negative value of ${\alpha}_{\infty}$ should not be used as the next prior value in the iterative procedure.

Aitken's Approximation

Another method of determining the converged value is Aitken's approximation given by the following formula:

\begin{displaymath}{\alpha}_{\infty} = \frac{\alpha_{i}\alpha_{i+2} - \alpha_{i+1}^{2}}
{\alpha_{i} - 2\alpha_{i+1} + \alpha_{i+2}}. \end{displaymath}

To illustrate, take the values from Table 1 for iterations 6, 7, and 8, then

\begin{eqnarray*}{\alpha}_{\infty} & = & \frac{\alpha_{6}\alpha_{8} - \alpha_{7}^{2}}
{\alpha_{6} - 2\alpha_{7} + \alpha_{8}} \\
& = & [(10.35226)(9.69760) - (9.98767)^{2}]/
[10.35226 - 2(9.98767) + 9.69760] \\
& = & [0.6385]/[0.07452] \\
& = & 8.56817
\end{eqnarray*}


for MIVQUE, and

\begin{eqnarray*}{\alpha}_{\infty} & = & [(13.23748)(12.72341) - (12.97557)^{2}]/
[13.23748 - 2(12.97557) + 12.72341] \\
& = & (.06047)/(.00975) \\
& = & 6.20191
\end{eqnarray*}


for REML.
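The same three-iterate rule is easily checked:

\begin{verbatim}
def aitken(a_i, a_i1, a_i2):
    """Aitken's approximation of the converged value from three iterates."""
    return (a_i * a_i2 - a_i1**2) / (a_i - 2 * a_i1 + a_i2)

print(aitken(10.35226, 9.98767, 9.69760))    # approximately 8.57 (MIVQUE)
print(aitken(13.23748, 12.97557, 12.72341))  # approximately 6.20 (REML)
\end{verbatim}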

Relaxation Factors

Relaxation factors attempt to speed convergence. The success of relaxation factors depends upon the magnitude of the factor used. If the factor is too large, then changes between iterations may be too disruptive for the system and convergence may not be achieved any faster, if at all. If the factor is too small, then the increased rate of convergence may not be noticeable. If $\rho$ is the relaxation factor, then the appropriate starting ratio for the (i+1) iterate is not ${\alpha}_{i}$, but is

\begin{displaymath}{\alpha}_{(i)} = {\alpha}_{i} + \rho \ [{\alpha}_{i} - {\alpha}_{i-1}].
\end{displaymath}

Usually a factor between 0 and 1 is used. Let $\rho = 1$ in the example, then in REML ${\alpha}_{1}$ = 14.68508 and $\alpha_{o}$ = 15, and therefore ${\alpha}_{(1)}$ = 14.37016. Thus, only one iteration was skipped, and overall probably half the iterations would be needed. Care needs to be used in choosing a value of $\rho$ because it could be dependent on the data structure and the observation values themselves. Thus, using $\rho = .6$ for all situations may not give the best results. Some test runs with different values may be needed.

Covariances and General Models
L. R. Schaeffer, April 28, 1999

The linear model that has been assumed for variance component estimation to this point has variance-covariance matrices of all random factors of the form ${\bf I}c$, where c is the variance to be estimated, and covariances between any two random factors were always null. With animal models and maternal genetic effects, these assumptions are not met, and some of the previous methods cannot be used. Methods applicable to more general models are MIVQUE, REML, ML, Method R and Bayesian methods.

Covariances Between Two Random Factors

Assume the following model and example problem where

\begin{displaymath}{\bf y} = {\bf Xb} + {\bf Z}_{1}{\bf u}_{1} + {\bf Z}_{2}{\bf u}_{2} +
{\bf e}, \end{displaymath}

and the data are as in the table below. The total sum of squares was ${\bf y}'{\bf y} = 325,520$. Also, $V({\bf u}_{1}) = {\bf I}{\sigma}_{1}^{2}$, $V({\bf u}_{2}) = {\bf I}{\sigma}_{2}^{2}$, and letting ${\bf u}_{1}$ and ${\bf u}_{2}$ have the same number of elements in the same sequence, then

\begin{displaymath}Cov({\bf u}_{1}, \ {\bf u}_{2}) = {\bf I}{\sigma}_{12}. \end{displaymath}

This covariance changes the structure of the variance-covariance matrix of ${\bf y}$ to be

\begin{displaymath}Var({\bf y}) = {\bf Z}_{1}{\bf Z}'_{1}\sigma_{1}^{2} +
{\bf Z}_{2}{\bf Z}'_{2}\sigma_{2}^{2} +
({\bf Z}_{1}{\bf Z}'_{2} + {\bf Z}_{2}{\bf Z}'_{1})\sigma_{12} +
{\bf I} \ {\sigma}_{0}^{2}. \end{displaymath}

Depending on the method of estimation applied to ${\bf y}$, the quadratic forms will likely contain $\sigma_{12}$ in their expectations. If ${\bf W} = [{\bf X} \ \ {\bf Z}_{1} \ \ {\bf Z}_{2}],$ then

\begin{displaymath}Var({\bf W}'{\bf y}) = {\bf W}'{\bf Z}_{1}{\bf Z}'_{1}
{\bf W}\sigma_{1}^{2} + {\bf W}'{\bf Z}_{2}{\bf Z}'_{2}{\bf W}\sigma_{2}^{2} +
{\bf W}'({\bf Z}_{1}{\bf Z}'_{2} + {\bf Z}_{2}{\bf Z}'_{1}){\bf W}
\sigma_{12} + {\bf W}'{\bf W}\sigma_{0}^{2}. \end{displaymath}

Table: Example data for two random factors with covariance.
y Level of b Level of ${\bf u}_{1}$ Level of ${\bf u}_{2}$
145 1 1 2
84 1 2 4
83 1 3 4
75 1 4 5
122 1 1 5
100 2 2 3
172 2 3 2
102 2 4 2
158 2 1 4
156 2 2 5
84 2 3 5
133 3 4 3
139 3 1 3
173 3 2 3
117 3 3 4
187 3 4 2
150 3 1 2
160 3 2 5
The least squares equations, ${\bf W}'{\bf W}$ and ${\bf W}'{\bf y}$, are

\begin{displaymath}\left( \begin{array}{ccccccccccccc}
5 & 0 & 0 & 2 & 1 & 1 & 1...
...97 \\ 0 \\ 0 \\
756 \\ 545 \\ 442 \\ 597
\end{array} \right) \end{displaymath}

MIVQUE

Let

\begin{displaymath}{\bf G}^{-1}\sigma_{0}^{2} =
\left( \begin{array}{rr}
11 \ {\bf I} & -8 \ {\bf I} \\
-8 \ {\bf I} & 11 \ {\bf I} \\
\end{array} \right), \end{displaymath}

then the solutions to the mixed model equations become

\begin{displaymath}\left( \begin{array}{l}
\hat{b}_{1} \\ \hat{b}_{2}\\ \hat{b}_...
...ray}{l}
101.2441 \\ 127.9646 \\ 151.2713 \end{array} \right) , \end{displaymath}


\begin{displaymath}\left( \begin{array}{c}
\hat{u}_{11} \\ \hat{u}_{12} \\ \hat{...
... 9.9943 \\ -7.3217 \\ -6.0118 \\ -2.1231
\end{array} \right) . \end{displaymath}

The quadratic forms and expectations were

\begin{eqnarray*}\begin{array}{llrl}
&1.& \hat{\bf u}'_{1}\hat{\bf u}_{1} & = \ ...
...y} -
\hat{\bf u}'{\bf Z}'{\bf y}& = \: 11,372.25 \\
\end{array}\end{eqnarray*}


and

\begin{displaymath}E \left( \begin{array}{l}
1.\\ 2.\\ 3.\\ 4. \end{array} \righ...
...\
{\sigma}_{12} \\
{\sigma}_{0}^{2} \\
\end{array} \right). \end{displaymath}

The estimates were

\begin{displaymath}\begin{array}{rlrr}
\hat{\sigma}_{1}^{2} = & \ \: 23.8460, &
...
...{\rm and} &
\hat{\sigma}_{0}^{2} = & 914.2274. \\
\end{array} \end{displaymath}

The estimated correlation between the two random vectors was greater than one, indicating that unbiased procedures, in general, may not perform well in estimating the covariance between factors.
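For completeness, the following sketch shows how such a covariance enters the mixed model equations when both random vectors have the same number of levels; the function name and arguments are illustrative assumptions only.

\begin{verbatim}
import numpy as np

def mme_two_correlated_factors(X, Z1, Z2, y, G0, s2e):
    """Solutions of the MME when Cov(u_1, u_2) = I*sigma_12 (sketch).
    G0 is the 2x2 prior matrix [[s1^2, s12], [s12, s2^2]]; u_1 and u_2
    have the same number of levels q, in the same order."""
    q = Z1.shape[1]
    W = np.hstack([X, Z1, Z2])
    C = (W.T @ W).astype(float)
    # add G^{-1}*s2e for the stacked vector [u_1; u_2]
    C[X.shape[1]:, X.shape[1]:] += np.kron(np.linalg.inv(G0) * s2e, np.eye(q))
    return np.linalg.pinv(C) @ (W.T @ y)    # [b; u_1; u_2]
\end{verbatim}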

REML

REML is a biased procedure because the estimates are kept within the allowable parameter space. Hence, the estimated correlation between the two random vectors should be between -1 and +1. REML uses the same quadratic forms as MIVQUE, but also

\begin{displaymath}tr{\bf C}_{11} = .66328, \ \ \
tr{\bf C}_{22} = .66863, \ \ \ {\rm and} \ \ \
tr{\bf C}_{12} = .43290.\\
\end{displaymath}

The estimates were

\begin{eqnarray*}\hat{\sigma}_{0}^{2} & = & ({\bf y}'{\bf y} - \hat{\bf b}'{\bf ...
...})/5 \\
& = & (207.6922 + .43290 \ (758.15))/5 \ = \ 107.1791.
\end{eqnarray*}


The estimated correlation between the two random vectors with REML was .7462. Of course, the procedure must be iterated many more times before a final solution is obtained, and this could lead to a correlation that converges towards unity.

The course notes have descriptions of methods of variance component estimation for maternal effects and multiple trait models; those are not reproduced here.



Larry Schaeffer
1999-04-28