next up previous


This LaTeX document is available as postscript or asAdobe PDF.

Estimation of Genetic Variances

These notes are based on material prepared by Brian Kennedy (1993), but modified slightly. The intent of the notes were to present methods that were commonly used in the past (1950's to 1970's), and lead up to current models and methods. These notes will also be used to introduce basic matrix algebra concepts.

1. Offspring on Parent Regression

An infinitesimal model is assumed throughout these notes. The methodology is based on linear regression methods.

The analysis begins with a model. The model begins with an equation,

\begin{displaymath}y_{i} = \mu + \beta x_{i} + e_{i}, \end{displaymath}

where yi is a progeny record, $\mu$ is an overall mean of progeny records, xi is a parent record, and ei is a random residual effect associated with each progeny record. Implied by this model are that Variances are defined as

\begin{eqnarray*}Var(y_{i}) & = & \sigma^{2}_{y} \\
& = & Var(\mu + \beta x_{i...
...\beta^{2} Var(x_{i}) + Var(e_{i}) + 2\beta Cov(x_{i},e_{i}). \\
\end{eqnarray*}


Now, xi is a parent record, so that,

xi = yparent,

and

\begin{eqnarray*}Var(x_{i}) & = & Var(y_{i}) \\
& = & \sigma^{2}_{y}. \\
\end{eqnarray*}


Combining the results, then

\begin{eqnarray*}Var(y_{i}) & = & \sigma^{2}_{y} \\
& = & \beta^{2} \sigma^{2}_{y} + \sigma^{2}_{e}, \\
\end{eqnarray*}


which gives

\begin{displaymath}\sigma^{2}_{e} = (1 - \beta^{2}) \sigma^{2}_{y}. \end{displaymath}

Ordinary least squares (LS) equations are constructed to estimate $\mu$ and $\beta$, i.e.,

\begin{displaymath}\left( \begin{array}{ll}
N & \sum x_{i} \\ \sum x_{i} & \sum ...
...in{array}{c} \sum y_{i} \\ \sum x_{i}y_{i} \end{array}\right). \end{displaymath}

Take the first equation and solve for $\hat{\mu}$, which gives

\begin{displaymath}\hat{\mu} = \frac{1}{N} \sum (y_{i} - \sum x_{i} \hat{\beta}), \end{displaymath}

then substitute this expression into the next equation giving

\begin{displaymath}\sum x_{i}(\frac{1}{N} \sum (y_{i} - \sum x_{i} \hat{\beta}))
+ \sum x^{2}_{i} \hat{\beta} = \sum x_{i}y_{i}, \end{displaymath}

and combining terms in yi and $\hat{\beta}$ gives

\begin{displaymath}(\sum x^{2}_{i} - \frac{(\sum x_{i})^{2}}{N})\hat{\beta} =
(\sum x_{i}y_{i} - \frac{(\sum x_{i})(\sum y_{i})}{N}). \end{displaymath}

If both sides of the above equality are divided by (N-1), then

\begin{displaymath}\sigma^{2}_{x} \hat{\beta} = \sigma_{xy}, \end{displaymath}

or

\begin{displaymath}\hat{\beta} = \frac{\sigma_{xy}}{\sigma^{2}_{x}}. \end{displaymath}

Now $\sigma_{xy}$ is the covariance between parent and offspring which is one-half the additive genetic variance, and $\sigma^{2}_{x}$has already been shown to be equal to the variance of offspring records. Thus, the expected value of $\hat{\beta}$ is

\begin{eqnarray*}E(\hat{\beta}) & = & \frac{0.5 \sigma^{2}_{a}}{ \sigma^{2}_{y}} \\
& = & 0.5 h^{2}
\end{eqnarray*}


1.1 Small Numerical Example

Below are records on daughters and their dams.

Record Daughter Dam
(i) yi xi
1 51 60
2 57 58
3 45 48
4 60 37
5 40 25

In matrix format, the model looks as follows:

\begin{displaymath}{\bf y} = \left( \begin{array}{r}
51 \\ 57 \\ 45 \\ 60 \\ 40 ...
...e_{1} \\ e_{2} \\ e_{3} \\ e_{4} \\ e_{5}
\end{array} \right). \end{displaymath}

Appropriate SAS IML statements would be

    y = { 51, 57, 45, 60, 40 };
    x = {1 60, 1 58, 1 48, 1 37, 1 25};
    print x y;

The ordinary least squares equations would be

\begin{eqnarray*}{\bf X}'{\bf Xb} & = & {\bf X}'{\bf y} \\
\left( \begin{array}...
...) & = &
\left( \begin{array}{r} 261 \\ 11946 \end{array}\right).
\end{eqnarray*}


The total sum of squares is given by ${\bf y}'{\bf y}$, and the solutions are computed as

\begin{displaymath}\hat{\bf b} = ( {\bf X}'{\bf X} )^{-1}{\bf X}'{\bf y}. \end{displaymath}

In SAS,

    xx = x`*x;
    xy = x`*y;
    yy = y`*y;
    c = inv(xx);
    bhat = c*xy; 
    print xx, xy, yy, c, bhat;

The numerical results are

\begin{eqnarray*}{\bf C} & = & \frac{1}{4326} \left( \begin{array}{rr}
11262 & -...
...eta}
\end{array} \right), \\
{\bf y}'{\bf y} & = & 13,799. \\
\end{eqnarray*}


Thus, the estimate of heritability for these data would be

\begin{eqnarray*}\hat{h}^{2} & = & 2 \hat{\beta} \\
& = & 2 ( 0.051318 ) \\
& = & 0.1026. \\
\end{eqnarray*}


Because this estimate was obtained from LS regression methods, the standard error of the estimate can be calculated very readily. An estimate of the residual variance is needed, from the formula,

\begin{displaymath}\hat{\sigma}^{2}_{e} = ({\bf y}'{\bf y} - \hat{\bf b}'
{\bf X}'{\bf y})/ (N - r({\bf X})), \end{displaymath}

where N is the total number of observations (i.e. 5) and $r({\bf X})$ is the rank of ${\bf X}$. Rank is a scalar number that indicates the number of linearly independent rows and columns in a matrix. Rank is either the number of rows or the number of columns, whichever number is smaller (i.e. 2 in this case). Rank can be smaller than this number, but never larger. For example, if ${\bf X}$ had two columns and 100 rows, then the rank would be 2 or less. Rank would be 1 if both columns were identical, and would be 0 if all the elements in the matrix were 0.

    rx = 2;
    N = 5;
    red = bhat`*xy;
    ev = (yy - red)/(N-rx);
    print ev;

Numerically,

\begin{displaymath}\hat{\sigma}^{2}_{e} = (13799 - 13626.479)/3 \ = \ 50.8405. \end{displaymath}

Now

\begin{eqnarray*}Var(\hat{\bf b}) & = & Var({\bf CX}'{\bf y}) \\
& = & {\bf CX...
...527 \\
-0.0527 & 0.0012 \end{array} \right) \times 50.8405. \\
\end{eqnarray*}


So that,

\begin{displaymath}Var(\hat{h}^{2}) = Var(2\hat{\beta}) = 4(0.0012)50.8405 \ = \
0.2350. \end{displaymath}

Thus, the standard error of the heritability estimate is 0.4848.

2. Offspring on Mid-Parent Average

In this case, there can be several offspring from one set of parents (as in swine or rabbits), and each offspring would have just one record. However, a sire (dam) is mated to just one dam (sire). A model that has been employed is

\begin{displaymath}y_{ij} = \mu + f_{i} + \beta x_{i} + e_{ij}, \end{displaymath}

where yij is an offspring record, $\mu$ is an overall mean, fi is a family or full-sib effect, xi is the average of the records on the two parents, $\beta$ is the regression coefficient which has expectation equal to heritability, and eij is a residual error. Families are assumed to be randomly drawn from the large random mating population, thus,

\begin{eqnarray*}E(y_{ij}) & = & \mu, \\
E(f_{i}) & = & 0, \\
Var(y_{ij}) & = ...
...\
Cov(y_{ij},y_{ij'}) & = & \sigma^{2}_{f} + \beta^{2}_{x}. \\
\end{eqnarray*}


Note that $\sigma^{2}_{x} = \sigma^{2}_{y}/2$ because xi is the mean of two records, and the variance of a mean is equal to the variance of individual records divided by the number of records in the mean, in this case two. Then

\begin{eqnarray*}Cov(y_{ij},y_{ij'}) & = & \sigma^{2}_{f} + .5 \beta^{2} \sigma^{2}_{y}, \\
& = & \rho \sigma^{2}_{y}. \\
\end{eqnarray*}


From this we can determine that

\begin{displaymath}\sigma^{2}_{f} = ( \rho - .5 \beta^{2})\sigma^{2}_{y}, \end{displaymath}

and that

\begin{displaymath}\sigma^{2}_{e} = ( 1 - \rho ) \sigma^{2}_{y}. \end{displaymath}

The ratio of these two variances is then

\begin{displaymath}\frac{\sigma^{2}_{e}}{\sigma^{2}_{f}} =
\frac{(1-\rho)}{(\rho - .5 \beta^{2})} \ = \ k. \end{displaymath}

To illustrate the method, suppose we have the following data on 8 progeny from 4 sire-dam pairs.

i j yij Sire Dam Average
      Record Record xi
1 1 14 21 25 23
1 2 30 21 25 23
1 3 20 21 25 23
           
2 1 10 20 12 16
           
3 1 25 4 20 12
3 2 7 4 20 12
           
4 1 20 8 4 6
4 2 13 8 4 6

Let $\rho = 0.5$ and assume apriori that $\beta = 0.25$, then

\begin{displaymath}k = \frac{(1-0.5)}{(.5 - .5(.25)^{2})} \ = \ 1.0666667. \end{displaymath}

The SAS statements to analyze the data would be

    x = {1  1 0 0 0  23,
         1  1 0 0 0  23,
         1  1 0 0 0  23,
         1  0 1 0 0  16,
         1  0 0 1 0  12,
         1  0 0 1 0  12,
         1  0 0 0 1   6,
         1  0 0 0 1   6};
    y = {14, 30, 20, 10, 25, 7, 20, 13};
    g = { 0 1 1 1 1 0 };
    gi = diag(g);
    xx = x`*x;
    xy = x`*y;
    yy = y`*y;
    mme = xx + gi*1.0666667;
    c = inv(mme);
    bhat = c*xy;
    red = bhat`*xy;
    ev = (yy - red)/6;
    print xx, xy, yy, c, bhat, red, ev;
    end;

The denominator in the equation for ev is 6 because N=8 and the rank of the fixed effects columns (i.e. $\mu$and xi) is 2 giving a difference of 6. The mixed model equations, mme, are

\begin{displaymath}\left( \begin{array}{rrrrrr} 8 & 3 & 1 & 2 & 2 & 121 \\
3 & ...
...y}{r} 139 \\ 64 \\ 10 \\ 32 \\ 33 \\ 2214
\end{array} \right). \end{displaymath}

The inverse of the coefficient matrix, mme, is

\begin{displaymath}\left( \begin{array}{rrrrrr}
.2202 & .5363 & -.0822 & -.4451...
...-.0561 & -.0067 & -.0141 & .0488 & .0089
\end{array} \right), \end{displaymath}

and the solution vector is

\begin{displaymath}\left( \begin{array}{c} \hat{\mu} \\ \hat{f}_{1} \\
\hat{f}_...
... \\ -3.3533 \\
.04598 \\ 1.3510 \\ .2502 \end{array} \right). \end{displaymath}

Thus, the estimate of heritability from this data was $\hat{\beta} = \hat{h}^{2} = .2502$. The variance of this estimate equals the variance of $\hat{\beta}$which is $.0089 \hat{\sigma}^{2}_{e}$, where

\begin{displaymath}\hat{\sigma}^{2}_{e} = (2839 - 2488.5223)/6 \ = \ 58.4130, \end{displaymath}

so that

\begin{displaymath}Var(\hat{h}^{2}) = .0089(58.4130) \ = \ 0.51872, \end{displaymath}

or the standard error of the estimate was 0.72, which is rather large.

One must also estimate the family variance, $\sigma^{2}_{f}$ from this model. A procedure that will be described later, called restricted (or residual) maximum likelihood (REML), can be used for this purpose. Let the inverse of mme be represented as

\begin{displaymath}\left( \begin{array}{lll} c_{\mu \mu} & {\bf c}'_{\mu f} &
c...
...ta} & {\bf c}'_{f \beta} & c_{\beta \beta} \end{array}\right). \end{displaymath}

The estimates of the variances by REML are given by

\begin{eqnarray*}\hat{\sigma}^{2}_{e} & = & ({\bf y}'{\bf y} - \hat{\bf b}'{\bf ...
...
& = & ( 16.8990 \ + \ 2.6373(58.4130))/4, \\
& = & 42.7379.
\end{eqnarray*}


The new estimates give a new value of the variance ratio, i.e.

\begin{displaymath}\frac{\hat{\sigma}^{2}_{e}}{\hat{\sigma}^{2}_{f}} = 1.3668, \end{displaymath}

and this new ratio is used to construct new MME, and the whole process is repeated until the old and new ratios are no longer different. Unfortunately for this small example the process does not converge to a useful result.

Families would be treated as fixed effects if they were selected as the better families out of the population. Then simple least squares equations are used. That is,

\begin{displaymath}\hat{\bf b} = ({\bf W}'{\bf W})^{-}{\bf W}'{\bf y}. \end{displaymath}

Then $\hat{\beta}$ is an estimate of heritability. In this situation there is no need to estimate a family variance because it does not reflect the true underlying family variance for a large population of randomly mating parents due to the selection. The estimate of heritability is still unbiased because the selection of families, that caused them to be treated as fixed, affects both the covariance between offspring and mid-parent average as well as the variance of mid-parent averages, and affects them to the same degree so that they cancel each other out in the regression coefficient.

3. Family Means Analysis

Only the averages of the offspring and the number of offspring in each family are known. Individual records of offspring are not known. The model for individual records is the same as in the last section, but now we only have family means. The data are as follows:

Family Means Data.
Family Mid-parent Number in Family
Number Average Family Mean
1 23 3 $20\frac{1}{3}$
2 16 1 13
3 15 2 $19\frac{1}{2}$
4 12 2 $13\frac{1}{2}$

The equation of the model, in this case, is

\begin{displaymath}\bar{y}_{i} = \mu + \beta x_{i} + \bar{e}_{i}, \end{displaymath}

where

\begin{displaymath}\bar{e}_{i} = f_{i} + \frac{1}{n_{i}} \sum e_{ij}, \end{displaymath}

then

\begin{eqnarray*}Var(\bar{e}_{i}) & = & \sigma^{2}_{f} + \frac{\sigma^{2}_{e}}{n_{i}}, \\
& = & ( 1 \ + \ k_{f}/n_{i})\sigma^{2}_{f},
\end{eqnarray*}


where

\begin{displaymath}k_{f} = \frac{\sigma^{2}_{e}}{\sigma^{2}_{f}}. \end{displaymath}

Let

\begin{displaymath}Var({\bf e}) = {\bf R} = Var \left( \begin{array}{c}
\bar{e}...
...bar{e}_{2} \\ \bar{e}_{3} \\ \bar{e}_{4}
\end{array} \right), \end{displaymath}

and if kf=2, then

\begin{displaymath}{\bf R} = \left( \begin{array}{rrrr}
\frac{5}{3} & 0 & 0 & 0...
... & 2 & 0 \\
0 & 0 & 0 & 2 \end{array} \right) \sigma^{2}_{f}. \end{displaymath}

Weighted least squares are used in this case (but note that kfis assumed to be known) with ${\bf R}^{-1}$ used as the weights.

\begin{eqnarray*}{\bf X} & = & \left( \begin{array}{rr}
1 & 23 \\ 1 & 16 \\ 1 &...
...left( \begin{array}{r} 7.99704 \\ 0.5384798
\end{array} \right).
\end{eqnarray*}


The key to this analysis is to account for different number of offspring per family mean by using a weighted LS analysis.

4. Nested Design

In this situation, sires are mated to several dams each, but there is still only one offspring per dam and one record per offspring. Sires are assumed to be unrelated and not selected, and sires and dams are mated at random. The equation of the model is

\begin{displaymath}y_{ij} = \mu + s_{i} + \beta x_{ij} + e_{ij}, \end{displaymath}

where xij in this model represents a record on the dam and not the mid-parent value. Sires are a random effect in this model, and therefore, a sire variance must be estimated using a procedure like REML. In matrix form, the model is

\begin{displaymath}{\bf y} = \mu {\bf 1} + {\bf Zs} + {\bf x}\beta + {\bf e}, \end{displaymath}

and the appropriate MME are

\begin{displaymath}\left( \begin{array}{ccc} {\bf 1}'{\bf 1} & {\bf 1}'{\bf Z} &...
...y} \\
{\bf Z}'{\bf y} \\ {\bf x}'{\bf y} \end{array} \right), \end{displaymath}

where

\begin{displaymath}k_{s} = \frac{\sigma^{2}_{e}}{\sigma^{2}_{s}}, \end{displaymath}

the ratio of residual to sire variances. The sire variance in this model is equivalent to the covariance among half-sibs, and therefore has expectation equal to one quarter the additive genetic variance. Also, $\hat{\beta}$ in this case has expectation equal to one half heritability, because this is a regression on only one parent and not the mid-parent average. Some example data are given in the next table.

Dams nested within Sire Data.
Sire Dam Offspring
Number Record Record
1 50 41
1 42 39
1 45 52
2 38 44
2 48 60
3 35 43
3 27 37
3 36 30
3 43 47

Assuming that ks=12, then the MME and subsequent results are as follows:

\begin{eqnarray*}\left( \begin{array}{c} \hat{\mu} \\ \hat{\bf s} \\ \hat{\beta}...
...4735 \\
0.915129 \\ -0.220394 \\ 0.7048412 \end{array} \right).
\end{eqnarray*}


The residual variance is

\begin{eqnarray*}\hat{\sigma}^{2}_{e} & = & ({\bf y}'{\bf y} - \hat{b}'{\bf W}'{...
..., \\
& = & (17,769 \ - \ 17,384.09) / 7, \\
& = & 54.987152.
\end{eqnarray*}


and the sire variance is estimated by

\begin{displaymath}\hat{\sigma}^{2}_{s} = ({\bf s}'{\bf s} + tr{\bf C}_{ss}
\hat{\sigma}^{2}_{e})/ n_{s}, \end{displaymath}

where ns is the number of sires, and ${\bf C}_{ss}$ is the matrix of inverse elements from the MME corresponding to the sire equations.

If sires have been selected in some way, then treating them as fixed effects is necessary, and the sire variance is not estimated. Then $\hat{\beta}$ is still an unbiased estimate of one half heritability provided that the other assumptions of the model are still true.


next up previous

This LaTeX document is available as postscript or asAdobe PDF.

Larry Schaeffer
2001-11-20