next up previous


This LaTeX document is available as postscript or asAdobe PDF.

Inverse of Genomic Relationship Matrix and Application to Non-Additive Genetic Models
L. R. Schaeffer
Centre for Genetic Improvement of Livestock
Department of Animal & Poultry Science
University of Guelph, Guelph, ON, Canada N1G 2W1
December 28, 2002

Introduction

The infinitesimal animal model can be expanded by partitioning each animal's breeding value into genomic contributions from the sire and from the dam. The term genomic is used because reference is to the average relationship over the entire genome and not to one particular locus. The genomic relationship matrix, ${\bf G}$, was described by SMITH and ALLAIRE (1992) and the inverse of this matrix was derived by SCHAEFFER et al. (1993) although computations were too prohibitive for practical applications. Another attempt at a simple inverse was presented by ALI and FREEMAN (1994). However, using the algorithm of MEUWISSEN and LUO (1992), a very rapid method for determining the elements needed for the inverse of the genomic relationship matrix can be derived. Besides the inverse of the genomic relationship matrix, the same routines can provide the genomic relationships between any pair of individuals. The dominance genetic relationship between two individuals can be computed from the genomic relationships, even for inbred individuals.

With the ability to compute dominance relationships, the estimation of any number of non-additive genetic effects and variances, assuming a randomly mating population, can be accomplished using a simple animal model, and does not require the inverses of any of the non-additive genetic variance-covariance matrices. A description of this process is provided in this paper.

A Genomic Model

A genomic model separates the additive genetic merit of an individual into contributions from the male parent and from the female parent. In matrix form, a genomic model is

\begin{displaymath}{\bf y} = {\bf Xb} + {\bf Zg}_{m} + {\bf Zg}_{f}
+ {\bf e}, \end{displaymath}

where ${\bf g}_{m}$ is the male parent contribution and ${\bf g}_{f}$ is the female parent contribution. Summed together they give the total additive genetic merit of an individual. Now let ${\bf g}$ represent all of the genomic effects ordered in pairs by animal, so that

\begin{displaymath}\left( \begin{array}{c} g_{1m} \\ g_{1f} \\
g_{2m} \\ g_{2f}...
...enome} \\
\mbox{animal n, female genome}
\end{array} \right). \end{displaymath}

Then

\begin{displaymath}Var( {\bf g}) = {\bf G}\sigma^{2}_{g}. \end{displaymath}

The infinitesimal model assumes that there are an infinite number of loci affecting the trait, each with a small and nearly equal variance. The genomic relationship matrix can be constructed using fairly simple rules. Consider the following pedigrees.

Animal Sire Dam
A - -
B - -
C A B
D A C
E D B
Expand this table to identify the genomic pedigree structure, where X1 denotes the male genome contribution to animal X, and X2 denotes the female genome contribution to animal X.

Animal Genome Parent1 Parent2
A A1 - -
A A2 - -
B B1 - -
B B2 - -
C C1 A1 A2
C C2 B1 B2
D D1 A1 A2
D D2 C1 C2
E E1 D1 D2
E E2 B1 B2

The genomic relationship matrix will have order equal to twice the number of animals. The diagonals of a genomic relationship matrix are all equal to 1. The completed matrix is shown below.

            A B A C D B
    A B C D E
    A1 A2 B1 B2 C1 C2 D1 D2 E1 E2
  A1 1 0 0 0 .5 0 .5 .25 .375 0
A                      
  A2 0 1 0 0 .5 0 .5 .25 .375 0
  B1 0 0 1 0 0 .5 0 .25 .125 .5
B                      
  B2 0 0 0 1 0 .5 0 .25 .125 .5
  C1 .5 .5 0 0 1 0 .5 .5 .5 0
C                      
  C2 0 0 .5 .5 0 1 0 .5 .25 .5
  D1 .5 .5 0 0 .5 0 1 .25 .625 0
D                      
  D2 .25 .25 .25 .25 .5 .5 .25 1 .625 .25
  E1 .375 .375 .125 .125 .5 .25 .625 .625 1 .125
E                      
  E2 0 0 .5 .5 0 .5 0 .25 .125 1

Because the parents of A and B are unknown, then they are assumed to be randomly drawn from the large random mating population and assumed to have no genes identical by descent between them.

Let (A1,C1) indicate an element in the above table between the A1 male parent contribution of animal A and the C1 male parent contribution of animal C, then the value that goes into that location is


\begin{displaymath}(\mbox{A1,C1}) = 0.5 * [ (\mbox{A1,A1})
+ (\mbox{A1,A2}) ] = 0.5. \end{displaymath}

Similarly, for the rest of the A1 row,


\begin{eqnarray*}(\mbox{A1,C2}) & = & 0.5 * [ (\mbox{A1,B1}) +
(\mbox{A1,B2}) ...
...,E2}) & = & 0.5 * [ (\mbox{A1,B1}) +
(\mbox{A1,B2}) ] = 0. \\
\end{eqnarray*}


This recursive pattern follows through the entire table. Animals D and E are inbred and the offdiagonals between D1 and D2 and between E1 and E2 show the inbreeding coefficient. Both the additive and dominance relationship matrices may be obtained from the genomic relationship table. The additive relationship between animals A and C, for example, is given by

\begin{displaymath}0.5 * [ (\mbox{A1,C1}) + (\mbox{A1,C2}) +
(\mbox{A2,C1}) + (\mbox{A2,C2}) ] = 0.5. \end{displaymath}

Add the four numbers in each square of the table and divide by 2. The usual additive numerator relationship matrix, ${\bf A}$, is then

\begin{displaymath}{\bf A} = \left( \begin{array}{lllll}
1 & 0 & .5 & .75 & .375...
...& .75 \\
.375 & .625 & .625 & .75 & 1.125 \end{array}\right). \end{displaymath}

The dominance genetic relationship between animals X and Y, in general, is given by

\begin{displaymath}(\mbox{X1,Y1})*(\mbox{X2,Y2}) +
(\mbox{X1,Y2})*(\mbox{X2,Y1}). \end{displaymath}

The complete dominance relationship matrix, ${\bf D}$ is

\begin{displaymath}{\bf D} = \left( \begin{array}{lllll}
1 & 0 & 0 & .25 & 0 \\ ...
...5 \\
0 & .125 & .25 & .15625 & 1.015625
\end{array} \right). \end{displaymath}

Inverse of The Genomic Matrix

HENDERSON (1975) presented a fast method to invert the additive genetic relationship matrix for the case when animals were not inbred. QUAAS (1976) showed how to compute the inverse when animals were inbred. The fastest method of calculating the inbreeding coefficient was presented by MEUWISSEN and LUO (1992). A combination of the results of these papers leads to a rapid algorithm for inverting the genomic relationship matrix.

Any positive definite matrix may be partitioned as

\begin{displaymath}{\bf G} = {\bf TBT}' \end{displaymath}

where ${\bf T}$ is a lower triangular matrix and ${\bf B}$ is a diagonal matrix. The MEUWISSEN and LUO (1992) algorithm has been modified to provide the diagonals of ${\bf B}$ while forming a row of ${\bf T}$. A modification is necessary because all of the diagonals of ${\bf G}$ are equal to one rather than to one plus the inbreeding coefficient. Animal genomes must be processed in sequence from oldest to youngest. For animal genomes with unknown parent genomes, the diagonals of ${\bf B}$are equal to 1. Therefore, the diagonals of ${\bf B}$for A1, A2, B1, and B2 are equal to 1.

Begin with C1, the parent genomes are A1 and A2. Form a table as follows:

Genome t B
C1 1.0 x
A1 0.5 1
A2 0.5 1

The diagonal element for (C1,C1) in ${\bf G}$ is equal to 1, which is equal to ${\bf t}'{\bf Bt}$, which is

\begin{displaymath}(1)^{2}x \ + \ (0.5)^{2}(1) \ + \ (0.5)^{2}(1) \ = \ 1, \end{displaymath}

which can be re-arranged and solved for x, giving

\begin{displaymath}x = 1 \ - \ 0.25 \ - \ 0.25 \ = \ 0.5. \end{displaymath}

Similar tables and calculations can be made for C2, D1, and E2. Thus, their diagonal elements of ${\bf B}$ are also equal to 0.5.

For D2, the parent genomes are C1 and C2.

Genome t B
D2 1.0 x
C1 0.5 0.5
C2 0.5 0.5
Now add the parent genomes of C1 and C2, as follows:

Genome t B
D2 1.0 x
C1 0.5 0.5
C2 0.5 0.5
A1 0.25 1
A2 0.25 1
B1 0.25 1
B2 0.25 1

The next step would be to add the 'parents' of A1 and A2, then B1 and B2, but these 'parents' are unknown, and so no further additions to the table are made. Now ${\bf t}'{\bf Bt}$ as

x + (0.5)2(0.5) + (0.5)2(0.5) + 4(0.25)2(1) = 1,

or

\begin{displaymath}x = 1 - 0.125 - 0.125 - 4(0.0625) \ = \ 0.5. \end{displaymath}

So far, the diagonals of ${\bf B}$ have been either 1 or .5. For E1, the parent genomes are D1 and D2. As the animals become younger, the length of these tables can become greater, and with n generations there can be up to 2n+1 elements in the table.

Genome t B
E1 1.0 x
D1 0.5 0.5
D2 0.5 0.5
A1 0.25 1
A2 0.25 1
C1 0.25 0.5
C2 0.25 0.5
A1 0.125 1
A2 0.125 1
B1 0.125 1
B2 0.125 1

Notice that A1 and A2 appear twice in the table, and their coefficients in ${\bf t}$ must be added together before computing ${\bf t}'{\bf Bt}$. The new table, after adding coefficents is

Genome t B
E1 1.0 x
D1 0.5 0.5
D2 0.5 0.5
A1 0.375 1
A2 0.375 1
C1 0.25 0.5
C2 0.25 0.5
B1 0.125 1
B2 0.125 1

Then

x = 1 - 2(0.5)2(0.5) - 2(0.375)2(1) - 2(0.25)2(0.5) - 2(0.125)2(1) = 0.375.

The complete results for the diagonals of ${\bf B}$ are

Animal Genome Parent1 Parent2 ${\bf B}$
A A1 - - 1
A A2 - - 1
B B1 - - 1
B B2 - - 1
C C1 A1 A2 0.5
C C2 B1 B2 0.5
D D1 A1 A2 0.5
D D2 C1 C2 0.5
E E1 D1 D2 0.375
E E2 B1 B2 0.5

The inverse of ${\bf G}$ is

\begin{displaymath}{\bf G}^{-1} = {\bf T}^{-T}{\bf B}^{-1}{\bf T}^{-1}, \end{displaymath}

and as HENDERSON (1975) discovered, the elements in ${\bf T}^{-1}$ are all 1's on the diagonals, and each row has a -0.5 in the columns corresponding to the two parent genomes. All other elements are equal to 0. This structure leads to a simple set of rules for creating the inverse of ${\bf G}$, which can be accomplished by going through the pedigrees in the above table, one genome at a time. Let bi be equal to one over the diagonal of ${\bf B}$for the ith genome, and let p1 and p2 be the parent genomes, then the contributions to the inverse of ${\bf G}$ from this genome would be to add the following values:

  i p1 p2
i bi 0.5bi 0.5bi
p1 0.5bi 0.25bi 0.25bi
p2 0.5bi 0.25bi 0.25bi

Applying these rules, then the complete inverse of ${\bf G}$for the example animals is


\begin{displaymath}{\bf G}^{-1} = \frac{1}{3} \left( \begin{array}{rrrrrrrrrr}
...
...& 0 & -3 & -3 & 0 & 0 & 0 & 0 & 0 & 6 \\
\end{array} \right). \end{displaymath}

Dominance Relationships

Given animals D and E from the previous pedigree and assuming that the diagonals of ${\bf B}$ are computed for all animals, then dominance relationships can be easily computed using the same algorithm as for computing the diagonals of ${\bf B}$. Construct a table of the rows of ${\bf T}$ for D1, D2, E1, and E2, for example, as follows:

  D1 D2 E1 E2 B
A1 .5 .25 .375 0 1
A2 .5 .25 .375 0 1
B1 0 .25 .125 .5 1
B2 0 .25 .125 .5 1
C1 0 .5 .25 0 .5
C2 0 .5 .25 0 .5
D1 1 0 .5 0 .5
D2 0 1 .5 0 .5
E1 0 0 1 0 .375
E2 0 0 0 1 .5

The 2 by 2 block between animals D and E can be calculated by multiplying columns together times the diagonal of ${\bf B}$and summing down the table. For example,


\begin{displaymath}(\mbox{D1,E1}) = (.5)(.375)(1) + (.5)(.375)(1) + 1(.5)(.5) = .625, \end{displaymath}

and likewise,


\begin{eqnarray*}(\mbox{D1,E2}) & = & 0, \\
(\mbox{D2,E1}) & = & 2(.25)(.375)(1...
...= & .625, \\
(\mbox{D2,E2}) & = & 2(.25)(.5)(1) \\
& = & .25.
\end{eqnarray*}


Naturally, one does not need to explicitly make up the above table, and only ancestor parent genomes that are in common need to be involved. Computationally, this algorithm is very fast for one pair of animals, but with 10,000 animals there are over 50 million pairs of animals for which to compute these 2 by 2 blocks, and this could take a lot of time. A short cut would be to group animals by sire and dam pairs so that the above table would be the same for all of their progeny. There could be other strategies for reducing the number of calculations as well.

Once a 2 by 2 block is calculated then the additive and dominance relationships are easily computed. In the above example, aDE = .75, and dDE= .15625. Then the element for the additive by additive genetic effects would be aaDE = (.75)2, and the element for the additive by dominance genetic effects would be adDE = .1171875, and so on. Any number of locus interactions could be considered. All non-zero elements of these matrices need to be stored to a file in full-stored mode and sorted by columns within rows, to be used in the estimation of non-additive genetic effects as described in the next section.

Non-Additive Genetic Models

HENDERSON (1984) described a general non-additive genetic model as

\begin{displaymath}{\bf y} = {\bf Xb} + {\bf Z}({\bf a} + {\bf d} +
{\bf aa} + {\bf ad} + {\bf dd}+ \ldots) + {\bf e}, \end{displaymath}

where ${\bf a}$ refers to additive genetic effects, ${\bf d}$ refers to dominance genetic effects, ${\bf aa}$ refers to additive by additive genetic interactions, ${\bf ad}$ refers to additive by dominance genetic interactions, ${\bf dd}$ refers to dominance by dominance genetic interactions, and there could be higher interactions, if desired. Animals are assumed to be mating randomly. The assumed covariance structure is

\begin{displaymath}Var \left( \begin{array}{c} {\bf a} \\
{\bf d} \\ {\bf aa} \...
...vdots & \vdots & \vdots & \vdots & \ddots
\end{array} \right), \end{displaymath}

where $\otimes$ is the Hadamard product of two matrices, (i.e., element by element products). To simplify the description of the proposed algorithm, consider only ${\bf a}$ and ${\bf d}$ effects in the model. Extension to further non-additive genetic effects should become obvious.

In a non-inbred, random mating population in joint equilibrium, then where ${\bf A}$ is the additive relationship matrix, and ${\bf D}$ is the dominance relationship matrix. Much is known about ${\bf A}$ and its inverse, but construction of ${\bf D}$ and its inverse is not known in terms of easy methods to compute them. The same is true for all higher order epistatic genetic effects and the inverses of their variance-covariance matrices. The purpose of this paper is to show that the inverses of these matrices are not needed explicitly.

The mixed model equations, MME, are

\begin{displaymath}\left( \begin{array}{ccc} {\bf X}'{\bf X} & {\bf X}'
{\bf Z} ...
...y} \\ {\bf Z}'{\bf y} \\
{\bf Z}'{\bf y} \end{array} \right), \end{displaymath}

where $\alpha_{10}$ is the ratio of residual to additive genetic variances, $\alpha_{01}$ is the ratio of residual to dominance genetic variances, and so forth. HENDERSON (1984) showed that subtraction of the second equation from the third gives

\begin{displaymath}{\bf0}\hat{\bf b} \ - \ {\bf A}^{-1}\alpha_{10}
\hat{\bf a} \ + \ {\bf D}^{-1}\alpha_{01} \hat{\bf d}
\ = \ {\bf0}, \end{displaymath}

and that rearrangement of this equation gives

\begin{eqnarray*}({\bf D}^{-1}\hat{\bf d}) & = & {\bf A}^{-1}\hat{\bf a}
\frac{\...
...{\bf A}^{-1}\hat{\bf a}
( \frac{\alpha_{10}}{\alpha_{01}})^{2}.
\end{eqnarray*}


Hence, the inverse of ${\bf D}$ is not needed to solve for $\hat{\bf d}$ or to compute the quadratic form of dominance genetic effects that is necessary for estimation of variances. This same manipulation of equations applies to all of the other epistatic genetic effects as well. The inverses of their variance-covariance matrices are not needed.

Thus, the elements of ${\bf A}$ and ${\bf D}$ can be derived from the routines for the genomic relationship matrix, and elements of ${\bf A}^{-1}$ can be derived as usual. All epistatic effects can be written as a function of the additive genetic effects and the prior variance ratios.

General Computing Strategy

The epistatic genetic covariance matrices should be stored in files (non-zero elements only) in full stored mode and sorted by columns within rows. The number of these files depends on the degree of non-additivity being considered in the model. Set the starting values for $\hat{\bf a}$, $\hat{\bf d}$, $\hat{\bf aa}$, etc., to null vectors to begin the iteration process. The iteration process involves a repetition of the following steps.

1.
Correct the observations for all non-additive genetic effects solutions,

\begin{displaymath}{\bf w} = ({\bf y} - {\bf Z}(\hat{\bf d} + \hat{\bf aa}
+ \ldots )). \end{displaymath}

2.
Solve the following equations,

\begin{displaymath}\left( \begin{array}{cc} {\bf X}'{\bf X} & {\bf X}'{\bf Z} \\...
...ay}{l}
{\bf X}'{\bf w} \\ {\bf Z}'{\bf w} \end{array} \right). \end{displaymath}

3.
Calculate

\begin{eqnarray*}{\bf m} & = & {\bf A}^{-1}\hat{\bf a}
\end{eqnarray*}


then

\begin{eqnarray*}\hat{\bf d} & = & {\bf D}{\bf m} \frac{\alpha_{10}}{\alpha_{01}...
...mes {\bf D}){\bf m} \frac{\alpha_{10}}{\alpha_{02}}, \ldots \\
\end{eqnarray*}


and so on for the epistatic effects being included in the model.

4.
If the quadratic forms are needed, then compute

\begin{displaymath}\begin{array}{c} \hat{\bf d}'{\bf m} \frac{\alpha_{10}}{\alph...
...\bf m} \frac{\alpha_{10}}{\alpha_{02}}, \ldots \\
\end{array} \end{displaymath}

Numerical Example


next up previous

This LaTeX document is available as postscript or asAdobe PDF.

Larry Schaeffer
2002-12-28