This LaTeX document is available as postscript or as Adobe PDF.
Increasing use is being made of animal models in animal breeding. For example, animal models are being applied routinely for genetic evaluation of pigs in Canada and beef cattle in the United States. Animal models for joint bull and cow evaluation of dairy cattle in both the U.S. and Canada have been developed and implemented. Also, animal models are now being used for estimation of genetic parameters.
One of the impediments to application of animal models has been their computational demands. However, development of efficient computing strategies and increased computing power at reasonable cost are making the application of animal models to livestock improvement increasingly practical. In this section we briefly examine some computing strategies for animal models and some actual applications to genetic improvement in field populations.
Prediction of Breeding Values
In prediction of breeding values from data from field populations, the number of equations is usually too large to permit an explicit solution and solutions are obtained by iterative procedures. Gauss-Seidel or a modification to it is frequently used. The usual method is to form the equations for the coefficient matrix on an external storage device, such as disk or tape, and read the coefficient matrix for each cycle of iteration. Usually only non-zero elements of the coefficient matrix are stored. Frequently the order of the equations has been reduced through absorption of some effects or through use of a reduced animal model. These procedures, although efficient in terms of use of computer memory, are time consuming because they must access external storage to read the coefficient matrix with each cycle of iteration.
An alternative computing strategy was described by Schaeffer and Kennedy (1986, 3rd World Cong. Genet. Appl. Livest. Prod. XII:382) which requires reading the data and a pedigree file with each cycle of iteration. In most applications, this requires fewer read operations per cycle of iteration and as a result less computer time for solving the equations. The animal model coefficient matrix equations are not set up explicitly, but the structure of the equations is capitalized upon to adjust observations and breeding value estimates as the data and pedigree files are read. Rather than describe the method in general terms, the method is illustrated here for a specific model and application to pig breeding.
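Consider the model
$$y_{ijk} = h_i + l_{ij} + a_{ijk} + e_{ijk},$$
where $y_{ijk}$ is the record of animal $k$ in litter $j$ and herd-year-season $i$, $h_i$ is a fixed herd-year-season effect, $l_{ij}$ is a random litter effect, $a_{ijk}$ is the random additive genetic value of the animal and $e_{ijk}$ is a random residual.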
We have the following data
| Herd-Year-Season | Litter | Animal | Sire | Dam |
| 1 | 1 | 6 | 1 | 2 |
| 1 | 1 | 7 | 1 | 2 |
| 1 | 1 | 8 | 1 | 2 |
| 2 | 2 | 9 | 1 | 3 |
| 2 | 2 | 10 | 1 | 3 |
| 2 | 3 | 11 | 4 | 5 |
| 2 | 3 | 12 | 4 | 5 |
| Type | H-Y-S | Litter | Animal | Sire or Progeny | Dam or Mate | Record |
| 2 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 1 | 6 | 2 | 0 |
| 3 | 0 | 0 | 1 | 7 | 2 | 0 |
| 3 | 0 | 0 | 1 | 8 | 2 | 0 |
| 3 | 0 | 0 | 1 | 9 | 3 | 0 |
| 3 | 0 | 0 | 1 | 10 | 3 | 0 |
| 2 | 0 | 0 | 2 | 0 | 0 | 0 |
| 3 | 0 | 0 | 2 | 6 | 1 | 0 |
| 3 | 0 | 0 | 2 | 7 | 1 | 0 |
| 3 | 0 | 0 | 2 | 8 | 1 | 0 |
| 2 | 0 | 0 | 3 | 0 | 0 | 0 |
| 3 | 0 | 0 | 3 | 9 | 1 | 0 |
| 3 | 0 | 0 | 3 | 10 | 1 | 0 |
| 2 | 0 | 0 | 4 | 0 | 0 | 0 |
| 3 | 0 | 0 | 4 | 11 | 5 | 0 |
| 3 | 0 | 0 | 4 | 12 | 5 | 0 |
| 2 | 0 | 0 | 5 | 0 | 0 | 0 |
| 3 | 0 | 0 | 5 | 11 | 4 | 0 |
| 3 | 0 | 0 | 5 | 12 | 4 | 0 |
| 1 | 1 | 1 | 6 | 1 | 2 | 150 |
| 1 | 1 | 1 | 7 | 1 | 2 | 144 |
| 1 | 1 | 1 | 8 | 1 | 2 | 156 |
| 1 | 2 | 2 | 9 | 1 | 3 | 148 |
| 1 | 2 | 2 | 10 | 1 | 3 | 145 |
| 1 | 2 | 3 | 11 | 4 | 5 | 140 |
| 1 | 2 | 3 | 12 | 4 | 5 | 151 |
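Let $\hat{h}$, $\hat{l}$ and $\hat{a}$ be vectors of solutions for herd-year-seasons, litters and animals, respectively. Initially, all solutions are set to zero. Solutions after the first cycle of iteration and at convergence are
| Solution | Cycle 1 | Convergence |
| $\hat{h}_1$ | 150 | 149.96 |
| $\hat{h}_2$ | 146 | 146.00 |
| $\hat{l}_1$ | 0 | 0 |
| $\hat{l}_2$ | .2 | .154 |
| $\hat{l}_3$ | -.2 | -.154 |
| $\hat{a}_1$ | 0 | .077 |
| $\hat{a}_2$ | 0 | 0 |
| $\hat{a}_3$ | 0 | .077 |
| $\hat{a}_4$ | 0 | -.077 |
| $\hat{a}_5$ | 0 | -.077 |
| $\hat{a}_6$ | 0 | .039 |
| $\hat{a}_7$ | -.857 | -.819 |
| $\hat{a}_8$ | .857 | .896 |
| $\hat{a}_9$ | .257 | .330 |
| $\hat{a}_{10}$ | -.171 | -.099 |
| $\hat{a}_{11}$ | -.829 | -.901 |
| $\hat{a}_{12}$ | .743 | .670 |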
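The iteration for the seven-record example can be sketched in Python as follows. This is only a minimal illustration, not the program of Schaeffer and Kennedy (1986): it loops over in-memory lists rather than reading data and pedigree files, and the two variance ratios of 3 are assumptions inferred from the Cycle 1 values (for example, -6/7 = -.857 for animal 7 and 1/5 = .2 for litter 2). The names `RECORDS`, `K_LITTER`, `K_ANIMAL` and `iterate` are hypothetical.

```python
# Records of the example: (herd-year-season, litter, animal, sire, dam, record).
RECORDS = [
    (1, 1, 6, 1, 2, 150.0),
    (1, 1, 7, 1, 2, 144.0),
    (1, 1, 8, 1, 2, 156.0),
    (2, 2, 9, 1, 3, 148.0),
    (2, 2, 10, 1, 3, 145.0),
    (2, 3, 11, 4, 5, 140.0),
    (2, 3, 12, 4, 5, 151.0),
]
K_LITTER = 3.0   # assumed ratio: residual variance / litter variance
K_ANIMAL = 3.0   # assumed ratio: residual variance / additive genetic variance
N_ANIMALS = 12


def iterate(n_cycles):
    """Gauss-Seidel cycles: herd-year-seasons, then litters, then animals 1..12."""
    h = {1: 0.0, 2: 0.0}
    lit = {1: 0.0, 2: 0.0, 3: 0.0}
    a = {i: 0.0 for i in range(1, N_ANIMALS + 1)}
    parents = {rec[2]: (rec[3], rec[4]) for rec in RECORDS}
    for _ in range(n_cycles):
        # Fixed herd-year-season effects: mean of the adjusted records.
        for i in h:
            adj = [y - lit[ltr] - a[an]
                   for (hys, ltr, an, _s, _d, y) in RECORDS if hys == i]
            h[i] = sum(adj) / len(adj)
        # Random litter effects: adjusted sum shrunk by the variance ratio.
        for j in lit:
            adj = [y - h[hys] - a[an]
                   for (hys, ltr, an, _s, _d, y) in RECORDS if ltr == j]
            lit[j] = sum(adj) / (len(adj) + K_LITTER)
        # Animal effects: own record, parents and progeny contributions.
        for an in range(1, N_ANIMALS + 1):
            diag, rhs = 0.0, 0.0
            for (hys, ltr, rec_an, _s, _d, y) in RECORDS:
                if rec_an == an:
                    diag += 1.0
                    rhs += y - h[hys] - lit[ltr]
            if an in parents:            # both parents known
                sire, dam = parents[an]
                diag += 2.0 * K_ANIMAL
                rhs += K_ANIMAL * (a[sire] + a[dam])
            else:                        # base animal
                diag += K_ANIMAL
            for prog, (sire, dam) in parents.items():
                if an in (sire, dam):
                    mate = dam if an == sire else sire
                    diag += 0.5 * K_ANIMAL
                    rhs += K_ANIMAL * (a[prog] - 0.5 * a[mate])
            a[an] = rhs / diag
    return h, lit, a
```

Calling `iterate(1)` gives the first-cycle solutions, and a large number of cycles approaches the converged values; parents come before progeny in the animal loop, which is why the parents' solutions are still zero after the first cycle.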
Schaeffer and Kennedy (1986) compared this procedure of iterating on the data with genetic evaluation through the usual process of storing the equations for the coefficient matrix on disk. A reduced animal model was used for the latter. The data were 86,385 growth and backfat records on pigs. Iterating on the data took only 43% as much computing time as the usual method for 120 cycles of iteration. Furthermore, iterating on the data took fewer cycles of iteration to reach given convergence criteria. The procedure can also be applied to reduced animal models, multiple trait evaluation and variance component estimation. Additional details are in Schaeffer and Kennedy (1986) and Schaeffer and Wilton (1987, Mimeo, Univ. Guelph).
Estimation of Variance Components
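Variance component estimation under an animal model can be very
demanding computationally. For example, obtaining REML estimates of
$\sigma^2_a$ and $\sigma^2_e$ requires iterating on equations that involve
the inverse of the mixed model coefficient matrix.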
Use of An Equivalent Model
Most variance component estimation programs assume that the random
effects are independent of each other, e.g. $Var(u) = I\sigma^2_u$.
Of course with the animal model, $Var(a) = A\sigma^2_a$.
Meyer (1987, J. Anim. Breed. Genet. 104:163)
suggested use of a transformation of $a$, $a^*$,
such that $Var(a^*) = I\sigma^2_a$.
Recall that $A = TDT'$ where $T$ is lower triangular and $D$ is a diagonal matrix.
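The factorization of the relationship matrix into a triangular matrix and a diagonal matrix can be checked numerically. The sketch below assumes the usual form $A = TDT'$, with $T$ tracing the flow of genes from parent to progeny and $D$ holding the Mendelian sampling variances, and it assumes that no parent is inbred. The four-animal pedigree and all names are hypothetical and are not taken from the text.

```python
import math

# Hypothetical pedigree: animal -> (sire, dam); 0 marks an unknown parent.
PEDIGREE = {1: (0, 0), 2: (0, 0), 3: (1, 2), 4: (1, 2)}


def build_t_and_d(ped):
    """Return T (lower triangular) and the diagonal of D, assuming non-inbred parents."""
    n = len(ped)
    t = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for i in range(1, n + 1):
        known = [p for p in ped[i] if p]
        t[i - 1][i - 1] = 1.0
        # Each row of T is half the sum of the parents' rows, plus 1 on the diagonal.
        for p in known:
            for j in range(n):
                t[i - 1][j] += 0.5 * t[p - 1][j]
        # Mendelian sampling variance: 1, .75 or .5 for 0, 1 or 2 known parents.
        d[i - 1] = 1.0 - 0.25 * len(known)
    return t, d


def relationship_from_factors(t, d):
    """Form A = (T D^{1/2})(T D^{1/2})' from the two factors."""
    n = len(d)
    half = [[t[i][j] * math.sqrt(d[j]) for j in range(n)] for i in range(n)]
    return [[sum(half[i][k] * half[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


T, D = build_t_and_d(PEDIGREE)
A = relationship_from_factors(T, D)
```

For this pedigree animals 3 and 4 are full sibs, so the product recovers a relationship of .5 between them and of .5 between each of them and either parent.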
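This can also be expressed as $A = (TQ)(TQ)'$ where $Q$ is diagonal and, if $q_i$ is
the $i$th diagonal element of $Q$ and $w_i$ is the $i$th diagonal
element of $D$, $q_i = \sqrt{w_i}$ for all $i$. Let $a^* = (TQ)^{-1}a$; then $Var(a^*) = I\sigma^2_a$.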
Reduced Animal Model
Just as use of a gametic or reduced animal model can reduce the order of the coefficient matrix for prediction of breeding values, it can also do so for estimation of variance components. With a reduced animal model, if fixed effects are absorbed, the order of equations to be inverted is reduced to the number of parents.
Illustrations of variance component estimation under reduced animal models by MIVQUE and REML are in Sorensen and Kennedy (1986, J. Anim. Sci. 68:245) and Henderson (1986, J. Dairy Sci. 69:1394).
Avoiding the Inverse of the Coefficient Matrix
The limiting factor to applications of animal models for variance component estimation has been obtaining the inverse of the coefficient matrix, whose order is the number of animals or parents. However, recently Graser et al. (1987, J. Anim. Sci. 64:1362) have presented a derivative-free REML algorithm for animal models that does not require direct inversion. Their procedure is applicable to relatively large data sets.
The model is
Solution of
is
The procedures can be extended to models with more than one random
effect, and a general purpose program DFREML for a variety of animal models
has been written by K. Meyer. Southwood et al. (1989, J. Dairy Sci. 72:3006) have applied
this procedure to simulated data under cytoplasmic ($\sigma^2_c$) and
additive ($\sigma^2_a$) inheritance for records on 5000 cows. Results
were
| Parameter | True | Estimated |
| $\sigma^2_a$ | .30 | .37 |
| $\sigma^2_c$ | .31 | .34 |
| $\sigma^2_e$ | .71 | .69 |
| $h^2$ | .23 | .26 |
| $c^2$ | .23 | .24 |
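The two ratios in the last rows of the table follow from the variance components above them. The short check below takes the first three rows of the results table as the additive, cytoplasmic and residual variances, in that order, and assumes that the phenotypic variance is their sum; both points are inferences from the reported ratios rather than statements from Southwood et al. (1989). The function name is hypothetical.

```python
def variance_ratios(var_additive, var_cytoplasmic, var_residual):
    """Return (h2, c2) as fractions of the summed variance components."""
    total = var_additive + var_cytoplasmic + var_residual
    return var_additive / total, var_cytoplasmic / total


h2_true, c2_true = variance_ratios(0.30, 0.31, 0.71)
h2_est, c2_est = variance_ratios(0.37, 0.34, 0.69)
print(round(h2_true, 2), round(c2_true, 2), round(h2_est, 2), round(c2_est, 2))
```

Rounded to two decimals, these reproduce the tabulated ratios, which is what fixes the order of the three variance rows.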
Application to Joint Bull and Cow Evaluation
Animal models lend themselves to joint bull and cow evaluation, but this can
be difficult computationally. If we assume that repeated records on a cow
are the same trait genetically, the model is
![]() |
(1) |
Cow Evaluation by Intra-Herd Animal Model
Henderson (1975, J. Dairy Sci. 58:1910) proposed an intra-herd animal
model for cow evaluation. Essentially the model assumes that
The animal model equations are set up for a single herd as in (6.1), but
additions are made to diagonal elements of the sire equations and the
corresponding right hand sides. For each sire, add
$n(1-r)/(4-h^2)$ to the diagonal of
and
to the right hand side,
where
is the estimated breeding value of the sire from the
across herd genetic evaluation system. The value n is the ``effective
number of daughters'' for the sire from the across herd system and is
computed from the accuracy of sire evaluation as
One cautionary note: the intra-herd system assumes that base
population sires and cows were sampled from a common population and that
the sire's estimated breeding value is expressed relative to the mean of that population. This
might not be the case. An expedient solution would be to adjust the
estimated breeding value of the sire to the genetic base of the herd,
if possible. If one then wanted to choose among cows in different
herds, the intra-herd evaluations would have to be adjusted again to a
common base. This might be difficult operationally.
The intra-herd model also assumes that unknown ancestors of new cows entering the herd are also from the same base population as the herd. If this is not the case, there is need for some form of grouping to accommodate this. Grouping strategies are considered in the next section. This is difficult for an intra-herd animal model because group size is small.
Inter-Herd Animal Model
Inter-herd animal models for joint bull and cow evaluation nationally have been developed and implemented (Wiggans et al., 1988, J. Dairy Sci. 71, Suppl. 2:54; Robinson and Chesnais, 1988, J. Dairy Sci. 71 Suppl. 2:70). The systems make use of all known relationships among animals and use the computing strategy of Schaeffer and Kennedy (1986).
Previously, joint bull and cow evaluation using an animal model has
been done by Westell and Van Vleck (1987, J. Dairy Sci. 70:1006) on data
from New York State. They pointed out that proper grouping to account for
the fact that unidentified ancestors were not sampled from the same
population was important. They used Thompson's (1979, Biometrics 35:339)
model as a grouping strategy. Let
Use of phantom groups is illustrated for a simpler situation. Consider the following pedigree
| 2 | 1 | 3 |
| 4 | 5 |
| 6 |
| Animal | Sire | Dam |
| 1 | P1 | P2 |
| 2 | P3 | P4 |
| 3 | P5 | P6 |
| 4 | 1 | 2 |
| 5 | 1 | 3 |
| 6 | 5 | 4 |
| Animal | Sire | Dam |
| 1 | G1 | G2 |
| 2 | G1 | G2 |
| 3 | G1 | G2 |
| 4 | 1 | 2 |
| 5 | 1 | 3 |
| 6 | 5 | 4 |
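For the example pedigree, the inverse of the relationship matrix that enters the animal model equations can be built directly from the pedigree list. The sketch below is an illustration only: it treats animals 1 to 3 as ordinary base animals (unknown parents coded 0), so it ignores the phantom parent groups, and it uses the tabular method plus the usual parent-progeny rules for the inverse. All function names are hypothetical.

```python
# Example pedigree: animal -> (sire, dam); 0 marks an unknown parent.
PEDIGREE = {1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (1, 2), 5: (1, 3), 6: (5, 4)}


def relationship_matrix(ped):
    """Tabular method; parents must be numbered before their progeny."""
    n = len(ped)
    a = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        sire, dam = ped[i]
        for j in range(1, i):
            value = sum(0.5 * a[j - 1][p - 1] for p in (sire, dam) if p)
            a[i - 1][j - 1] = a[j - 1][i - 1] = value
        inbreeding = 0.5 * a[sire - 1][dam - 1] if sire and dam else 0.0
        a[i - 1][i - 1] = 1.0 + inbreeding
    return a


def relationship_inverse(ped, a):
    """Accumulate the inverse one animal at a time from its known parents."""
    n = len(ped)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        known = [p for p in ped[i] if p]
        # Mendelian sampling variance, using the parents' diagonals of A.
        b = 1.0 - 0.25 * sum(a[p - 1][p - 1] for p in known)
        alpha = 1.0 / b
        inv[i - 1][i - 1] += alpha
        for p in known:
            inv[i - 1][p - 1] -= alpha / 2.0
            inv[p - 1][i - 1] -= alpha / 2.0
            for q in known:
                inv[p - 1][q - 1] += alpha / 4.0
    return inv


def max_identity_error(a, inv):
    """Largest absolute deviation of A times its inverse from the identity."""
    n = len(a)
    return max(
        abs(sum(a[i][k] * inv[k][j] for k in range(n)) - (1.0 if i == j else 0.0))
        for i in range(n) for j in range(n)
    )


A = relationship_matrix(PEDIGREE)
A_INV = relationship_inverse(PEDIGREE, A)
```

Animal 6 is the progeny of half sibs 5 and 4, so its diagonal in `A` exceeds 1, but because neither parent is inbred each animal with two known parents still contributes the familiar 2, -1 and .5 to the inverse.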
The usual animal model equations are
For the example pedigree
Considerable attention has been paid to methods for detecting genes with major effect, but little work has been done on genetic evaluation for traits influenced by major genes. A number of single genes, inherited in a simple Mendelian manner, that can be screened for and identified in the animal have been shown to be associated with quantitative traits of economic importance. However, most of these quantitative traits are likely influenced also by many other genes that have smaller individual effects but large aggregate effects on the trait. Under certain population and breeding structures, these polygenic effects can be confounded with effects of the single gene under investigation, even in the absence of linkage. In the analysis of data to examine effects of single genes on quantitative traits, it is important to disentangle this confounding to obtain unbiased estimates of single gene effects and valid significance levels for tests of hypotheses about these single gene effects. Most studies to date have used least squares analyses to measure single gene effects and have ignored some or all of the effects of the background polygenes.
In this chapter we consider two situations: one where observations on animals can be classified as to genotype without error, and one where there is error or uncertainty in the classification.
Observations Classified as to Genotype Without Error
If animals upon which observations are available can be readily
classified as to whether or not they possess the major gene, BLUE of the
effect of genotype at the major gene locus and BLUP of additive genetic
merit
of animals for the polygenes
influencing the trait can be obtained by treating the major gene or
transgene as a fixed effect assuming the gene is expressed equally in all
animals. If the effect of the major gene on the trait of
concern is additive or incompletely dominant, then heterozygotes
need to be distinguished from homozygotes.
In this section we consider both randomly mated and selected populations and show where use of an individual animal model provides unbiased estimates and exact tests of hypotheses of single gene effects and where use of ordinary least squares (OLS) analysis, the usual method of analysis, does not.
Methodology
Consider the simple genetic model
$$y_{ij} = g_i + a_{ij} + e_{ij} \qquad (7.1)$$
In matrix notation, the model is
$$y = Xg + Za + e \qquad (7.2)$$
with
$$Var(a) = A\sigma^2_a, \quad Var(e) = I\sigma^2_e \qquad (7.3)$$
Mixed Model Method
Estimation of effects of single locus genotype under (7.2) and (7.3) are
obtained as follows:
![]() |
= | ![]() |
|
| = | ![]() |
(5) |
| (6) |
The null hypothesis
can be tested from
where
and
Ordinary Least Squares
Frequently, in analysis of data for single gene effects an
operational model of
However, with ordinary least squares, Var
is (incorrectly)
assumed for computation purposes to be
where
and
the correlated error structure from
is ignored.
Under ordinary least squares, estimation of genotype effects is from
| (7) |
Random Mating
We now consider properties of the mixed model and ordinary least squares estimates under random mating.
Mixed Model
Under random mating,
and it is well known
that (given
),
Also, for
Ordinary Least Squares
With analysis by OLS,
This also has ramifications for hypothesis tests involving
.
For
Therefore, the expectation of
,
again assuming for
simplicity that each individual has a single record, is
Now we examine Q, the numerator of the F-test. Under
,
Similarly, the test that the heterozygous genotype is equal to the average
of the homozygous genotypes (i.e. dominance = 0) can be constructed through
.
With this hypothesis, it can be shown
under the null hypothesis that
Selection
Again for simplicity assume that the single locus has two allelic
forms and three possible genotypes. Assume also that at generation 0
the population is in Hardy-Weinberg and linkage equilibrium and let genotypes
g1, g2 and g3 have values of
,
and
and frequencies of $p_0^2$,
$2p_0q_0$ and $q_0^2$,
respectively. If
represents phenotypic observations then,
before selection,
For the jth selected individual of genotype i
| (8) |
| (9) |
| (10) |
We now consider the effects of this selection on estimates of
from
mixed model methods and OLS.
Mixed Model Methods
Properties of mixed model methods when selection has occurred are not fully
understood. Use of $A$ in (7.4) accommodates the change in variance
due to gametic disequilibrium. Henderson
(1975) has shown that use of the usual mixed model equations (7.4), ignoring
selection, yields BLUE of fixed effects and BLUP of random effects if
selection is on a linear function of the data, which he designated as
$L'y$; $a$ and $e$ are multivariate normal, which
implies an infinitesimal model for the polygenes; the variances of $a$ and $e$
are known; and $L'X = 0$.
It is this last
restriction that is most troublesome. Henderson has interpreted
this to imply that selection must be within levels of fixed effects.
If we assume the effect of the single locus genotype to be fixed,
the selection process envisioned here clearly is not within genotype at
the single genetic locus and seemingly does not fit Henderson's requirement
that $L'X = 0$.
With repeated sampling, however, $X$
is not fixed and will vary from sample to sample. In our sampling
scheme, so will $L'X$.
It seems intuitively reasonable that, if the data used for analyses
contain all the observations used in the selection decision process, use
of (7.4) will provide unbiased estimates of the genotype effects
over repeated
sampling, even though selection is directly on the observations
which include the contributions of the effects of genotype at the single
genetic locus. However, if data upon which selection was practiced are not
included, some bias will result.
Ordinary Least Squares
We now consider properties of OLS estimates of
under selection.
First consider the situation where only data from the progeny of selected
parents are used (generation 1). From (7.7) and (7.9) it is easy to show
that the estimate of
as obtained from
The bias is directly proportional to heritability of the polygenes. Under an additive model the bias depends mostly upon selection intensity, and increases with increasing intensity. With complete dominance, the bias is greatest for small initial frequencies of the favorable gene and tends to zero at high gene frequencies for all intensities of selection. With a recessive model the opposite tends to occur.
If
is estimated as
If data are pooled over generations 0 and 1, and all data of generation 0
are included, the situation is more complicated. The OLS estimate of $g_1$ is
Similarly, the expected value of the OLS estimate of $g_3$ is
| (11) |
The estimate of
requires an estimate of g2. The OLS estimate
of g2 is
| = | |||
| - p0p1(p0rv1 + q0sv2)/2(p02 + p12) | |||
| - q0q1(p0sv2 + q0tv3)/2(q02 + q12) | |||
| = | ![]() |
||
![]() |
|||
![]() |
(12) |
Bias in
is influenced greatly by selection and is generally
positive with intense selection. At low selection intensities
bias in
is small. Under an additive model at very high
selection intensity, bias can be greater than the size of the true effect
but decreases proportionally as the magnitude of the true effect increases.
Bias is slightly higher at high gene frequency than at low. With dominance
and high intensities of selection, bias is more dependent on gene frequency
and peaks at frequencies of the desirable gene of between .1 and .2, and
then drops off quickly as gene frequency increases. With recessive gene
action, bias tends to increase with gene frequency.
As with
,
bias in
is small with weak
selection. With intense selection, direction of the bias depends very much
on mode of gene action and gene frequency. Under an additive model,
large positive biases in
occur at low gene frequency and large
negative biases occur at high gene frequency if selection is very intense.
Otherwise, bias in
is relatively small.
With complete dominance, bias in
is largely positive, can
be high at low gene frequency if selection intensity is high, but tends to
zero at intermediate and high gene frequencies. However, with a recessive
model, large negative biases in
can occur at high gene
frequency and intense selection, with small positive biases at low gene
frequencies. In contrast to the situation in estimation of
,
bias in
tends to be proportionally less for genes of moderate
effect than genes of large effect.
The preceding is based on the assumption that there is an effect of the
single gene (i.e.
). If
,
then
v1
= v2 = v3, r = s = t,
p1 = p0 and
q1 = q0. If we
let
v* = rv1 = sv2 = tv3 then it is simple to show that
(7.7), (7.8) and (7.9), respectively, reduce to
E(y11) = E(y21) =
E(y31) = v*, that is the expected value of an observation on
each genotype at generation 1 is simply the response to selection for
the polygenes. As a result,
of (7.10) reduces to
and
of (7.11) reduces to
,
that is there is
no bias in the least squares estimates of
and
whether
data from both generations are used or data only from generation 1 are
used. Although
and
are unbiased when
true
,
the likelihood of finding spurious ``significant''
effects of
and
increases with selection.
The following simulation results illustrate this.
Records of animals were simulated over five generations for a population
size of 40 (20 males and 20 females). The trait was controlled by a
single locus with two alleles (p=q=0.5 in the base population) as well
as by polygenes. For the polygenes, $\sigma^2_a$ was 1 as was the
environmental variance ($\sigma^2_e$) (i.e. $h^2 = 0.5$). Data were
analyzed by an animal model according to (7.2) and by least squares (7.6)
ignoring the polygenic effects in (7.2).
In the first simulation, five of 20 males were randomly selected for breeding each
generation and there was no effect of the single gene (i.e. a=0 and d=0).
The results are summarized in the following table.
Estimates of the additive effect of a single gene (a), average F-ratios to test the null hypothesis Ho:a=0 and frequency of rejection of the null hypothesis using an animal model (AM) and ordinary least squares (OLS) when selection is at random and there is no effect of the single gene (a=0).
| | Estimate of a | | F-Ratio | | Freq. Rejection | |
| Gen. | AM | OLS | AM | OLS | AM | OLS |
| 0 | -.01 | -.01 | 1.19 | 1.19 | .08 | .08 |
| 3 | -.01 | -.01 | 1.08 | | .06 | |
| 5 | -.00 | -.01 | 1.09 | | .05 | |
When selection is at random, both the animal model and least squares gave unbiased estimates of a at all generations. However, as time progressed, least squares produced inflated F-ratios and the null hypothesis Ho:a=0(which was true) was rejected more frequently than the stated type I error rate (P=.05). The animal model rejected the null hypothesis at the specified error rate. In other words, use of least squares which ignores the background polygenes will likely find ``significant'' single gene effects which are not real. Use of an animal model will not.
With selection of the best five of 20 males on phenotype the situation
is much worse as illustrated in the following table.
Estimates of the additive effect of a single gene (a), average F-ratios to test the null hypothesis Ho:a=0 and frequency of rejection of the null hypothesis using an animal model (AM) and ordinary least squares (OLS) under selection when there is no effect of the single gene (a=0).
| | Estimate of a | | F-Ratio | | Freq. Rejection | |
| Gen. | AM | OLS | AM | OLS | AM | OLS |
| 0 | -.01 | -.01 | 1.19 | 1.19 | .08 | .08 |
| 3 | -.01 | -.04 | | | .05 | |
| 5 | -.00 | -.01 | | | .05 | .31 |
Although least squares is still unbiased, the probability of finding significant results when there was no effect of the single gene was very high and was 0.31 at generation 5. Again the animal model worked well.
Also with selection, if there is a real effect of the single gene its
effect will be overestimated by least squares but an animal model will
estimate it without bias. In the following simulation, there was an effect
of the single gene (a=1 and d=1) and selection was of the best five of 20
males on phenotype.
Estimates of the additive (a) and dominance (d) effects of a single locus using an animal model (AM) and least squares (LS) under selection with complete dominance and a true effect of the single gene of a=1 and d=1.
| | a | | d | |
| Gen. | AM | OLS | AM | OLS |
| 0 | .99 | 1.03 | 1.03 | |
| 3 | 1.00 | 1.09* | 1.02 | 1.04 |
| 5 | 1.00 | 1.20* | 1.04 | 1.04 |
The animal model gave unbiased estimates of both a and d. Although least squares gave unbiased estimates of d, by generation 5 it overestimated a by 20%.
Observations Classified as to Genotype With Error
If it is not practical or feasible to classify the individual as to major
genotype then one could resort to the usual genetic
evaluation system whereby the major gene is simply considered
as one of the many genes influencing the trait, and its effect is included
as part of the additive genetic value.
This, however, is not too satisfactory if the gene has
a large effect. Multivariate normality could no longer be
assumed legitimately, even prior to selection. The greater the effect of
the major gene, the greater will be the departures from
normality.
An alternative worth investigating is to consider the observations as arising from a mixture of two or more normally distributed populations. Mixture models, which are distinct from mixed models, were first considered by Pearson (1894) who used the method of moments to estimate parameters of a mixture of two normal densities; the means and variances of each population and the proportion of each population in the mixture. Maximum likelihood procedures have been developed for mixture models. Their application is demanding computationally, perhaps too demanding for application to large data sets but computational requirements are less if a common variance can be assumed for the populations. Applications have been made in the field of human genetics to distinguish between the effects of major genes and polygenes and these have been extended to large animal populations.
In the case of a major gene with two alleles, a
mixture of two (complete dominance) or three (incomplete dominance)
populations, each normally distributed, could be hypothesized. Evaluation
could be according to the model
It is possible to estimate the effects of genotype through an appropriate regression of phenotype on genotype probabilities. This regression is complicated by the fact that genotype status is calculated as a set of probabilities, and is therefore measured with error. In the absence of error, each probability is a certainty with a value of either 0 or 1 and we have the usual fixed incidence matrix. With probabilities of less than one we must account for the error.
The problem is most simply illustrated for just two genotypic classes, as in a haploid organism. The simple linear regression of phenotype on the probability of belonging to the better genotype class overestimates the genotype effect. This section, from Kinghorn et al. (1993), derives an appropriate correction, developed from the ideas of Cochran (1968).
Consider the regression of phenotype (y) on probability of belonging to
the better of the two genotype classes (X). Then
The objective is to estimate
given y and X. This can be done
by first evaluating
For diploid organisms with two sites per locus and two alleles segregating
(one of these being a putative ``major gene'') there are two degrees of
freedom for estimation of genotypic effects. These are taken as the effect
of carrying one copy
and the effect of carrying two copies
of the major gene. The effect of carrying no copies is thus
part of the regression intercept.
The ``errored'' model is now
![]() |
(13) |
| (14) |
| = | |||
| = | |||
| = | |||
| = | (15) |
Thus
| = | |||
| = | (16) |
Fitting Polygenic Effects
Consider the model
| (17) |
Substituting
leads to the mixed model equation
![]() |
(18) |
The procedure requires data on the trait of interest plus identity of sire
and dam. Data involving missing pedigree information and missing
recordings are permitted for genotype probability estimation, and can be
used if a sufficiently flexible mixed model method is adopted for fitting
polygenic and fixed effects. A prior value for the frequency of the
putative major gene is required and starting values for the effects of one
and two copies of the major gene (
and
)
are required. These are changed by the analysis and converge to the true
values under ideal conditions.
A prior estimate of heritability is required for the calculation of
.
The first action is to calculate genotype probabilities given starting
values of
and
.
The heights of the
normal distributions at the ith individual's phenotype corrected for
estimated breeding value are, in arbitrary units,
Without using information from relatives, the probabilities for individual i of carrying j copies of the major gene are thus
For $j = 2$: Prob 2 $= p^2 h_{i2}/(p^2 h_{i2} + 2pq\,h_{i1} + q^2 h_{i0})$.
For $j = 1$: Prob 1 $= 2pq\,h_{i1}/(p^2 h_{i2} + 2pq\,h_{i1} + q^2 h_{i0})$.
For $j = 0$: Prob 0 $= q^2 h_{i0}/(p^2 h_{i2} + 2pq\,h_{i1} + q^2 h_{i0})$.
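The three probabilities can be computed as in the sketch below. The formula for the heights is an assumption: each height is taken as the unnormalized normal density of the corrected phenotype around the mean of genotype $j$, with a common residual variance for the three genotypes, so that the normalizing constant cancels ("arbitrary units"). The function and argument names are hypothetical.

```python
import math


def genotype_probabilities(y_corrected, p, effects, residual_var):
    """Return [Prob 0, Prob 1, Prob 2] for one individual.

    y_corrected  : phenotype corrected for estimated breeding value
    p            : frequency of the major gene
    effects      : means of genotypes with 0, 1 and 2 copies of the major gene
    residual_var : common within-genotype variance (assumed equal for all three)
    """
    q = 1.0 - p
    # Hardy-Weinberg prior frequencies for 0, 1 and 2 copies.
    priors = [q * q, 2.0 * p * q, p * p]
    # Heights of the normal curves, in arbitrary units.
    heights = [math.exp(-0.5 * (y_corrected - m) ** 2 / residual_var)
               for m in effects]
    weighted = [f * h for f, h in zip(priors, heights)]
    total = sum(weighted)
    return [w / total for w in weighted]


probs = genotype_probabilities(2.0, 0.25, (0.0, 1.0, 2.0), 0.5)
flat = genotype_probabilities(2.0, 0.25, (0.0, 0.0, 0.0), 0.5)
```

When the three genotype means are equal the heights cancel and the probabilities fall back to the Hardy-Weinberg frequencies, which is a convenient sanity check on any implementation.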
The next step is to fit (7.16) and (7.17) with
.
The
residual variance is
The elements of matrix
are calculated from genotype probabilities
as follows
Hoeschele (1988) has given ideas
similar to those presented to provide joint estimates of major gene effects,
gene frequencies and predictors of additive effects of polygenes. As
before the model is
The elements of
are the posterior probabilities conditional on
the data that the ith individual has major genotype j assuming that
,
and
are known. Approximations are presented
to estimate
.
The procedure is iterative starting with initial values of
,
,
and
and terminating when
,
,
and
equal
,
,
and
within some specified level of convergence. If
is not known, Hoeschele (1988) presented computing algorithms for
estimating
and
.
The methods of Hoeschele (1988) and Kinghorn et al. (1993) use a single estimate of polygenic breeding value for each animal irrespective of its genotype, and Hofer and Kennedy (1993) extended this to use three values for each animal depending on its genotype but independent of the genotypes of all other animals. Hofer and Kennedy (1993) compared their method to those of Hoeschele (1988) and Kinghorn et al. (1993) by simulation.
Phenotypic observations were generated by using the following mixed model
Three different sets of parameters were used. Only additive effects
of the major locus were considered although all of the methods compared allow
for dominance. In the first set of parameters 50% of the phenotypic variance
(variance due to major locus + polygenic variance + residual variance) was
due to genetic effects, 75% of the genetic variance was due to the major locus
and 25% was due to the polygenes. The frequency of allele A with major
effect was 25% in the base population, which resulted in an allele substitution
effect of 1.0, i.e. genotype effects of 2.0 (AA), 1.0 (Aa) and 0
(aa). In parameter set 2 the allele frequency p was .5, but the genotype
effects as well as all other parameters were the same as in set 1. Thus
the variance due to the major locus was increased from .375 to .5, and the
phenotypic variance changed from 1.0 to 1.125. In parameter set 3 the
allele frequency p was .25 and 50% of the phenotypic variance was due to
genetic effects, as in parameter set 1, but the proportion of genetic variance
due to the polygenes was increased from 25% to 40% which resulted in
an allele substitution effect of $\sqrt{.3/.375} = .894$.
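The variances quoted for the three parameter sets can be checked with a few lines of arithmetic. The sketch assumes a purely additive biallelic locus in Hardy-Weinberg equilibrium, so that the variance due to the major locus is $2pq$ times the squared allele substitution effect, and it takes the polygenic variance of .125 and residual variance of .5 implied by parameter set 1. The function and variable names are hypothetical.

```python
import math


def major_locus_variance(p, substitution_effect):
    """Additive variance of a biallelic locus in Hardy-Weinberg equilibrium."""
    return 2.0 * p * (1.0 - p) * substitution_effect ** 2


# Parameter sets 1 and 2: substitution effect 1.0, p = .25 and p = .5.
var_set1 = major_locus_variance(0.25, 1.0)
var_set2 = major_locus_variance(0.50, 1.0)

# Polygenic and residual variances implied by set 1 (phenotypic variance 1.0).
POLYGENIC_VAR = 0.125
RESIDUAL_VAR = 0.5
phenotypic_set1 = var_set1 + POLYGENIC_VAR + RESIDUAL_VAR
phenotypic_set2 = var_set2 + POLYGENIC_VAR + RESIDUAL_VAR

# Parameter set 3: genetic variance .5, of which 60% is due to the major locus.
target_var_set3 = 0.6 * 0.5
effect_set3 = math.sqrt(target_var_set3 / (2.0 * 0.25 * 0.75))
```

The results are .375 and .5 for the major locus variance in sets 1 and 2, phenotypic variances of 1.0 and 1.125, and a substitution effect of about .894 in set 3.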
In each of 10 herds, 20 base dams each had a record in year 1. A group of 20 base sires each with their own record in a common herd-year (e.g. test station) was mated to these base dams. Each sire was randomly mated to one dam in each herd. Each mating produced 5 progeny in year 2.
Table 7.1 shows the simulation results for the three parameter sets using all
three procedures when major locus genotypes were unknown. For parameter
sets 1 and 2 estimates of major locus effects
were close to the
true values or slightly underestimated with approximated maximum likelihood
(AML) of Hofer and Kennedy (1993), underestimated by about
20% with the method of Hoeschele (1988)
and overestimated by 25 to 30% with the method of Kinghorn et al. (1993).
For parameter set 3 estimates of major locus effects
were zero for
2 replicates using AML and for 21 replicates using the method of Hoeschele
(1988). Non-zero estimates of
were biased upwards with AML by 14%
and with the method of Kinghorn et al. (1993) by 47%. Both AML and the
method of Hoeschele (1988) showed a large variability of the non-zero
estimates of major locus effects for parameter set 3. When the true allele
frequency was 0.25 the allele frequency p was substantially underestimated
with AML, but estimated quite well with the two other methods. Correlations
between true and predicted breeding values were similar for AML and the
method of Hoeschele (1988), but zero for the method of Kinghorn et al.
(1993). For parameter sets 1 and 2 the correlations between true
and estimated major locus effects were
similar for all three methods. When major locus effects were smaller
(parameter set 3) these correlations were largest with the method of
Kinghorn et al. (1993). Predicted breeding values were positively
correlated to estimated major locus effects
with AML and to a larger extent with the method of Hoeschele (1988). Using
the method of Kinghorn et al. (1993) these correlations were strongly
negative.
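Table 7.1. Simulation results for the three parameter sets using all three procedures when major locus genotypes were unknown.
| | AML | | Hoeschele | | Kinghorn | |
| | Mean | SD | Mean | SD | Mean | SD |
| Parameter set 1 | | | | | | |
| Estimated effect of two copies (AA) | 2.058 | .237 | 1.659 | .189 | 2.607 | .132 |
| Estimated effect of one copy (Aa) | 1.067 | .095 | .744 | .172 | 1.302 | .072 |
| Estimated allele frequency p | .115 | .033 | .235 | .044 | .246 | .033 |
| Correlation, true and predicted breeding values | .394 | .062 | .405 | .078 | .023 | .125 |
| Correlation, true and estimated major locus effects | .684 | .061 | .720 | .058 | .701 | .059 |
| Correlation, predicted breeding values and estimated major locus effects | .385 | .128 | .632 | .101 | -.627 | .028 |
| Parameter set 2 | | | | | | |
| Estimated effect of two copies (AA) | 1.836 | .085 | 1.628 | .089 | 2.495 | .074 |
| Estimated effect of one copy (Aa) | .894 | .086 | .779 | .101 | 1.227 | .068 |
| Estimated allele frequency p | .497 | .103 | .498 | .048 | .496 | .043 |
| Correlation, true and predicted breeding values | .377 | .076 | .375 | .068 | -.060 | .144 |
| Correlation, true and estimated major locus effects | .752 | .035 | .752 | .035 | .729 | .040 |
| Correlation, predicted breeding values and estimated major locus effects | .614 | .085 | .711 | .066 | -.647 | .031 |
| Parameter set 3$^1$ | | | | | | |
| Estimated effect of two copies (AA) | 2.042 | .686 | 1.779 | .914 | 2.664 | .114 |
| Estimated effect of one copy (Aa) | 1.019 | .264 | .272 | .257 | 1.300 | .074 |
| Estimated allele frequency p | .041 | .024 | .181 | .101 | .253 | .027 |
| Correlation, true and predicted breeding values | .468 | .060 | .457 | .084 | .001 | .126 |
| Correlation, true and estimated major locus effects | .455 | .147 | .486 | .134 | .609 | .078 |
| Correlation, predicted breeding values and estimated major locus effects | .236 | .130 | .420 | .178 | -.649 | .029 |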
In summary, AML generally slightly underestimates major locus
effects
and
seriously underestimates allele frequency p when the true frequency
is 0.25. The underestimation of p leads to increased estimates of
.
The method of Hoeschele (1988) consistently underestimates major
locus effects
which is in agreement with her simulation results.
For smaller allele effects (parameter set 3), although still quite large,
most of the estimates of
were zero, indicating that the genotype
effects have to be large in order to be recognized.
With the method of Kinghorn et al. (1993) estimates of the allele frequency p were generally closer to the true values than with the two other procedures. However, major locus effects were overestimated and the correlations between true and predicted breeding values were close to zero.
Clearly, none of the methods is very satisfactory for a separate genetic evaluation for the major locus and the polygenes. In this study only large effects were considered. AML and especially the method of Hoeschele (1988) were unable to detect effects smaller than those used in parameter set 3.
More work is required in this area.
Larry Schaeffer