This LaTeX document is available as postscript or as Adobe PDF.
Animals are commonly observed for more than one trait because many traits affect overall profitability. Dairy cattle, for example, are observed for production traits (milk, fat, and protein yields), conformation traits (too many to list), calving ease, milking speed, temperament, survival, and disease susceptibility. Beef, swine, and sheep are observed for a number of weight traits, reproductive performance, litter size, carcass traits, and others. A multiple trait (MT) model is one in which two or more traits are analyzed simultaneously in order to take advantage of genetic and environmental correlations between traits.
Multiple trait models are useful for traits where the difference between the genetic and residual correlations is large (e.g. a difference greater than .5) or where one trait has a much higher heritability than the other trait. In the latter case, traits with low heritability tend to gain more in accuracy than high heritability traits, although all traits benefit to some degree from the simultaneous analysis. Another use of MT models is for traits that occur at different times in the life of the animal, such that culling of animals results in fewer observations for traits that occur later in life than for those at the start. Consequently, animals which have observations later in life tend to have been selected based on their performance for earlier traits. Thus, analysis of later life traits by themselves could suffer from the effects of culling bias, and the resulting estimated breeding values (EBV) could lead to errors in selecting future parents. An MT analysis that includes all observations on an animal upon which culling decisions have been based has been shown to account for the selection that has taken place, and therefore gives unbiased estimates of breeding values for all traits.
MT models do not offer great increases in accuracy for cases where the heritabilities of the traits are similar in magnitude, and where the genetic and residual correlations are nearly the same. However, if culling bias exists, then an MT analysis should be performed even if the parameters are similar. If all animals are observed for all traits, then there would be no need to worry about culling bias because all traits would be equally affected.
An MT analysis relies on the accuracy of the genetic and residual correlations that are assumed. If the parameter estimates are greatly different from the underlying, unknown true values, then an MT analysis could do as much harm as it might do good.
Lastly, the researcher needs to consider the increased costs of computing MT analyses. Programs are more complicated, more memory and disk storage are usually needed, and verification of results might be more complicated. These have to be balanced against the benefits of an MT analysis. If culling bias is the main concern, then an MT model must be used regardless of the costs or no analysis should be done at all, except for the traits not affected by culling bias.
Consider two traits with a single observation per trait on animals. A model should be specified separately for each trait. Usually, the same model is assumed for each trait, and this can greatly simplify the computational aspects, but such an assumption may be unrealistic in many situations.
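Let the model equation for trait 1 be
\[ y_{1ij} = B_{1i} + a_{1j} + e_{1ij}, \]
where $B_{1i}$ is the effect of level $i$ of factor B, $a_{1j}$ is the additive genetic effect of animal $j$, and $e_{1ij}$ is a residual effect, and let the model equation for trait 2 be
\[ y_{2ij} = C_{2i} + a_{2j} + e_{2ij}, \]
where $C_{2i}$ is the effect of level $i$ of factor C.
For example, $y_{1ij}$ could be a trait like birthweight, so that $B_{1i}$ could identify animals born in the same season. Trait 2 could be yearling weight, and $C_{2i}$ could identify contemporary groups of animals of the same sex, same herd, and same rearing unit within herd.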
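Because the two traits will be analyzed simultaneously, the variances
and covariances need to be specified for the traits together.
For example, the additive genetic variance-covariance (VCV) matrix
could be written as
\[ \mathbf{G} = \begin{pmatrix} g_{11} & g_{12} \\ g_{12} & g_{22} \end{pmatrix}, \]
where $g_{11}$ and $g_{22}$ are the additive genetic variances of traits 1 and 2,
and $g_{12}$ is the additive genetic covariance between them. The residual
VCV matrix, $\mathbf{R}$, has the same form.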
When simulating data for a multiple trait problem it is best to
generate observations for all animals for all traits. Then
one can go through the simulated data and randomly delete observations
to simulate a missing data situation, or selectively delete
observations to imitate culling decisions. Another simplification
is to assume that the model for each trait is the same, and then
for a factor that does not belong with a given trait just make the
true values of levels for that factor and trait equal to zero.
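In matrix form, the model equation for one animal would be
\[ \begin{pmatrix} y_{1} \\ y_{2} \end{pmatrix} = \begin{pmatrix} B_{1i} \\ B_{2i} \end{pmatrix} + \begin{pmatrix} C_{1k} \\ C_{2k} \end{pmatrix} + \begin{pmatrix} a_{1} \\ a_{2} \end{pmatrix} + \begin{pmatrix} e_{1} \\ e_{2} \end{pmatrix}. \]
Let $B_{11} = 6.7$ and $B_{12} = 6.3$ for trait 1; because factor
B is not in the model for trait 2, $B_{21} = 0$ and $B_{22} = 0$.
Similarly, $C_{21} = 25$, $C_{22} = 40$, and $C_{23} = 55$ for trait 2, and
because factor C is not in the model for trait 1,
$C_{11} = 0$, $C_{12} = 0$, and $C_{13} = 0$. Suppose the animal is a
base animal, so that $b_{ii} = 1$ and the parent averages for traits 1 and
2 are zero. The observations would then be the sum of the relevant B and C
values, a randomly generated additive genetic effect, and a randomly
generated residual effect for each trait.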
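The simulation scheme described above can be sketched in a few lines of Python. This is only an illustration: the matrices `G` and `R` below are assumed values (chosen so that the genetic correlation is about .52), and the function names are hypothetical. Correlated deviates are obtained from Cholesky factors, one common way of generating them.

```python
import math
import random

def cholesky_2x2(m):
    """Lower-triangular L with L L' = m, for a 2x2 positive definite matrix."""
    l11 = math.sqrt(m[0][0])
    l21 = m[1][0] / l11
    l22 = math.sqrt(m[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def correlated_pair(chol, rng):
    """Bivariate normal deviate with covariance matrix chol * chol'."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [chol[0][0] * z1, chol[1][0] * z1 + chol[1][1] * z2]

# Assumed covariance matrices, for illustration only.
G = [[1.0, 2.0], [2.0, 15.0]]     # additive genetic VCV
R = [[10.0, 5.0], [5.0, 100.0]]   # residual VCV
LG, LR = cholesky_2x2(G), cholesky_2x2(R)

# True factor values from the text, as [trait 1, trait 2];
# zero where the factor is not in the model for that trait.
B = {1: [6.7, 0.0], 2: [6.3, 0.0]}
C = {1: [0.0, 25.0], 2: [0.0, 40.0], 3: [0.0, 55.0]}

def simulate_record(b_level, c_level, rng, parent_avg=(0.0, 0.0), b_ii=1.0):
    """Observations on both traits for one animal (a base animal by default)."""
    mendelian = correlated_pair(LG, rng)   # additive genetic deviation
    residual = correlated_pair(LR, rng)
    return [
        B[b_level][t] + C[c_level][t]
        + parent_avg[t] + math.sqrt(b_ii) * mendelian[t]
        + residual[t]
        for t in range(2)
    ]

rng = random.Random(1)
y1, y2 = simulate_record(b_level=1, c_level=2, rng=rng)
```

For a non-base animal, `parent_avg` would hold the average of the parents' true breeding values and `b_ii` would be less than 1.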
The following data (rounded off) were simulated according to the preceding scheme and parameters.
|Animal||Sire||Dam||B-level||C-level||Trait 1||Trait 2|
To simulate selection, assume that all animals had trait 1 observed, but that the trait 2 observation was removed for any animal with a trait 1 value below 3.0. Four trait 2 observations were deleted, giving the results in the table below.
|Animal||Sire||Dam||B-level||C-level||Trait 1||Trait 2|
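The culling rule can be written as a short function. The records below are made-up values, not those of the table, and the field names are hypothetical.

```python
def cull_trait2(records, threshold=3.0):
    """Copy the records, setting trait 2 to None when trait 1 is below threshold."""
    culled = []
    for rec in records:
        rec = dict(rec)                 # leave the complete data untouched
        if rec["trait1"] < threshold:
            rec["trait2"] = None        # observation is treated as missing
        culled.append(rec)
    return culled

# Hypothetical complete records on three animals.
records = [
    {"animal": 1, "trait1": 2.3, "trait2": 39.0},
    {"animal": 2, "trait1": 7.7, "trait2": 51.0},
    {"animal": 3, "trait1": 2.9, "trait2": 28.0},
]
selected = cull_trait2(records)
missing = [r["animal"] for r in selected if r["trait2"] is None]
print(missing)  # [1, 3]
```

Keeping the complete data alongside the culled data makes it possible to compare analyses with and without selection.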
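Organize the data by traits within animals. With two traits there
are three possible residual matrices per animal, depending on which
traits were observed, i.e.,
\[ \mathbf{R}_{12} = \begin{pmatrix} r_{11} & r_{12} \\ r_{12} & r_{22} \end{pmatrix}, \quad \mathbf{R}_{1} = \begin{pmatrix} r_{11} & 0 \\ 0 & 0 \end{pmatrix}, \quad \mathbf{R}_{2} = \begin{pmatrix} 0 & 0 \\ 0 & r_{22} \end{pmatrix}, \]
for animals with both traits, trait 1 only, and trait 2 only, respectively.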
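The inverse needed for each of the three cases can be sketched as below. The rows and columns of a missing trait are set to zero, so the result is a generalized inverse. The values in `R` are assumed for illustration and the function name is hypothetical.

```python
def residual_inverse(R, observed):
    """Generalized inverse of the 2x2 residual matrix for one animal.

    observed is a pair of flags (trait 1 observed, trait 2 observed).
    Rows and columns belonging to a missing trait are left as zeros.
    """
    has1, has2 = observed
    if has1 and has2:
        det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
        return [[R[1][1] / det, -R[0][1] / det],
                [-R[1][0] / det, R[0][0] / det]]
    if has1:
        return [[1.0 / R[0][0], 0.0], [0.0, 0.0]]
    if has2:
        return [[0.0, 0.0], [0.0, 1.0 / R[1][1]]]
    raise ValueError("animal has no observed traits")

R = [[10.0, 5.0], [5.0, 100.0]]   # assumed residual VCV matrix

R12_inv = residual_inverse(R, (True, True))    # both traits observed
R1_inv = residual_inverse(R, (True, False))    # trait 2 missing
R2_inv = residual_inverse(R, (False, True))    # trait 1 missing
```

Each animal's contribution to the MME then uses whichever of the three inverses matches its pattern of observed traits.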
Again, to simplify construction of the MME, pretend that both traits
have the same model equation, so that
Similarly for animal 2,
For animal 3,
The remaining animals are processed in the same manner. The resulting
equations are of order 34 by 34. To these,
$\mathbf{A}^{-1} \otimes \mathbf{G}^{-1}$, where $\mathbf{A}$ is the additive
genetic relationship matrix among the animals,
must be added to the animal by animal submatrix
in order to form the full HMME. However, solutions for the B-factor
for trait 2 are not needed because the B-factor does not affect
trait 2, and solutions for the C-factor for trait 1 are not needed
because the C-factor does not affect trait 1. Therefore, remove
rows (and columns) 2, 4, 5, 7, and 9, or, if an iterative solution
is being computed, require that the solutions for $B_{21}$,
$B_{22}$, $C_{11}$, $C_{12}$, and $C_{13}$ are always equal to zero.
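The step of adding the relationship part to the animal by animal submatrix can be sketched as follows, with the equations ordered by traits within animals. The two-animal pedigree and the matrix used for the genetic VCV are assumptions made for illustration, and the helper names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
        for i in range(len(A))
        for k in range(len(B))
    ]

def add_to_animal_block(lhs, block, offset):
    """Add block into lhs, starting at row and column offset."""
    for i, row in enumerate(block):
        for j, value in enumerate(row):
            lhs[offset + i][offset + j] += value

# Assumed pedigree: a sire (animal 1) and one offspring (animal 2), dam unknown.
A_inv = [[4.0 / 3.0, -2.0 / 3.0], [-2.0 / 3.0, 4.0 / 3.0]]
# Inverse of an assumed genetic VCV matrix [[1, 2], [2, 15]] (determinant 11).
G_inv = [[15.0 / 11.0, -2.0 / 11.0], [-2.0 / 11.0, 1.0 / 11.0]]

n_fixed = 10                        # (2 B levels + 3 C levels) times 2 traits
order = n_fixed + 2 * len(A_inv)    # two equations per animal
lhs = [[0.0] * order for _ in range(order)]   # would already hold the data part
add_to_animal_block(lhs, kron(A_inv, G_inv), n_fixed)
```

With 12 animals in place of 2, the same construction gives equations of order 34.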
The solutions to the HMME, for this example, were
|Animal||Sire||Dam||Trait 1||Trait 2|
The correlation between the animal additive genetic solutions for traits 1 and 2 was .74, which is greater than the .52 assumed in the original additive genetic VCV matrix.
Partitioning the MT solution
Some insight into the workings of multiple trait HMME can be gained by partitioning an animal's additive genetic solution for any one trait. Take animal 10 from the previous example, because this animal has a record on both traits and progeny, plus parents are known. The partitioning results in contributions attributable to each trait weighted by the genetic correlations between traits. In addition to the Data, Parent Average, and Progeny contributions, there is also a contribution from the direct genetic solutions for the other traits. The partitions will be presented in tabular form.
|Contribution||Trait 1||Trait 2|
Summing all of the above pieces gives the additive genetic solution of animal 10 for trait 1. There would be additional columns if more traits were included in the analysis, i.e. one for each additional trait. One could interpret the results in the above table as follows. The Data contribution from trait 1 to the trait 1 solution was .1273, but the trait 2 correlated information said that it should be .0611 lower. The parent average for trait 1 contributed -.1607, but the correlated parent average from trait 2 indicated it should be .0819 higher. The Progeny contribution from trait 1 was nearly equal and opposite to the contribution from trait 2. The contribution from the direct genetic solution for trait 2 was seemingly large.
Another way to look at the table is to combine all of the partitions
of the trait 2 column into one figure, because the Data, Parent
Average, and Progeny contributions from trait 2 are components of
the additive genetic solution for trait 2, but they are weighted
slightly differently, that is,
There is another way to construct the MME without the need of
forming different inverses of $\mathbf{R}$ for missing traits.
If a trait is missing, then that
observation is set to zero and assigned to its own contemporary group in the
model for that trait. In the example data there were four missing
observations. Animal 1 would be assigned to $C_{24}$,
animal 2 to $C_{25}$, animal 6 to $C_{26}$, and animal 11 to $C_{27}$.
In this case only trait 2 observations were missing.
If trait 1 observations were also missing, then those animals
would be assigned to separate levels of factor B. In this way,
only one residual VCV matrix is needed, i.e. the full matrix
$\mathbf{R}$ for animals with both traits observed.
Let $\mathbf{X}$ represent the design matrix for
fixed effects (factors B and C) for either trait. Note the
four extra columns for factor C for the animals with missing
trait 2 observations.
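To prove that this trick will work, take
a Gaussian elimination (i.e. absorption) of the row and column
corresponding to the missing trait, say trait 2. Let $r^{11}$, $r^{12}$,
and $r^{22}$ denote the elements of the inverse of the full residual
VCV matrix. Then
\[ \begin{pmatrix} r^{11} & r^{12} \\ r^{12} & r^{22} \end{pmatrix} - \begin{pmatrix} r^{12} \\ r^{22} \end{pmatrix} (r^{22})^{-1} \begin{pmatrix} r^{12} & r^{22} \end{pmatrix} = \begin{pmatrix} r^{11} - r^{12}(r^{22})^{-1}r^{12} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} r_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix}, \]
which is the inverse of the residual matrix for an animal with only trait 1 observed.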
This trick is not always practical. For example, if trait 1 has 1 million observations and trait 2 has only 100,000 observations, then there would be 900,000 extra single-observation subclasses created for trait 2. However, if the percentages of missing observations are relatively small, or if many traits are being considered, then pretending all observations are present may make programming easier.
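The absorption argument behind the trick can also be checked numerically. The sketch below absorbs the trait 2 row and column of the inverse of a full residual matrix and compares the result with the inverse used when only trait 1 is observed. The residual matrix is an assumed example and the function names are hypothetical.

```python
def inverse_2x2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def absorb_trait2(R_inv):
    """Gaussian elimination of the trait 2 row and column of R_inv."""
    c = [R_inv[0][1], R_inv[1][1]]      # column belonging to trait 2
    pivot = R_inv[1][1]
    return [[R_inv[i][j] - c[i] * c[j] / pivot for j in range(2)]
            for i in range(2)]

R = [[10.0, 5.0], [5.0, 100.0]]          # assumed residual VCV matrix
reduced = absorb_trait2(inverse_2x2(R))

# Inverse used directly when only trait 1 is observed.
trait1_only = [[1.0 / R[0][0], 0.0], [0.0, 0.0]]

max_diff = max(abs(reduced[i][j] - trait1_only[i][j])
               for i in range(2) for j in range(2))
print(max_diff < 1e-12)  # True
```

The same check works for any positive definite residual matrix, because the element that remains after absorption is the reciprocal of the trait 1 residual variance.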
Estimation of Covariances
Derivative free REML (DF REML) is one option for estimating variances and covariances in a multi-trait situation. The EM algorithm is less suitable because it requires traces involving elements of the inverse of the coefficient matrix of the MME, which are costly to obtain. Even DF REML takes considerably more time as the number of parameters to be estimated increases.
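Another option is the Bayesian approach, where operations are performed
in $t$ dimensions, for $t$ being the number of traits. Thus, for a
solution to the MME, the $t \times 1$
vector for any one fixed effect,
for example, would be processed as a unit, with the solutions for all
$t$ traits of that level obtained jointly. Similarly, let
$\mathbf{a}_i$ be the $q \times 1$
vector of animal solutions for
trait $i$, then form
\[ \mathbf{S} = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_t \end{pmatrix}' \mathbf{A}^{-1} \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_t \end{pmatrix}, \]
the $t \times t$ matrix of additive genetic sums of squares and cross
products from which the additive genetic VCV matrix is estimated.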
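A difficult part of a multiple trait analysis, when missing traits are
possible, is the calculation of the appropriate residual matrix of
sums of squares and cross products. The residual effect for any
one trait is
\[ \hat{e}_{t} = y_{t} - \hat{B}_{t} - \hat{C}_{t} - \hat{a}_{t}, \]
that is, the observation for trait $t$ minus the solutions for the factor
levels and the animal additive genetic effect that apply to it, which can
be calculated directly only when $y_{t}$ has been observed.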
If Bruce Tier's MME are used, with the single-observation contemporary groups for missing trait observations, then the residuals can be calculated directly by using zero as the observation for the missing traits and using the solutions for the single-observation contemporary groups. This gives exactly the same residual estimates as the above methodology. Therefore, Tier's approach is handy for the Gibbs sampling algorithm.
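Once the residuals are calculated for all animals with records, then
the residual matrix of sums of squares and cross products can be formed
and used to obtain the next estimate of the residual VCV matrix.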
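Accumulating the residual sums of squares and cross products is straightforward once every animal has a complete vector of residuals. The sketch below assumes that residuals for missing traits have already been filled in by one of the approaches described above. The numbers are made up and the function name is hypothetical.

```python
def residual_sscp(residuals):
    """t x t matrix of sums of squares and cross products of residual vectors."""
    t = len(residuals[0])
    sscp = [[0.0] * t for _ in range(t)]
    for e in residuals:                 # one residual vector per animal
        for i in range(t):
            for j in range(t):
                sscp[i][j] += e[i] * e[j]
    return sscp

# Hypothetical residuals for three animals on two traits.
residuals = [[1.0, 2.0], [-1.0, 0.0], [0.5, -1.0]]
sscp = residual_sscp(residuals)
print(sscp)  # [[2.25, 1.5], [1.5, 5.0]]
```

How this matrix is turned into a new residual VCV matrix depends on the estimation method being used.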
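The previous description applies to general multiple trait models where the model for each trait can be different and where animals need not be observed for each trait. This is frequently the true situation when a multiple trait analysis is utilized in order to account for selection bias. However, there are special situations where multiple trait analyses can be greatly simplified in terms of the necessary calculations. One special case is when all traits have the exact same model, all animals are observed for all traits, and only additive genetic and residual effects are random in the model. In this case a canonical transformation can be applied to the traits. That is, a matrix $\mathbf{Q}$ can be found such that $\mathbf{Q}\mathbf{R}\mathbf{Q}' = \mathbf{I}$ and that $\mathbf{Q}\mathbf{G}\mathbf{Q}' = \mathbf{D}$ is a diagonal matrix, where $\mathbf{G}$ and $\mathbf{R}$ are the additive genetic and residual VCV matrices. Thus, the transformed traits are uncorrelated, and consequently each transformed trait can be analyzed independently of the others. At the end, the results can be back-transformed to the original scale using $\mathbf{Q}^{-1}$. One set of steps to find $\mathbf{Q}$ is as follows. First, obtain the eigenvalue decomposition $\mathbf{R} = \mathbf{P}\mathbf{E}\mathbf{P}'$ and form $\mathbf{R}^{-1/2} = \mathbf{P}\mathbf{E}^{-1/2}\mathbf{P}'$. Second, form $\mathbf{H} = \mathbf{R}^{-1/2}\mathbf{G}\mathbf{R}^{-1/2}$ and obtain its eigenvalue decomposition $\mathbf{H} = \mathbf{L}\mathbf{D}\mathbf{L}'$, with $\mathbf{L}\mathbf{L}' = \mathbf{I}$. Third, let $\mathbf{Q} = \mathbf{L}'\mathbf{R}^{-1/2}$, so that $\mathbf{Q}\mathbf{R}\mathbf{Q}' = \mathbf{L}'\mathbf{L} = \mathbf{I}$ and $\mathbf{Q}\mathbf{G}\mathbf{Q}' = \mathbf{L}'\mathbf{H}\mathbf{L} = \mathbf{D}$.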
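A numerical sketch of a canonical transformation for two traits is given below. It uses one standard construction (symmetric eigenvalue decompositions of the residual matrix and of the rescaled genetic matrix), written out for the 2 by 2 case so that no matrix library is needed. The matrices `G` and `R` are assumed values and all function names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def eig_sym_2x2(M):
    """Eigenvalues and eigenvectors (as columns) of a symmetric 2x2 matrix."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)   # Jacobi rotation angle
    c, s = math.cos(theta), math.sin(theta)
    vectors = [[c, -s], [s, c]]
    lam1 = a * c * c + 2.0 * b * c * s + d * s * s
    lam2 = a * s * s - 2.0 * b * c * s + d * c * c
    return [lam1, lam2], vectors

def canonical_transformation(G, R):
    """Return Q and the diagonal of D with Q R Q' = I and Q G Q' = D."""
    e, P = eig_sym_2x2(R)
    E_inv_half = [[e[0] ** -0.5, 0.0], [0.0, e[1] ** -0.5]]
    R_inv_half = matmul(matmul(P, E_inv_half), transpose(P))
    H = matmul(matmul(R_inv_half, G), R_inv_half)
    d, L = eig_sym_2x2(H)
    Q = matmul(transpose(L), R_inv_half)
    return Q, d

G = [[1.0, 2.0], [2.0, 15.0]]     # assumed additive genetic VCV
R = [[10.0, 5.0], [5.0, 100.0]]   # assumed residual VCV
Q, d = canonical_transformation(G, R)

QRQ = matmul(matmul(Q, R), transpose(Q))   # should be the identity
QGQ = matmul(matmul(Q, G), transpose(Q))   # should be diagonal with elements d
```

On the transformed scale each trait has a residual variance of 1 and a genetic variance of `d[i]`, so its heritability is `d[i] / (d[i] + 1)`.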
The canonical transformation cannot be applied when the model contains more than the additive genetic and residual variance-covariance matrices, because in general more than two matrices cannot be diagonalized simultaneously. However, there are approximate methods for diagonalizing more than two matrices at a time, which have been applied to multiple trait situations. These methods are not covered in these notes.
Larry Schaeffer