

This LaTeX document is available as postscript or as Adobe PDF.

Random Regression Models
For Survival Analyses

L. R. Schaeffer
January 24, 2002

INTRODUCTION

Random regression models (RRM) have been applied successfully to the analysis of test day production records of dairy cattle in Canada. RRM can be applied to any trait that is observed on an animal over time. Survival is a trait that is implicitly observed many times during the life of any animal. Veerkamp, Brotherstone, and Meuwissen (1999) suggested the use of RRM for survival analyses. This proposal is based on their suggestion, but is not exactly the same model as theirs.

At any given point in time an animal is either alive or dead (1 or 0). The survival 'curve' of an individual is a step function: it has the value 1 from birth or first calving until the animal is culled, and the value 0 from that point until the maximum age limit is reached. The survival 'curve' of a population of cows represents the average survival rate at various points in time; it starts at 1 and decreases to 0 (at an age when all animals would be dead).

Survival 'curves' are expected to differ depending on the production level of the cow and depending on the type classification of the cow. The year and season of birth could also have an influence on the shape of the survival curve, and lastly, herd environment and management could have an effect on survival.

The objective of this report is to propose a RR model for the analysis of survival data. A small, simplified example is given.

DATA

The items needed for each cow are Cow ID, Herd ID, Year-Season of Birth, Birthdate, Date Culled, Age When Culled, Latest Production EBVs, and Latest Conformation Scores. Reasons for culling may also be necessary at some point.

If Date Culled is blank, then the cow is still active. From these data an observation vector, ${\bf y}$, can be constructed for the cow. If ages 28 to 100 months are considered in the analysis, for example, then ${\bf y}$ is a vector of length 73. If the cow was culled at age 40, then the first 12 elements of ${\bf y}$ (ages 28 to 39) would be 1 and the remaining 61 elements would be 0. If a cow was not yet culled and was 40 months of age, then ${\bf y}$ would be of length 13 only, with all elements equal to 1. Every animal would have a large number of 'observations'.
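The construction of ${\bf y}$ described above can be sketched in a few lines of Python. This is only an illustration: the function name `survival_vector` and its parameters are hypothetical, and ages are assumed to be whole months.

```python
from typing import List, Optional


def survival_vector(cull_age: Optional[int] = None,
                    current_age: Optional[int] = None,
                    first_age: int = 28,
                    last_age: int = 100) -> List[int]:
    """Build the 0/1 survival observations for one cow (ages in months).

    A culled cow gets 1 for every age before cull_age and 0 from
    cull_age up to last_age. A cow that is still active is observed
    only up to current_age, and every observation is 1.
    """
    if cull_age is not None:
        return [1 if age < cull_age else 0
                for age in range(first_age, last_age + 1)]
    if current_age is None:
        raise ValueError("an active cow needs current_age")
    end = min(current_age, last_age)
    return [1] * max(end - first_age + 1, 0)


culled = survival_vector(cull_age=40)      # cow culled at 40 months
active = survival_vector(current_age=40)   # active cow, 40 months old
```

With the default age range, `culled` has 73 elements (12 ones followed by 61 zeros) and `active` has 13 elements, all equal to 1, matching the counts in the text.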

Production EBVs would be assigned to one of five levels depending upon the year of birth of the cow, so that roughly equal numbers of cows are in each level. The assumption is that survival depends on the genetic level of production. Similarly, conformation traits would be assigned to one of five levels by year of birth. The 1 to 9 categories could be condensed to five. Survival depends on the classifications of the cow. Particular traits may have more influence than others.
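One possible way to form five levels with roughly equal numbers of cows within each year of birth is to rank cows on EBV within year and cut the ranking into fifths. The sketch below assumes that approach; the function name `assign_levels` and the input layout are hypothetical, and the report does not state how the levels would actually be formed.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def assign_levels(cows: Iterable[Tuple[Hashable, int, float]],
                  n_levels: int = 5) -> Dict[Hashable, int]:
    """Assign each cow to a level 1..n_levels within its year of birth.

    cows holds (cow_id, birth_year, ebv) tuples. Within each birth
    year the cows are ranked on EBV and the ranking is split into
    n_levels groups of roughly equal size.
    """
    by_year = defaultdict(list)
    for cow_id, year, ebv in cows:
        by_year[year].append((ebv, cow_id))

    levels = {}
    for group in by_year.values():
        group.sort(key=lambda item: item[0])   # lowest EBV first
        n = len(group)
        for rank, (_, cow_id) in enumerate(group):
            levels[cow_id] = rank * n_levels // n + 1
    return levels


# Hypothetical data: ten cows born in 1995 with EBVs 0, 1, ..., 9.
example = [(f"cow{i}", 1995, float(i)) for i in range(10)]
example_levels = assign_levels(example)
```

The same function could be applied to condensed conformation scores, although for traits already scored in categories a fixed mapping of the 1 to 9 scale onto five classes may be simpler.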

MODEL

An observation, 0 or 1, is recorded on an animal at age $i$, born in year-season $j$, and in herd $k$ within year-season $j$.


\begin{eqnarray*}y_{ijklmn} & = & (YS:H)_{jk} + (YSPL:A)_{lji} + (YSCL:A)_{mji} \\
& & + \sum_{t}^{c} a_{t}x^{t}_{i} + \sum_{t}^{c} p_{t}x^{t}_{i}
+ e_{ijklmn}
\end{eqnarray*}


where $(YS:H)_{jk}$ is a random herd within year-season of birth effect; $(YSPL:A)_{lji}$ is a fixed effect of the age at observation within year-season by production level subclasses; $(YSCL:A)_{mji}$ is a fixed effect of the age at observation within year-season by classification level subclasses; $a_{t}$ are the animal additive genetic random regression coefficients; $p_{t}$ are the animal permanent environmental random regression coefficients; $x^{t}_{i}$ are standardized ages at observation (-1 to 1) to the power $t$; and $e_{ijklmn}$ are random residual effects. Thus, survival curves are allowed to change or differ from one year-season to the next as well as according to production levels and conformation scores. The appropriate order of the random regressions for survival needs to be determined. The same order is assumed for both genetic and permanent environmental effects, but this is not absolutely necessary.

Let ${\bf a}$ represent the vector of $a_{t}$ values for all animals, ${\bf p}$ represent the vector of $p_{t}$ values for all animals that were observed, and ${\bf e}$ represent the vector of residual effects, then

\begin{displaymath}Var \left( \begin{array}{c} {\bf a} \\ {\bf p} \\ {\bf e}
\end{array} \right) = \left( \begin{array}{ccc} {\bf A} \otimes {\bf G} & {\bf0} & {\bf0} \\
{\bf0} & {\bf I} \otimes {\bf P} & {\bf0} \\
{\bf0} & {\bf0} & {\bf R} \end{array} \right), \end{displaymath}

where ${\bf A}$ is the additive genetic relationship matrix among animals, ${\bf G}$ is a matrix of order $c$, the covariance matrix of the additive genetic random regression coefficients, ${\bf I}$ is an identity matrix with one row per animal that was observed, ${\bf P}$ is a matrix of order $c$, the covariance matrix of the permanent environmental random regression coefficients, and ${\bf R}$ is a diagonal matrix with elements $r_{i}$, the residual variance for the $i$th age at observation.

Note that at age 28 the majority of animals (if not all) will have an observation of 1, which means that the phenotypic variance would be zero or very close to it. As animals age, more observations become 0, and the phenotypic variance increases until there are an equal number of 0's and 1's. After this point, there are more 0's and fewer 1's so that the phenotypic variance decreases until it reaches 0, at which point all animals have been culled. If first lactations begin mostly at 18 to 24 months of age, then 28 months would be near the end of the first lactation and the phenotypic variance should not be 0.
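The pattern described above follows from the variance of a 0/1 variable. As a short worked step, let $q$ denote the proportion of animals still alive at a given age ($q$ is notation introduced here for illustration, not a symbol from the model). Then

\begin{displaymath}Var(y) \ = \ q(1-q), \end{displaymath}

which is 0 when $q=1$ or $q=0$ and reaches its maximum of 0.25 when $q=0.5$, that is, when there are equal numbers of 0's and 1's.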

SMALL EXAMPLE

In order to simplify the illustration of the methods, let ages be exchanged with lactation numbers (1 to 5 only). Below are the data vectors for 16 animals. In total there are 80(=5$\times$16) observations. Assumptions are that these 16 animals were all born in the same YS in one herd and belong to the same levels of production and conformation.

Cow  Sire  Dam   Lactation
                 1  2  3  4  5
1    C     X     1  1  0  0  0
2    C     Y     1  1  1  0  0
3    C     Z     1  0  0  0  0
4    A     W     1  1  1  1  1
5    D     X     1  1  0  0  0
6    D     Z     1  1  0  0  0
7    D     W     0  0  0  0  0
8    A     T     1  1  1  1  1
9    A     Y     1  1  1  0  0
10   B     W     1  1  1  1  1
11   D     Y     0  0  0  0  0
12   B     X     1  1  0  0  0
13   A     Z     1  1  1  1  0
14   B     T     1  0  0  0  0
15   B     U     1  1  1  1  1
16   B     Y     1  1  1  1  0
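As a quick check on the data, the raw (unadjusted) proportion of cows alive in each lactation can be computed directly from the table. The sketch below is illustrative only; the names `records` and `survival_proportions` are hypothetical, and these raw proportions are not the mixed model solutions.

```python
# Survival records for lactations 1 to 5, keyed by cow number,
# copied from the table above.
records = {
    1: [1, 1, 0, 0, 0],   2: [1, 1, 1, 0, 0],
    3: [1, 0, 0, 0, 0],   4: [1, 1, 1, 1, 1],
    5: [1, 1, 0, 0, 0],   6: [1, 1, 0, 0, 0],
    7: [0, 0, 0, 0, 0],   8: [1, 1, 1, 1, 1],
    9: [1, 1, 1, 0, 0],  10: [1, 1, 1, 1, 1],
    11: [0, 0, 0, 0, 0], 12: [1, 1, 0, 0, 0],
    13: [1, 1, 1, 1, 0], 14: [1, 0, 0, 0, 0],
    15: [1, 1, 1, 1, 1], 16: [1, 1, 1, 1, 0],
}


def survival_proportions(rows):
    """Return the mean of each lactation column across all cows."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


proportions = survival_proportions(list(records.values()))
print(proportions)
```

The proportions fall steadily from lactation 1 to lactation 5, which is the shape expected of a population survival curve.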

The simplified model is

\begin{displaymath}y_{ijk} = L_{i} + (a_{0j} + a_{1j}x_{i} + a_{2j}x_{i}^{2}) + (p_{0j} + p_{1j}x_{i} + p_{2j}x_{i}^{2}) + e_{ijk}, \end{displaymath}

where $L_{i}$ is a lactation number effect (the fixed survival curve), and the other elements are as described earlier. The standardized values in this example are $x_{1}=-1$, $x_{2}=-0.5$, $x_{3}=0$, $x_{4}=0.5$, and $x_{5}=1$. The parameters of the random regression coefficients and residual variances are
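The standardized values listed here are consistent with a linear map of lactation number onto the interval -1 to 1. A minimal sketch, assuming that linear map (the function names `standardize` and `covariates` are illustrative and not from the report):

```python
def standardize(value, lowest, highest):
    """Map value from [lowest, highest] linearly onto [-1, 1]."""
    return -1.0 + 2.0 * (value - lowest) / (highest - lowest)


def covariates(x, order=2):
    """Return the regression covariates [x**0, x**1, ..., x**order]."""
    return [x ** t for t in range(order + 1)]


# Standardized lactation numbers 1 to 5.
xs = [standardize(lactation, 1, 5) for lactation in range(1, 6)]

# One row of covariates (1, x, x squared) per lactation; the same row
# multiplies both the genetic and the permanent environmental
# coefficients of an animal.
Z = [covariates(x) for x in xs]
```

The same `standardize` function would apply to ages in months, for example with `lowest=28` and `highest=100`.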

\begin{eqnarray*}{\bf G} & = & \left( \begin{array}{lll} 0.4 & 0.07 &
-0.018 \\ \ldots & \ldots & \ldots \\ \ldots & \ldots & \ldots \end{array} \right), \\
{\bf P} & = & \ldots , \\
{\bf R} & = & diag( 0.0039 \ 0.0661 \ 0.0784 \ 0.0423 \
0.0289 ).
\end{eqnarray*}


These values were estimated from the above data using Gibbs sampling. The heritability for survival to lactation 3 would be

\begin{displaymath}h^{2} = (0.4)/(0.4 \ + \ 0.2779 \ + \ 0.0784) \ = \ 0.53. \end{displaymath}
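The terms in this ratio can be read as follows. Lactation 3 has a standardized value of 0, so only the intercept coefficients contribute: 0.4 is the first diagonal element of ${\bf G}$, 0.2779 is presumably the corresponding element of ${\bf P}$, and 0.0784 is the residual variance for lactation 3. A small sketch of the arithmetic (the function name `heritability` is illustrative):

```python
def heritability(genetic_var, perm_env_var, residual_var):
    """Ratio of additive genetic variance to total phenotypic variance."""
    return genetic_var / (genetic_var + perm_env_var + residual_var)


# Variances at lactation 3, where the standardized value is 0.
h2_lactation3 = heritability(0.4, 0.2779, 0.0784)
print(round(h2_lactation3, 2))
```

At other lactations the linear and quadratic coefficients also contribute to the genetic and permanent environmental variances, so the full ${\bf G}$ and ${\bf P}$ matrices would be needed.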

The solutions from the mixed model equations were

\begin{displaymath}\left( \begin{array}{c} \hat{L}_{1} \\ \hat{L}_{2} \\
\hat{L}_{3} \\ \hat{L}_{4} \\ \hat{L}_{5} \end{array} \right) =
\left( \begin{array}{c} \ldots \\ 0.70537 \\
0.44319 \\ 0.30729 \\ 0.17269 \end{array} \right), \end{displaymath}

and the solutions for the sires and other animals are shown in the next two tables for additive genetic and permanent environmental RR coefficients, respectively.

Additive Genetic Estimates
Animal ID  Sire  Dam        a0      a1      a2    EBV1    EBV2    EBV3
A                         .360    .114   -.085    .562    .803    .235
B                         .138    .036    .021    .368    .581    .038
C                        -.060   -.135    .039    .017    .383   -.232
D                        -.230    .053    .009    .005    .213    .073
T                         .151    .077   -.015    .386    .594    .127
U                         .245    .125   -.027    .516    .688    .208
W                         .117    .059   -.011    .338    .560    .097
X                         .017   -.018    .002    .174    .460   -.029
Y                         .040   -.006   -.012    .195    .483    .000
Z                        -.010   -.023    .018    .158    .433   -.048
1          C     X       -.075   -.132    .039    .005    .368   -.227
2          C     Y        .003   -.104   -.003    .069    .446   -.154
3          C     Z       -.143   -.144    .079   -.035    .300   -.275
4          A     W        .385    .186   -.057    .687    .828    .322
5          D     X       -.134   -.073    .029   -.005    .309   -.131
6          D     Z       -.143   -.074    .036   -.008    .300   -.138
7          D     W       -.201    .157   -.011    .118    .242    .244
8          A     T        .394    .191   -.058    .700    .837    .330
9          A     Y        .127   -.035   -.054    .211    .570   -.012
10         B     W        .312    .168   -.011    .642    .755    .260
11         D     Y       -.220    .137   -.013    .077    .223    .215
12         B     X       -.024   -.082    .035    .102    .419   -.149
13         A     Z        .237    .033   -.082    .361    .680    .111
14         B     T       -.045   -.065    .061    .124    .398   -.143
15         B     U        .348    .186   -.016    .691    .791    .291
16         B     Y        .088    .019   -.014    .266    .531    .039

Permanent Environmental Estimates
Animal ID  Sire  Dam        p0      p1      p2
1          C     X       -.031   -.069    .021
2          C     Y        .027   -.063   -.025
3          C     Z       -.080   -.058    .064
4          A     W        .151    .132   -.003
5          D     X        .035   -.124    .025
6          D     Z        .040   -.122    .024
7          D     W       -.290    .167   -.009
8          A     T        .143    .127   -.003
9          A     Y       -.060   -.132   -.015
10         B     W        .192    .155   -.010
11         D     Y       -.270    .184   -.010
12         B     X       -.078   -.115    .024
13         A     Z        .061   -.056   -.067
14         B     T       -.163   -.131    .070
15         B     U        .161    .137   -.009
16         B     Y        .163   -.034   -.078

There are several ways to convert these numbers into useful EBVs. One way is to determine differences among animals at a fixed age, such as lactation number five. Then $x_{i}=1$, and

\begin{eqnarray*}EBV1 & = & a_{0} \ + \ a_{1}(1) \ + \ a_{2}(1)^{2} \ + \ 0.173 \\
& = & a_{0} \ + \ a_{1} \ + \ a_{2} \ + \ 0.173
\end{eqnarray*}


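which would be the expected survival rate (as a proportion) at lactation 5, where 0.173 is the rounded solution for $\hat{L}_{5}$. Alternatively, an EBV could be computed for lactation 3 where $x_{i}=0$ and the rounded solution for $\hat{L}_{3}$ is 0.443, then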

\begin{eqnarray*}EBV2 & = & a_{0} \ + \ a_{1}(0) \ + \ a_{2}(0)^{2} \ + \ 0.443 \\
& = & a_{0} \ + \ 0.443.
\end{eqnarray*}


Lastly, the difference between two ages (lactations) could indicate the change in survival between those ages. An example would be lactation 4 ($x_{4}=0.5$) minus lactation 1 ($x_{1}=-1$), in which $a_{0}$ cancels,

\begin{eqnarray*}EBV3 & = & a_{1}(0.5 - (-1)) \ + \ a_{2}((0.5)^{2} - (-1)^{2}) \\
& = & a_{1}(1.5) \ + \ a_{2}(-0.75).
\end{eqnarray*}


EBV3 may be a good measure for sires, but a cow that was culled in lactation 1 does not have any change in survival between lactation 1 and 4. Therefore, EBV3 is not a good measure for cows.
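The three conversions can be sketched as follows, treating EBV3 as the genetic curve evaluated at lactation 4 minus the curve evaluated at lactation 1, which reproduces the tabulated values. The function names and the constants `L3_HAT` and `L5_HAT` are illustrative; the constants are the rounded lactation solutions used in the formulas above.

```python
L3_HAT = 0.443   # rounded solution for lactation 3
L5_HAT = 0.173   # rounded solution for lactation 5


def genetic_curve(a, x):
    """Evaluate a0 + a1*x + a2*x**2 at standardized value x."""
    a0, a1, a2 = a
    return a0 + a1 * x + a2 * x * x


def ebvs(a):
    """Return (EBV1, EBV2, EBV3) for coefficients a = (a0, a1, a2)."""
    ebv1 = genetic_curve(a, 1.0) + L5_HAT                  # lactation 5
    ebv2 = genetic_curve(a, 0.0) + L3_HAT                  # lactation 3
    ebv3 = genetic_curve(a, 0.5) - genetic_curve(a, -1.0)  # lact. 4 minus 1
    return ebv1, ebv2, ebv3


# Sire A from the additive genetic table.
sire_a = ebvs((0.360, 0.114, -0.085))
print([round(value, 3) for value in sire_a])
```

For sire A this agrees, to three decimals, with the tabulated values .562, .803, and .235.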



Larry Schaeffer
2002-01-24