This LaTeX document is available as postscript or asAdobe PDF.
Matrix Algebra
L. R. Schaeffer, March 1999
Definitions
- 1.
- Scalars.
A scalar is a single quantity usually designated by an italicized lower case
letter, for example a, r, and k.
- 2.
- Matrix.
A matrix is a group of elements arranged in rows and columns. Elements can be
scalars, mathematical expressions, or other matrices. Upper case letters in bold face type denote matrices. Examples of matrices are
and
The order or dimension of a matrix is the number of rows and
columns.
- 3.
- Vectors.
A column vector is a matrix with one column, and a row vector is a
matrix with one row. Usually
if the word vector is used, it is implied that the vector is a
column vector. Vectors are commonly denoted by lower case letters
in bold face type, but this is not a universal practice.
- 4.
- Transposition.
Transposition is the formation of a new matrix by making rows of the
original matrix into columns of the new matrix in corresponding order.
The transpose of a column vector is a row vector, and the transpose of
a matrix of order
results in a new matrix of order
.
If
is the original matrix, then the
transpose is denoted as
in these notes, but in some
places the transpose is denoted as
.
- 5.
- Diagonals.
The diagonal elements of a matrix are those elements whose row and column
numbers are the same. For example, if
then the diagonal elements are 8 and 7.
Off-diagonal elements of a matrix are all of the other elements
excluding the diagonals.
Types of Matrices
- 1.
- Rectangular.
A matrix is rectangular if the number of rows does not equal the number
of columns. All vectors are rectangular matrices.
- 2.
- Square.
Square matrices have the same number of rows as columns.
- 3.
- Symmetric.
Symmetric matrices are also square matrices. The element in the
ith row and jth column is equal to the element in the
jth row and ith column, for all values of i and
j.
The transpose of a symmetric matrix is that same matrix, i.e.
.
Take any matrix, say
and form
the product,
,
then that product is symmetric and
square.
- 4.
- Diagonal.
A diagonal matrix is a square, symmetric matrix in which all of the
elements are zero except the diagonal elements. A diagonal matrix
can be written as
where di are the diagonal elements.
- 5.
- Identity.
An identity matrix is a special diagonal matrix in which all of the
diagonal elements are equal to one. An identity matrix is usually
represented by
.
Because is it a diagonal matrix, an
identity matrix is also square and symmetric.
- 6.
- Null.
A null matrix is a matrix of any dimension in which all elements are
equal to zero, and is denoted by
.
- 7.
- Matrices With Only 1's.
A column vector of length n with every element equal to 1,
is usually denoted as
.
A matrix with r rows and c columns and
all elements equal to 1 is denoted as
.
- 8.
- Triangular.
Triangular matrices are square matrices with either all off-diagonal
elements above or below the diagonals equal to zero. An upper
triangular matrix has all elements below the diagonal equal to zero,
and a lower triangular matrix has all elements above the diagonals
equal to zero.
- 9.
- Tridiagonal.
A tridiagonal matrix is a square matrix with all elements equal to
zero except the diagonals and the elements immediately to the left
and right of the diagonal. An example is shown below.
Addition of Matrices
Matrices are conformable for addition if they have the same order.
The resulting sum is a matrix having the same number of rows and columns
as the two matrices to be added. Matrices that are not of the same order
cannot be added together. If
and
,
then
For example, let
then
Subtraction is the addition of two matrices, one of which has all elements
multiplied by a minus one (-1). That is,
Multiplication of Matrices
Two matrices are conformable for multiplication if the number of
columns in the first matrix equals the number of rows in the second matrix.
If
has order
and
has order
,
then the product
exists only if q=m. The
product matrix has order
.
In general,
does
not equal
,
and most often the product
may not even exist because
may not be conformable for
multiplication with
.
Thus, the ordering of matrices in a
product must be carefully and precisely written.
The computation of a product is defined as follows: let
and
and q=m, then
As an example, let
then
Transpose of a Product
The transpose of the product of two or more matrices is the product of
the transposes of each matrix in reverse order. That is, the transpose
of
,
for example, is
.
The reader should
verify that the result is conformable for multiplication, and
that the result has the correct dimensions.
Special Products
A matrix, say
,
is idempotent if the product of the matrix
with itself equals itself, i.e.
.
This implies that the matrix must be
square, but not necessarily symmetric.
A matrix is nilpotent if the product of the matrix with itself
equals a null matrix, i.e.
,
then
is
nilpotent.
A matrix is orthogonal if the product of the matrix with its
transpose equals an identity matrix, i.e.
,
which
also implies that
.
Some Rules on Multiplication
- 1.
- Multiplication of a matrix by a scalar results in multiplying
every element of the matrix by that scalar.
- 2.
-
.
- 3.
-
.
- 4.
-
.
Traces of Square Matrices
The trace is an operation that is applied to a square matrix and is
denoted by the letters tr. The trace operation is the sum of
the diagonal elements of a matrix. The sum is a scalar quantity. Let
then the trace is
The trace of the product of conformable matrices has a special property
known as the rotation rule of traces. That is,
The traces are equal, because they are scalars, even though the
dimensions of the three products might be greatly different.
Euclidean Norm
The Euclidean Norm is a matrix operation usually used to determine the
degree of difference between two matrices of the same order. The norm
is a scalar and is denoted as
.
For example, let
then
Other types of norms also exist.
Other Useful Matrix Operations
- 1.
- Direct Sum.
For matrices of any dimension, say
,
,
...
,
the direct sum is
- 2.
- Kronecker Product.
The Kronecker product, also known as the direct product, is where
every element of the first matrix is multiplied, as a scalar,
times the second matrix. Suppose that
is a matrix of order
and that
is of order
,
then the
direct product of
times
is
Notice that the dimension of the
example product is
.
- 3.
- Hadamard Product.
The Hadamard product exists for two matrices that are conformable for
addition. The corresponding elements of the two matrices are
multiplied together. The order of the resulting product is the same
as the matrices in the product. For two matrices,
and
of order
,
then the Hadamard product is
Elementary Operators
Elementary operator matrices are identity matrices that have been modified
by one of three methods. The product of an elementary operator matrix with
any other matrix does not alter the rank of that matrix.
Elementary operator matrices
reduce a matrix to either a triangular or a diagonal form.
The reduction process may involve many elementary
operator matrices multiplied together. To illustrate the three types of
elementary operator matrices and the process of reducing a matrix, begin
with an example matrix,
,
where
The first type of elementary operator matrix has one of the
diagonal elements of an identity matrix replaced with a constant other
than 1. Let
be an elementary operator where the
first diagonal element has been changed to .25. Pre-multiplication of
by this matrix gives
,
where
The second type of elementary operator matrix interchanges
rows or columns of a matrix. Let
be an elementary
operator where rows i and j are interchanged. Pre-multiplication of
by this matrix gives
where
The third type of elementary operator matrix combines
the elements of two rows or columns of
a matrix by addition or subtraction.
Let
be an elementary operator where
c times row i is added to row j. Pre-multiplication of
by this matrix gives
where
By continued pre-multiplication of
by other
elementary operators,
may be reduced to an upper
triangular matrix. Further post-multiplication by another set
of elementary operators can reduce
to a diagonal
matrix.
Matrix Inversion
By definition, an inverse of a square matrix
,
denoted by
,
is a matrix which when pre- or post- multiplied
times the original matrix yields an identity matrix. That is,
Any square matrix with a nonzero determinant can be inverted. Thus,
the first step to matrix inversion is the calculation of the
determinant of the matrix. The determinant is a scalar quantity.
For a
matrix, say
then the determinant is
For a
matrix, the determinant can be reduced to a
series of determinants of
matrices. For example, let
then
The general expression for the determinant of a matrix is
where
is of order n, and
is a
minor submatrix of
resulting from the deletion
of the ith row and jth column of
.
Any row
of
may be used to compute the determinant because the
result should be the same for each row. Columns may also be used
instead of rows. The reader may verify this assertion.
If the determinant is non-zero, then proceed to the next step which
is to find the matrix of signed minors, known as the adjoint
matrix. Using the same matrix,
above, the minors and
their determinants are as follows:
The adjoint matrix of signed minors,
is then
The inverse of
is then
If the determinant is zero, then the inverse is not defined or does
not exist. A square matrix with a non-zero determinant is said to
be nonsingular. Only nonsingular matrices have inverses.
Matrices with zero determinants are called singular and do
not have an inverse.
An important result is that the inverse of an inverse matrix is
equal to the original matrix. That is,
The inverse of the product of two or more nonsingular matrices follows
a rule similar to that for the transpose of a product of matrices.
Let
be three nonsingular matrices,
then
and
Generalized Inverses
Suppose we have a set of equations,
,
then a solution
for
is obtained by pre-multiplying both sides of the equation
by
provided that
is a nonsingular matrix.
Then
.
Often, however,
is a
singular matrix, i.e. the determinant is zero and the inverse does not
exist. Solutions to the set of equations can still be calculated using
a generalized inverse of
.
A generalized inverse,
denoted as
,
is a matrix that satisfies the following
expression
In general, neither
are
equal to
except in the case when
is nonsingular.
There are an infinite number of generalized inverses for any one
singular matrix. A unique generalized inverse is the Moore-Penrose
inverse which satisfies the following conditions:
- 1.
-

- 2.
-

- 3.
-
and
- 4.
-

Usually, a generalized inverse that satisfies only the first condition
is sufficient for practical purposes.
- Linear Independence.
A square matrix with a nonzero determinant is a matrix in which the rows
and columns of the matrix are linearly independent. If
is
the matrix, then linear independence means that no vector, say
,
exists such that
except for
.
A square matrix with a zero determinant implies that at least one non-null
vector,
,
exists such that
To compute a generalized inverse of a matrix that has a zero determinant
requires that we know the number of linearly independent rows and columns
in that matrix. The number of linearly independent rows or columns is
called the rank of a matrix and is a term that applies to
rectangular as well as square matrices.
- Rank.
Elementary operators are used to determine the rank of a matrix. Reduction
of the matrix to a diagonal matrix is called reduction to canonical
form, and the reduced matrix is called the canonical form under
equivalence. If the matrix is square and symmetric, then the reduction
of the matrix to a diagonal form with either ones or minus ones on the
diagonals is called congruent reduction and the resulting matrix
is the canonical form under congruence. Usually, only a reduction
to an upper triangular form is necessary to determine the rank of the
matrix.
Let
The rank of a matrix can not be greater than the minimum of
the number of rows or columns, whichever is smaller. In the example
above,
has 3 rows and 5 columns, and therefore the rank
of
can not be greater than 3. To find the exact rank,
elementary operator matrices are used. Let
be the
first elementary operator matrix that will zero out the first elements
in the second and third rows. Premultiplication of any matrix by
elementary operator matrices does not change the rank of that matrix.
Let
then
Now use
to subtract the third row from the second
row, so that
and
The rank is the number of non-zero diagonal elements of the reduced
matrix, i.e.
.
Elementary operators of order
would be postmultiplied
times
in order to reduce the matrix to a diagonal form,
and the rank would still be the number of non-zero diagonal elements.
- Definitions.
- Full-row rank
- If
has order
with rank
equal to m, then
has full row rank.
- Full-column rank
- A matrix with rank equal to the number of
columns has full-column rank.
- Full rank
- A square matrix with rank equal to the number of
rows or columns has full rank. A full rank matrix is
nonsingular, has a non-zero determinant, and has an inverse.
- 1.
- A null matrix has a rank of zero.
- 2.
- A J-matrix has a rank of one.
- 3.
- The rank of an idempotent matrix is equal to its trace.
- 4.
- If
has rank r and
has rank q, and
the two matrices are conformable for multiplication, then the
product,
,
has a maximum possible rank equal to
the lesser of r or q.
- Consequences.
The fact that a matrix has rank that is less than the number of rows
or columns in the matrix means that we can partition the matrix into
a square matrix of order equal to the rank and that has rank equal to
that matrix, and into other matrices of the remaining rows and columns.
The other rows and columns are linearly dependent upon the rows and
columns in that square matrix. That means that they can be formed from
the rows and columns that are linearly independent.
Let
be a matrix of order
with rank r, and
r is less than either p or q, then there are p-r rows of
and q-r columns which are not linearly independent.
Partition
as follows:
such that
has order
and rank of r.
Re-arrangement of rows and columns of
may be needed to
find an appropriate
.
has order
,
has order
,
and
has order
.
There exist matrices,
and
such that
and
and
To illustrate, let
and the rank of this matrix is 2. A
full rank submatrix
within
is the upper left
matrix. Let
then
where
where
and
This kind of partitioning of matrices with less than full rank is
always possible. In practice, we need only know that we can
perform this kind of partitioning, but it is usually not necessary
to derive
and
explicitly.
- Calculations.
For a matrix,
,
of less than full rank, there is an infinite
number of possible generalized inverses, all of which would
satisfy
.
However, only one
generalized inverse needs to be computed in practice. A method to
derive one particular type of generalized inverse has the following
steps:
- 1.
- Determine the rank of
.
- 2.
- Obtain
,
a square, full rank submatrix of
with rank equal to the rank of
.
- 3.
- Partition
as
- 4.
- Compute the generalized inverse as
If
has order
,
then
must have order
.
To prove that
is a generalized inverse
of
,
then multiply out the expression
From the previous section,
so that
- Solutions to Equations.
Because there are an infinite number of generalized inverses to a
matrix that has less than full rank, then it logically follows that
for a system of consistent equations,
,
where
the solutions are computed as
,
then there would also be an infinite number of solution vectors for
.
Having computed only one generalized inverse, however, it
is possible to compute many different solution vectors.
From Searle(1971 page 9, Theorem 2), if
has q columns and if
is one generalized
inverse of
,
then the consistent equations
have solution
where
is any arbitrary vector of length q.
The number of linearly independent solution vectors, however, is
(q-r+1).
It is also possible to generate other generalized inverses of the same
matrix. If
is a generalized inverse of
then so
is
for any
and
.
Pre- and post- multiplication of
by
shows that this is so.
- Special Generalized Inverses.
In linear models analyses, the product
occurs
frequently where
is a matrix that usually has more
rows than it has columns and has rank less than the number of
columns, and
is a square matrix that is usually
diagonal. Generalized inverses of this product matrix have special
features that are used in the derivation and proof of various
statistical procedures. Let
be a matrix of order
with rank r. The product,
has order
and is symmetric with rank r. Let
represent any
generalized inverse of the product matrix, then the following results
are true.
- 1.
is not necessarily a symmetric matrix, in which case
is also a generalized inverse of the product matrix.
- 2.
-
or

- 3.
-
is a generalized inverse of
.
- 4.
-
is always symmetric and unique for all
generalized inverses of the product matrix,
.
- 5.
- If
=
for some
,
then
.
The proofs of the above results can be found in Searle (1971). These
results are very important.
Eigenvalues and Eigenvectors
There are a number of square, and sometimes symmetric, matrices involved in
statistical procedures that must be positive definite. Suppose that
is any square matrix then
is positive definite if
is always greater than
zero for all vectors,
.
is positive semi-definite if
is greater
than or equal to zero for all vectors
,
and for at least
one vector
,
then
.
is non-negative definite if
is either
positive definite or positive semi-definite.
To determine the appropriate condition of the matrix
,
then one
must calculate the eigenvalues (or latent roots or characteristic roots)
of the matrix. The eigenvalues are useful in that
- The product of the eigenvalues equals the determinant of the matrix.
- The number of non-zero eigenvalues equals the rank of the matrix.
- If all the eigenvalues are greater than zero, then the matrix is
positive definite.
- If all the eigenvalues are greater than or equal to zero and one or
more are equal to zero, then the matrix is positive semi-definite.
If
is a square, symmetric matrix, then it can be represented as
where
is a diagonal matrix, the canonical form of
,
containing the eigenvalues of
,
and
is an orthogonal
matrix of the corresponding eigenvectors. Recall that for a matrix to
be orthogonal then
and
.
The eigenvalues and eigenvectors are found by solving
and
where d is one of the eigenvalues of
and
is the
corresponding eigenvector. There are numerous computer routines for
calculating
and
.
Differentiation
The differentiation of mathematical expressions involving matrices
follows similar rules as for those involving scalars. Some of the
basic results are shown below.
Let
With scalars we have
but with vectors we have
The general rule is
Another function might be
c = 9x12 + 6x1x2 + 4x22
or
With scalars we would have
then in matrix form,
If
was not a symmetric matrix, then
Cholesky Decomposition
In simulation studies or applications of Gibb's sampling there is
frequently a need to factor a symmetric positive definite matrix into
the product of a matrix times its transpose. The Cholesky decomposition
of a matrix, say
,
is a lower triangular matrix such that
and
is lower triangular. Suppose that
The problem is to derive
such that
These equations give
t11 = 3, then
t21 = 3/t11 = 1, and
t31 = -6/t11 = -2. From the fourth equation, t22 is the
square root of
(5 - t212) or
t22 = 2. The fifth equation
says that
(1)(-2) + (2)t32 = 0 or
t32 = 1. The last equation
says that
t332 is
21 - (-2)2 - (1)2 = 16. The
end result is
The derivation of the Cholesky decomposition is easily programmed for
a computer. Note that in calculating the diagonals of
the
square root of a number is needed, and consequently this number must
always be positive. Hence, if the matrix is positive definite, then
all necessary square roots will be of positive numbers. However, the
opposite is not true. That is, if all of the square roots are of
positive numbers, the matrix is not necessarily guaranteed to be
positive definite. The only way to guarantee positive definiteness is
to calculate the eigenvalues of a matrix and to see that they are all
positive.
Inverse of a Lower Triangular Matrix
The inverse of a lower triangular matrix is also a lower triangular
matrix, and can be easily derived. The diagonals of the inverse
are the inverse of the diagonals of the original matrix. Using the
matrix
from the previous section, then
where
These equations give
Likewise the determinant of a triangular matrix is the product of all
of the diagonal elements. Hence all diagonal elements need to be
non-zero for the inverse to exist.
The natural logarithm of the determinant of a triangular matrix is
the summation of the natural logarithms of the individual diagonal
elements. This property is useful in derivative free restricted
maximum likelihood.
This LaTeX document is available as postscript or asAdobe PDF.
Larry Schaeffer
1999-02-26