Title: A Little Necessary Matrix Algebra for Doctoral Studies in Business Administration
1. A Little Necessary Matrix Algebra for Doctoral Studies in Business Administration
- James J. Cochran
- Department of Computer Information Systems and Analysis, Louisiana Tech University
- Jcochran_at_cab.latech.edu
2. Matrix Algebra
- Matrix algebra is a means of efficiently expressing large numbers of calculations to be made upon ordered sets of numbers
- Often referred to as Linear Algebra
3. Why Use It?
- Matrix algebra is used primarily to facilitate mathematical expression.
- Many equations would be completely intractable if scalar mathematics had to be used. It is also important to note that the scalar algebra is still under there somewhere.
4. Definitions - Scalars
- Scalar: a single value (i.e., a number)
5. Definitions - Vectors
- Vector: a single row or column of numbers
- Each individual entry is called an element
- Vectors are denoted with bold small letters
- A vector may be a row vector or a column vector
6. Definitions - Matrices
- A matrix is a rectangular array of numbers (called elements) arranged in orderly rows and columns
- Subscripts denote the row (i = 1, ..., n) and column (j = 1, ..., m) location of an element
7. Definitions - Matrices
- Matrices are denoted with bold capital letters
- All matrices (and vectors) have an order or dimension - that is, the number of rows × the number of columns. Thus a matrix A with two rows and three columns is referred to as a two by three matrix.
- Often a matrix A of dimension n × m is denoted A(n×m)
- Often a vector a of dimension n (or m) is denoted a(n) (or a(m))
8. Definitions - Matrices
- Null matrix: a matrix for which all elements are zero, i.e., aij = 0 ∀ i, j
- Square matrix: a matrix for which the number of rows equals the number of columns (n = m)
- Symmetric matrix: a matrix for which aij = aji ∀ i, j
9. Definitions - Matrices
- Diagonal elements: elements of a square matrix for which the row and column locations are equal, i.e., aij with i = j
- Upper triangular matrix: a matrix for which all elements below the diagonal are zero, i.e., aij = 0 ∀ i > j
- Lower triangular matrix: a matrix for which all elements above the diagonal are zero, i.e., aij = 0 ∀ i < j
10. Matrix Equality
- Two matrices are equal iff (if and only if) all of their elements are identical
- Note that statistical data sets are matrices (usually with observations in the rows and variables in the columns)
11. Basic Matrix Operations
- Transpositions
- Sums and Differences
- Products
- Inversions
12. The Transpose of a Matrix
- The transpose A′ of a matrix A is the matrix such that the ith row of A′ is the ith column of A, i.e., B is the transpose of A iff bij = aji ∀ i, j
- This is equivalent to reflecting the matrix across its main diagonal (the upper left and lower right corners stay fixed)
13. Transpose of a Matrix - An Example
then
i.e.,
14. More on the Transpose of a Matrix
- (A′)′ = A (think about it!)
- If A = A′, then A is symmetric
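The deck carries no code, but the transpose rules above are easy to check numerically. A minimal sketch (assuming NumPy is available; the matrix is made up for illustration):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # a 2 x 3 matrix

B = A.T                      # transpose: b_ij = a_ji
print(B.shape)               # (3, 2)

print(np.array_equal(A.T.T, A))   # (A')' = A

S = A @ A.T                  # A A' is always symmetric
print(np.array_equal(S, S.T))
```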
15. Sums and Differences of Matrices
- Two matrices may be added (subtracted) iff they are of the same order
- Simply add (subtract) elements from corresponding locations
where
16. Sums and Differences - An Example
then we can calculate C = A + B by
17. Sums and Differences - An Example
then we can calculate C = A - B by
18. Some Properties of Matrix Addition/Subtraction
- Note that
- The transpose of a sum = sum of the transposes, i.e., (A + B + C)′ = A′ + B′ + C′
- A + B = B + A (i.e., matrix addition is commutative)
- Matrix addition can be extended beyond two matrices
- Matrix addition is associative, i.e., A + (B + C) = (A + B) + C
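The addition properties above can be verified directly; a quick sketch with made-up 2 × 2 matrices (NumPy assumed):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A + B                                    # elementwise sum
print(np.array_equal(A + B, B + A))          # commutative
print(np.array_equal((A + B).T, A.T + B.T))  # transpose of a sum
```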
19. Products of Scalars and Matrices
- To multiply a scalar times a matrix, simply multiply each element of the matrix by the scalar quantity
20. Products of Scalars and Matrices - An Example
then we can calculate bA by
- Note that bA = Ab if b is a scalar
21. Some Properties of Scalar × Matrix Multiplication
- Note that
- If b is a scalar, then bA = Ab (i.e., scalar × matrix multiplication is commutative)
- Scalar × matrix multiplication can be extended beyond two scalars
- Scalar × matrix multiplication is associative, i.e., ab(C) = a(bC)
- Scalar × matrix multiplication leads to removal of a common factor, i.e., if
22. Products of Matrices
- We write the multiplication of two matrices A and
B as AB - This is referred to either as
- pre-multiplying B by A
- or
- post-multiplying A by B
- So for matrix multiplication AB, A is referred to
as the premultiplier and B is referred to as the
postmultiplier
23. Products of Matrices
- In order to multiply matrices, they must be conformable (the number of columns in the premultiplier must equal the number of rows in the postmultiplier)
- Note that
- an (m × n) × (n × p) = (m × p)
- an (m × n) × (p × n) cannot be done
- a (1 × n) × (n × 1) = a scalar (1 × 1)
24. Products of Matrices
- If we have A(3×2) and B(2×3) then
where
25. Products of Matrices
- If we have A(3×2) and B(2×3) then
i.e., matrix multiplication is not commutative (why?)
26. Matrix Multiplication - An Example
then
where
27. Some Properties of Matrix Multiplication
- Note that
- Even if conformable, AB does not necessarily equal BA (i.e., matrix multiplication is not commutative)
- Matrix multiplication can be extended beyond two matrices
- Matrix multiplication is associative, i.e., A(BC) = (AB)C
28. Some Properties of Matrix Multiplication
- Also note that
- The transpose of a product is equal to the product of the transposes in reverse order, i.e., (ABC)′ = C′B′A′
- If AA = A, then A is idempotent (and A′ is then idempotent as well, since A′A′ = (AA)′ = A′)
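The multiplication properties above can be demonstrated numerically; a sketch with made-up 2 × 2 matrices (NumPy assumed):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 3]])

print(np.array_equal(A @ B, B @ A))                    # False: not commutative
print(np.array_equal(A @ (B @ C), (A @ B) @ C))        # True: associative
print(np.array_equal((A @ B @ C).T, C.T @ B.T @ A.T))  # True: reversed transposes
```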
29. Special Uses for Matrix Multiplication
- Sum Row Elements of a Matrix
- Premultiply a matrix A by a conformable row vector of 1s. If
then premultiplication by
will yield the column totals for A, i.e.
30. Special Uses for Matrix Multiplication
- Sum Column Elements of a Matrix
- Postmultiply a matrix A by a conformable column vector of 1s. If
then postmultiplication by
will yield the row totals for A, i.e.
31. Special Uses for Matrix Multiplication
- The Dot (or Inner) Product of Two Vectors
- Premultiplication of a column vector a by a conformable row vector b′ yields a single value called the dot product or inner product
- If
then b′a gives us
which is the sum of products of elements in similar positions for the two vectors
32. Special Uses for Matrix Multiplication
- The Outer Product of Two Vectors
- Postmultiplication of a column vector a by a conformable row vector b′ yields a matrix containing the products of each pair of elements from the two vectors (called the outer product)
- If
then ab′ gives us
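Both products above can be sketched in a few lines (NumPy assumed; the vectors are made up):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

inner = a @ b            # dot product: 1*4 + 2*5 + 3*6
print(inner)             # 32

outer = np.outer(a, b)   # 3 x 3 matrix of all pairwise products
print(outer.shape)       # (3, 3)
```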
33. Special Uses for Matrix Multiplication
- Sum the Squared Elements of a Vector
- Premultiply a column vector a by its transpose. If
- then premultiplication by the row vector a′
- will yield the sum of the squared values of the elements of a, i.e.
34. Special Uses for Matrix Multiplication
- Postmultiply a row vector a′ by its transpose. If
- then postmultiplication by the column vector a
- will yield the sum of the squared values of the elements of a, i.e.
35. Special Uses for Matrix Multiplication
- Determining if Two Vectors are Orthogonal: two conformable vectors a and b are orthogonal iff a′b = 0
- Example: Suppose we have
- then
36. Special Uses for Matrix Multiplication
- Representing Systems of Simultaneous Equations: suppose we have the following system of simultaneous equations
- px1 + qx2 + rx3 = M
- dx1 + ex2 + fx3 = N
- If we let
then we can represent the system (in matrix notation) as Ax = b (why?)
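The Ax = b representation can be sketched numerically. Here the coefficient values are hypothetical stand-ins for p, q, r, d, e, f (NumPy assumed):

```python
import numpy as np

A = np.array([[2.0, 1.0, 3.0],    # p, q, r  (hypothetical values)
              [1.0, 4.0, 2.0]])   # d, e, f  (hypothetical values)
x = np.array([1.0, 2.0, 3.0])

b = A @ x        # b = (M, N)': each row of A dotted with x
print(b)         # [13. 15.]
```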
37. Special Uses for Matrix Multiplication
- Linear Independence: any subset of columns (or rows) of a matrix A is said to be linearly independent if no column (row) in the subset can be expressed as a linear combination of the other columns (rows) in the subset.
- If such a combination exists, then the columns (rows) are said to be linearly dependent.
38. Special Uses for Matrix Multiplication
- The Rank of a matrix is defined to be the number of linearly independent columns (or rows) of the matrix.
- Nonsingular (Full Rank) Matrix: any matrix that has no linear dependencies among its columns (rows). For a square matrix A this implies that Ax = 0 iff x = 0.
- Singular (Not of Full Rank) Matrix: any matrix that has at least one linear dependency among its columns (rows).
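Rank and singularity can be checked numerically. A sketch with a made-up matrix whose third column is three times its first, mirroring the example that follows (NumPy assumed):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 5.0, 6.0],
              [3.0, 1.0, 9.0]])   # column 3 = 3 * column 1

# rank < 3 means A is singular (not of full rank)
print(np.linalg.matrix_rank(A))   # 2
```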
39. Special Uses for Matrix Multiplication
- Example - The following matrix A
is singular (not of full rank) because the third column is equal to three times the first column. This result implies there is no unique solution to the system of equations Ax = 0 (why?).
40. Special Uses for Matrix Multiplication
- Example - The following matrix A
is singular (not of full rank) because the third column is equal to the first column plus two times the second column. Note that the number of linearly independent rows in a matrix will always equal the number of linearly independent columns in the matrix.
41. Geometry of Vectors
- Vectors have geometric properties of length and direction; for a vector x = (x1, x2)′ we have length Lx = √(x1² + x2²) (why?)
[Figure: the vector x plotted in the plane, with components x1 and x2 on axes 1 and 2]
42. Geometry of Vectors
- Recall the Pythagorean Theorem: in any right triangle, the lengths of the hypotenuse c and the other two sides a and b are related by the simple formula c² = a² + b²
[Figure: a right triangle with sides a and b and its hypotenuse]
43. Geometry of Vectors
- Vector addition: for the vectors x and y we have the sum x + y
[Figure: vectors x and y of lengths Lx and Ly, with the angle θ between them]
44. Geometry of Vectors
- Scalar multiplication changes only the vector length: for the vector x we have
[Figure: vector x of length Lx and a scalar multiple of x along the same direction]
45. Geometry of Vectors
- Pairs of vectors have angles between them: for the vectors x and y we have
[Figure: vectors x and y with the angle θ between them]
46. A Little Trigonometry Review
[Figure: the unit circle, showing how cos θ varies with the angle θ: cos θ = 1 at θ = 0°, 0 < cos θ < 1 in the first quadrant, cos θ = 0 at 90°, -1 < cos θ < 0 in the second quadrant, and cos θ = -1 at 180°]
47. A Little Trigonometry Review
Suppose we rotate x and y so x lies on axis 1
[Figure: vectors x and y with the angle θxy measured from axis 1]
48. A Little Trigonometry Review
What does this imply about rxy?
[Figure: the same unit-circle diagram, with cos θxy ranging from 1 (θxy = 0°) through 0 (θxy = 90°) to -1 (θxy = 180°)]
49. Geometry of Vectors
What is the correlation between the vectors x and y?
[Figure: x and y plotted in the column space]
50. Geometry of Vectors
Rotating so x lies on axis 1 makes it easier to see
[Figure: θxy = 180°, so cos θxy = -1]
51. Geometry of Vectors
What is the correlation between the vectors x and y?
52. Geometry of Vectors
Of course, we can see this by plotting these values in the x, y (row) space
[Figure: the points (0.6, -0.3) and (1.0, -0.5) plotted with X on the horizontal axis and Y on the vertical axis]
53. Geometry of Vectors
What is the correlation between the vectors x and y?
54. Geometry of Vectors
Plotting in the column space gives us
[Figure: x and y plotted with θxy = 90°]
55. Geometry of Vectors
Rotating so x lies on axis 1 makes it easier to see
[Figure: θxy = 90°, so cos θxy = 0]
56. Geometry of Vectors
- The space of all real m-tuples, with scalar multiplication and vector addition as we have defined them, is called a vector space.
- The vector
- is a linear combination of the vectors x1, x2, ..., xk.
- The set of all linear combinations of the vectors x1, x2, ..., xk is called their linear span.
57. Geometry of Vectors
Here is the column space plot for some vectors x1 and x2
[Figure: x1 and x2 plotted on axes 1, 2, and 3]
58. Geometry of Vectors
Here is the linear span for some vectors x1 and x2
[Figure: the plane through x1 and x2 plotted on axes 1, 2, and 3]
59. Geometry of Vectors
- A set of vectors x1, x2, ..., xk is said to be linearly dependent if there exist k numbers a1, a2, ..., ak, at least one of which is nonzero, such that a1x1 + a2x2 + ... + akxk = 0
- Otherwise the set of vectors x1, x2, ..., xk is said to be linearly independent
60. Geometry of Vectors
- Are the vectors
- linearly independent?
- Take a1 = 0.5 and a2 = 1.0. Then we have
- The vectors x and y are dependent.
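A dependence of this form can be verified numerically. The vectors below are hypothetical, chosen so that 0.5·x + 1.0·y = 0 as in the slide (NumPy assumed):

```python
import numpy as np

# hypothetical vectors satisfying 0.5*x + 1.0*y = 0
x = np.array([2.0, -4.0])
y = np.array([-1.0, 2.0])

print(np.allclose(0.5 * x + 1.0 * y, 0))               # True -> dependent
print(np.linalg.matrix_rank(np.column_stack([x, y])))  # 1
```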
61. Geometry of Vectors
Geometrically x and y look like this
[Figure: x and y plotted on axes 1 and 2]
62. Geometry of Vectors
Rotating so x lies on axis 1 makes it easier to see
[Figure: θxy = 180°]
63. Geometry of Vectors
- Are the vectors
- linearly independent?
- There are no real values a1, a2 such that
- so the vectors x and y are independent.
64. Geometry of Vectors
Geometrically x and y look like this
[Figure: x and y plotted with θxy = 90°]
65. Geometry of Vectors
Rotating so x lies on axis 1 makes it easier to see
[Figure: θxy = 90°]
66. Geometry of Vectors
- Here x and y are called perpendicular (or orthogonal); this is written x ⊥ y.
- Some properties of orthogonal vectors:
- x′y = 0 ⟺ x ⊥ y
- z is perpendicular to every vector iff z = 0
- If z is perpendicular to each vector x1, x2, ..., xk, then z is perpendicular to their linear span.
67. Geometry of Vectors
Here vectors x1 and x2 (plotted in the column space) are orthogonal
[Figure: x1 and x2 plotted on axes 1, 2, and 3]
68. Geometry of Vectors
Recall that the linear span for vectors x1 and x2 is
[Figure: the plane spanned by x1 and x2]
69. Geometry of Vectors
Vector z looks like this
[Figure: z plotted on axes 1, 2, and 3]
70. Geometry of Vectors
The vector z is perpendicular to the linear span for vectors x1 and x2
Check each of the dot products!
71. Geometry of Vectors
Here vectors x1, x2, and z from our previous problem are orthogonal
[Figure: x1 and z are perpendicular]
72. Geometry of Vectors
Here vectors x1, x2, and z from our previous problem are orthogonal
[Figure: x2 and z are perpendicular]
73. Geometry of Vectors
Here vectors x1, x2, and z from our previous problem are orthogonal
[Figure: x1 and x2 are perpendicular]
Note we could rotate x1, x2, and z until they lay on our three axes!
74. Geometry of Vectors
The projection (or shadow) of a vector x on a vector y is given by (x′y / y′y)y = (x′y / Ly²)y
If y has unit length (i.e., Ly = 1), the projection (or shadow) of a vector x on a vector y simplifies to (x′y)y
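The projection formula above is easy to sketch numerically (NumPy assumed; the vectors are made up):

```python
import numpy as np

x = np.array([2.0, 3.0])
y = np.array([4.0, 0.0])

proj = (x @ y) / (y @ y) * y   # (x'y / y'y) y
print(proj)                    # [2. 0.]

# the residual x - proj is perpendicular to y
print(np.isclose((x - proj) @ y, 0.0))   # True
```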
75. Geometry of Vectors
For the vectors, the projection (or shadow) of x on y is
76. Geometry of Vectors
Geometrically the projection of x on y looks like this
[Figure: x decomposed into its projection on y and a component perpendicular to y]
77. Geometry of Vectors
Rotating so y lies on axis 1 makes it easier to see
[Figure: the same projection after rotation]
78. Geometry of Vectors
Note that we write the length of the projection of x on y like this
For our previous example, the length of the projection of x on y is
79. The Gram-Schmidt (Orthogonalization) Process
For linearly independent vectors x1, x2, ..., xk, there exist mutually perpendicular vectors u1, u2, ..., uk with the same linear span. These may be constructed by setting
80. The Gram-Schmidt (Orthogonalization) Process
We can normalize (convert to vectors z of unit length) the vectors u by setting
Finally, note that we can project a vector xk onto the linear span of the vectors x1, x2, ..., xk-1
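The construction described above (subtract from each xj its projections on the earlier u's) can be sketched as a small function; the input vectors here are made up (NumPy assumed):

```python
import numpy as np

def gram_schmidt(X):
    """Turn the columns of X into mutually perpendicular columns with the same span."""
    U = np.zeros_like(X, dtype=float)
    for j in range(X.shape[1]):
        u = X[:, j].astype(float)
        for i in range(j):  # subtract the projection of x_j on each earlier u_i
            u -= (X[:, j] @ U[:, i]) / (U[:, i] @ U[:, i]) * U[:, i]
        U[:, j] = u
    return U

X = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])       # columns are x1 and x2
U = gram_schmidt(X)
print(np.isclose(U[:, 0] @ U[:, 1], 0.0))   # True: u1 perpendicular to u2
```

Normalizing each column of U to unit length then gives the z vectors of slide 80.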
81. The Gram-Schmidt (Orthogonalization) Process
Here are vectors x1, x2, and z from our previous problem
[Figure: x1, x2, and z plotted on axes 1, 2, and 3]
82. The Gram-Schmidt (Orthogonalization) Process
Let's construct mutually perpendicular vectors u1, u2, u3 with the same linear span - we'll arbitrarily select the first vector as u1
83. The Gram-Schmidt (Orthogonalization) Process
Now we construct a vector u2 perpendicular to vector u1 (and in the linear span of x1, x2, z)
84. The Gram-Schmidt (Orthogonalization) Process
Finally, we construct a vector u3 perpendicular to vectors u1 and u2 (and in the linear span of x1, x2, z)
85. The Gram-Schmidt (Orthogonalization) Process
86. The Gram-Schmidt (Orthogonalization) Process
Here are our orthogonal vectors u1, u2, and u3
[Figure: u1, u2, and u3 plotted on axes 1, 2, and 3]
87. The Gram-Schmidt (Orthogonalization) Process
If we normalize our vectors u1, u2, and u3, we get
88. The Gram-Schmidt (Orthogonalization) Process
and
89. The Gram-Schmidt (Orthogonalization) Process
and
90. The Gram-Schmidt (Orthogonalization) Process
The normalized vectors z1, z2, and z3 look like this
[Figure: z1, z2, and z3 plotted on axes 1, 2, and 3]
91. Special Matrices
- There are a number of special matrices. These include:
- Diagonal Matrices
- Identity Matrices
- Null Matrices
- Commutative Matrices
- Anti-Commutative Matrices
- Periodic Matrices
- Idempotent Matrices
- Nilpotent Matrices
- Orthogonal Matrices
92. Diagonal Matrices
- A diagonal matrix is a square matrix that has values on the diagonal with all off-diagonal entries being zero.
93. Identity Matrices
- An identity matrix is a diagonal matrix whose diagonal elements all equal 1
- When used as a premultiplier or postmultiplier of any conformable matrix A, the identity matrix will return the original matrix A, i.e., IA = AI = A
- Why?
94. Null Matrices
- A square matrix whose elements all equal 0
- Usually arises as the difference between two equal square matrices, i.e., A - B = 0 ⟺ A = B
95. Commutative Matrices
- Any two square matrices A and B such that AB = BA are said to commute.
- Note that it is easy to show that any square matrix A commutes with both itself and with a conformable identity matrix I.
96. Anti-Commutative Matrices
- Any two square matrices A and B such that AB = -BA are said to anti-commute.
97. Periodic Matrices
- Any matrix A such that A^(k+1) = A is said to be of period k.
- Of course any matrix A that satisfies AA = A is of period k for any integer value of k (why?).
98. Idempotent Matrices
- Any matrix A such that A² = A is said to be idempotent.
- Thus an idempotent matrix is of period k for any positive integer value of k.
99. Nilpotent Matrices
- Any matrix A such that A^p = 0, where p is a positive integer, is said to be nilpotent.
- Note that if p is the least positive integer such that A^p = 0, then A is said to be nilpotent of index p.
100. Orthogonal Matrices
- Any square matrix A whose rows (considered as vectors) are mutually perpendicular and have unit lengths, i.e., AA′ = I
- Note that A is orthogonal iff A^-1 = A′.
101. The Determinant of a Matrix
- The determinant of a matrix A is commonly denoted by |A| or det A.
- Determinants exist only for square matrices.
- They are a matrix characteristic (that can be somewhat tedious to compute).
102. The Determinant for a 2x2 Matrix
- If we have a matrix A such that
- then |A| = a11a22 - a12a21
- For example, the determinant of
- is
- Determinants for 2x2 matrices are easy!
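The 2x2 rule |A| = a11a22 - a12a21 can be checked against a library routine; a quick sketch with a made-up matrix (NumPy assumed):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [4.0, 2.0]])

det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # a11*a22 - a12*a21
print(det)                                    # 2.0
print(np.isclose(np.linalg.det(A), det))      # True
```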
103. The Determinant for a 3x3 Matrix
- If we have a matrix A such that
- then the determinant is
- which can be expanded and rewritten as
(Why?)
104. The Determinant for a 3x3 Matrix
- If we rewrite the determinants for each of the 2x2 submatrices in
as
by substitution we have
105. The Determinant for a 3x3 Matrix
- Note that if we have a matrix A such that
- then |A| can also be written as
- or
- or
106. The Determinant for a 3x3 Matrix
- To do so, first create a matrix of the same dimensions as A consisting only of alternating signs (+, -, +, ...)
107. The Determinant for a 3x3 Matrix
- Then expand on any row or column (i.e., multiply each element in the selected row/column by the corresponding sign, then multiply each of these results by the determinant of the submatrix that results from elimination of the row and column to which the element belongs)
- For example, let's expand on the second column
108. The Determinant for a 3x3 Matrix
- The three elements on which our expansion is based will be a12, a22, and a32. The corresponding signs are -, +, -.
109. The Determinant for a 3x3 Matrix
- So for the first term of our expansion we will multiply -a12 by the determinant of the matrix formed when row 1 and column 2 are eliminated from A (called the minor and often denoted |A_rc|, where r and c are the deleted row and column),
which gives us
This product is called a cofactor.
110. The Determinant for a 3x3 Matrix
- For the second term of our expansion we will multiply a22 by the determinant of the matrix formed when row 2 and column 2 are eliminated from A,
which gives us
111. The Determinant for a 3x3 Matrix
- Finally, for the third term of our expansion we will multiply -a32 by the determinant of the matrix formed when row 3 and column 2 are eliminated from A,
which gives us
112. The Determinant for a 3x3 Matrix
- Putting this all together yields
So there are six distinct ways (one for each row and each column) to calculate the determinant of a 3x3 matrix! These can be expressed as
Note that this is referred to as the method of cofactors, and it can be used to find the determinant of any square matrix.
113. The Determinant for a 3x3 Matrix - An Example
- Suppose we have the following matrix A
Using row 1 (i.e., i = 1), the determinant is
Note that this is the same result we would achieve using any other row or column!
114. Some Properties of Determinants
- Determinants have several mathematical properties useful in matrix manipulations:
- |A| = |A′|
- If each element of a row (or column) of A is 0, then |A| = 0
- If every value in a row is multiplied by k, then |A| becomes k|A|
- If two rows (or columns) are interchanged, the sign, but not the value, of |A| changes
- If two rows (or columns) of A are identical, |A| = 0
115. Some Properties of Determinants
- |A| remains unchanged if each element of a row is multiplied by a constant and added to any other row
- If A is nonsingular, then |A| = 1/|A^-1|, i.e., |A||A^-1| = 1
- |AB| = |A||B| (i.e., the determinant of a product = product of the determinants)
- For any scalar c, |cA| = c^k|A|, where k is the order of A
- The determinant of a diagonal matrix is simply the product of the diagonal elements
116. Why are Determinants Important?
- Consider the small system of equations
- a11x1 + a12x2 = b1
- a21x1 + a22x2 = b2
- which can be represented by
- Ax = b
- where
117. Why are Determinants Important?
- If we were to solve this system of equations simultaneously for x2 we would have
- a21(a11x1 + a12x2 = b1)
- -a11(a21x1 + a22x2 = b2)
- which yields (through cancellation and rearranging)
- a21a11x1 + a21a12x2 - a11a21x1 - a11a22x2 = a21b1 - a11b2
118. Why are Determinants Important?
- or (a11a22 - a21a12)x2 = a11b2 - a21b1
- which implies
Notice that the denominator is |A| = a11a22 - a21a12.
Thus iff |A| = 0 there is either i) no unique solution or ii) no existing solution to the system of equations Ax = b!
119. Why are Determinants Important?
- This result holds true
- if we solve the system for x1 as well, or
- for a square matrix A of any order.
- Thus we can use determinants in conjunction with the A matrix (the coefficient matrix in a system of simultaneous equations) to see if the system has a unique solution.
120. Traces of Matrices
- The trace of a square matrix A is the sum of the diagonal elements
- Denoted tr(A)
- We have tr(A) = Σi aii
For example, the trace of
is
121. Some Properties of Traces
- Traces have several mathematical properties useful in matrix manipulations:
- For any scalar c, tr(cA) = c·tr(A)
- tr(A ± B) = tr(A) ± tr(B)
- tr(AB) = tr(BA)
- tr(B^-1AB) = tr(A)
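The trace properties above can be sketched numerically with made-up matrices (NumPy assumed):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [2.0, 5.0]])

print(np.trace(A))                                    # 5.0: sum of the diagonal
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True: tr(AB) = tr(BA)
print(np.isclose(np.trace(3 * A), 3 * np.trace(A)))   # True: tr(cA) = c tr(A)
```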
122. The Inverse of a Matrix
- The inverse of a matrix A is commonly denoted by A^-1 or inv A.
- The inverse of an n x n matrix A is the matrix A^-1 such that AA^-1 = I = A^-1A
- The matrix inverse is analogous to a scalar reciprocal
- A matrix which has an inverse is called nonsingular
123. The Inverse of a Matrix
- For some n x n matrix A, an inverse matrix A^-1 may not exist.
- A matrix which does not have an inverse is singular.
- An inverse of an n x n matrix A exists iff |A| ≠ 0
124. Inverse by Simultaneous Equations
- Pre- or postmultiply your square matrix A by a dummy matrix of the same dimensions, i.e.,
- Set the result equal to an identity matrix of the same dimensions as your square matrix A, i.e.,
125. Inverse by Simultaneous Equations
- Recognize that the resulting expression implies a set of n² simultaneous equations that must be satisfied if A^-1 exists:
- a11(a) + a12(d) + a13(g) = 1, a11(b) + a12(e) + a13(h) = 0, a11(c) + a12(f) + a13(i) = 0
- a21(a) + a22(d) + a23(g) = 0, a21(b) + a22(e) + a23(h) = 1, a21(c) + a22(f) + a23(i) = 0
- a31(a) + a32(d) + a33(g) = 0, a31(b) + a32(e) + a33(h) = 0, a31(c) + a32(f) + a33(i) = 1
Solving this set of n² equations simultaneously yields A^-1.
126. Inverse by Simultaneous Equations - An Example
Then the postmultiplied matrix would be
We now set this equal to a 3x3 identity matrix
127. Inverse by Simultaneous Equations - An Example
- Recognize that the resulting expression implies the following n² simultaneous equations:
- 1a + 2d + 3g = 1, 1b + 2e + 3h = 0, 1c + 2f + 3i = 0
- 2a + 5d + 4g = 0, 2b + 5e + 4h = 1, 2c + 5f + 4i = 0
- 1a - 3d - 2g = 0, 1b - 3e - 2h = 0, 1c - 3f - 2i = 1
This system can be satisfied iff A^-1 exists.
128. Inverse by Simultaneous Equations - An Example
Solving the set of n² equations simultaneously yields
a = -2/15, b = 1/3, c = 7/15,
d = -8/15, e = 1/3, f = -2/15,
g = 11/15, h = -1/3, i = -1/15,
so we have that A^-1 is
129. Inverse by Simultaneous Equations - An Example
ALWAYS check your answer. How? Use the fact that AA^-1 = A^-1A = I and do a little matrix multiplication!
So we have found A^-1!
130. Inverse by the Gauss-Jordan Algorithm
- Augment your matrix A with an identity matrix of the same dimensions, i.e., A|I
Now we use valid row operations to convert A to I (and so A|I to I|A^-1)
131. Inverse by the Gauss-Jordan Algorithm
- Valid row operations on A|I:
- You may interchange rows
- You may multiply a row by a scalar
- You may replace a row with the sum of that row and another row multiplied by a scalar (which is often negative)
- Every operation performed on A must be performed on I
- Use valid row operations on A|I to convert A to I (and so A|I to I|A^-1)
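Either approach can be checked against a library inverse. A sketch using the 3x3 matrix implied by the simultaneous-equations example above (NumPy assumed):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 5.0, 4.0],
              [1.0, -3.0, -2.0]])

A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(3)))   # True: A A^-1 = I
print(np.allclose(A_inv @ A, np.eye(3)))   # True: A^-1 A = I
```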
132. Inverse by the Gauss-Jordan Algorithm - An Example
Then the augmented matrix A|I is
We now wish to use valid row operations to convert the A side of this augmented matrix to I
133. Inverse by the Gauss-Jordan Algorithm - An Example
- Step 1: Subtract 2 × Row 1 from Row 2
and substitute the result for Row 2 in A|I
134. Inverse by the Gauss-Jordan Algorithm - An Example
- Step 2: Subtract Row 3 from Row 1
Divide the result by 5 and substitute for Row 3 in the matrix derived in the previous step
135. Inverse by the Gauss-Jordan Algorithm - An Example
- Step 3: Subtract Row 2 from Row 3
Divide the result by 3 and substitute for Row 3 in the matrix derived in the previous step
136. Inverse by the Gauss-Jordan Algorithm - An Example
- Step 4: Subtract 2 × Row 2 from Row 1
Substitute the result for Row 1 in the matrix derived in the previous step
137. Inverse by the Gauss-Jordan Algorithm - An Example
- Step 5: Subtract 7 × Row 3 from Row 1
Substitute the result for Row 1 in the matrix derived in the previous step
138. Inverse by the Gauss-Jordan Algorithm - An Example
- Step 6: Add 2 × Row 3 to Row 2
Substitute the result for Row 2 in the matrix derived in the previous step
139. Inverse by the Gauss-Jordan Algorithm - An Example
- Now that the left side of the augmented matrix is an identity matrix I, the right side of the augmented matrix is the inverse of the matrix A (i.e., A^-1):
140. Inverse by the Gauss-Jordan Algorithm - An Example
- To check our work, let's see if our result yields AA^-1 = I
So our work checks out!
141. Inverse by Determinants
- Replace each element aij in a matrix A with an element calculated as follows:
- Find the determinant of the submatrix that results when the ith row and jth column are eliminated from A (i.e., |A_ij|)
- Attach the sign that you identified in the Method of Cofactors
- Divide by the determinant of A
- After all elements have been replaced, transpose the resulting matrix
142. Inverse by Determinants - An Example
- Again suppose we have some matrix A
We have calculated the determinant of A to be -15, so we replace element 1,1 with
Similarly, we replace element 1,2 with
143. Inverse by Determinants - An Example
- After using this approach to replace each of the nine elements of A, the eventual result will be
which is A^-1!
144. Eigenvalues and Eigenvectors
- For a square matrix A, let I be a conformable identity matrix. Then the scalars λ satisfying the polynomial equation |A - λI| = 0 are called the eigenvalues (or characteristic roots) of A.
- The equation |A - λI| = 0 is called the characteristic equation or the determinantal equation.
145. Eigenvalues and Eigenvectors
- For example, if we have a matrix A
then
which implies there are two roots or eigenvalues -- λ = -6 and λ = 4.
146. Eigenvalues and Eigenvectors
- For a matrix A with eigenvalue λ, a nonzero vector x such that Ax = λx is called an eigenvector (or characteristic vector) of A associated with λ.
147. Eigenvalues and Eigenvectors
- For example, if we have a matrix A
with eigenvalues λ = -6 and λ = 4, the eigenvector of A associated with λ = -6 is
Fixing x1 = 1 yields a solution for x2 of 2.
148. Eigenvalues and Eigenvectors
- Note that eigenvectors are usually normalized so they have unit length, i.e.,
For our previous example we have
Thus our arbitrary choice to fix x1 = 1 has no impact on the eigenvector associated with λ = -6.
149. Eigenvalues and Eigenvectors
- For matrix A and eigenvalue λ = 4, we have
We again arbitrarily fix x1 = 1, which now yields a solution for x2 of 1/2.
150. Eigenvalues and Eigenvectors
Normalization to unit length yields
Again our arbitrary choice to fix x1 = 1 has no impact on the eigenvector associated with λ = 4.
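Eigenvalues and unit-length eigenvectors can be computed directly. The matrix below is a hypothetical stand-in (not the one from the slides, whose entries were lost in extraction); NumPy is assumed:

```python
import numpy as np

# hypothetical symmetric 2 x 2 matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

vals, vecs = np.linalg.eig(A)
# each column of vecs is a unit-length eigenvector satisfying A v = lambda v
for lam, v in zip(vals, vecs.T):
    print(np.allclose(A @ v, lam * v))   # True
    print(np.isclose(v @ v, 1.0))        # True: unit length
```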
151. Quadratic Forms
A Quadratic Form is a function Q(x) = x′Ax in k variables x1, ..., xk, where
and A is a k x k symmetric matrix.
152. Quadratic Forms
Note that a quadratic form has only squared terms and crossproducts, and so can be written
Suppose we have
then
153. Spectral Decomposition and Quadratic Forms
Any k x k symmetric matrix A can be expressed in terms of its k eigenvalue-eigenvector pairs (λi, ei) as
This is referred to as the spectral decomposition of A.
154. Spectral Decomposition and Quadratic Forms
For our previous example on eigenvalues and eigenvectors we showed that
has eigenvalues λ1 = -6 and λ2 = 4, with corresponding (normalized) eigenvectors
155. Spectral Decomposition and Quadratic Forms
Can we reconstruct A?
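The reconstruction A = Σ λi ei ei′ can be sketched numerically. The matrix here is a hypothetical symmetric example (NumPy assumed):

```python
import numpy as np

# hypothetical symmetric matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

vals, vecs = np.linalg.eigh(A)   # eigh: eigendecomposition for symmetric matrices

# A = sum_i lambda_i * e_i e_i'  (each term is an outer product)
A_rebuilt = sum(lam * np.outer(e, e) for lam, e in zip(vals, vecs.T))
print(np.allclose(A, A_rebuilt))   # True
```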
156. Spectral Decomposition and Quadratic Forms
Spectral decomposition can be used to develop/illustrate many statistical results/concepts. We start with a few basic concepts:
- Nonnegative Definite Matrix: when a k x k matrix A is such that 0 ≤ x′Ax ∀ x′ = [x1, x2, ..., xk], the matrix A and the quadratic form are said to be nonnegative definite.
157. Spectral Decomposition and Quadratic Forms
- Positive Definite Matrix: when a k x k matrix A is such that 0 < x′Ax ∀ x′ = [x1, x2, ..., xk] ≠ [0, 0, ..., 0], the matrix A and the quadratic form are said to be positive definite.
158. Spectral Decomposition and Quadratic Forms
Example - Show that the following quadratic form is positive definite:
We first rewrite the quadratic form in matrix notation
159. Spectral Decomposition and Quadratic Forms
Now identify the eigenvalues of the resulting matrix A (they are λ1 = 2 and λ2 = 8).
160. Spectral Decomposition and Quadratic Forms
Next, using spectral decomposition we can write
where again, the vectors ei are the normalized and orthogonal eigenvectors associated with the eigenvalues λ1 = 2 and λ2 = 8.
161. Spectral Decomposition and Quadratic Forms
Sidebar - Note again that we can recreate the original matrix A from the spectral decomposition
162. Spectral Decomposition and Quadratic Forms
Because λ1 and λ2 are scalars, premultiplication and postmultiplication by x′ and x, respectively, yield
where
At this point it is obvious that x′Ax is at least nonnegative definite!
163. Spectral Decomposition and Quadratic Forms
We now show that x′Ax is positive definite, i.e.
From our definitions of y1 and y2 we have
164. Spectral Decomposition and Quadratic Forms
Since E is an orthogonal matrix, E^-1 = E′ exists. Thus,
But x ≠ 0 and x = Ey together imply y ≠ 0.
At this point it is obvious that x′Ax is positive definite!
165. Spectral Decomposition and Quadratic Forms
This suggests rules for determining whether a k x k symmetric matrix A (or equivalently, its quadratic form x′Ax) is nonnegative definite or positive definite:
- A is a nonnegative definite matrix iff λi ≥ 0, i = 1, ..., rank(A)
- A is a positive definite matrix iff λi > 0, i = 1, ..., rank(A)
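The eigenvalue rule above gives a simple numerical definiteness check. The matrix here is a hypothetical symmetric example, not the one from the slides (NumPy assumed):

```python
import numpy as np

# hypothetical symmetric quadratic-form matrix
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

vals = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
print(np.all(vals > 0))        # True -> A (and x'Ax) is positive definite
print(np.all(vals >= 0))       # True -> at least nonnegative definite
```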
166. Measuring Distance
Euclidean (straight line) distance: the Euclidean distance between two points x and y (whose coordinates are represented by the elements of the corresponding vectors) in p-space is given by
167. Measuring Distance
For a previous example
[Figure: three points plotted on axes 1, 2, and 3]
the Euclidean (straight line) distances are
168. Measuring Distance
[Figure: the same three points, with pairwise distances 1.430, 1.414, and 1.414]
169. Measuring Distance
Notice that the lengths of the vectors are their distances from the origin
This is yet another place where the Pythagorean Theorem rears its head!
170. Measuring Distance
Notice also that if we connect all points equidistant from some given point z, the result is a hypersphere with its center at z and area πr²
[Figure: in p = 2 dimensions this yields a circle of radius r centered at z]
171. Measuring Distance
In p = 2 dimensions, we actually talk about area. In p ≥ 3 dimensions, we talk about volume - which is (4/3)πr³ for this problem or, more generally,
[Figure: in p = 3 dimensions we have a sphere of radius r centered at z]
172. Measuring Distance
Problem: What if the coordinates of a point x (i.e., the elements of vector x) are random variables with differing variances? Suppose
- we have n pairs of measurements on two variables X1 and X2, each having a mean of zero
- X1 is more variable than X2
- X1 and X2 vary independently
173. Measuring Distance
A scatter diagram of these data might look like this
[Figure: a scatter of points spread more widely along axis 1 than along axis 2]
Which point really lies further from the origin in statistical terms (i.e., which point is less likely to have occurred randomly)?
Euclidean distance does not account for differences in variation of X1 and X2!
174. Measuring Distance
Notice that a circle does not efficiently inscribe the data - an ellipse does so much more efficiently!
[Figure: the scatter of points inscribed by an ellipse with semi-axes r1 and r2; the area of the ellipse is πr1r2]
175Measuring Distance
How do we take the relative dispersions on the
two axes into consideration? We standardize each
value of Xi by dividing by its standard
deviation.
176Measuring Distance
Note that the problem can extend beyond two
dimensions. The volume of the three-dimensional
ellipsoid (with semi-axes r1, r2, r3) is
(4/3)πr1r2r3 or, more generally,
V = [π^(p/2) / Γ(p/2 + 1)] r1r2…rp
177Measuring Distance
If we are looking at distances from the origin
D(O, P), we could divide coordinate i by its
sample standard deviation √sii (where sii is the
sample variance):
d(O, P) = √[x1²/s11 + x2²/s22 + … + xp²/spp]
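Dividing each coordinate by its standard deviation can be sketched as follows (the variances and the point P are hypothetical illustrations, not values from the slides):

```python
import numpy as np

# Hypothetical sample variances s11, s22 (X1 is more variable than X2)
s = np.array([4.0, 1.0])

# A point P = (x1, x2)
p = np.array([2.0, 2.0])

# Statistical distance from the origin: sqrt(x1**2/s11 + x2**2/s22)
d_stat = np.sqrt(np.sum(p ** 2 / s))

# Ordinary Euclidean distance for comparison
d_euclid = np.linalg.norm(p)
```

Because X1 is more variable, the same coordinate contributes less statistical distance along axis 1 than along axis 2, so d_stat is smaller than d_euclid here.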
178Measuring Distance
The resulting measure is called Statistical
Distance or Mahalanobis Distance
179Measuring Distance
Note that if we plot all points at a constant
squared distance c² from the origin,
x1²/s11 + x2²/s22 = c²
we get an ellipse with semi-axes c√s11 and c√s22.
The area of this ellipse is πc²√(s11s22).
180Measuring Distance
What if the scatter diagram of these data looked
like this? X1 and X2 now have an obvious positive
correlation!
181Measuring Distance
We can plot a rotated coordinate system on axes
x̃1 and x̃2, rotated through angle θ relative to
the original axes. This suggests that we
calculate distance based on the rotated axes x̃1
and x̃2.
182Measuring Distance
The relation between the original coordinates
(x1, x2) and the rotated coordinates (x̃1, x̃2) is
provided by
x̃1 = x1 cos(θ) + x2 sin(θ)
x̃2 = -x1 sin(θ) + x2 cos(θ)
183Measuring Distance
Now we can write the distance from P = (x1, x2)
to the origin in terms of the original
coordinates x1 and x2 of P as
d(O, P) = √(a11x1² + 2a12x1x2 + a22x2²)
where
a11 = cos²(θ)/s̃11 + sin²(θ)/s̃22
(here s̃11 and s̃22 denote the sample variances of
the data measured along the rotated axes x̃1 and x̃2)
184Measuring Distance
and
a22 = sin²(θ)/s̃11 + cos²(θ)/s̃22
and
a12 = cos(θ)sin(θ)(1/s̃11 - 1/s̃22)
185Measuring Distance
Note that the distance from P = (x1, x2) to the
origin for uncorrelated coordinates x1 and x2 is
d(O, P) = √(a11x1² + a22x2²)
for weights
a11 = 1/s11, a22 = 1/s22 (and a12 = 0)
186Measuring Distance
What if we wish to measure distance from some
fixed point Q = (y1, y2)?
In this diagram, Q = (y1, y2) = (x̄1, x̄2) is
called the centroid of the data.
187Measuring Distance
The distance from any point P = (x1, x2) to some
fixed point Q = (y1, y2) is
d(P, Q) = √[a11(x1 - y1)² + 2a12(x1 - y1)(x2 - y2) + a22(x2 - y2)²]
188Measuring Distance
Suppose we have the following ten bivariate
observations (coordinate sets of (x1, x2))
189Measuring Distance
The plot of these points would look like this,
with centroid (-2, 5). The data suggest a
positive correlation between x1 and x2.
190Measuring Distance
The inscribing ellipse (and major and minor axes
x̃1 and x̃2, rotated through θ = 45°) looks like
this
191Measuring Distance
The rotational weights are
192Measuring Distance
and
193Measuring Distance
and
194Measuring Distance
So the distances of the observed points from
their centroid Q = (-2.0, 5.0) are
195Measuring Distance
Mahalanobis distance can easily be generalized
to p dimensions:
d(P, Q) = √[Σi Σj aij(xi - yi)(xj - yj)] = √[(x - y)′A(x - y)]
and all points satisfying
(x - y)′A(x - y) = c²
form a hyperellipsoid with centroid Q.
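The generalized distance (x - y)′A(x - y) is a direct matrix computation; this sketch uses a hypothetical positive definite weight matrix A and points P and Q (only the centroid (-2, 5) echoes the slides' example):

```python
import numpy as np

# Hypothetical positive definite weight matrix A and centroid Q
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
Q = np.array([-2.0, 5.0])
P = np.array([0.0, 6.0])

# Squared generalized (Mahalanobis) distance: (x - y)'A(x - y)
diff = P - Q
d2 = diff @ A @ diff
d = np.sqrt(d2)
```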
196Measuring Distance
Now let's backtrack: the Mahalanobis distance
of a random p-dimensional point P from the origin
is given by
d(O, P) = √(x′Ax)
so we can say
d²(O, P) = x′Ax
provided that d² > 0 ∀ x ≠ 0.
197Measuring Distance
Recognizing that aij = aji, i ≠ j, i = 1, …, p, j
= 1, …, p, we have
0 < d² = a11x1² + a22x2² + … + appxp² + 2a12x1x2 + … + 2ap-1,pxp-1xp = x′Ax
for x ≠ 0.
198Measuring Distance
Thus, the p x p symmetric matrix A is positive
definite, i.e., distance is determined from a
positive definite quadratic form x′Ax! We can
also conclude from this result that a positive
definite quadratic form can be interpreted as a
squared distance! Finally, if the square of the
distance from point x to the origin is given by
x′Ax, then the square of the distance from point
x to some arbitrary fixed point m is given by
(x - m)′A(x - m).
199Measuring Distance
Expressing distance as the square root of a
positive definite quadratic form yields an
interesting geometric interpretation based on the
eigenvalues and eigenvectors of A. For example,
in p = 2 dimensions all points x′ = [x1, x2]
that are a constant distance c from the origin
must satisfy
x′Ax = a11x1² + 2a12x1x2 + a22x2² = c²
200Measuring Distance
By the spectral decomposition we have
A = λ1e1e1′ + λ2e2e2′
so by substitution we now have
x′Ax = λ1(x′e1)² + λ2(x′e2)² = c²
and A is positive definite, so λ1 > 0 and λ2 > 0,
which means
λ1(x′e1)² + λ2(x′e2)² = c²
is an ellipse.
201Measuring Distance
Finally, a little algebra can be used to show
that
x = cλ1^(-1/2)e1
satisfies
x′Ax = λ1(x′e1)² + λ2(x′e2)² = c²
202Measuring Distance
Similarly, a little algebra can be used to show
that
x = cλ2^(-1/2)e2
satisfies
x′Ax = λ1(x′e1)² + λ2(x′e2)² = c²
203Measuring Distance
So the points at a distance c lie on an ellipse
whose axes are given by the eigenvectors e1 and
e2 of A, with lengths proportional to the
reciprocals of the square roots of the
corresponding eigenvalues (with constant of
proportionality c). This generalizes to p
dimensions.
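The half-axis result can be verified numerically: for a hypothetical positive definite A, each point x = c λi^(-1/2) ei should satisfy x′Ax = c² (NumPy's eigh returns eigenvalues in ascending order with eigenvectors as columns):

```python
import numpy as np

# Hypothetical positive definite matrix A and constant c
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
c = 2.0

# Eigenvalues (ascending) and normalized eigenvectors (as columns)
lam, E = np.linalg.eigh(A)

# Each half-axis endpoint x = c * lam_i**(-1/2) * e_i lies on the
# ellipse x'Ax = c**2
for i in range(2):
    x = c * lam[i] ** -0.5 * E[:, i]
    assert np.isclose(x @ A @ x, c ** 2)
```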
204Square Root Matrices
Because spectral decomposition allows us to
express the inverse of a square matrix in terms
of its eigenvalues and eigenvectors, it enables
us to conveniently create a square root
matrix. Let A be a p x p positive definite
matrix with the spectral decomposition
A = λ1e1e1′ + λ2e2e2′ + … + λpepep′
205Square Root Matrices
Also let P be a matrix whose columns are the
normalized eigenvectors e1, e2, …, ep of A, i.e.,
P = [e1 e2 … ep]
Then
A = PΛP′
where P′P = PP′ = I and Λ is the diagonal matrix
with diagonal elements λ1, λ2, …, λp.
206Square Root Matrices
Now since (PΛ⁻¹P′)PΛP′ = PΛP′(PΛ⁻¹P′) = PP′ = I we
have
A⁻¹ = PΛ⁻¹P′
Next let Λ^(1/2) denote the diagonal matrix with
ith diagonal element √λi.
207Square Root Matrices
The matrix
A^(1/2) = PΛ^(1/2)P′ = Σi √λi eiei′
is called the square root of A.
208Square Root Matrices
- The square root of A has the following properties:
- (A^(1/2))′ = A^(1/2) (symmetry)
- A^(1/2)A^(1/2) = A
- (A^(1/2))⁻¹ = PΛ^(-1/2)P′ = Σi (1/√λi)eiei′, denoted A^(-1/2)
- A^(1/2)A^(-1/2) = A^(-1/2)A^(1/2) = I
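These properties can be checked directly by building A^(1/2) from the spectral decomposition; the matrix A below is a hypothetical positive definite example:

```python
import numpy as np

# Hypothetical positive definite matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Spectral decomposition A = P Lam P'
lam, P = np.linalg.eigh(A)

# Square root matrix: A**(1/2) = P Lam**(1/2) P'
A_half = P @ np.diag(np.sqrt(lam)) @ P.T
# Inverse square root: A**(-1/2) = P Lam**(-1/2) P'
A_half_inv = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T

# Check the listed properties
assert np.allclose(A_half, A_half.T)       # symmetric
assert np.allclose(A_half @ A_half, A)     # A**(1/2) A**(1/2) = A
assert np.allclose(A_half @ A_half_inv, np.eye(2))
```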
209Square Root Matrices
Next let Λ⁻¹ denote the diagonal matrix with ith
diagonal element 1/λi, and let P again be the
matrix whose columns are the normalized
eigenvectors e1, e2, …, ep of A. Then
A⁻¹ = PΛ⁻¹P′
where P′P = PP′ = I.
210Singular Value Decomposition
We can extend the operations of spectral
decomposition to a rectangular matrix by using
the eigenvalues and eigenvectors of the square
matrix A′A (or AA′). Suppose A is an m x k real
matrix. Then there exists an m x m orthogonal
matrix U and a k x k orthogonal matrix V such
that
A = UΛV′
where Λ has ith diagonal element λi ≥ 0 for
i = 1, 2, …, min(m, k) and 0 for all other
elements. The λi are the singular values of A.
211Singular Value Decomposition
Singular value decomposition can also be
expressed as a matrix expansion that depends on
the rank r of A. There exist
- r positive constants λ1, λ2, …, λr,
- r orthogonal m x 1 unit vectors u1, u2, …, ur,
- r orthogonal k x 1 unit vectors v1, v2, …, vr,
such that
A = λ1u1v1′ + λ2u2v2′ + … + λrurvr′ = UrΛrVr′
212Singular Value Decomposition
where
- Ur = [u1, u2, …, ur]
- Vr = [v1, v2, …, vr]
- Λr is an r x r diagonal matrix with diagonal entries λ1, λ2, …, λr and off-diagonal entries 0
213Singular Value Decomposition
We can show that AA′ has eigenvalue-eigenvector
pairs (λi², ui), so
AA′ui = λi²ui
with λi > 0 for i = 1, 2, …, r;
then
vi = (1/λi)A′ui
214Singular Value Decomposition
Alternatively, we can show that A′A has
eigenvalue-eigenvector pairs (λi², vi), so
A′Avi = λi²vi
with λi > 0 for i = 1, 2, …, r;
then
ui = (1/λi)Avi
215Singular Value Decomposition
Suppose we have a rectangular matrix
then
216Singular Value Decomposition
and AA′ has eigenvalues of γ1 = 5 and γ2 = 10 with
corresponding normalized eigenvectors
217Singular Value Decomposition
Similarly for our rectangular matrix
then
218Singular Value Decomposition
and A′A also has eigenvalues of γ1 = 0, γ2 = 5, and
γ3 = 10 with corresponding normalized eigenvectors
219Singular Value Decomposition
Now taking the singular values λ1 = √10 and
λ2 = √5 (the square roots of the nonzero
eigenvalues), we find that the singular value
decomposition of A is
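The slides' matrix itself is not reproduced in this text; the sketch below uses a hypothetical 2 x 3 matrix whose AA′ happens to have the eigenvalues 5 and 10 cited above, and verifies the decomposition with NumPy:

```python
import numpy as np

# Hypothetical 2 x 3 matrix whose AA' has eigenvalues 5 and 10
A = np.array([[1.0, 1.0, 2.0],
              [2.0, 2.0, -1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The singular values (descending in NumPy) are the square roots of
# the eigenvalues of AA': s**2 = [10, 5]
assert np.allclose(np.sort(s ** 2), [5.0, 10.0])

# Reconstruct A = U Lam V'
assert np.allclose(U @ np.diag(s) @ Vt, A)
```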
220Singular Value Decomposition
The singular value decomposition is closely
connected to the approximation of a rectangular
matrix by a lower-rank matrix [Eckart and
Young, 1936]. First note that if an m x k matrix
A is approximated by B of the same dimension but
lower rank, the quality of the approximation can
be measured by the sum of squared differences
Σi Σj (aij - bij)²
221Singular Value Decomposition
Eckart and Young used this fact to show that,
for an m x k real matrix A with m ≥ k and singular
value decomposition UΛV′,
B = λ1u1v1′ + λ2u2v2′ + … + λsusvs′
is the rank-s least squares approximation to A
(where s < k = rank(A)).
222Singular Value Decomposition
This matrix B minimizes
Σi Σj (aij - bij)²
over all m x k matrices of rank no greater than
s! It can also be shown that the error of this
approximation is
λs+1² + λs+2² + … + λk²
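The Eckart-Young result can be demonstrated by truncating the SVD of a hypothetical matrix: the squared error of the rank-s approximation equals the sum of the squared discarded singular values.

```python
import numpy as np

# Hypothetical 3 x 3 matrix
A = np.array([[1.0, 1.0, 2.0],
              [2.0, 2.0, -1.0],
              [0.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)

# Rank-1 least squares approximation: keep only the largest singular value
rank = 1
B = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

# The squared error equals the sum of the squared discarded singular values
err = np.sum((A - B) ** 2)
assert np.isclose(err, np.sum(s[rank:] ** 2))
```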
223Random Vectors and Matrices
- Random Vector: a vector whose individual elements are random variables
- Random Matrix: a matrix whose individual elements are random variables
224Random Vectors and Matrices
The expected value of a random vector or matrix
is the vector or matrix containing the expected
values of the individual elements, i.e.,
E(X) = [E(xij)]
225Random Vectors and Matrices
where
E(xij) = ∫ xij fij(xij) dxij if xij is a continuous random variable with probability density function fij(xij), or
E(xij) = Σ xij pij(xij) if xij is a discrete random variable with probability function pij(xij)
226Random Vectors and Matrices
- Note that for random matrices X and Y of the
same dimension, conformable matrices of constants
A and B, and scalar c:
- E(cX) = cE(X)
- E(X + Y) = E(X) + E(Y)
- E(AXB) = AE(X)B
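The rule E(AXB) = AE(X)B can be illustrated by Monte Carlo simulation; the constant matrices A and B and the mean matrix below are hypothetical, and the sample mean of AXB is only an approximation to the exact expectation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conformable constant matrices A and B, and E(X) = mu
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
B = np.array([[1.0, 0.0],
              [3.0, 1.0]])
mu = np.array([[1.0, 2.0],
               [3.0, 4.0]])

# Simulate many 2 x 2 random matrices X with elementwise mean mu
X = mu + rng.normal(size=(200_000, 2, 2))

# The sample mean of AXB should approximate A E(X) B
lhs = (A @ X @ B).mean(axis=0)
rhs = A @ mu @ B
assert np.allclose(lhs, rhs, atol=0.1)
```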
227Random Vectors and Matrices
Mean Vector: vector whose elements are the
means of the corresponding random variables,
i.e.,
μi = E(xi)
228Random Vectors and Matrices
In matrix notation we can write the mean vector
as
μ = E(x) = [E(x1) E(x2) … E(xp)]′
229Random Vectors and Matrices
For the bivariate probability distribution
the mean vector is
230Random Vectors and Matrices
Covariance Matrix: symmetric matrix whose
diagonal elements are the variances of the
corresponding random variables, i.e.,
σii = E[(xi - μi)²]
231Random Vectors and Matrices
and whose off-diagonal elements are the
covariances of the corresponding random variable
pairs, i.e.,
σik = E[(xi - μi)(xk - μk)]
Notice that when i = k this expression returns
the variance, i.e.,
σii = E[(xi - μi)(xi - μi)] = E[(xi - μi)²]
232Random Vectors and Matrices
In matrix notation we can write the covariance
matrix as
Σ = E[(x - μ)(x - μ)′]
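For a discrete bivariate distribution, the mean vector and covariance matrix are probability-weighted sums; the support points and probabilities below are a hypothetical example, not the distribution used in the slides:

```python
import numpy as np

# Hypothetical discrete bivariate distribution: support points and probabilities
support = np.array([[0.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 0.0],
                    [1.0, 1.0]])
prob = np.array([0.4, 0.1, 0.1, 0.4])

# Mean vector mu = E(x)
mu = prob @ support

# Covariance matrix Sigma = E[(x - mu)(x - mu)']
dev = support - mu
Sigma = (dev * prob[:, None]).T @ dev

assert np.allclose(Sigma, Sigma.T)  # symmetric, as required
```

The positive off-diagonal entry of Sigma reflects the fact that this distribution puts most of its probability on the points (0, 0) and (1, 1), so x1 and x2 are positively correlated.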
233Random Vectors and Matrices
For the bivariate probability distribution we
used earlier
the covariance matrix is
234Random Vectors and Matrices