Section 5.5 Matrix Multiplication
Addition and scalar multiplication of matrices are defined in the intuitively natural way: entry-wise. However, there is another operation which is not defined in this natural way. That is, multiplying two matrices, or matrix multiplication, is not computed entry-wise. Instead, it is a generalization of the matrix-vector product.
Subsection 5.5.1 Matrix Multiplication
Consider a matrix \(B\text{.}\) If \(B\) is applied to a vector \(\vec{x}\text{,}\) it transforms \(\vec{x}\) into the vector \(B\vec{x}\text{.}\) If this vector is then multiplied by another matrix \(A\text{,}\) the resulting vector is \(A(B\vec{x})\text{.}\)
The vector \(A(B\vec{x})\) is produced from \(\vec{x}\) by a composition of transformations. Then, we define \(AB\) to be a matrix which represents this single composite transformation produced by applying \(B\) and then \(A\text{.}\)
Let \(A\) be an \(m \times n\) matrix, and \(B\) be an \(n \times p\) matrix. Let the columns of \(B\) be \(\vec{b}_1, \dots, \vec{b}_p\text{,}\) and let \(\vec{x} = (x_1, \dots, x_p)\) be a vector in \(\mathbb{R}^p\text{.}\) Then, first consider the matrix-vector product \(B\vec{x}\text{,}\)
\begin{equation*}
B\vec{x} = x_1 \vec{b}_1 + \dots + x_p \vec{b}_p
\end{equation*}
Next, applying \(A\) to this vector gives \(A(B\vec{x})\text{,}\)
\begin{equation*}
A(B\vec{x}) = A\brac{x_1 \vec{b}_1 + \dots + x_p \vec{b}_p}
\end{equation*}
Because matrix transformations are linear, \(A\) distributes over this linear combination of \(\vec{b}_1, \dots, \vec{b}_p\) with weights \(x_1, \dots, x_p\text{.}\) That is,
\begin{equation*}
A(B\vec{x}) = x_1 A\vec{b}_1 + \dots + x_p A \vec{b}_p
\end{equation*}
In other words, the vector \(A(B\vec{x})\) is the linear combination of \(A\vec{b}_1, \dots, A\vec{b}_p\text{,}\) with the entries of \(\vec{x}\) as weights. In matrix notation,
\begin{equation*}
A(B\vec{x}) = \begin{bmatrix} A \vec{b}_1 \amp \cdots \amp A \vec{b}_p \end{bmatrix} \vec{x}
\end{equation*}
Thus, the matrix \(\begin{bmatrix} A \vec{b}_1 \amp \cdots \amp A \vec{b}_p \end{bmatrix}\) is precisely the matrix which transforms \(\vec{x}\) into \(A(B\vec{x})\text{,}\) which we want to denote by \(AB\text{.}\)
Definition 5.5.1.
Let \(A\) be an \(m \times n\) matrix, and \(B\) be an \(n \times p\) matrix with columns \(\vec{b}_1, \dots, \vec{b}_p\text{.}\) Then, the product \(AB\) is the \(m \times p\) matrix with columns \(A\vec{b}_1, \dots, A\vec{b}_p\text{.}\) That is,
\begin{equation*}
\boxed{AB = A \begin{bmatrix} \vec{b}_1 \amp \cdots \amp \vec{b}_p \end{bmatrix} = \begin{bmatrix} A \vec{b}_1 \amp \cdots \amp A \vec{b}_p \end{bmatrix}}
\end{equation*}
This definition for matrix multiplication makes the equation \(A(B\vec{x}) = (AB)\vec{x}\) true. That is, \(AB\) is the single transformation that is equivalent to applying \(B\) first, and then \(A\text{.}\) In this way, multiplication of matrices corresponds to composition of linear transformations.
This definition also emphasizes that the columns of \(AB\) are given by the matrix-vector products \(A\vec{b}_1, \dots, A\vec{b}_p\text{.}\) In this sense, matrix multiplication is a generalization of the matrix-vector product.
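For example, consider the matrices
\begin{equation*}
A = \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 5 \amp 6 \\ 7 \amp 8 \end{bmatrix}
\end{equation*}
Computing \(AB\) column-by-column,
\begin{equation*}
AB = \begin{bmatrix} A \begin{bmatrix} 5 \\ 7 \end{bmatrix} \amp A \begin{bmatrix} 6 \\ 8 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 19 \amp 22 \\ 43 \amp 50 \end{bmatrix}
\end{equation*}
As a check of the equation \(A(B\vec{x}) = (AB)\vec{x}\text{,}\) take \(\vec{x} = (1,1)\text{.}\) Then \(B\vec{x} = (11, 15)\text{,}\) so \(A(B\vec{x}) = (41, 93)\text{,}\) and likewise \((AB)\vec{x} = (19 + 22, 43 + 50) = (41, 93)\text{.}\)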
Subsection 5.5.2 Matrix Multiplication as a Dot Product (Entry-by-Entry)
Another perspective on matrix multiplication views it entry-by-entry, and involves a dot product of a column and a row.
Example 5.5.2.
Let \(A = \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix}, B = \begin{bmatrix} e \amp f \\ g \amp h \end{bmatrix}\text{.}\) Then,
\begin{align*}
\begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} e \amp f \\ g \amp h \end{bmatrix} \amp = \begin{bmatrix} \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} e \\ g \end{bmatrix} \amp \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} f \\ h \end{bmatrix} \end{bmatrix}\\
\amp = \begin{bmatrix} e \begin{bmatrix} a \\ c \end{bmatrix} + g \begin{bmatrix} b \\ d \end{bmatrix} \amp f \begin{bmatrix} a \\ c \end{bmatrix} + h \begin{bmatrix} b \\ d \end{bmatrix} \end{bmatrix}\\
\amp = \begin{bmatrix} ae + bg \amp af + bh \\ ce + dg \amp cf + dh \end{bmatrix}
\end{align*}
Notice that each entry of this product has the form of a dot product of vectors taken from the original two matrices. For example, the \((1,1)\)-entry \(ae + bg\) is the dot product of \((a,b)\) with \((e,g)\text{,}\) the 1st row of \(A\) and the 1st column of \(B\text{,}\) respectively.
More generally, the \((i,j)\)-entry of \(AB\) is the dot product of the \(i\)th row of \(A\) and the \(j\)th column of \(B\text{.}\) In other words,
\begin{equation*}
\boxed{(AB)_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \dots + a_{in} b_{nj} = \sum_{k=1}^n a_{ik} b_{kj}}
\end{equation*}
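For example, for the matrices \(A\) and \(B\) from the previous subsection, the \((1,2)\)-entry of \(AB\) is the dot product of the 1st row \((1,2)\) of \(A\) with the 2nd column \((6,8)\) of \(B\text{,}\)
\begin{equation*}
(AB)_{12} = 1 \cdot 6 + 2 \cdot 8 = 22
\end{equation*}
which agrees with the column-by-column computation.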
Subsection 5.5.3 Row Perspective of the Matrix-Vector Product and Matrix Multiplication
There is another characterization of the matrix-vector product, and of matrix multiplication, from the perspective of the rows of the matrix: in particular, as a linear combination of the rows. Let \(A\) be an \(m \times n\) matrix. Recall that \(A\vec{x}\) (where \(\vec{x} \in \mathbb{R}^n\)) can be thought of as a linear combination of the columns of \(A\) with weights given by the entries of \(\vec{x}\text{.}\) In a similar way, the product \(\vec{x}A\) (where \(\vec{x} \in \mathbb{R}^m\) is regarded as a row vector) can be thought of as a linear combination of the rows of \(A\text{,}\) with weights given by the entries of \(\vec{x}\text{.}\)
More precisely, let \(\vec{x} = (x_1, \dots, x_m)\text{,}\) and let \(\vec{a}^i\) denote the \(i\)th row of \(A\text{,}\) so
\begin{equation*}
A = \begin{bmatrix} \vec{a}^1 \\ \vdots \\ \vec{a}^m \end{bmatrix}
\end{equation*}
Note that because powers of vectors are not defined, this superscript notation is unambiguous. Then,
\begin{equation*}
\vec{x} A = x_1 \vec{a}^1 + \dots + x_m \vec{a}^m
\end{equation*}
Note that the result is again a row vector.
Example 5.5.3.
For example,
\begin{align*}
\begin{bmatrix} 2 \amp -1 \end{bmatrix} \begin{bmatrix} 2 \amp 3 \\ -1 \amp 4 \end{bmatrix} \amp = 2 \begin{bmatrix} 2 \amp 3 \end{bmatrix} - 1 \begin{bmatrix} -1 \amp 4 \end{bmatrix}\\
\amp = \begin{bmatrix} 4 \amp 6 \end{bmatrix} + \begin{bmatrix} 1 \amp -4 \end{bmatrix}\\
\amp = \begin{bmatrix} 5 \amp 2 \end{bmatrix}
\end{align*}
Then, this leads to a characterization of matrix multiplication. Let \(B\) be an \(n \times p\) matrix. Then,
\begin{equation*}
\boxed{AB = \begin{bmatrix} \vec{a}^1 B \\ \vdots \\ \vec{a}^m B \end{bmatrix}}
\end{equation*}
That is, each row of \(AB\) is a linear combination of the rows of \(B\text{,}\) with weights given by the corresponding row of \(A\text{.}\) In particular, the \(i\)th row of \(AB\) is a linear combination of the rows of \(B\) with weights given by the entries of the \(i\)th row of \(A\text{.}\)
Example 5.5.4.
For example,
\begin{align*}
\begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} \begin{bmatrix} 5 \amp 2 \\ -2 \amp 4 \end{bmatrix} \amp = \begin{bmatrix} 1 \cdot \begin{bmatrix} 5 \amp 2 \end{bmatrix} + 0 \cdot \begin{bmatrix} -2 \amp 4 \end{bmatrix} \\ 0 \cdot \begin{bmatrix} 5 \amp 2 \end{bmatrix} + 1 \cdot \begin{bmatrix} -2 \amp 4 \end{bmatrix} \end{bmatrix}\\
\amp = \begin{bmatrix} 5 \amp 2 \\ -2 \amp 4 \end{bmatrix}
\end{align*}
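Similarly, recomputing the earlier product of \(A = \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix}\) and \(B = \begin{bmatrix} 5 \amp 6 \\ 7 \amp 8 \end{bmatrix}\) row-by-row gives the same result as before,
\begin{align*}
\begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix} \begin{bmatrix} 5 \amp 6 \\ 7 \amp 8 \end{bmatrix} \amp = \begin{bmatrix} 1 \cdot \begin{bmatrix} 5 \amp 6 \end{bmatrix} + 2 \cdot \begin{bmatrix} 7 \amp 8 \end{bmatrix} \\ 3 \cdot \begin{bmatrix} 5 \amp 6 \end{bmatrix} + 4 \cdot \begin{bmatrix} 7 \amp 8 \end{bmatrix} \end{bmatrix}\\
\amp = \begin{bmatrix} 19 \amp 22 \\ 43 \amp 50 \end{bmatrix}
\end{align*}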
Subsection 5.5.4 Properties of Matrix Multiplication
Matrix multiplication shares many properties of real number multiplication, assuming the matrices have the appropriate dimensions.
Theorem 5.5.5.
Let \(A\) be an \(m \times n\) matrix, \(B\) and \(C\) be \(n \times p\) matrices, \(D\) and \(E\) be \(q \times m\) matrices, and \(a \in \mathbb{R}\) be a scalar. (For the associativity property, instead let \(B\) be \(n \times p\) and \(C\) be \(p \times q\text{,}\) so that all the products are defined.) Then,
1. Multiplication distributes over addition.
\begin{equation*}
A(B + C) = AB + AC \qquad \text{and} \qquad (D + E)A = DA + EA
\end{equation*}
2. Associativity with scalar multiplication.
\begin{equation*}
a(AB) = (aA)B = A(aB)
\end{equation*}
3. Associativity of matrix multiplication.
\begin{equation*}
A(BC) = (AB)C
\end{equation*}
Intuitively, these properties mean that parentheses surrounding matrix expressions generally work the same as for real numbers.
The associative property \(A(BC) = (AB)C\) is very intuitive through the language of transformations. It says that applying the composition \(BC\) and then \(A\) is the same as applying \(C\text{,}\) then the composition \(AB\text{.}\) Both of these are by definition equivalent to applying \(C\text{,}\) then \(B\text{,}\) then \(A\text{.}\)
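As a quick numeric check of associativity, for
\begin{equation*}
A = \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \amp 0 \\ 1 \amp 1 \end{bmatrix}, \qquad C = \begin{bmatrix} 2 \amp 0 \\ 0 \amp 3 \end{bmatrix}
\end{equation*}
both groupings give the same product,
\begin{equation*}
A(BC) = \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix} \begin{bmatrix} 2 \amp 0 \\ 2 \amp 3 \end{bmatrix} = \begin{bmatrix} 4 \amp 3 \\ 2 \amp 3 \end{bmatrix} = \begin{bmatrix} 2 \amp 1 \\ 1 \amp 1 \end{bmatrix} \begin{bmatrix} 2 \amp 0 \\ 0 \amp 3 \end{bmatrix} = (AB)C
\end{equation*}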
The entry-wise proofs of these properties for arbitrary matrices can be tedious and symbol-heavy.
Proof.
Proof of 1 (entry-wise).
\begin{align*}
(A(B + C))_{ij} \amp = \sum_{k=1}^n A_{ik}(B + C)_{kj} \amp\amp \text{definition of matrix multiplication}\\
\amp = \sum_{k=1}^n A_{ik}(B_{kj} + C_{kj}) \amp\amp \text{definition of matrix addition}\\
\amp = \sum_{k=1}^n (A_{ik} B_{kj} + A_{ik} C_{kj})\\
\amp = \sum_{k=1}^n A_{ik}B_{kj} + \sum_{k=1}^n A_{ik}C_{kj}\\
\amp = (AB)_{ij} + (AC)_{ij} \amp\amp \text{definition of matrix multiplication}\\
\amp = (AB + AC)_{ij} \amp\amp \text{definition of matrix addition}
\end{align*}
Proof of 3 for \(2 \times 2\) matrices. Let
\(A = \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix}, B = \begin{bmatrix} e \amp f \\ g \amp h \end{bmatrix}, C = \begin{bmatrix} i \amp j \\ k \amp l \end{bmatrix}\text{.}\) Then,
\begin{align*}
A(BC) = \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \brac{\begin{bmatrix} e \amp f \\ g \amp h \end{bmatrix} \begin{bmatrix} i \amp j \\ k \amp l \end{bmatrix}} \amp = \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} ei + fk \amp ej + fl \\ gi + hk \amp gj + hl \end{bmatrix}\\
\amp = \begin{bmatrix} a(ei + fk) + b(gi + hk) \amp a(ej + fl) + b(gj + hl) \\ c(ei + fk) + d(gi + hk) \amp c(ej + fl) + d(gj + hl) \end{bmatrix}
\end{align*}
Also,
\begin{align*}
(AB)C = \brac{\begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} e \amp f \\ g \amp h \end{bmatrix}} \begin{bmatrix} i \amp j \\ k \amp l \end{bmatrix} \amp = \begin{bmatrix} ae + bg \amp af + bh \\ ce + dg \amp cf + dh \end{bmatrix} \begin{bmatrix} i \amp j \\ k \amp l \end{bmatrix}\\
\amp = \begin{bmatrix} (ae + bg)i + (af + bh)k \amp (ae + bg)j + (af + bh)l \\ (ce + dg)i + (cf + dh)k \amp (ce + dg)j + (cf + dh)l \end{bmatrix}
\end{align*}
Comparing these expressions entry-by-entry, using the distributivity of real-number multiplication, shows that they are equal.
Subsection 5.5.5 Matrix Multiplication is Not Commutative
While matrix multiplication shares many properties of real number multiplication, it differs in one important respect. That is, matrix multiplication is not commutative: in general,
\begin{equation*}
AB \neq BA
\end{equation*}
Viewing \(A\) and \(B\) as transformations, it is plausible that the composition applying \(B\) first, and then \(A\text{,}\) will in general not be the same as applying \(A\) first, and then \(B\text{.}\) You may recall the analogous property of functions: for real-valued functions \(f, g\text{,}\) in general \(f(g(x)) \neq g(f(x))\text{.}\) That is, the non-commutativity of matrix multiplication is a consequence of the more general fact that composition of transformations is not commutative.
In fact, viewing \(A\) and \(B\) as matrices, suppose \(A\) is an \(m \times n\) matrix, and \(B\) is an \(n \times p\) matrix. Then, the product \(AB\) is defined. However, the product \(BA\) is not even defined unless \(m = p\text{.}\) Further, even if \(m = p\text{,}\) the product \(AB\) will have dimensions \(m \times m\text{,}\) while the product \(BA\) will have dimensions \(n \times n\text{,}\) so \(AB\) and \(BA\) cannot possibly be the same matrix unless \(m = n\text{.}\) In this special case, where \(m = n = p\text{,}\) \(A\) and \(B\) are both square matrices, say \(n \times n\text{.}\) Even then, the entries of \(AB\) and \(BA\) are not all the same in general, that is, \(AB \neq BA\text{.}\)
More concretely, each column of \(AB\) is a linear combination of the columns of \(A\text{,}\) whereas each column of \(BA\) is a linear combination of the columns of \(B\text{,}\) and there is no reason to expect these to be equal.
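For example, even for \(2 \times 2\) matrices,
\begin{equation*}
\begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix} \begin{bmatrix} 1 \amp 0 \\ 1 \amp 1 \end{bmatrix} = \begin{bmatrix} 2 \amp 1 \\ 1 \amp 1 \end{bmatrix} \qquad \text{but} \qquad \begin{bmatrix} 1 \amp 0 \\ 1 \amp 1 \end{bmatrix} \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix} = \begin{bmatrix} 1 \amp 1 \\ 1 \amp 2 \end{bmatrix}
\end{equation*}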
Because the order of multiplication matters, in the product \(AB\text{,}\) we sometimes say \(A\) is right-multiplied by \(B\text{,}\) or \(B\) is left-multiplied by \(A\text{.}\)
Subsection 5.5.6 The Identity Matrix
The identity matrix, familiar from the matrix-vector product, plays an identical role in the multiplication of matrices. Recall that the \(n \times n\) identity matrix is denoted by \(I_n\) and has 1's along its main diagonal (from top left to bottom right) and 0's everywhere else,
\begin{equation*}
I_n = \begin{bmatrix} 1 \amp 0 \amp \dots \amp 0 \\ 0 \amp 1 \amp \dots \amp 0 \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0 \amp \dots \amp 1 \end{bmatrix}
\end{equation*}
This matrix has the property that for any \(n \times n\) matrix \(A\text{,}\)
\begin{equation*}
\boxed{AI_n = I_n A = A}
\end{equation*}
Example 5.5.6.
For example, you can verify that,
\begin{equation*}
\begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} = \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix}
\end{equation*}
Subsection 5.5.7 Matrix Operations with \(1 \times 1\) Matrices
The arithmetic of matrices can be more deeply understood by considering the simplest possible case: arithmetic with \(1 \times 1\) matrices. A general \(1 \times 1\) matrix is of the form \(A = \begin{bmatrix} a \end{bmatrix}\text{.}\) Then, addition, scalar multiplication, and matrix multiplication are given by,
\begin{align*}
\begin{bmatrix} a \end{bmatrix} + \begin{bmatrix} b \end{bmatrix} \amp = \begin{bmatrix} a + b \end{bmatrix}\\
k \begin{bmatrix} a \end{bmatrix} \amp = \begin{bmatrix} ka \end{bmatrix}\\
\begin{bmatrix} a \end{bmatrix} \begin{bmatrix} b \end{bmatrix} \amp = \begin{bmatrix} ab \end{bmatrix}
\end{align*}
All of these computations follow from the more general definitions presented previously. Notice that the operations on \(1 \times 1\) matrices are essentially analogous to the operations on real numbers. That is, \(1 \times 1\) matrices can be identified with real numbers.
Then, \(2 \times 2\) and larger matrices can be thought of as “generalized numbers”.
Subsection 5.5.8 Matrix Multiplication by Blocks (Partitioned Matrices)