
Section 5.7 Inverse of a Matrix

Recall that for real numbers, the inverse (or multiplicative inverse, or reciprocal) of \(a \in \mathbb{R}\) is the number \(b\) such that \(ab = 1\text{.}\) We denote \(b\) by \(\frac{1}{a}\) or \(a^{-1}\text{.}\) For example, the multiplicative inverse of \(3\) is \(3^{-1} = \frac{1}{3}\text{,}\) because \(3 \cdot \frac{1}{3} = \frac{1}{3} \cdot 3 = 1\text{.}\) Note that inverses exist only for non-zero real numbers.
For matrices, we can define an analogous notion of inverse. However, unlike real number multiplication, matrix multiplication is not commutative, and for both products \(AB\) and \(BA\) to be defined, the matrices involved must be square. So, we focus on inverses of square matrices. Similarly to the real number case, the inverse of a matrix will exist only if the matrix is “non-zero” in some sense.

Subsection 5.7.1 Inverse of a Square Matrix

Definition 5.7.1.

An \(n \times n\) square matrix \(A\) is invertible if there exists an \(n \times n\) matrix \(B\) such that \(AB = BA = I_n\text{.}\) Then, \(B\) is called the inverse of \(A\text{,}\) and is denoted by \(B = A^{-1}\text{.}\)
If \(B\) and \(C\) are both inverses of \(A\text{,}\) then \(AB = BA = I_n\) and \(AC = CA = I_n\text{.}\) Then,
\begin{equation*} B = BI_n = B(AC) = (BA)C = I_n C = C \end{equation*}
Thus, \(B = C\text{.}\)
Thus, the inverse of a matrix \(A\text{,}\) if it exists, is well-defined, and so is denoted by \(A^{-1}\text{.}\) It has the property that,
\begin{equation*} AA^{-1} = A^{-1} A = I_n \end{equation*}
A matrix which is not invertible is sometimes called singular, and a matrix which is invertible is called non-singular.
To verify that matrix \(B\) is the inverse of \(A\text{,}\) we can multiply \(B\) by \(A\) and verify that the result is the identity matrix. Further, for a square matrix \(A\text{,}\) \(AB = I_n\) if and only if \(BA = I_n\text{,}\) so it is sufficient to only compute one of the two products \(AB\) or \(BA\text{.}\)
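For instance, this verification is easy to carry out numerically. The following is a minimal sketch using NumPy; the particular matrices are illustrative choices, not from the text.
```python
import numpy as np

# A candidate matrix and its claimed inverse (illustrative values)
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[ 1.0, -1.0],
              [-1.0,  2.0]])

# Since A is square, checking one product suffices (AB = I_n iff BA = I_n)
print(A @ B)  # expect the 2x2 identity matrix
```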

Subsection 5.7.2 Matrix Inverses and Linear Transformations

Considering a matrix \(A\) as a linear transformation, its inverse \(A^{-1}\) is the inverse linear transformation in the sense of the inverse of a function. That is, its composition with \(A\) results in the identity transformation. This is quite literally encoded in the definition \(AA^{-1} = I_n\text{.}\) In the language of linear transformations (or functions), the inverse \(A^{-1}\) is the linear transformation such that for every vector \(\vec{x} \in \mathbb{R}^n\text{,}\)
\begin{equation*} A^{-1}(A\vec{x}) = A^{-1} A \vec{x} = I_n\vec{x} = \vec{x} \end{equation*}
The parentheses are to emphasize the function composition.
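As a small numerical illustration of this composition view (a sketch; the matrix and vector are arbitrary choices):
```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
A_inv = np.linalg.inv(A)

x = np.array([3.0, -1.0])
# Applying A and then A^{-1} recovers the original vector
print(A_inv @ (A @ x))  # expect [ 3. -1.]
```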

Subsection 5.7.3 Computing the Inverse of a Matrix (\(1 \times 1\) and \(2 \times 2\) Cases)

The next natural question is: how do you compute the inverse of a matrix? It turns out to be non-trivial, especially for large matrices. First, consider the simplest case, a \(1 \times 1\) matrix \(A = \begin{bmatrix} a \end{bmatrix}\text{.}\) We want to find the matrix \(B = \begin{bmatrix} b \end{bmatrix}\) such that,
\begin{equation*} AB = \begin{bmatrix} a \end{bmatrix} \begin{bmatrix} b \end{bmatrix} = \begin{bmatrix} ab \end{bmatrix} = \begin{bmatrix} 1 \end{bmatrix} \end{equation*}
In other words, \(ab = 1\text{.}\) Clearly, the entry \(b\) should be \(b = \frac{1}{a}\text{,}\) provided that \(a \neq 0\text{.}\) Thus,
\begin{equation*} \begin{bmatrix} a \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{a} \end{bmatrix} \end{equation*}
and \(A\) is invertible if and only if \(a \neq 0\text{.}\)
Next, consider a \(2 \times 2\) matrix, of the form \(A = \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix}\text{.}\) We want to find a matrix, say of the form \(B = \begin{bmatrix} x_{11} \amp x_{12} \\ x_{21} \amp x_{22} \end{bmatrix}\text{,}\) such that,
\begin{equation*} \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} x_{11} \amp x_{12} \\ x_{21} \amp x_{22} \end{bmatrix} = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} \end{equation*}
By the definition of matrix multiplication, this is equivalent to,
\begin{equation*} \begin{bmatrix} ax_{11} + bx_{21} \amp ax_{12} + bx_{22} \\ cx_{11} + dx_{21} \amp cx_{12} + dx_{22} \end{bmatrix} = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} \end{equation*}
Equating entries, this is equivalent to the system of 4 equations,
\begin{align*} ax_{11} \amp + bx_{21} \amp \amp \amp = 1\\ \amp \amp ax_{12} \amp + bx_{22} \amp = 0\\ cx_{11} \amp + dx_{21} \amp \amp \amp = 0\\ \amp \amp cx_{12} \amp + dx_{22} \amp = 1 \end{align*}
That is, finding the inverse of a \(2 \times 2\) matrix is equivalent to solving this system of 4 equations in 4 unknowns. Ordering the unknowns as \(x_{11}, x_{21}, x_{12}, x_{22}\text{,}\) the augmented matrix is,
\begin{equation*} \left[\begin{array}{cccc|c} a \amp b \amp 0 \amp 0 \amp 1 \\ 0 \amp 0 \amp a \amp b \amp 0 \\ c \amp d \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp c \amp d \amp 1 \end{array}\right] \end{equation*}
Converting this matrix to RREF,
\begin{equation*} \left[\begin{array}{cccc|c} 1 \amp 0 \amp 0 \amp 0 \amp \frac{d}{ad - bc} \\ 0 \amp 1 \amp 0 \amp 0 \amp -\frac{c}{ad - bc} \\ 0 \amp 0 \amp 1 \amp 0 \amp -\frac{b}{ad - bc} \\ 0 \amp 0 \amp 0 \amp 1 \amp \frac{a}{ad - bc} \end{array}\right] \end{equation*}
provided that \(ad - bc \neq 0\text{.}\) Thus, the inverse matrix is,
\begin{equation*} \begin{bmatrix} x_{11} \amp x_{12} \\ x_{21} \amp x_{22} \end{bmatrix} = \begin{bmatrix} \frac{d}{ad - bc} \amp -\frac{b}{ad - bc} \\ -\frac{c}{ad - bc} \amp \frac{a}{ad - bc} \end{bmatrix} = \frac{1}{ad - bc} \begin{bmatrix} d \amp -b \\ -c \amp a \end{bmatrix} \end{equation*}
In summary, if \(ad - bc \neq 0\text{,}\) then \(A\) is invertible, with
\begin{equation*} A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d \amp -b \\ -c \amp a \end{bmatrix} \end{equation*}
Intuitively, \(A^{-1}\) is obtained from \(A\) by exchanging the diagonal entries, negating the “off-diagonal” entries, and dividing by \(ad - bc\text{.}\)
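For example, for a specific matrix (chosen here for illustration),
\begin{equation*} \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix}^{-1} = \frac{1}{(1)(4) - (2)(3)} \begin{bmatrix} 4 \amp -2 \\ -3 \amp 1 \end{bmatrix} = -\frac{1}{2} \begin{bmatrix} 4 \amp -2 \\ -3 \amp 1 \end{bmatrix} = \begin{bmatrix} -2 \amp 1 \\ \frac{3}{2} \amp -\frac{1}{2} \end{bmatrix} \end{equation*}
and one can verify that the product with the original matrix is \(I_2\text{.}\)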
The previously explained steps could be considered a “derivation”, but a proof of this formula only requires that we verify that \(A^{-1} A = I_2\) using matrix multiplication. Indeed,
\begin{align*} A^{-1} A \amp = \frac{1}{ad - bc} \begin{bmatrix} d \amp -b \\ -c \amp a \end{bmatrix} \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix}\\ \amp = \frac{1}{ad - bc} \begin{bmatrix} ad - bc \amp 0 \\ 0 \amp ad - bc \end{bmatrix}\\ \amp = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} = I_2 \end{align*}
If \(ad - bc = 0\text{,}\) then the matrix \(A\) does not have an inverse.

Subsection 5.7.4 Solving Systems with Inverse Matrices

Inverse matrices also provide a conceptually intuitive method of solving a linear system \(A\vec{x} = \vec{b}\text{.}\) If the coefficient matrix \(A\) is invertible, then multiplying both sides of the equation on the left by \(A^{-1}\text{,}\) we can solve for \(\vec{x}\text{,}\)
\begin{align*} A^{-1} (A \vec{x}) \amp = A^{-1} \vec{b}\\ I \vec{x} \amp = A^{-1} \vec{b}\\ \vec{x} \amp = A^{-1} \vec{b} \end{align*}
Thus, \(\vec{x} = A^{-1} \vec{b}\) is the unique solution to the equation. This leads to the following theorem: if \(A\) is an invertible \(n \times n\) matrix, then for every \(\vec{b} \in \mathbb{R}^n\text{,}\) the equation \(A\vec{x} = \vec{b}\) has the unique solution \(\vec{x} = A^{-1} \vec{b}\text{.}\)
Intuitively, the uniqueness follows from the fact that matrix inverses are unique.
First, indeed \(\vec{x} = A^{-1} \vec{b}\) is a solution, as
\begin{equation*} A(A^{-1} \vec{b}) = (AA^{-1}) \vec{b} = I_n \vec{b} = \vec{b} \end{equation*}
For uniqueness, let \(\vec{u}\) be any solution, so that \(A\vec{u} = \vec{b}\text{.}\) Then, multiplying on the left by \(A^{-1}\text{,}\) \(A^{-1} A \vec{u} = A^{-1} \vec{b}\text{,}\) or \(I_n \vec{u} = A^{-1} \vec{b}\text{,}\) and so \(\vec{u} = A^{-1} \vec{b}\text{.}\)
However, the formula \(\vec{x} = A^{-1} \vec{b}\) is rarely used in practice to solve a system of equations. This is because, especially for large systems, computing inverses is very time-consuming; the alternative method of row reduction is much faster. A possible exception is for systems of two equations in two unknowns.
Instead, the importance of this theorem is conceptual: it provides a condition, invertibility, that guarantees a unique solution.
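To illustrate the practical point in code (a minimal NumPy sketch; `np.linalg.solve` solves the system directly, without forming \(A^{-1}\)):
```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 6.0])

# Conceptual method: form the inverse, then multiply
x_via_inverse = np.linalg.inv(A) @ b

# Preferred method: solve the system directly
x_via_solve = np.linalg.solve(A, b)

print(x_via_inverse)  # expect [-4.   4.5]
print(x_via_solve)    # same unique solution
```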
Recall that previously, we solved the \(2 \times 2\) system,
\begin{align*} a x_1 + b x_2 \amp = y_1 \\ c x_1 + d x_2 \amp = y_2 \end{align*}
In matrix form,
\begin{equation*} \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \end{equation*}
which has a unique solution,
\begin{equation*} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{1}{\Delta} \left(d y_1 - b y_2\right) \\ \frac{1}{\Delta} \left(a y_2 - c y_1\right) \end{bmatrix} \end{equation*}
provided that \(\Delta = ad - bc \neq 0\text{.}\) Notice that this unique solution is the product of the inverse of \(A\) and the constant vector,
\begin{equation*} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \frac{1}{\Delta} \begin{bmatrix} d \amp -b \\ -c \amp a \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \end{equation*}

Subsection 5.7.5 Inverses of Larger Matrices

The reasoning used to determine the inverse of a \(2 \times 2\) matrix involved solving a linear system with 4 equations and 4 variables. A slightly different perspective reveals how to generalize this reasoning to higher dimensions.
Again, consider the matrix equation,
\begin{equation*} \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} x_{11} \amp x_{12} \\ x_{21} \amp x_{22} \end{bmatrix} = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} \end{equation*}
where the variables are \(x_{11}, x_{12}, x_{21}, x_{22}\text{.}\) If we consider the perspective of multiplication as linear combinations of the columns of \(A\text{,}\) the product on the left-hand side is,
\begin{equation*} \begin{bmatrix} \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{21} \end{bmatrix} \amp \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} x_{12} \\ x_{22} \end{bmatrix}\end{bmatrix} \end{equation*}
So, the matrix equation requires that,
\begin{equation*} \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{21} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix} \begin{bmatrix} x_{12} \\ x_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \end{equation*}
These are two linear systems, each with two equations and two unknowns, with corresponding augmented matrices,
\begin{equation*} \left[\begin{array}{cc|c} a \amp b \amp 1 \\ c \amp d \amp 0 \end{array}\right] \qquad \text{and} \qquad \left[\begin{array}{cc|c} a \amp b \amp 0 \\ c \amp d \amp 1 \end{array}\right] \end{equation*}
At first, this seems just as complicated as a single system with 4 equations and 4 unknowns. However, the insight is that the coefficient matrix is the same for both systems. Thus, the elementary row operations used to solve one system will be the same as those used to solve the other. In this way, we can augment the systems together, and work with the augmented matrix,
\begin{equation*} \left[\begin{array}{cc|cc} a \amp b \amp 1 \amp 0 \\ c \amp d \amp 0 \amp 1 \end{array}\right] \end{equation*}
That is, \(\begin{bmatrix} A \mid I_2 \end{bmatrix}\text{.}\) Then, after performing row reduction, the resulting matrix should be of the form,
\begin{equation*} \left[\begin{array}{cc|cc} 1 \amp 0 \amp \ast \amp \ast \\ 0 \amp 1 \amp \ast \amp \ast \end{array}\right] \end{equation*}
Then, the entries of \(A^{-1}\) will be precisely the entries on the right-hand side of the augmented matrix. If \(A\) turns out not to be invertible (in the \(2 \times 2\) case, if \(ad - bc = 0\)), then the left-hand side of the augmented matrix will not reduce to \(I_2\text{,}\) but will instead have some zero rows, indicating that the systems of equations used to find the inverse do not have a solution.
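For example, for the illustrative matrix \(\begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix}\) from before, row reducing \(\begin{bmatrix} A \mid I_2 \end{bmatrix}\) gives,
\begin{equation*} \left[\begin{array}{cc|cc} 1 \amp 2 \amp 1 \amp 0 \\ 3 \amp 4 \amp 0 \amp 1 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 \amp 2 \amp 1 \amp 0 \\ 0 \amp -2 \amp -3 \amp 1 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 \amp 2 \amp 1 \amp 0 \\ 0 \amp 1 \amp \frac{3}{2} \amp -\frac{1}{2} \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 \amp 0 \amp -2 \amp 1 \\ 0 \amp 1 \amp \frac{3}{2} \amp -\frac{1}{2} \end{array}\right] \end{equation*}
and the right-hand block matches the inverse given by the \(2 \times 2\) formula.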
All of the previous reasoning applies to a general \(n \times n\) matrix \(A\text{.}\) If the inverse is \(B = \begin{bmatrix} \vec{b}_1 \amp \dots \amp \vec{b}_n \end{bmatrix}\text{,}\) then the equation \(AB = I_n\) is equivalent to the \(n\) linear systems,
\begin{equation*} A\vec{b}_1 = \vec{e}_1 \qquad A\vec{b}_2 = \vec{e}_2 \quad \dots \quad A\vec{b}_n = \vec{e}_n \end{equation*}
each with \(n\) equations and \(n\) unknowns. These systems can be combined and solved together, by row reducing the augmented matrix formed by augmenting the identity matrix onto the right-hand side,
\begin{equation*} \begin{bmatrix} A \mid I_n \end{bmatrix} \end{equation*}
If \(A\) reduces to \(I_n\text{,}\) then \(A^{-1}\) is the resulting matrix on the right-hand side.
In summary, we get an algorithm for computing matrix inverses, using row reduction. To find the inverse of \(A\text{,}\) convert it to RREF using row reduction, and then perform the same row reduction steps (in the same order) on \(I_n\text{,}\) and the result is \(A^{-1}\text{.}\) These two steps are combined by performing row reduction on the augmented matrix \(\begin{bmatrix} A \mid I_n \end{bmatrix}\text{.}\)
In short,
\begin{equation*} \begin{bmatrix} A \mid I_n \end{bmatrix} \longrightarrow \begin{bmatrix} I_n \mid A^{-1} \end{bmatrix} \end{equation*}
After using this procedure, the result can be verified by checking that \(AA^{-1} = I\text{.}\)
This procedure is an efficient method for computing the inverse of a matrix, even for large matrices, because it essentially involves only row reduction.
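The following Python sketch implements this algorithm (an illustrative, unoptimized implementation; it adds partial pivoting, a row-swapping detail beyond the description above, for numerical stability):
```python
import numpy as np

def inverse_by_row_reduction(A):
    """Compute A^{-1} by row reducing the augmented matrix [A | I_n]."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])  # the augmented matrix [A | I_n]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry in this column up
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is singular (not invertible)")
        M[[col, pivot]] = M[[pivot, col]]  # swap rows
        M[col] = M[col] / M[col, col]      # scale so the pivot entry is 1
        for row in range(n):               # clear the rest of the column
            if row != col:
                M[row] = M[row] - M[row, col] * M[col]
    return M[:, n:]  # the right-hand block is now A^{-1}

print(inverse_by_row_reduction([[1.0, 2.0], [3.0, 4.0]]))
# expect [[-2.   1. ]
#         [ 1.5 -0.5]]
```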

Subsection 5.7.6 Matrix Inverses and Elementary Matrices

The previous reasoning can be generalized and made more precise using elementary matrices.
Suppose that \(A \sim I_n\text{,}\) that is, there is a sequence of elementary row operations that converts \(A\) to \(I_n\text{.}\) In other words, there is a sequence of elementary matrices \(E_1, \dots, E_p\) (say there are \(p\) steps) such that,
\begin{equation*} A \sim E_1 A \sim E_2 (E_1 A) \sim \cdots \sim E_p(E_{p-1} \cdots E_1 A) = I_n \end{equation*}
That is,
\begin{equation*} E_p E_{p-1} \cdots E_2 E_1 A = I_n \end{equation*}
Thus, the product \(E_p \cdots E_1\) multiplies \(A\) on the left to give \(I_n\text{,}\) which means \(A\) is invertible, with \(A^{-1} = E_p \cdots E_1\text{.}\)
In other words, \(A^{-1}\) is formed by multiplying together the elementary matrices used to convert \(A\) to \(I_n\text{.}\) This is equivalently \(A^{-1} = E_p \cdots E_1 I_n\text{,}\) i.e. \(A^{-1}\) is obtained by applying the row operations \(E_1, \dots, E_p\) successively to \(I_n\text{,}\) in the same sequence required to transform \(A\) to \(I_n\text{.}\)
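As a concrete illustration (the matrix and row operations are specific choices for this example):
```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Elementary matrices for the row operations reducing A to I_2:
E1 = np.array([[1.0, 0.0], [-3.0, 1.0]])   # R2 -> R2 - 3 R1
E2 = np.array([[1.0, 0.0], [0.0, -0.5]])   # R2 -> -(1/2) R2
E3 = np.array([[1.0, -2.0], [0.0, 1.0]])   # R1 -> R1 - 2 R2

print(E3 @ E2 @ E1 @ A)   # expect I_2
print(E3 @ E2 @ E1)       # expect A^{-1} = [[-2, 1], [1.5, -0.5]]
```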
In fact, the converse is also true: if \(A\) is invertible, then \(A\) is row equivalent to \(I_n\text{.}\) In summary, \(A\) is invertible if and only if \(A \sim I_n\text{.}\)
If \(A\) is invertible, then the equation \(A\vec{x} = \vec{b}\) has a solution for every \(\vec{b}\text{,}\) and \(A\) has a pivot position in every row (\(n\) rows). Since \(A\) is square (in particular has \(n\) columns), this implies that the pivot positions are on the main diagonal. Thus, the RREF of \(A\) is \(I_n\text{,}\) or \(A \sim I_n\text{.}\)

Subsection 5.7.7 Inverse of a \(3 \times 3\) Matrix

The explicit formula for a \(3 \times 3\) matrix is more difficult to determine. Consider a \(3 \times 3\) matrix \(A\text{,}\)
\begin{equation*} A = \begin{bmatrix} a_{11} \amp a_{12} \amp a_{13} \\ a_{21} \amp a_{22} \amp a_{23} \\ a_{31} \amp a_{32} \amp a_{33} \end{bmatrix} \end{equation*}
Using the process of row reduction on a \(3 \times 3\) matrix, the formula turns out to be,
\begin{equation*} A^{-1} = \frac{1}{\Delta} \begin{bmatrix} a_{22} a_{33} - a_{23} a_{32} \amp -(a_{12} a_{33} - a_{32} a_{13}) \amp a_{12} a_{23} - a_{22} a_{13} \\ -(a_{21} a_{33} - a_{23} a_{31}) \amp a_{11} a_{33} - a_{13} a_{31} \amp -(a_{11} a_{23} - a_{13} a_{21}) \\ a_{21} a_{32} - a_{22} a_{31} \amp -(a_{11} a_{32} - a_{12} a_{31}) \amp a_{11} a_{22} - a_{12} a_{21} \end{bmatrix} \end{equation*}
where,
\begin{equation*} \Delta = a_{11} a_{22} a_{33} - a_{11} a_{23} a_{32} + a_{12} a_{23} a_{31} - a_{12} a_{21} a_{33} + a_{13} a_{21} a_{32} - a_{13} a_{22} a_{31} \end{equation*}
Then, \(A\) is invertible if and only if \(\Delta \neq 0\text{.}\) This explicit formula is not to be memorized. There are many patterns in the entries of this matrix, but the full insight can only be understood after considering the later topic of determinants.
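The formula can be spot-checked numerically, for instance by comparing it against NumPy's built-in inverse (a sketch; the indices below are zero-based):
```python
import numpy as np

def inverse_3x3(a):
    """3x3 inverse via the explicit adjugate formula above (a indexed a[i,j], zero-based)."""
    delta = (a[0,0]*a[1,1]*a[2,2] - a[0,0]*a[1,2]*a[2,1]
             + a[0,1]*a[1,2]*a[2,0] - a[0,1]*a[1,0]*a[2,2]
             + a[0,2]*a[1,0]*a[2,1] - a[0,2]*a[1,1]*a[2,0])
    adj = np.array([
        [a[1,1]*a[2,2] - a[1,2]*a[2,1], -(a[0,1]*a[2,2] - a[2,1]*a[0,2]), a[0,1]*a[1,2] - a[1,1]*a[0,2]],
        [-(a[1,0]*a[2,2] - a[1,2]*a[2,0]), a[0,0]*a[2,2] - a[0,2]*a[2,0], -(a[0,0]*a[1,2] - a[0,2]*a[1,0])],
        [a[1,0]*a[2,1] - a[1,1]*a[2,0], -(a[0,0]*a[2,1] - a[0,1]*a[2,0]), a[0,0]*a[1,1] - a[0,1]*a[1,0]],
    ])
    return adj / delta

A = np.random.rand(3, 3)  # random matrices are invertible with probability 1
print(np.allclose(inverse_3x3(A), np.linalg.inv(A)))  # expect True
```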

Subsection 5.7.8 Properties of Inverse Matrices

First, if \(A\) is invertible, then \(A^{-1}\) is itself invertible, with \((A^{-1})^{-1} = A\text{.}\) Indeed, by definition, the inverse of \(A^{-1}\) is the matrix \(B\) such that \(A^{-1} B = BA^{-1} = I_n\text{,}\) and clearly \(A\) satisfies this, as \(A^{-1} A = A A^{-1} = I_n\text{.}\)
Next, if \(A\) and \(B\) are invertible \(n \times n\) matrices, then the product \(AB\) is invertible, with \((AB)^{-1} = B^{-1}A^{-1}\) (note the reversed order). Indeed,
\begin{equation*} (AB)(B^{-1} A^{-1}) = A(BB^{-1})A^{-1} = A I_n A^{-1} = AA^{-1} = I_n \end{equation*}
Intuitively, considering matrices \(A, B\) as linear transformations, this statement follows from the fact that the inverse of a composition of functions is the composition of the inverse functions in reverse order, or \((f \circ g)^{-1} = g^{-1} \circ f^{-1}\text{.}\)
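A quick numerical sanity check of this property (a sketch; random matrices are invertible with probability 1, so the singular case is ignored here):
```python
import numpy as np

A = np.random.rand(3, 3)
B = np.random.rand(3, 3)

# (AB)^{-1} should equal B^{-1} A^{-1}, with the order reversed
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))  # expect True
```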
This property can be generalized: \((A_1 A_2 \cdots A_n)^{-1} = A_n^{-1} A_{n-1}^{-1} \cdots A_1^{-1}\text{.}\)
Intuitively, the inverses will “cancel out” from inside out,
\begin{align*} \amp (A_1 A_2 \cdots A_{n-1} A_n)(A_n^{-1} A_{n-1}^{-1} \cdots A_2^{-1} A_1^{-1})\\ \amp = A_1 A_2 \cdots A_{n-1} (A_n A_n^{-1}) A_{n-1}^{-1} \cdots A_2^{-1} A_1^{-1}\\ \amp = A_1 A_2 \cdots A_{n-2} (A_{n-1} A_{n-1}^{-1}) A_{n-2}^{-1} \cdots A_2^{-1} A_1^{-1}\\ \amp = \dots\\ \amp = A_1 (A_2 A_2^{-1}) A_1^{-1}\\ \amp = A_1 A_1^{-1}\\ \amp = I_n \end{align*}
A more precise argument uses mathematical induction.

Exercises 5.7.9 Exercises

1.

In Python, implement a program to compute the inverse of an arbitrary square matrix, using row reduction.