Gaussian Elimination and Matrices

Section 2.4 Gaussian Elimination and Matrices

Then, Gaussian elimination can be reframed in terms of modifying rows of the augmented matrix of a system.

Subsection 2.4.1 Elementary Row Operations

Definition 2.4.1. Elementary Row Operation.

Interchange. Interchange two rows in the matrix.
Scaling. Multiply (or divide) an equation by a non-zero real number.
Replacement. Add to one equation a multiple of another equation.

These row operations are denoted symbolically by,

\(R_i \leftrightarrow R_j\text{,}\) interchanging row \(i\) and row \(j\text{.}\)
\(\displaystyle kR_i \rightarrow R_i\)
\(\displaystyle R_i + k R_j \rightarrow R_i\)

Performing Gaussian elimination with this matrix form removes the need of repeatedly writing the variables. This process on a matrix is also called row reduction.

Two matrices are row equivalent if one can be obtained from the other by a (finite) sequence of row oeprations.
A zero row has only zeros. A non-zero row is a row with at least one non-zero entry.
The leading entry of a non-zero row is the left-most non-zero entry. It is also called the leading coefficient, or pivot.

Subsection 2.4.2 Row Echelon Form

After Gaussian elimination is performed on a matrix, the staircase or triangular form of the matrix is called row echelon form.

Definition 2.4.2.

A matrix is in row echelon form (or simply echelon form, or REF) if,

Any zero rows are the bottom of the matrix.
the leading entry of every non-zero row is always strictly to the right of any leading entry above it.

Matrices in row echelon form have the broad form,

\begin{equation*} \begin{bmatrix} a \amp \ast \amp \ast \amp \ast \amp \ast \amp \ast \\ 0 \amp b \amp \ast \amp \ast \amp \ast \amp \ast \\ 0 \amp 0 \amp 0 \amp c \amp \ast \amp \ast \\ 0 \amp 0 \amp 0 \amp 0 \amp 0 \amp d \\ 0 \amp 0 \amp 0 \amp 0 \amp 0 \amp 0 \end{bmatrix} \end{equation*}

\(a, b, c, d\)

Every entry below a pivot is 0.
The leading entries are “staggered to the right” as we go down.
All zero rows are at the bottom.

The terminology echelon is due to the step-like pattern of the non-zero entries of the matrix. The process of transforming a matrix into row echelon form is called row reduction. In summary,

\begin{equation*} \begin{pmatrix} \text{doing operations on a} \\ \text{system of equations} \\ \text{to convert it to triangular form} \end{pmatrix} \quad \iff \quad \begin{pmatrix} \text{doing row operations on a} \\ \text{augmented matrix} \\ \text{to convert it to row echelon form} \end{pmatrix} \end{equation*}

Definition 2.4.3.

A pivot position in a matrix \(A\) is a location which corresponds to a leading entry in the row echelon form of \(A\text{.}\) A pivot column is a column of \(A\) which contains a pivot position.

Subsection 2.4.3 Summary of Row Reduction

Consider a matrix \(A\text{.}\) Broadly, row reduction involves going column by column, reducing the matrix to a simpler matrix at each step.

First, determine the left-most non-zero column, which will be the first pivot column. In this column, choose a non-zero entry to be the pivot entry. If necessary, use a row exchange to positive the pivot at the top of the column.
\begin{equation*} \begin{bmatrix} \mid \amp \mid \amp a \amp \ast \amp \ast \amp \ast \\ 0 \amp 0 \amp \ast \amp \ast \amp \ast \amp \ast \\ \mid \amp \mid \amp \ast \amp \ast \amp \ast \amp \ast \end{bmatrix} \end{equation*}
In terms of completing the algorithm, it doesn't matter which non-zero entry we select to be the pivot. For hand-computation, it is typically easier to select a 1 if that is available, as that will make the computations a bit more simple.
Use row replacement operations to create zeros in all positions below the pivot. That is, add multiples of the top row to rows below it as to remove all entries below the pivot entry.
\begin{equation*} \begin{bmatrix} \mid \amp \mid \amp a \amp \ast \amp \ast \amp \ast \\ 0 \amp 0 \amp 0 \amp \ast \amp \ast \amp \ast \\ \mid \amp \mid \amp 0 \amp \ast \amp \ast \amp \ast \end{bmatrix} \end{equation*}
Repeat the previous steps with the remaining submatrix below and to the right of the pivot,
\begin{equation*} \begin{bmatrix} \mid \amp \mid \amp a \amp \ast \amp \ast \amp \ast \\ 0 \amp 0 \amp 0 \amp b \amp \ast \amp \ast \\ \mid \amp \mid \amp 0 \amp 0 \amp \ast \amp \ast \end{bmatrix} \qquad \longrightarrow \qquad \underbrace{\begin{bmatrix} b \amp \ast \amp \ast \\ 0 \amp \ast \amp \ast \end{bmatrix}}_{\text{submatrix}} \end{equation*}
(In this way, row reduction is a recursive algorithm, as row reduction of a matrix depends on performing row reduction on some smaller matrix). Repeat this step until there are no more non-zero rows to modify.

Finally, to convert the matrix to RREF,

Starting with the right-most pivot, use replacement row operations to create zeros above each pivot. If a pivot is not 1, make it 1 using a scaling operation.

Steps 1-3 to convert a matrix to REF is called the forward phase of Gaussian elimination. Step 4 to convert it to the unique RREF is called the backwards phase.

Subsection 2.4.4 Remark on Gaussian Elimination

Gaussian elimination, despite its simplicity, is one of the most important algorithms in linear algebra. Notice that Gaussian elimination is not really much more than high school algebra, but done in a systematic way.

Subsection 2.4.5 Systems with Infinitely Many Solutions

If the REF of the augmented matrix of a system has a zero row, then the system has infinitely many solutions.

Example 2.4.4.

Consider the matrix in REF,

\begin{equation*} \begin{bmatrix} 1 \amp 0 \amp -5 \amp 1 \\ 0 \amp 1 \amp 1 \amp 4 \\ 0 \amp 0 \amp 0 \amp 0 \end{bmatrix} \end{equation*}

which is the augmented matrix of the linear system,

\begin{equation*} \begin{array}{rrrrr} x \amp \amp -5z \amp = \amp 1 \\ \amp y \amp + z \amp = \amp 4 \\ \amp \amp 0 \amp = \amp 0 \end{array} \end{equation*}

At this point, notice that the third equation is the true statement (or tautology) \(0 = 0\text{.}\) Intuitively, this means that while there were originally 3 equations that constrained the possible solutions, one of those equations were superfluous. Then, any solution \((x,y,z)\) only has to satisfy the remaining two equations. In short, there are 3 variables, but only 2 constraints. In this case, for example, if we choose an arbitrary value of \(z\text{,}\) say \(z = t\text{,}\) then we can solve for \(x\) and \(y\) in equation 1 and 2, respectively, and get a solution of the system. In particular,

\begin{equation*} \begin{cases} x = 1 + 5t \\ y = 4 - t \\ z = t \end{cases} \end{equation*}

Then, any tuple of the form \((1+5t,4-t,t)\) (where \(t \in \mathbb{R}\) is any number) will be a solution to the system. In particular, there are infinitely many solutions of the system. Here, \(z\) is called a free variable, because it can take on any value to generate a solution of the system.

Definition 2.4.5.

Variables which correspond to pivot columns are called basic variables. The other variables are called free variables.

In general, for the augmented matrix of a system in REF, any non-pivot column corresponds to a free variable (besides the right-most constant column). Free variables can take on any value, and then we can solve for the basic variables in terms of the free variables.

Subsection 2.4.6 Systems with No Solution

If the REF of the augmented matrix of a system has a row which corresponds to a false statement, then the system has no solution.

Example 2.4.6.

Consider the matrix in REF,

\begin{equation*} \begin{bmatrix} 1 \amp -3 \amp 1 \amp 4 \\ 0 \amp -1 \amp -4 \amp 7 \\ 0 \amp 0 \amp 0 \amp 2 \end{bmatrix} \end{equation*}

which corresponds to the linear system,

\begin{equation*} \begin{array}{rrrrr} x \amp -3y \amp + z \amp = \amp 1 \\ \amp -y \amp -4 z \amp = \amp 4 \\ \amp \amp 0 \amp = \amp 2 \end{array} \end{equation*}

The 3rd equation is the false statement (or, contradiction) \(0 = 2\text{,}\) so the system has no solution. This is because, no matter what the values of \(x, y, z\text{,}\) it is not possible for the third equation to be satisfied. A solution of the system must make every equation true, so no such solution exists. Thus, the system is inconsistent.

Subsection 2.4.7 Reduced Row Echelon Form

The row ecehlon form of a matrix is not unique, in that, by doing different sequences of row operations, the final row echelon forms can be different. However, we can continue to apply row operations in order to simplify the matrix further into reduced row ecehlon form.

Definition 2.4.7.

A matrix is in reduced row echelon form (RREF) if,

The matrix is in REF.
Every pivot is 1.
Every pivot is the only non-zero entry in its column.

In short, RREF goes further than REF by requiring that every pivot is 1 (rather than just any number), and that there are all 0's below and above each pivot.

Example 2.4.8.

A classic example of a matrix in RREF is a matrix of the form,

\begin{equation*} \begin{bmatrix} 1 \amp 0 \amp 0 \amp a \\ 0 \amp 1 \amp 0 \amp b \\ 0 \amp 0 \amp 1 \amp c \end{bmatrix} \end{equation*}

This represents a system of 3 linear equations in 3 unknowns. In particular the system,

\begin{equation*} \begin{array}{rrrrr} 1x \amp + 0y \amp + 0z \amp = \amp a \\ 0x \amp + 1y \amp + 0z \amp = \amp b \\ 0x \amp + 0y \amp + 1z \amp = \amp c \end{array} \end{equation*}

In other words,

\begin{equation*} \begin{array}{rrr} x \amp = \amp a \\ y \amp = \amp b \\ z \amp = \amp c \end{array} \end{equation*}

from which the unique solution can be read right off as \((a,b,c)\text{.}\)

Basically, RREF “builds in” the back substitution into the row reduction procedure.

Example 2.4.9.

Consider the system,

\begin{equation*} \begin{array}{rrrrr} 2x \amp + y \amp + z \amp = \amp 1 \\ 4x \amp + y \amp \amp = \amp -2 \\ -2x \amp + 2y \amp + z \amp = \amp 7 \end{array} \end{equation*}

which has augmented matrix,

\begin{equation*} \left[\begin{array}{ccc|c} 2 \amp 1 \amp 1 \amp 1 \\ 4 \amp 1 \amp 0 \amp -2 \\ -2 \amp 2 \amp 1 \amp 7 \end{array}\right] \end{equation*}

After reducing to REF, one possible form is,

\begin{equation*} \left[\begin{array}{ccc|c} 2 \amp 1 \amp 1 \amp 1 \\ 0 \amp -1 \amp -2 \amp -4 \\ 0 \amp 0 \amp -4 \amp -4 \end{array}\right] \end{equation*}

We could perform back substitution as before, but we can also instead convert to RREF. Doing so gives,

\begin{equation*} \left[\begin{array}{ccc|c} 1 \amp 0 \amp 0 \amp -1 \\ 0 \amp 1 \amp 0 \amp 2 \\ 0 \amp 0 \amp 1 \amp 1 \end{array}\right] \end{equation*}

from which the unique solution \((-1,2,1)\) can be read right off.

Theorem 2.4.10. Uniqueness of RREF.

Every matix is row equivalent to a unique matrix in RREF.

Proof.

EXERCISE.

The process of converting a matrix into RREF is sometimes called Gauss-Jordan elimination, due to Gauss and also Wilhelm Jordan.

Subsection 2.4.8 Existence and Uniqueness, Summary

Theorem 2.4.11.

A linear system is consistent if and only if the right-most column of its augmented matrix is not a pivot column. In other words, if and only if the REF of the augmented matrix has no row of the form,

\begin{equation*} \begin{bmatrix} 0 \amp \dots \amp 0 \amp b \end{bmatrix} \qquad \text{where } \end{equation*}

If a linear system in consistent, then the solution set is either:

A unique solution, if there are no free variables.
Infinitely many solutions, if there are at least one free variable.

Subsection 2.4.9 Summary of Solving Systems with Gaussian Elimination

Determine the augmented matrix of the system \(\begin{bmatrix} A \mid b \end{bmatrix}\text{.}\)
Convert the matrix into an equivalent matrix in REF, using row reduction. If the system is inconsistent, then we are done.
If the system is consistent, then convert the matrix into RREF.
Write the system of equations corresponding to the matrix in RREF.
Solve for each basic variable in terms of any free variables.

Subsection 2.4.10 Historical Note

Gaussian elimination is named after German mathematician Carl Friedrich Gauss (1777-1855), who developed the method in his work and helped develop the formal theory of matrices. German engineer Wilhelm Jordan (1842-1899) helped popularize the method.

However, similar methods of elimination were developed as early as around 250-100 BC by the ancient Chinese, during the Han dynasty. The method is included in the book “The Nine Chapters on the Mathematical Art” (Jiuzhang suanshu), one of the earliest surviving mathematical texts from China. The ancient chinese also had a device called a counting board, which contained the array of coefficients, and allowed for calculations similar to that with matrices.

Subsection 2.4.11 Number of Operations for Row Reduction (Flops)

Example 2.4.12.

Consider the number of operations required to perform row reduction. Let \(A\) be an \(n \times n\) matrix. To consider the worst case, we will assume that \(A\) has all non-zero entries (if the matrix has some zero entries, this reduces the computations required).

The first step of elimination requires making zeros in the first column, through row replacement operations. Each row replacement operation (\(R_i + k R_j \rightarrow R_i\)) basically involves a single multiplication (of \(k\) times an entry in \(R_j\)) and a single addition (\(R_i + k R_j\)). For simplicity, we will assume that each operation (addition, subtraction, multiplication, division) all take roughly the same amount of time.

We call a single one of these operations a flop (short for floating point operation). Traditionally, flop referred only to a multiplication or division, because these operations took somewhat more time and the more simple operations of addition and subtraction, and so the latter could be ignored, relatively speaking. With modern technology, all operations take about the same time.

Then, modifying a single entry with a replacement row operation requires 2 flops. Then, the first step in row reduction requires modifying \(n-1\) rows of the matrix (every row except the pivot row), and each of these rows has \(n\) entries. So, the first step requires \(2n(n-1)\) flops.

The second step involves a submatrix of size \((n-1) \times (n-1)\text{.}\) Here, row reduction involves \(n-2\) rows, each with \(n-1\) entries, for a total for \(2(n-1)(n-2)\) flops.

Continuing in this way, the subsequent steps require \(2(n-2)(n-3), 2(n-3)(n-4), \dots\) steps. The final step, acting on a \(2 \times 2\) submatrix, requries \(2 \cdot 2 \cdot 1\) flops. Thus, the total number of flops is,

\begin{equation*} = 2n(n-1) + 2(n-1)(n-2) + \dots + 2 \cdot 3 \cdot 2 + 2 \cdot 2 \cdot 1 \end{equation*}

In summation notation, this can be written as (in increasing order),

\begin{equation*} = \sum_{k=1}^{n-1} 2 k(k + 1) \end{equation*}

This sum can be evaluated to find an explicit formula,

\begin{align*} \amp = 2 \sum_{k=1}^{n-1} (k^2 + k)\\ \amp = 2 \brac{\sum_{k=1}^{n-1} k^2 + \sum_{k=1}^{n-1} k} \end{align*}

Then, we can apply the summation formulas,

\begin{equation*} \sum_{k=1}^n k^2 = \frac{n(n+1)(2n+1)}{6} \qquad \sum_{k=1}^n k = \frac{n(n+1)}{2} \end{equation*}

Here, the upper limit of the summation is \(n-1\) rather than \(n\text{,}\) so we substitute \(n-1\) for \(n\) when applying the formulas,

\begin{align*} \amp = 2 \brac{\frac{(n-1) \cdot n \cdot (2(n-1)+1)}{6} + \frac{(n-1) \cdot n}{2}} \end{align*}

Then, simplifying,

\begin{align*} \amp = 2 \brac{\frac{n(n-1)(2n-1) + 3n(n-1)}{6}}\\ \amp = 2 \brac{\frac{n(n-1)((2n-1)+3)}{6}}\\ \amp = 2 \brac{\frac{n(n-1)(2n+2)}{6}}\\ \amp = \frac{2n(n-1)(n+1)}{3} \end{align*}

Then, consider the asymptotic behavior. When \(n\) is large,

\begin{equation*} \frac{2n(n-1)(n+1)}{3} \approx \frac{2n^3}{3} \end{equation*}

In particular, about \(n^3\) operations.

Next, consider back substitution. Here, the first step requires making a 0 everywhere above the bottom-right entry in the \(n\)th column. There are \(n-1\) entries, and again each uses a single row replacement, which is made up of 2 flops. So, \(2(n-1)\) flops. The next step involves \(n-2\) entries, and so has \(2(n-2)\) flops. Continuing in this way, the last step involving changing the single entry about the pivot in the \(2nd\) column, for \(2 \cdot 1\) flops. Then, in total, the number of flops is,

\begin{align*} \amp = 2(n-1) + 2(n-2) + \dots + 2 \cdot 2 + 2 \cdot 1\\ \amp = \sum_{k=1}^{n-1} 2k\\ \amp = 2 \sum_{k=1}^{n-1} k\\ \amp = 2 \cdot \frac{n(n-1)}{2}\\ \amp = n(n-1) \end{align*}

As \(n\) becomes large, this is approximately \(n^2\) flops.

Subsection 2.4.12 Computing Note

A computer program performing row reduction with floating point numbers has to minimize rounding error when performing numerical operations. For this reason, they will typically choose pivot rows using not only by the necessary condition that the pivot entry is non-zero, but additionally it will choose the largest entry in the column, in absolute value. In other words, they will avoid using pivots which are close to 0.

Exercises 2.4.13 Exercises

1.

In Python, implement a Gaussian elimination algorithm, to be used on a matrix formed by a 2D array.

2.

In Python, extend the Gaussian elimination algorithm to perform Gauss-Jordan elimination.

Linear Algebra Notes

Search Results: