SCLA QR (Gram-Schmidt) Decomposition

Section 2.2 QR (Gram-Schmidt) Decomposition

The QR decomposition begins with any matrix and describes it as a product of a unitary matrix (\(Q\)) and an upper triangular matrix (\(R\)). Hence the customary shorthand name, “QR”. If the LU decomposition is reminiscent of reduced row-echelon form, then the QR decomposition is reminiscent of the Gram-Schmidt process (see Subsection Subsection 0.GSP).

Subsection 2.2.1 QR Decomposition via Gram-Schmidt

The Gram-Schmidt procedure is based on Theorem GSP. We begin with a set of linearly independent vectors and progressively convert them into a new set of nonzero vectors that form an orthogonal set, which can be easily converted to an orthonormal set. An orthonormal set is a wondrous thing, so the other portion of the conclusion is often overlooked. The new set of vectors spans the same subspace as the space spanned by the original set. This is half the reason that the Gram-Schmidt procedure is so useful.

As a preview of our main theorem, let's convert this idea into the language of matrices for a special case. Let \(A\) be a square matrix of size \(n\text{.}\) Take the columns of \(A\) as a set of vectors, which will form a basis of \(\complex{n}\text{.}\) Apply the Gram-Schmidt procedure to this set of \(n\) vectors. The manufacture of the \(i\)-th vector of this new set is the \(i\)-th vector of the original set, added to a linear combination of vectors \(1\) through \(i-1\) of the new set. If we recursively unpack these linear combinations, we can express each new vector as a linear combination of vectors \(1\) through \(i\) of the original set, where the \(i\)-th vector has a coefficient of \(1\text{.}\) Record the scalars of this linear combination in a column vector, whose last nonzero entry is \(1\text{.}\) Make these column vectors the columns of a square matrix, \(R^\prime\text{,}\) of size \(n\text{.}\) Define \(Q=AR^\prime\text{.}\)

By the Gram-Schmidt procedure, the columns of \(Q\) are an orthogonal set of nonzero vetors, and so \(\adjoint{Q}Q\) will be a diagonal matrix with nonzero entries. The matrix \(R^\prime\) is square, upper-triangular, and each diagonal entry is \(1\text{.}\) Hence \(R^\prime\) is invertible, so let \(R\) denote the inverse, which is again upper-triangular with diagonal entries equal to \(1\text{.}\) We then obtain \(A=QR\text{.}\) It is a simple matter to scale the columns of \(Q\) to form an orthonormal set, and the requisite scaling of the columns of \(R^\prime\) will not impede the existence of \(R\text{,}\) though we can only claim diagonal entries are nonzero. In this way, we can claim that \(Q\) is a unitary matrix.

A QR decomposition can be created for any matrix — it need not be square and it need not have full rank. The matrix \(Q\) is unitary, and \(R\) is upper triangular. Thus, each column of \(A\) can be expressed as a linear combination of the columns of \(Q\text{,}\) which form an orthonormal basis. So the column space of \(A\) is spanned by an orthonormal subset of the columns of \(Q\text{,}\) giving us the essence of the Gram-Schmidt procedure without the hypothesis that our original set is linearly independent. For the statement of Theorem GSP, it was a convenience to hypothesize that \(S\) is linearly independent. Can you examine the proof and see what changes are required if we lift this hypothesis?

We now state, and prove, a sequence of theorems which solidify the discussion above and generalize to rectangular matrices that may not have full rank.

Theorem 2.2.1.

Suppose that \(A\) is an \(m\times n\) matrix of rank \(n\text{.}\) Then there exists an \(m\times n\) matrix \(Q\) whose columns form an orthonormal set, and an upper-triangular matrix \(R\) of size \(n\) with positive diagonal entries, such that \(A = QR\text{.}\)

Proof.

Outline: Use Gram-Schmidt to successively build the columns. Scale each column by positive/negative norm to get positive diagonal entries in \(R\text{.}\)

Notice that the rank \(n\) condition of Theorem Theorem 2.2.1 necessarily implies \(m\geq n\text{.}\) We can expand \(Q\) and \(R\) simultaneously to get a decomposition where \(Q\) is a unitary matrix.

The column space of a matrix is an important property of a matrix. For example, the column space of the coefficient matrix of a system of equations is the set of vectors that can be used to form a consistent system when paired with the coefficient matrix (Theorem CSCS). Not only does \(Q\) have a column space equal to that of \(A\text{,}\) the first \(i\) columns of \(Q\) are a basis for the space spanned by the first \(i\) columns of \(A\text{,}\) for \(1\leq i\leq n\text{.}\)

Theorem 2.2.2.

Suppose that \(A\) is an \(m\times n\) matrix of rank \(n\text{.}\) Then there exists a unitary matrix \(Q\) of size \(m\) and an upper-triangular \(m\times n\) matrix \(R\) with positive diagonal entries such that \(A = QR\text{.}\)

Proof.

Begin with a decomposition \(A=Q^\prime R^\prime\) as given by Theorem Theorem 2.2.1. Create the matrix \(Q\) by adding \(m-n\) columns to \(Q^\prime\) by the following process. Find a vector outside the span of the current set of columns (Theorem ELIS). Apply the Gram-Schmidt procedure (Theorem GSP) to the set of columns plus this one new vector. The current columns will be unaffected by an additional application of the Gram-Schmidt procedure (so in a practical application it is unneccessary). The one additional vector will be orthogonal to the others, and can be scaled to have norm \(1\text{.}\) add this vector as a new column of \(Q\text{,}\) preserving the property that the columns are an orthonormal set.

For each of the \(m-n\) new columns, add a new zero row to \(R^\prime\) creating the \(m\times n\) matrix \(R\text{,}\) with \(QR = Q^\prime R^\prime = A\text{.}\)

The decomposition of Theorem Theorem 2.2.1 is referred to as a thin decomposition, while the decomposition of Theorem Theorem 2.2.2 is referred to as a full decomposition.

What happens if \(A\) does not have full rank? Then there will be some relation of linear dependence on the columns of the matrix. In the course of working the Gram-Schmidt procedure, as in the construction given for the proof of Theorem Theorem 2.2.1, this will be discovered when the newly created vector is the zero vector. No matter, we simply add any vector as the next column of \(Q\) which will preserve the columns as an orthonormal set. This vector can be determined in a fashion entirely similar to the device used in the proof of Theorem Theorem 2.2.2. However, this nearly arbitrary choice outside the span of the current set of columns, requires that we add a row of zeros in \(R\) and lose our positive diagonal entry. The ultimate price for this is that certain uniqueness results [provisional cross-reference: uniqueness of QR] no longer hold.

For the most general case of a QR decomposition for a rectangular matrix of arbitrary rank, we could fashion a proof based on the discussion of the previous paragraph. However, while Gram-Schmidt provides a good theoretical grounding for the QR decomposition, in numerical contexts its performance is weak ([provisional cross-reference: TB, Lecture 7 to 10]). Instead we give a constructive proof based on Householder reflections.

Subsection 2.2.2 QR Decomposition via Householder Reflections

Householder reflections provide a usefull tool for creating the zero entries required in each column of \(R\text{,}\) and will provide an algorithm with better performance (speed and accuracy), compared to the more intuitive approach of the Gram-Schmidt procedure.

Theorem 2.2.3.

Suppose that \(A\) is an \(m\times n\) matrix. Then there exists a unitary matrix \(Q\) of size \(m\) and an upper-triangular \(m\times n\) matrix \(R\) such that \(A = QR\text{.}\)

Proof.

We proceed by induction on \(n\text{.}\) When \(n=1\) form the Householder matrix, \(P\text{,}\) which will convert all of the entries of the lone column of \(A\) into zeros, except the first one (Theorem Theorem 1.5.4). Denote the resulting column vector as the matrix \(R\text{,}\) which is upper triangular. Define \(Q=\adjoint{P}\text{,}\) which is unitary. Then \(A = \adjoint{P}PA=QR\text{.}\)

Now consider general \(n\text{.}\) Let \(\hat{A}\) be the first \(n-1\) columns of \(A\text{,}\) so by induction there is a unitary matrix \(\hat{Q}\) of size \(m\) and an upper-triangular matrix \(\hat{R}\) providing a QR decomposition of \(\hat{A}\text{.}\) Partition \(\hat{R}\) into a square upper triangular matrix \(R_1\) comprised of the first \(n-1\) rows of \(\hat{R}\text{,}\) leaving a second matrix with \(m-n+1\) rows and zero in every entry. Let \(\vect{v}\) denote the final column of \(A\) and compute \(\vect{w}=\adjoint{\hat{Q}}\vect{v}\text{.}\) Partition \(\vect{w}\) into two pieces by denoting the first \(n-1\) entries as \(\vect{w}_1\text{,}\) and entries \(n\) through \(m\) as \(\vect{w}_2\text{.}\) Compute the Householder matrix, \(P\text{,}\) of size \(m-n+1\) that takes \(\vect{w}_2\text{,}\) to a multiple of \(\vect{e}_1\) (Theorem Theorem 1.5.4). We have all the pieces in place, so now observe:

\begin{align*} \left[\begin{array}{c|c}I_{n-1}&0\\\hline0&P\end{array}\right] \adjoint{\hat{Q}}A &=\left[\begin{array}{c|c}I_{n-1}&0\\\hline0&P\end{array}\right] \adjoint{\hat{Q}} \left[\begin{array}{c|c}\hat{A}&\vect{v}\end{array}\right]\\ &= \left[\begin{array}{c|c}I_{n-1}&0\\\hline0&P\end{array}\right] \left[\begin{array}{c|c}\adjoint{\hat{Q}}\hat{A}&\adjoint{\hat{Q}}\vect{v}\end{array}\right]\\ &= \left[\begin{array}{c|c}I_{n-1}&0\\\hline0&P\end{array}\right] \left[\begin{array}{c|c}\hat{R}&\vect{w}\end{array}\right]\\ &= \left[\begin{array}{c|c}I_{n-1}&0\\\hline0&P\end{array}\right] \left[\begin{array}{c|c}R_1&\vect{w}_1\\\hline0&\vect{w}_2\end{array}\right]\\ &= \left[\begin{array}{c|c}R_1&\vect{w}_1\\\hline0&P\vect{w}_2\end{array}\right] \end{align*}

Notice that the first two matrices in these equations are unitary, and hence so is their product. Because of the action of the Householder matrix, the final matrix is upper triangular. If we move the unitary matrix to the other side of the equation, with an inverse (adjoint), we arrive at a QR decomposition of \(A\) and complete the induction step.

The inductive proof of Theorem Theorem 2.2.3 will automatically provide a recipe for a recursive procedure to compute a QR decomposition. But recursion is rarely, if ever, a good idea for efficiency. Fortunately, the proof suggests a better procedure. Work on \(A\) column-by-column, progressively using Householder matrices to “zero out” each column below the diagonal entry. The construction of such a Householder matrix will require a nonzero entry in the diagonal entry. A row swap, accomplished by a (unitary) permutation matrix, can move a nonzero entry from elsewhere in the column. Of course, if the whole remainder of the column is all zeros, then we can just move on to the next column.

Notice how the unitary matrices change over the course of these iterations. In later steps each unitary matrix has a larger identity matrix as the block in the upper-left corner of the matrix. So if a product of these matrices is required, it can be more efficient to begin building up the product starting with the last unitary matrix. Of course, order matters in matrix multiplication, and maybe you need the product in the reverse order. No matter, compute the transpose of the desired matrix, which necessarily will reverse the order of the product. Once the product is concluded, then transpose the result.

Example 2.2.4.

We illustrate the algorithm suggested by the proof of Theorem Theorem 2.2.3 on a \(4\times 4\) nonsingular matrix \(A\text{.}\) All of our computations were performed in Sage using algebraic numbers, so there were no approximations of irrational square roots as floating point numbers. However, we do give each matrix here with the final exact entries displayed as approximations with a limited number of places. At step \(i\text{,}\) \(1\leq i\leq 3\text{,}\) we display the unitary Householder matrix, \(Q_i\text{,}\) and the partially upper triangular matrix \(R_i=Q_i\dots Q_1A\text{.}\)

\begin{equation*} A= \begin{bmatrix} 4 & -5 & -7 & -4 \\ 1 & -1 & -1 & -2 \\ 0 & 0 & 1 & -1 \\ -1 & 5 & 8 & -8 \end{bmatrix} \end{equation*}

\begin{align*} i&=1 & Q_1&= \begin{bmatrix} 0.94 & 0.24 & 0.0 & -0.24 \\ 0.24 & 0.03 & 0.0 & 0.97 \\ 0.0 & 0.0 & 1.0 & 0.0 \\ -0.24 & 0.97 & 0.0 & 0.03 \end{bmatrix} & R_1&= \begin{bmatrix} 4.24 & -6.13 & -8.72 & -2.36 \\ 0.0 & 3.65 & 6.09 & -8.77 \\ 0.0 & 0.0 & 1.0 & -1.0 \\ 0.0 & 0.35 & 0.91 & -1.23 \end{bmatrix}\\ i&=2 & Q_2&= \begin{bmatrix} 1.0 & 0.0 & 0.0 & 0.0 \\ 0.0 & 0.99 & 0.0 & 0.01 \\ 0.0 & 0.0 & 1.0 & 0.0 \\ 0.0 & 0.01 & 0.0 & -0.99 \end{bmatrix} & R_2&= \begin{bmatrix} 4.24 & -6.13 & -8.72 & -2.36 \\ 0.0 & 3.67 & 6.15 & -8.85 \\ 0.0 & 0.0 & 1.0 & -1.0 \\ 0.0 & 0.0 & -0.32 & 0.39 \end{bmatrix}\\ i&=3 & Q_3&= \begin{bmatrix} 1.0 & 0.0 & 0.0 & 0.0 \\ 0.0 & 1.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.95 & -0.31 \\ 0.0 & 0.0 & -0.31 & -0.95 \end{bmatrix} & R_3&= \begin{bmatrix} 4.24 & -6.13 & -8.72 & -2.36 \\ 0.0 & 3.67 & 6.15 & -8.85 \\ 0.0 & 0.0 & 1.05 & -1.07 \\ 0.0 & 0.0 & 0.0 & -0.06 \end{bmatrix} \end{align*}

So,

\begin{equation*} R=R_3= \begin{bmatrix} 4.2426 & -6.1283 & -8.721 & -2.357 \\ 0.0 & 3.6667 & 6.1515 & -8.8485 \\ 0.0 & 0.0 & 1.0504 & -1.0701 \\ 0.0 & 0.0 & 0.0 & -0.0612 \end{bmatrix} \end{equation*}

and then \(R=Q_3Q_2Q_1A\text{,}\) so

\begin{equation*} Q=\adjoint{Q_1}\adjoint{Q_2}\adjoint{Q_3}= \begin{bmatrix} 0.9428 & 0.2121 & -0.0787 & -0.2448 \\ 0.2357 & 0.1212 & 0.2951 & 0.918 \\ 0.0 & 0.0 & 0.952 & -0.306 \\ -0.2357 & 0.9697 & -0.0197 & -0.0612 \end{bmatrix} \end{equation*}

Notice how the product for \(Q\) involves progressively simpler matrices (bigger identity matrix blocks) moving from right to left.

Subsection 2.2.3 Solving Systems with a QR Decomposition

Consider the problem of solving a linear system \(A\vect{x}=\vect{b}\text{.}\) Replace \(A\) by a QR decomposition, to obtain \(QR\vect{x}=\vect{b}\text{.}\) Now, the inverse of the unitary matrix \(Q\) is its adjoint, so we have the new system \(R\vect{x}=\adjoint{Q}\vect{b}\text{.}\) Since \(R\) is upper-triangular, the new system can be solved via back-solving, much as we did with an LU decomposition (Subsection Subsection 2.1.2).

A primary advantage of this approach is that a unitary matrix is generally well-behaved in numerical computations. Since the columns have unit norm, the entries can never be too large. Many of the problems of floating-point arithmetic come from combining numbers of grossly different magnitudes, and this is avoided in computations with a unitary matrix, such as the matrix-vector product in the rearrangement of a linear system in the previous paragraph.

What about the number of operations required? For a QR decomposition of an \(m\times n\) matrix via the Gram-Schmidt procedure the operation count is \(\orderof{2mn^2}\text{.}\) When \(m\geq n\) the determination of \(R\) can be accomplished in \(\orderof{2mn^2-\frac{2}{3}n^3}\) operations when using a sequence of Householder reflections ([provisional cross-reference: TB Lecture 10]). This includes storing the Householder vector for each iteration, but not the computation of the Householder matrix itself, or the accumulated product of all the Householder matrices into one grand unitary matrix \(Q\text{.}\) Notice that for a square matrix, when \(m=n\text{,}\) the count is \(\orderof{\frac{4}{3}n^3}\text{,}\) which is twice the cost of an LU factorization

If we use Householder reflections, then we must decide what we want to do with Householder matrices. If we are solving a linear system, we can successively multiply each new Householder matrix times the column vector on the right hand side of the equation. Thus, we modify both sides of the equation in the same way, producing equivalent systems as we go. In practice, the product of a column vector by a Householder matrix can be accomplished very efficiently, and not by explicitly forming the matrix. (See Exercise Checkpoint 1.5.8.) It should never be necessary to explicitly form a Householder matrix, but instead the Householder vector should be enough information to perform whatever computation is at hand.

Subsection 2.2.4 Uniqueness of QR Decompositions

Generally, when \(A\) has full rank and we require that diagonal entries of \(R\) be positive, we get a unique QR decomposition. Proving this will be easier once we learn about Cholesky decompositions (Section [provisional cross-reference: a section about Cholesky decompositions]), so we will defer a proof until then.

Subsection 2.2.5 Final Thoughts

Checkpoint 2.2.5.

Suppose that \(A\) is a nonsingular matrix with real entries. Then \(A\) has a QR decomposition where (1) \(A=QR\text{,}\) (2) \(Q\) is unitary, and (3) \(R\) is upper triangular with positive entries, (4) \(Q\) and \(R\) are also matrices with real entries. Prove that this decomposition is unique. (So you may assume such a decomposition exists, you are just being asked to establish uniqueness.)

Solution

Assume there are two such decompositions, so \(R_1Q_1=A=R_2Q_2\) and rearrange to obtain \(\inverse{R_2}R_1=Q_2\adjoint{Q_1}\text{.}\) The left-hand side will be an upper triangular matrix with positive diagonal entries. The right-hand side is a product of two unitary matrices, hence is unitary, and its columns form an orthonormal set. Now, viewing the columns of the left-hand matrix as an orthonormal set will allow you to progressively conclude that the columns (from left to right) are the columns of an identity matrix. Thus, \(\inverse{R_2}R_1=I\) and \(R_1=R_2\text{,}\) and similarly, \(Q_1=Q_2\text{.}\)