1 Introduction
Everyone who studies elementary linear algebra encounters the concepts of row rank and column rank of a matrix at some point. Interestingly, it turns out that they are always equal. Although this is not difficult to prove, it often remains somewhat of a mystery: why is this true? The aim of this course is to present a surprisingly vivid answer to this question for matrices over the real numbers.
Everyone can follow the explanations, including those who have never heard of the rank of a matrix, since we introduce all the needed concepts on an intuitive level. For readers who are already familiar with the terminology, we also include formal proofs based on the intuitive arguments.
2 From linear systems of equations to matrices
Suppose we have a linear system of equations:
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= y_1\\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= y_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= y_m
\end{aligned}$$
It consists of $m$ equations in the $n$ variables $x_1, \dots, x_n$. The $a_{ij}$ are coefficients in $\mathbb{R}$.
For which values $y_1, \dots, y_m$ do these equations have a solution $x_1, \dots, x_n$? The answer certainly depends on the number of equations and their relationship to each other.
Example: The system
$$\begin{aligned}
x_1 + 2x_2 &= y_1\\
2x_1 + 4x_2 &= y_2
\end{aligned}$$
does not always have a solution. For instance, it has no solution for $y_1 = 1$ and $y_2 = 0$, since the left-hand side of the second equation is two times the left-hand side of the first equation. More generally, the same reasoning gives that there does not exist a solution whenever $y_2 \neq 2y_1$. On the other hand, it has a solution for $y_1 = 1$ and $y_2 = 2$, or more generally for $y_2 = 2y_1$; for example $x_1 = 1$ and $x_2 = 0$.
Writing a linear system of equations the way we wrote it above carries a lot of redundancy. For example, the variables $x_1, \dots, x_n$ appear in every row. To ease notation, we can use matrix-vector multiplication: we assemble the coefficients $a_{ij}$ into an $m \times n$-matrix $A$ and write the $x_i$'s and $y_i$'s as vectors $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, so that the system becomes $Ax = y$.
By definition of matrix-vector multiplication, we have $(Ax)_i = a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n = y_i$ for each $i = 1, \dots, m$.
Our example turns into the following matrix-vector multiplication:
$$\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.$$
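To make the notation concrete, here is a minimal numpy sketch of this matrix-vector form; numpy and the test vector are our additions, not part of the original text:

```python
import numpy as np

# Coefficient matrix of the example system
#   x1 + 2*x2 = y1
#   2*x1 + 4*x2 = y2
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

x = np.array([1.0, 0.0])  # a candidate solution vector
y = A @ x                 # matrix-vector multiplication A*x
print(y)                  # [1. 2.] -> this x solves the system for y = (1, 2)
```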
3 Column rank
In the previous example we saw that the linear equation
$$\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$$
has a solution if and only if $y_2 = 2y_1$.
We can plot this to see that the $y$'s for which a solution exists form a line in $\mathbb{R}^2$: the line $y_2 = 2y_1$ through the origin.
We give these $y$'s, for which the linear system has a solution, a name: the image of the matrix. That is, for an $m \times n$-matrix $A$ we define
$$\operatorname{im}(A) := \{\, y \in \mathbb{R}^m \mid Ax = y \text{ for some } x \in \mathbb{R}^n \,\}.$$
If $y$ and $y'$ are in the image of the coefficient matrix of the above linear system, then their sum $y + y'$ is again in the image, as well as any multiple $\lambda y$ with $\lambda \in \mathbb{R}$.
This holds for the image of any matrix. We say that the image of a matrix is a subspace of $\mathbb{R}^m$. It contains the columns of the matrix, since the $j$-th column is $Ae_j$, where $e_j = (0, \dots, 0, 1, 0, \dots, 0)$ with the $1$ in the $j$-th position. Moreover, one can show that it is the smallest subspace containing the columns. We say that $\operatorname{im}(A)$ is spanned by the columns of $A$.
In particular, we can speak of the dimension of $\operatorname{im}(A)$. Intuitively, this is the number of different “directions” in the space. For example, a single point is zero-dimensional, a line is one-dimensional, a plane is two-dimensional, and a space looking like reality is three-dimensional. In our example, $\operatorname{im}(A)$ is the line $y_2 = 2y_1$ and hence one-dimensional.
The dimension of the image of a matrix has a special name: The column rank of the matrix.
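As a numerical illustration (numpy is our choice of tool here), the column rank of the example matrix can be computed directly, and we can test which $y$'s lie in the image:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# The column rank is the dimension of im(A).
print(np.linalg.matrix_rank(A))  # 1 -> the image is a line in R^2

# y = (1, 2) lies on the line y2 = 2*y1; y = (1, 0) does not.
for y in (np.array([1.0, 2.0]), np.array([1.0, 0.0])):
    x, *_ = np.linalg.lstsq(A, y, rcond=None)     # best-possible solution attempt
    print(y, "solvable:", np.allclose(A @ x, y))  # True, then False
```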
4 Row rank
We have defined the column rank as the dimension of the space spanned by the columns of the matrix. Similarly, we can look at the dimension of the space spanned by the rows of the matrix, which is a subspace of $\mathbb{R}^n$. This number is called the row rank.
We have thus found two numerical values associated with a matrix: its row rank and its column rank. Now we want to understand how the two relate to each other.
The question is the following: how do the rows of the matrix (i.e. the different equations of the linear system) affect the dimension of the image of the matrix (i.e. the “size” of the space of vectors $y$ for which the system is solvable)? Put differently: can the dimension of the image be determined directly from the relations between the rows of the matrix? And if so, why?
5 How unique are solutions of a linear system of equations?
Let’s look at the following linear systems of equations in the three variables $x_1, x_2, x_3$:
System 1: Consider
$$x_1 = 1, \qquad x_2 = 1, \qquad x_3 = 1.$$
It has the unique solution $x = (1, 1, 1)$.
System 2: Consider
$$x_1 = 1, \qquad x_2 = 1, \qquad x_1 + x_2 = 2.$$
The third equation is redundant: it is the sum of the first two, and $x_3$ does not appear at all. For each $t \in \mathbb{R}$ the system has the solution $x = (1, 1, t)$. Hence there are infinitely many solutions.
System 3: Consider
$$x_1 = 1, \qquad 2x_1 = 2, \qquad 3x_1 = 3.$$
Here the second and third equations are redundant, and we can choose $x_2 = s$ and $x_3 = t$ to get a solution $x = (1, s, t)$ for arbitrary $s, t \in \mathbb{R}$.
The more redundant rows there are, the more solutions the linear system of equations has:
System 1: unique solution; represents a point in $\mathbb{R}^3$
System 2: solution for any $t \in \mathbb{R}$; the solutions represent a straight line in $\mathbb{R}^3$
System 3: solution for any choice of $s$ and $t$; the solutions represent a plane in $\mathbb{R}^3$
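The point/line/plane picture can be spot-checked numerically. Using the three coefficient matrices above (our reconstruction of the systems), the number of free parameters in the solution set is $3$ minus the rank of the matrix:

```python
import numpy as np

systems = {
    "System 1 (point)": np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]),
    "System 2 (line)":  np.array([[1., 0., 0.], [0., 1., 0.], [1., 1., 0.]]),
    "System 3 (plane)": np.array([[1., 0., 0.], [2., 0., 0.], [3., 0., 0.]]),
}
for name, A in systems.items():
    r = np.linalg.matrix_rank(A)
    print(name, "rank:", r, "free parameters:", 3 - r)
# rank 3 -> 0 free parameters, rank 2 -> 1, rank 1 -> 2
```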
If we have more than one solution, what do they have in common?
Let $x$ and $x'$ be two solutions of a linear system of equations $Ax = y$. What is the difference between the two solutions $x$ and $x'$? We apply $A$ to the difference $x - x'$:
$$A(x - x') = Ax - Ax' = y - y = 0.$$
The more solutions to the system there are, the more vectors we can find that get mapped to zero. Hence the “amount” of $x$ with $Ax = 0$ is a measure for the uniqueness of the solution for any $y$. For that reason, we give it a name: The kernel of $A$. We write it as
$$\ker(A) := \{\, x \in \mathbb{R}^n \mid Ax = 0 \,\}.$$
The bigger the kernel, the more solutions exist. If $\ker(A) = \{0\}$, the solution is unique.
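To compute a kernel in practice, one can use scipy's null_space (a sketch; scipy is an extra tool we bring in, not part of the text). For the coefficient matrices of Systems 2 and 3, the kernel dimension matches the number of free parameters we found:

```python
import numpy as np
from scipy.linalg import null_space

A2 = np.array([[1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])  # System 2
A3 = np.array([[1., 0., 0.], [2., 0., 0.], [3., 0., 0.]])  # System 3

for name, A in (("System 2", A2), ("System 3", A3)):
    K = null_space(A)  # orthonormal basis of ker(A), one column per direction
    print(name, "kernel dimension:", K.shape[1])  # 1 (a line), then 2 (a plane)
    print(np.allclose(A @ K, 0))                  # True: A maps ker(A) to 0
```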
6 Understanding the relation between row rank and column rank
We have seen that the kernel of a matrix is a measure for the uniqueness of solutions. But how can we compute this kernel? A vector $x$ is in the kernel if and only if $Ax = 0$. Explicitly, this is the case if
$$a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n = 0$$
for all $i = 1, \dots, m$. The left-hand side is the standard scalar product of the vector $x$ with the $i$-th row vector $a_i = (a_{i1}, \dots, a_{in})$ of $A$.
We thus have the description that the kernel of $A$ is given by all vectors in $\mathbb{R}^n$ which are orthogonal to the row vectors $a_1, \dots, a_m$. Formally,
$$\ker(A) = \{\, x \in \mathbb{R}^n \mid \langle x, a_i \rangle = 0 \text{ for all } i = 1, \dots, m \,\},$$
where $\langle \cdot, \cdot \rangle$ is the standard scalar product of $\mathbb{R}^n$.
Consider the linear system of equations
$$x_1 + 2x_2 = y_1, \qquad 2x_1 + 4x_2 = y_2$$
that we encountered before. It is represented by the matrix
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$$
with kernel
$$\ker(A) = \{\, x \in \mathbb{R}^2 \mid x_1 + 2x_2 = 0 \,\} = \operatorname{span}\{(-2, 1)\}.$$
If we draw the kernel $\operatorname{span}\{(-2, 1)\}$ and the span of the rows $\operatorname{span}\{(1, 2)\}$ in the plane, we see that they are indeed orthogonal to each other.
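We can verify this orthogonality with a couple of scalar products (a numpy sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
v = np.array([-2.0, 1.0])  # spans ker(A)

# Scalar product of v with each row: both vanish, so v is
# orthogonal to the whole span of the rows.
print(np.dot(A[0], v), np.dot(A[1], v))  # 0.0 0.0
```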
If we have $y \in \operatorname{im}(A)$, by definition we find an $x$ such that $Ax = y$. In the last section we defined the kernel and found that the difference of two solutions is contained in the kernel. On the other hand, we can add any element $v$ from $\ker(A)$ to $x$ and obtain a new solution $x' = x + v$ with $Ax' = y$: if we have $Ax = y$ and $Av = 0$, then $Ax' = A(x + v) = Ax + Av = y$. In the picture this corresponds to moving the point $x$ parallel to the kernel.
If we pick the right element in the kernel, we can move $x$ into the span of the rows and get $\tilde{x} \in \operatorname{span}(a_1, \dots, a_m)$ with $A\tilde{x} = y$. We can do this with any element in the image of $A$. Hence we find, for every element $y \in \operatorname{im}(A)$, an element $\tilde{x} \in \operatorname{span}(a_1, \dots, a_m)$ with $A\tilde{x} = y$.
The picture suggests even more: whenever we have another solution $\tilde{x}' \in \operatorname{span}(a_1, \dots, a_m)$ with $A\tilde{x}' = y$, then the difference $\tilde{x} - \tilde{x}'$ has to lie in the kernel of $A$. Since $\tilde{x} - \tilde{x}'$ is also in $\operatorname{span}(a_1, \dots, a_m)$, it has to lie in the intersection of the kernel and $\operatorname{span}(a_1, \dots, a_m)$. By the picture this intersection contains only zero, which implies $\tilde{x} = \tilde{x}'$. If we look closely, we find a familiar reason for this: the kernel is orthogonal to the span of the rows.
Formally we can express this observation as follows. We know that $v := \tilde{x} - \tilde{x}'$ is in $\operatorname{span}(a_1, \dots, a_m)$, hence we can write it as a linear combination $v = \lambda_1 a_1 + \dots + \lambda_m a_m$ of the rows of $A$. Next we can exploit that elements in the kernel are orthogonal to the rows, by computing the scalar product of $v$ with $v$ written as this linear combination of the rows. That is,
$$\langle v, v \rangle = \Big\langle v, \sum_{i=1}^m \lambda_i a_i \Big\rangle = \sum_{i=1}^m \lambda_i \langle v, a_i \rangle = 0.$$
Hence we get $\langle v, v \rangle = 0$, and since only the zero vector has scalar product zero with itself, we conclude $v = \tilde{x} - \tilde{x}' = 0$.
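Numerically, this unique row-space solution is exactly what np.linalg.lstsq returns for a consistent system: the minimum-norm solution is orthogonal to the kernel, hence lies in the span of the rows. A sketch with the running example:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
y = np.array([1.0, 2.0])                 # y is in im(A)

x_tilde, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimum-norm solution
print(x_tilde)                           # [0.2 0.4] = (1/5)*(1, 2), a multiple of the row
print(np.allclose(A @ x_tilde, y))       # True: it solves A x = y
print(np.dot(x_tilde, [-2.0, 1.0]))      # 0.0: orthogonal to the kernel direction
```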
All together, we showed: given some $y$ in the image of $A$, i.e. some linear combination of the columns, there exists a unique solution $\tilde{x}$ with $A\tilde{x} = y$ such that $\tilde{x}$ is a linear combination of the rows. So the matrix $A$ yields a 1-1 correspondence between the space of linear combinations of the rows, $\operatorname{span}(a_1, \dots, a_m)$, and the span of the columns, $\operatorname{im}(A)$. Formally, the restriction of $A$ to $\operatorname{span}(a_1, \dots, a_m)$,
$$A|_{\operatorname{span}(a_1, \dots, a_m)} \colon \operatorname{span}(a_1, \dots, a_m) \to \operatorname{im}(A),$$
gives a bijection (a 1-1 map or correspondence) between $\operatorname{span}(a_1, \dots, a_m)$ and $\operatorname{im}(A)$. Furthermore, the correspondence preserves the structure of $\operatorname{span}(a_1, \dots, a_m)$ and $\operatorname{im}(A)$, since it is linear. This means that we can transport all calculations and relations between elements from $\operatorname{span}(a_1, \dots, a_m)$ to $\operatorname{im}(A)$ and back using the above restriction of $A$ (and its inverse). Hence both spaces have the same properties. In particular, $\operatorname{span}(a_1, \dots, a_m)$ and $\operatorname{im}(A)$, i.e. the span of the rows and the span of the columns, have the same dimension. The dimension of $\operatorname{span}(a_1, \dots, a_m)$ is the row rank and the dimension of $\operatorname{im}(A)$ is the column rank. Hence the row rank equals the column rank.
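The equality we just proved can be spot-checked on random matrices: the rank of $A$ equals the rank of its transpose $A^T$, whose columns are the rows of $A$ (a numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    A = rng.integers(-3, 4, size=(4, 6)).astype(float)
    A[3] = A[0] + A[1]  # force a redundant row
    # Column rank of A equals column rank of A^T, i.e. the row rank of A.
    assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)
print("row rank == column rank for all samples")
```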
7 Conclusion
We have seen that there is a 1-1 correspondence, induced by the matrix, between the subspace of $\mathbb{R}^n$ spanned by its rows and the subspace of $\mathbb{R}^m$ spanned by its columns (that is, the image of the matrix). This means that the dimensions of these two subspaces are equal. The dimension of the space spanned by the rows is the row rank, and the dimension of the space spanned by the columns is the column rank. Since the two are always equal, one simply speaks of the “rank” of the matrix.
Our derivation of the equality of row rank and column rank works only over $\mathbb{R}$, since our argument relies on a key property of the standard scalar product on $\mathbb{R}^n$: $\langle v, v \rangle = 0$ implies $v = 0$. On $\mathbb{C}^n$ the analogous mapping $(v, w) \mapsto \sum_{i=1}^n v_i w_i$ is not a scalar product, since in $\mathbb{C}^2$
$$\Big\langle \begin{pmatrix} 1 \\ i \end{pmatrix}, \begin{pmatrix} 1 \\ i \end{pmatrix} \Big\rangle = 1 \cdot 1 + i \cdot i = 1 - 1 = 0.$$
Note that this means that the vector $(1, i)$ is in the kernel of the matrix
$$\begin{pmatrix} 1 & i \end{pmatrix},$$
even though $(1, i)$ is itself the row of this matrix.
Hence, over $\mathbb{C}$, the kernel of a matrix is in general not the orthogonal complement of the rows.
If $(v, w) \mapsto \sum_i v_i w_i$ were a sensible notion of a scalar product on $\mathbb{C}^n$, this would mean that the nonzero vector $(1, i)$ is orthogonal to itself. Intuitively this does not make sense.
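A quick check of this failure in Python complex arithmetic (our addition): the naive bilinear “scalar product” of $(1, i)$ with itself vanishes, while the Hermitian inner product, which conjugates one argument, does not:

```python
import numpy as np

v = np.array([1.0, 1j])  # the nonzero vector (1, i) in C^2
print(np.sum(v * v))     # 0j: sum_i v_i * v_i vanishes although v != 0
print(np.vdot(v, v))     # (2+0j): sum_i conj(v_i) * v_i, a genuine scalar product
```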