Serlo Logo The Open Learning Platform

Why does row rank equal column rank?

1 Introduction

Everyone who studies elementary linear algebra encounters the concepts of row rank and column rank of a matrix at some point. Interestingly, it turns out that they are always equal. Although this is not difficult to prove, it often remains somewhat of a mystery: Why is this true? The aim of this course is to present a surprisingly ostensive answer to this question for matrices over the real numbers.

Everyone can follow the explanations, including those who have never heard of the rank of a matrix, since we introduce all needed concepts on an intuitive level. For all who are already familiar with the terminology, we also include formal proofs based on the intuitive arguments.

2 From linear systems of equations to matrices

Suppose we have a linear system of equations:

a11x1++a1nxn=y1a21x2++a2nxn=y2am1x1++amnxn=ym.

It consists of m equations in the variables  x1,,xn. The aij are coefficients in .

For which values y1,,ym do these equations have a solution (x1,,xn)? This question certainly depends on the number of equations and their relationship to each other.

Example: The system

x1+x2=y12x1+2x2=y2

does not always have a solution. For instance, it has no solution for y1=0 and y2=1, since the second equation is two times the first equation. More generally the same reasoning gives that there does not exist a solution whenever 2y1y2. On the other hand it has a solution for y1=1 and y2=2 or more generally for 2y1=y2, for example x1=y1 and x2=0.

Writing a linear system of equations the way we wrote it above carries a lot of redundancy. For example the variables x1,,xn appear in every row. To ease notation, we can use the matrix vector multiplication: We assemble the coefficients aij into an (m×n)-matrix and write the xi’s and yj’s as vectors.

(a11a1na21a2nam1amn)(x1xn)=(y1y2ym)

By definition of matrix vector multiplication, we have yi=ai1x1+ainxn for each i{1,,m}.

Our example turns into the following matrix vector multiplication

(1122)(x1x2)=(y1y2).

3 Column rank

In the previous example we saw that the linear equation

(1122)(x1x2)=(y1y2)

has a solution y=(y1,y2)T2 if and only if 2y1=y2.

We can plot this to see that the y’s which have a solution form a line in 2:

The elements for which the linear systems of equations have a solution form a line.

We give these y’s, for which the linear system has a solution, a name: the image of the matrix. That is, for an (m×n)-matrix A we define

im(A)={ym|We find xn with Ax=y}.

If y=(y1,y2)T and y=(y1,y2)T are in the image of the coefficient matrix of the above linear system, then their sum y+y is again in the image, as well as any multiple αy=(αy1,αy2)T with α.

This holds for the image of any matrix. We say that the image of a matrix is a subspace of m. It contains the columns of the matrix, since the i-th column is Aei where ei=(0,,1,,0)T with the 1 in the i-th position. Moreover, one can show that it is the smallest subspace containing the columns. We say that im(A) is spanned by the columns of A.

In particular, we can speak of the dimension of im(A). Intuitively this is the number of different “directions” in the space. For example a single point is zero-dimensional, a line is one-dimensional, a plane is two-dimensional, and a space looking like reality is three-dimensional. In our example, im(A) is one-dimensional.

The dimension of the image of a matrix has a special name: The column rank of the matrix.

4 Row rank

We have defined the column rank as the dimension of the space generated by the columns of the matrix. Similarly we can look at the dimension of the space spanned by the rows of a matrix. This number is called the row rank.

We found two numerical values associated to a matrix: Its row rank and its column rank. Now we want to understand how the row rank and the column rank relate to each other.

The question is the following: How do the rows of the matrix (i.e. the different equations of the linear system) affect the dimension of the image of the matrix (i.e. the “size” of the space of vectors y for which the system Ax=y is solvable)? Put differently, can the dimension of the image be determined directly by the relations between the rows of the matrix? And if so, why?

5 How unique are solutions of a linear system of equations?

Let’s look at the following linear systems of equations in three variables x1,x2,x3:

System 1: Consider

3x1+x2=5x1+4x2+x3=1x1=1

It has the unique solution x1=1,x2=2,x3=10.

System 2: Consider

3x1+x2=5x1+4x2+x3=16x1+2x2=10

For each x1 it has the solution x2=53x1,x3=21+11x1. Hence there are infinitely many solutions.

System 3: Consider

3x1+x2=56x1+2x2=103x1x2=5

Here we can choose x2=53x1 and get a solution for arbitrary x1,x3.

If there are more redundant rows then the linear system of equations has more solutions.

  • System 1: unique solution; represents a point in 3

  • System 2: solution for any x1; solutions represent a straight line in 3

  • System 3: solution for any choice of x1 and x3; represents a plane in 3

If we have more than one solution, what do they have in common?

Let x and x be two solutions of a linear system of equations Ax=y. What is the difference between the two solutions x and x? We apply A to the difference xx: A(xx)=AxAx=yy=0.

The more solutions to the system Ax=y there are, the more vectors we can find that get mapped to zero. Hence the “amount” of x with Ax=0 is a measure for the uniqueness of the solution Ax=y for any y. For that reason, we give it a name: The kernel of A. We write it as

ker(A)={xn|Ax=0}.

The bigger the kernel the more solutions exist. If ker(A)=0 the solution is unique.

6 Understanding the relation between row rank and column rank

We have seen that the kernel of a matrix is a measure for the uniqueness of solutions. But how to compute this kernel? A vector xn is in the kernel if and only if Ax=0. Explicitly this is the case if

0=ai1x1+ainxn

for all i{1,,m}. This is the standard scalar product of the vector x with the i-th row vector

ri=(ai1ain).

We thus have the description that the kernel of A is given by all vectors in n which are orthogonal to the vectors r1,,rm. Formally

ker(A)={xn|ri,x for all i{1,,m}}

where y,x:=yTx is the standard scalar product of x,yn.

Consider the linear system of equations

(1122)(x1x2)=(y1y2),

that we encountered before. It is represented by the matrix

A=(1122)

with kernel

ker(A)={(xx)|x}.

If we draw the kernel and the span of the rows

R={λ(11)+μ(22)|λ,μ}

we see that they are indeed orthogonal to each other:

The span of the rows is orthogonal to the kernel.

If we have yim(A), by definition we find an x2 such that Ax=y. In the last section we defined the kernel and found that the difference of solutions is contained in the kernel. On the other hand, we can add any element from ker(A) to x and obtain a new solution x with Ax=y: If we have Ax=y and zker(A), then A(x+z)=Ax+Az=y+0=y. In the picture this corresponds to moving the point x parallel to the kernel.

The element x is transported parallel to ther kernel into the span of the rows.

If we pick the right element in the kernel, we can move x into the span of the rows R and get xR with Ax=y. We can do this with any element in the image of A. Hence we find for every element bim(A), an element xR with Ax=y.

The picture suggests even more: Whenever we have another solution xR, then the difference xx has to lie in the kernel of A. Since xx is also in R, it has to lie in the intersection of the kernel and R. By the picture it has to be zero, which implies x=x. If we look closely, we find a familiar reason for this: The kernel is orthogonal to the span of the rows.

Formally we can express this observation as follows. We know that xx is in R, hence we can write it as a linear combination xx=λ1r1++λmrm of the rows r1,,rm of A. Next we can exploit that elements in the kernel are orthogonal to the rows, by computing the scalar product of xxker(A) with xx as the linear combination of the rows. That is

xx,xx=λ1r1++λmrm,xx=λ1r1,xx=0+λmrm,xx=0=0.

Hence we get xx=0 and thus x=x.

All together, we showed that given some y in the image of A, i.e. some linear combination of the columns, there exists a unique solution x with Ax=y, such that x is a linear combination of the rows. So, the matrix A yields a 1-1-correspondence between the space of linear combinations of the rows R and the span of the columns im(A). Formally, the restriction of A to R

A:Rim(A),xAx

gives a bijection (a 1-1 map or correspondence) between R and im(A). Furthermore, the correspondence preserves the structure of R and im(A). This means that we can transport all calculations and relations between elements from R to im(A) and back using the above restriction of A to R (and its inverse). Hence both spaces have the same properties. In particular R and im(A), i.e. the span of the rows and the span of the columns, have the same dimension. The dimension of R is the row rank and the dimension of im(A) is the column rank. Hence the row rank equals the column rank.

7 Conclusion

We have seen that there is a 1-1 correspondence, induced by the matrix, between the subspace of n spanned by its rows and the subspace of m spanned by its columns (that is, the image of the matrix). This means that the dimensions of these two subspaces are equal. The dimension of the space spanned by the rows is the row rank and the dimension of the space spanned by the columns is the column rank. We have seen before why they are equal. Because of that one simply speaks of the “rank” of the matrix.

Our derivation of the equality of row rank and column rank works only in n, since our argument relies on the standard scalar product in n: x,y=xTy. In n the mapping x,y=xTy is not a scalar product, since in 2

(1i)T(1i)=(1i)(1i)=0.

Note that this means that the vector (1,i)T is in the kernel of the matrix

(1i).

Hence, even over , the kernel of a matrix is in general not the orthogonal complement of the rows.

If xTy was a sensible notion of a scalar product on 2, this would mean that the nonzero vector (1,i)T is orthogonal to itself. Intuitively this does not make sense.


This content is licensed under
CC BY-SA 4.0 What does that mean? serlo.org