Why does row rank equal column rank?

1 Introduction

Everyone who studies elementary linear algebra encounters the concepts of row rank and column rank of a matrix at some point. Interestingly, it turns out that they are always equal. Although this is not difficult to prove, it often remains somewhat of a mystery: why is this true? The aim of this course is to present a surprisingly intuitive, visual answer to this question for matrices over the real numbers.

Anyone can follow the explanations, including readers who have never heard of the rank of a matrix, since we introduce all the concepts we need on an intuitive level. For those who are already familiar with the terminology, we also include formal proofs based on the intuitive arguments.

2 From linear systems of equations to matrices

Suppose we have a linear system of equations:

\begin{aligned}
a_{11} x_1 + \dots + a_{1n} x_n &= y_1\\
a_{21} x_1 + \dots + a_{2n} x_n &= y_2\\
&\vdots\\
a_{m1} x_1 + \dots + a_{mn} x_n &= y_m.
\end{aligned}

It consists of m equations in the variables x_1, \dots, x_n. The a_{ij} are coefficients in \R.

For which values y_1, \dots, y_m \in \R do these equations have a solution (x_1, \dots, x_n)? The answer certainly depends on the number of equations and on their relationship to each other.

Example: The system

\begin{aligned}
x_1 + x_2 &= y_1\\
2x_1 + 2x_2 &= y_2
\end{aligned}

does not always have a solution. For instance, it has no solution for y_1 = 0 and y_2 = 1, since the left-hand side of the second equation is two times the left-hand side of the first. More generally, the same reasoning shows that no solution exists whenever 2y_1 \neq y_2. On the other hand, the system has a solution for y_1 = 1 and y_2 = 2, or more generally whenever 2y_1 = y_2, for example x_1 = y_1 and x_2 = 0.
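
As a quick numerical illustration of this solvability criterion, here is a minimal sketch (assuming numpy, which the text itself does not use): a system Ax = y is solvable exactly when the rank of the coefficient matrix equals the rank of the augmented matrix.

```python
import numpy as np

# Coefficient matrix of the example system.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

def is_solvable(A, y):
    """Ax = y has a solution iff rank(A) == rank([A | y])."""
    augmented = np.column_stack([A, y])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)

print(is_solvable(A, np.array([0.0, 1.0])))  # False, since 2*0 != 1
print(is_solvable(A, np.array([1.0, 2.0])))  # True,  since 2*1 == 2
```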

Writing a linear system of equations the way we wrote it above carries a lot of redundancy. For example, the variables x_1, \dots, x_n appear in every row. To ease notation, we can use matrix-vector multiplication: we assemble the coefficients a_{ij} into an (m \times n)-matrix and write the x_i's and y_j's as vectors.

\begin{pmatrix}
a_{11} & \cdots & a_{1n}\\
a_{21} & \cdots & a_{2n}\\
\vdots & & \vdots\\
a_{m1} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}

By definition of matrix-vector multiplication, we have y_i = a_{i1}x_1 + \dots + a_{in}x_n for each i \in \{1, \dots, m\}.
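
A small sanity check of this componentwise description (again a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # an (m x n)-matrix with m = 3, n = 4
x = rng.standard_normal(4)

y = A @ x  # matrix-vector multiplication

# y_i equals a_{i1} x_1 + ... + a_{in} x_n for every i.
for i in range(3):
    assert np.isclose(y[i], sum(A[i, j] * x[j] for j in range(4)))
```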

Our example turns into the following matrix-vector equation:

\begin{pmatrix} 1 & 1\\ 2 & 2 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.

3 Column rank

In the previous example we saw that the linear equation

\begin{pmatrix} 1 & 1\\ 2 & 2 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}

has a solution for a given y = (y_1, y_2)^T \in \R^2 if and only if 2y_1 = y_2.

We can plot this to see that the y's for which a solution exists form a line in \R^2:

Figure: The elements y for which the linear system of equations has a solution form a line.

We give these y's, for which the linear system has a solution, a name: the image of the matrix. That is, for an (m \times n)-matrix A we define

\operatorname{im}(A) = \{ y \in \R^m \mid \text{there exists } x \in \R^n \text{ with } Ax = y\}.

If y = (y_1, y_2)^T and y' = (y_1', y_2')^T are in the image of the coefficient matrix of the linear system above, then their sum y + y' is again in the image, as is any multiple \alpha y = (\alpha y_1, \alpha y_2)^T with \alpha \in \R.

This holds for the image of any matrix. We say that the image of a matrix is a subspace of \R^m. It contains the columns of the matrix, since the i-th column is Ae_i, where e_i = (0, \dots, 1, \dots, 0)^T with the 1 in the i-th position. Moreover, one can show that it is the smallest subspace containing the columns. We say that \operatorname{im}(A) is spanned by the columns of A.
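
That the columns lie in the image is also easy to check numerically; the following sketch (assuming numpy) verifies that A e_i picks out the i-th column:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

for i in range(A.shape[1]):
    e_i = np.zeros(A.shape[1])
    e_i[i] = 1.0  # standard basis vector with the 1 in position i
    assert np.array_equal(A @ e_i, A[:, i])  # A e_i is the i-th column of A
```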

In particular, we can speak of the dimension of \operatorname{im}(A). Intuitively this is the number of different “directions” in the space. For example, a single point is zero-dimensional, a line is one-dimensional, a plane is two-dimensional, and a space looking like physical reality is three-dimensional. In our example, \operatorname{im}(A) is one-dimensional.

The dimension of the image of a matrix has a special name: the column rank of the matrix.

4 Row rank

We have defined the column rank as the dimension of the space spanned by the columns of the matrix. Similarly, we can look at the dimension of the space spanned by the rows of a matrix. This number is called the row rank.
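
For the running example we can compute both dimensions directly. The sketch below (assuming sympy, purely for illustration) determines bases of the column space and the row space and compares their sizes:

```python
from sympy import Matrix

A = Matrix([[1, 1],
            [2, 2]])

column_rank = len(A.columnspace())  # size of a basis of the span of the columns
row_rank = len(A.rowspace())        # size of a basis of the span of the rows

print(column_rank, row_rank)        # 1 1 -- both spans are lines
```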

We have thus found two numbers associated with a matrix: its row rank and its column rank. Now we want to understand how these two numbers relate to each other.

The question is the following: how do the rows of the matrix (i.e. the different equations of the linear system) affect the dimension of the image of the matrix (i.e. the “size” of the space of vectors y for which the system Ax = y is solvable)? Put differently, can the dimension of the image be determined directly from the relations between the rows of the matrix? And if so, why?

5 How unique are solutions of a linear system of equations?

Let's look at the following linear systems of equations in three variables x_1, x_2, x_3 \in \R:

System 1: Consider

\begin{aligned}
3x_1 + x_2 &= 5\\
x_1 + 4x_2 + x_3 &= -1\\
x_1 &= 1
\end{aligned}

It has the unique solution x_1 = 1, x_2 = 2, x_3 = -10.

System 2: Consider

\begin{aligned}
3x_1 + x_2 &= 5\\
x_1 + 4x_2 + x_3 &= -1\\
6x_1 + 2x_2 &= 10
\end{aligned}

For each x_1 \in \R it has the solution x_2 = 5 - 3x_1, x_3 = -21 + 11x_1. Hence there are infinitely many solutions.

System 3: Consider

\begin{aligned}
3x_1 + x_2 &= 5\\
6x_1 + 2x_2 &= 10\\
-3x_1 - x_2 &= -5
\end{aligned}

Here we can choose x_2 = 5 - 3x_1 and get a solution for arbitrary x_1, x_3 \in \R.

The more redundant rows a linear system of equations has, the more solutions it has; the computation sketched right after the following list makes this quantitative.

  • System 1: unique solution; the solution set is a point in \R^3

  • System 2: a solution for any x_1; the solutions form a straight line in \R^3

  • System 3: a solution for any choice of x_1 and x_3; the solutions form a plane in \R^3
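
To make the pattern concrete, the following sketch (assuming sympy) computes the rank of each coefficient matrix; by the rank-nullity theorem, a nonempty solution set in three variables has dimension 3 minus that rank, matching the point, line, and plane above.

```python
from sympy import Matrix

# Coefficient matrices of Systems 1-3 in the variables x1, x2, x3.
systems = {
    1: Matrix([[3, 1, 0], [1, 4, 1], [1, 0, 0]]),
    2: Matrix([[3, 1, 0], [1, 4, 1], [6, 2, 0]]),
    3: Matrix([[3, 1, 0], [6, 2, 0], [-3, -1, 0]]),
}

for name, A in systems.items():
    r = A.rank()
    print(f"System {name}: rank {r}, solution set of dimension {3 - r}")
# Expected: System 1 has rank 3, dimension 0 (a point);
#           System 2 has rank 2, dimension 1 (a line);
#           System 3 has rank 1, dimension 2 (a plane).
```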

If we have more than one solution, what do they have in common?

Let x' and x'' be two solutions of a linear system of equations Ax = y. What is the difference between the two solutions x' and x''? We apply A to the difference x' - x'': A(x' - x'') = Ax' - Ax'' = y - y = 0.

The more solutions of the system Ax = y there are, the more vectors we can find that are mapped to zero. Hence the “amount” of x with Ax = 0 is a measure for the uniqueness of solutions of Ax = y for any y. For that reason, we give this set a name: the kernel of A. We write it as

\ker(A) = \{x \in \R^n \mid Ax = 0\}.

The bigger the kernel, the more solutions exist. If \ker(A) = \{0\}, the solution is unique.
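
The kernel can be computed explicitly. As an illustration (again assuming sympy), the sketch below finds a basis of the kernel of the coefficient matrix of System 3; the two basis vectors correspond to the two free choices x_1 and x_3.

```python
from sympy import Matrix, zeros

A3 = Matrix([[3, 1, 0], [6, 2, 0], [-3, -1, 0]])  # System 3

basis = A3.nullspace()  # basis of ker(A3) = {x : A3 x = 0}
print(len(basis))       # 2 -- the kernel is a plane
for v in basis:
    assert A3 * v == zeros(3, 1)  # every basis vector is mapped to zero
```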

6 Understanding the relation between row rank and column rank

We have seen that the kernel of a matrix is a measure for the uniqueness of solutions. But how can we compute this kernel? A vector x \in \R^n is in the kernel if and only if Ax = 0. Explicitly, this is the case if

0 = a_{i1}x_1 + \dots + a_{in}x_n

for all i \in \{1, \dots, m\}. The sum on the right is the standard scalar product of the vector x with the i-th row vector

r_i = \begin{pmatrix} a_{i1}\\ \vdots\\ a_{in} \end{pmatrix}.

We thus obtain the description that the kernel of A consists of all vectors in \R^n which are orthogonal to the vectors r_1, \dots, r_m. Formally,

\ker(A) = \{x \in \R^n \mid \langle r_i, x \rangle = 0 \text{ for all } i \in \{1, \dots, m\}\},

where \langle y, x \rangle := y^T x is the standard scalar product of x, y \in \R^n.
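
This orthogonality is easy to observe numerically. A minimal sketch, assuming scipy's null_space helper: every row of a random matrix has scalar product (approximately) zero with every kernel basis vector.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))  # a random 3x5 matrix; its kernel has dimension 2

N = null_space(A)  # columns form an (orthonormal) basis of ker(A)

# Each entry of A @ N is the scalar product <r_i, x> of a row with a kernel vector.
assert np.allclose(A @ N, 0.0)
```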

Consider the linear system of equations

\begin{pmatrix} 1 & 1\\ 2 & 2 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix},

that we encountered before. It is represented by the matrix

A = \begin{pmatrix} 1 & 1\\ 2 & 2 \end{pmatrix}

with kernel

\ker(A) = \left\{ \begin{pmatrix} x\\ -x \end{pmatrix} \mid x \in \R \right\}.

If we draw the kernel and the span of the rows

R = \left\{ \lambda \begin{pmatrix} 1\\ 1 \end{pmatrix} + \mu \begin{pmatrix} 2\\ 2 \end{pmatrix} \mid \lambda, \mu \in \R \right\}

we see that they are indeed orthogonal to each other:

Figure: The span of the rows is orthogonal to the kernel.

If we have y \in \operatorname{im}(A), by definition we find an x \in \R^2 such that Ax = y. In the last section we defined the kernel and found that the difference of two solutions is contained in the kernel. Conversely, we can add any element of \ker(A) to x and obtain a new solution x' with Ax' = y: if Ax = y and z \in \ker(A), then A(x + z) = Ax + Az = y + 0 = y. In the picture this corresponds to moving the point x parallel to the kernel.

Figure: The element x is transported parallel to the kernel into the span of the rows.
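
The computation A(x + z) = y is easy to replay numerically; here is a minimal numpy sketch for the running example (the particular vectors x and z are our own choice):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
y = np.array([1.0, 2.0])   # solvable, since 2*y1 == y2

x = np.array([1.0, 0.0])   # one particular solution: A @ x == y
z = np.array([3.0, -3.0])  # an element of ker(A):    A @ z == 0

assert np.allclose(A @ x, y)
assert np.allclose(A @ z, 0.0)
assert np.allclose(A @ (x + z), y)  # x + z is again a solution
```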

If we pick the right element in the kernel, we can move x into the span of the rows R and get x' \in R with Ax' = y. We can do this with any element in the image of A. Hence for every element y \in \operatorname{im}(A) we find an element x \in R with Ax = y.

The picture suggests even more: whenever we have another solution x'' \in R, then the difference x' - x'' has to lie in the kernel of A. Since x' - x'' is also in R, it has to lie in the intersection of the kernel and R. By the picture this intersection is zero, which implies x' = x''. If we look closely, we find a familiar reason for this: the kernel is orthogonal to the span of the rows.

Formally, we can express this observation as follows. We know that x' - x'' is in R, hence we can write it as a linear combination x' - x'' = \lambda_1 r_1 + \dots + \lambda_m r_m of the rows r_1, \dots, r_m of A. Next we exploit that elements of the kernel are orthogonal to the rows, by computing the scalar product of x' - x'' \in \ker(A) with x' - x'' written as this linear combination of the rows. That is,

\begin{aligned}
\langle x' - x'', x' - x'' \rangle
&= \langle \lambda_1 r_1 + \dots + \lambda_m r_m, x' - x'' \rangle\\
&= \lambda_1 \underset{=0}{\underbrace{\langle r_1, x' - x'' \rangle}} + \dots + \lambda_m \underset{=0}{\underbrace{\langle r_m, x' - x'' \rangle}}\\
&= 0.
\end{aligned}

Since only the zero vector has scalar product zero with itself (the scalar product is positive definite), we get x' - x'' = 0 and thus x' = x''.

Altogether, we have shown that given some y in the image of A, i.e. some linear combination of the columns, there exists a unique solution x with Ax = y such that x is a linear combination of the rows. So the matrix A yields a 1-1 correspondence between the space R of linear combinations of the rows and the span of the columns \operatorname{im}(A). Formally, the restriction of A to R

A\colon R \to \operatorname{im}(A), \quad x \mapsto Ax

gives a bijection (a 1-1 map or correspondence) between R and \operatorname{im}(A). Furthermore, the correspondence preserves the structure of R and \operatorname{im}(A), since it is linear. This means that we can transport all calculations and relations between elements from R to \operatorname{im}(A) and back using the above restriction of A to R (and its inverse). Hence both spaces have the same properties. In particular R and \operatorname{im}(A), i.e. the span of the rows and the span of the columns, have the same dimension. The dimension of R is the row rank and the dimension of \operatorname{im}(A) is the column rank. Hence the row rank equals the column rank.
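
The distinguished row-space solution can also be computed numerically. A sketch assuming numpy: the minimum-norm solution returned by the pseudoinverse is exactly the solution lying in the span of the rows (a standard fact we use here purely for illustration).

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
y = np.array([1.0, 2.0])       # an element of im(A)

x_row = np.linalg.pinv(A) @ y  # minimum-norm solution; it lies in R
print(x_row)                   # [0.5, 0.5], a multiple of the row (1, 1)

assert np.allclose(A @ x_row, y)
# The theorem itself, checked numerically: row rank == column rank.
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)
```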

7 Conclusion

We have seen that there is a 1-1 correspondence, induced by the matrix, between the subspace of \R^n spanned by its rows and the subspace of \R^m spanned by its columns (that is, the image of the matrix). This means that the dimensions of these two subspaces are equal. The dimension of the space spanned by the rows is the row rank and the dimension of the space spanned by the columns is the column rank. This is why the two are equal, and why one simply speaks of the “rank” of the matrix.

Our derivation of the equality of row rank and column rank works only over \R, since our argument relies on the standard scalar product on \R^n: \langle x, y \rangle = x^T y. On \mathbb{C}^n the mapping \langle x, y \rangle = x^T y is not a scalar product, since in \mathbb{C}^2

\begin{pmatrix} 1\\ i \end{pmatrix}^T \begin{pmatrix} 1\\ i \end{pmatrix} = \begin{pmatrix} 1 & i \end{pmatrix} \begin{pmatrix} 1\\ i \end{pmatrix} = 1 + i^2 = 0.

Note that this means that the vector (1, i)^T is in the kernel of the matrix

\begin{pmatrix} 1 & i \end{pmatrix}.

Hence, even over \mathbb{C}, the kernel of a matrix is in general not the orthogonal complement of the rows.

If x^T y were a sensible notion of a scalar product on \mathbb{C}^2, this would mean that the nonzero vector (1, i)^T is orthogonal to itself. Intuitively, this does not make sense.
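
A quick numerical confirmation, assuming numpy: the plain transpose product vanishes on (1, i)^T, while numpy's vdot, which conjugates its first argument (the Hermitian inner product one actually uses over \mathbb{C}), does not.

```python
import numpy as np

v = np.array([1.0, 1.0j])

print(v @ v)          # x^T x = 1*1 + i*i = 0, although v is nonzero
print(np.vdot(v, v))  # conjugated product: 1*1 + (-i)*i = 2
```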


This content is licensed under CC BY-SA 4.0.