1 Introduction
Everyone who studies elementary linear algebra encounters the concepts of row rank and column rank of a matrix at some point. Interestingly, it turns out that they are always equal. Although this is not difficult to prove, it often remains somewhat of a mystery: why is this true? The aim of this course is to present a surprisingly vivid answer to this question for matrices over the real numbers.
Everyone can follow the explanations, including those who have never heard of the rank of a matrix, since we introduce all the needed concepts on an intuitive level. For readers who are already familiar with the terminology, we also include formal proofs based on the intuitive arguments.
2 From linear systems of equations to matrices
Suppose we have a linear system of equations:
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= y_1\\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= y_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= y_m
\end{aligned}$$
It consists of $m$ equations in the $n$ variables $x_1, \dots, x_n$. The $a_{ij}$ are coefficients in $\mathbb{R}$.
For which values $y_1, \dots, y_m$ do these equations have a solution $x_1, \dots, x_n$? The answer certainly depends on the number of equations and their relationship to each other.
Example: The system
$$\begin{aligned}
x_1 + 2x_2 &= y_1\\
2x_1 + 4x_2 &= y_2
\end{aligned}$$
does not always have a solution. For instance, it has no solution for $y_1 = 1$ and $y_2 = 0$, since the left-hand side of the second equation is two times the left-hand side of the first equation. More generally, the same reasoning gives that there does not exist a solution whenever $y_2 \neq 2y_1$. On the other hand, it has a solution for $y_1 = 1$ and $y_2 = 2$, or more generally for $y_2 = 2y_1$; for example $x_1 = 1$ and $x_2 = 0$.
Writing a linear system of equations the way we wrote it above carries a lot of redundancy. For example, the variables $x_1, \dots, x_n$ appear in every row. To ease notation, we can use matrix-vector multiplication: we assemble the coefficients $a_{ij}$ into an $m \times n$-matrix $A$ and write the $x_i$'s and $y_i$'s as vectors $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, so that the system becomes $Ax = y$.
By definition of matrix-vector multiplication, we have $(Ax)_i = a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n = y_i$ for each $i = 1, \dots, m$.
Our example turns into the following matrix-vector multiplication:
$$\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.$$
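To make the notation concrete, here is a minimal numpy sketch of this matrix-vector form; numpy and the test vector are our additions, not part of the original text:

```python
import numpy as np

# Coefficient matrix of the example system
#   x1 + 2*x2 = y1
#   2*x1 + 4*x2 = y2
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

x = np.array([1.0, 0.0])  # a candidate solution vector
y = A @ x                 # matrix-vector multiplication A*x
print(y)                  # [1. 2.] -> this x solves the system for y = (1, 2)
```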
3 Column rank
In the previous example we saw that the linear equation
$$\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$$
has a solution if and only if $y_2 = 2y_1$.
We can plot this to see that the $y$'s for which a solution exists form a line in $\mathbb{R}^2$: the line $y_2 = 2y_1$ through the origin.
We give these $y$'s, for which the linear system has a solution, a name: the image of the matrix. That is, for an $m \times n$-matrix $A$ we define
$$\operatorname{im}(A) := \{\, y \in \mathbb{R}^m \mid Ax = y \text{ for some } x \in \mathbb{R}^n \,\}.$$
If $y$ and $y'$ are in the image of the coefficient matrix of the above linear system, then their sum $y + y'$ is again in the image, as well as any multiple $\lambda y$ with $\lambda \in \mathbb{R}$.
This holds for the image of any matrix. We say that the image of a matrix is a subspace of $\mathbb{R}^m$. It contains the columns of the matrix, since the $j$-th column is $Ae_j$, where $e_j = (0, \dots, 0, 1, 0, \dots, 0)$ with the $1$ in the $j$-th position. Moreover, one can show that it is the smallest subspace containing the columns. We say that $\operatorname{im}(A)$ is spanned by the columns of $A$.
In particular, we can speak of the dimension of $\operatorname{im}(A)$. Intuitively, this is the number of different “directions” in the space. For example, a single point is zero-dimensional, a line is one-dimensional, a plane is two-dimensional, and a space looking like reality is three-dimensional. In our example, $\operatorname{im}(A)$ is the line $y_2 = 2y_1$ and hence one-dimensional.
The dimension of the image of a matrix has a special name: The column rank of the matrix.
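As a numerical illustration (numpy is our choice of tool here), the column rank of the example matrix can be computed directly, and we can test which $y$'s lie in the image:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# The column rank is the dimension of im(A).
print(np.linalg.matrix_rank(A))  # 1 -> the image is a line in R^2

# y = (1, 2) lies on the line y2 = 2*y1; y = (1, 0) does not.
for y in (np.array([1.0, 2.0]), np.array([1.0, 0.0])):
    x, *_ = np.linalg.lstsq(A, y, rcond=None)     # best-possible solution attempt
    print(y, "solvable:", np.allclose(A @ x, y))  # True, then False
```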
4 Row rank
We have defined the column rank as the dimension of the space spanned by the columns of the matrix. Similarly, we can look at the dimension of the space spanned by the rows of the matrix, which is a subspace of $\mathbb{R}^n$. This number is called the row rank.
We have thus found two numerical values associated with a matrix: its row rank and its column rank. Now we want to understand how the two relate to each other.
The question is the following: how do the rows of the matrix (i.e. the different equations of the linear system) affect the dimension of the image of the matrix (i.e. the “size” of the space of vectors $y$ for which the system is solvable)? Put differently: can the dimension of the image be determined directly from the relations between the rows of the matrix? And if so, why?
5 How unique are solutions of a linear system of equations?
Let’s look at the following linear systems of equations in the three variables $x_1, x_2, x_3$:
System 1: Consider
$$x_1 = 1, \qquad x_2 = 1, \qquad x_3 = 1.$$
It has the unique solution $x = (1, 1, 1)$.
System 2: Consider
$$x_1 = 1, \qquad x_2 = 1, \qquad x_1 + x_2 = 2.$$
The third equation is redundant: it is the sum of the first two, and $x_3$ does not appear at all. For each $t \in \mathbb{R}$ the system has the solution $x = (1, 1, t)$. Hence there are infinitely many solutions.
System 3: Consider
$$x_1 = 1, \qquad 2x_1 = 2, \qquad 3x_1 = 3.$$
Here the second and third equations are redundant, and we can choose $x_2 = s$ and $x_3 = t$ to get a solution $x = (1, s, t)$ for arbitrary $s, t \in \mathbb{R}$.
The more redundant rows there are, the more solutions the linear system of equations has:
System 1: unique solution; represents a point in $\mathbb{R}^3$
System 2: solution for any $t \in \mathbb{R}$; the solutions represent a straight line in $\mathbb{R}^3$
System 3: solution for any choice of $s$ and $t$; the solutions represent a plane in $\mathbb{R}^3$
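The point/line/plane picture can be spot-checked numerically. Using the three coefficient matrices above (our reconstruction of the systems), the number of free parameters in the solution set is $3$ minus the rank of the matrix:

```python
import numpy as np

systems = {
    "System 1 (point)": np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]),
    "System 2 (line)":  np.array([[1., 0., 0.], [0., 1., 0.], [1., 1., 0.]]),
    "System 3 (plane)": np.array([[1., 0., 0.], [2., 0., 0.], [3., 0., 0.]]),
}
for name, A in systems.items():
    r = np.linalg.matrix_rank(A)
    print(name, "rank:", r, "free parameters:", 3 - r)
# rank 3 -> 0 free parameters, rank 2 -> 1, rank 1 -> 2
```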
If we have more than one solution, what do they have in common?
Let $x$ and $x'$ be two solutions of a linear system of equations $Ax = y$. What is the difference between the two solutions $x$ and $x'$? We apply $A$ to the difference $x - x'$:
$$A(x - x') = Ax - Ax' = y - y = 0.$$
The more solutions to the system there are, the more vectors we can find that get mapped to zero. Hence the “amount” of $x$ with $Ax = 0$ is a measure for the uniqueness of the solution for any $y$. For that reason, we give it a name: The kernel of $A$. We write it as
$$\ker(A) := \{\, x \in \mathbb{R}^n \mid Ax = 0 \,\}.$$
The bigger the kernel, the more solutions exist. If $\ker(A) = \{0\}$, the solution is unique.
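To compute a kernel in practice, one can use scipy's null_space (a sketch; scipy is an extra tool we bring in, not part of the text). For the coefficient matrices of Systems 2 and 3, the kernel dimension matches the number of free parameters we found:

```python
import numpy as np
from scipy.linalg import null_space

A2 = np.array([[1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])  # System 2
A3 = np.array([[1., 0., 0.], [2., 0., 0.], [3., 0., 0.]])  # System 3

for name, A in (("System 2", A2), ("System 3", A3)):
    K = null_space(A)  # orthonormal basis of ker(A), one column per direction
    print(name, "kernel dimension:", K.shape[1])  # 1 (a line), then 2 (a plane)
    print(np.allclose(A @ K, 0))                  # True: A maps ker(A) to 0
```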
6 Understanding the relation between row rank and column rank
We have seen that the kernel of a matrix is a measure for the uniqueness of solutions. But how can we compute this kernel? A vector $x$ is in the kernel if and only if $Ax = 0$. Explicitly, this is the case if
$$a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n = 0$$
for all $i = 1, \dots, m$. The left-hand side is the standard scalar product of the vector $x$ with the $i$-th row vector $a_i = (a_{i1}, \dots, a_{in})$ of $A$.
We thus have the description that the kernel of $A$ is given by all vectors in $\mathbb{R}^n$ which are orthogonal to the row vectors $a_1, \dots, a_m$. Formally,
$$\ker(A) = \{\, x \in \mathbb{R}^n \mid \langle x, a_i \rangle = 0 \text{ for all } i = 1, \dots, m \,\},$$
where $\langle \cdot, \cdot \rangle$ is the standard scalar product of $\mathbb{R}^n$.
Consider the linear system of equations
$$x_1 + 2x_2 = y_1, \qquad 2x_1 + 4x_2 = y_2$$
that we encountered before. It is represented by the matrix
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$$
with kernel
$$\ker(A) = \{\, x \in \mathbb{R}^2 \mid x_1 + 2x_2 = 0 \,\} = \operatorname{span}\{(-2, 1)\}.$$
If we draw the kernel $\operatorname{span}\{(-2, 1)\}$ and the span of the rows $\operatorname{span}\{(1, 2)\}$ in the plane, we see that they are indeed orthogonal to each other.
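We can verify this orthogonality with a couple of scalar products (a numpy sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
v = np.array([-2.0, 1.0])  # spans ker(A)

# Scalar product of v with each row: both vanish, so v is
# orthogonal to the whole span of the rows.
print(np.dot(A[0], v), np.dot(A[1], v))  # 0.0 0.0
```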
If we have $y \in \operatorname{im}(A)$, by definition we find an $x$ such that $Ax = y$. In the last section we defined the kernel and found that the difference of two solutions is contained in the kernel. On the other hand, we can add any element $v$ from $\ker(A)$ to $x$ and obtain a new solution $x' = x + v$ with $Ax' = y$: if we have $Ax = y$ and $Av = 0$, then $Ax' = A(x + v) = Ax + Av = y$. In the picture this corresponds to moving the point $x$ parallel to the kernel.
If we pick the right element in the kernel, we can move $x$ into the span of the rows and get $\tilde{x} \in \operatorname{span}(a_1, \dots, a_m)$ with $A\tilde{x} = y$. We can do this with any element in the image of $A$. Hence we find, for every element $y \in \operatorname{im}(A)$, an element $\tilde{x} \in \operatorname{span}(a_1, \dots, a_m)$ with $A\tilde{x} = y$.
The picture suggests even more: whenever we have another solution $\tilde{x}' \in \operatorname{span}(a_1, \dots, a_m)$ with $A\tilde{x}' = y$, then the difference $\tilde{x} - \tilde{x}'$ has to lie in the kernel of $A$. Since $\tilde{x} - \tilde{x}'$ is also in $\operatorname{span}(a_1, \dots, a_m)$, it has to lie in the intersection of the kernel and $\operatorname{span}(a_1, \dots, a_m)$. By the picture this intersection contains only zero, which implies $\tilde{x} = \tilde{x}'$. If we look closely, we find a familiar reason for this: the kernel is orthogonal to the span of the rows.
Formally we can express this observation as follows. We know that $v := \tilde{x} - \tilde{x}'$ is in $\operatorname{span}(a_1, \dots, a_m)$, hence we can write it as a linear combination $v = \lambda_1 a_1 + \dots + \lambda_m a_m$ of the rows of $A$. Next we can exploit that elements in the kernel are orthogonal to the rows, by computing the scalar product of $v$ with $v$ written as this linear combination of the rows. That is,
$$\langle v, v \rangle = \Big\langle v, \sum_{i=1}^m \lambda_i a_i \Big\rangle = \sum_{i=1}^m \lambda_i \langle v, a_i \rangle = 0.$$
Hence we get $\langle v, v \rangle = 0$, and since only the zero vector has scalar product zero with itself, we conclude $v = \tilde{x} - \tilde{x}' = 0$.
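Numerically, this unique row-space solution is exactly what np.linalg.lstsq returns for a consistent system: the minimum-norm solution is orthogonal to the kernel, hence lies in the span of the rows. A sketch with the running example:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
y = np.array([1.0, 2.0])                 # y is in im(A)

x_tilde, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimum-norm solution
print(x_tilde)                           # [0.2 0.4] = (1/5)*(1, 2), a multiple of the row
print(np.allclose(A @ x_tilde, y))       # True: it solves A x = y
print(np.dot(x_tilde, [-2.0, 1.0]))      # 0.0: orthogonal to the kernel direction
```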
All together, we showed: given some $y$ in the image of $A$, i.e. some linear combination of the columns, there exists a unique solution $\tilde{x}$ with $A\tilde{x} = y$ such that $\tilde{x}$ is a linear combination of the rows. So the matrix $A$ yields a 1-1 correspondence between the space of linear combinations of the rows, $\operatorname{span}(a_1, \dots, a_m)$, and the span of the columns, $\operatorname{im}(A)$. Formally, the restriction of $A$ to $\operatorname{span}(a_1, \dots, a_m)$,
$$A|_{\operatorname{span}(a_1, \dots, a_m)} \colon \operatorname{span}(a_1, \dots, a_m) \to \operatorname{im}(A),$$
gives a bijection (a 1-1 map or correspondence) between $\operatorname{span}(a_1, \dots, a_m)$ and $\operatorname{im}(A)$. Furthermore, the correspondence preserves the structure of $\operatorname{span}(a_1, \dots, a_m)$ and $\operatorname{im}(A)$, since it is linear. This means that we can transport all calculations and relations between elements from $\operatorname{span}(a_1, \dots, a_m)$ to $\operatorname{im}(A)$ and back using the above restriction of $A$ (and its inverse). Hence both spaces have the same properties. In particular, $\operatorname{span}(a_1, \dots, a_m)$ and $\operatorname{im}(A)$, i.e. the span of the rows and the span of the columns, have the same dimension. The dimension of $\operatorname{span}(a_1, \dots, a_m)$ is the row rank and the dimension of $\operatorname{im}(A)$ is the column rank. Hence the row rank equals the column rank.
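The equality we just proved can be spot-checked on random matrices: the rank of $A$ equals the rank of its transpose $A^T$, whose columns are the rows of $A$ (a numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    A = rng.integers(-3, 4, size=(4, 6)).astype(float)
    A[3] = A[0] + A[1]  # force a redundant row
    # Column rank of A equals column rank of A^T, i.e. the row rank of A.
    assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)
print("row rank == column rank for all samples")
```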
7 Conclusion
We have seen that there is a 1-1 correspondence, induced by the matrix, between the subspace of $\mathbb{R}^n$ spanned by its rows and the subspace of $\mathbb{R}^m$ spanned by its columns (that is, the image of the matrix). This means that the dimensions of these two subspaces are equal. The dimension of the space spanned by the rows is the row rank, and the dimension of the space spanned by the columns is the column rank. Since the two are always equal, one simply speaks of the “rank” of the matrix.
Our derivation of the equality of row rank and column rank works only over $\mathbb{R}$, since our argument relies on a key property of the standard scalar product on $\mathbb{R}^n$: $\langle v, v \rangle = 0$ implies $v = 0$. On $\mathbb{C}^n$ the analogous mapping $(v, w) \mapsto \sum_{i=1}^n v_i w_i$ is not a scalar product, since in $\mathbb{C}^2$
$$\Big\langle \begin{pmatrix} 1 \\ i \end{pmatrix}, \begin{pmatrix} 1 \\ i \end{pmatrix} \Big\rangle = 1 \cdot 1 + i \cdot i = 1 - 1 = 0.$$
Note that this means that the vector $(1, i)$ is in the kernel of the matrix
$$\begin{pmatrix} 1 & i \end{pmatrix},$$
even though $(1, i)$ is itself the row of this matrix.
Hence, over $\mathbb{C}$, the kernel of a matrix is in general not the orthogonal complement of the rows.
If $(v, w) \mapsto \sum_i v_i w_i$ were a sensible notion of a scalar product on $\mathbb{C}^n$, this would mean that the nonzero vector $(1, i)$ is orthogonal to itself. Intuitively this does not make sense.
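A quick check of this failure in Python complex arithmetic (our addition): the naive bilinear “scalar product” of $(1, i)$ with itself vanishes, while the Hermitian inner product, which conjugates one argument, does not:

```python
import numpy as np

v = np.array([1.0, 1j])  # the nonzero vector (1, i) in C^2
print(np.sum(v * v))     # 0j: sum_i v_i * v_i vanishes although v != 0
print(np.vdot(v, v))     # (2+0j): sum_i conj(v_i) * v_i, a genuine scalar product
```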