Mathematics for Machine Learning - Day 6

Pour Terra, 13 July 2024

[Row echelon matrix meme] Technically it takes the exact same steps... but we're not splitting hairs here. Also, I'm late :D

Row-Echelon Form (REF)

Before defining REF, a few terms need to be established.

  1. A leading coefficient / pivot: the first non-zero number in a row, starting from the left.
  2. Staircase structure: each pivot is strictly to the right of the pivot in the row above. Meaning, if the row 1 pivot is at $a_{12}$, the row 2 pivot cannot be at $a_{21}$ or at $a_{22}$, but it can skip a column, so $a_{23}$ is fine.

Rule

  1. All rows that contain only zeros are at the bottom.
  2. All rows containing at least one non-zero entry form a staircase structure.

Example

Not row echelon form:

Why? Because in the fourth and fifth rows, the pivot is in the same column.

\begin{pmatrix} -2 & 4 & -2 & -1 & 4 \\ 0 & -8 & 3 & -3 & 1 \\ 0 & 0 & 6 & -1 & 1 \\ 0 & 0 & 0 & -3 & 4 \\ 0 & 0 & 0 & -3 & 1 \end{pmatrix}

Row echelon form:

I'm happy to announce that these are in fact both in row echelon form and upper triangular!

The first example might be a no-brainer, but the second one is interesting because it's not the typical style of a well-behaved matrix. Still, the pivots are in different, strictly increasing columns and the zero row is at the bottom, so both example 1 and example 2 are in row-echelon form!

\text{Example 1} = \begin{pmatrix} -2 & 4 & -2 & -1 & 4 \\ 0 & -8 & 3 & -3 & 1 \\ 0 & 0 & 2 & -1 & 1 \\ 0 & 0 & 0 & -3 & 4 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad \text{Example 2} = \begin{pmatrix} -2 & 4 & -2 & -1 & 4 \\ 0 & -8 & 3 & -3 & 1 \\ 0 & 0 & 2 & -1 & 1 \\ 0 & 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
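To make the staircase rule concrete, here's a minimal sketch of a checker in Python (`is_ref` is my own hypothetical helper, not something from the book), run against the matrices above:

```python
import numpy as np

def is_ref(matrix: np.ndarray) -> bool:
    """Check the two REF rules: all-zero rows at the bottom, and each
    pivot strictly to the right of the pivot in the row above."""
    last_pivot_col = -1
    seen_zero_row = False
    for row in matrix:
        nonzero = np.flatnonzero(row)
        if nonzero.size == 0:                # all-zero row
            seen_zero_row = True
            continue
        if seen_zero_row:                    # non-zero row below a zero row: rule 1 broken
            return False
        pivot_col = nonzero[0]               # leading coefficient / pivot
        if pivot_col <= last_pivot_col:      # staircase must step right: rule 2 broken
            return False
        last_pivot_col = pivot_col
    return True

not_ref = np.array([[-2, 4, -2, -1, 4],
                    [0, -8, 3, -3, 1],
                    [0, 0, 6, -1, 1],
                    [0, 0, 0, -3, 4],
                    [0, 0, 0, -3, 1]])
example_2 = np.array([[-2, 4, -2, -1, 4],
                      [0, -8, 3, -3, 1],
                      [0, 0, 2, -1, 1],
                      [0, 0, 0, 0, 4],
                      [0, 0, 0, 0, 0]])
print(is_ref(not_ref))    # False: rows 4 and 5 share a pivot column
print(is_ref(example_2))  # True: pivots step right, zero row at the bottom
```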

Regarding Day 5

I was right! A REF matrix is upper triangular, like in the LU decomposition method. Though there is a caveat I found when searching further through a few discussion forums online.

An upper triangular matrix is only defined for square matrices, while row-echelon form is not restricted to them. Meaning, a REF matrix is upper triangular (and vice versa) only for square matrices, not for rectangular ones (though there was a forum saying that they can be...).

Reduced Row Echelon Form (RREF) / Row Canonical Form

Rule

  1. It's in REF (meaning it retains all the rules of REF)
  2. Every pivot is 1
  3. Each pivot is the only non-zero entry in its column

Explanation and the difference from REF

It's... a simplified version of REF. What's the main difference? After you've obtained a REF matrix, you divide each row by its pivot using an elementary transformation, and then use each pivot to clear the entries above it.

Take the previous matrix as an example:

\text{This is a REF} = \begin{array}{ccccc|c} -2 & 4 & -2 & -1 & 4 & 0 \\ 0 & -8 & 3 & -3 & 1 & -2 \\ 0 & 0 & 2 & -1 & 1 & 1 \\ 0 & 0 & 0 & -3 & 4 & -2 \\ 0 & 0 & 0 & 0 & 1 & -2 \end{array}

But if you divide each row by its pivot, every pivot becomes 1:

\text{Pivots scaled to 1} = \begin{array}{ccccc|c} 1 & -2 & 1 & \frac{1}{2} & -2 & 0 \\ 0 & 1 & -\frac{3}{8} & \frac{3}{8} & -\frac{1}{8} & \frac{1}{4} \\ 0 & 0 & 1 & -\frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 0 & 1 & -\frac{4}{3} & \frac{2}{3} \\ 0 & 0 & 0 & 0 & 1 & -2 \end{array}

Please keep in mind, the reason I use an augmented matrix is to show that the right-hand side also changes with the division, and not just the matrix itself! Also note that scaling the pivots only satisfies rule 2; for a full RREF you would still clear the entries above each pivot (rule 3).
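Here's a minimal sketch in NumPy that performs both remaining steps on the REF matrix above (`ref_to_rref` is my own hypothetical helper name, not from the book):

```python
import numpy as np

def ref_to_rref(m: np.ndarray) -> np.ndarray:
    """Turn an REF matrix (augmented column included) into RREF:
    scale each pivot to 1 (rule 2), then clear the entries above
    every pivot (rule 3)."""
    m = m.astype(float).copy()
    for i, row in enumerate(m):
        nonzero = np.flatnonzero(row)
        if nonzero.size == 0:
            continue                        # zero row: nothing to scale
        pivot_col = nonzero[0]
        m[i] /= m[i, pivot_col]             # pivot becomes 1
        for k in range(i):                  # rows above the pivot
            m[k] -= m[k, pivot_col] * m[i]
    return m

ref = np.array([[-2, 4, -2, -1, 4, 0],
                [0, -8, 3, -3, 1, -2],
                [0, 0, 2, -1, 1, 1],
                [0, 0, 0, -3, 4, -2],
                [0, 0, 0, 0, 1, -2]])
print(ref_to_rref(ref))  # unit pivots with zeros above each of them
```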

Now this is where the previous day comes in as well! Remember this matrix?

This augmented matrix is in row-echelon form with every pivot equal to 1; the original, non-transformed matrix was complicated, and what we did was transform it into this simpler form.

\begin{array}{ccccc|c} 1 & -2 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 3 & -2 \\ 0 & 0 & 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 & 0 & a + 1 \end{array}

Calculating Inverse

This is where I spent a lot of time today, because I couldn't wrap my head around the concept, but fear not! I think I understand it correctly.

\left[\begin{array}{c|c} A & I \end{array}\right] \longrightarrow \left[\begin{array}{c|c} I & A^{-1} \end{array}\right]

If you're confused, I feel you. Basically, I'm going to show you this concept and provide proof (from a non-mathematician) that it works.

Let's start with an example, then the proof!

Example:

Given:

A = \begin{pmatrix} 4 & 2 & 2 \\ 3 & 1 & 2 \\ 2 & 1 & 2 \end{pmatrix}

We create an augmented matrix with A on the left side and an identity matrix on the right.

\left[\begin{array}{c|c} A & I \end{array}\right] = \left[\begin{array}{ccc|ccc} 4 & 2 & 2 & 1 & 0 & 0 \\ 3 & 1 & 2 & 0 & 1 & 0 \\ 2 & 1 & 2 & 0 & 0 & 1 \end{array}\right]
Step 1: Subtract R2 from R1
\overset{R_1 - R_2 \rightarrow R_1}{\left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 1 & -1 & 0 \\ 3 & 1 & 2 & 0 & 1 & 0 \\ 2 & 1 & 2 & 0 & 0 & 1 \end{array}\right]}
Step 2: Subtract R3 from R2
\overset{R_2 - R_3 \rightarrow R_2}{\left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 1 & -1 & 0 \\ 1 & 0 & 0 & 0 & 1 & -1 \\ 2 & 1 & 2 & 0 & 0 & 1 \end{array}\right]}
Step 3: Subtract R2 from R1
\overset{R_1 - R_2 \rightarrow R_1}{\left[\begin{array}{ccc|ccc} 0 & 1 & 0 & 1 & -2 & 1 \\ 1 & 0 & 0 & 0 & 1 & -1 \\ 2 & 1 & 2 & 0 & 0 & 1 \end{array}\right]}
Step 4: Swap R1 with R2
\overset{R_1 \leftrightarrow R_2}{\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & -1 \\ 0 & 1 & 0 & 1 & -2 & 1 \\ 2 & 1 & 2 & 0 & 0 & 1 \end{array}\right]}
Step 5: Subtract R2 and 2R1 from R3
\overset{R_3 - R_2 - 2R_1 \rightarrow R_3}{\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & -1 \\ 0 & 1 & 0 & 1 & -2 & 1 \\ 0 & 0 & 2 & -1 & 0 & 2 \end{array}\right]}
Step 6: Divide R3 by 2
\overset{\frac{R_3}{2} \rightarrow R_3}{\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & -1 \\ 0 & 1 & 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & -\frac{1}{2} & 0 & 1 \end{array}\right]}

Conclusion

\left[\begin{array}{c|c} I & A^{-1} \end{array}\right] = \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & -1 \\ 0 & 1 & 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & -\frac{1}{2} & 0 & 1 \end{array}\right]
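And here's the whole procedure as a small NumPy sketch, assuming my own made-up helper name `inverse_via_gauss_jordan` and with partial pivoting added for numerical safety (the hand calculation above picks its pivots more freely):

```python
import numpy as np

def inverse_via_gauss_jordan(a: np.ndarray) -> np.ndarray:
    """Row-reduce [A | I] until the left half is I; the right half
    is then A^-1."""
    n = a.shape[0]
    aug = np.hstack([a.astype(float), np.eye(n)])
    for col in range(n):
        # pick the largest pivot in this column (partial pivoting)
        pivot_row = col + np.argmax(np.abs(aug[col:, col]))
        if np.isclose(aug[pivot_row, col], 0.0):
            raise ValueError("matrix is singular")
        aug[[col, pivot_row]] = aug[[pivot_row, col]]  # swap rows
        aug[col] /= aug[col, col]                      # pivot -> 1
        for r in range(n):
            if r != col:
                aug[r] -= aug[r, col] * aug[col]       # clear the column
    return aug[:, n:]

A = np.array([[4, 2, 2],
              [3, 1, 2],
              [2, 1, 2]])
print(inverse_via_gauss_jordan(A))
# row 3 comes out as (-1/2, 0, 1), matching the conclusion above
```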

Proof

For those who are really curious about why this works, I've got you covered! Let's dive even deeper into the math, for your curiosity and to justify me spending hours on something that isn't even about machine learning anymore.

So, like before, we have this augmented matrix:
\left[\begin{array}{c|c} A & I \end{array}\right]
Our goal is to use elementary transformations to change A to I.

So it should look something like this:

E_k E_{k-1} \cdots E_2 E_1 A = I

with E_1, E_2, \ldots, E_k being the elementary transformations we applied. In the example I showed, we did k = 6 transformations that made A into I.

P.S. That's why just for today I added steps :D

Then we multiply both sides on the right by the inverse of A:
E_k E_{k-1} \cdots E_2 E_1 A A^{-1} = I A^{-1}
Remember the properties of a matrix and its inverse? We'll apply them here!
A A^{-1} = I \qquad I A^{-1} = A^{-1}

Making the equation into:

E_k E_{k-1} \cdots E_2 E_1 I = A^{-1}
Finally, we get the full formula:
E_k E_{k-1} \cdots E_2 E_1 \left[\begin{array}{c|c} A & I \end{array}\right] = \left[\begin{array}{c|c} I & A^{-1} \end{array}\right]

Meaning that with elementary transformations alone, we can invert the matrix!
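To back this up numerically, here's a quick sketch that writes down one elementary matrix per step of the worked example above and multiplies them together; if the derivation holds, the product must equal A^{-1}:

```python
import numpy as np

A = np.array([[4, 2, 2],
              [3, 1, 2],
              [2, 1, 2]], dtype=float)

# One elementary matrix per step of the worked example
E1 = np.array([[1, -1, 0], [0, 1, 0], [0, 0, 1]])    # R1 - R2 -> R1
E2 = np.array([[1, 0, 0], [0, 1, -1], [0, 0, 1]])    # R2 - R3 -> R2
E3 = np.array([[1, -1, 0], [0, 1, 0], [0, 0, 1]])    # R1 - R2 -> R1
E4 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])     # R1 <-> R2
E5 = np.array([[1, 0, 0], [0, 1, 0], [-2, -1, 1]])   # R3 - R2 - 2R1 -> R3
E6 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0.5]])   # R3 / 2 -> R3

product = E6 @ E5 @ E4 @ E3 @ E2 @ E1                # E_k ... E_1
print(np.allclose(product @ A, np.eye(3)))           # True: the product is A^-1
print(product)                                       # same matrix as the conclusion
```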

Side note:
I skipped a section called the Minus-1 Trick. I didn't feel it added anything to the solution, given we can already answer the example problem with just the normal method, but it should be called out :D

Acknowledgement

I can't overstate this: I'm truly grateful for this book being open-sourced for everyone. Many people will be able to learn and understand machine learning on a fundamental level. Whether changing careers, demystifying AI, or just learning in general, this book offers immense value, even for a fledgling composer such as myself. So, Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, thank you for this book.

Source: Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge: Cambridge University Press. https://mml-book.com

Made with TERRA