Mathematics for Machine Learning - Day 7
A weekly review
Today I'm just going to summarize the matrix properties and the particular solution from the previous days and translate some of them into code. There isn't anything new today, nor will there be much mathematical notation aside from the last section.
A fun fact about today: remember the very first equation regarding the particular solution? I used gradient descent and found another, different way to solve it :D So fun.
Matrices
Matrix comparison function
The reason I'm using this function instead of a built-in function is for you (the readers) to know how the comparison is made (because honestly, I don't know how np.array_equal works).
import numpy as np

def compare_two_matrices(matrixA: np.ndarray, matrixB: np.ndarray) -> str:
    # Ensuring matrices A and B have the same shape
    if matrixA.shape != matrixB.shape:
        return "Matrix A and B should have the same shape"
    # Comparing each element of the matrices and flattening the result into a list
    result = [i for j in matrixA == matrixB for i in j]
    # Return text if every element in the list is True
    if all(result):
        return "Both matrices are exactly the same"
    # Return text if at least one element is False
    return "Matrices are not the same"
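For comparison, here is a minimal sketch of the built-in alternatives. `np.array_equal` checks that the shapes match and that every element is equal, which is the same test the function above performs; `np.allclose` is the usual choice once floating-point rounding is involved. The arrays `a`, `b` and `c` are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 2], [3, 4]])
c = np.array([[1, 2], [3, 5]])

same_ab = np.array_equal(a, b)  # identical shape and elements
same_ac = np.array_equal(a, c)  # one element differs

# 0.1 + 0.2 is not exactly 0.3 in floating point,
# so an exact comparison fails while a tolerance-based one passes.
exact = np.array_equal(np.array([0.1 + 0.2]), np.array([0.3]))
close = np.allclose(np.array([0.1 + 0.2]), np.array([0.3]))

print(same_ab, same_ac, exact, close)
```

The exact comparison is fine for the integer matrices used in this post; the tolerance-based one matters as soon as the matrices hold arbitrary floats.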
Addition and subtraction
m = 5
n = 3
Amn = np.random.randint(low=0,high=100,size=(m,n))
Bmn = np.random.randint(low=0,high=100,size=(m,n))
Cmn = Amn+Bmn
Dmn = Amn-Bmn
print(Amn.shape,Bmn.shape, Cmn.shape, Dmn.shape)
# (5, 3) (5, 3) (5, 3) (5, 3)
Multiplication
m = 5
n = 3
k = 9
Amn = np.random.randint(low=0,high=100,size=(m,n))
Bnk = np.random.randint(low=0,high=100,size=(n,k))
Cmk = np.dot(Amn,Bnk)
print(Amn.shape,Bnk.shape, Cmk.shape)
# (5, 3) (3, 9) (5, 9)
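The shapes above work because the inner dimensions agree: an (m, n) matrix times an (n, k) matrix gives an (m, k) matrix. A small sketch of what happens when they don't agree, using made-up matrices of ones:

```python
import numpy as np

A = np.ones((5, 3))
B = np.ones((3, 9))

# For 2-D arrays the @ operator gives the same result as np.dot
C = A @ B
shapes_ok = C.shape == (5, 9)

try:
    # (3, 9) times (5, 3): the inner dimensions 9 and 5 differ
    np.dot(B, A)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(shapes_ok, mismatch_raised)
```

This is also a quick reminder that matrix multiplication is order-sensitive: swapping the operands here is not even defined.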
Associativity
m = 9
n = 3
k = 5
l = 7
Amn = np.random.randint(low=0,high=100,size=(m,n))
Bnk = np.random.randint(low=0,high=100,size=(n,k))
Ckl = np.random.randint(low=0,high=100,size=(k,l))
Left_section = np.dot(np.dot(Amn,Bnk), Ckl)
Right_section = np.dot(Amn, np.dot(Bnk,Ckl))
compare_two_matrices(Left_section, Right_section)
# 'Both matrices are exactly the same'
Distributivity
Test 1
m = 9
n = 3
k = 5
Amn = np.random.randint(low=0,high=100,size=(m,n))
Bmn = np.random.randint(low=0,high=100,size=(m,n))
Cnk = np.random.randint(low=0,high=100,size=(n,k))
Left_section = np.dot((Amn+Bmn),Cnk)
Right_section = np.dot(Amn, Cnk) + np.dot(Bmn, Cnk)
compare_two_matrices(Left_section, Right_section)
# 'Both matrices are exactly the same'
Test 2
m = 9
n = 3
k = 5
Amn = np.random.randint(low=0,high=100,size=(m,n))
Bnk = np.random.randint(low=0,high=100,size=(n,k))
Cnk = np.random.randint(low=0,high=100,size=(n,k))
Left_section = np.dot(Amn, (Bnk + Cnk))
Right_section = np.dot(Amn, Bnk) + np.dot(Amn, Cnk)
compare_two_matrices(Left_section, Right_section)
# 'Both matrices are exactly the same'
Inverse
identity_matrix = np.identity(2)
wrong_matrix = np.array([[4,8],[1,2]])
right_matrix = np.array([[4,8],[0.5,2]])
I hope you remember why the wrong matrix can't be inverted while the right matrix works just fine, even though they differ by just one value!
Inverse Function
def create_inverse(matrix: np.ndarray) -> np.ndarray:
    # Confirming square matrix
    if matrix.shape[0] != matrix.shape[1]:
        return "Matrix needs to be square to be inverted."
    # Confirming shape
    if matrix.shape != (2, 2):
        return "I'm not smart enough to code more complex matrices and don't ask for an inverse of 1x1."
    # Creating the adjugate (classical adjoint) matrix as an array so it can be divided element-wise
    adj_matrix = np.array([[matrix[-1][-1], -matrix[0][-1]],
                           [-matrix[-1][0], matrix[0][0]]])
    # Calculating the determinant of the matrix: ad - bc
    det_matrix = matrix[0][0] * matrix[-1][-1] - matrix[0][-1] * matrix[-1][0]
    # Calculating the inverse of the matrix (a zero determinant triggers a divide-by-zero warning)
    inverse_matrix = adj_matrix / det_matrix
    return inverse_matrix
create_inverse(wrong_matrix)
# RuntimeWarning: divide by zero encountered in divide (inverse_matrix = adj_matrix/det_matrix)
Damn... my function spoiled the fun. So always remember: not every matrix can be inverted. Besides having to be square, a matrix also needs a non-zero determinant.
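A sketch of how the same check could be done with NumPy's own tools, assuming you'd rather detect the problem up front than read a warning: `np.linalg.det` gives the determinant, and `np.linalg.inv` raises `LinAlgError` for this matrix.

```python
import numpy as np

wrong_matrix = np.array([[4, 8], [1, 2]])

# Determinant: 4*2 - 8*1 = 0, so the matrix is singular
det = np.linalg.det(wrong_matrix)
is_singular = np.isclose(det, 0.0)

try:
    np.linalg.inv(wrong_matrix)
    inv_failed = False
except np.linalg.LinAlgError:
    # NumPy refuses to invert a singular matrix
    inv_failed = True

print(is_singular, inv_failed)
```

Comparing the determinant with `np.isclose` instead of `== 0` is a deliberate choice, since determinants computed in floating point rarely come out as an exact zero.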
create_inverse(right_matrix)
"""
array([[ 0.5 , -2. ],
[-0.125, 1. ]])
"""
multiply_with_inverse = np.dot(create_inverse(right_matrix), right_matrix)
compare_two_matrices(multiply_with_inverse, identity_matrix)
# 'Both matrices are exactly the same'
This also confirms, for this example, that if a matrix is multiplied by its inverse, the result is the identity matrix!
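The same identity can be checked with `np.linalg.inv`, in both multiplication orders. This is only a sketch for the 2x2 example from this post; `np.allclose` is used because inverses generally carry floating-point rounding.

```python
import numpy as np

right_matrix = np.array([[4, 8], [0.5, 2]])
inverse = np.linalg.inv(right_matrix)

# The inverse works from the left and from the right
left_product = inverse @ right_matrix
right_product = right_matrix @ inverse

identity = np.identity(2)
both_identity = np.allclose(left_product, identity) and np.allclose(right_product, identity)

print(inverse)
print(both_identity)
```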
Particular Solution
This is where it gets fun. If you've read the previous days, you know that, aside from spotting a sort of identity matrix inside the matrix, the method I used to find the particular solution is more like guessing or iterating over values until the answer appears.
So what?
That means, when translating it into code, I also need to make it iterative and keep changing the value of x until Ax matches the result we already know.
I'm skipping a few chapters. (Gradient Descent)
A name so famous and so badass that I can't help but learn it sooner. This is what we'll use to determine x. Bear in mind that today is more coding than mathematics, so I won't go into too much detail about why they use theta or why there's a nabla.
Today, I'll just explain that what we did by hand was gradient descent, only in our heads, so this is how I'm translating it.
Mean Squared Error (MSE)
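I'm going to be using Mean Squared Error (MSE) as the measure of error: since some differences will be negative, squaring ensures that what's being measured is the size of the difference, regardless of whether it's negative or positive. Note that the function below returns the signed difference for each row divided by n rather than the squared values, because that vector is what the gradient of the MSE is built from.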
def MSE_function(predicted: np.ndarray, expected: np.ndarray) -> np.ndarray:  # Error term used for the MSE gradient
    if len(predicted) != len(expected):
        return "Predicted output and expected output should be the same length"
    # The number of values in the predicted and expected outputs
    n = len(predicted)
    # Calculating the signed difference between each predicted and expected value
    # (the values are not squared here; the gradient of the MSE only needs the difference)
    difference = np.array([(predicted[i] - expected[i]) for i in range(n)])
    # Dividing the differences by n
    mse_value = difference / n
    return mse_value
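For reference, a minimal sketch of the textbook scalar MSE, where the squaring is what stops positive and negative errors from cancelling. `scalar_mse` is a hypothetical helper, not part of the code in this post, and the sample vectors are made up.

```python
import numpy as np

def scalar_mse(predicted: np.ndarray, expected: np.ndarray) -> float:
    """Mean of the squared differences between two equal-length vectors."""
    difference = predicted - expected
    return float(np.mean(difference ** 2))

predicted = np.array([3.0, -1.0])
expected = np.array([1.0, 1.0])

# The signed errors are +2 and -2, so their plain mean cancels out
signed_mean = float(np.mean(predicted - expected))
# Squaring first gives (4 + 4) / 2
mse = scalar_mse(predicted, expected)

print(signed_mean, mse)
```

A prediction that is off by 2 in both directions looks perfect under the signed mean and clearly wrong under the MSE, which is the reason for squaring.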
Gradient Descent
def finding_gradient(input_matrix: np.ndarray, mse_error: np.ndarray) -> np.ndarray:
    # Multiplying matrix A (transposed) by the error vector
    At_E = np.dot(input_matrix.T, mse_error)
    # Dividing it by m and multiplying it by 2
    gradient = (2 / len(mse_error)) * At_E
    return gradient

def update_x(current_x: np.ndarray, gradient: np.ndarray, descent: float = 0.1) -> list:
    """
    I'm using assert because I'd rather have an error in this section than output a string that'll cause an error
    somewhere else down the line :D
    """
    assert len(current_x) == len(gradient)
    # The number of values
    n = len(current_x)
    # Multiplying the gradient by the learning rate
    gradient = gradient * descent
    # Iterating over x and subtracting alpha*gradient from each value
    update_values = [(current_x[i] - gradient[i]) for i in range(n)]
    return update_values
Yes, it's split into two functions. I try my best to ensure each function plays only one specific role (from what I know this is best practice in coding: the S in the SOLID principles).
So, you can see the similarities between my code and the formula. Here's what my code means in mathematical notation.
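Read off from the functions above, the three steps can be written as follows. This is a reconstruction from the code, assuming $m$ is the number of rows of $A$ (the length of $b$), $\alpha$ is the `descent` learning rate, and $E$ is the vector returned by `MSE_function`.

```latex
\begin{aligned}
E &= \frac{1}{m}\,(A x - b) \\
\text{AtE} &= A^{\top} E \\
\text{gradient} &= \frac{2}{m}\,\text{AtE} \\
\text{updatevalues} &= \text{currentx} - \alpha \cdot \text{gradient}
\end{aligned}
```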
P.S. I can't use an underscore (_) inside KaTeX text, so the names should've been At_E, current_x and update_values, which refer to my variables and not just some random names.
P.P.S. I'll never change from snake case, so KaTeX can fight me. I'm a Python developer; the snake god might hate me if I don't use snake case, and if you don't know what I'm talking about... It's been a long day, I'm sorry for rambling.
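As a sanity check on the gradient formula, here is a small sketch that compares the analytic gradient of the textbook MSE, (2/m) Aᵀ(Ax − b), with a finite-difference estimate. The test point `x`, the helper `mse_loss` and the step `eps` are assumptions made for this illustration. The functions in this post divide by m twice (once in `MSE_function`, once in `finding_gradient`), so their step is this gradient scaled by 1/m, which only changes the effective learning rate.

```python
import numpy as np

A = np.array([[1, 0, 8, 0, -4], [0, 1, 2, 0, -12], [0, 0, 4, 1, 7]], dtype=float)
b = np.array([42.0, 8.0, 12.0])
x = np.array([0.5, -1.0, 2.0, 0.0, 1.5])  # arbitrary test point
m = len(b)

def mse_loss(x_vec: np.ndarray) -> float:
    """Textbook MSE: mean of the squared residuals of A x - b."""
    return float(np.mean((A @ x_vec - b) ** 2))

# Analytic gradient of the MSE with respect to x
analytic = (2 / m) * (A.T @ (A @ x - b))

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(len(x)):
    step = np.zeros_like(x)
    step[i] = eps
    numeric[i] = (mse_loss(x + step) - mse_loss(x - step)) / (2 * eps)

gradients_match = np.allclose(analytic, numeric, atol=1e-4)
print(gradients_match)
```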
Full code
And that's it! We can use it to solve the system of equations from the previous days.
A = np.array([[1, 0, 8, 0, -4], [0, 1, 2, 0, -12], [0, 0, 4, 1, 7]])
B = np.array([42, 8, 12])
x = np.array([0, 0, 0, 0, 0])
descent_value = 0.041
for i in range(10000):
    prediction = np.dot(A, x)
    mse_value = MSE_function(prediction, B)
    gradient = finding_gradient(A, mse_value)
    x = update_x(x, gradient, descent_value)
    # Generating the final report once the summed error is small enough
    # (the errors are signed, so positive and negative values can partly cancel in this sum)
    if np.abs(sum(mse_value)) < 0.0001:
        print("Generation finish after {} iteration".format(i))
        print("A total mean square error value of {} or an average of {}\n".format(round(sum(mse_value), 5), np.mean(mse_value)))
        print("With x:", x)
        print("With Ax:", np.dot(A, x))
        break
"""
Generation finish after 1077 iteration
A total mean square error value of 0.0001 or an average of 3.303279802955059e-05
With x: [4.024892733606064, -4.136195147684619, 4.626743232157506, -4.825002085330449, -0.24024375952185326]
With Ax: [41.99981363 8.00021643 12.00026453]
"""
And that's it! You've made a basic model that learns x from the data by iterating, which makes it a (very small) machine learning model!
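As a cross-check on the x found above, here is a sketch using `np.linalg.lstsq`. The system has 3 equations and 5 unknowns, so it has infinitely many solutions, and `lstsq` returns the one with the smallest norm. Gradient descent started from all zeros tends toward that same minimum-norm solution, so the two results should land close to each other; treat that as an expectation to verify, not a guarantee of this post's exact run.

```python
import numpy as np

A = np.array([[1, 0, 8, 0, -4], [0, 1, 2, 0, -12], [0, 0, 4, 1, 7]])
B = np.array([42, 8, 12])

# rcond=None uses NumPy's default cutoff for small singular values
x_min_norm, residuals, rank, singular_values = np.linalg.lstsq(A, B, rcond=None)

# The returned x should reproduce B when multiplied by A
solves_system = np.allclose(A @ x_min_norm, B)

print("x:", x_min_norm)
print("rank:", rank)
print("Ax equals B:", solves_system)
```

This also explains why the result differs from a particular solution found by hand: both satisfy Ax = B, they are just different members of the same solution set.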
Acknowledgement
I can't overstate this: I'm truly grateful for this book being open-sourced for everyone. Many people will be able to learn and understand machine learning on a fundamental level. Whether changing careers, demystifying AI, or just learning in general, this book offers immense value even for a fledgling composer such as myself. So, Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, thank you for this book.
Source: Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge: Cambridge University Press. https://mml-book.com