Handwriting a neural net, part 7 - Calculus cheat sheet
What I want to show now are some basic calculus shortcuts that everyone can use to take derivatives of functions.
These shortcuts are a series of rules. There are mathematical proofs for all these rules, but instead of showing you all of them, we’re just going to look at the end result to help us with handwriting our neural net’s back propagation.
The Power Rule
Remember how one of our functions was y = x²?
The derivative of that is 2x. Again, the derivative is the function that graphs the slope of x² at every value of x.
The shortcut when it comes to exponents is this. For something like x², take the exponent and put it in front as a multiplier, so that first it becomes 2x². Next, subtract 1 from the exponent. This becomes 2x¹. But x¹ is just x. So 2x¹ becomes 2x.
Let’s try it with x³. That gives us 3x².
Now let’s try it with x⁴. That gives us 4x³.
What about 3x³? Well, we take the exponent, 3, and multiply it by the 3 that is already in front, then subtract 1 from the exponent. So that gives us 3 * 3x², or 9x².
It’s a powerful rule!
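Here is a minimal Python sketch of the power rule, in case seeing it as code helps. The function names power_rule and numerical_slope are made up for this illustration. The second one estimates the slope by nudging x a tiny bit in each direction, so we can compare the shortcut against a brute-force measurement.

```python
def power_rule(coefficient, exponent):
    """Derivative of coefficient * x**exponent, returned as (new_coefficient, new_exponent)."""
    return coefficient * exponent, exponent - 1


def numerical_slope(f, x, h=1e-6):
    """Estimate the slope of f at x by measuring a tiny step on each side of x."""
    return (f(x + h) - f(x - h)) / (2 * h)


print(power_rule(1, 2))  # x^2  -> (2, 1), meaning 2x^1
print(power_rule(3, 3))  # 3x^3 -> (9, 2), meaning 9x^2

# Compare the shortcut with a measured slope of 3x^3 at x = 2.
x = 2.0
coef, exp = power_rule(3, 3)
print(coef * x ** exp)                           # 36.0
print(numerical_slope(lambda t: 3 * t ** 3, x))  # approximately 36
```

The two numbers at the end should agree to several decimal places, which is a decent sanity check that the shortcut really does give the slope.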
You might be wondering about something. We said in our first example that 2x is the same as 2x¹.
What if after taking the derivative of x², which gives us 2x¹, we wanted to take the derivative again, this time of 2x¹?
Well that gives us 1 * 2x⁰. And x⁰ is always equal to 1, so that gives us 1 * 2(1), which just gives us 2.
This makes sense when we take the derivative of a straight, sloped line. The straight line, all along the line, has the exact same slope. So the derivative is just a single value.
And what happens if we take the derivative yet again? This time of 2? It simply goes away. A flat line at 2 has no slope at all. Its slope is 0.
So if we have a big equation like 2x⁴ + 3x + 4, we can easily take the derivative with respect to x.
The first part becomes 8x³, because 4 * 2 is 8. The next part becomes 3, and then the 4 goes away. So we get 8x³ + 3. And if we take the derivative of that we get 24x², because the 3 by itself goes away. And if we take the derivative of that we get 48x. Then we get 48. And then we get nothing, just 0.
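To double-check a chain of derivatives like this, here is a small sketch that stores a polynomial as a list of coefficients and applies the power rule to every term at once. The list layout (position i holds the coefficient of x^i) and the name differentiate are assumptions made just for this example.

```python
def differentiate(coefficients):
    """Apply the power rule to each term.

    coefficients[i] is the number in front of x**i, so
    [4, 3, 0, 0, 2] stands for 4 + 3x + 2x^4.
    """
    # Multiply each coefficient by its exponent, then drop the constant slot,
    # which shifts every remaining term down by one power.
    return [power * c for power, c in enumerate(coefficients)][1:]


poly = [4, 3, 0, 0, 2]  # 2x^4 + 3x + 4
while poly:
    poly = differentiate(poly)
    print(poly)
# [3, 0, 0, 8]   -> 8x^3 + 3
# [0, 0, 24]     -> 24x^2
# [0, 48]        -> 48x
# [48]           -> 48
# []             -> nothing left
```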
The Product Rule
Let’s say that we have three different functions. In fact we already had an example with three different functions.
Our example of 2x⁴ + 3x +4 could be considered 3 different functions added together.
But let’s take away the 4 and just say that we have two different functions.
Let’s say x² * 3x, and we want to take the derivative. It would be great if it worked like the power rule, where we could just take each derivative separately and say 2x * 3 or something, but that’s not how it works with multiplication and division.
So again, we have x² * 3x.
The product rule says that you take the derivative of the first function times the second function, and then add that to the first function times the derivative of the second function.
That’s a lot to say, so let’s write it a different way. Let’s say that f is our first function and g is our second function. Let’s say f’ means the derivative of the first function, and g’ is the derivative of the second function.
So (f*g)’ is what we’re after, the derivative of the first function times the second function.
(f * g)’ = f’ * g + f * g’
That’s a way of writing out what I said above.
Let’s write out our example
(x² * 3x)’ = ((x²)’ * 3x) + (x² * (3x)’)
= (2x * 3x) + (x² * 3)
= 6x² + 3x²
= 9x²
We can test our result by rewriting the original function in a way that uses only the power rule
x² * 3x = 3x³
The derivative of 3x³ is 9x², which matches what the product rule gave us.
So why didn’t we just adjust our formula and use the power rule every time? Sometimes there are functions that use different numbers and patterns where we can’t use the power rule directly, but we still need to take the derivative of two functions that are multiplied by each other.
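The product rule for our example can be sketched in Python like this. The names f, g, f_prime, g_prime and numerical_slope are just labels chosen for the illustration, and the derivatives of the two pieces are typed in by hand using the power rule.

```python
def numerical_slope(func, x, h=1e-6):
    """Estimate the slope of func at x by measuring a tiny step on each side of x."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):        # first function: x^2
    return x ** 2


def f_prime(x):  # its derivative by the power rule: 2x
    return 2 * x


def g(x):        # second function: 3x
    return 3 * x


def g_prime(x):  # its derivative by the power rule: 3
    return 3


def product_rule(x):
    """(f * g)' = f' * g + f * g'"""
    return f_prime(x) * g(x) + f(x) * g_prime(x)


x = 2.0
print(product_rule(x))                            # 36.0
print(9 * x ** 2)                                 # 36.0, from the simplified 9x^2
print(numerical_slope(lambda t: f(t) * g(t), x))  # approximately 36
```

All three numbers should line up: the product rule, the simplified 9x², and the measured slope of the multiplied function.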
The Chain Rule
Finally we’re going to use a rule called the chain rule.
Let’s take a function like this:
y = (2x -1)²
We can look at this as two separate functions inside of each other
z = 2x -1
And y = z²
So we have function z, and function y. And function z is sort of inside function y. Function z is being used as a parameter (which is another word for input), for function y.
In these cases what we do is take the derivative of the outside function first, and then multiply that by the derivative of the inside function.
So for y = (2x - 1)², let’s take the derivative of y, the outside part, first.
y = z², so the derivative of y with respect to z is 2z. And then we substitute (2x - 1) for z and we get 2(2x - 1).
And finally we multiply that part by the derivative of the inside function.
So the inside function in this case was z. And z = 2x - 1. And the derivative of z is 2.
So we multiply the inside derivative by the outside derivative and we get
2(2x -1) * 2
Which equals 8x - 4.
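Here is the same chain rule example as a small Python sketch. The names inner, outer, inner_prime, outer_prime and numerical_slope are invented for this illustration, and the two derivatives are written by hand from the power rule.

```python
def numerical_slope(func, x, h=1e-6):
    """Estimate the slope of func at x by measuring a tiny step on each side of x."""
    return (func(x + h) - func(x - h)) / (2 * h)


def inner(x):        # z = 2x - 1
    return 2 * x - 1


def inner_prime(x):  # derivative of the inside function: 2
    return 2


def outer(z):        # y = z^2
    return z ** 2


def outer_prime(z):  # derivative of the outside function: 2z
    return 2 * z


def chain_rule(x):
    """Outside derivative, evaluated at the inside function, times the inside derivative."""
    return outer_prime(inner(x)) * inner_prime(x)


x = 3.0
print(chain_rule(x))                                # 20.0
print(8 * x - 4)                                    # 20.0, from the simplified 8x - 4
print(numerical_slope(lambda t: outer(inner(t)), x))  # approximately 20
```

Notice that outer_prime gets called with inner(x), not with x. That is the "substitute (2x - 1) for z" step from the walkthrough.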
There are other handy things to remember about derivatives when you’re doing calculus in the real world.
Like these:
The derivative of sin(x) is cos(x)
The derivative of cos(x) is -sin(x)
The derivative of -sin(x) is -cos(x)
The derivative of -cos(x) is sin(x)
(If you don’t remember from trigonometry, the sine of an angle in a right triangle is the opposite side divided by the hypotenuse, and the cosine is the adjacent side divided by the hypotenuse. Here is a handy page with more about that: https://www.mathsisfun.com/sine-cosine-tangent.html)
And the derivative of e^x is e^x. That’s right. The slope of e^x at any point is just the value of e^x at that point. I told you that e is a trippy number!
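If you want to convince yourself of these without a proof, here is a rough Python check using the standard math module. The helper numerical_slope and the test point x = 1.0 are arbitrary choices for this sketch. It measures each slope directly and prints it next to what the rule says it should be.

```python
import math


def numerical_slope(func, x, h=1e-6):
    """Estimate the slope of func at x by measuring a tiny step on each side of x."""
    return (func(x + h) - func(x - h)) / (2 * h)


x = 1.0  # any point would do

# Each pair of numbers should match to several decimal places.
print(numerical_slope(math.sin, x), math.cos(x))   # slope of sin(x) vs cos(x)
print(numerical_slope(math.cos, x), -math.sin(x))  # slope of cos(x) vs -sin(x)
print(numerical_slope(math.exp, x), math.exp(x))   # slope of e^x vs e^x itself
```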