Handwriting a neural net, part 6 - An Absolute Beginner’s Guide to Calculus
Last time we used a sum of squared errors to find out how much we were off by.
But what do we do with those errors? Ideally we want to go back to our second neuron and say, you were off by this much, so adjust your weights accordingly. Then we want to go back to our first neuron and say the same thing. So first we did the forward pass. Now we need to do what’s called backpropagation, where we take our error and work backwards through the network, adjusting the weights as we go.
To do this, we’re going to need some calculus.
Don’t worry if you don’t remember or never took calculus. There are plenty of programmers and computer professionals who never use calculus, at least not directly. But we’re going to take you behind the scenes and actually use it!
Consider this graph:
How to graph a function
This is a line that is defined as y = 2x.
And again, if you’re not familiar with graphing functions, it’s basically picking a number for x on the horizontal x axis, doing the math to get y, and making a dot where that x value lines up with that y value on the vertical y axis.
So at x = 0, we put a dot at y = 0, because 2 * 0 = 0. And then at x = 2, we say y = 2 * 2, which equals 4, so we put a dot at the intersection of 2 horizontally and 4 vertically. Connect those points, and you’ve got that line.
But you have to make sure your line goes through every point you could get by plugging a number into y = 2x and doing the math, not just the two points we picked.
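If it helps to see that as code, here is a minimal sketch of the "plug in x, do the math, get a dot" idea. The function name `f` and the choice of x values are just picked for illustration.

```python
def f(x):
    # The line from the graph: y = 2x
    return 2 * x

# Plug in a handful of x values and record each (x, y) dot.
points = [(x, f(x)) for x in range(-2, 3)]
print(points)  # [(-2, -4), (-1, -2), (0, 0), (1, 2), (2, 4)]
```

Every one of those dots sits on the same straight line, including the (0, 0) and (2, 4) we plotted by hand.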
What is a slope?
The slope essentially means the steepness of the line. The way we find it for this line is to take two points on the line, subtract the first y value from the second y value, and then divide that by the first x value subtracted from the second x value.
So for this line, let’s look at two points. We have a point at (x = 1, y = 2), or (1, 2), and we have a point at (x = 2, y = 4), or (2, 4).
So our formula for slope is (y2nd - y1st) / (x2nd - x1st), which for us means (4 - 2) / (2 - 1), which is 2 / 1, which is 2. So our slope for the line between those two points is 2. And because this is a straight line, no matter what two points you choose on the line, the slope will always be 2.
And 2 is a positive number, so our slope is positive.
Let’s look at a negative slope really quickly.
So we can see that this line slopes downward from left to right. And there are two points on it. There is (-2, 4) and there is (2, -4), which means the slope
= (-4 - 4) / (2 - (-2))
= -8 / 4
So our slope is -2.
Basically if you see a straight line going down from left to right, it’s negative, and if you see a straight line going up from left to right, it’s positive, in a standard coordinate system.
So that’s what a slope looks like on a straight line.
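Here is the same slope formula as a small Python sketch, run on the two pairs of points from above. The function name `slope` and the tuple layout for points are just choices made for this example.

```python
def slope(p1, p2):
    # Each point is an (x, y) pair.
    x1, y1 = p1
    x2, y2 = p2
    # Change in y divided by change in x.
    return (y2 - y1) / (x2 - x1)

upward = slope((1, 2), (2, 4))      # the y = 2x line
downward = slope((-2, 4), (2, -4))  # the downward-sloping line
print(upward)    # 2.0
print(downward)  # -2.0
```

Note that the function would fail with a division by zero if both points had the same x value, which is exactly the problem we run into later when two points merge into one.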
What about on a curved line?
Here we see the graph of the line y = x²
Let’s take two points on this line and find the slope.
Let’s look at (-1, 1) and (-2, 4).
Using our formula, this slope equals (4 - 1) / (-2 - (-1)) = 3 / -1 = -3.
What about if those points got closer together though? Maybe we use (-1.25, 1.5625) and (-1.75, 3.0625). Then in our formula for slope, this becomes 1.5 / -0.5 = -3.
Interesting. As we slowly move the points toward each other, the slope stays at -3. However, this will be different for different kinds of lines. It works here because we kept both points the same distance from the middle, x = -1.5, and y = x² just happens to behave this way, so don’t be too amazed that it continues to be -3. The thing is, we’re moving closer to the middle of the two points we started with.
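You can check this yourself with a short sketch. It squeezes two points toward the middle value x = -1.5, keeping them the same distance on either side. The names `center` and `half_gap` are made up for this example.

```python
def f(x):
    # The curve y = x squared
    return x ** 2

def slope(x1, x2):
    # Slope of the straight line between two points on the curve.
    return (f(x2) - f(x1)) / (x2 - x1)

center = -1.5
slopes = []
for half_gap in [0.5, 0.25, 0.125, 0.0625]:
    x1 = center + half_gap  # the point closer to 0
    x2 = center - half_gap  # the point further from 0
    slopes.append(slope(x1, x2))
    print(x1, x2, slopes[-1])  # the slope is -3.0 every time
```

The first two rows are the same pairs of points we worked out by hand.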
But what happens when those two points actually meet in the middle? What happens when the two points become one point?
This is what mathematicians call taking a limit: we ask what happens to the slope as the distance between the points approaches 0. When the distance between the points is zero, how do we calculate the slope? In our formula we needed two points, but if they meet in the middle, we only have one.
One way to do this is to imagine that the two points never quite meet in the middle, but get infinitely close. We say that the distance approaches 0 instead of actually becoming 0. That way we can imagine that we have two points that are super close to one another.
The way it looks when graphed is that a straight line just grazes the curve at that one point, like a bullet grazing a mound, or a stick placed against a curved wall.
And that’s the slope at that point. Well, as the distance between the points approaches 0.
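To get a feel for "approaches 0", here is a rough sketch that pins one point at x = -1.5 and slides a second point toward it. This is a slightly different setup from keeping both points an equal distance from the middle, and the gap sizes are arbitrary picks for illustration.

```python
def f(x):
    # The curve y = x squared
    return x ** 2

fixed_x = -1.5
slopes = []
for gap in [1.0, 0.1, 0.01, 0.001]:
    moving_x = fixed_x + gap  # second point, sliding toward the fixed one
    s = (f(moving_x) - f(fixed_x)) / (moving_x - fixed_x)
    slopes.append(s)
    print(gap, s)  # the slope creeps closer and closer to -3
```

The gap never reaches 0 (that would mean dividing by zero), but the smaller it gets, the closer the slope gets to -3.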
So let’s look at a graph of several slopes on this curve, each one taken as the distance between the points approaches 0.
You can see that at some points, the slope is negative. At other points it’s positive. In fact, at 0, the slope is 0. It’s just a straight across flat line.
So here is an interesting question: What if we made a graph of all the slopes? We could treat the value of each slope like a new y coordinate in a different graph. It’s like we could make a new line (and in so doing, a new function, or formula). We would be drawing a new line built entirely out of the slopes of the old one!
And here is the heart of what we’re after: this line of slopes is the derivative.
This process of taking all the possible slopes of a line, and using those slopes as values for a new line, is conceptually behind taking the derivative of a function.
So let’s look at a bunch of our slopes and make a graph of them.
So at -1.5 we saw that our slope line has a value of -3. So we put a dot at (-1.5, -3). And we see that at -2 our slope is -4. So we put a dot at (-2, -4). And at 0 we saw our slope was 0, so we put a point at (0, 0). And at 2, our slope is 4, so we put a dot at (2, 4).
At this point you can eyeball that we’re making a straight line. And this straight line has its own formula. In our case, every time we have a value for x, the y value is twice as much. In fact, the formula for this line is y = 2x!
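Here is a sketch that builds those same dots by measuring the slope with two very close points, then prints 2x next to each one for comparison. The helper name `slope_at` and the gap size are assumptions for this example, and this is only a numerical estimate, not a proof.

```python
def f(x):
    # The original curve y = x squared
    return x ** 2

def slope_at(x, gap=1e-6):
    # Two points a tiny, equal distance either side of x.
    return (f(x + gap) - f(x - gap)) / (2 * gap)

slope_points = [(x, round(slope_at(x), 3)) for x in [-2, -1.5, 0, 2]]
for x, s in slope_points:
    print(x, s, 2 * x)  # the measured slope next to 2x
```

The rounding is only there to hide the tiny errors you get from doing arithmetic with very small decimal numbers on a computer.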
There are mathematical ways of demonstrating that the derivative of y = x² is equal to 2x. There are formal proofs that demonstrate what’s going on. And in the next post, we’ll teach you the shortcuts that mathematicians use to work out calculus problems on paper or in their head.
For now you’ve got the basic concept: we’re finding the formula for the slope anywhere along the original line.
So when we have a function like y = x², the derivative of that function is y = 2x. (You’ll notice I’ve been using the words formula and function kind of interchangeably. From here on I’ll just call it a function.)
So that’s the concept behind finding a derivative!
Why we’re doing calculus
At this point, maybe you want to know why we’re doing all this calculus with regard to our neural network.
When we do machine learning we want to find ways to minimize our error. To use an example that is often used in AI, it’s like we’re on top of a mountain. And on this peak the error is very high. What we want to do is to go down to the lowest possible valley, where error can be very low. When our error is as close to 0 as we can get, we have our right answers!
So when graphing out all the points of error and such, it turns out that one way we can know where to go is to find out which way the slope points downward. And we follow that slope, and it leads us to the lower error. We will be taking the derivative of the function that calculates our error. We’ll take the derivative of the cost function that we learned about in the previous post. And that gives us the slope we want to follow for all the different weights we used in our calculations from before.
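To make "follow the slope downhill" a little more concrete, here is a toy sketch with a single weight. Everything in it is made up for illustration: the error function (w - 3)² is a stand-in, not the cost function from the previous post, and the step size `learning_rate` and the starting weight are arbitrary choices.

```python
def error(w):
    # Hypothetical toy error: lowest (zero) when w is 3.
    return (w - 3) ** 2

def error_slope(w):
    # Derivative of the toy error above.
    return 2 * (w - 3)

w = 0.0              # start somewhere up on the "mountain"
learning_rate = 0.1  # how big a step to take each time (an assumption)
for step in range(50):
    # A positive slope means downhill is to the left, so subtract it.
    w = w - learning_rate * error_slope(w)

print(w, error(w))  # w ends up very close to 3, error very close to 0
```

Each pass through the loop nudges the weight in whichever direction lowers the error, which is the same idea we’ll apply to the real weights in our network.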
Then our machine can learn what the error was, adjust, and do better next time.