Machines Learning Matt

*Folksy voice

Sometimes I can’t tell if I’m teaching the machines or the machines are teaching me.

Handwriting a neural net, part 3 - Activation

So last time we were considering the happiness of Alejandra, Bob and Alice.

For Alejandra and Bob, the weights were 0.25 importance for money, 1 importance for relationships, 0.25 importance for having sandwiches, and a 1 importance for just being there (our baseline happiness).

So we took Alejandra’s and Bob’s amounts of money, quality of relationships, sandwich score and baseline amount, multiplied each one by its weight, and added up the results to get an estimate of happiness.

From now on, we’ll also consider Alice’s numbers to get a full calculation going. Let’s say she has scores of 30 in money, 80 in relationships, 50 in sandwiches and 40 for just being there.
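Here is a minimal sketch of that calculation for Alice. It assumes Alice gets the same weights we used for Alejandra and Bob (the post only gives those weights for the two of them), and the names `weights`, `alice` and `weighted_sum` are made up for this example.

```python
# Weights from the post, in order: money, relationships, sandwiches, just being there.
# Assumption: Alice uses the same weights as Alejandra and Bob.
weights = [0.25, 1.0, 0.25, 1.0]

# Alice's scores, in the same order as the weights.
alice = [30, 80, 50, 40]


def weighted_sum(scores, weights):
    """Multiply each score by its weight and add up the results."""
    return sum(score * weight for score, weight in zip(scores, weights))


alice_score = weighted_sum(alice, weights)
print(alice_score)  # 7.5 + 80 + 12.5 + 40 = 140.0
```

Under that assumption, Alice comes out at 140, a bit above Alejandra’s 130.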

So now we have all of our data. What’s going to happen now is that we will want to decide if each of them is happy or not. And we can do a percentage between 0 and 1.0 (like saying between 0 and 100, but divided by 100), where 0 is not happy at all and 1 is very happy. Last time we said Alejandra was happy because her score was above average at 130, and we’ll also say that Bob is happy with a score that was around 80. We’ll change those numbers later to fit on our 0.0 to 1.0 scale.

Once we have the numbers, we need to do the multiplication and then decide whether to activate the neuron. The neuron is the step where we do all of our multiplications and look at the outcome. Then we need to see if it “activates.”

Activation is kind of like saying to the information of a neuron, you need to be this tall to ride. For one neuron to pass its findings to the next neuron, we put it through a bit of math that tells us how much information we should pass on.

There are different ways of doing activation. You could simply say, if our calculation says you’re greater than 0, you move forward, if not, you don’t. This can be called a step activation, since it jumps directly to yes or no in a sharp manner. So instead of a gradual change, like a ramp, it’s shaped more like a step going between the floors of a building. It’s either all yes or all no.
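A step activation can be sketched in a couple of lines. This is just one way to write it, using the “greater than 0” cutoff described above; the function name `step` is made up for this example.

```python
def step(x):
    """Step activation: all yes (1) if x is greater than 0, otherwise all no (0)."""
    return 1 if x > 0 else 0


# Try a few values on either side of the cutoff.
for value in [-3, 0, 0.5, 130]:
    print(value, step(value))
```

Notice that 0.5 and 130 both come out as 1. The step doesn’t care how far above 0 you are, only that you are.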

You can also do something more like the ramp, where there is a gradual change.

There are at least a couple of ways of doing this: linear and non-linear.

The linear way produces a straight line with a slope. This gives us a range of possible values: instead of just yes or no, we could say 0.7 yes, or 0.2 no. Sort of like saying “really hot” or “a little bit cold” when we’re asking how close we are to the right answer. With a linear function we can have many answers, but again, they are in the form of a straight line when you graph them out.

You might not remember how to graph functions. We’ll go over that here:

How to graph a function

A lot of lines are defined as y = mx + b. So that could look like y = 4x + 5, or y = 2x + 9.

Take a line that is defined as y = 2x. (You could also write it as y = 2x + 0)

When you graph the function, it looks like this:

To graph a function, in this case, pick a number for x on the horizontal x axis, do the math to get y, and make a dot where that x value lines up with that y value on the vertical y axis.

So at x = 0, we put a dot at y = 0, because 2 * 0 = 0. And then at x = 2, we say y = 2 * 2, which equals 4, so we put a dot at the intersection of 2 horizontally and 4 vertically. Connect those points, and you’ve got that line.

But you have to make sure your line goes through every point you would get when you plug a number into y = 2x and then do that math.
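If you’d rather let the computer do the dot-making, here is a small sketch that works out a handful of points on y = 2x. The function name `line` and the choice of x values from -2 to 2 are just picked for illustration.

```python
def line(x, m=2, b=0):
    """Return y for a straight line y = m*x + b (defaults give y = 2x)."""
    return m * x + b


# Work out a dot for each whole-number x from -2 to 2.
points = [(x, line(x)) for x in range(-2, 3)]
print(points)  # [(-2, -4), (-1, -2), (0, 0), (1, 2), (2, 4)]
```

Every one of those dots sits on the same straight line, including the two we found by hand, (0, 0) and (2, 4).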

So that is the shape of a linear function. And that’s what a linear activation can look like. Because of math we’ll learn later, it’s actually helpful for us to use a non-linear activation function. That means a function that, when you graph it out, has some curved shapes to it. We’ll learn a special one, called sigmoid.

Sigmoid

The formula for sigmoid looks like this:

sigmoid(x) = 1 / (1 + e^(-x))

And if you plug in different values of x and plot out all those points on a graph, and then draw a curve through all the points, you get an S shape that looks like this:

So what happens is that our math values will get squished down to a value between 0 and 1. And then when we pass those numbers to the next neuron, those numbers will get used in the calculations and then those will get squished between 0 and 1 and so on. It’s like making sausage, where our initial input becomes new numbers that adjust to new weights and then come out looking different on the other side until it’s all packaged up into the answer we’re looking for.
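To see the squishing in action, here is a minimal sketch using the standard sigmoid, 1 / (1 + e^(-x)), and Python’s built-in `math.exp` for the e part. The input values are made up for illustration; they aren’t anyone’s happiness scores.

```python
import math


def sigmoid(x):
    """Squish any number into a value between 0 and 1."""
    return 1 / (1 + math.exp(-x))


# Negative numbers land near 0, positive numbers land near 1,
# and 0 lands right in the middle at 0.5.
for value in [-5, -1, 0, 1, 5]:
    print(value, round(sigmoid(value), 4))
```

No matter what number goes in, what comes out stays between 0 and 1, and bigger inputs always give bigger outputs. That’s the gradual ramp we wanted instead of the all-or-nothing step.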

By the way, the equation looks pretty simple when you plug in x, except for that e value. What is this e? We only know about x. Well, e refers to a special number called Euler’s number. It’s pronounced “oiler.” It’s kind of like pi: a decimal that never ends, starting out 2.718. It has some fascinating properties, especially for calculus, which we’ll be going over soon. Check out this link about e for fun!
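If you want to poke at e yourself, Python keeps a copy in its standard `math` module. This is only a quick peek, not a proper introduction; the variable name `e_value` is made up for this example.

```python
import math

e_value = math.e
print(e_value)       # roughly 2.718281828

# math.exp(x) means "e raised to the power x", so exp(1) is e itself.
print(math.exp(1))
```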

Matthew Waller