Handwriting a neural net, part 1 - The Concept
You've heard about these neural nets. Maybe in a movie. Maybe in a news article. I'll admit, at first blush they sound like gobbledygook sci-fi language, but they are actually a super useful way to make sense of a lot of information and do some remarkable things with that information. So we're going to break down what they are, what they can do, and how they do it, and we're going to find that out EXACTLY. We're basically going to handwrite the math for a neural network until we get the idea of what's going on, and then let the computer handle the rest.
What are they?
So neural nets are a way to do machine learning, which is a way for computers to find patterns in information. They call them neurons because they’re meant to mimic the way the brain works. They’re called nets because they have lots of connections between them so they kind of look like nets when you draw them out.
You can think of a single neuron as a person playing a game called Find-The-Item, where you are told if you’re hot or cold when you’re trying to find something. The player gets some information, some clues, and moves toward the item.
When the player is done looking the first time, the game runner says you’re hot or cold by this much. So the player thinks a bit, reconsiders the clues, or gets new clues and takes new steps to guess at where the item is.
The game also includes several other players. The first player takes their steps, then passes on the clues and their steps to a new group of players, who take their own version of steps to try to reach the hidden item. And each time, the person in charge of the game says "Sorry, you got it wrong by this much, so go back and adjust your steps by this much." So the players adjust and try again and again, going forward with their guesses/steps, and backward making adjustments, until they get close enough and the game runner says, “You found the item!”
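To make the guess-and-adjust loop concrete, here is a tiny Python sketch of one player hunting for one number. Everything in it is made up for illustration (the hidden value, the starting guess, the `step_size`), and it is only a toy version of the game, not a real neural net.

```python
# Toy version of Find-The-Item with a single player and a single number.
# All values here are invented for illustration.
hidden_item = 10.0   # where the item really is
guess = 0.0          # the player's first guess
step_size = 0.5      # how much of the reported error the player corrects each round

rounds = 0
while abs(hidden_item - guess) > 0.01:
    error = hidden_item - guess   # game runner: "you're off by this much"
    guess += step_size * error    # player adjusts and tries again
    rounds += 1

print(rounds, guess)
```

Each pass through the loop is one "forward" guess followed by one "backward" adjustment, and the loop stops once the guess is close enough.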
The useful part is, when players are looking for a similar item, they can remember the steps they took for the last item to figure out where it is.
When you draw out all the connections between the players/neurons, it looks a little like a net. It's a neural net.
They call them neurons because the whole process, of interconnected individual things processing something and then moving it forward and then backward, is meant to mimic the way that the brain works: individual neurons deciding what information to pass to the next neurons, and so forth.
The Secret to Happiness
Let's take a look at a specific item that our players might be trying to find. Let's say our players are trying to find happiness. That’s right, we’re switching now from our theoretical game to looking for the formula for happiness.
Every machine learning problem starts with some data, some information. You feed data into your machine, your machine goes back and forth trying to learn from the data, and then the machine is able to take new data it has never seen and make predictions.
Our machine will try to predict if a person will be happy, and the data will come from Alejandra, Bob and Alice.
We're going to say that there are 4 things, 4 features, determining happiness. Money, relationships, sandwiches and baseline happiness. Money means cash, relationships can be things like the quality of relationships with friends and family (not necessarily how many you have), and sandwiches can be things like a nice BLT or a tuna melt. And baseline we’ll just say is how happy a person is just for being around.
It’s silly, but the question is, how do we weight each feature? By this I mean, how important should money be? How important should relationships be? And how important should sandwiches be? And how big should the baseline be?
So, those weights are the things that we're really after. So let’s get to calculating.
Finding out the Weights
Let’s use an arbitrary set of scores for happiness. Let's say that we're told Alejandra’s happiness is at 130 (and let's say that most people's happiness is at 80, so Alejandra is doing great), and her money score is 40, her relationships are 80 and her sandwiches are 40. And the baseline is 20.
What weights do we need so that Alejandra's happiness adds up to 130? In this case, that's actually really easy. We could say MoneyWeight = 0.5, RelationshipWeight = 1, SandwichesWeight = 0.25, and BaselineWeight = 1.
Then our formula could say:
Alejandra’s Happiness = (Money * MoneyWeight) + (Relationships * RelationshipWeight) + (Sandwiches * SandwichesWeight) + (Baseline * BaselineWeight)
And when we substitute in Alejandra’s happiness and the corresponding values, we get
130 = (40 * 0.5) + (80 * 1) + (40 * 0.25) + (20 * 1)
130 = 20 + 80 + 10 + 20
And as we see, this adds up!
So, we've got a set of weights [0.5, 1, 0.25, 1.0] for [MoneyWeight, RelationshipWeight, SandwichesWeight, BaselineWeight] that works for Alejandra.
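The same arithmetic can be checked in a few lines of Python. This is just a sketch: the function name `weighted_sum` and the list layout are made up here for illustration.

```python
# Features in the order [Money, Relationships, Sandwiches, Baseline].
alejandra = [40, 80, 40, 20]
# Weights in the order [MoneyWeight, RelationshipWeight, SandwichesWeight, BaselineWeight].
weights = [0.5, 1, 0.25, 1.0]

def weighted_sum(features, weights):
    """Multiply each feature by its weight and add everything up."""
    return sum(f * w for f, w in zip(features, weights))

alejandra_happiness = weighted_sum(alejandra, weights)
print(alejandra_happiness)  # 130.0
```

Keeping the features and the weights in the same order is what lets `zip` pair each feature with the right weight.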
Next, let's say Bob has a happiness of 85. So he's a little over the average. His money = 100, his relationships = 20, his sandwiches = 40, and his baseline = 20.
What would happen if we used our current [0.5, 1, 0.25, 1.0] weights for Bob?
Again we start out with Bob's Happiness = (Money * MoneyWeight) + (Relationships * RelationshipWeight) + (Sandwiches * SandwichesWeight) + (Baseline * BaselineWeight)
85 = (100 * 0.5) + (20 * 1) + (40 * 0.25) + (20 * 1.0)
85 = 50 + 20 + 10 + 20
85 = 100
That's not right! We're going to have to adjust our weights. But when we do that, we need to make sure it works for both Alejandra and Bob.
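Here is the same mismatch as a small Python sketch. The helper `weighted_sum` and the variable names are invented for illustration, and the target of 85 is the value on the left-hand side of the equation above.

```python
# Features in the order [Money, Relationships, Sandwiches, Baseline].
bob = [100, 20, 40, 20]
weights = [0.5, 1, 0.25, 1.0]   # the weights that worked for Alejandra
bob_target = 85                 # the happiness we were told Bob has

def weighted_sum(features, weights):
    """Multiply each feature by its weight and add everything up."""
    return sum(f * w for f, w in zip(features, weights))

bob_prediction = weighted_sum(bob, weights)
bob_error = bob_prediction - bob_target   # how far off we are
print(bob_prediction, bob_error)  # 100.0 15.0
```

That leftover 15 is the "you got it wrong by this much" from the game: it tells us the weights overshoot for Bob and need to come down somewhere.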
So, maybe we make money worth 0.25 instead of 0.5, since Bob has way more money but is way less happy than Alejandra. And maybe we make our baseline higher, around 30.
So let's see if that works for Alejandra:
130 = (40 * 0.25) + (80 * 1) + (40 * 0.25) + (30 * 1)
130 = 10 + 80 + 10 + 30
So far so good for Alejandra. Now for Bob.
85 = (100 * 0.25) + (20 * 1) + (40 * 0.25) + (30 * 1)
85 = 25 + 20 + 10 + 30
Cool, so our updated weights are [0.25, 1, 0.25, 1] for [MoneyWeight, RelationshipWeight, SandwichesWeight, BaselineWeight], with the baseline itself raised from 20 to 30. These weights work for both Bob and Alejandra.
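Checking one set of weights against more than one person is easy to sketch in Python. The dictionary layout and the names `people`, `results` and `weighted_sum` are assumptions made for this example; the numbers are the ones used above, with the baseline at 30.

```python
# Each person: ([Money, Relationships, Sandwiches, Baseline], known happiness).
people = {
    "Alejandra": ([40, 80, 40, 30], 130),
    "Bob": ([100, 20, 40, 30], 85),
}
weights = [0.25, 1, 0.25, 1]   # the updated weights

def weighted_sum(features, weights):
    """Multiply each feature by its weight and add everything up."""
    return sum(f * w for f, w in zip(features, weights))

results = {}
for name, (features, target) in people.items():
    prediction = weighted_sum(features, weights)
    results[name] = prediction
    print(name, prediction, prediction == target)
```

Both lines should report a match, which is what "these weights work for both" means here.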
But now we have Alice to consider! Good grief, does that mean we'll have to shuffle things around to adjust our weights yet again? It does. And imagine doing that for 100 people, for 1,000 people, for a million people!
This is where our computers and our deep learning tricks come in, so that we can do this in a systematic way. In part two, we'll set things up to do things the machine learning way.
PS: Those with experience in machine learning and linear algebra might know that there is something called a bias that can also be added to these calculations. It’s like a feature that doesn’t get multiplied by a weight, but still gets added up. For the sake of simplicity, our bias will be 0 so that we don’t have to worry about it.