
Machines Learning Matt

*Folksy voice

Sometimes I can’t tell if I’m teaching the machines or the machines are teaching me.

Handwriting a neural net, part 9 - Handwriting Be Darned

Image by AxxLC from Pixabay

Okay, we’ve done our forward pass and our backward pass. We’ve done a full cycle of machine learning via neural net. It’s time to let the computers take over.

If you were to do the whole neural net by hand, you would have to repeat all the calculations we’ve done more than a thousand times before arriving at anything like a final answer.

Imagine doing all that with pen and paper or a calculator, working it out a thousand times. No thanks. And this is a dirt-simple neural net.

Neural nets have been around since the 1940s, but it’s only in the last few decades that they’ve become generally useful, and we’re still finding the best ways to apply them. What they really needed to take off was high-powered computers that can handle doing massive calculations thousands, millions, even billions of times.

So I’ll unveil the computer code that runs the neural net we’ve been building throughout this series of blog posts. Kudos to the people at Towards Data Science who shared the simple neural net that much of this course was modeled on.

And without further ado, here is the Python code that you can paste into a service like Google Colab and run right away.

import numpy as np

def sigmoid(x):
    return 1.0/(1+ np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1.0 - sigmoid(x))

class NeuralNetwork:
    def __init__(self, x, y):
        self.input    = x                                          # the feature scores for each person
        self.weights1 = np.array([[0.25], [1.0], [0.25], [1.0]])   # starting weights for layer 1
        self.weights2 = np.array([[0.5]])                          # starting weight for layer 2
        self.y        = y                                          # the ground truth happiness scores
        self.output   = np.zeros(self.y.shape)                     # the predictions, filled in by feedforward

    def feedforward(self):
        # the forward pass: a dot product and a sigmoid for each layer
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # application of the chain rule to find derivative of the loss function with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * (sigmoid_derivative(self.output))))
        d_weights1 = np.dot(self.input.T,  (np.dot(2*(self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

        # update the weights with the derivative (slope) of the loss function
        self.weights1 += d_weights1
        self.weights2 += d_weights2
        
# our features ("X =") and our ground truth happiness scores ("y =")
X = np.array([[.4,.8,.4,.3],
              [1,.2,.4,.3],
              [.3,.4,.5,.4]])
y = np.array([[1.0],[0.85],[0.4]])
nn = NeuralNetwork(X,y)

# train: run the forward pass and the backward pass 1500 times
for i in range(1500):
    nn.feedforward()
    nn.backprop()

print(nn.output)


You might not be familiar with Python, and that’s okay. Maybe in a future blog post we can tackle unpacking this code line by line, but if you look around, you’re sure to recognize a lot. You can see dot from dot product, you can see that the “X =” toward the bottom is our list of features, and you can see “y =” as our ground truth, the three amounts of happiness for Alejandra, Bob and Alice.
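To connect the code back to the math we did by hand, here is a minimal side sketch (my own addition, not part of the listing above) that runs a single forward pass for just Alejandra’s row of features, using the same starting weights:

import numpy as np

def sigmoid(x):
    return 1.0/(1 + np.exp(-x))

# Alejandra's four feature scores and the starting weights from the code above
alejandra = np.array([.4, .8, .4, .3])
weights1 = np.array([[0.25], [1.0], [0.25], [1.0]])
weights2 = np.array([[0.5]])

# hidden layer: a dot product, then the sigmoid squashes it between 0 and 1
layer1 = sigmoid(np.dot(alejandra, weights1))    # roughly 0.79

# output layer: another dot product and sigmoid, the untrained happiness guess
output = sigmoid(np.dot(layer1, weights2))       # roughly 0.60 before any training

print(layer1, output)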

And at the bottom you can see we run this forwards and backwards 1500 times! And then we print the output. That output is supposed to get as close as it can to the values of “y =”.

So how does it do?

Instead of 1500, I’ll try it for just 100 and report back.

For 100 steps of training, we get

[[0.77633351],
 [0.76799445],
 [0.74576682]]

Hmmm. That’s not close at all to

[[1.0],
 [0.85],
 [0.4]]

Let’s train it for another 300 steps.

Now it’s

[[0.86459987],
 [0.82810535],
 [0.62330616]]

Interesting. Now we see that they’re starting to separate. The first two are close to the top, and the bottom one is getting smaller. So, yeah, closer. How about we train for 500 more steps!

[[0.93005164],
 [0.83085754],
 [0.50152117]]

Wow, now it’s getting really close. The top one is 0.07 away from the correct amount, the next one is only 0.02 or so away, and the bottom one is around 0.1 away. Much better.

What if I did it 1,000 more steps?!

[[0.93114243],
 [0.83099963],
 [0.5       ]]

Huh, now it’s barely changing at all, even though that was our biggest round of training yet.

At this point our neural net has basically done the best that it can do.
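If you’d like to reproduce that experiment yourself, here is one small sketch (not part of the original code, but reusing the NeuralNetwork class, X, and y from above) that trains in the same stages and prints the output at each checkpoint:

# train in stages: 100 steps, then 300 more, then 500 more, then 1,000 more
nn = NeuralNetwork(X, y)
total_steps = 0

for extra_steps in [100, 300, 500, 1000]:
    for i in range(extra_steps):
        nn.feedforward()
        nn.backprop()
    total_steps += extra_steps
    print(total_steps, "steps of training:")
    print(nn.output)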

How do we use this?

Now we have a fully trained neural net. It has its weights stored and ready to go. So how do we use this?

Well, let’s look at our happiness problem again. Let’s say that it takes millions of dollars and years of observation to get someone’s level of happiness. Those levels we started with, 1.0 for Alejandra, 0.85 for Bob and 0.4 for Alice, are super valuable in this situation.

But what if the data points we got, the relationship score, the money score, the sandwiches score and the baseline, were, for some reason, very easy to get? Say it takes a few dollars and a few minutes.

Then we could do millions of dollars’ worth of work, and years of it, in less than a second!

So for instance, we have the original data for Alejandra, Bob and Alice. But what if James comes along, and his scores are 0.2, 1.0, 0.4 and 0.3? So James has low money, but high relationship, medium sandwich and low baseline scores. Well, we could make a new “X =” just for him and run only the forward pass. We don’t have to do the backward pass because we’re done training.

If we keep the current value of our weights and run the forward pass, we find out in an instant that James’ happiness score is about 0.95. That’s almost as happy as Alejandra. And we found it out in a second.
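Here is roughly what that would look like in code, a small sketch that assumes the trained nn object from above and assumes James’s four scores go in the same column order as the rows of X:

# James's four scores, in the same column order as X (an assumption on my part)
james = np.array([[0.2, 1.0, 0.4, 0.3]])

# forward pass only, reusing the trained weights; no backprop, since training is done
james_layer1 = sigmoid(np.dot(james, nn.weights1))
james_happiness = sigmoid(np.dot(james_layer1, nn.weights2))

print(james_happiness)   # should land near the 0.95 mentioned above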

(Here it kind of makes sense that he would be happy if he had great relationships. But you have to be careful when applying a model trained on very little data. It can do what’s called overfitting. That means it gets so used to the data it trained on that it can’t generalize well to other things. It’s like training a neural net only on pictures of robins and then asking if an emu is a bird. It may not be able to answer accurately, because it has never seen anything so different.)

This doesn’t just apply to our made-up happiness neural net. Neural nets can help figure out things like what customers might want to buy, which direction a car should go given a lot of environmental situations (so that you have self-driving cars), whether a stock will go up or down, whether a photograph shows that a patient has cancer, and on and on. There are many valuable situations like these, and neural nets have shown awesome results.

Where do we go from here?

What if we wanted to tweak our neural net? Make it a lot better? There are a lot of ways you can experiment with and tweak a neural net. You could add more layers. You could start out with more weights for the individual people. You could get more data to make sure your data is accurate. You could look through your data to make sure you don’t have any weird, one-off values that are throwing off everything else. You could pick different categories to observe. Like, really, who is going to base their entire happiness as a person around sandwiches? Not many people, I would imagine, so maybe you remove that whole column of data and don’t worry about sandwiches.
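For example, dropping the sandwich scores could look something like this. This is a hypothetical sketch, and it assumes the sandwich score is the third column of X; the starting weights also get trimmed from four down to three:

# drop the sandwich column (assumed here to be column 2, counting from 0)
X_no_sandwiches = np.delete(X, 2, axis=1)   # now three features per person

nn2 = NeuralNetwork(X_no_sandwiches, y)
nn2.weights1 = np.array([[0.25], [1.0], [1.0]])   # three starting weights instead of four

for i in range(1500):
    nn2.feedforward()
    nn2.backprop()

print(nn2.output)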

Here’s the thing. You’ve seen how much math goes into a neural net. That’s only part of the problem if you want something that will do good work. If you feed it bad data, you get bad data back. If you set bad targets, you get bad results. Sometimes, to get on the right track, you need good intuition. In our case, knowing that your basic needs are met and caring about others are pretty obviously important parts of happiness. That’s something you might’ve had a sense for already. Without getting too deep into how we define a “computer,” a “brain,” and the “mind,” it’s safe to say that there is a lot more to our own brains than there is to neural networks at present. Machine learning is as much art as it is math.

Matthew Waller