Machines Learning Matt

*Folksy voice

Sometimes I can’t tell if I’m teaching the machines or the machines are teaching me.

Handwriting a neural net, part 2 - Entering the Machine Learning Matrix

Last time we were trying to figure out how important each of our "features" or categories (money, relationships, sandwiches, baseline) is to happiness. We called that importance number a weight.

And our data is coming from Alejandra, Bob, and Alice.

Let’s make an arbitrary happiness score that could be any number, although most people land somewhere in the 20 to 150 range.

For Alejandra, happiness is at 130, and her features are money = 40, relationships = 80, sandwiches = 40, and baseline = 30.

Let's put this in a group, or a bucket or whatever you want to call it.

Alejandra features = [40, 80, 40, 30]. Computer programmers generally call this group of items in square brackets an array, especially when the order matters. And in our case it does. So again, an array is basically a group with an order. When we take Alejandra’s features, multiply each one by its weight, and add up the results, her happiness comes to 130.

Now let's do the same for Bob. Happiness = 85. His features are money = 100, relationships = 20, sandwiches = 40, baseline = 30.

So Bob features = [100, 20, 40, 30]

Now let's put them together in a group of all our data

Bob, Alejandra features = [[40, 80, 40, 30], [100, 20, 40, 30]]. So what we have is a group of groups, or an array with two arrays.

This can be a little hard to read, so let's adjust:

Bob, Alejandra features = [

[40, 80, 40, 30],

[100, 20, 40, 30]

]

There we go. And now let's get the weights, or importance values, that we decided on last time:

Weights = [0.25, 1, 0.25, 1]

So, how do we calculate happiness again? We multiply each feature by its weight and add them all up!
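Here's a rough sketch of that multiply-and-add step in plain Python, using Alejandra's numbers from above. The variable names are just ones I made up for illustration.

```python
# Alejandra's features and the weights, in the same order:
# money, relationships, sandwiches, baseline.
alejandra_features = [40, 80, 40, 30]
weights = [0.25, 1, 0.25, 1]

# Pair each weight with its feature, multiply the pair, then add the results.
happiness = sum(w * f for w, f in zip(weights, alejandra_features))

print(happiness)  # 130.0
```

The `zip` call is what keeps the order lined up, which is why it matters that our arrays are ordered.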

Enter the matrix

We can think of our features and our weights as two matrices. And mathematically, we can combine them with matrix multiplication.

So, if we have a matrix that has two columns and two rows:

[1, 2

3, 4]

We can multiply that by a matrix with 2 rows and 3 columns

[5, 6, 7

8, 9, 10]

When we talk about the rows and columns of a matrix, we can talk about the “shape” of a matrix. So the first one, because it has two rows and two columns, we can call it a 2x2 matrix, with a shape of (2,2). And the second matrix has 2 rows and 3 columns, so we can call it a matrix of 2x3, or a shape of (2,3).

To multiply them the way we want, we take the first row of the first matrix, multiply each member by the corresponding member in the first column of the second matrix, and add the results together. Then we take the first row of the first matrix again, multiply it by the second column of the second matrix the same way, and add those up. Then we do the same for the first row of the first matrix and the third column of the second matrix.

Then we do the row and column multiplication and addition again, but using the second row of the first matrix.

Remember: it's always rows times columns, all added up.

That’s a lot to process in writing, so let’s work out our example:

[

[1, 2],

[3, 4]

]

matrix multiplied by

[

[5, 6, 7],

[8, 9, 10]

]

equals

[

[(1 * 5) + (2 * 8), (1 * 6) + (2 * 9), (1 * 7) + (2 * 10)],

[(3 * 5) + (4 * 8), (3 * 6) + (4 * 9), (3 * 7) + (4 * 10)]

]

Which equals this:

[

[21, 24, 27],

[47, 54, 61]

]

And that’s our new matrix!
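If you'd like to see the rows-times-columns recipe as code, here is a minimal sketch in plain Python. The `matmul` function is a hypothetical helper written for this post, not something from a library, and it assumes both matrices are lists of rows.

```python
def matmul(a, b):
    """Multiply matrix a by matrix b, where each matrix is a list of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        # Entry (i, j): row i of a times column j of b, all added up.
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

first = [[1, 2], [3, 4]]
second = [[5, 6, 7], [8, 9, 10]]

product = matmul(first, second)
print(product)  # [[21, 24, 27], [47, 54, 61]]
```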

There is a different kind of multiplication called scalar multiplication. If I multiplied the [[1,2],[3,4]] matrix by 2, I would just multiply each value by two, so that it becomes [[2,4],[6,8]].
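For comparison, scalar multiplication might look like this in plain Python (again, the names are only for illustration):

```python
matrix = [[1, 2], [3, 4]]

# Scalar multiplication: every value gets multiplied by the same number.
doubled = [[2 * value for value in row] for row in matrix]

print(doubled)  # [[2, 4], [6, 8]]
```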

But we're not interested in scalar multiplication for now. We want to do the first kind of multiplication, matrix multiplication.

The type of matrix multiplication we did is often called a dot product, since each entry in the result is one row "dotted" with one column. And the `dot` operation shows up a lot in deep learning code.

Fun things to know about matrix multiplication. Notice that in the dot product above we had a matrix with 2 rows and 2 columns. It's a 2x2 matrix. The other matrix had 2 rows and 3 columns. It's a 2x3. So we did a 2x2 dot 2x3. If the inner numbers are the same (meaning the number of columns in the first matrix is the same as the number of rows in the second matrix), then it's possible to perform matrix multiplication. Ours are both equal to 2, so we're good. Otherwise the rows and columns won't pair up, and the multiplication can't be done.

Another fun thing. In a 2x2 dot 2x3, the outer numbers will be the "shape" of our matrix (again, the shape is the number of rows and columns, in this case). So our resulting matrix should've had 2 rows and 3 columns. And it did!
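The inner-numbers and outer-numbers rules can be sketched as a small check. Both `shape` and `product_shape` are hypothetical helpers made up for this post, and they assume each matrix is a non-empty list of equal-length rows.

```python
def shape(matrix):
    """Return (rows, columns) for a matrix stored as a list of rows."""
    return (len(matrix), len(matrix[0]))

def product_shape(a, b):
    """Shape of a dot b, or None when the inner numbers don't match."""
    a_rows, a_cols = shape(a)
    b_rows, b_cols = shape(b)
    if a_cols != b_rows:
        return None  # inner numbers differ, so we can't multiply
    return (a_rows, b_cols)  # the outer numbers give the new shape

first = [[1, 2], [3, 4]]            # 2x2
second = [[5, 6, 7], [8, 9, 10]]    # 2x3

print(product_shape(first, second))  # (2, 3)
print(product_shape(second, first))  # None, because 2x3 dot 2x2 has inner numbers 3 and 2
```

Notice that swapping the order breaks things here, so the order of the two matrices matters.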

Matrix multiplication and happiness

Calculating it all out.

Back to our happiness calculations. We've already given a run through of what our happiness calculation looked like in the previous blogpost. Let's run through it again with matrix multiplication.

So we have our Bob and Alejandra features:

Bob, Alejandra features = [

[40, 80, 40, 30],

[100, 20, 40, 30]

]

You can think of these numbers in the shape of a spreadsheet. If there were column titles at the top, they might look like this:

[    Money   Relationships Sandwiches  Baseline

Alejandra [40,        80,           40,         30],

Bob   [100,       20,           40,         30]

]

Again, those are our features.

Now for our weights:

[0.25, 1, 0.25, 1]

As a spreadsheet these might look like:

Money   Relationships Sandwiches  Baseline

[Weights   [0.25,          1,       0.25,           1]   ]

So our first matrix has a shape of 2x4 and our second matrix has a shape of 1x4. Notice that we cannot do weights dot features, because that would mean 1x4 dot 2x4. We need the 4s to be on the inside. How could we switch it so that our features are not a 2x4 but a 4x2?

Transpose help

Fortunately, there is a thing called a transpose.

So if we take

[1,2

3,4]

The transpose of that matrix is

[1,3

2, 4]

It flips the rows and columns: the first row becomes the first column, the second row becomes the second column, and so on. This is important because it means that when we take the transpose, the numbers for our different categories stay in alignment.
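In plain Python, one common way to sketch a transpose is with `zip`, which groups the first items of every row together, then the second items, and so on. The `transpose` name here is my own helper, not a built-in.

```python
def transpose(matrix):
    """Turn each column of the matrix into a row."""
    return [list(column) for column in zip(*matrix)]

print(transpose([[1, 2], [3, 4]]))  # [[1, 3], [2, 4]]
```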

So let's do it. Let's take the transpose of our features.

[

[40, 80, 40, 30],

[100, 20, 40, 30]

]

Becomes

[

[40, 100],

[80, 20],

[40, 40],

[30, 30]

]

So our spreadsheet version looks like this

[                  Alejandra   Bob

Money:         [40,    100]

Relationships: [80,    20]

Sandwiches:    [40,    40]

Baseline bias: [30,    30]

]

So everything matches up in the transpose, and now we can calculate our happiness matrix

[

[0.25, 1, 0.25, 1]

]

dot

[

[40, 100],

[80, 20],

[40, 40],

[30, 30]

]

equals

[

[(0.25 * 40) + (1 * 80) + (0.25 * 40) + (1 * 30),

(0.25 * 100) + (1 * 20) + (0.25 * 40) + (1 * 30)]

]

which equals

[

[130, 85]

]

Once again, our 1x4 dot 4x2 became a 1x2 (one row, two columns), because those are the numbers on the outside of our dot product, 1 and 2. It worked!
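Putting the whole thing together, here is a minimal sketch of the happiness calculation in plain Python. The `transpose` and `matmul` helpers are hypothetical ones written for this post. Real deep learning code would more likely lean on a library's `dot` operation, but this shows the same arithmetic.

```python
def transpose(matrix):
    """Turn each column of the matrix into a row."""
    return [list(column) for column in zip(*matrix)]

def matmul(a, b):
    """Rows of a times columns of b, all added up."""
    return [
        [sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
        for row in a
    ]

# Rows: Alejandra, Bob. Columns: money, relationships, sandwiches, baseline.
features = [
    [40, 80, 40, 30],
    [100, 20, 40, 30],
]
weights = [[0.25, 1, 0.25, 1]]  # a 1x4 matrix

# 1x4 dot 4x2 gives a 1x2: one happiness score per person.
happiness = matmul(weights, transpose(features))
print(happiness)  # [[130.0, 85.0]]
```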

The spreadsheet version of the dot product is:

[                    Alejandra   Bob

Happiness Level: [130,    85]

]

And those are the same results we got before!

Matthew Waller