How Backpropagation Works


Figure: the example neural network ("NN") used throughout this walkthrough.

In the given example, the activation for the hidden layers as well as for the output layer is taken to be the sigmoid function.
The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.

Forward Pass

We are assuming that there is no bias. The net input for H11 can be calculated as below, followed by squashing with the logistic function to get the output at that node:

$$net_{H11} = \sum_i W_{H11.I_i} \cdot I_i, \qquad out_{H11} = \frac{1}{1 + e^{-net_{H11}}}$$

Now the outputs at the nodes of the first hidden layer serve as inputs for the second hidden layer. We perform the same procedure to find the outputs at the nodes of the second hidden layer, which in turn determine the values at the output nodes.

$$net_{H21} = \sum_j W_{H21.H1j} \cdot out_{H1j}, \qquad out_{H21} = \frac{1}{1 + e^{-net_{H21}}}$$

$$net_{O1} = \sum_j W_{O1.H2j} \cdot out_{H2j}, \qquad out_{O1} = \frac{1}{1 + e^{-net_{O1}}}$$
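
For concreteness, here is a minimal C++ sketch of this forward step (C++ only because the repo linked at the end is in C++; this is not its actual code, and the function names are illustrative), assuming the previous layer's outputs and the node's incoming weights live in `std::vector`s:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic (sigmoid) squashing function used at every node in this example.
double sigmoid(double x) {
    return 1.0 / (1.0 + std::exp(-x));
}

// Forward step for one node: the net input is the weighted sum of the previous
// layer's outputs (no bias, as assumed above), squashed by the sigmoid.
double node_forward(const std::vector<double>& prev_outputs,
                    const std::vector<double>& weights) {
    double net = 0.0;
    for (std::size_t i = 0; i < prev_outputs.size(); ++i) {
        net += weights[i] * prev_outputs[i];
    }
    return sigmoid(net);
}
```

Running this for every node of a layer, and then feeding the resulting outputs into the next layer, reproduces the forward pass described above.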

Calculating the Total Error

We calculate the error for each output neuron using the squared error function and sum them to get the total error:

$$E_{total} = \sum_{k} \tfrac{1}{2} \left( target_{Ok} - out_{Ok} \right)^2$$
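
A small sketch of the same calculation, assuming the targets and the network's outputs are stored in equal-length vectors (names are illustrative):

```cpp
#include <cstddef>
#include <vector>

// Total error: sum of 1/2 * (target - output)^2 over all output neurons.
double total_error(const std::vector<double>& targets,
                   const std::vector<double>& outputs) {
    double error = 0.0;
    for (std::size_t k = 0; k < targets.size(); ++k) {
        const double diff = targets[k] - outputs[k];
        error += 0.5 * diff * diff;
    }
    return error;
}
```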

The Backwards Pass

We use backpropagation to update the weights in order to make the predicted output closer to the desired/target output, thereby minimizing the error for each output neuron and for the network as a whole.

Output Layer

Consider WO1.H21. We want to know how much a change in WO1.H21 affects the total error, aka $\frac{\partial E_{total}}{\partial W_{O1.H21}}$.

By applying the chain rule we know that:

$$\frac{\partial E_{total}}{\partial W_{O1.H21}} = \frac{\partial E_{total}}{\partial out_{O1}} \cdot \frac{\partial out_{O1}}{\partial net_{O1}} \cdot \frac{\partial net_{O1}}{\partial W_{O1.H21}}$$

We first determine how much the total error changes with respect to the output.

$$\frac{\partial E_{total}}{\partial out_{O1}} = -\left( target_{O1} - out_{O1} \right)$$

Then we determine how much the output of O1 changes with respect to its total net input. For that, we calculate the partial derivative of the logistic function, which is the output multiplied by one minus the output.

$$\frac{\partial out_{O1}}{\partial net_{O1}} = out_{O1} \left( 1 - out_{O1} \right)$$

Finally, we determine how much the total net input of O1 changes with respect to WO1.H21.

$$\frac{\partial net_{O1}}{\partial W_{O1.H21}} = out_{H21}$$

Putting it all together:

$$\frac{\partial E_{total}}{\partial W_{O1.H21}} = -\left( target_{O1} - out_{O1} \right) \cdot out_{O1} \left( 1 - out_{O1} \right) \cdot out_{H21}$$
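
As a quick sanity check, the whole product can be computed in a few lines; the sketch below is illustrative (the variables target, out_o1 and out_h21 are assumed to come from the forward pass above):

```cpp
// Gradient of the total error with respect to one output-layer weight,
// e.g. W_{O1.H21}:  dE/dW = -(target - out_O1) * out_O1 * (1 - out_O1) * out_H21
double output_weight_gradient(double target, double out_o1, double out_h21) {
    const double node_delta = -(target - out_o1) * out_o1 * (1.0 - out_o1);
    return node_delta * out_h21;
}
```

The intermediate node_delta term is exactly the node delta introduced next.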

Alternatively, the product of $\frac{\partial E_{total}}{\partial out_{O1}}$ and $\frac{\partial out_{O1}}{\partial net_{O1}}$ can be written as $\frac{\partial E_{total}}{\partial net_{O1}}$, aka $\delta_{O1}$ (the Greek letter delta), aka the node delta. We can use this to rewrite the calculation above:

$$\delta_{O1} = \frac{\partial E_{total}}{\partial out_{O1}} \cdot \frac{\partial out_{O1}}{\partial net_{O1}} = \frac{\partial E_{total}}{\partial net_{O1}}$$

$$\delta_{O1} = -\left( target_{O1} - out_{O1} \right) \cdot out_{O1} \left( 1 - out_{O1} \right)$$

Therefore:

$$\frac{\partial E_{total}}{\partial W_{O1.H21}} = \delta_{O1} \cdot out_{H21}$$

Some sources extract the negative sign from $\delta$, so it would be written as:

$$\frac{\partial E_{total}}{\partial W_{O1.H21}} = -\delta_{O1} \cdot out_{H21}$$

To decrease the error, we then subtract this value from the current weight (optionally multiplied by some learning rate, $\eta$, which we’ll set to 0.2):

$$W_{O1.H21}^{+} = W_{O1.H21} - \eta \cdot \frac{\partial E_{total}}{\partial W_{O1.H21}}$$
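
In code the update is a one-liner; the sketch below assumes the gradient has already been computed as above and uses the same learning rate of 0.2:

```cpp
// Gradient-descent step for a single weight with learning rate eta.
double update_weight(double weight, double gradient, double eta = 0.2) {
    return weight - eta * gradient;
}
```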

We repeat this update similarly for the remaining weights feeding into the output layer.

Hidden Layer

We want to know how much a change in WH21.H11 affects the total error. We use the same process, with one slight difference: here we have to consider the effect on every output neuron, because the output of each hidden-layer neuron contributes to the output (and therefore the error) of all of them.

$$\frac{\partial E_{total}}{\partial W_{H21.H11}} = \frac{\partial E_{total}}{\partial out_{H21}} \cdot \frac{\partial out_{H21}}{\partial net_{H21}} \cdot \frac{\partial net_{H21}}{\partial W_{H21.H11}}$$

Starting with the first factor, the total error splits over the output neurons:

$$\frac{\partial E_{total}}{\partial out_{H21}} = \sum_{k} \frac{\partial E_{Ok}}{\partial out_{H21}}$$

Applying the chain rule to each term:

$$\frac{\partial E_{Ok}}{\partial out_{H21}} = \frac{\partial E_{Ok}}{\partial net_{Ok}} \cdot \frac{\partial net_{Ok}}{\partial out_{H21}} = \delta_{Ok} \cdot \frac{\partial net_{Ok}}{\partial out_{H21}}$$

And $\frac{\partial net_{Ok}}{\partial out_{H21}}$ is equal to $W_{Ok.H21}$, so:

$$\frac{\partial E_{total}}{\partial out_{H21}} = \sum_{k} \delta_{Ok} \cdot W_{Ok.H21}$$

Now that we have $\frac{\partial E_{total}}{\partial out_{H21}}$, we need to figure out $\frac{\partial out_{H21}}{\partial net_{H21}}$ and then $\frac{\partial net_{H21}}{\partial W}$ for each weight:

$$\frac{\partial out_{H21}}{\partial net_{H21}} = out_{H21} \left( 1 - out_{H21} \right)$$

We calculate the partial derivative of the total net input to H21 with respect to WH21.H11 the same way as we did for the output neuron:

$$\frac{\partial net_{H21}}{\partial W_{H21.H11}} = out_{H11}$$

Putting it all together:

$$\frac{\partial E_{total}}{\partial W_{H21.H11}} = \left( \sum_{k} \delta_{Ok} \cdot W_{Ok.H21} \right) \cdot out_{H21} \left( 1 - out_{H21} \right) \cdot out_{H11} = \delta_{H21} \cdot out_{H11}$$

We can now update WH21.H11:

$$W_{H21.H11}^{+} = W_{H21.H11} - \eta \cdot \frac{\partial E_{total}}{\partial W_{H21.H11}}$$
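
A hedged C++ sketch of this hidden-layer step (again illustrative, not the repo's code): the hidden neuron's delta is the weighted sum of the downstream deltas times the local sigmoid derivative, and the gradient for WH21.H11 is that delta times out_H11:

```cpp
#include <cstddef>
#include <vector>

// Node delta for a hidden neuron: sum over downstream nodes of
// (downstream delta * connecting weight), times out * (1 - out).
double hidden_delta(const std::vector<double>& downstream_deltas,
                    const std::vector<double>& downstream_weights,
                    double out_hidden) {
    double weighted_sum = 0.0;
    for (std::size_t k = 0; k < downstream_deltas.size(); ++k) {
        weighted_sum += downstream_deltas[k] * downstream_weights[k];
    }
    return weighted_sum * out_hidden * (1.0 - out_hidden);
}

// The gradient for W_{H21.H11} is then hidden_delta(...) * out_H11, and the
// weight is updated with the same gradient-descent step used above.
```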

Similarly, all the remaining weights can be found and updated using the same procedure as above.

After updating the weights we can see that the loss has decreased: originally it was 0.0317292, and it drops to 0.0314531 after one pass of backpropagation.

Code:

Source Code (GitHub repo): Neural Network from scratch in C++

References:

A Step by Step Backpropagation Example