Figure: the example network, with two hidden layers and an output layer, all using sigmoid activations.
In this example, the activation function for the hidden layers as well as the output layer is the sigmoid (logistic) function.
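For reference, the sigmoid squashes any real-valued net input into the range (0, 1):
\[
\sigma(x) = \frac{1}{1 + e^{-x}}
\]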
The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.
Forward Pass
We are assuming that there is no bias. The net input for H11 is calculated as below and then squashed using the logistic function to get the output at that node:
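For example, assuming two network inputs named I1 and I2 (the input names and count are not stated in the text, so they are illustrative):
\[
net_{H11} = W_{H11.I1} \cdot I1 + W_{H11.I2} \cdot I2, \qquad out_{H11} = \frac{1}{1 + e^{-net_{H11}}}
\]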
The outputs at the nodes of the first hidden layer then serve as inputs to the second hidden layer. We perform the same procedure to find the outputs at the nodes of the second hidden layer, which in turn determine the value at the output node.
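Assuming the second hidden layer has nodes H21 and H22 (the node names follow the figure's labelling pattern; the actual layer widths may differ), the same two steps give:
\[
net_{H21} = W_{H21.H11} \cdot out_{H11} + W_{H21.H12} \cdot out_{H12}, \qquad out_{H21} = \frac{1}{1 + e^{-net_{H21}}}
\]
\[
net_{O1} = W_{O1.H21} \cdot out_{H21} + W_{O1.H22} \cdot out_{H22}, \qquad out_{O1} = \frac{1}{1 + e^{-net_{O1}}}
\]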
Calculating the Total Error
We calculate the error for each output neuron using the squared error function and sum them to get the total error:
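Writing target_o for the desired value of output neuron o (the 1/2 factor is the usual convention, included so that it cancels when we differentiate):
\[
E_{total} = \sum_{o} \frac{1}{2} \left( target_{o} - out_{o} \right)^{2}
\]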
The Backwards Pass
We use backpropagation to update the weights so that the predicted output moves closer to the desired/target output, thereby minimizing the error for each output neuron and for the network as a whole.
Output Layer
Consider WO1.H21. We want to know how much a change in WO1.H21 affects the total error, aka \partial E_{total} / \partial W_{O1.H21}.
By applying the chain rule we know that:
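\[
\frac{\partial E_{total}}{\partial W_{O1.H21}} = \frac{\partial E_{total}}{\partial out_{O1}} \cdot \frac{\partial out_{O1}}{\partial net_{O1}} \cdot \frac{\partial net_{O1}}{\partial W_{O1.H21}}
\]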
We first determine how much the total error changes with respect to the output of O1.
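Writing target_{O1} for the desired/target value of O1:
\[
\frac{\partial E_{total}}{\partial out_{O1}} = -\left( target_{O1} - out_{O1} \right)
\]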
Next, we determine how much the output of O1 changes with respect to its total net input. This is the partial derivative of the logistic function, which is the output multiplied by one minus the output:
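\[
\frac{\partial out_{O1}}{\partial net_{O1}} = out_{O1} \cdot \left( 1 - out_{O1} \right)
\]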
Finally, we determine how much the total net input of O1 changes with respect to WO1.H21, which is simply the output of H21:
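\[
\frac{\partial net_{O1}}{\partial W_{O1.H21}} = out_{H21}
\]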
Putting it all together:
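\[
\frac{\partial E_{total}}{\partial W_{O1.H21}} = -\left( target_{O1} - out_{O1} \right) \cdot out_{O1} \left( 1 - out_{O1} \right) \cdot out_{H21}
\]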
Alternatively, we have \partial E_{total} / \partial out_{O1} and \partial out_{O1} / \partial net_{O1}, whose product can be written as \partial E_{total} / \partial net_{O1}, aka \delta_{O1} (the Greek letter delta), aka the node delta. We can use this to rewrite the calculation above:
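\[
\delta_{O1} = \frac{\partial E_{total}}{\partial out_{O1}} \cdot \frac{\partial out_{O1}}{\partial net_{O1}} = \frac{\partial E_{total}}{\partial net_{O1}} = -\left( target_{O1} - out_{O1} \right) \cdot out_{O1} \left( 1 - out_{O1} \right)
\]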
Therefore:
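\[
\frac{\partial E_{total}}{\partial W_{O1.H21}} = \delta_{O1} \cdot out_{H21}
\]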
Some sources extract the negative sign from \delta so it would be written as:
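\[
\frac{\partial E_{total}}{\partial W_{O1.H21}} = -\delta_{O1} \cdot out_{H21}
\]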
To decrease the error, we then subtract this value from the current weight (optionally multiplied by some learning rate, eta, which we’ll set to 0.2):
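\[
W_{O1.H21}^{new} = W_{O1.H21} - \eta \cdot \frac{\partial E_{total}}{\partial W_{O1.H21}}, \qquad \eta = 0.2
\]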
The remaining output-layer weights are updated in exactly the same way.
Hidden Layer
We want to know how much a change in WH21.H11 affects the total error. We use the same process as for the output layer, with one slight difference: because the output of each hidden-layer neuron contributes to the output (and therefore to the error) of every output neuron, we have to account for all of those contributions. Applying the chain rule again:
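\[
\frac{\partial E_{total}}{\partial W_{H21.H11}} = \frac{\partial E_{total}}{\partial out_{H21}} \cdot \frac{\partial out_{H21}}{\partial net_{H21}} \cdot \frac{\partial net_{H21}}{\partial W_{H21.H11}}
\]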
And \partial E_{total} / \partial out_{H21} is equal to the sum of its effect on each output neuron:
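Here W_{o.H21} denotes the weight from H21 to output neuron o, following the naming convention used above:
\[
\frac{\partial E_{total}}{\partial out_{H21}} = \sum_{o} \frac{\partial E_{o}}{\partial out_{H21}} = \sum_{o} \frac{\partial E_{o}}{\partial net_{o}} \cdot \frac{\partial net_{o}}{\partial out_{H21}} = \sum_{o} \delta_{o} \cdot W_{o.H21}
\]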
Now that we have \partial E_{total} / \partial out_{H21}, we need to figure out \partial out_{H21} / \partial net_{H21} and then \partial net_{H21} / \partial W for each weight. The first factor is again the derivative of the logistic function:
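\[
\frac{\partial out_{H21}}{\partial net_{H21}} = out_{H21} \cdot \left( 1 - out_{H21} \right)
\]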
We calculate the partial derivative of the total net input to H21 with respect to WH21.H11 the same way as we did for the output neuron:
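\[
\frac{\partial net_{H21}}{\partial W_{H21.H11}} = out_{H11}
\]
Putting it all together, as before:
\[
\frac{\partial E_{total}}{\partial W_{H21.H11}} = \left( \sum_{o} \delta_{o} \cdot W_{o.H21} \right) \cdot out_{H21} \left( 1 - out_{H21} \right) \cdot out_{H11}, \qquad W_{H21.H11}^{new} = W_{H21.H11} - \eta \cdot \frac{\partial E_{total}}{\partial W_{H21.H11}}
\]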
All the remaining weights can be found in the same way as shown above.
After updating the weights we can see that the loss has been reduced: originally it was 0.0317292, and it drops to 0.0314531 after one pass of backpropagation.
Code:
Source Code (GitHub repo)
Neural Network from scratch in C++
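The linked repository contains the full implementation. As a rough, self-contained sketch of the procedure above (the layer sizes, random initialization, input values, target, and epoch count below are illustrative assumptions, not taken from the repository; only the sigmoid activation, the no-bias setup, the squared-error loss, and eta = 0.2 come from the text):

// backprop_sketch.cpp -- minimal sketch of the network described above:
// sigmoid activations, no biases, squared error, learning rate eta = 0.2.
#include <cmath>
#include <cstdio>
#include <vector>
#include <random>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;   // w[j][i]: weight from input i to node j

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

struct Layer {
    Mat w;       // w[j][i] connects input i of this layer to node j
    Vec out;     // out[j] = sigmoid(net[j])
    Vec delta;   // delta[j] = dE/dnet[j] (the node delta)

    Layer(int inputs, int nodes, std::mt19937& rng) {
        std::uniform_real_distribution<double> dist(-0.5, 0.5);
        w.assign(nodes, Vec(inputs));
        for (auto& row : w) for (auto& v : row) v = dist(rng);
        out.assign(nodes, 0.0);
        delta.assign(nodes, 0.0);
    }

    // Forward pass: net = w * in, out = sigmoid(net)  (no bias term)
    const Vec& forward(const Vec& in) {
        for (size_t j = 0; j < w.size(); ++j) {
            double net = 0.0;
            for (size_t i = 0; i < in.size(); ++i) net += w[j][i] * in[i];
            out[j] = sigmoid(net);
        }
        return out;
    }
};

int main() {
    std::mt19937 rng(42);

    // Two hidden layers of two nodes each and one output node (assumed sizes).
    std::vector<Layer> layers;
    layers.emplace_back(2, 2, rng);   // hidden layer 1 (H11, H12)
    layers.emplace_back(2, 2, rng);   // hidden layer 2 (H21, H22)
    layers.emplace_back(2, 1, rng);   // output layer  (O1)

    const Vec input  = {0.05, 0.10};  // illustrative values
    const Vec target = {0.90};        // illustrative target
    const double eta = 0.2;           // learning rate from the text

    for (int epoch = 0; epoch < 3; ++epoch) {
        // ---- forward pass ----
        Vec act = input;
        std::vector<Vec> layer_inputs;          // what each layer saw as input
        for (auto& L : layers) {
            layer_inputs.push_back(act);
            act = L.forward(act);
        }

        // ---- total error: E = sum over outputs of 1/2 (target - out)^2 ----
        double E = 0.0;
        for (size_t o = 0; o < act.size(); ++o)
            E += 0.5 * (target[o] - act[o]) * (target[o] - act[o]);
        std::printf("epoch %d  error %.7f\n", epoch, E);

        // ---- backward pass: node deltas ----
        Layer& outL = layers.back();
        for (size_t o = 0; o < outL.out.size(); ++o)
            outL.delta[o] = -(target[o] - outL.out[o])           // dE/dout
                          * outL.out[o] * (1.0 - outL.out[o]);   // dout/dnet
        for (int l = (int)layers.size() - 2; l >= 0; --l) {
            Layer& L = layers[l];
            Layer& next = layers[l + 1];
            for (size_t j = 0; j < L.out.size(); ++j) {
                double sum = 0.0;                 // sum over the next layer's nodes
                for (size_t k = 0; k < next.out.size(); ++k)
                    sum += next.delta[k] * next.w[k][j];
                L.delta[j] = sum * L.out[j] * (1.0 - L.out[j]);
            }
        }

        // ---- weight update: w[j][i] -= eta * delta[j] * input[i] ----
        for (size_t l = 0; l < layers.size(); ++l)
            for (size_t j = 0; j < layers[l].w.size(); ++j)
                for (size_t i = 0; i < layers[l].w[j].size(); ++i)
                    layers[l].w[j][i] -= eta * layers[l].delta[j] * layer_inputs[l][i];
    }
    return 0;
}

Compiling with, for example, g++ -std=c++17 backprop_sketch.cpp and running it prints the squared error after each update, which should decrease from one epoch to the next, just as described above.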