Figure: the example network, with two hidden layers and an output layer, all using sigmoid activations.
In this example, the activation function for the hidden layers as well as the output layer is the sigmoid (logistic) function.
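For reference, the sigmoid squashes any real-valued net input into the range (0, 1):
\[
\sigma(x) = \frac{1}{1 + e^{-x}}
\]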
The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.
Forward Pass
We are assuming that there is no bias. The net input for H11 is calculated as below and then squashed using the logistic function to get the output at that node:
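For example, assuming two network inputs named I1 and I2 (the input names and count are not stated in the text, so they are illustrative):
\[
net_{H11} = W_{H11.I1} \cdot I1 + W_{H11.I2} \cdot I2, \qquad out_{H11} = \frac{1}{1 + e^{-net_{H11}}}
\]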
The outputs at the nodes of the first hidden layer then serve as inputs to the second hidden layer. We perform the same procedure to find the outputs at the nodes of the second hidden layer, which in turn determine the value at the output node.
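Assuming the second hidden layer has nodes H21 and H22 (the node names follow the figure's labelling pattern; the actual layer widths may differ), the same two steps give:
\[
net_{H21} = W_{H21.H11} \cdot out_{H11} + W_{H21.H12} \cdot out_{H12}, \qquad out_{H21} = \frac{1}{1 + e^{-net_{H21}}}
\]
\[
net_{O1} = W_{O1.H21} \cdot out_{H21} + W_{O1.H22} \cdot out_{H22}, \qquad out_{O1} = \frac{1}{1 + e^{-net_{O1}}}
\]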
Calculating the Total Error
We calculate the error for each output neuron using the squared error function and sum them to get the total error:
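Writing target_o for the desired value of output neuron o (the 1/2 factor is the usual convention, included so that it cancels when we differentiate):
\[
E_{total} = \sum_{o} \frac{1}{2} \left( target_{o} - out_{o} \right)^{2}
\]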
The Backwards Pass
We use backpropagation to update the weights so that the predicted output moves closer to the desired/target output, thereby minimizing the error for each output neuron and for the network as a whole.
Output Layer
Consider WO1.H21. We want to know how much a change in WO1.H21 affects the total error, aka \partial E_{total} / \partial W_{O1.H21}.
By applying the chain rule we know that:
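\[
\frac{\partial E_{total}}{\partial W_{O1.H21}} = \frac{\partial E_{total}}{\partial out_{O1}} \cdot \frac{\partial out_{O1}}{\partial net_{O1}} \cdot \frac{\partial net_{O1}}{\partial W_{O1.H21}}
\]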
We first determine how much the total error changes with respect to the output of O1.
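Writing target_{O1} for the desired/target value of O1:
\[
\frac{\partial E_{total}}{\partial out_{O1}} = -\left( target_{O1} - out_{O1} \right)
\]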
Next, we determine how much the output of O1 changes with respect to its total net input. This is the partial derivative of the logistic function, which is the output multiplied by one minus the output:
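\[
\frac{\partial out_{O1}}{\partial net_{O1}} = out_{O1} \cdot \left( 1 - out_{O1} \right)
\]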
Finally, we determine how much the total net input of O1 changes with respect to WO1.H21, which is simply the output of H21:
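\[
\frac{\partial net_{O1}}{\partial W_{O1.H21}} = out_{H21}
\]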
Putting it all together:
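\[
\frac{\partial E_{total}}{\partial W_{O1.H21}} = -\left( target_{O1} - out_{O1} \right) \cdot out_{O1} \left( 1 - out_{O1} \right) \cdot out_{H21}
\]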
Alternatively, we have \partial E_{total} / \partial out_{O1} and \partial out_{O1} / \partial net_{O1}, whose product can be written as \partial E_{total} / \partial net_{O1}, aka \delta_{O1} (the Greek letter delta), aka the node delta. We can use this to rewrite the calculation above:
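\[
\delta_{O1} = \frac{\partial E_{total}}{\partial out_{O1}} \cdot \frac{\partial out_{O1}}{\partial net_{O1}} = \frac{\partial E_{total}}{\partial net_{O1}} = -\left( target_{O1} - out_{O1} \right) \cdot out_{O1} \left( 1 - out_{O1} \right)
\]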
Therefore:
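\[
\frac{\partial E_{total}}{\partial W_{O1.H21}} = \delta_{O1} \cdot out_{H21}
\]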
Some sources extract the negative sign from \delta so it would be written as:
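\[
\frac{\partial E_{total}}{\partial W_{O1.H21}} = -\delta_{O1} \cdot out_{H21}
\]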
To decrease the error, we then subtract this value from the current weight (optionally multiplied by some learning rate, eta, which we’ll set to 0.2):
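\[
W_{O1.H21}^{new} = W_{O1.H21} - \eta \cdot \frac{\partial E_{total}}{\partial W_{O1.H21}}, \qquad \eta = 0.2
\]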
The remaining output-layer weights are updated in exactly the same way.
Hidden Layer
We want to know how much a change in WH21.H11 affects the total error. We use the same process as for the output layer, with one slight difference: because the output of each hidden-layer neuron contributes to the output (and therefore to the error) of every output neuron, we have to account for all of those contributions. Applying the chain rule again:
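\[
\frac{\partial E_{total}}{\partial W_{H21.H11}} = \frac{\partial E_{total}}{\partial out_{H21}} \cdot \frac{\partial out_{H21}}{\partial net_{H21}} \cdot \frac{\partial net_{H21}}{\partial W_{H21.H11}}
\]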
And \partial E_{total} / \partial out_{H21} is equal to the sum of its effect on each output neuron:
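Here W_{o.H21} denotes the weight from H21 to output neuron o, following the naming convention used above:
\[
\frac{\partial E_{total}}{\partial out_{H21}} = \sum_{o} \frac{\partial E_{o}}{\partial out_{H21}} = \sum_{o} \frac{\partial E_{o}}{\partial net_{o}} \cdot \frac{\partial net_{o}}{\partial out_{H21}} = \sum_{o} \delta_{o} \cdot W_{o.H21}
\]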
Now that we have \partial E_{total} / \partial out_{H21}, we need to figure out \partial out_{H21} / \partial net_{H21} and then \partial net_{H21} / \partial W for each weight. The first factor is again the derivative of the logistic function:
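\[
\frac{\partial out_{H21}}{\partial net_{H21}} = out_{H21} \cdot \left( 1 - out_{H21} \right)
\]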
We calculate the partial derivative of the total net input to H21 with respect to WH21.H11 the same way as we did for the output neuron:
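\[
\frac{\partial net_{H21}}{\partial W_{H21.H11}} = out_{H11}
\]
Putting it all together, as before:
\[
\frac{\partial E_{total}}{\partial W_{H21.H11}} = \left( \sum_{o} \delta_{o} \cdot W_{o.H21} \right) \cdot out_{H21} \left( 1 - out_{H21} \right) \cdot out_{H11}, \qquad W_{H21.H11}^{new} = W_{H21.H11} - \eta \cdot \frac{\partial E_{total}}{\partial W_{H21.H11}}
\]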
All the remaining weights can be found in the same way as shown above.
After updating the weights we can see that the loss has been reduced: originally it was 0.0317292, and it drops to 0.0314531 after one pass of backpropagation.
Code:
Source Code (GitHub repo)
Neural Network from scratch in C++
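The linked repository contains the full implementation. As a rough, self-contained sketch of the procedure above (the layer sizes, random initialization, input values, target, and epoch count below are illustrative assumptions, not taken from the repository; only the sigmoid activation, the no-bias setup, the squared-error loss, and eta = 0.2 come from the text):

// backprop_sketch.cpp -- minimal sketch of the network described above:
// sigmoid activations, no biases, squared error, learning rate eta = 0.2.
#include <cmath>
#include <cstdio>
#include <vector>
#include <random>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;   // w[j][i]: weight from input i to node j

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

struct Layer {
    Mat w;       // w[j][i] connects input i of this layer to node j
    Vec out;     // out[j] = sigmoid(net[j])
    Vec delta;   // delta[j] = dE/dnet[j] (the node delta)

    Layer(int inputs, int nodes, std::mt19937& rng) {
        std::uniform_real_distribution<double> dist(-0.5, 0.5);
        w.assign(nodes, Vec(inputs));
        for (auto& row : w) for (auto& v : row) v = dist(rng);
        out.assign(nodes, 0.0);
        delta.assign(nodes, 0.0);
    }

    // Forward pass: net = w * in, out = sigmoid(net)  (no bias term)
    const Vec& forward(const Vec& in) {
        for (size_t j = 0; j < w.size(); ++j) {
            double net = 0.0;
            for (size_t i = 0; i < in.size(); ++i) net += w[j][i] * in[i];
            out[j] = sigmoid(net);
        }
        return out;
    }
};

int main() {
    std::mt19937 rng(42);

    // Two hidden layers of two nodes each and one output node (assumed sizes).
    std::vector<Layer> layers;
    layers.emplace_back(2, 2, rng);   // hidden layer 1 (H11, H12)
    layers.emplace_back(2, 2, rng);   // hidden layer 2 (H21, H22)
    layers.emplace_back(2, 1, rng);   // output layer  (O1)

    const Vec input  = {0.05, 0.10};  // illustrative values
    const Vec target = {0.90};        // illustrative target
    const double eta = 0.2;           // learning rate from the text

    for (int epoch = 0; epoch < 3; ++epoch) {
        // ---- forward pass ----
        Vec act = input;
        std::vector<Vec> layer_inputs;          // what each layer saw as input
        for (auto& L : layers) {
            layer_inputs.push_back(act);
            act = L.forward(act);
        }

        // ---- total error: E = sum over outputs of 1/2 (target - out)^2 ----
        double E = 0.0;
        for (size_t o = 0; o < act.size(); ++o)
            E += 0.5 * (target[o] - act[o]) * (target[o] - act[o]);
        std::printf("epoch %d  error %.7f\n", epoch, E);

        // ---- backward pass: node deltas ----
        Layer& outL = layers.back();
        for (size_t o = 0; o < outL.out.size(); ++o)
            outL.delta[o] = -(target[o] - outL.out[o])           // dE/dout
                          * outL.out[o] * (1.0 - outL.out[o]);   // dout/dnet
        for (int l = (int)layers.size() - 2; l >= 0; --l) {
            Layer& L = layers[l];
            Layer& next = layers[l + 1];
            for (size_t j = 0; j < L.out.size(); ++j) {
                double sum = 0.0;                 // sum over the next layer's nodes
                for (size_t k = 0; k < next.out.size(); ++k)
                    sum += next.delta[k] * next.w[k][j];
                L.delta[j] = sum * L.out[j] * (1.0 - L.out[j]);
            }
        }

        // ---- weight update: w[j][i] -= eta * delta[j] * input[i] ----
        for (size_t l = 0; l < layers.size(); ++l)
            for (size_t j = 0; j < layers[l].w.size(); ++j)
                for (size_t i = 0; i < layers[l].w[j].size(); ++i)
                    layers[l].w[j][i] -= eta * layers[l].delta[j] * layer_inputs[l][i];
    }
    return 0;
}

Compiling with, for example, g++ -std=c++17 backprop_sketch.cpp and running it prints the squared error after each update, which should decrease from one epoch to the next, just as described above.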