Question: Why should I use the chain rule to compute a gradient when I could differentiate the single combined function directly?
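Answer: The chain rule lets us break a complicated composite function into simple pieces, differentiate each piece separately, and multiply the results. Differentiating the combined function in one step is usually harder and more error-prone, because it often means expanding or manipulating a large expression first. For example, let's say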
y = f(u) = 5 * u^4
u = g(x) = x^3 + 7
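dy/dx = ?
Written as a single function, y = 5 (x^3 + 7)^4, so dy/dx = d(5 (x^3 + 7)^4)/dx. Differentiating this directly would mean expanding the fourth power into a degree-12 polynomial first.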
With the chain rule, each piece is easy to differentiate: dy/du = 20 u^3 and du/dx = 3 x^2
Therefore, dy/dx = dy/du * du/dx = 20 (x^3 + 7)^3 * 3 x^2 = 60 x^2 (x^3 + 7)^3
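As a quick sanity check, here is a minimal Python sketch that evaluates the chain-rule result and compares it with a central finite-difference estimate of the derivative of the combined function. The names dy_dx_chain and dy_dx_numeric and the step size h are illustrative choices, not part of the original answer.

```python
def f(u):
    # Outer function: y = f(u) = 5 * u^4
    return 5 * u**4


def g(x):
    # Inner function: u = g(x) = x^3 + 7
    return x**3 + 7


def dy_dx_chain(x):
    # Chain rule: dy/dx = dy/du * du/dx, with dy/du evaluated at u = g(x)
    u = g(x)
    dy_du = 20 * u**3
    du_dx = 3 * x**2
    return dy_du * du_dx


def dy_dx_numeric(x, h=1e-6):
    # Central finite difference on the combined function f(g(x)).
    # h is an assumed step size; this is only an approximation.
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)


if __name__ == "__main__":
    x = 1.0
    print("chain rule:       ", dy_dx_chain(x))    # 20 * 8^3 * 3 = 30720.0
    print("finite difference:", dy_dx_numeric(x))  # should be close to the value above
```

At x = 1, u = 8, so the chain rule gives 20 * 8^3 * 3 = 30720, and the numerical estimate should agree to several significant digits.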