Section 3.6 The Chain Rule
Consider the function \(f(x) = \sqrt{x^2 + 1}\text{.}\) Using only the power rule, we cannot differentiate \(\sqrt{x^2 + 1} = (x^2 + 1)^{1/2}\text{,}\) as the base of the power is not just \(x\) but a function of \(x\text{,}\) in particular \(x^2 + 1\text{.}\) The chain rule tells us how to differentiate composite functions such as this one, and it is among the most powerful rules for differentiating functions.
If \(y\) is a function of \(u\text{,}\) and \(u\) is a function of \(x\text{,}\) then \(y\) is a function of \(x\text{.}\) In particular, if \(y = f(u)\) and \(u = g(x)\text{,}\) then \(y = f(g(x))\text{,}\) a composite function. In the above example, we have \(y = \sqrt{u}\text{,}\) and \(u = x^2 + 1\text{.}\) Then, \(y = \sqrt{x^2 + 1}\text{.}\) We want to consider the derivative of \(y\) with respect to \(x\text{.}\) That is, how fast does \(y\) change when \(x\) changes? In particular, how does this derivative \(\frac{dy}{dx}\) relate to the derivatives \(\frac{dy}{du}\) and \(\frac{du}{dx}\) (which, in this example, are \(\frac{dy}{du} = \frac{1}{2\sqrt{u}}\text{,}\) and \(\frac{du}{dx} = 2x\))?
Consider the general case. If \(y = f(u)\) and \(u = g(x)\text{,}\) and if \(y\) is changing at a rate of \(\frac{dy}{du}\) with respect to \(u\text{,}\) and \(u\) is changing at a rate of \(\frac{du}{dx}\) with respect to \(x\text{,}\) then \(y\) is changing at a rate with respect to \(x\) of their product, \(\frac{dy}{du} \cdot \frac{du}{dx}\text{.}\)
Subsection 3.6.1 The Chain Rule
Let \(f,g\) be functions, and consider their composition \(f \circ g\text{.}\)
Theorem 3.6.1.
Let \(f, g\) be functions, with \(R(g) \subseteq D(f)\) so the composition \(f \circ g\) is defined. Then, if \(g\) is differentiable at \(x\text{,}\) and \(f\) is differentiable at the point \(g(x)\text{,}\) then the composite function \(f \circ g\) is differentiable at \(x\text{,}\) and,
\begin{equation*}
\boxed{(f \circ g)'(x) = f'(g(x)) g'(x)}
\end{equation*}
Intuitively, the derivative of the composition \(f \circ g\) is the product of the derivatives of \(f\) and \(g\text{,}\) with the caveat that the derivative of \(f\) is evaluated at \(g(x)\text{.}\)
Using Leibniz notation, if \(u = g(x)\) and \(y = f(u)\text{,}\) then
\begin{equation*}
\boxed{\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}}
\end{equation*}
where \(\frac{dy}{du}\) is evaluated at \(u = g(x)\text{.}\) This notation is particularly natural, because if \(\frac{dy}{du}\) and \(\frac{du}{dx}\) are treated as quotients, then \(du\) could be cancelled from the numerator and denominator to get \(\frac{dy}{dx}\text{.}\) However, recall that strictly speaking, \(\frac{dy}{du}\) is not the quotient of two quantities, but simply notation to represent the single quantity “the derivative of \(y\) with respect to \(u\)”.
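As a worked illustration, return to the motivating example \(y = \sqrt{u}\) with \(u = x^2 + 1\text{,}\) using the derivatives \(\frac{dy}{du} = \frac{1}{2\sqrt{u}}\) and \(\frac{du}{dx} = 2x\) noted earlier. Substituting these into the Leibniz form, and then replacing \(u\) by \(x^2 + 1\text{,}\) should give
\begin{equation*}
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = \frac{1}{2\sqrt{u}} \cdot 2x = \frac{x}{\sqrt{x^2 + 1}}
\end{equation*}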
Subsection 3.6.2 Chain Rule for More Than Two Functions
The chain rule extends naturally to a composition of three functions, and more generally to a composition of any finite number of functions.
Corollary 3.6.2.
Let \(f, g, h\) be functions. If \(h\) is differentiable at \(x\text{,}\) \(g\) is differentiable at the point \(h(x)\text{,}\) and \(f\) is differentiable at \((g \circ h)(x)\text{,}\) then \(f \circ g \circ h\) is differentiable at \(x\text{,}\) and,
\begin{equation*}
\boxed{(f \circ g \circ h)'(x) = f'(g(h(x))) g'(h(x)) h'(x)}
\end{equation*}
Intuitively, start from the outside and make your way inside, like layers of an onion.
\begin{equation*}
(f \circ g)'(x) = \underbrace{f'(g(x))}_{\text{derivative of outer function, at the inner function}} \times \underbrace{g'(x)}_{\text{derivative of inner function}}
\end{equation*}
“The derivative of \(f\) of something is \(f'\) of that something, multiplied by the derivative of that something”. Also, note that the chain rule is in some sense a generalization of other derivative rules, since if \(g(x) = x\text{,}\) then \(f(g(x)) = f(x)\) and \(g'(x) = 1\text{,}\) and the chain rule becomes,
\begin{equation*}
(f \circ g)'(x) = f'(g(x)) g'(x) = f'(x) \cdot 1 = f'(x)
\end{equation*}
as expected.
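As a sketch of how Corollary 3.6.2 might be applied, consider \(y = \left(1 + \sqrt{x^2 + 1}\right)^3\text{.}\) The decomposition below is chosen here only for illustration: take \(f(v) = v^3\text{,}\) \(g(u) = 1 + \sqrt{u}\text{,}\) and \(h(x) = x^2 + 1\text{,}\) so that \(f'(v) = 3v^2\text{,}\) \(g'(u) = \frac{1}{2\sqrt{u}}\text{,}\) and \(h'(x) = 2x\text{.}\) Working from the outside in,
\begin{align*}
(f \circ g \circ h)'(x) \amp = f'(g(h(x))) \, g'(h(x)) \, h'(x)\\
\amp = 3\left(1 + \sqrt{x^2 + 1}\right)^2 \cdot \frac{1}{2\sqrt{x^2 + 1}} \cdot 2x\\
\amp = \frac{3x\left(1 + \sqrt{x^2 + 1}\right)^2}{\sqrt{x^2 + 1}}
\end{align*}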
Subsection 3.6.3 Intuitive Proof of the Chain Rule
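The chain rule is actually quite a subtle rule to prove, unlike some of the rules previously shown. First, we give an intuitive proof, which is valid provided the increment \(\Delta u\) is nonzero for all sufficiently small nonzero \(\Delta x\) (so that dividing by \(\Delta u\) makes sense).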
Proof using increments.
With Leibniz notation, let \(\Delta x = h\text{,}\) \(\Delta u = g(x + \Delta x) - g(x)\text{,}\) and \(\Delta y = f(u + \Delta u) - f(u)\text{,}\) where \(u = g(x)\text{.}\) Then,
\begin{align*}
\frac{dy}{dx} \amp = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}\\
\amp = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x} \amp\amp \text{multiplying and dividing by $\Delta u$, and rearranging}\\
\amp = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta u} \cdot \lim_{\Delta x \to 0} \frac{\Delta u}{\Delta x} \amp\amp \text{by the limit product rule}
\end{align*}
Then, the limit on the right is \(\frac{du}{dx} = g'(x)\text{.}\) Also, since \(u = g(x)\) is differentiable at \(x\text{,}\) it is continuous there, and so as \(\Delta x \to 0\text{,}\) \(\Delta u \to 0\) also. Thus,
\begin{align*}
\frac{dy}{dx} \amp = \lim_{\Delta u \to 0} \frac{\Delta y}{\Delta u} \cdot \lim_{\Delta x \to 0} \frac{\Delta u}{\Delta x}\\
\amp = \frac{dy}{du} \cdot \frac{du}{dx}
\end{align*}
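The increment argument can also be checked numerically. The following is a minimal Python sketch, assuming the running example \(f(u) = \sqrt{u}\text{,}\) \(g(x) = x^2 + 1\text{,}\) and the evaluation point \(x = 1\) (chosen here only for illustration). The helper name `increment_quotients` is hypothetical. It forms \(\frac{\Delta y}{\Delta u}\text{,}\) \(\frac{\Delta u}{\Delta x}\text{,}\) and \(\frac{\Delta y}{\Delta x}\) for shrinking \(\Delta x = h\text{,}\) and compares them with the value \(\frac{x}{\sqrt{x^2 + 1}}\) predicted by the chain rule.

```python
import math


def f(u):
    # Outer function from the running example: f(u) = sqrt(u)
    return math.sqrt(u)


def g(x):
    # Inner function from the running example: g(x) = x^2 + 1
    return x * x + 1.0


def increment_quotients(x, h):
    """Return (Delta y / Delta u, Delta u / Delta x, Delta y / Delta x) for Delta x = h."""
    du = g(x + h) - g(x)         # Delta u = g(x + Delta x) - g(x)
    dy = f(g(x) + du) - f(g(x))  # Delta y = f(u + Delta u) - f(u)
    return dy / du, du / h, dy / h


x = 1.0
# Chain rule prediction: f'(g(x)) * g'(x) = 1 / (2 sqrt(u)) * 2x = x / sqrt(x^2 + 1)
exact = x / math.sqrt(x * x + 1.0)

for h in (1e-1, 1e-3, 1e-5):
    dy_du, du_dx, dy_dx = increment_quotients(x, h)
    print(f"h={h:g}  dy/du={dy_du:.6f}  du/dx={du_dx:.6f}  "
          f"product={dy_du * du_dx:.6f}  dy/dx={dy_dx:.6f}")
print(f"chain rule value = {exact:.6f}")
```

For this choice of \(g\) and \(x\text{,}\) \(\Delta u\) is nonzero for every small \(h > 0\text{,}\) so the division by \(\Delta u\) in the sketch is safe. For a function whose increment \(\Delta u\) vanishes arbitrarily close to \(x\text{,}\) the same code would divide by zero, which mirrors the gap in the intuitive proof.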
Subsection 3.6.4 Combining the Chain Rule with Other Differentiation Rules
The chain rule can be “built-in” to other differentiation formulas, so that they apply to arbitrary functions of \(x\) rather than just \(x\text{.}\) For example, recall the power rule for derivatives,
\begin{equation*}
\frac{d}{dx} x^r = r x^{r-1}
\end{equation*}
Then, for an arbitrary function \(g(x)\text{,}\)
\begin{equation*}
\boxed{\frac{d}{dx} (g(x))^r = r (g(x))^{r-1} g'(x)}
\end{equation*}
Corollary 3.6.3. Generalized power rule.
If \(u = u(x)\) is differentiable, then,
\begin{equation*}
\boxed{\frac{d}{dx}(u(x))^n = n(u(x))^{n-1} u'(x)} \qquad \text{or} \qquad \boxed{\frac{d}{dx} u^n = n u^{n-1} \frac{du}{dx}}
\end{equation*}
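As a brief illustration of the generalized power rule, take \(u(x) = x^3 + 2x\) and \(n = 5\) (a choice made here only as an example), so that \(u'(x) = 3x^2 + 2\text{.}\) Then the rule should give
\begin{equation*}
\frac{d}{dx}(x^3 + 2x)^5 = 5(x^3 + 2x)^4 \cdot (3x^2 + 2)
\end{equation*}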
The generalized power rule can also be proved directly from the limit definition of the derivative, without appealing to the chain rule. This can be done for integer and rational powers using techniques similar to those used to prove the ordinary power rule for those exponents.
Example 3.6.4.
The derivative of \(f(x) = (u(x))^3\text{,}\) using the chain rule, is \(f'(x) = 3(u(x))^2 u'(x)\text{.}\) Alternatively, using the limit definition of derivative and the difference of cubes factorization \(a^3 - b^3 = (a - b)(a^2 + ab + b^2)\text{,}\)
\begin{align*}
f'(x) \amp = \lim_{h \to 0} \frac{(u(x + h))^3 - (u(x))^3}{h}\\
\amp = \lim_{h \to 0} \frac{(u(x + h) - u(x)) \brac{(u(x + h))^2 + u(x + h) u(x) + (u(x))^2}}{h}
\end{align*}
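Using the limit product rule, and the fact that \(u\) is continuous at \(x\) (being differentiable there), so that \(u(x + h) \to u(x)\) as \(h \to 0\text{,}\)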
\begin{align*}
\amp = \lim_{h \to 0} \brac{(u(x + h))^2 + u(x + h) u(x) + (u(x))^2} \cdot \lim_{h \to 0} \frac{u(x + h) - u(x)}{h}\\
\amp = 3(u(x))^2 \cdot u'(x)
\end{align*}
as desired.