Section 2.4 Proof that the straight line is shortest
In the equation \(y=y(x)+a\eta(x)\) of the family of curves passing through the points 1 and 2 the function \(\eta(x)\) was entirely arbitrary except for the restrictions that it should be admissible and satisfy the relations \(\eta(x_1)=\eta(x_2)=0\text{,}\) and we have seen that the expression (2.2.3) for \(I'(0)\) must vanish for every such family. The lemma of the preceding section is therefore applicable and it tells us that along the minimizing arc \(E_{12}\) an equation
\begin{equation*} f_{y'}=\frac{y'}{\sqrt{1+y'^2}}=C \end{equation*}
must hold, where \(C\) is a constant. If we solve this equation for \(y'\) we see that \(y'\) is also a constant along \(E_{12}\) and that the only possible minimizing arc is therefore a single straight-line segment without corners joining the point 1 with the point 2.
The property just deduced for the shortest arc has so far only been proved to be necessary for a minimum. We have not yet demonstrated conclusively that the straight-line segment \(E_{12}\) joining 1 and 2 is actually shorter than every other admissible arc joining these points. In order to establish this fact let us now use \(\eta(x)\) to denote the increment which must be added to the ordinate of \(E_{12}\) at the value \(x\) in order to get the ordinate of an arbitrarily selected admissible arc \(C_{12}\) joining 1 with 2, so that the equation of \(C_{12}\) will be
\begin{equation*} y=y(x)+\eta(x) \qquad (x_1\leq x\leq x_2). \end{equation*}
Activity 2.4.1.
Okay, so the book is about to invoke some pretty deep results from Taylor series, and doesn't do much to explain them. Here's some stuff to help you figure out what's going on.
Write down the first few terms of the Taylor series expansion for a generic function \(f(x)\) centered at \(x=x_0\text{.}\) What's \(x_0\text{?}\) What's \(x\text{?}\)
Solution. \begin{equation*} f(x)=f(x_0) + f'(x_0)(x-x_0) + \frac{1}{2} f''(x_0) (x-x_0)^2 + \frac{1}{3!} f^{(3)}(x_0) (x-x_0)^3 + \ldots \end{equation*}In this expansion, \(x_0\) is some fixed value in the domain. Usually it's some "easy point" where we know a lot of information about \(f\) and its derivatives. For instance, if our function was \(f(x)=\sqrt{x}\text{,}\) some examples of "easy points" might be 36, 81, or 121.
\(x\text{,}\) on the other hand, is some honestly variable value in the domain. Usually it's some value close to \(x_0\text{,}\) but it's "harder" than \(x_0\text{.}\) Returning to the example of the function \(f(x)=\sqrt{x}\text{,}\) we might use the "easy point" \(x_0=36\) to help us figure out the value of the function at the harder point \(x=38\text{.}\)
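To make the "easy point" idea concrete, here is a small numerical sketch of the \(\sqrt{x}\) example above, using \(x_0=36\) to estimate \(\sqrt{38}\text{.}\) The helper name `taylor_sqrt` is made up for this illustration, and the derivatives of \(\sqrt{x}\) are written out by hand.

```python
import math

def taylor_sqrt(x, x0, order):
    """Taylor polynomial of sqrt about x0, truncated at the given order (0, 1, or 2)."""
    h = x - x0
    terms = [
        math.sqrt(x0),             # f(x0)
        h / (2 * math.sqrt(x0)),   # f'(x0) (x - x0), with f'(x) = 1 / (2 sqrt(x))
        -h**2 / (8 * x0**1.5),     # (1/2) f''(x0) (x - x0)^2, with f''(x) = -1 / (4 x^(3/2))
    ]
    return sum(terms[: order + 1])

exact = math.sqrt(38)
for order in (0, 1, 2):
    approx = taylor_sqrt(38, 36, order)
    print(order, approx, abs(approx - exact))
```

Each extra term should shrink the error, which is the whole point of expanding around a point where \(f\) and its derivatives are easy to evaluate.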
Consider the difference between the lengths of \(C_{12}\) and \(E_{12}\text{:}\)
\begin{equation*} I\left(C_{12}\right)-I\left(E_{12}\right) = \int_{x_{1}}^{x_{2}}\left[f\left(y^{\prime}+\eta^{\prime}\right)-f\left(y^{\prime}\right)\right]\, dx \end{equation*}In this expression, we have \(f(y')\) and \(f(y'+\eta')\) running around. Which of these do you think might be like \(f(x)\) and which of these do you think might be like \(f(x_0)\) in the Taylor setup?
Solution. I think \(y'\) is going to be like \(x_0\text{,}\) because it's the "easy point" that we already know something about, and \(y'+\eta'\) is going to be the "nearby point" that's "harder."
You might be familiar with the idea of the Lagrange error bound, which describes how big the error in using the \(n\)th Taylor polynomial to calculate \(f(x)\) might be. There's a slightly different version of this same idea which gives an exact expression for the remainder (i.e., the error):
\begin{equation*} R_n(x)=\frac{1}{(n+1)!} f^{(n+1)}(\xi) (x-x_0)^{n+1} \end{equation*}for some value \(\xi\) between \(x\) and \(x_0\text{.}\)
We're going to use this in the case where \(n=1\) -- that is, we're interested in the remainder after approximating \(f\) with just its linear approximation. Write out the formula above in the case where \(n=1\text{.}\)
Solution. \begin{equation*} R_1(x)=\frac{1}{2} f''(\xi) (x-x_0)^2. \end{equation*}
In this formula for \(R_1(x)\text{,}\) substitute in what you decided for \(x\) and \(x_0\) in step 2. Simplify a little.
Solution. \begin{equation*} R_1(y'+\eta') = \frac{1}{2} f''(\xi) [(y'+\eta')-y']^2 = \frac{1}{2} f''(\xi) (\eta')^2. \end{equation*} Now we just have to think hard about \(\xi\text{,}\) which is supposed to be somewhere between \(x\) and \(x_0\text{.}\) In our case, that's somewhere between \(y'+\eta'\) and \(y'\text{.}\) The book is about to say that \(\xi = y'+\theta\cdot\eta'\text{,}\) where \(\theta\) is some number between 0 and 1. Why does this make sense?
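One way to get a feel for \(\theta\) is to hunt for it numerically. The sketch below assumes the arc-length integrand \(f(p)=\sqrt{1+p^2}\) (with \(p\) standing in for \(y'\) and \(h\) for \(\eta'\)) and uses bisection to find a \(\theta\) in \([0,1]\) for which \(\frac{1}{2}f''(p+\theta h)h^2\) matches the actual remainder. The function names and the sample values \(p=0.5\text{,}\) \(h=1\) are illustrative choices, not anything from the book.

```python
import math

def f(p):
    """Assumed arc-length integrand sqrt(1 + p^2)."""
    return math.sqrt(1.0 + p * p)

def f_p(p):
    """First derivative of f with respect to p."""
    return p / math.sqrt(1.0 + p * p)

def f_pp(p):
    """Second derivative of f with respect to p."""
    return (1.0 + p * p) ** -1.5

def find_theta(p, h, tol=1e-12):
    """Bisection for theta in [0, 1] with (1/2) f''(p + theta h) h^2 equal to the remainder."""
    remainder = f(p + h) - f(p) - f_p(p) * h  # R_1: what the linear approximation misses

    def g(theta):
        return 0.5 * f_pp(p + theta * h) * h * h - remainder

    lo, hi = 0.0, 1.0
    if g(lo) * g(hi) > 0:
        # Bisection needs a sign change at the endpoints; this holds for the sample values below.
        raise ValueError("no sign change on [0, 1]; bisection does not apply")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

theta = find_theta(0.5, 1.0)
print(theta, 0.5 + theta * 1.0)  # theta and the corresponding xi = p + theta * h
```

As \(\theta\) slides from 0 to 1, the point \(p+\theta h\) slides from \(p\) to \(p+h\text{,}\) so saying "\(\xi\) is between \(y'\) and \(y'+\eta'\)" and saying "\(\xi=y'+\theta\eta'\) with \(\theta\) between 0 and 1" are the same statement.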
The difference between the lengths of \(C_{12}\) and \(E_{12}\) can now be expressed with the help of Taylor's formula in the form
\begin{equation*} I(C_{12})-I(E_{12}) = \int_{x_1}^{x_2}\left[f(y'+\eta')-f(y')\right]\,dx = f_{y'}\int_{x_1}^{x_2}\eta'\,dx + \frac{1}{2}\int_{x_1}^{x_2} f_{y'y'}(y'+\theta\eta')\,\eta'^2\,dx \end{equation*}
where \(I(C_{12})\) and \(I(E_{12})\) are the values of the integral \(I\) along the two arcs; \(f_{y'y'}\) is the second derivative of the function \(f\) with respect to \(y'\text{;}\) and \(\theta\) is the value between 0 and 1 introduced by Taylor's formula. The next to last integral vanishes since \(f_{y'}\) is a constant along \(E_{12}\) and since the difference \(\eta(x)\) of the ordinates of two arcs \(C_{12}\) and \(E_{12}\) with the same end-points must vanish at \(x_1\) and \(x_2\text{.}\) Furthermore the last integral is never negative since the second derivative
\begin{equation*} f_{y'y'}=\frac{1}{(1+y'^2)^{3/2}} \end{equation*}
is always positive. We see therefore that \(I(C_{12})-I(E_{12})\) is greater than zero unless \(\eta'(x)\) vanishes identically, in which case \(\eta(x)\) itself would have everywhere the constant value zero which it has at \(x_1\) and \(x_2\text{,}\) and \(C_{12}\) would coincide with \(E_{12}\text{.}\)
It has been proved therefore that the shortest arc from the point 1 to the point 2 is necessarily the straight-line segment joining those points, and that this segment is actually shorter than every other admissible arc with the same endpoints.
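The conclusion can be spot-checked numerically. The following sketch assumes the length integral \(I=\int_{x_1}^{x_2}\sqrt{1+y'^2}\,dx\text{,}\) picks the sample endpoints \((0,0)\) and \((1,2)\text{,}\) and perturbs the straight line by \(a\,\eta(x)\) with \(\eta(x)=\sin\bigl(\pi(x-x_1)/(x_2-x_1)\bigr)\text{,}\) which vanishes at both ends. The endpoints, the choice of \(\eta\text{,}\) and the midpoint-rule integrator are all illustrative assumptions; a check like this illustrates the theorem for a few curves but does not prove it.

```python
import math

def arc_length(y_prime, x1, x2, n=10_000):
    """Midpoint-rule approximation of the integral of sqrt(1 + y'(x)^2) over [x1, x2]."""
    dx = (x2 - x1) / n
    total = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * dx
        total += math.sqrt(1.0 + y_prime(x) ** 2) * dx
    return total

# Sample endpoints (points 1 and 2) and the slope of the straight segment joining them.
x1, y1, x2, y2 = 0.0, 0.0, 1.0, 2.0
slope = (y2 - y1) / (x2 - x1)

def perturbed_slope(a):
    """Derivative of y(x) + a * eta(x), where eta(x) = sin(pi (x - x1) / (x2 - x1))."""
    k = math.pi / (x2 - x1)
    return lambda x: slope + a * k * math.cos(k * (x - x1))

straight = arc_length(perturbed_slope(0.0), x1, x2)
for a in (0.0, 0.1, 0.5, -0.5):
    diff = arc_length(perturbed_slope(a), x1, x2) - straight
    print(a, diff)  # difference I(C_12) - I(E_12) for this perturbation
```

For every nonzero \(a\) the printed difference should come out positive, in line with the sign argument for \(f_{y'y'}\) above.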
Remark 2.4.1. Necessary and sufficient conditions.
Here's the logic of what we just did: first we showed that if we had a minimizing arc \(E_{12}\text{,}\) then it was necessary for \(E_{12}\) to be a straight line -- that is, "if minimizing arc, then straight line."
Then we showed that if \(C_{12}\) was any other admissible arc (which we can think of as some variation of \(E_{12}\)), then it was definitely longer than \(E_{12}\text{,}\) unless the variation was 0. That is, we showed "if straight line, then minimizing arc" -- if we want \(E_{12}\) to be a minimizing arc, then it is sufficient for \(E_{12}\) to be a straight line.
For a little reminder about all of this "necessary" and "sufficient" business, see the earlier discussion in Remark 1.2.2.
One should notice the rôle which the positive sign of the derivative \(f_{y'y'}\) has played in the determination of the minimum property. If the sign of this derivative had been negative the difference \(I(C_{12})-I(E_{12})\) would have been negative and \(I(E_{12})\) would have been a maximum instead of a minimum. This is an analogue of the criterion mentioned in Section 1.2 for the simpler theory of maxima and minima of functions of a single variable.