
Section 2.2 A first necessary condition

Let it be granted that a particular admissible arc \(E_{12}\) with the equation

\begin{equation*} y=y(x)\quad (x_1\leq x\leq x_2) \end{equation*}

furnishes the solution of our problem, and let us then seek to find the properties which distinguish it from the other admissible arcs joining points 1 and 2. If we select arbitrarily an admissible function \(\eta(x)\) satisfying the conditions \(\eta(x_1) = \eta(x_2) = 0\text{,}\) the equation

\begin{equation} y=y(x)+a \eta(x)\quad (x_1\leq x\leq x_2),\label{eqn-variation}\tag{2.2.1} \end{equation}

involving the arbitrary constant \(a\text{,}\) represents a one-parameter family of curves which includes the arc \(E_{12}\) for the special value \(a=0\text{,}\) and all of the curves of the family pass through the end-points 1 and 2 of \(E_{12}\text{.}\)
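For concreteness, here is one such family, with choices of arc and variation that are mine rather than the text's: take \(E_{12}\) to be the chord \(y(x)=x\) \((0\leq x\leq 1)\) joining the points \((0,0)\) and \((1,1)\text{,}\) and take \(\eta(x)=\sin(\pi x)\text{,}\) which satisfies \(\eta(0)=\eta(1)=0\text{.}\) Then (2.2.1) becomes

\begin{equation*} y=x+a\sin(\pi x)\quad (0\leq x\leq 1), \end{equation*}

and every curve of the family, whatever the value of \(a\text{,}\) passes through \((0,0)\) and \((1,1)\text{;}\) only the member with \(a=0\) is the straight chord itself.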

Remark 2.2.1. Variations and wiggles.

Admissible functions like \(\eta(x)\) as described in (2.2.1) are called by other authors variations -- hence, the "calculus of variations". Think of starting with \(y(x)\) and "wiggling" it by \(\eta(x)\) -- I'm thus going to call anything that looks like \(y=y(x)+a \eta(x)\) a wiggle of \(y\text{.}\) That is, any member of this "one-parameter family of curves" is a wiggle of the original function \(y(x)\text{.}\)

We insist that \(\eta(x_1) = \eta(x_2) = 0\) so that the starting and ending points don't get wiggled away from where they're supposed to be; that way, any wiggle is itself an admissible arc joining points 1 and 2. You can "scale up" the amount of wiggle by multiplying \(\eta(x)\) by some constant \(a\text{.}\) Note in particular that:

  • For any constant \(a\text{,}\) the function \(a\cdot\eta(x)\) is just a vertical scaling of \(\eta(x)\) by the factor \(a\text{,}\) and is therefore itself a variation.

  • When \(a=0\text{,}\) the amount of wiggle is 0, and so the wiggle \(y(x)+a \eta(x)\) is just the original function \(y(x)\text{.}\)

The value of the integral \(I\) taken along an arc of the family depends upon the value of \(a\) and may be represented by the symbol

\begin{equation} I(a)=\int_{x_1}^{x_2} f(y'+a \eta')\, dx.\label{eqn-iofa}\tag{2.2.2} \end{equation}

Along the initial arc \(E_{12}\) the integral has the value \(I(0)\text{,}\) and if this is to be a minimum when compared with the values of the integral along all other admissible arcs joining 1 with 2 it must in particular be a minimum when compared with the values \(I(a)\) along the arcs of the family (2.2.1). Hence according to the criterion for a minimum of a function given in Section 1.2 we must have \(I'(0) = 0\text{.}\)
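To make the dependence of \(I\) on the single variable \(a\) concrete, consider for illustration the simplest smooth integrand \(f(y')=(y')^2\) (a choice made only for this aside, not the integrand of our problem). Expanding the square in (2.2.2) gives

\begin{equation*} I(a)=\int_{x_1}^{x_2}(y'+a\eta')^2\,dx=\int_{x_1}^{x_2}(y')^2\,dx+2a\int_{x_1}^{x_2}y'\eta'\,dx+a^2\int_{x_1}^{x_2}(\eta')^2\,dx, \end{equation*}

an ordinary quadratic polynomial in \(a\text{,}\) so that \(I'(0)=2\int_{x_1}^{x_2}y'\eta'\,dx\text{,}\) and the necessary condition \(I'(0)=0\) is a genuine restriction on the arc \(y(x)\text{.}\)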

It should perhaps be emphasized here that the method of the calculus of variations, as it has been developed in the past, consists essentially of three parts: first, the deduction of necessary conditions which characterize a minimizing arc; second, the proof that these conditions, or others obtained from them by slight modifications, are sufficient to insure the minimum sought; and third, the search for an arc which satisfies the sufficient conditions. For the deduction of necessary conditions the value of the integral \(I\) along the minimizing arc can be compared with its values along any special admissible arcs which may be convenient for the purposes of the proof in question, for example along those of the family (2.2.1) described above, but the sufficiency proofs must be made with respect to all admissible arcs joining the points 1 and 2. The third part of the problem, the determination of an arc satisfying the sufficient conditions, is frequently the most difficult of all, and is the part for which fewest methods of a general character are known. For shortest-distance problems fortunately this determination is usually easy.

Activity 2.2.1.

This next result is a doozy, and it uses an important technique with which you're probably not especially familiar: differentiation under the integral sign. Here's a little activity to walk you through what's going on.

  1. Convince yourself that \(I(a)\) as given in (2.2.2) is indeed a function of \(a\text{,}\) and thus it's reasonable for us to compute \(\frac{dI}{da}\text{.}\) Also, convince yourself that \(y'+a\eta'\) is a function of both \(x\) and \(a\text{.}\)

  2. Here's where "differentiation under the integral sign" comes in: according to something called Leibniz's rule, as long as our functions are "nice enough" (which they are),

    \begin{equation*} \frac{d}{da} \int_{x_1}^{x_2} F(x, a)\, dx = \int_{x_1}^{x_2} \frac{\partial}{\partial a} F(x,a)\, dx. \end{equation*}

    Apply Leibniz's rule to write down an expression for \(\frac{dI}{da}\) in (2.2.2).

    Solution
    \begin{equation*} \frac{d}{da} \int_{x_1}^{x_2} f(y'+a\eta')\, dx = \int_{x_1}^{x_2} \frac{\partial}{\partial a} f(y'+a\eta')\, dx. \end{equation*}
  3. We're going to need the chain rule to deal with the integrand on the RHS. In particular, it'll be helpful for us to think about the chain rule in Leibniz notation (he's just popping up all over today!):

    \begin{equation*} \frac{\partial f}{\partial a} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial a}. \end{equation*}

    Explain why this version of the chain rule is equivalent to the usual understanding: "first take the derivative of the outside stuff, leaving the inside stuff alone, then multiply by the derivative of the inside stuff."

  4. What's something good you can label as \(u\) in your expression for \(\frac{dI}{da}\text{?}\) What's \(\frac{\partial u}{\partial a}\text{,}\) and so what's \(\frac{\partial f}{\partial a}\text{?}\)

    Solution

    \(u\) should be the inside stuff, \(u=y'+a\eta'\text{.}\) Therefore,

    \begin{equation*} \frac{\partial u}{\partial a} = \eta' \textrm{, so } \frac{\partial f}{\partial a} = \frac{\partial f}{\partial u}\cdot \eta'. \end{equation*}
  5. We mostly care about \(I(a)\) and \(I'(a)\) when \(a=0\) -- that is, when there's zero variation on the original curve \(y(x)\text{.}\) If \(a=0\text{,}\) then what's \(u\text{?}\) Use this to rewrite your integrand \(\frac{\partial f}{\partial a}\text{.}\)

    Solution

    If \(a=0\text{,}\) then \(u = y'+0\cdot\eta' = y'\text{.}\) Therefore, we can rewrite our integrand to remove the convenience variable \(u\) that we kinda don't care about anyway:

    \begin{equation*} \frac{\partial f}{\partial a} = \frac{\partial f}{\partial u}\cdot \eta' = \frac{\partial f}{\partial y'}\cdot \eta'. \end{equation*}
  6. Conclude by writing down a final expression for \(I'(0)\text{.}\)

    Solution
    \begin{equation*} I'(0) = \int_{x_1}^{x_2} \frac{\partial}{\partial y'}f(y') \cdot \eta'\, dx. \end{equation*}
For more on this, see this interesting article and its references.
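If you'd like to check the activity's conclusion with a computer algebra system, the following is a minimal SymPy sketch. The concrete choices \(f(u)=\sqrt{1+u^2}\text{,}\) \(y(x)=x\text{,}\) and \(\eta(x)=\sin(\pi x)\) on \([0,1]\) are my own illustrative assumptions, chosen to match the shortest-distance theme of this chapter; the differentiation under the integral sign and the substitution \(a=0\) mirror the steps above.

```python
# A minimal SymPy check of the activity's computation, differentiating
# under the integral sign.  The concrete choices of f, y, and eta below
# are illustrative assumptions, not taken from the text.
import sympy as sp

x, a = sp.symbols('x a', real=True)

y   = x                            # candidate arc E_12: the chord from (0,0) to (1,1)
eta = sp.sin(sp.pi * x)            # a variation with eta(0) = eta(1) = 0
f   = lambda u: sp.sqrt(1 + u**2)  # arclength integrand f(y')

# Integrand of I(a) = integral of f(y' + a*eta') over [0, 1]:
integrand = f(sp.diff(y, x) + a * sp.diff(eta, x))

# Leibniz's rule: differentiate under the integral sign, then set a = 0.
iprime0 = sp.integrate(sp.diff(integrand, a).subs(a, 0), (x, 0, 1))

print(iprime0)  # prints 0: the chord satisfies I'(0) = 0
```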

By differentiating the expression (2.2.2) with respect to \(a\) and then setting \(a=0\) the value of \(I'(0)\) is readily seen to be

\begin{equation} I'(0) = \int_{x_1}^{x_2} f_{y'} \eta'\,dx,\label{eqn-iprime0}\tag{2.2.3} \end{equation}

where for convenience we use the notation \(f_{y'}\) for the derivative of the integrand \(f(y')\) with respect to \(y'\text{.}\) It will always be understood that the argument in \(f\) and its derivatives is the function \(y'(x)\) belonging to the arc \(E_{12}\) unless some other is expressly indicated, as is done, for example, in the formula (2.2.2).
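As a first concrete instance of (2.2.3), anticipating the shortest-distance problems to which this chapter is devoted: if the integrand is the arclength element \(f(y')=\sqrt{1+(y')^2}\text{,}\) then \(f_{y'}=y'/\sqrt{1+(y')^2}\text{,}\) and the necessary condition \(I'(0)=0\) reads

\begin{equation*} I'(0)=\int_{x_1}^{x_2}\frac{y'}{\sqrt{1+(y')^2}}\,\eta'\,dx=0 \end{equation*}

for every admissible \(\eta\) vanishing at \(x_1\) and \(x_2\text{.}\)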

What now are the conclusions which can be drawn from the necessity of the condition \(I'(0)=0\text{?}\) The answer to this question is to be found in the lemma of the following section which will be frequently applied in later chapters as well as in the solution of the shortest-distance problems to which this chapter is devoted.