1
Experimental facts

V. I. Arnold¹

(1)

Department of Mathematics Steklov Mathematical Institute, Russian Academy of Sciences, GSP-1, 117966, Moscow, Russia

In this chapter we write down the basic experimental facts which lie at the foundation of mechanics: Galileo’s principle of relativity and Newton’s differential equation. We examine constraints on the equation of motion imposed by the relativity principle, and we mention some simple examples.

1 The principles of relativity and determinacy

In this paragraph we introduce and discuss the notion of an inertial coordinate system. The mathematical statements of this paragraph are formulated exactly in the next paragraph.

A series of experimental facts is at the basis of classical mechanics.² We list some of them.

A Space and time

Our space is three-dimensional and euclidean, and time is one-dimensional.

B Galileo’s principle of relativity

There exist coordinate systems (called inertial) possessing the following two properties:

All the laws of nature at all moments of time are the same in all inertial coordinate systems.

All coordinate systems in uniform rectilinear motion with respect to an inertial one are themselves inertial.

In other words, if a coordinate system attached to the earth is inertial, then an experimenter on a train which is moving uniformly in a straight line with respect to the earth cannot detect the motion of the train by experiments conducted entirely inside his car.

In reality, the coordinate system associated with the earth is only approximately inertial. Coordinate systems associated with the sun, the stars, etc. are more nearly inertial.

C Newton’s principle of determinacy

The initial state of a mechanical system (the totality of positions and velocities of its points at some moment of time) uniquely determines all of its motion.

It is hard to doubt this fact, since we learn it very early. One can imagine a world in which to determine the future of a system one must also know the acceleration at the initial moment, but experience shows us that our world is not like this.

2 The galilean group and Newton’s equations

In this paragraph we define and investigate the galilean group of space-time transformations. Then we consider Newton’s equation and the simplest constraints imposed on its right-hand side by the property of invariance with respect to galilean transformations.³

A Notation

We denote the set of all real numbers by ℝ. We denote by ℝⁿ an n-dimensional real vector space.

Affine n-dimensional space Aⁿ is distinguished from ℝⁿ in that there is “no fixed origin.” The group ℝⁿ acts on Aⁿ as the group of parallel displacements (Figure 1):

a → a + b, a ∈ Aⁿ, b ∈ ℝⁿ, a + b ∈ Aⁿ.

[Thus the sum of two points of Aⁿ is not defined, but their difference is defined and is a vector in ℝⁿ.]

Figure 1

Parallel displacement

A euclidean structure on the vector space ℝⁿ is a positive definite symmetric bilinear form called a scalar product. The scalar product enables one to define the distance

ρ(x, y) = ‖x − y‖ = √((x − y, x − y))

between points of the corresponding affine space Aⁿ. An affine space with this distance function is called a euclidean space and is denoted by Eⁿ.

B Galilean structure

The galilean space-time structure consists of the following three elements:

The universe—a four-dimensional affine⁴ space A⁴. The points of A⁴ are called world points or events. The parallel displacements of the universe A⁴ constitute a vector space ℝ⁴.

Time—a linear mapping t : ℝ⁴ → ℝ from the vector space of parallel displacements of the universe to the real “time axis.” The time interval from event a ∈ A⁴ to event b ∈ A⁴ is the number t(b − a) (Figure 2). If t(b − a) = 0, then the events a and b are called simultaneous.

Figure 2

Interval of time t

The set of events simultaneous with a given event forms a three-dimensional affine subspace in A⁴. It is called a space of simultaneous events A³.

The kernel of the mapping t consists of those parallel displacements of A⁴ which take some (and therefore every) event into an event simultaneous with it. This kernel is a three-dimensional linear subspace ℝ³ of the vector space ℝ⁴.

The galilean structure includes one further element.

The distance between simultaneous events

ρ(a, b) = ‖a − b‖ = √((a − b, a − b)), a, b ∈ A³,

is given by a scalar product on the space ℝ³. This distance makes every space of simultaneous events into a three-dimensional euclidean space E³.

A space A⁴, equipped with a galilean space-time structure, is called a galilean space.

One can speak of two events occurring simultaneously in different places, but the expression “two non-simultaneous events a, b ∈ A⁴ occurring at one and the same place in three-dimensional space” has no meaning as long as we have not chosen a coordinate system.

The galilean group is the group of all transformations of a galilean space which preserve its structure. The elements of this group are called galilean transformations. Thus, galilean transformations are affine transformations of A⁴ which preserve intervals of time and the distance between simultaneous events.

Example. Consider the direct product⁵ ℝ × ℝ³ of the t axis with a three-dimensional vector space ℝ³; suppose ℝ³ has a fixed euclidean structure. Such a space has a natural galilean structure. We will call this space galilean coordinate space.

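We mention three examples of galilean transformations of this space. First, uniform motion with velocity v:

g₁(t, x) = (t, x + vt) for all t ∈ ℝ, x ∈ ℝ³.

Next, translation of the origin:

g₂(t, x) = (t + s, x + r), where s ∈ ℝ and r ∈ ℝ³.

Finally, rotation of the coordinate axes:

g₃(t, x) = (t, Gx),

where G : ℝ³ → ℝ³ is an orthogonal transformation.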
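The three kinds of transformation can be tried out numerically. The following is a minimal sketch in plain Python, assuming events are stored as pairs (t, x) with x a 3-tuple; the helper names (uniform_motion, translation, rotation_z, compose) are illustrative and do not come from the text. It checks on sample events that a composition of the three maps preserves intervals of time and the distance between simultaneous events.

```python
import math

def uniform_motion(v):
    """Uniform motion with velocity v: (t, x) -> (t, x + v t)."""
    return lambda t, x: (t, tuple(xi + vi * t for xi, vi in zip(x, v)))

def translation(s, r):
    """Translation of the origin: (t, x) -> (t + s, x + r)."""
    return lambda t, x: (t + s, tuple(xi + ri for xi, ri in zip(x, r)))

def rotation_z(angle):
    """Rotation of the axes: (t, x) -> (t, G x), G a rotation about the third axis."""
    c, s = math.cos(angle), math.sin(angle)
    return lambda t, x: (t, (c * x[0] - s * x[1], s * x[0] + c * x[1], x[2]))

def compose(*maps):
    """Composition maps[0] o maps[1] o ...; the rightmost map is applied first."""
    def composed(t, x):
        for g in reversed(maps):
            t, x = g(t, x)
        return t, x
    return composed

g = compose(uniform_motion((1.0, -2.0, 0.5)),
            translation(3.0, (0.0, 1.0, 2.0)),
            rotation_z(0.7))

# a and b are simultaneous events; c happens 3.5 time units later.
a = (2.0, (1.0, 0.0, 0.0))
b = (2.0, (0.0, 3.0, 4.0))
c = (5.5, (7.0, 7.0, 7.0))
ga, gb, gc = g(*a), g(*b), g(*c)

time_interval_before = c[0] - a[0]
time_interval_after = gc[0] - ga[0]
dist_before = math.dist(a[1], b[1])
dist_after = math.dist(ga[1], gb[1])
```

The images of a and b remain simultaneous, so their distance is still defined after the transformation; it is compared only for that pair, in keeping with the remark that distance between non-simultaneous events has no meaning.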

Problem. Show that every galilean transformation of the space ℝ × ℝ³ can be written in a unique way as the composition of a rotation, a translation, and a uniform motion (g = g₁ ∘ g₂ ∘ g₃) (thus the dimension of the galilean group is equal to 3 + 4 + 3 = 10).

Problem. Show that all galilean spaces are isomorphic to each other⁶ and, in particular, isomorphic to the coordinate space ℝ × ℝ³.

Let M be a set. A one-to-one correspondence φ₁ : M → ℝ × ℝ³ is called a galilean coordinate system on the set M. A coordinate system φ₂ moves uniformly with respect to φ₁ if

φ₁ ∘ φ₂⁻¹ : ℝ × ℝ³ → ℝ × ℝ³

is a galilean transformation. The galilean coordinate systems φ₁ and φ₂ give M the same galilean structure.

C Motion, velocity, acceleration

A motion in ℝ^N is a differentiable mapping x: I → ℝ^N, where I is an interval on the real axis.

The derivative

ẋ(t₀) = dx/dt |_{t = t₀} = lim_{h → 0} (x(t₀ + h) − x(t₀))/h ∈ ℝ^N

is called the velocity vector at the point t₀ ∈ I.

The second derivative

ẍ(t₀) = d²x/dt² |_{t = t₀}

is called the acceleration vector at the point t₀.

We will assume that the functions we encounter are continuously differentiable as many times as necessary. In the future, unless otherwise stated, mappings, functions, etc. are understood to be differentiable mappings, functions, etc. The image of a mapping x: I → ℝ^N is called a trajectory or curve in ℝ^N.

Problem. Is it possible for the trajectory of a differentiable motion on the plane to have the shape drawn in Figure 3? Is it possible for the acceleration vector to have the value shown?

Figure 3

Trajectory of motion of a point

Answer. Yes. No.

We now define a mechanical system of n points moving in three-dimensional euclidean space.

Let x : ℝ → ℝ³ be a motion in ℝ³. The graph⁷ of this mapping is a curve in ℝ × ℝ³.

A curve in galilean space which appears in some (and therefore every) galilean coordinate system as the graph of a motion, is called a world line (Figure 4).

Figure 4

World lines

A motion of a system of n points gives, in galilean space, n world lines. In a galilean coordinate system they are described by n mappings x_i: ℝ → ℝ³, i = 1,..., n.

The direct product of n copies of ℝ³ is called the configuration space of the system of n points. Our n mappings x_i: ℝ → ℝ³ define one mapping

x: ℝ → ℝ^N, N = 3n,

of the time axis into the configuration space. Such a mapping is also called a motion of a system of n points in the galilean coordinate system on ℝ × ℝ³.

D Newton’s equations

According to Newton’s principle of determinacy (Section 1C) all motions of a system are uniquely determined by their initial positions (x(t₀) ∈ ℝ^N) and initial velocities (ẋ(t₀) ∈ ℝ^N).

In particular, the initial positions and velocities determine the acceleration. In other words, there is a function F: ℝ^N × ℝ^N × ℝ → ℝ^N such that

ẍ = F(x, ẋ, t). (1)

Newton used Equation (1) as the basis of mechanics. It is called Newton’s equation.

By the theorem of existence and uniqueness of solutions to ordinary differential equations, the function F and the initial conditions x(t₀) and ẋ(t₀) uniquely determine a motion.⁸
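The way F and the initial conditions single out one motion can be illustrated numerically. The sketch below integrates ẍ = F(x, ẋ, t) with the classical fourth-order Runge–Kutta scheme; the choice of scheme, the function name integrate_newton, and the test force F = −x are assumptions made for illustration, not part of the text.

```python
import math

def integrate_newton(F, x0, v0, t0, t1, steps):
    """Integrate x'' = F(x, x', t) from t0 to t1 with the classical RK4 scheme.

    x0 and v0 are lists of length N (initial position and velocity).
    Returns the lists (x, v) at time t1.
    """
    h = (t1 - t0) / steps
    x, v, t = list(x0), list(v0), t0

    def add(u, w, scale):
        return [ui + scale * wi for ui, wi in zip(u, w)]

    for _ in range(steps):
        # First-order form of the equation: x' = v, v' = F(x, v, t).
        k1x, k1v = v, F(x, v, t)
        k2x = add(v, k1v, h / 2)
        k2v = F(add(x, k1x, h / 2), k2x, t + h / 2)
        k3x = add(v, k2v, h / 2)
        k3v = F(add(x, k2x, h / 2), k3x, t + h / 2)
        k4x = add(v, k3v, h)
        k4v = F(add(x, k3x, h), k4x, t + h)
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1x, k2x, k3x, k4x)]
        v = [vi + h / 6 * (a + 2 * b + 2 * c + d)
             for vi, a, b, c, d in zip(v, k1v, k2v, k3v, k4v)]
        t += h
    return x, v

# Test case with a known solution: x'' = -x, x(0) = 1, x'(0) = 0 gives x = cos t.
F = lambda x, v, t: [-x[0]]
x1, v1 = integrate_newton(F, [1.0], [0.0], 0.0, 1.0, 1000)
position_error = abs(x1[0] - math.cos(1.0))
velocity_error = abs(v1[0] + math.sin(1.0))
```

Changing either initial value changes the computed motion, while repeating the call with the same data reproduces it, which is the numerical counterpart of the determinacy principle.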

For each specific mechanical system the form of the function F is determined experimentally. From the mathematical point of view the form of F for each system constitutes the definition of that system.

E Constraints imposed by the principle of relativity

Galileo’s principle of relativity states that in physical space-time there is a selected galilean structure (“the class of inertial coordinate systems”) having the following property.

If we subject the world lines of all the points of any mechanical system⁹ to one and the same galilean transformation, we obtain world lines of the same system (with new initial conditions) (Figure 5).

Figure 5

Galileo’s principle of relativity

This imposes a series of conditions on the form of the right-hand side of Newton’s equation written in an inertial coordinate system: Equation (1) must be invariant with respect to the group of galilean transformations.

Example 1. Among the galilean transformations are the time translations. Invariance with respect to time translations means that “the laws of nature remain constant,” i.e., if x = φ(t) is a solution to Equation (1), then for any s ∈ ℝ, x = φ(t + s) is also a solution.

From this it follows that the right-hand side of Equation (1) in an inertial coordinate system does not depend on the time:

ẍ = Φ(x, ẋ).

Remark. Differential equations in which the right-hand side does depend on time arise in the following situation.

Suppose that we are studying part I of the mechanical system I + II. Then the influence of part II on part I can sometimes be replaced by a time variation of parameters in the system of equations describing the motion of part I. For example, the influence of the moon on the earth can be ignored in investigating the majority of phenomena on the earth. However, in the study of the tides this influence must be taken into account; one can achieve this by introducing, instead of the attraction of the moon, periodic changes in the strength of gravity on earth.

Equations with variable coefficients can appear also as the result of formal operations in the solution of problems.

Example 2. Translations in three-dimensional space are galilean transformations. Invariance with respect to such translations means that space is homogeneous, or “has the same properties at all of its points.” That is, if x_i = φ_i(t)(i = 1,..., n) is a motion of a system of n points satisfying (1), then for any r ∈ ℝ³ the motion φ_i(t) + r (i = 1,..., n) also satisfies Equation (1).

From this it follows that the right-hand side of Equation (1) in the inertial coordinate system can depend only on the “relative coordinates” x_j − x_k.

From invariance under passage to a uniformly moving coordinate system (which does not change ẍ_i or x_j − x_k, but adds to each ẋ_j a fixed vector v) it follows that the right-hand side of Equation (1) in an inertial system of coordinates can depend only on the relative velocities:

ẍ_i = f_i({x_j − x_k, ẋ_j − ẋ_k}), i, j, k = 1,..., n.

Example 3. Among the galilean transformations are the rotations in three-dimensional space. Invariance with respect to these rotations means that space is isotropic; there are no preferred directions.

Thus, if φ_i: ℝ → ℝ³ (i = 1,..., n) is a motion of a system of points satisfying (1), and G: ℝ³ → ℝ³ is an orthogonal transformation, then the motion Gφ_i: ℝ → ℝ³ (i = 1,..., n) also satisfies (1). In other words,

F(Gx, Gẋ) = GF(x, ẋ),

where Gx denotes (Gx₁,...,Gx_n), x_i ∈ ℝ³.
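The constraints of Examples 2 and 3 can be checked on a concrete right-hand side. The sketch below assumes, purely for illustration, n points of unit mass attracting one another by an inverse-square law (a choice of F not made in this section) and tests invariance under a spatial translation, a change to a uniformly moving system, and a rotation.

```python
import math

def accelerations(xs, vs):
    """Right-hand side F(x, x') for unit-mass points with inverse-square attraction.

    vs is accepted only to mirror the general signature; this force ignores it.
    """
    out = []
    for i, xi in enumerate(xs):
        acc = [0.0, 0.0, 0.0]
        for j, xj in enumerate(xs):
            if i == j:
                continue
            d = [b - a for a, b in zip(xi, xj)]          # relative coordinate x_j - x_i
            r = math.sqrt(sum(c * c for c in d))
            acc = [a + c / r ** 3 for a, c in zip(acc, d)]
        out.append(acc)
    return out

def rotate(p, angle):
    """Apply an orthogonal map G: rotation by `angle` about the third axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

xs = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.3, 2.0, -1.0]]
vs = [[0.1, 0.0, 0.0], [0.0, -0.2, 0.0], [0.0, 0.0, 0.3]]
base = accelerations(xs, vs)

# Translation x_i -> x_i + r leaves the accelerations unchanged.
r = [5.0, -3.0, 2.0]
shifted = accelerations([[a + b for a, b in zip(p, r)] for p in xs], vs)

# Adding a fixed vector v to every velocity leaves them unchanged as well.
v = [1.0, 2.0, 3.0]
boosted = accelerations(xs, [[a + b for a, b in zip(w, v)] for w in vs])

# Rotation: F(Gx, Gx') should equal G F(x, x').
angle = 0.9
rotated = accelerations([rotate(p, angle) for p in xs],
                        [rotate(w, angle) for w in vs])
expected = [rotate(a, angle) for a in base]

translation_error = max_abs_diff(shifted, base)
boost_error = max_abs_diff(boosted, base)
rotation_error = max_abs_diff(rotated, expected)
```

For this particular force the velocity check is trivial, since the force does not depend on velocities at all; a velocity-dependent force would pass it only if it involved relative velocities.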

Problem. Show that if a mechanical system consists of only one point, then its acceleration in an inertial coordinate system is equal to zero (“Newton’s first law”).

Hint. By Examples 1 and 2 the acceleration vector does not depend on x, ẋ, or t, and by Example 3 the vector F is invariant with respect to rotation.

Problem. A mechanical system consists of two points. At the initial moment their velocities (in some inertial coordinate system) are equal to zero. Show that the points will stay on the line which connected them at the initial moment.

Problem. A mechanical system consists of three points. At the initial moment their velocities (in some inertial coordinate system) are equal to zero. Show that the points always remain in the plane which contained them at the initial moment.

Problem. A mechanical system consists of two points. Show that for any initial conditions there exists an inertial coordinate system in which the two points remain in a fixed plane.

Problem. Show that mechanics “through the looking glass” is identical to ours.

Hint. In the galilean group there is a reflection transformation, changing the orientation of ℝ³.

Problem. Is the class of inertial systems unique?

Answer. No. Other classes can be obtained if one changes the units of length and time or the direction of time.

3 Examples of mechanical systems

We have already remarked that the form of the function F in Newton’s equation (1) is determined experimentally for each mechanical system. Here are several examples.

In examining concrete systems it is reasonable not to include all the objects of the universe in a system. For example, in studying the majority of phenomena taking place on the earth we can ignore the influence of the moon. Furthermore, it is usually possible to disregard the effect of the processes we are studying on the motion of the earth itself; we may even consider a coordinate system attached to the earth as “fixed.” It is clear that the principle of relativity no longer imposes the constraints found in Section 2 for equations of motion written in such a coordinate system. For example, near the earth there is a distinguished direction, the vertical.

A Example 1: A stone falling to the earth

Experiments show that

ẍ = −g, where g ≈ 9.8 m/sec² (Galileo),* (2)

where x is the height of a stone above the surface of the earth.

If we introduce the “potential energy” U = gx, then Equation (2) can be written in the form

ẍ = −dU/dx.

If U : E^N → ℝ is a differentiable function on euclidean space, then we will denote by ∂U/∂x the gradient of the function U. If E^N = E^{n₁} × ⋯ × E^{n_k} is a direct product of euclidean spaces, then we will denote a point x ∈ E^N by (x₁,..., x_k), and the vector ∂U/∂x by (∂U/∂x₁,..., ∂U/∂x_k). In particular, if x₁,..., x_N are cartesian coordinates in E^N, then the components of the vector ∂U/∂x are the partial derivatives ∂U/∂x₁,..., ∂U/∂x_N.

Experiments show that the radius vector of the stone with respect to some point 0 on the earth satisfies the equation

ẍ = −∂U/∂x, where U = −(g, x). (3)

The vector in the right-hand side is directed towards the earth. It is called the gravitational acceleration vector g. (Figure 6.)

Figure 6

A stone falling to the earth

B Example 2: Falling from great height

Like all experimental facts, the law of motion (2) has a restricted domain of application. According to a more precise law of falling bodies, discovered by Newton, acceleration is inversely proportional to the square of the distance from the center of the earth:

ẍ = −g r₀²/r²,

where r = r₀ + x (Figure 7).

Figure 7

The earth’s gravitational field

This equation can also be written in the form (3), if we introduce the potential energy

U = −k/r, k = g r₀²,

inversely proportional to the distance to the center of the earth.

Problem. Determine with what velocity a stone must be thrown in order that it fly infinitely far from the surface of the earth.¹⁰

Answer. ≥ 11.2 km/sec.
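The quoted figure can be reproduced from conservation of the quantity ẋ²/2 − g r₀²/r along the motion: the stone reaches arbitrarily large r only if this quantity is nonnegative at the surface r = r₀, that is, if v² ≥ 2g r₀. The short sketch below evaluates this bound; the radius r₀ ≈ 6.37 × 10⁶ m is an assumed value for the earth's mean radius, which the text does not state.

```python
import math

g = 9.8        # m/sec^2, the value quoted in Example 1
r0 = 6.37e6    # m, assumed mean radius of the earth (not given in the text)

def escape_velocity(g, r0):
    """Smallest launch speed at r = r0 for which x'^2 / 2 - g r0^2 / r stays >= 0."""
    return math.sqrt(2.0 * g * r0)

v2_km_per_sec = escape_velocity(g, r0) / 1000.0
```

With these inputs the result rounds to the 11.2 km/sec given in the answer.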

C Example 3: Motion of a weight along a line under the action of a spring

Experiments show that under small extensions of the spring the equation of motion of the weight will be (Figure 8)

ẍ = −α²x.

Figure 8

Weight on a spring

This equation can also be written in the form (3) if we introduce the potential energy

U = α²x²/2.

If we replace our one weight by two weights, then it turns out that, under the same extension of the spring, the acceleration is half as large.

It is experimentally established that for any two bodies the ratio of the accelerations ẍ₁/ẍ₂ under the same extension of a spring is fixed (does not depend on the extent of extension of the spring or on its characteristics, but only on the bodies themselves). The value inverse to this ratio is by definition the ratio of masses:

m₁/m₂ = ẍ₂/ẍ₁.

For a unit of mass we take the mass of some fixed body, e.g., one liter of water. We know by experience that the masses of all bodies are positive. The product of mass times acceleration mẍ does not depend on the body, and is a characteristic of the extension of the spring. This value is called the force of the spring acting on the body.

As a unit of force, we take the “newton.” If one liter of water is suspended on a spring at the surface of the earth, the spring acts with a force of 9.8 newtons (= 1 kilogram-force).

D Example 4: Conservative systems

Let E³ⁿ = E³ × ⋯ × E³ be the configuration space of a system of n points in the euclidean space E³. Let U : E³ⁿ → ℝ be a differentiable function and let m₁,..., m_n be positive numbers.

Definition. The motion of n points, of masses m₁,..., m_n, in the potential field with potential energy U is given by the system of differential equations

m_i ẍ_i = −∂U/∂x_i, i = 1,..., n. (4)

The equations of motion in Examples 1 to 3 have this form. The equations of motion of many other mechanical systems can be written in the same form. For example, the three-body problem of celestial mechanics is problem (4) in which

U = −m₁m₂/‖x₁ − x₂‖ − m₂m₃/‖x₂ − x₃‖ − m₃m₁/‖x₃ − x₁‖.

Many different equations of entirely different origin can be reduced to form (4), for example the equations of electrical oscillations. In the following chapter we will study mainly systems of differential equations in the form (4).
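To make the form (4) concrete for the three-body problem, the following sketch takes the potential energy to be minus the sum of m_i m_j/‖x_i − x_j‖ over the three pairs (units in which the gravitational constant is 1), writes out the forces −∂U/∂x_i by hand, and compares them with a central finite-difference gradient. The function names and the sample masses and positions are illustrative assumptions.

```python
import math

def potential(xs, ms):
    """U = - sum over pairs i < j of m_i m_j / |x_i - x_j|."""
    total = 0.0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            total -= ms[i] * ms[j] / math.dist(xs[i], xs[j])
    return total

def force(xs, ms, i):
    """Analytic right-hand side of (4) for point i: -dU/dx_i."""
    f = [0.0, 0.0, 0.0]
    for j in range(len(xs)):
        if j != i:
            r = math.dist(xs[i], xs[j])
            for k in range(3):
                f[k] += ms[i] * ms[j] * (xs[j][k] - xs[i][k]) / r ** 3
    return f

def numeric_force(xs, ms, i, h=1e-6):
    """Central-difference approximation of -dU/dx_i, used only as a cross-check."""
    f = []
    for k in range(3):
        plus = [list(p) for p in xs]
        minus = [list(p) for p in xs]
        plus[i][k] += h
        minus[i][k] -= h
        f.append(-(potential(plus, ms) - potential(minus, ms)) / (2 * h))
    return f

ms = [1.0, 2.0, 0.5]
xs = [[0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [-0.4, 1.2, 0.8]]

gradient_error = max(
    abs(a - b)
    for i in range(3)
    for a, b in zip(force(xs, ms, i), numeric_force(xs, ms, i))
)
```

Dividing force(xs, ms, i) by m_i gives the acceleration ẍ_i that would be fed to a numerical integrator.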

*

In this and other sections, the mass of a particle is taken to be 1.

²

All these “experimental facts” are only approximately true and can be refuted by more exact experiments. In order to avoid cumbersome expressions, we will not specify this from now on and we will speak of our mathematical models as if they exactly described physical phenomena.

³

The reader who has no need for the mathematical formulation of the assertions of Section 1 can omit this section.

⁴

Formerly, the universe was provided not with an affine, but with a linear structure (the geocentric system of the universe).

⁵

Recall that the direct product of two sets A and B is the set of ordered pairs (a, b), where a ∈ A and b ∈ B. The direct product of two spaces (vector, affine, euclidean) has the structure of a space of the same type.

⁶

That is, there is a one-to-one mapping of one to the other preserving the galilean structure.

⁷

The graph of a mapping f: A → B is the subset of the direct product A × B consisting of all pairs (a, f(a)) with a ∈ A.

⁸

Under certain smoothness conditions, which we assume to be fulfilled. In general, a motion is determined by Equation (1) only on some interval of the time axis. For simplicity we will assume that this interval is the whole time axis, as is the case in most problems in mechanics.

⁹

In formulating the principle of relativity we must keep in mind that it is relevant only to closed physical (in particular, mechanical) systems, i.e., that we must include in the system all bodies whose interactions play a role in the study of the given phenomena. Strictly speaking, we should include in the system all bodies in the universe. But we know from experience that one can disregard the effect of many of them: for example, in studying the motion of planets around the sun we can disregard the attractions among the stars, etc.

On the other hand, in the study of a body in the vicinity of earth, the system is not closed if the earth is not included; in the study of the motion of an airplane the system is not closed if it does not include the air surrounding the airplane, etc. In the future, the term “mechanical system” will mean a closed system in most cases, and when there is a non-closed system in question this will be explicitly stated (cf., for example, Section 3).

¹⁰

This is the so-called second cosmic velocity v₂. Our equation does not take into account the attraction of the sun. The attraction of the sun will not let the stone escape from the solar system if the velocity of the stone with respect to the earth is less than 16.6 km/sec.

2
Investigation of the equations of motion


In most cases (for example, in the three-body problem) we can neither solve the system of differential equations nor completely describe the behavior of the solutions. In this chapter we consider a few simple but important problems for which Newton’s equations can be solved.

4 Systems with one degree of freedom

In this paragraph we study the phase flow of the differential equation (1). A look at the graph of the potential energy is enough for a qualitative analysis of such an equation. In addition, Equation (1) is integrated by quadratures.

A Definitions

A system with one degree of freedom is a system described by one differential equation

ẍ = f(x), x ∈ ℝ. (1)

The kinetic energy is the quadratic form*

T = ½ẋ².

The potential energy is the function

U(x) = −∫_{x₀}^{x} f(ξ) dξ.

The sign in this formula is taken so that the potential energy of a stone is larger if the stone is higher off the ground.

Notice that the potential energy determines f. Therefore, to specify a system of the form (1) it is enough to give the potential energy. Adding a constant to the potential energy does not change the equation of motion (1).

The total energy is the sum

E = T + U.

In general, the total energy is a function, E(x, ẋ) = ẋ²/2 + U(x), of x and ẋ.

Theorem (The law of conservation of energy). The total energy of points moving according to the equation (1) is conserved: E(x(t), ẋ(t)) is independent of t.

Proof.

d/dt (T + U) = ẋẍ + (dU/dx)ẋ = ẋ(ẍ − f(x)) = 0.

□
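The conservation law can also be observed numerically. The sketch below integrates the pendulum equation ẍ = −sin x, for which U = −cos x, and records how far the computed total energy ẋ²/2 + U(x) drifts from its initial value. The velocity-Verlet scheme, the step size, and the initial data are assumptions chosen for illustration; a numerical scheme conserves energy only approximately, so a small nonzero drift is expected.

```python
import math

def f(x):
    """Force for the pendulum equation x'' = -sin x."""
    return -math.sin(x)

def U(x):
    """Potential energy with f = -dU/dx."""
    return -math.cos(x)

def total_energy(x, v):
    return 0.5 * v * v + U(x)

def verlet(x, v, h, steps):
    """Advance x'' = f(x) by `steps` velocity-Verlet steps of size h."""
    a = f(x)
    for _ in range(steps):
        x += h * v + 0.5 * h * h * a
        a_new = f(x)
        v += 0.5 * h * (a + a_new)
        a = a_new
    return x, v

x0, v0 = 1.0, 0.0
E0 = total_energy(x0, v0)
x1, v1 = verlet(x0, v0, 1e-3, 20000)   # integrate up to t = 20
drift = abs(total_energy(x1, v1) - E0)
```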

B Phase flow

Equation (1) is equivalent to the system of two equations:

ẋ = y, ẏ = f(x). (2)

We consider the plane with coordinates x and y, which we call the phase plane of Equation (1). The points of the phase plane are called phase points. The right-hand side of (2) determines a vector field on the phase plane, called the phase velocity vector field.

A solution of (2) is a motion φ: ℝ → ℝ² of a phase point in the phase plane, such that the velocity of the moving point at each moment of time is equal to the phase velocity vector at the location of the phase point at that moment.¹¹

The image of φ is called the phase curve. Thus the phase curve is given by the parametric equations

x = φ(t), y = φ̇(t).

Problem. Show that through every phase point there is one and only one phase curve.

Hint. Refer to a textbook on ordinary differential equations.

We notice that a phase curve could consist of only one point. Such a point is called an equilibrium position. The vector of phase velocity at an equilibrium position is zero.

The law of conservation of energy allows one to find the phase curves easily. On each phase curve the value of the total energy is constant. Therefore, each phase curve lies entirely in one energy level set E(x, y) = h.

C Examples

Example 1. The basic equation of the theory of oscillations is

ẍ = −x.

In this case (Figure 9) we have:

T = ½ẋ², U = ½x², E = ½ẋ² + ½x².

The energy level sets are the concentric circles and the origin. The phase velocity vector at the phase point (x, y) has components (y, − x). It is perpendicular to the radius vector and equal to it in magnitude. Therefore, the motion of the phase point in the phase plane is a uniform motion around 0: x = r₀ cos(φ₀ − t), y = r₀ sin(φ₀ − t). Each energy level set is a phase curve.

Figure 9

Phase plane of the equation ẍ = −x

Example 2. Suppose that a potential energy is given by the graph in Figure 10. We will draw the energy level sets ½y² + U(x) = E. For this, the following facts are helpful.

Figure 10

Potential energy and phase curves

Any equilibrium position of (2) must lie on the x axis of the phase plane. The point x = ξ, y = 0 is an equilibrium position if ξ is a critical point of the potential energy, i.e., if (∂U/∂x) | _{x = ξ} = 0.

Each level set is a smooth curve in a neighborhood of each of its points which is not an equilibrium position (this follows from the implicit function theorem). In particular, if the number E is not a critical value of the potential energy (i.e., is not the value of the potential energy at one of its critical points), then the level set on which the energy is equal to E is a smooth curve.

It follows that in order to study the energy level curve, we should turn our attention to the critical and near-critical values of E. It is convenient here to imagine a little ball rolling in the potential well U.

For example, consider the following argument: “Kinetic energy is nonnegative. This means that potential energy is less than or equal to the total energy. The smaller the potential energy, the greater the velocity.” This translates to: “The ball cannot jump out of the potential well, rising higher than the level determined by its initial energy. As it falls into the well, the ball gains velocity.” We also notice that the local maximum points of the potential energy are unstable, but the minimum points are stable equilibrium positions.

Problem. Prove this.

Problem. How many phase curves make up the separatrix (figure eight) curve, corresponding to the level E₂?

Answer. Three.

Problem. Determine the duration of motion along the separatrix.

Answer. It follows from the uniqueness theorem that the time is infinite.

Problem. Show that the time it takes to go from x₁ to x₂ (in one direction) is equal to

t₂ − t₁ = ∫_{x₁}^{x₂} dx / √(2(E − U(x))).
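The travel-time integral ∫ dx/√(2(E − U(x))), taken from x₁ to x₂, lends itself to a direct numerical check. The sketch below evaluates it with Simpson's rule for the oscillator U = x²/2 at energy E = 1/2, where the motion x = sin t is known explicitly, so the passage from x = 0 to x = 1/2 should take arcsin(1/2) = π/6. The function name travel_time and the choice of quadrature rule are illustrative assumptions, and the interval is kept away from the turning points so that the integrand stays smooth.

```python
import math

def travel_time(U, E, x1, x2, n=1000):
    """Simpson approximation of the integral of dx / sqrt(2 (E - U(x))) over [x1, x2].

    Assumes E > U(x) on the whole closed interval; n must be even.
    """
    h = (x2 - x1) / n

    def integrand(x):
        return 1.0 / math.sqrt(2.0 * (E - U(x)))

    total = integrand(x1) + integrand(x2)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(x1 + k * h)
    return total * h / 3.0

t = travel_time(lambda x: x * x / 2.0, 0.5, 0.0, 0.5)
error = abs(t - math.pi / 6.0)
```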

Problem. Draw the phase curves, given the potential energy graphs in Figure 11.

Figure 11

Potential energy

Answer. Figure 12.

Figure 12

Phase curves

Problem. Draw the phase curves for the “equation of an ideal planar pendulum”: ẍ = −sin x.

Problem. Draw the phase curves for the “equation of a pendulum on a rotating axis”: ẍ = −sin x + M.

Remark. In these two problems x denotes the angle of displacement of the pendulum. The phase points whose coordinates differ by 2π correspond to the same position of the pendulum. Therefore, in addition to the phase plane, it is natural to look at the phase cylinder {x(mod 2π), y}.

Problem. Find the tangent lines to the branches of the critical level corresponding to maximal potential energy E = U(ξ) (Figure 13).

Figure 13

Critical energy level lines

Answer. y = ±√(−U″(ξ)) (x − ξ).

Problem. Let S(E) be the area enclosed by the closed phase curve corresponding to the energy level E. Show that the period of motion along this curve is equal to

T = dS/dE.

Problem. Let E₀ be the value of the potential function at a minimum point ξ. Find the period T₀ = lim_{E → E₀} T(E) of small oscillations in a neighborhood of the point ξ.

Answer. T₀ = 2π/√(U″(ξ)).
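The small-oscillation limit T₀ = 2π/√(U″(ξ)) can be illustrated on the pendulum ẍ = −sin x, where U = −cos x, ξ = 0 and U″(0) = 1, so the predicted limit is 2π. The sketch below uses the standard substitution sin(x/2) = k sin θ with k = sin(a/2), which turns the period of oscillations of amplitude a into 4 times the integral of dθ/√(1 − k² sin² θ) over [0, π/2], and evaluates that integral by Simpson's rule. The function name and the sample amplitudes are assumptions for illustration.

```python
import math

def pendulum_period(amplitude, n=2000):
    """Period of x'' = -sin x for oscillations with 0 < amplitude < pi.

    Evaluates 4 * integral_0^{pi/2} d(theta) / sqrt(1 - k^2 sin^2(theta)),
    k = sin(amplitude / 2), by Simpson's rule with n (even) subintervals.
    """
    k = math.sin(amplitude / 2.0)
    h = (math.pi / 2.0) / n

    def integrand(theta):
        return 1.0 / math.sqrt(1.0 - (k * math.sin(theta)) ** 2)

    total = integrand(0.0) + integrand(math.pi / 2.0)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * integrand(j * h)
    return 4.0 * total * h / 3.0

T0 = 2.0 * math.pi                 # 2*pi / sqrt(U''(0)) for U = -cos x
T_small = pendulum_period(0.01)    # close to T0
T_large = pendulum_period(2.0)     # noticeably longer
```

The period grows with the amplitude and approaches T₀ only as the energy level descends to the minimum of the potential.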

Problem. Consider a periodic motion along the closed phase curve corresponding to the energy level E. Is it stable in the sense of Liapunov?¹²

Answer. No.¹³

D Phase flow

Let M be a point in the phase plane. We look at the solution to system (2) whose initial conditions at t = 0 are represented by the point M. We assume that any solution of the system can be extended to the whole time axis. The value of our solution at any value of t depends on M. We denote the resulting phase point (Figure 14) by

M(t) = g^t M.

Figure 14

Phase flow

In this way we have defined a mapping of the phase plane to itself, g^t: ℝ² → ℝ². By theorems in the theory of ordinary differential equations, the mapping g^t is a diffeomorphism (a one-to-one differentiable mapping with a differentiable inverse). The diffeomorphisms g^t, t ∈ ℝ, form a group: g^{t + s} = g^t ∘ g^s. The mapping g⁰ is the identity (g⁰ M = M), and g^{− t} is the inverse of g^t. The mapping g: ℝ × ℝ² → ℝ², defined by g(t, M) = g^t M is differentiable. All these properties together are expressed by saying that the transformations g^t form a one-parameter group of diffeomorphisms of the phase plane. This group is also called the phase flow, given by system (2) (or Equation (1)).

Example. The phase flow given by the equation

ẍ = −x

is the group g^t of rotations of the phase plane through angle t around the origin.
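For the oscillator ẍ = −x the phase flow can be written in closed form, which makes the group properties easy to test. In the sketch below the map g(t, M) sends the phase point M = (x, y) to (x cos t + y sin t, −x sin t + y cos t), obtained from the solution x(t) = x cos t + y sin t and its derivative; in the (x, y) plane this is a clockwise rotation through angle t. The sample point and times are arbitrary.

```python
import math

def g(t, point):
    """Phase flow of x'' = -x: the phase point reached at time t from `point`."""
    x, y = point
    return (x * math.cos(t) + y * math.sin(t),
            -x * math.sin(t) + y * math.cos(t))

def gap(p, q):
    """Largest coordinate difference between two phase points."""
    return max(abs(a - b) for a, b in zip(p, q))

M = (0.6, -1.1)
t, s = 0.8, 2.3

group_error = gap(g(t + s, M), g(t, g(s, M)))    # g^{t+s} = g^t o g^s
identity_error = gap(g(0.0, M), M)               # g^0 is the identity
inverse_error = gap(g(-t, g(t, M)), M)           # g^{-t} inverts g^t
radius_error = abs(math.hypot(*g(t, M)) - math.hypot(*M))  # energy circles are preserved
```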

Problem. Show that the system with potential energy U = −x⁴ does not define a phase flow.

Problem. Show that if the potential energy is positive, then there is a phase flow.

Hint. Use the law of conservation of energy to show that a solution can be extended without bound.

Problem. Draw the image of the circle

x² + (y − 1)² < ¼

under the action of a transformation of the phase flow for the equations (a) of the “inverse pendulum,” ẍ = x, and (b) of the “nonlinear pendulum,” ẍ = −sin x.

Answer. Figure 15.

Figure 15

Action of the phase flow on a circle

5 Systems with two degrees of freedom

Analyzing a general potential system with two degrees of freedom is beyond the capability of modern science. In this paragraph we look at the simplest examples.

A Definitions

By a system with two degrees of freedom we will mean a system defined by the differential equations

ẍ = f(x), x ∈ E², (1)

where f is a vector field on the plane.

A system is said to be conservative if there exists a function U: E² → ℝ such that f = −∂U/∂x. The equation of motion of a conservative system then has the form¹⁴

ẍ = −∂U/∂x.

B The law of conservation of energy

Theorem. The total energy of a conservative system is conserved, i.e.,

E = ½ẋ² + U(x) = const, where ẋ² = (ẋ, ẋ).

Proof.

dE/dt = (ẋ, ẍ) + (∂U/∂x, ẋ) = (ẍ + ∂U/∂x, ẋ) = 0

by the equation of motion.

Corollary. If at the initial moment the total energy is equal to E, then all trajectories lie in the region where U(x) ≤ E, i.e., a point remains inside the potential well U(x₁, x₂) ≤ E for all time.

Remark. In a system with one degree of freedom it is always possible to introduce the potential energy

U(x) = −∫_{x₀}^{x} f(ξ) dξ.

For a system with two degrees of freedom this is not so.

Problem. Find an example of a system of the form ẍ = f(x), x ∈ E², which is not conservative.

C Phase space

The equation of motion (1) can be written as the system:

ẋ₁ = y₁, ẋ₂ = y₂, ẏ₁ = −∂U/∂x₁, ẏ₂ = −∂U/∂x₂. (2)

The phase space of a system with two degrees of freedom is the four-dimensional space with coordinates x₁, x₂, y₁, and y₂.

The system (2) defines the phase velocity vector field in four space as well as¹⁵ the phase flow of the system (a one-parameter group of diffeomorphisms of four-dimensional phase space). The phase curves of (2) are subsets of four-dimensional phase space. All of phase space is partitioned into phase curves. Projecting the phase curves from four space to the x₁, x₂ plane gives the trajectories of our moving point in the x₁, x₂ plane. These trajectories are also called orbits. Orbits can have points of intersection even when the phase curves do not intersect one another. The equation of the law of conservation of energy

E = ẋ²/2 + U(x) = (y₁² + y₂²)/2 + U(x₁, x₂)

defines a three-dimensional hypersurface in four space: E(x₁, x₂, y₁, y₂) = E₀; this surface, π_E₀, remains invariant under the phase flow: g^tπ_E₀ = π_E₀. One could say that the phase flow flows along the energy level hypersurfaces. The phase velocity vector field is tangent at every point to π_E₀. Therefore, π_E₀ is entirely composed of phase curves (Figure 16).

Figure 16

Energy level surface and phase curves

Example 1 (“small oscillations of a spherical pendulum”). Let U = ½(x₁² + x₂²). The level sets of the potential energy in the x₁, x₂ plane will be concentric circles (Figure 17).

Figure 17

Potential energy level curves for a spherical pendulum

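The equations of motion,

ẍ₁ = −x₁, ẍ₂ = −x₂,

are equivalent to the system

ẋ₁ = y₁, ẋ₂ = y₂, ẏ₁ = −x₁, ẏ₂ = −x₂.

This system decomposes into two independent ones; in other words, each of the coordinates x₁ and x₂ changes with time in the same way as in a system with one degree of freedom.

A solution has the form

x₁ = c₁ cos t + c₂ sin t, x₂ = c₃ cos t + c₄ sin t,
y₁ = −c₁ sin t + c₂ cos t, y₂ = −c₃ sin t + c₄ cos t.

It follows from the law of conservation of energy that

E = ½(y₁² + y₂²) + ½(x₁² + x₂²) = const,

i.e., the level surface π_E₀ is a sphere in four space.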

Problem. Show that the phase curves are great circles of this sphere. (A great circle is the intersection of a sphere with a two-dimensional plane passing through its center.)

Problem. Show that the set of phase curves on the surface π_E₀ forms a two-dimensional sphere. The formula w = (x₁ + iy₁)/(x₂ + iy₂) gives the “Hopf map” from the three sphere π_E₀ to the two sphere (the complex w-plane completed by the point at infinity). Our phase curves are the pre-images of points under the Hopf map.
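The statement about the Hopf map can be probed numerically without settling the problem itself. The sketch below follows one phase curve of the system ẋ₁ = y₁, ẋ₂ = y₂, ẏ₁ = −x₁, ẏ₂ = −x₂ using its explicit solution and evaluates w = (x₁ + iy₁)/(x₂ + iy₂) at several times; the coordinate ordering (x₁, x₂, y₁, y₂), the function names, and the sample point are assumptions for illustration. The starting point is chosen with x₂ + iy₂ ≠ 0, so that w is finite.

```python
import math

def flow(t, p):
    """Explicit solution at time t starting from p = (x1, x2, y1, y2)."""
    x1, x2, y1, y2 = p
    c, s = math.cos(t), math.sin(t)
    return (x1 * c + y1 * s, x2 * c + y2 * s,
            -x1 * s + y1 * c, -x2 * s + y2 * c)

def hopf(p):
    """w = (x1 + i y1) / (x2 + i y2); requires x2 + i y2 != 0."""
    x1, x2, y1, y2 = p
    return complex(x1, y1) / complex(x2, y2)

def energy(p):
    x1, x2, y1, y2 = p
    return 0.5 * (y1 * y1 + y2 * y2) + 0.5 * (x1 * x1 + x2 * x2)

p0 = (1.0, 0.5, -0.3, 0.8)
times = [0.1 * k for k in range(1, 64)]

w_variation = max(abs(hopf(flow(t, p0)) - hopf(p0)) for t in times)
energy_variation = max(abs(energy(flow(t, p0)) - energy(p0)) for t in times)
```

Both variations vanish up to rounding: along the flow each of x₁ + iy₁ and x₂ + iy₂ is multiplied by the same factor cos t − i sin t, which cancels in the quotient.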

Problem. Find the projection of the phase curves on the x₁, x₂ plane (i.e., draw the orbits of the motion of a point).

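Example 2 (“Lissajous figures”). We look at one more example of a planar motion (“small oscillations with two degrees of freedom”):

$$\ddot x_1 = -x_1,\qquad \ddot x_2 = -\omega^2 x_2.$$

The potential energy is

$$U = \tfrac{1}{2}x_1^2 + \tfrac{1}{2}\omega^2 x_2^2.$$

From the law of conservation of energy it follows that, if at the initial moment of time the total energy is

$$E = \tfrac{1}{2}(\dot x_1^2 + \dot x_2^2) + U(x_1, x_2),$$

then all motions will take place inside the ellipse U(x₁, x₂) ≤ E.

Our system consists of two independent one-dimensional systems. Therefore, the law of conservation of energy is satisfied for each of them separately, i.e., the following quantities are preserved:

$$E_1 = \tfrac{1}{2}\dot x_1^2 + \tfrac{1}{2}x_1^2,\qquad E_2 = \tfrac{1}{2}\dot x_2^2 + \tfrac{1}{2}\omega^2 x_2^2\qquad (E = E_1 + E_2).$$

Consequently, the variable x₁ is bounded by the region $|x_1| \le A_1$, $A_1 = \sqrt{2E_1(0)}$, and x₂ oscillates within the region |x₂| ≤ A₂, $A_2 = \sqrt{2E_2(0)}/\omega$. The intersection of these two regions defines a rectangle which contains the orbits (Figure 18).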

Figure 18

The regions U ≤ E, U₁ ≤ E and U₂ ≤ E

Problem. Show that this rectangle is inscribed in the ellipse U ≤ E.

The general solution of our equations is x₁ = A₁ sin(t + φ₁), x₂ = A₂ sin(ωt + φ₂); a moving point independently performs an oscillation with frequency 1 and amplitude A₁ along the horizontal and an oscillation with frequency ω and amplitude A₂ along the vertical.
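The general solution can be sampled numerically to see the bounds discussed above. The following is a minimal sketch, not part of the original text; the function names and the particular values of A₁, A₂, ω, φ₁, φ₂ are illustrative assumptions. It checks that the sampled orbit stays inside the rectangle |x₁| ≤ A₁, |x₂| ≤ A₂ and inside the ellipse U ≤ E.

```python
import math

def lissajous(a1, a2, omega, phi1, phi2, n=2000, t_max=40.0):
    """Sample x1 = A1 sin(t + phi1), x2 = A2 sin(omega t + phi2) on [0, t_max]."""
    points = []
    for i in range(n + 1):
        t = t_max * i / n
        points.append((a1 * math.sin(t + phi1), a2 * math.sin(omega * t + phi2)))
    return points

def potential(x1, x2, omega):
    """U = x1^2/2 + omega^2 x2^2/2, the potential energy of Example 2."""
    return 0.5 * x1**2 + 0.5 * omega**2 * x2**2

# Illustrative (assumed) parameter values.
a1, a2, omega = 1.0, 0.7, 2.0
total_energy = 0.5 * a1**2 + 0.5 * omega**2 * a2**2   # E = E1 + E2

orbit = lissajous(a1, a2, omega, phi1=0.3, phi2=1.1)
in_rectangle = all(abs(x1) <= a1 + 1e-12 and abs(x2) <= a2 + 1e-12 for x1, x2 in orbit)
in_ellipse = all(potential(x1, x2, omega) <= total_energy + 1e-12 for x1, x2 in orbit)
```

Plotting `orbit` for different values of `omega` and of the phase difference reproduces the kinds of curves described in the rest of this example.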

Consider the following method of describing an orbit in the x₁, x₂ plane. We look at a cylinder whose base has diameter 2A₁ and a band of width 2A₂. We draw on the band a sine wave with period 2πA₁/ω and amplitude A₂ and wind the band onto the cylinder (Figure 19). The orthogonal projection of the sinusoid wound around the cylinder onto the x₁, x₂ plane gives the desired orbit, called a Lissajous figure.

Figure 19

Construction of a Lissajous figure

Lissajous figures can conveniently be seen on an oscilloscope which displays independent harmonic oscillations on the horizontal and vertical axes.

The form of a Lissajous figure depends very strongly on the frequency ω. If ω = 1 (the spherical pendulum of Example 1), then the curve on the cylinder is an ellipse. The projection of this ellipse onto the x₁, x₂ plane depends on the difference φ₂ − φ₁ between the phases. For φ₁ = φ₂ we get a segment of the diagonal of the rectangle; for small φ₂ − φ₁ we get an ellipse close to the diagonal and inscribed in the rectangle. For φ₂ − φ₁ = π/2 we get an ellipse with principal axes x₁, x₂; as φ₂ − φ₁ increases from π/2 to π the ellipse collapses onto the second diagonal; as φ₂ − φ₁ increases further the whole process is repeated from the beginning (Figure 20).

Figure 20

Series of Lissajous figures with ω = 1

Now let the frequencies be only approximately equal: ω ≈ 1. The segment of the curve corresponding to 0 ≤ t ≤ 2π is very close to an ellipse. The next loop also reminds one of an ellipse, but here the phase shift φ₂ − φ₁ is greater than in the original by 2π(ω − 1). Therefore, the Lissajous curve with ω ≈ 1 is a distorted ellipse, slowly progressing through all phases from collapsed onto one diagonal to collapsed onto the other (Figure 21).

Figure 21

Lissajous figure with ω ≈ 1

If one of the frequencies is twice the other (ω = 2), then for some particular phase shift the Lissajous figure becomes a doubly traversed arc (Figure 22).

Figure 22

Lissajous figure with ω = 2

Problem. Show that this curve is a parabola. By increasing the phase shift φ₂ − φ₁ we get in turn the curves in Figure 23.

Figure 23

Series of Lissajous figures with ω = 2

In general, if one of the frequencies is n times bigger than the other (ω = n), then among the graphs of the corresponding Lissajous figures there is the graph of a polynomial of degree n (Figure 24); this polynomial is called a Chebyshev polynomial.
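The statement about Chebyshev polynomials can be checked numerically. The sketch below is an illustration under stated assumptions: unit amplitudes A₁ = A₂ = 1 and phases φ₁ = φ₂ = π/2, so that x₁ = cos t and x₂ = cos nt; the helper names are hypothetical. It evaluates T_n by the standard recurrence T_{k+1}(x) = 2xT_k(x) − T_{k−1}(x) and compares it with x₂ along the Lissajous curve.

```python
import math

def chebyshev(n, x):
    """T_n(x) from the recurrence T_{k+1} = 2 x T_k - T_{k-1}, with T_0 = 1, T_1 = x."""
    t_prev, t_cur = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_cur = t_cur, 2.0 * x * t_cur - t_prev
    return t_cur

def max_deviation(n, samples=1000):
    """Largest |x2 - T_n(x1)| along the curve x1 = sin(t + pi/2), x2 = sin(n t + pi/2)."""
    worst = 0.0
    for i in range(samples + 1):
        t = 2.0 * math.pi * i / samples
        x1 = math.sin(t + math.pi / 2)        # equals cos t
        x2 = math.sin(n * t + math.pi / 2)    # equals cos(n t)
        worst = max(worst, abs(x2 - chebyshev(n, x1)))
    return worst

deviations = [max_deviation(n) for n in range(1, 7)]
```

For these phases the Lissajous figure is traversed back and forth along the graph of T_n over the interval −1 ≤ x₁ ≤ 1, so every entry of `deviations` is at the level of rounding error.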

Figure 24

Chebyshev polynomials

Problem. Show that if ω = m/n, then the Lissajous figure is a closed algebraic curve; but if ω is irrational, then the Lissajous figure fills the rectangle everywhere densely. What does the corresponding phase trajectory fill out?

6 Conservative force fields

In this section we study the connection between work and potential energy.

A Work of a force field along a path

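Recall the definition of the work of a force F on a path S. The work of the constant force F (for example, the force with which we lift up a load) on the path $\mathbf S = \overrightarrow{M_1M_2}$ is, by definition, the scalar product (Figure 25)

$$A = (\mathbf F, \mathbf S) = |\mathbf F|\,|\mathbf S|\cos\varphi.$$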

Figure 25

Work of the constant force F along the straight path S

Suppose we are given a vector field F and a curve l of finite length. We approximate the curve l by a polygonal line with components ΔS_i and denote by F_i the value of the force at some particular point of ΔS_i; then the work of the field F on the path l is by definition (Figure 26)

$$A = \lim_{|\Delta \mathbf S_i| \to 0} \sum_i (\mathbf F_i, \Delta \mathbf S_i).$$

In analysis courses it is proved that if the field is continuous and the path rectifiable, then the limit exists. It is denoted by ∫_l (F, dS).

Figure 26

Work of the force field F along the path l
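The polygonal definition of work translates directly into a numerical approximation. The sketch below is illustrative only: the two fields, the two paths, and all function names are assumptions chosen for the example, and F_i is taken at the midpoint of each segment. For the field F = −∂U/∂x with U = x₁²x₂ the two paths give the same work; for the field F = (0, x₁) they do not.

```python
def work(force, path, n=20000):
    """Polygonal approximation of the work: sum of (F_i, dS_i) over n segments.

    path maps a parameter s in [0, 1] to a point (x1, x2);
    F_i is evaluated at the midpoint of each segment.
    """
    total = 0.0
    x_prev, y_prev = path(0.0)
    for i in range(1, n + 1):
        x, y = path(i / n)
        fx, fy = force(0.5 * (x + x_prev), 0.5 * (y + y_prev))
        total += fx * (x - x_prev) + fy * (y - y_prev)
        x_prev, y_prev = x, y
    return total

def gradient_field(x1, x2):
    """F = -grad U for the assumed potential U = x1^2 * x2."""
    return (-2.0 * x1 * x2, -x1 * x1)

def shear_field(x1, x2):
    """Assumed example of a field that has no potential: F = (0, x1)."""
    return (0.0, x1)

def straight(s):
    """Segment from (0, 0) to (1, 1)."""
    return (s, s)

def parabola(s):
    """Arc of the parabola x2 = x1^2 from (0, 0) to (1, 1)."""
    return (s, s * s)

w_grad_straight = work(gradient_field, straight)
w_grad_parabola = work(gradient_field, parabola)
w_shear_straight = work(shear_field, straight)
w_shear_parabola = work(shear_field, parabola)
```

In the first case both values approximate U(0, 0) − U(1, 1) = −1; in the second the exact values are 1/2 and 2/3, so the work depends on the shape of the path.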

B Conditions for a field to be conservative

Theorem. A vector field F is conservative if and only if its work along any path M₁ M₂ depends only on the endpoints of the path, and not on the shape of the path.

Proof. Suppose that the work of a field F does not depend on the path. Then

$$U(M) = -\int_{M_0}^{M} (\mathbf F, d\mathbf S)$$

is well defined as a function of the point M. It is easy to verify that

$$\mathbf F = -\frac{\partial U}{\partial \mathbf x},$$

i.e., the field is conservative and U is its potential energy. Of course, the potential energy is defined only up to the additive constant U(M₀), which can be chosen arbitrarily.

Conversely, suppose that the field F is conservative and that U is its potential energy. Then it is easily verified that

$$\int_{M_0}^{M} (\mathbf F, d\mathbf S) = -U(M) + U(M_0),$$

i.e., the work does not depend on the shape of the path. □

Problem. Show that the vector field F₁ = x₂, F₂ = −x₁ is not conservative (Figure 27).

Figure 27

A non-potential field

Problem. Is the field in the plane minus the origin given by

$$F_1 = \frac{x_2}{x_1^2 + x_2^2},\qquad F_2 = -\frac{x_1}{x_1^2 + x_2^2}$$

conservative? Show that a field is conservative if and only if its work along any closed contour is equal to zero.

C Central fields

Definition. A vector field in the plane E² is called central with center at 0, if it is invariant with respect to the group of motions¹⁶ of the plane which fix 0.

Problem. Show that all vectors of a central field lie on rays through 0, and that the magnitude of the vector field at a point depends only on the distance from the point to the center of the field.

It is also useful to look at central fields which are not defined at the point 0.

Example. The newtonian field F = −k(r/|r|³) is central, but the field in the problem in Section 6B is not.

Theorem. Every central field is conservative, and its potential energy depends only on the distance to the center of the field, U = U(r).

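Proof. According to the previous problem, we may set $\mathbf F(\mathbf r) = \Phi(r)\,\mathbf e_r$, where $\mathbf r$ is the radius vector with respect to 0, $r = |\mathbf r|$ is its length and the unit vector $\mathbf e_r = \mathbf r/|\mathbf r|$ its direction. Then

$$\int_{M_1}^{M_2} (\mathbf F, d\mathbf S) = \int_{r(M_1)}^{r(M_2)} \Phi(r)\,dr,$$

and this integral is obviously independent of the path. □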

Problem. Compute the potential energy of the newtonian field.

Remark. The definitions and theorems of this paragraph can be directly carried over to a euclidean space Eⁿ of any dimension.

7 Angular momentum

We will see later that the invariance of an equation of a mechanical problem with respect to some group of transformations always implies a conservation law. A central field is invariant with respect to the group of rotations. The corresponding first integral is called the angular momentum.

Definition. The motion of a material point (with unit mass) in a central field on a plane is defined by the equation

$$\ddot{\mathbf r} = \Phi(r)\,\mathbf e_r,$$

where $\mathbf r$ is the radius vector beginning at the center of the field 0, $r$ is its length, and $\mathbf e_r$ its direction. We will think of our plane as lying in three-dimensional oriented euclidean space.

Definition. The angular momentum of a material point of unit mass relative to the point 0 is the vector product

$$\mathbf M = [\mathbf r, \dot{\mathbf r}].$$

The vector $\mathbf M$ is perpendicular to our plane and is given by one number: $\mathbf M = M\mathbf n$, where $\mathbf n = [\mathbf e_1, \mathbf e_2]$ is the normal vector, e₁ and e₂ being an oriented frame in the plane (Figure 28).

Figure 28

Angular momentum

Remark. In general, the moment of a vector a “applied at the point r” relative to the point 0 is [r, a]; for example, in a school statics course one studies the moment of force. [The literal translation of the Russian term for angular momentum is “kinetic moment.” (Trans. note)]

A The law of conservation of angular momentum

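Lemma. Let a and b be two vectors changing with time in the oriented euclidean space ℝ³. Then

$$\frac{d}{dt}[\mathbf a, \mathbf b] = [\dot{\mathbf a}, \mathbf b] + [\mathbf a, \dot{\mathbf b}].$$

Proof. This follows from the definition of derivative. □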

Theorem (The law of conservation of angular momentum). Under motions in a central field, the angular momentum M relative to the center of the field 0 does not change with time.

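Proof. By definition $\mathbf M = [\mathbf r, \dot{\mathbf r}]$. By the lemma, $\dot{\mathbf M} = [\dot{\mathbf r}, \dot{\mathbf r}] + [\mathbf r, \ddot{\mathbf r}]$. Since the field is central it is apparent from the equations of motion that the vectors $\ddot{\mathbf r}$ and $\mathbf r$ are collinear. Therefore $\dot{\mathbf M} = 0$. □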

B Kepler’s law

The law of conservation of angular momentum was first discovered by Kepler through observation of the motion of Mars. Kepler formulated this law in a slightly different way.

We introduce polar coordinates r, φ on our plane with pole at the center of the field 0. We consider, at the point r with coordinates (|r| = r, φ), two unit vectors: e_r directed along the radius vector so that $\mathbf r = r\,\mathbf e_r$, and e_φ, perpendicular to it in the direction of increasing φ. We express the velocity vector $\dot{\mathbf r}$ in terms of the basis e_r, e_φ (Figure 29).

Figure 29

Decomposition of the vector ṙ in terms of the basis e_r, e_φ

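Lemma. We have the relation

$$\dot{\mathbf r} = \dot r\,\mathbf e_r + r\dot\varphi\,\mathbf e_\varphi.$$

Proof. Clearly, the vectors e_r and e_φ rotate with angular velocity $\dot\varphi$, i.e.,

$$\dot{\mathbf e}_r = \dot\varphi\,\mathbf e_\varphi,\qquad \dot{\mathbf e}_\varphi = -\dot\varphi\,\mathbf e_r.$$

Differentiating the equality $\mathbf r = r\,\mathbf e_r$ gives us

$$\dot{\mathbf r} = \dot r\,\mathbf e_r + r\,\dot{\mathbf e}_r = \dot r\,\mathbf e_r + r\dot\varphi\,\mathbf e_\varphi. \qquad\square$$

Consequently, the angular momentum is

$$\mathbf M = [\mathbf r, \dot{\mathbf r}] = [\mathbf r, \dot r\,\mathbf e_r] + [\mathbf r, r\dot\varphi\,\mathbf e_\varphi] = r\dot\varphi\,[\mathbf r, \mathbf e_\varphi] = r^2\dot\varphi\,[\mathbf e_r, \mathbf e_\varphi].$$

Thus, the quantity $M = r^2\dot\varphi$ is preserved. This quantity has a simple geometric meaning.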

Kepler called the rate of change of the area S(t) swept out by the radius vector the sectorial velocity C (Figure 30):

$$C = \frac{dS}{dt}.$$

The law discovered by Kepler through observation of the motion of the planets says: in equal times the radius vector sweeps out equal areas, so that the sectorial velocity is constant, dS/dt = const. This is one formulation of the law of conservation of angular momentum. Since

$$\Delta S = S(t + \Delta t) - S(t) = \tfrac{1}{2}r^2\dot\varphi\,\Delta t + o(\Delta t),$$

this means that the sectorial velocity

$$C = \frac{dS}{dt} = \tfrac{1}{2}r^2\dot\varphi = \tfrac{1}{2}M$$

is half the angular momentum of our point of mass 1, and therefore constant.

Figure 30

Sectorial velocity
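Both forms of the conservation law can be observed in a short numerical experiment. The sketch below is an illustration, not the author's method: it assumes the newtonian field Φ(r) = −1/r², arbitrary initial data, and a velocity Verlet integrator (a choice made here because each of its substeps changes r or ṙ along a direction that leaves [r, ṙ] unchanged). The swept area is approximated by the triangles with vertices 0, r(t), r(t + Δt).

```python
import math

def sweep(central_force, pos, vel, dt, steps):
    """Velocity Verlet for r'' = Phi(r) e_r (unit mass).

    Returns the samples of M = x*vy - y*vx and the area swept by the radius vector.
    """
    def accel(x, y):
        r = math.hypot(x, y)
        phi = central_force(r)
        return phi * x / r, phi * y / r

    x, y = pos
    vx, vy = vel
    ax, ay = accel(x, y)
    momenta = [x * vy - y * vx]
    area = 0.0
    for _ in range(steps):
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x_new, y_new = x + dt * vx, y + dt * vy
        area += 0.5 * (x * y_new - y * x_new)   # signed area of the triangle 0, r(t), r(t + dt)
        x, y = x_new, y_new
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        momenta.append(x * vy - y * vx)
    return momenta, area

def newtonian(r):
    """Phi(r) for the assumed potential U = -1/r."""
    return -1.0 / r**2

dt, steps = 1e-3, 5000
momenta, area = sweep(newtonian, (1.0, 0.0), (0.0, 1.2), dt, steps)
drift = max(abs(m - momenta[0]) for m in momenta)
elapsed = dt * steps
```

Here `drift` stays at rounding level, and `area` agrees with (M/2) · `elapsed`, i.e., the sectorial velocity is M/2.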

Example. Some satellites have very elongated orbits. By Kepler’s law such a satellite spends most of its time in the distant part of its orbit, where the magnitude of $\dot\varphi$ is small.

8 Investigation of motion in a central field

The law of conservation of angular momentum lets us reduce problems about motion in a central field to problems with one degree of freedom. Thanks to this, motion in a central field can be completely determined.

A Reduction to a one-dimensional problem

We look at the motion of a point (of mass 1) in a central field on the plane:

$$\ddot{\mathbf r} = -\frac{\partial U}{\partial \mathbf r},\qquad U = U(r).$$

It is natural to use polar coordinates r, φ.

By the law of conservation of angular momentum the quantity $M = \dot\varphi(t)\,r^2(t)$ is constant (independent of t).

Theorem. For the motion of a material point of unit mass in a central field the distance from the center of the field varies in the same way as r varies in the one-dimensional problem with potential energy

$$V(r) = U(r) + \frac{M^2}{2r^2}.$$

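Proof. Differentiating the relation shown in Section 7, $\dot{\mathbf r} = \dot r\,\mathbf e_r + r\dot\varphi\,\mathbf e_\varphi$, we find

$$\ddot{\mathbf r} = (\ddot r - r\dot\varphi^2)\,\mathbf e_r + (2\dot r\dot\varphi + r\ddot\varphi)\,\mathbf e_\varphi.$$

Since the field is central,

$$\frac{\partial U}{\partial \mathbf r} = \frac{\partial U}{\partial r}\,\mathbf e_r.$$

Therefore the equation of motion in polar coordinates takes the form

$$\ddot r - r\dot\varphi^2 = -\frac{\partial U}{\partial r},\qquad 2\dot r\dot\varphi + r\ddot\varphi = 0.$$

But, by the law of conservation of angular momentum,

$$\dot\varphi = \frac{M}{r^2},$$

where M is a constant independent of t, determined by the initial conditions. Therefore,

$$\ddot r = -\frac{\partial U}{\partial r} + \frac{M^2}{r^3},\qquad\text{or}\qquad \ddot r = -\frac{\partial V}{\partial r},\quad\text{where } V = U + \frac{M^2}{2r^2}.$$

The quantity V(r) is called the effective potential energy. □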

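Remark. The total energy in the derived one-dimensional problem

$$E_1 = \frac{\dot r^2}{2} + V(r)$$

is the same as the total energy in the original problem

$$E = \frac{\dot{\mathbf r}^2}{2} + U(\mathbf r),$$

since

$$\frac{\dot{\mathbf r}^2}{2} = \frac{\dot r^2}{2} + \frac{r^2\dot\varphi^2}{2} = \frac{\dot r^2}{2} + \frac{M^2}{2r^2}.$$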

B Integration of the equation of motion

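The total energy in the derived one-dimensional problem is conserved. Consequently, the dependence of r on t is defined by the quadrature

$$\dot r = \sqrt{2(E - V(r))},\qquad \int dt = \int \frac{dr}{\sqrt{2(E - V(r))}}.$$

Since $\dot\varphi = M/r^2$, we have $d\varphi/dr = (M/r^2)/\sqrt{2(E - V(r))}$, and the equation of the orbit in polar coordinates is found by quadrature,

$$\varphi = \int \frac{(M/r^2)\,dr}{\sqrt{2(E - V(r))}}.$$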

C Investigation of the orbit

We fix the value of the angular momentum at M. The variation of r with time is easy to visualize, if one draws the graph of the effective potential energy V(r) (Figure 31).

Figure 31

Graph of the effective potential energy

Let E be the value of the total energy. All orbits corresponding to the given E and M lie in the region V(r) ≤ E. On the boundary of this region, V = E, i.e., ṙ = 0. However, the velocity of the moving point is, in general, not equal to zero there, since $\dot\varphi \ne 0$ for M ≠ 0.

The inequality V(r) ≤ E gives one or several annular regions in the plane:

$$0 \le r_{\min} \le r \le r_{\max} \le \infty.$$

If 0 ≤ r_min < r_max < ∞, then the motion is bounded and takes place inside the ring between the circles of radius r_min and r_max.

The shape of an orbit is shown in Figure 32. The angle φ varies monotonically while r oscillates periodically between r_min and r_max. The points where r = r_min are called pericentral, and where r = r_max, apocentral (if the center is the earth—perigee and apogee; if it is the sun—perihelion and aphelion; if it is the moon—perilune and apolune).

Figure 32

Orbit of a point in a central field

Each of the rays leading from the center to the apocenter or to the pericenter is an axis of symmetry of the orbit.

In general, the orbit is not closed: the angle between the successive pericenters and apocenters is given by the integral

$$\Phi = \int_{r_{\min}}^{r_{\max}} \frac{(M/r^2)\,dr}{\sqrt{2(E - V(r))}}.$$

The angle between two successive pericenters is twice as big.

The orbit is closed if the angle Φ is commensurable with 2π, i.e., if Φ = 2π(m/n), where m and n are integers.

It can be shown that if the angle Φ is not commensurable with 2π, then the orbit is everywhere dense in the annulus (Figure 33).

Figure 33

Orbit dense in an annulus
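The angle Φ between a pericenter and the next apocenter can be evaluated numerically from its integral over r_min ≤ r ≤ r_max, with integrand (M/r²)/√(2(E − V(r))). The sketch below is one possible way to do this, under stated assumptions: the turning points are found by bisection from user-supplied brackets, and the substitution r = c + h sin θ (with c and h the midpoint and half-width of the interval) removes the square-root singularities at the endpoints, after which a midpoint rule in θ is used. All function names and parameter values are illustrative.

```python
import math

def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def apsidal_angle(U, M, E, r_inner, r_circ, r_outer, n=4000):
    """Angle between pericenter and apocenter for potential U, momentum M, energy E.

    r_inner < r_circ < r_outer are assumed brackets with
    V(r_inner) > E, V(r_circ) < E, V(r_outer) > E.
    """
    def V(r):
        return U(r) + M * M / (2.0 * r * r)   # effective potential energy

    r_min = bisect(lambda r: V(r) - E, r_inner, r_circ)
    r_max = bisect(lambda r: V(r) - E, r_circ, r_outer)
    c, h = 0.5 * (r_max + r_min), 0.5 * (r_max - r_min)
    total = 0.0
    for i in range(n):
        theta = -0.5 * math.pi + math.pi * (i + 0.5) / n
        r = c + h * math.sin(theta)
        total += (M / (r * r)) * h * math.cos(theta) / math.sqrt(2.0 * (E - V(r)))
    return total * math.pi / n

phi_kepler = apsidal_angle(lambda r: -1.0 / r, M=1.0, E=-0.3,
                           r_inner=0.1, r_circ=1.0, r_outer=100.0)
phi_oscillator = apsidal_angle(lambda r: 0.5 * r * r, M=1.0, E=2.0,
                               r_inner=0.1, r_circ=1.0, r_outer=10.0)
```

For U = −1/r the computed value is close to π and for U = r²/2 it is close to π/2, both commensurable with 2π, in agreement with the closed orbits of these two fields.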

If r_min = r_max, i.e., E is the value of V at a minimum point, then the annulus degenerates to a circle, which is also the orbit.

Problem. For which values of α is motion along a circular orbit in the field with potential energy U = r^α, −2 ≤ α < ∞, Liapunov stable?

Answer. Only for α = 2.

For values of E a little larger than the minimum of V the annulus r_min ≤ r ≤ r_max will be very narrow, and the orbit will be close to a circle. In the corresponding one-dimensional problem, r will perform small oscillations close to the minimum point of V.

Problem. Find the angle Φ for an orbit close to the circle of radius r.

Hint. Cf. Section D below.

We now look at the case r_max = ∞. If lim_r→∞ U(r) = lim_r→∞ V(r) = U_∞ < ∞, then it is possible for orbits to go off to infinity. If the initial energy E is larger than U_∞, then the point goes to infinity with finite velocity $\dot r_\infty = \sqrt{2(E - U_\infty)}$. We notice that if U(r) approaches its limit slower than r⁻², then the effective potential V will be attracting at infinity (here we assume that the potential U is attracting at infinity).

If, as r → 0, |U(r)| does not grow faster than M²/2r², then r_min > 0 and the orbit never approaches the center. If, however, U(r) + (M²/2r²) → −∞ as r → 0, then it is possible to “fall into the center of the field.” Falling into the center of the field is possible even in finite time (for example, in the field U(r) = −1/r³).

Problem. Examine the shape of an orbit in the case when the total energy is equal to the value of the effective energy V at a local maximum point.

D Central fields in which all bounded orbits are closed

It follows from the following sequence of problems that there are only two cases in which all the bounded orbits in a central field are closed, namely,

$$U = ar^2,\quad a \ge 0,\qquad\text{and}\qquad U = -\frac{k}{r},\quad k \ge 0.$$

Problem 1. Show that the angle Φ between the pericenter and apocenter is equal to the semiperiod of an oscillation in the one-dimensional system with potential energy W(x) = U(M/x) + (x²/2).

Hint. The substitution x = M/r gives

$$\Phi = \int_{x_{\min}}^{x_{\max}} \frac{dx}{\sqrt{2(E - W(x))}}.$$

Problem 2. Find the angle Φ for an orbit close to the circle of radius r.

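Answer.

$$\Phi \approx \Phi_{\rm cir} = \frac{\pi M}{r^2\sqrt{V''(r)}} = \pi\sqrt{\frac{U'}{3U' + rU''}}.$$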

Problem 3. For which values of U is the magnitude of Φ_cir independent of the radius r?

Answer. U(r) = ar^α (α ≥ −2, α ≠ 0) and U(r) = b log r.

It follows that

$$\Phi_{\rm cir} = \frac{\pi}{\sqrt{\alpha + 2}}$$

(the logarithmic case corresponds to α = 0). For example, for α = 2 we have Φ_cir = π/2, and for α = −1 we have Φ_cir = π.

Problem 4. In the situation of Problem 3, let U(r) → ∞ as r → ∞. Find lim_E→∞Φ(E, M).

Answer. π/2.

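Hint. The substitution x = yx_max reduces Φ to the form

$$\Phi = \int_{y_{\min}}^{1} \frac{dy}{\sqrt{2(W^*(1) - W^*(y))}},\qquad W^*(y) = \frac{y^2}{2} + \frac{1}{x_{\max}^2}\,U\!\left(\frac{M}{y\,x_{\max}}\right).$$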

As E → ∞ we have x_max → ∞ and y_min → 0, and the second term in W* can be discarded.

Problem 5. Let U(r) = −kr^−β, 0 < β < 2. Find Φ₀ = lim_{E→ −0}Φ.

Answer.

$$\Phi_0 = \int_0^1 \frac{dx}{\sqrt{x^\beta - x^2}} = \frac{\pi}{2 - \beta}.$$

Note that Φ₀ does not depend on M.

Problem 6. Find all central fields in which bounded orbits exist and are all closed.

Answer. U = ar² or U = −k/r.

Solution. If all bounded orbits are closed, then, in particular, Φ_cir = 2π(m/n) = const. According to Problem 3, U = ar^α (α ≥ −2), or U = b ln r (α = 0). In both cases $\Phi_{\rm cir} = \pi/\sqrt{\alpha + 2}$. If α > 0, then according to Problem 4, lim_E→∞Φ(E, M) = π/2. Therefore, Φ_cir = π/2, α = 2. If α < 0, then according to Problem 5, lim_{E→ −0} Φ(E, M) = π/(2 + α). Therefore, $\pi/(2 + \alpha) = \pi/\sqrt{2 + \alpha}$, α = −1. In the case α = 0 we find $\Phi_{\rm cir} = \pi/\sqrt{2}$, which is not commensurable with 2π. Therefore, all bounded orbits can be closed only in fields where U = ar² or U = −k/r. In the field U = ar², a > 0, all the orbits are closed (these are ellipses with center at 0, cf. Example 1, Section 5). In the field U = −k/r all bounded orbits are also closed and also elliptical, as we will now show.

E Kepler’s problem

This problem concerns motion in a central field with potential U = −k/r and therefore V(r) = −(k/r) + (M²/2r²) (Figure 34).

Figure 34

Effective potential of the Kepler problem

By the general formula

$$\varphi = \int \frac{(M/r^2)\,dr}{\sqrt{2(E - V(r))}}.$$

Integrating, we get

$$\varphi = \arccos\frac{\dfrac{M}{r} - \dfrac{k}{M}}{\sqrt{2E + \dfrac{k^2}{M^2}}}.$$

To this expression we should have added an arbitrary constant. We will assume it equal to zero; this is equivalent to the choice of an origin of reference for the angle φ at the pericenter. We introduce the following notation:

$$\frac{M^2}{k} = p,\qquad \sqrt{1 + \frac{2EM^2}{k^2}} = e.$$

Now we get φ = arccos(((p/r) − 1)/e), i.e.,

$$r = \frac{p}{1 + e\cos\varphi}.$$

This is the so-called focal equation of a conic section. The motion is bounded (Figure 35) for E < 0. Then e < 1, i.e., the conic section is an ellipse. The number p is called the parameter of the ellipse, and e the eccentricity. Kepler’s first law, which he discovered by observing the motion of Mars, consists of the fact that the planets describe ellipses, with the sun at one focus.

Figure 35

Keplerian ellipse

If we assume that the planets move in a central field of gravity, then Kepler’s first law implies Newton’s law of gravity: U = −(k/r) (cf. Section 2D above).

The parameter and eccentricity are related with the semi-axes by the formulas

$$2a = \frac{p}{1 - e} + \frac{p}{1 + e} = \frac{2p}{1 - e^2},$$

i.e., $a = p/(1 - e^2)$, and $e = c/a = \sqrt{a^2 - b^2}/a$, where c = ae is the distance from the center to the focus (cf. Figure 35).

Remark. An ellipse with small eccentricity is very close to a circle.¹⁷ If the distance from the focus to the center is small of first order, then the difference between the semi-axes is of second order: $b = a\sqrt{1 - e^2} \approx a(1 - e^2/2)$. For example, in the ellipse with major semi-axis of 10 cm and eccentricity 0.1, the difference of the semi-axes is 0.5 mm, and the distance between the focus and the center is 1 cm.
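The numbers in this remark follow from b = a√(1 − e²) and c = ae. A minimal arithmetic check, with a hypothetical helper name:

```python
import math

def ellipse_from_a_e(a, e):
    """Minor semi-axis b, center-to-focus distance c, and parameter p of an ellipse."""
    b = a * math.sqrt(1.0 - e * e)
    c = a * e
    p = a * (1.0 - e * e)
    return b, c, p

a_cm, e = 10.0, 0.1
b_cm, c_cm, p_cm = ellipse_from_a_e(a_cm, e)
difference_mm = 10.0 * (a_cm - b_cm)          # exact difference of the semi-axes, in mm
second_order_mm = 10.0 * a_cm * e * e / 2.0   # second-order estimate a e^2 / 2, in mm
```

The exact difference of the semi-axes is about 0.501 mm, against the second-order estimate of 0.5 mm, while the focus is 1 cm from the center.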

The eccentricities of planets’ orbits are very small. Therefore, Kepler originally formulated his first law as follows: the planets move around the sun in circles, but the sun is not at the center.

Kepler’s second law, that the sectorial velocity is constant, is true in any central field.

Kepler’s third law says that the period of revolution around an elliptical orbit depends only on the size of the major semi-axis.

The squares of the revolution periods of two planets on different elliptical orbits have the same ratio as the cubes of their major semi-axes.¹⁸

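Proof. We denote by T the period of revolution and by S the area swept out by the radius vector in time T. 2S = MT, since M/2 is the sectorial velocity. But the area of the ellipse, S, is equal to πab, so T = 2πab/M. Since

$$a = \frac{M^2/k}{2|E|M^2/k^2} = \frac{k}{2|E|}$$

(from a = p/(1 − e²)), and

$$b = a\sqrt{1 - e^2} = \frac{k}{2|E|}\cdot\frac{\sqrt{2|E|}\,M}{k} = \frac{M}{\sqrt{2|E|}},$$

then

$$T = \frac{2\pi k}{(2|E|)^{3/2}};$$

but 2|E| = k/a, so $T = 2\pi a^{3/2} k^{-1/2}$. □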
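The formula for the period, T = 2πa^{3/2}k^{−1/2}, can be tested by direct integration of the equations of motion. The sketch below is an illustration under assumptions made here: unit mass, a velocity Verlet integrator, and a start at the pericenter with the speed that follows from E = −k/(2a). It integrates for the predicted time T and measures how far the point ends up from where it started.

```python
import math

def kepler_return_error(a, e, k=1.0, steps=50000):
    """Distance from the starting point after the predicted period T = 2 pi a^(3/2) k^(-1/2)."""
    period = 2.0 * math.pi * a**1.5 / math.sqrt(k)
    dt = period / steps
    x0 = a * (1.0 - e)                       # pericenter distance
    x, y = x0, 0.0
    # Pericenter speed from E = -k/(2a): v^2 = (k/a) (1 + e)/(1 - e).
    vx, vy = 0.0, math.sqrt(k * (1.0 + e) / (a * (1.0 - e)))

    def accel(px, py):
        r3 = math.hypot(px, py) ** 3
        return -k * px / r3, -k * py / r3

    ax, ay = accel(x, y)
    for _ in range(steps):
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return math.hypot(x - x0, y)

error_small_orbit = kepler_return_error(a=1.0, e=0.5)
error_large_orbit = kepler_return_error(a=2.0, e=0.3)
```

Both orbits return to their starting points up to the discretization error of the integrator, although their eccentricities differ; only the major semi-axis enters the period.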

We note that the total energy E depends only on the major semi-axis a of the orbit and is the same for the whole set of elliptical orbits, from a circle of radius a to a line segment of length 2a.

Problem. At the entry of a satellite into a circular orbit at a distance 300 km from the earth the direction of its velocity deviates from the intended direction by 1° towards the earth. How is the perigee changed?

Answer. The height of the perigee is less by approximately 110 km.

Hint. The orbit differs from a circle only to second order, and we can disregard this difference. The radius has the intended value since the initial energy has the intended value. Therefore, we get the true orbit (Figure 36) by twisting the intended orbit through 1°.

Figure 36

An orbit which is close to circular

Problem. How does the height of the perigee change if the actual velocity is 1 m/sec less than intended?

Problem. The first cosmic velocity is the velocity of motion on a circular orbit of radius close to the radius of the earth. Find the magnitude of the first cosmic velocity v₁ and show that $v_2 = \sqrt{2}\,v_1$ (cf. Section 3B).

Answer. About 7.9 km/sec (v₁ = √(gR) with g ≈ 9.8 m/sec² and R ≈ 6400 km).

Problem.¹⁹ During his walk in outer space, the cosmonaut A. Leonov threw the lens cap of his movie camera towards the earth. Describe the motion of the lens cap with respect to the spaceship, taking the velocity of the throw as 10 m/sec.

Answer. The lens cap will move relative to the cosmonaut approximately in an ellipse with major axis about 32 km and minor axis about 16 km. The center of the ellipse will be situated 16 km in front of the cosmonaut in his orbit, and the period of circulation around the ellipse will be equal to the period of motion around the orbit.

Hint. We take as our unit of length the radius of the space ship’s circular orbit, and we choose a unit of time so that the period of revolution around this orbit is 2π. We must study solutions to Newton’s equation

$$\ddot{\mathbf r} = -\frac{\mathbf r}{r^3}$$

close to the circular solution with r₀ = 1, φ₀ = t. We seek those solutions in the form

$$r = r_0 + r_1,\qquad \varphi = \varphi_0 + \varphi_1,\qquad r_1 \ll 1,\quad \varphi_1 \ll 1.$$

By the theorem on the differentiability of a solution with respect to its initial conditions, the functions r₁(t) and φ₁(t) satisfy a system of linear differential equations (equations of variation) up to small amounts which are of higher than first order in the initial deviation.

By substituting the expressions for r and φ in Newton’s equation, we get, after simple computation, the variational equations in the form

$$\ddot r_1 = 3r_1 + 2\dot\varphi_1,\qquad \ddot\varphi_1 = -2\dot r_1.$$

After solving these equations for the given initial conditions $r_1(0) = \varphi_1(0) = \dot\varphi_1(0) = 0$, $\dot r_1(0) = -1/800$, we get the answer given above.

Disregarding the small quantities of second order gives an effect of under 1/800 of the one obtained (i.e., on the order of 10 meters on one loop). Thus the lens cap describes a 30 km ellipse in an hour-and-a-half, returns to the space ship on the side opposite the earth, and goes past at the distance of a few tens of meters.

Of course, in this calculation we have disregarded the deviation of the orbit from a circle, the effect of forces other than gravity, etc.
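The linearized motion of the lens cap can be reproduced numerically. The sketch below is an illustration under stated assumptions: the variational equations are taken in the form r̈₁ = 3r₁ + 2φ̇₁, φ̈₁ = −2ṙ₁ (obtained by linearizing the polar equations of motion about r = 1, φ = t), the initial radial velocity is −1/800 in the units of the hint, the orbit radius is assumed to be 6400 km for the conversion to kilometers, and a classical Runge-Kutta scheme is used for the integration.

```python
import math

def variational_rhs(state):
    """Right-hand side for the state (r1, r1', phi1, phi1')."""
    r1, dr1, phi1, dphi1 = state
    return (dr1, 3.0 * r1 + 2.0 * dphi1, dphi1, -2.0 * dr1)

def rk4(state, dt, steps):
    """Classical fourth-order Runge-Kutta; returns the list of visited states."""
    path = [state]
    for _ in range(steps):
        k1 = variational_rhs(state)
        k2 = variational_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = variational_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = variational_rhs(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
        path.append(state)
    return path

eps = 1.0 / 800.0            # throw velocity divided by orbital velocity
orbit_radius_km = 6400.0     # assumed radius of the circular orbit

# One revolution of the space ship: t from 0 to 2 pi.
path = rk4((0.0, -eps, 0.0, 0.0), 2.0 * math.pi / 2000, 2000)
radial_km = orbit_radius_km * (max(p[0] for p in path) - min(p[0] for p in path))
along_km = orbit_radius_km * (max(p[2] for p in path) - min(p[2] for p in path))
lead_km = orbit_radius_km * 0.5 * (max(p[2] for p in path) + min(p[2] for p in path))
```

With these assumptions the radial extent comes out near 16 km, the extent along the orbit near 32 km, and the center of the ellipse about 16 km ahead of the cosmonaut; the cap returns to its starting point after one revolution.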

9 The motion of a point in three-space

In this paragraph we define the angular momentum relative to an axis and we show that, for motion in an axially symmetric field, it is conserved.

All the results obtained for motion in a plane can be easily carried over to motions in space.

A Conservative fields

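We consider a motion in the conservative field

$$\ddot{\mathbf r} = -\frac{\partial U}{\partial \mathbf r},$$

where U = U(r), r ∈ E³.

The law of conservation of energy holds:

$$\frac{dE}{dt} = 0,\qquad\text{where } E = \tfrac{1}{2}\dot{\mathbf r}^2 + U(\mathbf r).$$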

B Central fields

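For motion in a central field the vector $\mathbf M = [\mathbf r, \dot{\mathbf r}]$ does not change: dM/dt = 0.

Every central field is conservative (this is proved as in the two-dimensional case), and

$$\frac{d\mathbf M}{dt} = [\dot{\mathbf r}, \dot{\mathbf r}] + [\mathbf r, \ddot{\mathbf r}] = 0,$$

since $\ddot{\mathbf r} = -\partial U/\partial \mathbf r$, and the vector ∂U/∂r is collinear with r since the field is central.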

Corollary. For motion in a central field, every orbit is planar.

Proof. $(\mathbf M, \mathbf r) = ([\mathbf r, \dot{\mathbf r}], \mathbf r) = 0$; therefore r(t) ⊥ M, and since M = const., all orbits lie in the plane perpendicular to M.²⁰

Thus the study of orbits in a central field in space reduces to the planar problem examined in the previous paragraph.

Problem. Investigate motion in a central field in n-dimensional euclidean space.

C Axially symmetric fields

Definition. A vector field in E³ has axial symmetry if it is invariant with respect to the group of rotations of space which fix every point of some axis.

Problem. Show that if a field is axially symmetric and conservative, then its potential energy has the form U = U(r, z), where r, φ, and z are cylindrical coordinates.

In particular, it follows from this that the vectors of the field lie in planes through the z axis.

As an example of such a field we can take the gravitational field created by a solid of revolution.

Let z be the axis, oriented by the vector e_z in three-dimensional euclidean space E³; F a vector in the euclidean linear space ℝ³; 0 a point on the z axis; r = x − 0 ∈ ℝ³ the radius vector of the point x ∈ E³ relative to 0 (Figure 37).

Figure 37

Moment of the vector F with respect to an axis

Definition. The moment M_z relative to the z axis of the vector F applied at the point r is the projection onto the z axis of the moment of the vector F relative to some point on this axis:

$$M_z = (\mathbf e_z, [\mathbf r, \mathbf F]).$$

The number M_z does not depend on the choice of the point 0 on the z axis. In fact, if we look at a point 0′ on the axis, then by properties of the triple product, M′_z = (e_z, [r′, F]) = ([e_z, r′], F) = ([e_z, r], F) = M_z.

Remark. M_z depends on the choice of the direction of the z axis: if we change e_z to −e_z, then M_z changes sign.

Theorem. For a motion in a conservative field with axial symmetry around the z axis, the moment of velocity relative to the z axis is conserved.

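Proof. $M_z = (\mathbf e_z, [\mathbf r, \dot{\mathbf r}])$. Since $\ddot{\mathbf r} = \mathbf F$, it follows that $\mathbf r$ and $\ddot{\mathbf r}$ lie in a plane passing through the z axis, and therefore $[\mathbf r, \ddot{\mathbf r}]$ is perpendicular to e_z.

Therefore,

$$\dot M_z = (\mathbf e_z, [\dot{\mathbf r}, \dot{\mathbf r}]) + (\mathbf e_z, [\mathbf r, \ddot{\mathbf r}]) = 0. \qquad\square$$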

Remark. This proof works for any force field in which the force vector F lies in the plane spanned by r and e_z.

10 Motions of a system of n points

In this paragraph we prove the laws of conservation of energy, momentum, and angular momentum for systems of material points in E³.

A Internal and external forces

Newton’s equations for the motion of a system of n material points, with masses m_i and radius vectors r_i ∈ E³, are the equations

$$m_i\ddot{\mathbf r}_i = \mathbf F_i,\qquad i = 1, \dots, n.$$

The vector F_i is called the force acting on the i-th point.

The forces F_i are determined experimentally. We often observe in a system that for two points these forces are equal in magnitude and act in opposite directions along the straight line joining the points (Figure 38).

Figure 38

Forces of interaction

Such forces are called forces of interaction (example: the force of universal gravitation).

If all forces acting on a point of the system are forces of interaction, then the system is said to be closed. By definition, the force acting on the i-th point of a closed system is

$$\mathbf F_i = \sum_{j=1,\ j\ne i}^{n} \mathbf F_{ij}.$$

The vector F_ij is the force with which the j-th point acts on the i-th.

Since the forces F_ij and F_ji are opposite (F_ij = −F_ji), we can write them in the form F_ij = f_ije_ij, where f_ij = f_ji is the magnitude of the force and e_ij is the unit vector in the direction from the i-th point to the j-th point.

If the system is not closed, then it is often possible to represent the forces acting on it in the form

$$\mathbf F_i = \sum_{j\ne i} \mathbf F_{ij} + \mathbf F'_i,$$

where F_ij are forces of interaction and F′_i(r_i) is the so-called external force.

Example. (Figure 39) We separate a closed system into two parts, I and II. The force F_i applied to the i-th point of system I is determined by forces of interaction inside system I and forces acting on the i-th point from points of system II, i.e.,

$$\mathbf F_i = \sum_{j\in \mathrm I,\ j\ne i} \mathbf F_{ij} + \mathbf F'_i.$$

Figure 39

Internal and external forces

F′_i is the external force with respect to system I.

B The law of conservation of momentum

Definition. The momentum of a system is the vector

$$\mathbf P = \sum_{i=1}^{n} m_i\dot{\mathbf r}_i.$$

Theorem. The rate of change of momentum of a system is equal to the sum of all external forces acting on points of the system.

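Proof. $\dfrac{d\mathbf P}{dt} = \sum_i m_i\ddot{\mathbf r}_i = \sum_i \mathbf F_i = \sum_{i,j} \mathbf F_{ij} + \sum_i \mathbf F'_i$, and $\sum_{i,j} \mathbf F_{ij} = 0$, since for forces of interaction F_ij = −F_ji. □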

Corollary 1. The momentum of a closed system is conserved.

Corollary 2. If the sum of the external forces acting on a system is perpendicular to the x axis, then the projection P_x of the momentum onto the x axis is conserved: P_x = const.

Definition. The center of mass of a system is the point

$$\mathbf r = \frac{\sum m_i\mathbf r_i}{\sum m_i}.$$

Problem. Show that the center of mass is well defined, i.e., does not depend on the choice of the origin of reference for radius vectors.

The momentum of a system is equal to the momentum of a particle lying at the center of mass of the system and having mass ∑m_i.

In fact, (∑m_i)r = ∑ (m_ir_i), from which it follows that

$$\Big(\sum m_i\Big)\dot{\mathbf r} = \sum m_i\dot{\mathbf r}_i.$$

We can now formulate the theorem about momentum as a theorem about the motion of the center of mass.

Theorem. The center of mass of a system moves as if all masses were concentrated at it and all forces were applied to it.

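Proof. $(\sum m_i)\,\dot{\mathbf r} = \mathbf P$. Therefore, $(\sum m_i)\,\ddot{\mathbf r} = d\mathbf P/dt = \sum_i \mathbf F_i$. □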

Corollary. If a system is closed, then its center of mass moves uniformly in a straight line.

C The law of conservation of angular momentum

Definition. The angular momentum of a material point of mass m relative to the point 0 is the moment of the momentum vector relative to 0:

$$\mathbf M = [\mathbf r, m\dot{\mathbf r}].$$

The angular momentum of a system relative to 0 is the sum of the angular momenta of all the points in the system:

$$\mathbf M = \sum_{i=1}^{n} [\mathbf r_i, m_i\dot{\mathbf r}_i].$$

Theorem. The rate of change of the angular momentum of a system is equal to the sum of the moments of the external forces²¹ acting on the points of the system.

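Proof. $\dfrac{d\mathbf M}{dt} = \sum_i [\dot{\mathbf r}_i, m_i\dot{\mathbf r}_i] + \sum_i [\mathbf r_i, m_i\ddot{\mathbf r}_i]$. The first term is equal to zero, and the second is equal to

$$\sum_i [\mathbf r_i, \mathbf F_i] = \sum_i \Big[\mathbf r_i, \Big(\sum_{j\ne i} \mathbf F_{ij} + \mathbf F'_i\Big)\Big]$$

by Newton’s equations.

The sum of the moments of two forces of interaction is equal to zero since

$$\mathbf F_{ij} = -\mathbf F_{ji},\qquad\text{so}\qquad [\mathbf r_i, \mathbf F_{ij}] + [\mathbf r_j, \mathbf F_{ji}] = [(\mathbf r_i - \mathbf r_j), \mathbf F_{ij}] = 0.$$

Therefore, the sum of the moments of all forces of interaction is equal to zero:

$$\sum_i \Big[\mathbf r_i, \sum_{j\ne i} \mathbf F_{ij}\Big] = 0.$$

Therefore, $\dfrac{d\mathbf M}{dt} = \sum_i [\mathbf r_i, \mathbf F'_i]$. □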

Corollary 1 (The law of conservation of angular momentum). If the system is closed, then M = const.

We denote the sum of the moments of the external forces by

$$\mathbf N = \sum_i [\mathbf r_i, \mathbf F'_i].$$

Then, by the theorem above, dM/dt = N, from which we have

Corollary 2. If the moment of the external forces relative to the z axis is equal to zero, then M_z is constant.
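The conservation of momentum and of angular momentum in a closed system can be observed numerically. The sketch below is an illustration, not part of the original text: it assumes three points interacting through an attraction of magnitude m_i m_j / |r_i − r_j|², arbitrary initial data, and a velocity Verlet integrator. Because the forces enter only as pairs F_ij = −F_ji directed along r_j − r_i, both P and M stay constant up to rounding error.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def interaction_forces(pos, masses):
    """F_i = sum over j of F_ij, with F_ij = -F_ji directed along r_j - r_i."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            dist = math.sqrt(sum(c * c for c in d))
            scale = masses[i] * masses[j] / dist**3   # assumed attraction ~ 1/distance^2
            for k in range(3):
                forces[i][k] += scale * d[k]
                forces[j][k] -= scale * d[k]
    return forces

def momentum(vel, masses):
    return [sum(m * v[k] for m, v in zip(masses, vel)) for k in range(3)]

def angular_momentum(pos, vel, masses):
    total = [0.0, 0.0, 0.0]
    for m, r, v in zip(masses, pos, vel):
        c = cross(r, [m * x for x in v])
        total = [t + x for t, x in zip(total, c)]
    return total

def evolve(pos, vel, masses, dt, steps):
    """Velocity Verlet for m_i r_i'' = F_i."""
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    f = interaction_forces(pos, masses)
    for _ in range(steps):
        for i, m in enumerate(masses):
            for k in range(3):
                vel[i][k] += 0.5 * dt * f[i][k] / m
                pos[i][k] += dt * vel[i][k]
        f = interaction_forces(pos, masses)
        for i, m in enumerate(masses):
            for k in range(3):
                vel[i][k] += 0.5 * dt * f[i][k] / m
    return pos, vel

masses = [1.0, 2.0, 3.0]
pos0 = [(1.0, 0.0, 0.0), (-1.0, 0.5, 0.0), (0.0, -1.0, 1.0)]
vel0 = [(0.0, 0.3, 0.1), (0.2, 0.0, -0.1), (-0.1, 0.1, 0.0)]
pos1, vel1 = evolve(pos0, vel0, masses, dt=1e-3, steps=500)

p_change = max(abs(a - b) for a, b in zip(momentum(vel0, masses), momentum(vel1, masses)))
m_change = max(abs(a - b) for a, b in zip(angular_momentum(pos0, vel0, masses),
                                          angular_momentum(pos1, vel1, masses)))
```

Adding an external force to one of the points would make `p_change` and `m_change` grow in accordance with dP/dt = ∑F′_i and dM/dt = N.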

D The law of conservation of energy

Definition. The kinetic energy of a point of mass m is

$$T = \frac{m\dot{\mathbf r}^2}{2}.$$

Definition. The kinetic energy of a system of mass points is the sum of the kinetic energies of the points:

$$T = \sum_{i=1}^{n} \frac{m_i\dot{\mathbf r}_i^2}{2},$$

where the m_i are the masses of the points and ṙ_i are their velocities.

Theorem. The increase in the kinetic energy of a system is equal to the sum of the work of all forces acting on the points of the system.

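Proof.

$$\frac{dT}{dt} = \sum_i m_i(\dot{\mathbf r}_i, \ddot{\mathbf r}_i) = \sum_i (\dot{\mathbf r}_i, \mathbf F_i).$$

Therefore,

$$T(t) - T(t_0) = \int_{t_0}^{t} \frac{dT}{dt}\,dt = \sum_i \int_{t_0}^{t} (\dot{\mathbf r}_i, \mathbf F_i)\,dt = \sum_i A_i,$$

where A_i is the work of the force F_i. □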

The configuration space of a system of n mass points in E³ is the direct product of n euclidean spaces: E³ⁿ = E³ × ⋯ × E³. It has itself the structure of a euclidean space.

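Let r = (r₁,..., r_n) be the radius vector of a point in the configuration space, and F = (F₁,..., F_n) the force vector. We can write the theorem above in the form

$$T(t_1) - T(t_0) = \int_{\mathbf r(t_0)}^{\mathbf r(t_1)} (\mathbf F, d\mathbf r) = \int_{t_0}^{t_1} (\dot{\mathbf r}, \mathbf F)\,dt.$$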

In other words:

The increase in kinetic energy is equal to the work of the “force” F on the “path” r(t) in configuration space.

Definition. A system is called conservative if the forces depend only on the location of a point in the system (F = F(r)), and if the work of F along any path depends only on the initial and final points of the path:

$$\int_{M_1}^{M_2} (\mathbf F, d\mathbf r) = \Phi(M_1, M_2).$$

Theorem. For a system to be conservative it is necessary and sufficient that there exist a potential energy, i.e., a function U(r) such that

$$\mathbf F = -\frac{\partial U}{\partial \mathbf r}.$$

Proof. Cf. Section 6B. □

Theorem. The total energy of a conservative system (E = T + U) is preserved under the motion: E(t₁) = E(t₀).

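Proof. By what was shown earlier,

$$T(t_1) - T(t_0) = \int_{\mathbf r(t_0)}^{\mathbf r(t_1)} (\mathbf F, d\mathbf r) = U(\mathbf r(t_0)) - U(\mathbf r(t_1)). \qquad\square$$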

Let all the forces acting on the points of a system be divided into forces of interaction and external forces:

$$\mathbf F_i = \sum_{j\ne i} \mathbf F_{ij} + \mathbf F'_i,$$

where F_ij = −F_ji = f_ije_ij.

Proposition. If the forces of interaction depend only on distance, f_ij = f_ij(|r_i − r_j|), then they are conservative.

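Proof. If a system consists entirely of two points i and j, then, as is easily seen, the potential energy of the interaction is given by the formula

$$U_{ij}(r) = \int_{r_0}^{r} f_{ij}(\rho)\,d\rho.$$

We then have

$$-\frac{\partial U_{ij}(|\mathbf r_i - \mathbf r_j|)}{\partial \mathbf r_i} = f_{ij}\,\mathbf e_{ij}.$$

Therefore, the potential energy of the interaction of all the points will be

$$U(\mathbf r) = \sum_{i>j} U_{ij}(|\mathbf r_i - \mathbf r_j|). \qquad\square$$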

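If the external forces are also conservative, i.e., $\mathbf F'_i = -\partial U'_i/\partial \mathbf r_i$, then the system is conservative, and its total potential energy is

$$U(\mathbf r) = \sum_{i>j} U_{ij} + \sum_i U'_i.$$

For such a system the total mechanical energy

$$E = T + U = \sum_i \frac{m_i\dot{\mathbf r}_i^2}{2} + \sum_{i>j} U_{ij} + \sum_i U'_i$$

is conserved.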

If the system is not conservative, then the total mechanical energy is not generally conserved.

Definition. A decrease in the mechanical energy E(t₀) − E(t₁) is called an increase in the non-mechanical energy E′:

$$E'(t_1) - E'(t_0) = E(t_0) - E(t_1).$$

Theorem (The law of conservation of energy). The total energy H = E + E′ is conserved.

This theorem is an obvious corollary of the definition above. Its value lies in the fact that in concrete physical systems, expressions for the size of the non-mechanical energy can be found in terms of other physical quantities (temperature, etc.).

E Example: The two-body problem

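Suppose that two points with masses m₁ and m₂ interact with potential U, so that the equations of motion have the form

$$m_1 \ddot r_1 = -\frac{\partial U}{\partial r_1}, \qquad m_2 \ddot r_2 = -\frac{\partial U}{\partial r_2}, \qquad U = U(|r_1 - r_2|).$$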

Theorem. The time variation of r = r₁ − r₂ in the two-body problem is the same as that for the motion of a point of mass m = m₁m₂/(m₁ + m₂) in a field with potential U(|r|).

We denote by r₀ the radius vector of the center of mass: r₀ = (m₁r₁ + m₂r₂)/(m₁ + m₂). By the theorem on the conservation of momentum, the point r₀ moves uniformly and linearly.

We now look at the vector r = r₁ − r₂. Multiplying the first of the equations of motion by m₂, the second by m₁, and subtracting (using ∂U/∂r₁ = −∂U/∂r₂ = ∂U/∂r), we find that

$$m_1 m_2 \ddot r = -(m_1 + m_2) \frac{\partial U}{\partial r},$$

where U = U(|r₁ − r₂|) = U(|r|).

In particular, in the case of a Newtonian attraction, the points describe conic sections with foci at their common center of mass (Figure 40).

Figure 40

The two body problem
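The reduction to one body can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the Newtonian potential U = −k/|r| with k = 1, uses a velocity-Verlet integrator, and the masses, initial data, and function names are illustrative choices. It integrates the pair of equations of motion and, separately, the motion of a single point of mass m = m₁m₂/(m₁ + m₂), and compares r = r₁ − r₂ with the one-body trajectory.

```python
import math

def force(rx, ry, k=1.0):
    """-dU/dr for the assumed potential U(|r|) = -k/|r| (Newtonian attraction)."""
    d3 = math.hypot(rx, ry) ** 3
    return -k * rx / d3, -k * ry / d3

def two_body(m1, m2, r1, v1, r2, v2, dt, steps):
    """Velocity Verlet for m1 r1'' = -dU/dr1, m2 r2'' = -dU/dr2.
    Returns the list of relative positions r = r1 - r2."""
    (x1, y1), (u1, w1), (x2, y2), (u2, w2) = r1, v1, r2, v2
    fx, fy = force(x1 - x2, y1 - y2)      # force on point 1; point 2 feels the opposite
    out = []
    for _ in range(steps):
        x1 += u1 * dt + 0.5 * fx / m1 * dt * dt
        y1 += w1 * dt + 0.5 * fy / m1 * dt * dt
        x2 += u2 * dt - 0.5 * fx / m2 * dt * dt
        y2 += w2 * dt - 0.5 * fy / m2 * dt * dt
        gx, gy = force(x1 - x2, y1 - y2)
        u1 += 0.5 * (fx + gx) / m1 * dt
        w1 += 0.5 * (fy + gy) / m1 * dt
        u2 -= 0.5 * (fx + gx) / m2 * dt
        w2 -= 0.5 * (fy + gy) / m2 * dt
        fx, fy = gx, gy
        out.append((x1 - x2, y1 - y2))
    return out

def one_body(m, r, v, dt, steps):
    """The same scheme for a single point of mass m in the field U(|r|)."""
    (x, y), (u, w) = r, v
    fx, fy = force(x, y)
    out = []
    for _ in range(steps):
        x += u * dt + 0.5 * fx / m * dt * dt
        y += w * dt + 0.5 * fy / m * dt * dt
        gx, gy = force(x, y)
        u += 0.5 * (fx + gx) / m * dt
        w += 0.5 * (fy + gy) / m * dt
        fx, fy = gx, gy
        out.append((x, y))
    return out

m1, m2 = 3.0, 1.0
m = m1 * m2 / (m1 + m2)                   # reduced mass from the theorem
pair = two_body(m1, m2, (1.0, 0.0), (0.0, 0.2), (-1.0, 0.0), (0.0, -0.6), 1e-3, 5000)
single = one_body(m, (2.0, 0.0), (0.0, 0.8), 1e-3, 5000)
max_gap = max(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(pair, single))
print(max_gap)                            # difference between the two descriptions
```

The two trajectories should agree up to rounding, since the discrete update for r₁ − r₂ involves the masses only through 1/m₁ + 1/m₂ = 1/m.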

Problem. Determine the major semi-axis of the ellipse which the center of the earth describes around the common center of mass of the earth and the moon. Where is this center of mass, inside the earth or outside? (The mass of the moon is 1/81 times the mass of the earth.)

11 The method of similarity

In some cases it is possible to obtain important information from the form of the equations of motion without solving them, by using the methods of similarity and dimension. The main idea in these methods is to choose a change of scale (of time, length, mass, etc.) under which the equations of motion preserve their form.

A Example

Let r(t) satisfy the equation m(d²r/dt²) = −(∂U/∂r). We set t₁ = αt and m₁ = α²m. Then r(t₁) satisfies the equation

$$m_1 \frac{d^2 r}{dt_1^2} = -\frac{\partial U}{\partial r}.$$

In other words:

If the mass of a point is decreased by a factor of 4, then the point can travel the same orbit in the same force field twice as fast.²²
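This scaling statement can be tried out numerically. The sketch below is an illustration under stated assumptions: a one-dimensional motion in the assumed potential U(x) = x⁴/4, a velocity-Verlet integrator, and hypothetical initial data. A point of mass m/4, started at the same place with twice the velocity, is integrated with half the time step, so that its k-th step corresponds to time k·dt/2 while the k-th step of the original motion corresponds to time k·dt.

```python
def verlet(mass, x, v, dt, steps, grad_u):
    """Velocity-Verlet integration of mass * x'' = -dU/dx in one dimension."""
    a = -grad_u(x) / mass
    path = []
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -grad_u(x) / mass
        v += 0.5 * (a + a_new) * dt
        a = a_new
        path.append(x)
    return path

def grad_u(x):
    """dU/dx for the assumed potential U(x) = x**4 / 4."""
    return x ** 3

m, x0, v0, dt, steps = 1.0, 1.0, 0.5, 1e-3, 4000
alpha = 0.5                         # t1 = alpha * t, m1 = alpha**2 * m = m / 4

slow = verlet(m, x0, v0, dt, steps, grad_u)
fast = verlet(alpha ** 2 * m, x0, v0 / alpha, alpha * dt, steps, grad_u)

# Same positions are visited in the same order, in half the time.
max_gap = max(abs(a - b) for a, b in zip(slow, fast))
print(max_gap)
```

The potential does not depend on the mass here, in line with the assumption stated in the footnote.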

B A problem

Suppose that the potential energy of a central field is a homogeneous function of degree ν:

$$U(\alpha r) = \alpha^{\nu} U(r) \quad \text{for any } \alpha > 0.$$

Show that if a curve γ is the orbit of a motion, then the homothetic curve αγ is also an orbit (under the appropriate initial conditions). Determine the ratio of the circulation times along these orbits. Deduce from this the isochronicity of the oscillation of a pendulum (ν = 2) and Kepler’s third law (ν = −1).

Problem. If the radius of a planet is α times the radius of the earth and its mass β times that of the earth, find the ratio of the acceleration of the force of gravity and the first and second cosmic velocities to the corresponding quantities for the earth.

Answer. γ = βα⁻², δ = √(β/α).

For the moon, for example, α = 1/3.7 and β = 1/81. Therefore, the acceleration of gravity is about 1/6 that of the earth (γ ≈ 1/6), and the cosmic velocities are about 1/5 those for the earth (δ ≈ 1/4.7).
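The numbers quoted for the moon can be reproduced in a few lines. This is a small sketch with a hypothetical helper name; it assumes that the acceleration of gravity scales as M/R² and that both cosmic velocities scale as √(gR), so that their common ratio is δ = √(β/α).

```python
import math

def surface_ratios(alpha, beta):
    """Ratios to the earth's values for a planet of radius alpha*R and mass beta*M.

    Gravity g ~ M / R**2 gives gamma = beta / alpha**2; both cosmic velocities
    scale as sqrt(g * R), which gives delta = sqrt(beta / alpha)."""
    gamma = beta / alpha ** 2
    delta = math.sqrt(beta / alpha)
    return gamma, delta

gamma, delta = surface_ratios(alpha=1 / 3.7, beta=1 / 81)
print(1 / gamma, 1 / delta)   # compare with the values 6 and 4.7 quoted in the text
```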

Problem.²³ A desert animal has to cover great distances between sources of water. How does the maximal time the animal can run depend on the size L of the animal?

Answer. It is directly proportional to L.

Solution. The store of water is proportional to the volume of the body, i.e., L³; the evaporation is proportional to the surface area, i.e., L². Therefore, the maximal time of a run from one source to another is directly proportional to L.

We notice that the maximal distance an animal can run also grows proportionally to L (cf. the following problem).

Problem.²⁴ How does the running velocity of an animal on level ground and uphill depend on the size L of the animal?

Answer. On level ground ∼ L⁰, uphill ∼ L⁻¹.

Solution. The power developed by the animal is proportional to L² (the percentage used by muscle is constant at about 25 %, the other 75 % of the chemical energy is converted to heat; the heat output is proportional to the body surface, i.e., L², which means that the effective power is proportional to L²).

The force of air resistance is directly proportional to the square of the velocity and the area of a cross-section; the power spent on overcoming it is therefore proportional to v²L²v. Therefore, v³L² ∼ L², so v ∼ L⁰. In fact, the running velocity on level ground, no smaller for a rabbit than for a horse, in practice does not specifically depend on the size.

The power necessary to run uphill is mgv ∼ L³v; since the generated power is ∼L², we find that v ∼ L^{− 1}. In fact, a dog easily runs up a hill, while a horse slows its pace.

Problem.²⁴ᵃ How does the height of an animal’s jump depend on its size?

Answer. ∼L⁰.

Solution. For a jump of height h one needs energy proportional to L³h, and the work accomplished by muscular strength F is proportional to FL. The force F is proportional to L² (since the strength of bones is proportional to their section). Therefore, L³h ∼ L²L, i.e., the height of a jump does not depend on the size of the animal. In fact, a jerboa and a kangaroo can jump to approximately the same height.

see footnote on p. 11.

¹¹

Here we assume for simplicity that the solution φ is defined on the whole time axis ℝ.

¹²

For a definition, see, e.g., p. 155 of Ordinary Differential Equations by V. I. Arnold, MIT Press, 1973.

¹³

The only exception is the case when the period does not depend on the energy.

¹⁴

In cartesian coordinates on the plane E²,

and

¹⁵

With the usual limitations.

¹⁶

Including reflections.

¹⁷

Let a drop of tea fall into a glass of tea close to the center. The waves collect at the symmetric point. The reason is that, by the focal definition of an ellipse, waves radiating from one focus of the ellipse collect at the other.

¹⁸

By planets we mean here points in a central field.

¹⁹

This problem is taken from V. V. Beletskii’s delightful book, “Sketches on the Motion of Celestial Bodies,” Nauka, 1972.

²⁰

The case M = 0 is left to the reader.

²¹

The moment of force is also called the torque [Trans. note].

²²

Here we are assuming that U does not depend on m. In the field of gravity, the potential energy U is proportional to m, and therefore the acceleration does not depend on the mass m of the moving point.

²³

J. M. Smith, Mathematical Ideas in Biology. Cambridge University Press, 1968.

²⁴

Ibid.

²⁴ᵃ

Ibid.

Part II
Lagrangian Mechanics

3
Variational principles

In this chapter we show that the motions of a newtonian potential system are extremals of a variational principle, “Hamilton’s principle of least action.”

This fact has many important consequences, including a quick method for writing equations of motion in curvilinear coordinate systems, and a series of qualitative deductions—for example, a theorem on returning to a neighborhood of the initial point.

In this chapter we will use an n-dimensional coordinate space. A vector in such a space is a set of numbers x = (x₁,..., x_n). Similarly, ∂f/∂x means (∂f/∂x₁,..., ∂f/∂x_n), and (a, b) = a₁b₁ + ⋯ + a_nb_n.

12 Calculus of variations

For what follows, we will need some facts from the calculus of variations. A more detailed exposition can be found in “A Course in the Calculus of Variations” by M. A. Lavrentiev and L. A. Lusternik, M. L., 1938, or G. E. Shilov, “Elementary Functional Analysis,” MIT Press, 1974.

The calculus of variations is concerned with the extremals of functions whose domain is an infinite-dimensional space: the space of curves. Such functions are called functionals.

An example of a functional is the length of a curve in the euclidean plane: if γ = {(t, x): x(t) = x, t₀ ≤ t ≤ t₁}, then

$$\Phi(\gamma) = \int_{t_0}^{t_1} \sqrt{1 + \dot x^2}\, dt.$$

In general, a functional is any mapping from the space of curves to the real numbers.

We consider an “approximation” γ′ to γ, γ′ = {(t, x): x = x(t) + h(t)}. We will call it γ′ = γ + h. Consider the increment of Φ, Φ(γ + h) − Φ(γ) (Figure 41).

Figure 41

Variation of a curve

A Variations

Definition. A functional Φ is called differentiable²⁶ if Φ(γ + h) − Φ(γ) = F + R, where F depends linearly on h (i.e., for a fixed γ, F(h₁ + h₂) = F(h₁) + F(h₂) and F(ch) = cF(h)), and R(h, γ) = O(h²) in the sense that, for |h| < ε and |dh/dt| < ε, we have |R| < Cε². The linear part of the increment, F(h), is called the differential.

It can be shown that if Φ is differentiable, its differential is uniquely defined. The differential of a functional is also called its variation, and h is called a variation of the curve.

Example. Let γ = {(t, x): x = x(t), t₀ ≤ t ≤ t₁} be a curve in the (t, x)-plane; ẋ = dx/dt; L = L(a, b, c) a differentiable function of three variables. We define a functional Φ by

$$\Phi(\gamma) = \int_{t_0}^{t_1} L(x(t), \dot x(t), t)\, dt.$$

In case L = √(1 + b²), we get the length of γ.

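Theorem. The functional

$$\Phi(\gamma) = \int_{t_0}^{t_1} L(x, \dot x, t)\, dt$$

is differentiable, and its derivative is given by the formula

$$F(h) = \int_{t_0}^{t_1} \left[ \frac{\partial L}{\partial x} - \frac{d}{dt} \frac{\partial L}{\partial \dot x} \right] h\, dt + \left. \left( \frac{\partial L}{\partial \dot x} h \right) \right|_{t_0}^{t_1}.$$

Proof.

$$\Phi(\gamma + h) - \Phi(\gamma) = \int_{t_0}^{t_1} \left[ L(x + h, \dot x + \dot h, t) - L(x, \dot x, t) \right] dt = \int_{t_0}^{t_1} \left[ \frac{\partial L}{\partial x} h + \frac{\partial L}{\partial \dot x} \dot h \right] dt + O(h^2) = F(h) + R,$$

where

$$F(h) = \int_{t_0}^{t_1} \left( \frac{\partial L}{\partial x} h + \frac{\partial L}{\partial \dot x} \dot h \right) dt \quad \text{and} \quad R = O(h^2).$$

Integrating by parts, we find that

$$\int_{t_0}^{t_1} \frac{\partial L}{\partial \dot x} \dot h\, dt = -\int_{t_0}^{t_1} h \frac{d}{dt} \left( \frac{\partial L}{\partial \dot x} \right) dt + \left. \left( h \frac{\partial L}{\partial \dot x} \right) \right|_{t_0}^{t_1}.$$

□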

B Extremals

Definition. An extremal of a differentiable functional Φ(γ) is a curve γ such that F(h) = 0 for all h.

(In exactly the same way that γ is a stationary point of a function if the differential is equal to zero at that point.)

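Theorem. The curve γ: x = x(t) is an extremal of the functional

$$\Phi(\gamma) = \int_{t_0}^{t_1} L(x, \dot x, t)\, dt$$

on the space of curves passing through the points x(t₀) = x₀ and x(t₁) = x₁, if and only if

$$\frac{d}{dt} \left( \frac{\partial L}{\partial \dot x} \right) - \frac{\partial L}{\partial x} = 0 \quad \text{along the curve } x(t).$$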

Lemma. If a continuous function f(t), t₀ ≤ t ≤ t₁ satisfies

$$\int_{t_0}^{t_1} f(t) h(t)\, dt = 0$$

for any continuous²⁷ function h(t) with h(t₀) = h(t₁) = 0, then f(t) ≡ 0.

Proof of the lemma. Let f(t*) > 0 for some t*, t₀ < t* < t₁. Since f is continuous, f(t) > c > 0 in some neighborhood Δ of the point t*: t₀ < t* − d < t < t* + d < t₁. Let h(t) be a continuous function such that h(t) = 0 outside Δ, h(t) > 0 in Δ, and h(t) = 1 in Δ/2 (i.e., for t such that t* − d/2 < t < t* + d/2). Then, clearly,

$$\int_{t_0}^{t_1} f(t) h(t)\, dt \geq dc > 0$$

(Figure 42). This contradiction, together with the same argument applied to −f, shows that f(t*) = 0 for all t*, t₀ < t* < t₁. □

Figure 42

Construction of the function h

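Proof of the theorem. By the preceding theorem,

$$F(h) = \int_{t_0}^{t_1} \left[ \frac{\partial L}{\partial x} - \frac{d}{dt} \frac{\partial L}{\partial \dot x} \right] h\, dt + \left. \left( \frac{\partial L}{\partial \dot x} h \right) \right|_{t_0}^{t_1}.$$

The term after the integral is equal to zero since h(t₀) = h(t₁) = 0. If γ is an extremal, then F(h) = 0 for all h with h(t₀) = h(t₁) = 0. Therefore,

$$\int_{t_0}^{t_1} f(t) h(t)\, dt = 0,$$

where

$$f(t) = \frac{d}{dt} \left( \frac{\partial L}{\partial \dot x} \right) - \frac{\partial L}{\partial x},$$

for all such h. By the lemma, f(t) ≡ 0. Conversely, if f(t) ≡ 0, then clearly F(h) ≡ 0. □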

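Example. We verify that the extremals of length are straight lines. We have:

$$L = \sqrt{1 + \dot x^2}, \quad \frac{\partial L}{\partial x} = 0, \quad \frac{\partial L}{\partial \dot x} = \frac{\dot x}{\sqrt{1 + \dot x^2}}, \quad \frac{d}{dt} \left( \frac{\dot x}{\sqrt{1 + \dot x^2}} \right) = 0,$$

$$\frac{\dot x}{\sqrt{1 + \dot x^2}} = c, \quad \dot x = c_1, \quad x = c_1 t + c_2.$$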
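The statement that a straight line is an extremal of length can also be probed numerically. The following is a minimal sketch, with the curve, the variation h, and the polyline approximation all chosen here for illustration: it measures the increment Φ(γ + εh) − Φ(γ) for the line x = t on 0 ≤ t ≤ 1 and a variation vanishing at the endpoints. If the differential F(h) is zero, the increment should be of second order in ε, so halving ε should divide it by about 4.

```python
import math

def length(x, n=1000):
    """Length of the curve {(t, x(t)) : 0 <= t <= 1}, approximated by a polyline."""
    total = 0.0
    for k in range(n):
        t0, t1 = k / n, (k + 1) / n
        total += math.hypot(t1 - t0, x(t1) - x(t0))
    return total

def line(t):
    """Straight line from (0, 0) to (1, 1)."""
    return t

def h(t):
    """A variation of the curve with h(0) = h(1) = 0."""
    return math.sin(math.pi * t)

def increment(eps):
    """Phi(gamma + eps*h) - Phi(gamma) for the straight line gamma."""
    return length(lambda t: line(t) + eps * h(t)) - length(line)

inc_big, inc_small = increment(0.1), increment(0.05)
print(inc_big, inc_small, inc_big / inc_small)   # ratio near 4: no linear term in eps
```

Both increments are positive, which is consistent with the line being not only an extremal but a minimum of the length.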

C The Euler-Lagrange equation

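Definition. The equation

$$\frac{d}{dt} \left( \frac{\partial L}{\partial \dot x} \right) - \frac{\partial L}{\partial x} = 0$$

is called the Euler-Lagrange equation for the functional

$$\Phi = \int_{t_0}^{t_1} L(x, \dot x, t)\, dt.$$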

Now let x be a vector in the n-dimensional coordinate space ℝⁿ, γ = {(t, x): x = x(t), t₀ ≤ t ≤ t₁} a curve in the (n + 1)-dimensional space ℝ × ℝⁿ, and L: ℝⁿ × ℝⁿ × ℝ → ℝ a function of 2n + 1 variables. As before, we show:

Theorem. The curve γ is an extremal of the functional

$$\Phi(\gamma) = \int_{t_0}^{t_1} L(x, \dot x, t)\, dt$$

on the space of curves joining (t₀, x₀) and (t₁, x₁), if and only if the Euler-Lagrange equation is satisfied along γ.

This is a system of n second-order equations, and the solution depends on 2n arbitrary constants. The 2n conditions x(t₀) = x₀, x(t₁) = x₁ are used for finding them.

Problem. Cite examples where there are many extremals connecting two given points, and others where there are none at all.

D An important remark

The condition for a curve γ to be an extremal of a functional does not depend on the choice of coordinate system.

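For example, the same functional (the length of a curve) is given in cartesian and polar coordinates by the different formulas

$$\Phi_{\text{cart}} = \int_{t_0}^{t_1} \sqrt{\dot x_1^2 + \dot x_2^2}\, dt, \qquad \Phi_{\text{pol}} = \int_{t_0}^{t_1} \sqrt{\dot r^2 + r^2 \dot\varphi^2}\, dt.$$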

The extremals are the same—straight lines in the plane. The equations of lines in cartesian and polar coordinates are given by different functions: x₁ = x₁(t), x₂ = x₂(t), and r = f(t), φ = φ(t).

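However, both these vector functions satisfy the Euler-Lagrange equation

$$\frac{d}{dt} \left( \frac{\partial L}{\partial \dot x} \right) - \frac{\partial L}{\partial x} = 0,$$

only, in the first case, when x_cart = x₁, x₂ and L_cart = √(ẋ₁² + ẋ₂²), and in the second case when x_pol = r, φ and L_pol = √(ṙ² + r²φ̇²).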

In this way we can easily describe in any coordinates a differential equation for the family of all straight lines.

Problem. Find the differential equation for the family of all straight lines in the plane in polar coordinates.

13 Lagrange’s equations

Here we indicate the variational principle whose extremals are solutions of Newton’s equations of motion in a potential system.

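We compare Newton’s equations of dynamics

$$\frac{d}{dt} (m_i \dot r_i) + \frac{\partial U}{\partial r_i} = 0 \tag{1}$$

with the Euler-Lagrange equation

$$\frac{d}{dt} \frac{\partial L}{\partial \dot x} - \frac{\partial L}{\partial x} = 0.$$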

A Hamilton’s principle of least action

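Theorem. Motions of the mechanical system (1) coincide with extremals of the functional

$$\Phi(\gamma) = \int_{t_0}^{t_1} L\, dt, \quad \text{where } L = T - U$$

is the difference between the kinetic and potential energy.

Proof. Since U = U(r) and T = Σ m_i ṙ_i²/2, we have

$$\frac{\partial L}{\partial \dot r_i} = \frac{\partial T}{\partial \dot r_i} = m_i \dot r_i$$

and ∂L/∂r_i = −∂U/∂r_i. □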

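Corollary. Let (q₁,..., q_3n) be any coordinates in the configuration space of a system of n mass points. Then the evolution of q with time is subject to the Euler-Lagrange equations

$$\frac{d}{dt} \left( \frac{\partial L}{\partial \dot q} \right) - \frac{\partial L}{\partial q} = 0, \quad \text{where } L = T - U.$$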

Proof. By the theorem above, a motion is an extremal of the functional ∫ L dt. Therefore, in any system of coordinates the Euler-Lagrange equation written in that coordinate system is satisfied. □

Definition. In mechanics we use the following terminology: L(q, q̇, t) = T − U is the Lagrange function or lagrangian, q_i are the generalized coordinates, q̇_i are generalized velocities, ∂L/∂q̇_i = p_i are generalized momenta, ∂L/∂q_i are generalized forces, ∫ L(q, q̇, t) dt (taken from t₀ to t₁) is the action, and (d/dt)(∂L/∂q̇_i) − (∂L/∂q_i) = 0 are Lagrange’s equations.

The last theorem is called “Hamilton’s form of the principle of least action” because in many cases the motion q(t) is not only an extremal but is also a minimum value of the action functional ∫ L dt.

B The simplest examples

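Example 1. For a free mass point in E³,

$$L = T = \frac{m \dot r^2}{2};$$

in cartesian coordinates q_i = r_i we find

$$L = \frac{m}{2} (\dot q_1^2 + \dot q_2^2 + \dot q_3^2).$$

Here the generalized velocities are the components of the velocity vector, the generalized momenta p_i = mq̇_i are the components of the momentum vector, and Lagrange’s equations coincide with Newton’s equations dp/dt = 0. The extremals are straight lines. It follows from Hamilton’s principle that straight lines are not only shortest, i.e., extremals of the length

$$\int_{t_0}^{t_1} \sqrt{\dot q_1^2 + \dot q_2^2 + \dot q_3^2}\, dt,$$

but also extremals of the action

$$\int_{t_0}^{t_1} (\dot q_1^2 + \dot q_2^2 + \dot q_3^2)\, dt.$$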

Problem. Show that this extremum is a minimum.

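Example 2. We consider planar motion in a central field in polar coordinates q₁ = r, q₂ = φ. From the relation

$$\dot{\mathbf r} = \dot r\, e_r + r \dot\varphi\, e_\varphi$$

we find the kinetic energy

$$T = \tfrac12 m \dot{\mathbf r}^2 = \tfrac12 m (\dot r^2 + r^2 \dot\varphi^2)$$

and the lagrangian

$$L(q, \dot q) = T(q, \dot q) - U(q),$$

where U = U(q₁).

The generalized momenta will be p = ∂L/∂q̇, i.e.,

$$p_1 = m \dot r, \qquad p_2 = m r^2 \dot\varphi.$$

The first Lagrange equation ṗ₁ = ∂L/∂q₁ takes the form

$$m \ddot r = m r \dot\varphi^2 - \frac{\partial U}{\partial r}.$$

We already obtained this equation in Section 8.

Since q₂ = φ does not enter into L, we have ∂L/∂q₂ = 0. Therefore, the second Lagrange equation will be ṗ₂ = 0, i.e., p₂ = const. This is the law of conservation of angular momentum.

In general, when the field is not central (U = U(r, φ)), we find

$$\dot p_2 = -\frac{\partial U}{\partial \varphi}.$$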

This equation can be rewritten in the form d(M, e_z)/dt = N, where N = ([r, F], e_z) and F = −∂U/∂r. (The rate of change in angular momentum relative to the z axis is equal to the moment of the force F relative to the z axis.)

In fact, we have dU = (∂U/∂r)dr + (∂U/∂φ)dφ = −(F, dr) = −(F, e_r)dr − r(F, e_φ)dφ; therefore, −∂U/∂φ = r(F, e_φ) = r([e_r, F], e_z) = ([r, F], e_z).

This example suggests the following generalization of the law of conservation of angular momentum.

Definition. A coordinate q_i is called cyclic if it does not enter into the lagrangian: ∂L/∂q_i = 0.

Theorem. The generalized momentum corresponding to a cyclic coordinate is conserved: p_i = const.

Proof. By Lagrange’s equation dp_i/dt = ∂L/∂q_i = 0. □
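For the planar central field the cyclic coordinate is the angle φ, and the conserved momentum is p_φ = mr²φ̇ = m(xẏ − yẋ). The sketch below checks this numerically under stated assumptions: the central potential U = |r|⁴/4, the initial data, and the function name are illustrative choices, and the motion is integrated in cartesian coordinates with a velocity-Verlet scheme, so the conservation is not built into the coordinates.

```python
def angular_momentum_drift(steps=5000, dt=1e-3, m=1.0):
    """Integrate m r'' = -dU/dr for the assumed central potential U = |r|**4 / 4
    and return (initial p_phi, largest deviation of p_phi along the motion)."""
    x, y, vx, vy = 1.0, 0.0, 0.3, 0.9

    def acc(x, y):
        r2 = x * x + y * y
        return -r2 * x / m, -r2 * y / m      # -dU/dr = -|r|^2 r, divided by m

    ax, ay = acc(x, y)
    p0 = m * (x * vy - y * vx)               # p_phi = m r^2 phi' in cartesian form
    drift = 0.0
    for _ in range(steps):
        x += vx * dt + 0.5 * ax * dt * dt
        y += vy * dt + 0.5 * ay * dt * dt
        bx, by = acc(x, y)
        vx += 0.5 * (ax + bx) * dt
        vy += 0.5 * (ay + by) * dt
        ax, ay = bx, by
        drift = max(drift, abs(m * (x * vy - y * vx) - p0))
    return p0, drift

p0, drift = angular_momentum_drift()
print(p0, drift)
```

The deviation stays at the level of rounding error, because each substep of this scheme changes the velocity only along r and the position only along the velocity.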

14 Legendre transformations

The Legendre transformation is a very useful mathematical tool: it transforms functions on a vector space to functions on the dual space. Legendre transformations are related to projective duality and tangential coordinates in algebraic geometry and the construction of dual Banach spaces in analysis. They are often encountered in physics (for example, in the definition of thermodynamic quantities).

A Definition

Let y = f(x) be a convex function, f″(x) > 0.

The Legendre transformation of the function f is a new function g of a new variable p, which is constructed in the following way (Figure 43). We draw the graph of f in the x, y plane. Let p be a given number. Consider the straight line y = px. We take the point x = x(p) at which the curve is farthest from the straight line in the vertical direction: for each p the function px − f(x) = F(p, x) has a maximum with respect to x at the point x(p). Now we define g(p) = F(p, x(p)).

Figure 43

Legendre transformation

The point x(p) is defined by the extremal condition ∂F/∂x = 0, i.e., f′(x) = p. Since f is convex, the point x(p) is unique.²⁸

Problem. Show that the domain of g can be a point, a closed interval, or a ray if f is defined on the whole x axis. Prove that if f is defined on a closed interval, then g is defined on the whole p axis.

B Examples

Example 1. Let f(x) = x². Then F(p, x) = px − x², x(p) = p/2, g(p) = p²/4.

Example 2. Let f(x) = mx²/2. Then g(p) = p²/2m.

Example 3. Let f(x) = x^α/α. Then g(p) = p^β/β, where (1/α) + (1/β) = 1 (α > 1, β > 1).

Example 4. Let f(x) be a convex polygon. Then g(p) is also a convex polygon, in which the vertices of f(x) correspond to the edges of g(p), and the edges of f(x) to the vertices of g(p). For example, the corner depicted in Figure 44 is transformed to a segment under the Legendre transformation.

Figure 44

Legendre transformation taking an angle to a line segment
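Examples 1 to 3 can be reproduced by computing the maximum of px − f(x) directly. The following is a minimal sketch, not a general-purpose routine: the function name, the search interval, and the use of ternary search are choices made here, and the search is valid only because px − f(x) has a single maximum when f is convex.

```python
def legendre(f, p, lo=-50.0, hi=50.0, iters=200):
    """g(p) = max over x in [lo, hi] of p*x - f(x), by ternary search.

    Assumes f is convex on [lo, hi], so x -> p*x - f(x) has a single maximum."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if p * a - f(a) < p * b - f(b):
            lo = a          # the maximum lies to the right of a
        else:
            hi = b          # the maximum lies to the left of b
    x = 0.5 * (lo + hi)
    return p * x - f(x)

m = 3.0
for p in (0.5, 2.0, 4.0):
    print(p, legendre(lambda x: x * x, p), p * p / 4)                 # Example 1
    print(p, legendre(lambda x: m * x * x / 2, p), p * p / (2 * m))   # Example 2
    # Example 3 with alpha = 3, beta = 3/2, restricted to x >= 0
    print(p, legendre(lambda x: x ** 3 / 3, p, lo=0.0), p ** 1.5 / 1.5)
```

Each line prints the numerically computed value next to the closed formula from the corresponding example.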

C Involutivity

Let us consider a function f which is differentiable as many times as necessary, with f″(x) > 0. It is easy to verify that a Legendre transformation takes convex functions to convex functions. Therefore, we can apply it twice.

Theorem. The Legendre transformation is involutive, i.e., its square is the identity: if under the Legendre transformation f is taken to g, then the Legendre transform of g will again be f.

Proof. In order to apply the Legendre transform to g, with variable p, we must by definition look at a new independent variable (which we will call x), construct the function

$$G(x, p) = xp - g(p),$$

and find the point p(x) at which G attains its maximum: ∂G/∂p = 0, i.e., g′(p) = x. Then the Legendre transform of g(p) will be the function of x equal to G(x, p(x)).

We will show that G(x, p(x)) = f(x). To this end we notice that G(x, p) = xp − g(p) has a simple geometric interpretation: it is the ordinate of the point with abscissa x on the line tangent to the graph of f(x) with slope p (Figure 45). For fixed p, the function G(x, p) is a linear function of x, with ∂G/∂x = p, and for x = x(p) we have G(x, p) = xp − g(p) = f(x) by the definition of g(p).

Figure 45

Involutivity of the Legendre transformation

Let us now fix x = x₀ and vary p. Then the values of G(x₀, p) will be the ordinates of the points of intersection of the line x = x₀ with the lines tangent to the graph of f(x) with various slopes p. By the convexity of the graph it follows that all these tangents lie below the curve, and therefore the maximum of G(x₀, p) over p is equal to f(x₀) (and is achieved for p = p(x₀) = f′(x₀)). □
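Involutivity can be observed numerically by applying the transformation twice. The sketch below is illustrative only: the test function f(x) = x⁴/4 + x²/2, the intervals, and the helper name are assumptions made here, and the maxima are located by ternary search, which relies on convexity.

```python
def legendre(f, lo, hi, iters=100):
    """Return g with g(p) = max over x in [lo, hi] of p*x - f(x) (f convex)."""
    def g(p):
        a, b = lo, hi
        for _ in range(iters):
            u = a + (b - a) / 3
            v = b - (b - a) / 3
            if p * u - f(u) < p * v - f(v):
                a = u
            else:
                b = v
        x = 0.5 * (a + b)
        return p * x - f(x)
    return g

def f(x):
    """Assumed test function; f'' = 3x^2 + 1 > 0, so f is convex."""
    return x ** 4 / 4 + x ** 2 / 2

g = legendre(f, -3.0, 3.0)            # f' maps [-3, 3] onto [-30, 30]
f_back = legendre(g, -30.0, 30.0)     # the transform of the transform

for x in (-1.5, 0.0, 0.7, 2.0):
    print(x, f(x), f_back(x))         # the last two columns should agree
```

The intervals are chosen so that the maximizing points x(p) and p(x) lie inside them for the sampled values; outside such ranges the restricted maximum would no longer equal the transform.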

Corollary. ²⁹ Consider a given family of straight lines y = px − g(p). Then its envelope has the equation y = f(x), where f is the Legendre transform of g.

D Young’s inequality

Definition. Two functions, f and g, which are the Legendre transforms of one another are called dual in the sense of Young.

By definition of the Legendre transform, F(x, p) = px − f(x) is less than or equal to g(p) for any x and p. From this we have Young’s inequality:

$$px \leq f(x) + g(p).$$

Example 1. If f(x) = x²/2, then g(p) = p²/2 and we obtain the well-known inequality

$$px \leq \frac{x^2}{2} + \frac{p^2}{2}$$

for all x and p.

Example 2. If f(x) = x^α/α, then g(p) = p^β/β, where (1/α) + (1/β) = 1, and we obtain Young’s inequality px ≤ (x^α /α) + (p^β/β) for all x > 0, p > 0, α > 1, β > 1, and (1/α) + (1/β) = 1.
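A quick randomized check of the inequality in Example 2 is sketched below. The sampling ranges, the seed, and the function name are arbitrary choices made here; the check illustrates the inequality and does not prove it.

```python
import random

def young_gap(x, p, alpha):
    """x**alpha/alpha + p**beta/beta - p*x with 1/alpha + 1/beta = 1.

    Young's inequality says this is nonnegative for x > 0, p > 0, alpha > 1."""
    beta = alpha / (alpha - 1)
    return x ** alpha / alpha + p ** beta / beta - p * x

random.seed(0)
worst = min(
    young_gap(random.uniform(0.01, 10), random.uniform(0.01, 10), random.uniform(1.1, 5))
    for _ in range(10_000)
)
print(worst)                  # smallest gap found among the random samples

# Equality is attained at corresponding points p = f'(x) = x**(alpha - 1).
x, alpha = 2.0, 3.0
equality_gap = young_gap(x, x ** (alpha - 1), alpha)
print(equality_gap)
```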

E The case of many variables

Now let f(x) be a convex function of the vector variable x = (x₁,..., x_n) (i.e., the quadratic form ((∂²f/∂x²)dx, dx) is positive definite). Then the Legendre transform is the function g(p) of the vector variable p = (p₁,..., p_n), defined as above by the equalities g(p) = F(p, x(p)) = max_x F(p, x), where F(p, x) = (p, x) − f(x) and p = ∂f/∂x.

All of the above arguments, including Young’s inequality, can be carried over without change to this case.

Problem. Let f: ℝⁿ → ℝ be a convex function. Let ℝ^n* denote the dual vector space. Show that the formulas above completely define the mapping g: ℝ^n* → ℝ (under the condition that the linear form df|_x ranges over all of ℝ^n* when x ranges over ℝⁿ).

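Problem. Let f be the quadratic form f(x) = Σ f_ij x_i x_j. Show that its Legendre transform is again a quadratic form g(p) = Σ g_ij p_i p_j, and that the values of both forms at corresponding points coincide (Figure 46):

$$f(x(p)) = g(p) \quad \text{and} \quad g(p(x)) = f(x).$$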

Figure 46

Legendre transformation of a quadratic form

15 Hamilton’s equations

By means of a Legendre transformation, a lagrangian system of second-order differential equations is converted into a remarkably symmetrical system of 2n first-order equations called a hamiltonian system of equations (or canonical equations).

A Equivalence of Lagrange’s and Hamilton’s equations

We consider the system of Lagrange’s equations ṗ = ∂L/∂q, where p = ∂L/∂q̇, with a given lagrangian function L: ℝⁿ × ℝⁿ × ℝ → ℝ, which we will assume to be convex³⁰ with respect to the second argument q̇.

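Theorem. The system of Lagrange’s equations is equivalent to the system of 2n first-order equations (Hamilton’s equations)

$$\dot p = -\frac{\partial H}{\partial q}, \qquad \dot q = \frac{\partial H}{\partial p},$$

where H(p, q, t) = pq̇ − L(q, q̇, t) is the Legendre transform of the lagrangian function viewed as a function of q̇.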

Proof. By definition, the Legendre transform of L(q, q̇, t) with respect to q̇ is the function H(p) = pq̇ − L(q̇), in which q̇ is expressed in terms of p by the formula p = ∂L/∂q̇, and which depends on the parameters q and t. This function H is called the hamiltonian.

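The total differential of the hamiltonian

$$dH = \frac{\partial H}{\partial p} dp + \frac{\partial H}{\partial q} dq + \frac{\partial H}{\partial t} dt$$

is equal to the total differential of pq̇ − L for p = ∂L/∂q̇:

$$dH = \dot q\, dp - \frac{\partial L}{\partial q} dq - \frac{\partial L}{\partial t} dt.$$

Both expressions for dH must be the same. Therefore,

$$\dot q = \frac{\partial H}{\partial p}, \qquad \frac{\partial H}{\partial q} = -\frac{\partial L}{\partial q}, \qquad \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}.$$

Applying Lagrange’s equations ṗ = ∂L/∂q, we obtain Hamilton’s equations.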

We have seen that, if q(t) satisfies Lagrange’s equations, then (p(t), q(t)) satisfies Hamilton’s equations. The converse is proved in an analogous manner. Therefore, the systems of Lagrange and Hamilton are equivalent. □

Remark. The theorem just proved applies to all variational problems, not just to the lagrangian equations of mechanics.

B Hamilton’s function and energy

Example. Suppose now that the equations are mechanical, so that the lagrangian has the usual form L = T − U, where the kinetic energy T is a quadratic form with respect to q̇:

$$T = \tfrac12 \sum a_{ij} \dot q_i \dot q_j, \qquad a_{ij} = a_{ij}(q, t), \qquad U = U(q).$$

Theorem. Under the given assumptions, the hamiltonian H is the total energy H = T + U.

The proof is based on the following lemma on the Legendre transform of a quadratic form.

Lemma. The values of a quadratic form f(x) and of its Legendre transform g(p) coincide at corresponding points: f(x) = g(p).

Example. For the form f(x) = x² this is a well-known property of a tangent to a parabola. For the form f(x) = mx²/2 we have p = mx and g(p) = p²/2m = mx²/2 = f(x).

Proof of the lemma. By Euler’s theorem on homogeneous functions (∂f/∂x)x = 2f. Therefore, g(p(x)) = px − f(x) = (∂f/∂x)x − f = 2f(x) − f(x) = f(x). □

Proof of the theorem. Reasoning as in the lemma, we find that H = pq̇ − L = 2T − (T − U) = T + U. □

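Example. For one-dimensional motion

$$\ddot q = -\frac{\partial U}{\partial q}.$$

In this case T = q̇²/2, U = U(q), p = q̇, H = p²/2 + U(q), and Hamilton’s equations take the form

$$\dot q = p, \qquad \dot p = -\frac{\partial U}{\partial q}.$$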

This example makes it easy to remember which of Hamilton’s equations has a minus sign.

Several important corollaries follow from the theorem on the equivalence of the equations of motion to a hamiltonian system. For example, the law of conservation of energy takes the simple form:

Corollary 1. dH/dt = ∂H/∂t. In particular, for a system whose hamiltonian function does not depend explicitly on time (∂H/∂t = 0), the law of conservation of the hamiltonian function holds: H(p(t), q(t)) = const.

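Proof. We consider the variation in H along the trajectory H(p(t), q(t), t). Then, by Hamilton’s equations,

$$\frac{dH}{dt} = \frac{\partial H}{\partial p} \left( -\frac{\partial H}{\partial q} \right) + \frac{\partial H}{\partial q} \frac{\partial H}{\partial p} + \frac{\partial H}{\partial t} = \frac{\partial H}{\partial t}.$$

□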
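Conservation of a time-independent hamiltonian along trajectories can be observed numerically. The sketch below is an illustration under stated assumptions: it takes the pendulum hamiltonian H = p²/2 − cos q as an example, integrates Hamilton’s equations with a classical fourth-order Runge-Kutta step, and records the largest deviation of H from its initial value. The integrator is not exactly energy-preserving, so a very small nonzero drift is expected.

```python
import math

def hamilton_rhs(p, q):
    """(dp/dt, dq/dt) = (-dH/dq, dH/dp) for the assumed H = p**2 / 2 - cos(q)."""
    return -math.sin(q), p

def rk4_step(p, q, dt):
    """One classical Runge-Kutta step for Hamilton's equations."""
    k1p, k1q = hamilton_rhs(p, q)
    k2p, k2q = hamilton_rhs(p + 0.5 * dt * k1p, q + 0.5 * dt * k1q)
    k3p, k3q = hamilton_rhs(p + 0.5 * dt * k2p, q + 0.5 * dt * k2q)
    k4p, k4q = hamilton_rhs(p + dt * k3p, q + dt * k3q)
    p += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    return p, q

def H(p, q):
    return 0.5 * p * p - math.cos(q)

p, q = 0.0, 1.0
h0 = H(p, q)
drift = 0.0
for _ in range(2000):
    p, q = rk4_step(p, q, 0.01)
    drift = max(drift, abs(H(p, q) - h0))
print(h0, drift)
```

Note the placement of the minus sign in `hamilton_rhs`: it sits on the equation for ṗ, as in the one-dimensional example.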

C Cyclic coordinates

When considering central fields, we noticed that a problem could be reduced to a one-dimensional problem by the introduction of polar coordinates. It turns out that, given any symmetry of a problem allowing us to choose a system of coordinates q in such a way that the hamiltonian function is independent of some of the coordinates, we can find some first integrals and thereby reduce to a problem in a smaller number of coordinates.

Definition. If a coordinate q₁ does not enter into the hamiltonian function H(p₁, p₂,..., p_n; q₁,..., q_n; t), i.e., ∂H/∂q₁ = 0, then it is called cyclic (the term comes from the particular case of the angular coordinate in a central field).

Clearly, the coordinate q₁ is cyclic if and only if it does not enter into the lagrangian function (∂L/∂q₁ = 0). It follows from the hamiltonian form of the equations of motion that:

Corollary 2. Let q₁ be a cyclic coordinate. Then p₁ is a first integral. In this case the variation of the remaining coordinates with time is the same as in a system with the n − 1 independent coordinates q₂,..., q_n and with hamiltonian function

$$H(p_2, \ldots, p_n, q_2, \ldots, q_n, t, c),$$

depending on the parameter c = p₁.

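Proof. We set p′ = (p₂,..., p_n) and q′ = (q₂,..., q_n). Then Hamilton’s equations take the form

$$\frac{d}{dt} q' = \frac{\partial H}{\partial p'}, \qquad \frac{d}{dt} p' = -\frac{\partial H}{\partial q'}, \qquad \frac{d}{dt} q_1 = \frac{\partial H}{\partial p_1}, \qquad \frac{d}{dt} p_1 = 0.$$

The last equation shows that p₁ = const. Therefore, in the system of equations for p′ and q′, the value of p₁ enters only as a parameter in the hamiltonian function. After this system of 2n − 2 equations is solved, the equation for q₁ takes the form

$$\frac{d}{dt} q_1 = f(t), \quad \text{where } f(t) = \frac{\partial}{\partial p_1} H(p_1, p'(t), q'(t), t),$$

and is easily integrated. □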

Almost all the solved problems in mechanics have been solved by means of Corollary 2.

Corollary 3. Every closed system with two degrees of freedom (n = 2) which has a cyclic coordinate is integrable.

Proof. In this case the system for p′ and q′ is one-dimensional and is immediately integrated by means of the integral H(p′, q′) = c. ☐

16 Liouville’s theorem

The phase flow of Hamilton’s equations preserves phase volume. It follows, for example, that a hamiltonian system cannot be asymptotically stable.

For simplicity we look at the case in which the hamiltonian function does not depend explicitly on the time: H = H(p, q).

A The phase flow

Definition. The 2n-dimensional space with coordinates p₁,..., p_n; q₁,..., q_n is called phase space.

Example. In the case n = 1 this is the phase plane of the system ẍ = −∂U/∂x, which we considered in Section 4.

Just as in this simplest example, the right-hand sides of Hamilton’s equations give a vector field: at each point (p, q) of phase space there is a 2n-dimensional vector (−∂H/∂q, ∂H/∂p). We assume that every solution of Hamilton’s equations can be extended to the whole time axis.³¹

Definition. The phase flow is the one-parameter group of transformations of phase space

$$g^t: (p(0), q(0)) \mapsto (p(t), q(t)),$$

where p(t) and q(t) are solutions of Hamilton’s system of equations (Figure 47).

Figure 47

Phase flow

Problem. Show that {g^t} is a group.

B Liouville’s theorem

Theorem 1. The phase flow preserves volume: for any region D we have (Figure 48)

$$\text{volume of } g^t D = \text{volume of } D.$$

Figure 48

Conservation of volume

We will prove the following slightly more general proposition also due to Liouville.

Suppose we are given a system of ordinary differential equations ẋ = f(x), x = (x₁,..., x_n), whose solution may be extended to the whole time axis. Let {g^t} be the corresponding group of transformations:

$$g^t(x) = x + f(x) t + O(t^2) \quad (t \to 0). \tag{1}$$

Let D(0) be a region in x-space and v(0) its volume;

$$v(t) = \text{volume of } D(t), \qquad D(t) = g^t D(0).$$

Theorem 2. If div f ≡ 0, then g^t preserves volume: v(t) = v(0).

C Proof

Lemma 1. (dv/dt)|_{t = 0} = ∫_D(0) div f dx (dx = dx₁ ⋯ dx_n).

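Proof. For any t, the formula for changing variables in a multiple integral gives

$$v(t) = \int_{D(0)} \det \frac{\partial g^t x}{\partial x}\, dx.$$

Calculating ∂g^tx/∂x by formula (1), we find

$$\frac{\partial g^t x}{\partial x} = E + \frac{\partial f}{\partial x} t + O(t^2) \quad \text{as } t \to 0,$$

where E is the identity matrix.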

We will now use a well-known algebraic fact:

Lemma 2. For any matrix A = (a_ij),

$$\det(E + At) = 1 + t \operatorname{tr} A + O(t^2), \quad t \to 0,$$

where tr A = Σ a_ii is the trace of A (the sum of the diagonal elements).

(The proof of Lemma 2 is obtained by a direct expansion of the determinant: we get 1 and n terms in t; the remaining terms involve t², t³, etc.)

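Using this, we have

$$\det \frac{\partial g^t x}{\partial x} = 1 + t \operatorname{tr} \frac{\partial f}{\partial x} + O(t^2).$$

But tr ∂f/∂x = Σ ∂f_i/∂x_i = div f. Therefore,

$$v(t) = \int_{D(0)} [1 + t \operatorname{div} f + O(t^2)]\, dx,$$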

which proves Lemma 1. □

Proof of theorem 2. Since t = t₀ is no worse than t = 0, Lemma 1 can be written in the form

$$\left. \frac{dv(t)}{dt} \right|_{t = t_0} = \int_{D(t_0)} \operatorname{div} f\, dx,$$

and if div f ≡ 0, dv/dt ≡ 0. □

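In particular, for Hamilton’s equations we have

$$\operatorname{div} f = \frac{\partial}{\partial p} \left( -\frac{\partial H}{\partial q} \right) + \frac{\partial}{\partial q} \left( \frac{\partial H}{\partial p} \right) \equiv 0.$$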

This proves Liouville’s theorem (Theorem 1). □
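Volume preservation can be probed numerically through the Jacobian determinant of the flow map, which is the local ratio of volumes. The following sketch rests on several assumptions made here: the flow is approximated by Runge-Kutta steps, the Jacobian by central differences, and the two planar systems (a pendulum with H = p²/2 − cos q, and a damped oscillator with div f = −0.3) are illustrative examples. For the second, linear system Liouville’s formula gives the determinant exp(−0.3 t), which serves as a contrast.

```python
import math

def flow(f, x, y, t, steps=2000):
    """Approximate g^t(x, y) for the planar system (x', y') = f(x, y) by RK4."""
    dt = t / steps
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = f(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, y

def jacobian_det(f, x, y, t, d=1e-5):
    """det of d(g^t)/d(x, y) by central differences: the local volume ratio."""
    xp, xm = flow(f, x + d, y, t), flow(f, x - d, y, t)
    yp, ym = flow(f, x, y + d, t), flow(f, x, y - d, t)
    a, c = (xp[0] - xm[0]) / (2 * d), (xp[1] - xm[1]) / (2 * d)   # first column
    b, e = (yp[0] - ym[0]) / (2 * d), (yp[1] - ym[1]) / (2 * d)   # second column
    return a * e - b * c

def pendulum(p, q):
    """Hamiltonian field (-dH/dq, dH/dp) for H = p^2/2 - cos q: div f = 0."""
    return -math.sin(q), p

def damped(x, v):
    """x' = v, v' = -x - 0.3 v: div f = -0.3, so volume shrinks."""
    return v, -x - 0.3 * v

T = 3.0
det_hamiltonian = jacobian_det(pendulum, 0.2, 1.0, T)
det_damped = jacobian_det(damped, 0.2, 1.0, T)
print(det_hamiltonian)                      # close to 1
print(det_damped, math.exp(-0.3 * T))       # close to each other, below 1
```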

Problem. Prove Liouville’s formula W = W₀e^{∫ tr A dt} for the Wronskian determinant of the linear system ẋ = A(t)x.

Liouville’s theorem has many applications.

Problem. Show that in a hamiltonian system it is impossible to have asymptotically stable equilibrium positions and asymptotically stable limit cycles in the phase space.

Liouville’s theorem has particularly important applications in statistical mechanics.

Liouville’s theorem allows one to apply methods of ergodic theory³² to the study of mechanics. We consider only the simplest example:

D Poincaré’s recurrence theorem

Let g be a volume-preserving continuous one-to-one mapping which maps a bounded region D of euclidean space onto itself: gD = D.

Then in any neighborhood U of any point of D there is a point x ∈ U which returns to U, i.e., gⁿx ∈ U for some n > 0.

This theorem applies, for example, to the phase flow g^t of a two-dimensional system whose potential U(x₁, x₂) goes to infinity as (x₁, x₂) → ∞; in this case the invariant bounded region in phase space is given by the condition (Figure 49)

$$D = \{(p, q): T + U \leq E\}.$$

Figure 49

The way a ball will move in an asymmetrical cup is unknown; however Poincaré’s theorem predicts that it will return to a neighborhood of the original position.

Poincaré’s theorem can be strengthened, showing that almost every moving point returns repeatedly to the vicinity of its initial position. This is one of the few general conclusions which can be drawn about the character of motion. The details of motion are not known at all, even in the case

$$\ddot x = -\frac{\partial U}{\partial x}, \qquad x = (x_1, x_2).$$

The following prediction is a paradoxical conclusion from the theorems of Poincaré and Liouville: if you open a partition separating a chamber containing gas and a chamber with a vacuum, then after a while the gas molecules will again collect in the first chamber (Figure 50).

Figure 50

Molecules return to the first chamber.

The resolution of the paradox lies in the fact that “a while” may be longer than the duration of the solar system’s existence.

Proof of Poincaré’s theorem. We consider the images of the neighborhood U (Figure 51):

$$U,\ gU,\ g^2 U,\ \ldots,\ g^n U,\ \ldots$$

Figure 51

Theorem on returning

All of these have the same volume. If they never intersected, D would have infinite volume. Therefore, for some k ≥ 0 and l ≥ 0, with k > l,

$$g^k U \cap g^l U \neq \emptyset.$$

Therefore, g^{k − l}U ∩ U ≠ ∅. If y is in this intersection, then y = gⁿx, with x ∈ U (n = k − l). Then x ∈ U and gⁿx ∈ U. □

E Applications of Poincaré’s theorem

Example 1. Let D be a circle and g rotation through an angle α. If α = 2π(m/n), then gⁿ is the identity, and the theorem is obvious. If α is not commensurable with 2π, then Poincaré’s theorem gives

$$\forall \delta > 0,\ \exists n: |g^n x - x| < \delta$$

(Figure 52).

Figure 52

Dense set on the circle

It easily follows that

Theorem. If α ≠ 2π(m/n), then the set of points g^kx is dense³³ on the circle (k = 1, 2,...).
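Both the recurrence and the density statement for rotations of the circle can be explored numerically. The sketch below is illustrative: the angle α = 1 radian (incommensurable with 2π), the tolerance, and the function names are choices made here. The first function finds the smallest n for which gⁿx returns to within δ of x (this does not depend on x for a rotation); the second measures the largest arc of the circle that contains none of the points g^k x, k = 1,..., count.

```python
import math

def first_return(alpha, delta, max_n=10**6):
    """Smallest n > 0 with dist(g^n x, x) < delta on the circle, g = rotation by alpha."""
    two_pi = 2 * math.pi
    for n in range(1, max_n + 1):
        gap = (n * alpha) % two_pi
        gap = min(gap, two_pi - gap)        # arc distance from g^n x back to x
        if gap < delta:
            return n, gap
    return None

def largest_gap(alpha, count, x=0.0):
    """Largest arc left uncovered by the points g^k x, k = 1..count."""
    two_pi = 2 * math.pi
    pts = sorted((x + k * alpha) % two_pi for k in range(1, count + 1))
    gaps = [b - a for a, b in zip(pts, pts[1:])] + [pts[0] + two_pi - pts[-1]]
    return max(gaps)

alpha = 1.0                                 # one radian
print(first_return(alpha, 0.01))
print([largest_gap(alpha, n) for n in (10, 100, 1000)])
```

The shrinking of the largest uncovered arc as more iterates are added is what density of the orbit looks like in a finite experiment.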

Problem. Show that every orbit of motion in a central field with U = r⁴ is either closed or densely fills the ring between two circles.

Example 2. Let D be the two-dimensional torus and φ₁ and φ₂ angular coordinates on it (longitude and latitude) (Figure 53).

Figure 53

Torus

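Consider the system of ordinary differential equations on the torus

$$\dot\varphi_1 = \alpha_1, \qquad \dot\varphi_2 = \alpha_2.$$

Clearly, div f = 0 and the corresponding motion

$$g^t: (\varphi_1, \varphi_2) \to (\varphi_1 + \alpha_1 t, \varphi_2 + \alpha_2 t)$$

preserves the volume dφ₁ dφ₂. From Poincaré’s theorem it is easy to deduce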

Theorem. If α₁/α₂ is irrational, then the “winding line” on the torus, g^t(φ₁, φ₂), is dense in the torus.

Problem. Show that if ω is irrational, then the Lissajous figure (x = cos t, y = cos ωt) is dense in the square |x| ≤ 1, |y| ≤ 1.

Example 3. Let D be the n-dimensional torus Tⁿ, i.e., the direct product³⁴ of n circles:

$$T^n = \underbrace{S^1 \times \cdots \times S^1}_{n}.$$

A point on the n-dimensional torus is given by n angular coordinates φ = (φ₁,..., φ_n). Let α = (α₁,..., α_n), and let g^t be the volume-preserving transformation

$$g^t: T^n \to T^n, \qquad \varphi \to \varphi + \alpha t.$$

Problem. Under which conditions on α are the following sets dense: (a) the trajectory {g^tφ}; (b) the trajectory {g^kφ} (t belongs to the group of real numbers ℝ, k to the group of integers ℤ).

The transformations in Examples 1 to 3 are closely connected to mechanics. But since Poincaré’s theorem is abstract, it also has applications unconnected with mechanics.

Example 4. Consider the first digits of the numbers 2ⁿ: 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4,....

Problem. Does the digit 7 appear in this sequence? Which digit appears more often, 7 or 8? How many times more often?
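Before attacking the problem, one can experiment with the sequence. The sketch below is only an experiment, not a solution: it computes the first digits of 2ⁿ exactly with integer arithmetic for n below an arbitrarily chosen bound of 10 000 and tallies them. The function name and the bound are choices made here; the counts suggest an answer but prove nothing about the limiting frequencies.

```python
from collections import Counter

def leading_digits(count):
    """First decimal digits of 2**n for n = 0, 1, ..., count - 1 (exact integers)."""
    digits, power = [], 1
    for _ in range(count):
        digits.append(int(str(power)[0]))
        power *= 2
    return digits

digits = leading_digits(10_000)
print(digits[:13])                     # the start of the sequence quoted in Example 4
tally = Counter(digits)
print(tally[7], tally[8])              # how often 7 and 8 occur among the first 10 000 terms
print(digits.index(7))                 # the first exponent n for which 2**n starts with 7
```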

²⁶

We should specify the class of curves on which Φ is defined and the linear space which contains h. One could assume, for example, that both spaces consist of the infinitely differentiable functions.

²⁷

Or even for any infinitely differentiable function h.

²⁸

If it exists.

²⁹

One can easily see that this is the theory of “Clairaut’s equation.”

³⁰

In practice this convex function will often be a positive definite quadratic form.

³¹

For this it is sufficient, for example, that the level sets of H be compact.

³²

Cf., for example, the book: Halmos, Lectures on Ergodic Theory, 1956 (Mathematical Society of Japan. Publications. No. 3).

³³

A set A is dense in B if there is a point of A in every neighborhood of every point of B.

³⁴

The direct product of the sets A, B,... is the set of points (a, b,...), with a ∈ A, b ∈ B,....

4
Lagrangian mechanics on manifolds

In this chapter we introduce the concepts of a differentiable manifold and its tangent bundle. A lagrangian function, given on the tangent bundle, defines a lagrangian “holonomic system” on a manifold. Systems of point masses with holonomic constraints (e.g., a pendulum or a rigid body) are special cases.

17 Holonomic constraints

In this paragraph we define the notion of a system of point masses with holonomic constraints.

A Example

Let γ be a smooth curve in the plane. If there is a very strong force field in a neighborhood of γ, directed towards the curve, then a moving point will always be close to γ. In the limit case of an infinite force field, the point must remain on the curve γ. In this case we say that a constraint is put on the system (Figure 54).

Figure 54

Constraint as an infinitely strong field

To formulate this precisely, we introduce curvilinear coordinates q₁ and q₂ on a neighborhood of γ; q₁ is in the direction of γ and q₂ is distance from the curve.

We consider the system with potential energy

$$U_N = N q_2^2 + U_0(q_1, q_2),$$

depending on the parameter N (which we will let tend to infinity) (Figure 55).

Figure 55

Potential energy U_N

We consider the initial conditions on γ:

$$q_1(0) = q_1^0, \qquad \dot q_1(0) = \dot q_1^0, \qquad q_2(0) = 0, \qquad \dot q_2(0) = 0.$$

Denote by q₁ = φ(t, N) the evolution of the coordinate q₁ under a motion with these initial conditions in the field U_N.

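Theorem. The following limit exists, as N → ∞:

$$\lim_{N \to \infty} \varphi(t, N) = \psi(t).$$

The limit q₁ = ψ(t) satisfies Lagrange’s equation

$$\frac{d}{dt}\frac{\partial L_*}{\partial \dot q_1} = \frac{\partial L_*}{\partial q_1},$$

where

$$L_*(q_1, \dot q_1) = T|_{q_2 = \dot q_2 = 0} - U_0|_{q_2 = 0}$$

(T is the kinetic energy of motion along γ).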

Thus, as N → ∞, Lagrange’s equations for q₁ and q₂ induce Lagrange’s equation for q₁ = ψ(t).

We obtain exactly the same result if we replace the plane by the 3n-dimensional configuration space of n points, consisting of a mechanical system with metric

$$ds^2 = \sum_{i=1}^{n} m_i\, dr_i^2$$

(the m_i are masses), replace the curve γ by a submanifold of the 3n-dimensional space, replace q₁ by some coordinates q₁ on γ, and replace q₂ by some coordinates q₂ in the directions perpendicular to γ. If the potential energy has the form

$$U = U_0(q_1, q_2) + N q_2^2,$$

then as N → ∞, a motion on γ is defined by Lagrange’s equations with the lagrangian function

$$L_* = T|_{q_2 = \dot q_2 = 0} - U_0|_{q_2 = 0}.$$

B Definition of a system with constraints

We will not prove the theorem above,³⁵ but neither will we use it. We need it only to justify the following.

Definition. Let γ be an m-dimensional surface in the 3n-dimensional configuration space of the points r₁,..., r_n with masses m₁,..., m_n. Let q = (q₁,..., q_m) be some coordinates on γ: r_i = r_i(q). The system described by the equations

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}, \qquad L = \frac{1}{2}\sum_{i} m_i \dot r_i^2 - U(q),$$

is called a system of n points with 3n − m ideal holonomic constraints. The surface γ is called the configuration space of the system with constraints.

If the surface γ is given by k = 3n − m functionally independent equations f₁(r) = 0,..., f_k(r) = 0, then we say that the system is constrained by the relations f₁ = 0,..., f_k = 0.

Holonomic constraints also could have been defined as the limiting case of a system with a large potential energy. The meaning of these constraints in mechanics lies in the experimentally determined fact that many mechanical systems belong to this class more or less exactly.

From now on, for convenience, we will call ideal holonomic constraints simply constraints. Other constraints will not be considered in this book.

18 Differentiable manifolds

The configuration space of a system with constraints is a differentiable manifold. In this paragraph we give the elementary facts about differentiable manifolds.

A Definition of a differentiable manifold

A set M is given the structure of a differentiable manifold if M is provided with a finite or countable collection of charts, so that every point is represented in at least one chart.

A chart is an open set U in the euclidean coordinate space q = (q₁,..., q_n), together with a one-to-one mapping φ of U onto some subset of M, φ: U → φU ⊂ M.

We assume that if points p and p′ in two charts U and U′ have the same image in M, then p and p′ have neighborhoods V ⊂ U and V′ ⊂ U′ with the same image in M (Figure 56). In this way we get a mapping φ′⁻¹ φ: V → V′.

Figure 56

Compatible charts

This is a mapping of the region V of the euclidean space q onto the region V′ of the euclidean space q′, and it is given by n functions of n variables, q′ = q′(q), (q = q(q′)). The charts U and U′ are called compatible if these functions are differentiable.³⁶

An atlas is a union of compatible charts. Two atlases are equivalent if their union is also an atlas.

A differentiable manifold is a class of equivalent atlases. We will consider only connected manifolds.³⁷ Then the number n will be the same for all charts; it is called the dimension of the manifold.

A neighborhood of a point on a manifold is the image under a mapping φ: U → M of a neighborhood of the representation of this point in a chart U. We will assume that every two different points have non-intersecting neighborhoods.

B Examples

Example 1. Euclidean space ℝⁿ is a manifold, with an atlas consisting of one chart.

Example 2. The sphere S² = {(x, y, z): x² + y² + z² = 1} has the structure of a manifold, with atlas, for example, consisting of two charts (U_i, φ_i, i = 1, 2) in stereographic projection (Figure 57). An analogous construction applies to the n-sphere

$$S^n = \{(x_1, \ldots, x_{n+1}): \ \textstyle\sum_i x_i^2 = 1\}.$$

Figure 57
Atlas of a sphere

Example 3. Consider a planar pendulum. Its configuration space—the circle S¹—is a manifold. The usual atlas is furnished by the angular coordinates φ: ℝ¹ → S¹, U₁ = (− π, π), U₂ = (0, 2π) (Figure 58).

Figure 58
Planar, spherical and double planar pendulums

Example 4. The configuration space of the “spherical” mathematical pendulum is the two-dimensional sphere S² (Figure 58).

Example 5. The configuration space of a “planar double pendulum” is the direct product of two circles, i.e., the two-torus T² = S¹ × S¹ (Figure 58).

Example 6. The configuration space of a spherical double pendulum is the direct product of two spheres, S² × S².

Example 7. A rigid line segment in the (q₁, q₂)-plane has for its configuration space the manifold ℝ² × S¹, with coordinates q₁, q₂, q₃ (Figure 59). It is covered by two charts.

Figure 59
Configuration space of a segment in the plane

Example 8. A rigid right triangle OAB moves around the vertex O. The position of the triangle is given by three numbers: the direction OA ∈ S² is given by two numbers, and if OA is given, one can rotate OB ∈ S¹ around the axis OA (Figure 60).

Figure 60
Configuration space of a triangle

Connected with the position of the triangle OAB is an orthogonal right-handed frame, e₁ = OA/|OA|, e₂ = OB/|OB|, e₃ = [e₁, e₂]. The correspondence is one-to-one; therefore the position of the triangle is given by an orthogonal three-by-three matrix with determinant 1.

The set of all three-by-three matrices is the nine-dimensional space ℝ⁹. Six orthogonality conditions select out two three-dimensional connected manifolds of matrices with determinant +1 and −1. The rotations of three-space (determinant +1) form a group, which we call SO(3).

Therefore, the configuration space of the triangle OAB is SO(3).

Problem. Show that SO(3) is homeomorphic to three-dimensional real projective space.

Definition. The dimension of the configuration space is called the number of degrees of freedom.

Example 9. Consider a system of k rods in a closed chain with hinged joints.

Problem. How many degrees of freedom does this system have?

Example 10. Embedded manifolds. We say that M is an embedded k-dimensional sub-manifold of euclidean space ℝⁿ (Figure 61) if in a neighborhood U of every point x ∈ M there are n − k functions f₁: U → ℝ, f₂: U → ℝ,..., f_{n − k}: U → ℝ such that the intersection of U with M is given by the equations f₁ = 0,..., f_{n − k} = 0, and the vectors grad f₁,..., grad f_{n − k} at x are linearly independent.

Figure 61
Embedded submanifold

It is easy to give M the structure of a manifold, i.e., coordinates in a neighborhood of x (how?).

It can be shown that every manifold can be embedded in some euclidean space. In Example 8, SO(3) is a subset of ℝ⁹.

Problem. Show that SO(3) is embedded in ℝ⁹, and at the same time, that SO(3) is a manifold.

C Tangent space

If M is a k-dimensional manifold embedded in Eⁿ, then at every point x we have a k-dimensional tangent space TM_x. Namely, TM_x is the orthogonal complement to {grad f₁,..., grad f_{n − k}} (Figure 62). The vectors of the tangent space TM_x based at x are called tangent vectors to M at x. We can also define these vectors directly as velocity vectors of curves in M:

$$\dot x = \lim_{t \to 0} \frac{\varphi(t) - \varphi(0)}{t}, \qquad \text{where } \varphi(0) = x, \ \varphi(t) \in M.$$

Figure 62

Tangent space

The definition of tangent vectors can also be given in intrinsic terms, independent of the embedding of M into Eⁿ.

We will call two curves x = φ(t) and x = ψ(t) equivalent if φ(0) = ψ(0) = x and lim_{t → 0} (φ(t) − ψ(t))/t = 0 in some chart. Then this tangent relationship is true in any chart (prove this!).

Definition. A tangent vector to a manifold M at the point x is an equivalence class of curves φ(t), with φ(0) = x.

It is easy to define the operations of multiplication of a tangent vector by a number and addition of tangent vectors. The set of tangent vectors to M at x forms a vector space TM_x. This space is also called the tangent space to M at x.

For embedded manifolds the definition above agrees with the previous definition. Its advantage lies in the fact that it also holds for abstract manifolds, not embedded anywhere.

Definition. Let U be a chart of an atlas for M with coordinates q₁,..., q_n. Then the components of the tangent vector to the curve q = φ(t) are the numbers ξ₁,..., ξ_n where ξ_i = (dφ_i/dt)|_{t = 0}.

D The tangent bundle

The union of the tangent spaces to M at the various points, ⋃_{x ∈ M} TM_x, has a natural differentiable manifold structure, the dimension of which is twice the dimension of M.

This manifold is called the tangent bundle of M and is denoted by TM. A point of TM is a vector ξ, tangent to M at some point x. Local coordinates on TM are constructed as follows. Let q₁,..., q_n be local coordinates on M, and ξ₁,..., ξ_n components of a tangent vector in this coordinate system. Then the 2n numbers (q₁,..., q_n, ξ₁,..., ξ_n) give a local coordinate system on TM. One sometimes writes dq_i for ξ_i.

The mapping p: TM → M which takes a tangent vector ξ to the point x ∈ M at which the vector is tangent to M (ξ ∈ TM_x), is called the natural projection. The inverse image of a point x ∈ M under the natural projection, p⁻¹(x), is the tangent space TM_x. This space is called the fiber of the tangent bundle over the point x.

E Riemannian manifolds

If M is a manifold embedded in euclidean space, then the metric on euclidean space allows us to measure the lengths of curves, angles between vectors, volumes, etc. All of these quantities are expressed by means of the lengths of tangent vectors, that is, by the positive-definite quadratic form given on every tangent space TM_x (Figure 63):

$$TM_x \to \mathbb{R}, \qquad \xi \to \langle \xi, \xi \rangle.$$

Figure 63

Riemannian metric

For example, the length of a curve on a manifold is expressed using this form as l(γ) = ∫_γ √〈dx, dx〉, or, if the curve is given parametrically, γ: [t₀, t₁] → M, t → x(t) ∈ M, then

$$l(\gamma) = \int_{t_0}^{t_1} \sqrt{\langle \dot x, \dot x \rangle}\, dt.$$

Definition. A differentiable manifold with a fixed positive-definite quadratic form 〈ξ, ξ〉 on every tangent space TM_x is called a Riemannian manifold. The quadratic form is called the Riemannian metric.

Remark. Let U be a chart of an atlas for M with coordinates q₁,..., q_n. Then a Riemannian metric is given by the formula

$$ds^2 = \sum_{i,j=1}^{n} a_{ij}(q)\, dq_i\, dq_j, \qquad a_{ij} = a_{ji},$$

where dq_i are the coordinates of a tangent vector.

The functions a_ij(q) are assumed to be differentiable as many times as necessary.

F The derivative map

Let f: M → N be a mapping of a manifold M to a manifold N. f is called differentiable if in local coordinates on M and N it is given by differentiable functions.

Definition. The derivative of a differentiable mapping f: M → N at a point x ∈ M is the linear map of the tangent spaces

$$f_{*x}: TM_x \to TN_{f(x)},$$

which is given in the following way (Figure 64):

Figure 64

Derivative of a mapping

Let v ∈ TM_x. Consider a curve φ: ℝ → M with φ(0) = x, and velocity vector (dφ/dt)|_{t = 0} = v. Then f_{*x}v is the velocity vector of the curve f ◦ φ: ℝ → N,

$$f_{*x} v = \left.\frac{d}{dt}\right|_{t = 0} f(\varphi(t)).$$

Problem. Show that the vector f_{*x}v does not depend on the curve φ, but only on the vector v.

Problem. Show that the map f_{*x}: TM_x → TN_{f(x)} is linear.

Problem. Let x = (x₁,..., x_m) be coordinates in a neighborhood of x ∈ M, and y = (y₁,..., y_n) be coordinates in a neighborhood of y ∈ N. Let ξ be the set of components of the vector v, and η the set of components of the vector f_{*x}v. Show that

$$\eta = \frac{\partial y}{\partial x}\,\xi, \qquad \text{i.e.,} \quad \eta_i = \sum_j \frac{\partial y_i}{\partial x_j}\,\xi_j.$$

Taking the union of the mappings f_{*x} for all x, we get a mapping of the whole tangent bundle

$$f_*: TM \to TN, \qquad f_* v = f_{*x} v \ \text{ for } v \in TM_x.$$

Problem. Show that f_* is a differentiable map.

Problem. Let f: M → N, g: N → K, and h = g ◦ f: M → K. Show that h_* = g_* ◦ f_*.
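In local coordinates the derivative map is multiplication by the matrix of partial derivatives, so the last problem becomes the statement that the matrix of h is the product of the matrices of g and f. The following sketch checks this numerically with central differences; the maps `f` and `g`, the point `x0`, and the helper names are hypothetical choices made only for illustration, and a numerical check is of course no substitute for the proof.

```python
import math


def jacobian(func, x, eps=1e-6):
    """Central-difference matrix of partial derivatives d(func_i)/d(x_j) at x."""
    n_out, n_in = len(func(x)), len(x)
    J = [[0.0] * n_in for _ in range(n_out)]
    for j in range(n_in):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(n_out):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def f(x):  # a map from a 2-dimensional chart to a 3-dimensional chart
    u, v = x
    return [u * v, math.sin(u), u + v ** 2]


def g(y):  # a map from a 3-dimensional chart to a 2-dimensional chart
    a, b, c = y
    return [a + b * c, math.exp(a) - c]


def h_map(x):  # the composition h = g o f
    return g(f(x))


x0 = [0.3, -0.7]
lhs = jacobian(h_map, x0)                           # matrix of h_* at x0
rhs = matmul(jacobian(g, f(x0)), jacobian(f, x0))   # matrix of g_* at f(x0) times matrix of f_* at x0
max_diff = max(abs(l - r) for lr, rr in zip(lhs, rhs) for l, r in zip(lr, rr))
```

Here `max_diff` is limited only by the finite-difference error. Note that the matrix of g must be evaluated at f(x0), which is exactly what h_* = g_* ◦ f_* means fiber by fiber.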

19 Lagrangian dynamical systems

In this paragraph we define lagrangian dynamical systems on manifolds. Systems with holonomic constraints are a particular case.

A Definition of a lagrangian system

Let M be a differentiable manifold, TM its tangent bundle, and L: TM → ℝ a differentiable function. A map γ: ℝ → M is called a motion in the lagrangian system with configuration manifold M and lagrangian function L if γ is an extremal of the functional

$$\Phi(\gamma) = \int_{t_0}^{t_1} L(\dot\gamma)\, dt,$$

where γ̇ is the velocity vector, γ̇(t) ∈ TM_{γ(t)}.

Example. Let M be a region in a coordinate space with coordinates q = (q₁,...,q_n). The lagrangian function L: TM → ℝ may be written in the form of a function of the 2n coordinates. As we showed in Section 12, the evolution of coordinates of a point moving with time satisfies Lagrange’s equations.

Theorem. The evolution of the local coordinates q = (q₁,..., q_n) of a point γ(t) under motion in a lagrangian system on a manifold satisfies the Lagrange equations

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q},$$

where L(q, q̇) is the expression for the function L: TM → ℝ in the coordinates q and q̇ on TM.

We often encounter the following special case.

B Natural systems

Let M be a Riemannian manifold. The quadratic form on each tangent space,

$$T = \frac{1}{2}\langle v, v \rangle, \qquad v \in TM_x,$$

is called the kinetic energy. A differentiable function U: M → ℝ is called a potential energy.

Definition. A lagrangian system on a Riemannian manifold is called natural if the lagrangian function is equal to the difference between kinetic and potential energies: L = T − U.

Example. Consider two mass points m₁ and m₂ joined by a line segment of length l in the (x, y)-plane. Then a configuration space of three dimensions,

$$M = \mathbb{R}^2 \times S^1 \subset \mathbb{R}^2 \times \mathbb{R}^2,$$

is defined in the four-dimensional configuration space ℝ² × ℝ² of two free points (x₁, y₁) and (x₂, y₂) by the condition

$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} = l$$

(Figure 65).

Figure 65
Segment in the plane

There is a quadratic form on the tangent space to the four-dimensional space (x₁, x₂, y₁, y₂):

$$m_1(\dot x_1^2 + \dot y_1^2) + m_2(\dot x_2^2 + \dot y_2^2).$$

Our three-dimensional manifold, as it is embedded in the four-dimensional one, is provided with a Riemannian metric. The holonomic system thus obtained is called in mechanics a line segment of fixed length in the (x, y)-plane. The kinetic energy is given by the formula

$$T = m_1\,\frac{\dot x_1^2 + \dot y_1^2}{2} + m_2\,\frac{\dot x_2^2 + \dot y_2^2}{2}.$$

C Systems with holonomic constraints

In Section 17 we defined the notion of a system of point masses with holonomic constraints. We will now show that such a system is natural.

Consider the configuration manifold M of a system with constraints as embedded in the 3n-dimensional configuration space of a system of free points. The metric on the 3n-dimensional space is given by the quadratic form

$$\sum_{i=1}^{n} m_i \dot r_i^2.$$

The embedded Riemannian manifold M with potential energy U coincides with the system defined in Section 17 or with the limiting case of the system with potential

$$U + N q_2^2, \qquad N \to \infty,$$

which grows rapidly outside of M.

D Procedure for solving problems with constraints

1. Determine the configuration manifold and introduce coordinates q₁,..., q_k (in a neighborhood of each of its points).

2. Express the kinetic energy

$$T = \frac{1}{2}\sum_i m_i \dot r_i^2$$

as a quadratic form in the generalized velocities:

$$T = \frac{1}{2}\sum_{i,j} a_{ij}(q)\, \dot q_i \dot q_j.$$

3. Construct the lagrangian function L = T − U(q) and solve Lagrange’s equations.

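Example. We consider the motion of a point mass of mass 1 on a surface of revolution in three-dimensional space. It can be shown that the orbits are geodesics on the surface. In cylindrical coordinates r, φ, z the surface is given (locally) in the form r = r(z) or z = z(r). The kinetic energy has the form (Figure 66)

$$T = \frac{1}{2}(\dot x^2 + \dot y^2 + \dot z^2) = \frac{1}{2}\left[\left(1 + (r'_z)^2\right)\dot z^2 + r^2(z)\,\dot\varphi^2\right]$$

in coordinates φ and z, and

$$T = \frac{1}{2}\left[\left(1 + (z'_r)^2\right)\dot r^2 + r^2\dot\varphi^2\right]$$

in coordinates r and φ. (We have used the identity ẋ² + ẏ² = ṙ² + r²φ̇².)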

Figure 66
Surface of revolution

The lagrangian function L is equal to T. In both coordinate systems φ is a cyclic coordinate. The corresponding momentum is preserved; p_φ = r²φ̇ is nothing other than the z-component of angular momentum. Since the system has two degrees of freedom, knowing the cyclic coordinate φ is sufficient for integrating the problem completely (cf. Corollary 3, Section 15).

We can obtain more easily a clear picture of the orbits by reasoning slightly differently. Denote by α the angle of the orbit with a meridian. We have rφ̇ = |v| sin α, where |v| is the magnitude of the velocity vector (Figure 66).

By the law of conservation of energy, H = L = T is preserved. Therefore, |v| = const, so the conservation law for p_φ takes the form

$$r \sin\alpha = \text{const}$$

(“Clairaut’s theorem”).

This relationship shows that the motion takes place in the region |sin α| ≤ 1, i.e., r ≥ r₀ sin α₀. Furthermore, the inclination of the orbit from the meridian increases as the radius r decreases. When the radius reaches the smallest possible value, r = r₀ sin α₀, the orbit is reflected and returns to the region with larger r (Figure 67).

Figure 67
Geodesics on a surface of revolution
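The reflection of the orbit at the smallest radius can be observed numerically. The sketch below integrates Lagrange’s equations for L = T = ½[(1 + r′(z)²)ż² + r(z)²φ̇²] with a fourth-order Runge-Kutta step and monitors r sin α = r²φ̇/|v|. The profile r(z) = 1 + z², the initial data, the step size, and all function names are assumptions chosen for illustration.

```python
import math


def r(z):            # assumed profile of the surface of revolution
    return 1.0 + z * z


def dr(z):           # r'(z)
    return 2.0 * z


def d2r(z):          # r''(z)
    return 2.0


def rhs(state):
    """Lagrange's equations for L = T, written as a first-order system."""
    z, phi, vz, vphi = state
    az = (r(z) * dr(z) * vphi ** 2 - dr(z) * d2r(z) * vz ** 2) / (1.0 + dr(z) ** 2)
    aphi = -2.0 * dr(z) * vz * vphi / r(z)   # equivalent to d/dt (r^2 * vphi) = 0
    return (vz, vphi, az, aphi)


def rk4_step(state, dt):
    """One classical Runge-Kutta step."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def clairaut(state):
    """The quantity r * sin(alpha) = r^2 * vphi / |v|."""
    z, _, vz, vphi = state
    speed = math.sqrt((1.0 + dr(z) ** 2) * vz ** 2 + (r(z) * vphi) ** 2)
    return r(z) ** 2 * vphi / speed


def simulate(state, dt=1e-3, steps=3000):
    """Return the histories of r * sin(alpha) and of the radius r along the orbit."""
    values, radii = [clairaut(state)], [r(state[0])]
    for _ in range(steps):
        state = rk4_step(state, dt)
        values.append(clairaut(state))
        radii.append(r(state[0]))
    return values, radii


# Start at z = 0.5 moving toward the narrow part of the surface.
values, radii = simulate((0.5, 0.0, -0.2, 1.0))
```

With these initial data the orbit approaches the waist, turns back where the radius equals the conserved value of r sin α, and then moves off to larger r, in agreement with the description above.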

Problem. Show that the geodesics on a convex surface of revolution are divided into three classes: meridians, closed curves, and geodesics dense in a ring r ≥ c.

Problem. Study the behavior of geodesics on the surface of a torus ((r − R)² + z² = ρ²).

E Non-autonomous systems

A lagrangian non-autonomous system differs from the autonomous systems, which we have been studying until now, by the additional dependence of the lagrangian function on time:

$$L: TM \times \mathbb{R} \to \mathbb{R}, \qquad L = L(q, \dot q, t).$$

In particular, both the kinetic and potential energies can depend on time in a non-autonomous natural system:

$$T: TM \times \mathbb{R} \to \mathbb{R}, \quad U: M \times \mathbb{R} \to \mathbb{R}, \qquad T = T(q, \dot q, t), \quad U = U(q, t).$$

A system of n mass points, constrained by holonomic constraints dependent on time, is defined with the help of a time-dependent submanifold of the configuration space of a free system. Such a manifold is given by a mapping

$$i: M \times \mathbb{R} \to E^{3n}, \qquad i(q, t) = x,$$

which, for any fixed t ∈ ℝ, defines an embedding M → E³ⁿ. The formulas of section D remain true for non-autonomous systems.

Example. Consider the motion of a bead along a vertical circle of radius r (Figure 68) which rotates with angular velocity ω around the vertical axis passing through the center O of the circle. The manifold M is the circle. Let q be the angular coordinate on the circle, measured from the highest point.

Figure 68
Bead on a rotating circle

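Let x, y, and z be cartesian coordinates in E³ with origin O and vertical axis z. Let φ be the angle of the plane of the circle with the plane xOz. By hypothesis, φ = ωt. The mapping i: M × ℝ → E³ is given by the formula

$$i(q, t) = (r \sin q \cos \omega t,\ r \sin q \sin \omega t,\ r \cos q).$$

From this formula (or, more simply, from an “infinitesimal right triangle”) we find that

$$T = \frac{m}{2}\left(\omega^2 r^2 \sin^2 q + r^2 \dot q^2\right), \qquad U = m g r \cos q.$$

In this case the lagrangian function L = T − U turns out to be independent of t, although the constraint does depend on time. Furthermore, the lagrangian function turns out to be the same as in the one-dimensional system with kinetic energy

$$T_0 = \frac{M}{2}\dot q^2, \qquad M = m r^2,$$

and with potential energy

$$V = A \cos q - B \sin^2 q, \qquad A = m g r, \quad B = \frac{m}{2}\omega^2 r^2.$$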

The form of the phase portrait depends on the ratio between A and B. For 2B < A (i.e., for a rotation of the circle slow enough that ω²r < g), the lowest position of the bead (q = π) is stable and the characteristics of the motion are generally the same as in the case of a mathematical pendulum (ω = 0).

For 2B > A, i.e., for sufficiently fast rotation of the circle, the lowest position of the bead becomes unstable; on the other hand, two stable positions of the bead appear on the circle, where cos q = −A/2B = −g/ω²r. The behavior of the bead under all possible initial conditions is clear from the shape of the phase curves in the (q, q̇)-plane (Figure 69).

Figure 69
Effective potential energy and phase plane of the bead
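The change of stability can be checked directly from the effective potential. The sketch below assumes V(q) = A cos q − B sin² q with A = mgr and B = ½mω²r² (consistent with the relation cos q = −A/2B = −g/ω²r quoted above) and classifies the equilibria by the sign of V″(q) = −A cos q − 2B cos 2q. The function names and the numerical parameters are illustrative assumptions.

```python
import math


def coefficients(m, g, r, omega):
    """A = m g r and B = (m/2) omega^2 r^2 of the assumed effective potential."""
    return m * g * r, 0.5 * m * omega ** 2 * r ** 2


def second_derivative(q, A, B):
    """V''(q) for V(q) = A cos q - B sin^2 q."""
    return -A * math.cos(q) - 2.0 * B * math.cos(2.0 * q)


def stable_equilibria(m, g, r, omega):
    """Equilibria (zeros of V') in [0, 2*pi) at which V has a strict minimum."""
    A, B = coefficients(m, g, r, omega)
    # V'(q) = -sin q * (A + 2 B cos q) vanishes at q = 0, pi and where cos q = -A / (2 B).
    candidates = [0.0, math.pi]
    if 2.0 * B > A:
        q_star = math.acos(-A / (2.0 * B))
        candidates += [q_star, 2.0 * math.pi - q_star]
    return [q for q in candidates if second_derivative(q, A, B) > 0.0]


slow = stable_equilibria(m=1.0, g=9.81, r=1.0, omega=2.0)   # omega^2 r < g
fast = stable_equilibria(m=1.0, g=9.81, r=1.0, omega=5.0)   # omega^2 r > g
```

For the slow rotation only the lowest point q = π is returned; for the fast rotation the lowest point is rejected and the two symmetric positions with cos q = −g/ω²r are returned instead.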

20 E. Noether’s theorem

Various laws of conservation (of momentum, angular momentum, etc.) are particular cases of one general theorem: to every one-parameter group of diffeomorphisms of the configuration manifold of a lagrangian system which preserves the lagrangian function, there corresponds a first integral of the equations of motion.

A Formulation of the theorem

Let M be a smooth manifold, L: TM → ℝ a smooth function on its tangent bundle TM. Let h: M → M be a smooth map.

Definition. A lagrangian system (M, L) admits the mapping h if for any tangent vector v ∈ TM,

$$L(h_* v) = L(v).$$

Example. Let M = {(x₁, x₂, x₃)}, L = (m/2)(ẋ₁² + ẋ₂² + ẋ₃²) − U(x₂, x₃). The system admits the translation h: (x₁, x₂, x₃) → (x₁ + s, x₂, x₃) along the x₁ axis and does not admit, generally speaking, translations along the x₂ axis.

Noether’s theorem. If the system (M, L) admits the one-parameter group of diffeomorphisms h^s: M → M, s ∈ ℝ, then the lagrangian system of equations corresponding to L has a first integral I: TM → ℝ.

In local coordinates q on M the integral I is written in the form

$$I(q, \dot q) = \frac{\partial L}{\partial \dot q}\,\left.\frac{d h^s(q)}{ds}\right|_{s = 0}.$$

B Proof

First, let M = ℝⁿ be coordinate space. Let φ: ℝ → M, q = φ(t) be a solution to Lagrange’s equations. Since h^s_* preserves L, the translation of a solution, h^s ◦ φ: ℝ → M, also satisfies Lagrange’s equations for any s.³⁸

We consider the mapping Φ: ℝ × ℝ → ℝⁿ, given by q = Φ(s, t) = h^s(φ(t)) (Figure 70).

Figure 70

Noether’s theorem

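We will denote derivatives with respect to t by dots and with respect to s by primes. By hypothesis

$$0 = \frac{\partial L(\Phi, \dot\Phi)}{\partial s} = \frac{\partial L}{\partial q}\,\Phi' + \frac{\partial L}{\partial \dot q}\,\dot\Phi', \tag{1}$$

where the partial derivatives of L are taken at the point q = Φ(s, t), q̇ = Φ̇(s, t).

As we stated above, the mapping Φ|_{s = const}: ℝ → ℝⁿ for any fixed s satisfies Lagrange’s equation

$$\frac{\partial}{\partial t}\left[\frac{\partial L}{\partial \dot q}\bigl(\Phi(s, t), \dot\Phi(s, t)\bigr)\right] = \frac{\partial L}{\partial q}\bigl(\Phi(s, t), \dot\Phi(s, t)\bigr).$$

We introduce the notation F(s, t) = (∂L/∂q̇)(Φ(s, t), Φ̇(s, t)) and substitute ∂F/∂t for ∂L/∂q in (1).

Writing q̇′ as dq′/dt, we get

$$0 = \left(\frac{d}{dt}\frac{\partial L}{\partial \dot q}\right) q' + \frac{\partial L}{\partial \dot q}\left(\frac{d}{dt}\, q'\right) = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\, q'\right) = \frac{dI}{dt}.$$

□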

Remark. The first integral I = (∂L/∂q̇)q′ is defined above using local coordinates q. It turns out that the value of I(v) does not depend on the choice of coordinate system q.

In fact, I is the rate of change of L(v) when the vector v ∈ TM_x varies inside TM_x with velocity (d/ds)|_{s = 0}h^sx. Therefore, I(v) is well defined as a function of the tangent vector v ∈ TM_x. Noether’s theorem is proved in the same way when M is a manifold.

C Examples

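Example 1. Consider a system of point masses with masses m_i:

$$L = \sum_i \frac{m_i \dot x_i^2}{2} - U(x), \qquad x_i = x_{i1} e_1 + x_{i2} e_2 + x_{i3} e_3,$$

constrained by the conditions f_j(x) = 0. We assume that the system admits translations along the e₁ axis:

$$h^s: x_i \to x_i + s e_1 \quad \text{for all } i.$$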

In other words, the constraints admit motions of the system as a whole along the e₁ axis, and the potential energy does not change under these.

By Noether’s theorem we conclude: If a system admits translations along the e₁ axis, then the projection of its center of mass on the e₁ axis moves linearly and uniformly.

In fact, (d/ds)|_{s = 0}h^sx_i = e₁. According to the remark at the end of B, the quantity

$$I = \sum_i \left(\frac{\partial L}{\partial \dot x_i},\ e_1\right) = \sum_i m_i \dot x_{i1}$$

is preserved, i.e., the first component P₁ of the momentum vector is preserved. We showed this earlier for a system without constraints.

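Example 2. If a system admits rotations around the e₁ axis, then the angular momentum with respect to this axis,

$$M_1 = \sum_i \left([x_i, m_i \dot x_i],\ e_1\right),$$

is conserved.

It is easy to verify that if h^s is rotation around the e₁ axis by the angle s, then (d/ds)|_{s = 0}h^sx_i = [e₁, x_i], from which it follows that

$$I = \sum_i \left(\frac{\partial L}{\partial \dot x_i},\ [e_1, x_i]\right) = \sum_i \left(m_i \dot x_i,\ [e_1, x_i]\right) = \sum_i \left([x_i, m_i \dot x_i],\ e_1\right).$$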

Problem 1. Suppose that a particle moves in the field of the uniform helical line x = cos φ, y = sin φ, z = cφ. Find the law of conservation corresponding to this helical symmetry.

Answer. In any system which admits helical motions leaving our helical line fixed, the quantity I = cP₃ + M₃ is conserved.
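The answer can be tested on a concrete field. The sketch below uses the hypothetical potential U = ½K[(x − cos(z/c))² + (y − sin(z/c))²], which attracts the particle to the point of the helix at the same height and is invariant under the helical motions (rotation by s about the z axis combined with translation by cs along it). It integrates Newton’s equations with a Runge-Kutta step and records M₃, P₃, and I = cP₃ + M₃. The constants `K`, `C`, `MASS`, the initial data, and the function names are assumptions made for illustration.

```python
import math

K, C, MASS = 1.0, 0.5, 1.0   # stiffness, helix parameter c, particle mass (all assumed)


def force(x, y, z):
    """Minus the gradient of U = (K/2) * ((x - cos(z/C))**2 + (y - sin(z/C))**2)."""
    cx, cy = math.cos(z / C), math.sin(z / C)
    return (-K * (x - cx), -K * (y - cy), -(K / C) * (x * cy - y * cx))


def rhs(state):
    x, y, z, vx, vy, vz = state
    fx, fy, fz = force(x, y, z)
    return (vx, vy, vz, fx / MASS, fy / MASS, fz / MASS)


def rk4_step(state, dt):
    """One classical Runge-Kutta step."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def invariants(state):
    """Return (M3, P3, I) with I = C * P3 + M3."""
    x, y, z, vx, vy, vz = state
    m3 = MASS * (x * vy - y * vx)
    p3 = MASS * vz
    return m3, p3, C * p3 + m3


def simulate(state, dt=1e-3, steps=5000):
    history = [invariants(state)]
    for _ in range(steps):
        state = rk4_step(state, dt)
        history.append(invariants(state))
    return history


history = simulate((1.5, 0.0, 0.0, 0.0, 0.3, 0.2))
m3_change = max(abs(m3 - history[0][0]) for m3, _, _ in history)
i_change = max(abs(i - history[0][2]) for _, _, i in history)
```

In this run neither M₃ nor P₃ is conserved separately (`m3_change` is of order 0.1), while `i_change` stays at the level of the integration error, as Noether’s theorem predicts for the generator [e₃, x] + ce₃ of the helical motions.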

Problem 2. Suppose that a rigid body is moving under its own inertia. Show that its center of mass moves linearly and uniformly. If the center of mass is at rest, then the angular momentum with respect to it is conserved.

Problem 3. What quantity is conserved under the motion of a heavy rigid body if it is fixed at some point O? What if, in addition, the body is symmetric with respect to an axis passing through O?

Problem 4. Extend Noether’s theorem to non-autonomous lagrangian systems.

Hint. Let M₁ = M × ℝ be the extended configuration space (the direct product of the configuration manifold M with the time axis ℝ).

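Define a function L₁: TM₁ → ℝ by

$$L_1 = L\,\frac{dt}{d\tau},$$

i.e., in local coordinates q, t on M₁ we define it by the formula

$$L_1\left(q, t, \frac{dq}{d\tau}, \frac{dt}{d\tau}\right) = L\left(q, \frac{dq/d\tau}{dt/d\tau}, t\right)\frac{dt}{d\tau}.$$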

We apply Noether’s theorem to the lagrangian system (M₁, L₁).

If L₁ admits the transformations h^s: M₁ → M₁, we obtain a first integral I₁: TM₁ → ℝ. Since ∫ L dt = ∫ L₁ dτ, this reduces to a first integral I: TM × ℝ → ℝ of the original system. If, in local coordinates (q, t) on M₁, we have I₁ = I₁(q, t, dq/dτ, dt/dτ), then I(q, q̇, t) = I₁(q, t, q̇, 1).

In particular, if L does not depend on time, L₁ admits translations along time, h^s(q, t) = (q, t + s). The corresponding first integral I is the energy integral.

21 D’Alembert’s principle

We give here a new definition of a system of point masses with holonomic constraints and prove its equivalence to the definition given in Section 17.

A Example

Consider the holonomic system (M, L), where M is a surface in three-dimensional space {x}:

$$L = \frac{1}{2} m \dot x^2 - U(x).$$

In mechanical terms, “the mass point x of mass m must remain on the smooth surface M.”

Consider a motion of the point, x(t). If Newton’s equations mẍ + (∂U/∂x) = 0 were satisfied, then in the absence of external forces (U = 0) the trajectory would be a straight line and could not lie on the surface M.

From the point of view of Newton, this indicates the presence of a new force “forcing the point to stay on the surface.”

Definition. The quantity

$$R = m\ddot x + \frac{\partial U}{\partial x}$$

is called the constraint force (Figure 71).

Figure 71

Constraint force

If we take the constraint force R(t) into account, Newton’s equations are obviously satisfied:

$$m\ddot x = -\frac{\partial U}{\partial x} + R.$$

The physical meaning of the constraint force becomes clear if we consider our system with constraints as the limit of systems with potential energy U + NU₁ as N → ∞, where U₁(x) = ρ²(x, M). For large N the constraint potential NU₁ produces a rapidly changing force F = −N ∂U₁/∂x; when we pass to the limit (N → ∞) the average value of the force F under oscillations of x near M is R. The force F is perpendicular to M. Therefore, the constraint force R is perpendicular to M: (R, ξ) = 0 for every tangent vector ξ.

B Formulation of the D’Alembert-Lagrange principle

In mechanics, tangent vectors to the configuration manifold are called virtual variations. The D’Alembert-Lagrange principle states:

$$\left(m\ddot x + \frac{\partial U}{\partial x},\ \xi\right) = 0$$

for any virtual variation ξ, or stated differently, the work of the constraint force on any virtual variation is zero.

For a system of points x_i with masses m_i the constraint forces R_i are defined by R_i = m_iẍ_i + (∂U/∂x_i), and D’Alembert’s principle has the form ∑(R_i, ξ_i) = 0, or ∑(m_iẍ_i + ∂U/∂x_i, ξ_i) = 0, i.e., the sum of the works of the constraint forces on any virtual variation {ξ_i} ∈ TM_x is zero.

Constraints with the property described above are called ideal.

If we define a system with holonomic constraints as a limit as N → ∞ then the D’Alembert-Lagrange principle becomes a theorem: its proof is sketched above for the simplest case.

It is possible, however, to define an ideal holonomic constraint using the D’Alembert-Lagrange principle. In this way we have three definitions of holonomic systems with constraints:

1. The limit of systems with potential energies U + NU₁ as N → ∞.

2. A holonomic system (M, L), where M is a smooth submanifold of the configuration space of a system without constraints and L is the lagrangian.

3. A system which complies with the D’Alembert-Lagrange principle.

All three definitions are mathematically equivalent.

The proof of the implications (1) ⟹ (2) and (1) ⟹ (3) is sketched above and will not be given in further detail. We will now show that (2) ⟺ (3).

C The equivalence of the D’Alembert-Lagrange principle and the variational principle

Let M be a submanifold of euclidean space, M ⊂ ℝ^N, and x: ℝ→ M a curve, with x(t₀) = x₀, x(t₁) = x₁.

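Definition. The curve x is called a conditional extremal of the action functional

$$\Phi = \int_{t_0}^{t_1} \left\{\frac{\dot x^2}{2} - U(x)\right\} dt$$

if the differential δΦ is equal to zero under the condition that the variation consists of nearby curves³⁹ joining x₀ to x₁ in M.

We will write

$$\delta_M \Phi = 0. \tag{1}$$

Clearly, Equation (1) is equivalent to the Lagrange equations

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}, \qquad L = \frac{\dot x^2}{2} - U(x), \quad x = x(q),$$

in some local coordinate system q on M.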

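Theorem. A curve x: ℝ → M ⊂ ℝ^N is a conditional extremal of the action (i.e., satisfies Equation (1)) if and only if it satisfies D’Alembert’s equation

$$\left(\ddot x + \frac{\partial U}{\partial x},\ \xi\right) = 0 \quad \text{for all } \xi \in TM_x. \tag{2}$$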

Lemma. Let f: {t: t₀ ≤ t ≤ t₁} → ℝ^N be a continuous vector field. If, for every continuous tangent vector field ξ, tangent to M along x (i.e., ξ(t) ∈ TM_x(t) with ξ(t) = 0 for t = t₀, t₁), we have

$$\int_{t_0}^{t_1} \bigl(f(t), \xi(t)\bigr)\, dt = 0,$$

then the field f(t) is perpendicular to M at every point x(t) (i.e., (f(t), h) = 0 for every vector h ∈ TM_x(t)) (Figure 72).

Figure 72

Lemma about the normal field

The proof of the lemma repeats the argument which we used to derive the Euler-Lagrange equations in Section 12.

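Proof of the theorem. We compare the value of Φ on the two curves x(t) and x(t) + ξ(t), where ξ(t₀) = ξ(t₁) = 0. Integrating by parts, we obtain

$$\Phi(x + \xi) - \Phi(x) = \int_{t_0}^{t_1} \left(\dot x \dot\xi - \frac{\partial U}{\partial x}\,\xi\right) dt + O(\xi^2) = -\int_{t_0}^{t_1} \left(\ddot x + \frac{\partial U}{\partial x}\right)\xi\, dt + O(\xi^2).$$

It is obvious from this formula⁴⁰ that Equation (1), δ_MΦ = 0, is equivalent to the collection of equations

$$\int_{t_0}^{t_1} \left(\ddot x + \frac{\partial U}{\partial x}\right)\xi\, dt = 0 \tag{3}$$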

for all tangent vector fields ξ(t) ∈ TM_x(t) with ξ(t₀) = ξ(t₁) = 0. By the lemma (where we must set f = ẍ + (∂U/∂x)) the collection of equations (3) is equivalent to the D’Alembert-Lagrange equation (2). □

D Remarks

Remark 1. We derive the D’Alembert-Lagrange principle for a system of n points x_i ∈ ℝ³, i = 1,..., n, with masses m_i, with holonomic constraints, from the above theorem.

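In the coordinates

$$\mathbf{x} = (\sqrt{m_1}\, x_1, \ldots, \sqrt{m_n}\, x_n),$$

the kinetic energy takes the form

$$T = \frac{1}{2}\sum_i m_i \dot x_i^2 = \frac{1}{2}\dot{\mathbf{x}}^2.$$

By the theorem, the extremals of the principle of least action satisfy the condition

$$\left(\ddot{\mathbf{x}} + \frac{\partial U}{\partial \mathbf{x}},\ \boldsymbol{\xi}\right) = 0$$

(the D’Alembert-Lagrange principle for points in ℝ³ⁿ: the 3n-dimensional reaction force is orthogonal to the manifold M in the metric T). Returning to the coordinates x_i, we get

$$0 = \left(\ddot{\mathbf{x}} + \frac{\partial U}{\partial \mathbf{x}},\ \boldsymbol{\xi}\right) = \sum_i \left(\sqrt{m_i}\,\ddot x_i + \frac{1}{\sqrt{m_i}}\frac{\partial U}{\partial x_i},\ \sqrt{m_i}\,\xi_i\right) = \sum_i \left(m_i \ddot x_i + \frac{\partial U}{\partial x_i},\ \xi_i\right),$$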

i.e., the D’Alembert-Lagrange principle in the form indicated earlier: the sum of the work of the reaction forces on virtual variations is zero.

Remark 2. The D’Alembert-Lagrange principle can be given in a slightly different form if we turn to statics. An equilibrium position is a point x₀ which is the orbit of a motion: x(t) = x₀.

Suppose that a point mass moves along a smooth surface M under the influence of the force f = −∂U/∂x.

Theorem. The point x₀ in M is an equilibrium position if and only if the force is orthogonal to the surface at x₀: (f(x₀), ξ) = 0 for all ξ ∈ TM_x₀.

This follows from the D’Alembert-Lagrange equations in view of the fact that ẍ = 0.

Definition. −mẍ is called the force of inertia.

Now the D’Alembert-Lagrange principle takes the form:

Theorem. If the forces of inertia are added to the acting forces, x becomes an equilibrium position.

Proof. D’Alembert’s equation

$$(-m\ddot x + f,\ \xi) = 0$$

expresses the fact, as in the preceding theorem, that x is an equilibrium position of a system with forces −mẍ + f. □

Entirely analogous statements are true for systems of points: If x = {x_i} are equilibrium positions, then the sum of the work of the forces acting on the virtual variations is equal to zero. If the forces of inertia −m_iẍ_i(t) are added to the acting forces, then the position x(t) becomes an equilibrium position.

Now a problem about motions can be reduced to a problem about equilibrium under actions of other forces.

Remark 3. Up to now we have not considered cases when the constraints depend on time. All that was said above carries over to such constraints without any changes.

Example. Consider a bead sliding along a rod which is tilted at an angle α to the vertical axis and is rotating uniformly with angular velocity ω around this axis (its weight is negligible). For our coordinate q we take the distance from the point 0 (Figure 73). The kinetic energy and lagrangian are: T = L = ½mv² = ½m q̇² + ½mω²r², where r = q sin α. Lagrange’s equation is m q̈ = mω² q sin²α.

Figure 73

Bead on a rotating rod

The constraint force at each moment is orthogonal to virtual variations (i.e., to the direction of the rod), but is not at all orthogonal to the actual trajectory.

Remark 4. It is easy to derive conservation laws from the D’Alembert-Lagrange equations. For example, if translation along the x₁ axis ξ_i = e₁ is among the virtual variations, then the sum of the work of the constraint forces on this variation is equal to zero: Σ (R_i, e₁) = (Σ R_i, e₁) = 0.

If we now consider constraint forces as external forces, then we notice that the sum of the first components of the external forces is equal to zero. This means that the first component, P₁, of the momentum vector is preserved.

We obtained this same result earlier from Noether’s theorem.

Remark 5. We emphasize once again that the holonomic character of some particular physical constraint or another (to a given degree of exactness) is a question of experiment. From the mathematical point of view, the holonomic character of a constraint is a postulate of physical origin; it can be introduced in various equivalent forms, for example, in the form of the principle of least action (1) or the D’Alembert-Lagrange principle (2), but, when defining the constraints, the term always refers to experimental facts which go beyond Newton’s equations.

Remark 6. Our terminology differs somewhat from that used in mechanics textbooks, where the D’Alembert-Lagrange principle is extended to a wider class of systems (“non-holonomic systems with ideal constraints”). In this book we will not consider non-holonomic systems. We remark only that one example of a non-holonomic system is a sphere rolling on a plane without slipping. In the tangent space at each point of the configuration manifold of a non-holonomic system there is a fixed subspace to which the velocity vector must belong.

Remark 7. If a system consists of mass points connected by rods, hinges, etc., then the need may arise to talk about the constraint force of some particular constraint.

We defined the total “constraint force of all constraints” R_i for every mass point m_i. The concept of a constraint force for an individual constraint is impossible to define, as may be already seen from the simple example of a beam resting on three columns. If we try to define constraint forces of the columns, R₁, R₂, R₃ by passing to a limit (considering the columns as very rigid springs), then we may become convinced that the result depends on the distribution of rigidity.

Problems for students are selected so that this difficulty does not arise.

Problem. A rod of weight P, tilted at an angle of 60° to the plane of a table, begins to fall with initial velocity zero (Figure 74). Find the constraint force of the table at the initial moment, considering the table as (a) absolutely smooth and (b) absolutely rough. (In the first case, the holonomic constraint holds the end of the rod on the plane of the table, and in the second case, at a given point.)

Figure 74
Constraint force on a rod

³⁵

The proof is based on the fact that, due to the conservation of energy, a moving point cannot move further from γ than cN^−½, which approaches zero as N → ∞.

³⁶

By differentiable here we mean r times continuously differentiable; the exact value of r (1 ≤ r ≤ ∞) is immaterial (we may take r = ∞, for example).

³⁷

A manifold is connected if it cannot be divided into two disjoint open subsets.

³⁸

The authors of several textbooks mistakenly assert that the converse is also true, i.e., that if h^s takes solutions to solutions, then h^s_* preserves L.

³⁹

Strictly speaking, in order to define a variation δΦ, one must define on the set of curves near x on M the structure of a region in a vector space. This can be done using coordinates on M; however, the property of being a conditional extremal does not depend on the choice of a coordinate system.

⁴⁰

The distance of the points x(t) + ξ(t) from M is small of second-order compared with ξ(t).

5
Oscillations


Because linear equations are easy to solve and study, the theory of linear oscillations is the most highly developed area of mechanics. In many nonlinear problems, linearization produces a satisfactory approximate solution. Even when this is not the case, the study of the linear part of a problem is often a first step, to be followed by the study of the relation between motions in a nonlinear system and in its linear model.

22 Linearization

We give here the definition of small oscillations.

A Equilibrium positions

Definition. A point x₀ is called an equilibrium position of the system

dx/dt = f(x), x ∈ ℝⁿ    (1)

if x(t) ≡ x₀ is a solution of this system. In other words, f(x₀) = 0, i.e., the vector field f(x) is zero at x₀.

Example. Consider the natural dynamical system with lagrangian function L = T − U, where T = ½ Σ a_ij(q) q̇_i q̇_j ≥ 0 and U = U(q):

d/dt (∂L/∂q̇) = ∂L/∂q, q = (q₁, ..., q_n)    (2)

Lagrange’s equations can be written in the form of a system of 2n first-order equations of form (1). We will try to find an equilibrium position:

Theorem. The point q = q₀, q̇ = q̇₀ will be an equilibrium position if and only if q̇₀ = 0 and q₀ is a critical point of the potential energy, i.e.,

(∂U/∂q)|_{q = q₀} = 0    (3)

Proof. We write down Lagrange’s equations

d/dt (∂T/∂q̇) = ∂T/∂q − ∂U/∂q

From (2) it is clear that, for q̇ = 0, we will have ∂T/∂q = 0 and ∂T/∂q̇ = 0. Therefore, q = q₀ is a solution in case (3) holds and only in that case. ☐

B Stability of equilibrium positions

We will now investigate motions with initial conditions close to an equilibrium position.

Theorem. If the point q₀ is a strict local minimum of the potential energy U, then the equilibrium q = q₀ is stable in the sense of Liapunov.

Proof. Let U(q₀) = h. For sufficiently small ε > 0, the connected component of the set {q: U(q) ≤ h + ε} containing q₀ will be an arbitrarily small neighborhood of q₀ (Figure 75). Furthermore, the connected component of the corresponding region in phase space p, q, {p, q: E(p, q) ≤ h + ε} (where p = ∂T/∂q̇ is the momentum and E = T + U is the total energy), will be an arbitrarily small neighborhood of the point p = 0, q = q₀.

Figure 75

Stable equilibrium position

But the region {p, q: E ≤ h + ε} is invariant with respect to the phase flow by the law of conservation of energy. Therefore, for initial conditions p(0), q(0) close enough to (0, q₀), every phase trajectory (p(t), q(t)) is close to (0, q₀). ☐

Problem. Can an equilibrium position q = q₀, p = 0 be asymptotically stable?

Problem. Show that in an analytic system with one degree of freedom an equilibrium position q₀ which is not a strict local minimum of the potential energy is not stable in the sense of Liapunov. Produce an example of an infinitely differentiable system where this is not true.

Remark. It seems likely that in an analytic system with n degrees of freedom, an equilibrium position which is not a minimum point is unstable; but this has never been proved for n > 2.

C Linearization of a differential equation

We now turn to the general system (1). In studying solutions of (1) which are close to an equilibrium position x₀, we often use a linearization. Assume that x₀ = 0 (the general case is reduced to this one by a translation of the coordinate system). Then the first term of the Taylor series for f is linear:

f(x) = Ax + R₂(x), A = (∂f/∂x)|_{x = 0}, R₂(x) = O(x²),

where the linear operator A is given in coordinates x₁,...,x_n by the matrix a_ij:

(Ax)_i = Σ_j a_ij x_j, a_ij = (∂f_i/∂x_j)|_{x = 0}

Definition. The passage from system (1) to the system

dy/dt = Ay    (4)

is called the linearization of (1).

Problem. Show that linearization is a well-defined operation: the operator A does not depend on the coordinate system.

The advantage of the linearized system is that it is linear and therefore easily solved:

y(t) = e^{At} y(0), e^{At} = E + At + A²t²/2! + ⋯

Knowing the solution of the linearized system (4), we can say something about solutions of the original system (1). For small enough x, the difference between the linearized and original systems, R₂(x), is small in comparison with x. Therefore, for a long time, the solutions y(t), x(t) of both systems with the same initial condition y(0) = x(0) remain close. More explicitly, we can easily prove the following:

Theorem. For any T > 0 and for any ε > 0 there is a δ > 0 such that if |x(0)| < δ, then |x(t) − y(t)| < εδ for all t in the interval 0 < t < T.
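
The statement can be checked numerically on a concrete case. The sketch below is an illustration and not part of the original argument: it assumes the pendulum ẍ = −sin x as the nonlinear system, integrates it and its linearization ÿ = −y by a classical fourth-order Runge-Kutta scheme from the same initial point of size δ, and records max |x(t) − y(t)|/δ on 0 ≤ t ≤ T. The helper names rk4, pendulum, linearized and relative_gap are hypothetical.

```python
import math

def rk4(f, state, t_end, steps):
    """Integrate state' = f(state) on [0, t_end] by classical RK4; return all states."""
    h = t_end / steps
    traj = [state]
    for _ in range(steps):
        k1 = f(state)
        k2 = f([s + 0.5 * h * k for s, k in zip(state, k1)])
        k3 = f([s + 0.5 * h * k for s, k in zip(state, k2)])
        k4 = f([s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        traj.append(state)
    return traj

def pendulum(s):
    # assumed nonlinear example: x1' = x2, x2' = -sin x1
    return [s[1], -math.sin(s[0])]

def linearized(s):
    # its linearization at the equilibrium 0: y1' = y2, y2' = -y1
    return [s[1], -s[0]]

def relative_gap(delta, t_end=10.0, steps=2000):
    """max over 0 <= t <= t_end of |x(t) - y(t)| / delta for x(0) = y(0) = (delta, 0)."""
    xs = rk4(pendulum, [delta, 0.0], t_end, steps)
    ys = rk4(linearized, [delta, 0.0], t_end, steps)
    gap = max(math.hypot(x[0] - y[0], x[1] - y[1]) for x, y in zip(xs, ys))
    return gap / delta

# the ratio plays the role of epsilon in the theorem; it should shrink with delta
gaps = [relative_gap(d) for d in (0.4, 0.2, 0.1, 0.05)]
```

For a fixed T, the entries of gaps decrease as δ decreases, which is the content of the theorem for this example: the smaller the initial deviation, the smaller the ε that can be taken.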

D Linearization of a lagrangian system

We return again to the lagrangian system (2) and try to linearize it in a neighborhood of the equilibrium position q = q₀. In order to simplify the formulas, we choose a coordinate system so that q₀ = 0.

Theorem. In order to linearize the lagrangian system (2) in a neighborhood of the equilibrium position q = 0, it is sufficient to replace the kinetic energy T = ½ Σ a_ij(q) q̇_i q̇_j by its value at q = 0,

T₂ = ½ Σ a_ij q̇_i q̇_j, a_ij = a_ij(0),

and replace the potential energy U(q) by its quadratic part

U₂ = ½ Σ b_ij q_i q_j, b_ij = (∂²U/∂q_i ∂q_j)|_{q = 0}

Proof. We reduce the lagrangian system to the form (1) by using the canonical variables p and q:

ṗ = −∂H/∂q, q̇ = ∂H/∂p, H(p, q) = T + U

Since p = q = 0 is an equilibrium position, the expansions of the right-hand sides in Taylor series at zero begin with terms that are linear in p and q. Since the right-hand sides are partial derivatives, these linear terms are determined by the quadratic terms H₂ of the expansion for H(p, q). But H₂ is precisely the hamiltonian function of the system with lagrangian L₂ = T₂ − U₂, since, clearly, H₂ = T₂(p) + U₂(q). Therefore, the linearized equations of motion are the equations of motion for the system described in the theorem with L₂ = T₂ − U₂. ☐

Example. We consider the system with one degree of freedom:

T = ½ a(q) q̇², U = U(q)

Let q = q₀ be a stable equilibrium position: (∂U/∂q)|_{q = q₀} = 0, (∂²U/∂q²)|_{q = q₀} > 0 (Figure 76).

Figure 76

Linearization

As we know from the phase portrait, for initial conditions close to q = q₀, p = 0, the solution is periodic with period τ depending, generally speaking, on the initial conditions. The above two theorems imply

Corollary. The period τ of oscillations close to the equilibrium position q₀ approaches the limit τ₀ = 2π/ω₀ (where ω₀² = b/a, b = (∂²U/∂q²)|_{q = q₀}, and a = a(q₀)) as the amplitudes of the oscillations decrease.

Proof. For the linearized system, T₂ = ½ a q̇² and U₂ = ½ b q² (taking q₀ = 0). The solutions to Lagrange’s equation q̈ = −ω₀² q have period τ₀ = 2π/ω₀:

q = c₁ cos ω₀t + c₂ sin ω₀t

for any initial amplitude. □
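
As an illustration of the corollary (an addition, not part of the original text), take the pendulum q̈ = −sin q, for which a = 1, b = 1 and therefore τ₀ = 2π. Its exact period at angular amplitude θ is known in closed form, τ = 2π/M(1, cos(θ/2)), where M is the arithmetic-geometric mean; this formula is a standard fact brought in from outside the text. The sketch evaluates it for decreasing amplitudes. The names agm and pendulum_period are hypothetical.

```python
import math

def agm(a, b, tol=1e-14):
    """Arithmetic-geometric mean of two positive numbers."""
    while abs(a - b) > tol * a:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

def pendulum_period(amplitude):
    """Exact period of q'' = -sin q for oscillations of the given angular amplitude."""
    return 2.0 * math.pi / agm(1.0, math.cos(0.5 * amplitude))

tau0 = 2.0 * math.pi                     # period of the linearized system q'' = -q
amplitudes = (1.0, 0.5, 0.1, 0.01)
periods = [pendulum_period(th) for th in amplitudes]
excess = [tau - tau0 for tau in periods]  # should tend to 0 with the amplitude
```

The entries of excess are positive and decrease toward zero, so the period of the nonlinear oscillation tends to τ₀ from above in this example.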

E Small oscillations

Definition. Motions in a linearized system (L₂ = T₂ − U₂) are called small oscillations⁴¹ near an equilibrium q = q₀. In a one-dimensional problem the numbers τ₀ and ω₀ are called the period and the frequency of small oscillations.

Problem. Find the period of small oscillations of a bead of mass 1 on a wire y = U(x) in a gravitational field with g = 1, near an equilibrium position x = x₀ (Figure 77).

Figure 77
Bead on a wire
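Solution. We have U = mgy = U(x) and T = ½mv² = ½[1 + (∂U/∂x)²] ẋ².
Let x₀ be a stable equilibrium position: (∂U/∂x)|_x₀ = 0; (∂²U/∂x²)|_x₀ > 0. Then the frequency of small oscillations, ω, is defined by the formula ω² = (∂²U/∂x²)|_x₀,
since, for the linearized system, T₂ = ½ q̇² and U₂ = ½ ω² q² (q = x − x₀).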

Problem. Show that not only a small oscillation, but any motion of the bead is equivalent to a motion in some one-dimensional system with lagrangian function L = ½ q̇² − V(q).

Hint. Take length along the wire for q.

23 Small oscillations

We show here that a lagrangian system undergoing small oscillations decomposes into a direct product of systems with one degree of freedom.

A A problem about pairs of forms

We will consider in more detail the problem of small oscillations. In other words, we consider a system whose kinetic and potential energies are quadratic forms

T = ½ (A q̇, q̇), U = ½ (Bq, q), q ∈ ℝⁿ, q̇ ∈ ℝⁿ    (1)

The kinetic energy is a positive-definite form.

In order to integrate Lagrange’s equations, we will make a special choice of coordinates.

As we know from linear algebra, a pair of quadratic forms (Aq, q), (Bq, q), the first of which is positive-definite, can be reduced to principal axes by a linear change of coordinates:⁴²

Q = Cq, Q = (Q₁, ..., Q_n)

In addition, the coordinates Q can be chosen so that the form (Aq, q) decomposes into the sum of squares (Q, Q). Let Q be such coordinates; then, since Q̇ = C q̇, we have

T = ½ Σ Q̇_i², U = ½ Σ λ_i Q_i²    (2)

The numbers λ_i are called the eigenvalues of the form B with respect to A.

Problem. Show that the eigenvalues of B with respect to A satisfy the characteristic equation

det |B − λA| = 0    (3)

all the roots of which are, therefore, real (the matrices A and B are symmetric and A > 0).

B Characteristic oscillations

In the coordinates Q the lagrangian system decomposes into n independent equations

Q̈_i = −λ_i Q_i, i = 1, ..., n    (4)

Therefore we have proved:

Theorem. A system performing small oscillations is the direct product of n one-dimensional systems performing small oscillations.

For the one-dimensional systems, there are three possible cases:

Case 1: λ = ω² > 0; the solution is Q = C₁ cos ωt + C₂ sin ωt (oscillation)

Case 2: λ = 0; the solution is Q = C₁ + C₂t (neutral equilibrium)

Case 3: λ = −k² < 0; the solution is Q = C₁ cosh kt + C₂ sinh kt (instability)

Corollary. Suppose one of the eigenvalues of (3) is positive: λ = ω² > 0. Then system (1) can perform a small oscillation of the form

q(t) = (C₁ cos ωt + C₂ sin ωt) ξ    (5)

where ξ is an eigenvector corresponding to λ (Figure 78): Bξ = λAξ.

Figure 78

Characteristic oscillation

This oscillation is the product of the one-dimensional motion Q_i = C₁ cos ω_it + C₂ sin ω_it and the trivial motion Q_j = 0 (j ≠ i).

Definition. The periodic motion (5) is called a characteristic oscillation of system (1), and the number ω is called the characteristic frequency.

Remark. Characteristic oscillations are also called principal oscillations or normal modes. A nonpositive λ also has eigenvectors; we will also call the corresponding motions “characteristic oscillations,” although they are not periodic; the corresponding “characteristic frequencies” are imaginary.

Problem. Show that the number of independent real characteristic oscillations is equal to the dimension of the largest positive-definite subspace for the potential energy ½(Bq, q).

Now the result may be formulated as follows:

Theorem. The system (1) has n characteristic oscillations, the directions of which are pairwise orthogonal with respect to the scalar product given by the kinetic energy A.

Proof. The coordinate system Q is orthogonal with respect to the scalar product (Aq, q) by (2). □

C Decomposition into characteristic oscillations

It follows from the above theorem that:

Corollary. Every small oscillation is a sum of characteristic oscillations.

A sum of characteristic oscillations is generally not periodic (remember the Lissajous figures!).

To decompose a motion into a sum of characteristic oscillations, it is sufficient to project the initial conditions q(0), q̇(0) onto the characteristic directions ξ_i and solve the corresponding one-dimensional problems (4).

Therefore, the Lagrange equations for system (1) can be solved in the following way. We first look for characteristic oscillations of the form q = e^{iωt}ξ. Substituting these into Lagrange’s equations

d/dt (A q̇) = −Bq

we find

(B − ω²A)ξ = 0

From the characteristic equation (3) we find n eigenvalues λ_k = ω_k². To these there correspond n pairwise orthogonal eigenvectors ξ_k. A general solution in the case λ ≠ 0 has the form

q(t) = Re Σ_{k=1}^{n} C_k e^{iω_k t} ξ_k

Remark. This result is also true when some of the λ are multiple eigenvalues.

Thus, in a lagrangian system, as opposed to a general system of linear differential equations, resonance terms of the form t sin ωt, etc. do not arise, even in the case of multiple eigenvalues.
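
The procedure just described can be sketched in code for n = 2. This is an illustrative addition under stated assumptions, not the author’s algorithm: the matrices A and B are symmetric 2×2 with A positive-definite, the two eigenvalues are distinct and positive, and the sample matrices (A = E and a B with coupling parameter alpha = 0.5) are chosen only for the demonstration. The function names are hypothetical.

```python
import math

def a_dot(A, u, v):
    """Scalar product (Au, v) for a symmetric 2x2 matrix A."""
    return (A[0][0] * u[0] * v[0] + A[0][1] * (u[0] * v[1] + u[1] * v[0])
            + A[1][1] * u[1] * v[1])

def characteristic_modes(A, B):
    """Roots of det(B - lam*A) = 0 with eigenvectors xi normalized by (A xi, xi) = 1.
    Assumes two distinct eigenvalues."""
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    # det(B - lam*A) = c2*lam^2 - c1*lam + c0
    c2 = a11 * a22 - a12 * a12
    c1 = a11 * b22 + a22 * b11 - 2.0 * a12 * b12
    c0 = b11 * b22 - b12 * b12
    disc = math.sqrt(max(c1 * c1 - 4.0 * c2 * c0, 0.0))
    modes = []
    for lam in ((c1 - disc) / (2.0 * c2), (c1 + disc) / (2.0 * c2)):
        m11, m12, m22 = b11 - lam * a11, b12 - lam * a12, b22 - lam * a22
        # null vector of the singular matrix B - lam*A, taken from its larger row
        xi = (-m12, m11) if abs(m11) >= abs(m22) else (-m22, m12)
        norm = math.sqrt(a_dot(A, xi, xi))
        if norm == 0.0:
            raise ValueError("multiple eigenvalue: not handled in this sketch")
        modes.append((lam, (xi[0] / norm, xi[1] / norm)))
    return modes

def small_oscillation(A, B, q0, v0, t):
    """Sum of the characteristic oscillations with q(0) = q0, q'(0) = v0 (all lam > 0)."""
    q = [0.0, 0.0]
    for lam, xi in characteristic_modes(A, B):
        w = math.sqrt(lam)
        # project the initial conditions on xi in the scalar product given by A
        amp = a_dot(A, q0, xi) * math.cos(w * t) + a_dot(A, v0, xi) / w * math.sin(w * t)
        q = [q[0] + amp * xi[0], q[1] + amp * xi[1]]
    return q

A = ((1.0, 0.0), (0.0, 1.0))
alpha = 0.5                                   # illustrative coupling strength
B = ((1.0 + alpha, -alpha), (-alpha, 1.0 + alpha))
modes = characteristic_modes(A, B)
```

For these sample matrices the eigenvalues come out as 1 and 1 + 2·alpha, and the two eigenvectors are orthogonal in the scalar product given by A, as the theorem requires.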

D Examples

Example 1. Consider the system of two identical mathematical pendulums of length l₁ = l₂ = 1 and mass m₁ = m₂ = 1 in a gravitational field with g = 1. Suppose that the pendulums are connected by a weightless spring whose length is equal to the distance between the points of suspension (Figure 79). Denote by q₁ and q₂ the angles of inclination of the pendulums. Then

Figure 79
Identical connected pendulums
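for small oscillations, T = ½(q̇₁² + q̇₂²) and U = ½(q₁² + q₂²) + ½α(q₁ − q₂)², where ½α(q₁ − q₂)² is the potential energy of the elasticity of the spring. Set
Q₁ = (q₁ + q₂)/√2, Q₂ = (q₁ − q₂)/√2.
Then
q₁ = (Q₁ + Q₂)/√2, q₂ = (Q₁ − Q₂)/√2,
and both forms are reduced to principal axes:
T = ½(Q̇₁² + Q̇₂²), U = ½(ω₁²Q₁² + ω₂²Q₂²),
where ω₁ = 1 and ω₂ = √(1 + 2α) (Figure 80). So the two characteristic oscillations are as follows (Figure 81):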

1. Q₂ = 0, i.e., q₁ = q₂: both pendulums move in phase with the original frequency 1, and the spring has no effect;

2. Q₁ = 0, i.e., q₁ = −q₂: the pendulums move in opposite phase with increased frequency ω₂ > 1 due to the action of the spring.

Figure 80
Configuration space of the connected pendulums

Figure 81
Characteristic oscillations of the connected pendulums

Now let the spring be very weak: α ≪ 1. Then an interesting effect called exchange of energy occurs.

Example 2. Suppose that the pendulums are at rest at the initial moment, and one of them is given velocity q̇₁ = v. We will show that after some time T the first pendulum will be almost stationary, and all the energy will have gone to the second.

It follows from the initial conditions that Q₁(0) = Q₂(0) = 0. Therefore, Q₁ = c₁ sin t, and Q₂ = c₂ sin ωt with ω = √(1 + 2α) ≈ 1 + α (α ≪ 1). But Q̇₁(0) = Q̇₂(0) = v/√2. Therefore, c₁ = v/√2 and c₂ = v/(ω√2), and our solution has the form
q₁ = (v/2)(sin t + (1/ω) sin ωt), q₂ = (v/2)(sin t − (1/ω) sin ωt),
or, disregarding the term (v/2)(1 − (1/ω)) sin ωt, which is small since α is,
q₁ ≈ (v/2)(sin t + sin ωt) = v cos εt sin ω′t, q₂ ≈ (v/2)(sin t − sin ωt) = −v cos ω′t sin εt,
where ε = (ω − 1)/2 ≈ α/2 and ω′ = (ω + 1)/2 ≈ 1.

The quantity ε ≈ α/2 is small, since α is; therefore q₁ undergoes an oscillation of frequency ω′ ≈ 1 with slowly changing amplitude v cos εt (Figure 82).

Figure 82
Beats: trajectories in the configuration space

After time T = π/(2ε) ≈ π/α, essentially only the second pendulum will be oscillating; after 2T, again only the first, etc. (“beats”) (Figure 83).

Figure 83
Beats
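
The exchange of energy can be checked directly from the exact solution of the linearized system. The following sketch is an illustrative addition: it assumes the unit pendulums of Example 1 with a spring energy ½α(q₁ − q₂)², picks α = 0.05 and v = 1 arbitrarily, and measures the energy of each pendulum as ½(q̇² + q²), ignoring the small energy stored in the spring. The function names are hypothetical.

```python
import math

alpha = 0.05                           # weak spring, alpha << 1 (illustrative value)
v = 1.0                                # initial velocity given to the first pendulum
omega = math.sqrt(1.0 + 2.0 * alpha)   # frequency of the antiphase oscillation
eps = 0.5 * (omega - 1.0)              # slow frequency, close to alpha / 2

def state(t):
    """Exact (q1, q1', q2, q2') of the linearized system for q(0) = 0, q'(0) = (v, 0)."""
    s_in, s_anti = math.sin(t), math.sin(omega * t) / omega
    c_in, c_anti = math.cos(t), math.cos(omega * t)
    return (0.5 * v * (s_in + s_anti), 0.5 * v * (c_in + c_anti),
            0.5 * v * (s_in - s_anti), 0.5 * v * (c_in - c_anti))

def pendulum_energies(t):
    """Energy of each pendulum, without the energy of the spring."""
    q1, p1, q2, p2 = state(t)
    return 0.5 * (p1 * p1 + q1 * q1), 0.5 * (p2 * p2 + q2 * q2)

T = math.pi / (2.0 * eps)              # time of complete transfer to the second pendulum
e1_start, e2_start = pendulum_energies(0.0)
e1_T, e2_T = pendulum_energies(T)
e1_2T, e2_2T = pendulum_energies(2.0 * T)
```

At t = 0 all the energy is in the first pendulum; at t = T almost all of it is in the second; at t = 2T it has returned to the first.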

Example 3. We investigate the characteristic oscillations of two different pendulums (m₁ ≠ m₂, l₁ ≠ l₂, g = 1), connected by a spring with energy ½α(q₁ − q₂)² (Figure 84). How do the characteristic frequencies behave as α → 0 or as α → ∞?

Figure 84
Connected pendulums

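We have
T = ½(m₁l₁²q̇₁² + m₂l₂²q̇₂²), U = ½m₁l₁q₁² + ½m₂l₂q₂² + ½α(q₁ − q₂)².
Therefore (Figure 85),
A = | m₁l₁²   0     |      B = | m₁l₁ + α   −α       |
    | 0       m₂l₂² |          | −α         m₂l₂ + α |

Figure 85
Potential energy of strongly connected pendulums
and the characteristic equation has the form
det (B − λA) = (m₁l₁ + α − λm₁l₁²)(m₂l₂ + α − λm₂l₂²) − α² = 0,
or
aλ² − (b₀ + b₁α)λ + (c₀ + c₁α) = 0,
where
a = m₁m₂l₁²l₂², b₀ = m₁m₂l₁l₂(l₁ + l₂), b₁ = m₁l₁² + m₂l₂², c₀ = m₁m₂l₁l₂, c₁ = m₁l₁ + m₂l₂.

This is the equation of a hyperbola in the (α, λ)-plane (Figure 86). As α → 0 (weak spring) the frequencies approach the frequencies of free pendulums (ω₁² = 1/l₁, ω₂² = 1/l₂); as α → ∞, one of the frequencies tends to ∞, while the other approaches the characteristic frequency ω_∞ of a pendulum with two masses on one rod (Figure 87):

ω_∞² = (m₁l₁ + m₂l₂)/(m₁l₁² + m₂l₂²)

Figure 86
Dependence of characteristic frequencies on the stiffness of the spring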

Figure 87
Limiting case of pendulums connected by an infinitely stiff spring

Problem. Investigate the characteristic oscillations of a planar double pendulum (Figure 88).

Figure 88
Double pendulum

Problem. Find the shape of the trajectories of the small oscillations of a point mass on the plane, sitting inside an equilateral triangle and connected by identical springs to the vertices (Figure 89).

Figure 89
System with an infinite set of characteristic oscillations

Solution. Under rotation by 120° the system is mapped onto itself. Consequently, all directions are characteristic, and both characteristic frequencies are the same: U = ½ω²(x² + y²). Therefore, the trajectories are ellipses (cf. Figure 20).

24 Behavior of characteristic frequencies

We prove here the Rayleigh-Courant-Fischer theorem on the behavior of characteristic frequencies of a system under increases in rigidity and under imposed constraints.

A Behavior of characteristic frequencies under a change in rigidity

Consider a system performing small oscillations, with kinetic and potential energies

T = ½(A q̇, q̇) > 0 and U = ½(Bq, q) > 0 for all q, q̇ ≠ 0.

Definition. A system with the same kinetic energy, and a new potential energy U′, is called more rigid if

U′ = ½(B′q, q) ≥ U = ½(Bq, q)

for all q.

We wish to understand how the characteristic frequencies change under an increase in the rigidity of a system.

Problem. Discuss the one-dimensional case.

Theorem 1. Under an increase in rigidity, all the characteristic frequencies are increased, i.e., if ω₁ ≤ ω₂ ≤ ⋯ ≤ ω_n are the characteristic frequencies of the less rigid system, and

ω′₁ ≤ ω′₂ ≤ ⋯ ≤ ω′_n

are the characteristic frequencies of the more rigid system, then

ω₁ ≤ ω′₁; ω₂ ≤ ω′₂; ...; ω_n ≤ ω′_n.

This theorem has a simple geometric meaning. Without loss of generality we may assume that A = E, i.e., that we are considering the euclidean structure given by the kinetic energy T = ½(q̇, q̇). To each system we associate the ellipsoids E: (Bq, q) = 1 and E′: (B′q, q) = 1.

It is clear that

Lemma 1. If the system U′ is more rigid than U, then the corresponding ellipsoid E′ lies inside E.

It is also clear that

Lemma 2. The major semi-axes of the ellipsoid are the inverses of the characteristic frequencies ω_i: ω_i = 1/a_i.

Therefore, Theorem 1 is equivalent to the following geometric proposition (Figure 90).

Figure 90

The semi-axes of the inside ellipse are smaller.

Theorem 2. If the ellipsoid E with semi-axes a₁ ≥ a₂ ≥ ⋯ ≥ a_n contains the ellipsoid E′ with semi-axes a′₁ ≥ a′₂ ≥ ⋯ ≥ a′_n, both ellipsoids having the same center, then the semi-axes of the inside ellipsoid are smaller:

a₁ ≥ a′₁; a₂ ≥ a′₂; ...; a_n ≥ a′_n.
Example. Under an increase in the rigidity α of the spring connecting the pendulums of Example 3, Section 23, the potential energy grows, and by Theorem 1, the characteristic frequencies grow: dω_i/dα > 0.

Now consider the case when the rigidity of the spring approaches infinity, α → ∞. Then in the limit the pendulums are rigidly connected and we get a system with one degree of freedom; the limiting characteristic frequency ω_∞ satisfies ω₁ < ω_∞ < ω₂.

B Behavior of characteristic frequencies under the imposition of a constraint

We return to a general system with n degrees of freedom, and let T = ½(q̇, q̇) and U = ½(Bq, q), q ∈ ℝⁿ, be the kinetic and potential energies of a system performing small oscillations.

Let ℝ^{n − 1} ⊂ ℝⁿ be an (n − 1)-dimensional subspace in ℝⁿ (Figure 91). Consider the system with n − 1 degrees of freedom (q ∈ ℝ^{n − 1}) whose kinetic and potential energies are the restrictions of T and U to ℝ^{n − 1}. We say that this system is obtained from the original by imposition of a linear constraint.

Figure 91

Linear constraint

Let ω₁ ≤ ω₂ ≤ ⋯ ≤ ω_n be the n characteristic frequencies of the original system, and

ω′₁ ≤ ω′₂ ≤ ⋯ ≤ ω′_{n − 1}

the (n − 1) characteristic frequencies of the system with a constraint.

Theorem 3. The characteristic frequencies of the system with a constraint separate the characteristic frequencies of the original system (Figure 92):

ω₁ ≤ ω′₁ ≤ ω₂ ≤ ω′₂ ≤ ⋯ ≤ ω_{n − 1} ≤ ω′_{n − 1} ≤ ω_n

Figure 92

Separation of frequencies

By Lemma 2 this theorem is equivalent to the following geometric proposition.

Theorem 4. Consider the cross-section of the n-dimensional ellipsoid E = {q: (Bq, q) = 1} with semi-axes a₁ ≥ a₂ ≥ ⋯ ≥ a_n by a hyperplane ℝ^{n − 1} through its center. Then the semi-axes a′₁ ≥ a′₂ ≥ ⋯ ≥ a′_{n − 1} of this (n − 1)-dimensional ellipsoid (the cross-section E′) separate the semi-axes of the ellipsoid E (Figure 93):

a₁ ≥ a′₁ ≥ a₂ ≥ a′₂ ≥ ⋯ ≥ a_{n − 1} ≥ a′_{n − 1} ≥ a_n

Figure 93

The semi-axes of the intersection separate the semi-axes of the ellipsoid

C Extremal properties of eigenvalues

Theorem 5. The smallest semi-axis of any cross-section of the ellipsoid E with semi-axes a₁ ≥ a₂ ≥ ⋯ ≥ a_n by a subspace ℝ^k is less than or equal to a_k:

a_k = max_{ℝ^k} min_{x ∈ ℝ^k ⋂ E} ‖x‖

(the upper bound is attained on the subspace spanned by the semi-axes a₁ ≥ a₂ ≥ ⋯ ≥ a_k).

Proof.⁴³ Consider the subspace ℝ^{n − k + 1} spanned by the axes a_k ≥ a_{k + 1} ≥ ⋯ ≥ a_n. Its dimension is n − k + 1. Therefore, it intersects ℝ^k in a subspace of dimension at least 1, since k + (n − k + 1) > n. Let x be a point of the intersection lying on the ellipsoid. Then ‖x‖ ≤ a_k, since x ∈ ℝ^{n − k + 1}.

Since l ≤ ‖x‖, where l is the length of the smallest semi-axis of the ellipsoid E ⋂ ℝ^k, l must be no larger than a_k. □

Proof of theorem 2. The smallest semi-axis of every k-dimensional section of the inner ellipsoid ℝ^k ⋂ E′ is less than or equal to the smallest semi-axis of ℝ^k ⋂ E. By Theorem 5,

a′_k = max_{ℝ^k} min_{x ∈ ℝ^k ⋂ E′} ‖x‖ ≤ max_{ℝ^k} min_{x ∈ ℝ^k ⋂ E} ‖x‖ = a_k. □

Proof of theorem 4. The inequality a′_k ≤ a_k follows from Theorem 5, since in the calculation of a_k the maximum is taken over a larger set. To prove the inequality a′_k ≥ a_{k + 1}, we intersect ℝ^{n − 1} with any k + 1-dimensional subspace ℝ^{k + 1}. The intersection has dimension greater than or equal to k. The smallest semi-axis of the ellipsoid E′ ⋂ ℝ^{k + 1} is greater than or equal to the smallest semi-axis of E ⋂ ℝ^{k + 1}. By Theorem 5,

a′_k = max_{ℝ^k ⊂ ℝ^{n − 1}} min_{x ∈ ℝ^k ⋂ E′} ‖x‖ ≥ max_{ℝ^{k + 1} ⊂ ℝⁿ} min_{x ∈ ℝ^{k + 1} ⋂ E} ‖x‖ = a_{k + 1}. □

Theorems 1 and 3 follow directly from those just proven.
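
Both theorems are easy to test numerically in the smallest nontrivial case n = 2 with A = E. The sketch below is an illustrative addition with arbitrarily chosen matrices: a linear constraint is a line through 0 spanned by a unit vector q, the constrained system has the single eigenvalue (Bq, q), and the more rigid system is obtained by adding a nonnegative diagonal form to B. Eigenvalues λ are compared instead of frequencies ω = √λ, which preserves the order. The function names are hypothetical.

```python
import math

def eigenvalues_sym2(B):
    """Eigenvalues lam1 <= lam2 of a symmetric 2x2 matrix ((a, b), (b, c))."""
    (a, b), (_, c) = B
    mean, radius = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

def constrained_eigenvalue(B, theta):
    """(Bq, q) on the unit vector q = (cos theta, sin theta): the eigenvalue of the
    system restricted to the line spanned by q, when the kinetic energy has A = E."""
    (a, b), (_, c) = B
    x, y = math.cos(theta), math.sin(theta)
    return a * x * x + 2.0 * b * x * y + c * y * y

B = ((2.0, 0.7), (0.7, 1.0))             # positive-definite, illustrative values
lam1, lam2 = eigenvalues_sym2(B)

# Theorem 3 for n = 2: lam1 <= lam' <= lam2 for every line through the origin
separated = all(
    lam1 - 1e-12 <= constrained_eigenvalue(B, math.radians(k)) <= lam2 + 1e-12
    for k in range(180)
)

# Theorem 1: B_rigid - B = diag(0.5, 0.1) is nonnegative, so no eigenvalue may drop
B_rigid = ((2.5, 0.7), (0.7, 1.1))
mu1, mu2 = eigenvalues_sym2(B_rigid)
frequencies_raised = mu1 >= lam1 and mu2 >= lam2
```

Both flags evaluate to True for these sample matrices, in agreement with Theorems 3 and 1.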

Problem. Show that if we increase the kinetic energy of a system without decreasing the potential energy (for example, we increase the mass on a given spring), then every characteristic frequency decreases.

Problem. Show that under the orthogonal projection of an ellipsoid lying in one subspace of euclidean space onto another subspace, all the semi-axes are decreased.

Problem. Suppose that a quadratic form A(ε) on euclidean space ℝⁿ is a continuously differentiable function of the parameter ε. Show that every characteristic frequency depends differentiably on ε, and find the derivatives.

Answer. Let λ₁,...,λ_k be the eigenvalues of A(0). To every eigenvalue λ_i of multiplicity ν_i there corresponds a subspace ℝ^{ν_i}. The derivatives of the eigenvalues of A(ε) at 0 are equal to the eigenvalues of the restricted form B = (dA/dε)|_{ε = 0} on ℝ^{ν_i}.

In particular, if all the eigenvalues of A(0) are simple, then their derivatives are equal to the diagonal elements of the matrix B in the characteristic basis for A(0).

It follows from this problem that when a form is increased, its eigenvalues grow. In this way we obtain new proofs of Theorems 1 and 2.

Problem. How does the pitch of a bell change when a crack appears in the bell?

25 Parametric resonance

If the parameters of a system vary periodically with time, then an equilibrium position can be unstable, even if it is stable for each fixed value of the parameter. This instability is what makes it possible to swing on a swing.

A Dynamical systems whose parameters vary periodically with time

Example 1. A swing: the length of the equivalent mathematical pendulum l(t) varies periodically with time: l(t + T) = l(t) (Figure 94).

Figure 94

Swing

Example 2. A pendulum in a periodically varying gravitational field (for example, the moon) is described by Hill’s equation:

ẍ = −ω²(t)x, ω(t + T) = ω(t)    (1)

Example 3. A pendulum suspended from a point which periodically oscillates vertically is also described by an equation of the form (1).

For systems with periodically varying parameters the right-hand sides of the equations of motion are periodic functions of t. The equations of motion can be written in the form of a system of first-order ordinary differential equations

ẋ = f(x, t), f(x, t + T) = f(x, t), x ∈ ℝⁿ    (2)

with periodic right-hand sides. For example, Equation (1) can be written as the system

ẋ₁ = x₂, ẋ₂ = −ω²(t)x₁, ω(t + T) = ω(t)    (3)

B The mapping at a period

Recall the general properties of the system (2). We denote by g^t: ℝⁿ → ℝⁿ the mapping taking x ∈ ℝⁿ to the value at time t, g^tx = φ(t), of the solution φ of system (2) with initial conditions φ(0) = x (Figure 95).

Figure 95

Mapping at a period

The mappings g^t do not form a group: in general,

g^{t + s} ≠ g^t g^s ≠ g^s g^t.

Problem. Show that {g^t} is a group if and only if the right-hand sides f do not depend on t.

Problem. Show that, if T is the period of f, then g^{T + s} = g^s · g^T and, in particular, g^{nT} = (g^T)ⁿ, so that the mappings g^{nT} (n an integer) form a group.

The mapping g^T : ℝⁿ → ℝⁿ plays an important role in what is to come; we will call it the mapping at a period and will denote it by A: ℝⁿ → ℝⁿ.

Example. For the systems
ẋ₁ = x₂, ẋ₂ = −x₁ and ẋ₁ = x₂, ẋ₂ = x₁,
which can be considered periodic with any period T, the mapping A is a rotation or a hyperbolic rotation (Figure 96).

Figure 96
Rotation and hyperbolic rotation

Theorem.

1. The point x₀ is a fixed point of the mapping A (Ax₀ = x₀) if and only if the solution with initial conditions x(0) = x₀ is periodic with period T.

2. The periodic solution x(t) is Liapunov stable (asymptotically stable) if and only if the fixed point x₀ of the mapping A is Liapunov stable (asymptotically stable).⁴⁴

3. If the system (2) is linear, i.e., f(x, t) = f(t)x is a linear function of x, then A is linear.

4. If the system (2) is hamiltonian, then A preserves volume: det A_* = 1.

Proof. Assertions (1) and (2) follow from the relationship g^{T + s} = g^sA. Assertion (3) follows from the fact that a sum of solutions of a linear system is again a solution. Assertion (4) follows from Liouville’s theorem. ☐

We apply the theorem above to the mapping A of the phase plane {(x₁, x₂)} onto itself, corresponding to the equation (1) and the system (3). Since (3) is linear and hamiltonian (H = ½x₂² + ½ω²(t)x₁²), we get:

Corollary. The mapping A is linear, and preserves area (det A = 1). The trivial solution of Equation (1) is stable if and only if the mapping A is stable.

Problem. Show that a rotation of the plane is a stable mapping, and a hyperbolic rotation is unstable.

C Linear mappings of the plane to itself which preserve area

Theorem. Let A be the matrix of a linear mapping of the plane to itself which preserves area (det A = 1). Then the mapping A is stable if |tr A| < 2, and unstable if |tr A| > 2 (tr A = a₁₁ + a₂₂).

Proof. Let λ₁ and λ₂ be the eigenvalues of A. They satisfy the characteristic equation λ² − (tr A)λ + 1 = 0 with real coefficients λ₁ + λ₂ = tr A and λ₁ · λ₂ = det A = 1. The roots λ₁ and λ₂ of this real quadratic equation are real for |tr A| > 2 and complex conjugate for |tr A| < 2.

In the first case one of the eigenvalues has absolute value greater than 1, and one has absolute value less than 1; the mapping A is a hyperbolic rotation and is unstable (Figure 97).

Figure 97

Eigenvalues of the mapping A

In the second case the eigenvalues lie on the unit circle (Figure 97):

1 = λ₁ · λ₂ = λ₁ · λ̄₁ = |λ₁|²

The mapping A is equivalent to a rotation through angle α (where λ_{1, 2} = e^{± iα}), i.e., it may be reduced to a rotation by means of an appropriate choice of coordinates on the plane. Therefore, it is stable. □

In this way, every question about the stability of the trivial solution of an equation of the form (1) is reduced to computation of the trace of the matrix A. Unfortunately, the calculation of this trace can be done explicitly only in special cases. It is always possible to find the trace approximately by numerically integrating the equation on the interval 0 ≤ t ≤ T. In the important case when ω(t) is close to a constant, some simple general arguments can help.
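
The numerical route mentioned above can be sketched as follows. This is an illustration, not a prescribed method: it assumes an equation of the form ẍ = −ω²(t)x with a T-periodic coefficient, propagates the two basis vectors (x, ẋ) = (1, 0) and (0, 1) over one period with a classical fourth-order Runge-Kutta scheme, and assembles the matrix of the mapping at a period from the results. The sample coefficient ω²(1 + ε cos t) with T = 2π and the names monodromy, trace_and_det and hill_example are assumptions made for the example.

```python
import math

def monodromy(omega_sq, period, steps=4000):
    """Approximate matrix of the mapping at a period for x'' = -omega_sq(t) * x,
    written in the basis (x, x')."""
    def rhs(t, x, p):
        return p, -omega_sq(t) * x

    h = period / steps
    columns = []
    for x, p in ((1.0, 0.0), (0.0, 1.0)):    # images of the two basis vectors
        for i in range(steps):
            t = i * h
            k1x, k1p = rhs(t, x, p)
            k2x, k2p = rhs(t + 0.5 * h, x + 0.5 * h * k1x, p + 0.5 * h * k1p)
            k3x, k3p = rhs(t + 0.5 * h, x + 0.5 * h * k2x, p + 0.5 * h * k2p)
            k4x, k4p = rhs(t + h, x + h * k3x, p + h * k3p)
            x += h / 6.0 * (k1x + 2.0 * k2x + 2.0 * k3x + k4x)
            p += h / 6.0 * (k1p + 2.0 * k2p + 2.0 * k3p + k4p)
        columns.append((x, p))
    (a11, a21), (a12, a22) = columns
    return (a11, a12), (a21, a22)

def trace_and_det(A):
    return A[0][0] + A[1][1], A[0][0] * A[1][1] - A[0][1] * A[1][0]

def hill_example(omega, eps):
    """Illustrative 2*pi-periodic coefficient omega^2 * (1 + eps * cos t)."""
    return lambda t: omega * omega * (1.0 + eps * math.cos(t))

T = 2.0 * math.pi
tr_const, det_const = trace_and_det(monodromy(hill_example(0.3, 0.0), T))
tr_res, det_res = trace_and_det(monodromy(hill_example(0.5, 0.4), T))
tr_off, det_off = trace_and_det(monodromy(hill_example(0.75, 0.4), T))
```

The determinant stays equal to 1 up to integration error, as it must for an area-preserving mapping, so only the trace has to be inspected. For the constant coefficient the computed trace agrees with 2 cos 2πω; for the two perturbed samples, |tr A| exceeds 2 at ω = 0.5 and stays below 2 at ω = 0.75.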

D Strong stability

Definition. The trivial solution of a hamiltonian linear system is strongly stable if it is stable, and if the trivial solution of every sufficiently close linear hamiltonian system is also stable.⁴⁵

The two theorems above imply:

Corollary. If |tr A| < 2, then the trivial solution is strongly stable.

Proof. If |tr A| < 2, then a mapping A′ corresponding to a sufficiently close system will also have |tr A′| < 2. □

Let us apply this to a system with almost constant (only slightly varying) coefficients. Consider, for example, the equation

ẍ = −ω²(1 + εa(t))x, ε ≪ 1    (4)

where a(t + 2π) = a(t), e.g., a(t) = cos t (Figure 98) (a pendulum whose frequency oscillates near ω with small amplitude and period 2π).⁴⁶

Figure 98

Instantaneous frequency as a function of time

We will represent each system of the form (4) by a point in the plane of parameters ε, ω > 0. Clearly, the stable systems with |tr A| < 2 form an open set in the (ω, ε)-plane; so do the unstable systems with |tr A| > 2 (Figure 99).

Figure 99

Zones of parametric resonance

The boundary of stability is given by the equation |tr A| = 2.

Theorem. All points on the ω-axis except the integers and half-integers ω = k/2, k = 0, 1, 2,... correspond to strongly stable systems (4).

Thus, the set of unstable systems can approach the ω-axis only at the points ω = k/2. In other words, swinging a swing by small periodic changes of the length is possible only in the case when one period of the change in length is close to a whole number of half-periods of characteristic oscillations—a result well known experimentally.

The proof of the theorem above is based on the fact that for ε = 0, Equation (4) has constant coefficients and is clearly solvable.

Problem. Calculate the matrix of the transformation A after period T = 2π in the basis x, ẋ for system (4) with ε = 0.

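Solution. The general solution is:

x = c₁ cos ωt + c₂ sin ωt

The solution with initial conditions x = 1, ẋ = 0 is:

x = cos ωt, ẋ = −ω sin ωt

The solution with initial conditions x = 0, ẋ = 1 is:

x = (1/ω) sin ωt, ẋ = cos ωt

Answer.

A = | cos 2πω      (1/ω) sin 2πω |
    | −ω sin 2πω   cos 2πω       |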

Therefore, |tr A| = |2 cos 2ωπ| < 2 if ω ≠ k/2, k = 0, 1,..., and the theorem follows from the preceding corollary.

A more careful analysis⁴⁷ shows that in general (and for a(t) = cos t) the region of instability (shaded in Figure 99) in fact approaches the ω-axis near the points ω = k/2, k = 1, 2,....

Thus, for ω ≈ k/2, k = 1, 2,..., the lowest equilibrium position of the idealized swing (4) is unstable and it swings under an arbitrarily small periodic change of length. This phenomenon is called parametric resonance. A characteristic property of parametric resonance is that it is strongest when the frequency ν of the variation of the parameter (in Equation (4), ν = 1) is twice the characteristic frequency ω.

Remark. Theoretically, parametric resonance can be observed for the infinite collection of cases ω/ν ≈ k/2, k = 1, 2,.... In practice, it is usually observed only when k is small (k = 1, 2, and more rarely, 3). The reason is that:

For large k the region of instability approaches the ω-axis in a very narrow “tongue” and the resonance frequencies ω must satisfy very rigid bounds (∼ εθ^k, where θ ∈ (0, 1) depends on the width of the analyticity band for the function a(t) in (4)).

The instability itself is weak for large k, since |tr A| − 2 is small and the eigenvalues are close to 1 for large k.

If there is an arbitrarily small amount of friction, then there is a minimal value ε_k of the amplitude in order for parametric resonance to begin (for ε less than this the oscillation dies out). As k grows, ε_k grows quickly (Figure 100).

Figure 100

Influence of friction on parametric resonance

We also notice that for Equation (4) the size of x grows without bound in the unstable case. In real systems, oscillations attain only finite amplitudes, since for large x the linear equation (4) itself loses influence, and we must consider the nonlinear effects.

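Problem. Find the shape of the region of stability in the ε,ω-plane for the system described by the equations

ẍ = −f²(t)x,  f(t) = ω + ε for 0 < t < π,  f(t) = ω − ε for π < t < 2π,  ε ≪ 1,  f(t + 2π) = f(t).

Solution. It follows from the solution of the preceding problem that A = A₂ A₁, where

A_k = \begin{pmatrix} c_k & \frac{1}{\omega_k} s_k \\ -\omega_k s_k & c_k \end{pmatrix},  c_k = cos πω_k,  s_k = sin πω_k,  ω₁ = ω + ε,  ω₂ = ω − ε.

Therefore, the boundary of the zone of stability has the equation

|2c₁c₂ − (ω₁/ω₂ + ω₂/ω₁)s₁s₂| = 2.  (5)

Since ε ≪ 1, we have ω₁/ω₂ = (ω + ε)/(ω − ε) ≈ 1. We introduce the notation

ω₁/ω₂ + ω₂/ω₁ = 2(1 + Δ).

Then, as is easily computed, Δ = (2ε²/ω²) + O(ε⁴) ≪ 1. Using the relations 2c₁c₂ = cos 2πε + cos 2πω and 2s₁s₂ = cos 2πε − cos 2πω, we rewrite Equation (5) in the form

−Δ cos 2πε + (2 + Δ) cos 2πω = ±2,

or

cos 2πω = (2 + Δ cos 2πε)/(2 + Δ),  (6a)

cos 2πω = (−2 + Δ cos 2πε)/(2 + Δ).  (6b)

In the first case cos 2πω ≈ 1. Therefore, we set

ω = k + a,  |a| ≪ 1;  cos 2πω = cos 2πa = 1 − 2π²a² + O(a⁴).

We rewrite Equation (6a) in the form

cos 2πω = 1 − (Δ/(2 + Δ))(1 − cos 2πε),

or 2π²a² + O(a⁴) = Δπ²ε² + O(ε⁴).

Substituting in the value Δ = (2ε²/ω²) + O(ε⁴), we find

a = ±ε²/ω + o(ε²),  i.e.,  ω = k ± ε²/k + o(ε²).

Equation (6b) is solved analogously; for the result we get

ω = k + ½ ± ε/(π(k + ½)) + o(ε).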

Therefore the answer has the form depicted in Figure 101.

Figure 101
Zones of parametric resonance for f = ω ± ε.
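The zones can also be probed numerically. The sketch below assumes the piecewise-constant frequency suggested by the caption, f = ω + ε on the first half-period (length π) and f = ω − ε on the second; the helper names are illustrative. It multiplies the two half-period flows and tests the criterion |tr A| < 2.

```python
import math

def half_period_matrix(w):
    """Flow of x'' = -w^2 x over time pi, in the basis (x, x')."""
    c, s = math.cos(math.pi * w), math.sin(math.pi * w)
    return [[c, s / w], [-w * s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def monodromy_trace(omega, eps):
    """Trace of A = A2 A1 for f = omega + eps, then f = omega - eps."""
    a1 = half_period_matrix(omega + eps)
    a2 = half_period_matrix(omega - eps)
    a = matmul(a2, a1)
    return a[0][0] + a[1][1]

def is_stable(omega, eps):
    return abs(monodromy_trace(omega, eps)) < 2

if __name__ == "__main__":
    eps = 0.1
    for omega in (0.5, 0.75, 1.0, 1.05, 1.5):
        print(omega, "stable" if is_stable(omega, eps) else "unstable")
```

Scanning ω on a fine grid for several values of ε reproduces the picture of tongues touching the ω-axis at ω = k/2, with the half-integer tongues much wider than the integer ones.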

E Stability of an inverted pendulum with vertically oscillating point of suspension

Problem. Can the topmost, usually unstable, equilibrium position of a pendulum become stable if the point of suspension oscillates in the vertical direction (Figure 102)?

Figure 102

Inverted pendulum with oscillating point of suspension

Let the length of the pendulum be l, the amplitude of the oscillation of the point of suspension be a ≪ l, and the period of oscillation of the point of suspension be 2τ; moreover, in the course of every half-period let the acceleration of the point of suspension be constant and equal to ±c (then c = 8a/τ²). It turns out that for fast enough oscillations of the point of suspension (τ ≪ 1) the topmost equilibrium becomes stable.

Solution. The equation of motion can be written in the form ẍ = (ω² ± d²)x (the sign changes after time τ), where ω² = g/l and d² = c/l. If the oscillation of the suspension is fast enough, then d² > ω² (d² = 8a/(lτ²)).

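As in the previous problem, A = A₂ A₁, where

A_1 = \begin{pmatrix} \cosh k\tau & \frac{1}{k}\sinh k\tau \\ k\sinh k\tau & \cosh k\tau \end{pmatrix},  A_2 = \begin{pmatrix} \cos\Omega\tau & \frac{1}{\Omega}\sin\Omega\tau \\ -\Omega\sin\Omega\tau & \cos\Omega\tau \end{pmatrix},  k² = d² + ω²,  Ω² = d² − ω².

The stability condition |tr A| < 2 therefore has the form

|2 cosh kτ cos Ωτ + (k/Ω − Ω/k) sinh kτ sin Ωτ| < 2.  (7)

We will show that this condition is fulfilled for sufficiently fast oscillations of the point of suspension, i.e., when c ≫ g. We introduce the dimensionless variables ε, μ:

a/l = ε² ≪ 1,  g/c = μ² ≪ 1.

Then

kτ = 2√2 ε√(1 + μ²),  Ωτ = 2√2 ε√(1 − μ²),  k/Ω − Ω/k = 2μ² + O(μ⁴).

Therefore, for small ε and μ we have the following expansion with error o(ε⁴ + μ⁴):

cosh kτ = 1 + 4ε²(1 + μ²) + (8/3)ε⁴ + ···,
cos Ωτ = 1 − 4ε²(1 − μ²) + (8/3)ε⁴ + ···,
(k/Ω − Ω/k) sinh kτ sin Ωτ = 16ε²μ² + ···,

so the stability condition (7) takes the form

2(1 − 16ε⁴ + (16/3)ε⁴ + 8ε²μ² + ···) + 16ε²μ² < 2,

i.e., disregarding the small higher-order terms, (64/3)ε⁴ > 32μ²ε², or μ² < (2/3)ε², or g/c < 2a/(3l). This condition can be rewritten as

N > (√3/8) · √(gl)/a ≈ 0.22 √(gl)/a,

where N = 1/2τ is the number of oscillations of the point in one unit of time. For example, if the length of the pendulum l is 20 cm, and the amplitude of the oscillation of the point of suspension a is 1 cm, then

N > 0.22 · √(980 · 20)/1 ≈ 30 (oscillations per second).

In particular, the topmost position is stable if the frequency of oscillation of the point of suspension is greater than 40 per second.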
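The stability criterion can be evaluated without any expansion. The following is a sketch under the assumptions of the solution above: the linearized equation ẍ = (ω² ± d²)x with ω² = g/l, d² = c/l = 8a/(lτ²), and N = 1/(2τ). Over the half-period with the plus sign the flow is hyperbolic with rate k = √(d² + ω²); over the other it is a rotation with frequency Ω = √(d² − ω²). The function names and the choice of centimetres and seconds as units are illustrative.

```python
import math

def inverted_pendulum_trace(l, a, N, g=980.0):
    """Trace of the period map for the inverted pendulum.

    l: pendulum length (cm), a: amplitude of the suspension point (cm),
    N: oscillations of the suspension point per second, g: gravity (cm/sec^2).
    """
    tau = 1.0 / (2.0 * N)            # half-period
    d2 = 8.0 * a / (l * tau ** 2)    # d^2 = c / l with c = 8a / tau^2
    w2 = g / l                       # omega^2
    if d2 <= w2:
        raise ValueError("suspension acceleration too small: need d^2 > omega^2")
    k = math.sqrt(d2 + w2)
    om = math.sqrt(d2 - w2)
    return (2.0 * math.cosh(k * tau) * math.cos(om * tau)
            + (k / om - om / k) * math.sinh(k * tau) * math.sin(om * tau))

def is_stable(l, a, N, g=980.0):
    return abs(inverted_pendulum_trace(l, a, N, g)) < 2.0

if __name__ == "__main__":
    for N in (20, 30, 35, 40, 60):
        print(N, round(inverted_pendulum_trace(20.0, 1.0, N), 5),
              "stable" if is_stable(20.0, 1.0, N) else "unstable")
```

For l = 20 cm and a = 1 cm this confirms that 40 oscillations per second is enough to stabilize the topmost position, while 20 per second is not.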

⁴¹

If the equilibrium position is unstable, we will talk about “unstable small oscillations” even though these motions may not have an oscillatory character.

⁴²

If one wants to, one can introduce a euclidean structure by taking the first form as the scalar product, and then reducing the second form to the principal axes by a transformation which is orthogonal with respect to this euclidean structure.

⁴³

It is useful to think of the case n = 3, k = 2.

⁴⁴

A fixed point x₀ of the mapping A is Liapunov stable (respectively, asymptotically stable) if ∀ε > 0, ∃δ > 0 such that if |x − x₀| < δ, then | Aⁿx − Aⁿx₀| < ε for all 0 < n < ∞ (respectively, Aⁿx − Aⁿx₀ → 0 as n → ∞).

⁴⁵

The distance between two linear systems with periodic coefficients, ẋ = B₁(t)x, ẋ = B₂(t)x is defined as the maximum over t of the distance between the operators B₁(t) and B₂(t).

⁴⁶

In the case a(t) = cos t, Equation (4) is called Mathieu’s equation.

⁴⁷

Cf., for example, the problem analyzed below.

6
Rigid bodies

In this chapter we study in detail some very special mechanical problems. These problems are traditionally included in a course on classical mechanics, first because they were solved by Euler and Lagrange, and also because we live in three-dimensional euclidean space, so that most of the mechanical systems with a finite number of degrees of freedom which we are likely to encounter consist of rigid bodies.

26 Motion in a moving coordinate system

In this paragraph we define angular velocity.

A Moving coordinate systems

We look at a lagrangian system described in coordinates q, t by the lagrangian function L(q, q̇, t). It will often be useful to shift to a moving coordinate system Q = Q(q, t).

To write the equations of motion in a moving system, it is sufficient to express the lagrangian function in the new coordinates.

Theorem. If the trajectory γ: q = φ(t) of Lagrange’s equations d(∂L/∂q̇)/dt = ∂L/∂q is written as γ: Q = Φ(t) in the local coordinates Q, t (where Q = Q(q, t)), then the function Φ(t) satisfies Lagrange’s equations d(∂L′/∂Q̇)/dt = ∂L′/∂Q, where L′(Q, Q̇, t) = L(q, q̇, t).

Proof. The trajectory γ is an extremal: δ∫_γ L(q, q̇, t) dt = 0. Therefore, δ∫_γ L′(Q, Q̇, t) dt = 0 and Φ(t) satisfies Lagrange’s equations. □

B Motions, rotations, and translational motions

We consider, in particular, the important case where q is the cartesian radius vector of a point relative to an inertial coordinate system k (which we will call stationary), and Q is the cartesian radius vector of the same point relative to a moving coordinate system K.

Definition. Let k and K be oriented euclidean spaces. A motion of K relative to k is a mapping smoothly depending on t:

D_t: K → k,

which preserves the metric and the orientation (Figure 103).

Figure 103

The motion D_t decomposed as the product of a rotation B_t and translation C_t

Definition. A motion D_t is called a rotation if it takes the origin of K to the origin of k, i.e., if D_t is a linear operator.

Theorem. Every motion D_t can be uniquely written as the composition of a rotation B_t: K → k and a translation C_t: k → k:

D_t = C_t B_t,

where C_t q = q + r(t), (q, r ∈ k).

Proof. We set r(t) = D_t 0 and B_t = C_t⁻¹ D_t. Then B_t 0 = 0. □

Definition. A motion D_t is called translational if the mapping B_t: K → k corresponding to it does not depend on t: B_t = B₀ = B, D_tQ = BQ + r(t).

We will call k a stationary coordinate system, K a moving one, and q(t) ∈ k the radius-vector of a point moving relative to the stationary system; if

q(t) = D_t Q(t) = B_t Q(t) + r(t)  (1)

(Figure 104), Q(t) is called the radius vector of the point relative to the moving system.

Figure 104

Radius vector of a point with respect to stationary (q) and moving (Q) coordinate systems

Warning. The vector B_tQ(t) ∈ k should not be confused with Q(t) ∈ K—they lie in different spaces!

C Addition of velocities

We will now express the “absolute velocity” q̇ in terms of the relative motion Q(t) and the motion of the coordinate system, D_t. By differentiating with respect to t in formula (1) we find a formula for the addition of velocities

q̇ = ḂQ + BQ̇ + ṙ.  (2)

In order to clarify the meaning of the three terms in (2), we consider the following special cases.

The case of translational motion

In this case Equation (2) gives q̇ = BQ̇ + ṙ (since Ḃ = 0). In other words, we have shown

Theorem. If the moving system K has a translational motion relative to k, then the absolute velocity is equal to the sum of the relative velocity and the velocity of the motion of the system K:

v = v′ + v₀,  (3)

where v = q̇ ∈ k is the absolute velocity, v′ = BQ̇ ∈ k is the relative velocity (distinct from Q̇ ∈ K), and v₀ = ṙ ∈ k is the velocity of motion of the moving coordinate system.

D Angular velocity

In the case of a rotation of K the relationship between the relative and absolute velocities is not so simple. We first consider the case when our point is at rest in K (i.e., Q̇ = 0) and the coordinate system K rotates (i.e., r = 0). In this case the motion of the point q(t) is called a transferred rotation.

Example. Rotation with fixed angular velocity ω ∈ k. Let U(t): k → k be the rotation of the space k around the ω-axis through the angle |ω|t. Then B(t) = U(t)B(0) is called a uniform rotation of K with angular velocity ω.

Clearly, the velocity of the transferred motion of the point q in this case is given by the formula q̇ = [ω, q] (Figure 105).

Figure 105

Angular velocity

We now turn to the general case of a rotation of K (r = 0, Q̇ = 0).

Theorem. At every moment of time t, there is a vector ω(t) ∈ k such that the transferred velocity is expressed by the formula

q̇ = [ω, q],  ∀q ∈ k.  (4)

The vector ω is called the instantaneous angular velocity; clearly, it is defined uniquely by Equation (4).

Corollary. Suppose that a rigid body K rotates around a stationary point 0 of the space k. Then at every moment of time there exists an instantaneous axis of rotation—the straight line in the body passing through 0 such that the velocity of its points at the given moment of time is equal to zero. The velocity of the remaining points is perpendicular to this straight line and is proportional to the distance from it.

The instantaneous axis of rotation in k is given by its vector ω; in K the corresponding vector is denoted by Ω = B⁻¹ω ∈ K; Ω is called the vector of angular velocity in the body.

Example. The angular velocity of the earth is directed from the center to the North Pole; its length is equal to 2π/(3600 · 24) sec⁻¹ ≈ 7.3 · 10⁻⁵ sec⁻¹.

Proof of the theorem. By (2) we have q̇ = ḂQ. Therefore, if we express Q in terms of q, we get q̇ = ḂB⁻¹q = Aq, where A = ḂB⁻¹: k → k is a linear operator on k.

Lemma 1. The operator A is skew-symmetric: A^t + A = 0.

Proof. Since B: K → k is an orthogonal operator from one euclidean space to another, its transpose is its inverse: B^t = B⁻¹: k → K. By differentiating the relationship BB^t = E with respect to t, we get ḂB^t + BḂ^t = 0, i.e., A + A^t = 0. □

Lemma 2. Every skew-symmetric operator A on a three-dimensional oriented euclidean space is the operator of vector multiplication by a fixed vector: Aq = [ω, q] for all q ∈ ℝ³.

Proof. The skew-symmetric operators from ℝ³ to ℝ³ form a linear space. Its dimension is 3, since a skew-symmetric 3 × 3 matrix is determined by its three elements below the diagonal.

The operator of vector multiplication by ω is linear and skew-symmetric. The operators of vector multiplication by all possible vectors ω in three-space form a linear subspace of the space of all skew-symmetric operators.

The dimension of this subspace is equal to 3. Therefore, the subspace of vector multiplications is the space of all skew-symmetric operators. □

Conclusion of the proof of the theorem. By Lemmas 1 and 2, q̇ = Aq = [ω, q]. □

In cartesian coordinates the operator A is given by an antisymmetric matrix; we denote its elements by ±ω_{1, 2, 3}:

A = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix}.

In this notation the vector ω = ω₁e₁ + ω₂e₂ + ω₃e₃ will be an eigenvector with eigenvalue 0. By applying A to the vector q = q₁e₁ + q₂e₂ + q₃e₃, we obtain by a direct calculation Aq = [ω, q].
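The direct calculation is easy to reproduce. The sketch below assumes the usual sign convention for the antisymmetric matrix (rows (0, −ω₃, ω₂), (ω₃, 0, −ω₁), (−ω₂, ω₁, 0)) and a right-handed orthonormal basis; the helper names are illustrative.

```python
def skew(w):
    """Antisymmetric matrix of the operator q -> [w, q]."""
    w1, w2, w3 = w
    return [[0.0, -w3, w2],
            [w3, 0.0, -w1],
            [-w2, w1, 0.0]]

def apply(m, q):
    """Apply a 3x3 matrix (nested lists) to a vector."""
    return [sum(m[i][j] * q[j] for j in range(3)) for i in range(3)]

def cross(a, b):
    """Vector product [a, b] in a right-handed orthonormal basis."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

if __name__ == "__main__":
    w, q = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    print(apply(skew(w), q), cross(w, q))  # the two vectors coincide
    print(apply(skew(w), w))               # w is an eigenvector with eigenvalue 0
```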

E Transferred velocity

The case of purely rotational motion

Suppose now that the system K rotates (r = 0), and that a point in K is moving (Q̇ ≠ 0). From (2) we find (Figure 106) q̇ = ḂQ + BQ̇ = [ω, q] + v′, where v′ = BQ̇.

Figure 106

Addition of velocities

In other words, we have shown

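Theorem. If a moving system K rotates relative to 0 ∈ k, then the absolute velocity is equal to the sum of the relative velocity and the transferred velocity:

v = v′ + v_n,  (5)

where v = q̇ ∈ k is the absolute velocity, v′ = BQ̇ ∈ k is the relative velocity, and v_n = ḂQ = [ω, q] ∈ k is the transferred velocity of rotation.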

Finally, the general case can be reduced to the two cases above, if we consider an auxiliary system K₁ which moves by translation with respect to k and with respect to which K moves by rotating around 0 ∈ K₁. From formula (2) one can see that

v = v′ + v_n + v₀,

where v = q̇ ∈ k is the absolute velocity, v′ = BQ̇ ∈ k is the relative velocity, v_n = ḂQ = [ω, q − r] ∈ k is the transferred velocity of rotation, and v₀ = ṙ ∈ k is the velocity of motion of the moving coordinate system.

Problem. Show that the angular velocity of a rigid body does not depend on the choice of origin of the moving system K in the body.

Problem. Show that the most general movement of a rigid body is a helical movement, i.e., the composition of a rotation through angle φ around some axis and a translation by h along it.

Problem. A watch lies on a table. Find the angular velocity of the hands of the watch: (a) relative to the earth, (b) relative to an inertial coordinate system.

Hint. If we are given three coordinate systems k, K₁ and K₂, then the angular velocity of K₂ relative to k is equal to the sum of the angular velocities of K₁ relative to k and of K₂ relative to K₁ since

27 Inertial forces and the Coriolis force

The equations of motion in a non-inertial coordinate system differ from the equations of motion in an inertial system by additional terms called inertial forces. This allows us to detect experimentally the non-inertial nature of a system (for example, the rotation of the earth around its axis).

A Coordinate systems moving by translation

Theorem. In a coordinate system K which moves by translation relative to an inertial system k, the motion of a mechanical system takes place as if the coordinate system were inertial, but on every point of mass m an additional “inertial force” acted: F = −mr̈, where r̈ is the acceleration of the system K.

Proof. If Q = q − r(t), then mQ̈ = mq̈ − mr̈. The effect of the translation of the coordinate system is reduced in this way to the appearance of an additional homogeneous force field −mW, where W is the acceleration of the origin. □

Example 1. At the moment of takeoff, a rocket has acceleration directed upward (Figure 107). Thus, the coordinate system K connected to the rocket is not inertial, and an observer inside can detect the existence of a force field mW and measure the inertial force, for example, by means of weighted springs. In this case the inertial force is called overload.*

Figure 107
Overload

Example 2. When jumping from a loft, a person has acceleration g, directed downwards. Thus, the sum of the inertial force and the force of gravity is equal to zero; weighted springs show that the weight of any object is equal to zero, so such a state is called weightlessness. In exactly the same way, weightlessness is observed in the free ballistic flight of a satellite since the force of inertia is opposite to the gravitational force of the earth.

Example 3. If the point of suspension of a pendulum moves with acceleration W(t), then the pendulum moves as if the force of gravity g were variable and equal to g − W(t).

B Rotating coordinate systems

Let B_t: K → k be a rotation of the coordinate system K relative to the stationary coordinate system k. We will denote by Q(t) ∈ K the radius vector of a moving point in the moving coordinate system, and by q(t) = B_tQ(t) ∈ k the radius vector in the stationary system. The vector of angular velocity in the moving coordinate system is denoted, as in Section 26, by Ω. We assume that the motion of the point q in k is subject to Newton’s equation mq̈ = f(q, q̇).

Theorem. Motion in a rotating coordinate system takes place as if three additional inertial forces acted on every moving point Q of mass m:

the inertial force of rotation: −m[Ω̇, Q],

the Coriolis force: −2m[Ω, Q̇], and

the centrifugal force: −m[Ω,[Ω, Q]].

Thus

mQ̈ = F − m[Ω̇, Q] − 2m[Ω, Q̇] − m[Ω,[Ω, Q]],

where F is the force f expressed in the moving system:

BF(Q, Q̇) = f(BQ, d(BQ)/dt).

The first of the inertial forces is observed only in nonuniform rotation. The second and third are present even in uniform rotation.

The centrifugal force (Figure 108) is always directed outward from the instantaneous axis of rotation Ω; it has magnitude m|Ω|²r, where r is the distance to this axis. This force does not depend on the velocity of the relative motion, and acts even on a body at rest in the coordinate system K.

Figure 108

Centrifugal force of inertia

The Coriolis force depends on the velocity Q̇. In the northern hemisphere of the earth it deflects every body moving along the earth to the right, and every falling body eastward.

Proof of the theorem. We notice that for any vector X ∈ K we have ḂX = B[Ω, X]. In fact, by Section 26, ḂX = [ω, x] = [BΩ, BX]. This is equal to B[Ω, X] since the operator B preserves the metric and orientation, and therefore the vector product.

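Since q = BQ we see that q̇ = ḂQ + BQ̇ = B(Q̇ + [Ω, Q]). Differentiating once more, we obtain

q̈ = Ḃ(Q̇ + [Ω, Q]) + B(Q̈ + [Ω̇, Q] + [Ω, Q̇])
  = B([Ω, Q̇ + [Ω, Q]] + Q̈ + [Ω̇, Q] + [Ω, Q̇])
  = B(Q̈ + 2[Ω, Q̇] + [Ω,[Ω, Q]] + [Ω̇, Q]).

(We again used the relationship ḂX = B[Ω, X]; this time X = Q̇ + [Ω, Q].) □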
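The theorem can be tested on the simplest possible motion. The sketch below makes several assumptions that are not in the text: the frame K rotates uniformly about the z-axis with angular speed W, the point moves freely in k (no force, q(t) = q₀ + v₀t), and derivatives of Q(t) = B_t⁻¹q(t) are approximated by central differences with a hypothetical step h. For such a motion the acceleration seen in K should consist of the Coriolis and centrifugal terms alone, Q̈ = −2[Ω, Q̇] − [Ω,[Ω, Q]].

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def body_coords(t, q0, v0, W):
    """Q(t) = B_t^{-1} q(t) for a free particle q(t) = q0 + v0 t,
    where B_t is the rotation through angle W t about the z-axis."""
    x, y, z = (q0[i] + v0[i] * t for i in range(3))
    c, s = math.cos(W * t), math.sin(W * t)
    return [c * x + s * y, -s * x + c * y, z]  # inverse rotation

def inertial_force_residual(t, q0, v0, W, h=1e-3):
    """Largest component of Q'' + 2[Om, Q'] + [Om, [Om, Q]] at time t."""
    qm, qc, qp = (body_coords(t + d, q0, v0, W) for d in (-h, 0.0, h))
    vel = [(qp[i] - qm[i]) / (2 * h) for i in range(3)]
    acc = [(qp[i] - 2 * qc[i] + qm[i]) / h ** 2 for i in range(3)]
    om = [0.0, 0.0, W]  # angular velocity; the same in k and K for this axis
    coriolis = cross(om, vel)
    centrifugal = cross(om, cross(om, qc))
    return max(abs(acc[i] + 2 * coriolis[i] + centrifugal[i]) for i in range(3))

if __name__ == "__main__":
    print(inertial_force_residual(1.3, [1.0, 0.5, 0.2], [0.3, -0.2, 0.1], 0.7))
```

The printed residual is at the level of the finite-difference error, which is consistent with the formula of the theorem for Ω̇ = 0 and F = 0.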

We will consider in more detail the effect of the earth’s rotation on laboratory experiments. Since the earth rotates practically uniformly, we can take Ω̇ = 0. The centrifugal force has its largest value at the equator, where it attains Ω²ρ/g ≈ (7.3 × 10⁻⁵)² · 6.4 × 10⁶/9.8 ≈ 3/1000 of the weight. Within the limits of a laboratory it changes little, so to observe it one must travel some distance. Thus, within the limits of a laboratory the rotation of the earth appears only in the form of the Coriolis force: in the coordinate system Q associated to the earth, we have, with good accuracy,
Q̈ = g + 2[Q̇, Ω]
(the centrifugal force is taken into account in g).

Example 1. A stone is thrown (without initial velocity) into a 250 m deep mine shaft at the latitude of Leningrad. How far does it deviate from the vertical?

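We solve the equation
Q̈ = g + 2[Q̇, Ω]
by the following approach, taking Ω ≪ 1. We set (Figure 109)
Q = Q₁ + Q₂,
where Q₂(0) = Q̇₂(0) = 0 and Q₁ = Q₁(0) + gt²/2. For Q₂, we then get
Q̈₂ = 2[gt, Ω] + O(Ω²),  Q₂ ≈ (t³/3)[g, Ω] = (2t/3)[h, Ω],  h = gt²/2.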

Figure 109
Displacement of a falling stone by Coriolis force

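From this it is apparent that the stone lands about
(2t/3)|h||Ω| cos λ ≈ (2 · 7/3) · 250 · 7 · 10⁻⁵ · ½ m ≈ 4 cm
to the east (here t ≈ 7 sec is the time of fall, |h| = 250 m is the depth, and λ ≈ 60° is the latitude of Leningrad).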
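The estimate is a one-line computation. The sketch below assumes the first-order formula (2t/3)·h·|Ω|·cos λ with t = √(2h/g) the time of free fall, and takes 60° for the latitude of Leningrad; the function name and default constants are illustrative.

```python
import math

def eastward_deflection(depth_m, latitude_deg, g=9.8, earth_rate=7.3e-5):
    """First-order eastward displacement (in metres) of a body dropped
    from rest down a shaft of the given depth; air resistance is ignored."""
    t = math.sqrt(2.0 * depth_m / g)  # time of fall, sec
    return (2.0 * t / 3.0) * depth_m * earth_rate * math.cos(math.radians(latitude_deg))

if __name__ == "__main__":
    print(f"{100 * eastward_deflection(250.0, 60.0):.1f} cm")
```

The result is close to 4 cm, in agreement with the rounded arithmetic of the example.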

Problem. By how much would the Coriolis force displace a missile fired vertically upwards at Leningrad from falling back onto its launching pad, if the missile rose 1 kilometer?

Example 2 (The Foucault pendulum). Consider small oscillations of an ideal pendulum, taking into account the Coriolis force. Let e_x, e_y, and e_z be the axes of a coordinate system associated to the earth, with e_z directed upwards, and e_x and e_y in the horizontal plane (Figure 110). In the approximation of small oscillations, ż = 0 (in comparison with ẋ and ẏ); therefore, the horizontal component of the Coriolis force will be 2mẏΩ_ze_x − 2mẋΩ_ze_y. From this we get the equations of motion

ẍ = −ω²x + 2ẏΩ_z,  ÿ = −ω²y − 2ẋΩ_z  (Ω_z = |Ω| sin λ₀, where λ₀ is the latitude).

Figure 110
Coordinate system for studying the motion of a Foucault pendulum

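If we set x + iy = w, then ẇ = ẋ + iẏ, ẅ = ẍ + iÿ, and the two equations reduce to one complex equation
ẅ + 2iΩ_zẇ + ω²w = 0.
We solve it: w = e^λt, λ² + 2iΩ_zλ + ω² = 0, λ = −iΩ_z ± i√(Ω_z² + ω²). But Ω_z² ≪ ω². Therefore, √(Ω_z² + ω²) = ω + O(Ω_z²), from which it follows, by disregarding Ω_z², that
λ ≈ −iΩ_z ± iω
or, to the same accuracy,
w = e^{−iΩ_z t}(c₁e^{iωt} + c₂e^{−iωt}).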

For Ω_z = 0 we get the usual harmonic oscillations of a spherical pendulum. We see that the effect of the Coriolis force reduces to a rotation of the whole picture with angular velocity −Ω_z, where Ω_z = |Ω| sin λ₀ and λ₀ is the latitude.

In particular, if the initial conditions correspond to a planar motion (y(0) = ẏ(0) = 0), then the plane of oscillation will be rotating with angular velocity −Ω_z with respect to the earth’s coordinate system (Figure 111).

Figure 111
Trajectory of a Foucault pendulum

At a pole, the plane of oscillation makes one turn in a twenty-four-hour day (and is fixed with respect to a coordinate system not rotating with the earth). At the latitude of Moscow (56°) the plane of oscillation turns 0.83 of a rotation in a twenty-four-hour day, i.e., 12.5° in an hour.
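These figures follow from the rate |Ω| sin λ₀. A small sketch, assuming that the earth makes exactly one turn in 24 hours (as in the text) and with illustrative function names:

```python
import math

def foucault_turns_per_day(latitude_deg):
    """Fraction of a full turn made by the plane of oscillation in 24 hours."""
    return math.sin(math.radians(latitude_deg))

def foucault_degrees_per_hour(latitude_deg):
    return 360.0 * foucault_turns_per_day(latitude_deg) / 24.0

if __name__ == "__main__":
    for lat in (90.0, 56.0, 0.0):
        print(lat, round(foucault_turns_per_day(lat), 2),
              round(foucault_degrees_per_hour(lat), 1))
```

At the equator the rate vanishes: the plane of oscillation does not turn relative to the earth.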

Problem. A river flows with velocity 3 km/hr. For what radius of curvature of a river bend is the Coriolis force from the earth’s rotation greater than the centrifugal force determined by the flow of the river?

Answer. The radius of curvature must be at least on the order of 10 km for a river of medium width.

The solution of this problem explains why a large river in the northern hemisphere (for example, the Volga in the middle of its course), undermines the base of its right bank, while a river like the Moscow River, with its abrupt bends of small radius, undermines either the left or right (whichever is outward from the bend) bank.

28 Rigid bodies

In this paragraph we define a rigid body and its inertia tensor, inertia ellipsoid, moments of inertia, and axes of inertia.

A The configuration manifold of a rigid body

Definition. A rigid body is a system of point masses, constrained by holonomic relations expressed by the fact that the distance between points is constant:

|x_i − x_j| = r_ij = const.  (1)

Theorem. The configuration manifold of a rigid body is a six-dimensional manifold, namely, ℝ³ × SO(3) (the direct product of a three-dimensional space ℝ³ and the group SO(3) of its rotations), as long as there are three points in the body not in a straight line.

Proof. Let x₁, x₂, and x₃ be three points of the body which do not lie in a straight line. Consider the right-handed orthonormal frame whose first vector is in the direction of x₂ − x₁, and whose second is on the x₃ side in the x₁x₂x₃-plane (Figure 112). It follows from the conditions |x_i − x_j| = r_ij (i = 1, 2, 3), that the positions of all the points of the body are uniquely determined by the positions of x₁, x₂, and x₃, which are given by the position of the frame. Finally, the space of frames in ℝ³ is ℝ³ × SO(3), since every frame is obtained from a fixed one by a rotation and a translation.⁴⁸ □

Figure 112

Configuration manifold of a rigid body

Problem. Find the configuration space of a rigid body, all of whose points lie on a line.

Answer. ℝ³ × S².

Definition. A rigid body with a fixed point O is a system of point masses constrained by the condition x₁ = O in addition to conditions (1).

Clearly, its configuration manifold is the three-dimensional rotation group SO(3).

B Conservation laws

Consider the problem of the motion of a free rigid body under its own inertia, outside of any force field. For an (approximate) example we can use the rolling of a spaceship.

The system admits all translational displacements: they do not change the lagrangian function. By Noether’s theorem there exist three first integrals: the three components of the vector of momentum. Therefore, we have shown

Theorem. Under the free motion of a rigid body, its center of mass moves uniformly in a straight line.

Now we can look at an inertial coordinate system in which the center of inertia is stationary. Then we have

Corollary. A free rigid body rotates about its center of mass as if the center of mass were fixed at a stationary point O.

In this way, the problem is reduced to the problem, with three degrees of freedom, of the motion of a rigid body around a fixed point O. We will study this problem in more detail (not necessarily assuming that O is the center of mass of the body).

The lagrangian function admits all rotations around O. By Noether’s theorem there exist three corresponding first integrals: the three components of the vector of angular momentum. The total energy of the system, E = T, is also conserved (here it is equal to the kinetic energy). Therefore, we have shown

Theorem. In the problem of the motion of a rigid body around a stationary point O, in the absence of outside forces, there are four first integrals: M_x, M_y, M_z, and E.

From this theorem we can get qualitative conclusions about the motion without any calculation.

The position and velocity of the body are determined by a point in the six-dimensional manifold TSO(3)—the tangent bundle of the configuration manifold SO(3). The first integrals M_x, M_y, M_z, and E are four functions on TSO(3). One can verify that in the general case (if the body does not have any particular symmetry) these four functions are independent. Therefore, the four equations

M_x = C₁,  M_y = C₂,  M_z = C₃,  E = C₄ > 0

define a two-dimensional submanifold V_c in the six-dimensional manifold TSO(3).

This manifold is invariant: if the initial conditions of motion give a point on V_c, then for all time of the motion, the point in TSO(3) corresponding to the position and velocity of the body remains in V_c.

Therefore, V_c admits a tangent vector field (namely, the field of velocities of the motion on TSO(3)); for C₄ > 0 this field cannot have singular points. Furthermore, it is easy to verify that V_c is compact (using E) and orientable (since TSO(3) is orientable).⁴⁹

In topology it is proved that the only connected orientable compact two-dimensional manifolds are the spheres with n handles, n ≥ 0 (Figure 113). Of these, only the torus (n = 1) admits a tangent vector field without singular points. Therefore, the invariant manifold V_c is a two-dimensional torus (or several tori).

Figure 113

Two-dimensional compact connected orientable manifolds

We will see later that one can choose angular coordinates φ₁, φ₂ (mod 2π) on this torus such that a motion represented by a point of V_c is given by the equations φ̇₁ = ω₁(C), φ̇₂ = ω₂(C).

In other words, a rotation of a rigid body is represented by the superposition of two periodic motions with (usually) different periods: if the frequencies ω₁ and ω₂ are non-commensurable, then the body never returns to its original state of motion. The magnitudes of the frequencies ω₁ and ω₂ depend on the initial conditions C.

C The inertia operator⁵⁰

We now go on to the quantitative theory and introduce the following notation. Let k be a stationary coordinate system and K a coordinate system rotating together with the body around the point O: in K the body is at rest.

Every vector in K is carried over to k by an operator B. Corresponding vectors in K and k will be denoted by the same letter; capital for K and lower case for k. So, for example (Figure 114),

q ∈ k is the radius vector of a point in space;

Q ∈ K is its radius vector in the body, q = BQ;

v = q̇ ∈ k is the velocity vector of a point in space;

V ∈ K is the same vector in the body, v = BV;

ω ∈ k is the angular velocity in space;

Ω ∈ K is the angular velocity in the body, ω = BΩ;

m ∈ k is the angular momentum in space;

M ∈ K is the angular momentum in the body, m = BM.

Figure 114

Radius vector and vectors of velocity, angular velocity and angular momentum of a point of the body in space

Since the operator B: K → k preserves the metric and orientation, it preserves the scalar and vector products.

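By definition of angular velocity (Section 26), v = [ω, q].

By definition of the angular momentum of a point of mass m with respect to O, m = [q, mv] = m[q, [ω, q]] (on the left m ∈ k is the angular momentum vector; on the right m is the mass).

Therefore, M = m[Q, [Ω, Q]].

Hence, there is a linear operator transforming Ω to M: A: K → K, AΩ = M.

This operator still depends on a point of the body (Q) and its mass (m).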

Lemma. The operator A is symmetric.

Proof. In view of the relation ([a, b], c) = ([c, a], b) we have, for any X and Y in K,

(AX, Y) = m([Q, [X, Q]], Y) = m([Y, Q], [X, Q]),

and the last expression is symmetric in X and Y. ☐

By substituting the vector of angular velocity Ω for X and Y and noticing that [Ω, Q]² = V² = v², we obtain

Corollary. The kinetic energy of a point of a body is a quadratic form with respect to the vector of angular velocity Ω, namely: T = ½(AΩ, Ω) = ½(M, Ω).

The symmetric operator A is called the inertia operator (or tensor) of the point Q.

If a body consists of many points Q_i with masses m_i, then by summing we obtain

Theorem. The angular momentum M of a rigid body with respect to a stationary point O depends linearly on the angular velocity Ω, i.e., there exists a linear operator A: K → K, AΩ = M. The operator A is symmetric.

The kinetic energy of a body is a quadratic form with respect to the angular velocity Ω, T = ½(AΩ, Ω) = ½(M, Ω).

Proof. By definition, the angular momentum of a body is equal to the sum of the angular momenta of its points:

M = Σ_i M_i = Σ_i A_iΩ = AΩ,  where A = Σ_i A_i.

Since by the lemma the inertia operator A_i of every point is symmetric, the operator A is also symmetric. For kinetic energy we obtain, by definition,

T = Σ_i T_i = ½ Σ_i (M_i, Ω) = ½(M, Ω) = ½(AΩ, Ω).

☐
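For a finite system of points the inertia operator can be written out explicitly. The sketch below relies on the standard identity [Q, [Ω, Q]] = |Q|²Ω − (Q, Ω)Q, so that in cartesian coordinates A = Σ m_i(|Q_i|²E − Q_iQ_iᵗ); this matrix form, the sample masses, and the helper names are assumptions made for illustration. The code checks that A is symmetric, that AΩ coincides with the directly summed angular momentum, and that ½(AΩ, Ω) is the kinetic energy.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def inertia_operator(points):
    """Matrix of A = sum_i m_i (|Q_i|^2 E - Q_i Q_i^t); points = [(m, Q), ...]."""
    A = [[0.0] * 3 for _ in range(3)]
    for m, Q in points:
        q2 = dot(Q, Q)
        for i in range(3):
            for j in range(3):
                A[i][j] += m * ((q2 if i == j else 0.0) - Q[i] * Q[j])
    return A

def angular_momentum(points, Om):
    """M = sum_i m_i [Q_i, [Om, Q_i]], computed directly from the definition."""
    M = [0.0, 0.0, 0.0]
    for m, Q in points:
        term = cross(Q, cross(Om, Q))
        M = [M[i] + m * term[i] for i in range(3)]
    return M

def kinetic_energy(points, Om):
    """T = sum_i m_i |[Om, Q_i]|^2 / 2."""
    return sum(0.5 * m * dot(cross(Om, Q), cross(Om, Q)) for m, Q in points)

if __name__ == "__main__":
    body = [(1.0, [1.0, 0.0, 0.0]), (2.0, [0.0, 1.0, 1.0]), (0.5, [1.0, -1.0, 2.0])]
    Om = [0.3, -0.4, 0.5]
    A = inertia_operator(body)
    AOm = [dot(row, Om) for row in A]
    print(AOm, angular_momentum(body, Om))
    print(0.5 * dot(AOm, Om), kinetic_energy(body, Om))
```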

D Principal axes

Like every symmetric operator, A has three mutually orthogonal characteristic directions. Let e₁, e₂, and e₃ ∈ K be their unit vectors and I₁, I₂, and I₃ their eigenvalues. In the basis e_i, the inertia operator and the kinetic energy have a particularly simple form: M_i = I_iΩ_i, T = ½(I₁Ω₁² + I₂Ω₂² + I₃Ω₃²).

The axes e_i are called the principal axes of the body at the point O.

Finally, if the numbers I₁, I₂, and I₃ are not all different, then the axes e_i are not uniquely defined. We will further clarify the meaning of the eigenvalues I₁, I₂, and I₃.

Theorem. For a rotation of a rigid body fixed at a point O, with angular velocity Ω = Ωe (Ω = |Ω|) around the e axis, the kinetic energy is equal to

T = ½I_eΩ²,  where I_e = Σ_i m_i r_i²

and r_i is the distance of the i-th point to the e axis (Figure 115).

Figure 115

Kinetic energy of a body rotating around an axis

Proof. By definition T = ½ Σ_i m_i v_i²; but |v_i| = Ωr_i, so T = ½ (Σ_i m_i r_i²) Ω². ☐

The number I_e depends on the direction e of the axis of rotation Ω in the body.

Definition. I_e is called the moment of inertia of the body with respect to the e axis: I_e = Σ_i m_i r_i².

By comparing the two expressions for T we obtain:

Corollary. The eigenvalues I_i of the inertia operator A are the moments of inertia of the body with respect to the principal axes e_i.

E The inertia ellipsoid

In order to study the dependence of the moment of inertia I_e upon the direction of the axis e in a body, we consider the vectors e/√I_e, where the unit vector e runs over the unit sphere.

Theorem. The vectors e/√I_e form an ellipsoid in K.

Proof. If Ω = e/√I_e, then the quadratic form T = ½(AΩ, Ω) is equal to ½. Therefore, {Ω} is the level set of a positive-definite quadratic form, i.e., an ellipsoid. ☐

One could say that this ellipsoid consists of those angular velocity vectors Ω whose kinetic energy is equal to ½.

Definition. The ellipsoid {Ω: (AΩ, Ω) = 1} is called the inertia ellipsoid of the body at the point 0 (Figure 116).

Figure 116

Ellipsoid of inertia

In terms of the principal axes e_i, the equation of the inertia ellipsoid has the form

I₁Ω₁² + I₂Ω₂² + I₃Ω₃² = 1.

Therefore the principal axes of the inertia ellipsoid are directed along the principal axes of the inertia tensor, and their lengths are inversely proportional to √I_i.

Remark. If a body is stretched out along some axis, then the moment of inertia with respect to this axis is small, and consequently, the inertia ellipsoid is also stretched out along this axis; thus, the inertia ellipsoid may resemble the shape of the body.

If a body has an axis of symmetry of order k passing through O (so that it coincides with itself after rotation by 2π/k around the axis), then the inertia ellipsoid also has the same symmetry with respect to this axis. But a triaxial ellipsoid does not have axes of symmetry of order k > 2. Therefore, every axis of symmetry of a body of order k > 2 is an axis of rotation of the inertia ellipsoid and, therefore, a principal axis.

Example. The inertia ellipsoid of three points of mass m at the vertices of an equilateral triangle with center 0 is an ellipsoid of revolution around an axis normal to the plane of the triangle (Figure 117).

Figure 117
Ellipsoid of inertia of an equilateral triangle

If there are several such axes, then the inertia ellipsoid is a sphere, and any axis is principal.

Problem. Draw the line through the center of a cube such that the sum of the squares of its distances from the vertices of the cube is: (a) largest, (b) smallest.

We now remark that the inertia ellipsoid (or the inertia operator or the moments of inertia I₁, I₂, and I₃) completely determines the rotational characteristics of our body: if we consider two bodies with identical inertia ellipsoids, then for identical initial conditions they will move identically (since they have the same lagrangian function L = T).

Therefore, from the point of view of the dynamics of rotation around 0, the space of all rigid bodies is three-dimensional, however many points compose the body.

We can even consider the “solid rigid body of density ρ(Q),” having in mind the limit as ΔQ → 0 of the sequence of bodies with a finite number of points Q_i with masses ρ(Q_i)ΔQ_i (Figure 118) or, what amounts to the same thing, any body with moments of inertia

I_e = ∭ ρ(Q) r²(Q) dQ,

where r is the distance from Q to the e axis.

Figure 118

Continuous solid rigid body

Example. Find the principal axes and moments of inertia of the uniform planar plate |x| ≤ a, |y| ≤ b, z = 0 with respect to O.

Solution. Since the plate has three planes of symmetry, the inertia ellipsoid has the same planes of symmetry and, therefore, principal axes x, y, and z. Furthermore, with m the mass of the plate and ρ its surface density,

I_x = ∬ y² ρ dx dy = mb²/3.

In the same way,

I_y = ma²/3.

Clearly, I_z = I_x + I_y.
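As a numerical cross-check of this example, the following sketch approximates the plate's moments of inertia by a midpoint rule. It assumes a total mass m spread with uniform surface density m/(4ab); the function name `plate_moments` and the grid size `n` are illustrative choices, not part of the text. The results can be compared with the closed forms mb²/3 and ma²/3.

```python
def plate_moments(a, b, m, n=200):
    """Midpoint-rule moments of inertia of the uniform plate |x| <= a, |y| <= b, z = 0.

    Returns (I_x, I_y, I_z) about the coordinate axes through the center O.
    """
    rho = m / (4.0 * a * b)              # assumed uniform surface density
    hx, hy = 2.0 * a / n, 2.0 * b / n    # cell sizes
    dm = rho * hx * hy                   # mass of one cell
    Ix = Iy = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * hx
        for j in range(n):
            y = -b + (j + 0.5) * hy
            Ix += y * y * dm             # distance from the x axis is |y|
            Iy += x * x * dm             # distance from the y axis is |x|
    return Ix, Iy, Ix + Iy               # for a planar body I_z = I_x + I_y


if __name__ == "__main__":
    a, b, m = 2.0, 1.0, 3.0
    print(plate_moments(a, b, m), (m * b * b / 3, m * a * a / 3))
```

The midpoint rule is used only because it is short; any quadrature would do for this polynomial integrand.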

Problem. Show that the moments of inertia of any body satisfy the triangle inequalities

I₃ ≤ I₁ + I₂,  I₂ ≤ I₁ + I₃,  I₁ ≤ I₂ + I₃,

and that equality holds only for a planar body.

Problem. Find the axes and moments of inertia of a homogeneous ellipsoid of mass m with semiaxes a, b, and c relative to the center O.

Hint. First look at the sphere.

Problem. Prove Steiner’s theorem: The moments of inertia of any rigid body relative to two parallel axes, one of which passes through the center of mass, are related by the equation

I = I₀ + mr²,

where m is the mass of the body, r is the distance between the axes, and I₀ is the moment of inertia relative to the axis passing through the center of mass.

Thus the moment of inertia relative to an axis passing through the center of mass is less than the moment of inertia relative to any parallel axis.
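Steiner's relation I = I₀ + mr² can be checked numerically on a finite system of point masses. The sketch below is only an illustration: the helper names `moment_about_axis` and `center_of_mass`, and the representation of a body as a list of `(mass, (x, y, z))` pairs, are assumptions made for the example.

```python
def center_of_mass(points):
    """Center of mass of point masses given as (mass, (x, y, z)) pairs."""
    total = sum(m for m, _ in points)
    return [sum(m * p[k] for m, p in points) / total for k in range(3)]


def moment_about_axis(points, axis_point, axis_dir):
    """Sum of m * (distance to the axis)^2; the axis passes through axis_point along axis_dir."""
    norm = sum(c * c for c in axis_dir) ** 0.5
    u = [c / norm for c in axis_dir]
    total = 0.0
    for m, p in points:
        d = [p[k] - axis_point[k] for k in range(3)]
        along = sum(d[k] * u[k] for k in range(3))           # component along the axis
        total += m * (sum(c * c for c in d) - along * along)  # squared distance to the axis
    return total


if __name__ == "__main__":
    body = [(1.0, (0.0, 0.0, 0.0)), (2.0, (1.0, 0.0, 0.0)), (3.0, (0.0, 2.0, 1.0))]
    c = center_of_mass(body)
    axis = (0.0, 0.0, 1.0)
    other = (3.0, -1.0, 5.0)                                  # a point on a parallel axis
    r2 = (other[0] - c[0]) ** 2 + (other[1] - c[1]) ** 2      # squared distance between the axes
    mass = sum(m for m, _ in body)
    print(moment_about_axis(body, other, axis),
          moment_about_axis(body, c, axis) + mass * r2)
```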

Problem. Find the principal axes and moments of inertia of a uniform tetrahedron relative to its vertices.

Problem. Draw the angular momentum vector M for a body with a given inertia ellipsoid rotating with a given angular velocity Ω.

Answer. M is in the direction normal to the inertia ellipsoid at a point on the Ω axis (Figure 119).

Figure 119
Angular velocity, ellipsoid of inertia and angular momentum

Problem. A piece is cut off a rigid body fixed at the stationary point O. How are the principal moments of inertia changed? (Figure 120).

Figure 120
Behavior of moments of inertia as the body becomes smaller

Answer. All three principal moments are decreased.

Hint. Cf. Section 24.

Problem. A small mass ε is added to a rigid body with moments of inertia I₁ > I₂ > I₃ at the point Q = x₁e₁ + x₂e₂ + x₃e₃. Find the change in I₁ and e₁ with error O(ε²).

Solution. The center of mass is displaced by a distance of order ε. Therefore, the moments of inertia of the old body with respect to the parallel axes passing through the old and new centers of mass differ in magnitude of order ε². At the same time, the addition of mass changes the moment of inertia relative to any fixed axis by order ε. Therefore, we can disregard the displacement of the center of mass for calculations with error O(ε²).

Thus, after addition of a small mass the kinetic energy takes the form

T = T₀ + ½ε[Ω, Q]²,

where T₀ = ½(I₁Ω₁² + I₂Ω₂² + I₃Ω₃²) is the kinetic energy of the original body. We look for the eigenvalue I₁(ε) and eigenvector e₁(ε) of the inertia operator in the form of a Taylor series in ε. By equating coefficients of ε in the relation A(ε)e₁(ε) = I₁(ε)e₁(ε), we find that, within error O(ε²):

I₁(ε) ≈ I₁ + ε(x₂² + x₃²),  e₁(ε) ≈ e₁ + ε(x₁x₂/(I₂ − I₁) e₂ + x₁x₃/(I₃ − I₁) e₃).

From the formula for I₁(ε) it is clear that the change in the principal moments of inertia (to the first approximation in ε) is as if neither the center of mass nor the principal axes changed. The formula for e₁(ε) demonstrates how the directions of the principal axes change: the largest principal axis of the inertia ellipsoid approaches the added point, and the smallest recedes from it. Furthermore, the addition of a small mass on one of the principal planes of the inertia ellipsoid rotates the two axes lying in this plane and does not change the direction of the third axis. The appearance of the differences of moments of inertia in the denominator is connected with the fact that the major axes of an ellipsoid of revolution are not defined. If the inertia ellipsoid is nearly an ellipsoid of revolution (i.e., I₁ ≈ I₂) then the addition of a small mass could strongly turn the axes e₁ and e₂ in the plane spanned by them.
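The first-order formulas can be tested numerically. The sketch below assumes that, in the basis e₁, e₂, e₃, the perturbed inertia operator has the matrix A(ε) = diag(I₁, I₂, I₃) + ε(|Q|²·Id − QQᵀ), which is the operator of the quadratic form T₀ + ½ε[Ω, Q]², and that the first-order eigenpair is I₁ + ε(x₂² + x₃²), e₁ + ε(x₁x₂/(I₂ − I₁) e₂ + x₁x₃/(I₃ − I₁) e₃). All function names are illustrative. If these expressions are right, the residual A(ε)e₁(ε) − I₁(ε)e₁(ε) should shrink like ε².

```python
def inertia_with_added_mass(I, x, eps):
    """Matrix of A(eps) = diag(I) + eps * (|x|^2 * Id - x x^T) in the principal basis."""
    r2 = sum(c * c for c in x)
    return [[(I[i] if i == j else 0.0) + eps * ((r2 if i == j else 0.0) - x[i] * x[j])
             for j in range(3)] for i in range(3)]


def first_order_pair(I, x, eps):
    """Assumed first-order approximations of the eigenvalue I1(eps) and eigenvector e1(eps)."""
    I1 = I[0] + eps * (x[1] ** 2 + x[2] ** 2)
    e1 = [1.0,
          eps * x[0] * x[1] / (I[1] - I[0]),
          eps * x[0] * x[2] / (I[2] - I[0])]
    return I1, e1


def residual(I, x, eps):
    """Largest component of A(eps) e1(eps) - I1(eps) e1(eps)."""
    A = inertia_with_added_mass(I, x, eps)
    I1, e1 = first_order_pair(I, x, eps)
    r = [sum(A[i][j] * e1[j] for j in range(3)) - I1 * e1[i] for i in range(3)]
    return max(abs(c) for c in r)


if __name__ == "__main__":
    I, Q = (3.0, 2.0, 1.0), (1.0, 1.0, 1.0)
    for eps in (1e-2, 5e-3, 2.5e-3):
        print(eps, residual(I, Q, eps))   # each halving of eps should quarter the residual
```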

29 Euler’s equations. Poinsot’s description of the motion

Here we study the motion of a rigid body around a stationary point in the absence of outside forces and the similar motion of a free rigid body. The motion turns out to have two frequencies.

A Euler’s equations

Consider the motion of a rigid body around a stationary point O. Let M be the angular momentum vector of the body relative to O in the body, Ω the angular velocity vector in the body, and A the inertia operator (AΩ = M); the vectors Ω and M belong to the moving coordinate system K (Section 26). The angular momentum vector of the body relative to O in space, m = BM, is preserved under the motion (Section 28B).

Therefore, the vector M in the body (M ∈ K) must move so that m = B_tM(t) does not change when t changes.

Theorem

Ṁ = [M, Ω]. (1)

Proof. We apply formula (5), Section 26 for the velocity of the motion of the “point” M(t) ∈ K with respect to the stationary space k. We get

ṁ = BṀ + [ω, m] = B(Ṁ + [Ω, M]).

But since the angular momentum m with respect to the space is preserved (ṁ = 0), we have Ṁ + [Ω, M] = 0. ☐

Relation (1) is called the Euler equations. Since M = AΩ, (1) can be viewed as a differential equation for M (or for Ω). If

Ω = Ω₁e₁ + Ω₂e₂ + Ω₃e₃ and M = M₁e₁ + M₂e₂ + M₃e₃

are the decompositions of Ω and M with respect to the principal axes at O, then M_i = I_iΩ_i and (1) becomes the system of three equations

Ṁ₁ = a₁M₂M₃,  Ṁ₂ = a₂M₃M₁,  Ṁ₃ = a₃M₁M₂, (2)

where a₁ = (I₂ − I₃)/I₂I₃, a₂ = (I₃ − I₁)/I₃I₁, and a₃ = (I₁ − I₂)/I₁I₂, or, in the form of a system of three equations for the three components of the angular velocity,

I₁Ω̇₁ = (I₂ − I₃)Ω₂Ω₃,  I₂Ω̇₂ = (I₃ − I₁)Ω₃Ω₁,  I₃Ω̇₃ = (I₁ − I₂)Ω₁Ω₂.

Remark. Suppose that outside forces act on the body, the sum of whose moments with respect to O is equal to n in the stationary coordinate system and N in the moving system (n = BN). Then

ṁ = n

and the Euler equations take the form

Ṁ = [M, Ω] + N.

B Solutions of the Euler equations

Lemma. The Euler equations (2) have two quadratic first integrals

2E = M₁²/I₁ + M₂²/I₂ + M₃²/I₃ and M² = M₁² + M₂² + M₃².

Proof. E is preserved by the law of conservation of energy, and M² by the law of conservation of angular momentum m, since |m|² = |M|² = M². ☐
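The conservation of the two quadratic integrals can be observed numerically. The sketch below integrates the Euler equations in the form Ṁ = [M, Ω] with Ω_i = M_i/I_i, using a classical fourth-order Runge-Kutta step, and evaluates 2E = ΣM_i²/I_i and M² = ΣM_i² along the solution. The integrator, step size, and function names are illustrative assumptions; a numerical scheme conserves the integrals only up to its truncation error.

```python
def euler_rhs(M, I):
    """Right-hand side of dM/dt = [M, Omega], with Omega_i = M_i / I_i in the principal axes."""
    W = [M[k] / I[k] for k in range(3)]
    return [M[1] * W[2] - M[2] * W[1],
            M[2] * W[0] - M[0] * W[2],
            M[0] * W[1] - M[1] * W[0]]


def rk4_step(M, I, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(k, s):
        return [M[i] + s * k[i] for i in range(3)]
    k1 = euler_rhs(M, I)
    k2 = euler_rhs(shifted(k1, h / 2), I)
    k3 = euler_rhs(shifted(k2, h / 2), I)
    k4 = euler_rhs(shifted(k3, h), I)
    return [M[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(3)]


def integrate(M0, I, h, steps):
    """Return the list of states M(0), M(h), ..., M(steps * h)."""
    M = list(M0)
    path = [M]
    for _ in range(steps):
        M = rk4_step(M, I, h)
        path.append(M)
    return path


def energy2(M, I):
    """The integral 2E."""
    return sum(M[k] ** 2 / I[k] for k in range(3))


def momentum2(M):
    """The integral M^2."""
    return sum(c * c for c in M)


if __name__ == "__main__":
    I = (3.0, 2.0, 1.0)
    path = integrate((1.0, 0.5, 0.2), I, 1e-3, 5000)
    print(energy2(path[0], I), energy2(path[-1], I))
    print(momentum2(path[0]), momentum2(path[-1]))
```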

Thus, M lies in the intersection of an ellipsoid and a sphere. In order to study the structure of the curves of intersection we will fix the ellipsoid E > 0 and change the radius M of the sphere (Figure 121).

Figure 121

Trajectories of Euler’s equation on an energy level surface

We assume that I₁ > I₂ > I₃. The semiaxes of the ellipsoid will be √(2EI₁) > √(2EI₂) > √(2EI₃). If the radius M of the sphere is less than the smallest semiaxis or larger than the largest (M < √(2EI₃) or M > √(2EI₁)), then the intersection is empty, and no actual motion corresponds to such values of E and M. If the radius of the sphere is equal to the smallest semiaxis, then the intersection consists of two points. Increasing the radius, so that √(2EI₃) < M < √(2EI₂), we get two curves around the ends of the smallest semiaxis. In exactly the same way, if the radius of the sphere is equal to the largest semiaxis we get its ends, and if it is a little smaller we get two closed curves close to the ends of the largest semiaxis. Finally, if M = √(2EI₂), the intersection consists of two circles.

Each of the six ends of the semiaxes of the ellipsoid is a separate trajectory of the Euler equations (2)—a stationary position of the vector M. It corresponds to a fixed value of the vector of angular velocity directed along one of the principal axes e_i; during such a motion, Ω remains collinear with M. Therefore, the vector of angular velocity retains its position ω in space collinear with m: the body simply rotates with fixed angular velocity around the principal axis of inertia e_i, which is stationary in space.

Definition. A motion of a body, under which its angular velocity remains constant (ω = const, Ω = const) is called a stationary rotation.

We have proved:

Theorem. A rigid body fixed at a point O admits a stationary rotation around any of the three principal axes e₁, e₂, and e₃.

If, as we assumed, I₁ > I₂ > I₃, then the right-hand side of the Euler equations does not become 0 anywhere else, i.e., there are no other stationary rotations.

We will now investigate the stability (in the sense of Liapunov) of solutions to the Euler equations.

Theorem. The stationary solutions M = M₁e₁ and M = M₃e₃ of the Euler equations corresponding to the largest and smallest principal axes are stable, while the solution corresponding to the middle axis (M = M₂e₂) is unstable.

Proof. For a small deviation of the initial condition from M₁e₁ or M₃e₃, the trajectory will be a small closed curve, while for a small deviation from M₂e₂ it will be a large one. ☐
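The same conclusion can be reached by a different route than the geometric argument above, namely by linearizing the Euler equations near a stationary solution. Near M = M_i e_i the other two components satisfy, to first order, δ̇_j = a_j M_i δ_k and δ̇_k = a_k M_i δ_j, hence δ̈_j = a_j a_k M_i² δ_j: the deviation oscillates when a_j a_k < 0 and grows exponentially when a_j a_k > 0. The sketch below simply evaluates this sign; the function name is hypothetical, and linearization is a swapped-in technique rather than the proof given in the text.

```python
def linearized_rate_squared(I, i, Mi=1.0):
    """Coefficient lambda^2 in delta'' = lambda^2 * delta near the stationary solution M = Mi * e_i.

    I is the triple (I1, I2, I3); i is the zero-based index of the principal axis.
    A negative value means small oscillations, a positive value means exponential growth.
    """
    a = [(I[1] - I[2]) / (I[1] * I[2]),
         (I[2] - I[0]) / (I[2] * I[0]),
         (I[0] - I[1]) / (I[0] * I[1])]
    j, k = (i + 1) % 3, (i + 2) % 3
    return a[j] * a[k] * Mi ** 2


if __name__ == "__main__":
    I = (3.0, 2.0, 1.0)                       # I1 > I2 > I3
    for axis in range(3):
        print(axis + 1, linearized_rate_squared(I, axis))
```

For I₁ > I₂ > I₃ the value is negative for the first and third axes and positive for the middle one, in agreement with the theorem.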

Problem. Are stationary rotations of the body around the largest and smallest principal axes Liapunov stable?

Answer. No.

C Poinsot’s description of the motion

It is easy to visualize the motion of the angular momentum and angular velocity vectors in a body (M and Ω): they are periodic if M ≠ √(2EI_i).

In order to see how a body rotates in space, we look at its inertia ellipsoid

E = {Ω : (AΩ, Ω) = 1} ⊂ K,

where A: Ω → M is the symmetric operator of inertia of the body fixed at O.

At every moment of time the ellipsoid E occupies a position B_tE in the stationary space k.

Theorem (Poinsot). The inertia ellipsoid rolls without slipping along a stationary plane perpendicular to the angular momentum vector m (Figure 122).

Figure 122

Rolling of the ellipsoid of inertia on the invariable plane

Proof. Consider a plane π perpendicular to the momentum vector m and tangent to the inertia ellipsoid B_tE. There are two such planes, and at the point of tangency the normal to the ellipsoid is parallel to m.

But the inertia ellipsoid E has normal grad(AΩ, Ω) = 2AΩ = 2M at the point Ω. Therefore, at the points ±ξ = ±ω/√(2T) of the ω axis, the normal to B_tE is collinear with m.

So the plane π is tangent to B_tE at the points ±ξ on the instantaneous axis of rotation. But the scalar product of ξ with the stationary vector m is equal to ±(m, ω)/√(2T) = ±√(2T), and is therefore constant. So the distance of the plane π from O does not change, i.e., π is stationary.

Since the point of tangency lies on the instantaneous axis of rotation, its velocity is equal to zero. This implies that the ellipsoid B_tE rolls without slipping along π. ☐

Translator’s remark: The plane π is sometimes called the invariable plane.

Corollary. Under initial conditions close to a stationary rotation around the large (or small) axis of inertia, the angular velocity always remains close to its initial position, not only in the body (Ω) but also in space (ω).

We now consider the trajectory of the point of tangency in the stationary plane π. When the point of tangency makes an entire revolution on the ellipsoid, the initial conditions are repeated except that the body has turned through some angle α around the m axis. The second revolution will be exactly like the first; if α = 2π(p/q), the motion is completely periodic; if the angle is not commensurable with 2π, the body will never return to its initial state.

In this case the trajectory of the point of tangency is dense in an annulus with center O′ in the plane (Figure 123).

Figure 123

Trajectory of the point of contact on the invariable plane

Problem. Show that the connected components of the invariant two-dimensional manifold V_c (Section 28B) in the six-dimensional space TSO(3) are tori, and that one can choose coordinates φ₁ and φ₂ mod 2π on them so that φ̇₁ = ω₁(c) and φ̇₂ = ω₂(c).

Hint. Take the phase of the periodic variation of M as φ₁.

We now look at the important special case when the inertia ellipsoid is an ellipsoid of revolution: I₂ = I₃ ≠ I₁.

In this case the axis of the ellipsoid B_te₁, the instantaneous axis of rotation ω, and the vector m always lie in one plane. The angles between them and the length of the vector ω are preserved; the axes of rotation (ω) and symmetry (B_te₁) sweep out cones around the angular momentum vector m with the same angular velocity (Figure 124). This motion around m is called precession.

Figure 124

Rolling of an ellipsoid of revolution on the invariable plane

Problem. Find the angular velocity of precession.

Answer. Decompose the angular velocity vector ω into components in the directions of the angular momentum vector m and the axis of the body B_te₁. The first component gives the angular velocity of precession, ω_pr = M/I₂.

Hint. Represent the motion of the body as the product of a rotation around the axis of momentum and a subsequent rotation around the axis of the body. The sum of the angular velocity vectors of these rotations is equal to the angular velocity vector of the product.

Remark. In the absence of outside forces, a rigid body fixed at a point O is represented by a lagrangian system whose configuration space is a group, namely SO(3), and the lagrangian function is invariant under left translations. One can show that a significant part of Euler’s theory of rigid body motion uses only this property and therefore holds for an arbitrary left-invariant lagrangian system on an arbitrary Lie group. In particular, by applying this theory to the group of volume-preserving diffeomorphisms of a domain D in a riemannian manifold, one can obtain the basic theorems of the hydrodynamics of an ideal fluid. (See Appendix 2.)

30 Lagrange’s top

We consider here the motion of an axially symmetric rigid body fixed at a stationary point in a uniform force field. This motion is composed of three periodic processes: rotation, precession, and nutation.

A Euler angles

Consider a rigid body fixed at a stationary point O and subject to the action of the gravitational force mg. The problem of the motion of such a “heavy rigid body” has not yet been solved in the general case and in some sense is unsolvable.

In this problem with three degrees of freedom, only two first integrals are known: the total energy E = T + U, and the projection M_z of the angular momentum on the vertical. There is an important special case in which the problem can be completely solved: the case of a symmetric top. A symmetric or lagrangian top is a rigid body fixed at a stationary point O whose inertia ellipsoid at O is an ellipsoid of revolution and whose center of gravity lies on the axis of symmetry e₃ (Figure 125). In this case, a rotation around the e₃ axis does not change the lagrangian function, and by Noether’s theorem there must exist a first integral in addition to E and M_z (as we will see, it turns out to be the projection M₃ of the angular momentum vector on the e₃ axis).

Figure 125

Lagrangian top

If we can introduce three coordinates so that the angles of rotation around the z axis and around the axis of the top are among them, then these coordinates will be cyclic, and the problem with three degrees of freedom will reduce to a problem with one degree of freedom (for the third coordinate).

Such a choice of coordinates on the configuration space SO(3) is possible; these coordinates φ, ψ, θ are called the Euler angles and form a local coordinate system in SO(3) similar to geographical coordinates on the sphere: they exclude the poles and are multiple-valued on one meridian.

We introduce the following notation (Figure 126):

Figure 126

Euler angles

e_x, e_y, and e_z	are the unit vectors of a right-handed cartesian stationary coordinate system at the stationary point O;
e₁, e₂, and e₃	are the unit vectors of a right-handed moving coordinate system connected to the body, directed along the principal axes at O;
I₁ = I₂ ≠ I₃	are the moments of inertia of the body at O;
e_N	is the unit vector of the axis [e_z, e₃], called the “line of nodes” (all vectors are in the “stationary space” k).

In order to carry the stationary frame (e_x, e_y, e_z) into the moving frame (e₁, e₂, e₃), we must perform three rotations:

Through an angle φ around the e_z axis. Under this rotation, e_z remains fixed, and e_x goes to e_N.

Through an angle θ around the e_N axis. Under this rotation, e_z goes to e₃, and e_N remains fixed.

Through an angle ψ around the e₃ axis. Under this rotation, e_N goes to e₁, and e₃ stays fixed.

After all three rotations, e_x has gone to e₁, and e_z to e₃; therefore, e_y goes to e₂.

The angles φ, ψ, and θ are called the Euler angles. It is easy to prove:

Theorem. To every triple of numbers φ, θ, ψ the construction above associates a rotation of three-dimensional space, B(φ, θ, ψ) ∈ SO(3), taking the frame (e_x, e_y, e_z) into the frame (e₁, e₂, e₃). In addition, the mapping (φ, θ, ψ) → B(φ, θ, ψ) gives local coordinates

0 < φ < 2π,  0 < ψ < 2π,  0 < θ < π

on SO(3), the configuration space of the top. Like geographical longitude, φ and ψ can be considered as angles mod 2π; for θ = 0 or θ = π the map (φ, θ, ψ) → B has a pole-type singularity.
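The construction can be made concrete with rotation matrices. The sketch below assumes the stationary frame e_x, e_y, e_z is the standard basis and represents the three successive rotations (about e_z, then about the line of nodes, then about e₃) by the product R_z(φ)R_x(θ)R_z(ψ), which is the usual matrix form of such a sequence of rotations about the moved axes. The columns of the result are then e₁, e₂, e₃ written in the stationary frame. Function names are illustrative.

```python
import math


def rot_z(t):
    """Rotation through the angle t around the third coordinate axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(t):
    """Rotation through the angle t around the first coordinate axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_rotation(phi, theta, psi):
    """B(phi, theta, psi) = R_z(phi) R_x(theta) R_z(psi); its columns are e1, e2, e3."""
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))


if __name__ == "__main__":
    B = euler_rotation(0.3, 1.1, -0.7)
    e3 = [B[i][2] for i in range(3)]
    print(e3, math.cos(1.1))   # the third component of e3 is cos(theta)
```

With this convention the vector product [e_z, e₃] is sin θ times (cos φ, sin φ, 0), so the line of nodes is indeed the image of e_x under the first rotation.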

B Calculation of the lagrangian function

We will express the lagrangian function in terms of the coordinates φ, θ, ψ and their derivatives.

The potential energy, clearly, is equal to

U = ∭ zg dm = mgz₀ = mgl cos θ,

where z₀ is the height of the center of gravity above O and l is its distance from O (Figure 125).

We now calculate the kinetic energy. A small trick is useful here: we consider the particular case when φ = ψ = 0.

Lemma. The angular velocity of a top is expressed in terms of the derivatives of the Euler angles by the formula

ω = θ̇e₁ + (φ̇ sin θ)e₂ + (ψ̇ + φ̇ cos θ)e₃,

if φ = ψ = 0.

Proof. We look at the velocity of a point of the top occupying the position r at time t. After time dt this point takes the position (within (dt)²)

B(φ + dφ, θ + dθ, ψ + dψ)B⁻¹(φ, θ, ψ)r,

where dφ = φ̇ dt, dθ = θ̇ dt, dψ = ψ̇ dt.

Consequently, to the same accuracy the displacement vector is the sum of the three terms

B(φ + dφ, θ, ψ)B⁻¹(φ, θ, ψ)r − r = [ω_φ, r] dt,
B(φ, θ + dθ, ψ)B⁻¹(φ, θ, ψ)r − r = [ω_θ, r] dt,
B(φ, θ, ψ + dψ)B⁻¹(φ, θ, ψ)r − r = [ω_ψ, r] dt

(the angular velocities ω_φ, ω_θ, and ω_ψ are defined by these formulas).

Therefore, the velocity of the point r is v = [ω_φ + ω_θ + ω_ψ, r], so the angular velocity of the body is

ω = ω_φ + ω_θ + ω_ψ,

where the terms are defined by the formulas above.

It remains to decompose the vectors ω_φ, ω_θ, and ω_ψ with respect to e₁, e₂, and e₃. We have not yet used the fact that φ = ψ = 0. If φ = ψ = 0, then

B(φ + dφ, θ, ψ)B⁻¹(φ, θ, ψ)

is simply a rotation around the axis e_z through an angle dφ, so

ω_φ = φ̇e_z.

Furthermore, B(φ, θ + dθ, ψ)B⁻¹(φ, θ, ψ) is simply a rotation around the axis e_N = e_x = e₁ through an angle dθ in the case φ = ψ = 0, so

ω_θ = θ̇e₁.

Finally, B(φ, θ, ψ + dψ)B⁻¹(φ, θ, ψ) is a rotation through an angle dψ around the axis e₃, so

ω_ψ = ψ̇e₃.

In short, for φ = ψ = 0 we have

ω = φ̇e_z + θ̇e₁ + ψ̇e₃.

But, clearly, for φ = ψ = 0

e_z = e₃ cos θ + e₂ sin θ.

So the components of the angular velocity along the principal axes e₁, e₂, and e₃ are

ω₁ = θ̇,  ω₂ = φ̇ sin θ,  ω₃ = ψ̇ + φ̇ cos θ. □

Since T = ½(I₁ω₁² + I₂ω₂² + I₃ω₃²) and I₁ = I₂, the kinetic energy for φ = ψ = 0 is given by the formula

T = (I₁/2)(θ̇² + φ̇² sin²θ) + (I₃/2)(ψ̇ + φ̇ cos θ)².

But the kinetic energy cannot depend on φ and ψ: these are cyclic coordinates, and by a choice of origin of reference for φ and ψ which does not change T we can always make φ = 0 and ψ = 0. Thus the formula we got for the kinetic energy is true for all φ and ψ.

In this way we obtain the lagrangian function

L = (I₁/2)(θ̇² + φ̇² sin²θ) + (I₃/2)(ψ̇ + φ̇ cos θ)² − mgl cos θ.

C Investigation of the motion

To the cyclic coordinates φ and ψ there correspond the first integrals

∂L/∂φ̇ = M_z = φ̇(I₁ sin²θ + I₃ cos²θ) + ψ̇I₃ cos θ,
∂L/∂ψ̇ = M₃ = φ̇I₃ cos θ + ψ̇I₃.

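Theorem. The inclination θ of the axis of the top to the vertical changes with time in the same way as in the one-dimensional system with energy

E′ = (I₁/2)θ̇² + U_eff(θ),

where the effective potential energy is given by the formula

U_eff = (M_z − M₃ cos θ)²/(2I₁ sin²θ) + mgl cos θ.

Proof. Following the general theory, we express φ̇ and ψ̇ in terms of M₃ and M_z. We get the total energy of the system as

E = (I₁/2)θ̇² + M₃²/(2I₃) + mgl cos θ + (M_z − M₃ cos θ)²/(2I₁ sin²θ)

and

φ̇ = (M_z − M₃ cos θ)/(I₁ sin²θ).

The number M₃²/(2I₃) = E − E′, independent of θ, does not affect the equation for θ. ☐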

In order to study the one-dimensional system above it is convenient to make the substitution cos θ= u (−1 ≤ u ≤ 1).

We also write

M_z/I₁ = a,  M₃/I₁ = b,  2E′/I₁ = α,  2mgl/I₁ = β > 0.

Then we can rewrite the law of conservation of energy E′ as

u̇² = f(u),

where f(u) = (α − βu)(1 − u²) − (a − bu)², and the law of variation of the azimuth φ as

φ̇ = (a − bu)/(1 − u²).

We notice that f(u) is a polynomial of degree 3, f(+∞) = +∞, and f(±1) = −(a ∓ b)² < 0 if a ≠ ±b. On the other hand, actual motions correspond to constants a, b, α, and β for which f(u) ≥ 0 for some −1 ≤ u ≤ 1. Thus f(u) has exactly two real roots u₁ and u₂ on the interval −1 ≤ u ≤ 1 (and one for u > 1, Figure 127). Therefore, the inclination θ of the axis of the top changes periodically between two limit values θ₁ and θ₂ (Figure 128). This periodic change in inclination is called nutation.
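The limits of nutation u₁ and u₂ can be located numerically once the constants are fixed. The sketch below uses plain bisection on f(u) = (α − βu)(1 − u²) − (a − bu)². It assumes that f(±1) < 0 and that the caller supplies a point `u_inside` in (−1, 1) where f is positive; the function names and that extra argument are illustrative, not notation from the text.

```python
def f(u, a, b, alpha, beta):
    """The cubic f(u) = (alpha - beta*u)(1 - u^2) - (a - b*u)^2."""
    return (alpha - beta * u) * (1.0 - u * u) - (a - b * u) ** 2


def bisect_root(g, lo, hi, tol=1e-12):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo_positive = g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == g_lo_positive:
            lo = mid          # the sign change is still to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nutation_limits(a, b, alpha, beta, u_inside):
    """Roots u1 < u_inside < u2 of f on (-1, 1), assuming f(u_inside) > 0 and f(+-1) < 0."""
    g = lambda u: f(u, a, b, alpha, beta)
    return bisect_root(g, -1.0, u_inside), bisect_root(g, u_inside, 1.0)


if __name__ == "__main__":
    u1, u2 = nutation_limits(1.0, 2.0, 3.0, 1.0, 0.5)
    print(u1, u2)             # cos(theta) oscillates between these two values
```

Since θ = arccos u is decreasing, the larger root corresponds to the smaller inclination.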

Figure 127

Graph of the function f(u)

Figure 128

Path of the top’s axis on the unit sphere

We now consider the motion of the azimuth of the axis of the top. The point of intersection of the axis with the unit sphere moves in the ring between the parallels θ₁ and θ₂. The variation of the azimuth of the axis is determined by the equation

φ̇ = (a − bu)/(1 − u²).

If the root u′ of the equation a = bu lies outside of (u₁, u₂), then the angle φ varies monotonically and the axis traces a curve like a sinusoid on the unit sphere (Figure 128(a)). If the root u′ of the equation a = bu lies inside (u₁, u₂), then the rate of change of φ is in opposite directions on the parallels θ₁ and θ₂, and the axis traces a looping curve in the sphere (Figure 128(b)).

If the root u′ of a = bu lies on the boundary (e.g., u′ = u₂), then the axis traces a curve with cusps (Figure 128(c)).

The last case, although exceptional, is observed every time we release the axis of a top launched at inclination θ₂ without initial velocity; the top first falls, but then rises again.

The azimuthal motion of the top is called precession. The complete motion of the top consists of rotation around its own axis, nutation, and precession. Each of the three motions has its own frequency. If the frequencies are incommensurable, the top never returns to its initial position, although it approaches it arbitrarily closely.

31 Sleeping tops and fast tops

The formulas obtained in Section 30 reduce the solution of the equations of motion of a top to elliptic integrals. However, qualitative information about the motion is usually easy to obtain without turning to quadrature.

In this paragraph we investigate the stability of a vertical top and give approximate formulas for the motion of a rapidly spinning top.

A Sleeping tops

We consider first the particular solution of the equations of motion in which the axis of the top is always vertical (θ = 0) and the angular velocity is constant (a “sleeping” top). In this case, clearly, M_z = M₃ = I₃ω₃ (Figure 129).

Figure 129

Sleeping top

Problem. Show that a stationary rotation around the vertical axis is always Liapunov unstable.

We will look at the motion of the axis of the top, and not of the top itself. Will the axis of the top stably remain close to the vertical, i.e., will θ remain small? Expressing the effective potential energy of the system

U_eff = (M_z − M₃ cos θ)²/(2I₁ sin²θ) + mgl cos θ

as a power series in θ, we find

U_eff = C + Aθ² + ⋯,  A = ω₃²I₃²/(8I₁) − mgl/2.

If A > 0, the equilibrium position θ = 0 of the one-dimensional system is stable, and if A < 0 it is unstable. Thus, the condition for stability has the form

ω₃² > 4mglI₁/I₃².

When friction reduces the velocity of a sleeping top to below this limit, the top wakes up.
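The threshold can be explored numerically. The sketch below assumes the effective potential U_eff = (M_z − M₃ cos θ)²/(2I₁ sin²θ) + mgl cos θ with M_z = M₃ = I₃ω₃, whose expansion at θ = 0 is C + Aθ² + ⋯ with A = ω₃²I₃²/(8I₁) − mgl/2, so that A changes sign at ω₃² = 4mglI₁/I₃². The function names and the single parameter `mgl` (the product of mass, gravity, and the distance l) are conventions chosen for the example.

```python
import math


def u_eff(theta, I1, I3, omega3, mgl):
    """Effective potential of the vertical top, with M_z = M_3 = I3 * omega3."""
    M = I3 * omega3
    return (M - M * math.cos(theta)) ** 2 / (2.0 * I1 * math.sin(theta) ** 2) + mgl * math.cos(theta)


def sleeping_top_A(I1, I3, omega3, mgl):
    """Coefficient A of theta^2 in the expansion of u_eff at theta = 0."""
    return omega3 ** 2 * I3 ** 2 / (8.0 * I1) - mgl / 2.0


def critical_spin(I1, I3, mgl):
    """Spin omega3 at which A changes sign: omega3^2 = 4 * mgl * I1 / I3^2."""
    return math.sqrt(4.0 * mgl * I1) / I3


if __name__ == "__main__":
    I1, I3, mgl = 2.0, 1.0, 3.0
    wc = critical_spin(I1, I3, mgl)
    for w in (0.9 * wc, 1.1 * wc):
        print(w, sleeping_top_A(I1, I3, w, mgl))   # negative below the threshold, positive above
```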

Problem. Show that, for ω₃² > 4mglI₁/I₃², the axis of a sleeping top is stable with respect to perturbations which change the values of M_z and M₃, as well as θ.

B Fast tops

A top is called fast if the kinetic energy of its rotation is large in comparison with its potential energy:

½I₃ω₃² ≫ mgl.

It is clear from a similarity argument that multiplying the angular velocity by N is exactly equivalent to dividing the weight by N².

Theorem. If, while the initial position of a top is preserved, the angular velocity is multiplied by N, then the trajectory of the top will be exactly the same as if the angular velocity remained as it was and the acceleration of gravity g were divided by N². In the case of large angular velocity the trajectory clearly goes N times faster.⁵¹

In this way we can study the case g → 0 and apply the results to study the case ω → ∞.

To begin, we consider the case g = 0, i.e., the motion of a symmetric top in the absence of gravity. We compare two descriptions of this motion: Lagrange’s (Section 30C) and Poinsot’s (Section 29C).

We first consider Lagrange’s equation for the variation of the angle of inclination θ of the top’s axis.

Lemma. In the absence of gravity, the angle θ₀ satisfying M_z = M₃ cos θ₀ is a stable equilibrium position of the equation of motion of the top’s axis. The frequency of small oscillations of θ near this equilibrium position is equal to

ω_nut = I₃ω₃/I₁.

Proof. In the absence of gravity the effective potential energy reduces to

U_eff = (M_z − M₃ cos θ)²/(2I₁ sin²θ).

This nonnegative function has the minimum value of zero for the angle θ = θ₀ determined by the condition M_z = M₃ cos θ₀ (Figure 130). Thus, the angle of inclination θ₀ of the top’s axis to the vertical is stably stationary: for small deviations of the initial angle θ from θ₀, there will be periodic oscillations of θ near θ₀ (nutation). The frequency of these oscillations is easily determined by the following general formula: the frequency ω of small oscillations in a one-dimensional system with energy

E = aẋ²/2 + U(x),  U(x₀) = min U(x),

is given (Section 22D) by the formula

ω² = U″(x₀)/a.

The energy of the one-dimensional system describing oscillations of the inclination of the top’s axis is

(I₁/2)θ̇² + U_eff.

For θ = θ₀ + x we find

M_z − M₃ cos θ = M₃(cos θ₀ − cos(θ₀ + x)) = M₃x sin θ₀ + O(x²),
U_eff = M₃²x² sin²θ₀/(2I₁ sin²θ₀) + ⋯ = (I₃²ω₃²/(2I₁))x² + ⋯,

from which we obtain the expression for the frequency of nutation

ω_nut = I₃ω₃/I₁. □

Figure 130
Effective potential energy of a top

From the formula

φ̇ = (M_z − M₃ cos θ)/(I₁ sin²θ)

it is clear that, for θ = θ₀, the azimuth of the axis does not change with time: the axis is stationary. The azimuthal motion of the axis under small deviations of θ from θ₀ could also be studied with the help of this formula, but we will deal with it differently.

The motion of a top in the absence of gravity can be considered in Poinsot’s description. Then the axis of the top rotates uniformly around the angular momentum vector, preserving its position in space. Thus, the axis of the top describes a circle on the sphere whose center corresponds to the angular momentum vector (Figure 131).

Figure 131

Comparison of the descriptions of the motion of a top according to Lagrange and Poinsot

Remark. Now the motion of the top’s axis, which according to Lagrange was called nutation, is called precession in Poinsot’s description of motion.

This means that the formula obtained above for the frequency of a small nutation, ω_nut = I₃ω₃/I₁, agrees with the formula for the frequency of precession ω = M/I₁ in Poinsot’s description: when the amplitude of nutation approaches zero, I₃ω₃ →M.

C A top in a weak field

We go now to the case when the force of gravity is not absent, but is very small (the values of M_z and M₃ are fixed). In this case a term mgl cos θ, small together with its derivatives, is added to the effective potential energy. We will show that this term slightly changes the frequency of nutation.

Lemma. Suppose that the function f(x) has a minimum at x = 0 and Taylor expansion f(x) = Ax²/2 + ⋯, A > 0. Suppose that the function h(x) has Taylor expansion h(x) = B + Cx + ⋯. Then, for sufficiently small ε, the function f_ε(x) = f(x) + εh(x) has a minimum at the point (Figure 132)

x_ε = −Cε/A + O(ε²),

which is close to zero. In addition, f_ε″(x_ε) = A + O(ε).

Proof. We have f_ε′(x) = Ax + Cε + O(x²) + O(εx), and the result is obtained by applying the implicit function theorem to f_ε′(x). ☐
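A small numerical experiment illustrates the lemma. The sketch below takes the first-order prediction for the displaced minimum to be −Cε/A and compares it with the critical point found by Newton's method applied to f_ε′. The test functions f(x) = Ax²/2 + x³ and h(x) = B + Cx + Dx², as well as all function names, are assumptions chosen only for the illustration.

```python
def displaced_minimum(A, C, eps):
    """First-order prediction for the minimum of f + eps*h, with f ~ A x^2/2 and h ~ B + C x."""
    return -C * eps / A


def argmin_newton(df, d2f, x0=0.0, iters=50):
    """Newton iteration for a zero of the derivative df, started at x0."""
    x = x0
    for _ in range(iters):
        x -= df(x) / d2f(x)
    return x


if __name__ == "__main__":
    A, C, D, eps = 2.0, 1.5, 0.5, 1e-3
    # f(x) = A x^2/2 + x^3 and h(x) = B + C x + D x^2 (B does not affect the minimum)
    df = lambda x: A * x + 3 * x * x + eps * (C + 2 * D * x)
    d2f = lambda x: A + 6 * x + eps * 2 * D
    x_num = argmin_newton(df, d2f)
    print(x_num, displaced_minimum(A, C, eps))   # the two values should differ by O(eps^2)
```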

Figure 132
Displacement of the minimum under a small change of the function

By the lemma, the effective potential energy for small g has a minimum θ_g close to θ₀, and at this point U″ differs slightly from U″(θ₀). Therefore, the frequency of a small nutation near θ₀ is close to that obtained for g = 0:

ω_nut = I₃ω₃/I₁ + O(g).

D A rapidly thrown top

We now consider the special initial conditions when we release the axis of the top without an initial push from a position with inclination θ₀ to the vertical.

Theorem. If the axis of the top is stationary at the initial moment (φ̇ = θ̇ = 0) and the top is rotating rapidly around its axis (ω₃ → ∞), which is inclined from the vertical with angle θ₀ (M_z = M₃ cos θ₀), then asymptotically, as ω₃ → ∞,

the nutation frequency is proportional to the angular velocity;

the amplitude of nutation is inversely proportional to the square of the angular velocity;

the frequency of precession is inversely proportional to the angular velocity;

the following asymptotic formulas hold (as ω₃ → ∞):

ω_nut ~ (I₃/I₁)ω₃,  a_nut ~ I₁mgl sin θ₀/(I₃²ω₃²),  ω_prec ~ mgl/(I₃ω₃)

(here f(ω₃) ~ g(ω₃) if lim_{ω₃ → ∞} (f/g) = 1).

For the proof, we look at the case when the initial angular velocity is fixed, but g → 0. Then by interpreting the formulas with the aid of a similarity argument (cf. Section B), we obtain the theorem.

We already know from Section 30C that under our initial conditions the axis of the top traces a curve with cusps on the sphere.

We apply the lemma to locate the minimum point θ_g of the effective potential energy. We set (Figure 133)

θ = θ₀ + x,  cos θ = cos θ₀ − x sin θ₀ + ⋯.

Then we obtain, as above, the Taylor expansion in x at θ₀

U_eff|_{g = 0} = (I₃²ω₃²/(2I₁))x² + ⋯,  mgl cos θ = mgl cos θ₀ − xmgl sin θ₀ + ⋯.

Applying the lemma to f = U_eff|_{g = 0}, g = ε, h = ml cos(θ₀ + x), we find that the minimum of the effective potential energy U_eff is attained at angle of inclination

θ_g = θ₀ + x_g,  x_g = (I₁ml sin θ₀/(I₃²ω₃²))g + O(g²).

Thus the inclination θ of the top’s axis will oscillate near θ_g (Figure 134). But, at the initial moment, θ = θ₀ and θ̇ = 0. This means that θ₀ corresponds to the highest position of the axis of the top. Thus, for small g, the amplitude of nutation is asymptotically equal to

a_nut ~ x_g ~ (I₁ml sin θ₀/(I₃²ω₃²))g  (g → 0).

We now find the precessional motion of the axis. From the general formula

φ̇ = (M_z − M₃ cos θ)/(I₁ sin²θ)

for M_z = M₃ cos θ₀ and θ = θ₀ + x, we find that M_z − M₃ cos θ = M₃x sin θ₀ + ⋯; so

φ̇ = (M₃/(I₁ sin θ₀))x + ⋯.

But x oscillates harmonically between 0 and 2x_g (up to O(g²)). Therefore, the average value of the velocity of precession over the period of nutation is asymptotically equal to

(M₃/(I₁ sin θ₀))x_g = mgl/(I₃ω₃)  (g → 0).
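The asymptotic value of the mean precession rate can be compared with a direct numerical integration. The sketch below assumes the reduced equations I₁θ̈ = −dU_eff/dθ and φ̇ = (M_z − M₃ cos θ)/(I₁ sin²θ), with U_eff = (M_z − M₃ cos θ)²/(2I₁ sin²θ) + mgl cos θ, and starts from θ = θ₀, θ̇ = 0 with M_z = M₃ cos θ₀. The integrator (classical Runge-Kutta), the step size, and the function name are illustrative choices; for a fast top the returned average should be close to mgl/(I₃ω₃), though not equal to it.

```python
import math


def mean_precession(I1, I3, mgl, omega3, theta0, t_end, h=1e-3):
    """Average of d(phi)/dt over [0, t_end] for a top released at inclination theta0.

    mgl stands for the product m*g*l; omega3 is the spin, so M3 = I3 * omega3.
    """
    M3 = I3 * omega3
    Mz = M3 * math.cos(theta0)

    def rhs(state):
        theta, dtheta, _phi = state
        s, c = math.sin(theta), math.cos(theta)
        d = Mz - M3 * c
        # derivative of U_eff with respect to theta
        dU = d * M3 / (I1 * s) - d * d * c / (I1 * s ** 3) - mgl * s
        return (dtheta, -dU / I1, d / (I1 * s * s))

    state = (theta0, 0.0, 0.0)
    for _ in range(int(round(t_end / h))):
        k1 = rhs(state)
        k2 = rhs(tuple(state[i] + 0.5 * h * k1[i] for i in range(3)))
        k3 = rhs(tuple(state[i] + 0.5 * h * k2[i] for i in range(3)))
        k4 = rhs(tuple(state[i] + h * k3[i] for i in range(3)))
        state = tuple(state[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(3))
    return state[2] / t_end


if __name__ == "__main__":
    I1, I3, mgl, omega3 = 1.0, 1.0, 1.0, 20.0
    print(mean_precession(I1, I3, mgl, omega3, 1.0, 10.0), mgl / (I3 * omega3))
```

Averaging over a time that is not a whole number of nutation periods introduces a small extra discrepancy, which shrinks as `t_end` grows.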

Figure 133
Definition of the amplitude of nutation

Figure 134
Motion of a top’s axis

Problem. Show that

lim_{t → ∞} lim_{g → 0} (φ(t) − φ(0))/(tg) = ml/(I₃ω₃).

Translator’s note. The word overload is the literal translation of the Russian term peregruzka. There does not seem to be an English term for this particular kind of inertial force.

⁴⁸

Strictly speaking, the configuration space of a rigid body is ℝ³ × O(3), and ℝ³ × SO(3) is only one of the two connected components of this manifold, corresponding to the orientation of the body.

⁴⁹

The following assertions are easy to prove:

1. Let f₁,..., f_k : M → ℝ be functions on an oriented manifold M. Consider the set V given by the equations f₁ = c₁,..., f_k = c_k. Assume that the gradients of f₁,..., f_k are linearly independent at each point. Then V is orientable.

2. The direct product of orientable manifolds is orientable.

3. The tangent bundle TSO(3) is the direct product ℝ³ × SO(3). A manifold whose tangent bundle is a direct product is called parallelizable. The group SO(3) (like every Lie group) is parallelizable.

4. A parallelizable manifold is orientable.

It follows from assertions 1–4 that SO(3), TSO(3), and V_c are orientable.

⁵⁰

Often called the inertia tensor (translator’s note).

⁵¹

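Denote by φ_g(t, ξ) the position of the top at time t with initial condition ξ ∈ TSO(3) and gravitational acceleration g. Then the theorem says that

φ_g(t, Nξ) = φ_{g/N²}(Nt, ξ),

where Nξ denotes the initial condition with the same position as ξ and velocity multiplied by N.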

Part III
Hamiltonian Mechanics

Hamiltonian mechanics is geometry in phase space. Phase space has the structure of a symplectic manifold. The group of symplectic diffeomorphisms acts on phase space. The basic concepts and theorems of hamiltonian mechanics (even when formulated in terms of local symplectic coordinates) are invariant under this group (and under the larger group of transformations which also transform time).

A hamiltonian mechanical system is given by an even-dimensional manifold (the “phase space”), a symplectic structure on it (the “Poincaré integral invariant”) and a function on it (the “hamiltonian function”). Every one-parameter group of symplectic diffeomorphisms of the phase space preserving the hamiltonian function is associated to a first integral of the equations of motion.

Lagrangian mechanics is contained in hamiltonian mechanics as a special case (the phase space in this case is the cotangent bundle of the configuration space, and the hamiltonian function is the Legendre transform of the lagrangian function).

The hamiltonian point of view allows us to solve completely a series of mechanical problems which do not yield solutions by other means (for example, the problem of attraction by two stationary centers and the problem of geodesics on the triaxial ellipsoid). The hamiltonian point of view has even greater value for the approximate methods of perturbation theory (celestial mechanics), for understanding the general character of motion in complicated mechanical systems (ergodic theory, statistical mechanics) and in connection with other areas of mathematical physics (optics, quantum mechanics, etc.).

7
Differential forms

Exterior differential forms arise when concepts such as the work of a field along a path and the flux of a fluid through a surface are generalized to higher dimensions.

Hamiltonian mechanics cannot be understood without differential forms. The information we need about differential forms involves exterior multiplication, exterior differentiation, integration, and Stokes’ formula.

32 Exterior forms

Here we define exterior algebraic forms.

A 1-forms

Let ℝⁿ be an n-dimensional real vector space.⁵² We will denote vectors in this space by ξ, η,....

Definition. A form of degree 1 (or a 1-form) is a linear function ω: ℝⁿ → ℝ, i.e.,

ω(λ₁ξ₁ + λ₂ξ₂) = λ₁ω(ξ₁) + λ₂ω(ξ₂) for all λ₁, λ₂ ∈ ℝ and ξ₁, ξ₂ ∈ ℝⁿ.

We recall the basic facts about 1-forms from linear algebra. The set of all 1-forms becomes a real vector space if we define the sum of two forms by

(ω₁ + ω₂)(ξ) = ω₁(ξ) + ω₂(ξ)

and scalar multiplication by

(λω)(ξ) = λω(ξ).

The space of 1-forms on ℝⁿ is itself n-dimensional, and is also called the dual space (ℝⁿ)*.

Suppose that we have chosen a linear coordinate system x₁,..., x_n on ℝⁿ. Each coordinate x_i is itself a 1-form. These n 1-forms are linearly independent. Therefore, every 1-form ω has the form

ω = a₁x₁ + ⋯ + a_nx_n, a_i ∈ ℝ.

The value of ω on a vector ξ is equal to

ω(ξ) = a₁x₁(ξ) + ⋯ + a_nx_n(ξ),

where x₁(ξ),..., x_n(ξ) are the components of ξ in the chosen coordinate system.

Example. If a uniform force field F is given on euclidean ℝ³, its work A on the displacement ξ is a 1-form acting on ξ (Figure 135).

Figure 135
The work of a force is a 1-form acting on the displacement.

B 2-forms

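Definition. An exterior form of degree 2 (or a 2-form) is a function on pairs of vectors ω²: ℝⁿ × ℝⁿ → ℝ, which is bilinear and skew symmetric:

ω²(λ₁ξ₁ + λ₂ξ₂, ξ₃) = λ₁ω²(ξ₁, ξ₃) + λ₂ω²(ξ₂, ξ₃),
ω²(ξ₁, ξ₂) = −ω²(ξ₂, ξ₁),

for all λ₁, λ₂ ∈ ℝ and ξ₁, ξ₂, ξ₃ ∈ ℝⁿ.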

Example 1. Let S(ξ₁, ξ₂) be the oriented area of the parallelogram constructed on the vectors ξ₁ and ξ₂ of the oriented euclidean plane ℝ², i.e.,

S(ξ₁, ξ₂) = ξ₁₁ξ₂₂ − ξ₁₂ξ₂₁, where ξ₁ = ξ₁₁e₁ + ξ₁₂e₂ and ξ₂ = ξ₂₁e₁ + ξ₂₂e₂,

with e₁, e₂ a basis giving the orientation on ℝ².

It is easy to see that S(ξ₁, ξ₂) is a 2-form (Figure 136).

Figure 136
Oriented area is a 2-form.

Example 2. Let v be a uniform velocity vector field for a fluid in three-dimensional oriented euclidean space (Figure 137). Then the flux of the fluid over the area of the parallelogram ξ₁, ξ₂ is a bilinear skew symmetric function of ξ₁ and ξ₂, i.e., a 2-form defined by the triple scalar product

ω²(ξ₁, ξ₂) = (v, ξ₁, ξ₂).

Figure 137
Flux of a fluid through a surface is a 2-form.

Example 3. The oriented area of the projection of the parallelogram with sides ξ₁ and ξ₂ on the x₁, x₂-plane in euclidean ℝ³ is a 2-form.

Problem 1. Show that for every 2-form ω² on ℝⁿ we have

ω²(ξ, ξ) = 0 for all ξ ∈ ℝⁿ.

Solution. By skew symmetry, ω²(ξ, ξ) = −ω²(ξ, ξ).

The set of all 2-forms on ℝⁿ becomes a real vector space if we define the addition of forms by the formula

(ω₁ + ω₂)(ξ₁, ξ₂) = ω₁(ξ₁, ξ₂) + ω₂(ξ₁, ξ₂)

and multiplication by scalars by the formula

(λω)(ξ₁, ξ₂) = λω(ξ₁, ξ₂).

Problem 2. Show that this space is finite-dimensional, and find its dimension.

Answer. n(n − 1)/2; a basis is shown below.

C k-forms

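Definition. An exterior form of degree k, or a k-form, is a function of k vectors which is k-linear and antisymmetric:

ω(λ₁ξ₁′ + λ₂ξ₁″, ξ₂,..., ξ_k) = λ₁ω(ξ₁′, ξ₂,..., ξ_k) + λ₂ω(ξ₁″, ξ₂,..., ξ_k),
ω(ξ_{i₁},..., ξ_{i_k}) = (−1)^ν ω(ξ₁,..., ξ_k),

where ν = 0 if the permutation i₁,..., i_k is even and ν = 1 if it is odd.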

Example 1. The oriented volume of the parallelepiped with edges ξ₁,..., ξ_n in oriented euclidean space ℝⁿ is an n-form (Figure 138):

V(ξ₁,..., ξ_n) = det |ξ_ij|,

where ξ_i = ξ_i1e₁ + ⋯ + ξ_ine_n and e₁,..., e_n are a basis of ℝⁿ.

Figure 138
Oriented volume is a 3-form.

Example 2. Let ℝ^k be an oriented k-plane in n-dimensional euclidean space ℝⁿ. Then the k-dimensional oriented volume of the projection of the parallelepiped with edges ξ₁, ξ₂,..., ξ_k ∈ ℝⁿ onto ℝ^k is a k-form on ℝⁿ.

The set of all k-forms on ℝⁿ forms a real vector space if we introduce operations of addition

(ω₁ + ω₂)(ξ₁,..., ξ_k) = ω₁(ξ₁,..., ξ_k) + ω₂(ξ₁,..., ξ_k)

and multiplication by scalars

(λω)(ξ₁,..., ξ_k) = λω(ξ₁,..., ξ_k).

Problem 3. Show that this vector space is finite-dimensional and find its dimension.

Answer. The binomial coefficient C_n^k = n!/(k!(n − k)!); a basis is shown below.

D The exterior product of two 1-forms

We now introduce one more operation: exterior multiplication of forms. If ω^k is a k-form and ω^l is an l-form on ℝⁿ, then their exterior product ω^k ⋀ ω^l will be a k + l-form. We first define the exterior product of 1-forms, which associates to every pair of 1-forms ω₁, ω₂ on ℝⁿ a 2-form ω₁ ⋀ ω₂ on ℝⁿ.

Let ξ be a vector in ℝⁿ. Given two 1-forms ω₁ and ω₂, we can define a mapping ω: ℝⁿ → ℝ × ℝ of ℝⁿ to the plane ℝ × ℝ by associating to ξ ∈ ℝⁿ the vector ω(ξ) with components ω₁(ξ) and ω₂(ξ) in the plane with coordinates ω₁, ω₂ (Figure 139).

Figure 139

Definition of the exterior product of two 1-forms

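Definition. The value of the exterior product ω₁ ⋀ ω₂ on the pair of vectors ξ₁, ξ₂ ∈ ℝⁿ is the oriented area of the image of the parallelogram with sides ω(ξ₁) and ω(ξ₂) on the ω₁, ω₂-plane:

(ω₁ ⋀ ω₂)(ξ₁, ξ₂) = ω₁(ξ₁)ω₂(ξ₂) − ω₂(ξ₁)ω₁(ξ₂),

i.e., the determinant of the 2 × 2 matrix with rows (ω₁(ξ₁), ω₂(ξ₁)) and (ω₁(ξ₂), ω₂(ξ₂)).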

Problem 4. Show that ω₁ ⋀ ω₂ really is a 2-form.

Problem 5. Show that the mapping (ω₁, ω₂) → ω₁ ⋀ ω₂ is bilinear and skew symmetric:

ω₁ ⋀ ω₂ = −ω₂ ⋀ ω₁,
(λ′ω₁′ + λ″ω₁″) ⋀ ω₂ = λ′ω₁′ ⋀ ω₂ + λ″ω₁″ ⋀ ω₂.

Hint. The determinant is bilinear and skew-symmetric not only with respect to rows, but also with respect to columns.
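
The definition of the exterior product of two 1-forms can be tried out numerically. The following is a minimal sketch, assuming that a 1-form on ℝⁿ is stored by its coefficients in the coordinates x₁,..., x_n; the helper names one_form and wedge2 are hypothetical and are not part of the text.

```python
def one_form(coeffs):
    """Return the 1-form xi -> a_1*xi_1 + ... + a_n*xi_n (illustrative helper)."""
    return lambda xi: sum(a * x for a, x in zip(coeffs, xi))

def wedge2(w1, w2):
    """Exterior product of two 1-forms: the 2 x 2 determinant
    w1(xi1)*w2(xi2) - w2(xi1)*w1(xi2)."""
    return lambda xi1, xi2: w1(xi1) * w2(xi2) - w2(xi1) * w1(xi2)

# Basic forms x1 and x2 on R^3; x1 ^ x2 is the oriented area of the
# projection onto the (x1, x2)-plane.
x1 = one_form((1, 0, 0))
x2 = one_form((0, 1, 0))
area12 = wedge2(x1, x2)

xi = (1, 2, 5)
eta = (3, 4, -7)
print(area12(xi, eta))   # 1*4 - 2*3 = -2
print(area12(eta, xi))   # 2, the sign changes when the vectors are swapped
print(area12(xi, xi))    # 0
```

The third components of ξ and η do not affect the result, in agreement with the description of x₁ ⋀ x₂ as the area of a projection.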

Now suppose we have chosen a system of linear coordinates on ℝⁿ, i.e., we are given n independent 1-forms x₁,..., x_n. We will call these forms basic.

The exterior products of the basic forms are the 2-forms x_i ⋀ x_j. By skew-symmetry, x_i ⋀ x_i = 0 and x_i ⋀ x_j = −x_j ⋀ x_i. The geometric meaning of the form x_i ⋀ x_j is very simple: its value on the pair of vectors ξ₁, ξ₂ is equal to the oriented area of the image of the parallelogram ξ₁, ξ₂ on the coordinate plane x_i, x_j under the projection parallel to the remaining coordinate directions.

Problem 6. Show that the forms x_i ⋀ x_j (i < j) are linearly independent.

In particular, in three-dimensional euclidean space (x₁, x₂, x₃), the area of the projection on the (x₁, x₂)-plane is x₁ ⋀ x₂, on the (x₂, x₃)-plane it is x₂ ⋀ x₃, and on the (x₃, x₁)-plane it is x₃ ⋀ x₁.

Problem 7. Show that every 2-form in the three-dimensional space (x₁, x₂, x₃) is of the form

Px₂ ⋀ x₃ + Qx₃ ⋀ x₁ + Rx₁ ⋀ x₂, where P, Q, R ∈ ℝ.

Problem 8. Show that every 2-form on the n-dimensional space with coordinates x₁,..., x_n can be uniquely represented in the form

ω² = ∑_{i<j} a_ij x_i ⋀ x_j.

Hint. Let e_i be the i-th basis vector, i.e., x_i(e_i) = 1, x_j(e_i) = 0 for i ≠ j. Look at the value of the form ω² on the pair e_i, e_j. Then a_ij = ω²(e_i, e_j).

E Exterior monomials

Suppose that we are given k 1-forms ω₁,..., ω_k. We define their exterior product ω₁ ⋀ ⋯ ⋀ ω_k.

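Definition. Set

(ω₁ ⋀ ⋯ ⋀ ω_k)(ξ₁,..., ξ_k) = det |ω_j(ξ_i)|,

the determinant of the k × k matrix whose entry in row i and column j is ω_j(ξ_i).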

In other words, the value of a product of 1-forms on the parallelepiped ξ₁,..., ξ_k is equal to the oriented volume of the image of the parallelepiped in the oriented euclidean coordinate space ℝ^k under the mapping ξ → (ω₁(ξ),..., ω_k(ξ)).

Problem 9. Show that ω₁ ⋀ ⋯ ⋀ ω_k is a k-form.
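
The determinant description of an exterior monomial can be illustrated by a short computation. This is only a sketch: the names permutation_sign, det, monomial, and coordinate are hypothetical, and the determinant is expanded over all permutations, which is practical only for small k.

```python
from itertools import permutations

def permutation_sign(perm):
    """+1 for an even permutation of 0..k-1, -1 for an odd one."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign

def det(matrix):
    """Determinant by the full expansion over permutations."""
    k = len(matrix)
    total = 0
    for perm in permutations(range(k)):
        term = permutation_sign(perm)
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total

def monomial(*forms):
    """omega_1 ^ ... ^ omega_k, evaluated as det |omega_j(xi_i)|."""
    def value(*vectors):
        return det([[w(xi) for w in forms] for xi in vectors])
    return value

def coordinate(i):
    """The basic 1-form x_{i+1} (indices start at 0 in the code)."""
    return lambda xi: xi[i]

volume = monomial(coordinate(0), coordinate(1), coordinate(2))
print(volume((1, 0, 0), (0, 1, 0), (0, 0, 1)))   # 1
print(volume((0, 1, 0), (1, 0, 0), (0, 0, 1)))   # -1, two vectors swapped

repeated = monomial(coordinate(0), coordinate(0))
print(repeated((1, 2, 3), (4, 5, 6)))            # 0, a repeated factor
```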

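Problem 10. Show that the operation of exterior product of 1-forms gives a multi-linear skew-symmetric mapping

(ω₁,..., ω_k) → ω₁ ⋀ ⋯ ⋀ ω_k.

In other words,

(λ′ω₁′ + λ″ω₁″) ⋀ ω₂ ⋀ ⋯ ⋀ ω_k = λ′ω₁′ ⋀ ω₂ ⋀ ⋯ ⋀ ω_k + λ″ω₁″ ⋀ ω₂ ⋀ ⋯ ⋀ ω_k

and

ω_{i₁} ⋀ ⋯ ⋀ ω_{i_k} = (−1)^ν ω₁ ⋀ ⋯ ⋀ ω_k,

where ν = 0 if the permutation i₁,..., i_k is even and ν = 1 if it is odd.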

Now consider a coordinate system on ℝⁿ given by the basic forms x₁,..., x_n. The exterior product of k basic forms

x_{i₁} ⋀ ⋯ ⋀ x_{i_k}, where 1 ≤ i_m ≤ n,

is the oriented volume of the image of a k-parallelepiped on the k-plane (x_{i₁},..., x_{i_k}) under the projection parallel to the remaining coordinate directions.

Problem 11. Show that, if two of the indices i₁,..., i_k are the same, then the form x_i₁⋀ ⋯ ⋀ x_{i_k} is zero.

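Problem 12. Show that the forms

x_{i₁} ⋀ ⋯ ⋀ x_{i_k}, where 1 ≤ i₁ < i₂ < ⋯ < i_k ≤ n,

are linearly independent.

The number of such forms is clearly the binomial coefficient C_n^k = n!/(k!(n − k)!). We will call them basic k-forms.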

Problem 13. Show that every k-form on ℝⁿ can be uniquely represented as a linear combination of basic forms:

ω^k = ∑_{1 ≤ i₁ < ⋯ < i_k ≤ n} a_{i₁,..., i_k} x_{i₁} ⋀ ⋯ ⋀ x_{i_k}.

Hint. a_{i₁,..., i_k} = ω^k(e_{i₁},..., e_{i_k}).

It follows as a result of this problem that the dimension of the vector space of k-forms on ℝⁿ is equal to the binomial coefficient C_n^k. In particular, for k = n, C_n^k = 1, from which follows

Corollary. Every n-form on ℝⁿ is either the oriented volume of a parallelepiped with some choice of unit volume, or zero:

ωⁿ = a · x₁ ⋀ ⋯ ⋀ x_n.

Problem 14. Show that every k-form on ℝⁿ with k > n is zero.

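We now consider the product of a k-form ω^k and an l-form ω^l. First, suppose that we are given two monomials

ω^k = ω₁ ⋀ ⋯ ⋀ ω_k and ω^l = ω_{k + 1} ⋀ ⋯ ⋀ ω_{k + l},

where ω₁,..., ω_{k + l} are 1-forms. We define their product ω^k ⋀ ω^l to be the monomial

(ω₁ ⋀ ⋯ ⋀ ω_k) ⋀ (ω_{k + 1} ⋀ ⋯ ⋀ ω_{k + l}) = ω₁ ⋀ ⋯ ⋀ ω_k ⋀ ω_{k + 1} ⋀ ⋯ ⋀ ω_{k + l}.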

Problem 15. Show that the product of monomials is associative:

(ω^k ⋀ ω^l) ⋀ ω^m = ω^k ⋀ (ω^l ⋀ ω^m)

and skew-commutative:

ω^k ⋀ ω^l = (−1)^{kl} ω^l ⋀ ω^k.

Hint. In order to move each of the l factors of ω^l forward, we need k inversions with the k factors of ω^k.

Remark. It is useful to remember that skew-commutativity means commutativity only if one of the degrees k and l is even, and anti-commutativity if both degrees k and l are odd.

33 Exterior multiplication

We define here the operation of exterior multiplication of forms and show that it is skew-commutative, distributive, and associative.

A Definition of exterior multiplication

We now define the exterior multiplication of an arbitrary k-form ω^k by an arbitrary l-form ω^l. The result ω^k ⋀ ω^l will be a k + l-form. The operation of multiplication turns out to be:

skew-commutative: ω^k ⋀ ω^l = (−1)^{kl} ω^l ⋀ ω^k;

distributive: (λ₁ω₁^k + λ₂ω₂^k) ⋀ ω^l = λ₁ω₁^k ⋀ ω^l + λ₂ω₂^k ⋀ ω^l;

associative: (ω^k ⋀ ω^l) ⋀ ω^m = ω^k ⋀ (ω^l ⋀ ω^m).

Definition. The exterior product ω^k ⋀ ω^l of a k-form ω^k on ℝⁿ with an l-form ω^l on ℝⁿ is the k + l-form on ℝⁿ whose value on the k + l vectors ξ₁,..., ξ_k, ξ_{k + 1},..., ξ_{k + l} ∈ ℝⁿ is equal to

(1) (ω^k ⋀ ω^l)(ξ₁,..., ξ_{k + l}) = ∑ (−1)^ν ω^k(ξ_{i₁},..., ξ_{i_k}) ω^l(ξ_{j₁},..., ξ_{j_l}),

where i₁ < ⋯ < i_k and j₁ < ⋯ < j_l; (i₁,..., i_k, j₁,..., j_l) is a permutation of the numbers (1, 2,..., k + l); and ν = 0 if this permutation is even and ν = 1 if it is odd.

In other words, every partition of the k + l vectors ξ₁,..., ξ_{k + l} into two groups (of k and of l vectors) gives one term in our sum (1). This term is equal to the product of the value of the k-form ω^k on the k vectors of the first group with the value of the l-form ω^l on the l vectors of the second group, with sign + or − depending on how the vectors are ordered in the groups. If they are ordered in such a way that the k vectors of the first group and the l vectors of the second group written in succession form an even permutation of the vectors ξ₁, ξ₂,..., ξ_{k + l}, then we take the sign to be +, and if they form an odd permutation we take the sign to be −.

Example. If k = l = 1, then there are just two partitions: ξ₁, ξ₂ and ξ₂, ξ₁. Therefore,

(ω₁ ⋀ ω₂)(ξ₁, ξ₂) = ω₁(ξ₁)ω₂(ξ₂) − ω₁(ξ₂)ω₂(ξ₁),

which agrees with the definition of multiplication of 1-forms in Section 32.
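
The sum over partitions in formula (1) can be sketched in a few lines. The code below is an illustration under stated assumptions, not a definitive implementation: a k-form is represented as a Python function of k vectors, and the names permutation_sign, wedge, and coordinate are hypothetical.

```python
from itertools import combinations

def permutation_sign(perm):
    """+1 for an even permutation of 0..n-1, -1 for an odd one."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign

def wedge(form_k, k, form_l, l):
    """Exterior product by formula (1): a signed sum over all ways of
    choosing which k of the k + l vectors are given to form_k."""
    def product(*vectors):
        assert len(vectors) == k + l
        total = 0
        for first in combinations(range(k + l), k):        # i_1 < ... < i_k
            second = tuple(j for j in range(k + l) if j not in first)  # j_1 < ... < j_l
            total += (permutation_sign(first + second)
                      * form_k(*(vectors[i] for i in first))
                      * form_l(*(vectors[j] for j in second)))
        return total
    return product

def coordinate(i):
    """The basic 1-form x_{i+1} (indices start at 0 in the code)."""
    return lambda xi: xi[i]

x1, x2, x3 = coordinate(0), coordinate(1), coordinate(2)
u, v, w = (1, 2, 3), (0, 1, 4), (5, 6, 0)

x1_x2 = wedge(x1, 1, x2, 1)
left = wedge(x1_x2, 2, x3, 1)(u, v, w)                    # (x1 ^ x2) ^ x3
right = wedge(x1, 1, wedge(x2, 1, x3, 1), 2)(u, v, w)     # x1 ^ (x2 ^ x3)
swapped = wedge(x3, 1, x1_x2, 2)(u, v, w)                 # x3 ^ (x1 ^ x2)

print(x1_x2(u, v))            # 1*1 - 0*2 = 1
print(left, right, swapped)   # 1 1 1
```

All three triple products equal the determinant of the matrix with rows u, v, w, which is 1 here; the last value illustrates skew-commutativity with kl = 2 even.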

Problem 1. Show that the definition above actually defines a k + l-form (i.e., that the value of (ω^k ⋀ ω^l)(ξ₁,..., ξ_{k + l}) depends linearly and skew-symmetrically on the vectors ξ).

Theorem. The exterior multiplication of forms defined above is skew-commutative, distributive, and associative. For monomials it coincides with the multiplication defined in Section 32.

The proof of skew-commutativity is based on the simplest properties of even and odd permutations (cf. the problem at the end of Section 32) and will be left to the reader.

Distributivity follows from the fact that every term in (1) is linear with respect to ω^k and ω^l.

The proof of associativity requires a little more combinatorics. Since the corresponding arguments are customarily carried out in algebra courses for the proof of Laplace’s theorem on the expansion of a determinant by column minors, we may use this theorem.⁵³

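We begin with the following observation: if associativity is proved for the terms of a sum, then it is also true for the sum, i.e.,

(ω₁′ ⋀ ω₂) ⋀ ω₃ = ω₁′ ⋀ (ω₂ ⋀ ω₃) and (ω₁″ ⋀ ω₂) ⋀ ω₃ = ω₁″ ⋀ (ω₂ ⋀ ω₃) imply ((ω₁′ + ω₁″) ⋀ ω₂) ⋀ ω₃ = (ω₁′ + ω₁″) ⋀ (ω₂ ⋀ ω₃).

For, by distributivity, which has already been proved, we have

((ω₁′ + ω₁″) ⋀ ω₂) ⋀ ω₃ = (ω₁′ ⋀ ω₂) ⋀ ω₃ + (ω₁″ ⋀ ω₂) ⋀ ω₃,
(ω₁′ + ω₁″) ⋀ (ω₂ ⋀ ω₃) = ω₁′ ⋀ (ω₂ ⋀ ω₃) + ω₁″ ⋀ (ω₂ ⋀ ω₃).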

We already know from Section 32 (Problem 13) that every form on ℝⁿ is a sum of monomials; therefore, it is enough to show associativity for multiplication of monomials.

Since we have not yet proved the equivalence of the definition in Section 32 of multiplication of k 1-forms with the general definition (1), we will temporarily denote the multiplication of k 1-forms by the symbol ⊼, so that our monomials have the form

ω^k = ω₁ ⊼ ⋯ ⊼ ω_k and ω^l = ω_{k + 1} ⊼ ⋯ ⊼ ω_{k + l},

where ω₁,..., ω_{k + l} are 1-forms.

Lemma. The exterior product of two monomials is a monomial:

(ω₁ ⊼ ⋯ ⊼ ω_k) ⋀ (ω_{k + 1} ⊼ ⋯ ⊼ ω_{k + l}) = ω₁ ⊼ ⋯ ⊼ ω_k ⊼ ω_{k + 1} ⊼ ⋯ ⊼ ω_{k + l}.

Proof. We calculate the values of the left and right sides on k + l vectors ξ₁,..., ξ_{k + l}. The value of the left side, by formula (1), is equal to the sum of the products

∑ ± det |ω_i(ξ_{i_m})| · det |ω_{k + i}(ξ_{j_m})| (1 ≤ i, m ≤ k in the first factor and 1 ≤ i, m ≤ l in the second)

of the minors of the first k columns of the determinant of order k + l and the remaining minors. Laplace’s theorem on the expansion by minors of the first k columns asserts exactly that this sum, with the same rule of sign choice as in Definition (1), is equal to the determinant det |ω_i(ξ_j)|. ☐

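It follows from the lemma that the operations ⊼ and ⋀ coincide: we get, in turn,

ω₁ ⊼ ω₂ = ω₁ ⋀ ω₂,
ω₁ ⊼ ω₂ ⊼ ω₃ = (ω₁ ⊼ ω₂) ⋀ ω₃ = (ω₁ ⋀ ω₂) ⋀ ω₃,

and so on for any number of factors.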

The associativity of ⋀-multiplication of monomials therefore follows from the obvious associativity of ⊼-multiplication of 1-forms. Thus, in view of the observation made above, associativity is proved in the general case.

Problem 2. Show that the exterior square of a 1-form, or, in general, of a form of odd order, is equal to zero: ω^k ⋀ ω^k = 0 if k is odd.

Example 1. Consider a coordinate system p₁,..., p_n, q₁,..., q_n on ℝ²ⁿ and the 2-form

ω² = p₁ ⋀ q₁ + ⋯ + p_n ⋀ q_n.

[Geometrically, this form signifies the sum of the oriented areas of the projection of a parallelogram on the n two-dimensional coordinate planes (p₁, q₁),.... (p_n, q_n). Later, we will see that the 2-form ω² has a special meaning for hamiltonian mechanics. It can be shown that every nondegenerate⁵⁴ 2-form on ℝ²ⁿ has the form ω² in some coordinate system (p₁,..., q_n).]

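Problem 3. Find the exterior square of the 2-form ω².

Answer. ω² ⋀ ω² = −2 ∑_{i<j} p_i ⋀ p_j ⋀ q_i ⋀ q_j.

Problem 4. Find the exterior k-th power of ω².

Answer. ω² ⋀ ⋯ ⋀ ω² (k factors) = ±k! ∑_{i₁ < ⋯ < i_k} p_{i₁} ⋀ ⋯ ⋀ p_{i_k} ⋀ q_{i₁} ⋀ ⋯ ⋀ q_{i_k}, where the sign is (−1)^{k(k − 1)/2}.

In particular,

ω² ⋀ ⋯ ⋀ ω² (n factors) = ±n! p₁ ⋀ ⋯ ⋀ p_n ⋀ q₁ ⋀ ⋯ ⋀ q_n

is, up to a factor, the volume of a 2n-dimensional parallelepiped in ℝ²ⁿ.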
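
The exterior square can be checked on a small case. The sketch below assumes n = 2, coordinates ordered as (p₁, p₂, q₁, q₂), and the 2-form ω² taken to be the sum of the oriented areas of the projections on the planes (p₁, q₁) and (p₂, q₂), as described in Example 1; the function names are hypothetical, and the product is computed by the partition formula (1) of Section 33A.

```python
from itertools import combinations

def permutation_sign(perm):
    """+1 for an even permutation of 0..n-1, -1 for an odd one."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign

def wedge(form_k, k, form_l, l):
    """Exterior product of a k-form and an l-form by the partition formula (1)."""
    def product(*vectors):
        total = 0
        for first in combinations(range(k + l), k):
            second = tuple(j for j in range(k + l) if j not in first)
            total += (permutation_sign(first + second)
                      * form_k(*(vectors[i] for i in first))
                      * form_l(*(vectors[j] for j in second)))
        return total
    return product

def omega(xi, eta):
    """Sum of oriented areas on the (p1, q1) and (p2, q2) planes;
    vectors are stored as (p1, p2, q1, q2)."""
    return sum(xi[i] * eta[i + 2] - xi[i + 2] * eta[i] for i in range(2))

e_p1, e_p2 = (1, 0, 0, 0), (0, 1, 0, 0)
e_q1, e_q2 = (0, 0, 1, 0), (0, 0, 0, 1)

square = wedge(omega, 2, omega, 2)
print(square(e_p1, e_p2, e_q1, e_q2))   # -2
print(square(e_p1, e_q1, e_p2, e_q2))   # 2
```

The first value is −2 times the value of p₁ ⋀ p₂ ⋀ q₁ ⋀ q₂ on the frame (e_p1, e_p2, e_q1, e_q2), which is 1; the second differs by one transposition of the arguments.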

Example 2. Consider the oriented euclidean space ℝ³. Every vector A ∈ ℝ³ determines a 1-form ω¹_A and a 2-form ω²_A by

ω¹_A(ξ) = (A, ξ) (scalar product) and ω²_A(ξ₁, ξ₂) = (A, ξ₁, ξ₂) (triple scalar product).

Problem 5. Show that the maps A → ω¹_A and A → ω²_A establish isomorphisms of the linear space ℝ³ of vectors A with the linear spaces of 1-forms on ℝ³ and 2-forms on ℝ³. If we choose an orthonormal oriented coordinate system (x₁, x₂, x₃) on ℝ³, then

ω¹_A = A₁x₁ + A₂x₂ + A₃x₃

and

ω²_A = A₁x₂ ⋀ x₃ + A₂x₃ ⋀ x₁ + A₃x₁ ⋀ x₂,

where A₁, A₂, A₃ are the components of A in this coordinate system.

Remark. Thus the isomorphisms do not depend on the choice of the orthonormal oriented coordinate system (x₁, x₂, x₃). But they do depend on the choice of the euclidean structure on ℝ³, and the isomorphism A → ω²_A also depends on the orientation (coming implicitly in the definition of triple scalar product).

Problem 6. Show that, under the isomorphisms established above, the exterior product of 1-forms becomes the vector product in ℝ³, i.e., that

ω¹_A ⋀ ω¹_B = ω²_{[A, B]} for any A, B ∈ ℝ³.

In this way the exterior product of 1-forms can be considered as an extension of the vector product in ℝ³ to higher dimensions. However, in the n-dimensional case, the product is not a vector in the same space: the space of 2-forms on ℝⁿ is isomorphic to ℝⁿ only for n = 3.

Problem 7. Show that, under the isomorphisms established above, the exterior product of a 1-form and a 2-form becomes the scalar product of vectors in ℝ³:

ω¹_A ⋀ ω²_B = (A, B) x₁ ⋀ x₂ ⋀ x₃.

C Behavior under mappings

Let f : ℝ^m → ℝⁿ be a linear map, and ω^k an exterior k-form on ℝⁿ. Then there is a k-form f*ω^k on ℝ^m, whose value on the k vectors ξ₁,..., ξ_k ∈ ℝ^m is equal to the value of ω^k on their images:

(f*ω^k)(ξ₁,..., ξ_k) = ω^k(fξ₁,..., fξ_k).

Problem 8. Verify that f*ω^k is an exterior form.

Problem 9. Verify that f* is a linear operator from the space of k-forms on ℝⁿ to the space of k-forms on ℝ^m (the star superscript means that f* acts in the opposite direction from f).

Problem 10. Let f: ℝ^m → ℝⁿ and g: ℝⁿ → ℝ^p. Verify that (g ○ f)* = f* ○ g*.

Problem 11. Verify that f* preserves exterior multiplication: f*(ω^k ⋀ ω^l) = (f*ω^k) ⋀(f*ω^l)
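
The behavior of forms under a linear map can be illustrated numerically. This is a minimal sketch, assuming that the linear map f : ℝ² → ℝ³ is given by a 3 × 2 matrix stored as a tuple of rows; the names pullback, wedge2, and coordinate are hypothetical.

```python
def pullback(matrix, form):
    """f*omega for the linear map f(xi) = matrix * xi:
    evaluate omega on the images of the vectors."""
    def apply(xi):
        return tuple(sum(a * x for a, x in zip(row, xi)) for row in matrix)
    return lambda *vectors: form(*(apply(xi) for xi in vectors))

def wedge2(w1, w2):
    """Exterior product of two 1-forms."""
    return lambda xi1, xi2: w1(xi1) * w2(xi2) - w2(xi1) * w1(xi2)

def coordinate(i):
    """The basic 1-form x_{i+1} (indices start at 0 in the code)."""
    return lambda xi: xi[i]

# f(xi_1, xi_2) = (xi_1, xi_2, xi_1 + xi_2)
f = ((1, 0), (0, 1), (1, 1))
x1, x3 = coordinate(0), coordinate(2)

xi, eta = (1, 2), (3, 4)
lhs = pullback(f, wedge2(x1, x3))(xi, eta)                 # f*(x1 ^ x3)
rhs = wedge2(pullback(f, x1), pullback(f, x3))(xi, eta)    # (f*x1) ^ (f*x3)
print(lhs, rhs)   # -2 -2
```

Here f maps ξ = (1, 2) to (1, 2, 3) and η = (3, 4) to (3, 4, 7), so both sides equal 1 · 7 − 3 · 3 = −2, one instance of the statement of Problem 11.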

34 Differential forms

We give here the definition of differential forms on differentiable manifolds.

A Differential 1-forms

The simplest example of a differential form is the differential of a function.

Example. Consider the function y = f(x) = x². Its differential df = 2x dx depends on the point x and on the “increment of the argument,” i.e., on the tangent vector ξ to the x axis. We fix the point x. Then the differential of the function at x, df |_x, depends linearly on ξ. So, if x = 1 and the coordinate of the tangent vector ξ is equal to 1, then df = 2, and if the coordinate of ξ is equal to 10, then df = 20 (Figure 140).

Figure 140
Differential of a function

Let f: M → ℝ be a differentiable function on the manifold M (we can imagine a “function of many variables” f: ℝⁿ → ℝ). The differential df |_x of f at x is a linear map

df |_x: TM_x → ℝ

of the tangent space to M at x into the real line. We recall from Section 18F the definition of this map:

Let ξ ∈ TM_x be the velocity vector of the curve x(t): ℝ → M; x(0) = x and ẋ(0) = ξ. Then, by definition,

df |_x(ξ) = (d/dt)|_{t = 0} f(x(t)).

Problem 1. Let ξ be the velocity vector of the plane curve x(t) = cos t, y(t) = sin t at t = 0. Calculate the values of the differentials dx and dy of the functions x and y on the vector ξ (Figure 141).

Figure 141
Problem 1

Answer. dx(ξ) = 0, dy(ξ) = 1, since ξ = (−sin 0, cos 0) = (0, 1).

Note that the differential of a function f at a point x ∈ M is a 1-form df_x on the tangent space TM_x.

The differential df of f on the manifold M is a smooth map of the tangent bundle TM to the line

df: TM → ℝ.

This map is differentiable and is linear on each tangent space TM_x ⊂ TM.

Definition. A differential form of degree 1 (or a 1-form) on a manifold M is a smooth map

ω: TM → ℝ

of the tangent bundle of M to the line, linear on each tangent space TM_x.

One could say that a differential 1-form on M is an algebraic 1-form on TM_x which is “differentiable with respect to x.”

Problem 2. Show that every differential 1-form on the line is the differential of some function.

Problem 3. Find differential 1-forms on the circle and the plane which are not the differential of any function.

B The general form of a differential 1-form on ℝⁿ

We take as our manifold M a vector space with coordinates x₁,..., x_n. Recall that the components ξ₁,..., ξ_n of a tangent vector ξ ∈ Tℝⁿ_x are the values of the differentials dx₁,..., dx_n on the vector ξ. These n 1-forms on Tℝⁿ_x are linearly independent. Thus the 1-forms dx₁,..., dx_n form a basis for the n-dimensional space of 1-forms on Tℝⁿ_x, and every 1-form on Tℝⁿ_x can be uniquely written in the form a₁ dx₁ + ⋯ + a_n dx_n, where the a_i are real coefficients. Now let ω be an arbitrary differential 1-form on ℝⁿ. At every point x it can be expanded uniquely in the basis dx₁,..., dx_n. From this we get:

Theorem. Every differential 1-form on the space ℝⁿ with a given coordinate system x₁,..., x_n can be written uniquely in the form

ω = a₁(x) dx₁ + ⋯ + a_n(x) dx_n,

where the coefficients a_i(x) are smooth functions.

Problem 4. Calculate the values of the forms ω₁ = dx₁, ω₂ = x₁dx₂, and on the vectors ξ₁, ξ₂, and ξ₃ (Figure 142).

Figure 142
Problem 4

Answer.

Problem 5. Let x₁,..., x_n be functions on a manifold M forming a local coordinate system in some region. Show that every 1-form on this region can be uniquely written in the form ω = a₁(x) dx₁ + ⋯ + a_n(x) dx_n.

C Differential k-forms

Definition. A differential k-form ω^k |_x at a point x of a manifold M is an exterior k-form on the tangent space TM_x to M at x, i.e., a k-linear skew-symmetric function of k vectors ξ₁,..., ξ_k tangent to M at x.

If such a form ω^k |_x is given at every point x of the manifold M and if it is differentiable, then we say that we are given a k-form ω^k on the manifold M.

Problem 6. Put a natural differentiable manifold structure on the set whose elements are k-tuples of vectors tangent to M at some point x.

A differential k-form is a smooth map from the manifold of Problem 6 to the line.

Problem 7. Show that the k-forms on M form a vector space (infinite-dimensional if k does not exceed the dimension of M).

Differential forms can be multiplied by functions as well as by numbers. Therefore, the set of C^∞ differential k-forms has a natural structure as a module over the ring of infinitely differentiable real functions on M.

D The general form of a differential k-form on ℝⁿ

Take as the manifold M the vector space ℝⁿ with fixed coordinate functions x₁,..., x_n: ℝⁿ → ℝ. Fix a point x. We saw above that the n 1-forms dx₁,..., dx_n form a basis of the space of 1-forms on the tangent space Tℝⁿ_x. Consider exterior products of the basic forms:

dx_{i₁} ⋀ ⋯ ⋀ dx_{i_k}, where i₁ < ⋯ < i_k.

In Section 32 we saw that these C_n^k = n!/(k!(n − k)!) k-forms form a basis of the space of exterior k-forms on Tℝⁿ_x. Therefore, every exterior k-form on Tℝⁿ_x can be written uniquely in the form

∑_{i₁ < ⋯ < i_k} a_{i₁,..., i_k} dx_{i₁} ⋀ ⋯ ⋀ dx_{i_k}.

Now let ω be an arbitrary differential k-form on ℝⁿ. At every point x it can be uniquely expressed in terms of the basis above. From this follows:

Theorem. Every differential k-form on the space ℝⁿ with a given coordinate system x₁,..., x_n can be written uniquely in the form

ω^k = ∑_{i₁ < ⋯ < i_k} a_{i₁,..., i_k}(x) dx_{i₁} ⋀ ⋯ ⋀ dx_{i_k},

where the a_{i₁,..., i_k}(x) are smooth functions on ℝⁿ.

Problem 8. Calculate the value of the forms ω₁ = dx₁ ⋀ dx₂, ω₂ = x₁ dx₁ ⋀ dx₂ − x₂ dx₂ ⋀ dx₁ and ω₃ = r dr ⋀ dφ (where x₁ = r cos φ and x₂ = r sin φ) on the pairs of vectors (ξ₁, η₁), (ξ₂, η₂), and (ξ₃, η₃) (Figure 143).

Figure 143
Problem 8

Answer.

Problem 9. Calculate the value of the forms ω₁ = dx₂ ⋀ dx₃, ω₂ = x₁ dx₃ ⋀ dx₂, and ω₃ = dx₃ ⋀ dr² (r² = x₁² + x₂² + x₃²), on the pair of vectors ξ = (1, 1, 1), η = (1, 2, 3) at the point x = (2, 0, 0).

Answer. ω₁ = 1, ω₂ = −2, ω₃ = −8.
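
The three values in this answer can be reproduced by direct evaluation. The sketch below makes one assumption that should be stated loudly: the third form is taken to be ω₃ = dx₃ ⋀ dr² with r² = x₁² + x₂² + x₃², a choice that is consistent with the stated value −8. The helper names dx, wedge2, and d_r_squared are hypothetical.

```python
def wedge2(w1, w2):
    """Exterior product of two 1-forms on a tangent space."""
    return lambda xi, eta: w1(xi) * w2(eta) - w2(xi) * w1(eta)

def dx(i):
    """The differential dx_{i+1}, acting on a tangent vector (0-based index)."""
    return lambda v: v[i]

x = (2, 0, 0)          # the point at which the forms are evaluated
xi = (1, 1, 1)
eta = (1, 2, 3)

def d_r_squared(v):
    """d(r^2) = 2 x1 dx1 + 2 x2 dx2 + 2 x3 dx3 at the point x
    (assumed meaning of r^2, see the lead-in)."""
    return sum(2 * xj * vj for xj, vj in zip(x, v))

omega1 = wedge2(dx(1), dx(2))(xi, eta)           # dx2 ^ dx3
omega2 = x[0] * wedge2(dx(2), dx(1))(xi, eta)    # x1 dx3 ^ dx2
omega3 = wedge2(dx(2), d_r_squared)(xi, eta)     # dx3 ^ d(r^2), assumed form
print(omega1, omega2, omega3)   # 1 -2 -8
```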

Problem 10. Let x₁,..., x_n: M → ℝ be functions on a manifold which form a local coordinate system on some region. Show that every differential form on this region can be written uniquely in the form

ω^k = ∑_{i₁ < ⋯ < i_k} a_{i₁,..., i_k}(x) dx_{i₁} ⋀ ⋯ ⋀ dx_{i_k}.

Example. Change of variables in a form. Suppose that we are given two coordinate systems on ℝ³: x₁, x₂, x₃ and y₁, y₂, y₃. Let ω be a 2-form on ℝ³. Then, by the theorem above, ω can be written in the system of x-coordinates as ω = X₁ dx₂ ⋀ dx₃ + X₂ dx₃ ⋀ dx₁ + X₃ dx₁ ⋀ dx₂, where X₁, X₂, and X₃ are functions of x₁, x₂, and x₃, and in the system of y-coordinates as ω = Y₁ dy₂ ⋀ dy₃ + Y₂ dy₃ ⋀ dy₁ + Y₃ dy₁ ⋀ dy₂, where Y₁, Y₂, and Y₃ are functions of y₁, y₂, and y₃.

Problem 11. Given the form written in the x-coordinates (i.e., the X_i) and the change of variables formulas x = x(y), write the form in y-coordinates, i.e., find Y.

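Solution. We have dx_i = (∂x_i/∂y₁) dy₁ + (∂x_i/∂y₂) dy₂ + (∂x_i/∂y₃) dy₃. Therefore,

dx₂ ⋀ dx₃ = ((∂x₂/∂y₁) dy₁ + (∂x₂/∂y₂) dy₂ + (∂x₂/∂y₃) dy₃) ⋀ ((∂x₃/∂y₁) dy₁ + (∂x₃/∂y₂) dy₂ + (∂x₃/∂y₃) dy₃),

from which we get

Y₃ = X₁ ∂(x₂, x₃)/∂(y₁, y₂) + X₂ ∂(x₃, x₁)/∂(y₁, y₂) + X₃ ∂(x₁, x₂)/∂(y₁, y₂),

where ∂(x_i, x_j)/∂(y₁, y₂) denotes the Jacobian determinant (∂x_i/∂y₁)(∂x_j/∂y₂) − (∂x_i/∂y₂)(∂x_j/∂y₁); Y₁ and Y₂ are obtained in the same way.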

E Appendix. Differential forms in three-dimensional spaces

Let M be a three-dimensional oriented riemannian manifold (in all future examples M will be euclidean three-space ℝ³). Let x₁, x₂, and x₃ be local coordinates, and let the square of the length element have the form

ds² = E₁ dx₁² + E₂ dx₂² + E₃ dx₃²

(i.e., the coordinate system is triply orthogonal).

Problem 12. Find E₁, E₂, and E₃ for cartesian coordinates x, y, z, for cylindrical coordinates r, φ, z and for spherical coordinates R, φ, θ in the euclidean space ℝ³ (Figure 144).

Figure 144
Problem 12

Answer. ds² = dx² + dy² + dz² = dr² + r² dφ² + dz² = dR² + R² cos² θ dφ² + R² dθ².

We let e₁, e₂, and e₃ denote the unit vectors in the coordinate directions. These three vectors form a basis of the tangent space.

Problem 13. Find the values of the forms dx₁, dx₂, and dx₃ on the vectors e₁, e₂, and e₃.

Answer. dx_i(e_i) = 1/√E_i, the rest are zero. In particular, for cartesian coordinates dx(e_x) = dy(e_y) = dz(e_z) = 1; for cylindrical coordinates dr(e_r) = dz(e_z) = 1 and dφ(e_φ) = 1/r (Figure 145); for spherical coordinates dR(e_R) = 1, dφ(e_φ) = 1/(R cos θ) and dθ(e_θ) = 1/R.

Figure 145
Problem 13

The metric and orientation on the manifold M furnish the tangent space to M at every point with the structure of an oriented euclidean three-dimensional space. In terms of this structure, we can talk about scalar, vector, and triple scalar products.

Problem 14. Calculate [e₁, e₂], (e_R, e_θ), and (e_z, e_x, e_y).

Answer. e₃, 0, 1.

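In an oriented euclidean three-space every vector A corresponds to a 1-form ω¹_A and a 2-form ω²_A, defined by the conditions

ω¹_A(ξ) = (A, ξ), ω²_A(ξ, η) = (A, ξ, η).

The correspondence between vector fields and forms does not depend on the system of coordinates, but only on the euclidean structure and orientation. Therefore, every vector field A on our manifold M corresponds to a differential 1-form ω¹_A on M and a differential 2-form ω²_A on M.

The formulas for changing from fields to forms and back have a different form in each coordinate system. Suppose that in the coordinates x₁, x₂, and x₃ described above, the vector field has the form

A = A₁e₁ + A₂e₂ + A₃e₃

(the components A_i are smooth functions on M). The corresponding 1-form ω¹_A decomposes over the basis dx_i, and the corresponding 2-form ω²_A over the basis dx_i ⋀ dx_j.

Problem 15. Given the components of the vector field A, find the decompositions of the 1-form ω¹_A and the 2-form ω²_A.

Solution. We have ω¹_A(e₁) = (A, e₁) = A₁. Also, (a₁ dx₁ + a₂ dx₂ + a₃ dx₃)(e₁) = a₁ dx₁(e₁) = a₁/√E₁. From this we get that a₁ = A₁√E₁, so that

ω¹_A = A₁√E₁ dx₁ + A₂√E₂ dx₂ + A₃√E₃ dx₃.

In the same way, we have ω²_A(e₂, e₃) = (A, e₂, e₃) = A₁. Also,

(α₁ dx₂ ⋀ dx₃ + α₂ dx₃ ⋀ dx₁ + α₃ dx₁ ⋀ dx₂)(e₂, e₃) = α₁ dx₂(e₂) dx₃(e₃) = α₁/√(E₂E₃).

Hence, α₁ = A₁√(E₂E₃), i.e.,

ω²_A = A₁√(E₂E₃) dx₂ ⋀ dx₃ + A₂√(E₃E₁) dx₃ ⋀ dx₁ + A₃√(E₁E₂) dx₁ ⋀ dx₂.

In particular, in cartesian, cylindrical, and spherical coordinates on ℝ³ the vector field

A = A_xe_x + A_ye_y + A_ze_z = A_re_r + A_φe_φ + A_ze_z = A_Re_R + A_φe_φ + A_θe_θ

corresponds to the 1-form

ω¹_A = A_x dx + A_y dy + A_z dz = A_r dr + rA_φ dφ + A_z dz = A_R dR + R cos θ A_φ dφ + RA_θ dθ

and the 2-form

ω²_A = A_x dy ⋀ dz + A_y dz ⋀ dx + A_z dx ⋀ dy
= rA_r dφ ⋀ dz + A_φ dz ⋀ dr + rA_z dr ⋀ dφ
= R² cos θ A_R dφ ⋀ dθ + RA_φ dθ ⋀ dR + R cos θ A_θ dR ⋀ dφ.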

An example of a vector field on a manifold M is the gradient of a function f: M → ℝ. Recall that the gradient of a function is the vector field grad f corresponding to the differential:

df(ξ) = (grad f, ξ) for every tangent vector ξ.

Problem 16. Find the components of the gradient of a function in the basis e₁, e₂, e₃.

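Solution. We have df = (∂f/∂x₁) dx₁ + (∂f/∂x₂) dx₂ + (∂f/∂x₃) dx₃. By the problem above,

grad f = (1/√E₁)(∂f/∂x₁) e₁ + (1/√E₂)(∂f/∂x₂) e₂ + (1/√E₃)(∂f/∂x₃) e₃.

In particular, in cartesian, cylindrical, and spherical coordinates

grad f = (∂f/∂x) e_x + (∂f/∂y) e_y + (∂f/∂z) e_z
= (∂f/∂r) e_r + (1/r)(∂f/∂φ) e_φ + (∂f/∂z) e_z
= (∂f/∂R) e_R + (1/(R cos θ))(∂f/∂φ) e_φ + (1/R)(∂f/∂θ) e_θ.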

35 Integration of differential forms

We define here the concepts of a chain, the boundary of a chain, and the integration of a form over a chain.

The integral of a differential form is a higher-dimensional generalization of such ideas as the flux of a fluid across a surface or the work of a force along a path.

A The integral of a 1-form along a path

We begin by integrating a 1-form ω¹ on a manifold M. Let

γ: [0 ≤ t ≤ 1] → M

be a smooth map (the “path of integration”). The integral of the form ω¹ on the path γ is defined as a limit of Riemann sums. Every Riemann sum consists of the values of the form ω¹ on some tangent vectors ξ_i (Figure 146):

∑_i ω¹(ξ_i).

Figure 146

Integrating a 1-form along a path

The tangent vectors ξ_i are constructed in the following way. The interval 0 ≤ t ≤ 1 is divided into parts ∆_i: t_i ≤ t ≤ t_{i + 1} by the points t_i. The interval ∆_i can be looked at as a tangent vector ∆_i to the t axis at the point t_i. Its image in the tangent space to M at the point γ(t_i) is

ξ_i = dγ |_{t_i}(∆_i) ∈ TM_{γ(t_i)}.

The sum has a limit as the largest of the intervals ∆_i tends to zero. It is called the integral of the 1-form ω¹ along the path γ.

The definition of the integral of a k-form along a k-dimensional surface follows an analogous pattern. The surface of integration is partitioned into small curvilinear k-dimensional parallelepipeds (Figure 147); these parallelepipeds are replaced by parallelepipeds in the tangent space. The sum of the values of the form on the parallelepipeds in the tangent space approaches the integral as the partition is refined. We will first consider a particular case.

Figure 147

Integrating a 2-form over a surface

B The integral of a k-form on oriented euclidean space ℝ^k

Let x₁,...,x_k be an oriented coordinate system on ℝ^k. Then every k-form on ℝ^k is proportional to the form dx₁ ⋀ ⋯ ⋀ dx_k, i.e., it has the form ω^k = φ(x)dx₁ ⋀ ⋯ ⋀ dx_k, where φ(x) is a smooth function.

Let D be a bounded convex polyhedron in ℝ^k (Figure 148). By definition, the integral of the form ω^k on D is the integral of the function φ:

∫_D ω^k = ∫_D φ(x) dx₁ ⋯ dx_k,

where the integral on the right is understood to be the usual limit of Riemann sums.

Figure 148

Integrating a k-form in k-dimensional space

Such a definition follows the pattern outlined above, since in this case the tangent space to the manifold is identified with the manifold.

Problem 1. Show that ∫_D ω^k depends linearly on ω^k.

Problem 2. Show that if we divide D into two distinct polyhedra D₁ and D₂, then

∫_D ω^k = ∫_{D₁} ω^k + ∫_{D₂} ω^k.

In the general case (a k-form on an n-dimensional space) it is not so easy to identify the elements of the partition with tangent parallelepipeds; we will consider this case below.

C The behavior of differential forms under maps

Let f : M → N be a differentiable map of a smooth manifold M to a smooth manifold N, and let ω be a differential k-form on N (Figure 149). Then, a well-defined k-form arises also on M : it is denoted by f*ω and is defined by the relation

(f*ω)(ξ₁,..., ξ_k) = ω(f_*ξ₁,..., f_*ξ_k)

for any tangent vectors ξ₁,...,ξ_k ∈ T M_x. Here f_* is the differential of the map f. In other words, the value of the form f*ω on the vectors ξ₁,...,ξ_k is equal to the value of ω on the images of these vectors.

Figure 149

A form on N induces a form on M.

Example. If and ω = dy, then

Problem 3. Show that f*ω is a k-form on M.

Problem 4. Show that the map f* preserves operations on forms:

f*(λ₁ω₁ + λ₂ω₂) = λ₁f*(ω₁) + λ₂f*(ω₂),
f*(ω₁ ⋀ ω₂) = (f*ω₁) ⋀ (f*ω₂).

Problem 5. Let g : L → M be a differentiable map. Show that (fg)* = g*f*.

Problem 6. Let D₁ and D₂ be two compact, convex polyhedra in the oriented k-dimensional space ℝ^k and f: D₁ → D₂ a differentiable map which is an orientation-preserving diffeomorphism⁵⁵ of the interior of D₁ onto the interior of D₂. Then, for any differential k-form ω^k on D₂,

∫_{D₁} f*ω^k = ∫_{D₂} ω^k.

Hint. This is the change of variables theorem for a multiple integral:

∫_{D₁} (∂(y₁,..., y_k)/∂(x₁,..., x_k)) φ(y(x)) dx₁ ⋯ dx_k = ∫_{D₂} φ(y) dy₁ ⋯ dy_k.

D Integration of a k-form on an n-dimensional manifold

Let ω be a differential k-form on an n-dimensional manifold M. Let D be a bounded convex k-dimensional polyhedron in k-dimensional euclidean space ℝ^k (Figure 150).

Figure 150

Singular k-dimensional polyhedron

The role of “path of integration” will be played by a k-dimensional cell⁵⁶ σ of M represented by a triple σ = (D, f, Or) consisting of

a convex polyhedron D ⊂ ℝ^k,

a differentiable map f : D → M, and

an orientation on ℝ^k, denoted by Or.

Definition. The integral of the k-form ω over the k-dimensional cell σ is the integral of the corresponding form over the polyhedron D:

∫_σ ω = ∫_D f*ω.

Problem 7. Show that the integral depends linearly on the form:

∫_σ (λ₁ω₁ + λ₂ω₂) = λ₁ ∫_σ ω₁ + λ₂ ∫_σ ω₂.

The k-dimensional cell which differs from σ only by the choice of orientation is called the negative of σ and is denoted by −σ or −1 · σ (Figure 151).

Figure 151

Problem 8

Problem 8. Show that, under a change of orientation, the integral changes sign:

∫_{−σ} ω = −∫_σ ω.

E Chains

The set f(D) is not necessarily a smooth submanifold of M. It could have “self-intersections” or “folds” and could even be reduced to a point. However, even in the one-dimensional case, it is clear that it is inconvenient to restrict ourselves to contours of integration consisting of one piece: it is useful to be able to consider contours consisting of several pieces which can be traversed in either direction, perhaps more than once. The analogous concept in higher dimensions is called a chain.

Definition. A chain of dimension k on a manifold M consists of a finite collection of k-dimensional oriented cells σ₁,..., σ_r in M and integers m₁,..., m_r, called multiplicities (the multiplicities can be positive, negative, or zero). A chain is denoted by

c_k = m₁σ₁ + ⋯ + m_rσ_r.

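We introduce the natural identifications

m₁σ + m₂σ = (m₁ + m₂)σ, m₁σ₁ + m₂σ₂ = m₂σ₂ + m₁σ₁, 0σ = 0, c_k + 0 = c_k.

Problem 9. Show that the set of all k-chains on M forms a commutative group if we define the addition of chains by the formula

(m₁σ₁ + ⋯ + m_rσ_r) + (m₁′σ₁′ + ⋯ + m′_{r′}σ′_{r′}) = m₁σ₁ + ⋯ + m_rσ_r + m₁′σ₁′ + ⋯ + m′_{r′}σ′_{r′}.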

F Example: the boundary of a polyhedron

Let D be a convex oriented k-dimensional polyhedron in k-dimensional euclidean space ℝ^k. The boundary of D is the (k − 1)-chain ∂D on ℝ^k defined in the following way (Figure 152).

Figure 152

Oriented boundary

The cells σ_i of the chain ∂D are the (k − 1)-dimensional faces D_i of the polyhedron D, together with maps f_i: D_i → ℝ^k embedding the faces in ℝ^k and orientations Or_i defined below; the multiplicities are equal to 1:

∂D = ∑ σ_i, σ_i = (D_i, f_i, Or_i).

Rule of orientation of the boundary. Let e₁,..., e_k be an oriented frame in ℝ^k. Let D_i be one of the faces of D. We choose an interior point of D_i and there construct a vector n outwardly normal to the polyhedron D. An orienting frame for the face D_i will be a frame f₁,..., f_{k − 1} on D_i such that the frame (n, f₁,..., f_{k − 1}) is oriented correctly (i.e., the same way as the frame e₁,..., e_k).

The boundary of a chain is defined in an analogous way. Let σ = (D, f, Or) be a k-dimensional cell in the manifold M. Its boundary ∂σ is the (k − 1)-chain ∂σ = ∑ σ_i consisting of the cells σ_i = (D_i, f_i, Or_i), where the D_i are the (k − 1)-dimensional faces of D, Or_i are orientations chosen by the rule above, and f_i are the restrictions of the mapping f : D → M to the face D_i.

The boundary ∂c_k of the k-dimensional chain c_k in M is the sum of the boundaries of the cells of c_k with multiplicities (Figure 153):

∂c_k = ∂(m₁σ₁ + ⋯ + m_rσ_r) = m₁∂σ₁ + ⋯ + m_r∂σ_r.

Obviously, ∂c_k is a (k − 1)-chain on M.⁵⁷

Figure 153

Boundary of a chain

Problem 10. Show that the boundary of the boundary of any chain is zero: ∂∂c_k = 0.

Hint. By the linearity of ∂ it is enough to show that ∂∂D = 0 for a convex polyhedron D. It remains to verify that every (k − 2)-dimensional face of D appears in ∂∂D twice, with opposite signs. It is enough to prove this for k = 2 (planar cross-sections).

G The integral of a form over a chain

Let ω^k be a k-form on M, and c_k a k-chain on M, c_k = ∑ m_iσ_i. The integral of the form ω^k over the chain c_k is the sum of the integrals on the cells, counting multiplicities:

∫_{c_k} ω^k = ∑ m_i ∫_{σ_i} ω^k.

Problem 11. Show that the integral depends linearly on the form:

∫_{c_k} (λ₁ω₁^k + λ₂ω₂^k) = λ₁ ∫_{c_k} ω₁^k + λ₂ ∫_{c_k} ω₂^k.

Problem 12. Show that integration of a fixed form ω^k on chains c_k defines a homomorphism from the group of chains to the line.

Example 1. Let M be the plane {(p, q)}, ω¹ the form p dq, and c₁ the chain consisting of one cell σ with multiplicity 1: σ: [0 ≤ t ≤ 2π] → (p = cos t, q = sin t).
Then ∫_c₁ p dq = π. In general, if a chain c₁ represents the boundary of a region G (Figure 154), then ∫_c₁ p dq is equal to the area of G with sign + or − depending on whether the pair of vectors (outward normal, oriented boundary vector) has the same or opposite orientation as the pair (p axis, q axis).

Figure 154
The integral of the form p dq over the boundary of a region is equal to the area of the region.
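
The value π in Example 1 can be checked numerically. The following is a minimal sketch, assuming the cell is the unit circle traversed once as t → (cos t, sin t); the function names and the midpoint-rule discretization are ours, not part of the text.

```python
import math

def integral_p_dq(curve, t0, t1, n=20000):
    """Approximate the integral of p dq along t -> (p(t), q(t)), t0 <= t <= t1."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        p_mid, _ = curve(t0 + (i + 0.5) * h)  # p at the midpoint of the step
        _, q_a = curve(t0 + i * h)
        _, q_b = curve(t0 + (i + 1) * h)
        total += p_mid * (q_b - q_a)          # p times the increment of q
    return total

def unit_circle(t):
    # Assumed parametrization of the cell: p = cos t, q = sin t.
    return math.cos(t), math.sin(t)

def reversed_circle(t):
    # The same circle traversed in the opposite direction.
    return unit_circle(-t)

area = integral_p_dq(unit_circle, 0.0, 2.0 * math.pi)
reversed_area = integral_p_dq(reversed_circle, 0.0, 2.0 * math.pi)
```

Reversing the direction of traversal changes the sign of the integral, in agreement with the rule of signs stated in the example.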

Example 2. Let M be the oriented three-dimensional euclidean space ℝ³. Then every 1-form on M corresponds to some vector field A (ω¹ = ω_A¹), where ω_A¹(ξ) = (A, ξ).

The integral of ω_A¹ on a chain c₁ representing a curve l is called the circulation of the field A over the curve l: ∫_{c₁} ω_A¹ = ∫_l (A, dl).

Every 2-form on M also corresponds to some field A (ω² = ω_A², where ω_A²(ξ, η) = (A, ξ, η)). The integral of the form ω_A² on a chain c₂ representing an oriented surface S is called the flux of the field A through the surface S: ∫_{c₂} ω_A² = ∫_S (A, dn).

Problem 13. Find the flux of the field A = (1/R²)e_R over the surface of the sphere x² + y² + z² = 1, oriented by the vectors e_x, e_y at the point z = 1. Find the flux of the same field over the surface of the ellipsoid (x²/a²) + (y²/b²) + z² = 1 oriented the same way.

Hint. Cf. Section 36H.

Problem 14. Suppose that, in the 2n-dimensional space ℝ²ⁿ = {(p₁,...,p_n; q₁,...,q_n)}, we are given a 2-chain c₂ representing a two-dimensional oriented surface S with boundary l. Find ∫∫_{c₂} (dp₁ ⋀ dq₁ + ⋯ + dp_n ⋀ dq_n).

Answer. The sum of the oriented areas of the projections of S onto the two-dimensional coordinate planes p_i, q_i.

36 Exterior differentiation

We define here exterior differentiation of k-forms and prove Stokes’ theorem: the integral of the derivative of a form over a chain is equal to the integral of the form itself over the boundary of the chain.

A Example: the divergence of a vector field

The exterior derivative of a k-form ω on a manifold M is a (k + 1)-form dω on the same manifold. Going from a form to its exterior derivative is analogous to forming the differential of a function or the divergence of a vector field. We recall the definition of divergence.

Let A be a vector field on the oriented euclidean three-space ℝ³, and let S be the boundary of a parallelepiped Π with edges ξ₁, ξ₂, and ξ₃ at the vertex x (Figure 155). Consider the (“outward”) flux of the field A through the surface S: F(Π) = ∫_S (A, dn).

Figure 155

Definition of divergence of a vector field

If the parallelepiped Π is very small, the flux F is approximately proportional to the product of the volume of the parallelepiped, V = (ξ₁, ξ₂, ξ₃), and the “source density” at the point x. This is the limit

lim_{ε → 0} F(εΠ)/(ε³V),

where εΠ is the parallelepiped with edges εξ₁, εξ₂, εξ₃. This limit does not depend on the choice of the parallelepiped Π but only on the point x, and is called the divergence, div A, of the field A at x.
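
The limit defining the divergence can be illustrated numerically. The sketch below assumes a hypothetical field A = (xy, yz, zx), for which div A = x + y + z, and takes Π to be a small cube with edges along the coordinate axes; the flux through each face is approximated by one sample at the center of the face.

```python
def flux_through_cube(field, corner, eps):
    """Outward flux of `field` through the cube with edges eps*e_x, eps*e_y,
    eps*e_z at the vertex `corner` (one midpoint sample per face)."""
    center = [c + eps / 2 for c in corner]
    total = 0.0
    for axis in range(3):
        near = list(center)
        far = list(center)
        near[axis] = corner[axis]        # center of the face through the vertex
        far[axis] = corner[axis] + eps   # center of the opposite face
        # Outward normals are +e_axis on the far face and -e_axis on the near face.
        total += (field(*far)[axis] - field(*near)[axis]) * eps ** 2
    return total

def sample_field(x, y, z):
    # Hypothetical field chosen for illustration; div A = y + z + x.
    return (x * y, y * z, z * x)

point = (1.0, 2.0, 3.0)
eps = 1e-4
estimate = flux_through_cube(sample_field, point, eps) / eps ** 3
```

For this field the ratio differs from div A = 6 at the point (1, 2, 3) by a term of order ε.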

To go to higher-dimensional cases, we note that the “flux of A through a surface element” is the 2-form which we called ω_A². The divergence, then, is the density in the expression for the 3-form

ω³ = div A dx ⋀ dy ⋀ dz, ω³(ξ₁, ξ₂, ξ₃) = div A · V(ξ₁, ξ₂, ξ₃),

characterizing the “sources in an elementary parallelepiped.”

The exterior derivative dω^k of a k-form ω^k on an n-dimensional manifold M may be defined as the principal multilinear part of the integral of ω^k over the boundaries of (k + 1)-dimensional parallelepipeds.

B Definition of the exterior derivative

We define the value of the form dω on k + 1 vectors ξ₁,...,ξ_{k + 1} tangent to M at x. To do this, we choose some coordinate system in a neighborhood of x on M, i.e., a differentiable map f of a neighborhood of the point 0 in euclidean space ℝⁿ to a neighborhood of x in M (Figure 156).

Figure 156

The curvilinear parallelepiped Π.

The pre-images of the vectors ξ₁,..., ξ_{k + 1} ∈ TM_x under the differential of f lie in the tangent space to ℝⁿ at 0. This tangent space can be naturally identified with ℝⁿ, so we may consider the pre-images to be vectors ξ₁*,..., ξ_{k + 1}* ∈ ℝⁿ.

We take the parallelepiped Π* in ℝⁿ spanned by these vectors (strictly speaking, we must look at the standard oriented cube in ℝ^{k + 1} and its linear map onto Π*, taking the edges e₁,..., e_{k + 1} to ξ₁*,..., ξ_{k + 1}*, as a (k + 1)-dimensional cell in ℝⁿ). The map f takes the parallelepiped Π* to a (k + 1)-dimensional cell Π on M (a “curvilinear parallelepiped”). The boundary of the cell Π is a k-chain, ∂Π. Consider the integral of the form ω^k on the boundary ∂Π of Π:

F(ξ₁,..., ξ_{k + 1}) = ∫_{∂Π} ω^k.

Example. We will call a smooth function φ: M → ℝ a 0-form on M. The integral of the 0-form φ on the 0-chain c₀ = ∑ m_iA_i (where the m_i are integers and the A_i points of M) is

∫_{c₀} φ = ∑ m_iφ(A_i).

Then the definition above gives the “increment” F(ξ₁) = φ(x₁) − φ(x) (Figure 157) of the function φ, and the principal linear part of F(ξ₁) at 0 is simply the differential of φ.

Figure 157
The integral over the boundary of a one-dimensional parallelepiped is the change in the function.

Problem 1. Show that the function F(ξ₁,...,ξ_{k + 1}) is skew-symmetric with respect to ξ.

It turns out that the principal (k + 1)-linear part of the “increment” F(ξ₁,...,ξ_{k + 1}) is an exterior (k + 1)-form on the tangent space TM_x to M at x. This form does not depend on the coordinate system that was used to define the curvilinear parallelepiped Π. It is called the exterior derivative, or differential, of the form ω^k (at the point x) and is denoted by dω^k.

C A theorem on exterior derivatives

Theorem. There is a unique (k + 1)-form Ω on TM_x which is the principal (k + 1)-linear part at 0 of the integral over the boundary of a curvilinear parallelepiped, F(ξ₁,...,ξ_{k + 1}); i.e.,

(1) F(εξ₁,..., εξ_{k + 1}) = ε^{k + 1}Ω(ξ₁,..., ξ_{k + 1}) + o(ε^{k + 1}) (ε → 0).

The form Ω does not depend on the choice of coordinates involved in the definition of F. If, in the local coordinate system x₁,...,x_n on M, the form ω^k is written as

ω^k = ∑ a_{i₁...i_k} dx_{i₁} ⋀ ⋯ ⋀ dx_{i_k},

then Ω is written as

(2) Ω = dω^k = ∑ da_{i₁...i_k} ⋀ dx_{i₁} ⋀ ⋯ ⋀ dx_{i_k}.

We will carry out the proof of this theorem for the case of a form ω¹ = a(x₁, x₂)dx₁ on the x₁, x₂ plane. The proof in the general case is entirely analogous, but the calculations are somewhat longer.

We calculate F(ξ, η), i.e., the integral of ω¹ on the boundary of the parallelogram Π with sides ξ and η and vertex at 0 (Figure 158). The chain ∂Π is given by the mappings of the interval 0 ≤ t ≤ 1 to the plane t → ξt, t → ξ + ηt, t → ηt and t → η + ξt with multiplicities 1, 1, −1, and −1. Therefore,

F(ξ, η) = ∫₀¹ {[a(ξt) − a(ξt + η)]ξ₁ − [a(ηt) − a(ηt + ξ)]η₁} dt,

where ξ₁ = dx₁(ξ), η₁ = dx₁(η), ξ₂ = dx₂(ξ), and η₂ = dx₂(η) are the components of the vectors ξ and η. But

a(ξt + η) − a(ξt) = (∂a/∂x₁)η₁ + (∂a/∂x₂)η₂ + O(ξ², η²)

(the derivatives are taken at x₁ = x₂ = 0). In the same way

a(ηt + ξ) − a(ηt) = (∂a/∂x₁)ξ₁ + (∂a/∂x₂)ξ₂ + O(ξ², η²).

By using these expressions in the integral, we find that

F(ξ, η) = (∂a/∂x₂)(ξ₂η₁ − ξ₁η₂) + o(ξ², η²).

The principal bilinear part of F, as promised in (1), turns out to be the value of the exterior 2-form

ω² = (∂a/∂x₂) dx₂ ⋀ dx₁

on the pair of vectors ξ, η. Thus the form obtained is given by formula (2), since

da ⋀ dx₁ = (∂a/∂x₁) dx₁ ⋀ dx₁ + (∂a/∂x₂) dx₂ ⋀ dx₁ = (∂a/∂x₂) dx₂ ⋀ dx₁.

Finally, if the coordinate system x₁, x₂ is changed to another (Figure 159), the parallelogram Π is changed to a nearby curvilinear parallelogram Π′, so that the difference in the values of the integrals, ∫_∂Π ω¹ − ∫_∂Π′ ω¹ will be small of more than second order (prove it!). ☐
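
The statement of the theorem for a form a(x₁, x₂)dx₁ can be tested numerically. The sketch below assumes a hypothetical coefficient a = x₂(3 + x₁), so that ∂a/∂x₂ = 3 at the origin, integrates the form over the four edges of the parallelogram with sides εξ, εη taken with multiplicities 1, 1, −1, −1, and compares the result divided by ε² with the value (∂a/∂x₂)(ξ₂η₁ − ξ₁η₂) of da ⋀ dx₁ on (ξ, η).

```python
def a(x1, x2):
    # Hypothetical coefficient of the form a dx1; da/dx2 = 3 at the origin.
    return x2 * (3.0 + x1)

def edge_integral(start, direction, n=200):
    """Integral of a dx1 along t -> start + t*direction, 0 <= t <= 1 (midpoint rule)."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        x1 = start[0] + t * direction[0]
        x2 = start[1] + t * direction[1]
        total += a(x1, x2) * direction[0] / n
    return total

def boundary_integral(xi, eta):
    """Integral of a dx1 over the boundary of the parallelogram with sides xi, eta at 0."""
    origin = (0.0, 0.0)
    return (edge_integral(origin, xi) + edge_integral(xi, eta)
            - edge_integral(origin, eta) - edge_integral(eta, xi))

def exterior_derivative_value(xi, eta, da_dx2=3.0):
    # Value of (da/dx2) dx2 ^ dx1 on the pair (xi, eta).
    return da_dx2 * (xi[1] * eta[0] - xi[0] * eta[1])

xi, eta = (1.0, 0.5), (-0.3, 2.0)
eps = 1e-3
small_xi = (eps * xi[0], eps * xi[1])
small_eta = (eps * eta[0], eps * eta[1])
ratio = boundary_integral(small_xi, small_eta) / eps ** 2
expected = exterior_derivative_value(xi, eta)
```

The two numbers differ by a quantity of order ε, as formula (1) predicts.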

Figure 158

Theorem on exterior derivatives

Figure 159

Independence of the exterior derivative from the coordinate system

Problem 2. Carry out the proof of the theorem in the general case.

Problem 3. Prove the formulas for differentiating a sum and a product: d(ω₁ + ω₂) = dω₁ + dω₂ and d(ω^k ⋀ ω^l) = dω^k ⋀ ω^l + (−1)^k ω^k ⋀ dω^l.

Problem 4. Show that the differential of a differential is equal to zero: dd = 0.

Problem 5. Let f: M → N be a smooth map and ω a k-form on N. Show that f*(dω) = d(f*ω).

D Stokes’ formula

One of the most important corollaries of the theorem on exterior derivatives is the Newton-Leibniz-Gauss-Green-Ostrogradskii-Stokes-Poincaré formula:

(3) ∫_{∂c} ω = ∫_c dω,

where c is any (k + 1)-chain on a manifold M and ω is any k-form on M.

To prove this formula it is sufficient to prove it for the case when the chain consists of one cell σ. We assume first that this cell σ is given by an oriented parallelepiped Π ⊂ ℝ^{k + 1} (Figure 160).

Figure 160

Proof of Stokes’ formula for a parallelepiped

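We partition Π into N^{k + 1} small equal parallelepipeds Π_i similar to Π. Then, clearly (the integrals over the interior faces cancel in pairs),

∫_{∂Π} ω = ∑ F_i, where F_i = ∫_{∂Π_i} ω and the sum is over i = 1,..., N^{k + 1}.

By formula (1) we have

F_i = dω(ξ₁^i,..., ξ_{k + 1}^i) + o(N^{−(k + 1)}),

where ξ₁^i,..., ξ_{k + 1}^i are the edges of Π_i. But ∑ dω(ξ₁^i,..., ξ_{k + 1}^i) is a Riemann sum for ∫_Π dω. It is easy to verify that o(N^{−(k + 1)}) is uniform, so

lim_{N → ∞} ∑ F_i = lim_{N → ∞} ∑ dω(ξ₁^i,..., ξ_{k + 1}^i) = ∫_Π dω.

Finally, we obtain

∫_{∂Π} ω = ∑ F_i = lim_{N → ∞} ∑ F_i = ∫_Π dω.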

Formula (3) follows automatically from this for any chain whose polyhedra are parallelepipeds.

To prove formula (3) for any convex polyhedron D, it is enough to prove it for a simplex,⁵⁸ since D can always be partitioned into simplices (Figure 161):

Figure 161

Division of a convex polyhedron into simplices

We will prove formula (3) for a simplex. Notice that a k-dimensional oriented cube can be mapped onto a k-dimensional simplex so that:

The interior of the cube goes diffeomorphically, with its orientation preserved, onto the interior of the simplex;

The interiors of some (k − 1)-dimensional faces of the cube go diffeomorphically, with their orientations preserved, onto the interiors of the faces of the simplex; the images of the remaining (k − 1)-dimensional faces of the cube lie in the (k − 2)-dimensional faces of the simplex.

For example, for k = 2 such a map of the cube 0 ≤ x₁, x₂ ≤ 1 onto the triangle is given by the formula y₁ = x₁, y₂ = x₁ x₂ (Figure 162). Then, formula (3) for the simplex follows from formula (3) for the cube and the change of variables theorem (cf. Section 35C).
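
The properties of the map y₁ = x₁, y₂ = x₁x₂ can be checked on a grid. This is a small sketch under our own choice of grid size; the jacobian x₁ is computed by hand from the formula, and the image is the triangle 0 ≤ y₂ ≤ y₁ ≤ 1.

```python
def cube_to_triangle(x1, x2):
    """The map of the square 0 <= x1, x2 <= 1 onto the triangle 0 <= y2 <= y1 <= 1."""
    return x1, x1 * x2

def jacobian(x1, x2):
    # det of d(y1, y2)/d(x1, x2) = det [[1, 0], [x2, x1]] = x1
    return x1

n = 20
grid = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]
images = [cube_to_triangle(x1, x2) for x1, x2 in grid]

# Every grid point of the square lands in the triangle.
inside_triangle = all(0.0 <= y2 <= y1 <= 1.0 for y1, y2 in images)

# The jacobian is positive in the interior, so orientation is preserved there.
orientation_preserved = all(
    jacobian(x1, x2) > 0 for x1, x2 in grid if 0 < x1 < 1 and 0 < x2 < 1
)

# The whole edge x1 = 0 of the square is sent to a single vertex of the triangle.
collapsed_edge = {cube_to_triangle(0.0, j / n) for j in range(n + 1)}
```

The edge x₁ = 0 is the face of the square whose image lies in a face of lower dimension (here, a vertex), as in the second condition above.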

Figure 162

Proof of Stokes’ formula for a simplex

Example 1. Consider the 1-form

ω¹ = p₁dq₁ + ⋯ + p_ndq_n = p dq

on ℝ²ⁿ with coordinates p₁,...,p_n, q₁,...,q_n. Then dω¹ = dp₁ ⋀ dq₁ + ⋯ + dp_n ⋀ dq_n = dp ⋀ dq, so

∫∫_{c₂} dp ⋀ dq = ∫_{∂c₂} p dq.

In particular, if c₂ is a closed surface (∂c₂ = 0), then ∫∫_{c₂} dp ⋀ dq = 0.

E Example 2—Vector analysis

In a three-dimensional oriented riemannian space M, every vector field A corresponds to a 1-form ω_A¹ and a 2-form ω_A². Therefore, exterior differentiation can be considered as an operation on vectors.

Exterior differentiation of 0-forms (functions), 1-forms, and 2-forms corresponds to the operations of gradient, curl, and divergence defined by the relations

df = ω¹_{grad f}, dω_A¹ = ω²_{curl A}, dω_A² = (div A)ω³

(the form ω³ is the volume element on M). Thus, it follows from (3) that

f(y) − f(x) = ∫_l grad f dl if ∂l = y − x,
∫_l A dl = ∫∫_S curl A dn if ∂S = l,
∫∫_S A dn = ∫∫∫_D (div A)ω³ if ∂D = S.

Problem 6. Show that

Hint. By the formula for differentiating the product of forms,

Problem 7. Show that curl grad = div curl = 0.

Hint. dd = 0.

F Appendix 1: Vector operations in triply orthogonal systems

Let x₁, x₂, x₃ be a triply orthogonal coordinate system on M,

and e_i the coordinate unit vectors (cf. Section 34E).

Problem 8. Given the components of a vector field A = A₁e₁ + A₂e₂ + A₃e₃, find the components of its curl.

Solution. According to Section 34E
Therefore,
According to Section 34E, we have
In particular, in cartesian, cylindrical, and spherical coordinates on ℝ³,

Problem 9. Find the divergence of the field A = A₁e₁ + A₂e₂ + A₃e₃.

Solution. . Therefore,
By the definition of divergence,
This means
In particular, in cartesian, cylindrical, and spherical coordinates on ℝ³:

Problem 10. The Laplace operator on M is the operator Δ = div grad. Find its expression in the coordinates x_i.

Answer.
In particular, on ℝ³

G Appendix 2: Closed forms and cycles

The flux of an incompressible fluid (without sources) across the boundary of a region D is equal to zero. We will formulate a higher-dimensional analogue to this obvious assertion. The higher-dimensional analogue of an incompressible fluid is called a closed form. The field A has no sources if div A = 0.

Definition. A differential form ω on a manifold M is closed if its exterior derivative is zero: dω = 0.

In particular, the 2-form ω_A² corresponding to a field A without sources is closed. Also, we have, by Stokes’ formula (3):

Theorem. The integral of a closed form ω^k over the boundary of any (k + 1)-dimensional chain c_{k + 1} is equal to zero:

∫_{∂c_{k + 1}} ω^k = 0 if dω^k = 0.

Problem 11. Show that the differential of a form is always closed.

On the other hand, there are closed forms which are not differentials. For example, take for M the three-dimensional euclidean space ℝ³ without O: M = ℝ³ − O, with the 2-form being the flux of the field A = (1/R²)e_R (Figure 163). It is easy to convince oneself that div A = 0, so that our 2-form ω_A² is closed. At the same time, the flux over any sphere with center O is equal to 4π. We will show that the integral of the differential of a form over the sphere must be zero.

Figure 163

The field A

Definition. A cycle on a manifold M is a chain whose boundary is equal to zero.

The oriented surface of our sphere can be considered to be a cycle. It immediately follows from Stokes’ formula (3) that

Theorem. The integral of a differential over any cycle is equal to zero:

∫_{c_{k + 1}} dω^k = 0 if ∂c_{k + 1} = 0.

Thus, our 2-form ω_A² is not the differential of any 1-form.
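
The value 4π of the flux can be confirmed numerically, and the same computation shows that the flux does not change when the sphere is replaced by an ellipsoid around O. The sketch below is ours: it assumes the parametrization x = a sin θ cos φ, y = b sin θ sin φ, z = c cos θ, the hypothetical semi-axes a = 2, b = 3, c = 1, and a midpoint rule in θ and φ.

```python
import math

def field(x, y, z):
    """The field A = e_R / R^2, written in cartesian components."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (x / r3, y / r3, z / r3)

def flux_through_ellipsoid(a, b, c, n_theta=200, n_phi=400):
    """Outward flux of A through the ellipsoid with semi-axes a, b, c."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        st, ct = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            sp, cp = math.sin(phi), math.cos(phi)
            ax, ay, az = field(a * st * cp, b * st * sp, c * ct)
            # Cross product r_theta x r_phi: outward normal times area element.
            nx = b * c * st * st * cp
            ny = a * c * st * st * sp
            nz = a * b * st * ct
            total += (ax * nx + ay * ny + az * nz) * d_theta * d_phi
    return total

sphere_flux = flux_through_ellipsoid(1.0, 1.0, 1.0)
ellipsoid_flux = flux_through_ellipsoid(2.0, 3.0, 1.0)
```

Both values agree with 4π up to the discretization error, so the integral over this cycle is not zero.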

The existence of closed forms on M which are not differentials is related to the topological properties of M. One can show that every closed k-form on a vector space is the differential of some (k − 1)-form (Poincaré’s lemma).

Problem 12. Prove Poincaré’s lemma for 1-forms.

Hint. Consider .

Problem 13. Show that in a vector space the integral of a closed form over any cycle is zero.

Hint. Construct a (k + 1)-chain whose boundary is the given cycle (Figure 164).

Figure 164
Cone over a cycle

Namely, for any chain c consider the “cone over c with vertex 0.” If we denote the operation of constructing a cone by p, then ∂ ∘ p + p ∘ ∂ = 1 (the identity map).
Therefore, if the chain c is closed, ∂(pc) = c.

Problem. Show that every closed form on a vector space is an exterior derivative.

Hint. Use the cone construction. Let ω^k be a differential k-form on ℝⁿ. We define a (k − 1)-form (the “co-cone over ω”) pω^k in the following way: for any chain c_{k − 1}
It is easy to see that the (k − 1)-form pω^k exists and is unique; its value on the vectors ξ₁,...,ξ_{k − 1}, tangent to ℝⁿ at x, is equal to
It is easy to see that
Therefore, if the form ω^k is closed, d(pω^k) = ω^k.

Problem. Let X be a vector field on M and ω a differential k-form. We define a differential (k − 1)-form i_Xω (the interior derivative of ω by X) by the relation

(i_Xω)(ξ₁,..., ξ_{k − 1}) = ω(X, ξ₁,..., ξ_{k − 1}).

Prove the homotopy formula
i_Xd + di_X = L_X,
where L_X is the differentiation operator in the direction of the field X.

[The action of L_X on a form is defined, using the phase flow {g^t} of the field X, by the relation
(L_Xω)(ξ) = d/dt|_{t = 0} (g^{t*}ω)(ξ).
L_X is called the Lie derivative or fisherman’s derivative: the flow carries all possible differential-geometric objects past the fisherman, and the fisherman sits there and differentiates them.]

Hint. We denote by H the “homotopy operator” associating to a k-chain γ: σ → M the (k + 1)-chain Hγ: (I × σ) → M according to the formula (Hγ)(t, x) = g^tγ(x) (where I = [0, 1]). Then

Problem. Prove the formula for differentiating a vector product on three-dimensional euclidean space (or on a riemannian manifold):
(where {a, b} = L_ab is the Poisson bracket of the vector fields, cf. Section 39).

Hint. If τ is the volume element, then
by using these relations and the fact that dτ = 0, it is easy to derive the formula for curl[a, b] from the homotopy formula.

H Appendix 3: Cohomology and homology

The set of all k-forms on M is a vector space, the closed k-forms form a subspace, and the differentials of (k − 1)-forms form a subspace of the subspace of closed forms. The quotient space

H^k(M, ℝ) = (closed forms)/(differentials)

is called the k-th cohomology group of the manifold M. An element of this group is a class of closed forms differing from one another only by a differential.

Problem 14. Show that for the circle S¹ we have H¹(S¹, ℝ) = ℝ.

The dimension of the space H^k(M, ℝ) is called the k-th Betti number of M.

Problem 15. Find the first Betti number of the torus T² = S¹ × S¹.

The flux of an incompressible fluid (without sources) over the surfaces of two concentric spheres is the same. In general, when integrating a closed form over a k-dimensional cycle, we can replace the cycle with another one provided that their difference is the boundary of a (k + 1)-chain (Figure 165):

∫_a ω^k = ∫_b ω^k, if a − b = ∂c_{k + 1} and dω^k = 0.

Figure 165

Homologous cycles

Poincaré called two such cycles a and b homologous.

With a suitable definition⁵⁹ of the group of chains on a manifold M and its subgroups of cycles and boundaries (i.e., cycles homologous to zero), the quotient group

H_k(M) = (cycles)/(boundaries)

is called the k-th homology group of M.

An element of this group is a class of cycles homologous to one another.

The rank of this group is also equal to the k-th Betti number of M (“De Rham’s Theorem”).

⁵²

It is essential to note that we do not fix any special euclidean structure on ℝⁿ. In some examples we use such a structure; in these cases this will be specifically stated (“euclidean ℝⁿ”).

⁵³

A direct proof of associativity (also containing a proof of Laplace’s theorem) consists of checking the signs in the identity

where i₁ < ⋯ < i_k, j₁ < ⋯ < j_l, h₁ < ⋯ < h_m; (i₁,..., h_m) is a permutation of the numbers (1,..., k + l + m).

⁵⁴

A bilinear form ω² is nondegenerate if ∀ξ ≠ 0, ∃η: ω²(ξ, η) ≠ 0. See Section 41B.

⁵⁵

i.e., one-to-one with a differentiable inverse.

⁵⁶

The cell σ is usually called a singular k-dimensional polyhedron.

⁵⁷

We are taking k > 1 here. One-dimensional chains are included in the general scheme if we make the following definitions: a zero-dimensional chain consists of a collection of points with multiplicities; the boundary of an oriented interval AB is B − A (the point B with multiplicity 1 and A with multiplicity −1); the boundary of a point is empty.

⁵⁸

A two-dimensional simplex is a triangle, a three-dimensional simplex is a tetrahedron, a k-dimensional simplex is the convex hull of k + 1 points in ℝⁿ which do not lie in any (k − 1)-dimensional plane.
Example:

⁵⁹

For this our group {c_k} must be made smaller by identifying pieces which differ only by the choice of parametrization f or the choice of polyhedron D. In particular, we may assume that D is always one and the same simplex or cube. Furthermore, we must take every degenerate k-cell (D, f, Or) to be zero, i.e., (D, f, Or) = 0 if f = f₂ · f₁, where f₁ : D → D′ and D′ has dimension smaller than k.

8
Symplectic manifolds


A symplectic structure on a manifold is a closed nondegenerate differential 2-form. The phase space of a mechanical system has a natural symplectic structure.

On a symplectic manifold, as on a riemannian manifold, there is a natural isomorphism between vector fields and 1-forms. A vector field on a symplectic manifold corresponding to the differential of a function is called a hamiltonian vector field. A vector field on a manifold determines a phase flow, i.e., a one-parameter group of diffeomorphisms. The phase flow of a hamiltonian vector field on a symplectic manifold preserves the symplectic structure of phase space.

The vector fields on a manifold form a Lie algebra. The hamiltonian vector fields on a symplectic manifold also form a Lie algebra. The operation in this algebra is called the Poisson bracket.

37 Symplectic structures on manifolds

We define here symplectic manifolds, hamiltonian vector fields, and the standard symplectic structure on the cotangent bundle.

A Definition

Let M²ⁿ be an even-dimensional differentiable manifold. A symplectic structure on M²ⁿ is a closed nondegenerate differential 2-form ω² on M²ⁿ:

dω² = 0 and ∀ξ ≠ 0 ∃η: ω²(ξ, η) ≠ 0 (ξ, η ∈ TM_x).

The pair (M²ⁿ, ω²) is called a symplectic manifold.

Example. Consider the vector space ℝ²ⁿ with coordinates p_i, q_i and let ω² = ∑dp_i ⋀ dq_i.

Problem. Verify that (ℝ²ⁿ, ω²) is a symplectic manifold. For n = 1 the pair (ℝ², ω²) is the pair (the plane, area).
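
The algebraic part of this verification can be illustrated by a short computation. The sketch below is ours: vectors are stored as tuples (p₁,..., p_n, q₁,..., q_n), and the helper `partner` (a hypothetical name) produces for each ξ the vector η = (−ξ_q, ξ_p), for which ω²(ξ, η) is the sum of the squares of the components of ξ and hence is nonzero when ξ ≠ 0.

```python
def omega(xi, eta):
    """Value of dp_1 ^ dq_1 + ... + dp_n ^ dq_n on the pair of vectors (xi, eta)."""
    n = len(xi) // 2
    return sum(xi[i] * eta[n + i] - xi[n + i] * eta[i] for i in range(n))

def partner(xi):
    """A vector eta with omega(xi, eta) = |xi|^2, witnessing nondegeneracy."""
    n = len(xi) // 2
    return tuple(-x for x in xi[n:]) + tuple(xi[:n])

samples = [(1, 0, 0, 0), (0, 0, 0, 1), (2, -1, 3, 5), (0, 7, -4, 0)]

skew_symmetric = all(
    omega(u, v) == -omega(v, u) for u in samples for v in samples
)
nondegenerate_on_samples = all(
    omega(u, partner(u)) == sum(x * x for x in u) for u in samples
)

# For n = 1 the form is the oriented area of the parallelogram on the (p, q) plane.
unit_square_area = omega((1, 0), (0, 1))
```

Closedness needs no computation here, since the coefficients of ω² are constant.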

The following example explains the appearance of symplectic manifolds in dynamics. Along with the tangent bundle of a differentiable manifold, it is often useful to look at its dual—the cotangent bundle.

B The cotangent bundle and its symplectic structure

Let V be an n-dimensional differentiable manifold. A 1-form on the tangent space to V at a point x is called a cotangent vector to V at x. The set of all cotangent vectors to V at x forms an n-dimensional vector space, dual to the tangent space TV_x. We will denote this vector space of cotangent vectors by T*V_x and call it the cotangent space to V at x.

The union of the cotangent spaces to the manifold at all of its points is called the cotangent bundle of V and is denoted by T*V. The set T*V has a natural structure of a differentiable manifold of dimension 2n. A point of T*V is a 1-form on the tangent space to V at some point of V. If q is a choice of n local coordinates for points in V, then such a form is given by its n components p. Together, the 2n numbers p, q form a collection of local coordinates for points in T*V.

There is a natural projection f: T*V → V (sending every 1-form on TV_x to the point x). The projection f is differentiable and surjective. The pre-image of a point x ∈ V under f is the cotangent space T*V_x.

Theorem. The cotangent bundle T*V has a natural symplectic structure. In the local coordinates described above, this symplectic structure is given by the formula

ω² = dp ⋀ dq = dp₁ ⋀ dq₁ + ⋯ + dp_n ⋀ dq_n.

Proof. First, we define a distinguished 1-form on T*V. Let ξ ∈ T(T*V)_p be a vector tangent to the cotangent bundle at the point p ∈ T*V_x (Figure 166). The derivative f*: T(T*V) → TV of the natural projection f: T*V → V takes ξ to a vector f*ξ tangent to V at x. We define a 1-form ω¹ on T*V by the relation ω¹(ξ) = p(f*ξ). In the local coordinates described above, this form is ω¹ = p dq. By the example in A, the closed 2-form ω² = dω¹ is nondegenerate. □

Figure 166

The 1-form p dq on the cotangent bundle

Remark. Consider a lagrangian mechanical system with configuration manifold V and lagrangian function L. It is easy to see that the lagrangian “generalized velocity” is a tangent vector to the configuration manifold V, and the “generalized momentum” is a cotangent vector. Therefore, the “p, q” phase space of the lagrangian system is the cotangent bundle of the configuration manifold. The theorem above shows that the phase space of a mechanical problem has a natural symplectic manifold structure.

Problem. Show that the Legendre transform does not depend on the coordinate system: it takes a function L: TV → ℝ on the tangent bundle to a function H: T*V → ℝ on the cotangent bundle.

C Hamiltonian vector fields

A riemannian structure on a manifold establishes an isomorphism between the spaces of tangent vectors and 1-forms. A symplectic structure establishes a similar isomorphism.

Definition. To each vector ξ, tangent to a symplectic manifold (M²ⁿ, ω²) at the point x, we associate a 1-form ω_ξ¹ on TM_x by the formula

ω_ξ¹(η) = ω²(η, ξ) for all η ∈ TM_x.

Problem. Show that the correspondence ξ → ω_ξ¹ is an isomorphism between the 2n-dimensional vector spaces of vectors and of 1-forms.

Example. In ℝ²ⁿ = {(p, q)} we will identify vectors and 1-forms by using the euclidean structure (x, x) = p² + q². Then the correspondence ξ → ω_ξ¹ determines a transformation ℝ²ⁿ → ℝ²ⁿ.

Problem. Calculate the matrix of this transformation in the basis p, q.

Answer. The block matrix with rows (0, E) and (−E, 0), where E is the n × n identity matrix.

We will denote by I the isomorphism I: T*M_x → TM_x constructed above. Now let H be a function on a symplectic manifold M²ⁿ. Then dH is a differential 1-form on M, and at every point there is a tangent vector to M associated to it. In this way we obtain a vector field I dH on M.

Definition. The vector field I dH is called a hamiltonian vector field; H is called the hamiltonian function.

Example. If M²ⁿ = ℝ²ⁿ = {(p, q)}, then we obtain the phase velocity vector field of Hamilton’s canonical equations:

ṗ = −∂H/∂q, q̇ = ∂H/∂p.

38 Hamiltonian phase flows and their integral invariants

Liouville’s theorem asserts that the phase flow preserves volume. Poincaré found a whole series of differential forms which are preserved by the hamiltonian phase flow.

A Hamiltonian phase flows preserve the symplectic structure

Let (M²ⁿ, ω²) be a symplectic manifold and H: M²ⁿ → ℝ a function. Assume that the vector field I dH corresponding to H gives a 1-parameter group of diffeomorphisms g^t: M²ⁿ → M²ⁿ:

d/dt|_{t = 0} g^tx = I dH(x).

The group g^t is called the hamiltonian phase flow with hamiltonian function H.

Theorem. A hamiltonian phase flow preserves the symplectic structure:

(g^t)*ω² = ω².

In the case n = 1, M²ⁿ = ℝ², this theorem says that the phase flow g^t preserves area (Liouville’s theorem).
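
For n = 1 the preservation of area means that the jacobian determinant of the map g^t is equal to 1. The following sketch checks this numerically under assumptions of our own: the hypothetical hamiltonian H = p²/2 + q⁴/4, a Runge-Kutta approximation of the flow, and central differences for the derivatives of g^t with respect to the initial point.

```python
def field(p, q):
    # Hamilton's equations for the assumed H = p**2/2 + q**4/4:
    # dp/dt = -dH/dq = -q**3, dq/dt = dH/dp = p.
    return -q ** 3, p

def rk4_step(p, q, h):
    k1 = field(p, q)
    k2 = field(p + 0.5 * h * k1[0], q + 0.5 * h * k1[1])
    k3 = field(p + 0.5 * h * k2[0], q + 0.5 * h * k2[1])
    k4 = field(p + h * k3[0], q + h * k3[1])
    return (p + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            q + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def flow(p, q, t, steps=1000):
    """Approximation of g^t applied to the point (p, q)."""
    h = t / steps
    for _ in range(steps):
        p, q = rk4_step(p, q, h)
    return p, q

def jacobian_determinant(p, q, t, d=1e-5):
    """Determinant of the derivative of g^t at (p, q), by central differences."""
    p_plus, p_minus = flow(p + d, q, t), flow(p - d, q, t)
    q_plus, q_minus = flow(p, q + d, t), flow(p, q - d, t)
    dP_dp = (p_plus[0] - p_minus[0]) / (2 * d)
    dQ_dp = (p_plus[1] - p_minus[1]) / (2 * d)
    dP_dq = (q_plus[0] - q_minus[0]) / (2 * d)
    dQ_dq = (q_plus[1] - q_minus[1]) / (2 * d)
    return dP_dp * dQ_dq - dP_dq * dQ_dp

det = jacobian_determinant(0.5, 1.0, 1.0)
```

The flow of this hamiltonian is nonlinear, so the equality of the determinant to 1 is not visible from the equations at a glance; it is a consequence of the theorem.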

For the proof of this theorem, it is useful to introduce the following notation (Figure 167).

Figure 167

Track of a cycle under homotopy

Let M be an arbitrary manifold, c a k-chain on M and g^t: M → M a one-parameter family of differentiable mappings. We will construct a (k + 1)-chain Jc on M, which we will call the track of the chain c under the homotopy g^t, 0 ≤ t ≤ τ.

Let (D, f, Or) be one of the cells in the chain c. To this cell will be associated a cell (D′, f′, Or′) in the chain Jc, where D′ = I × D is the direct product of the interval 0 ≤ t ≤ τ and D; the mapping f′: D′ → M is obtained from f: D → M by the formula f′(t, x) = g^tf(x); and the orientation Or′ of the space ℝ^{k + 1} containing D′ is given by the frame e₀, e₁,..., e_k, where e₀ is the unit vector of the t axis, and e₁,..., e_k is an oriented frame for D.

We could say that Jc is the chain swept out by c under the homotopy g^t, 0 ≤ t ≤ τ. The boundary of the chain Jc consists of “end-walls” made up of the initial and final positions of c, and “side surfaces” filled in by the boundary of c.

It is easy to verify that under the choice of orientation made above,

(1) ∂(Jc_k) = g^τc_k − c_k − J∂c_k.

Lemma. Let γ be a 1-chain in the symplectic manifold (M²ⁿ, ω²). Let g^t be a phase flow on M with hamiltonian function H. Then

Proof. It is sufficient to consider a chain γ with one cell f: [0, 1] → M. We introduce the notation

By the definition of the integral

But by the definition of the phase flow, η is a vector (at the point f′(s, t)) of the hamiltonian field with hamiltonian function H. By definition of a hamiltonian field, ω²(ξ, η) = dH(ξ). Thus

□

Corollary. If the chain γ is closed (∂γ = 0), then ∫_Jγω² = 0.

Proof. ∫_γ dH = ∫_∂γ H = 0. □

Proof of the theorem. We consider any 2-chain c. We have

0 = ∫_{Jc} dω² = ∫_{∂Jc} ω² = ∫_{g^τc} ω² − ∫_c ω² − ∫_{J∂c} ω² = ∫_{g^τc} ω² − ∫_c ω²

(numbering the equalities in order: 1 since ω² is closed, 2 by Stokes’ formula, 3 by formula (1), 4 by the corollary above with γ = ∂c). Thus the integrals of the form ω² on any chain c and on its image g^τc are the same. □

Problem. Is every one-parameter group of diffeomorphisms of M²ⁿ which preserves the symplectic structure a hamiltonian phase flow?

Hint. Cf. Section 40.

B Integral invariants

Let g: M → M be a differentiable map.

Definition. A differential k-form ω is called an integral invariant of the map g if the integrals of ω on any k-chain c and on its image under g are the same:

∫_{gc} ω = ∫_c ω.

Example. If M = ℝ² and ω² = dp ⋀ dq is the area element, then ω² is an integral invariant of any map g with jacobian 1.

Problem. Show that a form ω^k is an integral invariant of a map g if and only if g*ω^k = ω^k.

Problem. Show that if the forms ω^k and ω^l are integral invariants of the map g, then the form ω^k ⋀ ω^l is also an integral invariant of g.

The theorem in subsection A can be formulated as follows:

Theorem. The form ω² giving the symplectic structure is an integral invariant of a hamiltonian phase flow.

We now consider the exterior powers of ω²,

(ω²)² = ω² ⋀ ω², (ω²)³ = ω² ⋀ ω² ⋀ ω², ....

Corollary. Each of the forms (ω²)², (ω²)³, (ω²)⁴,... is an integral invariant of a hamiltonian phase flow.

Problem. Suppose that the dimension of the symplectic manifold (M²ⁿ, ω²) is 2n. Show that (ω²)^k = 0 for k > n, and that (ω²)ⁿ is a nondegenerate 2n-form on M²ⁿ.

We define a volume element on M²ⁿ using (ω²)ⁿ. Then, a hamiltonian phase flow preserves volume, and we obtain Liouville’s theorem from the corollary above.

Example. Consider the symplectic coordinate space M²ⁿ = ℝ²ⁿ = {(p, q)}, ω² = dp ⋀ dq = ∑dp_i ⋀ dq_i. In this case the form (ω²)^k is proportional to the form

ω^{2k} = ∑_{i₁ < ⋯ < i_k} dp_{i₁} ⋀ ⋯ ⋀ dp_{i_k} ⋀ dq_{i₁} ⋀ ⋯ ⋀ dq_{i_k}.

The integral of ω^{2k} is equal to the sum of the oriented volumes of projections onto the coordinate planes (p_{i₁},..., p_{i_k}, q_{i₁},..., q_{i_k}).

A map g: ℝ²ⁿ → ℝ²ⁿ is called canonical if it has ω² as an integral invariant. A canonical map is generally called a canonical transformation. Each of the forms ω⁴, ω⁶,..., ω²ⁿ is an integral invariant of every canonical transformation. Therefore, under a canonical transformation, the sum of the oriented areas of projections onto the coordinate planes (p_{i₁},..., p_{i_k}, q_{i₁},..., q_{i_k}), 1 ≤ k ≤ n, is preserved. In particular, canonical transformations preserve volume.

The hamiltonian phase flow given by the equations

ṗ = −∂H/∂q, q̇ = ∂H/∂p

consists of canonical transformations g^t.

The integral invariants considered above are also called absolute integral invariants.

Definition. A differential k-form ω is called a relative integral invariant of the map g: M → M if ∫_gcω = ∫_cω for every closed k-chain c.

Theorem. Let ω be a relative integral invariant of a map g. Then dω is an absolute integral invariant of g.

Proof. Let c be a (k + 1)-chain. Then

∫_c dω = ∫_{∂c} ω = ∫_{g∂c} ω = ∫_{∂gc} ω = ∫_{gc} dω

(numbering the equalities in order: 1 and 4 are by Stokes’ formula, 2 by the definition of relative invariant, and 3 by the definition of boundary). ☐

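Example. A canonical map g: ℝ²ⁿ → ℝ²ⁿ has the 1-form ω¹ = p dq = ∑p_i dq_i as a relative integral invariant.
In fact, every closed chain c on ℝ²ⁿ is the boundary of some chain σ, and we find
∮_{gc} ω¹ = ∮_{g∂σ} ω¹ = ∮_{∂gσ} ω¹ = ∫∫_{gσ} dω¹ = ∫∫_σ dω¹ = ∮_{∂σ} ω¹ = ∮_c ω¹
(numbering the equalities in order: 1 and 6 are by definition of σ, 2 by definition of ∂, 3 and 5 by Stokes’ formula, and 4 since g is canonical and dω¹ = d(p dq) = dp ⋀ dq = ω²).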

Problem. Let dω^k be an absolute integral invariant of the map g: M → M. Does it follow that ω^k is a relative integral invariant?

Answer. No, if there is a closed k-chain on M which is not a boundary.

C The law of conservation of energy

Theorem. The function H is a first integral of the hamiltonian phase flow with hamiltonian function H.

Proof. The derivative of H in the direction of a vector η is equal to the value of dH on η. By definition of the hamiltonian field η = I dH we find

dH(η) = ω²(η, I dH) = ω²(η, η) = 0.

□
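
The conservation of H along the flow can be observed numerically. The sketch below assumes the hypothetical hamiltonian of a pendulum, H = p²/2 − cos q, and a Runge-Kutta approximation of the phase flow; the small residual drift is the error of the numerical method, not of the theorem.

```python
import math

def energy(p, q):
    # Assumed hamiltonian of a pendulum: H = p**2/2 - cos q.
    return 0.5 * p * p - math.cos(q)

def field(p, q):
    # dp/dt = -dH/dq = -sin q, dq/dt = dH/dp = p.
    return -math.sin(q), p

def rk4_step(p, q, h):
    k1 = field(p, q)
    k2 = field(p + 0.5 * h * k1[0], q + 0.5 * h * k1[1])
    k3 = field(p + 0.5 * h * k2[0], q + 0.5 * h * k2[1])
    k4 = field(p + h * k3[0], q + h * k3[1])
    return (p + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            q + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def energy_drift(p, q, t=5.0, steps=5000):
    """Absolute change of H between the initial point and its image under g^t."""
    start = energy(p, q)
    h = t / steps
    for _ in range(steps):
        p, q = rk4_step(p, q, h)
    return abs(energy(p, q) - start)

drift = energy_drift(0.3, 1.2)
```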

Problem. Show that the 1-form dH is an integral invariant of the phase flow with hamiltonian function H.

39 The Lie algebra of vector fields

Every pair of vector fields on a manifold determines a new vector field, called their Poisson bracket.⁶⁰ The Poisson bracket operation makes the vector space of infinitely differentiable vector fields on a manifold into a Lie algebra.

A Lie algebras

One example of a Lie algebra is a three-dimensional oriented euclidean vector space equipped with the operation of vector multiplication. The vector product is bilinear, skew-symmetric, and satisfies the Jacobi identity

[[A, B], C] + [[B, C], A] + [[C, A], B] = 0.

Definition. A Lie algebra is a vector space L, together with a bilinear skew-symmetric operation L × L → L which satisfies the Jacobi identity.

The operation is usually denoted by square brackets and called the commutator.

Problem. Show that the set of n × n matrices becomes a Lie algebra if we define the commutator by [A, B] = AB − BA.
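
Before proving the Jacobi identity for matrices in general, one can test it on examples. The sketch below uses integer matrices stored as nested lists, so the arithmetic is exact; the three sample matrices are arbitrary choices of ours.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def mat_sub(a, b):
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def commutator(a, b):
    """[A, B] = AB - BA."""
    return mat_sub(mat_mul(a, b), mat_mul(b, a))

def jacobi_sum(a, b, c):
    """[[A, B], C] + [[B, C], A] + [[C, A], B]."""
    return mat_add(
        mat_add(commutator(commutator(a, b), c), commutator(commutator(b, c), a)),
        commutator(commutator(c, a), b),
    )

A = [[1, 2], [3, 4]]
B = [[0, -1], [5, 2]]
C = [[7, 1], [-2, 3]]
zero = [[0, 0], [0, 0]]

jacobi = jacobi_sum(A, B, C)
skew = mat_add(commutator(A, B), commutator(B, A))
```

Here `jacobi` and `skew` are both the zero matrix, in agreement with the Jacobi identity and with skew-symmetry.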

B Vector fields and differential operators

Let M be a smooth manifold and A a smooth vector field on M: at every point x ∈ M we are given a tangent vector A(x) ∈ TM_x. With every such vector field we associate the following two objects:

The one-parameter group of diffeomorphisms or flow A^t: M → M for which A is the velocity vector field (Figure 168):⁶¹

d/dt|_{t = 0} A^tx = A(x).

Figure 168

The group of diffeomorphisms given by a vector field

The first-order differential operator L_A. We refer here to the differentiation of functions in the direction of the field A: for any function φ: M → ℝ the derivative in the direction of A is a new function L_Aφ, whose value at a point x is

(L_Aφ)(x) = d/dt|_{t = 0} φ(A^tx).

Problem. Show that the operator L_A is linear: L_A(λ₁φ₁ + λ₂φ₂) = λ₁L_Aφ₁ + λ₂L_Aφ₂ (λ₁, λ₂ ∈ ℝ).
Also, prove Leibniz’s formula L_A(φ₁φ₂) = φ₁L_Aφ₂ + φ₂L_Aφ₁.

Example. Let (x₁,..., x_n) be local coordinates on M. In this coordinate system the vector A(x) is given by its components (A₁(x),..., A_n(x)); the flow A^t is given by the system of differential equations
ẋ₁ = A₁(x),..., ẋ_n = A_n(x)
and, therefore, the derivative of φ = φ(x₁,..., x_n) in the direction A is
L_Aφ = A₁ ∂φ/∂x₁ + ⋯ + A_n ∂φ/∂x_n.
We could say that in the coordinates (x₁,..., x_n) the operator L_A has the form
L_A = A₁ ∂/∂x₁ + ⋯ + A_n ∂/∂x_n;
this is the general form of a first-order linear differential operator on coordinate space.

Problem. Show that the correspondences between vector fields A, flows A^t, and differentiations L_A are one-to-one.

C The Poisson bracket of vector fields

Suppose that we are given two vector fields A and B on a manifold M. The corresponding flows A^t and B^s do not, in general, commute: A^tB^s ≠ B^sA^t (Figure 169).

Figure 169

Non-commutative flows

Problem. Find an example.

Solution. The fields A = e₁, B = x₁e₂ on the (x₁, x₂) plane.
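The example in the solution can be made explicit. The sketch below writes out the two flows in closed form (A^t translates along x₁; B^s shears x₂ at the constant rate x₁) and returns the gap between A^tB^sx and B^sA^tx. The function names are illustrative and not from the text.

```python
def flow_A(t, x):
    """Flow of A = e1: translation by t along the x1 axis."""
    x1, x2 = x
    return (x1 + t, x2)


def flow_B(s, x):
    """Flow of B = x1*e2: x1 is constant along the flow, x2 grows at rate x1."""
    x1, x2 = x
    return (x1, x2 + s * x1)


def gap(t, s, x):
    """Componentwise difference A^t B^s x - B^s A^t x."""
    u = flow_A(t, flow_B(s, x))
    v = flow_B(s, flow_A(t, x))
    return (u[0] - v[0], u[1] - v[1])


print(gap(2, 3, (5, 7)))  # (0, -6)
```

The gap equals (0, −st) at every point x, so it is bilinear in s and t to leading order and vanishes only when s = 0 or t = 0.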

To measure the degree of noncommutativity of the two flows A^t and B^s we consider the points A^tB^sx and B^sA^tx. In order to estimate the difference between these points, we compare the values at them of some smooth function φ on the manifold M. The difference

Δ(t; s; x) = φ(A^tB^sx) − φ(B^sA^tx)

is clearly a differentiable function which is zero for s = 0 and for t = 0. Therefore, the first term different from 0 in the Taylor series in s and t of Δ at 0 contains st, and the other terms of second order vanish. We will calculate this principal bilinear term of Δ at 0.

Lemma 1. The mixed partial derivative ∂²Δ/∂s ∂t at 0 is equal to the commutator of differentiation in the directions A and B:

(∂²/∂s ∂t)|_{s = t = 0} {φ(A^tB^sx) − φ(B^sA^tx)} = (L_BL_Aφ − L_AL_Bφ)(x).

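Proof. By the definition of L_A,

(∂/∂t)|_{t = 0} φ(A^tB^sx) = (L_Aφ)(B^sx).

If we denote the function L_Aφ by ψ, then by the definition of L_B

(∂/∂s)|_{s = 0} ψ(B^sx) = (L_Bψ)(x).

Thus,

(∂²/∂s ∂t)|_{s = t = 0} φ(A^tB^sx) = (L_BL_Aφ)(x).

Interchanging the roles of A and B gives the corresponding formula for φ(B^sA^tx), and subtracting the two yields the lemma.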

□

We now consider the commutator of differentiation operators L_BL_A − L_AL_B. At first glance this is a second-order differential operator.

Lemma 2. The operator L_BL_A − L_AL_B is a first-order linear differential operator.

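Proof. Let (A₁,..., A_n) and (B₁,..., B_n) be the components of the fields A and B in the local coordinate system (x₁,..., x_n) on M. Then

L_BL_Aφ = Σ_i B_i (∂/∂x_i) Σ_j A_j ∂φ/∂x_j = Σ_{i, j} B_i (∂A_j/∂x_i) ∂φ/∂x_j + Σ_{i, j} A_jB_i ∂²φ/∂x_i ∂x_j.

If we subtract L_AL_Bφ, the term with the second derivatives of φ vanishes, and we obtain

(L_BL_A − L_AL_B)φ = Σ_{i, j} (B_i ∂A_j/∂x_i − A_i ∂B_j/∂x_i) ∂φ/∂x_j.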

□

Since every first-order linear differential operator is given by a vector field, our operator L_BL_A − L_AL_B also corresponds to some vector field C.

Definition. The Poisson bracket or commutator of two vector fields A and B on a manifold M⁶² is the vector field C for which

L_C = L_BL_A − L_AL_B.

The Poisson bracket of two vector fields is denoted by C = [A, B].

Problem. Suppose that the vector fields A and B are given by their components A_i, B_i in coordinates x_i. Find the components of the Poisson bracket.

Solution. In the proof of Lemma 2 we proved the formula

[A, B]_j = Σ_{i = 1}^{n} (B_i ∂A_j/∂x_i − A_i ∂B_j/∂x_i).
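A minimal numerical sketch of the component computation, assuming the sign convention L_{[A, B]} = L_BL_A − L_AL_B of this section, so that [A, B]_j = Σ_i (B_i ∂A_j/∂x_i − A_i ∂B_j/∂x_i). Partial derivatives are approximated by central differences; the function names and the step size h are illustrative assumptions.

```python
def partial(f, x, i, h=1e-5):
    """Central-difference estimate of the i-th partial derivative of f at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)


def bracket(A, B, x):
    """Components of [A, B] at x; A and B are lists of component functions."""
    n = len(x)
    return [
        sum(B[i](x) * partial(A[j], x, i) - A[i](x) * partial(B[j], x, i)
            for i in range(n))
        for j in range(n)
    ]


# The non-commuting pair from subsection C: A = e1, B = x1*e2.
A = [lambda x: 1.0, lambda x: 0.0]
B = [lambda x: 0.0, lambda x: x[0]]

print(bracket(A, B, [0.3, -1.2]))  # approximately [0, -1], i.e. [A, B] = -e2
```

With this convention the bracket −e₂ matches the leading term of the gap between the flows of A and B, whose second component is −st.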

Problem. Let A₁ be the linear vector field of velocities of a rigid body rotating with angular velocity ω₁ around 0, and A₂ the same thing with angular velocity ω₂ . Find the Poisson bracket [A₁, A₂].

D The Jacobi identity

Theorem. The Poisson bracket makes the vector space of vector fields on a manifold M into a Lie algebra.

Proof. Linearity and skew-symmetry of the Poisson bracket are clear. We will prove the Jacobi identity. By definition of Poisson bracket, we have

L_{[[A, B], C]} = L_CL_{[A, B]} − L_{[A, B]}L_C = L_CL_BL_A − L_CL_AL_B + L_AL_BL_C − L_BL_AL_C.

There will be 12 terms in all in the sum L_{[[A, B], C]} + L_{[[B, C], A]} + L_{[[C, A], B]} . Each term appears in the sum twice, with opposite signs. □

E A condition for the commutativity of flows

Let A and B be vector fields on a manifold M.

Theorem. The two flows A^t and B^s commute if and only if the Poisson bracket of the corresponding vector fields [A, B] is equal to zero.

Proof. If A^tB^s ≡ B^sA^t, then [A, B] = 0 by Lemma 1. If [A, B] = 0, then, by Lemma 1,

φ(A^tB^sx) − φ(B^sA^tx) = o(s² + t²), s → 0, t → 0

for any function φ at any point x. We will show that this implies φ(A^tB^sx) = φ(B^sA^tx) for sufficiently small s and t. If we apply this to the local coordinates (φ = x₁,..., φ = x_n), we obtain A^tB^s = B^sA^t.

Consider the rectangle 0 ≤ t ≤ t₀, 0 ≤ s ≤ s₀ (Figure 170) in the t, s-plane. To every path going from (0, 0) to (t₀, s₀) and consisting of a finite number of intervals in the coordinate directions, we associate a product of transformations of the flows A^t and B^s. Namely, to each interval t₁ ≤ t ≤ t₂ we associate A^{t₂ − t₁}, and to each interval s₁ ≤ s ≤ s₂ we associate B^{s₂ − s₁}; the transformations are applied in the order in which the intervals occur in the path, beginning at (0, 0). For example, the sides (0 ≤ t ≤ t₀, s = 0) and (t = t₀, 0 ≤ s ≤ s₀) correspond to the product B^{s₀}A^{t₀}, and the sides (t = 0, 0 ≤ s ≤ s₀) and (s = s₀, 0 ≤ t ≤ t₀) to the product A^{t₀}B^{s₀}.

Figure 170
Proof of the commutativity of flows

In addition, we associate to each such path in the (t, s)-plane a path on the manifold M starting at the point x and composed of trajectories of the flows A^t and B^s (Figure 171). If a path in the (t, s)-plane corresponds to the product A^{t₁}B^{s₁} ··· A^{t_n}B^{s_n}, then on the manifold M the corresponding path ends at the point A^{t₁}B^{s₁} ··· A^{t_n}B^{s_n}x. Our goal will be to show that all these paths actually terminate at the one point A^{t₀}B^{s₀}x = B^{s₀}A^{t₀}x.

Figure 171
Curvilinear quadrilateral βγδεα

We partition the intervals 0 ≤ t ≤ t₀ and 0 ≤ s ≤ s₀ into N equal parts, so that the whole rectangle is divided into N² small rectangles. The passage from the sides (0, 0) − (t₀, 0) − (t₀, s₀) to the sides (0, 0) − (0, s₀) − (t₀, s₀) can be accomplished in N² steps, in each of which a pair of neighboring sides of a small rectangle is exchanged for the other pair (Figure 172). In general, this small rectangle corresponds to a non-closed curvilinear quadrilateral βγδεα on the manifold M (Figure 171). Consider the distance⁶³ between its vertices α and β corresponding to the largest values of s and t. As we saw earlier, ρ(α, β) ≤ C₁N⁻³ (where the constant C₁ > 0 does not depend on N). Using the theorem of the differentiability of solutions of differential equations with respect to the initial data, it is not difficult to derive from this a bound on the distance between the ends α′ and β′ of the paths xδγββ′ and xδεαα′ on M: ρ(α′, β′) < C₂N⁻³, where the constant C₂ > 0 again does not depend on N. But we broke up the whole journey from B^{s₀}A^{t₀}x to A^{t₀}B^{s₀}x into N² such pieces. Thus, ρ(A^{t₀}B^{s₀}x, B^{s₀}A^{t₀}x) < N²C₂N⁻³ = C₂/N for every N. Letting N → ∞, we conclude that A^{t₀}B^{s₀}x = B^{s₀}A^{t₀}x. □

Figure 172
Going from one pair of sides to the other

F Appendix: Lie algebras and Lie groups

A Lie group is a group G which is a differentiable manifold, and for which the operations (product and inverse) are differentiable maps G × G → G and G → G.

The tangent space, TG_e, to a Lie group G at the identity has a natural Lie algebra structure; it is defined as follows:

For each tangent vector A ∈ TG_e there is a one-parameter subgroup A^t ⊂ G with velocity vector A = (d/dt)|_{t = 0}A^t.

The degree of non-commutativity of two subgroups A^t and B^s is measured by the product A^tB^sA^{−t}B^{−s}. It turns out that there is one and only one subgroup C^r for which

ρ(A^tB^sA^{−t}B^{−s}, C^{st}) = o(s² + t²) as s, t → 0.

The corresponding vector C = (d/dr)|_{r = 0}C^r is called the Lie bracket C = [A, B] of the vectors A and B. It can be verified that the operation of Lie bracket introduced in this way makes the space TG_e into a Lie algebra (i.e., the operation is bilinear, skew-symmetric, and satisfies the Jacobi identity). This algebra is called the Lie algebra of the Lie group G.

Problem. Compute the bracket operation in the Lie algebra of the group SO(3) of rotations in three-dimensional euclidean space.
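One way to explore this problem numerically is sketched below, under an assumption not derived in the text: a tangent vector to SO(3) at the identity is represented by the skew-symmetric matrix of the infinitesimal rotation x ↦ ω × x, and the bracket is taken to be the matrix commutator. The helper names hat and cross are illustrative. The check suggests that the bracket then corresponds to the vector product of the angular velocities.

```python
def cross(a, b):
    """Vector product a × b in three-dimensional euclidean space."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def hat(w):
    """Skew-symmetric matrix of the linear map x -> w × x."""
    return [[0, -w[2], w[1]],
            [w[2], 0, -w[0]],
            [-w[1], w[0], 0]]


def matmul(a, b):
    """Product of two 3 x 3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def commutator(a, b):
    """Matrix commutator ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]


w1, w2 = (1, 2, 3), (-1, 0, 4)
print(commutator(hat(w1), hat(w2)) == hat(cross(w1, w2)))  # True
```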

Lemma 1 shows that the Poisson bracket of vector fields can be defined as the Lie bracket for the “infinite-dimensional Lie group” of all diffeomorphisms⁶⁴ of the manifold M.

On the other hand, the Lie bracket can be defined using the Poisson bracket of vector fields on a Lie group G. Let g ∈ G. Right translation R_g is the map R_g: G → G, R_gh = hg. The differential of R_g at the point e maps TG_e into TG_g. In this way, every vector A ∈ TG_e corresponds to a vector field on the group: it consists of the right translations (R_g)_* A and is called a right-invariant vector field. Clearly, a right-invariant vector field on a group is uniquely determined by its value at the identity.

Problem. Show that the Poisson bracket of right-invariant vector fields on a Lie group G is a right-invariant vector field, and its value at the identity of the group is equal to the Lie bracket of the values of the original vector fields at the identity.

40 The Lie algebra of hamiltonian functions

The hamiltonian vector fields on a symplectic manifold form a subalgebra of the Lie algebra of all fields. The hamiltonian functions also form a Lie algebra: the operation in this algebra is called the Poisson bracket of functions. The first integrals of a hamiltonian phase flow form a subalgebra of the Lie algebra of hamiltonian functions.

A The Poisson bracket of two functions

Let (M²ⁿ, ω²) be a symplectic manifold. To a given function H: M²ⁿ → ℝ on the symplectic manifold there corresponds a one-parameter group

g_H^t: M²ⁿ → M²ⁿ

of canonical transformations of M²ⁿ, namely the phase flow of the hamiltonian function equal to H. Let F: M²ⁿ → ℝ be another function on M²ⁿ.

Definition. The Poisson bracket (F, H) of functions F and H given on a symplectic manifold (M²ⁿ, ω²) is the derivative of the function F in the direction of the phase flow with hamiltonian function H:

(F, H)(x) = (d/dt)|_{t = 0} F(g_H^t(x)).

Thus, the Poisson bracket of two functions on M is again a function on M.

Corollary 1. A function F is a first integral of the phase flow with hamiltonian function H if and only if its Poisson bracket with H is identically zero: (F, H) ≡ 0.

We can give the definition of Poisson bracket in a slightly different form if we use the isomorphism I between 1-forms and vector fields on a symplectic manifold (M²ⁿ, ω²). This isomorphism is defined by the relation (cf. Section 37)

ω²(η, Iω¹) = ω¹(η).

The velocity vector of the phase flow g_H^t is I dH. This implies

Corollary 2. The Poisson bracket of the functions F and H is equal to the value of the 1-form dF on the velocity vector I dH of the phase flow with hamiltonian function H:

(F, H) = dF(I dH).

Using the preceding formula again, we obtain

Corollary 3. The Poisson bracket of the functions F and H is equal to the “skew scalar product” of the velocity vectors of the phase flows with hamiltonian functions H and F:

(F, H) = ω²(I dH, I dF).

It is now clear that

Corollary 4. The Poisson bracket of the functions F and H is a skew-symmetric bilinear function of F and H:

(F, H) = −(H, F)

and

(H, λ₁F₁ + λ₂F₂) = λ₁(H, F₁) + λ₂(H, F₂), λ_i ∈ ℝ.

Although the arguments above are obvious, they lead to nontrivial deductions, including the following generalization of a theorem of E. Noether.

Theorem. If a hamiltonian function H on a symplectic manifold (M²ⁿ, ω²) admits the one-parameter group of canonical transformations given by a hamiltonian F, then F is a first integral of the system with hamiltonian function H.

Proof. Since H is a first integral of the flow g_F^t, (H, F) = 0 (Corollary 1). Therefore, (F, H) = 0 (Corollary 4) and F is a first integral (Corollary 1). □

Problem 1. Compute the Poisson bracket of two functions F and H in the canonical coordinate space ℝ²ⁿ = {(p, q)}, ω²(ξ, η) = (Iξ, η).

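Solution. By Corollary 3 we have

(F, H) = Σ_{i = 1}^{n} (∂H/∂p_i ∂F/∂q_i − ∂H/∂q_i ∂F/∂p_i)

(we use the fact that I is symplectic and has the form I = \begin{pmatrix} 0 & −E \\ E & 0 \end{pmatrix} in the basis (p, q)).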

Problem 2. Compute the Poisson brackets of the basic functions p_i and q_j.

Solution. The gradients of the basic functions form a “symplectic basis”: their skew-scalar products are

(p_i, p_j) = (q_i, q_j) = 0, (q_i, p_j) = −(p_j, q_i) = δ_ij.
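The brackets of the basic functions can be spot-checked numerically. The sketch below assumes the coordinate expression (F, H) = Σ_i (∂H/∂p_i ∂F/∂q_i − ∂H/∂q_i ∂F/∂p_i), which is what the definition gives with Hamilton's equations ṗ = −∂H/∂q, q̇ = ∂H/∂p. Gradients are approximated by central differences, and the names grad and poisson are illustrative.

```python
def grad(f, z, h=1e-5):
    """Central-difference gradient of f at the point z (a list of floats)."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((f(zp) - f(zm)) / (2 * h))
    return g


def poisson(F, H, z):
    """(F, H) at z = (p_1..p_n, q_1..q_n): derivative of F along the flow of H."""
    n = len(z) // 2
    dF, dH = grad(F, z), grad(H, z)
    return sum(dH[i] * dF[n + i] - dH[n + i] * dF[i] for i in range(n))


# Coordinates for n = 2: z = (p1, p2, q1, q2).
p1 = lambda z: z[0]
p2 = lambda z: z[1]
q1 = lambda z: z[2]
q2 = lambda z: z[3]

z0 = [0.7, -0.2, 1.5, 0.4]
print(poisson(q1, p1, z0))  # approximately 1
print(poisson(p1, q1, z0))  # approximately -1
print(poisson(q1, p2, z0))  # approximately 0
```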

Problem 3. Show that the map A: ℝ²ⁿ → ℝ²ⁿ sending (p, q) → (P(p, q), Q(p, q)) is canonical if and only if the Poisson brackets of any two functions in the variables (p, q) and (P, Q) coincide:

(F, H)_{p, q} = (F, H)_{P, Q}.

Solution. Let A be canonical. Then the symplectic structures dp ∧ dq and dP ∧ dQ coincide. But the definition of the Poisson bracket (F, H) was given invariantly in terms of the symplectic structure; it did not involve the coordinates. Therefore, (F, H)_{p, q} = (F, H) = (F, H)_{P, Q}.
Conversely, suppose that the Poisson brackets (P_i, Q_j)_{p, q} have the standard form of Problem 2. Then, clearly, dP ∧ dQ = dp ∧ dq, i.e., the map A is canonical.

Problem 4. Show that the Poisson bracket of a product can be calculated by Leibniz’s rule:

(F₁F₂, H) = F₁(F₂, H) + F₂(F₁, H).

Hint. The Poisson bracket (F₁F₂, H) is the derivative of the product F₁F₂ in the direction of the field I dH.

B The Jacobi identity

Theorem. The Poisson bracket of three functions A, B, and C satisfies the Jacobi identity:

((A, B), C) + ((B, C), A) + ((C, A), B) = 0.

Corollary (Poisson’s theorem). The Poisson bracket of two first integrals F₁, F₂ of a system with hamiltonian function H is again a first integral.

Proof of the corollary. By the Jacobi identity,

((F₁, F₂), H) = (F₁, (F₂, H)) + (F₂, (H, F₁)) = 0 + 0,

as was to be shown. □

In this way, by knowing two first integrals we can find a third, fourth, etc. by a simple computation. Of course, not all the integrals we get will be essentially new, since there cannot be more than 2n independent functions on M²ⁿ. Sometimes we may get functions of old integrals or constants, which may be zero. But sometimes we do obtain new integrals.

Problem. Calculate the Poisson brackets of the components p₁, p₂, p₃, M₁, M₂, M₃ of the linear and angular momentum vectors of a mechanical system.

Answer. (M₁, M₂) = M₃, (M₁, p₁) = 0, (M₁, p₂) = p₃, (M₁, p₃) = −p₂. This implies

Theorem. If two components, M₁ and M₂, of the angular momentum of some mechanical problem are conserved, then the third component is also conserved.
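The brackets in the answer above can be verified exactly. The sketch below assumes M = q × p and the coordinate expression (F, H) = Σ_i (∂H/∂p_i ∂F/∂q_i − ∂H/∂q_i ∂F/∂p_i). Because every function involved is a polynomial of degree at most two, a central difference with step 1 reproduces each partial derivative exactly, so integer test points give exact results. All names are illustrative.

```python
from fractions import Fraction


def grad_exact(f, z):
    """Gradient by central differences with step 1 (exact for degree <= 2)."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += 1
        zm[i] -= 1
        g.append(Fraction(f(zp) - f(zm), 2))
    return g


def poisson(F, H, z):
    """(F, H) at z = (p1, p2, p3, q1, q2, q3)."""
    n = len(z) // 2
    dF, dH = grad_exact(F, z), grad_exact(H, z)
    return sum(dH[i] * dF[n + i] - dH[n + i] * dF[i] for i in range(n))


# Momentum components and angular momentum M = q × p.
p = [lambda z, i=i: z[i] for i in range(3)]
M1 = lambda z: z[4] * z[2] - z[5] * z[1]   # q2*p3 - q3*p2
M2 = lambda z: z[5] * z[0] - z[3] * z[2]   # q3*p1 - q1*p3
M3 = lambda z: z[3] * z[1] - z[4] * z[0]   # q1*p2 - q2*p1

z0 = [1, 2, 3, 4, 5, 6]
print(poisson(M1, M2, z0) == M3(z0))       # True
print(poisson(M1, p[1], z0) == p[2](z0))   # True
```

If M₁ and M₂ are first integrals, Poisson's theorem makes (M₁, M₂) = M₃ a first integral as well, which is the content of the theorem above.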

Proof of the Jacobi identity. Consider the sum

((A, B), C) + ((B, C), A) + ((C, A), B).

This sum is a “linear combination of second partial derivatives” of the functions A, B, and C. We will compute the terms in the second derivatives of A:

((A, B), C) + ((C, A), B) = (L_CL_B − L_BL_C)A,

where L_ξ is differentiation in the direction of ξ and, as a subscript, F denotes the hamiltonian field with hamiltonian function F.

But, by Lemma 2, Section 39, the commutator of the differentiations L_CL_B − L_BL_C is a first-order differential operator. This means that none of the second derivatives of A are contained in our sum. The same thing is true for the second derivatives of B and C. Therefore, the sum is zero. □

Corollary 5. Let B and C be hamiltonian fields with hamiltonian functions B and C. Consider the Poisson bracket [B, C] of the vector fields. This vector field is hamiltonian, and its hamiltonian function is equal to the Poisson bracket of the hamiltonian functions (B, C).

Proof. Set (B, C) = D. The Jacobi identity can be rewritten in the form

(A, D) = ((A, B), C) − ((A, C), B),

L_D = L_CL_B − L_BL_C, L_D = L_{[B, C]},

as was to be shown. □

C The Lie algebras of hamiltonian fields, hamiltonian functions, and first integrals

A linear subspace of a Lie algebra is called a subalgebra if the commutator of any two elements of the subspace belongs to it. A subalgebra of a Lie algebra is itself a Lie algebra. The preceding corollary implies, in particular,

Corollary 6. The hamiltonian vector fields on a symplectic manifold form a subalgebra of the Lie algebra of all vector fields.

Poisson’s theorem on first integrals can be re-formulated as

Corollary 7. The first integrals of a hamiltonian phase flow form a subalgebra of the Lie algebra of all functions.

The Lie algebra of hamiltonian functions can be mapped naturally onto the Lie algebra of hamiltonian vector fields. To do this, to every function H we associate the hamiltonian vector field H with hamiltonian function H.

Corollary 8. The map of the Lie algebra of functions onto the Lie algebra of hamiltonian fields is an algebra homomorphism. Its kernel consists of the locally constant functions. If M²ⁿ is connected, the kernel is one-dimensional and consists of constants.

Proof. Our map is linear. Corollary 5 says that our map carries the Poisson bracket of functions into the Poisson bracket of vector fields. The kernel consists of functions H for which I dH ≡ 0. Since I is an isomorphism, dH ≡ 0 and H = const. □

Corollary 9. The phase flows with hamiltonian functions H₁ and H₂ commute if and only if the Poisson bracket of the functions H₁ and H₂ is (locally) constant.

Proof. By the theorem in Section 39, E, it is necessary and sufficient that [H₁, H₂] ≡ 0, and by Corollary 8 this condition is equivalent to d(H₁, H₂) ≡ 0. □

We obtain yet another generalization of E. Noether’s theorem: given a flow which commutes with the one under consideration, one can construct a first integral.

D Locally hamiltonian vector fields

Let (M²ⁿ, ω²) be a symplectic manifold and g^t: M²ⁿ → M²ⁿ a one-parameter group of diffeomorphisms preserving the symplectic structure. Will g^t be a hamiltonian flow?

Example. Let M²ⁿ be a two-dimensional torus T², a point of which is given by a pair of coordinates (p, q) mod 1. Let ω² be the usual area element dp ∧ dq. Consider the family of translations g^t(p, q) = (p + t, q) (Figure 173). The maps g^t preserve the symplectic structure (i.e., area). Can we find a hamiltonian function corresponding to the vector field (ṗ = 1, q̇ = 0)? If ṗ = −∂H/∂q and q̇ = ∂H/∂p, we would have ∂H/∂p = 0 and ∂H/∂q = −1, i.e., H = −q + C. But q is only a local coordinate on T²; there is no map H: T² → ℝ for which ∂H/∂p = 0 and ∂H/∂q = −1. Thus g^t is not a hamiltonian phase flow.

Figure 173
A locally hamiltonian field on the torus

Definition. A locally hamiltonian vector field on a symplectic manifold (M²ⁿ, ω²) is the vector field Iω¹, where ω¹ is a closed 1-form on M²ⁿ.

Locally, a closed 1-form is the differential of a function, ω¹ = dH. However, in attempting to extend the function H to the whole manifold M²ⁿ we may obtain a “many-valued hamiltonian function,” since a closed 1-form on a non-simply-connected manifold may not be a differential (for example, the form dq on T²). A phase flow given by a locally hamiltonian vector field is called a locally hamiltonian flow.

Problem. Show that a one-parameter group of diffeomorphisms of a symplectic manifold preserves the symplectic structure if and only if it is a locally hamiltonian phase flow.

Hint. Cf. Section 38A.

Problem. Show that in the symplectic space ℝ²ⁿ, every one-parameter group of canonical diffeomorphisms (preserving dp ∧ dq) is a hamiltonian flow.

Hint. Every closed 1-form on ℝ²ⁿ is the differential of a function.

Problem. Show that the locally hamiltonian vector fields form a sub-algebra of the Lie algebra of all vector fields. In addition, the Poisson bracket of two locally hamiltonian fields is actually a hamiltonian field, with a hamiltonian function uniquely⁶⁵ determined by the given fields ξ and η by the formula H = ω²(ξ, η). Thus, the hamiltonian fields form an ideal in the Lie algebra of locally hamiltonian fields.

41 Symplectic geometry

A euclidean structure on a vector space is given by a symmetric bilinear form, and a symplectic structure by a skew-symmetric one. The geometry of a symplectic space is different from that of a euclidean space, although there are many similarities.

A Symplectic vector spaces

Let ℝ²ⁿ be an even-dimensional vector space.

Definition. A symplectic linear structure on ℝ²ⁿ is a nondegenerate⁶⁶ bilinear skew-symmetric 2-form given in ℝ²ⁿ. This form is called the skew-scalar product and is denoted by [ξ, η] = −[η, ξ]. The space ℝ²ⁿ, together with the symplectic structure [,], is called a symplectic vector space.

Example. Let (p₁,..., p_n, q₁,..., q_n) be coordinate functions on ℝ²ⁿ, and ω² the form

ω² = p₁ ∧ q₁ + ⋯ + p_n ∧ q_n.

Since this form is nondegenerate and skew-symmetric, it can be taken for a skew-scalar product: [ξ, η] = ω²(ξ, η). In this way the coordinate space ℝ²ⁿ = {(p, q)} receives a symplectic structure. This structure is called the standard symplectic structure. In the standard symplectic structure the skew-scalar product of two vectors ξ and η is equal to the sum of the oriented areas of the parallelogram (ξ, η) on the n coordinate planes (p_i, q_i).

Two vectors ξ and η in a symplectic space are called skew-orthogonal (ξ ∠ η) if their skew-scalar product is equal to zero.

Problem. Show that ξ ∠ ξ: every vector is skew-orthogonal to itself.

The set of all vectors skew-orthogonal to a given vector η is called the skew-orthogonal complement to η.

Problem. Show that the skew-orthogonal complement to η is a 2n − 1-dimensional hyperplane containing η.

Hint. If all vectors were skew-orthogonal to η, then the form [,] would be degenerate.

B The symplectic basis

A euclidean structure under a suitable choice of basis (it must be orthonormal) is given by a scalar product in a particular standard form. In exactly the same way, a symplectic structure takes the standard form indicated above in a suitable basis.

Problem. Find the skew-scalar product of the basis vectors e_pi and e_qi (i = 1,..., n) in the example presented above.

Solution. The relations

[e_pi, e_pj] = [e_qi, e_qj] = 0, [e_pi, e_qj] = δ_ij (1)

follow from the definition of p₁ ∧ q₁ + ⋯ + p_n ∧ q_n.

We now return to the general symplectic space.

Definition. A symplectic basis is a set of 2n vectors, e_pi, e_qi (i = 1,..., n) whose skew-scalar products have the form (1).

In other words, every basis vector is skew-orthogonal to all the basis vectors except one, associated to it; its product with the associated vector is equal to ±1.

Theorem. Every symplectic space has a symplectic basis. Furthermore, we can take any nonzero vector e for the first basis vector.

Proof. This theorem is entirely analogous to the corresponding theorem in euclidean geometry and is proved in almost the same way.

Since the vector e is not zero, there is a vector f not skew-orthogonal to it (the form [,] is nondegenerate). By choosing the length of this vector, we can ensure that its skew-scalar product with e is equal to 1. In the case n = 1, the theorem is proved.

If n > 1, consider the skew-orthogonal complement D (Figure 174) to the pair of vectors e, f. D is the intersection of the skew-orthogonal complements to e and f. These two 2n − 1-dimensional spaces do not coincide, since e is not in the skew-orthogonal complement to f. Therefore, their intersection has even dimension 2n − 2.

Figure 174

Skew-orthogonal complement

We will show that D is a symplectic subspace of ℝ²ⁿ, i.e., that the skew-scalar product [,] restricted to D is nondegenerate. If a vector ξ ∈ D were skew-orthogonal to the whole subspace D, then since it would also be skew-orthogonal to e and to f, ξ would be skew-orthogonal to ℝ²ⁿ, which contradicts the nondegeneracy of [,] on ℝ²ⁿ. Thus D^{2n − 2} is symplectic.

Now if we adjoin the vectors e and f to a symplectic basis for D^{2n − 2} we get a symplectic basis for ℝ²ⁿ, and the theorem is proved by induction on n. □
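The induction in the proof is constructive and can be sketched as a procedure resembling Gram–Schmidt orthogonalization. The code below assumes the standard skew-scalar product in coordinates (p₁,..., p_n, q₁,..., q_n) and takes an arbitrary basis of ℝ²ⁿ as input; each remaining vector v is pushed into the skew-orthogonal complement of the pair e, f by the replacement v − [v, f]e + [v, e]f. The function names are illustrative, and exact rational arithmetic is used so the relations hold without rounding.

```python
from fractions import Fraction


def skew(xi, eta):
    """Standard skew-scalar product on R^{2n}, coordinates (p_1..p_n, q_1..q_n)."""
    n = len(xi) // 2
    return sum(xi[i] * eta[n + i] - xi[n + i] * eta[i] for i in range(n))


def symplectic_basis(vectors):
    """Turn a basis of R^{2n} into a symplectic basis whose first vector is vectors[0].

    Returns (ps, qs) with [ps[i], qs[j]] = 1 if i == j else 0, and all other
    skew-scalar products equal to zero.
    """
    rest = [[Fraction(c) for c in v] for v in vectors]
    ps, qs = [], []
    while rest:
        e = rest.pop(0)
        # Nondegeneracy on the current subspace guarantees such a vector exists.
        k = next(i for i, v in enumerate(rest) if skew(e, v) != 0)
        f = rest.pop(k)
        c = skew(e, f)
        f = [x / c for x in f]                     # now [e, f] = 1
        projected = []
        for v in rest:
            a, b = skew(v, f), skew(v, e)
            # v - a*e + b*f is skew-orthogonal to both e and f.
            projected.append([vi - a * ei + b * fi
                              for vi, ei, fi in zip(v, e, f)])
        rest = projected
        ps.append(e)
        qs.append(f)
    return ps, qs


basis = [[1, 2, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
ps, qs = symplectic_basis(basis)
print([[skew(a, b) for b in qs] for a in ps])  # the 2 x 2 identity matrix
```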

Corollary. All symplectic spaces of the same dimension are isomorphic.

If we take the vectors of a symplectic basis as coordinate unit vectors, we obtain a coordinate system p_i, q_i in which [,] takes the standard form p₁ ∧ q₁ + ⋯ + p_n ∧ q_n. Such a coordinate system is called symplectic.

C The symplectic group

To a euclidean structure we associated the orthogonal group of linear mappings which preserved the euclidean structure. In a symplectic space the symplectic group plays an analogous role.

Definition. A linear transformation S: ℝ²ⁿ → ℝ²ⁿ of the symplectic space ℝ²ⁿ to itself is called symplectic if it preserves the skew-scalar product:

[Sξ, Sη] = [ξ, η] for all ξ, η ∈ ℝ²ⁿ.

The set of all symplectic transformations of ℝ²ⁿ is called the symplectic group and is denoted by Sp(2n).

It is clear that the composition of two symplectic transformations is symplectic. To justify the term symplectic group, we must only show that a symplectic transformation is nonsingular; it is then clear that the inverse is also symplectic.

Problem. Show that the group Sp(2) is isomorphic to the group of real two-by-two matrices with determinant 1 and is homeomorphic to the interior of a solid three-dimensional torus.

Theorem. A transformation S: ℝ²ⁿ → ℝ²ⁿ of the standard symplectic space (p, q) is symplectic if and only if it is linear and canonical, i.e., preserves the differential 2-form

ω² = dp₁ ∧ dq₁ + ⋯ + dp_n ∧ dq_n.

Proof. Under the natural identification of the tangent space to ℝ²ⁿ with ℝ²ⁿ, the 2-form ω² goes to [,]. □

Corollary. The determinant of any symplectic transformation is equal to 1.

Proof. We already know (Section 38B) that canonical maps preserve the exterior powers of the form ω². But its n-th exterior power is (up to a constant multiple) the volume element on ℝ²ⁿ. This means that symplectic transformations S of the standard ℝ²ⁿ = {(p, q)} preserve the volume element, so det S = 1. But since every symplectic linear structure can be written down in standard form in a symplectic coordinate system, the determinant of a symplectic transformation of any symplectic space is equal to 1. □

Theorem. A linear transformation S: ℝ²ⁿ → ℝ²ⁿ is symplectic if and only if it takes some (and therefore any) symplectic basis into a symplectic basis.

Proof. The skew-scalar product of any two linear combinations of basis vectors can be expressed in terms of skew-scalar products of basis vectors. If the transformation does not change the skew-scalar products of basis vectors, then it does not change the skew-scalar products of any vectors. □

D Planes in symplectic space

In a euclidean space all planes are equivalent: each of them can be carried into any other one by a motion. We will now look at a symplectic vector space from this point of view.

Problem. Show that a nonzero vector in a symplectic space can be carried into any other nonzero vector by a symplectic transformation.

Problem. Show that not every two-dimensional plane of the symplectic space ℝ²ⁿ can be obtained from a given 2-plane by a symplectic transformation.

Hint. Consider the planes (p₁, p₂) and (p₁, q₁).

Definition. A k-dimensional plane (i.e., subspace) of a symplectic space is called null⁶⁷ if it is skew-orthogonal to itself, i.e., if the skew-scalar product of any two vectors of the plane is equal to zero.

Example. The coordinate plane (p₁,..., p_k) in the symplectic coordinate system p, q is null. (Prove it!)

Problem. Show that any non-null two-dimensional plane can be carried into any other non-null two-plane by a symplectic transformation.

For calculations in symplectic geometry it may be useful to impose some euclidean structure on the symplectic space. We fix a symplectic coordinate system p, q and introduce a euclidean structure using the coordinate scalar product

(x, x) = Σ (p_i² + q_i²), where x = Σ (p_i e_pi + q_i e_qi).

The symplectic basis e_p, e_q is orthonormal in this euclidean structure. The skew-scalar product, like every bilinear form, can be expressed in terms of the scalar product by

[ξ, η] = (Iξ, η), (2)

where I: ℝ²ⁿ → ℝ²ⁿ is some operator. It follows from the skew-symmetry of the skew-scalar product that the operator I is skew-symmetric.

Problem. Compute the matrix of the operator I in the symplectic basis e_pi, e_qi.

Answer. I = \begin{pmatrix} 0 & −E \\ E & 0 \end{pmatrix}, where E is the n × n identity matrix.

Thus, for n = 1 (in the p, q-plane), I is simply rotation by 90°, and in the general case I is rotation by 90° in each of the n planes p_i, q_i.

Problem. Show that the operator I is symplectic and that I² = −E_2n.

Although the euclidean structures and the operator I are not invariantly associated to a symplectic space, they are often convenient.

The following theorem follows directly from (2).

Theorem. A plane π of a symplectic space is null if and only if the plane Iπ is orthogonal to π.

Notice that the dimensions of the planes π and Iπ are the same, since I is nonsingular. Hence

Corollary. The dimension of a null plane in ℝ²ⁿ is less than or equal to n.

This follows since the two k-dimensional planes π and Iπ cannot be orthogonal if k > n.

We consider more carefully the n-dimensional null planes in the symplectic coordinate space ℝ²ⁿ. An example of such a plane is the coordinate p-plane. There are in all

\binom{2n}{n} = (2n)!/(n!)²

n-dimensional coordinate planes in ℝ²ⁿ = {(p, q)}.

Problem. Show that there are 2ⁿ null planes among the n-dimensional coordinate planes: to each of the 2ⁿ partitions of the set (1,..., n) into two parts (i₁,..., i_k), (j₁,..., j_{n − k}) we associate the null coordinate plane p_{i₁},..., p_{i_k}, q_{j₁},..., q_{j_{n − k}}.
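For small n the count in this problem can be confirmed by brute force. The sketch below enumerates all n-dimensional coordinate planes as n-element subsets of the 2n axes and keeps those that never contain both p_i and q_i for the same i; this criterion is an assumption based on the fact that, among basis vectors, only the pairs (e_pi, e_qi) have nonzero skew-scalar product. The function names are illustrative.

```python
from itertools import combinations
from math import comb


def coordinate_planes(n):
    """All n-dimensional coordinate planes of R^{2n}, as tuples of axis indices.

    Axes 0..n-1 stand for p_1..p_n and axes n..2n-1 for q_1..q_n.
    """
    return list(combinations(range(2 * n), n))


def null_coordinate_planes(n):
    """Coordinate n-planes containing no conjugate pair (p_i, q_i)."""
    planes = []
    for axes in coordinate_planes(n):
        chosen = set(axes)
        if all(not (i in chosen and n + i in chosen) for i in range(n)):
            planes.append(axes)
    return planes


for n in range(1, 5):
    print(n, len(coordinate_planes(n)), len(null_coordinate_planes(n)))
```

For n = 2, for instance, 4 of the 6 coordinate planes are null; the two excluded planes are (p₁, q₁) and (p₂, q₂).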

In order to study the generating functions of canonical transformations we need

Theorem. Every n-dimensional null plane π in the symplectic coordinate space ℝ²ⁿ is transverse⁶⁸ to at least one of the 2ⁿ coordinate null planes.

Proof. Let P be the null plane p₁,..., p_n (Figure 175). Consider the intersection τ = π ∩ P. Suppose that the dimension of τ is equal to k, 0 ≤ k ≤ n. Like every k-dimensional subspace of the n-dimensional space, the plane τ is transverse to at least one (n − k)-dimensional coordinate plane in P, let us say the plane

η = (p_{i₁},..., p_{i_{n − k}}); τ + η = P, τ ∩ η = 0.

Figure 175

Construction of a coordinate plane σ transversal to a given plane π.

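We now consider the null n-dimensional coordinate plane

σ = (p_{i₁},..., p_{i_{n − k}}, q_{j₁},..., q_{j_k}), η = σ ∩ P,

and show that our plane π is transverse to σ:

π ∩ σ = 0.

We have

τ ⊂ π, π ∠ π ⟹ τ ∠ π; η ⊂ σ, σ ∠ σ ⟹ η ∠ σ; hence (τ + η) ∠ (π ∩ σ), i.e., P ∠ (π ∩ σ).

But P is an n-dimensional null plane. Therefore, every vector skew-orthogonal to P belongs to P (cf. the corollary above). Thus (π ∩ σ) ⊂ P. Finally,

π ∩ σ = (π ∩ P) ∩ (σ ∩ P) = τ ∩ η = 0,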

as was to be shown. □

Problem. Let π₁ and π₂ be two k-dimensional planes in symplectic ℝ²ⁿ. Is it always possible to carry π₁ to π₂ by a symplectic transformation? How many classes of planes are there which cannot be carried one into another?

Answer. [k/2] + 1 if k ≤ n; [(2n − k)/2] + 1 if k ≥ n.

E Symplectic structure and complex structure

Since I² = −E we can introduce into our space ℝ²ⁿ not only a symplectic structure [,] and euclidean structure (,), but also a complex structure, by defining multiplication by i = √−1 to be the action of I. The space ℝ²ⁿ is identified in this way with a complex space ℂⁿ (the coordinate space with coordinates z_k = p_k + iq_k). The linear transformations of ℝ²ⁿ which preserve the euclidean structure form the orthogonal group O(2n); those preserving the complex structure form the complex linear group GL(n, ℂ).

Problem. Show that transformations which are both orthogonal and symplectic are complex, that those which are both complex and orthogonal are symplectic, and that those which are both symplectic and complex are orthogonal; thus that the intersection of two of the three groups is equal to the intersection of all three:

O(2n) ∩ Sp(2n) = Sp(2n) ∩ GL(n, ℂ) = GL(n, ℂ) ∩ O(2n).

This intersection is called the unitary group U(n).

Unitary transformations preserve the hermitian scalar product (ξ, η) + i[ξ, η]; the scalar and skew-scalar products on ℝ²ⁿ are its real and imaginary parts.
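The relation between the three structures can be checked on sample vectors. The sketch below assumes the identification z_k = p_k + iq_k, the operator I acting by (p, q) ↦ (−q, p), and a hermitian product that is antilinear in its first argument (a convention chosen here so that the imaginary part comes out as [ξ, η] rather than its negative). The function names are illustrative.

```python
def to_complex(xi):
    """Identify (p_1..p_n, q_1..q_n) with the complex vector z_k = p_k + i q_k."""
    n = len(xi) // 2
    return [complex(xi[k], xi[n + k]) for k in range(n)]


def dot(xi, eta):
    """Euclidean scalar product (xi, eta)."""
    return sum(a * b for a, b in zip(xi, eta))


def skew(xi, eta):
    """Skew-scalar product [xi, eta] in the standard symplectic structure."""
    n = len(xi) // 2
    return sum(xi[i] * eta[n + i] - xi[n + i] * eta[i] for i in range(n))


def apply_I(xi):
    """The operator I: (p, q) -> (-q, p)."""
    n = len(xi) // 2
    return [-c for c in xi[n:]] + list(xi[:n])


def hermitian(xi, eta):
    """Hermitian product, antilinear in the first argument."""
    return sum(z.conjugate() * w
               for z, w in zip(to_complex(xi), to_complex(eta)))


xi, eta = [1, 2, 3, 4], [-1, 0, 2, 5]
h = hermitian(xi, eta)
print(h.real == dot(xi, eta), h.imag == skew(xi, eta))        # True True
print(to_complex(apply_I(xi)) == [1j * z for z in to_complex(xi)])  # True
```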

42 Parametric resonance in systems with many degrees of freedom

During our investigation of oscillating systems with periodically varying parameters (cf. Section 25), we explained that parametric resonance depends on the behavior of the eigenvalues of a certain linear transformation (“the mapping at a period”). The dependence consists of the fact that an equilibrium position of a system with periodically varying parameters is stable if the eigenvalues of the mapping at a period have modulus less than 1, and unstable if at least one of the eigenvalues has modulus greater than 1.

The mapping at a period obtained from a system of Hamilton’s equations with periodic coefficients is symplectic. The investigation in Section 25 of parametric resonance in a system with one degree of freedom relied on our analysis of the behavior of the eigenvalues of symplectic transformations of the plane. In this paragraph we will analyze, in an analogous way, the behavior of the eigenvalues of symplectic transformations in a phase space of any dimension. The results of this analysis (due to M. G. Krein) can be applied to the study of conditions for the appearance of parametric resonance in mechanical systems with many degrees of freedom.

A Symplectic matrices

Consider a linear transformation of a symplectic space, S: ℝ²ⁿ → ℝ²ⁿ. Let p₁,..., p_n; q₁,..., q_n be a symplectic coordinate system. In this coordinate system, the transformation is given by a matrix S.

Theorem. A transformation is symplectic if and only if its matrix S in the symplectic coordinate system (p, q) satisfies the relation

S′IS = I,

where

I = \begin{pmatrix} 0 & −E \\ E & 0 \end{pmatrix}

and S′ is the transpose of S.

Proof. The condition for being symplectic ([Sξ, Sη] = [ξ, η] for all ξ and η) can be written in terms of the scalar product by using the operator I, as follows:

(ISξ, Sη) = (Iξ, η) for all ξ, η, or (S′ISξ, η) = (Iξ, η) for all ξ, η,

as was to be shown. □
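The matrix criterion is easy to test mechanically. The sketch below assumes the block form I = [[0, −E], [E, 0]] in the coordinates (p₁,..., p_n, q₁,..., q_n) and checks S′IS = I with exact integer arithmetic. The sample matrices are illustrative: a shear [[E, B], [0, E]] passes the test exactly when B is symmetric.

```python
def block_I(n):
    """Matrix of I in the basis (p_1..p_n, q_1..q_n): [[0, -E], [E, 0]]."""
    m = 2 * n
    mat = [[0] * m for _ in range(m)]
    for k in range(n):
        mat[k][n + k] = -1
        mat[n + k][k] = 1
    return mat


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_symplectic(S):
    """True if the 2n x 2n matrix S satisfies S' I S = I."""
    I = block_I(len(S) // 2)
    return matmul(matmul(transpose(S), I), S) == I


shear_sym = [[1, 0, 2, 1], [0, 1, 1, 3], [0, 0, 1, 0], [0, 0, 0, 1]]
shear_bad = [[1, 0, 2, 1], [0, 1, 0, 3], [0, 0, 1, 0], [0, 0, 0, 1]]
print(is_symplectic(shear_sym), is_symplectic(shear_bad))  # True False
```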

B Symmetry of the spectrum of a symplectic transformation

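Theorem. The characteristic polynomial of a symplectic transformation

p(λ) = det(S − λE)

is reflexive,⁶⁹ i.e., p(λ) = λ²ⁿp(1/λ).

Proof. We will use the facts that det S = det I = 1, I² = −E, and det A′ = det A. By the theorem above, S = −IS′⁻¹I. Therefore,

p(λ) = det(S − λE) = det(−IS′⁻¹I − λE) = det(−S′⁻¹ + λE) = det(−E + λS) = λ²ⁿ det(S − (1/λ)E) = λ²ⁿp(1/λ). □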

Corollary. If λ is an eigenvalue of a symplectic transformation, then 1/λ is also an eigenvalue.
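The reflexive property says that the coefficients of the characteristic polynomial read the same forwards and backwards. The sketch below checks this on a sample 4 × 4 symplectic matrix built as a product of two shears with symmetric blocks (an illustrative choice, in coordinates (p₁, p₂, q₁, q₂)). The coefficients are computed with the Faddeev–LeVerrier recursion, a standard method that is not part of the text, using exact rational arithmetic.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    m = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]


def char_poly(A):
    """Coefficients [c_0, ..., c_m] of det(lambda*E - A) = sum c_k lambda^k.

    For even m this coincides with det(A - lambda*E).
    """
    m = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(0)] * (m + 1)
    coeffs[m] = Fraction(1)
    M = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    for k in range(1, m + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(m)) / k
        coeffs[m - k] = c
        M = [[AM[i][j] + (c if i == j else 0) for j in range(m)]
             for i in range(m)]
    return coeffs


# Shears [[E, B], [0, E]] and [[E, 0], [C, E]] with symmetric B and C.
U = [[1, 0, 2, 1], [0, 1, 1, 3], [0, 0, 1, 0], [0, 0, 0, 1]]
L = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
S = matmul(U, L)

coeffs = char_poly(S)
print(coeffs == coeffs[::-1])  # True: the coefficient list is palindromic
```

A palindromic coefficient list is equivalent to p(λ) = λ²ⁿp(1/λ), so every root λ is accompanied by the root 1/λ.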

On the other hand, the characteristic polynomial is real; therefore, if λ is a complex eigenvalue, then λ̄ is an eigenvalue different from λ. It follows that the roots λ of the characteristic polynomial lie symmetrically with respect to the real axis and to the unit circle (Figure 176). They come in 4-tuples,

λ, λ̄, 1/λ, 1/λ̄ (|λ| ≠ 1, Im λ ≠ 0),

and pairs lying on the real axis,

λ = λ̄, 1/λ = 1/λ̄,

or on the unit circle,

λ = 1/λ̄, λ̄ = 1/λ.

Figure 176

Distribution of the eigenvalues of a symplectic transformation

It is not hard to verify that the multiplicities of all four points of a 4-tuple (or both points of a pair) are the same.

C Stability

Definition. A transformation S is called stable if

∀ε > 0, ∃δ > 0: |x| < δ ⇒ |S^N x| < ε, ∀N > 0.

Problem. Show that if at least one of the eigenvalues of a symplectic transformation S does not lie on the unit circle, then S is unstable.

Hint. In view of the demonstrated symmetry, if one of the eigenvalues does not lie on the unit circle, then there exists an eigenvalue outside the unit circle |λ| > 1; in the corresponding invariant subspace, S is an “expansion with a rotation.”

Problem. Show that if all the eigenvalues of a linear transformation are distinct and lie on the unit circle, then the transformation is stable.

Hint. Change to a basis of eigenvectors.
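The two problems above can be illustrated numerically in the plane. This is a rough sketch, assuming two hand-picked matrices with determinant 1: a rotation (eigenvalues e^{±iα} on the unit circle) and a diagonal matrix with eigenvalues 2 and 1/2. The function names are illustrative.

```python
import math

def apply(m, v):
    """Apply a 2 x 2 matrix to a vector in the plane."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def max_norm_along_orbit(m, v, steps):
    """Largest |S^N v| for N = 0, 1, ..., steps."""
    best = math.hypot(*v)
    for _ in range(steps):
        v = apply(m, v)
        best = max(best, math.hypot(*v))
    return best

alpha = 0.7
rotation = [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha), math.cos(alpha)]]
hyperbolic = [[2.0, 0.0], [0.0, 0.5]]   # eigenvalue 2 lies outside the unit circle

start = (1e-3, 1e-3)
bounded = max_norm_along_orbit(rotation, start, 200)
growing = max_norm_along_orbit(hyperbolic, start, 200)
```

The iterates of the rotation keep the length of the starting vector, while under the second matrix an arbitrarily small starting vector is eventually carried far from the origin.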

Definition. A symplectic transformation S is called strongly stable if every symplectic transformation sufficiently close⁷⁰ to S is stable.

In Section 25 we established that S: ℝ² → ℝ² is strongly stable if λ_{1,2} = e^{±iα} and λ₁ ≠ λ₂.

Theorem. If all 2n eigenvalues of a symplectic transformation S are distinct and lie on the unit circle, then S is strongly stable.

Proof. We enclose the 2n eigenvalues λ in 2n non-intersecting neighborhoods, symmetric with respect to the unit circle and the real axis (Figure 177). The 2n roots of the characteristic polynomial depend continuously on the elements of the matrix of S. Therefore, if the matrix S₁ is sufficiently close to S, exactly one eigenvalue λ₁ of the matrix of S₁ will lie in each of the 2n neighborhoods of the 2n points of λ. But if one of the points λ₁ did not lie on the unit circle, for example, if it lay outside the unit circle, then by the theorem in subsection B, there would be another point λ₂, |λ₂| < 1 in the same neighborhood, and the total number of roots would be greater than 2n, which is not possible.

Figure 177

Behavior of simple eigenvalues under a small change of the symplectic transformation

Thus all the roots of S₁ lie on the unit circle and are distinct, so S₁ is stable. □

We might say that an eigenvalue λ of a symplectic transformation can leave the unit circle only by colliding with another eigenvalue (Figure 178); at the same time, the complex-conjugate eigenvalues will collide, and from the two pairs of roots on the unit circle we obtain one 4-tuple (or pair of real λ).

Figure 178

Behavior of multiple eigenvalues under a small change of the symplectic transformation

It follows from the results of Section 25 that the condition for parametric resonance to arise in a linear canonical system with a periodically changing hamiltonian function is precisely that the corresponding symplectic transformation of phase space should cease to be stable. It is clear from the theorem above that this can happen only after a collision of eigenvalues on the unit circle. In fact, as M. G. Krein noticed, not every such collision is dangerous.

It turns out that the eigenvalues λ with |λ| = 1 are divided into two classes: positive and negative. When two roots with the same sign collide, the roots “go through one another,” and cannot leave the unit circle. On the other hand, when two roots with different signs collide, they generally leave the unit circle.

M.G. Krein’s theory goes beyond the limits of this book; we will formulate the basic results here in the form of problems.

Problem. Let λ and λ̄ be simple (multiplicity 1) eigenvalues of a symplectic transformation S with |λ| = 1. Show that the two-dimensional invariant plane π_λ corresponding to λ, λ̄ is non-null.

Hint. Let ξ₁ and ξ₂ be complex eigenvectors of S with eigenvalues λ₁ and λ₂. Then if λ₁ λ₂ ≠ 1, the vectors ξ₁ and ξ₂ are skew-orthogonal: [ξ₁, ξ₂] = 0.

Let ξ be a real vector of the plane π_λ, where Im λ > 0 and |λ| = 1. The eigenvalue λ is called positive if [Sξ, ξ] > 0.

Problem. Show that this definition is correct, i.e., it does not depend on the choice of ξ ≠ 0 in the plane π_λ.

Hint. If the plane π_λ contained two non-collinear skew-orthogonal vectors, it would be null.

In the same way, an eigenvalue λ of multiplicity k with |λ| = 1 is of definite sign if the quadratic form [Sξ, ξ] is (positive or negative) definite on the invariant 2k-dimensional subspace corresponding to λ, λ̄.

Problem. Show that S is strongly stable if and only if all the eigenvalues λ lie on the unit circle and are of definite sign.

Hint. The quadratic form [Sξ, ξ] is invariant with respect to S.

43 A symplectic atlas

In this paragraph we prove Darboux’s theorem, according to which every symplectic manifold has local coordinates p, q in which the symplectic structure can be written in the simplest way: ω² = dp ∧ dq.

A Symplectic coordinates

Recall that the definition of manifold includes a compatibility condition for the charts of an atlas. This is a condition on the maps φ_i⁻¹φ_j going from one chart to another. The maps φ_i⁻¹φ_j are maps of a region of coordinate space.

Definition. An atlas of a manifold M²ⁿ is called symplectic if the standard symplectic structure ω² = dp ∧ dq is introduced into the coordinate space ℝ²ⁿ = {(p, q)}, and the transfer from one chart to another is realized by a canonical (i.e., ω²-preserving) transformation⁷¹ φ_i⁻¹φ_j.

Problem. Show that a symplectic atlas defines a symplectic structure on M²ⁿ.

The converse is also true: every symplectic manifold has a symplectic atlas. This follows from the following theorem.

B Darboux’s theorem

Theorem. Let ω² be a closed nondegenerate differential 2-form in a neighborhood of a point x in the space ℝ²ⁿ. Then in some neighborhood of x one can choose a coordinate system (p₁,..., p_n; q₁,..., q_n) such that the form has the standard form:

ω² = dp₁ ∧ dq₁ + ⋯ + dp_n ∧ dq_n.

This theorem allows us to extend to all symplectic manifolds any assertion of a local character which is invariant with respect to canonical transformations and is proven for the standard phase space (ℝ²ⁿ, ω² = dp ⋀ dq).

C Construction of the coordinates p₁ and q₁

For the first coordinate p₁ we take a non-constant linear function (we could have taken any differentiable function whose differential is not zero at the point x). For simplicity we will assume that p₁(x) = 0.

Let P₁ = I dp₁ denote the hamiltonian field corresponding to the function p₁ (Figure 179). Note that P₁(x) ≠ 0; therefore, we can draw a hyperplane N^{2n − 1} through the point x which does not contain the vector P₁(x) (we could have taken any surface transverse to P₁(x) as N^{2n − 1}).

Figure 179

Construction of symplectic coordinates

Consider the hamiltonian flow P₁^t with hamiltonian function p₁. We consider the time t necessary to go from N to the point z = P₁^t(y) (y ∈ N) under the action of P₁^t as a function of the point z. By the usual theorems in the theory of ordinary differential equations, this function is defined and differentiable in a neighborhood of the point x ∈ ℝ²ⁿ. Denote it by q₁. Note that q₁ = 0 on N and that the derivative of q₁ in the direction of the field P₁ is equal to 1. Thus the Poisson bracket of the functions q₁ and p₁ we constructed is equal to 1:

(q₁, p₁) ≡ 1.

D Construction of symplectic coordinates by induction on n

If n = 1, the construction is finished. Let n > 1. We will assume that Darboux’s theorem is already proved for ℝ^{2n − 2}. Consider the set M given by the equations p₁ = q₁ = 0. The differentials dp₁ and dq₁ are linearly independent at x since ω²(I dp₁, I dq₁) = (q₁, p₁) ≡ 1. Thus, by the implicit function theorem, the set M is a manifold of dimension 2n − 2 in a neighborhood of x; we will denote it by M^{2n − 2}.

Lemma. The symplectic structure ω² on ℝ²ⁿ induces a symplectic structure on some neighborhood of the point x on M^{2n − 2}.

Proof. For the proof we need only the nondegeneracy of ω² on TM_x. Consider the symplectic vector space Tℝ²ⁿ_x. The vectors P₁(x) and Q₁(x) of the hamiltonian vector fields with hamiltonian functions p₁ and q₁ belong to Tℝ²ⁿ_x. Let ξ ∈ TM_x. The derivatives of p₁ and q₁ in the direction ξ are equal to zero. This means that dp₁(ξ) = ω²(ξ, P₁) = 0 and dq₁(ξ) = ω²(ξ, Q₁) = 0. Thus TM_x is the skew-orthogonal complement to P₁(x), Q₁(x). By Section 41B, the form ω² on TM_x is nondegenerate. □

By the induction hypothesis there are symplectic coordinates in a neighborhood of the point x on the symplectic manifold (M^{2n − 2}, ω²|_M). Denote them by p_i, q_i (i = 2,..., n). We extend the functions p₂,..., q_n to a neighborhood of x in ℝ²ⁿ in the following way. Every point z in a neighborhood of x in ℝ²ⁿ can be uniquely represented in the form z = P₁^t Q₁^s w, where w ∈ M^{2n − 2}, and s and t are small numbers. We set the values of the coordinates p₂,..., q_n at z equal to their values at the point w (Figure 179). The 2n functions p₁,..., p_n, q₁,..., q_n thus constructed form a local coordinate system in a neighborhood of x in ℝ²ⁿ.

E Proof that the coordinates constructed are symplectic

Denote by P_i^t and Q_i^t (i = 1,..., n) the hamiltonian flows with hamiltonian functions p_i and q_i, and by P_i and Q_i the corresponding vector fields. We will compute the Poisson brackets of the functions p₁,..., q_n. We already saw in C that (q₁, p₁) ≡ 1. Therefore, the flows P₁^t and Q₁^s commute:

P₁^t Q₁^s = Q₁^s P₁^t.

Recalling the definitions of p₂,..., q_n we see that each of these functions is invariant with respect to the flows P₁^t and Q₁^s. Thus the Poisson brackets of p₁ and q₁ with all 2n − 2 functions p_i, q_i (i > 1) are equal to zero.

The map P₁^t Q₁^s therefore commutes with all 2n − 2 flows P_i^t, Q_i^t (i > 1). Consequently, it leaves each of the 2n − 2 vector fields P_i, Q_i (i > 1) fixed. The map P₁^t Q₁^s preserves the symplectic structure ω² since the flows P₁^t and Q₁^s are hamiltonian; therefore, the values of the form ω² on the vectors of any two of the 2n − 2 fields P_i, Q_i (i > 1) are the same at the points z = P₁^t Q₁^s w ∈ ℝ²ⁿ and w ∈ M^{2n − 2}. But these values are equal to the values of the Poisson brackets of the corresponding hamiltonian functions. Thus, the values of the Poisson bracket of any two of the 2n − 2 coordinates p_i, q_i (i > 1) at the points z and w are the same if z = P₁^t Q₁^s w.

The functions p₁ and q₁ are first integrals of each of the 2n − 2 flows P_i^t, Q_i^t (i > 1). Therefore, each of the 2n − 2 fields P_i, Q_i is tangent to the level manifold p₁ = q₁ = 0. But this manifold is M^{2n − 2}. Therefore, each of the 2n − 2 fields P_i, Q_i (i > 1) is tangent to M^{2n − 2}. Consequently, these fields are hamiltonian fields on the symplectic manifold (M^{2n − 2}, ω²|_M), and the corresponding hamiltonian functions are p_i|_M, q_i|_M (i > 1). Thus, in the whole space (ℝ²ⁿ, ω²), the Poisson bracket of any two of the 2n − 2 coordinates p_i, q_i (i > 1) considered on M^{2n − 2} is the same as the Poisson bracket of these coordinates in the symplectic space (M^{2n − 2}, ω²|_M).

But, by our induction hypothesis, the coordinates on M^{2n − 2} (p_i|_M, q_i|_M; i > 1) are symplectic. Therefore, in the whole space ℝ²ⁿ, the Poisson brackets of the constructed coordinates have the standard values

(p_i, p_j) ≡ 0, (q_i, q_j) ≡ 0, (q_i, p_j) ≡ δ_ij.

The Poisson brackets of the coordinates p, q on ℝ²ⁿ have the same form if ω² = ∑dp_i ∧ dq_i. But a bilinear form ω² is determined by its values on pairs of basis vectors. Therefore, the Poisson brackets of the coordinate functions determine the shape of ω² uniquely. Thus

ω² = dp₁ ∧ dq₁ + ⋯ + dp_n ∧ dq_n,

and Darboux’s theorem is proved. □

⁶⁰

Or Lie bracket [Trans. note].

⁶¹

By theorems of existence, uniqueness, and differentiability in the theory of ordinary differential equations, the group A^t is defined if the manifold M is compact. In the general case the maps A^t are defined only in a neighborhood of x and only for small t; this is enough for the following constructions.

⁶²

In many books the bracket is given the opposite sign. Our sign agrees with the sign of the commutator in the theory of Lie groups (cf. subsection F).

⁶³

In some riemannian metric on M.

⁶⁴

Our choice of sign in the definition of Poisson bracket was determined by this correspondence.

⁶⁵

Not just up to a constant.

⁶⁶

A 2-form [,] on ℝ²ⁿ is nondegenerate if ([ξ, η] = 0, ∀η) ⇒ (ξ = 0).

⁶⁷

Null planes are also called isotropic, and for k = n, lagrangian.

⁶⁸

Two subspaces L₁ and L₂ of a vector space L are transverse if L₁ + L₂ = L. Two n-dimensional planes in ℝ²ⁿ are transverse if and only if they intersect only in 0.

⁶⁹

A reflexive polynomial is a polynomial a₀x^m + a₁x^{m − 1} + ⋯ + a_m which has symmetric coefficients a₀ = a_m, a₁ = a_{m − 1},....

⁷⁰

S₁ is “sufficiently close” to S if the elements of the matrix of S₁ in a fixed basis differ from the elements of the matrix of S in the same basis by less than a sufficiently small number ε.

⁷¹

Complex-analytic manifolds, for example, are defined analogously; there must be a complex-analytic structure on coordinate space, and the transfer from one chart to another must be complex analytic.

9
Canonical formalism

The coordinate point of view will predominate in this chapter. The technique of generating functions for canonical transformations, developed by Hamilton and Jacobi, is the most powerful method available for integrating the differential equations of dynamics. In addition to this technique, the chapter contains an “odd-dimensional” approach to hamiltonian phase flows.

This chapter is independent of the previous one. It contains new proofs of several of the results in Chapter 8, as well as an explanation of the origin of the theory of symplectic manifolds.

44 The integral invariant of Poincaré-Cartan

In this section we look at the geometry of 1-forms in an odd-dimensional space.

A A hydrodynamical lemma

Let v be a vector field in three-dimensional oriented euclidean space ℝ³, and r = curl v its curl. The integral curves of r are called vortex lines. If γ₁ is any closed curve in ℝ³ (Figure 180), the vortex lines passing through the points of γ₁ form a tube called a vortex tube.

Figure 180

Vortex tube

Let γ₂ be another curve encircling the same vortex tube, so that γ₁ − γ₂ = ∂σ, where σ is a 2-cycle representing a part of the vortex tube. Then:

Stokes’ lemma. The field v has equal circulation along the curves γ₁ and γ₂:

∫_γ₁ v dl = ∫_γ₂ v dl.

Proof. By Stokes’ formula, ∫_γ₁ v dl − ∫_γ₂ v dl = ∫∫_σ curl v dn = 0, since curl v is tangent to the vortex tube. ☐

B The multi-dimensional Stokes’ lemma

It turns out that Stokes’ lemma generalizes to the case of any odd-dimensional manifold M^{2n + 1} (in place of ℝ³). To formulate this generalization we replace our vector field by a differential form.

The circulation of a vector field v is the integral of the 1-form ω¹ (ω¹(ξ) = (v, ξ)). To the curl of v there corresponds the 2-form ω² = dω¹ (dω¹(ξ, η) = (r, ξ, η), the triple product). It is clear from these formulas that there is a direction at every point (namely, the direction of r, Figure 181), having the property that the circulation of v along the boundary of every “infinitesimal square” containing r is equal to zero:

dω¹(r, η) = 0, ∀η.

In fact, dω¹(r, η) = (r, r, η) = 0.

Figure 181

Axis invariantly connected with a 2-form in an odd-dimensional space

Remark. Passing from the 2-form ω² = dω¹ to the vector field r = curl v is not an invariant operation: it depends on the euclidean structure of ℝ³. Only the direction⁷² of r is invariantly associated with ω² (and, therefore, with the 1-form ω¹). It is easy to verify that, if r ≠ 0, then the direction of r is uniquely determined by the condition that ω²(r, η) = 0 for all η.

The algebraic basis for the multi-dimensional Stokes’ lemma is the existence of an axis for every rotation of an odd-dimensional space.

Lemma. Let ω² be an exterior algebraic 2-form on the odd-dimensional vector space ℝ^{2n + 1}. Then there is a vector ξ ≠ 0 such that

ω²(ξ, η) = 0, ∀η ∈ ℝ^{2n + 1}.

Proof. A skew-symmetric form ω² is given by a skew-symmetric matrix A,

ω²(ξ, η) = (Aξ, η), A′ = −A,

of odd order 2n + 1. The determinant of such a matrix is equal to zero, since

det A = det A′ = det(−A) = (−1)^{2n + 1} det A = −det A.

Thus the determinant of A is zero. This means A has an eigenvector ξ ≠ 0 with eigenvalue 0, as was to be shown. ☐
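In ℝ³ the lemma can be checked by hand or by a few lines of code. The sketch below assumes the form ω²(ξ, η) = (r, ξ, η) from the hydrodynamical example, whose matrix A acts as Aξ = r × ξ; the vector r = (1, 2, 3) and the helper names are arbitrary illustrative choices.

```python
def skew_matrix(a, b, c):
    """Matrix A with A*xi = r x xi for r = (a, b, c); it is skew-symmetric."""
    return [[0, -c, b],
            [c, 0, -a],
            [-b, a, 0]]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def apply(m, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

r = (1, 2, 3)
A = skew_matrix(*r)
determinant = det3(A)        # vanishes because the order is odd
image_of_r = apply(A, r)     # r x r = 0, so r is a null vector
```

Here the null vector is r itself, in agreement with the remark that the direction of r is determined by ω²(r, η) = 0 for all η.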

A vector ξ for which ω²(ξ, η) = 0, ∀η is called a null vector for the form ω². The null vectors of ω² clearly form a linear subspace. The form ω² is called nonsingular if the dimension of this space is the minimal possible (i.e., 1 for an odd-dimensional space ℝ^{2n + 1} or 0 for an even-dimensional space).

Problem. Consider the 2-form ω² = dp₁ ⋀ dq₁ + ⋯ + dp_n ⋀ dq_n on an even-dimensional space ℝ²ⁿ with coordinates p₁,..., p_n; q₁,..., q_n. Show that ω² is nonsingular.

Problem. On an odd-dimensional space ℝ^{2n + 1} with coordinates p₁,..., p_n; q₁,..., q_n; t, consider the 2-form ω² = ∑dp_i ⋀ dq_i − ω¹ ⋀ dt, where ω¹ is any 1-form on ℝ^{2n + 1}. Show that ω² is nonsingular.

If ω² is a nonsingular form on an odd-dimensional space ℝ^{2n + 1}, then the null vectors ξ of ω² all lie on a line. This line is invariantly associated to the form ω².

Now let M^{2n + 1} be an odd-dimensional differentiable manifold and ω¹ a 1-form on M. By the lemma above, at every point x ∈ M there is a direction (i.e., a straight line {cξ} in the tangent space TM_x) having the property that the integral of ω¹ along the boundary of an “infinitesimal square containing this direction” is equal to zero:

dω¹(ξ, η) = 0, ∀η ∈ TM_x.

Suppose further that the 2-form dω¹ is nonsingular. Then the direction ξ is uniquely determined. We call it the “vortex direction” of the form ω¹.

The integral curves of the field of vortex directions are called the vortex lines (or characteristic lines) of the form ω¹.

Let γ₁ be a closed curve on M. The vortex lines going out from points of γ₁ form a “vortex tube.” We have

The multi-dimensional Stokes’ lemma. The integrals of a 1-form ω¹ along any two curves encircling the same vortex tube are the same:

∮_γ₁ ω¹ = ∮_γ₂ ω¹,

if γ₁ − γ₂ = ∂σ, where σ is a piece of the vortex tube.

Proof. By Stokes’ formula

∮_γ₁ ω¹ − ∮_γ₂ ω¹ = ∫_∂σ ω¹ = ∫_σ dω¹.

But the value of dω¹ on any pair of vectors tangent to the vortex tube is equal to zero. (These two vectors lie in a 2-plane containing the vortex direction, and dω¹ vanishes on this plane.) Thus, ∫_σ dω¹ = 0. ☐

C Hamilton’s equations

All the basic propositions of hamiltonian mechanics follow directly from Stokes’ lemma.

For M^{2n + 1} we will take the “extended phase space ℝ^{2n + 1}” with coordinates p₁,..., p_n; q₁,..., q_n; t. Suppose we are given a function H = H(p, q, t). Then we can construct⁷³ the 1-form

ω¹ = p dq − H dt (p dq = p₁ dq₁ + ⋯ + p_n dq_n).

We apply Stokes’ lemma to ω¹ (Figure 182).

Figure 182

Hamiltonian field and vortex lines of the form p dq − H dt.

Theorem. The vortex lines of the form ω¹ = p dq − H dt on the (2n + 1)-dimensional extended phase space p, q, t have a one-to-one projection onto the t axis, i.e., they are given by functions p = p(t), q = q(t). These functions satisfy the system of canonical differential equations with hamiltonian function H:

dp/dt = −∂H/∂q, dq/dt = ∂H/∂p. (1)

In other words, the vortex lines of the form p dq − H dt are the trajectories of the phase flow in the extended phase space, i.e., the integral curves of the canonical equations (1).

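Proof. The differential of the form p dq − H dt is equal to

dω¹ = ∑ (dp_i ∧ dq_i − (∂H/∂p_i) dp_i ∧ dt − (∂H/∂q_i) dq_i ∧ dt), the sum being taken over i = 1,..., n.

It is clear from this expression that the matrix of the 2-form dω¹ in the coordinates p, q, t has the form

A = (  0    −E    H_p )
    (  E     0    H_q )
    ( −H_p  −H_q   0  ),

where H_p = ∂H/∂p and H_q = ∂H/∂q (verify this!).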

The rank of this matrix is 2n (the upper left 2n-corner is non-degenerate); therefore, dω¹ is nonsingular. It can be verified directly that the vector (−H_q, H_p, 1) is an eigenvector of A with eigenvalue 0 (do it!). This means that it gives the direction of the vortex lines of the form p dq − H dt. But the vector (−H_q, H_p, 1) is also the velocity vector of the phase flow of (1). Thus the integral curves of (1) are the vortex lines of the form p dq − H dt, as was to be shown. ☐
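The verification requested in the proof can be sketched in code. The block layout of A used below, rows (0, −E, H_p), (E, 0, H_q), (−H_p, −H_q, 0), is an assumption matching the convention ω²(ξ, η) = (Aξ, η); the numerical values of the partial derivatives are arbitrary sample numbers, and the function names are illustrative.

```python
def vortex_matrix(hp, hq):
    """Matrix of d(p dq - H dt) in the coordinates (p, q, t),
    given the lists hp = dH/dp and hq = dH/dq at one point."""
    n = len(hp)
    size = 2 * n + 1
    a = [[0] * size for _ in range(size)]
    for i in range(n):
        a[i][n + i] = -1          # block -E
        a[n + i][i] = 1           # block  E
        a[i][2 * n] = hp[i]       # last column, upper part: H_p
        a[n + i][2 * n] = hq[i]   # last column, lower part: H_q
        a[2 * n][i] = -hp[i]      # last row: -H_p
        a[2 * n][n + i] = -hq[i]  # last row: -H_q
    return a

def apply(m, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

hp = [2, 5]        # sample values of dH/dp_1, dH/dp_2
hq = [27, -1]      # sample values of dH/dq_1, dH/dq_2
A = vortex_matrix(hp, hq)
xi = [-x for x in hq] + hp + [1]   # the vector (-H_q, H_p, 1)
image = apply(A, xi)
```

The vector image is zero for any choice of hp and hq, since each row of A pairs the entries of (−H_q, H_p, 1) so that they cancel.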

D A theorem on the integral invariant of Poincaré-Cartan

We now apply Stokes’ lemma. We obtain the fundamental

Theorem. Suppose that the two curves γ₁ and γ₂ encircle the same tube of phase trajectories of (1). Then the integrals of the form p dq − H dt along them are the same:

∮_γ₁ p dq − H dt = ∮_γ₂ p dq − H dt.

The form p dq − H dt is called the integral invariant of Poincaré-Cartan.⁷⁴

Proof. The phase trajectories are the vortex lines of the form p dq − H dt, and the integrals along closed curves contained in the same vortex tube are the same by Stokes’ lemma. ☐

We will consider, in particular, curves consisting of simultaneous states, i.e., lying in the planes t = const (Figure 183). Along such curves, dt = 0 and ∮ p dq − H dt = ∮ p dq. From the preceding theorem we obtain the important:

Figure 183

Poincaré’s integral invariant

Corollary 1. The phase flow preserves the integral of the form p dq = p₁ dq₁ + ⋯ + p_n dq_n on closed curves.

Proof. Let g_{t₀}^{t₁} be the transformation of the phase space (p, q) realized by the phase flow from time t₀ to t₁ (i.e., g_{t₀}^{t₁}(p₀, q₀) is the solution to the canonical equations (1) with initial conditions p(t₀) = p₀, q(t₀) = q₀). Let γ be any closed curve in the space ℝ²ⁿ ⊂ ℝ^{2n + 1} (t = t₀). Then g_{t₀}^{t₁}γ is a closed curve in the space ℝ²ⁿ (t = t₁), contained in the same tube of phase trajectories in ℝ^{2n + 1}. Since dt = 0 on γ and on g_{t₀}^{t₁}γ, we find by the preceding theorem that

∮_γ p dq = ∮_{g_{t₀}^{t₁}γ} p dq,

as was to be shown. ☐

The form p dq is called Poincaré’s relative integral invariant. It has a simple geometric meaning. Let σ be a two-dimensional oriented chain and γ = ∂σ. Then, by Stokes’ formula, we find

∮_γ p dq = ∬_σ dp ∧ dq.

Thus we have proved the important:

Corollary 2. The phase flow preserves the sum of the oriented areas of the projections of a surface onto the n coordinate planes (p_i, q_i):

∬_σ dp ∧ dq = ∬_{g_{t₀}^{t₁}σ} dp ∧ dq.

In other words, the 2-form ω² = dp ⋀ dq is an absolute integral invariant of the phase flow.

Example. For n = 1, ω² is area, and we obtain Liouville’s theorem: the phase flow preserves area.

E Canonical transformations

Let g be a differentiable mapping of the phase space ℝ²ⁿ = {(p, q)} to ℝ²ⁿ.

Definition. The mapping g is called canonical, or a canonical transformation, if g preserves the 2-form ω² = ∑ dp_i ⋀ dq_i.

It is clear from the argument above that this definition can be written in any of three equivalent forms:

1. g*ω² = ω² (g preserves the 2-form ∑ dp_i ∧ dq_i);

2. ∬_σ ω² = ∬_gσ ω², ∀σ (g preserves the sum of the areas of the projections of any surface);

3. ∮_γ p dq = ∮_gγ p dq (the form p dq is a relative integral invariant of g).

Problem. Show that definitions (1) and (2) are equivalent to (3) if the domain of the map in question is a simply connected region in the phase space ℝ²ⁿ; in the general case 3 ⇒ 2 ⇔ 1.

The corollaries above can now be formulated as:

Theorem. The transformation of phase space induced by the phase flow is canonical.⁷⁵
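For n = 1 the condition g*ω² = ω² says that the Jacobian determinant of g equals 1, which can be tested numerically. The sketch below is an illustration, not the argument of the text: it composes the exact phase flows of the two hamiltonian functions −cos q and p²/2 (each is a shear of the plane), so the composite map is canonical by the theorem, and it estimates the Jacobian by central differences. The step size, step count, sample point and function names are assumptions.

```python
import math

def kick_drift(p, q, dt=0.1, steps=50):
    """Alternate the exact flows of H = -cos q and H = p^2/2 for time dt each."""
    for _ in range(steps):
        p = p - dt * math.sin(q)   # flow of H = -cos q: dp/dt = -sin q, q fixed
        q = q + dt * p             # flow of H = p^2/2: dq/dt = p, p fixed
    return p, q

def jacobian_det(f, p, q, h=1e-6):
    """Central-difference estimate of det d(P, Q)/d(p, q) for (P, Q) = f(p, q)."""
    fp_plus, fp_minus = f(p + h, q), f(p - h, q)
    fq_plus, fq_minus = f(p, q + h), f(p, q - h)
    dP_dp = (fp_plus[0] - fp_minus[0]) / (2 * h)
    dQ_dp = (fp_plus[1] - fp_minus[1]) / (2 * h)
    dP_dq = (fq_plus[0] - fq_minus[0]) / (2 * h)
    dQ_dq = (fq_plus[1] - fq_minus[1]) / (2 * h)
    return dP_dp * dQ_dq - dP_dq * dQ_dp

det_at_sample = jacobian_det(kick_drift, 0.5, 1.0)
```

Up to finite-difference error the determinant equals 1 at every point, which for n = 1 is the statement that the map preserves area.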

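Let g: ℝ²ⁿ → ℝ²ⁿ be a canonical transformation: g preserves the form ω². Then g also preserves the exterior square of ω²:

g*(ω² ∧ ω²) = ω² ∧ ω²,

and, in the same way, every exterior power (ω²)^k. The exterior powers of the form ∑dp_i ∧ dq_i are proportional to the forms

ω^{2k} = ∑_{i₁ < ⋯ < i_k} dp_{i₁} ∧ ⋯ ∧ dp_{i_k} ∧ dq_{i₁} ∧ ⋯ ∧ dq_{i_k}.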

Thus we have proved

Theorem. Canonical transformations preserve the integral invariants ω⁴,..., ω²ⁿ.

Geometrically, the integral of the form ω^{2k} is the sum of the oriented volumes of the projections onto the coordinate planes (p_{i₁},..., p_{i_k}, q_{i₁},..., q_{i_k}). In particular, ω²ⁿ is proportional to the volume element, and we obtain:

Corollary. Canonical transformations preserve the volume element in phase space:

the volume of gD is equal to the volume of D, for any region D.

In particular, applying this to the phase flow we obtain

Corollary. The phase flow (1) has as integral invariants the forms ω², ω⁴,..., ω²ⁿ.

The last of these invariants is the phase volume, so we have again proved Liouville’s theorem.

45 Applications of the integral invariant of Poincaré-Cartan

In this paragraph we prove that canonical transformations preserve the form of Hamilton’s equations, that a first integral of Hamilton’s equations allows us to reduce immediately the order of the system by two and that motion in a natural lagrangian system proceeds along geodesics of the configuration space provided with a certain riemannian metric.

A Changes of variables in the canonical equations

The invariant nature of the connection between the form p dq − H dt and its vortex lines gives rise to a way of writing the equations of motion in any system of 2n + 1 coordinates in extended phase space {(p, q, t)}.

Let (x₁,..., x_{2n + 1}) be coordinate functions in some chart of extended phase space (considered as a manifold M^{2n + 1}, Figure 184). The coordinates (p, q, t) can be considered as giving another chart on M. The form ω¹ = p dq − H dt can be considered as a differential 1-form on M. Invariantly associated (not depending on the chart) to this form is a family of lines on M, the vortex lines. In the chart (p, q, t), these lines are represented as the trajectories of the phase flow

dp/dt = −∂H/∂q, dq/dt = ∂H/∂p (1)

with hamiltonian function H(p, q, t).

Figure 184

Change of variables in Hamilton’s equations

Suppose that in the coordinates (x₁,..., x_{2n + 1}) the form ω¹ is written as

p dq − H dt = X₁ dx₁ + ⋯ + X_{2n + 1} dx_{2n + 1}.

Theorem. In the chart (x_i), the trajectories of (1) are represented by the vortex lines of the form ∑X_i dx_i.

Proof. The vortex lines of the forms ∑X_i dx_i and p dq − H dt are the images in two different charts of the vortex lines of the same form on M. But the integral curves of (1) are the vortex lines of p dq − H dt. Thus, their images in the chart (x_i) are the vortex lines of the form ∑X_i dx_i. ☐

Corollary. Let (P₁,..., P_n; Q₁,..., Q_n; T) be a coordinate system on the extended phase space (p, q, t) and K(P, Q, T) and S(P, Q, T) functions such that

p dq − H dt = P dQ − K dT + dS

(the left- and right-hand sides are forms on extended phase space).

Then the trajectories of the phase flow (1) are represented in the chart (P, Q, T) by the integral curves of the canonical equations

dP/dT = −∂K/∂Q, dQ/dT = ∂K/∂P. (2)

Proof. By the theorem above, the trajectories of (1) are represented by the vortex lines of the form P dQ − K dT + dS. But dS has no influence on the vortex lines (since ddS = 0). Therefore, the images of the trajectories of (1) are the vortex lines of the form P dQ − K dT. According to Section 44C, the vortex lines of such a form are integral curves of the canonical equations (2). ☐

In particular, let g: ℝ²ⁿ → ℝ²ⁿ be a canonical transformation of phase space taking a point with coordinates (p, q) to a point with coordinates (P, Q). The functions P(p, q) and Q(p, q) can be considered as new coordinates on phase space.

Theorem. In the new coordinates (P, Q) the canonical equations (1) have the canonical form⁷⁶

dP/dt = −∂K/∂Q, dQ/dt = ∂K/∂P (3)

with the same hamiltonian function: K(P, Q, t) = H(p, q, t).

Proof. Consider the 1-form p dq − P dQ on ℝ²ⁿ. For any closed curve γ we have (Figure 185)

∮_γ p dq − P dQ = ∮_γ p dq − ∮_gγ p dq = 0,

since g is canonical. Therefore, the integral

S(p₁, q₁) = ∫_{(p₀, q₀)}^{(p₁, q₁)} p dq − P dQ

does not depend on the path of integration but only on the endpoint (p₁, q₁) (for a fixed initial point (p₀, q₀)). Thus dS = p dq − P dQ. Consequently, in the extended phase space, we have

p dq − H dt = P dQ − H dt + dS.

Figure 185

Closedness of the form p dq − P dQ

Thus, the theorem above is applicable, and (2) is transformed to (3). ☐
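As an illustration of the theorem, consider the following worked example for n = 1. The transformation g: (p, q) ↦ (P, Q) = (−q, p) and the harmonic-oscillator hamiltonian are choices made here for the sake of the example; they do not appear in the text.

```latex
% Hypothetical example: g(p, q) = (P, Q) with P = -q, Q = p.
\begin{align*}
dP \wedge dQ &= -\,dq \wedge dp = dp \wedge dq
  && \text{(so $g$ is canonical)},\\
p\,dq - P\,dQ &= p\,dq + q\,dp = d(pq)
  && \text{(so $S = pq$)},\\
K(P, Q) &= H(p, q) = H(Q, -P).
\end{align*}
% For H = (p^2 + q^2)/2 we get K = (P^2 + Q^2)/2, and equations (3) read
\begin{align*}
\frac{dP}{dt} &= -\frac{\partial K}{\partial Q} = -Q, &
\frac{dQ}{dt} &= \frac{\partial K}{\partial P} = P,
\end{align*}
% which agrees with (1): dP/dt = -dq/dt = -p = -Q and dQ/dt = dp/dt = -q = P.
```

The same check goes through for an arbitrary H(p, q), since ∂K/∂Q = ∂H/∂p and ∂K/∂P = −∂H/∂q under this substitution.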

Problem. Let g(t): ℝ²ⁿ → ℝ²ⁿ be a canonical transformation of phase space depending on the parameter t, g(t)(p, q) = (P(p, q, t), Q(p, q, t)). Show that in the variables P, Q, t the canonical equations (1) have the canonical form with new hamiltonian function

K(P, Q, t) = H(p, q, t) + ∂S/∂t + P ∂Q/∂t,

where

S(p₁, q₁, t) = ∫_{(p₀, q₀)}^{(p₁, q₁)} p dq − P dQ.

B Reduction of order using the energy integral

Suppose now that the hamiltonian function H(p, q) does not depend on time. Then the canonical equations (1) have a first integral: H(p(t), q(t)) = const. It turns out that by using this integral we can reduce the dimension (2n + 1) of the extended phase space by two, thereby reducing the problem to integration of a system of canonical equations in a (2n − 1)-dimensional space.

We assume that (in some region) the equation h = H(p₁,..., p_n; q₁,..., q_n) can be solved for p₁:

p₁ = K(P, Q, T; h),

where P = (p₂,..., p_n); Q = (q₂,..., q_n); T = −q₁. Then we find

p dq − H dt = P dQ − K dT − d(Ht) + t dH.

Now let γ be an integral curve of the canonical equations (1) lying on the 2n-dimensional surface H(p, q) = h in ℝ^{2n + 1}. Then γ is a vortex line of the form p dq − H dt (Figure 186). We project the extended phase space ℝ^{2n + 1} = {(p, q, t)} onto the phase space ℝ²ⁿ = {(p, q)}. The surface H = h is projected onto a (2n − 1)-dimensional manifold M^{2n − 1}: H(p, q) = h in ℝ²ⁿ, and γ is projected to a curve γ̄ lying on this submanifold. The variables P, Q, T form local coordinates on M^{2n − 1}.

Figure 186

Lowering the order of a hamiltonian system

Problem. Show that the curve γ̄ is a vortex line of the form p dq = P dQ − K dT on M^{2n − 1}. Hint. d(Ht) does not affect the vortex lines, and dH is zero on M.

But the vortex lines of P dQ − K dT satisfy Hamilton’s equations (2). Thus we have proved

Theorem. The phase trajectories of the equations (1) on the surface M^{2n − 1}, H = h, satisfy the canonical equations

dp_i/dq₁ = ∂K/∂q_i, dq_i/dq₁ = −∂K/∂p_i (i = 2,..., n),

where the function K(p₂,..., p_n; q₂,..., q_n; T, h) is defined by the equation H(K, p₂,..., p_n; −T, q₂,..., q_n) = h.

C The principle of least action in phase space

In the extended phase space {(p, q, t)}, we consider an integral curve of the canonical equations (1) connecting the points (p₀, q₀, t₀) and (p₁, q₁, t₁).

Theorem. The integral ∫ p dq − H dt has γ as an extremal under variations of γ for which the ends of the curve remain in the n-dimensional subspaces (t = t₀, q = q₀) and (t = t₁, q = q₁).

Proof. The curve γ is a vortex line of the form p dq − H dt (Figure 187). Therefore, the integral of p dq − H dt over an “infinitely small parallelogram passing through the vortex direction” is equal to zero. In other words, the increment ∫_γ′ − ∫_γ p dq − H dt is small to a higher order in comparison with the difference of the curves γ and γ′, as was to be shown.

Figure 187

Principle of least action in phase space

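If this argument does not seem rigorous enough, it can be replaced by the computation

δ ∫_γ (p q̇ − H) dt = ∫_γ (q̇ δp + p δq̇ − (∂H/∂p) δp − (∂H/∂q) δq) dt
= p δq |_{t₀}^{t₁} + ∫_γ [(q̇ − ∂H/∂p) δp − (ṗ + ∂H/∂q) δq] dt.

The boundary term vanishes because δq = 0 at both ends of the curve.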

We see that the integral curves of Hamilton’s equations are the only extremals of the integral ∫ p dq − H dt in the class of curves γ whose ends lie in the n-dimensional subspaces (t = t₀, q = q₀) and (t = t₁, q = q₁) of extended phase space. ☐
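The extremal property can be probed numerically. The following sketch makes several assumptions of its own: the hamiltonian is H = (p² + q²)/2, the base curve is q = sin t, p = p_scale·cos t on 0 ≤ t ≤ 2 (a solution of Hamilton’s equations only for p_scale = 1), the variation of q vanishes at both ends while the variation of p is unconstrained, and the integral is approximated by the midpoint rule. All names are illustrative.

```python
import math

def action(eps, p_scale=1.0, n=2000, T=2.0):
    """Midpoint-rule value of the integral of (p*dq/dt - H) dt, H = (p^2 + q^2)/2,
    along q = sin t + eps*a(t), p = p_scale*cos t + eps*b(t)."""
    total = 0.0
    dt = T / n
    for k in range(n):
        t = (k + 0.5) * dt
        a = math.sin(math.pi * t / T)                   # zero at t = 0 and t = T
        a_dot = (math.pi / T) * math.cos(math.pi * t / T)
        b = 1.0 + t                                     # p is free at the ends
        q = math.sin(t) + eps * a
        q_dot = math.cos(t) + eps * a_dot
        p = p_scale * math.cos(t) + eps * b
        total += (p * q_dot - 0.5 * (p * p + q * q)) * dt
    return total

def first_variation(p_scale, eps=1e-3):
    """Symmetric difference quotient of the action in the direction (a, b)."""
    return (action(eps, p_scale) - action(-eps, p_scale)) / (2 * eps)

on_trajectory = first_variation(1.0)    # base curve solves Hamilton's equations
off_trajectory = first_variation(0.5)   # base curve violates dq/dt = dH/dp
```

On the true trajectory the first variation vanishes up to discretization error, while for the distorted curve it is clearly different from zero.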

Remark. The principle of least action in Hamilton’s form is a particular case of the principle considered above. Along extremals, we have

∫_{t₀}^{t₁} p dq − H dt = ∫_{t₀}^{t₁} (p q̇ − H) dt = ∫_{t₀}^{t₁} L dt

(since the lagrangian L and the hamiltonian H are Legendre transforms of one another). Now let γ̄ (Figure 188) be the projection of the extremal γ onto the q, t plane. To any nearby curve γ̄′ connecting the same points (t₀, q₀) and (t₁, q₁) in the q, t plane we associate a curve γ′ in the phase space (p, q, t) by setting p = ∂L/∂q̇. Then, along γ′, too, ∫_γ′ p dq − H dt = ∫_γ̄′ L dt. But by the theorem above, δ ∫_γ p dq − H dt = 0 for any variation of the curve γ (with boundary conditions (t = t₀, q = q₀) and (t = t₁, q = q₁)). In particular, this is true for variations of the special form taking γ to γ′. Thus γ̄ is an extremal of ∫ L dt, as was to be shown.

Figure 188

Comparison curves for the principles of least action in the configuration and phase spaces

In the theorem above we are allowed to compare γ with a significantly wider class of curves γ′ than in Hamilton’s principle: there are no restrictions placed on the relation of p with q̇.

Surprisingly, one can show that the two principles are nevertheless equivalent: an extremal in the narrower class of variations (p = ∂L/∂q̇) is an extremal under all variations. The explanation is that, for fixed q̇, the value p = ∂L/∂q̇ is an extremal of p q̇ − H (cf. the definition of the Legendre transform, Section 14).

D The principle of least action in the Maupertuis-Euler-Lagrange-Jacobi form

Suppose now that the hamiltonian function H(p, q) does not depend on time. Then H(p, q) is a first integral of Hamilton’s equations (1). We project the surface H(p, q) = h from the extended phase space {(p, q, t)} to the space {(p, q)}. We obtain a (2n − 1)-dimensional surface H(p, q) = h in ℝ²ⁿ, which we already studied in subsection B and which we denoted by M^{2n − 1}.

The phase trajectories of the canonical equations (1) beginning on the surface M^{2n − 1} lie entirely in M^{2n − 1}. They are the vortex lines of the form p dq = P dQ − K dT (in the notation of B) on M^{2n − 1}. By the theorem in subsection C, the curves (1) on M^{2n − 1} are extremals for the variational principle corresponding to this form. Therefore, we have proved

Theorem. If the hamiltonian function H = H(p, q) does not depend on time, then the phase trajectories of the canonical equations (1) lying on the surface M^{2n − 1}: H(p, q) = h are extremals of the integral ∫ p dq in the class of curves lying on M^{2n − 1} and connecting the subspaces q = q₀ and q = q₁.

We now consider the projection onto the q-space of an extremal lying on the surface M^{2n − 1}: H(p, q) = h. This curve connects the points q₀ and q₁. Let γ be another curve connecting the points q₀ and q₁ (Figure 189). The curve γ is the projection of some curve γ̂ on M^{2n − 1}. Specifically, we parametrize γ by τ, a ≤ τ ≤ b, γ(a) = q₀, γ(b) = q₁. Then at every point q of γ there is a velocity vector q̇ = dγ(τ)/dτ, and the corresponding momentum p = ∂L/∂q̇. If the parameter τ is chosen so that H(p, q) = h, then we obtain a curve γ̂: q = γ(τ), p = ∂L/∂q̇ on the surface M^{2n − 1}. Applying the theorem above to the curve γ̂ on M^{2n − 1}, we obtain

Figure 189

Maupertuis’ principle

Corollary. Among all curves q = γ(τ) connecting the two points q₀ and q₁ on the plane q and parametrized so that the hamiltonian function has a fixed value H(∂L/∂q̇, q) = h, the trajectory of the equations of dynamics (1) is an extremal of the integral of “reduced action”

∫_γ p dq = ∫_γ p q̇ dτ = ∫_γ (∂L/∂q̇)(τ) q̇(τ) dτ.

This is also the principle of least action of Maupertuis (Euler-Lagrange-Jacobi).⁷⁷ It is important to note that the interval a ≤ τ ≤ b parametrizing the curve γ is not fixed and can be different for different curves being compared. On the other hand, the energy (the hamiltonian function) must be the same. We note also that the principle determines the shape of a trajectory but not the time: in order to determine the time we must use the energy constant.

The principle above takes a particularly simple form in the case when the system represents inertial motion on a smooth manifold.

Theorem. A point mass confined to a smooth riemannian manifold moves along geodesic lines (i.e., along extremals of the length ∫ ds).

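Proof. In this case,

H = L = T = ½ (ds/dτ)², (∂L/∂q̇) q̇ = 2T = (ds/dτ)².

Therefore, in order to guarantee a fixed value of H = h, the parameter must be chosen proportional to the length: dτ = ds/√(2h). The reduced action integral is then equal to

∫_γ (∂L/∂q̇) q̇ dτ = ∫_γ √(2h) ds = √(2h) ∫_γ ds;

therefore, extremals are geodesics of our manifold. ☐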

In the case when there is a potential energy, the trajectories of the equations of dynamics are also geodesics in a certain riemannian metric.

Let ds² be a riemannian metric on configuration space which gives the kinetic energy (so that T = ½ (ds/dt)²). Let h be a constant.

Theorem. In the region of configuration space where U(q) < h we define a riemannian metric by the formula

dρ = √(h − U(q)) ds.

Then the trajectories of the system with kinetic energy T = ½ (ds/dt)², potential energy U(q), and total energy h will be geodesic lines of the metric dρ.

Proof. In this case L = T − U, H = T + U, and (∂L/∂q̇) q̇ = 2T = (ds/dτ)² = 2(h − U). Therefore, in order to guarantee a fixed value of H = h, the parameter τ must be chosen proportional to length: dτ = ds/√(2(h − U)). The reduced action integral will then be equal to

∫_γ (∂L/∂q̇) q̇ dτ = ∫_γ √(2(h − U)) ds = √2 ∫_γ dρ.

By Maupertuis’ principle, the trajectories are geodesics in the metric dρ, as was to be shown. ☐
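The identity used in this proof can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: a point of unit mass in the plane, the potential U = g·y, and a parabolic trajectory launched from the origin. The function name and its parameters are hypothetical. It compares the reduced action ∫ 2T dt with √2 ∫ dρ, where dρ = √(h − U) ds, along the true motion.

```python
import math

def reduced_action_and_jacobi_length(vx, vy, g, t_end, steps=20000):
    """Projectile of unit mass in uniform gravity U = g*y, launched from the origin.

    Returns (integral of 2T dt, sqrt(2) * integral of sqrt(h - U) ds),
    both computed by the midpoint rule along the true trajectory.
    """
    h = 0.5 * (vx * vx + vy * vy)      # total energy at launch (U = 0 there)
    dt = t_end / steps
    action = 0.0
    rho = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        y = vy * t - 0.5 * g * t * t   # height at time t
        wy = vy - g * t                # vertical velocity at time t
        speed2 = vx * vx + wy * wy
        action += speed2 * dt          # p * qdot = 2T for unit mass
        ds = math.sqrt(speed2) * dt    # euclidean arc length element
        rho += math.sqrt(h - g * y) * ds
    return action, math.sqrt(2.0) * rho

if __name__ == "__main__":
    print(reduced_action_and_jacobi_length(1.0, 2.0, 9.8, 0.3))
```

The two numbers agree up to rounding because, on a trajectory of energy h, ds = √(2(h − U)) dt. The check does not by itself show that the trajectory is a geodesic of dρ; it only confirms that the two functionals take the same value on it.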

Remark 1. The metric dρ is obtained from ds by a “stretching” depending on the point q but not depending on the direction. Therefore, angles in the metric dρ are the same as angles in the metric ds. On the boundary of the region U ≤ h the metric dρ has a singularity: the closer we come to the boundary, the smaller the ρ-length becomes. In particular, the length of any curve lying in the boundary (U = h) is equal to zero.

Remark 2. If the initial and endpoints of a geodesic γ are sufficiently close, then the extremum of length is a minimum. This justifies the name “principle of least action.” In general, an extremum of the action is not necessarily a minimum, as we see by considering geodesics on the unit sphere (Figure 190). Every arc of a great circle is a geodesic, but only those with length less than π are minimal: the arc NS′M is shorter than the great circle arc NSM.

Figure 190

Non-minimal geodesic

Remark 3. If h is larger than the maximum value of U on the configuration space, then the metric dρ has no singularities; therefore, we can apply topological theorems about geodesics on riemannian manifolds to the study of mechanical systems. For example, we consider the torus T² with some riemannian metric. Among all closed curves on T² making m rotations around the parallel and n around the meridian, there exists a curve of shortest length (Figure 191). This curve is a closed geodesic (for a proof see books on the calculus of variations or “Morse theory”). On the other hand, the torus T² is the configuration space of a planar double pendulum. Therefore,

Figure 191

Periodic motion of a double pendulum

Theorem. For any integers m and n there is a periodic motion of the double pendulum under which one segment makes m rotations while the other segment makes n rotations.

Furthermore, such periodic motions exist for any sufficiently large values of the constant h (h must be larger than the potential energy at the highest position).

As a last example we consider a rigid body fastened at a stationary point and located in an arbitrary potential field. The configuration space (SO(3)) is not simply connected: there exist non-contractible curves in it. The above arguments imply

Theorem. In any potential force field, there exists at least one periodic motion of the body. Furthermore, there exist periodic motions for which the total energy h is arbitrarily large.

46 Huygens’ principle

The fundamental notions of hamiltonian mechanics (momenta, the hamiltonian function H, the form p dq − H dt and the Hamilton-Jacobi equations, all of which we will be concerned with below) arose by transforming several very simple and natural notions of geometric optics, guided by a particular variational principle (that of Fermat), into general variational principles (and in particular into Hamilton’s principle of stationary action, δ ∫ L dt = 0).

A Wave fronts

We consider briefly⁷⁸ the fundamental notions of geometric optics. According to the extremal principle of Fermat, light travels from a point q₀ to a point q₁ in the shortest possible time. The speed of the light can depend both on the point q (an “inhomogeneous medium”) and on the direction of the ray (in an “anisotropic medium,” such as a crystal). The characteristics of a medium can be described by giving a surface (the “indicatrix”) in the tangent space at each point q. To do this, we take in every direction the velocity vector of the propagation of light at the given point in the given direction (Figure 192).

Figure 192

An anisotropic, inhomogeneous medium

Now let t > 0. We look at the set of all points q to which light from a given point q₀ can travel in time less than or equal to t. The boundary of this set, Φ_q₀(t), is called the wave front of the point q₀ after time t and consists of points to which light can travel in time t and not faster.

There is a remarkable relation, discovered by Huygens, between the wave fronts corresponding to different values of t. (Figure 193)

Figure 193

Envelope of wave fronts

Huygens’ theorem. Let Φ_q₀(t) be the wave front of the point q₀ after time t. For every point q of this front, consider the wave front after time s, Φ_q(s). Then the wave front of the point q₀ after time s + t, Φ_q₀(s + t), will be the envelope of the fronts Φ_q(s), q ∈ Φ_q₀(t).

Proof. Let q_{t + s} ∈ Φ_q₀(t + s). Then there exists a path from q₀ to q_{t + s} along which the time of travel of light equals t + s, and there is none shorter. We look at the point q_t on this path, to which light travels in time t. No shorter path from q₀ to q_t can exist; otherwise, the path q₀q_{t + s} would not be the shortest. Therefore, the point q_t lies on the front Φ_q₀(t). In exactly the same way light travels the path q_t q_{t + s} in time s, and there is no shorter path from q_t to q_{t + s}. Therefore, the point q_{t + s} lies on the front of the point q_t at time s, Φ_{q_t}(s). We will show that the fronts Φ_{q_t}(s) and Φ_q₀(t + s) are tangent. In fact, if they crossed each other (Figure 194), then it would be possible to reach some points of Φ_q₀(t + s) from q_t in time less than s, and therefore from q₀ in time less than s + t. This contradicts the definition of Φ_q₀(t + s); and so the fronts Φ_{q_t}(s) and Φ_q₀(t + s) are tangent at the point q_{t + s}, as was to be proved. ☐

Figure 194

Proof of Huygens’ theorem

The theorem which has been proved is called Huygens’ principle. It is clear that the point q₀ could be replaced by a curve, surface, or, in general, by a closed set, the three-dimensional space {q} by any smooth manifold, and propagation of light by the propagation of any disturbance transmitting itself “locally.”
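The envelope construction is easy to see in the simplest possible medium. The sketch below assumes a homogeneous, isotropic plane medium with constant speed c (an assumption made only for this illustration; the function name is hypothetical). There the front of q₀ after time t is a circle of radius ct, each secondary front is a circle of radius cs, and the outer envelope of the secondary fronts should be the circle of radius c(s + t).

```python
import math

def envelope_radius(c, t, s, n_centers=360, n_points=360):
    """Largest distance from q0 = (0, 0) reached by the secondary fronts.

    Medium: homogeneous and isotropic with speed c (an assumption), so the
    front of q0 after time t is the circle of radius c*t, and each secondary
    front after time s is a circle of radius c*s centred on a point of it.
    """
    best = 0.0
    for i in range(n_centers):
        a = 2.0 * math.pi * i / n_centers
        cx, cy = c * t * math.cos(a), c * t * math.sin(a)   # point q on the front
        for j in range(n_points):
            b = 2.0 * math.pi * j / n_points
            x = cx + c * s * math.cos(b)                    # point of the front of q
            y = cy + c * s * math.sin(b)
            best = max(best, math.hypot(x, y))
    return best

if __name__ == "__main__":
    print(envelope_radius(2.0, 1.0, 0.5))   # compare with c * (t + s)
```

The secondary circles also have an inner envelope, of radius c|t − s|; it consists of points that light reaches sooner, so it is not part of the wave front.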

Huygens’ principle reduces to two descriptions of the process of propagation. First, we can trace the rays, i.e., the shortest paths of the propagation of light. In this case the local character of the propagation is given by a velocity vector q̇. If the direction of the ray is known, then the magnitude of the velocity vector is given by the characteristics of the medium (the indicatrix).

On the other hand, we can trace the wave fronts. Assuming that we are given a riemannian metric on the space {q}, we can talk about the velocity of motion of the wave front. We look, for example, at the propagation of light in a medium filling ordinary euclidean space. Then one can characterize the motion of the wave front by a vector p perpendicular to the front, which will be constructed in the following manner.

For every point q₀ we define the function S_q₀(q) as the optical length of the path from q₀ to q, i.e., the least time of the propagation of light from q₀ to q. The level set {q: S_q₀(q) = t} is nothing other than the wave front Φ_q₀(t) (Figure 195). The gradient of the function S (in the sense of the metric mentioned above) is perpendicular to the wave front and characterizes the motion of the wave front. In this connection, the bigger the gradient, the slower the front moves. Therefore, Hamilton called the vector

p = ∂S/∂q

the vector of normal slowness of the front.

Figure 195

Direction of a ray and direction of motion of the wave front

The direction of the ray q̇ and the direction of motion of the front p do not coincide in an anisotropic medium. However, they are related to one another by a simple relationship, easily derived from Huygens’ principle. Recall that the characteristics of the medium are at every point described by a surface of velocity vectors of light, the indicatrix.

Definition. The direction of the hyperplane tangent to the indicatrix at the point v is called conjugate to the direction v (Figure 196).

Figure 196

Conjugate hyperplane

Theorem. The direction of the wave front Φ_q₀(t) at the point q_t is conjugate to the direction of the ray q̇.

Proof. We look (Figure 197) at points q_τ of the ray q₀q_t, 0 ≤ τ ≤ t. Take ε very small. Then the front Φ_{q_{t − ε}}(ε) differs by quantities of order O(ε²) from the indicatrix at the point q_t, contracted by ε. By Huygens’ principle, this front Φ_{q_{t − ε}}(ε) is tangent to the front Φ_q₀(t) at the point q_t. Passing to the limit as ε → 0, we obtain the theorem. ☐

Figure 197

Conjugacy of the direction of a wave and of the front

If the auxiliary metric used to define the vector p is changed, the natural velocity of the motion of the front, i.e. both the magnitude and direction of the vector p, will be changed. However, the differential form p dq = dS on the space {q} = ℝ³ is defined in a way which is independent of the auxiliary metric; its value depends only on the chosen fronts (or rays). On the hyperplane conjugate to the velocity vector of a ray, this form is equal to zero, and its value on the velocity vector is equal to 1.⁷⁹
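The two conditions on the form p dq (zero on the conjugate hyperplane, one on the velocity vector) determine p from the indicatrix. The sketch below works this out for a hypothetical planar medium whose indicatrix is the ellipse v(θ) = (a cos θ, b sin θ); the ellipse, the parameter names and the helper functions are assumptions made for illustration only.

```python
import math

def ray_and_slowness(a, b, theta):
    """Elliptical indicatrix v(theta) = (a cos theta, b sin theta) (assumed).

    Returns the ray velocity v, a tangent vector w to the indicatrix at v,
    and the covector p fixed by the two conditions p(w) = 0 and p(v) = 1.
    """
    v = (a * math.cos(theta), b * math.sin(theta))
    w = (-a * math.sin(theta), b * math.cos(theta))   # dv/dtheta
    p = (math.cos(theta) / a, math.sin(theta) / b)
    return v, w, p

def pair(p, u):
    """Value of the covector p on the vector u."""
    return p[0] * u[0] + p[1] * u[1]

if __name__ == "__main__":
    v, w, p = ray_and_slowness(2.0, 1.0, 0.7)
    print(pair(p, v), pair(p, w))          # the two defining conditions
    print(p[0] * v[1] - p[1] * v[0])       # nonzero: p is not parallel to v
```

For a ≠ b the components of p are not proportional to those of v, which is the statement that the ray and the direction of motion of the front differ in an anisotropic medium. For a = b they are proportional.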

B The optical-mechanical analogy

We return now to mechanics. Here the trajectories of motion are also extremals of a variational principle, and one can construct mechanics as the geometric optics of a many-dimensional space, as Hamilton did; we will not develop this construction in full detail, but will only enumerate those optical concepts which led Hamilton to basic mechanical concepts.

Optics	Mechanics
Optical medium	Extended configuration space {(q, t)}
Fermat’s principle	Hamilton’s principle δ ∫ L dt = 0
Rays	Trajectories q(t)
Indicatrices	Lagrangian L
Normal slowness vector p of the front	Momentum p
Expression of p in terms of the velocity of the ray, q̇	Legendre transformation
1-form p dq	1-form p dq − H dt

The optical length of the path S_q₀(q) and Huygens’ principle have not yet been used. Their mechanical analogues are the action function and the Hamilton-Jacobi equation, to which we now turn.

C Action as a function of coordinates and time

Definition. The action function S(q, t) is the integral

S_{q₀, t₀}(q, t) = ∫_γ L dt

along the extremal γ connecting the points (q₀, t₀) and (q, t).

In order for this definition to be correct, we must take several precautions: we must require that the extremals going from the point (q₀, t₀) do not intersect elsewhere, but instead form a so-called “central field of extremals” (Figure 198). More precisely, we associate to every pair (q̇₀, t) a point (q, t) which is the end of the extremal with initial condition q(t₀) = q₀, q̇(t₀) = q̇₀. We say that an extremal γ is contained in a central field if the mapping (q̇₀, t) → (q, t) is nondegenerate (at the point corresponding to the extremal γ under consideration, and therefore in some neighborhood of it).

Figure 198

A central field of extremals

Figure 199

Extremal with a focal point which is not contained in any central field

It can be shown that for |t − t₀| small enough the extremal γ is contained in a central field.⁸⁰

We now look at a sufficiently small neighborhood of the endpoint (q, t) of our extremal. Every point of this neighborhood is connected to (q₀, t₀) by a unique extremal of the central field under consideration. This extremal depends differentiably on the endpoint (q, t). Therefore, in the indicated neighborhood the action function is correctly defined.

In geometric optics we were looking at the differential of the optical length of a path. It is natural here to look at the differential of the action function.

Theorem. The differential of the action function (for a fixed initial point) is equal to

dS = p dq − H dt,

where p = ∂L/∂q̇ and H = pq̇ − L are defined with the help of the terminal velocity q̇ of the trajectory γ.

Proof. We lift every extremal from (q, t)-space to the extended phase space {(p, q, t)}, setting p = ∂L/∂q̇, i.e., replacing the extremal by a phase trajectory. We then get an (n + 1)-dimensional manifold in the extended phase space consisting of phase trajectories, i.e., characteristic curves of the form p dq − H dt. We now give the endpoint (q, t) an increment (Δq, Δt), and consider the set of extremals connecting (q₀, t₀) with points of the segment q + θΔq, t + θΔt, 0 ≤ θ ≤ 1 (Figure 200). In phase space we get a quadrangle σ composed of characteristic curves of the form p dq − H dt, the boundary of which consists of two phase trajectories γ₁ and γ₂, a segment of a curve α lying in the space (q = q₀, t = t₀), and a segment of a curve β projecting to the segment (Δq, Δt). Since σ consists of characteristic curves of the form p dq − H dt, we have

0 = ∬_σ d(p dq − H dt) = ∫_∂σ p dq − H dt = (∫_γ₁ − ∫_γ₂ + ∫_β − ∫_α) p dq − H dt.

But, on the segment α, we have dq = 0, dt = 0. On the phase trajectories γ₁ and γ₂, p dq − H dt = L dt (Section 45C). So, the difference (∫_γ₂ − ∫_γ₁) p dq − H dt is equal to the increase of the action function, and we find

∫_β p dq − H dt = S(q + Δq, t + Δt) − S(q, t).

If now Δq → 0, Δt → 0, then

∫_β p dq − H dt = p Δq − H Δt + o(Δt, Δq),

which proves the theorem. ☐

Figure 200

Calculation of the differential of the action function

The form p dq − H dt was previously introduced artificially. We see now, by carrying out the optical-mechanical analogy, that it arises from examining the action function corresponding to the optical length of a path.

D The Hamilton-Jacobi equation

Recall that the “vector of normal slowness p” cannot be altogether arbitrary: it is subject to one condition, pq̇ = 1, following from Huygens’ principle. An analogous condition restricts the gradient of the action function S.

Theorem. The action function satisfies the equation

∂S/∂t + H(∂S/∂q, q, t) = 0. (1)

This nonlinear first-order partial differential equation is called the Hamilton-Jacobi equation.

Proof. It is sufficient to notice that, by the previous theorem,

∂S/∂t = −H(p, q, t), p = ∂S/∂q.

☐
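Both the formula for the differential of the action and the Hamilton-Jacobi equation can be checked on the simplest system. The sketch below assumes a free particle of unit mass in one dimension, L = q̇²/2 and H = p²/2, with the initial point (q₀, t₀) = (0, 0); this choice and the function names are illustrative assumptions. The extremals are straight lines, so the action has a closed form and its partial derivatives can be compared with p and −H by finite differences.

```python
def action(q, t, q0=0.0, t0=0.0):
    """Action of a free particle of unit mass along the straight extremal
    from (q0, t0) to (q, t): L = qdot**2 / 2 with constant qdot."""
    return (q - q0) ** 2 / (2.0 * (t - t0))

def check_differential(q, t, eps=1e-6):
    """Return (dS/dq - p, dS/dt + H) for the extremal ending at (q, t), t > 0."""
    qdot = q / t                       # terminal velocity (q0 = t0 = 0)
    p = qdot                           # p = dL/dqdot
    H = p * qdot - 0.5 * qdot ** 2     # H = p*qdot - L = p**2 / 2
    dS_dq = (action(q + eps, t) - action(q - eps, t)) / (2 * eps)
    dS_dt = (action(q, t + eps) - action(q, t - eps)) / (2 * eps)
    return dS_dq - p, dS_dt + H

if __name__ == "__main__":
    print(check_differential(1.3, 0.7))   # both entries should be near zero
```

Both residuals vanish up to the finite-difference error, which is the content of dS = p dq − H dt and hence of Equation (1) for this system.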

The relation just established between trajectories of mechanical systems (“rays”) and partial differential equations (“wave fronts”) can be used in two directions.

First, solutions of Equation (1) can be used for integrating the ordinary differential equations of dynamics. Jacobi’s method of integrating Hamilton’s canonical equations, presented in the next section, consists of just this.

Second, the relation of the ray and wave points of view allows one to reduce integration of the partial differential equations (1) to integration of a hamiltonian system of ordinary differential equations.

Let us go into this in a little more detail. For the Hamilton-Jacobi equation (1), the Cauchy problem is

S(q, t₀) = S₀(q), ∂S/∂t + H(∂S/∂q, q, t) = 0. (2)

In order to construct a solution to this problem, we look at the hamiltonian system

ṗ = −∂H/∂q, q̇ = ∂H/∂p.

We consider the initial conditions (Figure 201):

q(t₀) = q₀, p(t₀) = (∂S₀/∂q)|_{q₀}.

The solution of these equations with these initial conditions is represented in (q, t)-space by the curve q = q(t), which is the extremal of the principle δ ∫ L dt = 0 (where the lagrangian L(q, q̇, t) is the Legendre transformation with respect to p of the hamiltonian function H(p, q, t)). This extremal is called the characteristic of problem (2), emanating from the point q₀.

Figure 201

Characteristics for a solution of Cauchy’s problem for the Hamilton-Jacobi equation

If the value t₁ is sufficiently close to t₀, then the characteristics emanating from points close to q₀ do not intersect for t₀ ≤ t ≤ t₁, |q − q₀| < R. Furthermore, the values of q₀ and t can be taken as coordinates for points in the region |q − q₀| < R, t₀ ≤ t ≤ t₁ (Figure 201).

We now construct the “action function with initial condition S₀”:

S(A) = S₀(q₀) + ∫_{(q₀, t₀)}^{A} L(q, q̇, t) dt (3)

(integrating along the characteristic leading to A).

Theorem. The function (3) is a solution of problem (2).

Proof. The initial condition is clearly fulfilled. The fact that the Hamilton-Jacobi equation is satisfied is verified just as in the theorem on differentials of action functions (Figure 202).

Figure 202

The action function as a solution of the Hamilton-Jacobi equation

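By Stokes’ lemma, (∫_γ₁ − ∫_γ₂ + ∫_β − ∫_α) p dq − H dt = 0. But on α, H dt = 0 and p = ∂S₀/∂q, so ∫_α p dq − H dt = ∫_α dS₀ = S₀(q₀ + Δq) − S₀(q₀).
Further, γ₁ and γ₂ are phase trajectories, so ∫_γᵢ p dq − H dt = ∫_γᵢ L dt for i = 1, 2.
So ∫_β p dq − H dt = [S₀(q₀ + Δq) + ∫_γ₂ L dt] − [S₀(q₀) + ∫_γ₁ L dt] = S(A + ΔA) − S(A).
For Δt, Δq → 0, we get ∂S/∂t = −H, ∂S/∂q = p, which proves the theorem. ☐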
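The construction by characteristics can be carried out by hand in a simple case. The sketch below assumes H = p²/2, t₀ = 0 and the initial condition S₀(q) = q²/2; these data and the function names are chosen here for illustration and are not taken from the text. It builds S from formula (3) and checks by finite differences that the result satisfies Equation (1) and the initial condition.

```python
def S0(q):
    """Initial condition (an assumed example)."""
    return 0.5 * q * q

def solve_by_characteristics(q, t):
    """Value S(q, t) for H = p**2 / 2 with S(q, 0) = S0(q), valid for t > -1.

    The characteristic from q0 has p = S0'(q0) = q0 and q(t) = q0 + q0 * t,
    so the one passing through (q, t) starts at q0 = q / (1 + t).
    """
    q0 = q / (1.0 + t)
    p = q0                             # momentum is constant along the ray
    lagrangian = 0.5 * p * p           # L = p*qdot - H = p**2 / 2
    return S0(q0) + lagrangian * t     # formula (3) along the characteristic

def hj_residual(q, t, eps=1e-5):
    """Finite-difference value of dS/dt + (dS/dq)**2 / 2."""
    S = solve_by_characteristics
    S_t = (S(q, t + eps) - S(q, t - eps)) / (2 * eps)
    S_q = (S(q + eps, t) - S(q - eps, t)) / (2 * eps)
    return S_t + 0.5 * S_q ** 2

if __name__ == "__main__":
    print(hj_residual(0.8, 0.5))       # should be near zero
```

In this example the characteristics q = q₀(1 + t) all meet at t = −1, so the construction breaks down there; this is the kind of intersection that the restriction to t close to t₀ is meant to exclude.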

Problem. Show the uniqueness of the solution to problem (2).

Hint. Differentiate S along the characteristics.

Problem. Solve the Cauchy problem (2) for

Problem. Draw a graph of the multiple-valued “functions” S(q) and p(q) for t = t₃ (Figure 201).

Answer. Cf. Figure 203.

Figure 203
A typical singularity of a solution of the Hamilton-Jacobi equation

The point of self-intersection of the graph of S corresponds on the graph of p to the Maxwell line: the shaded areas are equal. The graph of S(q, t) has a singularity called a swallowtail at the point (0, t₂).

47 The Hamilton-Jacobi method for integrating Hamilton’s canonical equations

In this paragraph we define the generating function of a free canonical transformation.

The idea of the Hamilton-Jacobi method consists of the following. Under canonical changes of coordinates, the canonical form of the equations of motion is preserved, as is the hamiltonian function (Section 45A). Therefore, if we succeed in finding a canonical transformation which reduces the hamiltonian function to a form such that the canonical equations can be integrated, then we can also integrate the original canonical equations. It turns out that the problem of constructing such a canonical transformation reduces to the determination of a sufficiently large number of solutions to the Hamilton-Jacobi partial differential equation. The generating function of the desired canonical transformation must satisfy this equation.

Before turning to the apparatus of generating functions, we remark that it is unfortunately noninvariant and it uses, in an essential way, the coordinate structure in phase space {(p, q)}. It is necessary to use the apparatus of partial derivatives, in which even the notation is ambiguous.⁸¹

A Generating functions

Suppose that the 2n functions P(p, q) and Q(p, q) of the 2n variables p and q give a canonical transformation g: ℝ²ⁿ → ℝ²ⁿ. Then the 1-form p dq − P dQ is an exact differential (Section 45A):

p dq − P dQ = dS(p, q). (1)

Problem. Show the converse: if this form is an exact differential, then the transformation is canonical.

We now assume that, in a neighborhood of some point (p₀, q₀), we can take (Q, q) as independent coordinates. In other words, we assume that the following jacobian is not zero at (p₀, q₀):

det(∂(Q, q)/∂(p, q)) = det(∂Q/∂p) ≠ 0.

Such canonical transformations will be called free. In this case, the function S can be expressed locally in these coordinates: S(p, q) = S₁(Q, q).

Definition. The function S₁(Q, q) is called a generating function of our canonical transformation g.

We emphasize that S₁ is not a function on the phase space ℝ²ⁿ: it is a function on a region in the direct product ℝⁿ_q × ℝⁿ_Q of two n-dimensional coordinate spaces, whose points are denoted by q and Q. It follows from (1) that the “partial derivatives” of S₁ are

∂S₁(Q, q)/∂q = p and ∂S₁(Q, q)/∂Q = −P. (2)

Conversely, every function S₁ gives a canonical transformation g by formulas (2).

Theorem. Let S₁(Q, q) be a function given on a neighborhood of some point (Q₀, q₀) of the direct product of two n-dimensional euclidean spaces. If

det(∂²S₁(Q, q)/∂Q ∂q)|_{Q₀, q₀} ≠ 0,

then S₁ is a generating function of some free canonical transformation.

Proof. Consider the equation for the Q coordinates:

∂S₁(Q, q)/∂q = p.

By the implicit function theorem this equation can be solved to determine a function Q(p, q) in a neighborhood of the point (p₀, q₀), where p₀ = (∂S₁(Q, q)/∂q)|_{Q₀, q₀} (with Q(p₀, q₀) = Q₀). In fact, the determinant we need here is

det(∂²S₁(Q, q)/∂Q ∂q)|_{Q₀, q₀},

and this is different from zero by hypothesis.

We now consider the function

P₁(Q, q) = −∂S₁(Q, q)/∂Q

and set

P(p, q) = P₁(Q(p, q), q).

Then the local map g: ℝ²ⁿ → ℝ²ⁿ sending the point (p, q) to the point (P(p, q), Q(p, q)) will be canonical with generating function S₁, since by construction

p dq − P dQ = (∂S₁(Q, q)/∂q) dq + (∂S₁(Q, q)/∂Q) dQ.

It is free, since det(∂Q/∂p) = det(∂²S₁(Q, q)/∂Q ∂q)^{− 1} ≠ 0. ☐

The transformation g: ℝ²ⁿ → ℝ²ⁿ is given in general by 2n functions of 2n variables. We see that a canonical transformation is given entirely by one function of 2n variables—its generating function. It is easy to see how useful generating functions are in all calculations related to canonical transformations. This becomes even more so as the number of variables, 2n, becomes large.
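A single function of (Q, q) really does encode the whole transformation, as a one-degree-of-freedom example shows. The sketch below assumes the generating function S₁(Q, q) = (q²/2) cot Q, a standard textbook choice that does not appear in the text, and restricts to q > 0. Formulas (2) then give p = q cot Q and P = q²/(2 sin² Q), that is, Q = arctan(q/p) and P = (p² + q²)/2. The code compares these closed forms with numerical derivatives of S₁ and checks that the map preserves area.

```python
import math

def S1(Q, q):
    """Assumed example of a generating function: S1(Q, q) = (q**2 / 2) * cot(Q)."""
    return 0.5 * q * q * math.cos(Q) / math.sin(Q)

def transform(p, q):
    """Closed form of the map (p, q) -> (P, Q) generated by S1, for q > 0."""
    Q = math.atan2(q, p)               # from p = dS1/dq = q * cot(Q)
    P = 0.5 * (p * p + q * q)          # from P = -dS1/dQ = q**2 / (2 sin(Q)**2)
    return P, Q

def check(p, q, eps=1e-6):
    """Return (error in p, error in P, jacobian determinant of the map)."""
    P, Q = transform(p, q)
    # Formulas (2), with the derivatives of S1 taken numerically.
    p_from_S = (S1(Q, q + eps) - S1(Q, q - eps)) / (2 * eps)
    P_from_S = -(S1(Q + eps, q) - S1(Q - eps, q)) / (2 * eps)
    # Jacobian of (p, q) -> (P, Q); it equals 1 for a canonical map
    # with one degree of freedom.
    Pp = (transform(p + eps, q)[0] - transform(p - eps, q)[0]) / (2 * eps)
    Pq = (transform(p, q + eps)[0] - transform(p, q - eps)[0]) / (2 * eps)
    Qp = (transform(p + eps, q)[1] - transform(p - eps, q)[1]) / (2 * eps)
    Qq = (transform(p, q + eps)[1] - transform(p, q - eps)[1]) / (2 * eps)
    return p_from_S - p, P_from_S - P, Pp * Qq - Pq * Qp

if __name__ == "__main__":
    print(check(0.6, 1.1))
```

The transformation is free wherever ∂²S₁/∂Q ∂q = −q/sin² Q is nonzero, which holds on the whole region q > 0 used here.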

B The Hamilton-Jacobi equation for generating functions

We notice that canonical equations in which the hamiltonian function depends only on the variable Q are easy to integrate. If H = K(Q, t), then the canonical equations have the form

Q̇ = 0, Ṗ = −∂K/∂Q, (3)

from which we have immediately

Q(t) = Q(0), P(t) = P(0) − ∫₀ᵗ (∂K/∂Q)|_{Q(0)} dt.

We will now look for a canonical transformation reducing the hamiltonian H(p, q) to the form K(Q). To this end we will look for a generating function of such a transformation, S(Q, q). From (2) we obtain the condition

H(∂S(Q, q)/∂q, q) = K(Q), (4)

where after differentiation we must substitute q(P, Q) for q. We notice that for fixed Q, Equation (4) has the form of the Hamilton-Jacobi equation.

Jacobi’s theorem. If a solution S(Q, q) is found to the Hamilton-Jacobi equation (4), depending on n parameters⁸² Q_i and such that det(∂²S/∂Q∂q) ≠ 0, then the canonical equations

ṗ = −∂H/∂q, q̇ = ∂H/∂p (5)

can be solved explicitly by quadratures. The functions Q(p, q) determined by the equations ∂S(Q, q)/∂q = p are first integrals of the equations (5).

Proof. Consider the canonical transformation with generating function S(Q, q). By (2) we have p = (∂S/∂q)(Q, q), from which we can determine Q(p, q). We calculate the function H(p, q) in the new coordinates P, Q. We have H(p, q) = H((∂S/∂q)(Q, q), q). In order to find the hamiltonian function in the new coordinates we must substitute into this expression (after differentiation) for q its expression in terms of P and Q. However, by (4), this expression does not depend on P at all, so we have simply

H(p, q) = K(Q).

Thus, in the new variables, Equation (5) has the form (3), from which Jacobi’s theorem follows directly. ☐
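Jacobi’s theorem can be followed step by step on the harmonic oscillator. The sketch below assumes H = (p² + q²)/2 and K(Q) = Q, neither of which is taken from the text, and uses the complete integral S(Q, q) = ∫ √(2Q − q²) dq in closed form. Then P = −∂S/∂Q = −arcsin(q/√(2Q)), and equations (3) give Q = const, P(t) = P(0) − t. The function names are hypothetical.

```python
import math

def S(Q, q):
    """Assumed complete integral for H = (p**2 + q**2) / 2 with K(Q) = Q:
    the integral of sqrt(2Q - q**2) dq, written in closed form (q**2 < 2Q)."""
    return 0.5 * q * math.sqrt(2 * Q - q * q) + Q * math.asin(q / math.sqrt(2 * Q))

def H(p, q):
    return 0.5 * (p * p + q * q)

def hj_residual(Q, q, eps=1e-6):
    """H(dS/dq, q) - K(Q), with dS/dq taken by central differences."""
    dS_dq = (S(Q, q + eps) - S(Q, q - eps)) / (2 * eps)
    return H(dS_dq, q) - Q

def motion(Q, P0, t):
    """Invert P = -dS/dQ = -asin(q / sqrt(2Q)) along P(t) = P0 - t."""
    P = P0 - t                          # from Pdot = -dK/dQ = -1, Qdot = 0
    q = -math.sqrt(2 * Q) * math.sin(P)
    p = math.sqrt(2 * Q) * math.cos(P)  # continuation of p = dS/dq = sqrt(2Q - q**2)
    return p, q

if __name__ == "__main__":
    print(hj_residual(0.9, 0.4))
    print(motion(0.9, 0.3, 1.0))
```

The resulting motion q(t) = √(2Q) sin(t − P₀) is the familiar oscillation with energy Q, obtained here without integrating the canonical equations directly.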

Jacobi’s theorem reduces solving the system of ordinary differential equations (5) to finding a complete integral of the partial differential equation (4). It may appear surprising that this “reduction” from the simple to the complicated provides an effective method for solving concrete problems. Nevertheless, it turns out that this is the most powerful method known for exact integration, and many problems which were solved by Jacobi cannot be solved by other methods.

C Examples

We consider the problem of attraction by two fixed centers. Interest in this problem has grown recently in connection with the study of the motion of artificial earth satellites. It is fairly clear that two close centers of attraction on the z-axis approximate attraction by an ellipsoid slightly extended along the z-axis. Unfortunately, the earth is not prolate, but oblate. To overcome this difficulty, one must place the centers at imaginary points at distances ± iε from the origin along the z-axis. Analytic formulas for the solution are true, of course, in the complex region. In this way we obtain an approximation to the earth’s field of gravity, in which the equations of motion can be exactly integrated and which is closer to reality than the keplerian approximation in which the earth is a point.

For simplicity we will consider only the planar problem of attraction by two fixed points with equal masses. The success of Jacobi’s method is based on the adoption of a suitable coordinate system, called elliptic coordinates. Suppose that the distance between the fixed points O₁ and O₂ is 2c (Figure 204), and that the distances of a moving mass from them are r₁ and r₂, respectively. The elliptic coordinates ξ, η are defined as the sum and difference of the distances to the points O₁ and O₂: ξ = r₁ + r₂, η = r₁ − r₂.

Figure 204

Elliptic coordinates

Problem. Express the hamiltonian function in elliptic coordinates.

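Solution. The lines ξ = const are ellipses with foci at O₁ and O₂; the lines η = const are hyperbolas with the same foci (Figure 205). They are mutually orthogonal; therefore, ds² = a² dξ² + b² dη².
We will find the coefficients a and b. For motion along an ellipse we have dr₁ = ds cos α and dr₂ = −ds cos α, so dη = 2 cos α ds. For motion along a hyperbola we have dr₁ = ds sin α and dr₂ = ds sin α, so dξ = 2 sin α ds. Thus a = (2 sin α)^{− 1} and b = (2 cos α)^{− 1}. Furthermore, from the triangle O₁MO₂ we find r₁² + r₂² + 2r₁r₂ cos 2α = 4c², which implies cos² α = (4c² − (r₁ − r₂)²)/(4r₁r₂) and sin² α = ((r₁ + r₂)² − 4c²)/(4r₁r₂).
But if ds² = Σ a_i² dq_i², then T = ½ Σ a_i² q̇_i², p_i = a_i² q̇_i, and H = Σ p_i²/(2a_i²) + U.
Thus, taking the mass equal to 1 and the potential energy equal to U = −k/r₁ − k/r₂, we have H = 2p_ξ² sin² α + 2p_η² cos² α − k/r₁ − k/r₂.
But r₁ + r₂ = ξ, r₁ − r₂ = η, 4r₁r₂ = ξ² − η². Therefore, finally, H = 2p_ξ² (ξ² − 4c²)/(ξ² − η²) + 2p_η² (4c² − η²)/(ξ² − η²) − 4kξ/(ξ² − η²).
We will now solve the Hamilton-Jacobi equation.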

Figure 205
Confocal ellipses and hyperbolas
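The expression of the hamiltonian in elliptic coordinates can be checked against the cartesian one. The sketch below assumes unit mass, centers at O₁ = (−c, 0) and O₂ = (c, 0), potential U = −k/r₁ − k/r₂, and the form H = [2p_ξ²(ξ² − 4c²) + 2p_η²(4c² − η²) − 4kξ]/(ξ² − η²); the placement of the centers and the constant k are assumptions of this illustration. Cartesian momenta are obtained from (p_ξ, p_η) by the chain rule, p_x = p_ξ ∂ξ/∂x + p_η ∂η/∂x, and similarly for p_y.

```python
import math

def hamiltonians(x, y, p_xi, p_eta, c=1.0, k=1.0):
    """Compare the cartesian and elliptic expressions of H at the point (x, y).

    Assumptions: unit mass, centres O1 = (-c, 0) and O2 = (c, 0),
    potential U = -k/r1 - k/r2.
    """
    r1 = math.hypot(x + c, y)
    r2 = math.hypot(x - c, y)
    xi, eta = r1 + r2, r1 - r2
    # Gradients of r1 and r2; grad(xi) = g1 + g2 and grad(eta) = g1 - g2.
    g1 = ((x + c) / r1, y / r1)
    g2 = ((x - c) / r2, y / r2)
    px = p_xi * (g1[0] + g2[0]) + p_eta * (g1[0] - g2[0])
    py = p_xi * (g1[1] + g2[1]) + p_eta * (g1[1] - g2[1])
    h_cartesian = 0.5 * (px * px + py * py) - k / r1 - k / r2
    d = xi * xi - eta * eta
    h_elliptic = (2 * p_xi ** 2 * (xi * xi - 4 * c * c) / d
                  + 2 * p_eta ** 2 * (4 * c * c - eta * eta) / d
                  - 4 * k * xi / d)
    return h_cartesian, h_elliptic

if __name__ == "__main__":
    print(hamiltonians(0.7, 1.2, 0.3, -0.5))
```

The two values agree at any point off the segment O₁O₂ and off the rest of the x-axis, where the coordinates ξ, η degenerate.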

Definition. If, in the equation

Φ₁(∂S/∂q₁, ..., ∂S/∂q_n; q₁, ..., q_n) = 0

the variable q₁ and derivative ∂S/∂q₁ appear only in the form of a combination φ(∂S/∂q₁, q₁), then we say that the variable q₁ is separable.

In this case it is useful to look for a solution of the equation of the form

S = S₁(q₁) + S′(q₂, ..., q_n).

By setting φ(∂S₁/∂q₁, q₁) = c₁ in this equation, we obtain an equation for S′ with a smaller number of variables

Φ₂(∂S′/∂q₂, ..., ∂S′/∂q_n, q₂, ..., q_n; c₁) = 0.

Let S′ = S′(q₂, ..., q_n; c₁, c) be a family of solutions to this equation depending on the parameters c_i. The functions S₁(q₁, c₁) + S′ will satisfy the desired equation if S₁ satisfies the ordinary differential equation φ(∂S₁/∂q₁, q₁) = c₁. This equation is easy to solve; we express ∂S₁/∂q₁ in terms of q₁ and c₁ to obtain ∂S₁/∂q₁ = ψ(q₁, c₁), from which S₁ = ∫^{q₁} ψ(q₁, c₁) dq₁.

If one of the variables, say q₂, is separable in the new equation (with Φ₂) we can repeat this procedure and (in the most favorable case) we can find a solution of the original equation depending on n constants

S₁(q₁; c₁) + S₂(q₂; c₁, c₂) + ··· + S_n(q_n; c₁, ..., c_n).

In this case we say that the variables are completely separable.

If the variables are completely separable, then a solution depending on n parameters of the Hamilton-Jacobi equation, Φ₁(∂S/∂q, q) = 0, is found by quadratures. But then the corresponding system of canonical equations can also be integrated by quadratures (Jacobi’s theorem).

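We apply the above to the problem of two fixed centers. With unit mass and potential energy −k/r₁ − k/r₂, the Hamilton-Jacobi equation (4) has the form

2(∂S/∂ξ)²(ξ² − 4c²) + 2(∂S/∂η)²(4c² − η²) = K(ξ² − η²) + 4kξ.

We can separate variables by, for instance, setting

2(∂S/∂ξ)²(ξ² − 4c²) − 4kξ − Kξ² = c₁

and

2(∂S/∂η)²(4c² − η²) + Kη² = −c₁.

Then we find the complete integral of Equation (4) in the form

S(ξ, η; c₁, c₂) = ∫ √((c₁ + c₂ξ² + 4kξ)/(2(ξ² − 4c²))) dξ + ∫ √((−c₁ − c₂η²)/(2(4c² − η²))) dη, where c₂ = K.

Jacobi’s theorem now gives an explicit expression, in terms of elliptic integrals, for motion in the problem of two fixed centers. A more detailed investigation of this motion can be found in Charlier’s book “Die Mechanik des Himmels,” Berlin, Leipzig, W. de Gruyter & Co., 1927.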
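That a separated solution of this kind satisfies the Hamilton-Jacobi equation can be confirmed by direct substitution. The sketch below assumes unit mass, potential −k/r₁ − k/r₂, the hamiltonian H = [2p_ξ²(ξ² − 4c²) + 2p_η²(4c² − η²) − 4kξ]/(ξ² − η²), and the derivatives ∂S/∂ξ = √((c₁ + c₂ξ² + 4kξ)/(2(ξ² − 4c²))), ∂S/∂η = √((−c₁ − c₂η²)/(2(4c² − η²))). The normalization of the constants is an assumption of this illustration, and the function names are hypothetical.

```python
import math

def dS_dxi(xi, c1, c2, c=0.5, k=1.0):
    """Assumed xi-derivative of the separated solution (needs xi > 2c)."""
    return math.sqrt((c1 + c2 * xi * xi + 4 * k * xi) / (2 * (xi * xi - 4 * c * c)))

def dS_deta(eta, c1, c2, c=0.5):
    """Assumed eta-derivative of the separated solution (needs |eta| < 2c)."""
    return math.sqrt((-c1 - c2 * eta * eta) / (2 * (4 * c * c - eta * eta)))

def energy(xi, eta, c1, c2, c=0.5, k=1.0):
    """Substitute p_xi = dS/dxi and p_eta = dS/deta into H in elliptic coordinates."""
    p_xi = dS_dxi(xi, c1, c2, c, k)
    p_eta = dS_deta(eta, c1, c2, c)
    d = xi * xi - eta * eta
    return (2 * p_xi ** 2 * (xi * xi - 4 * c * c)
            + 2 * p_eta ** 2 * (4 * c * c - eta * eta)
            - 4 * k * xi) / d

if __name__ == "__main__":
    # The value should not depend on (xi, eta) and should equal c2.
    print(energy(2.0, 0.3, -0.5, -1.0), energy(1.5, -0.6, -0.5, -1.0))
```

The result is the same at every admissible point and equals c₂, so the second constant plays the role of the energy K, while c₁ is the separation constant.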

Another application of the problem of the attraction of two fixed centers is the study of motion with fixed pull in a field with one attracting center.

This is a question of the motion of a point mass under the action of a newtonian attraction of a fixed center and one more force (“pull”) of constant magnitude and direction. This problem can be looked at as the limiting case of the problem of attraction by two fixed centers. In the passage to the limit, one center goes off to infinity in the direction of the thrust force (during which its mass must grow proportionally to the square of the distance moved in order to guarantee constant pull).

This limiting case of the problem of the attraction of two fixed centers can be integrated explicitly (in elliptic functions). We can convince ourselves of this by passing to a limit or by directly separating variables in the problem of motion with constant pull in a field with one center. The coordinates in which the variables are separated in this problem are obtained as the limit of elliptic coordinates as one of the centers approaches infinity. They are called parabolic coordinates and are given by the formulas

u = r − x, v = r + x

(r is the distance to the attracting center; the pull is directed along the x-axis).

A description of the trajectories of a motion with constant pull (many of which are very intricate) can be found in V. V. Beletskii’s book “Sketches on the motion of celestial bodies,” Nauka, 1972.

As one more example we consider the problem of geodesics on a triaxial ellipsoid.⁸³ Here Jacobi’s elliptic coordinates λ₁, λ₂, and λ₃ are helpful, where the λ_i are the roots of the equation

x₁²/(a₁ + λ) + x₂²/(a₂ + λ) + x₃²/(a₃ + λ) = 1, a₁ > a₂ > a₃;

x₁, x₂, and x₃ are cartesian coordinates. We will not carry out the computations showing that the variables are separable (they can be found, for example, in Jacobi’s “Lectures on dynamics”), but will mention only the result: we will describe the behavior of the geodesics.

The surfaces λ₁ = const, λ₂ = const, and λ₃ = const are surfaces of second degree, called confocal quadrics. The first of these is an ellipsoid, the second a hyperboloid of one sheet, and the third a hyperboloid of two sheets. The ellipsoid can degenerate into the interior of an ellipse, the one-sheeted hyperboloid either into the exterior of an ellipse or into the part of a plane between the branches of a hyperbola, and the two-sheeted hyperboloid either into the part of a plane outside the branches of a hyperbola or into a plane.

Suppose that the ellipsoid under consideration is one of the ellipsoids in the family with semi-axes a > b > c. Each of the three ellipses x₁ = 0, x₂ = 0, and x₃ = 0 is a closed geodesic. A geodesic starting from a point of the largest ellipse (with semiaxes a and b) in a direction close to the direction of the ellipse (Figure 206), is alternately tangent to the two closed lines of intersection of the ellipsoid with the one-sheeted hyperboloid of our family λ = const.⁸⁴ This geodesic is either closed or is dense in the area between the two lines of intersection. As the slope of the geodesic increases, the hyperboloids collapse down to the region “inside” the hyperbola which intersects our ellipsoid in its four “umbilical points.” In the limiting case we obtain geodesics passing through the umbilical points (Figure 207).

Figure 206

Geodesic on a triaxial ellipsoid

Figure 207

Geodesics emanating from an umbilical point

It is interesting to note that all the geodesics starting at an umbilical point again converge at the opposite umbilical point, and all have the same length between the two umbilical points. Only one of these geodesics is closed, namely, the middle ellipse with semi-axes a and c. If we travel along any other geodesic passing through an umbilical point in any direction, we will approach this ellipse asymptotically.

Finally, geodesics which intersect the largest ellipse even more “steeply” (Figure 208) are alternately tangent to the two lines of intersection of our ellipsoid with a two-sheeted hyperboloid.⁸⁵ In general, they are dense in the region between these lines. The small ellipse with semi-axes b and c is among these geodesics.

Figure 208

Geodesics of an ellipsoid which are tangent to a two-sheeted hyperboloid

“The main difficulty in integrating a given differential equation lies in introducing convenient variables, which there is no rule for finding. Therefore, we must travel the reverse path and after finding some notable substitution, look for problems to which it can be successfully applied.” (Jacobi, “Lectures on dynamics”).

A list of problems admitting separation of variables in spherical, elliptic, and parabolic coordinates is given in Section 48 of Landau and Lifshitz’s “Mechanics” (Oxford, Pergamon, 1960).

48 Generating functions

In this paragraph we construct the apparatus of generating functions for non-free canonical transformations.

A The generating function S₂ (P, q)

Let g: ℝ²ⁿ → ℝ²ⁿ be a canonical transformation with g(p, q) = (P, Q). By the definition of canonical transformation the differential form on ℝ²ⁿ

p dq − P dQ

is the total differential of some function S(p, q). A canonical transformation is free if we can take q, Q as 2n independent coordinates. In this case the function S expressed in the coordinates q and Q is called a generating function S₁(q, Q). Knowing this function alone, we can find all 2n functions giving the transformation from the relations

∂S₁(q, Q)/∂q = p,  ∂S₁(q, Q)/∂Q = −P.  (1)

It is far from the case that all canonical transformations are free. For example, in the case of the identity transformation q and Q = q are dependent. Therefore, the identity transformation cannot be given by a generating function S₁(q, Q). We can, however, obtain generating functions of another form by means of the Legendre transformation. Suppose, for instance, that we can take P, q as independent local coordinates on ℝ²ⁿ (i.e., the determinant det(∂(P, q)/∂(p, q)) = det(∂P/∂p) is not zero). Then we have

p dq + Q dP = d(PQ + S).

The quantity PQ + S, expressed in terms of (P, q), is also called a generating function:

S₂(P, q) = PQ + S(p, q).

For this function, we find

∂S₂(P, q)/∂q = p,  ∂S₂(P, q)/∂P = Q.  (2)

Conversely, if S₂(P, q) is any function for which the determinant

det(∂²S₂(P, q)/∂P ∂q)|_{P₀, q₀}

is not zero, then in a neighborhood of the point

(p₀, q₀), where p₀ = ∂S₂(P, q)/∂q|_{P₀, q₀},

we can solve the first group of equations (2) for P and obtain a function P(p, q) (where P(p₀, q₀) = P₀). After this, the second group of equations (2) determine Q(p, q), and the map (p, q) → (P, Q) is canonical (prove this!).

Problem. Find a generating function S₂ for the identity map P = p, Q = q.

Answer. Pq.

Remark. The generating function S₂(P, q) is convenient also because there are no minus signs in the formulas (2), and they are easy to remember if we remember that the generating function of the identity transformation is Pq.
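The relations (2) can be tried out numerically. The following is a minimal sketch, assuming one degree of freedom and the illustrative generating function S₂(P, q) = Pq + εPq² (this choice, the value of ε, and the function names are our assumptions, not taken from the text). It solves p = ∂S₂/∂q for P, reads off Q = ∂S₂/∂P, and checks by finite differences that the resulting map preserves area, which for n = 1 is equivalent to being canonical.

```python
def transform(p, q, eps=0.1):
    """Map (p, q) -> (P, Q) generated by the hypothetical S2(P, q) = P*q + eps*P*q**2.

    Relations (2): p = dS2/dq = P*(1 + 2*eps*q),  Q = dS2/dP = q + eps*q**2.
    The first relation is solvable for P wherever 1 + 2*eps*q != 0.
    """
    P = p / (1.0 + 2.0 * eps * q)   # first group of equations (2), solved for P
    Q = q + eps * q * q             # second group of equations (2)
    return P, Q


def jacobian_det(f, p, q, h=1e-6):
    """det d(P, Q)/d(p, q), estimated by central differences."""
    P_p = (f(p + h, q)[0] - f(p - h, q)[0]) / (2 * h)
    P_q = (f(p, q + h)[0] - f(p, q - h)[0]) / (2 * h)
    Q_p = (f(p + h, q)[1] - f(p - h, q)[1]) / (2 * h)
    Q_q = (f(p, q + h)[1] - f(p, q - h)[1]) / (2 * h)
    return P_p * Q_q - P_q * Q_p


# For n = 1 a map is canonical exactly when this determinant equals 1.
det_value = jacobian_det(transform, 0.7, 0.3)
```

Setting eps = 0 in this sketch gives back the identity map, in agreement with the generating function Pq.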

B 2ⁿ generating functions

Unfortunately, the variables P, q cannot always be chosen for local coordinates either; however, we can always choose some set of n new coordinates

P_i = (P_{i₁},..., P_{i_k}),  Q_j = (Q_{j₁},..., Q_{j_{n − k}})

so that together with the old q we obtain 2n independent coordinates.

Here (i₁,..., i_k)(j₁,..., j_{n − k}) is any partition of the set (1,..., n) into two non-intersecting parts; so there are in all 2ⁿ cases.

Theorem. Let g: ℝ²ⁿ → ℝ²ⁿ be a canonical transformation given by the functions P(p, q) and Q(p, q). In a neighborhood of every point (p₀, q₀) at least one of the 2ⁿ sets of functions (P_i, Q_j, q) can be taken as independent coordinates on ℝ²ⁿ:

det(∂(P_i, Q_j, q)/∂(p_i, p_j, q)) = det(∂(P_i, Q_j)/∂(p_i, p_j)) ≠ 0.

In a neighborhood of such a point, the canonical transformation g can be reconstructed from the function

S₃(P_i, Q_j, q) = P_iQ_i + ∫ (p dq − P dQ)

by the relations

∂S₃/∂q = p,  ∂S₃/∂P_i = Q_i,  ∂S₃/∂Q_j = −P_j.  (3)

Conversely, if S₃(P_i, Q_j, q) is any function for which the determinant det(∂²S₃/∂P ∂q)|_{P₀, q₀} (P = P_i, Q_j) is not zero, then the relations (3) give a canonical transformation in a neighborhood of the point p₀, q₀.

Proof. The proof of this theorem is almost the same as the one carried out above in the particular case k = n. We need only verify that the determinant det[(∂(P_i, Q_j)/∂(p_i, p_j))] is not zero for one of the 2ⁿ sets (P_i, Q_j, q).

We consider the differential of our transformation g at the point (p₀, q₀). By identifying the tangent space to ℝ²ⁿ with ℝ²ⁿ, we can consider dg as a symplectic transformation S : ℝ²ⁿ → ℝ²ⁿ.

Consider the coordinate p-plane P in ℝ²ⁿ (Figure 209). This is a null n-plane, and its image SP is also a null plane. We project the plane SP onto the coordinate plane σ = {(p_i, q_j)} parallel to the remaining coordinate axes, i.e., in the direction of the n-dimensional null coordinate plane σ̄ = {(p_j, q_i)}. We denote the projection operator by T: SP → σ.

Figure 209
Checking non-degeneracy

The condition det(∂(P_i, Q_j)/∂(p_i, p_j)) ≠ 0 means that TS: P → σ is nonsingular. The operator S is nonsingular. Therefore, TS is nonsingular if and only if T: SP → σ is nonsingular. In other words, the null plane SP must be transverse to the null coordinate plane {(p_j, q_i)}. But we showed in Section 41 that at least one of the 2ⁿ null coordinate planes is transverse to SP. This means that one of our 2ⁿ determinants is nonzero, as was to be shown. ☐

Problem. Show that this system of 2ⁿ types of generating functions is minimal: given any one of the 2ⁿ determinants, there exists a canonical transformation for which only this determinant is nonzero.⁸⁶
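For a linear canonical transformation the 2ⁿ determinants can simply be listed. The sketch below is an illustration under our own assumptions (the function names and the example map are hypothetical): for each subset of indices i it builds the matrix ∂(P_i, Q_j)/∂p from the blocks ∂P/∂p and ∂Q/∂p and records the subsets with nonzero determinant. The example with n = 2 is the identity in the first degree of freedom and the rotation P₂ = q₂, Q₂ = −p₂ in the second, so only the set (P₁, Q₂, q) can serve as coordinates.

```python
from itertools import combinations


def det(m):
    """Determinant by expansion along the first row (adequate for very small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c, entry in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * entry * det(minor)
    return total


def admissible_types(dP_dp, dQ_dp):
    """Index sets I (0-based) for which det d(P_i, Q_j)/d(p) != 0, i in I, j not in I."""
    n = len(dP_dp)
    found = []
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            # Row r is taken from dP/dp if r is in the subset, otherwise from dQ/dp.
            rows = [dP_dp[r] if r in subset else dQ_dp[r] for r in range(n)]
            if det(rows) != 0:
                found.append(subset)
    return found


# Hypothetical example: P1 = p1, Q1 = q1, P2 = q2, Q2 = -p2.
dP_dp = [[1, 0], [0, 0]]
dQ_dp = [[0, 0], [0, -1]]
types = admissible_types(dP_dp, dQ_dp)   # only the subset {P1} together with Q2 survives
```

For the identity map the same routine leaves only the full set (P₁, P₂), that is, the generating function of type S₂.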

C Infinitesimal canonical transformations

We now consider a canonical transformation which is close to the identity. Its generating function can be taken close to the generating function Pq of the identity. We look at a family of canonical transformations g_ε depending differentiably on the parameter ε, such that the generating functions have the form

Pq + εS(P, q; ε),  p = P + ε ∂S/∂q,  Q = q + ε ∂S/∂P.  (4)

An infinitesimal canonical transformation is an equivalence class of families g_ε, two families g_ε and h_ε being equivalent if their difference is small of higher than first order, |g_ε − h_ε| = O(ε²), ε → 0.

Theorem. An infinitesimal canonical transformation satisfies Hamilton’s differential equations

dP/dε|_{ε = 0} = −∂H/∂q,  dQ/dε|_{ε = 0} = ∂H/∂p

with hamiltonian function H(p, q) = S(p, q, 0).

Proof. The result follows from formula (4): P → p as ε → 0. ☐
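The theorem can be illustrated numerically. In the sketch below we assume one degree of freedom and the illustrative choice S(P, q) = ½P² − cos q, independent of ε (the choice of S, the sample point, and the function names are ours). The map generated by Pq + εS is written out explicitly and its displacement divided by ε is compared with the right-hand sides of Hamilton's equations for H = ½p² − cos q; the discrepancy should be of order ε.

```python
import math


def near_identity_map(p, q, eps):
    """Map generated by P*q + eps*S(P, q) with the hypothetical S = P**2/2 - cos(q).

    Relations: p = P + eps*dS/dq = P + eps*sin(q),  Q = q + eps*dS/dP = q + eps*P.
    """
    P = p - eps * math.sin(q)
    Q = q + eps * P
    return P, Q


def hamiltonian_field(p, q):
    """(dp/dt, dq/dt) = (-dH/dq, dH/dp) for H = p**2/2 - cos(q)."""
    return -math.sin(q), p


def first_order_defect(p, q, eps):
    """Largest deviation of the difference quotients from Hamilton's equations."""
    P, Q = near_identity_map(p, q, eps)
    dp_dt, dq_dt = hamiltonian_field(p, q)
    return max(abs((P - p) / eps - dp_dt), abs((Q - q) / eps - dq_dt))


defect_coarse = first_order_defect(0.4, 1.1, 1e-2)
defect_fine = first_order_defect(0.4, 1.1, 1e-3)   # about ten times smaller
```

This particular map is the one-step scheme usually called the symplectic Euler method; each step is an exactly canonical transformation, while it agrees with the hamiltonian flow only to first order in ε.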

Corollary. A one-parameter group of transformations of phase space ℝ²ⁿ satisfies Hamilton’s canonical equations if and only if the transformations are canonical.

The hamiltonian function H is called the “generating function of the infinitesimal canonical transformation.” We notice that unlike the generating function S, the function H is a function of points of phase space, invariantly associated to the transformation.

The function H has a simple geometric meaning. Let x and y be two points in ℝ²ⁿ (Figure 210), γ a curve connecting them, and ∂γ = y − x. Consider the images of the curve γ under the transformations g_τ, 0 ≤ τ ≤ ε; they form a band σ(ε). Now consider the integral of the form ω² = ∑ dp_i ∧ dq_i over the 2-chain σ, using the fact that ∂σ = g_εγ − γ + g_τx − g_τy.

Figure 210

Geometric meaning of Hamilton’s function

Problem. Show that the limit

lim_{ε → 0} (1/ε) ∬_{σ(ε)} ω² = H(x) − H(y)

exists and does not depend on the representative of the class g_ε.

From this result we once more obtain the well-known

Corollary. Under canonical transformations the canonical equations retain their form, with the same hamiltonian function.

Proof. We computed the variation of the hamiltonian function using only an infinitesimal canonical transformation and the symplectic structure of ℝ²ⁿ—the form ω². ☐

⁷²

I.e., the unoriented line in T ℝ³ with direction vector r.

⁷³

The form ω¹ seems here to appear out of thin air. In the following paragraph we will see how the idea of using this form arose from optics.

⁷⁴

In the calculus of variations ∫ p dq − H dt is called Hilbert’s invariant integral.

⁷⁵

The proof of this theorem which is presented in the excellent book by Landau and Lifshitz (Mechanics, Pergamon, Oxford, 1960) is incorrect.

⁷⁶

In some textbooks the property of preserving the canonical form of Hamilton’s equations is taken as the definition of a canonical transformation. This definition is not equivalent to the generally accepted one mentioned above. For example, the transformation P = 2p, Q = q, which is not canonical by our definition, preserves the hamiltonian form of the equations of motion. This confusion appears even in the excellent textbook by Landau and Lifshitz (Mechanics, Oxford, Pergamon, 1960); in Section 45 of this book they show that every transformation which preserves the canonical equations is canonical in our sense.

⁷⁷

“In almost all textbooks, even the best, this principle is presented so that it is impossible to understand.” (C. Jacobi, Lectures on Dynamics, 1842–1843). I do not choose to break with tradition. A very interesting “proof” of Maupertuis’ principle is in Section 44 of the mechanics textbook of Landau and Lifshitz (Mechanics, Oxford, Pergamon, 1960).

⁷⁸

We will not pursue rigor here, and will assume that all determinants are different from zero, etc. The proofs of the subsequent theorems do not depend on the semi-heuristic arguments of this paragraph. It should be noted that the appropriate lagrangian for geometric optics is homogeneous of order 1 in the velocities. To apply the Legendre transform, and to make the analogy with mechanics in the following section, we should square this lagrangian, which does not affect the indicatrix surface where the value is 1. In fact, the real meaning of Huygens’ principle is best expressed in contact geometry (see Appendix 4 or the author’s Singularities of Caustics and Wave Fronts, Kluwer 1990).

⁷⁹

In this way, the vectors p corresponding to various fronts passing through a given point are not arbitrary, but are subject to one condition: the permissible values of p fill a hypersurface in {p}-space which is dual to the indicatrix of velocities.

⁸⁰

Problem. Show that this is not true for large t − t₀. Hint.

(Figure 199).

⁸¹

It is important to note that the quantity ∂u/∂x on the x, y-plane depends not only on the function which is taken for x, but also on the choice of the function y: in new variables (x, z) the value of ∂u/∂x will be different. One should write

⁸²

An n-parameter family of solutions of (4) is called a complete integral of the equation.

⁸³

The problem of geodesics on an ellipsoid and the closely related problem of ellipsoidal billiards have found application in a series of recent results in physics connected with laser devices.

⁸⁴

These lines of intersection of the confocal surfaces are also lines of curvature of the ellipsoid.

⁸⁵

These are also lines of curvature.

⁸⁶

The number of kinds of generating functions in different textbooks ranges from 4 to 4ⁿ.

10
Introduction to perturbation theory

Perturbation theory consists of a very useful collection of methods for finding approximate solutions of “perturbed” problems which are close to completely solvable “unperturbed” problems. These methods can be easily justified if we are investigating motion over a small interval of time. Relatively little is known about how far we can trust the conclusions of perturbation theory in investigating motion over large or infinite intervals of time.

We will see that the motion in many “unperturbed” integrable problems turns out to be conditionally periodic. In the study of unperturbed problems, and even more so in the study of the perturbed problems, special symplectic coordinates, called “action-angle” variables, are useful. In conclusion, we will prove a theorem justifying perturbation theory for single-frequency systems and will prove the adiabatic invariance of action variables in such systems.

49 Integrable systems

In order to integrate a system of 2n ordinary differential equations, we must know 2n first integrals. It turns out that if we are given a canonical system of differential equations, it is often sufficient to know only n first integrals—each of them allows us to reduce the order of the system not just by one, but by two.

A Liouville’s theorem on integrable systems

Recall that a function F is a first integral of a system with hamiltonian function H if and only if the Poisson bracket

(H, F)

is identically equal to zero.

Definition. Two functions F₁ and F₂ on a symplectic manifold are in involution if their Poisson bracket is equal to zero.

Liouville proved that if, in a system with n degrees of freedom (i.e., with a 2n-dimensional phase space), n independent first integrals in involution are known, then the system is integrable by quadratures.

Here is the exact formulation of this theorem: Suppose that we are given n functions in involution on a symplectic 2n-dimensional manifold

F₁,..., F_n;  (F_i, F_j) ≡ 0,  i, j = 1,..., n.

Consider a level set of the functions F_i

M_f = {x: F_i(x) = f_i, i = 1,..., n}.

Assume that the n functions F_i are independent on M_f (i.e., the n 1-forms dF_i are linearly independent at each point of M_f). Then

1. M_f is a smooth manifold, invariant under the phase flow with hamiltonian function H = F₁.

2. If the manifold M_f is compact and connected, then it is diffeomorphic to the n-dimensional torus

Tⁿ = {(φ₁,..., φ_n) mod 2π}.

3. The phase flow with hamiltonian function H determines a conditionally periodic motion on M_f, i.e., in angular coordinates φ = (φ₁,..., φ_n) we have

dφ/dt = ω,  ω = ω(f).

4. The canonical equations with hamiltonian function H can be integrated by quadratures.

Before proving this theorem, we note a few of its corollaries.

Corollary 1. If, in a canonical system with two degrees of freedom, a first integral F is known which does not depend on the hamiltonian H, then the system is integrable by quadratures; a compact connected two-dimensional submanifold of the phase space H = h, F = f is an invariant torus, and motion on it is conditionally periodic.

Proof. F and H are in involution since F is a first integral of a system with hamiltonian function H. □
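Involution is easy to test numerically for a concrete pair of functions. The sketch below assumes two degrees of freedom, phase points stored as lists (p₁, p₂, q₁, q₂), and the illustrative pair H = ½(p₁² + p₂² + q₁² + q₂²), M = q₁p₂ − q₂p₁; these choices, the sign convention of the bracket, and the function names are our assumptions. The bracket is estimated by central differences.

```python
def gradient(F, x, h=1e-5):
    """Central-difference gradient of F at the phase point x = [p1..pn, q1..qn]."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((F(x_plus) - F(x_minus)) / (2 * h))
    return grad


def poisson_bracket(F, G, x):
    """sum_i (dF/dq_i * dG/dp_i - dF/dp_i * dG/dq_i); the overall sign is a convention."""
    n = len(x) // 2
    dF, dG = gradient(F, x), gradient(G, x)
    return sum(dF[n + i] * dG[i] - dF[i] * dG[n + i] for i in range(n))


def H(x):
    p1, p2, q1, q2 = x
    return 0.5 * (p1 * p1 + p2 * p2) + 0.5 * (q1 * q1 + q2 * q2)


def M(x):
    p1, p2, q1, q2 = x
    return q1 * p2 - q2 * p1      # angular momentum, a first integral of H


def coordinate_q1(x):
    return x[2]                   # not a first integral of H


point = [0.3, -1.2, 0.5, 0.8]
bracket_H_M = poisson_bracket(H, M, point)                 # vanishes up to rounding
bracket_H_q1 = poisson_bracket(H, coordinate_q1, point)    # equals -p1 with this sign choice
```

A vanishing bracket at one point is of course only a spot check; involution requires the bracket to vanish identically.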

As an example with three degrees of freedom, we consider a heavy symmetric Lagrange top fixed at a point on its axis. Three first integrals are immediately obvious: H, M_z, and M₃. It is easy to verify that the integrals M_z and M₃ are in involution. Furthermore, the manifold H = h in the phase space is compact. Therefore, we can immediately say, without any calculations, that for the majority of initial conditions⁸⁷ the motion of the top is conditionally periodic: the phase trajectories fill up the three-dimensional torus H = c₁, M_z = c₂, M₃ = c₃. The corresponding three frequencies are called frequencies of fundamental rotation, precession, and nutation.

Other examples arise from the following observation: if a canonical system can be integrated by the method of Hamilton-Jacobi, then it has n first integrals in involution. The method consists of a canonical transformation (p, q) → (P, Q) such that the Q_i are first integrals. But the functions Q_i and Q_j are clearly in involution.

In particular, the observation above applies to the problem of attraction by two fixed centers. Other examples are easily found. In fact, the theorem of Liouville formulated above covers all the problems of dynamics which have been integrated to the present day.

B Beginning of the proof of Liouville’s theorem

We turn now to the proof of the theorem. Consider the level set of the integrals:

M_f = {x: F_i(x) = f_i, i = 1,..., n}.

By hypothesis, the n 1-forms dF_i are linearly independent at each point of M_f; therefore, by the implicit function theorem, M_f is an n-dimensional submanifold of the 2n-dimensional phase space.

Lemma 1. On the n-dimensional manifold M_f there exist n tangent vector fields which commute with one another and which are linearly independent at every point.

Proof. The symplectic structure of phase space defines an operator I taking 1-forms to vector fields. This operator I carries the 1-form dF_i to the field I dF_i of phase velocities of the system with hamiltonian function F_i. We will show that the n fields I dF_i are tangent to M_f, commute, and are independent.

The independence of the I dF_i at every point of M_f follows from the independence of the dF_i and the nonsingularity of the isomorphism I. The fields I dF_i commute with one another, since the Poisson brackets of their hamiltonian functions (F_i, F_j) are identically 0. For the same reason, the derivative of the function F_i in the direction of the field I dF_j is equal to zero for any i, j = 1,..., n. Thus the fields I dF_i are tangent to M_f, and Lemma 1 is proved. □

We notice that we have proved even more than Lemma 1:

1′.

The manifold M_f is invariant with respect to each of the n commuting phase flows g_i^t with hamiltonian functions F_i:

g_i^t g_j^s = g_j^s g_i^t.

1″.

The manifold M_f is null (i.e., the 2-form ω² is zero on TM_f|_x).

This is true since the n vectors I dF_i|_x are skew-orthogonal to one another ((F_i, F_j) ≡ 0) and form a basis of the tangent plane to the manifold M_f at the point x.

C Manifolds on which the action of the group ℝⁿ is transitive

We will now use the following topological proposition (the proof is completed in Section D).

Lemma 2. Let Mⁿ be a compact connected differentiable n-dimensional manifold, on which we are given n vector fields which commute pairwise and are linearly independent at each point. Then Mⁿ is diffeomorphic to an n-dimensional torus.

Proof. We denote by g_i^t, i = 1,..., n, the one-parameter groups of diffeomorphisms of M corresponding to the n given vector fields. Since the fields commute, the groups g_i^t and g_j^s commute. Therefore, we can define an action g of the commutative group ℝⁿ = {t} on the manifold M by setting

g^t: M → M,  g^t = g_1^{t₁} ⋯ g_n^{t_n}  (t = (t₁,..., t_n) ∈ ℝⁿ).

Clearly, g^{t+s} = g^t g^s, t, s ∈ ℝⁿ. Now fix a point x₀ ∈ M. Then we have a map

g: ℝⁿ → M,  g(t) = g^t x₀.

(The point x₀ moves along the trajectory of the first flow for time t₁, along the second flow for time t₂, etc.)

Problem 1. Show that the map g (Figure 211) of a sufficiently small neighborhood V of the point 0 ∈ ℝⁿ gives a chart in a neighborhood of x₀: every point x₀ ∈ M has a neighborhood U (x₀ ∈ U ⊂ M) such that g maps V diffeomorphically onto U.

Figure 211
Problem 1

Hint. Apply the implicit function theorem and use the linear independence of the fields at x₀.

Problem 2. Show that g: ℝⁿ → M is onto.

Hint. Connect a point x ∈ M with x₀ by a curve (Figure 212), cover the curve by a finite number of the neighborhoods U of the preceding problem and define t as the sum of shifts t_i corresponding to pieces of the curve.

Figure 212
Problem 2

We note that the map g: ℝⁿ → Mⁿ cannot be one-to-one since Mⁿ is compact and ℝⁿ is not. We will examine the set of pre-images of x₀ ∈ Mⁿ.

Definition. The stationary group of the point x₀ is the set Γ of points t ∈ ℝⁿ for which g^tx₀ = x₀.

Problem 3. Show that Γ is a subgroup of the group ℝⁿ, independent of the point x₀.

Solution. If g^s x₀ = x₀ and g^t x₀ = x₀, then g^{s+t} x₀ = g^s g^t x₀ = g^s x₀ = x₀ and g^{−t} x₀ = g^{−t} g^t x₀ = x₀. Therefore, Γ is a subgroup of ℝⁿ. If x = g^r x₀ and t ∈ Γ, then g^t x = g^{t+r} x₀ = g^r g^t x₀ = g^r x₀ = x.

In this way the stationary group Γ is a well-defined subgroup of ℝⁿ independent of the point x₀. In particular, the point t = 0 clearly belongs to Γ.

Problem 4. Show that, in a sufficiently small neighborhood V of the point 0 ∈ ℝⁿ, there is no point of the stationary group other than t = 0.

Hint. The map g: V → U is a diffeomorphism.

Problem 5. Show that, in the neighborhood t + V of any point t ∈ Γ ⊂ ℝⁿ, there is no point of the stationary group Γ other than t. (Figure 213)

Figure 213
Problem 5

Thus the points of the stationary group Γ lie in ℝⁿ discretely. Such subgroups are called discrete subgroups.

Example. Let e₁,..., e_k be k linearly independent vectors in ℝⁿ, 0 ≤ k ≤ n. The set of all their integral linear combinations (Figure 214)

m₁e₁ + ⋯ + m_k e_k,  m_i ∈ ℤ,

forms a discrete subgroup of ℝⁿ. For example, the set of all integral points in the plane is a discrete subgroup of the plane.

Figure 214

A discrete subgroup of the plane

D Discrete subgroups in ℝⁿ

We will now use the algebraic fact that the example above includes all discrete subgroups of ℝⁿ. More precisely, we will prove

Lemma 3. Let Γ be a discrete subgroup of ℝⁿ. Then there exist k (0 ≤ k ≤ n) linearly independent vectors e₁,..., e_k ∈ Γ such that Γ is exactly the set of all their integral linear combinations.

Proof. We will consider ℝⁿ with some euclidean structure. We always have 0 ∈ Γ. If Γ = {0} the lemma is proved. If not, there is a point e₀ ∈ Γ, e₀ ≠ 0 (Figure 215). Consider the line ℝe₀. We will show that among the elements of Γ on this line, there is a point e₁ which is closest to 0. In fact, in the disk of radius |e₀| with center 0, there are only a finite number of points of Γ (as we saw above, every point x of Γ has a neighborhood V of standard size which does not contain any other point of Γ). Among the finite number of points of Γ inside this disc and lying on the line ℝe₀, the point closest to 0 will be the closest point to 0 on the whole line. The integral multiples of this point e₁ (me₁, m ∈ ℤ) constitute the intersection of the line ℝe₀ with Γ.

Figure 215

Proof of the lemma on discrete subgroups

In fact, the points me₁ divide the line into pieces of length |e₁|. If there were a point e ∈ Γ inside one of these pieces (me₁, (m + 1)e₁), then the point e − me₁ ∈ Γ would be closer to 0 than e₁.

If there are no points of Γ off the line ℝe₁, the lemma is proved. Suppose there is a point e ∈ Γ, e ∉ ℝe₁. We will show that there is a point e₂ ∈ Γ closest to the line ℝe₁ (but not lying on the line). We project e orthogonally onto ℝe₁. The projection lies in exactly one interval Δ = {λe₁}, m ≤ λ < m + 1. Consider the right circular cylinder C with axis Δ and radius equal to the distance from Δ to e. In this cylinder lie a finite (nonempty) number of points of the group Γ. Let e₂ be the closest one to the axis ℝe₁ not lying on the axis.

Problem 6. Show that the distance from this axis to any point e of Γ not lying on ℝe₁ is greater than or equal to the distance of e₂ from ℝe₁.

Hint. By a shift of me₁ we can move the projection of e onto the axis interval Δ.

The integral linear combinations of e₁ and e₂ form a lattice in the plane ℝe₁ + ℝe₂.

Problem 7. Show that there are no points of Γ on the plane ℝe₁ + ℝe₂ other than integral linear combinations of e₁ and e₂.

Hint. Partition the plane into parallelograms (Figure 216) Δ = {λ₁e₁ + λ₂e₂ }, m_i ≤ λ_i < m_i + 1. If there were an e ∈ Δ with e ≠ m₁e₁ + m₂e₂, then the point e − m₁e₁ − m₂e₂ would be closer to ℝe₁ than e₂.

Figure 216
Problem 7

If there are no points of Γ outside the plane ℝe₁ + ℝe₂, the lemma is proved. Suppose that there is a point e ∈ Γ outside this plane. Then there exists a point e₃ ∈ Γ closest to ℝe₁ + ℝe₂; the points m₁e₁ + m₂e₂ + m₃e₃ exhaust Γ in the three-dimensional space ℝe₁ + ℝe₂ + ℝe₃. If Γ is not exhausted by these, we take the closest point to this three-dimensional space, etc.

Problem 8. Show that this closest point always exists.

Hint. Take the closest of the finite number of points in a “cylinder” C.

Note that the vectors e₁, e₂, e₃,... are linearly independent. Since they all lie in ℝⁿ, there are k ≤ n of them.

Problem 9. Show that Γ is exhausted by the integral linear combinations of e₁,..., e_k.

Hint. Partition the plane ℝe₁ + ⋯ + ℝe_k into parallelepipeds Δ and show that there cannot be a point of Γ in any Δ. If there is an e ∈ Γ outside the plane ℝe₁ + ⋯ + ℝe_k, the construction is not finished.

Thus Lemma 3 is proved. □
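The first two steps of this construction can be followed on a computer for a lattice in the plane. The sketch below works under assumptions of our own: the subgroup Γ is given by two integer generators a and b, only the points ma + nb with |m|, |n| ≤ bound are examined (a finite window in place of the whole group), and all names are hypothetical. It picks the point e₁ closest to 0 on the line ℝe₀ and then a point e₂ closest to the line ℝe₁, and compares the parallelogram spanned by e₁, e₂ with the one spanned by a, b.

```python
from itertools import product


def cross(u, v):
    """Oriented area of the parallelogram spanned by u and v."""
    return u[0] * v[1] - u[1] * v[0]


def lattice_points(a, b, bound):
    """Finite window of the subgroup generated by a and b (an assumption of this sketch)."""
    return [(m * a[0] + n * b[0], m * a[1] + n * b[1])
            for m, n in product(range(-bound, bound + 1), repeat=2)]


def basis_from_proof(points, e0):
    """Follow the proof: nearest point on the line R*e0, then nearest point off that line."""
    nonzero = [v for v in points if v != (0, 0)]
    on_line = [v for v in nonzero if cross(e0, v) == 0]
    e1 = min(on_line, key=lambda v: v[0] ** 2 + v[1] ** 2)
    # The distance from v to the line R*e1 is |cross(e1, v)| / |e1|.
    off_line = [v for v in nonzero if cross(e1, v) != 0]
    e2 = min(off_line, key=lambda v: abs(cross(e1, v)))
    return e1, e2


a, b = (2, 1), (3, 2)                    # hypothetical generators
points = lattice_points(a, b, bound=6)
e1, e2 = basis_from_proof(points, e0=(4, 2))
```

In this example the parallelograms spanned by (e₁, e₂) and by (a, b) have the same area, as they must if both pairs generate the same lattice. For a general discrete subgroup no generators are given in advance, and that is exactly the situation the lemma addresses.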

It is now easy to prove Lemma 2: M_f is diffeomorphic to a torus Tⁿ.

Consider the direct product of k circles and n − k straight lines:

T^k × ℝ^{n − k} = {(φ₁,..., φ_k; y₁,..., y_{n − k})},  φ mod 2π,

together with the natural map p: ℝⁿ → T^k × ℝ^{n − k},

p(φ, y) = (φ mod 2π, y).

The points f₁,..., f_k ∈ ℝⁿ (f_i has coordinates φ_i = 2π, φ_j = 0 for j ≠ i, y = 0) are mapped to 0 under this map.

Let e₁,..., e_k ∈ Γ ⊂ ℝⁿ be the generators of the group Γ (cf. Lemma 3). We map the vector space ℝⁿ = {(φ, y)} onto the space ℝⁿ = {t} so that the vectors f_i go to e_i. Let A: ℝⁿ → ℝⁿ be such an isomorphism.

We now note that ℝⁿ = {(φ, y)} gives charts for T^k × ℝ^{n − k}, and ℝⁿ = {t} gives charts for our manifold M_f.

Problem 10. Show that the map of charts A: ℝⁿ → ℝⁿ gives a diffeomorphism Ã: T^k × ℝ^{n − k} → M_f.

But, since the manifold M_f is compact by hypothesis, k = n and M_f is an n-dimensional torus. Lemma 2 is proved. □

In view of Lemma 1, the first two statements of the theorem are proved. At the same time, we have constructed angular coordinates φ₁,..., φ_n mod 2π on M_f.

Problem 11. Show that under the action of the phase flow with hamiltonian H the angular coordinates φ vary uniformly with time:

φ_i(t) = φ_i(0) + ω_i t,  ω_i = ω_i(f).

In other words, motion on the invariant torus M_f is conditionally periodic.

Hint. φ = A⁻¹t.

Of all the assertions of the theorem, only the last remains to be proved: that the system can be integrated by quadratures.

50 Action-angle variables

We show here that, under the hypotheses of Liouville’s theorem, we can find symplectic coordinates (I, φ) such that the first integrals F depend only on I, and φ are angular coordinates on the torus M_f.

A Description of action-angle variables

In Section 49 we studied one particular compact connected level manifold of the integrals: M_f = {x: F(x) = f}; it turned out that M_f was an n-dimensional torus, invariant with respect to the phase flow. We chose angular coordinates φ_i on M_f so that the phase flow with hamiltonian function H = F₁ takes an especially simple form:

dφ/dt = ω(f),  φ(t) = φ(0) + ωt.

We will now look at a neighborhood of the n-dimensional manifold M_f in 2n-dimensional phase space.

Problem. Show that the manifold M_f has a neighborhood diffeomorphic to the direct product of the n-dimensional torus Tⁿ and the disc Dⁿ in n-dimensional euclidean space.

Hint. Take the functions F_i and the angles φ_i constructed above as coordinates. In view of the linear independence of the dF_i, the functions F_i and φ_i (i = 1,..., n) give a diffeomorphism of a neighborhood of M_f onto the direct product Tⁿ × Dⁿ.

In the coordinates (F, φ) the phase flow with hamiltonian function H = F₁ can be written in the form of the simple system of 2n ordinary differential equations

dF/dt = 0,  dφ/dt = ω(F),  (1)

which is easily integrated: F(t) = F(0), φ(t) = φ(0) + ω(F(0))t.

Thus, in order to integrate explicitly the original canonical system of differential equations, it is sufficient to find the variables φ in explicit form. It turns out that this can be done using only quadratures. A construction of the variables φ is given below.

We note that the variables (F, φ) are not, in general, symplectic coordinates. It turns out that there are functions of F, which we will denote by I = I(F), I = (I₁,..., I_n), such that the variables (I, φ) are symplectic coordinates: the original symplectic structure ω² is expressed in them by the usual formula

ω² = ∑ dI_i ∧ dφ_i.

The variables I are called action variables;⁸⁸ together with the angle variables φ they form the action-angle system of canonical coordinates in a neighborhood of M_f.

The quantities I_i are first integrals of the system with hamiltonian function H = F₁, since they are functions of the first integrals F_j. In turn, the variables F_i can be expressed in terms of I and, in particular, H = F₁ = H(I). In action-angle variables the differential equations of our flow (1) have the form

dI/dt = 0,  dφ/dt = ω(I).  (2)

Problem. Can the functions ω(I) in (2) be arbitrary?

Solution. In the variables (I, φ), the equations of the flow (2) have the canonical form with hamiltonian function H(I). Therefore, ω(I) = ∂H/∂I; thus if the number of degrees of freedom is n ≥ 2, the functions ω(I) are not arbitrary, but satisfy the symmetry condition ∂ω_i/∂I_j = ∂ω_j/∂I_i.

Action-angle variables are especially important for perturbation theory; in Section 52 we will demonstrate their application to the theory of adiabatic invariants.

B Construction of action-angle variables in the case of one degree of freedom

A system with one degree of freedom in the phase plane (p, q) is given by the hamiltonian function H(p, q).

Example 1. The harmonic oscillator H = ½p² + ½q²; or, more generally, H = ½a²p² + ½b²q².

Example 2. The mathematical pendulum H = ½p² − cos q.

In both cases we have a compact closed curve M_h (H = h), and the conditions of the theorem of Section 49 for n = 1 are satisfied.

In order to construct the action-angle variables, we will look for a canonical transformation (p, q) → (I, φ) satisfying the two conditions:

1. I = I(h);  2. ∮_{M_h} dφ = 2π.  (3)

Problem. Find the action-angle variables in the case of the simple harmonic oscillator H = ½p² + ½q².

Solution. If r, φ are polar coordinates, then dp ⋀ dq = r dr ⋀ dφ = d(r²/2) ⋀ dφ. Therefore, I = H = (p² + q²)/2.
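This solution can be checked along an explicit trajectory. The sketch below assumes the polar angle is measured so that p = r cos φ, q = r sin φ (consistent with dp ⋀ dq = r dr ⋀ dφ), and uses the exact solution of q̇ = p, ṗ = −q; the sample values and function names are our own. Since H = I, we expect I to stay constant and φ to advance at the unit rate ∂H/∂I = 1.

```python
import math


def action_angle(p, q):
    """I = (p^2 + q^2)/2 and the polar angle phi with p = r*cos(phi), q = r*sin(phi)."""
    return 0.5 * (p * p + q * q), math.atan2(q, p)


def flow(p, q, t):
    """Exact solution of dq/dt = p, dp/dt = -q after time t."""
    return (p * math.cos(t) - q * math.sin(t),
            q * math.cos(t) + p * math.sin(t))


p0, q0, t = 0.6, 0.9, 0.5
I0, phi0 = action_angle(p0, q0)
I1, phi1 = action_angle(*flow(p0, q0, t))

action_drift = I1 - I0            # zero up to rounding
angle_advance = phi1 - phi0       # equals t as long as phi does not cross the branch cut of atan2
```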

In order to construct the canonical transformation p, q → I, φ in the general case, we will look for its generating function S(I, q):

p = ∂S(I, q)/∂q,  φ = ∂S(I, q)/∂I,  H(∂S(I, q)/∂q, q) = h(I).  (4)

We first assume that the function h(I) is known and invertible, so that every curve M_h is determined by the value of I (M_h = M_h(I)). Then for a fixed value of I we have from (4)

dS|_{I = const} = p dq.

This relation determines a well-defined differential 1-form dS on the curve M_h(I).

Integrating this 1-form on the curve M_h(I) we obtain (in a neighborhood of a point q₀) a function

S(I, q) = ∫_{q₀}^{q} p dq.

This function will be the generating function of the transformation (4) in a neighborhood of the point (I, q₀). The first of the conditions (3) is satisfied automatically: I = I(h). To verify the second condition, we consider the behavior of S(I, q) “in the large.” After a circuit of the closed curve M_h(I) the integral of p dq increases by

ΔS(I) = ∮_{M_h(I)} p dq,

equal to the area Π enclosed by the curve M_h(I). Therefore, the function S is a “multiple-valued function” on M_h(I): it is determined up to addition of integral multiples of Π. This term has no effect on the derivative ∂S(I, q)/∂q; but it leads to the multi-valuedness of φ = ∂S/∂I. This derivative turns out to be defined only up to multiples of d ΔS(I)/dI. More precisely, the formulas (4) define a 1-form dφ on the curve M_h(I), and the integral of this form on M_h(I) is equal to d ΔS(I)/dI.

In order to fulfill the second condition,

∮_{M_h} dφ = 2π,

we need that

(d/dI) ΔS(I) = 2π,  I = ΔS/2π = Π/2π,

where

Π = ∮_{M_h} p dq

is the area bounded by the phase curve H = h.

Definition. The action variable in the one-dimensional problem with hamiltonian function H(p, q) is the quantity I(h) = (1/2π)Π(h).

Finally, we arrive at the following conclusion. Let dΠ/dh ≠ 0. Then the inverse I(h) of the function h(I) is defined.

Theorem. Set

S(I, q) = ∫_{q₀}^{q} p dq|_{H = h(I)}.

Then formulas (4) give a canonical transformation p, q → I, φ satisfying conditions (3).

Thus, the action-angle variables in the one-dimensional case are constructed.

Problem. Find S and I for a harmonic oscillator.

Answer. If H = ½a²p² + ½b²q² (Figure 217), then M_h is the ellipse bounding the area Π(h) = π(√(2h)/a)(√(2h)/b) = 2πh/(ab) = 2πh/ω. Thus for a harmonic oscillator the action variable is the ratio of energy to frequency. The angle variable φ is, of course, the phase of oscillation.

Figure 217
Action variable for a harmonic oscillator
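The definition I = Π/2π can also be applied by direct quadrature. The following sketch, under our own assumptions (midpoint rule, the substitution q = q_max sin θ to tame the square root at the turning points, hypothetical function names and sample values), computes the area enclosed by the curve ½a²p² + ½b²q² = h and compares the resulting action with h/ω, where ω = ab follows from q̇ = a²p, ṗ = −b²q.

```python
import math


def enclosed_area(h, a, b, n=2000):
    """Area inside the curve a^2 p^2/2 + b^2 q^2/2 = h, by the midpoint rule."""
    q_max = math.sqrt(2.0 * h) / b                  # turning point of the oscillation
    step = math.pi / n
    half = 0.0
    for k in range(n):
        theta = -0.5 * math.pi + (k + 0.5) * step
        q = q_max * math.sin(theta)
        p = math.sqrt(max(0.0, 2.0 * h - (b * q) ** 2)) / a   # upper branch p(q) >= 0
        half += p * q_max * math.cos(theta) * step            # p dq with dq = q_max cos(theta) dtheta
    return 2.0 * half                               # upper and lower halves of the curve


def action(h, a, b):
    """Action variable I(h) = area / (2*pi)."""
    return enclosed_area(h, a, b) / (2.0 * math.pi)


h, a, b = 1.5, 2.0, 0.5
numerical_action = action(h, a, b)
expected_action = h / (a * b)      # energy divided by frequency
```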

Problem. Show that the period T of motion along the closed curve H = h on the phase plane p, q is equal to the derivative with respect to h of the area bounded by this curve:

T = dΠ/dh.

Solution. In action-angle variables the equations of motion (2) give

φ̇ = ∂H/∂I = (dI/dh)⁻¹ = 2π (dΠ/dh)⁻¹,  T = 2π/φ̇ = dΠ/dh.
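The relation between period and area can be tested on the pendulum H = ½p² − cos q for oscillations (−1 < h < 1). The sketch below rests on assumptions of our own: the area is computed by a midpoint rule after the substitution q = q_max sin θ, its derivative is taken by a central difference, and the result is compared with the classical expression T = 4K(k), k = sin(q_max/2), where the complete elliptic integral K is evaluated through the arithmetic-geometric mean. The function names and step sizes are hypothetical.

```python
import math


def enclosed_area(h, n=4000):
    """Area inside the closed curve p^2/2 - cos(q) = h, for -1 < h < 1."""
    q_max = math.acos(-h)                           # turning point: cos(q_max) = -h
    step = math.pi / n
    half = 0.0
    for k in range(n):
        theta = -0.5 * math.pi + (k + 0.5) * step
        q = q_max * math.sin(theta)
        p = math.sqrt(max(0.0, 2.0 * (h + math.cos(q))))     # upper branch p(q) >= 0
        half += p * q_max * math.cos(theta) * step
    return 2.0 * half


def period_from_area(h, dh=1e-4):
    """T = d(area)/dh, estimated by a central difference."""
    return (enclosed_area(h + dh) - enclosed_area(h - dh)) / (2.0 * dh)


def period_from_elliptic_integral(h):
    """T = 4 K(k) with k = sin(q_max/2); K(k) = pi / (2 * AGM(1, sqrt(1 - k^2)))."""
    q_max = math.acos(-h)
    x, y = 1.0, math.cos(0.5 * q_max)               # sqrt(1 - k^2) = cos(q_max/2)
    for _ in range(40):                             # arithmetic-geometric mean iteration
        x, y = 0.5 * (x + y), math.sqrt(x * y)
    return 2.0 * math.pi / x


period_numeric = period_from_area(0.0)
period_reference = period_from_elliptic_integral(0.0)
```

For energies just above the minimum h = −1 both expressions approach 2π, the period of small oscillations.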

C Construction of action-angle variables in ℝ²ⁿ

We turn now to systems with n degrees of freedom given in ℝ²ⁿ = {(p, q)} by a hamiltonian function H(p, q) and having n first integrals in involution F₁ = H, F₂, ..., F_n. We will not repeat the reasoning which brought us to the choice of

2πI = ∮ p dq

in the one-dimensional case, but will immediately define n action variables I.

Let γ₁, ..., γ_n be a basis for the one-dimensional cycles on the torus M_f (the increase of the coordinate φ_i on the cycle γ_j is equal to 2π if i = j and 0 if i ≠ j). We set

I_i(f) = (1/2π) ∮_{γ_i} p dq.  (5)

Problem. Show that this integral does not depend on the choice of the curve γ_i representing the cycle (Figure 218).

Figure 218
Independence of the curve of integration for the action variable

Hint. In Section 49 we showed that the 2-form ω² = ∑dp_i ⋀ dq_i on the manifold M_f is equal to zero. By Stokes’ formula,

∮_γ p dq − ∮_{γ′} p dq = ∬_σ dp ⋀ dq = 0,

where ∂σ = γ − γ′.

Definition. The n quantities I_i(f) given by formula (5) are called the action variables.

We assume now that, for the given values f_i of the n integrals F_i, the n quantities I_i are independent: det(∂I/∂f)|_f ≠ 0. Then in a neighborhood of the torus M_f we can take the variables I, φ as coordinates.

Theorem. The transformation p, q → I, φ is canonical, i.e.,

∑ dp_i ⋀ dq_i = ∑ dI_i ⋀ dφ_i.

We outline the proof of this theorem. Consider the differential 1-form p dq on M_f. Since the manifold M_f is null (Section 49) this 1-form on M_f is closed: its exterior derivative ω² = dp ⋀ dq is identically equal to zero on M_f. Therefore (Figure 219),

S(x) = ∫_{x₀}^{x} p dq|_{M_f}

does not change under deformations of the path of integration (Stokes’ formula). Thus S(x) is a “multiple-valued function” on M_f, with periods equal to

Δ_i S = ∮_{γ_i} p dq = 2πI_i.

Figure 219

Independence of the path for the integral of p dq on M_f

Now let x₀ be a point on M_f, in a neighborhood of which the n variables q are coordinates on M_f, such that the submanifold M_f ⊂ ℝ²ⁿ is given by n equations of the form p = p(I, q), q(x₀) = q₀. In a simply connected neighborhood of the point q₀ a single-valued function is defined,

$$S(I, q) = \int_{q_0}^{q} p(I, q)\,dq,$$

and we can use it as the generating function of a canonical transformation p, q → I, φ:

$$p = \frac{\partial S}{\partial q}, \qquad \varphi = \frac{\partial S}{\partial I}.$$

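It is not difficult to verify that these formulas actually give a canonical transformation, not only in a neighborhood of the point under consideration, but also “in the large” in a neighborhood of M_f. The coordinates φ will be multiple-valued with periods (Δ_i denoting the increment around the cycle γ_i)

$$\Delta_i \varphi_j = \Delta_i \frac{\partial S}{\partial I_j} = \frac{\partial}{\partial I_j}\,\Delta_i S = \frac{\partial}{\partial I_j}\,(2\pi I_i) = 2\pi\delta_{ij},$$

as was to be shown. □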

We now note that all our constructions involve only “algebraic” operations (inverting functions) and “quadrature”—calculation of the integrals of known functions. In this way the problem of integrating a canonical system with 2n equations, of which n first integrals in involution are known, is solved by quadratures, which proves the last assertion of Liouville’s theorem (Section 49). □

Remark 1. Even in the one-dimensional case the action-angle variables are not uniquely defined by the conditions (3). We could have taken I′ = I + const for the action variable and φ′ = φ + c(I) for the angle variable.

Remark 2. We constructed action-angle variables for systems with phase space ℝ²ⁿ. We could also have introduced action-angle variables for a system on an arbitrary symplectic manifold. We restrict ourselves here to one simple example (Figure 220).

Figure 220

Action-angle variables on a symplectic manifold

We could have taken the phase space of a pendulum (H = ½p² − cos q) to be, instead of the plane {(p, q)}, the surface of the cylinder ℝ¹ × S¹ obtained by identifying angles q differing by an integral multiple of 2π.

The critical level lines H = ±1 divide the cylinder into three parts, A, B, and C, each of which is diffeomorphic to the direct product ℝ¹ × S¹. We can introduce action-angle variables into each part. In the bounded part (B) the closed trajectories represent the oscillation of the pendulum; in the unbounded parts they represent rotation.

Remark 3. In the general case, as in the example analyzed above, the equations F_i = f_i cease to be independent for some values of f_i, and M_f ceases to be a manifold. Such critical values of f correspond to separatrices dividing the phase space of the integrable problem into parts corresponding to the parts A, B, and C above. In some of these parts the manifolds M_f can be unbounded (parts A and C in the plane {(p, q)}); others are stratified into n-dimensional invariant tori M_f; in a neighborhood of such a torus we can introduce action-angle variables.

51 Averaging

In this paragraph we show that time averages and space averages are equal for systems undergoing conditionally periodic motion.

A Conditionally periodic motion

In the earlier sections of this book, we have frequently encountered conditionally periodic motion: Lissajous figures, precession, nutation, rotation of a top, etc.

Definition. Let Tⁿ be the n-dimensional torus and φ = (φ₁, ..., φ_n) mod 2π angular coordinates. Then by a conditionally periodic motion we mean a one-parameter group of diffeomorphisms Tⁿ → Tⁿ given by the differential equations (Figure 221):

$$\dot\varphi = \omega, \qquad \omega = (\omega_1, \ldots, \omega_n) = \text{const}.$$

Figure 221

Conditionally periodic motion

These differential equations are easily integrated:

$$\varphi(t) = \varphi(0) + \omega t.$$

Thus the trajectories in the chart {φ} are straight lines. A trajectory on the torus is called a winding of the torus.

Example. Let n = 2. If ω₁/ω₂ = k₁/k₂ with integers k₁, k₂, the trajectories are closed; if ω₁/ω₂ is irrational, then the trajectories on the torus are dense (cf. Section 16).

The quantities ω₁, ..., ω_n are called the frequencies of the conditionally periodic motion. The frequencies are called independent if they are linearly independent over the field of rational numbers: if k ∈ ℤⁿ ⁸⁹ and (k, ω) = 0, then k = 0.

B Space average and time average

Let f(φ) be an integrable function on the torus Tⁿ.

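Definition. The space average of a function f on the torus Tⁿ is the number

$$\bar f = (2\pi)^{-n}\int_0^{2\pi}\!\cdots\!\int_0^{2\pi} f(\varphi)\,d\varphi_1\cdots d\varphi_n.$$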

Consider the value of the function f(φ) on the trajectory φ(t) = φ₀ + ωt. This is a function of time, f(φ₀ + ωt). We consider its average.

Definition. The time average of the function f on the torus Tⁿ is the function

$$f^*(\varphi_0) = \lim_{T\to\infty}\frac{1}{T}\int_0^T f(\varphi_0+\omega t)\,dt$$

(defined where the limit exists).

Theorem on the averages. The time average exists everywhere, and coincides with the space average if f is continuous (or merely Riemann integrable) and the frequencies ω_i are independent.

Problem. Show that if the frequencies are dependent, then the time average can differ from the space average.
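The theorem and the problem above can be illustrated numerically on the two-dimensional torus. The sketch below is an illustration only, not part of the original text: the test function f(φ) = cos(φ₁ − φ₂), the frequencies (1, √2) and (1, 1), the averaging time, and the function names are all assumptions chosen for the example. Both averages are computed by a simple midpoint rule.

```python
import itertools
import math

def space_average(f, n, grid=64):
    """Midpoint-rule approximation of (2*pi)^(-n) times the integral of f over T^n."""
    points = [2.0 * math.pi * (i + 0.5) / grid for i in range(grid)]
    total = sum(f(phi) for phi in itertools.product(points, repeat=n))
    return total / grid ** n

def time_average(f, omega, phi0, total_time=2000.0, steps=200_000):
    """Midpoint-rule approximation of (1/T) * integral of f(phi0 + omega*t) over [0, T]."""
    dt = total_time / steps
    acc = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        phi = [p + w * t for p, w in zip(phi0, omega)]
        acc += f(phi)
    return acc * dt / total_time

def f(phi):
    # Assumed test function with zero space average.
    return math.cos(phi[0] - phi[1])

if __name__ == "__main__":
    print("space average:              ", space_average(f, 2))
    print("time average, omega=(1,√2): ", time_average(f, (1.0, math.sqrt(2.0)), (0.0, 0.0)))
    print("time average, omega=(1,1):  ", time_average(f, (1.0, 1.0), (0.0, 0.0)))
```

With the independent frequencies the time average approaches the space average (zero); with the dependent frequencies (1, 1) the function is constant along the trajectory, so the time average equals f(φ₀) and differs from the space average.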

Corollary 1. If the frequencies are independent, then every trajectory {φ(t)} is dense on the torus Tⁿ.

Proof. Assume the contrary. Then in some neighborhood D of some point of the torus, there is no point of the trajectory φ(t). It is easy to construct a continuous function f equal to zero outside D and with space average equal to 1. The time average f*(φ₀) on the trajectory φ(t) is equal to 0 ≠ 1. This contradicts the assertion of the theorem. □

Corollary 2. If the frequencies are independent, then every trajectory is uniformly distributed on the torus Tⁿ.

This means that the time the trajectory spends in a neighborhood D is proportional to the measure of D.

More precisely, let D be a (Jordan) measurable region of Tⁿ. We denote by τ_D(T) the amount of time that the interval 0 ≤ t ≤ T of the trajectory φ(t) is inside of D. Then

$$\lim_{T\to\infty}\frac{\tau_D(T)}{T} = \frac{\operatorname{mes} D}{(2\pi)^n}.$$

Proof. We apply the theorem to the characteristic function f of the set D (f is Riemann integrable since D is Jordan measurable). Then $\int_0^T f(\varphi(t))\,dt = \tau_D(T)$ and $\bar f = (2\pi)^{-n}\operatorname{mes} D$, and the corollary follows immediately from the theorem. □

Corollary. In the sequence

1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, ...

of first digits of the numbers 2ⁿ, the number 7 appears (log 8 − log 7)/(log 9 − log 8) times as often as 8.
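The corollary can be tested by direct counting. The sketch below is an illustration, not part of the original text; the function name and the number of terms are assumptions. It uses exact integer arithmetic: alongside 2ⁿ it keeps the largest power of 10 not exceeding 2ⁿ, so the first digit is an integer quotient.

```python
from math import log10

def first_digit_counts(n_terms):
    """Count the first decimal digits of 2**0, 2**1, ..., 2**(n_terms - 1)."""
    counts = [0] * 10
    power = 1   # current value of 2**n
    scale = 1   # largest power of 10 that does not exceed `power`
    for _ in range(n_terms):
        counts[power // scale] += 1
        power *= 2
        if power >= 10 * scale:   # doubling crosses at most one power of 10
            scale *= 10
    return counts

if __name__ == "__main__":
    counts = first_digit_counts(20_000)
    observed = counts[7] / counts[8]
    predicted = log10(8 / 7) / log10(9 / 8)
    print("observed ratio of 7s to 8s:", observed)
    print("predicted ratio:           ", predicted)
```

The observed ratio should approach the predicted one as the number of terms grows, since the fractional parts of n·log₁₀ 2 are uniformly distributed on the circle.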

The theorem on averages may be found implicitly in the work of Laplace, Lagrange, and Gauss on celestial mechanics; it is one of the first “ergodic theorems.” A rigorous proof was given only in 1909 by P. Bohl, W. Sierpinski, and H. Weyl in connection with a problem of Lagrange on the mean motion of the earth’s perihelion. Below we reproduce H. Weyl’s proof.

C Proof of the theorem on averages

Lemma 1. The theorem is true for exponentials f = e^{i(k, φ)}, k ∈ ℤⁿ.

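Proof. If k = 0, then

$$f = 1, \qquad \bar f = f^* = 1,$$

and the theorem is obvious. If k ≠ 0, then

$$\bar f = 0.$$

On the other hand (note that (k, ω) ≠ 0, since the frequencies are independent),

$$\int_0^T e^{i(k,\,\varphi_0+\omega t)}\,dt = e^{i(k,\varphi_0)}\,\frac{e^{i(k,\omega)T}-1}{i(k,\omega)}.$$

Therefore, the time average is

$$\lim_{T\to\infty}\frac{e^{i(k,\varphi_0)}}{i(k,\omega)}\cdot\frac{e^{i(k,\omega)T}-1}{T} = 0.$$

□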

Lemma 2. The theorem is true for trigonometric polynomials

$$f = \sum_{|k| < N} f_k\, e^{i(k,\varphi)}.$$

Proof. Both the time and space averages depend linearly on f, and therefore agree by Lemma 1. □

Lemma 3. Let f be a real continuous (or at least Riemann integrable) function. Then, for any ε > 0, there exist two trigonometric polynomials P₁ and P₂ such that P₁ < f < P₂ and (2π)⁻ⁿ ∫_{Tⁿ} (P₂ − P₁) dφ < ε.

Proof. Suppose first that f is continuous. By the Weierstrass theorem, we can approximate f by a trigonometric polynomial P with |f − P| < ½ε. The polynomials P₁ = P − ½ε and P₂ = P + ½ε are the ones we are looking for.

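If f is not continuous but Riemann integrable, then there are two continuous functions f₁ and f₂ such that f₁ < f < f₂ and

$$(2\pi)^{-n}\int (f_2 - f_1)\,d\varphi < \tfrac{1}{3}\varepsilon$$

(Figure 222 corresponds to the characteristic function of an interval). By approximating f₁ and f₂ by polynomials P₁ < f₁ < f₂ < P₂,

$$(2\pi)^{-n}\int (P_2 - f_2)\,d\varphi < \tfrac{1}{3}\varepsilon, \qquad (2\pi)^{-n}\int (f_1 - P_1)\,d\varphi < \tfrac{1}{3}\varepsilon,$$

we obtain what we need. Lemma 3 is proved. □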

Figure 222

Approximation of the function f by trigonometric polynomials P₁ and P₂

It is now easy to finish the proof of the theorem. Let ε > 0. Then, by Lemma 3, there are trigonometric polynomials P₁ < f < P₂ with (2π)⁻ⁿ ∫ (P₂ − P₁) dφ < ε.

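For any T, we then have

$$\frac{1}{T}\int_0^T P_1(\varphi(t))\,dt < \frac{1}{T}\int_0^T f(\varphi(t))\,dt < \frac{1}{T}\int_0^T P_2(\varphi(t))\,dt.$$

By Lemma 2, for T > T₀(ε),

$$\left|\bar P_i - \frac{1}{T}\int_0^T P_i(\varphi(t))\,dt\right| < \varepsilon \qquad (i = 1, 2).$$

Furthermore, $\bar P_1 < \bar f < \bar P_2$. Therefore, $\bar P_2 - \bar f < \varepsilon$ and $\bar f - \bar P_1 < \varepsilon$; therefore, for T > T₀(ε),

$$\left|\frac{1}{T}\int_0^T f(\varphi(t))\,dt - \bar f\right| < 2\varepsilon,$$

as was to be proved. □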

Problem. A two-dimensional oscillator with kinetic energy T = ½ẋ² + ½ẏ² and potential energy U = ½x² + y² performs an oscillation with amplitudes a_x = 1 and a_y = 1. Find the time average of the kinetic energy.

Problem.⁹⁰ Let ω_k be independent, a_k > 0. Calculate

$$\lim_{t\to\infty}\frac{1}{t}\arg\sum_{k=1}^{3} a_k e^{i\omega_k t}.$$

Answer. (ω₁α₁ + ω₂α₂ + ω₃α₃)/π, where α₁, α₂, and α₃ are the angles of the triangle with sides a_k (Figure 223).

Figure 223
Problem on mean motion of perihelia

D Degeneracies

So far we have considered the case when the frequencies ω are independent. An integral vector k ∈ ℤⁿ is called a relation among the frequencies if (k, ω) = 0.

Problem. Show that the set of all relations between a given set of frequencies ω is a subgroup Γ of the lattice ℤⁿ.

We saw in Section 49 that such a subgroup consists entirely of linear combinations of r independent vectors k_i, 1 ≤ r ≤ n. We say that there are r (independent) relations among the frequencies.⁹¹

Problem. Show that the closure of a trajectory {φ(t) = φ₀ + ωt} (on Tⁿ) is a torus of dimension n − r if there are r independent relations among the frequencies ω; in this case the motion on T^{n − r} is conditionally periodic with n − r independent frequencies.

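We turn now to the integrable hamiltonian system given in action-angle variables I, φ by the equations

$$\dot I = 0, \qquad \dot\varphi = \omega(I), \qquad\text{where } \omega(I) = \frac{\partial H}{\partial I}.$$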

Every n-dimensional torus I = const in the 2n-dimensional phase space is invariant, and motion on it is conditionally periodic.

Definition. A system is called nondegenerate if the determinant

$$\det\frac{\partial\omega}{\partial I} = \det\frac{\partial^2 H}{\partial I^2}$$

is not zero.

Problem. Show that, if a system is nondegenerate, then in any neighborhood of any point there is a conditionally periodic motion with n frequencies, and also with any smaller number of frequencies.

Hint. We can take the frequencies ω themselves instead of the variables I as local coordinates. In the space of collections of frequencies, the set of points ω with any number of relations r(0 ≤ r < n) is dense.

Corollary. If a system is nondegenerate, then the invariant tori I = const are uniquely defined, independent of the choice of action-angle coordinates I, φ, the construction of which always involves some arbitrariness.⁹²

Proof. The tori I = const can be defined as the closures of the phase trajectories corresponding to the independent ω. □

We note incidentally that, for the majority of values I, the frequencies ω will be independent.

Problem. Show that the set of I for which the frequencies ω(I) in a nondegenerate system are dependent has Lebesgue measure equal to zero.

Hint. Show first that

$$\operatorname{mes}\{\omega : \exists\, k \neq 0,\ (\omega, k) = 0\} = 0.$$

On the other hand, in degenerate systems we can construct systems of action-angle variables such that the tori I = const will be different in different systems. This is the case because the closures of trajectories in a degenerate system are tori of dimension k < n, and they can be contained in different ways in n-dimensional tori.

Example 1. The planar harmonic oscillator ẍ = −x; n = 2, k = 1. Separation of variables in cartesian and polar coordinates leads to different action-angle variables and different tori.

Example 2. Keplerian planar motion (U = − 1/r), n = 2, k = 1. Here, too, separation of variables in polar and in elliptic coordinates leads to different I.

52 Averaging of perturbations

Here we show the adiabatic invariance of the action variable in a system with one degree of freedom.

A Systems close to integrable ones

We have considered a great many integrable systems (one-dimensional problems, the two-body problem, small oscillations, the Euler and Lagrange cases of the motion of a rigid body with a fixed point, etc.). We studied the characteristics of phase trajectories in these systems: they turned out to be “windings of tori,” densely filling up the invariant tori in phase space; every trajectory is uniformly distributed on this torus.

One should not conclude from this that integrability is the typical situation. Actually, the properties of trajectories in many-dimensional systems can be highly diverse and not at all similar to the properties of conditionally periodic motions. In particular, the closure of a trajectory of a system with n degrees of freedom can fill up complicated sets of dimension greater than n in 2n-dimensional phase space; a trajectory could even be dense and uniformly distributed on a whole (2n − 1)-dimensional manifold given by the equation H = h.⁹³ One may call such systems “nonintegrable” since they do not admit single-valued first integrals independent of H. The study of such systems is still far from complete; it constitutes a problem in “ergodic theory.”

One approach to nonintegrable systems is to study systems which are close to integrable ones. For example, the problem of the motion of planets around the sun is close to the integrable problem of the motion of noninteracting points around a stationary center; other examples are the problem of the motion of a slightly asymmetric heavy top and the problem of nonlinear oscillations close to an equilibrium position (the nearby integrable problem is linear). The following method is especially fruitful in the investigation of these and similar problems.

B The averaging principle

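Let I, φ be action-angle variables in an integrable (“unperturbed”) system with hamiltonian function H₀(I):

$$\dot I = 0, \qquad \dot\varphi = \omega(I), \qquad \omega(I) = \frac{\partial H_0}{\partial I}.$$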

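As the nearby “perturbed” system we take the system

$$\dot\varphi = \omega(I) + \varepsilon f(I,\varphi), \qquad \dot I = \varepsilon g(I,\varphi), \tag{1}$$

where ε ≪ 1.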

We will ignore for a while that the system is hamiltonian and consider an arbitrary system of differential equations in the form (1) given on the direct product T^k × G of the k-dimensional torus T^k = {φ = (φ₁, ..., φ_k) mod 2π} and a region G in l-dimensional space G ⊂ ℝ^l = {I = (I₁, ..., I_l)}. For ε = 0 the motion in (1) is conditionally periodic with at most k frequencies and with k-dimensional invariant tori.

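The averaging principle for system (1) consists of its replacement by another system, called the averaged system:

$$\dot J = \varepsilon\bar g(J), \qquad \bar g(J) = (2\pi)^{-k}\int_0^{2\pi}\!\cdots\!\int_0^{2\pi} g(J,\varphi)\,d\varphi_1\cdots d\varphi_k, \tag{2}$$

in the l-dimensional region G ⊂ ℝ^l = {J = (J₁, ..., J_l)}.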

We claim that system (2) is a “good approximation” to system (1).

We note that this principle is neither a theorem, an axiom, nor a definition, but rather a physical proposition, i.e., a vaguely formulated and, strictly speaking, untrue assertion. Such assertions are often fruitful sources of mathematical theorems.

This averaging principle may be found explicitly in the work of Gauss (in studying the perturbations of planets on one another, Gauss proposed to distribute the mass of each planet around its orbit proportionally to time and to replace the attraction of each planet by the attraction of the ring so obtained). Nevertheless, a satisfactory description of the connection between the solutions of systems (1) and (2) in the general case has not yet been found.

In replacing system (1) by system (2) we discard the term

$$\varepsilon\tilde g(I,\varphi) = \varepsilon g(I,\varphi) - \varepsilon\bar g(I)$$

on the right-hand side. This term has order ε as does the remaining term εḡ. In order to understand the different roles of the terms ḡ and g̃ in g, we consider the simplest example.

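Problem. Consider the case k = l = 1,

$$\dot\varphi = \omega \neq 0, \qquad \dot I = \varepsilon g(\varphi).$$

Show that for 0 < t < 1/ε,

$$|I(t) - J(t)| < c\varepsilon, \qquad\text{where } J(t) = I(0) + \varepsilon\bar g\,t.$$

Solution. With ḡ the average of g over φ and g̃ = g − ḡ,

$$I(t) - I(0) = \int_0^t \varepsilon g(\varphi_0 + \omega s)\,ds = \int_0^t \varepsilon\bar g\,ds + \frac{\varepsilon}{\omega}\int_0^{\omega t}\tilde g(\varphi_0+\psi)\,d\psi = \varepsilon\bar g\,t + \frac{\varepsilon}{\omega}\,h(\omega t),$$

where

$$h(\varphi) = \int_0^{\varphi}\tilde g(\varphi_0 + \psi)\,d\psi$$

is a periodic, and therefore bounded, function.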

Thus the variation in I with time consists of two parts: an oscillation of order ε depending on the periodic part g̃ = g − ḡ, and a systematic “evolution” with velocity εḡ (Figure 224).

Figure 224

Evolution and oscillation

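The averaging principle is based on the assertion that in the general case the motion of system (1) can be divided into the “evolution” (2) and small oscillations. In its general form, this assertion is invalid and the principle itself is untrue. Nevertheless, we will apply the principle to the hamiltonian system (1):

$$\dot\varphi = \frac{\partial}{\partial I}\bigl(H_0(I) + \varepsilon H_1(I,\varphi)\bigr), \qquad \dot I = -\frac{\partial}{\partial\varphi}\bigl(H_0(I) + \varepsilon H_1(I,\varphi)\bigr).$$

For the right-hand side of the averaged system (2) we then obtain

$$\bar g = -(2\pi)^{-n}\int \frac{\partial H_1}{\partial\varphi}\,d\varphi = 0.$$

In other words, there is no evolution in a nondegenerate hamiltonian system.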

One variant of this entirely nonrigorous deduction leads to the so-called Laplace theorem: The semi-major axes of the keplerian ellipses of the planets have no secular perturbations.

The discussion above suffices to convince us of the importance of the averaging principle; we now formulate a theorem justifying this principle in one very particular case—that of single-frequency oscillations (k = 1). This theorem shows that the averaging principle correctly describes evolution over a large interval of time (0 < t < 1/ε).

C Averaging in a single-frequency system

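Consider the system of l + 1 differential equations

$$\dot\varphi = \omega(I) + \varepsilon f(I,\varphi), \qquad \dot I = \varepsilon g(I,\varphi), \qquad \varphi \bmod 2\pi \in S^1, \quad I \in G \subset \mathbb{R}^l, \tag{1}$$

where f(I, φ + 2π) ≡ f(I, φ) and g(I, φ + 2π) ≡ g(I, φ), together with the “averaged” system of l equations

$$\dot J = \varepsilon\bar g(J), \qquad \bar g(J) = \frac{1}{2\pi}\int_0^{2\pi} g(J,\varphi)\,d\varphi. \tag{2}$$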

We denote by I(t), φ(t) the solution of system (1) with initial conditions I(0), φ(0), and by J(t) the solution of system (2) with the same initial conditions J(0) = I(0) (Figure 225).

Figure 225

Theorem on averaging

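Theorem. Suppose that:

1. the functions ω, f, and g are defined for I in a bounded region G, and in this region they are bounded, together with their derivatives up to second order:

$$\|\omega, f, g\|_{C^2(G \times S^1)} < c_1;$$

2. in the region G, we have

$$\omega(I) > c > 0;$$

3. for 0 ≤ t ≤ 1/ε, a neighborhood of radius d of the point J(t) belongs to G:

$$I \in G \quad\text{if } |I - J(t)| < d.$$

Then for sufficiently small ε (0 < ε < ε₀)

$$|I(t) - J(t)| < c_9\,\varepsilon \quad\text{for all } t,\ 0 \le t \le 1/\varepsilon,$$

where the constant c₉ > 0 depends on c₁, c, and d, but not on ε.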

Some applications of this theorem will be given below (“adiabatic invariants”). We remark that the basic idea of the proof of this theorem (a change of variables diminishing the perturbation) is more important than the theorem itself; this is one of the basic ideas in the theory of ordinary differential equations; it is encountered in elementary courses as the “method of variation of constants.”
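The statement of the theorem can be watched at work on a concrete single-frequency system. The sketch below is an illustration under stated assumptions, not part of the original text: we choose ω = 1, f = 0, and g(I, φ) = (1 − I)(1 + 2 cos φ), so that the averaged system is J̇ = ε(1 − J) with solution J(t) = 1 − (1 − J(0))e^{−εt}. The full system is integrated by a classical Runge-Kutta scheme, and the largest deviation |I(t) − J(t)| on 0 ≤ t ≤ 1/ε is recorded.

```python
import math

def max_deviation(eps, dt=0.01, i0=0.0, phi0=0.0):
    """Largest |I(t) - J(t)| on 0 <= t <= 1/eps for the assumed example system."""
    def rhs(phi, i):
        # Assumed example: omega = 1, f = 0, g = (1 - I) * (1 + 2 cos(phi)).
        return 1.0, eps * (1.0 - i) * (1.0 + 2.0 * math.cos(phi))

    steps = int(round(1.0 / (eps * dt)))
    phi, i, worst = phi0, i0, 0.0
    for n in range(steps):
        # One classical fourth-order Runge-Kutta step for (phi, I).
        k1p, k1i = rhs(phi, i)
        k2p, k2i = rhs(phi + 0.5 * dt * k1p, i + 0.5 * dt * k1i)
        k3p, k3i = rhs(phi + 0.5 * dt * k2p, i + 0.5 * dt * k2i)
        k4p, k4i = rhs(phi + dt * k3p, i + dt * k3i)
        phi += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        i += dt * (k1i + 2.0 * k2i + 2.0 * k3i + k4i) / 6.0
        t = (n + 1) * dt
        j = 1.0 - (1.0 - i0) * math.exp(-eps * t)   # exact solution of the averaged system
        worst = max(worst, abs(i - j))
    return worst

if __name__ == "__main__":
    for eps in (0.02, 0.01, 0.005):
        print(eps, max_deviation(eps), max_deviation(eps) / eps)
```

In this example the ratio of the deviation to ε stays bounded as ε decreases, while I itself changes by a quantity of order 1 over the same time interval.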

D Proof of the theorem on averaging

In place of the variables I we will introduce new variables P

$$P = I + \varepsilon k(I, \varphi), \tag{3}$$

where the function k, 2π-periodic in φ, will be chosen so that the vector P will satisfy a simpler differential equation.

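By (1) and (3), the rate of change of P(t) is

$$\dot P = \dot I + \varepsilon\frac{\partial k}{\partial I}\dot I + \varepsilon\frac{\partial k}{\partial\varphi}\dot\varphi = \varepsilon\left[g + \frac{\partial k}{\partial\varphi}\,\omega\right] + \varepsilon^2\frac{\partial k}{\partial I}\,g + \varepsilon^2\frac{\partial k}{\partial\varphi}\,f. \tag{4}$$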

We assume that the substitution (3) can be inverted, so that

$$I = P + \varepsilon h(P, \varphi, \varepsilon) \tag{5}$$

(where the functions h are 2π-periodic in φ).

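Then (4) and (5) imply that P(t) satisfies the system of equations

$$\dot P = \varepsilon\left[g(P,\varphi) + \frac{\partial k}{\partial\varphi}\,\omega(P)\right] + R, \tag{6}$$

where the “remainder term” R is small of second order with respect to ε:

$$|R| < c_2\,\varepsilon^2, \qquad c_2(c_1, c_3, c_4) > 0, \tag{7}$$

if only

$$\|\omega\|_{C^2} < c_1, \quad \|f\|_{C^2} < c_1, \quad \|g\|_{C^2} < c_1, \quad \|k\|_{C^2} < c_3, \quad \|h\|_{C^2} < c_4. \tag{8}$$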

We will now try to choose the change of variables (3) so that the term involving ε in (6) becomes zero. For k we get the equation

$$\frac{\partial k}{\partial\varphi} = -\frac{g}{\omega}.$$

In general, such an equation is not solvable in the class of functions k periodic in φ. In fact, the average value (with respect to φ) of the left-hand side is always equal to 0, and the average value of the right-hand side can be different from 0. Therefore, we cannot choose k in such a way as to kill the entire term involving ε in (6). However, we can kill the entire “periodic” part of g,

$$\tilde g(I,\varphi) = g(I,\varphi) - \bar g(I),$$

by setting

$$k(I,\varphi) = -\int_0^{\varphi}\frac{\tilde g(I,\psi)}{\omega(I)}\,d\psi. \tag{9}$$

So we define the function k by formula (9). Then, by hypotheses 1. and 2. of the theorem, the function k satisfies the estimate ∥k∥_{C²} < c₃, where c₃(c₁, c) > 0. In order to establish the inequality (8), we must estimate h. For this we must first show that the substitution (3) is invertible.

Fix a positive number α.

Lemma. If ε is sufficiently small, then the restriction of the mapping (3)⁹⁴

$$I \mapsto I + \varepsilon k(I, \varphi)$$

to the region G − α (consisting of points whose α-neighborhood is contained in G) is a diffeomorphism. The inverse diffeomorphism (5) in the region G − 2α satisfies the estimate ∥h∥_{C²} < c₄ with some constant c₄(α, c₃) > 0.

Proof. The necessary estimate follows directly from the implicit function theorem. The only difficulty is in verifying that the map I → I + εk is one-to-one in the region G − α. We note that the function k satisfies a Lipschitz condition (with some constant L(α, c₃)) in G − α. Consider two points I₁, I₂ in G − α. For sufficiently small ε (namely, for Lε < 1) the distance between εk(I₁) and εk(I₂) will be smaller than |I₁ − I₂|. Therefore, I₁ + εk(I₁) ≠ I₂ + εk(I₂). Thus the map (3) is one-to-one on G − α, and the lemma is proved. □

It follows from the lemma that for ε small enough all the estimates (8) are satisfied. Thus the estimate (7) is also true.

We now compare the system of differential equations for J

$$\dot J = \varepsilon\bar g(J) \tag{2}$$

and for P; the latter, in view of (9), takes the form

$$\dot P = \varepsilon\bar g(P) + R. \tag{6′}$$

Since the difference between the right sides is of order ≲ ε² (cf. (7)), for time t ≲ 1/ε the difference |P − J| between the solutions is of order ε (Figure 226). On the other hand, |I − P| = ε|k| ≲ ε. Thus, for t ≲ 1/ε, the difference |I − J| is of order ≲ ε, as was to be proved. □

Figure 226

Proof of the theorem on averaging

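To find an accurate estimate, we introduce the quantity

$$z(t) = P(t) - J(t). \tag{10}$$

Then (6′) and (9) imply

$$\dot z = \varepsilon\bigl(\bar g(P) - \bar g(J)\bigr) + R = \varepsilon\,\frac{\partial\bar g}{\partial P}\,z + R',$$

where |R′| < c₂ε² + c₅ε|z| if the segment (P, J) lies in G − α. Under this assumption we find

$$|\dot z| < c_6\,\varepsilon|z| + c_2\,\varepsilon^2 \quad (c_6 = c_1 + c_5), \qquad |z(0)| < c_3\,\varepsilon. \tag{11}$$

Lemma. If |ż| ≤ a|z| + b and |z(0)| < d for a, b, d, t > 0, then |z(t)| ≤ (d + bt)e^{at}.

Proof. |z(t)| is no greater than the solution y(t) of the equation ẏ = ay + b, y(0) = d. Solving this equation, we find y = Ce^{at}, Ċ = e^{−at}b, C(0) = d, C ≤ d + bt. □

Now from (11) and the assumption that the segment (P, J) lies in G − α (Figure 226), we have

$$|z(t)| < (c_3\,\varepsilon + c_2\,\varepsilon^2 t)\,e^{c_6\varepsilon t}.$$

From this it follows that, for 0 ≤ t ≤ 1/ε,

$$|z(t)| < c_7\,\varepsilon, \qquad c_7 = (c_3 + c_2)\,e^{c_6}.$$

We see that, if α = d/3 and ε is small enough, the entire segment (P(t), J(t)) (t ≤ 1/ε) lies inside G − α and, therefore,

$$|P(t) - J(t)| < c_8\,\varepsilon \quad\text{for all } t,\ 0 \le t \le 1/\varepsilon.$$

On the other hand, |P(t) − I(t)| < |εk| < c₃ε. Thus, for all t with 0 ≤ t ≤ 1/ε,

$$|I(t) - J(t)| < c_9\,\varepsilon, \qquad c_9 = c_8 + c_3 > 0,$$

and the theorem is proved. □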

E Adiabatic invariants

Consider a hamiltonian system with one degree of freedom, with hamiltonian function H(p, q; λ) depending on a parameter λ. As an example, we can take a pendulum:

$$H = \frac{p^2}{2l^2} + lg\,\frac{q^2}{2};$$

as the parameter λ we can take the length l or the acceleration of gravity g. Suppose that the parameter changes slowly with time. It turns out that in the limit as the rate of change of the parameter approaches 0, there is a remarkable asymptotic phenomenon: two quantities, generally independent, become functions of one another.

Assume, for example, that the length of the pendulum changes slowly (in comparison with its characteristic oscillations). Then the amplitude of its oscillation becomes a function of the length of the pendulum. If we very slowly increase by a factor of two the length of the pendulum and then very slowly decrease it to the original value, then at the end of this process the amplitude of the oscillation will be the same as it was at the start.

Furthermore, it turns out that the ratio of the energy H of the pendulum to the frequency ω changes very little under a slow change of the parameter, although the energy and frequency themselves may change a lot. Quantities such as this ratio, which change little under slow changes of parameter, are called by physicists adiabatic invariants.

It is easy to see that the adiabatic invariance of the ratio of the energy of a pendulum to its frequency is an assertion of a physical character, i.e., it is untrue without further assumptions. In fact, if we vary the length of a pendulum arbitrarily slowly, but choose the phase of oscillation under which the length increases and decreases, we can set the pendulum swinging (parametric resonance). In view of this, physicists have suggested formulating the definition of adiabatic invariance as follows: the person changing the parameters of the system must not see what state the system is in (Figure 227). Giving this definition a rigorous mathematical meaning is a very delicate and as yet unsolved problem. Fortunately, we can get along with a surrogate. The assumption of ignorance of the internal state of the system on the part of the person controlling the parameter may be replaced by the requirement that the change of parameter must be smooth, i.e., twice continuously differentiable.

Figure 227

Adiabatic change in the length of a pendulum

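More precisely, let H(p, q; λ) be a fixed, twice continuously differentiable function of λ. Set λ = εt and consider the resulting system with slowly varying parameter λ = εt:

$$\dot p = -\frac{\partial H}{\partial q}, \qquad \dot q = \frac{\partial H}{\partial p}, \qquad H = H(p, q; \varepsilon t). \tag{*}$$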

Definition. The quantity I(p, q; λ) is an adiabatic invariant of the system (*) if for every κ > 0 there is an ε₀ > 0 such that if 0 < ε < ε₀ and 0 < t < 1/ε, then

$$|I(p(t), q(t); \varepsilon t) - I(p(0), q(0); 0)| < \kappa.$$

Clearly, every first integral is also an adiabatic invariant. It turns out that every one-dimensional system (*) has an adiabatic invariant. Namely, the adiabatic invariant is the action variable in the corresponding problem with constant coefficients.

Assume that the phase trajectories of the system with hamiltonian H(p, q; λ) are closed. We define a function I(p, q; λ) in the following way. For fixed λ there is a phase portrait corresponding to the hamiltonian function H(p, q; λ) (Figure 228). Consider the closed phase trajectory passing through a point (p, q). It bounds some region in the phase plane. We denote the area of this region by 2πI(p, q; λ). I = const on every phase trajectory (for given λ). Clearly, I is nothing but the action variable (cf. Section 50).

Figure 228

Adiabatic invariant of a one-dimensional system

Theorem. If the frequency ω(I, λ) of the system (*) is nowhere zero, then I(p, q; λ) is an adiabatic invariant.

F Proof of the adiabatic invariance of action

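For fixed λ we can introduce action-angle variables I, φ into the system (*) by a canonical transformation depending on λ: p, q → I, φ;

$$\dot\varphi = \omega(I, \lambda), \qquad \dot I = 0, \qquad \omega = \frac{\partial H_0}{\partial I}, \qquad H_0 = H(p, q; \lambda) = H_0(I, \lambda).$$

We denote by S(I, q; λ) the (multiple-valued) generating function of this transformation:

$$p = \frac{\partial S}{\partial q}, \qquad \varphi = \frac{\partial S}{\partial I}.$$

Now let λ = εt. Since the change from variables p, q to variables I, φ is now performed by a time dependent canonical transformation, the equations of motion in the new variables I, φ have the hamiltonian form, but with hamiltonian function (cf. Section 45A)

$$K = H_0 + \frac{\partial S}{\partial t} = H_0 + \varepsilon\,\frac{\partial S}{\partial\lambda}.$$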

Problem. Show that ∂S(I, q; λ)/∂λ is a single-valued function on the phase plane.

Hint. S is determined up to the addition of multiples of 2πI.

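In this way we obtain the equations of motion in the form

$$\dot\varphi = \omega(I,\lambda) + \varepsilon f(I,\varphi;\lambda), \qquad \dot I = \varepsilon g(I,\varphi;\lambda), \qquad \dot\lambda = \varepsilon, \qquad f = \frac{\partial^2 S}{\partial I\,\partial\lambda}, \quad g = -\frac{\partial^2 S}{\partial\varphi\,\partial\lambda}.$$

Since ω ≠ 0, the averaging theorem (Section 52C) is applicable. The averaged system has the form

$$\dot J = \varepsilon\bar g, \qquad \dot\Lambda = \varepsilon.$$

But g = −(∂/∂φ)(∂S/∂λ), and ∂S/∂λ is a single-valued function on the circle I = const. Therefore,

$$\bar g = -\frac{1}{2\pi}\int_0^{2\pi}\frac{\partial}{\partial\varphi}\,\frac{\partial S}{\partial\lambda}\,d\varphi = 0,$$

and in the averaged system J does not change at all: J(t) = J(0).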

By the averaging theorem, |I(t) − I(0)| < cε for all t with 0 ≤ t ≤ 1/ε, as was to be proved. □

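Example. For a harmonic oscillator (cf. Figure 217),

$$H = \frac{a^2}{2}\,p^2 + \frac{b^2}{2}\,q^2, \qquad I = \frac{1}{2\pi}\,\pi\,\frac{\sqrt{2h}}{a}\,\frac{\sqrt{2h}}{b} = \frac{h}{\omega}, \qquad \omega = ab,$$

i.e., the ratio of energy to frequency is an adiabatic invariant.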

Problem. The length of a pendulum is slowly doubled (l = l₀(1 + εt), 0 ≤ t ≤ 1/ε). How does the amplitude q_max of the oscillations vary?

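Solution.

$$I = \tfrac{1}{2}\,l^{3/2} g^{1/2}\,q_{\max}^2;$$

therefore,

$$q_{\max}(t) = q_{\max}(0)\left(\frac{l(0)}{l(t)}\right)^{3/4}.$$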
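The adiabatic invariance of H/ω for the slowly lengthening pendulum can be observed in a direct simulation. The sketch below is an illustration, not part of the original text. It assumes the small-oscillation hamiltonian H = p²/(2l²) + lgq²/2 with l = l₀(1 + εt), units in which g = l₀ = 1, and a classical Runge-Kutta integrator; the function name and all numerical values are assumptions.

```python
import math

def slow_pendulum(eps=0.01, dt=0.005, g=1.0, l0=1.0, q0=0.1, p0=0.0):
    """Integrate the pendulum while its length doubles; return (H0, I0, H1, I1)."""
    def length(t):
        return l0 * (1.0 + eps * t)

    def rhs(t, q, p):
        l = length(t)
        return p / (l * l), -l * g * q      # dq/dt = dH/dp, dp/dt = -dH/dq

    def energy(t, q, p):
        l = length(t)
        return p * p / (2.0 * l * l) + l * g * q * q / 2.0

    def action(t, q, p):
        # I = H / omega with omega = sqrt(g / l) at the current length.
        return energy(t, q, p) * math.sqrt(length(t) / g)

    steps = int(round(1.0 / (eps * dt)))
    t, q, p = 0.0, q0, p0
    h_start, i_start = energy(t, q, p), action(t, q, p)
    for _ in range(steps):
        # One classical fourth-order Runge-Kutta step.
        k1q, k1p = rhs(t, q, p)
        k2q, k2p = rhs(t + 0.5 * dt, q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
        k3q, k3p = rhs(t + 0.5 * dt, q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
        k4q, k4p = rhs(t + dt, q + dt * k3q, p + dt * k3p)
        q += dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
        p += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        t += dt
    return h_start, i_start, energy(t, q, p), action(t, q, p)

if __name__ == "__main__":
    h0, i0, h1, i1 = slow_pendulum()
    print("energy ratio H(1/eps)/H(0):", h1 / h0)
    print("action ratio I(1/eps)/I(0):", i1 / i0)
```

The energy should drop to roughly 1/√2 of its initial value, as the frequency does, while the action changes only by a quantity of order ε.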

As a second example, consider the motion of a perfectly elastic rigid ball of mass 1 between perfectly elastic walls whose separation l slowly varies (Figure 229). We may consider that a point is moving in an “infinitely deep rectangular potential well,” and that the phase trajectories are rectangles of area 2vl, where v is the velocity of the ball. In this case the product vl of the velocity of the ball and the distance between the walls turns out to be an adiabatic invariant.⁹⁵ Thus if we make the walls twice as close together, the velocity of the ball doubles, and if we separate the walls, the velocity decreases.

Figure 229

Adiabatic invariant of an absolutely elastic ball between slowly changing walls
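The behavior of the product vl for the ball between the walls can be checked with an event-by-event simulation. The sketch below is an illustration, not part of the original text. It assumes one wall fixed at x = 0 and the other at x = l₀ + ut moving with a small constant velocity u < 0; at the moving wall the ball's velocity v is replaced by 2u − v (elastic reflection in the frame of the wall). The function name and numerical values are assumptions.

```python
def squeeze_walls(l0=1.0, v0=1.0, wall_speed=-1e-3, l_final=0.5):
    """Follow the ball until the wall separation has dropped to l_final.

    Returns the speed of the ball and the wall separation at the last
    moment the ball touches the fixed wall.
    """
    def wall(time):
        return l0 + wall_speed * time

    t = 0.0
    v = v0                      # ball starts at the fixed wall, moving toward the moving wall
    while wall(t) > l_final:
        # Flight to the moving wall: v * dt = wall(t) + wall_speed * dt.
        dt = wall(t) / (v - wall_speed)
        t += dt
        x = wall(t)
        v = 2.0 * wall_speed - v   # elastic reflection off the moving wall (v is now negative)
        # Flight back to the fixed wall at x = 0, followed by reflection there.
        t += x / (-v)
        v = -v
    return v, wall(t)

if __name__ == "__main__":
    v, l = squeeze_walls()
    print("final speed:     ", v)
    print("final separation:", l)
    print("product v*l:     ", v * l, "(initially 1.0)")
```

With the walls brought twice as close together, the speed should come out close to twice its initial value, and the product vl should stay very nearly equal to its initial value.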

⁸⁷

The singular level sets, where the integrals are not functionally independent, constitute the exception.

⁸⁸

It is not hard to see that I has the dimensions of action.

⁸⁹

k = (k₁,..., k_n) with integral k_i.

⁹⁰

Lagrange showed that the investigation of the average motion of the perihelion of a planet reduces to a similar problem. The solution of this problem can be found in the work of H. Weyl. The eccentricity of the earth’s orbit varies as the modulus of an analogous sum. Ice ages appear to be related to these changes in eccentricity.

⁹¹

Show that the number r does not depend on the choice of independent vectors k_i.

⁹²

For example, we can always write the substitution I′ = I, φ′ = φ + S_I(I), or I₁, I₂; φ₁, φ₂ → I₁ + I₂, I₂; φ₁, φ₂ − φ₁.

⁹³

For example, inertial motion on a manifold of negative curvature has this property.

⁹⁴

For any fixed value of the parameter φ.

⁹⁵

This does not formally follow from the theorem, since the theorem concerns smooth systems without shocks. The proof of the adiabatic invariance of vl in this system is an instructive elementary problem.

Appendix 1: Riemannian curvature

From a sheet of paper, one can form a cone or a cylinder, but it is impossible to obtain a piece of a sphere without folding, stretching, or cutting. The reason lies in the difference between the “intrinsic geometries” of these surfaces: no part of the sphere can be isometrically mapped onto the plane.

The invariant which distinguishes riemannian metrics is called riemannian curvature. The riemannian curvature of a plane is zero, and the curvature of a sphere of radius R is equal to R⁻². If one riemannian manifold can be isometrically mapped to another, then the riemannian curvature at corresponding points is the same. For example, since a cone or cylinder is locally isometric to the plane, the riemannian curvature of the cone or cylinder at any point is equal to zero. Therefore, no region of a cone or cylinder can be mapped isometrically to a sphere.

The riemannian curvature of a manifold has a very important influence on the behavior of geodesics on it, i.e., on motion in the corresponding dynamical system. If the riemannian curvature of a manifold is positive (as on a sphere or ellipsoid), then nearby geodesics oscillate about one another in most cases, and if the curvature is negative (as on the surface of a hyperboloid of one sheet), geodesics rapidly diverge from one another.

In this appendix we define riemannian curvature and briefly discuss the properties of geodesics on manifolds of negative curvature. A further treatment of riemannian curvature can be found in the book, “Morse Theory” by John Milnor, Princeton University Press, 1963, and a treatment of geodesics on manifolds of negative curvature in D. V. Anosov’s book, “Geodesic flows on closed riemannian manifolds with negative curvature,” Proceedings of the Steklov Institute of Mathematics, No. 90 (1967), Am. Math. Soc., 1969.

A Parallel translation on surfaces

The definition of riemannian curvature is based on the construction of parallel translation of vectors along curves on a riemannian manifold.

We begin with the case when the given riemannian manifold is two-dimensional, i.e., a surface, and the given curve is a geodesic on this surface. [See do Carmo, Manfredo Perdigao, “Differential Geometry of Curves and Surfaces,” Prentice-Hall, 1976. (Translator’s note)]

Parallel translation of a vector tangent to the surface along a geodesic on this surface is defined as follows: the point of origin of the vector moves along the geodesic, and the vector itself moves continuously so that its angle with the geodesic and its length remain constant. By translating to the endpoint of the geodesic all vectors tangent to the surface at the initial point, we obtain a map from the tangent plane at the initial point to the tangent plane at the endpoint. This map is linear and isometric.

We now define parallel translation of a vector on a surface along a broken line consisting of several geodesic arcs (Figure 230). In order to translate a vector along a broken line, we translate it from the first vertex to the second along the first geodesic arc, then translate this vector along the second arc to the next vertex, etc.

Figure 230

Parallel translation along a broken geodesic

Problem. Given a vector tangent to the sphere at one vertex of a spherical triangle with three right angles, translate this vector around the triangle and back to the same vertex.

Answer. As a result of this translation the tangent plane to the sphere at the initial vertex will be turned by a right angle.

Finally, parallel translation of a vector along any smooth curve on a surface is defined by a limiting procedure, in which the curve is approximated by broken lines consisting of geodesic arcs.

Problem. Translate a vector directed towards the North Pole and located at Leningrad (latitude λ = 60°) around the 60th parallel and back to Leningrad, moving to the east.

Answer. The vector turns through the angle 2π (1 − sin λ), i.e., approximately 50° to the west. Thus the size of the angle of rotation is proportional to the area bounded by our parallel, and the direction of rotation coincides with the direction the origin of the vector is going around the North Pole.

Hint. It is sufficient to translate the vector along the same circle on the cone formed by the tangent lines to the meridian, going through all the points of the parallel (Figure 231). This cone then can be unrolled onto the plane, after which parallel translation on its surface becomes ordinary parallel translation on the plane.

Figure 231
Parallel translation on the sphere
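
The answer to the Leningrad problem can be checked numerically. The following is a minimal sketch, not the method of the text: it assumes the unit sphere, writes the vector in the orthonormal frame (e_south, e_east) along the parallel, and assumes the transport equations da/dt = c b, db/dt = −c a with c = sin λ, which follow from the Christoffel symbols of spherical coordinates (not derived in this section). The function name and the step count are illustrative choices.

```python
import math

def transport_around_parallel(lat_deg, steps=2000):
    """Parallel-translate the north-pointing unit vector once eastward
    around the parallel of latitude lat_deg on the unit sphere.

    (a, b) are components in the orthonormal frame (e_south, e_east).
    Assumed transport equations: da/dt = c*b, db/dt = -c*a, c = sin(latitude),
    where t is the longitude.
    """
    c = math.sin(math.radians(lat_deg))
    a, b = -1.0, 0.0              # north = -e_south
    h = 2.0 * math.pi / steps     # longitude step

    def rhs(a, b):
        return c * b, -c * a

    for _ in range(steps):        # classical fourth-order Runge-Kutta
        k1 = rhs(a, b)
        k2 = rhs(a + 0.5 * h * k1[0], b + 0.5 * h * k1[1])
        k3 = rhs(a + 0.5 * h * k2[0], b + 0.5 * h * k2[1])
        k4 = rhs(a + h * k3[0], b + h * k3[1])
        a += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        b += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return a, b

a, b = transport_around_parallel(60.0)
# Angle measured from north (-e_south) towards west (-e_east), in degrees.
turn_deg = math.degrees(math.atan2(-b, -a))
predicted_deg = math.degrees(2.0 * math.pi * (1.0 - math.sin(math.radians(60.0))))
print(turn_deg, predicted_deg)
```

Both numbers come out near 48°, which the answer above rounds to 50°; the positive sign of `turn_deg` corresponds to a turn towards the west.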

Example. We consider the upper half-plane y > 0 of the plane of complex numbers z = x + iy with the metric

$$ds^2 = \frac{dx^2 + dy^2}{y^2}.$$

It is easy to compute that the geodesics of this two-dimensional riemannian manifold are circles and straight lines perpendicular to the x-axis. Linear fractional transformations with real coefficients

$$z \mapsto \frac{az + b}{cz + d}, \qquad ad - bc > 0,$$

are isometric transformations of our manifold, which is called the Lobachevsky plane.

Problem. Translate a vector directed along the imaginary axis at the point z = i to the point z = t + i along the horizontal line (dy = 0) (Figure 232).

Figure 232
Parallel translation on the Lobachevsky plane

Answer. Under translation by t the vector turns t radians in the direction from the y-axis towards the x-axis.
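
This answer can also be verified numerically. The sketch below is an illustration under stated assumptions: for the metric ds² = (dx² + dy²)/y², the transport equations along the line y = 1, x = t are taken to be dVx/dt = Vy, dVy/dt = −Vx (this follows from the Christoffel symbols Γ^x_xy = −1/y, Γ^y_xx = 1/y, which are not computed in the text). The function name is hypothetical.

```python
import math

def transport_along_horizontal(t_end, steps=2000):
    """Parallel-translate the vector (0, 1) from z = i to z = t_end + i
    along the line y = 1 of the Lobachevsky plane.

    Assumed transport equations on y = 1: dVx/dt = Vy, dVy/dt = -Vx.
    """
    vx, vy = 0.0, 1.0
    h = t_end / steps

    def rhs(vx, vy):
        return vy, -vx

    for _ in range(steps):        # classical fourth-order Runge-Kutta
        k1 = rhs(vx, vy)
        k2 = rhs(vx + 0.5 * h * k1[0], vy + 0.5 * h * k1[1])
        k3 = rhs(vx + 0.5 * h * k2[0], vy + 0.5 * h * k2[1])
        k4 = rhs(vx + h * k3[0], vy + h * k3[1])
        vx += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        vy += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return vx, vy

t = 1.0
vx, vy = transport_along_horizontal(t)
turn = math.atan2(vx, vy)      # angle from the y-axis towards the x-axis
length = math.hypot(vx, vy)    # on y = 1 the metric agrees with the euclidean one
print(turn, length)
```

For t = 1 the computed turn is 1 radian and the length stays equal to 1, as parallel translation requires.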

B The curvature form

We will now define the riemannian curvature at each point of a two-dimensional riemannian manifold (i.e., a surface). For this purpose, we choose an orientation of our surface in a neighborhood of the point under consideration and consider parallel translation of vectors along the boundary of a small region D on our surface. It is easy to calculate that the result of such a translation is rotation by a small angle. We denote this angle by φ(D) (the sign of the angle is fixed by the choice of orientation of the surface).

If we divide the region D into two parts D₁ and D₂, the result of parallel translation along the boundary of D can be obtained by first going around one part, and then the other. Thus,

$$\varphi(D_1 + D_2) = \varphi(D_1) + \varphi(D_2),$$

i.e., the angle φ is an additive function of regions. When we change the direction of travel along the boundary, the angle φ changes sign. It is natural therefore to represent φ(D) as the integral over D of a suitable 2-form. Such a 2-form in fact exists; it is called the curvature form, and we denote it by Ω. Thus we define the curvature form Ω by the relation

$$\varphi(D) = \int_D \Omega. \tag{1}$$

The value of Ω on a pair of tangent vectors ξ, η in TM_x can be defined in the following way. We identify a neighborhood of the point 0 in the tangent space to M at x with a neighborhood of the point x on M (using, for example, some local coordinate system). We can then construct on M the parallelogram Π_ε spanned by the vectors εξ, εη, at least for sufficiently small ε.

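Now the value of the curvature form on our vectors is defined by the formula

$$\Omega(\xi, \eta) = \lim_{\varepsilon \to 0} \frac{\varphi(\Pi_\varepsilon)}{\varepsilon^2}. \tag{2}$$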

In other words, the value of the curvature form on a pair of tangent vectors is equal to the angle of rotation under translation along the infinitely small parallelogram determined by these vectors.

Problem. Find the curvature forms on the plane, on a sphere of radius R, and on the Lobachevsky plane.

Answer. Ω = 0, Ω = R⁻² dS, Ω = −dS, where the 2-form dS is the area element on our oriented surface.

Problem. Show that the function defined by formula (2) is really a differential 2-form, independent of the arbitrary choice involved in the construction, and that the rotation of a vector under translation along the boundary of a finite oriented region D is expressed, in terms of this form, by formula (1).

Problem. Show that the integral of the curvature form over any convex surface in three-dimensional euclidean space is equal to 4π.
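
A numerical illustration of the last problem for one family of convex surfaces, the ellipsoids. This is only a sketch under assumptions not stated in the text: it takes for granted that the riemannian curvature of an embedded surface is its Gaussian curvature, and it uses the standard closed form K = (abc)²/Q² for the ellipsoid, with Q defined in the docstring. The function name and the grid size are illustrative.

```python
import math

def total_curvature_ellipsoid(a, b, c, n=400):
    """Midpoint-rule approximation of the integral of K dS over the
    ellipsoid x^2/a^2 + y^2/b^2 + z^2/c^2 = 1.

    Parametrization: x = a sin(th) cos(ph), y = b sin(th) sin(ph),
    z = c cos(th). With
        Q = (bc)^2 sin^2(th) cos^2(ph) + (ac)^2 sin^2(th) sin^2(ph)
            + (ab)^2 cos^2(th)
    one has K = (abc)^2 / Q^2 and dS = sin(th) sqrt(Q) d(th) d(ph).
    """
    d_th = math.pi / n
    d_ph = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * d_th
        s2 = math.sin(th) ** 2
        c2 = math.cos(th) ** 2
        for j in range(n):
            ph = (j + 0.5) * d_ph
            q = ((b * c) ** 2 * s2 * math.cos(ph) ** 2
                 + (a * c) ** 2 * s2 * math.sin(ph) ** 2
                 + (a * b) ** 2 * c2)
            # K dS = (abc)^2 sin(th) Q^(-3/2) d(th) d(ph)
            total += (a * b * c) ** 2 * math.sin(th) * q ** -1.5
    return total * d_th * d_ph

print(total_curvature_ellipsoid(1.0, 1.5, 2.0), 4.0 * math.pi)
```

The two printed values agree to within the quadrature error, independently of the choice of semi-axes.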

C The riemannian curvature of a surface

We note that every differential 2-form on a two-dimensional oriented riemannian manifold M can be written in the form ρdS, where dS is the oriented area element and ρ is a scalar function uniquely determined by the choice of metric and orientation.

In particular, the curvature form can be written in the form

$$\Omega = K\,dS,$$

where K : M → ℝ is a smooth function on M and dS is the area element.

The value of the function K at a point x is called the riemannian curvature of the surface at x.

Problem. Calculate the riemannian curvature of the euclidean plane, the sphere of radius R, and the Lobachevsky plane.

Answer. K = 0, K = R⁻², K = − 1.
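
These three values can be reproduced numerically from the metrics alone. The sketch below rests on an assumption not made in the text: it uses the classical formula for the curvature of a metric ds² = E du² + G dv² in orthogonal coordinates, K = −(1/(2√(EG))) [∂_u(G_u/√(EG)) + ∂_v(E_v/√(EG))], with derivatives replaced by central differences. The function name and the step h are illustrative.

```python
import math

def gauss_curvature(E, G, u, v, h=1e-3):
    """Curvature of ds^2 = E du^2 + G dv^2 (orthogonal coordinates) from
    K = -1/(2 sqrt(EG)) * [d/du (G_u / sqrt(EG)) + d/dv (E_v / sqrt(EG))],
    using nested central differences with step h.
    """
    def w(u, v):
        return math.sqrt(E(u, v) * G(u, v))

    def gu_over_w(u, v):
        return (G(u + h, v) - G(u - h, v)) / (2.0 * h) / w(u, v)

    def ev_over_w(u, v):
        return (E(u, v + h) - E(u, v - h)) / (2.0 * h) / w(u, v)

    d_u = (gu_over_w(u + h, v) - gu_over_w(u - h, v)) / (2.0 * h)
    d_v = (ev_over_w(u, v + h) - ev_over_w(u, v - h)) / (2.0 * h)
    return -(d_u + d_v) / (2.0 * w(u, v))

R = 3.0
# Euclidean plane: ds^2 = du^2 + dv^2.
k_plane = gauss_curvature(lambda u, v: 1.0, lambda u, v: 1.0, 0.2, 0.7)
# Sphere of radius R, u = latitude, v = longitude.
k_sphere = gauss_curvature(lambda u, v: R ** 2,
                           lambda u, v: (R * math.cos(u)) ** 2, 0.4, 1.0)
# Lobachevsky plane, u = x, v = y > 0.
k_lobachevsky = gauss_curvature(lambda u, v: 1.0 / v ** 2,
                                lambda u, v: 1.0 / v ** 2, 0.3, 2.0)
print(k_plane, k_sphere, k_lobachevsky)
```

Up to discretization error the output is 0, R⁻² and −1, in agreement with the answer above.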

Problem. Show that the riemannian curvature does not depend on the orientation of the manifold, but only on its metric.

Hint. The 2-forms Ω and dS both change sign under a change of orientation.

Problem. Show that, for surfaces in ordinary three-dimensional euclidean space, the riemannian curvature at every point is equal to the product of the inverses of the principal radii of curvature (with minus sign if the centers of curvature lie on opposite sides of the surface).

We note that the sign of a manifold’s curvature at a point does not depend on the orientation of the manifold; this sign may be defined without using the orientation at all.

Namely, on manifolds of positive curvature, a vector parallel translated around the boundary of a small region turns around its origin in the same direction as the point on the boundary goes around the region; on manifolds of negative curvature the direction of rotation is opposite.

We note further that the value of the curvature at a point is determined by the metric in a neighborhood of this point, and therefore is preserved under bending: the curvature is the same at corresponding points of isometric surfaces. Hence, riemannian curvature is also called intrinsic curvature.

The formulas for computing curvature in terms of components of the metric in some coordinate system involve the second derivatives of the metric and are rather complicated: cf. the problems in Section G below.

D Higher-dimensional parallel translation

The construction of parallel translation on riemannian manifolds of dimension greater than two is somewhat more complicated than the two-dimensional construction presented above. The reason is that in these dimensions the direction of the vector being translated is no longer determined by the condition that the angle with a geodesic be invariant. In fact, the vector could rotate around the direction of the geodesic while preserving its angle with the geodesic.

The refinement which we must introduce into the construction of parallel translation along a geodesic is the choice of a two-dimensional plane passing through the tangent to the geodesic, which must contain the translated vector. This choice is made in the following (unfortunately complicated) way.

At the initial point of a geodesic the needed plane is the plane spanned by the vector to be translated and the direction vector of the geodesic. We look at all geodesics proceeding from the initial point, in directions lying in this plane. The set of all such geodesics (close to the initial point) forms a smooth surface which contains the geodesic along which we intend to translate the vector (Figure 233).

Figure 233
Parallel translation in space

Consider a new point on the geodesic at a small distance Δ from the initial point. The tangent plane at the new point to the surface described above contains the direction of the geodesic at this new point. We take this new point as the initial point and use its tangent plane to construct a new surface (formed by the bundle of geodesics emanating from the new point). This surface contains the original geodesic. We move along the original geodesic again by Δ and repeat the construction from the beginning.

After a finite number of steps we can reach any point of the original geodesic. As a result of our work we have, at every point of the geodesic, a tangent plane containing the direction of the geodesic. This plane depends on the length Δ of the steps in our construction. As Δ → 0 the family of tangent planes obtained converges (as can be calculated) to a definite limit. As a result we have a field of two-dimensional tangent planes along our geodesic containing the direction of the geodesic and determined in an intrinsic manner by the metric on the manifold.

Now parallel translation of our vector along a geodesic is defined as in the two-dimensional case: under translation the vector must remain in the planes described above; its length and its angle with the direction of the geodesic must be preserved. Parallel translation along any curve is defined using approximations by geodesic polygons, as in the two-dimensional case.

Problem. Show that parallel translation of vectors from one point of a riemannian manifold to another along a fixed path is a linear isometric operator from the tangent space at the first point to the tangent space at the second point.

Problem. Parallel translate any vector along the line

$$x_1 = t, \quad x_2 = 0, \quad y = 1 \qquad (0 \le t \le \tau)$$

in a Lobachevsky space with metric

$$ds^2 = \frac{dx_1^2 + dx_2^2 + dy^2}{y^2}.$$

Answer. Vectors in the directions of the x₁ and y axes are rotated by angle τ in the plane spanned by them (rotation is in the direction from the y-axis towards the x₁-axis); vectors in the x₂-direction are carried parallel to themselves in the sense of the euclidean metric.

E The curvature tensor

We now consider, as in the two-dimensional case, parallel translation along small closed paths beginning and ending at a point of a riemannian manifold. Parallel translation along such a path returns vectors to the original tangent space. The map of the tangent space to itself thus obtained is a small rotation (an orthogonal transformation close to the identity).

In the two-dimensional case we characterized this rotation by one number, the angle of rotation φ. In higher dimensions a skew-symmetric operator plays the role of φ. Namely, any orthogonal operator A which is close to the identity can be written in a natural way in the form

$$A = e^{\Phi} = E + \Phi + \frac{\Phi^2}{2!} + \cdots,$$

where Φ is a small skew-symmetric operator.

Problem. Compute Φ if A is a rotation of the plane through a small angle φ.

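Answer.

$$A = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}, \qquad \Phi = \begin{pmatrix} 0 & -\varphi \\ \varphi & 0 \end{pmatrix}.$$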
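
The relation between a plane rotation and its skew-symmetric generator can be checked by summing the exponential series E + Φ + Φ²/2! + ⋯ directly. The following is a small sketch; the helper names `mat_mul` and `mat_exp` and the number of terms are illustrative choices.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(Phi, terms=25):
    """Partial sum E + Phi + Phi^2/2! + ... of the exponential series."""
    n = len(Phi)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]             # current term Phi^k / k!
    for k in range(1, terms):
        term = mat_mul(term, Phi)
        term = [[x / k for x in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

phi = 0.1
Phi = [[0.0, -phi], [phi, 0.0]]                   # skew-symmetric generator
A = mat_exp(Phi)
rotation = [[math.cos(phi), -math.sin(phi)],
            [math.sin(phi), math.cos(phi)]]
max_diff = max(abs(A[i][j] - rotation[i][j])
               for i in range(2) for j in range(2))
print(max_diff)
```

The printed difference is at the level of rounding error, so the exponential of Φ is the rotation through the angle φ.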

Unlike in the two-dimensional case, the function Φ is not generally additive (since the orthogonal group of n-space for n > 2 is not commutative). Nevertheless, we can construct a curvature form using Φ, describing the “infinitely small rotation caused by parallel translation around an infinitely small parallelogram” in the same way as in the two-dimensional case, i.e., using formula (2).

Thus, let ξ and η in TM_x be vectors tangent to the riemannian manifold M at the point x. Construct a small curvilinear parallelogram Π_ε on M (the sides of the parallelogram Π_ε are obtained from the vectors εξ and εη by a coordinate identification of a neighborhood of zero in TM_x with a neighborhood of x in M). We will look at parallel translation along the sides of the parallelogram Π_ε (we begin the circuit at ξ).

The result of translation will be an orthogonal transformation of TM_x, close to the identity. It differs from the identity transformation by a quantity of order ε² and has the form

$$A_\varepsilon(\xi, \eta) = E + \varepsilon^2\,\Omega + o(\varepsilon^2),$$

where Ω is a skew-symmetric operator depending on ξ and η. Therefore, we can define a function Ω of pairs of vectors ξ, η in the tangent space at x with values in the space of skew-symmetric operators on TM_x by the formula

$$\Omega(\xi, \eta) = \lim_{\varepsilon \to 0} \frac{A_\varepsilon(\xi, \eta) - E}{\varepsilon^2}.$$

Problem. Show that the function Ω is a differential 2-form (with values in the skew-symmetric operators on TM_x) and does not depend on the choice of coordinates we used to identify TM_x and M.

The form Ω is called the curvature tensor of the riemannian manifold. We could say that the curvature tensor describes the infinitesimal rotation in the tangent space obtained by parallel translation around an infinitely small parallelogram.

F Curvature in a two-dimensional direction

Consider a two-dimensional subspace L in the tangent space to a riemannian manifold at some point. We take geodesics emanating from this point in all the directions in L. These geodesics form a smooth surface close to our point. The surface constructed lies in the riemannian manifold and has an induced riemannian metric.

By the curvature of a riemannian manifold M in the direction of a 2-plane L in the tangent space to M at a point x, we mean the riemannian curvature at x of the surface described above.

Problem. Find the curvatures of a three-dimensional sphere of radius R and of Lobachevsky space in all possible two-dimensional directions.

Answer. R⁻², −1.

In general, the curvatures of a riemannian manifold in different two-dimensional directions are different. Their dependence on the direction is described by formula (3) below.

Theorem. The curvature of a riemannian manifold in the two-dimensional direction determined by a pair of orthogonal vectors ξ, η of length 1 can be expressed in terms of the curvature tensor Ω by the formula

$$K = \langle \Omega(\xi, \eta)\xi, \eta \rangle, \tag{3}$$

where the brackets denote the scalar product giving the riemannian metric.

The proof is obtained by comparing the definitions of the curvature tensor and of curvature in a two-dimensional direction. We will not go into it in a rigorous way. It is possible to take formula (3) for the definition of the curvature K.

G Covariant differentiation

Connected with parallel translation along curves in a riemannian manifold is a particular differential calculus—so-called covariant differentiation, or the riemannian connection. We define this differentiation in the following way.

Let ξ be a vector tangent to a riemannian manifold M at a point x, and v a vector field given on M in a neighborhood of x. The covariant derivative of the field v in the direction ξ is defined by using any curve passing through x with velocity ξ. After moving along this curve for a small interval of time t, we find ourselves at a new point x(t). We take the vector field v at this point x(t) and parallel translate it backwards along the curve to the original point x. We obtain a vector depending on t in the tangent space to M at x. For t = 0 this vector is v(x), and for other t it changes according to the non-parallelness of the vector field v along our curve in the direction ξ.

Consider the derivative of the resulting vector with respect to t, evaluated at t = 0. This derivative is a vector in the tangent space TM_x. It is called the covariant derivative of the field v along ξ and is denoted by ∇_ξv. It is easy to verify that the vector ∇_ξv does not depend on the choice of curve specified in the definition, but only on ξ and v.

Problem 1. Prove the following properties of covariant differentiation:

1. ∇_ξv is a bilinear function of ξ and v.

2. ∇_ξ fv = (L_ξ f)v + f(x)∇_ξ v, where f is a smooth function and L_ξ f is the derivative of f in the direction of the vector ξ in TM_x.

3. L_ξ〈v, w〉 = 〈∇_ξv, w(x)〉 + 〈v(x), ∇_ξw〉.

4. ∇_v(x) w − ∇_w(x) v = [w, v](x) (where L_[w, v] = L_v L_w − L_w L_v).

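Problem 2. Show that the curvature tensor can be expressed in terms of covariant differentiation in the following way:

$$\Omega(\xi_0, \eta_0)\zeta_0 = \bigl(-\nabla_\xi \nabla_\eta \zeta + \nabla_\eta \nabla_\xi \zeta + \nabla_{[\eta, \xi]} \zeta\bigr)(x),$$

where ξ, η, ζ are any vector fields whose values at the point under consideration are ξ₀, η₀, and ζ₀.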

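Problem 3. Show that the curvature tensor satisfies the following identities:

$$\Omega(\xi, \eta)\zeta + \Omega(\eta, \zeta)\xi + \Omega(\zeta, \xi)\eta = 0,$$

$$\langle \Omega(\xi, \eta)\alpha, \beta \rangle = \langle \Omega(\alpha, \beta)\xi, \eta \rangle.$$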

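Problem 4. Suppose that the riemannian metric is given in local coordinates x₁,..., x_n by the symmetric matrix g_ij:

$$ds^2 = \sum_{i,j} g_{ij}\,dx_i\,dx_j.$$

Denote by e₁,..., e_n the coordinate vector fields (so that differentiation in the direction e_i is ∂_i = ∂/∂x_i). Then covariant derivatives can be calculated using the formulas in Problem 1 and the following formulas:

$$\nabla_{e_i} e_j = \sum_k \Gamma^k_{ij}\,e_k, \qquad \Gamma^k_{ij} = \frac{1}{2} \sum_l \left( \frac{\partial g_{jl}}{\partial x_i} + \frac{\partial g_{il}}{\partial x_j} - \frac{\partial g_{ij}}{\partial x_l} \right) g^{lk},$$

where (g^lk) is the inverse matrix to (g_kl).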

By using the expression for the curvature tensor in terms of the connection in Problem 2, we also obtain an explicit formula for the curvature. The numbers R_ijkl = 〈Ω(e_i, e_j)e_k, e_l〉 are called the components of the curvature tensor.
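
As an illustration of the coordinate formulas, the connection coefficients of the Lobachevsky plane can be computed numerically. The sketch assumes the standard expression Γ^k_ij = ½ Σ_l g^lk (∂_i g_jl + ∂_j g_il − ∂_l g_ij) and replaces the partial derivatives by central differences; the function names and the sample point are hypothetical.

```python
def metric(p):
    """Matrix g_ij of the metric ds^2 = (dx^2 + dy^2) / y^2 at p = (x, y)."""
    x, y = p
    return [[1.0 / y ** 2, 0.0], [0.0, 1.0 / y ** 2]]

def inverse_2x2(g):
    """Inverse of a 2 x 2 matrix."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def d_metric(p, i, h=1e-5):
    """Central-difference approximation of the matrix (d g_jl / d x_i)."""
    plus, minus = list(p), list(p)
    plus[i] += h
    minus[i] -= h
    gp, gm = metric(plus), metric(minus)
    return [[(gp[j][l] - gm[j][l]) / (2.0 * h) for l in range(2)]
            for j in range(2)]

def christoffel(p):
    """gamma[k][i][j] = 1/2 sum_l g^{lk} (d_i g_jl + d_j g_il - d_l g_ij)."""
    n = 2
    ginv = inverse_2x2(metric(p))
    dg = [d_metric(p, i) for i in range(n)]       # dg[i][j][l] = d_i g_jl
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                gamma[k][i][j] = 0.5 * sum(
                    ginv[l][k] * (dg[i][j][l] + dg[j][i][l] - dg[l][i][j])
                    for l in range(n))
    return gamma

# Index 0 stands for x, index 1 for y; the sample point has y = 2.
gamma = christoffel((0.3, 2.0))
print(gamma[0][0][1], gamma[1][0][0], gamma[1][1][1])
```

At y = 2 the three printed coefficients are close to −1/y, 1/y and −1/y, i.e., −0.5, 0.5 and −0.5.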

H The Jacobi equation

The riemannian curvature of a manifold is closely connected with the behavior of its geodesics. In particular, let us consider a geodesic passing through some point in some direction, and alter slightly the initial conditions, i.e., the initial point and initial direction. The new initial conditions determine a new geodesic. At first this geodesic differs very little from the original geodesic. To investigate the divergence it is useful to linearize the differential equation of geodesics close to the original geodesic. The second-order linear differential equation thus obtained (“the variational equation” for the equation of geodesics) is called the Jacobi equation; it is convenient to write it in terms of covariant derivatives and curvature tensors.

We denote by x(t) a point moving along a geodesic in the manifold M with velocity (of constant magnitude) v(t) ∈ TM_x(t). If the initial condition depends smoothly on a parameter α, then the geodesic also depends smoothly on the parameter. Consider the motion corresponding to a value of α. We denote the position of a point at time t on the corresponding geodesic by x(t, α) ∈ M. We will assume that the initial geodesic corresponds to the zero value of the parameter, so that x(t, 0) = x(t).

The vector field of geodesic variation is the derivative of the function x(t, α) with respect to α, evaluated at α = 0; the value of this field at the point x(t) is equal to

$$\xi(t) = \left. \frac{d}{d\alpha} \right|_{\alpha = 0} x(t, \alpha) \in TM_{x(t)}.$$

To write the variational equation, we define the covariant derivative with respect to t of a vector field ζ(t) given on the geodesic x(t). To define this, we take the vector ζ(t + h), parallel translate it from the point x(t + h) to x(t) along the geodesic, differentiate the vector obtained in the tangent space TM_x(t) with respect to h and evaluate at h = 0. The result is a vector in TM_x(t), which is called the covariant derivative of the field ζ(t) with respect to t, and denoted by Dζ/Dt.

Theorem. The vector field of geodesic variation satisfies the second-order linear differential equation

$$\frac{D^2 \xi}{Dt^2} = -\Omega(v, \xi)v, \tag{4}$$

where Ω is the curvature tensor, and v = v(t) is the velocity vector of motion along the original geodesic.

Conversely, every solution of the differential equation (4) is a field of variation of the original geodesic.

Equation (4) is called the Jacobi equation.

Problem. Prove the theorem above.

Problem. Let M be a surface, y(t) the magnitude of the component of the vector ξ(t) in the direction normal to a given geodesic, and let the length of the vector v(t) be equal to 1. Show that y satisfies the differential equation

$$\ddot{y} = -Ky, \tag{5}$$

where K = K(t) is the riemannian curvature at the point x(t).

Problem. Using Equation (5), compare the behavior of geodesics close to a given one on the sphere (K = +R⁻²) and on the Lobachevsky plane (K = −1).
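
The comparison can be made concrete by integrating the equation ÿ = −Ky numerically for the two constant curvatures. This is a minimal sketch assuming constant K and the initial conditions y(0) = 0, ẏ(0) = 1 (two geodesics leaving one point in slightly different directions); the function name and the values R = 2, t = 3 are illustrative.

```python
import math

def solve_jacobi(K, t_end, steps=4000):
    """Integrate y'' = -K y with y(0) = 0, y'(0) = 1 by RK4; return y(t_end)."""
    y, p = 0.0, 1.0               # p stands for y'
    h = t_end / steps

    def rhs(y, p):
        return p, -K * y

    for _ in range(steps):
        k1 = rhs(y, p)
        k2 = rhs(y + 0.5 * h * k1[0], p + 0.5 * h * k1[1])
        k3 = rhs(y + 0.5 * h * k2[0], p + 0.5 * h * k2[1])
        k4 = rhs(y + h * k3[0], p + h * k3[1])
        y += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        p += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return y

R = 2.0
# Sphere, K = 1/R^2: the exact solution R sin(t/R) vanishes again at t = pi R.
y_sphere = solve_jacobi(1.0 / R ** 2, math.pi * R)
# Lobachevsky plane, K = -1: the exact solution is sinh(t).
y_lobachevsky = solve_jacobi(-1.0, 3.0)
print(y_sphere, y_lobachevsky, math.sinh(3.0))
```

On the sphere the nearby geodesics oscillate about the given one and meet it again after a path of length πR; on the Lobachevsky plane the separation grows like sinh t, i.e., exponentially.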

I Investigation of the Jacobi equation

In investigating the variational equations, it is useful to disregard the trivial variations, i.e., changes of the time origin and of the magnitude of the initial velocity of motion. To this end we decompose the variation vector ξ into components parallel and perpendicular to the velocity vector v. Then (since Ω(v, v) = 0 and since the operator Ω(v, ξ) is skew-symmetric) for the normal component we again get the Jacobi equation, and for the parallel component we get the equation

$$\frac{D^2 \xi_\parallel}{Dt^2} = 0,$$

where ξ_∥ denotes the component of ξ parallel to v.

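We now note that the Jacobi equation for the normal component can be written in the form of “Newton’s equation”

$$\frac{D^2 \xi}{Dt^2} = -\operatorname{grad} U,$$

where the quadratic form U of the vector ξ is expressed in terms of the curvature tensor and is proportional to the curvature K in the direction of the (ξ, v) plane:

$$U(\xi) = \frac{1}{2} \langle \Omega(v, \xi)v, \xi \rangle = \frac{K}{2} \langle \xi, \xi \rangle.$$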

Thus the behavior of the normal component of the variation vector of a geodesic with velocity 1 can be described by the equation of a (non-autonomous) linear oscillator whose potential energy is equal to the product of the curvature in the direction of the plane of velocity vectors and variations with the square of the length of the normal component of the variation.

In particular we consider the case when the curvature is negative in all two-dimensional directions containing the velocity vector of the geodesic (Figure 234). Then the divergence of nearby geodesics from the given one in the normal direction can be described by the equation of an oscillator with negative definite (and time-dependent) potential energy. Therefore, the normal component of divergence for nearby geodesics behaves like the divergence of a ball, located near the top of a hill, from the top. The equilibrium position of the ball at the top is unstable. This means that geodesics near the given geodesic will diverge exponentially from it.

Figure 234
Nearby geodesics on manifolds of positive and negative curvature

If the potential energy of the newtonian equation we obtained did not depend on time, our conclusion would be rigorous. Let us assume further that the curvature in the different directions containing v is in the interval

$$-a^2 \le K \le -b^2, \qquad 0 < b < a.$$

Then solutions to the Jacobi equation for normal divergence will be linear combinations of exponential curves with exponents ±λ_i, where the positive numbers λ_i are between b and a. Therefore, every solution to the Jacobi equation grows at least as fast as e^{b|t|} as either t → +∞ or t → −∞; most solutions grow even faster, with rate e^{a|t|}.

The instability of an equilibrium position under negative definite potential energy is intuitively obvious also in the non-autonomous case. It can be proven by comparison with a corresponding autonomous system. As a result of such a comparison we may convince ourselves that under motion along a geodesic, all solutions of the Jacobi equation for normal divergence on a manifold of negative curvature grow at least as fast as an exponential function of the distance traveled, whose exponent is equal to the square root of the absolute value of the curvature in the two-dimensional direction for which this absolute value is minimal. In fact, most solutions grow even faster, but we cannot now assert that the exponent of growth for most solutions is determined by the direction in which the absolute value of the negative curvature is largest.

In summary, we can say that the behavior of geodesics on a manifold of negative curvature is characterized by exponential instability. For numerical estimates of this instability, it is useful to define the characteristic path length s as the average path length on which small errors in the initial conditions are increased e times.

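More precisely, the characteristic path length s can be defined as the inverse of the exponent λ which characterizes the growth of the solution to the Jacobi equation for normal divergence from the geodesic proceeding with velocity 1:

$$\lambda = \varlimsup_{T \to \infty} \frac{1}{T} \max_{|t| < T} \, \max_{|\xi(0)| = 1} \ln |\xi(t)|, \qquad s = \frac{1}{\lambda}.$$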

In general, the exponent λ and the path s depend on the initial geodesic.

If the curvature of our manifold in all two-dimensional directions is bounded away from zero by the number −b² (that is, K ≤ −b²), then the characteristic path length is less than or equal to b⁻¹. Thus as the curvature of a manifold gets more negative, the characteristic path length s, over which the instability of geodesics produces an e-fold growth of error, gets smaller. In view of the exponential character of the growth of error, the course of a geodesic on a manifold of negative curvature is practically impossible to predict.

Assume, for example, that the curvature is negative and bounded away from zero by −4 m⁻². Then the characteristic path length is less than or equal to half a meter, so that on a geodesic arc five meters long the error grows by a factor of approximately e¹⁰ ∼ 10⁴. Therefore, an error of a tenth of a millimeter in the initial conditions shows up in the form of a one-meter difference at the end of the geodesic.
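
The arithmetic of this example can be written out as follows. This is a sketch with hypothetical names: b is the bound in K ≤ −b² (in inverse meters), and the returned growth factor is the lower estimate e^{L/s} with s replaced by its upper bound 1/b.

```python
import math

def instability_estimate(b, arc_length, initial_error):
    """For curvature K <= -b^2 return the bound 1/b on the characteristic
    path length, the lower estimate exp(arc_length * b) of the error growth
    along an arc of the given length, and the resulting final error."""
    s_max = 1.0 / b
    growth = math.exp(arc_length / s_max)
    return s_max, growth, initial_error * growth

# K <= -4 m^-2 gives b = 2 m^-1; arc of 5 m; initial error 0.1 mm = 1e-4 m.
s_max, growth, final_error = instability_estimate(2.0, 5.0, 1e-4)
print(s_max, growth, final_error)
```

The growth factor is e¹⁰ ≈ 2.2 × 10⁴, so the final error is about two meters, the same order of magnitude as the rounded figures quoted above.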

J Geodesic flows on compact manifolds of negative curvature

Let M be a compact riemannian manifold whose curvature at every point in every two-dimensional direction is negative. (Such manifolds exist.) Consider the inertial motion of a point of mass 1 on M, without any external forces. The lagrangian function of this system is equal to the kinetic energy, which is equal to the total energy and is a first integral of the equations of motion.

If M has dimension n, then each energy level manifold has dimension 2n − 1. This manifold is a submanifold of the tangent bundle of M. For example, we can fix the value of the energy at 1/2 (which corresponds to initial velocity 1). Then the velocity vector of the point has length constantly equal to 1, and our level manifold turns out to be the fiber bundle

$$T_1 M \subset TM$$

consisting of the unit spheres in the tangent spaces to M at every point.

Thus, a point of the manifold T₁M is represented as a vector of length 1 at a point of M. By the Maupertuis-Jacobi principle, we can describe the motion of a point mass with fixed initial conditions in the following way: the point moves with velocity 1 along the geodesic determined by the indicated vector.

By the law of conservation of energy the manifold T₁M is an invariant manifold in the phase space of our system. Therefore, our phase flow determines a one-parameter group of diffeomorphisms on the (2n − 1)-dimensional manifold T₁M. This group is called the geodesic flow on M. The geodesic flow can be described as follows: the transformation at time t carries the unit vector ξ ∈ T₁M located at the point x, to the unit velocity vector of the geodesic coming from x in the direction ξ, located at the point at distance t from x. We note that there is a naturally defined volume element on T₁M and that the geodesic flow preserves it (Liouville’s theorem).

Up to now we have not used the negative curvature of the manifold M. But if we investigate the trajectories of the geodesic flow, it turns out that the negative curvature of M has a strong impact on the behavior of these trajectories (this is related to the exponential instability of geodesics on M).

Here are some properties of geodesic flows on manifolds of negative curvature (for further details, see the book of D. V. Anosov cited earlier).

Almost all phase trajectories are dense in the energy level manifold (the exceptional non-dense trajectories form a set of measure zero).

Uniform distribution: the amount of time which almost every trajectory spends in any region of the phase space T₁M is proportional to the volume of the region.

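The phase flow g^t has the mixing property: if A and B are two regions, then

$$\lim_{t \to \infty} \operatorname{mes}\bigl[(g^t A) \cap B\bigr] = \operatorname{mes} A \cdot \operatorname{mes} B$$

(where mes denotes the volume, normalized by the condition that the whole space have measure 1).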

From these properties of trajectories in phase space follow analogous statements about geodesics on the manifold itself. Physicists call these properties “stochastic”: asymptotically for large t the trajectories behave as if the point were random. For example, the mixing property means that the probability of turning up in B at a time t long after exiting from A is proportional to the volume of B.

Thus, the exponential instability of geodesics on manifolds of negative curvature leads to the stochasticity of the corresponding geodesic flow.

K Other applications of exponential instability

The exponential instability property of geodesics on manifolds of negative curvature has been studied by many authors, beginning with Hadamard (and, in the case of constant curvature, also by Lobachevsky), but especially by E. Hopf. An unexpected discovery of the 1960s in this area was the surprising stability of exponentially unstable systems with respect to perturbations of the systems themselves.

Consider, for example, the vector field giving the geodesic flow on a compact surface of negative curvature. As we noted above, the phase curves of this flow are arranged in a complicated way: almost every one of them is dense in the three-dimensional energy level manifold. The flow has infinitely many closed trajectories, and the set of points on closed trajectories is also dense in the three-dimensional energy level manifold.

We now consider a nearby vector field. It turns out that, in spite of the complexity of the picture of phase curves, the entire picture with dense phase curves and infinitely many closed trajectories hardly changes at all if we pass to the nearby field. In fact, there is a homeomorphism close to the identity transformation which takes the phase curves of the unperturbed flow to the phase curves of the perturbed flow.

Thus our complicated phase flow has the same property of “structural stability” as a limit cycle, or a stable focus in the plane. We note that neither a center in the plane nor a winding of the torus has this property of structural stability: the topological type of the phase portrait in these cases changes for arbitrarily small changes in the vector field.

The existence of structurally stable systems with complicated motions, each of which is in itself exponentially unstable, is one of the basic discoveries of recent years in the theory of ordinary differential equations (the conjecture that geodesic flows on manifolds of negative curvature are structurally stable was made by S. Smale in 1961, and the proof was given by D. V. Anosov and published in 1967; the basic results on stochasticity of these flows were obtained by Ya. G. Sinai and D. V. Anosov, also in the 1960s).

Before these works most mathematicians believed that in systems of differential equations in “general form” only the simplest stable limiting behaviors were possible: equilibrium positions and cycles. If a system was more complicated (for example, if it was conservative), then it was assumed that after a small change in its equations (for example, after imposing small non-conservative perturbations) complicated motions are “dispersed” into simple ones. We now know that this is not so, and that in the function space of vector fields there are whole regions consisting of fields with more complicated behavior of phase curves.

The conclusions which follow from this are relevant to a wide range of phenomena, in which “stochastic” behavior of deterministic objects is observed.

Namely, suppose that in the phase space of some (non-conservative) system there is an attracting invariant manifold (or set) in which the phase curves have the property of exponential instability. We now know that systems with such a property are not exceptional: under small changes of the system this property must persist. What is seen by an experimenter observing motions of such a system?

The approach of phase curves to an attracting set will be interpreted as the establishment of some sort of limiting conditions. The further motion of a phase point near the attracting set will involve chaotic, unpredictable changes of “phase” of the limiting behavior, perceptible as “stochasticity” or “turbulence.”

Unfortunately, no convincing analysis from this point of view has yet been developed for physical examples of a turbulent character. A primary example is the hydrodynamic instability of a viscous fluid, described by the so-called Navier-Stokes equations. The phase space of this problem is infinite-dimensional (it is the space of vector fields with divergence 0 in the domain of fluid flow), but the infinite-dimensionality of the problem is apparently not a serious obstacle, since the viscosity extinguishes the high harmonics (small vortices) faster and faster as the harmonics are higher and higher. As a result, the phase curves from the infinite-dimensional space seem to approach some finite-dimensional manifold (or set), to which the limit regime also belongs.

For large viscosity, we have a stable attracting equilibrium position in the phase space (“stable stationary flow”). As the viscosity decreases it loses stability; for example, a stable limit cycle can appear in phase space (“periodic flow”) or a stable equilibrium position of a new type (“secondary stationary flow”).⁹⁶ As the viscosity decreases further, more and more harmonics come into play, and the limit regime can become ever higher in dimension.

For small viscosity, the approach to a limit regime with exponentially unstable trajectories seems very likely. Unfortunately, the corresponding calculations have not yet been carried out due to the limited capacity of existing computers. However, the following general conclusion can be drawn without any calculations: turbulent phenomena may appear even if solutions exist and are unique; exponential instability, which is encountered even in deterministic systems with a finite number of degrees of freedom, is sufficient.

As one more example of an application of exponential instability we mention the proof announced by Ya. G. Sinai of the “ergodic hypothesis” of Boltzmann for systems of rigid balls. The hypothesis is that the phase flow corresponding to the motion of identical absolutely elastic balls in a box with elastic walls is ergodic on connected energy level sets. (Ergodicity means that almost every phase curve spends an amount of time in every measurable piece of the level set proportional to the measure of that piece.)

Boltzmann’s hypothesis allows us to replace time averages by space averages, and was for a long time considered to be necessary to justify statistical mechanics. In reality, Boltzmann’s hypothesis (which concerns a limit as time approaches infinity) is not necessary for passing to the statistical limit (in which the number of particles approaches infinity). However, Boltzmann’s hypothesis inspired the entire analysis of the stochastic properties of dynamical systems (so-called ergodic theory), and its proof serves as a measure of the maturity of this theory.

The exponential instability of trajectories in Boltzmann’s problem arises as a result of collisions of the balls with one another, and can be explained in the following way. For simplicity, we will consider a system of only two particles in the plane, and will represent a square box with reflection off the walls by the planar torus {(x, y)mod 1}. Then we can consider one of the particles as stationary (using the conservation of momentum); the other particle can be considered as a point.

In this way we arrive at the model problem of motion of a point on a toral billiard table with a circular wall in the middle from which the point is reflected according to the law “the angle of incidence is equal to the angle of reflection” (Figure 235).

Figure 235 Torus-shaped billiard table with scattering by a circular wall

To investigate this system we look at an analogous billiard table bounded on the outside by a planar convex curve (e.g., the motion of a point inside an ellipse). Motion on such a billiard table can be considered as the limiting case of the geodesic flow on the surface of an ellipsoid. Passage to the limit consists of decreasing the smallest axis of the ellipsoid to zero. As a result, geodesics on the ellipsoid become billiard trajectories on the ellipse. We discover from this that the ellipse can reasonably be thought of as two-sided and that, under every reflection, the geodesic goes from one side of the ellipse to the other.

We now return to our toral billiard table. Motion on it can be looked at as the limiting case of the geodesic flow on a smooth surface. This surface is obtained from looking at the torus with a hole as a two-sided surface, giving it some thickness and slightly smoothing the sharp edge. As a result we have a surface with the topology of a pretzel (a sphere with two handles).

After blowing up the ellipse into the ellipsoid we obtain a surface of positive curvature; after blowing up the torus with a hole we get a surface of negative curvature (in both cases the curvature is concentrated close to the edge, but the blowing up can be done so that the sign of the curvature does not change). Thus motion in our toral billiard table can be looked at as the limiting case of motion along geodesics on a surface of negative curvature.

Now, to prove Boltzmann’s conjecture (in the simple case under consideration) it is sufficient to verify that the analysis of stochastic properties of geodesic flows on surfaces of negative curvature holds in the indicated limiting case.

A more detailed presentation of the proof turns out to be very complicated; it has been published only for the case of systems of two particles (Ya. G. Sinai, Dynamical systems with elastic reflections, Russian Mathematical Surveys, 25, no. 2 (1970), 137–189).

⁹⁶ A more detailed account of loss of stability is given in “Lectures on bifurcations and versal families,” Russian Math. Surveys 27, no. 5 (1972), 55–123.

Appendix 2: Geodesics of left-invariant metrics on Lie groups and the hydrodynamics of ideal fluids

Eulerian motion of a rigid body can be described as motion along geodesics in the group of rotations of three-dimensional euclidean space provided with a left-invariant riemannian metric. A significant part of Euler’s theory depends only upon this invariance, and therefore can be extended to other groups.

Among the examples involving such a generalized Euler theory are motion of a rigid body in a high-dimensional space and, especially interesting, the hydrodynamics of an ideal (incompressible and inviscid) fluid. In the latter case, the relevant group is the group of volume-preserving diffeomorphisms of the domain of fluid flow. In this example, the principle of least action implies that the motion of the fluid is described by the geodesics in the metric given by the kinetic energy. (If we wish, we can take this principle to be the mathematical definition of an ideal fluid.) It is easy to verify that this metric is (right) invariant.

Of course, extending results obtained for finite-dimensional Lie groups to the infinite-dimensional case should be done with care. For example, in three-dimensional hydrodynamics an existence and uniqueness theorem for solutions of the equations of motion has not yet been proved. Nevertheless, it is interesting to see what conclusions can be drawn by formally carrying over properties of geodesics on finite-dimensional Lie groups to the infinite-dimensional case. These conclusions take the character of a priori statements (identities, inequalities, etc.) which should be satisfied by all reasonable solutions. In some cases, the formal conclusions can then be rigorously justified directly, without infinite-dimensional analysis.

For example, the Euler equations of motion for a rigid body have as their analogue in hydrodynamics the Euler equations of motion of an ideal fluid. Euler’s theorem on the stability of rotations around the large and small axes of the inertia ellipsoid corresponds in hydrodynamics to a slight generalization of Rayleigh’s theorem on the stability of flows without inflection points of the velocity profile.

It is also easy to extract from Euler’s formulas an explicit expression for the riemannian curvature of a group with a one-sided invariant metric. Applying this to hydrodynamics we find the curvature of the group of diffeomorphisms preserving the volume element. It is interesting to note that in sufficiently nice two-dimensional directions, the curvature turns out to be finite and, in many cases, negative. Negative curvature implies exponential instability of geodesics (cf. Appendix 1). In the case under consideration, the geodesics are motions of an ideal fluid; therefore the calculation of the curvature of the group of diffeomorphisms gives us some information on the instability of ideal fluid flow. In fact, the curvature determines the characteristic path length on which differences between initial conditions grow by a factor of e. Negative curvature leads to practical indeterminacy of the flow: on a path only a few times longer than the characteristic path length, a deviation in initial conditions grows 100 times larger.
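The phrase “only a few times longer” can be made concrete with a one-line estimate. The sketch below assumes the idealized law that a deviation grows like e^(s/ℓ) along a path of length s, where ℓ is the characteristic path length; the function name is ours and purely illustrative.

```python
import math

def lengths_for_growth(factor: float) -> float:
    """Path length, in units of the characteristic length, over which a
    deviation growing like exp(s / l) is amplified by `factor`.
    Assumes pure exponential growth (an idealization)."""
    return math.log(factor)

# Amplification by 100 takes ln(100) characteristic lengths.
print(round(lengths_for_growth(100.0), 2))  # 4.61
```

Under this assumption a hundredfold growth of the deviation requires fewer than five characteristic path lengths.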

In this appendix, we will briefly set out the results of calculations related to geodesics on groups with one-sided (right- or left-) invariant metrics. Proofs and further details can be found in the following places:

V. Arnold, Sur la géométrie différentielle des groupes de Lie de dimension infinie et ses applications à l’hydrodynamique des fluides parfaits. Annales de l’Institut Fourier, XVI, no. 1 (1966), 319–361.

V. I. Arnold, An a priori estimate in the theory of hydrodynamic stability, Izv. Vyssh. Uchebn. Zaved. Matematika 1966, no. 5 (54), 3–5. (Russian)

V. I. Arnold, The Hamiltonian nature of the Euler equations in the dynamics of a rigid body and of an ideal fluid, Uspekhi Matematicheskikh Nauk, 24 (1969), no. 3 (147) 225–226. (Russian)

L. A. Dikii, A remark on Hamiltonian systems connected with the rotation group, Functional Analysis and Its Applications, 6:4 (1972) 326–327.

D. G. Ebin, J. Marsden, Groups of diffeomorphisms and the motion of an incompressible fluid, Annals of Math. 92, no. 1 (1970), 102–163.

O. A. Ladyzhenskaya, On the local solvability of non-stationary problems for incompressible ideal and viscous fluids and vanishing viscosity, Boundary problems in mathematical physics, v. 5 (Zapiski nauchnikh seminarov LOMI, v. 21), “Nauka,” 1971, 65–78. (Russian)

A. S. Mishchenko, Integrals of geodesic flows on Lie groups, Functional Analysis and Its Applications, 4, no. 3 (1970), 232–235.

A. M. Obukhov, On integral invariants in systems of hydrodynamic type, Doklady Acad. Nauk. 184, no. 2 (1969). (Russian)

L. D. Faddeev, Towards a stability theory of stationary planar-parallel flows of an ideal fluid, Boundary problems in mathematical physics, v. 5 (Zapiski nauchnikh seminarov LOMI, v. 21), “Nauka,” 1971, 164–172. (Russian)

A Notation: The adjoint and co-adjoint representations

Let G be a real Lie group and 𝔤 its Lie algebra, i.e., the tangent space to the group at the identity provided with the commutator bracket operation [ , ].

A Lie group acts on itself by left and right translation: every element g of the group G defines diffeomorphisms of the group onto itself:

L_g : G → G, L_g h = gh;  R_g : G → G, R_g h = hg.

The induced maps of the tangent spaces will be denoted by

L_g* : TG_h → TG_gh,  R_g* : TG_h → TG_hg

for every h in G.

The diffeomorphism R_g⁻¹L_g is an inner automorphism of the group. It leaves the group identity element fixed. Its derivative at the identity is a linear map from the algebra (i.e., the tangent space to the group at the identity) to itself. This map is denoted by

Ad_g : 𝔤 → 𝔤,  Ad_g = (R_g⁻¹L_g)_*e,

and is called the adjoint representation of the group. It is easy to verify that Ad_g is an algebra homomorphism, i.e., that

Ad_g[ξ, η] = [Ad_g ξ, Ad_g η],  ξ, η ∈ 𝔤.

It is also clear that Ad_gh = Ad_gAd_h.
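For the rotation group these two properties can be checked numerically. The sketch below assumes the standard identification of the Lie algebra of SO(3) with ℝ³, under which the commutator is the vector product and Ad_g acts on a vector simply as the rotation g; the helper names are illustrative.

```python
import math

def cross(a, b):
    """Vector product in R^3 (the commutator of so(3) under the assumed identification)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def matvec(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return ((1.0, 0.0, 0.0), (0.0, c, -s), (0.0, s, c))

def close(u, v, tol=1e-12):
    return all(abs(p - q) < tol for p, q in zip(u, v))

g, h = rot_z(0.7), rot_x(-1.1)
xi, eta = (1.0, 2.0, -0.5), (0.3, -1.0, 4.0)

# Ad_g [xi, eta] = [Ad_g xi, Ad_g eta]
homomorphism_ok = close(matvec(g, cross(xi, eta)),
                        cross(matvec(g, xi), matvec(g, eta)))
# Ad_gh = Ad_g Ad_h
composition_ok = close(matvec(matmul(g, h), xi), matvec(g, matvec(h, xi)))
print(homomorphism_ok, composition_ok)
```

Both checks amount to the familiar facts that a rotation preserves the vector product and that rotations compose by matrix multiplication.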

We can consider Ad as a map of the group into the space of linear operators on the algebra:

Ad(g) = Ad_g.

The map Ad is differentiable. Its derivative at the identity of the group is a linear map from the algebra 𝔤 to the space of linear operators on 𝔤. This map is denoted by ad, and its image on an element ξ in the algebra by ad_ξ. Thus ad_ξ is an endomorphism of the algebra space, and we have

ad = Ad_*e : 𝔤 → End 𝔤,  ad_ξ = (d/dt)|_{t=0} Ad_{e^tξ},

where e^tξ is the one-parameter group with tangent vector ξ. From the formula written above it is easy to deduce an expression for ad in terms of the algebra alone:

ad_ξ η = [ξ, η].

We now consider the dual vector space 𝔤* to the Lie algebra 𝔤. This is the space of real linear functionals on the Lie algebra. In other words, 𝔤* is the cotangent space to the group at the identity, 𝔤* = T*G_e. The value of an element ξ of the cotangent space to the group at some point g on an element η of the tangent space at the same point will be denoted by round brackets:

(ξ, η) ∈ ℝ,  ξ ∈ T*G_g,  η ∈ TG_g.

Left and right translation induce operators on the cotangent space dual to L_g* and R_g*. We denote them by

L*_g : T*G_gh → T*G_h,  R*_g : T*G_hg → T*G_h

for every h in G. These operators are defined by the identities

(L*_g ξ, η) ≡ (ξ, L_g* η),  (R*_g ξ, η) ≡ (ξ, R_g* η).

The transpose operators Ad*_g : 𝔤* → 𝔤*, where g runs through the Lie group G, form a representation of this group, i.e., they satisfy the relations

Ad*_gh = Ad*_h Ad*_g.

This representation is called the co-adjoint representation of the group and plays an important role in all questions related to (left) invariant metrics on the group.

Consider the derivative of the operator Ad*_g with respect to g at the identity. This derivative is a linear map from the algebra to the space of linear operators on the dual space to the algebra. This linear map is denoted by ad*, and its image on an element ξ in the algebra is denoted by ad*_ξ. Thus ad*_ξ is a linear operator on the dual space to the algebra,

ad*_ξ : 𝔤* → 𝔤*.

It is easy to see that ad*_ξ is the adjoint of ad_ξ:

(ad*_ξ η, ζ) ≡ (η, ad_ξ ζ)  for all η ∈ 𝔤*, ζ ∈ 𝔤.

It is sometimes convenient to denote the action of ad* by braces:

ad*_ξ η = {ξ, η},  where ξ ∈ 𝔤, η ∈ 𝔤*.

Thus braces mean the bilinear function from 𝔤 × 𝔤* to 𝔤*, related to commutation in the algebra by the identity

({ξ, η}, ζ) ≡ (η, [ξ, ζ]).

We consider now the orbits of the co-adjoint representation of the group in the dual space of the algebra. At each point of an orbit we have a natural symplectic structure (called the Kirillov form since A. A. Kirillov first used it to investigate representations of nilpotent Lie groups). Thus, the orbits of the co-adjoint representation are always even-dimensional. We also note that we obtain a series of examples of symplectic manifolds by looking at different Lie groups and all possible orbits.

The symplectic structure on the orbits of the co-adjoint representation is defined by the following construction. Let x be a point in the dual space to the algebra and ξ a vector tangent at this point to its orbit. Since 𝔤* is a vector space, we can consider the vector ξ, which really belongs to the tangent space to 𝔤* at x, as lying in 𝔤*.

The vector ξ can be represented (in many ways) as the velocity vector of the motion of the point x under the co-adjoint action of the one-parameter group e^at with velocity vector a ∈ 𝔤. In other words, every vector tangent to the orbit of x in the co-adjoint representation of the group can be expressed in terms of a suitable vector a in the algebra by the formula

ξ = {a, x},  a ∈ 𝔤,  x ∈ 𝔤*.

Now we are ready to define the value of the symplectic 2-form Ω on a pair of vectors ξ₁, ξ₂ tangent to the orbit of x. Namely, we express ξ₁ and ξ₂ in terms of algebra elements a₁ and a₂ by the formula above, and then obtain the scalar

Ω(ξ₁, ξ₂) = (x, [a₁, a₂]),  x ∈ 𝔤*,  a_i ∈ 𝔤.

It is easy to verify that (1) the bilinear form Ω is well defined, i.e., its value does not depend on the choice of a_i; (2) Ω is skew-symmetric and therefore gives a differential 2-form Ω on the orbit; and (3) Ω is nondegenerate and closed (the proofs can be found, for instance, in Appendix 5). Thus the form Ω is a symplectic structure on an orbit of the co-adjoint representation.
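Properties (1) and (2) can be illustrated for the rotation group. The sketch below assumes the Lie algebra of SO(3) and its dual are both identified with ℝ³ (commutator = vector product, pairing = scalar product), so that the identity ({a, x}, ζ) = (x, [a, ζ]) gives {a, x} = x × a and the form reads Ω(ξ₁, ξ₂) = x · (a₁ × a₂). The function names are ours.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def tangent(a, x):
    """xi = {a, x} for so(3) under the assumed identification: {a, x} = x x a."""
    return cross(x, a)

def kirillov(x, a1, a2):
    """Omega(xi1, xi2) = (x, [a1, a2]) with [ , ] the vector product."""
    return dot(x, cross(a1, a2))

x = (0.0, 0.0, 2.0)                      # a point on the orbit |x| = 2
a1, a2 = (1.0, 0.5, 0.3), (-0.2, 1.0, 0.7)

# A different algebra element producing the same tangent vector xi1:
a1_alt = tuple(p + 5.0 * q for p, q in zip(a1, x))

same_tangent = all(abs(p - q) < 1e-12
                   for p, q in zip(tangent(a1, x), tangent(a1_alt, x)))
well_defined = abs(kirillov(x, a1, a2) - kirillov(x, a1_alt, a2)) < 1e-12
skew = abs(kirillov(x, a1, a2) + kirillov(x, a2, a1)) < 1e-12
print(same_tangent, well_defined, skew)
```

Adding a multiple of x to a₁ changes neither the tangent vector nor the value of Ω, which is the independence of the choice of a_i in this particular case.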

B Left-invariant metrics

A riemannian metric on a Lie group G is called left-invariant if it is preserved by all left translations L_g, i.e., if the derivative of left translation carries every vector to a vector of the same length.

It is sufficient to give a left-invariant metric at one point of the group, for instance the identity; then the metric can be carried to the remaining points by left translations. Thus there are as many left-invariant riemannian metrics on a group as there are euclidean structures on the algebra.

A euclidean structure on the algebra is defined by a symmetric positive-definite operator from the algebra to its dual space. Thus, let A : 𝔤 → 𝔤* be a symmetric positive linear operator:

(Aξ, η) = (Aη, ξ)  for all ξ, η in 𝔤.

(It is not very important that A be positive, but in mechanical applications the quadratic form (Aξ, ξ) is positive-definite.)

We define a symmetric operator A_g : TG_g → T*G_g by left translation:

A_g ξ = L*_{g⁻¹} A L_{g⁻¹*} ξ.

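We thus obtain a commutative diagram of linear operators: carrying a vector from TG_g to the algebra by left translation, applying A, and carrying the result from the dual space of the algebra back to T*G_g by the dual left translation gives the same result as applying A_g directly.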

We will denote by angled brackets the scalar product determined by the operator A_g:

〈ξ, η〉_g = (A_g ξ, η) = (A_g η, ξ) = 〈η, ξ〉_g.

This scalar product gives a riemannian metric on the group G, invariant under left translations. The scalar product in the algebra will be denoted simply by 〈 , 〉. We define an operation B : 𝔤 × 𝔤 → 𝔤 by the identity

〈[a, b], c〉 ≡ 〈B(c, a), b〉  for all b in 𝔤.

Clearly, this operation B is bilinear, and for fixed first argument is skew-symmetric in the second:

〈B(c, a), b〉 + 〈B(c, b), a〉 = 0.
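As an illustration, the operation B can be written out for an ordinary rigid body. The sketch below assumes the defining identity 〈[a, b], c〉 ≡ 〈B(c, a), b〉, the identification of the algebra with ℝ³ (commutator = vector product), and a scalar product 〈a, b〉 = a · Ib with a diagonal inertia operator I = diag(I₁, I₂, I₃). Under these assumptions a short computation with the scalar triple product gives B(c, a) = I⁻¹(Ic × a); the numerical values are arbitrary.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def inner(a, b, I):
    """Scalar product <a, b> = a . (I b) for a diagonal inertia operator I."""
    return sum(a[i] * I[i] * b[i] for i in range(3))

def B(c, a, I):
    """B(c, a) = I^{-1}(I c x a), derived from <[a, b], c> = <B(c, a), b>."""
    Ic = tuple(I[i] * c[i] for i in range(3))
    w = cross(Ic, a)
    return tuple(w[i] / I[i] for i in range(3))

I = (1.0, 2.0, 3.0)                       # hypothetical principal moments
a, b, c = (1.0, -2.0, 0.5), (0.3, 0.7, -1.0), (2.0, 0.1, 1.5)

defining_gap = inner(cross(a, b), c, I) - inner(B(c, a, I), b, I)
skew_gap = inner(B(c, a, I), b, I) + inner(B(c, b, I), a, I)
print(abs(defining_gap) < 1e-12, abs(skew_gap) < 1e-12)
```

With this B, the quadratic expression B(ω, ω) = I⁻¹(Iω × ω) is the right-hand side of the classical Euler equations for the angular velocity in the body.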

C Example

Let G = SO(3) be the group of rotations of three-dimensional euclidean space, i.e. the configuration space of a rigid body fixed at a point. A motion of the body is then described by a curve g = g(t) in the group. The Lie algebra of G is the three-dimensional space of angular velocities of all possible rotations. The commutator in this algebra is the usual vector product.

A rotation velocity ġ of the body is a tangent vector to the group at the point g. To get the angular velocity, we must carry this vector to the tangent space of the group at the identity, i.e. to the algebra. But this can be done in two ways: by left and right translation. As a result, we obtain two different vectors in the algebra:

ω_c = L_{g⁻¹*} ġ ∈ 𝔤  and  ω_s = R_{g⁻¹*} ġ ∈ 𝔤.

These two vectors are none other than the “angular velocity in the body” and the “angular velocity in space.”

An element g of the group G corresponds to a position of the body obtained by the motion g from some initial state (corresponding to the identity element of the group and chosen arbitrarily). Let ω be an element of the algebra.

Let e^ωt be a one-parameter group of rotations with angular velocity ω; ω is the tangent vector to this one-parameter group at the identity. Now we look at the displacement

e^ωτ g,

obtained from the displacement g by a rotation with angular velocity ω after a small time τ. If the vector ġ coincides with the vector

(d/dτ)|_{τ=0} e^ωτ g,

then ω is called the angular velocity relative to space and is denoted by ω_s. Thus ω_s is obtained from ġ by right translation. In an analogous way we can show that the angular velocity in the body is the left translate of the vector ġ in the algebra.

The dual space 𝔤* to the algebra in our example is the space of angular momenta.

The kinetic energy of a body is determined by the vector of angular velocity in the body and does not depend on the position of the body in space. Therefore, kinetic energy gives a left-invariant riemannian metric on the group. The symmetric positive-definite operator A_g : TG_g → T*G_g given by this metric is called the moment of inertia operator (or tensor). It is related to the kinetic energy by the formula T = ½(A_gġ, ġ), where A : 𝔤 → 𝔤* is the value of A_g for g = e. The image of the vector ġ under the action of the moment of inertia operator A_g is called the angular momentum and is denoted by M = A_gġ. The vector M lies in the cotangent space to the group at the point g, and it can be carried to the cotangent space to the group at the identity by both left and right translations. We obtain two vectors

M_c = L*_g M ∈ 𝔤*  and  M_s = R*_g M ∈ 𝔤*.

These vectors in the dual space to the algebra are none other than the angular momentum relative to the body (M_c) and the angular momentum relative to space (M_s). This follows easily from the expression for kinetic energy in terms of momentum and angular velocity:

T = ½(M_c, ω_c) = ½(M, ġ).

By the principle of least action, the motion of a rigid body under inertia (with no external forces) is a geodesic in the group of rotations with the left-invariant metric described above.

We will now look at a geodesic of an arbitrary left-invariant riemannian metric on an arbitrary Lie group as a motion of a “generalized rigid body” with configuration space G. Such a “rigid body with group G” is determined by its kinetic energy, i.e., a positive-definite quadratic form on the Lie algebra. More precisely, we will consider geodesics of a left-invariant metric on a group G given by a quadratic form 〈ω, ω〉 on the algebra as motions of a rigid body with group G and kinetic energy 〈ω, ω〉/2.

To every motion t → g(t) of our generalized rigid body we can associate four curves:

t → ω_c(t) ∈ 𝔤,  t → ω_s(t) ∈ 𝔤,  t → M_c(t) ∈ 𝔤*,  t → M_s(t) ∈ 𝔤*,

called motions of the vectors of angular velocity and momentum in the body and in space. The differential equations which these curves satisfy were found by Euler for an ordinary rigid body. However, they are true in the most general case of an arbitrary group G, and we will call them the Euler equations for a generalized rigid body.

Remark. In the ordinary theory of a rigid body six different three-dimensional spaces ℝ³, ℝ³*, 𝔤, 𝔤*, TG_g, and T*G_g are identified. The fact that the dimensions of the space ℝ³ in which the body moves and of the Lie algebra 𝔤 of its group of motions are the same is an accident related to the dimension 3; in the n-dimensional case, 𝔤 has dimension n(n − 1)/2.

The identification of the Lie algebra 𝔤 with its dual space 𝔤* has a more profound basis. The fact is that on the group of rotations there exists (and is unique up to multiplication by a constant) a two-sided invariant riemannian metric. This metric gives once and for all a preferred isomorphism of the vector spaces 𝔤 and 𝔤* (and also of TG_g and T*G_g). It allows us therefore to consider the vectors of angular velocity and momentum as lying in the same euclidean space. With this identification, the operation { , } is simply the commutator of the algebra, taken with a minus sign.

A two-sided invariant metric exists on any compact Lie group. Therefore, to study motions of rigid bodies with compact groups we may identify the spaces of angular velocities and momenta. However, we cannot make this identification for applications to non-compact (or infinite-dimensional) groups of diffeomorphisms.

D Euler’s equation

The results of Euler (obtained by him in the particular case G = SO(3)) can be formulated as the following theorems on the motion of the vectors of angular velocity and momentum of a generalized rigid body with group G.

Theorem 1. The vector of angular momentum relative to space is preserved under motion:

dM_s/dt = 0.

Theorem 2. The vector of angular momentum relative to the body satisfies Euler’s equation

dM_c/dt = {ω_c, M_c}.

These theorems are proved for a generalized rigid body in the same way as for an ordinary rigid body.

Remark 1. The vector of angular velocity in the body, ω_c, can be expressed linearly in terms of the vector of angular momentum in the body, M_c, by using the inverse of the inertia operator: ω_c = A⁻¹M_c. Therefore, Euler’s equation can be considered as an equation for the vector of angular momentum in the body alone; its right-hand side is quadratic in M_c.

We can also express this result in the following way. Consider the phase flow of our rigid body. (Its phase space T*G has dimension twice the dimension n of the group G or the space of angular momenta 𝔤*.) Then this phase flow in a 2n-dimensional manifold factors over the flow given by Euler’s equation in the n-dimensional vector space 𝔤*.

A factorization of a phase flow g^t on a manifold X over a phase flow f^t on a manifold Y is a smooth mapping π of X onto Y under which motions g^t are mapped to motions f^t, so that πg^t = f^tπ (i.e., the square formed by g^t : X → X, f^t : Y → Y, and the two copies of π : X → Y commutes).

In our case, X = T*G is the phase space of the body, and Y = 𝔤* is the space of angular momenta. The projection π : T*G → 𝔤* is defined by left translation (πM = L*_g M for M in T*G_g), g^t is the phase flow of the body under consideration on the 2n-dimensional space T*G, and f^t is the phase flow of the Euler equation in the n-dimensional space of angular momenta 𝔤*.

In other words, a motion of the vector of angular momentum relative to the body depends only on the initial position of the vector of angular momentum relative to the body and does not depend on the position of the body in the space.
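For an ordinary rigid body this closed equation for M_c can be integrated directly. The sketch below assumes the identification of the algebra and its dual with ℝ³, under which {ω, M} = M × ω, and a diagonal inertia operator so that ω = A⁻¹M is computed componentwise. The moments of inertia, initial momentum, and step size are arbitrary illustrative choices, and the classical fourth-order Runge-Kutta scheme is our choice of integrator.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def euler_rhs(M, I):
    """Right-hand side of dM/dt = {omega, M} = M x omega, omega = A^{-1} M."""
    omega = tuple(M[i] / I[i] for i in range(3))
    return cross(M, omega)

def rk4_step(M, I, h):
    """One classical Runge-Kutta step for Euler's equation."""
    k1 = euler_rhs(M, I)
    k2 = euler_rhs(tuple(M[i] + 0.5 * h * k1[i] for i in range(3)), I)
    k3 = euler_rhs(tuple(M[i] + 0.5 * h * k2[i] for i in range(3)), I)
    k4 = euler_rhs(tuple(M[i] + h * k3[i] for i in range(3)), I)
    return tuple(M[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
                 for i in range(3))

def length_squared(M):
    return sum(m * m for m in M)

def energy(M, I):
    return 0.5 * sum(M[i] ** 2 / I[i] for i in range(3))

I = (1.0, 2.0, 3.0)          # hypothetical principal moments of inertia
M0 = (1.0, 0.5, 0.2)         # initial angular momentum in the body
M = M0
for _ in range(2000):        # integrate up to t = 10
    M = rk4_step(M, I, 0.005)

length_drift = abs(length_squared(M) - length_squared(M0))
energy_drift = abs(energy(M, I) - energy(M0, I))
print(length_drift < 1e-6, energy_drift < 1e-6)
```

Only the vector M enters the computation; the position g of the body never appears. Up to integration error, the trajectory keeps both its length and its kinetic energy.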

Remark 2. The law of conservation of the vector of angular momentum relative to space can be expressed by saying that every component of this vector in some coordinate system on the space 𝔤* is conserved. We thus obtain a set of first integrals of the equations of motion of the rigid body. In particular, to every element of the Lie algebra 𝔤 there corresponds a linear function on the space 𝔤* and, therefore, a first integral. The Poisson brackets of first integrals given by functions on 𝔤* are themselves functions on 𝔤*, as can be seen easily. We thus obtain an (infinite-dimensional) extension of the Lie algebra 𝔤, consisting of all functions on 𝔤*. The algebra 𝔤 itself is included in this extension as the Lie algebra of linear functions on 𝔤*. Of course, of all these first integrals of the phase flow in a 2n-dimensional space only n are functionally independent. As the n independent integrals we can take, for example, n linear functions on 𝔤* which form a basis in 𝔤.

Because of possible infinite-dimensional applications, we would like to avoid coordinates and formulate statements about first integrals intrinsically. This can be done by reformulating Theorem 1 in the following way.

Theorem 3. The orbits of the co-adjoint representation of a group in the dual space to the algebra are invariant manifolds for the flow in this space given by Euler’s equation.

Proof. M_c(t) is obtained from M_s(t) by the action of the co-adjoint representation, and M_s(t) remains fixed. ☐

Example. In the case of an ordinary rigid body, the orbits of the co-adjoint representation of the group in the space of momenta are the spheres M₁² + M₂² + M₃² = const. In this case Theorem 3 is reduced to the law of conservation of the length of the angular momentum. It consists of the fact that, if the initial point M_c lies on some orbit (i.e., in the given case on the sphere M² = const), then all the points of its trajectory under the action of Euler’s equation lie on the same orbit.

We now return to the general case of an arbitrary group G and recall that each orbit of the co-adjoint representation has a symplectic structure (cf. subsection A). Furthermore, the kinetic energy of the body can be expressed in terms of the angular momentum relative to the body. As a result we obtain a quadratic form on the space of angular momenta

T = ½(M_c, A⁻¹M_c).

Let us fix some orbit V of the co-adjoint representation. We consider the kinetic energy as a function on this orbit:

H(M) = ½(M, A⁻¹M),  M ∈ V.

Theorem 4. On every orbit V of the co-adjoint representation, Euler’s equation is hamiltonian with hamiltonian function H.

Proof. Every vector ξ tangent to V at a point M has the form ξ = {f, M}, where f ∈ 𝔤. In particular, the vector field on the right side of Euler’s equation can be written in the form X = {dT, M} (here the differential of the function T at a point M of the vector space 𝔤* is considered as a vector of the dual space to 𝔤*, i.e., as an element of the Lie algebra 𝔤). It follows from the definitions of the symplectic structure Ω and the operation { , } (cf. subsection A) that for every vector ξ tangent to V at M,

Ω(ξ, X) = (M, [f, dT]) = ({f, M}, dT) = (ξ, dT). ☐

Euler’s equation can be carried over from the dual space of the algebra to the algebra itself by inversion of the moment of inertia operator. As a result we obtain the following formulation of Euler’s equation in terms of the operation B (section B).

Theorem 5. The motion of the vector of angular velocity in the body is determined by the initial position of this vector and does not depend on the initial position of the body. The vector of angular velocity in the body satisfies an equation with quadratic right-hand side:

dω_c/dt = B(ω_c, ω_c).

We will call this equation Euler’s equation for angular velocity. We notice that, under the action of the operator A⁻¹ : 𝔤* → 𝔤, the orbits of the co-adjoint representation are carried to invariant manifolds of Euler’s equation for angular velocity; these manifolds have symplectic structure, etc. However, unlike orbits in 𝔤*, these invariant manifolds are not determined by the Lie group G itself, but depend also on the choice of rigid body (i.e., moment of inertia operator).

From the law of conservation of energy we have

Theorem 6. Euler’s equations (for momentum and angular velocity) have a quadratic first integral, whose value is equal to the kinetic energy

T = ½(M_c, A⁻¹M_c) = ½〈ω_c, ω_c〉.

E Stationary rotations and their stability

A stationary rotation of a rigid body is a rotation for which the angular velocity in the body is constant (and thus also the angular velocity in space; it is easy to see that one implies the other). We know from the theory of an ordinary rigid body in ℝ³ that stationary rotations are rotations around the principal axes of the moment of inertia ellipsoid. Below, we formulate a generalization of this theorem to the case of a rigid body with any Lie group. We note that stationary rotations are geodesics of left-invariant metrics which are one-parameter subgroups. We note also that the directions of the principal axes of the inertia ellipsoid can be determined by looking at the stationary points of the kinetic energy on the sphere of vectors of momentum of fixed length.

Theorem 7. The angular momentum (respectively, angular velocity) of a stationary rotation with respect to the body is a critical point of the energy on the orbit of the co-adjoint representation (respectively on the image of the orbit under the action of the operator A⁻¹). Conversely, every critical point of the energy on an orbit determines a stationary rotation.

The proof is a straightforward computation or application of Theorem 4.

We note that the partition of the space of momenta into orbits of the co-adjoint representation cannot be so easily constructed in the case of an arbitrary group as it was in the simple case of an ordinary rigid body; in that case it was the partition of three-dimensional space into spheres with center 0 and the point 0 itself. In the general case, the orbits can have different dimensions, and the partition into orbits at some points may not be a fibering; such a singularity already appeared in the three-dimensional case at the point 0.

We call a point M of the space of angular momenta a regular point if the partition of a neighborhood of M into orbits is diffeomorphic to a partition of euclidean space into parallel planes (in particular, all orbits near the point M have the same dimension). For example, for the group of rotations of three-dimensional space all points of the space of angular momenta are regular except the origin.

Theorem 8. Suppose that a regular point M of the space of angular momenta is a critical point of the energy on an orbit of the co-adjoint representation, and that the second differential of the energy d²H at this point is a (positive- or negative-)definite form. Then M is a (Liapunov) stable equilibrium position of Euler’s equations.

Proof. It follows from the regularity of the orbits near this point that on every neighboring orbit there exists near M a point which is a conditional maximum or minimum of energy. ☐

Theorem 9. The second differential of the kinetic energy, restricted to the image of an orbit of the co-adjoint representation in the algebra, is given at a critical point ω ∈ 𝔤 by the formula

2d²H|_ω(ξ) = 〈B(ω, f), B(ω, f)〉 + 〈[f, ω], B(ω, f)〉,

where ξ is a tangent vector to this image, expressed in terms of f by the formula

ξ = B(ω, f),  f ∈ 𝔤.
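For an ordinary rigid body this criterion reproduces Euler’s theorem on the stability of rotations about the large and small axes. The sketch below rests on several assumptions of ours: the algebra is identified with ℝ³ (commutator = vector product), 〈a, b〉 = a · Ib with diagonal I, B(c, a) = I⁻¹(Ic × a), and the second variation is taken in the form 〈B(ω, f), B(ω, f)〉 + 〈[f, ω], B(ω, f)〉. Definiteness is only sampled on a few vectors f, which suffices for illustration but is not a proof.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def inner(a, b, I):
    return sum(a[i] * I[i] * b[i] for i in range(3))

def B(c, a, I):
    """B(c, a) = I^{-1}(I c x a) for the ordinary rigid body (assumed form)."""
    w = cross(tuple(I[i] * c[i] for i in range(3)), a)
    return tuple(w[i] / I[i] for i in range(3))

def second_variation(omega, f, I):
    """<B(w, f), B(w, f)> + <[f, w], B(w, f)> at a stationary rotation w."""
    xi = B(omega, f, I)
    return inner(xi, xi, I) + inner(cross(f, omega), xi, I)

def classify(omega, I):
    """Sign pattern of the second variation over a few sample vectors f."""
    samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
               (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, -1, 1)]
    values = [second_variation(omega, f, I) for f in samples]
    values = [v for v in values if abs(v) > 1e-12]   # drop degenerate directions
    if all(v > 0 for v in values):
        return "positive"
    if all(v < 0 for v in values):
        return "negative"
    return "indefinite"

I = (1.0, 2.0, 3.0)   # hypothetical principal moments, I1 < I2 < I3
for axis in [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]:
    print(axis, classify(axis, I))
```

With these values the form is sign-definite on the sampled vectors for the axes of smallest and largest moment of inertia and changes sign for the middle axis, in agreement with the classical picture.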

F Riemannian curvature of a group with left-invariant metric

Let G be a Lie group provided with the left-invariant metric given by a scalar product 〈 , 〉 in the algebra. We note that the riemannian curvature of the group G at any point is determined by the curvature at the identity (since left translation maps the group to itself isometrically). Therefore, it is sufficient to calculate the curvature for two-dimensional planes lying in the Lie algebra.

Theorem 10. The curvature of a group in the direction determined by an orthonormal pair of vectors ξ, η in the algebra is given by the formula

K_ξη = 〈δ, δ〉 + 2〈α, β〉 − 3〈α, α〉 − 4〈B_ξ, B_η〉,

where 2δ = B(ξ, η) + B(η, ξ), 2β = B(ξ, η) − B(η, ξ), 2α = [ξ, η], 2B_ξ = B(ξ, ξ), 2B_η = B(η, η), and where B is the operation defined in section B.

The proof is a tedious but straightforward calculation. It is based on the easily verified formula for covariant derivative

∇_ξ η = ½([ξ, η] − B(ξ, η) − B(η, ξ)),

where ξ and η on the left are left-invariant vector fields and on the right are their values at the identity.

Remark 1. In the case of a two-sided invariant metric, the formula for curvature has the particularly simple form

K_ξη = ¼〈[ξ, η], [ξ, η]〉.
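The curvature formula is easy to evaluate for left-invariant metrics on the rotation group. The sketch below assumes the expression K = 〈δ, δ〉 + 2〈α, β〉 − 3〈α, α〉 − 4〈B_ξ, B_η〉 with δ, β, α, B_ξ, B_η as in Theorem 10, the identification of the algebra with ℝ³ (commutator = vector product), the scalar product 〈a, b〉 = a · Ib with diagonal I, and B(c, a) = I⁻¹(Ic × a). The inertia values are hypothetical.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def inner(a, b, I):
    return sum(a[i] * I[i] * b[i] for i in range(3))

def B(c, a, I):
    """B(c, a) = I^{-1}(I c x a) for the ordinary rigid body (assumed form)."""
    w = cross(tuple(I[i] * c[i] for i in range(3)), a)
    return tuple(w[i] / I[i] for i in range(3))

def curvature(xi, eta, I):
    """Sectional curvature in the plane of xi, eta (assumed orthonormal in < , >)."""
    b_xe, b_ex = B(xi, eta, I), B(eta, xi, I)
    delta = tuple(0.5 * (p + q) for p, q in zip(b_xe, b_ex))
    beta = tuple(0.5 * (p - q) for p, q in zip(b_xe, b_ex))
    alpha = tuple(0.5 * p for p in cross(xi, eta))
    b_xi = tuple(0.5 * p for p in B(xi, xi, I))
    b_eta = tuple(0.5 * p for p in B(eta, eta, I))
    return (inner(delta, delta, I) + 2.0 * inner(alpha, beta, I)
            - 3.0 * inner(alpha, alpha, I) - 4.0 * inner(b_xi, b_eta, I))

e1, e2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

# Two-sided invariant metric (I = identity): K = |[xi, eta]|^2 / 4.
print(curvature(e1, e2, (1.0, 1.0, 1.0)))   # 0.25

# A left-invariant metric stretched along the third axis.
print(curvature(e1, e2, (1.0, 1.0, 2.0)))   # -0.5
```

In the first case the value agrees with the two-sided invariant formula. In the second the curvature in the same plane is negative, which shows that negative curvature can already occur for one-sided invariant metrics on a three-dimensional group.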

Remark 2. The formula for the curvature of a group with a right-invariant riemannian metric coincides with the formula for the left-invariant case. In fact, a right-invariant metric on a group is a left-invariant metric on the group with the reverse multiplication law (g₁ * g₂ = g₂g₁). Passage to the reverse group changes the signs of both the commutator and the operation B in the algebra. But, in every term of the formula for curvature, there is a product of two operations changing the sign. Therefore, the formula for curvature is the same in the right-invariant case.

In Euler’s equation the right-hand side changes sign under passage to the right-invariant case.

G Application to groups of diffeomorphisms

Let D be a bounded region in a riemannian manifold. Consider the group of diffeomorphisms of D which preserve the volume element. We will denote this group by SDiff D.

The Lie algebra corresponding to the group SDiff D consists of all vector fields with divergence 0 on D, tangent to the boundary (if it is not empty). We define the scalar product of two elements of this Lie algebra (i.e., two vector fields) as

〈v₁, v₂〉 = ∫_D (v₁ · v₂) dx,

where (v₁ · v₂) is the scalar product giving the riemannian metric on D, and dx is the riemannian volume element.

We now consider the flow of a uniform ideal (incompressible, non-viscous) fluid on the region D. Such a flow is described by a curve t → g_t in the group SDiff D. Namely, the diffeomorphism g_t is the map which carries every particle of the fluid from the place it was at time 0 to the place it is at time t. It turns out that the kinetic energy of the moving fluid is a right-invariant riemannian metric on the group of diffeomorphisms SDiff D.

Indeed, suppose that after time t the flow of the fluid gives a diffeomorphism g_t, and that the velocity at this moment of time is given by the vector field v. Then the diffeomorphism realized by the flow after time t + τ (where τ is small) will be e^vτg_t up to a quantity small in comparison with τ (here e^vτ is the one-parameter group with vector v, i.e., the phase flow of the differential equation given by the field v). Therefore, the field of velocities v is obtained from the vector ġ tangent to the group at the point g by right translation. This also implies the right-invariance of the kinetic energy, which is by definition equal to

T = ½〈v, v〉

(we assume the density of the fluid to be 1).

The principle of least action (which in mathematical terms is the definition of an ideal fluid) asserts that flows of an ideal fluid are geodesics in the right-invariant metric just described on the group of diffeomorphisms.

Strictly speaking, an infinite-dimensional group of diffeomorphisms is not a manifold. Therefore the exact formulation of the definition above requires additional work: we must choose suitable functional spaces, prove a theorem on existence and uniqueness of solutions, etc. Up to now this has been done only in the case when the dimension of the region of the flow D is equal to 2. However, we will proceed as if these difficulties connected with infinite dimensions did not exist. Thus the following arguments are heuristic in character. It turns out that many of the results can be proved rigorously, independently of the theory of infinite-dimensional manifolds.

We will now indicate the form that the general formulas introduced above take in the case G = SDiff D, where D is a connected region with finite volume in a three-dimensional riemannian manifold. To do this we must first describe explicitly the bilinear operation

B: 𝔤 × 𝔤 → 𝔤

(where 𝔤 denotes our Lie algebra) defined in section B by the formula

〈[a, b], c〉 = 〈B(c, a), b〉 for all b in 𝔤.

It is easy to verify that in the three-dimensional case the vector field B(c, a) can be expressed in terms of the vector fields a and c of our Lie algebra by the formula

B(c, a) = (curl c) ⋀ a + grad α,

where ⋀ denotes the vector product, and α the single-valued function on D which is uniquely (up to a constant summand) determined by the condition

B ∈ 𝔤

(i.e., the conditions div B = 0 and B is tangent to the boundary of D).

We note that the operation B does not depend on the choice of orientation, since the vector product and curl both change sign with a change of orientation.

H Stationary flows

Euler’s equation for “angular velocity” in the case G = SDiff D has the form

v̇ = −B(v, v),

since the metric is right-invariant. Therefore, in the case of the group of diffeomorphisms of three-dimensional space, it takes the form of “the equations of motion in Bernoulli’s form”

∂v/∂t = v ⋀ curl v + grad α, div v = 0

(here α denotes the negative of the function appearing in B(v, v)).

Euler’s equation for momentum is written in the form of the “vorticity equation”

∂(curl v)/∂t = [v, curl v].

In particular, the vorticity of a stationary flow commutes with the field of velocities.

This remark leads quickly to a topological classification of stationary flows of an ideal fluid in three-dimensional space.

Theorem 11. Assume that the region D is bounded by a compact analytic surface, and that the field of velocities is analytic and not everywhere collinear with its curl. Then the region of the flow can be partitioned by an analytic submanifold into a finite number of cells, in each of which the flow is constructed in a standard way. Namely, the cells are of two types: those fibered into tori invariant under the flow and those fibered into surfaces invariant under the flow, diffeomorphic to the annulus ℝ × S¹. On each of these tori the flow lines are either all closed or all dense, and on each annulus all the flow lines are closed.

To prove this theorem we look at the “Bernoulli surfaces,” i.e., the level surfaces of the function α. It follows from the condition for a flow to be stationary (v ⋀ curl v = −grad α) that both the flow lines and the vortex lines lie on the Bernoulli surface. Since the fields of velocity and vorticity commute, the group ℝ² acts on the closed Bernoulli surface, and it must be a torus (cf. the proof of Liouville’s theorem in Section 49). An analogous calculation for the boundary conditions on the boundary of D shows that the non-closed Bernoulli surfaces consist of annuli with closed flow lines.

Remark. The analyticity of the field of velocities is not very essential, but it is important that the fields of velocity and vorticity not be collinear. Computer experiments conducted by M. Hénon show more complicated behavior than described in the theorem for the flow lines of a stationary flow on the three-dimensional torus; this field is given by the formulas

v_x = A sin z + C cos y, v_y = B sin x + A cos z, v_z = C sin y + B cos x.

The formulas are selected so that the vectors v and curl v are collinear. The results of Hénon’s calculations suggest that some flow lines densely fill up a three-dimensional region.
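The collinearity of v and curl v can be checked numerically. The following is a minimal sketch, assuming the field has the form v = (A sin z + C cos y, B sin x + A cos z, C sin y + B cos x); the coefficient values, the sample point, and the helper names are illustrative assumptions, and the curl is only approximated by central differences.

```python
import math

# Hypothetical coefficient values; the text does not fix them.
A, B, C = 1.0, 0.7, 0.4

def v(x, y, z):
    """Assumed field on the three-dimensional torus."""
    return (A * math.sin(z) + C * math.cos(y),
            B * math.sin(x) + A * math.cos(z),
            C * math.sin(y) + B * math.cos(x))

def partial(field, component, axis, point, h=1e-5):
    """Central-difference estimate of d(field[component]) / d(coordinate[axis])."""
    plus = list(point)
    minus = list(point)
    plus[axis] += h
    minus[axis] -= h
    return (field(*plus)[component] - field(*minus)[component]) / (2 * h)

def curl(field, point):
    """Curl of a vector field at a point, via finite differences."""
    return (partial(field, 2, 1, point) - partial(field, 1, 2, point),
            partial(field, 0, 2, point) - partial(field, 2, 0, point),
            partial(field, 1, 0, point) - partial(field, 0, 1, point))

point = (0.3, 1.1, 2.5)
velocity = v(*point)
vorticity = curl(v, point)
# For this field curl v = v, so velocity and vorticity are collinear.
max_difference = max(abs(a - b) for a, b in zip(velocity, vorticity))
print(max_difference)
```

For this assumed field the curl coincides with the field itself, so the printed difference should be at the level of the finite-difference error.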

I Isovorticial fields

Two-dimensional hydrodynamics differs sharply from three-dimensional hydrodynamics. The essence of this difference is contained in the difference in the geometries of the orbits of the co-adjoint representation in the two- and three-dimensional cases. In the two-dimensional case the orbits are in some sense closed and behave, for example, like a family of level sets of a function (more precisely of several functions: actually even an infinite number of functions). In the three-dimensional case the orbits are more complicated; in particular, they are unbounded (and perhaps dense).

The orbits of the co-adjoint representation of the group of diffeomorphisms of a three-dimensional riemannian manifold can be described in the following way. Let v₁ and v₂ be two vector fields of velocities of an incompressible fluid in the region D. We say that the fields v₁ and v₂ are isovorticial if there is a volume-preserving diffeomorphism g : D → D which carries every closed contour γ in D to a new contour such that the circulation of the first field along the original contour is equal to the circulation of the second field along the new contour:

∮_γ v₁ = ∮_{gγ} v₂.

It is easy to verify that the image of an orbit of the co-adjoint representation in the algebra (under the action of the inverse of the inertia operator, A⁻¹) is none other than the set of fields isovorticial to the given field.

In particular, Theorem 3 now takes the form of the following law of conservation of circulation:

Theorem 12. The circulation of a field of velocities of an ideal fluid over a closed fluid contour does not change when the contour is carried by the flow to a new position.

We note that if two fields of velocities of a three-dimensional ideal fluid on D are isovorticial, then the corresponding diffeomorphism carries the curl of the first field into the curl of the second:

g_* curl v₁ = curl v₂.

Furthermore, the isovorticity of two fields can be defined as the equivalence of the fields of vorticity, if the region of the flow is simply connected. Therefore, the problem of the orbits of the co-adjoint representation in the three-dimensional case includes the problem of classifying vector fields with divergence zero up to volume-preserving diffeomorphisms. This last problem in three dimensions is hopelessly difficult.

We now consider the two-dimensional case. First, we translate the basic formulas into notation convenient for considering the two-dimensional case. We assume that the region D of the flow is two-dimensional and oriented. The metric and orientation give a symplectic structure on D; the vector field of velocities has divergence zero and is therefore hamiltonian. Therefore, this field is given by a hamiltonian function (many-valued, in general, if the region D is not simply connected). The hamiltonian function of a field of velocities is called the stream function in hydrodynamics, and is denoted by ψ. Thus

v = I grad ψ,

where I is the operator of clockwise rotation by 90°.

The stream function of the commutator of two fields turns out to be the jacobian (or the Poisson bracket of hamiltonian formalism) of the stream functions of the original fields

ψ_{[a, b]} = J(ψ_a, ψ_b).

The vector field B(c, a) is given, in the two-dimensional case, by the formula

B(c, a) = −(Δψ_c) grad ψ_a + grad α,

where ψ_a and ψ_c are the stream functions of the fields a and c, and Δ = div grad is the laplacian.

In the particular case of the euclidean plane with cartesian coordinates x and y, the formulas for stream function, commutator and laplacian take the particularly simple form

v_x = ∂ψ/∂y, v_y = −∂ψ/∂x; ψ_{[a, b]} = (∂ψ_a/∂x)(∂ψ_b/∂y) − (∂ψ_a/∂y)(∂ψ_b/∂x); Δψ = ∂²ψ/∂x² + ∂²ψ/∂y².

The vorticity (or curl) of a two-dimensional field of velocities is the scalar function r such that the integral over any oriented region σ in D of the product of r with the oriented area element is equal to the circulation of the field of velocities around the boundary of σ:

∬_σ r dS = ∮_{∂σ} v dl.

It is easy to compute an expression for the vorticity in terms of the stream function:

r = −Δψ.

In the two-dimensional simply connected case, isovorticity of fields v₁ and v₂ means simply that the functions r₁ and r₂ (the vorticities of these fields) are carried to one another under a suitable volume-preserving diffeomorphism.

Under such conditions the two functions r₁ and r₂ have the same distribution function, i.e.,

mes{x ∈ D : r₁(x) ≤ c} = mes{x ∈ D : r₂(x) ≤ c}

for any number c. Therefore, if two fields are in the image of the same orbit of the co-adjoint representation, then a whole series of functionals are equal; for example, the integrals of all powers of the vorticity

∬_D r₁^k dS = ∬_D r₂^k dS.

In particular, Euler’s equations of motion of a two-dimensional ideal fluid

∂v/∂t + (v · ∇)v = −grad p, div v = 0

have an infinite collection of first integrals. For example, the integral of any power of the vorticity of the field of velocities

I_k = ∬_D r^k dS

is such a first integral.
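The invariance of the distribution function and of the power integrals under area-preserving maps can be illustrated on a grid. The sketch below is a discrete stand-in, not the continuous statement: the torus is replaced by an N × N grid, the sample vorticity r1 is hypothetical, and the area-preserving diffeomorphism is replaced by the integer map (i, j) → (2i + j, i + j) mod N, which has determinant 1 and therefore permutes the grid cells.

```python
import math

N = 32  # grid points per side of a discretized torus (illustrative choice)

def r1(i, j):
    """A sample vorticity function on the grid (hypothetical)."""
    x, y = 2 * math.pi * i / N, 2 * math.pi * j / N
    return math.cos(y) + 0.5 * math.sin(x + 2 * y)

def g(i, j):
    """Integer map with determinant 1: a permutation of the grid cells."""
    return ((2 * i + j) % N, (i + j) % N)

first = {(i, j): r1(i, j) for i in range(N) for j in range(N)}
# r2 is r1 carried along by g, i.e. r2(g(p)) = r1(p).
second = {g(i, j): first[(i, j)] for i in range(N) for j in range(N)}

def power_integral(r, k):
    """Riemann sum for the integral of r**k over the torus."""
    cell_area = (2 * math.pi / N) ** 2
    return cell_area * math.fsum(value ** k for value in r.values())

integrals_first = [power_integral(first, k) for k in range(1, 6)]
integrals_second = [power_integral(second, k) for k in range(1, 6)]
same_distribution = sorted(first.values()) == sorted(second.values())
print(same_distribution)
print(integrals_first)
print(integrals_second)
```

Because the map only rearranges cells of equal area, the two functions take the same values on sets of the same measure, and every power integral agrees.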

The existence of these first integrals (i.e., the relatively simple structure of orbits of the co-adjoint representation) allows us to prove theorems on existence and uniqueness, etc. in the two-dimensional hydrodynamics of an ideal (and also of a viscous) fluid; the complicated geometry of orbits of the co-adjoint representation in the three-dimensional case (or, perhaps, insufficient information about these orbits) makes the foundations of three-dimensional hydrodynamics a very hard problem.

J Stability of planar stationary flows

Here we formulate general theorems about stationary rotations (Theorems 7, 8, and 9 above) for the case of a group of diffeomorphisms. We obtain in this way the following assertions:

A stationary flow of an ideal fluid is distinguished from all flows isovorticial to it by the fact that it is a conditional extremum (or critical point) of the kinetic energy.

If (i) the indicated critical point is actually an extremum, i.e., a local conditional maximum or minimum, (ii) it satisfies certain (generally satisfied) regularity conditions, and (iii) the extremum is non-degenerate (the second differential is positive- or negative-definite), then the stationary flow is stable (i.e., is a Liapunov stable equilibrium position of Euler’s equation).

The formula for the second differential of the kinetic energy, on the tangent space to the manifold of fields which are isovorticial to a given one, has the following form in the two-dimensional case. Let D be a region in the euclidean plane with cartesian coordinates x and y. Consider a stationary flow with stream function ψ = ψ(x, y). Then

2 d²H = ∬_D [(δv)² + (∇ψ/∇Δψ)(δr)²] dx dy,

where δv is the variation of the field of velocities (i.e., a vector of the tangent space indicated above), and δr = curl δv.

We note that for a stationary flow, the gradient vectors of the stream function and its laplacian are collinear. Therefore the ratio ∇ψ/∇Δψ makes sense. Furthermore, in a neighborhood of every point where the gradient of the vorticity is not zero, the stream function is a function of the vorticity function.

The assertions introduced above lead to the conclusion that the positive- or negative-definiteness of the quadratic form d²H is a sufficient condition for stability of the stationary flow under consideration. This conclusion does not formally follow from Theorems 7, 8, and 9 since the application of any of our formulas in the infinite-dimensional case requires justification. Fortunately, we can justify the final conclusion about stability without justifying the intermediate constructions. Thus we can rigorously prove the following a priori bounds (expressing the stability of a stationary flow in terms of small perturbations of the initial velocity field).

Theorem 13. Suppose that the stream function of a stationary flow, ψ = ψ(x, y), in a region D is a function of the vorticity function (i.e., of the function Δψ) not only locally, but globally. Suppose that the derivative of the stream function with respect to the vorticity satisfies the inequality

Let ψ + φ(x, y, t) be the stream function of another flow, not necessarily stationary. Assume that, at the initial moment, the circulation of the velocity field of the perturbed flow (with stream function ψ + φ) around every boundary component of the region D is equal to the circulation of the original flow (with stream function ψ). Then the perturbation φ = φ(x, y, t) at every moment of time is bounded in terms of the initial perturbation φ₀ = φ(x, y, 0) by the formula

If the stationary flow satisfies the inequality

then the perturbation φ is bounded in terms of φ₀ by the formula

This theorem implies the stability of a stationary flow in the case of a positive-definite quadratic form

with respect to ∇φ (where φ is a function that is constant on every component of the boundary of D and whose gradient has zero flux through every boundary component), and also in the case of a negative-definite form

Example 1. Consider a planar parallel flow in the strip Y₁ ≤ y ≤ Y₂ in the (x, y)-plane with velocity profile v(y) (i.e., with velocity field (v(y), 0)). Such a flow is stationary for any velocity profile. To make the region of the flow compact, we impose the condition that the velocity fields of all flows under consideration be periodic with period X in the x-coordinate.

The conditions of Theorem 13 are fulfilled if the velocity profile has no points of inflection (i.e., if d²v/dy² ≠ 0). We come to the conclusion that planar parallel flows of an ideal fluid with no inflection points in the velocity profile are stable.

The analogous proposition in the linearized problem is called Rayleigh’s theorem.

We emphasize that in Theorem 13 it is not a question of stability “in a linear approximation,” but of actual strict Liapunov stability (i.e., with respect to finite perturbations in the nonlinear problem). The difference between these two forms of stability is substantial in this case, since our problem has a hamiltonian character (cf. Theorem 4); for hamiltonian systems asymptotic stability is impossible, so stability in a linear approximation is always neutral and insufficient for a conclusion about the stability of an equilibrium position of the nonlinear problem.

Example 2. Consider the planar-parallel flow on the torus

{(x, y) : x mod X, y mod 2π}

with velocity field v = (sin y, 0), parallel to the x-axis. This field is determined by the stream function ψ = −cos y and has vorticity r = −cos y. The velocity profile has two inflection points, but the stream function can be expressed as a function of the vorticity. The ratio ∇ψ/∇Δψ is equal to minus one. By applying Theorem 13 we can convince ourselves of the stability of our stationary flow in the case when

for all functions φ of period X in x and 2π in y. It is easy to calculate that the last inequality is satisfied for X ≤ 2π and violated for X > 2π.
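The threshold X = 2π can be checked mode by mode. The sketch below rests on an assumption about the inequality in question, namely that it compares the integral of (Δφ)² with the integral of (∇φ)² over the torus. For a single Fourier mode with wave vector k = (2πm/X, n) the ratio of these two integrals is |k|², and distinct modes are orthogonal in both quadratic forms, so under this assumption the inequality holds for all φ exactly when |k|² ≥ 1 for every nonzero wave vector. The function names and the index cutoff are illustrative.

```python
import math

def smallest_ratio(X, max_index=5):
    """Smallest |k|^2 over nonzero wave vectors k = (2*pi*m/X, n).

    For a single Fourier mode with wave vector k, the integral of the
    squared laplacian divided by the integral of the squared gradient
    equals |k|^2.
    """
    ratios = [
        (2 * math.pi * m / X) ** 2 + n ** 2
        for m in range(-max_index, max_index + 1)
        for n in range(-max_index, max_index + 1)
        if (m, n) != (0, 0)
    ]
    return min(ratios)

def condition_holds(X):
    """True when every nonzero mode satisfies |k|^2 >= 1 (assumed criterion)."""
    # Small tolerance so that the borderline case X = 2*pi counts as satisfied.
    return smallest_ratio(X) >= 1 - 1e-12

short_torus = condition_holds(math.pi)
borderline = condition_holds(2 * math.pi)
long_torus = condition_holds(3 * math.pi)
print(short_torus, borderline, long_torus)
```

The smallest ratio is attained either at (m, n) = (0, ±1), where it equals 1, or at (±1, 0), where it equals (2π/X)², which is why the criterion fails precisely for X > 2π.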

Thus Theorem 13 implies the stability of a sinusoidal stationary flow on a short torus, when the period in the direction of the basic flow (X) is less than the width of the flow (2π). On the other hand, we can directly verify that on a long torus (for X > 2π) our sinusoidal flow is unstable.⁹⁷ Thus, in this example, the sufficient condition for stability from Theorem 13 turns out to be necessary.

We should note that in general an indefinite quadratic form d²H does not imply instability of the corresponding flow. In general, an equilibrium position of a hamiltonian system can be stable even though the hamiltonian function at this position is neither a maximum nor a minimum. The quadratic hamiltonian H = p₁² + q₁² − p₂² − q₂² is the simplest example of this kind.

K Riemannian curvature of a group of diffeomorphisms

The expression for the curvature of a Lie group provided with a one-sided-invariant metric, introduced in subsection E, makes sense also for the group SDiff D of diffeomorphisms of a riemannian domain D. This group is the configuration space for an ideal fluid filling the domain D. The kinetic energy defines a right-invariant metric on SDiff D. The number which we obtain by formally applying the formula for the curvature of a Lie group to this infinite-dimensional group is naturally called the curvature of the group SDiff D.

Calculation of the curvature of a group of diffeomorphisms has been carried out completely only in the case of a flow on the two-dimensional torus with euclidean metric. Such a torus is obtained from the euclidean plane ℝ² by identifying points whose difference lies in some lattice (a discrete subgroup of the plane). An example of such a lattice is the set of points with integral coordinates. In general, to obtain an arbitrary lattice Γ we may replace the square lying at the basis of this special lattice by any parallelogram.

Now consider the Lie algebra of vector fields with divergence zero on the torus with a single-valued stream function. The corresponding group S₀ Diff T² consists of volume-preserving diffeomorphisms which leave the center of mass of the torus fixed. It is embedded in the group SDiff T² of all volume-preserving diffeomorphisms as a totally geodesic submanifold (i.e., a submanifold such that each of its geodesics is a geodesic in the ambient manifold).

The proof consists of the fact that if, at the initial moment, a velocity field of an ideal fluid has a single-valued stream function, then at all other moments of time the stream function will also be single-valued; this follows from the law of conservation of momentum.

We will now investigate the curvature of the group S₀ Diff T² in all possible two-dimensional directions passing through the identity of the group (the curvature of the group SDiff T² in every such direction is the same, since the submanifold S₀ Diff T² is totally geodesic).

Choose an orientation on ℝ². Then elements of the Lie algebra of the group S₀ Diff T² can be thought of as real functions on the torus having average value zero (a field with divergence zero is obtained from such a function by considering it to be a stream function). Therefore, a two-dimensional direction in the tangent space to the group S₀ Diff T² is determined by a pair of functions on the torus with average value zero.

We will give such a function by the set of its Fourier coefficients. It is convenient to carry out all calculations with Fourier series in the complex domain. We let e_k (where k, called a wave vector, is a point of the euclidean plane) denote the function whose value at a point x of our plane is equal to e^{i(k, x)}. Such a function determines a function on the torus if it is Γ-periodic, i.e., if adding a vector from the lattice Γ to x does not change the value of the function.

In other words, the scalar product (k, x) must be a multiple of 2π for all x ∈ Γ. All such vectors k belong to a lattice Γ* on ℝ². The functions e_k, where k ∈ Γ*, form a complete system in the space of complex functions on the torus.
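The condition defining Γ* can be made concrete for a lattice Γ spanned by two vectors. The sketch below computes a basis k1, k2 of Γ* from a basis a1, a2 of Γ by requiring (k_i, a_j) = 2π when i = j and 0 otherwise, and then checks that e_k is Γ-periodic for one wave vector of Γ*. The lattice basis, the sample point, and the function names are illustrative assumptions.

```python
import cmath
import math

def dual_basis(a1, a2):
    """Return k1, k2 with (k_i, a_j) = 2*pi if i == j, else 0."""
    det = a1[0] * a2[1] - a1[1] * a2[0]  # oriented area of the period parallelogram
    k1 = (2 * math.pi * a2[1] / det, -2 * math.pi * a2[0] / det)
    k2 = (-2 * math.pi * a1[1] / det, 2 * math.pi * a1[0] / det)
    return k1, k2

def dot(u, w):
    """Euclidean scalar product in the plane."""
    return u[0] * w[0] + u[1] * w[1]

def e(k, x):
    """The function e_k at the point x: exp(i (k, x))."""
    return cmath.exp(1j * dot(k, x))

a1, a2 = (1.0, 0.0), (0.5, 2.0)  # hypothetical basis of the lattice
k1, k2 = dual_basis(a1, a2)

# A wave vector of the dual lattice and a lattice translation of a sample point.
k = (3 * k1[0] - 2 * k2[0], 3 * k1[1] - 2 * k2[1])
x = (0.37, -1.2)
shifted = (x[0] + 4 * a1[0] - 5 * a2[0], x[1] + 4 * a1[1] - 5 * a2[1])
periodicity_error = abs(e(k, x) - e(k, shifted))
print(periodicity_error)
```

For the square lattice of points with integral coordinates the same construction gives k1 = (2π, 0) and k2 = (0, 2π).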

We now complexify our Lie algebra, scalar product 〈 , 〉, commutator [,] and operation B in the algebra, as well as the riemannian connection and curvature tensor Ω, so that all these functions become (multi-) linear in the complex vector space of the complexified Lie algebra. The functions e_k (where k ∈ Γ*, k ≠ 0) form a basis of this vector space.

Theorem 14. The explicit formulas for the scalar product, commutator, operation B, connection, and curvature of a right-invariant metric on the group S₀ Diff T² have the following form:

In these formulas, S is the area of the torus, and u ⋀ v the area of the parallelogram spanned by u and v (with respect to the chosen orientation of ℝ²). The parentheses denote the euclidean scalar product in the plane, and angled brackets denote the scalar product in the Lie algebra.

The proof of this theorem is in the first article listed in the introduction to this appendix.

The formulas above allow us to calculate the curvature in any two-dimensional direction. These calculations show that in most directions the curvature is negative, but in a few it is positive. Consider, for instance, some fluid flow, i.e. a geodesic of our group. By Jacobi’s equations, the stability of this geodesic is determined by the curvatures in the directions of all possible two-dimensional planes passing through the velocity vector of the geodesic at each of its points.

Assume now that the flow under consideration is stationary. Then the geodesic is a one-parameter subgroup of our group. From this it follows that the curvatures in the directions of all planes passing through velocity vectors of the geodesic at all of its points are equal to the curvatures in the corresponding planes going through the velocity vector of this geodesic at the initial moment of time (Proof: right translate to the identity element of the group). Thus the stability of a stationary flow depends only on the curvatures in the directions of those two-dimensional planes in the Lie algebra which contain the vector of the Lie algebra which is the velocity field of the stationary flow.

Consider, for example, the simplest parallel sinusoidal stationary flow. Such a flow is given by the stream function

ξ = cos kx = (e_k + e_{−k})/2.

Consider any other real vector of the algebra,

η = Σ_l x_l e_l (so that x_{−l} = \overline{x_l}).

We deduce easily from Theorem 14 that

Theorem 15. The curvature of the group S₀ Diff T² in any two-dimensional plane containing the direction ξ is non-positive. Namely,

From this formula it follows, in particular, that

The curvature is equal to zero only for those two-dimensional planes which consist of parallel flows in the same direction as ξ, so that [ξ, η] = 0;

The curvature in the plane defined by the stream functions ξ = cos kx, η = cos lx is

K = −((k² + l²)/4S) sin² α sin² β,

where S is the area of the torus, α is the angle between k and l, and β is the angle between k + l and k − l;

In particular, the curvature of the group of diffeomorphisms of the torus {(x, y) mod 2π} in directions determined by the velocity fields (sin y, 0) and (0, sin x) is equal to K = −1/(8π²).
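The last value can be reproduced numerically. The sketch below assumes that the curvature in the plane spanned by cos kx and cos lx has the form K = −(k² + l²) sin²α sin²β / (4S); this form is an assumption here, chosen because it is consistent with the value K = −1/2S quoted in the discussion that follows. The fields (sin y, 0) and (0, sin x) correspond to the wave vectors k = (0, 1) and l = (1, 0); the function names are illustrative.

```python
import math

def sin_squared_between(u, w):
    """Squared sine of the angle between two plane vectors."""
    cross = u[0] * w[1] - u[1] * w[0]
    return cross ** 2 / ((u[0] ** 2 + u[1] ** 2) * (w[0] ** 2 + w[1] ** 2))

def sectional_curvature(k, l, S):
    """Assumed form: K = -(k^2 + l^2) sin^2(alpha) sin^2(beta) / (4 S)."""
    k_squared = k[0] ** 2 + k[1] ** 2
    l_squared = l[0] ** 2 + l[1] ** 2
    total = (k[0] + l[0], k[1] + l[1])        # k + l
    difference = (k[0] - l[0], k[1] - l[1])   # k - l
    alpha_term = sin_squared_between(k, l)
    beta_term = sin_squared_between(total, difference)
    return -(k_squared + l_squared) * alpha_term * beta_term / (4 * S)

S = (2 * math.pi) ** 2  # area of the torus {(x, y) mod 2*pi}
K = sectional_curvature((0, 1), (1, 0), S)
print(K, -1 / (8 * math.pi ** 2))
```

Here both angles are right angles, so the two sine factors equal 1 and K reduces to −2/(4S) = −1/(2S).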

L Discussion

It is natural to expect that the curvature of a group of diffeomorphisms is related to the stability of geodesics in this group (i.e. to the stability of flows of an ideal fluid) in the same way as the curvature of a finite-dimensional Lie group is related to the stability of geodesics on it. Namely, negative curvature causes exponential instability of geodesics. The characteristic path length (the average path length in which errors in the initial conditions grow e times) has order of magnitude 1/√(−K). Thus, knowing the curvatures of a group of diffeomorphisms allows us to estimate the time for which we can predict the development of the flow of an ideal fluid by means of an approximate initial velocity field before the error grows to a large order.

It should be emphasized that instability of a flow of an ideal fluid is here understood differently than in section J; it is a question of exponential instability of the motion of the fluid, not of its velocity field. It is possible for a stationary flow to be a Liapunov stable solution of Euler’s equation while the corresponding motion of the fluid is exponentially unstable. The reason is that a small change in the velocity field of a fluid can induce an exponentially growing change in the motion of the fluid. In such a case (stability of the solution of Euler’s equation and negative curvature of the group) we can predict the velocity field, but we cannot predict the motion of the fluid mass without a great loss of accuracy.

The formulas mentioned above for curvature can be used even for rough estimates of the time over which a long-term dynamical prediction of the weather is impossible, if we agree to a few simplifying assumptions. These simplifying assumptions consist of the following:

The earth has the shape of a torus obtained by factoring the plane by a square lattice.

The atmosphere is a two-dimensional homogeneous incompressible inviscid fluid.

The motion of the atmosphere is approximately a “tradewind current,” parallel to the equator of the torus and having sinusoidal velocity profile.

To calculate the characteristic path length we must then estimate the curvature of the group S₀ Diff T² in directions containing the “tradewind current” ξ from Theorem 15. To do this we will look at T² as {(x, y) mod 2π}, k = (0, 1). In other words, we look at 2π-periodic flows on the (x, y)-plane close to a stationary flow, parallel to the x-axis and with sinusoidal velocity profile

v = (sin y, 0).

It is easy to see from the formula in Theorem 15 that the curvature of the group S₀ Diff T² in the planes containing our tradewind current v varies within the limits

Here the lower limit is obtained by a rather crude estimate. However, a direction with curvature K = −1/2S certainly exists, and there are many other directions with curvature of approximately the same size. In order to make a rough estimate of the characteristic path length, we make the rough guess K₀ = −1/2S as value of the “mean curvature.”

If we agree to start from this value K₀ of the curvature, we obtain the characteristic path length

s = 1/√(−K₀) = √(2S).

The velocity of motion with respect to the group which corresponds to our tradewind current is equal to

√(S/2)

(since the average square value of the sine is ½). Therefore, the time it takes for our flow to travel the characteristic path length is equal to 2. The fastest particles of the fluid go a distance of 2 after this time, i.e., 1/π of the entire orbit around the torus.

Thus, if we take our value of the mean curvature, then the error grows by e^π ≈ 20 after the time of one orbit of the fastest particle. Taking the value 100 km/hr as the maximal velocity of the tradewind current, we get 400 hours for the time of orbit, i.e., less than three weeks.

Thus, if at the initial moment the state of the weather was known with small error ε, then the order of magnitude of the error of prediction after n months would be

10^{kn} ε, where k = (30 · 24/400) π log₁₀ e ≈ 2.5.

For example, to predict the weather two months in advance we must have initial data with five more digits of accuracy than the prediction accuracy. Practically, this means that calculating the weather for such a period is impossible.
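The arithmetic behind this estimate can be retraced step by step. The sketch below assumes the torus {(x, y) mod 2π}, the guessed mean curvature K₀ = −1/2S, a characteristic path length of 1/√(−K₀), and a month of 30 days; the equator length of 40,000 km is an assumption implied by the 400 hours quoted for a speed of 100 km/hr. Variable names are illustrative.

```python
import math

S = (2 * math.pi) ** 2                 # area of the torus {(x, y) mod 2*pi}
K0 = -1 / (2 * S)                      # guessed "mean curvature"
path_length = 1 / math.sqrt(-K0)       # characteristic path length, sqrt(2 S)
speed = math.sqrt(S / 2)               # norm of the field (sin y, 0): mean of sin^2 is 1/2
e_folding_time = path_length / speed   # time in which errors grow e times

orbit_time = 2 * math.pi / 1.0         # fastest particles have speed 1
growth_per_orbit = math.exp(orbit_time / e_folding_time)

orbit_hours = 40_000 / 100             # assumed 40,000 km equator at 100 km/hr
orbits_per_month = 30 * 24 / orbit_hours
digits_per_month = orbits_per_month * math.pi * math.log10(math.e)

print(e_folding_time, growth_per_orbit, orbit_hours, digits_per_month)
```

With these assumptions the error grows by a factor of e^π per orbit, which is about 2.5 decimal digits per month and hence about five digits over two months.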

It is clear that the estimates mentioned here are not very sharp, and the model we took is very simplified. The choice of the value of “mean curvature” also requires justification.

⁹⁷ Cf., for example, the article of L. D. Meshalkin and Ya. G. Sinai, “Investigation of the stability of a stationary solution of a system of equations for the plane movement of an incompressible viscous liquid.” J. Applied Math. Mech. 25 (1962), 1700–1705.

Appendix 3: Symplectic structures on algebraic manifolds

The symplectic manifolds of classical mechanics are most often phase spaces of lagrangian mechanical systems, i.e., cotangent bundles of configuration spaces.

An entirely different series of symplectic manifolds arises in algebraic geometry.

For example, any smooth complex algebraic manifold (given by a system of polynomial equations in complex projective space) has a natural symplectic structure.

The construction of a symplectic structure on an algebraic manifold is based on the fact that complex projective space itself has a particular symplectic structure, namely the imaginary part of its hermitian structure.

A The hermitian structure of complex projective space

Recall that n-dimensional complex projective space ℂPⁿ is the manifold of all complex lines passing through the point 0 in an (n + 1)-dimensional complex vector space ℂ^{n + 1}. To construct a symplectic structure on ℂPⁿ we use the hermitian structure in the corresponding vector space ℂ^{n + 1}.

Recall that a hermitian scalar product (or hermitian structure) on a complex vector space is a complex-valued function on pairs of vectors, which (1) is linear in the first and anti-linear in the second variable, (2) changes its value to the complex conjugate when the arguments are interchanged, and (3) becomes a positive-definite real quadratic form if we take the arguments equal:

〈λξ, η〉 = λ〈ξ, η〉, 〈ξ, η〉 = \overline{〈η, ξ〉}, 〈ξ, ξ〉 > 0 for ξ ≠ 0.

An example of a hermitian scalar product is

〈ξ, η〉 = Σ_k ξ_k \overline{η_k}, (1)

where ξ_k and η_k are the coordinates of the vectors ξ and η in some basis.

A basis for which a hermitian scalar product has the form (1) always exists, and is called a hermitian-orthonormal basis.

The real and imaginary parts of a hermitian scalar product are real bilinear forms. The first is symmetric, and the second skew-symmetric, and both are nondegenerate:

〈ξ, η〉 = (ξ, η) + i[ξ, η], (ξ, η) = (η, ξ), [ξ, η] = −[η, ξ].

The quadratic form (ξ, ξ) is positive-definite.

Thus a hermitian structure 〈 , 〉 on a complex vector space gives it a euclidean structure ( , ) and a symplectic structure [ , ]. These two structures are related to the complex structure by the relation

[ξ, η] = (ξ, iη).
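These properties are easy to test numerically for the standard hermitian product. The sketch below assumes the product 〈ξ, η〉 = Σ ξ_k conj(η_k), takes ( , ) and [ , ] to be its real and imaginary parts, and checks symmetry, skew-symmetry, and the relation [ξ, η] = (ξ, iη) on random vectors. The function names, the dimension, and the random seed are illustrative.

```python
import random

def herm(a, b):
    """Hermitian product <a, b> = sum of a_k * conj(b_k)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

def euclid(a, b):
    """Euclidean structure ( , ): real part of the hermitian product."""
    return herm(a, b).real

def skew(a, b):
    """Symplectic structure [ , ]: imaginary part of the hermitian product."""
    return herm(a, b).imag

def random_vector(size):
    """Vector with independent gaussian real and imaginary parts."""
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(size)]

random.seed(1)
xi = random_vector(4)
eta = random_vector(4)
i_eta = [1j * w for w in eta]

symmetry_error = abs(euclid(xi, eta) - euclid(eta, xi))
skew_error = abs(skew(xi, eta) + skew(eta, xi))
relation_error = abs(skew(xi, eta) - euclid(xi, i_eta))
print(symmetry_error, skew_error, relation_error)
```

The relation holds because anti-linearity in the second argument gives 〈ξ, iη〉 = −i〈ξ, η〉, whose real part is the imaginary part of 〈ξ, η〉.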

We will now define a riemannian metric on complex projective space. To do this, consider the unit sphere

S^{2n + 1} = {z ∈ ℂ^{n + 1} : 〈z, z〉 = 1}

in the corresponding vector space ℂ^{n + 1}. This sphere inherits the riemannian metric from ℂ^{n + 1}. Every complex line intersects our sphere in a great circle.

Definition. The distance between two points of complex projective space is the distance between the two corresponding circles on the unit sphere.

We note that these two circles are parallel in the sense that the distance from any point of one of the circles to the other is the same (Proof: multiplication of z by e^{iφ} preserves the metric on the sphere). This circumstance allows us at once to write down an explicit formula (2) for the riemannian metric on the complex projective space given by the construction defined above.

In fact, let p denote the mapping

p: ℂ^{n + 1} \ 0 → ℂPⁿ

taking a point z ≠ 0 of the vector space ℂ^{n + 1} to the complex line passing through 0 and z.

Every vector ζ tangent to ℂPⁿ at the point pz can be represented (in many ways) as the image under this map of a vector ξ at the point z: ζ = p_*ξ.

Theorem. The square of the length of a vector ζ in the riemannian metric defined above is given by the formula

ds²(ζ) = (〈ξ, ξ〉〈z, z〉 − 〈ξ, z〉〈z, ξ〉) / 〈z, z〉², where ζ = p_*ξ. (2)

Proof. Assume first that the point z lies on the unit sphere S^{2n + 1}.

Decompose the vector ξ into two components: one in the complex line determined by the vector z and the other in the hermitian-orthogonal direction. Note that hermitian-orthogonal to the vector z means euclidean-orthogonal to the vectors z and iz. The vector z is a euclidean normal vector to the sphere S^{2n + 1} at z. The vector iz is a vector tangent to the circle in which the sphere intersects the complex line passing through z. Thus the component η of the vector ξ which is hermitian-orthogonal to z is tangent to the sphere S^{2n + 1} and euclidean-orthogonal to the circle in which the sphere intersects the line pz.

By the definition of the metric on ℂPⁿ, the riemannian square of the length of the vector ζ = p_*ξ is equal to the euclidean square length of the component η of ξ which is hermitian-orthogonal to z.

We calculate the component η of ξ, hermitian-orthogonal to z. We write our decomposition as

ξ = cz + η, 〈η, z〉 = 0.

By hermitian multiplication with z, we find

〈ξ, z〉 = c〈z, z〉 = c,

so

η = ξ − 〈ξ, z〉z.

Calculating the hermitian square of the vector η, we find 〈η, η〉 = 〈η, ξ〉 and

〈η, η〉 = 〈ξ, ξ〉 − 〈ξ, z〉〈z, ξ〉.

Thus, formula (2) is proved for points z of the unit sphere. The general case follows from looking at the homothetic transformation z → z/|z|. □
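The steps of this proof can be checked numerically. The sketch below assumes formula (2) in the form (〈ξ, ξ〉〈z, z〉 − 〈ξ, z〉〈z, ξ〉)/〈z, z〉² with the standard hermitian product. It compares this expression with the squared length of the hermitian-orthogonal component after rescaling z to the unit sphere, and confirms that adding a multiple of z to ξ does not change the result. All names, the dimension, and the random seed are illustrative.

```python
import random

def herm(a, b):
    """Hermitian product <a, b> = sum of a_k * conj(b_k)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

def length_squared(xi, z):
    """Assumed form of formula (2): (<xi,xi><z,z> - <xi,z><z,xi>) / <z,z>^2."""
    zz = herm(z, z).real
    return (herm(xi, xi).real * zz - (herm(xi, z) * herm(z, xi)).real) / zz ** 2

def orthogonal_part_squared(xi, z):
    """Squared length of the part of xi hermitian-orthogonal to a unit vector z."""
    c = herm(xi, z)
    eta = [x - c * w for x, w in zip(xi, z)]
    return herm(eta, eta).real

def random_vector(size):
    """Vector with independent gaussian real and imaginary parts."""
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(size)]

random.seed(0)
n = 3
z = random_vector(n + 1)
xi = random_vector(n + 1)
general = length_squared(xi, z)

# Rescale to the unit sphere: (z, xi) and (z/|z|, xi/|z|) project to the same
# tangent vector, since z and z/|z| lie on the same complex line.
norm = herm(z, z).real ** 0.5
on_sphere = orthogonal_part_squared([x / norm for x in xi], [w / norm for w in z])

# Adding a multiple of z to xi does not change its projection.
c = complex(0.8, -1.3)
shifted = length_squared([x + c * w for x, w in zip(xi, z)], z)
print(general, on_sphere, shifted)
```

The three printed numbers should agree up to rounding, which reflects the fact that formula (2) depends only on the tangent vector to ℂPⁿ and not on the representative ξ or on the scaling of z.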

Note that our construction allows us to define not only a euclidean structure (2), but also a hermitian structure on the tangent space to ℂPⁿ. Consider the hermitian-orthogonal complement H to the direction of the vector z in the space Tℂ^{n + 1}_z, where z ∈ S^{2n + 1}. The map p_*: H → T(ℂPⁿ)_pz maps H isomorphically (as we showed above) onto the tangent space to ℂPⁿ and carries over the hermitian structure from H.

It is clear that the scalar square defined by this hermitian structure is given by formula (2). Therefore, the formula for the hermitian scalar product in the tangent space to ℂPⁿ can be written down without further calculations:

〈ζ₁, ζ₂〉 = (〈ξ₁, ξ₂〉〈z, z〉 − 〈ξ₁, z〉〈z, ξ₂〉) / 〈z, z〉² (3)

for any vectors ξ₁, ξ₂ in Tℂ^{n + 1}_z satisfying the relation p_*ξ_k = ζ_k ∈ T(ℂPⁿ)_pz. We note that in formula (3) the point z does not necessarily lie on the unit sphere.

The euclidean and hermitian structures (2) and (3) constructed on the tangent spaces to ℂPⁿ are not invariant under all projective transformations of the manifold ℂPⁿ, but are invariant under those which are given by unitary (preserving the hermitian structure) linear transformations of the vector space ℂ^{n + 1}.

B The symplectic structure of complex projective space

We consider the imaginary part of the hermitian form (3), taken with coefficient −1/π (the reason for taking this coefficient is explained in Problem 1, Section C):

Ω(ξ₁, ξ₂) = −(1/π) Im 〈ξ₁, ξ₂〉. (4)

Like the imaginary part of any hermitian form, the real bilinear form Ω on the tangent space to complex projective space is skew-symmetric and nondegenerate.

Theorem. The differential 2-form Ω gives a symplectic structure on complex projective space.

Proof. We need only verify that the form Ω is closed.

Consider the exterior derivative dΩ of the form Ω. This differential 3-form on ℂPⁿ is invariant with respect to mappings induced by unitary transformations of the space ℂ^{n + 1}. It follows from this that it is equal to zero.

To see this, we look at a hermitian-orthonormal basis e₁, ..., e_n of the tangent space to ℂPⁿ at some point z. Then the vectors e₁, ..., e_n, ie₁, ..., ie_n form a euclidean-orthonormal ℝ-basis. We will show that the value of the form dΩ on any triple of these ℝ-basis vectors is equal to zero. (We assume that n > 1; for n = 1 there is nothing to prove.)

Note that in any triple of ℝ-basis vectors at least one is hermitian-orthogonal to the two others. Denote this vector by e. It is easy to construct a unitary transformation of the space ℂ^{n + 1} inducing a motion on ℂPⁿ which fixes the point z and the hermitian-orthogonal complement to e, and changes the direction of e.

The value of the form dΩ on our three vectors, e, f, and g is equal to its value on the triple −e, f, and g by the invariance of the form Ω, and is hence equal to zero. □

Remark. Another method of constructing the same symplectic structure on complex projective space consists of the following. Consider small oscillations of a mathematical pendulum with an (n + 1)-dimensional configuration space. We use the energy integral to reduce the number of degrees of freedom of the system by 1. The phase space obtained after this operation is ℂPⁿ, and the symplectic structure on it agrees with the form Ω described above up to a factor.

One other method of constructing a symplectic structure on ℂPⁿ uses the fact that this space may be represented as one of the orbits of the co-adjoint representation of a Lie group, and on every such orbit there is always a standard symplectic structure (cf. Appendix 2, Section A). For the Lie group we can take the group of unitary (preserving the hermitian metric) operators in an (n + 1)-dimensional complex space. The orbits of the co-adjoint representation in this case are the same as of the adjoint representation. In the adjoint representation the operator of reflection through a hyperplane (which changes the sign of the first coordinate and leaves the others fixed) has ℂPⁿ as its orbit, since the reflection operator is uniquely determined by the complex line orthogonal to the hyperplane.

C Symplectic structure on algebraic manifolds

We will now obtain a symplectic structure on any complex submanifold M of complex projective space. Let j: M → ℂPⁿ be an embedding of the complex manifold M into complex projective space. The riemannian, hermitian, and symplectic structures on projective space induce corresponding structures on M. For example, the symplectic structure on M is given by the formula

Ω_M = j^*Ω.

Theorem. The differential form Ω_M gives a symplectic structure on the manifold M.

Proof. The nondegeneracy of the 2-form Ω_M follows from the fact that M is a complex submanifold. In fact, the quadratic form

Ω_M(ξ, iξ)

is positive-definite (it is induced by the riemannian metric on ℂPⁿ). Therefore, the bilinear form (ξ, η) = Ω_M(ξ, iη) is nondegenerate. This means that the form Ω_M is also nondegenerate. The form Ω_M is closed since the form Ω is closed and exterior differentiation commutes with j^*. □

Remark. In the same way as for complex projective space, we define a hermitian structure on the tangent spaces of its complex submanifolds; the symplectic structure is the imaginary part.

A complex manifold with a hermitian metric whose imaginary part is a closed form (i.e. a symplectic structure) is called a Kähler manifold and its hermitian metric a Kähler metric. Many important results have been obtained in the geometry of Kähler manifolds; in particular, they have remarkable topological properties (cf., for example, A. Weil, “Variétés Kählériennes,” Hermann, 1958).

Not all symplectic manifolds admit a Kähler structure.

Problem 1. Calculate the symplectic structure Ω in the affine chart w = z₁ : z₀ of the projective line ℂP¹.

Answer. Ω = (1/π)(dx ∧ dy)/(1 + x² + y²)², where w = x + iy. The coefficient in the definition of the form Ω is chosen to obtain the usual orientation of the complex line (dx ∧ dy) and so that the integral of the form Ω over the whole projective line is equal to 1.
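
As a quick check of the normalization, one may pass to polar coordinates x = r cos θ, y = r sin θ in the affine chart (the single point at infinity does not contribute to the integral). A short worked computation, with r and θ introduced here only for this purpose:

```latex
\int_{\mathbb{C}P^1} \Omega
  = \frac{1}{\pi} \int_0^{2\pi} \int_0^{\infty} \frac{r \, dr \, d\theta}{(1 + r^2)^2}
  = 2 \int_0^{\infty} \frac{r \, dr}{(1 + r^2)^2}
  = \left[ -\frac{1}{1 + r^2} \right]_0^{\infty}
  = 1 .
```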

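Problem 2. Show that the symplectic structure Ω in the affine chart w_k = z_k : z₀ (k = 1, ..., n) of the projective space ℂPⁿ = {(z₀ : z₁ :...: z_n)} is given by the formula

Ω = (i/2π) [(Σ_k w_k w̄_k)(Σ_k dw_k ∧ dw̄_k) − (Σ_k w̄_k dw_k) ∧ (Σ_k w_k dw̄_k)] / (Σ_k w_k w̄_k)², where the sums are over k = 0, ..., n.

By convention, w₀ = 1.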

Remark. Differential forms on a complex space with complex values (such as dw_k and dw̄_k) are defined as complex linear functions of tangent vectors; if w_k = x_k + iy_k, then

dw_k = dx_k + i dy_k, dw̄_k = dx_k − i dy_k.

The space of such forms in ℂⁿ has complex dimension 2n; the 2n forms dw_k, dw̄_k, for example, form a ℂ-basis, as do the 2n forms dx_k, dy_k.

Exterior multiplication is defined in the usual way and obeys the usual rules. For example,

dw_k ∧ dw̄_k = (dx_k + i dy_k) ∧ (dx_k − i dy_k) = −2i dx_k ∧ dy_k.

Let f be a real-smooth function on ℂⁿ (with complex values, in general). An example of such a function is |w|² = Σ_k w_k w̄_k. The differential of the function f is a complex 1-form. Therefore, it can be decomposed in the basis dw_k, dw̄_k. The coefficients of this decomposition are called the partial derivatives “with respect to w_k” and “with respect to w̄_k”:

df = Σ_k (∂f/∂w_k) dw_k + Σ_k (∂f/∂w̄_k) dw̄_k.

In calculating exterior derivatives it is also convenient to separate the differentiation into d′ with respect to the variable w and d″ with respect to the variable w̄, so that d = d′ + d″.

For example, for a function f

d′f = Σ_k (∂f/∂w_k) dw_k, d″f = Σ_k (∂f/∂w̄_k) dw̄_k.

For the differential 1-form

ω = Σ_k a_k dw_k + Σ_k b_k dw̄_k

the operators d′ and d″ are defined analogously:

d′ω = Σ_k d′a_k ∧ dw_k + Σ_k d′b_k ∧ dw̄_k, d″ω = Σ_k d″a_k ∧ dw_k + Σ_k d″b_k ∧ dw̄_k.

Problem 3. Show that the symplectic structure Ω on the affine chart of the projective space ℂPⁿ is given by the formula

Ω = (i/2π) d′d″ ln Σ_k |w_k|², where the sum is over k = 0, ..., n and w₀ = 1.

Appendix 4: Contact structures

An odd-dimensional manifold cannot admit a symplectic structure. The analogue of a symplectic structure for odd-dimensional manifolds is a little less symmetric, but also a very interesting structure—the contact structure.

The sources of symplectic structures in mechanics are phase spaces (i.e., cotangent bundles to configuration manifolds), on which there is always a canonical symplectic structure. The sources of contact structures are manifolds of contact elements of configuration spaces.

A contact element to an n-dimensional smooth manifold at some point is an (n − 1)-dimensional plane tangent to the manifold at that point (i.e., an (n − 1)-dimensional subspace of the n-dimensional tangent space at that point).

The set of all contact elements of an n-dimensional manifold has a natural smooth manifold structure of dimension 2n − 1. It turns out that there is an interesting additional “contact structure” on this odd-dimensional manifold (we describe this below).

The manifold of contact elements of a riemannian n-dimensional manifold is closely related to the (2n − 1)-dimensional manifold of unit tangent vectors of this riemannian n-dimensional manifold, or to the (2n − 1)-dimensional energy level manifold of a point mass moving on the riemannian manifold under inertia. The contact structures on these (2n − 1)-dimensional manifolds are closely related to the symplectic structure on the 2n-dimensional phase space of the point (i.e., the cotangent bundle of the original n-dimensional riemannian manifold).

A Definition of contact structure

Definition. A contact structure on a manifold is a smooth field of tangent hyperplanes⁹⁸ satisfying a nondegeneracy condition which will be formulated later.

To formulate this condition we examine what a field of hyperplanes looks like in general in a neighborhood of a point in an N-dimensional manifold.

Example. Let N = 2. Then the manifold is a surface and a field of hyperplanes is a field of straight lines. Such a field in a neighborhood of a point is always constructed very simply, namely, as a field of tangents to a family of parallel lines in a plane. More precisely, one of the basic results of the local theory of ordinary differential equations is that it is possible to change any smooth field of tangent lines on a manifold into a field of tangents to a family of straight lines in euclidean space by using a diffeomorphism in a sufficiently small neighborhood of any point of the manifold.

If N > 2, then a hyperplane is not a line, and the question becomes significantly more complicated. For example, most fields of two-dimensional tangent planes in ordinary three-dimensional space cannot be diffeomorphically mapped onto a field of parallel planes. The reason is that there exist fields of tangent planes for which it is impossible to find “integral surfaces,” i.e., surfaces which have the prescribed tangent plane at each point.

The nondegeneracy condition for a field of hyperplanes which enters into the definition of contact structure consists of the stipulation that the field of hyperplanes must be maximally far from a field of tangents to a family of hyperplanes. In order to measure this distance, as well as to convince ourselves of the existence of fields without integral hypersurfaces, we must make a few constructions and calculations.⁹⁹

B Frobenius’ integrability condition

We will consider some point on an N-dimensional manifold and try to construct a surface passing through this point and tangent to a given field of (N − 1)-dimensional planes at each point (an integral surface).

To this end we introduce a coordinate system onto a neighborhood of this point so that at the point itself one coordinate surface is tangent to a plane of the field. We will call this plane the horizontal plane, and will call the coordinate axis not lying in it the vertical axis.

Construction of an integral surface. An integral surface, if one exists, is the graph of a function of N − 1 variables near the origin. To construct it, we can take some smooth path on the horizontal plane. Then the vertical lines over this path form a two-dimensional surface (cylinder); our field of planes intersects its tangent planes in a field of tangent lines. The integral surface we are looking for, if it exists, intersects this cylinder in an integral curve of the field of lines, starting at the origin. Such an integral curve always exists independent of whether an integral surface exists. Thus we can construct an integral surface over the horizontal plane by moving along smooth curves in the latter.

In order to obtain a smooth integral surface from all the integral curves we need the result of our construction to be independent of the path, determined only by its endpoint. In particular, for a circuit of a closed path in a neighborhood of the origin in the horizontal plane, the integral curve on the cylinder must close up.

It is easy to construct examples of fields of planes for which such closure does not take place and, therefore, for which an integral surface does not exist. Such fields of planes are called nonintegrable.

Example of a nonintegrable field of planes. In order to give a field of planes and measure numerically the deviation from closure, we introduce the following notation. We note first of all that a field of hyperplanes can be given locally by a differential 1-form; a plane in the tangent space gives a 1-form up to multiplication by a nonzero constant. We will choose this constant so that the value of the form on the vertical basic vector is equal to 1.

This condition can be satisfied in some neighborhood of the origin since the plane of the field at zero does not contain the vertical direction. This condition determines the form uniquely (given the field of planes).

A field of planes in ordinary three-space which does not have an integral surface can be given, for example, by the 1-form

ω = x dy + dz,

where x and y are the horizontal coordinates and z is the vertical. The proof of the fact that this field of planes is nonintegrable will be given below.

Construction of a 2-form measuring nonintegrability. With the help of the form giving the field, we can measure the degree of nonintegrability. This is done using the following construction (Figure 236).

Figure 236 Integral curves constructed for a nonintegrable field of planes

Consider a pair of vectors emanating from the origin and lying in the horizontal plane of our coordinate system. Construct a parallelogram on them. We obtain two paths from the origin to the opposite vertex. Over each of these two paths we can construct an integral curve (with two sections) as described above. As a result, in general, there arise two different points over the vertex of the parallelogram opposite to the origin. The difference in the heights of these points is a function of our pair of vectors. This function is skew-symmetric and equal to zero if one of the vectors is equal to zero. Thus the linear part of the Taylor series of this function is zero at zero, and the quadratic part of its Taylor series is a bilinear skew-symmetric form on the horizontal plane.

If the field is integrable, then this 2-form is equal to zero. Therefore, this 2-form can be considered as a measure of the nonintegrability of the field.
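
The parallelogram construction can be tried numerically. The following is a minimal sketch, assuming the field of planes in three-space is given by the 1-form ω = x dy + dz (an assumed example, normalized to take the value 1 on the vertical vector); the function names and the sample vectors are illustrative. Along an integral curve ω = 0, so the height satisfies dz = −x dy over the horizontal path.

```python
# Sketch: height gap over a parallelogram for the assumed field omega = x dy + dz.
# On an integral curve omega = 0, so dz = -x dy along the lifted path.

def lift(start, step, z0, n=1000):
    """Lift the straight segment start -> start + step to an integral curve.

    Integrates dz = -x dy with the midpoint rule and returns the final height.
    """
    (x0, y0), (dx, dy) = start, step
    z = z0
    for k in range(n):
        t_mid = (k + 0.5) / n          # midpoint of the k-th subinterval
        x_mid = x0 + t_mid * dx        # x-coordinate at that midpoint
        z -= x_mid * dy / n            # dz = -x dy
    return z

def height_gap(u, v):
    """Height reached via v then u, minus height reached via u then v."""
    z_uv = lift(u, v, lift((0.0, 0.0), u, 0.0))
    z_vu = lift(v, u, lift((0.0, 0.0), v, 0.0))
    return z_vu - z_uv

def d_omega(u, v):
    """Value of d(omega) = dx ^ dy on the pair of horizontal vectors (u, v)."""
    return u[0] * v[1] - u[1] * v[0]

u, v = (0.3, 0.1), (-0.2, 0.4)
print(height_gap(u, v), d_omega(u, v))
```

The order in which the two heights are subtracted is a sign convention chosen here so that the gap equals dx ∧ dy on the pair (u, v). For this particular form the agreement is exact because the coefficients of ω are linear; for a general field only the quadratic part of the gap is captured by a 2-form.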

The 2-form is well defined. We constructed the 2-form above with the help of coordinates. However, the value of our 2-form on a pair of tangent vectors does not depend on the coordinate system, but only on the 1-form used to give the field.

To convince ourselves of this, it is enough to prove the following.

Theorem. The 2-form defined above agrees with the exterior derivative of the 1-form ω, dω|_{ω = 0}, on the null space of ω.

Proof. We will show that the difference in the heights of the two points obtained as a result of our two motions along the sides of the parallelogram is the same as the integral of the 1-form ω over the four sides of the parallelogram, up to a quantity small of third order with respect to the sides of the parallelogram.

To this end we note that the height of the rise of an integral curve along any path of length ε emanating from the origin has order ε², since at the origin the plane of the field is horizontal. Therefore, the integrals of the 2-form dω over all four vertical areas over the sides of the parallelogram bounded by the integral curves and the horizontal plane, have order ε³ if the sides are of order ε.

The integrals of the form ω along integral curves are exactly equal to zero. Therefore, by Stokes’ formula, the increase in height along the integral curve lying over any of the sides of the parallelogram is equal to the integral of the 1-form ω along this side up to a quantity of third-order smallness.

Now the theorem follows directly from the definition of exterior differentiation. □

Some arbitrariness remains in the choice of the 1-form ω which we used to construct our 2-form. Namely, the form ω is defined by the field of planes only up to multiplication by a function f which is never zero. In other words, we could have started with the form fω. Then we would have obtained the 2-form

d(fω)|_{ω = 0} = (df ∧ ω + f dω)|_{ω = 0} = f dω|_{ω = 0},

which, on our plane, differs from the 2-form dω by multiplication by the nonzero number f(0).

Thus the 2-form constructed on the plane of the field is defined invariantly up to multiplication by a nonzero constant.

Condition for integrability of a field of planes

Theorem. If a field of hyperplanes is integrable, then the 2-form constructed above on a plane of the field is equal to zero. Conversely, if the 2-form constructed on every plane of the field is equal to zero, then the field is integrable.

Proof. The first assertion of the theorem is clear by the construction of the 2-form. The proof of the second assertion can be carried out by exactly the same reasoning we used to prove the commutativity of phase flows for which the Poisson bracket of the velocity fields was equal to zero. We can simply refer to this commutativity, applying it to the integral curves arising over the lines of the coordinate directions in the horizontal plane. □

Theorem. The integrability condition for a field of planes,

dω = 0 when ω = 0,

is equivalent to the following condition of Frobenius:

ω ∧ dω = 0.

Proof. We consider the value of the 3-form above on any three distinct coordinate vectors. Only one of these vectors can be the vertical. Therefore, of all the terms entering into the definition of the value of the exterior product on the three vectors, only one is nonzero (the form ω vanishes on the horizontal vectors): the product of the value of the form ω on the vertical vector with the value of the form dω on the pair of horizontal vectors. If the field given by the form is integrable, then the second factor is zero, so our 3-form is zero on arbitrary triples of vectors.

Conversely, if the 3-form is equal to zero for any vectors, then it is equal to zero for any triple of coordinate vectors, of which one is vertical and the other two horizontal. The value of the 3-form on such a triple is equal to the product of the value of ω on the vertical vector with the value of dω on the pair of horizontal vectors. The first factor is not zero, so the second must be zero, and thus the form dω is zero on a plane of the field. □
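
In three-space the Frobenius condition reduces to a single scalar equation. The following is a minimal sketch, assuming the form is written as ω = P dx + Q dy + R dz, so that ω ∧ dω = (P(R_y − Q_z) + Q(P_z − R_x) + R(Q_x − P_y)) dx ∧ dy ∧ dz; the coefficient names P, Q, R, the finite-difference step, and the sample forms are assumptions made for illustration.

```python
# Sketch: Frobenius test in R^3 for omega = P dx + Q dy + R dz.
# omega ^ d(omega) = (P*(R_y - Q_z) + Q*(P_z - R_x) + R*(Q_x - P_y)) dx^dy^dz.

def partial(f, point, i, h=1e-5):
    """Central-difference partial derivative of f along coordinate i."""
    plus, minus = list(point), list(point)
    plus[i] += h
    minus[i] -= h
    return (f(*plus) - f(*minus)) / (2 * h)

def frobenius_coefficient(P, Q, R, point):
    """Coefficient of dx^dy^dz in omega ^ d(omega) at the given point."""
    x, y, z = 0, 1, 2
    curl = (
        partial(R, point, y) - partial(Q, point, z),
        partial(P, point, z) - partial(R, point, x),
        partial(Q, point, x) - partial(P, point, y),
    )
    omega = (P(*point), Q(*point), R(*point))
    return sum(w * c for w, c in zip(omega, curl))

point = (0.7, -0.4, 1.3)

# Assumed nonintegrable example: omega = x dy + dz.
nonintegrable = frobenius_coefficient(
    lambda x, y, z: 0.0, lambda x, y, z: x, lambda x, y, z: 1.0, point)

# Integrable example: omega = y dx + x dy + dz = d(xy + z).
integrable = frobenius_coefficient(
    lambda x, y, z: y, lambda x, y, z: x, lambda x, y, z: 1.0, point)

print(nonintegrable, integrable)
```

A nonzero coefficient at a point means that no integral surface passes through a neighborhood of that point; a coefficient vanishing identically is the Frobenius condition.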

C Nondegenerate fields of hyperplanes

Definition. A field of hyperplanes is said to be nondegenerate at a point if the rank of the 2-form dω|_{ω = 0} in the plane of the field passing through this point is equal to the dimension of the plane.

This means that for any nonzero vector in our plane, we can find another vector in the plane such that the value of the 2-form on this pair of vectors is not zero.

Definition. A field of planes is called nondegenerate on a manifold if it is nondegenerate at every point of the manifold.

Note that on an even-dimensional manifold there cannot be a nondegenerate field of hyperplanes; on such a manifold a hyperplane is odd-dimensional, and the rank of every skew-symmetric bilinear form on an odd-dimensional space is less than the dimension of the space (cf. Section 44).

Nondegenerate fields of hyperplanes do exist on odd-dimensional manifolds.

Example. Consider a euclidean space of dimension 2m + 1 with coordinates x, y, and z (where x and y are vectors in an m-dimensional space and z is a number). The 1-form

ω = x dy + dz, where x dy = x₁ dy₁ + ... + x_m dy_m,

defines a field of hyperplanes. The plane of the field passing through the origin has equation dz = 0. We take x and y as coordinates in this hyperplane. Then, in this plane of the field, our 2-form can be written in the form

dω|_{ω = 0} = dx ∧ dy = dx₁ ∧ dy₁ + ... + dx_m ∧ dy_m.

The rank of this form is 2m, so our field is nondegenerate at the origin, and thus also in a neighborhood of the origin (in fact, this field of planes is nondegenerate at all points of the space).

Now, finally, we can give the definition of a contact structure on a manifold: a contact structure on a manifold is a nondegenerate field of tangent hyperplanes.

D The manifold of contact elements

The term “contact structure” stems from the fact that there is always such a structure on a manifold of contact elements of a smooth n-manifold.

Definition. A hyperplane (dimension n − 1) tangent to a manifold at some point is called a contact element, and this point the point of contact.

The set of all contact elements of an n-dimensional manifold has the structure of a smooth manifold of dimension 2n − 1.

In fact, the set of contact elements with a fixed point of contact is the set of all (n − 1)-dimensional subspaces of an n-dimensional vector space, i.e., a projective space of dimension n − 1. To give a contact element we must therefore give the n coordinates of the point of contact together with the n − 1 coordinates defining a point of an (n − 1)-dimensional projective space—2n − 1 coordinates in all.

The manifold of all contact elements of an n-dimensional manifold is a fiber bundle whose base is our manifold and whose fiber is (n − 1)-dimensional projective space.

Theorem. The bundle of contact elements is the projectivization of the cotangent bundle: it can be obtained from the cotangent bundle by changing every cotangent n-dimensional vector space into an (n − 1)-dimensional projective space (a point of which is a line passing through the origin in the cotangent space).

Proof. A contact element is given by a 1-form on the tangent space, for which this element is a zero level set. This form is not zero, and it is determined up to multiplication by a nonzero number. But a form on the tangent space is a vector of the cotangent space. Therefore, a nonzero form on the tangent space, determined up to a multiplication by a nonzero number, is a nonzero vector of the cotangent space, determined up to a multiplication by a nonzero number, i.e., a point of the projectivized cotangent space. □

The contact structure on the manifold of contact elements. In the tangent space to the manifold of contact elements there is a distinguished hyperplane. It is called the contact hyperplane and is defined in the following way.

We fix a point of the (2n − 1)-dimensional manifold of contact elements on an n-dimensional manifold. We can think of this point as an (n − 1)-dimensional plane tangent to the original n-dimensional manifold.

Definition. A tangent vector to the manifold of contact elements at a fixed point belongs to the contact hyperplane if its projection onto the n-dimensional manifold lies in the (n − 1)-dimensional plane which is the given point of the manifold of contact elements.

In other words, a displacement of a contact element is tangent to the contact hyperplane if the velocity of the point of contact belongs to this contact element, no matter how the element turns.

Example. We take some submanifold of our n-dimensional manifold and consider all (n − 1)-dimensional planes tangent to it (i.e., contact elements). The set of all such contact elements forms a smooth submanifold of the (2n − 1)-dimensional manifold of all contact elements. The dimension of this submanifold is equal to n − 1, no matter what the dimension of the original submanifold (which could be (n − 1)-dimensional, or have smaller dimension, down to a curve or even a point).

This (n − 1)-dimensional submanifold of the (2n − 1)-dimensional manifold of all contact elements is tangent at each of its points to the field of contact hyperplanes (by the definition of contact hyperplane). Thus the field of (2n − 2)-dimensional contact hyperplanes has an (n − 1)-dimensional integral manifold.

Problem. Does this field of planes have integral manifolds of higher dimensions?

Answer. No.

Problem. Is it possible to give the field of contact hyperplanes by a differential 1-form on the manifold of all contact elements?

Answer. No, even if the underlying n-dimensional manifold is a euclidean space (for example, the ordinary two-plane).

We will show below that the field of contact hyperplanes on the (2n − 1)-dimensional manifold of all contact elements of an n-dimensional manifold is nondegenerate. The proof uses the symplectic structure of the cotangent bundle. The manifold of contact elements is related by a simple construction to the space of the cotangent bundle (the projectivization of which is the manifold of contact elements). Moreover, the nondegeneracy of the field of contact planes of the projectivized bundle is closely related to the nondegeneracy of the 2-form giving the symplectic structure of the cotangent bundle.

The construction we are concerned with will be carried out below in a somewhat more general situation. Namely, for any odd-dimensional manifold with a contact structure we can construct its “symplectification”—a symplectic manifold whose dimension is one larger. The inter-relation between these two manifolds—the odd-dimensional contact manifold and the even-dimensional symplectic manifold—is the same as between the manifold of contact elements with its contact structure and the cotangent bundle with its symplectic structure.

E Symplectification of a contact manifold

Consider an arbitrary contact manifold, i.e., a manifold of odd dimension N with a nondegenerate field of tangent hyperplanes (of even dimension N − 1). We will call these planes contact planes. Every contact plane is tangent to the contact manifold at one point. We will call this point the point of contact.

Definition. A contact form is a linear form on the tangent space at the point of contact of the manifold such that its zero set is the contact plane.

It should be emphasized that the contact form is not a differential form but an algebraic linear form on one tangent space.

Definition. The symplectification of a contact manifold is the set of all contact forms on the contact manifold, provided with the structure of a symplectic manifold as defined below.

We note first of all that the set of all contact forms on a contact manifold has a natural structure of a smooth manifold of even dimension N + 1. Namely, we can consider the set of all contact forms as the space of a bundle over the original contact manifold. Projection onto the base is the mapping associating the contact form to the point of contact.

The fiber of this bundle is the set of contact forms with a common point of contact. All such forms are obtained from one another by multiplication by a nonzero number (so that they determine the same contact plane). Thus the fiber of our bundle is one-dimensional: it is the line minus a point.

We also note that the group of nonzero real numbers acts on the manifold of all contact forms by the operation of multiplication, i.e., the product of a contact form and a nonzero number is again a contact form. In this way the group acts on our bundle, leaving every fiber fixed (upon multiplication of a form by a number the point of contact is not changed).

Remark. So far we have not used the nondegeneracy of the field of planes. Nondegeneracy is needed only to ensure that the manifold obtained by symplectification is symplectic.

Example. Consider the manifold (of dimension 2n − 1) of all contact elements of an n-dimensional smooth manifold. On the manifold of elements there is a field of hyperplanes (which we defined above and called the contact hyperplanes). Therefore, we can symplectify the manifold of contact elements.

As a result of symplectification we obtain a 2n-dimensional manifold. This manifold is the space of the cotangent bundle of the original n-dimensional manifold without zero vectors. The action by the multiplicative group of real numbers on the fiber reduces to multiplication of vectors of the cotangent space by a number.

On the cotangent bundle there is a distinguished 1-form “p dq.” There is an analogous 1-form on any manifold obtained by symplectification from a contact manifold.

The canonical 1-form on the symplectified space

Definition. The canonical 1-form in the symplectified space of a contact manifold is the differential 1-form α defined as follows. Let ξ be a vector tangent to the symplectified space at some point p (Figure 237); recall that the point p is itself a 1-form on the tangent space to the contact manifold. The value of α on ξ is equal to the value of the 1-form p on the projection of ξ onto this tangent space:

α(ξ) = p(π_*ξ),

where π is the projection of the symplectified space onto the contact manifold.

Figure 237 Symplectification of a contact manifold

Theorem. The exterior derivative of the canonical 1-form on the symplectified space of a contact manifold is a nondegenerate 2-form.

Corollary. The symplectified space of a contact manifold has a symplectic structure which is canonically (i.e., uniquely, without arbitrariness) determined by the contact structure of the underlying odd-dimensional manifold.

Proof of Theorem. Since the assertions of the theorem are local, it is sufficient to prove it in a small neighborhood of a point of the manifold. In a small neighborhood of a point on a contact manifold, a field of contact planes can be given by a differential form ω on the contact manifold. We fix such a 1-form ω.

By the same token we can represent the symplectified space of the contact manifold over our neighborhood as the direct product of the neighborhood and the line minus a point. Namely, we associate to the pair (x, λ)—where x is a point of the contact manifold and λ is a nonzero number—the contact form given by the differential 1-form λω on the tangent space at the point x. Thus in the part of the symplectified space we are considering, we have defined a function λ whose values are nonzero numbers. It should be emphasized that λ is only a local coordinate on the symplectified manifold and that this coordinate is not defined canonically; it depends on the choice of differential 1-form ω. The canonical 1-form α can be written in our notation as

α = λ π^*ω,

and does not depend on the choice of ω. The exterior derivative of the 1-form α thus has the form

dα = dλ ∧ π^*ω + λ π^*dω.

We will show that the 2-form dα is nondegenerate, i.e., that for any nonzero vector ξ tangent to the symplectification, we can find a vector η such that dα(ξ, η) ≠ 0. Among the vectors tangent to the symplectification we distinguish the following types. We call a vector ξ vertical if it is tangent to the fiber, i.e., if π_*ξ = 0. We call the vector ξ horizontal if it is tangent to a level surface of the function λ, i.e., if dλ(ξ) = 0. We call the vector ξ a contact vector if its projection onto the contact manifold lies in the contact plane, i.e., if ω(π_*ξ) = 0 (in other words, if α(ξ) = 0).

We calculate the value of the form dα on a pair of vectors (ξ, η):

dα(ξ, η) = (dλ ∧ π^*ω)(ξ, η) + λ dω(π_*ξ, π_*η).

Assume that ξ is not a contact vector. For η, take a nonzero vertical vector, so that π_*η = 0. Then the second term is equal to zero, and the first term is equal to

−dλ(η) ω(π_*ξ),

which is not zero since η is a nonzero vertical vector and ξ is not a contact vector. Thus if ξ is not a contact vector, we have found an η for which dα(ξ, η) ≠ 0.

Now assume that ξ is a contact vector and not vertical. Then for η we take any contact vector. Now the first term is entirely zero, and the second (and therefore the sum) is reduced to λ dω(π_*ξ, π_*η). Since ξ is not vertical, the vector π_*ξ lying in the contact plane is not zero. But the 2-form dω is nondegenerate on the contact plane (by the definition of contact structure). Thus there is a contact vector η such that dω(π_*ξ, π_*η) ≠ 0. Since λ ≠ 0, we have found a vector η for which dα(ξ, η) ≠ 0.

Finally, if the vector ξ is nonzero and vertical, then for η we can take any vector which is not a contact vector. □
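
The nondegeneracy of dα can be illustrated in the lowest-dimensional case. The following is a minimal sketch, assuming a three-dimensional manifold with coordinates (x, y, z), a form ω = P dx + Q dy + R dz with dω = c_xy dx ∧ dy + c_xz dx ∧ dz + c_yz dy ∧ dz, and the local coordinate λ on the fiber; it builds the matrix of dα = dλ ∧ ω + λ dω in the basis (∂/∂x, ∂/∂y, ∂/∂z, ∂/∂λ) and evaluates its Pfaffian, whose square is the determinant. All names and sample values are illustrative.

```python
# Sketch: matrix of d(alpha) = d(lambda) ^ omega + lambda * d(omega)
# in the coordinate basis (d/dx, d/dy, d/dz, d/dlambda), at one point.

def d_alpha_matrix(omega, d_omega, lam):
    """Skew-symmetric 4x4 matrix with entries d(alpha)(e_i, e_j).

    omega   = (P, Q, R): coefficients of omega at the point.
    d_omega = (c_xy, c_xz, c_yz): coefficients of d(omega) at the point.
    lam     = value of the fiber coordinate lambda.
    """
    P, Q, R = omega
    c_xy, c_xz, c_yz = d_omega
    return [
        [0.0,         lam * c_xy,  lam * c_xz, -P],
        [-lam * c_xy, 0.0,         lam * c_yz, -Q],
        [-lam * c_xz, -lam * c_yz, 0.0,        -R],
        [P,           Q,           R,          0.0],
    ]

def pfaffian4(a):
    """Pfaffian of a skew-symmetric 4x4 matrix; its square is the determinant."""
    return a[0][1] * a[2][3] - a[0][2] * a[1][3] + a[0][3] * a[1][2]

# Assumed contact example omega = x dy + dz at a point with x = 2, lambda = 3.
contact = pfaffian4(d_alpha_matrix((0.0, 2.0, 1.0), (1.0, 0.0, 0.0), 3.0))

# Integrable (degenerate) field omega = dz, for which d(omega) = 0.
degenerate = pfaffian4(d_alpha_matrix((0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 3.0))

print(contact, degenerate)
```

In this setting the Pfaffian works out to −λ times the coefficient of dx ∧ dy ∧ dz in ω ∧ dω, so the matrix is invertible exactly when λ ≠ 0 and the field is nonintegrable at the point.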

Remark. The constructions of the 1-form α and the 2-form dα are valid for an arbitrary manifold with a field of hyperplanes, and do not depend on the condition of nondegeneracy. However, the 2-form dα will define a symplectic structure only in the case when the field of planes is nondegenerate.

Proof. Assume that the field is degenerate, i.e., that there exists a nonzero vector ξ′ in a plane of the field such that dω(ξ′, η′) = 0 for all vectors η′ in this plane. For such a ξ′, the quantity dω(ξ′, η′) as a function of η′ is a linear form, identically equal to zero on the plane of the field. Therefore there is a number μ not dependent on η′ such that

dω(ξ′, η′) = μω(η′)

for all vectors η′ of the tangent space.

We now take for ξ a tangent vector to the symplectified manifold for which π_*ξ = ξ′. Such a vector ξ is determined up to addition of a vertical summand, and we will show that for a suitable choice of this summand we will have

dα(ξ, η) = 0 for all vectors η.

The first term of the formula for dα is equal to dλ(ξ)ω(π_*η) (since ω(π_*ξ) = 0). The second term is equal to λ dω(π_*ξ, π_*η) = λμω(π_*η). We choose the vertical component of the vector so that dλ(ξ) = −λμ. Then ξ will be skew-orthogonal to all vectors η.

Thus if dα is a symplectic structure, then the underlying field of hyperplanes is a contact structure. □

Corollary. The field of contact hyperplanes defines a contact structure on the manifold of all contact elements of any smooth manifold.

Proof. The symplectification of the (2n − 1)-dimensional manifold of all contact elements on an n-dimensional smooth manifold, constructed with the help of the field of (2n − 2)-dimensional contact planes, is by construction the space of the cotangent bundle of the underlying n-dimensional manifold without the zero cotangent vectors. The canonical 1-form α on the symplectification is, by its definition, the same 1-form on the cotangent bundle that we called “p dq” and which is fundamental in hamiltonian mechanics (cf. Section 37). Its derivative dα is therefore the form “dp ∧ dq” defining the usual symplectic structure of a phase space. Therefore the form dα is nondegenerate, and, by the preceding remark, the field of contact hyperplanes is nondegenerate. □

F Contact diffeomorphisms and vector fields

Definition. A diffeomorphism of a contact manifold to itself is called a contact diffeomorphism if it preserves the contact structure, i.e., carries every plane of the given field of hyperplanes to a plane of the same field.

Example. Consider the (2n − 1)-dimensional manifold of contact elements of an n-dimensional smooth manifold with its usual contact structure. To each contact element we can ascribe a “positive side” by choosing one of the halves into which this element divides the tangent space to the n-dimensional manifold.

We will call a contact element with a chosen side a (transversally) oriented contact element.

The oriented contact elements on our n-dimensional manifold form a (2n − 1)-dimensional smooth manifold with a natural contact structure (it is a double covering of the manifold of ordinary nonoriented contact elements).

Now assume that we are given a riemannian metric on the underlying n-dimensional manifold. Then there is a “geodesic flow”¹⁰⁰ on the manifold of oriented contact elements. The transformation after time t by this flow is defined as follows. We go out from the point of contact of a contact element along the geodesic orthogonal to it and directed to the side orienting the element. In the course of time t we will move the point of contact along the geodesic, keeping the element orthogonal to the geodesic. After time t we obtain a new oriented element. We have defined the geodesic flow of oriented contact elements.

Theorem. The geodesic flow of oriented contact elements consists of contact diffeomorphisms.

The proof of this theorem will not be presented since it is just a reformulation in new terms of Huygens’ principle (cf. Section 46).

Definition. A vector field on a contact manifold is called a contact vector field if it is the velocity field of a one-parameter (local) group of contact diffeomorphisms.

Theorem. The Poisson bracket of contact vector fields is a contact vector field. The contact vector fields form a subalgebra in the Lie algebra of all smooth vector fields on a contact manifold.

The proof follows directly from the definitions.

G Symplectification of contact diffeomorphisms and fields

For every contact diffeomorphism of a contact manifold there is a canonically constructed symplectic diffeomorphism of its symplectification. This symplectic diffeomorphism commutes with the action of the multiplicative group of real numbers on the symplectified manifold and is defined by the following construction.

Recall that a point of the symplectified manifold is a contact form on the underlying contact manifold.

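Definition. The image of a contact form p with point of contact x under the action of a contact diffeomorphism f of the contact manifold to itself is the form

f_!p = (f*_{f(x)})⁻¹p,

where f*_{f(x)} is the map from the cotangent space at f(x) to the cotangent space at x dual to the derivative of f at x.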

In simple terms, we carry the form p from the tangent space at the point x to the tangent space at f(x) using the diffeomorphism f (whose derivative at x determines an isomorphism between these two tangent spaces). The form f_!p is a contact form since the diffeomorphism f is a contact diffeomorphism.

Theorem. The mapping f_! defined above of the symplectification of a contact manifold to itself is a symplectic diffeomorphism which commutes with the action of the multiplicative group of real numbers and preserves the canonical 1-form on the symplectification.

Proof. The assertion of the theorem follows from the fact that the canonical 1-form, the symplectic 2-form, and the action of the group of real numbers are all determined by the contact structure itself (for their construction we did not use coordinates or any other noninvariant tools), and the diffeomorphism f preserves the contact structure. It follows from this that f_! preserves all that which was invariantly constructed using the contact structure, in particular the 1-form α, its derivative dα, and the action of the group.

Theorem. Every symplectic diffeomorphism of the symplectification of a contact manifold which commutes with the action of the multiplicative group (1) projects onto the underlying contact manifold as a contact diffeomorphism and (2) preserves the canonical 1-form α.

Proof. Every diffeomorphism which commutes with the action of the multiplicative group projects onto some diffeomorphism of the contact manifold. To show that this is a contact diffeomorphism it is sufficient to prove the second assertion of the theorem (since only those vectors for which α(ξ) = 0 project onto the contact plane).

To prove the second assertion we express the integral of the form α along any path γ in terms of the symplectic structure dα:
∫_γ α = lim_{ε→0} ∬_{σ(ε)} dα,
where the 2-chain σ(ε) is obtained from γ by multiplication by all numbers in the interval [ε, 1]. The boundary of σ contains, besides γ, two vertical intervals and the path εγ. The integrals of α over the vertical intervals are equal to zero, and the integral over εγ approaches 0 as ε does.

Now from the invariance of the 2-form dα and the commutativity of our diffeomorphism F with multiplication by numbers it follows that for any path γ
∫_{Fγ} α = ∫_γ α,
and thus the diffeomorphism F preserves the 1-form α. ☐

Definition. The symplectification of a contact vector field is defined by the following construction. Consider the field as a velocity field of a one-parameter group of contact diffeomorphisms. Symplectify the diffeomorphisms. Consider the velocity field of this group. It is called the symplectification of the original field.

Theorem. The symplectification of a contact vector field is a hamiltonian vector field. The hamiltonian can be chosen to be homogeneous of first order with respect to the action of the multiplicative group of real numbers:

H(λx) = λH(x).

Conversely, every hamiltonian field on a symplectified contact manifold, having a hamiltonian which is homogeneous of degree 1, projects onto the underlying contact manifold as a contact vector field.

Proof. The fact that symplectifications of contact diffeomorphisms are symplectic implies that the symplectification of a contact field is hamiltonian. The homogeneity of the hamiltonian follows from the homogeneity of symplectic diffeomorphisms (from commutativity with multiplication by λ). Thus the first assertion of the theorem follows from the theorem on symplectifications of contact diffeomorphisms. The second part follows in the same way from the theorem on homogeneous symplectic diffeomorphisms. ☐

Corollary. Symplectification of vector fields is an isomorphic map of the Lie algebra of contact vector fields onto the Lie algebra of all locally hamiltonian vector fields with hamiltonians which are homogeneous of degree 1.

The proof is clear.

H Darboux’s theorem for contact structures

Darboux’s theorem is a theorem on the local uniqueness of a contact structure. It can be formulated in any of the following three ways.

Theorem. All contact manifolds of the same dimension are locally contact diffeomorphic (i.e., there is a diffeomorphism of a sufficiently small neighborhood of any point of one contact manifold onto a neighborhood of any point of the other which carries the noted point of the first neighborhood to the noted point of the second and the field of planes in the first neighborhood to the field of planes in the second).

Theorem. Every contact manifold of dimension 2m − 1 is locally contact diffeomorphic to the manifold of contact elements of m-dimensional space.

Theorem. Every differential 1-form defining a nondegenerate field of hyperplanes on a manifold of dimension 2n + 1 can be written in some local coordinate system in the “normal form”

ω = x dy + dz,

where x = (x₁,..., x_n), y = (y₁,..., y_n) and z are the local coordinates.

It is clear that the first two theorems follow from the third. We will deduce the third one from an analogous theorem of Darboux on the normal form of the 2-form giving a symplectic structure (cf. Section 43).

Proof of Darboux’s theorem. We symplectify our manifold. On this new (2n + 2)-dimensional symplectic manifold there are a canonical 1-form α, a nondegenerate 2-form dα, a projection π onto the underlying contact manifold and a vertical direction at every point.

The given differential 1-form ω on the contact manifold defines a contact form at every point. These contact forms form a (2n + 1)-dimensional submanifold of the symplectic manifold. The projection π maps this submanifold diffeomorphically onto the underlying contact manifold, and the verticals intersect this submanifold at a nonzero angle.

Consider a point in the surface just constructed (in the symplectic manifold) lying over the point of the contact manifold we are interested in. In the symplectic manifold we can choose a local system of coordinates near this point such that
dα = dp₀ ∧ dq₀ + ⋯ + dp_n ∧ dq_n
and such that the coordinate surface p₀ = 0 coincides with our (2n + 1)-dimensional manifold (cf. Section 43, where in the proof of the symplectic Darboux’s theorem the first coordinate may be chosen arbitrarily).

We note now that the 1-form p₀ dq₀ + ⋯ + p_n dq_n has derivative dα. Thus, locally,
α = p₀ dq₀ + ⋯ + p_n dq_n + dw,
where w is a function which can be taken to be zero at the origin. In particular, on the surface p₀ = 0 the form α takes the form
α = p₁ dq₁ + ⋯ + p_n dq_n + dw.

The projection π allows us to carry the coordinates p₁,..., p_n; q₀; q₁,..., q_n and the function w onto the contact manifold. More precisely, we define functions x, y, and z by the formulas
x_i(πA) = p_i(A), y_i(πA) = q_i(A), z(πA) = w(A),
where A is a point on the surface p₀ = 0.

Then we obtain
ω = x dy + dz,
and it remains only to verify that the functions (x₁,..., x_n; y₁,..., y_n; z) form a coordinate system. For this it is sufficient to verify that the partial derivative of w with respect to q₀ is not zero, or in other words that the 1-form α is not zero on a vector of the coordinate direction q₀. The latter is equivalent to the 2-form dα being nonzero on the pair of vectors: the basic vector in the direction of q₀ and the vertical vector.

But a vector in the coordinate direction q₀ is skew-orthogonal to all vectors of the coordinate plane p₀ = 0. If it was also skew-orthogonal to the vertical vector, then it would be skew-orthogonal to all vectors, which contradicts the nondegeneracy of dα. Thus ∂w/∂q₀ ≠ 0, and the theorem is proved. ☐

I Contact hamiltonians

Suppose that the contact structure of a contact manifold is given by a differential 1-form ω, and that this form is fixed.

Definition. The ω-embedding of the contact manifold into its symplectification is the map associating to a point of the contact manifold the restriction of the form ω on the tangent plane at this point.

Definition. The contact hamiltonian function of a contact vector field on a contact manifold with fixed 1-form ω is the function K on the contact manifold whose value at each point is the value of the homogeneous hamiltonian H of the symplectification of the field on the image of the given point under the ω-embedding:

K(x) = H(ω|_x).

Theorem. The contact hamiltonian function K of a contact vector field X on a contact manifold with a given 1-form ω is equal to the value of the form ω on this contact field:

K = ω(X).

Proof. We use the expression for the increment of the ordinary hamiltonian function over a path in terms of the vector field and the symplectic structure (Section 48C). For this we draw a vertical interval {λB}, 0 < λ ≤ 1, through the point B of the symplectification at which we want to calculate the hamiltonian function. The translations of this interval over small time τ under the action of the symplectified flow defined by our field X fill out a two-dimensional region σ(τ). The value of the hamiltonian at the point B is equal to the limit
H(B) = lim_{τ→0} (1/τ) ∬_{σ(τ)} dα,
since H(λB) → 0 as λ → 0. But the integral of the form dα over the region is the integral of the 1-form α along the edge formed by the trajectory of the point B (the other parts of the boundary give zero integrals). Therefore, the double integral is simply the integral of the 1-form α along the interval of trajectories, and the limit is the value of α on the velocity vector Y of the symplectified field. Thus K(πB) = H(B) = α(Y) = ω(X), as was to be shown. ☐

J Computational formulas

Suppose now that we make use of the coordinates in Darboux’s theorem in which the form ω has the normal form

ω = x dy + dz.

Problem. Find the components of the contact field with a given contact hamiltonian function K = K(x, y, z).

Answer. The equations of the contact flow have the form

dx/dt = −K_y + xK_z, dy/dt = K_x, dz/dt = K − xK_x.

Solution. A point of the symplectification can be given by the 2n + 2 numbers x_i, y_i, z, and λ, where (x, y, z) are the coordinates of a point of the contact manifold and λ is the number by which we must multiply ω to obtain the given point of the symplectified space.

In these coordinates α = λx dy + λ dz. Therefore, in the coordinate system p, q, where
p = λx, p₀ = λ; q = y, q₀ = z,
the form α takes the standard form:
α = p dq + p₀ dq₀.
The action T_μ of the multiplicative group is now reduced to multiplication of p by a number:
T_μ(p, p₀, q, q₀) = (μp, μp₀, q, q₀).
The contact hamiltonian K can be expressed in terms of the ordinary hamiltonian H = H(p, q, p₀, q₀) by the formula
K(x, y, z) = H(p = x, q = y, p₀ = 1, q₀ = z).
The function H is homogeneous of degree 1 in p. Therefore, the partial derivatives of K at the point (x, y, z) are related to the derivatives of H at the point (p = x, p₀ = 1, q = y, q₀ = z) by the relations
H_p = K_x, H_{p₀} = K − xK_x, H_q = K_y, H_{q₀} = K_z.
Hamilton’s equations with hamiltonian function H therefore have the following form at the point under consideration:
dp/dt = −K_y, dp₀/dt = −K_z, dq/dt = K_x, dq₀/dt = K − xK_x,
from which we obtain the answer above (since x = p/p₀, at p₀ = 1 we have dx/dt = dp/dt − x dp₀/dt).
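
The theorem K = ω(X) of the preceding subsection gives a quick numerical check on these formulas. The sketch that follows assumes n = 1 and the contact flow in the form dx/dt = −K_y + xK_z, dy/dt = K_x, dz/dt = K − xK_x; the function names and the test hamiltonian K_example are hypothetical and chosen only for illustration.

```python
def contact_field(K, x, y, z, h=1e-6):
    """Components (dx/dt, dy/dt, dz/dt) of the contact field with hamiltonian K, for n = 1.

    Partial derivatives of K are approximated by central differences.
    """
    K_x = (K(x + h, y, z) - K(x - h, y, z)) / (2 * h)
    K_y = (K(x, y + h, z) - K(x, y - h, z)) / (2 * h)
    K_z = (K(x, y, z + h) - K(x, y, z - h)) / (2 * h)
    x_dot = -K_y + x * K_z
    y_dot = K_x
    z_dot = K(x, y, z) - x * K_x
    return x_dot, y_dot, z_dot


def omega(x, velocity):
    """Value of the form x dy + dz on a tangent vector at a point with first coordinate x."""
    _, y_dot, z_dot = velocity
    return x * y_dot + z_dot


def K_example(x, y, z):
    # Hypothetical contact hamiltonian, used only as test data.
    return x * x * y + 3.0 * z + x * z


point = (0.7, -1.2, 0.4)
field = contact_field(K_example, *point)
# The theorem K = ω(X) predicts that this residual vanishes up to rounding.
residual = omega(point[0], field) - K_example(*point)
```

The check is deliberately modest: it confirms only that the form ω takes the value K on the computed field, not that the field preserves the contact structure.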

Problem. Find the contact hamiltonian of the Poisson bracket of two contact fields with contact hamiltonians K and K′.

Answer. (K, K′) + K_z EK′ − K′_z EK, where the brackets denote Poisson bracket in the variables x and y and E is the Euler operator EF = F − xF_x.

Solution. In the notation of the solution of the preceding problem we must express the ordinary Poisson bracket of the homogeneous hamiltonians H and H′ at the point (p = x, p₀ = 1, q = y, q₀ = z) in terms of the contact hamiltonians K and K′. We have
(H, H′) = H_q H′_p − H_p H′_q + H_{q₀} H′_{p₀} − H_{p₀} H′_{q₀}.
Substituting the values of the derivatives from the preceding problem, we find at the point under consideration
(H, H′) = K_y K′_x − K_x K′_y + K_z EK′ − K′_z EK = (K, K′) + K_z EK′ − K′_z EK.

K Legendre manifolds

The lagrangian submanifolds of a symplectic phase space correspond in the contact case to an interesting class of manifolds which may be called Legendre manifolds since they are closely related to Legendre transformations.

Definition. A Legendre submanifold of a (2n + 1)-dimensional contact manifold is an n-dimensional integral manifold of the field of contact planes.

In other words, it is an integral manifold of the highest possible dimension for a nondegenerate field of planes.

Example 1. The set of all contact elements tangent to a submanifold of any dimension in an m-dimensional manifold is an (m − 1)-dimensional Legendre submanifold of the (2m − 1)-dimensional contact manifold of all contact elements.

Example 2. The set of all planes tangent to the graph of a function f = φ(x) in an (n + 1)-dimensional euclidean space with coordinates (x₁,..., x_n; f) is a Legendre submanifold of the (2n + 1)-dimensional space of all non-vertical hyperplane elements in the space of the graph (the contact structure is given by the 1-form

p dx − df = p₁ dx₁ + ⋯ + p_n dx_n − df;

the element with coordinates (p, x, f) passes through the point with coordinates (x, f) parallel to the plane f = p₁x₁ + ⋯ + p_nx_n).

The Legendre transformation can be described in these terms in the following way.

Consider a second (2n + 1)-dimensional contact space with coordinates (P, X, F) and contact structure given by the form

P dX − dF.

The Legendre involution is the map taking a point of the first space with coordinates (p, x, f) to the point of the second space with coordinates

P = x, X = p, F = px − f.

The Legendre involution, as can be easily calculated, carries the first contact structure to the second. Clearly, we have

Theorem. A diffeomorphism of one contact manifold onto another which carries contact planes to contact planes, carries every Legendre manifold to a Legendre manifold.

In particular, under the action of the Legendre involution the Legendre manifold of plane elements tangent to the graph of a function is carried into a new Legendre manifold. This new manifold is called the Legendre transform of the original manifold.

The projection of the new manifold onto the space with coordinates (X, F) (parallel to the P-direction) is in general not a smooth manifold, but has singularities. This projection is called the Legendre transform of the graph of the function φ.

If the function φ is convex, then the projection is itself the graph of a function F = Φ(X). In this case Φ is called the Legendre transform of the function φ.
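
For a convex function of one variable the construction can be followed point by point. The sketch that follows assumes that the Legendre involution has the usual form P = x, X = p, F = px − f, and takes φ(x) = x² as a hypothetical example; the expected transform Φ(X) = X²/4 is a standard fact and is not taken from the text.

```python
def legendre_image(phi, dphi, x):
    """Projection (X, F) of the Legendre image of the tangent element at (x, phi(x)), n = 1.

    The element has slope p = phi'(x); the involution sends it to X = p, F = p*x - f.
    """
    p = dphi(x)
    f = phi(x)
    return p, p * x - f


def phi(x):
    # Hypothetical convex function used as test data.
    return x * x


def dphi(x):
    return 2.0 * x


# Sample the graph of phi and project the transformed elements to the (X, F)-plane.
points = [legendre_image(phi, dphi, 0.25 * k) for k in range(-8, 9)]

# For phi(x) = x^2 the projected points should lie on the graph F = X^2 / 4.
max_error = max(abs(F - X * X / 4.0) for X, F in points)
```

For a nonconvex φ the same loop still produces points (X, F), but they no longer lie on the graph of a single-valued function, which is the singular case described above.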

As another example we consider the motion of oriented contact elements under the action of the geodesic flow on a riemannian manifold. As the “initial wave front” we take some smooth submanifold of our riemannian manifold (the dimension of the submanifold is arbitrary). The oriented contact elements tangent to this submanifold form a Legendre manifold in the space of all contact elements. From the preceding theorem we obtain

Corollary. The family of all elements tangent to a wave front is transformed under the action of the geodesic flow after time t to a Legendre manifold of the space of all contact elements.

It should be noted that this new Legendre manifold may not be the family of all elements tangent to some smooth manifold, since a wave front may develop singularities.

The Legendre singularities which arise in this way can be described in a manner similar to lagrangian singularities (cf. Appendix 12). A Legendre fibration of a (2n + 1)-dimensional contact manifold is a fibration all of whose fibers are n-dimensional Legendre manifolds. A Legendre singularity is a singularity of the projection of an n-dimensional Legendre submanifold of a (2n + 1)-dimensional contact manifold onto the (n + 1)-dimensional base of the Legendre fibration.

Consider the space ℝ^{2n + 1} with contact structure given by the form α = x dy + dz, where x = (x₁,..., x_n) and y = (y₁,..., y_n). The projection (x, y, z) → (y, z) gives a Legendre fibration.

An equivalence of Legendre fibrations is a diffeomorphism of the total spaces of the fibrations carrying the contact structure and fibers of the first bundle to the contact structure and fibers of the second bundle. It can be shown that every Legendre bundle is equivalent to the special bundle just described in a neighborhood of every point of the space of the bundle.

The contact structure of the total space of fibration gives the fibers a local structure of a projective space. Legendre equivalence preserves this structure, i.e., defines locally projective fiber transformations.

The following theorem allows us to locally describe Legendre submanifolds and maps by using generating functions.

Theorem. For any partition I + J of the set of indices (1,..., n) into two disjoint subsets and for any function S(x_I, y_J) of n variables x_i, i ∈ I, and y_j, j ∈ J, the formulas

y_I = ∂S/∂x_I, x_J = −∂S/∂y_J, z = S − x_I ∂S/∂x_I

define a Legendre submanifold of ℝ^{2n + 1}. Conversely, every Legendre submanifold of ℝ^{2n + 1} is defined in a neighborhood of every point by these formulas for at least one of the 2ⁿ possible choices of the subset I.

The proof is based on the fact that, on a Legendre manifold, dz + x dy = 0, so d(z + x_I y_I) = y_I dx_I − x_J dy_J. ☐
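
The identity used in the proof can be tested numerically. The sketch that follows takes n = 2, I = {1}, J = {2}, and assumes the formulas y₁ = ∂S/∂x₁, x₂ = −∂S/∂y₂, z = S − x₁ ∂S/∂x₁; it then evaluates the form x dy + dz on the two tangent vectors of the resulting surface. The generating function S = x₁⁴ + x₁²y₂ is a hypothetical choice made for illustration.

```python
def legendre_point(S, S_x1, S_y2, x1, y2):
    """Point (x1, x2, y1, y2, z) defined by the generating function S(x1, y2)."""
    y1 = S_x1(x1, y2)
    x2 = -S_y2(x1, y2)
    z = S(x1, y2) - x1 * y1
    return x1, x2, y1, y2, z


def contact_form_on_tangents(S, S_x1, S_y2, x1, y2, h=1e-5):
    """Values of x dy + dz on the tangent vectors d/dx1 and d/dy2 (central differences)."""
    base = legendre_point(S, S_x1, S_y2, x1, y2)
    values = []
    for d1, d2 in ((h, 0.0), (0.0, h)):
        plus = legendre_point(S, S_x1, S_y2, x1 + d1, y2 + d2)
        minus = legendre_point(S, S_x1, S_y2, x1 - d1, y2 - d2)
        dy1 = (plus[2] - minus[2]) / (2 * h)
        dy2 = (plus[3] - minus[3]) / (2 * h)
        dz = (plus[4] - minus[4]) / (2 * h)
        values.append(base[0] * dy1 + base[1] * dy2 + dz)
    return values


# Hypothetical generating function and its exact partial derivatives.
def S(x1, y2):
    return x1 ** 4 + x1 ** 2 * y2


def S_x1(x1, y2):
    return 4 * x1 ** 3 + 2 * x1 * y2


def S_y2(x1, y2):
    return x1 ** 2


# Both values should vanish up to discretization error if the surface is Legendre.
residuals = contact_form_on_tangents(S, S_x1, S_y2, 0.5, -0.3)
```

Any other smooth S with its partial derivatives can be substituted; the residuals should stay at the level of the finite-difference error.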

In the formulas of the preceding theorem, we replace S by a function from the list of the simple lagrangian singularities given in Appendix 12. We obtain Legendre singularities which are preserved under small deformations of the Legendre mapping (x, y, z) → (y, z) (i.e., are carried to equivalent singularities for small deformations of the function S). Every Legendre mapping for n < 6 can be approximated by a map, all of whose singularities are locally equivalent to singularities from the list A_k (1 ≤ k ≤ 6), D_k (4 ≤ k ≤ 6), E₆.

In particular, we obtain a list of the singularities of a wave front in general position in spaces of dimension less than 7.

In ordinary three-space this list is as follows:

A₁: S = ±x₁²; A₂: S = ±x₁³; A₃: S = ±x₁⁴ + x₁²y₂,

where I = {1}, J = {2}, and n = 2.

The projections of the Legendre manifolds indicated here onto the base of the Legendre bundle (i.e., onto the space with coordinates y₁, y₂, and z) are: a simple point in the case of A₁, a cuspidal edge in the case of A₂, and a swallowtail (cf. Figure 246) in the case of A₃.

Thus a wave front in general position in three-space has only cusps and “swallowtail” points as singularities. At isolated moments of time during the motion of the front we can observe transitions of the three types A₄, D₄⁻, and D₄⁺ (cf. Appendix 12, where the corresponding caustics filled out by the singularities of the front during its motion are drawn).

Problem 1. Lay out an interval of length t on every interior normal to an ellipse in the plane. Draw the curve obtained and investigate its singularities and its transitions as t changes.

Problem 2. Do the same thing for a triaxial ellipsoid in three-dimensional space.
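
A numerical starting point for Problem 1 might look as follows. The sketch computes the curve obtained by moving a distance t along the interior normals of the ellipse x²/a² + y²/b² = 1, and counts its cusps under the assumption (a standard fact, not stated in the text) that a cusp appears exactly where t equals the radius of curvature of the ellipse. The function names and the sample semi-axes are hypothetical.

```python
import math


def inner_equidistant(a, b, t, samples=720):
    """Points at distance t along the interior normals of the ellipse x²/a² + y²/b² = 1."""
    curve = []
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        px, py = a * math.cos(theta), b * math.sin(theta)
        # The outward normal at (a cos θ, b sin θ) is proportional to (b cos θ, a sin θ).
        nx, ny = b * math.cos(theta), a * math.sin(theta)
        norm = math.hypot(nx, ny)
        curve.append((px - t * nx / norm, py - t * ny / norm))
    return curve


def count_cusps(a, b, t, samples=720):
    """Number of parameter values at which the curvature radius of the ellipse equals t."""
    def gap(theta):
        radius = (a * a * math.sin(theta) ** 2 + b * b * math.cos(theta) ** 2) ** 1.5 / (a * b)
        return radius - t

    # Half-step offset keeps the sample points away from the symmetry axes.
    thetas = [2 * math.pi * (k + 0.5) / samples for k in range(samples + 1)]
    return sum(1 for u, v in zip(thetas, thetas[1:]) if gap(u) * gap(v) < 0)


front = inner_equidistant(2.0, 1.0, 1.0)
cusps = count_cusps(2.0, 1.0, 1.0)
```

Plotting `front` for several values of t between b²/a and a²/b shows how the singular points move; the transitions at the two end values are left to the reader, as the problem intends.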

L Contactification

Along with symplectification of contact manifolds, there is a contactification of symplectic manifolds with symplectic structure cohomologous to zero.

The contactification E^{2n + 1} of the symplectic manifold (M²ⁿ, ω²) is constructed as the space of a bundle with fiber ℝ over M²ⁿ. Let U be a sufficiently small neighborhood of a point x in M, so that there is a canonical coordinate system p, q on U with ω = dp ∧ dq. Consider the direct product U × ℝ with coordinates p, q, z. Let V × ℝ be the same kind of product constructed on another (or the same) neighborhood V, with coordinates P, Q, Z; dP ∧ dQ = ω. If the neighborhoods U and V on M intersect, then we identify the fibers above the points of intersection in both representations so that the form dz + p dq = dZ + P dQ = α is defined on the whole (this is possible since P dQ − p dq is a total differential on U ∩ V).

It is easy to verify that after this pasting together we have a bundle E^{2n + 1} on M²ⁿ and that the form α defines a contact structure on E. The manifold E is called the contactification of the symplectic manifold M. If the cohomology class of the form ω² is integral, then we can define a contactification with fiber S¹.

M Integration of first-order partial differential equations

Let M^{2n + 1} be a contact manifold, and E²ⁿ a hypersurface in M^{2n + 1}. The contact structure on M defines some geometric structure on E—in particular, the field of so-called characteristic directions. An analysis of this geometric structure can reduce the integration of general first-order nonlinear partial differential equations to the integration of a system of ordinary differential equations.

We assume that the manifold E²ⁿ is transverse to the contact planes at all its points. In this case, the intersection of the tangent plane to E²ⁿ at each of its points with the contact plane has dimension 2n − 1, so that we have a field of hyperplanes on E²ⁿ. Furthermore, the contact structure on M^{2n + 1} defines on E²ⁿ a field of lines lying in these (2n − 1)-dimensional planes.

In fact, let α be a 1-form on M^{2n + 1} locally giving the contact structure; let ω = dα and let ℝ²ⁿ be a contact plane at the point x in E²ⁿ. Let Φ = 0 be the local equation of E²ⁿ (so dΦ is not zero at x). The restriction of dΦ to ℝ²ⁿ defines a nonzero linear form on ℝ²ⁿ. The 2-form ω gives ℝ²ⁿ the structure of a symplectic vector space and thus an isomorphism of this space with its dual. The nonzero 1-form dΦ|_ℝ²ⁿ corresponds to a nonzero vector ξ of ℝ²ⁿ, so that dΦ(·) = ω(ξ, ·). The vector ξ is called the characteristic vector of the manifold E²ⁿ at the point x. The characteristic vector ξ lies in the intersection of ℝ²ⁿ with the tangent plane to E²ⁿ, so that dΦ(ξ) = 0.

The vector ξ is not uniquely defined by the manifold E²ⁿ and the contact structure on M, but only up to multiplication by a nonzero number. In fact, like the 2-form ω on ℝ²ⁿ, the 1-form dΦ on ℝ²ⁿ is defined only up to multiplication by a nonzero number.

The direction of the characteristic vector (i.e., the line containing it) is determined uniquely by the contact structure at every point of the manifold E. Thus we have a field of characteristic directions on the hypersurface E of the contact manifold M. The integral curves of this field of directions are called the characteristics.

Now suppose we are given an (n − 1)-dimensional submanifold I of our hypersurface E²ⁿ, which is integral for the contact field (so that the tangent plane to I at each point is contained in the contact plane).

Theorem. If at a point x of I the characteristic on E²ⁿ is not tangent to I, then in a neighborhood of the point x the characteristics on E²ⁿ passing through points of I form a Legendre submanifold Lⁿ in M^{2n + 1}.

Proof. Let ξ be a vector field on E²ⁿ made up of characteristic vectors. By the homotopy formula (cf. Section 36G) we have on E²ⁿ

L_ξα = d i_ξα + i_ξ dα.

But i_ξα = 0 since the characteristic vector belongs to the contact plane. Therefore, on E²ⁿ we have L_ξα = i_ξω. But the 1-form i_ξω is zero on the intersection of the tangent plane to E²ⁿ with the contact plane (since on the contact plane i_ξω = dΦ, and on the tangent plane dΦ = 0). Therefore, on the tangent plane to E²ⁿ we have i_ξω = cα. Thus on the hypersurface E,

L_ξα = cα

(where c is a function smooth in a neighborhood of x).

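Now let {g^t} be the (local) phase flow of the field ξ and η a vector tangent to E²ⁿ. Set

η(t) = g^t_*η

and y(t) = α(η(t)). Then the function y satisfies the linear differential equation

dy/dt = c(t)y,

where c(t) is the value of the function c at the point of contact of the vector η(t).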

If η(0) is tangent to I, then y(0) = α(η(0)) = 0. This means y(t) = α(η(t)) = 0, i.e., for all t, η(t) lies in the contact plane. Therefore, g^tI is an integral manifold of the contact field. Therefore the manifold formed by all {g^tI} for small t is a Legendre manifold. ☐

Example. Consider ℝ^{2n + 1} with coordinates x₁,..., x_n; p₁,..., p_n; u with contact structure defined by the 1-form α = du − p dx. A function Φ(x, p, u) defines a differential equation Φ(x, ∂u/∂x, u) = 0 and a submanifold E = Φ⁻¹(0) in the space ℝ^{2n + 1} (called the space of 1-jets of functions on ℝⁿ).

An initial condition for the equation Φ = 0 is an assignment of a value f to the function u on an (n − 1)-dimensional hypersurface Γ in the n-dimensional space with coordinates x₁,..., x_n.

An initial condition determines the derivatives of u in the n − 1 independent directions at each point of Γ. The derivative in a direction transverse to Γ can generally be found from the equation; if the conditions of the implicit function theorem are fulfilled, then the initial condition is called noncharacteristic.

A noncharacteristic initial condition defines an (n − 1)-dimensional integral submanifold I of the form α (the graph of the mapping u = f(x), p = p(x), x ∈ Γ). The characteristics on E intersecting I form a Legendre submanifold of ℝ^{2n + 1}, the graph of the mapping u = u(x), p = ∂u/∂x. The function u(x) is a solution of the equation Φ(x, ∂u/∂x, u) = 0 with initial condition u|_Γ = f.

Note that to find the function u we need only solve the system of 2n first-order ordinary differential equations for the characteristics on E, and perform a series of “algebraic” operations.
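
As an illustration of this reduction, the sketch that follows integrates the characteristics for the equation u_t + u u_x = 0 in the two variables (t, x), i.e., for Φ = p_t + u p_x, with initial condition u = x on the line t = 0. It assumes the standard coordinate form of the characteristic equations for α = du − p dx, namely dx/ds = Φ_p, dp/ds = −Φ_x − pΦ_u, du/ds = pΦ_p, which the text does not write out. The comparison solution u = x/(1 + t) and all function names are likewise assumptions made for the example.

```python
def characteristic_rhs(state):
    """Right-hand side of the characteristic system for Phi = p_t + u * p_x."""
    t, x, p_t, p_x, u = state
    return (
        1.0,             # dt/ds   = dPhi/dp_t
        u,               # dx/ds   = dPhi/dp_x
        -p_t * p_x,      # dp_t/ds = -Phi_t - p_t * Phi_u
        -p_x * p_x,      # dp_x/ds = -Phi_x - p_x * Phi_u
        p_t + u * p_x,   # du/ds   = p_t * Phi_{p_t} + p_x * Phi_{p_x}
    )


def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, slope, factor):
        return tuple(b + factor * k for b, k in zip(base, slope))

    k1 = characteristic_rhs(state)
    k2 = characteristic_rhs(shifted(state, k1, h / 2))
    k3 = characteristic_rhs(shifted(state, k2, h / 2))
    k4 = characteristic_rhs(shifted(state, k3, h))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def solve_along_characteristic(x0, s_end, steps=100):
    """Follow the characteristic through the point (t, x) = (0, x0) of the initial line."""
    u0 = x0                # initial condition u(0, x) = x
    p_x0 = 1.0             # derivative of the initial data along the line t = 0
    p_t0 = -u0 * p_x0      # transverse derivative, found from Phi = 0
    state = (0.0, x0, p_t0, p_x0, u0)
    h = s_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


t, x, p_t, p_x, u = solve_along_characteristic(0.8, 1.0)
# Compare with the closed-form solution u = x / (1 + t) of this initial value problem.
error = abs(u - x / (1.0 + t))
```

Running the routine over a range of starting points x0 sweeps out the Legendre manifold formed by the characteristics; the pairs (t, x, u) are then points of the graph of the solution.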

⁹⁸ A hyperplane in a vector space is a subspace of dimension 1 less than the dimension of the space (i.e., the zero level set of a linear function which is not identically zero). A tangent hyperplane is a hyperplane in a tangent space.

⁹⁹ From now on, we will omit the prefix “hyper-”. If we wish, we may assume that we are in three-dimensional space and a hypersurface is an ordinary surface. The higher-dimensional case is analogous to the three-dimensional case.

¹⁰⁰ Strictly speaking, we need to require that the riemannian manifold be complete, i.e., geodesics can be continued without limit.

Appendix 5: Dynamical systems with symmetries

By the theorem of E. Noether, one-parameter groups of symmetries of a dynamical system determine first integrals. If a system admits a larger group of symmetries, then there are several integrals. Simultaneous level manifolds of these first integrals in the phase space are invariant manifolds of the phase flow. The subgroup of the group of symmetries mapping such an invariant manifold into itself acts on the manifold. In many cases, we can look at the quotient manifold of an invariant manifold by this subgroup. This quotient manifold, called the reduced phase space, has a natural symplectic structure. The original hamiltonian dynamical system induces a hamiltonian system on the reduced phase space.

The partition of the phase space into simultaneous level manifolds generally has singularities. An example is the partition of a phase plane into energy level curves.

In this appendix we will briefly discuss dynamical systems in reduced phase space and their relationship with invariant manifolds in the original space. All these questions were investigated by Jacobi and Poincaré (“elimination of the nodes” in the many-body problem, “reduction of order” in systems with symmetries, “stationary rotations” of rigid bodies, etc.). A detailed presentation in current terminology can be found in the following articles: S. Smale, “Topology and mechanics,” Inventiones Mathematicae 10:4 (1970) 305–331, 11:1 (1970), 45–64; and J. Marsden and A. Weinstein, “Reduction of symplectic manifolds with symmetries,” Reports on Mathematical Physics 5 (1974) 121–130.

A Poisson action of Lie groups

Consider a symplectic manifold (M²ⁿ, ω²) and suppose a Lie group G acts on it as a group of symplectic diffeomorphisms. Every one-parameter subgroup of G then acts as a locally hamiltonian phase flow on M. In many important cases, these flows have single-valued hamiltonian functions.

Example. Let V be a smooth manifold and G some Lie group of diffeomorphisms of V. Since every diffeomorphism takes 1-forms on V to 1-forms, the group G acts on the cotangent bundle M = T*V.

Recall that on the cotangent bundle there is always a canonical 1-form α (“pdq”) and a natural symplectic structure ω = dα. The action of the group G on M is symplectic since it preserves the 1-form α and hence also the 2-form dα.

A one-parameter subgroup {g^t} of G defines a phase flow on M. It is easy to verify that this phase flow has a single-valued hamiltonian function. In fact, the hamiltonian function is given by the formula from Noether’s theorem:

H(x) = α(d/dt|_{t=0} g^t x), x ∈ M.

We now assume that we are given a symplectic action of a Lie group G on a connected symplectic manifold M such that, to every element a of the Lie algebra of G, there corresponds a one-parameter group of symplectic diffeomorphisms with a single-valued hamiltonian H_a. These hamiltonians are determined up to the addition of constants which can be chosen so that the dependence of H_a upon a is linear. To do this, it is sufficient to choose arbitrarily the constants in the hamiltonians for a set of basis vectors of the Lie algebra of G, and to then define the hamiltonian function for each element of the algebra as a linear combination of the basis functions.

Thus, given a symplectic action of a Lie group G and a single-valued hamiltonian on M, we can construct a linear mapping of the Lie algebra of G into the Lie algebra of hamiltonian functions on M. The function H_{[a, b]} associated to the commutator of two elements of the Lie algebra is equal to the Poisson bracket (H_a, H_b), or else it differs from this Poisson bracket by a constant:

H_{[a, b]} = (H_a, H_b) + C(a, b).

Remark. The appearance of the constant C in this formula is a consequence of an interesting phenomenon: the existence of a two-dimensional cohomology class of the Lie algebra of (globally) hamiltonian fields.

The quantity C(a, b) is a bilinear skew-symmetric function on the Lie algebra. The Jacobi identity gives us

C([a, b], c) + C([b, c], a) + C([c, a], b) = 0.

A bilinear skew-symmetric function on a Lie algebra with this property is called a two-dimensional cocycle of the Lie algebra.

If we choose the constants in the hamiltonian functions differently, then the cocycle C is replaced by

C′(a, b) = C(a, b) + ρ([a, b]),

where ρ is a linear function on the Lie algebra. Such a cocycle C′ is said to be cohomologous to the cocycle C. A class of cocycles which are cohomologous to one another is called a cohomology class of the Lie algebra.
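The following Python sketch illustrates why a term of the form ρ([a, b]) does not spoil the cocycle property. It assumes the cocycle identity C([a, b], c) + C([b, c], a) + C([c, a], b) = 0, takes the Lie algebra of the rotation group (identified with ℝ³ and the vector product) as a test case, and uses a hypothetical linear function RHO; none of these choices come from the text.

```python
import random

def cross(a, b):
    """Commutator in so(3), identified with the vector product on R^3."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

RHO = (2, -3, 5)  # hypothetical linear function rho(v) = RHO . v

def rho(v):
    return sum(r*x for r, x in zip(RHO, v))

def C(a, b):
    """The bilinear skew-symmetric form rho([a, b])."""
    return rho(cross(a, b))

def cocycle_defect(a, b, c):
    """C([a,b],c) + C([b,c],a) + C([c,a],b); this vanishes for a 2-cocycle."""
    return C(cross(a, b), c) + C(cross(b, c), a) + C(cross(c, a), b)

# Integer test vectors keep the arithmetic exact.
rng = random.Random(0)
samples = [tuple(tuple(rng.randint(-9, 9) for _ in range(3)) for _ in range(3))
           for _ in range(100)]
defects = [cocycle_defect(a, b, c) for a, b, c in samples]
```

The defect is zero on every sample because it equals ρ applied to the Jacobi sum, so adding ρ([a, b]) to a cocycle gives a cocycle again.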

Thus, a symplectic action of a group G for which single-valued hamiltonians exist defines a two-dimensional cohomology class of the Lie algebra of G. This cohomology class measures the deviation of the action from one in which the hamiltonian function of a commutator can be chosen equal to the Poisson bracket of the hamiltonian functions.

Definition. An action of a connected Lie group on a symplectic manifold is called a Poisson action if the hamiltonian functions for one-parameter groups are single-valued, and chosen so that the hamiltonian function depends linearly on elements of the Lie algebra and so that the hamiltonian function of a commutator is equal to the Poisson bracket of the hamiltonian functions:

H_{[a, b]} = (H_a, H_b).

In other words, a Poisson action of a group defines a homomorphism from the Lie algebra of this group to the Lie algebra of hamiltonian functions.

Example. Let V be a smooth manifold and G a Lie group acting on V as a group of diffeomorphisms. Let M = T*V be the cotangent bundle of the manifold V with the usual symplectic structure ω = dα. The hamiltonian functions of one-parameter groups are defined as above:

H_a(x) = α((d/dt)|_{t=0} g^t x), x ∈ M, (1)

where g^t is the one-parameter group corresponding to the element a of the Lie algebra.

Theorem. This action is Poisson.

Proof. By definition of the 1-form α, the hamiltonian functions H_a are linear “in p” (i.e., on every cotangent space). Therefore, their Poisson brackets are also linear. Thus the function H_{[a, b]} − (H_a, H_b) is linear in p. Since it is constant, it is equal to zero. ☐

In the same way, we can show that the symplectification of any contact action is a Poisson action.

Example. Let V be three-dimensional euclidean space and G the six-dimensional group of its motions. The following six one-parameter groups form a basis of the Lie algebra: the translations with velocity 1 along the coordinate axes q₁, q₂, and q₃ and the rotations with angular velocity 1 around these axes. By formula (1), the corresponding hamiltonian functions are (in the usual notation) p₁, p₂, p₃; M₁, M₂, M₃, where M₁ = q₂ p₃ − q₃ p₂, etc. The theorem implies that the pairwise Poisson brackets of these six functions are equal to the hamiltonian functions of the commutators of the corresponding one-parameter groups.
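The brackets mentioned in this example can be checked mechanically. The Python sketch below represents polynomials in (q₁, q₂, q₃, p₁, p₂, p₃) as dictionaries and differentiates them exactly. It assumes the sign convention (F, H) = ∑ (∂F/∂q_k ∂H/∂p_k − ∂F/∂p_k ∂H/∂q_k); with the opposite convention every bracket changes sign. All function names are hypothetical.

```python
from collections import defaultdict

def poly(terms):
    """Polynomial as {exponent tuple: coefficient}, zero terms dropped."""
    out = defaultdict(int)
    for exps, c in terms:
        out[tuple(exps)] += c
    return {e: c for e, c in out.items() if c != 0}

def var(i):
    """The i-th variable; indices 0..2 are q1..q3, indices 3..5 are p1..p3."""
    e = [0] * 6
    e[i] = 1
    return poly([(e, 1)])

def add(f, g, sign=1):
    return poly(list(f.items()) + [(e, sign * c) for e, c in g.items()])

def mul(f, g):
    return poly([(tuple(a + b for a, b in zip(e1, e2)), c1 * c2)
                 for e1, c1 in f.items() for e2, c2 in g.items()])

def diff(f, i):
    """Exact partial derivative with respect to the i-th variable."""
    terms = []
    for e, c in f.items():
        if e[i] > 0:
            e2 = list(e)
            e2[i] -= 1
            terms.append((e2, c * e[i]))
    return poly(terms)

def bracket(F, H):
    """Poisson bracket under the sign convention assumed in the lead-in."""
    total = {}
    for k in range(3):
        total = add(total, mul(diff(F, k), diff(H, 3 + k)))
        total = add(total, mul(diff(F, 3 + k), diff(H, k)), sign=-1)
    return total

q = [var(i) for i in range(3)]
p = [var(3 + i) for i in range(3)]
# M1 = q2 p3 - q3 p2 and its cyclic permutations.
M = [add(mul(q[(i + 1) % 3], p[(i + 2) % 3]),
         mul(q[(i + 2) % 3], p[(i + 1) % 3]), sign=-1) for i in range(3)]
```

Under this convention `bracket(M[0], M[1])` equals `M[2]`, `bracket(M[0], p[1])` equals `p[2]`, and the brackets of the `p[i]` with one another vanish, which mirrors the commutation relations of rotations and translations.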

A Poisson action of a group G on a symplectic manifold M defines a mapping of M into the dual space 𝔤* of the Lie algebra 𝔤 of the group,

P: M → 𝔤*.

That is, we fix a point x in M and consider the function on the Lie algebra which associates to an element a of the Lie algebra the value of the hamiltonian H_a at the fixed point x:

p_x(a) = H_a(x).

This p_x is a linear function on the Lie algebra and is the element of the dual space to the algebra associated to x:

P(x) = p_x.

Following Souriau (Structure des systèmes dynamiques, Dunod, 1970), we will call the mapping P the momentum. Note that the value of the momentum is always a vector in the space 𝔤*.

Example. Let V be a smooth manifold, G a Lie group acting on V as a group of diffeomorphisms, M = T*V the cotangent bundle and H_a the hamiltonian functions constructed above of the action of G on M (cf. (1)).

Then the “momentum” mapping can be described in the following way. Consider the map Φ: G → M given by the action of all the elements of G on a fixed point x in M (so Φ(g) = gx). The canonical 1-form α on M induces a 1-form Φ*α on G. Its restriction to the tangent space at the identity of G is a linear form on the Lie algebra.

Thus to every point x in M we have associated a linear form on the Lie algebra. It is easy to verify that this mapping is the momentum of our Poisson action.

In particular, if V is euclidean three-space and G is the group of rotations around the point 0, then the values of the momentum are the usual vectors of angular momentum; if G is the group of rotations around an axis, then the values of the momentum are the angular momenta relative to this axis; if G is the group of parallel translations, then the values of the momentum are the vectors of linear momentum.

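Theorem. Under the momentum mapping P, a Poisson action of a connected Lie group G is taken to the co-adjoint action of G on the dual space 𝔤* of its Lie algebra (cf. Appendix 2), i.e., P(gx) = Ad*_g P(x) for every g in G and every x in M. In other words, the square formed by the maps g: M → M and Ad*_g: 𝔤* → 𝔤* and by two copies of P: M → 𝔤* commutes.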

Corollary. Suppose that a hamiltonian function H : M → ℝ is invariant under the Poisson action of a group G on M. Then the momentum is a first integral of the system with hamiltonian function H.

Proof of the theorem. The theorem asserts that the hamiltonian function H_a of the one-parameter group h^t is carried over by the diffeomorphism g to the hamiltonian function H_{Ad_g a} of the one-parameter group g h^t g^{−1}.

Let g^s be a one-parameter group with hamiltonian function H_b. It is sufficient to show that the derivatives with respect to s (for s = 0) of the functions H_a(g^s x) and H_{Ad_{g^s} a}(x) are the same. The first of these derivatives is the value at x of the Poisson bracket (H_a, H_b). The second is H_{[a, b]}(x). Since the action is Poisson, the theorem is proved. ☐

Proof of the corollary. The derivative, in the direction of the phase flow with hamiltonian function H, of each component of the momentum is zero, since it is equal to the derivative of function H in the direction of the phase flow corresponding to a one-parameter subgroup of G. ☐
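For the rotation group acting on the phase space of a point in euclidean three-space, the theorem can be tested numerically. The sketch below makes several assumptions that are not spelled out in the text: the lifted action of a rotation R is taken to be (q, p) → (Rq, Rp), the momentum is taken to be the angular momentum vector q × p, and the co-adjoint action is identified with the ordinary rotation of vectors in ℝ³. The helper names are hypothetical.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def matvec(R, v):
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def momentum(q, p):
    """Angular momentum vector, with M1 = q2 p3 - q3 p2 and so on."""
    return cross(q, p)

R = matmul(rot_z(0.7), rot_x(-1.1))           # an arbitrary rotation
q, p = (1.0, 2.0, -0.5), (0.3, -1.0, 2.0)     # an arbitrary phase point

lhs = momentum(matvec(R, q), matvec(R, p))    # momentum of the moved point
rhs = matvec(R, momentum(q, p))               # rotated momentum
error = max(abs(a - b) for a, b in zip(lhs, rhs))
```

The quantity `error` is at the level of rounding error, which is the numerical form of the commuting square for this particular action.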

B The reduced phase space

Suppose that we are given a Poisson action of a group G on a symplectic manifold M. Consider a level set of the momentum, i.e., the inverse image of some point p ∈ 𝔤* (the dual space of the Lie algebra of G) under the map P. We denote this set by M_p, so that (Figure 238)

M_p = P^{−1}(p).

In many important cases the set M_p is a manifold. For example, this will be so if p is a regular value of the momentum, i.e., if the differential of the map P at each point of the set M_p maps the tangent space to M onto the whole tangent space to 𝔤*.

In general, a Lie group G acting on M takes the sets M_p into one another. However, the stationary subgroup of a point p in the co-adjoint representation (i.e., the subgroup consisting of those elements g of the group G for which Ad*_g p = p) leaves M_p fixed. We denote this stationary subgroup by G_p. The group G_p is a Lie group, and it acts on the level set M_p of the momentum.

Figure 238. Reduced phase space

The reduced phase space is obtained from M_p by factoring by the action of the group G_p. In order for such a factorization to make sense, it is necessary to make several assumptions. For example, it is sufficient to assume that

(1) p is a regular value, so that M_p is a manifold,

(2) the stationary subgroup G_p is compact, and

(3) the elements of the group G_p act on M_p without fixed points.

Remark. These conditions can be weakened. For example, instead of compactness of the group G_p we can require that the action be proper (i.e., that the inverse images of compact sets under the mapping (g, x) → (g(x), x) are compact). For example, the actions of a group on itself by left and right translation are always proper.

If conditions (1), (2), and (3) are satisfied, then it is easy to give the set of orbits of the action of G_p on M_p the structure of a smooth manifold. Namely, a chart on a neighborhood of a point x ∈ M_p is furnished by any local transversal to the orbit G_px, whose dimension is equal to the codimension of the orbit.

The resulting manifold of orbits is called the reduced phase space of a system with symmetry.

We will denote the reduced phase space corresponding to a value of the momentum by F_p. The manifold F_p is the base space of the bundle π: M_p → F_p with fiber diffeomorphic to the group G_p.

There is a natural symplectic structure on the reduced phase space F_p. Namely, consider any two vectors ξ and η tangent to F_p at the point f. The point f is one of the orbits of the group G_p on the manifold M_p. Let x be one of the points of this orbit. The vectors ξ and η tangent to F_p are obtained from some vectors ξ′ and η′ tangent to M_p at some point x by the projection π: M_p → F_p.

Definition. The skew-scalar product of two vectors ξ and η which are tangent to a reduced phase space at the same point is the skew-scalar product of the corresponding vectors ξ′ and η′, tangent to the original symplectic manifold M:

[ξ, η]_p = [ξ′, η′].

Theorem.¹⁰¹ The skew-scalar product of the vectors ξ and η does not depend on the choices of the point x and representatives ξ′ and η′, and gives a symplectic structure on the reduced phase space.

Corollary. The reduced phase space is even-dimensional.

Proof of the theorem. We look at the following two spaces in the tangent space to M at x:

T(M_p), the tangent space to the level manifold M_p, and
T(Gx), the tangent space to the orbit Gx of the group G.

Lemma. These two spaces are skew-orthogonal complements to one another in TM.

Proof. A vector ζ lies in the skew-orthogonal complement to the tangent plane of an orbit of the group G if and only if the skew-scalar product of the vector ζ with velocity vectors of the hamiltonian flow of the group G is equal to zero (by definition). But these skew-scalar products are equal to the derivatives of the corresponding hamiltonian functions in the direction ζ. Therefore, the vector ζ lies in the skew-orthogonal complement to the orbit of G if and only if the derivative of the momentum in the direction ζ is equal to zero, i.e., if ζ lies in T(M_p). ☐

The representatives ξ′ and η′ are defined up to addition of a vector from the tangent plane to the orbit of the group G_p. But this tangent plane is the intersection of the tangent planes to the orbit Gx and to the manifold M_p (by the last theorem of part A). Consequently, the addition to ξ′ of a vector from T(G_px) does not change the skew-scalar product with any vector η′ from T(M_p) (since by the lemma T(G_px) is skew-orthogonal to T(M_p)). Thus, we have shown the independence from the representatives ξ′ and η′.

The independence of the quantity [ξ, η]_p from the choice of the point x of the orbit f follows from the symplectic nature of the action of the group G on M and the invariance of M_p. Thus we have defined a differential 2-form on F_p:

Ω_p(ξ, η) = [ξ, η]_p.

It is nondegenerate, since if [ξ, η]_p = 0 for every η, then the corresponding representative ξ′ is skew-orthogonal to all vectors in T(M_p). Therefore, ξ′ must lie in the skew-orthogonal complement to T(M_p) in TM. Then by the lemma ξ′ ∈ T(Gx); since ξ′ is also tangent to M_p, it is tangent to the orbit G_px, i.e., ξ = 0.

The form Ω_p is closed. In order to verify this we consider a chart, i.e., a piece of submanifold in M_p, transversally intersecting the orbit of the group G_p in one point.

The form Ω_p is represented in this chart by a 2-form induced from the 2-form ω which defines the symplectic structure in the whole space M, by means of the embedding of the submanifold piece. Since the form ω is closed, the induced form is also closed. The theorem is proved. ☐

Example 1. Let M = ℝ²ⁿ be euclidean space of dimension 2n with coordinates p_k, q_k and 2-form ∑ dp_k ⋀ dq_k. Let G = S¹ be the circle, and let the action of G on M be given by the hamiltonian of a harmonic oscillator

H = ½ ∑ (p_k² + q_k²).

Then the momentum mapping is simply H: ℝ²ⁿ → ℝ, a nonzero momentum level manifold is a sphere S^{2n − 1}, and the quotient space is the complex projective space ℂP^{n − 1}.

The preceding theorem defines a symplectic structure on this complex projective space. It is easy to verify that this structure coincides (up to a multiple) with the one we constructed in Appendix 3.
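The ingredients of Example 1 are easy to check numerically. The sketch below assumes the oscillator hamiltonian H = ½ ∑ (p_k² + q_k²) with unit frequencies and the equations q̇ = ∂H/∂p, ṗ = −∂H/∂q, so that the flow is a simultaneous rotation in every (p_k, q_k) plane; the function names and the sample vectors are hypothetical.

```python
import math

def flow(t, p, q):
    """Time-t map of the oscillator flow, assuming unit frequencies."""
    c, s = math.cos(t), math.sin(t)
    return ([pk * c - qk * s for pk, qk in zip(p, q)],
            [qk * c + pk * s for pk, qk in zip(p, q)])

def H(p, q):
    return 0.5 * sum(pk * pk + qk * qk for pk, qk in zip(p, q))

def omega(xi, eta):
    """Value of sum dp_k ^ dq_k on two vectors given as (p-part, q-part)."""
    (xp, xq), (ep, eq) = xi, eta
    return sum(a * d - b * c for a, b, c, d in zip(xp, xq, ep, eq))

n = 3
x = ([0.3, -1.2, 0.8], [1.0, 0.5, -0.7])      # a point of R^(2n)
xi = ([1.0, 0.0, 2.0], [0.5, -1.0, 0.0])      # two tangent vectors
eta = ([0.0, 3.0, -1.0], [2.0, 1.0, 1.5])

t = 0.9
energy_drift = abs(H(*flow(t, *x)) - H(*x))
# The flow is linear, so it acts on tangent vectors by the same formula.
form_drift = abs(omega(flow(t, *xi), flow(t, *eta)) - omega(xi, eta))
back = flow(2 * math.pi, *x)                   # the action is 2*pi-periodic
period_error = max(abs(a - b) for a, b in zip(back[0] + back[1], x[0] + x[1]))
reduced_dim = (2 * n - 1) - 1                  # sphere minus circle orbits
```

All three drifts are at the level of rounding error, and `reduced_dim` equals 2(n − 1), the real dimension of ℂP^{n − 1}.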

Example 2. Let M be the cotangent bundle of a Lie group G, and let G act on M by left translations. Then M_p is a submanifold of the cotangent bundle of G, formed by those vectors which, after right translation to the identity of the group, define the same element in the dual space to the Lie algebra.

The manifolds M_p are diffeomorphic to the group itself and are right-invariant cross-sections of the cotangent bundle. All the values p are regular.

The stationary subgroup G_p of the point p consists of those elements of the group for which left and right translation of p give the same result. The actions of elements different from the identity of G_p on M_p have no fixed points (since there are none by right translation of the group onto itself).

The group G_p acts properly (cf. remark above). Consequently, the space of orbits of the group G_p on M_p is a symplectic manifold.

But this space of orbits is easily identified with the orbit of the point p in the co-adjoint representation. Indeed, we map the right-invariant section M_p of the cotangent bundle into the cotangent space to the group at the identity by left translations. We get a mapping M_p → 𝔤*, where 𝔤* is the dual space to the Lie algebra.

The image of this mapping is the orbit of the point p in the co-adjoint representation, and the fibers are the orbits of the action of the group G_p. The symplectic structure of the reduced phase space thus defines a symplectic structure in the orbits of the co-adjoint representation.

It is not hard to verify by direct calculation that this is the same structure which we discussed in Appendix 2.

Example 3. Let the group G = S¹, the circle, and let it act without fixed points on a manifold V. Then there is an action of the circle on the cotangent bundle M = T*V. We can define momentum level manifolds M_p (of codimension 1 in M) and quotient manifolds F_p (the dimension of which is 2 less than the dimension of M).

In addition, we can construct a quotient manifold of the configuration space V by identifying the points of each orbit of the group on V. We denote this quotient manifold by W.

Theorem. The reduced phase space F_p is symplectic and diffeomorphic to the cotangent bundle of the quotient configuration manifold W.

Proof. Let π: V → W be the factorization map, and ω ∈ T*W a 1-form on W at the point w = πv. The form π*ω on V at the point v belongs to M₀ and projects to a point in the quotient F₀. Conversely, the elements of F₀ are the invariant 1-forms on V which are equal to zero on the orbits; they define 1-forms in W. We have constructed a mapping T*W → F₀; it is easy to see that this is a symplectic diffeomorphism.

The case p ≠ 0 is reduced to the case p = 0 as follows. Consider a riemannian metric on V, invariant with respect to G. The intersection of M_p with the cotangent plane to V at the point v is a hyperplane. The quadratic form defined by the metric has a unique minimum point S(v) in this hyperplane. Subtraction of the vector S(v) carries the hyperplane M_p ∩ T*V_v into M₀ ∩ T*V_v, and we obtain a possibly nonsymplectic diffeomorphism F_p → F₀.

The difference between the symplectic structures on T*W induced by that of F_p and F₀ is a 2-form, induced by a 2-form on W. ☐

C Applications to the study of stationary rotations and bifurcations of invariant manifolds

Suppose that we are given a Poisson action of a group G on a symplectic manifold M; let H be a function on M invariant under G. Let F_p be a reduced phase space (we assume that the conditions under which this can be defined are satisfied).

The hamiltonian field with hamiltonian function H is tangent to every momentum level manifold M_p (since momentum is a first integral). The induced field on M_p is invariant with respect to G_p and defines a field on the reduced phase space F_p. This vector field on F_p will be called the reduced field.

Theorem. The reduced field on the reduced phase space is hamiltonian. The value of the hamiltonian function of the reduced field at any point of the reduced phase space is equal to the value of the original hamiltonian function at the corresponding point of the original phase space.

Proof. The relation defining a hamiltonian field X_H with hamiltonian H on a manifold M with form ω,

dH(ξ) = ω(ξ, X_H) for every ξ,

implies an analogous relation for the reduced field in view of the definition of the symplectic structure on F_p. ☐

Example. Consider an asymmetric rigid body, fixed at a stationary point, under the action of the force of gravity (or any potential force symmetric with respect to the vertical axis).

The group S¹ of rotations with respect to a vertical line acts on the configuration space SO(3). The hamiltonian function is invariant under rotations, and therefore we obtain a reduced system on the reduced phase space.

The reduced phase space is, in this case, the cotangent bundle of the quotient configuration space (cf. Example 3 above). Factorization of the configuration space by the action of rotations around the vertical axis was done by Poisson in the following way.

We will specify the position of the body by giving the position of an orthonormal frame (e₁, e₂, e₃). The three vertical components of the basic vectors give a vector in three-dimensional euclidean space. The length of this vector is 1 (why?). This Poisson vector¹⁰² γ determines the original frame up to rotations around a vertical line (why?).
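The two questions marked “(why?)” can be explored numerically before they are answered by an argument. The sketch below assumes that the third coordinate axis is the vertical, stores the frame as a rotation matrix whose columns are e₁, e₂, e₃, and reads off the Poisson vector as the third row; the function names and the sample angles are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def frame(a, b, c):
    """Orthonormal frame as a matrix with columns e1, e2, e3."""
    return matmul(rot_z(a), matmul(rot_x(b), rot_z(c)))

def poisson_vector(B):
    """Vertical components of e1, e2, e3, i.e. the third row of B."""
    return tuple(B[2])

B = frame(0.4, 1.2, -2.0)
gamma = poisson_vector(B)
length = math.sqrt(sum(g * g for g in gamma))

# Turn the whole frame around the vertical axis and recompute gamma.
gamma_turned = poisson_vector(matmul(rot_z(0.9), B))
change = max(abs(a - b) for a, b in zip(gamma, gamma_turned))
```

Here `length` comes out equal to 1 because the rows of an orthogonal matrix are unit vectors, and `change` vanishes up to rounding because a rotation around the vertical does not alter vertical components.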

Thus the quotient configuration space is represented by a two-dimensional sphere S², and the reduced phase space is the cotangent bundle T*S² with a nonstandard symplectic structure. The reduced hamiltonian function on the cotangent bundle is represented as the sum of the “kinetic energy of the reduced motion,” which is quadratic in the cotangent vectors, and the “effective potential” (the sum of the potential energy and the kinetic energy of rotation around a vertical line).

The transition to the reduced phase space in this case is almost by “elimination of the cyclic coordinate φ.” The difference is that the usual procedure of elimination requires that the configuration or phase space be a direct product by the circle, whereas in our case we have only a bundle. This bundle can be made a direct product by decreasing the size of the configuration space (i.e., by introducing coordinates with singularities at the poles); the advantage of the approach above is that it makes it clear that there are no real singularities (except singularities of the coordinate system) near the poles.

Definition. The phase curves in M which project to equilibrium positions in the reduced system on the reduced phase space F_p are called the relative equilibria of the original system.

Example. Stationary rotations of a rigid body which is fixed at its center of mass are relative equilibria. In the same way, rotations of a heavy rigid body with constant speed around the vertical axis are relative equilibria.

Theorem. A phase curve of a system with a G-invariant hamiltonian function is a relative equilibrium if and only if it is the orbit of a one-parameter subgroup of G in the original phase space.

Proof. It is clear that a phase curve which is an orbit projects to a point. If a phase curve x(t) projects to a point, then it can be expressed uniquely in the form x(t) = g(t)x(0), and it is then easy to see that {g(t)} is a subgroup. ☐

Corollary 1. An asymmetrical rigid body in an axially symmetric potential field, fixed at a point on the axis of the field, has at least two stationary rotations (for every value of the angular momentum with respect to the axis of symmetry).

Corollary 2. An axially symmetric rigid body fixed at a point on the axis of symmetry, has at least two stationary rotations (for every value of the angular momentum with respect to the axis of symmetry).

Both corollaries follow from the fact that a function on the sphere has at least two critical points.

Another application of relative equilibria is that they can be used to investigate modifications of the topology of invariant manifolds under changes of the energy and momentum values.

Theorem. The critical points of the momentum and energy mapping

P × H: M → 𝔤* × ℝ

(where 𝔤* is the dual space of the Lie algebra of G) on a regular momentum level set are exactly the relative equilibria.

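Proof. The critical points of the mapping P × H are the conditional extrema of H on the momentum level manifold M_p (since this level manifold is regular, i.e., for every x in M_p, the differential of P maps the tangent space TM_x onto the whole tangent space to the dual space of the Lie algebra at p).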

After factorization by G_p, the conditional extrema of H on M_p define the critical points of the reduced hamiltonian function (since H is invariant under G_p). ☐

The detailed study of relative equilibria and singularities of the energy-momentum mapping is not simple and has not been completely carried out, even in the classical problem of the motions of an asymmetrical rigid body in a gravitational field. The case when the center of gravity lies on one of the principal axes of inertia is treated in the supplement written by S. B. Katok to the Russian translation¹⁰³ of the article by S. Smale cited in the beginning of this appendix. In this problem the dimension of the phase space is six, and the group is the circle; the reduced phase space T*S² is four-dimensional.

The nonsingular energy level manifolds in the reduced phase space are (depending on the values of momentum and energy) of the following four forms: S³, S² × S¹, ℝP³, and a “pretzel” obtained from the three-sphere S³ by attaching two “handles” of the form

¹⁰¹

The theorem was first formulated in this form by Marsden and Weinstein. Many special cases have been considered since the time of Jacobi and used by Poincaré and his successors in mechanics, by Kirillov and Kostant in group theory, and by Faddeev in the general theory of relativity.

¹⁰²

Poisson showed that the equations of motion of a heavy rigid body can be written in terms of γ in a remarkably simple form, the “Euler-Poisson equations”:

¹⁰³

Uspekhi Matematicheskikh Nauk 27, no. 2 (1972) 78–133.

Appendix 6: Normal forms of quadratic hamiltonians

In this appendix we give a list of normal forms to which we can reduce a quadratic hamiltonian function by means of a real symplectic transformation. This list was composed by D. M. Galin based on the work of J. Williamson in “On an algebraic problem concerning the normal forms of linear dynamical systems,” Amer. J. of Math. 58, (1936), 141–163. Williamson’s paper gives the normal forms to which a quadratic form in a symplectic space over any field can be reduced.

A Notation

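We will write the hamiltonian as

H = ½(Ax, x),

where x = (p₁,..., p_n; q₁,..., q_n) is a vector written in a symplectic basis and A is a symmetric linear operator. The canonical equations then have the form

ẋ = IAx, where I is the block matrix with rows (0, −E) and (E, 0)

and E is the identity matrix of order n.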

By the eigenvalues of the hamiltonian we will mean the eigenvalues of the linear infinitesimally-symplectic operator IA. In the same way, by a Jordan block we will mean a Jordan block of the operator IA.

The eigenvalues of the hamiltonian are of four types: real pairs (a, −a), purely imaginary pairs (ib, − ib), quadruples (±a ±ib), and zero eigenvalues.

The Jordan blocks corresponding to the two members of a pair or four members of a quadruple always have the same structure.
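The grouping of eigenvalues into pairs and quadruples reflects the fact that the characteristic polynomial of IA contains only even powers of λ. The Python sketch below checks this for a random symmetric A with exact rational arithmetic, computing the characteristic polynomial by the Faddeev-LeVerrier recursion (a standard method, not taken from the text). It assumes x = (p, q) and I with block rows (0, −E), (E, 0); the function names are hypothetical.

```python
from fractions import Fraction
import random

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [1, c1, ..., cN] with det(lambda*E - A) = sum c_k lambda^(N-k)."""
    N = len(A)
    M = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
    coeffs = [Fraction(1)]
    for k in range(1, N + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(N)) / k   # c_k = -trace(A M_k) / k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(N)]
             for i in range(N)]
    return coeffs

def hamiltonian_operator(A):
    """The product I*A for the block matrix I assumed in the lead-in."""
    n = len(A) // 2
    top = [[-A[n + i][j] for j in range(2 * n)] for i in range(n)]
    bottom = [[A[i][j] for j in range(2 * n)] for i in range(n)]
    return top + bottom

rng = random.Random(1)
n = 2
S = [[rng.randint(-5, 5) for _ in range(2 * n)] for _ in range(2 * n)]
A = [[S[i][j] + S[j][i] for j in range(2 * n)] for i in range(2 * n)]  # symmetric

coeffs = char_poly(hamiltonian_operator(A))
odd_coeffs = coeffs[1::2]   # coefficients of the odd powers of lambda
```

Every entry of `odd_coeffs` is exactly zero, so the spectrum is symmetric under λ → −λ; together with the symmetry under complex conjugation (A is real) this gives the four types listed above.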

In the case when the real part of an eigenvalue is zero, we have to distinguish the Jordan blocks of even and odd order. There are an even number of blocks of odd order with zero eigenvalue and they can be naturally divided into pairs.

A complete list of normal forms follows.

B Hamiltonians

For a pair of Jordan blocks of order k with eigenvalues ±a, the hamiltonian is

For a quadruple of Jordan blocks of order k with eigenvalues ±a ±bi the hamiltonian is

For a pair of Jordan blocks of order k with eigenvalue zero the hamiltonian is

For a Jordan block of order 2k with eigenvalue zero, the hamiltonian is of one of the following two inequivalent types:

(for

For a pair of Jordan blocks of odd order 2k + 1 with purely imaginary eigenvalues ±bi, the hamiltonian is of one of the following two inequivalent types:

For

For a pair of Jordan blocks of even order 2k with purely imaginary eigenvalues ±bi, the hamiltonian is of one of the following two inequivalent types:

Williamson’s theorem. A real symplectic vector space with a given quadratic form H can be decomposed into a direct sum of pairwise skew orthogonal real symplectic subspaces so that the form H is represented as a sum of forms of the types indicated above on these subspaces.

C Nonremovable Jordan blocks

An individual hamiltonian in “general position” does not have multiple eigenvalues and reduces to a simple form (all the Jordan blocks are of first order). However, if we consider not an individual hamiltonian but a whole family of systems depending on parameters, then for some exceptional values of the parameters more complicated Jordan structures can arise. We can get rid of some of these by a small change of the family; others are nonremovable and only slightly deformed after a small change of the family. If the number l of parameters of the family is finite, then the number of nonremovable types in l-parameter families is finite. The theorem of Galin formulated below allows us to count all these types for any fixed l.

We denote by n₁(z) ≥ n₂(z) ≥ ⋯ ≥ n_s(z) the dimensions of the Jordan blocks with eigenvalues z ≠ 0, and by m₁ ≥ m₂ ≥ ⋯ ≥ m_u and m̃₁ ≥ m̃₂ ≥ ⋯ ≥ m̃_v the dimensions of the Jordan blocks with eigenvalue zero, where the m_i are even and the m̃_i are odd (of every pair of blocks of odd dimension, only one is considered).

Theorem. In the space of all hamiltonians, the manifold of hamiltonians with Jordan blocks of the indicated dimensions has codimension

(Note that, if zero is not an eigenvalue, then only the first term in the sum is not zero.)

Corollary. In l-parameter families in general position of linear hamiltonian systems, the only systems which occur are those with Jordan blocks such that the number c calculated by the formula above is not greater than l: all cases with larger c can be eliminated by a small change of the family.

Corollary. In one- and two-parameter families, nonremovable Jordan blocks of only the following 12 types occur:

(here the Jordan blocks are denoted by their determinants; for example, (±a)² denotes a pair of Jordan blocks of order 2 with eigenvalues a and −a, respectively;

(the remaining eigenvalues are simple).

Galin has also computed the normal forms to which one can reduce any family of linear hamiltonian systems which depend smoothly on parameters, by using a symplectic linear change of coordinates which depends smoothly on the parameters. For example, for the simplest Jordan square (±a)², the normal form of the hamiltonian will be

(λ₁ and λ₂ are the parameters).

Appendix 7: Normal forms of hamiltonian systems near stationary points and closed trajectories

In studying the behavior of solutions to Hamilton’s equations near an equilibrium position, it is often insufficient to look only at the linearized equation. In fact, by Liouville’s theorem on the conservation of volume, it is impossible to have asymptotically stable equilibrium positions for hamiltonian systems. Therefore, the stability of the linearized system is always neutral: the eigenvalues of the linear part of a hamiltonian vector field at a stable equilibrium position all lie on the imaginary axis.

For systems of differential equations in general form, such neutral stability can be destroyed by the addition of arbitrarily small nonlinear terms. For hamiltonian systems the situation is more complicated. Suppose, for example, that the quadratic part of the hamiltonian function at an equilibrium position (which determines the linear part of the vector field) is (positive or negative) definite. Then the hamiltonian function has a maximum or minimum at the equilibrium position. Therefore, this equilibrium position is stable (in the sense of Liapunov, but not asymptotically), not only for the linearized system but also for the entire nonlinear system.

On the other hand, the quadratic part of the hamiltonian function at a stable equilibrium position may not be definite. A simple example is supplied by the function H = p₁² + q₁² − p₂² − q₂². To investigate the stability of systems with this kind of quadratic part, we must take into account terms of degree ≥ 3 in the Taylor series of the hamiltonian function (i.e., the terms of degree ≥ 2 for the phase velocity vector field). It is useful to carry out this investigation by reducing the hamiltonian function (and, therefore, the hamiltonian vector field) to the simplest possible form by a suitable canonical change of variables. In other words, it is useful to choose a canonical coordinate system, near the equilibrium position, in which the hamiltonian function and equations of motion are as simple as possible.

The analogous question for general (non-hamiltonian) vector fields can be solved easily: there the general case is that a vector field in a neighborhood of an equilibrium position is linear in a suitable coordinate system (the relevant theorems of Poincaré and Siegel can be found, for instance, in the book, Lectures on Celestial Mechanics, by C. L. Siegel and J. Moser, Springer-Verlag, 1971.)

In the hamiltonian case the picture is more complicated. The first difficulty is that reduction of the hamiltonian field to a linear normal form by a canonical change of variables is generally not possible. We can usually kill the cubic part of the hamiltonian function, but we cannot kill all the terms of degree four (this is related to the fact that, in a linear system, the frequency of oscillation does not depend on the amplitude, while in a nonlinear system it generally does). This difficulty can be surmounted by the choice of a nonlinear normal form which takes the frequency variations into account. As a result, we can (in the “non-resonance” case) introduce action-angle variables near an equilibrium position so that the system becomes integrable up to terms of arbitrary high degree in the Taylor series.

This method allows us to study the behavior of systems over the course of large intervals of time for initial conditions close to equilibrium. However, it is not sufficient to determine whether an equilibrium position will be Liapunov stable (since on an infinite time interval the influence of the discarded remainder term of the Taylor series can destroy the stability). Such stability would follow from an exact reduction to an analogous normal form which did not disregard remainder terms. However, we can show that this exact reduction is generally not possible, and formal series for canonical transformations reducing a system to normal form generally diverge.

The divergence of these series is connected with the fact that reduction to normal form would imply simpler behavior of the phase curves (they would have to be conditionally periodic windings of tori) than that which in fact occurs. The behavior of phase curves near an equilibrium position is discussed in Appendix 8. In this appendix we give the formal results on normalization up to terms of high degree.

The idea of reducing hamiltonian systems to normal forms goes back to Lindstedt and Poincaré;¹⁰⁴ normal forms in a neighborhood of an equilibrium position were extensively studied by G. D. Birkhoff (G. D. Birkhoff, Dynamical Systems, American Math. Society, 1927).

Normal forms for degenerate cases can be found in the work of A. D. Bruno, “Analytic forms of differential equations,” (Trudy Moskovskogo matematicheskogo obshchestva, v. 25 and v. 26).

A Normal form of a conservative system near an equilibrium position

Suppose that in the linear approximation an equilibrium position of a hamiltonian system with n degrees of freedom is stable, and that all n characteristic frequencies ω₁,..., ω_n are different. Then the quadratic part of the hamiltonian can be reduced by a canonical linear transformation to the form

H₂ = ½ω₁(p₁² + q₁²) + ⋯ + ½ω_n(p_n² + q_n²).

(Some of the numbers ω_k may be negative).

Definition. The characteristic frequencies ω₁,..., ω_n satisfy a resonance relation of order K if there exist integers k_l not all equal to zero such that

k₁ω₁ + ⋯ + k_nω_n = 0, |k₁| + ⋯ + |k_n| = K.
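As a rough illustration of this definition, the following Python sketch searches for resonance relations by brute force. It assumes that a relation of order K means integers k₁,..., k_n, not all zero, with k₁ω₁ + ⋯ + k_nω_n = 0 and |k₁| + ⋯ + |k_n| = K. The function name resonances and the sample frequencies are hypothetical, and the frequencies are assumed to be integers or exact rationals; for floating-point values the equality test would have to be replaced by a tolerance.

```python
from itertools import product

def resonances(omega, max_order):
    """All pairs (order, k) with k != 0, k . omega == 0 and order <= max_order."""
    n = len(omega)
    found = []
    for k in product(range(-max_order, max_order + 1), repeat=n):
        order = sum(abs(c) for c in k)
        if 0 < order <= max_order and sum(c * w for c, w in zip(k, omega)) == 0:
            found.append((order, k))
    return sorted(found)

# Frequencies 2 and -1 (a negative frequency is allowed) are resonant at order 3.
low_order = resonances((2, -1), 3)

# Frequencies 1 and 5 have no resonance up to order 4, but one at order 6.
none_up_to_4 = resonances((1, 5), 4)
up_to_6 = resonances((1, 5), 6)
```

Each relation appears twice in the output, once for k and once for −k, since both vectors satisfy the same equation.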

Definition. A Birkhoff normal form of degree s for a hamiltonian is a polynomial of degree s in the canonical coordinates (P_l, Q_l) which is actually a polynomial (of degree [s/2]) in the variables τ_l = (P_l² + Q_l²)/2.

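For example, for a system with one degree of freedom the normal form of degree 2m (or 2m + 1) looks like

$$H_{2m} = a_1\tau + a_2\tau^2 + \cdots + a_m\tau^m, \qquad \tau = \tfrac{1}{2}(P^2 + Q^2),$$

and for a system with two degrees of freedom the Birkhoff normal form of degree 4 will be

$$H_4 = a_1\tau_1 + a_2\tau_2 + a_{11}\tau_1^2 + 2a_{12}\tau_1\tau_2 + a_{22}\tau_2^2.$$

The coefficients a₁ and a₂ are characteristic frequencies, and the coefficients a_ij describe the dependence of the frequencies on the amplitude.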

Theorem. Assume that the characteristic frequencies ω_l do not satisfy any resonance relation of order s or smaller. Then there is a canonical coordinate system in a neighborhood of the equilibrium position such that the hamiltonian is reduced to a Birkhoff normal form of degree s up to terms of order s + 1:

$$H(p, q) = H_s(P, Q) + R, \qquad R = O\big((|P| + |Q|)^{s+1}\big).$$

Proof. The proof of this theorem is easy to carry out in a complex coordinate system

$$z_l = p_l + iq_l, \qquad w_l = p_l - iq_l$$

(upon passing to this coordinate system we must multiply the hamiltonian by −2i). If the terms of degree less than N which do not enter into the normal form have already been killed, then the transformation with generating function Pq + S_N(P, q) (where S_N is a homogeneous polynomial of degree N) changes only terms of degree N and higher in the Taylor expansion of the hamiltonian function.

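Under this transformation the coefficient for a monomial of degree N in the hamiltonian function having the form

$$z_1^{\alpha_1}\cdots z_n^{\alpha_n}\, w_1^{\beta_1}\cdots w_n^{\beta_n} \qquad (\alpha_1 + \cdots + \alpha_n + \beta_1 + \cdots + \beta_n = N)$$

is changed by the quantity

$$s_{\alpha\beta}\,[\lambda_1(\beta_1 - \alpha_1) + \cdots + \lambda_n(\beta_n - \alpha_n)],$$

where λ_l = iω_l and where s_αβ is the coefficient for z^αw^β in the expansion of the function S_N(P, q) in the variables z and w.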

Under the assumptions about the absence of resonance, the coefficient of s_αβ in the square brackets is not zero, except in the case when our monomial can be expressed in terms of the product z_lw_l = 2τ_l (i.e., when all the α_l are equal to the β_l). Thus we can kill all terms of degree N except those expressed in terms of the variables τ_l. Setting N = 3, 4,..., s, we obtain the theorem. □
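The non-resonance hypothesis of the theorem can be checked by brute force when the frequencies are rational numbers. The following is a minimal sketch under that assumption; the function name `resonances_up_to` and the use of exact fractions are illustrative choices, not part of the original text. For irrational frequencies an exact test of this kind is not available, and one can only bound the combinations away from zero.

```python
from fractions import Fraction
from itertools import product

def resonances_up_to(omegas, max_order):
    """List integer vectors k (not all zero) with |k_1|+...+|k_n| <= max_order
    and k_1*omega_1 + ... + k_n*omega_n == 0 (exact rational arithmetic)."""
    freqs = [Fraction(w) for w in omegas]
    found = []
    # Every admissible k has all components in [-max_order, max_order].
    for k in product(range(-max_order, max_order + 1), repeat=len(freqs)):
        order = sum(abs(c) for c in k)
        if 0 < order <= max_order and sum(c * w for c, w in zip(k, freqs)) == 0:
            found.append(k)
    return found

# Frequencies 1 and 2 satisfy the order-3 relation 2*omega_1 - omega_2 = 0.
print(resonances_up_to([1, 2], 4))                 # [(-2, 1), (2, -1)]
# Frequencies 1 and -5/2 have no resonance of order 4 or smaller.
print(resonances_up_to([1, Fraction(-5, 2)], 4))   # []
```

An empty list for `max_order = s` means that the hypothesis of the theorem holds for normalization up to degree s.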

To use Birkhoff’s theorem, it is helpful to note that a hamiltonian in normal form is integrable. Consider the “canonical polar coordinates” τ_l, φ_l, in which P_l and Q_l can be expressed by the formulas

$$P_l = \sqrt{2\tau_l}\,\cos\varphi_l, \qquad Q_l = \sqrt{2\tau_l}\,\sin\varphi_l.$$

Since the hamiltonian is expressed in terms of only the action variables τ_l, the system is integrable and describes conditionally periodic motions on the tori τ = const with frequencies ω = ∂H/∂τ. In particular, the equilibrium position P = Q = 0 is stable for the normal form.
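As an illustration of the relation ω = ∂H/∂τ, the sketch below evaluates the frequencies on a torus for a normal form of degree 4 with two degrees of freedom. It assumes the form H = a₁τ₁ + a₂τ₂ + a₁₁τ₁² + 2a₁₂τ₁τ₂ + a₂₂τ₂²; the function name and the sample coefficients are hypothetical.

```python
def normal_form_frequencies(a1, a2, a11, a12, a22, tau1, tau2):
    """Frequencies omega_l = dH/dtau_l on the torus (tau1, tau2) for the assumed
    H = a1*tau1 + a2*tau2 + a11*tau1**2 + 2*a12*tau1*tau2 + a22*tau2**2."""
    omega1 = a1 + 2 * a11 * tau1 + 2 * a12 * tau2
    omega2 = a2 + 2 * a12 * tau1 + 2 * a22 * tau2
    return omega1, omega2

# At the equilibrium (tau = 0) the frequencies are the characteristic ones.
print(normal_form_frequencies(1.0, -2.0, 0.5, 0.25, 0.1, 0.0, 0.0))
# Away from the equilibrium they shift linearly in the actions.
print(normal_form_frequencies(1.0, -2.0, 0.5, 0.25, 0.1, 0.2, 0.1))
```

The shift of the frequencies with τ is what the text calls the dependence of the frequencies on the amplitude.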

B Normal form of a canonical transformation near a stationary point

Consider a canonical (i.e., area-preserving) mapping of the two-dimensional plane to itself. Assume that this transformation leaves the origin fixed, and that its linear part has eigenvalues λ = e^{±iα} (i.e., is a rotation by angle α in a suitable symplectic basis with coordinates p, q). We will call such a transformation elliptic.

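Definition. A Birkhoff normal form of degree s for a transformation is a canonical transformation of the plane to itself which is a rotation by a variable angle which is a polynomial of degree not more than m = [s/2] − 1 in the action variable τ of the canonical polar coordinate system:

$$(\tau, \varphi) \mapsto (\tau,\ \varphi + \alpha_0 + \alpha_1\tau + \cdots + \alpha_m\tau^m),$$

where

$$p = \sqrt{2\tau}\,\cos\varphi, \qquad q = \sqrt{2\tau}\,\sin\varphi.$$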

Theorem 2. If the eigenvalue λ of an elliptic canonical transformation is not a root of unity of degree s or less, then this transformation can be reduced by a canonical change of variables to a Birkhoff normal form of degree s with error terms of degree s + 1 and higher.
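A normal-form mapping of this kind is easy to write down explicitly. The sketch below assumes the mapping has the form (τ, φ) ↦ (τ, φ + α₀ + α₁τ + ⋯ + α_mτ^m) with τ = (p² + q²)/2, and applies it directly in the coordinates (p, q); the name `birkhoff_twist` is hypothetical.

```python
import math

def birkhoff_twist(p, q, alphas):
    """Rotate (p, q) by the angle alphas[0] + alphas[1]*tau + ... ,
    where tau = (p**2 + q**2)/2 is the action variable."""
    tau = 0.5 * (p * p + q * q)
    angle = sum(a * tau ** j for j, a in enumerate(alphas))
    c, s = math.cos(angle), math.sin(angle)
    # phi -> phi + angle at fixed tau is a plane rotation of (p, q).
    return c * p - s * q, s * p + c * q

p, q = birkhoff_twist(0.3, -0.4, [0.7, 1.3, -0.2])
print(0.5 * (p * p + q * q))  # the action tau = 0.125 is preserved
```

Since each circle τ = const is rotated rigidly, the mapping preserves τ, and the rotation angle varies from circle to circle.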

The multi-dimensional generalization of an elliptic transformation is the direct product of n elliptic rotations of the planes (p_l, q_l) with eigenvalues λ_l = e^{±iα_l}. A Birkhoff normal form of degree s is given by the formula

$$(\tau, \varphi) \mapsto \left(\tau,\ \varphi + \frac{\partial S}{\partial \tau}\right),$$

where S is a polynomial of degree not more than [s/2] in the action variables τ₁,..., τ_n.

Theorem 3. If the eigenvalues λ_l of a multi-dimensional elliptic canonical transformation do not admit resonances

$$\lambda_1^{k_1}\cdots\lambda_n^{k_n} = 1, \qquad |k_1| + \cdots + |k_n| \le s,$$

then this transformation can be reduced to a Birkhoff normal form of degree s (with error in terms of degree s in the expansion of the mapping in a Taylor series at the point p = q = 0).

C Normal form of an equation with periodic coefficients near an equilibrium position

Let p = q = 0 be an equilibrium position of a system whose hamiltonian function depends 2π-periodically on time. Assume that the linearized equation can be reduced by a linear symplectic time-periodic transformation to an autonomous normal form with characteristic frequencies ω₁,..., ω_n.

We say that a system is resonant of order K > 0 if there is a relation

$$k_1\omega_1 + \cdots + k_n\omega_n + k_0 = 0$$

with integers k₀, k₁,..., k_n for which |k₁| + ⋯ + |k_n| = K.

Theorem. If a system is not resonant of order s or less, then there is a 2π-periodic time-dependent canonical transformation reducing the system in a neighborhood of an equilibrium position to the same Birkhoff normal form of degree s as if the system were autonomous, with only the difference that the remainder terms R of degree s + 1 and higher will depend periodically on time.

Finally, suppose that we are given a closed trajectory of an autonomous hamiltonian system. Then, in a neighborhood of this trajectory, we can reduce the system to normal form by using either of the following two methods:

Isoenergetic reduction: Fix an energy constant and consider a neighborhood of the closed trajectory on the (2n − 1)-dimensional energy level manifold as the extended phase space of a system with n − 1 degrees of freedom, periodically depending on time.

Surface of section: Fix an energy constant and value of one of the coordinates (so that the closed trajectory intersects the resulting (2n − 2)-dimensional manifold transversally). Then phase curves near the given one define a mapping of this (2n − 2)-dimensional manifold to itself, with a fixed point on the closed trajectory. This mapping preserves the natural structure on our (2n − 2)-dimensional manifold, and we can study it by using the normal form in Section B.

In investigating closed trajectories of autonomous hamiltonian systems, a phenomenon arises which contrasts with the general theory of equilibrium positions of systems with periodic coefficients. The fact is that the closed trajectories of an autonomous system are not isolated, but form (as a rule) one-parameter families. The parameter of the family is the value of the energy constant. In fact, assume that for some choice of the energy constant the closed trajectory intersects transversally the (2n − 2)-dimensional manifold described above in the (2n − 1)-dimensional energy level manifold. Then for nearby values of the energy, there will exist a similar closed trajectory. By the implicit function theorem we can even say that this closed trajectory depends smoothly on the energy constant.

If we now wish to use the Birkhoff normal form to investigate a one-parameter family of closed trajectories, we encounter the following difficulty. As the parameter describing the family varies, the eigenvalues of the linearized problem will generally change. Therefore, for some values of the parameter we will inevitably encounter resonances, obstructing reduction to the normal form.

Especially dangerous are resonances of low order, since they influence the first few terms of the Taylor series. If we are interested in a closed trajectory for which the eigenvalues nearly satisfy a resonance relation of low order, then the Birkhoff form must be somewhat modified. Namely, for resonance of order N some of the expressions

$$\lambda_1(\beta_1 - \alpha_1) + \cdots + \lambda_n(\beta_n - \alpha_n), \qquad |\alpha| + |\beta| = N,$$

by which we must divide to kill the terms of order N in the hamiltonian function, may become zero. For non-resonant values of the parameter which are close to resonance, this combination of characteristic frequencies is generally not zero, but very small (this combination is therefore called a “small denominator”).

Division by a small denominator leads to the following difficulties:

The transformation which reduces to normal form depends discontinuously on the parameter (it has poles for resonant values of the parameter);

The region in which the Birkhoff normal form accurately describes the system contracts to zero at resonance.

In order to get rid of these deficiencies, we must give up trying to annihilate some of the terms of the hamiltonian (namely, those which become resonant for resonance values of the parameter). Moreover, these terms must be preserved not only for resonance, but also for nearby values of the parameter.¹⁰⁵ The normal form thus obtained is somewhat more complicated than the usual normal form, but in many cases it gives us useful information on the behavior of solutions near resonance.

D Example: Resonance of order 3

As a simple example, we will study what happens to a closed trajectory of an autonomous hamiltonian system with two degrees of freedom, for which the period of oscillation (about the closed trajectory) of neighboring trajectories is three times the period of the closed trajectory itself. By what we said above, this problem may be reduced to an investigation of a one-parameter family of non-autonomous hamiltonian systems with one degree of freedom, 2π-periodically depending on time, in a neighborhood of an equilibrium position. This equilibrium position can be taken as the origin for all values of the parameter (to achieve this we must make a change of variables depending on the parameter).

Furthermore, the linearized system at the equilibrium position can be converted into a linear system with constant coefficients by a 2π-periodically time-dependent linear canonical change of variables. In the new coordinates the phase flow of the linearized system is represented as a uniform rotation around the equilibrium position. The angular velocity ω of this rotation depends on the parameter.

At the resonance value of the parameter, ω = ⅓ (i.e., after time 2π, we have gone one-third of the way around the origin). The derivative of the angular velocity ω with respect to the parameter is generally not zero. Therefore, we can take as a parameter this angular velocity or, even better, its difference from ⅓. We will denote this difference by ε. The quantity ε is called the frequency deviation or detuning. The resonance value of the parameter is ε = 0, and we are interested in the behavior of the system for small ε.

If we disregard the nonlinear terms in Hamilton’s equations and disregard the frequency deviation ε, then all trajectories of our system become closed after making three revolutions (i.e., they have period 6π). We now want to study the influence of the nonlinear terms and frequency deviation on the behavior of the trajectories. It is clear that in the general case not all the trajectories will be closed. To study their behavior, it is useful to look at the normal form.

In the chosen coordinate system, z = p + iq, z̄ = p − iq, the hamiltonian function has the form

$$-2iH = -i\omega z\bar z + \sum_{\alpha+\beta=3}\ \sum_{k=-\infty}^{\infty} h_{\alpha\beta k}\, z^{\alpha}\bar z^{\beta} e^{ikt} + \cdots,$$

where the dots indicate terms of order higher than three, and where ω = (⅓) + ε.

In the reduction to normal form we can kill all terms of degree three except those for which the small denominator

$$\omega(\alpha - \beta) + k$$

becomes zero at resonance. These terms can be described also as those which are constant along trajectories of the periodic motion obtained by disregarding the frequency deviation and nonlinearity. They are called the resonant terms. Thus, for resonance ω = ⅓, the resonant terms are those for which

$$\alpha - \beta + 3k = 0.$$

Of the terms of third order, only z³e^{−it} and z̄³e^{it} turn out to be resonant. Thus we can reduce the hamiltonian function to the form

$$-2iH = -i\omega z\bar z + h z^3 e^{-it} - \bar h\,\bar z^3 e^{it} + \cdots$$

(the conjugacy of h and h̄ corresponds to the fact that H is real).
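The selection of resonant terms can be reproduced by direct enumeration. The sketch below assumes that the terms of degree three are monomials z^α z̄^β e^{ikt} with α + β = 3 and that such a term is resonant when ω(α − β) + k = 0; the function name and the cut-off `k_max` on the Fourier index are illustrative assumptions.

```python
from fractions import Fraction

def resonant_terms(degree, omega, k_max=10):
    """Triples (alpha, beta, k) with alpha + beta = degree whose
    denominator omega*(alpha - beta) + k vanishes exactly."""
    omega = Fraction(omega)
    terms = []
    for alpha in range(degree + 1):
        beta = degree - alpha
        for k in range(-k_max, k_max + 1):
            if omega * (alpha - beta) + k == 0:
                terms.append((alpha, beta, k))
    return terms

# At the resonance omega = 1/3 only zbar^3 e^{it} and z^3 e^{-it} survive.
print(resonant_terms(3, Fraction(1, 3)))  # [(0, 3, 1), (3, 0, -1)]
```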

Note that, in order to reduce the hamiltonian function to this normal form, we made a 2π-periodic time-dependent smooth canonical transformation which depends smoothly on the parameter, even in the case of resonance. This transformation differs from the identity only by terms that are small of second order relative to the deviation from the closed trajectory (and its generating function differs from the generating function of the identity only by cubic terms).

Further investigation of the behavior of solutions of Hamilton’s equations proceeds in the following way. First, we throw out of the hamiltonian function all terms of order higher than three and study the solutions of the resulting truncated system. Then we must see how the discarded terms can affect the behavior of the trajectories.

The study of the truncated system can be simplified by introducing a coordinate system in the complex z-plane which rotates uniformly with angular velocity ⅓, i.e., by the substitution z = ζe^{it/3}. Then for the variable ζ we obtain an autonomous hamiltonian system with hamiltonian function

$$-2iH_0 = -i\varepsilon\,\zeta\bar\zeta + h\zeta^3 - \bar h\,\bar\zeta^3.$$

The fact that, in a rotating coordinate system, the truncated system is autonomous is very good luck. The total system of Hamilton’s equations (including terms of degree higher than three in the hamiltonian) is not only not autonomous in a rotating coordinate system, but is not even 2π-periodic (but only 6π-periodic) in time. The autonomous system with hamiltonian H₀ is essentially the result of averaging the original system over closed trajectories of the linear system with ε = 0 (where we disregard terms of degree higher than three).

The coefficient h can be made real (by a rotation of the coordinate system). Thus the hamiltonian function in the real coordinates (x, y) is reduced to the form

$$H_0 = \frac{\varepsilon}{2}(x^2 + y^2) + a(x^3 - 3xy^2).$$

The coefficient a depends on the frequency deviation ε as on a parameter. For ε = 0 this coefficient is generally not zero. Therefore, we can make this coefficient equal to 1 by a smooth change of coordinates depending on a parameter. Thus we must investigate the dependence on the small parameter ε of the phase portrait of the system with hamiltonian function

$$H_0 = \frac{\varepsilon}{2}(x^2 + y^2) + (x^3 - 3xy^2)$$

in the (x, y)-plane.

It is easy to see that this dependence consists of the following (Fig. 239).

Figure 239. Passage through resonance 3:1

For ε = 0 the zero level set of the function H₀ consists of three straight lines through 0, intersecting at angles of 60°. For ε ≠ 0 the level set passing through the saddle points again consists of three straight lines; as ε changes these lines are translated parallel to themselves, always forming an equilateral triangle with center at the origin. The vertices of this triangle are saddle points of the hamiltonian function. As ε passes through zero (i.e., upon passage through resonance), the critical point at the origin changes from a minimum to a maximum.

Thus, for a system with hamiltonian function H₀, the origin is a stable equilibrium position for all values of the parameter except at resonance, and at resonance the origin is unstable. For values of the parameter close to resonance, the triangle close to the origin filled by closed phase curves is small (of order ε), so the “radius of stability” of the origin approaches zero as ε → 0: a small (of order ε) perturbation of the initial condition is sufficient to make a phase point move outside the triangle and begin to go away from the equilibrium position.
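The geometry described above can be verified numerically. The sketch below assumes the truncated hamiltonian has the form H₀ = (ε/2)(x² + y²) + x³ − 3xy²; under that assumption the three saddle points lie at distance |ε|/3 from the origin and all sit on the level H₀ = ε³/54. The function names are hypothetical.

```python
import math

def h0(x, y, eps):
    """Assumed truncated hamiltonian near the 3:1 resonance."""
    return 0.5 * eps * (x * x + y * y) + x ** 3 - 3 * x * y * y

def grad_h0(x, y, eps):
    """Partial derivatives (dH0/dx, dH0/dy)."""
    return eps * x + 3 * x * x - 3 * y * y, eps * y - 6 * x * y

def hessian_det(x, y, eps):
    """Determinant of the Hessian of H0; negative at a saddle point."""
    return (eps + 6 * x) * (eps - 6 * x) - 36 * y * y

def saddle_points(eps):
    """Vertices of the equilateral triangle: (-eps/3, 0) rotated by 0, 120, 240 degrees."""
    r = eps / 3.0
    return [(-r * math.cos(2 * math.pi * j / 3), -r * math.sin(2 * math.pi * j / 3))
            for j in range(3)]

eps = 0.1
for x, y in saddle_points(eps):
    # The gradient vanishes, the Hessian is indefinite, and the level is eps**3/54.
    print(grad_h0(x, y, eps), hessian_det(x, y, eps) < 0, h0(x, y, eps))
```

The determinant of the Hessian at the origin equals ε², and its trace equals 2ε, so the origin is a minimum for ε > 0 and a maximum for ε < 0, in agreement with the text.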

Returning to the original problem of the periodic trajectory, we come to the following conclusions (which, of course, are not proven, since we threw out terms of degree higher than three, but which can be justified):

1. At the moment of passage through the resonance 3:1 a periodic trajectory generally loses its stability.

2. For values of the parameter close to resonance there is an unstable periodic trajectory near the periodic trajectory under consideration on the same energy level manifold. It is closed after making three circulations along the original trajectory and one revolution around it. For the resonance value of the parameter, this unstable trajectory merges with the original one.

3. The distance of this unstable periodic trajectory from the original decreases, as we approach resonance, to first order in the frequency deviation (i.e., as the first order of the difference of the parameter from the resonance value).

4. Through this unstable trajectory on the same three-dimensional energy level manifold there pass two two-dimensional invariant surfaces, filled with trajectories approximating this unstable periodic trajectory as t → ∞ on one surface and as t → − ∞ on the other.

5. The location of the separatrices is such that, by intersecting with a manifold transversal to the original trajectory, we obtain a figure close to the three sides of an equilateral triangle and their continuations. The vertices of the triangle are the points of intersection of the unstable periodic trajectory with the transversal manifold.

6. For initial conditions inside the triangle formed by the separatrices, a phase point stays near the original periodic trajectory (at a distance of order ε) for a long time (of order not less than 1/ε), and for initial conditions outside the triangle it goes off quite rapidly to a distance which is large in comparison with ε.

E Splitting of separatrices

In reality, the separatrices we talked about in statements 4, 5, and 6 above have a very complicated structure (because of the influence of the terms of order higher than three which we disregarded in our approximation). In order to understand the situation, it is convenient to look at a two-dimensional surface transversally intersecting the original closed trajectory at some point on it (and lying entirely in one energy level manifold).¹⁰⁶ Trajectories beginning on this surface intersect it again after a time close to the time of circulation around the original closed trajectory. Thus we have a mapping of a neighborhood of the point of intersection of the closed trajectory with the surface onto a part of the surface. This mapping has a fixed point (at the point where the closed trajectory intersects the surface) and is approximately a rotation by 120° around this point, which we take for the origin in our surface.

We now consider the third power of the mapping indicated above. This is again a mapping of some neighborhood of the origin to a part of the surface, leaving the origin fixed. But now this mapping is approximately rotation by 360°, i.e., the identity: it is realized by the trajectories of our system after approximately three periods of our closed trajectory.

The calculations above give nontrivial information about the structure of this “mapping after three periods.” In fact, by throwing out the terms of degree four and higher in the hamiltonian function, we change the terms of degree three and higher of the mapping. Therefore, the mapping after three periods which corresponds to the truncated hamiltonian function approximates (with cubic error) the actual mapping after three periods.

But we know the properties of the mapping after three periods corresponding to the truncated hamiltonian function, since it is the mapping of the phase flow of the system with hamiltonian function H₀(x, y) after time 6π (the proof is based on the fact that after time 6π our rotating coordinate system returns to the original position). We now look at which of these properties are preserved for perturbations of third-order smallness relative to the distance from the fixed point, and which are not.

We let A₀ denote the mapping after three periods for the truncated system, and A the actual mapping after three periods.

The mapping A₀ is included in a flow: it is the transformation after time 6π in the phase flow with hamiltonian H₀.

There is no reason to think that the mapping A is included in a flow.

The mapping A₀ is symmetric under a rotation by 120°: there is a nontrivial diffeomorphism g for which g³ = E and which commutes with A₀.

There is no reason to think that the mapping A commutes with any nontrivial diffeomorphism g satisfying g³ = E.

The mapping A₀ has three unstable fixed points at a distance ε from the origin, approximately the vertices of an equilateral triangle. For sufficiently small deviations from resonance (i.e., for sufficiently small ε) the mapping A also has three unstable fixed points near the vertices of an equilateral triangle. This follows from the implicit function theorem.

The separatrices of fixed points of the mapping A₀ form, for values of the parameter close to (but not at) resonance, a figure approximating the sides and extended sides of an equilateral triangle. If we begin with a point on one of the sides of the triangle, then after repeated applications of A₀ we obtain a sequence of points on the same side of the triangle approaching one of the vertices bounding the side, say M₀. Applying the inverse mapping A₀⁻¹, we obtain a sequence approaching the other vertex, which we will denote by N₀.

Each of the three unstable fixed points of the mapping A also has separatrices approximating the sides of a triangle (Figure 240). Namely, those points of the plane which approach the fixed point M after applying the mappings Aⁿ, n → + ∞, form a smooth curve Γ⁺ invariant under A, passing through M and, near M, close to the side M₀N₀ of the separatrices of A₀. The points which approach N after applications of Aⁿ, where n → − ∞, form another smooth invariant curve Γ⁻, passing through N and also near M₀N₀ near N₀.

Figure 240. Splitting of separatrices

However the two curves Γ⁺ and Γ⁻, both near the line M₀N₀, are not at all obliged to coincide. This is the phenomenon of splitting of separatrices, which accounts for the differing behavior of the trajectories of the truncated and total systems.
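The properties of the truncated mapping A₀ listed above (inclusion in a flow, symmetry under rotation by 120°, unstable fixed points at the vertices of the triangle) can be checked numerically. The sketch below is an illustration only: it assumes H₀ = (ε/2)(x² + y²) + x³ − 3xy², the sign convention ẋ = ∂H₀/∂y, ẏ = −∂H₀/∂x, and a fixed-step Runge-Kutta integrator; all names are hypothetical. Nothing of this kind can be expected to hold exactly for the actual mapping A.

```python
import math

def field(x, y, eps):
    """Hamiltonian vector field of the assumed H0 (sign convention assumed)."""
    hx = eps * x + 3 * x * x - 3 * y * y   # dH0/dx
    hy = eps * y - 6 * x * y               # dH0/dy
    return hy, -hx

def flow(x, y, eps, t_total, steps=6000):
    """Classical fourth-order Runge-Kutta integration over time t_total."""
    h = t_total / steps
    for _ in range(steps):
        k1 = field(x, y, eps)
        k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1], eps)
        k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1], eps)
        k4 = field(x + h * k3[0], y + h * k3[1], eps)
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, y

def map_A0(x, y, eps):
    """The truncated 'mapping after three periods': the flow of H0 for time 6*pi."""
    return flow(x, y, eps, 6 * math.pi)

def rotate120(x, y):
    """The diffeomorphism g: rotation by 120 degrees, so that g^3 is the identity."""
    c, s = -0.5, math.sqrt(3) / 2
    return c * x - s * y, s * x + c * y

eps, point = 0.3, (0.02, 0.01)
print(map_A0(*rotate120(*point), eps))   # A0(g(point))
print(rotate120(*map_A0(*point, eps)))   # g(A0(point)), equal up to rounding
print(map_A0(-eps / 3, 0.0, eps))        # a vertex of the triangle stays fixed
```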

The magnitude of the splitting of separatrices is exponentially small for small ε; therefore it is easy to overlook the phenomenon of splitting in calculations in one or another scheme of “perturbation theory.” However, this phenomenon is very important in fundamental questions. For example, its existence immediately implies the divergence of the series in numerous versions of perturbation theory (since if the series converged, there would be no splitting).

In general, the divergence of series in perturbation theory (while a good approximation is given by a few initial terms) is usually related to the fact that we are looking for an object which does not exist. If we try to fit a phenomenon to a scheme which actually contradicts the essential features of the phenomenon, then it is not surprising that our series diverge.

The Birkhoff series (which are obtained if one continues infinitely the normalizations of the initial terms of the Taylor series of the hamiltonian function) are one example of a formally convergent, but actually divergent, scheme of perturbation theory. If these series converged, then a general oscillating system with one degree of freedom with periodic coefficients would be reduced near an equilibrium position to an autonomous normal form and there would be no splitting of separatrices in it (whereas in fact there is).

Returning to the original closed trajectory, we see that the three unstable fixed points of the mapping A correspond to an unstable closed trajectory lying near the original trajectory traversed three times. There is a family of trajectories approaching this unstable trajectory as t → + ∞, and another family of trajectories approaching the unstable one as t → − ∞. The points of the trajectories of each of these families form a smooth surface containing our unstable trajectory.

These two surfaces are also the separatrices we talked about in statements 4, 5, and 6 of Section D. By intersecting them with our transversal surface we obtain the invariant curves Γ⁺ and Γ⁻ of the mapping A. The intersections of these two curves form a complicated network about which H. Poincaré, who first discovered the phenomenon of splitting of separatrices, wrote, “The intersections form a type of lattice, tissue, or grid with infinitely fine mesh. Neither of the two curves must ever cut across itself again, but it must bend back upon itself in a very complex manner in order to cut across all of the squares in the grid an infinite number of times.

“One will be struck by the complexity of this figure, which I shall not even attempt to draw. Nothing is more suitable for providing us with an idea of the complex nature of the three-body problem, and of all the problems of dynamics in general, where there is no uniform integral and where the Bohlin series are divergent.” (H. Poincaré, “Les Méthodes Nouvelles de la Mécanique Céleste,” Vol. III, Dover, 1957, 389.)

We should note that much is still unclear about the picture of intersecting separatrices.

F Resonances of higher order

Resonances of higher order can also be studied using a normal form. In this connection, we note that resonances of order higher than 4 do not usually induce instability, since in the normal form terms of degree 4 appear, guaranteeing a minimum or maximum of the function H₀ even at resonance.

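In the case of resonance of order n > 4, the typical development of the phase portrait of the system with hamiltonian function H₀ is given by the formula

$$H_0 = \varepsilon\tau + \tau^2\alpha(\tau) + a\,\tau^{n/2}\sin n\varphi, \qquad 2\tau = p^2 + q^2, \quad \alpha(0) = \pm 1,$$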

and consists of the following (Figure 241).

Figure 241. Averaged hamiltonian of phase oscillations near resonance 5:1

For small (of order ε) deviations of the frequency from resonance, and at a small (of order √ε) distance from the equilibrium position at the origin, the function H₀ has 2n critical points near the vertices of a regular n-gon with center at the origin. Half of these critical points are saddle points, and the other half are maxima if the origin is a minimum or minima if the origin is a maximum. The saddle points and stable points alternate. All n saddle points lie on one level of the function H₀; their separatrices, connecting successive saddle points, form n “islands,” each of which is filled with closed phase curves encircling a stable point. The width of the islands is of order ε^{(n/4)−(1/2)}. The closed phase curves inside each island are called “phase oscillations” (since what varies essentially is the phase of the oscillations around the origin). The period of the phase oscillations grows with decreasing frequency deviation ε like ε^{−n/4}.

Inside the narrow ring formed by the islands, closer to the origin, there are closed phase curves encircling the origin; outside the ring the phase curves are closed, but motion along them proceeds in the direction opposite to that inside the ring. We note that the radius of the ring has order √ε independently of the order of resonance, if this order is greater than 4. Also, the ring of islands exists for only one of the two signs of ε.
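The ring of critical points can be located numerically for a model hamiltonian. The sketch below assumes the simplified form H₀ = ετ + τ² + aτ^{n/2} sin nφ (that is, a coefficient of τ² identically equal to 1), which is an illustrative choice and not a formula taken from the text; the function name is hypothetical. Critical points require cos nφ = 0, and the radial equation is solved by bisection on 0 < τ < |ε|.

```python
import math

def ring_critical_points(n, eps, a=1.0):
    """Critical points (radius, phi, kind) of the assumed model
    H0 = eps*tau + tau**2 + a*tau**(n/2)*sin(n*phi), searched for 0 < tau < |eps|."""
    points = []
    for m in range(2 * n):
        phi = (math.pi / 2 + m * math.pi) / n   # cos(n*phi) = 0
        s = 1.0 if m % 2 == 0 else -1.0         # value of sin(n*phi)
        # Radial condition dH0/dtau = 0 along this ray.
        f = lambda tau: eps + 2 * tau + 0.5 * n * a * s * tau ** (n / 2 - 1)
        lo, hi = 0.0, abs(eps)
        if f(lo) * f(hi) > 0:
            continue                            # no root near the origin on this ray
        for _ in range(200):                    # plain bisection
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        tau = 0.5 * (lo + hi)
        # The mixed derivative vanishes here, so the Hessian is diagonal in (tau, phi).
        h_tt = 2 + 0.5 * n * (n / 2 - 1) * a * s * tau ** (n / 2 - 2)
        h_pp = -n * n * a * s * tau ** (n / 2)
        kind = "saddle" if h_tt * h_pp < 0 else "extremum"
        points.append((math.sqrt(2 * tau), phi, kind))
    return points

for r, phi, kind in ring_critical_points(5, -0.01):
    print(round(r, 4), round(phi, 4), kind)     # radii close to sqrt(0.01) = 0.1
print(ring_critical_points(5, 0.01))            # [] : no ring for the other sign
```

For n = 5 and ε = −0.01 the sketch finds ten critical points, alternately saddles and extrema, at radii close to √|ε|; for ε = +0.01 it finds none, which illustrates that the ring exists for only one sign of ε.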

If we pass from the truncated system with hamiltonian H₀ to the total system, the separatrices split in a way similar to that described above for resonance of order 3. The size of the splitting of the separatrices is exponentially small (of order e^{−1/ε^{n/4}}), but the splitting is of fundamental importance for investigating stability, especially in the multi-dimensional case.

Returning to our original closed trajectory, we have the following picture. As we approach resonance along the ε axis from one side,¹⁰⁷ two periodic trajectories split off from our periodic trajectory: a stable one and an unstable one. These new trajectories close up after n circulations along the original trajectory and lie at a distance of order √ε from the original trajectory. Near the stable trajectory there is a zone of slow phase oscillations with period of order ε^{−n/4} and amplitude of order π/n in the azimuthal direction and of order ε^{(n/4)−(1/2)} in the radial direction. Loss of stability of the original periodic trajectory at the moment of passage through resonance does not occur, at least in the approximation which we have considered.

The case of resonance of fourth order is somewhat exceptional. In this case, in the normal form there are both resonant and non-resonant terms of order 4. The shape of the phase curves of the truncated system depends on which of these terms of the normal form dominates, a resonant one or a non-resonant one. In the first case the development is the same as for third-order resonance, except that in place of a triangle there is a square. In the second case the development is the same as for n > 4.

In conclusion, we remark that the given normal form becomes a better approximation as we get closer to resonance and as the deviation of the initial point from the periodic trajectory gets smaller. That is, as the period of the closed trajectory and the period of oscillation of neighboring trajectories near it become more exactly commensurable, and as the initial condition approaches the closed trajectory, the interval of time grows on which our approximation accurately describes the behavior of the phase curves.

No conclusion about the behavior of non-closed phase curves on infinite intervals of time (for example, about the Liapunov stability of the original periodic trajectory) follows from our arguments, since the terms of higher order which were thrown out in reducing to normal form can, over an infinite period of time, completely change the character of the motion. Actually, under the conditions considered, the original periodic trajectory is Liapunov stable, but the proof requires substantially new techniques beyond the Birkhoff normal form (cf. Appendix 8).

¹⁰⁴ Cf. H. Poincaré, Les Méthodes Nouvelles de la Mécanique Céleste, Vol. 1, Dover, 1957.

¹⁰⁵ The method indicated here is useful not only in investigating hamiltonian systems, but also in the general theory of differential equations. Cf., for example, V. I. Arnold, “Lectures on bifurcations and versal families,” Russian Math. Surveys 27, No. 5, 1972, 54–123.

¹⁰⁶ Here we have the following general phenomenon: it is easier to think about mappings after a period, and easier to calculate with flows.

¹⁰⁷ Unlike resonance of order 3, for which there is an unstable periodic trajectory branching off from both sides of the resonance.

Appendix 8: Theory of perturbations of conditionally periodic motion, and Kolmogorov’s theorem

The collection of solvable “integrable” problems which we have at our disposal is not large (one-dimensional problems, motion of a point in a central field, eulerian and lagrangian motions of a rigid body, the problem of two fixed centers, and motion along geodesics on the ellipsoid). However, with the help of these “integrable cases,” we can obtain meaningful information about motions of many important systems by considering an integrable problem as a first approximation.

An example of such a situation is the problem of motion of the planets around the sun under the law of universal gravitation. The mass of the planets is approximately 0.001 of the mass of the sun, so in a first approximation we can disregard the interaction of the planets on one another and consider only the attraction by the sun. As a result, we obtain the exactly integrable problem of the motion of non-interacting planets around the sun; each planet will describe its keplerian ellipse independently of the others, and the motion of the system as a whole will be conditionally periodic. If we now consider the interactions of the planets on one another, the keplerian motion of each planet will be slightly changed.

We call upon the theory of perturbations from celestial mechanics to study this interaction. It is clear that calculations for time of the order of 1,000 years do not present any fundamental difficulties. However, if we want to study longer intervals of time, and especially if we are interested in qualitative questions about the behavior of exact solutions of the equations of motion on an infinite time interval, then fundamental difficulties do arise. The accumulation of perturbations after an interval of time which is large in comparison to 1,000 years could cause a complete change in the character of the motion: for example, the planets could fall into the sun, escape from it, or collide with one another.

Note that the question of the behavior of solutions of the equations of motion on an infinite time interval has only an indirect relation to the problem of the motion of real planets. The reason is that, after intervals of billions of years, small non-conservative effects not considered in Newton’s equations become important. Thus, the effects of the gravitational interaction of the planets are of real importance only when they seriously change the picture of motion within a finite time which is small in comparison with the time of development of non-conservative effects.

In calculating motion over such finite times, computers prove to be very useful, quickly determining the motion of the planets for many thousands of years in the future or past. However, we should note that even the application of modern calculating methods may be insufficient to predict the influence of perturbations if a phase point falls in the zone of exponential instability.

Asymptotic and qualitative methods have even greater value for the study of charged particles in magnetic fields, since in this situation a particle outstrips the computer and makes so many orbits that mechanical calculation of its trajectory is impossible even in the absence of exponential instability.

A whole series of methods has been devised for calculating perturbations in celestial mechanics. (A detailed analysis of them can be found in the book, “Les Méthodes Nouvelles de la Mécanique Céleste,” by H. Poincaré, Dover, 1957.)

A difficulty with all of these methods is that they lead to divergent series and therefore give no information about the behavior of motion as a whole over infinite intervals of time. The reason for the divergence of series in the theory of perturbations is “small denominators”: integral linear combinations of frequencies of unperturbed motions by which it is necessary to divide in calculating the influence of perturbations. For exact resonance (i.e., for commensurable frequencies) these denominators vanish, and the corresponding term of the series in the theory of perturbations becomes infinitely large. Close to resonance, this term of the series is very large.

Thus, for example, in their motion around the sun, Jupiter and Saturn go through approximately 299 and 120.5 seconds of arc per day, respectively. Therefore, the denominator 2ω_J − 5ω_S is very small in comparison with each of their frequencies. This gives rise to a large long-period perturbation of the planets on one another (its period is about 800 years); Laplace’s study of this effect was one of the first successes of the theory of perturbations.
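The size of this denominator and the period of the resulting perturbation can be checked with a few lines of arithmetic. The sketch below is only illustrative: it uses the two daily motions quoted above, assumes a year of 365.25 days, and all variable names are hypothetical.

```python
ARCSEC_PER_REVOLUTION = 360 * 3600  # seconds of arc in a full circle

# Mean daily motions quoted in the text, in seconds of arc per day.
omega_jupiter = 299.0
omega_saturn = 120.5

# The small denominator 2*omega_J - 5*omega_S, in seconds of arc per day.
small_denominator = 2 * omega_jupiter - 5 * omega_saturn

# Size of the denominator relative to each of the two frequencies.
ratio_to_jupiter = abs(small_denominator) / omega_jupiter
ratio_to_saturn = abs(small_denominator) / omega_saturn

# Period of the slowly varying combination, converted to years
# (365.25 days per year is an assumption made for this sketch).
period_days = ARCSEC_PER_REVOLUTION / abs(small_denominator)
period_years = period_days / 365.25

print(small_denominator, ratio_to_jupiter, ratio_to_saturn, period_years)
```

With these rounded inputs the denominator is a few percent of either frequency, and the period comes out slightly under 800 years, in agreement with the figure quoted above.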

We note that the difficulty caused by small denominators is essential. The rational numbers form a dense set; thus in the phase space of an unperturbed problem, initial conditions for which we have resonance and the small denominators vanish form a dense set. Hence, the functions given by the series of perturbation theory have a dense set of singular points.

The difficulty mentioned here is characteristic not only for problems of celestial mechanics, but for all problems which are close to integrable (for instance, for the problem of an asymmetrical rigid top under very fast rotation). Poincaré himself called the problem of studying perturbations of conditionally-periodic motions in a system given by the hamiltonian

$$H = H_0(I) + \varepsilon H_1(I, \varphi), \qquad \varepsilon \ll 1,$$

in action-angle variables I and φ, the fundamental problem of dynamics. Here H₀ is the hamiltonian of the unperturbed problem, and εH₁ a perturbation which is a 2π-periodic function of the angle variables φ₁,..., φ_n. In the unperturbed problem (ε = 0) the angles φ change uniformly with constant frequencies

$$\omega_k = \frac{\partial H_0}{\partial I_k},$$

and all the action variables are first integrals.

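We must investigate the phase curves of Hamilton’s equations

$$\dot\varphi = \frac{\partial H}{\partial I}, \qquad \dot I = -\frac{\partial H}{\partial \varphi}$$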

in a phase space which is a direct product of a region in n-dimensional space with coordinates I and the n-dimensional torus with angular coordinates φ.

A substantial advance in the study of phase curves of this perturbed problem was begun in 1954 with the work of A. N. Kolmogorov in “On conservation of conditionally-periodic motions for a small change in Hamilton’s function,” Dokl. Akad. Nauk SSSR 98:4 (1954) 525–530 (Russian). In this appendix we present the basic results obtained since then in this area. The proofs can be found in the following works:

V. I. Arnold, “Small denominators I, Mapping the circle onto itself,” Izv. Akad. Nauk SSSR Ser. Mat. 25 (1961), 21–86.
V. I. Arnold, “Small denominators II, Proof of a theorem of A. N. Kolmogorov on the preservation of conditionally periodic motions under a small perturbation of the Hamiltonian,” Russian Math. Surveys 18:5 (1963).
V. I. Arnold, “Small denominators III. Small denominators and problems of stability of motion in classical and celestial mechanics.” Russian Math. Surveys 18:6 (1963).
V. I. Arnold, A. Avez, Ergodic problems of classical mechanics, New York, Benjamin, 1968.
J. Moser, On invariant curves of area-preserving mappings of an annulus, Nachr. Akad. Wiss. Göttingen, Math.-Phys. Kl. IIa (1962), 1–20.
J. Moser, A rapidly converging iteration method and nonlinear differential equations, Annali della Scuola Norm. Sup. di Pisa (3) 20 (1966), 265–315; 499–535.
J. Moser, Convergent series expansions for quasi-periodic motions, Math. Ann. 169 (1967), 136–176.
C. L. Siegel, J. K. Moser, Lectures on Celestial Mechanics, Springer-Verlag, 1971.
S. Sternberg, Celestial Mechanics, I, II, New York, Benjamin, 1969.

Before formulating our results, we will briefly discuss the behavior of phase curves in the unperturbed problem already studied in Chapter 10.

A Unperturbed motion

The system with hamiltonian H₀(I) has n first integrals in involution (the n action variables). Every level set of all these integrals is an n-dimensional torus in 2n-dimensional phase space. This torus is invariant with respect to the phase flow of the unperturbed system: every phase curve starting at a point of our torus remains on it.

The motion of a phase point on the invariant torus I = const is conditionally-periodic. The frequencies of this motion are the derivatives of the unperturbed hamiltonian with respect to the action variables:

$$\dot\varphi = \omega(I), \qquad \omega(I) = \frac{\partial H_0}{\partial I}.$$

Therefore, the phase curve densely fills a torus whose dimension is equal to the number of frequencies ω_k which are arithmetically independent.

We note that the frequencies depend on which torus we are looking at; i.e., which values of the first integrals we have fixed. A system of n functions ω of n variables I is generally functionally independent; in such a case we can simply number the tori by their frequencies, choosing the variables ω for coordinates in a neighborhood of the point under consideration in the space of action variables I.

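The case when the frequencies are functionally independent will be called the nondegenerate case. The conditions for nondegeneracy have the form

$$\det\left|\frac{\partial \omega}{\partial I}\right| = \det\left|\frac{\partial^2 H_0}{\partial I^2}\right| \neq 0.$$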

Thus, in the nondegenerate case, the unperturbed problem determines on the different invariant tori in phase space conditionally-periodic motions with different frequencies. In particular, the invariant tori on which the number of frequencies is maximal (i.e., n) form a dense set in phase space; such tori are called non-resonant tori.

It can be shown that the non-resonant tori form a set of full measure, i.e., the Lebesgue measure of the union of all invariant resonant tori of the unperturbed non-degenerate system is equal to zero. Nevertheless, invariant resonant tori exist and are mixed in with the non-resonant tori in such a way that they too form a dense set. Furthermore, the set of resonant tori with any number of independent frequencies from 1 to n − 1 is dense. In particular, the invariant tori on which all phase curves are closed (the number of independent frequencies is 1) form a dense set. Nevertheless, we note that the probability of landing on a resonant torus by a random choice of initial point in the phase space of the unperturbed system, is equal to zero (since the probability of landing on a rational number by a random choice of a real number is zero). Thus, by disregarding sets of measure zero, we can say that almost all invariant tori in a nondegenerate unperturbed system are non-resonant and have a total set of n arithmetically independent frequencies.

On a non-resonant torus, the trajectory of a conditionally-periodic motion is dense. Thus, for almost all initial conditions, a phase curve of a non-degenerate unperturbed system densely fills an invariant torus whose dimension is equal to the number of degrees of freedom (i.e., half the dimension of the phase space).

To better understand the whole picture, we consider the case of two degrees of freedom (n = 2). In this case, the phase space is four-dimensional so each energy level set is three-dimensional. We fix one such level set. This three-dimensional manifold, fibered by two-dimensional tori, can be represented in ordinary three-dimensional space as a family of concentric tori lying inside one another (Figure 242).

Figure 242. Invariant tori in a three-dimensional energy level manifold

The phase curves are windings of these tori; both frequencies of circulation change from torus to torus. In general, not only both frequencies but also their ratio will change from torus to torus. If the derivative of the ratio of frequencies with respect to the action variable numbering the tori on the given level set of the function H₀ is not zero, then we say that our system is isoenergetically nondegenerate. The condition for isoenergetic nondegeneracy has (as is easy to calculate) the form

$$\det\begin{pmatrix} \dfrac{\partial^2 H_0}{\partial I^2} & \dfrac{\partial H_0}{\partial I} \\[4pt] \dfrac{\partial H_0}{\partial I} & 0 \end{pmatrix} \neq 0.$$

The conditions for nondegeneracy and isoenergetic nondegeneracy are independent from one another; i.e., a nondegenerate system could be isoenergetically degenerate, and an isoenergetically nondegenerate system could be degenerate. In the many-dimensional case (n > 2) isoenergetic nondegeneracy means nondegeneracy of the following mapping of the (n − 1)-dimensional level manifold of the function H₀ of n action variables to the projective space of dimension n − 1:

$$I \mapsto \bigl(\omega_1(I) : \omega_2(I) : \cdots : \omega_n(I)\bigr).$$

Now consider an isoenergetically nondegenerate system with two degrees of freedom. It is easy to construct a two-dimensional plane in the three-dimensional energy level set transversally intersecting the two-dimensional tori of our family (in a family of concentric circles in the model in three-dimensional euclidean space).

A phase curve beginning in such a plane returns to it after making a circuit around the torus. As a result we obtain a new point on the same circle in which the torus intersects the plane. In this way there arises a mapping of the plane to itself.

This mapping of the plane to itself fixes the concentric meridian circles in which the plane intersects the invariant tori. Every circle is rotated through some angle, namely through that fraction of an entire revolution that the frequency along the meridian constitutes of the frequency along the equator.

If the system is isoenergetically nondegenerate, the angle of revolution of invariant circles in the plane of intersection changes from one circle to another. Therefore, on some circles this angle will be commensurable with a whole revolution, and on others it will be incommensurable. Each of these classes of circles will form a dense set, but on almost all circles (in the sense of Lebesgue measure) the angle of rotation will be incommensurable with a whole revolution.

The commensurability or incommensurability is manifested in the following way on the behavior of points of a circle under the mapping of the region to itself. If the angle of rotation is commensurable with a whole rotation, then after several iterations of the mapping the point will return to its initial position (the number of iterations will be larger as the denominator of the fraction expressing the angle of rotation is larger). If the angle of rotation is incommensurable with a whole rotation, the successive images of the point under repetitions of the mapping will densely fill up the meridian circle.
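The two kinds of behavior can be illustrated with a rigid rotation of the circle, measured in fractions of a whole revolution. This is a minimal sketch under stated assumptions: the function names and the sample angles (3/7, 5/12, and the golden mean) are hypothetical choices, and the irrational case is only approximated in floating point.

```python
from fractions import Fraction
import math


def return_time(rotation):
    """Iterations needed for a point to return to its starting position
    when each step rotates it by `rotation` (a Fraction of a revolution)."""
    position = Fraction(0)
    steps = 0
    while True:
        position = (position + rotation) % 1
        steps += 1
        if position == 0:
            return steps


def largest_gap(rotation, iterations):
    """Largest arc (as a fraction of the circle) not yet visited by the
    first `iterations` images of a point under the given rotation."""
    points = sorted((k * rotation) % 1.0 for k in range(iterations))
    gaps = [b - a for a, b in zip(points, points[1:])]
    gaps.append(1.0 - points[-1] + points[0])  # arc closing the circle
    return max(gaps)


golden = (math.sqrt(5) - 1) / 2  # a standard incommensurable angle

print(return_time(Fraction(3, 7)), return_time(Fraction(5, 12)))
print(largest_gap(golden, 100), largest_gap(golden, 1000))
```

For the commensurable angles the return time equals the denominator of the fraction, while for the golden-mean angle the largest unvisited arc keeps shrinking as the number of iterations grows.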

We note further that commensurability corresponds to resonant tori and incommensurability to non-resonant tori. Also, the existence of resonant tori implies the following property. Consider some power of the mapping of our region to itself induced by the phase curves. Let the exponent be the denominator of the fraction expressing the ratio of the frequencies on one of the resonant tori. Then the mapping raised to the indicated power has a whole circle consisting entirely of fixed points (namely, the meridian of the resonant torus under consideration).

Such behavior of fixed points is unnatural for mappings in any sort of general form, even canonical mappings (fixed points are usually isolated). In the given case, a whole circle of fixed points arises because we have considered an unperturbed integrable system. For arbitrarily small perturbations of general form, this property of the mapping (having a whole circle of fixed points) must fail. The circle of fixed points must be dispersed so that only a finite number remain.

In other words, under small perturbations of our integrable system, we expect a change in the qualitative picture of the phase curves, if only in the respect that entire invariant tori filled out by closed phase curves will disintegrate so that there remain only a finite number of closed curves, near those for the unperturbed system, and the remaining phase curves will be more complicated. We have already encountered such a case in Appendix 7 in investigating phase oscillations near resonance.

We now consider what happens to non-resonant invariant tori under a small perturbation of a hamiltonian function. Formal application of the principle of averaging (i.e., the first approximation of the classical theory of perturbations, cf. Section 52) leads us to the conclusion that a non-resonant torus does not undergo any evolution.

We note that the fact that the perturbations are hamiltonian is essential, since for non-conservative perturbations it is clear that the action variables may evolve. In celestial mechanics, their evolution means a secular change in the major semi-axes of the keplerian ellipses, i.e., the planets falling into the sun, colliding, or escaping to a large distance in a time which is inversely proportional to the size of the perturbation. If conservative perturbations led to evolutions in a first approximation, this would manifest itself in the fate of the planets after a time on the order of 1,000 years. Fortunately, the order of magnitude of the non-conservative perturbations is much less.

The theorem of Kolmogorov, formulated below, furnishes one justification for the conclusion, drawn from the non-rigorous theory of perturbations, about the absence of evolution of action variables.

B Invariant tori in a perturbed system

Theorem. If an unperturbed system is nondegenerate, then for sufficiently small conservative hamiltonian perturbations, most non-resonant invariant tori do not vanish, but are only slightly deformed, so that in the phase space of the perturbed system, too, there are invariant tori densely filled with phase curves winding around them conditionally-periodically, with a number of independent frequencies equal to the number of degrees of freedom.

These invariant tori form a majority in the sense that the measure of the complement of their union is small when the perturbation is small.

A. N. Kolmogorov’s proof of this theorem is based on the following two observations.

We fix a non-resonant set of frequencies of the unperturbed system so that the frequencies are not only independent, but do not even approximately satisfy any resonance conditions of low order. More precisely, we fix a set of frequencies ω for which there exist C and ν such that |(ω, k)| > C|k|^{−ν} for all integral vectors k ≠ 0.

It can be shown that, if ν is sufficiently large (say ν = n + 1), then the measure of the set of such vectors ω (lying in a fixed bounded region) for which the indicated condition of non-resonance is violated, is small when C is small.
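The non-resonance condition can be probed numerically for a given frequency vector, although a finite search can only suggest, never prove, that it holds for all k. The sketch below makes several assumptions: the norm |k| is taken to be |k₁| + ... + |k_n|, the search is cut off at a hypothetical order `max_order`, and the two sample frequency vectors are illustrative choices.

```python
import itertools
import math


def diophantine_margin(omega, nu, max_order):
    """Minimum of |(omega, k)| * |k|**nu over integer vectors k != 0
    with |k| = |k_1| + ... + |k_n| <= max_order.

    A positive margin means the condition |(omega, k)| > C |k|**(-nu)
    holds on the searched range for every C below the margin."""
    n = len(omega)
    best = math.inf
    for k in itertools.product(range(-max_order, max_order + 1), repeat=n):
        order = sum(abs(c) for c in k)
        if order == 0 or order > max_order:
            continue
        dot = abs(sum(w * c for w, c in zip(omega, k)))
        best = min(best, dot * order ** nu)
    return best


n = 2
nu = n + 1
golden = (1 + math.sqrt(5)) / 2

non_resonant = diophantine_margin((1.0, golden), nu, max_order=40)
resonant = diophantine_margin((1.0, 0.5), nu, max_order=40)
print(non_resonant, resonant)
```

For the frequency ratio equal to the golden mean the margin stays well away from zero on the searched range, whereas for the commensurable pair (1, 0.5) the vector k = (1, −2) gives an exact resonance and the margin vanishes.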

Next, near a non-resonant torus of the unperturbed system corresponding to a fixed value of the frequencies, we will look for an invariant torus of the perturbed system on which there is conditionally-periodic motion with exactly the same frequencies as the ones we fixed, and which necessarily satisfy the condition of being non-resonant described above.

In this way, instead of the variations of frequency customary in perturbation schemes (consisting of the introduction of frequencies depending on the perturbation), we must hold constant the non-resonant frequencies, while selecting initial conditions depending on the perturbation in order to guarantee motion with the given frequencies. This can be done by a small (when the perturbation is small) change of initial conditions, because the frequencies change with the action variables according to the non-degeneracy condition.

The second observation is that, to find an invariant torus, instead of using the usual series expansion in powers of the perturbation parameter, we can use a rapidly convergent method similar to Newton’s method of tangents.

Newton’s method of tangents for finding roots of algebraic equations with initial error ε gives, after n approximations, an error of order ε^{2^n}. Such super-convergence allows us to paralyze the influence of the small denominators appearing in every approximation, and in the end succeeds not only in carrying out an infinite number of approximations, but also in showing the convergence of the entire procedure.
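The squaring of the error at each step is easy to observe on a scalar equation. The following sketch applies the method of tangents to x² − 2 = 0; the equation, the starting point 1.5, and the function name are hypothetical choices made for illustration, and only three steps are taken so that the errors stay above floating-point round-off.

```python
import math


def newton_sqrt2_errors(x0, steps):
    """Errors |x_n - sqrt(2)| of Newton's method applied to x**2 - 2 = 0."""
    root = math.sqrt(2.0)
    x = x0
    errors = [abs(x - root)]
    for _ in range(steps):
        x = x - (x * x - 2.0) / (2.0 * x)  # one tangent (Newton) step
        errors.append(abs(x - root))
    return errors


errors = newton_sqrt2_errors(1.5, 3)
for n, err in enumerate(errors):
    print(n, err)
```

Each error is bounded by the square of the previous one, so the number of correct digits roughly doubles per step. This is the kind of convergence that can outrun the growth of the small denominators, in contrast with a series in powers of the perturbation parameter.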

The assumption under which all this can be done is that the unperturbed hamiltonian function H₀(I) is analytic and nondegenerate, and the perturbing hamiltonian function εH₁(I, φ) is analytic and 2π-periodic in the angle variables φ. The presence of the small parameter ε is immaterial: it is important only that the perturbation be sufficiently small in some complex neighborhood of radius ρ of the real plane of the variables φ (less than some positive function M(ρ, H₀)).

As J. Moser showed, the requirement of analyticity can be changed to differentiability of sufficiently high order if we combine Newton’s method with an idea of J. Nash, the application of a smoothing operator at each approximation.

The resulting conditionally-periodic motions of the perturbed system with fixed frequencies ω turn out to be smooth functions of the parameter ε of perturbation. Therefore, they could have been sought, without Newton’s method, in the form of a series in powers of ε. The coefficients of this series, called the Lindstedt series, can actually be found; however, we can prove its convergence only indirectly, with the help of newtonian approximations.

C Zones of instability

The presence of invariant tori in the phase space of the perturbed problem means that, for most initial conditions in a system which is nearly integrable, motion remains conditionally periodic with a maximal set of frequencies.

The question naturally arises of what happens to the remaining phase curves, with initial conditions falling into the gaps between the invariant tori which replace the resonant invariant tori of the non-perturbed problem.

The disintegration of a resonant torus on which the number of frequencies is one less than the maximum is easy to investigate in a first-order perturbation theory. To do this, we must average the perturbation over the (n − 1)-dimensional invariant tori into which the resonant invariant torus is decomposed and which are densely filled out by phase curves of the unperturbed system. After averaging, we obtain a conservative system with one degree of freedom (cf. the investigation of phase oscillations near resonance in Appendix 7), which is easy to study.

In the approximation under consideration we have, near the n-dimensional reducible torus, stable and unstable (n − 1)-dimensional tori, with phase oscillations around the stable ones. The corresponding conditionally-periodic motions have a full set of n frequencies, of which n − 1 are the fast frequencies of the original oscillations and one is the slow (of order √ε) frequency of the phase oscillations.

However, one must not conclude that the only difference between motions in the unperturbed and perturbed systems is the appearance of “islands” of phase oscillations. In fact, the actual phenomena are much more complicated than the first approximation described above. One manifestation of this complicated behavior of the phase curves of the perturbed problem is the splitting of separatrices discussed in Appendix 7.

To study motions of a perturbed system outside of the invariant tori we must distinguish the cases of two and higher degrees of freedom. For two degrees of freedom, the dimension of the phase space is four, and an energy level manifold is three-dimensional. Therefore, the invariant two-dimensional tori divide each energy level set. Thus, a phase curve beginning in the gap between two invariant tori of the perturbed system remains forever confined between those tori. No matter how complicated this curve appears, it does not leave its gap, and the corresponding action variables remain forever near their initial values.

If the number n of degrees of freedom is greater than two, the n-dimensional invariant tori do not divide the (2n − 1)-dimensional energy level manifold but are arranged in it like points on a plane or lines in space. In this case the “gaps” corresponding to different resonances are connected to one another, so the invariant tori do not prevent phase curves starting near resonance from going far away. Hence, there is no reason to expect that the action variables along such a phase curve will remain close to their initial values for all time.

In other words, under sufficiently small perturbations of systems with two degrees of freedom (satisfying the generally fulfilled condition of isoenergetic nondegeneracy), not only do the action variables along a phase trajectory have no secular perturbations in any approximation of perturbation theory (i.e., they change little in a time interval on the order of (1/ε)^N for any N, where ε is the magnitude of the perturbation), but these variables remain forever near their initial values. This is true, both for non-resonant phase curves conditionally-periodically filling out two-dimensional tori (and comprising most of the phase space), and for the remaining initial conditions.

At the same time, there exist systems with more than two degrees of freedom satisfying all the nondegeneracy conditions, in which, although for most initial conditions motion is conditionally periodic, for some initial conditions a slow drift of the action variables away from their initial values occurs. The average velocity of this drift in known examples¹⁰⁸ is on the order of e^{−1/√ε}, i.e., this velocity decreases faster than any power of the perturbation parameter. Thus it is not surprising that this drifting away does not appear in any approximation of perturbation theory. (By average velocity, we mean the ratio of the increase of action variables to time, so that we are actually dealing with an increase of order 1 after a time of order e^{1/√ε}.)

An upper bound on the average velocity of the drift of the action variables in general nearly integrable systems of hamiltonian equations with n degrees of freedom is included in the recent work of N. N. Nekhoroshev.¹⁰⁹

This bound, like the lower bound mentioned above, has the form e^{−1/ε^d}; thus the increase of the action variables is small while the time is small in comparison with e^{1/ε^d}, if ε < ε₀. Here ε is the magnitude of the perturbation, and d is a number between 0 and 1 defined, like ε₀, by the properties of the unperturbed hamiltonian H₀. In addition, a nondegeneracy condition is imposed on the unperturbed hamiltonian (this condition has a long formulation, but is generally satisfied; in particular, strong convexity of the unperturbed hamiltonian is sufficient, i.e., positive or negative definiteness of the second differential of H₀).
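To see why a quantity of the form e^{−1/ε^d} escapes every finite order of perturbation theory, one can compare it with powers of ε. The sketch below is purely illustrative: the exponent d = 0.5 and the sample values of ε are hypothetical and are not taken from the text, where d and ε₀ depend on the unperturbed hamiltonian.

```python
import math


def drift_bound(eps, d):
    """Value of exp(-1 / eps**d), the form of the bound quoted in the text."""
    return math.exp(-1.0 / eps ** d)


d = 0.5  # hypothetical exponent, chosen only for illustration

for eps in (1e-2, 1e-3, 1e-4):
    bound = drift_bound(eps, d)
    # Powers eps**N that a perturbation expansion of order N could resolve.
    powers = [eps ** N for N in range(1, 6)]
    smaller_than = [N for N, p in zip(range(1, 6), powers) if bound < p]
    print(eps, bound, smaller_than)
```

For the larger sample value of ε the exponential is still bigger than some of the powers, but as ε decreases it drops below every fixed power. This reflects the restriction ε < ε₀ in the statement above.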

From this upper bound it is clear that secular changes of the action variables are not detected by any approximation of perturbation theory, since the average velocity of these changes is exponentially small. We note also that secular changes of the action variables obviously have no directional character, but are represented by more or less random wandering in the resonant regions between the invariant tori. A more detailed discussion of the questions arising here can be found in the article, “Stochastic instability of nonlinear oscillations,” by G. M. Zaslavski and B. V. Chirikov, Soviet Physics Uspekhi, v. 105, no. 1 (1971), 3–39.

D Variants of the theorem on invariant tori

Statements analogous to the theorem on conservation of invariant tori in an autonomous system have been proved for non-autonomous equations with periodic coefficients and for symplectic mappings. Analogous statements are valid in the theory of small oscillations in a neighborhood of an equilibrium position of an autonomous system or a system with periodic coefficients, as well as in a neighborhood of a closed phase curve of a phase flow or in a neighborhood of a fixed point of a symplectic mapping.

The nondegeneracy conditions necessary in the various cases are different. For reference, we will now give these nondegeneracy conditions. We will limit ourselves to the simplest requirements of nondegeneracy, which are all fulfilled by systems in “general position.” In many cases, the requirements of nondegeneracy can be weakened, but the advantage gained by this is offset by the complication of the formulas.

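Autonomous systems. The hamiltonian function is

$$H = H_0(I) + \varepsilon H_1(I, \varphi), \qquad I \in G \subset \mathbb{R}^n, \quad \varphi \bmod 2\pi \in T^n.$$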

The nondegeneracy condition

$$\det\left|\frac{\partial^2 H_0}{\partial I^2}\right| \neq 0$$

guarantees preservation¹¹⁰ of most invariant tori under small perturbations (ε ≪ 1).

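The condition for isoenergetic nondegeneracy

$$\det\begin{pmatrix} \dfrac{\partial^2 H_0}{\partial I^2} & \dfrac{\partial H_0}{\partial I} \\[4pt] \dfrac{\partial H_0}{\partial I} & 0 \end{pmatrix} \neq 0$$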

guarantees the existence on every energy level manifold of a set of invariant tori whose complement has small measure. The frequencies on these tori generally depend on the size of the perturbation, but the ratios of frequencies are preserved under changes in ε.

If n = 2, then the condition for isoenergetic nondegeneracy also guarantees stability of the action variables, in the sense that they remain forever close to their initial values for sufficiently small perturbations.
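The two conditions for autonomous systems are independent of one another, and this can be checked on small examples with n = 2. The sketch below assumes the usual determinant forms of the conditions: the determinant of the matrix of second derivatives of H₀ for nondegeneracy, and the same matrix bordered by the vector of first derivatives (with a zero in the corner) for isoenergetic nondegeneracy. The two hamiltonians and the sample point are hypothetical illustrations, with derivatives entered by hand.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def nondegeneracy(hessian, gradient):
    """Return (Hessian determinant, bordered determinant) for n = 2."""
    (h11, h12), (h21, h22) = hessian
    w1, w2 = gradient
    plain = h11 * h22 - h12 * h21
    bordered = det3([[h11, h12, w1], [h21, h22, w2], [w1, w2, 0.0]])
    return plain, bordered


I1, I2 = 2.0, 3.0  # arbitrary sample point in the space of action variables

# Hypothetical example 1: H0 = I1 + I2**2 / 2.
example_1 = nondegeneracy([[0.0, 0.0], [0.0, 1.0]], [1.0, I2])

# Hypothetical example 2: H0 = log(I1) - log(I2), for I1, I2 > 0.
example_2 = nondegeneracy(
    [[-1.0 / I1**2, 0.0], [0.0, 1.0 / I2**2]],
    [1.0 / I1, -1.0 / I2],
)

print(example_1, example_2)
```

The first example is degenerate but isoenergetically nondegenerate. The second is nondegenerate but isoenergetically degenerate, since the frequency ratio −I₂/I₁ is constant along each level set of H₀.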

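Periodic systems. The hamiltonian function is

$$H = H_0(I) + \varepsilon H_1(I, \varphi, t), \qquad I \in G \subset \mathbb{R}^n, \quad \varphi \bmod 2\pi \in T^n;$$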

the perturbation is 2π-periodic not only in φ, but also in t. It is natural to look at the unperturbed system in the (2n + 1)-dimensional space {(I, φ, t)} = ℝⁿ × T^{n+1}. The invariant tori have dimension n + 1. The nondegeneracy condition

$$\det\left|\frac{\partial^2 H_0}{\partial I^2}\right| \neq 0$$

guarantees the preservation of most (n + 1)-dimensional invariant tori under a small perturbation (ε ≪ 1).

If n = 1, this nondegeneracy condition also guarantees stability of the action variable, in the sense that it remains forever near its initial value for sufficiently small perturbations.

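Mappings (I, φ) → (I′, φ′) of the “2n-dimensional annulus.” The generating function is

$$S(I', \varphi) = S_0(I') + \varepsilon S_1(I', \varphi), \qquad S_1(I', \varphi + 2\pi) = S_1(I', \varphi).$$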

The nondegeneracy condition

$$\det\left|\frac{\partial^2 S_0}{\partial I^2}\right| \neq 0$$

guarantees the preservation of most invariant tori of the unperturbed mapping (I, φ) → (I, φ + ∂S₀/∂I) under small perturbations (ε ≪ 1).

If n = 1, we obtain an area-preserving mapping of the ordinary annulus to itself. The unperturbed mapping is represented on each circle I = const as a rotation. In this case the nondegeneracy condition means that the angle of rotation changes from one circle to another.

The invariant tori in the case n = 1 are ordinary circles. In this case, the theorem guarantees that under iterations of the mapping all the images of a point will remain near the circle on which the original point lay, if the perturbation is sufficiently small.
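The case n = 1 can be explored numerically with a simple perturbed twist mapping. The sketch below is an illustration under stated assumptions, not the construction used in the proofs: the unperturbed mapping is taken to be (I, φ) → (I, φ + I), the perturbation is a hypothetical term ε cos φ in the generating function, and the initial point and the value ε = 0.05 are arbitrary choices. A finite number of iterations can of course only suggest the behavior asserted for all time.

```python
import math


def perturbed_twist_map(I, phi, eps):
    """One step of the area-preserving map generated by
    I'*phi + I'**2 / 2 + eps * cos(phi)  (hypothetical example)."""
    I_new = I + eps * math.sin(phi)            # solves I = dS/dphi for I'
    phi_new = (phi + I_new) % (2 * math.pi)    # phi' = dS/dI'
    return I_new, phi_new


def max_action_deviation(I0, phi0, eps, iterations):
    """Largest |I - I0| observed along the orbit of (I0, phi0)."""
    I, phi = I0, phi0
    worst = 0.0
    for _ in range(iterations):
        I, phi = perturbed_twist_map(I, phi, eps)
        worst = max(worst, abs(I - I0))
    return worst


golden = (math.sqrt(5) - 1) / 2
I0 = 2 * math.pi * golden  # rotation angle far from low-order resonances

unperturbed = max_action_deviation(I0, 0.3, 0.0, 10_000)
perturbed = max_action_deviation(I0, 0.3, 0.05, 10_000)
print(unperturbed, perturbed)
```

In the unperturbed case the point stays exactly on its circle I = const. With the small perturbation switched on, the images of the point remain close to that circle over the whole run.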

Neighborhoods of equilibrium positions (autonomous case). An equilibrium position is assumed to be stable in a linear approximation so that n characteristic frequencies ω₁,..., ω_n are defined. We assume that there are no resonance relations among the characteristic frequencies, i.e., no relations

$$k_1\omega_1 + \cdots + k_n\omega_n = 0 \quad \text{with integers } k_i \text{ such that } 0 < \sum_i |k_i| \le 4.$$

Then the hamiltonian function can be reduced to the Birkhoff normal form (cf. Appendix 7)

$$H = H_0(\tau) + \cdots,$$

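where

$$H_0(\tau) = \sum_k \omega_k \tau_k + \frac{1}{2}\sum_{k,l} \omega_{kl}\,\tau_k \tau_l, \qquad \tau_k = \frac{p_k^2 + q_k^2}{2},$$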

and the dots denote terms of degree higher than four with respect to the distance from the equilibrium position.

The nondegeneracy condition

$$\det|\omega_{kl}| \neq 0$$

guarantees the existence of a set of invariant tori of almost full measure in a sufficiently small neighborhood of the equilibrium position.

The condition for isoenergetic nondegeneracy,

$$\det\begin{pmatrix} \omega_{kl} & \omega_k \\ \omega_l & 0 \end{pmatrix} \neq 0,$$

guarantees the existence of such a set of invariant tori on every energy level set (sufficiently close to the critical point).

In the case n = 2, the condition for isoenergetic nondegeneracy is satisfied if the quadratic part of the function H₀ is not divisible by the linear part. In this case, isoenergetic nondegeneracy guarantees Liapunov stability of the equilibrium position.

Neighborhoods of equilibrium positions (periodic case). Here again we assume stability in a linear approximation, so that n characteristic frequencies ω₁,..., ω_n are defined. We assume that there are no resonance relations

$$k_1\omega_1 + \cdots + k_n\omega_n + k_0 = 0, \qquad 0 < \sum_{i \ge 1} |k_i| \le 4,$$

among the characteristic frequencies and the frequency of the time-dependence of the coefficients (which we will assume equal to 1).

Then the hamiltonian function can be reduced to a Birkhoff normal form in the same way as in the autonomous case, but with 2π-periodicity with respect to time in the remainder term.

The nondegeneracy condition

$$\det|\omega_{kl}| \neq 0$$

guarantees the existence of (n + 1)-dimensional invariant tori in the (2n + 1)-dimensional extended phase space, near the circle τ = 0 representing the equilibrium position.

In the case n = 1 the nondegeneracy condition reduces to the non-vanishing of the derivative of the period of small oscillations with respect to the square of the amplitude of small oscillations. In this case, nondegeneracy guarantees that the equilibrium position is Liapunov stable.

Fixed points of mappings. Here we assume that all 2n eigenvalues of the linearization of a canonical mapping at a fixed point have modulus 1 and do not satisfy any low-order resonance relations of the form:

$$\lambda_1^{k_1} \cdots \lambda_n^{k_n} = 1, \qquad 0 < |k_1| + \cdots + |k_n| \le 4$$

(where the 2n eigenvalues are λ₁, ..., λ_n, λ̄₁, ..., λ̄_n).

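Then if we disregard terms of higher than third order in the Taylor series at the fixed point, the mapping can be written in Birkhoff normal form

$$(\tau, \varphi) \mapsto (\tau, \varphi + \alpha(\tau)), \qquad \alpha(\tau) = \frac{\partial S}{\partial \tau}, \quad S = \sum_k \omega_k \tau_k + \frac{1}{2}\sum_{k,l} \omega_{kl}\,\tau_k \tau_l$$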

(the usual coordinates in a neighborhood of the equilibrium position are p_k = √(2τ_k) cos φ_k, q_k = √(2τ_k) sin φ_k).

The nondegeneracy condition

$$\det|\omega_{kl}| \neq 0$$

guarantees the existence of n-dimensional invariant tori (close to the tori τ = const), forming a set of almost full measure in a sufficiently small neighborhood of the equilibrium position.

If n = 1, we have a mapping of the ordinary plane to itself, and the invariant tori become circles. The nondegeneracy condition means that, for the normal form, the derivative of the angle of rotation of a circle with respect to the area bounded by the circle is not zero (at the fixed point and, therefore, in some neighborhood of it).

In the case n = 1 the nondegeneracy condition guarantees Liapunov stability of the fixed point of the mapping. We note that in this case the condition of absence of lower resonance has the form

$$\lambda^3 \neq 1, \qquad \lambda^4 \neq 1.$$

Thus a fixed point of an area-preserving mapping of the plane to itself is Liapunov stable if the linear part of the mapping is rotation through an angle which is not a multiple of 90° or 120° and if the coefficient ω₁₁ in the normal Birkhoff form is not zero (guaranteeing nontrivial dependence of the angle of rotation on the radius).
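The hypotheses of this criterion are simple enough to be checked mechanically. In the sketch below, the function name, the tolerance, and the sample angles are hypothetical. The resonance conditions are tested in the form λ³ ≠ 1 and λ⁴ ≠ 1 for λ = e^{iα}, which is equivalent to the angle α not being a multiple of 120° or 90°.

```python
import cmath
import math


def passes_stability_criterion(angle_deg, omega_11, tol=1e-9):
    """Check the hypotheses of the criterion for n = 1: the rotation angle
    of the linear part avoids resonances of orders 3 and 4, and the
    normal-form coefficient omega_11 is nonzero."""
    lam = cmath.exp(1j * math.radians(angle_deg))  # eigenvalue e^{i*alpha}
    no_third_order_resonance = abs(lam ** 3 - 1) > tol
    no_fourth_order_resonance = abs(lam ** 4 - 1) > tol
    nontrivial_twist = abs(omega_11) > tol
    return (no_third_order_resonance
            and no_fourth_order_resonance
            and nontrivial_twist)


for angle, omega_11 in [(76.0, 1.0), (90.0, 1.0), (120.0, 1.0), (76.0, 0.0)]:
    print(angle, omega_11, passes_stability_criterion(angle, omega_11))
```

Only the first sample satisfies all the hypotheses. A negative answer does not by itself mean instability; it only means that this particular criterion does not apply.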

We have not gone into the smoothness conditions assumed in these theorems. The minimal smoothness needed is not known in even one case. For example, we point out that the last assertion about stability of fixed points of a mapping of the plane to itself was first proved by J. Moser under the assumption of 333-times differentiability, and only later (by Moser and Rüssmann) was the number of derivatives reduced to 6.

E Applications of the theorem on invariant tori and its generalizations

There are many mechanical problems to which we can apply the theorem formulated above. One of the simplest of these problems is the motion of a pendulum under the action of a periodically changing exterior field or under the action of vertical oscillations of the point of suspension.

It is well known that, in the absence of parametric resonance, the lower equilibrium position of a pendulum is stable in the linear approximation. The stability of this position with regard to nonlinear effects (under the further assumption of the absence of resonances of order 3 and 4) can be proved only with the help of the theorem on invariant tori.

In an analogous way we can use the theorem on invariant tori to investigate conditionally-periodic motions of a system of interacting nonlinear oscillators.

Another example is the geodesic flow on a convex surface close to an ellipsoid. There are two degrees of freedom in this system, and we can show that most geodesics on a three-dimensional near-ellipsoidal surface oscillate between two “caustics” close to the lines of curvature of the surface, densely filling out the ring between them. At the same time, we can arrive at theorems on the stability of the two closed geodesics obtained, after deforming the surface, from the two ellipses containing the middle axis of the ellipsoid (in the absence of resonances of orders 3 and 4).

As one more example, we can look at closed trajectories on a billiard table of any convex shape. Among the closed billiard trajectories are those which are stable in the linear approximation, and we can conclude that in the general case they are actually stable. An example of such a stable billiard trajectory is the minor axis of an ellipse; therefore, a closed billiard trajectory, close to the minor axis of an ellipse on a billiard table which is almost the ellipse, is stable.

Application of the theorem on invariant tori to the problem of rotations of an asymmetric heavy rigid body allows us to consider the nonintegrable case of a rapidly rotating body. The problem of rapid rotation is mathematically equivalent to the problem of motion with moderate velocity in a weak gravitational field: the essential parameter is the ratio of potential to kinetic energy. If this parameter is small, then we can use eulerian motion of a rigid body as a first approximation.

By applying the theorem on invariant tori to the problem with two degrees of freedom obtained after eliminating cyclic coordinates (rotations around the vertical) we come to the following conclusion about the motion of a rapidly rotating body: if the kinetic energy of rotation of a body is sufficiently large in comparison with the potential energy, then the length of the vector of angular momentum and its angle with the horizontal remain forever close to their initial values.

It follows from this that the motion of the body will forever be close to a combination of Euler-Poinsot motion and azimuthal precession, except in the case when the initial values of kinetic energy and total momentum are close to those for which the body can rotate around the middle principal axis. In this last case, realized only for special initial conditions, the splitting of separatrices near the middle axis implies a more complicated undulation about the middle axis than in Euler-Poinsot motion.

One generalization of the theorem on invariant tori leads to the theorem on the adiabatic invariance for all time of the action variable in a one-dimensional oscillating system with periodically changing parameters. Here we must assume that the rule for changing parameters is given by a fixed smooth periodic function of “slow time,” and the small parameter of the problem is the ratio of the period of characteristic oscillations and the period of change of parameters. Then, if the period of change of parameters is sufficiently large, the change in the adiabatic invariant of a phase point remains small in the course of an infinite interval of time.
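The slow-parameter setting can be illustrated by a short computation. The sketch below makes several assumptions of our own: a linear oscillator q'' = −w(εt)²q with the hypothetical law w(τ) = 1 + 0.3 cos τ, the action I = E/w, and a leapfrog integrator. It follows the action over one period of the parameter only, so it illustrates the near-constancy of the adiabatic invariant on a finite interval and does not test the assertion for all time.

```python
import math

def action_drift(eps, dt=0.01):
    """Integrate q'' = -w(eps*t)^2 q over one slow period 2*pi/eps and return the
    largest relative deviation of the action I = E/w from its initial value."""
    def w(t):
        return 1.0 + 0.3 * math.cos(eps * t)   # assumed slow periodic frequency law

    def action(q, p, t):
        return (p * p + (w(t) * q) ** 2) / (2.0 * w(t))

    n_steps = int(round(2 * math.pi / eps / dt))
    q, p, t = 1.0, 0.0, 0.0
    i0 = action(q, p, t)
    worst = 0.0
    for _ in range(n_steps):
        p -= 0.5 * dt * w(t) ** 2 * q   # half kick with the current frequency
        q += dt * p                      # drift
        t += dt
        p -= 0.5 * dt * w(t) ** 2 * q   # half kick with the updated frequency
        worst = max(worst, abs(action(q, p, t) - i0) / i0)
    return worst
```

Here `eps` plays the role of the small parameter: the ratio of the period of characteristic oscillations to the period of change of parameters. Although the frequency itself changes by tens of percent, `action_drift(0.02)` should come out as a small fraction.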

In an analogous way we can prove the adiabatic invariance for all time of the action variable in the problem of a charged particle in an axially-symmetric magnetic field. Violation of axial symmetry in this problem increases the number of degrees of freedom from two to three, so that the invariant tori cease to divide the energy level manifolds, and the phase curve wanders about the resonance zones.

Finally, applying the theory to the three- (or many-) body problem, we succeed in finding conditionally periodic motions of “planetary type.” To describe these motions, we must say a few words about the next approximation after the keplerian one in the problem of the motion of the planets. For simplicity we will limit ourselves to the planar problem.

For each keplerian ellipse, consider the vector connecting the focus of the ellipse (i.e., the sun) to the center of the ellipse. This vector, called the Laplace vector, characterizes both the magnitude of the eccentricity of the orbit and the direction to the perihelion.

The interaction of the planets with one another causes the keplerian ellipse (and therefore the Laplace vector) to change slowly. In addition, there is an important difference between changes in the major semi-axis and changes in the Laplace vector. Namely, the major semi-axis has no secular perturbations, i.e., in the first approximation it merely oscillates slightly around its average value (“Laplace’s theorem”). The Laplace vector, on the other hand, performs both periodic oscillations and secular motion. The secular motion may be obtained if we spread each planet over its orbit proportionally to the time spent in travelling each piece of the orbit, and replace the attraction of the planets by the attraction of the rings obtained, that is, if we average the perturbation over the rapid motions. The true motion of the Laplace vector is obtained from the secular one by the addition of small oscillations; these oscillations are essential if we are interested in small intervals of time (years), but their effect remains small in comparison to the effect of the secular motion if we consider a large interval of time (thousands of years).

Calculations (carried out by Lagrange) show that the secular motion of the Laplace vector of each of n planets moving in one plane consists of the following (if we ignore the squares of the eccentricities of the orbits which are small in comparison with the eccentricities themselves). In the orbital plane of a planet we must arrange n vectors of fixed lengths, each rotating uniformly with its angular velocity. The Laplace vector is their sum.
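Lagrange's description translates directly into a computation: a plane vector is represented by a complex number, and the Laplace vector is a sum of uniformly rotating terms. The sketch below uses two components with purely illustrative amplitudes, angular velocities and phases (they are hypothetical numbers, not data for any real planet), in units where the major semi-axis is 1, so that the length of the Laplace vector is the eccentricity.

```python
import cmath

def laplace_vector(t, amplitudes, rates, phases):
    """Sum of vectors of fixed lengths, each rotating uniformly with its own
    angular velocity, written as a complex number."""
    return sum(a * cmath.exp(1j * (g * t + b))
               for a, g, b in zip(amplitudes, rates, phases))

def eccentricity_range(amplitudes, rates, phases, t_max, n_samples=20000):
    """Smallest and largest sampled length of the Laplace vector on [0, t_max]."""
    values = [abs(laplace_vector(t_max * k / n_samples, amplitudes, rates, phases))
              for k in range(n_samples + 1)]
    return min(values), max(values)

# Hypothetical illustrative values (lengths, radians per year, radians)
AMPLITUDES = [0.04, 0.015]
RATES = [2.0e-5, 1.3e-4]
PHASES = [0.3, 1.1]

lo, hi = eccentricity_range(AMPLITUDES, RATES, PHASES, t_max=2.0e5)
```

With two components the length oscillates between the difference and the sum of the two amplitudes, with period 2π divided by the difference of the angular velocities; this is the slow variation of the eccentricity in this approximation.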

This description of the motion of the Laplace vector is obtained because the hamiltonian system averaged with respect to rapid motions, which describes the secular motion of the Laplace vector, has an equilibrium position corresponding to zero eccentricities. The described motion of the Laplace vector is the decomposition of small oscillations near this equilibrium position into characteristic oscillations. The angular velocities of the uniformly rotating components of the Laplace vector are the characteristic frequencies, and the lengths of these components determine the amplitudes of the characteristic oscillations.

We note that the motion of the Laplace vector of the earth is, apparently, one of the factors involved in the occurrence of ice ages. The reason is that, when the eccentricity of the earth’s orbit increases, the time it spends near the sun decreases, while the time it spends far from the sun increases (by the law of areas); thus the climate becomes more severe as the eccentricity increases. The magnitude of this effect is such that, for example, the amount of solar energy received in a year at the latitude of Leningrad (60°N) may attain the value which now corresponds to the latitudes of Kiev (50°N) (for decreased eccentricity) and Taimir (80°N) (for increased eccentricity). The characteristic time of variation of the eccentricity (tens of thousands of years) agrees well with the interval between ice ages.

The theorems on invariant tori lead to the conclusion that for planets of sufficiently small mass, there is, in the phase space of the problem, a set of positive measure filled with conditionally periodic phase curves such that the corresponding motion of the planets is nearly motion over slowly changing ellipses of small eccentricities, and the motion of the Laplace vectors is almost that given by the approximation described above. Furthermore, if the masses of the planets are sufficiently small, then motions of this type fill up most of the region of phase space corresponding in the keplerian approximation to motions of the planets in the same direction over non-intersecting ellipses of small eccentricities.

The number of degrees of freedom in the planar problem with n planets is equal to 2n if we take the sun to be fixed. The integral of angular momentum allows us to eliminate one cyclic coordinate; however, there are still too many variables for the invariant tori to divide an energy level manifold (even if there are only two planets this manifold is five-dimensional, and the tori are three-dimensional). Therefore, in this problem we cannot draw any conclusions about the preservation of the large semi-axes over an infinite interval of time for all initial conditions, but only for most initial conditions.
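The dimension count behind this remark can be written out as follows. This is only a bookkeeping sketch; the symbol N for the number of degrees of freedom after the reduction is our notation.

```latex
\begin{aligned}
&N = 2n - 1 \quad \text{(degrees of freedom after eliminating one cyclic coordinate)},\\
&\dim(\text{phase space}) = 2N, \qquad
 \dim(\text{energy level}) = 2N - 1, \qquad
 \dim(\text{invariant torus}) = N,\\
&\operatorname{codim}(\text{torus in energy level}) = (2N - 1) - N = N - 1 .
\end{aligned}
```

A torus can divide the energy level manifold only if its codimension there is 1, that is, only if N = 2. For two planets N = 3, the energy level is five-dimensional and the tori are three-dimensional, so the codimension is 2 and the tori do not divide.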

A problem with two degrees of freedom is obtained by further idealization. We replace one of the two planets by an “asteroid” which moves in the field of the second planet (“Jupiter”), not perturbing its motion.

The problem of the motion of such an asteroid is called the restricted three-body problem. The planar restricted three-body problem reduces to a system with two degrees of freedom, periodically depending on time, for the motion of the asteroid. If, in addition, the orbit of Jupiter is circular, then in a coordinate system rotating together with it we obtain, for the motion of the asteroid, an autonomous hamiltonian system with two degrees of freedom—called the planar restricted circular three-body problem.

In this problem, there is a small parameter—the ratio of the masses of Jupiter and the sun. The zero value of the parameter corresponds to unperturbed keplerian motion of the asteroid, represented in our four-dimensional phase space as a conditionally-periodic motion on a two-dimensional torus (since the coordinate system is rotating). One of the frequencies of this conditionally-periodic motion is equal to 1 for all initial conditions; this is the angular velocity of the rotating coordinate system, i.e., the frequency of the revolution of Jupiter around the sun. The second frequency depends on the initial conditions (this is the frequency of the revolution of the asteroid around the sun) and is fixed on any fixed three-dimensional level manifold of the hamiltonian function.

Therefore, the nondegeneracy condition is not fulfilled in our problem, but the condition for isoenergetic nondegeneracy is fulfilled. Kolmogorov’s theorem applies, and we conclude that most invariant tori with irrational ratios of frequencies are preserved in the case when the mass of the perturbing planet (Jupiter) is not zero, but sufficiently small.
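Both conditions can be checked by a short computation. The following sketch assumes action variables L and G of Delaunay type and units in which the unperturbed hamiltonian function in the rotating coordinate system is H₀ = −1/(2L²) − G; this form, and its sign conventions, are our assumptions and are not written out in the text.

```latex
\begin{aligned}
H_0(L, G) &= -\frac{1}{2L^2} - G, \qquad
\omega_L = \frac{\partial H_0}{\partial L} = \frac{1}{L^3}, \qquad
\omega_G = \frac{\partial H_0}{\partial G} = -1, \\[4pt]
\det \frac{\partial^2 H_0}{\partial (L, G)^2}
&= \det \begin{pmatrix} -3/L^4 & 0 \\ 0 & 0 \end{pmatrix} = 0, \\[4pt]
\det \begin{pmatrix}
 -3/L^4 & 0 & 1/L^3 \\
 0 & 0 & -1 \\
 1/L^3 & -1 & 0
\end{pmatrix}
&= \frac{3}{L^4} \neq 0 .
\end{aligned}
```

The first determinant vanishes because one frequency has the same absolute value, 1, for all initial conditions; the bordered determinant does not vanish, which is the condition of isoenergetic nondegeneracy.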

Furthermore, the two-dimensional invariant tori divide the three-dimensional level manifolds of the hamiltonian function. Therefore, the magnitude of the major semi-axis and the eccentricity of the keplerian ellipse of the asteroid will remain forever near their initial values if, at the initial moment, the keplerian ellipse does not intersect the orbit of the perturbing planet, and if the mass of this planet is sufficiently small.

In addition, in a stationary coordinate system, the keplerian ellipse of the asteroid could slowly rotate, since our system is only isoenergetically non-degenerate. Therefore under perturbations of an invariant torus frequencies are not preserved, but only their ratios. As a result of a perturbation, the frequency of azimuthal motion of the perihelion of the asteroid in a stationary coordinate system could be slightly different from Jupiter’s frequency, and then in the stationary system the perihelion would slowly rotate.

¹⁰⁸

Cf. V. I. Arnold, Instability of dynamical systems with many degrees of freedom. Soviet Mathematics 5:3 (1964) 581–585.

¹⁰⁹

N. N. Nekhoroshev, The behavior of hamiltonian systems that are close to integrable ones, Functional Analysis and Its Applications, 5:4 (1971); Uspekhi Mat. Nauk 32:6 (1977).

¹¹⁰

It is understood that the tori are slightly deformed under perturbations.

Appendix 9: Poincaré’s geometric theorem, its generalizations and applications

In his study of periodic solutions of problems in celestial mechanics, H. Poincaré constructed a very simple model which contains the basic difficulties of the problem. This model is an area-preserving mapping of the planar circular annulus to itself. Mappings of this form arise in the study of dynamical systems with two degrees of freedom. In fact, a mapping of a two-dimensional surface of section to itself is defined as follows: each point p of the surface of section is taken to the next point at which the phase curve originating at p intersects the surface (cf. Appendix 7). Thus, a closed phase curve corresponds to a fixed point of the mapping or of a power of the mapping. Conversely, every fixed point of the mapping or of a power of the mapping determines a closed phase curve.

In this way, a question about the existence of periodic solutions of problems in dynamics is reduced to a question about fixed points of area-preserving mappings of the annulus to itself. In studying such mappings, Poincaré arrived at the following theorem.

A Fixed points of mappings of the annulus to itself

Theorem. Suppose that we are given an area-preserving homeomorphic mapping of the planar circular annulus to itself. Assume that the boundary circles of the annulus are turned in different directions under the mapping. Then this mapping has at least two fixed points.

The condition that the boundary circles are turned in different directions means that, if we choose coordinates (x, y mod 2π) on the annulus so that the boundary circles are x = a and x = b, then the mapping is defined by the formula

(x, y) → (f(x, y), y + g(x, y)),

where the functions f and g are continuous and 2π-periodic in y, with f(a, y) ≡ a, f(b, y) ≡ b, and g(a, y) < 0, g(b, y) > 0 for all y.

The proof of this theorem, announced by Poincaré not long before his death, was given only later by G. D. Birkhoff (cf. his book, Dynamical Systems, Amer. Math. Soc., 1927).

There remain many open questions related to this theorem; in particular, attempts to generalize it to higher dimensions are important for the study of periodic solutions of problems with many degrees of freedom. The argument Poincaré used to arrive at his theorem applies to a whole series of other problems. However, the intricate proof given by Birkhoff does not lend itself to generalization. Therefore, it is not known whether the conclusions suggested by Poincaré’s argument are true beyond the limits of the theorem on the two-dimensional annulus. The argument in question is the following.

B The connection between fixed points of a mapping and critical points of the generating function

We will define a symplectic diffeomorphism of the annulus

(x, y) → (X(x, y), Y(x, y))

with the help of the generating function Xy + S(X, y), where the function S is 2π-periodic in y. For this to be a diffeomorphism we need that ∂X/∂x ≠ 0. Then

dS = (x − X)dy + (Y − y)dX

and, therefore, the fixed points of the diffeomorphism are critical points of the function F(x, y) = S(X(x, y), y). This function F can always be constructed by defining it as the integral of the form (x − X)dy + (Y − y)dX. The gradient of this function is directed either inside the annulus or outside on both boundary circles at once (by the condition on rotation in different directions).

But every smooth function on the annulus whose gradient on both boundary circles is directed inside the annulus (or out from it) has a critical point (maximum or minimum) inside the annulus. Furthermore, it can be shown that the number of critical points of such a function on the annulus is at least two. Therefore, we could assert that our diffeomorphism has at least two fixed points if we were sure that every critical point of F is a fixed point of the mapping.

Unfortunately, this is true only under the condition that ∂X/∂x ≠ 0, so that we can express F in terms of X and y. Thus our argument is valid for mappings which are not too different from the identity. For example, it is sufficient that the derivatives of the generating function S be less than 1.
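For a mapping close to a simple twist, the connection between critical points and fixed points can be seen in a computation. The sketch below assumes a hypothetical generating function S(X, y) = X²/2 + ε(1 − X²)² cos y on the annulus −1 ≤ x ≤ 1 (our choice; it is not in the text) and defines the mapping implicitly by x = X + ∂S/∂y, Y = y + ∂S/∂X, which agrees with dS = (x − X)dy + (Y − y)dX. The names `EPS`, `S`, `annulus_map` and `grad_S` are hypothetical.

```python
import math

EPS = 0.1  # assumed perturbation size, small enough that x -> X is invertible

def S(X, y):
    """Assumed generating-function term: a twist X^2/2 plus a perturbation
    which vanishes together with its X-derivative on the circles X = +-1."""
    return 0.5 * X * X + EPS * (1 - X * X) ** 2 * math.cos(y)

def grad_S(X, y):
    """Partial derivatives (dS/dX, dS/dy)."""
    return (X - 4 * EPS * X * (1 - X * X) * math.cos(y),
            -EPS * (1 - X * X) ** 2 * math.sin(y))

def annulus_map(x, y):
    """The mapping (x, y) -> (X, Y) with x = X + dS/dy and Y = y + dS/dX.
    X is found by bisection; the residual is increasing in X for this EPS."""
    def residual(X):
        return X + grad_S(X, y)[1] - x
    lo, hi = -1.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            hi = mid
        else:
            lo = mid
    X = 0.5 * (lo + hi)
    return X, y + grad_S(X, y)[0]
```

In this example the circles x = ±1 are mapped to themselves and turned through the angles ±1, and the critical points of S at (0, 0) and (0, π) are exactly the fixed points of the mapping.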

A refinement of this argument (with a different choice of generating function¹¹¹) shows that it is even sufficient that the eigenvalues of the Jacobi matrix D(X, Y)/D(x, y) never be equal to −1 at any point, i.e., that our mapping never flips the tangent space at any point. Unfortunately, all such conditions are violated at some points for mappings far from the identity. The proof of Poincaré’s theorem in the general case uses entirely different arguments.

The connection between fixed points of mappings and critical points of generating functions seems to be a deeper fact than the theorem on mappings of a two-dimensional annulus into itself. Below, we give several examples in which this connection leads to meaningful conclusions which are true under some restrictions whose necessity is not obvious.

C Symplectic diffeomorphisms of the torus

Consider a symplectic diffeomorphism of the torus which fixes the center of gravity

(x, y) → (x + f(x, y), y + g(x, y)) = (X, Y),

where x and y mod 2π are angular coordinates on the torus, “symplectic” means the Jacobian D(X, Y)/D(x, y) is equal to 1, and the condition on preserving the center of gravity means that the average values of the functions f and g are equal to zero.

Theorem. Such a diffeomorphism has at least four fixed points, counting multiplicity, and at least three geometrically different ones, at least under the assumption that the eigenvalues of the Jacobi matrix are not equal to −1 at any point.

The proof is based on consideration of the function on the torus given by the formula

Φ(x, y) = ½ ∫ (X − x)(dY + dy) − (Y − y)(dX + dx)

and on the fact that a smooth function on the torus has at least four critical points (counting multiplicity) of which at least three are geometrically different.
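Two standard examples show that both counts are attained. They are our illustrations and are not taken from the text.

```latex
\begin{aligned}
F_1(x, y) &= \cos x + \cos y: &&
  \nabla F_1 = 0 \iff \sin x = \sin y = 0, \\
 &&& \text{critical points } (0,0),\ (0,\pi),\ (\pi,0),\ (\pi,\pi)
   \quad \text{(maximum, two saddles, minimum)}; \\[4pt]
F_2(x, y) &= \sin x + \sin y + \sin(x + y): &&
  \nabla F_2 = 0 \iff \cos x = \cos y = -\cos(x + y), \\
 &&& \text{critical points } \left(\tfrac{\pi}{3}, \tfrac{\pi}{3}\right),\
   \left(-\tfrac{\pi}{3}, -\tfrac{\pi}{3}\right),\ (\pi, \pi).
\end{aligned}
```

The function F₁ has four nondegenerate critical points. The function F₂ has only three geometrically different ones: a maximum, a minimum, and a degenerate critical point at (π, π), where all second derivatives vanish.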

Attempts at proving this theorem without restrictions on the eigenvalues meet with difficulties very similar to those encountered by Poincaré in the theorem about the annulus.

We note that the theorem about the annulus would follow from the theorem about the torus if in the latter we could throw out the condition on the eigenvalues. In fact, we can put together a torus from two copies of our annulus, inserting a narrow connecting annulus along each of the two boundary circles.

Then we can extend our mapping of the annulus to a symplectic diffeomorphism of the torus such that: (1) on each of the two large annuli the diffeomorphism coincides with the original, (2) on each of the connecting annuli the diffeomorphism has no fixed points, and (3) the center of gravity remains fixed.

The construction of such a diffeomorphism of the torus uses the property that the boundary circles rotate in different directions. On each connecting annulus all points are translated in the same direction as on both circles bounding the connecting annulus. Since the translations on the connecting annuli are in opposite directions, the size of the translations can be chosen to ensure preservation of the center of gravity.

Now out of four fixed points on the torus, two must lie in the original annulus, and we obtain the theorem on annuli from the theorem on tori.

The theorem on tori formulated above can be generalized to other symplectic manifolds, both two-dimensional and many-dimensional. To formulate these generalizations, we must first reformulate the condition of preservation of the center of gravity.

Let g : M → M be a symplectic diffeomorphism. We say that g is homologous to the identity if it can be connected to the identity diffeomorphism by a smooth curve g_t consisting of symplectic diffeomorphisms such that the field of velocities

dg_t/dt

at each moment of time t has a single-valued hamiltonian function. It can be shown that the symplectic diffeomorphisms homologous to the identity form the commutator subgroup of the connected component of the identity in the group of all symplectic diffeomorphisms of the manifold.

In the case when our manifold is the two-dimensional torus, the symplectic diffeomorphisms homologous to the identity are exactly those which preserve the center of gravity.

Thus we come to the following generalization of Poincaré’s theorem.

Theorem. Every symplectic diffeomorphism of a compact symplectic manifold, homologous to the identity, has at least as many fixed points as a smooth function on this manifold has critical points (at least if this diffeomorphism is not too far from the identity).¹¹²

We note that the condition of the mapping being homologous to the identity is essential, as we see already from the example of a translation on the torus, which has no fixed points at all.
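For the translation this can be seen directly. The sketch below assumes the sign convention ẋ = ∂H/∂y, ẏ = −∂H/∂x for Hamilton's equations with respect to dx ∧ dy, and it considers only the obvious path of translations joining the identity to the given one.

```latex
\begin{aligned}
g_t(x, y) &= (x + at,\ y + bt), \qquad \dot{x} = a, \quad \dot{y} = b, \\
H(x, y) &= a\,y - b\,x, \\
H(x, y + 2\pi) - H(x, y) &= 2\pi a, \qquad
H(x + 2\pi, y) - H(x, y) = -2\pi b .
\end{aligned}
```

Thus H is single-valued on the torus only if a = b = 0. In agreement with this, the translation shifts the center of gravity: the average values of the displacement functions are a and b.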

As to the last restriction (that the diffeomorphism be not too far from the identity), it is not clear whether it is essential.^112a In the case that our manifold is the two-dimensional torus, it is sufficient that none of the eigenvalues of the Jacobi matrix of the diffeomorphism (in any global symplectic coordinate system on ℝ²ⁿ) be equal to minus one.

A restriction of this sort may be necessary in higher-dimensional problems. It is not impossible that Poincaré’s theorem is due to an essentially two-dimensional effect, as is the following theorem of A. I. Shnirel’man and N. A. Nikishin: every area-preserving diffeomorphism of the two-dimensional sphere to itself has at least two geometrically different fixed points.

The proof of this theorem is based on the fact that the index of the gradient vector field of a smooth function of two variables at an isolated critical point cannot be greater than 1 (although it can be equal to 1, 0, −1, −2, −3,...), and the sum of the indices of all the fixed points of an orientation-preserving diffeomorphism of the two-dimensional sphere to itself is equal to 2. On the other hand, the index of the gradient of a smooth function of a large number of variables at a critical point can take any integer value.

D Intersections of lagrangian manifolds

Poincaré’s argument can be given a slightly different form if on every radius of the annulus we consider the points shifted only radially. There are such points on every radius, since the boundary circles of the annulus turn in different directions. Assume that we can make a smooth curve of radially shifting points, separating the interior and exterior circles of the annulus. Then the image of this curve under our mapping must intersect the curve (since the regions into which the curve divides the annulus are carried to regions of equal area).

If this curve and its image each intersect each radius once, then the points of intersection of the curve with its image are obviously fixed points of the mapping.

Part of this argument can be carried out in higher dimensions, and this gives useful results about periodic solutions of problems in dynamics. The role of the annulus in the many-dimensional case is played by the phase space: the direct product of a region in euclidean space with a torus of the same dimension (the annulus is the product of an interval with the circle). A symplectic structure on the phase space is defined in the usual way, i.e., it has the form

Ω = Σ dx_k ⋀ dy_k,

where the x_k are action variables and y_k are angle variables.

It is not difficult to explain which symplectic diffeomorphisms of our phase space are homologous to the identity. Namely, a symplectic diffeomorphism A is homologous to the identity if it can be obtained from the identity by a continuous deformation and if

∮_γ x dy = ∮_{Aγ} x dy

for any closed contour γ (not necessarily homologous to zero). The condition that the transformation be homologous to the identity prohibits systematic shifts along the x-direction (“evolution of the action variables”), but permits shifts along the tori.

We consider one of the n-dimensional tori x = c = const and apply to it our symplectic diffeomorphism homologous to the identity. It turns out that the original torus intersects its image in at least 2ⁿ points (counting multiplicities), of which at least n + 1 are geometrically different, at least under the assumption that the image torus has an equation of the form x = f(y), where f is smooth.

For n = 1, this assertion means that each of the concentric circles constituting the annulus intersects its image in at least two points. This also follows from the preservation of area, so that the assumption that the image has equation x = f(y) is not necessary.
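In the special case when the image of the circle x = c is a graph x = f(y), the argument for n = 1 takes one line. The following sketch applies the condition on the integrals of x dy to the circle itself.

```latex
\begin{aligned}
\oint_{A\gamma} x\,dy = \oint_{\gamma} x\,dy
\quad &\Longrightarrow \quad
\int_0^{2\pi} f(y)\,dy = \int_0^{2\pi} c\,dy = 2\pi c \\
&\Longrightarrow \quad
\int_0^{2\pi} \bigl(f(y) - c\bigr)\,dy = 0 .
\end{aligned}
```

A continuous 2π-periodic function with zero average is either identically zero or takes values of both signs, so f(y) − c vanishes at least twice on the circle. These zeros are the points of intersection of the circle with its image.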

Whether or not this assumption is necessary in higher dimensions is not known. If we make this assumption, the proof proceeds in the following way.

We note that the original torus is a lagrangian submanifold of phase space. Our diffeomorphism is symplectic, so the image torus is also lagrangian. Therefore, the 1-form (x − c)dy on it is closed. Furthermore, this form on the torus is the total differential of some single-valued smooth function F, since our diffeomorphism is homologous to the identity, and therefore for any closed contour γ we have

∮_{Aγ} (x − c)dy = ∮_γ (x − c)dy = 0.

We note that points of intersection of the torus with its image are critical points of the function F (since at them dF = (x − c)dy = 0).

From the condition of single-valued projection of the image torus (i.e., from the fact that the image torus has equation x = f(y)) it follows that, conversely, all critical points of the function F are points of intersection of our tori. In fact, under these conditions y can be taken for local coordinates on the torus, and therefore the fact that dF is zero for all vectors tangent to the image torus implies x = c.

A smooth function on an n-dimensional torus has at least 2ⁿ critical points, counting multiplicities, of which at least n + 1 are geometrically different (cf., for example, Milnor, “Morse Theory,” Princeton University Press, 1967).

Therefore, our tori intersect in at least 2ⁿ points (counting multiplicities), and there are at least n + 1 geometrically different points of intersection.

Exactly the same argument shows that any lagrangian torus intersects its image in at least 2ⁿ points (of which at least n + 1 are geometrically different), under the assumption that both the original torus and its image project single-valued onto the y-space, i.e., are given by equations x = f(y) and x = g(y), respectively. Besides, this statement reduces to the previous one by the canonical transformation (x, y) → (x − f(y), y).

E Applications to determining fixed points and periodic solutions

We now consider a symplectic transformation, homologous to the identity, of the special form which arises in integrable problems in dynamics, i.e., of the form

A₀(x, y) = (x, y + ω(x)).

Here x ∈ ℝⁿ is the action variable and y mod 2π ∈ Tⁿ is the angular coordinate.

We assume that on the torus x = x₀ all the frequencies ω_i are commensurable:

ω_i(x₀) = 2πk_i/N with integers k_i and N,

and that the nondegeneracy condition

det |∂ω/∂x| ≠ 0 at x = x₀

is satisfied.

Theorem. Every symplectic diffeomorphism A homologous to the identity and sufficiently close to A₀ has, near the torus x = x₀, at least 2ⁿ periodic points ξ of period N (such that A^Nξ = ξ), counting multiplicity.

The proof could be reduced to investigating the intersection of two lagrangian submanifolds of a 4n-dimensional space (ℝⁿ × Tⁿ × ℝⁿ × Tⁿ) with Ω = dx ⋀ dy − dX ⋀ dY, one of which is the diagonal (X = x, Y = y) and the other the graph of the mapping A^N.

However, it is easier to directly construct a suitable function on the torus. In fact, the mapping A₀^N has the form

(x, y) → (x, y + α(x)), where α(x₀) = 0 and det |∂α/∂x| ≠ 0 at x = x₀.

By the implicit function theorem, the mapping A^N has, near the torus x = x₀, a torus which is displaced only radially ((x, y) → (X, Y)) and is given by an equation of the form x = f(y); its image is also given by an equation x = g(y) of the same form. In this notation, X(f(y), y) = g(y), Y(f(y), y) = y.

Since A is homologous to the identity, it follows that A^N has a single-valued global generating function of the form Xy + S(X, y), where S has period 2π in the variable y.

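The function F(y) = S(X(f(y), y), y) has at least 2ⁿ critical points y_k on the torus. All the points ξ_k = (f(y_k), y_k) are fixed points for A^N. In fact,

dF = (x − X)dy + (Y − y)dX = (f(y) − g(y))dy,

since Y = y, x = f(y) and X = g(y) on the radially displaced torus. Therefore, since dF |_{y_k} = 0, it follows that f(y_k) = g(y_k), as was to be shown.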
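The construction can be followed explicitly in the simplest case n = 1, N = 1. The sketch below assumes a hypothetical perturbed mapping of standard type, X = x + ε sin y, Y = y + X, which is a perturbation of (x, y) → (x, y + x) near the torus x₀ = 0; the mapping and the names `EPS`, `perturbed_map`, `f`, `g` and `sign_changes` are our choices and not part of the text.

```python
import math

EPS = 0.3  # assumed perturbation size

def perturbed_map(x, y):
    """Assumed example: X = x + EPS*sin(y), Y = y + X, with the angle taken mod 2*pi."""
    X = x + EPS * math.sin(y)
    return X, (y + X) % (2 * math.pi)

def f(y):
    """The curve x = f(y) near x0 = 0 whose points are displaced only radially (Y = y)."""
    return -EPS * math.sin(y)

def g(y):
    """The image curve x = g(y): the new action X at the point (f(y), y)."""
    return f(y) + EPS * math.sin(y)   # equals zero for this mapping

def sign_changes(n_grid=1000):
    """Number of sign changes of f - g around the circle, i.e. zeros of dF/dy."""
    ys = [2 * math.pi * (k + 0.5) / n_grid for k in range(n_grid)]
    d = [f(y) - g(y) for y in ys]
    return sum(1 for k in range(n_grid) if d[k] * d[(k + 1) % n_grid] < 0)
```

Here dF = (f(y) − g(y))dy = −ε sin y dy, so F(y) = ε cos y up to a constant, and its two critical points y = 0 and y = π give the two fixed points (0, 0) and (0, π) of the perturbed mapping, in agreement with the bound 2ⁿ = 2.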

We turn now to closed orbits of conservative systems. Using the terminology of Appendix 8, we can formulate the result as follows.

Corollary. Upon disintegration of an n-dimensional torus, entirely filled up by closed trajectories of an isoenergetically nondegenerate system, at least 2^{n − 1} closed trajectories of the perturbed problem are formed (counting multiplicities), among which at least n are geometrically distinct, at least if the perturbation is sufficiently small.

The proof is reduced to the preceding theorem with the help of a (2n − 2)-dimensional surface of section. We must first choose angular coordinates y such that the closed trajectories of the unperturbed problem on the torus are given by the equations

y₂ = const, …, y_n = const,

and then define a surface of section by y₁ = 0.

In the case of two degrees of freedom we can apply Poincaré’s theorem to the annuli formed by intersecting invariant tori with a two-dimensional intersecting surface. We obtain the following result:

In the gap between two two-dimensional invariant tori of a system with two degrees of freedom there are always at least two closed phase trajectories, if the ratios of the frequencies of conditionally-periodic motions on these tori are different.

In this way we obtain many periodic solutions in all problems with two degrees of freedom, where invariant tori are found (for example, in the restricted circular three-body problem, in the problem of closed geodesics, etc.). There is even a conjecture that in hamiltonian systems of “general form” with compact phase spaces, the closed phase curves form a dense set.¹¹³ However, if this is true, the closedness of most of these curves has little importance since their periods are extremely large.

As an example of applying Poincaré’s methods to systems with more than two degrees of freedom, we have a theorem of Birkhoff about the existence of infinitely many periodic solutions close to a given linearly stable periodic solution of general form (or about the existence of infinitely many periodic points in a neighborhood of a fixed point of a linearly stable nondegenerate symplectic mapping of a space to itself). In the proof, the mapping is first approximated by its normal form, and then the connection between fixed points of a mapping and critical points of the generating function is used.

Knowing periodic solutions allows us, among other things, to prove the nonexistence of first integrals (other than the classical ones) in many problems in dynamics. Assume, for example, that on some level manifold of known integrals we discover a periodic trajectory which is unstable. Its separatrices, in general, form a complicated network, which we considered in Appendix 7. If this phenomenon of splitting of separatrices is discovered, and if we can show that the separatrices are not contained in any manifold of lower dimension than the level manifold we are considering, then we can be sure that the system has no new first integrals.

The complicated behavior of phase curves, which obstructs the existence of first integrals, can often be detected without the help of periodic solutions by one simple glance at the picture, obtained by a computer, formed by the intersection of the phase curves with the surface of section.

F Invariance of generating functions

We have already noted the discouraging noninvariance of generating functions with respect to the choice of a canonical coordinate system on a symplectic manifold. On the other hand, we repeatedly used the connection between fixed points of a mapping and critical points of the generating function.

It turns out that, although generally the generating function is not invariantly associated to the mapping, near a fixed point there is an invariant connection. More precisely, suppose we are given a symplectic diffeomorphism fixing some point. In a neighborhood of this point, we define a “generating function”

Φ(x, y) = ½ ∫ (X − x)(dY + dy) − (Y − y)(dX + dx)

with the help of some symplectic coordinate system (x, y).¹¹⁴ Using another symplectic coordinate system (x′, y′), we construct a generating function Φ′ in the same way.

Theorem. If the linearization of the symplectic diffeomorphism at the fixed point has no eigenvalues equal to −1, then the functions Φ and Φ′ are equivalent in a neighborhood of the fixed point, in the sense that there is a diffeomorphism g (in general not symplectic) such that

Φ(z) = Φ′(g(z)).

For the proof see the article: A. Weinstein, The invariance of Poincaré’s generating function for canonical transformations, Inventiones Mathematicae, 16, No. 3 (1972), 202–214.

It should be noted that two diffeomorphisms with generating functions which are equivalent in a neighborhood of a fixed point are not necessarily equivalent in the class of symplectic diffeomorphisms (for example, rotation and rotation through an angle which depends on the radius, with nondegenerate quadratic parts of the generating function at zero).

Since the first edition of this book appeared in 1974, the content of this Appendix has grown into a new branch of mathematics: symplectic topology. To describe this development (triggered by the conjectures in this Appendix, which still remain, for general manifolds, neither proved nor disproved) one would need a book longer than the present one.

The interested reader might follow this development using the (incomplete) bibliography on pages 503–509.

¹¹¹

¹¹²

[For a proof, see V. Arnold, Sur les propriétés topologiques des applications globalement canoniques de la mécanique classique, C. R. Acad. Sci. Paris, 1965 and A. Weinstein, Symplectic manifolds and their lagrangian submanifolds, Advances in Math. 6 (1971) 329–346.]

¹¹²ᵃ

[Recently, Conley and Zehnder, followed by others, have proved the theorem for tori, surfaces, and other manifolds, without the restriction of closeness to the identity.]

¹¹³

A proof of this density in the C¹-topology has been announced by C. Pugh and C. Robinson. [Editor’s note]

¹¹⁴

The increase of this function along any arc is equal to the integral of the form defining the symplectic structure over the band formed by the rectilinear intervals connecting each point with its image. Therefore, the function Φ is associated to the mapping invariantly with respect to linear canonical changes of coordinates.

Appendix 10: Multiplicities of characteristic frequencies, and ellipsoids depending on parameters

Several times in this course we have encountered families of ellipsoids in euclidean space. For example, in studying the dependence on parameters of characteristic frequencies of small oscillations, we encountered equipotential surfaces which were ellipsoids in euclidean space, depending upon the degree of rigidity of the system (the metric of the space was defined by the kinetic energy). Another example was the ellipsoid of inertia of a rigid body (the parameter here was the shape of the rigid body and its distribution of mass).

Here we will consider the general problem of describing the values of the parameter for which the spectrum of eigenvalues degenerates, i.e., the corresponding ellipsoid becomes an ellipsoid of revolution. We note that the eigenvalues of a quadratic form on euclidean space (or the lengths of the axes of an ellipsoid) change continuously under continuous changes of the parameters of a system (the coefficients of the form). It seems natural to expect that in a system depending on one parameter, under changes of the parameter, at certain moments one of the eigenvalues would collide with another, so that for these values of the parameter the system would have a multiple spectrum.

Suppose, for example, that we want to make the ellipsoid of inertia of a rigid body into an ellipsoid of revolution by movement of an adjustable mass along an arc rigidly attached to the body so that there is one parameter at our disposal. The three major axes a, b, and c will be continuous functions of this parameter, and at first glance it seems that for a suitable value of the parameter (p) we can achieve equality of two of the axes, say a(p) = b(p). It turns out, however, that this is not so, and that generally we need to attach at least two adjustable masses to make the ellipsoid of inertia an ellipsoid of revolution.

In general, a multiple spectrum in typical families of quadratic forms is observed only for two or more parameters, while in one-parameter families of general form the spectrum is simple for all values of the parameter. Under a change of parameter in the typical one-parameter family, the eigenvalues can approach closely, but when they are sufficiently close, it is as if they begin to repel one another. The eigenvalues again diverge, disappointing the person who hoped, by changing the parameter, to achieve a multiple spectrum.

In this appendix we consider the reasons for this seemingly strange behavior of the eigenvalues, and we discuss briefly analogous questions for systems with various groups of symmetries.

A The manifold of ellipsoids of revolution

Consider the set of all possible quadratic forms on the n-dimensional euclidean space ℝⁿ. This set has itself a natural structure of a vector space of dimension n(n + 1)/2. For example, the quadratic forms on the plane form a three-dimensional space (a form Ax² + 2Bxy + Cy² has as coordinates the three numbers A, B, and C).

The positive-definite forms form an open region in this space of all quadratic forms (for example, in the case of the plane this is the inside of one nappe of the cone B² = AC of degenerate forms).

Every ellipsoid centered at the origin defines a positive-definite quadratic form, for which it is the level set of 1; conversely, the set of level 1 of any positive-definite quadratic form is an ellipsoid. We can therefore identify the sets of positive-definite quadratic forms and ellipsoids centered at the origin. In this way we give the set of ellipsoids with center 0 in ℝⁿ the structure of a smooth manifold of dimension n(n + 1)/2 (this manifold is covered by one chart: a region in the space of quadratic forms).

Now consider the set of all ellipsoids of revolution. We claim that this set has codimension 2 in the space under consideration, i.e., it is given by two independent equations, rather than one as it would seem at first glance. More precisely, we have

Theorem 1. The set of ellipsoids of revolution is a finite union of smooth submanifolds of codimension 2 and higher in the manifold of all ellipsoids.

The codimension of a manifold is the difference between the dimension of the ambient space and the dimension of the submanifold.

Proof. We first consider an ellipsoid in n-dimensional space which has two equal axes, and whose other axes are distinct. Such an ellipsoid is defined by the directions of the distinct axes, which gives

(n − 1) + (n − 2) + ⋯ + 2 = (n + 1)(n − 2)/2

different parameters, and also by the magnitudes of the axes, which gives n − 1 parameters. Thus the total number of parameters is

(n + 1)(n − 2)/2 + (n − 1) = n(n + 1)/2 − 2,

which is two less than the dimension of the space of all ellipsoids (which is n(n + 1)/2). This count of parameters also shows that the set of ellipsoids with exactly two equal axes is a manifold.

As for ellipsoids with a larger number of equal axes, it is clear that they form a set of even smaller dimension. A rigorous proof follows from the following lemma.

Lemma. The set of all ellipsoids with v₂ double, v₃ triple, v₄ four-fold axes, etc. is a smooth submanifold of the manifold of all ellipsoids, with codimension

2v₂ + 5v₃ + 9v₄ + ⋯ = Σ ½(i − 1)(i + 2)v_i.

The proof of this lemma reduces to the same kind of parameter count as in the special case analyzed above (which corresponds to v₂ = 1, v₃ = v₄ = ⋯ = 0). The reader can easily carry out this calculation, noting first that the dimension of the manifold of all k-dimensional subspaces in an n-dimensional vector space is equal to k(n − k) (since a k-dimensional plane in general position in an n-dimensional space can be thought of as the graph of a mapping from a k-dimensional space to an (n − k)-dimensional space, and such a mapping is given by a rectangular k × (n − k) matrix).
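The parameter count can be carried out mechanically. The following is a minimal sketch, not part of the original argument: the function names are illustrative, and the count assumes that an ellipsoid is specified by one length per group of equal axes together with a chain of mutually orthogonal subspaces, each chosen in what remains of the space and contributing k(n − k) parameters as described above. The result is compared with the sum of ½(i − 1)(i + 2) over the multiplicities i.

```python
def stratum_dimension(n, multiplicities):
    """Dimension of the set of ellipsoids in R^n whose axes have the given multiplicities."""
    assert sum(multiplicities) == n
    dim = len(multiplicities)      # one length for each group of equal axes
    remaining = n
    for m in multiplicities:
        dim += m * (remaining - m) # an m-dimensional subspace of the remaining space
        remaining -= m
    return dim


def codimension(n, multiplicities):
    """Codimension in the n(n + 1)/2-dimensional manifold of all ellipsoids."""
    return n * (n + 1) // 2 - stratum_dimension(n, multiplicities)


def lemma_formula(multiplicities):
    """Sum of (i - 1)(i + 2)/2 over the multiplicities i of the axes."""
    return sum((m - 1) * (m + 2) // 2 for m in multiplicities)


if __name__ == "__main__":
    for n, mults in [(2, [2]), (3, [2, 1]), (3, [3]), (4, [2, 2]), (4, [4])]:
        print(n, mults, codimension(n, mults), lemma_formula(mults))
```

In every case listed the two counts agree, and the answer does not depend on the order in which the subspaces are chosen.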

Example. Consider the case n = 2, i.e., ellipses in the plane. An ellipse is determined by three parameters (e.g., the lengths of the two axes and the angle giving the direction of one of them). Thus the manifold of ellipses in the plane is three-dimensional, as it must be by our formula.

A circle, however, is determined by one parameter (the radius). Thus the manifold of circles in the space of ellipses is a line in a three-dimensional space, and not a surface as it would seem at first glance.

This “paradox” becomes, perhaps, clearer from the following calculation. The quadratic forms Ax² + 2Bxy + Cy² with equal eigenvalues form a submanifold of the three-dimensional space with coordinates A, B, and C, given by one equation λ₁ − λ₂ = 0, where λ₁,₂(A, B, C) are the eigenvalues. However, this equation is equivalent to Δ = 0, where Δ = (λ₁ − λ₂)² is the discriminant of the characteristic equation λ² − (A + C)λ + (AC − B²) = 0, and Δ is the sum of two squares:

Δ = (A + C)² − 4(AC − B²) = (A − C)² + 4B².

Thus the single equation Δ = 0 determines a line in the three-dimensional space of quadratic forms (A = C, B = 0), and not a surface.

A simple consequence of the fact that the manifold of ellipsoids of revolution has codimension 2 is that this manifold does not divide the space of all ellipsoids (and the manifold of quadratic forms with a multiple spectrum does not divide the space of quadratic forms), as a line does not divide a three-dimensional space. Therefore, we can assert not only that in an ellipsoid in “general position” all the axes have different lengths, but also that any two such ellipsoids can be connected by a smooth curve in the space of ellipsoids consisting entirely of ellipsoids with axes of different lengths. Furthermore, if two ellipsoids in general position are connected by a smooth curve in the space of ellipsoids which contains a point which is an ellipsoid of revolution, then by an arbitrarily small displacement of the curve we can remove it from the set of ellipsoids of revolution, so that on the new curve all the points will be ellipsoids without multiple axes.

One consequence of what we have said is a simple proof of the theorem that characteristic frequencies increase when the rigidity of a system is increased. The derivative of a non-multiple eigenvalue of a quadratic form with respect to a parameter is determined by the derivative of the quadratic form in the corresponding characteristic direction. If the rigidity is increased, the potential energy increases in every direction, including the characteristic directions. Thus the characteristic frequencies also increase. Hence we have proved the theorem on the growth of frequencies in the case when it is possible to go from the original system to a more rigid system, avoiding multiple spectra. The proof in the presence of multiple spectrum is now obtained by a passage to the limit, based on the fact that the interior of the path from the original system to the more rigid system can be removed by an arbitrarily small perturbation from the set of systems with multiple spectra.
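The statement about the derivative of a simple eigenvalue can be checked numerically for forms on the plane. The sketch below is only an illustration under stated assumptions: the one-parameter family A = 2 + p, B = 0.5, C = 1 + 3p is a hypothetical example in which the rigidity grows with p, and the helper names are not from the text. The derivative of each eigenvalue is compared with the value of the derivative form x² + 3y² on the corresponding unit characteristic vector.

```python
import math


def eigen_2x2(A, B, C):
    """Eigenvalues (ascending) and unit eigenvectors of the form A x^2 + 2 B x y + C y^2."""
    mean = (A + C) / 2
    half_gap = math.hypot((A - C) / 2, B)
    result = []
    for lam in (mean - half_gap, mean + half_gap):
        vx, vy = B, lam - A            # solves (A - lam) x + B y = 0
        if math.hypot(vx, vy) < 1e-12:
            vx, vy = lam - C, B        # alternative solution when the first one vanishes
        norm = math.hypot(vx, vy)      # nonzero as long as the spectrum is simple
        result.append((lam, (vx / norm, vy / norm)))
    return result


def form_coefficients(p):
    """Hypothetical family whose derivative in p is the positive form x^2 + 3 y^2."""
    return 2.0 + p, 0.5, 1.0 + 3.0 * p


def predicted_derivatives(p):
    """Value of the derivative form on each unit characteristic vector."""
    A, B, C = form_coefficients(p)
    return [vx * vx + 3.0 * vy * vy for _, (vx, vy) in eigen_2x2(A, B, C)]


def finite_difference_derivatives(p, h=1e-6):
    """Central-difference derivative of each eigenvalue with respect to p."""
    low = [lam for lam, _ in eigen_2x2(*form_coefficients(p - h))]
    high = [lam for lam, _ in eigen_2x2(*form_coefficients(p + h))]
    return [(b - a) / (2 * h) for a, b in zip(low, high)]


if __name__ == "__main__":
    print(predicted_derivatives(0.0))
    print(finite_difference_derivatives(0.0))
```

Both derivatives are positive in this example, as the argument requires when the potential energy increases in every direction.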

In summary, we can say that a typical one-parameter family of ellipsoids (or quadratic forms in euclidean space) does not contain ellipsoids of revolution (quadratic forms with multiple spectra). Applying this to an ellipsoid of inertia we obtain the conclusion above about the necessity for two adjustable masses.

We turn now to two-parameter systems. It follows from our calculations that, in a typical two-parameter system, ellipsoids of revolution are encountered only at isolated points of the parameter plane.

Consider, for example, a convex surface in three-dimensional euclidean space. The second fundamental form of the surface determines an ellipse in the tangent space at every point. Therefore, we have a two-parameter family of ellipses (which can be translated to one plane by choosing a local coordinate system near a point on the surface). We come to the conclusion that, at every point of the surface except at certain isolated points, the ellipse has axes of different lengths. Therefore, on surfaces of general form, there are two orthogonal fields of directions (the major and minor axes of the ellipses) with isolated singular points. In differential geometry these directions are called the directions of principal curvature, and these singular points are called umbilical points. For example, on the surface of an ellipsoid there are four umbilical points; they lie on the ellipse containing the major and minor axes, and two of them are clearly visible in the picture of the geodesics on an ellipsoid (cf. Figure 207).

In exactly the same way, in a typical three-parameter family, ellipsoids of revolution are encountered only on certain lines in the three-dimensional parameter space. For example, if at every point of three-dimensional euclidean space, we are given an ellipsoid (i.e., a symmetric two-index tensor), then the singularities of the fields of principal axes will be, in general, on certain lines (where two of the three fields of directions have discontinuities). These lines, like the umbilical points in the preceding example, are of several different types. Their classification (for typical fields of ellipsoids) can be obtained from the classification of singularities of lagrangian projections given in Appendix 12.

In a typical four-parameter family, ellipsoids of revolution occur on two-dimensional surfaces in the space of parameters. These surfaces have no singularities other than transverse intersections at isolated points of the parameter space; these values of the parameters correspond to ellipsoids with two (different) pairs of equal axes.

Triple axes appear first for five parameters, at isolated points of the parameter space. The values of the parameters corresponding to ellipsoids with a double axis form a three-dimensional manifold in the five-dimensional parameter space with two types of singularities: transversal intersections of two branches along some curve and conic singularities at isolated points (not lying on this curve), i.e., at points of the parameter space corresponding to ellipsoids with three equal axes. These conic singularities have the following structure: by intersecting the three-dimensional manifold of ellipsoids of revolution with a four-dimensional sphere of small radius with center at the singular point, we obtain two copies of the projective plane. The resulting embeddings of the projective plane in the four-dimensional sphere are diffeomorphic to the embedding given by the five spherical harmonics of degree two on the two-dimensional sphere (five linear combinations of the functions x_i x_j, orthonormal in the space of functions on the sphere

x₁² + x₂² + x₃² = 1, orthogonal to the identity, give an even mapping of S² into S⁴ and, therefore, an embedding ℝP² → S⁴).

It remains to describe the behavior of the eigenvalues of a quadratic form in a typical two-parameter family as the parameter approaches a singular point where the two eigenvalues coincide. A little calculation shows that the graph of the pair of eigenvalues we are considering has, over the plane of parameters near the singular point, the form of a two-sheeted cone, whose vertex corresponds to the singular point, and each of its nappes to one of the eigenvalues (Figure 243).

Figure 243

Characteristic frequencies of one- and two-parameter families of oscillating systems of general form

A typical one-dimensional subfamily of our two-dimensional family has the form of a curve in the plane of parameters which does not pass through any singular points. Every one-parameter family which contains a singular point can be removed from it by a small perturbation; the resulting one-parameter family will be a curve in the space of parameters passing near the singular point. The graph of the eigenvalues over a curve on the plane of parameters passing near a singular point consists of those points of the cone which project onto this curve. Therefore, this graph near the singular point is close to a hyperbola, resembling a pair of intersecting straight lines (a pair of straight lines would be obtained if our one-parameter family passed through the singular point).

This discussion of eigenvalues of two-parameter systems of quadratic forms explains the strange behavior of characteristic frequencies when a single parameter is varied: in general (except for completely singular cases), when a single parameter is varied the characteristic frequencies can approach one another but cannot collide; after approaching, they must again go off in different directions.
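The cone and the hyperbola are easy to see in the simplest model. The sketch below assumes a hypothetical two-parameter family of forms sx² + 2txy − sy² on the plane, whose eigenvalues are ±√(s² + t²); the singular point is s = t = 0. Along the line t = ε the eigenvalues trace the two branches of a hyperbola, and the smallest distance between them is 2ε. The function names are illustrative only.

```python
import math


def eigenvalues(A, B, C):
    """Eigenvalues (ascending) of the form A x^2 + 2 B x y + C y^2."""
    mean = (A + C) / 2
    half_gap = math.hypot((A - C) / 2, B)
    return mean - half_gap, mean + half_gap


def gap_along_line(eps, steps=201):
    """Smallest gap between the eigenvalues of s x^2 + 2 eps x y - s y^2 for s in [-1, 1]."""
    gaps = []
    for k in range(steps):
        s = -1 + 2 * k / (steps - 1)
        low, high = eigenvalues(s, eps, -s)
        gaps.append(high - low)
    return min(gaps)


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.1):
        print(eps, gap_along_line(eps))
```

Only the line ε = 0, which passes through the singular point, produces a collision; for every other ε the eigenvalues approach to within 2ε and then diverge.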

B Application to the study of oscillations of continuous media

The general argument above has numerous applications in the study of the dependence on parameters of the characteristic frequencies of various mechanical systems with finitely many degrees of freedom; however, the most interesting applications may be to systems with infinitely many degrees of freedom, describing oscillations of continuous media. These applications are based on the fact that the codimensions of manifolds of ellipsoids with given multiplicities of axes are determined by these multiplicities and do not depend on the dimension of the space.

For example, the codimension of the set of ellipsoids of revolution in the manifold of all ellipsoids is equal to two in a space of any dimension; therefore, it is natural to assume that in the infinite “manifold” of ellipsoids in infinite-dimensional hilbert space, the set of ellipsoids of revolution has codimension 2 (and, in particular, the space of ellipsoids without multiple axes is connected).

Of course, arguments of this kind need rigorous justification. We will not, however, occupy ourselves with this, but we will see what conclusions follow from the argument above if we apply it to the problem of oscillations in continuous media.

The kinetic energy of a continuous medium filling a compact region D is expressed in terms of the deviation u of a point x from equilibrium by the formula

T = ½ ∫_D (∂u/∂t)² dx.

For definiteness, we can take the medium to be a membrane (in this case the region D is two-dimensional, and the deviation u one-dimensional). The kinetic energy defines a euclidean structure on the configuration space of the problem (i.e., in the space of functions u). The potential energy is given by the Dirichlet integral

U = ½ ∫_D (∇u)² dx

(from the mathematical point of view these data constitute the definition of the membrane).

The squares of the characteristic frequencies of the membrane are the eigenvalues of the quadratic form U on the configuration space, whose metric is defined using the kinetic energy. We assume that a typical membrane corresponds to a typical quadratic form (this assumption means transversality of the manifold of quadratic forms corresponding to different membranes to the manifold of forms with multiple eigenvalues). If we believe in this property of general position, we come to the following conclusions.

For membranes in general position, all the characteristic frequencies are different. We can go from one membrane in general position to another by a continuous path consisting entirely of membranes with simple spectra. Furthermore, a typical path connecting any two membranes does not contain even one membrane with a multiple spectrum (except, possibly, the ends of the path).

By varying two parameters of the membrane we can make two characteristic frequencies coincide; to obtain a triple frequency, we must have at our disposal five independent parameters; for a four-fold frequency we need nine parameters, etc.

If, by starting from a membrane with a simple spectrum and continuously deforming it, we pass to another membrane with a simple spectrum along any path in general position, then the k-th largest characteristic frequency of the original membrane always goes over to the k-th largest characteristic frequency of the second membrane, independently of the path of deformation; continuations of characteristic functions, however, do generally depend on the path of deformation (i.e., by changing the path, the sign of the resulting characteristic function can be changed).

In particular, if by starting from a membrane with a simple spectrum and deforming it we describe a closed path in the space of membranes and return to the original membrane, bypassing the set of membranes with multiple spectra (which has codimension 2), then the k-th characteristic frequency returns to its original value, while the k-th characteristic function may change sign. [Editor’s note: Conclusions like this have been proven by K. Uhlenbeck (Amer. J. Math. 98 (1976), 1059–1078).]

C The effect of symmetries on the multiplicity of the spectrum

A multiple spectrum is the exception in systems of general form, but it is not removable under small perturbations in cases when the given system is symmetric and the deformations preserve the symmetry.

Consider, for example, a system of three identical masses at the vertices of an equilateral triangle, connected to one another and to the center of the triangle by identical springs, and capable of moving in the plane of the triangle. The system has rotational symmetry of order 3. Therefore, there is a linear operator g acting on the configuration space (which has dimension 6), whose third power is equal to 1 and which leaves invariant both the euclidean structure of the configuration space and the ellipsoid in the configuration space giving the potential energy.

It follows that this ellipsoid must be an ellipsoid of revolution. If we let g be the indicated operator on the configuration space and ξ a vector on the major axis of the ellipsoid, then the axis in the direction gξ is also a major axis (since the rotation g takes the ellipsoid to itself).

There are two possibilities for the vector gξ: either gξ = ξ, or the vectors ξ and gξ are linearly independent. In the second case, the plane spanned by the vectors ξ and gξ consists entirely of major axes. Therefore, the eigenvalues corresponding to these axes are at least double. The space spanned by the three vectors ξ, gξ, and g²ξ is invariant under g. It is either two dimensional (in which case g acts by a 120° rotation) or three dimensional (in which case g acts by the same rotation around ξ + gξ + g²ξ as an axis). In the latter case, we may choose the direction of this sum for one of the principal axes of the ellipsoid, with the two other principal axes in the three-dimensional space perpendicular to it. It is therefore possible to choose the principal axes for an ellipsoid which is invariant under an orthogonal transformation of order three (in a space of any number of variables), so that each axis is either fixed under the transformation or is rotated by 120° in an invariant plane spanned by it and another axis (orthogonal to it, as well as to all other axes) of the same length. In what follows, we shall assume that the axes of ellipsoids and the directions of the corresponding characteristic oscillations have been chosen in the manner just described.

Our argument shows that characteristic oscillations of a system with third-order rotational symmetry can be of two types: those invariant under rotation by 120° (gξ = ξ) and those passing under such a rotation to independent characteristic oscillations with the same frequency (gξ and ξ independent). In the second case, there actually arise three forms of characteristic oscillations with the same frequency (ξ, gξ, and g²ξ), but only two of them are independent:

ξ + gξ + g²ξ = 0,

since the sum of three vectors of equal length on the plane forming angles of 120° is equal to zero.
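The two types of oscillations can be seen in the smallest possible model. The sketch below is an assumption made for illustration and is not the six-dimensional system of three masses: it takes ℝ³ with g acting by cyclic permutation of the coordinates, so that the symmetric forms invariant under g have matrices with a on the diagonal and b elsewhere. The numbers a = 5, b = 2 and the function names are hypothetical.

```python
def apply_form(a, b, v):
    """Apply the symmetric matrix [[a, b, b], [b, a, b], [b, b, a]] to the vector v."""
    total = sum(v)
    return [(a - b) * x + b * total for x in v]


def rotate(v):
    """The order-three symmetry g: a cyclic permutation of the coordinates."""
    return [v[2], v[0], v[1]]


if __name__ == "__main__":
    a, b = 5, 2
    symmetric = [1, 1, 1]           # fixed by g: an oscillation of the first type
    xi = [1, -1, 0]                 # not fixed by g: an oscillation of the second type
    g_xi = rotate(xi)
    g2_xi = rotate(g_xi)
    print(apply_form(a, b, symmetric))   # a multiple of the vector (1, 1, 1)
    print(apply_form(a, b, xi), apply_form(a, b, g_xi))
    print([x + y + z for x, y, z in zip(xi, g_xi, g2_xi)])
```

Here the vector (1, 1, 1) is characteristic with the simple eigenvalue a + 2b, while ξ, gξ, and g²ξ are characteristic with the double eigenvalue a − b and sum to zero.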

The number of characteristic oscillations of our system is generally equal to 6. To find out how many of them are of the first (symmetric) and second (nonsymmetric) type, we can use the following argument. Consider the limiting case, when each of the masses oscillates independently from the others. In this case, we can choose an orthonormal basis of the configuration space consisting of six characteristic oscillations, two for each point, for which that point moves and the other two do not. We denote by ξ_i and η_i the characteristic vectors corresponding to the i-th point with characteristic frequencies a and b, respectively, and let x_i, y_i be coordinates in the orthonormal basis ξ_i, η_i. Then the potential energy can be written in the form

U = ½ Σ (a²x_i² + b²y_i²).

The symmetry operator g permutes the coordinate axes:

gξ₁ = ξ₂, gξ₂ = ξ₃, gξ₃ = ξ₁; gη₁ = η₂, gη₂ = η₃, gη₃ = η₁.

We can now represent our six-dimensional space as the orthogonal direct sum of two straight lines and two two-dimensional planes, invariant under the symmetry operator g. Namely, the invariant lines are defined by the directions of the vectors

ξ₁ + ξ₂ + ξ₃ and η₁ + η₂ + η₃,

and the invariant planes are their orthogonal complements in the spaces spanned by the vectors ξ_i and η_i, respectively. The first straight line is the direction of a symmetric characteristic oscillation with frequency a, and the second the direction of one with frequency b. In exactly the same way, every vector in the first plane is a direction of characteristic oscillation with frequency a which, under rotation by 120°, goes to an independent oscillation of the same frequency; for all vectors in the second plane, the oscillation is also not symmetric, with frequency b.

Thus, in this degenerate case of three independent points, there are two independent characteristic oscillations of symmetric type, and four unsymmetric, of which the latter are divided into two pairs. In each pair the oscillations have the same eigenvalue and are obtained from one another by rotation of the plane of our points by 120°.

We now claim that the conclusion above holds true for any law of interaction between our points if the interaction is symmetric, i.e., if the potential energy of the system is preserved under rotation of the plane by 120°.

In fact, decompose the 6-dimensional configuration space into an orthogonal sum of the plane of invariant vectors of g and of its orthogonal complement. The potential energy will decompose into a sum of two quadratic forms—one in two variables, the other in four. Now consider characteristic oscillations in the two-dimensional and four-dimensional configuration spaces, with potential energy described above. The four-dimensional space decomposes into two g-invariant planes, orthogonal in the potential energy metric. We have obtained a system of six characteristic oscillations having the required properties.

Thus, in a system in general form of three points in the plane with rotational symmetry of order 3, there are four different characteristic frequencies, two of which are simple and two double. Each of the simple characteristic frequencies corresponds to a symmetric characteristic oscillation, and each of the double ones to three characteristic oscillations obtained from one another by rotation by 120° and summing to zero (so that only two of them are independent).

Problem. Classify the characteristic oscillations of a system with the symmetries of an equilateral triangle (allowing not only rotation by 120°, but also reflection through the altitude of the triangle).

Problem. Classify the characteristic oscillations of a system whose group of symmetries is the group of 24 rotations of the cube.

Answer. The oscillations will be of five types. By rotations, from each oscillation one can obtain systems of 8, or 6, or 4, or 2, or 1 independent oscillations (in the last case the oscillations are entirely symmetric).

Remark. To classify oscillations in systems with any group of symmetries, a special apparatus has been developed (the so-called theory of group representations). Cf., for example, Michael Tinkham, Group Theory and Quantum Mechanics, McGraw-Hill, 1964.

D The behavior of frequencies of a symmetric system under a variation of parameters preserving the symmetry

We assume now that our symmetric system depends in a general way on some number of parameters, and that the symmetry is not disturbed when the parameters are varied. Then the characteristic frequencies of various multiplicities will also depend on the parameters, and the question arises of when the characteristic frequencies will collide. We will confine ourselves to formulating a result for the simplest case of systems with third-order rotational symmetry (for rotational symmetry of any order n ≥ 3, the answer is the same). The details can be found in the following articles: V. I. Arnold, Modes and quasi-modes, Functional Analysis and Its Applications, 6:2 (1972), 94–101; V. N. Karpushkin, The asymptotic behavior of the eigenvalues of symmetric manifolds and the “most probable” representations of finite groups, Moscow Univ. Math. Bull. 29 (1974), no. 2, 136–139.

Characteristic oscillations of any system with rotational symmetry of order 3 are divided into two types: symmetric oscillations, and oscillations carried by rotation by 120° into independent ones. For a general system with third-order rotational symmetry (without, in particular, any additional symmetry) all the characteristic frequencies of the first type are simple, and of the second, double. In addition, it turns out that if a system depends in a general way on one parameter and is symmetric for all values of the parameter, then under variation of the parameter, the characteristic frequencies of symmetric oscillations do not collide with one another, and the double characteristic frequencies of asymmetric oscillations do not split. In addition, the double characteristic frequencies of asymmetric oscillations do not collide with one another under a change of parameters. However, the characteristic frequencies of symmetric and asymmetric oscillations move under changes of parameter independently from one another, so that for discrete values of the parameter the characteristic frequency of a symmetric oscillation and the (double) characteristic frequency of an asymmetric oscillation can collide (and pass through one another).

In order to make two characteristic frequencies of symmetric oscillations collide, we must vary at least two parameters; and to make two characteristic frequencies of asymmetric oscillations collide we must vary at least three.

In general, in the typical family of systems with third-order rotational symmetry, for the collision of i simple characteristic frequencies (i symmetric oscillations) and j double frequencies (j unsymmetric oscillations) to occur, the number of parameters of the family must be at least

i(i + 1)/2 + j² − 1.

We apply this to oscillations of symmetric membranes. Here we will assume that the membrane is of general form, admits rotation by 120°, and corresponds to an ellipsoid of general form in the space of ellipsoids of the configuration space admitting the transformation of the configuration space induced by the rotation of the membrane.

The exact formulation of this assumption is that, for all membranes except a set of infinite codimension, the mapping from the space of symmetric membranes into the space of symmetric ellipsoids is transverse to each of the manifolds of ellipsoids with a given number of multiple axes.

If we agree to this assumption, we come to the following conclusions about oscillations of symmetric membranes.

For membranes of general form admitting rotation by 120°, asymptotically one-third of the characteristic frequencies (counting them with multiplicities) are simple, and the corresponding characteristic oscillations admit rotation by 120°. The remaining characteristic frequencies are double; each double characteristic frequency corresponds to three eigenfunctions whose sum is zero and which are taken to one another under rotation by 120°.

In general one-parameter families of such symmetric membranes, for isolated values of the parameters there are collisions of a single frequency with a double frequency, but there are no collisions of single frequencies with one another or collisions of double frequencies with one another.

The minimal number of parameters of a family of membranes for which more complicated collisions of characteristic frequencies are realized (stably with respect to small perturbations preserving the symmetry) is given by the formula

Σ v_ij (i(i + 1)/2 + j² − 1),

where v_ij is the number of points of collision of i single and j double frequencies.

In particular, for a typical small deformation of a circular membrane preserving rotational symmetry of order 3, a third of the eigenvalues (corresponding to eigenfunctions with azimuthal part cos 3kφ and sin 3kφ) immediately disperse. Under further one-parameter deformation the simple and double characteristic frequencies can pass through one another, but two simple or two double frequencies cannot collide with one another.

E Discussion

The value of the concepts of general position and symmetry lies, in particular, in the fact that they allow us to obtain some information in those cases where we cannot find an exact solution of a problem. Thus, for almost no membranes do we know the forms of the characteristic oscillations. Nevertheless, from general arguments we can say something, for example, about the multiplicities of eigenvalues.

The study of high-frequency oscillations of continuous media is very important in many fields (optics, acoustics, etc.), and special methods have been developed for approximate determination of the form of characteristic oscillations. One of these methods (called the method of quasi-classical asymptotics) consists of seeking an oscillation which is locally close to a simple harmonic wave of short length, but which changes its amplitude and the direction of its front from point to point.

Analysis (which we will not go into here) shows that in some cases we can construct approximate solutions, with the indicated properties, of the equation for eigenfunctions. They are approximate solutions in the sense that they almost satisfy the equation for eigenfunctions (not in the sense that they are close to real eigenfunctions).

In particular, if the membrane has the form of an equilateral triangle with smoothed and strongly blunted corners, then we can construct an approximate solution of the type described which differs appreciably from zero only in a neighborhood of one of the altitudes of the triangle. (Physicists call this approximate solution the wave analogue of a beam moving along the altitude of the triangle; this beam is a stable¹¹⁵ trajectory on a billiard table having the shape of our membrane; cf. the following appendix on short wave asymptotics.)

It follows from symmetry and general position arguments that typical membranes with rotational symmetry of third order have no real characteristic oscillations of the type described. Assume that one of the characteristic oscillations of the membrane is concentrated near an altitude (but not near the center of the membrane). Then, rotating it by 120° and 240° we obtain three characteristic oscillations with the same characteristic frequency. These three oscillations are independent (this follows from the fact that their sum is not zero). Therefore, the characteristic frequency has multiplicity 3, which does not occur in typical systems with third-order rotational symmetry.

From this argument it is clear that attempting to construct rigorous high-frequency asymptotics for eigenfunctions is a rather hopeless task; what we can hope to do is to obtain approximate formulas for almost characteristic oscillations. Such an almost characteristic oscillation can differ very strongly from real characteristic oscillations, but if we give the membrane the initial condition corresponding to it, then for a long time the oscillation will resemble a standing wave (characteristic oscillation).

An example of an almost characteristic oscillation is the motion of one of two identical pendulums connected by a very weak spring. If, at the initial moment, we set the first pendulum in motion and leave the second fixed, then for a long time it will appear that only the first pendulum is oscillating, and the oscillation will be almost characteristic. For true characteristic oscillations, both pendulums oscillate with the same amplitude.
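
The pendulum example can be made quantitative. The sketch below assumes the linearized equations x″ = −Kx for two unit-mass pendulums with natural frequency omega0 joined by a spring of small stiffness eps; both numerical values, and the helper positions, are hypothetical choices made for the illustration.

```python
import numpy as np

omega0 = 1.0      # natural frequency of each pendulum (assumed units)
eps = 1e-3        # stiffness of the weak spring (hypothetical value)

# Linearized equations of motion x'' = -K x for the two pendulum angles
K = np.array([[omega0**2 + eps, -eps],
              [-eps, omega0**2 + eps]])

freq2, modes = np.linalg.eigh(K)      # squared frequencies; columns are the modes
freqs = np.sqrt(freq2)

def positions(t):
    """Exact solution with x(0) = (1, 0) and zero initial velocity."""
    x0 = np.array([1.0, 0.0])
    coeffs = modes.T @ x0                        # expand x0 in the characteristic modes
    return modes @ (coeffs * np.cos(freqs * t))

# Time after which the motion has passed entirely to the second pendulum
transfer_time = np.pi / (freqs[1] - freqs[0])

for t in (0.0, 20 * np.pi, transfer_time):
    x1, x2 = positions(t)
    print(f"t = {t:9.1f}   x1 = {x1:+.4f}   x2 = {x2:+.4f}")
```

In both true characteristic oscillations the two components have equal absolute value, while the motion started from (1, 0) stays close to the first pendulum for many periods and passes to the second only after a time of order 1/eps.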

The problem of connecting the geometry of a membrane with the properties of its characteristic oscillations has been intensively studied in recent years by many authors (including H. Weyl, S. Minakshisundaram and A. Pleijel, A. Selberg, J. Milnor, M. Kac, I. Singer, H. McKean, M. Berger, Y. Colin de Verdière, J. Chazarain, J. J. Duistermaat, V. F. Lazutkin, A. I. Shnirel’man, and S. A. Molchanov).

To the simplest question, “Can you hear the shape of a drum?” the answer turns out to be negative: there exist non-isometric riemannian manifolds with the same spectrum. On the other hand, several properties of a manifold can be recovered from the eigenvalues of the laplacian and from the properties of eigenfunctions (for example, the complete set of lengths of closed geodesics can be recovered).

¹¹⁵ The condition for linear stability of a billiard trajectory has the form

(l − r₁)(l − r₂)(l − r₁ − r₂) < 0,

where l is the length of the interval of the trajectory and r₁ and r₂ are the radii of curvature of the walls at its ends.

Appendix 11: Short wave asymptotics

From the point of view of physical optics, the description of the propagation of light in geometric optics, using rays (i.e., Hamilton’s canonical equations) or wave fronts (i.e., the Hamilton-Jacobi equation), is only an approximation. According to the ideas of physical optics, light is electromagnetic waves, and geometric optics is a first approximation, a good description of phenomena only when the length of the waves is small compared to the size of the objects being considered.

A mathematical version of these physical ideas consists of asymptotic formulas for solving the corresponding differential equations—formulas which give better approximations for higher-frequency oscillations (i.e., for shorter waves). These asymptotic formulas can be written in terms of rays (i.e., motions in some hamiltonian dynamical system) or fronts (i.e., solutions of the Hamilton-Jacobi equation).

Similar short wave asymptotics exist for solutions of many equations in mathematical physics, describing all wave processes. In different areas of physics and mathematics they are connected with different names. For example, in quantum mechanics, short wave asymptotics are called quasi-classical approximations; they are determined by the so-called WKBJ method (Wentzel, Kramers, Brillouin, Jeffreys), although these approximations were used much earlier by Liouville, Green, Stokes, Rayleigh and others.

The construction of short wave asymptotics is based on the idea that, locally, a series of almost strictly sinusoidal waves is observed at each place, although the amplitudes of these waves and the directions of their fronts change slowly from point to point. Formal substitution of a function of this form into the partial differential equations describing the wave process leads us (in a first approximation for waves of small length) to the Hamilton-Jacobi equation for wave fronts. The higher-order approximations allow us to determine as well the dependence of the amplitude of oscillation on the point.

Of course, the entire procedure requires a mathematical foundation. The exact formulation and proof of the corresponding theorems are not at all easy. Particular difficulty is introduced by “caustics” (i.e., focal or conjugate points, or turning points).

Caustics are envelopes of families of rays; they can be seen on a wall illuminated by rays reflected from some smooth curved surface. If the rays orthogonal to the wave fronts intersect and form caustics, then near the caustics the formulas for short wave asymptotics must be slightly changed. Namely, the phase of oscillations along each ray undergoes a standard discontinuity (one-fourth of a wave) upon each passage of the ray through a caustic.

A precise description of all these phenomena may be conveniently developed in terms of the geometry of lagrangian submanifolds of the corresponding phase space and their projections onto the configuration space. Here, caustics are interpreted as singularities of the projection, from phase space to configuration space, of that lagrangian manifold which represents a family of rays. Thus, the normal forms of singularities of lagrangian projections introduced in Appendix 12 supply a classification of singularities of caustics formed by systems of rays in “general position.”

In this appendix we introduce (without proof) the simplest formulas of short wave asymptotics for the Schrödinger equation of quantum mechanics. A more detailed exposition can be found in the following places:

J. Heading, Introduction to phase integral methods, Methuen Co. Ltd., 1962. (Cf. especially Appendix II (by V. P. Maslov) in the Russian translation of Heading’s book, Moscow 1965).

V. P. Maslov, Théorie des perturbations et méthodes asymptotiques, Paris, Dunod, 1972 (Russian edition: Moscow University, 1965).

V. I. Arnold, On a characteristic class entering into conditions of quantization, Functional Analysis and its Applications, v. I (1967).

L. Hörmander, Fourier integral operators, Acta Math. 127 (1971), 79–183.

A Quasi-classical approximation for solutions of Schrödinger’s equation

Schrödinger’s equation for a particle in a field with potential energy U in euclidean space is an equation for a complex-valued function ψ(q, t):

ih ∂ψ/∂t = −(h²/2) Δψ + U(q)ψ.

Here, h is some real constant which is also a small parameter of the problem being considered, and Δ is the Laplace operator.

We assume that the initial condition has the short wave form

ψ|_{t=0} = φ(q) e^{(i/h)s(q)},

where the smooth function φ is nonzero only inside some bounded region. We will find below an asymptotic (as h → 0) formula for the solution of Schrödinger’s equation with such an initial condition.

First of all, we consider the motion of a classical particle in the field with potential energy U, i.e., we consider Hamilton’s equations

q̇ = ∂H/∂p, ṗ = −∂H/∂q, H = p²/2 + U(q)

in 2n-dimensional phase space. The solutions of these equations determine a phase flow (under some conditions on the potential, which we assume fulfilled; these conditions prevent the particle from going off to infinity in a finite time).

We associate to our short wave initial condition a lagrangian submanifold of the phase space (i.e., a manifold whose dimension is equal to the dimension of the configuration space and on which the 2-form dp ∧ dq defining the symplectic structure on the phase space is identically zero). Namely, we define the “momentum” corresponding to our initial condition as the gradient of the phase, i.e., we set

p(q) = ∂s/∂q.

Lemma. For any smooth function s, the graph of the function p(q) constructed by it in the phase space ℝ²ⁿ = {(p, q)} is a lagrangian manifold. Conversely, if a lagrangian manifold projects diffeomorphically onto the q-space (i.e., it is a graph), then it is given by some generating function s, according to the formula above.

We denote the lagrangian manifold constructed from the initial condition (with the function s) by M. After time t the phase flow g^t carries the manifold M to another manifold g^tM. This new manifold is also lagrangian, since the phase flow preserves the symplectic structure.

For small t, the new lagrangian manifold, like the old, projects diffeomorphically onto the configuration space. However, for large t this is not necessarily true (Figure 244).

Figure 244. Transformation of lagrangian manifolds by the phase flow

In other words, several points of the new lagrangian manifold may project to one point Q of the configuration space. We assume that there are only finitely many of these points and that they are all nondegenerate (i.e., that at each of the points of the new lagrangian manifold which project onto Q, the derivative of the projection mapping onto the configuration space is nondegenerate).

The nondegeneracy condition is satisfied for almost all points Q. Those exceptional points Q for which it is not satisfied form a set of measure zero in the configuration space. In the general case, this set is a surface whose dimension is one less than the dimension of the configuration space. This surface, playing the role of a caustic in our problem, can itself have complicated singularities.
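
The loss of the diffeomorphic projection can be seen in the simplest one-dimensional model. The following sketch rests on assumptions that are not taken from the text: the potential is U = 0, so that the phase flow is (p, q) ↦ (p, q + tp), and the initial phase is s(q) = −cos q, so that the initial lagrangian manifold is the graph p = sin q. The functions count_arrivals and projection_is_diffeomorphism are hypothetical names.

```python
import numpy as np

def count_arrivals(Q, t, n_grid=200_001):
    """Count initial points q in [0, 2*pi] whose free trajectory reaches Q at time t.

    Assumed model: U = 0 and s(q) = -cos(q), hence p = sin(q) and
    the particle starting at q arrives at q + t*sin(q).
    """
    q = np.linspace(0.0, 2.0 * np.pi, n_grid)
    f = q + t * np.sin(q) - Q           # vanishes when the particle from q hits Q
    # one sign change of f for each simple root
    return int(np.count_nonzero(f[:-1] * f[1:] < 0))

def projection_is_diffeomorphism(t, n_grid=200_001):
    """True while dQ/dq = 1 + t*cos(q) stays positive on the whole manifold."""
    q = np.linspace(0.0, 2.0 * np.pi, n_grid)
    return bool(np.all(1.0 + t * np.cos(q) > 0.0))

for t in (0.5, 2.0):
    print(t, projection_is_diffeomorphism(t), count_arrivals(3.0, t))
```

In this model the projection is a diffeomorphism for t < 1; for t = 2 the point Q = 3 is reached by three trajectories, while a point outside the interval bounded by the two caustic points is reached by only one.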

The points of the new lagrangian manifold projecting to the point Q arose under the phase flow transformation from several points of the original lagrangian manifold (constructed from the initial condition). In other words, after time t, several trajectories of classical particles, with initial conditions belonging to the original lagrangian manifold, arrive at Q.

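We let (p_j, q_j) denote these initial points in the phase space, and S_j the action along the trajectories of the phase flow coming from the point (p_j, q_j). More precisely, we set

S_j(Q, t) = s(q_j) + ∫₀ᵗ L dθ, L = q̇²/2 − U(q),

where the integral is taken along the trajectory q(θ) with q(0) = q_j and q(t) = Q.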

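Then, as h → 0, the solution of Schrödinger’s equation with the oscillating initial condition given by the functions s and φ has asymptotic form

ψ(Q, t) = Σ_j φ(q_j) |DQ/Dq_j|^(−1/2) e^{(i/h)S_j(Q, t) − (iπ/2)μ_j} + O(h),

where |DQ/Dq_j| is the absolute value of the Jacobian of the mapping q_j ↦ Q and μ_j is an integer (the Morse index) which will be defined below.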

In order to explain this formula, we first consider the case when the time interval t is small. In this case, the sum is reduced to a single term, since the lagrangian manifold obtained from the original lagrangian manifold by the phase flow transformation after small time projects diffeomorphically onto the configuration space. In other words, of the family of particles corresponding to the initial condition for Schrödinger’s equation, only one arrives at Q after the small time t.

For small t, the Morse index is equal to zero (as we will see below from its definition). In this way the function ψ(Q, t) has, like the initial condition, a rapidly oscillating form. Thus, the function S defining the wave fronts at time t is none other than the value at time t of the solution of the Hamilton-Jacobi equation, the initial condition for which is given by the function s defining the wave front at the initial moment. The amplitude of the wave at time t at the point Q is obtained from the amplitudes, at the initial moment at the original point, of the trajectories coming to Q multiplied by a certain factor. This factor is chosen so that, under motions of the particles corresponding to our initial conditions, the integral of the square of the modulus of the function ψ, over a region of configuration space filled with particles, does not change with time. (Here we assume that at the initial moment, some region in the configuration space has been selected; then the phase points on the original lagrangian manifold are selected whose projections onto the configuration space lie in this region; their images under the action of the phase flow after time t are found; finally, the projections of these images onto the configuration space form the region “filled with particles at time t.”)

B The Morse and Maslov indices

The number μ_j is defined as the number of focal points to the manifold M on the interval [0, t] of the phase curve starting out at the point (p_j, q_j).

Focal points to the manifold M are defined as follows. We chose the point Q so that, under projection of the lagrangian manifold obtained from M at time t, a nondegeneracy condition is satisfied at this point. However, if we consider the entire phase curve coming from the point (p_j, q_j), then at some moments of time θ between 0 and t, the nondegeneracy condition may not be satisfied at the point (p(θ), q(θ)) of the lagrangian manifold g^θM. Such points are called focal points to the manifold M along this phase curve.

We note that the definitions of focal points to M and the Morse index do not depend on Schrödinger’s equation, but relate simply to the geometry of the phase flow in the cotangent bundle to the configuration space (or to the calculus of variations, which is the same thing).

In particular, as our lagrangian manifold M we may take the fiber of the cotangent bundle passing through the point (p₀, q₀) (given by the condition q = q₀). In this case a focal point to M on the phase curve going out from (p₀, q₀) is called conjugate to the original point (more precisely, the projection of this focal point onto the configuration space is said to be conjugate to the point q₀ along the extremal in the configuration space starting at q₀ with momentum p₀). In the even more special case of motion along a geodesic on a riemannian manifold, a focal point to a fiber of the cotangent bundle is called conjugate to the initial point of the geodesic along this geodesic. For example, the south pole of a sphere is conjugate to the north pole along any meridian.

The Morse index of an interval of a geodesic, equal to the number of points conjugate to the initial point, plays an important role in the calculus of variations. Namely, we consider the second differential of the action as a quadratic form on the space of variations (with fixed end-points) of the geodesic we are studying. Then the index of inertia of this quadratic form is equal to the Morse index (cf., for instance, J. Milnor, Morse Theory, Princeton University Press, 1967).

Thus the geodesic, up to the first conjugate point, is a minimum of the action, which justifies the name “principle of least action” for various variational principles of mechanics.

We note that in calculating the Morse index, the focal points must be counted with multiplicity (the multiplicity of a focal point in general position is equal to 1).

The Morse index is a particular case of the so-called Maslov index, which is defined independently of the phase flow for any curve on a lagrangian manifold of the cotangent bundle over the configuration space.

Consider the projection of our n-dimensional lagrangian manifold onto the n-dimensional configuration space. This is a smooth mapping of manifolds of the same dimension. It can have singular points, i.e., points at which the rank of the derivative mapping drops, and in a neighborhood of which the projection is not a diffeomorphism.

It turns out that in general the set of singular points has dimension n − 1 and consists of the union of a smooth manifold of dimension n − 1 made up of simple singular points at which the rank drops to 1, and a finite set of manifolds whose dimensions are n − 3 and smaller. Here, “in general” means that these properties can be attained by an arbitrarily small perturbation of the lagrangian manifold, under which it remains lagrangian.

We should point out that, among the pieces of various ranks into which the set of singular points is divided, there is no piece of dimension n − 2. After the simplest singular points, forming a manifold of dimension n − 1, there are the points where the rank drops by two; they form a manifold of dimension n − 3. The projection of the set of singular points onto the configuration space (the caustic) consists, in general, of pieces of all dimensions from 0 to n − 1 without omissions.

Furthermore, it turns out that the (n − 1)-dimensional manifold of the simplest singular points is two-sided in the lagrangian manifold; that is, we can coordinate the orientations of the normals at all points in the following way.

Consider some simple singular point on the lagrangian manifold. We take a system of coordinates q₁,..., q_n in a neighborhood of the projection of this point onto the configuration space. Let p₁,..., p_n be corresponding coordinates in the fiber of the cotangent bundle. In a neighborhood of our singular point, we can consider the lagrangian manifold as the graph of the vector function (q₁, p₂,..., p_n) of the variables (p₁, q₂,..., q_n) (or a vector function of an analogous form in which the role of the distinguished coordinate is played not by the first coordinate but by any of the remaining coordinates).

Singular points near the given one are then defined by the condition ∂q₁/∂p₁ = 0. For lagrangian manifolds in general position, this derivative changes sign upon passing from one side of the manifold of singular points to the other in our neighborhood of the simple singular point. We will call the side where this derivative is positive the positive side.

We note that it is necessary to prove that the definitions of positive direction near different points agree with one another. Furthermore, it must be shown that the positive direction near one point is well defined, i.e., does not depend on the coordinate system. All this can be done by direct calculations (cf. the article cited above in “Functional Analysis”). For further development of these ideas, see V. I. Arnold, Sturm theorems and symplectic geometry, Funct. Anal. Appl. 19 (1985).

Now the Maslov index of an oriented curve on a lagrangian manifold is defined as the number of passages from the negative side of the manifold of singularities to the positive side, minus the number of passages in the other direction. In this we assume that the ends of the curve are nonsingular and that the curve intersects only the manifold of simple singular points and only with nonzero angles. Having defined the index for such curves, we can define it for an arbitrary curve connecting two nonsingular points: to do this it is sufficient to approximate the curve by one which intersects only the manifold of simple singular points and only with nonzero angles. It can be shown that the index does not depend on the choice of the approximating curve.

Problem. Find the index of the circle p = cos t, q = sin t oriented by the parameter t, 0 ≤ t ≤ 2π, in the lagrangian manifold p² + q² = 1 of the phase plane.
Answer. +2.
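
The answer can be checked by counting passages directly. The sketch below is an illustration for curves in the phase plane only: the curve is given by the derivatives of p and q with respect to the parameter (the hypothetical arguments p_dot and q_dot), singular points are located where dq/dt changes sign, and each is counted +1 if dq/dp passes from negative to positive and −1 otherwise. It assumes a closed curve with parameter in [0, 2π] and transversal crossings.

```python
import numpy as np

def maslov_index_plane_curve(p_dot, q_dot, n=4000):
    """Signed count of passages through singular points of the projection to q."""
    # Midpoints of a uniform grid, so that no sample falls exactly on a singular point
    t = (np.arange(n) + 0.5) * (2.0 * np.pi / n)
    dp, dq = p_dot(t), q_dot(t)
    slope_sign = np.sign(dq * dp)            # sign of dq/dp along the curve
    index = 0
    for k in range(n):
        nxt = (k + 1) % n                    # the curve is closed
        if dq[k] * dq[nxt] < 0:              # dq/dp passes through zero here
            index += 1 if slope_sign[nxt] > slope_sign[k] else -1
    return index

# The circle p = cos t, q = sin t of the problem
print(maslov_index_plane_curve(lambda t: -np.sin(t), np.cos))
```

For the circle of the problem the two singular points are t = π/2 and t = 3π/2, and each contributes +1; reversing the orientation of the curve changes the sign of the index.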

Finally, the Morse index of a phase curve in ℝ²ⁿ can now be defined as the Maslov index of a curve in an (n + 1)-dimensional lagrangian manifold in a suitable (2n + 2)-dimensional phase space. As coordinates in this space we will take (p₀, p; q₀, q) (where (p, q) ∈ ℝ²ⁿ). If we set q₀ = t and p₀ = −H(p, q), and let the point (p, q) range over the n-dimensional lagrangian manifold in ℝ²ⁿ obtained from the original after time t by the action of the phase flow, then under change of t the points in ℝ^{2n + 2} form an (n + 1)-dimensional lagrangian manifold. The graph of the motion of a phase point under the action of the phase flow can be considered as a curve on this (n + 1)-dimensional lagrangian manifold. We can verify that the Maslov index of this graph agrees with the Morse index of the original phase curve.

C Indices of closed curves

The indices of closed curves on lagrangian submanifolds of a linear phase space can also be calculated with the help of a complex structure. In addition to the symplectic structure dp ∧ dq on the linear phase space ℝ²ⁿ = {(p, q)}, we introduce a euclidean structure (with scalar square p² + q²) and a complex structure, in which multiplication by i is

I: (p, q) ↦ (−q, p).

All three structures are connected by the relation

[ξ, η] = (Iξ, η),

where the square brackets denote the skew-scalar product.

Linear transformations of the phase space preserving any two (and, therefore, all three) structures are called unitary transformations. Such transformations take lagrangian planes to lagrangian planes.

Every lagrangian plane can be obtained from any other (e.g., from the real plane ℝⁿ given by the equation q = 0) by a unitary transformation. In addition, any two unitary transformations A and B carrying the real plane to the same lagrangian plane differ by a unitary transformation which is a real orthogonal transformation:

Conversely, any preliminary orthogonal transformation does not change the image of the plane under the action of a unitary transformation.

We now note that the determinant of an orthogonal transformation is equal to ±1. Therefore the square of the determinant of a unitary transformation carrying the real plane to a given lagrangian plane depends only on the lagrangian plane itself and does not depend at all on the choice of unitary transformation.

After these preliminary remarks we return to our lagrangian manifold and closed oriented curve lying in it. At every point of the curve, there is a plane tangent to the lagrangian manifold in the symplectic vector space. The square of the determinant of the unitary transformation carrying the real plane to this tangent plane is a complex number with modulus one. As a point moves along our closed curve, this complex number changes. After an entire circuit of the curve, the square of the determinant makes some integral number of rotations around the origin on the plane of complex variables, oriented from 1 to i. This integer is the index of the closed curve.
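
In the one-dimensional case this recipe is easy to carry out numerically. The following sketch assumes the identification z = p + iq of the phase plane with the complex line (so that the real line is q = 0 and multiplication by i turns the p-axis into the q-axis); with this convention a unitary transformation is multiplication by a complex number of modulus one. The function index_by_winding and its arguments are hypothetical.

```python
import numpy as np

def index_by_winding(p_dot, q_dot, n=4000):
    """Number of turns of det^2 around the origin along a closed plane curve."""
    t = np.linspace(0.0, 2.0 * np.pi, n + 1)
    tangent = p_dot(t) + 1j * q_dot(t)       # tangent vector written as p + iq
    u = tangent / np.abs(tangent)            # unitary map taking the real line to the tangent line
    det_sq = u ** 2                          # unchanged if the tangent vector is replaced by its negative
    phase = np.unwrap(np.angle(det_sq))      # continuous argument of det^2
    return int(round((phase[-1] - phase[0]) / (2.0 * np.pi)))

# The circle p = cos t, q = sin t
print(index_by_winding(lambda t: -np.sin(t), np.cos))
```

For the circle p = cos t, q = sin t the square of the determinant is −e^{2it}, which makes two turns, in agreement with the answer +2 obtained by counting passages through singular points.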

The indices of closed curves enter into asymptotic formulas for stationary problems (characteristic oscillations). Assume that the phase flow corresponding to the potential U has an invariant lagrangian manifold lying on the energy level H = E. Then the equation

½Δψ = λ²(U(q) − E)ψ

has a series of eigenvalues λ_N → ∞ with asymptotic form

λ_N = μ_N + O(μ_N⁻¹)

if, for every closed contour γ on the lagrangian manifold, we have the congruence

(2μ_N/π) ∮_γ p dq ≡ ind γ (mod 4).

In the one-dimensional case, the lagrangian manifold is a circle, its index is equal to 2, and the formula above reduces to the so-called “quantization condition”

μ_N ∮ p dq = 2π(N + ½).

The eigenfunctions corresponding to these eigenvalues are also associated with lagrangian manifolds, but this association is not so simple. In fact, we cannot write down asymptotic formulas for eigenfunctions, but only for functions approximately satisfying the equations of characteristic functions. These functions turn out to be small outside the projection of the lagrangian manifold onto the configuration space. The asymptotic formulas have singularities near the caustics formed by the projection.

The actual eigenfunctions, however, can behave entirely differently, at least if the eigenvalue is multiple or if there are eigenvalues close to it (cf. Appendix 10).

Appendix 12: Lagrangian singularities

Lagrangian singularities are singularities of projections of lagrangian manifolds onto configuration space. Such singularities are encountered in investigating global solutions to the Hamilton-Jacobi equation, in studying caustics, focal or conjugate points, in analyzing the propagation of discontinuities and shock waves in the mechanics of continuous media, and also in problems of short wave asymptotics (cf. Appendix 11).

In order to describe lagrangian singularities we must first say a few words about singularities of smooth mappings in general. We begin with the simplest examples.

A Singularities of smooth mappings of a surface onto a plane

The mapping projecting a sphere onto a plane is singular on the equatorial circle (at points of the equator the rank of the derivative drops to one). As a result, a curve is formed on the plane of projection (the so-called apparent contour) bounding regions in which points have different numbers of pre-images: every point of the plane inside the apparent contour has two pre-images, and every point outside has none.

In more complicated cases of “apparent contours” there can be more complicated singularities. Consider, for example, the surface given in three-dimensional space with coordinates (x, y, z) by the equation (Figure 245)

x = yz − z³

and the mapping of projection parallel to the z-axis onto the plane with coordinates (x, y).

Figure 245. Whitney’s tuck

The singular points of the projection form a smooth curve on the surface (with equation 3z² = y). However, the image of this curve on the (x, y) plane is not a smooth curve. This image is a semi-cubical parabola with a cusp at the point (0, 0) with equation

27x² = 4y³.

Such a curve divides the plane into two parts: a smaller part (inside the cusp) and a larger part (outside). Over each point of the smaller part there are three points of our surface, and over each point of the larger part there is only one.
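
The count of points of the surface over each part of the plane can be verified numerically. The sketch assumes that the surface is x = yz − z³, a form consistent with the singular curve 3z² = y; under this assumption the apparent contour is 27x² = 4y³. The helper names count_points_over and inside_cusp are hypothetical.

```python
import numpy as np

def count_points_over(x, y, tol=1e-9):
    """Number of real z with x = y*z - z**3 (assumed equation of the surface)."""
    roots = np.roots([-1.0, 0.0, y, -x])     # coefficients of -z^3 + y*z - x
    return int(np.sum(np.abs(roots.imag) < tol))

def inside_cusp(x, y):
    """True for points of the smaller part of the plane, bounded by 27 x^2 = 4 y^3."""
    return 27.0 * x**2 < 4.0 * y**3

for x, y in [(0.0, 1.0), (1.0, 1.0), (0.0, -1.0)]:
    print((x, y), inside_cusp(x, y), count_points_over(x, y))
```

The tolerance tol only decides when a computed root is treated as real; it is adequate for points away from the cusp curve, where two of the roots merge.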

We now consider any small deformation of our surface. It turns out that, under projection of any surface close to ours, the apparent contour will always have a similar singularity (semi-cubical cusp) at some point close to the singularity of the apparent contour of the original surface. In other words, this singularity is not removable by a small perturbation of the surface.

Furthermore, in place of a deformation of the surface, we can arbitrarily deform the mapping itself of the surface to the plane (no longer caring whether it is a projection), as long as it remains smooth and the deformation is small. It turns out that, for these deformations too, the cusp does not disappear but is only slightly deformed.

The examples presented here exhaust all typical singularities of mappings of a surface to the plane. It can be shown that all more complicated singularities are removable by a small perturbation. Therefore, by slightly deforming any smooth mapping, we can always arrange that in a neighborhood of any point of the surface, the mapping will be either nonsingular, or structurally similar to the projection mapping of a sphere onto a plane near the equator, or structurally similar to the projection mapping of the surface considered above with a cubic cusp on the apparent contour.

The words “structurally similar to” mean that, on the pre-image surface and the image plane, we can choose local coordinates (in a neighborhood of our point and its image) such that in these coordinates the mapping will be written in a special way. Namely, the normal forms to which the mapping of the surface to the plane will be reduced in a neighborhood of points of the three types indicated above will be

y₁ = x₁, y₂ = x₂ (nonsingular point);
y₁ = x₁², y₂ = x₂ (fold);
y₁ = x₁³ + x₁x₂, y₂ = x₂ (tuck).

Here (x₁, x₂) are the local coordinates in the pre-image, and (y₁, y₂) are the local coordinates in the image.

The proof of this theorem (it is due to H. Whitney) and its multidimensional generalizations can be found in works on the theory of singularities of smooth maps, such as

V. I. Arnold, Singularities of smooth mappings, Russian Math. Surveys 23:1 (1968) 1–44.

Symposium on Singularities of Smooth Manifolds and Maps, Univ. of Liverpool, 1969–70, Proceedings, Springer, 1971. See especially the article of R. Thom and H. Levine.

Golubitsky and Guillemin, Stable Mappings and Their Singularities, Springer-Verlag, 1973.

B Singularities of projection of lagrangian manifolds

We now consider an n-dimensional configuration manifold, the corresponding 2n-dimensional phase space, and an n-dimensional lagrangian submanifold (i.e., an n-dimensional submanifold on which the 2-form giving the symplectic structure of the phase space is identically zero).

By projecting the lagrangian manifold onto the configuration space, we obtain a mapping of one smooth n-dimensional manifold to another. At most points, this mapping is a local diffeomorphism, but at some points of the lagrangian manifold the rank of the differential drops. These points are said to be singular. Under projection of the set of singular points to the configuration space an “apparent contour” is formed, which is called a caustic in the lagrangian case.

Caustics can have complicated singularities; however, as in the usual theory of singularities of smooth maps, we can get rid of singularities which are too complicated by a small perturbation (here, by a small perturbation, we mean a small deformation of a lagrangian manifold in phase space under which this manifold remains lagrangian).

After this there remain only the simplest unremovable singularities, for which we can write out normal forms and which we can study once and for all. When considering problems in general position which do not satisfy any special properties of symmetry, it is natural to expect that only these simple unremovable singularities will appear.

Consider, for example, the caustics formed on a wall by light from a point source reflected from some smooth curved surface (here the four-dimensional phase space is formed by straight lines intersecting the surface of the wall in all possible directions, and the lagrangian submanifold by the rays of light coming from the source as they intersect the wall). By moving the source, we can see that generally the caustics have only simple singularities (semi-cubical cusps), while more complicated singularities appear only for special, exceptional positions of the source.

We will give below, for n ≤ 5, normal forms for singularities of the projection of an n-dimensional lagrangian submanifold of 2n-dimensional phase space onto an n-dimensional configuration space. There are a finite number of these normal forms, and their classification is related (in a rather mysterious way) to the classifications of simple Lie groups, simple degenerate critical points of functions, regular polyhedra, and many other objects. For n ≥ 6, the normal forms of some singularities must inevitably contain parameters. For further details the reader is referred to the articles:

V. I. Arnold, Normal forms for functions near degenerate critical points, the Weyl groups of A_k, D_k, E_k, and lagrangian singularities, Functional Analysis and Its Applications 6:4 (1972) 254–272.

V. I. Arnold, Critical points of smooth functions and their normal forms, Uspekhi Mat. Nauk 30:5 (1975).

C Tables of normal forms of typical singularities of projections of lagrangian manifolds of dimension n ≤ 5

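We will use the following notation: q = (q₁,..., q_n) are coordinates on the configuration space and p = (p₁,..., p_n) are the corresponding momenta, so that p and q together form a symplectic coordinate system in the phase space.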

We will give a lagrangian manifold with the help of a generating function F by the formulas

where the index i runs over some subset of {1,..., n} and j runs over the remainder of {1,..., n}. That is, i = 1, j > 1 for singularities denoted in the list by A_k, and i = 1, 2, j > 2 for singularities denoted by D_k and E_k.

With this notation, one and the same expression F(p_i, q_j) can be considered as giving a lagrangian manifold in spaces of a different number of dimensions: we can add arbitrarily many arguments q_j, on which F does not actually depend.

The list of normal forms of typical singularities is now as follows: for n = 1

for n = 2, in addition to the two above, there is

for n = 3, in addition to the three preceding, there are

for n = 4, in addition to the five preceding, there are

for n = 5, in addition to the seven preceding, there are

D Discussion of the normal forms

A point of type A₁ is nonsingular. A singularity of type A₂ is a fold singularity. If we take (p₁, q₂,..., q_n) as coordinates on the lagrangian manifold, then the projection mapping may be written as

A singularity of type A₃ is a tuck with a semi-cubical cusp on the visible contour. To convince ourselves of this, it is enough to write out the corresponding mapping of the two-dimensional lagrangian manifold to the plane:

A singularity of type A₄ first appears in the three-dimensional case, and the corresponding caustic is represented by a surface in three-dimensional space (Figure 246) with a singularity called a swallowtail (we already encountered this in Section 46).

Figure 246

Typical singularities of caustics in three-dimensional space

The caustic of a singularity of type D₄ in three-dimensional space is represented as a surface with three cuspidal edges (of type A₃), tangent at one point; two of these cuspidal edges can be imaginary, so that there are two versions of the caustic of D₄.

E Lagrangian equivalence

We must now say in what sense the examples mentioned are normal forms of typical singularities of projections of lagrangian manifolds. First of all, we will define which singularities we will consider to have the “same structure.”

A projection mapping of a lagrangian manifold onto configuration space will be called a lagrangian mapping for short. Suppose that we are given two lagrangian mappings of manifolds of the same dimension n (the corresponding n-dimensional lagrangian manifolds lie, in general, in different phase spaces which are cotangent bundles of two different configuration spaces). We say that two such lagrangian mappings are lagrangian equivalent if there is a symplectic diffeomorphism of the first phase space to the second, taking fibers of the first cotangent bundle to fibers of the second, and taking the first lagrangian manifold to the second. The symplectic diffeomorphism itself is then called a lagrangian equivalence mapping.

We note that two lagrangian equivalent lagrangian mappings are taken one to the other with the help of diffeomorphisms in the pre-image space and the image space (or, as they say in analysis, are carried to one another by a change of coordinates in the pre-image and in the image). In fact, our symplectic diffeomorphism restricted to the lagrangian manifold gives a diffeomorphism of the pre-images; a diffeomorphism of the configuration-space images arises because fibers are carried to fibers.

In particular, the caustics of the two lagrangian equivalent mappings are diffeomorphic, hence a classification up to lagrangian equivalence implies a classification of caustics. However, the classification up to lagrangian equivalence is finer than the classification of caustics, since a diffeomorphism of caustics does not in general give rise to a lagrangian equivalence of the mappings. Furthermore, the classification up to lagrangian equivalence is finer than the classification up to diffeomorphisms of the pre-image and image, since not every such pair of diffeomorphisms is realized by a symplectic diffeomorphism of the phase space.

A lagrangian mapping considered in a neighborhood of some chosen point is called lagrangian equivalent at that point to another lagrangian mapping (also with a chosen point), if there is a lagrangian equivalence of the first mapping in some neighborhood of the first point onto the second in some neighborhood of the second point, carrying the first point to the second.

We can now formulate a classification theorem for singularities of lagrangian mappings in dimensions n ≤ 5.

Theorem. Every n-dimensional lagrangian manifold (n ≤ 5) can, by an arbitrarily small perturbation in the class of lagrangian manifolds, be made into one such that the projection mapping onto the configuration space will be lagrangian equivalent at every point to one of the lagrangian mappings in the list above.

In particular, a two-dimensional lagrangian manifold can be put in “general position” by an arbitrarily small perturbation in the class of lagrangian manifolds, so that the projection mapping onto the configuration space (two-dimensional) will not have singularities other than folds (which can be reduced by a lagrangian equivalence to the normal form A₂) or tucks (which can be reduced by a lagrangian equivalence to the normal form A₃).

We note that this assertion about two-dimensional lagrangian mappings does not follow from the classification theorem for general (non-lagrangian) mappings. In the first place, lagrangian mappings make up a very restricted class among all smooth mappings, and therefore they can (and actually do for n > 2) have as typical, singularities which are not typical for mappings of general form. Secondly, the possibility of reducing a mapping to normal form by diffeomorphisms of the pre-image and image does not imply that this can be done using a lagrangian equivalence.

In this way, the caustics of a two-dimensional lagrangian manifold in general position have as singularities only semi-cubical cusps (and points of transversal intersection). All more complicated singularities break up under a small perturbation of the lagrangian manifold; the resulting cusps and self-intersection points of caustics are unremovable by small perturbations, and are only slightly deformed.
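
The semi-cubical cusp can be checked on a concrete two-dimensional lagrangian mapping. The following is a minimal sketch, not the author's computation: it assumes the generating function F = p₁⁴ + q₂p₁² as a representative of type A₃ and the sign convention q₁ = ∂F/∂p₁, p₂ = −∂F/∂q₂ (both are assumptions made for illustration). Under these assumptions the projection sends (p₁, q₂) to (4p₁³ + 2q₂p₁, q₂), its rank drops where 12p₁² + 2q₂ = 0, and the image of this curve should satisfy 27q₁² + 8q₂³ = 0.

```python
from fractions import Fraction

# Assumed representative of type A3: F(p1, q2) = p1**4 + q2 * p1**2,
# with the assumed convention q1 = dF/dp1 (p2 = -dF/dq2 is not needed here).

def lagrangian_map(p1, q2):
    """Projection of the lagrangian manifold: (p1, q2) -> (q1, q2)."""
    return (4 * p1**3 + 2 * q2 * p1, q2)

def jacobian_det(p1, q2):
    """Determinant of the differential; it equals d(q1)/d(p1)."""
    return 12 * p1**2 + 2 * q2

def caustic_point(p1):
    """Image of the singular point lying over the given value of p1."""
    q2 = -6 * p1**2          # solves jacobian_det(p1, q2) == 0
    return lagrangian_map(p1, q2)

# Exact rational sample points along the singular curve.
samples = [Fraction(k, 7) for k in range(-14, 15)]
points = [caustic_point(p1) for p1 in samples]

# The caustic is the semi-cubical parabola 27 q1^2 + 8 q2^3 = 0.
on_cusp = all(27 * q1**2 + 8 * q2**3 == 0 for q1, q2 in points)
print(on_cusp)
```

The cusp point itself is the image of p₁ = 0; away from it the two branches of the caustic are images of p₁ > 0 and p₁ < 0.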

Normal forms of the singularities A₄, D₄,... can be used in a similar way for studying the caustics of lagrangian manifolds of higher dimensions, and also for studying the development of caustics of low-dimensional lagrangian manifolds, when parameters on which the manifold depends are varied.¹¹⁶

Other applications of the formulas of this section can be found in the theory of Legendre singularities, i.e., singularities of wave fronts, Legendre transforms, envelopes, and convex hulls (cf. Appendix 4). The theories of lagrangian and Legendre singularities have direct application, not only in geometric optics and the theory of asymptotics of oscillating integrals, but also in the calculus of variations, in the theory of discontinuous solutions of nonlinear partial differential equations, in optimization problems, pursuit problems, etc. R. Thom has suggested the general name catastrophe theory for the theory of singularities, the theory of bifurcations, and their applications.

¹¹⁶

See, e.g., V. Arnold, Evolution of wavefronts and equivariant Morse lemma, Comm. Pure Appl. Math., 1976, No. 6.

Appendix 13: The Korteweg-de Vries equation

Not all first integrals of equations in classical mechanics are explained by obvious symmetries of a problem (examples are specific integrals of Kepler’s problem, the problem of geodesics on an ellipsoid, etc.). In such cases, we speak of “hidden symmetry.”¹¹⁷

Interesting examples of such hidden symmetry are furnished by the Korteweg-de Vries equation

u_t = 6uu_x − u_xxx. (1)

This nonlinear partial differential equation first arose in the theory of waves in shallow water; later it turned out that this equation is encountered in a whole series of problems in mathematical physics.

As a result of a series of numerical experiments, remarkable properties of solutions of this equation with zero boundary conditions at infinity were discovered: as t → ∞ and t → −∞ these solutions decompose into “solitons”—waves of definite form moving with different velocities.

To obtain a soliton moving with velocity c, it is sufficient to substitute the function u = φ(x − ct) into equation (1). Then we obtain the equation φ″ = 3φ² + cφ + d for φ (d is a parameter). This is Newton’s equation with a cubic potential. There is a saddle in the phase plane (φ, φ′). When this saddle lies at φ = 0, the separatrix which leaves the saddle and returns to it determines a solution φ tending to 0 as x → ± ∞; it is a soliton.
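
The separatrix solution can be written in closed form and tested numerically. The sketch below assumes d = 0 and c > 0 and uses the profile φ(ξ) = −(c/2) sech²(√c ξ/2), a standard formula that is not stated in the text; the value c = 2, the step h, and the sampling grid are arbitrary choices. It approximates φ″ by a central difference and measures how far φ″ − (3φ² + cφ) is from zero.

```python
import math

def soliton(xi, c):
    """Assumed closed-form profile: phi(xi) = -(c/2) * sech(sqrt(c) * xi / 2)**2."""
    return -0.5 * c / math.cosh(0.5 * math.sqrt(c) * xi) ** 2

def residual(xi, c, h=1e-3):
    """Central-difference estimate of phi'' - (3 phi^2 + c phi), with d = 0."""
    second = (soliton(xi + h, c) - 2.0 * soliton(xi, c) + soliton(xi - h, c)) / h**2
    phi = soliton(xi, c)
    return second - (3.0 * phi**2 + c * phi)

c = 2.0  # arbitrary positive velocity
max_residual = max(abs(residual(0.1 * k, c)) for k in range(-100, 101))
tail = abs(soliton(20.0, c))  # the profile decays to 0 as xi -> +-infinity

print(max_residual, tail)
```

The residual is limited only by the finite-difference error, and the tail value illustrates the decay of φ at infinity.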

When solitons collide, there is a complicated nonlinear interaction. However, numerical experiments showed that the sizes and velocities of the solitons do not change as a result of collision. And, in fact, Kruskal, Zabusky, Lax, Gardner, Greene, and Miura succeeded in finding a whole series of first integrals for the Korteweg-de Vries equation. These integrals have the form I_s = ∫ P_s(u,..., u^(s))dx, where P_s is a polynomial. For example, it is easy to verify that the following are first integrals of equation (1):
I_{−1} = ∫ u dx, I₀ = ∫ u² dx, I₁ = ∫ (u_x²/2 + u³) dx, I₂ = ∫ (u_xx²/2 − (5/2)u²u_xx + (5/2)u⁴) dx.

The appearance of an infinite series of first integrals is easily explained by the following theorem of Lax.¹¹⁸ We will denote the operator of multiplication by a function of x by the symbol for the function itself, and the operator of differentiation with respect to x by the symbol ∂. Consider the Sturm-Liouville operator L = − ∂² + u depending on a function u(x). We verify directly:

Theorem. The Korteweg-de Vries equation (1) is equivalent to the equation dL/dt = [L, A], where A = 4 ∂³ − 3(u ∂ + ∂u).
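
The direct verification can also be carried out mechanically. The following is a minimal sketch under two assumptions: that the equation in the theorem reads dL/dt = [L, A], and that equation (1) is normalized as u_t = 6uu_x − u_xxx (the normalization consistent with the soliton equation φ″ = 3φ² + cφ + d above). Since dL/dt is multiplication by u_t, it suffices to check that [L, A] acts as multiplication by 6uu′ − u‴. The check is done exactly on polynomial test functions; the particular polynomials u and f are arbitrary, and all helper names are illustrative.

```python
from fractions import Fraction
from itertools import zip_longest

# Polynomials in x are lists of coefficients, lowest degree first.

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    return trim(a + b for a, b in zip_longest(p, q, fillvalue=Fraction(0)))

def scale(c, p):
    return trim(c * a for a in p)

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def diff(p, k=1):
    for _ in range(k):
        p = [i * a for i, a in enumerate(p)][1:]
    return trim(p)

def L(u, f):
    """Sturm-Liouville operator: L f = -f'' + u f."""
    return add(scale(-1, diff(f, 2)), mul(u, f))

def A(u, f):
    """A f = 4 f''' - 3 (u f' + (u f)')."""
    sym = add(mul(u, diff(f)), diff(mul(u, f)))
    return add(scale(4, diff(f, 3)), scale(-3, sym))

def commutator(u, f):
    """[L, A] f = L(A f) - A(L f)."""
    return add(L(u, A(u, f)), scale(-1, A(u, L(u, f))))

def kdv_rhs(u):
    """Assumed right-hand side of equation (1): 6 u u' - u'''."""
    return add(scale(6, mul(u, diff(u))), scale(-1, diff(u, 3)))

u = [Fraction(c) for c in (1, -2, 0, 3, 1)]      # arbitrary test potential
f = [Fraction(c) for c in (2, 1, -1, 0, 0, 5)]   # arbitrary test function
lax_holds = commutator(u, f) == mul(kdv_rhs(u), f)
print(lax_holds)
```

All terms containing ∂ and ∂² cancel in the commutator, which is why the result is a multiplication operator.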

Directly from this theorem of Lax, we have

Corollary. The operators L constructed from a solution of equation (1) are unitarily equivalent for all t; in particular, each of the eigenvalues λ of the Sturm-Liouville problem Lf = λf with zero boundary conditions at infinity is a first integral of the Korteweg-de Vries equation.

Gardner, V. E. Zakharov and L. D. Faddeev noted that equation (1) is a completely integrable infinite-dimensional hamiltonian system, and found the corresponding action-angle variables.¹¹⁹ A symplectic structure on the space of functions vanishing at infinity is given by the skew-scalar product ω²(∂w, ∂v) = ½ ∫ (w ∂v − v ∂w)dx, and the hamiltonian of equation (1) is the integral I₁. In other words, equation (1) can be written in the form of Hamilton’s equation in the functional space of functions of x: u_t = (d/dx)(δI₁/δu).

Every integral I_s gives in this way a “higher Korteweg-de Vries equation” u_t = Q_s[u], where Q_s = (d/dx)(δI_s/δu) is a polynomial in u, u′,..., u^(2s + 1). The integrals I_s are in involution, and the flows corresponding to them on the functional space commute.

The explicit form of the polynomials P_s and Q_s, and also the explicit form of the action-angle variables (and therefore of solutions of equation (1)), is described in terms of solutions of the direct and inverse problems of scattering theory with potential u.

The explicit form of the polynomials Q_s can also be obtained from the following theorem of Gardner, generalizing Lax’s theorem. In the space of functions of x, we consider a differential operator of the form A = ∑ p_i∂^{m − i}, where p₀ = 1, and the remaining coefficients p_i are polynomials in u and the derivatives of u with respect to x. It turns out that, for any s, there is an operator A_s of order 2s + 1 such that its commutator with the Sturm-Liouville operator L is the operator of multiplication by a function: [L, A_s] = Q_s.

The operator A_s is defined by these conditions uniquely up to the addition of linear combinations of the A_r with r < s; in the same way, the polynomials Q_s are determined up to the addition of linear combinations of the preceding Q_r’s.

V. E. Zakharov, A. B. Shabat, L. D. Faddeev, and others, using Lax’s method and techniques of inverse scattering theory, have studied a whole series of physically important equations, including the equations u_tt − u_xx = sin u and iψ_t + ψ_xx ± ψ |ψ|² = 0.

Investigation of the problem with periodic boundary conditions for the Korteweg-de Vries equation led S. P. Novikov¹²⁰ to the discovery of an interesting class of completely integrable systems with a finite number of degrees of freedom. These systems are constructed in the following way.

Consider any finite linear combination of first integrals, I = ∑ c_i I_{n − i}, and let c₀ = 1. The set of stationary points of the flow with hamiltonian I on the functional space is invariant under the phase flows with hamiltonians I_s, including the phase flow of equation (1).

On the other hand, these stationary points are determined from the equations (d/dx)(δI/δu) = 0, or δI/δu = d. The second equation is the Euler-Lagrange equation for the functional I − dI_{−1}, involving derivatives of order n. Therefore, it has order 2n and can be written as a hamiltonian system of equations in 2n-dimensional euclidean space.

It turns out that this hamiltonian system with n degrees of freedom has n integrals in involution and can be integrated completely with the help of suitable action-angle coordinates. In this way, we obtain a finite-dimensional family of particular solutions of the Korteweg-de Vries equation depending on 3n + 1 parameters (2n phase coordinates and n + 1 further parameters c₁,..., c_n; d).

These solutions have, as Novikov showed, remarkable properties; for example, in the periodic problem they give functions u(x) for which the linear differential equation with periodic coefficients −X″ + u(x)X = λX has a finite number of zones of parametric resonance (cf. Section 25) on the λ-axis.

After this book was written, much work was done on the subjects discussed in this appendix, in particular by Novikov, Dubrovin, Krichever, Manakov, Matveev, Its, Dikii, Manin, Drinfeld, Gelfand, Lax, Moser, McKean, Van Moerbeke, Adler, Perelomov, Olshanetskii, and many others. Among other things, Manakov solved the Euler equations of a rigid body in ℝⁿ for arbitrary n: these are completely integrable. For more details see the forthcoming book by Novikov and his collaborators. (Note added by author in translation.)

¹¹⁷

The term “accidental symmetry” is frequently used in English. [Trans. note.]

¹¹⁸

Lax, P. D., Integrals of nonlinear equations of evolution and solitary waves, Comm. Pure Appl. Math. 21 (1968) 467–490.

¹¹⁹

Zakharov, V. E. and Faddeev, L. D., The Korteweg-de Vries equation is a completely integrable hamiltonian system, Functional Analysis and Its Applications, 5:4 (1971) 280–287.

¹²⁰

Novikov, S. P., The periodic problem for the Korteweg-de Vries equation, Functional Analysis and Its Applications, 8:3 (1974) 236–246.

Appendix 14: Poisson structures

Along with the classical Poisson bracket of functions, one also encounters more general (degenerate) brackets. A typical example is the Poisson bracket of functions of the components M_i of the angular momentum vector: {F, G} = ∑ (∂F/∂M_i)(∂G/∂M_j){M_i, M_j}. Such degenerate brackets may be considered as families of ordinary Poisson brackets or families of symplectic manifolds. These families generally have singularities (they are not foliations): they consist of symplectic manifolds (leaves) of different dimensions, related to one another by the condition of smoothness for the given degenerate Poisson bracket structure on the ambient space. (In the angular momentum example above, the leaves are concentric spheres and their center at the origin.)

In this appendix, we shall present the simplest elementary properties of Poisson structures on finite-dimensional manifolds. One should keep in mind, though, that in applications (especially to the mathematical physics of continuous media) one frequently encounters Poisson structures on infinite-dimensional manifolds. In these cases, the symplectic leaves often (but not always) have finite dimension or codimension.

A Poisson manifolds

A Poisson structure on a manifold is a Lie algebra structure on its space of smooth functions (i.e., a bilinear skew-symmetric operation of “Poisson bracket” on functions, satisfying the Jacobi identity) such that the operator ad_a = {a, } (contraction of the Poisson bracket with any fixed function a) is an operator of differentiation by some vector field θ_a. The vector field θ_a is then called the hamiltonian vector field with hamiltonian function a. The mapping a ↦ θ_a gives a homomorphism from the Lie algebra of functions to the Lie algebra of vector fields. A manifold with a given Poisson structure is called a Poisson manifold.

Two points on a Poisson manifold are called equivalent if they can be joined by a path consisting of segments of integral curves of hamiltonian vector fields. The equivalence classes under this relation are called the leaves of the Poisson manifold. The values of all possible hamiltonian vector fields at a given point of a Poisson manifold form a linear space which is just the tangent space of the leaf through that point. Thus the leaves are smooth manifolds, but they are in general not closed, and they have different dimensions.

The classical (explicitly described by S. Lie in 1890, but essentially considered already by Jacobi) example of a Poisson manifold is the dual space of a (finite-dimensional) Lie algebra. The elements of the algebra itself may be considered as linear functions on this space. The Poisson structure is defined as an extension of the Lie algebra structure from this finite-dimensional subspace to the entire space of smooth functions on the dual of the original Lie algebra. Such an extension exists and is unique: if ω₁,..., ω_n is a basis of the original Lie algebra, then
{a, b} = ∑ (∂a/∂ω_i)(∂b/∂ω_j)[ω_i, ω_j].

In this example, the leaves are the orbits of the co-adjoint representation of the underlying Lie group in the dual of its Lie algebra.

Every leaf of a Poisson manifold carries a natural symplectic structure (closed nondegenerate 2-form), defined in the following way. Consider the values of two hamiltonian vector fields at a point of the leaf. The value of the 2-form on this pair of vectors is defined to be the value of the Poisson bracket of the hamiltonian functions at the given point (this value depends only on the two vectors and not on the choice of hamiltonian functions). The fact that the form is closed on the leaf follows from the Jacobi identity; nondegeneracy comes from the fact that, if the derivative of every function by a given tangent vector is zero, then the vector itself must be zero. The phase flow of every hamiltonian vector field preserves the symplectic structures on the leaves.

Thus, the leaves of a Poisson manifold are even dimensional, and the manifold may be considered as a union of symplectic manifolds (generally of different dimensions), whose symplectic structures are coordinated by the condition that the Poisson bracket on the ambient space be smooth.

For example, the co-adjoint orbits of SO(3) (spheres centered at the origin) may be organized according to local Darboux coordinates: in the neighborhood of any nonzero point, the Poisson structure in suitable local coordinates takes the form {x, y} = 1, {x, z} = {y, z} = 0. This normal form for the Poisson structure on the space of angular momenta is convenient in carrying out the process of elimination of the nodes in the many-body problem (see Section III.5.5 of the paper: V. I. Arnol’d, Small denominators and problems of stability of motion in classical and celestial mechanics, Russian Math. Surveys 18, No. 6 (1963), 85–191).

Jacobi realized that the (classical) Poisson brackets of the first integrals of any hamiltonian system could be considered as a Poisson structure (this structure is discussed in Section VI.1.3 of the author’s paper cited above).

The construction of a Poisson structure on the dual space of a Lie algebra leads to a new Lie algebra. This construction may then be repeated, leading to a whole series of new (infinite-dimensional) Poisson structures. More generally, suppose that one is given any Poisson structure on a manifold. Then the space of functions on that manifold carries the structure of a Lie algebra. This implies that the dual space of this function space carries its own Poisson structure. Elements of this dual space may be interpreted as distribution densities on the original manifold. Thus, the space of distributions on a Poisson manifold (for example, on a symplectic phase space) has a natural Poisson structure. This structure makes it possible to apply the hamiltonian formalism to equations of Vlasov type, which describe the evolution of distributions of particles in phase space under the action of a field which is consistent with the particles themselves.

B Poisson mappings

A mapping from one Poisson manifold to another is called a Poisson mapping if it is consistent with the Poisson structures, i.e., if for any two functions on the second manifold, the Poisson bracket of their pullbacks to the first manifold coincides with the pullback of their Poisson brackets. For example, the embedding of each symplectic leaf in a Poisson manifold is a Poisson mapping.

The cartesian product of two Poisson manifolds has a natural Poisson structure, for which the projection on each factor is a Poisson mapping (the Poisson bracket of functions pulled back from different factors is zero).

S. Lie showed that every Poisson manifold is locally (in the neighborhood of a point where the dimension of the symplectic leaves is locally constant, for example, in the neighborhood of a generic point, where the rank is locally maximal) decomposable into the product of a symplectic leaf and a complementary space on which all Poisson brackets are zero.

On such a neighborhood, one may introduce coordinates p_i, q_i, c_i such that p and q have the usual symplectic Poisson brackets, while the Poisson bracket of each c_i with any function is equal to zero. In physics, the coordinates p_i and q_i are called Clebsch variables,¹²¹ while the c_i’s are called Casimir functions. Clebsch introduced his variables for the hamiltonian description of the hydrodynamics of ideal fluids, while Casimir considered the center of the Lie algebra of functions on the dual space of a given Lie algebra.
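
For the space of angular momenta, the role of the Casimir function is played by the squared length of M. The following minimal sketch illustrates this numerically. It assumes the sign convention {M₁, M₂} = M₃ (and cyclic permutations), so that the bracket of the introduction to this appendix becomes {F, G} = M · (∇F × ∇G); gradients are approximated by central differences, and the test point and test functions are arbitrary.

```python
def grad(F, M, h=1e-6):
    """Central-difference gradient of F at the point M = (M1, M2, M3)."""
    g = []
    for i in range(3):
        up, dn = list(M), list(M)
        up[i] += h
        dn[i] -= h
        g.append((F(up) - F(dn)) / (2.0 * h))
    return g

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def bracket(F, G, M):
    """{F, G}(M) = sum_ij dF/dM_i dG/dM_j {M_i, M_j}, assuming {M1, M2} = M3 (cyclic)."""
    c = cross(grad(F, M), grad(G, M))
    return sum(m * ci for m, ci in zip(M, c))

def casimir(M):
    """Squared length of M; constant on each sphere centered at the origin."""
    return M[0] ** 2 + M[1] ** 2 + M[2] ** 2

def sample_function(M):
    """Arbitrary polynomial test function."""
    return M[0] ** 2 * M[1] + M[2] ** 3

point = [0.3, -1.2, 0.7]
m1_m2 = bracket(lambda M: M[0], lambda M: M[1], point)   # should be close to M3
casimir_bracket = bracket(casimir, sample_function, point)  # should be close to 0

print(m1_m2, casimir_bracket)
```

The vanishing of the second bracket does not depend on the sign convention, since ∇(M · M) = 2M is orthogonal to M × ∇G.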

The dimension of the symplectic leaf through a nongeneric point of a Poisson manifold is less than that for nearby generic points. In the neighborhood of such a point, the Poisson manifold may still be represented as the product of a neighborhood of the point in its symplectic leaf and a neighborhood of a distinguished point in some Poisson manifold of complementary dimension. In other words, on a minimal transverse manifold to a symplectic leaf there arises a (unique up to diffeomorphism) local Poisson structure—the so-called transverse Poisson structure (cf. A. Weinstein, The local structure of Poisson manifolds, J. Diff. Geom. 18 (1983), 523–557).¹²² In the transverse structure, the Poisson brackets of all functions are zero at the distinguished point (which may be taken as the origin of a coordinate system). The Taylor series for these brackets begin with

{x_i, x_j} = ∑ c_{ij}^k x_k + ⋯, where the c_{ij}^k are the structure constants of a finite-dimensional Lie algebra (the linearized transverse structure).

A natural question arises: Is it possible to annihilate the higher order terms in the Taylor series by a suitable change of coordinates?

The question of the form of transverse structures was already raised by the author in Section VI.1.3 of the previously cited article.

If the linearized algebra is semisimple and the Poisson structure is analytic, then one can eliminate the higher order terms of the Taylor series by an analytic change of coordinates: J. Conn, Linearization of analytic Poisson structures, Annals of Math. 119 (1984), 577–601. An analogous result is true for the C^∞ case, when the linearized algebra is of compact type: J. Conn, Linearization of C^∞ Poisson structures, Annals of Math. (1985).

A. Weinstein, along with his earlier proof of an analogous result for formal series, expressed the conjecture that semisimplicity was a necessary condition for the annihilation of nonlinear terms. The study of singularities of Poisson structures in the plane (or, more generally, structures with symplectic leaves of codimension 2) leads, however, to a different conclusion.

C Poisson structures in the plane

From the point of view of differential geometry, a Poisson structure is given by a smooth bivector field on a manifold. In fact, the Poisson brackets at each point associate a number to each pair of cotangent vectors. Therefore they define a section of the second exterior power of the tangent bundle, i.e., a bivector field.

The Jacobi identity expresses a sort of “closedness” of this bivector field. On a two-dimensional manifold, this closedness condition is automatically satisfied everywhere, so that every smooth bivector field on the plane gives a Poisson structure. This circumstance allows one to apply to the classification of Poisson structures in the plane the usual considerations of general position (transversality, etc.). In terms of coordinates x, y, a bivector field may be expressed in the form f(∂_x ∧ ∂_y), where f is a smooth function. The corresponding Poisson structure is defined by the condition

{x, y} = f(x, y). (1)

A Poisson structure on the plane may also be given by a differential 2-form dx ∧ dy/f. This form, like the bivector field, is invariantly connected with the Poisson structure; however, unlike the bivector field, it has pole singularities along the curve f = 0. The leaves in this case are the points of the curve f = 0 and the connected components of the complement of this curve in the plane. Points of the curve f = 0 are called singular points of the Poisson structure. In the neighborhood of a nonsingular point, any Poisson structure in the plane may be put into the normal form {x, y} = 1.
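
The statement that every function f defines a Poisson structure {x, y} = f on the plane can be spot-checked by exact polynomial arithmetic. In the sketch below (an illustration with arbitrarily chosen polynomials, not a proof), the bracket is taken to be {F, G} = f(F_x G_y − F_y G_x), and the cyclic sum in the Jacobi identity is computed for sample F, G, H. Polynomials in x, y are stored as dictionaries mapping exponent pairs to integer coefficients; all helper names are illustrative.

```python
def clean(p):
    return {k: v for k, v in p.items() if v != 0}

def padd(p, q):
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, 0) + v
    return clean(out)

def pneg(p):
    return {k: -v for k, v in p.items()}

def pmul(p, q):
    out = {}
    for (i, j), a in p.items():
        for (k, l), b in q.items():
            key = (i + k, j + l)
            out[key] = out.get(key, 0) + a * b
    return clean(out)

def pdiff(p, var):
    """Partial derivative with respect to x (var = 0) or y (var = 1)."""
    out = {}
    for (i, j), a in p.items():
        e = (i, j)[var]
        if e:
            key = (i - 1, j) if var == 0 else (i, j - 1)
            out[key] = out.get(key, 0) + e * a
    return clean(out)

def bracket(f, F, G):
    """{F, G} = f * (F_x G_y - F_y G_x)."""
    jac = padd(pmul(pdiff(F, 0), pdiff(G, 1)),
               pneg(pmul(pdiff(F, 1), pdiff(G, 0))))
    return pmul(f, jac)

def jacobi_sum(f, F, G, H):
    """{F, {G, H}} + {G, {H, F}} + {H, {F, G}}."""
    total = {}
    for a, b, c in ((F, G, H), (G, H, F), (H, F, G)):
        total = padd(total, bracket(f, a, bracket(f, b, c)))
    return total

x, y = {(1, 0): 1}, {(0, 1): 1}
f = {(2, 0): 1, (0, 2): 1}            # x^2 + y^2, vanishing at the origin
F = {(2, 1): 3, (0, 1): -1}           # 3 x^2 y - y
G = {(1, 1): 1, (3, 0): 2}            # x y + 2 x^3
H = {(0, 2): 1, (1, 0): 5, (0, 0): 7}  # y^2 + 5 x + 7

jacobi_is_zero = jacobi_sum(f, F, G, H) == {}
xy_bracket_is_f = bracket(f, x, y) == f
print(jacobi_is_zero, xy_bracket_is_f)
```

The same check with any other polynomial f gives zero as well, in agreement with the fact that the closedness condition is vacuous in dimension two.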

The following diagram shows the beginning of the hierarchy of singularities of Poisson structures on the plane in the neighborhood of a singular point.

Each letter in the diagram represents a Poisson structure which, in suitable local coordinates with origin at the singular point under consideration, can be written in the form {x, y} = f, where the function f is given by Table 1.

Table 1

Theorem. Given a Poisson structure on a two-dimensional manifold, it is either reducible in a neighborhood of each point to one of the normal forms in Table 1, or it belongs to a set of codimension 8 in the space of Poisson structures.

Thus, a generic Poisson structure may be reduced in a neighborhood of each point to the normal form {x, y} = 1 (nonsingular point) or {x, y} = y (point of type A₀). In a generic one-parameter family, one encounters for special values of the parameter structures of the type A₁ : {x, y} = b(x² ± y²), b ≠ 0; in two-parameter families one finds A₂, etc.

Remark 1. In the two-dimensional case, the set of all Poisson structures forms a linear space, so that one may speak of a generic structure or family of structures (having in mind a structure [family] belonging to some open dense subset of the space of structures [families]). The problem of classifying generic Poisson structures in three or more dimensions is not uniquely posed, since the set of all such structures does not form a single manifold (one may find components of “different dimensions,” as in the classification of Lie algebras).

Remark 2. The structure {x, y} = y of type A₀ is the standard Poisson structure on the dual space of the Lie algebra of the group of affine transformations of the line. This structure was considered in 1965, in connection with the study of the Euler equations for left-invariant metrics on groups (in this case—the Lobachevskii metric on a half-plane), at which time it was already realized that the structure is stable and is locally equivalent to any structure of the form {x, y} = y + ⋯, where the dots designate higher order terms. This (evident) observation contradicts the previously mentioned conjecture of A. Weinstein, according to which the possibility of removing any higher order terms by a formal change of coordinates was characteristic of the linear Poisson structures on the dual spaces of semisimple Lie algebras.

Remark 3. The parameters a, b in the table above are moduli (invariants depending continuously on the structure). More precisely, structures equivalent to a given one are found only a finite number of times as the parameters are varied.

The rational functions in Table 1 may be replaced by polynomials, but it is not very convenient to do so. The number of moduli in the numerator is one less than the number of irreducible components of the curve f = 0. This is not merely a coincidence. One invariant of a Poisson structure on the plane is the residue constructed from the form dx ⋀ dy/f (initially, one constructs a residue-form on each component, then its residue at the origin). The sum of the residues corresponding to all the components is zero. Therefore the number of moduli is 1 less than the number of components.

D Powers of volume forms

The classification of Poisson structures on the plane may be considered as the classification of differential forms of the type f(dx ⋀ dy)⁻¹, where f is a smooth (or holomorphic) function. More generally, it is natural to consider forms of the type

f(x)(dx₁ ⋀ ⋯ ⋀ dx_n)^α, (2)

where α is a fixed number, generally complex. The classification of such forms and their deformations in the one-dimensional case, recently carried out by V. P. Kostov, revealed the role of resonance values of α (certain negative rational numbers).

For example, the resonance case n = 1, α = −1 corresponds to the classification of the singularities and their bifurcations for vector fields on the line, i.e., singular points of differential equations ẋ = v(x) and their bifurcations in finite-parameter families. A generic one-parameter family may be reduced by a smooth (holomorphic) change of the parameter and a smooth (holomorphic) change of the variable x, depending smoothly (holomorphically) on the parameter, to the form

. (For k parameters, the corresponding form is

The nonresonance case was studied by S. Lando for all n and α: he showed that almost every versal deformation of the function f defines, after multiplication by (dx)^α, a versal deformation of the form, as long as α is not a resonance value.

The case α = −1, which is interesting in connection with Poisson structures, is generally a resonance case. Instead of powers of volume forms, as in (2), we may consider the differential forms

f^β dx₁ ⋀ ⋯ ⋀ dx_n, (3)

whose classification is obviously equivalent.

The hypersurface f = 0 is invariantly connected with the form (3). The classification therefore begins with the reduction to normal form of the singularity manifold f = 0. The beginning of the hierarchy of singular points of hypersurfaces is known. In suitable local coordinates, a hypersurface is given by one of the equations in the following list:

After we have brought the hypersurface into normal form, the classification of the forms (2) or (3) comes down to classifying forms of the type

f^β h dx₁ ⋀ ⋯ ⋀ dx_n, (4)

where f = 0 is the given equation of the singularity hypersurface and h is a smooth (holomorphic) function which remains to be put in normal form.

E The quasi-homogeneous case

We shall consider here the case in which the singularity hypersurface f = 0 is quasi-homogeneous (this condition holds for the cases A, D, E).

Definition. A function f is called quasi-homogeneous of weight p, with weights w_i attached to the variables x_i, if it is an eigenfunction with eigenvalue p for the quasi-homogeneous Euler vector field ε (or is zero):

ε = w₁x₁∂₁ + ⋯ + w_nx_n∂_n,  εf = pf  (∂_k = ∂/∂x_k).

A quasi-homogeneous polynomial is called nondegenerate if the critical point 0 has finite multiplicity (i.e., it is isolated over ℂ). From here on, we will take the weights w_i to be positive numbers.

Theorem. Let f be a nondegenerate quasi-homogeneous polynomial of weight 1. Then the differential form f^β h dx (where dx = dx₁ ⋀ ⋯ ⋀ dx_n and h is a holomorphic function on a neighborhood of 0) may be reduced by a biholomorphic coordinate change in a neighborhood of zero to the form f^β (1 + ϕ) dx, where ϕ is a quasi-homogeneous polynomial of weight −β − σ, σ = w₁ + ⋯ + w_n.

The weight of ϕ is chosen so that the weight of the form f^β ϕ dx is zero.

An analogous theorem is true for smooth h (and smooth coordinate changes), except that in the real case one must replace 1 + ϕ by ± 1 + ϕ.

Example 1. If β is positive, then ϕ ≡ 0, so that the complex form reduces to f^β dx.

More generally, ϕ ≡ 0 if the (possibly complex) number β is not a negative rational number: in this case, a nonzero quasi-homogeneous polynomial of weight −β − σ does not exist. If the polynomial f (or just its quasi-homogeneity type w) is fixed, then the resonance values of β form a finite set of arithmetic progressions in the negative rationals (for the remaining β, f^β h dx reduces to the form f^β dx).

Example 2. If β = −1, then the monomials occurring in ϕ may be enumerated by the interior integral points of the Newton diagram of f. The monomial

x^m = x₁^{m₁} ⋯ x_n^{m_n}

corresponds to the point (m₁ + 1,..., m_n + 1) of the diagram (i.e., the exponent of the form x^m dx).

Example 3. Suppose that β = −1, n = 3, and f is one of the A, D, E polynomials introduced above, defining a simple singularity. Calculating weights, we find that −β − σ < 0; therefore ϕ ≡ 0, from which we obtain:
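The weight calculation in Example 3 may be checked mechanically. The following is only a sketch: it assumes the standard three-variable normal forms A_k: x^(k+1) + y² + z², D_k: x²y + y^(k−1) + z², E₆: x³ + y⁴ + z², E₇: x³ + xy³ + z², E₈: x³ + y⁵ + z² (the list itself is not reproduced in the text here), and the function names are illustrative.

```python
from fractions import Fraction as F


def weight(monomial, w):
    """Weight of the monomial x^m for exponent tuple m and variable weights w."""
    return sum(F(m) * wi for m, wi in zip(monomial, w))


def ade_examples(k=5):
    """Return {name: (exponents of the monomials of f, weights)}.

    Assumed normal forms in (x, y, z); they are not taken from the text.
    """
    half = F(1, 2)
    return {
        f"A{k}": ([(k + 1, 0, 0), (0, 2, 0), (0, 0, 2)],
                  (F(1, k + 1), half, half)),
        f"D{k}": ([(2, 1, 0), (0, k - 1, 0), (0, 0, 2)],
                  (F(k - 2, 2 * (k - 1)), F(1, k - 1), half)),
        "E6": ([(3, 0, 0), (0, 4, 0), (0, 0, 2)], (F(1, 3), F(1, 4), half)),
        "E7": ([(3, 0, 0), (1, 3, 0), (0, 0, 2)], (F(1, 3), F(2, 9), half)),
        "E8": ([(3, 0, 0), (0, 5, 0), (0, 0, 2)], (F(1, 3), F(1, 5), half)),
    }


def phi_weight(w, beta=F(-1)):
    """Weight -beta - sigma required of the polynomial phi in the theorem."""
    return -beta - sum(w)


for name, (monomials, w) in ade_examples().items():
    # f has weight 1: every one of its monomials has weight 1.
    assert all(weight(m, w) == 1 for m in monomials)
    print(name, "sigma =", sum(w), "weight of phi =", phi_weight(w))
```

In each case σ exceeds 1, so the required weight of ϕ is negative and no nonzero polynomial ϕ is available.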

Corollary 1. The form with pole singularity

h(x, y, z) dx ⋀ dy ⋀ dz/f(x, y, z), h(0) ≠ 0,

where f is one of the polynomials A, D, E, may be reduced to the form dx ⋀ dy ⋀ dz/f by a holomorphic (smooth) change of coordinates.

In exactly the same way for any n ≥ 3, a factor h(x₁,..., x_n) which does not vanish at the origin can be converted to unity.

Corollary 2. A simple form (i.e., one not having moduli) of the type dx₁ ⋀ ⋯ ⋀ dx_n/f(x₁,..., x_n), where f is a holomorphic (smooth) function near the origin and n > 2, may be reduced by a coordinate change in a neighborhood of the origin to a normal form in which f is either 1 or one of the A, D, E polynomials.

Corollary 3. A simple (not having moduli) n-vector field in n-dimensional space (n > 2) is locally equivalent to a normal form f · (∂₁ ⋀ ⋯ ⋀ ∂_n), where f is either 1 or one of the A, D, E polynomials; ∂_k = ∂/∂x_k.

Corollary 4. For l ≤ 6, in generic l-parameter families of n-vector fields on n-dimensional space (n > 2), the field in a neighborhood of each point and for each value of the parameters is equivalent to one of the simple fields in the preceding corollary.

Corollary 5. For l ≤ 6, in generic l-parameter families of forms dx ⋀ dy ⋀ dz/f(x, y, z), one finds only forms which in the neighborhood of each point are locally equivalent to one of the following 24 types:

For n = 2 and β = −1, the theorem may be applied in the following way.

Corollary 6. Let f be a nondegenerate quasi-homogeneous polynomial of weight 1 with argument weights w₁, w₂. Then the form

h dx ⋀ dy/f, h(0) ≠ 0,

where h is a smooth (holomorphic) function in a neighborhood of 0, can be reduced by a suitable smooth (holomorphic) coordinate change to a form in which h = ± 1 + ϕ, where ϕ is a quasi-homogeneous polynomial of weight 1 − w₁ − w₂.

Correspondingly, bivector fields and Poisson structures may be locally reduced to the form

Calculating the weights of the simple singularity types A, D, E for functions of two variables, we obtain Table 1 from the last corollary. For example, for A₁ we have w₁ = w₂ = ½, the weight of ϕ equals 0, and so ϕ is constant.
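The admissible monomials of ϕ in Corollary 6 can be listed by a direct search over exponents. The sketch below assumes the two-variable normal forms A_k: x^(k+1) + y² (so that w₁ = 1/(k+1), w₂ = ½); the helper names are hypothetical.

```python
from fractions import Fraction as F


def monomials_of_weight(target, w1, w2):
    """All exponent pairs (i, j) >= 0 with i*w1 + j*w2 == target."""
    found = []
    if target < 0:
        return found
    i = 0
    while i * w1 <= target:
        rest = target - i * w1
        j = rest / w2            # exact arithmetic: everything is a Fraction
        if j.denominator == 1:
            found.append((i, int(j)))
        i += 1
    return found


def phi_monomials(w1, w2):
    """Monomials x^i y^j allowed in phi by Corollary 6 (weight 1 - w1 - w2)."""
    return monomials_of_weight(1 - w1 - w2, w1, w2)


print("A1:", phi_monomials(F(1, 2), F(1, 2)))
print("A2:", phi_monomials(F(1, 3), F(1, 2)))
print("A3:", phi_monomials(F(1, 4), F(1, 2)))
```

For A₁ the search returns only the exponent (0, 0), i.e., ϕ is constant, in agreement with the remark above; for A₂ it returns nothing, and for A₃ it returns the single monomial x.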

The dimension of the space of equivalence classes of forms h dx ⋀ dy/f, where h(0) ≠ 0 and f is a fixed nondegenerate quasi-homogeneous polynomial, equals the dimension of the space of quasi-homogeneous polynomials of weight 1 − σ, where σ = w₁ + w₂ (this is the weight of ϕ in Corollary 6).

F Varchenko’s theorem

A. N. Varchenko has proven a series of generalizations of the preceding theorem. Here we shall describe the simplest of these.

1. Let f be a quasi-homogeneous polynomial of weight 1 in the variables x₁,..., x_n with weights w₁,..., w_n. Suppose that, for some set I of multi-indices, the residue classes of the monomials x^m (m ∈ I) generate (as a vector space) the factor algebra of the algebra of formal power series by the ideal generated by the partial derivatives of f:

ℂ[[x₁,..., x_n]]/(∂f/∂x₁,..., ∂f/∂x_n).

Theorem. Every germ f^β h dx is equivalent to a germ of the form f^β (1 + ∑ λ_{m,l} x^m f^l) dx, where the l’s are nonnegative integers and the m’s are elements of I such that the weight of each form f^β x^m f^l dx is equal to zero.

2. We define the degree of non-quasi-homogeneity of the germ f to be the dimension of the factor space (f, ∂f/∂x₁,..., ∂f/∂x_n)/(∂f/∂x₁,..., ∂f/∂x_n).

Theorem. For almost all β, the number of moduli of the form f^βh dx₁ ⋀ ⋯ ⋀ dx_n (for fixed β and f and arbitrary h, h(0) ≠ 0) is equal to the degree of non-quasi-homogeneity of the germ f. The exceptional (resonance) values of β consist of a finite number of arithmetic progressions of negative rational numbers, with difference −1. In particular, for any β ≥ 0, the number of moduli equals the degree of non-quasi-homogeneity.

3. Example. For β = 0, we obtain:

Corollary. The number of moduli of the form h dx (h(0) ≠ 0), relative to the group of diffeomorphisms preserving the germ of f, equals the degree of non-quasi-homogeneity of f (equal to zero, if the germ of f is equivalent to a quasi-homogeneous one).

4. In the resonance cases, the result is more complicated.

Example. Let n = 2, β = −1 (Poisson structures in the plane).

Theorem. The number of moduli for a germ of a Poisson structure with given singular curve f = 0 equals the degree of non-quasi-homogeneity of the germ of f augmented by one less than the number of irreducible components of the germ of the curve f = 0.

In resonance cases, the number of moduli behaves in a rather regular way along each arithmetic progression with difference −1. Namely, when β decreases by 1 the number of moduli increases (not necessarily strictly), but its maximal value does not exceed (for any β > −n) the “nonresonant” value (i.e., the degree of non-quasi-homogeneity of f) by more than the number of Jordan blocks associated with the eigenvalue e^{2πiβ} of the monodromy operator of the function f.

G Poisson structures and period mappings

An interesting source of Poisson structures is provided by the period mappings of critical points of holomorphic functions (A. N. Varchenko and A. B. Givental’, Mapping of periods and intersection form, Funct. Anal. Appl. 16 (1982), 83–93).

Period mappings allow one to transfer to the base of a fibre bundle certain structures which live on the (co)homology spaces of the fibres. A Poisson structure on the base arises in this way from the intersection form in the middle-dimensional homology of the fibres, when this form is skew-symmetric.

Period mappings are defined by the following construction. Suppose that one is given a locally trivial fibration. Associated to such a fibration are the bundles (over the same base) of homology and cohomology of the fibres with complex coefficients. These bundles are not only locally trivial, but they are locally trivialized in a canonical way (the integer cycles in a fibre are uniquely identifiable with integer cycles in the nearby homology fibres). A period mapping is defined as a section of the cohomology bundle.

Suppose now that one is given, on the total space of a differentiable fibre bundle, a differential form which is closed on each fibre. The period mapping of this form associates to each point of the base the cohomology class of the form on the fibre over this point.

If one is given a vector field on the base of the fibration, then any (smooth) period mapping may be differentiated along this vector field, and the derivative is again a period mapping. In fact, neighboring fibres of the cohomology bundle are identified with one another by the above-mentioned “integer” local trivialization, so a section may be considered (locally) as a map into one fibre and may be differentiated as an ordinary (vector-valued) function.

Suppose now that the base is a complex manifold having the same complex dimension as the fibres of the cohomology bundle. A period mapping is called nondegenerate if its derivatives along any ℂ-independent vectors at each point are linearly independent. In other words, a period mapping is nondegenerate if the corresponding local maps from the base to typical fibres are diffeomorphisms.

The derivative of a nondegenerate period mapping thus allows us to map the tangent bundle of the base isomorphically onto the cohomology bundle. The dual isomorphism goes from the homology bundle to the cotangent bundle of the base. This isomorphism transfers to the base any additional structures carried by the homology groups.

Suppose that the fibres of our original bundle are (real) oriented even dimensional manifolds, and consider their homology in the middle dimension. In this case, the homology of each fibre carries a bilinear form: the index of intersection. This form is symmetric if the dimension of the fibre is a multiple of 4; otherwise, it is skew-symmetric. The form is nondegenerate if the fibre is closed (i.e., compact and without boundary); otherwise, it may be degenerate. We shall suppose below that we are in the situation where the form is skew-symmetric.

In this situation a nondegenerate period mapping induces a Poisson structure on the base. In fact, the isomorphism described above, between the cotangent spaces of the base and the homology groups of the fibres (carrying their skew-symmetric intersection forms), defines a skew-symmetric bilinear form on pairs of cotangent vectors. The Poisson bracket of two functions on the base is defined as the value of this form on the differentials of the functions.

This bracket defines a Poisson structure (of constant rank) on the base. This is obvious from the fact that the local identification of the base with the cohomology of the typical fibre, given by the period mapping, provides the base with local coordinates whose Poisson brackets are constant.¹²³

Varchenko and Givental’ observed that if one constructs, in the way just described, using a generic 1-form, a Poisson structure on the complement of the discriminant locus in the base of a versal deformation of a critical point of a function of two variables, then this structure may be holomorphically extended across the discriminant locus. (One may replace the discriminant locus above by the wave front of a typical singularity.) We shall limit ourselves here to the simplest examples of Poisson structures arising in this way.

Consider the three-dimensional space of polynomials ℂ³ = {x⁴ + λ₁x² + λ₂x + λ₃} with coordinates λ_k. The polynomials with multiple roots form therein the discriminant surface (a swallowtail; see Figure 247).

Figure 247

Poisson structure and the swallowtail

The Poisson structures arising from period mappings may be reduced (by diffeomorphisms preserving the swallowtail) to the following form: the symplectic leaves are the planes λ₂ = const., and their symplectic structures are of the form dλ₁ ⋀ dλ₃.

The fibration of interest here is formed by the complex curves {(x, y): y² = x⁴ + λ₁x² + λ₂x + λ₃}, and the period mapping is given by, for example, the form y dx. (See V. I. Arnold, A. N. Varchenko, S. M. Gusein-Zade, “Singularities of Differentiable Mappings,” Vol. 2: Monodromy and the Asymptotics of Integrals, Birkhäuser, 1988, §15, or Uspekhi Mat. Nauk 40, no. 5 (1985).)

The Poisson structures on the swallowtail space which arise from period mappings may be characterized locally among all generic structures by the following property: the line of self-intersections of the tail lies entirely in one symplectic leaf. The required genericity condition is that the tangent planes at the origin to the symplectic leaf and the swallowtail do not coincide. Every smooth function which is constant along the line of self-intersections of the tail, and whose derivative along the symplectic leaf at the origin is nonzero, may be reduced in a neighborhood of the origin, by a diffeomorphism preserving the tail, to the form λ₂ + const.; also, a family of holomorphic symplectic structures in the planes λ₂ = const. may be reduced to the form dλ₁ ⋀ dλ₃ by a holomorphic local diffeomorphism of three-dimensional space which preserves the swallowtail as well as the foliation by the planes.
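The statement that the line of self-intersections lies in one symplectic leaf can be illustrated in the normal form above, where the leaves are the planes λ₂ = const. The sketch assumes that the self-intersection line consists of the polynomials with two double roots, (x² − a)², and uses the standard discriminant formula for a quartic without cubic term; neither is spelled out in the text.

```python
def quartic_discriminant(l1, l2, l3):
    """Discriminant of x^4 + l1*x^2 + l2*x + l3 (quartic with no cubic term)."""
    return (16 * l1**4 * l3 - 4 * l1**3 * l2**2 - 128 * l1**2 * l3**2
            + 144 * l1 * l2**2 * l3 - 27 * l2**4 + 256 * l3**3)


def two_double_roots(a):
    """Coefficients (l1, l2, l3) of (x^2 - a)^2 = x^4 - 2a*x^2 + a^2."""
    return (-2 * a, 0, a * a)


for a in (1, 2, 5):
    l1, l2, l3 = two_double_roots(a)
    # The point lies on the discriminant surface and in the plane l2 = 0.
    assert quartic_discriminant(l1, l2, l3) == 0
    assert l2 == 0

# A polynomial with simple roots, x^4 - 1, is off the discriminant surface.
print(quartic_discriminant(0, 0, -1))
```

Every point of the curve λ₂ = 0, λ₃ = λ₁²/4 thus lies on the swallowtail and in the single leaf λ₂ = 0.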

One may conjecture more generally that those Poisson (in particular, symplectic) structures on the base of a versal deformation of a singularity, induced from the intersection form by an infinitesimally stable period mapping, may be characterized (up to diffeomorphisms preserving the bifurcation set) by a natural condition on the rank of the restriction of the Poisson structure to the strata of the discriminant locus. The “natural condition” in the three-dimensional example above is that the line of self-intersections of the swallowtail be contained in a symplectic leaf. In four-dimensional space, an analogous role would apparently be played by the condition that a certain submanifold be lagrangian, namely, the manifold of polynomials having two critical points with critical value zero in the symplectic space of polynomials x⁵ + λ₁x³ + λ₂x² + λ₃x + λ₄ (the ranks of the symplectic structure on the tangent spaces to the other strata may also be important).

¹²¹

Translator’s note: The term Clebsch variables is also used to refer to canonical coordinates on a symplectic manifold which projects onto (rather than embeds into) a Poisson manifold.

¹²²

Warning: As A. B. Givental’ has noted, Theorem 3.1 in this paper is incorrect. (Translator’s note: For further discussion, see A. Weinstein, Lie algebras and Poisson structures, Astérisque, hors série (1985), 257–271.)

¹²³

In the case where the intersection form is symmetric, the analogous construction defines on the base a flat pseudo-riemannian (possibly degenerate) metric.

Appendix 15: On elliptic coordinates

A system of Jacobi’s elliptic coordinates is associated to each ellipsoid in euclidean space. These coordinates make it possible to integrate the equations of geodesics on the given ellipsoid, as well as certain other equations, such as the equations of motion for a point on a sphere under the influence of a force with quadratic potential, or for a point on a paraboloid under the influence of a uniform gravitational field.

These facts suggest that, even on an infinite-dimensional Hilbert space, there should be a class of integrable systems associated to each symmetric operator. To study these systems, it is necessary to extend the theory of elliptic coordinates to the infinite-dimensional case. To do this, it is first necessary to express the finite-dimensional theory of confocal quadric surfaces in coordinate free form.

In the transition to the infinite-dimensional case, symmetric operators on finite-dimensional euclidean spaces must be replaced by self-adjoint operators on Hilbert spaces. Since the elliptic coordinates are not really connected with the operator itself, but rather with its resolvent, the unboundedness of the original operator (which might be, for example, a differential operator) does not present a serious obstacle.

In some cases, the elliptic coordinates on Hilbert space obtained from a self-adjoint operator form a countable sequence; however, when the operator has a continuous spectrum, the coordinates form a continuous family. In this case, the transformation from the original point of the Hilbert space (thought of as a function space) to the continuous family of elliptic coordinates of the point may be considered as a nonlinear mapping between function spaces. This mapping, by analogy with the Fourier transform, might be called the Jacobi transform: the original function is transformed into a function which expresses the elliptic coordinates in terms of some continuous “index.” (More precisely, the result of the transform is a measure on the spectral parameter axis.) The study of the functional analytic properties and the inversion of the Jacobi transform will probably be accomplished before too long.

Following an exposition of the general theory of elliptic coordinates, we shall describe below some of the applications of these coordinates to potential theory.

This appendix is based on the following papers by the author.

Some remarks on elliptic coordinates, Notes of the LOMI Seminar (volume dedicated to L. D. Faddeev on his 50th birthday), 133 (1984), 38–50.

Integrability of hamiltonian systems associated with quadrics (after J. Moser), Uspekhi 34, no. 5, 214.

Some algebro-geometrical aspects of the Newton attraction theory, Progress in Math. (I. R. Shafarevich volume), 36 (1983), 1–4.

Magnetic analogues of the theorem of Newton and Ivory, Uspekhi 38, no. 5 (1983), 145–146.

Further details on background material for the results in this appendix may be found in the following papers.

R. B. Melrose, Equivalence of glancing hypersurfaces, Invent. Math. 37 (1976), 165–191.

J. Moser, Various aspects of integrable Hamiltonian systems, in: J. Guckenheimer and S. E. Newhouse, eds. “Dynamical systems”, CIME Lectures, Bressanone, Italy, June 1978, Cambridge, Mass., Birkhäuser, Boston, 1980, pp. 233–289.

V. I. Arnold, Lagrangian manifolds with singularities, asymptotical of rays, and unfoldings of the swallowtail, Funct. Anal. Appl. 15 (1981).

V. I. Arnold, Singularities in variational calculus, J. Soviet Mathematics 27 (1984), 2679–2713.

A. B. Givental’, Polynomial electrostatic potentials (Seminar report, in Russian), Uspekhi Mat. Nauk 39, no. 5 (1984), 253–254.

V. I. Arnold, On the Newtonian potential of hyperbolic layers, Selecta Math. Sovietica 4 (1985), 103–106.

A. D. Vainshtein and B. Z. Shapiro, Higher-dimensional analogs of the theorem of Newton and Ivory, Funct. Anal. Appl. 19 (1985), 17–20.

A Elliptic coordinates and confocal quadrics

Elliptic coordinates in euclidean space are defined with the aid of confocal quadrics (surfaces of degree two). The geometry of these quadrics is obtained from the geometry of pencils of quadratic forms in euclidean space (i.e., from the theory of principal axes of ellipsoids or from the theory of small oscillations) by a passage to the dual space.

Definition 1. A euclidean pencil of quadrics (resp. quadratic forms) in a euclidean vector space V is a one-parameter family of surfaces of degree two

½(A_λx, x) = 1

(resp. forms A_λ), where

A_λ = A − λE,

and where A is a symmetric operator

A: V → V*, A* = A.

Definition 2. A confocal family of quadrics in a euclidean space W is a family of quadrics dual to the quadrics of a euclidean pencil in W*:

½(A_λ⁻¹x, x) = 1, x ∈ W.

Thus, quadrics which are confocal to one another form a one-parameter family, but the quadratic forms defining the family do not depend linearly on the parameter.

Example. The family of plane curves which are confocal to a given ellipse consists of all those ellipses and hyperbolas with the same foci. In Figure 248, the curves of a confocal family are shown on the left, and the curves of the corresponding euclidean pencil are shown on the right.

Figure 248

A confocal family and the corresponding euclidean pencil

The elliptic coordinates of a point are the values of the parameter λ for which the corresponding quadrics of a fixed confocal family pass through the point.

We fix an ellipsoid in euclidean space with all its axes of different lengths.

Theorem 1 (Jacobi). Through each point of an n-dimensional euclidean space there pass n quadrics confocal to a given ellipsoid. Smooth confocal quadrics intersect at right angles.

Proof. Each point other than 0 in our space corresponds to an affine hyperplane in the dual space, consisting of those linear functionals whose value is 1 at the given point. In terms of the dual space, Theorem 1 means that every hyperplane not passing through 0 in an n-dimensional euclidean space is tangent to precisely n of the quadrics in a euclidean pencil, and the vectors from 0 to the points of tangency are pairwise orthogonal (Figure 248, right).

The proof of the property of euclidean pencils just stated is based on the fact that the aforementioned vectors define the principal axes of the quadratic forms B = ½(Ax, x) − ½(l, x)², where (l, x) = 1 is the equation of the hyperplane.

As a matter of fact, on a principal axis of any quadratic form B, corresponding to the proper value λ, the form B − λE reduces to 0 along with its gradient. The vanishing of this form at the point of intersection of the principal axis and the hyperplane means that the point of intersection lies on the quadric ½(Ax, x) = 1, while the vanishing of the gradient means that the quadric and the hyperplane are tangent at the point. □
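Theorem 1 is easy to test numerically. The sketch below assumes that, in principal axes, the confocal family is written as x₁²/(a₁ − λ) + ⋯ + x_n²/(a_n − λ) = 1 with a₁ < ⋯ < a_n (a normalization chosen only for the illustration; another constant on the right rescales the quadrics without changing the count of roots or the orthogonality). The left side increases in λ on each interval between consecutive a_i, so plain bisection finds the n elliptic coordinates.

```python
def confocal_value(lam, x, a):
    """g(lam) = sum x_i^2/(a_i - lam) - 1; zero iff the quadric lam passes through x."""
    return sum(xi * xi / (ai - lam) for xi, ai in zip(x, a)) - 1.0


def bisect(f, lo, hi, steps=200):
    """Bisection for an increasing function f with f(lo) < 0 < f(hi)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def elliptic_coordinates(x, a, eps=1e-9):
    """Elliptic coordinates of x; assumes all coordinates of x are nonzero."""
    f = lambda lam: confocal_value(lam, x, a)
    r2 = sum(xi * xi for xi in x)
    # g < 0 at a_1 - |x|^2 - 1, and g tends to +infinity as lam -> a_1 from below.
    brackets = [(a[0] - r2 - 1.0, a[0] - eps)]
    # On (a_{i-1}, a_i) the function g increases from -infinity to +infinity.
    brackets += [(a[i - 1] + eps, a[i] - eps) for i in range(1, len(a))]
    return [bisect(f, lo, hi) for lo, hi in brackets]


def normal(lam, x, a):
    """Direction of the gradient of the quadric with parameter lam at x."""
    return [xi / (ai - lam) for xi, ai in zip(x, a)]


a = (1.0, 2.0, 4.0)
x = (0.3, -0.7, 1.1)
lams = elliptic_coordinates(x, a)
dots = [sum(u * v for u, v in zip(normal(l, x, a), normal(m, x, a)))
        for k, l in enumerate(lams) for m in lams[k + 1:]]
print(lams, dots)
```

The three values of λ interlace with a₁, a₂, a₃, and the pairwise scalar products of the normals vanish up to rounding error, as the theorem asserts.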

Theorem 2 (Chasles). Given a family of confocal quadrics in n-dimensional euclidean space, a line in general position is tangent to n − 1 different quadrics in the family, and the planes tangent to the quadrics at the points of tangency are pairwise orthogonal.

Proof. We project the quadrics in the confocal family along a pencil of parallel lines onto the hyperplane perpendicular to the pencil. Each quadric defines an apparent contour (the set of critical values of the projection of the quadric). For a projection whose direction is in general position, the apparent contour is a quadric (i.e., a surface of degree two) in the image hyperplane.

Here we need a lemma.

Lemma. The apparent contours of the quadrics in a confocal family form themselves a confocal family of quadrics.

Proof. On passage to the dual, sections become projections and vice versa. The apparent contours of the projections of confocal quadrics along a pencil of parallel lines are therefore dual to the sections of the dual quadrics by a hyperplane passing through the origin.

The sections of the quadrics in a euclidean pencil by a hyperplane through 0 form a euclidean pencil of quadrics in the hyperplane. The lemma now follows by duality. □

Returning to the proof of Theorem 2, we apply the lemma above to the projections along the line in the statement of the theorem. According to the lemma, the apparent contours of the projections of the confocal quadrics in Theorem 2 form a confocal family of quadrics in a hyperplane. By Theorem 1, n − 1 of these apparent contours pass through each point, where they intersect at right angles. This completes the proof of Theorem 2. □
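Theorem 2 can be checked in the same spirit for n = 3. The sketch again assumes the normalization x₁²/(a₁ − λ) + x₂²/(a₂ − λ) + x₃²/(a₃ − λ) = 1. The line p + tv is tangent to the quadric with parameter λ when the quadratic equation in t has a double root; clearing denominators turns this condition into a quadratic equation in λ, whose coefficients are computed below (the names are illustrative).

```python
import math


def tangent_parameters(p, v, a):
    """Parameters lam of the confocal quadrics tangent to the line p + t*v in R^3."""
    a1, a2, a3 = a
    # Squares of the 2x2 minors L_ij = p_i v_j - p_j v_i.
    l12 = (p[0] * v[1] - p[1] * v[0]) ** 2
    l13 = (p[0] * v[2] - p[2] * v[0]) ** 2
    l23 = (p[1] * v[2] - p[2] * v[1]) ** 2
    v1, v2, v3 = (c * c for c in v)
    # Tangency condition with denominators cleared: c2*lam^2 + c1*lam + c0 = 0.
    c2 = -(v1 + v2 + v3)
    c1 = -(l12 + l13 + l23) + v1 * (a2 + a3) + v2 * (a1 + a3) + v3 * (a1 + a2)
    c0 = (l12 * a3 + l13 * a2 + l23 * a1
          - v1 * a2 * a3 - v2 * a1 * a3 - v3 * a1 * a2)
    disc = c1 * c1 - 4.0 * c2 * c0
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-c1 - root) / (2.0 * c2), (-c1 + root) / (2.0 * c2)])


def tangency_normal(lam, p, v, a):
    """Normal to the quadric lam at its point of tangency with the line."""
    q_pv = sum(pi * vi / (ai - lam) for pi, vi, ai in zip(p, v, a))
    q_vv = sum(vi * vi / (ai - lam) for vi, ai in zip(v, a))
    t = -q_pv / q_vv                      # the double root in t
    x = [pi + t * vi for pi, vi in zip(p, v)]
    return [xi / (ai - lam) for xi, ai in zip(x, a)]


a = (1.0, 2.0, 4.0)
p = (0.3, -0.7, 1.1)
v = (1.0, 0.5, -0.2)
lams = tangent_parameters(p, v, a)
n1, n2 = (tangency_normal(lam, p, v, a) for lam in lams)
print(lams, sum(u * w for u, w in zip(n1, n2)))
```

For this line there are n − 1 = 2 tangent quadrics, and the normals at the two (different) points of tangency are orthogonal up to rounding error.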

Theorem 3 (Jacobi and Chasles). Given a geodesic on a quadric Q in n-dimensional space, there is a set of n − 2 quadrics confocal to Q such that all the tangent lines to the geodesic are also tangent to the quadrics in the set.

Proof (Beginning). We consider the manifold of oriented lines in euclidean space. This manifold has a natural symplectic structure as the manifold of characteristics in the hypersurface p² = 1 in the phase space of a free particle moving under its own inertia in our euclidean space.

(The characteristics on a hypersurface in a symplectic manifold are the integral curves of the field of characteristic directions, i.e., the field of directions which are skew-orthogonal to the tangent spaces of the hypersurface. In other words, the characteristics of the hypersurface are the phase curves for any hamiltonian flow whose hamiltonian function vanishes to first order on the hypersurface.

The symplectic structure on the manifold of characteristics on a hypersurface in a symplectic manifold is defined in such a way that the skew-scalar product of any two vectors tangent to the hypersurface is equal to the skew-scalar product of their projections in the manifold of characteristics.

Note, finally, that the notion of characteristics is equally well defined for any submanifold of a symplectic manifold on which the induced 2-form has constant nullity. The characteristics then have dimension equal to that nullity, and the manifold of characteristics still inherits a symplectic structure.) □

Lemma A. Each characteristic of the manifold of lines tangent to a given hypersurface in euclidean space consists of all the lines tangent to a single geodesic on the hypersurface.

Proof of Lemma A. For efficiency of expression, we will identify the cotangent vectors to euclidean space with tangent vectors by using the euclidean structure, so that our original phase space is represented as the space of vectors based at points of euclidean space (i.e., momenta are identified with velocities). The unit vectors tangent to the given hypersurface form a submanifold of odd codimension (equal to 3) in phase space. The characteristics of this submanifold define the geodesic flow on the hypersurface.

The map which assigns to each vector the line in which it lies takes the codimension 3 submanifold just described to the manifold of lines tangent to the hypersurface. Under this mapping, characteristics are transformed to characteristics (with respect to the symplectic structure on the space of lines). This proves the lemma. □

[Remark. The preceding argument may be easily extended to the following general situation, first considered by Melrose. Let Y and Z be a pair of hypersurfaces in a symplectic manifold X which intersect transversally along a submanifold W. We consider the manifolds of characteristics B and C of the hypersurfaces Y and Z together with the canonical quotient fibrations Y → B and Z → C; the manifolds B and C inherit symplectic structures from X.

In the intersection W, there is a distinguished hypersurface (of codimension 3 in X) consisting of points at which the restriction to W of the symplectic structure on X is degenerate. This hypersurface Σ in W may also be defined as the set of critical points of the composed mapping W ↪ Y → B (or W ↪ Z → C, if one wishes). These objects form the following commutative diagram:

The analogue to Lemma A in this situation is the assertion that the characteristics on the images of the mappings Σ → B and Σ → C are the images of one and the same curve on Σ (namely, the characteristics of Σ considered as a submanifold of the symplectic manifold X).

Lemma A itself is the special case of the assertion above in which X = ℝ²ⁿ (the phase space of a free particle in ℝⁿ), the hypersurface Y consists of the unit vectors (given by the condition p² = 1, i.e., a level surface of the hamiltonian for a free particle), and the hypersurface Z consists of those vectors which are based at the points of the given hypersurface in ℝⁿ. In this case, B is the manifold of all oriented lines in euclidean space, and Σ is the manifold of unit vectors tangent to the hypersurface. The mapping Σ → B assigns to each unit vector the line which contains it. The manifold C is the (co)tangent bundle of the given hypersurface. Σ → C is the embedding into this bundle of its unit sphere bundle (in other words, the embedding of a level surface of the kinetic energy, i.e., the hamiltonian for motion constrained to the hypersurface).

It is always useful to keep the diagram above in mind when one is dealing with constraints in symplectic geometry.]

Proof of Theorem 3 (Middle). We suppose given a smooth function on euclidean (configuration) space whose restriction to a certain line has a nondegenerate critical point. In this situation, the function will also have a critical point when restricted to each nearby line; i.e., on each nearby line, there will be a nearby point where the line is tangent to a level surface of the function. The value of the function at the critical point is thus a function (defined locally) on the space of lines. We call this function of lines the induced line function (from the original point function). □

Lemma B. If two point functions in euclidean space are such that the tangent planes to their level surfaces are orthogonal at the points where a given line is tangent to these surfaces (these points being in general different for the two functions), then the Poisson bracket of the induced line functions is zero at the given line (considered as a point in the space of lines).

Proof of Lemma B. We calculate the derivative of the second induced line function along the phase flow whose hamiltonian is the first induced function. The phase curves for the first induced function, which lie on its level surfaces, are the characteristics of those surfaces. A level surface for the first induced function consists of those lines which are tangent to a single level surface of the first point function. Each characteristic of this surface, according to Lemma A, consists of the lines which are tangent to a single geodesic on the level surface of the first point function.

For an infinitesimally small displacement of a point on a geodesic in a surface, the tangent line to the geodesic rotates (up to infinitesimal quantities of higher order) in the plane spanned by the original tangent and the normal to the surface. By hypothesis, the tangent plane to the level surface of the second function at the point where this surface is tangent to our line is perpendicular to the tangent plane of the level surface of the first function. Therefore, under the above-mentioned infinitesimally small rotation, the line remains tangent to the same level surface of the second function (up to infinitesimals of higher order). It follows that the rate of change of the second induced function under the action of the phase flow given by the first is zero at the element in question of the space of lines, which proves Lemma B. □

Proof of Theorem 3 (End). We fix a line in general position in ℝⁿ. According to Theorem 2, this line is tangent to n − 1 quadrics in the confocal family, at n − 1 points. We construct in the neighborhood of each of these points a smooth function, without critical points, whose level surfaces are the quadrics of our confocal family.

We fix one of these quadrics (the “first”) and consider the hamiltonian system on the space of lines whose hamiltonian function is the first induced line function. Each of its phase curves on a fixed level surface of the hamiltonian function consists of the tangent lines to one geodesic of that quadric (Lemma A). The remaining induced functions have zero Poisson bracket with the hamiltonian, by Lemma B (since the planes tangent to the confocal surfaces at the points where they touch one line are orthogonal, by Theorem 2).

Thus all the induced functions are first integrals for the hamiltonian system generated by any one of them. Since the lines tangent to a geodesic on the first quadric form a phase curve of the first system, all the induced functions take constant values on this curve. That proves Theorem 3, as well as the following result. □

Theorem 4. The geodesic flow on a central surface of degree 2 in euclidean space is a completely integrable system in the sense of Liouville (i.e., it has as many independent integrals in involution as it has degrees of freedom).

Remark. Strictly speaking, we proved Theorem 3 only for lines in general position, but the result extends by continuity to the exceptional cases (in particular, to asymptotic lines of our quadrics). In the same way, Theorem 4 was initially proved just for quadrics with unequal principal axes, but passage to a limit extends the result to more symmetric quadrics of revolution (as well as to noncentral “paraboloids”).

B Magnetic analogues of the theorems of Newton and Ivory

Elliptic coordinates make it possible to extend Newton’s well-known theorem on the gravitational attraction of a sphere to the case of attraction by an ellipsoid.

Definition. A homeoidal density on the surface of an ellipsoid E is the density of a layer between E and an infinitely nearby ellipsoid which is homothetic to E (with the same center).

The following is a well-known result.

Ivory’s Theorem. A finite mass, distributed on the surface of an ellipsoid with homeoidal density, does not attract any internal point; it attracts every external point the same way as if the mass were distributed with homeoidal density on the surface of a smaller confocal ellipsoid.

The attraction in Ivory’s theorem is defined by the law of Newton or Coulomb: in n-dimensional space, the force is proportional to r¹⁻ⁿ (as prescribed by the fundamental solution of Laplace’s equation).
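Both parts of the statement can be tested numerically in the plane (n = 2, force proportional to 1/r). The following is a minimal sketch under stated assumptions: the ellipse is parametrized as (a cos t, b sin t), and the homeoidal density is taken to be uniform in the parameter t, which follows from the area element εab dt of the layer between E and (1 + ε)E. The function name and the sample points are hypothetical.

```python
import math

def field_of_homeoidal_ellipse(a, b, px, py, n=4000):
    """Planar field (force law 1/r) at the point (px, py) produced by a unit
    mass spread over the ellipse (a cos t, b sin t) uniformly in t.
    The integral over t is approximated by the periodic trapezoid rule."""
    fx = fy = 0.0
    for k in range(n):
        t = 2.0 * math.pi * k / n
        dx, dy = px - a * math.cos(t), py - b * math.sin(t)
        r2 = dx * dx + dy * dy
        fx += dx / r2   # component of (P - Q)/|P - Q|^2
        fy += dy / r2
    return fx / n, fy / n

# Internal point of the ellipse with semi-axes 2 and 1.
print(field_of_homeoidal_ellipse(2.0, 1.0, 0.3, 0.2))
# External point, seen from two confocal ellipses (a^2 - b^2 = 3 for both).
print(field_of_homeoidal_ellipse(2.0, 1.0, 3.0, 1.5))
print(field_of_homeoidal_ellipse(math.sqrt(3.25), 0.5, 3.0, 1.5))
```

The first field should vanish up to rounding, and the last two should agree with each other, since both ellipses carry the same total mass.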

Newton’s theorem on the (non)attraction of an internal point carries over to the case of a hyperbolic homeoidal layer and to the case of an attracting mass distributed on a level hypersurface of a hyperbolic polynomial of any degree. (A polynomial of degree m, f(x₁, ..., x_n), is called hyperbolic if its restriction to any line through the origin has all its roots real.)

A homeoidal charge density on the zero hypersurface f = 0 of a hyperbolic polynomial is defined as the density of a homogeneous infinitesimally thin layer between the hypersurfaces f = 0 and f = ε → 0 (the signs of the charges being chosen so that successive ovaloids have opposite charges).

[A homeoidal charge does not attract the origin (nor any other point within the innermost ovaloid), and this property is preserved if the charge density is multiplied by any polynomial of degree at most m − 2.

Generalization: If a homeoidal charge density is multiplied by any polynomial of degree m − 2 + r, then the potential inside the innermost ovaloid is a harmonic polynomial of degree r (A. B. Givental’, 1983).]

When one attempts to find a version for hyperboloids of Ivory’s theorem on the attraction of confocal ellipsoids, it turns out that an essential role is played by the topology of the hyperboloids. When passing to hyperboloids of different signatures, one must consider, instead of homeoidal densities, harmonic forms of different degrees, and instead of the Newton or Coulomb potential, the corresponding generalized forms-potentials given by the Biot-Savart law.

In the simplest nontrivial case of a hyperboloid of one sheet in three-dimensional euclidean space, the result is as follows.

The hyperboloid divides space into two parts: “internal” and “external,” the latter being nonsimply connected. We consider elliptic coordinate curves from the system whose level surfaces are the quadrics confocal to the given hyperboloid.

The elliptic coordinate curves on our hyperboloid, which are obtained by intersecting with the confocal ellipsoids (closed lines of curvature on the hyperboloid), are called the parallels of the hyperboloid. The orthogonal curves, obtained by intersection with the two-sheeted hyperboloids, are called the meridians.

Although the elliptic coordinate system has singularities (on each symmetry plane of the quadrics in the family), the hyperboloid is smoothly fibred by the parallels (diffeomorphic to the circle) and meridians (diffeomorphic to the line).

The region inside the hyperboloidal tube is also smoothly fibred by meridians (orthogonal to the ellipsoids in the confocal family), while the annular region outside the hyperboloid is smoothly fibred by parallels (orthogonal to the hyperboloids of two sheets).

Theorem. A current with a suitable density, flowing along the meridians of a hyperboloid, produces a magnetic field which is zero inside the hyperboloidal tube, while the field in the annular exterior region is directed along the parallels. A current with a suitable density, flowing along the parallels of a hyperboloid, produces a magnetic field which is zero in the exterior annular region, while the field inside the hyperboloidal tube is directed along the meridians. (See Figure 249.)

Figure 249

Magnetic fields generalizing the theorems of Newton and Ivory

The current densities giving rise to such magnetic fields, which generalize the homeoidal charge densities on ellipsoids, may be described in the following way. There are associated to each family of confocal quadrics in three-dimensional euclidean space two “focal curves”: an ellipse and a hyperbola. (See Figure 250.) The focal ellipse is the boundary of the limiting ellipsoid of the family in which the shortest axis shrinks to zero; the focal hyperbola arises in a similar way from the hyperboloids of one or two sheets.

Figure 250

Focal ellipse and focal hyperbola
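The two focal curves are linked by a classical reciprocity: the foci of each are vertices of the other. The sketch below illustrates this under an assumed normalization of the family, x²/(a − λ) + y²/(b − λ) + z²/(c − λ) = 1 with a > b > c, so that the focal ellipse lies in the plane z = 0 (λ → c) and the focal hyperbola in the plane y = 0 (λ → b). The dictionary keys and function names are hypothetical.

```python
import math

def focal_curves(a, b, c):
    """Semi-axes of the focal conics of the assumed family
    x^2/(a-l) + y^2/(b-l) + z^2/(c-l) = 1 with a > b > c."""
    ellipse = {"plane": "z = 0",                 # x^2/(a-c) + y^2/(b-c) = 1
               "semi_x": math.sqrt(a - c), "semi_y": math.sqrt(b - c)}
    hyperbola = {"plane": "y = 0",               # x^2/(a-b) - z^2/(b-c) = 1
                 "semi_x": math.sqrt(a - b), "semi_z": math.sqrt(b - c)}
    return ellipse, hyperbola

def ellipse_focus(e):
    """Distance from the center to a focus of the focal ellipse."""
    return math.sqrt(e["semi_x"] ** 2 - e["semi_y"] ** 2)

def hyperbola_focus(h):
    """Distance from the center to a focus of the focal hyperbola."""
    return math.sqrt(h["semi_x"] ** 2 + h["semi_z"] ** 2)

ellipse, hyperbola = focal_curves(9.0, 5.0, 1.0)
print(ellipse_focus(ellipse), hyperbola["semi_x"])    # focus of ellipse vs vertex of hyperbola
print(hyperbola_focus(hyperbola), ellipse["semi_x"])  # focus of hyperbola vs vertex of ellipse
```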

We define a homeoidal density on a focal ellipse in the following way. To begin we consider any nonplanar parallel, defined as the nonplanar intersection of an ellipsoid with a hyperboloid of one sheet. A homeoidal density on this parallel is defined as the density on an infinitesimally thin “wire,” obtained by intersecting the layer between the given ellipsoid and a homothetic one infinitesimally nearby with the layer between the given hyperboloid and a homothetic one infinitesimally close by, both homotheties being taken with respect to the center of the confocal family. We normalize this homeoidal density on the parallel in such a way that the mass of the entire parallel is equal to 1.

Now we consider the focal ellipse as a limit of nonplanar parallels. It turns out that the normalized homeoidal densities on the parallels have a well-defined limit as the parallels approach the focal ellipse. This limiting density is called the homeoidal density on the focal ellipse.

The homeoidal density on a focal hyperbola is defined in an analogous way.

We may now describe the current densities referred to as “suitable” in the theorem above on magnetic fields. The surface of a hyperboloid of one sheet is fibred over the focal ellipse (the fibre over a point is the meridian which lies on the same hyperboloid of two sheets as that point).

The flux of the meridional current suitable for the theorem, through any curve on the hyperboloid, equals the integral of the homeoidal density form on the focal ellipse over the projection of that curve onto the focal ellipse (along the hyperboloids of two sheets).

The density of the flow along the parallels is induced in an analogous way from the homeoidal density on the focal hyperbola.

Remark. The magnetic field of the parallel flow with the indicated density, inside the hyperboloidal tube, coincides outside each confocal ellipsoid (up to sign) with the newtonian or coulombian field produced by a charge which is distributed with homeoidal density on that ellipsoid.¹²⁴

In exactly the same way, the magnetic field in the annular domain outside the hyperboloid of one sheet coincides (up to sign), in the region between the sheets of each confocal hyperboloid of two sheets, with the coulombian field produced by two equal charges with opposite signs distributed on the two sheets of the hyperboloid with homeoidal density (O. P. Shcherbak).

The results formulated above have recently been extended by B. Z. Shapiro and A. D. Vainshtein to hyperboloids in euclidean spaces of any number of dimensions. For a hyperboloid in ℝⁿ, diffeomorphic to S^k × ℝ^l, a harmonic k-form is constructed on the exterior region (diffeomorphic to the product of S^k with a half-space) and a harmonic l-form is constructed on the interior.

The corresponding homeoidal densities are defined on the focal ellipsoid with codimension k and the focal hyperboloid of two sheets with codimension l by the same limiting procedure that we described above for k = l = 1, using the intersections of layers between infinitesimally close and homothetic quadrics.

Noncomputational proofs of these geometric theorems are unknown, even for the special case of magnetic fields in three-dimensional space.

Remark. The presence of distinguished harmonic forms on hyperboloids and in their complementary domains suggests that one might try to find filtrations, analogous to those arising in the theory of mixed Hodge structures, in spaces of differential forms on noncompact (and possibly even singular) algebraic and semialgebraic real manifolds.

¹²⁴ This is actually the density with which a charge will distribute itself on the surface of a conducting ellipsoid.

Appendix 16: Singularities of ray systems

The simplest example of a ray system is the system of normals to a surface in euclidean space.

In a neighborhood of a smooth surface, its normals form a smooth fibration, but at some distance from the surface various normals begin to intersect one another (Figure 251). The complicated figures which are thereby formed were already investigated by Archimedes, but their full details were not revealed until the discovery in 1972 of the relation between singularities of ray systems and the theory of groups generated by reflections.

Figure 251

A caustic as the envelope of rays

This relation, for which there is no evident a priori reason (and which is as surprising as, say, the relation between the problems of tangents and areas), has turned out to be a powerful instrument for the study of critical points of functions. By 1978, it had become clear that the theory of reflection groups also governs the singularities of the Huygens evolvents.

Huygens (1654) discovered that the evolvent of a plane curve has a cusp singularity at each point where it meets the curve (Figure 252). Evolvents of plane curves and their higher-dimensional generalizations are wave fronts on manifolds with boundary. Singularities of wave fronts, like those of ray systems, are classified in terms of reflection groups.

Figure 252

An evolvent of a curve

While rays and fronts on manifolds without boundary are related to the Weyl groups in the A, D, and E series, singularities of evolvents are described by the groups of types B, C, and F (the ones with double connections in their Dynkin diagrams).

The remaining reflection groups (I₂(p), H₃, H₄) continued for some time to have no visible relation to the theory of singularities. This situation changed in the fall of 1982 when it was discovered that the symmetry group H₃ of the icosahedron governs the singularities of evolvent systems in the neighborhood of inflection points of plane curves.

The appearance of the icosahedron at an inflection point of a curve looks as mystical as the icosahedron in Kepler’s law of planetary distances. But the presence of the icosahedron here is not an accident: upon the investigation in 1984 of more complicated systems of rays and fronts, the remaining group H₄ appeared.

We shall give in this appendix a brief description of the theory of singularities of ray systems. Further details may be found in the following references:

V. I. Arnold, Singularities of ray systems, Russian Math. Surveys 38 (1983).

V. I. Arnold, Singularities in variational calculus, J. Soviet Math. 27 (1984), 2679–2713.

O. V. Lyashko, Classification of critical points of functions on a manifold with singular boundary, Funct. Anal. Appl. 17 (1983), 187–193.

O. P. Shcherbak, Singularities of families of evolvents in the neighborhood of an inflection point of the curve, and the group H₃, generated by reflections, Funct. Anal. Appl. 17 (1983), 301–303.

A. N. Varchenko and S. V. Chmutov, Finite irreducible groups, generated by reflections, are monodromy groups of suitable singularities, Funct. Anal. Appl. 18 (1984), 171–183.

V. I. Arnold, Singularities of solutions of variational problems (Seminar report, in Russian), Uspekhi Mat. Nauk 39, no. 5 (1984), 256.

O. P. Shcherbak, Wave fronts and reflection groups, Russian Math. Surveys 43, no. 3 (1988).

Itogi Nauki i Tekhniki, Sovremennye Problemy matematiki, Noveishie dostijenia, Moscow, VINITI, vol. 33 (1988). English translation: J. Sov. Math. 27 (1984).

Many of the results which we will describe concern such simple geometric objects that it is surprising that they were not already known in classical times. For instance, the local classification of projections of generic surfaces in three-dimensional space was not discovered until 1981. The number of equivalence classes of germs of projections turned out to be finite—namely 14: neighborhoods of points on generic surfaces can have that many different appearances when viewed from different points in space.

A Symplectic manifolds and ray systems

The space of oriented lines in euclidean space may be identified with the (co)tangent bundle of the sphere (Figure 253), and it thereby obtains a symplectic structure.

Figure 253

The space of oriented lines in euclidean space
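The identification can be made concrete: an oriented line is sent to its unit direction v (a point of the sphere) together with the point q of the line nearest the origin, which is orthogonal to v and hence tangent to the sphere at v. The sketch below illustrates this; the function name is hypothetical, and the euclidean structure is used to identify tangent and cotangent vectors.

```python
import math

def line_to_tangent_vector(point, direction):
    """Send the oriented line through `point` with direction `direction`
    to the pair (v, q): v is the unit direction, q is the point of the line
    nearest the origin (so q is orthogonal to v)."""
    norm = math.sqrt(sum(d * d for d in direction))
    v = tuple(d / norm for d in direction)
    s = sum(p * vi for p, vi in zip(point, v))        # component of point along v
    q = tuple(p - s * vi for p, vi in zip(point, v))  # remove that component
    return v, q

# Two different points of the same line give the same pair (v, q).
v1, q1 = line_to_tangent_vector((1.0, 2.0, 3.0), (0.0, 3.0, 4.0))
v2, q2 = line_to_tangent_vector((1.0, 5.0, 7.0), (0.0, 3.0, 4.0))
print(v1, q1)
print(v2, q2)
```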

More generally, we consider any hypersurface in a symplectic manifold. The skew-orthogonal complement to its tangent space at each point is called the characteristic direction. The integral curves of the field of characteristic directions on a hypersurface are called characteristics. The manifold of characteristics inherits a symplectic structure from the original manifold.

In particular, the manifold of extremals of a general variational problem carries a symplectic structure.

We consider the space of binary forms (homogeneous polynomials in two variables) of a particular odd degree. The group of linear transformations of the plane acts on this even dimensional linear space. Up to multiplication by a constant, there is a unique nondegenerate skew-symmetric form on this space which is invariant under the action of the group SL(2) of linear transformations with determinant equal to 1. This form gives a natural symplectic structure on the manifold of binary forms of each odd degree.
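For illustration, one explicit candidate for this invariant form is the classical pairing ω(f, g) = Σ (−1)^i c_i d_(n−i) / C(n, i), where c_i and d_i are the coefficients of x^(n−i) y^i in f and g. This formula is an assumption brought in from classical invariant theory, not a formula stated in the text. The sketch below checks, in exact arithmetic, that it is skew-symmetric for n = 3 and unchanged by a sample substitution of determinant 1; all names are hypothetical.

```python
from fractions import Fraction
from math import comb

def conv(p, q):
    """Product of two binary forms given by coefficient lists
    (entry i is the coefficient of x^(deg - i) * y^i)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += Fraction(a) * Fraction(b)
    return r

def substitute(c, g):
    """The form c evaluated at (alpha*x + beta*y, gamma*x + delta*y)."""
    (alpha, beta), (gamma, delta) = g
    n = len(c) - 1
    out = [Fraction(0)] * (n + 1)
    for i, ci in enumerate(c):
        term = [Fraction(1)]
        for _ in range(n - i):          # factor (alpha*x + beta*y)^(n - i)
            term = conv(term, [alpha, beta])
        for _ in range(i):              # factor (gamma*x + delta*y)^i
            term = conv(term, [gamma, delta])
        for j, tj in enumerate(term):
            out[j] += ci * tj
    return out

def omega(c, d):
    """Candidate invariant pairing: sum of (-1)^i * c_i * d_(n-i) / C(n, i)."""
    n = len(c) - 1
    return sum(Fraction((-1) ** i * c[i] * d[n - i], comb(n, i))
               for i in range(n + 1))

f = [1, 2, -1, 3]          # x^3 + 2x^2 y - x y^2 + 3y^3
h = [0, 1, 4, -2]          # x^2 y + 4x y^2 - 2y^3
g = ((2, 3), (1, 2))       # determinant 2*2 - 3*1 = 1
print(omega(f, h), omega(substitute(f, g), substitute(h, g)))
```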

The binary forms in x and y for which the coefficient of x^(2k+1) is unity form a hypersurface in the space of all forms. The manifold of characteristics of this hypersurface is naturally identified with the manifold of monic polynomials of even degree x^(2k) + ⋯ in x. We have thereby defined a natural symplectic structure on this space of polynomials.

The one-parameter group of translations along the x-axis preserves the symplectic structure just introduced. The hamiltonian function for this group is a quadratic polynomial (found already by Hilbert (1893)). The manifold of characteristics for any level surface of this hamiltonian function may be identified with the manifold of monic polynomials of degree 2k − 1 in x for which the sum of the roots is zero. Thus we have a natural symplectic structure on this space of polynomials.

B Submanifolds of symplectic manifolds

The restriction of a symplectic structure to a submanifold is a closed 2-form, but it is not necessarily nondegenerate. For submanifolds in euclidean space there is, in addition to the intrinsic geometry, an extensive theory of extrinsic curvatures. In symplectic geometry, the situation is simpler:

Theorem (A. B. Givental’, 1981). The restriction of the symplectic form to a germ of a submanifold in a symplectic manifold determines the germ up to a symplectic diffeomorphism of the ambient manifold.

An intermediate theorem, in which one uses the values of the symplectic form at all vectors based on the submanifold, not just those tangent to it, was proved earlier by A. Weinstein (1971). Unlike Weinstein’s theorem, Givental’s theorem makes it possible to classify generic submanifold germs in symplectic manifolds: it is sufficient to use the classification of degenerate symplectic structures obtained by J. Martinet (1970) and his successors.

Examples. 1. A generic two-dimensional surface in symplectic space is symplectically diffeomorphic in a neighborhood of each point with the surface

(in Darboux coordinates). 2. On four-dimensional submanifolds, one finds stable curves of elliptic and hyperbolic Martinet singular points with normal forms

[The ellipticity or hyperbolicity of a singular point is determined by the nature of the dynamical system invariantly attached to the submanifold. The divergence-free vector fields in three-dimensional space which arise have entire curves of singular points. The classification of singular lines turns out to be less pathological than the classification of singular points (which is almost as difficult as all of celestial mechanics).]

This concludes a description of the first steps in the theory of symplectic singularities on smooth manifolds.

C Lagrangian submanifolds in the theory of ray systems

We recall that a lagrangian submanifold is a submanifold of symplectic space on which the symplectic structure pulls back to zero and which has the highest possible dimension consistent with this property (equal to half the dimension of the ambient manifold).

Examples. 1. Each fibre of a cotangent bundle is lagrangian. 2. The manifold of all oriented normals to a smooth submanifold (of any dimension) in euclidean space is a lagrangian submanifold of the space of lines. 3. The manifold of all polynomials x^(2m) + ⋯ divisible by x^m is lagrangian.

A lagrangian fibration is a fibration all of whose fibres are lagrangian.

Examples. 1. The cotangent fibration is lagrangian. 2. The Gauss fibration from the space of lines in euclidean space to the unit sphere of directions is lagrangian.

All lagrangian fibrations of a fixed dimension are locally (on a neighborhood of a point in the total space) symplectically diffeomorphic.

A lagrangian mapping is the projection of a lagrangian submanifold to the base of a lagrangian fibration, i.e., a triple V → E → B, where the first arrow is an immersion onto a lagrangian manifold and the second arrow is a lagrangian fibration.

Examples. 1. A gradient mapping q ↦ ∂S/∂q is lagrangian. 2. The normal mapping which maps each normal vector of a submanifold in euclidean space to its tip is lagrangian. 3. The Gauss mapping which takes each point of a transversely oriented hypersurface in euclidean space to the unit vector at the origin in the direction of the normal is lagrangian. (The corresponding lagrangian manifold consists of the normals themselves.)

An equivalence of lagrangian mappings is a fibre-preserving symplectic diffeomorphism of the total spaces of the fibrations which takes the first lagrangian manifold to the second.

The set of critical values of a lagrangian mapping is called a caustic. The caustics of equivalent mappings are diffeomorphic.

Example. The caustic of the normal mapping of a surface is the envelope of the family of normals, i.e., the focal surface (surface of centers of curvature).
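In the plane the same construction gives the evolute of a curve. The sketch below works it out for the parabola y = x²/2, an illustrative choice not taken from the text. The normal at (x, x²/2) is spanned by the (deliberately unnormalized) vector (−x, 1), the normal mapping sends (x, t) to the tip of t(−x, 1), and its critical values are found from the vanishing of the Jacobian.

```python
from fractions import Fraction as F

def normal_map(x, t):
    """Tip of the normal vector t*(-x, 1) attached at the point (x, x^2/2)."""
    return (x - t * x, x * x / 2 + t)

def jacobian(x, t):
    """Determinant of the derivative of normal_map with respect to (x, t).
    The columns are (1 - t, x) and (-x, 1), so the determinant is 1 - t + x^2."""
    return 1 - t + x * x

def caustic_point(x):
    """Critical value of the normal map on the normal through (x, x^2/2):
    the Jacobian vanishes at t = 1 + x^2."""
    return normal_map(x, 1 + x * x)

x = F(1, 2)
X, Y = caustic_point(x)
print(X, Y)                          # equals (-x^3, 1 + 3x^2/2)
print(27 * X**2 - 8 * (Y - 1)**3)    # semicubical parabola 27 X^2 = 8 (Y - 1)^3
```

The caustic is a semicubical parabola with its cusp at (0, 1), the center of curvature of the parabola at its vertex.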

Every lagrangian mapping is locally equivalent to a gradient (or normal, or Gauss) mapping. The singularities of generic gradient (or normal, or Gauss) mappings are the same as those for arbitrary generic lagrangian mappings. The simplest of these are classified by the reflection groups A_k, D_k, E₆, E₇, E₈ (see Appendix 12).

Example. We consider a medium of dust particles moving inertially, with their initial velocities forming a potential field. After time t, the particle at x moves to x + t(∂S/∂x). We thereby obtain a one-parameter family of smooth mappings ℝ³ → ℝ³.

These mappings are lagrangian. In fact, a potential field of velocities gives a lagrangian section of the cotangent bundle. The phase flow of Newton’s equations preserves the lagrangian property. For large t, though, our lagrangian manifold is no longer a section: its projection on the base develops singularities. The caustics of the corresponding lagrangian mappings are places where the density of particles has become infinite.¹²⁵ According to Ya. B. Zel’dovich (1970) an analogous model (taking into account gravity and the expansion of the universe) describes the formation of large scale nonhomogeneities in the distribution of matter in the universe.
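A one-dimensional toy version shows the mechanism. The sketch below assumes the potential S(x) = cos x, an illustrative choice, so that the particle starting at x reaches x − t sin x at time t. The mapping is one-to-one for t < 1; at t = 1 its derivative first vanishes (at x = 0), and for t > 1 several particles occupy the same point. The grid size and function names are hypothetical.

```python
import math

def position(x, t):
    """Position at time t of the particle that started at x with velocity
    S'(x) = -sin x (assumed potential S(x) = cos x)."""
    return x - t * math.sin(x)

def jacobian(x, t):
    """Derivative of the mapping x -> position(x, t)."""
    return 1.0 - t * math.cos(x)

def density(x, t, rho0=1.0):
    """Density carried by the particle that started at x (mass conservation)."""
    return rho0 / abs(jacobian(x, t))

def count_preimages(y, t, n=20000):
    """Number of particles from [-pi, pi] found at the point y at time t,
    counted as sign changes of position(x, t) - y on a midpoint grid."""
    h = 2.0 * math.pi / n
    count = 0
    prev = position(-math.pi + 0.5 * h, t) - y < 0
    for i in range(1, n):
        cur = position(-math.pi + (i + 0.5) * h, t) - y < 0
        if cur != prev:
            count += 1
        prev = cur
    return count

print(count_preimages(0.0, 0.5), count_preimages(0.0, 2.0))
print(density(0.0, 0.5), density(0.0, 0.999))
```

Before the caustic forms, each point is reached by one particle; afterwards the point y = 0 is reached by three, and the density at x = 0 grows without bound as t approaches 1.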

According to the theory of Lagrange singularities, the newborn caustics have the form of elliptic saucers (Figure 254) (after time t from the moment of birth, a saucer has length of order t^(1/2), depth of order t, and thickness of order t^(3/2)). The birth of a saucer corresponds to A₃. The metamorphoses of caustics which occur in generic one-parameter families of lagrangian mappings are shown in Figure 255 (V. I. Arnold, Wave fronts evolution and equivariant Morse lemma, Comm. Pure Appl. Math. 6 (1976), 319–335).

Figure 255

Perestroikas of caustics in 3-space

Theorem (1972). The germs at each point of generic lagrangian mappings between manifolds of dimension ≤ 5 are simple (i.e., having no moduli) and stable. The simple stable germs of lagrangian mappings are classified by the reflection groups A, D, E, in a way which will be explained below.

D Contact geometry and systems of rays and wave fronts

We recall that a contact structure on an odd-dimensional smooth manifold is a nondegenerate field of tangent hyperplanes. The specific condition of nondegeneracy is inessential here, since near generic points, all generic hyperplane fields on manifolds of a fixed odd dimension are diffeomorphic (Darboux’s theorem for contact structures, Appendix 4).

Examples. 1. The manifold of contact elements of a smooth manifold consists of all its tangent hyperplanes. The rate of change of a contact element belongs to the contact structure if and only if the rate of change of the point of contact (i.e., the point where the hyperplane is tangent to the manifold) belongs to the contact element itself. 2. The manifold of 1-jets of functions y = f(x) has a contact structure dy = p dx (p = ∂f/∂x for the 1-jet of a function f).

The extrinsic geometry of a submanifold of contact space is locally determined by the intrinsic geometry (Givental’s theorem on contact structures).

Integral submanifolds of a contact structure are called Legendre (or legendrian) submanifolds if they have the largest possible dimension.

Examples. 1. The set of all contact elements tangent to a fixed submanifold (of any dimension) is a Legendre submanifold. 2. In particular, all contact elements at a given point form a Legendre submanifold (a fibre of the bundle of contact elements). 3. The set of all the 1-jets of a single function is a Legendre submanifold in the space of 1-jets.

A fibration is called a Legendre fibration if its fibres are Legendre submanifolds.

Examples. 1. The projective cotangent fibration (attaching each contact element to its point of contact) is Legendre. 2. The fibration of 1-jets of functions over the 0-jets (forgetting the derivative) is Legendre.

All Legendre fibrations of a fixed dimension are locally contact diffeomorphic (in a neighborhood of a point in the total space of the fibration).

The projection of a Legendre submanifold on the base of a Legendre fibration is called a Legendre mapping. The image of a Legendre mapping is called its front.

Examples. 1. The Legendre transformation: A hypersurface in projective space may be lifted to the space of contact elements of projective space as a Legendre submanifold. The manifold of contact elements of projective space is also fibred over the dual projective space. (The fibration assigns to each contact element the plane containing it.) This is a Legendre fibration. The projection of the lifted Legendre submanifold maps it onto the hypersurface which is projectively dual to the original one. Thus, the projective dual of a smooth hypersurface is the front of a Legendre mapping. 2. Frontal mappings: Laying out a segment of length t on each normal to a hypersurface in euclidean space, we obtain a Legendre mapping whose front is equidistant from the given hypersurface.
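For a plane curve, the frontal mapping of the second example can be examined directly. If the segment of length t is laid out along the inward unit normal, the Frenet formulas give the velocity of the equidistant as (1 − tκ) times the velocity of the original curve, where κ is the curvature; the front therefore has a cusp wherever 1 − tκ changes sign. The sketch below counts these sign changes for an ellipse. The ellipse, the grid size, and the function names are illustrative assumptions.

```python
import math

def curvature(a, b, theta):
    """Curvature of the ellipse (a cos theta, b sin theta)."""
    return a * b / (a * a * math.sin(theta) ** 2
                    + b * b * math.cos(theta) ** 2) ** 1.5

def count_cusps(a, b, t, n=3600):
    """Number of cusps of the inward equidistant at distance t, counted as
    sign changes of 1 - t*kappa around the closed curve (midpoint grid)."""
    signs = [1.0 - t * curvature(a, b, 2.0 * math.pi * (k + 0.5) / n) > 0
             for k in range(n)]
    # signs[k - 1] with k = 0 wraps around, closing the curve.
    return sum(signs[k] != signs[k - 1] for k in range(n))

# Curvature radii of the ellipse with semi-axes 2 and 1 range from 0.5 to 4.
print(count_cusps(2.0, 1.0, 0.3), count_cusps(2.0, 1.0, 1.0), count_cusps(2.0, 1.0, 5.0))
```

The front is smooth while t is smaller than the least radius of curvature, acquires four cusps for intermediate t, and is smooth again once t exceeds the greatest radius of curvature.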

Every Legendre mapping is locally equivalent to a Legendre transformation, as well as to a frontal mapping. The theory of Legendre singularities thus coincides exactly with the theory of singularities of Legendre transformations and of frontal mappings. Equivalence, stability, and simplicity of Legendre mappings are defined just as in the lagrangian case.

Theorem (1973). The germs, at all points, of generic Legendre mappings between manifolds of dimension ≤ 5 are simple and stable. The simple and stable germs of Legendre mappings are classified by the groups A, D, E: their fronts are locally diffeomorphic (in the complex domain) to the manifolds of non-regular orbits of the corresponding reflection groups.

Example. The only singularities of a typical wave front in three-dimensional space are (semicubic) cuspidal curves (A₂) and “swallowtails” (A₃, Figure 256; near such a point, the front is diffeomorphic to the surface formed by the polynomials with multiple roots in the space of polynomials x⁴ + ax² + bx + c). Of course, there may also be transverse intersections of branches of fronts of the types just described.
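The swallowtail surface admits an explicit parametrization obtained by solving p(u) = p′(u) = 0 for b and c, where p(x) = x⁴ + ax² + bx + c and u is the multiple root. The sketch below, with hypothetical function names, verifies this parametrization in exact arithmetic and locates the cuspidal edge as the curve of polynomials with a triple root (p″(u) = 0, that is, a = −6u²).

```python
from fractions import Fraction as F

def swallowtail_point(a, u):
    """Coefficients (a, b, c) for which x^4 + a x^2 + b x + c
    has a multiple root at x = u."""
    b = -4 * u**3 - 2 * a * u     # from p'(u) = 0
    c = 3 * u**4 + a * u**2       # from p(u) = 0 after substituting b
    return a, b, c

def p(coeffs, x):
    a, b, c = coeffs
    return x**4 + a * x**2 + b * x + c

def dp(coeffs, x):
    a, b, _ = coeffs
    return 4 * x**3 + 2 * a * x + b

def d2p(coeffs, x):
    a, _, _ = coeffs
    return 12 * x**2 + 2 * a

def cuspidal_edge_point(u):
    """Point of the swallowtail whose polynomial has a triple root at u."""
    return swallowtail_point(-6 * u**2, u)

pt = swallowtail_point(F(-3), F(1, 2))
print(pt, p(pt, F(1, 2)), dp(pt, F(1, 2)))
print(cuspidal_edge_point(F(1, 2)))   # (x - u)^3 (x + 3u) with u = 1/2
```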

Figure 256

Singularities of wave fronts

Remark. The real forms of simple singularities of fronts may also be described in terms of reflection groups. E. Looijenga has shown that the real components in the complement of a simple germ of a front may be identified with the conjugacy classes of involutions (elements of order 2) in the normalizer of the reflection group, conjugacy being taken with respect to the reflection group itself. (See E. Looijenga, The discriminant of a real simple singularity, Compositio Math. 37 (1978), 51–62.)

E Applications of contact geometry to symplectic geometry

All lagrangian singularities may be obtained from Legendre singularities, if one realizes the latter by projections of Legendre submanifolds of the space of 1-jets of functions onto the space of 0-jets. If one forgets the value of each function, the space of 1-jets is projected onto phase space (i.e., the cotangent bundle); a Legendre submanifold in the first space projects to a lagrangian submanifold in the second. In particular, the caustic of a lagrangian mapping is the image of the cuspidal edge of the front of a Legendre mapping under a projection with one-dimensional fibres.

Theorem (O. V. Lyashko, 1979). All holomorphic vector fields transverse to the front of a simple singularity are locally equivalent under holomorphic diffeomorphisms preserving the front.

Example. A generic vector field in the neighborhood of the most singular point of a swallowtail {x⁴ + ax² + bx + c = (x + d)²...} is equivalent, by a holomorphic diffeomorphism preserving the swallowtail, to the normal form ∂/∂c (Figure 257).

Figure 257

The normal form of a vector field at the swallowtail

The reduction of various objects to normal form, by a diffeomorphism preserving a wave front or caustic, is a basic technique for studying the geometry of systems of rays and fronts. For instance, the study of the metamorphoses of moving wave fronts is based on the following result, which is “dual” to the previous one.

Theorem (1976). All generic holomorphic functions equal to zero at the most singular point of a simple singularity of a front are locally equivalent under holomorphic diffeomorphisms which preserve the front.

Example. In a neighborhood of the most singular point of a swallowtail, a generic function may be reduced, by a diffeomorphism preserving the swallowtail, to the normal form a.

This theorem is a special case of the equivariant Morse lemma. It is applied in the following way. The instantaneous wave fronts together form a “large front” in space-time. “Time” is a function on space-time. We reduce this function to normal form by a diffeomorphism which preserves the front, and we thereby obtain a normal form for the metamorphoses of the instantaneous fronts. The metamorphoses of fronts in ℝ³ are shown in Figure 258. The problem of describing the metamorphoses of caustics in generic one-parameter families (Figure 255) is solved in exactly the same way. In this case, the time function is reduced to normal form by a transformation of space-time which preserves the “large caustic.” If the dimension of space-time is no larger than 4, then all the singularities of the large caustic are of types A and D.

Figure 258

Perestroikas of wave fronts

The caustics of lagrangian singularities in the A series differ from the wave fronts in the A series only by a shift of 1 unit in the index. The same is therefore true for their metamorphoses.

The caustics in the D series are not the same as the fronts. The normal forms for a generic time function in the neighborhood of a caustic singularity of type D were found by V. M. Zakalyukin (1975). The topological normal forms for the time function are especially simple:

Here, the large caustic D_μ is the set of λ for which

has a degenerate critical point, where

The reduction to normal form of the germ of the time function is accomplished by a local homeomorphism of the space ℝ^(μ−1) (ℂ^(μ−1)), which preserves the large caustic and which is smooth everywhere except at 0 (V. I. Bakhtin, 1984).

J. Nye (1984) has noticed that not all metamorphoses of caustics and fronts may be realized by the motion of a front under an equation of eikonal (or Hamilton-Jacobi) type. For example, the caustic of a ray system cannot have the form of “lips” with two cusps (although this is possible for lagrangian caustics). The point is that the inclusion of a lagrangian or Legendre manifold in the hypersurface given by a Hamilton-Jacobi or eikonal equation imposes topological restrictions on the coexistence, and thus on the metamorphoses, of singularities, even though the individual singularities may be realized on hypersurfaces. This is precisely the case when the level surface of the hamiltonian is locally nondegenerately convex in the momentum variables.

The vector fields generating the diffeomorphisms preserving a front are those which are tangent to it. The study of these vector fields leads to an unusual “convolution” operation on the invariants of a reflection group. To a pair of invariants (functions on the orbit space) we associate a new invariant—the scalar product of the gradients of the functions (pulled back from the orbit space to the original euclidean space).

The linearization of this operation defines a symmetric bilinear mapping from each cotangent space of the orbit space into itself.
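The convolution operation described above can be computed by hand for a small group. The sketch below takes the symmetry group of the square (type B₂) acting on the plane, with basic invariants u = x² + y² and v = x²y²; this group and these generators are an illustrative choice, not an example from the text. The scalar products of the gradients are again invariants, and the determinant of the resulting table vanishes exactly on the mirrors x = 0, y = 0, x = ±y.

```python
from fractions import Fraction as F

def invariants(x, y):
    """Basic invariants of the symmetry group of the square (assumed choice)."""
    return x * x + y * y, x * x * y * y

def grad_u(x, y):
    return (2 * x, 2 * y)

def grad_v(x, y):
    return (2 * x * y * y, 2 * x * x * y)

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]

def convolution_table(x, y):
    """Scalar products of the gradients of the invariants at the point (x, y)."""
    gu, gv = grad_u(x, y), grad_v(x, y)
    return dot(gu, gu), dot(gu, gv), dot(gv, gv)

x, y = F(2, 3), F(-1, 5)
u, v = invariants(x, y)
uu, uv, vv = convolution_table(x, y)
print(uu == 4 * u, uv == 8 * v, vv == 4 * u * v)
print(uu * vv - uv * uv == 16 * v * (u * u - 4 * v))
```

In terms of the invariants, the table reads (u, u) ↦ 4u, (u, v) ↦ 8v, (v, v) ↦ 4uv, and its determinant 16v(u² − 4v) equals 16x²y²(x² − y²)².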

Theorem (1979). The linearized convolution of invariants of a reflection group is isomorphic as a bilinear operation to the operation on the local algebra of the corresponding singularity given by the formula (p, q) ↦ S(p · q), where S = D + (2/h)E, D is Euler’s quasi-homogeneous derivation, and h is the Coxeter number.

In 1981, A. N. Varchenko and A. B. Givental’ (who also proved the theorem above for the exceptional groups) found a far-reaching generalization of this result. They replaced the euclidean structure by the intersection form of the underlying period mapping, which arises from a family of holomorphic differential forms on the fibres of the Milnor fibration of a versal family of functions. A nondegenerate intersection form defines (depending on the parity of the number of variables) either a locally flat pseudo-euclidean metric with a standard singularity on the Legendre front or a symplectic structure which extends holomorphically to the front.

Example. The space of monic polynomials with odd degree and sum of the roots equal to zero acquires yet another symplectic structure. Relative to this structure, the submanifold of polynomials with the maximal number of double roots turns out to be lagrangian.

When the intersection form is indefinite, the symplectic structure is replaced by a Poisson structure (see Appendix 14).

F Tangential singularities

The first applications of the theory of lagrangian and Legendre singularities, around which the theory itself developed (∼ 1966), concerned short wave asymptotics in the form of the asymptotics of oscillatory integrals. A survey of these applications (including the determination of uniform estimates for oscillatory integrals when saddle points meet, the calculation of asymptotics using Newton polyhedra, the construction of mixed Hodge structures, applications to number theory and the theory of convex polyhedra, and estimates of the index of singular points of vector fields and the number of singular points of algebraic surfaces) may be found in the book:

V. I. Arnold, A. N. Varchenko, and S. M. Gusein-Zade, “Singularities of Differentiable Mappings,” Vol. 2, Monodromy and Asymptotics of Integrals, Moscow, Nauka, 1984. English translation: Birkhäuser, 1988.
and in the paper

V. I. Arnold, Singularities of ray systems, Proceedings of the International Congress of Mathematicians, August 16–24, 1983, Warsaw.

Here we shall present other applications of the theory of lagrangian and Legendre singularities to the study of the configurations of projective manifolds and tangential planes of various dimensions. One is led to such problems from variational problems with one-sided constraints (such as the obstacle problem), as well as from the study of Nekhoroshev’s exponent of roughness for unperturbed hamiltonian functions (see Appendix 8).

We consider a generic surface in three-dimensional projective space (Figure 259). The curve of parabolic points (p) divides the surface into a domain of elliptic points (e) and a domain of hyperbolic points (h); the latter domain contains the curve of inflection points of the asymptotic lines (f), with its points of biinflection (b), self-intersection (c), and tangency to the parabolic curve (t).

Figure 259

Projective classification of points of a surface

From this classification of points, one may derive both estimates of curvature exponents and the following classification of projections.

Theorem (O. A. Platonova and O. P. Shcherbak, 1981). Every projection from a point outside a generic surface in ℝP³ is locally equivalent at each point of the surface to the projection along lines parallel to the x-axis of a surface z = f(x, y), where f is one of the following 14 functions:

By a projection we mean here a diagram V → E → B consisting of an embedding and a fibration; an equivalence of projections is then a 3 × 2 commutative diagram whose vertical arrows are diffeomorphisms.

The only singularities of the projection from a generic center are folds and Whitney tucks. The tucks appear when the projection is along an asymptotic direction. The remaining singularities are visible only from special points. The finiteness of the number of singularities of projections (and therefore the number of singularities of apparent contours) was not obvious before the result above was obtained, since there is a continuum of inequivalent singularities for generic three-parameter families of mappings from a surface to the plane.

The regions of space from which the generic surface has a different appearance, as well as the corresponding views of the surface, are shown in Figure 260 (for the most complicated cases).

Figure 260

The perestroikas of the visible contours of surfaces

The hierarchy of tangential singularities becomes more comprehensible when it is reformulated in terms of symplectic and contact geometry. R. Melrose (1976) observed that the rays tangent to a surface are described by a pair of hypersurfaces in symplectic phase space: one of them, p² = 1, is defined by the metric; the other is defined by the surface.

A significant part of the geometry of asymptotic lines may be reformulated in terms of this pair of hypersurfaces. In this way, we may transfer concepts from the geometry of surfaces to the more general case of arbitrary pairs of hypersurfaces in symplectic space, and thereby use the geometric intuition gained from surface theory to study general variational problems with one-sided phase constraints.

Let Y and Z be hypersurfaces in the symplectic space X which intersect transversely along a submanifold W. Projecting Y and Z onto their manifolds of characteristics, we obtain the hexagonal diagram

in which Σ is the common manifold of critical points for the projections of W on U and V.

Example. Let X be the {q, p} phase space for a free particle in euclidean space (q is the position of the particle, p its momentum). Y is the manifold of unit vectors (p² = 1). Z is the manifold of vectors at the boundary (q belongs to a hypersurface Γ). Then U is the manifold of rays, V is the tangent bundle of the boundary Γ, W is the manifold of unit vectors at the boundary, and Σ is the unit tangent bundle of the boundary.

If a unit tangent vector to the boundary is not asymptotic, then both of the projections W → U and W → V have fold singularities at this point. Each of them defines an involution on W which fixes Σ.

Example. There are two involutions, σ and τ, on the manifold of tangent vectors along a convex plane curve W (Figure 261). Their product is Birkhoff’s billiard mapping (1927).

Figure 261

The two involutions generating the billiard mapping
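The two involutions can be written out explicitly for the simplest convex curve. The sketch below assumes that the boundary is the unit circle, a choice made only for illustration; the function names `sigma`, `tau`, and `billiard` are ours. Here σ exchanges the two boundary points lying on the same oriented line (the fibre of W → U), τ reflects the unit vector in the tangent line at its point of application (the fibre of W → V), and their product is the billiard mapping.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def sigma(q, v):
    """Move q to the other point where the line q + t*v meets the unit circle."""
    d = dot(q, v)
    return ((q[0] - 2 * d * v[0], q[1] - 2 * d * v[1]), v)

def tau(q, v):
    """Reflect v in the tangent line at q; on the unit circle the normal is q."""
    d = dot(q, v)
    return (q, (v[0] - 2 * d * q[0], v[1] - 2 * d * q[1]))

def billiard(q, v):
    """Product of the two involutions: travel along the chord, then reflect."""
    return tau(*sigma(q, v))

def close(x, y, eps=1e-12):
    """Compare two (point, vector) pairs up to rounding error."""
    return all(abs(a - b) < eps for p, r in zip(x, y) for a, b in zip(p, r))

# A unit vector applied at a boundary point (the angles are arbitrary).
q0 = (math.cos(0.3), math.sin(0.3))
v0 = (math.cos(2.5), math.sin(2.5))
q1, v1 = billiard(q0, v0)
```

Both maps fix exactly the vectors with q · v = 0, that is, the unit tangent vectors forming Σ. For the circle the composite leaves q · v unchanged, which is the familiar conservation of the angle of incidence in a circular billiard.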

Using pairs of involutions, Melrose found a local normal form for pairs of hypersurfaces in symplectic space which are in the situation just described. (This was for the C^∞ case; in the analytic case, one usually obtains divergent series, just as in the theory of Ecalle (1975) and Voronin (1981) on resonant dynamical systems.)

For more complicated singularities (for example, near asymptotic directions), pairs of hypersurfaces have moduli. For the two simplest singularity types after the fold, it is possible to put in normal form (at least formally) the pair consisting of the first hypersurface and its intersection with the second. This allows us to study, in a neighborhood of an asymptotic or biasymptotic unit tangent vector to the boundary, the mapping which assigns the ray containing it to each unit vector at the boundary. The critical values of this mapping in the symplectic space of lines are described by the following result, since the manifold of tangent rays is locally diffeomorphic near a biasymptotic ray to the product of a swallowtail and a line.

Theorem (1981). All the generic symplectic structures in the neighborhood of a point in the direct product of a swallowtail and a linear space are formally diffeomorphic by local diffeomorphisms preserving the product structure.

G The obstacle problem

We consider an obstacle bounded by a smooth surface in euclidean space. The obstacle problem consists of the study of the singularities of the function defined outside the obstacle whose value at each point is the length of the shortest path remaining outside the obstacle and joining the point to a fixed initial set. This variational problem on a manifold with boundary is unsolved even in three-dimensional space.

Each minimizing path consists of segments of straight lines and segments of geodesics on the surface of the obstacle (Figure 262). We consider therefore a system of geodesics on the surface of the obstacle, orthogonal to a fixed front. The system of all rays tangent to these geodesics forms a lagrangian variety in the symplectic space of lines, just as any system of extremals for a variational problem. But while in an ordinary variational problem this lagrangian variety is a smooth manifold (even at caustics), the lagrangian variety arising in the obstacle problem has singularities. From the last theorem (in the previous section), one obtains:

Figure 262

An extremal of the obstacle problem

Corollary (1981). The lagrangian variety of rays in a generic obstacle problem has a semicubic cuspidal edge along each asymptotic ray and a singularity diffeomorphic to an open swallowtail at each biasymptotic ray.

The open swallowtail is the surface in the four-dimensional space of monic polynomials x⁵ + Ax³ + Bx² + Cx + D formed by the polynomials with triple roots. Differentiation of the polynomials turns the open swallowtail into an ordinary one; when the swallowtail is opened, the cuspidal edge is retained, but the self-intersection disappears (Figure 263).

Figure 263

The open (“unfurled”) swallowtail
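The definition of the open swallowtail can be checked on a two-parameter family. The sketch below assumes the parametrization (x − a)³(x² + 3ax + b), where the coefficient 3a is chosen so that the x⁴ term vanishes; this parametrization and the helper names are ours. With exact rational arithmetic one sees that such a polynomial has the form x⁵ + Ax³ + Bx² + Cx + D and that its derivative has a double root at a, so that differentiation lands on the ordinary swallowtail.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_diff(p):
    """Derivative of a polynomial in the same representation."""
    return [k * c for k, c in enumerate(p)][1:]

def poly_eval(p, x):
    """Horner evaluation."""
    value = 0
    for c in reversed(p):
        value = value * x + c
    return value

def triple_root_quintic(a, b):
    """Coefficients of (x - a)^3 (x^2 + 3 a x + b); the x^4 term cancels."""
    lin = [-a, 1]
    cube = poly_mul(poly_mul(lin, lin), lin)
    return poly_mul(cube, [b, 3 * a, 1])

# One point of the surface: triple root at x = 2.
p = triple_root_quintic(Fraction(2), Fraction(-1))
dp = poly_diff(p)    # vanishes at 2 together with its own derivative
ddp = poly_diff(dp)
```

The parameters (a, b) sweep out a surface in the four-dimensional space of coefficients (A, B, C, D), in agreement with the dimension count in the text.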

Theorem (1981). In the generic motion of a wave front, the cuspidal edges of the instantaneous fronts sweep out an open swallowtail in four-dimensional space-time (over the usual swallowtail caustic).

Theorem (O. P. Shcherbak, 1982). Consider a generic one-parameter family of space curves and suppose that, for some value of the parameter (time), one of the curves has a point of double flatness (of type 1, 2, 5). Then the projective duals of these curves form a surface in space-time which is locally diffeomorphic to the open swallowtail.

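The open swallowtail is the first member of a whole series of singularities. Consider, in the space of monic polynomials xⁿ + λ₁ xⁿ⁻¹ + ⋯ + λ_{n−1}, the set of polynomials with a root of fixed comultiplicity k, that is, polynomials of the form (x − α)^{n−k}(x^k + ⋯). Differentiation of polynomials preserves the comultiplicity of roots: a root of multiplicity n − k of a polynomial of degree n is a root of multiplicity n − k − 1 of its derivative, which has degree n − 1.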

Theorem (A. B. Givental’, 1981). The sequence of sets of polynomials of fixed comultiplicity becomes stabilized as the degree grows, beginning with degree n = 2k + 1 (i.e., when the self-intersections are eliminated).

Example. The open swallowtail is the first stable variety over the ordinary swallowtail.

The appearance of swallowtails in the obstacle problem was axiomatized by Givental’ (1982) in his theory of triads.

Definition. A symplectic triad (H, L, l) consists of a smooth hypersurface H in a symplectic manifold and a lagrangian submanifold L which is tangent to H to first order along a hypersurface l of L.

The lagrangian variety generated by the triad is the image of L in the manifold of characteristics of the hypersurface H.

Example 1. Consider, in the problem of bypassing an obstacle with boundary Γ ⊂ ℝⁿ, the distance along geodesics from an initial front as a function s: Γ → ℝ. The manifold L consisting of all extensions of the 1-form ds from Γ to ℝⁿ, together with the hypersurface H: p² = 1, forms a triad. The lagrangian variety generated by this triad is precisely the variety of rays tangent to the geodesics in our system of extremals on Γ.

Example 2. In the symplectic manifold of monic polynomials

with even degree d = 2m, the polynomials divisible by x^m form a lagrangian submanifold L.

Consider the hamiltonian for translation along the x-axis. [This polynomial in λ is equal to

The hypersurface h = 0 is tangent to the lagrangian submanifold L along the subspace l of polynomials divisible by x^{m+1}, thus forming a triad. The lagrangian variety generated by this triad is an open swallowtail of dimension m − 1 (the set of polynomials x^{d−1} + a₁ x^{d−3} + ⋯ + a_{d−2} having a root of multiplicity greater than half the degree).]

Theorem (A. B. Givental’, 1982). The triads in Example 2 are stable. Every germ of a generic triad is diffeomorphic to a germ of a triad in Example 2.

Corollary. The variety of rays tangent to the geodesics in the system of extremals of a generic obstacle problem is locally symplectically diffeomorphic to a lagrangian open swallowtail.

In contact geometry, there are two kinds of Legendre varieties associated to obstacle problems: varieties of contact elements of fronts and varieties of 1-jets of time functions. The first of these are diffeomorphic to lagrangian open swallowtails; the second are diffeomorphic to cylinders over the first.

Example. Consider the problem of bypassing an obstacle in the plane which is bounded by a curve with an inflection point. The fronts, which are the evolvents of the curve, have two kinds of singularities: ordinary cusps (of order 3/2) on the curve itself and singularities of order 5/2 on the tangent line through the inflection point (Figure 264). Over points of the boundary curve, the Legendre variety is nonsingular, while over points on the tangent line through the inflection point it has a cuspidal edge of order 3/2.

Figure 264

The evolvents of a cubical parabola
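The location of the two kinds of singular points of the evolvents can be read off from a short computation. The sketch below takes the cubical parabola γ(t) = (t, t³) and the evolvent E(t) = γ(t) + (c − s(t))T(t), where s is the arc length measured from the inflection point, T is the unit tangent, and c is the length of the unwound string; the function names and the use of Simpson's rule for s are our assumptions. The velocity of the evolvent is dE/dt = (c − s)κ|γ′|N, so it vanishes exactly where s = c (points on the curve itself) and where the curvature κ vanishes (the point of the front on the tangent line through the inflection point).

```python
import math

def speed(t):
    """|gamma'(t)| for the cubical parabola gamma(t) = (t, t**3)."""
    return math.sqrt(1.0 + 9.0 * t ** 4)

def arc_length(t, n=400):
    """Signed arc length from 0 to t by composite Simpson's rule (n even)."""
    h = t / n
    total = speed(0.0) + speed(t)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * speed(i * h)
    return total * h / 3.0

def involute(t, c):
    """Point of the evolvent obtained by unwinding a string of length c."""
    v = speed(t)
    tx, ty = 1.0 / v, 3.0 * t ** 2 / v       # unit tangent T(t)
    r = c - arc_length(t)                    # string length still unwound
    return (t + r * tx, t ** 3 + r * ty)

def involute_velocity(t, c):
    """Closed form dE/dt = (c - s) * kappa * |gamma'| * N."""
    v = speed(t)
    kappa = 6.0 * t / v ** 3                 # signed curvature of y = x**3
    nx, ny = -3.0 * t ** 2 / v, 1.0 / v      # unit normal N(t)
    f = (c - arc_length(t)) * kappa * v
    return (f * nx, f * ny)
```

For c = 1 the evolvent passes through (1, 0), a point of the x-axis, which is the tangent line at the inflection point, and its velocity vanishes there because κ(0) = 0.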

Theorem (1978). In the space of contact elements to the plane, fibered over the plane itself, the surface consisting of the contact elements of the evolvents of a generic curve near a point of inflection is locally equivalent by a fiber-preserving diffeomorphism to the surface consisting of all polynomials with multiple roots in the space of polynomials x³ + ax² + bx + c, fibered into lines parallel to the b-axis.

This surface (Figure 265), together with the surface c = 0 representing the contact elements along the boundary curve, forms a variety which is diffeomorphic to the set of irregular orbits for the reflection group B₃. This observation led to the theory of boundary singularities (1978).

Figure 265

The surface of contact elements of the evolvents

Example (I. G. Shcherbak, 1982). Consider a generic curve on a surface in three-dimensional euclidean space. At certain points, the direction of the curve coincides with principal curvature directions of the surface. It follows from the theory of lagrangian boundary singularities that the Weyl group F₄ is connected with each such point: the focal points of the surface (A₂), focal points of the curve (A′₂), and normals to the surface at points of the curve (B₂) together form an F₄ caustic near the center of curvature (Figure 266).

Figure 266

The caustic singularity F₄

We will not dwell here on the theory of boundary singularities, but it is worth mentioning the “Lagrange duality” relating a function and its restriction to the boundary (up to stable equivalence): this may be thought of as a modern version of the Lagrange multiplier rule (I. G. Shcherbak, 1982).

Returning to inflection points of plane curves, we consider the graph of the multiple-valued time function in an obstacle problem. The level curves of this function are the evolvents of the obstacle boundary. Therefore, the graph of this function has the form (shown in Figure 267) of a surface with two cuspidal edges (of orders 3/2 and 5/2). When I showed this surface to A. B. Givental’, he recognized O. V. Lyashko’s drawing of the singular orbit Σ of the group H₃ (symmetries of the icosahedron). Givental’s conjecture was soon verified:

Figure 267

The discriminant of H₃

Theorem (O. P. Shcherbak, 1982). The graph of the (multiple-valued) time function in the problem of bypassing an obstacle bounded by a generic plane curve is formally diffeomorphic near an inflection point of the curve to the variety Σ.

The proof of this theorem uses:

Theorem (O. V. Lyashko, 1981). The variety Σ is diffeomorphic to the variety of polynomials x⁵ + ax⁴ + bx² + c having a multiple root.

Lyashko’s theorem describes the variety of singular orbits for the group H₃ as the union of the tangents to the curve (t, t³, t⁵), while Shcherbak’s theorem applies to any curve of the form (t + o(t), t³ + o(t³), t⁵ + o(t⁵)).
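The description by tangents can be verified directly in the coordinates (a, b, c) of the polynomial x⁵ + ax⁴ + bx² + c. In the sketch below, the curve of polynomials with a triple root at x = t is computed by hand from the conditions p(t) = p′(t) = p″(t) = 0; it equals (−15t/8, 5t³/4, −3t⁵/8), which is the curve (t, t³, t⁵) up to a rescaling of the coordinates. The function names are ours. Every point of a tangent line to this curve gives a polynomial with a multiple root at the corresponding value of t.

```python
from fractions import Fraction

def triple_root_curve(t):
    """(a, b, c) for which x^5 + a x^4 + b x^2 + c has a triple root at t."""
    return (Fraction(-15, 8) * t, Fraction(5, 4) * t ** 3, Fraction(-3, 8) * t ** 5)

def tangent_direction(t):
    """Derivative of triple_root_curve with respect to t."""
    return (Fraction(-15, 8), Fraction(15, 4) * t ** 2, Fraction(-15, 8) * t ** 4)

def point_on_tangent(t, u):
    """Point at parameter u on the tangent line to the curve at t."""
    base, direction = triple_root_curve(t), tangent_direction(t)
    return tuple(p + u * d for p, d in zip(base, direction))

def quintic(a, b, c, x):
    return x ** 5 + a * x ** 4 + b * x ** 2 + c

def quintic_prime(a, b, x):
    return 5 * x ** 4 + 4 * a * x ** 3 + 2 * b * x

# A point on the tangent line at t = 2 (the value of u is arbitrary).
t, u = Fraction(2), Fraction(3, 7)
a, b, c = point_on_tangent(t, u)
```

Since the arithmetic is exact, the vanishing of `quintic(a, b, c, t)` and `quintic_prime(a, b, t)` holds identically and not merely up to rounding.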

The same singularity appears on a generic front at the point of tangency of an asymptotic ray with the bounding surface of an obstacle in ℝ³.

Finally, we describe a variational problem leading to the singularity H₄ (after O. P. Shcherbak).

The group H₄ consists of the symmetries of a regular polyhedron in ℝ⁴. Its 120 vertices lie on S³ ≈ SU(2) and form the binary icosahedral group (the binary group being the inverse image of the symmetry group of the icosahedron under the double covering S³ → SO(3)).
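The count of 120 can be confirmed by exact arithmetic in the field ℚ(√5). The sketch below generates the group of unit quaternions spanned by s = (1 + i + j + k)/2 and t = (φ + φ⁻¹i + j)/2, where φ = (1 + √5)/2; these are a standard choice of generators of the binary icosahedral group and are not taken from the text, and the helper names are ours.

```python
from fractions import Fraction as Fr

# A number a + b*sqrt(5) is stored as the pair (a, b) of exact fractions.
ZERO = (Fr(0), Fr(0))
ONE = (Fr(1), Fr(0))

def f_add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def f_sub(x, y):
    return (x[0] - y[0], x[1] - y[1])

def f_mul(x, y):
    return (x[0] * y[0] + 5 * x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def f_sum(terms):
    total = ZERO
    for term in terms:
        total = f_add(total, term)
    return total

def q_mul(p, q):
    """Hamilton product of quaternions given by their (1, i, j, k) parts."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    m = f_mul
    return (
        f_sub(m(a1, a2), f_sum([m(b1, b2), m(c1, c2), m(d1, d2)])),
        f_sub(f_sum([m(a1, b2), m(b1, a2), m(c1, d2)]), m(d1, c2)),
        f_sub(f_sum([m(a1, c2), m(c1, a2), m(d1, b2)]), m(b1, d2)),
        f_sub(f_sum([m(a1, d2), m(b1, c2), m(d1, a2)]), m(c1, b2)),
    )

def q_norm2(q):
    """Squared length of a quaternion."""
    return f_sum([f_mul(c, c) for c in q])

HALF = (Fr(1, 2), Fr(0))
PHI_HALF = (Fr(1, 4), Fr(1, 4))         # phi / 2
PHI_INV_HALF = (Fr(-1, 4), Fr(1, 4))    # 1 / (2 phi) = (sqrt(5) - 1) / 4

S = (HALF, HALF, HALF, HALF)                # (1 + i + j + k) / 2
T = (PHI_HALF, PHI_INV_HALF, HALF, ZERO)    # (phi + i / phi + j) / 2

def generate(generators):
    """Close the identity under right multiplication by the generators."""
    identity = (ONE, ZERO, ZERO, ZERO)
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            x = q_mul(g, h)
            if x not in group:
                group.add(x)
                frontier.append(x)
    return group

G = generate([S, T])
```

Because the generated set is finite, closing under products of the generators already yields inverses, so `G` is the whole group. All of its elements have unit length and therefore lie on S³.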

Consider the problem of bypassing an obstacle bounded by a smooth surface in three-dimensional euclidean space. The extremals beginning at a fixed point outside the obstacle generate a pencil (one-parameter family) of geodesics on the surface. A time function is the distance from a fixed initial manifold (e.g., a point) along stationary (not necessarily minimizing) paths consisting of arcs of geodesics and their tangents, considered as a (multiple-valued) function of the terminal point in space (solution of the Hamilton-Jacobi equation).

Theorem (O. P. Shcherbak, 1984). For a generic obstacle, the graph of the time function at a point which is focal for the pencil along an asymptotic tangent at a parabolic point of the surface is locally diffeomorphic to the variety Σ of singular orbits of the group H₄.

An explicit parametrization of Σ is:

The group H₄ is related to a four-dimensional subspace of the base space of the versal deformation of E₈ (this connection is explained in Remark 7, §9 of the paper by V. I. Arnold, Indices of singular points of 1-forms on manifolds with boundary, convolution of invariants of reflection groups, and singular projections of smooth surfaces, Russian Math. Surveys 34:2 (1979), 1–42).

Corresponding to this four-dimensional subspace, there is an embedding of the local algebra D₄ into the local algebra E₈, which induces on the former the same grading which is given by the convolution of invariants of H₄. O. P. Shcherbak has shown that this relationship establishes yet another description of the variety of singular orbits of H₄:

Theorem. Consider those values of λ for which the curve x⁵ + y³ + λ₁ x³y + λ₂x³ + λ₃y + λ₄ = 0 is singular. One of the irreducible components of this three-dimensional hypersurface in λ-space is diffeomorphic to the variety of singular orbits of the group H₄.

The caustic and three typical sections of the variety of singular orbits of H₄ are shown in Figures 268 and 269. See O. P. Shcherbak, Wavefronts and reflection groups, Russian Math. Surveys, 43 (1988).

Figure 268

The caustic singularity H₄

Figure 269

The front perestroika H₄

¹²⁵

The relation between caustics and dust-like media was first discovered by Lifshitz, Sudakov, and Khalatnikov: see the survey by E. M. Lifshitz and I. M. Khalatnikov, Investigations in relativistic cosmology, Adv. Phys. 12 (1963), 185.

Bibliography of Symplectic Topology

Arnold, V.I. Sur une propriété topologique des applications globalement canoniques de la mécanique classique. C.R. Acad. Sci. Paris 261 (1965), 3719–3722.

Arnold, V.I. On a characteristic class entering the quantization conditions. Funct. Anal. Appl. 1:1 (1967), 1–14.

Arnold, V.I. A comment on “Sur un théorème de géométrie”. In: Izbrannye trudy A. Puankaré. Moscow, Nauka, 1972, vol. 2, pp. 987–989.

Arnold, V.I. Lagrange and Legendre cobordisms. Funct. Anal. Appl. 14:3 (1980), 1–13; 14:4 (1980), 8–17.

Arnold, V.I. The Sturm theorems and symplectic geometry. Funct. Anal. Appl. 19 (1985), 251–259.

Arnold, V.I. First steps in symplectic topology. Russian Math. Surveys 41:6 (1986), 1–21.

Arnold, V.I. On functions with mild singularities. Funct. Anal. Appl. 23:3 (1989), 1–10.

Arnold, V.I., and Givental, A.B. Symplectic geometry. In: Dynamical Systems IV (Enc. of Math. Sc. vol. 4). Berlin-Heidelberg-New York, Springer, 1990, pp. 1–136.

Arnold, V.I. Sur les propriétés topologiques des projections lagrangiennes en géométrie symplectique des caustiques. Preprint 9320, CEREMADE, Université Paris-Dauphine, 1993, pp. 1–9 (Cahiers de Mathématiques de la Décision, 14/6/93).

Arnold, V.I. Some remarks on symplectic monodromy of Milnor fibration. In: Progress in Math., A. Floer Memorial Volume. Basel-Boston, Birkhäuser, 1993.

Arnold, V.I. Invariants and perestroikas of plane fronts. Trudy (Proceedings) Steklov Math. Inst., Russ. Acad. of Sc., Vol. 209, 1995.

Arnold, V.I. On topological properties of Legendre projections in contact geometry of wave fronts. Algebra and Analysis. St. Petersburg Math. J. 6:3 (1994).

Arnold, V.I. Symplectic geometry and topology. In: Trends and Perspectives in Modern Mathematics. Cambridge Univ. Press, to appear (Preprint MIT, 1993, 68 pp).

Arnold, V.I. Topological Invariants of Plane Curves and Caustics. J.B. Lewis Memorial Lectures, Rutgers, 1993, 106 pp; AMS University Lecture Series, Vol. 5, Providence, AMS, 1994, 60 pp.

Arnold, V.I., ed. Singularities and Curves (Advances in Sov. Math.), Providence, AMS, 1994.

Atiyah, M. New invariants of 3- and 4-manifolds. In: The Mathematical Heritage of H. Weyl. Durham, NC, 1987 (Sympos. Pure Math., vol. 48). Providence, AMS, 1988, pp. 285–289.

Audin, M. Quelques calculs en cobordisme lagrangien. Ann. Inst. Fourier 35:3 (1985), 159–194.

Audin, M. Cobordismes d’immersions lagrangiennes et legendriennes (Travaux en Cours, vol. 20). Hermann, 1987, 203 pp.

Audin, M. Fibrés normaux d’immersion en dimension double, points doubles d’immersions lagrangiennes et plongements totalement réels. Comm. Math. Helvet. 63 (1988), 593–623.

Audin, M. Hamiltoniens périodiques sur les variétés symplectiques compactes de dimension 4. In: Lect. Notes in Math. 1416. Berlin-Heidelberg-New York, Springer, 1990, pp. 1–25.

Audin, M. The Topology of Torus Actions on Symplectic Manifolds. Basel, Birkhäuser, 1991.

Banyaga, A. Sur la structure du groupe des difféomorphismes qui préservent une forme symplectique. Comm. Math. Helv. 53 (1978), 174–227.

Bennequin, D. Entrelacements et équations de Pfaff. Astérisque 107–108 (1983), 83–161.

Bennequin, D. Quelques remarques simples sur la rigidité symplectique. In: Géométrie Symplectique et de Contact: Autour du Théorème de Poincaré-Birkhoff, P. Dazord and N. Desolneux-Moulis, eds. Paris, Hermann, 1984, pp. 1–50.

Bialy, M.L., and Polterovich, L.V. Lagrangian singularities of invariant tori of hamiltonian systems with two degrees of freedom. Invent. Math. 97:2 (1989), 291–303.

Bialy, M., and Polterovich, L. Hamiltonian diffeomorphisms and Lagrangian distributions. Geom. Funct. Anal. 2 (1992), 173–21.

Bialy, M., and Polterovich, L. Optical Hamiltonian functions. Preprint 1992, 20 p.

Boothby, W.M., and Wang, H.C. On contact manifolds. Ann. Math. 68 (1958), 721–734.

Calabi, E. On the group of automorphisms of a symplectic manifold. In: Problems in Analysis (Symposium in honour of S. Bochner). Princeton Univ. Press, 1970, 1–26.

Chaperon, M. Quelques questions de géométrie symplectique [d’après, entre autres, Poincaré, Arnold, Conley et Zehnder], Séminaire Bourbaki 1982–83. Astérisque 105–106 (1983), 231–249.

Chaperon, M. Une idée du type “géodésiques brisées” pour les systèmes hamiltoniens. C.R. Acad. Sci. Paris 298 (1984), 293–296.

Chaperon, M. An elementary proof of the Conley-Zehnder theorem in symplectic geometry. In: Dynamical Systems and Bifurcations, B.L.J. Braaksma, H.W. Broer, F. Takens, eds. (Lecture Notes in Math. 1125) Berlin-Heidelberg-New York, Springer, 1985, 1–8.

Chaperon, M. Familles génératrices. Cours à l’école d’été Erasmus de Samos (1990), Publication Erasmus, 1993.

Chekanov, Yu.V. Lejandrova teoriya Morsa. Uspekhi Mat. Nauk 42:4 (1987), 139–141.

Chekanov, Yu.V. Caustics in geometrical optics. Funct. Anal. Appl. 20 (1986), 223–226.

Chekanov, Yu.V. Lagrangian tori in a symplectic vector space and global symplectomorphisms. Bochum Preprint 169, 1993, 13 p (to appear in Math. Z).

Conley, C., and Zehnder, E. The Birkhoff-Lewis fixed point theorem and a conjecture of V.I. Arnold. Invent. Math. 73 (1983), 33–49.

Duistermaat, J.J. On the Morse index in variational calculus. Adv. Math. 21 (1976), 173–195.

Duistermaat, J.J. On global action-angle variables. Comm. Pure Appl. Math. 33 (1980), 687–706.

Ekeland, I., and Hofer, H. Symplectic topology and Hamiltonian dynamics. Math. Z. 200 (1988), 355–378.

Ekeland, I., and Hofer, H. Symplectic topology and Hamiltonian dynamics II. Math. Z. 203 (1990), 553–567.

Eliashberg, Y. Rigidity of symplectic and contact structures, Preprint, 1981.

Eliashberg, Y. Cobordisme des solutions de relations différentielles. In: Sem. Sud-Rhodanien de Géom., tome 1, P. Dazord and N. Desolneux-Moulis, eds. Hermann, 1984, pp. 17–32.

Eliashberg, Y. The complexification of contact structures on a 3-manifold. Uspekhi Mat. Nauk 40:6 (1985), 161–162.

Eliashberg, Y. Classification of overtwisted contact structures on 3-manifolds. Invent. Math. 98 (1989), 623–637.

Eliashberg, Y. Filling by holomorphic discs and its applications. In: Geometry of Low-Dimensional Manifolds, Vol. 2, S.K. Donaldson and C.B. Thomas, eds. (London Math. Soc. Lect. Notes Ser. 151) Cambridge Univ. Press, 1990, pp. 45–67.

Eliashberg, Y., and Gromov, M. Convex symplectic manifolds. Proceedings of Symposia in Pure Mathematics, E. Bedford et al. (eds), 52:2 (1991), 135–162.

Eliashberg, Y., and Polterovich, L. Bi-invariant metrics on the group of Hamiltonian diffeomorphisms. Preprint, 1991.

Eliashberg, Y. New invariants of open symplectic and contact manifolds. J. Amer. Math. Soc. 4 (1991), 513–520.

Eliashberg, Y., and Ratiu, T. The diameter of the symplectomorphism group is infinite. Invent. Math. 103 (1991), 327–340.

Eliashberg, Y. On symplectic manifolds with some contact properties. J. Diff. Geometry 33 (1991), 233–238.

Eliashberg, Y., and Hofer, H. Unseen symplectic boundaries. Preprint, 1992, 16 pp.

Eliashberg, Y., and Polterovich, L. Unknottedness of Lagrangian surfaces in symplectic 4-manifolds. Preprint, 1992, 9 pp.

Eliashberg, Y., and Polterovich, L. New applications of Luttinger’s surgery. Preprint, 1992, 12 pp.

Eliashberg, Y. Contact 3-manifolds twenty years since J. Martinet’s work. Ann. Inst. Fourier 42 (1992), 165–191.

Eliashberg, Y., and Hofer, H. An energy-capacity inequality for the symplectic holonomy of hypersurfaces flat at infinity. Preprint, 1992, 8 pp.

Eliashberg, Y. Topology of 2-knots in ℝ⁴ and symplectic geometry. In: Progress in Math., A. Floer Memorial Volume. Boston-Basel, Birkhäuser, 1993.

Eliashberg, Y. Legendrian and transversal knots in tight contact 3-manifolds. In: Topological Methods in Modern Mathematics. Houston, Publish or Perish, 1993, pp. 171–193.

Eliashberg, Y. Classification of contact structures on ℝ³. Duke Math. J. Intern. Math. Res. Notes N°3 (1993), 87–91.

Eliashberg, Y., and Hofer, H. Towards the definition of symplectic boundary. Preprint, 1993.

Floer, A. Proof of the Arnold conjecture and generalizations to certain Kaehler manifolds. Duke Math. J. 53 (1986), 1–32.

Floer, A. Morse theory for Lagrangian intersections. J. Diff. Geom. 28 (1988), 513–547.

Floer, A. The unregularized gradient flow for the symplectic action. Comm. Pure Appl. Math. 41 (1988), 775–813.

Floer, A. A relative Morse index for the symplectic action. Comm. Pure Appl. Math. 41 (1988), 393–407.

Floer, A. An instanton invariant for 3-manifolds. Comm. Math. Phys. 118:2 (1988), 215–240.

Floer, A. Witten’s complex in infinite dimensional Morse theory. J. Diff. Geom. 30 (1989), 207–221.

Floer, A. Cuplength estimates for Lagrangian intersections. Comm. Pure Appl. Math. 42 (1989), 335–356.

Floer, A. Symplectic fixed points and holomorphic spheres. Comm. Math. Phys. 120 (1989), 575–611.

Floer, A., and Hofer, H. Symplectic homology I: open sets in ℂⁿ. Preprint, 1992.

Floer, A., Hofer, H., and Wysocki, K. Applications of symplectic homology I. Preprint, 1992.

Fortune, B., and Weinstein, A. A symplectic fixed point theorem for complex projective spaces. Bull. Am. Math. Soc. 12:1 (1985), 128–130.

Fuchs, D.B. Maslov-Arnold characteristic classes. Sov. Math. Dokl. 9 (1968), 96–99.

Ginzburg, V.L. Calculation of contact and symplectic cobordism groups. Topology 31:4 (1992), 757–762.

Ginzburg, V.L., and Khesin, B.A. Steady fluid flows and symplectic geometry. Preprint IHES, October 1992, 20 pp. (to appear in: J. Geom. Phys.).

Giroux, E. Convexité en topologie de contact. Comm. Math. Helvet. 66 (1991), 637–677.

Givental, A.B. Lagrangian embeddings of surfaces and the open Whitney umbrella. Funct. Anal. Appl. 20:3 (1986), 35–41.

Givental, A.B. Periodic mappings in symplectic topology. Funct. Anal. Appl. 23:4 (1989), 287–300.

Givental, A.B. Nonlinear generalization of the Maslov index. In: Singularity Theory and Its Applications, V. Arnold, ed. (Advances in Soviet Math., vol. 1), Providence, AMS, 1990, pp. 71–103.

Givental, A.B. A symplectic fixed point theorem for toric manifolds. In: Progress in Math., A. Floer Memorial Volume. Boston-Basel, Birkhäuser, 1993.

Gray, J.W. Some global properties of contact structures. Ann. Math. 69 (1959), 421–450.

Gromov, M. Partial Differential Relations. Berlin-Heidelberg-New York, Springer, 1986.

Gromov, M. Pseudo holomorphic curves in symplectic manifolds. Invent. Math. 82 (1985), 307–347.

Guillemin, V., and Sternberg, S. Birational equivalence in symplectic category. Invent. Math. 97 (1989), 485–522.

Harlamov, V., and Eliashberg, Y. On the number of complex points of a real surface in a complex surface. Proc. LITC−82 (1982), 143–148.

Hofer, H., and Zehnder, E. A new capacity for symplectic manifolds. In: Analysis Et Cetera, Boston, Academic Press, 1990, 405–428.

Hofer, H. On the topological properties of symplectic maps. Proc. Roy. Soc. Edinburgh, Ser. A. 115 (1990), 25–38.

Hofer, H. Symplectic Invariants. In: Proceedings ICM Kyoto 1990. Berlin-Heidelberg-New York, Springer, 1991.

Hofer, H. Symplectic capacities. In: Durham Conferences, S.K. Donaldson and C.B. Thomas, eds. London Math. Soc., 1992.

Hofer, H., and Salamon, D. Floer homology and Novikov rings. Preprint, 1992. 39 pp.

Hofer, H. Estimates for the energy of a symplectic map. Comm. Math. Helvet. 68 (1993), 48–72.

Kazarian, M.È. Umbilical characteristic number of Lagrangian mappings of 3-dimensional pseudo-optical manifolds. Preprint, Ruhr-Univ. Bochum, 1993, 12 pp.

Kuksin, S. Infinite-dimensional symplectic capacities and a squeezing theorem for Hamiltonian PDE’s. Preprint Forschungsinstitut für Mathematik ETH Zürich, August 25, 1993.

Lalonde, F., and Sikorav, J.-C. Sous-variétés lagrangiennes exactes des fibrés cotangents. Comm. Math. Helvet. (1991), 18–33.

Lalonde, F. Isotopy of symplectic balls, Gromov’s radius and the structure of ruled symplectic 4-manifolds. Preprint, 1992.

Lalonde, F., and McDuff, D. The geometry of symplectic energy. Preprint # 1993/6 IMS SUNY Stony Brook, June 1993, 26 pp.

Laudenbach, F., and Sikorav, J.-C. Persistence d’intersection avec la section nulle au cours d’une isotopie hamiltonienne dans un fibré cotangent. Invent. Math. 82:2 (1985), 349–358.

Laudenbach, F., and Sikorav, J.-C. Disjonction hamiltonienne et limites de sous-variétés lagrangiennes. Preprint, Centre de Math., Ecole Polytechnique, septembre 1993.

Lee, Yng-Ing. Nonlagrangian limits of Lagrangian discs. Duke Math. J. Intern. Math. Res. Notes. N°2, 1993.

Luttinger, K. Lagrangian tori in ℝ⁴. Preprint, 1992.

Lutz, R. Structures de contact sur les fibrés principaux en cercles de dimension 3. Ann. Inst. Fourier 3 (1977), 1–15.

Martinet, J. Formes de contact sur les variétés de dimension 3. In: Lect. Notes in Math. 209. Berlin-Heidelberg-New York, Springer, 1971, pp. 142–163.

Meckert, C. Formes de contact sur la somme connexe de deux variétés de contact. IRMA, Strasbourg, 1980.

McDuff, D. The structure of rational and ruled symplectic 4-manifolds. JAMS 3:1 (1990), 679–712.

McDuff, D. Elliptic methods in symplectic geometry. Bull. Amer. Math. Soc. 23 (1990), 311–358.

McDuff, D. Symplectic manifolds with contact-type boundaries. Invent. Math. 103 (1991), 651–671.

McDuff, D. Blow-ups and symplectic embeddings in dimension 4. Topology 30 (1991), 409–421.

McDuff, D. Singularities of J-holomorphic curves. J. Geom. Anal. 3 (1992), 249–266.

McDuff, D. Notes on ruled symplectic 4-manifolds. Preprint, 1992 (to appear in Trans. Amer. Math. Soc.).

McDuff, D., and Polterovich, L. Symplectic packing and algebraic geometry. Preprint, 1992.

McDuff, D. Remarks on the uniqueness of symplectic blowing-up. Proceedings of 1990 Warwick Symposium, Cambridge Univ. Press, 1993.

McDuff, D., and Salamon, D. Notes on J-holomorphic curves. Stony Brook preprint, 1993.

McDuff, D., and Traynor, L. The 4-dimensional symplectic camel and related results. (London Math. Soc. Lect. Notes Series). Cambridge Univ. Press (to appear).

McDuff, D., and Salamon, D. Symplectic Topology (in preparation).

Moser, J. On the volume elements on a manifold. Trans. Amer. Math. Soc. 120 (1965), 286–294.

Oh, Y.-G. A symplectic fixed point theorem on T²ⁿ × ℂP^k. Math. Z. 203:4 (1990), 535–552.

Polterovich, L. New invariants of embedded totally real tori and one problem of Hamiltonian mechanics. In: Methods of Qualitative Theory and the Theory of Bifurcations, Gorki, 1988, pp. 84–90.

Polterovich, L. Strongly optical Lagrange manifolds. Math. Notes Ac. Sc. USSR 45 (1989), 152–158.

Polterovich, L. Symplectic displacement energy for Lagrangian submanifolds. Preprint, 1991.

Polterovich, L. The surgery of Lagrange submanifolds. Geom. Funct. Anal. 2 (1991), 213–246.

Polterovich, L. The Maslov class of Lagrange surfaces and Gromov’s pseudoholomorphic curves. Trans. Amer. Math. Soc. 325 (1991), 241–248.

Rabinowitz, P. Critical points of indefinite functionals and periodic solutions of differential equations. In: Proceedings ICM Helsinki 1978. Acad. Sci. Fennica, Helsinki, 1980, pp. 791–796.

Sato, H. Remarks concerning contact manifolds. Tôhoku Math. J. 29 (1977), 577–584.

Siegel, C.L. Symplectic geometry. Amer. J. Math. 65:1 (1943).

Sikorav, J.-C. Problèmes d’intersections et de points fixes en géométrie hamiltonienne. Comm. Math. Helvet. 62:1 (1987), 62–73.

Sikorav, J.-C. Rigidité symplectique dans le cotangent de Tⁿ. Duke Math. J. 59 (1989), 227–231.

Sikorav, J.-C. Systèmes hamiltoniens et topologie symplectique. Pisa, ETS Editrice, 1990.

Sikorav, J.-C. Quelques propriétés des plongements lagrangiens. Preprint, 1990.

Tabachnikov, S.L. Calculation of the generalized Bennequin invariant of a Legendrian curve from the geometry of its front. Funct. Anal. Appl. 22:3 (1988), 246–248.

Tabachnikov, S. Around four vertices. Russian Math. Surveys 45:1 (1990), 229–230.

Tabachnikov, S. Geometry of Lagrangian and Legendrian 2-web. Preprint, Arkansas Univ., 1992, 22 pp.

Traynor, L. Symplectic embedding trees for generalized camel spaces. Preprint 034-93 MSRI Berkeley, January 1993, 19 pp.

Traynor, L. Symplectic packing constructions. Preprint, October 1993, 20 pp.

Vasil’ev, V.A. Characteristic classes of Lagrangian and Legendre manifolds dual to singularities of caustics and wave fronts. Funct. Anal. Appl. 15 (1981), 164–173.

Vasil’ev, V.A. Self-intersections of wave fronts and Legendre (Lagrangian) characteristic numbers. Funct. Anal. Appl. 16 (1982), 131–133.

Vassilyev, V.A. Lagrange and Legendre Characteristic Classes. New York, Gordon and Breach, 1988.

Vasil’ev, V.A. Topology of spaces of functions having no complicated singularities. Funct. Anal. Appl. 23:4 (1989), 24–36.

Viterbo, C. Capacités symplectiques et applications. Séminaire Bourbaki, n°714, Astérisque 177–178 (1989), 345–362.

Viterbo, C. A new obstruction to embedding Lagrangian tori. Invent. Math. 100 (1990), 301–320.

Viterbo, C. Plongements lagrangiens et capacités symplectiques des tores dans ℝ²ⁿ. C.R. Acad. Sci. Paris, Sér. I, Math. 311 (1990), 487–490.

Viterbo, C. Symplectic topology as the geometry of generating functions. Math. Ann. 292 (1992), 685–710.

Weinstein, A. Lectures on symplectic manifolds. C.B.M.S. Regional Conf. Ser. in Math. vol. 29, Providence, AMS, 1977.

Weinstein, A. Periodic orbits for convex hamiltonian systems. Ann. Math. 108 (1978), 507–518.

Weinstein, A. On the hypotheses of Rabinowitz’s periodic orbit theorems. J. Diff. Eq. 33 (1979), 353–358.

Weinstein, A. Contact surgeries and symplectic handlebodies. Hokkaido Math. J. 20 (1991), 241–251.

Weinstein, A. Symplectic manifolds and their lagrangian submanifolds. Adv. Math. 6 (1971), 329–346.

Part I Newtonian Mechanics

1 Experimental facts

1 The principles of relativity and determinacy

A Space and time

B Galileo’s principle of relativity

C Newton’s principle of determinacy

2 The galilean group and Newton’s equations

A Notation

B Galilean structure

C Motion, velocity, acceleration

D Newton’s equations

E Constraints imposed by the principle of relativity

3 Examples of mechanical systems

A Example 1: A stone falling to the earth

B Example 2: Falling from great height

C Example 3: Motion of a weight along a line under the action of a spring

D Example 4: Conservative systems

2 Investigation of the equations of motion

4 Systems with one degree of freedom

A Definitions

B Phase flow

C Examples

D Phase flow

5 Systems with two degrees of freedom

A Definitions

B The law of conservation of energy

C Phase space

6 Conservative force fields

A Work of a force field along a path

B Conditions for a field to be conservative

C Central fields

7 Angular momentum

A The law of conservation of angular momentum

B Kepler’s law

8 Investigation of motion in a central field

A Reduction to a one-dimensional problem

B Integration of the equation of motion

C Investigation of the orbit

D Central fields in which all bounded orbits are closed

E Kepler’s problem

9 The motion of a point in three-space

A Conservative fields

B Central fields

C Axially symmetric fields

10 Motions of a system of n points

A Internal and external forces

B The law of conservation of momentum

C The law of conservation of angular momentum

D The law of conservation of energy

E Example: The two-body problem

11 The method of similarity

A Example

B A problem

Part II Lagrangian Mechanics

3 Variational principles

12 Calculus of variations

A Variations

B Extremals

C The Euler-Lagrange equation

D An important remark

13 Lagrange’s equations

A Hamilton’s principle of least action

B The simplest examples

14 Legendre transformations

A Definition

B Examples

C Involutivity

D Young’s inequality

E The case of many variables

15 Hamilton’s equations

A Equivalence of Lagrange’s and Hamilton’s equations

B Hamilton’s function and energy

C Cyclic coordinates

16 Liouville’s theorem

A The phase flow

B Liouville’s theorem

C Proof

D Poincaré’s recurrence theorem

E Applications of Poincaré’s theorem

4 Lagrangian mechanics on manifolds


C The inertia operator⁵⁰