275 - Positive polynomials

When studying local extreme points of functions of several (real) variables, a typical textbook exercise is to consider the polynomial

P(x,y)=x^2+3xy+3y^2-6x+3y-6.

Here we have P_x=2x+3y-6 and P_y=3x+6y+3, so the only critical point of P is (15,-8). Since P_{xx}=2 and the Hessian of P is 2\times 6-3^2=3>0, it follows that (15,-8) is a local minimum of P and, since it is the only critical point, it is in fact an absolute minimum with P(15,-8)=-63.
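For readers who want to double-check the computation, here is a quick sanity check in SymPy (just a verification sketch; the code and variable names are not part of the argument):

from sympy import symbols, diff, solve, hessian

x, y = symbols('x y', real=True)
P = x**2 + 3*x*y + 3*y**2 - 6*x + 3*y - 6
crit = solve([diff(P, x), diff(P, y)], [x, y])  # {x: 15, y: -8}
H = hessian(P, (x, y))                          # [[2, 3], [3, 6]]
print(crit, H.det(), P.subs(crit))              # {x: 15, y: -8} 3 -63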

P being a polynomial, it is reasonable to expect that there is an algebraic explanation as to why -63 is its minimum, and why it lies at (15,-8). After all, this is what happens in one variable: if p(x)=ax^2+bx+c and a\ne0, then

\displaystyle p(x)=a\left(x+\frac b{2a}\right)^2+\frac{4ac-b^2}{4a},

and, if a>0, it is clear that p has a minimum at x=-b/(2a), and that this minimum value is (4ac-b^2)/(4a).
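For instance, p(x)=2x^2-4x+1=2(x-1)^2-1, so its minimum value is -1, attained at x=1.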

The polynomial P of the example above can be analyzed this way as well. A bit of algebra shows that we can write

\displaystyle P(x,y)=\left(x-3+\frac32 y\right)^2+3\left(\frac y2+4\right)^2-63,

and it follows immediately that P(x,y) has a minimum value of -63, achieved precisely when both x-3+3y/2=0 and 4+y/2=0, i.e., at (15,-8).
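Again, the identity behind this can be confirmed mechanically, for instance with SymPy (just a sanity check of the algebra):

from sympy import symbols, expand, Rational

x, y = symbols('x y')
P = x**2 + 3*x*y + 3*y**2 - 6*x + 3*y - 6
rep = (x - 3 + Rational(3, 2)*y)**2 + 3*(y/2 + 4)**2 - 63
print(expand(P - rep))  # 0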

(One can go further, and explain how to carry out in a systematic way the ‘bit of algebra’ that led to the representation of P above, but let’s leave that for now.)

What we did with P is not a mere coincidence. Hilbert’s 17th of the 23 problems of his famous address to the Second International Congress of Mathematicians in Paris, 1900, asks whether every polynomial P(x_1,\dots,x_n) with real coefficients which is non-negative for all (real) choices of x_1,\dots,x_n is actually a sum of squares of rational functions. (A rational function is a quotient of polynomials.) A non-negative polynomial is usually called positive definite, but I won’t use this terminology here.

If Hilbert’s problem had an affirmative solution, this would provide a clear explanation as to why P is non-negative.

Hilbert himself showed that a non-negative polynomial P is a sum of squares of polynomials in either of the following cases:

  • P is a polynomial in only one variable,
  • P is a quadratic polynomial (as in our example above), in any number n of variables,
  • P is a quartic polynomial in two variables.

Emil Artin solved Hilbert’s problem (in the affirmative) in 1927. The stronger conclusion, that P is a sum of squares of polynomials, is unfortunately false in general. Here is a counterexample:

Q(x,y,z)=z^6+x^4y^2+x^2y^4-3x^2y^2z^2.

That this polynomial is non-negative follows from the inequality between the arithmetic and the geometric means; see example 1 below. This polynomial appears in the paper of Marie-Françoise Roy, The role of Hilbert’s problems in real algebraic geometry, in Proceedings of the ninth EWM Meeting, Loccum, Germany 1999, and one can also check that it is not a sum of squares of polynomials by applying the algorithm I refer to below.
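(Concretely, applying this inequality to the three terms z^6, x^4y^2, x^2y^4 gives

\displaystyle \frac{z^6+x^4y^2+x^2y^4}3\ge\sqrt[3]{z^6\cdot x^4y^2\cdot x^2y^4}=x^2y^2z^2,

so Q(x,y,z)\ge0 for all real x,y,z.)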

Although Hilbert himself was aware that some non-negative polynomials P are not sums of squares of polynomials, I believe it was the logician Raphael Robinson who exhibited the first explicit example of such a P, in his nice paper Some definite polynomials which are not sums of squares of real polynomials, in Selected questions of algebra and logic (collection dedicated to the memory of A. I. Malcev) (Russian), pp. 264–282, Izdat. “Nauka” Sibirsk. Otdel., Novosibirsk, 1973.

Here I quote from the MathSciNet review by W. Ellison:

The key result is the following lemma: A polynomial f(x, y) of degree at most 3 that vanishes at 8 of the 9 points (x, y) with x, y\in\{-1,0,1\} must also vanish at the 9th point.

The polynomial f(x,y)=x^2(x^2-1)^2+y^2(y^2-1)^2 vanishes to the 2nd order at each of the 9 points and we can find a sextic curve g(x, y) = 0 that does not pass through (0,0), but does have double points at each of the other 8 points, for example g(x,y)=(x^2-1)(y^2-1)(x^2+y^2-1). It follows that the function g(x, y)/f(x, y) is bounded above in the whole (x, y) plane. Thus, for a suitable \alpha >0 (in fact one can take \alpha= 1) it follows that S_\alpha(x,y)=f(x,y)-\alpha g(x,y) is positive definite and that S_\alpha(0,0)\ne0. It is now clear that we cannot have S_{\alpha} (x,y)=\sum f_r^2 (x,y) with real polynomials f_r, for these polynomials are at most cubics and so would have to vanish at (0,0), since they vanish at the other 8 points.
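Here is a quick sanity check of the construction just quoted, as a small SymPy sketch (not part of Robinson’s paper or of the review):

from sympy import symbols, expand

x, y = symbols('x y')
f = x**2*(x**2 - 1)**2 + y**2*(y**2 - 1)**2
g = (x**2 - 1)*(y**2 - 1)*(x**2 + y**2 - 1)
S = expand(f - g)  # the polynomial S_1 from the review (alpha = 1)
pts = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
print([S.subs({x: a, y: b}) for a, b in pts])
# S_1 vanishes at the 8 points other than (0,0), while S_1(0,0) = 1.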

It is interesting to see that several logicians have worked on this problem, notably Robinson, Henkin, and Kreisel.

This is a good place to mention that there are algorithms that, given a non-negative polynomial, decide whether it admits a representation as a sum of squares of polynomials, and if so, find such a sum. This is the main result of the nice paper An algorithm for sums of squares of real polynomials, by Victoria Powers and Thorsten Woermann, J. Pure Appl. Algebra 127 (1998), 99–104. (As of this writing, the paper is also available at Dr. Powers’s webpage.)
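To get a feel for how such algorithms operate, here is a minimal numerical illustration of the Gram-matrix idea underlying them, applied to the polynomial P+63 from the first example above (this sketch, and the matrix in it, are only an illustration, not the algorithm from the paper):

import numpy as np

# P(x,y) + 63 = x^2 + 3xy + 3y^2 - 6x + 3y + 57 = v^T Q v with v = (1, x, y);
# for a quadratic polynomial the Gram matrix Q is unique.
Q = np.array([[57.0, -3.0, 1.5],
              [-3.0,  1.0, 1.5],
              [ 1.5,  1.5, 3.0]])
print(np.linalg.eigvalsh(Q))  # approximately 0, 3.8, 57.2, so Q is positive semidefinite

Since Q is positive semidefinite of rank 2, the polynomial P+63 is a sum of two squares, in agreement with the representation found at the beginning of this post.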

Many natural inequalities that appear frequently in analysis can effectively be stated as asserting that certain polynomials are non-negative, so in fact one can (at least in principle) prove these inequalities by simply displaying a representation of the given polynomials as sums of squares.

Examples

1. The inequality between the arithmetic and the geometric means asserts that

\displaystyle\frac{x_1+\dots+x_n}n\ge\sqrt[n]{x_1\dots x_n}

for all nonnegative numbers x_1,\dots,x_n, with equality only when x_1=\dots=x_n. This classical result is one of the most quoted inequalities in analysis. Notice that it is equivalent to the assertion that the polynomial

\displaystyle P(z_1,\dots,z_n)=\frac{z_1^{2n} + z_2^{2n} +\dots+z_n^{2n}}n -z_1^2\dots z_n^2

is non-negative: first, replace the x_i with y_i^2, to dispense with the restriction that the x_i must be non-negative. Then, replace y_i^2 with z_i^{2n}, to remove the n-th roots.

As suggested, one may try to prove this inequality by simply exhibiting P as a sum of squares. Such a representation was actually found by Hurwitz in 1891, in his paper Über den Vergleich des arithmetischen und des geometrischen Mittels, which can be found in Math. Werke, 505–507, Basel, E. Birkhäuser, 1933.

To write the corresponding representation in a palatable way, let’s introduce some notation. Given a function f(z_1,\dots,z_n), let’s write {\mathcal P}f(z_1,\dots,z_n) for the result of adding all the expressions of the form f(z_{i_1},\dots,z_{i_n}) for all possible rearrangements (i_1,\dots,i_n) of the indices (1,\dots,n) (there are n! such possible rearrangements, where n!, n-factorial, is defined by setting 0!=1 and n!=1\times\dots\times n for n=1,2,\dots).

For example, {\mathcal P}z_1=(n-1)!(z_1+\dots+z_n) and {\mathcal P}(z_1\dots z_n)=n!z_1\dots z_n.

Write

\phi_k(z_1,\dots,z_n) ={\mathcal P} ((z_1^{n-k}-z_2^{n-k}) (z_1-z_2)z_3 z_4\dots z_{k+1})

for k=1,\dots,n-1 and

f_k=\phi_k(z_1^2,\dots,z_n^2).

Notice that each f_k is clearly non-negative, since (z_1^{n-k}-z_2^{n-k})(z_1-z_2)z_3 z_4\dots z_{k+1}=

\displaystyle (z_1-z_2)^2\left(\sum_{j=0}^{n-k-1}z_1^{n-k-1-j}z_2^j\right)z_3\dots z_{k+1}.

Then

\displaystyle P(z_1,\dots,z_n)=\frac1{2(n!)}\sum_{k=1}^{n-1}f_k.

Notice that this expression makes it obvious that equality can only occur when the z_i^2 are all equal or, in terms of the original variables, when the x_i are all equal.
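This identity can be verified directly for small n, for example with SymPy (the helper names sym_P and phi below are just for this check and are not part of Hurwitz’s argument):

from itertools import permutations
from math import factorial
from sympy import symbols, expand, Rational

n = 3
z = symbols('z1:4')  # (z1, z2, z3)

def sym_P(f):
    # the operator P above: sum f over all n! rearrangements of the variables
    return sum(f.subs(list(zip(z, perm)), simultaneous=True)
               for perm in permutations(z))

def phi(k):
    base = (z[0]**(n - k) - z[1]**(n - k))*(z[0] - z[1])
    for j in range(2, k + 1):
        base = base*z[j]
    return sym_P(base)

fs = [phi(k).subs({v: v**2 for v in z}) for k in range(1, n)]
lhs = sum(v**(2*n) for v in z)/n - z[0]**2*z[1]**2*z[2]**2
rhs = Rational(1, 2*factorial(n))*sum(fs)
print(expand(lhs - rhs))  # 0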

For example,

\displaystyle \frac{x_1^2+x_2^2}2-x_1x_2=\frac12(x_1-x_2)^2,

\displaystyle \frac{x_1^3+x_2^3+x_3^3}3-x_1x_2x_3=\frac16\big((x_1-x_2)^2x_3+(x_2-x_3)^2x_1+(x_3-x_1)^2x_2+(x_1-x_2)^2(x_1+x_2)+(x_1-x_3)^2(x_1+x_3)+(x_2-x_3)^2(x_2+x_3)\big),

and
\displaystyle \frac{x_1^4+x_2^4+x_3^4+x_4^4}4-x_1x_2x_3x_4=\frac{(x_1^2-x_2^2)^2+(x_3^2-x_4^2)^2}4+\frac{(x_1x_2-x_3x_4)^2}2.
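These identities are also easy to confirm mechanically; for instance, in SymPy (a quick check, not part of the original derivation):

from sympy import symbols, expand, Rational

x1, x2, x3, x4 = symbols('x1:5')
lhs3 = (x1**3 + x2**3 + x3**3)/3 - x1*x2*x3
rhs3 = Rational(1, 6)*((x1 - x2)**2*x3 + (x2 - x3)**2*x1 + (x3 - x1)**2*x2
                       + (x1 - x2)**2*(x1 + x2) + (x1 - x3)**2*(x1 + x3)
                       + (x2 - x3)**2*(x2 + x3))
lhs4 = (x1**4 + x2**4 + x3**4 + x4**4)/4 - x1*x2*x3*x4
rhs4 = ((x1**2 - x2**2)**2 + (x3**2 - x4**2)**2)/4 + (x1*x2 - x3*x4)**2/2
print(expand(lhs3 - rhs3), expand(lhs4 - rhs4))  # 0 0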

2. Another frequently used classical inequality is the Cauchy-Schwarz inequality

\displaystyle \sum_{i=1}^n x_iy_i\le\sqrt{\left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right)},

with equality only if the tuples (x_1,\dots,x_n) and (y_1,\dots,y_n) are proportional. This result is usually shown by analyzing the discriminant of a non-negative quadratic polynomial, and it is also widely presented in the more compact form |\vec x\cdot\vec y|\le|\vec x||\vec y| for \vec x,\vec y arbitrary vectors in {\mathbb R}^n and \cdot the usual dot product. Letting

\displaystyle P(\vec x,\vec y)=\left(\sum_{i=1}^n x_i^2\right) \left(\sum_{i=1}^n y_i^2\right)-\left(\sum_{i=1}^n x_iy_i\right)^2,

the inequality is seen to be equivalent to the assertion that the polynomial P is non-negative, and so (as in the previous example) it should be possible to prove this by exhibiting P as a sum of squares. The resulting identity is due to Lagrange, namely,

\displaystyle P(x_1,\dots,x_n,y_1,\dots,y_n)=\sum_{1\le i<j\le n}(x_iy_j-x_jy_i)^2,

from which the case of equality also follows immediately.
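As in the previous example, the identity is easy to verify symbolically for small n, say n=3 (a quick SymPy check):

from sympy import symbols, expand

n = 3
x = symbols('x1:4')
y = symbols('y1:4')
lhs = sum(a**2 for a in x)*sum(b**2 for b in y) - sum(a*b for a, b in zip(x, y))**2
rhs = sum((x[i]*y[j] - x[j]*y[i])**2 for i in range(n) for j in range(i + 1, n))
print(expand(lhs - rhs))  # 0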

3. Let me close with an example of a classical inequality that leads to non-negative polynomials for which I do not know whether they admit representations as a sum of squares of polynomials. This inequality is due to Minkowski: For non-negative x_i,y_i, we have

\displaystyle \left(\prod_{i=1}^n(x_i+y_i)\right)^{1/n}\ge\left(\prod_{i=1}^n x_i\right)^{1/n}+\left(\prod_{i=1}^n y_i\right)^{1/n}.

As in example 1 above, we can rewrite this as the assertion that a polynomial is non-negative, namely,

\displaystyle P(x_1,\dots,x_n,y_1,\dots,y_n)=\prod_{i=1}^n(x_i^{2n}+y_i^{2n})-\left(\prod_{i=1}^n x_i^2+\prod_{i=1}^n y_i^2\right)^n.

That the inequality holds is a consequence of the inequality between the arithmetic mean and the geometric mean. Indeed, expanding the first product, for each k one finds exactly \binom nk terms having exactly k factors of the form x_i^{2n} and n-k factors of the form y_i^{2n}. For any j with 1\le j\le n, precisely \binom{n-1}{k-1} of these terms contain the factor x_j^{2n}, so, by the arithmetic mean-geometric mean inequality, the sum of these terms is at least \binom nk times the product of the x_i to the power 2k times the product of the y_i to the power 2(n-k) (since 2n\binom{n-1}{k-1}/\binom nk=2k), which is precisely the corresponding term when we expand the n-th power we are subtracting in P.
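For instance, when n=2 and k=1, the terms of the expansion of (x_1^4+y_1^4)(x_2^4+y_2^4) having one factor of each kind are x_1^4y_2^4 and y_1^4x_2^4, and the arithmetic mean-geometric mean inequality gives

\displaystyle x_1^4y_2^4+y_1^4x_2^4\ge 2x_1^2x_2^2y_1^2y_2^2,

which is exactly the middle term in the expansion of (x_1^2x_2^2+y_1^2y_2^2)^2; the remaining terms x_1^4x_2^4 and y_1^4y_2^4 match the other two terms of that square.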

However, it is not clear to me that P is a sum of squares of polynomials. The problem with trying to apply the representation I displayed in example 1 is similar to the problem one finds when trying to apply that representation to the polynomial Q(x,y,z)=z^6+x^4y^2+x^2y^4-3x^2y^2z^2: since, for example, x^4y^2 is not a perfect sixth power, the representation from example 1 gives us Q as a sum of squares of functions involving cube roots rather than polynomials.

Update (Dec. 22, 2012): Péter E. Frenkel and Péter Horváth have shown (June, 2012) that Minkowski’s inequality can indeed be proved by showing that P as above is a sum of squares; see Minkowski’s inequality and sums of squares, on the arXiv. (Their paper has appeared in Cent. Eur. J. Math. 12 (2014), no. 3, 510–516.)

For Young’s inequality with rational exponents, see Purely “algebraic” proof of Young’s inequality (December, 2012), at Math.Stackexchange.

Update (Sep. 6, 2017): For more on sum-of-squares proofs of the AM-GM inequality, see Kazumasa Fujiwara and Tohru Ozawa, Identities for the Difference between the Arithmetic and Geometric Means, Int. Journal of Math. Analysis, Vol. 8, 2014, no. 31, 1525–1542. See also these posts at MathOverflow and Math.Stackexchange: 1, 2, 3.


Responses to 275 - Positive polynomials


  1. Andrés Navas says:

    Many thanks for all of this. I was thinking about this problem and, actually, I produced another expression for P(x_1,\dots,x_n) as a sum of squares which is more complicated than Hurwitz’s. So, thanks again for your link to Hurwitz’s work.

    Now, concerning your question at the end, there is a well-known algorithm that determines whether a positive polynomial is a sum of squares. Have you already tested it, say for n = 3,4,5?

  2. I agree that this may be confusing. I have slightly edited that example, hopefully for the better. Thanks!

    • Aaron Meurer says:

      IMHO it would be clearer to just use x_i^2 throughout (except in the very first equation). What really confused me was that the sum of squares was not really a sum of squares (until I figured out that you replaced x_i^2 with x_i).


