Barna’s inequality

(Disclaimer: This is planned to be part of a larger set of notes on the dynamics of Newton’s method.)

We present Barna’s proof of the following result: Suppose that f is a polynomial all of whose roots are real and N is is associated Newton’s function. If r is a root of f and its immediate basin of attraction I_r=(a,b) is bounded, then |N'(a)|>1 and |N'(b)|>1.

1. Introduction

If f is a differentiable function, we define N=N_f, Newton’s function for f, by

\displaystyle N_f(x)=x-\frac{f(x)}{f'(x)}

for all values of x such that f'(x)\ne 0. Under reasonable assumptions on f, N can be extended (by continuity) so that it is also defined at those points x such that f(x)=f'(x)=0. Note that if f'(x)\ne 0, then N'(x) exists, and we have

\displaystyle N'(x)=\frac{f(x)f''(x)}{(f'(x))^2}.

The function N is of course the function obtained through the application of the familiar Newton’s method for approximating roots of f. Recall that (under reasonable assumptions on f) the method starts with a guess x_0 for a root of f, and refines this guess successively, with each new guess x_{n+1} being obtained from the previous one x_n by replacing f with its linear approximation at x=x_n, that is, with the line y-x_n=f'(x_n)(x-x_n), and letting x_{n+1} be the value of x where this approximation equals 0, that is, x_{n+1}=N_f(x_n).

One can say much about what makes a function “reasonable”, but for the purposes of this note it suffices that polynomials certainly fall under this category. One easily verifies that if (f is a real valued polynomial and) f(x^*)=0 then there is an open neighborhood I of x^* such that if x_0\in I, then for all n we have that x_n is defined, x_n\in I, and \lim_{n\to \infty}x_n=x^*. This is perhaps easiest to see if we assume that x^* is a simple zero, that is, f(x^*)=0\ne f'(x^*). In this case N'(x^*)=0 and, by continuity, N(x) and N'(x) are defined, and |N'(x)|<1, for all x in a neighborhood (x^*-\epsilon,x^*+\epsilon) of x^*. But if x is in this neighborhood, by the mean value theorem we have that |N(x)-x^*|=|N(x)-N(x^*)|<|x-x^*|. It follows that N(x) is also in this neighborhood, and that successive applications of N result in a sequence that converges to x^*.

The largest interval I about x^* such that for any point x_0 in I the sequence x_0,x_1,\dots,x_{n+1}=N(x_n),\dots is well defined and converges to x^*, we call the immediate basin of attraction of x^*. One can verify that I is open, that N(I)=I, and that if I is bounded, say I=(a,b), then N(a)=b and N(b)=a, so that \{a,b\} is a cycle of period 2 for N. By the observations above, we know that if c is whichever of a and b is closest to x^*, then |N'(c)|\ge 1.

The study of the dynamics of Newton’s method is very interesting. Nowadays, most work on the subject is part of the more general topic of complex dynamics, but the study of the dynamical behavior on \mathbb R deserves to be better known. In this note I present a result of Barna, who in the late 1950s and early 1960s began such a study. The result is an inequality that improves the observation at the end of the previous paragraph. This inequality is useful in studying the chaotic behavior of the method for polynomials of degree at least 4, see for instance [SU84].

2. Barna inequality

Theorem (Barna [Bar61]). Suppose that f is a polynomial all of whose roots are real. If r is a root of f and its immediate basin of attraction I_r=(a,b) is bounded, then |N'(a)|>1 and |N'(b)|>1.

Barna’s argument is completely elementary, but it is not easy to locate it in modern literature.

Proof. 

Let f(x)=\alpha(x-r_1)^{n_1}\dots(x-r_k)^{n_k} where \alpha\ne0, r_1<r_2<\dots<r_k, and the n_i are positive integers. Suppose that f has degree n, so that \displaystyle \sum_{j=1}^k n_j=n. Letting

\displaystyle \Phi(x)=\frac{f'(x)}{f(x)},

we see that \Phi(x)=1/(x-N(x)) and \displaystyle \Phi(x)=\sum_{j=1}^k\frac{n_j}{x-r_j}. Suppose r=r_i, a, and b are as in the statement of the theorem, so that N(a)=b, N(b)=a, a<r<b, and all other r_j lie outside of {}[a,b]. Since I_r=(a,b), in particular we have that N'(a),N'(b)<0, though we do not need to assume this (it will follow from the analysis below). Note that

\displaystyle -\Phi(a)=\sum_{j=1}^k\frac{n_j}{r_j-a}=\frac1{b-a},\quad(1)

and

\displaystyle \Phi(b)=\sum_{j=1}^k\frac{n_j}{b-r_j}=\frac1{b-a}.\quad(2)

Also,

\displaystyle N'(x)=\left(x-\frac1{\Phi(x)}\right)'=1+\frac{\Phi'(x)}{(\Phi(x))^2}.

We need to prove that \displaystyle 1+\frac {\Phi'(a)}{(\Phi(a))^2}<-1 or, equivalently,

\displaystyle -\Phi'(a)>\frac2{(b-a)^2}.

Similarly, we need to prove that

\displaystyle -\Phi'(b)>\frac2{(b-a)^2}.

Since \displaystyle -\Phi'(x)=\sum_{j=1}^k\frac{n_j}{(x-r_j)^2}, our task is to deduce from (1) and (2) that

\displaystyle \sum_{j=1}^k\frac{n_j}{(a-r_j)^2}>\frac2{(b-a)^2},\quad(3)

and

\displaystyle \sum_{j=1}^k\frac{n_j}{(b-r_j)^2}>\frac2{(b-a)^2}.\quad(4)

Let \displaystyle d_j=\frac{a-r_j}{b-r_j}, so that d_j>0 for j\ne i, and d_i<0 (recall that r=r_i). We have that \displaystyle r_j=\frac{a-bd_j}{1-d_j} and therefore \displaystyle r_j-a=\frac{d_j(a-b)}{1-d_j}. Hence (1) is equivalent to

\displaystyle \sum_{j=1}^k \frac{n_j(1-d_j)}{d_j}=-1,

or

\displaystyle \sum_{j=1}^k \frac{n_j}{d_j}=\sum_{j=1}^k n_j -1 =n-1,

or

\displaystyle \sum_{\substack{j=1\\ j\ne i}}^k\frac{n_j}{d_j}=n-1+\frac{n_i}{|d_i|}.\quad(5)

Similarly, \displaystyle b-r_j=\frac{b-a}{1-d_j}, so that (2) is equivalent to

\displaystyle \sum_{\substack{j=1\\ j\ne i}}^k n_jd_j=n-1+n_i|d_i|,\quad(6)

while (3) and (4) are respectively equivalent to

\displaystyle \sum_{j=1}^k\frac{n_j}{d_j^2}>n\quad(7)

and

\displaystyle \sum_{j=1}^k n_jd_j^2>n.\quad(8).

Now we make use of the following lemma:

Lemma. Suppose that n,p are positive integers with n>p, that s_1,\dots,s_{n-p} and t are positive reals, and that

\displaystyle \sum_{j=1}^{n-p}s_j=n-1+pt

and

\displaystyle \sum_{j=1}^{n-p}\frac1{s_j}=n-1+\frac pt

hold. It then follows that 

\displaystyle \sum_{j=1}^{n-p}s_j^2+pt^2>n.

First we prove that the lemma gives us the result. Note first that, under the assumptions of the lemma, we also have that

\displaystyle \sum_{j=1}^{n-p}\frac1{s_j^2}+\frac p{t^2}>n.

To see this, simply apply the lemma with 1/s_1,\dots,1/s_{n-p} and 1/t in place of s_1,\dots,s_{n-p} and t.

Now, if p=n_i, t=|d_i|, and for each j with 1\le j\le k and j\ne i, we have that s_l=d_j for precisely n_j indices l, then the assumptions of the lemma correspond exactly to (5) and (6), while its conclusion is precisely (7), and the additional conclusion remarked in the paragraph above is precisely (8).

It remains to prove the lemma.

Proof.

We consider two cases: Suppose first that t<s_j for all j. Then

\displaystyle \sum_j(s_j-1)^2=\sum_{j}s_j^2-2\sum_{j}s_j + n-p = \displaystyle \sum_{j} s_j^2-2(n-1+pt)+n-p,

or

\displaystyle \sum_{j}s_j^2=\sum_{j}(s_j-1)^2+2(n-1+pt)-n+p= \displaystyle \sum_{j}s_j\left(s_j-2+\frac1{s_j}\right)+n+2pt+p-2.

Using that x+1/x\ge 2 for all x>0, we conclude that

\displaystyle \begin{array}{rcl}  \displaystyle \sum_j s_j^2&\ge&\displaystyle \sum_{j}t\left(s_j-2+\frac1{s_j}\right)+n+2pt+p-2\\  &=&\displaystyle t(n-1+pt)-2t(n-p)+t\left(n-1+\frac{p}{t}\right)+\\ &&\displaystyle \hspace{5mm}n+2pt+p-2\\  &=& pt^2+t(4p-2)+n+2p-2\\  &>& n-pt^2,  \end{array}

as wanted.

If, on the other hand, t\ge s_j for some j, we may as well assume that t\ge s_1 and, using the convexity of the function g(x)=x^2, we have that

\displaystyle \sqrt{\frac{\sum_{j>1}s_j^2}{n-p-1}}\ge\frac{\sum_{j>1}s_j}{n-p-1}=\frac{n-1+pt-s_1}{n-p-1},

so

\displaystyle \begin{array}{rcl}  \displaystyle \sum_{j>1}s_j^2&\ge&\displaystyle \frac{(n-1+pt-s_1)^2}{n-p-1}=\frac{(n-p-1+p+pt-s_1)^2}{n-p-1}\\  &=&\displaystyle n-p-1+2(p+pt-s_1)+\frac{(p+pt-s_1)^2}{n-p-1}\\  &>& n+p-1+2(pt-s_1)\ge n\\  &>&n-pt^2,  \end{array}

and the result also follows in this case. \Box

This completes the proof. \Box

References

[Bar61] Béla Barna. Über die Divergenzpunkte des Newtonschen Verfahrens zur Bestimmung von Wurzeln algebraischer Gleichungen. III. Publ. Math. Debrecen, 8:193–207, 1961. MR0135224 (24 #B1274).

[SU84] Donald G. Saari and John B. Urenko. Newton’s method, circle maps, and chaotic motion. Amer. Math. Monthly, 91(1):3–17, 1984. MR0729188 (85a:58060).

Pdf version of this note.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: