Barna’s inequality

(Disclaimer: This is planned to be part of a larger set of notes on the dynamics of Newton’s method.)

We present Barna’s proof of the following result: Suppose that $f$ is a polynomial all of whose roots are real and $N$ is is associated Newton’s function. If $r$ is a root of $f$ and its immediate basin of attraction $I_r=(a,b)$ is bounded, then $|N'(a)|>1$ and $|N'(b)|>1$.

1. Introduction

If $f$ is a differentiable function, we define $N=N_f$, Newton’s function for $f$, by

$\displaystyle N_f(x)=x-\frac{f(x)}{f'(x)}$

for all values of $x$ such that $f'(x)\ne 0$. Under reasonable assumptions on $f$, $N$ can be extended (by continuity) so that it is also defined at those points $x$ such that $f(x)=f'(x)=0$. Note that if $f'(x)\ne 0$, then $N'(x)$ exists, and we have

$\displaystyle N'(x)=\frac{f(x)f''(x)}{(f'(x))^2}.$

The function $N$ is of course the function obtained through the application of the familiar Newton’s method for approximating roots of $f$. Recall that (under reasonable assumptions on $f$) the method starts with a guess $x_0$ for a root of $f$, and refines this guess successively, with each new guess $x_{n+1}$ being obtained from the previous one $x_n$ by replacing $f$ with its linear approximation at $x=x_n$, that is, with the line $y-x_n=f'(x_n)(x-x_n)$, and letting $x_{n+1}$ be the value of $x$ where this approximation equals $0$, that is, $x_{n+1}=N_f(x_n)$.

One can say much about what makes a function “reasonable”, but for the purposes of this note it suffices that polynomials certainly fall under this category. One easily verifies that if ($f$ is a real valued polynomial and) $f(x^*)=0$ then there is an open neighborhood $I$ of $x^*$ such that if $x_0\in I$, then for all $n$ we have that $x_n$ is defined, $x_n\in I$, and $\lim_{n\to \infty}x_n=x^*$. This is perhaps easiest to see if we assume that $x^*$ is a simple zero, that is, $f(x^*)=0\ne f'(x^*)$. In this case $N'(x^*)=0$ and, by continuity, $N(x)$ and $N'(x)$ are defined, and $|N'(x)|<1$, for all $x$ in a neighborhood $(x^*-\epsilon,x^*+\epsilon)$ of $x^*$. But if $x$ is in this neighborhood, by the mean value theorem we have that $|N(x)-x^*|=|N(x)-N(x^*)|<|x-x^*|$. It follows that $N(x)$ is also in this neighborhood, and that successive applications of $N$ result in a sequence that converges to $x^*$.

The largest interval $I$ about $x^*$ such that for any point $x_0$ in $I$ the sequence $x_0,x_1,\dots,x_{n+1}=N(x_n),\dots$ is well defined and converges to $x^*$, we call the immediate basin of attraction of $x^*$. One can verify that $I$ is open, that $N(I)=I$, and that if $I$ is bounded, say $I=(a,b)$, then $N(a)=b$ and $N(b)=a$, so that $\{a,b\}$ is a cycle of period $2$ for $N$. By the observations above, we know that if $c$ is whichever of $a$ and $b$ is closest to $x^*$, then $|N'(c)|\ge 1$.

The study of the dynamics of Newton’s method is very interesting. Nowadays, most work on the subject is part of the more general topic of complex dynamics, but the study of the dynamical behavior on $\mathbb R$ deserves to be better known. In this note I present a result of Barna, who in the late 1950s and early 1960s began such a study. The result is an inequality that improves the observation at the end of the previous paragraph. This inequality is useful in studying the chaotic behavior of the method for polynomials of degree at least $4$, see for instance [SU84].

2. Barna inequality

Theorem (Barna [Bar61]). Suppose that $f$ is a polynomial all of whose roots are real. If $r$ is a root of $f$ and its immediate basin of attraction $I_r=(a,b)$ is bounded, then $|N'(a)|>1$ and $|N'(b)|>1$.

Barna’s argument is completely elementary, but it is not easy to locate it in modern literature.

Proof.

Let $f(x)=\alpha(x-r_1)^{n_1}\dots(x-r_k)^{n_k}$ where $\alpha\ne0$, $r_1, and the $n_i$ are positive integers. Suppose that $f$ has degree $n$, so that $\displaystyle \sum_{j=1}^k n_j=n$. Letting

$\displaystyle \Phi(x)=\frac{f'(x)}{f(x)},$

we see that $\Phi(x)=1/(x-N(x))$ and $\displaystyle \Phi(x)=\sum_{j=1}^k\frac{n_j}{x-r_j}$. Suppose $r=r_i$, $a$, and $b$ are as in the statement of the theorem, so that $N(a)=b$, $N(b)=a$, $a, and all other $r_j$ lie outside of ${}[a,b]$. Since $I_r=(a,b)$, in particular we have that $N'(a),N'(b)<0$, though we do not need to assume this (it will follow from the analysis below). Note that

$\displaystyle -\Phi(a)=\sum_{j=1}^k\frac{n_j}{r_j-a}=\frac1{b-a},\quad(1)$

and

$\displaystyle \Phi(b)=\sum_{j=1}^k\frac{n_j}{b-r_j}=\frac1{b-a}.\quad(2)$

Also,

$\displaystyle N'(x)=\left(x-\frac1{\Phi(x)}\right)'=1+\frac{\Phi'(x)}{(\Phi(x))^2}.$

We need to prove that $\displaystyle 1+\frac {\Phi'(a)}{(\Phi(a))^2}<-1$ or, equivalently,

$\displaystyle -\Phi'(a)>\frac2{(b-a)^2}.$

Similarly, we need to prove that

$\displaystyle -\Phi'(b)>\frac2{(b-a)^2}.$

Since $\displaystyle -\Phi'(x)=\sum_{j=1}^k\frac{n_j}{(x-r_j)^2}$, our task is to deduce from (1) and (2) that

$\displaystyle \sum_{j=1}^k\frac{n_j}{(a-r_j)^2}>\frac2{(b-a)^2},\quad(3)$

and

$\displaystyle \sum_{j=1}^k\frac{n_j}{(b-r_j)^2}>\frac2{(b-a)^2}.\quad(4)$

Let $\displaystyle d_j=\frac{a-r_j}{b-r_j}$, so that $d_j>0$ for $j\ne i$, and $d_i<0$ (recall that $r=r_i$). We have that $\displaystyle r_j=\frac{a-bd_j}{1-d_j}$ and therefore $\displaystyle r_j-a=\frac{d_j(a-b)}{1-d_j}$. Hence (1) is equivalent to

$\displaystyle \sum_{j=1}^k \frac{n_j(1-d_j)}{d_j}=-1,$

or

$\displaystyle \sum_{j=1}^k \frac{n_j}{d_j}=\sum_{j=1}^k n_j -1 =n-1,$

or

$\displaystyle \sum_{\substack{j=1\\ j\ne i}}^k\frac{n_j}{d_j}=n-1+\frac{n_i}{|d_i|}.\quad(5)$

Similarly, $\displaystyle b-r_j=\frac{b-a}{1-d_j}$, so that (2) is equivalent to

$\displaystyle \sum_{\substack{j=1\\ j\ne i}}^k n_jd_j=n-1+n_i|d_i|,\quad(6)$

while (3) and (4) are respectively equivalent to

$\displaystyle \sum_{j=1}^k\frac{n_j}{d_j^2}>n\quad(7)$

and

$\displaystyle \sum_{j=1}^k n_jd_j^2>n.\quad(8).$

Now we make use of the following lemma:

Lemma. Suppose that $n,p$ are positive integers with $n>p$, that $s_1,\dots,s_{n-p}$ and $t$ are positive reals, and that

$\displaystyle \sum_{j=1}^{n-p}s_j=n-1+pt$

and

$\displaystyle \sum_{j=1}^{n-p}\frac1{s_j}=n-1+\frac pt$

hold. It then follows that

$\displaystyle \sum_{j=1}^{n-p}s_j^2+pt^2>n.$

First we prove that the lemma gives us the result. Note first that, under the assumptions of the lemma, we also have that

$\displaystyle \sum_{j=1}^{n-p}\frac1{s_j^2}+\frac p{t^2}>n.$

To see this, simply apply the lemma with $1/s_1,\dots,1/s_{n-p}$ and $1/t$ in place of $s_1,\dots,s_{n-p}$ and $t$.

Now, if $p=n_i$, $t=|d_i|$, and for each $j$ with $1\le j\le k$ and $j\ne i$, we have that $s_l=d_j$ for precisely $n_j$ indices $l$, then the assumptions of the lemma correspond exactly to (5) and (6), while its conclusion is precisely (7), and the additional conclusion remarked in the paragraph above is precisely (8).

It remains to prove the lemma.

Proof.

We consider two cases: Suppose first that $t for all $j$. Then

$\displaystyle \sum_j(s_j-1)^2=\sum_{j}s_j^2-2\sum_{j}s_j + n-p =$ $\displaystyle \sum_{j} s_j^2-2(n-1+pt)+n-p,$

or

$\displaystyle \sum_{j}s_j^2=\sum_{j}(s_j-1)^2+2(n-1+pt)-n+p=$ $\displaystyle \sum_{j}s_j\left(s_j-2+\frac1{s_j}\right)+n+2pt+p-2.$

Using that $x+1/x\ge 2$ for all $x>0$, we conclude that

$\displaystyle \begin{array}{rcl} \displaystyle \sum_j s_j^2&\ge&\displaystyle \sum_{j}t\left(s_j-2+\frac1{s_j}\right)+n+2pt+p-2\\ &=&\displaystyle t(n-1+pt)-2t(n-p)+t\left(n-1+\frac{p}{t}\right)+\\ &&\displaystyle \hspace{5mm}n+2pt+p-2\\ &=& pt^2+t(4p-2)+n+2p-2\\ &>& n-pt^2, \end{array}$

as wanted.

If, on the other hand, $t\ge s_j$ for some $j$, we may as well assume that $t\ge s_1$ and, using the convexity of the function $g(x)=x^2$, we have that

$\displaystyle \sqrt{\frac{\sum_{j>1}s_j^2}{n-p-1}}\ge\frac{\sum_{j>1}s_j}{n-p-1}=\frac{n-1+pt-s_1}{n-p-1},$

so

$\displaystyle \begin{array}{rcl} \displaystyle \sum_{j>1}s_j^2&\ge&\displaystyle \frac{(n-1+pt-s_1)^2}{n-p-1}=\frac{(n-p-1+p+pt-s_1)^2}{n-p-1}\\ &=&\displaystyle n-p-1+2(p+pt-s_1)+\frac{(p+pt-s_1)^2}{n-p-1}\\ &>& n+p-1+2(pt-s_1)\ge n\\ &>&n-pt^2, \end{array}$

and the result also follows in this case. $\Box$

This completes the proof. $\Box$

References

[Bar61] Béla Barna. Über die Divergenzpunkte des Newtonschen Verfahrens zur Bestimmung von Wurzeln algebraischer Gleichungen. III. Publ. Math. Debrecen, 8:193–207, 1961. MR0135224 (24 #B1274).

[SU84] Donald G. Saari and John B. Urenko. Newton’s method, circle maps, and chaotic motion. Amer. Math. Monthly, 91(1):3–17, 1984. MR0729188 (85a:58060).