Caïus Wojcik and Luca Zamboni recently posted a paper on the arXiv solving an interesting problem in combinatorics on words.
http://arxiv.org/abs/1608.03519
Monochromatic factorisations of words and periodicity.
Caïus Wojcik, Luca Q. Zamboni.
I had recently learned of the problem through another paper by Zamboni and a collaborator,
MR3425965
Aldo de Luca, Luca Q. Zamboni
On prefixal factorizations of words.
European J. Combin. 52 (2016), part A, 59–73.
It is a nice result and I think it may be enjoyable to work through the argument here. Everything that follows is either straightforward, standard, or comes from these papers.
1. The problem
To make the post reasonably self-contained, I begin by recalling some conventions, not all of which we need here.
By an alphabet we simply mean a set , whose elements we refer to as letters. A word
is a sequence
of letters from
where
is a (not necessarily non-empty, not necessarily proper) initial segment of
. If we denote
for all
, it is customary to write the word simply as
and we will follow the convention. The empty word is typically denoted by or
. By
we denote the collection of all finite words from
, and
By
we denote the length of the word
(that is, the size of the domain of the corresponding function).
We define concatenation of words in the obvious way, and denote by the word resulting from concatenating the words
and
, where
. This operation is associative, and we extend it as well to infinite concatenations.
If a word can be written as the concatenation of words
we refer to the right-hand side as a factorization of . If
and
is non-empty, we say that
is a prefix of
. Similarly, if
is non-empty, it is a suffix of
. By
for
we denote the word resulting form concatenating
copies of
. Similarly,
is the result of concatenating infinitely many copies.
By a coloring we mean here a function where
is a finite set of “colors”.
Apparently the problem I want to discuss was first considered by T.C. Brown around 2006 and, independently, by Zamboni around 2010. It is a question about monochromatic factorizations of infinite words. To motivate it, let me begin with a cute observation.
Fact. Suppose
is an infinite word, and
is a coloring. There is then a factorization
![]()
where all the
have the same color.
Proof. The proof is a straightforward application of Ramsey’s theorem: Assign to the coloring of the set
of
-sized subsets of
given by
whenever
. Ramsey’s theorem ensures that there is an infinite set
such that all
with
have the same color. We can then take
and
for all
.
In the fact above, the word was arbitrary, and we obtained a monochromatic factorization of a suffix of
. However, without additional assumptions, it is not possible to improve this to a monochromatic factorization of
itself. For example, consider the word
and the coloring
If nothing else, it follows that if is an infinite word that admits a monochromatic factorization for any coloring, then the first letter of
must appear infinitely often. The same idea shows that each letter in
must appear infinitely often.
Actually, significantly more should be true. For example, consider the word
and the coloring
This example shows that in fact any such must admit a prefixal factorization, a factorization
where each is a prefix of
.
Problem. Characterize those infinite words
with the property P that given any coloring, there is a monochromatic factorization of
.
The above shows that any word with property P admits a prefixal factorization. But it is easy to see that this is not enough. For a simple example, consider
Consider the coloring where
if
is not a prefix of
,
1, and
otherwise. If
is a monochromatic factorization of , then
so
and each
must be a prefix of
of length at least
. But it is easy to see that
admits no such factorization: For any
, consider the first appearance in
of
and note that none of the first
zeros can be the beginning of an
, so for some
we must have
and since
, in fact
, but this string only appears once in
, so actually
. Since
was arbitrary, we are done.
Here is a more interesting example: The Thue-Morse word
was defined by Axel Thue in 1906 and became known through the work of Marston Morse in the 1920s. It is defined as the limit (in the natural sense) of the sequence of finite words given by
and
where, for
,
is the result of replacing each letter
in
with
.
This word admits a prefixal factorization, namely
To see this, note that the sequence of letters of can be defined recursively by
,
and
. To see this, note in turn that the sequence given by this recursive definition actually satisfies that
is the parity of the number of
s in the binary expansion of
from which the recursive description above as the limit of the
should be clear. The relevance of this observation is that no three consecutive letters in
can be the same (since
for all
), and from this it is clear that
can be factored using only the words
,
, and
.
But it is not so straightforward as in the previous example to check whether admits a factorization into prefixes of length larger than
.
Instead, I recall a basic property of and use it to exhibit an explicit coloring for which
admits no monochromatic factorization.
The basic property is the following:
Lemma.
contains no overlaps, that is, no words of the form
where
is a letter. In particular, no non-empty factor of
is a cube.
Proof. Proceed by contradiction. Suppose is an overlap present in
say
and, letting
, then
for
, so
Assume further that is minimal with this property.
It follows that must be odd. In fact, if
is even, consider the word
If is even, then (by the recursive definition of the
) this is also the word
which is an overlap that is a factor of
with
, contradicting the minimality of
.
Similarly, if , then (again, by the recursive definition of the
) the word
is an overlap in of the form
(where
for
) with
, again contradicting the minimality of
.
We reach a contradiction by arguing that cannot be odd either. First, we note that
. Otherwise, for some
with
, we see that
. This means that
(by the description of the
in terms of the binary expansion of
). This means that we must also have that
, but this is impossible, since
is odd, so
is even.
We proceed to rule out the two remaining cases. We already know that is not possible. So, suppose
. This means the overlap has the form
where
are also letters. By parity considerations, one of the two occurrences of
must have
for odd
, so
. Similarly,
and, we have a contradiction since, for the same reason, we must also have
.
Now that we know that admits no overlaps, it is easy to produce a coloring for which
admits no monochromatic factorization. Indeed, define
by setting
if
is not a prefix of
, and
(for
or
) if
is a prefix of
that ends in
.
Suppose that is a monochromatic factorization of
. Each
is then a prefix of
, and all end in the same latter, say
. Now note that, for some
, we must have
. Since
and
are prefixes of
, it follows that
is also a prefix of
. And, since
, this means that
is defined, and ends in
, so we find in
the word
, that is,
where
. This is an overlap, contradicting what we just showed.
This seems a good moment to point out the obvious remark that there are words with property P: Any constant word has this property. In fact, any periodic word (any word of the form ) has this property.
The main result of the Wojcik-Zamboni paper is that these are all the examples.
Theorem. A word has property P iff it is periodic.
The rest of this post is devoted to the proof of this theorem.
2. Ensuring periodicity
What Wojcik and Zamboni prove in their note is actually stronger than stated. They show that it is enough to consider colorings with range : If
admits monochromatic factorizations for all such colorings, then
is periodic. That in itself is worthy of mention. All prior attempts on the problem that I am aware of took advantage of the fact that the colorings could use any (finite) number of colors. The idea was to exploit the following easy observation:
Observation. Suppose
has property P. Let
be a property of finite words, and suppose that there is a coloring
such that for any monochromatic factorization
of
with respect to
, at least one of the
has property
. There is then a coloring
such that any monochromatic factorization
of
with respect to
is also monochromatic with respect to $c$, and moreover all the
have property
.
Proof. Suppose takes the values
. Consider the coloring
where
if
and
does not have property
, and
if
and
has property
. Now note that by design any monochromatic factorization
of with respect to
is monochromatic with respect to
, and therefore at least one of the
has property
. By definition of
, this means that all
have property
.
(For instance, we may assume that all the factors in are prefixes, that every letter that ever appears in
appears in all of them, that they all end in the same letter and coincide on their first
symbols, that their lengths are all congruent modulo
, etc.)
We now proceed with the proof of the theorem.
Begin by fixing a linear ordering of
. Assume that
is not periodic. We describe a coloring
for which
admits no monochromatic factorization. Given distinct words
of the same length, write
if
precedes
lexicographically (with respect to the ordering
we fixed on
), that is if
and
, and
is least such that
, then in fact
For not necessarily distinct words of the same length,write
iff either
, or
. Finally, for any
, denote by
the prefix of
of length
. Clearly,
implies
for any prefixes
and
of
and
, respectively, of the same length. Similarly, if
and
are prefixes of
, respectively, with
of the same length, then
.
Definition. For
, set
if
or if
and
, where
. Also, set
if
or if
and
, where
.
Note that is well defined since if
and
, then either
, (which cannot be by assumption) or else
.
Fix a factorization We proceed to argue that it cannot be monochromatic with respect to
. The argument will be by contradiction. So, assume that all
have the same color. First note that we may as well assume that
for all
. Indeed, if this is not the case, replace the ordering
of
with its inverse
, and note that now
for all
.
We proceed to check that each is a prefix of
, and a bit more. Given
, if
, It is convenient to define
as the longest common prefix of
and
.
Lemma 1.
for all
.
Proof. We argue by induction on . Suppose
. By definition of
, this means that if
, then there are letters
in
and infinite words
with
and
. The key observation is that this implies that
is a prefix of
. Indeed, otherwise
is a prefix of
, while
so
and therefore
.
Since is a prefix of
(and therefore also of
), we can write
for some
, so that
. On the other hand,
and, since
is a prefix of
, then we must have
where
But
, so
implies in particular that
. But then
, since
From this we conclude that , so that
as wanted.
We are almost done. With notation as in the lemma, we have that and also
, so that
. It follows that
is a prefix of
. Otherwise,
is a prefix of
, which is in turn a prefix of
, so
is a prefix of
. Since
, this gives us that
so that
, and
, a contradiction.
Lemma 2. Define
for all
. Then
and
for all
.
Clearly, the two conclusions of Lemma 2 contradict one another, and we are done. It remains to prove Lemma 2.
Proof. With notation as above, for some
, so that
, and
. On the other hand,
.
The result follows.