Caïus Wojcik and Luca Zamboni recently posted a paper on the arXiv solving an interesting problem in combinatorics on words.
http://arxiv.org/abs/1608.03519
Monochromatic factorisations of words and periodicity.
Caïus Wojcik, Luca Q. Zamboni.
I had recently learned of the problem through another paper by Zamboni and a collaborator,
MR3425965
Aldo de Luca, Luca Q. Zamboni
On prefixal factorizations of words.
European J. Combin. 52 (2016), part A, 59–73.
It is a nice result and I think it may be enjoyable to work through the argument here. Everything that follows is either straightforward, standard, or comes from these papers.
1. The problem
To make the post reasonably self-contained, I begin by recalling some conventions, not all of which we need here.
By an alphabet we simply mean a set , whose elements we refer to as letters. A word
is a sequence
of letters from
where
is a (not necessarily non-empty, not necessarily proper) initial segment of
. If we denote
for all
, it is customary to write the word simply as
and we will follow the convention. The empty word is typically denoted by or
. By
we denote the collection of all finite words from
, and
By
we denote the length of the word
(that is, the size of the domain of the corresponding function).
We define concatenation of words in the obvious way, and denote by the word resulting from concatenating the words
and
, where
. This operation is associative, and we extend it as well to infinite concatenations.
If a word can be written as the concatenation of words
we refer to the right-hand side as a factorization of . If
and
is non-empty, we say that
is a prefix of
. Similarly, if
is non-empty, it is a suffix of
. By
for
we denote the word resulting form concatenating
copies of
. Similarly,
is the result of concatenating infinitely many copies.
By a coloring we mean here a function where
is a finite set of “colors”.
Apparently the problem I want to discuss was first considered by T.C. Brown around 2006 and, independently, by Zamboni around 2010. It is a question about monochromatic factorizations of infinite words. To motivate it, let me begin with a cute observation.
Fact. Suppose
is an infinite word, and
is a coloring. There is then a factorization
![]()
where all the
have the same color.
Proof. The proof is a straightforward application of Ramsey’s theorem: Assign to the coloring of the set
of
-sized subsets of
given by
whenever
. Ramsey’s theorem ensures that there is an infinite set
such that all
with
have the same color. We can then take
and
for all
.
In the fact above, the word was arbitrary, and we obtained a monochromatic factorization of a suffix of
. However, without additional assumptions, it is not possible to improve this to a monochromatic factorization of
itself. For example, consider the word
and the coloring
If nothing else, it follows that if is an infinite word that admits a monochromatic factorization for any coloring, then the first letter of
must appear infinitely often. The same idea shows that each letter in
must appear infinitely often.
Actually, significantly more should be true. For example, consider the word
and the coloring
This example shows that in fact any such must admit a prefixal factorization, a factorization
where each is a prefix of
.
Problem. Characterize those infinite words
with the property P that given any coloring, there is a monochromatic factorization of
.
The above shows that any word with property P admits a prefixal factorization. But it is easy to see that this is not enough. For a simple example, consider
Consider the coloring where
if
is not a prefix of
,
1, and
otherwise. If
is a monochromatic factorization of , then
so
and each
must be a prefix of
of length at least
. But it is easy to see that
admits no such factorization: For any
, consider the first appearance in
of
and note that none of the first
zeros can be the beginning of an
, so for some
we must have
and since
, in fact
, but this string only appears once in
, so actually
. Since
was arbitrary, we are done.
Here is a more interesting example: The Thue-Morse word
was defined by Axel Thue in 1906 and became known through the work of Marston Morse in the 1920s. It is defined as the limit (in the natural sense) of the sequence of finite words given by
and
where, for
,
is the result of replacing each letter
in
with
.
This word admits a prefixal factorization, namely
To see this, note that the sequence of letters of can be defined recursively by
,
and
. To see this, note in turn that the sequence given by this recursive definition actually satisfies that
is the parity of the number of
s in the binary expansion of
from which the recursive description above as the limit of the
should be clear. The relevance of this observation is that no three consecutive letters in
can be the same (since
for all
), and from this it is clear that
can be factored using only the words
,
, and
.
But it is not so straightforward as in the previous example to check whether admits a factorization into prefixes of length larger than
.
Instead, I recall a basic property of and use it to exhibit an explicit coloring for which
admits no monochromatic factorization.