Caïus Wojcik and Luca Zamboni recently posted a paper on the arXiv solving an interesting problem in combinatorics on words.
Monochromatic factorisations of words and periodicity.
Caïus Wojcik, Luca Q. Zamboni.
I had recently learned of the problem through another paper by Zamboni and a collaborator,
Aldo de Luca, Luca Q. Zamboni
On prefixal factorizations of words.
European J. Combin. 52 (2016), part A, 59–73.
It is a nice result and I think it may be enjoyable to work through the argument here. Everything that follows is either straightforward, standard, or comes from these papers.
1. The problem
To make the post reasonably self-contained, I begin by recalling some conventions, not all of which we need here.
By an alphabet we simply mean a set , whose elements we refer to as letters. A word is a sequence of letters from where is a (not necessarily non-empty, not necessarily proper) initial segment of . If we denote for all , it is customary to write the word simply as
and we will follow the convention. The empty word is typically denoted by or . By we denote the collection of all finite words from , and By we denote the length of the word (that is, the size of the domain of the corresponding function).
We define concatenation of words in the obvious way, and denote by the word resulting from concatenating the words and , where . This operation is associative, and we extend it as well to infinite concatenations.
If a word can be written as the concatenation of words
we refer to the right-hand side as a factorization of . If and is non-empty, we say that is a prefix of . Similarly, if is non-empty, it is a suffix of . By for we denote the word resulting form concatenating copies of . Similarly, is the result of concatenating infinitely many copies.
By a coloring we mean here a function where is a finite set of “colors”.
Apparently the problem I want to discuss was first considered by T.C. Brown around 2006 and, independently, by Zamboni around 2010. It is a question about monochromatic factorizations of infinite words. To motivate it, let me begin with a cute observation.
Fact. Suppose is an infinite word, and is a coloring. There is then a factorization
where all the have the same color.
Proof. The proof is a straightforward application of Ramsey’s theorem: Assign to the coloring of the set of -sized subsets of given by whenever . Ramsey’s theorem ensures that there is an infinite set such that all with have the same color. We can then take and for all .
In the fact above, the word was arbitrary, and we obtained a monochromatic factorization of a suffix of . However, without additional assumptions, it is not possible to improve this to a monochromatic factorization of itself. For example, consider the word and the coloring
If nothing else, it follows that if is an infinite word that admits a monochromatic factorization for any coloring, then the first letter of must appear infinitely often. The same idea shows that each letter in must appear infinitely often.
Actually, significantly more should be true. For example, consider the word
and the coloring
This example shows that in fact any such must admit a prefixal factorization, a factorization
where each is a prefix of .
Problem. Characterize those infinite words with the property P that given any coloring, there is a monochromatic factorization of .
The above shows that any word with property P admits a prefixal factorization. But it is easy to see that this is not enough. For a simple example, consider
Consider the coloring where if is not a prefix of , 1, and otherwise. If
is a monochromatic factorization of , then so and each must be a prefix of of length at least . But it is easy to see that admits no such factorization: For any , consider the first appearance in of and note that none of the first zeros can be the beginning of an , so for some we must have and since , in fact , but this string only appears once in , so actually . Since was arbitrary, we are done.
Here is a more interesting example: The Thue-Morse word
was defined by Axel Thue in 1906 and became known through the work of Marston Morse in the 1920s. It is defined as the limit (in the natural sense) of the sequence of finite words given by and where, for , is the result of replacing each letter in with .
This word admits a prefixal factorization, namely
To see this, note that the sequence of letters of can be defined recursively by , and . To see this, note in turn that the sequence given by this recursive definition actually satisfies that is the parity of the number of s in the binary expansion of from which the recursive description above as the limit of the should be clear. The relevance of this observation is that no three consecutive letters in can be the same (since for all ), and from this it is clear that can be factored using only the words , , and .
But it is not so straightforward as in the previous example to check whether admits a factorization into prefixes of length larger than .
Instead, I recall a basic property of and use it to exhibit an explicit coloring for which admits no monochromatic factorization.