Caïus Wojcik and Luca Zamboni recently posted a paper on the arXiv solving an interesting problem in combinatorics on words.
Monochromatic factorisations of words and periodicity.
Caïus Wojcik, Luca Q. Zamboni.
I had recently learned of the problem through another paper by Zamboni and a collaborator,
Aldo de Luca, Luca Q. Zamboni
On prefixal factorizations of words.
European J. Combin. 52 (2016), part A, 59–73.
It is a nice result and I think it may be enjoyable to work through the argument here. Everything that follows is either straightforward, standard, or comes from these papers.
1. The problem
To make the post reasonably self-contained, I begin by recalling some conventions, not all of which we need here.
By an alphabet we simply mean a set , whose elements we refer to as letters. A word is a sequence of letters from where is a (not necessarily non-empty, not necessarily proper) initial segment of . If we denote for all , it is customary to write the word simply as
and we will follow the convention. The empty word is typically denoted by or . By we denote the collection of all finite words from , and By we denote the length of the word (that is, the size of the domain of the corresponding function).
We define concatenation of words in the obvious way, and denote by the word resulting from concatenating the words and , where . This operation is associative, and we extend it as well to infinite concatenations.
If a word can be written as the concatenation of words
we refer to the right-hand side as a factorization of . If and is non-empty, we say that is a prefix of . Similarly, if is non-empty, it is a suffix of . By for we denote the word resulting form concatenating copies of . Similarly, is the result of concatenating infinitely many copies.
By a coloring we mean here a function where is a finite set of “colors”.
Apparently the problem I want to discuss was first considered by T.C. Brown around 2006 and, independently, by Zamboni around 2010. It is a question about monochromatic factorizations of infinite words. To motivate it, let me begin with a cute observation.
Fact. Suppose is an infinite word, and is a coloring. There is then a factorization
where all the have the same color.
Proof. The proof is a straightforward application of Ramsey’s theorem: Assign to the coloring of the set of -sized subsets of given by whenever . Ramsey’s theorem ensures that there is an infinite set such that all with have the same color. We can then take and for all .
In the fact above, the word was arbitrary, and we obtained a monochromatic factorization of a suffix of . However, without additional assumptions, it is not possible to improve this to a monochromatic factorization of itself. For example, consider the word and the coloring
If nothing else, it follows that if is an infinite word that admits a monochromatic factorization for any coloring, then the first letter of must appear infinitely often. The same idea shows that each letter in must appear infinitely often.
Actually, significantly more should be true. For example, consider the word
and the coloring
This example shows that in fact any such must admit a prefixal factorization, a factorization
where each is a prefix of .
Problem. Characterize those infinite words with the property P that given any coloring, there is a monochromatic factorization of .
The above shows that any word with property P admits a prefixal factorization. But it is easy to see that this is not enough. For a simple example, consider
Consider the coloring where if is not a prefix of , 1, and otherwise. If
is a monochromatic factorization of , then so and each must be a prefix of of length at least . But it is easy to see that admits no such factorization: For any , consider the first appearance in of and note that none of the first zeros can be the beginning of an , so for some we must have and since , in fact , but this string only appears once in , so actually . Since was arbitrary, we are done.
Here is a more interesting example: The Thue-Morse word
was defined by Axel Thue in 1906 and became known through the work of Marston Morse in the 1920s. It is defined as the limit (in the natural sense) of the sequence of finite words given by and where, for , is the result of replacing each letter in with .
This word admits a prefixal factorization, namely
To see this, note that the sequence of letters of can be defined recursively by , and . To see this, note in turn that the sequence given by this recursive definition actually satisfies that is the parity of the number of s in the binary expansion of from which the recursive description above as the limit of the should be clear. The relevance of this observation is that no three consecutive letters in can be the same (since for all ), and from this it is clear that can be factored using only the words , , and .
But it is not so straightforward as in the previous example to check whether admits a factorization into prefixes of length larger than .
Instead, I recall a basic property of and use it to exhibit an explicit coloring for which admits no monochromatic factorization.
The basic property is the following:
Lemma. contains no overlaps, that is, no words of the form where is a letter. In particular, no non-empty factor of is a cube.
Proof. Proceed by contradiction. Suppose is an overlap present in say and, letting , then for , so
Assume further that is minimal with this property.
It follows that must be odd. In fact, if is even, consider the word
If is even, then (by the recursive definition of the ) this is also the word
which is an overlap that is a factor of with , contradicting the minimality of .
Similarly, if , then (again, by the recursive definition of the ) the word
is an overlap in of the form (where for ) with , again contradicting the minimality of .
We reach a contradiction by arguing that cannot be odd either. First, we note that . Otherwise, for some with , we see that . This means that (by the description of the in terms of the binary expansion of ). This means that we must also have that , but this is impossible, since is odd, so is even.
We proceed to rule out the two remaining cases. We already know that is not possible. So, suppose . This means the overlap has the form where are also letters. By parity considerations, one of the two occurrences of must have for odd , so . Similarly, and, we have a contradiction since, for the same reason, we must also have .
Now that we know that admits no overlaps, it is easy to produce a coloring for which admits no monochromatic factorization. Indeed, define by setting if is not a prefix of , and (for or ) if is a prefix of that ends in .
Suppose that is a monochromatic factorization of . Each is then a prefix of , and all end in the same latter, say . Now note that, for some , we must have . Since and are prefixes of , it follows that is also a prefix of . And, since , this means that is defined, and ends in , so we find in the word , that is, where . This is an overlap, contradicting what we just showed.
This seems a good moment to point out the obvious remark that there are words with property P: Any constant word has this property. In fact, any periodic word (any word of the form ) has this property.
The main result of the Wojcik-Zamboni paper is that these are all the examples.
Theorem. A word has property P iff it is periodic.
The rest of this post is devoted to the proof of this theorem.
2. Ensuring periodicity
What Wojcik and Zamboni prove in their note is actually stronger than stated. They show that it is enough to consider colorings with range : If admits monochromatic factorizations for all such colorings, then is periodic. That in itself is worthy of mention. All prior attempts on the problem that I am aware of took advantage of the fact that the colorings could use any (finite) number of colors. The idea was to exploit the following easy observation:
Observation. Suppose has property P. Let be a property of finite words, and suppose that there is a coloring such that for any monochromatic factorization
of with respect to , at least one of the has property . There is then a coloring such that any monochromatic factorization
of with respect to is also monochromatic with respect to $c$, and moreover all the have property .
Proof. Suppose takes the values . Consider the coloring where if and does not have property , and if and has property . Now note that by design any monochromatic factorization
of with respect to is monochromatic with respect to , and therefore at least one of the has property . By definition of , this means that all have property .
(For instance, we may assume that all the factors in are prefixes, that every letter that ever appears in appears in all of them, that they all end in the same letter and coincide on their first symbols, that their lengths are all congruent modulo , etc.)
We now proceed with the proof of the theorem.
Begin by fixing a linear ordering of . Assume that is not periodic. We describe a coloring for which admits no monochromatic factorization. Given distinct words of the same length, write if precedes lexicographically (with respect to the ordering we fixed on ), that is if and , and is least such that , then in fact For not necessarily distinct words of the same length,write iff either , or . Finally, for any , denote by the prefix of of length . Clearly, implies for any prefixes and of and , respectively, of the same length. Similarly, if and are prefixes of , respectively, with of the same length, then .
Definition. For , set if or if and , where . Also, set if or if and , where .
Note that is well defined since if and , then either , (which cannot be by assumption) or else .
Fix a factorization We proceed to argue that it cannot be monochromatic with respect to . The argument will be by contradiction. So, assume that all have the same color. First note that we may as well assume that for all . Indeed, if this is not the case, replace the ordering of with its inverse , and note that now for all .
We proceed to check that each is a prefix of , and a bit more. Given , if , It is convenient to define as the longest common prefix of and .
Lemma 1. for all .
Proof. We argue by induction on . Suppose . By definition of , this means that if , then there are letters in and infinite words with and . The key observation is that this implies that is a prefix of . Indeed, otherwise is a prefix of , while so and therefore .
Since is a prefix of (and therefore also of ), we can write for some , so that . On the other hand, and, since is a prefix of , then we must have where But , so implies in particular that . But then , since
From this we conclude that , so that as wanted.
We are almost done. With notation as in the lemma, we have that and also , so that . It follows that is a prefix of . Otherwise, is a prefix of , which is in turn a prefix of , so is a prefix of . Since , this gives us that so that , and , a contradiction.
Lemma 2. Define
for all . Then and for all .
Clearly, the two conclusions of Lemma 2 contradict one another, and we are done. It remains to prove Lemma 2.
Proof. With notation as above, for some , so that , and . On the other hand, .
The result follows.