I had recently learned of the problem through another paper by Zamboni and a collaborator,

MR3425965

Aldo de Luca, Luca Q. Zamboni

On prefixal factorizations of words.

European J. Combin. **52** (2016), part A, 59–73.

It is a nice result and I think it may be enjoyable to work through the argument here. Everything that follows is either straightforward, standard, or comes from these papers.

**1. The problem**

To make the post reasonably self-contained, I begin by recalling some conventions, not all of which we need here.

By an *alphabet* we simply mean a set , whose elements we refer to as *letters*. A *word* is a sequence of letters from where is a (not necessarily non-empty, not necessarily proper) initial segment of . If we denote for all , it is customary to write the word simply as

and we will follow the convention. The empty word is typically denoted by or . By we denote the collection of all finite words from , and By we denote the *length* of the word (that is, the size of the domain of the corresponding function).

We define concatenation of words in the obvious way, and denote by the word resulting from concatenating the words and , where . This operation is associative, and we extend it as well to infinite concatenations.

If a word can be written as the concatenation of words

we refer to the right-hand side as a *factorization* of . If and is non-empty, we say that is a *prefix* of . Similarly, if is non-empty, it is a *suffix* of . By for we denote the word resulting form concatenating copies of . Similarly, is the result of concatenating infinitely many copies.

By a *coloring* we mean here a function where is a *finite* set of “colors”.

Apparently the problem I want to discuss was first considered by T.C. Brown around 2006 and, independently, by Zamboni around 2010. It is a question about monochromatic factorizations of infinite words. To motivate it, let me begin with a cute observation.

**Fact.** *Suppose is an infinite word, and is a coloring. There is then a factorization *

* *

*where all the have the same color.*

**Proof.** The proof is a straightforward application of Ramsey’s theorem: Assign to the coloring of the set of -sized subsets of given by whenever . Ramsey’s theorem ensures that there is an infinite set such that all with have the same color. We can then take and for all .

In the fact above, the word was arbitrary, and we obtained a monochromatic factorization of a suffix of . However, without additional assumptions, it is not possible to improve this to a monochromatic factorization of itself. For example, consider the word and the coloring

If nothing else, it follows that if is an infinite word that admits a monochromatic factorization for any coloring, then the first letter of must appear infinitely often. The same idea shows that each letter in must appear infinitely often.

Actually, significantly more should be true. For example, consider the word

and the coloring

This example shows that in fact any such must admit a *prefixal* factorization, a factorization

where each is a prefix of .

**Problem.** Characterize those infinite words with the property **P** that given any coloring, there is a monochromatic factorization of .

The above shows that any word with property **P** admits a prefixal factorization. But it is easy to see that this is not enough. For a simple example, consider

Consider the coloring where if is not a prefix of , 1, and otherwise. If

is a monochromatic factorization of , then so and each must be a prefix of of length at least . But it is easy to see that admits no such factorization: For any , consider the first appearance in of and note that none of the first zeros can be the beginning of an , so for some we must have and since , in fact , but this string only appears once in , so actually . Since was arbitrary, we are done.

Here is a more interesting example: The Thue-Morse word

was defined by Axel Thue in 1906 and became known through the work of Marston Morse in the 1920s. It is defined as the limit (in the natural sense) of the sequence of finite words given by and where, for , is the result of replacing each letter in with .

This word admits a prefixal factorization, namely

To see this, note that the sequence of letters of can be defined recursively by , and . To see this, note in turn that the sequence given by this recursive definition actually satisfies that is the parity of the number of s in the binary expansion of from which the recursive description above as the limit of the should be clear. The relevance of this observation is that no three consecutive letters in can be the same (since for all ), and from this it is clear that can be factored using only the words , , and .

But it is not so straightforward as in the previous example to check whether admits a factorization into prefixes of length larger than .

Instead, I recall a basic property of and use it to exhibit an explicit coloring for which admits no monochromatic factorization.

Read the rest of this entry »