Monochromatic colorings

August 20, 2016

Caïus Wojcik and Luca Zamboni recently posted a paper on the arXiv solving an interesting problem in combinatorics on words:
Monochromatic factorisations of words and periodicity.
Caïus Wojcik, Luca Q. Zamboni.

I had recently learned of the problem through another paper by Zamboni and a collaborator,

Aldo de Luca, Luca Q. Zamboni
On prefixal factorizations of words.
European J. Combin. 52 (2016), part A, 59–73.

It is a nice result and I think it may be enjoyable to work through the argument here. Everything that follows is either straightforward, standard, or comes from these papers.

1. The problem

To make the post reasonably self-contained, I begin by recalling some conventions, not all of which we need here.

By an alphabet we simply mean a set A, whose elements we refer to as letters. A word w is a sequence w:N\to A of letters from A where N is a (not necessarily non-empty, not necessarily proper) initial segment of \mathbb N. If we denote w_i=w(i) for all i\in N, it is customary to write the word simply as

w=w_0w_1\dots

and we will follow the convention. The empty word is typically denoted by \Lambda or \varepsilon. By A^* we denote the collection of all finite words from A, and A^+=A^*\setminus \{\varepsilon\}. By |x| we denote the length of the word x (that is, the size of the domain of the corresponding function).

We define concatenation of words in the obvious way, and denote by x_0x_1 the word resulting from concatenating the words x_0 and x_1, where x_0\in A^*. This operation is associative, and we extend it as well to infinite concatenations.

If a word w can be written as the concatenation of words x_0,x_1,\dots,

w=x_0x_1\dots,

we refer to the right-hand side as a factorization of w. If w=xy and x is non-empty, we say that x is a prefix of w. Similarly, if y is non-empty, it is a suffix of w. By x^n for n\in\mathbb N we denote the word resulting from concatenating n copies of x. Similarly, x^{\mathbb N} is the result of concatenating infinitely many copies.

By a coloring we mean here a function c:A^+\to C where C is a finite set of “colors”.

Apparently the problem I want to discuss was first considered by T.C. Brown around 2006 and, independently, by Zamboni around 2010. It is a question about monochromatic factorizations of infinite words. To motivate it, let me begin with a cute observation.

Fact. Suppose w=w_0w_1\dots is an infinite word, and c is a coloring. There is then a factorization

w=px_0x_1\dots

where all the x_i\in A^+ have the same color.

Proof. The proof is a straightforward application of Ramsey’s theorem: Assign to c the coloring of the set [\mathbb N]^2 of 2-sized subsets of \mathbb N given by d(\{i,j\})=c(w_iw_{i+1}\dots w_{j-1}) whenever i<j. Ramsey’s theorem ensures that there is an infinite set I=\{n_0<n_1<\dots\} such that all w_{n_i}w_{n_i+1}\dots w_{n_j-1} with i<j have the same color. We can then take p=w_0\dots w_{n_0-1} and x_i=w_{n_i}\dots w_{n_{i+1}-1} for all i. \Box

In the fact above, the word w was arbitrary, and we obtained a monochromatic factorization of a suffix of w. However, without additional assumptions, it is not possible to improve this to a monochromatic factorization of w itself. For example, consider the word w=01^{\mathbb N} and the coloring

c(x)=\left\{\begin{array}{cl}0&\mbox{if }0\mbox{ appears in }x,\\ 1&\mbox{otherwise.}\end{array}\right.

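Indeed, in any factorization w=x_0x_1\dots the factor x_0 contains the letter 0, so c(x_0)=0, while x_1 consists only of 1s, so c(x_1)=1. If nothing else, it follows that if w is an infinite word that admits a monochromatic factorization for every coloring, then the first letter of w must appear infinitely often. The same idea shows that each letter in w must appear infinitely often.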

Actually, significantly more should be true. For example, consider the word

w=010110111\dots 01^n0 1^{n+1}\dots,

and the coloring

c(x)=\left\{\begin{array}{cl}0&\mbox{if }x\mbox{ is a prefix of }w,\\1&\mbox{otherwise.}\end{array}\right.

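This coloring makes sense for any infinite word w, and the first factor x_0 of any factorization of w is a prefix of w. The example therefore shows that in fact any such w must admit a prefixal factorization, that is, a factorization

w=x_0x_1\dots
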
where each x_i is a prefix of w.

Problem. Characterize those infinite words w with the property P that given any coloring, there is a monochromatic factorization of w.

The above shows that any word with property P admits a prefixal factorization. But it is easy to see that this is not enough. For a simple example, consider


Consider the coloring c where c(x)=0 if x is not a prefix of w, c(0)=1, and c(x)=2 otherwise. If

w=x_0x_1\dots

is a monochromatic factorization of w, then x_0=01\dots so c(x_0)=2 and each x_i must be a prefix of w of length at least 2. But it is easy to see that w admits no such factorization: For any n>2, consider the first appearance in w of 0^{n+1} and note that none of the first n zeros can be the beginning of an x_i, so for some j we must have x_j=01\dots 10^n and since n>2, in fact x_j=01\dots 10^n10^n, but this string only appears once in w, so actually j=0. Since n was arbitrary, we are done.

Here is a more interesting example: The Thue-Morse word

t=0110100110010110\dots

was defined by Axel Thue in 1906 and became known through the work of Marston Morse in the 1920s. It is defined as the limit (in the natural sense) of the sequence x_0,x_1,\dots of finite words given by x_0=0 and x_{n+1}=x_n\bar{x_n} where, for x\in\{0,1\}^*, \bar x is the result of replacing each letter i in x with 1-i.

This word admits a prefixal factorization, namely

t=(011)(01)(0)(011)(0)(01)(011)\dots

To see this, note that the sequence of letters of t can be defined recursively by t_0=0, t_{2n}=t_n and t_{2n+1}=1-t_n. Indeed, the sequence given by this recursive definition satisfies that t_n is the parity of the number of 1s in the binary expansion of n, and from this its agreement with the description of t as the limit of the x_n should be clear. The relevance of this observation is that no three consecutive letters in t can be the same (since t_{2n+1}=1-t_{2n} for all n), and from this it is clear that t can be factored using only the words 0, 01, and 011, each of which is a prefix of t.
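
The three descriptions of t and the factorization into the words 0, 01, and 011 can be checked by machine on a finite prefix. The following Python sketch is only a sanity check, not a proof; the function names are hypothetical, chosen for illustration, and the choice of 10 doubling steps (1024 letters) is arbitrary.

```python
def thue_morse_by_doubling(steps):
    """x_0 = 0 and x_{n+1} = x_n followed by its letterwise complement."""
    x = "0"
    for _ in range(steps):
        x += "".join("1" if ch == "0" else "0" for ch in x)
    return x


def thue_morse_by_recursion(length):
    """t_0 = 0, t_{2n} = t_n, t_{2n+1} = 1 - t_n."""
    t = [0] * length
    for k in range(1, length):
        t[k] = t[k // 2] if k % 2 == 0 else 1 - t[k // 2]
    return "".join(map(str, t))


def thue_morse_by_parity(length):
    """t_n = parity of the number of 1s in the binary expansion of n."""
    return "".join(str(bin(k).count("1") % 2) for k in range(length))


def factor_into_blocks(word):
    """Cut a word that starts with 0 just before each 0, so every
    block is a 0 followed by the 1s that precede the next 0."""
    blocks = []
    for ch in word:
        if ch == "0":
            blocks.append("0")
        else:
            blocks[-1] += "1"
    return blocks


t = thue_morse_by_doubling(10)  # a prefix of t with 1024 letters
assert t == thue_morse_by_recursion(len(t)) == thue_morse_by_parity(len(t))

factors = factor_into_blocks(t)
assert "".join(factors) == t
# Only the prefixes 0, 01 and 011 occur as blocks.
assert set(factors) <= {"0", "01", "011"}
print(factors[:7])  # ['011', '01', '0', '011', '0', '01', '011']
```

Cutting before each 0 is the only way to factor t into the words 0, 01, and 011, since each of them contains exactly one 0, in first position.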

But it is not as straightforward as in the previous example to check whether t admits a factorization into prefixes of length larger than 1.

Instead, I recall a basic property of t and use it to exhibit an explicit coloring for which t admits no monochromatic factorization.


502 – Thue sequences

September 1, 2009

This is a “hint” for exercise 3.4. An infinite 2-free 3-sequence is sometimes called a Thue sequence, since the number theorist Axel Thue was the first to study them. There are several ways of generating Thue sequences. I mention three:

  1.  One could define a map \sigma as in the case of 3-free 2-sequences. Now set \sigma(0)=012, \sigma(1)=02, and \sigma(2)=1, and once again consider the iterates \sigma^n(0).
  2. Thue’s original example was \sigma(0)=01201, \sigma(1)=020121, and \sigma(2)=0212021.
  3. Another approach consists of taking the transformation \sigma giving the 3-free 2-sequence, so \sigma(0)=01 and \sigma(1)=10. Now define q_n for n\ge1 to be the string obtained from \sigma^{2n}(0) by counting ones between consecutive zeros. For example, \sigma^2(0)=0110 so q_1=2, while \sigma^4(0)=0110100110010110, so q_2=2102012. Check that each q_n is a 3-sequence. Now check that the \sigma^{2n}(0) contain no string of the form ixixi where i\in 2 and x\in 2^{<{\mathbb N}}, and conclude from this that the strings q_n are 2-free.
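
Before attempting proofs, it may help to experiment with the three constructions above. The following Python sketch assumes that "2-free" means containing no non-empty block of the form xx, and it only tests finitely many iterates, so it is evidence rather than an argument. All function and variable names are hypothetical, chosen for illustration.

```python
def iterate(sigma, start, n):
    """Apply the letter-to-word map sigma to `start` n times."""
    word = start
    for _ in range(n):
        word = "".join(sigma[ch] for ch in word)
    return word


def has_square(word):
    """True if word contains a non-empty block of the form xx."""
    n = len(word)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if word[i:i + half] == word[i + half:i + 2 * half]:
                return True
    return False


def ones_between_zeros(word):
    """Record, as a string of digits, the number of 1s between
    each pair of consecutive 0s of a binary word."""
    pieces = word.split("0")
    # pieces[0] precedes the first 0 and pieces[-1] follows the last 0.
    return "".join(str(len(p)) for p in pieces[1:-1])


sigma_1 = {"0": "012", "1": "02", "2": "1"}                # item 1
sigma_2 = {"0": "01201", "1": "020121", "2": "0212021"}    # item 2
tm = {"0": "01", "1": "10"}                                # item 3

for n in range(1, 8):
    assert not has_square(iterate(sigma_1, "0", n))
for n in range(1, 4):
    assert not has_square(iterate(sigma_2, "0", n))
for n in range(1, 5):
    q_n = ones_between_zeros(iterate(tm, "0", 2 * n))
    assert set(q_n) <= set("012")   # q_n is a 3-sequence
    assert not has_square(q_n)      # q_n is 2-free

print(ones_between_zeros(iterate(tm, "0", 4)))  # 2102012
```

The square test is a brute-force search over all starting positions and block lengths, which is adequate for words of a few hundred letters but not for long iterates.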

If you need extra time, you have until Friday, September 11, to work on this question.