Lemma8.1.1
\[L\al = M\al\backslash\al.\]
Let \(\al\) denote a binary string, and let \(L\) denote the set of binary of binary strings that do not contain \(\al\) as a substring. How many strings are there in \(L\) with length \(n\)?
Let \(M\) denote the binary strings that contain exactly one copy of \(\al\), as a suffix. If \(\eps\) denotes the empty string, then \[\eps+L(0+1) = L+M.\tag{8.1.1}\label{eq-eL01}\] Also \(L\al\) clearly contains \(M\) as a subset, but for example if \(\al=111\), then \[L\al =M+M1+M11.\] We develop a precise description of \(L\al\). Let \(\al\backslash\be\) denote the set of suffixes \(\be_1\) such that \(\be_0\be_1=\be\) and \(\be_0\) is a suffix of \(\al\). For example if \[\al =100110,\quad \be=1011,\] then \[\al\backslash\be = \{11\},\quad \al\backslash\al = \{\eps,0110\}\]
\[L\al = M\al\backslash\al.\]
Note that \(\al\backslash\al\) is a finite language, i.e., a finite set of strings.
If \(N\) is a set of binary strings, let \(N(t)\) denote its generating series. This is what we get when we substitute \(t\) for \(0\) and \(1\), a string of length \(\ell\) maps to \(t^\ell\) and the language maps to its generating series. Now \eqref{eq-eL01} yields \[1+2t L(t) = L(t)+M(t)\] and, if \(D=\al\backslash\al\) and \(|\al|\) denotes the length of \(\al\), the lemma yields that\[ L(t)t^{|\al|} = M(t) D(t).\]
\[L(t) =\left(1-2t+\frac{t^{|\al|}}{D(t)}\right)^{\!-1}.\]
Let's use this to compute some numbers. We first set up the ring of formal power series over the rationals.
If \(\al=111\), then \(D(t)=1+t+t^2\) and \(L(t)\) is
As an exercise, write a program which takes a binary string \(\al\) and computes the generating series for \(\al\backslash\al\). (The string arrives as a Python string, there is no reason why your procedure should not work for strings over an arbitrary alphabet.)
The square root of the series \(L(2t)\) has non-negative integer coefficients.
Does this hold for other forbidden substrings?
For a second variant, suppose \(\cF=\{\seq\al1d\}\) is a set of binary strings. Let \(L\) be the set of strings that contain no element of \(\cF\) and let \(M_i\) denote the set of strings that contain one copy of \(\al_i\), as a suffix, but do not contain any other copies of strings in \(\cF\). If \(|\cF|=2\), we have equations \begin{align} \eps+L(0+1) &= L +M_1 +M_2\notag\\ L\al_1 &= M_1(\al_1\backslash\al_1) +M_2(\al_1\backslash\al_2)\notag\\ L\al_2 &= M_1(\al_2\backslash\al_1) +M_2(\al_2\backslash\al_2).\notag \end{align} Write a program that, given \(\al_1\) and \(\al_2\), computes the generating series for \(L\), \(M_1\) and \(M_2\).