Hints¶

Green (3a): Converting NFAs to DFAs¶

Q1¶

Q1

QHintVideo

Consider the following NFA.

NFA diagram

What is its transition table?
Use the subset construction method to convert it to an equivalent DFA.
Draw the state diagram of the resulting DFA.
Which of the following sets of NFA states is not a state of the resulting DFA?
- {A, C}
- {B, C}
- {A}
- {B}

Transition table:

\[ \begin{array}{|rl||c|c|} \hline && 0 & 1 \\ \hline\hline \to & A & \{A\} & \{B\}\\ & B & \{A, C\} & \emptyset\\ * & C & ....... & .......\\ \hline \end{array} \]

Conversion into an equivalent DFA using the subset construction method.

\[ \begin{array}{|rc||c|c|} \hline && 0 & 1 \\ \hline\hline \to & \{A\} & ....... & ....... \\ & \{B\} & \{A, C\} & \emptyset\\ * & \{A, C\} & ....... & ....... \\ & \emptyset & \emptyset & ....... \\ & ....... & ....... & ....... \\ \hline \end{array} \]

State diagram of the resulting DFA.
Which of those sets is not in your table after performing the subset construction method?

Q2¶

Q2

QHint

Use the subset construction method to convert the following NFA to an equivalent DFA.

\[ \begin{array}{|rl||c|c|} \hline && 0 & 1 \\ \hline \hline \to & A & \{A, B\} & \{C\} \\ & B & \{D\} & \{B\} \\ & C & \{C\} & \{E\} \\ * & D & \emptyset & \{D\} \\ * & E & \emptyset & \{E\} \\ \hline \end{array} \]

NFA diagram

The resulting DFA has 13 states, 8 of which are accepting states.

\[ \begin{array}{|rr||c|c|} \hline & & 0 & 1 \\ \hline\hline \to & \{A\} & ....... & ....... \\ & \{A,B\} & ....... & ....... \\ & \{C\} & ....... & ....... \\ * & \{A,B, D\} & ....... & ....... \\ & \{B,C\} & ....... & ....... \\ * & \{ E\} & ....... & ....... \\ * & \{B,C, D\} & ....... & ....... \\ * & \{C, D\} & ....... & ....... \\ * & \{B, E\} & ....... & ....... \\ & \emptyset & \emptyset & \emptyset \\ * & \{B, D, E\} & ....... & ....... \\ * & \{ D, E\} & ....... & ....... \\ * & \{ D\} & ....... & ....... \\ \hline \end{array} \]

Q3¶

FAs for strings ending in aaa

QHint

Design an NFA that accepts strings over $\{\texttt{a},\texttt{b}\}$ which end with aaa, then convert it to an equivalent DFA.

NFA diagram

Q4¶

FAs for strings with 1 in the second position from the end

QHint

Design an NFA that accepts strings over $\{0,1\}$ which have 1 in the second position from the end (e.g. 0010, 1011,10, etc.), then convert it to an equivalent DFA.

NFA diagram

Green (3b): Regular Expressions¶

Q1¶

Making sense of RegEx's

QHintsVideo

Complete the descriptions of the following regular expressions (write in the shaded boxes). Assume the alphabet $\Sigma=\{0,1\}$ in all the parts.

Operators precendence

Recall that, unless brackets are used to explicitly denote precedence, the operators precendence for the regular operations is: star, concatenation, then union.

$01+ 10 =\{\_\_\_,\_\_\_\}$
$({\varepsilon}+ 0)({\varepsilon}+ 1)=\{{\varepsilon},0,1,\_\_\_\}$
$(0+ \varepsilon)1^* = 01^* + 1^* = \{w\mid w \text{ has at most ___ and is at the start of $w$}\}$
$\Sigma^*0 = \{w\mid w \text{ ends with a ___} \} = \{w\mid w \text{ respresents an ___ number in binary}\}$
$0^*10^* = \{w\mid w \text{ contains a single ___ }\}$
$\Sigma^*0\Sigma^*=\{w\mid w \text{ has at least one ___} \}$
$\Sigma^*001\Sigma^*=\{w\mid w \text{ contains the string ___ as a substring}\}$
$\Sigma^*000^*\Sigma^*=\{w\mid w \text{ cotains at least ___ consective ___'s}\}$
$(011^*)^* = \{w\mid \text{ every ___ in $w$ is followed by at least one ___}\}$
$\Sigma\Sigma + \Sigma\Sigma\Sigma = \Sigma\Sigma({\varepsilon}+\Sigma) = \{w\mid \text{ the length of $w$ is exactly ___ or ___}\}$
$(\Sigma\Sigma)^*=\{w\mid w \text{ is a string of ___ length}\}$
$(\Sigma\Sigma\Sigma)^*=\{w\mid \text{ the length of $w$ is a multiple of ___}\}$
$0\Sigma^*0+ 1\Sigma^*1+ 0 + 1=\{w\mid w \text{ starts and ends with the ___ symbol}\}$

In what follows:

the dots "......." below are an informal notation to indicate omitted parts of a string.
the square $\square$ is an informal notation for "any symbol".

These are used to help you "see" the patterns encoded by the RegEx's.

$01+10$, i.e. first string $01$ or second string $10$ only.
$({\varepsilon}+ 0)({\varepsilon}+ 1)$. Select one from $\{\varepsilon, 0\}$ then concatenate it (glue it) to one from $\{\varepsilon, 1\}$.
$(0+ {\varepsilon})1^*=01^*+ 1^*$. Think: $\{01.......1, 1.......1\}$
$\Sigma^*0$. Think: $......0$
$0^*10^*$. Think: $0.......010.......0$
$\Sigma^*0\Sigma^*$. Think: $.......0.......$
$\Sigma^*001\Sigma^*$. Think: $.......001.......$
$\Sigma^*000^*\Sigma^*$. Think: $.......00...0.......$
$(011^*)^*=(011.......1)^*$$. Think: "011.......1" repeated, i.e. 011.......1011.....1...................011.....1
$\Sigma\Sigma + \Sigma\Sigma\Sigma$. Two symbols $\square\square$ or three symbols $\square\square\square\}$ only.
$(\Sigma\Sigma)^*$. $\varepsilon$ or $\square\square$ or $\square\square\square\square$ or $\square\square\square\square\square\square$ etc. The lengths are: 0,2,4,6,...
$(\Sigma\Sigma\Sigma)^*$. $\varepsilon$ or $\square\square\square$ or $\square\square\square\square\square\square$ or $\square\square\square\square\square\square\square\square\square$ etc. The lengths are: 0,3,6,9,...
$0\Sigma^*0+ 1\Sigma^*1+ 0 + 1$. Think $\{0.......0, 1.......1, 0, 1\}$.

Q2¶

RegEx's and ε-NFAs

QHintsVideo

Produce a regular expression for the following languages over the alphabet $\{\texttt{a}, \texttt{b}\}$

The language $L_\texttt{a}$ of all strings that start with a.
The language $L_\texttt{b}$ of all strings that end with b.
The union $L_\texttt{a}\cup L_\texttt{b}$.
The concatenation $L_\texttt{a}L_\texttt{b}$.
$L=(L_\texttt{a}\cup L_\texttt{b})L_\texttt{a}L_\texttt{b}$.
The star closure of $L$: $L^*$.

Produce ${\varepsilon}$-NFAs for each of the above using the constructions shown in the lecture for the union, concatenation, and star.

$L_\texttt{a}$. Think: $\texttt{a}.......$
$L_\texttt{b}$. Think: $.......\texttt{b}$
$L_\texttt{a}\cup L_\texttt{b}$. Think: $\texttt{a}.......$ or $.......\texttt{b}$
$L_\texttt{a}L_\texttt{b}$. Think: $\texttt{a}..............\texttt{b}$
$L$. Think: $(\texttt{a}....... + .......\texttt{b})\texttt{a}.......\texttt{b}$
$L^*=\left((L_\texttt{a}\cup L_\texttt{b})L_\texttt{a}L_\texttt{b}\right)^*$. Think: $\Big((\texttt{a}....... + .......\texttt{b})\texttt{a}.......\texttt{b}\Big)^*$

Q3¶

A second look at RegEx's

QExamplesVideo

For each of the following RegEx's, give two strings that are members of the corresponding language, and two strings that are not. (A total of 4 strings for each part.)

Assume the alphabet $\Sigma=\{a,b\}$ in all the parts.

$a^*b^*$
$a(ba)^*b$
$a^*+ b^*$
$(aaa)^*$
$\Sigma^*a\Sigma^*b\Sigma^*a\Sigma^*$
$aba+ bab$
$({\varepsilon}+ a)b$
$(a+ ba+ bb)\Sigma^*$

RegEx	Examples	Non examples
$a^b^$	a, b	ba, aba.
$a(ba)^*b$	ab, abab	aab, aabb
$a^+ b^$	${\varepsilon}$, a	ab, ba
$(aaa)^*$	${\varepsilon}$, aaa	a, aa
$\Sigma^a\Sigma^b\Sigma^a\Sigma^$	aba, aaba	a, ab
$aba+ bab$	aba, bab	a, ab
$({\varepsilon}+ a)b = b + ab$	b, ab	a, ba
$(a+ ba+ bb)\Sigma^*$	a, ba	${\varepsilon}$, b

Q4¶

Designing RegEx's

QHintsVideo

Give regular expressions generating the languages below over $\Sigma=\{0,1\}$

$\{w\mid w \text{ begins with 1 and ends with a 0} \}$
$\{w\mid w \text{ contains at least three 1's} \}$
$\{w\mid w \text{ contains the substring 0101} \}$
$\{w\mid w \text{ has length at least 3 and its third symbol is 0} \}$
$\{w\mid w \text{ starts with 0 and has odd length, or starts with 1 and has even length} \}$
$\{w\mid w \text{ does not contain the substring 110} \}$
$\{w\mid \text{ the length of $w$ is at most 5} \}$
$\{w\mid w \text{ is any string except 11 and 111} \}$
$\{w\mid \text{ every odd position of $w$ is 1} \}$
$\{w\mid w \text{ contains at least two 0's and at most one 1} \}$
$\{{\varepsilon}, 0\}$
$\{w\mid w \text{ contains an even number of 0's, or contains exactly two 1's} \}$
The empty set.
All strings except the empty string.

Please note that multiple solutions are possible. If yours looks different then check if they are equivalent.

$1 ....... 0$
$....... 1 ....... 1 ....... 1 .......$
$....... 0101 ....... $
$..0 ....... $
The first few cases for the odd length strings are:

\[0, 0(\Sigma\Sigma), 0(\Sigma\Sigma)(\Sigma\Sigma), \ldots\]

and the first few cases for the even length strings are:

\[1\Sigma, 1\Sigma(\Sigma\Sigma), 1\Sigma(\Sigma\Sigma)(\Sigma\Sigma), \ldots\]

From these two case we infer the general RegEx by taking their union.
This is challenging -- You can create a DFA first then use the GNFA algorithm to get the required RegEx.

Once you have the expression, can you see why it works?

Another way to get to the solution is as follows:

If {\tt 110} is not a substring of a string $w$ then there are no consecutive {\tt 1}'s other than possibly at the end of $w$.

So $w$ can be written as $w=u\ell$ where $u$ has no consecutive {\tt 1}'s and $\ell$ is made exclusively of zero or more {\tt 1}'s.

$u$ can be taken to be $\tt (0 + 10)^*$ or $\tt 0^*(100^*)^*$ (or any other equivalent RegEx), while $\ell={\tt 1^*}$.

Reflection:

$(0 + 10)^*$ gives you two types of "bricks" (0 and 10) to build your string by concatenation (gluing of the bricks), and these will never produce $11$. Once you have $11$ then you are not allowed any 0's, hence the part: $1^*$.

Tip

Just because you may have found this difficult it does not mean the rest are even harder; in fact the next one is straight forward!
Strings of lengths: $0,1,2,3,4,5$.

Informally, $\varepsilon$ or $\square$ or $\square\square$ or $\square\square\square$ or $\square\square\square\square$ or $\square\square\square\square\square$.
This is obtained by listing all the acceptable strings of length $\leq 3$ other than $11$ and $111$, then adding the option for strings of length $\geq 4$.

${\varepsilon}+ 0+1 + \_\_+\_\_+\_\_ + \_\_\_+\_\_\_+\_\_\_+\_\_\_+\_\_\_+\_\_\_+\_\_\_ + \Sigma^4\Sigma^*$.
Think: ${\varepsilon}, 1, 1\square, 1\square1, 1\square1\square, 1\square1\square1, 1\square1\square1,\square, ...$
The condition "at most one 1" means we have two cases:
- No 1 at all. This gives a string of 0s only, and since we must have "at least two 0s" then the RegEx is: $000^*$ (or any equivalent).
- Exactly one 1. We must have "at least two 0s", and these can either:
  - both be before this 1: $001$
  - both be after this 1: $100$
  - be around this 1: $010$
  These three possibilities give us $001 + 010 + 100$, and then accounting for any other 0s we get the Regex: $0^*(001 + 010 + 100)0^*$
Simple enumeration of the finite set $\{\varepsilon, 0\}$.
The language is a union because of the "or" in the condition:

\[\{w\mid w \text{ contains an even number of 0's} \} \;{\textcolor{red}{\cup}}\; \{w\mid w \text{ contains exactly two 1's} \}\]

Hint: $(\ldots \square\ldots \square\ldots)^*$, where "..." contains no $\square$s, produces strings with 0 or 2 or 4 or 6 etc. $\square$s.
One of the three base cases in the definition of regular expressions.
Think: $\Sigma.......$

This is also written as $\Sigma^+$ as a shorthand (i.e. strings of length $\geq 1$).

Be careful: that is a "superscript plus", not the "union plus"; e.g. ${1^++0^*}$.

Green (3c): NFA to GNFA to RegEx¶

Q1¶

GNFA algorithm

QReminder: GNFA AlgorithmHintsVideo

Use the GNFA algorithm to find regular expressions for the languages recognized by the following NFAs.

Can you interpret the results?

NFA diagram

We can convert any NFA into a GNFA as follows:

Add a new start state with an ${\varepsilon}$-transition to the NFA's start state.
Add a new accept state with ${\varepsilon}$-transitions from the NFA's accept states.
If a transition has multiple labels then replace them with their union. (e.g. $a,b\to a+ b$.)

Once the GNFA is produced, start removing states, one at a time, and "patch" any affected transitions using regular expressions (RegEx's). Repeat until only two states (initial and accept) remain. The RegEx on the only remaining transition is the equivalent RegEx to the NFA.

Convert to GNFA, then remove A, then B.

GNFA diagram

etc.

Convert to GNFA. Remove B, C, A, D.

GNFA diagram

etc.

Convert to GNFA. Remove D, A, B, S.

GNFA diagram

etc.

Convert to GNFA. Remove B, C, A, D.

GNFA diagram

etc.

Convert to GNFA. Remove 2, 3, 4, 5, 1, 0.

GNFA diagram

etc.

Q2¶

RegEx's for similar NFAs

QHintsVideo

Give RegEx's for the languages recognized by the following similar NFAs, using the GNFA algorithm. What do you notice?

NFA diagram

The RegEx for an NFA, whose accepting states include the accepting states of another, also includes its RegEx as a sub-expression.

1 accepting state: $( \_\_ + \_\_ ) ( \_\_ + \_\_ )^*$. Call this $R$.

3 accepting states: $R ( 1 + 0 + \varepsilon) + 1 + 0$

4 accepting states: $R ( 1 + 0 + \varepsilon) + 1 + 0 + \varepsilon$

If all the states of an NFA are accepting then it does not necessarily mean it accepts all possible strings.

Q3¶

Cycles

Q3HintsVideo

Let $L_n$ be the language of all strings over $\Sigma=\{1\}$ that have length a multiple of $n$, where $n$ is a natural number (i.e. $n\in\mathbb{N}=\{1,2,3,\ldots\}$).

Design an NFA to recognize $L_3$, and another to recognize $L_5$.
Write down RegEx's for $L_3$ and $L_5$, then for their union $L_3\cup L_5$.
Construct the ${\varepsilon}$-NFA that recognizes $L_3\cup L_5$.
Use the GNFA algorithm to obtain a RegEx for $L_3\cup L_5$.

Think of cycles/circles, and one accept state which is also the start state.
$L_3\colon (___)^* $

$L_5\colon (\_\_\_\_\_)^*$

$L_3\cup L_5\colon (\_\_\_)^* + (\_\_\_\_\_)^*$.
Add a new start state. Connect it (with empty transitions) to the two start states of the $L_3$ and $L_5$ automata.
Follow the the GNFA algorithm steps.

Orange¶

Q1¶

Strings whose length is a multiple of a fixed number

Q1Hints

Let $B_n = \{\texttt{a}^{m}\mid m \text{ is a multiple of $n$}\} = \{\texttt{a}^{kn}\mid k\in\mathbb{Z}_{\geq 0}\}$ over the alphabet $\Sigma=\{\texttt{a}\}$.

Show that the language $B_n$ is regular for any $n\in\mathbb{N}$ by writing a regular expression for it.

Outline the description of an NFA that can recognize it.

RegEx: ${\texttt{a}\ldots\texttt{a}}$ ($n$ symbols) repeated zero or more times (star).

The corresponding NFA is a generalization of the case for $L_3$ and $L_5$ above. It would have $n$ states in a circular shape, with the start state being the only accepting state.

Q2¶

Closure of regular languages under reversal of strings

QHint

For any string $s=\texttt{s}_1\texttt{s}_2\ldots \texttt{s}_n$, where $\texttt{s}_i$ are symbols from the alphabet, the reverse of $s$ is the string $s$ written in reverse order: $s^R=\texttt{s}_n\texttt{s}_{n-1}\ldots \texttt{s}_1$.

Given an NFA or RegEx that recognizes a language $A$, describe how you can transform this NFA/RegEx to recognize the language $A^R=\{w^R\mid w\in A\}$, i.e. the language that contains all the strings from $A$ but in the reverse order.

Basic idea: reverse the arrows in the state diagram, but need to address the case with many accepting states... etc.

Test your ideas on the languages given by the RegEx's: ($\Sigma=\{a,b\}$)

\[a, b, aa, ab, aa+bb,ab+bb, a^*b^*, \Sigma^*a, a\Sigma^*, ab^*a^*b, (ab)^*, (aa+bb)^*, (ab+bb)^*.\]

Q3¶

Subset construction method applied to an e-NFA

QHint

Convert the following ${\varepsilon}$-NFA to an equivalent DFA.

e-NFA diagram

Take extra care with the epsilon transition between 1 and 3. Every time you make it to 1, you "slide" to 3 too, so you are in two places at the same time. You may informally think of these two states as one state "1-3".

Q4¶

Technical cases with RegEx's

QHints

Show that

$1^*\emptyset=\emptyset$

(Concatenating the empty RegEx $\emptyset$ to any RegEx yields the empty RegEx again)
$\emptyset^*=\{{\varepsilon}\}$

You may find it helpful to construct the corresponding ${\varepsilon}$-NFAs.

Check the lecture slides.

Q5¶

This seems impossible at a first look...!

QHint

Let $\Sigma=\{0,1\}$ and let

\[D = \{w\mid w \text{ contains an equal number of the occurrences of the substrings 01 and 10}\}.\]

As an example, $101\in D$ but $1010\not\in D$.

Show that $D$ is a regular language (by producing an NFA for it, or otherwise).

Does this hold for $\{w\mid w \text{ contains an equal number of 0's and 1's}\}$?

Can you see why? What is the difference!?

How about the language $\{w\mid w \text{ contains a non-equal number of 0's and 1's}\}$?

We will see next week that the language defined by $a^n b^n$ for $n\geq 0$ is not regular. The language $D$ seems to defy this and suggests that if we set $a=01$ and $b=10$ then we can build an NFA for it... the trick is that a string like $101$ is in $D$ because overlap is allowed, while this particular string cannot be decoded using $a=01$ and $b=10$.

Optional¶

Q1¶

Q1

Regular Expressions in practiceN.B.

Suppose we have a programming language where comments are delimited by @= and =@. Let $L$ be the language of all valid delimited comment strings, i.e. a member of $L$ must begin with @= and end with =@.

Use the page at https://regex101.com/r/Ez1kqp/3 and try the following RegEx searches:

Programming	Mathematical notation	Interpreation
@	@	Just the symbol @
@=	@=	Just the string @=
.	$\Sigma$	Any symbol from the alphabet
.*	$\Sigma^*$	Any string over the alphabet
@.*	@$\Sigma^*$	............................
@.\|.@	@$\Sigma^$+$\Sigma^$@	............................
@.*@	@$\Sigma^*$@	............................
@=.*=@	@=$\Sigma^*$=@	............................

Interpret the results for the last 4 searches. Try alternative searches to develop your understanding of how RegEx is used in practice. What is the correct RegEx for $L$?

Please note that the regular expressions used in programming languages are more general than RegEx's defined for Regular Languages.

See for example https://en.wikipedia.org/wiki/RE2_(software)

Q2¶

Q2

Extend your class for simulating DFAs and NFAs from the last lab to convert a given NFA into an equivalent DFA or to a RegEx.

RegEx	Examples	Non examples
\(a^b^\)	a, b	ba, aba.
\(a(ba)^*b\)	ab, abab	aab, aabb
\(a^+ b^\)	\({\varepsilon}\), a	ab, ba
\((aaa)^*\)	\({\varepsilon}\), aaa	a, aa
\(\Sigma^a\Sigma^b\Sigma^a\Sigma^\)	aba, aaba	a, ab
\(aba+ bab\)	aba, bab	a, ab
\(({\varepsilon}+ a)b = b + ab\)	b, ab	a, ba
\((a+ ba+ bb)\Sigma^*\)	a, ba	\({\varepsilon}\), b

Programming	Mathematical notation	Interpreation
@	@	Just the symbol @
@=	@=	Just the string @=
.	\(\Sigma\)	Any symbol from the alphabet
.*	\(\Sigma^*\)	Any string over the alphabet
@.*	@\(\Sigma^*\)	............................
@.\|.@	@\(\Sigma^\)+\(\Sigma^\)@	............................
@.*@	@\(\Sigma^*\)@	............................
@=.*=@	@=\(\Sigma^*\)=@	............................