Hints¶
Green (3a): Converting NFAs to DFAs¶
Q1¶
Q1
Consider the following NFA.
- What is its transition table?
- Use the subset construction method to convert it to an equivalent DFA.
- Draw the state diagram of the resulting DFA.
-
Which of the following sets of NFA states is not a state of the resulting DFA?
- {A, C}
- {B, C}
- {A}
- {B}
- Transition table:
- Conversion into an equivalent DFA using the subset construction method.
-
State diagram of the resulting DFA.
-
Which of those sets is not in your table after performing the subset construction method?
Q2¶
Q2
Use the subset construction method to convert the following NFA to an equivalent DFA.
The resulting DFA has 13 states, 8 of which are accepting states.
Q3¶
FAs for strings ending in aaa
Design an NFA that accepts strings over \(\{\texttt{a},\texttt{b}\}\)
which end with aaa
, then convert it to an equivalent DFA.
Q4¶
FAs for strings with 1
in the second position from the end
Design an NFA that accepts strings over \(\{0,1\}\)
which have 1
in the second position from the end (e.g. 0010
,
1011
,10
, etc.), then convert it to an equivalent DFA.
Green (3b): Regular Expressions¶
Q1¶
Making sense of RegEx's
Complete the descriptions of the following regular expressions (write in the shaded boxes). Assume the alphabet \(\Sigma=\{0,1\}\) in all the parts.
Operators precendence
Recall that, unless brackets are used to explicitly denote precedence, the operators precendence for the regular operations is: star, concatenation, then union.
- \(01+ 10 =\{\_\_\_,\_\_\_\}\)
- \(({\varepsilon}+ 0)({\varepsilon}+ 1)=\{{\varepsilon},0,1,\_\_\_\}\)
- \((0+ \varepsilon)1^* = 01^* + 1^* = \{w\mid w \text{ has at most ___ and is at the start of $w$}\}\)
- \(\Sigma^*0 = \{w\mid w \text{ ends with a ___} \} = \{w\mid w \text{ respresents an ___ number in binary}\}\)
- \(0^*10^* = \{w\mid w \text{ contains a single ___ }\}\)
- \(\Sigma^*0\Sigma^*=\{w\mid w \text{ has at least one ___} \}\)
- \(\Sigma^*001\Sigma^*=\{w\mid w \text{ contains the string ___ as a substring}\}\)
- \(\Sigma^*000^*\Sigma^*=\{w\mid w \text{ cotains at least ___ consective ___'s}\}\)
- \((011^*)^* = \{w\mid \text{ every ___ in $w$ is followed by at least one ___}\}\)
- \(\Sigma\Sigma + \Sigma\Sigma\Sigma = \Sigma\Sigma({\varepsilon}+\Sigma) = \{w\mid \text{ the length of $w$ is exactly ___ or ___}\}\)
- \((\Sigma\Sigma)^*=\{w\mid w \text{ is a string of ___ length}\}\)
- \((\Sigma\Sigma\Sigma)^*=\{w\mid \text{ the length of $w$ is a multiple of ___}\}\)
- \(0\Sigma^*0+ 1\Sigma^*1+ 0 + 1=\{w\mid w \text{ starts and ends with the ___ symbol}\}\)
In what follows:
- the dots "......." below are an informal notation to indicate omitted parts of a string.
- the square \(\square\) is an informal notation for "any symbol".
These are used to help you "see" the patterns encoded by the RegEx's.
- \(01+10\), i.e. first string \(01\) or second string \(10\) only.
- \(({\varepsilon}+ 0)({\varepsilon}+ 1)\). Select one from \(\{\varepsilon, 0\}\) then concatenate it (glue it) to one from \(\{\varepsilon, 1\}\).
- \((0+ {\varepsilon})1^*=01^*+ 1^*\). Think: \(\{01.......1, 1.......1\}\)
- \(\Sigma^*0\). Think: \(......0\)
- \(0^*10^*\). Think: \(0.......010.......0\)
- \(\Sigma^*0\Sigma^*\). Think: \(.......0.......\)
- \(\Sigma^*001\Sigma^*\). Think: \(.......001.......\)
- \(\Sigma^*000^*\Sigma^*\). Think: \(.......00...0.......\)
- \((011^*)^*=(011.......1)^*\)$. Think: "011.......1" repeated, i.e. 011.......1011.....1...................011.....1
- \(\Sigma\Sigma + \Sigma\Sigma\Sigma\). Two symbols \(\square\square\) or three symbols \(\square\square\square\}\) only.
- \((\Sigma\Sigma)^*\). \(\varepsilon\) or \(\square\square\) or \(\square\square\square\square\) or \(\square\square\square\square\square\square\) etc. The lengths are: 0,2,4,6,...
- \((\Sigma\Sigma\Sigma)^*\). \(\varepsilon\) or \(\square\square\square\) or \(\square\square\square\square\square\square\) or \(\square\square\square\square\square\square\square\square\square\) etc. The lengths are: 0,3,6,9,...
- \(0\Sigma^*0+ 1\Sigma^*1+ 0 + 1\). Think \(\{0.......0, 1.......1, 0, 1\}\).
Q2¶
RegEx's and ε-NFAs
Produce a regular expression for the following languages over the alphabet \(\{\texttt{a}, \texttt{b}\}\)
-
The language \(L_\texttt{a}\) of all strings that start with
a
. -
The language \(L_\texttt{b}\) of all strings that end with
b
. -
The union \(L_\texttt{a}\cup L_\texttt{b}\).
-
The concatenation \(L_\texttt{a}L_\texttt{b}\).
-
\(L=(L_\texttt{a}\cup L_\texttt{b})L_\texttt{a}L_\texttt{b}\).
-
The star closure of \(L\): \(L^*\).
Produce \({\varepsilon}\)-NFAs for each of the above using the constructions shown in the lecture for the union, concatenation, and star.
-
\(L_\texttt{a}\). Think: \(\texttt{a}.......\)
-
\(L_\texttt{b}\). Think: \(.......\texttt{b}\)
-
\(L_\texttt{a}\cup L_\texttt{b}\). Think: \(\texttt{a}.......\) or \(.......\texttt{b}\)
-
\(L_\texttt{a}L_\texttt{b}\). Think: \(\texttt{a}..............\texttt{b}\)
-
\(L\). Think: \((\texttt{a}....... + .......\texttt{b})\texttt{a}.......\texttt{b}\)
-
\(L^*=\left((L_\texttt{a}\cup L_\texttt{b})L_\texttt{a}L_\texttt{b}\right)^*\). Think: \(\Big((\texttt{a}....... + .......\texttt{b})\texttt{a}.......\texttt{b}\Big)^*\)
Q3¶
A second look at RegEx's
For each of the following RegEx's, give two strings that are members of the corresponding language, and two strings that are not. (A total of 4 strings for each part.)
Assume the alphabet \(\Sigma=\{a,b\}\) in all the parts.
- \(a^*b^*\)
- \(a(ba)^*b\)
- \(a^*+ b^*\)
- \((aaa)^*\)
- \(\Sigma^*a\Sigma^*b\Sigma^*a\Sigma^*\)
- \(aba+ bab\)
- \(({\varepsilon}+ a)b\)
- \((a+ ba+ bb)\Sigma^*\)
RegEx | Examples | Non examples |
---|---|---|
\(a^*b^*\) | a, b | ba, aba. |
\(a(ba)^*b\) | ab, abab | aab, aabb |
\(a^*+ b^*\) | \({\varepsilon}\), a | ab, ba |
\((aaa)^*\) | \({\varepsilon}\), aaa | a, aa |
\(\Sigma^*a\Sigma^*b\Sigma^*a\Sigma^*\) | aba, aaba | a, ab |
\(aba+ bab\) | aba, bab | a, ab |
\(({\varepsilon}+ a)b = b + ab\) | b, ab | a, ba |
\((a+ ba+ bb)\Sigma^*\) | a, ba | \({\varepsilon}\), b |
Q4¶
Designing RegEx's
Give regular expressions generating the languages below over \(\Sigma=\{0,1\}\)
- \(\{w\mid w \text{ begins with 1 and ends with a 0} \}\)
- \(\{w\mid w \text{ contains at least three 1's} \}\)
- \(\{w\mid w \text{ contains the substring 0101} \}\)
- \(\{w\mid w \text{ has length at least 3 and its third symbol is 0} \}\)
- \(\{w\mid w \text{ starts with 0 and has odd length, or starts with 1 and has even length} \}\)
- \(\{w\mid w \text{ does not contain the substring 110} \}\)
- \(\{w\mid \text{ the length of $w$ is at most 5} \}\)
- \(\{w\mid w \text{ is any string except 11 and 111} \}\)
- \(\{w\mid \text{ every odd position of $w$ is 1} \}\)
- \(\{w\mid w \text{ contains at least two 0's and at most one 1} \}\)
- \(\{{\varepsilon}, 0\}\)
- \(\{w\mid w \text{ contains an even number of 0's, or contains exactly two 1's} \}\)
- The empty set.
- All strings except the empty string.
Please note that multiple solutions are possible. If yours looks different then check if they are equivalent.
- \(1 ....... 0\)
- \(....... 1 ....... 1 ....... 1 .......\)
- $....... 0101 ....... $
- $..0 ....... $
-
The first few cases for the odd length strings are:
\[0, 0(\Sigma\Sigma), 0(\Sigma\Sigma)(\Sigma\Sigma), \ldots\]and the first few cases for the even length strings are:
\[1\Sigma, 1\Sigma(\Sigma\Sigma), 1\Sigma(\Sigma\Sigma)(\Sigma\Sigma), \ldots\]From these two case we infer the general RegEx by taking their union.
-
This is challenging -- You can create a DFA first then use the GNFA algorithm to get the required RegEx.
Once you have the expression, can you see why it works?
Another way to get to the solution is as follows:
If {\tt 110} is not a substring of a string \(w\) then there are no consecutive {\tt 1}'s other than possibly at the end of \(w\).
So \(w\) can be written as \(w=u\ell\) where \(u\) has no consecutive {\tt 1}'s and \(\ell\) is made exclusively of zero or more {\tt 1}'s.
\(u\) can be taken to be \(\tt (0 + 10)^*\) or \(\tt 0^*(100^*)^*\) (or any other equivalent RegEx), while \(\ell={\tt 1^*}\).
Reflection:
\((0 + 10)^*\) gives you two types of "bricks" (
0
and10
) to build your string by concatenation (gluing of the bricks), and these will never produce \(11\). Once you have \(11\) then you are not allowed any0
's, hence the part: \(1^*\).Tip
Just because you may have found this difficult it does not mean the rest are even harder; in fact the next one is straight forward!
-
Strings of lengths: \(0,1,2,3,4,5\).
Informally, \(\varepsilon\) or \(\square\) or \(\square\square\) or \(\square\square\square\) or \(\square\square\square\square\) or \(\square\square\square\square\square\).
-
This is obtained by listing all the acceptable strings of length \(\leq 3\) other than \(11\) and \(111\), then adding the option for strings of length \(\geq 4\).
\({\varepsilon}+ 0+1 + \_\_+\_\_+\_\_ + \_\_\_+\_\_\_+\_\_\_+\_\_\_+\_\_\_+\_\_\_+\_\_\_ + \Sigma^4\Sigma^*\).
-
Think: \({\varepsilon}, 1, 1\square, 1\square1, 1\square1\square, 1\square1\square1, 1\square1\square1,\square, ...\)
-
The condition "at most one
1
" means we have two cases:-
No
1
at all. This gives a string of0
s only, and since we must have "at least two0
s" then the RegEx is: \(000^*\) (or any equivalent). -
Exactly one
1
. We must have "at least two0
s", and these can either:- both be before this
1
: \(001\) - both be after this
1
: \(100\) - be around this
1
: \(010\)
These three possibilities give us \(001 + 010 + 100\), and then accounting for any other
0
s we get the Regex: \(0^*(001 + 010 + 100)0^*\) - both be before this
-
-
Simple enumeration of the finite set \(\{\varepsilon, 0\}\).
-
The language is a union because of the "or" in the condition:
\[\{w\mid w \text{ contains an even number of 0's} \} \;{\textcolor{red}{\cup}}\; \{w\mid w \text{ contains exactly two 1's} \}\]Hint: \((\ldots \square\ldots \square\ldots)^*\), where "..." contains no \(\square\)s, produces strings with 0 or 2 or 4 or 6 etc. \(\square\)s.
-
One of the three base cases in the definition of regular expressions.
-
Think: \(\Sigma.......\)
This is also written as \(\Sigma^+\) as a shorthand (i.e. strings of length \(\geq 1\)).
Be careful: that is a "superscript plus", not the "union plus"; e.g. \({1^++0^*}\).
Green (3c): NFA to GNFA to RegEx¶
Q1¶
GNFA algorithm
Use the GNFA algorithm to find regular expressions for the languages recognized by the following NFAs.
Can you interpret the results?
We can convert any NFA into a GNFA as follows:
- Add a new start state with an \({\varepsilon}\)-transition to the NFA's start state.
- Add a new accept state with \({\varepsilon}\)-transitions from the NFA's accept states.
- If a transition has multiple labels then replace them with their union. (e.g. \(a,b\to a+ b\).)
Once the GNFA is produced, start removing states, one at a time, and "patch" any affected transitions using regular expressions (RegEx's). Repeat until only two states (initial and accept) remain. The RegEx on the only remaining transition is the equivalent RegEx to the NFA.
Convert to GNFA, then remove A, then B.
etc.
Convert to GNFA. Remove B, C, A, D.
etc.
Convert to GNFA. Remove D, A, B, S.
etc.
Convert to GNFA. Remove B, C, A, D.
etc.
Convert to GNFA. Remove 2, 3, 4, 5, 1, 0.
etc.
Q2¶
RegEx's for similar NFAs
Give RegEx's for the languages recognized by the following similar NFAs, using the GNFA algorithm. What do you notice?
The RegEx for an NFA, whose accepting states include the accepting states of another, also includes its RegEx as a sub-expression.
1 accepting state: \(( \_\_ + \_\_ ) ( \_\_ + \_\_ )^*\). Call this \(R\).
3 accepting states: \(R ( 1 + 0 + \varepsilon) + 1 + 0\)
4 accepting states: \(R ( 1 + 0 + \varepsilon) + 1 + 0 + \varepsilon\)
If all the states of an NFA are accepting then it does not necessarily mean it accepts all possible strings.
Q3¶
Cycles
Let \(L_n\) be the language of all strings over \(\Sigma=\{1\}\) that have length a multiple of \(n\), where \(n\) is a natural number (i.e. \(n\in\mathbb{N}=\{1,2,3,\ldots\}\)).
-
Design an NFA to recognize \(L_3\), and another to recognize \(L_5\).
-
Write down RegEx's for \(L_3\) and \(L_5\), then for their union \(L_3\cup L_5\).
-
Construct the \({\varepsilon}\)-NFA that recognizes \(L_3\cup L_5\).
-
Use the GNFA algorithm to obtain a RegEx for \(L_3\cup L_5\).
-
Think of cycles/circles, and one accept state which is also the start state.
-
$L_3\colon (___)^* $
\(L_5\colon (\_\_\_\_\_)^*\)
\(L_3\cup L_5\colon (\_\_\_)^* + (\_\_\_\_\_)^*\).
-
Add a new start state. Connect it (with empty transitions) to the two start states of the \(L_3\) and \(L_5\) automata.
-
Follow the the GNFA algorithm steps.
Orange¶
Q1¶
Strings whose length is a multiple of a fixed number
Let \(B_n = \{\texttt{a}^{m}\mid m \text{ is a multiple of $n$}\} = \{\texttt{a}^{kn}\mid k\in\mathbb{Z}_{\geq 0}\}\) over the alphabet \(\Sigma=\{\texttt{a}\}\).
Show that the language \(B_n\) is regular for any \(n\in\mathbb{N}\) by writing a regular expression for it.
Outline the description of an NFA that can recognize it.
RegEx: \({\texttt{a}\ldots\texttt{a}}\) (\(n\) symbols) repeated zero or more times (star).
The corresponding NFA is a generalization of the case for \(L_3\) and \(L_5\) above. It would have \(n\) states in a circular shape, with the start state being the only accepting state.
Q2¶
Closure of regular languages under reversal of strings
For any string \(s=\texttt{s}_1\texttt{s}_2\ldots \texttt{s}_n\), where \(\texttt{s}_i\) are symbols from the alphabet, the reverse of \(s\) is the string \(s\) written in reverse order: \(s^R=\texttt{s}_n\texttt{s}_{n-1}\ldots \texttt{s}_1\).
Given an NFA or RegEx that recognizes a language \(A\), describe how you can transform this NFA/RegEx to recognize the language \(A^R=\{w^R\mid w\in A\}\), i.e. the language that contains all the strings from \(A\) but in the reverse order.
Basic idea: reverse the arrows in the state diagram, but need to address the case with many accepting states... etc.
Test your ideas on the languages given by the RegEx's: (\(\Sigma=\{a,b\}\))
Q3¶
Subset construction method applied to an e-NFA
Convert the following \({\varepsilon}\)-NFA to an equivalent DFA.
Take extra care with the epsilon transition between 1 and 3. Every time you make it to 1, you "slide" to 3 too, so you are in two places at the same time. You may informally think of these two states as one state "1-3".
Q4¶
Technical cases with RegEx's
Show that
-
\(1^*\emptyset=\emptyset\)
(Concatenating the empty RegEx \(\emptyset\) to any RegEx yields the empty RegEx again)
-
\(\emptyset^*=\{{\varepsilon}\}\)
You may find it helpful to construct the corresponding \({\varepsilon}\)-NFAs.
Check the lecture slides.
Q5¶
This seems impossible at a first look...!
Let \(\Sigma=\{0,1\}\) and let
As an example, \(101\in D\) but \(1010\not\in D\).
Show that \(D\) is a regular language (by producing an NFA for it, or otherwise).
Does this hold for \(\{w\mid w \text{ contains an equal number of 0's and 1's}\}\)?
Can you see why? What is the difference!?
How about the language \(\{w\mid w \text{ contains a non-equal number of 0's and 1's}\}\)?
We will see next week that the language defined by \(a^n b^n\) for \(n\geq 0\) is not regular. The language \(D\) seems to defy this and suggests that if we set \(a=01\) and \(b=10\) then we can build an NFA for it... the trick is that a string like \(101\) is in \(D\) because overlap is allowed, while this particular string cannot be decoded using \(a=01\) and \(b=10\).
Optional¶
Q1¶
Q1
Suppose we have a programming language where comments are delimited
by @=
and =@
. Let \(L\) be the language of all valid delimited
comment strings, i.e. a member of \(L\) must begin with @=
and end
with =@
.
Use the page at https://regex101.com/r/Ez1kqp/3 and try the following RegEx searches:
Programming | Mathematical notation | Interpreation |
---|---|---|
@ | @ | Just the symbol @ |
@= | @= | Just the string @= |
. | \(\Sigma\) | Any symbol from the alphabet |
.* | \(\Sigma^*\) | Any string over the alphabet |
@.* | @\(\Sigma^*\) | ............................ |
@.*|.*@ | @\(\Sigma^*\)+\(\Sigma^*\)@ | ............................ |
@.*@ | @\(\Sigma^*\)@ | ............................ |
@=.*=@ | @=\(\Sigma^*\)=@ | ............................ |
Interpret the results for the last 4 searches. Try alternative searches to develop your understanding of how RegEx is used in practice. What is the correct RegEx for \(L\)?
Please note that the regular expressions used in programming languages are more general than RegEx's defined for Regular Languages.
See for example https://en.wikipedia.org/wiki/RE2_(software)
Q2¶
Q2
Extend your class for simulating DFAs and NFAs from the last lab to convert a given NFA into an equivalent DFA or to a RegEx.