- Number of slides: 101
Discrete Maths 241-303, Semester 1 2014-2015
7. Automata and Regular Expressions
• Recognising input using:
  – automata: a graph-based technique
  – regular expressions: an algebraic technique
    • equivalent to automata
241-303 Discrete Maths: Automata/7
Overview
1. Introduction to Automata
2. Representing Automata
3. The 'aeiou' Automaton
4. Generating Output
5. Bounce Filter Example
6. Deterministic and Nondeterministic Automata
7. 'washington' Partial Anagrams
8. Regular Expressions
9. UNIX Regular Expressions
10. From REs to Automata
11. More Information
1. Introduction to Automata
• A finite state automaton represents a problem as a series of states and transitions between the states:
  – the automaton starts in an initial state
  – input causes a transition from the current state to another
  – a state may be accepting
    • the automaton can terminate successfully when it enters an accepting state (if it wants to)
1.1 An Example: The 'even-odd' Automaton
  start -> evenA;  evenA loops on b;  evenA --a--> oddA;  oddA loops on b;  oddA --a--> evenA
• The states are the ovals.
• The transitions are the arrows
  – labelled with the input that 'triggers' them
• The 'oddA' state is accepting.
Execution Sequence
Input: babaa
  Input read   Move to state
  (none)       evenA     initial state
  b            evenA
  a            oddA      the automaton could choose to terminate here
  b            oddA
  a            evenA
  a            oddA      stops, since there is no more input
1.2 Why are Automata Useful?
• Automata are a very good way of modelling finite-state systems which change state in response to input. Examples (very different applications):
  – text editors, compilers, UNIX tools like grep
  – communications protocols
  – digital hardware components, e.g. adders, RAM
2. Representing Automata
• Automata have a mathematical basis which allows them to be analysed, e.g.:
  – prove that they accept correct input
  – prove that they do not accept incorrect input
• Automata can be manipulated to simplify them, and they can be automatically converted into code.
2.1 A Mathematical Coding
• We can represent an automaton in terms of sets and mathematical functions.
• The 'even-odd' automaton is:
  startSet = { evenA }
  acceptSet = { oddA }
  nextState(evenA, b) => evenA
  nextState(evenA, a) => oddA
  nextState(oddA, b) => oddA
  nextState(oddA, a) => evenA
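• Analysis of the mathematical form can show that the 'even-odd' automaton only accepts strings (over the letters a and b) which contain an odd number of 'a's. Each 'a' swaps evenA and oddA, while 'b' never changes the state, so the automaton is in oddA exactly when an odd number of 'a's has been read. For example:
  babaa   abb   abaab   aabba   aaaaba   …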
2.2 Automaton in Code
• It is easy to (automatically) translate an automaton into code, but...
  – an automaton graph does not contain all the details needed for a program
• The main extra coding issues:
  – what to do when we enter an accepting state?
  – what to do when the input cannot be processed, e.g. when abzz is entered?
Encoding the 'even-odd' Automaton
    #include <stdio.h>
    #include <stdlib.h>

    enum state {evenA, oddA};                 // possible states
    enum state nextState(enum state s, int ch);
    int acceptable(enum state s);

    int main(void)
    {
        enum state currState = evenA;         // start state
        int isAccepting = 0;                  // false
        int ch;
        while ((ch = getchar()) != EOF) {
            if (ch == '\n')                   // ignore line breaks in the input
                continue;
            currState = nextState(currState, ch);
            isAccepting = acceptable(currState);
        }
        if (isAccepting)                      // the accepting state is only used at the end of input
            printf("accepted\n");
        else
            printf("not accepted\n");
        return 0;
    }
    enum state nextState(enum state s, int ch)
    {
        if ((s == evenA) && (ch == 'b')) return evenA;
        if ((s == evenA) && (ch == 'a')) return oddA;
        if ((s == oddA) && (ch == 'b')) return oddA;
        if ((s == oddA) && (ch == 'a')) return evenA;
        printf("Illegal Input\n");            // simple handling of incorrect input
        exit(1);
    }
    int acceptable(enum state s)
    {
        if (s == oddA)
            return 1;                         // oddA is an accepting state
        return 0;
    }
3. The 'aeiou' Automaton
• What English words contain the five vowels (a, e, i, o, u) in order?
• Some words that match:
  – abstemious
  – facetious
  – sacrilegious
3.1 Automaton Graph
L = all letters
  start -> 0 --a--> 1 --e--> 2 --i--> 3 --o--> 4 --u--> 5 (accepting)
  state 0 loops on L-a, 1 on L-e, 2 on L-i, 3 on L-o, and 4 on L-u
3.2 Execution Sequence (1)
Input: facetious
  Input read   Move to state
  (none)       0
  f            0
  a            1
  c            1
  e            2
  t            2
  i            3
  o            4
  u            5    the automaton can terminate here; there is no need to process the remaining 's'
Execution Sequence (2)
Input: andrew
  Input read   Move to state
  (none)       0
  a            1
  n            1
  d            1
  r            1
  e            2
  w            2    state 2 and the end of input means failure
3.3 Translation to Code
(C enumerators must be identifiers, so the states are named S0..S5, which have the values 0..5.)
    #include <stdio.h>
    #include <stdlib.h>

    enum state {S0, S1, S2, S3, S4, S5};      // possible states
    enum state nextState(enum state s, int ch);
    int acceptable(enum state s);

    int main(void)
    {
        enum state currState = S0;            // start state
        int isAccepting = 0;                  // false
        int ch;
        // stop processing when the accepting state is entered
        while (!isAccepting && (ch = getchar()) != EOF) {
            currState = nextState(currState, ch);
            isAccepting = acceptable(currState);
        }
        if (isAccepting)
            printf("accepted\n");
        else
            printf("not accepted\n");
        return 0;
    }
    enum state nextState(enum state s, int ch)
    {
        if (s == 0) {
            if (ch == 'a') return 1;
            else return 0;                    // input is L-a
        }
        if (s == 1) {
            if (ch == 'e') return 2;
            else return 1;                    // input is L-e
        }
        if (s == 2) {
            if (ch == 'i') return 3;
            else return 2;                    // input is L-i
        }
        // (continued)
        // (continued)
        if (s == 3) {
            if (ch == 'o') return 4;
            else return 3;                    // input is L-o
        }
        if (s == 4) {
            if (ch == 'u') return 5;
            else return 4;                    // input is L-u
        }
        printf("Illegal Input\n");            // simple handling of incorrect input
        exit(1);
    } // end of nextState()
    int acceptable(enum state s)
    {
        if (s == 5)
            return 1;                         // 5 is an accepting state
        return 0;
    }
4. Generating Output
• One possible extension to the basic automaton idea is to allow output:
  – when a transition is 'triggered', it can optionally produce output as well
• Automata which generate output are sometimes called Finite State Machines (FSMs).
4.1 'even-odd' with Output
  start -> evenA;  evenA loops on b;  evenA --a/1--> oddA;  oddA loops on b;  oddA --a--> evenA
• When the 'a' transition is triggered out of the evenA state, a '1' is output.
4.2 Mathematical Coding
• Add an 'output' mathematical function to the automaton representation:
  output(evenA, a) => 1
4.3 Extending the C Coding
• The while loop for 'even-odd' will become:
    // ...
    while ((ch = getchar()) != EOF) {
        if (ch == '\n')                       // ignore line breaks in the input
            continue;
        output(currState, ch);                // uses the state BEFORE the transition fires
        currState = nextState(currState, ch);
        isAccepting = acceptable(currState);
    }
    // ...
• The output() C function:
    void output(enum state s, int ch)
    {
        if ((s == evenA) && (ch == 'a'))
            putchar('1');
    }
5. Bounce Filter Example
• A signal-processing problem:
  – a stream of 1's and 0's is 'smoothed' by the filter so that:
    • a single 0 surrounded by 1's becomes a 1: ...111101111... => ...111111111...
    • a single 1 surrounded by 0's becomes a 0: ...000010000... => ...000000000...
• This kind of filtering is used in image processing to reduce 'noise'.
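5.1 The 'bounce' Automaton
Transitions are labelled input/output; a is the start state:
  a --0/0--> a     a --1/0--> b
  b --0/0--> a     b --1/1--> c
  c --0/1--> d     c --1/1--> c
  d --0/0--> a     d --1/1--> c
• The smoothing is done by a --1/0--> b and c --0/1--> d, the two transitions whose output differs from their input.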
Notes
• There is no accepting state: the code will simply terminate at EOF.
• The 'a' and 'b' states (left side) mostly have transitions that output '0's.
• The 'c' and 'd' states (right side) mostly have transitions that output '1's.
5.2 Execution Sequence
Input: 1011010
  Input read   Move to state   Output
  (none)       a
  1            b               0
  0            a               0
  1            b               0
  1            c               1     moved to the right-hand side
  0            d               1
  1            c               1
  0            d               1
5.3 I/O Behaviour
  Input:  0 1 1 0 1
  Output: 0 0 1 1 1
• The single 0 (fourth bit) is smoothed away in the output.
• It takes 2 bits of the same type before the automaton realises that it has a new bit sequence rather than a 'noise' bit.
6. Deterministic and Nondeterministic Automata
  state S, with one transition labelled a and one labelled w
• We have been writing deterministic automata so far:
  – for any input read in a state, there is at most one transition that can be fired
    • e.g. state S can process input 'a' and 'w', and fails for anything else
Nondeterministic Automata
  state S, with one transition labelled a and two transitions labelled x, leading to the states T, U and V
• A nondeterministic (ND) automaton can have 2 or more transitions with the same label leaving a state.
• Problem: if state S sees input 'x', which transition should it use?
6.1 The 'man' Automaton
• Accept all strings that contain "man"
  – this is tricky to write correctly as a deterministic automaton. The following attempt is WRONG (it has bugs):
  start -> 0 --m--> 1 --a--> 2 --n--> 3 (accepting)
  0 loops on L-m;  1 --(L-a)--> 0;  2 --(L-n)--> 0
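• The input string "command" gets stuck in state 0, and so is wrongly rejected:
  0 --c--> 0 --o--> 0 --m--> 1 --m--> 0 --a--> 0 --n--> 0 --d--> 0
  – the problem starts at the second 'm': it belongs to L-a, so the automaton falls back to state 0 instead of staying in state 1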
6.2 An ND Automaton Solution
  start -> 0 --m--> 1 --a--> 2 --n--> 3 (accepting);  0 loops on L
• It is nondeterministic because an 'm' input in state 0 can be dealt with by two transitions:
  – a transition back to state 0, or
  – a transition to state 1
• Processing "command" input (each branch follows one possible choice):
  c:  0 -> 0
  o:  0 -> 0
  m:  0 -> 0   or   0 -> 1
  m:  0 -> 0   or   0 -> 1;   1 -> fail (no 'm' transition: reject this branch)
  a:  0 -> 0;   1 -> 2
  n:  0 -> 0;   2 -> 3 (accepting state)
  d:  0 -> 0
6.3 Executing an ND Automaton
• It is difficult to code ND automata in conventional languages, such as C.
• Two different coding approaches:
  1. When an input arrives, execute all transitions in parallel. See which succeeds.
  2. When an input arrives, try one transition. If it leads to failure then backtrack and try another transition.
Approach (1) in Parlog
• Parlog is a concurrent logic programming language.
    state0([X|Rest]) <- state0(Rest) : true.     % concurrent testing: the guards of
    state0([m|Rest]) <- state1(Rest) : true.     % both state0 clauses run in parallel
    state1([a|Rest]) <- state2(Rest).
    state2([n|Rest]).
  Call: ?- state0([c,o,m,m,a,n,d]).
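Approach (2) in Prolog
• Prolog is a sequential logic programming language.
    nextState(0, _, 0).        % the nondeterministic part:
    nextState(0, 'm', 1).      % state 0 has two transitions for 'm'
    nextState(1, 'a', 2).
    nextState(2, 'n', 3).

    nda(State, [Ch|Input]) :-
        nextState(State, Ch, NewState),
        nda(NewState, Input).
    nda(3, _).                 % accepting state: succeed, ignoring any remaining input
  Call: ?- nda(0, [c,o,m,m,a,n,d]).
• The accepting clause must ignore the rest of the input: state 3 has no transitions, so with nda(3, []) the call above would fail because of the 'd' after "man".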
6.4 Why use ND Automata?
• With nondeterminism, some problems are easier to solve/model.
• Nondeterminism is common in some application areas, such as AI, graph search, and compilers.
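• It is possible to translate an ND automaton into a deterministic one, although the deterministic automaton may be larger and more complex (an ND automaton with n states can need up to 2^n deterministic states in the worst case).
• In mathematical terms, ND automata and deterministic automata are equivalent: they recognise exactly the same sets of strings, so they can be used to model all the same problems.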
7. 'washington' Partial Anagrams
• Find all the words which can be made from the letters in "washington".
• There are nearly 400 words. Some of the 7-letter words:
  – agonist
  – goatish
  – showing
  – washing
7.1 A Two-Stage Process
1. Select all the words from a dictionary (e.g. /usr/share/dict/words on takasila) which use the letters in "washington"
   – use a deterministic automaton
2. Delete the words which use the "washington" letters too many times (e.g. "hash")
   – use a nondeterministic automaton
7.2 Stage 1: Deterministic Automaton
• Send each word in the dictionary through the automaton:
  S = {w, a, s, h, i, n, g, t, o}
  start -> 0 --newline--> 1;  0 loops on S
• If state 1 is reached, then the word is passed to stage 2.
• For example, "hash\n" is accepted:
  0 --h--> 0 --a--> 0 --s--> 0 --h--> 0 --\n--> 1
7.3 Stage 2: ND Automaton
• Check if a word uses a "washington" letter too often, e.g. delete "hash".
• The ND automaton succeeds if a word uses some letter too many times.
• The program will then not output the word.
Checking each Letter
• There are 9 different letters in "washington".
• Nine deterministic automata can be used to detect if the given word has:
  – more than 1 'a'
  – more than 1 'g'
  – ...
  – more than 2 'n's
Check for more than 1 'a'
  start -> 0 --a--> 1 --a--> 2 (accepting);  states 0 and 1 loop on L-a
  e.g. 'nana'
• If this succeeds then the program will not output the word.
Checking all the Letters at Once
• The 9 deterministic automata can be applied to the same word at the same time.
• Combine the 9 deterministic automata to create a single nondeterministic automaton.
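Nondeterministic Checking
  start -> 0;  0 loops on L (any letter)
  0 --a--> 1 --a--> 2 (two a's);   1 loops on L-a
  0 --g--> 3 --g--> 4 (two g's);   3 loops on L-g
  0 --h--> 5 --h--> 6 (two h's);   5 loops on L-h
  0 --i--> 7 --i--> 8 (two i's);   7 loops on L-i
  0 --n--> 9 --n--> 10 --n--> 11 (three n's);   9 and 10 loop on L-n
  0 --o--> 12 --o--> 13 (two o's);   12 loops on L-o
  0 --s--> 14 --s--> 15 (two s's);   14 loops on L-s
  0 --t--> 16 --t--> 17 (two t's);   16 loops on L-t
  0 --w--> 18 --w--> 19 (two w's);   18 loops on L-w
  The states 2, 4, 6, 8, 11, 13, 15, 17 and 19 are accepting.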
Processing "hash"
  h:  0 -> 0   or   0 -> 5
  a:  0 -> 0   or   0 -> 1;   5 -> 5
  s:  0 -> 0   or   0 -> 14;   1 -> 1;   5 -> 5
  h:  0 -> 0   or   0 -> 5;   14 -> 14;   1 -> 1;   5 -> 6 (two h's: accepting)
• Reaching an accepting state means that the program will not output "hash".
7.4 UNIX Coding
  /usr/share/dict/words -> tr -> grep -> egrep -v
• Stages 0, 1 and 2, piped together:
    tr A-Z a-z < /usr/share/dict/words | grep '^[washingto]*$' | egrep -v 'a.*a|g.*g|h.*h|i.*i|n.*n.*n|o.*o|s.*s|t.*t|w.*w'
• The call to tr translates all the words taken from the dictionary into lower case.
• A word is only rejected for three n's (n.*n.*n), since "washington" itself contains two. (egrep is equivalent to grep -E.)
8. Regular Expressions (REs)
• REs are an algebraic way of specifying how to recognise input
  – 'algebraic' means that the recognition pattern is defined using RE operands and operators
• REs are equivalent to automata
  – REs and automata can be used on all the same problems
8.1 REs in grep
• grep searches its input a line at a time.
• If a line contains a string that matches grep's RE (pattern), then the line is output.
  input lines (e.g. from a file) -> grep "RE" -> matching lines (e.g. to a file)
• Example input:
    hello andy
    my name is andy
    my bye byhe
Examples
  grep "and" outputs:
    hello andy
    my name is andy
  grep -E "an|my" outputs all three lines ("|" means "or"):
    hello andy
    my name is andy
    my bye byhe
  grep "hel*" outputs ("*" means "0 or more" of the preceding character, here 'l', so "byhe" also matches):
    hello andy
    my bye byhe
8.2 Why use REs?
• They are very useful for expressing patterns that recognise textual input.
• For example, REs are used in:
  – editors
  – compilers
  – web-based search engines
  – communication protocols
8.3 The RE Language
• An RE defines a pattern which recognises (matches) a set of strings
  – e.g. an RE can be defined that recognises the strings { aa, abba, abbbba, ... }
• These recognisable strings are sometimes called the RE's language.
RE Operands
• There are 4 basic kinds of operands:
  – characters (e.g. 'a', '1', '(')
  – the symbol ε (means the empty string '')
  – the symbol {} (means the empty set)
  – variables, which can be assigned an RE
    • variable = RE
RE Operators
• There are three basic operators:
  – union '|'
  – concatenation
  – closure '*'
Union
• S | T
  – this RE can use the S or T RE to match strings
• Example REs:
  a | b         matches the strings {a, b}
  a | b | c     matches the strings {a, b, c}
Concatenation
• S T
  – this RE will use the S RE followed by the T RE to match against strings
• Example REs:
  a b           matches the string {ab}
  w | (a b)     matches the strings {w, ab}
• What strings are matched by the RE (a | ab)(c | bc)?
• Equivalent to:
  {a, ab} followed by {c, bc}
  => {ac, abc, abc, abbc}
  => {ac, abc, abbc}   (abc can be formed in two ways, but a set contains it only once)
Closure
• S*
  – this RE can use the S RE 0 or more times to match against strings
• Example RE:
  a*     matches the strings {ε, a, aa, aaa, ...}   (ε is the empty string)
8.4 REs for C Identifiers
• We define two RE variables, letter and digit:
  letter = A | B | C | D | ... | Z | a | b | c | d | ... | z
  digit = 0 | 1 | 2 | ... | 9
• ident is defined using letter and digit:
  ident = letter (letter | digit)*
  (real C identifiers may also contain the underscore '_', which is left out here)
• Strings matched by ident include:
  ab345   w   h5g
• Strings not matched:
  $2   abc****
9. UNIX Regular Expressions
• Different UNIX tools (vi, awk, sed, grep, etc.) use slightly different extensions of the basic RE notation.
• Extra features include:
  – character classes
  – line start '^' and end '$' symbols
  – the wild card symbol '.'
  – additional operators, R? and R+
9.1 Character Classes
• The character class [a1 a2 ... an] stands for a1 | a2 | ... | an
• a1-an stands for the set of characters between a1 and an
  – e.g. [A-Z]   [a-z0-9]
9.2 Line Start and End
• '^' matches the beginning of the line, '$' matches the end
  – e.g.
    grep '^andr' /usr/share/dict/words
    grep '^[washingto]*$' /usr/share/dict/words
Example as a Diagram
  input: /usr/share/dict/words (A, A's, AOL's, ...)
  grep "^andr"
  output: ..., androgen's, androgynous, android's, androids
9.3 Wild Card Symbol
• The '.' stands for any character except the newline
  – e.g.
    grep '^a..b.$' chapter1.txt
    grep 't.*t' manual
  input: /usr/share/dict/words (A, A's, AOL's, ...)
  grep "^a..b.$"
  output: ..., adobe, alibi, ameba, ...
9.4 R? and R+
• R? stands for ε | R   (0 or 1 R)
• R+ stands for R | RR | RRR | ..., which can also be written as R R*   (one or more occurrences of R)
9.5 Operator Precedence
• The operators *, +, and ? have the highest precedence.
• Then comes concatenation.
• Union '|' has the lowest precedence.
• Example:
  – a | bc? means a | (b(c?)), and matches the strings {a, b, bc}
10. From REs to Automata
• The translation uses a special kind of ND automaton which uses ε-transitions. Automata of this type are sometimes called ε-NFAs (NFA: nondeterministic finite automaton).
• The translation steps are:
  – RE => ε-NFA
  – ε-NFA => ND automaton
  – ND automaton => deterministic automaton
  – deterministic automaton => code
10.1 ε-NFAs
• An ε-NFA allows a transition to use an ε label.
• A transition using an ε label can be triggered without having to match any input.
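ε-NFA Example
• a*b | b*a is accepted by the following ε-NFA:
  start -> 1;  1 --ε--> 2  and  1 --ε--> 4   (nondeterminism occurs here)
  2 loops on a;  2 --b--> 3 --ε--> 6
  4 loops on b;  4 --a--> 5 --ε--> 6
  6 is the accepting state
• Example input: "bbba", accepted along the lower branch: 1 --ε--> 4 --b--> 4 --b--> 4 --b--> 4 --a--> 5 --ε--> 6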
10.2 RE to ε-NFA
• The resulting ε-NFA has:
  – one start state and one accepting state
  – at most two transitions out of any state
• The construction uses standard automata 'pieces' corresponding to RE operands and operators.
• The pieces are put together based on an expression tree for the RE.
Automata Pieces for RE Operands (s = start state, f = accepting state)
  Automaton for a character x:   start -> s --x--> f
  Automaton for ε:               start -> s --ε--> f
  Automaton for {}:              start -> s, and f, with no transition between them
                                 This automaton does not accept any strings.
Automata Pieces for RE Operators
• Union S | T:
  a new start state has ε-transitions to the start states of S and T; the accepting states of S and T each have an ε-transition to a new accepting state
• Concatenation S T:
  start -> S; the accepting state of S has an ε-transition to the start state of T, and the accepting state of T is the accepting state of the whole automaton
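• Closure S*:
  a new start state has ε-transitions to the start state of S and to a new accepting state (matching zero copies of S); the accepting state of S has an ε-transition back to the start state of S (to repeat S) and one to the new accepting state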
10.3 Translating a | bc*
• The first step in building the automaton is to draw a | bc* as an expression tree ('.' is the concatenate symbol):
        |
       / \
      a   .
         / \
        b   *
            |
            c
Translate the 3 leaves
  Automaton for a:   start -> 1 --a--> 2
  Automaton for b:   start -> 4 --b--> 5
  Automaton for c:   start -> 7 --c--> 8
Automaton for c*
  start -> 6 --ε--> 7 --c--> 8 --ε--> 9;   6 --ε--> 9;   8 --ε--> 7
Automaton for bc*
  start -> 4 --b--> 5 --ε--> 6 --ε--> 7 --c--> 8 --ε--> 9;   6 --ε--> 9;   8 --ε--> 7
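Final Automaton for a | bc*
  start -> 0;   0 --ε--> 1  and  0 --ε--> 4
  upper branch:  1 --a--> 2 --ε--> 3
  lower branch:  4 --b--> 5 --ε--> 6 --ε--> 7 --c--> 8 --ε--> 9 --ε--> 3;   6 --ε--> 9;   8 --ε--> 7
  3 is the accepting state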
10.4 From ε-NFA to ND Automaton
• The ε-transitions can be removed by combining and/or duplicating the states that use them.
• If we are in a state S with outgoing ε-transitions, then we are also in any state that can be reached from S by following those ε-transitions.
• Example: simplify the lower branch of a | bc*:
  0 --ε--> 4 --b--> 5 --ε--> 6 --ε--> 7 --c--> 8 --ε--> 9 --ε--> 3;   6 --ε--> 9;   8 --ε--> 7
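becomes (state 9 and its ε-transition to 3 duplicated, so that 6 and 8 each have their own copy):
  0 --ε--> 4 --b--> 5 --ε--> 6 --ε--> 7 --c--> 8 --ε--> 9 --ε--> 3;   6 --ε--> 9 --ε--> 3;   8 --ε--> 7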
becomes (state combination begins):
  {0,4} --b--> 5 --ε--> {6,9,3} --ε--> 7 --c--> 8 --ε--> {9,3};   8 --ε--> 7
becomes:
  {0,4} --b--> {5,6,9,3} --ε--> 7 --c--> {8,9,3};   {8,9,3} --ε--> 7
becomes:
  {0,4} --b--> {5,6,9,3} --ε--> {7,8,9,3};   {7,8,9,3} loops on c
becomes:
  {0,4} --b--> {5,6,7,8,9,3};   {5,6,7,8,9,3} loops on c
simplify the labels:
  0 --b--> 5;   5 loops on c
• All of a | bc* simplified:
  start -> 0;   0 --a--> 2;   0 --b--> 5;   5 loops on c;   2 and 5 are accepting (both include the original accepting state 3)
• This also happens to be a deterministic automaton, so the translation is finished.
11. More Information
• Johnsonbaugh, R. 1997. Discrete Mathematics, Prentice Hall, chapter 10.