Syntax Specification and Analysis

How to Specify the Language
- How to specify the language
  - Regular expressions (RE) are not powerful enough
    - E.g., matching ( and ) in expressions; RE cannot specify that
  - Need more powerful constructs: grammars
    - Specifically, context-free grammars
    - There can be other grammars
    - For example, regular grammars

Grammar
- Definition
  - G = (T, N, S, P)
    - T: terminals
    - N: non-terminals
    - S: start symbol
    - P: production rules
  - T: the set of terminals
    - Terminals are essentially the tokens
    - Similar to the set of symbols in RE/FA
    - Generally represented by lowercase letters in grammars
      - E.g., if, while, a, b
    - Also, +, >
    - Also, id (represents the identifiers, not the letters themselves)

Grammar
- Definition
  - G = (T, N, S, P)
  - N: the set of non-terminals
    - Used in production rules to generate substrings
    - Functionality-wise, similar to the states in FA
    - Generally represented by uppercase letters
      - But for language specification, a specialized form such as BNF is used, for expressiveness
  - N ∪ T
    - Sometimes it is necessary to represent a substring over N ∪ T
    - Lowercase Greek letters are generally used to represent such substrings
    - E.g., α, β

Grammar
- Definition
  - G = (T, N, S, P)
  - S: start symbol
    - A non-terminal symbol from which the derivation starts
    - Functionality-wise, similar to the start state in FA
  - P: the set of production rules
    - Define how non-terminals can be used in derivation
    - Functionality-wise, have some similarity to the transitions in FA
    - There is a finite set of production rules in a grammar
    - Production rules in a context-free grammar
      - A single non-terminal → a string of terminals and non-terminals
  - Other parts of the grammar
    - Separator: , (to separate multiple productions)
    - Alternation: | (to put several productions together)

Derivation
- Derivation
  - Based on the grammar, derivations can be made
  - The purpose of a grammar is to derive strings in the language defined by the grammar
  - α ⇒ β: β can be derived from α in one step
  - ⇒+ : derived in one or more steps
  - ⇒* : derived in any number of steps (zero or more)
  - ⇒lm : leftmost derivation
    - Always substitute the leftmost non-terminal
  - ⇒rm : rightmost derivation
    - Always substitute the rightmost non-terminal

Context Free Grammar
- CFG
  - The most commonly used type of grammar
  - Left side is always a single non-terminal
- Example
  - T = {a, b, c}
  - N = {S, A, B} and S is the start symbol
  - P includes three rules
      S → AB
      B → b
      A → aA | c

Derivation and Parse Tree
- Example
    S → AB
    B → b
    A → aA | c
- Derivation
  - Start from S and follow the rules until a string of terminals is reached
    - E.g., S ⇒ AB ⇒* aAb ⇒* aacb
- Parse tree
  - A tree representing a derivation
  - All internal nodes are non-terminals
  - All leaf nodes are terminals
  - Build the tree following the derivation

      S
        A
          a
          A
            a
            A
              c
        B
          b

Derivation and Parse Tree
- Example
    S → AB
    B → b
    A → aA | c
- Derivation:
  - Arbitrary order (previous one)
    - S ⇒ AB ⇒* aAb ⇒* aacb
  - Leftmost derivation:
    - S ⇒ AB ⇒* aacB ⇒ aacb
  - Rightmost derivation:
    - S ⇒ AB ⇒ Ab ⇒* aacb
  - A parse tree always has a unique leftmost derivation and a unique rightmost derivation
- (Figure: parse tree for aacb. S has children A and B; A expands to a A twice and then to c; B expands to b.)
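The leftmost and rightmost derivations of aacb can be replayed step by step. The following is a minimal Python sketch, not part of the original slides; the `GRAMMAR` dictionary, the `derive` helper, and the convention that uppercase letters are non-terminals are assumptions made for illustration.

```python
# Toy grammar from the slide: S -> AB, B -> b, A -> aA | c.
# Assumption: uppercase letters are non-terminals, lowercase are terminals.
GRAMMAR = {"S": ["AB"], "A": ["aA", "c"], "B": ["b"]}


def derive(choices, leftmost=True, start="S"):
    """Rewrite the leftmost (or rightmost) non-terminal once per step.

    `choices` gives, for each step, the index of the alternative to use
    for the selected non-terminal. Returns the list of sentential forms.
    """
    form = start
    forms = [form]
    for choice in choices:
        # Positions of all non-terminals in the current sentential form.
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        pos = positions[0] if leftmost else positions[-1]
        body = GRAMMAR[form[pos]][choice]
        form = form[:pos] + body + form[pos + 1:]
        forms.append(form)
    return forms


leftmost = derive([0, 0, 0, 1, 0], leftmost=True)
rightmost = derive([0, 0, 0, 0, 1], leftmost=False)
print(" => ".join(leftmost))
print(" => ".join(rightmost))
```

Both runs end in aacb and correspond to the same parse tree; only the order in which the non-terminals are expanded differs.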

CFG, Derivation, Parse Tree
- Another example
  - E → E * E | E + E | ( E ) | id
  - Build a parse tree for: id * id + id * id
    - Can have more than one parse tree
- Ambiguity
  - If, for some input string that can be derived from the grammar, there exist different ways to parse it, then the grammar is ambiguous
- (Figure: alternative parse trees for id * id + id * id.)

Ambiguity and Derivations
- Example grammar
  - E → E * E | E + E | ( E ) | id
  - Derive: id * id + id * id
- Parse tree with * at the root
  - Leftmost: E ⇒ E * E ⇒ id * E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + E * E ⇒ id * id + id * E ⇒ id * id + id * id
  - Rightmost: E ⇒ E * E ⇒ E * E + E ⇒ E * E + E * E ⇒ E * E + E * id ⇒ E * E + id * id ⇒ E * id + id * id ⇒ id * id + id * id
- Parse tree with + at the root
  - Leftmost: E ⇒ E + E ⇒ E * E + E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + E * E ⇒ id * id + id * E ⇒ id * id + id * id
  - Rightmost: E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ E * E + id * id ⇒ E * id + id * id ⇒ id * id + id * id
- Multiple derivations do not imply ambiguity; only multiple parse trees do.
- If the grammar is ambiguous, then there exist multiple parse trees for some string, and for each parse tree there is a unique leftmost derivation and a unique rightmost derivation.
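One way to make the ambiguity concrete is to count the parse trees of a token string under E → E * E | E + E | ( E ) | id. The sketch below is an illustration rather than anything from the slides: `count_parse_trees` is a hypothetical helper that counts trees by trying every operator as the root of a span.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E * E | E + E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        if j <= i:
            return 0
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # E -> ( E )
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # E -> E op E, with every operator position tried as the root.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("*", "+"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id * id + id * id".split()))
```

Under these assumptions the string id * id + id * id has five parse trees, one for each way of fully parenthesizing three binary operators, while a single id has exactly one.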

Ambiguity
- Ambiguity implies multiple parse trees
  - Can make parsing more difficult
  - Can impact the semantics of the language
    - Different parse trees can have different semantic meanings and yield different execution results
- Rewrite the grammar to eliminate ambiguity
  - Many ways to rewrite a grammar
    - The new grammar should accept the same language
  - Each way may have a different semantic meaning; which one do we want? The choice should be based on the desired semantics
- There is no general algorithm to rewrite ambiguous grammars

Rewrite Ambiguous Grammar
- Build desired precedence in the grammar
  - Example
    - E → E + E | E * E | (E) | id
  - Change to
    - E → E + T | E * T | (E) | T
    - T → id
    - Parse: id * id + id * id
      - E ⇒ E * T ⇒ E + T * T ⇒ E * T + T * T ⇒ T * T + T * T ⇒ … ⇒ id * id + id * id
  - What is the precedence? The leftmost term executes first
- (Figure: parse tree for id * id + id * id under the rewritten grammar.)

Rewrite Ambiguous Grammar
- Build desired precedence in the grammar
  - Example
    - E → E + E | E * E | (E) | id
  - Change to
      E → E + T | T
      T → T * F | F
      F → (E) | id
  - Parse id + id * id

      E
        E
          T
            F
              id
        +
        T
          T
            F
              id
          *
          F
            id

  - What is the precedence? * precedes +
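The precedence built into E → E + T | T, T → T * F | F, F → (E) | id can be observed with a small recursive-descent parser. This is a sketch under stated assumptions, not the method from the slides: the left-recursive rules are replaced by loops (a standard transformation that keeps left-to-right grouping), and `parse` is a hypothetical helper that returns a fully parenthesized string instead of a tree.

```python
def parse(tokens):
    """Parse with E -> E + T | T, T -> T * F | F, F -> ( E ) | id.

    Returns a fully parenthesized string showing how the operators group.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def expr():
        # E -> T { + T }: the loop stands in for the left recursion E -> E + T.
        node = term()
        while peek() == "+":
            expect("+")
            node = f"({node} + {term()})"
        return node

    def term():
        # T -> F { * F }: the loop stands in for the left recursion T -> T * F.
        node = factor()
        while peek() == "*":
            expect("*")
            node = f"({node} * {factor()})"
        return node

    def factor():
        # F -> ( E ) | id
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        expect("id")
        return "id"

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(parse("id + id * id".split()))
```

Because `term` sits below `expr` in the call chain, every * is grouped before the surrounding +, which mirrors T appearing below E in the parse tree.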

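Ambiguity – Another Example
- if statement
    stmt → if-stmt | while-stmt | …
    if-stmt → if expr then stmt else stmt | if expr then stmt
  - Parse: if (a) then if (b) then x = c else x = d
- Parse tree 1: the else belongs to the outer if
    if-stmt → if expr then stmt else stmt
      expr: (a)
      then stmt: if-stmt → if expr then stmt
        expr: (b)
        then stmt: x = c
      else stmt: x = d
- Parse tree 2: the else belongs to the inner if
    if-stmt → if expr then stmt
      expr: (a)
      then stmt: if-stmt → if expr then stmt else stmt
        expr: (b)
        then stmt: x = c
        else stmt: x = d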

Ambiguity – Another Example
- if statement
    stmt → if-stmt | while-stmt | …
    if-stmt → if expr then stmt else stmt | if expr then stmt
- Desired semantics
  - Match the else with the closest if
- How to rewrite the if-stmt grammar to eliminate ambiguity?
  - By defining different if statements
    - Unmatched and matched
    - Matched: if expr then stmt else stmt
    - Unmatched: if expr then stmt
  - Define them separately

Ambiguity – Another Example
- Solution
  - if-stmt → unmatched-stmt | matched-stmt
  - matched-stmt → if expr then matched-stmt else matched-stmt
    - A matched statement should have matched-stmt in both the then and else parts, fully complete
  - unmatched-stmt → if expr then matched-stmt else unmatched-stmt
    - If the then part is fully matched (complete), the else will match the top-level if-then
    - Since this is an unmatched-stmt, the else part must be unmatched
  - unmatched-stmt → if expr then if-stmt
    - If the then part is not matched, then by matching the closest else's, the top level has to be unmatched
    - The rest is pushed down a level, so it can be considered recursively at a lower level

Ambiguity
- Rewritten grammar
  - Less intuitive
    - Harder to comprehend for the language designer as well as the user of the language
- Current practice
  - Expression
    - Precedence is desired, so it is good to use the grammar with precedence
  - If
    - Language definition still has the ambiguous grammar
    - Use some ad hoc method to resolve the problem (which is also easy to deal with)
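A common ad hoc rule of this kind is to let the parser attach an else to the closest unmatched if simply by checking for else right after parsing the then part. The sketch below illustrates that idea only; `parse_stmt`, the tuple-based tree, and the simplification that conditions and simple statements are single tokens are assumptions, and error handling for truncated input is omitted.

```python
def parse_stmt(tokens, pos=0):
    """Parse `if e then stmt [else stmt]` or a one-token simple statement.

    Returns (tree, next_pos). Checking for `else` immediately after the
    then part binds each else to the closest if.
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        then_part, pos = parse_stmt(tokens, pos + 3)
        # Greedy choice: the innermost if still being parsed takes the else.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part), pos
    return tokens[pos], pos + 1


tree, _ = parse_stmt("if a then if b then s1 else s2".split())
print(tree)
```

For the input if a then if b then s1 else s2, the resulting tree keeps the else inside the inner if, which is the "closest if" reading of the ambiguous grammar.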

General Concept: Languages and Grammars
- Grammars are classified into 4 classes
  - Chomsky–Schützenberger hierarchy
  - Modifications may have been made later
- Type-2 grammar
  - Context-free grammar
  - Production rules: A → α
    - A is a non-terminal
    - α ∈ (N ∪ T)+ ∪ {ε}
  - Context-free grammars can specify any context-free language and can only specify context-free languages
  - Put another way: all languages that can be specified by context-free grammars are called context-free languages

General Concept: Languages and Grammars
- Type-3 grammar
  - Regular grammar
  - Production rules can only be
    - A → a | A → aB | A → ε
  - Regular grammars and regular expressions are equivalent
  - A regular grammar can be constructed based on a DFA
  - If we consider constructing from an NFA, then the production rules can be
    - A → a | A → aB | A → B
    - This is to allow the moves on ε

General Concept: Languages and Grammars
- Type-3 grammar
  - Example: (a|b)*abb
  - Corresponding NFA
    - (Figure: start state S0 loops on a and b; S0 goes to S1 on a, S1 to S2 on b, S2 to S3 on b; S3 is accepting.)
  - Corresponding regular grammar
      S0 → a S0 | b S0 | a S1
      S1 → b S2
      S2 → b S3
      S3 → ε

General Concept: Languages and Grammars
- How to construct a regular grammar from an NFA
  - Assign a non-terminal symbol to each state in the NFA
    - Ai for state i
  - If state i has a transition to state j on input a, then
    - Ai → a Aj
  - If state i has a transition to state j on empty input, then
    - Ai → Aj
  - If state i is an accepting state, then
    - Ai → ε
  - If state i is the start state, then
    - Ai is the start symbol
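The construction rules translate almost directly into code. The following is a minimal sketch, assuming the NFA is given as a list of (state, symbol, next_state) triples with `None` standing for an ε-move; the function name `nfa_to_regular_grammar` and the string encoding of productions are illustrative choices, not part of the slides.

```python
def nfa_to_regular_grammar(transitions, start, accepting):
    """Build productions Ai -> a Aj, Ai -> Aj and Ai -> ε from an NFA.

    `transitions` is a list of (state, symbol, next_state) triples; a
    symbol of None stands for an ε-move. Returns (start_symbol, productions),
    where productions maps each non-terminal to a list of bodies.
    """
    productions = {}
    for state, symbol, nxt in transitions:
        body = f"{symbol} A{nxt}" if symbol is not None else f"A{nxt}"
        productions.setdefault(f"A{state}", []).append(body)
    for state in accepting:
        productions.setdefault(f"A{state}", []).append("ε")
    return f"A{start}", productions


# NFA for (a|b)*abb: state 0 loops on a and b, then a, b, b lead to state 3.
nfa = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
start_symbol, rules = nfa_to_regular_grammar(nfa, start=0, accepting=[3])
for head, bodies in rules.items():
    print(head, "->", " | ".join(bodies))
```

For the (a|b)*abb NFA this yields one non-terminal per state, with the accepting state contributing the ε-production.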

General Concept: Languages and Grammars
- What is the limitation of context-free grammars?
- Try to write the context-free grammar for
  - L1 = { a^n b^n | n ≥ 0 }
  - L2 = { a^n b^n c^n | n ≥ 0 }
  - L3 = { wcw | w ∈ (a|b)* }
  - L4 = { wcw^r | w ∈ (a|b)* }, where w^r is the reverse of w
  - L5 = { a^n b^m c^n d^m | m, n ≥ 0 }
  - Context sensitive: L2, L3, and L5
- Use of the above languages
  - L3: a variable should be declared before its use
  - L5: a^n b^m are the formal parameters defined in two procedures; c^n d^m are the matching numbers of actual parameters
  - L2: printer file: a^n all characters, b^n all backspaces, c^n all underlines
    - First prints all the characters, then goes back to the beginning to print the underlines
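L1 and L4 are the two languages in this list that are context free. As an illustration, the recognizers below mirror the grammars S → a S b | ε for L1 and S → a S a | b S b | c for L4; these grammars and the function names `in_L1` and `in_L4` are assumptions introduced here, since the slides leave the grammars as an exercise.

```python
def in_L1(s):
    """Recognize { a^n b^n | n >= 0 } by mirroring S -> a S b | ε."""
    if s == "":
        return True                  # S -> ε
    if s[0] == "a" and s[-1] == "b":
        return in_L1(s[1:-1])        # S -> a S b
    return False


def in_L4(s):
    """Recognize { w c w^r | w in (a|b)* } by mirroring S -> a S a | b S b | c."""
    if s == "c":
        return True                  # S -> c
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        return in_L4(s[1:-1])        # S -> a S a | b S b
    return False


print(in_L1("aabb"), in_L1("abab"))
print(in_L4("abcba"), in_L4("abcab"))
```

Each recursive call peels off one matching outer pair, which is exactly the nesting a single context-free rule can express. The same trick does not carry over to L2, L3, or L5, where the matching parts are not nested.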

General Concept: Languages and Grammars
- Context-free grammars still have limited power
- What is beyond?
  - Type-0 and type-1 grammars
- Generally, in compilers
  - Features corresponding to L3 and L5 are checked with other mechanisms
    - More efficient

General Concept: Languages and Grammars
- Type-1 grammar
  - Context-sensitive grammar
  - Production rules
    - Include all possible rules in type-2 grammar
    - Also allow rules of the form: αAβ → αγβ
      - Replace A by γ only if found in the context of α and β
      - Left side does not have to be a single non-terminal
      - α, β ∈ (N ∪ T)*; γ must be non-empty (no erase rule)
  - Still belongs to the recursive languages
    - There are languages that are not context sensitive but are recursive

General Concept: Languages and Grammars
- Type-0 grammar
  - Production rules
    - Include all possible forms for the rules
    - Allow rules of the form: α → β
      - α ∈ (N ∪ T)* N (N ∪ T)* (at least one non-terminal)
      - β ∈ (N ∪ T)*
  - Corresponds to the recursively enumerable languages
  - Includes all languages that are recognizable by a Turing machine

General Concept: Languages and Grammars
- What can context-sensitive grammars do?
  - Write a grammar for a^n b^n c^n
      S → aSBC    Generate as many a's as necessary
      S → aBC     Generate the last a; now the string has as many B's and C's as a's
      CB → BC     Switch CB so that B's and C's are in the correct order
      aB → ab     Substitute the first B by b
      bB → bb     Substitute the rest of the B's
      bC → bc     Substitute the first C by c
      cC → cc     Substitute the remaining C's
    - Small note about CB → BC
      - Can be considered as context sensitive in a modified definition
      - α → β with len(α) ≤ len(β) has been proven to produce CSL
  - Derivation: S ⇒ aSBC ⇒ aaBCBC ⇒ aaBBCC ⇒* aabbCC ⇒* aabbcc

General Concept: Languages and Grammars
- What can context-sensitive grammars do?
  - Write a grammar for a^n b^n c^n
      S → aSBC
      S → aBC
      CB → BC
      aB → ab
      bB → bb
      bC → bc
      cC → cc
  - Is it possible to accept strings other than a^n b^n c^n?
    - S ⇒ aSBC ⇒ aaBCBC ⇒* aabcBC ⇒ fail
  - Why are no other strings possible?
    - If the CB → BC switch is done fully
      - Can only substitute sequentially to reach a^n b^n c^n
      - B and C cannot be substituted without a terminal preceding them
    - If the CB → BC switch is not done fully
      - Once a "c" is generated, if there is any remaining B, there is no way to substitute it
  - A simpler version
    - S → abc | aSBc, cB → Bc, bB → bb
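The claim that only strings of the form a^n b^n c^n can be derived can be checked for short strings by exhaustive search over sentential forms. The sketch below is illustrative only: `generated_strings` is a hypothetical helper, rules are encoded as (left side, right side) string pairs, and uppercase letters are assumed to be the non-terminals. The search terminates because no rule shortens the sentential form.

```python
from collections import deque

RULES = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
         ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

# The simpler version from the slide.
SIMPLER_RULES = [("S", "abc"), ("S", "aSBc"), ("cB", "Bc"), ("bB", "bb")]


def generated_strings(max_len, rules=RULES, start="S"):
    """Return all terminal strings of length <= max_len derivable from `start`.

    Breadth-first search over sentential forms; forms longer than max_len
    are discarded, which is safe because no rule shrinks a form.
    """
    seen = {start}
    queue = deque([start])
    results = set()
    while queue:
        form = queue.popleft()
        if form.islower():           # only terminals left
            results.add(form)
            continue
        for lhs, rhs in rules:
            idx = form.find(lhs)
            while idx != -1:         # try the rule at every position
                new = form[:idx] + rhs + form[idx + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                idx = form.find(lhs, idx + 1)
    return results


print(sorted(generated_strings(9), key=len))
```

Dead-end forms such as aabcBC are explored but never reach a terminal string, so they do not appear in the result. This is a bounded check for small lengths, not a proof for all n.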

General Concept: Languages and Grammars
- Language classes (figure: nested sets, from outermost to innermost)
  - Type-0 languages
  - Type-1 languages
  - Type-2 languages
  - Type-3 languages

Syntax Specification and Analysis - Summary
- Read textbook Sections 4.1 – 4.3
  - 4.3.1 and 4.3.2
- Context-free grammar for language description
- Ambiguity
- Classes of grammars and languages