
Scanning & Parsing with Lex and YACC
Hans-Arno Jacobsen, ECE 297

Can we generate code to support mundane coding tasks and save time?
• Powerful, but not easy
• I will give you an example for Milestone 1

• Submissions: 99
• Average for A2: 71%
• Early submission bonus: 1
• Full marks: 5
• 16 teams attempted nonce bonus – 7 got full marks
• 7 teams attempted ACC bonus – 7 got full marks

CoursePeer – try it out!
• Developed by a former ECE 297 student
  – Many of the videos under tips & tricks are from him too
• Short video about CoursePeer
• To sign up and auto-enrol under ECE 297, use this link
  – http://www.crspr.com/?rid=339
• Will have a quick demo and use it on Wednesday for our Q&A session

Know your tools!
• Can we generate code based on a specification of what we want?
• Is the specification simpler than writing a program for doing the same task?
• Fully automated program generation has been a dream since the early days of computing.

Where do we need parsing in the storage server?
• Configuration file (file)
• Bulk loading of data files (file)
• Protocol messages (network)
• Command line arguments (string)

Parsing
• default.conf – the way the disk may see it:
    server_host localhost\n server_port 1111\n table marks\n # This data directory may be an absolute or relative path.\n data_directory ./data\n\n\n EOF
• The same file, line by line:
    server_host localhost
    server_port 1111
    table marks
    data_directory ./data
• Tokens: PROPERTY VALUE, (TABLE-NAME)+, PROPERTY VALUE
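To make the step from raw characters to tokens concrete, here is a minimal hand-written sketch in C. It is not the lecture's scanner (that one is generated by flex); the names `token_kind`, `TOK_*` and `next_token` are hypothetical and exist only for this illustration. It assumes words are separated by blanks, tabs or newlines and that a word starting with '#' begins a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for illustration only; the generated
 * scanner in the lecture uses its own token codes. */
enum token_kind { TOK_PROPERTY, TOK_TABLE, TOK_VALUE, TOK_END };

/* Copies the next whitespace-delimited word at *cursor into buf,
 * classifies it, and advances *cursor past the word.
 * A word starting with '#' begins a comment and ends the line. */
enum token_kind next_token(const char **cursor, char *buf, size_t buflen)
{
    assert(cursor != NULL && *cursor != NULL && buf != NULL && buflen > 0);

    const char *p = *cursor;
    p += strspn(p, " \t\n");               /* skip white space */
    if (*p == '\0' || *p == '#') {         /* nothing left, or a comment */
        *cursor = p + strlen(p);
        buf[0] = '\0';
        return TOK_END;
    }

    size_t len = strcspn(p, " \t\n");      /* length of the word */
    size_t n = len < buflen - 1 ? len : buflen - 1;
    memcpy(buf, p, n);                     /* truncate if buf is too small */
    buf[n] = '\0';
    *cursor = p + len;

    if (strcmp(buf, "server_host") == 0 ||
        strcmp(buf, "server_port") == 0 ||
        strcmp(buf, "data_directory") == 0)
        return TOK_PROPERTY;
    if (strcmp(buf, "table") == 0)
        return TOK_TABLE;
    return TOK_VALUE;
}
```

Calling `next_token` repeatedly on the line "server_port 1111\n" yields a property token, a value token, and then the end marker. Writing and maintaining code like this by hand for every input format is exactly the effort the generators are meant to save.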

Scenarios
Where would we like to save time by writing a quick language processor? (Conceptually speaking, and in our storage servers)
• Languages
  – Data description language
  – Script language
  – Markup language
  – Data schema & data
  – Query language
  – Output formatting (Web, LaTeX, PDF, Word, Excel)
• System configurations
• Storage server configuration
• Workload generation
• Benchmarking

Parser generation from 30K feet
• Specification (written by developer) → Generator → Generated code
• Generated code + Other code (written by developer) → Compiler / Linker → Executable

Scanning & parsing I
• Input: … er_host localhost\n server_port 1111\n table marks\n # Th …
• Scanning: produces tokens (PROPERTY VALUE …)
• Parsing: recognizes the structure PROPERTY VALUE (TABLE-NAME)+ PROPERTY VALUE
• Processing: verify content, add to data structures, …

Regular expressions
• Patterns: (TABLE-NAME)+
  – TABLE-NAME TABLE-NAME
  – …
• Regular expressions (formal languages)
• Extended regular expressions (UNIX)

Scanning & parsing II
• Parsing is really two steps
  – Scanning (a.k.a. tokenizing or lexical analysis)
  – Parsing, i.e., analysis of structure and syntax according to a grammar (i.e., a set of rules)
• flex is the scanner generator (open source)
  – Fast Lex, for lexical analysis
• YACC is the parser generator
  – Yet Another Compiler Compiler, for structural and syntax analysis
• Lex and YACC work together
• The generated parser calls the generated scanner to obtain the next token
• We use flex (fast Lex) and Bison (GNU YACC)
• There are many other tools for Java, C++, …, some of which combine Lex/Yacc into one tool (e.g., javacc)

Objectives for today
• Cover the basics of Lex & Yacc
• Everybody should have an appreciation of the potential of these tools
• There is a lot more detail that remains unsaid
• To challenge you

Lex & YACC overview
• Input stream: server_host localhost\n server_port 1111\n table marks\n # This data directory may be an absolute or relative path.\n data_directory ./data\n\n\n EOF
• input stream → Lexical Analyzer → token stream (PROPERTY VALUE …) → Structural Analyzer → output
• Output is defined by the actions in the parser specification (often an in-memory representation of the input)

LEXICAL ANALYSIS WITH LEX

Lex introduction
• Synonyms: lexical analyzer, scanner, lexer, tokenizer
• Input specification (*.l) → flex → lex.yy.c → C compiler → Lexical Analyzer
• input stream → Lexical Analyzer → token stream
• flex is fast Lex
• You can control the name of the generated file
• You generate the lexical analyzer by using flex

Lex
• Input specification for lex – the “program”
  – Three parts: Definitions, Rules, User code
  – Use “%%” as a delimiter for each part
• First part: Definitions
  – Options used by flex inside the scanner
  – Defines variables & macros
  – Code within “%{” and “%}” is copied directly into the scanner (e.g., global variables, header files)
• Second part: Rules
  – Patterns and corresponding actions
    • Actions are executed when the corresponding pattern(s) match
  – Patterns are defined by regular expressions

Parsing the configuration file of Milestone 1 (config_parser.l)
    %{
    #include "config_parser.tab.h"
    ...
    %}
    a2Z     [a-zA-Z]
    host    server_host
    port    server_port
    dir     data_directory
    %%
    {host}      { return HOST_PROPERTY; }
    {port}      { return PORT_PROPERTY; }
    table       { return TABLE; }
    {dir}       { return DDIR_PROPERTY; }
    [\t\n ]+    { }
    #.*\n       { }
    {a2Z}*      { yylval.sval = strdup(yytext); return STRING; }
    [0-9]+      { yylval.pval = (int) atoi(yytext); return PORT_NUMBER; }
    .           { return yytext[0]; }
    …
• The names a2Z, host, port and dir are shorthands for use in the rules below
• In each rule, the left-hand side is the pattern and the code in braces is the action

flex pattern matching principles
• Actions are executed when patterns match
  – Tokens are returned to the caller; next pattern …
• Patterns match a given input character or string only once
  – The input stream is consumed
• flex executes the action for the longest possible matching input
  – If several patterns match the same longest input, the one listed first wins, so the order of patterns in the specification is important
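The two selection principles (longest match first, then rule order) can be sketched in plain C. This is only an illustration of the selection logic under simplifying assumptions, not how flex is implemented (flex compiles all patterns into one automaton). The matcher functions and the names `select_rule` and `RULE_*` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule reports how many leading characters of s it matches. */
typedef size_t (*matcher_fn)(const char *s);

static size_t match_keyword_table(const char *s)
{
    return strncmp(s, "table", 5) == 0 ? 5 : 0;
}

static size_t match_letters(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]))
        n++;
    return n;
}

static size_t match_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

enum { RULE_NONE = -1, RULE_TABLE = 0, RULE_LETTERS = 1, RULE_DIGITS = 2 };

/* Rules in specification order: earlier entries win ties. */
static const matcher_fn rules[] = {
    match_keyword_table, match_letters, match_digits
};

/* Returns the index of the selected rule and stores the match length. */
int select_rule(const char *s, size_t *matched_len)
{
    assert(s != NULL && matched_len != NULL);

    int best = RULE_NONE;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t len = rules[i](s);
        if (len > best_len) {      /* strictly longer only: ties keep the earlier rule */
            best = (int)i;
            best_len = len;
        }
    }
    *matched_len = best_len;
    return best;
}
```

With these rules, the input "table marks" selects the keyword rule (both the keyword and the letter rule match five characters, and the keyword comes first), whereas "tables" selects the letter rule because it matches six characters. This is why the keyword patterns in config_parser.l are listed before the general string pattern.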

flex regular expressions by example I
(Really: extended regular expressions)
    `x'           match the character 'x'
    `.'           any character (byte) except newline
    `[xyz]'       match either an 'x', a 'y', or a 'z'
    `[abj-oZ]'    match an 'a', a 'b', any letter from 'j' through 'o', or a 'Z'
    `[^A-Z]'      a "negated character class", i.e., any character EXCEPT those in the class
    `[^A-Z\n]'    any character EXCEPT an uppercase letter or a newline

flex regular expressions by example II
(r is any regular expression)
    `r*'          zero or more r's
    `r+'          one or more r's
    `r?'          zero or one r (that is, "an optional r")
    `r{2,5}'      anywhere from two to five r's
    `r{2,}'       two or more r's
    `r{4}'        exactly 4 r's
    `<<EOF>>'     an end-of-file
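Several of these operators can be tried outside flex with the POSIX `regcomp`/`regexec` functions, which accept extended regular expressions. The sketch below assumes a POSIX system; the helper name `matches` is hypothetical. Note that flex and POSIX differ in some details (for example, `<<EOF>>` and the `{name}` shorthands are flex-specific), so this only illustrates the shared operators.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the POSIX extended regular expression
 * pattern, 0 if it does not, and -1 if the pattern fails to compile.
 * Anchor the pattern with ^ and $ to require a whole-string match. */
int matches(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, `matches("^[abj-oZ]$", "k")` returns 1 and `matches("^r{2,5}$", "r")` returns 0, in line with the descriptions in the tables above.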

flex regular expressions
• There are many more expressions, see the manual
• Form complex expressions
  – E.g.: IP address, names, …
• The expression syntax is used in other tools as well (well worth learning)

Parsing the configuration file of Milestone 1 (config_parser.l, with end-of-file rule)
    %{
    #include "config_parser.tab.h"
    ...
    %}
    a2Z     [a-zA-Z]
    host    server_host
    port    server_port
    dir     data_directory
    %%
    {host}      { return HOST_PROPERTY; }
    {port}      { return PORT_PROPERTY; }
    table       { return TABLE; }
    {dir}       { return DDIR_PROPERTY; }
    [\t\n ]+    { }
    #.*\n       { }
    {a2Z}*      { yylval.sval = strdup(yytext); return STRING; }
    [0-9]+      { yylval.pval = (int) atoi(yytext); return PORT_NUMBER; }
    .           { return yytext[0]; }
    <<EOF>>     { return 0; }
• Sample input:
    server_host localhost
    server_port 1111
    table marks
    data_directory ./data
• yylval.sval and yylval.pval are user-defined variables in YACC (they convey the token value to YACC)

PARSING WITH YACC

YACC introduction
• Input specification (*.y) → YACC → y.tab.c → C compiler → Syntax analyzer / parser
• token stream (e.g., via flex) → Syntax analyzer / parser → output defined by actions in the parser specification
• You can control the name of the generated file
• From the specified grammar, YACC generates a parser which recognizes “sentences” according to the grammar

YACC
• Input specification for YACC (similar to flex)
  – Three parts: Definitions, Rules, User code
  – Use “%%” as a delimiter for each part
• First part: Definitions
  – Definition of tokens for the second part and for use by flex
  – Definition of variables for use by the parser code
• Second part: Rules
  – Grammar for the parser
• Third part: User code
  – The code in this part is copied into the parser generated by YACC

Configuration file parser Milestone 1 (config_parser.y, definition section)
    %{
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>   /* malloc, used in the rule actions */

    struct table *tl, *t;
    struct configuration *c;

    /* define a structure for the configuration information */
    struct configuration {
        char *host;
        int port;
        struct table *tlist;
        char *data_dir;
    };

    /* define a linked list of table names */
    struct table {
        char *table_name;
        struct table *next;
    };

Configuration file parser Milestone 1 (config_parser.y, definition section cont’d.)
    %}
    %union {
        char *sval;   // String value (user defined)
        int pval;     // Port number value (user defined)
    }
    %token <sval> STRING
    %token <pval> PORT_NUMBER
    %token HOST_PROPERTY PORT_PROPERTY DDIR_PROPERTY TABLE
    %%

Configuration file parser Milestone 1 (config_parser.y, grammar rules section, simplified)
    property_list:
        HOST_PROPERTY STRING
        PORT_PROPERTY PORT_NUMBER
        table_list
        data_directory
        ;
    table_list:
        table_list TABLE STRING
      | TABLE STRING
        ;
    data_directory:
        DDIR_PROPERTY STRING
        ;
    %%

Grammar rules section, details: data_directory (config_parser.y)
    /* $1 is DDIR_PROPERTY, $2 is STRING */
    data_directory:
        DDIR_PROPERTY STRING
        {
            c = (struct configuration *) malloc(sizeof(struct configuration));
            // Check c for NULL
            c->data_dir = strdup($2);
        }
        ;
• c points to the struct configuration declared in the definition section (host, port, tlist, data_dir)

Grammar rules section, details: property_list (config_parser.y)
    property_list:
        HOST_PROPERTY STRING
        PORT_PROPERTY PORT_NUMBER
        table_list
        data_directory
        {
            c->host = strdup($2);
            c->port = $4;
            c->tlist = tl;
        }
        ;
• c points to the struct configuration declared in the definition section (host, port, tlist, data_dir)

table_list is a recursive rule
• Example table specification in the configuration file
    table MyCourses
    table MyMarks
    table MyFriends
• Rule
    table_list:
        table_list TABLE STRING
      | TABLE STRING
        ;
• Terminology
  – table_list is called a non-terminal
  – TABLE & STRING are terminals

Recursive rule execution
• Rule
    table_list:
        table_list TABLE STRING
      | TABLE STRING
        ;
• Input
    table MyCourses
    table MyMarks
    table MyFriends
• Steps
  – “table MyCourses” matches TABLE STRING and becomes a table_list
  – That table_list followed by “table MyMarks” matches table_list TABLE STRING
  – The resulting table_list followed by “table MyFriends” matches table_list TABLE STRING again

Grammar rules section, details: table_list (config_parser.y)
    table_list:
        table_list TABLE STRING        /* $1 $2 $3 */
        {
            t = (struct table *) malloc(sizeof(struct table));
            t->table_name = strdup($3);
            t->next = tl;              /* new node points at the current list */
            tl = t;                    /* new node becomes the head */
        }
      | TABLE STRING                   /* $1 $2 */
        {
            tl = (struct table *) malloc(sizeof(struct table));
            tl->table_name = strdup($2);
            tl->next = NULL;
        }
        ;
• tl and t are the struct table pointers declared in the definition section (table_name, next)
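The list manipulation in these actions can be tried on its own, outside the parser. The following is a minimal sketch in C; the helper names `prepend_table` and `free_tables` are hypothetical and not part of the lecture code. It uses malloc plus strcpy instead of strdup only to stay within ISO C, and it adds the NULL checks that the slide leaves out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct table {
    char *table_name;
    struct table *next;
};

/* Mirrors the action of the recursive alternative: the new node is
 * placed in front of the existing list. Returns the new head, or
 * NULL if allocation fails (the old list is left untouched). */
struct table *prepend_table(struct table *head, const char *name)
{
    assert(name != NULL);

    struct table *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->table_name = malloc(strlen(name) + 1);
    if (t->table_name == NULL) {
        free(t);
        return NULL;
    }
    strcpy(t->table_name, name);
    t->next = head;
    return t;
}

/* Releases every node and the name it owns. */
void free_tables(struct table *head)
{
    while (head != NULL) {
        struct table *next = head->next;
        free(head->table_name);
        free(head);
        head = next;
    }
}
```

Prepending MyCourses, MyMarks and MyFriends in that order leaves MyFriends at the head of the list, so the tables end up in the reverse of their order in the configuration file. Code that walks `tlist` should not assume file order.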

How to invoke the parser
    int main (int argc, char **argv) {
        FILE *f;
        extern FILE *yyin;
        if (argc == 2) {
            f = fopen(argv[1], "r");
            if (!f) { … // error handling … }
            yyin = f;
            while (!feof(yyin)) {
                if (yyparse() != 0) {
                    …
                    yyerror("");
                    exit(1);   /* non-zero exit status signals the parse failure */
                }
            }
            fclose(f);
        }
        …
• yylex() calls the generated scanner
• By default it is called within yyparse()

In the Makefile
    lexer: config_parser.l
    	${LEX} config_parser.l
    	${CC} ${CFLAGS} ${INCLUDE} -c lex.yy.c

    yaccer: config_parser.y
    	${YACC} -d config_parser.y
    	${CC} ${CFLAGS} ${INCLUDE} -c config_parser.tab.c

    parser: config_parser.tab.o lex.yy.o
    	${CC} ${CFLAGS} ${INCLUDE} -c parser.c
    	${CC} -o p ${CFLAGS} ${INCLUDE} lex.yy.o config_parser.tab.o parser.o

Benefits
• Faster development
  – Compared to manual implementation
• Easier to change the specification and generate a new parser
  – Than to modify 1000s of lines of code to add, change, or delete a feature
• Less error-prone, as code is generated
• Cost: learning curve
  – Invest once, amortized over a 40+ year career

If you want to know more
• Lecture, examples and some recommended reading are enough to tackle all of the parsing for Milestone 3 & 4
• 3rd- and 4th-year lectures on Compilers may show you the algorithms behind & inside Lex & YACC
• Lectures on Computability and Theory of Computation may also show you these algorithms

A flex specification
    %{
    #include <stdio.h>
    #include …

The header
    %{
    #include <stdio.h>
    #include …

The rules
    %%
    " "             ;
    [a-z]           { c = yytext[0]; yylval = c - 'a'; return(LETTER); }
    [0-9]           { c = yytext[0]; yylval = c - '0'; return(DIGIT); }
    [^a-z0-9\b]     { c = yytext[0]; return(c); }
• yytext: the string associated with the token

The rules (annotated)
    %%
    " "             ;
    [a-z]           { c = yytext[0]; yylval = c - 'a'; return(LETTER); }
    [0-9]           { c = yytext[0]; yylval = c - '0'; return(DIGIT); }
    [^a-z0-9\n]     { c = yytext[0]; return(c); }
• [a-z] sets yylval to the character’s alphabetical order
• [0-9] sets yylval to the digit’s numerical value
• Otherwise the rule simply returns that character; presumably it’s an operator: +, *, -, etc.

Simple example
• Implement a calculator which can recognize adding or subtracting of numbers
    [linux33]% ./y_calc
    1+101
    = 102
    [linux33]% ./y_calc
    1000-300+200+100
    = 1000
    [linux33]%

Example – the Lex part
    %{
    #include <math.h>
    #include "y.tab.h"
    extern int yylval;
    %}
    %%
    [0-9]+      { yylval = atoi(yytext); return NUMBER; }
    [\t ]+      ;                  /* Do nothing for white space */
    \n          return 0;          /* End of the logic */
    .           return yytext[0];
    %%
• The part between %{ and %} is the definitions; the part between the %% delimiters is the rules
• In each rule, the left-hand side is the pattern and the right-hand side is the action

Example – the Yacc part
    %token NAME NUMBER
    %%
    statement:  NAME '=' expression
             |  expression              { printf("= %d\n", $1); }
             ;
    expression: expression '+' NUMBER   { $$ = $1 + $3; }
             |  expression '-' NUMBER   { $$ = $1 - $3; }
             |  NUMBER                  { $$ = $1; }
             ;
• The part before %% is the definitions; the part after it is the rules
• Include the Yacc library (-ly) when linking
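For comparison, the same calculation can be written by hand. The sketch below is not generated by Yacc; it is a small C function (hypothetical name `eval_expression`) that accepts NUMBER followed by any number of '+' NUMBER or '-' NUMBER pairs and combines them left to right, which is the order in which the left-recursive `expression` rule reduces. It assumes non-negative decimal numbers that fit in a long and does not check for overflow.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Evaluates NUMBER (('+' | '-') NUMBER)* from left to right.
 * Input ends at '\0' or at a newline, as in the Lex part.
 * Returns 0 and stores the value in *result on success,
 * or -1 on a syntax error. */
int eval_expression(const char *s, long *result)
{
    assert(s != NULL && result != NULL);

    char *end;
    while (*s == ' ' || *s == '\t')
        s++;
    if (!isdigit((unsigned char)*s))
        return -1;
    long value = strtol(s, &end, 10);      /* expression: NUMBER */
    s = end;

    for (;;) {
        while (*s == ' ' || *s == '\t')
            s++;
        if (*s == '\0' || *s == '\n')
            break;
        char op = *s++;
        if (op != '+' && op != '-')
            return -1;
        while (*s == ' ' || *s == '\t')
            s++;
        if (!isdigit((unsigned char)*s))
            return -1;
        long operand = strtol(s, &end, 10);
        s = end;
        /* expression: expression '+' NUMBER | expression '-' NUMBER */
        value = (op == '+') ? value + operand : value - operand;
    }
    *result = value;
    return 0;
}
```

For the two inputs from the sample session this produces 102 and 1000. Adding multiplication with the correct precedence would require restructuring this function, whereas in the Yacc specification it is a matter of adding rules, which is the maintenance benefit the lecture argues for.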