
Processing of structured documents
Spring 2003, Part 1
Helena Ahonen-Myka

Course organization
- 581290-5 laudatur course, 2 cu
- Lectures (in Finnish)
  - 21.1.-20.2., Tue 12-14 and Thu 10-12, room A217
  - not obligatory
- Exercise sessions
  - 27.1.-28.2., Mon 16-18 and Tue 14-16, room C454
  - course assistant: Olli Lahti
  - not obligatory
  - project work included

Requirements
- Exam (Thu 6.3. at 16-20): 45 points
- Project (deadline Fri 14.3.): 15 points
  - integrated into the exercise sessions
  - returning a report is obligatory; attending the exercise sessions is voluntary
- Maximum: 60 points

Outline (preliminary)
1. Structure representations
   - grammatical descriptions
   - data model issues, information sets (XML DTD, …)
   - XML Schema
2. Processing and transferring XML data
   - SAX, DOM
   - Web services (SOAP, WSDL, UDDI)

Outline (continued)
3. Traversing and querying structured documents
   - XPath
   - XML Query
4. XML Linking
5. Metadata: RDF

Prerequisites
- You should know the basics of XML:
  - DTD, elements, attributes, syntax
  - XSLT (basics), formatting
- Some programming experience is needed

Project work
- Project work is integrated into the weekly exercises
- A "large" example that lets us play with the concepts and tools discussed in the course
- Each exercise session includes one subtask
  - the solution is discussed in the exercise session
- Solutions to the subtasks have to be presented as a report (written in HTML)
- Return the report by 14.3. (as a URL; instructions will be given later)

1. Structure descriptions
- Regular expressions, context-free grammars -> What is XML? (XML document type definitions)
- Data modelling, information sets
- XML Schema

Regular expressions
- A way to describe a set of strings over an alphabet (of characters, events, elements, …)
- Many uses:
  - text searching (e.g. Emacs, grep, Perl)
  - grammatical formalisms (e.g. XML DTDs)
- Relevant for document structures: they can specify what kind of structural content is allowed in different document components
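
The text-searching use can be made concrete with a minimal grep-like sketch in Python using the standard `re` module. The helper name `grep` and the sample text are made up for illustration.

```python
import re
from typing import List


def grep(pattern: str, text: str) -> List[str]:
    """Return the lines of `text` that contain a match for `pattern`, like grep."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


# Hypothetical sample text, one element per line
SAMPLE = """<title>Structured documents</title>
<author>A. Writer</author>
<section>Regular expressions</section>
<section>Context-free grammars</section>"""

# Lines that start with a <section> start tag
print(grep(r"^<section>", SAMPLE))
# ['<section>Regular expressions</section>', '<section>Context-free grammars</section>']
```

Searching in this way treats the document as flat lines of characters. The slides that follow apply the same idea to sequences of elements instead.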

Regular expressions
- A regular expression over an alphabet $\Sigma$ is either
  - $\emptyset$ (the empty set)
  - $\varepsilon$ (epsilon; sometimes written $\lambda$)
  - $a$, where $a \in \Sigma$
  - $R \mid S$ (choice; sometimes written $R \cup S$)
  - $RS$ (catenation), or
  - $R^*$ (Kleene closure)
- where $R$ and $S$ are regular expressions

Regular expressions
- A regular expression $E$ denotes a language (a set of strings) $L(E)$:
  - $L(\emptyset) = \emptyset$ (the empty set)
  - $L(\varepsilon) = \{\varepsilon\}$ (singleton set of the empty string)
  - $L(a) = \{a\}$ (singleton set of $a$)
  - $L(R \mid S) = L(R) \cup L(S) = \{w \mid w \in L(R) \text{ or } w \in L(S)\}$
  - $L(RS) = L(R)L(S) = \{xy \mid x \in L(R) \text{ and } y \in L(S)\}$
  - $L(R^*) = L(R)^* = \{x_1 \dots x_n \mid x_k \in L(R),\ k = 1, \dots, n;\ n \ge 0\}$
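
The inductive definition can be executed almost literally. The sketch below is a minimal Python version that rests on two encoding choices of our own. First, a regular expression is a nested tuple: `("empty",)`, `("eps",)`, `("sym", a)`, `("alt", R, S)`, `("cat", R, S)` or `("star", R)`. Second, a string is a tuple of symbols, so a symbol can be a whole element name. Because L(R*) is infinite whenever L(R) contains a non-empty string, the hypothetical function `lang(e, k)` returns only the words of length at most k.

```python
from typing import Set, Tuple

Word = Tuple[str, ...]  # a string over the alphabet, as a tuple of symbols


def lang(e: tuple, k: int) -> Set[Word]:
    """Words of L(e) with length <= k, following the inductive definition of L."""
    tag = e[0]
    if tag == "empty":  # L(empty set) = empty set
        return set()
    if tag == "eps":    # L(epsilon) = {epsilon}
        return {()}
    if tag == "sym":    # L(a) = {a}
        return {(e[1],)} if k >= 1 else set()
    if tag == "alt":    # L(R|S) = L(R) union L(S)
        return lang(e[1], k) | lang(e[2], k)
    if tag == "cat":    # L(RS) = {xy | x in L(R), y in L(S)}
        left, right = lang(e[1], k), lang(e[2], k)
        return {x + y for x in left for y in right if len(x) + len(y) <= k}
    if tag == "star":   # L(R*) = {x1...xn | xi in L(R), n >= 0}
        base = lang(e[1], k)
        result: Set[Word] = {()}  # n = 0 gives the empty string
        frontier: Set[Word] = {()}
        while frontier:
            # extend the newest words by one more non-empty word of L(R)
            frontier = {x + y for x in frontier for y in base
                        if y and len(x) + len(y) <= k} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regular expression: {e!r}")


if __name__ == "__main__":
    # a (b | c)*
    e = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "b"), ("sym", "c"))))
    print(sorted(lang(e, 2)))  # [('a',), ('a', 'b'), ('a', 'c')]
```

Each branch mirrors one line of the definition. The length bound is the only addition, and it is needed so that closure terminates.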

Example
- Structure of an article:
  - $\Sigma$ = {title, author, date, section}
  - title followed by an optional list of authors, followed by an optional date, followed by one or more sections:
    title author* (date | $\varepsilon$) section section*
- Common abbreviations:
  - $E? = (E \mid \varepsilon)$; $E+ = E E^*$
  - -> title author* date? section+

L(title author* date? section+) includes:
- title author date section
- title author section
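
Python's `re` syntax happens to include the same operators and abbreviations (`*`, `?`, `+`, `|`), so the article content model can be checked directly. The encoding in this sketch is our own and is not how a DTD validator works. Each child element name is written as a `name ` token (the name followed by a space) so that names cannot run together, and `fullmatch` requires the whole sequence to conform.

```python
import re
from typing import List

# title author* date? section+, one "name " token per child element
ARTICLE_MODEL = re.compile(r"title (?:author )*(?:date )?(?:section )+")


def conforms(children: List[str]) -> bool:
    """True if the list of child element names is a word of L(title author* date? section+)."""
    return ARTICLE_MODEL.fullmatch("".join(name + " " for name in children)) is not None


print(conforms(["title", "author", "date", "section"]))  # True
print(conforms(["title", "author", "section"]))          # True
print(conforms(["title", "date"]))                       # False: at least one section is required
```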

Expressive power of regular expressions
- Operations:
  - catenation -> sequential order
  - choice -> also optional parts
  - closure -> repetition, optional repetition
- Operations can be nested -> more complex expressions
- … but we cannot express nested structures (e.g. elements that contain elements of the same type to arbitrary depth) -> context-free grammars
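
A context-free grammar can refer to a nonterminal inside its own definition, and that recursion is what regular expressions lack. The sketch below is a recursive-descent recognizer in Python for a hypothetical grammar of our own, `Section ::= "<section>" "title" Section* "</section>"`, over a list of tokens. Each nested section is handled by a recursive call, so start and end tags are matched at any depth.

```python
from typing import List


# Hypothetical grammar:  Section ::= "<section>" "title" Section* "</section>"
def parse_section(tokens: List[str], i: int) -> int:
    """Recognize one Section starting at tokens[i]; return the index after it, or raise ValueError."""
    if i >= len(tokens) or tokens[i] != "<section>":
        raise ValueError(f"expected <section> at position {i}")
    i += 1
    if i >= len(tokens) or tokens[i] != "title":
        raise ValueError(f"expected title at position {i}")
    i += 1
    # Section*: keep parsing subsections while the next token opens one
    while i < len(tokens) and tokens[i] == "<section>":
        i = parse_section(tokens, i)
    if i >= len(tokens) or tokens[i] != "</section>":
        raise ValueError(f"expected </section> at position {i}")
    return i + 1


def is_section(tokens: List[str]) -> bool:
    """True if the whole token list is exactly one (possibly nested) Section."""
    try:
        return parse_section(tokens, 0) == len(tokens)
    except ValueError:
        return False


nested = ["<section>", "title", "<section>", "title", "</section>", "</section>"]
print(is_section(nested))       # True
print(is_section(nested[:-1]))  # False: the outer section is never closed
```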

<collection>
  <article>
    <title></title>
    <author></author>
    <date></date>
    <section></section>
  </article>
  <article>
    <title></title>
    <section></section>
  </article>
</collection>
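
Reading such a document with Python's standard `xml.etree.ElementTree` gives the element tree directly. The sketch below collects each article's child element names and checks them against the article content model (title author* date? section+). The names `DOC`, `article_children` and `conforms` are ours. The regular expression only ever sees one level of the tree at a time, because the nesting itself is handled by the XML parser.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

DOC = """<collection>
  <article>
    <title></title>
    <author></author>
    <date></date>
    <section></section>
  </article>
  <article>
    <title></title>
    <section></section>
  </article>
</collection>"""

# Content model title author* date? section+, one "name " token per child
ARTICLE_MODEL = re.compile(r"title (?:author )*(?:date )?(?:section )+")


def article_children(xml_text: str) -> List[List[str]]:
    """For every <article> directly under the root, list its child element names in order."""
    root = ET.fromstring(xml_text)
    return [[child.tag for child in article] for article in root.findall("article")]


def conforms(children: List[str]) -> bool:
    """True if the child sequence matches the article content model."""
    return ARTICLE_MODEL.fullmatch("".join(name + " " for name in children)) is not None


for names in article_children(DOC):
    print(names, conforms(names))
# ['title', 'author', 'date', 'section'] True
# ['title', 'section'] True
```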
