• Number of slides: 75

Processing XML with Java
Representation and Management of Data on the Internet
A comprehensive tutorial about XML processing with Java
XML tutorial of W3Schools

Parsers
• What is a parser?
[Figure: a Formal grammar and an Input are fed to a Parser, which produces Analyzed Data: the structure(s) of the input, according to the atomic elements and their relationships (as described in the grammar)]

XML-Parsing Standards
• We will consider two standard parsing APIs for accessing XML: DOM, a W3C standard, and SAX, a de facto standard that originated on the XML-DEV mailing list
• DOM
  - converts XML into a tree of objects
  - a "random access" protocol
• SAX
  - a "serial access" protocol
  - event-driven parsing

XML Examples

[Figure: world.xml, annotated to show the root element, the validating DTD file and a reference to an entity. Visible content: Israel, 6,199,008, Jerusalem, Ashdod, France, 60,424,213]

XML Tree Model
[Figure: the example document drawn as a tree; legend: element, attribute, simple content]

world.dtd
<!ELEMENT countries (country*)>
<!ELEMENT country (name, population?, city*)>
<!ATTLIST country continent …
[Figure annotations: parsed vs. not parsed character data; a default value, as opposed to a required attribute]
Open world.xml in your browser. Check world2.xml for a #PCDATA example.

Namespaces
[Figure: sales.xml, annotated to show an "xhtml" namespace declaration, a default namespace declaration and namespace overriding. Visible fragments: <xhtml:em>DBI:</xhtml:em>; <![CDATA[Where I Learned <xhtml>.]]>, which is (non-parsed) character data; "My favorite book!"]

[Figure: sales.xml with the element <xhtml:h1> highlighted]
Namespace: "http://www.w3.org/1999/xhtml"; local name: "h1"; qualified name: "xhtml:h1"

[Figure: sales.xml with the element <par> highlighted]
Namespace: "http://www.cs.huji.ac.il/~dbi/comments"; local name: "par"; qualified name: "par"

[Figure: sales.xml with the element <title> highlighted]
Namespace: ""; local name: "title"; qualified name: "title"

[Figure: sales.xml with the element <xhtml:b> highlighted]
Namespace: "http://www.w3.org/1999/xhtml"; local name: "b"; qualified name: "xhtml:b"
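
The four name triples above can be checked with a namespace-aware DOM parser. This is a minimal sketch: the XML string is a hypothetical stand-in for sales.xml that only reuses the namespaces from these slides, and the class name NamespaceNames is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceNames {
    // Hypothetical stand-in for sales.xml, reusing the namespaces from the slides
    static final String SALES =
        "<book xmlns:xhtml='http://www.w3.org/1999/xhtml'>"
        + "<title><xhtml:h1>DBI</xhtml:h1></title>"
        + "<comment xmlns='http://www.cs.huji.ac.il/~dbi/comments'>"
        + "<par>My <xhtml:b>favorite</xhtml:b> book!</par>"
        + "</comment></book>";

    // Lists "namespace | local name | qualified name" for every element, in document order
    static List<String> describe(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default; without it getLocalName() returns null
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            collect(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static void collect(Node n, List<String> out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            String ns = n.getNamespaceURI(); // null for elements in no namespace
            out.add((ns == null ? "" : ns) + " | " + n.getLocalName() + " | " + n.getNodeName());
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    public static void main(String[] args) {
        describe(SALES).forEach(System.out::println);
    }
}
```

Note that <title> ends up in no namespace because the default namespace is declared only on <comment>, while <par> inherits it.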

DOM – Document Object Model

DOM Parser
• DOM = Document Object Model
• Parser creates a tree object out of the document
• User accesses data by traversing the tree
  - The tree and its traversal conform to a W3C standard
• The API allows for constructing, accessing and manipulating the structure and content of XML documents

[Figure: world.xml again (Israel, 6,199,008, Jerusalem, Ashdod, France, 60,424,213)]

The DOM Tree

Using a DOM Tree
[Figure: XML File → DOM Parser → DOM Tree (in memory) → API → Application]

Creating a DOM Tree
• A DOM tree is generated by a DocumentBuilder
• The builder is generated by a factory, in order to be implementation independent
• The factory is chosen according to the system configuration
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("world.xml");
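
A self-contained version of these three lines, as a sketch: it parses a small inline string (a hypothetical fragment in the spirit of world.xml) instead of the file, by wrapping the string in an org.xml.sax.InputSource. The class name BuildDomTree is illustrative. The factory implementation can be switched with the javax.xml.parsers.DocumentBuilderFactory system property without touching this code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BuildDomTree {
    // Parses XML text into a DOM tree; builder.parse(...) also accepts a File, an InputStream or a URI string
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // implementation chosen by configuration
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<countries><country continent='Asia'><name>Israel</name></country></countries>");
        // The document element is the root element of the XML, not the Document node itself
        System.out.println(doc.getDocumentElement().getTagName()); // countries
        System.out.println(doc.getElementsByTagName("name").item(0).getTextContent()); // Israel
    }
}
```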

Configuring the Factory
• The methods of the document-builder factory let you configure how documents are built
• For example
  - factory.setValidating(true)
  - factory.setIgnoringComments(false)
Read more about the DocumentBuilderFactory class and the DocumentBuilder class
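
A sketch of what setValidating(true) does in practice, assuming a simplified internal DTD modeled on world.dtd (attribute declarations omitted) and an illustrative class name. With validation on, DTD violations are reported to the builder's ErrorHandler as recoverable errors rather than thrown, so the sketch registers a handler that collects them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidatingBuilder {
    // Simplified internal DTD modeled on world.dtd (an assumption, not the original file)
    static final String DTD = "<!DOCTYPE countries ["
        + "<!ELEMENT countries (country*)>"
        + "<!ELEMENT country (name, population?, city*)>"
        + "<!ELEMENT name (#PCDATA)><!ELEMENT population (#PCDATA)><!ELEMENT city (#PCDATA)>"
        + "]>";

    // Returns the validity-error messages reported while parsing xml
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);        // check the document against its DTD
            factory.setIgnoringComments(false); // keep Comment nodes in the tree (the default)
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); } // recoverable
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate(DTD + "<countries><country><name>Israel</name></country></countries>").size()); // 0
        // <name> must come before <population>, so this is a validity error
        System.out.println(validate(DTD + "<countries><country><population>1</population></country></countries>"));
    }
}
```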

The Node Interface
• The nodes of the DOM tree include
  - a special root (denoted document)
    • The Document interface returned by builder.parse(…) actually extends the Node interface
  - element nodes
  - text nodes and CDATA sections
  - attributes
  - comments
  - and more...
• Every node in the DOM tree implements the Node interface

Node Interfaces in a DOM Tree
[Figure as appears in: "The XML Companion" - Neil Bradley. The interfaces extending Node: Document, DocumentFragment, DocumentType, Element, Attr, CharacterData (extended by Comment and by Text, which CDATASection extends in turn), Entity, EntityReference, Notation and ProcessingInstruction; NodeList and NamedNodeMap are separate collection interfaces. Note on DocumentFragment: a lightweight fragment of the document that can hold several sub-trees]
Run FragmentVsElement with the 1st argument fragment/element.
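
The source of FragmentVsElement is not part of these slides, so the following is an independent sketch of the behavior it presumably demonstrates: when a DocumentFragment is appended, its children are moved into the target and the fragment itself disappears, whereas an appended Element stays in the tree as a wrapper. The element name group and the class name FragmentDemo are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class FragmentDemo {
    // Appends two <country> elements to a fresh root, wrapped either in a
    // DocumentFragment or in an ordinary Element; returns the root's child count
    static int rootChildCount(boolean useFragment) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("countries");
            doc.appendChild(root);
            Node wrapper = useFragment ? doc.createDocumentFragment() : doc.createElement("group");
            wrapper.appendChild(doc.createElement("country"));
            wrapper.appendChild(doc.createElement("country"));
            root.appendChild(wrapper); // a fragment is replaced by its children; an element is inserted as-is
            return root.getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fragment: " + rootChildCount(true));  // 2: the two <country> elements
        System.out.println("element:  " + rootChildCount(false)); // 1: the <group> wrapper
    }
}
```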

Interfaces in the DOM Tree
[Figure: a DOM tree whose nodes are labeled with their interfaces: Document, DocumentType, Element, Attribute, Text, Comment, EntityReference]

Node Navigation
• Every node has a specific location in the tree
• The Node interface specifies methods for tree navigation
  - Node getFirstChild();
  - Node getLastChild();
  - Node getNextSibling();
  - Node getPreviousSibling();
  - Node getParentNode();
  - NodeList getChildNodes();
  - NamedNodeMap getAttributes();

Node Navigation (cont.)
[Figure: a node and its neighbors, reached via getParentNode(), getFirstChild(), getLastChild(), getChildNodes(), getPreviousSibling() and getNextSibling()]
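
A short sketch exercising these navigation methods on a hypothetical <country> fragment: it walks the children forwards and backwards, and climbs with getParentNode() until it returns null above the Document node. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NavigateDom {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Children of n, walked forwards with getFirstChild()/getNextSibling()
    static List<String> forward(Node n) {
        List<String> names = new ArrayList<>();
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) names.add(c.getNodeName());
        return names;
    }

    // The same children, walked backwards with getLastChild()/getPreviousSibling()
    static List<String> backward(Node n) {
        List<String> names = new ArrayList<>();
        for (Node c = n.getLastChild(); c != null; c = c.getPreviousSibling()) names.add(c.getNodeName());
        return names;
    }

    // Number of getParentNode() steps from n up to the Document node
    static int depth(Node n) {
        int d = 0;
        for (Node p = n.getParentNode(); p != null; p = p.getParentNode()) d++;
        return d;
    }

    public static void main(String[] args) {
        Document doc = parse("<country><name>Israel</name><city>Jerusalem</city><city>Ashdod</city></country>");
        Node country = doc.getDocumentElement();
        System.out.println(forward(country));  // [name, city, city]
        System.out.println(backward(country)); // [city, city, name]
        System.out.println(depth(country.getFirstChild().getFirstChild())); // 3: text -> name -> country -> document
    }
}
```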

Node Properties
• Every node has
  - a type
  - a name
  - a value
  - attributes
• The roles of these properties differ according to the node types
• Nodes of different types implement different interfaces (that extend Node)

Names, Values and Attributes
Interface | nodeName | nodeValue | attributes
Attr | name of attribute | value of attribute | null
CDATASection | "#cdata-section" | content of the CDATA section | null
Comment | "#comment" | content of the comment | null
Document | "#document" | null | null
DocumentFragment | "#document-fragment" | null | null
DocumentType | document type name | null | null
Element | tag name | null | NamedNodeMap
Entity | entity name | null | null
EntityReference | name of entity referenced | null | null
Notation | notation name | null | null
ProcessingInstruction | target | entire content excluding the target | null
Text | "#text" | content of the text node | null
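
To see several rows of this table in action, the sketch below formats nodeName and nodeValue for different node types in a small hypothetical document (class and method names are illustrative). The CDATA section survives as its own node only because the factory's coalescing option is off by default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NameValueDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Formats a node as "nodeName=nodeValue"
    static String describe(Node n) {
        return n.getNodeName() + "=" + n.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse("<city capital='yes'><!--note-->Jerusalem<![CDATA[<b>]]></city>");
        Element city = doc.getDocumentElement();
        System.out.println(describe(doc));                              // #document=null
        System.out.println(describe(city));                             // city=null
        System.out.println(describe(city.getAttributeNode("capital"))); // capital=yes
        for (Node c = city.getFirstChild(); c != null; c = c.getNextSibling()) {
            System.out.println(describe(c)); // #comment=note, #text=Jerusalem, #cdata-section=<b>
        }
        System.out.println(city.getFirstChild().getAttributes()); // null: only elements have a NamedNodeMap
    }
}
```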

Node Types - getNodeType()
ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12
if (myNode.getNodeType() == Node.ELEMENT_NODE) {
    // process node
    …
}
Read more about the Node interface

import org.w3c.dom.*;
import javax.xml.parsers.*;

public class EchoWithDom {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Drop ignorable whitespace, e.g. white spaces used for indentation in
        // non-mixed data elements. This depends on the DTD's content model, so it
        // only takes effect when the parser is validating.
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("world.xml");
        new EchoWithDom().echo(doc);
    }

    private void echo(Node n) {
        print(n);
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            // Attribute nodes are not included among the children, so visit them separately
            NamedNodeMap atts = n.getAttributes();
            ++depth;
            for (int i = 0; i < atts.getLength(); i++)
                echo(atts.item(i));
            --depth;
        }
        depth++;
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling())
            echo(child);
        depth--;
    }

    private int depth = 0;

    // Indexed by node type (1-12); index 0 is unused
    private String[] NODE_TYPES = {
        "", "ELEMENT", "ATTRIBUTE", "TEXT", "CDATA", "ENTITY_REF", "ENTITY",
        "PROCESSING_INST", "COMMENT", "DOCUMENT", "DOCUMENT_TYPE", "DOCUMENT_FRAG", "NOTATION" };

    private void print(Node n) {
        for (int i = 0; i < depth; i++)
            System.out.print(" ");
        System.out.print(NODE_TYPES[n.getNodeType()] + ": ");
        System.out.print("Name: " + n.getNodeName());
        System.out.println(" Value: " + n.getNodeValue());
    }
}
Run EchoWithDom and pay attention to the default values.

Another Example
public class WorldParser {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("world.xml");
        printCities(doc);
    }

Another Example (cont.)
    public static void printCities(Document doc) {
        NodeList cities = doc.getElementsByTagName("city");
        for(int i=0; i …
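
The rest of printCities is cut off in the slide above. Purely as an assumption about what it does, here is a minimal complete loop over getElementsByTagName that collects the text of each <city> element. The method name cityNames and the inline document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CityList {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical completion of printCities: returns (rather than prints) each city's text
    static List<String> cityNames(Document doc) {
        NodeList cities = doc.getElementsByTagName("city"); // every <city>, in document order
        List<String> names = new ArrayList<>();
        for (int i = 0; i < cities.getLength(); i++) {
            names.add(cities.item(i).getTextContent().trim());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<countries>"
                + "<country><name>Israel</name><city>Jerusalem</city><city>Ashdod</city></country>"
                + "<country><name>France</name></country>"
                + "</countries>");
        for (String name : cityNames(doc)) {
            System.out.println(name); // Jerusalem, then Ashdod
        }
    }
}
```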

Normalizing the DOM Tree
• Normalizing a DOM tree has two effects:
  - Combine adjacent textual nodes
  - Eliminate empty textual nodes
  (Such nodes are typically created by node manipulation)
• To normalize, apply the normalize() method to the document element
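
A minimal sketch of both effects, using text nodes created by hand (which is how such nodes typically arise): a <city> element gets two adjacent text nodes with an empty one between them, and normalize() merges them into one. The class and element names are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {
    // Builds <city> from split and empty text nodes, normalizes it, and reports the result
    static String run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element city = doc.createElement("city");
            doc.appendChild(city);
            city.appendChild(doc.createTextNode("Jeru"));
            city.appendChild(doc.createTextNode(""));      // an empty text node
            city.appendChild(doc.createTextNode("salem")); // adjacent to another text node
            int before = city.getChildNodes().getLength();
            doc.getDocumentElement().normalize();          // merge adjacent text, drop empty text
            int after = city.getChildNodes().getLength();
            return before + " -> " + after + ": " + city.getFirstChild().getNodeValue();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // 3 -> 1: Jerusalem
    }
}
```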

Node Manipulation
• Children of a node in a DOM tree can be manipulated: added, edited, deleted, moved, copied, etc.
• To construct new nodes, use the methods of Document
  - createElement, createAttribute, createTextNode, createCDATASection, etc.
• To manipulate a node, use the methods of Node
  - appendChild, insertBefore, removeChild, replaceChild, setNodeValue, cloneNode(boolean deep), etc.

Node Manipulation (cont.)
[Figure as appears in "The XML Companion" - Neil Bradley: insertBefore (New inserted before Ref), replaceChild (Old replaced by New), cloneNode with deep = 'false' and with deep = 'true']
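
The manipulation methods from the last two slides, as a small sketch on a hand-built <country> element (the names are hypothetical). It applies insertBefore, replaceChild and removeChild, then compares a shallow clone with a deep one.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ManipulateDom {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) names.add(c.getNodeName());
        return names;
    }

    // Applies insertBefore, replaceChild and removeChild; returns the final child names
    static List<String> edit() {
        Document doc = newDocument();
        Element country = doc.createElement("country");
        doc.appendChild(country);
        Element city = doc.createElement("city");
        country.appendChild(doc.createElement("name"));
        country.appendChild(city);                                // [name, city]
        Element population = doc.createElement("population");
        country.insertBefore(population, city);                   // [name, population, city]
        country.replaceChild(doc.createElement("capital"), city); // [name, population, capital]
        country.removeChild(population);                          // [name, capital]
        return childNames(country);
    }

    // Child counts of a shallow and a deep clone of <city>Ashdod</city>
    static int[] cloneChildCounts() {
        Document doc = newDocument();
        Element city = doc.createElement("city");
        city.appendChild(doc.createTextNode("Ashdod"));
        Node shallow = city.cloneNode(false); // the element (and its attributes) only
        Node deep = city.cloneNode(true);     // the whole subtree
        return new int[] { shallow.getChildNodes().getLength(), deep.getChildNodes().getLength() };
    }

    public static void main(String[] args) {
        System.out.println(edit()); // [name, capital]
        int[] counts = cloneChildCounts();
        System.out.println(counts[0] + " " + counts[1]); // 0 1
    }
}
```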

SAX – Simple API for XML

SAX Parser
• SAX = Simple API for XML
• XML is read sequentially
• When a parsing event happens, the parser invokes the corresponding method of the corresponding handler
• The handlers are the programmer's implementations of the standard SAX Java API (i.e., interfaces and classes)

[Figure, animated over several slides: world.xml read by a SAX parser, with each event highlighted in turn: Start Document, Start Element, Start Element, Comment, Start Element, Characters, End Element, End Element, End Document]

SAX Parsers
[Figure: a SAX parser notifying the application: "When you see the start of the document do …", "When you see the start of an element do …", "When you see the end of an element do …"]

[Figure: the main SAX types, annotated: one is used to create a SAX parser; one handles document events (start tag, end tag, etc.); one handles parser errors; one handles the DTD; one handles entities]

Creating a Parser
• The SAX interface is an accepted standard
• There are many implementations from many vendors
  - The standard API does not include an actual implementation, but Sun provides one with the JDK
• We would like to be able to change the implementation used without changing any code in the program
  - How is this done?

Factory Design Pattern
• Have a "factory" class that creates the actual parsers
  - org.xml.sax.helpers.XMLReaderFactory
• The factory checks configurations, mainly the value of a system property, that specify the implementation
  - Can be set outside the Java code: a configuration file, a command-line argument, etc.
• In order to change the implementation, simply change the system property
Read more about the XMLReaderFactory class

Creating a SAX Parser
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class EchoWithSax {
    public static void main(String[] args) throws Exception {
        // Names the class that implements XMLReader
        System.setProperty("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser");
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.parse("world.xml");
    }
}
Read more about the XMLReader interface and the Xerces SAXParser class
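
Two practical caveats about the code above, offered as a hedged alternative rather than a correction: org.apache.xerces.parsers.SAXParser is only available when the Apache Xerces jar is on the classpath (the JDK bundles its own copy under an internal package name), and XMLReaderFactory has been deprecated since Java 9 in favor of SAXParserFactory. A sketch of obtaining an XMLReader through JAXP instead, with an illustrative class name:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class CreateSaxReader {
    // Obtains an XMLReader through JAXP instead of naming a driver class in org.xml.sax.driver
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance(); // implementation chosen by configuration
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Parses xml with no handlers registered; returns false if the input is not well formed
    static boolean parses(String xml) {
        try {
            newReader().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<countries/>")); // true
    }
}
```

As with the original program, nothing is printed for the document's content, because no ContentHandler has been registered yet.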

Implementing the Content Handler
• A SAX parser invokes methods such as startDocument, startElement and endElement of its content handler as it runs
• In order to react to parsing events we must:
  - implement the ContentHandler interface
  - set the parser's content handler with an instance of our ContentHandler implementation

ContentHandler Methods
• startDocument - parsing begins
• endDocument - parsing ends
• startElement - an opening tag is encountered
• endElement - a closing tag is encountered
• characters - character data (text) is encountered
• ignorableWhitespace - white space that should be ignored (according to the DTD)
• and more...
Read more about the ContentHandler interface
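
A sketch of a ContentHandler (via DefaultHandler) that simply records which of these methods the parser calls, and in what order, for a small hypothetical input. The class name EventLog is illustrative. A parser may in principle split text across several characters calls; for a short text like this one it is typically delivered in a single call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    // Parses xml and returns the recorded callbacks
    static List<String> log(String xml) {
        try {
            EventLog handler = new EventLog();
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        log("<country><name>Israel</name></country>").forEach(System.out::println);
    }
}
```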

The Default Handler
• The class DefaultHandler implements all handler interfaces (usually, in an empty manner)
  - i.e., ContentHandler, EntityResolver, DTDHandler, ErrorHandler
• An easy way to implement the ContentHandler interface is to extend DefaultHandler
Read more about the DefaultHandler class

A Content Handler Example
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;

public class EchoHandler extends DefaultHandler {
    int depth = 0;

    public void print(String line) {
        for(int i=0; i …

A Content Handler Example
    public void startDocument() throws SAXException {
        print("BEGIN");
    }

    public void endDocument() throws SAXException {
        print("END");
    }

    public void startElement(String ns, String localName, String qName, Attributes attrs) throws SAXException {
        // The Attributes interface is discussed later
        print("Element " + qName + "{");
        ++depth;
        for (int i = 0; i < attrs.getLength(); ++i)
            print(attrs.getLocalName(i) + "=" + attrs.getValue(i));
    }

A Content Handler Example
    public void endElement(String ns, String lName, String qName) throws SAXException {
        --depth;
        print("}");
    }

    public void characters(char buf[], int offset, int len) throws SAXException {
        String s = new String(buf, offset, len).trim();
        ++depth;
        print(s);
        --depth;
    }
}

Fixing The Parser
public class EchoWithSax {
    public static void main(String[] args) throws Exception {
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setContentHandler(new EchoHandler()); // What would happen without this line?
        reader.parse("world.xml");
    }
}
Run EchoWithSax2, then run EchoWithSax.

Empty Elements
• What do you think happens when the parser parses an empty element?
Run EchoWithSax3.
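
One way to answer the question empirically, as a sketch with hypothetical names: record the events for <city/> and for <city></city>. Both produce a startElement immediately followed by an endElement, with no characters call in between, so a ContentHandler cannot tell the two forms apart.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EmptyElements extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("text"); // not reached for an empty element
    }

    static List<String> trace(String xml) {
        try {
            EmptyElements handler = new EmptyElements();
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trace("<city/>"));       // [start city, end city]
        System.out.println(trace("<city></city>")); // [start city, end city]
    }
}
```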

Attributes Interface
• The Attributes interface provides access to all attributes of an element
  - getLength() (the number of attributes), getQName(i), getValue(i), getType(i), getValue(qname), etc.
• The following are possible types for attributes: CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION
• There is no distinction between attributes that are specified explicitly in the document and those that are defaulted from the DTD
Read more about the Attributes interface
Run EchoWithSax, check the "capital" attribute and compare to the XML source.
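
A sketch that dumps every attribute the parser reports, using a hypothetical internal DTD that gives <city> a default capital attribute. The class name AttributeDump is illustrative. The defaulted attribute arrives alongside the one written in the document, exactly like an explicit one, and an attribute with no declaration is reported with type CDATA.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeDump extends DefaultHandler {
    private final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        for (int i = 0; i < attrs.getLength(); i++) {
            // element@name=value (type)
            seen.add(qName + "@" + attrs.getQName(i) + "=" + attrs.getValue(i) + " (" + attrs.getType(i) + ")");
        }
    }

    static List<String> dump(String xml) {
        try {
            AttributeDump handler = new AttributeDump();
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document: capital is never written, only defaulted by the internal DTD
        String xml = "<!DOCTYPE city [<!ATTLIST city capital CDATA 'no'>]><city name='Ashdod'/>";
        dump(xml).forEach(System.out::println);
    }
}
```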

ErrorHandler Interface
• We implement ErrorHandler to receive error events (similar to implementing ContentHandler)
• DefaultHandler implements ErrorHandler, so we can extend it (as before); its warning and error methods do nothing, while fatalError throws the exception it receives
• An ErrorHandler is registered with
  - reader.setErrorHandler(handler);
• Three methods (each may throw SAXException):
  - void error(SAXParseException ex);
  - void fatalError(SAXParseException ex);
  - void warning(SAXParseException ex);

Parsing Errors
• Fatal errors prevent the parser from continuing
  - For example, the document is not well formed, an unknown XML version is declared, etc.
• Errors (that is, recoverable ones) occur, for example, when the parser is validating and validity constraints are violated
• Warnings occur when abnormal (yet legal) conditions are encountered
  - For example, an entity is declared twice in the DTD
Read more about the ErrorHandler interface
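
A sketch of an ErrorHandler (again via DefaultHandler) that logs each callback, with hypothetical names and inputs. A malformed document reaches fatalError and parsing stops; a validity violation reaches error and, because this handler does not throw there, parsing continues to the end of the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class CollectingErrors extends DefaultHandler {
    private final List<String> log;

    CollectingErrors(List<String> log) { this.log = log; }

    @Override public void warning(SAXParseException e) { log.add("warning"); }
    @Override public void error(SAXParseException e) { log.add("error"); } // recoverable: parsing continues

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        log.add("fatal");
        throw e; // the document cannot be parsed any further
    }

    // Parses xml (optionally validating), appending one entry per callback to log;
    // returns true if parsing reached the end of the document
    static boolean parse(String xml, boolean validating, List<String> log) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new CollectingErrors(log));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE countries [<!ELEMENT countries (country*)><!ELEMENT country (#PCDATA)>]>";
        List<String> log = new ArrayList<>();
        System.out.println(parse("<countries><country></countries>", false, log) + " " + log);
        log.clear();
        System.out.println(parse(dtd + "<countries><city/></countries>", true, log) + " " + log);
    }
}
```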

EntityResolver and DTDHandler
• The interface EntityResolver enables the programmer to specify a new source for external entities, e.g., an external DTD
• The interface DTDHandler enables the programmer to react to notation and unparsed-entity declarations inside the DTD
  - These usually appear with external non-XML resources and describe their type
Read more about the EntityResolver interface and the DTDHandler interface
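
A sketch of EntityResolver in use, under stated assumptions: the document refers to an external DTD at a made-up URL, and the resolver serves a hypothetical one-line DTD from memory instead, so the entity &il; can be expanded without any network access. The class name LocalDtdResolver is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdResolver extends DefaultHandler {
    // Hypothetical in-memory replacement for the external DTD
    static final String LOCAL_DTD = "<!ENTITY il 'Israel'>";

    final List<String> requested = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);                             // remember what the parser asked for
        return new InputSource(new StringReader(LOCAL_DTD)); // serve it from memory instead
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static LocalDtdResolver parse(String xml) {
        try {
            LocalDtdResolver handler = new LocalDtdResolver();
            // SAXParser.parse(..., DefaultHandler) also registers the handler as the EntityResolver
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        LocalDtdResolver h = parse("<!DOCTYPE country SYSTEM 'http://example.com/world.dtd'><country>&il;</country>");
        System.out.println(h.requested);
        System.out.println(h.text);
    }
}
```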

Features and Properties
• SAX parsers can be configured by setting their features and properties
• Syntax:
  - reader.setFeature("feature-url", boolean)
  - reader.setProperty("property-url", Object)
• Standard feature URLs have the form: http://xml.org/sax/features/feature-name
• Standard property URLs have the form: http://xml.org/sax/properties/prop-name

Feature/Property Examples
• Features:
  - namespaces - are namespaces supported?
  - validation - does the parser validate (against the declared DTD)?
  - http://apache.org/xml/features/nonvalidating/load-external-dtd
    • Ignore the external DTD when not validating? (specific to the Xerces implementation)
  Read more about Features
• Properties:
  - xml-string - the actual text that caused the current event (read-only, with getProperty())
  - lexical-handler - see the next slide...
  Read more about Properties
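
A sketch of toggling the standard namespaces feature on an XMLReader obtained through JAXP (whose factory is not namespace-aware by default) and observing how the names passed to startElement change. The element, the namespace URI urn:demo and the class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceFeature extends DefaultHandler {
    String uri, localName, qName; // names reported for the root element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (this.qName == null) { // record only the first (root) element
            this.uri = uri;
            this.localName = localName;
            this.qName = qName;
        }
    }

    // Parses xml with the standard "namespaces" feature switched on or off
    static NamespaceFeature parse(String xml, boolean namespaces) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/namespaces", namespaces);
            NamespaceFeature handler = new NamespaceFeature();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<x:a xmlns:x='urn:demo'/>";
        for (boolean ns : new boolean[] { true, false }) {
            NamespaceFeature h = parse(xml, ns);
            System.out.println("namespaces=" + ns + ": uri='" + h.uri + "' local='" + h.localName + "' qName='" + h.qName + "'");
        }
    }
}
```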

Lexical Events
• Lexical events have to do with the way that a document was written and not with its content
• Examples:
  - A comment is a lexical event (<!-- ... -->)
  - The use of an entity is a lexical event (&gt;)
• These can be dealt with by implementing the LexicalHandler interface, and setting the lexical-handler property to an instance of the handler

LexicalHandler Methods
• comment(char[] ch, int start, int length)
• startCDATA()
• endCDATA()
• startEntity(java.lang.String name)
• endEntity(java.lang.String name)
• and more...
Read more about the LexicalHandler interface
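
A sketch of a LexicalHandler, using org.xml.sax.ext.DefaultHandler2, which implements LexicalHandler with empty methods. The handler is registered through the lexical-handler property, and it records comments and CDATA boundaries, which a plain ContentHandler never sees. The input and class name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalEvents extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        events.add("comment:" + new String(ch, start, length));
    }

    @Override public void startCDATA() { events.add("startCDATA"); }
    @Override public void endCDATA() { events.add("endCDATA"); }

    static List<String> log(String xml) {
        try {
            LexicalEvents handler = new LexicalEvents();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Lexical events go to a separate handler, registered as a property
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(log("<city><!--capital--><![CDATA[Jerusalem]]></city>"));
    }
}
```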

SAX vs. DOM

Parser Efficiency
• The DOM object built by DOM parsers is usually complicated and requires more memory than the XML file itself
  - A lot of time is spent on construction before use
  - For some very large documents, this may be impractical
• SAX parsers store only local information that is encountered during the serial traversal
• Hence, programming with SAX parsers is, in general, more efficient

Programming using SAX is Difficult
• In some cases, programming with SAX is difficult:
  - How can we find, using a SAX parser, elements e1 with ancestor e2?
  - How can we find, using a SAX parser, elements e1 that have a descendant element e2?
  - How can we find the element e1 referenced by the IDREF attribute of e2?
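
The first question still has a fairly small SAX answer: since SAX reports elements in document order, a counter of the e2 elements that are currently open is enough. The other two questions are harder because the answer depends on content that arrives later, so a handler would have to buffer what it has seen. A minimal sketch for the first question only, with hypothetical names:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AncestorQuery extends DefaultHandler {
    private final String target;   // the "e1" elements we want
    private final String ancestor; // the "e2" element that must enclose them
    private int openAncestors = 0; // how many e2 elements are open right now
    private int matches = 0;

    AncestorQuery(String target, String ancestor) {
        this.target = target;
        this.ancestor = ancestor;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Test before counting this element, so an element is not its own ancestor
        if (qName.equals(target) && openAncestors > 0) matches++;
        if (qName.equals(ancestor)) openAncestors++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(ancestor)) openAncestors--;
    }

    // Counts target elements that have an element named ancestor above them
    static int count(String xml, String target, String ancestor) {
        try {
            AncestorQuery handler = new AncestorQuery(target, ancestor);
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.matches;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<countries><country><city>Jerusalem</city></country><city>Paris</city></countries>";
        System.out.println(count(xml, "city", "country")); // 1: only the first <city> is inside a <country>
    }
}
```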

Node Navigation
• SAX parsers do not provide access to elements other than the one currently visited in the serial (DFS) traversal of the document
• In particular,
  - They do not read backwards
  - They do not enable access to elements by ID or name
• DOM parsers enable any traversal method
• Hence, using DOM parsers is usually more comfortable

More DOM Advantages
• A DOM object is "compiled" XML
• You can save time and effort if you send and receive DOM objects instead of XML files
  - But DOM objects are generally larger than the source
• DOM parsers provide a natural integration of XML reading and manipulating
  - e.g., "cut and paste" of XML fragments

Which should we use? DOM vs. SAX
• If your document is very large and you only need a few elements – use SAX
• If you need to manipulate (i.e., change) the XML – use DOM
• If you need to access the XML many times – use DOM (assuming the file is not too large)