- Number of slides: 51
SAX and more…
Resources used for this presentation
• The Hebrew University of Jerusalem – CS Faculty.
• Wikipedia
• An Introduction to XML and Web Technologies – Course’s Literature.
• http://www.saxproject.org/event.html
• http://www.xml.com/
SAX Parser
• SAX = Simple API for XML
• XML is read sequentially
• When a parsing event happens, the parser invokes the corresponding method of the corresponding handler
• The handlers are the programmer’s implementations of the standard Java API (i.e., interfaces and classes)
SAX Parsers
[Diagram: an XML document (<?xml version="1.0"?> ...) is fed to the SAX parser, which calls back into the handler code]
• When you see the start of the document do …
• When you see the start of an element do …
• When you see the end of an element do …
• XMLReaderFactory: used to create a SAX parser
• ContentHandler: handles document events (start tag, end tag, etc.)
• ErrorHandler: handles parser errors
• DTDHandler: handles the DTD
• EntityResolver: handles entities
Creating a Parser
• The SAX interface is an accepted standard
• There are many implementations from many vendors
  - The standard API does not include an actual implementation, but Sun provides one with the JDK
• We would like to be able to change the implementation used without changing any code in the program
  - How is this done?
System Properties
• Properties are configuration values managed as key/value pairs. In each pair, the key and value are both String values.
• The System class maintains a Properties object that describes the configuration of the current working environment.
  - System properties include information about the current user, the current version of the Java runtime, and the character used to separate components of a file path name.
• Java standard System Properties.
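A minimal sketch of reading and setting system properties. The keys user.name, java.version and file.separator are standard; the key demo.parser and its value are hypothetical and exist only for illustration.

```java
class SystemPropertiesDemo {
    // Returns the value of a system property, or the fallback when it is unset.
    static String lookup(String key, String fallback) {
        return System.getProperty(key, fallback);
    }

    public static void main(String[] args) {
        System.out.println("user.name      = " + lookup("user.name", "?"));
        System.out.println("java.version   = " + lookup("java.version", "?"));
        System.out.println("file.separator = " + lookup("file.separator", "?"));

        // Properties can also be set at run time, or with -Dkey=value on the command line.
        // "demo.parser" is a hypothetical key used only in this sketch.
        System.setProperty("demo.parser", "my.pkg.MyParser");
        System.out.println("demo.parser    = " + lookup("demo.parser", "?"));
    }
}
```

Because every value is a String, a class name can be stored in a property and looked up later, which is what the factory on the next slide relies on.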
Factory Design Pattern
• Have a “factory” class that creates the actual parsers
  - org.xml.sax.helpers.XMLReaderFactory
• The factory checks configurations, mainly the value of a system property, that specify the implementation
• In order to change the implementation, simply change the system property
Read more about XMLReaderFactory Class
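To make the idea concrete, here is a rough sketch of how such a factory could work. It is not the actual XMLReaderFactory implementation: SimpleReaderFactory is a hypothetical class, and the fallback to the JAXP SAXParserFactory is an assumption made so that the sketch runs on a plain JDK.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class SimpleReaderFactory {
    // Hypothetical factory: consult a system property first, then fall back to the JDK default.
    static XMLReader create() throws Exception {
        String driver = System.getProperty("org.xml.sax.driver");
        if (driver != null) {
            // Load the configured implementation by name; it must implement XMLReader.
            return (XMLReader) Class.forName(driver).getDeclaredConstructor().newInstance();
        }
        // No property set: use whatever parser the platform provides.
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Parser class: " + create().getClass().getName());
    }
}
```

The calling code only ever sees the XMLReader interface, so switching vendors means changing the property value and nothing else.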
Creating a SAX Parser
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class EchoWithSax {
    public static void main(String[] args) throws Exception {
        // The factory will look for this system property.
        // The named class implements XMLReader. Setting the property is not a must.
        System.setProperty("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser");
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.parse("world.xml");
    }
}
Read more about XMLReader Interface, Xerces SAXParser class
Implementing the Content Handler
• A SAX parser invokes methods such as startDocument, startElement and endElement of its content handler as it runs
• In order to react to parsing events we must:
  - implement the ContentHandler interface
  - set the parser’s content handler with an instance of our ContentHandler implementation
ContentHandler Methods
• startDocument - parsing begins
• endDocument - parsing ends
• startElement - an opening tag is encountered
• endElement - a closing tag is encountered
• characters - character data (text) is encountered
• ignorableWhitespace - white space that should be ignored (according to the DTD)
• and more...
Read more about ContentHandler Interface
The Default Handler
• The class DefaultHandler implements all handler interfaces (usually, in an empty manner)
  - i.e., ContentHandler, EntityResolver, DTDHandler, ErrorHandler
• An easy way to implement the ContentHandler interface is to extend DefaultHandler
Read more about DefaultHandler Class
A Content Handler Example
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;

public class EchoHandler extends DefaultHandler {
    int depth = 0;

    public void print(String line) {
        for(int i=0; i
A Content Handler Example
    public void startDocument() throws SAXException {
        print("BEGIN");
    }

    public void endDocument() throws SAXException {
        print("END");
    }

    // Attributes: we will discuss this interface later…
    public void startElement(String ns, String localName, String qName, Attributes attrs) throws SAXException {
        print("Element " + qName + "{");
        ++depth;
        for (int i = 0; i < attrs.getLength(); ++i)
            print(attrs.getLocalName(i) + "=" + attrs.getValue(i));
    }
A Content Handler Example
    public void endElement(String ns, String lName, String qName) throws SAXException {
        --depth;
        print("}");
    }

    public void characters(char buf[], int offset, int len) throws SAXException {
        String s = new String(buf, offset, len).trim();
        ++depth;
        print(s);
        --depth;
    }
}
Fixing The Parser
public class EchoWithSax {
    public static void main(String[] args) throws Exception {
        XMLReader reader = XMLReaderFactory.createXMLReader();
        // What would happen without this line?
        reader.setContentHandler(new EchoHandler());
        reader.parse("world.xml");
    }
}
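The handler and driver from the last few slides can be combined into one runnable sketch. Several things here are assumptions rather than the original code: the body of print is reconstructed (indent by depth, then emit the line), output goes to a StringBuilder so it can be inspected, the input is an in-memory string instead of world.xml, and the parser comes from the JAXP SAXParserFactory instead of XMLReaderFactory. The class name EchoSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EchoSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    // Assumed reconstruction of the print helper: indent by depth, then emit the line.
    private void print(String line) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(line).append('\n');
    }

    @Override public void startDocument() { print("BEGIN"); }
    @Override public void endDocument() { print("END"); }

    @Override
    public void startElement(String ns, String localName, String qName, Attributes attrs) {
        print("Element " + qName + " {");
        ++depth;
        // getQName is used so the output does not depend on namespace processing.
        for (int i = 0; i < attrs.getLength(); ++i)
            print(attrs.getQName(i) + "=" + attrs.getValue(i));
    }

    @Override
    public void endElement(String ns, String localName, String qName) {
        --depth;
        print("}");
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        String s = new String(buf, offset, len).trim();
        if (s.isEmpty()) return; // skip whitespace-only chunks
        ++depth;
        print(s);
        --depth;
    }

    // Parses the given XML text and returns the echoed structure.
    static String echo(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        EchoSketch handler = new EchoSketch();
        reader.setContentHandler(handler); // without this line no events reach our code
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        System.out.print(echo("<world><country name=\"Israel\"><city>Jerusalem</city></country></world>"));
    }
}
```

If setContentHandler is never called, the document is still parsed and checked for well-formedness, but nothing is printed because no handler receives the events.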
Empty Elements
• What do you think happens when the parser parses an empty element?
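One way to answer the question is to log the events. The sketch below (EmptyElementDemo is a hypothetical name, and the parser is obtained through the JAXP SAXParserFactory) records start and end events for an empty-element tag and for an explicit open/close pair.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EmptyElementDemo {
    // Returns the element events reported for the given XML text.
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String ns, String local, String qName, Attributes attrs) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String ns, String local, String qName) {
                log.add("end " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<a/>"));     // empty-element tag
        System.out.println(events("<a></a>"));  // start tag followed by end tag
    }
}
```

Both forms produce a startElement followed by an endElement, so a content handler cannot tell them apart.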
Attributes Interface
• The Attributes interface provides access to all attributes of an element
  - getLength(), getQName(i), getValue(i), getType(i), getValue(qname), etc.
• The following are possible types for attributes: CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION
Read more about Attributes Interface
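A short sketch of the Attributes methods listed above. AttributeDump and the sample element are hypothetical; the parser is obtained through the JAXP SAXParserFactory. Since no DTD declares the attributes here, getType reports CDATA for each of them.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributeDump {
    // Maps each attribute name of the named element to "type:value".
    static Map<String, String> attributesOf(String xml, String element) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String ns, String local, String qName, Attributes attrs) {
                if (!qName.equals(element)) return;
                for (int i = 0; i < attrs.getLength(); i++) {
                    result.put(attrs.getQName(i), attrs.getType(i) + ":" + attrs.getValue(i));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample element.
        System.out.println(attributesOf("<item id=\"7\" name=\"pen\"/>", "item"));
    }
}
```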
ErrorHandler Interface
• We implement ErrorHandler to receive error events (similar to implementing ContentHandler)
• DefaultHandler implements ErrorHandler in an empty fashion, so we can extend it (as before)
• An ErrorHandler is registered with
  - reader.setErrorHandler(handler);
• Three methods:
  - void error(SAXParseException ex);
  - void fatalError(SAXParseException ex);
  - void warning(SAXParseException ex);
Parsing Errors
• Fatal errors prevent the parser from continuing to parse
  - For example, the document is not well formed, an unknown XML version is declared, etc.
• Errors (that is, recoverable ones) occur, for example, when the parser is validating and validity constraints are violated
• Warnings occur when abnormal (yet legal) conditions are encountered
  - For example, an entity is declared twice in the DTD
Read more about ErrorHandler Interface
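A minimal sketch of an ErrorHandler that collects what the parser reports. CollectingErrorHandler is a hypothetical name, the parser comes from the JAXP SAXParserFactory, and rethrowing from fatalError is one reasonable policy, not the only one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException ex) {
        messages.add("warning: " + ex.getMessage());
    }

    @Override
    public void error(SAXParseException ex) {
        messages.add("error: " + ex.getMessage()); // recoverable: parsing continues
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        messages.add("fatal (line " + ex.getLineNumber() + "): " + ex.getMessage());
        throw ex; // parsing cannot meaningfully continue
    }

    // Parses the text and returns everything the handler was told about.
    static List<String> check(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Already recorded by fatalError above.
        }
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<a><b></a>")); // not well formed: <b> is never closed
    }
}
```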
EntityResolver and DTDHandler
• The EntityResolver interface lets the programmer supply an alternative source for external entities, e.g. an external DTD
• The DTDHandler interface lets the programmer react to notation and unparsed-entity declarations inside the DTD (Notation Example)
  - These usually appear with external non-XML resources and describe their type
Read more about EntityResolver Interface, DTDHandler Interface
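As an illustration of EntityResolver, the sketch below substitutes an empty input for every external entity, so the external DTD is never fetched, and records which system identifiers the parser asked for. SkipDtdDemo and the URL http://example.invalid/a.dtd are hypothetical; the parser is obtained through the JAXP SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SkipDtdDemo {
    // Parses the text and returns the system IDs the parser tried to resolve.
    static List<String> parseRecordingEntities(String xml) throws Exception {
        List<String> requested = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            // Supply an empty replacement instead of loading the real resource.
            return new InputSource(new StringReader(""));
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document that points at an external DTD.
        String xml = "<!DOCTYPE a SYSTEM \"http://example.invalid/a.dtd\"><a/>";
        System.out.println(parseRecordingEntities(xml));
    }
}
```

Returning null from the resolver instead would tell the parser to open the system identifier itself.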
Features and Properties
• SAX parsers can be configured by setting their features and properties
• Syntax:
  - reader.setFeature("feature-url", boolean)
  - reader.setProperty("property-url", Object)
• Standard feature URLs have the form: http://xml.org/sax/features/feature-name
• Standard property URLs have the form: http://xml.org/sax/properties/prop-name
Feature/Property Examples
• Features:
  - namespaces - are namespaces supported?
  - validation - does the parser validate (against the declared DTD)?
  - http://apache.org/xml/features/nonvalidating/load-external-dtd - set to false to ignore the external DTD (specific to the Xerces implementation)
  Read more about Features
• Properties:
  - lexical-handler - see the next slide...
  Read more about Properties
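A small sketch of setting features on a reader. FeatureDemo is a hypothetical name and the parser comes from the JAXP SAXParserFactory. The Xerces-specific feature is wrapped in a try block because a parser that is not based on Xerces may not recognize it.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureDemo {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Returns a reader with namespace processing on and, where supported, external DTD loading off.
    static XMLReader configuredReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);
        try {
            reader.setFeature(LOAD_EXTERNAL_DTD, false);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Implementation-specific feature: ignore it when the parser does not know it.
        }
        return reader;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespaces = " + configuredReader().getFeature(NAMESPACES));
    }
}
```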
Lexical Events
• Lexical events have to do with the way that a document was written and not with its content
• Examples:
  - A comment is a lexical event (<!-- comment -->)
  - The use of an entity is a lexical event (&gt;)
• These can be dealt with by implementing the LexicalHandler interface and setting the lexical-handler property to an instance of the handler
LexicalHandler Methods
• comment(char[] ch, int start, int length)
• startCDATA()
• endCDATA()
• startEntity(java.lang.String name)
• endEntity(java.lang.String name)
• and more...
Read more about LexicalHandler Interface
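A minimal sketch of receiving lexical events. It extends org.xml.sax.ext.DefaultHandler2, which supplies empty implementations of the LexicalHandler methods, and registers itself through the standard lexical-handler property. CommentCollector and the sample document are hypothetical; the parser is obtained through the JAXP SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class CommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();
    int cdataSections = 0;

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }

    @Override
    public void startCDATA() {
        cdataSections++;
    }

    // Parses the text with this handler registered for both content and lexical events.
    static CommentCollector scan(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CommentCollector handler = new CommentCollector();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CommentCollector c = scan("<a><!-- note --><![CDATA[1 < 2]]></a>");
        System.out.println(c.comments + ", CDATA sections: " + c.cdataSections);
    }
}
```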
Different Approaches
The DOM Approach
• Tree-based API: maps an XML document into an internal tree structure, then allows an application to navigate that tree.
• The application is active.
• Provides random access.
• Remember that it is possible to construct a parse tree using an event-based API, and it is possible to use an event-based API to traverse an in-memory tree. (Actually DOM is a level above SAX)
The DOM Tree
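The SAX Approach
• Event-based API.
• The application is passive.
• Provides serial access.
Given the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<RootElement param="value">
    <FirstElement>Some Text</FirstElement>
    <SecondElement param2="something">
        Pre-Text <Inline>Inlined text</Inline> Post-text.
    </SecondElement>
</RootElement>
This XML document, when passed through a SAX parser, will generate the following sequence of events (pushing)…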
• XML Processing Instruction, named xml, with attributes version equal to "1.0" and encoding equal to "UTF-8" (note: Java SAX parsers do not report the XML declaration to the ContentHandler)
• XML Element start, named RootElement, with an attribute param equal to "value"
• XML Element start, named FirstElement
• XML Text node, with data equal to "Some Text" (note: text processing, with regard to spaces, can be changed)
• XML Element end, named FirstElement
• XML Element start, named SecondElement, with an attribute param2 equal to "something"
• XML Text node, with data equal to "Pre-Text"
• XML Element start, named Inline
• XML Text node, with data equal to "Inlined text"
• XML Element end, named Inline
• XML Text node, with data equal to "Post-text."
• XML Element end, named SecondElement
• XML Element end, named RootElement
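The event list above can be reproduced with a small logging handler. The document text in this sketch is an assumption reconstructed from the listed events (the exact whitespace of the original is unknown), EventSequenceDemo is a hypothetical name, and the parser comes from the JAXP SAXParserFactory. Text chunks are trimmed and empty ones dropped, which is one possible choice for whitespace handling.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventSequenceDemo {
    // Assumed reconstruction of the example document; whitespace may differ from the original.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
          + "<RootElement param=\"value\">"
          + "<FirstElement>Some Text</FirstElement>"
          + "<SecondElement param2=\"something\">"
          + "Pre-Text <Inline>Inlined text</Inline> Post-text."
          + "</SecondElement>"
          + "</RootElement>";

    // Returns one log entry per element or text event, in document order.
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String ns, String local, String qName, Attributes attrs) {
                StringBuilder sb = new StringBuilder("start " + qName);
                for (int i = 0; i < attrs.getLength(); i++) {
                    sb.append(' ').append(attrs.getQName(i)).append('=').append(attrs.getValue(i));
                }
                log.add(sb.toString());
            }

            @Override
            public void endElement(String ns, String local, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                // A parser may deliver one text run in several chunks.
                String text = new String(ch, start, len).trim();
                if (!text.isEmpty()) log.add("text " + text);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String event : events(XML)) System.out.println(event);
    }
}
```

No entry appears for the XML declaration, because the Java ContentHandler is never told about it.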
Pull vs. Push
• SAX is known as a push framework
  - the parser has the initiative
  - the programmer must react to events
• An alternative is a pull framework
  - the programmer has the initiative
  - the parser must react to requests
StAX - Streaming API for XML
• An API to read and write XML documents in the Java programming language (unlike DOM and SAX, which were designed for parsing).
• The StAX approach: the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, 'pulling' the information from the parser as it needs it.
• Thus, the application is active.
• Originally a separate specification (JSR 173); part of the standard Java API since Java SE 6 (package javax.xml.stream).
• For example, here's a simple bit of code that iterates through an XML document and prints out the names of the different elements it encounters:
• Here's the beginning of the output when I ran this across a simple well-formed HTML file:
html
head
title
meta
link
meta
...
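The code from this slide did not survive, so the following is a sketch of what such a loop could look like with the cursor API in javax.xml.stream, not the original listing. ElementNames and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class ElementNames {
    // Pulls events from the parser and collects the name of every start tag.
    static List<String> of(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application asks for the next event; the parser only reacts.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical well-formed input.
        String xml = "<html><head><title>t</title><meta/></head><body/></html>";
        for (String name : of(xml)) System.out.println(name);
    }
}
```

The loop belongs to the application, so it can stop early or skip ahead whenever it wants, which is the practical difference from a SAX handler.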
JDOM
• An implementation of generic XML trees in Java.
• JDOM provides a way to represent XML documents for easy and efficient reading, manipulation, and writing.
• One can construct a tree of elements and then generate an XML file from it.
SAX vs. DOM
A more detailed comparison
Parser Efficiency
• The DOM object built by DOM parsers is usually complicated and requires more memory than the XML file itself
  - A lot of time is spent on construction before use
  - For some very large documents, this may be impractical
• SAX parsers store only local information that is encountered during the serial traversal
• Hence, programming with SAX parsers is, in general, more efficient
Programming using SAX is Difficult
• In some cases, programming with SAX is difficult:
  - How can we find, using a SAX parser, elements e1 with ancestor e2?
  - How can we find, using a SAX parser, elements e1 that have a descendant element e2?
  - How can we find the element e1 referenced by the IDREF attribute of e2?
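The first question can be answered with SAX, but only by keeping state ourselves. The sketch below keeps a stack of the currently open elements and counts every e1 that has an e2 somewhere above it. AncestorFinder is a hypothetical name and the parser is obtained through the JAXP SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AncestorFinder extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>(); // names of currently open elements
    private final String target;
    private final String ancestor;
    int matches = 0;

    AncestorFinder(String target, String ancestor) {
        this.target = target;
        this.ancestor = ancestor;
    }

    @Override
    public void startElement(String ns, String local, String qName, Attributes attrs) {
        // The stack holds exactly the ancestors of the element being opened.
        if (qName.equals(target) && open.contains(ancestor)) matches++;
        open.push(qName);
    }

    @Override
    public void endElement(String ns, String local, String qName) {
        open.pop();
    }

    // Counts elements named target that have an ancestor named ancestor.
    static int count(String xml, String target, String ancestor) throws Exception {
        AncestorFinder finder = new AncestorFinder(target, ancestor);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), finder);
        return finder.matches;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<r><e2><x><e1/></x><e1/></e2><e1/></r>";
        System.out.println(count(xml, "e1", "e2"));
    }
}
```

The other two questions are harder because the answer depends on content that appears later in the document, so the handler has to buffer candidates until it knows the outcome.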
Node Navigation
• SAX parsers do not provide access to elements other than the one currently visited in the serial (DFS) traversal of the document
• In particular,
  - They do not read backwards
  - They do not enable access to elements by ID or name
• DOM parsers enable any traversal method
• Hence, using DOM parsers is usually more comfortable
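For contrast, here is a minimal DOM sketch that looks up an element by name and then walks upward to its parent, something a SAX handler cannot do without its own bookkeeping. DomNavigation and the sample document are hypothetical; the parser comes from the standard DocumentBuilderFactory.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomNavigation {
    // Builds the whole tree in memory.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the name of the parent of the first element with the given name, or null if none exists.
    static String parentOfFirst(Document doc, String name) {
        NodeList hits = doc.getElementsByTagName(name); // access by name, anywhere in the tree
        if (hits.getLength() == 0) return null;
        return hits.item(0).getParentNode().getNodeName(); // navigate upward
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<r><e2><e1/></e2></r>");
        System.out.println(parentOfFirst(doc, "e1"));
    }
}
```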
More DOM Advantages
• You can save time and effort if you send and receive DOM objects instead of XML files
  - But DOM objects are generally larger than the source
• DOM parsers provide a natural integration of XML reading and manipulating
  - e.g., “cut and paste” of XML fragments
Which should we use? DOM vs. SAX
• If your document is very large and you only need a few elements – use SAX
• If you need to manipulate (i.e., change) the XML – use DOM
• If you need to access the XML many times – use DOM (assuming the file is not too large)


