Web Standards and the HyLiFe Project
(including authentication and distributed searching)

Brian Kelly
UK Web Focus
UKOLN
University of Bath

Email Address: B.Kelly@ukoln.ac.uk
URL: http://www.ukoln.ac.uk/

UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee of the Higher Education Funding Councils, as well as by project funding from the JISC’s Electronic Libraries Programme and the European Union. UKOLN also receives support from the University of Bath where it is based.

UK Web Focus / W3C
UK Web Focus:
• JISC funded post based at UKOLN (Bath Univ)
• Advises UK HE community on web issues
• Represents JISC on W3C
W3C (World Wide Web Consortium):
• International consortium, with headquarters at MIT, INRIA and Keio University (Japan)
• Coordinates development of web protocols
• Four domains:
  – Architecture
  – Technology & Society
  – User Interface
  – Web Accessibility

What Are Your Interests?
What interests do you have in web standards and technologies?

Contents
• Introduction
• Web Standards Overview
• Web Standards:
  – Data Formats
  – Transport
  – Addressing
  – Metadata
• Distributed Searching
• Authentication
• Deployment Issues
• Questions
Aims of Talk
• To give brief overview of web architecture
• To describe developments to web standards
• To review emerging developments with metadata, distributed searching and authentication
• To briefly address implementation models

Standardisation
W3C
• Produces W3C Recommendations on Web protocols
• Managed approach to developments
• Protocols initially developed by W3C members
• Decisions made by W3C, influenced by member and public review
• Examples: PNG, HTML, HTTP
Proprietary
• De facto standards
• Often initially appealing (cf PowerPoint, PDF)
• May emerge as standards
• Examples: HTML extensions, PDF and Java?
ISO
• Produces ISO Standards
• Can be slow moving and bureaucratic
• Produce robust standards
• Examples: Z39.50, Java?
IETF
• Produces Internet Drafts on Internet protocols
• Bottom-up approach to developments
• Protocols developed by interested individuals
• "Rough consensus and working code"
• Examples: HTTP, URN, whois++

The Web Vision
Tim Berners-Lee's (and W3C's) vision for the Web:
• Evolvability is critical
• Automation of information management: If a decision can be made by machine, it should
• All structured data formats should be based on XML
• Migrate HTML to XML
• All logical assertions to map onto RDF model
• All metadata to use RDF
See keynote talk at WWW7 conference at <URL: http://www.w3.org/Talks/1998/0415-Evolvability/slide1-1.htm>

Web Protocols
Web initially based on three simple protocols:
• Data Formats: HTML (HyperText Markup Language) provides the data format for native documents
• Addressing: URLs (Uniform Resource Locators) provide an addressing mechanism for web resources
• Transport: HTTP (HyperText Transfer Protocol) defines transfer of resources between client and server

HTML History
• 1992: HTML 1.0. Unpublished specification.
• 1994: HTML 2.0. Spec. based on innovations from NCSA (forms and inline images!)
• 1995: HTML 3.0. Proposed spec. (renamed from HTML+). Very comprehensive
  – Failed to complete IETF standardisation
  – Little implementation experience
• Proprietary: Introduction of proprietary HTML elements by Netscape and Microsoft
• 1997: HTML 3.2. Spec. based on description of mainstream innovations in marketplace
• 1998: HTML 4.0. Current recommendation
Dilemma: Proprietary extensions cause problems. But experiments are needed

HTML 4.0, CSS 2.0 and DOM
HTML 4.0 used in conjunction with CSS 2.0 (Cascading Style Sheets) and the DOM provides an architecturally pure, yet functionally rich environment
HTML 4.0 - W3C-Rec
• Improved forms
• Hooks for stylesheets
• Hooks for scripting languages
• Table enhancements
• Better printing
CSS 2.0 - W3C-Rec
• Support for all HTML formatting
• Positioning of HTML elements
• Multiple media support
DOM - W3C-Rec
• Document Object Model
• Hooks for scripting languages
• Permits changes to HTML & CSS properties and content
CSS Problems
• Changes during CSS development
• Netscape & IE incompatibilities
• Continued use of browsers with known bugs

HTML Limitations
HTML 4.0 / CSS 2.0 have limitations:
• Difficulties in introducing new elements
  – Time-consuming standardisation process (<ABBREV>)
  – Dictated by browser vendor (<BLINK>, <MARQUEE>)
• Area may be inappropriate for standardisation:
  – Covers specialist area (maths, music, ...)
  – Application-specific (<STUD-NUM>)
• HTML is a display (output) format
• HTML's lack of arbitrary structure limits functionality:
  – Find all memos copied to John Smith
  – How many unique tracks on Jackson Browne CDs

XML
XML:
• Extensible Markup Language
• A lightweight SGML designed for network use
• Addresses HTML's lack of evolvability
• Arbitrary elements can be defined (<STUDENTNUMBER>, <PART-NO>, etc)
• Agreement achieved quickly - XML 1.0 became W3C Recommendation in Feb 1998
• Support from industry (SGML vendors, Microsoft, etc.)
• Support in Netscape 5 and IE 5

XML Concepts
Well-formed XML resources:
• Make end-tags explicit: <LI>...</LI>
• Make empty elements explicit: <IMG ... />
• Quote attributes: <IMG SRC="logo" HEIGHT="20"/>
• Use consistent upper/lower case
Valid XML resources:
• Need DTD
XML Namespaces:
• Mechanism for ensuring unique XML elements:
  <?xmlns:FOO="http://foo.org/1998-001" prefix="i">
  <P>Insert <i:PART>M-471</i:PART></P>

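The well-formedness rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser as the checker; the sample strings are illustrative and are not taken from any real document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples (assumptions), one per rule on the slide.
html_style = "<UL><LI>One<LI>Two</UL>"            # end-tags omitted, as HTML allows
xml_style = "<UL><LI>One</LI><LI>Two</LI></UL>"   # end-tags explicit
unquoted = "<IMG SRC=logo HEIGHT=20/>"            # attribute values not quoted
quoted = '<IMG SRC="logo" HEIGHT="20"/>'          # quoted, empty element explicit
mixed_case = "<P>text</p>"                        # inconsistent upper/lower case

for sample in (html_style, xml_style, unquoted, quoted, mixed_case):
    print(is_well_formed(sample), sample)
```

Note that this only tests well-formedness. Checking validity would additionally need the DTD and a validating parser, which the standard library parser used here is not.
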
XML Deployment
Ariadne issue 15 has article on "What Is XML?"
Describes how XML support can be provided:
• Natively by new browsers
• Back end conversion of XML to HTML
• Client-side conversion of XML to HTML / CSS
• Java rendering of XML
The last three are examples of intermediaries
See http://www.ariadne.ac.uk/issue15/what-is/

XLink, XPointer and XSL
XLink will provide sophisticated hyperlinking missing in HTML:
• Links that lead user to multiple destinations
• Bidirectional links
• Links with special behaviors:
  – Expand-in-place / Replace / Create new window
  – Link on load / Link on user action
• Link databases
Example of an extended link:
  <commentary xml:link="extended" inline="false">
    <locator href="smith2.1" role="Essay"/>
    <locator href="jones1.4" role="Rebuttal"/>
    <locator href="robin3.2" role="Comparison"/>
  </commentary>
[Figure: a link with multiple destinations (England, France)]
XPointer will provide access to arbitrary portions of XML resource
XSL stylesheet language will provide extensibility and transformation facilities (e.g. create a table of contents)

XML Update
Data / Schemas
• XML-Data: Submitted to W3C Jan 98 (Obsolete?)
• Document Content Description: Submitted Aug 98
• XSchema: Independent effort
Programming Interface
• DOM level 1: W3C Recommendation, May 98
Style & Presentation
• CSS level 2: W3C Recommendation, May 98
• Extensible Style Language: Working Draft, Aug 98
Relationship to Other Resources
• XLink, XPointer: Working Drafts, Mar 98
• XML Namespaces: Working Draft, Aug 98
Query Languages
• XML Query Language: Submitted to W3C Aug 98
• XQL: Independent effort

Addressing
URLs (e.g. http://www.bristol-poly.ac.uk/depts/music/) have limitations:
• Lack of long-term persistency
  – Organisation changes name
  – Department shut down or merged
  – Directory structure reorganised
• Inability to support multiple versions of resources (mirroring)
URNs (Uniform Resource Names):
• Proposed as solution
• Difficult to implement (no W3C activity in this area)

Addressing - Solutions
DOIs (Digital Object Identifiers):
• Proposed by publishing industry as a solution
• Aimed at supporting rights ownership
• Business model needed
PURLs (Persistent URLs):
• Provide single level of redirection
Pragmatic Solution:
• URLs don't break - people break them
• Design URLs to have long life-span
Further information:
• <URL: http://www.ukoln.ac.uk/metadata/resources/urn/>
• <URL: http://hosted.ukoln.ac.uk/biblink/wp2/links.html>

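The "single level of redirection" that PURLs provide can be sketched as a lookup table that maps a persistent name to wherever the resource currently lives. This is a minimal illustration, not the actual PURL resolver software: the table, the path /net/music-dept, the function names and the replacement URL are all hypothetical, and the 302 status is assumed as a typical HTTP redirect.

```python
from typing import Dict, Optional, Tuple

# Hypothetical resolver table: persistent path -> current location.
PURL_TABLE: Dict[str, str] = {
    "/net/music-dept": "http://www.bristol-poly.ac.uk/depts/music/",
}


def resolve(purl_path: str) -> Optional[Tuple[int, str]]:
    """Return (redirect status, target URL), or None for an unknown name."""
    target = PURL_TABLE.get(purl_path)
    if target is None:
        return None
    return (302, target)  # assumed status: a temporary HTTP redirect


def relocate(purl_path: str, new_url: str) -> None:
    """The owner updates one table entry; published links stay unchanged."""
    PURL_TABLE[purl_path] = new_url


before = resolve("/net/music-dept")
# The organisation changes name (hypothetical new address).
relocate("/net/music-dept", "http://www.example.ac.uk/music/")
after = resolve("/net/music-dept")
print(before, after)
```

The design point is that only the resolver entry changes when the organisation is renamed or the directory structure is reorganised. Persistence still depends on someone maintaining the table.
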
Transport
HTTP/0.9 and HTTP/1.0:
• (-) Design flaws and implementation problems
HTTP/1.1:
• (+) Addresses some of these problems
• (+) 60% server support
• (+) Performance benefits! (60% packet traffic reduction)
• (~) Is acting as fire-fighter
• (-) Not sufficiently flexible or extensible
HTTP/NG:
• (+) Radical redesign using object-oriented technologies
• (+) Undergoing trials
• Gradual transition (using proxies)
• Integration of application (distributed searching?)

Metadata
Metadata - the missing architectural component from the initial implementation of the web
[Figure: web architecture with Metadata (RDF, PICS, TCN, DSig, DC, MCF, ...) added alongside Addressing (URL), Transport (HTTP) and Data format (HTML)]
Metadata Needs:
• Resource discovery
• Content filtering
• Authentication
• Improved navigation
• Multiple format support
• Rights management

Metadata Examples
DSig (Digital Signatures initiative):
• Key component for providing trust on the web
• DSig 2.0 will be based on RDF and will support signed assertions:
  – This page is from the University of Bath
  – This page is a legally-binding list of courses provided by the University
P3P (Platform for Privacy Preferences):
• Developing methods for exchanging Privacy Practices of Web sites and users
Note that discussions about additional rights management metadata are currently taking place

RDF (Resource Description Framework):
• Highlight of WWW7 conference
• Provides a metadata framework ("machine understandable metadata for the web")
• Based on ideas from content rating (PICS), resource discovery (Dublin Core) and site mapping (MCF)
• Applications include:
  – cataloging resources
  – resource discovery
  – electronic commerce
  – intelligent agents
  – digital signatures
  – content rating
  – intellectual property rights
  – privacy
• See <URL: http://www.w3.org/Talks/1998/0417-WWW7-RDF>

RDF Model
RDF:
• Based on a formal data model (directed labelled graphs)
• Syntax for interchange of data
• Schema model
[Figure: RDF Data Model. The resource page.html has a property Cost with value £0.05. The property is itself described as a node (PropName Cost, PropObj page.html, InstanceOf PropertyType, Value £0.05) and carries a further property ValidUntil with value 11-May-98.]
Note names may change before release of W3C recommendations

RDF Example
Example of Dublin Core metadata in RDF:
  <rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
           xmlns:dc="http://purl.org/dc/elements/1.0/">
    <rdf:Description rdf:HREF="page.html">
      <dc:Creator>John Smith</dc:Creator>
      <dc:Title>John’s Home Page</dc:Title>
    </rdf:Description>
  </rdf:RDF>

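To show what "machine understandable" means for the record above, here is a minimal sketch that reads the Dublin Core properties with Python's standard XML parser. It is not an RDF processor. The two namespace URIs are the ones on the slide, the function name dublin_core is hypothetical, and the attribute prefix is written in lower case (rdf:HREF) on the assumption that it is meant to use the declared rdf namespace.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"   # working-draft namespace from the slide
DC_NS = "http://purl.org/dc/elements/1.0/"       # Dublin Core 1.0 namespace from the slide

record = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:HREF="page.html">
    <dc:Creator>John Smith</dc:Creator>
    <dc:Title>John's Home Page</dc:Title>
  </rdf:Description>
</rdf:RDF>"""


def dublin_core(rdf_xml: str) -> dict:
    """Map each described resource to its Dublin Core properties."""
    root = ET.fromstring(rdf_xml)
    described = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        resource = desc.get(f"{{{RDF_NS}}}HREF")
        props = {}
        for child in desc:
            # ElementTree spells qualified names as "{namespace}local".
            if child.tag.startswith(f"{{{DC_NS}}}"):
                props[child.tag[len(DC_NS) + 2:]] = child.text
        described[resource] = props
    return described


print(dublin_core(record))
```

Because the properties are identified by namespace rather than by prefix, software that has never seen this particular page can still pick out the creator and title.
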
Browser Support for RDF
Mozilla (Netscape's source code release) provides support for RDF. Mozilla supports site maps in RDF, as well as bookmarks and history lists
See Netscape's or HotWired home page for a link to the RDF file.
[Figure: browser drawing on Trusted 3rd Party Metadata and on Embedded Metadata (e.g. sitemaps). Image from http://purl.oclc.org/net/eric/talks/www7/devday/]

RDF Conclusion
• RDF is a general-purpose framework
• RDF provides structured, machine-understandable metadata for the Web
• Metadata vocabularies can be developed without central coordination
• Role for eLib projects in defining schemas?
• RDF Schemas describe the meaning of each property name
• Signed RDF is the basis for trust

Distributed Searching
Distributed searching important for the DNER (Distributed National Electronic Resource)
• ROADS prototype provides cross-searching using whois++
• AHDS prototype provides cross-searching using Z39.50
  http://prospero.ahds.ac.uk:8080/ahds_live/

Distributed Searching Issues
Providing access to resources by software rather than by humans raises several issues:
• Loss of visibility of service / value-added web services
• Possible performance problems
• Information overload
• Finding the service
Solutions:
• Giving visibility and pointers in results sets
• Service metadata:
  – Service only available for cross-searching by non AC.UK users outside peak hours
• Need for agreed metadata standards (profiles, rights issues, …)

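One way to read "giving visibility and pointers in results sets" is that every merged hit carries the name of the service that supplied it and a pointer back to that service. The sketch below illustrates this with in-memory stand-ins. The gateways, their URLs and all names are hypothetical, and a real cross-search would query the services over Z39.50 or whois++ instead of calling local functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Hit:
    title: str
    url: str
    service: str        # which service supplied the hit (visibility)
    service_home: str   # pointer back to the service itself


SearchFn = Callable[[str], List[Tuple[str, str]]]


def cross_search(query: str, services: Dict[str, Tuple[str, SearchFn]]) -> List[Hit]:
    """Query every service and merge the hits, keeping attribution on each one."""
    hits: List[Hit] = []
    for name, (home, search) in services.items():
        for title, url in search(query):
            hits.append(Hit(title, url, name, home))
    return hits


# Hypothetical in-memory gateways standing in for remote servers.
def gateway_a(query: str) -> List[Tuple[str, str]]:
    return [("Music theory index", "http://a.example/1")] if "music" in query else []


def gateway_b(query: str) -> List[Tuple[str, str]]:
    return [("Music scores catalogue", "http://b.example/7")] if "music" in query else []


SERVICES = {
    "Gateway A": ("http://a.example/", gateway_a),
    "Gateway B": ("http://b.example/", gateway_b),
}

for hit in cross_search("music", SERVICES):
    print(f"{hit.title} <{hit.url}> via {hit.service} ({hit.service_home})")
```
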
Collection Description Work
Collection Description Group:
• UKOLN involvement in producing list of attributes for collection level description (in the library, museum, archival sense), which includes databases of Internet resource descriptions such as SOSIG.
• Work of interest to clumps and hybrid libraries.
• WG membership: Dan Brickley (ROADS), Andy Powell (ROADS), Matthew Dovey (Music Online, MALIBU), Verity Brack (RIDING), Dennis Nicholson (BUBL/CAIRNS) and David Kay (FD)
• See <URL: http://www.ukoln.ac.uk/metadata/cld/>
• Collection Description eLib supporting study due out in Oct. Will define attribute set (cf Dublin Core)

Relevant Protocols
A number of formats and protocols could be used to implement distributed searching:
• Z39.50: ISO standard. Well-known in library world, but heavyweight
• whois++: Lightweight IETF standard. Used in several ANR gateways, but not widely deployed
• LDAP: Lightweight version of X.500 directory service.
• HTTP/NG? Opportunity to develop new solution using object-oriented technologies based on above experiences?

Protocols & Collections
Which formats and protocols are relevant to collection descriptions for use by software developers?
• XML: Structured data formats should be based on XML - W3C
• RDF: All metadata applications should be based on RDF - W3C
• IETF WebDAV: Requirements for distributed authoring include author metadata and collection definitions.

IETF WebDAV:
• Web Distributed Authoring and Versioning
• An IETF Application Area
• Relevant proposals:
  – "WebDAV Advanced Collections Protocol"
  – "Requirements for Advanced Collection Functionality in WebDAV"
  – "Requirements for DAV Searching and Locating"
• See <URL: http://www.ietf.org/html.charters/webdav-charter.html> and <URL: http://www.ietf.org/ids.by.wg/webdav.html>

How Metadata Could Be Used
Database Description
• Music resources, including ...
Policy (Terms & Conditions / Resource and Service)
• For licensing reasons, access is restricted to authorised HEIs
• For performance reasons, access restricted to UK HEI between 9.00-17.00
• The service logo must be included in results set, unless results only come from service
• Permission for cross-searching restricted to other eLib projects
• You're only allowed to link to the main entry point
Individual
• Give me HTML or PDF resources, not Word, …
• I'm blind. Include ACSS in results and deliver a sitemap
Client Software
• My browser doesn't support XML, so send me HTML

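Two of the statements above, the peak-hours policy and the individual's format preference, are simple enough to express as data that software can act on. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, a host name ending in .ac.uk is used as a crude stand-in for "UK HEI", and a real deployment would need authentication and not a host name test.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServicePolicy:
    """Hypothetical machine-readable form of the peak-hours policy."""
    peak_start: int = 9          # 9.00
    peak_end: int = 17           # 17.00
    peak_domain: str = ".ac.uk"  # assumption: proxy for "UK HEI"


def may_cross_search(policy: ServicePolicy, client_host: str, hour: int) -> bool:
    """During peak hours only hosts in the peak domain are admitted."""
    in_peak = policy.peak_start <= hour < policy.peak_end
    return (not in_peak) or client_host.lower().endswith(policy.peak_domain)


def pick_format(offered: List[str], acceptable: List[str]) -> Optional[str]:
    """Return the user's most preferred format that the service can supply."""
    for fmt in acceptable:  # acceptable is in order of preference
        if fmt in offered:
            return fmt
    return None


policy = ServicePolicy()
print(may_cross_search(policy, "lib.bath.ac.uk", 10))   # UK academic host at peak time
print(may_cross_search(policy, "www.example.com", 10))  # other host at peak time
print(pick_format(["xml", "html", "word"], ["html", "pdf"]))
```
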
Deployment Models
Today integration with cross-searching services uses Web technologies such as CGI server on top of HTTP. It is difficult to provide rich functionality, due to the simplicity of HTML and HTTP.
HTTP/NG may provide closer integration between applications and the web.
NOTE need for open authentication system (public key infrastructure / DSig?)
[Figure: loose integration of a Web Server providing Distributed Searching with a Z39.50 server (Explain database) and a whois++ server (Centroids), each with an RDF defn.]

What's Needed?
In order to deploy distributed cross-searching in an open, application-independent way we need:
• Metadata in a machine-readable format - RDF
• Syntax for describing the metadata - see RDF pages at <URL: http://www.w3.org/RDF/>
• Language for processing metadata - see XML-QL, A Query Language for XML at <URL: http://www.w3.org/Submission/1998/12/>
• An open authentication infrastructure
Issues:
• Timescales
• Costs
• Software support
• Protocol support
• Short-term pragmatic solution vs long-term purer solution

Authentication
Deployment of an open, scalable, flexible authentication system is difficult & expensive
Current solutions include:
• Server-based username and password schemes
• IP-based schemes
• Athens - Based on replicated Sybase application. See <URL: http://www.athens.ac.uk/>
• W3C DSig work - Digital Signatures Initiative. See <URL: http://www.w3.org/DSig/>
• Other Public Key developments - e.g. reports of Post Office involvement, statements from Tony Blair, EU, ...
"In May 1998 the Commission published its proposal for a "European Parliament and Council Directive on a Common Framework for Electronic Signatures" (COM(1998)297)."

Certificates
Should we be looking into using commercially-supported digital IDs, such as Verisign's?
• Can purchase server ID for $349
• End user certificates available
See http://www.verisign.com/

Browser Support
Browsers such as IE provide support for certificates:
• Use certificates to positively identify yourself, certificate authorities and publishers
• Trust sites, people and publishers with credentials issued by the following Certifying Authorities
• You have designated the following software publishers and credential agencies as trustworthy. Windows software can install software ... certified by these publishers without asking you first

Using Digital Keys
• Client initiates a connection (hello).
• Server responds, sending the client its digital ID. The server might also request the client's digital ID for client authentication.
• The client verifies the server's digital ID. If requested, the client sends its digital ID in response to the server's request.
• When authentication is complete, the client sends the server a session key encrypted using the server's public key.
• Once a session key is established, secure communications commence between client & server.
Diagram taken from a Verisign White Paper

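The session key step of the exchange above can be illustrated with textbook RSA and deliberately tiny numbers. This is a toy sketch only: the key pair (built from the primes 61 and 53) and the function names are illustrative assumptions, there is no padding or certificate verification, and nothing here is secure or taken from the Verisign paper.

```python
import secrets
from typing import Optional, Tuple

# Toy RSA key pair: n = 61 * 53 = 3233, e = 17, d = 2753. Never use keys this small.
SERVER_PUBLIC = (3233, 17)     # (n, e), distributed inside the server's digital ID
SERVER_PRIVATE = (3233, 2753)  # (n, d), known only to the server


def client_send_session_key(public_key: Tuple[int, int],
                            session_key: Optional[int] = None) -> Tuple[int, int]:
    """Client picks a session key and encrypts it with the server's public key."""
    n, e = public_key
    if session_key is None:
        session_key = secrets.randbelow(n - 2) + 2  # random value in 2..n-1
    return session_key, pow(session_key, e, n)


def server_receive_session_key(private_key: Tuple[int, int], encrypted: int) -> int:
    """Server recovers the session key with its private key."""
    n, d = private_key
    return pow(encrypted, d, n)


client_key, on_the_wire = client_send_session_key(SERVER_PUBLIC)
server_key = server_receive_session_key(SERVER_PRIVATE, on_the_wire)
print(client_key == server_key)  # both ends now share the same session key
```

Only the holder of the private key can recover the session key, which is why the client must first verify that the public key really belongs to the server, and that is the job of the digital ID.
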
Authentication & EU
A search for "digital signature" at <URL: http://www.open.gov.uk/> provided interesting hits:
• DTI Briefing Paper on "Encryption and Digital Signatures" at <URL: http://www.dti.gov.uk/eurobrief/3encrypt.htm>
• European Internet Forum Policy Papers at <URL: http://www.ispo.cec.be/eif/#digital>
• "Towards A European Framework for Digital Signatures And Encryption" at <URL: http://www.ispo.cec.be/eif/policy/97503toc.html>
Will we see development of an open authentication infrastructure funded through Fifth Framework?
• See http://www.cordis.lu/fifth/src/comm.htm

Further Information
Further Reading:
• Microsoft Security Advisor at <URL: http://microsoft.com/security/>
• JISC Reports at <URL: http://www.jisc.ac.uk/pub/index.html#issues>
• WWW Security FAQ at <URL: http://www.w3.org/Security/Faq/>

Deployment Issues
More sophisticated deployment techniques can be adopted to overcome deficiencies in simple model
Original Model (HTML resource, Web server, browser):
• Web server simply sends file to client
• File contains redundant information (for old browsers) plus client interrogation support
Sophisticated Model (HTML / XML / database resource, Intelligent Web server, Server proxy, Client proxy, browser):
• Intermediaries can provide functionality not available at client:
  – DOI support
  – XML support
  – Format conversion
[Figure: example of an intermediary]

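The "format conversion" role of an intermediary can be sketched as a function that passes XML through to clients that understand it and converts it to HTML for those that do not. This is a minimal illustration: the MEMO vocabulary, the function names and the choice of an HTML definition list are all assumptions, and a production intermediary would more likely apply a stylesheet than hand-written conversion code.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Tuple


def memo_to_html(xml_text: str) -> str:
    """Render each child element of the record as a definition-list entry."""
    memo = ET.fromstring(xml_text)
    rows = [
        f"<DT>{escape(child.tag)}</DT><DD>{escape(child.text or '')}</DD>"
        for child in memo
    ]
    return f"<DL>{''.join(rows)}</DL>"


def serve(resource_xml: str, client_supports_xml: bool) -> Tuple[str, str]:
    """Intermediary: return (content type, body) suited to the client."""
    if client_supports_xml:
        return ("text/xml", resource_xml)
    return ("text/html", memo_to_html(resource_xml))


# Hypothetical XML resource held on the server.
memo = "<MEMO><TO>John Smith</TO><SUBJECT>Budget</SUBJECT></MEMO>"

print(serve(memo, client_supports_xml=True))
print(serve(memo, client_supports_xml=False))
```

The resource is authored once in XML, and older browsers are served from the same source without the author maintaining a second HTML copy.
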
Conclusions
To conclude:
• Standards are important, especially for national initiatives, such as eLib
• Proprietary solutions are often tempting because:
  – They are available
  – They are often well-marketed and well-supported
  – They may become standardised
  – Solutions based on standards may not be properly supported by applications
• Metadata is an important new protocol area
• Metadata work to support distributed searching is beginning
• Intermediaries may have a role to play in deploying standards-based solutions
