- Number of slides: 25
NLM DTD Flexibility: How and Why Applications of the NLM DTD Vary
Presented by Bruce D. Rosenblum, CEO, Inera Incorporated
Journal Article Tag Suite Conference, 1 November 2010
Copyright 2010 Inera Incorporated. All Rights Reserved
Remember When…
Scholarly DTDs, Circa 2001
[Figure: collage of overlapping publisher- and vendor-specific DTD names. Legible labels include Elsevier 1.1.0, Elsevier 2.1.1, Elsevier 4.1, ISO 12083, PMC 1.0, AIP 4.2.8, Blackwell 2.2, Highwire, Alden, Charlesworth, and UCP.]
Scholarly DTDs 2010
• NLM DTD
• Elsevier DTD
• Springer DTD
• Wiley-Blackwell DTD
• And a few others…
• No longer a grand mess, but…
  · NLM DTD Suite applications vary
  · Specific tagging practices meet publisher-specific requirements
Data and Methodology
• Data from 25 eXtyles and refXpress implementations since 2003
• Not a scientific survey
• However, useful for showing variations in NLM DTD usage
• Supplier requirements differ from publishers' requirements
  · Suppliers serve multiple publishers who deliver to different platforms
NLM DTD Adoption By Year
Columns: Organization | DTD | Year | Version | Prior XML
Organization: Publisher 1 to Publisher 21, JATS-con, Supplier 1 to Supplier 3
DTD (cell values; row alignment lost in extraction): Archive * Archive Publish & book Book * Publish Archive Publish & book Publish Book Publish * Archive Publish * Authoring Publish Book
Year: 2003, 2005, 2006, 2007, 2007, 2008, 2009, 2010, 2010, 2008, 2007, 2010
Version: 3.0 †, 2.0, 2.3, 2.2, 2.3, 2.3, 3.0, 2.3, 3.0
Prior XML: No, No, Yes, No, No, No, No, Yes, Yes, No, Yes
* Customized version of DTD beyond OASIS-CALS addition
† Upgraded from 1.0 to 3.0 in 2010
Year of DTD Adoption
• Few implementations prior to 2006
  · Mostly related to PMC deposit
• Adoption rate grows in 2006 and later
  · Maturity of version 2.0 in August 2004
  · Greater public awareness by 2006
    - Freely available and modifiable
    - Flexible
    - Not just for life science content
  · More off-the-shelf tool support from NCBI and others
• 3.0 upgrade not automatic; not fully backwards compatible
Prior Markup Experience
• Most had not used full-text XML or SGML
  · Driven to NLM DTD for:
    - More modern XML-based workflow
    - Desire for full-text to drive HTML and archive needs
    - PMC deposit
• Those with SGML experience
  · SGML to XML conversion choice
    - Convert existing DTD to XML
    - Adopt NLM DTD
DTD Selection
• Most adopters use Journal Publishing (blue) DTD
• Early adopters chose Archive and Interchange (green) DTD
  · Blue was too restrictive prior to 2.0
  · ISSN optional in green; hosts non-serial publications without modification
• Book DTD use growing in recent years
  · Not as mature as journals, but useful
Implementation Characteristics
Columns: Organization | Char Encoding | Math | Tables | List Labels | Ref PCDATA
Organization: Publisher 1 to Publisher 21, JATS-con, Supplier 1 to Supplier 3
Char Encoding (cell values; row alignment lost in extraction): ISO, ISO, Unicode, Unicode, Unicode, Unicode, Unspecified, Unicode, ISO, Unicode
Math: MathML, Graphic, MathML, TeX, Graphic, MathML, Graphic, Graphic, NA, MathML, TeX, MathML+graphic
Tables: HTML, CALS, HTML, HTML, HTML, CALS, NA, CALS, HTML, CALS
List Labels, then Ref PCDATA (column boundary lost in extraction): DROP DROP KEEP DROP DROP KEEP NA KEEP DROP KEEP KEEP DROP DROP KEEP KEEP NA KEEP KEEP
Character Encoding
• Most implementations use Unicode numeric character references (e.g., &#x03B2;)
  · Quasi-human readable (unlike raw UTF-8)
• Some use ISO named entities (e.g., &beta;)
  · Most human-readable
  · But a transform is required for HTML
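The transform mentioned on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any publisher's actual pipeline: the function name `named_to_numeric` is hypothetical, and it uses Python's built-in HTML5 entity table as a stand-in for the ISO entity sets, which overlap heavily with it but are not identical.

```python
import re
from html.entities import html5  # maps names like "beta;" to "β"

# XML's five predefined entities must stay named so the markup stays well-formed.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(text):
    """Rewrite named entities (e.g. &beta;) as hex numeric references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            return match.group(0)  # unknown name: leave it untouched
        # A few names expand to more than one code point, so emit one reference each.
        return "".join(f"&#x{ord(c):04X};" for c in chars)

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

print(named_to_numeric("&beta;-catenin &amp; p53"))
```

A production transform would normally resolve entities through the DTD's own entity declarations in an XML parser rather than through a regular expression.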
Generated and Boilerplate Text
• Generated Text:
  · Inconsequential, formulaic, or stereotypical text, punctuation, and formatting omitted from an XML file, which is applied to content by a style sheet when an XML file is rendered
• Boilerplate Text:
  · Inconsequential, formulaic, or stereotypical text, punctuation, and formatting that could have been omitted but which the publisher has chosen to keep in the XML file rather than to generate with a style sheet
NLM DTD Structure
• NLM DTD is flexible
  · Permits generated or boilerplate text
• Degree varies by tag set
  · Green DTD allows greatest degree of Boilerplate Text
  · Includes the <x> element
• Hypothesis: Flexibility of generated versus boilerplate text increased NLM DTD adoption
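To make the generated-versus-boilerplate distinction concrete, the sketch below takes a fragment that keeps its punctuation as boilerplate in <x> elements and strips them, leaving XML whose punctuation would have to be generated by a style sheet. The fragment and the function name `strip_x` are illustrative assumptions; the sample has not been validated against any version of the DTD.

```python
import xml.etree.ElementTree as ET

def strip_x(root):
    """Remove every <x> element, keeping any text that followed it."""
    for parent in list(root.iter()):
        prev = None  # last sibling that was kept
        for child in list(parent):
            if child.tag == "x":
                tail = child.tail or ""
                # Reattach the removed element's trailing text so no content is lost.
                if prev is None:
                    parent.text = (parent.text or "") + tail
                else:
                    prev.tail = (prev.tail or "") + tail
                parent.remove(child)
            else:
                prev = child
    return root

# Hypothetical boilerplate-style fragment: separators are stored in the XML.
boilerplate = "<kwd-group><kwd>XML</kwd><x>, </x><kwd>DTD</kwd><x>.</x></kwd-group>"
generated = strip_x(ET.fromstring(boilerplate))
print(ET.tostring(generated, encoding="unicode"))
```

Going the other way (inserting separators) needs style rules, which is the work a style sheet does when the text is generated.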
List Labels
• The list-type attribute carries format information
• Most publishers don't keep list labels
  · Possibly because HTML excludes list labels
• Books are an exception
  · List labels are useful for discontinuous lists (e.g., items 1 to 4, intervening text, then items 5 to 8)
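The discontinuous-list case can be illustrated with a small renderer. It is only a sketch: `render_list` is a hypothetical helper, the fragments are simplified, and it assumes every list item holds a single <p>. When a <label> is kept in the XML it is used verbatim; when labels are dropped, the renderer generates them from list-type and always restarts at 1.

```python
import xml.etree.ElementTree as ET

def render_list(list_xml):
    """Return one display line per list item, generating labels only when absent."""
    lst = ET.fromstring(list_xml)
    list_type = lst.get("list-type", "bullet")
    lines = []
    for n, item in enumerate(lst.findall("list-item"), start=1):
        label = item.findtext("label")
        if label is None:
            # Generated text: the numbering restarts because the XML carries no position.
            label = f"{n}." if list_type == "order" else "•"
        text = "".join(item.find("p").itertext())
        lines.append(f"{label} {text}")
    return lines

# Second half of a list that resumes after intervening text.
kept = ('<list list-type="order">'
        '<list-item><label>5.</label><p>Fifth step</p></list-item>'
        '<list-item><label>6.</label><p>Sixth step</p></list-item></list>')
dropped = ('<list list-type="order">'
           '<list-item><p>Fifth step</p></list-item>'
           '<list-item><p>Sixth step</p></list-item></list>')

print(render_list(kept))     # labels come from the XML
print(render_list(dropped))  # labels are generated and restart at 1
```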
Early Reference Models
• Versions 1.0 through 2.3 had the <citation> and <nlm-citation> elements
  · <citation> allowed PCDATA and any element order
  · <nlm-citation> allowed only elements in a prescribed order
• No way to restrict PCDATA without enforcing element order
  · Problematic when mixing parsed and unparsed references (e.g., gray literature)
Reference Tagging 3.0
• <mixed-citation> and <element-citation>
  · Former allows PCDATA
  · Latter allows only semantic elements
  · Neither prescribes order
Reference Tagging
• Most publishers keep PCDATA
• All suppliers keep PCDATA
• Reasons
  · Less style sheet setup (PDF, HTML, etc.)
  · PCDATA can easily be dropped
  · Suppliers: multiple publisher styles require less setup
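The point that PCDATA can easily be dropped can be sketched as a one-way conversion from a <mixed-citation> to an <element-citation>. This is a rough illustration under stated assumptions: `drop_pcdata` is a hypothetical helper, the sample reference is invented, and the code assumes the semantic elements contain plain text only. Inline formatting inside a title, for example, would need special handling, as would any element that is legal in one citation model but not the other.

```python
import xml.etree.ElementTree as ET

def drop_pcdata(mixed_xml):
    """Return an element-citation holding only the tagged elements of a mixed-citation."""
    root = ET.fromstring(mixed_xml)
    root.tag = "element-citation"
    for elem in root.iter():
        if len(elem):        # container element: discard text before its first child
            elem.text = None
        elem.tail = None     # discard punctuation and spacing after every element
    return ET.tostring(root, encoding="unicode")

# Invented sample; punctuation between elements is boilerplate PCDATA.
mixed = ('<mixed-citation publication-type="journal">'
         '<name><surname>Smith</surname> <given-names>J</given-names></name>. '
         '<article-title>A title</article-title>. <source>J Example</source> '
         '<year>2009</year>;<volume>12</volume>:<fpage>34</fpage>.'
         '</mixed-citation>')
print(drop_pcdata(mixed))
```

The reverse direction is the costly one, since restoring punctuation requires a style definition per publisher, which is consistent with the reasons listed on the slide.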
PCDATA Correlations
• All element-citation users drop list labels
• Some mixed-citation users drop list labels
• Publishers decide on boilerplate text on a per-element basis, not as a global all-or-nothing choice
Math & Tables by Comp Application
Columns: Organization | Composition Application | Math | Tables
Organization (first panel): Publisher 8, Publisher 21, Publisher 6, Supplier 2, Publisher 1, Supplier 1, Publisher 5, Publisher 11, Publisher 4, Publisher 19, Publisher 2, Publisher 3, Publisher 15, Publisher 16, Publisher 18, Publisher 13, Publisher 14, Supplier 3
Composition Application (cell values; row alignment lost in extraction): 3B2; 3B2 & InDesign; Antenna House; Frame; InDesign; InDesign/Typefi
Math: Graphic, MathML, TeX, MathML, Graphic, Graphic, MathML+graphic
Tables: CALS, HTML, CALS, HTML, HTML, CALS
Organization (second panel): Publisher 7, JATS-con, Publisher 20, Publisher 17, Publisher 9, Publisher 12, Publisher 10
Composition Application: InDesign/Typefi; NA; NA; PDF from Word; Ventura
Math: Graphic, MathML, NA, Graphic, MathML, Graphic
Tables: HTML, NA, HTML
Table Markup
• XHTML is default NLM DTD model
• CALS requires DTD modification
  · CALS has cell borders and table groups
  · InDesign & Frame support CALS, but not XHTML tables
  · 3B2 users seem to prefer CALS tables
  · Must be converted to XHTML for online delivery
• Theory: publishers adopt CALS when more appropriate for PDF/print composition systems
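The CALS-to-XHTML conversion needed for online delivery can be sketched as a structural mapping: tgroup sections become thead and tbody, row becomes tr, and entry becomes th or td. The function `cals_to_xhtml` is hypothetical and deliberately minimal. It assumes a single tgroup with un-namespaced element names and plain-text cells, and it ignores spans (namest, nameend, morerows), column specifications, and the cell borders that CALS supports.

```python
import xml.etree.ElementTree as ET

def cals_to_xhtml(cals_xml):
    """Map a simple CALS table onto XHTML table structure, cell text only."""
    cals = ET.fromstring(cals_xml)
    tgroup = cals.find("tgroup")
    if tgroup is None:
        raise ValueError("expected a <tgroup> child")
    table = ET.Element("table")
    for section in tgroup:
        if section.tag not in ("thead", "tbody"):
            continue  # colspec and other CALS-only elements have no direct XHTML twin
        out_section = ET.SubElement(table, section.tag)
        cell_tag = "th" if section.tag == "thead" else "td"
        for row in section.findall("row"):
            tr = ET.SubElement(out_section, "tr")
            for entry in row.findall("entry"):
                cell = ET.SubElement(tr, cell_tag)
                cell.text = "".join(entry.itertext())  # flattens any inline markup
    return ET.tostring(table, encoding="unicode")

cals = ('<table><tgroup cols="2"><colspec colname="c1"/><colspec colname="c2"/>'
        '<thead><row><entry>Gene</entry><entry>Fold change</entry></row></thead>'
        '<tbody><row><entry>TP53</entry><entry>2.1</entry></row></tbody>'
        '</tgroup></table>')
print(cals_to_xhtml(cals))
```

The parts this sketch skips, such as spans and borders, are where a real conversion does most of its work.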
Math Markup
• NLM DTD permits MathML, TeX, pointers to graphic files
• MathML is native XML markup, but…
  · MathML has limited browser support
    - Firefox is good; Safari is OK; IE has no MathML support
    - Most publishers deliver online math as images
  · MathML has limited composition support
    - InDesign does not have native MathML rendering
    - 3B2 native rendering is TeX
• Math model driven by PDF creation requirements
Composition and Hosting
Columns: Organization | Comp Application | Comp Location | Online | PMC
Organization: Publisher 1 to Publisher 21, JATS-con, Supplier 1 to Supplier 3
Comp Application (cell values; row alignment lost in extraction): 3B2; InDesign; Frame; 3B2 & InDesign; 3B2; InDesign/Typefi; 3B2; PDF from Word; Ventura; Antenna House; PDF from Word; InDesign/Typefi; InDesign; PDF from Word; InDesign; NA; 3B2; InDesign/Typefi
Comp Location: Outsource, In-House, Outsource, In-House, In-House, In-House, In-House, NA, Supplier
Online: Self-hosted, Self-hosted, Highwire, Self-hosted, Self-hosted, Self-hosted, Various
PMC: No, Yes, Yes, Yes, No, No, No, Yes, No, No, Some, No, No
Composition and Online Hosting
• Majority of users
  · Typeset in-house
  · Self-host online version
• PMC delivery requirement for half of users
• However… this correlation may be significant only among organizations that have chosen to create XML in-house
Conclusions
• NLM DTD flexibility led to broader adoption
  · Application of DTD can be adjusted to meet needs of specific publishing requirements or tools
• NLM DTD standard facilitates in-house XML implementation
  · Eliminates R&D requirement to create a DTD
  · Customizable off-the-shelf tools available
  · Cost-effective solution for small and […]
Questions?
Bruce Rosenblum
Inera Incorporated
+1 (617) 932-1932
brosenblum@inera.com
www.inera.com