
The Biodiversity Heritage Library: A Knowledge Domain Enterprise
Community-Driven Open Access
Indra Neil Sarkar, Ph.D., Marine Biological Laboratory
Thomas Garnett, Smithsonian Institution Libraries
Coalition for Networked Information, Washington, DC, December 11, 2007

Overview
• Biodiversity Heritage Library (Tom Garnett)
  – Overview
  – Why?
  – How?
  – Sustainability
• Building Knowledge Links (Neil Sarkar)
  – Knowledge Integration
  – Taxonomic Names Management
  – Taxonomic Intelligence
  – Linking to the Encyclopedia of Life
2007 12 11 - Coalition for Networked Information, Washington, DC Tom Garnett & Neil Sarkar, Biodiversity Heritage Library

The Biodiversity Heritage Library
Tom Garnett

Biodiversity Heritage Library
• Museums
  – Field Museum
  – Natural History Museum (London)
  – Smithsonian Institution
  – American Museum of Natural History
• Botanical Gardens
  – Missouri Botanical Garden
  – New York Botanical Garden
  – Royal Botanic Gardens, Kew
• University Libraries
  – Botany Libraries, Harvard University
  – Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
• Research Institute Library
  – Marine Biological Laboratory / Woods Hole Oceanographic Institution Library (MBL/WHOI)

Biodiversity Heritage Library
Collaborators:
Internet Archive
International Commission on Zoological Nomenclature
Open Content Alliance
European Distributed Institute of Taxonomy
Global Biodiversity Information Facility (GBIF)
Many more under negotiation

Biodiversity Heritage Library
Mandates:
Digitize the core literature on biodiversity.
Open Access: all content can be repurposed, reused, reformatted.
Congruent: must fit into a healthy knowledge ecology.
Reptilia and Batrachia (1885-1902), by Albert C. L. G. Günther

Biodiversity Heritage Library – Why?
"In any well-appointed Natural History Library there should be found every book and every edition of every book dealing in the remotest way with the subjects concerned. One never knows wherein one edition differs from or supplements the other and unless these are on the same table at the same time it is not possible to collate them properly. Moreover for accurate work it is necessary for the student to verify every reference he may find; it is not enough to copy from a previous author; he must verify each reference itself from the original."
Charles Davies Sherborn (1861-1942), Epilogue to Index Animalium, March 1922

Biodiversity Heritage Library – Why?
"Yet another physical difficulty is the task of assembling the library and indexes which will enable the student to work under proper conditions…. the beginner must now be prepared to spend liberally, or else must establish himself in an institution where a large library exists; if he work by himself with only a few books, he will have to confine himself to a very narrow specialty indeed."
"The Limitations of Taxonomy" by J. M. Aldrich, Science, April 22, 1927, vol. LXV, no. 1686, p. 381
Insecta. Diptera. Volume I (1886-1901)

Biodiversity Heritage Library – Why?
• The cited half-life of publications in taxonomy is longer than in any other scientific discipline. (Macro-economic case for open access, ~Tom Moritz)
• Current taxonomic literature often relies on texts and specimens more than 100 years old.
Levinus Vincent, Elenchus tabularum, pinacothecarum, 1719

Biodiversity Heritage Library – Why?
The Taxonomic Impediment
"The taxonomic impediment is a term that describes the gaps of knowledge in our taxonomic system" - Darwin Declaration, 1998
Georges Louis Leclerc, comte de Buffon, Histoire naturelle: générale et particulière (Oiseaux), 1799-1808

Biodiversity Heritage Library – Why?
Convention on Biological Diversity: Article 17
"… exchange of information shall include exchange of results of technical, scientific and socio-economic research … It shall also, where feasible, include repatriation of information."
Henry Bates, Insecta. Coleoptera, 1881-1884

Biodiversity Heritage Library – How?
• Internet Archive establishes scanning centers in London, New York, Boston, Washington, etc.
• High-quality, non-destructive scans.
• Image files and text derived from OCR.

Biodiversity Heritage Library – How?
"Guano diggers among the albatrosses. Laysan Island"
What good are page image files, "dirty OCR", and some metadata? Researchers are stuck like these guano diggers in Hawaii.
Lionel Walter Rothschild, The avifauna of Laysan and the neighboring islands, 1893-1900

Biodiversity Heritage Library – How?
BHL Portal: http://www.biodiversitylibrary.org
Serve image and text files; create volume, part, and piece metadata; ingest page-level metadata at scanning level; apply Globally Unique Identifiers (GUIDs) for linking to other taxonomic services.
Jacob Christian Schäffer, Elementa entomologica... 1766.

Biodiversity Heritage Library
Classes of texts
Each class presents a unique set of issues to resolve:
Public Domain – pre-1923
Post-1923 monographs
  some with copyright renewals
  some without copyright renewals
Non-profit learned society journals with permissions
Commercial journals; Grey literature
Archival material; field and expedition notebooks

Biodiversity Heritage Library
BHL Seeks Permissions from Copyright Holders
Opt-in Copyright Model: The BHL will actively work with professional societies and associations to integrate their publications into the BHL in a way that serves the societies' missions and goals.
BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost.
BHL will provide a set of files to the publishers for reuse as they see fit.
BHL will index the articles using Taxonomic Intelligence, thereby vastly increasing their usability.

Biodiversity Heritage Library
Embedding Content in the Knowledge Ecology
The BHL is primarily funded as a component of the Encyclopedia of Life, an international effort to create an authoritative website for every species of the earth's biota.


Biodiversity Heritage Library
• Legal Sustainability Strategy
  – Avoid legal conflicts.
  – Keep copyright infringement risk low. It is impossible to eliminate it altogether.
  – Obtain permissions where feasible.
  – Where it isn't feasible, move on.

Biodiversity Heritage Library
• Scientific and Scholarly Support Strategy
  – Make it too useful not to support.
  – Embed it in current and developing workflows for identifying, tracking, documenting, and researching the biota. BHL is building on many documented use cases.
  – Network with many professional societies.
  – Automate structural markup of journal literature to bring the digitized OCR into conformance with the NLM DTD.

Biodiversity Heritage Library
• Financial Sustainability Strategy
  – Quick ramp-up with high early costs: development, mass scanning, etc. Drive long-term costs down the asymptote toward zero.
  – Derive some long-term costs from the operating budgets of the member institutions (examples under consideration: acquisitions budget, staff positions, etc.).
  – Integrate functions/tasks with wider efforts where appropriate, e.g. mass storage.
  – Clear roles for staff who wear multiple hats. Two full-time grant-funded positions currently, but >15 staff who make substantive contributions.
  – Make the BHL absolutely essential.

Biodiversity Heritage Library
• The Long Now Strategy
  – Institutions that are creating the BHL exist to persist through time. That's an important part of their business. Use them.
  – The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly. F2F interaction is surprisingly necessary to create this.

Biodiversity Heritage Library
• The Long Now Strategy (cont.)
  – Take Risks. Why?
  – "We must, indeed, all hang together, or most assuredly we shall hang separately."
  – Interoperability is the key. Repository islands will sink.

Biodiversity Heritage Library
Embedding Content in the Knowledge Ecology
Species names, taxon concepts, and the classification of living organisms are the basis for linking multiple disciplines such as evolutionary biology, taxonomy, genomics, agriculture, conservation, etc.
Taxonomic intelligence algorithms are being developed to mine the BHL content to link species names with other biological resources.

BHL-based Knowledge Integration
Neil Sarkar

http://www.idiagram.com/ideas/knowledge_integration.html

Knowledge Integration
• Meet Information Needs
• Map to Other Knowledge
• Extract Domain Specific Features
• Perform Data Mining for Novel Correlations
• Automated Methods
  – Biomedical (Yes!)
  – Biodiversity (No!)

Biological Data Revolution
Biomedical Knowledge
Biodiversity Knowledge

Literature, Literature
• Retrospective Biological Knowledge
  – Not Just PDFs!
  – Biodiversity Heritage Library
• Contemporary Biological Knowledge
  – Titles, Abstracts, Metadata (MeSH)
  – Medline
• Prospective Biological Knowledge
  – Track New Literature
  – Services Integrated Into Interfaces

“All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.”
~ Grimaldi & Engel, 2005, Evolution of the Insects

Names Are Often Misspelled
Loligo pealeii
Loligo pealei
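The spelling variants above suggest one way a names index could catch near-miss spellings: approximate string matching against a lexicon of accepted names. The following is a minimal Python sketch, not the method BHL or uBio actually uses. The three-name lexicon, the function name `suggest_name`, and the 0.85 similarity cutoff are all assumptions made for illustration, as is treating Loligo pealeii as the accepted spelling.

```python
from difflib import get_close_matches

# Hypothetical lexicon of accepted name strings (illustrative only).
LEXICON = ["Loligo pealeii", "Escherichia coli", "Brucella melitensis"]


def suggest_name(candidate, lexicon=LEXICON, cutoff=0.85):
    """Return the closest lexicon entry for a possibly misspelled name.

    Uses difflib's similarity ratio; returns None when no entry
    reaches the cutoff. The cutoff is an arbitrary choice here.
    """
    matches = get_close_matches(candidate, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for name in ("Loligo pealei", "Escheria coli", "Homo sapiens"):
        print(name, "->", suggest_name(name))
```

With this cutoff, "Loligo pealei" resolves to "Loligo pealeii", while a name with no close lexicon entry returns None. A stricter cutoff trades missed variants for fewer false matches.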

Peranema – the fern
Peranema – the euglenid

Who Cares?
Libraries
Publishers
Museums
Federal Agencies

Life on Earth

Names for Life on Earth
No Complete List of Scientific Names*
112,133   741,872   49,382
*Scientific Names ≠ Species
Published Variants
Objective Synonyms: Bacterium coli, Escherichia coli, Bacillus coli
Mis-spellings: Escheria coli

Taxonomic Knowledge

Scientific Names Management
• Collect Scientific Names
  – Digital Taxonomy Resources
  – Data Marts
  – Natural Language Text
• Scientific Name Reconciliation
  – Many Names for Same Organism
    • Objective: Escherichia coli, Bacterium coli, Bacillus coli
    • Subjective: Brucella melitensis, Brucella canis, Brucella ovis
  – Many Organisms for Same Name
    • Agathis montana is both a wasp and a plant
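The two reconciliation cases on this slide (many names for one organism, one name for many organisms) can be pictured as a many-to-many mapping between name strings and reconciliation groups. The sketch below is a toy Python illustration under stated assumptions: the group identifiers, the table layout, and the `reconcile` function are hypothetical and do not describe how uBio stores its data.

```python
from collections import defaultdict

# Hypothetical table: each name string maps to the group IDs (organisms)
# it can refer to. IDs are invented for illustration.
NAME_TO_GROUPS = {
    "Escherichia coli": {"g1"},
    "Bacterium coli": {"g1"},
    "Bacillus coli": {"g1"},
    "Agathis montana": {"g2-wasp", "g3-plant"},
}


def build_groups(name_to_groups):
    """Invert the table: group ID -> set of name strings in that group."""
    groups = defaultdict(set)
    for name, group_ids in name_to_groups.items():
        for gid in group_ids:
            groups[gid].add(name)
    return dict(groups)


def reconcile(name, name_to_groups=NAME_TO_GROUPS):
    """Return (sorted names sharing a group with `name`, is_homonym).

    is_homonym is True when the same string belongs to more than one
    group, i.e. it names more than one organism.
    """
    group_ids = name_to_groups.get(name, set())
    groups = build_groups(name_to_groups)
    related = set()
    for gid in group_ids:
        related |= groups[gid]
    return sorted(related), len(group_ids) > 1


if __name__ == "__main__":
    print(reconcile("Bacterium coli"))
    print(reconcile("Agathis montana"))
```

A query for any one synonym returns the whole group, and a homonym such as Agathis montana is flagged so that a downstream step can disambiguate it by context.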

uBio
• 10.7 Million+ Name Strings
• Reconciliation Groups
• http://www.ubio.org

Ogden-Richards Semiotic Triangle
Thought/Reference
Symbols: “White”, “Blanc”, “Weiss”
Referent: XVFD

Ogden-Richards Semiotic Triangle
Species
Scientific Names: “Bacterium coli”, “Escherichia coli”, “E. coli”
LSID: urn:lsid:ubio.org:namebank:5369544
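The identifier on this slide, urn:lsid:ubio.org:namebank:5369544, follows the Life Science Identifier (LSID) layout of colon-separated fields: the literal prefix urn:lsid, then an authority, a namespace, an object ID, and an optional revision. A minimal Python sketch of splitting such an identifier into its parts follows. The `LSID` tuple and `parse_lsid` function are hypothetical names, and the validation is deliberately simple.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into fields."""
    parts = text.strip().split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:5])  # authority, namespace, object must be non-empty
    )
    if not well_formed:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:ubio.org:namebank:5369544"))
```

Here the authority is ubio.org, the namespace is namebank, and the object ID is 5369544, so the identifier points at one record rather than at any one spelling of the name.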

Taxonomic Intelligence
• Lexicon of Scientific Names
• Reconciliation and Disambiguation
• Hierarchical Inclusion
• Integration into Information Retrieval
• Linkage to Other Data Types (e.g., Molecular, Morphological, Phenotype)

Biodiversity Heritage Library

Biomedical Knowledge
Biodiversity Knowledge

Extracting Taxonomic Names
• Named Entity Recognition
  – Taxonomic Name Recognition (TNR)
• Current TNR Tools
  – TaxonGrab (AMNH)
  – FindIT (uBio)
  – FAT (Karlsruhe)
  – TaxonFinder (uBio)
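To make the idea of taxonomic name recognition concrete, here is a deliberately naive Python sketch: a regular expression proposes candidate binomials (a capitalized word followed by a lowercase word), and a lexicon lookup keeps only known names. It is not how any of the tools listed on this slide work; the lexicon, the pattern, and the `find_names` function are assumptions for illustration.

```python
import re

# Toy lexicon; a real system would consult a large names index.
KNOWN_NAMES = {"Escherichia coli", "Loligo pealeii", "Brucella melitensis"}

# Candidate binomial: Capitalized genus, one space, lowercase epithet.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+ [a-z]{2,}\b")


def find_names(text, lexicon=KNOWN_NAMES):
    """Return lexicon names found in `text`, in order of appearance."""
    return [m.group(0) for m in CANDIDATE.finditer(text) if m.group(0) in lexicon]


if __name__ == "__main__":
    sample = "Cultures of Escherichia coli were compared with Loligo pealeii tissue."
    print(find_names(sample))
```

The pattern alone over-generates (it also proposes "Cultures of"), which is why the lexicon filter is needed. It also misses abbreviated forms such as "E. coli" and OCR-damaged spellings, the kinds of cases that dedicated TNR tools aim to handle.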

Tracking Biodiversity Knowledge
• Taxonomically Intelligent Applications
  – Real-time Taxonomic Indexing
  – RSS
  – Taxonomic Portals

Encyclopedia of Life
• Create one Web page for each species that is currently named (~1.8 million)
• Integrate relevant literature (e.g., BHL)
• BHL represented on EOL Board
• $25M of funding in place

The Encyclopedia of Life
www.EOL.org

Acknowledgments
Christopher Freeland, Martin Kalfatovic, Graham Higley, BHL & EOL Teams, Catherine Norton, Patrick Leary, David Remsen, David Patterson
A. W. Mellon Foundation
Alfred P. Sloan Foundation
John D. & Catherine T. MacArthur Foundation

Neil Sarkar, sarkar@mbl.edu
Tom Garnett, garnettT@si.edu