- Number of slides: 31
Pierre Bérard
Institut Fourier, CNRS–Université Joseph Fourier & Cellule Math. Doc, CNRS–Université Joseph Fourier, Grenoble (France)
Cornell July 25, 2002 NUMDAM
Cellule Math. Doc (www-mathdoc.ujf-grenoble.fr)
• An institute for Scientific Information & Communication in Mathematics, supported by the Centre National de la Recherche Scientifique (CNRS) and the Ministère de la Recherche.
• General mission: handling documentation issues in mathematics at the national level in France, in cooperation with mathematics libraries and institutes.
NUMDAM: Digitisation of Ancient Mathematics Documents (NUMérisation de Documents Anciens Mathématiques)
A digitisation programme supported by CNRS and the Ministère de la Recherche, managed by the Cellule Math. Doc.
NUMDAM: aims
• Reinforce French mathematical journals (visibility, accessibility, durability).
• Hand down digitised archives of the French mathematical heritage to future generations, and take part in international efforts with the same aim.
• Work towards making this digitised mathematical heritage freely accessible.
Political choices
• Database freely accessible on the web.
• Full text freely accessible after a moving wall (set for each serial).
• Planned interoperability between retro-digitised and born-digital collections.
• National and international cooperation as far as possible.
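To make the moving-wall rule concrete, here is a minimal Python sketch. It assumes the wall is expressed in whole calendar years counted from the publication year; the function name and the five-year wall in the example are hypothetical, since the slide only says the wall depends on each serial.

```python
from datetime import date
from typing import Optional


def full_text_is_open(pub_year: int, wall_years: int,
                      today: Optional[date] = None) -> bool:
    """True if an issue from pub_year has passed a moving wall of wall_years.

    Assumption: the wall is counted in whole calendar years from publication.
    """
    current_year = (today or date.today()).year
    return current_year - pub_year >= wall_years


# Hypothetical five-year wall, evaluated on the day of the talk.
talk_day = date(2002, 7, 25)
print(full_text_is_open(1962, 5, talk_day))  # True: 40 years old
print(full_text_is_open(1999, 5, talk_day))  # False: only 3 years old
```

Because the wall moves with the current date, one more year of each serial becomes open every year without any per-volume decision.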
Technical choices
• Scan from first to last page at 600 dpi.
• OCR (uncorrected, 99.9% accuracy; mathematical formulae and images excluded).
• Multi-page files for logical units (TIFF, PDF + hidden text, DjVu).
• End-of-article bibliographies treated (corrected OCR at 99.99% accuracy, plus markup of the “author”, “title” and “year” fields).
• Database: cataloguing data for each article, summary (if present), end-of-article bibliography (if present), hidden OCRed text. Structured data exchange in XML.
• Links to and from the JFM, ZM and MR databases, as far as possible.
• Further enhancements planned as technology allows.
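The two accuracy levels differ by a factor of ten in residual error rate. The sketch below assumes the percentages are per-character accuracy and uses a hypothetical page of 3,000 characters; neither assumption is stated on the slide.

```python
def expected_errors(chars: int, accuracy: float) -> float:
    """Expected number of misrecognised characters at a given accuracy."""
    return chars * (1.0 - accuracy)


PAGE_CHARS = 3000  # hypothetical character count for one printed page

for label, accuracy in [("raw OCR", 0.999), ("corrected bibliographies", 0.9999)]:
    errors = expected_errors(PAGE_CHARS, accuracy)
    print(f"{label}: about {errors:.1f} errors per {PAGE_CHARS} characters")
```

Under these assumptions, uncorrected OCR leaves about 3 errors per page, while the corrected bibliographies leave about 0.3, which matters when the marked-up fields are used for matching against external databases.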
Production choices
• Use of an external operator for the technical treatments.
• In-house study, segmentation, cataloguing, quality control, and display.
• Quality and durability policy:
  - Prefer standard, easily convertible formats (TIFF, XML) as sources for any future processing, and avoid being tied to a proprietary system.
  - Archive high-quality images from which the text can be regenerated (formula OCR, structure recognition).
NUMDAM Phase I: Journals
NUMDAM Phase I: Chronology
• Spring 2003: End of the industrial phase of NUMDAM Phase I; public access to articles via the web.
• Autumn 2002: Start of NUMDAM Phase II. Work on copyright (©) issues continues.
• August 2002: First 50,000 pages delivered by the vendor.
• Feb.–May 2002: Setting up the production chain (vendor) and quality control (Cellule Math. Doc). Work on copyright issues.
• Dec. 2001: Choice of vendor validated by CNRS.
• Nov. 2000–Oct. 2001: Cataloguing and checking the database.
• Oct. 2000–May 2001: Writing up the schedule of conditions for the vendor.
• July 2000: Funding by CNRS.
NUMDAM Phase II
• Take an active part in the Digital Mathematics Library project. Cooperate with other digitisation projects (Gallica–BnF, possibly the digitisation part of EMANI). Inventory resources and cooperate with historians and mathematicians to make scientific choices and set priorities, in order to:
• Digitise all French mathematics journals (Annales de l’Institut Henri Poincaré, Annales de l’Université de Toulouse, Comptes Rendus de l’Académie, Journal de l’École polytechnique, ...), and possibly some general science journals that are important for mathematics.
• Digitise important seminar series (séminaires Bourbaki, Cartan, séminaire de Probabilités de Strasbourg, ...).
• Digitise a substantial set of important monographs.
NUMDAM programme: overview
• Examination of collections and setting up the database
• Schedule of technical conditions
• Vendor
• Digitisation
• Segmentation
• Treatments (OCR & bibliographies)
• Quality control
• Software developments (SQL, XML)
• Authors identification & copyright (©)
• Display: search and browsing
• Links: JFM, MR, ZM
• Database maintenance
• Copyright issues and negotiations with publishers
Quality control procedure
• Files received from the vendor: TIFF; XML, TIFF, PDF and DjVu.
• Automatic control (Perl), producing a log of errors.
• Sorting of samples (Perl).
• Visual control of the samples against a check-list (PHP).
• Logs of errors (TIFF; XML, TIFF, PDF, DjVu files) stored in a MySQL database.
• Synthesis, leading to validation or rejection.
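The automatic stage of this procedure could look roughly like the Python sketch below. The file-naming rule (one file per logical unit and format, sharing a common stem), the suffixes, the sampling rate and the function names are all hypothetical; the slide only states that Perl scripts perform the automatic control and the sorting of samples.

```python
import random
from pathlib import Path

# Hypothetical: formats expected for each logical unit (article), after the
# slide's list of TIFF, XML, PDF and DjVu files. Real naming rules are not given.
EXPECTED_SUFFIXES = {".tif", ".xml", ".pdf", ".djvu"}


def check_delivery(files: list[str]) -> dict[str, list[str]]:
    """Group delivered file names by stem and report missing formats."""
    units: dict[str, set[str]] = {}
    for name in files:
        path = Path(name)
        units.setdefault(path.stem, set()).add(path.suffix.lower())
    errors: dict[str, list[str]] = {}
    for unit, suffixes in sorted(units.items()):
        missing = sorted(EXPECTED_SUFFIXES - suffixes)
        if missing:
            errors[unit] = missing  # one entry per unit in the error log
    return errors


def draw_sample(units: list[str], rate: float, seed: int = 0) -> list[str]:
    """Pick a reproducible random subset of units for visual control."""
    if not units:
        return []
    k = min(len(units), max(1, round(len(units) * rate)))
    return sorted(random.Random(seed).sample(units, k))


delivery = [
    "PMIHES_1962__12__5_0.tif", "PMIHES_1962__12__5_0.xml",
    "PMIHES_1962__12__5_0.pdf", "PMIHES_1962__12__5_0.djvu",
    "PMIHES_1962__13__5_0.tif", "PMIHES_1962__13__5_0.pdf",
]
print(check_delivery(delivery))
print(draw_sample(sorted({Path(f).stem for f in delivery}), rate=0.5))
```

Fixing the random seed makes the sample reproducible, so a rejected delivery can be re-checked on the same units.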
NUMDAM Programme: XML description of physical volumes
Publications Mathématiques de l’Institut des Hautes Études Scientifiques
Physical volume: year 1962, volume 12
A paper in a physical volume
Article by Bernard Dwork in Publications Mathématiques IHÉS, 12 (1962), 5-68
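A minimal sketch of such an XML description, built with Python's standard xml.etree.ElementTree from the volume and article data given on the two slides above. The element and attribute names (volume, article, author, pages) and the function name are hypothetical; the actual NUMDAM schema is not shown.

```python
import xml.etree.ElementTree as ET


def volume_xml(journal: str, year: int, volume: int, articles: list[dict]) -> str:
    """Build a hypothetical XML description of one physical volume."""
    vol = ET.Element("volume", journal=journal, year=str(year), number=str(volume))
    for art in articles:
        # One child element per article (logical unit) in the volume.
        a = ET.SubElement(vol, "article", id=art["id"])
        ET.SubElement(a, "author").text = art["author"]
        ET.SubElement(a, "pages", first=str(art["first"]), last=str(art["last"]))
    return ET.tostring(vol, encoding="unicode")


xml_text = volume_xml(
    "Publications Mathématiques de l’IHÉS", 1962, 12,
    [{"id": "PMIHES_1962__12__5_0", "author": "Bernard Dwork", "first": 5, "last": 68}],
)
print(xml_text)
```

Keeping the physical volume as the root and articles as children mirrors the slides' distinction between the physical volume and the logical units it contains.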
Bibliographies
Cross-linking
• NUMDAM article PMIHES_1962__12__5_0, held in the SQL / EDBM database of articles and the database of images (PDF, DjVu).
• External databases: JFM, MR, ZM, ...
• Example identifiers: MR 28#3039, ZM 0173.48601, MR 10,592e, ZM 0032.39402.
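The article identifiers seem to encode their bibliographic position: PMIHES_1962__12__5_0 matches Dwork's article in volume 12 (1962), starting on page 5. The Python sketch below splits an identifier under that inferred layout; the layout, the field names and the function name are assumptions, and the meaning of the final "0" is not stated on the slides.

```python
from typing import NamedTuple


class ArticleId(NamedTuple):
    journal: str
    year: int
    volume: str
    first_page: str
    suffix: str  # meaning not stated on the slides


def parse_numdam_id(ident: str) -> ArticleId:
    """Split an identifier such as PMIHES_1962__12__5_0 into its parts.

    Inferred (hypothetical) layout: JOURNAL_YEAR__VOLUME__FIRSTPAGE_SUFFIX.
    """
    head, volume, tail = ident.split("__")
    journal, year = head.rsplit("_", 1)
    first_page, suffix = tail.rsplit("_", 1)
    return ArticleId(journal, int(year), volume, first_page, suffix)


print(parse_numdam_id("PMIHES_1962__12__5_0"))
```

Volume and page are kept as strings because page numbers in older volumes are not always plain integers.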
MR ↔ NUMDAM
Query from the NUMDAM database (BdD) to MR lookup:
|Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||
Answer from MR:
|Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|26#1893|Homologie des espaces fibr'es.
JFM & ZM ↔ NUMDAM
A new identification tool is in development in the LIMES framework (EU project).
Query from the NUMDAM database (BdD) to ZM lookup:
|Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||
Answer from ZM:
|Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|0105.16903|Homologie des espaces fibr'es.
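In both lookups the NUMDAM identifier travels through the query unchanged, so each answer can be joined back to the right article. The Python sketch below parses these pipe-delimited records; the field positions are inferred by comparing the query and answer lines on the two slides above (the empty slots are not explained there and are ignored), and the function and field names are hypothetical.

```python
def parse_lookup_line(line: str) -> dict[str, str]:
    """Parse one pipe-delimited lookup record (field positions inferred)."""
    fields = line.split("|")
    return {
        "journal": fields[1].strip(),
        "author": fields[2].strip(),
        "volume": fields[4].strip(),
        "year": fields[6].strip(),
        "numdam_id": fields[8].strip(),
        # Empty in a query; filled with the MR or ZM number in an answer.
        "review_number": fields[9].strip() if len(fields) > 9 else "",
        "title": fields[10].strip() if len(fields) > 10 else "",
    }


def review_numbers_by_id(answer_lines: list[str]) -> dict[str, str]:
    """Map each NUMDAM identifier to the review number returned for it."""
    out: dict[str, str] = {}
    for line in answer_lines:
        rec = parse_lookup_line(line)
        if rec["review_number"]:
            out[rec["numdam_id"]] = rec["review_number"]
    return out


mr_answer = ("|Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||"
             "PMIHES_1962__13__5_0|26#1893|Homologie des espaces fibr'es.")
print(review_numbers_by_id([mr_answer]))  # {'PMIHES_1962__13__5_0': '26#1893'}
```

The journal name differs between query and answer ("Publications IHES" versus the MR abbreviation), which is why the identifier, not the journal string, is used as the join key here.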
Identification of authors: two purposes
• Improve search facilities by setting up a reference list of authors.
• Provide a tool to help address copyright issues.
Internal tool...
NUMDAM: search interface based on EDBM (in development)
Abstract (if available); links to JFM, MR, ZM
NUMDAM URLs
• Main: www-mathdoc.ujf-grenoble.fr/NUMDAM/
• Visitors (sample files): www-mathdoc.ujf-grenoble.fr/NUMDAM/Visitors/ (Login: VISITORS, Pwd: vtonum)
• LiNuM (books at BnF, Cornell, Göttingen, Michigan): www-mathdoc.ujf-grenoble.fr/LiNuM/
• Journal de Mathématiques Pures et Appliquées 1836–1880 (BnF): www-mathdoc.ujf-grenoble.fr/JMPA/
• Search the NUMDAM database: math-sahel.ujf-grenoble.fr/NUMDAM/Public/Bd/consultation.htm
• Inventory: math-sahel.ujf-grenoble.fr/NUMDAM/Public/Inventaire/inventaire.htm
Thank you for your attention...