
3d9c574785a9afffbe95dfb528c93990.ppt
- Number of slides: 24
PARADISEC: background, current structures, and thoughts on international collaborations • Pacific and Regional Archive for Digital Sources in Endangered Cultures • Linda Barwick, University of Sydney • DELAMAN workshop, MPI Nijmegen, 29 November 2004
PARADISEC structure • CIs: Cliff Goddard, Hugh de Ferranti • CIs: Andrew Pawley, John Bowden, Malcolm Ross, Alan Rumsey • CIs: Steve Bird, Nick Evans, Cathy Falk, Janet Fletcher, John Hajek • CIs: William Foley, Allan Marett, Jane Simpson • Audio Archiving Unit • Director: Linda Barwick • Audio: Frank Davey • Project Liaison: Amanda Ha • Store account, web interface: Stuart Hungerford • Project Manager (metadata guru): Nick Thieberger
PARADISEC rationale • prioritises Asia-Pacific region materials not otherwise catered for; • provides a rational framework for prioritising and managing University research recordings using international archival formats and standards; • implements IP arrangements tailored to University needs and practices; • involves researchers in specialist description of resources; • streamlines consortium processes to salvage important recordings and make them available for research in a timely and cost-effective way
Research applications • Making Australian research available internationally • Fieldwork - use for elicitation and documentation, and for language learning in preparation for fieldwork • Return of materials to communities • Digital tools for optimal transcription and analysis • Comparative studies - historical recordings give time depth for area language and music studies • Better understanding of diversity - data from some languages only in older recordings • Incorporation of primary data in presentations and, ultimately, publications
Staged approach • Metadata - 1623 records, to make resources discoverable even if not yet digitised • PIs and content metadata need to be assigned before digitisation (some refinement during process) • Repository - 807 items digitised to date, some complex, e.g. fieldnotes (page images) or transcripts accompanying tapes
Metadata November 2004 • 1623 records in the metadata repository with data from 24 countries in Asia-Pacific (Australia, Chile, Cook Islands, Fiji, French Polynesia, Hong Kong, Indonesia, India, Japan, Korea, Lao, Malaysia, Federated States of Micronesia, Myanmar (Burma), New Zealand, Palau, Papua New Guinea, Reunion, Singapore, Solomon Islands, Taiwan, Tonga, Vanuatu, Vietnam)
Metadata OLAC harvest
Repository contents • Repository totals 26 November 2004 • total files: 2582 • total items: 807 • total size: 1.0 TB • total hours audio: 627.3 hours • file types: .wav, .mp3 (1040); .tif (179), .jpg (46), .pdf (34), .txt (3), .rtf (8), .xml (32)
Repository Collections • Bradley (5 hr), McIntyre (10 hr), Capell (9 hr)*, Margetts (17 hr), Rumsey (17 hr)*, Corris (6 hr), Crowther (2 hr), San Roque (1 hr), Donohue (3 hr), Sam (4 hr)*, Dutton (266 hr), Tepano (19 hr), Fedden (7 hr), Thieberger (39 hr), Foley (23 hr), Toulmin (35 hr), Gardner (56 hr), Voorhoeve (33 hr)*, Kartomi (2 hr)*, Wurm (2)*, Laycock (29 hr), Evans (Hons thesis), Lawton (3 hr), Thieberger (PhD thesis), McElhanon (41 hr) • * Ingestion ongoing November 2004
PARADISEC Repository Languages November 2004 PAPUA N. GUINEA Dimadima INDIA Abau Dina Ambonese Pidgin Rajbangsi Doga Angoram (Kanduanuin) Domu Angoram (Moim dialect) Doromu Aomie Doura Arapesh PALAU Efogi Arifama Efogi Dialects Palauan Aunalei Emo Auwim Enivilogo Awomo INDONESIA Fore Asmat Ba Fuyugey Balawaia Brat Gabadi Hatam Barai Ginuman Baruga Inanwatan Gwedena Manikion Barupu (Warapu) Herei Be'anivia Moi Hiae Motu Ningrum Biage Hiri Motu Bibo Sahu Hube Sebyar Binandere Hula Tinam Bodinumu I'ai Todahe Boera Ikega Boine Tok Pisin Ioma Boku Yahadian Isaka (Krisa) Boridi Kaipi Bouxula Kairi Brat Momire Kambot Buin Kanga Burum Karama Chimba Karawari Lg Chirima (Ambinwari) Daga Karukaru Darava Kâte Dawawa Qld Pidgin Kinalaknga Mari Rabuka Kimi Maria Raepa Tati Kiriwina Mekeo Saliba Koiari Melpa Samo Koita Mian Sene Koitabu Mid-Wahgi Sepik Tok Pisin Kokila Migabac SOLOMONS Sialum Kokoro Babatana Mindik Sinaugoro Komba Ririo Miniafa Sona Kopar Mogoni Ruviana Suau Koriki Varese Mom Suku Koriko Lau Mor Surai Kosorong Santa Cruz Motu Taboro Kovai Muhiang Arapesh Tairuma COOK Kovio Nabak VANUATU Tauade Kubuirubu Naga South Efate ISLANDS FRENCH Tobo Kuman Namanadza Tok Pisin Rarotongan Kumukio Bislama Naoro FIJI POLYNESIA Tolai Lelepa Kuni Nara Pukapuka Lauan NEW Tahitian Kunimaipa New Ireland Pidgin Uberi TONGA CHILE Ubir CALEDONIA Kwale Ngala Tongan Ubir Gonjoe Laimodo Nomu Rapa Nui Dehu Vesilogo Mada'a Notu Vioribaiwa Magi Ondoro Wamora Mâgobineng One (Onne) Wangun Magore Onjab Wiga Maisin Ono Wosera Maiwa Opao Yele Managalas Orokaiva Yewudu Manam Orokolo Yimas Manubara Ouma Yoba Manumu Paiwa Mapei Police Motu Mapena Porome
Regional links • Institute of Papua New Guinea Studies • Vanuatu Kaljoral Senta • Archive of Maori and Pacific Music, U. Auckland • University of Hawai’i • New Caledonia - Tjibaou Cultural Centre • Indonesia - UIN, Jakarta • Malaysia - Universiti Malaya • Rapa Nui - Museo antropologico P. Sebastian Englert • Micronesia - Historical Preservation Office,
Audio Ingest • Initially ingested as raw WAV on AudioCube 5 Dell 670 workstations running WaveLab (2005 will add remote Pyramix workstations) • Masters 24-bit 96 kHz Broadcast WAV Format (uncompressed audio with encapsulated metadata) • Some lower rate if digital original (e.g. 16-bit 48 kHz from DAT) • WAV > BWF by Quadriga software • Derivatives produced by batch processing - CD-audio quality (16-bit, 44.1 kHz) and MP3 quality (128 kbps)
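To give a feel for what these formats mean for storage, the following is a rough worked calculation rather than anything taken from the PARADISEC workflow. It assumes two-channel (stereo) audio and a constant MP3 bitrate of 128 kbit/s; the function names are purely illustrative.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> int:
    """Uncompressed PCM payload for one hour of audio, in bytes.

    channels=2 (stereo) is an assumption; the slides do not state it.
    """
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second * 3600


def cbr_bytes_per_hour(kilobits_per_second: int) -> int:
    """One hour of constant-bitrate audio (for example MP3), in bytes."""
    return kilobits_per_second * 1000 // 8 * 3600


if __name__ == "__main__":
    sizes = {
        "master, 24-bit 96 kHz": pcm_bytes_per_hour(96_000, 24),
        "DAT original, 16-bit 48 kHz": pcm_bytes_per_hour(48_000, 16),
        "CD quality, 16-bit 44.1 kHz": pcm_bytes_per_hour(44_100, 16),
        "MP3, 128 kbit/s (assumed)": cbr_bytes_per_hour(128),
    }
    for label, size in sizes.items():
        print(f"{label}: {size / 1e9:.2f} GB per hour")
```

Under these assumptions a stereo master needs about 2 GB per hour before any header or metadata overhead, which is why the derivatives matter for access and delivery.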
Digital preservation • “Azoulay” server partitioned for working files and archive partition for sealed masters - current capacity 750 GB (>3 TB in 2005) • Sealed masters archived to 100 GB data tapes on University of Sydney LTO Mass Data Storage System (high-low watermark script) - duplicate data tapes kept at 2 locations on campus • Sealed masters mirrored to APAC national Store facility (Canberra) nightly - nearline storage • Password-protected online access to Store facility
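The slides do not describe the high-low watermark script itself, so the following is only a generic sketch of the technique: once usage passes a high watermark, the least recently accessed files are selected for migration to tape until usage falls to a low watermark. The thresholds, the `ArchiveFile` type and the function name are all hypothetical.

```python
from typing import List, NamedTuple


class ArchiveFile(NamedTuple):
    """Hypothetical description of a sealed master on the archive partition."""
    name: str
    size_bytes: int
    last_access: float  # seconds since the epoch


def select_for_migration(
    files: List[ArchiveFile],
    capacity_bytes: int,
    high: float = 0.90,  # assumed threshold: start migrating above 90% full
    low: float = 0.70,   # assumed threshold: stop once usage is at or below 70%
) -> List[ArchiveFile]:
    """Return the files to move to tape, oldest access first."""
    used = sum(f.size_bytes for f in files)
    if used <= high * capacity_bytes:
        return []  # below the high watermark: nothing to do

    selected = []
    for f in sorted(files, key=lambda item: item.last_access):
        if used <= low * capacity_bytes:
            break  # low watermark reached
        selected.append(f)
        used -= f.size_bytes
    return selected
```

A real script would also have to verify the tape copies before releasing disk space; that step is left out here.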
PDSC data flow
Networking • Main campuses (University of Sydney, University of Melbourne, Australian National University) connected by GrangeNet (next-generation research network, 10 Gbps connections) • Pay subscription, not traffic costs • Satellite campus UNE connected by AARNet (Australian research and education network - currently billed traffic cost, 155 Mbps connection) • Both with connections to APAN community (Asia-Pacific Advanced Network) - potential for linking to regional and international R&E networks - potential traffic costs an issue
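As a back-of-the-envelope illustration of the difference between the two link speeds, the sketch below computes the theoretical minimum time to move a given volume of data. It ignores protocol overhead, contention and disk speed, so real transfers would take longer; the function name is illustrative only.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Lower bound on transfer time: payload bits divided by nominal link rate."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    one_terabyte = 1e12  # decimal terabyte, in bytes
    links = {"155 Mbps link": 155e6, "10 Gbps link": 10e9}
    for label, rate in links.items():
        hours = transfer_seconds(one_terabyte, rate) / 3600
        print(f"{label}: at least {hours:.2f} hours per TB")
```

At nominal rates, 1 TB needs roughly 14 hours over a 155 Mbps link and about 13 minutes over a 10 Gbps link.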
Storage • Australian Partnership for Advanced Computing National Facility Mass Data Storage System - Hierarchical Storage Manager system • Funded by consortium of Australian higher education bodies • Tape robot system - can handle 1.2 PB • PARADISEC will add 2-3 TB per year once satellite ingest commissioned • Current horizon of facility 2008 - project PARADISEC collection up to 9 TB by then • Will need to apply to host material/share data from other DELAMAN collections
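The growth figures can be checked with a simple linear projection. This is a sketch under stated assumptions: the starting size of 1.0 TB comes from the repository totals slide, while the number of full ingest years before 2008 is a guess, since the slides do not say when satellite ingest will be commissioned.

```python
def projected_size_tb(current_tb: float, annual_growth_tb: float, years: int) -> float:
    """Linear projection: current holdings plus constant yearly ingest."""
    return current_tb + annual_growth_tb * years


if __name__ == "__main__":
    current_tb = 1.0  # repository total size reported for 26 November 2004
    years = 4         # assumption: four full ingest years up to the 2008 horizon
    for rate in (2.0, 3.0):
        total = projected_size_tb(current_tb, rate, years)
        print(f"{rate:.0f} TB/year over {years} years: {total:.1f} TB")
```

With four years at 2 TB per year the projection reaches 9 TB, which is one reading consistent with the figure on the slide; a sustained 3 TB per year over the same period would exceed it.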
Streaming • GrangeNet streaming server currently in trial mode - only available within network • Soon to have automatic copying of main collection to streaming server • Foresee higher demand for access when scaled streaming access to excerpts available; but also greater resources needed to mount and manage • Will depend on researchers’ provision of timecoded transcripts/glosses • Access and authentication protocols yet to be developed • Testbed for citation/integration into e-publications
Software • Initial metadatabase in FileMaker Pro 6 with periodic XML dumps for OLAC static harvesting • Currently being ported to MySQL/PHP to allow dynamic harvesting and other functionality • Python software for managing repository and website (Stuart Hungerford, ANU) • Developing Java-based geographic search interface (TimeMap) • All based on Open Source tools
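To illustrate what a periodic XML dump of catalogue records might look like, here is a minimal sketch using Python's standard library. It is not the PARADISEC export and not the full OLAC static repository format: it only wraps a few Dublin Core elements in hypothetical `records` and `record` containers, and the sample record values are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

# Dublin Core element namespace; the container element names below are hypothetical.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def record_to_element(record: Dict[str, str]) -> ET.Element:
    """Turn one flat metadata record into a <record> of dc:* child elements."""
    element = ET.Element("record")
    for field, value in record.items():
        child = ET.SubElement(element, f"{{{DC_NS}}}{field}")
        child.text = value
    return element


def dump_records(records: Iterable[Dict[str, str]]) -> str:
    """Serialise all records into a single XML string for static harvesting."""
    root = ET.Element("records")
    for record in records:
        root.append(record_to_element(record))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [{"identifier": "EXAMPLE-001", "title": "Example field recording"}]
    print(dump_records(sample))
```

A static dump like this has to be regenerated whenever the catalogue changes, which is the limitation that dynamic harvesting from a live database removes.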
Implications • Implementations will change over time - foundation for cooperation must be agreements and alignment of strategic objectives • Minimal shared standards needed on formats, ethics, description, rights - what else? • Possibility of staged modular approach: federated discovery platform; proof-of-concept pilot studies/trials; targeted data sets for exchange; dark hosting/mirroring; tools development and testing
Issues • Transnational projects - how to identify and coordinate international funding opportunities? • Projections of international traffic & storage charges - funding implications • Sustainability of our collections - how to cost overheads and source long-term funding commitments • DELAMAN governance and administration structures? How to resource and support without duplication/reinventing the wheel, adding to administrative burden? • How to involve all stakeholders (including
APAN Bangkok 2005 • E-science workshop: Toward a semantic web for digital data archives (convenor V. Balaji, Princeton) • Immense quantities of digital data and images are now archived and publicly available through the web. These include domain-specific data archives, covering such domains as weather and climate, seismology and geophysics, astronomy and particle physics, as well as images and digital copies of nontextual human cultural production. Describing, cataloguing, searching and locating information within digital data and image archives is one of the grand technological challenges of the semantic web era. This session will draw together participants from diverse fields of science and the humanities to share their experience on metadata, standards and techniques for access to large digital archives. • Tentative titles of presentations: 1) The Hierarchical Data Format for EOS (HDF-EOS), Richard Ullman, NASA Goddard Space Flight Center (Invited) 2) Metadata Requirements for Global Climate Models, V. Balaji, NOAA Geophysical Fluid Dynamics Laboratory 3) DELAMAN?? Remote presentation…
PARADISEC gratefully acknowledges support from: • Partner Universities (Sydney, Melbourne, ANU, UNE) • Australian Research Council LIEF scheme • Australian Partnership for Sustainable Repositories (SORRT testbed) • Australian Partnership for Advanced Computing • GrangeNet • ANU Internet Futures
Contact us • http://www.paradisec.org.au • Linda.Barwick@paradisec.org.au (Director) • Nicholas.Thieberger@paradisec.org.au (Project Manager)
Relevant URLs • PARADISEC website http://paradisec.org.au/ • PARADISEC repository login http://store.apac.edu.au/cgi-bin/pdscv3.0.cgi/login • PARADISEC streaming trial http://paradisec.org.au/streamingtrial.html • Transcript page image trial http://www.austehc.unimelb.edu.au/~gavan/lana/hdms.htm • TimeMap digitiser tool proof of concept http://acl.art.usyd.edu.au/TMDigitiser/