
81286b79911c51723124752fd74cec6f.ppt
- Number of slides: 10
Digital Object Storage and Retrieval (DOSR) Vision
Josh Alspector
June 2008
Approved for Public Release, Distribution Unlimited

Disclaimer
This presentation discusses areas of technology investigation and interest. It does not relate to any existing DARPA program, nor should it be inferred to anticipate a future DARPA program.

The Mundaneum
In 1910 Belgians Paul Otlet and future Nobel Peace Prize laureate Henri La Fontaine opened the Palais Mondial, later renamed the Mundaneum.
The Mundaneum’s mission was to collect metadata on every book, journal, and periodical ever published and record it in a card file system that embodied what we would call a faceted classification scheme. By 1934 it contained over 15 million entries. Unique identifiers included embedded links to related documents. Staff responded to search requests received by post and telegraph and returned hand-copied cards by post.
In 1934 Otlet conceived a global network of “electric telescopes” that would allow people to search and browse through interlinked documents, images, audio and motion picture recordings. He wrote that, “from his armchair, everyone will hear, see, participate, will even be able to applaud, give ovations, sing in the chorus, add his cries of participation to those of all the others.”
[Diagram: Mundaneum Infrastructure. Labels: Telegraph and postal “network”; “Social Network” feedback; Human search engine; “Hyper-linked” card catalog; Documents, images, recordings. Fatal flaw: scalability]

DOSR Vision
- Create a resilient, distributed, scalable, and secure network of information that does not require a completely trusted or stable network of processing nodes [employ network overlays and advanced cryptographic techniques]
- Advance the state of the art in automated metadata generation and interoperability [apply machine learning techniques]
- Automatically get information where it is needed, or may be needed, using less bandwidth and processing [integrate user models, compact information retrieval encodings, and distributed content delivery]
- Reliably track where information goes and where it came from [encapsulate provenance and audit information in network-maintained virtual objects]
Enable secure, resilient information storage, characterization, retrieval, and collaboration across barriers of time, geography, community of interest, technology, and administrative domain
What we can find defines what we can do
[Diagram labels: Videos, E-mail, Images, Web pages, Text files, Spreadsheets; Automated Metadata Generation; User and Data Models]
Photos courtesy of U.S. Army, U.S. Navy

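The provenance goal above (tracking where information goes and where it came from) can be illustrated with a hash-chained audit log. This is only a sketch under stated assumptions: the slides do not describe a log format, and the names `append_event` and `chain_is_intact`, the record fields, and the use of SHA-256 are all hypothetical choices made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first record


def _digest(actor: str, action: str, prev: str) -> str:
    """Hash the record body in a stable (sorted-key) JSON form."""
    body = {"actor": actor, "action": action, "prev": prev}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list, actor: str, action: str) -> None:
    """Append an audit record that commits to the previous record's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"actor": actor, "action": action, "prev": prev,
                "hash": _digest(actor, action, prev)})


def chain_is_intact(log: list) -> bool:
    """Recompute every hash; an edited or reordered record breaks the chain."""
    prev = GENESIS
    for rec in log:
        if rec["prev"] != prev:
            return False
        if rec["hash"] != _digest(rec["actor"], rec["action"], rec["prev"]):
            return False
        prev = rec["hash"]
    return True


log = []
append_event(log, "analyst-a", "created object")
append_event(log, "analyst-b", "read object")
print(chain_is_intact(log))
```

Because each record commits to its predecessor, a verifier only needs the log itself to detect tampering; who stores the log and who signs it are left open here, as they are in the slides.
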
Hard Problems
Automated metadata extraction and generation
- DoD has many stovepipe systems with limited metadata
- Automatic extraction of metadata, especially from non-textual information, is an unsolved problem requiring some form of artificial intelligence
- Email, papers, presentations, forms, and databases do not possess a community-maintained mesh of reciprocal references, so Google-like search, relevance, and ranking algorithms do not work
Scalable security for sharable objects
- Decentralized (for scalability) key distribution systems present security challenges
- Protection from known cryptographic and corruption attacks is hard; protection from unknown attacks is harder
- Usable secure sharing (as convenient as email) is needed or the system won’t be used
- Scalable, revocable group access to synchronized, encrypted, versioned documents is essential
Scalable replicated storage and parallel data distribution
- Globally unique identifiers (GUIDs) for retrieval and update are essential, and must be unbreakable, verifiable, and afford scalable resolution of a retrievable, trackable object
- How to track fragmented and replicated objects for persistence and provenance
- Object replication for secure, scalable, high-bandwidth distribution (secure BitTorrent-style)
- Enhance resiliency and service in network-poor areas
- Respond adaptively to service degradation for high-demand data and large-scale disruptions
Personalization, intelligent agents and user models
- Intelligent agents needed to locate content near likely users, based on user models
- User models based on authorization, active input and passive tracking

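One way to make an identifier verifiable is to derive it from a cryptographic hash of the object's bytes, so that any replica's answer can be checked by the requester. The slides do not specify a GUID scheme; the sketch below assumes SHA-256 content addressing, and `make_guid`, `verify`, `put`, `get`, and the in-memory `store` are hypothetical names standing in for a replica node.

```python
import hashlib

store = {}  # stand-in for one untrusted replica node (assumption)


def make_guid(content: bytes) -> str:
    """Derive the identifier from the content itself (assumed: SHA-256 hex)."""
    return hashlib.sha256(content).hexdigest()


def verify(guid: str, content: bytes) -> bool:
    """Check that retrieved bytes match the identifier they were requested under."""
    return make_guid(content) == guid


def put(content: bytes) -> str:
    """Store an object and return its content-derived identifier."""
    guid = make_guid(content)
    store[guid] = content
    return guid


def get(guid: str) -> bytes:
    """Fetch an object and reject it if the replica altered or swapped it."""
    content = store[guid]
    if not verify(guid, content):
        raise ValueError("replica returned corrupted or substituted content")
    return content


guid = put(b"field report, version 1")
print(get(guid) == b"field report, version 1")
```

A content-derived identifier changes whenever the object changes, so it names one immutable version. Supporting the "retrieval and update" requirement would need an additional mutable name that points at the latest version, which this sketch does not attempt.
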
Key Capabilities
Architecture and protocols
- Protocols for exchanging objects, metadata, and security controls
- Mobile agents and federated requests for information
Persistence of digital objects
- Distribute replicas and coded fragments
- Global, persistent, verifiable, unique identifiers (GUIDs)
- Version-controlled, collaborative updates
Trust, security and provenance
- Authorized, authenticated access
- Decentralized encryption for scalability
- Verifiable provenance and tracking of all objects
- Resilience to attacks
Scalability
- “Scale-free” architecture
- Decentralized, peer-to-peer techniques
- Manage latency, consistency and security as scale grows
Metadata and search
- Extract metadata from video, maps, images
- Relevance feedback
- Efficient federated search
Accessibility and User Models
- User models include authorization, preferences, location, need-to-know
- Content finds you without search
- Information locally available is personally relevant
[Diagram labels: Object 1, Version 1; Object 1, Version 2 update; Replicas and fragments; Retrieve latest version from closest fragments or replica; Decentralized, scalable key distribution; Scalable resources, storage and participant networks; Needed objects migrate to local server for user]

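The "coded fragments" idea can be illustrated with the simplest possible erasure code: split an object into k data fragments and add one XOR parity fragment, so that any single lost fragment can be rebuilt from the others. The slides do not name a coding scheme, and a real system would likely use a stronger code that tolerates several losses; `encode`, `decode`, and the zero-byte padding are assumptions made for this sketch.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> tuple:
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(size * k, b"\0")       # pad so fragments are equal length
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    frags.append(reduce(xor_bytes, frags))     # parity = XOR of all data fragments
    return frags, len(data)


def decode(frags: list, length: int) -> bytes:
    """Rebuild the object when at most one fragment (marked None) is missing."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single-parity code can recover only one lost fragment")
    frags = list(frags)
    if missing:
        present = [f for f in frags if f is not None]
        frags[missing[0]] = reduce(xor_bytes, present)  # XOR of the rest restores it
    return b"".join(frags[:-1])[:length]       # drop parity, strip padding


fragments, n = encode(b"digital object storage", 4)
fragments[2] = None                            # simulate one unreachable node
print(decode(fragments, n))
```

With k data fragments on k + 1 nodes, storage overhead is 1/k instead of the 100% cost of a full replica, at the price of tolerating only one failure per object.
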
Interesting Research Ongoing in…
- Automated metadata extraction
- Decentralized, self-configuring, location and routing
- Federated search
- Information retrieval
- Personalization and user models
- Proxy re-encryption
- Scalable security and PKI
- Search over encrypted indexes
- Securing resilient peer-to-peer networks
DOSR Workshop will address these areas

Preliminary Schedule

July 15 Talks
8:30 am   Opening remarks - DARPA
Architecture
8:45 am   Dr. Robert Kahn - keynote address
9:15 am   Dr. Peter Lucas - MAYA
9:35 am   Dr. Daniel Crichton - NASA
9:55 am   Break
Metadata
10:15 am  Dr. Ajay Divakaran - Sarnoff Corp.
10:35 am  Dr. Randal Burns - JHU
10:55 am  Dr. Shmuel Peleg - HU-J
11:15 am  Mr. Jason Byassee - Northrop Grumman
Security
11:35 am  Dr. James Allan - U. Mass-Amherst
11:55 am  Dr. Rafail Ostrovsky - UCLA
12:15 pm  Lunch
1:40 pm   Dr. Urs Muller - Net-Scale Tech.
2:00 pm   Dr. Matt Staker - IBM Research
2:20 pm   Dr. Angelos Stavrou - Global InfoTek Inc.
2:40 pm   Break
User Models
3:00 pm   Dr. Peter Brusilovsky - U. Pittsburgh
3:20 pm   Dr. Michael Walfish - UT-Austin
3:40 pm   Dr. Rafael Alonso - SET Corp.
4:00 pm   Mr. Peter Haglich - Lockheed Martin

July 15 Posters
4:20 pm   Break
4:40 pm   Poster Session 1
5:20 pm   Poster Session 2
6:00 pm   Adjourn

July 16 Breakouts
9:00 am   Dr. Josh Alspector - DOSR vision and breakout group instructions
9:30 am   Breakout group discussions
Noon      Lunch
1:30 pm   Brief out Group 1
2:00 pm   Brief out Group 2
2:30 pm   Break
2:50 pm   Brief out Group 3
3:20 pm   Brief out Group 4
3:45 pm   Plenary Session
4:15 pm   Adjourn

Levels of Success
- DoD adopts system internally
- Portions of system are made available for open-source uses by Apache
- Legal, medical, and financial records management firms adopt GUIDs, protocols, and system components
- ISPs and media companies adopt GUIDs, protocols, and system components for subscription services
- Amazon, Google and iTunes use GUIDs and protocols

Prior Art
- Coda (CMU)
- Cooperative File System (MIT)
- FARSITE (Microsoft)
- Grid (Argonne National Laboratory)
- Lustre (now owned by Sun Microsystems)
- OceanStore (UC Berkeley)
- PASIS (CMU)
- Universal Database (Maya Design)