ec5782ef2bb3c4036cd005eb966b89d5.ppt
- Number of slides: 77
Automation and Quality in Image Digital Libraries with Annotations Edward Fox, Uma Murthy and Ricardo Torres Florence, Italy 17 February 2007
Outline • Acknowledgements • Digital Libraries • Scenarios, Requirements • Superimposed Information • Content Based Information Retrieval • CBISC, SIERRA • Theory, Quality • References • Summary 2
Acknowledgements: Students • Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Gonçalves, Doug Gorton, Nithiwat Kampanya, Rohit Kelapure, S. H. Kim, Neill Kipp, Aaron Krowne, Bing Liu, Ming Luo, Roberto Marchesini, Paul Mather, Sudarshan Murthy, Uma Murthy, Sanghee Oh, Ananth Raghavan, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo da Silva Torres, Srinivas Vemuri, Wensi Xi, Seungwon Yang, Baoping Zhang, Qinwei Zhu, … 3
Acknowledgements: Faculty, Staff • Lillian Cassel, Lois Delcambre, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Sandy Grant, Eric Hallerman, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Douglas Knight, Deborah Knox, Alberto Laender, David Maier, Gail McMillan, Claudia Medeiros, Manuel Pérez-Quiñones, Jeff Pomerantz, Naren Ramakrishnan, Layne Watson, Barbara Wildemuth, … 4
Other Collaborators (Selected) • Brazil: FUA, UFMG, UNICAMP • Case Western Reserve University • Emory, Notre Dame, Oregon State • Germany: Univ. Oldenburg • Mexico: UDLA (Puebla), Monterrey • College of NJ, Hofstra, Penn State, Villanova • Portland State • University of Arizona, University of Florida, Univ. of Illinois, University of Virginia • VTLS (slides on digital repositories, NDLTD) 5
Acknowledgements: Support ACM, Adobe, AOL, CAPES, CNI, CNPq, CONACyT, DFG, FAEPEX, FAPESP, IBM, IMLS, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0080748, 0086227, 0307867, 0325579, 0532825, 0535057, 0535060; ITR-0325579; DUE-0121679, 0121741, 0136690, 0333531, 0333601, 0435059), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS, …
Outline • Acknowledgements • Digital Libraries • Scenarios, Requirements • Superimposed Information • Content Based Information Retrieval • CBISC, SIERRA • Theory, Quality • References • Summary 7
Digital Libraries --- Objectives • World Lit.: 24 hr / 7 day / from desktop • Integrated “super” information systems: 5S: Table of related areas and their coverage • Ubiquitous, Higher Quality, Lower Cost • Education, Knowledge Sharing, Discovery • Disintermediation -> Collaboration • Universities Reclaim Property • Interactive Courseware, Student Works • Scalable, Sustainable, Useful
Alliteration
• 5S
– Societies • Users • Collaboration, Web 2.0
– Scenarios • Workflow, Stories • Services, Components
– Spaces: GIS
– Structures: DBMS
– Streams: DSMS
• 3C
– Content • Content Management Systems
– Context • Link Structure • NLP • Mental models
– Criticism, commentary • Annotation, Talmud • Cataloging, indexing • Abstracting • Summarizing • Secondary literature
10
Outline • Acknowledgements • Digital Libraries • Scenarios, Requirements • Superimposed Information • Content Based Information Retrieval • CBISC, SIERRA • Theory, Quality • References • Summary 12
Consider this scenario
1. Ingrid is a graduate student in the Fisheries department doing research on freshwater fish.
2. On a field visit, she finds a unique-looking fish and wants to know more.
3. She wants to search the department database for related information based on others’ observations. She also wants to enter new information about the fish into the database.
Source (images): http://umd.edu/
13
EKEY: The electronic key for identifying freshwater fishes 14
• Next, Ingrid works on an assignment to gain familiarity with the capabilities of a new Biodiversity Information System. She is required to make the system help her with a complex, integrated information need: • “Retrieve descriptions of all fish whose shape is similar to that shown in the figure below, which belong to the genus ‘Notropis’, which have ‘large eyes’ and a ‘dorsal stripe’, and which have been observed within the catchments of the ‘Tennessee’ river.” 15
Here is another scenario … • An archeologist wants to write commentaries on artifacts discovered in the field • Using an Archeology digital library in his study, he wants to be able to: – Manually annotate images (and parts) – Search for images (and parts), and annotations – Automatically annotate/tag similar images (and parts) – Share annotations and images Sources: http://www.bewegende-plaatjes.net, http://www.dorsetforyou.com, http://www.archaeology.org 16
Functionality required • Digital Library (DL) users need, but get little assistance with, the following tasks: – Selecting and annotating images and parts of images • Preserve original context of information • Manual and automated annotation – Content-based image retrieval of images and parts of images (+ GIS + metadata + text …), with machine learning of a proper set of descriptors – Sharing selections and annotations 17
New Microsoft Research grant • Virginia Tech and UNICAMP (Brazil) • Fisheries & Wildlife, Computer Science • Tablet PCs: Content-Based Image Retrieval + Superimposed Information 18
Outline • Acknowledgements • Digital Libraries • Scenarios, Requirements • Superimposed Information • Content Based Information Retrieval • CBISC, SIERRA • Theory, Quality • References • Summary 19
Superimposed information (SI) • New interpretation of existing information – New content, new structures • Focuses on – Information at sub-document granularity – Information from heterogeneous sources (multimedia content) – Working with information in situ 20
Origin of SI • This basic need had been addressed in diverse ways, with varying degrees of success, for many years: – concordances, annotations, comments – bookmarks, concept maps, digital annotations, … • The term “SI” was coined in 1999 by researchers, currently collaborating with us, now at Portland State University – Lois Delcambre – David Maier 21
Layers in an SI system * Source: ICDE 04 presentation by Murthy et al. 22
Benefits • Specificity of reference • Flexibility – Identifying interesting (parts of) objects – Making connections between selections – Managing collections of selections • References sub-document information – Preservation of context – Facilitates easy sharing of information 23
Superimposed Applications • Enhanced CMapTools • SIMPEL: A SuperImposed Multimedia Presentation Editor and pLayer 24
Combining CBIR and SI • Associate images and parts of images, with related information such as annotations, hyperlinks, metadata records, etc. • Perform CBIR on images and parts of images that have been annotated • Combine text- (on annotations and other associated text information) and content-based (image content) search for more effective retrieval of images and parts of images 25
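The combined search described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not say how text and content scores are merged, so the weighted sum, the `alpha` weight, the L1-based similarity, and all names (`text_score`, `content_score`, `combined_rank`) are hypothetical.

```python
def text_score(query_terms, annotation):
    """Fraction of query terms that occur in the annotation text (case-insensitive)."""
    words = set(annotation.lower().split())
    terms = [t.lower() for t in query_terms]
    if not terms:
        return 0.0
    return sum(t in words for t in terms) / len(terms)


def content_score(query_vec, vec):
    """Similarity in (0, 1], assumed here to be 1 / (1 + L1 distance)."""
    dist = sum(abs(x - y) for x, y in zip(query_vec, vec))
    return 1.0 / (1.0 + dist)


def combined_rank(query_terms, query_vec, items, alpha=0.5):
    """Rank item ids by a weighted sum of text and content scores.

    items maps an id to (annotation text, feature vector);
    alpha is a hypothetical weight given to the text score.
    """
    scored = []
    for item_id, (annotation, vec) in items.items():
        score = (alpha * text_score(query_terms, annotation)
                 + (1 - alpha) * content_score(query_vec, vec))
        scored.append((score, item_id))
    # Highest score first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored]


# Toy data: annotated image parts with made-up two-dimensional feature vectors.
items = {
    "mark1": ("large eyes dorsal stripe", [0.9, 0.1]),
    "mark2": ("small eyes", [0.9, 0.1]),
    "mark3": ("dorsal stripe", [0.1, 0.9]),
}
order = combined_rank(["dorsal", "stripe"], [0.9, 0.1], items)
```

With these toy values, an item that matches both the query text and the query image content ranks ahead of items that match only one of the two.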
Outline • Acknowledgements • Digital Libraries • Scenarios, Requirements • Superimposed Information • Content Based Information Retrieval • CBISC, SIERRA • Theory, Quality • References • Summary 26
Content-Based Image Retrieval (CBIR) • Retrieve images similar to a user-defined specification or pattern (e.g., shape sketch, image example) • Goal: To support image retrieval based on content properties (e.g., shape, color or texture), usually encoded into feature vectors 27
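As a minimal sketch of what "similar" can mean once content is encoded into feature vectors, the example below compares vectors with Euclidean distance. The distance function is an assumption (the slides do not name one), and the image ids and vector values are invented for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical query vector and stored vectors.
query = [0.98, 0.91, 0.73]
candidates = {
    "img_a": [0.97, 0.90, 0.70],
    "img_b": [0.10, 0.20, 0.30],
}

# Smaller distance means more similar, so sort in ascending order.
ranked = sorted(candidates, key=lambda name: l2_distance(query, candidates[name]))
```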
Textual information retrieval • Query on Google using “Sunset” and “Rio de Janeiro” • Query result 28
Content Based Information Retrieval 29
Effective Image Description + Feature Extraction • Image (R, G, B channels) -> Feature Vector [0.98, 0.91, 0.73, …] 30
Image descriptors • Image Descriptor
Example: Histogram • Frequency count of each individual color • Most commonly used color feature representation • Figure: image and corresponding histogram (Source: Andrade, D.) 32
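A color histogram of this kind can be sketched as below. The sketch assumes pixels arrive as (r, g, b) tuples with 0..255 values and that each channel is quantized into a few bins; the bin count and the function name `color_histogram` are illustrative choices, not details from the slides.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized frequency count over quantized RGB colors.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Returns a list of bins_per_channel ** 3 relative frequencies.
    """
    step = 256 // bins_per_channel
    last = bins_per_channel - 1
    counts = Counter()
    total = 0
    for r, g, b in pixels:
        # Quantize each channel, clamping so odd bin counts stay in range.
        rq = min(r // step, last)
        gq = min(g // step, last)
        bq = min(b // step, last)
        counts[rq * bins_per_channel ** 2 + gq * bins_per_channel + bq] += 1
        total += 1
    size = bins_per_channel ** 3
    if total == 0:
        return [0.0] * size
    return [counts[i] / total for i in range(size)]


# Four toy pixels: two reds, one blue, one black.
pixels = [(255, 0, 0), (250, 5, 5), (0, 0, 255), (0, 0, 0)]
hist = color_histogram(pixels)
```

The resulting list can serve directly as a feature vector, since two images with similar color distributions produce nearby histograms.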
Texture Descriptors 33
Contour Saliences 34
Contour Segment Saliences 35
Multiscale Fractal Dimension • Complex geometric shapes • Defined by simple algorithms • Non integer dimension • Invariant under scaling 36
Multiscale Fractal Dimension (Experiments) 37
Tensor Scale Descriptor • Introduced by Saha et al. in 2003. • For a pixel p, it is the largest ellipse centered at p within the same homogeneous region. • It extracts local structure information (thickness, orientation, and anisotropy). 38
Tensor Scale Image 0° 90° 180° 39
Tensor Scale Image 40
Tensor Scale Descriptor 41
Tensor Scale Descriptor 42
A typical CBIR system • Interface: Data Insertion, Query Specification, Visualization • Query-processing Module: Feature Vector Extraction, Similarity Computation, Ranking • Image Database: Images, Feature Vectors • Input: Query Pattern • Output: Similar Images 44
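The flow through these components (data insertion, feature vector extraction, similarity computation, ranking) can be sketched as a small in-memory index. Everything named here is hypothetical: the class `CBIRIndex`, the mean-color extractor, and the L1 distance are stand-ins chosen for brevity, not the descriptors used in the presented systems.

```python
class CBIRIndex:
    """Toy CBIR pipeline: store feature vectors, then rank by distance."""

    def __init__(self, extract, distance):
        self.extract = extract      # image -> feature vector
        self.distance = distance    # (vector, vector) -> float
        self.vectors = {}           # image id -> feature vector

    def insert(self, image_id, image):
        """Data insertion: extract and store the image's feature vector."""
        self.vectors[image_id] = self.extract(image)

    def query(self, pattern, k=3):
        """Extract the query pattern's vector and return the k closest ids."""
        q = self.extract(pattern)
        scored = [(self.distance(q, v), image_id)
                  for image_id, v in self.vectors.items()]
        scored.sort()
        return [image_id for _, image_id in scored[:k]]


def mean_rgb(pixels):
    """Stand-in extractor: average value of each color channel."""
    n = len(pixels)
    return [sum(p[c] for p in pixels) / n for c in range(3)]


def l1(a, b):
    """Stand-in distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


index = CBIRIndex(mean_rgb, l1)
index.insert("sunset", [(250, 120, 30), (240, 100, 20)])
index.insert("ocean", [(10, 60, 200), (20, 80, 220)])
index.insert("forest", [(20, 150, 40), (30, 170, 50)])
result = index.query([(245, 110, 25)], k=2)
```

Passing the extractor and distance as arguments keeps the index independent of any particular descriptor, so a different pair can be plugged in without changing the ranking logic.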
Outline • Acknowledgements • Digital Libraries • Scenarios, Requirements • Superimposed Information • Content Based Information Retrieval • CBISC, SIERRA • Theory, Quality • References • Summary 45
CBISC • An OAI-compliant component that supports queries on image collections using content-based image retrieval • May be customized to support different image collections 46
CBISC in ETANA 47
CBISC Descriptor Training 48
System’s Architecture (diagram components) • Interface • Mediator • Data Insertion Module • Query Processing Module • Databases: Image DB, DBMS, GIS, Metadata, GeoDB 49
Interface HTTP Request (ListDescriptors) Query Specification Visualization HTTP Request (GetCapabilities) Query Mediator Analysis Merging HTTP Request (GetFeatureType) Execution HTTP Request (GetFeature) BIS Manager HTTP Request (GetImages) Geographic Data Search Component (GDSC) HTTP Request (keywords) Content-Based Image Search Component (CBISC) Metadata-Based Search Component (ESSEX) OAI Web Feature Server (WFS) Image Collection Images Image Collection Descriptors Image Metadata Eco Taxonomic Collection Trees Metadata Maps Geo Collection Metadata 50
CBISC Configuration Tool 51
Integrated support for SI applications in Biomedical Information Systems 53
SIERRA • A tool that allows users to select parts of images and associate them with text annotations. • Performs information retrieval over annotations and associated marks in two ways, searching either for: – images or marks similar (in content) to a specified image or mark – annotations containing specified query terms 54
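One way to picture the data involved is sketched below: a mark addresses part of an image, an annotation attaches text to a mark, and a term search runs over the annotation text. This is an assumption-laden illustration rather than SIERRA's actual design; rectangular marks, the class names `Mark`, `Annotation`, and `AnnotationStore`, and the all-terms matching rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """A selected part of an image, assumed here to be a rectangle in pixels."""
    image_id: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class Annotation:
    """Text associated with one mark."""
    mark: Mark
    text: str


class AnnotationStore:
    """In-memory collection of annotations with a simple term search."""

    def __init__(self):
        self.annotations = []

    def annotate(self, mark, text):
        annotation = Annotation(mark, text)
        self.annotations.append(annotation)
        return annotation

    def search_text(self, terms):
        """Return annotations whose text contains every query term."""
        wanted = {t.lower() for t in terms}
        return [a for a in self.annotations
                if wanted <= set(a.text.lower().split())]


store = AnnotationStore()
store.annotate(Mark("fish01.jpg", 40, 25, 30, 20), "large eyes")
store.annotate(Mark("fish01.jpg", 10, 5, 120, 15), "dark dorsal stripe")
hits = store.search_text(["dorsal", "stripe"])
```

Because each hit carries its mark, a text search returns the annotated image region along with the annotation, which keeps the original image context available.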
Annotating an image 55
Searching over annotations 56
Searching over images/sub-images 57
Formal frameworks DL services and tools drive quality 58
Outline • Acknowledgements • Digital Libraries • Scenarios, Requirements • Superimposed Information • Content Based Information Retrieval • CBISC, SIERRA • Theory, Quality • References • Summary 59
The 5S framework • A DL framework that defines constructs that lead to the definition of a minimal digital library • Then, an archaeological DL • Then, a practical DL • Then, a DL handling superimposed information ... • Plus, theory-based Quality Models and Digital Librarian’s Quality Toolkit 60
The 5 S’s (each with examples and objectives)
• Streams – Examples: Text; video; audio; image – Objectives: Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
• Structures – Examples: Collection; catalog; hypertext; document; metadata – Objectives: Specifies organizational aspects of the DL content
• Spaces – Examples: Measure; measurable, topological, vector, probabilistic – Objectives: Defines logical and presentational views of several DL components
• Scenarios – Examples: Searching, browsing, recommending – Objectives: Details the behavior of DL services
• Societies – Examples: Service managers, learners, teachers, etc. – Objectives: Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
61
5S and DL formal definitions and compositions (April 2004 TOIS) 63
A Minimal DL in the 5S Framework Streams Structured Stream Structures Spaces Structural Metadata Specification Scenarios Societies services Descriptive Metadata Specification indexing browsing searching hypertext Digital Object Collection Metadata Catalog Repository Minimal DL 64
A Minimal ArchDL in the 5S Framework Streams Structured Stream Spaces Descriptive Metadata specification Scenarios Societies services SpaTemOrg StraDia ArchObj Arch Descriptive Metadata specification indexing browsing searching hypertext ArchDO Arch Metadata catalog ArchColl ArchDR Minimal ArchDL 65
Formalizing CBIR services in DLs 66
Information model 67
Tools/Applications 68
5SQual: A Quality Assessment Tool for Digital Libraries
5SQual - Dimensions (Numeric Indicators) • Digital Objects: Similarity, Accessibility, Significance, Timeliness • Metadata: Completeness, Conformance • Services: Efficiency, Reliability 70
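To illustrate what a numeric indicator for one of these dimensions might look like, the sketch below scores metadata completeness. It assumes completeness is the fraction of schema fields that hold a non-empty value; that formula, the field names, and the function names are illustrative assumptions, not taken from the slides or from the 5SQual tool.

```python
def completeness(record, schema_fields):
    """Fraction of schema fields with a non-empty value in a metadata record."""
    if not schema_fields:
        raise ValueError("schema must list at least one field")
    filled = 0
    for name in schema_fields:
        value = record.get(name)
        # Missing keys, None, and blank strings all count as unfilled.
        if value is not None and str(value).strip():
            filled += 1
    return filled / len(schema_fields)


def average_completeness(records, schema_fields):
    """Mean completeness over a catalog; 0.0 for an empty catalog."""
    if not records:
        return 0.0
    return sum(completeness(r, schema_fields) for r in records) / len(records)


# Hypothetical four-field schema and two toy records.
schema = ["title", "creator", "date", "subject"]
partial = {"title": "Notropis specimen", "creator": "", "date": "2007"}
full = {"title": "a", "creator": "b", "date": "c", "subject": "d"}

score = completeness(partial, schema)
catalog_score = average_completeness([partial, full], schema)
```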
5SQual Architecture 71
Evaluations – XML Report
Evaluations – Charts
Evaluations – Charts
Outline • Acknowledgements • Digital Libraries • Scenarios, Requirements • Superimposed Information • Content Based Information Retrieval • CBISC, SIERRA • Theory, Quality • References • Summary 75
References (selected)
• Uma Murthy, Ricardo da Silva Torres, Edward A. Fox: SIERRA - A Superimposed Application for Enhanced Image Description and Retrieval. ECDL 2006: 540-543
• Uma Murthy, Ricardo da Silva Torres, Edward A. Fox: Integrated Support for Superimposed Applications in Biomedical Information Systems, Virginia Tech, 2006 (for the National Library of Medicine), http://si.dlib.vt.edu/publications/NLMWhitePaperSI2.pdf
• M. A. Gonçalves. Streams, Structures, Spaces, Scenarios, and Societies: A Formal Framework for Digital Libraries and Its Applications: Defining a Quality Model for Digital Libraries (Chapter 8) – PhD thesis, Virginia Tech CS Dept., Blacksburg, VA, 2004. http://scholar.lib.vt.edu/theses/available/etd_12052004_135923/
• M. A. Gonçalves, B. L. Moreira, E. A. Fox, L. T. Watson. What is a good digital library? - defining a quality model for digital libraries. To appear in Information Processing and Management, 2007.
• http://fox.cs.vt.edu/cv.htm
76
Summary • Acknowledgements • Digital Libraries • Scenarios, Requirements • Superimposed Information • Content Based Information Retrieval • CBISC, SIERRA • Theory, Quality • References • Summary 77