Automation and Quality in Image Digital Libraries with Annotations
Edward Fox, Uma Murthy and Ricardo Torres
Florence, Italy
17 February 2007

Outline
• Acknowledgements
• Digital Libraries
• Scenarios, Requirements
• Superimposed Information
• Content Based Information Retrieval
• CBISC, SIERRA
• Theory, Quality
• References
• Summary

Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Gonçalves, Doug Gorton, Nithiwat Kampanya, Rohit Kelapure, S. H. Kim, Neill Kipp, Aaron Krowne, Bing Liu, Ming Luo, Roberto Marchesini, Paul Mather, Sudarshan Murthy, Uma Murthy, Sanghee Oh, Ananth Raghavan, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo da Silva Torres, Srinivas Vemuri, Wensi Xi, Seungwon Yang, Baoping Zhang, Qinwei Zhu, …

Acknowledgements: Faculty, Staff
• Lillian Cassel, Lois Delcambre, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Sandy Grant, Eric Hallerman, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Douglas Knight, Deborah Knox, Alberto Laender, David Maier, Gail McMillan, Claudia Medeiros, Manuel Perez-Quinones, Jeff Pomerantz, Naren Ramakrishnan, Layne Watson, Barbara Wildemuth, …

Other Collaborators (Selected)
• Brazil: FUA, UFMG, UNICAMP
• Case Western Reserve University
• Emory, Notre Dame, Oregon State
• Germany: Univ. Oldenburg
• Mexico: UDLA (Puebla), Monterrey
• College of NJ, Hofstra, Penn State, Villanova
• Portland State
• University of Arizona, University of Florida, Univ. of Illinois, University of Virginia
• VTLS (slides on digital repositories, NDLTD)

Acknowledgements: Support
ACM, Adobe, AOL, CAPES, CNI, CNPq, CONACyT, DFG, FAEPEX, FAPESP, IBM, IMLS, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0080748, 0086227, 0307867, 0325579, 0532825, 0535057, 0535060; ITR-0325579; DUE-0121679, 0121741, 0136690, 0333531, 0333601, 0435059), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS, …

Digital Libraries --- Objectives
• World Lit.: 24 hr / 7 day / from desktop
• Integrated “super” information systems: 5S: Table of related areas and their coverage
• Ubiquitous, Higher Quality, Lower Cost
• Education, Knowledge Sharing, Discovery
• Disintermediation -> Collaboration
• Universities Reclaim Property
• Interactive Courseware, Student Works
• Scalable, Sustainable, Useful

Alliteration
• 5S
  – Societies
    • Users
    • Collaboration, Web 2.0
  – Scenarios
    • Workflow, Stories
    • Services, Components
  – Spaces: GIS
  – Structures: DBMS
  – Streams: DSMS
• 3C
  – Content
    • Content Management Systems
  – Context
    • Link Structure
    • NLP
    • Mental models
  – Criticism, commentary
    • Annotation, Talmud
    • Cataloging, indexing
    • Abstracting
    • Summarizing
    • Secondary literature

Consider this scenario
1. Ingrid is a graduate student in the Fisheries department doing research on freshwater fish.
2. In a field visit, she finds a unique-looking fish, and wants to know more.
3. She wants to search for related information based on others’ observations, in the dept. DB. Also, she wants to enter new information about the fish into the DB.
Source: http://umd.edu/

EKEY: The electronic key for identifying freshwater fishes

• Next, Ingrid works on an assignment to gain familiarity with the capabilities of a new Biodiversity Information System. She is required to make the system help her with her complex integrated information need:
• “Retrieve fish descriptions of all fish whose shape is similar to that shown in the figure below, which belong to genus ‘Notropis’, which have ‘large eyes’ and ‘dorsal stripe’, and have been observed within the catchments of the ‘Tennessee’ river.”
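Ingrid's query mixes metadata constraints (genus, traits), a geographic constraint (catchment), and a content-based constraint (shape similarity). A minimal sketch of one way to evaluate such a query is to filter on the exact constraints first and then rank the survivors by shape distance. The record layout, the two-number shape vector, and the function name `integrated_query` are assumptions made for illustration, not the design of the Biodiversity Information System.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class FishRecord:
    name: str
    genus: str
    traits: set       # descriptive traits taken from metadata (hypothetical field)
    catchment: str    # river catchment of the observation (hypothetical field)
    shape: tuple      # hypothetical shape feature vector


def integrated_query(records, query_shape, genus, traits, catchment, k=5):
    """Filter on metadata and location, then rank by Euclidean shape distance."""
    candidates = [
        r for r in records
        if r.genus == genus and traits <= r.traits and r.catchment == catchment
    ]
    return sorted(candidates, key=lambda r: dist(r.shape, query_shape))[:k]


records = [
    FishRecord("A", "Notropis", {"large eyes", "dorsal stripe"}, "Tennessee", (0.9, 0.1)),
    FishRecord("B", "Notropis", {"large eyes", "dorsal stripe"}, "Tennessee", (0.2, 0.8)),
    FishRecord("C", "Notropis", {"large eyes"}, "Tennessee", (0.9, 0.1)),
    FishRecord("D", "Cyprinella", {"large eyes", "dorsal stripe"}, "Tennessee", (0.9, 0.1)),
]
hits = integrated_query(
    records, (1.0, 0.0), "Notropis", {"large eyes", "dorsal stripe"}, "Tennessee"
)
print([r.name for r in hits])
```

Records C and D are excluded by the trait and genus filters; A ranks ahead of B because its shape vector is closer to the query.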

Here is another scenario …
• An archeologist wants to write commentaries on artifacts discovered in the field
• Using an Archeology digital library in his study, he wants to be able to:
  – Manually annotate images (and parts)
  – Search for images (and parts), and annotations
  – Automatically annotate/tag similar images (and parts)
  – Share annotations and images
Sources: http://www.bewegende-plaatjes.net, http://www.dorsetforyou.com, http://www.archaeology.org

Functionality required
• Digital Library (DL) users need, but get little, assistance with these tasks:
  – Selecting and annotating images and parts of images
    • Preserve original context of information
    • Manual and automated annotation
  – Content-based image retrieval of images and parts of images (+ GIS + metadata + text …), with machine learning of the proper set of descriptors
  – Sharing selections and annotations

New Microsoft Research grant
• Virginia Tech and UNICAMP (Brazil)
• Fisheries & Wildlife, Computer Science
• Tablet PCs: Content-Based Image Retrieval + Superimposed Information

Superimposed information (SI)
• New interpretation of existing information
  – New content, new structures
• Focuses on
  – Information at sub-document granularity
  – Information from heterogeneous sources (multimedia content)
  – Working with information in situ

Origin of SI
• This basic need had been addressed in diverse ways, with varying degrees of success, for many years:
  – concordances, annotations, comments
  – bookmarks, concept maps, digital annotations, …
• The term “SI” was coined in 1999 by researchers now at Portland State University, who currently collaborate with us:
  – Lois Delcambre
  – David Maier

Layers in an SI system
* Source: ICDE 04 presentation by Murthy et al.

Benefits
• Specificity of reference
• Flexibility
  – Identifying interesting (parts of) objects
  – Making connections between selections
  – Managing collections of selections
• References sub-document information
  – Preservation of context
  – Facilitates easy sharing of information

Superimposed Applications
• Enhanced CMapTools
• SIMPEL: A SuperImposed Multimedia Presentation Editor and pLayer

Combining CBIR and SI
• Associate images and parts of images with related information such as annotations, hyperlinks, metadata records, etc.
• Perform CBIR on images and parts of images that have been annotated
• Combine text-based search (on annotations and other associated text information) and content-based search (on image content) for more effective retrieval of images and parts of images
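One simple way to combine the two kinds of search is a weighted sum of a text score and a content score. The sketch below is an assumption about how such a combination could look, not the method used in the systems described here: the term-overlap text score, the L1-based similarity, and the weight `alpha` are all illustrative choices.

```python
def text_score(query_terms, annotation):
    """Fraction of query terms that occur as words in the annotation text."""
    words = set(annotation.lower().split())
    terms = [t.lower() for t in query_terms]
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def content_score(query_vec, image_vec):
    """Map the L1 distance between two feature vectors to a similarity in (0, 1]."""
    d = sum(abs(a - b) for a, b in zip(query_vec, image_vec))
    return 1.0 / (1.0 + d)


def combined_score(query_terms, query_vec, annotation, image_vec, alpha=0.5):
    """Weighted sum of both scores; alpha is an assumed tuning weight."""
    return (alpha * text_score(query_terms, annotation)
            + (1 - alpha) * content_score(query_vec, image_vec))


print(combined_score(["stripe"], (0.5, 0.5), "dark dorsal stripe", (0.5, 0.5)))
```

Setting `alpha` to 1.0 reduces this to pure annotation search and 0.0 to pure content-based search, which makes the trade-off easy to experiment with.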

Content-Based Image Retrieval (CBIR)
• Retrieve images similar to a user-defined specification or pattern (e.g., shape sketch, image example)
• Goal: To support image retrieval based on content properties (e.g., shape, color or texture), usually encoded into feature vectors

Textual information retrieval
• Query on Google using Sunset and Rio de Janeiro
• Query result

Content Based Information Retrieval

Effective Image Description + Feature Extraction
• R, G, B channels
• Feature Vector [0.98, 0.91, 0.73, ……]

Image descriptors
• Image Descriptor

Example: Histogram
• Frequency count of each individual color
• Most commonly used color feature representation
Image and corresponding histogram. Source: Andrade, D.
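The frequency count described above can be sketched in a few lines. This is a minimal illustration assuming 8-bit RGB pixels given as `(r, g, b)` tuples; the number of bins per channel and the normalization are choices made for the example, not taken from the slide.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized color histogram over quantized (R, G, B) values in 0..255."""
    step = 256 / bins_per_channel
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Quantize each channel to a bin index, clamped to the last bin.
        rb, gb, bb = (min(int(v / step), bins_per_channel - 1) for v in (r, g, b))
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(pixels)
    return [count / total for count in hist] if total else [0.0] * len(hist)


pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
hist = color_histogram(pixels)
print(len(hist), hist[48], hist[3], hist[12])
```

The resulting list can serve directly as a feature vector; normalizing by the pixel count keeps histograms of differently sized images comparable.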

Texture Descriptors

Contour Saliences

Contour Segment Saliences

Multiscale Fractal Dimension
• Complex geometric shapes
• Defined by simple algorithms
• Non-integer dimension
• Invariant under scaling

Multiscale Fractal Dimension (Experiments)

Tensor Scale Descriptor
• Introduced by Punam et al. in 2003.
• For a pixel p, it is the largest ellipse centered at p within the same homogeneous region.
• It extracts local structure information (thickness, orientation, and anisotropy).
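Given the ellipse at a pixel, the three local structure factors can be read off its semi-axes. The sketch below assumes the definitions commonly given in the tensor scale literature (thickness as the minor semi-axis length, orientation as the angle of the major axis, anisotropy as sqrt(1 - t2²/t1²)); the slide does not state these formulas, and the function name and input format are hypothetical. Finding the ellipse itself is not shown.

```python
from math import atan2, degrees, hypot, sqrt


def tensor_scale_factors(major_axis, minor_axis):
    """Return (thickness, orientation in degrees, anisotropy) for one ellipse.

    major_axis and minor_axis are (dx, dy) semi-axis vectors; this input
    format is an assumption made for the example.
    """
    t1 = hypot(*major_axis)
    t2 = hypot(*minor_axis)
    if t1 == 0 or t2 > t1:
        raise ValueError("major axis must be nonzero and at least as long as minor axis")
    thickness = t2
    orientation = degrees(atan2(major_axis[1], major_axis[0])) % 180.0
    anisotropy = sqrt(1.0 - (t2 / t1) ** 2)
    return thickness, orientation, anisotropy


thickness, orientation, anisotropy = tensor_scale_factors((0, 5), (3, 0))
print(thickness, orientation, anisotropy)
```

Orientation is reduced modulo 180 degrees because an ellipse axis has no direction, which matches the 0° to 180° range used to display tensor scale images.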

Tensor Scale Image (orientation scale: 0°, 90°, 180°)

Tensor Scale Image

Tensor Scale Descriptor

Tensor Scale Descriptor

A typical CBIR system
• Interface: Data Insertion, Query Specification, Visualization
• Query-processing Module: Feature Vector Extraction, Similarity Computation, Ranking
• Image Database: Images, Feature Vectors
• Input: Query Pattern; output: Similar Images
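The components named on this slide can be sketched as a small in-memory pipeline: data insertion stores a feature vector per image, and a query extracts a vector from the query pattern, computes a distance to every stored vector, and ranks. The class name `TinyCBIR`, the mean-color extractor, and the L1 distance are illustrative assumptions, not the components of any system described in this talk.

```python
class TinyCBIR:
    """Toy CBIR pipeline: insertion, feature extraction, similarity, ranking."""

    def __init__(self, extract, distance):
        self.extract = extract      # feature vector extraction function
        self.distance = distance    # similarity (distance) function
        self.index = {}             # image id -> feature vector

    def insert(self, image_id, image):
        """Data insertion: store the feature vector of a new image."""
        self.index[image_id] = self.extract(image)

    def query(self, pattern, k=3):
        """Rank stored images by distance to the query pattern."""
        q = self.extract(pattern)
        ranked = sorted(self.index.items(), key=lambda item: self.distance(q, item[1]))
        return [image_id for image_id, _ in ranked[:k]]


def mean_rgb(pixels):
    """Hypothetical extractor: average color of a list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


db = TinyCBIR(mean_rgb, l1)
db.insert("red", [(250, 0, 0)])
db.insert("green", [(0, 250, 0)])
db.insert("orange", [(250, 120, 0)])
result = db.query([(255, 10, 0)], k=2)
print(result)
```

Passing the extractor and distance function in together mirrors the idea that a retrieval system can be configured with different descriptors without changing the ranking code.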

CBISC
• An OAI-compliant component that supports queries on image collections using content-based image retrieval
• May be customized to support different image collections

CBISC in ETANA

CBISC Descriptor Training

System’s Architecture
• Interface
• Mediator
• Data Insertion Module
• Query Processing Module
• Databases: Image DB, Metadata, GeoDB
• DBMS, GIS

Architecture diagram labels
• Interface: Query Specification, Visualization
• Query Mediator: Analysis, Execution, Merging
• BIS Manager
• Geographic Data Search Component (GDSC), Web Feature Server (WFS): HTTP Requests (GetCapabilities, GetFeatureType, GetFeature)
• Content-Based Image Search Component (CBISC): HTTP Requests (ListDescriptors, GetImages)
• Metadata-Based Search Component (ESSEX), OAI: HTTP Request (keywords)
• Data: Image Collection (Images, Descriptors), Image Metadata, Eco/Taxonomic Collection (Trees, Metadata), Geo Collection (Maps, Metadata)
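The diagram shows the mediator talking to its search components through HTTP requests whose names read as ListDescriptors and GetImages (among others). A minimal sketch of building such a request URL follows. Only the `verb` parameter name follows OAI-PMH convention; the endpoint URL and the `descriptor` and `k` parameters are hypothetical and are not taken from CBISC.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
CBISC_BASE = "http://example.org/cbisc"


def cbisc_request(verb, **params):
    """Build an OAI-style request URL: the verb plus optional arguments."""
    return CBISC_BASE + "?" + urlencode({"verb": verb, **params})


url = cbisc_request("GetImages", descriptor="color_histogram", k=10)
print(url)
print(cbisc_request("ListDescriptors"))
```

Keeping every operation behind a single endpoint with a `verb` argument is what lets a harvester-style client discover and call the component uniformly.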

CBISC Configuration Tool

Integrated support for SI applications in Biomedical Information Systems

SIERRA
• A tool that allows users to select parts of images and associate them with text annotations.
• Retrieves annotations and their associated marks in two ways, searching either for:
  – images or marks similar (in content) to a specified image or mark
  – annotations containing specified query terms
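The relationship between marks (selected image regions) and annotations can be sketched with two small record types, together with the term-based search mode. The field names, the rectangular region format, and the all-terms matching rule are assumptions made for illustration; they are not SIERRA's actual data model. The content-based mode would rank marks by feature-vector distance, as in any CBIR system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    image_id: str
    box: tuple   # (x, y, width, height) of the selected region (assumed format)


@dataclass
class Annotation:
    mark: Mark
    text: str


def search_annotations(annotations, query_terms):
    """Return annotations whose text contains every query term as a word."""
    terms = {t.lower() for t in query_terms}
    return [a for a in annotations if terms <= set(a.text.lower().split())]


annotations = [
    Annotation(Mark("img1", (0, 0, 10, 10)), "Dorsal fin with dark stripe"),
    Annotation(Mark("img2", (5, 5, 20, 20)), "Large eye"),
]
found = search_annotations(annotations, ["stripe", "dorsal"])
print([a.mark.image_id for a in found])
```

Because each annotation points at a mark rather than holding a copy of the pixels, the selected region stays in its original image, which is the in-situ property of superimposed information.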

Annotating an image

Searching over annotations

Searching over images/sub-images

Formal frameworks DL services and tools drive quality

The 5S framework
• A DL framework that defines constructs that lead to the definition of a minimal digital library
• Then, an archaeological DL
• Then, a practical DL
• Then, DL handling superimposed information ...
• Plus, theory-based Quality Models and Digital Librarian’s Quality Toolkit

The 5 S’s (Ss, Examples, Objectives)
• Streams. Examples: Text; video; audio; image. Objectives: Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data.
• Structures. Examples: Collection; catalog; hypertext; document; metadata. Objectives: Specifies organizational aspects of the DL content.
• Spaces. Examples: Measure; measurable, topological, vector, probabilistic. Objectives: Defines logical and presentational views of several DL components.
• Scenarios. Examples: Searching, browsing, recommending. Objectives: Details the behavior of DL services.
• Societies. Examples: Service managers, learners, teachers, etc. Objectives: Defines managers, responsible for running DL services; actors, that use those services; and relationships among them.

5S and DL formal definitions and compositions (April 2004 TOIS)

A Minimal DL in the 5S Framework
• 5S: Streams, Structures, Spaces, Scenarios, Societies
• Derived concepts: Structured Stream, Structural Metadata Specification, Descriptive Metadata Specification
• Services: indexing, browsing, searching
• hypertext, Digital Object, Collection, Metadata Catalog, Repository
• Minimal DL

A Minimal ArchDL in the 5S Framework
• 5S: Streams, Spaces, Scenarios, Societies
• Derived concepts: Structured Stream, Descriptive Metadata specification, SpaTemOrg, StraDia, ArchObj, Arch Descriptive Metadata specification
• Services: indexing, browsing, searching
• hypertext, ArchDO, Arch Metadata catalog, ArchColl, ArchDR
• Minimal ArchDL

Formalizing CBIR services in DLs

Information model

Tools/Applications

5SQual: A Quality Assessment Tool for Digital Libraries

5SQual - Dimensions (Numeric Indicators)
• Digital Objects
  – Similarity
  – Accessibility
  – Significance
  – Timeliness
• Metadata
  – Completeness
  – Conformance
• Services
  – Efficiency
  – Reliability
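As an illustration of what a numeric indicator for one of these dimensions might look like, the sketch below scores metadata completeness as the fraction of schema fields that a record actually fills. This is one simple reading of "completeness", assumed for the example; it is not presented as the formula 5SQual uses, and the field names are hypothetical.

```python
def metadata_completeness(record, schema_fields):
    """Fraction of schema fields with a non-empty value in the record (0.0 to 1.0)."""
    if not schema_fields:
        raise ValueError("schema must list at least one field")
    filled = 0
    for field in schema_fields:
        value = record.get(field)
        if value is not None and str(value).strip():
            filled += 1
    return filled / len(schema_fields)


schema = ["title", "creator", "date", "subject"]
record = {"title": "Notropis specimen 12", "creator": "", "date": "2007"}
score = metadata_completeness(record, schema)
print(score)
```

Averaging this score over all records in a catalog would give a collection-level indicator that can be charted or tracked over time.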

5SQual Architecture

Evaluations – XML Report

Evaluations – Charts

Evaluations – Charts

References (selected)
• Uma Murthy, Ricardo da Silva Torres, Edward A. Fox: SIERRA - A Superimposed Application for Enhanced Image Description and Retrieval. ECDL 2006: 540-543
• Uma Murthy, Ricardo da Silva Torres, Edward A. Fox: Integrated Support for Superimposed Applications in Biomedical Information Systems, Virginia Tech, 2006 (for the National Library of Medicine), http://si.dlib.vt.edu/publications/NLMWhitePaperSI2.pdf
• M. A. Gonçalves. Streams, Structures, Spaces, Scenarios, and Societies: A Formal Framework for Digital Libraries and Its Applications: Defining a Quality Model for Digital Libraries (Chapter 8). PhD thesis, Virginia Tech CS Dept., Blacksburg, VA, 2004. http://scholar.lib.vt.edu/theses/available/etd_12052004_135923/
• M. A. Gonçalves, B. L. Moreira, E. A. Fox, L. T. Watson. What is a good digital library? - defining a quality model for digital libraries. To appear in Information Processing and Management, 2007.
• http://fox.cs.vt.edu/cv.htm

Summary
• Acknowledgements
• Digital Libraries
• Scenarios, Requirements
• Superimposed Information
• Content Based Information Retrieval
• CBISC, SIERRA
• Theory, Quality
• References
• Summary