- Number of slides: 25
1/25 Detection and Extraction of Artificial Text for Semantic Indexing
Christian Wolf and Jean-Michel Jolion
Laboratoire Reconnaissance de Formes et Vision, Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex, France
January 9th, 2002, Dagstuhl Seminar on Content-Based Image and Video Retrieval
This presentation can be downloaded from: http://rfv.insa-lyon.fr/~wolf/presentations
2/25 Plan of the presentation
• Introduction (6 slides)
• Detection and tracking (3 slides)
• Enhancement and binarization of the text boxes (4 slides)
• Experiments and results (2 slides)
• Open problems (9 slides)
• Conclusion and outlook (1 slide)
Total: 25 slides.
This work resulted in a patent submitted by France Télécom on May 23rd, 2001 under the reference FR 01 06776.
Introduction Detection Enh/Binarization Exp. Results Open problems Conclusion
3/25 Content-based image retrieval
[Diagram labels: example image; similarity function; result; indexing phase]
4/25 Similarity measures
[Figure labels: similar; not similar]
5/25 Indexing using text
[Diagram labels: keyword; keyword-based search; result; indexing phase]
Example keyword: Patrick Mayhew
Example text extracted in the indexing phase: Patrick Mayhew; Min. chargé de l´irlande de Nord; ISRAEL; Jerusalem; montage T. Nouel; ...
6/25 Video properties
[Figure: size annotations 80 px, 12 px, 8 px]
7/25 Text extraction: general scheme
Video -> Detection of the text in single frames -> Tracking -> Image enhancement (multiple frame integration) -> Segmentation/binarization -> OCR
Example output: "EVENEMENT", "ACTU", "SPELEOS", "Gouffre Berger (Isére)", "aujourd'hui", "France 3 Alpes", "un spéléologue sauveteur"
8/25 Text detection by accumulation of horizontal gradients (LeBourgeois, 1997).
Justification: Text forms a regular texture containing vertical edges which are aligned horizontally.
Post-processing by mathematical morphology.
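The idea of accumulating horizontal gradients can be illustrated with a minimal sketch. It is not the authors' implementation: the forward-difference gradient, the window half-width, the threshold, and all function names are assumptions chosen for a toy image, and the morphological post-processing is left out.

```python
def horizontal_gradient(image):
    """Absolute forward difference |I(x+1, y) - I(x, y)|; last column stays 0."""
    height, width = len(image), len(image[0])
    grad = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width - 1):
            grad[y][x] = abs(image[y][x + 1] - image[y][x])
    return grad


def accumulate_horizontally(grad, half_width):
    """Sum the gradient over a horizontal window of 2 * half_width + 1 pixels."""
    height, width = len(grad), len(grad[0])
    acc = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lo = max(0, x - half_width)
            hi = min(width, x + half_width + 1)
            acc[y][x] = sum(grad[y][lo:hi])
    return acc


def detect_text_pixels(image, half_width=3, threshold=300):
    """Mark pixels whose accumulated horizontal gradient reaches the threshold.

    half_width and threshold are hypothetical values, not taken from the slides.
    """
    acc = accumulate_horizontally(horizontal_gradient(image), half_width)
    return [[value >= threshold for value in row] for row in acc]


# Toy 4 x 16 image: rows 1 and 2 carry alternating dark/bright columns
# (text-like vertical strokes) in columns 4..11; the rest is flat background.
image = [[100] * 16 for _ in range(4)]
for y in (1, 2):
    for x in range(4, 12):
        image[y][x] = 0 if x % 2 == 0 else 255

mask = detect_text_pixels(image)
```

In this toy case the stroke region accumulates a large response while the flat rows accumulate none, which is the property the slide's justification relies on.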
9/25 Detection in video sequences
• Detection per single frame: text occurrences as a list of rectangles per frame, indexed by frame number (time)
• Tracking: keeping track of text occurrences; suppression of false alarms
• Image enhancement: multiple frame integration
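The slide does not say how rectangles are matched between frames or how false alarms are recognized. The sketch below is one plausible reading, under stated assumptions: rectangles are matched by intersection over union against the previous frame, and occurrences that survive fewer than `min_length` frames are treated as false alarms. The names `overlap_ratio`, `track`, `min_overlap`, and `min_length` are hypothetical.

```python
def overlap_ratio(a, b):
    """Intersection over union of two rectangles given as (x0, y0, x1, y1)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def track(detections, min_overlap=0.5, min_length=3):
    """Link per-frame rectangles into text occurrences.

    detections: one list of rectangles per frame.
    Returns the occurrences kept, each a list of (frame_number, rectangle).
    Short-lived occurrences are dropped as presumed false alarms (assumption).
    """
    tracks = []
    for frame_nr, rects in enumerate(detections):
        for rect in rects:
            for occurrence in tracks:
                last_frame, last_rect = occurrence[-1]
                if (last_frame == frame_nr - 1
                        and overlap_ratio(last_rect, rect) >= min_overlap):
                    occurrence.append((frame_nr, rect))
                    break
            else:
                # No existing occurrence matched: start a new one.
                tracks.append([(frame_nr, rect)])
    return [t for t in tracks if len(t) >= min_length]


# Four frames: one stable text box plus one spurious detection in frame 1.
detections = [
    [(10, 10, 60, 20)],
    [(11, 10, 61, 20), (100, 100, 120, 110)],
    [(10, 11, 60, 21)],
    [(10, 10, 60, 20)],
]
kept = track(detections)
```

Here the stable box is linked across all four frames, while the single-frame detection is discarded.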
10/25 Image enhancement
Integration of multiple frames to create a single image of higher quality.
• Multiple frame integration: averaging
• Super-resolution (interpolation) [Figure labels: M1, M2, M3, M4]
An additional weight is included in the interpolation scheme, which decreases the weights of temporal outlier pixels.
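The effect of down-weighting temporal outliers can be sketched on plain frame averaging. The slide applies the extra weight inside the interpolation scheme and does not give its formula; the weight used here, 1 / (1 + |v - mean| / std), is an assumption for illustration only, as are the function names.

```python
from statistics import mean, pstdev


def integrate_pixel(values):
    """Robust average of one pixel position observed in several frames.

    Values far from the temporal mean get a smaller weight. The weight
    formula is a hypothetical choice, not the one used by the authors.
    """
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return m  # all frames agree; nothing to down-weight
    weights = [1.0 / (1.0 + abs(v - m) / s) for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def integrate_frames(frames):
    """Combine equally sized, registered grayscale frames into one image."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [integrate_pixel([frame[y][x] for frame in frames]) for x in range(width)]
        for y in range(height)
    ]


# Five 1 x 2 frames; the first pixel has one temporal outlier (200).
frames = [[[100, 50]], [[102, 50]], [[98, 50]], [[100, 50]], [[200, 50]]]
integrated = integrate_frames(frames)
plain_mean = mean(frame[0][0] for frame in frames)  # 120
```

The weighted result for the first pixel lies closer to the consistent values around 100 than the plain mean of 120 does.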
11/25 Binarization
Niblack: $T = m + k \cdot s$
Sauvola et al.: $T = m \cdot \left(1 + k \cdot \left(\frac{s}{R} - 1\right)\right)$
where m = mean of the window, s = standard deviation of the window, k = parameter, R = dynamics of the gray values of the image, M = minimum gray value of the image.
[Formula annotations: contrast in the center of the image; the maximum local contrast; the contrast of the window]
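The two published thresholds named on this slide, Niblack (T = m + k·s) and Sauvola et al. (T = m·(1 + k·(s/R − 1))), can be sketched as follows. The default parameter values (k = -0.2 for Niblack, k = 0.5 and R = 128 for Sauvola) are values commonly quoted in the literature and are assumptions here, as are the window size and the function names. The authors' own method is not reproduced.

```python
from statistics import mean, pstdev


def niblack_threshold(window, k=-0.2):
    """Niblack: T = m + k * s over the gray values in the window."""
    return mean(window) + k * pstdev(window)


def sauvola_threshold(window, k=0.5, r=128.0):
    """Sauvola et al.: T = m * (1 + k * (s / R - 1))."""
    m = mean(window)
    s = pstdev(window)
    return m * (1.0 + k * (s / r - 1.0))


def binarize(image, threshold_fn, half_size=1):
    """Label each pixel 1 (above its local threshold) or 0 (at or below it)."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Window clipped at the image borders.
            window = [
                image[j][i]
                for j in range(max(0, y - half_size), min(height, y + half_size + 1))
                for i in range(max(0, x - half_size), min(width, x + half_size + 1))
            ]
            row.append(1 if image[y][x] > threshold_fn(window) else 0)
        result.append(row)
    return result


# One dark "text" pixel on a bright background.
image = [[200, 200, 200], [200, 20, 200], [200, 200, 200]]
binary = binarize(image, sauvola_threshold)
```

With these assumed parameters, the dark center pixel falls below its local Sauvola threshold and the bright background pixels stay above theirs.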
12/25 Binarization methods: examples
[Figure panels: original image; Fisher (windowed); Yanowitz B.; Niblack; Sauvola et al.; our method]
13/25 Binarization using a priori knowledge
Bayesian MAP estimation using prior knowledge on the spatial relationships in the image, modeled as a Markov random field.
(In collaboration with David Doermann from the Language and Media Processing Laboratory of the University of Maryland)
14/25 5 different MPEG-1 videos of resolution 384 x 288: 62 minutes, 93000 frames, 413 text appearances.
15/25 Detection and OCR results
[Figure: detection results (true pos., true neg., false pos., false neg.)]
[Figure: OCR results, classified by binarization method]
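The numbers behind this slide did not survive extraction. As a reminder of how counts of true positives, false positives, and false negatives are usually condensed, here is a small sketch of precision and recall. The counts in the example are hypothetical and are not the authors' results.

```python
def precision(true_pos, false_pos):
    """Fraction of detections that are correct."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos, false_neg):
    """Fraction of ground-truth text occurrences that were detected."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Hypothetical counts, for illustration only.
example_precision = precision(true_pos=90, false_pos=10)
example_recall = recall(true_pos=90, false_neg=30)
```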
16/25 Open questions
• Scene text (general orientations, deformations)
• Moving text
17/25 What is scene text?
[Diagram labels: video frames; frames containing artificial text; frames containing scene text]
We do not have enough information about the importance of text in the destination domain. How many frames contain text, and how many contain scene text?
18/25 Detection: From artificial text to scene text
Several constraints have to be removed when passing from artificial text to scene text:
• The constraints on temporal stability need to be abandoned or at least softened (no initial frame integration).
• Text can be aligned in all orientations (creation of an oriented feature in multiple directions, similar to invariant features).
• Contrast is possibly lower because scene text is not designed to be read easily (is detection of unreadable text necessary?).
19/25 Text models
Simple models (sets of edges or vertical strokes ...):
+ Generalize well, respond to many kinds of text
- Many false alarms
Complex models (templates, probabilistic models (MRF) ...):
+ Powerful, fewer false alarms
- Do not generalize well
Main problem: Distinction between characters and structures similar to text according to the chosen model. Assumptions are necessary (on the font, size, style, contrast, color, length, etc.) but not sufficient.
20/25 Sven Dickinson: evolution of models
21/25 What is text?
Whatever model we choose, we cannot detect/recognize all kinds of text without solving the general image understanding problem. The best thing we can do is to include richer features into the detection process: a composite model for text.
• Structural analysis (e.g. detection and recognition of characters by strokes). Very hard and very unlikely to work in the case of noisy images, low resolutions and difficult fonts.
• Statistical modeling of text features (e.g. by learning techniques). Problem: robust detection needs large neighborhood sizes, which lead to a combinatorial explosion.
E.g.: texture-based methods for small text, and segmentation + perceptual grouping or structural methods for big text.
22/25 Learning techniques: pro and contra
Bibliography:
• Learning directly the gray levels of the input image (Jung 2001)
• Learning features, i.e. coefficients of the Haar wavelet (Li and Doermann 2000) or edge strength (Lienhart 2000)
+ Learning is an easy way to handle the complexity of text.
- Text can appear in videos in many different fonts, sizes, styles, colors, orientations etc. Learning all the different forms may not be feasible.
23/25 Color processing for detection?
[Figure panels: original image; Sobel on grayscale image; Sobel on L*u*v* image]
• Saturating distance or non-saturating distance?
• Reflection processing?
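Why a color-aware gradient can help is easy to show on a toy case: two colors with the same gray value produce no grayscale Sobel response but a clear response when the channels are processed separately. The sketch below makes several assumptions: gray is the plain channel mean, the per-channel Sobel responses are combined with a Euclidean norm (a non-saturating distance), and the input is used as given, with no conversion to L*u*v*.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_channel(channel, x, y):
    """Sobel responses (gx, gy) of a single-channel image at an interior pixel."""
    gx = gy = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            value = channel[y + dy][x + dx]
            gx += SOBEL_X[dy + 1][dx + 1] * value
            gy += SOBEL_Y[dy + 1][dx + 1] * value
    return gx, gy


def edge_strength_gray(image, x, y):
    """Sobel magnitude on the channel mean (assumed grayscale conversion)."""
    gray = [[sum(pixel) / len(pixel) for pixel in row] for row in image]
    gx, gy = sobel_channel(gray, x, y)
    return math.hypot(gx, gy)


def edge_strength_color(image, x, y):
    """Euclidean norm of the per-channel Sobel responses."""
    total = 0.0
    for c in range(len(image[0][0])):
        channel = [[pixel[c] for pixel in row] for row in image]
        gx, gy = sobel_channel(channel, x, y)
        total += gx * gx + gy * gy
    return math.sqrt(total)


# 3 x 4 image: reddish left half, greenish right half, identical channel mean.
left, right = (200, 50, 50), (50, 200, 50)
image = [[left, left, right, right] for _ in range(3)]

gray_response = edge_strength_gray(image, 1, 1)
color_response = edge_strength_color(image, 1, 1)
```

The grayscale response at the color boundary is zero here, while the color response is not. Whether a saturating distance would behave better on real frames is the open question the slide raises.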
24/25 Tracking of moving scene text
Do we detect the text in single frames (as for artificial text), or do we treat the flow in its entirety?
• Single frames: multiple frame integration of moving text needs robust registration of the text boxes in different frames (e.g. rough segmentation into text and background pixels before the registration of the text pixels only). Robust methods, which are able to track objects in clutter, are needed.
• Detection of moving objects, e.g. by optical flow or spatiotemporal methods.
• Mosaicing techniques can be employed for image enhancement.
25/25 Conclusion and Outlook
• We developed a system for detection, tracking, enhancement and binarization of artificial text in videos.
• The total recognition rate for artificial text is surprisingly high, given the quality of the text, but not yet good enough for indexing purposes.
• The remaining problems in text extraction seem to be typical for applications in visual information management: we went as far as we could with low-level features, but we cannot yet take the necessary step to semantic information. What is text? Possible definition: text is what a human or an OCR system can recognize as text.
• We have to include as much a priori knowledge as possible into the process.


