
TREC 2003 Video Retrieval Evaluation: Overview

Coordinators:
  Alan Smeaton, Centre for Digital Video Processing, Dublin City University
  Wessel Kraaij, Department of Multimedia Technology, Information Systems Division, TNO TPD

NIST:
  Paul Over, Retrieval Group, Information Access Division, Information Technology Laboratory, National Institute of Standards and Technology

17 Nov 2003, TRECVID 2003

Origins
o Problem:
  - Rapidly growing quantities of digital video
  - Increasing research in content-based retrieval from digital video
  - But no common basis for evaluation/comparison of approaches
o Approach:
  - Find as much video data as possible and make it available to the community of researchers
  - Use the data to build an open, metrics-based evaluation in the Cranfield/TREC tradition
  - Invite participation and see what happens…

Goals
o Promote progress in content-based retrieval from large amounts of digital video
o Answer some questions:
  - How can systems achieve such retrieval (in collaboration with a human)?
  - How can one reliably benchmark such systems?

Evolution… 2001
o TREC 2001 Video retrieval track
o Data: 11 hrs (OpenVideo, NIST)
o 2 Tasks:
  - Shot boundary determination
  - Search (fully automatic, interactive)
o Participating groups: 12

Evolution… 2002
o TREC 2002 Video retrieval track
o Data: 73 hrs (Prelinger Archive)
o 3 Tasks:
  - Shot boundary determination
  - High-level feature extraction (10)
  - Search (manual and interactive)
o Participating groups: 17
o New:
  - Common shot reference defines unit of retrieval
  - Common key frames
  - Shared features, ASR output provided by LIMSI

Evolution… 2003
o TRECVID Workshop
o Data: 133 hrs (1998 ABC/CNN news + C-SPAN)
o 4 Tasks:
  - Shot boundary determination
  - High-level feature extraction (17)
  - Story segmentation and classification
  - Search (manual and interactive)
o Participating groups: 24
o New:
  - Common annotation effort
  - Advisory committee

Advisory committee
o John Eakins (University of Northumbria at Newcastle)
o Peter Enser (University of Brighton)
o Alex Hauptmann (CMU)
o Annemieke de Jong (Netherlands Institute for Sound & Vision)
o Michael Lew (Leiden Institute of Advanced Computer Science)
o Georges Quenot (CLIPS-IMAG Laboratory)
o John Smith (IBM)
o Richard Wright (BBC)

Shot Boundary Detection task
o SBD is an enabling function for almost all content-based operations on digital video, so it's important;
o Not a new problem, but (still) a challenge because of gradual transitions and false positives caused by photo flashes, rapid camera movement, object movement, etc.;
o Task is to identify transitions and determine whether each is "cut", "dissolve", "fadeout/in" or "other";
o TRECVID 2003 dataset is slightly (10%) larger than 2002 but has many more (78%) shot transitions;

Shot Boundary Detection task
o Manually created ground truth of 3,734 transitions (thanks again to Jonathan Lasko) with 70.7% hard cuts, 20.2% dissolves, 3.1% fades and 5.9% other … very similar ratios to 2002;
o Up to 10 submissions per group, measured using precision and recall, with a bit of flexibility for matching gradual transitions;
o Most participating groups use their 10 submissions to "tweak" some parameter;

14 Groups in Shot Boundary Detection

[Table: the 24 participating groups listed below against the four tasks: Shots, Stories, Features, Search]

Accenture Technology Laboratories (US)
Carnegie Mellon Univ. (US)
CLIPS-IMAG (FR)
CWI Amsterdam / Univ. of Twente (NL)
Dublin City University (Irl)
Fudan Univ. (China)
FX-Pal (US)
IBM Research (US)
Imperial College London (UK)
Indiana University (US)
Institut Eurecom (FR)
KDDI (JP)
KU Leuven (BE)
Mediamill/U Amsterdam (NL)
National Univ. Singapore (Sing.)
Ramon Llull Univ. (ES)
RMIT University (Aus)
StreamSage (US)
Univ. of Bremen (D)
Univ. of Central Florida (US)
Univ. of Iowa (US)
Univ. of Kansas (US)
Univ. of North Carolina (US)
Univ. Oulu/VTT (FI)

What do the results look like?

Evaluation Measures

Precision = (# transitions correctly reported) / (# transitions reported)
Recall = (# transitions correctly reported) / (# transitions in reference)
Frame Precision = (# frames correctly reported in detected transitions) / (# frames reported in detected transitions)
Frame Recall = (# frames correctly reported in detected transitions) / (# frames in reference data for detected transitions)
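The four measures can be written down directly as ratios. The sketch below is only an illustration: the function names are hypothetical, and representing a gradual transition as an inclusive (first frame, last frame) pair is an assumption, not something the slides specify.

```python
def precision_recall(num_correct, num_reported, num_reference):
    """Transition-level precision and recall from raw counts."""
    precision = num_correct / num_reported if num_reported else 0.0
    recall = num_correct / num_reference if num_reference else 0.0
    return precision, recall


def frame_precision_recall(detected, reference):
    """Frame-level scores for one detected gradual transition.

    Both arguments are inclusive (first_frame, last_frame) pairs
    (an assumed representation).
    """
    first = max(detected[0], reference[0])
    last = min(detected[1], reference[1])
    overlap = max(0, last - first + 1)           # frames reported correctly
    detected_len = detected[1] - detected[0] + 1
    reference_len = reference[1] - reference[0] + 1
    return overlap / detected_len, overlap / reference_len
```

For example, a detected dissolve spanning frames 10 to 19 against a reference dissolve spanning frames 15 to 24 shares 5 frames with it, so both frame precision and frame recall come out at 0.5.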

[Figure: Recall and precision for cuts]

[Figure: Recall and precision for cuts (zoomed)]

… and for Gradual Transitions …

[Figure: Recall and precision for gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

So, who did what? The approaches…

Accenture Technology Laboratories:
o Extract I-frames from encoded stream;
o Compute 3 chi-square values across 3 separate histograms (global intensity, row intensity and column intensity), apply a threshold, then combine;
o This gives an indicative location and is followed by frame decoding and fine-grained examination;
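A minimal sketch of a three-histogram chi-square test between two frames. The slide does not define the row and column histograms or the combination rule, so several things here are assumptions: "row intensity" and "column intensity" are taken as histograms of per-row and per-column mean grey level, the three decisions are combined with a logical AND, and the bin count and threshold are arbitrary. All function names are hypothetical.

```python
def histogram(values, bins=8, max_value=256):
    """Count grey-level values into equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v) * bins // max_value, bins - 1)] += 1
    return counts


def chi_square(h1, h2):
    """Chi-square distance between two histograms (empty bins are skipped)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def chi_square_values(frame_a, frame_b):
    """Three chi-square values: global, row-mean and column-mean intensity."""
    def pixels(frame):
        return [p for row in frame for p in row]

    def row_means(frame):
        return [sum(row) / len(row) for row in frame]

    def column_means(frame):
        return [sum(col) / len(col) for col in zip(*frame)]

    return [chi_square(histogram(view(frame_a)), histogram(view(frame_b)))
            for view in (pixels, row_means, column_means)]


def is_candidate_boundary(frame_a, frame_b, threshold=1.0):
    """Assumed combination rule: all three values must exceed the threshold."""
    return all(v > threshold for v in chi_square_values(frame_a, frame_b))
```

Frames are plain 2-D lists of grey levels here; in the approach described on the slide the input would be decoded I-frames, with a finer examination of the surrounding frames after a candidate is flagged.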

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

CLIPS-IMAG:
o Based on image differences with motion compensation, which uses optical flow as a pre-process, and direct detection of dissolves;
o Same as used in TV 2001 and TV 2002 with little modification;
o Also includes direct detection of camera flashes;

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

Fudan University:
o Reused TV 2002 SBD approach based on frame-frame comparison using luminance difference and colour histogram similarity;
o Adaptive thresholding;
o GTs are searched for a black frame to determine whether they are fades, else dissolves;
o Detection of camera flashes;
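The fade-versus-dissolve rule can be sketched in a few lines. This is an illustration only: the function name, the grey-level frame representation and the `black_level` value are assumptions, and the slides do not say how a black frame was actually detected.

```python
def classify_gradual_transition(frames, black_level=16.0):
    """Label a gradual transition as a fade if any frame in it is (nearly) black.

    frames: list of 2-D lists of grey levels covering the transition.
    black_level: assumed mean-intensity cutoff for calling a frame black.
    """
    for frame in frames:
        pixels = [p for row in frame for p in row]
        if sum(pixels) / len(pixels) <= black_level:
            return "fade"
    return "dissolve"
```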

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

FX-PAL:
o For each frame, compute self-similarity against all frames in a window of past and future frames, as well as cross-similarity between past and future frames;
o Includes a clever way to reduce computation costs;
o Generates a similarity matrix and examines characteristics of this matrix to indicate cuts and GTs;
o Presentation to follow;
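One way to picture the self-similarity versus cross-similarity idea is a score that is high when frames on each side of a candidate boundary resemble each other but not the frames on the other side. The sketch below is a stand-in, not the group's method: the similarity function, the window size and the subtraction used as the score are all assumptions, and it omits the cost-reduction step the slide mentions.

```python
def similarity(a, b):
    """Similarity of two feature vectors: 1 / (1 + L1 distance) (assumed form)."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))


def boundary_score(features, t, half_window=2):
    """Within-side similarity minus cross-side similarity for a boundary before frame t."""
    past = features[max(0, t - half_window):t]
    future = features[t:t + half_window]

    def average(pairs):
        values = [similarity(a, b) for a, b in pairs]
        return sum(values) / len(values) if values else 0.0

    # Entries of the local similarity matrix, split into its diagonal
    # blocks (within) and off-diagonal block (cross).
    within = average([(a, b) for side in (past, future) for a in side for b in side])
    cross = average([(a, b) for a in past for b in future])
    return within - cross
```

With one feature value per frame, `[[0], [0], [0], [10], [10], [10]]`, the score peaks at `t = 3`, where the past and future windows are each internally uniform and mutually dissimilar.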

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

IBM Research:
o Used SBD from the CueVideo system;
o Presentation to follow;

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

Imperial College London:
o Colour histogram similarity of adjacent frames with a constant similarity threshold;
o Same as TV 2002, showing the trade-off of P vs. R as the threshold varies;
o Good performance for a simple approach;
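A constant-threshold detector of this kind is short enough to sketch. Histogram intersection is used as the similarity measure here purely as an assumption (the slide only says "colour histogram similarity"), and the function names are hypothetical.

```python
def intersection(h1, h2):
    """Normalised histogram intersection, 1.0 for identical histograms."""
    total = sum(h1)
    return sum(min(a, b) for a, b in zip(h1, h2)) / total if total else 1.0


def detect_cuts(histograms, threshold):
    """Indices i where a cut is declared between frame i - 1 and frame i."""
    return [i for i in range(1, len(histograms))
            if intersection(histograms[i - 1], histograms[i]) < threshold]
```

Raising the threshold reports more boundaries (higher recall, lower precision) and lowering it reports fewer, which is the precision/recall trade-off the slide refers to.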

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

KDDI:
o For cuts, preprocess the encoded MPEG-1 stream to locate high inter-frame differences using motion vectors, then decode likely frames and test for luminance and chrominance differences;
o For dissolves, detect gradual change over time using DCT activity data;
o Specific detection looking for wipes, and for camera flashes;
o Because it processes the encoded stream, runs at 24 x real time on a PC;

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

KU Leuven:
o Adaptive thresholding on the average intensity differences between adjacent frames;
o Includes motion compensation, which computes an affine transformation between consecutive frames;
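Adaptive thresholding can be illustrated as follows. The specific rule (a difference must exceed the mean plus `k` standard deviations of its neighbouring differences) is an assumed, commonly used form rather than the one the group used, and the motion compensation step is left out entirely.

```python
from statistics import mean, pstdev


def adaptive_cuts(mean_intensity, window=4, k=3.0):
    """Declare a cut where a frame-to-frame difference stands out locally.

    mean_intensity: average intensity of each frame, in order.
    window, k: assumed neighbourhood size and sensitivity.
    """
    diffs = [abs(b - a) for a, b in zip(mean_intensity, mean_intensity[1:])]
    cuts = []
    for i, d in enumerate(diffs):
        neighbours = diffs[max(0, i - window):i] + diffs[i + 1:i + 1 + window]
        if neighbours and d > mean(neighbours) + k * pstdev(neighbours):
            cuts.append(i + 1)  # boundary lies just before frame i + 1
    return cuts
```

Because the threshold follows the local statistics, the same jump in intensity can count as a cut in a quiet passage and be ignored in a busy one.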

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

Ramon Llull University:
o Global colour histogram differences are used as a measure of discontinuity to detect cuts;
o For GTs, a method to account for linear colour variation of images across the duration of the GT, with specific treatment of moving objects during the GT, which can distort this;

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

RMIT University:
o Target GTs;
o Using a moving window of (200) frames, use the current frame as a QBE against all in the window, with a 6-frame DMZ around the current frame;
o Based on frame-frame similarity and adaptive thresholding;
o A refinement on TV 2002;

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

University of Bremen:
o Combination of 3 approaches:
  - changes in image luminance;
  - gray level histogram differences;
  - FFT feature extraction;
o Combined, with adaptive thresholding;

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

University of Central Florida:
o Colour histogram intersection of frames with sub-sampling of video at 5 fps;
o This gives the approximate location of shot bounds, followed by fine-grained frame-frame comparison using a 24-bin colour histogram;
o Post-processing to detect abrupt changes in illumination (camera flashes);
o Also determined transition types;
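The coarse-then-fine search can be sketched as below. Assumptions are flagged in the comments: `step=6` stands in for 5 fps sampling of roughly 30 fps video, the threshold is arbitrary, and the fine pass simply picks the least similar adjacent pair inside a flagged stretch. The flash post-processing and transition typing from the slide are not covered.

```python
def intersection(h1, h2):
    """Normalised histogram intersection, 1.0 for identical histograms."""
    total = sum(h1)
    return sum(min(a, b) for a, b in zip(h1, h2)) / total if total else 1.0


def coarse_to_fine_cuts(histograms, step=6, threshold=0.5):
    """Find cuts by comparing sub-sampled frames first, then refining.

    histograms: one colour histogram per frame.
    step: sub-sampling interval in frames (assumed value).
    """
    cuts = []
    last = len(histograms) - 1
    for start in range(0, last, step):
        end = min(start + step, last)
        # Coarse pass: skip stretches whose end points still look alike.
        if intersection(histograms[start], histograms[end]) >= threshold:
            continue
        # Fine pass: frame-by-frame comparison inside the flagged stretch.
        best = min(range(start + 1, end + 1),
                   key=lambda i: intersection(histograms[i - 1], histograms[i]))
        cuts.append(best)
    return cuts
```

The coarse pass touches only every sixth frame, so most of the video is never compared frame by frame.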

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

University of Iowa:
o Comparison of adjacent frames based on:
  - 512-bin global colour histogram;
  - 60 x 60 pixel thumbnail vs. thumbnail, compared pixel by pixel;
  - Sobel filtering and detected edge differences;
o and then Boolean and arithmetic product combinations of these;
o Presentation to follow;
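The two combination styles named on the slide can be contrasted in a short sketch. It assumes each of the three comparisons yields a dissimilarity score between 0 and 1, which the slide does not state; function names and thresholds are hypothetical.

```python
from math import prod


def boolean_combination(scores, thresholds):
    """Boolean AND: every detector must exceed its own threshold."""
    return all(s > t for s, t in zip(scores, thresholds))


def product_combination(scores, threshold):
    """Arithmetic product: one threshold on the product of all scores."""
    return prod(scores) > threshold
```

The two rules can disagree: with scores `[0.9, 0.8, 0.2]`, the Boolean rule at 0.5 per detector rejects the boundary because the third detector is weak, while the product rule at 0.1 accepts it (0.9 × 0.8 × 0.2 = 0.144).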

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

University of Kansas:
o No details available at this time

[Figure: Recall and precision for cuts (zoomed)]

[Figure: Gradual transitions]

[Figure: Frame-recall and frame-precision for GTs]

Observations
o Most techniques are based on frame-frame comparisons, some with sliding windows;
o Comparisons are based on colour and on luminance, mostly;
o Some use adaptive thresholding, some don't;
o Most operate on decoded video stream;
o Some have special treatment of motion during GTs, of flashes, of camera wipes;
o Performances are getting better;

Task definition
o Identify the individual news items in a news show
o New task in TRECVID, has been studied in ASR/IR community (TDT)
o Hope to show the gain of using video features
1. Segmentation task
  - Identify story boundaries in CNN and ABC news shows
  - Ground truth based on TDT2 annotations
  - Evaluation based on precision & recall; boundaries have to be within a +/- 5 second interval around ground truth boundaries
2. News classification task
  - Annotate stories as either news or non-news
  - Evaluation based on percentage of correctly identified news story footage
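The +/- 5 second matching rule for the segmentation task can be sketched as follows. The one-to-one greedy matching (each reference boundary can be credited to at most one detected boundary) is an assumption; the slide only gives the tolerance. The function name is hypothetical.

```python
def score_story_boundaries(detected, reference, tolerance=5.0):
    """Precision and recall of detected story boundaries (times in seconds)."""
    unmatched = sorted(reference)
    hits = 0
    for t in sorted(detected):
        close = [r for r in unmatched if abs(r - t) <= tolerance]
        if close:
            # Credit the nearest reference boundary once only (assumed rule).
            unmatched.remove(min(close, key=lambda r: abs(r - t)))
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

With detected boundaries at 10, 58, 120 and 200 seconds and reference boundaries at 12, 60 and 150 seconds, two detections fall within tolerance, giving precision 2/4 and recall 2/3.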

8 Participating Groups
Dublin City University (Irl)
Fudan Univ. (China)
IBM Research (US)
KDDI (JP)
National Univ. Singapore (Sing.)
StreamSage (US)
Univ. of Central Florida (US)
Univ. of Iowa (US)

[Figure: Story segmentation: recall and precision by condition]

[Figure: Story segmentation: recall and precision by system and condition (1-4), with one point labelled "TDT system". Conditions: 1: V+A, 2: V+A+ASR, 3: ASR, 4: Other]

[Figure: Segmentation, within system (F)]

[Figure: Story classification: news recall and precision by condition]

[Figure: Story classification: news recall and precision by condition (zoomed)]

[Figure: Story classification: news recall and precision by system]

[Figure: Story classification: news recall and precision by system and condition (1-4), zoomed. Conditions: 1: V+A, 2: V+A+ASR, 3: ASR, 4: Other]

[Figure: Classification, within system (F)]

Group headlines: Fudan University
Segmentation
  • Anchor detection based on clustering and heuristics
  • Commercial detection based on ?
  • ASR segmentation using a variant of text-tiling
  • Rule-based and MaxEnt classifiers
News classification
  • GMM/MaxEnt using music, commercial and speech proportion as features

Group headlines: KDDI
Segmentation
1. All shots are classified as ANCHOR, REPORT or COMMERCIAL, using audio, motion intensity and color (SVM). Subsequently rule-based segmentation.
2. Direct classification of boundaries, using the features of the two shots before and after the boundary candidate (SVM).
Classification
  • SVM for NEWS-NEWS, NEWS-MISC and MISC-NEWS

Group headlines: StreamSage (/ DCU)
ASR-only segmentation runs
Three methods:
1. Lexical chaining to define topically coherent segments
2. Variant of text-tiling
3. Use methods 1 and 2 for compiling a list of cue-phrases that announce topic introduction or closure

Group headlines: University of Central Florida
Combined Segmentation and Classification:
1. Story boundaries are marked by blank frames
2. Long story: news; short story: non-news
3. Merge adjacent non-news stories
Conclusion: story length is a strong feature for news classification
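The three rules above translate almost directly into code. In this sketch the blank-frame times are assumed to be given already, and the 30-second cutoff between "long" and "short" is a made-up value; the slides do not say what length was used.

```python
def classify_and_merge(boundaries, min_news_seconds=30.0):
    """Turn blank-frame times into labelled stories.

    boundaries: sorted times in seconds, including programme start and end.
    min_news_seconds: assumed cutoff between short and long stories.
    Returns (start, end, label) tuples.
    """
    stories = []
    for start, end in zip(boundaries, boundaries[1:]):
        label = "news" if end - start >= min_news_seconds else "non-news"
        if label == "non-news" and stories and stories[-1][2] == "non-news":
            # Rule 3: merge with the preceding non-news story.
            stories[-1] = (stories[-1][0], end, "non-news")
        else:
            stories.append((start, end, label))
    return stories
```

Blank frames at 0, 90, 105, 120 and 200 seconds yield a news story from 0 to 90, one merged non-news block from 90 to 120, and a news story from 120 to 200.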

Group headlines
Dublin City University
IBM Research
National University Singapore
University of Iowa
presentations follow…

Observations
o Video provides strong clues for story segmentation and even more for classification; best runs are either type 1 or 2
o AV runs generally have a higher precision
o Combination of AV and ASR gives a small gain for segmentation
o Most approaches are generic
o Are the combination methods optimal?
o Are the ASR segmentation runs state of the art?

FE Task definition
o Goal: Build benchmark for detection methods of high-level features
o Secondary goal: feature-indexing can help search and navigation
o New: common feature annotation
  - Helps (among other things) to standardize training resources across sites
  - Category A: sites work with just the common development data and common annotations
  - Category B: sites work with just the common development data and any annotation set
  - Category C: other

FE evaluation
o Each feature is assumed to be binary: absent or present for each shot
o Find shots that contain a certain feature, rank them according to confidence measure, submit the top 2000
o Submissions are pooled
o Evaluate performance quality by measuring the average precision of each feature detection method
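Average precision for one ranked submission can be sketched as below. The slide does not spell out the formula, so this assumes the usual non-interpolated form: the precision at each rank where a true shot appears, summed and divided by the total number of true shots. The function and parameter names are hypothetical.

```python
def average_precision(ranked_shots, true_shots, max_results=2000):
    """Non-interpolated average precision of a ranked list of shot ids.

    ranked_shots: shot ids ordered by decreasing confidence.
    true_shots: set of shot ids judged to contain the feature.
    """
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked_shots[:max_results], start=1):
        if shot in true_shots:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(true_shots) if true_shots else 0.0
```

True shots that never appear in the ranking still count in the denominator, so a run is penalised for what it misses as well as for what it ranks badly.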

10 Participating Groups
Accenture Technology Laboratories (US)
Carnegie Mellon Univ. (US)
CLIPS-IMAG (FR)
CWI Amsterdam / Univ. of Twente (NL)
Fudan Univ. (China)
IBM Research (US)
Imperial College London (UK)
Institut Eurecom (FR)
Univ. of Central Florida (US)
Univ. Oulu/VTT (FI)

17 Features
11. Indoors
12. News subject face – not a news show person
13. People – at least three humans
14. Building – walled structure with roof
15. Road
16. Vegetation – living vegetation in its natural env.
17. Animal
18. Female speech – woman speaking (visible, audible)
19. Car/truck/bus – exterior of ..
20. Aircraft
21. News subject monologue – uninterrupted
22. Non-studio setting
23. Sporting event
24. Weather news
25. Zoom in
26. Physical violence – between people / objects
27. Madeleine Albright – visible

Who worked on which features

[Table: the 10 participating groups (rows) against features 11-27 (columns: indoors, news subject face, people, building, road, vegetation, animal, female speech, car, aircraft, news subject monologue, non-studio, sporting event, weather news, zoom in, physical violence, person X), with per-feature group counts in the last row]

[Figure: Avg. P by feature (all runs), showing the median and the middle half of the data]

[Figure: Avg. P by feature (top 10 runs), with median]

[Figure: Avg. P by feature (top 5 runs per feature); labelled: zoom, female speech, news subject monologue]

[Figure: Avg. P by feature (top 5 runs per feature), zoomed: hard features; labelled: female speech, aircraft, M. A., vegetation, car/truck, people, indoors, animal, road, non-studio, news face, violence, building]

[Figure: Avg. P by feature (top 5 runs per feature), zoomed: easy features; labelled: weather, sports, news subject monologue, zoom]

[Figure: Avg. precision vs total number true for each feature, maximums and medians; labelled: weather, non-studio]

33 of 60 runs contributed one or more unique, true shots

[Figure: True shots contributed uniquely by run for a feature]

[Figure: True shots contributed uniquely for a feature by a participating group]

Group headlines: Accenture Technology Laboratories
People
  • Skin tone detection, count faces
Weather
  • color distribution + position
  • 200

Group headlines: Carnegie Mellon University
All features
Presentation follows

Group headlines: CLIPS-IMAG
1 feature: M. A.
How would a blind person locate a shot containing Madeleine Albright?
  • Speaker detection (acoustic model)
  • M. A. is probably mentioned in one of the preceding shots

Group headlines: CWI Amsterdam / University of Twente
14 features
Working hypothesis: Feature extraction == query by sample
Generative probabilistic retrieval model (same as used for search task), divide frame in pixel blocks
Take a sample of the annotated frames, rank the keyframes based on the likelihood that they generate the query sample

Group headlines: Fudan University
All features
Scene features: grid, color histogram, edge direction, texture, KNN, AdaBoost
Vegetation, Weather: texture+color, SVM, GMM, MaxEnt
Objects:
  • Car: Schneiderman
  • Animal: vegetation with KNN
  • Aircraft: detect context of aircraft
Audio: female speech: 12-MFCC, Pitch, 10-LPC

Group headlines Accenture Technology Laboratories (US) Carnegie Mellon Univ. (US) CLIPS-IMAG (FR) All features CWI Amsterdam / Univ. of Twente (NL) Fudan Univ. (China) Presentation follows IBM Research (US) Imperial College London (UK) Institut Eurecom (FR) Univ. of Central Florida (US) Univ. Oulu/VTT (FI) IBM Research: 17. Nov 2003 TRECVID 2003 113

Group headlines
Imperial College London: Feature 16: Vegetation
o Based on grass detector using a colour feature and KNN

Group headlines
Institut Eurecom: Apply LSI, 15 features
o Keyframes are segmented into regions
o Regions are clustered using K-means
o Cluster X frame matrix is reduced by LSI
o Use new feature space for GMM and KNN detectors

Group headlines
University of Central Florida: 2 features
o Weather news
• Color histogram similarity
o Non-studio setting
• Taken as: all non-anchor shots
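
The slide names "color histogram similarity" without saying which measure was used. As a hedged illustration only, the sketch below uses histogram intersection, one common choice; the function names, the three-bin histograms and the idea of a weather "template" histogram are assumptions made for the example.

```python
def normalise(hist):
    """Scale a histogram so its bins sum to 1 (empty histograms stay as-is)."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical distributions, 0 for disjoint ones."""
    a, b = normalise(h1), normalise(h2)
    return sum(min(x, y) for x, y in zip(a, b))


# Hypothetical 3-bin colour histograms (pixel counts per bin).
weather_template = [10, 30, 60]
shot_a = [12, 28, 60]
shot_b = [70, 20, 10]

sim_a = histogram_intersection(weather_template, shot_a)
sim_b = histogram_intersection(weather_template, shot_b)
```

Here `shot_a` scores much higher than `shot_b`, so a threshold on the similarity could flag it as a candidate weather-news shot.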

Group headlines
University of Oulu / VTT: 15 features
o Extracted using:
• Motion
• Temporal color correlogram
• Edge gradients
• Several low level audio features (used for outdoors, vehicle noise, sport, monologue)
• Feature fusion based on Borda count voting
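
Borda count voting can be sketched as below. This is a minimal version of one common variant, not necessarily the one the group used: each detector submits a ranked list of shots, a shot earns more points the higher it is ranked, shots a detector did not rank earn nothing from it, and ties are broken by shot id. The detector names and shot ids are made up for the example.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Fuse ranked lists (best first) into one list by Borda count."""
    pool = sorted({shot for ranking in rankings for shot in ranking})
    n = len(pool)
    points = defaultdict(int)
    for ranking in rankings:
        for position, shot in enumerate(ranking):
            # Top rank earns n points, next n - 1, and so on.
            points[shot] += n - position
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(pool, key=lambda shot: (-points[shot], shot))


# Hypothetical per-detector rankings for one feature.
motion = ["s1", "s2", "s3"]
colour = ["s2", "s1", "s3"]
audio = ["s2", "s3", "s1"]

fused = borda_fuse([motion, colour, audio])
```

In this toy case `s2` collects 8 points, `s1` 6 and `s3` 4, so the fused order is `s2`, `s1`, `s3`.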

Observations
o Some feature detectors had quite good results
o Are features well chosen for search?
o Is detection quality good enough?
o Which combination methods work well? Which don’t?

TRECVID 2003: Search Task
o Search, summarisation, linking, etc. are the ultimate operations on digital video; SBD, features, and segmentation are all enablers for them;
o TRECVID search is an extension of its text-only analogue: systems, including a human in the loop, are presented with a topic and are to return up to 1,000 shots which meet the need;
o Note the unit of retrieval is the shot, not the news story;
o Two search modes, manual and interactive; we are not yet ready for fully automatic search;

Search Types: Interactive and Manual

Search Types: Interactive and Manual
o Topics are multimedia; the interactions between text, image, video, and audio are complex, and how exemplars represent an information need is not really understood;
o This task really benefited from the ASR donated by Jean-Luc Gauvain of LIMSI, which is (anecdotally) very accurate;
o One baseline run based on ASR only was required of every manual system;

Topics
o We can’t achieve the ideal of topics from real users searching our dataset;
o NIST created topics based on a number of basic search types: generic/specific and person/thing/event, where there are multiple relevant shots coming from more than one video;
o Videos were viewed by NIST personnel (sound off), notes taken on content, and candidates emerged and were chosen;

25 Topics [total relevant found]
100. Find shots with aerial views containing both one or more buildings and one or more roads [87]
101. Find shots of a basket being made - the basketball passes down through the hoop and net [104]
102. Find shots from behind the pitcher in a baseball game as he throws a ball that the batter swings at [183]
103. Find shots of Yasser Arafat [33]
104. Find shots of an airplane taking off [44]
105. Find shots of a helicopter in flight or on the ground [52]
106. Find shots of the Tomb of the Unknown Soldier at Arlington National Cemetery [31]
107. Find shots of a rocket or missile taking off. Simulations are acceptable [62]
108. Find shots of the Mercedes logo (star) [34]

25 Topics
109. Find shots of one or more tanks [16]
110. Find shots of a person diving into some water [13]
111. Find shots with a locomotive (and attached railroad cars if any) approaching the viewer [13]
112. Find shots showing flames [228]
113. Find more shots with one or more snow-covered mountain peaks or ridges. Some sky must be visible behind them. [62]
114. Find shots of Osama Bin Laden [26]
115. Find shots of one or more roads with lots of vehicles [106]
116. Find shots of the Sphinx [12]
117. Find shots of one or more groups of people, a crowd, walking in an urban environment (for example with streets, traffic, and/or buildings) [665]

25 Topics
118. Find shots of Congressman Mark Souder [6]
119. Find shots of Morgan Freeman [18]
120. Find shots of a graphic of Dow Jones Industrial Average showing a rise for one day. The number of points risen that day must be visible. (Manual only) [47]
121. Find shots of a mug or cup of coffee. [95]
122. Find shots of one or more cats. At least part of both ears, both eyes, and the mouth must be visible. The body can be in any position. [122]
123. Find shots of Pope John Paul II [45]
124. Find shots of the front of the White House in the daytime with the fountain running [10]

Evaluation
o Groups were allowed to submit up to 10 runs; 37 interactive and 38 manual runs were submitted from 11 groups;
o All submissions were pooled and judged by NIST assessors to variable depths depending on the “hit rate” of finding relevant shots;
o Evaluation used trec_eval;
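
For readers unfamiliar with the measure behind the result slides that follow, the sketch below computes non-interpolated average precision for one topic and its mean over topics, the kind of figure trec_eval reports. It is an illustrative re-implementation, not trec_eval itself; the function names, shot ids and judgments are invented, and each ranked list is assumed to contain no duplicate shots.

```python
def average_precision(ranked_shots, relevant):
    """Non-interpolated average precision for one topic."""
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant shot
    # Relevant shots never retrieved contribute zero.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(run, judgments):
    """Mean of per-topic average precision over all judged topics."""
    scores = [
        average_precision(run.get(topic, []), relevant)
        for topic, relevant in judgments.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical run and relevance judgments for two topics.
run = {"100": ["a", "b", "c", "d"], "101": ["x", "y"]}
judgments = {"100": {"a", "c", "e"}, "101": {"y"}}

ap_100 = average_precision(run["100"], judgments["100"])
map_score = mean_average_precision(run, judgments)
```

For topic 100 the relevant shots appear at ranks 1 and 3 and one is never found, giving (1/1 + 2/3) / 3; topic 101 scores 1/2.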

Results
o Absolute performance figures must be taken in their context, so don’t believe the numbers … read the papers!
o We tried to level the field by standardising on time spent (15 min.) and thought of introducing a reference system at each site, but TRECVID is not yet mature enough for that;
o Also, submitted runs do not necessarily correspond to 1 user but can be aggregates of multiple users; at least 2 groups did this;

20 of 75 runs contributed one or more unique relevant shots

Relevant shots contributed uniquely for a topic by a participating group
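
A small sketch may clarify what "contributed uniquely" means here, assuming it refers to relevant shots found by exactly one group for a topic. The function name, group names, shot ids and judgments are hypothetical and are not taken from the TRECVID results.

```python
from collections import Counter


def unique_relevant(submissions, relevant):
    """Map each group to the relevant shots that only that group returned."""
    found_by = Counter()
    for shots in submissions.values():
        for shot in set(shots) & relevant:
            found_by[shot] += 1  # number of groups that found this shot
    return {
        group: sorted(s for s in set(shots) & relevant if found_by[s] == 1)
        for group, shots in submissions.items()
    }


# Hypothetical submissions for a single topic.
submissions = {
    "groupA": ["s1", "s2", "s3"],
    "groupB": ["s2", "s4"],
    "groupC": ["s5"],
}
relevant = {"s1", "s2", "s4"}

unique = unique_relevant(submissions, relevant)
```

Here `s2` is relevant but found by two groups, so it counts as unique for neither; `groupC` found nothing relevant.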

Manual runs - top 10 (of 38) (with mean human effort / topic)

Interactive runs - top 10 (of 36) (with mean elapsed time)

Average precision by topic (chart; legible topic labels: Yasser Arafat, Unknown Soldier Tomb, Sphinx, Mark Souder, Osama Bin Laden)

Average precision (interactive max) vs number relevant shots found

24 Participating Groups
Tasks (table columns): Shots, Stories, Features, Search
Groups: Accenture Technology Laboratories (US); Carnegie Mellon Univ. (US); CLIPS-IMAG (FR); CWI Amsterdam / Univ. of Twente (NL); Dublin City University (Irl); Fudan Univ. (China); FX-Pal (US); IBM Research (US); Imperial College London (UK); Indiana University (US); Institut Eurecom (FR); KDDI (JP); KU Leuven (BE); Mediamill/U Amsterdam (NL); National Univ. Singapore (Sing.); Ramon Llull Univ. (ES); RMIT University (Aus); StreamSage (US); Univ. of Bremen (D); Univ. of Central Florida (US); Univ. of Iowa (US); Univ. of Kansas (US); Univ. of North Carolina (US); Univ. Oulu/VTT (FI)
1. Carnegie Mellon University:
o Interactive: same system as TV 2002: split topics among 5 individuals, text search across ASR, CC, OCR with storyboarding of keyframes, layout under user control, filtering based on features; another run used an improved version with more effective visualisation and browsing;
o Manual: multiple retrieval agents across colour, texture, ASR, OCR and some features, combined in different ways, incl. negative pseudo-RF and “co-retrieval”;

2. Lowlands (CWI & U. Twente):
o Merging information from multiple modalities:
n run separate Qs for each topic example;
n combine different models of Qs;
n combine sims from system / user judgments;
o to build a language model for each shot;
o Pre-computing NNs for each keyframe in data;
o Interactive better than manual, and combination of text/visual better than text

3. Dublin City University:
o Variation of Físchlár in interactive setting with 16 users, 7 mins each, doing 12 topics;
o Two system variations were ASR search only, and ASR plus query image vs. shot keyframe;
o Both had shot-level browsing, user-controlled ASR/image search balance, RF allowed by expanding text and/or image;
o Aim was to see if users used and benefited from text & image;

4. Fudan University:
o Manual search using 4 different approaches and then combinations:
n ASR
n colour histogram
n multiple feature (colour hist, edge, co-occurrence texture)
n “special search” where user selects most appropriate for topic, from 1. human face recog, 2. general shot features, 3. multiple features, 4. motion (camera and object), 5. colour/texture, 6. colour regions;

5. IBM Research:
o Examined Spoken Document Retrieval and content-based techniques in manual runs;
o SDR used automatic and phonetic techniques and SDR fusion across multiple match functions, re-ranking shots based on color blobs;
o Also did fully automatic multiple-example content-based search (which is beyond “manual”) and fusion of content-based and SDR-based results via linear weighting;
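
Fusion "via linear weighting" can be illustrated with a weighted sum of normalised scores. The sketch below is an assumption-laden example, not IBM's method: min-max normalisation, the default weight of 0.7, the treatment of missing shots as score 0, and all names and scores are invented for illustration.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; constant or empty inputs map to zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {shot: 0.0 for shot in scores}
    return {shot: (value - lo) / (hi - lo) for shot, value in scores.items()}


def linear_fusion(sdr_scores, content_scores, sdr_weight=0.7):
    """Rank shots by a weighted sum of SDR and content-based scores."""
    sdr, content = min_max(sdr_scores), min_max(content_scores)
    fused = {
        shot: sdr_weight * sdr.get(shot, 0.0)
        + (1 - sdr_weight) * content.get(shot, 0.0)
        for shot in set(sdr) | set(content)
    }
    # Highest fused score first; ties broken by shot id.
    return sorted(fused, key=lambda shot: (-fused[shot], shot))


# Hypothetical per-shot scores from the two subsystems.
sdr_scores = {"s1": 2.0, "s2": 1.0, "s3": 0.0}
content_scores = {"s1": 0.1, "s2": 0.9, "s3": 0.5}

speech_heavy = linear_fusion(sdr_scores, content_scores, sdr_weight=0.7)
visual_heavy = linear_fusion(sdr_scores, content_scores, sdr_weight=0.3)
```

Changing the weight changes the order: the speech-heavy setting keeps `s1` on top, while the visual-heavy setting promotes `s2`.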

6. Imperial College London:
o Used ASR & 11 low-level colour/texture features, disregarding image footer likely to contain news ticker;
o Features include global colour, colour from frame centre, colour structure descriptors, RGB colour moments, 44 x 27 pixel gray thumbnails, convolution filters, variance, image smoothness and uniformity, ASR;
o Retrieval of kNNs, thumbnails on 2D display, RF by user movement of thumbnails, demo? 2 x manual, 4 x

7. Indiana University:
o Used ASR and built a system around interactive text search and query expansion plus video shot browsing;
o Interactive search with 1 subject doing all topics, 15 mins max but used only 10 mins;
o Future work is to include search based on visual features;

8. MediaMill/University of Amsterdam:
o Interactive search with 22 groups of 2 users (in pairs?), using a combination of:
n CMU donated features
n derived “concepts” from LSI over ASR
n keywords from ASR
o to yield an active set of 2,000 shots, then a snazzy shot browser to select examples;
o Only 1 of 11 complete runs submitted.
o Used 1 system so no local variant to compare against, and selectively combined sets of users’ outputs per topic to generate submission;
o “Best” (per topic) objectively selected by submitting the result where the most shots were selected by the users

9. National University of Singapore:
o 1. News story retrieval based on ASR and using WordNet and web to expand the original query, POS tagging of query;
o 2. Filter shots from story based on shot features;
o 3. Use image & video matching to re-rank remaining shots;
o In interactive runs user views top 100 shots and marks relevant ones;
o Results show marked impact of manual vs. interactive, i.e. user RF;

10. University of North Carolina (1):
o Compare ASR-only, features-only, ASR+features, in interactive search task;
o Features: aggregated results of 10 groups from 17 features used in extraction task;
o ASR was LIMSI, combination was 2 x ASR;
o 36 searchers, each doing 12 topics over systems in 15 mins per topic;
o Shot browser had annotated storyboard of keyframe + ASR, lots of pre- and post

10. University of North Carolina (2):
o Results: no statistically significant difference in precision, but a statistically significant difference in recall, where features-only was lower than the other two; poor feature recognition accuracy?
o Large variability in time taken per search, avg 4 to 6 minutes;
o Much evaluation of users’ perception and satisfaction;
o Some helpful pointers on future assessment of interactive search;

11. University of Oulu/VTT:
o VIRE has interactive cluster/temporal shot browsing and shot similarity based on visual (colour, edge structure, motion), conceptual (15 x features from feature set) and lexical (from ASR) similarity;
o Manual runs: pre-select combinations of features and images from topic;
o Interactive runs: 8 people, 2 systems, 9.5 mins per topic, (a) browse by visual features only and (b) browse by visual features plus ASR … result indicates no

Observations
o Lots of variation, interesting shot browsing interfaces, mixture of interactive & manual;
o Approximately as much use of donated features as TV 2002;
o A lot more participation, more runs, better at the upper end … quite respectable curves!
o Nearly a dozen groups can now complete the search task and the demos are impressive;

Plans
o Make notebook papers, presentations, and feedback on plans for TRECVID 2004 available on the website in December
o Make final papers available on the website by mid-March 2004
o Plans: probably complete 2-yr plan
n Add 80 hours of new test data from same news sources
n Repeat 2003 tasks with some improvements
o More information as it develops at: www-nlpir.nist.gov/projects/trecvid