Machine learning & category recognition Cordelia Schmid Jakob Verbeek

Content of the course • Visual object recognition • Robust image description • Machine learning

Visual recognition - Objectives • Particular objects and scenes, large databases …

Visual recognition - Objectives • Object classes and categories (intra-class variability)

Visual object recognition

Visual object recognition [Figure: example images annotated with scene, object and action labels such as outdoors, countryside, indoors, car, person, house, building, door, glass, road, people, field, street, candle, exit, enter, drinking, kidnapping, car crash]

Visual recognition - Objectives • Human motion and actions

Difficulties: within-object variations • Variability: camera position, illumination, internal parameters

Difficulties: within-class variations

Visual recognition • Robust image description – Appropriate descriptors for objects and categories • Statistical modeling and machine learning for vision – Selection and adaptation of existing techniques

Robust image description • Invariant detectors and descriptors • Scale and affine-invariant keypoint detectors

Matching of descriptors Significant viewpoint change

Contour features • Basis: contour segment network – Edgel chains are partitioned into straight contour segments – Segments are connected at the edgel chains’ endpoints and junctions [Ferrari, Fevrier, Jurie & Schmid, PAMI’07] [Ferrari et al., ECCV 2006]

Localization of “shape” categories Window descriptor + SVM Horse localization

Why machine learning? • Early approaches: simple features + handcrafted models • Can handle only a few images and simple tasks [L. G. Roberts, Machine Perception of Three-Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963]

Why machine learning? • Early approaches: manual programming of rules • Tedious, limited, and does not take the data into account [Y. Ohta, T. Kanade, and T. Sakai, “An Analysis System for Scenes Containing Objects with Substructures,” International Joint Conference on Pattern Recognition, 1978]

Why machine learning? • Today: lots of data, complex tasks – Internet images, personal photo albums – Movies, news, sports

Why machine learning? • Today: lots of data, complex tasks – Surveillance and security – Medical and scientific images

Why machine learning? • Today: Lots of data, complex tasks • Instead of trying to encode rules directly, learn them from examples of inputs and desired outputs

Types of learning problems • Supervised – Classification – Regression • Unsupervised • Semi-supervised • Reinforcement learning • Active learning • …

Supervised learning • Given training examples of inputs and corresponding outputs, produce the “correct” outputs for new inputs • Two main scenarios: – Classification: outputs are discrete variables (category labels). Learn a decision boundary that separates one class from the other – Regression: also known as “curve fitting” or “function approximation.” Learn a continuous input-output mapping from (possibly noisy) examples
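
The two scenarios on this slide can be illustrated with a minimal sketch. The nearest-centroid classifier and the least-squares line fit below are stand-ins chosen for brevity, not methods named in the course; all function names and the toy data are invented for this example.

```python
# Minimal sketch of the two supervised scenarios (illustrative only).

def fit_nearest_centroid(points, labels):
    """Classification: compute one mean vector (centroid) per class label."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_nearest_centroid(centroids, p):
    """Assign p to the class whose centroid is closest (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Toy training data, invented for this example.
train_x = [(0.0, 0.0), (0.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
train_y = ["cow", "cow", "car", "car"]
centroids = fit_nearest_centroid(train_x, train_y)
label = predict_nearest_centroid(centroids, (4.5, 3.0))  # discrete output

slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # continuous mapping
```

Here the decision boundary is implicit: it is the set of points equidistant from the two centroids.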

Unsupervised Learning • Given only unlabeled data as input, learn some sort of structure • The objective is often more vague or subjective than in supervised learning. This is more of an exploratory/descriptive data analysis

Unsupervised Learning • Clustering – Discover groups of “similar” data points
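
One common way to discover such groups is k-means (Lloyd's algorithm). The slide does not name a specific algorithm, so the sketch below is only one possible instance; the initialization from the first k points is a simplification made for this example.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; initial centers are the first k points (a simplification)."""
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, assignment

# Toy data with two visible groups, invented for this example.
data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment = kmeans(data, k=2)
```

In practice the result depends on the initialization, so several random restarts are usually compared.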

Unsupervised Learning • Quantization – Map a continuous input to a discrete (more compact) output
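
As a minimal sketch of quantization, a continuous vector can be replaced by the index of its nearest entry in a codebook. The codebook below is hand-picked for illustration; in practice it might come from clustering the training data.

```python
def quantize(x, codebook):
    """Return the index of the codeword nearest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])),
    )

# Hypothetical three-entry codebook and a few continuous inputs.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
inputs = [(0.1, 0.2), (0.9, -0.1), (0.2, 0.8)]
codes = [quantize(x, codebook) for x in inputs]   # one small integer per input
reconstructed = [codebook[c] for c in codes]      # lossy approximation of the inputs
```

Each input is now stored as a single index instead of a pair of real numbers, at the cost of the quantization error between `inputs` and `reconstructed`.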

Unsupervised Learning • Dimensionality reduction, manifold learning – Discover a lower-dimensional surface on which the data lives
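
The simplest case is a linear "surface": principal component analysis (PCA) finds the direction of largest variance. The sketch below uses power iteration on the covariance matrix and is only an illustration; PCA is not named on the slide, and manifold learning methods handle curved surfaces that this sketch does not.

```python
import math

def first_principal_direction(points, iterations=100):
    """Estimate the leading principal direction by power iteration.

    Assumes the data are not all identical and that the start vector
    (all ones) is not orthogonal to the leading direction.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - mean[i] for i in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(point, mean, direction):
    """One-dimensional coordinate of a point along the principal direction."""
    return sum((x - m) * u for x, m, u in zip(point, mean, direction))

# Toy 2D data lying on the line y = 2x, invented for this example.
data = [(-2.0, -4.0), (-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
mean, direction = first_principal_direction(data)
coords = [project(p, mean, direction) for p in data]  # 2D points reduced to 1D
```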

Unsupervised Learning • Density estimation – Find a function that approximates the probability density of the data (i.e., the value of the function is high for “typical” points and low for “atypical” points) – Can be used for anomaly detection
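
A minimal sketch of this idea fits a one-dimensional Gaussian to the data and flags points whose density falls below a threshold. The Gaussian model, the threshold value, and the sample values are all assumptions made for illustration.

```python
import math

def fit_gaussian(samples):
    """Maximum-likelihood mean and variance of a 1D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

def gaussian_pdf(x, mean, var):
    """Density of the fitted Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy measurements clustered around 5, invented for this example.
samples = [4.8, 5.1, 5.0, 4.9, 5.2]
mean, var = fit_gaussian(samples)

threshold = 0.01  # hypothetical cut-off; would be tuned on held-out data
typical_is_anomaly = gaussian_pdf(5.0, mean, var) < threshold   # high density
atypical_is_anomaly = gaussian_pdf(9.0, mean, var) < threshold  # very low density
```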

Other types of learning • Semi-supervised learning: lots of data is available, but only a small portion is labeled (e.g., since labeling is expensive) – Why is learning from labeled and unlabeled data better than learning from labeled data alone?

Other types of learning • Active learning: the learning algorithm can choose its own training examples, or ask a “teacher” for an answer on selected inputs
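
One simple selection rule is uncertainty sampling: ask the teacher about the input on which the current model is least sure. The sketch below assumes a hypothetical one-dimensional logistic model with fixed parameters; neither the model nor the rule is specified on the slide.

```python
import math

def predict_proba(x, weight, bias):
    """Probability of the positive class under a 1D logistic model."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

def most_uncertain(pool, weight, bias):
    """Pick the unlabeled input whose predicted probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(x, weight, bias) - 0.5))

# Unlabeled pool and current model parameters, invented for this example.
pool = [-3.0, -0.2, 1.5, 4.0]
query = most_uncertain(pool, weight=1.0, bias=0.0)  # input to send to the teacher
```

After the teacher labels `query`, the model would be retrained and the selection repeated.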

Other types of learning • Reinforcement learning: an agent takes inputs from the environment, and takes actions that affect the environment. Occasionally, the agent gets a scalar reward or punishment. The goal is to learn to produce action sequences that maximize the expected reward (e.g., driving a robot without bumping into obstacles)
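
As a minimal sketch, tabular Q-learning can learn such action sequences in a toy environment. The five-cell corridor, the reward of 1 at the right end, and all parameter values below are invented for illustration and are much simpler than the robot example on the slide.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or right

def step(state, action):
    """Environment: move within the corridor; reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=200, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = rng.randrange(2)  # purely random exploration (Q-learning is off-policy)
            next_state, reward, done = step(state, ACTIONS[a])
            # Target: immediate reward plus discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# Greedy action in each non-goal cell after learning.
greedy = [ACTIONS[max((0, 1), key=lambda a: q[s][a])] for s in range(N_STATES - 1)]
```

The learned greedy policy maximizes the discounted reward; in this corridor that means heading toward the goal cell.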

Visual object recognition - tasks • Image classification: assigning a label to the image – Car: present – Cow: present – Bike: not present – Horse: not present – … • Object localization: define the location and the category of each object (figure labels: Car, Cow)

Bag-of-features for image classification • Excellent results in the presence of background clutter bikes books building cars people phones trees

Bag-of-features for image classification • Pipeline: extract regions, compute descriptors, find clusters and frequencies, compute distance matrix, classification (SVM) [Nowak, Jurie & Triggs, ECCV’06], [Zhang, Marszalek, Lazebnik & Schmid, IJCV’07]
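
The middle steps of this pipeline can be sketched as follows, assuming the local descriptors have already been extracted and a codebook of cluster centers is given. The chi-square distance is one common choice for comparing histograms and is an assumption here; the final SVM step is not shown, and all names and data are illustrative.

```python
def nearest_word(descriptor, codebook):
    """Index of the cluster center (visual word) closest to a descriptor."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[j])),
    )

def bag_of_features(descriptors, codebook):
    """Normalized histogram of visual-word frequencies for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def chi2_distance(h1, h2):
    """Chi-square distance between two histograms (assumed choice of distance)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def distance_matrix(histograms):
    """Pairwise distances between all images; input to a kernel classifier."""
    return [[chi2_distance(p, q) for q in histograms] for p in histograms]

# Toy 2D "descriptors" for two images and a two-word codebook (invented data).
codebook = [(0.0, 0.0), (1.0, 1.0)]
image_a = [(0.1, 0.0), (0.0, 0.2), (0.9, 1.0), (0.1, 0.1)]
image_b = [(1.0, 1.0), (0.9, 0.8)]
histograms = [bag_of_features(img, codebook) for img in (image_a, image_b)]
distances = distance_matrix(histograms)
```

Note that the histogram discards where in the image each descriptor was found, which is what makes the representation tolerant to clutter and layout changes.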

Spatial pyramid matching • Perform matching in 2D image space [Lazebnik, Schmid & Ponce, CVPR’06]
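
A rough sketch of the idea: quantized features are binned on increasingly fine grids over the image, and histogram intersections at each level are combined so that matches found in finer cells count more. The level weights below follow the weighting usually described for this method and should be treated as an assumption; the feature format `(x, y, word)` with coordinates in [0, 1) is invented for this example.

```python
from collections import Counter

def cell_histograms(features, level):
    """Count visual words per grid cell on a 2^level x 2^level grid."""
    cells = 2 ** level
    hist = Counter()
    for x, y, word in features:
        cx = min(int(x * cells), cells - 1)
        cy = min(int(y * cells), cells - 1)
        hist[(cx, cy, word)] += 1
    return hist

def intersection(h1, h2):
    """Histogram intersection: number of features matched cell by cell."""
    return sum(min(count, h2[key]) for key, count in h1.items())

def pyramid_match(f1, f2, levels=2):
    """Weighted sum of intersections; finer levels receive larger weights."""
    score = 0.0
    for level in range(levels + 1):
        if level == 0:
            weight = 1 / 2 ** levels
        else:
            weight = 1 / 2 ** (levels - level + 1)
        score += weight * intersection(cell_histograms(f1, level), cell_histograms(f2, level))
    return score

# Two toy images with the same visual words but swapped positions (invented data).
image_a = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
image_b = [(0.9, 0.9, 0), (0.1, 0.1, 1)]
same_layout = pyramid_match(image_a, image_a)
swapped_layout = pyramid_match(image_a, image_b)
```

A plain bag-of-features histogram would rate the two toy images as identical; the pyramid score is lower for the swapped layout because the matches only occur at the coarsest level.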

Retrieval examples Query

Localization of object categories

Localization approach • Histogram of oriented image gradients as image descriptor • SVM as classifier, importance-weighted descriptors
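
The core of such a descriptor can be sketched as a magnitude-weighted histogram of gradient orientations over one image cell. This is a simplified illustration: the central-difference gradients, nine unsigned orientation bins, and lack of block normalization or interpolation are assumptions, and the classifier is not shown.

```python
import math

def orientation_histogram(image, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0..180 degrees)."""
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central-difference gradients at interior pixels.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

# Toy 4x4 cells: intensity increasing left to right, and top to bottom.
horizontal_ramp = [[float(x) for x in range(4)] for _ in range(4)]
vertical_ramp = [[float(y)] * 4 for y in range(4)]
h_hist = orientation_histogram(horizontal_ramp)
v_hist = orientation_histogram(vertical_ramp)
```

For a detection window, histograms of many such cells would be concatenated into one descriptor vector before classification.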

Unsupervised learning using Markov field aspect models [Verbeek & Triggs, CVPR’07] • Goal: automatic interpretation of natural scenes – Assign pixels in images to visual categories – Learn models from image-wide labeling, without localization: each training image comes with a list of the categories present • Approach: capture local and image-wide correlations – Markov fields capture local label contiguity – Aspect models capture image-wide label correlation – Interleave two steps: region-to-category assignments using loopy belief propagation and labeling, then category model estimation [Figure: example scene interpretation of a training image]

Localization based on shape [Ferrari, Jurie & Schmid, CVPR’07] [Marszalek & Schmid, CVPR’07]

Master Internships • Internships are available in the LEAR group – Object localization (C. Schmid) – Video recognition (C. Schmid) – Semi-supervised / text-based learning (J. Verbeek) • If you are interested, send us an email