Utility = f(Vision) - A Review

Number of slides: 55

Perception “To perceive is also about how to approach and what to do with an object …” “Perception/cognition is determined by aspects and form of the agent (Embodiment) …”

Affordances “An affordance is an intrinsic property of an object, allowing an action to be performed with the object. It also depends on the embodiment of the agent performing the action …” “Objects which are cars for residents of Lilliput, are merely toys for Gulliver… ”

A Condition for Survival “One of the most basic functions of all organisms is the cutting up of the environment into classifications by which nonidentical stimuli can be treated as equivalent…”

Clustering Visual Input • Tremendous variation in shape! (Hard for state-of-the-art appearance-based algorithms to recognize.) • But all are sittable surfaces (for humans), i.e., dimensionality = 1 in affordance space.

So, the question to ask is: what affordances can an object support, given visual features such as its shape, texture and color?

Why answer this question? • Obtaining semantic clustering of objects: generalization! • Building vision perception for robotic platforms. • Generating scene descriptions in a utilitarian framework: visual aid devices for the blind! • For the sake of science!

Points to Note • The mapping between shape and affordances is not one-to-one. • “The proposition is to use appearance cues as a supplement to affordance learning and not to totally ignore them…”

Continued… Implicit and Explicit Knowledge • “Shapes can only represent explicit knowledge.” • “Knowledge about hooks/fixture is implicit in (b)…”

A Survey of efforts in the past “If I have seen further it is by standing on ye sholders of Giants” -Isaac Newton

Affordance Learning • From shape: global features, local features • From activity: body activity, hand activity, interactive robot • From simulation

Freeman & Newell [1971] • A structure is a unit that provides a set of functions. • Laid down a formalism for when and how structures can be combined to provide required functions.

The First Efforts! (Winston, Binford et al. [1983]) • Functional description of an object, e.g. a cup • ako: “a kind of” • hq: “has quality”
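The ako/hq slots can be pictured as a small frame hierarchy in which a concept inherits the required qualities of its parent. The sketch below is a hypothetical illustration, not the representation used by Winston, Binford et al.: the `Frame` type, the helper functions and the qualities `open`, `concave`, `liftable` and `stable` are assumptions chosen for the cup example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Frame:
    """A concept node in a hypothetical frame-style knowledge base."""
    name: str
    ako: Optional[str] = None                  # "a kind of": parent concept, if any
    hq: Set[str] = field(default_factory=set)  # "has quality": qualities this concept requires


def required_qualities(frames: Dict[str, Frame], concept: str) -> Set[str]:
    """Collect a concept's own qualities plus everything inherited along ako links."""
    quals: Set[str] = set()
    current: Optional[str] = concept
    while current is not None:
        frame = frames[current]
        quals |= frame.hq
        current = frame.ako
    return quals


def satisfies(frames: Dict[str, Frame], concept: str, observed: Set[str]) -> bool:
    """An object fits a functional concept if it shows every required quality."""
    return required_qualities(frames, concept) <= observed


# Illustrative knowledge base: the qualities are assumptions, not taken from the paper.
FRAMES = {
    "object": Frame("object"),
    "vessel": Frame("vessel", ako="object", hq={"open", "concave"}),
    "cup": Frame("cup", ako="vessel", hq={"liftable", "stable"}),
}

print(satisfies(FRAMES, "cup", {"open", "concave", "liftable", "stable", "red"}))  # True
print(satisfies(FRAMES, "cup", {"open", "concave", "liftable"}))                   # False
```

Note that the definition says nothing about shape: any object showing the listed qualities counts as a cup, which is exactly the functional (rather than appearance-based) view the slide describes.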

Input to System:

First Vision System Using Functional Information (Connell & Brady [1987]) • Describe functional concepts geometrically. • Generalize!


Understanding Functional Reasoning (Di Manzo, Ricci et al. [1989]) • Knowledge representation: semantic networks • Objects: synthetic 3D octree models • Tries to account for real-world noise • Functional elements: support, grasp, hang, cut, equilibrium, enter, contain, pierce, stop

Understanding Functional Reasoning (Di Manzo, Ricci et al. [1989])

More Attempts (Stark et al. [1991-1994]) • Concept of knowledge primitives: dimensions (length or area of a surface); relative orientation between surfaces; proximity between surfaces/faces; clearance (lack of obstacles in a defined area); stability (being at rest in a certain orientation) • Predefined categories and sub-categories • CAD and range-sensor data
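The knowledge primitives can be read as small geometric predicates that are combined into a category definition. The following sketch assumes each surface has already been extracted with a normal, a centroid, an area and a clearance value; the `Surface` type, all thresholds and the two predicates are hypothetical illustrations of how a chair-like "sittable surface plus back support" test could be composed, not the definitions used by Stark et al. The stability primitive is omitted.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Surface:
    normal: Vec3        # surface normal (need not be unit length)
    centroid: Vec3      # surface centre in metres, z measured up from the floor
    area_m2: float      # dimension primitive: surface area
    clearance_m: float  # clearance primitive: free space above the surface


def tilt_deg(normal: Vec3) -> float:
    """Angle between a surface normal and the vertical (z) axis, in degrees."""
    nx, ny, nz = normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return math.degrees(math.acos(min(1.0, abs(nz) / norm)))


def angle_between_deg(a: Vec3, b: Vec3) -> float:
    """Relative-orientation primitive: unsigned angle between two normals."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (na * nb)))))


# Hypothetical thresholds for a human-sized sitter.
SEAT_HEIGHT_M = (0.35, 0.60)
SEAT_AREA_M2 = (0.09, 0.50)
MIN_CLEARANCE_M = 0.40
MAX_TILT_DEG = 10.0


def provides_sittable_surface(s: Surface) -> bool:
    """Combine orientation, dimension, height and clearance checks for a seat."""
    return (
        tilt_deg(s.normal) <= MAX_TILT_DEG
        and SEAT_HEIGHT_M[0] <= s.centroid[2] <= SEAT_HEIGHT_M[1]
        and SEAT_AREA_M2[0] <= s.area_m2 <= SEAT_AREA_M2[1]
        and s.clearance_m >= MIN_CLEARANCE_M
    )


def provides_back_support(seat: Surface, back: Surface, max_gap_m: float = 0.5) -> bool:
    """Back rest: roughly orthogonal to the seat (relative orientation) and close by (proximity)."""
    gap = math.dist(seat.centroid, back.centroid)
    return 70.0 <= angle_between_deg(seat.normal, back.normal) <= 110.0 and gap <= max_gap_m


seat = Surface((0.0, 0.0, 1.0), (0.0, 0.0, 0.45), 0.16, 0.6)
back = Surface((1.0, 0.0, 0.0), (-0.2, 0.0, 0.75), 0.12, 0.0)
table_top = Surface((0.0, 0.0, 1.0), (0.0, 0.0, 0.75), 1.0, 0.5)
print(provides_sittable_surface(seat), provides_back_support(seat, back))  # True True
print(provides_sittable_surface(table_top))                               # False
```

The hard-coded thresholds are precisely what the later "Criticism" slide objects to: each category needs hand-tuned definitions.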

(Stark et al. [1991-1994])

(Stark et al. [1991-1994]) • Categories considered: chairs, tables, benches, bookshelves, beds, not known

A Part-Based Approach (Rivlin et al. [1995]) • Extract four part types (sticks, blobs, plates, strips) and reason about their relative configuration
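One simple way to picture the four part types is by the ratios of a part's bounding-box extents. The rule below is an assumption for illustration only (Rivlin et al. extract and describe parts differently), and the ratio threshold of 3 is hypothetical.

```python
from typing import Tuple

# Hypothetical ratio above which one extent counts as "much larger" than another.
RATIO = 3.0


def classify_part(extents: Tuple[float, float, float], ratio: float = RATIO) -> str:
    """Label a part from its three bounding-box extents (any order, same units, all > 0)."""
    a, b, c = sorted(extents, reverse=True)  # a >= b >= c
    if c <= 0:
        raise ValueError("extents must be positive")
    if a / c < ratio:
        return "blob"   # all three extents comparable: a compact part
    if b / c < ratio:
        return "stick"  # one long extent, two short ones
    if a / b < ratio:
        return "plate"  # two long extents, one thin one
    return "strip"      # long, medium and thin: a narrow flat part


print(classify_part((0.04, 0.04, 0.45)))  # stick (e.g. a chair leg)
print(classify_part((0.40, 0.40, 0.03)))  # plate (e.g. a seat)
```

A chair could then be described as one plate supported by several sticks, which is the kind of relative-configuration reasoning the slide mentions.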

Criticism • Highlights the importance of knowledge representation • Hard-coded definitions • Almost no testing on real-world data • Instead of recognizing surfaces for sitting, sleeping and keeping objects, these systems ended up recognizing chairs, beds and tables! • Pseudo-functional space

Using Affordance Cues for Object Detection

Continued…

Use of Coarse Features (Dillman et al. [ICRA 2011]) • Objects: 2 oranges, 1 apple, a can, tissue packs, a beaker, a bottle • Coarse features generalize best. • Active stereo, multiple viewpoints

Affordance Learning by Actions

Human Actions and Object Context (Moore et al. [ICCV 1999]) • Jointly model actions and image features • Predefined object model • Shape: pixel area, bounding-box size, L2 distance from known classes • Action: HMM-based hand pose estimation
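The "L2 distance from known classes" shape cue can be sketched as a nearest-prototype lookup. Everything here is hypothetical: the three-element feature vector (pixel area, bounding-box width, bounding-box height), the class names and the prototype values are made up, and this is not Moore et al.'s model.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical shape features: (pixel area, bounding-box width, bounding-box height).
PROTOTYPES: Dict[str, Tuple[float, float, float]] = {
    "cup": (1200.0, 35.0, 40.0),
    "book": (5000.0, 90.0, 60.0),
    "phone": (900.0, 20.0, 50.0),
}


def l2_distances(features: Sequence[float], prototypes=PROTOTYPES) -> Dict[str, float]:
    """L2 distance from a region's shape feature vector to each known class prototype."""
    return {name: math.dist(features, proto) for name, proto in prototypes.items()}


def nearest_class(features: Sequence[float], prototypes=PROTOTYPES) -> str:
    """Class whose prototype is closest in L2 distance."""
    dists = l2_distances(features, prototypes)
    return min(dists, key=dists.get)


print(nearest_class((1150.0, 33.0, 42.0)))  # cup
```

In practice the features would need to be normalized first; unscaled, the pixel-area term dominates the distance, which is one reason the slide pairs shape with action evidence rather than relying on shape alone.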

Results

Interaction Signatures (Venkatesh et al. [ICCV 2005])

Interaction Signatures (Venkatesh et al. [ICCV 2005]) • Only a printer, chair, keyboard and paper are considered!

Observing Humans (Veloso et al. [ICRA 2005])

Objects in Action (Gupta et al. [CVPR 2007]) • HOG: initial guess on the probability of an object in a window • Action evidence: reach (Mr), manipulation (Mm), reaction (Or)
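The idea of refining an appearance-based guess with action evidence can be illustrated by Bayes' rule. The original work uses a richer probabilistic model; this sketch instead assumes the HOG score, reach, manipulation and reaction evidence are conditionally independent given the object class (a naive-Bayes simplification of our own), and all class names and probabilities are made up.

```python
from typing import Dict


def fuse(prior: Dict[str, float], *likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Posterior over object classes: prior times each evidence likelihood, renormalised.

    Assumes the evidence sources are conditionally independent given the object
    (a naive-Bayes simplification, not the model of the original paper).
    """
    scores = dict(prior)
    for lik in likelihoods:
        scores = {obj: p * lik[obj] for obj, p in scores.items()}
    total = sum(scores.values())
    return {obj: s / total for obj, s in scores.items()}


# Illustrative numbers only.
appearance = {"cup": 0.4, "spray_bottle": 0.6}    # HOG detector's initial guess for one window
reach = {"cup": 0.5, "spray_bottle": 0.5}         # P(reach motion | object): uninformative here
manipulation = {"cup": 0.7, "spray_bottle": 0.2}  # P(observed hand motion | object), hypothetical
reaction = {"cup": 0.5, "spray_bottle": 0.3}      # P(observed object reaction | object), hypothetical

posterior = fuse(appearance, reach, manipulation, reaction)
print(posterior)  # cup ≈ 0.80, spray_bottle ≈ 0.20
```

Appearance alone slightly favours the wrong class here; the action evidence reverses that, which is the intuition behind using human interaction as a cue for object recognition.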

Objects in Action (Gupta et al. [CVPR 2007])

Interactive Learning (Leonardis et al. [2009]) • Object shape: ellipses (curvature, area, etc.) • Action features: color and edge histograms • An SVM maps object features to clustered action features.
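To make the "object features to action clusters" mapping concrete, here is a hedged sketch of the binary case only. It assumes the action features have already been clustered into two groups (labelled +1 and -1) and that each object is described by a standardized two-element shape feature vector. The trainer is a Pegasos-style linear SVM written from scratch to keep the example self-contained; it is not the classifier, kernel or feature set Leonardis et al. used, and the data are hypothetical.

```python
import random
from typing import List, Sequence


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, iters: int = 2000, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic sub-gradient training of a linear SVM (hinge loss).

    A constant 1.0 feature is appended to every sample so the last weight acts as a
    bias (regularised like the other weights). Labels must be +1 or -1.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iters + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: gradient step on the L2 regulariser
        if margin < 1.0:                          # hinge loss active: step towards the sample
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Sign of the linear score, with the bias feature appended."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical standardized shape features (e.g. curvature, area) and action-cluster labels.
xs = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0), (-2.0, -2.0), (-3.0, -2.5), (-2.5, -3.0)]
ys = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(xs, ys)
print(predict(w, (2.5, 2.5)), predict(w, (-2.5, -2.5)))  # 1 -1
```

More than two action clusters would need a multi-class scheme such as one-vs-rest on top of this binary classifier.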

Object-Action Recognition (Kragic [2011]) • Objects considered: book, magazine, hammer, pitcher, box, cup • Actions: hammering, opening, pouring • Video data • Object recognition: HOG • Hand pose (velocity, angles between joints, orientation): SVM • A joint model is learned using a factorial CRF.

Affordance Learning by Simulation

Learning Spatial Relations Using Functional Simulation (Sjoo et al. [IROS 2011]) • Learn the relation between two objects: support, protection, constraint, move-together • Features: pose, closest separation, area, distance, contact patch area, etc. • Predict the relation given the features.
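Two of the listed features, closest separation and contact patch area, are easy to compute when each object is approximated by an axis-aligned bounding box. The `Box` type, the contact tolerance and the `supports` rule below are hypothetical; in particular `supports` is a hand-written stand-in for the relation classifier that Sjoo et al. learn from simulation.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Box:
    """Axis-aligned bounding box in metres (a hypothetical stand-in for an object model)."""
    lo: Tuple[float, float, float]  # (xmin, ymin, zmin)
    hi: Tuple[float, float, float]  # (xmax, ymax, zmax)


def closest_separation(a: Box, b: Box) -> float:
    """Euclidean distance between the nearest points of two boxes (0.0 if they touch or overlap)."""
    gaps = [max(0.0, a.lo[i] - b.hi[i], b.lo[i] - a.hi[i]) for i in range(3)]
    return math.sqrt(sum(g * g for g in gaps))


def contact_patch_area(bottom: Box, top: Box, tol: float = 0.01) -> float:
    """Horizontal overlap area when `top` rests on `bottom` (vertical gap within tol), else 0."""
    if abs(top.lo[2] - bottom.hi[2]) > tol:
        return 0.0
    dx = min(bottom.hi[0], top.hi[0]) - max(bottom.lo[0], top.lo[0])
    dy = min(bottom.hi[1], top.hi[1]) - max(bottom.lo[1], top.lo[1])
    return max(0.0, dx) * max(0.0, dy)


def supports(bottom: Box, top: Box) -> bool:
    """Hand-written stand-in for a learned 'support' relation classifier."""
    return closest_separation(bottom, top) == 0.0 and contact_patch_area(bottom, top) > 0.0


table = Box((0.0, 0.0, 0.0), (1.0, 1.0, 0.75))
cup = Box((0.4, 0.4, 0.75), (0.5, 0.5, 0.85))
floating_cup = Box((0.4, 0.4, 0.9), (0.5, 0.5, 1.0))
print(supports(table, cup), supports(table, floating_cup))  # True False
```

A learned model would take features like these as input and output the relation label, rather than relying on a fixed rule.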

What Makes a Chair? • Already discussed!

Indoor Scenes • Highly structured! • Surface orientations: mainly vertical and horizontal • Components: boundaries (walls, floors, doors); furniture (tables, chairs, beds, shelves, cabinets); actions (cups, bottles, glasses, books, pens, kitchen appliances, etc.) • Current proposition: discover the first two categories of scene components (boundaries and furniture)

Scene Interpretation

Most Relevant Work (Rusu et al. [2010])

Framework • Kitchen environment • Co-register 16 scans from laser and TOF cameras • Bottom-most and topmost regions: floor and roof • Determine the X and Y axes • Use heuristics on the remaining vertical surfaces to find walls • Label the other vertical surfaces as furniture
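The labelling steps above can be sketched as follows, assuming planar segments have already been extracted and tagged as horizontal or vertical. The `Plane` type, the wall rule (large area and spanning most of the floor-to-roof height) and both thresholds are hypothetical, not the heuristics Rusu et al. actually use; the sketch also assumes at least one horizontal segment exists.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Plane:
    name: str
    orientation: str  # "horizontal" or "vertical", e.g. from normal estimation
    z_min: float      # lowest point of the segment (m)
    z_max: float      # highest point of the segment (m)
    area_m2: float


def label_planes(planes: List[Plane], min_wall_area: float = 4.0,
                 span_fraction: float = 0.8) -> Dict[str, str]:
    """Hypothetical heuristics: lowest/highest horizontal planes are floor/roof,
    large vertical planes spanning most of the room height are walls, and the
    remaining vertical planes are furniture candidates."""
    horizontal = [p for p in planes if p.orientation == "horizontal"]
    floor = min(horizontal, key=lambda p: p.z_min)
    roof = max(horizontal, key=lambda p: p.z_max)
    room_height = roof.z_max - floor.z_min
    labels: Dict[str, str] = {}
    for p in planes:
        if p is floor:
            labels[p.name] = "floor"
        elif p is roof:
            labels[p.name] = "roof"
        elif p.orientation == "horizontal":
            labels[p.name] = "horizontal surface"
        elif p.area_m2 >= min_wall_area and (p.z_max - p.z_min) >= span_fraction * room_height:
            labels[p.name] = "wall"
        else:
            labels[p.name] = "furniture candidate"
    return labels


scene = [
    Plane("p0", "horizontal", 0.0, 0.02, 20.0),
    Plane("p1", "horizontal", 2.6, 2.62, 20.0),
    Plane("p2", "horizontal", 0.74, 0.76, 1.2),
    Plane("p3", "vertical", 0.0, 2.6, 10.0),
    Plane("p4", "vertical", 0.1, 0.9, 0.6),
]
print(label_planes(scene))
```

Replacing these fixed rules with learned features is exactly the step described under "Moving Ahead (Replacing Heuristics)".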

Segmentation

Furniture Labelling

Moving Ahead (Replacing Heuristics) • Horizontal level-1 (L1) features: z-coordinate, length, width • Vertical L1 features: height, floor distance, roof distance, width • Level-2 (L2) features: height, width, number of handles and knobs • Learn using a CRF

Some Results • Legend: horizontal planes (floors, tables, ceilings); vertical planes (walls, furniture candidates); furniture (cupboards, drawers, kitchen appliances)

Leftover Objects • E.g., cups, bottles, etc. • Applications: grasping, manipulation

Geometrical Primitives • Planes, spheres, cylinders, cones, tori, edges and corners • Local point features are used to label primitives with a CRF. • The point labels are then used by an SVM model that captures shape to identify the object class (4 object classes).

Proposition

Pipeline: point cloud → surface normal clustering → identify floor, roof, Z axis → normal edges → walls, X and Y axes → segmentation → identify horizontal and vertical surfaces → compute features → observe clusters
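The pipeline starts from surface normals, and the slides do not say how they are computed. A common technique, swapped in here, is to take the eigenvector of the smallest eigenvalue of a local neighbourhood's covariance matrix (PCA). The sketch finds it with power iteration on tr(C)·I − C so that only the standard library is needed; the 10° tolerance for "horizontal" and "vertical" is a hypothetical choice, and the patch must contain at least three non-collinear points.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def covariance(points: Sequence[Point]) -> List[List[float]]:
    """3x3 population covariance matrix of a point patch."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(3)] for i in range(3)]


def estimate_normal(points: Sequence[Point], iters: int = 100) -> Tuple[float, float, float]:
    """Unit normal of a local patch: eigenvector of the smallest covariance eigenvalue.

    Power iteration on M = tr(C) * I - C: C's eigenvalues lie in [0, tr(C)], so M's
    dominant eigenvector is C's smallest-eigenvalue eigenvector (the patch normal).
    """
    c = covariance(points)
    s = c[0][0] + c[1][1] + c[2][2]
    m = [[(s if i == j else 0.0) - c[i][j] for j in range(3)] for i in range(3)]
    v = [0.6, 0.5, 0.3]  # arbitrary start vector; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return (v[0], v[1], v[2])


def orientation(normal: Sequence[float], tol_deg: float = 10.0) -> str:
    """Classify a patch as horizontal, vertical or other from its normal's z component."""
    nz = abs(normal[2])
    if nz >= math.cos(math.radians(tol_deg)):
        return "horizontal"
    if nz <= math.sin(math.radians(tol_deg)):
        return "vertical"
    return "other"


table_patch = [(float(x), float(y), 0.7) for x in range(3) for y in range(3)]
wall_patch = [(2.0, float(y), float(z)) for y in range(3) for z in range(3)]
print(orientation(estimate_normal(table_patch)), orientation(estimate_normal(wall_patch)))
```

Clustering these per-point normals then separates the dominant horizontal and vertical directions that define the floor, roof and wall axes.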

Features • Defined for each horizontal/vertical surface: orientation of the surface, area of the surface, volume of the object, distance from the floor, distance from the walls • At a second level: relation with other surfaces in the object • Metric: human height
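A few of these per-surface features can be sketched for a horizontal surface described by its outline polygon. The sketch assumes human height is used as the unit of scale, with a hypothetical reference value of 1.7 m, and that the nearest wall is the plane x = `wall_x` and is not crossed by the surface; the type and function names are illustrative. Orientation, object volume and the second-level relations are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

HUMAN_HEIGHT_M = 1.7  # hypothetical reference height used as the unit of scale


@dataclass
class HorizontalSurface:
    outline: List[Tuple[float, float]]  # polygon vertices (x, y) in metres, in order
    z: float                            # height of the surface (m)


def polygon_area(outline: List[Tuple[float, float]]) -> float:
    """Shoelace formula for a simple (non-self-intersecting) polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(outline, outline[1:] + outline[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def surface_features(s: HorizontalSurface, floor_z: float, wall_x: float) -> Dict[str, float]:
    """Area and distances in units of human height (area in height squared).

    Assumes the wall is the plane x = wall_x and the surface lies entirely on one
    side of it, so the nearest point of the outline is one of its vertices.
    """
    h = HUMAN_HEIGHT_M
    nearest_x = min(abs(x - wall_x) for x, _ in s.outline)
    return {
        "area": polygon_area(s.outline) / (h * h),
        "height_above_floor": (s.z - floor_z) / h,
        "distance_from_wall": nearest_x / h,
    }


table = HorizontalSurface([(1.0, 0.0), (2.0, 0.0), (2.0, 0.5), (1.0, 0.5)], z=0.75)
print(surface_features(table, floor_z=0.0, wall_x=0.0))
```

Expressing every feature relative to human height keeps the features comparable across rooms and ties them to what a human body can actually use, in line with the affordance view of the review.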

Hoped-for Objects • Identification: walls, floors, navigable spaces • Expected to emerge from unsupervised clustering (purely geometric features): tables/desks, chairs, beds, shelves, almirahs, doors, cabinets, windows, dustbins
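As an illustration of categories emerging from unsupervised clustering of purely geometric features, the sketch below runs plain k-means (Lloyd's algorithm) with a deterministic farthest-point seeding. The choice of k-means, the two features (height above floor in m, surface area in m²) and the data are all assumptions; the slides do not name a clustering algorithm. Real features should be scaled comparably first, for example in units of human height.

```python
import math
from typing import List, Sequence, Tuple

Feature = Tuple[float, ...]


def farthest_point_init(points: Sequence[Feature], k: int) -> List[Feature]:
    """Deterministic seeding: start from the first point, then repeatedly add the farthest one."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points: Sequence[Feature], k: int, iters: int = 50) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster index per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical (height above floor, area) pairs: three table tops, then three chair seats.
surfaces = [(0.75, 1.0), (0.72, 1.2), (0.76, 0.9), (0.45, 0.16), (0.44, 0.20), (0.46, 0.18)]
print(kmeans(surfaces, k=2))  # [0, 0, 0, 1, 1, 1]
```

The clusters carry no names; mapping "cluster 1" to "chair" (or rather to "sittable surface") is a separate labelling step.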

Further Extensions • Poselet-driven affordance learning: a human moves around in an environment; the vision system tracks humans and associates poses with objects; object detection is supplemented using poses (e.g., recognizing bean bags as something to sit on); the affordance pose is predicted given the object. • fMRI study: learn a model by showing common tools. It would be interesting to see whether a screwdriver used for hammering is predicted as a hammer or as a screwdriver.