What is Computer Vision? Finding “meaning” in images
Ellen L. Walker

What is Computer Vision? Finding “meaning” in images
- How many cells are on this slide?
- Is there a brain tumor here?
- Find me some pictures of horses.
- Where is the road?
- Is there a safe path to the refrigerator?
- Where is the "widget" on the conveyor belt?
- Is there a flaw in the "widget"?
- Where's Waldo?
- Who is at the door?

Some Applications of Computer Vision
- Scanning parts for defects (machine inspection)
- Highlighting suspect regions on CAT scans (medical imaging)
- Creating 3D models of objects (or the earth!) based on multiple images
- Alerting a driver to dangerous situations (or steering the vehicle)
- Fingerprint recognition (or other biometrics)
- Sorting envelopes with handwritten addresses (OCR)
- Creating performances of CGI (computer-generated imagery) characters based on real actors' movements

Why is vision so difficult?
- The bar is high: consider what a toddler 'knows' about vision
- Vision is an 'inverse problem'
  - Forward: one scene => one image
  - Reverse: one image => many possible scenes!
- The human visual system makes assumptions
  - Why optical illusions work (see Fig. 1.3)

3 Approaches to Computer Vision (Szeliski)
- Scientific: derive algorithms from detailed models of the image formation process
  - Vision as "reverse graphics"
- Statistical: use probabilistic models to describe the unknowns and noise; derive the 'most likely' results
- Engineering: find techniques that are (relatively) simple to describe and implement, but work
  - Requires careful testing to understand limitations and costs

Testing Vision Algorithms
- Pitfall: developing an algorithm that "works" on the small set of test images used during development
  - Surprisingly common in early systems
- Suggested 3-part strategy:
  1. Test on clean synthetic data (e.g. graphics output)
  2. Add noise to your data and study the degradation
  3. Test on real-world data, preferably from a wide range of sources (e.g. internet data, multiple 'standard' datasets)

Engineering Approach to Vision Applications
- Start with a problem to solve
- Consider constraints and features of the problem
- Choose candidate techniques
  - We will cover many techniques in class!
  - If you're doing an IRC, I'll try to point you in the right directions to get started
- Implement & evaluate one or more techniques (careful testing!)
- Choose the combination of techniques that works best and finish implementation of the system

Scientific and Statistical Approaches
- Find or develop the best possible model of the physics of image formation
  - Scene geometry, light, atmospheric effects, sensors, ...
- Scientific: invert the model mathematically to create recognition algorithms
  - Simplify as necessary to make it mathematically tractable
  - Take advantage of constraints / appropriate assumptions (e.g. right angles)
- Statistical: determine model (distribution) parameters and/or unknowns using Bayesian techniques
  - Many machine learning techniques are relevant here

Levels of Computer Vision
- Low level (image processing)
  - Uses similar algorithms for all images
  - Nearly always required as preprocessing for high-level vision
  - Makes no assumptions about image content
  - Techniques from signal processing, "linear systems"
- High level (image understanding)
  - Often specialized for particular types of images
  - Requires models or other knowledge about image content
  - Techniques from artificial intelligence (especially non-symbolic AI)

Overview of Topics (Szeliski, ch. 1)

Operations on Images
- Low-level operators
  - Neighborhood operations
  - Pixel operations
  - Whole-image operations (often a neighborhood operation in a loop)
- Multiple-image combination operations
  - Image subtraction, to highlight motion (sketched below)
- Higher-level operations
  - Compute features from an image (e.g. holes, perimeter)
  - Compute non-iconic representations
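
To make the image-subtraction idea concrete, here is a minimal NumPy sketch; the synthetic frames and the threshold value are made up for illustration:

    import numpy as np

    def motion_mask(frame1, frame2, threshold=25):
        """Highlight motion by subtracting two grayscale frames.

        Pixels whose absolute difference exceeds `threshold` are
        marked 1 (moving); all others are 0 (static).
        """
        diff = np.abs(frame1.astype(np.int16) - frame2.astype(np.int16))
        return (diff > threshold).astype(np.uint8)

    # Synthetic 8-bit frames: a bright square "moves" by one pixel
    a = np.zeros((8, 8), dtype=np.uint8)
    b = np.zeros((8, 8), dtype=np.uint8)
    a[2:4, 2:4] = 200
    b[3:5, 3:5] = 200
    print(motion_mask(a, b))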

Object Recognition
- I have a model M (something I want to find)
  - Image (iconic)
  - Geometric (2D or 3D)
  - Pattern (image or features)
  - Generic model ("idea")
- I have an image I (1 or more)
- I have questions
  - Where is M in I (if at all)?
  - What parameters of M can be determined from I?

Top-Down vs. Bottom-Up
- Top-down
  - Use knowledge to guide image processing
  - Example: in an image of "balls", search for circles
  - Danger: too much top-down reasoning leads to hallucination!
- Bottom-up
  - Extract as much from the image as possible, without any models
  - Example: edge detection -> thresholding -> feature detection
  - Danger: "correct" results might have nothing to do with the actual image contents

Geometry: Point Coordinates
- 2D point: x = (x, y)
  - Actually a column vector (for matrix multiplication)
- Homogeneous 2D point (includes a scale factor): x = (x, y, w)
  - (2, 1, 1) = (4, 2, 2) = (6, 3, 3) = ...
- Transformations:
  - (x, y) => (x, y, 1)
  - (x, y, w) => (x/w, y/w)
  - Special case: (x, y, 0) is a "point at infinity"
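
These conversions are two-liners in practice; a minimal NumPy sketch (the function names are my own, not from the text):

    import numpy as np

    def to_homogeneous(p):
        """(x, y) => (x, y, 1)"""
        return np.append(p, 1.0)

    def from_homogeneous(ph):
        """(x, y, w) => (x/w, y/w); w == 0 is a point at infinity."""
        if ph[-1] == 0:
            raise ValueError("point at infinity has no Cartesian equivalent")
        return ph[:-1] / ph[-1]

    # (2, 1, 1), (4, 2, 2), (6, 3, 3) all name the same 2D point:
    print(from_homogeneous(np.array([4.0, 2.0, 2.0])))   # [2. 1.]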

Modifying Homogeneous Points
- (Figure: effects of increasing x, increasing y, and increasing w)

Lines
- L = (a, b, c) (homogeneous vector)
- Line equation: x . L = ax + by + c = 0 (for a homogeneous point x with w = 1)
- Normal form: L = (n_x, n_y, d)
  - n = (n_x, n_y) is the unit normal direction; d is the distance to the origin
  - theta = atan2(n_y, n_x)
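
A small sketch of the incidence test and the normal form, under the conventions above (the helper names are mine):

    import numpy as np

    def on_line(point_h, line):
        """Point x lies on line L exactly when x . L = ax + by + c = 0."""
        return np.isclose(np.dot(point_h, line), 0.0)

    def normal_form(line):
        """Scale (a, b, c) so (n_x, n_y) is a unit normal; d is the
        (signed) distance to the origin."""
        a, b, c = line
        norm = np.hypot(a, b)
        n_x, n_y, d = a / norm, b / norm, c / norm
        theta = np.arctan2(n_y, n_x)
        return (n_x, n_y, d), theta

    line = np.array([3.0, 4.0, -5.0])                  # 3x + 4y - 5 = 0
    print(on_line(np.array([1.0, 0.5, 1.0]), line))    # True
    print(normal_form(line))                           # ((0.6, 0.8, -1.0), theta)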

Transformations
- 2D to 2D: a 3x3 matrix, multiplied by a homogeneous point
- Entries r00, r01, r10, r11 specify rotation or shearing
  - For a rotation: r00 and r11 are cos(theta), r01 is -sin(theta), and r10 is sin(theta)
- Entries tx and ty are the translation in x and y
- Entry s adjusts the overall scale; sx and sy are 0 except for a projective transform (next slide)
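
A minimal sketch of building and applying such a matrix for a rigid (rotation plus translation) transform, following the entry layout described above:

    import numpy as np

    def rigid_2d(theta, tx, ty):
        """3x3 rotation + translation matrix in homogeneous coordinates."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, tx],    # r00 = cos, r01 = -sin
                         [s,  c, ty],    # r10 = sin, r11 = cos
                         [0,  0,  1]])   # sx = sy = 0, s = 1 (not projective)

    T = rigid_2d(np.pi / 2, 1.0, 0.0)    # rotate 90 degrees, then shift in x
    p = np.array([1.0, 0.0, 1.0])        # homogeneous point (1, 0)
    print(T @ p)                         # [1. 1. 1.] -> Cartesian point (1, 1)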

Hierarchy of 2D Transformations (Table 2.1)

3D Geometry
- Points: add another coordinate, (x, y, z, w)
- Planes: like lines in 2D, with an extra coordinate
- Lines are more complicated
  - One possibility: represent a line by 2 points on the line
  - Any point on the line can be represented by a combination of the points: r = lambda * p1 + (1 - lambda) * p2 (sketched below)
  - If 0 <= lambda <= 1, then r is on the segment from p1 to p2
- See Section 2.1 for more details and more geometric primitives!
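
The two-point line representation in code, as a quick sketch:

    import numpy as np

    def point_on_line(p1, p2, lam):
        """r = lam*p1 + (1 - lam)*p2; lam in [0, 1] stays on the segment."""
        return lam * p1 + (1 - lam) * p2

    p1 = np.array([0.0, 0.0, 0.0])
    p2 = np.array([2.0, 4.0, 6.0])
    print(point_on_line(p1, p2, 0.5))   # midpoint [1. 2. 3.]
    print(point_on_line(p1, p2, 2.0))   # still on the line, but off the segment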

3D to 2D Transformations
- These describe ways that 3D reality can be viewed on a 2D plane
- Each is a 3x4 matrix
  - Multiply by a 3D homogeneous vector (4 coordinates) to get a 2D homogeneous vector (3 coordinates)
- The most common is perspective projection
- Many options; see Section 2.1.4
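
A minimal sketch of one such 3x4 matrix, the simple pinhole perspective projection with focal length f; the exact matrix depends on the camera model chosen (see Section 2.1.4):

    import numpy as np

    f = 1.0                              # assumed focal length, scene units
    P = np.array([[f, 0, 0, 0],
                  [0, f, 0, 0],
                  [0, 0, 1, 0]])         # 3x4 perspective projection

    X = np.array([2.0, 1.0, 4.0, 1.0])   # 3D homogeneous point (2, 1, 4)
    x = P @ X                            # 2D homogeneous result
    print(x / x[-1])                     # [0.5 0.25 1.] -> image point (0.5, 0.25)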

Perspective Projection Geometry (Simplified)
- See Figure 2.7

Simplifications of the "Pinhole Model"
- The image plane is between the center of projection and the object, rather than behind the lens as in a camera or an eye
  - Objects are really imaged upside-down
  - All angles, etc. are the same, though
- The center of projection is a virtual point (the focal point of a lens) rather than a real point (a pinhole)
  - Real lenses collect more light than pinholes
  - Real lenses cause some distortion (see Figure 2.13)

Photometric Image Formation
- A surface element (with normal N) reflects radiation from a single source (arriving at an angle to N) toward the sensor, which senses and records it
- The light arriving at a surface is called irradiance
- Figure 2.14

Light Sources
- Geometry (point vs. area)
- Location
- Spectrum (white light, or only some wavelengths)
- Environment map (measures ambient light from all directions)
- Model depends on needs
  - Typical: sun = a point at infinity
  - A more complex model is needed for soft shadows, etc.

Reflected Light
- Diffuse reflection (Lambertian, matte)
  - The amount of light in a given direction (apparent brightness) depends on the angle to the surface normal
- Specular reflection
  - All light is reflected in one ray; the angle depends on the light source and the surface normal
- Figure 2.17
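
A sketch of the Lambertian rule stated above: apparent brightness falls off with the cosine of the angle between the surface normal and the light direction. The albedo parameter and vector names are my own additions:

    import numpy as np

    def lambertian_brightness(normal, light_dir, albedo=1.0):
        """Diffuse brightness ~ albedo * cos(angle between N and L), clamped at 0."""
        n = normal / np.linalg.norm(normal)
        l = light_dir / np.linalg.norm(light_dir)
        return albedo * max(np.dot(n, l), 0.0)

    n = np.array([0.0, 0.0, 1.0])                                # surface facing up
    print(lambertian_brightness(n, np.array([0.0, 0.0, 1.0])))   # 1.0 (overhead light)
    print(lambertian_brightness(n, np.array([1.0, 0.0, 1.0])))   # ~0.707 (45 degrees)
    print(lambertian_brightness(n, np.array([0.0, 0.0, -1.0])))  # 0.0 (lit from below)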

Image Sensors: CCD
- Charge-coupled device (CCD)
  - Counts photons (the unit of light) that hit it (one counter per pixel)
  - Light energy is converted to electrical charge
  - Charge can "bleed" from neighboring pixels
  - Each pixel reports its value (scaled by resolution)
  - The result is a stream of numbers (0 = black, MAX = white)

Image Sensors: CMOS
- No bleed; each pixel is independently calculated
- Each pixel can have an independent color filter
- Common in current (2009) digital cameras
- Figure 2.24

Digital Camera Image Capture
- Figure 2.25

Color Image
- Color requires 3 values to specify (3 images)
  - Cyan, Magenta, Yellow, Black (CMYK): printing
  - YIQ: the color TV signal (Y is intensity, i.e. the B/W signal; I and Q carry the color information)
  - Red, Green, Blue (RGB): computer monitor
  - Hue, Saturation, Intensity: hue = pure color, saturation = density of color, intensity = B/W signal (the "color-picker" model)
- Visible color depends on the color of the object, the color of the light, the material of the object, and the colors of nearby objects!
- (There is a whole subfield of vision that "explains" color in images. See Section 2.3.2 for more details and pointers.)
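
As a small worked example of moving between color spaces, here is a sketch computing the intensity and saturation of the HSI model from RGB, using one common set of definitions (conventions vary between texts; hue is omitted for brevity):

    def rgb_to_intensity_saturation(r, g, b):
        """One common HSI convention: I = mean(R, G, B),
        S = 1 - min(R, G, B) / I (so S = 0 for pure gray)."""
        i = (r + g + b) / 3.0
        if i == 0:
            return 0.0, 0.0          # black: saturation undefined, use 0
        s = 1.0 - min(r, g, b) / i
        return i, s

    print(rgb_to_intensity_saturation(1.0, 0.0, 0.0))   # pure red: I = 1/3, S = 1.0
    print(rgb_to_intensity_saturation(0.5, 0.5, 0.5))   # gray: I = 0.5, S = 0.0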

Problems with Images
- Geometric distortion (e.g. barrel distortion): from lenses
- Scattering: e.g. a thermal "lens" in the atmosphere; fog is an extreme case
- Blooming: CCD cells affect each other
- Sensor cell variations: a "dead cell" is an extreme case
- Discretization effects (clipping or wrap-around): e.g. 256 becomes 0
- Chromatic distortion (color "spreading" effect)
- Quantization effects (e.g. fitting a circle into squares)

Aliasing: An Effect of Sampling
- Our vision system interpolates between samples (pixels)
- If there are not enough samples, the data is ambiguous
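
A tiny numeric sketch of this ambiguity: sampled at only 10 Hz, a 9 Hz sine is indistinguishable from a (sign-flipped) 1 Hz sine, because both produce identical samples:

    import numpy as np

    fs = 10.0                           # sampling rate, too low for a 9 Hz signal
    t = np.arange(0, 1, 1 / fs)         # 10 samples over one second
    high = np.sin(2 * np.pi * 9.0 * t)  # the real signal
    low = -np.sin(2 * np.pi * 1.0 * t)  # its 1 Hz alias
    print(np.allclose(high, low))       # True: the samples are identical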

Image Types
- Analog image: the ideal image, with infinite precision in space (x, y) and intensity
  - f(x, y) is called the picture function
- Digital image: a sampled analog image; a discrete array I[r, c] with limited precision (rows, columns, max I)
  - If all pixel values are 0 or 1, I[r, c] is a binary image
  - Otherwise I[r, c] is a gray-scale image
- M[r, c] is a multispectral image: each pixel is a vector of values, e.g. (R, G, B)
- L[r, c] is a labeled image: each pixel is a symbol denoting the outcome of a decision, e.g. grass vs. sky vs. house
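
These image types map naturally onto array shapes and element types; a quick NumPy sketch (the shapes and labels are arbitrary examples):

    import numpy as np

    rows, cols = 4, 6
    binary  = np.zeros((rows, cols), dtype=bool)         # pixel values 0 or 1
    gray    = np.zeros((rows, cols), dtype=np.uint8)     # 0..255 intensities
    multi   = np.zeros((rows, cols, 3), dtype=np.uint8)  # (R, G, B) vector per pixel
    labeled = np.zeros((rows, cols), dtype=np.int32)     # symbolic label per pixel
    LABELS = {0: "grass", 1: "sky", 2: "house"}
    labeled[0, :] = 1                                    # top row labeled "sky"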

Coordinate systems
- Raster coordinate system
  - Origin (0, 0) is at the upper left
  - Derives from printing an array on a line printer
  - Row (R) increases downward; column (C) increases to the right
- Cartesian coordinate system
  - Origin (0, 0) is at the lower left
  - The typical system used in mathematics
  - X increases to the right; Y increases upward
- Conversions (sketched below)
  - Y = MaxRows - R; X = C
  - Or pretend X = R, Y = C, then rotate your printout 90 degrees!
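
The conversion above as a sketch in code, using the slide's formula directly (note it places raster row 0 at Y = MaxRows):

    def raster_to_cartesian(r, c, max_rows):
        """Slide's conversion: Y = MaxRows - R, X = C."""
        return c, max_rows - r   # returns (X, Y)

    # In a 480-row image, raster row 0 (top edge) maps to Y = 480:
    print(raster_to_cartesian(0, 10, 480))    # (10, 480)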

Resolution
- In general, resolution is related to a sensor's measurement precision, or its ability to detect fine features
- Nominal resolution of a sensor is the size of the scene element that images to a single pixel on the image plane
- Resolution of a camera (or an image) is also the number of rows & columns it contains (or their product), e.g. "8 megapixel resolution"
- Subpixel resolution means that the precision of measurement is finer than the nominal resolution (e.g. subpixel resolution of positions on a line segment)

Variation in Resolution (figure)

Quantization Errors
- One pixel contains a mixture of materials
  - e.g. a 10 m x 10 m area in a satellite photo
  - e.g. across the edge of a painted stripe or character
- A subpixel shift in location has a major effect on the image!
- Shape distortions caused by quantization ("jaggies")
- Change / loss of features
  - A thin stripe may be lost
  - Measured area varies with resolution (e.g. for a circle)

Representing an Image
- File header
  - Type (binary, grayscale, color, video sequence)
  - Creation date
  - Title
  - Dimensions (#rows, #cols, #bits/pixel)
  - History (nice to have)
- Data
  - Values for all pixels, in a pre-defined order based on the format
  - Might be compressed (e.g. JPEG is lossy compression)

PNM: a simple image representation
- Portable aNy Map
  - PGM = portable gray map
  - PBM = portable bit map
  - PPM = portable pixel map (color image)
- ImageJ reads, displays, and converts PNM images (pbm, pgm, ppm), and much more!
  - GIF, JPG, and other formats can be converted (both ways)
  - ImageJ does not appear to convert color to grayscale
- IrfanView (Windows only) also reads, displays, and converts

PNM Details
- First comes Px (where x is an integer from 1-6)
  - P1/P4 = binary (bitmap), P2/P5 = gray, P3/P6 = color
  - P1-P3: data in ASCII; P4-P6: data in binary
- Comments can appear anywhere after the Px; comment lines begin with #
- Next come 2 integers (#cols, #rows)
- Next (unless it's P1 or P4) comes 1 integer (#greylevels)
- The rest of the image is pixel values from 0 to #greylevels - 1
  - (If color: each pixel is an R, G, B triple)

PGM image example
- This one is really boring!

    P2
    3 2
    4
    0 0 0
    1 2 3
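
A sketch of writing and re-reading this example in Python (comment handling is omitted for brevity, although real PNM files may contain # comment lines):

    def write_pgm_ascii(path, pixels, greylevels):
        """Write a P2 (ASCII) PGM: magic, dimensions, greylevels, then rows."""
        rows, cols = len(pixels), len(pixels[0])
        with open(path, "w") as f:
            f.write(f"P2\n{cols} {rows}\n{greylevels}\n")
            for row in pixels:
                f.write(" ".join(str(v) for v in row) + "\n")

    def read_pgm_ascii(path):
        """Read a P2 PGM back into a list of rows (no # comment handling)."""
        tokens = open(path).read().split()
        assert tokens[0] == "P2"
        cols, rows, greylevels = map(int, tokens[1:4])
        values = list(map(int, tokens[4:4 + rows * cols]))
        return [values[r * cols:(r + 1) * cols] for r in range(rows)]

    write_pgm_ascii("boring.pgm", [[0, 0, 0], [1, 2, 3]], 4)
    print(read_pgm_ascii("boring.pgm"))   # [[0, 0, 0], [1, 2, 3]]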

Other Image Formats
- GIF (CompuServe; commercial)
  - 8-bit color (uses a colormap)
  - LZW lossless compression available
- TIFF (Aldus Corp., for scanners)
  - Multiple images, 1-24 bits/pixel color
  - Lossy or lossless compression available
- JPEG (Joint Photographic Experts Group; free)
  - Real-time encoding/decoding in hardware
  - Lossy compression
  - Up to 64K x 64K x 24 bits

Specifying a vision system
- Inputs
  - Environment (e.g. light(s), fixtures for holding objects, etc.) OR unconstrained environments
  - Sensor(s) OR someone else's images
  - Resolution & formats of image(s)
- Algorithms
  - To be studied in detail later(!)
- Results
  - Image(s)
  - Non-iconic results

If you're doing an IRC... (Example from 2002)
- What is the goal of your project?
  - Eye-tracking to control a cursor: hands-free game operation
- How will you get data? (see "Inputs" on the last slide)
  - Camera above the monitor; user at a (relatively) fixed distance
- Determine what kind of results you need
  - Outputs to control the cursor
- How will you judge success?
  - The user is satisfied that the cursor does what he/she wants
  - Works for many users, under a range of conditions

Staging your project
- What can be done in 3 weeks? 6 weeks? 9 weeks?
  1. Find the eyes in a single image [DONE]
  2. Reliably track eye direction between a single pair of images (output "left", "right", "up", "down") [DONE]
  3. Use a continuous input stream (preferably real time) [NOT DONE]
- Program defensively
  - When a milestone is reached, make a copy of the code and freeze it! (These can be smaller than the 3-week ideas above)
  - Back up early and often! (and in many places)
  - Keep printouts as last-ditch backups
- When time runs out, submit and present your best frozen milestone.