Advanced Computer Vision Chapter 7 STRUCTURE FROM MOTION

Number of slides: 61

Advanced Computer Vision Chapter 7 STRUCTURE FROM MOTION
Presented by Prof. Chiou-Shann Fuh & Tz-Chia Tseng
0910315537 r05945052@ntu.edu.tw
Structure from Motion 1

What Is Structure from Motion?
1. Study of visual perception.
2. Process of finding the three-dimensional structure of an object by analyzing local motion signals over time.
3. A method for creating 3D models from 2D pictures of an object.

Example
Picture 1 and Picture 2

Example (cont.)
3D model created from the two images

Example
Figure 7.1: Structure from motion systems: orthographic factorization

Example
Figure 7.2: Line matching

Example
Figure 7.3: (a-e) Incremental structure from motion

Example
Figure 7.4: 3D reconstruction of Trafalgar Square

Example
Figure 7.5: 3D reconstruction of the Great Wall of China

Example
Figure 7.6: 3D reconstruction of the Old Town Square, Prague

Today’s Lecture: Structure from Motion
• What is structure from motion?
• Triangulation and pose
• Two-frame methods

7.1 Triangulation
• The problem of estimating a point’s 3D location when it is seen from multiple cameras is known as triangulation.
• It is the converse of the pose estimation problem.
• Given the projection matrices, 3D points can be computed from their measured image positions in two or more views.

Triangulation (cont.)
• Find the 3D point $p$ that lies closest to all of the 3D rays corresponding to the 2D matching feature locations $\{x_j\}$ observed by cameras $\{P_j = K_j [R_j \mid t_j]\}$, where $t_j = -R_j c_j$ and $c_j$ is the $j$th camera center.

Triangulation (cont.)
Figure 7.7: 3D point triangulation by finding the point $p$ that lies nearest to all of the optical rays

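Triangulation (cont.)
• The rays originate at $c_j$ in a direction $\hat{v}_j = \mathcal{N}(R_j^{-1} K_j^{-1} x_j)$, where $\mathcal{N}$ normalizes a vector to unit length.
• The nearest point to $p$ on this ray, denoted $q_j$, minimizes the distance
$$\| c_j + d_j \hat{v}_j - p \|^2,$$
which has a minimum at $d_j = \hat{v}_j \cdot (p - c_j)$. Hence,
$$q_j = c_j + (\hat{v}_j \hat{v}_j^T)(p - c_j).$$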

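Triangulation (cont.)
• The squared distance between $p$ and $q_j$ is
$$r_j^2 = \| (I - \hat{v}_j \hat{v}_j^T)(p - c_j) \|^2,$$
where $\hat{v}_j$ is the unit direction of the $j$th ray.
• The optimal value for $p$, which lies closest to all of the rays, can be computed as a regular least squares problem by summing over all the $r_j^2$ and finding the optimal value of $p$:
$$p = \Big[ \sum_j (I - \hat{v}_j \hat{v}_j^T) \Big]^{-1} \Big[ \sum_j (I - \hat{v}_j \hat{v}_j^T)\, c_j \Big].$$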
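The closest-point solution can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's own code: the function name `triangulate_rays` and the test cameras are assumptions, and the rays are given directly as centers and directions rather than derived from pixel measurements.

```python
import numpy as np

def triangulate_rays(centers, directions):
    """Least-squares 3D point closest to a set of rays.

    centers:    (J, 3) camera centers c_j
    directions: (J, 3) ray directions (need not be unit length)
    """
    centers = np.asarray(centers, dtype=float)
    directions = np.asarray(directions, dtype=float)
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, v in zip(centers, directions):
        v = v / np.linalg.norm(v)        # unit direction v_hat_j
        M = np.eye(3) - np.outer(v, v)   # removes the component along the ray
        A += M
        b += M @ c
    # Solve [sum_j M_j] p = sum_j M_j c_j (singular if all rays are parallel).
    return np.linalg.solve(A, b)

# Hypothetical test: three cameras looking at a known point.
p_true = np.array([1.0, 2.0, 5.0])
cams = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
p_est = triangulate_rays(cams, p_true - cams)
```

With noise-free rays the estimate coincides with the true point; with noisy rays it is the point minimizing the summed squared ray distances, which is not the same as minimizing reprojection error.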

Triangulation (cont.)
• If we use homogeneous coordinates $p = (X, Y, Z, W)$, the triangulation equations are homogeneous and are solved using a singular value decomposition (SVD).
• If we set $W = 1$, we can use regular linear least squares, but the resulting system may be singular or poorly conditioned (i.e., when all of the viewing rays are parallel).

Triangulation (cont.)
For this reason, it is generally preferable to parameterize 3D points using homogeneous coordinates, especially if we know that there are likely to be points at greatly varying distances from the cameras.
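A homogeneous triangulation can be sketched as below. This assumes the common linear (DLT) formulation, in which each observation $(x, y)$ under camera matrix $P$ contributes the two rows $x P_2 - P_0$ and $y P_2 - P_1$; the function names and the two test cameras are illustrative assumptions.

```python
import numpy as np

def triangulate_dlt(Ps, xs):
    """Homogeneous linear triangulation.

    Ps: list of 3x4 camera matrices; xs: list of (x, y) observations.
    Returns (X, Y, Z, W), defined only up to scale.
    """
    rows = []
    for P, (x, y) in zip(Ps, xs):
        P = np.asarray(P, dtype=float)
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.vstack(rows))
    return Vt[-1]

def project(P, p_h):
    q = P @ p_h
    return q[:2] / q[2]

# Hypothetical setup: two identity-intrinsics cameras one unit apart along x.
P0 = np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
p_true = np.array([0.5, 0.2, 4.0, 1.0])
p_h = triangulate_dlt([P0, P1], [project(P0, p_true), project(P1, p_true)])
p_est = p_h[:3] / p_h[3]
```

Because the result is kept homogeneous, a point very far from the cameras simply comes back with $W$ close to zero instead of making the linear system ill-conditioned; dividing by $W$ is only safe for finite points.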

7.2 Two-Frame Structure from Motion
• So far in 3D reconstruction we have always assumed that either the 3D point positions or the 3D camera poses are known in advance. Structure from motion drops this assumption and recovers both simultaneously from image correspondences.

Two-Frame Structure from Motion (cont.)
Figure 7.8: Epipolar geometry: the vectors $t = c_1 - c_0$, $p - c_0$, and $p - c_1$ are co-planar and define the basic epipolar constraint expressed in terms of the pixel measurements $x_0$ and $x_1$

Two-Frame Structure from Motion (cont.)
• Figure 7.8 shows a 3D point $p$ being viewed from two cameras whose relative position can be encoded by a rotation $R$ and a translation $t$.
• We can set the first camera at the origin, $c_0 = 0$, and at a canonical orientation, $R_0 = I$.

Two-Frame Structure from Motion (cont.)
• The observed location of point $p$ in the first image, $p_0 = d_0 \hat{x}_0$, is mapped into the second image by the transformation
$$d_1 \hat{x}_1 = p_1 = R p_0 + t = R (d_0 \hat{x}_0) + t,$$
where $\hat{x}_j = K_j^{-1} x_j$ are the ray direction vectors.

Two-Frame Structure from Motion (cont.)
• Taking the cross product of both sides with $t$, in order to annihilate it on the right-hand side, yields
$$d_1 [t]_\times \hat{x}_1 = d_0 [t]_\times R \hat{x}_0.$$
• Taking the dot product of both sides with $\hat{x}_1$ yields
$$d_0 \hat{x}_1^T ([t]_\times R) \hat{x}_0 = d_1 \hat{x}_1^T [t]_\times \hat{x}_1 = 0.$$

Two-Frame Structure from Motion (cont.)
• The right-hand side is a triple product with two identical entries, so it vanishes.
• We therefore arrive at the basic epipolar constraint
$$\hat{x}_1^T E \hat{x}_0 = 0,$$
where $E = [t]_\times R$ is the essential matrix.
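The epipolar constraint can be checked numerically with a small sketch. It builds the essential matrix as the cross-product matrix of the translation times the rotation, then evaluates the constraint for one synthetic point. The helper `skew` and the particular rotation, translation, and 3D point are made-up test values, not taken from the slides.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# Hypothetical relative pose: small rotation about the y axis plus a translation.
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
E = skew(t) @ R                      # essential matrix

p0 = np.array([0.3, -0.4, 5.0])      # point in the first camera's frame
p1 = R @ p0 + t                      # same point in the second camera's frame
x0_hat = p0 / p0[2]                  # ray directions (calibrated coordinates)
x1_hat = p1 / p1[2]

residual = x1_hat @ E @ x0_hat       # zero up to rounding error
```

The residual vanishes for any 3D point, since it is proportional to the triple product of $p_1$, $t$, and $R p_0$, three vectors lying in one plane.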

Two-Frame Structure from Motion (cont.)
• The essential matrix $E$ maps a point $\hat{x}_0$ in image 0 into a line $l_1 = E \hat{x}_0$ in image 1, since $\hat{x}_1^T l_1 = 0$.
• All such lines must pass through the second epipole $e_1$, which is therefore defined as the left singular vector of $E$ with a 0 singular value or, equivalently, the projection of the vector $t$ into image 1.

Two-Frame Structure from Motion (cont.)
• The transpose of these relationships gives us the epipolar line in the first image as $l_0 = E^T \hat{x}_1$ and $e_0$ as the zero-value right singular vector of $E$.
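As a rough illustration of epipolar lines and epipoles, the sketch below computes both from an essential matrix built from a known pose. The pose, the 3D point, and the helper `skew` are assumptions chosen for the example.

```python
import numpy as np

def skew(t):
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

# Hypothetical relative pose and 3D point.
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
E = skew(t) @ R

p0 = np.array([0.3, -0.4, 5.0])
p1 = R @ p0 + t
x0_hat, x1_hat = p0 / p0[2], p1 / p1[2]

l1 = E @ x0_hat        # epipolar line in image 1 (x1_hat lies on it)
l0 = E.T @ x1_hat      # epipolar line in image 0 (x0_hat lies on it)

# Epipoles: singular vectors belonging to the zero singular value.
U, S, Vt = np.linalg.svd(E)
e1 = U[:, -1]          # left null vector, e1^T E = 0
e0 = Vt[-1]            # right null vector, E e0 = 0
```

Both epipoles are homogeneous vectors defined up to scale and sign; `e1` is parallel to `t`, matching the statement that it is the projection of the translation into image 1.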

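Two-Frame Structure from Motion (cont.)
Given the relationship $\hat{x}_1^T E \hat{x}_0 = 0$, how can we use it to recover the camera motion encoded in the essential matrix $E$?
• If we have $N$ corresponding measurements $\{(x_{i0}, x_{i1})\}$, we can form $N$ homogeneous equations in the nine elements of $E = \{e_{00} \ldots e_{22}\}$:
$$x_{i0} x_{i1} e_{00} + y_{i0} x_{i1} e_{01} + x_{i1} e_{02} + x_{i0} y_{i1} e_{10} + y_{i0} y_{i1} e_{11} + y_{i1} e_{12} + x_{i0} e_{20} + y_{i0} e_{21} + e_{22} = 0,$$
where $x_{ij} = (x_{ij}, y_{ij}, 1)$.

Two-Frame Structure from Motion (cont.)
• This can be written more compactly as
$$[\hat{x}_{i1} \hat{x}_{i0}^T] \otimes E = Z_i \otimes E = z_i \cdot f = 0,$$
where $\otimes$ denotes element-wise multiplication and summation of matrix elements, and $z_i$ and $f$ are the vector forms of the $Z_i = \hat{x}_{i1} \hat{x}_{i0}^T$ and $E$ matrices.
• Given $N \geq 8$ such equations, we can compute an estimate (up to scale) for the entries of $E$ using a singular value decomposition (SVD).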
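The linear estimate described above (often called the eight-point algorithm) can be sketched as follows. The synthetic scene, the random seed, and the function names are assumptions for illustration; the input is taken to be noise-free, already calibrated coordinates, and no rank or singular-value constraint is enforced on the result.

```python
import numpy as np

def skew(t):
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

def eight_point(x0, x1):
    """Linear estimate of E (up to scale) from N >= 8 correspondences.

    x0, x1: (N, 2) calibrated image coordinates in image 0 and image 1.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    h0 = np.column_stack([x0, np.ones(len(x0))])
    h1 = np.column_stack([x1, np.ones(len(x1))])
    # Row i is the rasterized Z_i = x_i1 x_i0^T, so that z_i . f = x_i1^T E x_i0.
    Z = np.einsum('ni,nj->nij', h1, h0).reshape(len(x0), 9)
    _, _, Vt = np.linalg.svd(Z)
    return Vt[-1].reshape(3, 3)      # null vector of Z, reshaped row-major

# Hypothetical synthetic scene: 12 points in front of both cameras.
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-1, 1, 12),
                       rng.uniform(-1, 1, 12),
                       rng.uniform(4, 8, 12)])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
pts1 = pts @ R.T + t
x0 = pts[:, :2] / pts[:, 2:]
x1 = pts1[:, :2] / pts1[:, 2:]

E_est = eight_point(x0, x1)
E_true = skew(t) @ R
```

Since the estimate is only defined up to scale and sign, any comparison with a reference matrix should normalize both first, for example by their Frobenius norms.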

Two-Frame Structure from Motion (cont.)
• In the presence of noisy measurements, how close is this estimate to being statistically optimal?
• In the matrix $[\hat{x}_{i1} \hat{x}_{i0}^T]$, some entries are products of image measurements, such as $x_{i0} y_{i1}$, and others are direct image measurements (or even the identity).

Two-Frame Structure from Motion (cont.)
• If the measurements have noise, the terms that are products of measurements have their noise amplified by the other element in the product, which leads to poor scaling.
• To deal with this, the point coordinates should be translated and scaled so that their centroid lies at the origin and their variance is unity, i.e.,
$$\tilde{x}_i = s (x_i - \mu_x), \qquad \tilde{y}_i = s (y_i - \mu_y),$$

Two-Frame Structure from Motion (cont.)
such that $\sum_i \tilde{x}_i = \sum_i \tilde{y}_i = 0$ and $\sum_i \tilde{x}_i^2 + \sum_i \tilde{y}_i^2 = 2n$, where $n$ is the number of points.
• Once the essential matrix $\tilde{E}$ has been computed from the transformed coordinates $\tilde{x}_{ij} = T_j \hat{x}_{ij}$, the original essential matrix $E$ can be recovered as
$$E = T_1^T \tilde{E} T_0.$$
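The normalization step can be sketched as a 3×3 similarity transform applied to homogeneous pixel coordinates. The function name `normalizing_transform` and the sample pixel coordinates are assumptions for illustration.

```python
import numpy as np

def normalizing_transform(pts):
    """3x3 transform T that moves the centroid of pts to the origin and
    scales so that sum(x~^2 + y~^2) = 2n.

    pts: (n, 2) array of image coordinates.
    """
    pts = np.asarray(pts, dtype=float)
    mu = pts.mean(axis=0)
    s = np.sqrt(2.0 * len(pts) / np.sum((pts - mu) ** 2))
    return np.array([[s, 0.0, -s * mu[0]],
                     [0.0, s, -s * mu[1]],
                     [0.0, 0.0, 1.0]])

# Hypothetical pixel measurements in one image.
pts = np.array([[100.0, 200.0], [300.0, 250.0], [220.0, 400.0], [50.0, 120.0]])
T = normalizing_transform(pts)
hom = np.column_stack([pts, np.ones(len(pts))])
tilde = (T @ hom.T).T[:, :2]          # normalized coordinates

# After estimating E_tilde from normalized points in both images (transforms
# T0 and T1), the original matrix would be recovered as T1.T @ E_tilde @ T0.
```

One transform is computed per image, since the two sets of measurements generally have different centroids and spreads.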

Two-Frame Structure from Motion (cont.)
• Once the essential matrix has been recovered, the direction of the translation vector $t$ can be estimated.
• The absolute distance between the two cameras can never be recovered from pure image measurements alone.
• Ground control points in photogrammetry: knowledge about absolute camera and point positions or distances is required to establish the final scale, position, and orientation.

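Two-Frame Structure from Motion (cont.)
• To estimate the direction $\hat{t}$, observe that under ideal noise-free conditions, the essential matrix $E$ is singular, i.e., $\hat{t}^T E = 0$.
• This singularity shows up as a singular value of 0 when an SVD of $E$ is performed,
$$E = [\hat{t}]_\times R = U \Sigma V^T = \begin{bmatrix} u_0 & u_1 & \hat{t} \end{bmatrix} \begin{bmatrix} 1 & & \\ & 1 & \\ & & 0 \end{bmatrix} \begin{bmatrix} v_0^T \\ v_1^T \\ v_2^T \end{bmatrix}.$$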
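Reading the translation direction off the SVD might look like the sketch below. The function name and test pose are assumptions; the sign of the returned direction is arbitrary, and resolving it (together with the rotation) needs an additional check that triangulated points lie in front of both cameras, which is not shown.

```python
import numpy as np

def translation_direction(E):
    """Unit vector t_hat with t_hat^T E = 0 (sign ambiguous), plus the
    singular values of E."""
    U, S, _ = np.linalg.svd(E)
    return U[:, -1], S

def skew(t):
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

# Hypothetical pose used to build a noise-free essential matrix.
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
t_hat, S = translation_direction(skew(t) @ R)
```

For a noise-free essential matrix the two non-zero singular values are equal (both are the length of `t`); with noisy data the smallest singular value is merely small, and its singular vector is the least-squares estimate of the direction.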

Pure Translation
Figure 7.9: Pure translational camera motion results in visual motion where all the points move towards (or away from) a common focus of expansion (FOE).

Pure Translation (cont.)
• Known rotation: the resulting essential matrix $E$ is (in the noise-free case) skew symmetric and can be estimated more directly by setting $e_{ij} = -e_{ji}$ and $e_{ii} = 0$. Two points with non-zero parallax now suffice to estimate the FOE.

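Pure Translation (cont.)
• A more direct derivation of the FOE estimate can be obtained by minimizing the triple product
$$\sum_i (x_{i0}, x_{i1}, e)^2 = \sum_i \big( (x_{i0} \times x_{i1}) \cdot e \big)^2,$$
which is equivalent to finding the null space for the set of equations
$$(y_{i0} - y_{i1})\, e_0 + (x_{i1} - x_{i0})\, e_1 + (x_{i0} y_{i1} - y_{i0} x_{i1})\, e_2 = 0.$$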
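The null-space formulation of the FOE estimate can be sketched as follows, assuming the rotation has already been removed so that the motion between the two images is a pure translation. The function name, the scene points, and the translation are made-up test values.

```python
import numpy as np

def estimate_foe(x0, x1):
    """Focus of expansion from matched points under pure translation.

    x0, x1: (N, 2) image coordinates, N >= 2.
    Assumes the FOE is a finite image point (third component non-zero).
    """
    h0 = np.column_stack([np.asarray(x0, dtype=float), np.ones(len(x0))])
    h1 = np.column_stack([np.asarray(x1, dtype=float), np.ones(len(x1))])
    A = np.cross(h0, h1)             # row i is x_i0 x x_i1
    _, _, Vt = np.linalg.svd(A)
    e = Vt[-1]                       # minimizes sum_i ((x_i0 x x_i1) . e)^2
    return e[:2] / e[2]

# Hypothetical scene and forward-moving camera.
pts = np.array([[1.0, 0.5, 6.0], [-1.0, 1.0, 5.0], [0.5, -1.0, 8.0],
                [-0.8, -0.6, 7.0], [0.2, 0.9, 4.0]])
t = np.array([0.2, -0.1, 1.0])
pts1 = pts + t
x0 = pts[:, :2] / pts[:, 2:]
x1 = pts1[:, :2] / pts1[:, 2:]
foe = estimate_foe(x0, x1)
```

In this noise-free example the recovered FOE is the image of the translation direction, here $(t_x / t_z,\ t_y / t_z)$.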

Pure Translation (cont.)
• In situations where a large number of points at infinity are available (e.g., when the camera motion is small compared to the distance to the objects), this suggests the following strategy.
• Pick a pair of points to estimate a rotation, hoping that both of the points lie at infinity (very far from the camera).
• Then compute the FOE and check whether the residual error is small and whether the motions towards or away from the epipole (FOE) are all in the same direction.

Pure Rotation
• Pure rotation results in a degenerate estimate of the essential matrix $E$ and of the translation direction $\hat{t}$.
• If we consider that the rotation matrix is known, the estimates for the FOE will be degenerate, since $x_{i0} \approx x_{i1}$, and hence the set of FOE equations is degenerate.

Self-calibration
• Auto-calibration was developed for converting a projective reconstruction into a metric one, which is equivalent to recovering the unknown calibration matrix $K_j$ associated with each image.
• In the presence of additional information about the scene, different methods can be applied.
• If there are parallel lines in the scene, three or more vanishing points, which are the images of points at infinity, can be used to establish the homography for the plane at infinity, from which focal lengths and rotations can be recovered.

Self-calibration (cont.)
• In the absence of external information: consider all sets of camera matrices $P_j = K_j [R_j \mid t_j]$ projecting world coordinates $p_i = (X_i, Y_i, Z_i, W_i)$ into screen coordinates $x_{ij} \sim P_j p_i$.
• Consider transforming the 3D scene $\{p_i\}$ through an arbitrary $4 \times 4$ projective transformation $\tilde{H}$, yielding a new model consisting of points $p'_i = \tilde{H} p_i$.
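The projective ambiguity this transformation introduces can be illustrated with a short sketch: moving the points by an invertible 4×4 matrix and post-multiplying the camera by its inverse leaves every image measurement unchanged. The camera, the point, and the matrix `H` below are arbitrary assumed values.

```python
import numpy as np

def to_pixels(q):
    return q[:2] / q[2]

# Hypothetical camera P = K [R | t] and world point.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.array([[0.1], [0.0], [0.5]])])
p = np.array([0.4, -0.3, 5.0, 1.0])

# Arbitrary invertible 4x4 projective transformation (diagonally dominant).
H = np.array([[1.0, 0.1, 0.0, 0.2],
              [0.0, 1.2, 0.1, 0.0],
              [0.1, 0.0, 0.9, 0.3],
              [0.05, 0.02, 0.01, 1.0]])

p_new = H @ p                        # transformed scene point
P_new = P @ np.linalg.inv(H)         # compensating camera matrix

x_old = to_pixels(P @ p)
x_new = to_pixels(P_new @ p_new)     # same pixel as x_old
```

This is why image measurements alone fix the reconstruction only up to such a transformation; self-calibration looks for the particular transformation under which the new camera matrices factor into plausible calibration matrices.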

Self-calibration (cont.)
• There is a technique that can recover the focal lengths $(f_0, f_1)$ of both images from the fundamental matrix $F$ in a two-frame reconstruction.
• It assumes that the camera has zero skew, a known aspect ratio, and a known optical center.
• Most cameras have square pixels and an optical center near the middle of the image, and are much more likely to deviate from this simple camera model because of radial distortion.
• Problems occur when images have been cropped off-center.

Application: View Morphing
• An application of basic two-frame structure from motion.
• Also known as view interpolation.
• Used to generate a smooth 3D animation from one view of a 3D scene to another.
• To create such a transition: smoothly interpolate the camera matrices, i.e., the camera position, orientation, and focal lengths. A more pleasing effect is obtained by easing in and easing out the camera parameters.
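A toy sketch of easing the camera parameters in and out is shown below. It interpolates only the camera center and focal length, using a smoothstep curve as one possible easing function; orientation is left out because it would need a proper rotation interpolation (e.g., on quaternions). All names and values are illustrative assumptions.

```python
import numpy as np

def ease_in_out(u):
    """Smoothstep easing: maps 0 -> 0 and 1 -> 1 with zero slope at both ends."""
    u = np.clip(u, 0.0, 1.0)
    return u * u * (3.0 - 2.0 * u)

def interpolate_camera(c_a, f_a, c_b, f_b, u):
    """Blend camera center and focal length between views A and B at time u."""
    w = ease_in_out(u)
    c = (1.0 - w) * np.asarray(c_a, dtype=float) + w * np.asarray(c_b, dtype=float)
    f = (1.0 - w) * f_a + w * f_b
    return c, f

# Hypothetical reference views.
c_a, f_a = [0.0, 0.0, 0.0], 800.0
c_b, f_b = [1.0, 0.0, 0.5], 1000.0
c_mid, f_mid = interpolate_camera(c_a, f_a, c_b, f_b, 0.5)
```

Sampling `u` uniformly in time then gives camera motion that starts and stops gently instead of at constant speed.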

Application: View Morphing (cont.)
• Triangulate the set of matched feature points in each image.
1. To generate in-between frames: establish a full set of 3D correspondences or 3D models for each reference view.
2. As the 3D points are re-projected into their intermediate views, pixels can be mapped from their original source images to their new views using an affine or projective mapping.
3. The final image is then composited using a linear blend of the two reference images, as with usual morphing.

Factorization
• When processing video sequences, we often get extended feature tracks from which it is possible to recover the structure and motion using a process called factorization.

Factorization (cont.)
Figure 7.10: 3D reconstruction of a rotating ping pong ball using factorization (Tomasi and Kanade 1992): (a) sample image with tracked features overlaid; (b) sub-sampled feature motion stream; (c) two views of the reconstructed 3D model.

Factorization (cont.)
• A disadvantage of factorization approaches is that they require a complete set of tracks, i.e., each point must be visible in each frame, in order for the factorization to work.
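The core of orthographic factorization can be sketched as a rank-3 SVD of the centered measurement matrix. This is a simplified illustration in the spirit of Tomasi and Kanade's method: the metric upgrade that resolves the remaining 3×3 affine ambiguity is omitted, and the synthetic data and function names are assumptions. Note that the measurement matrix `W` must be complete, which is exactly the limitation stated above.

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a (2F x N) measurement matrix.

    Returns motion M (2F x 3), structure S (3 x N), the centered matrix W0,
    and all singular values. M and S are only defined up to a 3x3 affine
    transformation.
    """
    W0 = W - W.mean(axis=1, keepdims=True)   # remove per-row translation
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root
    S = root[:, None] * Vt[:3]
    return M, S, W0, s

def rotation(ax, ay):
    cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Rx @ Ry

# Hypothetical data: 10 points seen orthographically in 5 frames.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))
W = np.vstack([rotation(0.1 * f, 0.2 * f)[:2] @ X + np.array([[1.0 * f], [2.0 * f]])
               for f in range(5)])
M, S, W0, s = factorize(W)
```

For noise-free orthographic data the centered matrix has rank 3, so the fourth singular value is numerically zero and `M @ S` reproduces `W0`.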

Perspective and Projective Factorization
• Another disadvantage of regular factorization is that it cannot deal with perspective cameras.
• One remedy is to perform an initial affine (e.g., orthographic) reconstruction and then correct for the perspective effects in an iterative manner.

Bundle Adjustment
• The most accurate way to recover structure from motion is to perform robust nonlinear minimization of the measurement (re-projection) errors, which is known in the photogrammetry (and now computer vision) communities as bundle adjustment.
• Our feature location measurement $x_{ij}$ now depends not only on the point (track) index $i$ but also on the camera pose index $j$:
$$x_{ij} = f(p_i, R_j, c_j, K_j).$$
• The 3D point positions $p_i$ are also updated simultaneously.
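The quantity that bundle adjustment minimizes can be sketched as a stacked vector of re-projection residuals. The sketch below only evaluates the residuals; the nonlinear solver, robust weighting, and the parameterization of rotations are left out, and the cameras, points, and function names are assumptions.

```python
import numpy as np

def project(p, R, c, K):
    """Pixel position of 3D point p seen by a camera with rotation R,
    center c, and calibration matrix K."""
    q = K @ (R @ (p - c))
    return q[:2] / q[2]

def reprojection_residuals(points, cameras, observations):
    """Stack predicted-minus-measured pixel errors over all observations.

    observations: list of (i, j, x_ij) with point index i and camera index j.
    """
    res = []
    for i, j, x_ij in observations:
        R, c, K = cameras[j]
        res.append(project(points[i], R, c, K) - x_ij)
    return np.concatenate(res)

# Hypothetical two-camera, two-point problem with exact measurements.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
cameras = [(np.eye(3), np.zeros(3), K),
           (np.eye(3), np.array([1.0, 0.0, 0.0]), K)]
points = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 6.0]])
observations = [(i, j, project(points[i], *cameras[j]))
                for i in range(2) for j in range(2)]

r_exact = reprojection_residuals(points, cameras, observations)
r_moved = reprojection_residuals(points + 0.01, cameras, observations)
cost = 0.5 * r_moved @ r_moved       # sum-of-squares objective
```

A nonlinear least-squares solver would adjust both the points and the camera parameters to drive this cost down, typically starting from an initial two-frame or incremental reconstruction.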

Exploiting Sparsity
• Large bundle adjustment problems, such as those involving 3D scenes reconstructed from thousands of Internet photographs, can require solving non-linear least squares problems with millions of measurements.
• Structure from motion is a bipartite problem in structure and motion.
• Each feature point $x_{ij}$ in a given image depends on one 3D point position $p_i$ and one 3D camera pose $(R_j, c_j)$.
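The bipartite structure shows up directly in the sparsity pattern of the Jacobian, which can be sketched as a boolean mask. The layout below (three parameters per point followed by six per camera, with intrinsics held fixed) is an assumption chosen for illustration, as are the function name and the observation list.

```python
import numpy as np

def jacobian_sparsity(n_points, n_cameras, observations, cam_params=6):
    """Boolean mask of the non-zero Jacobian entries.

    observations: list of (i, j) pairs, point i seen in camera j.
    Columns: 3 per point, then cam_params per camera (assumed layout).
    """
    n_cols = 3 * n_points + cam_params * n_cameras
    mask = np.zeros((2 * len(observations), n_cols), dtype=bool)
    for k, (i, j) in enumerate(observations):
        rows = slice(2 * k, 2 * k + 2)              # x and y residual rows
        mask[rows, 3 * i:3 * i + 3] = True          # depends on point i only
        start = 3 * n_points + cam_params * j
        mask[rows, start:start + cam_params] = True  # and on camera j only
    return mask

# Hypothetical visibility: 3 points, 3 cameras, 5 observations.
obs = [(0, 0), (0, 1), (1, 1), (2, 0), (2, 2)]
mask = jacobian_sparsity(3, 3, obs)
```

Each residual row touches only nine of the columns regardless of problem size, which is what sparse solvers exploit in large reconstructions.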

Uncertainty and Ambiguity
• Structure from motion involves the estimation of many highly coupled parameters, often with no known “ground truth” components.
• The estimates produced by structure from motion algorithms can therefore exhibit large amounts of uncertainty.
• Example: the bas-relief ambiguity, which makes it hard to simultaneously estimate the 3D depth of a scene and the amount of camera motion.

Reconstruction from Internet Photos
• A widely used application of structure from motion is the reconstruction of 3D objects and scenes from video sequences and collections of images.
• Before the structure from motion computation can begin, it is first necessary to establish sparse correspondences between different pairs of images and then to link such correspondences into feature tracks, which associate individual 2D image features with global 3D points.

Reconstruction from Internet Photos (cont.)
• For the reconstruction process, it is important to select a good initial pair of images, with a significant amount of out-of-plane parallax, to ensure that a stable reconstruction can be obtained.

Reconstruction from Internet Photos (cont.)
Figure 7.15: Incremental structure from motion: starting with an initial two-frame reconstruction of Trevi Fountain, batches of images are added using pose estimation, and their positions (along with the 3D model) are refined using bundle adjustment.

Reconstruction from Internet Photos (cont.)
Figure 7.16: 3D reconstructions produced by the incremental structure from motion algorithm: (a) cameras and point cloud from Trafalgar Square; (b) cameras and points overlaid on an image from the Great Wall of China; (c) overhead view of a reconstruction of the Old Town Square in Prague registered to an aerial photograph.

Constrained Structure and Motion
• If the object of interest is rotating around a fixed but unknown axis, specialized techniques can be used to recover this motion.
• In other situations, the camera itself may be moving in a fixed arc around some center of rotation.
• Specialized capture setups, such as mobile stereo camera rigs or moving vehicles equipped with multiple fixed cameras, can also take advantage of the knowledge that individual cameras are mostly fixed with respect to the capture rig.

Constrained Structure and Motion (cont.)
Line-based technique:
• Pairwise epipolar geometry cannot be recovered from line matches alone, even if the cameras are calibrated.
• Consider projecting the set of lines in each image into a set of 3D planes in space. You can move the two cameras around into any configuration and still obtain a valid reconstruction for the 3D lines.

Constrained Structure and Motion (cont.)
• When lines are visible in three or more views, the trifocal tensor can be used to transfer lines from one pair of images to another.
• The trifocal tensor can also be computed on the basis of line matches alone.
• For triples of images, the trifocal tensor is used to verify that the lines are in geometric correspondence before evaluating the correlations between line segments.

Constrained Structure and Motion (cont.)
Figure 7.18: Two images of a toy house along with their matched 3D line segments.

End Structure from Motion 61