- Number of slides: 28

Information Visualization, Nonlinear Dimensionality Reduction and Sampling for Large and Complex Data Sets
Misha Pesenson, Isaac Pesenson*, Bruce McCollum
California Institute of Technology, *Temple University
1/7/2010, 215th AAS Meeting, Washington DC

Acknowledgment
- We would like to thank Dr. Mike Egan for his support.
- This work was carried out at the SSC, Caltech, and supported by the National Geospatial-Intelligence Agency, Grant # HM1582-08-1-0019.

Motivation
- The Data Big Bang
- The Expanding Digital Universe
- Inflationary Epoch

Motivation (cont.)
- Data are now produced faster than they can be meaningfully analyzed
- Modern data are complex: dozens or hundreds of useful parameters are associated with each astronomical object
  - LSST: the ten-year survey will result in tens of petabytes of image and catalog data and will require ~250 TFlops of processing to reduce.
  - A discussion related to LSST can be found in: T. Loredo, G. Babu, K. Borne, E. Feigelson, A. Gray, 2010, "The Spectrum of LSST Data Analysis Challenges: Kiloscale to Petascale", 215th AAS Meeting.

Motivation (cont.)
- To capitalize on the opportunities provided by these data sets, one needs to be able to organize, analyze and visualize them
- Traditional methods are often inadequate, not merely because of the size in bytes of the data sets but also because of their complexity
- To be successful, approaches to such data sets must extend beyond traditional scientific analysis and information visualization

Motivation (cont.)
- Moreover, detecting the expected and discovering the unexpected in massive data sets requires a synergistic approach that utilizes recent advances in:
  - Statistics
  - Applied mathematics
  - Computer science
  - Artificial intelligence
  - Machine learning
  - Knowledge representation
  - Cognitive and perceptual sciences
  - Decision sciences, and more

Motivation (cont.)
- Valuable results pertaining to these problems are found mostly in publications outside astronomy
- There is a large gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other

Goals of This Presentation
- To draw the attention of the astronomical community to the aforementioned gap
- To help bridge this gap by briefly reviewing some of the advanced methods
- "To increase the general awareness and avoidance of unprincipled data analysis methods" (Xiao-Li Meng, 2009, "Desired and Feared - What Do We Do Now and Over the Next 50 Years?", The American Statistician, 63(3), 202-210)

Complex Data: Spectral Imaging
- 224 spectral channels

Astronomical Data Types and Approaches to their Representation and Processing
- Vector data
  - Some astronomical applications: 1. Multiwavelength observations. 2. Multitemporal observations. 3. VO. 4. Spectra.
  - Traditional approaches to data: 1. Linear dimension reduction: PCA and its modifications.
  - Advanced approaches to data: 1. Spectral methods, eigenmaps, diffusion maps, LLE, ISOMAP. 2. Sampling on graphs. 3. Methods based on nonlinear dynamics. 4. Neural networks. 5. Genetic algorithms. 6. Scientific visualization. 7. Compressed sensing.
- Manifold-valued and/or manifold-defined data
  - Some astronomical applications: 1. Polarization measurements (CMB). 2. Gravitational lensing. 3. Solar astrophysics.
  - Traditional approaches to data: 1. Various sampling distributions on a sphere.
  - Advanced approaches to data: 1. HEALPix (2D sphere). 2. Needlets. 3. Sampling on manifolds. 4. Scientific visualization.

Scientific Visualization vs. Illustrative Visualization
- Scientific Visualization (SV) does not simply reproduce visible things; it makes things visible
- SV enables the extraction of meaningful patterns from multiparametric data sets

The Curse of Dimensionality and Dimension Reduction (DR)
- Extraction and visualization of meaningful structures from multiparametric, high-dimensional data sets require an accurate low-dimensional representation of the data
- DR is motivated by the fact that the more we are able to reduce the dimensionality of a data set, the more regularities (correlations) we have found in it and, therefore, the more we have learned from the data
  - Pesenson M., Pesenson I., McCollum B., 2010, "The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch", Advances in Astronomy, special issue on Robotic Astronomy (accepted)

Dimension Reduction (cont.)
- Greatly increases computational efficiency of machine learning algorithms
- Improves statistical inference
- Enables effective scientific visualization and classification

Dimension Reduction: "Linear" Data, PCA
- If the data are mainly confined to an almost linear low-dimensional subspace, then simple linear methods such as principal component analysis (PCA) can be used to discover the subspace and estimate its dimensionality
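
As a minimal sketch of this statement (an illustration, not the authors' code), the NumPy example below generates synthetic points lying close to a 2D plane embedded in 10 dimensions, runs PCA through an SVD of the centered data, and estimates the dimensionality as the number of components needed to explain 99% of the variance. The synthetic data, the 99% threshold and the helper name `pca_dimension` are assumptions made for this example.

```python
import numpy as np

def pca_dimension(X, var_threshold=0.99):
    """PCA via SVD for X of shape (n_samples, n_features).

    Returns the principal axes (rows of Vt), the explained-variance
    ratios, and the smallest number of components whose cumulative
    ratio reaches var_threshold (an assumed, illustrative criterion).
    """
    Xc = X - X.mean(axis=0)                       # center each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)                 # variance along each axis
    ratio = var / var.sum()
    k = int(np.searchsorted(np.cumsum(ratio), var_threshold) + 1)
    return Vt, ratio, k

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))                # true 2D coordinates
Q, _ = np.linalg.qr(rng.normal(size=(10, 2)))     # orthonormal 2D basis in 10D
X = latent @ Q.T + 0.01 * rng.normal(size=(500, 10))  # plane + small noise

components, ratio, k = pca_dimension(X)
print("estimated dimensionality:", k)
print("variance explained by the first two components:", ratio[:2].sum())
```

Projecting the centered data onto the first `k` rows of `components` gives the low-dimensional coordinates. The estimate depends on the chosen variance threshold and on the noise level.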

Limitations of Linear Methods
- Linear methods such as PCA have a serious drawback: they do not explicitly consider the structure of the manifold on which the data may reside
- PCA is intrinsically linear, so if the data points form a nonlinear manifold, no rotation and shift of the axes (which is all a linear transform like PCA provides) can "unfold" such a manifold, like the one on the next slide

Data Lying on Manifolds
- Formally applying geometrically linear methods would produce a complete misrepresentation of the data

Data Lying on Manifolds + Noise (Balasubramanian, Schwartz 2002)
- The practical usage of dimension reduction demands:
  - Representation of measurement errors in high-dimensional instrument calibration (Connors A., van Dyk D., Freeman P., Kashyap V., Siemiginowska A., et al. 2008)
  - Careful improvement of signal-to-noise ratio without smearing essential features (Pesenson M., Roby W., McCollum B., 2008)

Handling Geometrically Nonlinear Data
- The modern approach to multidimensional images or data sets is to approximate them by graphs or Riemannian manifolds
- Next, after constructing a weighted graph, one can introduce the corresponding combinatorial Laplace operator
  - Belkin M., Niyogi P., 2005; Coifman R., Lafon S., 2006
  - Application to astronomy: Richards J., Freeman P., Lee A., & Schafer C., 2009
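
The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction used in the cited papers: the graph is a symmetrized k-nearest-neighbor graph, edge weights come from a Gaussian kernel, and the combinatorial Laplacian is L = D - W, where D is the diagonal matrix of weighted degrees. The function name `graph_laplacian` and the parameters `k` and `sigma` are hypothetical choices for this example.

```python
import numpy as np

def graph_laplacian(X, k=8, sigma=1.0):
    """Build a weighted kNN graph on the rows of X and its Laplacian.

    Returns (W, L): W is the symmetric weight matrix, L = D - W is the
    combinatorial Laplacian. k and sigma are illustrative parameters.
    """
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    # pairwise squared Euclidean distances, clipped against round-off
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)              # exclude self-neighbors
    nbrs = np.argsort(d2, axis=1)[:, :k]      # k nearest neighbors per point
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma**2))
    W = np.maximum(W, W.T)                    # keep an edge if either end chose it
    L = np.diag(W.sum(axis=1)) - W            # combinatorial Laplacian
    return W, L

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))             # synthetic data set
W, L = graph_laplacian(points, k=8, sigma=1.0)
print("number of edges:", int(np.count_nonzero(W) // 2))
```

By construction L is symmetric and positive semidefinite, and each of its rows sums to zero. For large data sets a sparse matrix and a spatial index would replace the dense distance matrix used here.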

Nonlinear Dimension Reduction as an Approach to Nonlinear Data
- The eigenfunctions of the Laplacian form a basis, thus allowing one to develop a harmonic, or Fourier, analysis on graphs
- This set of basis functions captures patterns intrinsic to a particular state space
- The result is a lower-dimensional representation of high-dimensional data that does not lose a significant amount of information
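
A small worked example may make the graph Fourier idea concrete. The sketch below is an illustration under assumed choices: a ring graph with unit weights, a synthetic signal, and a cut-off of three modes. It diagonalizes the combinatorial Laplacian, expands the signal on the graph in the eigenvector basis, and reconstructs it from the lowest modes only. The helper names `graph_fourier` and `truncated_reconstruction` are hypothetical.

```python
import numpy as np

n = 64
idx = np.arange(n)

# Ring graph: node i is joined to i-1 and i+1 with unit weight (assumed example).
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 1.0
W[(idx + 1) % n, idx] = 1.0
L = np.diag(W.sum(axis=1)) - W

# eigh returns ascending eigenvalues; the columns of U are an orthonormal basis.
eigvals, U = np.linalg.eigh(L)

def graph_fourier(f, U):
    """Coefficients of the signal f in the Laplacian eigenvector basis."""
    return U.T @ f

def truncated_reconstruction(f, U, m):
    """Rebuild f from its m lowest-frequency graph Fourier modes."""
    coeffs = graph_fourier(f, U)
    return U[:, :m] @ coeffs[:m]

# A smooth signal plus a small high-frequency component.
f = np.cos(2 * np.pi * idx / n) + 0.05 * np.cos(2 * np.pi * 10 * idx / n)
f_low = truncated_reconstruction(f, U, m=3)
retained = np.sum(f_low**2) / np.sum(f**2)
print("fraction of signal energy kept by 3 of", n, "modes:", retained)

# Low-dimensional coordinates from the first nontrivial eigenvectors,
# in the spirit of Laplacian eigenmaps.
embedding = U[:, 1:3]
```

For the ring graph the two coordinates in `embedding` place the nodes on a circle, which recovers the geometry of the graph from the Laplacian alone.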

Nonlinear Dimension Reduction and Harmonic Analysis on Manifolds and Graphs
- We have devised innovative algorithms for nonlinear data dimension reduction and data compression. These algorithms:
  - enable one to overcome PCA's limitations in handling nonlinear data manifolds
  - allow one to deal effectively with: 1) missing observations, 2) partial sky coverage, 3) non-regular sampling
- For details:
  - Pesenson I., 2009, J. of Geometric Analysis, 19(2), 390
  - Pesenson I., Pesenson M., 2010, J. of Math. Analysis and Applications (accepted)
  - Pesenson I., Pesenson M., 2010, J. of Fourier Analysis and Applications (accepted)
  - Pesenson M., Pesenson I., McCollum B., 2010, Advances in Astronomy (accepted)

Visualization - Multispectral
- From a set of images obtained at multiple wavebands, effective dimension reduction provides a single comprehensible, information-rich image with minimal loss of information and statistical detail, unlike simple co-adding with arbitrary, empirical weights
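
To illustrate the contrast with co-adding, the sketch below combines a synthetic stack of band images into one image using data-driven weights. It uses plain PCA as a simple linear stand-in; the nonlinear methods discussed in this presentation are not reproduced here. The synthetic cube, the band gains and the function name `first_component_image` are assumptions made for the example.

```python
import numpy as np

def first_component_image(cube):
    """Project a (n_bands, height, width) cube onto its first principal axis.

    Returns the combined image and the unit-norm band weights.
    """
    n_bands, h, w = cube.shape
    X = cube.reshape(n_bands, -1).T            # one row per pixel, one column per band
    Xc = X - X.mean(axis=0)                    # remove the mean of each band
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    weights = Vt[0]
    if weights.sum() < 0:                      # fix the arbitrary sign of the axis
        weights = -weights
    image = (Xc @ weights).reshape(h, w)
    return image, weights

rng = np.random.default_rng(0)
pattern = rng.normal(size=(16, 16))            # structure shared by all bands
gains = np.array([0.2, 1.0, 3.0, 0.5])         # assumed per-band signal strength
cube = gains[:, None, None] * pattern + 0.1 * rng.normal(size=(4, 16, 16))

image, weights = first_component_image(cube)
# Equal-weight co-add, scaled to unit-norm weights for a fair comparison.
coadd = cube.sum(axis=0) / np.sqrt(cube.shape[0])
print("band weights:", np.round(weights, 3))
print("variance, PCA image vs co-add:", image.var(), coadd.var())
```

Among all unit-norm linear combinations of the bands, the first principal component has the largest variance, so the weights come from the data rather than being set by hand.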

Manifold-Valued Data and Data Lying on Manifolds
- Applications:
  - Cosmic Microwave Background (CMB) (Gorski K., et al. 2005)
  - Solar astrophysics
- A powerful approach to the problem is based on needlets, second-generation spherical wavelets (Geller D., & Marinucci D., 2008)

Manifold-Valued Data and Data Lying on Manifolds (cont.)
- Important properties of needlets that are not shared by other spherical wavelet constructions:
  - they do not rely on any kind of tangent plane approximation
  - they have good localization properties in both pixel and harmonic space
  - needlet coefficients are asymptotically uncorrelated at any fixed angular distance, which makes their use in statistical procedures very promising
- References:
  - Pesenson I., 2006, Integral Geometry and Tomography, Contemporary Mathematics, 405, 135-148, American Mathematical Society
  - Geller D., Pesenson I., 2010, "Tight Frames and Besov Spaces on Compact Homogeneous Manifolds", J. of Geometric Analysis (accepted)

Unsupervised Manifold Learning and Information Visualization
- Manifold learning and visualization based on nonlinear dynamics
- One needs to distinguish between geometrically nonlinear data and nonlinear methods of analysis

Unsupervised Manifold Learning – A Nonlinear Approach
- Approximating a multidimensional image or a data set by a graph and associating a nonlinear dynamical system with each node enables us to unify three seemingly unrelated tasks:
  - image segmentation
  - unsupervised learning
  - data visualization

Testing the Algorithm: a Simulated 3D Set of 10^3 Uniformly Distributed Random Points with a Double-Diamond Pattern
- Left and middle: two screenshots from a running animation; each point in the set oscillates (in this case in 3 dimensions) with its own random frequency
- Right: synchronization made the points that are connected by high-weight edges oscillate in phase, which reveals the pattern either visually or by automatically selecting the in-phase points and highlighting the pattern in red
  - Pesenson M., Pesenson I., McCollum B., 2010, Advances in Astronomy (accepted)
  - Pesenson M., Pesenson I., 2010, "Image Segmentation, Unsupervised Manifold Learning and Information Visualization: A Unified Approach Based on Nonlinear Dynamics" (submitted)
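
The synchronization effect can be illustrated with a toy model. The sketch below uses the standard Kuramoto phase-oscillator model on a weighted graph as a stand-in; the slides do not say which dynamical system the authors attach to each node, so this choice is an assumption. Two groups of nodes are joined by high-weight edges inside each group and have no edges between groups. The group sizes, frequencies, weights and the names `simulate_kuramoto` and `in_phase_mask` are all illustrative.

```python
import numpy as np

def simulate_kuramoto(W, omega, theta0, dt=0.01, steps=4000):
    """Euler integration of d(theta_i)/dt = omega_i + sum_j W_ij sin(theta_j - theta_i).

    Returns the phase history with shape (steps + 1, n).
    """
    theta = theta0.copy()
    history = np.empty((steps + 1, theta.size))
    history[0] = theta
    for t in range(steps):
        diff = theta[None, :] - theta[:, None]     # diff[i, j] = theta_j - theta_i
        theta = theta + dt * (omega + np.sum(W * np.sin(diff), axis=1))
        history[t + 1] = theta
    return history

def in_phase_mask(history, ref, threshold=0.9):
    """Select nodes whose time-averaged cos(theta_i - theta_ref) exceeds threshold."""
    coherence = np.mean(np.cos(history - history[:, [ref]]), axis=0)
    return coherence > threshold

rng = np.random.default_rng(1)
n, half = 20, 10
labels = np.repeat([0, 1], half)                   # hidden pattern: two groups
W = (labels[:, None] == labels[None, :]).astype(float)
np.fill_diagonal(W, 0.0)                           # strong edges inside a group only
omega = np.where(labels == 0, 1.0, 2.0) + rng.uniform(-0.2, 0.2, size=n)
theta0 = rng.uniform(0.0, 2.0 * np.pi, size=n)     # random initial phases

history = simulate_kuramoto(W, omega, theta0)
pattern = in_phase_mask(history[2000:], ref=0)     # skip the initial transient
print("nodes oscillating in phase with node 0:", np.flatnonzero(pattern))
```

Nodes joined by strong edges lock to a common phase, while the two groups drift relative to each other. Averaging the phase coherence with a reference node over time then picks out that node's group automatically.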

Conclusions
- Many important challenges have been identified by various authors and presentations
- Different groups have already been working on some of these problems:
  - The Center for Astrostatistics at PSU (E. Feigelson, G. Babu)
  - BIPS at Cornell (T. Loredo)
  - InCA at CMU (C. Schafer et al.)
  - SAMSI-SaFeDe Collaboration (V. Kashyap et al.)
  - Caltech (M. Pesenson et al.)
  - Caltech (G. Djorgovski et al.)
  - AstroNeural collaboration (G. Longo et al.)
  - Georgia Tech (A. Gray et al.)
  - GMU (K. Borne et al.)
  - IIC at Harvard (A. Goodman et al.)

Conclusions (cont.)
- The concepts and approaches described in this presentation also contribute to the practical steps of creating the novel approaches and algorithms that are needed
- All the described efforts, when combined, will enable effective automated analysis and processing of giant, complex data sets such as those from LSST


