- Number of slides: 26
Online Science - The World-Wide Telescope Archetype
Jim Gray, Microsoft Research
Collaborating with:
Alex Szalay, Ani Thakar, … @ JHU
Roy Williams, George Djorgovski, Julian Bunn @ Caltech
Robert Brunner @ U.I.
Outline
• The revolution in Computational Science
• The Virtual Observatory Concept == World-Wide Telescope
Computational Science: Third Science Branch is Evolving
• In the beginning science was empirical.
• Then theoretical branches evolved.
• Now, we have computational branches.
  – Was primarily simulation
  – Growth areas: data analysis & visualization of peta-scale instrument data.
• Analysis & visualization tools:
  – Help both simulation and instruments.
  – Are primitive today.
Computational Science
• Traditional Empirical Science
  – Scientist gathers data by direct observation
  – Scientist analyzes data
• Computational Science
  – Data captured by instruments, or data generated by simulator
  – Processed by software
  – Placed in a database / files
  – Scientist analyzes database / files
What Do Scientists Do With The Data? They Explore Parameter Space
• There is LOTS of data
  – People cannot examine most of it.
  – Need computers to do analysis.
• Manual or Automatic Exploration
  – Manual: person suggests hypothesis, computer checks hypothesis
  – Automatic: computer suggests hypothesis, person evaluates significance
• Given an arbitrary parameter space:
  – Data Clusters
  – Points between Data Clusters
  – Isolated Data Groups
  – Holes in Data Clusters
  – Isolated Points / clusters similar to “this one”
Nichol et al. 2001
Slide courtesy of and adapted from Robert Brunner @ Caltech.
Challenge to Data Miners: Rediscover Astronomy
• Astronomy needs deep understanding of physics.
• But, some was discovered as variable correlations, then “explained” with physics.
• Famous example: Hertzsprung-Russell Diagram, star luminosity vs color (= temperature)
• Challenge 1 (the student test): How much of astronomy can data mining discover?
• Challenge 2 (the Turing test): Can data mining discover NEW correlations?
What’s needed? (not drawn to scale)
[Diagram labels: Scientists; Miners: Data Mining Algorithms; Plumbers: Database to store data and execute queries; Question & Answer and Visualization Tools]
Some science is hitting a wall: FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• You can FTP 1 MB in 1 sec
• You can FTP 1 GB / min (= 1 $/GB)
• … 2 days and 1 K$
• … 3 years and 1 M$
• Oh!, and 1 PB ~3,000 disks
• At some point you need indices to limit search, and parallel data search and analysis
• This is where databases can help
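The scan times on this slide can be sanity-checked with a back-of-envelope sketch. The sustained rate of about 10 MB/s is an assumption chosen to land near the slide's figures; the slide itself states no rate, and `scan_seconds` and `humanize` are hypothetical helper names.

```python
# Back-of-envelope check of sequential-scan (GREP-style) times.
# ASSUMPTION: a sustained scan rate of ~10 MB/s; the slide does not state a rate.
ASSUMED_BYTES_PER_SECOND = 10 * 10**6

SIZES = {"1 MB": 10**6, "1 GB": 10**9, "1 TB": 10**12, "1 PB": 10**15}


def scan_seconds(size_bytes, rate=ASSUMED_BYTES_PER_SECOND):
    """Time in seconds to read size_bytes sequentially at the given rate."""
    return size_bytes / rate


def humanize(seconds):
    """Render a duration in the largest convenient unit."""
    for unit, length in (("years", 365 * 86400), ("days", 86400),
                         ("minutes", 60)):
        if seconds >= length:
            return f"{seconds / length:.1f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for label, size in SIZES.items():
        print(f"{label}: {humanize(scan_seconds(size))}")
```

At this assumed rate a petabyte scan takes on the order of three years, the same order of magnitude as the slide's figure. An index avoids the full scan, and parallel search divides the time across disks.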
The Digital Shoebox
Personal
• In the old days people took photos, had them developed, and put them in a shoe box
• Some people actually put them in picture albums.
• But mostly, pictures are never seen again and it is hard to find anything.
Science
• In the old days scientists kept notebooks.
• Now they keep ftp servers
• Some put them in indexed databases
• But mostly, data are never seen again and it is hard to find anything.
How do we find data subsets in the shoebox?
Goal: Easy Data Publication & Access
• Augment FTP with data query: return intelligent data subsets
• Make it easy to
  – Publish: Record structured data
  – Find:
    • Find data anywhere in the network
    • Get the subset you need
  – Explore datasets interactively
• Realistic goal:
  – Make it as easy as publishing/reading web sites today.
Web Services: The Key?
• Web SERVER:
  – Given a url + parameters
  – Returns a web page (often dynamic)
• Web SERVICE:
  – Given a url + XML document (SOAP msg)
  – Returns an XML document
  – Tools make this look like an RPC.
    • F(x, y, z) returns (u, v, w)
  – Distributed objects for the web.
  – + naming, discovery, security, ...
• Internet-scale distributed computing
[Diagram: your program sends http to a Web Server and gets back a web page; your program sends soap to a Web Service and gets back an object in xml, as data in your address space]
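The "F(x, y, z) returns (u, v, w)" idea can be sketched with the Python standard library alone. This is only an illustration of the message shape, not a real client: no network call is made, the service namespace `urn:example:hypothetical-service` and the names `build_request` and `parse_response` are made up, and the reply is a canned string standing in for what a service might return.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:hypothetical-service"        # hypothetical, for illustration


def build_request(method, **params):
    """Wrap a call such as F(x, y, z) as a SOAP envelope (the XML sent to the url)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Unpack the returned XML document into a plain dict of result values."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    result = body[0]  # the first child of Body is the response element
    return {child.tag.split("}")[-1]: child.text for child in result}


if __name__ == "__main__":
    print(build_request("F", x=1, y=2, z=3))
    # Canned reply standing in for the service's answer (made-up values).
    reply = (
        f'<s:Envelope xmlns:s="{SOAP_NS}" xmlns:m="{SERVICE_NS}">'
        "<s:Body><m:FResponse><m:u>4</m:u><m:v>5</m:v><m:w>6</m:w>"
        "</m:FResponse></s:Body></s:Envelope>"
    )
    print(parse_response(reply))
```

Real toolkits generate this wrapping and unwrapping from a service description, which is what makes the exchange look like an ordinary procedure call.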
Grid and Web Services Synergy
• I believe the Grid will be many web services
• IETF standards provide
  – Naming
  – Authorization / Security / Privacy
  – Distributed Objects: Discovery, Definition, Invocation, Object Model
  – Higher level services: workflow, transactions, DB, ...
• Synergy: commercial Internet & Grid tools
Outline
• The revolution in Computational Science
• The Virtual Observatory Concept == World-Wide Telescope
Data Federations of Web Services
• Massive datasets live near their owners:
  – Near the instrument’s software pipeline
  – Near the applications
  – Near data knowledge and curation
  – Super Computer centers become Super Data Centers
• Each Archive publishes a web service
  – Schema: documents the data
  – Methods on objects (queries)
• Scientists get “personalized” extracts
• Federation: Uniform access to multiple Archives
  – A common global schema
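One way to picture the federation idea is a toy in-memory model: each archive exposes a schema and a query method, and a common global schema maps onto each archive's local column names. Everything here is a hypothetical sketch (the `Archive` class, `federated_query`, the column names, and the sample rows are invented for illustration), not the design of any actual archive service.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """One archive's service: a schema documenting the data, plus a query method."""
    name: str
    column_map: dict  # global column name -> this archive's local column name
    rows: list = field(default_factory=list)

    def schema(self):
        """List the published columns, named in terms of the global schema."""
        return sorted(self.column_map)

    def query(self, columns, predicate=lambda row: True):
        """Return a 'personalized' extract: requested columns of matching rows."""
        out = []
        for local_row in self.rows:
            row = {g: local_row[l] for g, l in self.column_map.items()}
            if predicate(row):
                out.append({c: row[c] for c in columns})
        return out


def federated_query(archives, columns, predicate=lambda row: True):
    """Uniform access: run one query on every archive that has the columns."""
    results = []
    for archive in archives:
        if set(columns) <= set(archive.schema()):
            for row in archive.query(columns, predicate):
                results.append({"archive": archive.name, **row})
    return results


if __name__ == "__main__":
    # Made-up archives and rows, for illustration only.
    a = Archive("A", {"ra": "RA_deg", "dec": "DEC_deg", "mag": "r"},
                [{"RA_deg": 10.0, "DEC_deg": -1.0, "r": 17.2},
                 {"RA_deg": 11.0, "DEC_deg": 2.0, "r": 21.5}])
    b = Archive("B", {"ra": "alpha", "dec": "delta"},
                [{"alpha": 10.0, "delta": -1.0}])
    print(federated_query([a, b], ["ra", "dec"], lambda r: r["dec"] < 0))
```

The data stays with each archive; only the small extract travels to the scientist.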
Why Astronomy Data?
• It has no commercial value
  – No privacy concerns
  – Can freely share results with others
  – Great for experimenting with algorithms
• It is real and well documented
  – High-dimensional data (with confidence intervals)
  – Spatial data
  – Temporal data
• Many different instruments from many different places and many different times
• Federation is a goal
• The questions are interesting
  – How did the universe form?
• There is a lot of it (petabytes)
[Images: the sky in several bands: ROSAT ~keV, DSS Optical, 2MASS 2 µm, IRAS 25 µm, IRAS 100 µm, GB 6 cm, NVSS 20 cm, WENSS 92 cm]
Astronomy Data Growth
• In the “old days” astronomers took photos.
• Now instruments are digital (100s of GB/night)
• Detectors are following Moore’s law.
• Data avalanche: double every 2 years
• All data more than 2 years old is public
• About 1 PB public now
[Figure: total area of world’s 3 m+ telescopes (m²) and total number of CCD pixels (megapixels) over time. Courtesy of Alex Szalay]
Growth over 25 years is a factor of 30 in glass, a factor of 3000 in pixels.
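The two growth factors quoted on this slide imply doubling times that can be checked directly. The sketch below assumes smooth exponential growth over the 25 years, which the slide does not claim; `doubling_time_years` is a hypothetical helper name.

```python
import math


def doubling_time_years(growth_factor, span_years):
    """Doubling time implied by a total growth factor over a span of years.

    ASSUMPTION: steady exponential growth over the whole span.
    """
    doublings = math.log(growth_factor) / math.log(2)
    return span_years / doublings


if __name__ == "__main__":
    print(f"pixels: {doubling_time_years(3000, 25):.1f} years per doubling")
    print(f"glass:  {doubling_time_years(30, 25):.1f} years per doubling")
```

Under that assumption, pixels double roughly every 2.2 years, in line with the "double every 2 years" bullet, while telescope area doubles only about every 5 years.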
Time and Spectral Dimensions: The Multiwavelength Crab Nebula
[Image label: Crab star, 1054 AD]
X-ray, optical, infrared, and radio views of the Crab Nebula, which is now chaotically expanding after a supernova sighted in 1054 A.D. by Chinese astronomers.
Szalay’s variant of Metcalfe’s Law: The utility of N different data sets is approximately N²/2.
Each pairwise comparison gives additional information.
The Federation value is superlinear in size.
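The N²/2 figure can be read as a count of pairwise comparisons. A short derivation, assuming (as the slide suggests) that each distinct pair of data sets contributes one unit of additional information:

```latex
% Number of distinct pairs among N data sets.
% ASSUMPTION: utility is proportional to the number of pairwise comparisons.
\[
  \binom{N}{2} \;=\; \frac{N(N-1)}{2} \;\approx\; \frac{N^{2}}{2}
  \qquad \text{for large } N .
\]
```

For example, N = 10 data sets give 45 pairs, against the approximation 10²/2 = 50.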
The Age of Mega-Surveys
• Large number of new surveys
  – multi-TB in size, 100 million objects or more
  – Data publication an integral part of the survey
  – Software bill a major cost in the survey
• These mega-surveys are different
  – top-down design
  – large sky coverage
  – sound statistical plans
  – well controlled/documented data processing
• Each survey has a publication plan
• Federating these archives: Virtual Observatory
Surveys: MACHO, 2MASS, DENIS, SDSS, PRIME, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, LSST, ...
Slide courtesy of Alex Szalay, modified by Jim
Data Publishing and Access
• But ...
• How do I get at that petabyte of public data?
• Astronomers have a culture of publishing.
  – FITS files and many tools. http://fits.gsfc.nasa.gov/fits_home.html
  – Encouraged by NASA.
  – FTP what you need.
• But, data “details” are hard to document. Astronomers want to do it, but it is VERY difficult. (What programs were used? What were the processing steps? How were errors treated? …)
• And by the way, few astronomers have a spare petabyte of storage in their pocket (today).
• THESIS: The challenging problems are publishing data and providing good query & visualization tools
Virtual Observatory
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/
• Premise: Most data is (or could be) online
• So, the Internet is the world’s best telescope:
  – It has data on every part of the sky
  – In every measured spectral band: optical, x-ray, radio, ...
  – As deep as the best instruments (2 years ago).
  – It is up when you are up.
  – The “seeing” is always great (no working at night, no clouds, no moons, no ...).
  – It’s a smart telescope: links objects and data to literature on them.
Sky Server
• Alex Szalay of Johns Hopkins built SkyServer (based on TerraServer design) http://skyserver.sdss.org/
• Data access & Astronomy education
• ~7 M web hits, usage growing 15%/month
• Moving to V4 DB & Schema (1.5 TB DB + 5 TB image by 7/1/2003)
• Recent CS efforts have been
  – automated data pipeline (workflow engine) and
  – web services integration with VO
• Template widely used and cloned in the Astronomy and Computer Science communities
• Prototype for publishing an Astronomy archive on the web.
300 M Photo Objects, ~400 attributes
1 M Spectra with ~30 lines/spectrum
Virtual Observatory Status
• Lots of meetings (too many)
• VO table defined (a successor to FITS?)
  – Tool suite emerging
• Defining Astronomy Objects and Methods.
• Federated 5 Web Services (fermilab/sdss, jhu/first, Caltech/dposs, Cambridge/nt)
  – http://skyquery.net/ multi-survey cross-ID match and select; distributed query optimization
  – http://SkyService.jhu.pha.edu/SdssCutout Image access service (cutout + annotated)
• WWT is a great Web Services (.Net) application
  – Federating heterogeneous data sources.
  – Cooperating organizations
  – An Information At Your Fingertips challenge.
SkyQuery Web Services
http://skyquery.net/
Basic Services
• Metadata about resources
  – Waveband
  – Sky coverage
  – Translation of names to universal dictionary (UCD)
• Simple search resources
  – Cone Search
  – Image mosaic
  – Unit conversions
• Filtering, counting, histograms
• On-the-fly recalibrations
Higher Level Services
• Built on Atomic Services
• Perform more complex tasks
• Examples
  – Automated resource discovery
  – Cross-identifications
  – Photometric redshifts
  – Outlier detections
  – Visualization facilities
• Goal:
  – Build custom portals in days from existing building blocks (like today in IRAF or IDL)
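A cone search, one of the basic services listed here, returns the objects within a given angular radius of a sky position. A minimal in-memory sketch, assuming a catalog held as a list of dicts with `ra` and `dec` in degrees (the function names, the catalog layout, and the sample rows are illustrative and are not the service's interface):

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable at small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the catalog objects within radius_deg of the position (ra, dec)."""
    return [obj for obj in catalog
            if angular_separation_deg(ra, dec, obj["ra"], obj["dec"]) <= radius_deg]


if __name__ == "__main__":
    # Made-up catalog rows, for illustration only.
    catalog = [
        {"id": 1, "ra": 181.30, "dec": -0.76},
        {"id": 2, "ra": 181.35, "dec": -0.70},
        {"id": 3, "ra": 190.00, "dec": 5.00},
    ]
    print(cone_search(catalog, ra=181.3, dec=-0.76, radius_deg=0.1))
```

A production service would use a spatial index rather than this linear scan, for the reasons given on the GREP slide.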
SkyQuery Cross-id Steps
http://skyquery.net/
• Parse query
• Get counts
• Sort by counts
• Make plan
• Cross-match
  – Recursively, from small to large
• Select necessary attributes only
• Return output
• Insert cutout image
Example query:
    SELECT o.objId, o.r, o.type, t.objId
    FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
    WHERE XMATCH(o, t) < 3.5
      AND AREA(181.3, -0.76, 6.5)
      AND (o.i - t.m_j) > 2
      AND o.type = 3
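The "get counts, sort by counts, cross-match from small to large" steps can be sketched on in-memory lists. This is a simplified stand-in, not the SkyQuery implementation: it uses a plain angular tolerance in arcseconds instead of the XMATCH function, iterates rather than recurses, ignores RA wrap-around at 0/360 degrees, and every name and data row below is hypothetical.

```python
import math


def separation_arcsec(a, b):
    """Small-angle separation of two objects with ra/dec in degrees, in arcsec."""
    mean_dec = math.radians((a["dec"] + b["dec"]) / 2)
    dra = (a["ra"] - b["ra"]) * math.cos(mean_dec)
    ddec = a["dec"] - b["dec"]
    return math.hypot(dra, ddec) * 3600.0


def make_plan(archives):
    """Get counts and sort by counts: smallest archive first."""
    return sorted(archives, key=lambda name: len(archives[name]))


def cross_match(archives, tolerance_arcsec):
    """Extend partial matches one archive at a time, from small to large."""
    plan = make_plan(archives)
    # Seed: every object in the smallest archive starts a candidate tuple.
    matches = [{plan[0]: obj} for obj in archives[plan[0]]]
    for name in plan[1:]:
        extended = []
        for partial in matches:
            for obj in archives[name]:
                # Keep the object only if it is close to every member so far.
                if all(separation_arcsec(obj, prev) <= tolerance_arcsec
                       for prev in partial.values()):
                    extended.append({**partial, name: obj})
        matches = extended
    return plan, matches


if __name__ == "__main__":
    # Made-up objects, for illustration only.
    archives = {
        "SDSS": [{"id": "o1", "ra": 181.3002, "dec": -0.7601},
                 {"id": "o2", "ra": 181.4000, "dec": -0.7600},
                 {"id": "o3", "ra": 181.2000, "dec": -0.7600}],
        "TWOMASS": [{"id": "t1", "ra": 181.3000, "dec": -0.7600}],
    }
    plan, matches = cross_match(archives, tolerance_arcsec=2.0)
    print(plan)
    print(matches)
```

Starting from the smallest archive keeps the set of candidate tuples small before it is compared against the larger archives, which is the point of sorting by counts.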
Summary
• The revolution in Computational Science: simulation & analysis
• The Virtual Observatory Concept == World-Wide Telescope
• I finally found a distributed database
• I have found a distributed system and a distributed object system.
References: NVO (Virtual Observatory), WWT (World-Wide Telescope)
• NVO Science Definition (an NSF report) http://www.nvosdt.org/
• VO Forum website http://www.voforum.org/
• World-Wide Telescope paper in Science V. 293, pp. 2037-2038, 14 Sept 2001 (MS-TR-2001-77, word or pdf)