ca69f60675b3ebb0a1ac33e1dbced844.ppt
- Number of slides: 43
Brazilian Academy of Sciences, Annual Meeting, May 2012. Databases and Global Environmental Change: Information Technology for Sustainable Development. Gilberto Câmara, INPE (Instituto Nacional de Pesquisas Espaciais)
The fundamental question of our time: How is the Earth’s environment changing, and what are the consequences for human civilization? (source: IGBP)
Global Change: Where are changes taking place? How much change is happening? Who is being impacted by the change?
Limits for Models. Figure: uncertainty on basic equations plotted against complexity of the phenomenon. Domains shown: Social and Economic Systems, Quantum Gravity, Particle Physics, Living Systems, Chemical Reactions, Hydrological Models, Solar System Dynamics, Global Change, Meteorology. (source: John Barrow, after David Ruelle)
Limits for Models. Figure: uncertainty on basic equations plotted against complexity of the phenomenon, with an e-science label added. Domains shown: Social and Economic Systems, Quantum Gravity, Particle Physics, Living Systems, Chemical Reactions, Hydrological Models, Solar System Dynamics, Global Change, Meteorology. (source: John Barrow, after David Ruelle)
Collaborative e-science: connect expertise from different fields; make the different conceptions explicit. Territory (Geography), Money (Economy), Modelling (IT), Culture (Anthropology)
Deforestation in Amazonia. Map legend: up to 10%, 10-20%, 20-30%, 30-40%, 40-50%, 50-60%, 60-70%, 70-80%, 80-90%, 90-100%. Amazonia (4. 000 km² = size of Europe)
Data (we need a lot of it). Deforestation in Brazilian Amazonia (1988-2011) dropped from 27,000 km² to 6,200 km²
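As a quick check on the scale of this drop, the relative reduction can be computed from the two figures on the slide. This is only arithmetic on the slide's numbers; the variable and function names are illustrative.

```python
# Annual deforestation figures quoted on the slide (km^2).
peak_km2 = 27_000
latest_km2 = 6_200


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is lower than `before`."""
    return (before - after) / before


reduction = relative_reduction(peak_km2, latest_km2)
print(f"Reduction: {reduction:.0%}")
```

With these two values the reduction comes out at roughly 77%.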
Real-time Deforestation Monitoring: daily warnings of newly deforested large areas
How much does it take to survey Amazonia? 30 TB of data; 500,000 lines of code; 150 man-years of software development; 200 man-years of interpreters. Figure labels: 116-112, 116-113, 166-112
TerraAmazon – open source software for large-scale land change monitoring. Spatial database (PostgreSQL with vectors and images). 2004-2008: 5 million polygons, 500 GB images. Figure labels: 116-112, 116-113, 166-112
Welcome to the Age of Data-intensive Science! Figure (vantage points and capabilities), labels in order: Permanent; FarSpace; L1/HEO/GEO; TDRSS & Commercial Satellites; LEO/MEO; Commercial Satellites and Manned Spacecraft; NearSpace; Aircraft/Balloon Event Tracking and Campaigns; Deployable; Airborne; Terrestrial; Forecasts & Predictions; User Community
Weather and climate (source: WMO): 11,000 land stations (3,000 automated); 900 radiosondes; 3,000 aircraft; 6,000 ships; 1,300 buoys; 5 polar and 6 geostationary satellites
ARGOS Data Collection System (16,000 platforms): 650,000 messages processed daily
Argo buoy network
Data chain in Earth System Science (source: NASA)
Data-intensive Science = principles and applications of information technology for handling very large data sets
Conjectures: IT concepts are essential to global change researchers (but most of them don’t know it). Global change challenges will motivate new research in IT (but most of us are not looking there).
Challenges for data-intensive science: Which data is out there? How to organize big data? How to get the data I need? How to model big data? How to access and use big data?
Stage 1 – A scientist’s personal database. Diagram: User interface, Database creation, Database access, Local database, Analysis
Stage 1 – A scientist’s personal database. The good: data is close to you (or so you think). The bad: no long-term data preservation; no data sharing. Diagram: User interface, Database creation, Database access, Local database, Analysis
Stage 2 – A scientific lab database. Diagram: User interface, Database access, Database creation, Corporate database, Analysis
Stage 2 – A scientific lab database. The good: long-term data preservation; data sharing inside the lab; reusable corporate software. The bad: substantial costs on data admin; little outside data sharing. Diagram: User interface, Database access, Database creation, Corporate database, Analysis
Metview (source: ECMWF, MOPTC, June 2004)
Field plotting in Metview (source: ECMWF, MOPTC, June 2004)
Stage 3 – A scientific lab database in the cloud. Diagram: User interface, Database access, Database creation, Corporate database, Analysis
Stage 3 – A scientific lab database in the cloud. The good: long-term data preservation; shared costs on data admin. The bad: rewrite software for cloud processing; outside data sharing still not solved. Diagram: User interface, Database access, Database creation, Corporate database, Analysis
Risk Analysis
On-line data feed
DCP: rain total; fixed time and irregular (alert); point data; one file per DCP
Satellite/Radar: 4 km grid; total rain 1 h, total rain 24 h, current (mm/h); binary file
Models: ETA 40, 20, 5 km; Ensemble 40 km; total rain 72 h; 72 files; ASCII grid file
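One way to picture what a feed like this supports: the sketch below derives 1 h and 24 h rain totals from an hourly series for a single data collection platform and flags an alert when a threshold is reached. The threshold values, the function names `rain_totals` and `rain_alert`, and the sample readings are all assumptions made for illustration; none of them come from the presentation or from any operational system.

```python
from typing import Sequence, Tuple

# Hypothetical alert thresholds in millimetres (assumed for illustration only).
THRESHOLD_1H_MM = 30.0
THRESHOLD_24H_MM = 80.0


def rain_totals(hourly_mm: Sequence[float]) -> Tuple[float, float]:
    """Return (last 1 h total, last 24 h total) from hourly rain amounts."""
    if not hourly_mm:
        return 0.0, 0.0
    last_1h = hourly_mm[-1]
    last_24h = sum(hourly_mm[-24:])
    return last_1h, last_24h


def rain_alert(hourly_mm: Sequence[float]) -> bool:
    """True when either the 1 h or the 24 h total reaches its threshold."""
    total_1h, total_24h = rain_totals(hourly_mm)
    return total_1h >= THRESHOLD_1H_MM or total_24h >= THRESHOLD_24H_MM


# Made-up readings for one platform: 20 dry hours, then a heavy burst.
readings = [0.0] * 20 + [5.0, 12.0, 25.0, 40.0]
print(rain_totals(readings))
print(rain_alert(readings))
```

A gridded source would apply the same accumulation per cell rather than per platform.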
TerraMA2 - Natural Disasters Monitoring and Alert System
Stage 4 – Multidatabase access. Diagram: Modelling, Data discovery, Data access, and two Data sources, each with Remote Analysis
Stage 4 – Multidatabase access. The good: long-term data preservation; shared costs on data admin; access to large external databases. The bad: rewrite software for cloud processing; finding data is a major problem. Diagram: Modelling, Data discovery, Data access, Data source, Remote Analysis
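Since finding data is named as the major problem at this stage, a minimal sketch of metadata-based discovery across several remote sources may help. Everything here is hypothetical: the `DatasetRecord` fields, the source names, and the catalog entries are invented for illustration and do not describe any real archive or service.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetRecord:
    source: str      # hypothetical remote data source
    theme: str       # subject of the data set
    start_year: int  # first year covered
    end_year: int    # last year covered


# Invented catalog entries standing in for metadata harvested from remote sources.
CATALOG = [
    DatasetRecord("archive-a", "deforestation", 1988, 2011),
    DatasetRecord("archive-b", "rainfall", 2000, 2012),
    DatasetRecord("archive-c", "deforestation", 2004, 2008),
]


def discover(catalog: List[DatasetRecord], theme: str, year: int) -> List[DatasetRecord]:
    """Return the records matching a theme whose time span covers the given year."""
    return [
        record
        for record in catalog
        if record.theme == theme and record.start_year <= year <= record.end_year
    ]


hits = discover(CATALOG, "deforestation", 2006)
print([record.source for record in hits])
```

The point of the sketch is that discovery works on shared metadata only; the data itself stays at the remote source until an analysis is requested.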
Data Access Hitting a Wall: current science practice is based on data download. How do you download a petabyte?
Data Access Hitting a Wall: current science practice is based on data download. How do you download a petabyte? You don’t! Move the software to the archive.
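To see why download stops scaling, a back-of-the-envelope estimate helps. The sketch below assumes a decimal petabyte (10^15 bytes) and a sustained link speed with no protocol overhead; the link speeds are assumptions chosen for illustration, not figures from the talk.

```python
PETABYTE_BYTES = 10**15   # decimal petabyte (assumption)
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: int, link_bits_per_second: float) -> float:
    """Days needed to move `size_bytes` over a link at a sustained bit rate."""
    return size_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY


# Illustrative link speeds (assumed, not from the presentation).
for label, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    print(f"{label}: {transfer_days(PETABYTE_BYTES, bps):,.1f} days")
```

Even at a sustained 1 Gbit/s the transfer takes about three months, which is the practical argument for running the analysis where the archive lives.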
Scientific Data Management in the Coming Decade (Jim Gray, 2005): Next-generation science instruments and simulations will produce peta-scale datasets. Such peta-scale datasets will be housed by science centers that provide substantial storage and processing for scientists who access the data via smart notebooks. The procedural stream-of-bytes-file-centric approach to data analysis is both too cumbersome and too serial for such large datasets. Database systems will be judged by their support of common metadata standards and by their ability to manage and access peta-scale datasets.
Virtual Observatory: if data is online, the internet is the world’s best telescope. (Scientific Data Management in the Coming Decade, Jim Gray)
Where are scientific databases going?
From tables to arrays
Tables: relation (table) with columns such as name, CPF, job title; relational algebra (selection, projection, join); SQL language: SELECT * FROM images WHERE date="today"
Arrays: scientific data; spatial queries, math operations; array algebra; AQL language: SELECT Mean(A.B) FROM Array
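The contrast can be sketched with the Python standard library alone: a relational selection over a table via `sqlite3`, and a mean over one attribute of a two-dimensional array of cells. The table contents, the attribute name `b`, and the helper `array_mean` are made up for illustration. The array part only mimics the spirit of the AQL-style query on the slide; it is not the API of any array database.

```python
import sqlite3
from statistics import mean

# Relational side: a selection over a table, expressed in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [("scene-1", "2012-05-01"), ("scene-2", "2012-05-02")],  # invented rows
)
rows = conn.execute(
    "SELECT name FROM images WHERE date = ?", ("2012-05-02",)
).fetchall()
conn.close()

# Array side: a 2-D array whose cells carry named attributes.
array = [
    [{"b": 1.0}, {"b": 2.0}],
    [{"b": 3.0}, {"b": 6.0}],
]


def array_mean(cells, attribute):
    """Mean of one attribute over every cell of a 2-D array (hypothetical helper)."""
    return mean(cell[attribute] for row in cells for cell in row)


print(rows)
print(array_mean(array, "b"))
```

In the table, position carries no meaning and rows are picked by predicate; in the array, the cell's position is part of the data, which is what makes spatial and mathematical operations natural.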
Communicating concepts is hard: vulnerability? climate change? poverty? (Image source: WMO)
Communicating concepts is hard. We’re bad at representing meaning: deforestation? degradation? disturbance?
Communicating change is very hard: when did the Aral Sea reach the tipping point?
Describing events and processes is very hard: when did the flood occur?
Conclusions: Earth System Science data management poses a major challenge for the database community. We need new techniques and architectures to handle scientific data.