
- Number of slides: 92

Spatial Data Analysis: Course Outline Gilberto Câmara National Institute for Space Research, Brazil Fall School 2005

INPE: brief description
- National Institute for Space Research
  - main civilian organization for space activities in Brazil
  - staff of 1,800 (800 M.Sc. and Ph.D.)
- Areas
  - Space Science, Earth Observation, Meteorology and Space Engineering

CBERS-2 Launch (21 October 2003)

CBERS-2 CCD, Minas Gerais, Brazil

- CBERS-2 image from Louisiana, USA
- Obtained from on-board data recorder

Amazon Deforestation 2003. Map legend: deforestation 2002/2003; deforestation until 2002. Source: INPE PRODES Digital, 2004.

Amazônia in 2005 source: Greenpeace

Amazônia in 2015? Source: Aguiar et al., 2004

R&D in GIScience at INPE
- Graduate programs in Computer Science and Remote Sensing
- Research areas
  - Spatial statistics
  - Spatial dynamical modelling
  - Spatio-temporal databases
  - Image databases and image processing
- Technology
  - TerraLib: open source library for spatio-temporal (ST) DBMS

Course outline
- Motivation: why do we need spatial data analysis?
- Point pattern analysis
- Areal data analysis
- Surface data analysis (geostatistics)
- Trends in spatial data analysis

Course outline: 1st week
- Monday: Introduction, 10:30–12:00 (2)
- Tuesday: Basic concepts, 10:30–12:00 (2)
- Wednesday: Areal analysis I (LAB work), 9:00–10:30 and 11:00–12:30 (4)
- Thursday: Areal analysis II, 10:30–12:00 (2)
- Friday: Areal analysis III (LAB work), 9:00–10:30 and 11:00–12:30 (4)
- Saturday: QUIZ, 14:00–17:00 (LAB)

Course outline: 2nd week
- Monday: Surface analysis I, 10:30–12:00 (2)
- Tuesday: Surface analysis (LAB), 10:30–12:00 (2)
- Wednesday: Surface analysis II (LAB), 9:00–10:30 and 11:00–12:30 (4)
- Thursday: Point pattern analysis (LAB work), 9:00–10:30 and 11:00–12:30 (4)
- Friday: Trends in spatial data analysis + quiz, 9:00–10:30 (2) and 11:00–12:30 (quiz)

Course material
- Course homepage: www.dpi.inpe.br/gilberto/tutorials.html
- Bailey and Gatrell, “Spatial Data Analysis by example”
- Software
  - R: statistical suite (open source), www.r-project.org
  - GeoDa: analysis of areal data (free of charge)
  - TerraView: visualisation and analysis (open source)
  - TerraLib: GIS library (open source), www.terralib.org

Spatial Data Analysis: An Introduction Gilberto Câmara National Institute for Space Research, Brazil Fall School 2005

Why does GIS matter?
- Why do we care about spatial data?
- Because...
- Space is an essential component of everyday life!
- How do you choose a house?
- How does a disease propagate?
- How do I start a new business?

And the answer is...
- Location, location!
- Space matters...
- My new house will be in a nice neighbourhood...
- He caught malaria when he was in Amazonia...
- His new business will be a shopping center in a recently developed area...

What is Spatial Data?
- Spatial == “catch-all” word
- General feature
  - Refers to a geographical location
  - Either “in situ” or indirectly (remote sensing, place names, addresses)
  - In many cases, in “someone else’s backyard”
- And... there’s LOTS of it, it’s everywhere and it’s about almost everything!

LBA Flux Towers in Amazonia. Source: Carlos Nobre (INPE)

Biodiversity... Source: Carlos Nobre (INPE)

CBERS Image

Fire Monitoring in Brazil. Diagram labels: Landsat/CBERS reception; NOAA reception; TM image; NOAA image; cartographic base; products; Internet; CPTEC weather forecast; decision making

We know how much you spend... Source: Stan Openshaw

...where you spend it... Source: Stan Openshaw

...who you talk to... Source: Stan Openshaw

...where you live (LS2 9JT)... what your neighbours are like... Source: Stan Openshaw

...your neighbours... Census tracts and houses for data collection

...your misbehaviour and...
- crime type
- crime location
- insurance data
Source: Stan Openshaw

...your health
- environmental data
- socio-economic data
- admissions data
Source: Stan Openshaw

GeoSensors: New technology of earth observations. Pictured: Smart Dust (UC Berkeley), “Spec” mote (UC Berkeley), Intel mote, MICA mote

The Road Ahead: Geosensors
- Advances in remote sensing are giving computer networks the eyes and ears they need to observe their physical surroundings.
- Sensors detect physical changes in pressure, temperature, light, sound, or chemical concentrations and then send a signal to a computer that does something in response.
- Scientists expect that billions of these devices will someday form rich sensory networks linked to digital backbones that put the environment itself online. (Rand Corporation, “The Future of Remote Sensing”)

- GEO is a new international organization tasked with implementing a Global Earth Observation System of Systems (GEOSS).
- GEOSS shall coordinate a wide range of space-based, air-based, land-based, and ocean-based environmental monitoring platforms, resources and networks, which at present often operate independently.
- Membership in GEO currently includes 51 countries plus the European Commission, and 29 participating international organisations.

Coordinating Earth Observing Systems. Diagram labels: permanent vantage points; capabilities; FarSpace (L1/HEO/GEO, TDRSS and commercial satellites); LEO/MEO (commercial satellites and manned spacecraft); NearSpace (aircraft/balloon, event tracking and campaigns); deployable airborne; terrestrial; forecasts and predictions; user community

Remote Sensing: Increased EO capability

What do we do with so much spatial data?
- First, we collect it...
  - GPS, remote sensing, field surveys
  - Data conversion
- Then, we organize it...
  - Spatial modelling
  - Spatial databases
  - Spatial visualization
- But more important is to analyse and understand it!

Vision: from data to knowledge. Source: NASA

Space. Diagram labels: objects, actions, material world, events. “Space is a system of entities and a system of actions” (Milton Santos)

Spatial Data
- Natural domain
  - IMAGES: planes, satellites
  - ENVIRONMENTAL DATA: topography, soils, temperature, hydrography, geology
- Human domain
  - INFRASTRUCTURE: roads, utilities, dams
  - CADASTRAL DATA: parcels, streets, land use
  - CENSUS DATA: demographics, economics

FROM DATA TO COMPUTER REPRESENTATION
- EVENTS / POINT SAMPLES (X, Y, Z)
- SURFACES / REGULAR GRIDS
- AREA DATA / POLYGONS
- FLUX DATA / NETWORKS

Remote Sensing LANDSAT 5 TM image of São Paulo, 1997

Aerial Photos: Favela da Maré, Rio de Janeiro, 2001

Choropleth Maps. São Paulo: per capita income by 96 districts; São Paulo: per capita income by 270 survey areas

Trend Surfaces iex Social Exclusion 1995 Social Exclusion 2002

FLUXES

The Five Orders of Ignorance
- 0th Order Ignorance (0OI): Lack of Ignorance
  - I (provably) know something
- 1st Order Ignorance (1OI): Lack of Knowledge
  - I do not know something
- 2nd Order Ignorance (2OI): Lack of Awareness
  - I do not know that I do not know something
- 3rd Order Ignorance (3OI): Lack of Process
  - I do not know a suitably effective way to find out that I don’t know something
- 4th Order Ignorance (4OI): Meta-Ignorance
  - I do not know about the Five Orders of Ignorance
Source: Phillip G. Armour, “The five orders of ignorance”, CACM, 43(10), Oct 2000

The First Law of Geography
- Tobler’s Law
  - Everything is related to everything else, but near things are more related than distant things
  - We call this “spatial dependence”
- Can we see Tobler’s law in action?
- Yes, there are lots of examples. Here are some...

Lung Cancer for White Males in USA

Log homicide rate for males of ages 15-50 per 100,000 residents of the same sex and age, 1990-92. Source: Marilia Carvalho (FIOCRUZ/Brasil)

Homicides in Belo Horizonte (1998) Source: Renato Assunção (UFMG/Brasil)

Crimes in Belo Horizonte. Map panels: aggression; burglaries. Source: Renato Assunção (UFMG/Brasil)

80% of homicides in Belo Horizonte occurred in these regions. Source: Renato Assunção (UFMG/Brasil)

Spatial Inequalities in São Paulo. Map panels: per capita income; jobs / population; illiterate / population. Source: Fred Ramos (CEDEST/Brasil)

Social Exclusion in São Paulo
- “Hot spots” of social exclusion/inclusion in São Paulo

We also try to describe nature
- Serra dos Carajás
- World’s largest iron ore deposit
- We also try to find Zn and Cu
Map: Serra dos Carajás, Pará (50° W, 6° S)

Where is the Zinc? Figure panels: sampled Zn points; samples; TIN; rectangular grid; kriging + samples; rectangular grid

Where is the Copper? Figure panels: ordinary kriging; indicator kriging (means); indicator kriging (median)

Byrsonima subterranea (Malpighiaceae). Considered to be extinct in the state of São Paulo, Brazil. One specimen was found... Where can we try to find others? Source: Marinez Siqueira (CRIA/Brasil)

After building a species occurrence model: found 4 additional specimens of B. subterranea. Source: Marinez Siqueira (CRIA/Brasil)

How can we deal with so much spatial data?
- We need a body of scientific theories that
  - Extracts knowledge from spatial data
  - Expresses what is “special about space”
- Remember...
  - Science is more than a body of knowledge; it is a way of thinking. [...] The method of science... is far more important than the findings of science. (Carl Sagan)
  - Science is made of “conjectures and refutations”
- Therefore...
  - Spatial data analysis is what makes GIS more than a collection of data

What is Spatial Data Analysis?
- Formal quantitative study of phenomena that manifest themselves in space (Anselin)
- Collection of tools for investigating
  - Spatial patterns of values and places
  - Variations of spatial phenomena based on their association
- What we want is to...
  - discover not only where, but also what defines and structures the places we live in
- Spatial data analysis
  - Transforms “Tobler’s law” into quantifiable assessments
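One common way to turn Tobler's law into a number is a global spatial autocorrelation index. The sketch below computes Moran's I, which the slides do not name explicitly, so treat it as one possible illustration rather than the course's method. The five areas, their values, and the "neighbours share a border along a line" weights are all hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for area values and a spatial weights matrix w[i][j]."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighbouring pairs
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    var = sum(d * d for d in dev)
    return (n / w_sum) * (cross / var)


def chain_weights(n):
    """Hypothetical neighbourhood: areas in a row, each touching the next one."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


W = chain_weights(5)
smooth = [1, 2, 3, 4, 5]       # near areas have similar values
alternating = [1, 5, 1, 5, 1]  # near areas have dissimilar values

i_smooth = morans_i(smooth, W)
i_alt = morans_i(alternating, W)
print("smooth:", i_smooth, "alternating:", i_alt)
```

A positive value indicates that neighbouring areas tend to be alike (spatial dependence in Tobler's sense), while a negative value indicates that neighbours tend to differ.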

What is Spatial Data Analysis?
- Analytical capacity
  - Understand the spatial distribution of values (identify trends and clusters)
  - Develop possible explanations (models) for the observed patterns
  - Use the models to indicate what can happen on other occasions
- Remember
  - Our primary aim is not to describe the data accurately
  - It is more important to understand the spatial patterns and to explore relationships between variables

Infant Mortality in Minas Gerais State. City rates in 1994 (756 cities). Scatter plot: standardized mortality rate (SMR, 0 to 600) against log of newborn children (0 to 10). SMR varies from 0 to 600! Observe the funnel effect. Source: Renato Assunção (UFMG/Brasil)

Can we believe everything we see?
- Infant mortality rates in Minas Gerais
  - 15 cities with 0 deaths and < 30 born alive
  - If one death is recorded, rates jump from 0 to values between 116 and 1048!
  - The current extreme value is 608.9
- Dealing with incomplete data
  - Requires statistical hypotheses
  - What we see is a blurred picture of reality...

Can we believe everything we see? Infant mortality rates in Rio de Janeiro: the average is 24 per 1000, but note extreme values of 130 per 1000. Are these values real?

Statistics 101
- Statistics is about
  - Systematically studying phenomena in which we are interested
  - Quantifying variables in order to use mathematical techniques
  - Summarizing these quantities in order to describe and make inferences
  - Using these descriptions and inferences to make decisions or understand
- Why do we need statistics to handle spatial data?
  - Helps us to make sense of incomplete and misleading data
  - Allows inclusion of “external” knowledge (things we know about the data)
  - Enables us to make informed assessments
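The instability of rates in small cities can be checked with a few lines of arithmetic. The sketch below assumes the standardized rate is computed as 100 × observed deaths / expected deaths, with expected deaths equal to births times an overall rate. Both that definition and the overall rate of 0.03 deaths per live birth are assumptions made for illustration, not values taken from the slides.

```python
import math

OVERALL_RATE = 0.03  # hypothetical state-wide deaths per live birth (assumption)


def smr(observed_deaths, births, overall_rate=OVERALL_RATE):
    """Standardized mortality ratio in percent: 100 * observed / expected."""
    expected = births * overall_rate
    return 100.0 * observed_deaths / expected


def smr_sd(births, overall_rate=OVERALL_RATE):
    """Standard deviation of the SMR if deaths ~ Poisson(expected)."""
    expected = births * overall_rate
    return 100.0 * math.sqrt(expected) / expected


# One extra death moves a small city a long way and a large city hardly at all
for births in (5, 29, 1000, 50000):
    print(births,
          "SMR with 0 deaths:", smr(0, births),
          "SMR with 1 death:", round(smr(1, births), 1),
          "SD of SMR:", round(smr_sd(births), 1))
```

The standard deviation shrinks as the number of births grows, which is the funnel shape seen when rates are plotted against population size.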

Random Variables
- Things that change
  - Environmental events or conditions
  - Personal characteristics or attributes
  - Behaviors
- Anything that takes on different values in different situations (e.g., in different places)

Distribution of Random Variables
- Statistics deals with regularities and variability of events
- Statistics measures the consistency of variables and the variability around this consistency
- Expression of these regularities/variabilities
  - Distribution function
  - Mathematical expression of the likelihood that a random variable has a particular value
- Properties of spatial distributions
  - Mean value
  - Variance

Counts as a Random Variable
- Counts
  - Typical type of spatial survey data
  - Number of children born, AIDS patients, crimes in a district
- The statistical way
  - The number of observed counts Oi in area i is a random variable
  - This means that Oi has a probability distribution with a mean, a variance, etc.
- Usual assumption: Oi ~ Poisson(λi)
  - λi = expected number of counts in area i

Poisson Probability Distribution. Plot panels: λ = 0.3, λ = 1, λ = 10, λ = 25, λ = 60. λ is the expected arrival/occurrence rate of a discrete random variable. Source: Renato Assunção (UFMG/Brasil)

A general piece of advice: look at your data!
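To get a feel for the Poisson assumption, the sketch below evaluates the Poisson probability function for the five rates shown in the plot and checks numerically that the mean and the variance both equal the rate. The function names and the truncation point `k_max` are illustrative choices, not part of the original material.

```python
import math


def poisson_pmf(k, lam):
    """P(O = k) for O ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def mean_and_variance(lam, k_max=400):
    """Numerical mean and variance, truncating the sum at k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


for lam in (0.3, 1, 10, 25, 60):
    mean, var = mean_and_variance(lam)
    print(f"rate={lam}: P(0)={poisson_pmf(0, lam):.4f}, "
          f"mean={mean:.3f}, variance={var:.3f}")
```

For a small rate most of the probability sits on a count of zero, which is why areas with few expected events so often report none at all.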

Ways to look at spatial data Spatial data as a map

Ways to look at spatial data Spatial data as a combined distribution

Ways to look at spatial data Each area has a unique probability distribution (the same model is assumed)

Ways to look at spatial data If each area has a unique probability distribution, what are we seeing?

Types of spatial data analysis
- Lattice data
  - Discrete variation over space, with observations associated with regular or irregular areal units
- Surface data
  - Observations associated with a continuous variation over space, typically as a function of distance
- Point patterns
  - Occurrences of events at locations in space

Point patterns Dr. Snow’s cholera cases in London

Point Patterns: violence data. Source: Santos, S. M., 1999
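A first check on a point pattern is whether the events are clustered, regular, or neither. The sketch below uses quadrat counts and the variance-to-mean ratio, a simple technique chosen here for illustration and not taken from the slides. The event coordinates and the 4 × 4 grid are hypothetical.

```python
def quadrat_counts(points, size, cells):
    """Count points in a cells x cells grid over the square [0, size) x [0, size)."""
    counts = [[0] * cells for _ in range(cells)]
    step = size / cells
    for x, y in points:
        col = min(int(x / step), cells - 1)
        row = min(int(y / step), cells - 1)
        counts[row][col] += 1
    return [c for row in counts for c in row]


def variance_mean_ratio(counts):
    """Sample variance of the quadrat counts divided by their mean."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


# Hypothetical event locations in a 4 x 4 study area
regular = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]  # one per cell
clustered = [(0.1 + 0.05 * k, 0.2) for k in range(16)]              # all in one cell

vmr_regular = variance_mean_ratio(quadrat_counts(regular, 4.0, 4))
vmr_clustered = variance_mean_ratio(quadrat_counts(clustered, 4.0, 4))
print("regular:", vmr_regular, "clustered:", vmr_clustered)
```

Under complete spatial randomness the counts are approximately Poisson, so the ratio is close to 1; values well above 1 suggest clustering and values well below 1 suggest a regular pattern.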

Surface data – homicide risk in São Paulo – 1996

Areal data: social exclusion in São José dos Campos. Map legend classes: [-1.00~-0.75], [-0.75~-0.50], [-0.50~-0.25], [-0.25~0.00], [0.00~0.25], [0.25~0.50], [0.50~0.75], [0.75~1.00]

From Areas to Surfaces

Perceptions of space
- Different representations
  - Images
  - Areas
  - Surfaces

Perceptions of space. Panels: space as an areal data set; space as a continuous surface

Space as Clusters

From color maps...

...to spatial patterns
- “Clusters” of social exclusion/inclusion in São Paulo

Spatial Analysis Methodologies
- Data-driven approach
  - Information is derived from the data without a prior notion of what the theoretical framework should be
  - “Data speak for themselves”
  - Information on spatial pattern, spatial structure and spatial interaction is obtained without the constraints of a preconceived theoretical notion

Spatial Analysis Methodologies
- Model-driven approach
  - Starts from a theoretical specification, which is subsequently confronted with the data
  - Generalized linear models
  - Linear regression
  - Problem: how does “space” affect these models?

Model-Driven Approaches
- Model of discrete spatial variation
  - Each subregion i is described by a statistical distribution Zi
  - e.g., homicide counts follow a Poisson distribution
  - The main objective of the analysis is to estimate the joint distribution of the random variables Z = {Z1, …, Zn}
- Model of continuous spatial variation
  - The whole area is a continuous surface
  - The main objective is to estimate the distribution Z(x), x ∈ A

Models of Discrete Spatial Variation. Random variable in area i:
- number of ill people
- number of newborn babies
- per capita income
Source: Renato Assunção (UFMG/Brasil)

Models of Continuous Spatial Variation. Temperature, water pH, soil acidity... The map marks the sampling stations and the location whose value is to be predicted. Source: Renato Assunção (UFMG/Brasil)
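Predicting a value at an unsampled location from nearby stations can be sketched with inverse distance weighting (IDW). This is a deliberately simpler stand-in for the geostatistical methods covered in the course, such as kriging, and is not presented here as the course's method. The station coordinates, the measured values, and the `power` parameter are hypothetical.

```python
import math


def idw_predict(stations, target, power=2.0):
    """Inverse distance weighted prediction at target from ((x, y), value) stations."""
    num = 0.0
    den = 0.0
    for (x, y), value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a station
        w = 1.0 / d ** power  # nearer stations receive larger weights
        num += w * value
        den += w
    return num / den


# Hypothetical sampling stations: ((x, y), measured value)
stations = [((0.0, 0.0), 10.0), ((10.0, 0.0), 20.0), ((0.0, 10.0), 30.0)]

near_first = idw_predict(stations, (1.0, 1.0))
at_station = idw_predict(stations, (10.0, 0.0))
print("near first station:", round(near_first, 2), "at second station:", at_station)
```

The prediction near the first station stays close to that station's value, which is Tobler's law used as an interpolation rule. Unlike kriging, IDW does not estimate the spatial correlation structure from the data and gives no measure of prediction uncertainty.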

Spatial Data Analysis (data types, example, typical problems)
- Point Pattern Analysis: localized events; disease mapping; cluster tests
- Geostatistics (surface modelling): field samples; mineral deposit; surface interpolation
- Areal Data: features and attributes; census data; spatial autocorrelation indicators

Multiple Representations of Space: thematic maps, digital terrain models, networks, features, images