  • Number of slides: 70

Geographical analysis
Overlay, cluster analysis, autocorrelation, trends, models, network analysis, spatial data mining

How geographic analysis started
John Snow’s cholera map of 1854

Geographical analysis
• Combination of different geographic data sets or themes by overlay or statistics, in particular suitability analysis
• Discovery of patterns, dependencies
• Discovery of trends, changes (time)
• Development of models
• Interpolation, extrapolation, prediction
• Spatial decision support, planning
• Consequence analysis (What if?)

Example overlay
• Two subdivisions with labeled regions
[Figure: a soil subdivision (soil types 1 to 4) overlaid with a vegetation subdivision (birch forest, beech forest, mixed forest); one resulting region is labeled "Birch forest on soil type 2"]

Kinds of overlay
• Two subdivisions with the same boundaries
  – nominal and nominal: religion and voting per municipality
  – nominal and ratio: voting and income per municipality
  – ratio and ratio: average income and age of employees
• Two subdivisions with different boundaries: soil type and vegetation
• Subdivision and elevation model: vegetation and precipitation

Kinds of overlay, cont’d
• Subdivision and point set: quarters in a city, occurrences of violence on the street
• Two elevation models: elevation and precipitation
• Elevation model and point set: elevation and epicenters of earthquakes
• Two point sets: cash machines (ATMs), street robbery locations
• Network and subdivision, other network, elevation model

Result of overlay
• New subdivision or map layer, e.g. for further processing
• Table with combined data
• Count, surface area

Soil type   Vegetation   Area     #patches
1           Beech        30 ha    2
2           Birch        15 ha    2
3           Mixed         8 ha    1
4           Beech         2 ha    1
…

Buffer and overlay
• Neighborhood analysis: data of a theme within a given distance (buffer) of objects of another theme
[Figure: sightings of nesting locations of the great blue heron (point set); rivers, with a buffer of width 500 m around a river; the overlay gives the nesting locations of the great blue heron near a river]
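The buffer-and-overlay idea can be sketched in a few lines. This is a minimal illustration only, assuming planar coordinates in meters; the river polyline, the nest coordinates and the helper names (`point_segment_distance`, `within_buffer`) are hypothetical and not part of the slides.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(points, polyline, width):
    """Select the points that lie within `width` of any segment of the polyline."""
    segments = list(zip(polyline, polyline[1:]))
    return [p for p in points
            if any(point_segment_distance(p, a, b) <= width for a, b in segments)]

# Hypothetical data: a river as a polyline and four nesting locations (meters).
river = [(0, 0), (1000, 0), (2000, 1000)]
nests = [(500, 300), (500, 800), (1800, 600), (3000, 3000)]
near_river = within_buffer(nests, river, 500)
```

A GIS would normally construct the buffer polygon once and use a spatial index; the brute-force distance test above only shows what the selection means.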

Overlay: ways of combination
• Combination (join) of attributes
• One layer as selection for the other
  – Vegetation types only for soil type 2
  – Land use within 1 km of a river

Overlay in raster
• Pixel-wise operation, if the rasters have the same coordinate (reference) system
[Figure: pixel-wise AND of the rasters "Forest" and "Population increase above 2% per year", giving the raster "Both"]
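A pixel-wise AND can be sketched as follows. The two small rasters are made-up values on a shared grid, and `raster_and` is a hypothetical helper name, not something taken from the slides.

```python
# Two hypothetical boolean rasters on the same grid (same reference system).
forest = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]
pop_increase_above_2pct = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

def raster_and(a, b):
    """Pixel-wise AND of two rasters with identical dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have the same shape")
    return [[int(x and y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

both = raster_and(forest, pop_increase_above_2pct)
```

The shape check stands in for the requirement that both rasters share the same coordinate system; rasters on different grids would need resampling first.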

Overlay in vector
• E.g. the plane sweep algorithm as given in Computational Geometry (line segment intersection), to get the overlay in a topological structure
• Using R-trees as an indexing structure to find intersections of boundaries

Combined (multi-way) overlays
• Site planning, new housing sites depending on multiple criteria
  – Proximity infrastructure
  – Proximity facilities (hospitals, schools)
  – Not in nature areas
  – …
• Another example (earth sciences): parametric land classification: partitioning of the land based on chosen, classified themes

[Figure: two themes, elevation and annual precipitation]

[Figure: third theme, types of rock; overlay: partitioning based on the three themes]

Suitability analysis
• Selection of location of new housing, a new store, a new factory, etc.
• Typically done by overlay using multiple map layers

Analysis point set
• Points in an attribute space: statistics, e.g. regression, principal component analysis, dendrograms
• Example tuples (area, population, #crimes): (12, 34,000, 34), (14, 45,000, 31), (15, 41,000, 14), (17, 63,000, 82), (17, 66,000, 79), …
[Figure: scatter plot of #crimes against population]

Analysis point set
• Points in geographical space without associated value: clusters, patterns, regularity, spread
• Actual average nearest neighbor distance versus the expected average nearest neighbor distance for this number of points in the region
• For example: volcanoes in a region; crimes in a city
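The comparison of actual and expected average nearest neighbor distance can be sketched as below. The slides do not say how the expected value is obtained; this sketch assumes the common choice for a completely random pattern, 1 / (2 · sqrt(n / area)), and ignores edge effects. The points and the 10 × 10 region are made up.

```python
import math

def average_nn_distance(points):
    """Average distance from each point to its nearest other point (brute force)."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def expected_nn_distance(n, area):
    """Assumed expectation for n points placed uniformly at random in a region."""
    return 0.5 / math.sqrt(n / area)

# Hypothetical pattern: two tight groups of four points in a 10 x 10 region.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
observed = average_nn_distance(points)
expected = expected_nn_distance(len(points), 10 * 10)
ratio = observed / expected  # below 1 suggests clustering, above 1 regularity
```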

Analysis point set
• Points in geographical space with value: up to what distance are measured values “similar” (or correlated)?
[Figure: map of sample points with measured values 11, 10, 12, 13, 12, 19, 14, 16, 18, 21, 21, 20, 22, 17, 15, 16]

Analysis point set
• Temperature at location x and 5 km away from x is expected to be nearly the same
• Elevation (in Switzerland) at location x and 5 km away from x is not expected to be related (even over 1 km), but it is expected to be nearly the same 100 meters away
• Other examples:
  – depth to groundwater
  – soil humidity
  – nitrate concentration in the soil

Analysis point set
• Points in geographical space with value: auto-correlation (~ up to what distance are measured values “similar”, or correlated)
• n points give (n choose 2) pairs; each pair has a distance and a difference in value
[Figure: map of sample points with measured values 11, 10, 12, 13, 12, 19, 14, 16, 18, 21, 21, 20, 22, 17, 15, 16]

[Figure: left, squared difference against distance for all pairs; right, average squared difference (observed and expected) against distance]
Classify distances and determine the average per class
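The step from point pairs to an observed variogram can be sketched as follows: every pair contributes its squared value difference to the distance class it falls in, and each class reports the average. The sample data and the class width are made up, and `observed_variogram` is a hypothetical helper name. Many texts plot half of this average (the semivariance); the sketch follows the slides and uses the plain average.

```python
import math
from itertools import combinations

def observed_variogram(samples, class_width):
    """samples: list of (x, y, value).
    Returns {class_index: average squared difference}, where class k
    covers distances in [k * class_width, (k + 1) * class_width)."""
    sums, counts = {}, {}
    for (x1, y1, v1), (x2, y2, v2) in combinations(samples, 2):
        k = int(math.hypot(x2 - x1, y2 - y1) // class_width)
        sums[k] = sums.get(k, 0.0) + (v1 - v2) ** 2
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Hypothetical measurements along a line, one unit apart.
samples = [(0, 0, 10), (1, 0, 11), (2, 0, 13), (3, 0, 16)]
variogram = observed_variogram(samples, class_width=1.0)
```

In this toy data the average squared difference grows with distance, which is the behavior a variogram shows below its range.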

[Figure: observed variogram and linear model variogram; axes: distance against average squared difference (observed and expected); the model variogram is annotated with nugget, sill and range]
Smaller distances: more correlation, smaller variance

Importance auto-correlation
• Descriptive statistic of a data set: describes the distance-dependency of auto-correlation
• Interpolation based on data further away than the range is nonsense
[Figure: sample points with measured values, two locations with unknown value marked "?", and the range drawn around them]

Importance auto-correlation
• If the range of a geographic variable is small, more sample point measurements are needed to obtain a good representation of the geographic variable through spatial interpolation
• This influences the cost of an analysis or decision procedure, and the quality of the outcome of the analysis

Analysis subdivision
• Nominal subdivision: auto-correlation (~ clustering of equivalent classes)
• Ratio subdivision: auto-correlation
[Figure: two maps with regions labeled PvdA, CDA or VVD, one showing auto-correlation and one showing no auto-correlation]

Auto-correlation nominal subdivision
Join count statistic:
• 22 neighbor relations (adjacencies) among 12 provinces
• Pr(prov. A = VVD and prov. B = VVD) = 4/12 * 3/11
• E(VVD adj. VVD) = 22 * 12/132 = 2
• Reality: 4 times
• E(CDA adj. PvdA) = 2 * 4/12 * 4/11 * 22 = 5.33; reality: once
[Figure: map of the 12 provinces labeled PvdA, CDA or VVD]
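The expected join counts can be checked with exact fractions. This sketch assumes that classes are assigned to provinces at random without replacement, and that CDA and PvdA each hold 4 of the 12 provinces (an assumption read off the 4/12 * 4/11 term); the function names are hypothetical.

```python
from fractions import Fraction

def expected_same_class_joins(n_regions, n_in_class, n_adjacencies):
    """Expected number of adjacencies whose two regions are both in the class."""
    p = Fraction(n_in_class, n_regions) * Fraction(n_in_class - 1, n_regions - 1)
    return n_adjacencies * p

def expected_mixed_joins(n_regions, n_a, n_b, n_adjacencies):
    """Expected number of adjacencies joining class a and class b.
    The factor 2 covers both orders (a, b) and (b, a)."""
    p = 2 * Fraction(n_a, n_regions) * Fraction(n_b, n_regions - 1)
    return n_adjacencies * p

e_vvd = expected_same_class_joins(12, 4, 22)      # 22 * 12/132
e_cda_pvda = expected_mixed_joins(12, 4, 4, 22)   # 2 * 4/12 * 4/11 * 22
```

Comparing these expectations with the observed counts (4 and 1) is what indicates that equal classes occur next to each other more often than chance would suggest.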

Geographical models
• Properties of (geographical) models:
  – selective (simplification, more ideal)
  – approximative
  – analogous (resembles reality)
  – structured (usable, analyzable, transformable)
  – suggestive
  – re-usable (usable in related situations)

Geographical models
• Functions of models:
  – psychological (for understanding, visualization)
  – organizational (framework for definitions)
  – explanatory
  – constructive (beginning of theories, laws)
  – communicative (transfer scientific ideas)
  – predictive

Example: forest fire
• Is the Kröller-Müller museum well enough protected against (forest) fire?
• Data: proximity fire dept., burning properties of land cover, wind, origin of fire
• Model for: fire spread
Time neighbor pixel on fire: $[1.41 \cdot]\; b \cdot ws \cdot (1 - sh) \cdot (0.2 + \cos\alpha)$
  – b = burn factor
  – ws = wind speed
  – α = angle between the wind direction and the direction of the neighbor pixel
  – sh = soil humidity

Forest fire
[Figure: raster map with wind (speed 3); land cover classes forest (burn factor 0.8), heath (burn factor 0.6) and road (burn factor 0.2); the museum; soil humidity; the origin of the fire; and zones reached in < 3, < 6, < 9 and > 9 minutes]

Forest fire model
• Selective: only surface cover, humidity and wind; no temperature, seasonal differences, …
• Approximative: surface cover in 4 classes; no distinction in forest type, etc.; pixel based, so direction is discretized
• Structured: pixels, simple for definition of relations between pixels
• Re-usable: approach/model also applies to other locations (and other spread processes)

Forest fire model
• The forest fire model is deterministic (every run gives the same outcome); stochastic models exist too and can account for randomness
• There are static and dynamic models; dynamic models involve change over time
• Error analysis: assume an error distribution of a parameter and run a Monte Carlo simulation (burn factor values)
• Sensitivity analysis: examine how changing a parameter influences the outcome (wind in the forest fire example)

More models
• Population growth
• Landslides, avalanches
• Crime change over time
• Road accidents
• …

Network analysis
• When distance or travel time on a network (graph) is considered
• Dijkstra’s shortest path algorithm
• Reachability measure for a destination: potential value, defined in terms of
  – wj = weight of origin j
  – a distance decay parameter
  – cij = distance cost between origin j and destination i
• Think of i as a potential shop location and j as the population (potential customers)
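Shortest travel times on a network can be computed with Dijkstra's algorithm. The sketch below uses a binary heap and a made-up road graph with travel times in minutes; the node names and weights are illustrative only.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distance from source to every reachable node.
    graph: {node: [(neighbor, non-negative edge cost), ...]}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical undirected road network, travel times in minutes.
roads = {
    "A": [("B", 4), ("C", 2)],
    "B": [("A", 4), ("C", 1), ("D", 5)],
    "C": [("A", 2), ("B", 1), ("D", 8)],
    "D": [("B", 5), ("C", 8)],
}
travel_time = dijkstra(roads, "A")
```

The resulting costs are the kind of distance costs cij that a reachability measure would use, and a check such as `all(t <= 15 for t in travel_time.values())` expresses a travel-time requirement for one origin.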

Example reachability
• Ambulance Transport Law: every location must be reachable within 15 minutes (from origin of ambulance)

Example reachability
• Physician’s practice:
  – optimal practice size: 2350 (minimum: 800)
  – minimize distance to practice
  – improve current situation with as few changes as possible

Current situation: 16 practices, 30,000 people, average 1875 per practice
Computed, improved situation: 13 practices

Example in table

                                  Original   New
Number of practices               16         13
Number of practice locations      9          7
Number of practices < 800 size    2          0
Number of people > 3 km           3957       4624
Average travel distance (km)      0.9        1.2
Largest distance (km)             5.2        5.4

Analysis elevation model
• Landscape shape recognition:
  – peaks and pits
  – valleys and ridges
  – convexity, concavity
• Water flow, erosion, watershed regions, landslides, avalanches

Spatial data mining
• Finding spatial patterns in large spatial data sets
  – within one spatial data set
  – across two or more data sets
• With time: spatio-temporal data mining
• Main operations:
  – clustering
  – co-location patterns
  – spatial association rule mining (spatio-temporal association rule mining)

Clustering
• Partition-based clustering: produce clusters
  – k-means clustering
  – DBSCAN
  – ...
• Hierarchical clustering: produce a hierarchy
  – agglomerative (bottom-up)
  – divisive (top-down)

k-Means clustering
• Assume k (number of clusters) is known
• Start with k points as seed set S
• Repeat
  – Assign every point to the nearest seed in S to make k clusters
  – For every cluster, compute the center of gravity to form a new seed set
  Until convergence
• Running time: number of iterations times O(nk), or times O(n log k) (using a Voronoi diagram and point location)
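The loop above can be sketched directly. This is the simple O(nk)-per-iteration variant, not the Voronoi-diagram version; the `max_iter` cap and the handling of empty clusters (keep the old seed) are assumptions added for safety, and the data is made up.

```python
import math

def k_means(points, seeds, max_iter=100):
    """Plain k-means: alternate assignment and center-of-gravity steps."""
    seeds = list(seeds)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: every point goes to its nearest seed.
        clusters = [[] for _ in seeds]
        for p in points:
            nearest = min(range(len(seeds)), key=lambda i: math.dist(p, seeds[i]))
            clusters[nearest].append(p)
        # Update step: each seed moves to the center of gravity of its cluster.
        new_seeds = []
        for i, c in enumerate(clusters):
            if c:
                new_seeds.append(tuple(sum(coord) / len(c) for coord in zip(*c)))
            else:
                new_seeds.append(seeds[i])  # assumption: keep seed of an empty cluster
        if new_seeds == seeds:
            break  # converged: seeds did not move
        seeds = new_seeds
    return seeds, clusters

# Hypothetical data: two well-separated groups and one seed in each.
points = [(0, 0), (0, 2), (2, 0), (10, 10), (10, 12), (12, 10)]
centers, clusters = k_means(points, seeds=[(0, 0), (10, 10)])
```

The result depends on the initial seed set; different seeds can converge to different clusterings.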

DBSCAN clustering
• Popular method by Ester, Kriegel, Sander, and Xu (1996)
• Assume two parameters eps and minPts are given
• A point q is core if there are at least minPts points within distance eps
• A point p is core-close to a point q if q is core and p is a point within eps of q
• A point p is density-reachable from a point q if points p = p0, p1, ..., pm = q exist such that pi is core-close to pi+1
• Two points p, p’ are density-connected if a point q exists from which p and p’ are density-reachable

DBSCAN clustering
• DBSCAN clustering is the clustering of all density-connected points into a cluster
• Clustering of core points is unique; other points are not necessarily uniquely clustered
[Figure: example with minPts = 4 and radius eps, showing core points, a cluster of core points, a non-core point that is density-reachable from two clusters of core points, and an outlier]

DBSCAN clustering
• If minPts is constant, then DBSCAN can be implemented to run in O(n log n) time using higher-order Voronoi diagrams:
  – a minPts-order Voronoi diagram gives for every point the minPts closest points
  – the distance to the furthest of these tells if a point is core or not
  – make a graph where every core point has a directed edge to the minPts nearest points
  – find a cluster by DFS from any core point, until all core points are in clusters
  – then assign non-core points to clusters, if possible
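The definitions can be tried out with a brute-force sketch. It runs in O(n²) time and does not use the higher-order Voronoi diagram approach; it assumes that a point counts itself among the points within eps, and it assigns a non-core point to the first cluster that reaches it. The data is made up.

```python
import math

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; None marks an outlier."""
    n = len(points)
    # Neighborhoods within eps (assumption: a point is in its own neighborhood).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] is not None:
            continue
        # Depth-first search over core points, labeling everything they reach.
        labels[i] = cluster
        stack = [i]
        while stack:
            u = stack.pop()
            for v in neighbors[u]:
                if labels[v] is None:
                    labels[v] = cluster
                    if core[v]:
                        stack.append(v)  # only core points extend the cluster
        cluster += 1
    return labels

# Hypothetical data: two groups of core points, one border point, one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.4, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=4)
```

Here the point (2.4, 1) is not core but lies within eps of a core point, so it joins that cluster, while (5, 5) stays unlabeled as an outlier.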

Agglomerative hierarchical clustering
• Start with n clusters of single points
• While #clusters > k: merge the two nearest clusters (that have shortest minimum distance)
• Can be implemented in O(n log n) time using Voronoi diagrams
• Maximizes the minimum distance between any two points in different clusters
• Also called single-link clustering
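The merging procedure can be sketched with a union-find structure: process point pairs by increasing distance and merge their clusters until k clusters remain. This brute-force version sorts all pairs, so it takes O(n² log n) time rather than the O(n log n) of the Voronoi-based approach; the data and the function name are made up.

```python
import math
from itertools import combinations

def single_link(points, k):
    """Single-link clustering into k clusters; returns a cluster id per point."""
    parent = list(range(len(points)))

    def find(i):
        # Find the cluster representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    clusters = len(points)
    for i, j in pairs:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:          # closest pair in different clusters: merge them
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(len(points))]

# Hypothetical points on a line with two large gaps.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (20, 0)]
labels = single_link(points, k=3)
```

Merging along shortest pairs is the same process as building a minimum spanning tree and stopping early, which is why the clusters end up separated by the largest possible gaps.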

Clustering
• Largest cluster is of interest
• Entities not involved in clusters may be interesting (outliers)
• Number of occurring clusters is of interest
• No established way to know how many clusters to use
• Setting the parameters is important

Co-location
• Applies whenever there are two data sets (object view)
• Presence of objects in one data set almost implies the presence of objects of the other (need not be a symmetric relation)
• Example: the Egyptian plover and the Nile crocodile

Co-location
• Degree of co-location of the two types may be interesting
• Entities not involved in co-location may be interesting
• Asymmetry of co-location may be interesting

Spatial association rules
• Association rules with a spatial aspect
• Market basket analysis: if a shopping basket contains item X, then it also contains item Y
• Quality of rule:
  – Support: number of transactions with X and Y
  – Confidence: fraction of transactions with X that also have Y

Support and confidence
[Figure: support shown as a count, confidence as a ratio of counts]
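Support as a count and confidence as a ratio of counts can be sketched for ordinary market baskets as follows. The baskets and item names are made up, and `support_and_confidence` is a hypothetical helper.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """transactions: list of sets; antecedent and consequent: sets of items.
    Support = number of transactions containing both item sets;
    confidence = that count divided by the number containing the antecedent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    support = len(with_both)
    confidence = support / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical baskets; rule: "if a basket contains bread, it also contains butter".
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
support, confidence = support_and_confidence(baskets, {"bread"}, {"butter"})
```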

Spatial Association Rules
• Some examples with proximity:
  – If a house is close to the sea, then it is expensive
  – If a hotel is near touristy sites, then it is frequented by tourists
  – If a lake is close to dump sites, then its water is polluted

Towards spatial support and spatial confidence
• Need appropriate definitions
  – Option 1: define “close” with a threshold of distance
  – Option 2: convert distance to a [0, 1] score (degree of closeness) and use fuzzy association rule ideas

Towards spatial support and spatial confidence

Towards spatial support and spatial confidence
[Figure: a distance threshold separating "close" from "not close"]

Towards spatial support and spatial confidence
[Figure: score for closeness, with scores 0.1, 0.4, 0.6, 0.8 and 1]

Advantages of thresholding
• Simpler
• Can use standard support and confidence measures
• Approach taken by Koperski and Han (1995), Gidofalvi and Pedersen (2005)

Advantages of distance conversion
• More versatile
• Correct “guess” of the thresholds not so critical as in the one-threshold case
• Approach taken by Chawla & Verhein (2006, 2008)
[Figure: graded closeness classes such as "not close" and "somewhat close"]

Spatial association rules
• The antecedent and/or the consequent are spatial (often involving spatial proximity)
• Transaction ≈ occurrence of object from antecedent
  – houses [close to sea]
  – hotels [close to touristy sites]
  – lakes [close to dump sites]

Spatial support and spatial confidence
• Spatial support: sum over the objects of the degree for which the rule is true for that object
• Spatial confidence: spatial support divided by total sum of antecedent scores
• Example: “house is close to sea ⇒ expensive”
  – Houses with closeness scores 0.1, 0.4, 0.6, 0.8 and 1
  – Assume all houses are expensive except for the left one
  – Spatial support = 2.3
  – Spatial confidence = 2.3 / 2.9

Spatial antecedents and spatial consequents
• “If a village is close to a highway intersection, then it is close to a motel”
[Figure: three situations of a village, a highway intersection and a motel]
  – Antecedent score 1, consequent score 0.5
  – Antecedent score 0.5, consequent score 1: spatial support = 0.5, spatial confidence = 1
  – Antecedent score 0.5, consequent score 0.5: spatial support = 0.25, spatial confidence = 0.5

Spatial support and spatial confidence: definition
• Rule: A ⇒ C on, e.g., houses h from a set H
• Spatial support: $\sum_{h \in H} \mathrm{score}_A(h) \cdot \mathrm{score}_C(h)$
• Spatial confidence: $\dfrac{\sum_{h \in H} \mathrm{score}_A(h) \cdot \mathrm{score}_C(h)}{\sum_{h \in H} \mathrm{score}_A(h)}$
• All scores are in [0, 1]
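The definition can be checked against the house example. The sketch below uses the five closeness scores from that example and a consequent score of 1 for "expensive" and 0 otherwise; that the non-expensive house is the one with score 0.6 is an assumption, inferred from 2.3 = 2.9 − 0.6. The function names are hypothetical.

```python
def spatial_support(scores):
    """scores: list of (antecedent score, consequent score), each in [0, 1]."""
    return sum(a * c for a, c in scores)

def spatial_confidence(scores):
    """Spatial support divided by the total of the antecedent scores."""
    total = sum(a for a, _ in scores)
    return spatial_support(scores) / total if total else 0.0

# (closeness to sea, expensive?) per house; the 0.6 house is assumed not expensive.
houses = [(0.1, 1), (0.4, 1), (0.6, 0), (0.8, 1), (1.0, 1)]
support = spatial_support(houses)        # approximately 2.3
confidence = spatial_confidence(houses)  # approximately 2.3 / 2.9
```

With 0/1 scores for both antecedent and consequent, these formulas reduce to the ordinary count-based support and confidence of the thresholding approach.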

Spatio-temporal data
• Locations have a time stamp
• Interesting patterns involve space and time
• Examples
  – earthquakes have an epicenter and a time of occurrence
  – trees have a location and a day of first blooming
  – traffic jams have a location and a start time

Summary
• There are many types of geographical analysis; it is the main task of a GIS
• Overlay analysis is the most important type
• Auto-correlation, modeling, network analysis are also important
• Spatial and spatio-temporal data mining gives new types of analysis of geographic data