Application of c-squares spatial indexing to an archive of remotely sensed data
Tony Rees, Divisional Data Centre, CSIRO Marine Research, Australia (Tony.Rees@csiro.au)
The starting point …
• Archive of c. 60,000 satellite scenes, 1986 to present, accumulating at a further 15+ per day
• Region covered varies for each scene
• There is often a requirement to retrieve scenes that include a specific region or place
• Low-res GIF images (like the examples shown) are available for the last 24 months only
• Polygons can be calculated for each scene, but are not currently stored in any spatially searchable system.
Options for spatial indexing - 1: Bounding rectangles
• Mostly simple to generate/store/query
• Frequently a poor fit to the actual dataset extent
• Potentially become massively over-representative as data approaches the pole
• Do not cope well with oblique or curved boundaries to the dataset footprint (common in satellite images)
• Poor discrimination for fine-scale queries (many “false positives” returned)
• Can be problems with footprints which cross the date line, or include a pole
[Figure: bounding rectangle (follows lat/long grid lines) around a scene footprint]
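The date-line problem noted above can be made concrete: a bounding rectangle built naively from min/max longitudes degenerates for a footprint spanning 180°. A minimal sketch (the function and the coordinate values are illustrative, not from the original system):

```python
def bbox(points):
    """Naive bounding rectangle: (min_lat, min_lon, max_lat, max_lon)."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lats), min(lons), max(lats), max(lons))

# A footprint straddling the date line, from 170 E across to 170 W (= -170):
footprint = [(-30.0, 170.0), (-30.0, -170.0), (-40.0, -170.0), (-40.0, 170.0)]
rect = bbox(footprint)
# The naive rectangle spans longitude -170 .. 170, i.e. nearly the whole
# globe EXCEPT the 20-degree band the scene actually covers.
print(rect)   # (-40.0, -170.0, -30.0, 170.0)
```

A tiled representation has no such ambiguity, since each tile on either side of the date line is simply listed by its own ID.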
Options for spatial indexing - 2: Bounding polygons
• OK to generate/store
• Best fit to the actual dataset extent
• Need additional points as data approaches the pole (regions of high curvature)
• Computationally expensive to query (often use bounding rectangles as well, as a pre-filter)
• Tend to require a dedicated GIS system to handle the spatial queries.
[Figure: bounding polygon with inflection points around a scene footprint]
Options for spatial indexing - 3: Raster (tiled) approach
• Simple to generate/store (as a list of tiles intersected)
• Reasonable fit to the actual dataset extent; improves with decreasing unit square size
• Automatically increases resolution as data approaches the pole (regions of high curvature)
• Simple/rapid to query; no computation or dedicated GIS required
• No ambiguity with footprints which extend across the date line, or include a pole.
[Figure: boundary of all tiles intersected by this scene]
C-squares basics
• Based on a tiled (gridded) representation of the earth’s surface, at a choice of 10 x 10, 5 x 5, 1 x 1, 0.5 x 0.5 degree tiles, etc. (0.5 x 0.5 degree tiles are used in this example)
• Every tile has a unique, hierarchical alphanumeric ID (every child tile ID incorporates the IDs of all its parent, grandparent, etc. tiles)
• Dataset (= scene) extents are represented by a list of all the tiles included in, or intersected by, the footprint
• Spatial search comprises looking for one or more tile IDs in the set associated with any dataset (= simple text search).
(more details: see www.marine.csiro.au/csquares/)
Spatial queries supported
• Retrieve all scenes which include all or part of a given tile (at 10 x 10, 5 x 5, 1 x 1, or 0.5 x 0.5 degrees), or set of tiles
  – optionally, filter also by other criteria, e.g. date, satellite, etc.
• Retrieve all tiles associated with a given scene, with the option to:
  – draw a representation on a map (or range of maps)
  – export the list to a file or another database
  – compare the list with other lists (~ = polygon overlays)
NB, the choice of initial tile size is important:
- too few tiles: only coarse spatial queries supported;
- too many tiles: indexes get very large (and queries potentially slow).
However, compression of blocks of contiguous tiles can be quite effective (1 code can replace 4, 25, 100, or 400 codes in certain cases).
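The "simple text search" point above can be illustrated in miniature: once each scene carries its set of tile IDs, a spatial query reduces to string membership, with no geometry at query time. Scene IDs and codes here are hypothetical:

```python
# Hypothetical miniature index: scene ID -> set of c-square codes (1-degree)
index = {
    "scene_A": {"3414:100", "3414:101", "3414:102"},
    "scene_B": {"3413:399", "3414:100"},
    "scene_C": {"3415:227"},
}

def scenes_containing(tile):
    """All scenes whose footprint includes the given tile (plain text match)."""
    return sorted(s for s, tiles in index.items() if tile in tiles)

print(scenes_containing("3414:100"))   # scene_A and scene_B overlap this tile
```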
A real-world example NOAA-12. . 21 Jun 2003 06: 23 5 x 5 degree squares (99) = 36 after comp. 1 x 1 degree squares (1, 982) = 515 after comp. 10 x 10 degree squares (28) (base level of hierarchy, cannot compress) 0. 5 x 0. 5 degree squares (7, 691) = 704 after comp. 0. 5 x 0. 5 degree squares detail
Practicalities for satellite data
• A single scene may measure (say) approx. 40 x 50 degrees, = 2,000 1 x 1 degree squares, or 8,000 0.5 x 0.5 degree squares
• Quadtree-like compression reduces this to (e.g.) 500 codes at 1 x 1 degree resolution, 700 codes at 0.5 x 0.5 degree resolution
• Still require quite a lot of codes (e.g. 42 million) to represent a collection of 60,000 scenes
• Each code is 10 characters long, and scene IDs are (say) 6 characters long, thus c. 670 million bytes are required for the raw index data (before compiling secondary indexes, etc.)
• With secondary indexing, probably need around 2 GB (+) to hold the spatial index.
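As a check, the storage estimate above follows directly from the numbers quoted on this slide:

```python
scenes = 60_000
codes_per_scene = 700           # 0.5-degree codes per scene after compression
code_len, scene_id_len = 10, 6  # characters per c-square code and per scene ID

pairs = scenes * codes_per_scene             # scene/code pairs in the index
raw_bytes = pairs * (code_len + scene_id_len)
print(pairs)             # 42,000,000 codes
print(raw_bytes / 1e6)   # 672 million bytes, i.e. the "c. 670 million" above
```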
Designing the index
• Option 1: a table of scenes, with a string of codes comprising each scene footprint, 1 row per scene
  – comment: fast to retrieve all codes for a scene, slow for a spatial query. The list of codes may present storage problems (e.g. in database field/s) owing to its length.
• Option 2: a table of c-square codes, with a list of the scenes containing each code, 1 row per code (= an “inverted index”)
  – comment: fast for a spatial search, slow to retrieve all codes for a scene. The list of scenes may present storage problems (e.g. in database field/s) owing to its length. Harder to manage with respect to scene addition/deletion, etc.
(continues…)
Designing the index (cont’d)
• Option 3: a table of scene/c-square pairs, 1 row per pair, indexed on both columns
  – comment: OK to retrieve all codes for a scene, and also for a spatial query. Row length is always constant, and new rows are simply created as required. However, the table potentially becomes very long (e.g. 40 million rows) and may become slow to query …
• Option 4: as Option 3, but the single table is replaced by multiple smaller tables, e.g. split by blocks of c-squares, blocks of scenes, or by year or satellite
  – comment: will tend to be “tuned” to favour queries which can be completed by accessing the smallest number of tables. Could overcome this by adding (say) an element of Option 1 in parallel (with the added cost of storing the same information in more than one place).
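Option 3 can be sketched quickly with SQLite (table and column names here are illustrative, not the production Oracle schema): one row per scene/c-square pair, with an index on each column so that both query directions are served.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table scene_csq (scene_id text, csquare text)")
# One index per column: csquare serves spatial queries,
# scene_id serves footprint retrieval
con.execute("create index idx_csq on scene_csq (csquare)")
con.execute("create index idx_scene on scene_csq (scene_id)")

con.executemany("insert into scene_csq values (?, ?)", [
    ("scene_A", "3414:100"), ("scene_A", "3414:101"), ("scene_B", "3414:100"),
])

# Spatial query: which scenes touch tile 3414:100?
hits = [r[0] for r in con.execute(
    "select distinct scene_id from scene_csq where csquare = ? "
    "order by scene_id", ("3414:100",))]
print(hits)   # ['scene_A', 'scene_B']
```

Rows are fixed-length pairs, so scene addition and deletion are simple inserts and deletes; the cost, as the slide notes, is sheer table length.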
Present philosophy …
• Currently going with Option 3: a table of scene/c-square pairs, 1 row per pair, indexed on both columns.
• However, will consider splitting this along spatial or date lines if performance degrades excessively as the index grows (= Option 4); may then also add Option 1 in parallel, for rapid footprint retrieval if needed.
Actual table structure – 2 “search” tables only
• Scene details (1 row per scene)
• Scene/c-square pairs (600-700 rows per scene)
Spatial Search Interface (prototype): www.marine.csiro.au/csquares/satsearch.htm
Clickable map interface
Example search result
SQL for spatial search (example for 0.5 degree search square)

select distinct A.scene_id, B.satellite, B.scene_date_time, B.image_location
from satdata_csq_all A, satdata.scene_info B
where ( (sqrsize = 0.5
         and (A.csquare = search_csq                           -- e.g. 3414:100:1 (0.5 degree square)
           or A.csquare = substr(search_csq,1,8) || ':*'       -- 1 level of compression
           or A.csquare = substr(search_csq,1,6) || '**:*'     -- 2 levels of compression
           or A.csquare = substr(search_csq,1,4) || ':***:*')  -- 3 levels of compression
        )
        -- (plus other supported search square size options go here)
      )
  and (startdate is null or B.scene_date_time >= startdate)
  and (enddate is null or B.scene_date_time <= enddate)
  and (sat = 'any' or B.satellite = sat)
  and A.scene_id = B.scene_id
order by B.scene_date_time, B.satellite;
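The four OR branches in the WHERE clause expand a 0.5-degree query square into every compressed form under which it might be stored. The same expansion in Python (an illustrative helper mirroring the substr() logic, not part of the original system):

```python
def search_terms(csq):
    """Compressed-code variants to match for a 0.5-degree c-square such as
    '3414:100:1' (mirrors the substr() branches in the SQL)."""
    return [
        csq,              # uncompressed 0.5-degree code
        csq[:8] + ":*",   # 1 level of compression (whole 1-degree square)
        csq[:6] + "**:*", # 2 levels of compression (whole 5-degree square)
        csq[:4] + ":***:*",  # 3 levels of compression (whole 10-degree square)
    ]

print(search_terms("3414:100:1"))
```

Because parent codes are prefixes of child codes, these four exact-match strings are sufficient to find the query square in any of its stored representations.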
This can be triggered by an HTTP request, e.g. as follows:
http://aqua.hba.marine.csiro.au:7272/marq/owa/satdata_search.get_scenes?sat=any&startday=20&startmon=06&startyr=2003&endday=22&endmon=06&endyr=2003&W_long=150&N_lat=-25&E_long=155&S_lat=-30
… Generates either an HTML result page (to browser), or a machine-parseable scene list, e.g.:
h1awj h1awl h1awm h1awo h1awr h1awt h1aww h1awy h1ax0 h1ax2 h1ax5 h1ax7 h1ax9 h1axb h1axd h1axe h1axh h1axj h1axl h1axn h1axo h1axr h1axt h1axw h1axx h1ay0 h1ay1 h1ay4 h1ay8 h1ay9 h1ayb h1ayc h1ayd h1aye h1ayg h1ayh h1ayj h1ayl h1ayo h1ayp h1ayr h1ays h1ayu h1ayw h1az1 h1az3 h1az5 h1az8 h1aza h1azg h1azj h1azl h1azm h1azn h1azp h1azq h1azt h1azw h1azz h1b00 h1b03 h1b04 h1b06 h1b09 h1b0a h1b0c h1b0d h1b0f h1b0i h1b0j h1b0n h1b0q h1b0t h1b0u h1b0w h1b0y h1b10 h1b12 h1b13 h1b15 h1b18 h1b1b h1b1e h1b1h h1b1j h1b1m h1b1o h1b1r h1b1u h1b1w h1b20 h1b21 h1b24 h1b26 h1b28 h1b2a h1b2b h1b2e h1b2g
Automated scene processing and upload
1. An automated routine prepares 2 sets of files, scene information and scene polygons, in batches of 100-300 scenes
2. For the first batch, a script initiates file upload to the database, then file conversion (c-square generation) for each scene in the batch (currently takes c. 3 minutes/scene)
3. When all scenes have been converted, the c-squares are added to the master searchable table, and the scenes are flagged as “upload completed”
4. Close to the estimated completion time, a robot wakes up and checks at 10-minute intervals to detect when the job is complete (via an HTTP call to the same procedure which creates the web scene count on the opening screen)
5. When the scene count returns “0 scenes currently being uploaded”, the robot calls the script to repeat stages (2)-(5) with the next batch of scenes.
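Steps 4-5 above amount to a simple polling loop. A minimal sketch, with the status check injected as a function (the names, return values, and structure are hypothetical; in the real pipeline the check is an HTTP call to the scene-count procedure):

```python
import time

def wait_for_upload(check_count, poll_seconds=600, max_polls=100):
    """Poll until the 'scenes currently being uploaded' count reaches zero.
    check_count stands in for the HTTP status call; poll_seconds defaults
    to the 10-minute interval described above."""
    for _ in range(max_polls):
        if check_count() == 0:
            return True          # batch complete; caller starts the next one
        time.sleep(poll_seconds)
    return False                 # gave up; batch still converting

# Simulated status: two checks find scenes still converting, then done
remaining = [2, 1, 0]
done = wait_for_upload(lambda: remaining.pop(0), poll_seconds=0)
print(done)   # True
```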
C-square encoding algorithm
• “First generation” conversion code constructed January-May 2003, at CMR Hobart
• Currently runs in Oracle PL/SQL, reading/writing from/to 6 Oracle tables (including compression of the c-square string for each scene)
• Includes special treatment for:
  – scenes crossing the date line
  – scenes including a pole
• Polar X-Y coordinates are used to interpolate between quoted boundary points (gives a better representation of the southernmost boundary)
• Special “anti-bleed” treatment incorporated (polygons touching, but not entering, a given square do not trigger that square)
• Could probably benefit from rewriting the code in a different language for faster operation, and/or refining the method of operation.
Next steps …
• Keep a watch on index performance (i.e. retrieval times) as database size increases
• Assess whether this approach is attractive cf. systems already in use elsewhere – and whether it is of interest to others
• Look into improving the speed of the encoding procedure (currently a bottleneck for processing a large archive).


