Скачать презентацию Database Design and Decisions GISC 6383 Management and Скачать презентацию Database Design and Decisions GISC 6383 Management and

991b204acc5ce7f5b2ff005757fe4233.ppt

  • Количество слайдов: 59

Database Design and Decisions GISC 6383 Management and Implementation of GIS Note: some figures Database Design and Decisions GISC 6383 Management and Implementation of GIS Note: some figures in this document are adapted from: State of New York GIS Development Guides ftp: //ftp. sara. nysed. gov/pub/gis/sara. zip chapter 3: conceptual design chapter 6: database planning and design chapter 7: database construction Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 1

Reality Object Oriented Analysis and Design (OOAD) Data Base Design Process: steps (Review from Reality Object Oriented Analysis and Design (OOAD) Data Base Design Process: steps (Review from earlier lecture. ) Conceptual 1. Conceptually Model USERS view ESRI Steps for Building Geodatabase 1. Model the user’s View of the data (Tomlinson Chapter 6, 7) 2. Logically Model the Database Schema in UML Diagram (using Visio) Define data entities and their relationships – Identify representation of entities and match to spatial data model (Tomlinson Chapter 8, & 9 thru p. 136) 3. Physically Model the Data Geodatabase Generation and Population 2. Define objects and relationships – Create physical database design for selected software (Oracle, Arc. GIS, etc) 4. Design Process for obtaining and converting Data from source Physical 3. Select geographic representation. [ 4. Match to geodatabase elements 5. Organize geodatabase structure (Zeiler, p. 18) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 2

 • We covered Conceptual Modeling/Needs Assessment in the lecture on Implementation Steps • • We covered Conceptual Modeling/Needs Assessment in the lecture on Implementation Steps • Tonight we focus on Logical Design and Physical Design • These are the guts of database design/data modeling Loosely corresponds to Tomlinson’s Chapter 8: Create a data design and Chapter 9 thru p. 136 Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 3

Objectives of Database Design: – Satisfies and supports organization’s objectives – Contains all data Objectives of Database Design: – Satisfies and supports organization’s objectives – Contains all data but no redundant data • Minimizes redundancy across the organization – Allows for different users to access same data • Consistent and flexible data retrieval and analysis – Accommodates different views of the same data • Based on user and application needs • Increases likelihood of users developing applications – Appropriately represents & organizes geographic features – Maintains the data so its currency is assured – Secures data by distinguishing applications (& users) which • create data (add records for new entities) • update data (maintain & modify existing data records) • read data (use but can’t modify in any way) • delete data (remove records from database) (CRUD) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 4

Logical Data Modeling Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 5 Logical Data Modeling Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 5

Why build a model? • We build models of systems in order to break Why build a model? • We build models of systems in order to break them up into well-defined subsystems, because it helps us to overcome the difficulties in comprehending such systems in their entirety. • As the complexity of systems increase, so does the difficulty of comprehending them and the importance of good modeling techniques to assist us in managing them. • Good models with well-defined semantics are essential for communication among project teams and to assure architectural soundness. In other words, developing a model for an industrial-strength software system prior to its construction or renovation is as essential as having a blueprint for a large building. Luis X. B. Mourão http: //cplus. about. com Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 6

Logical Data Modeling – It is an entirely data-driven process – It encourages comprehensive Logical Data Modeling – It is an entirely data-driven process – It encourages comprehensive understanding of business information requirements – It enables effective communication among designers, developers, and users throughout the design process – It forms the basis for designing correct, consistent, sharable, and flexible databases using any database technology • Correct – an accurate and faithful representation of the way information is used in the business • Consistent – no contradictions in the way information objects are named, defined, related, & documented • Sharable – accessible by multiple applications and users to meet varying access requirements • Flexible – easily updateable when implementation or business changes Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 7

Logical Data Modeling in practice • purpose of data model is to ensure that Logical Data Modeling in practice • purpose of data model is to ensure that data has been identified in a completely rigorous and unambiguous fashion on which both user and GIS analyst agree • logical data models define entities (the unit about which we collect information—e. g. people, companies), the attributes of entities (the information collected—age, salary), and the relationships between entities (companies pay salaries to people) • developed through use of – entity-relationship diagrams which show relationships among all data throughout the organization – data dictionaries (structured lists) which document each entity, its attributes, and its relationships Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 8

Entity Relationship (E-R) Diagrams and Logical Data Modeling • Entity (noun) – objects or Entity Relationship (E-R) Diagrams and Logical Data Modeling • Entity (noun) – objects or things to be included in the database – person, place thing or concept about which you wish to record info. – for example: employee, company, citizen, street • Attribute (adjective) (but they are often nouns—e. g. a name!!) – characteristics or measurements to be recorded for the entities – fact or nondecomposable piece of information describing an entity – for example: age, dob, owner, street type • Relationships association between entities (verb) employee---works for----company company----has-------employees transformers ----are mounted on----poles land parcel---has----owner • Cardinality of Relationships (“adverb”) – one to one Country ----has--- Capital city (one capital city per country) – one to many Company (one) <----Work for---- ---has------> (many) Employees – many to many Parcels < ----have----> owner s (parcels >1 owner; owners>1 parcel) • Business Rules (attribute domains and validation rules) – requirements that attributes or relationships must meet – specifications which preserve the integrity of the logical data model by governing the values attributes may assume or the cardinality relationships may take on – For example: # of kids is an integer between 0 and 24; poles have 0 to 3 transformers Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 9

Name Age Conventional E-R Diagram Employee Works in Department Sex Function Size Job Title Name Age Conventional E-R Diagram Employee Works in Department Sex Function Size Job Title Has Name Age Occupant_Name Dependents Unit_Number Relationshlip to employer GIS E-R Diagram Building_Name Height Owner_Name Building Located on Parcel Owner_Address Situs_Address Floor_area ID # Has Occupant_Name Occupant Unit_Number Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 10

Entity or Attribute? owner as example Perennial problem in modeling Parcel PIN Owner This? Entity or Attribute? owner as example Perennial problem in modeling Parcel PIN Owner This? Parcel PIN Owner Name Or this? Some rules for attributes • Primitives: no meaning in itself – e. g 50 • All values of same kind • Do you think of it as a number or text – e. g all integers: 4, 6 not 4, SIX >>>attribute • Describe entity not another attribute • Does it have attributes of its own – e. g. owner name describes owner >>>entity – Owner has name, billing address, of parcel, not the parcel • Does it have a relationship with other • Never a list: e. g. owner: John Smith, Bill Jones things >>>entity – “Owner receives tax bills” • Never repeated • Does it repeat elsewhere >>>entity e. g. owner 1, owner 2, owner 3 – Owner could own many parcels • Undecomposable or composite? • Is it important by itself >>>entity e. g. 117 West Plano Road – An owner IS important Exceptions are always made!!! If in doubt, make it an entity, Can change later in physical model e. g. in 911 when speed matters Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 11

…. . . data dictionaries (structured lists) document each entity, its attributes, and its …. . . data dictionaries (structured lists) document each entity, its attributes, and its relationships Entity Definition Entity Name: Building Definition: A unique structure with roof Unique identifier: Bldg_Id Attributes: Bldg_Id, Bldg_Name, Floor_Area, Height Relationships: located on parcel has occupant Entities which have spatial expression need additional conceptualization… …see following slides When using UML techniques to create Entity-Relationship diagrams, entities, attributes and relationships can all be documented within the diagram. Data dictionaries become part of the metadata. Recording measurement units (meters, feet) for attributes is critical and problemattic. Can include in: --name: length_in_feet (clumsy) --metadata (metadata gets lost, forgotten, or separated from data) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 12

Spatial Data Modeling Spatial data differs in two key ways, and these must be Spatial Data Modeling Spatial data differs in two key ways, and these must be incorporated: – entities have a corresponding spatial expression. In ESRI terms, • Objects: entities without spatial expression (e. g. owner) • Features: entities with spatial expression (e. g. parcel) • An entity, when given spatial properties, becomes a GIS spatial data layer – relationships may have a spatial expression also Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 13

…in essence, you need to decide how to represent each entity spatially Note that: …in essence, you need to decide how to represent each entity spatially Note that: • • • Point or a point symbol Line or line types Areas or polygons Surfaces or surface drapes Raster format such as scanned paper documents • Images such as photos, satellite or clip art • or plain old Alpha or numeric • one entity could be visualized through another – footprint on a lot map • might be different at different scales – airport a point at one scale and polygon at another • might be different for two applications – street as line for routing – street as polygon for pavement management • attributes might be displayed graphically as annotation Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 14

…. . identifying entities’ spatial representation (examples) • Property associated – – – – …. . identifying entities’ spatial representation (examples) • Property associated – – – – Legal parcel Assessor parcel Parcel boundary plat map Parcel photograph Owner Address Land value • Street associated – – – – Street segment Intersection Traffic light Traffic analysis zone Bus route Bus stop – – – – polygon line string raster image alphanumeric – – – – line string line segment node point polygon route point pixels abcdef 123 110210. 67 in essence, you need to decide how to represent each entity spatially (And these will need to be supported by the GIS software you select) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 15

Additionally, you will need to identify the classic spatial data properties for each layer Additionally, you will need to identify the classic spatial data properties for each layer • Scale ranges at which data is required to be displayed • minimum resolution required to support intended applications • minimum accuracy required to support intended applications • Projection(s) in which data will be – stored and – used (may not be the same) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 16

…identifying spatial relationships, and how they will be implemented Spatial Relationships Spatial Relationship Descriptive …identifying spatial relationships, and how they will be implemented Spatial Relationships Spatial Relationship Descriptive Verbs Common GIS E-R Model Implementation Symbol Connectivity Connect, link Topology Contiguity Adjacent, abutt Topology Containment Contained, containing, within Spatial join operation Proximity Closest, nearest Spatial join operation Coincidence Coincident, Coterminous Spatial join. operation street segments link to street network cities common border lot within floodplain house nearest fire_hydrant Valve and gauge same manhole Possible ER symbol (not standardized) How GIS system might implement the relationship Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 17

 CENSUS BLOCK ZONING CONTAINS POLYGON G T POLYGON SOILS CONTAINS G POLYGON G CENSUS BLOCK ZONING CONTAINS POLYGON G T POLYGON SOILS CONTAINS G POLYGON G T POLYGON G G T COINCIDENT LINE T STREET SYSTEM T FLOODPLAIN T CENSUS TRACT CONTAINS POLYGON G TRAFFIC ZONE CONTAINS PARCEL POLYGON G ADJACENT T POLYGON G INTERSECTION ABUTTING STREET SEGMENT POLYGON STREET T SEGMENT LINE G LINK T G T INTERSECT NODE G T WETLANDS POLYGON G T CONTAINS WITHIN VALVE Contains: --15 entities --16 relations Attributes not included for simplicity NODE BUILDING WATERMAIN POLYGON G LINE G G T NEAREST ADDRESS LINK T HYDRANT NODE HAS OCCUPANT G T LINK HAS WATER SERVICE CONNECTION LINE G T Corrected from original source Example E-R Diagram with Spatial Concepts for Urban Application Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 18

E-R Diagram Final E-R diagram should be verified with users for: • required entities E-R Diagram Final E-R diagram should be verified with users for: • required entities • appropriate spatial representation of entities • required attributes • appropriate spatial relations/operations Once verified, the E-R diagram becomes the basis for the physical data base design. Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 19

Creating E-R diagrams: Unified Modeling Language • E-R diagrams can be drawn by hand Creating E-R diagrams: Unified Modeling Language • E-R diagrams can be drawn by hand or with any drawing package • CASE (computer aided software engineering) tools emerged in the 1980 s and early 90 s) to aid in the process (e. g. TI’s Composer) • Each used their own proprietary language and symbolization • UML (Unified Modeling Language) developed in mid/late 1990 s to provide standardized modeling language based on objectoriented concepts. – Initiated at Rational Software Corporation in 1994/95 by merging of Grady Booch’s (Booch Model), Jim Rumbaugh’s (OMT--Object Modeling Technique) and Ivar Jacobson’s (OOSE--Object-Oriented Software Engineering) method • Existence of UML standard allows data base vendors to support automated conversion of conceptual data models to physical data base designs • Arc. GISs support use of MS Visio 2000/02/03 (enterprise edition [2000], professional [02/03] for full support) [see Zeiler, p. 19 -20] – Sample geodatabase schemas (templates) available for different industries • Other UML based products: Rational Rose, Paradigm Plus, Oracle 2000, ERwin (from Computer Associates) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 20

Physical Data Modeling • Creating physical database design for the selected database software – Physical Data Modeling • Creating physical database design for the selected database software – Oracle, SQL Server, Arc. GIS, Intergraph, etc • We will focus on physical database design for an ESRI Geodatabase Loosely corresponds to Tomlinson’s Chapter 8: Create a Data Design Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 21

…physical data base design involves For entities and attributes • • Representing (“mapping”)* all …physical data base design involves For entities and attributes • • Representing (“mapping”)* all entities and their attributes into one or more relational tables in a selected RDBMS and determining keys forming relationships For each spatial entity, selecting an appropriate – GIS data type e. g. polygon, line, point, surface – GIS data set format for storage: e. g. geodatabase, coverage, grid, tin, image, etc – Spatial Reference System • coordinate system (geographic or projected, etc. ) • Spatial domain • Precision (or resolution) • Selecting an appropriate Data Type for each field (attribute) – e. g. for ESRI: string, short integer, long integer, float, double, blob. For relationships (associations) • for regular (non-spatial) relationships, identifying which of the RDBMS’s normal query structures or relational operators will handle the relationship – if won’t do it, develop specs for a custom application – See Appendix for discussion of relational operators • for spatial relations, identify which capabilities of software will handle the desired operation (e. g. nearest neighbor identification) – if won’t do it, develop specs for a custom application *Note: Computer people talk about “mapping” entities to tables rather than “representing”. Mapping in this sense does not mean producing cartographic output!

Conceptual E/R diagram PARCEL BUILDING CONTAINS POLYGON Physical Database design LAYER (Primary key=foreign key) Conceptual E/R diagram PARCEL BUILDING CONTAINS POLYGON Physical Database design LAYER (Primary key=foreign key) PARCEL TABLE PARCEL ID # (one) table join BUILDING LAYER TABLE BUILDING ID # (primary key) PARCEL ID # (foreign key) (many) Example of “mapping” a relationship for RDBMS attribute tables --the relation contains is mapped to a table join --the building table has a foreign or secondary key, (parcel ID#) to permit a join with the primary key (also parcel ID#) in parcel table The spatial data structure is shown as two generic layers Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 23

spatial expression represented as an Arc. Info polygon coverage Example of Spatial Database “Mapping” spatial expression represented as an Arc. Info polygon coverage Example of Spatial Database “Mapping” for Arc. Info coverage Parcel POLYGON G T ARC/INFO Spatial Database Structure (coverage) INFO ARC Parcel AAT TIC BND ETC. PAT=Polygon Attribute Table PAT Area Perimeter Poly ID # Sub_bl_lot# Attribute List of Entity "Parcel" Parcel [subdivision_block_lot#, owner_name, owner_address situs_address, area, depth, front_footage, assessed_value, last_sale_date, last_sale_price (owner_name, owner_address, assessed_value as of Jan. 1 for last two years)] Attributes represented in Oracle Tables Parcel ID# Owner_name Owner_add Situs_add Depth Front_footage Assessed_value Last_sale_date Last_sale_price Previous Values Parcel ID# Year Owner_name Owner_address Assessed_value What is wrong with this design? A key field for parcels is formed by concatenating subdivision, block & lot #s

VALVE WATER MAIN LINK Spatial: geodatabase Feature data set Feature class HYDRANT Example of VALVE WATER MAIN LINK Spatial: geodatabase Feature data set Feature class HYDRANT Example of Database Mapping: for geodatabase WATER MAIN ID # Attribute: SQL-Server TABLES WATER MAIN ID # VALVE ID # HYDRANT ID # Design decisions: • why distribmains and transmains as separate feature classes? • why not valve with gate and hydrant subclasses? (--different attributes) • why prodwell 1 and prodwell 2? HYDRANT ID #

ESRI-related DB Design Decisions Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 26 ESRI-related DB Design Decisions Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 26

ESRI DB Design Decision Overview • Type for spatial data layer – Vector (point, ESRI DB Design Decision Overview • Type for spatial data layer – Vector (point, line or polygon), raster, tin ? • Format for spatial data – Coverage, shapefile or geodatabase ? Geodatabase Design Decisions • Feature Datasets – Stand-alone feature classes or feature data sets (fds)? – Spatial Reference system for each feature data set • Coordinate system: lat/long or projected? which projection? what parameters? • Spatial Domain (extent) and Precision ? • Feature Classes (tables) – Subtypes or separate feature classes? • roads feature class with road_type subtype, or separate freeway, arterial, street feature classes? • Attributes Types and Validation – Type (string, long integer, short integer, etc. ) ? – Validation Rule through application of Attribute Domain ? • Domains and Defaults • Relationships and Associations between feature classes – Implement Geometric networks and/or topology rules? – Implement relates or joins in the database or in Arc. Map documents? • Type of Geodatabase – Personal geodatabase or SDE-based geodatabase ?

Spatial Data Types Overall, Arc. GIS 9 supports at least four representations of geographic Spatial Data Types Overall, Arc. GIS 9 supports at least four representations of geographic data. • Vector data for representing features – CAD, Coverages, shapefiles, geodatabases • Raster data for images, and surfaces. – Image data in . bmp, . tiff, . jpeg, . sid, ERDAS formats – Raster data in discrete or continuous GRIDS (ESRI’s native file format for raster) • Discrete grids can have an attribute data table, continuous do not – Raster data in a Geodatabase (as of Arc. GIS 9) • Triangulated Irregular Networks (TINS) for surfaces. – Although TINS are a vector format, as of Arc. GIS 9. 1, they are not yet supported by the Personal Geodatabase and must be stored in coverage workspaces or SDE geodatabase • Tabular data (sometimes called Event tables). – List of X, Y coordinates for points (such as may be output from GPS) – “Locators” for finding a geographic position from an address. A decision must be made as to the spatial data type for each layer. Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 28

Spatial File Formats--example Personal Geodatabase In a gdb, feature class can have Feature data Spatial File Formats--example Personal Geodatabase In a gdb, feature class can have Feature data set only one feature Feature class (feature type = polygon) type. Feature class (feature type = arc) Coverage (= feature class) A coverage can Feature type (arc) have multiple feature types. Feature type (point) now viewed as a Feature type (polygon) shortcoming. Feature type (point) Coverage (= feature class) Feature type (arc) Feature type (point) Tracts feature class table Locator (table) Raster Shapefile Features (rows) Feature ID Feature (key field) type Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation Secondary or Foreign key 29

Geodatabase Design Decisions—example Texas geodatabase Dallas County feature data set feature classes All feature Geodatabase Design Decisions—example Texas geodatabase Dallas County feature data set feature classes All feature classes extent within a feature data datum set must be in the projection same spatial reference system Plano feature data set feature class Stand-alone feature classes Each stand-alone feature class may be in a different spatial reference system US Geodatabase Stand-alone feature class Rasters and TINs can be stored within a SDE geodatabase but not a personal geodatabase Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 30

Anatomy of a Geodatabase Feature datasets Spatial Reference Object classes and subtypes Feature Classes Anatomy of a Geodatabase Feature datasets Spatial Reference Object classes and subtypes Feature Classes and subtypes Relationship classes Geometric Network Topology Domains Validation Rules Raster Datasets (SDE Only) rasters TIN datasets (SDE only ) nodes, edges, faces Locators addresses x, y locations Zip codes place names route locations Geodatabases contain: feature datasets, raster datasets, TIN datasets (planned 9. 2), locators Feature datasets contain various objects which all share a common spatial reference Objects (e. g. Jane Blow, land owner) are instances of object classes (e. g. land owners) and have no spatial form. Features, stored in feature classes, are spatial objects (e. g. land parcels) which are similar and have same spatial form (e. g. polygon) Object (or feature) classes are tables, with objects (or features) in the rows of the table Attributes are in the columns of the table Subtypes are an alternative to multiple object (or feature) classes (e. g. ‘concrete’, ‘asphalt’, ‘gravel’ road subtypes): think of subtype as the most significant classification variable (attribute) in the class table Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 31

Anatomy of a Geodatabase contd Geodatabase Feature datasets Spatial Reference Object classes and subtypes Anatomy of a Geodatabase contd Geodatabase Feature datasets Spatial Reference Object classes and subtypes Feature Classes and subtypes Relationship classes Geometric Network Topology Domains Validation Rules Raster Datasets (SDE only) rasters TIN datasets (SDE omly) nodes, edges, faces Locators addresses x, y locations Zip codes place names route locations Relationship classes are tables containing general relationships between objects and/or features (e. g. between work order object class and roads feature class) Geometric Networks models flows thru linear systems such as streams, sewers, raods Topology models relationships among lines and areas (e. g. common state/county boundary) Domains are sets of valid and/or default attribute values: (e. g. road lane count default is 2; valid values are integers 1 -12 ) Validation rules control feature and attribute integrity by applying domains. 3 types: attribute rules (applied to attributes or subtypes e. g gravel road 1 or 2 lanes only); connectivity rules (e. g. gravel road cannot connect to freeway); and relationship rules (constrains cardinality of a relationship e. g. gravel road can have no more than 4 segments at an intersection) Simple behaviors are realized thru domains and validation rules Complex behaviors and custom objects are realized by extending rules with custom programming 32 Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation

Feature classes (FC), feature datasets (fds) and subtypes • Spatial features (e. g. a Feature classes (FC), feature datasets (fds) and subtypes • Spatial features (e. g. a land parcel) are grouped into feature classes: a table with spatial data – Data in FC must have same topology type (all points, all lines, all polygons) • Water feature class with lakes (polygon) and streams (line) not permitted – Minimizing the number of feature classes improves performance • Use different feature classes only when attributes are significantly different – Use roads feature class rather than freeway, arterial, streets feature classes – Use subtype to differentiate freeway, arterials, streets (all have similar attributes) • Subtypes are “subclasses” within a feature class that allow you to further distinguish objects without creating new feature classes – based on a single column’s values (must be integer or long integer) – Same subtype has similar attribute values and behaviors – Use where attributes are the same across all subtypes • Feature classes can be grouped into feature datasets (fds) or “spatial folders” – All feature classes in a fds must have the same spatial reference system, but may have different topology (can have points and lines and polygons in same fds) – Organize by thematic similarity e. g transportation – If you wish to create a geometric network, must be in same fds – If you wish to create topology, must be in same fds – If they share geometry (street forms political boundary), should be in same fds – Security (read/write permissions, etc. . ) applied at the fds not the fc level!!!! Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 33

Data Types for Attributes • For every attribute field, must select a data type Data Types for Attributes • For every attribute field, must select a data type • Each RDBMS stores data slightly differently • ESRI generic data types will translate into closest RDBMS equivalent • Values given below may differ with RDBMS used ESRI Generic Data Types String: text field. Be sure its length (number of characters), absolute or what you specify, is sufficient to record longest data value. Short Integer: (or integer) whole numbers (no decimal point) generally +/-32, 767 (2 bytes). OK for size of family, not OK for city size Long Integer: (or long) only supports integers to +/- 2, 147, 483, 647 (4 bytes) Float: (or single) single precision floating point; again, be careful-- supports decimal point but perhaps only 6 digits long with decimal moveable 34 places (E 34) (4 bytes) Double: double precision floating point; the safest-- supports 12 -15 digits with decimal moveable up to 308 places (E 308) (8 bytes) Blob: binary long decimal for special programming applications Note terminology: • Precision: the total number of digits (before plus after decimal) • Scale: number of digits after decimal Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 34

Domains and Defaults Why Use Them? • Data Integrity: prevents entry of invalid (“obviously Domains and Defaults Why Use Them? • Data Integrity: prevents entry of invalid (“obviously wrong”) data values • Data Efficiency: choose from a set of valid values rather than type in each time Domains define a set of legal values for a field’s attributes • Range domain: specifies a valid range of values for numerical attributes – A water pipe must be between 1 and 100 inches wide • Coded value domain: specifies a valid set of values for an attributes. Can apply to any type of attributes – Parcels can only have RES or VAC land use values • Domains are defined as a geodatabase property & then applied as appropriate – Multiple objects in the same database may use the same domain – May be applied to an entire field (attribute), or separately by subtype Defaults are values automatically assigned when a feature is created – Of course, may be changed during data entry/edit process – Again, may be applied to an entire field (attribute), or separately by subtype Again, the physical design process requires decisions about domains and defaults, and to what they should be applied. Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 35

Implementing Associations and Relationships Associations (general term) may be implemented as • A relationship Implementing Associations and Relationships Associations (general term) may be implemented as • A relationship class within the GDB. – – – • both classes (tables) in the relationship (i. e. the related tables) must be within the GDB permanent part of the data model and relationship rules are enforced Is itself a class, stored in tables, with properties and behaviors support 1: 1 and 1: M cardinality, and many to many using a key class table. Strictly speaking, everything else is an “association” An Arc. Map relate – – – Functionally similar to a relationship class supports both 1: 1 and 1: M cardinality Can form a relate to tables outside the GDB • • – • Local to the Arc. Map document therefore is essentially temporary An Arc. Map join – – – Supports 1: 1 cardinality only Links matching objects and visualizes them as rows in a single table Can match objects either • • • INFO tables, Access tables, dbf tables Tables in other databases via ODBC (object data base connectivity) --Use relationship class when the referential integrity of data is important --Joins are simpler and require less overhead. Non-spatially using key attribute fields in each table: exists only temporarily Spatially using containment, nearness, etc. criteria: saved as new feature class or new shapefile An Arc. Map hyperlink – Attribute in a table stores a hyperlink to a document outside GDB • path name to file-based documents (spreadsheets, text, photos, video or sound clips, etc. ) • URLs for Internet documents – Not part of the GDB in any way; implemented via layer properties in Arc. MAP Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 36

Some Special Case Relationships • Many to Many relationships – Many class A objects Some Special Case Relationships • Many to Many relationships – Many class A objects match many class B (parcels & owners) – Implemented via an “attributed relationship” – Intermediate table is created to store the relationship Ownership Parcel Owner Destination PK Origin FK Destination FK • Aggregation v. Composition and Simple v. Complex Relationships – Aggregation: e. g. dog has bowl, collar valve has valve box • Aggregation implemented through simple relationships • Peer to peer: delete one, the other remains: Dog dies but bowl and collar remain – Composition: e. g. dog has feet, tail valve has maintenance records • Composition implemented through complex relationships • Enforced dependency: delete one, and the other goes also: Dog dies, feet and tail gone Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 37

Spatial Reference All feature classes within a feature dataset must have the same spatial Spatial Reference All feature classes within a feature dataset must have the same spatial reference. • Coordinate System – Datum – Geographic (lat/long) or projected? – Projection parameters: central meridian, standard parallels, coordinate system origin (false easting and northing) – Measurement (map) units: dd (for lat/long); feet, meters, etc. (for projected) • Spatial domain – The allowable coordinate range for the geographic coordinates • X/Y Domain: Min. X, Max. X, Min. Y, Max. Y (horizontal extent) – Domain defaults to 3 times the actual data extent (100% on either side) • Z Domain: Min, Max (vertical extent) • M Domain: Min, Max (other parameter, e. g. distance from river mouth ) (can differ within feature data set) – Once created, the spatial domain for feature dataset/class cannot be changed. – Data outside domain will require a new feature dataset or standalone feature class. • Precision – Number of system storage units (SU) per one map measurement unit (MU) • If precision is 1 and mu= 1 meter ( 1 SU per MU), cannot record values less than 1 meter • If precision is 100 and mu= 1 meter (100 SUs per MU), can record values to 1/100 =. 01 = 1 cm 38 Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation

Precision and Spatial Domain Wilson NC data 793707 X values 137, 438 Domain range Precision and Spatial Domain Wilson NC data 793707 X values 137, 438 Domain range in y 137, 438 47734 745972 41970 704002 52555 2249613 Extent range in y Since map units are feet, will support accuracy to 15624/12= 1, 302 th of an inch!!! Domain range in x 52555 32328 2302168 Extent range in x Precision is: 15, 624 2334496 2387052 Wilson. City. shp Shapefile Extent (range of actual data when fds created) Geodatabase Spatial Domain 47734 100%-200% wider on all sides therefore Domain range is at least 3 times Extent range 656268 Y values (Exact amount depends on Precision) You have 10 significant digits to work with. Precision in essence controls were you put the decimal. If map unit is meters and precision is 1000, you record down to the nearest millimeter. E. g. the map value 1, 123, 456. 1236 is stored as 1, 123, 456, 124 With GRS 80, world circumference is 40, 075, 016 m. therefore can map world at approx. 1. 9 cm accuracy (40, 075, 016*100)/ 2, 147, 483, 648 --Geodatabase coordinates are stored as 4 byte long integer. This provides 10 significant digits with max value of 2, 147, 483, 648 -- map value is multiplied by precision when stored (and converted back when displayed), so min Y values, for example, actually stored as 793707 x 15624=1, 2400, 888, 268 -- the values are also shifted when stored so that data is centered in storage space so you only have to ensure that max. range times precision is less than 2, 147, 483, 648 137438 x 15624=2, 147, 331, 300 < 2, 147, 483, 648 OK, otherwise reduce domain or precision 39 Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation

Personal versus Multi. User (Arc. SDE) Geodatabase • Personal Geodatabase. – Implemented as a Personal versus Multi. User (Arc. SDE) Geodatabase • Personal Geodatabase. – Implemented as a Microsoft Access database (*. mdb file) by using MS Jet engine which is installed along with Arc. GIS 8. – Microsoft Access license not needed, but its handy to have for attribute data development – Can be placed on local or network drives – if on network drive and a user has edit access, other users can’t access (single user editing). – Intended for personal or small work-group use – can handle small to moderately sized datasets. • Max of 250, 000 features per feature class (table) • maximum size is 2. 0 GB – In general, has the full functionality of Arc. SDE geodatabase except • versioning. • Multi-user editing – If a personal ggdb is deleted in Arc. Catalog (or by Windows Explorer) , its gone. • One. mdb file can contain a lot of data. Be carefull!! Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 40

Personal versus Multi. User (Arc. SDE) Geodatabase • Enterprise Multiuser Arc. SDE Geodatabase – Personal versus Multi. User (Arc. SDE) Geodatabase • Enterprise Multiuser Arc. SDE Geodatabase – Arc. SDE is a data access extension to Arc. GIS 8 & 9 that serves geodatabases to Arc. GIS applications running on PC’s connected via a TCP/IP network. – supports concurrent editing by multiple users. • Supports versioning where multiple users can concurrently edit different versions of a layer and any conflicts are resolved when versions are saved back to the original layer – significantly higher speeds for data access than shapefiles or personal geodatabases – Supports very large databases without the need to tile or otherwise ‘subdivide’ the data – Arc. Catalog only creates and deletes connections to Arc. SDE geodatabases, it can’t delete the database – Can be deployed on UNIX or Windows NT. • Many use UNIX platform for Arc. SDE and DBMS, and XP for GIS applications – Arc. SDE is centrally tuned and managed by a DBA. – Back-up and security procedures implemented in the DBMS apply to the GIS data. – Can build SQL applications to access tables in a remote geodatabase. – Requires a server with a DBMS (e. g Oracle, SQL Server) and Arc. SDE. GIS User server SDE db Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 41

Generating Geodatabase Schema • “Schema” is the definition of objects contained within a database Generating Geodatabase Schema • “Schema” is the definition of objects contained within a database – For a geodatabse, objects may include Domains, Tables, Feature. Classes, Relationships or Geometric. Networks. • Four solutions for schema generation/management in Arc. GIS 9 1. 2. 3. 4. Manual creation using Arc. Catalog, or Design from scratch in Microsoft Visio, output to XMI* Repository, and use Arc. Catalog's Case Tool to import from XMI As above, but begin with one of ESRI existing Sample Data Models**, and edit Geodatabase. Designer extension (free Arc. Script downloadable from www. esri. com) which can be used in conjunction with first 2 or, in some cases, as an alternative. **ESRI has sample data models for a variety of areas: *XMI is a new standard for storing object --hydro, parcel, transportation, utilities, etc. . models. Supported by Visio 2002 and later --download from Web site Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 42

Pros and Cons • Arc. Catalog Menu and Wizards – Simple and easy to Pros and Cons • Arc. Catalog Menu and Wizards – Simple and easy to use; documented within Arc. Catalog – Easy to introduce gross errors in the schema through either omission or addition • Visio(UML) and the Arc. Catalog Case Tool – UML (Unified Modeling Language) is a standard for modeling object-oriented objects and their properties, thus can be applied to other databases – object inheritance significantly reduces duplication – Uses Visio's strong graphical functionality for an easy way to visualize the design. – However, • need to buy it • More complex with steep initial learning curve • only supports a subset of geodatabase properties • Geodatabase Designer (GD) – – Fast and free to use and distribute (although not officially supported) Supports all geodatabase properties and all Arc. SDE RDBMS's. The only bi-directional solution : schema can be EXPORTED and IMPORTED However, • A proprietary solution which only works with ESRI geodatabase • Only displays in html text, but can use Geodatabase Diagrammer (another free extension) to display in Visio Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 43

Database Normalization • An important step in the physical design is database normalization • Database Normalization • An important step in the physical design is database normalization • Developing a table structure which: – Reduces or eliminates redundancy – Makes tables easy to manage – Simplifies changes in the future • There is an entire theory of database normalization – we don’t have time to go into it • Just present an example – The usual goal is to create a table structure which is in 3 rd normal form (3 NF) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 44

Unnormalized (flat file) 3 rd Normal form: --all fields are determined by primary key Unnormalized (flat file) 3 rd Normal form: --all fields are determined by primary key field See: Appendix II for more detail

Data Importing Vs. Data Loading • Importing – Creates new features within a new Data Importing Vs. Data Loading • Importing – Creates new features within a new feature class or geodatabase table. • The features class or table cannot exist before importing – Database schema is imported at the same time – Often involves conversion from other formats e. g. coverages • Loading – Appends features into an existing feature class. – Existing feature class must have the same schema as the data sources – Can be accomplished with: • Simple Data Loader (Arc. Catalog) • Object Loader Wizard (Arc. Map) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 46

Conclusion • The outcome of these steps is: – A rigorous design for our Conclusion • The outcome of these steps is: – A rigorous design for our database: a “database schema” – The design of a process for obtaining the data elements that will populate our database schema • Identifying a data source and the necessary processing sequence for each layer • covered in Implementation Steps lecture Next time, we will go into the lab and look at some of this in practice. This will involve many ESRI-specific design decisions as outlined in: dbdecisions. ppt Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 47

Appendix I DBMS Relational Operators “for regular (non-spatial) relationships (in the ER Diagram or Appendix I DBMS Relational Operators “for regular (non-spatial) relationships (in the ER Diagram or UML model), physical database design involves identifying which of the RDBMS’s normal query structures or relational operators will handle the relationship” Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 48

 • RDBMS: Relational Operators Select (or Restrict) – • retrieves a subset of • RDBMS: Relational Operators Select (or Restrict) – • retrieves a subset of rows from a table based on value(s) in a column or columns Project – • retrieves a subset of columns from a table, removing duplicates from the result Product – produces the set of all rows that are the concatenation of a row from one relational table with a row from another relational table (usually an intermediate step; not useful otherwise) – • Join – – • Union – • results in rows common to two (or more) relational tables Difference – • vertically combines (stacks) rows of one table with rows in the same or a different table Intersection – • horizontally combines (contatenates) rows in one table with rows in another (or the same) table, including only rows which meet some selection criteria relating columns of the two tables Combines product and select results in rows that appear in one table but not another Division – results in common values in one table for which there are other matching column values corresponding to every row in another table Examples follow in the next three slides…… Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 49

Base Tables Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 50 Base Tables Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 50

Base Tables Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 51 Base Tables Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 51

Terri’s job function is to check if employees of Big City X have taken Terri’s job function is to check if employees of Big City X have taken required courses in the city’s employee training program. The process is to compare courses taken with required courses. Terri can be replaced by a RDBMS Division relational operator! Base Tables The division operator identifies Fred and Karen Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 52

Appendix II Database Normalization Detail • Developing a table structure which: –Reduces or eliminates Appendix II Database Normalization Detail • Developing a table structure which: –Reduces or eliminates redundancy –Makes tables easy to mange –Simplifies changes in the future • Our goal is normally to have all tables in third normal form (3 NF) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 53

Unnormalized Data (Flat File) Unnormalized form (flat file) Not smallest meaningful value. How can Unnormalized Data (Flat File) Unnormalized form (flat file) Not smallest meaningful value. How can you sort by street, then number? repeating groups of fields. What if there are 3 (or 25) owners? • You work for the county. In this particular state, the county records land ownership, values property, and manages all elections held in the county. • Some of the information you need is shown in the flat file above • This format has many problems, some of which are pointed to above Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 54

First Normal Form (1 NF): • Each field contains smallest meaningful value • Parcel_ad First Normal Form (1 NF): • Each field contains smallest meaningful value • Parcel_ad split into two variables (street_no & street_name), thus can sort on street, then number • Owner_ad left as complex attribute ‘cos only used for mailing • No repeating fields (owner 1, owner 2, etc. . ) • There is now no limit on number of owners per parcel However, problems in that: • Must use multiple primary key fields (parcel_id and owner_id) to uniquely identify a record • multiple repeating values when there are two (or more) owners: Street_no, street_name, block, precinct, councillor, mayor, city, Owner_name, Owner_ad all have repeats. Wastes space, and • If an owner’s address changes, multiple records (rows) must be updated • If a parcel is sold, and the owner does not own any other property (e. g. Kroeger, Adams, M or Bertrand), information about that owner is lost

Second Normal Form (2 NF): concept of 2 NF and problem with 1 NF Second Normal Form (2 NF): concept of 2 NF and problem with 1 NF table • Second Normal Form (2 NF) requires that every non-key field (attribute) be “functionally dependent” on the primary key – Functional dependency is a relationship between attributes such that knowing one attribute automatically determines the other • Tables with multiple fields making up the primary key are not 2 NF – This usually shows up as repeating values in an attribute fields – For example, owner_ID repeats, and knowing the owner does not determine the councilor • knowing owner_ID (part of primary key) is 001, does not determine Councilor, which could be Jones or Smith. (1 NF table) Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 56

Second Normal Form (2 NF): example of 2 NF form (2 NF tables) • Second Normal Form (2 NF): example of 2 NF form (2 NF tables) • In each table, there is only one key field, and knowing its value determines the value of all other attributes – Satisfies criteria for 2 NF – Far fewer repeats and duplicate editing problems • Note that there are still shortcomings, for example – if the mayor of city “big” changes, we must update two records Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 57

Third Normal Form (3 NF) • 3 NF requires that no non-key field be Third Normal Form (3 NF) • 3 NF requires that no non-key field be a fact about another non-key field – This will be violated if there is transitive dependency in a table • Transitive functional dependency occurs when the value in a non-key field is determined by a value in another non-key field • The value for city determines mayor, (and neither of these is the key field) • In 3 NF, fields can only be attributes of the primary key, and not of some other field – Tables not in 3 NF usually have repeating values in a non-key field (e, g mayor field in PRECINCT table) – Mayor ‘Green’ is a fact about city (a non-key field), not about precinct (the key field) Precinct Table (in 2 NF) is split into Councilor and Mayor tables in 3 NF Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 58

4 th and 5 th Normal Form • • • Our goal is usually 4 th and 5 th Normal Form • • • Our goal is usually to have all tables in at least 3 rd normal form 4 th and 5 th normal forms also exist, but these can add disadvantages (for example, processing inefficiency) as well as advantages For example, 5 th Normal form has no duplicated data, but requires junction tables to link data and form relationships Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 59