
  • Number of slides: 29

Investigating Metadata for Long-Lived Geospatial Resources: An Exploration
By Nancy J. Hoebelheinrich
Metadata Coordinator, Digital Library Systems & Services, Stanford Digital Repository
NDIIPP Annual Partners Mtg, Arlington VA, 9 July 2008

To Be Discussed
• The study
  - Question asked
  - Methodology used
• Metadata (MD) standards' strengths / weaknesses
• Conclusions & recommendations
• Future work needed

The Study
• NGDA (National Geospatial Digital Archive) project:
  - Pertinent project objective:
    - Collect and archive major segments of at-risk digital geospatial data and images
  - Partners / backgrounds / areas of experience:
    - UCSB: Alexandria Digital Library / presentation
    - Stanford Libraries: Stanford Digital Repository / preservation
• Differences in experiences gave rise to the study question

Study Question
• What metadata is needed for long-lived geospatial data formats?
• Grounded in previous studies:
  - Hunolt paper for USGCRP Office
  - Digital Preservation Coalition (UK)
  - NSF
  - OAIS Reference Model
  - Duerr, Parsons articles
  - OCLC / RLG preservation studies

Methodology Used
• Evaluate four fairly typical geospatial data formats:
  - Shapefiles, DOQQs (digital orthophoto quarter quadrangles), DRGs (digital raster graphics)
  - Landsat 7 satellite images (preliminary)
• Compare / contrast 3 different approaches to documenting them:
  - FGDC Content Standard
  - CIESIN Geospatial Electronic Records (GER)
  - PREMIS

Categories of Information about the Resources
• Environment (computer platforms)
• Semantic underpinnings
• Domain-specific terminology
• Provenance
• Data trustworthiness
• Data quality
• Appropriate use

Environment (Computing Platform)
• Definition: characteristics of the hardware / software configuration that allow a resource to function properly
• Function could be defined as:
  - Rendering
  - Viewing
  - Using
• May need to be repeated

Environment, cont.
• All 3 systems have means for documenting these characteristics
• Both PREMIS and GER provide more granularity & parsability, e.g., creatingApplication, software and hardware names and versions, dependencies, environment type, etc.
• FGDC uses “technical prerequisites” & “native data set”
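
To make the “granularity & parsability” point concrete, here is a minimal Python sketch assuming a simplified, structured environment record. The field names loosely echo PREMIS environment units (swName, swVersion, swType, dependencies), but this is not a PREMIS serialization, and the ArcView version numbers are hypothetical. The point is only that a structured record can be checked by a program, whereas an FGDC-style free-text “technical prerequisites” note has to be read by a person.

```python
from dataclasses import dataclass, field


@dataclass
class Software:
    # Loosely modelled on PREMIS swName / swVersion / swType; not a PREMIS serialization.
    sw_name: str
    sw_version: str
    sw_type: str


@dataclass
class Environment:
    purpose: str  # e.g. "render", "view", "use"; a resource may need several records
    software: list[Software] = field(default_factory=list)
    hardware: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)


def missing_software(env: Environment, installed: dict[str, str]) -> list[str]:
    """List required packages that are absent or installed at a different version."""
    return [
        f"{sw.sw_name} {sw.sw_version}"
        for sw in env.software
        if installed.get(sw.sw_name) != sw.sw_version
    ]


# Hypothetical example values for a shapefile.
shapefile_env = Environment(
    purpose="render",
    software=[Software("ArcView", "3.2", "GIS viewer")],
    dependencies=["companion .shx and .dbf files"],
)

# The FGDC-style equivalent is a prose note that a program cannot reliably check.
technical_prerequisites = "Requires ArcView 3.2 or a compatible shapefile reader."

print(missing_software(shapefile_env, {"ArcView": "3.1"}))  # ['ArcView 3.2']
```

Exact version matching is deliberately naive here; a real check would need version ranges and equivalent software.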

Semantic Underpinnings
• Detailed concepts:
  - Meaning or essence of data
  - Significance of data, i.e., why preserve it?
  - Purpose or function served by data
  - Intended community
• FGDC & GER have a fairly extensive set, particularly GER
• PREMIS is not “descriptive” or domain-specific, so these are not covered

Domain-Specific Terminology
• For geospatial, particularly valuable:
  - Keywords associated with data themes
  - Spatial coverage
  - Time period
  - Stratum coverage / place names
• GER & FGDC cover these
• PREMIS is not descriptive MD, so they are not covered

Provenance
• Detailed concepts:
  - Info about the events, parameters & source data associated with construction of the data set prior to ingestion
  - Source of data
  - Changes made to data inside the preservation archive
• FGDC, GER, PREMIS all OK for the first 2
• FGDC NOT for the last

Provenance, cont.
• Greater level of granularity / parsability in PREMIS using Object, Event & Agent entities
  - See example for Rumsey Historical Map Image Collection about descriptive MD transformation

Use of PREMIS Event Data Elements
• Example Event 1:
  - Transform of descriptive MD from MS Access db => XML => MODS
• Why this event?
  - In case of questions from outside data provider
  - Retain singular scripts & transform mechanisms

PREMIS Event Excerpt (v1.1)
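
As a rough illustration of what an event record like Example Event 1 could contain, the Python sketch below builds one with the standard-library xml.etree.ElementTree. Element names are modelled on the PREMIS event entity (eventIdentifier, eventType, eventDateTime, eventDetail, eventOutcomeInformation, linkingAgentIdentifier, linkingObjectIdentifier); linkingObjectRole comes from PREMIS 2.x and may be absent from a v1.1 record. The namespace and schema wrapper are omitted, and all identifier values, the date, and the eventType value "transformation" are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def sub(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
    """Append a child element, optionally with text content."""
    child = ET.SubElement(parent, tag)
    if text is not None:
        child.text = text
    return child


def premis_event(event_id, event_type, when, detail, outcome, agent_id,
                 sources, outcomes) -> ET.Element:
    """Build an unqualified, PREMIS-style <event> element (namespace omitted)."""
    event = ET.Element("event")
    ident = sub(event, "eventIdentifier")
    sub(ident, "eventIdentifierType", "local")
    sub(ident, "eventIdentifierValue", event_id)
    sub(event, "eventType", event_type)
    sub(event, "eventDateTime", when)
    sub(event, "eventDetail", detail)
    sub(sub(event, "eventOutcomeInformation"), "eventOutcome", outcome)
    agent = sub(event, "linkingAgentIdentifier")
    sub(agent, "linkingAgentIdentifierType", "local")
    sub(agent, "linkingAgentIdentifierValue", agent_id)
    # One linkingObjectIdentifier per input and per output object.
    for role, ids in (("source", sources), ("outcome", outcomes)):
        for obj_id in ids:
            link = sub(event, "linkingObjectIdentifier")
            sub(link, "linkingObjectIdentifierType", "local")
            sub(link, "linkingObjectIdentifierValue", obj_id)
            sub(link, "linkingObjectRole", role)  # PREMIS 2.x element
    return event


# Example Event 1: descriptive MD moved from MS Access to XML to MODS.
# All identifiers, the date, and the eventType value are hypothetical.
ev = premis_event(
    event_id="evt-0001",
    event_type="transformation",
    when="2008-01-15T10:00:00",
    detail="Descriptive MD exported from MS Access db to XML, then transformed to MODS",
    outcome="success",
    agent_id="access-to-mods-script",
    sources=["rumsey-access-export"],
    outcomes=["rumsey-mods"],
)
ET.indent(ev)  # pretty-print (Python 3.9+)
print(ET.tostring(ev, encoding="unicode"))
```

Recording the transform script as the linking agent is one way to keep track of the "singular scripts & transform mechanisms" the event is meant to retain.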

Use of PREMIS Event Data Elements
• Example Event 2:
  - Merge c:\temp\states1; c:\temp\states2; c:\temp\USA (includes process = “merge” and data sources)
  - Advantage: can describe events once in the repository, unlike FGDC; but can PREMIS include events prior to ingestion?
• Why this event?
  - Important to describe processes during different phases of the lifecycle, even prior to ingestion
  - Not being able to do so is problematic for geospatial resources
  - Is a best-practice issue for this domain
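
One payoff of recording events like Example Event 2 once, in the repository, is that lineage becomes queryable. The Python sketch below assumes each event lists the objects it consumed (a "source" role) and produced (an "outcome" role), and walks backwards from an archived object to every contributing event, including ones flagged as pre-ingest. The object identifiers, the later ingestion and migration events, and the pre_ingest flag are hypothetical; in particular it assumes states1 and states2 were merged to produce USA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    event_type: str
    sources: tuple[str, ...]   # objects consumed ("source" role)
    outcomes: tuple[str, ...]  # objects produced ("outcome" role)
    pre_ingest: bool = False   # hypothetical flag: event happened before ingestion


def lineage(obj_id: str, events: list[Event]) -> list[Event]:
    """Return the events that contributed to obj_id, found by walking back from it."""
    producers: dict[str, list[Event]] = {}
    for ev in events:
        for out in ev.outcomes:
            producers.setdefault(out, []).append(ev)

    seen: set[str] = set()
    stack, result = [obj_id], []
    while stack:  # depth-first walk from the object back through its sources
        for ev in producers.get(stack.pop(), []):
            if ev.event_id not in seen:
                seen.add(ev.event_id)
                result.append(ev)
                stack.extend(ev.sources)
    return result


events = [
    Event("evt-1", "merge", ("states1", "states2"), ("USA",), pre_ingest=True),
    Event("evt-2", "ingestion", ("USA",), ("ark-usa",)),
    Event("evt-3", "migration", ("ark-usa",), ("ark-usa-v2",)),
]

print([e.event_type for e in lineage("ark-usa-v2", events)])
# ['migration', 'ingestion', 'merge']
print([e.event_id for e in lineage("ark-usa-v2", events) if e.pre_ingest])
# ['evt-1']
```

The seen set guards against revisiting an event when several objects share a producer, which is common once merges are recorded.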

Data Trustworthiness
• Detailed concepts:
  - Who are the parties responsible for creation, development, storage, maintenance of the data set
  - Where is the data located
  - How is the data available
  - What important factors about the data should be preserved

Data Trustworthiness, cont.
• Parties: FGDC only “originator”; GER & PREMIS more granular & parsable
• Location of data, factors to preserve: GER & FGDC seem more specific & less inclusive (only the POV of the “distributor”)

Data Quality
• Detailed concepts:
  - General condition statement
  - Accuracy of the data
  - Fidelity of relationships within the data set
  - Accuracy of measurements of the data
• FGDC: has tags, but they are very specific
• GER: not much coverage here

Appropriate Use
• Detailed concepts:
  - Legal use and liability statements
  - Technical characteristics that impact use
• Legal use & liability: FGDC & PREMIS have, GER NOT
• Technical characteristics: FGDC NOT; GER & PREMIS have means of linking to format registry info
• More about format registries later
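
The category-by-category findings can be summarised as a lookup table. The Python sketch below encodes a subset of them; the labels "yes", "partial", "no" and "unclear" are a coarse reading of the slide wording, not scores produced by the study, and the helper names are hypothetical. It shows how such a table exposes where one standard's gaps are filled by another.

```python
# Coarse reading of the slide findings: "yes", "partial", "no", or "unclear".
# Only a subset of the concepts discussed on the slides is included.
COVERAGE: dict[str, dict[str, str]] = {
    "environment":                 {"FGDC": "yes", "GER": "yes", "PREMIS": "yes"},  # FGDC: free text, less granular
    "semantic underpinnings":      {"FGDC": "yes", "GER": "yes", "PREMIS": "no"},
    "domain terminology":          {"FGDC": "yes", "GER": "yes", "PREMIS": "no"},
    "provenance before ingestion": {"FGDC": "yes", "GER": "yes", "PREMIS": "yes"},
    "source of data":              {"FGDC": "yes", "GER": "yes", "PREMIS": "yes"},
    "changes inside archive":      {"FGDC": "no", "GER": "unclear", "PREMIS": "yes"},
    "responsible parties":         {"FGDC": "partial", "GER": "yes", "PREMIS": "yes"},
    "data quality":                {"FGDC": "partial", "GER": "partial", "PREMIS": "unclear"},
    "legal use / liability":       {"FGDC": "yes", "GER": "no", "PREMIS": "yes"},
    "format registry links":       {"FGDC": "no", "GER": "yes", "PREMIS": "yes"},
}


def gaps(standard: str) -> list[str]:
    """Concepts the standard does not cover, or covers unclearly."""
    return [c for c, row in COVERAGE.items() if row[standard] in ("no", "unclear")]


def filled_by(primary: str, other: str) -> list[str]:
    """Gaps in `primary` that `other` covers fully."""
    return [c for c in gaps(primary) if COVERAGE[c][other] == "yes"]


print(gaps("PREMIS"))               # ['semantic underpinnings', 'domain terminology', 'data quality']
print(filled_by("PREMIS", "FGDC"))  # ['semantic underpinnings', 'domain terminology']
```

Read this way, the table points toward pairing a content standard for semantics and terminology with PREMIS for lifecycle management, with data quality left open.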

PREMIS & “Significant Properties”
• A way within PREMIS to document:
  - Data trustworthiness: data creator / provider reliable = “authentic”
  - Data quality: describing completeness, logical consistency, attribute accuracy
  - Data provenance: processes & sources for dataset = “understandable & reliable”
  - Appropriate use: understanding of the specific needs of the “designated community”?
  - Other important factors to preserve
• More work needs to be done in this area

Strengths & Weaknesses: FGDC
• Rich in detail
• Specificity for the geospatial domain
• Ubiquity
• Very complex & laborious to complete
• Poor means for describing relationships among file components of a digital resource
• No way to describe a digital resource once within the preservation archive

Strengths & Weaknesses: GER
• Focus on archiving
• Little known as yet
• Comprehensive
• No data dictionary, so unclear how to apply tags (cardinality, repeatability, etc.)
• Relational DB format
• Unclear if and/or how to describe a digital resource once within the preservation archive

Strengths & Weaknesses: PREMIS
• Applicable at many levels of a digital resource: abstract & physical
• Capability for describing relationships among file components of a digital resource
• Capability for describing a digital resource during its entire lifecycle within the preservation archive
• Generic & focused upon preservation
• Not specific enough for geospatial
• Does not include critical semantics or “descriptive” information important for using a digital resource
• Fairly young specification; unclear how to document “significant properties” for digital resources

Recommendations
• Use of a content standard (e.g., FGDC, or ISO when it replaces FGDC)
  - Best used for semantics, domain-specific terminology
• PREMIS
  - Best used for management of resources over time using:
    - PREMIS Object entity
    - PREMIS Event entity
    - PREMIS Agent entity
• Useful to package resources & metadata together to facilitate tracking of the aggregation of resource(s), MD, resource structure & file inventory, e.g., METS
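
The packaging recommendation can be sketched as a skeletal METS document: the content-standard record goes in a dmdSec, PREMIS provenance in an amdSec/digiprovMD, the file inventory in a fileSec, and a structMap ties the files to both. This Python sketch uses the standard-library xml.etree.ElementTree; the file identifiers and paths are hypothetical, the FGDC and PREMIS payloads are empty placeholders, and a real package would usually add a metsHdr and be validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a local name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(fgdc_record: ET.Element, premis_record: ET.Element,
               files: dict[str, str]) -> ET.Element:
    """Wrap one content-standard record, one PREMIS record and a file list in METS."""
    mets = ET.Element(m("mets"))

    # Descriptive metadata: the domain-specific record (FGDC here).
    dmd = ET.SubElement(mets, m("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="FGDC")
    ET.SubElement(wrap, m("xmlData")).append(fgdc_record)

    # Administrative metadata: PREMIS carried as digital provenance.
    amd = ET.SubElement(mets, m("amdSec"))
    digiprov = ET.SubElement(amd, m("digiprovMD"), ID="DP1")
    wrap = ET.SubElement(digiprov, m("mdWrap"), MDTYPE="PREMIS")
    ET.SubElement(wrap, m("xmlData")).append(premis_record)

    # File inventory.
    grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    for file_id, href in files.items():
        file_el = ET.SubElement(grp, m("file"), ID=file_id)
        ET.SubElement(file_el, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})

    # Structure: one division pointing at every file and at both metadata sections.
    div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"),
                        TYPE="dataset", DMDID="DMD1", ADMID="DP1")
    for file_id in files:
        ET.SubElement(div, m("fptr"), FILEID=file_id)
    return mets


# Hypothetical shapefile components; empty placeholders stand in for real records.
package = build_mets(
    ET.Element("metadata"),  # placeholder for an FGDC record
    ET.Element("event"),     # placeholder for a PREMIS event
    {"F1": "file:///archive/states.shp",
     "F2": "file:///archive/states.shx",
     "F3": "file:///archive/states.dbf"},
)
ET.indent(package)  # Python 3.9+
print(ET.tostring(package, encoding="unicode"))
```

Keeping the three shapefile components under one structMap division is what lets the package record that they form a single resource, which FGDC alone describes poorly.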

Issues & Challenges
• What if domain-specific MD is not available?
• If not, how can one get important info from data creators?
• How to determine what is truly necessary for use of data sets?
• Establishment of geospatial format registries
• Getting buy-in from geospatial domains for use of vocabularies, etc. (see Global Spatial Data Infrastructure: http://www.gsdi.org/Default.asp)
• More research needs to be done on “significant properties” like that done in the JISC / DPC studies, e.g., SPs of vector images

Future Directions for NGDA Project
• Further investigation of other geospatial formats, including more vector-based data such as:
  - Layers of the National Atlas
  - National Map (sections of California)
  - Landsat 7 ETM imagery
• Derived data sets from Stanford faculty

Future Directions, cont.
• Format registry investigation: what should be included in a format registry for geospatial
  - Contact with key vendors, e.g., ESRI, Safe Software, etc.
• Monitoring what others are doing with e-science & social science data sets, e.g.:
  - NCSU, Johns Hopkins
  - National Archives of Australia (NAA)
  - JISC and DPC in the UK
  - NDIIPP US Multi-state project
  - Those using the new DDI v3.0 schema

References, Contact Info
• JISC / DPC studies on significant properties: http://www.dpconline.org/graphics/events/080407workshop.html
  - See Duce and Nielsen papers
• Full paper available at: http://www.ngda.org/research.php
• National Geospatial Digital Archive: http://www.ngda.org/index.php
• Examples of METS with PREMIS on METS public wiki:
  - http://www.socialtext.net/mim-2006/index.cgi?profile_playground

Questions? / Comments?
Nancy J. Hoebelheinrich, nhoebel@stanford.edu
John Banning [jwbanning@gmail.com]