

  • Number of slides: 52

Introduction to Ecoinformatics: Past, Present & Future
William Michener
LTER Network Office, University of New Mexico
January 2007

Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”


Ecoinformatics
A broad science and technology (S&T) discipline that incorporates both concepts and practical tools for the understanding, generation, processing, and propagation of ecological data, information, and knowledge.

Outline: A science vision

Many studies employ a restricted scale of observation, commonly 1 m².
The literature is biased toward single and small-scale results.

Thinking Outside the “Box”
[Figure: a box with axes Time, Space, and Parameters; labels LTER, Biocomplexity, and NEON, WATERS, OOI, ….]
Increase in breadth and depth of understanding…

Grand environmental challenges
[Figure: publications dated 1998, 2000, 2001, 2003, and 2004]

More and more of the ecological questions that confront society are national, continental, and global in scope.
[Figures: a map (source: CDC) and a drought map (source: Drought Monitor)]

LTER
26 NSF LTER sites in the U.S. and the Antarctic: >1,600 scientists; 6,000+ data sets, with different themes, methods, units, structure, ….

NEON Climate Domains
[Figure: map of the 20 numbered NEON climate domains]
1. Northeast
2. Mid Atlantic
3. Southeast
4. Atlantic Neotropical
5. Great Lakes
6. Prairie Peninsula
7. Appalachians / Cumberland Plateau
8. Ozarks Complex
9. Northern Plains
10. Central Plains
11. Southern Plains
12. Northern Rockies
13. Southern Rockies / Colorado Plateau
14. Desert Southwest
15. Great Basin
16. Pacific Northwest
17. Pacific Southwest
18. Tundra
19. Taiga
20. Pacific Tropical

BioMesonet
• Tower and Sensor Arrays
• Aquatic Arrays

Soil Sensor Arrays
• Micron-scale nitrate ISE (ion-selective electrode)

Small-Organism Tracking: mobile animals as biosentinels for environmental change, forecasting biological invasions, and emerging disease spread

Outline: Information challenges

Characteristics of Ecological Data
[Figure: data types arranged along two axes, data volume per dataset (low to high) and complexity/metadata requirements (low to high): satellite images, weather stations, business data, most software, gene sequences, GIS, and most ecological data (primary productivity, biodiversity surveys, population data, soil cores)]

Data Entropy
[Figure: information content of a data set plotted against time, from specific details to general details, with losses marked at time of publication, retirement or career change, accident, and death] (Michener et al. 1997)

Semantics
[Figure: integrating data sets A and B (and then C) requires a schema transform, a coding transform, a taxon lookup, and a semantic transform]
Imagine scaling!!
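To illustrate what each step on the slide might involve, here is a minimal sketch that merges two tiny hypothetical data sets. All column names, codes, and taxon names are invented for illustration, and mapping each slide step to one concrete operation is our assumption: the schema transform renames columns, the coding transform maps Y/N to 1/0, the taxon lookup expands short codes, and the semantic transform reconciles cover recorded as a fraction with cover recorded as a percentage.

```python
# Hypothetical source A: already in the target schema
DATASET_A = [
    {"site": "S1", "taxon": "Taxon alpha", "present": 1, "cover_pct": 40.0},
]
# Hypothetical source B: different column names, codes, and meaning of "cover"
DATASET_B = [
    {"plot_id": "S2", "sp_code": "TA", "pres": "Y", "cover_frac": 0.25},
    {"plot_id": "S3", "sp_code": "TB", "pres": "N", "cover_frac": 0.0},
]

# Schema transform: B's column names -> A's column names
SCHEMA_MAP = {"plot_id": "site", "sp_code": "taxon", "pres": "present", "cover_frac": "cover_pct"}
# Coding transform: B codes presence as Y/N, A uses 1/0
PRESENCE_CODES = {"Y": 1, "N": 0}
# Taxon lookup: B's short codes -> taxon names (hypothetical)
TAXON_LOOKUP = {"TA": "Taxon alpha", "TB": "Taxon beta"}


def harmonize_b(record):
    """Convert one B record into A's schema, codes, names, and meaning."""
    out = {SCHEMA_MAP[key]: value for key, value in record.items()}
    out["present"] = PRESENCE_CODES[out["present"]]
    out["taxon"] = TAXON_LOOKUP[out["taxon"]]
    # Semantic transform: B's cover is a fraction of the plot, A's is a percentage
    out["cover_pct"] = out["cover_pct"] * 100.0
    return out


def integrate(a, b):
    return list(a) + [harmonize_b(r) for r in b]


merged = integrate(DATASET_A, DATASET_B)
```

Every additional source needs its own set of maps like these, which is the scaling problem the slide points to.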

Semantics: Linking Taxonomic Semantics to Ecological Data
• Taxon concepts change over time (and space)
• Multiple competing concepts coexist
• Names are re-used for multiple concepts
[Figure: successive treatments of Rhynchospora plumosa s. l. by Elliot 1816, Gray 1834, Chapman 1860, and Kral 1998, with names including R. plumosa, R. plumosa v. intermedia, R. plumosa v. plumosa, R. plumosa v. interrupta, R. plumosa v. pineticola, R. intermedia, R. pineticola, and R. sp. 1; concepts labeled A, B, and C] (from R. Peet 2002?)
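Because a name can be reused for several concepts, a bare name is an unreliable join key. A minimal sketch, assuming each record also states which taxonomic treatment its name follows: concepts are looked up by the pair (name, treatment). The treatment labels and concept IDs below are hypothetical placeholders, not the actual history of these names.

```python
# Hypothetical concept table keyed by (name, treatment).
# Treatment labels and concept IDs are placeholders for illustration only.
CONCEPTS = {
    ("R. plumosa", "treatment-1"): "concept-1",
    ("R. plumosa", "treatment-2"): "concept-2",
    ("R. intermedia", "treatment-2"): "concept-3",
}


def resolve(name, treatment):
    """Return the concept a name denotes as used in one specific treatment."""
    return CONCEPTS[(name, treatment)]


def concepts_for_name(name):
    """All concepts a bare name may denote; more than one means it is ambiguous."""
    return sorted({cid for (n, _), cid in CONCEPTS.items() if n == name})
```

Here `concepts_for_name("R. plumosa")` returns two concept IDs, so records carrying only that name cannot be merged safely without knowing the treatment.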

What Users Really Want…

Outline: Ecoinformatics “solutions”

[Figure: data-management flowchart from the research program (investigators, studies, experimental design, methods) through data design, data forms, and data entry (field computer entry, electronically interfaced field and lab equipment), quality control and quality assurance checks (data verified? if not, data contamination), summary analyses and validation by investigators, archival (archive data file; mass storage on magnetic tape / optical disk / printouts; off-site storage), metadata, and an access interface serving publication, synthesis, and secondary users; governed by standard operating procedures and policies on data sharing, computer use, and archive storage]

Ecoinformatics solutions
• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival


Data Design
• Conceptualize and implement a logical structure within and among data sets that will facilitate data acquisition, entry, storage, retrieval, and manipulation.

Database Types
• File-system based
• Hierarchical
• Relational
• Object-oriented
• Hybrid (e.g., combination of relational and object-oriented schema)
(Porter 2000)
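To make the relational option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (station, observation, temp_c, precip_mm) are hypothetical. The rows are the HOGI example values from this deck's data-organization slides, with the -9999 missing-value code stored as NULL. The logical structure links the two tables: every observation must refer to a registered station.

```python
import sqlite3


def build_demo_db():
    """In-memory relational sketch: a station table and an observation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE station (station_id TEXT PRIMARY KEY);
        CREATE TABLE observation (
            station_id TEXT NOT NULL REFERENCES station(station_id),
            obs_date   TEXT NOT NULL,  -- YYYYMMDD
            temp_c     REAL,           -- NULL marks a missing value
            precip_mm  REAL,
            PRIMARY KEY (station_id, obs_date)
        );
    """)
    conn.execute("INSERT INTO station VALUES ('HOGI')")
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?, ?)",
        [("HOGI", "19961001", 12, 0),
         ("HOGI", "19961002", 14, 3),
         ("HOGI", "19961003", 19, None)],  # -9999 in a flat file becomes NULL here
    )
    conn.commit()
    return conn


def fk_rejects_unknown_station(conn):
    """True if an observation for an unregistered station is refused."""
    try:
        conn.execute("INSERT INTO observation VALUES ('XXXX', '19961001', 1.0, 1.0)")
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_demo_db()
# AVG skips NULLs, so the missing precipitation value does not distort the mean
mean_precip = conn.execute(
    "SELECT AVG(precip_mm) FROM observation WHERE station_id = 'HOGI'"
).fetchone()[0]
```

Storing the missing value as NULL rather than -9999 means that aggregate queries cannot mistake the code for a real measurement.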

Data Design: 7 Best Practices
1. Assign descriptive file names
2. Use consistent and stable file formats
3. Define the parameters
4. Use consistent data organization
5. Perform basic quality assurance
6. Assign descriptive data set titles
7. Provide documentation (metadata)
(from Cook et al. 2000)

1. Assign descriptive file names
• File names should be unique and reflect the file contents
• Bad file names:
  – Mydata
  – 2001_data
• A better file name:
  – Sevilleta_LTER_NM_2001_NPP.asc, where
    – Sevilleta_LTER is the project name
    – NM is the state abbreviation
    – 2001 is the calendar year
    – NPP represents Net Primary Productivity data
    – asc stands for the file type (ASCII)
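A minimal sketch of the naming convention above, assuming names always have the same four underscore-separated components (project, state, year, data type) plus an extension. The helper names are hypothetical. Because the project name itself contains an underscore, the parser splits from the right.

```python
def build_file_name(project, state, year, data_type, ext):
    """Join the components, e.g. Sevilleta_LTER_NM_2001_NPP.asc."""
    return f"{project}_{state}_{year}_{data_type}.{ext}"


def parse_file_name(name):
    """Split a conforming file name back into its components."""
    stem, ext = name.rsplit(".", 1)
    # Split from the right so a project name containing "_" stays intact
    project, state, year, data_type = stem.rsplit("_", 3)
    return {"project": project, "state": state, "year": int(year),
            "data_type": data_type, "ext": ext}
```

Generating names from components, rather than typing them by hand, keeps every file in a project consistent.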

2. Use consistent and stable file formats
• Use ASCII file formats; avoid proprietary formats
• Be consistent in formatting:
  – Don’t change or rearrange columns
  – Include header rows (the first row should contain the file name, data set title, author, date, and companion file names)
  – Column headings should describe the content of each column, including one row for parameter names and one for parameter units
  – Within the ASCII file, delimit fields using commas, pipes (|), tabs, or semicolons (in order of preference)
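A minimal sketch of this layout using Python's standard csv module, assuming comma delimiters (the first preference listed). The function name and the first-row values (file name, title, author, date, companion file) are hypothetical placeholders. The data row reuses the HOGI example from this deck.

```python
import csv
import io


def write_ascii_table(out, file_info, names, units, rows):
    """Write a comma-delimited ASCII table with the header rows described above."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(file_info)  # file name, title, author, date, companion files
    writer.writerow(names)      # one row of parameter names
    writer.writerow(units)      # one row of parameter units, same column order
    writer.writerows(rows)


buf = io.StringIO()
write_ascii_table(
    buf,
    ["example.csv", "Example title", "A. Author", "20010101", "example_meta.txt"],
    ["Station", "Date", "Temp", "Precip"],
    ["Units", "YYYYMMDD", "C", "mm"],
    [["HOGI", "19961001", 12, 0]],
)
text = buf.getvalue()
```

Writing through a single function fixes the column order once, so it cannot drift between files.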

3. Define the parameters
• Use commonly accepted parameter names that describe the contents
  – e.g., precip for precipitation
• Use consistent capitalization
  – e.g., not temp, Temp, and TEMP in the same file
• Explicitly state units of reported parameters in the data file and the metadata
  – SI units are recommended
• Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file
  – e.g., use yyyymmdd; January 2, 1999 is 19990102
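Two of these rules are easy to enforce in code. A minimal sketch, with hypothetical helper names: round-tripping dates in the yyyymmdd format with the standard datetime module, and flagging parameter names that differ only in capitalization.

```python
from datetime import date, datetime


def to_yyyymmdd(d: date) -> str:
    """Format a date as yyyymmdd, e.g. January 2, 1999 -> '19990102'."""
    return d.strftime("%Y%m%d")


def from_yyyymmdd(text: str) -> date:
    """Parse yyyymmdd; raises ValueError for impossible dates such as 19990230."""
    return datetime.strptime(text, "%Y%m%d").date()


def case_collisions(names):
    """Group parameter names that differ only in capitalization (temp/Temp/TEMP)."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

Parsing with a strict format string also serves as a basic check, because malformed or impossible dates are rejected rather than silently accepted.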

4. Use consistent data organization (one good approach)

Station  Date      Temp  Precip
Units    YYYYMMDD  C     mm
HOGI     19961001  12    0
HOGI     19961002  14    3
HOGI     19961003  19    -9999

Note: -9999 is a missing value code.
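A minimal sketch of reading this layout, assuming the table is stored comma-delimited with the units in the second row and that every column after Station and Date is numeric. The function name is hypothetical. The declared missing value code is converted to None on input, so -9999 never enters a calculation as a real value.

```python
import csv
import io

MISSING = -9999  # missing value code used in the table

# The example table above, written comma-delimited
WIDE_TEXT = """Station,Date,Temp,Precip
Units,YYYYMMDD,C,mm
HOGI,19961001,12,0
HOGI,19961002,14,3
HOGI,19961003,19,-9999
"""


def read_wide(text, missing=MISSING):
    """Parse a wide table whose second row holds units; map the missing code to None."""
    reader = csv.reader(io.StringIO(text))
    names = next(reader)
    units = dict(zip(names, next(reader)))
    records = []
    for row in reader:
        rec = dict(zip(names, row))
        for p in names[2:]:  # columns after Station and Date are numeric parameters
            value = float(rec[p])
            rec[p] = None if value == missing else value
        records.append(rec)
    return units, records


units, records = read_wide(WIDE_TEXT)
```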

4. Use consistent data organization (a second good approach)

Station  Date      Parameter  Value  Unit
HOGI     19961001  Temp       12     C
HOGI     19961002  Temp       14     C
HOGI     19961001  Precip     0      mm
HOGI     19961002  Precip     3      mm
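The two layouts carry the same information, and converting between them is mechanical. A minimal sketch, with hypothetical function and variable names, that reshapes wide records into the (Station, Date, Parameter, Value, Unit) rows shown above, in the same parameter-by-parameter order. Dropping rows with missing values is an optional design choice here, not something the slides prescribe.

```python
# Wide records holding the example values from the first layout
WIDE = [
    {"Station": "HOGI", "Date": "19961001", "Temp": 12, "Precip": 0},
    {"Station": "HOGI", "Date": "19961002", "Temp": 14, "Precip": 3},
]
UNITS = {"Temp": "C", "Precip": "mm"}


def wide_to_long(records, units, params, drop_missing=True):
    """Reshape to (Station, Date, Parameter, Value, Unit) rows, parameter by parameter."""
    rows = []
    for p in params:
        for rec in records:
            value = rec[p]
            if value is None and drop_missing:
                continue  # optionally omit missing values from the long layout
            rows.append((rec["Station"], rec["Date"], p, value, units[p]))
    return rows


long_rows = wide_to_long(WIDE, UNITS, ["Temp", "Precip"])
```

The long layout keeps the unit next to every value and lets new parameters be added as rows instead of new columns.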

5. Perform basic quality assurance
• Assure that data are delimited and line up in proper columns
• Check that there are no missing values for key parameters
• Scan for impossible and anomalous values
• Perform and review statistical summaries
• Map location data (lat/long) and assess errors
• Verify automated data transfers
  – e.g., check-sum techniques
• For manual data transfers, consider double keying data and comparing the 2 data sets
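Several of these checks can be sketched in a few lines of Python. The function names are hypothetical. SHA-256 (via the standard hashlib module) is our choice of check-sum, and the range limits in the usage note are placeholders that a real study would set per parameter. Each function returns the offending rows or cells rather than fixing them, so that a person can review every flag.

```python
import hashlib


def misaligned_rows(rows, n_cols):
    """Indices of rows whose field count differs from the header's."""
    return [i for i, row in enumerate(rows) if len(row) != n_cols]


def missing_key_values(records, keys):
    """Indices of records where a key parameter is empty or None."""
    return [i for i, rec in enumerate(records)
            if any(rec.get(k) in (None, "") for k in keys)]


def out_of_range(records, param, lo, hi):
    """Indices of records whose value falls outside [lo, hi]; None is skipped."""
    return [i for i, rec in enumerate(records)
            if rec.get(param) is not None and not lo <= rec[param] <= hi]


def checksum(data: bytes) -> str:
    """SHA-256 hex digest, used here as the check-sum."""
    return hashlib.sha256(data).hexdigest()


def transfer_ok(sent: bytes, received: bytes) -> bool:
    """An automated transfer is verified if both copies have the same check-sum."""
    return checksum(sent) == checksum(received)


def double_key_diffs(first, second):
    """Cells that differ between two independent keyings of the same sheet."""
    if len(first) != len(second):
        raise ValueError("the two keyings have different numbers of rows")
    diffs = []
    for i, (row_a, row_b) in enumerate(zip(first, second)):
        if len(row_a) != len(row_b):
            diffs.append((i, None, row_a, row_b))  # the rows disagree in length
            continue
        for j, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                diffs.append((i, j, a, b))
    return diffs
```

For example, `out_of_range(records, "Temp", -60, 60)` would flag air temperatures outside a hypothetical plausible range. `double_key_diffs` returns each mismatched cell so it can be checked against the original data sheet.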

6. Assign descriptive data set titles
• Data set titles should ideally describe the type of data, time period, location, and instruments used (e.g., Landsat 7)
• Data set titles should be similar to the names of the data files
  – Good: “Shrub Net Primary Productivity at the Sevilleta LTER, New Mexico, 2000-2001”
  – Bad: “Productivity Data”

7. Provide documentation (metadata)

Ecoinformatics solutions: Data acquisition

High-quality data depend on:
• Proficiency of the data collector(s)
• Instrument precision and accuracy
• Consistency (e.g., standard methods and approaches)
  – Design and ease of data entry
• Sound QA/QC
• Comprehensive metadata (e.g., documentation of anomalies)

What’s wrong with this data sheet?
[Figure: a data sheet with only two headings, Plant and Life Stage, each followed by blank lines]

Important questions
• How well does the data sheet reflect the data set design?
• How well does the data entry screen (if available) reflect the data sheet?

PHENOLOGY DATA SHEET: Rio Salado - Transect 1
[Figure: data sheet with blanks for Collectors, Date, Time, and Notes; rows for the plants ardi, arpu, atca, bamu, and zigr; and life-stage columns to mark]
Life-stage codes:
• P/G = perennating or germinating
• V = vegetating
• B = budding
• FL = flowering
• FR = fruiting
• M = dispersing
• S = senescing
• D = dead
• NP = not present

PHENOLOGY DATA SHEET: Rio Salado - Transect 1 (completed example)
[Figure: revised data sheet, filled in. Collectors: Troy Maddux; Date: 16 May 1991; Time: 13:12; Notes: Cloudy day, 3 gopher burrows on transect. Each plant (ardi, arpu, asbr, deob) has a Y/N choice for each life stage: P/G, V, B, FL, FR, M, S, D, NP.]
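The revised sheet's explicit Y/N choice for every life stage makes entries easy to validate, so a data entry screen can mirror it. Here is a minimal sketch of such a check for one plant's row. The function name is hypothetical. The NP rule is our assumption and does not come from the slides: a plant marked NP (not present) should have no other stage marked Y.

```python
# Life-stage codes from the data sheet legend
STAGES = ("P/G", "V", "B", "FL", "FR", "M", "S", "D", "NP")


def check_row(plant, marks):
    """Check one plant's row, where marks maps each life-stage code to 'Y' or 'N'."""
    problems = []
    unknown = sorted(set(marks) - set(STAGES))
    if unknown:
        problems.append(f"{plant}: unknown life-stage codes {unknown}")
    blank = [s for s in STAGES if s not in marks]
    if blank:
        problems.append(f"{plant}: no Y/N entry for {blank}")
    invalid = sorted(s for s, v in marks.items() if v not in ("Y", "N"))
    if invalid:
        problems.append(f"{plant}: entries other than Y/N for {invalid}")
    # Assumed consistency rule (not from the slides): NP excludes every other stage
    others = [s for s in STAGES if s != "NP" and marks.get(s) == "Y"]
    if marks.get("NP") == "Y" and others:
        problems.append(f"{plant}: marked not present but also marked {others}")
    return problems
```

An empty list means that the row is complete and internally consistent. Any messages can be shown to the person entering the data while the paper sheet is still at hand.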

Ecoinformatics solutions: QA/QC

Generic Data Processing
[Figure: flowchart from the research program (investigators, studies, experimental design, methods) through data design and data forms; data entry via field computer entry or electronically interfaced field and lab equipment into a raw data file; quality control and quality assurance checks (data verified? if not, data contamination; if so, summary analyses and validation by investigators); an archive data file with archival mass storage (magnetic tape / optical disk / printouts) and off-site storage; metadata; and an access interface serving publication, synthesis, and secondary users] (Brunt 2000)

Ecoinformatics solutions
• Project / experimental design
• Data acquisition
• QA/QC
• Data documentation (metadata) – to be addressed
• Data archival

Ecoinformatics solutions: Data archival

Cycles of Research: “A Conventional View”
[Figure: a research cycle linking problem, planning, data collection, analysis and modeling, and publications]

Cycles of Research: “A New View”
[Figure: the research cycle extended with an archive of data; elements include problem definition (research objectives), planning, collection, original observations, archive of data, selection and extraction, secondary observations, analysis and modeling, and publications]

Data Archive
• A collection of data sets, usually electronic, stored in such a way that a variety of users can locate, acquire, understand, and use the data.
• Examples:
  – ESA’s Ecological Archives
  – NASA’s DAACs (Distributed Active Archive Centers)

References
• Brunt (2000). Ch. 2 in Michener and Brunt (2000).
• Porter (2000). Ch. 3 in Michener and Brunt (2000).
• Edwards (2000). Ch. 4 in Michener and Brunt (2000).
• Michener (2000). Ch. 7 in Michener and Brunt (2000).
• Cook, R. B., R. J. Olson, P. Kanciruk, and L. A. Hook. 2000. Best practices for preparing ecological and ground-based data sets to share and archive. Online at http://www.daac.ornl.gov/cgibin/MDE/S2K/bestprac.html