  • Number of slides: 59

The Biology Workbench – a community tool for teaching and research
Mark A. Miller, Principal Investigator, Biology, San Diego Supercomputer Center
SAN DIEGO SUPERCOMPUTER CENTER biology.sdsc.edu NIGMS

SDSC Mission: To serve as a premier resource for design, development, and deployment of cyberinfrastructure for the national scientific community.

What is Cyberinfrastructure anyway?
[Diagram: Research, then, after many months or years of struggle…, Production. Labels: Databases; Compute Resources; Wet Labs; Clinical Labs]

Cyberinfrastructure (We Think) Life (and Other) Scientists Need
[Diagram labels: Development; Research; Production; Wet Labs; Clinical Labs; Grid Resources; Discovery Portal; Integration Software; Data Capture Portals; Personal Electronic Notebook; Global Data Providers; Grid Services; D.L.; Databases; Workflow; Sequence Tools; Data Deposition Portals; Web Services; Structure Tools; Compute Resources; Microarray Tools]

SDSC Production Resources for HEC and Grid Computing
Tools we provide to the community for U.S. NSF:
Allocations on large architectures via NRAC: DataStar; TeraGrid; Blue Gene
Allocations for data collection storage: 1 PB of on-line disk space; 12 PB of tape space
User Services: Allocation awards are accompanied by personal service to get you going. Everyone receives courteous advice and assistance! Development allocations are awarded on request. http://www.sdsc.edu/user_services/allocations/
Software Services: Rocks cluster management tools; Storage Resource Broker (SRB); the Kepler Workflow Tool

What is the Next Generation Tools for Biology Group?
Use the Resources of SDSC to Focus on:
• Both research and development.
• Science is the driver.
• Activities that can be uniquely conducted at SDSC.
• Activities that partner with other institutions.
• Activities that are community-building.

Overview: Next Generation Tools for Biology at SDSC
Current Projects at SDSC:
• IBM Institute for Innovation in Biomedical Simulations and Imaging (IBM-I3).
• Cyberinfrastructure for Phylogenetic Research (CIPRES).
• The Next Generation Biology Workbench

Next Generation Tools for Biology
Current Products:
• CIPRES middleware
• CIPRES portal
• CIPRES/Kepler workflow
• Biology Workbench

CIPRES middleware
• SDK/libraries for Win/Mac/Linux.
• CORBA service architecture allows interactive access to tools across platforms.
• Currently supports tree inference/improvement.
• Can be accessed through Mesquite.

Portal for Tree Inference
Supports:
• Parsimony: PAUP
• Max Likelihood: RAxML, GARLI
Coming Soon:
• User configurability (via applet)
• MrBayes
• POY
• SATé

CIPRES/Kepler workflow
Status: Proof of Concept Systematics Feature Set; In Usability Development
Supports:
• Iteration
• Check-pointing
• Data Forking
• Data Transfer and deposition
• Web services
• Provenance Tracking
http://www.phylo.org/sub_sections/kepler_workflow/help/creation.htm

The (current) Biology Workbench
Created 1996-1997 at NCSA by Shankar Subramaniam, Eric Jakobsson, Roger Unwin, Brian Saunders, Mark Stupar, Dawn Cotter, Jim Fenton, Curt Jamison, Brad Mills, George Pappas, David Tcheng

The original concept behind BWB:
"Wouldn't it be nice if there was a web site that would let me run BLAST, CLUSTALW, etc. on my collection of sequences, or a collection of sequence alignments, and let me store the results?"

Current Workbench Properties
From a single browser interface, one can access:
• 66 individual tools.
• Sequences from 33 databases.
• All calculations provided by the Workbench Server.
• Individual login password security provided.
• Data storage area provided for results.
• No required plug-ins or downloads.
• Can be (and is) used over phone modem.

[Chart: Annual WB usage, '00–'03; series: Jobs, Users]

Some BW User statistics
• 71% of the user base is domestic.
• 44% are academic
• 15% noncommercial
• 11% commercial
• 1% government
• The 29% international user population represents over 40 countries
• 50% of present users employ the BW for government-funded research programs
• 48% of BW users are involved in education

Cyberinfrastructure Provided by the Workbench
[Diagram labels: Wet Labs; Clinical Labs; Data storage area; Personal Electronic Notebook; Grid Resources; Integration Software; Data Capture Portals; Discovery Portal; Global Data Providers; Grid Services; Data Integration; Workbench Database; D.L.; Workflow; Tools; Sequence Tools; Data Deposition Portals; Web Services; Structure Tools; Compute Resources; Microarray Tools]

Overall Architecture of the Biology Workbench
[Diagram components: Browser; Web Server; bw.cgi; html.pl; Wrapper; Software Tools; Ndjinn Indexing; Session Storage; User Data Storage; Databases]

Current Data Integration System
[Diagram components: Public DBs; cron job: ftp download; Flat file (Swissprot); Flat file (GenBank); Parser; NDJINN; Lookup Table; Database; Web Server; User Data Storage]
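The parser and lookup table in the diagram above can be pictured roughly as follows. This is only a minimal sketch, not the Workbench's actual code: it assumes a simplified Swiss-Prot-style flat file in which each record begins with an `ID` line and ends with `//`, and it assumes the lookup table maps a record ID to the byte offset where that record starts. The function names `build_lookup_table` and `fetch_record` are hypothetical.

```python
import io

def build_lookup_table(flat_file):
    """Scan a flat file once and map each record ID to its starting byte offset."""
    table = {}
    offset = flat_file.tell()
    line = flat_file.readline()
    while line:
        if line.startswith(b"ID "):
            record_id = line.split()[1].decode("ascii")
            table[record_id] = offset
        offset = flat_file.tell()
        line = flat_file.readline()
    return table

def fetch_record(flat_file, table, record_id):
    """Jump straight to a record using the lookup table and read up to its '//' line."""
    flat_file.seek(table[record_id])
    lines = []
    line = flat_file.readline()
    while line:
        lines.append(line.decode("ascii"))
        if line.startswith(b"//"):
            break
        line = flat_file.readline()
    return "".join(lines)

# Two made-up records standing in for a downloaded flat file.
SAMPLE = (
    b"ID   MYG_HUMAN   Reviewed;\n"
    b"DE   Myoglobin.\n"
    b"//\n"
    b"ID   MYG_PONPY   Reviewed;\n"
    b"DE   Myoglobin.\n"
    b"//\n"
)

flat_file = io.BytesIO(SAMPLE)
lookup = build_lookup_table(flat_file)
record = fetch_record(flat_file, lookup, "MYG_PONPY")
```

With an index like this, a record can be retrieved without rescanning the whole file, which is presumably why a lookup table sits between the parser and the database in the diagram.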

Ndjinn Multiple Database Search
The "Ndjinn Multiple Database Search" allows the user to specify the databases to be searched.

Constructing Queries
User-selected databases may be searched for text. Permitted text searches are "Contains", "Begins With", "Ends With", or "Exact Match". The Boolean operators "AND", "NOT", and "OR" may also be used. Search order is controlled by parentheses.
Example: (myoglobin AND human) OR orangutan
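To make the query rules above concrete, here is a small sketch of how such a Boolean query could be evaluated. It is not Ndjinn's implementation. Several things are assumptions made for illustration: every term is treated as a case-insensitive "Contains" match, NOT is a unary operator (so "A AND NOT B"), and without parentheses NOT binds tighter than AND, which binds tighter than OR. The record strings are made up.

```python
import re

def tokenize(query):
    # Parentheses are their own tokens; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", query)

def parse(tokens):
    """Turn tokens into a nested tuple tree (assumed precedence: NOT > AND > OR)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            take()
            node = ("AND", node, parse_not())
        return node

    def parse_not():
        if peek() == "NOT":
            take()
            return ("NOT", parse_not())
        tok = take()
        if tok == "(":
            node = parse_or()  # parentheses restart at the lowest precedence
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        return ("TERM", tok.lower())

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return tree

def matches(tree, text):
    kind = tree[0]
    if kind == "TERM":
        return tree[1] in text.lower()  # "Contains" match
    if kind == "NOT":
        return not matches(tree[1], text)
    if kind == "AND":
        return matches(tree[1], text) and matches(tree[2], text)
    return matches(tree[1], text) or matches(tree[2], text)

def search(records, query):
    tree = parse(tokenize(query))
    return [r for r in records if matches(tree, r)]

RECORDS = ["Myoglobin, human", "Myoglobin, orangutan", "Hemoglobin, human"]
hits = search(RECORDS, "(myoglobin AND human) OR orangutan")
# hits == ["Myoglobin, human", "Myoglobin, orangutan"]
```

The parentheses matter: "myoglobin AND (human OR orangutan)" returns the same two records here, but on a larger collection the two groupings would generally differ.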

Introducing SWAMI
The Next Generation Biology Workbench (www.ngbw.org)

Why SWAMI?
SWAMI = Master
[The slide quotes the lyrics of "Surfin' U.S.A.", whose list of surf spots includes "Swamies" and La Jolla.]

The User Says:
"There should be a New Biology Workbench web site that can provide better search tools, support protein structure investigations, and allow my students to share files…"

The Developer Hears:
"There should be a web site that can
• host all the user's biological data (not just sequences)
• allow them to analyze it using any modern tool they choose."

New Workbench Architecture Ideas: Take 1. Web Services
Computing and data management are handled at remote sites.
[Diagram labels: Grid Services; Sequence Tools; Structure Tools; Web Services; Compute Resources; Microarray Tools; Registry/Discovery; Personal Electronic Notebook; Integration Software; Discovery Portal; Workflow; Wet Labs; Clinical Labs; Data Deposition Portals; Local Databases; D.L.; Global Data Providers]

New Workbench Architecture Ideas: Take 1. Web Services
Issues:
Tools:
• No control over tool availability.
• Published tool registries are weak.
• Robust tool descriptions (UDDI) pose enormous overhead.
Data:
• Can't query across all data sources.
• Unknown bandwidth and reliability of remote data sources.
• API of remote data sources can change without warning.
This approach is too loosely coupled!

Reality Strikes: Priorities must be ordered
"There should be a web site that can
• host all the user's biological data (not just sequences)
• allow them to analyze it using any modern tool they choose."

The Developer Concludes:
"There should be a web site that can
• host all users' biological data (not just sequences)
• allow them to analyze it using any modern tool they choose with as many tools as possible with enterprise class stability…"

New Workbench Architecture Ideas: Take 2. Enterprise Solution
Computing and data management are handled locally.
[Diagram labels: Sequence Tools; Structure Tools; Microarray Tools; Personal Electronic Notebook; Discovery Portal; Workflow; Wet Labs; Clinical Labs; Compute Resources; Data Deposition Portals; Integration Software; Local Data Warehouse; D.L.; Global Data Providers]

New Workbench Architecture Ideas: Take 2. EJB
Issues: Architecture has 8 separate modules.
• A change in any module breaks 1-7 others.
• Only a developer who can get zen with EJB can contribute to the development.
• Modifying a web page becomes a task that a web artist cannot manage alone.
• After 12 months of development, we can log in?
This approach has too much overhead!

Reality Strikes Again: Priorities must be re-ordered
"There should be a web site that can
• host all users' biological data (not just sequences)
• allow them to analyze it using any modern tool they choose with as many tools as possible with enterprise class stability…"

The User Re-states:
"There should be a web site that can
• provide better search tools, and allow my students to share files and
• allow me to analyze it using any modern tool I choose with as many tools as possible with enterprise class stability and with enough stability so I can teach reliably… as soon as is humanly possible…"

New Workbench Architecture Ideas: Take 3. Integrated, Stable Solution
Tomcat/Java/Struts 2/Hibernate/MySQL/Lucene
This approach is just right?

Lesson Number 1: Get the user requirements right in the beginning

The NEW Workbench will improve on the existing functionalities
Highlighted: Improved Data Handling
[Diagram labels: Wet Labs; Clinical Labs; Grid Resources; Integration Software; Data Capture Portals; Personal Electronic Notebook; Discovery Portal; Global Data Providers; Grid Services; Workbench Data Warehouse; D.L.; Workflow; Sequence Tools; Data Deposition Portals; Web Services; Structure Tools; Compute Resources; Sequencing Tools]

Improved Data Handling
[Diagram components: Browser; Web Server; Session Storage; bw.cgi; Wrapper; html.pl; User Data Storage; Ndjinn Indexing; Databases; Flat files; Data Providers]
• The toolkit is limited by the ability to handle only sequences and alignments.
• The ability to search is limited by storing data as free (unstructured) text.

Improved Data Handling: Improve Search Techniques
Lucene indexing allows us to replace the single text match string with the ability to search on specific fields.
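The difference between a single text match and a search on specific fields can be shown with a toy inverted index. This is a sketch of the idea only; it does not use or imitate Lucene's API. The class name `FieldedIndex`, the field names `description` and `organism`, and the record IDs are all hypothetical.

```python
from collections import defaultdict

class FieldedIndex:
    """Toy inverted index keyed by (field, term) instead of by term alone."""

    def __init__(self):
        self._postings = defaultdict(set)  # (field, term) -> set of record ids

    def add(self, record_id, fields):
        for field, text in fields.items():
            for term in text.lower().split():
                self._postings[(field, term)].add(record_id)

    def search(self, field, term):
        """Return ids of records whose given field contains the term."""
        return sorted(self._postings.get((field, term.lower()), set()))

    def search_any_field(self, term):
        """Old-style behaviour: match the term wherever it appears."""
        term = term.lower()
        hits = set()
        for (_, indexed_term), ids in self._postings.items():
            if indexed_term == term:
                hits |= ids
        return sorted(hits)

index = FieldedIndex()
index.add("rec1", {"description": "myoglobin", "organism": "human"})
index.add("rec2", {"description": "human growth hormone receptor", "organism": "mouse"})

anywhere = index.search_any_field("human")     # ["rec1", "rec2"]
by_organism = index.search("organism", "human")  # ["rec1"]
```

A free-text search for "human" returns both records, while restricting the term to the organism field returns only the record that is actually from human, which is the kind of precision the fielded index is meant to add.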

Improved Data Handling: User data stored in RDB
• Allow user to import and annotate data of many types, including a generic, unknown type.
• User-entered sequences and results are stored annotated along with other user-selected sequences.
• Use of the RDB makes it possible to repurpose data easily.

The NEW Workbench will improve on the existing functionalities
Highlighted: Improved Tool Selection
[Diagram labels: Wet Labs; Clinical Labs; Grid Resources; Integration Software; Data Capture Portals; Personal Electronic Notebook; Discovery Portal; Global Data Providers; Grid Services; Workbench Data Warehouse; D.L.; Workflow; Sequence Tools; Data Deposition Portals; Web Services; Structure Tools; Compute Resources; Sequencing Tools]

New Discovery Portal Step 1. Improved User Access to Tools
PISE currently has 300+ interfaces.
Lesson Number 2: Software development is incredibly expensive. Build nothing you can steal. Steal from the best.
[Diagram components: Browser; Web Server; PISE XML; SWAMI XML; Tool Broker; Wrapper Service; bw.cgi; .jsp; html.pl; Session Storage; Software Tools; User Data Storage]

The NEW Workbench will improve on the existing functionalities
Highlighted: Improved Portal
[Diagram labels: Wet Labs; Clinical Labs; Grid Resources; Integration Software; Data Capture Portals; Personal Electronic Notebook; Discovery Portal; Global Data Providers; Grid Services; Workbench Data Warehouse; D.L.; Workflow; Sequence Tools; Data Deposition Portals; Web Services; Structure Tools; Compute Resources; Sequencing Tools]

User-Requested Toolkits:
Structural Biology:
• Tools to visualize protein structures.
Molecular Biology:
• Tools to assemble contigs.
• Tools to visualize sequencer output.
Role-Based Logins:
• Licensed tools can be mounted for individual users
• Instructors and students have separate roles
• Folder sharing for collaborative work.
NO BROWSER PLUGINS
NO SUDDEN CHANGES

Sneak Preview: http://snooker.sdsc.edu/web

New Discovery Portal Next Steps: Improved User Access to Data

Tools to assemble contigs. Tools to visualize sequencer output.
The AMOS consortium at TIGR produces:
• BAMBUS, a genome sequence scaffolding program.
• AutoEditor, a tool for correcting sequencing and basecaller errors using sequence alignment and chromatogram data.
• Assembler, a tool for assembly of large sets of overlapping sequence data such as ESTs, BACs, or small genomes.
• LUCY, a sequence cleanup program that prepares raw DNA sequence fragments for sequence assembly.

The NEW Workbench will also create new infrastructure
Highlighted: Pipelining Capabilities
[Diagram labels: Wet Labs; Clinical Labs; Grid Resources; Global Data Providers; Integration Software; Data Capture Portals; Personal Electronic Notebook; Discovery Portal; Grid Services; Workbench Database; D.L.; Workflow; Sequence Tools; Data Deposition Portals; Web Services; Structure Tools; Compute Resources; Microarray Tools]

Web-Based Workflow Capability
[Diagram: Input feeds Tool 1; "Send output to" selects among Tool 2 through Tool 10; the chosen tools produce the Output]
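The "send output to" idea in the diagram amounts to chaining tools so that each one consumes the previous tool's result. A minimal sketch follows, assuming each tool can be modelled as a plain function; the three tools (`clean`, `reverse_complement`, `gc_fraction`) are simple stand-ins chosen for illustration and are not Workbench tools.

```python
def run_workflow(data, tools):
    """Feed `data` through `tools` in order, sending each output to the next tool."""
    history = [("input", data)]
    for tool in tools:
        data = tool(data)
        history.append((tool.__name__, data))
    return data, history

def clean(seq):
    # Strip whitespace and normalise case.
    return "".join(seq.split()).upper()

def reverse_complement(seq):
    # Complement each base, then reverse the strand.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def gc_fraction(seq):
    # Fraction of bases that are G or C.
    return (seq.count("G") + seq.count("C")) / len(seq)

result, history = run_workflow(" atg cgt\n", [clean, reverse_complement, gc_fraction])
# result == 0.5; history records every intermediate value
```

Keeping the intermediate values in `history` is one simple way to let a user inspect or store the result of any step, not just the final output.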

Notebook Capability
The Notebook will feature a local database to store the results of computations and searches, notify you of new updates, and enable peer-to-peer data sharing.
http://www.notebookproject.org

We Need YOU!
• Suggest features you need at customerservice@ngbw.org
• Look at our pre-alpha and provide feedback at http://snooker.sdsc.edu/web

Who Did the Work?
Current WB: Brian Saunders, Andrea Maer, Shankar Subramaniam
Current NGBW: Roger Unwin, Hannes Niedner, Ashton Taylor, Rami Rifaieh, Jeremy Carver
"The BOSS": Celeste Brown (University of Idaho, Moscow)
NGBW Alumni: Andy Zhang, Kevin Fowler
CIPRES Team: Mark Holder (Kansas), Paul Hoover, Peter Midford, Terri Liebowitz, Lucie Chan, Rutger Vos
Kepler Project: Ilkay Altintas, Zhijie Guan



