- Number of slides: 16

CIPRes in Kepler: An integrative workflow package for streamlining phylogenetic data analyses
Zhijie Guan (1), Alex Borchers (1), Timothy McPhillips (2), Shirley Cohen (3), Mark A. Miller (1), Ilkay Altintas (1)
(1) San Diego Supercomputer Center, UCSD; (2) University of California, Davis; (3) University of Pennsylvania
biology.sdsc.edu

What is a Scientific Workflow?
- Combination of
  - data integration, analysis, and visualization steps
  - larger, automated "scientific process"
- Mission of scientific workflow systems
  - Promote "scientific discovery" by providing tools and methods to generate scientific workflows
  - Create an extensible and customizable graphical user interface for scientists from different scientific domains
  - Support computational experiment creation, execution, sharing, reuse and provenance
  - Design frameworks which define efficient ways to connect to the existing data and integrate heterogeneous data from multiple resources
- Make technology useful through user's monitor!!!

Promoter Identification Workflow
Source: Matt Coleman (LLNL)

A Workflow for Phylogeny Analysis

Kepler is a Scientific Workflow System … and a cross-project collaboration
www.kepler-project.org
- June 2, 2006: Beta release
- Builds upon the open-source Ptolemy II framework
  - Ptolemy II: A software system used for prototyping engineering systems
  - KEPLER: A platform to design and execute Scientific Workflows
- KEPLER = "Ptolemy II + X" for Scientific Workflows

Some Kepler Contributors
- Ptolemy II, Griddles, SKIDL, Resurgence, SRB, NLADR, LOOKING
- Other contributors:
  - Chesire (UK Text Mining Center)
  - DART (Great Barrier Reef, Australia)
  - National Digital Archives + UCSD-TV (US)
  - …
Contributor names and funding info are at the Kepler website!!

A co-development in KEPLER: GEON Dataset Generation & Registration
Figure labels: "% Makefile", "$> ant run", "SQL database access (JDBC)"

Phylogeny Analysis Workflows
Figure labels: Local Disk, Multiple Sequence Alignment, Phylogeny Analysis, Tree Visualization

Kepler Workflow: Actors
- Actor
  - Encapsulation of parameterized actions
  - Interface defined by ports and parameters
- Port
  - Communication between input and output data
  - The place where data get in/out
- Model of computation
  - Flow of control
  - Sequential / parallel execution
  - Implementation is a framework
- Actor-Oriented Design
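
The actor, port, and model-of-computation roles above can be illustrated with a minimal sketch. This is not Kepler or Ptolemy II code (both are Java frameworks); the class names `Port`, `Actor`, and `SequentialDirector` are hypothetical and exist only to show how an action can be encapsulated behind ports and parameters while a separate component decides the flow of control.

```python
from collections import deque


class Port:
    """A named place where data gets in or out of an actor (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()   # tokens waiting to be consumed (input side)
        self.targets = []      # input ports fed by this port (output side)

    def connect(self, other):
        self.targets.append(other)

    def send(self, token):
        for target in self.targets:
            target.queue.append(token)


class Actor:
    """Encapsulates a parameterized action behind ports and parameters."""

    def __init__(self, name, func, inputs=(), outputs=("out",), **params):
        self.name = name
        self.func = func
        self.params = params
        self.inputs = {n: Port(n) for n in inputs}
        self.outputs = {n: Port(n) for n in outputs}

    def ready(self):
        # An actor with no input ports is always ready.
        return all(p.queue for p in self.inputs.values())

    def fire(self):
        args = {n: p.queue.popleft() for n, p in self.inputs.items()}
        result = self.func(**args, **self.params)
        for port in self.outputs.values():
            port.send(result)
        return result


class SequentialDirector:
    """Model of computation: fire each actor once, in the order given."""

    def run(self, actors):
        results = {}
        for actor in actors:
            if not actor.ready():
                raise RuntimeError(f"{actor.name} has no input token")
            results[actor.name] = actor.fire()
        return results


# Two-actor workflow: a source emits a sequence, the next actor upper-cases it.
source = Actor("source", lambda text: text, text="acgt")
upper = Actor("upper", lambda seq: seq.upper(), inputs=("seq",))
source.outputs["out"].connect(upper.inputs["seq"])
results = SequentialDirector().run([source, upper])
```

Swapping `SequentialDirector` for a different scheduler (for example, one that fires ready actors in parallel) would change the execution order without touching the actors, which is the point of keeping the model of computation separate.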

CIPRes Workflow: Actors
- Input Port: Nexus File Content
- Output Ports: Data Matrix, Tree, Taxa Info
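
As a rough illustration of an actor with one input port (Nexus file content) and three output ports (data matrix, tree, taxa info), the sketch below splits a Nexus string into its blocks. It is not the CIPRes actor itself: the function name `split_nexus`, the port names, and the mapping from block names to ports are assumptions, and the regular expression handles only plain `BEGIN <name>; ... END;` blocks (no `ENDBLOCK;`, comments, or quoted labels).

```python
import re

# Matches "BEGIN <name>; ... END;" blocks, case-insensitively (simplified).
BLOCK_RE = re.compile(r"\bbegin\s+(\w+)\s*;(.*?)\bend\s*;",
                      re.IGNORECASE | re.DOTALL)

# Assumed mapping from Nexus block names to the actor's output ports.
PORT_FOR_BLOCK = {
    "taxa": "taxa_info",
    "characters": "data_matrix",
    "data": "data_matrix",
    "trees": "tree",
}


def split_nexus(content):
    """Return a dict of output-port name -> block body (None if absent)."""
    ports = {"data_matrix": None, "tree": None, "taxa_info": None}
    for name, body in BLOCK_RE.findall(content):
        port = PORT_FOR_BLOCK.get(name.lower())
        if port is not None:
            ports[port] = body.strip()
    return ports


SAMPLE = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=2;
  TAXLABELS A B;
END;
BEGIN CHARACTERS;
  DIMENSIONS NCHAR=4;
  FORMAT DATATYPE=DNA;
  MATRIX
    A ACGT
    B ACGA
  ;
END;
BEGIN TREES;
  TREE t1 = (A,B);
END;
"""

ports = split_nexus(SAMPLE)
```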

Some actors in place for…
- Generic Web Service Client and Web Service Harvester
- Customizable RDBMS query and update
- Command Line wrapper tools (local, ssh, scp, ftp, etc.)
- Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator
- SRB support
- Native R and Matlab support
- Interaction with Nimrod and APST
- Communication with ORBs through actors and services
- Imaging, Gridding, Vis Support
- Textual and Graphical Output
- …more generic and domain-oriented actors…
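
The idea behind a local command-line wrapper actor can be sketched with Python's standard `subprocess` module. This is an illustration only, not Kepler's wrapper; the function name `run_command` and its parameters are hypothetical. The usage line runs the Python interpreter itself as a stand-in for an external analysis tool so that the example does not depend on any particular program being installed.

```python
import subprocess
import sys


def run_command(argv, stdin_text=None, timeout=60):
    """Run an external program and return its standard output as text.

    argv is the command as a list of strings; a non-zero exit status
    raises subprocess.CalledProcessError.
    """
    completed = subprocess.run(
        argv,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout


# Stand-in for an external tool: the interpreter prints a fixed word.
output = run_command([sys.executable, "-c", "print('aligned')"])
```

In a workflow, the returned text would be sent to the next actor's input port, so the wrapped tool behaves like any other processing step.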

CIPRes Workflow
Figure annotations:
- Actor: GUIGen: Parameter Setting
- Choose the input file
- Run ClustalW
- Channel: Convey the data
- Get the subset of the aligned sequences
- Read the tree
- Run PAUP for Tree Inference
- Parse the tree
- Display the tree
- Results
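
The "parse the tree" and "display the tree" steps can be sketched as follows, assuming the tree arrives as a Newick string (the slide does not say which format the workflow passes between actors). The functions `parse_newick` and `render` are hypothetical and deliberately simplified: branch lengths are discarded, quoted labels and comments are not supported, and malformed input is not checked.

```python
def parse_newick(text):
    """Parse a Newick string into nested (label, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                          # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                      # consume ","
                children.append(node())
            pos += 1                          # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        # Keep the label, drop an optional ":branch_length" suffix.
        label = text[start:pos].split(":")[0].strip()
        return (label, children)

    return node()


def render(tree, depth=0):
    """Return the tree as indented text lines; unnamed nodes show as '+'."""
    label, children = tree
    lines = ["  " * depth + (label or "+")]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


tree = parse_newick("((A:0.1,B:0.2),C);")
print("\n".join(render(tree)))
```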

CIPRes Workflows: Demo
- Read Sequences
- Multiple Sequence Alignment
- Display the Alignment
- Matrix Alignment
- Tree Inference
- Consensus Tree Visualization

Summary
- Kepler is good at:
  - Integrating data, programs, and computing resources
  - Capturing your ideas and realizing them
  - Supporting computational experiment creation, execution, sharing, and reuse
  - Quickly prototyping scientific workflows
  - Building streamlining applications
- Visual programming language
  - Don't write your application, "draw"/compose it
- Cipres-Kepler package can be used to build scientific workflows for phylogenetic data analyses

Future Work
- Cipres-Kepler can help you
- There is (always) a lot more to work on:
  - More actors for phylogeny analyses
  - Automatically generating actors based on CORBA services
  - Database (TreeBase) support to store large amounts of data
  - More computing power for large dataset processing
- Need your collaboration:
  - Sharing experiences
  - Teaching each other the domain knowledge
  - Locating a specific problem and solving it

Questions?
Zhijie Guan, guan@sdsc.edu, 1-858-822-3620
www.sdsc.edu
Cipres-Kepler Release: ftp://ftp.sdsc.edu/outgoing/borchers/cipresReleases/20060621/cipresKepler_Dist.tgz


