

  • Number of slides: 20

E-science grid facility for Europe and Latin America

Computational challenges on Grid Computing for workflows applied to Phylogeny

R. Isea¹, E. Montes², A. J. Rubio-Montero² and R. Mayo²
¹ Fundación IDEA (Venezuela)
² CIEMAT (Spain)

IWPACBB 2009, Salamanca, June 12th, 2009
www.eu-eela.eu

Outline
• Phylogenetics: a reminder
• Challenges in Phylogenetics
  – Computational methods: MrBayes
  – Exploiting Grid technology
• MrBayes and Bioinformatics resources on Grid
• The PhyloGrid approach
  – General description and objectives
  – Taverna workflow
  – GridSphere portal
  – Future work: GridWay metascheduler
• Some results: HPV case study
• Summary and conclusions

Phylogenetics: a reminder
• Phylogeny: reconstruction of the evolutionary history (evolutionary tree) of organisms
  – Influence and relationships between species
  – Evolution of selected populations
(Figure: in July 1837 Darwin drew his first known sketch of an evolutionary tree)
• Applications in Life Sciences, Industry, etc.:
  – Knowing the real history of evolution: the Tree of Life
  – Drug discovery
  – Tracing geographical origin and dating the introduction of strains
  – Prediction of gene and protein function
  – Epidemiological studies
(Figure: Complete Tree of Life)

Computational problem: so many trees…
• Nº of possible labelled topologies with n species or taxa:
  – Rooted trees: $N_R(n) = \dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$
  – Unrooted trees: $N_U(n) = \dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$
(Table: Nº of taxa, Nº of rooted trees, Nº of unrooted trees)
• Exhaustive enumeration of all possible phylogenies is not computationally feasible
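To give a feel for how fast the tree space grows, the counts above can be computed with double factorials: the number of labelled unrooted binary trees on n taxa is (2n-5)!! and the number of rooted ones is (2n-3)!!. The sketch below is only an illustration; the function names are hypothetical and not part of PhyloGrid or MrBayes.

```python
def double_factorial(k: int) -> int:
    """Return k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_topologies(n: int) -> int:
    """Labelled unrooted binary trees on n >= 3 taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    return double_factorial(2 * n - 5)


def rooted_topologies(n: int) -> int:
    """Labelled rooted binary trees on n >= 2 taxa: (2n - 3)!!."""
    if n < 2:
        raise ValueError("a rooted binary tree needs at least 2 taxa")
    return double_factorial(2 * n - 3)


if __name__ == "__main__":
    # 50 taxa matches the MrBayes timing example; 121 is the HPV data set size.
    for n in (4, 10, 50, 121):
        print(f"{n:>3} taxa: {unrooted_topologies(n):.2e} unrooted, "
              f"{rooted_topologies(n):.2e} rooted")
```

Note that rooting an unrooted tree on n taxa is equivalent to adding one extra taxon, so `rooted_topologies(n) == unrooted_topologies(n + 1)`.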

Computational methods
• Phenetics: no evolutionary model
  – Distance-matrix based methods (Neighbour-Joining)
• Cladistics:
  – Maximum Parsimony (not statistically consistent)
  – Maximum Likelihood
  – Bayesian inference (Markov Chain Monte Carlo): simulation techniques for approximating the posterior probability distribution of trees
• MrBayes (http://mrbayes.csit.fsu.edu)
  – Sequential and parallel implementations (MPI enabled)
  – High CPU and memory consumption, a real computational challenge:
    § 50 taxa: simulating 250,000 generations takes ~50 hours on a P4 2.8 GHz
    § 2900 sequences of HIV-1
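To make "scoring a tree" concrete, here is a sketch of Fitch's small-parsimony algorithm, which computes the Maximum Parsimony score of one fixed topology. This is not what MrBayes does (MrBayes uses Bayesian MCMC); it only illustrates the per-tree work that every criterion must repeat over an enormous tree space. The nested-tuple tree encoding and the function names are assumptions for illustration.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one alignment column.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes needed to explain the leaf states on that tree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Empty intersection: take the union and charge one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the Fitch cost over every column of equal-length aligned sequences."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total
```

For example, with sequences `A=AAG, B=AAA, C=GGA, D=GGG`, the topology `((A,B),(C,D))` needs 4 changes while `((A,C),(B,D))` needs 6, so parsimony prefers the former.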

Challenges for Bioinformatics
• Still a computational problem
  – Part of the scientific community relies on inefficient local facilities
  – The growing provision of HPC facilities requires additional skills
• A different approach is needed to access computing infrastructures irrespective of their location: Grid Computing

Why Grid Computing?
• Grids represent a powerful new tool for e-Science
  – Provide seamless sharing of computing and storage resources
  – Enable the creation of scalable Virtual Organizations (VOs), e.g. the Biomed VO
  – Service Grids (EGEE, EELA) and Opportunistic Grids
• Benefit for applications demanding non-trivial computing capabilities
• Local and remote computing and storage facilities

Bioinformatics Grid resources
• Wide range of Bioinformatics resources through Web interfaces:
  – Projects of public databases (genomes, proteins, etc.):
    § EMBL-EBI (UK), NCBI (USA), DDBJ and PDBj (Japan), etc.
  – Web services for Bioinformatics toolkits:
    § EBI web services, NCBI Entrez Utils, DDBJ, BioMoby services
  – Bioinformatics Web services index/registry servers:
    § EMBRACE service registry (BioCatalogue), BioMoby Central Registry
• Grid-enabled software packages:
  – EELA-2: grEMBOSS (UNAM)
• Grid portals to mask applications
  – Genius, GridSphere
• Grid infrastructures & VOs
  – EGEE related: Biomed, GENE, EELA-prod VOs
  – myGrid, caBIG, TeraGrid

How to access MrBayes on Grid
• Simply sending a standard job to a site
  – Software must be preinstalled at the sites
  – Successfully tested in several projects:
    § National Grid Service (UK)
    § FIRB LIBI "International Laboratory for Bioinformatics" project (Italy)
    § BioinfoGRID project
    § EELA: MPI version installed and tested at the EELA-CIEMAT site
  – Supported by EELA-2/EGEE sites
• Grid bureaucracy: certificates, VOs, etc.
  – Biologists are usually not advanced grid users
• Need for friendly interfaces to Grid facilities
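Sending "a standard job to a site" on a gLite Grid means describing it in JDL (Job Description Language). The sketch below builds a minimal JDL for an MPI MrBayes run. It is an assumption-laden illustration, not PhyloGrid's actual JDL: the wrapper script `run_mrbayes.sh` (which would launch the MrBayes binary `mb` under MPI), the output file names and the `MPI-START` runtime tag are hypothetical and should be checked against the target VO and the gLite WMS JDL documentation.

```python
def build_mrbayes_jdl(nexus_file: str, cpus: int = 4) -> str:
    """Return a gLite-style JDL description for an MPI MrBayes run.

    Hypothetical assumptions: the wrapper script name, the output file names
    and the MPI-START runtime tag are illustrative, not PhyloGrid's settings.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    lines = [
        'Type = "Job";',
        'JobType = "Normal";',
        f"CpuNumber = {cpus};",
        # Hypothetical wrapper that sets up MPI and calls `mb <nexus_file>`.
        'Executable = "run_mrbayes.sh";',
        f'Arguments = "{nexus_file}";',
        'StdOutput = "mrbayes.out";',
        'StdError = "mrbayes.err";',
        f'InputSandbox = {{"run_mrbayes.sh", "{nexus_file}"}};',
        'OutputSandbox = {"mrbayes.out", "mrbayes.err"};',
        # Only match sites that advertise MPI-Start support.
        'Requirements = Member("MPI-START", '
        "other.GlueHostApplicationSoftwareRunTimeEnvironment);",
    ]
    return "[\n  " + "\n  ".join(lines) + "\n]\n"
```

Written to a file such as `mrbayes.jdl`, this description would be submitted from a gLite UI, e.g. with `glite-wms-job-submit -a mrbayes.jdl`; the wrapper script itself is not shown.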

PhyloGrid aim
Offer the scientific community an easy interface for calculating phylogenies on the Grid without requiring users to know the computational procedure:
  – Based on the MPI-enabled version of MrBayes
    § By means of a Taverna workflow
  – Takes advantage of the computational power of current Grid infrastructures
The use of Taverna workflows:
  – Allows multiple database selection
  – Is extendable with access to complementary tools (ClustalW-MPI) or other workflows (myExperiment repository)

PhyloGrid architecture
(Figure: architecture diagram. Components: GridSphere portal + workflow enactor/engine, accessed over HTTPS with a user certificate; SOAP link to a gLite UI + submission web service; the gLite Grid, reached through Grid protocols: WMS, CE, WNs, SE and LFC catalog.)

Taverna Workflow Management System
• A bioinformatician can easily implement Grid workflows without Grid skills
• Public workflow repository (myExperiment)
• Several plugins to use Web Services (WS):
  – myGrid, caBIG, GridSAM, BioMoby
  – Many public databases
  – GT4 services and the gRAVI developer framework
• Many tools/plugins:
  – File manipulation, format converters, local and remote execution, visualization applets, tools for accessing WS

PhyloGrid Workflow for MrBayes
• Input parameters received from the GridSphere portal
• Conversion from ALN/ClustalW, PHYLIP and MSA to NEXUS format
• Builds the NEXUS file for MrBayes
• Creates the JDL file
• Job submission
• A nested workflow checks Grid job execution
• Gets the output from the SE
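The "Builds NEXUS file for MrBayes" step can be sketched as writing the aligned sequences into a NEXUS `data` block followed by a `mrbayes` command block, so that `mb` runs in batch mode. This is a hypothetical illustration: the model settings (`nst`, `rates`), run length and sampling frequency are placeholder defaults, not PhyloGrid's actual configuration, and format conversion from ClustalW/PHYLIP is assumed to have already produced the `alignment` dictionary.

```python
from typing import Dict


def build_nexus(alignment: Dict[str, str], ngen: int = 250_000,
                nst: int = 6, rates: str = "invgamma") -> str:
    """Write aligned DNA sequences as NEXUS text with a MrBayes block.

    The model and run settings are illustrative defaults only.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    if any(ch.isspace() for name in alignment for ch in name):
        raise ValueError("taxon names must not contain whitespace")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    nchar = lengths.pop()
    width = max(len(name) for name in alignment) + 2
    matrix = "\n".join(f"  {name.ljust(width)}{seq}"
                       for name, seq in alignment.items())
    return (
        "#NEXUS\n"
        "begin data;\n"
        f"  dimensions ntax={len(alignment)} nchar={nchar};\n"
        "  format datatype=dna missing=? gap=-;\n"
        "  matrix\n"
        f"{matrix}\n"
        "  ;\n"
        "end;\n\n"
        "begin mrbayes;\n"
        # autoclose lets mb finish without interactive prompts on a worker node.
        "  set autoclose=yes nowarn=yes;\n"
        f"  lset nst={nst} rates={rates};\n"
        f"  mcmc ngen={ngen} samplefreq=100;\n"
        "  sump;\n"
        "  sumt;\n"
        "end;\n"
    )
```

Embedding the commands in the file keeps the Grid job itself trivial: the worker node only needs to run `mb` on the uploaded NEXUS file.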

GridSphere portal
• PhyloGrid web portal built on top of the GridSphere portal framework (http://www.gridsphere.org):
  – A Grid portal improves the usability of Grids
    § Hiding the complexity of the technology involved
  – A Grid portal improves the utilization of Grids
    § Providing an appealing, user-friendly Web interface
    § Enforcing Grid utilization policies
    § PKI security, etc.
  – Cohesive Grid portals
(Figure: snapshot of the virtual work area of the PhyloGrid portal with some results)

Future work: GridWay
• The JDL job approach
  – Job errors are hard to handle inside the Taverna workflow
  – A gLite plugin for Taverna is under development:
    § Taverna must be installed on a UI, or
    § Remote execution on a UI must be used (Taverna remote workflow enactor)
• GridWay metascheduler
  – Characteristics:
    § Fully compatible with gLite-based Grids (EELA-2, EGEE)
    § Better resource selection based on internal statistics
    § Automatic migration and rescheduling of failed jobs
    § Checkpointing management for long-running tasks
  – Taverna binding implementation:
    § WS GRAM interface deployed over GridWay
    § By means of GT4 plugins or by directly implementing a JSDL plugin
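The idea behind "resource selection based on internal statistics" plus "automatic rescheduling of failed jobs" can be illustrated with a toy policy: rank sites by their observed success rate, submit to the best one, and on failure record it and move to the next untried site. This is a hypothetical sketch of the general idea only; it is not GridWay's actual scheduling algorithm or API, and `run_on`, `ResourceStats` and the site names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ResourceStats:
    """Per-site history used to rank candidate sites (hypothetical)."""
    successes: int = 0
    failures: int = 0

    def score(self) -> float:
        # Laplace-smoothed success rate: an unseen site scores 0.5.
        return (self.successes + 1) / (self.successes + self.failures + 2)


def run_with_rescheduling(run_on: Callable[[str], bool], sites: List[str],
                          stats: Dict[str, ResourceStats],
                          max_attempts: int = 3) -> Tuple[bool, List[str]]:
    """Submit to the best-ranked site; on failure, reschedule elsewhere.

    `run_on(site)` stands in for a real submission and returns True on
    success. Returns (succeeded, sites tried in order).
    """
    for site in sites:
        stats.setdefault(site, ResourceStats())
    tried: List[str] = []
    for _ in range(max_attempts):
        untried = [s for s in sites if s not in tried]
        candidates = untried or sites  # fall back to any site once all were tried
        site = max(candidates, key=lambda s: stats[s].score())
        tried.append(site)
        if run_on(site):
            stats[site].successes += 1
            return True, tried
        stats[site].failures += 1
    return False, tried
```

Because `stats` persists across calls, a site that keeps failing drifts down the ranking, which is the intuition behind scheduling on accumulated statistics.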

HPV case study with PhyloGrid
• HPV is a recognized underlying factor in cervical cancer:
  – 90% of cases show infection by some HPV strain
• Complete HPV nucleotide sequences are about 8000 bases long:
  – E1, E2, E4-E7 early-expression genes and L1, L2 late-expression genes
  – HPV classification according to L1 variability (> 100 types)
  – Two different categories with respect to oncogenic potential
• Study: check whether this categorization really fits the evolutionary history of HPV
  – 121 HPV sequences
  – Molecular phylogenetic calculations for the L1, L2 and E7 genes

Results obtained with PhyloGrid
Molecular phylogeny of HPV from the L1, L2 and E7 genes
• 121 HPV nucleotide sequences of L1 (the major capsid gene)
• Phylogenetic tree for L1
• Broader lines mark differences between this tree and the tree derived from the L2 gene
• Topology similarity score of 85% between L1 and L2
• Conflict with the HPV classification based on variability of the L1 gene
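The slide does not say how the 85% topology similarity was computed. One common way to compare two trees on the same taxa is by their shared bipartitions (splits), i.e. a normalized Robinson-Foulds similarity; the sketch below shows that approach as an assumption, not as the authors' metric. Trees are written as nested tuples (a hypothetical encoding) and treated as unrooted.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of the tree, read as unrooted.

    Each split is keyed by the side that does NOT contain a fixed reference
    taxon, so the same bipartition always maps to the same frozenset.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else taxa - below
        if 1 < len(side) < len(taxa) - 1:  # skip trivial splits
            found.add(side)
        return below

    walk(tree)
    return found


def topology_similarity(t1: Tree, t2: Tree) -> float:
    """Fraction of non-trivial splits shared by two trees on the same taxa."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same taxa")
    s1, s2 = splits(t1), splits(t2)
    total = len(s1) + len(s2)
    return 1.0 if total == 0 else 2 * len(s1 & s2) / total
```

Because splits ignore the root, `((("A","B"),"C"),("D","E"))` and `("A",("B",("C",("D","E"))))` score 1.0, while swapping B and C in the first tree drops the score to 0.5.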

Summary and conclusions
• PhyloGrid is a tool for phylogenetic studies on the Grid by means of MPI-enabled MrBayes:
  – Friendly interface (GridSphere portal): no computational or Grid skills are required to perform calculations
  – Automation of tasks: Taverna workflow
• PhyloGrid takes advantage of the computational power of current Grid infrastructures
  – Allowing phylogenetic analysis on a large scale
  – Reducing the technological divide that part of the scientific community faces in accessing computational platforms such as the Grid

Thanks for your attention. Questions?

Contact
R. Isea¹: raul.isea at gmail.com
E. Montes²: esther.montes at ciemat.es
A. J. Rubio-Montero²: antonio.rubio at ciemat.es
R. Mayo²: rafael.mayo at ciemat.es
http://www.ciemat.es/portal.do?IDR=1481&TR=C
¹ Fundación IDEA (Venezuela)
² CIEMAT (Spain)