An Introduction to CAMERA and Underlying Technologies
Philip Papadopoulos
University of California, San Diego Supercomputer Center
California Institute for Telecommunications and Information Technology (Calit2)

PI Larry Smarr
Announced 17 Jan 2006. Public Release 13 March 2007
$24.5M Over Seven Years

DNA Basics for Non-Biologists
• Nucleotide bases of DNA: ACTG (Adenine, Cytosine, Thymine, Guanine)
  – A sequence of bases forms one side of a DNA strand
  – Complementary bases form the other side of the DNA
  – A matches T (pair)
  – C matches G (pair)
• During cell replication, DNA is “unzipped”. The complementary side can then be replicated perfectly
• Human DNA is about 3 billion base pairs on 23 pairs of chromosomes
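
The pairing rule on this slide (A with T, C with G) can be sketched in a few lines of Python. This is only an illustration: the names `COMPLEMENT`, `complement`, and `reverse_complement` are made up for this sketch and do not come from the talk or from any CAMERA tool. The reverse complement is included because the two strands of DNA run in opposite directions.

```python
# Base pairing as stated on the slide: A <-> T, C <-> G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("GGGAAACCC"))          # CCCTTTGGG
    print(reverse_complement("GGGAAACCC"))  # GGGTTTCCC
```

Complementing a strand twice gives back the original, which is the property that lets each unzipped side act as a template for the other.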

Bases → Amino Acids
• Triplets of nucleotide bases are called codons and define amino acids.
  – Amino acids are the basic building blocks of proteins
  – There are 20 amino acids, but 4^3 = 64 nucleotide combinations.
  – Many amino acids have multiple codons
  – Special codons (called start and stop codons) mark where translation into protein begins and ends.
• Reading frames of: GGGAAACCC
  – This raw sequence could be read as
  – GGGAAACCC (GGG AAA CCC) (Glycine, Lysine, Proline)
  – GGAAACCC (GGA AAC) (Glycine, Asparagine)
  – GAAACCC (GAA ACC) (Glutamic Acid, Threonine)
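
The three reading frames above can be reproduced with a short Python sketch. It builds the standard genetic code (64 codons, 20 amino acids plus stop, written as one-letter codes with `*` for stop) and translates the example sequence at offsets 0, 1, and 2. The names `CODON_TABLE` and `translate` are illustrative only, and the sketch ignores start and stop handling.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (third base varies fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of seq, starting at offset frame (0, 1 or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    for frame in range(3):
        print(frame, translate("GGGAAACCC", frame))
    # 0 GKP  (Glycine, Lysine, Proline)
    # 1 GN   (Glycine, Asparagine)
    # 2 ET   (Glutamic Acid, Threonine)
```

The same raw read yields three different protein fragments, which is why gene finding has to consider every frame (and, in practice, both strands).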

Sequencing Tidbits
• The Institute for Genomic Research (TIGR) sequenced the genome of the bacterium Haemophilus influenzae in 1995 using shotgun sequencing
  – 1.8 Million Base Pairs (Human: 3 Billion)
• Sequencing does NOT tell you what function a particular gene plays
• It is believed that only ~1.5% of the human genome codes for expressed characteristics
  – The non-coding portions contain our genetic history
  – It is unknown what function the rest of our DNA plays

Most of Evolutionary Time Was in the Microbial World
You Are Here
Tree of Life Derived from 16S rRNA Sequences
Source: Carl Woese, et al.

Marine Genome Sequencing Project – Measuring the Genetic Diversity of Ocean Microbes
Need Ocean Data
Sorcerer II Data Will Double Number of Proteins in GenBank!

Some CAMERA Goals
• Provide an infrastructure where scientists from around the world can perform analysis on genetic communities
  – Global Ocean Sampling (GOS) is the initial large data set
  – ~8.5 billion base pairs of raw reads
  – Metadata is available for samples
  – Salinity, Temperature, Geographic Location, Water Depth, Time of Day …
  – Other metadata will be correlated with samples (e.g. MODIS Satellite)
• Allow others to search and compare input sequences against CAMERA data.
• Overall, provide a resource dedicated to metagenomics
  – Support new datasets
  – Support new analysis tools and web services

Global Ocean Survey (GOS) Sequences are Largely Bacterial
~3 Million Previously Known Sequences
~5.6 Million GOS Sequences
Source: Shibu Yooseph, et al. (PLOS Biology in press 2006)

Reason for CAMERA
• The Global Ocean Survey (GOS) is a huge influx of sequence data
• Factors that interrelate microbes and microbial communities are not well known
• Significant analysis requires large resources
  – All-to-all comparisons
  – Integration of other environmental (meta) data (weather, temperature, salinity, …) is essential
• Raw sequence data sets are mid-sized
  – Current set of GOS raw reads is about 100 GB (FASTA files)
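
Because the raw reads are distributed as FASTA files (a `>` header line followed by one or more sequence lines), a file of this size is usually streamed one record at a time instead of loaded whole. The following Python sketch shows one way to do that; `read_fasta` and `count_bases` are hypothetical helper names for this example, not part of any CAMERA software.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_bases(handle: TextIO) -> Tuple[int, int]:
    """Return (number of reads, total bases) without holding the file in memory."""
    reads = bases = 0
    for _, sequence in read_fasta(handle):
        reads += 1
        bases += len(sequence)
    return reads, bases


if __name__ == "__main__":
    sample = io.StringIO(">read1 sample\nGGGAAA\nCCC\n>read2\nACTG\n")
    print(count_bases(sample))  # (2, 13)
```

With a real file, pass an open file object instead of the in-memory `StringIO` sample.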

Calit2 CAMERA Production Compute and Storage Complex is On-Line
512 Processors
~5 Teraflops
~200 Terabytes Storage

User Map – 03 May 2007
• Site in production on 13 March 2007
• More than 500 registered users from around the globe (~10 new users/day)

Calit2’s Direct Access Core Architecture
[Diagram: CAMERA’s Metagenomics Server Complex]
• Data sources: Sargasso Sea Data, Sorcerer II Expedition (GOS), JGI Community Sequencing Project, Moore Marine Microbial Project, NASA and NOAA Satellite Data, Community Microbial Metagenomics Data
• Server complex: DataBase Farm, Flat File Server Farm, Dedicated Compute Farm (100s of CPUs), 10 GigE Fabric
• Traditional User: Request and Response through the Web Portal + Web Services
• Direct Access Lambda Cnxns to the Local Environment: Local Cluster, Web (other service)
• TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs)
Source: Phil Papadopoulos, SDSC, Calit2

Calit2 CAMERA Production Compute and Storage Complex is On-Line
[Photo labels]
• Web, Application, DB Servers
• 200 TB File Storage
• 10 Gbit/s Network
• 1 and 10 Gbit/s Switching
• Compute Nodes

Global Elements
• Data location
  – Storage Resource Broker metadata catalog
• Data-type aggregation, cross-correlation, integration
  – BIRN Data Mediator
• Identity Management
  – Use Grid Security Infrastructure (GSI) Public Key System
  – Integrated Grid Account Management Architecture (GAMA) from SDSC for ease-of-use and Single Sign On
• Portal Services
  – Based on GridSphere
  – Small Dedicated Compute Cluster (32 nodes)

Logical Layout of Servers
[Diagram labels]
• Single Sign On Layer
• Web Server
• Portal Server (Tomcat)
• Single Sign-on Server
• Public Net
• Private Net
• Cluster Frontend
• Blast Master (JBoss)
• Postgres Database
• Cluster Nodes and File Servers
• GAMA Server

An Incomplete List of Software Components
• Postgres Database
• Apache Tomcat
• JBoss Servlet Container
• Google Web Toolkit
• Sun Grid Engine
• GAMA (Grid Account Management Architecture)/GSI from Globus
• OPAL (Grid/Web Services Wrapper)
• GridSphere Portlet Container
• CAMERA Registration Portal
• Venter Application Portal
• NCBI Blast, MPIBlast, ClustalW, MrBayes, CDHit, and a host of other bio software
• Ergatis Workflow Engine
• JForum
• Drupal
All Integrated with Rocks … Single Person Deployment

OptIPortal – Another Rocks Cluster
Termination Device for the OptIPuter Global Backplane
• 20 Dual CPU Nodes, 20 24” Monitors, ~$50,000
• 1/4 Teraflop, 5 Terabyte Storage, 45 Mega Pixels--Nice PC!
• Scalable Adaptive Graphics Environment (SAGE), Jason Leigh, EVL-UIC
Source: Phil Papadopoulos, SDSC, Calit2

Use of OptIPortal to Interactively View Microbial Genome
15,000 x 15,000 Pixels
Acidobacteria bacterium Ellin345 (NCBI), Soil Bacterium, 5.6 Mb
Source: Raj Singh, UCSD

Use of OptIPortal to Interactively View Microbial Genome
15,000 x 15,000 Pixels
Acidobacteria bacterium Ellin345 (NCBI), Soil Bacterium, 5.6 Mb
Source: Raj Singh, UCSD

A Look at Networking
Introduction to Quartzite: An Experimental Network

Sunlight (10 Gigabit) Campus/WAN

Using a Lambda Network for CAMERA
• Many community databases
  – Protein Databank (PDB)
  – GenBank
  – SwissProt
• Support only web or web services interfaces
  – New analysis/programs need access to raw databases/files
  – Usually, groups make a point-in-time copy of the database
  – We call this a data “fork”
  – Updates are not processed
  – Papers published with point-in-time data are out of date by months or years
• CAMERA “Direct Connect” will allow us to provide a high-speed connection to the backend servers
  – Try to eliminate data forking
  – Copies of CAMERA data are inevitable
  – Need mechanisms that allow others to keep their copies in sync with CAMERA
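
The slide does not say which synchronization mechanism CAMERA planned. As one generic possibility, and purely as an assumption for illustration, a site holding a copy could compare per-file checksums against a manifest published by the upstream source and refetch only what differs. The names `digest` and `stale_files` and the file names in the demo are hypothetical.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """SHA-256 checksum of a file's contents, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def stale_files(upstream: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Names whose local checksum is missing or differs from the upstream manifest."""
    return sorted(
        name for name, checksum in upstream.items() if local.get(name) != checksum
    )


if __name__ == "__main__":
    # Hypothetical manifests mapping file name -> checksum.
    upstream = {
        "reads_a.fa": digest(b"ACTG"),
        "reads_b.fa": digest(b"GGGAAACCC"),
        "reads_c.fa": digest(b"TTTT"),
    }
    local = {
        "reads_a.fa": digest(b"ACTG"),       # up to date
        "reads_b.fa": digest(b"GGGAAACC"),   # changed upstream since the copy
    }
    print(stale_files(upstream, local))  # ['reads_b.fa', 'reads_c.fa']
```

A manifest comparison like this only detects drift; it does not by itself solve the point-in-time citation problem described above.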

UCSD Quartzite Core at Completion (Year 5 of OptIPuter)
• Funded 15 Sep 2004
• Physical HW to Enable OptIPuter and Other Campus Networking Research
• Hybrid Network Instrument
Reconfigurable Network and Endpoints

4 x 4 Wavelength Cross-Connect:
• All integrated optics (except optical amplifiers)
  – 4 1 x 4 WSS modules
  – 4 4 x 1 passive optical combiners
• 4 x 40 λ x 40 Gbps = 6.4 Tbps switching capacity
  – currently using central 8 λ
[Photo labels: 1 x 4 WSSs, combiners, Optical Amps, 4 x 4 WXC rack]
AT&T Labs, October 2007
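
The capacity figure on this slide is ports times wavelengths per port times rate per wavelength. The short Python sketch below checks that arithmetic. The function name is made up for this example, and the second figure assumes, without confirmation from the slide, that the 8 wavelengths currently in use also run at 40 Gbps.

```python
def switching_capacity_tbps(ports: int, wavelengths: int, gbps_per_wavelength: float) -> float:
    """Aggregate switching capacity in Tbps."""
    return ports * wavelengths * gbps_per_wavelength / 1000.0


if __name__ == "__main__":
    # Design figure from the slide: 4 ports x 40 wavelengths x 40 Gbps.
    print(switching_capacity_tbps(4, 40, 40))  # 6.4
    # Assumption: the central 8 wavelengths in use, each also at 40 Gbps.
    print(switching_capacity_tbps(4, 8, 40))   # 1.28
```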

WXC performance demonstration:
• 8 lasers at centre of C-band at 100 GHz spacing
• use ASE source to illustrate wide bandwidth
1. use external 4 x 1 switch to scan WXC p…
2. alter switch states of WSS 1 and WSS 3
[Figure: test setup with ASE source, four 1 x 4 WSSs (WSS 1 to WSS 4), external 4 x 1 switch, and OSA; wavelengths λ1 to λ8]

WXC performance demonstration:

What Does it Cost to Drive the Network
• Dominant cost is DWDM optics
• Construction of multiplexers is simple and not expensive: ~$250/Channel/End

Layer 1 – Four Channel DWDM
• 10 Gbps Switch: x 4 per side (optional)
• XFP Switch Module: x 4 per side (optional)
• XFP DWDM Optics: x 4 per side, used in host or switch
• DWDM Mux (transmit): x 1 per side
• DWDM DeMux (receive): x 1 per side
• SC to LC Fiber, 2 m: x 5 per side
• 1 Fiber Pair carrying Channel 31, Channel 32, Channel 33, Channel 34
• Corning 1U Rack containing DWDM Mux/DeMux + SC to SC couplers: 1 per side

SFP/XFP Optics Costs
1) Optics
DWDM optics from AACTelecom
• 10 Gbps XFP DWDM, per unit (ZR, 80 km), Luminent, OC-192 and 10 GE compatible: 3500 US
• 10 Gbps XFP DWDM, per unit (ER, 40 km), Luminent (assembled in US), OC-192 and 10 GE compatible: 2900 US
• 1 Gbps SFP DWDM, per unit (80 km model), OC-48: 1220 US

2) Optional - Layer 2 Switch (10 Gbps capable)
10 Gbps capable switch: SMC 8748L2 (A0707505) + EXP MOD 10G (A0707506), from Dell
• Switch, 2 x 10 Gbps XFP ports, 48 x 1 Gbps copper: 1700 US
• 10 Gbps module (holds XFP): 300 US

3) DWDM Mux DeMux (SC connector type)
4, 8, 16 channel = DWDM-100, from oemarket.com
• 4 Channel (31, 32, 33, 34): 560 US
• 8 Channel: 880 US
• 16 Channel: 1600 US (approx)

4) Corning Rack Mount, Couplers, Fiber
• Corning Mux DeMux container, 1U rack mount: Corning PCH-01U, from Ed Carlin, Graybar, 1U (sufficient for 4, 8 or 16 channel): 200 US
• 2 sets of SC to SC adaptors: 100 US (approx)
• Fiber patch cables, single mode, 2 m, SC to LC connector type, from Ed Carlin, Graybar: 30 US (approx) each
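
The "~$250/Channel/End" figure quoted earlier can be roughly reproduced from the parts prices listed in sections 1) through 4). The Python sketch below tallies one end of a four-channel link. It rests on several assumptions not stated in the slides: the 560 US mux/demux price covers both the mux and the demux for one end, the 100 US covers both sets of adaptors, five patch cables are used per side, the cheaper ER (40 km) optics are chosen, and the optional switch is left out.

```python
# Passive parts for ONE end of a 4-channel DWDM link, in US dollars.
# Prices are from the slides; how they combine is an assumption.
PASSIVE_PARTS_4CH = {
    "4-channel DWDM mux/demux": 560,
    "Corning 1U rack mount": 200,
    "SC to SC adaptors (2 sets)": 100,
    "2 m SC to LC patch cables (5 x 30)": 5 * 30,
}
XFP_ER_40KM = 2900  # one 10 Gbps DWDM XFP (ER, 40 km)
CHANNELS = 4


def passive_cost_per_channel(parts: dict, channels: int) -> float:
    """Multiplexer-side cost per channel per end, excluding optics."""
    return sum(parts.values()) / channels


def total_cost_per_end(parts: dict, optic_price: int, channels: int) -> int:
    """Passive parts plus one DWDM optic per channel, for one end."""
    return sum(parts.values()) + optic_price * channels


if __name__ == "__main__":
    print(passive_cost_per_channel(PASSIVE_PARTS_4CH, CHANNELS))         # 252.5
    print(total_cost_per_end(PASSIVE_PARTS_4CH, XFP_ER_40KM, CHANNELS))  # 12610
```

Under these assumptions the optics account for 11,600 of the 12,610 US per end, which is consistent with the claim that DWDM optics are the dominant cost.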

Complete Solution

5) Optional - DWDM Media Converter
DWDM to copper media converter, from Carl Stelling at Aaxeon.com
• SFP pluggable DWDM to copper media converter: 150 US each, not including DWDM optics (just converter)

Quartzite State Nov 2007
• Core Packet Switch with 68 10 GigE ports (More than ½ Terabit)
• Approximately 30 Channels Lit
• 64-port All-Optical Glimmerglass Switch - All Fiber into Quartzite is switchable
• 4 port x 8 Lambda DWDM switch at Lucent (On site at Calit2 in Dec)
• 4 Channel DWDM Between Calit2 and SDSC
  – One channel is used for 10 Gigabit Production to BIRN Data Racks
• Ordered, but waiting for fulfillment
  – 20 Mux/Demux (8 C-band DWDM Channels + 1 1310 (LR) Passband)
  – 32 DWDM XFPs (Channel 40-43 – will fill out rest of channels in 2008)