

High Performance Computing for University Medical Research: A Successful Implementation
Dr. Craig A. Stewart, Ph.D., stewart@iu.edu
Director, Research and Academic Computing, University Information Technology Services
Director, Information Technology Core, Indiana Genomics Initiative
Dr. Richard Repasky, Ph.D., rrepasky@indiana.edu
Bioinformatics Specialist

License Terms
• Please cite this presentation as: Stewart, C. A. and R. Repasky. High Performance Computing for University Medical Research: A Successful Implementation. 2007. Presentation. Presented at: Bio-IT World Conference & Expo (Boston, MA, 24-26 Apr 2007). Available from: http://hdl.handle.net/2022/14600
• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document.
• Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, the contents of this presentation are copyright 2007 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share (to copy, distribute and transmit the work) and to remix (to adapt the work) under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Bioinformatics and Biomedical Research
• Bioinformatics, Genomics, Proteomics, ____ics all promise to radically change our understanding of biological function and the way biomedical research is done.
• Traditional biomedical researchers must take advantage of new possibilities.
• “Post-genomic” research must take advantage of the tremendous store of detailed knowledge held by traditional biomedical researchers.

Anopheles gambiae
• From www.sciencemag.org/feature/data/mosquito/mtm/index.html
• Source Library: Centers for Disease Control PHIL
• Photo Credit: Jim Gathany

IU’s goals for the Indiana Genomics Initiative (INGEN)
• Build on traditional strengths of IU School of Medicine
• Build on IU's strength in Information Technology
• Add new programs of research made possible by the sequencing of the human genome
• Perform the research that will generate new treatments for human disease in the post-genomic era
• Improve human health generally and in the State of Indiana particularly
• Enhance economic growth in Indiana
• INGEN was created by a $105M grant from the Lilly Endowment, Inc. and launched in December 2000
• The goal of this talk is to explain how advanced information technology was implemented to help meet these goals.

Outline
• Background information about IU
• The Indiana Genomics Initiative (INGEN)
• The INGEN Information Technology Core
  • Facilities
  • Service
  • Some key projects
• Status and summary of success factors
• Acknowledgements

IU in a nutshell
• $2B annual budget
• 8 campuses, 90,000 students, 3,900 faculty
• 878 degree programs; more than 100 programs ranked within the top 20 of their type nationally
• Nation’s second-largest school of medicine
• 1,347 M.D., Ph.D., and M.D./Ph.D. students
• Sole school of medicine in Indiana
• Traditional strengths in human genetic diseases (e.g., alcoholism, Huntington's disease) and medical records (Regenstrief Institute)

IT @ IU in a nutshell
• CIO: Vice President Michael A. McRobbie
• ~$100M annual budget
• Technology services offered university-wide
• Networking
  • IU operates the Network Operations Center for Abilene
• High Performance Computing
  • First university in the US to own a 1 TFLOPS supercomputer
  • The Top 500 list has for the past several years included at least one IU supercomputer

INGEN Structure
• Programs: Bioethics, Genomics, Medical Informatics, Education, Training
• Cores: Technology Transfer, Information Technology, Gene Expression, Proteomics, Cell & Protein Expression, Integrated Imaging, Human Expression, In vivo Imaging, Animal

Indiana Genomics Initiative
• Programs: Genomics, Medical Informatics, Education, Bioinformatics, Training, Bioethics
• Cores: Proteomics, Information Technology, In Vivo Imaging, Genotyping and Gene Expression, Animal, Cell and Protein Expression, Human, Drosophila, Integrated Microscopy, Technology Transfer

Information Technology Core
• Foci:
  • High Performance Computing
  • Visualization (esp. 3D)
  • Massive Data Storage
  • Support for use of all of the above
• $6.7M budget for IT Core
• Baseline IT services for the School of Medicine are the responsibility of the School of Medicine CIO

Challenges for UITS and the INGEN IT Core
• Assist traditional biomedical researchers in adopting advanced information technology (massive data storage, visualization, and high performance computing)
• Assist bioinformatics researchers in use of advanced computing facilities
• Questions we are asked:
  • Why wouldn't it be better just to buy me a newer PC?
• Questions we ask:
  • What do you do now with computers that you would like to do faster?
  • What would you do if computer resources were not a constraint?

Steps in meeting the challenge
• Use INGEN funding to enhance IU’s high performance computing hardware environment
• Use INGEN funding to add dedicated staff supporting INGEN researchers
• Proof-of-concept projects showing advanced capabilities of IU’s IT environment
• Outreach to get many people using at least the basic capabilities of IU’s advanced IT environment

Hardware Environment
• I-Light network
• High Performance Computing
  • IBM SP: 1.005 TFLOPS
  • Sun E10000: 52 GFLOPS
  • Large, distributed Linux cluster: 1.1 TFLOPS
• Massive Data Storage system
• Advanced Visualization Systems
  • CAVE
  • John-E-Box

IBM Research SP (Aries/Orion Complex)
• Acquired 9/96, expanded in 1998, 1999, 2000, 2001, and 2002 with help of IU IT Strategic Plan funds, IBM SUR grants, and the INGEN grant from Lilly Endowment, Inc.
• Geographically distributed at IUB and IUPUI
• 632 CPUs, 1.005 TFLOPS
• First university-owned supercomputer in the US to exceed 1 TFLOPS processing capacity
• Initially 50th, now 112th in the Top 500 supercomputer list
• Distributed memory system with shared memory nodes
• AIX 5.1, wealth of software including SAS, SPSS, S-Plus, Mathematica, Matlab, Maple, Gaussian, GIS, scientific/numerical libraries, Oracle and DB2, and more
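
Dividing the aggregate capacity by the processor count gives the average capacity per CPU. Because the system was expanded in stages between 1996 and 2002, its processors may not all have been identical, so this is an average rather than the rating of any single processor:

```latex
\frac{1.005 \times 10^{12}\ \text{FLOPS}}{632\ \text{CPUs}} \approx 1.59 \times 10^{9}\ \text{FLOPS per CPU} \approx 1.59\ \text{GFLOPS per CPU}
```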

IBM Research SP (Aries/Orion) © 2000 Tyagan Miller

Sun E10000 (Solar)
• Acquired 4/00
• Shared memory architecture
• ~52 GFLOPS
• 64 400-MHz CPUs, 64 GB memory
• > 2 TB external disk
• Solaris 2.8
• Supports some bioinformatics software not available under AIX (e.g., GCG/SeqWeb)
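
A theoretical peak rating is usually computed as processors × clock rate × floating-point operations per clock cycle. The slide does not state the last factor; assuming two floating-point operations per cycle per processor (one add and one multiply, an assumption for illustration), the figures above give:

```latex
R_{\text{peak}} = N_{\text{CPU}} \times f \times \text{FLOP/cycle}
               = 64 \times (0.4 \times 10^{9}\ \text{Hz}) \times 2
               = 51.2 \times 10^{9}\ \text{FLOPS} \approx 51\ \text{GFLOPS}
```

This is close to the approximately 52 GFLOPS quoted on the slide.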

Sun E10000 (Solar) © 2000 Tyagan Miller

Distributed Linux Cluster
• AVIDD (Analysis and Visualization of Instrument-Driven Data)
• 1.1 TFLOPS, 0.5 TB RAM, 10 TB disk
• Tuned, configured, and optimized for handling real-time data streams

Massive Data Storage System
• Based on HPSS (High Performance Storage System)
• 180 TB capacity with existing tapes; total capacity of 480 TB
• First distributed HPSS installation; STK 9310 silos in Bloomington and Indianapolis
• Automatic overnight replication of data between Indianapolis and Bloomington via I-Light. This redundancy is critical for biomedical data, which is often irreplaceable.
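
The slide says replication between the two silos ran automatically overnight, but not how the copies were checked. One common way to confirm that a replica matches its source is to compare checksums file by file. The sketch below assumes, hypothetically, that both copies can be read as ordinary directory trees (HPSS is normally accessed through its own interfaces), and the function names are invented for illustration; it is not the procedure IU used.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_replicas(primary: Path, replica: Path) -> dict:
    """Report files that are missing from, or differ in, the replica tree.

    Assumption (hypothetical): both copies are visible as ordinary directories.
    """
    missing, mismatched = [], []
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        dst = replica / rel
        if not dst.is_file():
            missing.append(str(rel))
        elif file_digest(src) != file_digest(dst):
            mismatched.append(str(rel))
    return {"missing": missing, "mismatched": mismatched}
```

An empty `missing` list and an empty `mismatched` list would indicate that every file in the primary tree has an identical copy in the replica.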

STK Silo © 2000 Tyagan Miller

Advanced Visualization
• Advanced Visualization Lab: recognized as a leader in implementing 3D and other advanced visualization technologies
• CAVE: immersive 3D environment
• John-E-Box: IU-designed, low-cost passive 3D device. Under construction now; planned for installation in multiple INGEN-affiliated labs

John-E-Box Invented by John N. Huffman, John C. Huffman, and Eric Wernert

Specific benefits in hardware environment as a result of INGEN funding:
• Funded a significant fraction of the upgrade of IU’s IBM SP to 1 TFLOPS
• Funded the addition of an STK silo in Indianapolis (and tapes) to provide redundant storage of data
• Funded placement of visualization equipment within the School of Medicine

So, what now that we have all of this hardware?
• Strategic relationships with vendors
• University Information Technology Services has a history of excellent customer support and long-term, collaborative research.
• Focus on provision of facilities and services as a competitive advantage.
• Annual customer satisfaction survey: user satisfaction typically > 95%. These results were probably not representative of the School of Medicine (SoM) as of 2000.
• More information available at http://www.indiana.edu/~rac/siguccs_copyright.html
• It’s people (consulting staff) who make the hardware useful for researchers

INGEN IT Core Support Staff
• Visualization programmer, HPC programmer, and bioinformatics database specialist hired to support INGEN
• Staff added to existing management units within UITS
  • Economy of scale (management, exchange of expertise)
  • Ensures that INGEN staff add to, rather than substitute for, base-funded consulting support

So, why is this better than just buying me a new PC?
• Unique facilities provided by IT Core
  • Redundant data storage
  • HPC: better uniprocessor performance, trivially parallel computing, and parallel programming
  • Visualization in the research laboratories
• Hardcopy document: “INGEN's advanced IT facilities: The least you need to know”
• Outreach efforts
• Demonstration projects
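
"Trivially parallel" work splits into independent jobs that never communicate, such as running the same analysis on many brain images at once. The sketch below shows that pattern under stated assumptions: each job is a separate command (the commands are placeholders, not the Matlab invocations actually used at IU), and the function names are hypothetical. On a shared supercomputer such jobs would normally go through the batch queuing system instead; this only illustrates the shape of the idea.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd: list[str]) -> tuple[int, str]:
    """Run one independent job as its own OS process; return (exit code, stdout)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


def run_trivially_parallel(commands: list[list[str]], max_workers: int = 4) -> list[tuple[int, str]]:
    """Run independent jobs concurrently and return results in input order.

    Threads suffice here: each job's computation runs in a separate child
    process, so the threads mostly wait on those processes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    # Placeholder jobs: each squares a number in a fresh Python interpreter.
    jobs = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(4)]
    for code, out in run_trivially_parallel(jobs):
        print(code, out)
```

Because no job depends on another, adding workers shortens total wall-clock time roughly until the available processors are saturated.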

Example projects
• Multiple simultaneous Matlab jobs for brain imaging
• Installation of many commercial and open source bioinformatics applications
• Site licenses for several commercial packages
• Evaluation of several software products that were not implemented

Creation of new software
• Gamma Knife: Penelope. Modified the existing version for more precise targeting with IU's Gamma Knife.
• Karyote(TM) Cell model. Developed a portion of the code used to model cell function. http://biodynamics.indiana.edu/
• PiVNs: software to visualize human family trees
• 3-DIVE (3D Interactive Volume Explorer). http://www.avl.iu.edu/projects/3DIVE/
• fastDNAml: maximum likelihood phylogenies (http://www.indiana.edu/~rac/hpc/fastDNAml/index.html)
• Protein Family Annotator: collaborative development with IBM, Inc.

Data Integration
• Goal set by IU School of Medicine: Any researcher within the IU School of Medicine should be able to transparently query all relevant public external data sources and all sources internal to the IU School of Medicine to which the researcher has read privileges
• IU has more than 1 TB of biomedical data stored in the massive data storage system
• There are many public data sources
• Different labs were independently downloading, subsetting, and formatting data
• Solution: IBM DiscoveryLink, DB2 Information Integrator

Centralized Life Science Database (CLSD)
• Based on use of IBM DiscoveryLink(TM) and DB2 Information Integrator(TM)
• Public data is still downloaded, parsed, and put into a database, but now the process is automated and centralized.
• Lab data and programs like BLAST are included via DiscoveryLink's wrappers.
• Implemented in partnership with IBM Life Sciences via the IU-IBM strategic relationship in the life sciences
• IU contributed by writing the data parsers
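
DiscoveryLink lets one query span many heterogeneous sources by placing a wrapper in front of each. The sketch below is not DiscoveryLink or DB2 Information Integrator and does not use their APIs; it is a minimal, hypothetical Python illustration (the names `Record`, `SourceWrapper`, `InMemorySource` and `federated_query` are invented here) of the general wrapper idea, combined with the read-privilege rule from the School of Medicine's data-integration goal.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class Record:
    source: str  # which wrapped source produced this row
    key: str     # identifier that was queried, e.g. a gene ID
    value: str   # payload returned by that source


class SourceWrapper(Protocol):
    """Hypothetical common interface: every wrapped source answers the same query."""

    name: str

    def query(self, key: str) -> Iterable[Record]: ...


class InMemorySource:
    """Stand-in for one wrapped source (a public database, a lab table, a BLAST service)."""

    def __init__(self, name: str, rows: dict[str, list[str]]) -> None:
        self.name = name
        self._rows = rows

    def query(self, key: str) -> Iterator[Record]:
        for value in self._rows.get(key, []):
            yield Record(self.name, key, value)


def federated_query(sources: Iterable[SourceWrapper], key: str, readable: set[str]) -> list[Record]:
    """Send one query to every source the researcher may read and merge the answers."""
    results: list[Record] = []
    for src in sources:
        if src.name in readable:  # skip sources without read privileges
            results.extend(src.query(key))
    return results
```

The caller writes one query; each wrapper hides how its own source is stored or invoked, which is the property that lets centralized integration replace per-lab downloading and reformatting.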

Status Overall
• So far, so good
• 108 users of IU’s supercomputers
• 104 users of the massive data storage system
• Six new software packages created or enhanced; more than 20 packages installed for use by INGEN-affiliated researchers
• 1 TB of biomedical data stored in the massive data storage system
• Three software packages made available as open source software as a direct result of INGEN
• The INGEN IT Core is providing services valued by traditionally trained biomedical researchers as well as researchers in bioinformatics, genomics, proteomics, etc.

Success in meeting goals?
• Work on the Penelope code for the Gamma Knife is likely to be the first major transferable technology development. It stands to improve the efficacy of Gamma Knife treatment at IU
• Excellent success in supporting basic research
• Development of open source software (licensed under terms similar to the GNU Lesser General Public License) provides opportunities for technology transfer
• Participation in grants and industrial partnerships provides economic benefit for IU

Success factors
• Creation of a new position, Chief Information Officer and Associate Dean, within the IU School of Medicine, and significant improvement in basic IT infrastructure within the IU School of Medicine
• INGEN has permitted IU to build on excellent IT infrastructure
• Dedicated (but not isolated) staff supporting INGEN researchers
• Commitment to customer service
• Outreach (in the proper formats)

Success factors, cont'd
• Scientific collaborations
• Strategy research on behalf of IU School of Medicine
• Accountability
• Leveraging of industrial partnerships

Funding Support
• This research was supported in part by the Indiana Genomics Initiative (INGEN). The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc.
• Joint Study Agreement with IBM, Inc. Protein Family Annotator: School of Informatics - M Dalkilic, Center for Genomics and Bioinformatics - P Cherbas, Univ. Information Technology Services & INGEN IT Core - C Stewart.
• This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University.
• This material is based upon work supported by the National Science Foundation under Grant No. 0116050 and Grant No. CDA-9601632. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Additional Information
• Further information is available at:
  • ingen.iu.edu
  • http://www.indiana.edu/~uits/rac/
  • http://cgb.indiana.edu/
  • http://www.ncsc.org/casc/paper.html

Acknowledgements (People)
• UITS Research and Academic Computing Division managers: Mary Papakhian, David Hart, Stephen Simms, Richard Repasky, Matt Link, John Samuel, Eric Wernert, Anurag Shankar
• INGEN Staff: Andy Arenson, Chris Garrison, Huian Li, Jagan Lakshmipathy, David Hancock
• UITS Senior Management: Associate Vice President and Dean Christopher Peebles, RAC (Data) Director Gerry Bernbom
• Assistance with this presentation: John Herrin, Malinda Lingwall