- Number of slides: 41
High Performance Computing for University Medical Research: A Successful Implementation
Dr. Craig A. Stewart, Ph.D. stewart@iu.edu
Director, Research and Academic Computing, University Information Technology Services
Director, Information Technology Core, Indiana Genomics Initiative
Dr. Richard Repasky, Ph.D. rrepasky@indiana.edu
Bioinformatics Specialist
License Terms
• Please cite this presentation as: Stewart, C. A. and R. Repasky. High Performance Computing for University Medical Research: A Successful Implementation. 2007. Presentation. Presented at: Bio-IT World Conference & Expo (Boston, MA, 24-26 Apr 2007). Available from: http://hdl.handle.net/2022/14600
• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document.
• Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, the contents of this presentation are copyright 2007 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.
Bioinformatics and Biomedical Research
• Bioinformatics, Genomics, Proteomics, ____ics all promise to radically change our understanding of biological function and the way biomedical research is done.
• Traditional biomedical researchers must take advantage of new possibilities.
• “Post-genomic” research must take advantage of the tremendous store of detailed knowledge held by traditional biomedical researchers.
Anopheles gambiae
• From www.sciencemag.org/feature/data/mosquito/mtm/index.html
• Source Library: Centers for Disease Control PHIL
• Photo Credit: Jim Gathany
IU’s goals for the Indiana Genomics Initiative (INGEN)
• Build on traditional strengths of IU School of Medicine
• Build on IU's strength in Information Technology
• Add new programs of research made possible by the sequencing of the human genome
• Perform the research that will generate new treatments for human disease in the post-genomic era
• Improve human health generally and in the State of Indiana particularly
• Enhance economic growth in Indiana
• INGEN was created by a $105M grant from the Lilly Endowment, Inc. and launched in December 2000
• The goal of this talk is to explain how advanced information technology was implemented to help meet these goals.
Outline
• Background information about IU
• The Indiana Genomics Initiative (INGEN)
• The INGEN Information Technology Core
  • Facilities
  • Service
  • Some key projects
• Status and summary of success factors
• Acknowledgements
IU in a nutshell
• $2B annual budget
• 8 campuses, 90,000 students, 3,900 faculty
• 878 degree programs; > 100 programs ranked within top 20 of their type nationally
• Nation’s second largest school of medicine
  • 1,347 M.D., Ph.D. and M.D./Ph.D. students
  • Sole school of medicine in Indiana
  • Traditional strengths in human genetic diseases (e.g., alcoholism, Huntington's disease) and medical records (Regenstrief Institute)
IT @ IU in a nutshell
• CIO: Vice President Michael A. McRobbie
• ~$100M annual budget
• Technology services offered university-wide
• Networking
  • IU operates the Network Operations Center for Abilene
• High Performance Computing
  • First university in US to own a 1 TFLOPS supercomputer
  • The Top 500 list has for the past several years included at least one IU supercomputer
INGEN Structure
• Programs: Bioethics, Genomics, Medical Informatics, Education, Training
• Cores: Tech Transfer, Information Technology, Gene Expression, Proteomics, Cell & Protein Expression, Integrated Imaging, Human Expression, In vivo Imaging, Animal
Indiana Genomics Initiative (organization diagram)
• Programs: Genomics, Medical Informatics, Education, Bioinformatics, Training, Bioethics
• Cores: Proteomics, Information Technology, In Vivo Imaging, Genotyping and Gene Expression, Animal, Cell and Protein Expression, Human Expression, Drosophila, Integrated Microscopy, Technology Transfer
Information Technology Core
• Foci:
  • High Performance Computing
  • Visualization (esp. 3D)
  • Massive Data Storage
  • Support for use of all of the above
• $6.7M budget for IT Core
• Baseline IT services for the School of Medicine are the responsibility of the School of Medicine CIO
Challenges for UITS and the INGEN IT Core
• Assist traditional biomedical researchers in adopting use of advanced information technology (massive data storage, visualization, and high performance computing)
• Assist bioinformatics researchers in use of advanced computing facilities
• Questions we are asked:
  • Why wouldn't it be better just to buy me a newer PC?
• Questions we ask:
  • What do you do now with computers that you would like to do faster?
  • What would you do if computer resources were not a constraint?
Steps in meeting the challenge
• Use INGEN funding to enhance IU’s high performance computing hardware environment
• Use INGEN funding to add dedicated staff supporting INGEN researchers
• Proof of concept projects showing advanced capabilities of IU’s IT environment
• Outreach to get many people using at least the basic capabilities of IU’s advanced IT environment
Hardware Environment
• I-Light network
• High Performance Computing
  • IBM SP – 1.005 TFLOPS
  • Sun E10000 – 52 GFLOPS
  • Large, distributed Linux cluster – 1.1 TFLOPS
• Massive Data Storage system
• Advanced Visualization Systems
  • CAVE
  • John-E-Box
IBM Research SP (Aries/Orion Complex)
• Acquired 9/96, expanded in 1998, 1999, 2000, 2001, 2002 with help of IU IT Strategic Plan funds, IBM SUR grants and INGEN grant from Lilly Endowment, Inc.
• Geographically distributed at IUB and IUPUI
• 632 CPUs, 1.005 TeraFLOPS
• First university-owned supercomputer in US to exceed 1 TFLOPS processing capacity
• Initially 50th, now 112th in Top 500 supercomputer list
• Distributed memory system with shared memory nodes
• AIX 5.1, wealth of software including SAS, SPSS, S-Plus, Mathematica, Matlab, Maple, Gaussian, GIS, scientific/numerical libraries, Oracle and DB2, and more
IBM Research SP (Aries/Orion) © 2000 Tyagan Miller
Sun E10000 (Solar)
• Acquired 4/00
• Shared memory architecture
• ~52 GFLOPS
• 64 400 MHz CPUs, 64 GB memory
• > 2 TB external disk
• Solaris 2.8
• Supports some bioinformatics software not available under AIX (e.g. GCG/SeqWeb)
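The headline figures quoted for the IBM SP (1.005 TFLOPS, 632 CPUs) and the Sun E10000 (~52 GFLOPS, 64 CPUs) can be turned into a rough per-processor number. The sketch below does only that arithmetic; the function name and the dictionary layout are illustrative choices, not anything from the original slides.

```python
def per_cpu_gflops(total_gflops, cpu_count):
    """Average peak GFLOPS per processor, from the quoted totals."""
    return total_gflops / cpu_count


# Totals and CPU counts as quoted on the slides (TFLOPS converted to GFLOPS).
systems = {
    "IBM SP": (1005.0, 632),
    "Sun E10000": (52.0, 64),
}

for name, (total, cpus) in systems.items():
    print(f"{name}: {per_cpu_gflops(total, cpus):.2f} GFLOPS per CPU")
```

These are averages of quoted peak figures, so they say nothing about sustained performance on real workloads.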
Sun E10000 (Solar) © 2000 Tyagan Miller
Distributed Linux Cluster
• AVIDD (Analysis and Visualization of Instrument-Driven Data)
• 1.1 TFLOPS, 0.5 TB RAM, 10 TB Disk
• Tuned, configured, and optimized for handling real-time data streams
Massive Data Storage System
• Based on HPSS (High Performance Storage System)
• 180 TB capacity with existing tapes; total capacity of 480 TB
• First distributed HPSS installation; STK 9310 Silos in Bloomington and Indianapolis
• Automatic overnight replication of data between Indianapolis and Bloomington via I-Light. Critical for biomedical data, which is often irreplaceable.
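Replication is only useful if the two copies really are identical. As a minimal sketch of that idea, and not a description of how HPSS itself verifies data, the following compares SHA-256 digests of a primary file and its replica. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read chunk by chunk so large data sets never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicas_match(primary_path, replica_path):
    """True when both copies have identical content (hypothetical check)."""
    return file_digest(primary_path) == file_digest(replica_path)
```

A nightly job could run a check of this kind over each replicated file and flag any mismatch for re-copying.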
STK Silo © 2000 Tyagan Miller
Advanced Visualization
• Advanced Visualization Lab – recognized as leader in implementation of 3D and other advanced visualization technologies
• CAVE – Immersive 3D environment
• John-E-Box – IU-designed, low-cost passive 3D device. Under construction now, planned for installation in multiple INGEN-affiliated labs
John-E-Box Invented by John N. Huffman, John C. Huffman, and Eric Wernert
Specific benefits in hardware environment as a result of INGEN funding:
• Funded significant fraction of upgrade of IU’s IBM SP to 1 TFLOPS
• Funded addition of STK Silo in Indianapolis (and tapes) to provide redundant storage of data
• Funded placement of visualization equipment within the School of Medicine
So, what now that we have all of this hardware?
• Strategic relationships with vendors
• University Information Technology Services has a history of excellent customer support and long-term, collaborative research.
• Focus on provision of facilities and services as a competitive advantage.
• Annual customer satisfaction survey – user satisfaction typically > 95%. These results were probably not representative of the School of Medicine (SoM) as of 2000.
• More information available at http://www.indiana.edu/~rac/siguccs_copyright.html
• It’s people – consulting staff – that make the hardware useful for researchers
INGEN IT Core Support Staff
• Visualization programmer, HPC programmer, and bioinformatics database specialist hired to support INGEN
• Staff added to existing management units within UITS
  • Economy of scale (management, exchange of expertise)
  • Assures addition to, rather than substitution for, base-funded consulting support
So, why is this better than just buying me a new PC?
• Unique facilities provided by IT Core
  • Redundant data storage
  • HPC – better uniprocessor performance; trivially parallel programming, parallel programming
  • Visualization in the research laboratories
• Hardcopy document – INGEN's advanced IT facilities: The least you need to know
• Outreach efforts
• Demonstration projects
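"Trivially parallel" work means many jobs that never need each other's results, so they can simply run side by side. The sketch below illustrates the pattern with a thread pool from the Python standard library. The per-scan analysis (`process_scan`) is a hypothetical stand-in, and on a real cluster each job would more likely be a separate batch job or process than a thread.

```python
from concurrent.futures import ThreadPoolExecutor


def process_scan(scan):
    """Hypothetical per-scan analysis: mean intensity of one image."""
    return sum(scan) / len(scan)


def run_independent_jobs(scans, max_workers=4):
    """Run one job per scan; no job depends on another job's result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(process_scan, scans))


if __name__ == "__main__":
    scans = [[1, 2, 3], [10, 20, 30], [5, 5, 5]]
    print(run_independent_jobs(scans))
```

Because the jobs are independent, adding processors shortens the total run time without any change to the analysis code itself.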
Example projects
• Multiple simultaneous Matlab jobs for brain imaging
• Installation of many commercial and open source bioinformatics applications
• Site licenses for several commercial packages
• Evaluation of several software products that were not implemented
Creation of new software
• Gamma Knife – Penelope. Modified existing version for more precise targeting with IU's Gamma Knife.
• Karyote (TM) cell model. Developed a portion of the code used to model cell function. http://biodynamics.indiana.edu/
• PiVNs. Software to visualize human family trees.
• 3-DIVE (3D Interactive Volume Explorer). http://www.avl.iu.edu/projects/3DIVE/
• fastDNAml – maximum likelihood phylogenies (http://www.indiana.edu/~rac/hpc/fastDNAml/index.html)
• Protein Family Annotator – collaborative development with IBM, Inc.
Data Integration
• Goal set by IU School of Medicine: Any researcher within the IU School of Medicine should be able to transparently query all relevant public external data sources and all sources internal to the IU School of Medicine to which the researcher has read privileges
• IU has more than 1 TB of biomedical data stored in the massive data storage system
• There are many public data sources
• Different labs were independently downloading, subsetting, and formatting data
• Solution: IBM DiscoveryLink, DB2 Information Integrator
Centralized Life Science Database (CLSD)
• Based on use of IBM DiscoveryLink(TM) and DB2 Information Integrator(TM)
• Public data is still downloaded, parsed, and put into a database, but now the process is automated and centralized.
• Lab data and programs like BLAST are included via DiscoveryLink's wrappers.
• Implemented in partnership with IBM Life Sciences via the IU-IBM strategic relationship in the life sciences
• IU contributed the writing of data parsers
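The "download, parse, load" step can be pictured with a small sketch. This is not the IU parser code and does not use DiscoveryLink or DB2: it assumes FASTA-formatted input and an SQLite database purely for illustration, and the table and function names are hypothetical.

```python
import sqlite3


def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    ident, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            # The identifier is the first word of the header line.
            parts = line[1:].split()
            ident, chunks = (parts[0] if parts else ""), []
        elif ident is not None:
            chunks.append(line)
    if ident is not None:
        yield ident, "".join(chunks)


def load_sequences(conn, text):
    """Parse FASTA text and store the records; returns the record count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(id TEXT PRIMARY KEY, seq TEXT NOT NULL)"
    )
    rows = list(parse_fasta(text))
    # INSERT OR REPLACE lets a scheduled re-download refresh existing rows.
    conn.executemany("INSERT OR REPLACE INTO sequences VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

Running one such loader centrally on a schedule replaces the situation in which each lab downloads and reformats the same public data on its own.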
Status Overall
• So far, so good
• 108 users of IU’s supercomputers
• 104 users of massive data storage system
• Six new software packages created or enhanced, more than 20 packages installed for use by INGEN-affiliated researchers
• 1 TB of biomedical data stored in the massive data storage system
• Three software packages made available as open source software as a direct result of INGEN
• The INGEN IT Core is providing services valued by traditionally trained biomedical researchers as well as researchers in bioinformatics, genomics, proteomics, etc.
Success in meeting goals?
• Work on Penelope code for Gamma Knife is likely to be the first major transferable technology development. Stands to improve efficacy of Gamma Knife treatment at IU
• Excellent success in supporting basic research
• Development of open source software (licensed under terms similar to the GNU Lesser General Public License) provides opportunities for technology transfer
• Participation in grants and industrial partnerships provides economic benefit for IU
Success factors
• Creation of new position, Chief Information Officer and Associate Dean, within IU School of Medicine, and significant improvement in basic IT infrastructure within the IU School of Medicine
• INGEN has permitted IU to build on excellent IT infrastructure
• Dedicated (but not isolated) staff supporting INGEN researchers
• Commitment to customer service
• Outreach (in the proper formats)
Success factors, cont'd
• Scientific collaborations
• Strategy research on behalf of IU School of Medicine
• Accountability
• Leveraging of industrial partnerships
Funding Support
• This research was supported in part by the Indiana Genomics Initiative (INGEN). The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc.
• Joint Study Agreement with IBM, Inc. Protein Family Annotator: School of Informatics - M Dalkilic, Center for Genomics and Bioinformatics - P Cherbas, Univ. Information Technology Services & INGEN IT Core - C Stewart.
• This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University.
• This material is based upon work supported by the National Science Foundation under Grant No. 0116050 and Grant No. CDA-9601632. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).
Additional Information
• Further information is available at:
  • ingen.iu.edu
  • http://www.indiana.edu/~uits/rac/
  • http://cgb.indiana.edu/
  • http://www.ncsc.org/casc/paper.html
Acknowledgements (People)
• UITS Research and Academic Computing Division managers: Mary Papakhian, David Hart, Stephen Simms, Richard Repasky, Matt Link, John Samuel, Eric Wernert, Anurag Shankar
• INGEN Staff: Andy Arenson, Chris Garrison, Huian Li, Jagan Lakshmipathy, David Hancock
• UITS Senior Management: Associate Vice President and Dean Christopher Peebles, RAC (Data) Director Gerry Bernbom
• Assistance with this presentation: John Herrin, Malinda Lingwall