
- Number of slides: 49
3rd Essential Practical Bioinformatics Workshop
Introductory Lecture: Why Bioinformatics?
Tan Tin Wee, Department of Biochemistry, YLL School of Medicine, NUS
Victor Tong Joo Chuan, I2R, and Department of Biochemistry, YLL School of Medicine (Adjunct), NUS
Mohammad Asif Khan, Perdana University Graduate School of Medicine, Malaysia
Some definitions of bioinformatics:
- Bioinformatics is “the development and application of techniques from computer science, mathematics and statistics to address biological problems”. Dubitzky (Brief Bioinform. 2009; 10: 343)
- Bioinformatics is “the study of the information content and information flow in biological systems and processes”. Michael Liebman, “Bioinformatics: An Editorial Perspective” (http://www.netsci.org/Science/Bioinform/feature01.html)
Information flow in Biology
How does the plane fly from A to B?
- A 747-400 has six million parts.
- A 747-400 has 274 km of wiring and 8 km of tubing.
- Seventy-five thousand engineering drawings were used to produce the first 747.
Would we understand how a pilot flies or lands the plane, i.e. the behaviour of the plane, if we took it all apart?
Can I understand how a PowerPoint presentation works if I take my computer apart and analyse every chip and every transistor?
What drives us to study the life sciences? Unraveling the mysteries of life!
Scale: time and space.
Image: http://instruct1.cit.cornell.edu/courses/bioes278_lecture/Topic1Basic_Concepts/07-Levels_of_organisation.jpg (Accessed Aug 16, 2006)
Why Bioinformatics?
Bioinformatics is at the beginning of taking biology to the next level:
- studying living things with information technology and computing systems, and
- thinking about living systems with information theory.
Unraveling the Mysteries of Life! With Information Theory?
Depth (diagram levels): Soul? Quantum information? Mind and consciousness; Brain; Network of neurons; Epigenetic code; DNA; Genetic information
Image: http://www.labgrab.com/users/triptangent/blog/laser-induced-gamma-wavesoffer-insight-brain-functions
Living things and the life sciences are special! Information, energy, matter.
Information flow in Biology
Central Dogma and the “omics”
- DNA → RNA → Protein: genomics, transcriptomics, proteomics
- Regulation: interactomics
- Metabolism: metabolomics
- Degradation: degradomics; immunity: immunomics; etc.
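As a minimal sketch of the DNA → RNA → protein flow, the Python below transcribes a DNA sequence into mRNA and translates it with the standard genetic code (NCBI translation table 1). It assumes the input is the coding (sense) strand, already in reading frame 1; the example sequence is made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in UCAG order,
# first base varying slowest and third base fastest; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons in frame 1 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGA"  # made-up example ending in a TGA stop codon
    rna = transcribe(dna)
    print(rna)             # AUGGCCAUUGUAAUGGGCCGCUGA
    print(translate(rna))  # MAIVMGR
```

Real genes add complications this ignores: introns and splicing, the template strand, alternative genetic codes, and finding the correct start codon.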
- Structure of DNA: JD Watson and FHC Crick, April 25, 1953. Nature 171, 737–738 (1953)
- Fred Sanger (1975): dideoxy chain termination DNA sequencing. J. Mol. Biol. 94(3): 441–8
- Kary Mullis (1983): Polymerase Chain Reaction (PCR)
- Human Genome Project: formally began in October 1990, funded by DoE and NIH, USA
- Completion of the first assembly of the human genome: June 26, 2000 in Washington, with Bill Clinton, Craig Venter and Francis Collins (13 years, $300 million)
- 1995: Haemophilus (bacterium), 1.6 Mb, ~1,600 genes [Science 269: 496]
- 1997: Eukaryote (budding yeast), 13 Mb, ~6K genes [Nature 387: 1]
- 1998: C. elegans (worm, animal), ~100 Mb, ~20K genes [Science 282: 1945]
- 2000: Human, ~3 Gb, ~100K genes
Genomes highlight the finiteness of the “data” in biology.
“Next Gen” and 3rd Gen Sequencing
- 454 Life Sciences: pyrosequencing (2005)
- Illumina (Solexa) (2006): latest HiSeq up to 200 Gb per run, 2 × 100 bp read length, up to 25 Gb per day; in a single run, sequence two human genomes at ~30× coverage
- ABI (Life Technologies) (2007): SOLiD (Sequencing by Oligonucleotide Ligation and Detection), less than $10,000 (USD) per genome; 60 gigabases of usable DNA data per run
- 3rd Gen DNA sequencing (new!): single-molecule, PCR-independent sequencing
  - HeliScope Single Molecule Sequencer (2009)
  - Pacific Biosciences SMRT™ (2009), Jonas Korlach & Steve Turner: 100 Gb/hr, >1000 bases/read
- 2010: Haitian cholera epidemic genome completed in 5 days
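The “two human genomes at ~30× coverage” claim is simple arithmetic: mean coverage is the total number of sequenced bases divided by the genome size (the Lander–Waterman relation C = LN/G, for N reads of length L over a genome of size G). A rough check, assuming ~3 Gb per human genome and ignoring reads lost to quality filtering or duplication:

```python
def mean_coverage(read_length_bp: int, num_reads: int, genome_size_bp: float) -> float:
    """Lander-Waterman mean coverage C = L * N / G."""
    return read_length_bp * num_reads / genome_size_bp


def coverage_from_yield(total_bases: float, genome_size_bp: float) -> float:
    """The same quantity expressed from total sequencing yield (L * N)."""
    return total_bases / genome_size_bp


HUMAN_GENOME_BP = 3e9  # ~3 Gb, assumed
RUN_YIELD_BP = 200e9   # "up to 200 Gb per run" (HiSeq figure quoted on the slide)

if __name__ == "__main__":
    per_genome_yield = RUN_YIELD_BP / 2  # one run split across two genomes
    print(round(coverage_from_yield(per_genome_yield, HUMAN_GENOME_BP), 1))  # 33.3
    # One genome's share expressed as reads: 1e9 reads x 100 bp = 100 Gb
    print(round(mean_coverage(100, 1_000_000_000, HUMAN_GENOME_BP), 1))  # 33.3
```

The ~33× raw figure is consistent with the quoted ~30× once some reads are discarded.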
Impact of 3rd Gen Sequencing
- RNA-seq for transcriptomics
- BS-seq (bisulfite seq) for DNA methylation
- ChIP-seq for DNA–protein interactions
- CNV-seq for copy number variations
Newer 3rd Gen DNA sequencing (new!):
- Nanopore sequencing (Oxford Nanopore and UCSC): towards electronic, single-molecule sequencing of DNA strands (http://www.nanoporetech.com/press_releases/detail/114)
- Ion Torrent
- Quantum dot sequencing (future?)
Bioinformatics is crucial for personal genomics
Another wave of exponential growth? The $1,000 human genome?
23andMe.com, deCODEme, Navigenics, Knome, Hellogenome (Theragen), Complete Genomics (USA), Beijing Genomics Institute (BGI), etc.
Bioinformatics in genomics
- “the development and application of techniques from computer science, mathematics and statistics to address biological problems”
- Solving the data deluge in:
  - personal genomics
  - cancer genomics
  - the 1000 Genomes Project
- How to store the data?
- How to analyse the data?
- How to extract information and knowledge from the data?
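One small, concrete answer to “How to store the data?” is compact encoding: each of A, C, G and T fits in 2 bits, so four bases pack into one byte and a ~3 Gb genome needs roughly 750 MB rather than ~3 GB as plain text. The sketch below is illustrative only; it assumes an uppercase A/C/G/T alphabet, whereas real formats must also handle N, soft-masking and quality scores.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        byte = 0
        for i, base in enumerate(seq[start:start + 4]):
            byte |= CODE[base] << (2 * i)  # base i occupies bits 2i and 2i+1
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases; the length is needed because the last byte may be padded."""
    bases = []
    for byte in data:
        for i in range(4):
            bases.append(BASES[(byte >> (2 * i)) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    seq = "GATTACAGATTACA"
    packed = pack(seq)
    print(len(seq), "bases ->", len(packed), "bytes")  # 14 bases -> 4 bytes
    assert unpack(packed, len(seq)) == seq
    print(3_000_000_000 // 4 / 1e6, "MB for ~3 Gb")  # 750.0 MB for ~3 Gb
```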
From genome (1D) to structure (3D)
Crystal to Structure Pipeline
Crystal management → crystal mounting → crystal alignment → crystal description → data collection → structure determination
ASAP:
- Automation of individual process steps
- Systems integration of automated procedures
- Knowledge-based approach to structure determination
Crystal description is based on the diffraction pattern and micro-beam exposures; structure determination is based on ASAP, with processing engines developed by collaborators.
© John Wooley, UCSD 2003
The Industrial-Scale Discovery Pipeline of JCSG: HT Pipeline Processes, Bottlenecks and Leaks
(Diagram stages: target selection, expression, cloning, imaging, harvesting, crystal mounting, crystal screening, publication, PDB annotation, purification, crystallization, data collection, structure validation, phasing, tracing, structure refinement)
© John Wooley, UCSD 2003
Growth of Protein Data Bank (PDB) Structures
Image: Common cold: structure of the protein shell, or capsid, of the human rhinovirus. Credit: J. Y. Sgro, UW-Madison
Molecules, Complexes, Pathways: Regulating Biological Processes
Image: Jak-STAT Signaling Pathway. KEGG Pathways. http://www.genome.jp/kegg/pathway/hsa04630.html (Accessed: Aug 5, 2011)
Multiple pathways in a cell…
Cells have multiple processes that must be coordinated.
Image: Metabolism Pathways. KEGG Pathways. http://www.genome.jp/kegg/pathway/map01100.html (Accessed: Aug 5, 2011)
Parts of a machine
Images: http://www.edwardsheattreating.com/images/machine%20parts.jpg and http://www.sperdvac.org/Horizontal%20Mill/milling%20machine.jpg
And so, we study the individual parts of the machine in order to understand the machine itself.
Flagellar system
Images: http://www.fbs.osaka-u.ac.jp/en/seminar/image/09_img13.jpg and http://www.fbs.osaka-u.ac.jp/en/seminar/image/09_img12.jpg
© Protonic NanoMachine Project, ERATO, Japan (Namba)
Interactomics
Which proteins (biomolecules) interact with which proteins (biomolecules)?
- Pathway information
Stanyon et al. Genome Biology 2004, 5: R96
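Interaction data such as the Stanyon et al. map are naturally represented as a graph: proteins are nodes and reported interactions are edges. A minimal sketch with made-up protein names (hypothetical, not taken from that study) showing an adjacency-set representation, node degree to spot “hub” proteins, and a breadth-first search for the shortest interaction path between two proteins:

```python
from collections import defaultdict, deque

# Hypothetical interaction pairs; real data would come from an interaction database.
INTERACTIONS = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
    ("P3", "P4"), ("P4", "P5"), ("P6", "P7"),
]


def build_graph(pairs):
    """Undirected adjacency sets: protein -> set of interaction partners."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the proteins on one shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for partner in graph[node]:
            if partner not in previous:
                previous[partner] = node
                queue.append(partner)
    return None


if __name__ == "__main__":
    g = build_graph(INTERACTIONS)
    degrees = {protein: len(partners) for protein, partners in g.items()}
    print(max(degrees, key=degrees.get))  # P3, the best-connected protein
    print(shortest_path(g, "P1", "P5"))   # ['P1', 'P3', 'P4', 'P5']
    print(shortest_path(g, "P1", "P7"))   # None: separate component
```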
Cellular “Circuitry”
Representing and simulating the well-known genetic switch by which λ (lambda) phage chooses between the lytic and lysogenic growth pathways, e.g. with the hybrid functional Petri net technique.
http://www.genomicobject.net/member3/GONET/lambda.html
E-Cell
- Model builder
- Algorithm modules
- Simulation visualiser
© E-cell.org
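Cell simulators such as E-Cell integrate models of reaction kinetics over time. As a toy illustration only (not E-Cell's API or its algorithm modules), the sketch below integrates a single Michaelis–Menten reaction S → P, with dS/dt = −Vmax·S/(Km + S), using fixed-step forward Euler. The parameter values are arbitrary assumptions in arbitrary units.

```python
def simulate_michaelis_menten(s0, vmax, km, dt, steps):
    """Forward-Euler integration of S -> P with a Michaelis-Menten rate law.

    Returns a list of (time, substrate, product) tuples.
    """
    s, p = s0, 0.0
    trajectory = [(0.0, s, p)]
    for step in range(1, steps + 1):
        rate = vmax * s / (km + s)     # reaction velocity at the current [S]
        converted = min(rate * dt, s)  # never consume more substrate than exists
        s -= converted
        p += converted
        trajectory.append((step * dt, s, p))
    return trajectory


if __name__ == "__main__":
    # Arbitrary units: [S]0 = 10, Vmax = 1 per unit time, Km = 2
    traj = simulate_michaelis_menten(s0=10.0, vmax=1.0, km=2.0, dt=0.1, steps=200)
    for t, s, p in traj[::50]:
        print(f"t={t:5.1f}  S={s:6.3f}  P={p:6.3f}")
```

Whole-cell simulators couple many such rate laws (and other algorithms) and use more accurate integrators than forward Euler; the point here is only the time-stepping idea.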
Tissue/Organ Modeling: Simulating Heart Myocardial Function
The Physiome Project © Peter Hunter, http://ep.physoc.org/cgi/content/full/89/1/1
BioImaging Technologies with Genomics, Proteomics, Computational Biology © Nature Cell Biology 2003
Integrative Biology © Nature Cell Biology 2003
Integrative Biology Helping Industry: Novartis Institute for Tropical Diseases (NITD) STOPDengue Project © Nature Cell Biology 2003
Bioinformatics and Computational Biology Underpin Integrative Biology
They handle the complexity of biological data (diagram labels: 1D, 2D, 3D, 6D).
Spectrum of Bioinformatics and Computational Biology
(Diagram: a spectrum from COMPUTATIONAL/INFORMATIONAL, in silico research, to EXPERIMENTAL/OBSERVATIONAL, in vitro and in vivo research. Levels shown: tissue/organ physiology; “E-Cell” simulation; regulatory networks/circuits; pathways; interactions; function; structure; sequence. Data sources shown: genomics, proteomics, transcriptomics, metabolomics, other ’omics, BioImaging.)
Biology is Big Science these days
After the genome projects, industrial-scale data generation is no big deal. Sophisticated bioinstrumentation, from automated sequencers to microarray systems, runs 24/7 and churns out ever-increasing volumes of output, throughput and data.
- Stanford Bio-X (US$150M)
- MIT CSBi (US$10M/yr)
- Princeton Lewis-Sigler Institute for Integrative Genomics, Icahn Lab (US$40M)
- Duke Institute for Genome Sciences and Craig Venter’s TCAG (US$250M)
- University of Michigan LSI (US$380M)
- QB3, University of California San Francisco/Santa Cruz/Berkeley (US$200M)
- Cornell LSI (US$140)
- UCSD JCSG
© GenomeWeb LLC 2003
Singapore’s Mechanobiology Institute (S$150M)
Bioinformatics and computational biology underpin the research process.
The Biological Data Deluge of Volume and Complexity
Critical need for bioinformatics and computational biology expertise in the next generation of biologists
20th century: the century of physics and the atom bomb
21st century: the century of biology and biotechnology (The Economist, Feb/Mar 2010)
What does all of this have to do with you?
Basic IT and computer literacy for biologists?
Basics of Bioinformatics in Brief
Bioinformatics applications are built from many combinations of databases, algorithms and visualisation.
Biological Databases
- Collect, organize and classify data
- Query the dataset
- Retrieve entries based on keyword search
- Limitations of databases
Image from Entrez Gene. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene (Accessed Aug 5, 2011)
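Keyword queries like the Entrez Gene search pictured can also be issued programmatically. NCBI's E-utilities provide an `esearch` endpoint that takes a database name (`db`) and a query (`term`) and returns matching record IDs. The sketch below only builds the request URL with the Python standard library; the example query is an assumption chosen for illustration, and real use should follow NCBI's usage guidelines (rate limits, `tool`/`email` parameters).

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an E-utilities esearch URL for a keyword query against one database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example query (assumed): the human gene whose official symbol is BRCA1
    print(esearch_url("gene", "BRCA1[sym] AND human[orgn]"))
```

Fetching that URL (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult` object holds an `idlist` of matching Gene IDs, which can then be passed to other E-utilities to retrieve full records.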
Sequence Analysis and Other Bioinformatics Software
- Why bioinformatics software?
- Examples of selected software
- Scope and sources of bioinformatics software
- Cautions regarding bioinformatics software
- Accessing and using bioinformatics software
Sequence Comparison, Alignment, Assembly
- After collecting a set of related sequences, how can we compare them as a set?
- How should we line up the sequences so that the most similar portions are together?
- What do we do with sequences of different lengths?
- How can we compare a given sequence to the millions in the database?
- Which ones are truly related by evolution?
- What can the study of related sequences tell us?
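Lining up two sequences so that the most similar portions are together, even when their lengths differ, is classically solved by dynamic programming. A minimal Needleman–Wunsch global alignment, assuming a deliberately simple scoring scheme (match +1, mismatch −1, linear gap −1) instead of a substitution matrix such as BLOSUM62 with affine gap penalties:

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair_score(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                    # gap in b
                score[i][j - 1] + gap,                    # gap in a
            )

    # Traceback from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
    best, top, bottom = needleman_wunsch("GATTACA", "GCATGCA")
    print(best)
    print(top)
    print(bottom)
```

Filling a full matrix for every database entry is too slow against millions of sequences, which is why search tools such as BLAST rely on heuristics; local alignment (Smith–Waterman) changes the recurrence but keeps the same dynamic-programming structure.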
Patterns and Motifs
- What is the signature found in groups of sequences?
- How can we use these signature patterns or motifs for rapid identification of familial relationships?
- Can patterns be used to assign function?
Image: http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html
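A sequence logo, like the one credited on this slide, scales each aligned position by its information content. For DNA the maximum is log2(4) = 2 bits, and a position's information is 2 − H, where H = −Σ p·log2 p is the Shannon entropy of the base frequencies at that position. The sketch below computes this for a made-up set of aligned sites; it omits the small-sample correction used in published logos, so treat its values as approximate.

```python
import math
from collections import Counter

# Hypothetical aligned sites (equal length, DNA alphabet).
SITES = ["TATAAT", "TATAAT", "TACAAT", "GATAAT", "TATACT", "TATGAT"]


def column_information(column: str) -> float:
    """Information content in bits: log2(4) minus the Shannon entropy of the column."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 - entropy


def position_information(sites):
    """Information content at each aligned position."""
    return [column_information("".join(col)) for col in zip(*sites)]


if __name__ == "__main__":
    for pos, bits in enumerate(position_information(SITES), start=1):
        consensus = Counter(site[pos - 1] for site in SITES).most_common(1)[0][0]
        print(pos, consensus, round(bits, 2))
```

Fully conserved positions score 2 bits and positions with all four bases equally frequent score 0, which is why conserved positions dominate a logo.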
Evolution and Phylogenetic Analysis
- What is a phylogenetic tree?
- Algorithms used to generate a phylogenetic tree
- Decisions governing the choice of phylogenetic programs
- Interpreting genome structure using phylogeny
Image: David Begun, http://www.newsandevents.utoronto.ca/bios/askus4.htm
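Distance-based tree-building methods (for example UPGMA or neighbour-joining) start from a matrix of pairwise evolutionary distances. A minimal sketch, assuming equal-length, already-aligned DNA sequences (hypothetical here): compute the proportion p of differing sites, then apply the Jukes–Cantor correction d = −(3/4)·ln(1 − (4/3)p) to account for multiple substitutions at the same site.

```python
import math
from itertools import combinations

# Hypothetical aligned sequences (same length, no gaps).
ALIGNED = {
    "seqA": "ACGTACGTACGTACGTACGT",
    "seqB": "ACGTACGTACGAACGTACGT",
    "seqC": "ACTTACGAACGAACGTTCGT",
}


def p_distance(x: str, y: str) -> float:
    """Proportion of aligned sites that differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; undefined (infinite) once p reaches 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    for (n1, s1), (n2, s2) in combinations(ALIGNED.items(), 2):
        p = p_distance(s1, s2)
        print(f"{n1}-{n2}: p={p:.2f}  JC={jukes_cantor(p):.4f}")
```

The corrected distance is always at least p, and the gap widens as sequences diverge, which is exactly where uncorrected counts underestimate evolutionary change.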
Structure Visualization
- Using graphic tools to view structures
- Simple commands to analyse structures and active sites
- Different graphic representations and colouring schemes
The function of a protein is a consequence of its folded state (Anfinsen, 1961). The 3D fold of a protein is called its structure. In 3D, the business end of the protein has contributions from different regions of its sequence.
Image: Eric Martz, RasMol Gallery. http://www.umass.edu/microbio/rasmol/galmz.htm
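Viewers such as RasMol read atomic coordinates from PDB-format files, where each ATOM record is a fixed-width line with x, y and z in columns 31–38, 39–46 and 47–54. The sketch below parses those columns and measures an interatomic distance, the kind of calculation behind simple structure and active-site analysis. The two ATOM lines use made-up coordinates for illustration, not a real entry, and the parser ignores HETATM records, alternate locations and multiple models.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom(line: str) -> Atom:
    """Parse one fixed-column PDB ATOM record (the spec counts columns from 1)."""
    return Atom(
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in angstroms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Hypothetical C-alpha atoms of two consecutive residues, in PDB fixed columns.
PDB_LINES = [
    "ATOM      1  CA  MET A   1      10.000  10.000  10.000  1.00 20.00           C",
    "ATOM     10  CA  ALA A   2      11.000  12.000  13.000  1.00 20.00           C",
]

if __name__ == "__main__":
    atoms = [parse_atom(line) for line in PDB_LINES if line.startswith("ATOM")]
    print(atoms[1].name, atoms[1].res_name, atoms[1].res_seq)  # CA ALA 2
    print(round(distance(atoms[0], atoms[1]), 3))  # 3.742
```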
Course objectives of the workshop
You will become sufficiently familiar with a set of common bioinformatics resources, and their underlying concepts, such that:
- you will be able to apply these resources appropriately to solve biological questions;
- you will be able to independently identify, use and assess additional resources as they become available;
- you will be prepared to pursue more advanced bioinformatics training;
- you will begin to think computationally and informatically in doing biology.
Bioinformatics and Computational Thinking in Solving Problems in Biology
(Diagram elements: Database, Algorithms, Analysis, Visualization, Biology, Software Applications)
READINGS:
- Jonathan Pevsner (2009) Bioinformatics and Functional Genomics (2nd edition). Wiley-Blackwell. Chapters 1 and 2; relevant database sections in Chapters 19 and 20. See http://bioinfbook.org/
- Arthur Lesk (2008) Introduction to Bioinformatics (3rd edition). Oxford University Press. Chapters 3 and 4. (Optional)
- For practicals, if you are still lost: Jean-Michel Claverie and Cedric Notredame (2007) Bioinformatics for Dummies (2nd edition). Wiley Publishing. (Optional)