532a5d468a2cd4c8b89bb54f00fa171e.ppt
- Number of slides: 104
Introduction to Bioinformatics 1
What is Bioinformatics? 2
NIH – definitions
What is Bioinformatics? - Research, development, and application of computational tools and approaches for expanding the use of biological, medical, behavioral, and health data, including the means to acquire, store, organize, archive, analyze, or visualize such data.
What is Computational Biology? - The development and application of analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social data. 3
NSF – introduction Large databases that can be accessed and analyzed with sophisticated tools have become central to biological research and education. The information content in the genomes of organisms, in the molecular dynamics of proteins, and in population dynamics, to name but a few areas, is enormous. Biologists are increasingly finding that the management of complex data sets is becoming a bottleneck for scientific advances. Therefore, bioinformatics is rapidly becoming a key technology in all fields of biology. 4
NSF – mission statement The present bottlenecks in bioinformatics include the education of biologists in the use of advanced computing tools, the recruitment of computer scientists into this evolving field, the limited availability of developed databases of biological information, and the need for more efficient and intelligent search engines for complex databases. 5
Molecular Bioinformatics involves the use of computational tools to discover new information in complex data sets (from the one-dimensional information of DNA through the two-dimensional information of RNA and the three-dimensional information of proteins, to the four-dimensional information of evolving living systems). 7
Bioinformatics (Oxford English Dictionary): The branch of science concerned with information and information flow in biological systems, esp. the use of computational methods in genetics and genomics. 8
The field of science in which biology, computer science and information technology merge into a single discipline.
Biologists: collect molecular data (DNA and protein sequences, gene expression, etc.).
Bioinformaticians: study biological questions by analyzing molecular data.
Computer scientists (plus mathematicians, statisticians, etc.): develop tools, software, and algorithms to store and analyze the data. 9
Some biological background…. A biologist 10
The hereditary information of all living organisms, with the exception of some viruses, is carried by deoxyribonucleic acid (DNA) molecules.
2 purines (two rings): adenine (A), guanine (G)
2 pyrimidines (one ring): cytosine (C), thymine (T) 11
The entire complement of genetic material carried by an individual is called the genome.
Eukaryotes may have up to 3 subcellular genomes: 1. Nuclear 2. Mitochondrial 3. Plastid
Bacteria have either circular or linear genomes and may also carry plasmids.
[Figures: human chromosomes; circular genome] 12
Central dogma: DNA makes RNA makes Protein
Modified dogma: DNA makes DNA and RNA, RNA makes DNA, RNA and Protein 13
Amino acids - The protein building blocks 14
Any region of the DNA sequence can, in principle, code for six different amino acid sequences, because any one of three different reading frames can be used to interpret each of the two strands. 16
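The six reading frames mentioned above can be enumerated with a few lines of Python. This is only a minimal sketch: the function names `reverse_complement` and `six_frames` and the frame labels (`+1` … `-3`) are illustrative choices, not part of the lecture material.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def six_frames(seq):
    """Split a DNA sequence into codons for each of the six reading frames.

    Frames +1..+3 read the given strand starting at offsets 0, 1, 2;
    frames -1..-3 do the same on the reverse-complement strand.
    Incomplete trailing codons are dropped.
    """
    frames = {}
    for label, strand in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{label}{offset + 1}"] = codons
    return frames


if __name__ == "__main__":
    for name, codons in six_frames("ATGCGTTAA").items():
        print(name, " ".join(codons))
```

Each frame yields a different codon series, which is why one stretch of DNA can in principle encode six different amino acid sequences.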
Protein folding A human Hemoglobin: 17
What does it all look like on a computer monitor? 18
A cDNA sequence
>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA
ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAA
CGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAG
AGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCT
GCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTGG
ACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCG
GTCAACTTCAAGCTCCTAAGCCACTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTC
ACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAA
TACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTC
CCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC 19
A cDNA sequence (reading frame)
>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA
ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCA
ACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGA
GAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTC
TGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTG
GACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCC
GGTCAACTTCAAGCTCCTAAGCCACTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTT
CACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAA
ATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCC
TCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC
A protein sequence
>gi|4504347|ref|NP_000549.1| alpha 1 globin [Homo sapiens]
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVA
DALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFL
ASVSTVLTSKYR 20
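To connect the cDNA and the protein sequence shown above, here is a hedged sketch that translates the beginning of the HBA1 cDNA, starting at its first ATG, using the standard genetic code. Only the first 88 nucleotides of the slide's sequence are used; the names `CODON_TABLE`, `translate_from_first_atg` and `MRNA_START` are illustrative, and real tools also handle ambiguous bases and alternative start codons.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_atg(seq):
    """Translate from the first ATG until a stop codon or the end of seq."""
    seq = seq.upper().replace(" ", "")
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 88 nucleotides of the NM_000558.3 sequence shown on the slide.
MRNA_START = (
    "ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACC"
    "ATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAG"
)

print(translate_from_first_atg(MRNA_START))  # MVLSPADKTNVKAAWGK
```

The output matches the first 17 residues of the alpha 1 globin protein sequence on the slide.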
And, a whole genome… ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTC AAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTT CCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAG GGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGC GCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGC CACTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTG GACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCA TGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTG AATAAAGTCTGAGTGGGCGGCACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCT CCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCG GAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCC ACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCAC GTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCG GTCAACTTCAAGCTCCTAAGCCACTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCC CTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTA AGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCAC CCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGCACTCTTCTGGTCCCCACAGACTCAGAGAG AACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGC ACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTT CCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCT GACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCA CAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCTGGTGACCCTGGCCGCCCAC CTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGC TGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCC CCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGCGCCGTGGCGC ACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACC CGGTCAACTTCAAGCTCCTAAGCCACTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCAC 21 CCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGT
How big are whole genomes?
E. coli: 4.6 x 10^6 nucleotides, approx. 4,000 genes
Yeast: 15 x 10^6 nucleotides, approx. 6,000 genes
Human: 3 x 10^9 nucleotides, approx. 30,000 genes
Smallest human chromosome: 50 x 10^6 nucleotides 22
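A quick back-of-the-envelope calculation on these figures shows how differently genes are packed in the three genomes. This is only a sketch using the approximate numbers from the slide; the dictionary and function names are illustrative.

```python
# Approximate genome sizes (nucleotides) and gene counts, as given on the slide.
GENOMES = {
    "E. coli": (4_600_000, 4_000),
    "Yeast": (15_000_000, 6_000),
    "Human": (3_000_000_000, 30_000),
}


def nucleotides_per_gene(genome_size, gene_count):
    """Average number of nucleotides of genome per gene."""
    return genome_size / gene_count


for organism, (size, genes) in GENOMES.items():
    print(f"{organism}: ~{nucleotides_per_gene(size, genes):,.0f} nucleotides per gene")
```

With these round numbers, E. coli has about one gene per 1,150 nucleotides, yeast one per 2,500, and human one per 100,000, which illustrates how much of the human genome is not protein-coding.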
What do we actually do with bioinformatics? 23
Sequence assembly (next generation sequencing) 24
Genome annotation 25
Molecular evolution
Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Smith et al. (2009) Nature 459, 1122-1125 26
Analysis of gene expression
Gene expression profile of relapsing versus non-relapsing Wilms tumors. A set of 39 genes discriminates between the two classes of tumors. (http://www.biozentrum2.uni-wuerzburg.de/, Prof. Gessler) 27
Analysis of regulation
Toledo and Bardot (2009) Nature 460, 466-467 28
Protein structure prediction Protein docking 29
Luscombe, Greenbaum, Gerstein (2001) 30
From DNA to Genome [timeline figure, 1955-1985]: Watson and Crick DNA model; Sanger sequences insulin protein; Dayhoff’s Atlas; ARPANET (early Internet); sequence alignment; PDB (Protein Data Bank); Sanger dideoxy DNA sequencing; GenBank database; PCR (Polymerase Chain Reaction) 31
[Timeline figure, continued, 1985-2000]: SWISS-PROT database; NCBI; FASTA; BLAST; Human Genome Initiative; EBI; World Wide Web; first bacterial genome; yeast genome; first human genome draft 32
Origin of bioinformatics and biological databases: The first protein sequence reported was that of bovine insulin in 1956, consisting of 51 residues. Nearly a decade later, the first nucleic acid sequence was reported, that of yeast alanine tRNA with 77 bases. 33
In 1965, Dayhoff gathered all the available sequence data to create the first bioinformatic database (Atlas of Protein Sequence and Structure). The Protein Data Bank followed in 1972 with a collection of ten X-ray crystallographic protein structures. The SWISS-PROT protein sequence database began in 1987. 34
Nucleotides 35
Complete Genomes as of August 2011: Eukaryotes: 37; Prokaryotes: 1708; Total: 1745 36
What can we do with sequences and other types of molecular information? 37
Open reading frames Annotation Functional sites Structure, function 38
CCTGACAAATTCGACGTGCGGCATTGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT CTG AAT. . . . . TGAAAAACGTA 39
Annotated sequence (figure labels: TF binding site; promoter; Transcription Start Site; Ribosome binding Site; ORF = Open Reading Frame; CDS = Coding Sequence):
CCTGACAAATTCGACGTGCGGCATTGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT CTG AAT ... TGAAAAACGTA 40
Comparing ORFs Identifying orthologs Comparative genomics Inferences on structure and function Comparing functional sites Inferences on regulatory networks 41
Similarity profiles
Researchers can learn a great deal about the structure and function of human genes by examining their counterparts in model organisms. 42
Alignment: preproinsulin (Xenopus vs. Bos)
Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
(conservation line as extracted: **** : * *. *: *: . . * : . *: ****)
Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
(conservation line as extracted: ********: ***** ** : *: : *)
Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
(conservation line as extracted: . ** * * *****)
Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
(conservation line as extracted: **** *. ***: *******) 43
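A simple way to quantify such a pairwise alignment is the percent identity: the fraction of aligned (non-gap) columns in which both sequences carry the same residue. The sketch below applies this to the last block of the Xenopus/Bos alignment; the function name `alignment_identity` and the choice to skip gap columns are assumptions made for illustration, and other conventions for the denominator exist.

```python
def alignment_identity(seq_a, seq_b, gap="-"):
    """Count identical columns in a pairwise alignment.

    Returns (matches, compared), where columns containing a gap in
    either sequence are skipped and do not count as compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches, compared


# Last block of the preproinsulin alignment shown on the slide.
xenopus = "EQCCHSTCSLFQLENYCN"
bos = "EQCCASVCSLYQLENYCN"

matches, compared = alignment_identity(xenopus, bos)
print(f"{matches}/{compared} identical ({100 * matches / compared:.1f}%)")
```

For this block, 15 of 18 positions are identical, which agrees with the number of asterisks in the conservation line.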
Ultraconserved Elements in the Human Genome
Gill Bejerano, Michael Pheasant, Igor Makunin, Stuart Stephen, W. James Kent, John S. Mattick, & David Haussler (Science 2004, 304: 1321-1325)
There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. There are 156 intergenic, untranscribed, ultraconserved segments. 45
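The definition used above (100% identity, no insertions or deletions, above a minimum length) can be illustrated on a toy multiple alignment. This is a hedged sketch, not the method of the cited paper: it assumes the sequences are already aligned, and the function name `conserved_runs`, the toy sequences, and the small `min_length` are all illustrative (the paper's threshold is 200 bp on whole-genome alignments).

```python
def conserved_runs(aligned, min_length, gap="-"):
    """Find runs of columns that are identical in all sequences, with no gaps.

    `aligned` is a list of equal-length aligned sequences. Returns a list of
    half-open (start, end) column ranges at least `min_length` columns long.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    runs = []
    run_start = None
    # Iterate one column past the end so a run touching the end is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned[0][col] != gap
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_length:
                runs.append((run_start, col))
            run_start = None
    return runs


# Toy alignment standing in for human, rat, and mouse (hypothetical data).
toy = ["ACGTACGTAA", "ACGTTCGTAA", "ACGTACGTAA"]
print(conserved_runs(toy, min_length=4))
```

On the toy data, the single mismatching column splits the alignment into two perfectly conserved runs.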
Junk: Supporting evidence Junk is real! 46
Genome-wide profiling of: • mRNA levels • Protein levels Functional genomics Co-expression of genes and/or proteins Identifying protein-protein interactions Networks of interactions 47
Understanding the function of genes and other parts of the genome 48
Structural genomics Assign structure to all proteins encoded in a genome 49
Biological databases 50
Database or databank? Initially • Databank (in UK) • Database (in the USA) Solution • The abbreviation db 51
What is a Database?
A structured collection of data held in computer storage; esp. one that incorporates software to make it accessible in a variety of ways; transf., any large collection of information.
database management: the organization and manipulation of data in a database.
database management system (DBMS): a software package that provides all the functions required for database management.
database system: a database together with a database management system.
Oxford Dictionary 52
What is a database?
• A collection of data
– structured
– searchable (index) -> table of contents
– updated periodically (release) -> new edition
– cross-referenced (hyperlinks) -> links with other db
• Includes also associated tools (software) necessary for access, updating, information insertion, information deletion…
• Data storage management: flat files, relational databases… 53
Database: a « flat file » example
Flat-file database ( « flat file, 3 entries » ):
Accession number: 1
First Name: Amos
Last Name: Bairoch
Course: Pottery 2000; Pottery 2001;
//
Accession number: 2
First Name: Dan
Last Name: Graur
Course: Pottery 2000; Pottery 2001; Ballet 2001; Ballet 2002;
//
Accession number: 3
First Name: John
Last Name: Travolta
Course: Ballet 2001; Ballet 2002;
//
• Easy to manage: all the entries are visible at the same time! 54
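Flat files like the one above are easy to read but need a small parser before they can be queried. The following is a minimal sketch, assuming one `Key: value` pair per line, `//` as the record terminator, and semicolon-separated courses; the function name `parse_flat_file` is illustrative.

```python
FLAT_FILE = """\
Accession number: 1
First Name: Amos
Last Name: Bairoch
Course: Pottery 2000; Pottery 2001;
//
Accession number: 2
First Name: Dan
Last Name: Graur
Course: Pottery 2000; Pottery 2001; Ballet 2001; Ballet 2002;
//
Accession number: 3
First Name: John
Last Name: Travolta
Course: Ballet 2001; Ballet 2002;
//
"""


def parse_flat_file(text):
    """Parse '//'-terminated records of 'Key: value' lines into dictionaries."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "//":  # end of one entry
            if current:
                records.append(current)
            current = {}
        elif line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "Course":
                # Multi-valued field: split on ';' and drop empty pieces.
                current[key] = [c.strip() for c in value.split(";") if c.strip()]
            else:
                current[key] = value
    if current:  # tolerate a missing final terminator
        records.append(current)
    return records


records = parse_flat_file(FLAT_FILE)
ballet_2001 = [r["First Name"] for r in records if "Ballet 2001" in r["Course"]]
print(ballet_2001)
```

Answering "who teaches Ballet 2001?" requires scanning every entry, which is the main practical limitation of the flat-file layout.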
Database: a « relational » example
Relational database ( « table file » ):

Teacher   Accession number   Education
Amos      1                  Biochemistry
Dan       2                  Genetics
John      3                  Scientology

Course                  Year         Involved teachers
Advanced Pottery        2000; 2001   1; 2
Ballet for Fat People   2001; 2002   2; 3
55
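The same data can be loaded into a real relational database with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the multi-valued "Year" and "Involved teachers" columns of the slide are split into a separate link table (`teaches`), which is a common normalization choice but not something the slide prescribes, and all table, column, and function names are illustrative.

```python
import sqlite3


def build_database():
    """Create an in-memory database holding the teacher/course example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE teacher (
            accession INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            education TEXT NOT NULL
        );
        CREATE TABLE course (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        -- One row per (course, teacher, year) combination.
        CREATE TABLE teaches (
            course_id         INTEGER NOT NULL REFERENCES course(id),
            teacher_accession INTEGER NOT NULL REFERENCES teacher(accession),
            year              INTEGER NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO teacher VALUES (?, ?, ?)",
        [(1, "Amos", "Biochemistry"), (2, "Dan", "Genetics"), (3, "John", "Scientology")],
    )
    conn.executemany(
        "INSERT INTO course VALUES (?, ?)",
        [(1, "Advanced Pottery"), (2, "Ballet for Fat People")],
    )
    conn.executemany(
        "INSERT INTO teaches VALUES (?, ?, ?)",
        [
            (1, 1, 2000), (1, 1, 2001), (1, 2, 2000), (1, 2, 2001),
            (2, 2, 2001), (2, 2, 2002), (2, 3, 2001), (2, 3, 2002),
        ],
    )
    return conn


def teachers_for_course(conn, title):
    """Return the names of all teachers involved in a course, sorted by name."""
    rows = conn.execute(
        """
        SELECT DISTINCT teacher.name
        FROM teacher
        JOIN teaches ON teaches.teacher_accession = teacher.accession
        JOIN course  ON course.id = teaches.course_id
        WHERE course.title = ?
        ORDER BY teacher.name
        """,
        (title,),
    ).fetchall()
    return [name for (name,) in rows]


conn = build_database()
print(teachers_for_course(conn, "Ballet for Fat People"))
```

Compared with the flat file, each fact is stored once and cross-table questions are answered with a join instead of a full scan in application code.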
Why biological databases?
• Exponential growth in biological data.
• Data (genomic sequences, 3D structures, 2D gel analysis, MS analysis, Microarrays…) are no longer published in a conventional manner, but directly submitted to databases.
• Essential tools for biological research. The only way to publish massive amounts of data without using all the paper in the world. 56
Distribution of sequences
• Books, articles 1968 -> 1985
• Computer tapes 1982 -> 1992
• Floppy disks 1984 -> 1990
• CD-ROM 1989 ->
• FTP 1989 ->
• On-line services 1982 -> 1994
• WWW 1993 ->
• DVD 2001 -> 57
Some statistics
• More than 1000 different ‘biological’ databases
• Variable size: <100 Kb to >20 Gb
– DNA: > 20 Gb
– Protein: 1 Gb
– 3D structure: 5 Gb
– Other: smaller
• Update frequency: daily to annually to seldom to forget about it.
• Usually accessible through the web (some free, some not) 58
Some databases in the field of molecular biology… AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GENATLAS, Genbank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NRSub, O-lycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPM, etc. ... !!!! 59
Categories of databases for Life Sciences
• Sequences (DNA, protein)
• Genomics
• Mutation/polymorphism
• Protein domain/family
• Proteomics (2D gel, Mass Spectrometry)
• 3D structure
• Metabolic networks
• Regulatory networks
• Bibliography
• Expression (Microarrays, …)
• Specialized 60
NCBI: http://www.ncbi.nlm.nih.gov
EBI: http://www.ebi.ac.uk/
DDBJ: http://www.ddbj.nig.ac.jp/ 61
Literature Databases:
Bookshelf: A collection of searchable biomedical books linked to PubMed.
PubMed: Allows searching by author names, journal titles, and a new Preview/Index option. The PubMed database provides access to over 12 million MEDLINE citations back to the mid-1960s. It includes History and Clipboard options which may enhance your search session.
PubMed Central: The U.S. National Library of Medicine digital archive of life science journal literature.
OMIM (http://www.ncbi.nlm.nih.gov/omim): Online Mendelian Inheritance in Man is a database of human genes and genetic disorders (also OMIA). 62
PubMed (MEDLINE)
• MEDLINE covers the fields of medicine, nursing, dentistry, veterinary medicine, public health, and preclinical sciences
• Contains citations from approximately 5,200 worldwide journals in 37 languages; 60 languages for older journals.
• Contains over 20 million citations since 1948
• Contains links to biological db and to some journals
• New records are added to PreMEDLINE daily!
• Alerting services – http://www.pubcrawler.ie/ – http://www.biomail.org 65
A search by subject: “mitochondrion evolution”
Type in a Query term
• Enter your search words in the query box and hit the “Go” button. http://www.ncbi.nlm.nih.gov/entrez/query/static/helpdoc.html#Searching 68
The Syntax …
1. Boolean operators: AND, OR, NOT must be entered in UPPERCASE (e.g., promoters OR response elements). The default is AND.
2. Entrez processes all Boolean operators in a left-to-right sequence. The order in which Entrez processes a search statement can be changed by enclosing individual concepts in parentheses. The terms inside the parentheses are processed first. For example, the search statement: g1p3 OR (response AND element AND promoter).
3. Quotation marks: The term inside the quotation marks is read as one phrase (e.g., “public health” is different from public health, which will also include articles on public latrines and their effect on health workers).
4. Asterisk: Extends the search to all terms that start with the letters before the asterisk. For example, dia* will include such terms as diaphragm, dial, and diameter. 69
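The effect of left-to-right processing and of parentheses (rule 2) can be seen by treating each search term as a set of matching article IDs. The sketch below is a toy model, not Entrez itself: the `INDEX` contents are hypothetical, and `evaluate_left_to_right` is an illustrative name.

```python
# Hypothetical index: search term -> set of matching article IDs.
INDEX = {
    "g1p3": {1, 2},
    "response": {2, 3, 4},
    "element": {3, 4, 5},
    "promoter": {4, 6},
}


def evaluate_left_to_right(tokens, index):
    """Evaluate [term, OP, term, OP, term, ...] strictly from left to right."""
    result = set(index[tokens[0]])
    for operator, term in zip(tokens[1::2], tokens[2::2]):
        hits = index[term]
        if operator == "AND":
            result &= hits
        elif operator == "OR":
            result |= hits
        elif operator == "NOT":
            result -= hits
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


# g1p3 OR response AND element AND promoter (no parentheses)
flat = evaluate_left_to_right(
    ["g1p3", "OR", "response", "AND", "element", "AND", "promoter"], INDEX
)

# g1p3 OR (response AND element AND promoter): the parenthesized part first
inner = evaluate_left_to_right(["response", "AND", "element", "AND", "promoter"], INDEX)
grouped = INDEX["g1p3"] | inner

print(sorted(flat), sorted(grouped))
```

Without parentheses the trailing ANDs also filter out the g1p3 hits, so the two queries return different article sets.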
Refine the Query • Often a search finds too many (or too few) sequences, so you can go back and try again with more (or fewer) keywords in your query • The “History” feature allows you to combine any of your past queries. • The “Limits” feature allows you to limit a query to specific organisms, sequences submitted during a specific period of time, etc. • [Many other features are designed to search for literature in MEDLINE] 70
Related Items You can search for a text term in sequence annotations or in MEDLINE abstracts, and find all articles, DNA, and protein sequences that mention that term. Then from any article or sequence, you can move to "related articles" or "related sequences". • Relationships between sequences are computed with BLAST • Relationships between articles are computed with "MESH" terms (shared keywords) • Relationships between DNA and protein sequences rely on accession numbers • Relationships between sequences and MEDLINE articles rely on both shared keywords and the mention of accession numbers in the articles. 71
A search by authors: “Esser” [au] AND “martin” [au]
A search by title word: “Wolbachia pipientis” [ti]
Database Search Strategies • General search principles - not limited to sequence (or to biology). • Start with broad keywords and narrow the search using more specific terms. • Try variants of spelling, numbers, etc. • Search many databases. • Be persistent!! 77
Searching PubMed
• Structureless searches
– Automatic term mapping
• Structured searches
– Tags, e.g. [au], [ta], [dp], [ti]
– Boolean operators, e.g. AND, OR, NOT, ()
• Additional features
– Subsets, limits
– Clipboard, history 78
Start working: Search PubMed
1. cuban cigars
2. cuban OR cigars
3. “cuban cigars”
4. cuba* cigar*
5. (cuba* cigar*) NOT smok*
6. Fidel Castro
7. “fidel castro”
8. #6 NOT #7 79
“Details” and “History” in PubMed 80
“Details” and “History” in PubMed 81
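Structured searches can also be assembled programmatically. The sketch below builds tagged query strings and a search URL for NCBI's E-utilities `esearch` endpoint; the endpoint and its `db`, `term`, and `retmax` parameters are an assumption brought in for illustration (the slides only describe the web interface), the helper names are hypothetical, and no network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# NCBI E-utilities search endpoint (assumed here; not mentioned in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tag(term, field):
    """Attach a field tag such as [au] or [ti]; quote multi-word terms."""
    if " " in term:
        return f'"{term}"[{field}]'
    return f"{term}[{field}]"


def build_pubmed_search_url(query, retmax=20):
    """Return an esearch URL for a PubMed query (the request is not sent)."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


author_query = f'{tag("Esser", "au")} AND {tag("martin", "au")}'
title_query = tag("Wolbachia pipientis", "ti")

url = build_pubmed_search_url(author_query)
print(url)
# Round-trip check: the encoded URL still carries the original query string.
print(parse_qs(urlparse(url).query)["term"][0])
```

Keeping the query construction separate from the URL encoding makes it easy to reuse the same tagged query in the web search box.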
The OMIM (Online Mendelian Inheritance in Man) – Genes and genetic disorders – Edited by team at Johns Hopkins – Updated daily 82
MIM Number Prefixes
* gene with known sequence
+ gene with known sequence and phenotype
# phenotype description, molecular basis known
% mendelian phenotype or locus, molecular basis unknown
no prefix: other, mainly phenotypes with suspected mendelian basis 83
Searching OMIM
• Search Fields
– Name of trait, e.g., hypertension
– Cytogenetic location, e.g., 1p31.6
– Inheritance, e.g., autosomal dominant
– Gene, e.g., coagulation factor VIII 84
OMIM search tags
All Fields          [ALL]
Allelic Variant     [AV] or [VAR]
Chromosome          [CH] or [CHR]
Clinical Synopsis   [CS] or [CLIN]
Gene Map            [GM] or [MAP]
Gene Name           [GN] or [GENE]
Reference           [RE] or [REF]
85
Start working: Search OMIM How many types of hemophilia are there? For how many is the affected gene known? What are the genes involved in hemophilia A? What are the mutations in hemophilia A? 87
Online Literature databases 1. How to use the UH online Library? 2. Online glossaries 3. Google Scholar 4. Google Books 5. Web of Science 88
How to use the online UH Library? http://info.lib.uh.edu/ 89
Online Glossaries
Bioinformatics: http://www.geocities.com/bioinformaticsweb/glossary.html, http://big.mcw.edu/
Genomics: http://www.geocities.com/bioinformaticsweb/genomicglossary.html
Molecular Evolution: http://workshop.molecularevolution.org/resources/glossary/
Biology dictionary: http://www.biology-online.org/dictionary/satellite_cells
Other glossaries, e.g., the list of phobias: http://www.phobialist.com/class.html 91
3. Google Scholar http://www.scholar.google.com/ 92
What is Google Scholar? Enables you to search specifically for scholarly literature, including peer-reviewed papers, theses, books, preprints, abstracts and technical reports from all broad areas of research. 93
Use Google Scholar to find articles from a wide variety of academic publishers, professional societies, preprint repositories and universities, as well as scholarly articles available across the web. 94
Google Scholar orders your search results by how relevant they are to your query, so the most useful references should appear at the top of the page. This relevance ranking takes into account the full text of each article, the article's author, the publication in which the article appeared, and how often it has been cited in scholarly literature. 95
What other DATA can we retrieve from the record? 96
4. Google Book Search 99
Start working: Search Google Books How many times is the tail of the giraffe mentioned in On the Origin of Species by Mr. Darwin? 101
5. Web of Science http://apps.webofknowledge.com.ezproxy.lib.uh.edu/WOS_GeneralSearch_input.do?product=WOS&search_mode=GeneralSearch&SID=4FB7LbbLgDMhG9fDiLh&preferencesSaved= 102