  • Number of slides: 118

Making GO Annotations For Fungal Genomes: A brief overview

Outline of Topics
• Intro
• Overview of Overall Annotation Pipeline
• Introduction to the Gene Ontologies (GO)
• Making GO Annotations
• Submitting GO Annotations
• GO Tool - AmiGO

Intro & Overview of Overall Sequence and Annotation Pipeline
Karen Christie
Saccharomyces Genome Database
Stanford University

Total Nucleotides at GenBank/EMBL/DDBJ, including Whole Genome Shotgun
• Total, Dec 2006: 1.52E+11
• Growth in 2006: 3.50 x 10^10 nucleotides (23.2% of total)
• Chart labels: NCBI created by Congress; EBI created at Hinxton; WGS section started
• Genomes marked on the chart: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Haemophilus influenzae

Fungal Genomes being sequenced at Broad Institute

Fungal Genomes being sequenced by JGI

Published Literature
• PubMed: over 15 million citations
• Basic search: secondary metabolism → 109580
• Limit search: secondary metabolism (published in the last 1 year) → 5440
• Boolean operators: secondary metabolism AND Aspergillus → 479
Numbers as of 3/21/2007

Gene Ontology Objectives
• GO represents categories used to classify specific parts of our biological knowledge:
  - Biological Process
  - Molecular Function
  - Cellular Component
• GO develops a common language applicable to any organism
• GO terms can be used to annotate gene products from any species, allowing comparison of information across species

My genome is sequenced!
ATGTCTTTTTTAAGTGCATCGATGTCCTGGGGGCTTAGTATAATGCTCCCCGAG CTTCCTAG CGCTTAGTGCATTAGACTAGGGCCAAAATGACTACTGTTCTTAAAGTACTAGTA CTTACTAC GCCCTGTTTCTTCTTCTAAAAGACTAAGTGCTAGTCTAGATCTACTA TTACTAC CCTACTATACTAGACTAATTACCAACCCCTAGGGTACTAAATTTGCCTAGT TTACGTA GCGTTCTTAAAACGTACTAGATTACCGTACTAGGGACGTACTAAGGTACTAG…
What do I do now?

Overview of Sequencing/Annotation Pipeline
• Sequence of genes/genome: ATGCTTCCTGATTTTGCCCTGGACTTCGCTTGTATAAATTCATTGCACC …
• Primary Annotation - the location and structure of genes
• Secondary Annotation - the functions of the genes
  - Example labels on the slide: alcohol dehydrogenase; GO process: terrequinone A biosynthesis; GO function: methyltransferase activity; Enzyme Commission: 2.1.1.-

Who will be annotating?
• Just you?
• A single group?
• A consortium of groups?
The number of people and groups participating, and the funding, will affect decisions such as whether to set up a database or use flat files.

1. Do you (or your group) have gene calls for your sequence?
   - no: Make automated or manual gene calls (TIGR's Eukaryotic Annotation course very useful)
   - yes: continue
2. Are the protein predictions submitted to GenBank/DDBJ/EMBL?
   - no: Submit gene/protein calls to GenBank/DDBJ/EMBL
   - yes: continue
3. UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL; GOA will make GO annotations (IEA) using automated methods
4. Resources to make functional annotations?
   - no: GOA will collect all GO annotations and submit them to GOC; GOA will maintain annotation file
   - yes:
     - Contact GO Consortium for advice, training, help with coordination, etc.
     - Decide who will collate all GO annotations into one file
     - Set up pipeline for any automated annotations not being done by GOA
     - Manual GO annotations from literature, or from sequence similarity methods
     - You (or your group) collect all GO annotations and submit them to GOC
     - You (or your group) maintain annotation file

Automated Eukaryotic Gene Annotation
• Inputs: Genome Sequence, EST Database
• Repeat masker → Repeat masked sequence
• Develop a training set
• Gene finders: Twinscan, GeneZilla, glimmerHMM, Augustus, Fgenesh, etc. → Gene predictions
• Database comparisons and genome alignments: AAT_aa, AAT_na, tRNAScan, GMAP, Sim4, etc.
• Combined consensus prediction
• EST based refinement (adjust exons, UTRs, alternative splicing)
• Automated Gene Annotation
Based on TIGR course

Manual Gene Annotation?
1st Question - Is it in the budget?
Manual annotation can be a lot better than automated, but is a lot more expensive and time consuming!
Based on TIGR Eukaryotic Annotation course

Manual Gene Annotation Tools
• Viewer only
  - Gbrowse
• Editors
  - Apollo (requires a database)
  - Manatee (requires a database)
  - Artemis (runs on flat files)
Based on TIGR Eukaryotic Annotation course

Eukaryotic Gene Annotation
At the end of the procedure, you’ll have:
• Gene calls
• Protein predictions
• Unique IDs for your genes
This last is important. Gene IDs are unambiguous. Gene names are frequently ambiguous. You’ll also need IDs in order to submit GO annotations.
Example:
• Gene Name: SP1 → 19242 hits in Entrez nucleotide
• Gene ID: NM_138473 → 1 hit

Ready to make Functional Annotations!
• Questions
  - What’s your budget?
  - How much literature is available?
• Automated annotations
  - Faster, cheaper
  - Often less specific
• Manual annotations
  - Time consuming & more expensive
  - Precise and accurate

1. Do you (or your group) have gene calls for your sequence?
   - no: Make automated or manual gene calls (TIGR's Eukaryotic Annotation course very useful)
   - yes: continue
2. Are the protein predictions submitted to GenBank/DDBJ/EMBL?
   - no: Submit gene/protein calls to GenBank/DDBJ/EMBL
   - yes: continue
3. UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL; GOA will make GO annotations (IEA) using automated methods
4. Resources to make functional annotations?
   - yes:
     - Contact GO Consortium for advice, training, help with coordination, etc.
     - Decide who will collate all GO annotations into one file
     - Set up pipeline for any automated annotations not being done by GOA
     - Manual GO annotations from literature, or from sequence similarity methods

Introduction to GO
Rama Balakrishnan
Saccharomyces Genome Database
Stanford University, CA

The Gene Ontologies
A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!

http://www.geneontology.org/

What’s in a name?
• What is a cell?

Cell
[Image slides; image from http://microscopy.fsu.edu]

What’s in a name?
• The same name can be used to describe different concepts

What’s in a name?
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis
All refer to the process of making glucose from simpler components

What’s in a name?
• The same name can be used to describe different concepts
• A concept can be described using different names
Comparison is difficult, in particular across species or across databases

What’s in a name?
• Rad54 (S. cerevisiae)
• Okra (D. melanogaster)
• Rhp54 (S. pombe)
What do these gene products have in common? Each is an ATP-dependent helicase involved in DNA recombination and repair

What is the Gene Ontology?
A (part of the) solution:
- A controlled vocabulary that can be applied to all organisms
- Used to describe gene products - proteins and RNA - in any organism

What is Ontology?
(Timeline labels on the slide: 1606, 1700s)
• Dictionary: A branch of metaphysics concerned with the nature and relations of being.
• Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.

So what does that mean?
From a practical view, ontology is the representation of something we know about. "Ontologies" consist of a representation of things that are detectable or directly observable, and the relationships between those things.

Ontology Includes:
1. A vocabulary of terms (names for concepts)
2. Definitions
3. Defined logical relationships to each other

How does GO work?
What information might we want to capture about a gene product?
• What does the gene product do?
  - Molecular Function
• Why does it perform these activities?
  - Process
• Where does it act?
  - Location in the cell, cellular component

The 3 Gene Ontologies
• Molecular Function = elemental activity/task
  - the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
• Biological Process = biological goal or objective
  - broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
• Cellular Component = location or complex
  - subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Cellular Component: where a gene product acts

Molecular Function: activities or “jobs” of a gene product
• insulin binding
• insulin receptor activity
• glucose-6-phosphate isomerase activity
• drug transporter activity

Molecular Function
• A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product.
• Sets of functions make up a biological process.

Biological Process
• transcription
• cell division
• limb development
• courtship behavior

Example: Gene Product = hammer
Function (what)             Process (why)
Drive nail (into wood)      Carpentry
Drive stake (into soil)     Gardening
Smash roach                 Pest Control
Clown’s juggling object     Entertainment

What’s in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
synonym: glucose biosynthesis
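The fields on this slide map naturally onto a small record type. The following is a minimal sketch, not the GO project's own data model: the `GOTerm` class, its `matches` method, and `find_term` are hypothetical names introduced here, and only the one synonym shown on the slide is included.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GOTerm:
    """One ontology term: a stable ID plus human-readable fields."""
    go_id: str
    name: str
    definition: str
    synonyms: tuple = field(default_factory=tuple)

    def matches(self, label: str) -> bool:
        """True if label equals the term name or a synonym (case-insensitive)."""
        wanted = label.strip().lower()
        return wanted == self.name.lower() or wanted in (s.lower() for s in self.synonyms)


def find_term(label, terms):
    """Return the first term whose name or synonym matches label, else None."""
    for term in terms:
        if term.matches(label):
            return term
    return None


# Values copied from the slide.
gluconeogenesis = GOTerm(
    go_id="GO:0006094",
    name="gluconeogenesis",
    definition=("The formation of glucose from noncarbohydrate precursors, "
                "such as pyruvate, amino acids and glycerol."),
    synonyms=("glucose biosynthesis",),
)

hit = find_term("Glucose biosynthesis", [gluconeogenesis])
print(hit.go_id if hit else "no match")
```

Different names for the same concept resolve to one stable ID, which is the point of the controlled vocabulary.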

No GO Areas
• GO covers ‘normal’ functions and processes
  - No pathological processes
  - No experimental conditions
• NO evolutionary relationships
• NO gene products
• NOT a system of nomenclature for genes

Ontology Structure
• The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)
• Terms can have more than one parent and zero, one or more children
• Terms are linked by two relationships
  - is-a
  - part-of

Parent-Child Relationships
A child is a subset or instance of a parent’s elements
Terms shown: Chromosome; Cytoplasmic chromosome; Mitochondrial chromosome; Nuclear chromosome; Plastid chromosome

Parent-Child Relationships
• One-to-many parental relationship: each child has only one parent
• Many-to-many parental relationship (DAG: Directed Acyclic Graph): each child may have one or more parents

A Sample DAG
Terms shown: cellular_component; cell part; Intracellular organelle; chromosome; [Other types of chromosomes]; mitochondrial chromosome; nucleus; [other organelles]; nuclear chromosome
Relationship legend: is_a, part_of

True Path Rule
• The path from a child term all the way up to its top-level parent(s) must always be true
Terms shown: cell; cytoplasm; chromosome; nuclear chromosome; cytoplasmic chromosome; mitochondrial chromosome; nucleus
Relationship legend: is-a, part-of
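To make the rule concrete, the sketch below walks every path from a term up to the root in a toy graph. The edges in `PARENTS` are an illustrative assumption loosely based on the terms shown on the slides, not a copy of the real ontology, and the function names are hypothetical.

```python
# child -> list of (relationship, parent). Toy edges, assumed for illustration.
PARENTS = {
    "nuclear chromosome": [("is_a", "chromosome"), ("part_of", "nucleus")],
    "mitochondrial chromosome": [("is_a", "cytoplasmic chromosome")],
    "cytoplasmic chromosome": [("is_a", "chromosome"), ("part_of", "cytoplasm")],
    "chromosome": [("part_of", "cell")],
    "nucleus": [("part_of", "cell")],
    "cytoplasm": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _rel, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def paths_to_root(term, parents=PARENTS):
    """Return every path from term up to a root, as lists of term names."""
    links = parents.get(term, [])
    if not links:
        return [[term]]
    return [[term] + rest
            for _rel, parent in links
            for rest in paths_to_root(parent, parents)]


for path in paths_to_root("nuclear chromosome"):
    print(" -> ".join(path))
```

Under the True Path Rule, a gene product annotated to "nuclear chromosome" must make sense at every term on every printed path; if one path is false for the gene product, either a different term is needed or the ontology structure should be revisited.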

Ensuring Stability in a Dynamic Ontology
• Terms become obsolete when they are removed or redefined
• GO IDs are never deleted from the ontologies
• For every obsoleted term, a comment is added to explain why the term is now obsolete
Branches shown: Biological Process; Molecular Function; Cellular Component; Obsolete Biological Process; Obsolete Molecular Function; Obsolete Cellular Component

Why modify the GO?
• GO reflects current knowledge of biology
• Biology drives changes to the ontologies

Obsolete terms
term: MAPKKK cascade (mating sensu Saccharomyces)
goid: GO:0007244
definition (before): MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces.
definition (after): OBSOLETE. MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces.
definition_reference: PMID:9561267
comment: This term was made obsolete because it is a gene product specific term. To update annotations, use the biological process term 'signal transduction during conjugation with cellular fusion ; GO:0000750'.
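Because obsolete IDs stay in the ontology with a comment pointing to a suggested term, existing annotations can be screened for them. A minimal sketch, assuming annotations are held as dictionaries with a `go_id` key; the gene names and the `flag_obsolete` helper are hypothetical, and the single mapping entry comes from the comment on the slide.

```python
# Obsolete ID -> term suggested in its comment (taken from the slide).
OBSOLETE = {"GO:0007244": "GO:0000750"}


def flag_obsolete(annotations, obsolete=OBSOLETE):
    """Return (annotation, suggested_id) pairs for annotations using obsolete IDs.

    Nothing is rewritten automatically: a curator should confirm that the
    suggested term is still true for the gene product.
    """
    return [(a, obsolete[a["go_id"]]) for a in annotations if a["go_id"] in obsolete]


# Hypothetical annotations for illustration only.
annotations = [
    {"gene": "geneX", "go_id": "GO:0007244"},
    {"gene": "geneY", "go_id": "GO:0005813"},
]

for annotation, suggestion in flag_obsolete(annotations):
    print(f"{annotation['gene']}: {annotation['go_id']} is obsolete; consider {suggestion}")
```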

What can scientists do with GO?
• Access gene product functional information
• Do cross species comparison
• Find how much of a proteome is involved in a process/function/component
• Provide a link between biological knowledge and …
  - gene expression profiles
  - proteomics data
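The "how much of a proteome" question can be answered by counting gene products annotated to a term or to any of its descendants. The sketch below uses a toy hierarchy and made-up gene names; both are assumptions for illustration and do not reflect the real ontology structure or real annotation data.

```python
# child -> parents. Toy hierarchy, assumed for illustration.
PARENTS = {
    "mitosis": ["cell cycle"],
    "cell cycle": ["biological_process"],
    "transcription": ["biological_process"],
    "biological_process": [],
}

# gene product -> directly annotated terms (hypothetical data).
ANNOTATIONS = {
    "geneA": {"mitosis"},
    "geneB": {"transcription"},
    "geneC": {"cell cycle"},
    "geneD": set(),
}


def term_and_ancestors(term, parents):
    """Return the term itself plus everything above it in the graph."""
    found = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def fraction_involved(target, annotations, parents):
    """Fraction of gene products annotated to target or any of its descendants."""
    if not annotations:
        return 0.0
    hits = sum(
        1 for terms in annotations.values()
        if any(target in term_and_ancestors(t, parents) for t in terms)
    )
    return hits / len(annotations)


print(fraction_involved("cell cycle", ANNOTATIONS, PARENTS))
```

Counting through ancestors relies on the True Path Rule: an annotation to a child term implies the parent terms as well.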

Using GO to Aid Microarray Analysis
Microarray analysis; whole genome analysis (J. D. Munkvold et al., 2004)

Beyond GO - Open Biomedical Ontologies
Orthogonal to existing ontologies to facilitate combinatorial approaches
- Share unique identifier space
- Include definitions
• Anatomies
• Cell Types
• Sequence Attributes (SO)
• Temporal Attributes
• Phenotypes
• Diseases
• More…
http://obo.sourceforge.net

GO Annotations: What are they and how are they made?
Maria Costanzo
Saccharomyces and Candida Genome Databases
Stanford University

Let’s Get Started!
• What is an annotation?
• Annotation approaches
• Strategies for identifying literature to annotate
• Strategies for reading a paper for annotation
• Strategies for annotating a gene and a genome

What is a GO annotation?
• An annotation is a piece of information associated with a gene product
• A gene product is usually a protein but can be a functional RNA
• A GO annotation is a Gene Ontology term associated with a gene product

Anatomy of a GO annotation
• Gene Product
• GO Term
• Reference
• Evidence Code: IMP, IGI, IPI, ISS, IDA, IEP, TAS, ND, RCA, IC, IEA

Evidence Codes for GO Annotations
IMP   inferred from mutant phenotype
IGI   inferred from genetic interaction
IPI   inferred from physical interaction
ISS   inferred from sequence similarity
IDA   inferred from direct assay
IEP   inferred from expression pattern
IC    inferred by curator
TAS   traceable author statement
NAS   non-traceable author statement
ND    no biological data available
IEA   inferred from electronic annotation
http://www.geneontology.org/doc/GO.evidence.html
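A lookup table makes it easy to validate codes when annotations are entered. This is a minimal sketch: the descriptions are those listed on the slide, while the `EXPERIMENTAL` grouping and the `describe` helper are assumptions added here for illustration.

```python
# Codes and descriptions as listed on the slide.
EVIDENCE_CODES = {
    "IMP": "inferred from mutant phenotype",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence similarity",
    "IDA": "inferred from direct assay",
    "IEP": "inferred from expression pattern",
    "IC": "inferred by curator",
    "TAS": "traceable author statement",
    "NAS": "non-traceable author statement",
    "ND": "no biological data available",
    "IEA": "inferred from electronic annotation",
}

# Assumed grouping: codes that rest directly on an experiment.
EXPERIMENTAL = frozenset({"IDA", "IMP", "IGI", "IPI", "IEP"})


def describe(code):
    """Return a readable description, or raise ValueError for unknown codes."""
    code = code.strip().upper()
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code!r}")
    kind = "experimental" if code in EXPERIMENTAL else "non-experimental"
    return f"{code}: {EVIDENCE_CODES[code]} ({kind})"


print(describe("ida"))
```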

Additional annotation information
• WITH/FROM: supporting info for the evidence code
  - IPI, IGI, ISS, IEA, IC
  - Contains the interacting or similar gene product
• QUALIFIER: describes the GO term
  - NOT
  - contributes to (used with Molecular Function terms)
  - colocalizes with (used with Cellular Component terms)
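These rules lend themselves to a simple consistency check. The sketch below is an assumption-laden illustration rather than the Consortium's own validation: it treats a missing WITH/FROM as a warning for the five listed codes, writes the qualifiers with underscores, and uses single-letter aspect codes (P, F, C) for process, function and component.

```python
# Evidence codes the slide associates with a WITH/FROM value.
WITH_FROM_CODES = frozenset({"IPI", "IGI", "ISS", "IEA", "IC"})

# Qualifier -> aspect it is restricted to (None means any aspect).
QUALIFIER_ASPECT = {"NOT": None, "contributes_to": "F", "colocalizes_with": "C"}


def check_annotation(evidence, with_from="", qualifier="", aspect=""):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if evidence in WITH_FROM_CODES and not with_from:
        problems.append(f"{evidence} usually carries a WITH/FROM value")
    if with_from and evidence not in WITH_FROM_CODES:
        problems.append(f"WITH/FROM is not expected with {evidence}")
    if qualifier:
        if qualifier not in QUALIFIER_ASPECT:
            problems.append(f"unknown qualifier: {qualifier}")
        else:
            required = QUALIFIER_ASPECT[qualifier]
            if required and aspect != required:
                problems.append(f"{qualifier} is used with aspect {required}, not {aspect or 'unset'}")
    return problems


print(check_annotation("IPI"))
print(check_annotation("IDA", qualifier="contributes_to", aspect="C"))
```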

Approaches for annotation of a genome
1. Automated/Electronic approaches
2. Manual approaches
3. Combinatorial approach

Electronic Annotation
• Generate annotations relatively quickly & cheaply
• Annotation derived without human validation
  - Sequence similarity, e.g. BLAST search ‘hits’, HMMs, etc.
  - Mapping file, e.g. interpro2go, ec2go, etc.
• Useful for:
  - genomes that don’t have extensive literature
  - groups with limited curatorial resources
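The mapping-file approach can be sketched as follows. This assumes mapping lines of the form `ExternalDB:ID description > GO:term name ; GO:ID` with comment lines starting with `!`; check the actual file you download before relying on that layout. The InterPro entry, GO ID and gene name below are deliberately fake placeholders, and the helper names are hypothetical.

```python
def parse_mapping_line(line):
    """Parse one mapping line into (external_id, go_id, go_name), or None."""
    line = line.strip()
    if not line or line.startswith("!"):
        return None  # blank line or comment
    left, _, right = line.partition(" > ")
    if not right:
        return None  # no GO side present
    external_id = left.split(" ", 1)[0]
    go_name, _, go_id = right.rpartition(" ; ")
    if go_name.startswith("GO:"):
        go_name = go_name[3:]
    return external_id, go_id.strip(), go_name.strip()


def electronic_annotations(domain_hits, mapping_lines):
    """Turn per-gene domain hits into (gene, go_id, 'IEA') tuples."""
    mapping = {}
    for line in mapping_lines:
        parsed = parse_mapping_line(line)
        if parsed:
            external_id, go_id, _name = parsed
            mapping.setdefault(external_id, []).append(go_id)
    return [
        (gene, go_id, "IEA")
        for gene, hits in domain_hits.items()
        for hit in hits
        for go_id in mapping.get(hit, [])
    ]


# Placeholder data, not real identifiers.
lines = [
    "! hypothetical mapping file",
    "InterPro:IPR999999 Hypothetical domain > GO:hypothetical activity ; GO:9999999",
]
print(electronic_annotations({"gene1": ["InterPro:IPR999999"]}, lines))
```

Every annotation produced this way carries the IEA evidence code, since no human has reviewed it.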

Electronic Annotation
• Often based on sequence similarity
• Document the method used in an abstract
  - unpublished abstract in your own database
  - unpublished abstract submitted to GO references collection
• Annotation is not reviewed by a human
• IEA evidence code

Combinatorial Approach, e.g. using sequence similarity
1. Alignments published in literature
2. Analysis using full length protein
3. Analysis using protein domains

Example IEA Annotations from dictyBase

Example unpublished reference

Manual annotation
• Created by scientific curators
• Time intensive
• Utilizes
  - published literature
  - sequence comparison data
• Aided by curation tools
  - Manatee (open source from TIGR)
  - Apollo (open source from GMOD)
  - Artemis (open source)

Literature Source
1. PubMed - National Library of Medicine, National Institutes of Health - http://ncbi.nlm.nih.gov
2. Agricola - United States Department of Agriculture, National Agricultural Library - http://agricola.nal.usda.gov
3. Embase - Elsevier - http://www.embase.com
4. Biosis - Thomson - http://www.biosis.org
5. Unpublished (e.g. for internal sequence analysis methods)
   - abstract in your own database
   - unpublished abstract submitted to GO references collection

Example Annotation
• Gene Product: nek2
• GO Term: centrosome (GO:0005813)
• Evidence Code: IDA (Inferred from Direct Assay)
• Reference: PMID:11956323

What to Search For in Published Literature?
1. Species name
2. Gene/gene product names: daf-12, spo11, Sonic hedgehog
3. Process AND species: embryonic development AND elegans
4. Function AND species: transcription factor AND mays
5. Cellular component AND species (genus): plasma membrane AND Drosophila
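Searches like these can be scripted. The sketch below only builds query URLs and makes no network request; it assumes NCBI's E-utilities `esearch` endpoint (not mentioned in the slides) with its `db`, `term`, `reldate` and `datetype` parameters, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumption; the slides use the web interface).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def boolean_query(*terms, operator="AND"):
    """Join search phrases with a Boolean operator, e.g. 'a AND b'."""
    return f" {operator} ".join(t.strip() for t in terms if t.strip())


def esearch_url(term, last_days=None):
    """Build a PubMed search URL, optionally limited to recent publication dates."""
    params = {"db": "pubmed", "term": term}
    if last_days is not None:
        params["reldate"] = str(last_days)  # look back this many days
        params["datetype"] = "pdat"         # by publication date
    return f"{ESEARCH}?{urlencode(params)}"


query = boolean_query("embryonic development", "elegans")
print(esearch_url(query))
print(esearch_url("secondary metabolism", last_days=365))
```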

GO Annotation: GMOD Tools for Enhancing Information Retrieval
GMOD - Generic Software Components for Model Organism Databases - http://www.gmod.org/home
Literature search tools:
• PubSearch - http://www.gmod.org/?q=node/44
• PubFetch - http://www.gmod.org/?q=node/84
• Textpresso - http://www.textpresso.org
  - full text of articles
  - semantic categories

GO Annotation: Strategies for Identifying Literature for Curation
1. Primary research literature with new experimental data
   - Mutant phenotypes - process
   - Activity assays - function
   - Localization studies - component
2. Computational analyses
   - Phylogenetic analysis - function (ISS)
   - Domain analysis
3. Review articles
   - Summarize and cite primary literature (TAS)

Which parts of the paper are most important?
• Experimental Results
• Results: Figures, Tables, Text
• Materials and Methods
• Introductory information
• Abstract
• Explanatory text (use with caution)
• (Introduction) - mostly TAS information
• (Discussion)

Reading papers as a curator, rather than as a bench scientist
• Don’t be swayed by the speculations or theories that may appear in the Discussion.
• Focus on the actual results vs. the possible, but not proven, implications of those results.
• Read for details and contact authors if key identifiers are missing.

How to find a GO term to use?
• Web based tools
  - AmiGO browser (http://www.godatabase.org)
  - QuickGO (http://www.ebi.ac.uk/ego/)
• Downloadable tool (https://sourceforge.net/projects/geneontology/)
  - OBO-Edit (must also download the ontology file)

Extracting Information from a paper
Sample text from PMID:12374299
In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…

Example Manual Annotations from SGD

Annotation from published literature
1. Focus on known genes
2. Identify literature relevant to that gene
   a. using gene names, species name
3. Complete annotation set for a gene
   a. annotate available experimental data
   b. annotations to root nodes indicate nothing is known
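Step 3 can be sketched as a small helper that fills in root-node annotations for any ontology with no data. The root IDs are those of the three GO ontologies; pairing them with the ND evidence code, the single-letter aspect codes, and the dictionary layout are assumptions made for this illustration.

```python
# Root term of each ontology, keyed by an assumed one-letter aspect code.
ROOT_TERMS = {
    "P": ("GO:0008150", "biological_process"),
    "F": ("GO:0003674", "molecular_function"),
    "C": ("GO:0005575", "cellular_component"),
}


def complete_annotation_set(gene, annotations):
    """Return annotations plus a root-node ND annotation for each missing aspect."""
    covered = {a["aspect"] for a in annotations}
    completed = list(annotations)
    for aspect, (go_id, _name) in ROOT_TERMS.items():
        if aspect not in covered:
            completed.append(
                {"gene": gene, "aspect": aspect, "go_id": go_id, "evidence": "ND"}
            )
    return completed


# The nek2 centrosome annotation from the example slide; nothing known for P or F.
known = [{"gene": "nek2", "aspect": "C", "go_id": "GO:0005813", "evidence": "IDA"}]
for annotation in complete_annotation_set("nek2", known):
    print(annotation)
```

A root-node annotation records that the literature was searched and nothing was found, which is different from a gene that has not been looked at yet.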

Annotating genes to GO terms and the True Path Rule
Terms shown: cellular_component; cell part; Intracellular organelle; chromosome; [Other types of chromosomes]; mitochondrial chromosome; nucleus; [other organelles]; nuclear chromosome
Relationship legend: is_a, part_of
Genes shown: RAD51, RAD52

The True Path Rule Applied to Annotations
Are all paths to the root true for my gene product?
• Yes? Great, annotate
• No?
  - Is there a term I can use where all paths will be true?
  - Does the ontology structure need to be changed?

I don’t see terms in the ontology to describe the biology of my species
• SourceForge (SF) tracker for term related issues: https://sourceforge.net/projects/geneontology/
• Send an email to the GO mailing list
• Content meetings
  - Organized by the consortium if the ontology related issues can’t be resolved over email/SF
  - Look for announcements on the GO website, mailing lists

1. Develop GO terms for functions, processes and structures used by microbes in their associations with plants and animals
   • fungi, oomycetes, bacteria, nematodes
   • 472 terms recently added to GO
2. Create reference genomes by manual annotation of selected microbe genomes
   • in progress
3. Training workshops
   • July 26, 2007. IS-MPMI Workshop, Sorrento, Italy
   • August 8-10, 2007. Virginia Bioinformatics Institute - travel funds available for students and postdocs
http://pamgo.vbi.vt.edu

GO Terms needed: Secondary Metabolism
• The fungal community is going to need to add new terms:
  - secondary metabolism pathways
  - possibly other areas
• Fungal species so far annotated have not had secondary metabolism pathways, so no terms have been created to represent these areas
• The GO Consortium will be very happy to work with the fungal community to create the needed terms

Contributing GO Annotations
Karen Christie
Saccharomyces Genome Database
Stanford University

1. Do you (or your group) have gene calls for your sequence?
   - no: Make automated or manual gene calls (TIGR's Eukaryotic Annotation course very useful)
   - yes: continue
2. Are the gene/protein predictions submitted to GenBank/DDBJ/EMBL?
   - no: Submit gene/protein calls to GenBank/DDBJ/EMBL
   - yes: continue
3. UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL; GOA will make GO annotations (IEA) using automated methods
4. Resources to make functional annotations?
   - no: GOA will collect all GO annotations and submit them to GOC; GOA will maintain annotation file
   - yes:
     - Contact GO Consortium for advice, training, help with coordination, etc.
     - Decide who will collate all GO annotations into one file
     - Set up pipeline for any automated annotations not being done by GOA
     - Manual GO annotations from literature, or from sequence similarity methods
     - You (or your group) collect all GO annotations and submit them to GOC
     - You (or your group) maintain annotation file

1. Do you (or your group) have gene calls for your sequence?
   - no: Make automated or manual gene calls
   - yes: continue
2. Are the protein predictions submitted to GenBank/DDBJ/EMBL?
   - no: Submit gene/protein calls to GenBank/DDBJ/EMBL
   - yes: continue
3. UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL; GOA will make GO annotations (IEA) using automated methods
4. Resources to make functional annotations?
   - no: GOA will collect all GO annotations and submit them to GOC; GOA will maintain annotation file

1. Do you (or your group) have gene calls for your sequence?
   - no: Make automated or manual gene calls
   - yes: continue
2. Are the protein predictions submitted to GenBank/DDBJ/EMBL?
   - no: Submit gene/protein calls to GenBank/DDBJ/EMBL
   - yes: continue
3. UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL; GOA will make GO annotations (IEA) using automated methods
4. Resources to make functional annotations?
   - yes:
     - Contact GO Consortium for advice, training, help with coordination, etc.
     - Decide who will collate all GO annotations into one file
     - Set up pipeline for any automated annotations not being done by GOA
     - Manual GO annotations from literature, or from sequence similarity methods
     - You (or your group) collect all GO annotations and submit them to GOC
     - You (or your group) maintain annotation file

I have my annotations, what next?
gene_association file - format info at http://www.geneontology.org/GO.annotation.shtml#file
• DB: source of the ID in column 2. Examples: SGD, MGI, UniProt
• ID for the gene or gene_product. Examples: FBgn0015331, MGI:99240, SPAC9.03c
• Symbol, like Brr2 or DDX21_HUMAN, that means something to a biologist, not an ID
• Object_Type: gene, transcript, protein_structure, or complex; should match the ID
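A gene_association file is tab-delimited, one annotation per line. The sketch below writes and reads one row, assuming the 15-column layout of the gene_association format as generally documented at the time; the slide itself names only some of these columns, so check the format page before submitting. The database name, object ID, taxon, date and assigning group are placeholders, not real values.

```python
# Assumed column order for a 15-column gene_association file.
COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence", "With_From", "Aspect", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon", "Date", "Assigned_By",
]


def format_row(annotation):
    """Serialize one annotation dict as a tab-delimited line; missing columns stay empty."""
    return "\t".join(annotation.get(column, "") for column in COLUMNS)


def parse_row(line):
    """Parse one tab-delimited line back into a dict keyed by column name."""
    return dict(zip(COLUMNS, line.rstrip("\n").split("\t")))


# The nek2 example from the slides; values marked placeholder are invented.
example = {
    "DB": "ExampleDB",            # placeholder
    "DB_Object_ID": "EX0000001",  # placeholder
    "DB_Object_Symbol": "nek2",
    "GO_ID": "GO:0005813",
    "DB_Reference": "PMID:11956323",
    "Evidence": "IDA",
    "Aspect": "C",
    "DB_Object_Type": "gene",
    "Taxon": "taxon:0",           # placeholder
    "Date": "20070321",           # placeholder
    "Assigned_By": "ExampleDB",   # placeholder
}

row = format_row(example)
print(row)
print(parse_row(row)["GO_ID"])
```

Optional columns such as Qualifier and With_From are written as empty strings, so every row keeps the same number of tab-separated fields.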

Sample gene-associations file
These columns may be empty

What tools/infrastructure do you need to record annotations?
• Excel spreadsheet (simple, easy, small scale)
OR
• Database
  - FileMaker Pro, Access (simple databases)
  - ORACLE, Sybase, or MySQL (relational databases)

How do I share my gene_associations file?
• Provide them to the larger community by submitting your annotations to the GO project
• What information should I submit to GO?
  - Gene_association file
  - Short file with info about submitting group
• Where should I submit the data?
  - Contact the GOC to establish a contact for your group
  - [email protected] org

Databases contributing annotations include:
- dictyBase (Dictyostelium discoideum)
- FlyBase (Drosophila melanogaster)
- GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
- UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
- Gramene (grains, including rice, Oryza)
- Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
- Rat Genome Database (RGD) (Rattus norvegicus)
- Reactome
- Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
- The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
- The Institute for Genomic Research (TIGR): databases on several bacterial species
- WormBase (Caenorhabditis elegans)
- Zebrafish Information Network (ZFIN) (Danio rerio)

Annotation Coverage by Genome

GO Current Annotations
http://www.geneontology.org/GO.current.annotations.shtml

GO Current Annotations: Filtered Files
http://www.geneontology.org/GO.current.annotations.shtml

GO Current Annotations: Unfiltered Files
http://www.geneontology.org/GO.current.annotations.shtml

GOA Proteome Species Specific Files
http://www.ebi.ac.uk/GOA/proteomes.html

Resources offered by the GO project
• Website (http://www.geneontology.org)
  - Lots of documentation
  - Tools, tutorials and software
• Mailing list ([email protected] org)
• Help email address ([email protected] org)
• GO project on SourceForge (https://sourceforge.net/projects/geneontology)
  - Submit suggestions, e.g. new ontology terms, etc.
  - Download tools, e.g. OBO-Edit
• AmiGO browser (http://amigo.geneontology.org)
• GO database

AmiGO Tutorial
Rama Balakrishnan
Saccharomyces Genome Database
Stanford University

What is AmiGO?
• Web application that allows you to:
  - browse the ontologies
  - view annotations from various species
  - compare sequences using BLAST (GOst)

AmiGO
http://amigo.geneontology.org

Basic Search

AmiGO Search Results: GO Terms

Term Details Page

Gene Product Details and Annotations

• Node has children, can be clicked to view children
• Node has been opened, can be clicked to close
• Leaf node or no children
• Is_a relationship
• Part_of relationship
• Pie chart summary of the numbers of gene products associated to any immediate descendants of this term in the tree

Annotations associated with a term
Annotation data are from the gene_associations file submitted by the annotating groups

AmiGO Advanced Search

Filters

BLAST
• Blast a protein sequence against all gene products that have a GO annotation
• Can be accessed from the AmiGO Home page (front page)

BLAST can also be accessed from the annotations section

AmiGO Help

Contact us
• We welcome your input
• Please send suggestions, bugs to us
• [email protected] org

Acknowledgements
The people of the GO Consortium: