- Number of slides: 127
A Field Guide to GenBank and NCBI's Molecular Biology Resources. August 30, 2005, University of Colorado Health Sciences Center. NCBI FieldGuide, National Center for Biotechnology Information
Topics
- About NCBI
- GenBank overview
- Primary vs. derivative databases
  - The Reference Sequence (RefSeq) project
- Entrez databases
- Genome resources
- Bookshelf
- -break-
- Entrez text searching
- BLAST sequence searching
- VAST structure searching
- An integrated example
The National Institutes of Health: Bethesda, MD
The National Center for Biotechnology Information
- Accepts submissions of primary data
- Develops tools to analyze these data
- Creates derivative databases based on the primary data
- Provides free search, link, and retrieval of these data, primarily through the Entrez system
NCBI WWW Users per Day
Number of Users Per Day (chart; years 1997 to 2003; annotation: Christmas & New Year)
Homepage - accessing the data: all[filter]
all[filter]: 1/11/2005, 3/15/2005, 8/15/2005
Entrez Nucleotide (# records)
- Primary data
  - GenBank / DDBJ / EMBL: 57.3 million (97.4%)
- Derivative data
  - RefSeq: 1.47 million (2.5%)
    - RefSeq reviewed: 60,000
  - PDB (structures): 5,973
- "Total": 59 million
GenBank: NCBI's Primary Sequence Database
- Release 149, August 2005
- Records: 47 x 10^6
- Nucleotides: 52 x 10^9
- 195 Gigabytes, 816 files
- Over 100 billion bases!
- Full release every two months
- Incremental and cumulative updates daily
- Available only through internet
- Release notes: gbrel.txt
- ftp://ftp.ncbi.nih.gov/genbank/
- ftp://genbank.sdsc.edu/pub
- ftp://bio-mirror.net/biomirror/genbank
What is GenBank?
- Nucleotide only sequence database
- Archival in nature
- GenBank data
  - Direct submissions (traditional records)
  - Batch submissions (EST, GSS, STS)
  - ftp accounts (genome data)
- Three collaborating databases
  - GenBank
  - DNA Database of Japan (DDBJ)
  - European Molecular Biology Laboratory (EMBL) Database
GenBank Divisions
- "Organismal" divisions
  - PRI: Primate
  - ROD: Rodent
  - PLN: Plant and Fungal
  - BCT: Bacterial/Archaeal
  - INV: Invertebrate
  - VRT: Other Vertebrate
  - VRL: Viral
  - MAM: Mammalian
  - PHG: Phage
  - SYN: Synthetic
  - UNA: Unannotated
  - Counts as listed, in order: (28) (15) (13) (11) (7) (4) (2) (1) (1)
  - Organized by taxonomy (sort of)
  - Direct submissions (Sequin/BankIt)
  - Accurate (~1 error per 10,000 bp)
  - Well characterized
- "Functional" divisions
  - EST (377): Expressed Sequence Tag
  - GSS (138): Genome Survey Sequence
  - HTG (63): High Throughput Genomic
  - PAT (17): Patent
  - STS (9): Sequence Tagged Site
  - CON (1): Contigs, virtual
  - Organized by sequence type
  - Batch submissions (ftp/email)
  - Inaccurate
  - Poorly characterized
GenBank Functional (Bulk) Divisions
- EST, Expressed Sequence Tag
  - 1st pass single read cDNA
- GSS, Genome Survey Sequence
  - 1st pass single read gDNA
- HTG, High Throughput Genomic
  - incomplete sequences of genomic clones
- STS, Sequence Tagged Site
  - PCR-based mapping reagents
- Whole Genome Shotgun
EST Division: Expressed Sequence Tags
- nucleus: 30,000 genes
- RNA gene products
- make cDNA library: 80-100,000 unique cDNA clones in library
- isolate unique clones
- sequence once from each end (5' and 3')
>IMAGE:275615 5' mRNA sequence
GACAGCATTCGGGCCGAGATGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTTTCTGG
TGGAGGTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATCCAGCAGAGAATGGAAAGTCAA
TTCCTGAATTGCTATGTGTCTGGGTTTCATCCGACATTGAAGTTGACTTACTGAAGAATGGAGA
GAATTGAAAAAGTGGAGCATTCAGACTTGTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTAC
TGAATTCACCCCCACTGAAAAAGATGAGTATGCCGTGTTGAACCATGTNGACTTTGTCACAGNC
AAGTTNAGTTTAAGTGGGNATCGAGACATGTAAGGCATCATGGGAGGTTTTGAAGNATGCCGCN
TTGGATTGGGATGAATTCCAAATTTCTGGTTTGCTTGNTTTTTTAATATTGGATATGCTTTTG
>IMAGE:275615 3', mRNA sequence
NNTCAAGTTTTATGATTTAACTTGTGGAACAAAAATAAACCAGATTAACCACAACCATGCCTTA
TTATCAAATGTATAAGANGTAAATATGAATCTTATATGACAAAATGTTTCATTATAACAAATTT
AATAATCCTGTCAATNATATTTCTAAATTTTCCCCCAAATTCTAAGCAGAGTATGTAAATTGGAAGTT
CTTATGCACGCTTAACTATCTTAACAAGCTTTGAGTGCAAGAGATTGANGAGTTCAAATCTGACCAAG
GTTGATGTTGGATAAGAGAATTCTCTGCTCCCCACCTCTANGTTGCCAGCCCTC
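Because ESTs are first-pass single reads, ambiguous base calls (N) are common, as the two IMAGE:275615 reads show. The sketch below is one possible way to read FASTA text and count ambiguous bases per read. The function names `read_fasta` and `n_count` are hypothetical helpers, and `SAMPLE` holds only the first line of each read as an excerpt, not the full sequences.

```python
from typing import Iterator, Tuple

# Excerpt only: the first sequence line of each EST read (an assumption made
# to keep the example short; the real records are longer).
SAMPLE = """\
>IMAGE:275615 5' mRNA sequence
GACAGCATTCGGGCCGAGATGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTTTCTGG
>IMAGE:275615 3', mRNA sequence
NNTCAAGTTTTATGATTTAACTTGTGGAACAAAAATAAACCAGATTAACCACAACCATGCCTTA
"""


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def n_count(sequence: str) -> int:
    """Count ambiguous base calls (N) in a nucleotide sequence."""
    return sequence.count("N")


if __name__ == "__main__":
    for name, seq in read_fasta(SAMPLE):
        print(name, len(seq), "bases,", n_count(seq), "ambiguous")
```

A per-read N count like this is a quick, rough indicator of read quality; it says nothing about miscalled bases that were not flagged as N.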
GSS, WGS, HTG (diagram labels): Whole BAC insert (or genome); shred; isolate clones; sequence; GSS division or trace archive; assembly; whole genome shotgun assemblies (traditional division); Draft sequence (HTG division)
HTG Example: Honeybee Draft Sequences
LOCUS       AC141845   147720 bp   DNA   linear   HTG 19-MAR-2004
DEFINITION  Apis mellifera clone CH224-4A2, WORKING DRAFT SEQUENCE, 14 unordered pieces.
ACCESSION   AC141845
VERSION     AC141845.1  GI:29124029
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT.
- Unfinished sequences of BACs
- Gaps and unordered pieces
- Finished sequences (Phase 3) move to traditional GenBank division
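The header fields of the honeybee record carry the division code (HTG) and the sequencing phase (HTGS_PHASE1). As a minimal sketch, assuming the usual flat-file layout in which a keyword starts in column 1 and continuation lines are indented, those fields could be pulled out as follows. `parse_header`, `locus_division`, and `htg_phase` are hypothetical helper names, and the column spacing in `SAMPLE_HEADER` is approximate.

```python
from typing import Dict, Optional

# Header text of the example record; spacing is approximate.
SAMPLE_HEADER = """\
LOCUS       AC141845              147720 bp    DNA     linear   HTG 19-MAR-2004
DEFINITION  Apis mellifera clone CH224-4A2, WORKING DRAFT SEQUENCE, 14
            unordered pieces.
ACCESSION   AC141845
VERSION     AC141845.1  GI:29124029
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT.
"""


def parse_header(text: str) -> Dict[str, str]:
    """Map each top-level keyword to its value, joining continuation lines."""
    fields: Dict[str, str] = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] != " ":
            # Keyword line: keyword, then whitespace, then the value.
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
        elif key is not None:
            # Indented line: continuation of the previous keyword's value.
            fields[key] += " " + line.strip()
    return fields


def locus_division(locus_value: str) -> str:
    """Return the division code, the token just before the date."""
    return locus_value.split()[-2]


def htg_phase(keywords: str) -> Optional[int]:
    """Return the HTGS phase number from a KEYWORDS value, if present."""
    for keyword in keywords.rstrip(".").split(";"):
        keyword = keyword.strip()
        if keyword.startswith("HTGS_PHASE"):
            return int(keyword[len("HTGS_PHASE"):])
    return None


if __name__ == "__main__":
    header = parse_header(SAMPLE_HEADER)
    print(locus_division(header["LOCUS"]), htg_phase(header["KEYWORDS"]))
```

For real records a maintained parser is the safer choice; this sketch only illustrates where the division and phase live in the header.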
Whole Genome Shotgun Projects
- 351 projects
  - Bacteria (251)
  - Environmental sequences (6)
  - Archaea (6)
  - Eukaryotes (88), including:
    - Chicken, Rat, Mouse, Dog (2), Chimpanzee, Human
    - Pufferfish (2)
    - Honeybee, Anopheles, Fruit Flies (3), Silkworm
    - Nematode (2)
    - Yeasts (8), Aspergillus (2)
    - Rice (2)
Whole Genome Shotgun (WGS) Projects: wgs master[properties]
Derivative Databases
- GenBank: updated ONLY by submitters (sequencing centers, labs)
  - Divisions: INV, VRT, PHG, VRL, EST, STS, HTG, GSS, PRI, ROD, PLN, MAM, BCT
- Derivative databases: updated by NCBI
  - UniGene
  - UniSTS
  - RefSeq: Entrez Gene and annotation pipelines
Why Make Reference Sequences?
- Entrez Nucleotide query: human[organism] AND lipase[title]
- Narrowed query: human[organism] AND lipase[title] AND endothelial[title]
- Lengths shown: 4150 bp, 2323 bp, 3927 bp, 261 bp, 3927 bp
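The fielded queries above follow a simple pattern: each term carries a field tag in square brackets and clauses are joined with AND. The sketch below builds such a query string and, as an illustration only, wraps it in a URL for the E-utilities `esearch.fcgi` service. The helper names (`fielded`, `all_of`, `esearch_url`) are hypothetical, the database name `nucleotide` is an assumption to check against current E-utilities documentation, and no network request is made.

```python
from urllib.parse import urlencode

# Base address of the ESearch service in NCBI's E-utilities.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded(term: str, field: str) -> str:
    """Attach an Entrez field tag, e.g. fielded('lipase', 'title')."""
    return f"{term}[{field}]"


def all_of(*clauses: str) -> str:
    """Join clauses with the Boolean AND operator."""
    return " AND ".join(clauses)


def esearch_url(db: str, query: str) -> str:
    """Build (but do not fetch) an ESearch URL for the given query."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query})


if __name__ == "__main__":
    query = all_of(
        fielded("human", "organism"),
        fielded("lipase", "title"),
        fielded("endothelial", "title"),
    )
    print(query)
    print(esearch_url("nucleotide", query))
```

Building the string from parts makes it easy to narrow a search step by step, as the two queries on the slide do by adding endothelial[title].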
RefSeq Benefits
- Genomes, transcripts, proteins: non-redundant; best representative
- Updates to reflect current sequence data and biology
- Distinct, stable accession series
Reference Sequence: RefSeq (Accession: Sequence Type; blue = curated)
- NM_123456789: mRNA
- NP_123456789: protein, from NM_
- NR_123456: non-coding RNA
- XM_123456: predicted mRNA
- XP_123456: predicted protein
- XR_123456: predicted non-coding RNA
- ZP_12345678: predicted from NZ_
- NC_123456: genomic, e.g., chromosomes
- NG_123455: genomic, incomplete region
- NT_123456: genomic, BAC assembly
- NW_123456: genomic, WGS
- NZ_ABCD12345678: collection
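Since each RefSeq accession series has a distinct two-letter prefix, the sequence type can be read off the accession itself. The following is a minimal sketch of that lookup; `REFSEQ_PREFIXES` and `classify` are hypothetical names, the descriptions are taken from the accession list above, and the split of the NT_/NW_/NZ_ descriptions is one reading of the flattened slide rather than an authoritative definition.

```python
import re
from typing import Optional

# Prefix -> sequence type, following the accession list in this deck.
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NP": "protein",
    "NR": "non-coding RNA",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
    "XR": "predicted non-coding RNA",
    "ZP": "predicted protein from NZ_",
    "NC": "genomic, e.g. chromosomes",
    "NG": "genomic, incomplete region",
    "NT": "genomic, BAC assembly",
    "NW": "genomic, WGS",
    "NZ": "WGS collection",
}

# Two-letter prefix, underscore, optional letters, digits, optional version.
_ACCESSION = re.compile(r"^([A-Z]{2})_([A-Z]*\d+)(?:\.(\d+))?$")


def classify(accession: str) -> Optional[str]:
    """Return the sequence type for a RefSeq-style accession, else None."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))


if __name__ == "__main__":
    for acc in ("NM_123456789", "XP_123456", "NZ_ABCD12345678", "AC141845"):
        print(acc, "->", classify(acc))
```

The underscore is what separates RefSeq accessions from GenBank ones, so an accession such as AC141845 falls through and returns `None`.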
Annotation Process (diagram labels)
- Genomic DNA (NC, NT, NW)
- Scanning...
- Model mRNA (XM) (XR); Model protein (XP)
- Curated mRNA (NM) (NR); Curated protein (NP)
- RefSeq; GenBank sequences
Creating NM_ Records
- Genome annotation
- NM's must have cDNA support
- transcript variant 1, transcript variant 2, transcript variant 3
- Longest mRNA
Where is RefSeq?
The Entrez System: CancerChromosomes, Gene, UniSTS, Homologene, SNP, Genome, PopSet, Nucleotide, GEO, Books, MeSH, PubMed, OMIM, Protein, Taxonomy, GENSAT, PubChem, PMC, Journals, Domains, Structure, 3D Domains
A Few Entrez Databases
- UniGene: clusters of ESTs, mRNAs
- dbSNP: Single Nucleotide Polymorphisms
- GEO: Gene Expression Omnibus
  - microarray and other expression data
- CDD: Conserved Domain Database
  - protein families (COGs and KOGs)
  - single domains (PFAM, SMART, CD)
UniGene: Gene-oriented clusters of expressed sequences
- Automatic clustering using MegaBlast
- Each cluster represents a unique gene
- Informed by genome hits
- Information on tissue types and map locations
- Useful for gene discovery and selection of mapping reagents
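The core idea of grouping sequences into clusters from pairwise similarity hits can be illustrated with a union-find structure: any two sequences connected by a chain of hits end up in the same cluster. This is only a simplified single-linkage sketch, not UniGene's actual procedure (which also uses genome hits and other evidence). The function name `cluster` and the sequence identifiers are hypothetical, and the hit pairs stand in for whatever a MegaBlast run would report.

```python
from typing import Dict, Iterable, List, Tuple


def cluster(ids: List[str], hits: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group ids so that any two linked by a chain of hits share a cluster.

    Every identifier mentioned in `hits` must also appear in `ids`.
    """
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical identifiers and hits, for illustration only.
    sequences = ["est1", "est2", "est3", "est4", "mrna1"]
    pairwise_hits = [("est1", "mrna1"), ("est2", "mrna1"), ("est3", "est4")]
    print(cluster(sequences, pairwise_hits))
```

Single linkage is easy to compute but can chain unrelated sequences together through one spurious hit, which is one reason a real pipeline needs additional evidence.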
A Cluster of ESTs: query; 5' EST hits; 3' EST hits
UniGene Collections
Example UniGene Cluster
Histogram of cluster sizes for UniGene Hs Build 177 (now at Build #186)
UniGene Cluster Hs.95351: Selected Protein Similarities
UniGene Cluster Hs.95351: Gene Expression
UniGene Cluster Hs.95351: expression
UniGene Cluster Hs.95351: seqs
Download sequences: web page; ftp://ftp.ncbi.nih.gov/repository/UniGene/Homo_sapiens/
Entrez GEO
NCBI's SNP Database
- Primary and derivative (RefSNP)
  - Single nucleotide polymorphisms
  - Repeat polymorphisms
  - Insertion-deletion polymorphisms
- Over 19 million refSNPs (rsXXXXXXX) (August, 2005)
Searching dbSNP
RefSNP
RefSNP
RefSNP
RefSNP: Search Mouse SNP between strains
RefSNP: MapView, GeneView, SeqView, No 3D, OMIM
RefSNP
Entrez GEO
Entrez GEO Datasets
- GPL: Platform descriptions; submitted by manufacturer*
- GSM (GEO SaMple): raw/processed spot intensities from a single slide/chip; submitted by experimentalists
- GSE (GEO SEries): grouping of slide/chip data, "a single experiment"; set of related samples; submitted by experimentalists
- GDS: grouping of experiments; set of related experimental conditions; curated by NCBI
What's a DataSet?
- Supplied by submitter
  - Platform (GPL): array definition
  - Sample (GSM): hyb. measurements
  - Series (GSE): related Samples
- Assembled by GEO staff
  - DataSet (GDS)
- A collection of experimentally-related samples processed using the same platform.
- Samples within DataSets are organized into subgroups based on experimental variables.
- Form the basis of GEO's query, analysis and data display tools.
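The four GEO record types can be told apart by their accession prefix, and the prefix also indicates who produced the record. A minimal sketch of that mapping follows; the names `GEO_TYPES`, `geo_record_type`, and `assembled_by_geo_staff` are hypothetical, and the accession numbers in the usage lines are made up for illustration.

```python
import re
from typing import Optional

# Accession prefix -> record type, as listed for GEO above.
GEO_TYPES = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}

_GEO_ACCESSION = re.compile(r"(GPL|GSM|GSE|GDS)(\d+)")


def geo_record_type(accession: str) -> Optional[str]:
    """Return 'Platform', 'Sample', 'Series' or 'DataSet', else None."""
    match = _GEO_ACCESSION.fullmatch(accession.strip().upper())
    return GEO_TYPES[match.group(1)] if match else None


def assembled_by_geo_staff(accession: str) -> bool:
    """True for DataSets (GDS); the other types are supplied by submitters."""
    return geo_record_type(accession) == "DataSet"


if __name__ == "__main__":
    # Made-up accession numbers, for illustration only.
    for acc in ("GPL96", "GSM1234", "GSE5678", "GDS910"):
        print(acc, geo_record_type(acc), assembled_by_geo_staff(acc))
```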
Gene Expression Omnibus (GEO): Dataset browser
GEO Dataset Browser
GEO Dataset Report
GEO Profiles: … of 12625
Entrez CDD
Conserved Domain Database
- Multiple sequence alignments
- Position-specific scoring matrices (PSSM)
- Sources
  - SMART, PFAM, COGs, KOGs, and NCBI curated domains (structure-informed alignments)
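A position-specific scoring matrix summarizes a multiple sequence alignment column by column: residues seen often at a position score high, residues never seen there score low. The sketch below derives a simple log-odds PSSM from a toy alignment. It is a teaching simplification, not the procedure CDD uses: it assumes a uniform background frequency, a flat pseudocount, and no sequence weighting, and the names `build_pssm`, `score`, and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: List[str], alphabet: str = AMINO_ACIDS,
               pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Return one {residue: log2-odds score} dict per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)  # assumption: uniform background
    pssm = []
    for col in range(length):
        # Count residues in this column; gaps and unknowns are skipped.
        counts = Counter(seq[col] for seq in alignment if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        column = {}
        for residue in alphabet:
            freq = (counts[residue] + pseudocount) / total
            column[residue] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm: List[Dict[str, float]], sequence: str) -> float:
    """Sum the per-position scores of an ungapped sequence of equal length."""
    return sum(column[residue] for column, residue in zip(pssm, sequence))


if __name__ == "__main__":
    toy_alignment = ["GMTC", "GMHC", "GMTC"]  # hypothetical 4-column alignment
    matrix = build_pssm(toy_alignment)
    for candidate in ("GMTC", "GMHC", "AAAA"):
        print(candidate, round(score(matrix, candidate), 2))
```

The pseudocount keeps unseen residues from scoring minus infinity, which matters when the alignment has only a few sequences.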
CDD
>gi|45549418|gb|AAS67634.1| ATP7A [Solenodon paradoxus]
IVYQPHLITVEEIKKQIKAVGFPAFIKKQPKYLKLGAIDIERLKNIPVKSSEGSQQMSPS
STNDSKVTLTIDGMHCNSCVSNIESALSTLHYVSSIVVSLQNKSAIIKYNANSVTPEIL
KKAIEAISPGQYRVSITSEVESTSNSPSSSSQKAPLNVVSQPLTQVTVININGMTCNS
CVQSIEGVMSKKAGVKSIQVSLANRNGTVEYDP
LLTSPEILRE
CDD: Click on a colored bar to align your sequence to the CD (CD, Pfam, COG)
Conserved Domain Database: cd00371.1, HMA
CDD
CDART: Conserved Domain Architecture Retrieval Tool
Linking from Entrez Protein: cdd
Genome Resources: Genomic Biology; Gene database; Homologene; Map Viewer; Trace Archive
Genomic Biology
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Genome Projects: microb
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Genome Resources: Genomic Biology; Gene database; Homologene; Map Viewer; Trace Archive
Entrez Gene: A single query interface to …
- Sequences
  - RefSeqs
  - GenBank
  - Homologene
- Maps: MapViewer
- Entrez links
- Linkouts
- More organisms, ~3000
- Entrez integration
Global Entrez: NADH2
Entrez Gene: NADH2
Gene Record for Pongo NADH2: not found with "nadh2"; Homo sapiens
A Record With More Data: Human HFE
Human HFE: Transcripts (transcripts with experimental evidence)
Gene Table
Introns/Exons: Gene Table links to sequence
Human HFE: Links
Genotype
Genotype
Human HFE: Links
GeneView in dbSNP
SNP in Structure
SNP in Structure
SNP in Structure: H41, S43, C260
Another Variation Source: OMIM
Variants in OMIM
Genome Resources: Genomic Biology; Gene database; Homologene; Map Viewer; Trace Archive
The New Homologene: Automated detection of homologs among the annotated genes of completely sequenced eukaryotic genomes.
- No longer UniGene based
- Protein similarities first
- Guided by taxonomic tree
- Includes orthologs and paralogs
The New Homologene: Homologene Build 43.1 (8/23/05). Table columns: Species; Number of genes (input, grouped); groups
RAG1 → Homologene
RAG1 → Homologene: RAG1
RAG1: RING-finger
RAG1 → Homologene: RAG1
RAG1: Sugar_tr
Homologene: alignment scores
BLASTP bl2seq
Genome Resources: LocusLink; Gene database; UniGene; Homologene; Map Viewer; Trace Archive
List View
Human MapViewer: adar
MapViewer: Human ADAR
MV Hs ADAR: 5' UTR, 3' UTR
Maps & Options
- Sequence maps: Ab initio, Assembly, Repeats, BES_Clone, NCI_Clone, Contig, Component, CpG island, dbSNP haplotype, Fosmid, GenBank_DNA, Gene, Phenotype, SAGE_Tag, STS, TCAG_RNA, Transcript (RNA), Hs_UniGene, Hs_EST, Mm_UniGene, Mm_EST, Rn_UniGene, Rn_EST, Ssc_UniGene, Ssc_EST, Bt_UniGene, Bt_EST, Gga_UniGene, Gga_EST, Variation
- Cytogenetic maps: Ideogram, FISH Clone, Gene_Cytogenetic, Mitelman Breakpoint, Morbid/Disease
- Genetic maps: deCODE, Genethon, Marshfield
- RH maps: GeneMap99-G3, GeneMap99-GB4, NCBI RH, Stanford-G3, TNG, Whitehead-RH, Whitehead-YAC
MapViewer: UniGene, Repeats, Gene, Component
MapViewer: Phenotype, Gene, Variation
Maps & Options
Genome Resources: LocusLink; Gene database; UniGene; Homologene; Map Viewer; Trace Archive
Trace Archive Page
Macaca mulatta Traces
Trace Archive BLAST Page: Access to sequences NOT in GenBank
Literature Links
BOOKS Database
BOOKS Database: hyperlinked
BOOKS Database
BOOKS Database
BOOKS Database
Genes & Dis
Genes & Dis
For More Information…
Intermission