204a0e445ec6354a23d757d7ee59fa16.ppt
- Number of slides: 52
Bioinformatics for the Faculty of Science, Charles University (PřF UK), 2001
Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry, vondrasek@uochb.cas.cz
Jan Pačes, Institute of Molecular Genetics, hpaces@img.cas.cz
http://bio.img.cas.cz/PrfUK2002
Databases: contents
• principles
• SQL
• biological sequence formats
• IUB codes
• DNA databases
• protein and genome databases
• structural databases
Database organization: relational databases
article table: c_id (identifier, number), title (text), journal (short text), year (date), …
author table: a_id (identifier), c_id (identifier), name (short text), …
keyword table: k_id (identifier), c_id (identifier), keyword (short text)
SQL: Structured Query Language
Fields: c_id (identifier, number), title (text), journal (short text), year (date), …
  CREATE TABLE article (
    c_id INTEGER,
    title TEXT,
    journal VARCHAR(30),
    year DATE
  );
SQL: Structured Query Language
Fields: a_id (identifier), c_id (identifier), name (short text)
  CREATE TABLE author (
    a_id INTEGER,
    c_id INTEGER,
    name VARCHAR(30)
  );
SQL: Structured Query Language
  INSERT INTO article SET c_id = '1', title = 'Something absolutely fantastic', journal = 'Bioinformatics', year = '2002';
  INSERT INTO author SET a_id = '1', c_id = '1', name = 'Paces, Jan';
  INSERT INTO author SET a_id = '2', c_id = '1', name = 'Vondrasek, Jiri';
SQL: Structured Query Language
  SELECT article.title, article.journal, author.name
  FROM article, author
  WHERE article.c_id = author.c_id
    AND article.year > '2000'
    AND author.name LIKE 'Paces%';
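The article/author example can be tried out end to end with a small script. The following is only a sketch under stated assumptions: it uses SQLite through Python's built-in sqlite3 module (the slides do not name a database engine), it stores the year as an INTEGER instead of a DATE, and it uses the standard INSERT ... VALUES form because SQLite does not accept INSERT ... SET. The function name articles_by_author is hypothetical.

```python
import sqlite3

# In-memory SQLite database; the engine is an assumption, chosen because it ships with Python.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (c_id INTEGER, title TEXT, journal VARCHAR(30), year INTEGER);
    CREATE TABLE author  (a_id INTEGER, c_id INTEGER, name VARCHAR(30));
""")

# Standard INSERT ... VALUES statements with bound parameters.
conn.execute(
    "INSERT INTO article (c_id, title, journal, year) VALUES (?, ?, ?, ?)",
    (1, "Something absolutely fantastic", "Bioinformatics", 2002),
)
conn.executemany(
    "INSERT INTO author (a_id, c_id, name) VALUES (?, ?, ?)",
    [(1, 1, "Paces, Jan"), (2, 1, "Vondrasek, Jiri")],
)


def articles_by_author(conn, name_prefix, after_year):
    """Join article and author on c_id, filtered by author name prefix and year."""
    query = """
        SELECT article.title, article.journal, author.name
        FROM article JOIN author ON article.c_id = author.c_id
        WHERE article.year > ? AND author.name LIKE ?
    """
    return conn.execute(query, (after_year, name_prefix + "%")).fetchall()


rows = articles_by_author(conn, "Paces", 2000)
print(rows)
```

The shared c_id column is what links the two tables: each author row points at the article it belongs to, so one article can have any number of authors.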
IUB codes

Nucleotides
code    nucleotides   complement
A       A             T
C       C             G
G       G             C
T (U)   T (U)         A
M       AC            K
R       AG            Y
W       AT            W
S       CG            S
Y       CT            R
K       GT            M
V       ACG           B
H       ACT           D
D       AGT           H
B       CGT           V
N       ACGT          N
-       gap           -

Amino acids
code   three-letter code   amino acid
A      Ala                 alanine
C      Cys                 cysteine
D      Asp                 aspartic acid
E      Glu                 glutamic acid
F      Phe                 phenylalanine
G      Gly                 glycine
H      His                 histidine
I      Ile                 isoleucine
K      Lys                 lysine
L      Leu                 leucine
M      Met                 methionine
N      Asn                 asparagine
P      Pro                 proline
Q      Gln                 glutamine
R      Arg                 arginine
S      Ser                 serine
T      Thr                 threonine
V      Val                 valine
W      Trp                 tryptophan
Y      Tyr                 tyrosine
B      Asx                 aspartic acid or asparagine
Z      Glx                 glutamic acid or glutamine
X      Xxx                 any amino acid
*      ---                 stop
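A common use of the nucleotide codes is computing a reverse complement that also handles ambiguity symbols. The sketch below is a minimal illustration, not a reference implementation; the names COMPLEMENT and reverse_complement are made up for this example. Note that W (A or T) and S (C or G) are each their own complement.

```python
# IUB nucleotide codes mapped to the code of the complementary base set.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "K": "M", "R": "Y", "Y": "R",
    "W": "W", "S": "S",
    "V": "B", "B": "V", "H": "D", "D": "H",
    "N": "N", "-": "-",
}


def reverse_complement(seq):
    """Return the reverse complement of seq, preserving upper/lower case."""
    out = []
    for base in reversed(seq):
        comp = COMPLEMENT[base.upper()]  # raises KeyError for a non-IUB symbol
        out.append(comp if base.isupper() else comp.lower())
    return "".join(out)


print(reverse_complement("ATGC"))
print(reverse_complement("AMRWN-"))
```

RNA input containing U is complemented to A; the result is always written in the DNA alphabet.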
Sequence formats
binary
  SCF, ALF, ABI (programs working with chromatograms)
  internal database formats
text
  minimal text: fasta
  annotated: EMBL, GenBank, ASN, XML
Sequence formats - SCF (Standard Chromatogram Format)
Sequence formats - EMBL (EMBL database format)
ID   AF031150   standard; RNA; ROD; 1379 BP.
XX
AC   AF031150;
XX
SV   AF031150.1
XX
DT   27-FEB-1998 (Rel. 54, Created)
DT   27-FEB-1998 (Rel. 54, Last updated, Version 1)
XX
DE   Mus musculus paired-box transcription factor (Pax4) mRNA, complete cds.
XX
KW   .
XX
OS   Mus musculus (house mouse)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
XX
RN   [1]
RP   1-1379
RA   Inoue H., Nomiyama J., Nakai K., Matsutani A., Tanizawa Y., Oka Y.;
RT   Isolation of full-length cDNA of mouse PAX4 gene and identification of its
RT   human homologue;
RL   Biochem. Biophys. Res. Commun. 243:628-633(1998).
XX
RN   [2]
RP   1-1379
RA   Inoue H., Nomiyama J., Nakai K., Tanizawa Y., Oka Y.;
RT   ;
RL   Submitted (23-OCT-1997) to the EMBL/GenBank/DDBJ databases.
RL   Third Dept. of Int. Med., Yamaguchi University, 1144 Kogushi, Ube,
RL   Yamaguchi 755, Japan
XX
FH   Key             Location/Qualifiers
…
Sequence formats - EMBL (EMBL database format)
…
FH   Key             Location/Qualifiers
FH
FT   source          1..1379
FT                   /db_xref=taxon:10090
FT                   /organism=Mus musculus
FT                   /cell_line=MIN6
FT   CDS             297..1346
FT                   /codon_start=1
FT                   /gene=Pax4
FT                   /product=paired-box transcription factor
FT                   /protein_id=AAC40046.1
FT                   /translation=MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDISR
FT                   SLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWEIQ
FT                   HQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCGAPR
FT                   GPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWFSNRR
FT                   AKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSPSFCQL
FT                   CCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLIGGPGQV
FT                   PSTHCSNWP
XX
SQ   Sequence 1379 BP; 327 A; 402 C; 347 G; 303 T; 0 other;
     aaaaagcggc aaggctctgt cgctgaattc tagcagaagg ctgccctctg ctcctgagtg        60
     gaagctctgg … accccctggc aggactgaag cagctggagg ctgttacaag       120
     accagaccac cagcaaaccc tggagcctgc acaggaccct gagacctctt cctggaattc       180
     ccaccttttt tcctccatcc agaaccagtc ccaaagagaa acttccagaa ggagctctcc       240
     gttttcagtt tgccagttgg cttcctgtcc ttctgtgagg agtaccagtg tgaagcatgc       300
     agcaggacgg actcagcagt gtgaatcagc tagggggact ctttgtgaat ggcccc           360
     …
     gctgtgggac agcaccaggc agatgttcca gtgacacctc atcccaggcc tatctccaac      1200
     cctactggga ctgccaatcc … tggcttcctc ctcatatgtg gaatttgcct      1260
     ggccctgcct caccacccat cctgtgcatc atctgattgg aggcccagga caagtgccat      1320
     caacccattg ctcaaactgg ccataagagg cctctatttg acagtaataa aaacctttt       1379
//
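Because every EMBL line starts with a two-letter code, a record can be split into fields with very little code. The following is a minimal sketch, assuming the usual layout in which the content starts in column 6; the function name group_embl_lines is hypothetical and the sample is a shortened excerpt of the AF031150 entry shown on the slides.

```python
from collections import defaultdict


def group_embl_lines(entry_text):
    """Group the lines of one EMBL entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        code, content = line[:2], line[5:].rstrip()
        if code == "XX" or not code.strip():
            continue  # skip spacer lines and the indented sequence rows
        fields[code].append(content)
    return dict(fields)


# Shortened excerpt of the entry; only a few line types are included.
sample = """\
ID   AF031150   standard; RNA; ROD; 1379 BP.
XX
AC   AF031150;
XX
DE   Mus musculus paired-box transcription factor (Pax4) mRNA, complete cds.
XX
OS   Mus musculus (house mouse)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
//
"""

fields = group_embl_lines(sample)
print(fields["AC"])
print(" ".join(fields["OC"]))
```

Multi-line fields such as OC stay as a list of lines, so the caller decides whether to join them.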
Sequence formats - GenBank
LOCUS       AF145233     1360 bp    mRNA            ROD       23-OCT-1999
DEFINITION  Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds.
ACCESSION   AF145233
VERSION     AF145233.1  GI:6102607
KEYWORDS    .
SOURCE      house mouse.
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 1360)
  AUTHORS   Kalousova,A., Benes,V., Paces,J., Paces,V. and Kozmik,Z.
  TITLE     DNA binding and transactivating properties of the paired and
            homeobox protein Pax4
  JOURNAL   Biochem. Biophys. Res. Commun. 259 (3), 510-518 (1999)
  MEDLINE   99294619
   PUBMED   10364449
REFERENCE   2  (bases 1 to 1360)
  AUTHORS   Kalousova,A., Paces,J. and Kozmik,Z.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-APR-1999) Dept. of Transcription Regulation,
            Institute of Molecular Genetics, Videnska 1083, Prague 142 20,
            Czech Republic
FEATURES             Location/Qualifiers
     source          1..1360
                     /organism="Mus musculus"
                     /db_xref="taxon:10090"
     gene            1..1360
                     /gene="Pax4"
     CDS             211..1260
                     /gene="Pax4"
                     /note="DNA binding protein; paired box protein; homeobox
                     protein"
                     /codon_start=1
                     /product="transcription factor PAX4"
                     /protein_id="AAF03533.1"
…
Sequence formats - GenBank
     CDS             211..1260
                     /gene="Pax4"
                     /note="DNA binding protein; paired box protein; homeobox
                     protein"
                     /codon_start=1
                     /product="transcription factor PAX4"
                     /protein_id="AAF03533.1"
                     /db_xref="GI:6102608"
                     /translation="MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDIS
                     RSLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWE
                     IQHQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCG
                     APRGPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWF
                     SNRRAKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSP
                     SFCQLCCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLI
                     GGPGQVPSTHCSNWP"
BASE COUNT      359 a    381 c    328 g    292 t
ORIGIN
        1 tggcaggact gaagcagctg gaggctgtta caagaccaga ccaccagcaa accctggagc
       61 ctgcacagga ccctgagacc tcttcctgga attcccacct tttttcctcc atccagaacc
      121 agtcccaaag agaaacttcc agaaggagct ctccgttttc agtttgccag ttggcttcct
      181 gtccttctgt gaggagtacc agtgtgaagc atgcagcagg acggactcag cagtgtgaat
…
     1081 tccagtgaca cctcatccca ggcctatctc caaccctact gggactgcca atccctcctt
     1141 cctgtggctt cctcctcata tgtggaattt gcctggccct gcctcaccac ccatcctgtg
     1201 catcatctga ttggaggccc aggacaagtg ccatcaaccc attgctcaaa ctggccataa
     1261 gaggcctcta tttgacagta ataaaaacct tttcttagat gttaaaaaaa aaaaaaaaaa
     1321 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa
//
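In a GenBank record the sequence itself sits between the ORIGIN line and the terminating //, as numbered rows of ten-base blocks. A minimal sketch of pulling the plain sequence out of that block follows; the function name origin_to_sequence is hypothetical, and the sample is a made-up toy record, not real data.

```python
def origin_to_sequence(record_text):
    """Concatenate the bases listed between ORIGIN and // in a GenBank record."""
    bases = []
    in_origin = False
    for line in record_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Each row is a running position followed by blocks of bases;
            # drop the position and keep the blocks.
            bases.extend(line.split()[1:])
    return "".join(bases)


# Toy record (invented for illustration) with 24 bases in two rows.
sample = """\
LOCUS       TOY000001      24 bp    DNA
ORIGIN
        1 acgtacgtac ggttaaccgg
       21 ttaa
//
"""

sequence = origin_to_sequence(sample)
print(sequence, len(sequence))
```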
Sequence formats - FASTA
>gi|6102607|gb|AF145233.1|AF145233 Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds
TGGCAGGACTGAAGCAGCTGGAGGCTGTTACAAGACCACCAGCAAACCCTGGAGCCTGCACAGGA
CCCTGAGACCTCTTCCTGGAATTCCCACCTTTTTTCCTCCAGAACCAGTCCCAAAGAGAAACTTCC
AGAAGGAGCTCTCCGTTTTCAGTTTGCCAGTTGGCTTCCTGTCCTTCTGTGAGGAGTACCAGTGTGAAGC
ATGCAGCAGGACTCAGCAGTGTGAATCAGCTAGGGGGACTCTTTGTGAATGGCCCCTTCCTC
TGGACACCAGGCAGCAGATTGTGCAGCTAGCAATAAGAGGGATGCGACCCTGTGACATTTCACGGAGCCT
TAAGGTATCTAATGGCTGTGTGAGCAAGATCCTAGGACGCTACTACCGCACAGGTGTCTTGGAACCCAAG
TGTATTGGGGGAAGCAAACCACGTCTGGCCACACCTGCTGTGGTGGCTCGAATTGCCCAGCTAAAGGATG
AGTACCCTGCTCTTTTTGCCTGGGAGATCCAACACCAGCTTTGCACTGAAGGGCTTTGTACCCAGGACAA
GGCTCCCAGTGTGTCCTCTATCAATCGAGTACTTCGGGCACTTCAGGAAGACCAGAGCTTGCACTGGACT
CAACTCAGATCACCAGCTGTGTTGGCTCCAGTTCTTCCCAGTCCCCACAGTAACTGTGGGGCTCCCCGAG
GCCCCCAGGAACCAGCCACAGGAATCGGACTATCTTCTCCCCGGGACAAGCCGAGGCACTGGAGAA
AGAGTTTCAGCGTGGGCAGTATCCAGATTCAGTGGCCCGTGGGAAGCTGCTGCCACCTCTCTGCCT
GAAGACACGGTGAGGGTTTTCTAACAGAAGAGCCAAATGGCGCAGGCAAGAGAAGCTGAAATGGG
AAGCACAGCTGCCAGGTGCTTCCCAGGACCTGACAGTACCAAAAAATTCTCCAGGGATCATCTCTGCACA
GCAGTCCCCCGGCAGTGTACCCTCAGCTGCCTGTGCTGGAACCATTGAGTCCTTCTGTCAG
CTATGCTGTGGGACAGCACCAGGCAGATGTTCCAGTGACACCTCATCCCAGGCCTATCTCCAACCCTACT
GGGACTGCCAATCCCTCCTGTGGCTTCCTCCTCATATGTGGAATTTGCCTGGCCCTGCCTCACCAC
CCATCCTGTGCATCATCTGATTGGAGGCCCAGGACAAGTGCCATCAACCCATTGCTCAAACTGGCCATAA
GAGGCCTCTATTTGACAGTAATAAAAACCTTTTCTTAGATGTTAAAAAAAAAAAAAAA
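FASTA is the simplest of these formats to read: a header line starting with ">" followed by any number of sequence lines. The sketch below is one straightforward way to parse it; the function name parse_fasta is hypothetical, and the sample keeps the header from the slide but only the first 30 bases of the sequence.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened sample: real header, first 30 bases only.
sample = """\
>gi|6102607|gb|AF145233.1|AF145233 Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds
TGGCAGGACTGAAGCAGCTG
GAGGCTGTTA
"""

records = parse_fasta(sample)
header, sequence = records[0]
print(header.split("|")[3], len(sequence))
```

In this NCBI-style header the fields are separated by "|", so the fourth field is the accession with its version number.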
Sequence formats - ASN
Seq-entry ::= set {
  class nuc-prot ,
  descr {
    title "Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds." ,
    source {
      org {
        taxname "Mus musculus" ,
        common "house mouse" ,
        db {
          {
            db "taxon" ,
            tag id 10090 } } ,
        orgname {
          name binomial {
            genus "Mus" ,
            species "musculus" } ,
          lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus" ,
          gcode 1 ,
          mgcode 2 ,
          div "ROD" } } } ,
    pub {
      sub {
        authors {
          names std
Bioinformatic Links
GenBank
Swiss-Prot
Entrez
• Literature (PubMed)
• Nucleotide (GenBank)
• Protein (PIR)
• Genome
• Structure (PDB)
• PopSet
• Taxonomy
• OMIM
SRS
SRS - list
PDB
PDB
HEADER    GENE REGULATION/DNA                     22-APR-99   6PAX
TITLE     CRYSTAL STRUCTURE OF THE HUMAN PAX-6 PAIRED DOMAIN-DNA
TITLE    2 COMPLEX REVEALS A GENERAL MODEL FOR PAX PROTEIN-DNA
TITLE    3 INTERACTIONS
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: HOMEOBOX PROTEIN PAX-6;
COMPND   3 CHAIN: A;
COMPND   4 ENGINEERED: YES;
COMPND   5 BIOLOGICAL_UNIT: MONOMER;
COMPND   6 MOL_ID: 2;
COMPND   7 MOLECULE: 26 NUCLEOTIDE DNA;
COMPND   8 CHAIN: B;
COMPND   9 ENGINEERED: YES;
COMPND  10 BIOLOGICAL_UNIT: MONOMER;
COMPND  11 MOL_ID: 3;
COMPND  12 MOLECULE: 26 NUCLEOTIDE DNA;
COMPND  13 CHAIN: C;
COMPND  14 ENGINEERED: YES;
COMPND  15 BIOLOGICAL_UNIT: MONOMER
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE   3 ORGANISM_COMMON: HUMAN;
SOURCE   4 GENE: PAX6;
SOURCE   5 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   6 EXPRESSION_SYSTEM_STRAIN: BL21(DE3);
SOURCE   7 MOL_ID: 2;
SOURCE   8 SYNTHETIC: YES;
SOURCE   9 MOL_ID: 3;
SOURCE  10 SYNTHETIC: YES
KEYWDS    PAX, PAIRED DOMAIN, TRANSCRIPTION, PROTEIN-DNA INTERACTIONS,
KEYWDS   2 GENE REGULATION/DNA
EXPDTA    X-RAY DIFFRACTION
AUTHOR    H.E.XU,M.A.ROULD,W.XU,J.A.EPSTEIN,R.L.MAAS,C.O.PABO
REVDAT   1   13-JUL-99 6PAX    0
JRNL        AUTH   H.E.XU,M.A.ROULD,W.XU,J.A.EPSTEIN,R.L.MAAS,C.O.PABO
JRNL        TITL   CRYSTAL STRUCTURE OF THE HUMAN PAX-6 PAIRED
JRNL        TITL 2 DOMAIN-DNA COMPLEX REVEALS SPECIFIC ROLES FOR THE
JRNL        TITL 3 LINKER REGION AND THE CARBOXY-TERMINAL SUBDOMAIN
JRNL        TITL 4 IN DNA BINDING
PDB
SEQRES   1 A  133  SER HIS SER GLY VAL ASN GLN LEU GLY GLY VAL PHE VAL
SEQRES   2 A  133  ASN GLY ARG PRO LEU PRO ASP SER THR ARG GLN ARG ILE
SEQRES   3 A  133  VAL GLU LEU ALA HIS SER GLY ALA ARG PRO CYS ASP ILE
SEQRES   4 A  133  SER ARG ILE LEU GLN VAL SER ASN GLY CYS VAL SER LYS
SEQRES   5 A  133  ILE LEU GLY ARG TYR TYR ALA THR GLY SER ILE ARG PRO
SEQRES   6 A  133  ARG ALA ILE GLY GLY SER LYS PRO ARG VAL ALA THR PRO
SEQRES   7 A  133  GLU VAL VAL SER LYS ILE ALA GLN TYR LYS GLN GLU CYS
SEQRES   8 A  133  PRO SER ILE PHE ALA TRP GLU ILE ARG ASP ARG LEU LEU
SEQRES   9 A  133  SER GLU GLY VAL CYS THR ASN ASP ASN ILE PRO SER VAL
SEQRES  10 A  133  SER SER ILE ASN ARG VAL LEU ARG ASN LEU ALA SER GLU
SEQRES  11 A  133  LYS GLN GLN
SEQRES   1 B   26  A A G C A T T C A C … G
SEQRES   2 B   26  C A T G A G T G C A … G
SEQRES   1 C   26  T T C T G C A C T C … A
SEQRES   2 C   26  T G C G T G A A T G … C
FORMUL   4  HOH   *84(H2 O1)
HELIX    1   1 ASP A   20  HIS A   31  1                                  12
HELIX    2   2 PRO A   36  LEU A   43  1                                   8
HELIX    3   3 ASN A   47  THR A   60  1                                  14
HELIX    4   4 PRO A   78  GLU A   90  1                                  13
HELIX    5   5 ALA A   96  SER A  105  1                                  10
HELIX    6   6 VAL A  117  GLU A  130  1                                  14
SHEET    1   A 2 SER A   3  VAL A   5  0
SHEET    2   A 2 VAL A  11  VAL A  13 -1  N   PHE A  12   O   GLY A   4
CRYST1   33.840   61.686  171.111  90.00  90.00  90.00 P 21 21 21    4
ORIGX1      1.000000  0.000000  0.000000        0.00000
ORIGX2      0.000000  1.000000  0.000000        0.00000
ORIGX3      0.000000  0.000000  1.000000        0.00000
SCALE1      0.029551  0.000000  0.000000        0.00000
SCALE2      0.000000  0.016211  0.000000        0.00000
SCALE3      0.000000  0.000000  0.005844        0.00000
ATOM      1  N   SER A   1      -1.985 -12.356  81.201  1.00 60.11           N
ATOM      2  CA  SER A   1      -1.709 -12.440  82.636  1.00 60.41           C
ATOM      3  C   SER A   1      -2.774 -13.282  83.373  1.00 59.35           C
ATOM      4  O   SER A   1      -3.734 -13.763  82.751  1.00 58.16           O
ATOM      5  CB  SER A   1      -1.638 -11.029  83.229  1.00 64.08           C
ATOM      6  OG  SER A   1      -2.862 -10.345  83.045  1.00 69.46           O
ATOM      7  H   SER A   1      -2.431 -11.538  80.917  1.00 40.00           H
ATOM      8  HG  SER A   1      -2.887  -9.549  83.596  1.00 40.00           H
ATOM      9  N   HIS A   2      -2.634 -13.393  84.701  1.00 59.45           N
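PDB records are fixed-column text: each field of an ATOM line lives at a set character position, so it is read by slicing, not by splitting on spaces. The sketch below formats the first two atoms from the slide into the standard column layout and parses them back. The names ATOM_TEMPLATE and parse_atom are hypothetical, and the template is built programmatically only so that the column positions are exact.

```python
# Column layout of an ATOM record (1-based): serial 7-11, atom name 13-16,
# residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54,
# occupancy 55-60, B-factor 61-66, element symbol 77-78.
ATOM_TEMPLATE = (
    "ATOM  %5d %-4s %3s %1s%4d" + " " * 4
    + "%8.3f%8.3f%8.3f%6.2f%6.2f" + " " * 10 + "%2s"
)

# First two atoms of chain A as listed on the slide.
lines = [
    ATOM_TEMPLATE % (1, " N", "SER", "A", 1, -1.985, -12.356, 81.201, 1.00, 60.11, "N"),
    ATOM_TEMPLATE % (2, " CA", "SER", "A", 1, -1.709, -12.440, 82.636, 1.00, 60.41, "C"),
]


def parse_atom(line):
    """Slice one ATOM record into typed fields by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


atoms = [parse_atom(line) for line in lines]
print(atoms[1]["name"], atoms[1]["x"], atoms[1]["y"], atoms[1]["z"])
```

Slicing matters because neighbouring numeric fields may touch without a separating space when coordinates are large.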
SCOP
PDBsum
CATH
FSSP - Fold classification
Structural genomics
Bioinformatics WWW portals
EBI: http://www.ebi.ac.uk/Tools
Expasy: http://www.expasy.ch
Pasteur: http://bioweb.pasteur.fr
Lyon: http://pbil.univ-lyon1.fr
NCBI: http://ncbi.nlm.nih.gov
EBI
ExPASy
PBIL
Pasteur