DEPARTAMENTO DE ESTATÍSTICA
Prof. Hélio Magalhães de Oliveira, UFPE, 21/08/2013
1/2 × n-ário = 1 × (semi-n-ário). A personal view.
Thanks to Dr. Francisco Cysneiros.

UNIVERSIDADE FEDERAL DE PERNAMBUCO, DEPARTAMENTO DE ESTATÍSTICA
Statistical data on biological life: randomness as an indelible mark on the genome of species.
Prof. H. Magalhães de Oliveira, UFPE, August 2013

Chronological scale of the evolution of life
DNA and the origin of life: a chronology (Battail, 2001)

WHAT IS LIFE, REALLY?
Current trends are breaking down the barriers between the living and the non-living.
• First change: overcoming vitalism.

Characteristic properties of natural life
• Ability to reproduce
• Sensitivity to the environment
• Metabolism
• Chemical uniqueness
• High degree of complexity and organization
• A genetic program that directs development
• A history shaped by natural selection

Difficulties in defining life
• SEEDS are alive, but do not metabolize.
• VIRUSES do not reproduce by themselves (compare mules).
• SAUSAGES are not alive, yet they contain a genetic program and are made of proteins and DNA.
• COMPUTER VIRUSES have properties of biological life: they reproduce, are sensitive to the environment, metabolize (consume processing time and memory), can be complex, and survive through natural selection.

Fundamentals of DNA structure
• Living organisms are made of cells
• Prokaryotes vs. eukaryotes
• In eukaryotic cells, all activities are coordinated by the nucleus
• The nucleus holds the DNA, which contains the genetic information and supports:
– transmission of the genetic information, and
– protein synthesis.

DNA: structure and function. Nitrogenous bases: purines and pyrimidines

DNA structure: the phosphodiester bond

DNA structure: complementary bases

1953: discovery of the structure of DNA. Watson & Crick: the double-helix structure of DNA

DNA: structure and function. The double helix

DNA duplication
Duplication takes place in the presence of DNA polymerase. The hydrogen bonds between the nitrogenous bases are broken (a step carried out by helicase rather than by the polymerase itself) and the two DNA strands move apart:
• Free nucleotides in the cell pair with the strands, always at their complementary bases.
• Two identical DNA molecules are formed.
• DNA duplication is called semiconservative because each new DNA molecule has one new strand and one old strand inherited from the parent molecule.

The Central Dogma (diagram): DNA replication; transcription of DNA into RNA; reverse transcription of RNA into DNA by reverse transcriptase (retroviruses); translation of RNA into protein (protein synthesis).

Protein synthesis: translation
• Translation takes place in the ribosomes
• A triplet of mRNA bases is a codon
• A triplet of tRNA bases is an anticodon

Translation: Nirenberg & Khorana

Protein synthesis

Mapping DNA into Proteins
The genetic source is characterized by a four-letter alphabet: $N = \{U, C, A, G\}$.
Input alphabet: $N^3 = \{(n_1, n_2, n_3) \mid n_i \in N,\ i = 1, 2, 3\}$.
Output alphabet: $A := \{$Leu, Pro, Arg, Gln, His, Ser, Phe, Trp, Tyr, Cys, Asn, Lys, Ile, Met, Thr, Asp, Glu, Gly, Ala, Val, Stop$\}$.
High-redundancy map: $GC: N^3 \to A$, with $\|N^3\| = 64$ and $\|A\| = 21$.
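
The 64-to-21 map can be written out explicitly. The sketch below is an illustration, not part of the original slides: it assumes the standard genetic code, stored as a 64-character string in the usual U, C, A, G ordering with one-letter amino acid codes and `*` for Stop. The names `BASES`, `AMINO` and `GENETIC_CODE` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ...
# (third base varies fastest). '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every triplet in N^3 to an element of the output alphabet A.
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Degeneracy: how many codons map to each output symbol.
DEGENERACY = Counter(GENETIC_CODE.values())

if __name__ == "__main__":
    print(len(GENETIC_CODE), "codons ->", len(DEGENERACY), "output symbols")
    print("Codons for Leu (L):", DEGENERACY["L"], "| for Met (M):", DEGENERACY["M"])
```

The redundancy is uneven: a few amino acids are reached from six codons, while Met and Trp are reached from only one each.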

The Genetic Code (by 1st letter, then 2nd letter; 3rd letters in parentheses where a box is split)
1st U; 2nd U: Phenylalanine (U, C), Leucine (A, G); 2nd C: Serine; 2nd A: Tyrosine (U, C), Stop (A, G); 2nd G: Cysteine (U, C), Stop (A), Tryptophan (G)
1st C; 2nd U: Leucine; 2nd C: Proline; 2nd A: Histidine (U, C), Glutamine (A, G); 2nd G: Arginine
1st A; 2nd U: Isoleucine (U, C, A), Methionine/Start (G); 2nd C: Threonine; 2nd A: Asparagine (U, C), Lysine (A, G); 2nd G: Serine (U, C), Arginine (A, G)
1st G; 2nd U: Valine; 2nd C: Alanine; 2nd A: Aspartic acid (U, C), Glutamic acid (A, G); 2nd G: Glycine

• “Analogy would lead me one step further, namely, to the belief that all animals and plants have descended from some one prototype [...] All living things have much in common, in their chemical composition, their germinal vesicles, their cellular structure, and their laws of growth and reproduction [...] Probably all the organic beings which have ever lived on this earth have descended from some one primordial form, into which life was first breathed [...] From so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.” CHARLES DARWIN (1859), On the Origin of Species

DNA: similarities
• Similarity between the DNA of two humans: 99 to 99.1%
• Similarity between humans and chimpanzees: 98.5%
• Only ~2% of the human genome codes for proteins: of $3 \times 10^9$ bp, about 60 Mbp are coding; at 2 bits per base this is 120 Mb, and 120 Mb/(8 b/B) = 15 MB
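
The storage figure on this slide follows from a short calculation. The sketch below reproduces it under one stated assumption that the slide leaves implicit: each base carries 2 bits (four equally likely letters). The variable names are illustrative.

```python
GENOME_BP = 3e9          # human genome size quoted on the slide
CODING_FRACTION = 0.02   # "~2 %" of the genome codes for proteins
BITS_PER_BASE = 2        # assumption: log2(4) bits for a four-letter alphabet

coding_bp = GENOME_BP * CODING_FRACTION             # about 60 million base pairs
coding_megabits = coding_bp * BITS_PER_BASE / 1e6   # about 120 Mb
coding_megabytes = coding_megabits / 8              # about 15 MB

if __name__ == "__main__":
    print(f"{coding_bp / 1e6:.0f} Mbp -> {coding_megabits:.0f} Mb -> {coding_megabytes:.0f} MB")
```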

Is man closer to the gorilla or to the orangutan? Comparison of mitochondrial DNA
• human
• ATA ACC ATG CAC ACT ATA ACC CTA ACC CTG ACT TCC CTA ATT CCC ATC CTT ACC CTC GTT ACC...
• gorilla
• ATA ACT ATG TAC GAT ACC ATA ACC TTA GCC CTA ACT TCC TTA ATT CCC CCT ATC CTT ACC TTC ACT...
• orangutan
• ACA GCC ATG TTT ACC ATA ACT GCC CTC ACC TTA ACT TCC CTA ATC CCC ATT ACC GCT CTC ATT AAC...

1953: first amino acid sequence
Sanger: amino acid sequence of bovine insulin
MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFYTP KARREVEGPQVGALELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQLENYCN

Alternative representations of the genetic code
– Inner-to-outer map
– 2D-Gray genetic map
– Genetic world-chart representations
• DE OLIVEIRA, H. M., SANTOS-MAGALHÃES, N. S., The Genetic Code Revisited: Inner-to-outer Map, 2D-Gray Map, and World-map Genetic Representations, 11th International Conference on Telecommunications (ICT 2004), August 1-7, Fortaleza, Brazil, submitted.
• SANTOS-MAGALHÃES, N. S., BOUTON, E. A., DE OLIVEIRA, H. M., How to Represent the Genetic Code?, Reunião Anual da Sociedade Brasileira de Bioquímica (SBBq), 2004, submitted.

The Inner-to-outer Map
First nucleotide: inner circle. Second nucleotide: surrounding region. Third nucleotide: outer region.
Figure: inner-to-outer map for the genetic code. Homophonemes.

64-QAM modem (de Oliveira)

Binary labelling of the bases: U [11]; A [00]; G [10]; C [01].
Bacteriophage φX174: each binary codeword belongs to a constant-weight code.
DNA base pair; codeword
G...C; 01 10
A...T; 00 11
G...C; 01 10
T...A; 11 00
A...T; 00 11
T...A; 11 00
G...C; 01 10
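
The constant-weight property can be checked directly from the labelling. The sketch below is illustrative: it treats T as carrying the label of U, and it assumes each base pair is encoded as the label of one base followed by the label of its complement. The bit order within a pair is an assumption and does not affect the weight. `BITS`, `COMPLEMENT`, `pair_codeword` and `weight` are hypothetical names.

```python
# 2-bit labels from the slide; T (DNA) takes the label given for U.
BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pair_codeword(base: str) -> str:
    """Return the 4-bit word for a base pair: label of the base, then of its complement."""
    return BITS[base] + BITS[COMPLEMENT[base]]


def weight(word: str) -> int:
    """Hamming weight: the number of 1 bits in the word."""
    return word.count("1")


if __name__ == "__main__":
    for base in "GAGTATG":  # the strand shown on the slide
        word = pair_codeword(base)
        print(f"{base}...{COMPLEMENT[base]}  {word}  weight={weight(word)}")
```

Because complementary bases carry complementary labels, every base pair contributes exactly two 1 bits, whatever the sequence.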

2D-Gray representation (de Oliveira, Santos-Magalhães, 2004)

Genetic code: mapping of the amino acids (Santos-Magalhães, E. Bouton, de Oliveira, 2004)

Coloured 2D-Gray genetic map (coloured genetic code map for amino acids)
Val; Ile; Thr; Ala; Val; Ile
Phe; Leu; Pro; Ser; Phe; Leu
Leu; Leu; Pro; Ser; Leu; Leu
Trp; Arg; Gln; Stop; Trp; Arg
Cys; Arg; His; Tyr; Cys; Arg
Gly; Ser; Asn; Asp; Gly; Ser
Gly; Arg; Lys; Glu; Gly; Arg
Val; Met Ile; Thr; Ala; Val; Met Ile
Val; Ile; Thr; Ala; Val; Ile
Phe; Leu; Pro; Ser; Phe; Leu
This representation merges regions mapped into the same amino acid!

Nirenberg-Khorana's Earth: continents
Regions of essential amino acids correspond to land; nonessential amino acids constitute the ocean.

Exons and introns
http://www.dnalc.org/resources/3d/rna-splicing.html

Removing the introns during transcription

Stretch of DNA from human β-hemoglobin (reading frames)
• ...ACA GAC ACC ATG GTC CAC CTT GAC...
• ...CAG ACA CCA TGG TGC ACC TGG...
• ...AGA CAC CAT GGT GCA CCT TGA...
Genes of the β subunit of hemoglobin (2 genes). Figure: genes B and A, with segment lengths 90 bp, 131 bp, 222 bp, 851 bp and 126 bp.
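
The three reading frames of a strand are obtained by starting the triplet grouping at offsets 0, 1 and 2. The sketch below applies that rule to the first line of the slide only; the slide's second and third lines differ from the computed frames in a few bases, so no attempt is made to reproduce them exactly. `reading_frames` is a hypothetical helper.

```python
def reading_frames(seq: str) -> list[list[str]]:
    """Split a sequence into complete codons for each of the three forward frames."""
    return [
        [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    ]


# First reading frame from the slide, written without spaces.
SEQ = "ACAGACACCATGGTCCACCTTGAC"
FRAMES = reading_frames(SEQ)

if __name__ == "__main__":
    for offset, codons in enumerate(FRAMES):
        print(f"frame {offset + 1}:", " ".join(codons))
```

Only the first frame contains the start codon ATG in phase, while the third frame ends in the stop codon TGA, which is why choosing the frame matters for decoding.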

Portion of the DNA of the HIV-1 genome
• GGG TTC TTG GGA GCA GGA AGC ACT ATG GGC GCA...
• Cancer is caused by agents (carcinogens, radiation, viruses) that damage DNA or interfere with its replication and/or repair mechanisms.

Genomic analysis: spectrum for locating exons (gene F56F11.4)

Wavelet analysis of genomic sequences: c-myb oncogene (chicken), 8,200 bp; human β-cardiac, 6,000 bp

Genoma Music - Body Music, Susumu Ohno. URL: http://www.toshima.ne.jp/~edogiku/FlaMovIntro/

DNA of bacteriophage φX174
• 5,386 bp, 10 genes (A to K)
Gene; number of amino acids; length; reading frame
A; 455; 1,539 bp; 2
B; 120; 360 bp; 1
C; 86; 258 bp; 1
D; 152; 456 bp; 3
E; 91; 273 bp; 1
F; 427; 1,281 bp; 2
G; 175; 525 bp; 1
H; 328; 984 bp; 3
J; 38; 114 bp; 2
K; 56; 168 bp; 3
Total: 5,958 bp
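
The gene lengths on this slide add up to more than the genome itself, which is only possible if genes overlap in different reading frames. The sketch below simply redoes that sum with the figures from the slide; the variable names are illustrative.

```python
GENOME_BP = 5386  # length of the phiX174 genome quoted on the slide

# Gene lengths in base pairs, as listed on the slide.
GENE_BP = {
    "A": 1539, "B": 360, "C": 258, "D": 456, "E": 273,
    "F": 1281, "G": 525, "H": 984, "J": 114, "K": 168,
}

total_gene_bp = sum(GENE_BP.values())
excess_bp = total_gene_bp - GENOME_BP  # bases that must be shared between genes

if __name__ == "__main__":
    print(f"sum of gene lengths: {total_gene_bp} bp")
    print(f"excess over the genome: {excess_bp} bp")
```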

Genes in the DNA of bacteriophage φX174

GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAG TGGACTGCTGGCGGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTGTCAAAAACTGACGCGTTG GATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTAGATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGA 
TTACTATCTGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTTTCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGG CTTCTGCCGTTTTGGATTTAACCGAAGATGATTTCGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATG GTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCGTCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGG AAGGCGCTGAATTTACGGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTACGCGCAGGAAACACTGACGTTCTTACT GACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAGTGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTA CTAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGC GCCGAGCGTATGCCGCATGACCTTTCCCATCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGACTCCTTCGAGATGGA CGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTACTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGAT TAAGTTCATGAAGGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTTGGCACGATTAACCCTGATACCAATAAAATCCC TAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACAACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGCTCGTTAT GGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAA GCTGCTTATGCTAATTTGCATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAACCTCTTATGACGCTGACAACCGTCCTT TACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTTGATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGC CGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGACTAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATA CCGATATTGCTGGCGACCCTGTTTTGTATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGTTTAAGATTGCTGAGGGT CAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTTCCCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGC CACCATGATTATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTTATCGCAATCTGCCGACCACTCGCGATTCAATCATG 
ACTTCGTGATAAAAGATTGAGTGTGAGGTTATAACGCCGAAGCGGTAAAAATTTTTGCCGCTGAGGGGTTGACCAAGCGCGGTAGGTTTTCTGCTTAGGAGT TTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGTTCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGC TACATCGTCAACGTTATATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTGTCAACGCCGCTAATCAGGTTGTTTCTGTT GGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGCCTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTGAATG GTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGCCGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACC GCTACTAAATGCCGCGGATTGGTTTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTGCTATTGCTGGCGGTATTGCTT CTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAAAGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTACCGATAACAATACTGTAGGCATGGG TGATGCTGGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTGGTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGG TACGTTGCAGGCTGGCACTTCTGCCGTTTCTGATAAGTTGCTTGATTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGATACTCGTGATTATCTTGCTGCTG CATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGATGCTTCCTCTGCTGGTATGGTTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAAT CAGAAAGAGATTGCCGAGATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGACCAGGTATATGCACAAAATGAGATGC TTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTATGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTC AAACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGACTTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCT CATATTGGCGCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACAATTTCT GGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCGTCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTA TGGTTCTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTATTGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCC AATGCTTGGCTTCCATAAGCAGATGGATAACCGCATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTTGACGGCC 
ATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGAATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGA CCACCGCCCCGAAGGGGACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGCTGAACGCCCTCTTAAGGATATTCGCGATGAGTAT AATTACCCCAAAAAGGTATTAAGGATGAGTGTTCAAGATTGCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCGACA GGCTCATGCTGATGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTTATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGG TCGTATGGTTCTTGCTGCCGAGGGTCGCAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGCCGTCTTCATTTCCATG CGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTCGTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTAC AGTATGCCCATCGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAGCTACCAGTTATATGGCTGTTGG TTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACTTCCCAA GAAGCTGTTCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACG CCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACCTGTGACGACAAATCTGCTCA AATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA

Genome sizes
• Smallest number of genes: Mycoplasma genitalium, 470 genes
• Human genome: ~120,000 genes (as was once wrongly believed!)

Bacteriophage φX174

ORDER OF MAGNITUDE OF GENOMES (base pairs = bp)
Viruses: 10 kbp (SV40 5 k, T2 48.6 k...)
Bacteria: 4 Mbp (E. coli 4.7 Mb)
Yeast: 9 Mbp
Nematode: 90 Mbp
Insects: 0.2 - 7.5 Gbp
Fruit fly: 180 Mbp
Mammals: 1.4 - 5.7 Gbp (man 3.2 Gbp)
Lungfish: 140 Gbp
Mustard weed: 200 Mbp
Pine: 68 Gbp
Amoeba dubia: 670 Gbp

The C-value PARADOX
• C-value = amount of DNA in an organism's haploid genome
• Many less complex organisms have surprisingly high C-values.
• Does the “extra” DNA have a function? If not, why is it preserved from generation to generation?

Gene; disease; length
• Human β-globin; sickle-cell anemia; 2,000 bp
• Human Factor VIII; hemophilia; 200,000 bp
• Protein kinase; muscular dystrophy; 3,407 bp

Given that the identity of living things is provided by the genetic substrate, the hypothesis “species are sparse” (Battail) seems valid.
• Number of living species on Earth: ~$10^7$.
• Assume these are 1/100 of all the species that have ever existed (extinction): this gives ~$10^9$ species (apparently large...).
• This is ridiculously small compared with the total number of possible genomes in the absence of redundancy: $4^{10^9} \approx 10^{6 \times 10^8}$ (for a typical genome of $10^9$ nucleotides).
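
The gap between the number of species and the number of conceivable genomes is easiest to see on a logarithmic scale. The sketch below computes the base-10 exponent of $4^{10^9}$ without ever building the number itself; the names are illustrative.

```python
import math

GENOME_LENGTH = 10**9   # nucleotides in a "typical" genome, as on the slide
SPECIES_EXPONENT = 9    # about 10^9 species, living plus extinct

# log10(4 ** GENOME_LENGTH) = GENOME_LENGTH * log10(4)
genomes_exponent = GENOME_LENGTH * math.log10(4)

if __name__ == "__main__":
    print(f"possible genomes ~ 10^{genomes_exponent:.3g}")
    print(f"species          ~ 10^{SPECIES_EXPONENT}")
```

The exponent itself is about 600 million, so realized species occupy a vanishingly small fraction of the sequence space.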

A short chronology of genomes
• 1977: complete sequencing of the genome of phage φX174 (5,386 bp)
• 1995: first living organism, genome of Haemophilus influenzae (1.8 Mbp)
• 1996: Saccharomyces cerevisiae (12.1 Mbp)
• 1997: Escherichia coli (4.6 Mbp)
• 1998: first animal, a nematode, genome of Caenorhabditis elegans (97.1 Mbp)
• 1999: first human chromosome, chromosome 22 (33.4 Mbp)
• 2000: Drosophila melanogaster (120 Mbp)
• 2000: chromosomes 5, 16, 19, 21
• 1988-2000: Human Genome Project
• June 2000: milestone draft sequence

“Everything is in the genes”... or not!
• For a long time, genetics was reduced to this paradigm. Indeed, after the discovery of the structure of DNA, one scheme came to prevail:
• The structure of DNA is similar to a computer program in which the gene, by coding for proteins, determines the appearance of living organisms and governs most of their behaviour.

Reductionism: a warning from Andras Paldi (CNRS)
• The tremendous reductionism of genetics researchers ends up treating the living being as a strict sum of juxtaposed elements.
• By compiling a catalogue of proteins we risk making the problem worse. It is as if we tried to understand how a rocket works by reading its parts catalogue!

Of Protein Size and Genomes
NEREIDE S. SANTOS-MAGALHÃES, HÉLIO M. DE OLIVEIRA
WSEAS Trans. on Biology and Biomedicine, Issue 2, Vol. 3, February 2006, ISSN: 1109-9518 (~250 Academia downloads)
What is the number of genes in living organisms?
1) Bacterial genomes: the number of genes is approximately the genome size in kbp. The size distribution of bacterial proteins reveals 350 amino acid residues as typical.
2) C. elegans has a genome of 99 Mbp and a genomic rate of 25%. Its protein size distribution has an average polypeptide length of 469 amino acids.

• Human proteins: serum albumin has 609 amino acid residues, collagen about 1,000, apolipoprotein B 4,536, and human titin 26,926.
A DNA code is specified by the triplet DNA(C, R, d), where C is the genome size (bp), R is the genomic rate and d is the coding density (bp/gene). The genomic rate is
$$R = \frac{\text{number of protein-coding base pairs}}{\text{total number } C \text{ of base pairs of the genome}}.$$

Further DNA parameters: g is the number of genes of the genome; e is the average number of exons per gene.

Coding density, estimated in terms of the expected protein size (bp/gene):
• the average bacterial protein is ~300 amino acids long;
• the bacterial genomic rate is ~0.8 to 0.9.
Bacteria therefore usually have a coding density d ≈ 1,000 bp/gene, and the number of genes for bacteria is g ≈ C/1,000. This is strikingly confirmed at http://www.cbs.dtu.dk/services/GenomeAtlas/ and http://www.cbs.dtu.dk/services/GenomeAtlas-2.0/show-databas
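
The rule of thumb g ≈ C/1,000 follows from the two bullet points: about 900 coding base pairs per gene, divided by a genomic rate close to 0.9. The sketch below is a rough illustration under those assumptions. The genome sizes and gene counts used for comparison are the ones listed for H. influenzae and E. coli in the parameter table later in these slides; `estimated_genes` is a hypothetical helper.

```python
AA_PER_PROTEIN = 300   # typical bacterial protein length from the slide
GENOMIC_RATE = 0.9     # upper end of the 0.8 to 0.9 range on the slide

# Each amino acid takes 3 bp; dividing by R spreads the non-coding part over the genes.
CODING_DENSITY = 3 * AA_PER_PROTEIN / GENOMIC_RATE   # about 1,000 bp/gene


def estimated_genes(genome_bp: float, density: float = CODING_DENSITY) -> float:
    """Rough gene count for a bacterial genome of the given size."""
    return genome_bp / density


if __name__ == "__main__":
    # (genome size in bp, gene count listed in the slides)
    for name, (c, listed) in {
        "H. influenzae": (1.83e6, 1709),
        "E. coli": (4.64e6, 4289),
    }.items():
        print(f"{name}: estimated {estimated_genes(c):.0f}, listed {listed}")
```

The estimate lands within about ten percent of the listed counts, which is the level of agreement the slide describes.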

Protein size histograms for simple organisms: the φX174 and phage λ viruses

C. elegans

The coding density of the different chromosomes of lower eukaryotic species is roughly the same, i.e. it fluctuates only slightly from one chromosome to another in the same organism. S. cerevisiae (C = 12,057,849 bp, g = 6,268 genes) has an average coding density of 1,947 bp/gene over 15 chromosomes.
S. cerevisiae coding density (bp/gene) per chromosome:
Chr 1: 2,093; Chr 2: 1,918; Chr 3: 1,855; Chr 4: 1,870; Chr 5: 2,090; Chr 6: 2,144; Chr 7: 1,891; Chr 8: 2,017; Chr 9: 1,864; Chr 10: 1,906; Chr 11: 1,960; Chr 12: 1,989; Chr 13: 1,841; Chr 14: 1,854; Chr 15: 1,908
Average: 1,947 bp/gene (from http://www.cbs.dtu.dk/services/GenomeAtlas)
The coefficient of variation (CV %) of the coding density is 5.06%.
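
The coefficient of variation quoted for S. cerevisiae can be recomputed from the per-chromosome values. The sketch below assumes the sample standard deviation (n − 1 in the denominator), which is the convention that reproduces the slide's figure; the names are illustrative.

```python
from statistics import mean, stdev

# Coding density in bp/gene for chromosomes 1 to 15, as listed on the slide.
DENSITY = [2093, 1918, 1855, 1870, 2090, 2144, 1891, 2017,
           1864, 1906, 1960, 1989, 1841, 1854, 1908]

average = mean(DENSITY)
cv_percent = 100 * stdev(DENSITY) / average  # sample standard deviation over the mean

if __name__ == "__main__":
    print(f"average = {average:.0f} bp/gene, CV = {cv_percent:.2f} %")
```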

The six chromosomes of C. elegans (C = 98,971,533 bp, g = 17,585 genes) present an average coding density of 5,731 bp/gene.
Coding density (bp/gene) for the six chromosomes (I, II, III, IV, V, X): 5,072; 5,592; 5,771; 6,312; 4,899; 6,740
Average: 5,731 bp/gene (from http://www.cbs.dtu.dk/services/GenomeAtlas)
The coding density barely varies from one chromosome to another. The coefficient of variation (CV %) of the coding density is 1.72%.

DNA parameters for some well-known genomes:
• virus: φX174
• microbial: M. genitalium, H. pylori, H. influenzae, S. aureus, B. subtilis, M. tuberculosis, E. coli, X. fastidiosa

Organism; genome size C (Mbp); coding density d (bp/gene); number of genes g; genomic rate R; average protein length (aa); genomic information (Mbits); redundancy 1−R (%)
φX174; 0.0054; 538; 10; 1.00; 180; 0.01; ~0
bacteriophage λ; 0.0485; 683; 71; 0.95; 216; 0.09; 5
M. genitalium; 0.58; 1,208; 480; 0.90; 363; 1.04; 10
H. pylori; 1.67; 1,066; 1,566; 0.89; 316; 2.97; 11
H. influenzae; 1.83; 1,071; 1,709; 0.86; 307; 3.15; 14
S. aureus; 2.80; 1,069; 2,619; 0.84; 299; 4.70; 16
B. subtilis; 4.21; 1,025; 4,106; 0.87; 297; 7.32; 13
M. tuberculosis; 4.41; 1,126; 3,918; 0.97; 364; 8.56; 3
E. coli; 4.64; 1,082; 4,289; 0.87; 314; 8.08; 13
X. fastidiosa; 2.52; 1,238; 2,034; 0.78; 322; 3.93; 22
S. cerevisiae; 12.06; 1,924; 6,268; 0.70; 450; 17.3; 30
C. elegans; 99; 5,628; 17,585; 0.25; 469; 49.5; 75
D. melanogaster; 180 (~60*, C' = 120); d ~ 13,235 (d' ~ 8,823); 13,600; 0.13; 573; 46.8; 87
Human (old); ~3,000 (1,000*, C' = 2,000); d ~ 30,000 (d' ~ 20,000); 100,000?; ~0.03; ~300?; ~180.0?; ~97?
Human (update); ~2,900 (967*, C' = 1,933); d ~ 112,500 (d' ~ 75,000); ~25,800; ~0.016; ~600; ~92.9; ~98.4
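
The columns of this table appear to be tied together by two simple relations that the slide does not state explicitly, so both are assumptions here: average protein length ≈ d·R/3 (three base pairs per amino acid) and genomic information ≈ 2·C·R (two bits per coding base). The sketch below checks them on three rows; the function names are hypothetical.

```python
# (C in Mbp, d in bp/gene, R, listed protein length in aa, listed information in Mbits)
ROWS = {
    "phiX174":    (0.0054, 538, 1.00, 180, 0.01),
    "E. coli":    (4.64, 1082, 0.87, 314, 8.08),
    "C. elegans": (99.0, 5628, 0.25, 469, 49.5),
}


def protein_length(d: float, r: float) -> float:
    """Assumed relation: coding base pairs per gene divided by 3 bp per residue."""
    return d * r / 3


def information_mbits(c_mbp: float, r: float) -> float:
    """Assumed relation: 2 bits for each protein-coding base pair."""
    return 2 * c_mbp * r


if __name__ == "__main__":
    for name, (c, d, r, aa, mbits) in ROWS.items():
        print(f"{name}: {protein_length(d, r):.0f} aa (listed {aa}), "
              f"{information_mbits(c, r):.2f} Mbits (listed {mbits})")
```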

1) An unsuccessful attempt to explain the complexity of living beings:
• the genome length. The so-called C-value paradox showed that this is incorrect.
2) The number of genes was then supposed to be related to complexity.
• This led people to expect more genes than humans actually have.
• A figure of about 100,000 was widespread from the 1980s to the late 1990s.
3) A potential measure that correlates with complexity:
• the average protein size.

Storing all the genes of a single human requires less than 10 MB (although the entire human DNA sequence requires about 1 GB).
Let C' and d' denote the genome size and the coding density once highly repetitive sequences are excluded. About one third of higher eukaryotic DNA corresponds to these sequences, which are not transcribed but may have structural properties. Therefore C' = 2C/3 and d' = 2d/3. The prime refers to the expurgated genome, i.e. with the highly repeated sequences set aside.
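
The 2/3 rule can be tried on the figures the slides give for D. melanogaster (C = 180 Mbp, d ≈ 13,235 bp/gene) and for the older human estimate (C ≈ 3,000 Mbp, d ≈ 30,000 bp/gene). This is a minimal sketch; `expurgated` is a hypothetical helper name.

```python
def expurgated(value: float) -> float:
    """Apply the 2/3 rule: remove the one third attributed to highly repetitive DNA."""
    return 2 * value / 3


if __name__ == "__main__":
    print(f"D. melanogaster: C' = {expurgated(180):.0f} Mbp, d' = {expurgated(13235):.0f} bp/gene")
    print(f"Human (old):     C' = {expurgated(3000):.0f} Mbp, d' = {expurgated(30000):.0f} bp/gene")
```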

Expected gene distribution in the 23 human chromosomes
Chromosome; length (bp); predicted genes (unveiled genes)
Chr 1; 226,828,929; 2,016
Chr 2; 205,000,000; 1,822 (1,346)
Chr 3; 195,073,306; 1,734
Chr 4; 115,000,000; 1,022 (796)
Chr 5; 117,696,509; 1,046 (923)
Chr 6; 169,212,327; 1,504 (1,557)
Chr 7; 310,210,944; 1,367a (1,150)
Chr 8; 143,297,300; 1,274
Chr 9; 117,790,386; 1,047 (1,149)
Chr 10; 132,016,990; 1,173 (816)
Chr 11; 130,908,954; 1,163
Chr 12; 129,826,379; 1,154
Chr 13; 90,000,000; 800 (633)
Chr 14; 87,191,216; 775 (1,050)
Chr 15; 81,992,482; 729
Chr 16; 79,932,432; 711 (880)
Chr 17; 79,376,966; 705
Chr 18; 74,658,403; 663
Chr 19; 55,878,340; 497b (1,461)
Chr 20; 59,424,990; 528 (727)
Chr 21; 33,924,367; 301c (225)
Chr 22; 34,352,072; 305 (545)
Chr X; 152,118,949; 1,352 (1,098)
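
The predicted counts in this table look like the chromosome length divided by a single coding density. The sketch below assumes d = 112,500 bp/gene, the updated human value quoted in the parameter table; that choice is an inference from the numbers, not something the slide states. The names are illustrative.

```python
D_HUMAN = 112_500  # bp/gene; assumed to be the density behind the predictions

# Chromosome lengths in bp, taken from the slide.
LENGTH_BP = {
    "Chr 1": 226_828_929,
    "Chr 3": 195_073_306,
    "Chr 19": 55_878_340,
    "Chr X": 152_118_949,
}

PREDICTED = {name: round(bp / D_HUMAN) for name, bp in LENGTH_BP.items()}

if __name__ == "__main__":
    for name, genes in PREDICTED.items():
        print(f"{name}: about {genes} genes")
```

Where the unveiled counts in parentheses depart strongly from the prediction, as for chromosome 19, the chromosome is richer or poorer in genes than a uniform density would suggest.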

Gene distribution in human chromosomes:
• Genome size C = 2.881 Gbp;
• Number of genes g = 22,525.
The mean gene size (bp) in each chromosome follows. Figure: human karyogram.

Human chromosomes: mean lengths
Chrom. [ref.]; C (bp); genes & pseudogenes (only genes); mean exon length (bp); mean intron length (bp); exons per gene e; mean gene size (kbp)
Chr 2 [27]; 237,000,000; 2,585 (1,346); --; --; 5.30; 33.8
Chr 4 [27]; 186,000,000; 1,574 (796); --; --; 6.60; 34.3
Chr 6 [28]; 166,800,000; 2,190 (1,557); 318; 7,208; 5.28; 32.5
Chr 9 [29]; 109,044,351; 1,575 (1,149); 342; 6,799; 5.77a; 34.4
Chr 10 [30]; 131,666,441; 1,357 (816); 322; 7,817; 5.84; 39.7
Chr 13 [31]; 95,500,000; 929 (633); 320; 9,164; 5.20; 40.2
Chr 14 [32]; 87,410,661; 1,443 (1,050); 295; 8,194; 6.35a; 45.7
Chr 20 [33]; 59,187,298; 895 (727); 292; 5,170; 6.00; 27.2
Chr 22 [34]; 34,491,000; 679 (545); 266; 4,037; 5.40; 19.2

The average number of amino acid residues (L) and the genomic rate (R) are shown.
Chrom.; L (aa); R (%)
Chr 6; 560; 1.56
Chr 9; 658; 1.79
Chr 10; 627; 1.17
Chr 13; 555; 1.10
Chr 14; 624; 2.36
Chr 20; 584; 2.15
Chr 22; 479; 1.82

CONCLUSIONS
• average exon length: about 300 bp;
• average intron length: about 6,900 bp;
• a mean of about 6 exons per gene (ranging from single-exon genes to 175 exons for the titin gene!);
• average number of residues of coded proteins: ~600 aa.
Average protein size is a worthy criterion for assessing the complexity of life.
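
These conclusions can be traced back to the per-chromosome figures on the earlier slide of mean lengths. The sketch below averages the seven chromosomes for which all three quantities are listed (6, 9, 10, 13, 14, 20, 22). Reading the three numeric columns as exon length, intron length and exons per gene is an assumption, since the column headers did not survive extraction.

```python
from statistics import mean

# Values for chromosomes 6, 9, 10, 13, 14, 20 and 22 (column meanings assumed).
EXON_BP = [318, 342, 322, 320, 295, 292, 266]
INTRON_BP = [7208, 6799, 7817, 9164, 8194, 5170, 4037]
EXONS_PER_GENE = [5.28, 5.77, 5.84, 5.20, 6.35, 6.00, 5.40]

mean_exon = mean(EXON_BP)
mean_intron = mean(INTRON_BP)
mean_exons_per_gene = mean(EXONS_PER_GENE)
# Coding length per gene divided by 3 bp per residue.
mean_protein_aa = mean_exon * mean_exons_per_gene / 3

if __name__ == "__main__":
    print(f"exon ~ {mean_exon:.0f} bp, intron ~ {mean_intron:.0f} bp")
    print(f"exons per gene ~ {mean_exons_per_gene:.1f}, protein ~ {mean_protein_aa:.0f} aa")
```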

DNA-Error Control Code May Be Unstructured
H. M. DE OLIVEIRA, N. S. SANTOS-MAGALHÃES
The astonishing reliability with which deoxyribonucleic acid (DNA) has been preserved through the ages implies that the cell's replication machinery has to insure against copying mistakes. The replication machine is self-correcting and operates with a mean of 1 error per $10^7$ nucleotides copied. Around 99% of such errors are corrected by the DNA mismatch repair mechanism, resulting in 1 error per $10^9$ nucleotides copied.
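
The two error rates are linked by a one-line calculation: if 99% of the raw errors are repaired, 1% remain. The sketch below also estimates how many errors would be expected in one copy of a genome of about $3 \times 10^9$ bp, a size borrowed from the earlier slide on similarities as an illustrative assumption.

```python
RAW_ERROR_RATE = 1e-7      # errors per nucleotide copied, before mismatch repair
REPAIRED_FRACTION = 0.99   # share of those errors fixed by mismatch repair
GENOME_BP = 3e9            # assumption: approximate human genome size

residual_error_rate = RAW_ERROR_RATE * (1 - REPAIRED_FRACTION)   # about 1e-9
expected_errors_per_copy = residual_error_rate * GENOME_BP       # about 3

if __name__ == "__main__":
    print(f"residual rate ~ {residual_error_rate:.1e} per nucleotide")
    print(f"expected errors per genome copy ~ {expected_errors_per_copy:.1f}")
```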

Introns & exons
Most eukaryotic genes have their coding sequences interrupted by noncoding regions (the so-called introns, for intervening sequences). Introns are usually longer than exons.
INTRONS: size ranging from 20 bp to 250,000 bp.
EXONS: size ranging from 50 to 600 bp (average 300 bp).
Attempts to understand the biological role of introns: no recognized functions were found.

Highly repetitive sequences: SINEs (short interspersed elements), 13% of the genome; LINEs (long interspersed elements), 21% of the genome. Repetitive DNA has commonly been regarded as “junk DNA”. Noncoding DNA: introns, 26% of the human genome.
• Viruses and bacteria have high fecundity and few gene families => they have little or almost no need for protection.
• Plants and animals have high permanency => they must be robust to mutations (survivors of natural selection).

Standard error-correcting codes => designed by imposing constraints on the sequences.
Why use structured codes? Answer: the (mistaken) belief that decoding a random code is unfeasible, since the lack of structure would require an exhaustive search.
We think that Darwinian mechanisms for protecting DNA may be quite different. No parity rules should be looked for! (HMdO)

We believe that introns were the spontaneous mechanism for introducing uncertainty.
• In a battle, a crucial payload is to be sent to the front. If the only way is to send it through the battlefield, it should not be dispatched directly. Many fake cargos can be added, and the relevant one hidden among them. Even if the enemy (noise, mutation) tries hard to intercept this crucial delivery, he will probably not succeed because of the amount of uncertainty added to the process. Many ineffective cargos (junk cargos, or introns) will be hit, but the main one will probably be missed.
• The same strategy is used to safeguard authorities such as the presidents of some nations (uncertain routes and doubles).

DNA coding has a trivial decoding scheme (an asynchronous start-stop protocol).
• The DNA code meets Battail's close-to-random criterion.
• Biological evolutionary codes match Shannon's paradigm: they are long, truly random codes.
We quote Battail: “Nature appears as an outstanding engineer…”

CLOSING REMARK: This seminar is essentially a provocation! If Statistics deals with large masses of data (data already available) and with inherently random behaviour, then the publicly available genome databases are a source of challenges for excellent work and discoveries.
Thank you...
http://www2.ee.ufpe.br/codec/deOliveira.html

Ribonucleic acids: types