
  • Number of slides: 19

Multiple Sequence Alignment: Dynamic Programming

Multiple Sequence Alignment
VTISCTGSSSNIGAG NHVKWYQQLPG
VTISCTGTSSNIGS ITVNWYQQLPG
LRLSCSSSGFIFSS YAMYWVRQAPG
LSLTCTVSGTSFDD YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG
ATLVCLISDFYPGA VTVAWKADS
AALGCLVKDYFPEP VTVSWNSG-
VSLTCLVKGFYPSD IAVEWESNG-
• Goal: Bring the greatest number of similar characters into the same column of the alignment
• Similar to alignment of two sequences.

CLUSTALW MSA of four oxidoreductase NAD binding domain protein sequences. Red: AVFPMILW. Blue: DE. Magenta: RHK. Green: STYHCNGQ. Grey: all others. Residue ranges are shown after sequence names. Chenna et al., Nucleic Acids Research, 2003, Vol. 31, No. 13, 3497-3500

Multiple Sequence Alignment: Motivation
• Correspondence: find out which parts “do the same thing”
  – Similar genes are conserved across widely divergent species, often performing similar functions
• Structure prediction
  – Use knowledge of the structure of one or more members of a protein MSA to predict the structure of other members
  – Structure is more conserved than sequence
• Create “profiles” for protein families
  – Allow us to search for other members of the family
• Genome assembly: automated reconstruction of “contig” maps of genomic fragments such as ESTs
• MSA is the starting point for phylogenetic analysis

Multiple Sequence Alignment: Approaches
• Optimal global alignments: dynamic programming
  – Generalization of Needleman-Wunsch
  – Find the alignment that maximizes a score function
  – Computationally expensive: time grows as the product of the sequence lengths
• Global progressive alignments: match closely related sequences first using a guide tree
• Global iterative alignments: multiple re-building attempts to find the best alignment
• Local alignments
  – Profiles, blocks, patterns

Scoring a multiple alignment
[Figure: one alignment column of A and C residues scored three ways: sum of pairs, star, and tree]

Sum of Pairs
Example with a match scoring α and a mismatch scoring -β:
AAA
AAA
AAA
AAC
ACC
Column 1 (A A A A A): 10α
Column 2 (A A A A C): 6α - 4β
Column 3 (A A A C C): 4α - 6β
Total: 10α + (6α - 4β) + (4α - 6β) = 20α - 10β

Sum-of-Pairs Scoring Function
Score of multiple alignment = $\sum_{i<j} \mathrm{score}(S_i, S_j)$
where $\mathrm{score}(S_i, S_j)$ is the score of the pairwise alignment of $S_i$ and $S_j$ induced by the multiple alignment
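The sum-of-pairs score can be sketched in a few lines of Python. This is only an illustration: the function name `sp_score`, the default values, and the `gap_penalty` parameter are assumptions (the slides only define a match score α and a mismatch score -β), and the five example rows are the ones consistent with the per-column scores 10α, 6α - 4β and 4α - 6β.

```python
from itertools import combinations

def sp_score(rows, alpha=1, beta=1, gap_penalty=0):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    alpha: score for a matching pair, beta: penalty for a mismatching pair.
    gap_penalty is a hypothetical extra parameter for residue-vs-gap pairs;
    a pair of two gaps contributes nothing.
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*rows):                  # walk the alignment column by column
        for a, b in combinations(column, 2):   # every unordered pair in the column
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total -= gap_penalty
            elif a == b:
                total += alpha
            else:
                total -= beta
    return total

rows = ["AAA", "AAA", "AAA", "AAC", "ACC"]
print(sp_score(rows, alpha=3, beta=1))  # 20*3 - 10*1
```

Summing over columns and then over pairs gives the same total as summing the induced pairwise scores over all pairs, because each pair of rows contributes once per column.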

Induced Pairwise Alignment
S1  S-TISCTG-S-NI
S2  L-TI-CNGSS-NI
S3  LRTISCSGFSQNI
Induced pairwise alignment of S1, S2 (columns where both rows have a gap are dropped):
S1  STISCTG-SNI
S2  LTI-CNGSSNI
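A small Python sketch of the induced pairwise alignment, using the three rows from this slide. The function name `induced_pairwise` is made up for illustration; the only rule applied is that a column is dropped when both selected rows have a gap in it.

```python
def induced_pairwise(msa, i, j):
    """Return rows i and j of the alignment with all-gap columns removed."""
    kept = [(a, b) for a, b in zip(msa[i], msa[j])
            if not (a == "-" and b == "-")]
    row_i = "".join(a for a, _ in kept)
    row_j = "".join(b for _, b in kept)
    return row_i, row_j

msa = [
    "S-TISCTG-S-NI",  # S1
    "L-TI-CNGSS-NI",  # S2
    "LRTISCSGFSQNI",  # S3
]
s1, s2 = induced_pairwise(msa, 0, 1)
print(s1)  # STISCTG-SNI
print(s2)  # LTI-CNGSSNI
```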

MSA: Dynamic Programming
• The two-sequence alignment algorithm can be generalized to any number of sequences.
• E.g., for three sequences X, Y, W define C[i, j, k] = score of the optimum alignment among X[1..i], Y[1..j], W[1..k]
• As for two sequences, divide the possible alignments into different classes, depending on how they end.
  – Use these classes to devise recurrence relations for C[i, j, k]
  – C[i, j, k] is the maximum over all possibilities

MSA: 7 ways alignment can end for 3 sequences
The prefixes X1...Xi-1, Y1...Yj-1, W1...Wk-1 are aligned first; the last column (X row, Y row, W row) is then one of:
• (Xi, Yj, Wk)
• (Xi, Yj, -)
• (Xi, -, Wk)
• (-, Yj, Wk)
• (Xi, -, -)
• (-, Yj, -)
• (-, -, Wk)

Dynamic programming for three sequences
Each alignment is a path through the dynamic programming matrix.
[Figure: a path from Start through a three-dimensional matrix with axes VSNS, SNA and AS, corresponding to the alignment]
VSN-S
-SNA-
---AS

Dynamic Programming for Three Sequences
• There are 7 ways to get to C[i, j, k], one from each neighboring cell in which at least one index is smaller by 1, e.g. C[i-1, j-1, k-1] or C[i-1, j, k-1]
• Enumerate all possibilities and choose the best one
• For 3 sequences of length n, time is proportional to $n^3$
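The recurrence can be sketched as follows. This is a minimal illustration, not the implementation behind the slides: the scoring values (`MATCH`, `MISMATCH`, `GAP`), the use of sum-of-pairs as the column score, and all function names are assumptions. The code uses 0-based Python strings, so `X[i - 1]` plays the role of Xi.

```python
from itertools import product

# Assumed scores (not given in the slides).
MATCH, MISMATCH, GAP = 1, -1, -2

# The 7 ways a column can end: each sequence contributes a residue (1)
# or a gap (0), excluding the all-gap column.
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]

def pair_score(a, b):
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def column_score(x, y, w):
    # Sum-of-pairs score of one alignment column.
    return pair_score(x, y) + pair_score(x, w) + pair_score(y, w)

def align3_score(X, Y, W):
    """Score of an optimal global alignment of three sequences."""
    neg_inf = float("-inf")
    C = [[[neg_inf] * (len(W) + 1) for _ in range(len(Y) + 1)]
         for _ in range(len(X) + 1)]
    C[0][0][0] = 0
    for i in range(len(X) + 1):
        for j in range(len(Y) + 1):
            for k in range(len(W) + 1):
                for di, dj, dk in MOVES:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # predecessor lies outside the table
                    x = X[i - 1] if di else "-"
                    y = Y[j - 1] if dj else "-"
                    w = W[k - 1] if dk else "-"
                    C[i][j][k] = max(C[i][j][k],
                                     C[pi][pj][pk] + column_score(x, y, w))
    return C[len(X)][len(Y)][len(W)]

print(len(MOVES))                      # 7
print(align3_score("AC", "AC", "AC"))  # 6 under the assumed scores
```

The three nested loops visit every cell once and try 7 predecessors per cell, which is where the cubic running time for three sequences comes from.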

Dynamic Programming MSA: General Case
• For k sequences of length n, the dynamic programming algorithm does $(2^k - 1)\,n^k$ operations
  – Example: 6 sequences of length 100 require about $6.3 \times 10^{13}$ calculations
• Space for the table is $n^k$
• Implementations (e.g., WashU MSA 2.1) use tricks and only search a subset of the dynamic programming table
  – Even this is expensive. E.g., the Baylor CM Search Launcher limits MSA to 8 sequences of 800 characters and 10 minutes of processing time
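The growth of the cost is easy to check numerically. The helper names below are hypothetical; they only evaluate the operation count (2^k - 1) n^k and the table size n^k for k sequences of length n.

```python
def dp_operations(k, n):
    """Number of cell updates: (2^k - 1) predecessors for each of n^k cells."""
    return (2 ** k - 1) * n ** k

def dp_table_cells(k, n):
    """Number of cells in the k-dimensional table."""
    return n ** k

print(dp_operations(6, 100))   # 63000000000000, i.e. 6.3e13
print(dp_table_cells(6, 100))  # 1000000000000, i.e. 1e12
```

Rounding 2^k - 1 up to 2^k gives 6.4e13 for the same example, so either way six sequences of length 100 are already far out of reach for exhaustive dynamic programming.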

Problems with SP scoring
• Pair-wise comparisons can over-score evolutionarily distant pairs.
• Reason: for 3 or more sequences, SP scoring does not correspond to any evolutionary tree.

Overcoming problems with SP scoring
• Use weights to incorporate evolution in sum-of-pairs scoring:
  – Some pair-wise alignments are more important than others
    • E.g., it is more important to have a good alignment between mouse and human sequences than between mouse and bird
  – Assign different weights to different pair-wise alignments
    • Weight decreases with evolutionary distance
• Use a star tree approach: one sequence is designated as the ancestor and all others are compared with it.
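A weighted sum-of-pairs score might look like the sketch below. Everything specific here is an assumption made for illustration: the function names, the scoring defaults, and the representation of weights as a dictionary keyed by row-index pairs. The slides only state that each pair-wise alignment gets its own weight and that the weight should decrease with evolutionary distance.

```python
from itertools import combinations

def pairwise_score(row_a, row_b, alpha=1, beta=1, gap_penalty=2):
    """Score of the pairwise alignment induced by two rows of an MSA."""
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue                 # column vanishes in the induced alignment
        if a == "-" or b == "-":
            score -= gap_penalty
        elif a == b:
            score += alpha
        else:
            score -= beta
    return score

def weighted_sp_score(rows, weights, **scoring):
    """Sum over all pairs i < j of weights[(i, j)] * induced pairwise score."""
    return sum(weights[(i, j)] * pairwise_score(rows[i], rows[j], **scoring)
               for i, j in combinations(range(len(rows)), 2))

rows = ["AA", "AC", "CC"]
# Hypothetical weights: rows 0 and 2 are treated as the most distant pair.
weights = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 1.0}
print(weighted_sp_score(rows, weights))  # -1.0
```

With all weights equal to 1 this reduces to the plain sum-of-pairs score, which is -2 for the same three rows.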

Star Alignments
• Construct multiple alignments using pair-wise alignment relative to a fixed sequence
• Out of a set S = {S1, S2, ..., Sr} of sequences, pick the sequence Sc that maximizes
  star_score(c) = ∑ {sim(Sc, Si) : 1 ≤ i ≤ r, i ≠ c}
  where sim(Si, Sj) is the optimal score of a pair-wise alignment between Si and Sj

Algorithm
1. Compute sim(Si, Sj) for every pair (i, j)
2. Compute star_score(i) for every i
3. Choose the index c that maximizes star_score(c) and make it the center of the star
4. Produce a multiple alignment M such that, for every i, the induced pairwise alignment of Sc and Si is the same as the optimum alignment of Sc and Si.
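Steps 1 to 3 can be sketched as below. The scoring scheme (match +1, mismatch -1, gap -2) and the function names are assumptions; `sim` is a plain Needleman-Wunsch score computed with two rows of the table, and the center is the index with the largest star score, since `sim` is a similarity.

```python
def sim(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b (Needleman-Wunsch)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def star_score(seqs, c):
    """Sum of sim(Sc, Si) over all i different from c."""
    return sum(sim(seqs[c], s) for i, s in enumerate(seqs) if i != c)

def choose_center(seqs):
    """Index of the sequence with the highest star score."""
    return max(range(len(seqs)), key=lambda c: star_score(seqs, c))

seqs = ["AAAA", "AAAT", "TTTT"]
print([star_score(seqs, c) for c in range(len(seqs))])  # [-2, 0, -6]
print(choose_center(seqs))                              # 1
```

This recomputes each `sim` value twice for brevity; caching the pairwise scores in a table would match step 1 more literally.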

Step 4: Detail
Optimal pairwise alignments with the center Sc:
Sc  AA--CCTT        Sc  A-ACC-TT
S1  AATGCC--        S2  AGACCGT-
Merged multiple alignment:
Sc  A-A--CC-TT
S1  A-ATGCC---
S2  AGA--CCGT-
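One way to carry out this merge is to walk through the two gapped copies of the center in parallel and keep every gap that either of them contains. The sketch below assumes the center is row 0 of the growing alignment and that both gapped copies spell the same ungapped center; the function name `merge_pairwise` is hypothetical.

```python
def merge_pairwise(msa, center_row, other_row):
    """Add one sequence to a star alignment.

    msa:        current alignment as equal-length strings, msa[0] is the center
    center_row: the center as it appears in the new pairwise alignment
    other_row:  the new sequence as it appears in that pairwise alignment
    """
    out = [[] for _ in range(len(msa) + 1)]
    i = j = 0
    while i < len(msa[0]) or j < len(center_row):
        if i < len(msa[0]) and msa[0][i] == "-":
            # Gap already present in the center: keep the column, pad the newcomer.
            for r, row in enumerate(msa):
                out[r].append(row[i])
            out[-1].append("-")
            i += 1
        elif j < len(center_row) and center_row[j] == "-":
            # Gap required by the new pairwise alignment: pad all existing rows.
            for r in range(len(msa)):
                out[r].append("-")
            out[-1].append(other_row[j])
            j += 1
        else:
            # Both copies show the same center residue: emit one shared column.
            for r, row in enumerate(msa):
                out[r].append(row[i])
            out[-1].append(other_row[j])
            i += 1
            j += 1
    return ["".join(r) for r in out]

merged = merge_pairwise(["AA--CCTT", "AATGCC--"], "A-ACC-TT", "AGACCGT-")
print("\n".join(merged))
# A-A--CC-TT
# A-ATGCC---
# AGA--CCGT-
```

Because gaps in the center are only ever added, never removed, dropping the all-gap columns from the center and any other row gives back the pairwise alignment that row started from.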