
7b7d0a97661d1fc5ab0e4d3e5343cf57.ppt
- Number of slides: 39
An Introduction to Bioinformatics Algorithms (www.bioalgorithms.info)
Multiple Alignment

Multiple Alignment versus Pairwise Alignment
• Up until now we have only tried to align two sequences.
• What about more than two? And what for?
• A faint similarity between two sequences becomes significant if it is present in many sequences.
• Multiple alignments can reveal subtle similarities that pairwise alignments do not reveal.

Generalizing the Notion of Pairwise Alignment
• An alignment of 2 sequences is represented as a 2-row matrix.
• In a similar way, we represent an alignment of 3 sequences as a 3-row matrix:

    A T _ G C G _
    A _ C G T _ A
    A T C A C _ A

• Score: the more conserved columns, the better the alignment.

Alignments = Paths in…
• Align 3 sequences: ATGC, AATC, ATGC

    A  -- T  G  C
    A  A  T  -- C
    -- A  T  G  C

Alignment Paths

    x coordinate:  0  1  1  2  3  4
                     A  -- T  G  C
                     A  A  T  -- C
                     -- A  T  G  C

Alignment Paths
• Align the following 3 sequences: ATGC, AATC, ATGC

    x coordinate:  0  1  1  2  3  4
                     A  -- T  G  C
    y coordinate:  0  1  2  3  3  4
                     A  A  T  -- C
                     -- A  T  G  C

Alignment Paths

    x coordinate:  0  1  1  2  3  4
                     A  -- T  G  C
    y coordinate:  0  1  2  3  3  4
                     A  A  T  -- C
    z coordinate:  0  0  1  2  3  4
                     -- A  T  G  C

• Resulting path in (x, y, z) space: (0, 0, 0) → (1, 1, 0) → (1, 2, 1) → (2, 3, 2) → (3, 3, 3) → (4, 4, 4)

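The correspondence between an alignment and a path can be sketched in a few lines of Python. This is only an illustration: the function name `alignment_path` and the single-character gap symbol `-` are choices made here, not part of the slides.

```python
def alignment_path(rows, gap="-"):
    """Return the grid points visited by an alignment, one per column boundary.

    Each row advances along its own axis whenever the column holds a letter
    (not a gap) in that row.
    """
    position = [0] * len(rows)
    path = [tuple(position)]
    for column in zip(*rows):
        for axis, symbol in enumerate(column):
            if symbol != gap:
                position[axis] += 1
        path.append(tuple(position))
    return path


# The three-sequence alignment from the slide, with "-" for each gap.
rows = ["A-TGC", "AAT-C", "-ATGC"]
path = alignment_path(rows)
print(path)
```

Each printed tuple is an (x, y, z) point; the last one is always the lengths of the three ungapped sequences.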
Aligning Three Sequences
• Same strategy as aligning two sequences
• Use a 3-D “Manhattan Cube”, with each axis representing a sequence to align
• For global alignments, go from source to sink
[Figure: 3-D Manhattan cube with the source and sink vertices marked]

2-D vs 3-D Alignment Grid
[Figure: 2-D edit graph for sequences V and W, next to a 3-D edit graph]

2-D versus 3-D Alignment Cell
• In 2-D, 3 edges in each unit square
• In 3-D, 7 edges in each unit cube

Architecture of 3-D Alignment Cell
[Figure: unit cube with vertices (i-1, j-1, k-1), (i-1, j-1, k), (i-1, j, k-1), (i, j-1, k-1), (i-1, j, k), (i, j-1, k), (i, j, k-1) and (i, j, k)]

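Multiple Alignment: Dynamic Programming
$$
s_{i,j,k} = \max
\begin{cases}
s_{i-1,j-1,k-1} + \delta(v_i, w_j, u_k) & \text{cube diagonal: no indels} \\
s_{i-1,j-1,k} + \delta(v_i, w_j, \_) & \text{face diagonal: one indel} \\
s_{i-1,j,k-1} + \delta(v_i, \_, u_k) & \text{face diagonal: one indel} \\
s_{i,j-1,k-1} + \delta(\_, w_j, u_k) & \text{face diagonal: one indel} \\
s_{i-1,j,k} + \delta(v_i, \_, \_) & \text{edge diagonal: two indels} \\
s_{i,j-1,k} + \delta(\_, w_j, \_) & \text{edge diagonal: two indels} \\
s_{i,j,k-1} + \delta(\_, \_, u_k) & \text{edge diagonal: two indels}
\end{cases}
$$
• $\delta(x, y, z)$ is an entry in the 3-D scoring matrix
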
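A minimal Python sketch of the seven-way recurrence above follows. The names `align3_score` and `all_equal_score` are hypothetical, and the default column score (1 when all three letters agree, 0 otherwise) is just one possible choice, picked here because it is easy to check by hand.

```python
from itertools import product


def all_equal_score(a, b, c):
    """Hypothetical column score: 1 if the column holds three identical letters."""
    return 1 if a == b == c and a != "-" else 0


def align3_score(v, w, u, delta=all_equal_score):
    """Score of the best global alignment of three sequences (O(n^3) time and space)."""
    n, m, p = len(v), len(w), len(u)
    neg_inf = float("-inf")
    s = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    s[0][0][0] = 0
    # The 7 incoming edges of a cell: every non-zero 0/1 step vector.
    moves = [d for d in product((0, 1), repeat=3) if any(d)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(p + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    # A step of 1 along an axis consumes a letter; 0 means a gap.
                    a = v[pi] if di else "-"
                    b = w[pj] if dj else "-"
                    c = u[pk] if dk else "-"
                    candidate = s[pi][pj][pk] + delta(a, b, c)
                    if candidate > s[i][j][k]:
                        s[i][j][k] = candidate
    return s[n][m][p]


print(align3_score("ATGC", "AATC", "ATGC"))
```

Any other column score with the same three-argument signature can be passed as `delta`.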
Multiple Alignments: Scoring
• Number of matches (multiple longest common subsequence score)
• Sum of pairs (SP-Score)

Multiple LCS Score
• A column is a “match” if all the letters in the column are the same

    AAA
    AAT
    ATC

• Only good for very similar sequences

Sum of Pairs Score (SP-Score)
• Consider the pairwise alignment of sequences $a_i$ and $a_j$ imposed by a multiple alignment of k sequences
• Denote the score of this (not necessarily optimal) pairwise alignment as $s^*(a_i, a_j)$
• Sum up the pairwise scores for a multiple alignment: $s(a_1, \ldots, a_k) = \sum_{i<j} s^*(a_i, a_j)$

Computing SP-Score
Aligning 4 sequences: 6 pairwise alignments
Given $a_1, a_2, a_3, a_4$:
$$
s(a_1, \ldots, a_4) = \sum_{i<j} s^*(a_i, a_j) = s^*(a_1, a_2) + s^*(a_1, a_3) + s^*(a_1, a_4) + s^*(a_2, a_3) + s^*(a_2, a_4) + s^*(a_3, a_4)
$$

SP-Score: Example

    a1  ATG-C-AAT
    .   A-G-CATAT
    ak  ATCCCATTT

To calculate the score of each column, sum $s^*$ over all pairs of sequences in that column:
• Column 1 (A, A, A): each of the three pairs scores 1, so Score = 3
• Column 3 (G, G, C): the G-G pair scores 1 and each G-C pair scores $-\mu$, so Score = $1 - 2\mu$

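The column-by-column computation can be sketched as follows. The function names and the parameter `mu` for the mismatch penalty are labels chosen for this sketch; the gap penalty and the zero score for a gap-gap pair are assumptions, since the example only shows columns without gaps.

```python
from itertools import combinations


def pair_score(a, b, match=1, mu=1, gap_penalty=1):
    """Score one pair of symbols taken from the same column."""
    if a == "-" and b == "-":
        return 0  # assumption: two gaps facing each other contribute nothing
    if a == "-" or b == "-":
        return -gap_penalty  # assumption: the slide does not state a gap penalty
    return match if a == b else -mu


def column_sp_score(column, **params):
    """Sum of pair scores over all unordered pairs in one column."""
    return sum(pair_score(a, b, **params) for a, b in combinations(column, 2))


def sp_score(rows, **params):
    """SP-score of a whole multiple alignment given as equal-length rows."""
    return sum(column_sp_score(col, **params) for col in zip(*rows))


rows = ["ATG-C-AAT", "A-G-CATAT", "ATCCCATTT"]
columns = list(zip(*rows))
print(column_sp_score(columns[0]))        # column 1: A, A, A
print(column_sp_score(columns[2], mu=2))  # column 3: G, G, C with mu = 2
```

With `mu = 2` the third column evaluates to 1 - 2·2, which follows the 1 - 2μ pattern of the example.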
Multiple Alignment: Running Time
• For 3 sequences of length n, the run time is $7n^3$; $O(n^3)$
• For k sequences, build a k-dimensional Manhattan, with run time $(2^k - 1)\,n^k$; $O(2^k n^k)$
• Conclusion: the dynamic programming approach for aligning two sequences is easily extended to k sequences, but it is impractical due to its exponential running time

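To see where the factor 2^k - 1 comes from, the incoming edges of a cell can be enumerated directly. This is a small illustrative sketch; `predecessor_moves` and `dp_edge_count` are names invented here.

```python
from itertools import product


def predecessor_moves(k):
    """All step vectors into a cell of a k-dimensional grid: each axis either
    advances (1) or stays (0), excluding the all-zero vector."""
    return [move for move in product((0, 1), repeat=k) if any(move)]


def dp_edge_count(k, n):
    """Approximate number of edge evaluations for k sequences of length n."""
    return len(predecessor_moves(k)) * n ** k


for k in (2, 3, 4, 6):
    print(k, len(predecessor_moves(k)), dp_edge_count(k, 100))
```

Even for modest n, the count grows by a factor of roughly 2n with every added sequence, which is why the exact method is rarely used beyond a handful of sequences.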
Multiple Alignment Induces Pairwise Alignments
Every multiple alignment induces pairwise alignments

    x: AC-GCGG-C
    y: AC-GC-GAG
    z: GCCGC-GAG

Induces:

    x: ACGCGG-C     x: AC-GCGG-C     y: AC-GCGAG
    y: ACGC-GAG     z: GCCGC-GAG     z: GCCGCGAG

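Inducing a pairwise alignment amounts to taking two rows and dropping every column in which both rows have a gap. A short sketch, with `induced_pairwise` as a hypothetical helper name:

```python
def induced_pairwise(row_a, row_b, gap="-"):
    """Project a multiple alignment onto two of its rows.

    Columns where both rows hold a gap carry no information about the pair
    and are removed.
    """
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


x, y, z = "AC-GCGG-C", "AC-GC-GAG", "GCCGC-GAG"
for first, second in ((x, y), (x, z), (y, z)):
    a, b = induced_pairwise(first, second)
    print(a)
    print(b)
    print()
```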
Reverse Problem: Constructing Multiple Alignment from Pairwise Alignments
Given 3 arbitrary pairwise alignments:

    x: ACGCTGG-C     x: AC-GCTGG-C     y: AC-GC-GAG
    y: ACGC--GAC     z: GCCGCA-GAG     z: GCCGCAGAG

can we construct a multiple alignment that induces them?
NOT ALWAYS
Pairwise alignments may be inconsistent

Inferring Multiple Alignment from Pairwise Alignments
• From an optimal multiple alignment, we can infer pairwise alignments between all pairs of sequences, but they are not necessarily optimal
• It is difficult to infer a “good” multiple alignment from optimal pairwise alignments between all sequences

Combining Optimal Pairwise Alignments into Multiple Alignment
[Figure, two panels: “Can combine pairwise alignments into multiple alignment” and “Cannot combine pairwise alignments into multiple alignment”]

Profile Representation of Multiple Alignment
[Table: a multiple alignment of five sequences and, below it, its profile: the frequency of A, C, G, T and the gap symbol in each column, with entries such as .2, .4, .6, .8 and 1]
• In the past we were aligning a sequence against a sequence
• Can we align a sequence against a profile?
• Can we align a profile against a profile?

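A profile is simply the per-column frequency of each symbol. The sketch below computes one for a small made-up alignment; the five rows are hypothetical and are not the alignment shown on the slide, and `build_profile` is a name chosen here.

```python
from collections import Counter


def build_profile(rows, alphabet="ACGT-"):
    """Return one dict per column mapping each symbol to its frequency."""
    profile = []
    for column in zip(*rows):
        counts = Counter(column)
        profile.append({symbol: counts[symbol] / len(rows) for symbol in alphabet})
    return profile


# Hypothetical five-row alignment, used only to illustrate the calculation.
rows = ["AC-T", "ACGT", "TCGT", "A-GT", "ACGA"]
profile = build_profile(rows)
for symbol in "ACGT-":
    print(symbol, [round(column[symbol], 1) for column in profile])
```

Each printed line is one row of the profile table, one number per alignment column, and every column's frequencies sum to 1.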
Aligning alignments
• Given two alignments, can we align them?

    Alignment 1
    x  GGGCACTGCAT
    y  GGTTACGTC--
    z  GGGAACTGCAG

    Alignment 2
    w  GGACGTACC--
    v  GGACCT-----

Aligning alignments
• Given two alignments, can we align them?
• Hint: use alignment of corresponding profiles

    Combined Alignment
    x  GGGCACTGCAT
    y  GGTTACGTC--
    z  GGGAACTGCAG
    w  GGACGTACC--
    v  GGACCT-----

Multiple Alignment: Greedy Approach
• Choose the most similar pair of strings and combine them into a profile, thereby reducing the alignment of k sequences to an alignment of k-1 sequences/profiles. Repeat.
• This is a heuristic greedy method.

    k sequences:
    u1 = ACGTACGT…
    u2 = TTAATTAA…
    u3 = ACTACT…
    …
    uk = CCGGCCGGCCGG

    k-1 sequences/profiles:
    u1 = ACg/tTACg/cT…
    u2 = TTAATTAA…
    …
    uk = CCGGCCGGCCGG

Greedy Approach: Example
• Consider these 4 sequences

    s1  GATTCA
    s2  GTCTGA
    s3  GATATT
    s4  GTCAGC

Greedy Approach: Example (cont’d)
• There are $\binom{4}{2} = 6$ possible pairwise alignments

    s2  GTCTGA
    s4  GTCAGC      (score = 2)

    s1  GAT-TCA
    s2  G-TCTGA     (score = 1)

    s1  GAT-TCA
    s3  GATAT-T     (score = 1)

    s1  GATTCA--
    s4  G-T-CAGC    (score = 0)

    s2  G-TCTGA
    s3  GATAT-T     (score = -1)

    s3  GAT-ATT
    s4  G-TCAGC     (score = -1)

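The six scores can be checked with a short script. The scoring scheme below (+1 for a match, -1 for a mismatch, -1 for an indel) is an assumption: the slide does not state it, but this scheme is consistent with all six listed scores. The gapped rows for the s1/s4 pair are written with single-character gaps, which is also a reading of the slide rather than something it spells out.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, indel=-1):
    """Score a gapped pairwise alignment column by column."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += indel
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


alignments = {
    ("s2", "s4"): ("GTCTGA", "GTCAGC"),
    ("s1", "s2"): ("GAT-TCA", "G-TCTGA"),
    ("s1", "s3"): ("GAT-TCA", "GATAT-T"),
    ("s1", "s4"): ("GATTCA--", "G-T-CAGC"),
    ("s2", "s3"): ("G-TCTGA", "GATAT-T"),
    ("s3", "s4"): ("GAT-ATT", "G-TCAGC"),
}

scores = {pair: alignment_score(*rows) for pair, rows in alignments.items()}
closest_pair = max(scores, key=scores.get)
print(scores)
print(closest_pair)
```

The pair with the highest score is the one the greedy method merges first.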
Greedy Approach: Example (cont’d)
s2 and s4 are closest; combine:

    s2    GTCTGA
    s4    GTCAGC
    s2,4  GTCt/aGa/c   (profile)

New set of 3 sequences:

    s1    GATTCA
    s3    GATATT
    s2,4  GTCt/aGa/c

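The compact profile notation used here (an upper-case letter where the two rows agree, lower-case alternatives separated by a slash where they differ) can be produced mechanically. A small sketch, with `merge_to_profile_string` as a hypothetical name:

```python
def merge_to_profile_string(row_a, row_b):
    """Combine two aligned rows into the slide's compact profile notation."""
    parts = []
    for a, b in zip(row_a, row_b):
        if a == b:
            parts.append(a.upper())
        else:
            # Disagreeing columns list both symbols in lower case: "t/a".
            parts.append(f"{a.lower()}/{b.lower()}")
    return "".join(parts)


merged = merge_to_profile_string("GTCTGA", "GTCAGC")
print(merged)
```

This string form is only a display convenience; a real implementation would carry per-column frequencies so that later alignment steps can score against them.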
Progressive Alignment
• Progressive alignment is a variation of the greedy algorithm with a somewhat more intelligent strategy for choosing the order of alignments.
• Progressive alignment works well for close sequences, but deteriorates for distant sequences
• Gaps in the consensus string are permanent
• Use profiles to compare sequences

ClustalW
• Popular multiple alignment tool today
• ‘W’ stands for ‘weighted’ (different parts of the alignment are weighted differently)
• Three-step process:
  1. Construct pairwise alignments
  2. Build a guide tree
  3. Progressive alignment guided by the tree

Step 1: Pairwise Alignment
• Aligns each sequence against each other, giving a similarity matrix
• Similarity = exact matches / sequence length (percent identity)

          v1    v2    v3    v4
    v1    -
    v2   .17    -
    v3   .87   .28    -
    v4   .59   .33   .62    -

(.17 means 17% identical)

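Percent identity can be sketched as below. Which length goes in the denominator is an assumption here (the number of aligned columns); the slide only says "sequence length", and real tools differ on this detail. The function name and the sample pair are illustrative.

```python
def percent_identity(row_a, row_b):
    """Fraction of alignment columns in which both rows hold the same letter.

    Assumption: the denominator is the number of aligned columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows of an alignment must have equal length")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return matches / len(row_a)


identity = percent_identity("GTCTGA", "GTCAGC")
print(round(identity, 2))
```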
Step 2: Guide Tree
• Create the guide tree using the similarity matrix
• ClustalW uses the neighbor-joining method
• The guide tree roughly reflects evolutionary relations

Step 2: Guide Tree (cont’d)

          v1    v2    v3    v4
    v1    -
    v2   .17    -
    v3   .87   .28    -
    v4   .59   .33   .62    -

[Figure: guide tree with leaves v1, v3, v4, v2]

Calculate:

    v1,3     = alignment(v1, v3)
    v1,3,4   = alignment((v1,3), v4)
    v1,2,3,4 = alignment((v1,3,4), v2)

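The merge order above can be reproduced with a simple greedy rule: start from the most similar pair, then repeatedly add the sequence with the highest average similarity to those already merged. This is a simplified stand-in for illustration only; it is not the neighbor-joining method. The similarity values assume the flattened matrix is read as a lower triangle, and all names are hypothetical.

```python
# Similarity values, assuming the matrix on the slide is lower-triangular.
similarity = {
    ("v1", "v2"): 0.17,
    ("v1", "v3"): 0.87,
    ("v2", "v3"): 0.28,
    ("v1", "v4"): 0.59,
    ("v2", "v4"): 0.33,
    ("v3", "v4"): 0.62,
}


def sim(a, b):
    """Symmetric lookup into the similarity table."""
    return similarity[(a, b)] if (a, b) in similarity else similarity[(b, a)]


def greedy_merge_order(names):
    """Order in which sequences join the growing alignment (not neighbor-joining)."""
    order = list(max(similarity, key=similarity.get))
    remaining = [name for name in names if name not in order]
    while remaining:
        best = max(
            remaining,
            key=lambda name: sum(sim(name, member) for member in order) / len(order),
        )
        order.append(best)
        remaining.remove(best)
    return order


order = greedy_merge_order(["v1", "v2", "v3", "v4"])
print(order)
```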
Step 3: Progressive Alignment
• Start by aligning the two most similar sequences
• Following the guide tree, add in the next sequences, aligning to the existing alignment
• Insert gaps as necessary

    FOS_RAT     PEEMSVTS-LDLTGGLPEATTPESEEAFTLPLLNDPEPK-PSLEPVKNISNMELKAEPFD
    FOS_MOUSE   PEEMSVAS-LDLTGGLPEASTPESEEAFTLPLLNDPEPK-PSLEPVKSISNVELKAEPFD
    FOS_CHICK   SEELAAATALDLG----APSPAAAEEAFALPLMTEAPPAVPPKEPSG--SGLELKAEPFD
    FOSB_MOUSE  PGPGPLAEVRDLPG-----STSAKEDGFGWLLPPPPPPP---------LPFQ
    FOSB_HUMAN  PGPGPLAEVRDLPG-----SAPAKEDGFSWLLPPPPPPP---------LPFQ
                . . : **. : . . *: . * **:

Dots and stars show how well conserved a column is.