70761c924000f40a2c6b8db5b49497a2.ppt
- Number of slides: 36
CSCE 555 Bioinformatics Lecture 18: Protein Tertiary Structure Prediction. Meeting: MW 4:00 PM-5:15 PM, SWGN 2A21. Instructor: Dr. Jianjun Hu. Course page: http://www.scigen.org/csce555. University of South Carolina, Department of Computer Science and Engineering, 2008. www.cse.sc.edu
Outline Experimental limitations of protein structure determination Tertiary Structure Prediction ◦ Ab initio ◦ Homology modeling ◦ Threading
Experimental Protein Structure Determination High-resolution structure determination ◦ X-ray crystallography (<1 Å) ◦ Nuclear magnetic resonance (NMR) (~1-2.5 Å) Lower-resolution structure determination ◦ Cryo-EM (electron microscopy) ~10-15 Å Theoretical models? ◦ Highly variable, but a few are equivalent to X-ray!
Tertiary Structure Prediction The fold (tertiary structure) prediction problem can be formulated as a search for the minimum-energy conformation ◦ Search space is defined by the psi/phi angles of the backbone and the side-chain rotamers ◦ Search space is enormous even for small proteins! ◦ Number of local minima increases exponentially with the number of residues Computationally it is an exceedingly difficult problem!
Levinthal Paradox of Protein Folding: How does nature search? We assume that there are three conformations for each amino acid (e.g. α-helix, β-sheet and random coil). If a protein is made up of 100 amino acid residues, the total number of conformations is 3^100 = 515377520732011331036461129765621272702107522001 ≈ 5 × 10^47. If 100 psec (10^-10 sec) were required to convert from one conformation to another, a random search of all conformations would require 5 × 10^47 × 10^-10 sec ≈ 1.6 × 10^30 years. However, folding of proteins takes place on the order of msec to sec. Therefore, proteins fold not via a random search but via a more sophisticated search process. We want to watch the folding process of a protein using molecular simulation techniques.
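The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's own estimate; the constant names are illustrative, and the year length (365.25 days) is an assumption not stated on the slide.

```python
# Levinthal estimate, using the assumptions stated on the slide.
STATES_PER_RESIDUE = 3        # e.g. alpha-helix, beta-sheet, random coil
N_RESIDUES = 100
SECONDS_PER_STEP = 1e-10      # 100 psec per conformational change
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

# Python integers are arbitrary precision, so 3**100 is exact.
n_conformations = STATES_PER_RESIDUE ** N_RESIDUES
search_seconds = n_conformations * SECONDS_PER_STEP
search_years = search_seconds / SECONDS_PER_YEAR

print(f"conformations: {n_conformations:.2e}")
print(f"exhaustive search time: {search_years:.1e} years")
```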
Steps in Protein Folding 1 - "Collapse": driving force is burial of hydrophobic amino acids (fast, msec) 2 - Molten globule: helices & sheets form, but "loose" (slow, sec) 3 - "Final" native folded state: compaction, some secondary structures rearranged Native state? Assumed to be the lowest free energy state; may be an ensemble of structures
Protein Folding Funnel [Figure: folding funnel; labels: local minima, global minimum, native structure]
Protein Structure Prediction Ab initio ◦ Use just first principles: energy, geometry, and kinematics Homology ◦ Find the best match to a database of sequences with known 3D structure Combinations Threading Meta-servers and other methods Knowledge-based approaches
Ab Initio Prediction Basic idea: Anfinsen’s theory: the native structure of a protein corresponds to the state with the lowest free energy of the protein-solvent system. General procedures ◦ Develop a potential/energy function: evaluate the energy of a protein conformation; select the native structure ◦ Conformational search algorithm: produce new conformations; search the potential energy surface and locate the global minimum (native conformation) Provides both folding pathway & folded structure Can only be applied to very small proteins
Potential Functions for PSP Potential function ◦ Physics-based energy function: Empirical all-atom forcefields: CHARMM, AMBER, ECEPP-3, GROMOS, OPLS; Parameterization: quantum mechanical calculations, experimental data; Simplified potential: UNRES (united residue) ◦ Solvation energy: Implicit solvation model: Generalized Born (GB) model, surface-area-based model; Explicit solvation model: TIP3P (computationally expensive)
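General Form of All-atom Forcefields Terms, each with the geometric variable shown in the slide's diagrams: ◦ Bond stretching term (bond length r) ◦ Angle bending term (bond angle Θ) ◦ Dihedral term (torsion angle Φ) ◦ H-bonding term (O...H distance r) ◦ Van der Waals term (interatomic distance r) ◦ Electrostatic term (distance r between charges + and -) The non-bonded terms are the most time-demanding part.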
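A minimal sketch of textbook functional forms for these terms follows. The exact forms and parameters differ between CHARMM, AMBER, GROMOS and the other forcefields, so the function names, the factor of 1/2 in the harmonic terms, and all parameter values here are assumptions for illustration only. The explicit H-bonding term is omitted.

```python
import math

COULOMB_CONSTANT = 332.0637  # kcal*Angstrom/(mol*e^2), for charges in units of e

def bond_energy(r, r0, k):
    """Harmonic bond stretching around the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic angle bending around the equilibrium angle theta0 (radians)."""
    return 0.5 * k * (theta - theta0) ** 2

def dihedral_energy(phi, barrier, periodicity, phase):
    """Periodic torsion term for dihedral angle phi (radians)."""
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi - phase))

def lennard_jones_energy(r, epsilon, sigma):
    """12-6 van der Waals term: repulsive at short range, attractive beyond."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)

def coulomb_energy(r, charge_i, charge_j):
    """Electrostatic term between two point charges separated by r Angstrom."""
    return COULOMB_CONSTANT * charge_i * charge_j / r

# The non-bonded terms are evaluated over atom pairs, so their cost grows
# roughly with the square of the atom count unless a cutoff is used.
print(lennard_jones_energy(3.8, 0.1, 3.4), coulomb_energy(3.0, 0.4, -0.4))
```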
Search Potential Energy Surface We are interested in minimum points on the Potential Energy Surface (PES) Conformational search techniques: Energy Minimization, Monte Carlo, Molecular Dynamics; Others: Genetic Algorithm, Simulated Annealing
Energy Minimization [Figure label: local minimum] Energy minimization methods ◦ First-order minimization: steepest descent, conjugate gradient minimization ◦ Second-derivative methods: Newton-Raphson method ◦ Quasi-Newton methods: L-BFGS
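To illustrate why minimization only reaches a local minimum, here is a small steepest descent sketch on a one-dimensional double-well function. The function, step size, and starting points are made up for illustration; a real protein energy surface has thousands of dimensions.

```python
def energy(x):
    """Toy double-well potential with a local and a global minimum."""
    return x ** 4 - 3.0 * x ** 2 + x

def gradient(x):
    """Analytical derivative of the toy potential."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def steepest_descent(x0, step_size=0.01, tolerance=1e-8, max_steps=10000):
    """Follow the negative gradient until it (nearly) vanishes."""
    x = x0
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x -= step_size * g
    return x

# The minimum that is found depends on the starting conformation.
from_right = steepest_descent(2.0)    # ends in the local minimum near x = 1.13
from_left = steepest_descent(-2.0)    # ends in the global minimum near x = -1.30
print(from_right, energy(from_right))
print(from_left, energy(from_left))
```

Started from x = 2.0 the search stops in the higher-energy well and never crosses the barrier, which is why minimization is usually combined with a broader search such as Monte Carlo or molecular dynamics.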
Monte Carlo In molecular simulations, ‘Monte Carlo’ is an importance sampling technique. 1. Make a random move to produce a new conformation 2. Calculate the energy change ΔE for the new conformation 3. Accept or reject the move based on the Metropolis criterion, using the Boltzmann factor P = exp(-ΔE/kT): if ΔE < 0, then P > 1 and the new conformation is accepted; otherwise, accept if P > rand(0, 1), else reject.
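The three steps above can be sketched as a short Metropolis loop. The one-dimensional potential, the temperature kT, the move size, and the seed are all illustrative assumptions; only the acceptance rule follows the slide.

```python
import math
import random

def energy(x):
    """Toy double-well potential standing in for a conformational energy."""
    return x ** 4 - 3.0 * x ** 2 + x

def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def run_monte_carlo(x0, n_steps=5000, kT=0.5, max_move=0.5, seed=0):
    """Return the lowest-energy point visited during the walk."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial_x = x + rng.uniform(-max_move, max_move)  # 1. random move
        trial_e = energy(trial_x)
        if metropolis_accept(trial_e - e, kT, rng):     # 2-3. dE and Metropolis test
            x, e = trial_x, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = run_monte_carlo(x0=2.0)
print(best_x, best_e)
```

Because uphill moves are sometimes accepted, the walk can leave a local minimum, which plain energy minimization cannot do.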
Ab initio Prediction – CASP results
Comparative Modeling (Knowledge based approach) Two primary methods 1) Homology modeling 2) Threading (fold recognition) Both rely on availability of experimentally determined structures that are "homologous" or at least structurally very similar to target Provide folded structure only
Homology Modeling 1. Identify homologous protein sequences (PSI-BLAST) 2. Among available structures, choose the one with the closest sequence match to the target as template (steps 1 & 2 can be combined by using PDB-BLAST) 3. Build the model by placing residues in the corresponding positions of the homologous structure & refine by "tweaking" ◦ Homology modeling works "well" • Computationally? Not very expensive • Accuracy? Higher sequence identity gives a better model ◦ Requires ~30% sequence identity with a sequence for which the structure is known
Homology-based Prediction Raw model Loop modeling Side chain placement Refinement
Homology-based Prediction
Threading - Fold Recognition Identify the “best” fit between target sequence & template structure ◦ Threading works "sometimes" • Computationally? Can be expensive or cheap, depending on the energy function & whether threading is "all atom" or "backbone only" • Accuracy? In theory, should not depend on sequence identity (should depend on quality of template library & "luck") ◦ Usually, higher sequence identity to a protein of known structure gives a better model
Threading Algorithm for PSP Database of 3D structures and sequences ◦ Protein Data Bank (or non-redundant subset) Query sequence ◦ Sequence < 25% identity to known structures Alignment protocol ◦ Dynamic programming Evaluation protocol ◦ Distance-based potential or secondary structure Ranking protocol
Threading Basic premise: the number of unique structural folds in nature is fairly small (probably 2000-3000) Statistics from the Protein Data Bank (~40,000 structures): until very recently, 90% of new structures submitted to the PDB had similar structural folds already in the PDB Thus, chances for a protein to have a native-like structural fold in the PDB are quite good ◦ Note: Proteins with similar structural folds could be either homologs or analogs
Steps in Threading Target Sequence ALKKGF…HFDTSE Structure Templates 1. Align target sequence with template structures (fold library) from the Protein Data Bank (PDB) 2. Calculate energy score to evaluate goodness of fit between target sequence & template structure 3. Rank models based on energy scores
Threading Issues Find the “correct” sequence-structure alignment of a target sequence with its native-like fold in the PDB ◦ Structure database: must be complete; no decent model if there is no good template in the library! ◦ Sequence-structure alignment algorithm: bad alignment gives a bad score! ◦ Energy function (scoring scheme): must distinguish the correct sequence-fold alignment from incorrect sequence-fold alignments; must distinguish the “correct” fold from close decoys ◦ Prediction reliability assessment: how to determine whether the predicted structure is correct (or even close)?
Threading: Template database Build a database of structural templates (e.g., ASTRAL domain library derived from the PDB) Supplement with additional decoys, e.g., generated using an ab initio approach such as Rosetta (Baker)
Threading: Energy function Two main methods (and combinations of these) ◦ Structural profile (environmental): physico-chemical properties of amino acids ◦ Contact potential (statistical): based on contact statistics from the PDB; Miyazawa & Jernigan (ISU)
Protein Threading: Typical energy function ◦ What is the "probability" that two specific residues are in contact? (pairwise term Ep) ◦ How well does a specific residue fit its structural environment? (singleton term Es) ◦ Alignment gap penalty? (gap term Eg) Total energy: E = Ep + Es + Eg Goal: find a sequence-structure alignment that minimizes the energy function
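A toy sketch of how Ep + Es + Eg could be evaluated for one given sequence-structure alignment is shown below. The hydrophobic residue set, the energy values, the gap penalty, and the four-position template are invented for illustration and are not taken from any published threading potential. The search for the best alignment (e.g. by dynamic programming) is not shown.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumption: a crude hydrophobic residue set

def singleton_energy(residue, environment):
    """Es: favour hydrophobic residues in buried sites, polar ones in exposed sites."""
    fits = (residue in HYDROPHOBIC) == (environment == "buried")
    return -1.0 if fits else 1.0

def pair_energy(res_a, res_b):
    """Ep: toy contact term that rewards hydrophobic-hydrophobic contacts."""
    return -1.0 if res_a in HYDROPHOBIC and res_b in HYDROPHOBIC else 0.0

def threading_energy(aligned, environments, contacts, gap_penalty=2.0):
    """Total energy Ep + Es + Eg.

    aligned[i] is the target residue placed at template position i,
    or None if that template position is left as a gap.
    """
    e_s = sum(singleton_energy(res, env)
              for res, env in zip(aligned, environments) if res is not None)
    e_p = sum(pair_energy(aligned[i], aligned[j]) for i, j in contacts
              if aligned[i] is not None and aligned[j] is not None)
    e_g = gap_penalty * sum(res is None for res in aligned)
    return e_p + e_s + e_g

# Hypothetical 4-position template: environments and one contact (0, 2).
environments = ["buried", "exposed", "buried", "exposed"]
contacts = [(0, 2)]
good_fit = threading_energy(["L", "K", "V", "E"], environments, contacts)
poor_fit = threading_energy(["K", "L", "E", "V"], environments, contacts)
gapped = threading_energy(["L", None, "V", "E"], environments, contacts)
print(good_fit, poor_fit, gapped)
```

Ranking candidate templates then amounts to sorting them by this total energy, lowest first.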
CAFASP GOAL The goal of CAFASP is to evaluate the performance of fully automatic structure prediction servers available to the community. In contrast to the normal CASP procedure, CAFASP aims to answer the question of how well servers do without any intervention of experts, i.e. how well ANY user using only automated methods can predict protein structure. CAFASP assesses the performance of methods without the user intervention allowed in CASP.
Performance Evaluation in CAFASP3 (http://www.cs.bgu.ac.il/~dfischer/CAFASP3, released in December, 2002.)
Servers (54 in total)    Sum MaxSub score    # correct (30 FR targets)
3ds5, robetta            5.17-5.25           15-17
pmod, 3ds3, pmode3       4.21-4.36           13-14
RAPTOR                   3.98                13
shgu                     3.93                13
3dsn                     3.64-3.90           12-13
pcons3                   3.75                12
fugu3, orf_c             3.38-3.67           11-12
…                        …                   …
pdbblast                 0.00                0
Servers with names in italics are meta servers. MaxSub score ranges from 0 to 1. Therefore, the maximum total score is 30.
One structure where RAPTOR did best Red: true structure Blue: correct part of prediction Green: wrong part of prediction • Target size: 144 • Super-imposable size within 5 Å: 118 • RMSD: 1.9 Å
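For reference, the RMSD quoted above is the root-mean-square distance between matched atoms of the predicted and true structures. A minimal sketch, assuming the two coordinate lists are already optimally superimposed and listed in the same atom order (the coordinates below are made up):

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical C-alpha coordinates in Angstrom.
true_structure = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
prediction = [(0.0, 1.0, 0.0), (3.8, -1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(true_structure, prediction))
```

In practice the structures are first superimposed (for example with the Kabsch algorithm), and measures such as MaxSub report how many residues can be superimposed within a distance threshold.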
Some more results by other programs
Summary of current state of the art
Automated Web-Based Homology Modeling ◦ SWISS Model: http://www.expasy.org/swissmod/SWISSMODEL.html ◦ WHAT IF: http://www.cmbi.kun.nl/swift/servers/ ◦ The CPHModels Server: http://www.cbs.dtu.dk/services/CPHmodels/ ◦ 3D Jigsaw: http://www.bmm.icnet.uk/~3djigsaw/ ◦ SDSC1: http://cl.sdsc.edu/hm.html ◦ EsyPred3D: http://www.fundp.ac.be/urbm/bioinfo/esypred/
Comparative Modeling Server & Program ◦ COMPOSER: http://www.tripos.com/sciTech/inSilicoDisc/bioInformatics/matchmaker.html ◦ MODELLER: http://salilab.org/modeler ◦ InsightII: http://www.msi.com/ ◦ SYBYL: http://www.tripos.com/