  • Number of slides: 28

Using Ontology Reasoning to Classify Protein Phosphatases
K. Wolstencroft, P. Lord, L. Tabernero, A. Brass, R. Stevens
University of Manchester

Introduction
Automated classification of proteins into protein subfamilies
1. Background
2. Architecture
3. Advantages
4. Results
5. Future directions

Motivation
Biological data production is fast
- High-throughput techniques
- Large numbers of species being sequenced
- Large amount of data uncharacterised
Data analysis is now the rate-limiting step

Why Classify?
• Classification and curation of a genome is the first step in understanding the processes and functions happening in an organism
• Classification enables comparative genomic studies: what is already known in other organisms
• The similarities and differences between processes and functions in related organisms often provide the greatest insight into the biology

Protein Classification
• Proteins divided into broad functional classes, “Protein Families”
  - evolutionary relationships
  - common domain architecture
• Relationship between sequence and structure allows searching for distinct structural (and functional) domains within the sequence
• Domains could be several amino acids long, or could span most of the protein

Example
A search of the linear sequence of protein tyrosine phosphatase type K identified 9 functional domains
>uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa).
MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV
SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP
GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI
AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV…

Protein Family Classification
• Diagnostic domains/motifs often signify family membership, e.g. ALL proteins with a tyrosine protein kinase-specific active site (IPR008266) domain are types of tyrosine kinase

Current Techniques
• Human expert classification
  - gold standard
  - human knowledge applied to results from bioinformatics analysis tools
• Automated use of bioinformatics analysis tools
  - quick
  - less detailed

Automated Methods
Bioinformatics analysis tools
• Top BLAST hit
  - annotating as ‘similar to’ other known proteins
  - could result in a chain: protein A is similar to protein B, which is similar to protein C, which is similar to protein D, etc.
• InterProScan analysis
  - shows number and types of domains, but does not provide interpretations

Human Expert Annotation
• Same similarity searching tools used for domain/motif identification
• Humans use expert knowledge to classify proteins according to domain arrangements
  - presence / order / number of each is important
Can an ontology be used to capture this knowledge to the standard of a human annotator?

Ontology Approach
• Use ontology to capture the ‘rules’ for protein family membership in formal OWL representation
• Ontology contains the human expert knowledge
• Ontology reasoning can take the place of human analysis of the data

The Protein Phosphatases
• Large superfamily of proteins involved in the removal of phosphate groups from molecules
• Important proteins in almost all cellular processes
• Involved in diseases, e.g. diabetes and cancer
• Human phosphatases well characterised

Phosphatase Functional Domains
Andersen et al. (2001) Mol. Cell. Biol. 21, 7117-36

Determining Class Definitions
R5
- Contains 2 protein tyrosine phosphatase domains
- Contains 1 transmembrane domain
- Contains 1 fibronectin domain
- Contains 1 carbonic anhydrase domain
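
A class definition of this kind can be read as a set of required domain counts. The following is a minimal sketch of that idea in plain Python, not the OWL model used in the talk; the dictionary keys and the function name `matches_definition` are hypothetical labels chosen for illustration.

```python
from collections import Counter

# Hypothetical encoding of the R5 definition from the slide as exact domain counts.
R5_DEFINITION = {
    "protein_tyrosine_phosphatase": 2,
    "transmembrane": 1,
    "fibronectin": 1,
    "carbonic_anhydrase": 1,
}


def matches_definition(domains, definition):
    """Return True if every required domain occurs exactly the stated number of times.

    Domains that the definition does not mention are ignored.
    """
    counts = Counter(domains)
    return all(counts[name] == n for name, n in definition.items())


candidate = [
    "protein_tyrosine_phosphatase",
    "protein_tyrosine_phosphatase",
    "transmembrane",
    "fibronectin",
    "carbonic_anhydrase",
]
is_r5 = matches_definition(candidate, R5_DEFINITION)
```

Ignoring unmentioned domains is a design choice of this sketch; a stricter variant could reject proteins that carry extra domains.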

Protégé OWL Modelling

Requirements
• Extract phosphatase sequences from the rest of the protein sequences in a whole genome
• Identify the domains present in each
• Compare these sequences to the formal ontology descriptions
• Classify each protein instance to a place in the hierarchy

Architecture
Raw protein sequences → myGrid Services → Instance Store → Classified Protein Phosphatases
The Instance Store uses the OWL DL ontology and a reasoner (Racer)

myGrid Services
• Extract protein phosphatase sequences from whole genome using simple filtering
  - patmatdb EMBOSS tool used to extract proteins with phosphatase diagnostic motifs
• Perform InterProScan to determine domain architecture
• Transform the InterProScan results into abstract OWL instance descriptions
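
The first step, filtering a genome for sequences that carry a diagnostic motif, can be sketched with a regular expression. This is not patmatdb and does not reproduce its options; the pattern `C.{5}R` is an illustrative assumption, since the slides do not say which motifs were used, and `filter_by_motif` is a hypothetical helper.

```python
import re

# Illustrative pattern only (assumption): a cysteine, any five residues, then an arginine.
ILLUSTRATIVE_MOTIF = re.compile(r"C.{5}R")


def filter_by_motif(sequences, motif=ILLUSTRATIVE_MOTIF):
    """Return the IDs of sequences that contain the motif at least once.

    `sequences` maps a sequence ID to its amino-acid string.
    """
    return [seq_id for seq_id, seq in sequences.items() if motif.search(seq)]


toy_genome = {
    "protein_a": "MKCAAAAARST",  # contains C + 5 residues + R
    "protein_b": "MKKKKLLLL",    # no match
}
hits = filter_by_motif(toy_genome)
```

Only the sequences in `hits` would be passed on to the slower domain analysis step.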

InterProScan Results

Conversion to abstract OWL format
restriction(<http://www.owlontologies.com/unnamed.owl#containsDomainIPR000340> cardinality(1))
restriction( cardinality(1))
restriction( cardinality(1))
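
A minimal sketch of how a list of domain hits might be turned into restriction strings of this shape. The namespace and the `containsDomain` plus InterPro ID naming are taken from the one restriction that survives on the slide; the function `to_restrictions` and its input format are assumptions, not the actual myGrid service.

```python
from collections import Counter

# Namespace as it appears in the slide's example restriction.
NAMESPACE = "http://www.owlontologies.com/unnamed.owl#"


def to_restrictions(interpro_ids):
    """Build one cardinality restriction per distinct InterPro ID.

    `interpro_ids` is a list with one entry per domain occurrence, so a
    domain found twice yields cardinality(2).
    """
    counts = Counter(interpro_ids)
    return [
        f"restriction(<{NAMESPACE}containsDomain{ipr}> cardinality({n}))"
        for ipr, n in sorted(counts.items())
    ]


restrictions = to_restrictions(["IPR000387", "IPR000340"])
```

Sorting by InterPro ID keeps the output stable between runs, which makes descriptions easier to compare.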

Instance Store
• Instance Store enables reasoning over individuals
• Can support much higher numbers of individuals
• OWL ontology is loaded into the Instance Store
• A DL reasoner (Racer) is used to compare individuals to the OWL ontology definitions

Instance Store

Example Instances
• Protein individual: Dual Specificity Phosphatase DUSE
• Ontology definition of Dual Specificity Phosphatase
restriction( cardinality(1)) restriction( cardinality(1))
containsDomain IPR000340: necessary and sufficient for class membership
Also inherits containsDomain IPR000387 from parent class PTP
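
The combination of a necessary-and-sufficient condition with an inherited one can be illustrated with simple set logic. This is a sketch only: a DL reasoner such as Racer does far more than subset checks. The class names and the `CLASSES` structure are hypothetical; the two InterPro IDs are the ones named on the slide.

```python
# Hypothetical mini-hierarchy: each class names its parent and the domains it adds.
CLASSES = {
    "PTP": {"parent": None, "domains": {"IPR000387"}},
    "DualSpecificityPhosphatase": {"parent": "PTP", "domains": {"IPR000340"}},
}


def required_domains(name):
    """Collect a class's own domains plus those inherited from its ancestors."""
    required = set()
    while name is not None:
        required |= CLASSES[name]["domains"]
        name = CLASSES[name]["parent"]
    return required


def classify(individual_domains):
    """Return every class whose full (own plus inherited) definition is satisfied."""
    present = set(individual_domains)
    return sorted(name for name in CLASSES if required_domains(name) <= present)


dual_specificity = classify({"IPR000387", "IPR000340"})
plain_ptp = classify({"IPR000387"})
```

An individual with both domains falls under the subclass and its parent, while one with only IPR000387 stays at the parent level.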

Results
• Human phosphatases have been classified using the system
• The ontology classification performed as well as expert classification
• The ontology system refined classification
  - DUSC contains a zinc finger domain: characterised and conserved, but not in classification
  - DUSA contains a disintegrin domain: previously uncharacterised, evolutionarily conserved

Aspergillus fumigatus
• Phosphatase proteins very different from human: >100 human, <50 A. fumigatus
• Whole subfamilies ‘missing’
  - Different fungi-specific phosphorylation pathways?
  - No requirement for tissue-specific variations?
• Novel serine/threonine phosphatase with homeobox, conserved in Aspergillus and closely related species, but not in any other
  - virulence

Ongoing Work
• Phosphatases in other genomes
  - Trypanosomes
  - Plasmodium falciparum
• Other protein families
  - Ion channels
  - ABC transporters
  - Nuclear receptors

Conclusions
• Using ontology allows automated classification to reach the standard of human expert annotation
• Reasoning capabilities allow interpretation of domain organisation
• Highlights anomalies and variations from what is known
• Allows fast, efficient comparative genomics studies

Acknowledgements
PhD Supervisors: Andy Brass, Robert Stevens
Group: myGrid, Phil Lord, Carole Goble
Phosphatase Biologist: Lydia Tabernero
Medical Research Council