
4b7c39103c0bf50418adc0647ec07928.ppt
- Number of slides: 29
Introduction to database handling Endre Sebestyén
What is a database? A database is a bunch of information. It is a structured collection of information. It contains basic objects, called records or entries. The records contain fields, which hold defined types of data related to that record. A nucleotide sequence database, for example, would contain nucleotide sequences as records, and sequence properties (length, name, origin, etc.) as fields.
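The record/field idea can be sketched in a few lines of Python. This is only an illustration: the class name `SequenceRecord`, its field names, and the two sample entries are assumptions made for this example, not part of any real database.

```python
from dataclasses import dataclass, fields


@dataclass
class SequenceRecord:
    """One record (entry) of a hypothetical nucleotide sequence database."""
    name: str      # field: identifier of the sequence
    origin: str    # field: source organism
    sequence: str  # field: the nucleotide string itself

    @property
    def length(self) -> int:
        # Derived property: length in base pairs.
        return len(self.sequence)


# The "database" is simply a structured collection of records.
database = [
    SequenceRecord("seq_a", "Mus musculus", "ATGCCGTA"),
    SequenceRecord("seq_b", "Arabidopsis thaliana", "GGCTA"),
]

# Every record has the same, well-defined set of fields.
field_names = [f.name for f in fields(SequenceRecord)]
print(field_names)
for record in database:
    print(record.name, record.origin, record.length)
```

Because every record shares the same fields, it becomes possible to ask questions about a field (for example, length) across the whole collection.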
What is a database? A database is searchable. It is updated regularly (releases). It contains an index (table of contents, catalog). New data goes in; obsolete, old data goes out. It is cross-referenced to other databases.
Why databases? The main purpose of databases is not only to collect and organize data, but to allow advanced data retrieval and analysis. A database query is a method to retrieve information from the database. The organization of records into fields allows us to run queries on fields. Example: all mouse RNA sequences between 1000 and 1500 bp in length.
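The example query above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table name `sequences`, its columns, and the sample rows are invented for the illustration.

```python
import sqlite3

# In-memory database with a hypothetical "sequences" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (name TEXT, species TEXT, molecule TEXT, length INTEGER)"
)
rows = [
    ("s1", "mouse", "RNA", 1200),
    ("s2", "mouse", "RNA", 800),
    ("s3", "mouse", "DNA", 1300),
    ("s4", "human", "RNA", 1400),
    ("s5", "mouse", "RNA", 1500),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?, ?)", rows)

# Query on fields: species, molecule type and a length range (BETWEEN is inclusive).
query = """
SELECT name FROM sequences
WHERE species = ? AND molecule = ? AND length BETWEEN ? AND ?
ORDER BY name
"""
matches = [row[0] for row in conn.execute(query, ("mouse", "RNA", 1000, 1500))]
print(matches)
```

Only the records whose fields satisfy all three conditions are returned; the rest of the collection is never touched by the caller.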
Databases on the internet WEBSERVERS USER DATABASE SERVER
Databases on the internet Information system Query system Storage system Data
Databases on the internet Information system Query system Storage system Data Book title Sequence Temperature Picture Video Log files of web servers etc
Databases on the internet Information system Query system Storage system Data Bookshelves Boxes Text files/directories Binary files MySQL database Oracle database
Types of databases Hierarchical model Tree-like structures Parent -> child One to many relations
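A hierarchical model can be pictured as a tree in which every child has exactly one parent. The sketch below is a toy illustration; the `Node` class and the node names are hypothetical.

```python
class Node:
    """A node in a tree: one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        # One-to-many: a parent can have many children.
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        # Walk up the single chain of parents to the root.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Node("database")
plant = root.add_child("plant")
chordate = root.add_child("chordate")
cluster = plant.add_child("cluster_1")

print(cluster.path())
```

Each record is reachable by exactly one path from the root, which is what makes the structure simple but also rigid.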
Types of databases Network model More complex than the previous Parent -> child One to many Many to one
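In the network model a child may have several parents, so the tree becomes a graph. A minimal sketch, with invented node names and a hypothetical `link` helper:

```python
from collections import defaultdict

# Two lookup tables keep the links navigable in both directions.
children = defaultdict(set)  # parent -> set of children (one to many)
parents = defaultdict(set)   # child -> set of parents (many to one)


def link(parent, child):
    """Record a parent -> child relation."""
    children[parent].add(child)
    parents[child].add(parent)


link("species:mouse", "seq_1")
link("species:mouse", "seq_2")
link("gene_family:kinase", "seq_1")  # seq_1 now has two parents

print(sorted(children["species:mouse"]))
print(sorted(parents["seq_1"]))
```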
Types of databases Relational model Most widely used Fast and efficient (if the data structure is designed correctly)
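In the relational model, data lives in tables that are linked through shared key values and combined with joins. The sketch below uses Python's `sqlite3` module; the `species` and `sequence` tables and their contents are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sequence (
    id INTEGER PRIMARY KEY,
    species_id INTEGER REFERENCES species(id),  -- link to the species table
    length INTEGER
);
INSERT INTO species VALUES (1, 'Mus musculus'), (2, 'Arabidopsis thaliana');
INSERT INTO sequence VALUES (10, 1, 1200), (11, 1, 900), (12, 2, 1500);
""")

# Join the two tables on the shared key and summarize per species.
joined = conn.execute("""
SELECT species.name, COUNT(*), AVG(sequence.length)
FROM sequence
JOIN species ON sequence.species_id = species.id
GROUP BY species.name
ORDER BY species.name
""").fetchall()

for name, count, mean_length in joined:
    print(name, count, mean_length)
```

The species name is stored once and referenced by id, which is one example of the kind of design decision that keeps a relational database fast and consistent.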
Databases on the internet Lists Catalogues Librarian Index files SQL language grep command
Query systems for databases: SQL query language. Querying and modifying data, managing the database, optimizing queries. SELECT * FROM sequence_feature WHERE sequence_primary_id LIKE '%$variable%' ORDER BY sequence_primary_id LIMIT 10; Works on multiple operating systems, with different programming languages and different storage systems (MySQL, PostgreSQL, etc.). Use an SQL terminal, or go through a programming language.
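Running such a query through a programming language might look like the following sketch. It uses SQLite (via Python's `sqlite3`) in place of MySQL so that it runs without a server; the `feature_type` column, the sample rows, and the `find_features` helper are hypothetical. The search term is passed as a bound parameter instead of being pasted into the SQL string, which avoids SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence_feature (sequence_primary_id TEXT, feature_type TEXT)"
)
conn.executemany(
    "INSERT INTO sequence_feature VALUES (?, ?)",
    [("815002", "utr"), ("920015", "motif"), ("815001", "motif")],
)


def find_features(conn, fragment, limit=10):
    """Return rows whose sequence_primary_id contains `fragment`."""
    # Note: % and _ inside `fragment` still act as LIKE wildcards here.
    pattern = f"%{fragment}%"
    sql = (
        "SELECT * FROM sequence_feature "
        "WHERE sequence_primary_id LIKE ? "
        "ORDER BY sequence_primary_id LIMIT ?"
    )
    return conn.execute(sql, (pattern, limit)).fetchall()


hits = find_features(conn, "815")
print(hits)
```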
Databases on the internet Library NCBI Entrez Google Lots of other general and specialized databases with search interfaces on the web
Case study: the DoOP database. Tries to collect and analyze the promoter regions of different genes and orthologous gene clusters. http://doop.abc.hu 2 main sections: plant and chordate. Chordate: v1.4. Plant: v1.5, v1.6. Integrates different kinds of data: sequence annotation, cross-references to external databases, multiple alignments, conserved sequence regions. Goal: easily accessible and searchable interface on the web.
Data processing
MySQL tables
MySQL tables
MySQL table
MySQL tables
Data processing
API for the MySQL database. Application Programming Interface. We want to convert the MySQL data into nice web pages. MySQL query to get data: SELECT * FROM sequence_feature WHERE sequence_primary_id LIKE '%$variable%' ORDER BY sequence_primary_id LIMIT 10; And so on… Process the data. OR with an API: $data = $sequence_feature_object->get_data;
Bio::DOOP API. (More or less) simple representations of the sequence and other data -> modules and objects. The API “hides” the MySQL queries and other details from us, so we can concentrate on the web pages. It works well only if we have a good API design with all the necessary features. Bio::DOOP API modules: Clusters, Subsets, Sequence features, Motifs, and other modules for managing, sorting and filtering the data.
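The idea of an API that hides the SQL can be sketched as follows. This is not the Bio::DOOP API (which is a set of Perl modules); it is a hypothetical Python analogue, with an invented `SequenceFeatureStore` class, an invented `feature_type` column, and SQLite standing in for MySQL.

```python
import sqlite3


class SequenceFeatureStore:
    """Hypothetical wrapper: callers ask for data, never write SQL."""

    def __init__(self, conn):
        self._conn = conn
        # Rows behave like mappings, so they convert cleanly to dicts.
        self._conn.row_factory = sqlite3.Row

    def get_data(self, sequence_primary_id):
        """Return all features of one sequence as a list of dicts."""
        cursor = self._conn.execute(
            "SELECT * FROM sequence_feature "
            "WHERE sequence_primary_id = ? "
            "ORDER BY feature_type",
            (sequence_primary_id,),
        )
        return [dict(row) for row in cursor]


# Set up a tiny demo database (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence_feature (sequence_primary_id TEXT, feature_type TEXT)"
)
conn.executemany(
    "INSERT INTO sequence_feature VALUES (?, ?)",
    [("815001", "motif"), ("815002", "utr")],
)

store = SequenceFeatureStore(conn)
data = store.get_data("815001")
print(data)
```

The code that builds the web page only sees `get_data`; if the table layout changes, only the wrapper has to be updated.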
Search page Search types Sequence ID Gene ID Keywords Species Sequence
Search results Cluster ID Description Conserved motifs Taxonomical groups Download sequences
Promoter cluster Sequences Gene annotation Sequence alignment Cross-references Conserved regions
Promoter cluster UTR region Species, size Motifs
Motifs Further search in the motif collection A table similar to the one in the previous search results
Thank you for your attention!