
- Number of slides: 31
… and Tools for exploring the biomedical information landscape
Les Grivell, EMBO Electronic Information Programme
EAHIL 2004, Santander
Electronic Information Programme
• Online research information environment for the life sciences
• A next-generation information service for the life sciences
• Life Sciences Mobility Portal
But first, let me take you back, not to Altamira, but to the … early days of scientific publishing (pre-impact factor)
When libraries were comfortable places that had everything you needed …
and it was possible to keep track of the literature … (more or less) …
Where are we now? – Publishing is big business
• STM publishing is a multi-billion EUR activity (in the UK alone, GBP 22 billion in 2000)
• An estimated 164,000 scientific periodicals exist worldwide; around 16% of these are online
– Core science; core journals
• PubMed lists some 4,600 journals in biomedical disciplines
• As of 19 Sept 2004, 4,429 of these are online
• The PubMed database provides access to circa 15 million abstracts (but if you can’t be found, you won’t be read …)
• The Science Citation Index lists 5,876 journals with impact factors ranging from 54.45 to 0.00 (you’ve been found, but are you worth reading? …)
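Read together, the counts above put online availability of the core biomedical journals far above that of periodicals generally. Taking the slide's numbers at face value (they need not have been counted on the same basis):

```latex
\frac{4429}{4600} \approx 0.963 \;(\approx 96\%\ \text{of PubMed-listed journals online}),
\qquad
0.16 \times 164\,000 \approx 26\,000 \;(\text{periodicals online worldwide})
```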
Another information explosion: genomics
[Figure: sequence entries in the EMBL DNA database, base pairs (billions) by year, 1985-2005; annotation "Morowitz"]
Raw sequences are not the only form of digital information
The nice thing about biological information resources is that there are so many …
• Hundreds of different databases, many in flat-file format
• A variety of user interfaces
• General lack of interoperability
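To make the flat-file point concrete: each resource's text layout needs its own parser before its data can be combined with anything else. A minimal sketch for the EMBL nucleotide flat-file layout (two-letter line codes, content from column 6, "//" ending each entry). The function name and the sample entry are hypothetical, and real entries carry many more line types (references, feature tables) that this sketch simply collects or skips.

```python
from collections import defaultdict


def parse_embl_flatfile(text):
    """Split EMBL-style flat-file text into entries keyed by two-letter line code.

    Assumes the EMBL layout: line code in columns 1-2, content from column 6,
    'XX' spacer lines, and '//' ending each entry. Sequence lines (which start
    with blanks) are skipped to keep the sketch short.
    """
    entries = []
    current = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            # End of entry: store it and start collecting the next one.
            entries.append(dict(current))
            current = defaultdict(list)
            continue
        code = line[:2]
        if not code.strip() or code == "XX":
            continue  # sequence data, blank or spacer line
        current[code].append(line[5:].rstrip())
    return entries


# Hypothetical entry in EMBL layout (not a real database record).
sample = "\n".join([
    "ID   HYPO0001; SV 1; linear; mRNA; STD; HUM; 20 BP.",
    "XX",
    "AC   HYPO0001;",
    "XX",
    "DE   Hypothetical example entry",
    "XX",
    "SQ   Sequence 20 BP;",
    "     acgtacgtac gtacgtacgt                                                20",
    "//",
])

entry = parse_embl_flatfile(sample)[0]
print(entry["AC"], entry["DE"])
```

GenBank flat files, for instance, use full keywords such as LOCUS and DEFINITION instead of two-letter codes, so the same job has to be redone per source; that duplicated effort is part of the interoperability problem.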
Wouldn’t it be nice to … find all published literature references for a large set of gene symbols and explore their relationships?
Microarray chip → co-regulated genes → find literature → database lookup → discover relationships
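The workflow on this slide can be sketched, under simplifying assumptions, as literature co-occurrence: find the abstracts that mention each co-regulated gene, then report pairs of genes mentioned together. The function names, the in-memory abstract collection and the placeholder symbols (GENEA, GENEB, GENEC) are hypothetical; the database-lookup step is left out, and co-mention is only a crude proxy for a biological relationship.

```python
import re
from itertools import combinations


def find_literature(symbols, abstracts):
    """Map each gene symbol to the IDs of abstracts mentioning it as a whole word."""
    hits = {}
    for sym in symbols:
        pattern = re.compile(r"\b" + re.escape(sym) + r"\b")
        hits[sym] = {doc_id for doc_id, text in abstracts.items() if pattern.search(text)}
    return hits


def discover_relationships(hits, min_shared=1):
    """Report gene pairs co-mentioned in at least `min_shared` abstracts."""
    pairs = {}
    for a, b in combinations(sorted(hits), 2):
        shared = hits[a] & hits[b]
        if len(shared) >= min_shared:
            pairs[(a, b)] = sorted(shared)
    return pairs


# Hypothetical output of a microarray experiment and a toy abstract collection.
co_regulated = ["GENEA", "GENEB", "GENEC"]
abstracts = {
    "doc1": "GENEA binds GENEB in vitro.",
    "doc2": "GENEC expression is unchanged.",
    "doc3": "Loss of GENEB reduces GENEA levels.",
}

print(discover_relationships(find_literature(co_regulated, abstracts)))
```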
This is not really such a novel idea ….
Fritz Saxl (1890–1948) and Aby Warburg (1866–1929)
‘Ich will nicht, dass in der Bibliothek ewig gesucht wird! Dieses Suchen kostet Nerven und die dürfen nicht verschwendet werden an solche Dummheiten…’
‘I don’t want there to be endless searching in the library! It is at the expense of nerves, and these should not be wasted on such stupidities…’
Saxl & Warburg: Mnemosyne Atlas
Some text search engines
• Bibliographic databases (e.g. BIOSIS)
• Full text / web pages
PubMed
• Text-based!
• No direct linkage to other datasets
• Searches only title, authors and abstract
• Boolean keyword search (AND / OR)
• Search language is English
• All documents stored and indexed in one location
• No ranking on relevance to query!
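A minimal sketch of Boolean (AND / OR) keyword retrieval over a small in-memory inverted index. This is a generic illustration, not PubMed's actual implementation; the documents and function names are hypothetical. The result is a plain set, which mirrors the "no ranking" point above: every matching document counts as equally relevant.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Inverted index: lower-cased word -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def boolean_search(index, terms, op="AND"):
    """Return the unranked set of documents matching all (AND) or any (OR) terms."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    if op == "AND":
        return set.intersection(*postings)
    if op == "OR":
        return set.union(*postings)
    raise ValueError("op must be 'AND' or 'OR'")


# Hypothetical titles standing in for indexed abstracts.
docs = {
    "a1": "Gene expression in yeast",
    "a2": "Protein folding in yeast mitochondria",
    "a3": "Gene regulation in mice",
}
index = build_index(docs)
print(boolean_search(index, ["gene", "yeast"]))
```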
Main features
• Ability to interconnect literature articles with different types of molecular data, including images
• Ability to search through and retrieve journal articles and other full-text documents, even when they are held in different physical locations
• Ability to support multilingual documents and queries
• Services free to the academic community
Features implemented via conceptual fingerprinting: a discovery tool
Conceptual fingerprints
Full-text document → index terms and link them to (multilingual) thesauri → fingerprint database
• 1 conceptual fingerprint (CFP) = 400 bytes
• Abstraction: 250,000 pages/PC/day
• Matching: 500,000 CFPs in 40 ms
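The slide gives throughput figures but not the algorithm. The figures themselves imply that 500,000 fingerprints × 400 bytes is about 200 MB of fingerprint data, that matching them in 40 ms is about 12.5 million comparisons per second, and that 250,000 pages per PC per day is roughly 3 pages per second. The sketch assumes, purely for illustration, that a fingerprint is a count of thesaurus concept IDs compared by cosine similarity; it is not the actual E-BioSci fingerprinting method, and the thesaurus entries, concept IDs and function names are hypothetical. Because English, French and German synonyms map to the same concept ID, a query in any of the three languages yields a comparable fingerprint.

```python
import math
import re
from collections import Counter

# Hypothetical multilingual thesaurus: surface term -> concept ID.
THESAURUS = {
    "disease": "C001", "maladie": "C001", "krankheit": "C001",
    "kidney": "C002", "rein": "C002", "niere": "C002",
    "protein": "C003", "protéine": "C003",  # German 'Protein' lower-cases to 'protein'
    "liver": "C004", "foie": "C004", "leber": "C004",
}


def fingerprint(text):
    """Count thesaurus concepts in the text; words not in the thesaurus are ignored."""
    words = re.findall(r"\w+", text.lower())
    return Counter(THESAURUS[w] for w in words if w in THESAURUS)


def similarity(fp_a, fp_b):
    """Cosine similarity of two concept-count fingerprints (0.0 if either is empty)."""
    dot = sum(fp_a[c] * fp_b[c] for c in fp_a)
    norm_a = math.sqrt(sum(v * v for v in fp_a.values()))
    norm_b = math.sqrt(sum(v * v for v in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, database, top=3):
    """Rank stored fingerprints by similarity to the query's fingerprint."""
    q = fingerprint(query)
    scored = [(similarity(q, fp), doc_id) for doc_id, fp in database.items()]
    return sorted(scored, reverse=True)[:top]


# Hypothetical documents fingerprinted once into a "fingerprint database".
documents = {"doc1": "Kidney disease and protein loss", "doc2": "Liver protein synthesis"}
database = {doc_id: fingerprint(text) for doc_id, text in documents.items()}
print(search("maladie du rein", database))  # French query matched against English text
```

A production system would presumably also weight concepts (for example by rarity) and store each fingerprint in a compact fixed-size encoding such as the 400 bytes quoted on the slide; the `Counter` here is only for readability.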
Prototypes
• Initial prototypes in September 2002 and July 2003
• Current prototype online since 1st March 2004
• Next launch due mid-October 2004
E-BioSci
• Content selection: abstracts + full text
• Choose search focus
• Full-text query in English, French or German; the query is fingerprinted for search
… and now a word about …
• 8 partners (DE, ES, FR, UK) (Platform)
• 13 partners (ES, FR, IT, NL, UK) (Research project)
Oriel’s aims
Wouldn’t it be nice to be able to navigate from an image to literature and molecular databases?
www.bioimage.org (Dr David Shotton, University of Oxford)
Gene symbol identification in text
[Figure: text containing gene symbols]
Improved literature – molecular dataset linkage
Twinkle, twinkle, little star,
How I wonder what you are.
Up above the world so high,
Like a diamond in the sky.
Twinkle, twinkle, little star,
How I wonder what you are.
Gene symbols: PEO1, GUCY2C, TYRO3, CD44
Problems in gene symbol recognition
• Many gene symbols are indistinguishable from everyday words or abbreviations
• Synonyms
• Homonym synonyms (ELK1 = SAP1; CAR1 = SAP1; BD-2 = SAP1; RIP1_SAPOF = SAP1)
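The three problems listed above can be made concrete with a dictionary-based tagger. This is a minimal sketch, not the method used in the projects described here: the lexicon, the common-word list and the function name are hypothetical. The SAP1 entry reproduces the homonym example from this slide, and "TWINKLE" (the protein encoded by PEO1) stands in for a symbol that is also an everyday word; as a crude heuristic, such words count as gene mentions only when written in capitals.

```python
import re

# Illustrative lexicon: symbol or synonym -> set of gene symbols it may denote.
LEXICON = {
    "SAP1": {"ELK1", "CAR1", "BD-2", "RIP1_SAPOF"},  # homonym synonym from the slide
    "CD44": {"CD44"},
    "TWINKLE": {"PEO1"},  # alias that is also an everyday word
}

# Hypothetical list of symbols that double as ordinary English words.
COMMON_WORDS = {"TWINKLE"}


def tag_gene_symbols(text):
    """Return (token, candidate genes, status) for every lexicon hit in the text."""
    results = []
    for token in re.findall(r"[A-Za-z0-9_-]+", text):
        key = token.upper()
        if key not in LEXICON:
            continue
        candidates = LEXICON[key]
        if key in COMMON_WORDS and not token.isupper():
            status = "likely ordinary word"  # e.g. 'Twinkle' in a nursery rhyme
        elif len(candidates) > 1:
            status = "ambiguous"  # one symbol, several possible genes
        else:
            status = "resolved"
        results.append((token, sorted(candidates), status))
    return results


for hit in tag_gene_symbols("SAP1 and CD44 were measured. Twinkle, twinkle, little star."):
    print(hit)
```

Resolving the "ambiguous" cases would need context (co-occurring terms, organism, database cross-references), which is where the literature-to-database linkage on these slides comes in.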
Word-“processing”
[Figure: text broken into scattered word fragments]
Natural language processing
Protein interaction networks
[Figure: network linking ataxia, Yfh1, Ssc1, Isu1 and Oct1 through relations such as requires, regulates, interacts and activates]
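Going from text to a network like the one on this slide can be sketched, in a very simplified form, as extracting (subject, relation, object) triples and collecting them into an adjacency list. The relation verbs are taken from the slide; the single "Name verb Name" regular expression, the function names and the placeholder protein names (ProtA, ProtB, ProtC) are assumptions for illustration. Real natural language processing has to handle negation, passive voice, coordination and the symbol-recognition problems above.

```python
import re
from collections import defaultdict

# Relation verbs from the slide; the matching pattern is a simplifying assumption.
RELATIONS = ("requires", "regulates", "interacts with", "activates")
PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9]*)\s+(" + "|".join(RELATIONS) + r")\s+([A-Z][A-Za-z0-9]*)\b"
)


def extract_triples(sentences):
    """Pull (subject, relation, object) triples from simple 'X verb Y' sentences."""
    triples = []
    for sentence in sentences:
        triples.extend(PATTERN.findall(sentence))
    return triples


def build_network(triples):
    """Adjacency list: protein -> list of (relation, partner)."""
    network = defaultdict(list)
    for subj, rel, obj in triples:
        network[subj].append((rel, obj))
    return dict(network)


# Hypothetical sentences with placeholder protein names.
sentences = [
    "ProtA activates ProtB.",
    "ProtB interacts with ProtC in vivo.",
    "No relation is stated here.",
]
print(build_network(extract_triples(sentences)))
```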
Hoffmann & Valencia (Madrid)
Some web addresses
http://www.e-biosci.org
http://www.oriel.org
http://www.bioimage.org
http://www.pdg.cnb.uam.es/UniPub/iHOP/