
e6592962d1f9d8c695c3798ab2fae95d.ppt
- Number of slides: 67
Use of Computers in Molecular Biology
Meena K Sakharkar
Training Manager, BioInformatics Centre
National University of Singapore
What is BioInformatics?
• Many related terms and buzzwords
• A multiplicity of names:
  – bioinformatics
  – biocomputing
  – biological computing
  – computational biology
  – computational genomics
  – biological data mining
Overview of the challenges of Molecular Biology Computing
• The huge dataset problem
  – automated DNA sequencers
  – the Human Genome Project
  – bulk sequencing of cDNAs (ESTs)
GenBank Growth Chart
[Figure: bases (y-axis) vs. year (x-axis)]
As of Oct. 1999, GenBank contains over 3.8 billion bases of DNA and protein sequence, which requires about 18 gigabytes of computer disk storage space.
Human Genome Project
• What is the Human Genome Project?
  – a 15-year effort, formally begun in October 1990, coordinated by the U.S. Department of Energy and the National Institutes of Health.
  – identify all the estimated 80,000 genes in human DNA,
  – determine the sequences of the 3 billion chemical bases that make up human DNA,
  – store this information in databases,
  – develop tools for data analysis, and
  – address the ethical, legal, and social issues (ELSI) that may arise from the project.
• Who is head of the U.S. Human Genome Project?
  – The DOE Human Genome Program is directed by Ari Patrinos, and Francis Collins directs the NIH Human Genome Program.
  – Ari Patrinos also heads the Department of Energy Office of Biological and Environmental Research.
Related fields
• molecular evolution
• origin of life
• genomics and proteomics
• the Human Genome Project
• theoretical biology
• complexity and information theory
• biotechnology
• lead drug discovery
• computing with biomolecules
Our (working) definition
• Bioinformatics: the body of tools, algorithms and know-how needed to handle complex biological information (the technological aspect)
• Computational biology: the application of bioinformatics tools to perform biological studies (the scientific aspect)
• A very broad and diverse field
• Bioinformatics is clearly a multidisciplinary field including:
  – computer systems management
  – networking, database design
  – computer programming
  – molecular biology
Integrating bioinformatics and computational biology:
• A biologist can use existing tools but might misinterpret results.
  – The black-box effect: the 'software kit'
• A biologist might refrain from doing some interesting analysis if the existing software doesn't offer it as an option.
  – The ability to program is important
• A computer scientist or a programmer can produce interesting and/or efficient algorithms and tools, but these might lack biological relevance.
  – A biological training/background is important
• Beware of the 'just a tool maker' stigma
• Best results are achieved by integrating the development of tools with their usage in interesting biological systems
How to handle all the information?
• Producing
• Processing
• Storing
• Sharing
• Querying
• Retrieving
• Visualising
• Annotating
• Curating
Use of Computers in Molecular Biology
• Powerful tools to organise the data itself.
  – Exponential growth.
  – A new release is made every two months.
• Data Analysis.
  – Retrieval.
  – Homology Search.
  – Modelling purposes: Drug Design
• Data Integration
• Data Visualisation
Paradigmatic Shift:
• Getting new sequences is now easy.
• Having a new sequence, we can start by analysing it using the computer, or we can start by doing experimental work.
• "A month in the lab can often save an hour in the library." (Westheimer) ... or searching the Internet, or doing computerised analyses.
• From 'wet lab' to 'soft lab'.
• in vivo, in vitro, and in silico
Information is being collected, organized, and made available:
• GenBank is the central sequence information database in the United States
• Data is shared between GenBank, the European Molecular Biology Laboratory (EMBL) and the DNA Database of Japan (DDBJ)
• All sequence data submitted to any of these databases is automatically integrated into the others.
• Sequence data is also incorporated from the Genome Sequence Data Base (GSDB) and from patent applications.
Similarity Searching in the databanks
• "Are there any sequences in the databanks similar to my sequence?"
• Directly searching the databanks by rigorously comparing the query with every sequence uses too much computer time
• The biologist uses timesaving heuristic tools: FASTA and BLAST
• Interpreting the hits relies on statistics and the informed judgement of the biologist.
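
The timesaving idea can be illustrated with a short word-lookup sketch. This is not the FASTA or BLAST algorithm itself, only the shared first step of finding short exact words before any expensive comparison. The function names, the word length k=3 and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def build_word_index(database, k=3):
    """Map every k-letter word to the names of the entries that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_hits(query, index, k=3):
    """Rank database entries by how many distinct query words they share."""
    counts = defaultdict(int)
    query_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    for word in query_words:
        for name in index.get(word, ()):
            counts[name] += 1
    # Highest count first; ties broken by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy databank (made-up sequences).
toy_db = {"seqA": "ACGTACGTGA", "seqB": "TTTTCCCCGG", "seqC": "GGACGTAATT"}
index = build_word_index(toy_db)
hits = candidate_hits("ACGTAC", index)
print(hits)  # [('seqA', 4), ('seqC', 3)]
```

Only the entries that share words with the query (here seqA and seqC) would go on to a slower, more careful alignment; seqB is never examined again.
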
Pairwise and Multiple Alignments
• Multiple alignment is the basis for the study of protein families and functional domains.
• When pairwise alignment is expanded to multiple sequences, it becomes a computationally huge problem.
• To reduce the enormous number of possible alignments, a simplified heuristic (approximate) algorithm known as progressive pairwise alignment is used.
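
A minimal sketch of the pairwise building block, assuming a simple global alignment score (dynamic programming in the style of Needleman-Wunsch) and arbitrary scoring values of +1 match, -1 mismatch, -2 gap. It is not a progressive multiple aligner; `closest_pair` only illustrates the usual first step of such methods, which is to score all pairs and start from the most similar one. All names and sequences are hypothetical.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a prefix aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b prefix aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def closest_pair(seqs):
    """Return the names of the two sequences with the highest pairwise score."""
    return max(
        combinations(sorted(seqs), 2),
        key=lambda pair: global_alignment_score(seqs[pair[0]], seqs[pair[1]]),
    )


family = {"s1": "ACGT", "s2": "ACGA", "s3": "TTTT"}
print(closest_pair(family))  # ('s1', 's2')
```

For n sequences this needs n(n-1)/2 pairwise comparisons, which is far cheaper than searching all possible multiple alignments at once.
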
Structure-function relationships: Sequence patterns that predict function
• One of the challenging areas of computational molecular biology is the prediction of the function of protein molecules from their sequence.
• Sequence determines 3-D structure, structure determines function
• Identify conserved regions (domains or motifs)
• Domain databases can be used to scan any unknown protein sequence
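
Scanning a sequence for a conserved motif can be sketched with a regular expression. The pattern used here is the N-glycosylation site pattern N-{P}-[ST]-{P} as catalogued in PROSITE; the protein sequence and the function name are made up for illustration, and real domain databases use far richer models than a single regular expression.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# The lookahead lets overlapping sites be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (1-based position, matched residues) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]


# Hypothetical toy sequence, not a real protein.
sites = find_motif("MKNGSAANPSLNATQ")
print(sites)  # [(3, 'NGSA'), (12, 'NATQ')]
```

The candidate at position 8 (NPSL) is correctly rejected because proline follows the asparagine. A match only suggests a possible site; as with similarity searching, the biologist still has to judge its relevance.
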
Searching Literature using PubMed at NCBI
PubMed
• Project by NIH and NLM.
• Search tool for accessing literature citations.
• The PubMed search system covers the MedLine and PreMedLine databases and the molecular biology databases indexed under Entrez.
MedLine
• MedLine (MEDLARS onLINE) is NLM's premier bibliographic database.
• Covers medicine, nursing, dentistry, veterinary medicine, the health care sciences and pre-clinical sciences.
• Indexes over 3,900 current biomedical journals published in the US and other countries.
MedLine
• 9 million records.
• Coverage since 1966.
PreMedLine
• Introduced in August 1996.
• Provides basic citations and abstracts before the full records are prepared and added to MedLine.
MEDLINE SAMPLE RECORD
UI  98408838
AU  Tao X, Dafu D
TI  Relationship between synonymous codon usage and protein structure.
MH  Codon* Protein Folding* Protein Structure, Secondary*
MH  Proteins / genetics ……
AB  The hypothesis that synonymous codon usage is related to protein three-dimensional structure is examined by …
PT  Journal article
SO  FEBS Lett 1998 Aug 28: 434(1-2): 93-6
MEDLINE Indexing
• MeSH Terms to LIMIT Retrieval
  – human, animal, male, female,
  – age groups, organism, etc.
• Publication Types (another way to LIMIT)
  – review, clinical trial, letter, journal article, etc.
MEDLINE Subject Headings
Advantages of MeSH Terms
• Represent a subject concept, so no term synonyms are needed
• Find relevant articles on a search topic that may not be explicitly mentioned in a title or abstract
• Focus the search and be specific to eliminate irrelevant records
• Increase search efficiency to save time … Get reliable results
Searching MEDLINE Subject Headings
Disadvantages of MeSH
• Thesaurus terms may not cover all concepts, esp. jargon
• Not every concept in an abstract or article can get thesaurus terms
MEDLINE Searching
• Search terms are combined with the Boolean operators “OR” and “AND”.
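
How the two operators combine can be sketched with a small query builder: synonyms for one concept are joined with OR (which broadens retrieval), and separate concepts are joined with AND (which narrows it). The function name, the quoting of multi-word terms and the parentheses are assumptions about a generic search box, not a statement of the exact syntax any particular MEDLINE interface accepts.

```python
def build_query(concepts):
    """OR together the synonyms of each concept, then AND the concepts."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word terms so they are searched as phrases (assumed syntax).
        terms = [f'"{term}"' if " " in term else term for term in synonyms]
        group = " OR ".join(terms)
        groups.append(f"({group})" if len(terms) > 1 else group)
    return " AND ".join(groups)


query = build_query([["codon usage", "codon bias"], ["protein structure"]])
print(query)  # ("codon usage" OR "codon bias") AND "protein structure"
```

Adding a synonym to an inner list retrieves more records; adding another inner list (a further concept) retrieves fewer.
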
Modifying Retrieval -- NOT ENOUGH Found
• Reduce number of concepts to combine
• Add synonyms or related terms
  – Use both free-text words & MeSH terms
  – Truncate free-text words as appropriate
  – Explode subject term, if it has narrower terms
• Do NOT use limits (e.g., major point, review)
• Consult a professional searcher … Librarian
Modifying Retrieval -- TOO Many Found
• Use MeSH terms only … Use no free-text words
• Use “MeSH Power” to Focus Your Search
  – Try a more specific MeSH term
  – Limit MeSH terms to MAJOR point of article
  – Use a Subheading with your MeSH term
• Reduce number of synonyms, if free-text searching
• Add additional concepts to your search
• Use Limits … English language, reviews
• Restrict to human, animal, or organism
Internet Tools and Searches
Network Utilities
What is the Internet?
• A worldwide collection of networks of computers
• A network of computer networks
• A network based on the TCP/IP protocol
Standalone Computer: a typical setup at home
[Figure: PC, printer, speakers]
LAN: a small Local Area Network of two computers and one printer in your office
Inter-Departmental Network
Campus Wide Network
The INTERNET
[Figure: Campus Network, Wide Area Network, National Network, InterCountry Network, Global Network]
What can you do with the Internet?
INTERNET APPLICATIONS
• Electronic Mail (Email)
• Internet Talk/Chat (IRC)
• File Transfer (FTP)
• Remote Login (Telnet)
• Internet News (Usenet)
• Info retrieval (Gopher, World Wide Web)
• Audio/Video Conferencing (CU-SeeMe, MBone)
• Internet Phone
FTP: File Transfer Protocol
  ftp ncbi.nlm.nih.gov
  login: anonymous
  passwd: email address
To ftp from a server on which you have an account, use your own login and password.
FTP commands continued...
• cd - change directory
• ls - list the directory contents
• pwd - print the working directory
• bin - transfer in binary mode
• asc - transfer in ascii mode
• hash - show transfer progress with hash marks (#)
• lcd - change the local directory
FTP commands continued...
• prompt - toggle interactive prompting during multiple file transfers
• mget - get multiple files; for a single file you can just use get
• mput - put multiple files onto the server
• put - single file transfer
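
The same anonymous session can be scripted. This is a minimal sketch using Python's standard `ftplib` module; the function name, the `ftp_factory` parameter (present only so the function can be tried without a network) and the host, directory and file names in the usage comment are hypothetical. Many servers no longer offer anonymous FTP, so treat it as an illustration of the command sequence rather than a working recipe for a particular site.

```python
from ftplib import FTP


def fetch_binary(host, remote_dir, filename, email, local_path=None, ftp_factory=FTP):
    """Anonymous download of one file: login, cd, then a binary-mode get."""
    local_path = local_path or filename
    ftp = ftp_factory(host)                      # ftp <host>
    try:
        ftp.login("anonymous", email)            # login: anonymous / passwd: email address
        ftp.cwd(remote_dir)                      # cd <remote_dir>
        with open(local_path, "wb") as out:
            # Equivalent of "bin" followed by "get <filename>".
            ftp.retrbinary(f"RETR {filename}", out.write)
    finally:
        ftp.quit()                               # bye
    return local_path


# Hypothetical usage (requires network access and a server that allows anonymous FTP):
# fetch_binary("ftp.example.org", "/pub", "README", "you@example.org")
```
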
Telnet
• Work on another machine by remote login.
    telnet intron.bic.nus.edu.sg
    login:
    passwd:
• Must have an account on the machine for doing telnet
• Must have an internet connection
• Space is allocated to you on the machine
HTML - an Introduction
What is Hypertext?
• Non-Linear Text
• Links embedded in the text
• Jumps to other locations in the document/db
[Figure: "the quick brown fox jumps over the fence" with a link to "Fence..."]
Creating a Web Page
• Terms to Know
  – WWW/Web: World Wide Web
  – HTML: HyperText Mark-up Language
  – URL: Uniform Resource Locator
• I assume that you:
  – know how to use Netscape or some other Web browser
  – have access to a Web server (or that you want to produce HTML documents for personal use in local-viewing mode)
Creating a Web Page
What an HTML Document Is
• Collection of styles
• HTML documents are plain-text files
• Can be created using any text editor
• You can also use word-processing software if you remember to save your document as "text only with line breaks."
• HTML is not case sensitive.
• TAGS are used to mark the elements of the file for your browser.
Creating a Web Page
TAGS Explained
• Every HTML document should contain certain standard HTML tags.
• Each document consists of head and body tags.
• The head contains the title, and the body contains the actual text that is made up of paragraphs, lists, and other elements.
<html>
<head>
<TITLE>A Simple HTML Example</TITLE>
</head>
<body>
<H1>HTML is Easy To Learn</H1>
<P>Welcome to the world of HTML. This is the first paragraph. While short it is still a paragraph!</P>
<P>And this is the second paragraph.</P>
</body>
</html>
• The required elements are the <html>, <head>, <title>, and <body> tags (and their corresponding end tags).
• Note: Because you should include these tags in each file, you might want to create a template file with them.
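
A template can be checked for the four required elements with a few lines of Python. This is a rough sketch using the standard `html.parser` module; the class and function names are made up for illustration, and it only reports whether each tag is opened and closed somewhere, not whether the tags are correctly nested.

```python
from html.parser import HTMLParser

REQUIRED = ("html", "head", "title", "body")


class TagCollector(HTMLParser):
    """Record which tags are opened and which are closed (names arrive lowercased)."""

    def __init__(self):
        super().__init__()
        self.opened = set()
        self.closed = set()

    def handle_starttag(self, tag, attrs):
        self.opened.add(tag)

    def handle_endtag(self, tag):
        self.closed.add(tag)


def missing_required_tags(document):
    """Return the required tags that lack a start tag or an end tag."""
    parser = TagCollector()
    parser.feed(document)
    parser.close()
    return [t for t in REQUIRED if t not in parser.opened or t not in parser.closed]


template = (
    "<html><head><TITLE>A Simple HTML Example</TITLE></head>"
    "<body><H1>HTML is Easy To Learn</H1><P>Welcome.</P></body></html>"
)
print(missing_required_tags(template))                            # []
print(missing_required_tags("<html><body><P>Hi</body></html>"))   # ['head', 'title']
```

Because the parser lowercases tag names, `<TITLE>` and `<title>` are treated alike, in line with HTML not being case sensitive.
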
TAGS Explained
• HTML:
  – This element tells your browser that the file contains HTML-coded information.
  – The file extension .html also indicates that this is an HTML document and must be used.
• HEAD:
  – The head element identifies the first part of your HTML-coded document that contains the title.
TITLE
  The title element contains your document title and identifies its content in a global context.
BODY
  Contains the content of your document.
HEADINGS
  HTML has six levels of headings, numbered 1 through 6, with 1 being the most prominent. Headings are displayed in larger and/or bolder fonts than normal body text. The syntax of the heading element is:
    <Hy>Text of heading</Hy>
  where y is a number between 1 and 6 specifying the level of the heading.
PARAGRAPHS
  Carriage returns in HTML files aren't significant. Word wrapping can occur at any point in your source file, and multiple spaces are collapsed into a single space by your browser. The </P> closing tag can be omitted, because browsers understand that when they encounter a <P> tag, it implies the end of the previous paragraph.
Using <P> and </P> as a paragraph container means that you can center a paragraph by including the ALIGN=alignment attribute in your source file:
  <P ALIGN=CENTER>
  This is a centered paragraph.
  </P>
The output is:
          This is a centered paragraph.
Lists
HTML supports unnumbered, numbered, and definition lists. You can nest lists too, but use this feature sparingly because too many nested items can get difficult to follow.
Unnumbered Lists
To make an unnumbered, bulleted list,
  1. start with an opening list <UL> (for unnumbered list) tag
  2. enter the <LI> (list item) tag followed by the individual item; no closing </LI> tag is needed
  3. end the entire list with a closing list </UL> tag
Below is a sample three-item list:
  <UL>
  <LI> apples
  <LI> bananas
  <LI> grapefruit
  </UL>
The output is:
  • apples
  • bananas
  • grapefruit
Numbered Lists
A numbered list (also called an ordered list, from which the tag name derives) is identical to an unnumbered list, except it uses <OL> instead of <UL>. The items are tagged using the same <LI> tag. The following HTML code:
  <OL>
  <LI> oranges
  <LI> peaches
  <LI> grapes
  </OL>
produces this formatted output:
  1. oranges
  2. peaches
  3. grapes
Definition Lists
A definition list (coded as <DL>) usually consists of alternating a definition term (coded as <DT>) and a definition (coded as <DD>). Web browsers generally format the definition on a new line. The following is an example of a definition list:
  <DL>
  <DT> NCSA
  <DD> NCSA, the National Center for Supercomputing Applications, is located on the campus of the University of Illinois at Urbana-Champaign.
  <DT> Cornell Theory Center
  <DD> CTC is located on the campus of Cornell University in Ithaca, New York.
  </DL>
The output looks like:
  NCSA
      NCSA, the National Center for Supercomputing Applications, is located on the campus of the University of Illinois at Urbana-Champaign.
  Cornell Theory Center
      CTC is located on the campus of Cornell University in Ithaca, New York.
Nested Lists
Lists can be nested. You can also have a number of paragraphs, each containing a nested list, in a single list item. Here is a sample nested list:
  <UL>
  <LI> A few New England states:
    <UL>
    <LI> Vermont
    <LI> New Hampshire
    <LI> Maine
    </UL>
  <LI> Two Midwestern states:
    <UL>
    <LI> Michigan
    <LI> Indiana
    </UL>
  </UL>
The nested list is displayed as:
  • A few New England states:
    – Vermont
    – New Hampshire
    – Maine
  • Two Midwestern states:
    – Michigan
    – Indiana
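
Closing tags are easy to drop when nested lists are typed by hand. One way around this, sketched here as an illustration only, is to generate the markup from a data structure so that every <UL> is closed automatically. The function name and the convention of writing a nested item as a (label, children) tuple are assumptions of this sketch.

```python
from html import escape


def render_list(items, indent=0):
    """Render a (possibly nested) list as <UL>/<LI> lines with matching </UL> tags.

    A plain string is a simple item; a (label, children) tuple is an item
    that carries a nested list.
    """
    pad = "  " * indent
    lines = [f"{pad}<UL>"]
    for item in items:
        if isinstance(item, tuple):
            label, children = item
            lines.append(f"{pad}  <LI> {escape(label)}")
            lines.extend(render_list(children, indent + 2))
        else:
            lines.append(f"{pad}  <LI> {escape(item)}")
    lines.append(f"{pad}</UL>")
    return lines


states = [
    ("A few New England states:", ["Vermont", "New Hampshire", "Maine"]),
    ("Two Midwestern states:", ["Michigan", "Indiana"]),
]
markup = "\n".join(render_list(states))
print(markup)
```

The recursion appends the closing tag in the same call that wrote the opening tag, so the two can never get out of step however deep the nesting goes.
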
Forced Line Breaks/Postal Addresses
The <BR> tag forces a line break with no extra (white) space between lines. Using <P> elements for short lines of text such as postal addresses results in unwanted additional white space. For example, with <BR>:
  National Center for Supercomputing Applications<BR>
  605 East Springfield Avenue<BR>
  Champaign, Illinois 61820-5518<BR>
The output is:
  National Center for Supercomputing Applications
  605 East Springfield Avenue
  Champaign, Illinois 61820-5518
Horizontal Rules
The <HR> tag produces a horizontal line the width of the browser window. A horizontal rule is useful to separate sections of your document. For example, many people add a rule at the end of their text and before the <address> information. You can vary a rule's size (thickness) and width (the percentage of the window covered by the rule). Experiment with the settings until you are satisfied with the presentation. For example:
  <HR SIZE=4 WIDTH="50%">
displays as a rule 4 units thick that spans half the window width.
• Physical Styles
  <B> bold text
  <I> italic text
  <TT> typewriter text, e.g. fixed-width font.
• Linking Power: link text and/or an image to another document. The browser highlights the identified text or image with color and/or underlines to indicate that it is a hypertext link.
HTML's single hypertext-related tag is <A>, which stands for anchor. To include an anchor in your document:
  1. start the anchor with <A (include a space after the A)
  2. specify the document you're linking to by entering the parameter HREF="filename" followed by a closing right angle bracket (>)
  3. enter the text that will serve as the hypertext link in the current document
  4. enter the ending anchor tag: </A> (no space is needed before the end anchor tag)
Here is a sample hypertext reference in a file called US.html:
  <A HREF="http://www.bic.nus.edu.sg">BIC HomePage</A>
This entry makes the words BIC HomePage the hyperlink to the document http://www.bic.nus.edu.sg/index.html.
You can make it easy for a reader to send electronic mail to a specific person or mail alias by using a mailto: address as the HREF of a hyperlink. The format is:
  <A HREF="mailto:emailinfo@host">Name</a>
For example, enter:
  <A HREF="mailto:meena@bic.nus.edu.sg">Meena KS</a>
to create a link that opens a mail window already addressed to Meena KS. (You, of course, will enter another mail address!)
To include an inline image, enter:
  <IMG SRC=ImageName>
where ImageName is the URL of the image file. The syntax for <IMG SRC> URLs is identical to that used in an anchor HREF. If the image file is a GIF file, then the filename part of ImageName must end with .gif. Filenames of X Bitmap images must end with .xbm; JPEG image files must end with .jpg or .jpeg; and Portable Network Graphic files must end with .png.
Image Size Attributes
  <IMG SRC=SelfPortrait.gif HEIGHT=100 WIDTH=65>
Demo: http://www.ncbi.nlm.nih.gov