b353c6ebce3f43a07eb011c691f833e0.ppt
- Number of slides: 38
Lecture 10: Back to the Basics: Python and Application in Bioinformatics
Y. Z. Chen
Department of Pharmacy, National University of Singapore
Tel: 65-6616-6877; Email: phacyz@nus.edu.sg; Web: http://bidd.nus.edu.sg
Content
• What is Python?
• Python basics
• Application in bioinformatics
Why Programming?
Programming skills are needed for tasks such as:
• Writing a program that repeats the same PubMed search every week and lists the new hits for molecular interactions and network regulation
• Running a BLAST search against the sequences on your list of proteins with known kinetic data
• Merging results from different searches
• Importing data into Excel for plotting
What Programming Tools?
Popularly used programming tools:
• Programming languages - Perl, Python, C, C++, Java, Visual Basic, PHP, Fortran
• Software libraries - BioPerl, Biopython, and BioJava
• Databases - MySQL, PostgreSQL, Oracle
Statistics of Software Usage Nature Biotech 25, 390 (2007)
Why Python?
• Suitable for relatively small automated tasks such as search-and-replace over a large number of text files, renaming and rearranging files, writing a small database or a specialized GUI application, and developing simple games
• Faster and easier to develop in than C/C++/Java
• Simpler to use, and available on Windows, Mac OS X, and Unix operating systems
• A real programming language: it offers more structure and support than shell scripts or batch files, more error checking than C, and built-in high-level data types, and it is applicable to a much larger problem domain than Awk or even Perl while in many cases being equally easy to use
• An interpreted language, which can save considerable time during program development because no compilation and linking is necessary
Why Python?
• Allows you to split a program into modules that can be reused in other Python programs, and comes with a large collection of standard modules for file I/O, system calls, sockets, and interfaces to graphical user interface toolkits.
• Enables programs to be written compactly and readably, typically at much shorter length than equivalent C, C++, or Java programs, for several reasons:
  • the high-level data types allow you to express complex operations in a single statement;
  • statement grouping is done by indentation instead of beginning and ending brackets;
  • no variable or argument declarations are necessary.
• Extensible: if you know how to program in C, it is easy to add a new built-in function or module to the interpreter. You can also link the Python interpreter into an application written in C and use it as an extension or command language for that application.
What is Python?
Python is a Programming Language
• Started by Guido van Rossum in 1990 as a way to write software for the Amoeba operating system. Influenced by ABC, which was designed to be easy to learn. It is also very useful for large programs written by expert programmers.
• The word "Python" comes from the comedy troupe "Monty Python." Words and jokes from the skits and movies appear often in Python software, including "spam," "idle," and "grail."
What is Python?
Python Properties
• Interpreted language
• Interactive mode
• Imperative and "object-oriented"
• Cross-platform
• Doesn't try to guess what you mean
• Great for team projects
• Popular for web applications, testing, and XML
• Extremely popular for chemical informatics (but not so much in bioinformatics)
What is Python?
Interactive Mode
• Python has an interactive mode. You can type Python code and see the results immediately. To start Python, open a Unix shell and type "python".
    > python
    Python 2.3.3 (#1, Jan 29 2004, 22:55:13)
    [GCC 3.3.3 [FreeBSD] 20031106] on freebsd5
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
• At the >>> prompt you can enter Python code.
Python Resources
http://python.org/
http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html
Example: Using Python as a calculator
    >>> 2+3
    5
    >>> 4+6*8
    52
    >>> abs(-4)
    4
    >>> help(abs)
    Help on built-in function abs:
    abs(...)
        abs(number) -> number
        Return the absolute value of the argument.
    >>> 89**34
    1902217732808760980190430983601716818363305103120555045416541165041L
    >>> print 89**34
    1902217732808760980190430983601716818363305103120555045416541165041
    >>> "What... is the air-speed velocity of an unladen swallow?"
    'What... is the air-speed velocity of an unladen swallow?'
    >>> print "What do you mean? An African or European swallow?"
    What do you mean? An African or European swallow?
What is Python?
Example: Importing a module
    >>> import math
    >>> help(math)
    Help on module math:
    NAME
        math
    FILE
        /usr/local/lib/python2.3/lib-dynload/math.so
    DESCRIPTION
        This module is always available. It provides access to the
        mathematical functions defined by the C standard.
    >>> math.pi
    3.1415926535897931
    >>> math.sin(math.pi/2.0)
    1.0
    >>>
What is Python?
Example: Print the Time of Day
    >>> from datetime import datetime
    >>> now = datetime.now()
    >>> now
    datetime.datetime(2008, 2, 2, 19, 23, 28, 809434)
    >>> print now
    2008-02-02 19:23:28.809434
    >>> print "Now is", now.strftime("%d-%m-%Y"), "at", now.strftime("%H:%M")
    Now is 02-02-2008 at 19:23
    >>>
• The notation name1.name2 is called an attribute lookup. In this case, name2 is an attribute of name1 and has some value.
    >>> now.day
    2
    >>> now.year
    2008
    >>> now.hour
    19
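The same attribute lookups can be tried in Python 3, where `print` is a function rather than a statement. The sketch below assumes Python 3 and uses a fixed timestamp (the one shown on the slide) instead of `datetime.now()`, so the result is reproducible.

```python
from datetime import datetime

# Fixed point in time taken from the slide, so the output never changes.
now = datetime(2008, 2, 2, 19, 23, 28, 809434)

# strftime is a method looked up on the object: now.strftime
print("Now is", now.strftime("%d-%m-%Y"), "at", now.strftime("%H:%M"))

# day, year and hour are plain data attributes of the same object.
print(now.day, now.year, now.hour)
```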
Simple Python script
Code:
    # file: simple_code.py
    import math
    from datetime import datetime
    print "log(1e23) =", math.log(1e23)
    print "2*sin(3.1414) =", 2*math.sin(3.1414)
    now = datetime.now()
    print "Now is", now.strftime("%d-%m-%Y"), "at", now.strftime("%H:%M")
    print "or, more precisely, %s" % now
Output:
    > python simple_code.py
    log(1e23) = 52.9594571389
    2*sin(3.1414) = 0.000385307177203
    Now is 02-02-2008 at 19:55
    or, more precisely, 2008-02-02 19:55:43.046953
    >
Python Script
Creating Python Script
• A Python program is just a text file. You can use any text (programmer's) editor. There are several on the Linux machines, including vi, XEmacs, Kate, xvim, and nedit. You can also use one of the free IDEs like idle, PyShell, or (under Microsoft Windows) Pythonwin.
Running Python Script
• Option 1: Run the python program from the command line, giving it the name of the script file to run.
    > python now.py
    Now is 02-02-2004 at 19:55
    or, more precisely, 2004-02-02 19:55:43.046953
    >
Python Script
Running Python Script
• Option 2: Put the magic comment #!/usr/bin/env python as the very first line in the program.
Code:
    #!/usr/bin/env python
    # now.py
    from datetime import datetime
    now = datetime.now()
    print "Now is", now.strftime("%d-%m-%Y"), "at", now.strftime("%H:%M")
    print "or, more precisely, %s" % now
Make the script executable with chmod +x now.py
    > chmod +x now.py
Then run the program as if it's any other Unix program
    > now.py
    Now is 02-02-2004 at 19:55
    or, more precisely, 2004-02-02 19:55:43.046953
Python Statements
Statement examples:
    sum = 2 + 2   # this is a statement
    name = raw_input("What is your name? ")
    # these are two statements
    print "Hello,", name
    # This is one statement spread across 2 lines
    print "Did you know that your name has", \
          len(name), "letters?"
    # Another way to extend a statement across several lines
    print "Here is your name repeated 7 times:", (
          name * 7 )
Python Statements
Blocks, if and for statements
    EcoRI = "GAATTC"
    sequence = raw_input("Enter a DNA sequence: ")
    if EcoRI in sequence:
        print "Sequence contains an EcoRI site"   # This is a one-line block
    import sys
    sequence2 = raw_input("Enter another sequence: ")
    if len(sequence2) < 100:
        print "Sequence is too small. Throw it back."   # a two-line block
        sys.exit(0)
    sequences = (sequence, sequence2)
    for seq in sequences:
        print "sequence length =", len(seq)   # a block...
        for c in "ATCG":
            print "#%s = %d" % (c, seq.count(c))   # ... with a block inside it
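To illustrate nested blocks a little further, the sketch below puts the EcoRI check into a function that reports every position of the site rather than only whether it is present. It assumes Python 3 (so `print` is a function), and the name `find_sites` and the sample sequence are invented for this example.

```python
ECORI = "GAATTC"  # EcoRI recognition site, as on the slide


def find_sites(sequence, site=ECORI):
    """Return the 0-based start positions of every occurrence of site."""
    positions = []
    start = sequence.find(site)
    while start != -1:                     # a block...
        positions.append(start)
        # Resume one base further on so overlapping matches are not missed.
        start = sequence.find(site, start + 1)
    return positions


seq = "TTGAATTCAAGGAATTCC"  # hypothetical sequence
for pos in find_sites(seq):
    print("EcoRI site at position", pos)
```

str.find returns -1 when nothing more is found, which is what ends the while block.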
Python Objects and Literals
String Literals
    # single quotes
    'Who said "to be or not to be"?'
    # double quotes
    "DNA goes from 5' to 3'."
    # escaped quotes
    "\"That's not fair!\" yelled my sister."
    # creates: "That's not fair!" yelled my sister.
    # triple quoted strings, with single quotes
    '''This one string can go
    over several lines'''
    # "raw" strings, mostly used for regular expressions
    r"\"That's not fair!\" yelled my sister."
    # creates: \"That's not fair!\" yelled my sister.
    # You can even have raw triple double quoted strings!
    r"""So there!"""
Python Objects and Literals
Numeric Literals
    123           # an integer
    1.23          # a floating point number
    -1.23         # a negative floating point number
    1.23E45       # scientific notation
    0x7b          # hexadecimal notation (decimal 123)
    0173          # octal notation (decimal 123)
    12+3j         # complex number 12 + 3i (Note that Python uses "j"!)
    2147483648L   # a long integer
Python Objects and Literals
List literal
    >>> data = [1, 4, 9, 16]
    >>> data[0]
    1
    >>> data[1]
    4
    >>> data[2] = 7
    >>> data
    [1, 4, 7, 16]
    >>> data[1:3]
    [4, 7]
    >>>
Python Objects and Literals
Tuple literal
    >>> data = (1, 4, 9, 16)
    >>> data[1]
    4
    >>> data[2] = 7
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: object doesn't support item assignment
    >>>
Dictionary literal
    >>> d = {"A": "ALA", "C": "CYS", "D": "ASP"}
    >>> print d["A"]
    ALA
    >>>
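A dictionary such as the one above is a natural lookup table. The sketch below, assuming Python 3, converts a peptide written in one-letter codes into three-letter codes. The function name `to_three_letter` and the `"XXX"` placeholder for unknown residues are choices made for this example, not part of the slides.

```python
# One-letter to three-letter amino acid codes (the subset shown on the slide).
three_letter = {"A": "ALA", "C": "CYS", "D": "ASP"}


def to_three_letter(peptide, unknown="XXX"):
    """Translate a one-letter peptide string into hyphen-joined three-letter codes."""
    # dict.get returns the fallback value instead of raising KeyError.
    return "-".join(three_letter.get(residue, unknown) for residue in peptide)


print(to_three_letter("ACD"))  # prints ALA-CYS-ASP
print(to_three_letter("ACZ"))  # prints ALA-CYS-XXX
```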
Python Operators
Some operations using numbers
    >>> (1+2)**2
    9
    >>> (2+3*4)/2
    7
    >>> 7%3   # % is the modulo operator
    1
    >>> 7 == 7
    True
    >>>
Python Operators
Some operations using strings
    >>> "Andrew" + "Dalke"
    'AndrewDalke'
    >>> "*" * 10
    '**********'
    >>> "My name is %s. What's your name?" % "Andrew"
    "My name is Andrew. What's your name?"
    >>> "My first name is %s and family name is %s" % ("Andrew", "Dalke")
    'My first name is Andrew and family name is Dalke'
    >>> "My first name is %(first)s. Is yours also %(first)s?" % \
    ... {"first": "Andrew", "family": "Dalke"}
    'My first name is Andrew. Is yours also Andrew?'
    >>> "Andrew" == "Dalke"
    False
    >>>
Python Functions
http://python.org/doc/current/lib/built-in-funcs.html
Python Functions
String Methods
    >>> seq = "AATGCCG"
    >>> seq.lower()
    'aatgccg'
    >>> seq.count("A")
    2
    >>> seq.find("GC")
    3
    >>> seq.find("gc")
    -1
    >>> seq.replace("C", "U")
    'AATGUUG'
    >>> import string
    >>> seq.translate(string.maketrans("ATCG", "TAGC"))
    'TTACGGC'
    >>> # Make the reverse complement
    >>> seq.translate(string.maketrans("ATCG", "TAGC"))[::-1]
    'CGGCATT'
    >>>
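In Python 3 the `string.maketrans` function no longer exists and the translation table is built with `str.maketrans` instead. The sketch below assumes Python 3 and wraps the reverse-complement idiom in a function. The name `reverse_complement` is chosen for this example, and only the four upper-case bases are handled.

```python
# Translation table mapping each base to its complement (upper case only).
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(seq):
    """Complement every base, then reverse the string with a -1 stride."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AATGCCG"))  # prints CGGCATT
```

Applying the function twice returns the original sequence, which makes a handy sanity check.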
Python Functions
Special Methods
Some methods are used so often that they have special syntax.
    >>> s = "AATGCCGTTTAT"
    >>> s[0]      # index
    'A'
    >>> s[1:4]    # slice from beginning to end
    'ATG'
    >>> s[:4]     # default beginning is position 0
    'AATG'
    >>> s[-1]     # index from the end
    'T'
    >>> s[-3:]    # default end includes the last character
    'TAT'
    >>> s[3:-3]
    'GCCGTT'
    >>> s[::2]    # the optional third parameter is the stride
    'ATCGTA'
    >>> s[::-1]   # returns the string, reversed
    'TATTTGCCGTAA'
    >>>
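Slicing is enough to cut a sequence into codons. The sketch below assumes Python 3. The function name `codons` and its `frame` parameter are invented for this example, and incomplete trailing codons are simply dropped.

```python
def codons(seq, frame=0):
    """Split seq into complete 3-base codons, starting at offset frame."""
    # Stopping at len(seq) - 2 guarantees every slice has exactly 3 bases.
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


s = "AATGCCGTTTAT"
print(codons(s))     # prints ['AAT', 'GCC', 'GTT', 'TAT']
print(codons(s, 1))  # prints ['ATG', 'CCG', 'TTT']
```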
Python Processing Command Line Arguments
• When a Python script is run, its command-line arguments (if any) are stored in the list sys.argv.
Code:
    #!/usr/bin/env python
    # file: echo.py
    import sys
    print sys.argv
Output:
    > chmod +x echo.py
    > echo.py tuna
    ['echo.py', 'tuna']
    > echo.py tuna fish
    ['echo.py', 'tuna', 'fish']
    > echo.py "tuna fish"
    ['echo.py', 'tuna fish']
    > echo.py
    ['echo.py']
    >
Python Processing Command Line Arguments
Computing the Hypotenuse of a Right Triangle
Code:
    #!/usr/bin/env python
    # file: hypotenuse.py
    import sys, math
    if len(sys.argv) != 3:
        # the program name and the two arguments
        # stop the program and print an error message
        sys.exit("Must provide two positive numbers")
    # Convert the two arguments from strings into numbers
    x = float(sys.argv[1])
    y = float(sys.argv[2])
    print "Hypotenuse =", math.sqrt(x**2+y**2)
Output:
    > hypotenuse.py 5 12
    Hypotenuse = 13.0
    >
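The script above checks only the number of arguments. A slightly more defensive variant, sketched below for Python 3, also rejects non-numeric and non-positive input. The function name `hypotenuse_from_args` is made up for this example, and `math.hypot` is used in place of the explicit square root.

```python
import math


def hypotenuse_from_args(args):
    """Return the hypotenuse for two numeric strings, or raise ValueError."""
    if len(args) != 2:
        raise ValueError("Must provide two positive numbers")
    # float() itself raises ValueError for text such as "abc".
    x, y = (float(a) for a in args)
    if x <= 0 or y <= 0:
        raise ValueError("Both numbers must be positive")
    return math.hypot(x, y)


# Demonstration with the arguments used on the slide.
print("Hypotenuse =", hypotenuse_from_args(["5", "12"]))  # prints Hypotenuse = 13.0
```

In a real script the list would come from `sys.argv[1:]`, and a caught `ValueError` could be passed to `sys.exit` to report the problem.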
Python I/O (Input / Output)
Input
• Text input comes from sys.stdin. It has a method called readline which reads a line of input.
    >>> import sys
    >>> s = sys.stdin.readline()
    This is a line of text. The line ends when I press 'Enter'.
    >>> s
    "This is a line of text. The line ends when I press 'Enter'.\n"
    >>>
• You can also use the raw_input function to get a string from sys.stdin. This function takes an optional argument which is used as the prompt.
    >>> name = raw_input("What is your name? ")
    What is your name? Andrew
    >>> print name, "is a nice name"
    Andrew is a nice name
    >>>
Python I/O (Input / Output)
Output
• Most Python text output goes to the sys.stdout file object. You've been using the print statement, which uses sys.stdout under the covers. Output file handles have a write method which writes a string to the file with no extra interpretation.
    >>> a, b, c = 1, 4, 9
    >>> print "The first three squares are", a, b, "and", c
    The first three squares are 1 4 and 9
    >>> print "The first three squares are", a, ",", b, "and", c, "."
    The first three squares are 1 , 4 and 9 .
    >>> print "The first three squares are %s, %s and %s." % (a, b, c)
    The first three squares are 1, 4 and 9.
    >>> import sys
    >>> sys.stdout.write("The first three squares are %s, %s and %s.\n" %
    ...                  (a, b, c))
    The first three squares are 1, 4 and 9.
    >>>
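For comparison, here is a rough Python 3 equivalent of the output examples. It is a sketch that assumes Python 3, where `print` is a function with a `sep` keyword, while `sys.stdout.write` behaves as before and adds no newline of its own.

```python
import sys

a, b, c = 1, 4, 9

# Build the text first with % formatting, then choose how to emit it.
line = "The first three squares are %s, %s and %s." % (a, b, c)

print(line)                    # print appends the newline
sys.stdout.write(line + "\n")  # write needs an explicit newline

# print separates its arguments with a single space by default...
print("The first three squares are", a, b, "and", c)
# ...and sep replaces that separator.
print(a, b, c, sep=", ")
```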
Python Applications in Bioinformatics
BLAST output parsing
• BLAST is the most widely used bioinformatics tool for searching large sequence databases. The original BLAST authors expected the output to be read only by people, but many users run BLAST as one step of a larger algorithm and want to automate it. This requires parsers for the various BLAST output flavors (BLASTN, BLASTP, TBLASTX, WU-BLAST, and so on). Such parsers have been developed and included in libraries such as BioPerl, Biopython, and BioJava.
First few lines of the BLASTP output
Python Applications in Bioinformatics
BLAST output parsing
• Getting program version information
• Program reporting the version information of a BLAST file
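The program shown on this slide is not recoverable from the text, so the following is only a minimal sketch of how the version line could be read. It assumes Python 3 and the classic NCBI plain-text layout, in which the first line gives the program name, a dotted version number, and a bracketed build date. The sample text, the `HEADER` pattern, and the function name `blast_version` are all invented for illustration.

```python
import re

# Hypothetical first lines of a BLASTP report (classic NCBI text layout).
SAMPLE = """BLASTP 2.0.10 [Aug-26-1999]

Reference: Altschul, Stephen F., et al. (1997)
"""

# Program name, dotted version number, and the bracketed build date.
HEADER = re.compile(r"^(T?BLAST[NPX])\s+(\d+(?:\.\d+)*)\s+\[(.+?)\]")


def blast_version(lines):
    """Return (program, version, date) from the first matching line, else None."""
    for line in lines:
        match = HEADER.match(line)
        if match:
            return match.groups()
    return None


print(blast_version(SAMPLE.splitlines()))
```

Other output flavors, such as WU-BLAST, use different header lines and would need their own patterns.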
Python Applications in Bioinformatics
BLAST output parsing
• Getting the number of sequences in the database and the number of letters
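A sketch of this step, assuming Python 3 and the classic NCBI text layout, in which the database block contains a line of the form "N sequences; M total letters" with comma-grouped numbers. The sample text, the `COUNTS` pattern, and the function name `database_size` are made up for this example and are not the code from the slide.

```python
import re

# Hypothetical database block from a BLAST report.
SAMPLE = """Database: nr
           1,234,567 sequences; 456,789,012 total letters
"""

# Two comma-grouped integers around the fixed words of the summary line.
COUNTS = re.compile(r"([\d,]+)\s+sequences;\s+([\d,]+)\s+total letters")


def database_size(text):
    """Return (number_of_sequences, number_of_letters), or None if absent."""
    match = COUNTS.search(text)
    if match is None:
        return None
    # Strip the thousands separators before converting to int.
    num_seqs, num_letters = (int(g.replace(",", "")) for g in match.groups())
    return num_seqs, num_letters


print(database_size(SAMPLE))  # prints (1234567, 456789012)
```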
Python Applications in Bioinformatics
BLAST output parsing
• Reading description lines
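The description lines are the one-line hit summaries that follow the "Sequences producing significant alignments:" heading in classic BLAST text output. The sketch below assumes Python 3 and that each summary line ends with a score and an E-value separated by whitespace. The sample report and the function name `description_lines` are invented for illustration and are not the code from the slides.

```python
# Hypothetical excerpt of a BLASTP report (classic NCBI text layout).
SAMPLE = """\
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|1111111|sp|P00001|AAAA_HUMAN hypothetical protein A               250   2e-66
gi|2222222|sp|P00002|BBBB_MOUSE hypothetical protein B                98   1e-20

>gi|1111111|sp|P00001|AAAA_HUMAN hypothetical protein A
"""


def description_lines(text):
    """Yield (title, score, e_value) for each hit in the summary table."""
    in_table = False
    for line in text.splitlines():
        if line.startswith("Sequences producing significant alignments"):
            in_table = True
            continue
        if not in_table:
            continue
        if line.startswith(">"):   # first alignment: the table is over
            break
        if not line.strip():       # skip blank separator lines
            continue
        # The last two whitespace-separated fields are score and E-value.
        title, score, e_value = line.rsplit(None, 2)
        if e_value.startswith("e"):
            # Assumption: some older reports abbreviate 1e-100 as e-100.
            e_value = "1" + e_value
        yield title, float(score), float(e_value)


for title, score, e_value in description_lines(SAMPLE):
    print(score, e_value, title)
```

For production work, a maintained parser such as the one in Biopython is usually a safer choice than hand-written splitting, because BLAST text output has changed between versions.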