Python programming for Life Science researchers - PowerPoint PPT Presentation

Slides: 51
Provided by: genesU
Transcript and Presenter's Notes

Title: Python programming for Life Science researchers


1
Python programming for Life Science researchers
  • Sebastián Bassi
  • Universidad Nacional de Quilmes, Argentina
  • sbassi_at_gmail.com
  • Updates: http://genes.unq.edu.ar

2
What is Python?
  • Python is a dynamic object-oriented programming
    language. It offers strong support for
    integration with other languages and tools, comes
    with extensive standard libraries, and can be
    learned in a few days.
  • Python is free, the source code (in C) is
    available, and programs written in Python can be
    distributed free of charge or sold for a fee.
  • The Python website is www.python.org

Python Overview
3
Why Python?
  • Some Python characteristics:
  • Easy to read (pseudocode that works)
  • Easy/Fast to code
  • Batteries included
  • Multiplatform (Windows, Linux, OSX, PDA)
  • Dynamically typed
  • Strongly typed
  • Friendly community

Python overview: what is different from other
languages.
4
Python philosophy
  • Python's designers reject exuberant syntax, such
    as in Perl, in favor of a sparser, less cluttered
    one. Python's developers expressly promote a
    particular "culture" or ideology based on what
    they want the language to be, favoring language
    forms they see as "beautiful", "explicit" and
    "simple".
  • Mandatory indentation, English keywords instead
    of punctuation and few syntactic constructions
    are derived from this philosophy.

Source: Wikipedia article on Python.
5
What can be done with Python
BitTornado: a BitTorrent client (a p2p file
sharing application).
There are several toolkits for building Python GUIs.
They are all out of the scope of this tutorial.
6
Dynamic web page generation (via CGI, like Perl)
The Python CGI module allows you to show dynamically
generated content on your website.
7
Bioinformatics apps (like GUI-BLAST)
A multi-platform GUI is used for running BLAST
queries.
8
How I use Python
  • Retrieve data from different sources: MySQL,
    Access, web pages, CSV files, Excel files, XML
    files.
  • Write data in different formats: XML, CSV, PDF,
    plain text.
  • Draw graphics in SVG and GIF/PNG.
  • Make dynamic web pages, some of which even query
    other sources (like other web pages).
  • Build GUIs for command-line programs.
  • Parse BLAST files.
  • Run multiple BLAST searches.
  • Convert and manipulate biological data.

There are many different uses of Python.
9
Python interactive interpreter
Python interactive interpreter screenshot
10
Python as a calculator
Python can be used as a calculator
11
Numeric data types: Int, Long, Float
  • Int: from -2^31 to 2^31 - 1 (on 32-bit platforms).
  • Long: integers greater than 2^31 - 1 (no longer
    used, see footnote).
  • Float: floating point numbers.

Longs are no longer used; the int data type can handle
integers as large as system memory allows.
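A small sketch of the footnote, assuming Python 3: an int keeps growing past the old 32-bit limit without any special type, while division produces a float.

```python
# Integers are exact at any size; there is no separate long type in Python 3.
limit = 2 ** 31 - 1        # the old 32-bit int maximum
big = (limit + 1) ** 4     # far beyond 32 bits, still an int
ratio = 1 / 3              # a float (floating point number)

print(limit)
print(big, type(big).__name__)
print(ratio, type(ratio).__name__)
```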
12
Text data types: String
Strings can be concatenated like this: '%s ...
%s' % ('tga', 'atg')
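As an illustration of string concatenation, the sketch below shows the + operator next to %-style formatting. The variable names are chosen here for the example only.

```python
start = 'atg'
stop = 'tga'

# Concatenation with the + operator.
joined = start + stop

# %-formatting: each %s is replaced by the matching item of the tuple.
formatted = '%s ... %s' % (stop, start)

print(joined)
print(formatted)
```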
13
Note: Escape characters
Non-printable and special characters must be
escaped with the backslash character (\).
Newline (Enter): \n  Tab: \t  Backslash (\): \\  Double quote: \"
You can use an escape character to insert a double
quote inside a string, like this: print("Here is a
\" (double quote)"). This will be printed as: Here
is a " (double quote)
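The escape sequences listed on this slide can be tried out as follows. This is only a sketch, and the sample strings are made up for the example.

```python
# \" puts a double quote inside a double-quoted string.
quoted = "Here is a \" (double quote)"

# \t is a tab and \n is a newline.
table = "name\tlength\nseq1\t29"

# \\ is a single literal backslash.
path = "C:\\data"

print(quoted)
print(table)
print(path)
```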
14
Data types: Lists
  • An array of data, like C vectors or VB and Perl
    arrays.

Defining, creating and accessing a list. Using
one index, you access only one element.
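The slide's own example is missing from the transcript. A minimal sketch of creating a list and reading single elements by index (list contents invented for illustration):

```python
bases = ['A', 'C', 'G', 'T']

first = bases[0]     # indexes start at 0
last = bases[-1]     # negative indexes count from the end
count = len(bases)   # number of elements

print(first, last, count)
```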
15
List slice notation
Slice notation is used for lists and strings.
Using two indexes, you get a sublist rather than
a single element.
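A short sketch of slice notation on both a list and a string. The start index is included and the end index is excluded; the sample data are invented.

```python
codons = ['atg', 'gct', 'tga', 'ccc']
middle = codons[1:3]       # elements at index 1 and 2
head = codons[:2]          # from the beginning up to index 2 (excluded)

sequence = 'atggcttga'
first_codon = sequence[0:3]
last_codon = sequence[-3:]  # the last three characters

print(middle, head, first_codon, last_codon)
```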
16
List operations: insert data
  • Append: add an element after the last element.
  • Insert: add an element at any arbitrary position
    (before the given index).
  • Extend: add all the elements of another list
    after the last position.

Append, insert and extend are the ways to add data
to a list.
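The three operations can be compared side by side in this small sketch (values invented for the example):

```python
items = ['a', 'b']

items.append('c')          # one element at the end
after_append = items[:]    # copy taken only to show the intermediate state

items.insert(1, 'x')       # 'x' is placed before index 1
after_insert = items[:]

items.extend(['y', 'z'])   # every element of the other list, at the end

print(after_append)
print(after_insert)
print(items)
```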
17
List: delete elements
  • LIST.pop(n) removes and returns the element at
    index n of LIST (default: last).
  • LIST.remove(N) removes the first occurrence of
    the value N in LIST.

Delete with pop and remove. pop returns the removed
value, and pop() without an argument acts on the last element.
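A minimal sketch of both ways to delete, using an invented list that contains a repeated value so the "first occurrence" behaviour of remove is visible:

```python
items = ['a', 'b', 'c', 'b', 'd']

items.remove('b')        # only the first 'b' is removed
after_remove = items[:]  # copy taken to show the intermediate state

last = items.pop()       # removes and returns the last element
first = items.pop(0)     # removes and returns the element at index 0

print(after_remove, last, first, items)
```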
18
Data type: Tuples
  • Defined like a list, with parentheses instead of
    square brackets.
  • Indexes work as in lists. Slicing can be used.
  • Tuples are immutable. You can't add or remove
    elements.
  • Tuples are faster than lists. Tuples are like
    write-protected lists.

When you need to iterate over a list of constant
values, use a tuple instead of a list.
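The points above can be sketched as follows. The tuple of stop codons is just an example of constant values worth iterating over.

```python
stop_codons = ('taa', 'tag', 'tga')

first = stop_codons[0]    # indexing works as in lists
rest = stop_codons[1:]    # slicing returns a new tuple

# Tuples are immutable: assigning to an element raises TypeError.
is_immutable = False
try:
    stop_codons[0] = 'xxx'
except TypeError:
    is_immutable = True

# Iterating over constant values.
for codon in stop_codons:
    print(codon)
```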
19
Dictionaries
  • Data type used to store one-to-one relationships
    between keys and values (like a hash in Perl or the
    Scripting.Dictionary object in Visual Basic).

The threecode dictionary is part of Biopython.
Elements in a dictionary are unordered (since Python
3.7, dictionaries preserve insertion order).
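A minimal sketch of creating and using a dictionary. The three-entry threecode below is a hand-written stand-in (one-letter to three-letter amino acid codes), not the actual Biopython object.

```python
# Hypothetical stand-in for Biopython's threecode table.
threecode = {'A': 'Ala', 'C': 'Cys', 'G': 'Gly'}

name = threecode['C']      # look up a value by its key
threecode['M'] = 'Met'     # add a new key/value pair
keys = sorted(threecode)   # sorted copy of the keys

# Translate a one-letter peptide into three-letter codes.
peptide = '-'.join(threecode[aa] for aa in 'MAC')

print(name, keys, peptide)
```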
20
Dictionaries: Some methods
  • If a key is not found, Python raises an error:
  • >>> threecode["kkk"]
  • Traceback (most recent call last):
  • File "<pyshell#299>", line 1, in -toplevel-
  • threecode["kkk"]
  • KeyError: 'kkk'
  • Before looking up a value, check for the key:
  • >>> threecode.has_key("kkk")
  • False

del threecode["A"] deletes that item from the
dictionary. threecode.clear() deletes all items.
has_key exists only in Python 2; in Python 3, write
"kkk" in threecode.
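The same checks can be sketched in Python 3, where the in operator replaces has_key. The two-entry dictionary is a made-up stand-in for threecode, and 'Xaa' is only a placeholder default chosen for this example.

```python
threecode = {'A': 'Ala', 'C': 'Cys'}   # hypothetical stand-in

has_kkk = 'kkk' in threecode           # membership test, no exception

# get() returns a default instead of raising KeyError.
fallback = threecode.get('kkk', 'Xaa')

# A direct lookup of a missing key raises KeyError.
raised = False
try:
    threecode['kkk']
except KeyError:
    raised = True

del threecode['A']                     # delete one item
after_del = dict(threecode)            # copy to show the intermediate state
threecode.clear()                      # delete all items

print(has_kkk, fallback, raised, after_del, threecode)
```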
21
Program flow: if, elif, else
elif plays the role of switch in C. Note the
indentation: in Python it is mandatory and delimits code blocks!
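A small sketch of if, elif and else. The function name and the thresholds are invented for illustration; only the indentation marks where each block begins and ends.

```python
def classify_identity(identity):
    """Label a percent identity value (thresholds are illustrative)."""
    if identity == 100:
        label = 'identical'
    elif identity > 45:
        label = 'similar'
    else:
        label = 'distant'
    return label

for value in (100, 60.5, 30):
    print(value, classify_identity(value))
```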
22
If sample
See the footnote on slide 12 for string concatenation
and slide 15 for list slicing. elif works like a C
switch.
23
Program flow: for
This is how you iterate over a sequence (list).
Never modify the sequence you are iterating over
inside the loop of a for statement.
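One common way to respect this warning, sketched with invented data: iterate over a copy of the list (made with a full slice) and modify the original.

```python
codons = ['atg', 'taa', 'gct', 'tga']
stops = ('taa', 'tag', 'tga')

# Plain iteration over a sequence.
for codon in codons:
    print(codon)

# codons[:] is a copy, so removing from codons is safe inside the loop.
for codon in codons[:]:
    if codon in stops:
        codons.remove(codon)

print(codons)
```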
24
For sample
for x in range(5) works like BASIC's FOR x = 0 TO 4.
To loop over numbers, create a list of
numbers with the range function. Note the indentation.
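A minimal sketch of looping with range, assuming Python 3, where range produces its numbers lazily and list() turns them into an actual list.

```python
squares = []
for x in range(5):            # x takes the values 0, 1, 2, 3, 4
    squares.append(x * x)

evens = list(range(0, 10, 2)) # start, stop (excluded), step

print(squares)
print(evens)
```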
25
While: loop while a condition is true
while True generates an infinite loop. It can
be exited with break.
We will use while True and break in the BLAST
parsers.
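The while True / break pattern can be sketched like this. The list of lines stands in for a file being read, and an empty string plays the role of the end marker; both are assumptions made for the example.

```python
lines = iter(['hit 1', 'hit 2', '', 'ignored'])

collected = []
while True:
    line = next(lines)
    if line == '':      # end marker found: leave the infinite loop
        break
    collected.append(line)

print(collected)
```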
26
Modularize your code: Functions
  • Variables declared inside a function live only
    inside the function. Only the value given to return
    is passed back to the program.
  • If the function just does something instead of
    returning a value, use return None (this is not
    mandatory, but it improves the legibility of the code).
  • Usage: MyInterproHandle = get_interpro_entry("IPR004560")

To return more than one value, return a list with
all the variables you need.
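A sketch of a function that returns two values in a list, as suggested above. gc_stats is a hypothetical helper written for this example; it is not part of the original slides or of Biopython.

```python
def gc_stats(sequence):
    """Return [length, GC percentage] for a non-empty DNA sequence."""
    sequence = sequence.upper()   # local variable: visible only in here
    gc_count = sequence.count('G') + sequence.count('C')
    gc_percent = 100.0 * gc_count / len(sequence)
    return [len(sequence), gc_percent]

# The returned list can be unpacked into separate names.
length, gc = gc_stats('atgc')
print(length, gc)
```

Returning a tuple (return len(sequence), gc_percent) is an equally common choice and unpacks the same way.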
27
Modules
A chunk of code that can be used from a program
or in interactive mode. Functions, classes,
constants and dictionaries can be called and used
from a program. A module must be imported before
it is used.
Modules are searched for in several paths, such as
the directory of the running script. See them all
with sys.path.
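A minimal sketch of importing modules from the standard library and inspecting the search path:

```python
import sys                 # import the whole module
import math
from math import sqrt      # import a single name from a module

area = math.pi * 2 ** 2    # constants are reached through the module name
root = sqrt(16)            # names imported with "from" need no prefix

# sys.path is the list of directories searched for modules.
search_paths = sys.path
for path in search_paths:
    print(path)
```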
28
Reading text files
fileobject = open(filename, 'r')
for line in fileobject:
    print(line)
readlines() returns a list of strings, one per line
of the file.
A file should not be edited while it is open; wait
until it is closed before editing it, even with an external
program.
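A self-contained sketch of both reading styles. It first creates a small temporary file so there is something to read; the file contents are invented.

```python
import os
import tempfile

# Create a throwaway file to read back.
handle, filename = tempfile.mkstemp(suffix='.txt')
os.close(handle)
with open(filename, 'w') as out:
    out.write('first line\nsecond line\n')

# Style 1: iterate over the file object line by line.
fileobject = open(filename, 'r')
stripped = []
for line in fileobject:
    stripped.append(line.rstrip('\n'))   # drop the trailing newline
fileobject.close()

# Style 2: readlines() loads every line into a list at once.
with open(filename, 'r') as fileobject:
    all_lines = fileobject.readlines()

os.remove(filename)
print(stripped)
print(all_lines)
```

The with statement closes the file automatically, even if an error occurs inside the block.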
29
Write text files
  • There are two modes for writing files:
  • w: write, overwriting the file if it exists.
  • a: write at the end of the file (append). Useful
    for log files.

open can take a third argument, which defines how the
file is buffered before writing.
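The difference between the two modes can be seen in this sketch, which writes to a temporary file (the file name and contents are made up for the example):

```python
import os
import tempfile

folder = tempfile.mkdtemp()
log_path = os.path.join(folder, 'run.log')

with open(log_path, 'w') as log:   # creates the file
    log.write('run 1\n')

with open(log_path, 'w') as log:   # 'w' again: previous content is lost
    log.write('run 2\n')

with open(log_path, 'a') as log:   # 'a': added after the existing content
    log.write('run 3\n')

with open(log_path, 'r') as log:
    content = log.read()

os.remove(log_path)
os.rmdir(folder)
print(content)
```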
30
Data Manipulation
  • The problem: a text file with data in it must
    be parsed, that is, read and interpreted by the
    program, which then displays or stores only selected
    information.
  • Python tools:
  • Built-in open function.
  • Control flow structures.
  • String manipulation methods.

This is a generic overview of the problem and
tools.
31
Sample file: BLAST Hit table
  • inseq2 gi|26249933|ref|NP_755973.1| 100.00 29 0 0 1 29 837 865 1e-08 60.8
  • inseq2 gi|1789736|gb|AAC76363.1| 100.00 29 0 0 1 29 834 862 1e-08 60.8
  • inseq2 gi|3483131|gb|AAC33265.1| 100.00 29 0 0 1 29 480 508 1e-08 60.8
  • inseq2 gi|29542596|gb|AAO91530.1| 46.43 28 15 0 2 29 515 542 4.2 32.3
  • inseq2 gi|67762813|ref|ZP_00501511.1| 48.28 29 15 0 1 29 278 306 7.2 31.6
  • inseq2 gi|67737420|ref|ZP_00488193.1| 43.12 27 15 0 1 29 278 306 7.2 31.6
  • inseq2 gi|67714721|ref|ZP_00484082.1| 47.88 42 15 0 1 29 278 306 7.2 31.6
  • inseq2 gi|69988727|ref|ZP_00641885.1| 41.31 59 15 0 1 29 221 249 7.2 31.6
  • 2000 more lines follow (removed to fit on
    this slide)

Your mission (should you choose to accept it):
get every GI from this file and build the URL for its
full GenBank record, but only if identity is
greater than 45%.
This URL will be handy for this kind of task:
ncbi.nlm.nih.gov/entrez/query/static/linking.html
32
Python script of data manipulation
To send the output to a text file, just redirect
it on the command line with >, like this: program.py
> my_text
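The script shown on this slide is not in the transcript. The following is only a sketch of how the mission could be approached, under two assumptions: the second column has the usual pipe-delimited form gi|number|db|accession|, and URL_TEMPLATE is a placeholder that should be checked against the NCBI linking page cited on the previous slide.

```python
# Hypothetical URL pattern; verify it against NCBI's linking documentation.
URL_TEMPLATE = "https://www.ncbi.nlm.nih.gov/protein/%s"

def genbank_urls(lines, min_identity=45.0):
    """Return record URLs for hits whose identity exceeds min_identity."""
    urls = []
    for line in lines:
        fields = line.split()
        if len(fields) < 12:          # skip blank or incomplete lines
            continue
        gi = fields[1].split("|")[1]  # the number right after "gi|"
        identity = float(fields[2])   # third column: percent identity
        if identity > min_identity:
            urls.append(URL_TEMPLATE % gi)
    return urls

sample = [
    "inseq2 gi|26249933|ref|NP_755973.1| 100.00 29 0 0 1 29 837 865 1e-08 60.8",
    "inseq2 gi|67737420|ref|ZP_00488193.1| 43.12 27 15 0 1 29 278 306 7.2 31.6",
]
urls = genbank_urls(sample)
for url in urls:
    print(url)
```

With a real hit table, the list of sample lines would be replaced by an open file object, since iterating over a file also yields one line at a time.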
33
XML: Basic overview
  • A language to describe data (with nothing about
    data presentation).
  • Based on a text format (binary XML is out of the
    scope of this tutorial).
  • XML is human-readable (kind of).
  • It is easy to write programs that process XML documents.
  • Header with parsing information:
  • <?xml version="1.0"?>
  • Body:
  • <tagname attribute_name="attribute_value">a
    text</tagname>
  • <line type='demo'>A simple line</line>
  • Empty element: <img src="logo.png" />

Pay attention: XML is everywhere! The official
webpage is www.w3.org/XML
34
XML Some real world samples
An RSS feed is XML based.
RSS is a popular way to syndicate news. Atom is
another protocol, also based on XML.
35
XML Some real world samples
XML BLAST output.
BLAST can be instructed to output as XML instead
of text or HTML
36
XML Sample with attributes
All elements in this sample contain attributes.
SVG contains width and height, Text contains x, y
and style, and Path has d and style.
Plasmids in SVG at bioinformatics.org/savvy/.
More bioXML at xml.com/pub/rg/Bioinformatics
37
XML Parser with elementtree
ElementTree is located at effbot.org/zone/element.htm.
Since Python 2.5, it is included in the
standard library (as xml.etree.ElementTree).
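The parser code from this slide is missing in the transcript. As a minimal sketch, the standard library's xml.etree.ElementTree can parse a tiny document like the one below; the doc wrapper element is invented so the example has a single root.

```python
import xml.etree.ElementTree as ET

document = (
    "<?xml version='1.0'?>"
    "<doc>"
    "<line type='demo'>A simple line</line>"
    "<img src='logo.png' />"
    "</doc>"
)

root = ET.fromstring(document)          # parse from a string

line = root.find('line')                # first child element named "line"
line_type = line.get('type')            # attribute value
line_text = line.text                   # text between the tags

tags = [child.tag for child in root]    # children in document order
img_src = root.find('img').get('src')

print(line_type, line_text, tags, img_src)
```

For a file on disk, ET.parse(filename).getroot() gives the same kind of root element.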
38
XML code output
39
What is Biopython?
  • It is a distributed collaborative effort to
    develop Python libraries and applications which
    address the needs of current and future work in
    bioinformatics.
  • It provides:
  • Tools for working with sequences (aa and nt).
  • Parsers for all popular bio file formats (fasta,
    gb, pdb, BLAST output).
  • Data retrieval from biological databases.
  • Wrappers for bio-programs (BLAST, ClustalW, EMBOSS,
    Primer3, and more).
  • Biological functions like LCC, restriction
    enzyme cutting, and more.
  • Tables and constants.

With Biopython you can automate repetitive tasks
by chaining several programs.
40
Biopython sample: parsing BLAST output to remove
vector from DNA sequences
BLAST can be instructed to output a table by
enabling Hit Table under Alignment view.
41
This first half parses the BLAST output without
Biopython.
42
Using fasta parser to read sequences and
FastaWriter to write the modified sequence.
43
With HTML it is easy to build GUIs for command-line
programs or Biopython functions. Just use any
HTML or text editor. This form asks for the same
parameters that the Tm function uses.
This is a GUI (Graphical User Interface) for the
Biopython melting point function.
44
Form code
Look for action path and variable names.
45
Generate Tm in HTML from multiple sequences using
Python
The Tm function is defined inline to avoid a dependency
problem (Biopython is not included in standard
hosting packages).
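The slide's code is not in the transcript. To illustrate the idea of an inline Tm function with no external dependency, the sketch below uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, a rough estimate meant for short oligonucleotides. This is a swapped-in formula, not the Biopython function the slides refer to, and both function names are hypothetical.

```python
def tm_wallace(sequence):
    """Estimate Tm in degrees C with the Wallace rule (short oligos only)."""
    sequence = sequence.upper()
    at = sequence.count('A') + sequence.count('T')
    gc = sequence.count('G') + sequence.count('C')
    return 2 * at + 4 * gc

def tm_table_html(sequences):
    """Build an HTML table with one row per sequence and its Tm."""
    rows = []
    for seq in sequences:
        rows.append('<tr><td>%s</td><td>%d</td></tr>' % (seq, tm_wallace(seq)))
    return '<table>\n%s\n</table>' % '\n'.join(rows)

html = tm_table_html(['ATGCATGC', 'GGGGCCCC'])
print(html)
```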
46
All form variables are stored in formu. Doc is
an object used for storing the HTML info.
47
CGI output generated from the command line. The CGI
script can also run from the CLI (without a web server).
48
Result of CGI code after submit button is pressed
in HTML.
There is a FAQ for Python CGI:
http://starship.python.net/crew/davem/cgifaq/faqw.cgi
49
Source code of generated webpage
50
That's all for today. But there is a lot more in
Python!
Resources:
  • The Quick Python Book, Daryl Harms and Kenneth
    McDonald, Manning, 2000
  • Professional XML, Birbeck et al., 2nd Ed., Wrox
    Press, 2001
  • Python Tutorial, Guido van Rossum, March 2006
    (http://docs.python.org/tut/)
  • Dive into Python (diveintopython.com)
  • Biopython tutorial and cookbook, Jeff Chang, Brad
    Chapman, Iddo Friedberg, 2001
    (http://bioweb.pasteur.fr/docs/doc-gensoft/biopython/Doc/Tutorial.pdf)
  • Python Speed Performance Tips
    (http://wiki.python.org/moin/PythonSpeed/PerformanceTips)
  • Python course in Bioinformatics, Katja Schuerer,
    2004 (http://www.pasteur.fr/recherche/unites/sis/formation/python/)
  • Beginners Guide to Python, 2006
    (http://wiki.python.org/moin/BeginnersGuide)
  • Software development skills for scientists and
    engineers, Greg Wilson (http://osl.iu.edu/lums/swc/)

There is also an IRC channel at irc.freenode.org
(#python)