Bioinformatics - PowerPoint PPT Presentation

Slides: 56
Provided by: mgnetOrg

Transcript and Presenter's Notes

Title: Bioinformatics


1
Bioinformatics
  • Joshua Gilkerson
  • Albert Kalim
  • Ka-Him Leung
  • David Owen

2
What is Bioinformatics?
  • Bioinformatics: The collection, classification,
    storage, and analysis of biochemical and
    biological information using computers especially
    as applied in molecular genetics and genomics.
    (Dictionary.com)
  • Molecular genetics: The branch of genetics that
    deals with the expression of genes by studying
    the DNA sequences of chromosomes.
    (Dictionary.com)

3
What is Bioinformatics? (cont.)
  • Another definition of molecular genetics: The
    branch of genetics that deals with hereditary
    transmission and variation on the molecular
    level. (Dictionary.com)
  • Genomics: A branch of biotechnology concerned
    with applying the techniques of genetics and
    molecular biology to the genetic mapping and DNA
    sequencing of sets of genes or the complete
    genomes of selected organisms using high-speed
    methods, with organizing the results in
    databases, and with applications of the data (as
    in medicine or biology). (Dictionary.com)

4
How old is the discipline?
  • The answer to this one depends on which source
    you choose to read.
  • From T. K. Attwood and D. J. Parry-Smith's
    "Introduction to Bioinformatics" (Prentice-Hall,
    1999; Longman Higher Education; ISBN 0582327881):
    "The term bioinformatics is used to encompass
    almost all computer applications in biological
    sciences, but was originally coined in the
    mid-1980s for the analysis of biological sequence
    data."

5
How old is the discipline? (cont.)
  • From Mark S. Boguski's article in the "Trends
    Guide to Bioinformatics" (Elsevier, Trends
    Supplement 1998, p. 1):
  • "The term 'bioinformatics' is a relatively
    recent invention, not appearing in the literature
    until 1991 and then only in the context of the
    emergence of electronic publishing..."

6
Bioinformatic Research up to 2005
  • Metabolic networks
  • Regulatory networks
  • Trait mapping
  • Gene function analysis
  • Scientific literature
  • DNA sequence
  • Gene expression
  • Protein expression
  • Protein Structure
  • Genome mapping

7
What remains to be done?
  • Comparative Genomics
  • Description of mRNAs, proteins (identity and
    structure)
  • Functional analyses
  • Detailed understanding of development,
    regulation, variation

8
The Human Genetic Code
9
Bioinformatics Activity: Where Is Bioinformatics
Done?
  • The biggest and best source of bioinformatics
    links is the Genome Web at the Rosalind Franklin
    Centre for Genomics Research at the Genome Campus
    near Cambridge, United Kingdom.
  • Others: research centers, sequencing centers, and
    "virtual" centers (for example, consortia and
    communities).

10
Research Centers
  • Centro Nacional de Biotecnologia (CNB), Madrid,
    Spain.
  • Computational Biology and Informatics Laboratory
    at the University of Pennsylvania, Philadelphia,
    USA.
  • CIRB: Centro Interdipartimentale di Ricerche
    Biotecnologiche, Bologna, Italy.
  • Cold Spring Harbor Labs, New York, USA.
  • European Molecular Biology Laboratory (EMBL),
    Heidelberg, Germany.
  • Généthon, France.
  • GIRI: Genetic Information Research Institute,
    California, USA.
  • MRC Human Genetics Unit, Edinburgh, United
    Kingdom.
  • MRC Rosalind Franklin Centre for Genomics
    Research (RFCGR), Hinxton, United Kingdom.

11
Sequencing Centers
  • The Department of Genome Analysis at the
    Institute of Molecular Biotechnology, Jena,
    Germany.
  • The Australian Genome Research Facility,
    Australia.
  • Baylor College of Medicine, USA.
  • Michael Smith Genome Sciences Centre, Canada.

12
Virtual Centers
  • International Center for Cooperation in
    Bioinformatics network (ICCBnet):
    http://www.iccbnet.org/
  • Belgian EMBnet node: http://www.be.embnet.org/

13
Online Resources: What Bioinformatics Websites
Are There?
  • Blogs
  • Information
  • Directories
  • Portals
  • Societies
  • Tools
  • Tutorials

14
Blogs
  • Bioinformatics.Org is a bioinformatics blog.
  • The Bio-Web (http://cellbiol.com/) links to
    online resources for molecular and cell
    biologists and covers current news in various
    biological/computational fields.
  • Genehack (http://genehack.org/) is one of the
    first bioinformatics blogs.

15
Information
  • The Australian National Genomic Information
    Service (ANGIS) is operated by the Australian
    Genomic Information Centre (AGIC,
    http://www.angis.org.au/new/about/generalinfo.html,
    currently at the University of Sydney) to offer
    software, databases, documentation, training, and
    support for biologists.
  • "The University of Maryland AgNIC gateway
    (http://agnic.umd.edu/) is a guide to quality
    agricultural biotechnology information on the
    Internet."

16
Directories
  • Christy Hightower, Engineering Librarian at the
    Science and Engineering Library, University of
    California Santa Cruz has already done this
    better than me.
  • Visit her excellent article
    (http://www.istl.org/istl/02-winter/internet.html)
    about bioinformatics Net resources in Issues in
    Science and Technology Librarianship.

17
Societies
  • Humberto Ortiz Zuazaga kindly introduced The
    International Society for Computational Biology
    (http://www.iscb.org/), which he points out "has
    links to programs of study and online courses in
    computational biology and to job postings".

18
Collection of Tools
  • Bioinformatics.Org hosts a collection of
    bioinformatics tools.
  • The Rosalind Franklin Centre's GenomeWeb
    (http://www.rfcgr.mrc.ac.uk/GenomeWeb/).
  • Now of historical interest only is the legendary
    Pedro's Molecular Biology Search and Analysis
    Tools
    (http://www.public.iastate.edu/pedro/research_tools.html),
    which provides a collection of WWW links to
    information and services useful to molecular
    biologists.

19
Portals
  • Bioinformatics.Org is an international
    organization which promotes freedom and openness
    in the field of bioinformatics and is the root
    domain of a damned fine Website.
  • CCP11 (Collaborative Computational Project 11,
    http://www.rfcgr.mrc.ac.uk/CCP11/index.jsp) is
    another product of the UK's Genome Campus. CCP11
    is funded by the BBSRC and is hosted at the MRC
    Rosalind Franklin Centre for Genomics Research
    (RFCGR), located on the Wellcome Trust Genome
    Campus, Cambridge.
  • Jennifer Steinbachs runs compbiology.org, which
    is a general computational biology site as well
    as being a portal to her own work.
  • BioPlanet (http://www.bioplanet.com/index.php) is
    well worth visiting. It describes itself as "a
    not-for-profit site, funded with our resources,
    for its users' benefit."
  • ColorBasePair (http://www.colorbasepair.com/) is
    a densely packed portal with lots of
    bioinformatics links.

20
Genome Project
  • Ka-Him Leung

21
Genomics
  • Genome: the complete set of genetic instructions
    for making an organism
  • Genomics: attempts to analyze or compare the
    entire genetic complement of a species

22
Genomic Issues
  • Genomic DNA is a linear sequence of 4 nucleotides
    (A, C, G, T)
  • DNA forms the double helix by pairing with its
    reverse complement (A-T, G-C)
  • Genomic DNA contains many genes, each of which is
    formed from one or more exons (stretches of
    genomic DNA), separated by introns
  • A gene is copied into complementary RNA in a
    process called transcription (U substitutes for T)
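
The pairing and transcription rules above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides. The function names are made up for the example, and `transcribe` assumes its argument is the template strand written 5' to 3'.

```python
# Watson-Crick pairing from the slide: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the strand that pairs with `dna`, read in its own 5'->3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(template_strand: str) -> str:
    """Return the RNA complementary to the template strand, with U in place of T."""
    return reverse_complement(template_strand).replace("T", "U")


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(transcribe("GCAT"))
```

The RNA produced this way has the same letters as the opposite (coding) strand, except that every T becomes U.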

23
Genomic Issues (cont.)
  • DNA sequencing is the process of determining the
    exact order of the 3 billion chemical building
    blocks (called bases and abbreviated A, T, C, and
    G) that make up the DNA of the 24 different human
    chromosomes.
  • In the human genome, about 3 billion bases are
    arranged along the chromosomes in a particular
    order for each unique individual.
  • One million bases (called a megabase and
    abbreviated Mb) of DNA sequence data is roughly
    equivalent to 1 megabyte of computer data storage
    space. Since the human genome is 3 billion base
    pairs long, 3 gigabytes of computer data storage
    space are needed to store the entire genome.
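
The storage estimate above assumes one byte (one text character) per base. The short calculation below, added here as an illustration, reproduces that figure and compares it with a packed encoding of 2 bits per base, which is enough to distinguish four letters. The 2-bit packing is an assumption of this sketch, not something the slides discuss.

```python
GENOME_BASES = 3_000_000_000  # approximate human genome length quoted on the slide


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed to store n_bases at a fixed number of bits per base, rounded up."""
    return (n_bases * bits_per_base + 7) // 8


one_byte_per_base = storage_bytes(GENOME_BASES, 8)  # one ASCII character per base
two_bits_per_base = storage_bytes(GENOME_BASES, 2)  # A, C, G, T packed into 2 bits each

if __name__ == "__main__":
    print(one_byte_per_base / 1e9, "GB at one byte per base")
    print(two_bits_per_base / 1e6, "MB at two bits per base")
```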

24
Different Genomics
  • Comparative genomics: the management and analysis
    of the millions of data points that result from
    genomics
  • Functional genomics: ways of identifying gene
    functions and associations
  • Structural genomics: emphasizes high-throughput,
    whole-genome analysis.

25
History of Genome
  • 1977: First complete genome sequence for an
    organism is published: bacteriophage ΦX174,
    5,386 base pairs coding nine proteins (about 5 kb)
  • 1995: First bacterial genome (Haemophilus
    influenzae) sequenced (1.8 Mb)
  • 1996: Saccharomyces cerevisiae genome sequenced
    (baker's yeast, 12.1 Mb)
  • 1997: E. coli genome sequenced (4.7 Mb)
  • 1999: Sequence of first human chromosome completed
  • 2000: A. thaliana genome (a flowering plant,
    100 Mb) and D. melanogaster genome (fruit fly,
    180 Mb)
  • 2001: 10,000 full-length human cDNAs sequenced
  • 2003: Human genome sequence completed

26
Human Genome Project
  • The U.S. Human Genome Project was a 13-year effort
    coordinated by the Department of Energy and the
    National Institutes of Health.
  • It started in 1990, with the aim of mapping and
    understanding all the genes of human beings.
  • In June 2000, scientists completed the first
    working draft of the human genome.
  • A high-quality, "finished" full sequence was
    completed in April 2003.

27
Goals of HGP
  • identify all the approximately 20,000-25,000
    genes in human DNA,
  • determine the sequences of the 3 billion chemical
    base pairs that make up human DNA,
  • store this information in databases,
  • improve tools for data analysis,
  • transfer related technologies to the private
    sector, and
  • address the ethical, legal, and social issues
    (ELSI) that may arise from the project.

28
DNA Sequencing Process
  • Mapping: identify a set of clones that span the
    region of the genome to be sequenced
  • Library creation: make sets of smaller clones
    from the mapped clones
  • Template preparation: purify DNA from the smaller
    clones, then set up and perform sequencing
    chemistries
  • Gel electrophoresis: determine sequences from the
    smaller clones
  • Pre-finishing and finishing: specialty techniques
    to produce high-quality sequences
  • Data editing and annotation: quality assurance,
    verification, biological annotation, and
    submission to a public database

30
Future of HGP
  • HGP is the first step in understanding humans at
    the molecular level. Work is still ongoing to
    determine the function of many of the human
    genes.
  • What still needs to be done:
  • Gene number, exact locations, and functions
  • Gene regulation
  • DNA sequence organization
  • Chromosomal structure and organization
  • Noncoding DNA types, amount, distribution,
    information content, and functions
  • Coordination of gene expression, protein
    synthesis, and post-translational events
  • Interaction of proteins in complex molecular
    machines
  • Predicted vs. experimentally determined gene
    function
  • Evolutionary conservation among organisms
  • Protein conservation (structure and function)
  • Proteomes (total protein content and function) in
    organisms

32
Sequence Alignment
  • Joshua Gilkerson

33
Sequence Alignment
  • In genomics, many situations arise when sequences
    need to be compared or searched for similar
    sub-sequences.
  • Both of these tasks are aided by aligning the
    sequences to one another.
  • The two sequences are called the subject and the
    query.

34
Local vs. Global
  • Global alignment aligns the entire query to the
    entire subject.
  • Local alignment aligns a piece of one sequence to
    a piece of the other.
  • Which is used depends on the application.
  • Surprisingly, these are computationally
    equivalent.
  • Sometimes a mixed local-global alignment is used,
    aligning the entire query sequence against any one
    part of the subject.

35
Example Alignments
  • Global alignment:
  • AGCTCGA--GATTGCTGGACATGCTGCTGCT
  • A--TCGAGCGATTGC-----ATGCAGCTGCT
  • Local alignment:
  • Same subject as above
  • Query sequence: GAGAT
  • AGCTCGAGATTGCTGGACATGCTGCTGCT
  • AGAT GAGAT     GAGAT

36
Model for Alignment
  • The best alignment is the one, chosen from all
    possible alignments, that minimizes the score
    (the score here is a cost, so lower is better).
  • Scoring is done pairwise at each position along
    the alignment.
  • Introducing a gap is more expensive than
    extending one already introduced (affine gap
    penalty).

37
Model for Alignment
  • Score = Σ gap penalties + Σ similarity weights
  • Gap penalty = open penalty + size × size penalty
  • Open penalty and size penalty are constants > 0.
  • Similarity weight is zero for the same base, > 0
    for disparate bases.
  • For protein sequences, BLOSUM similarity weights
    are most commonly used.

38
Scoring Example
  • Same example as earlier
  • Using:
  • Gap opening penalty of 1
  • Gap size penalty of 1
  • Similarity weights all 1 for mismatched bases
  • AGCTCGA--GATTGCTGGACATGCTGCTGCT
  • A--TCGAGCGATTGC-----ATGCAGCTGCT
  • 0210000210000002111100001000000 (total score 13)
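
The per-column scores in this example can be recomputed with a short function. This is a sketch added for illustration, with a hypothetical function name. It charges the opening penalty on the first column of each gap and the size penalty on every gap column, which is how the figures on the slide break down.

```python
def column_costs(top: str, bottom: str, gap_open: int = 1,
                 gap_size: int = 1, mismatch: int = 1) -> list[int]:
    """Cost of each column of a gapped pairwise alignment under an affine gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    costs = []
    previous_gap_row = None  # which row had '-' in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            # Opening penalty only when this column starts a new gap.
            cost = gap_size + (0 if previous_gap_row == gap_row else gap_open)
            previous_gap_row = gap_row
        else:
            cost = 0 if a == b else mismatch
            previous_gap_row = None
        costs.append(cost)
    return costs


if __name__ == "__main__":
    costs = column_costs("AGCTCGA--GATTGCTGGACATGCTGCTGCT",
                         "A--TCGAGCGATTGC-----ATGCAGCTGCT")
    print("".join(map(str, costs)), "total:", sum(costs))
```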

39
Needleman-Wunsch Algorithm
  • Sequences Q and S
  • Scoring matrix M: (len(Q)+1) x (len(S)+1)
  • Similarity matrix s
  • Gap length penalty: g; opening penalty: 0
  • M(i,j): score for best alignment of first i
    elements of Q and first j elements of S.
  • M(0,0) = 0, M(i,0) = i × g, M(0,j) = j × g
  • M(i,j) = minimum of
  • M(i-1,j) + g,
  • M(i,j-1) + g,
  • M(i-1,j-1) + s(Q(i),S(j))
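
A minimal sketch of this recurrence in Python follows. It is an illustration rather than the presenters' own code. The names are hypothetical, and the similarity weights are the ones read off the CAT vs TAG example in the slides that follow, with g = 1. The first argument indexes the rows of the matrix.

```python
# Similarity weights from the CAT vs TAG example (0 on the diagonal).
EXAMPLE_WEIGHTS = {
    frozenset("AC"): 1, frozenset("AT"): 1, frozenset("AG"): 1,
    frozenset("CT"): 3, frozenset("CG"): 1, frozenset("TG"): 0,
}


def example_weight(a: str, b: str) -> int:
    """Similarity weight s(a, b): zero for identical bases, table lookup otherwise."""
    return 0 if a == b else EXAMPLE_WEIGHTS[frozenset((a, b))]


def needleman_wunsch_matrix(q: str, s: str, weight=example_weight, g: int = 1):
    """Fill M, where M[i][j] is the best (lowest) score for q[:i] against s[:j]."""
    m = [[0] * (len(s) + 1) for _ in range(len(q) + 1)]
    for i in range(1, len(q) + 1):
        m[i][0] = i * g  # q[:i] aligned entirely against gaps
    for j in range(1, len(s) + 1):
        m[0][j] = j * g  # s[:j] aligned entirely against gaps
    for i in range(1, len(q) + 1):
        for j in range(1, len(s) + 1):
            m[i][j] = min(
                m[i - 1][j] + g,                                 # gap in s
                m[i][j - 1] + g,                                 # gap in q
                m[i - 1][j - 1] + weight(q[i - 1], s[j - 1]),    # pair q[i-1] with s[j-1]
            )
    return m


if __name__ == "__main__":
    for row in needleman_wunsch_matrix("TAG", "CAT"):
        print(row)
```

The bottom-right entry of the returned matrix is the score of the best global alignment.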

40
Needleman-Wunsch Example
CAT vs TAG, g = 1
Scoring matrix M (first row and column initialized):
       C  A  T
    0  1  2  3
 T  1
 A  2
 G  3
Similarity matrix s:
    A  C  T  G
 A  0  1  1  1
 C     0  3  1
 T        0  0
 G           0
41
Needleman-Wunsch Example
CAT vs TAG, g = 1
Scoring matrix M (filled in):
       C  A  T
    0  1  2  3
 T  1  2  2  2
 A  2  2  2  3
 G  3  3  3  2
Similarity matrix s:
    A  C  T  G
 A  0  1  1  1
 C     0  3  1
 T        0  0
 G           0
43
Needleman-Wunsch Example
  • Two equally good alignments (score 2):
  • -CAT       C-AT
  • T-AG  and  -TAG
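
Both alignments can be recovered by backtracking through the filled matrix and following every move that explains the score of a cell. The sketch below is illustrative only. The function names are hypothetical, and it assumes the example's similarity weights and g = 1.

```python
# Similarity weights from the CAT vs TAG example (0 for identical bases).
WEIGHTS = {frozenset("AC"): 1, frozenset("AT"): 1, frozenset("AG"): 1,
           frozenset("CT"): 3, frozenset("CG"): 1, frozenset("TG"): 0}


def weight(a: str, b: str) -> int:
    return 0 if a == b else WEIGHTS[frozenset((a, b))]


def all_optimal_alignments(q: str, s: str, g: int = 1):
    """Return (best score, sorted list of every alignment achieving it)."""
    n, k = len(q), len(s)
    m = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        m[i][0] = i * g
    for j in range(1, k + 1):
        m[0][j] = j * g
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            m[i][j] = min(m[i - 1][j] + g, m[i][j - 1] + g,
                          m[i - 1][j - 1] + weight(q[i - 1], s[j - 1]))

    found = []

    def walk(i: int, j: int, top: str, bottom: str) -> None:
        """Backtrack from cell (i, j), extending the alignment leftwards."""
        if i == 0 and j == 0:
            found.append((top, bottom))
            return
        if i > 0 and j > 0 and m[i][j] == m[i - 1][j - 1] + weight(q[i - 1], s[j - 1]):
            walk(i - 1, j - 1, q[i - 1] + top, s[j - 1] + bottom)
        if i > 0 and m[i][j] == m[i - 1][j] + g:
            walk(i - 1, j, q[i - 1] + top, "-" + bottom)
        if j > 0 and m[i][j] == m[i][j - 1] + g:
            walk(i, j - 1, "-" + top, s[j - 1] + bottom)

    walk(n, k, "", "")
    return m[n][k], sorted(found)


if __name__ == "__main__":
    score, alignments = all_optimal_alignments("CAT", "TAG")
    print(score)
    for top, bottom in alignments:
        print(top, bottom, sep="\n", end="\n\n")
```

Enumerating every optimal path can take exponential time on long sequences with many ties, so practical tools usually report a single alignment.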

44
Needleman-Wunsch
  • Runs in O(n²) time.
  • Easily generalized to allow a gap opening penalty
    by using 3 copies of M: one for prefixes ending
    with a match, and one each for prefixes ending
    with a gap in either sequence.
  • Easily generalized to local alignment by saying
    M(i,j) is the best score for an alignment of some
    suffixes of the prefixes ending at i and j. In
    practice, this means:
  • The first row and column are filled with all
    zeroes instead of just the top-left-most
    position.
  • The end of the alignment is at the globally
    minimal position, not the lower-right corner.
  • The beginning is at the location where
    backtracking cannot continue.
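
The local variant is sketched below as the Smith-Waterman algorithm in its usual maximizing form. Note the change of convention: the slides minimize a cost, while here matches earn a positive score and the best cell is the global maximum. A local alignment needs matches to be rewarded, since otherwise the empty alignment is always optimal. The scoring values (match 2, mismatch -1, gap -2), the linear gap penalty, and the function name are assumptions of this sketch.

```python
def smith_waterman(query: str, subject: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2):
    """Return (score, aligned query, aligned subject, 0-based start in subject)."""
    rows, cols = len(query) + 1, len(subject) + 1
    h = [[0] * cols for _ in range(rows)]  # first row and column stay zero
    best, best_cell = 0, (0, 0)

    def pair(i: int, j: int) -> int:
        return match if query[i - 1] == subject[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # Zero lets an alignment start afresh at any cell.
            h[i][j] = max(0, h[i - 1][j - 1] + pair(i, j),
                          h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_cell = h[i][j], (i, j)

    # Backtrack from the best cell until a zero is reached.
    i, j = best_cell
    top, bottom = "", ""
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + pair(i, j):
            top, bottom = query[i - 1] + top, subject[j - 1] + bottom
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            top, bottom = query[i - 1] + top, "-" + bottom
            i -= 1
        else:
            top, bottom = "-" + top, subject[j - 1] + bottom
            j -= 1
    return best, top, bottom, j


if __name__ == "__main__":
    print(smith_waterman("GAGAT", "AGCTCGAGATTGCTGGACATGCTGCTGCT"))
```

With the subject and query from the earlier local alignment example, the best hit is the exact occurrence of GAGAT starting at 0-based position 5 of the subject.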

45
Other Alignment Tools
  • The Basic Local Alignment Search Tool (BLAST) is
    probably the most widely used tool in genomics.
  • Finds local alignments.
  • Used on very large sequences (entire genomes)
  • Smith-Waterman Algorithm - Adaptation of
    Needleman-Wunsch for local alignments.
  • FASTA package

46
The Importance of Bioinformatics and Summary
  • David Owen

47
The importance of bioinformatics
  • Traditionally, molecular biology research was
    done entirely in a laboratory.
  • But the genome projects have increased the amount
    of data enormously, so researchers need computers
    to make sense of it.

48
Challenges
  • Intelligent and efficient storage of the massive
    data.
  • Easy and reliable access to the data.
  • Development of tools which allow the extraction
    of meaningful information.
  • The developer of the tool must also consider the
    following:
  • The user (biologist) might not be an expert with
    computers.
  • The tool must be able to provide access across
    the internet.

49
Processes
  • Three main processes a bioinformatics tool
    must address:
  • DNA sequence determines protein sequence
  • Protein sequence determines protein structure
  • Protein structure determines protein function
  • The information obtained from these processes
    allows us to better understand the biology of
    organisms.
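
The first of these three steps is the only one with a simple closed-form rule: each DNA codon of three bases maps to one amino acid. The sketch below illustrates it using the standard genetic code. It is an addition to the slides, the function name is hypothetical, and it assumes the input is a coding-strand sequence that starts in frame. The other two steps (structure and function prediction) have no comparably simple rule.

```python
# Standard genetic code, listed with codon bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate codon by codon from the first base, stopping at a stop codon ('*')."""
    protein = []
    for pos in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```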

50
Computer Scientist vs. Biologist
  • Computer scientist
  • Logic
  • Problem-solving
  • Process-oriented
  • Algorithmic
  • Optimizing
  • Biologist
  • Knowledge gathering
  • Experimentally-focused
  • Exceptions are as common as rules
  • Describe work as a story
  • Develop conclusions and models
  • Hence the need for communication between computer
    scientists and biologists.

51
Research Areas
  • Further research areas include
  • Sequence alignment
  • Protein structure prediction
  • Prediction of gene expression
  • Protein-protein interactions
  • Modeling of evolution

52
Future of Bioinformatics
  • Integration of a wide variety of data sources.
    E.g. combining GIS data (maps) and weather
    systems with crop health and genotype data
    allows us to predict successful outcomes of
    agricultural experiments.
  • Large-scale comparative genomics. E.g. the
    development of tools that can do 10-way
    comparisons of genomes.
  • Modeling and visualization of full networks of
    complex systems.

53
Ultimate Goal
  • Obtain a better understanding of the biology of
    organisms through the examination of biological
    information hidden in the vast amount of data we
    have.
  • This knowledge will allow us to improve our
    standard of living.

54
References
  • http://www.ornl.gov/sci/techresources/Human_Genome/project/about.shtml
  • http://www.genome.gov/
  • http://bioinfo.mbb.yale.edu/course/projects/final-4/
  • http://www.dictionary.com
  • http://www.ebi.ac.uk/2can/bioinformatics/index.html
  • http://bioinformatics.ca/workshop_pages/bioinformatics/day1-files/1.0_intro_bffo_2005.pdf

55
References (cont.)
  • http://elegans.uky.edu/520/Lecture/index.html
  • http://bioinformatics.org/