The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology

Provided by: david1424
1
Bioinformatics: a definition?
The design, construction and use of software
tools to generate, store, annotate, access and
analyse data and information relating to
Molecular Biology
OR
Biologists doing stuff with computers?
Here we consider the use of Bioinformatics tools
rather than their design and construction
Here we consider the access and analysis of data
and information items rather than their
generation, storage or annotation
2
Software Tools for Sequence Analysis
General Packages
Packages that offer a comprehensive range of bioinformatics tools for sequence analysis.
Most researchers would expect to use such packages at some time.
Specialised Packages
Packages that offer tools for a particular type of analysis.
Used intensely by researchers in the relevant area, not at all by everyone else.
WWW Resources
Tools whose nature inclines them to be primarily accessed over the network.
These categorisations are very general:
Many specialist programs are incorporated into the general packages.
Most things can be done at a web site somewhere.
3
Sequence Analysis: an Overview
Sequencing Project Management
Database Retrieval
Restriction Mapping
Primer Design
DNA/RNA Folding
Nucleic Acid Sequence Analysis
Database Retrieval
Seeking Coding regions
Database Similarity Searching
Translation to amino acids
Pairwise Sequence Comparison
Multiple Sequence Alignment
Protein Sequence analysis
Prediction of Function
Structure prediction
Phylogeny
Motifs and Patterns
Structure analysis
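The "Translation to amino acids" step in the overview above can be illustrated with a short sketch. It assumes the standard genetic code, reading frame 1 only, and a plain DNA string as input; the function name `translate` and the example sequence are illustrative and do not come from any of the packages discussed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        # 'X' stands in for any codon containing a non-ACGT character.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTGA"))
```

A general package would normally offer all six reading frames and alternative genetic codes; this sketch covers only the simplest case.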
4
Software Tools for Sequence Analysis
General Packages
GCG:
Commercial
UNIX only
WWW and X GUIs
Comprehensive
Widely available
EMBOSS:
Open source
UNIX only
Several GUIs (java, WWW, X)
Comprehensive
Similar structure to the GCG package
Staden Package:
Open source
Windows, MacOS X, UNIX
Excellent GUI including interactive graphical output
Not comprehensive but allows access to EMBOSS
5
Software Tools for Sequence Analysis
General Packages
Other options
Commercial, expensive
Windows PCs or Macintoshes
Good GUIs
Public Domain
Windows, Macintosh, UNIX
Modern intuitive GUI
Access remote databases
7
Software Tools for Sequence Analysis
Specialised Packages
Sequencing Project Management
The Phred-Phrap Package by Phil Green et al.
Free academic licence
Excellent base call confidence estimation (phred)
Excellent large-scale contig assembler (phrap)
The Staden Package
Available by anonymous ftp
Excellent GUI
Excellent contig editor
Excellent finishing tools
Simple confidence estimation
Contig assembler not good for big projects, BUT phred and phrap can be accessed from the Staden GUI
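The base call confidence values produced by phred are conventionally expressed as quality scores on a logarithmic scale, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The sketch below shows only that conversion; the function names are illustrative and are not part of the phred or phrap programs.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a phred quality score to an estimated base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a phred quality score."""
    return -10 * math.log10(p)


def expected_errors(qualities) -> float:
    """Sum of per-base error probabilities over a read."""
    return sum(phred_to_error_prob(q) for q in qualities)


# Illustrative quality values for a four-base read.
print(expected_errors([40, 30, 20, 10]))
```

On this scale a quality of 20 corresponds to roughly one expected error in 100 base calls, and a quality of 30 to one in 1000.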
8
Software Tools for Sequence Analysis
Specialised Packages
DNA/RNA Folding
Michael Zuker's Programs
Free for academic use
Can be installed locally or run via a WWW page
Incorporated into the GCG general package
Protein Structure Analysis
WHAT IF by Gert Vriend
Nominal fee for academic use
LINUX, IRIX, Windows
9
Software Tools for Sequence Analysis
Specialised Packages
Protein Structure Analysis for very rich people
IRIX, HP-UX, LINUX
IRIX, AIX, LINUX
Both systems are very impressive and very expensive
10
Software Tools for Sequence Analysis
Specialised Packages
Phylogeny
Available by anonymous ftp
Windows, Macintosh, UNIX
Incorporated into the EMBOSS general package
Commercial, but reasonable
UNIX, VMS, DOS and Windows
Incorporated into the GCG general package
12
Software Tools for Sequence Analysis
WWW Resources
Database Retrieval
Sequence Retrieval System (SRS)
Retrieves MUCH more than sequences
Core elements free to academic sites
Implemented in many places
It is possible to integrate analysis tools
Elements of SRS are incorporated into EMBOSS
13
Software Tools for Sequence Analysis
WWW Resources
Database Retrieval
Entrez
Retrieves MUCH more than sequences
Access to NCBI databases only
Entrez client software available by anonymous ftp
Most general packages include tools to access local sequence databases
EMBOSS programs can access sequences from remote SRS servers
14
Software Tools for Sequence Analysis
WWW Resources
Database Similarity Searching
BLAST
Very popular, very widely available
Not sensitive but extremely fast
FASTA
Popular, widely available
Not sensitive, much slower than BLAST
Both BLAST and FASTA
Can be installed locally or run via a WWW page
Available by anonymous ftp
DNA/Protein query v DNA/Protein database
Incorporated into the GCG general package
15
Software Tools for Sequence Analysis
WWW Resources
Database Similarity Searching
MPsrch
Fully sensitive
Slow algorithm, fast computers
Protein v Protein only
Major use when blast/fasta fail
Exclusively a WWW resource
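The contrast between a fully sensitive but slow search and the faster heuristic programs can be made concrete with a sketch of exhaustive local alignment. The slide does not name the algorithm, so this assumes a Smith-Waterman style dynamic programme with a simple match/mismatch score and a linear gap penalty; the scoring values and the function name are illustrative, and this is not the MPsrch implementation.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGTACGT", "ACGTTACGT"))
```

Every cell of the query-by-subject matrix is filled in, so the cost grows with the product of the two sequence lengths. That is why such searches are slow across a whole database, and why they can still find alignments that word-based heuristics miss.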
16
Software Tools for Sequence Analysis
WWW Resources
Structure prediction
JPred: was a consensus service, now JNet only
JNet available by anonymous ftp
An older service takes a similar approach to JNet; its main element is called PHD
Both JPred and PHD work best from aligned protein families
Simpler methods predicting from single sequences are in most general packages
17
Software Tools for Sequence Analysis
WWW Resources
Other WWW services
General Services
EBI
Pasteur Institute
And many more
Protein sequence analysis
Expasy
Gene finding
Simple gene finding in most general packages
Primer design
Primer design in most general packages
Primer design in EMBOSS is primer3
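One of the simplest checks made during primer design is an estimate of melting temperature. The sketch below uses the Wallace rule of thumb for short oligonucleotides, Tm = 2(A+T) + 4(G+C) degrees Celsius. This is an assumption chosen for illustration and is not how primer3 computes melting temperature; the function names and example primer are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) of a short primer by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer (0.0 for an empty string)."""
    primer = primer.upper()
    if not primer:
        return 0.0
    return (primer.count("G") + primer.count("C")) / len(primer)


primer = "GCGCGCATATAT"  # illustrative sequence only
print(wallace_tm(primer), gc_fraction(primer))
```

Dedicated primer design programs weigh many further criteria, such as self-complementarity and product size, that are not modelled here.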
18
Databases
Databases are available from WWW sites and are highly interlinked
Clinical and Mutation
OMIM
MGMD
Bibliographic
PubMed
Raw Sequence
As accessed for sequence retrieval
19
Databases
Sequence Databases
Contain both raw sequence data and annotation
DNA Sequences
EMBL (European Molecular Biology Laboratory)
GenBank (NCBI)
RefSeq (NCBI)
DDBJ (DNA Data Bank of Japan)
Protein Sequences
RefSeq (NCBI)
PIR
TrEMBL (GenPept)
20
Databases
Databases are available from WWW sites and are highly interlinked
Clinical and Mutation
OMIM
MGMD
Bibliographic
PubMed
Raw Sequence
As accessed for sequence retrieval
Alignments and Patterns
As generated by analysis software
21
Databases
Alignments and Patterns
Alignments
Aligned protein families
Made up of a number of sections
Aligned protein domains
Automatically generated from protein sequence databases
Conserved blocks of protein alignments
Used to compute scoring schemes for protein comparisons
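How conserved blocks can yield a scoring scheme is sketched below. The slide does not describe the calculation, so this assumes a BLOSUM-style log-odds approach: count the residue pairs seen in each column of an ungapped block, compare the observed pair frequencies with those expected from the residue frequencies alone, and report the log ratio in half-bit units. The toy block and the function name are hypothetical, and the clustering of similar sequences used in real matrices is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_scores(block):
    """Return {(a, b): score} in half-bit units for residue pairs seen in the block."""
    pair_counts = Counter()
    for column in zip(*block):
        # Every pair of sequences contributes one residue pair per column.
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())

    # Observed pair frequencies and the residue frequencies they imply.
    q = {pair: n / total for pair, n in pair_counts.items()}
    p = Counter()
    for (x, y), freq in q.items():
        if x == y:
            p[x] += freq
        else:
            p[x] += freq / 2
            p[y] += freq / 2

    scores = {}
    for (x, y), freq in q.items():
        # Expected frequency if residues paired at random.
        expected = p[x] * p[y] * (1 if x == y else 2)
        scores[(x, y)] = round(2 * math.log2(freq / expected))
    return scores


toy_block = ["AC", "AC", "AC", "GC"]  # four aligned sequences, two columns
print(log_odds_scores(toy_block))
```

With so little data the scores are not biologically meaningful. Useful matrices are built from very large numbers of blocks.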
22
Databases
Alignments and Patterns
Patterns
Patterns are largely derived from the conserved portions of aligned protein families
Representations of single motifs
Now made up of both simple patterns and HMM profiles
Representations of patterns of motifs (fingerPRINTS)
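A "simple pattern" for a single motif can be searched for with an ordinary regular expression. The sketch below assumes the PROSITE pattern notation (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats); the slide does not name the notation, and the converter function and test sequence are illustrative only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern string to a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("x", ".")                       # any residue
    regex = regex.replace("{", "[^").replace("}", "]")    # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")     # repeat counts
    regex = regex.replace("<", "^").replace(">", "$")     # sequence-end anchors
    return regex


# N-glycosylation site pattern: N, then not P, then S or T, then not P.
site = re.compile(prosite_to_regex("N-{P}-[ST]-{P}"))
match = site.search("AANGSAA")
print(match.start() if match else None)
```

HMM profiles and fingerprints carry more information than this and cannot be reduced to a single regular expression.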
23
Databases
Databases are available from WWW sites and are highly interlinked
Clinical and Mutation
OMIM
MGMD
Bibliographic
PubMed
Raw Sequence
As accessed for sequence retrieval
Alignments and Patterns
As generated by analysis software
Structural
PDB
Integrated
Ensembl
24
The End.
25
dpj10_at_mole.bio.cam.ac.uk