1
A platform for pattern discovery in sets of
biological sequences
C. Alland, J. Nicolas
2
Framework: bioinformatics platform of Genopole Ouest
http://www.sb-roscoff.fr/BioInfo-GPO/
Sequencing Genotyping
O. Collin H. Leroy
Bioinformatics
Functional exploration
Biochips
  • Coordination
  • Data Bases
  • Bioinformatics Software
  • High Performance
    Computing
  • Teaching

PCIO: SunFire 6800, 56 UltraSparc III, 56 GB RAM
Proteomics
3
Welcome Page of the bioinformatics platform
service
http://idefix.univ-rennes1.fr:8080/Serveur-GPO/
4
Software Page of the bioinformatics platform
service
http://idefix.univ-rennes1.fr:8080/Serveur-GPO/services.php
5
Aims of the project
Set of biological sequences
Common characteristic or discriminant pattern
  • Annotation of genomes: discovery of new
    genes/proteins
  • Characterization of functional families
  • Experimental comparison of methods: choice of
    complexities and representations of patterns;
    copy/implementation of several algorithms
  • Practical tool: parameter tuning, filtering

6
Architecture of the platform
Visualization of results
Pattern Discovery Algorithms
Interface
Supervisor
Statistical Analysis of inter-motif regions
Refinement
Search of patterns
Practical Use
7
Welcome page of the pattern discovery service
Methods for inferring regular languages
Jonassen
Marsan
Pevzner
8
Brazma hierarchy for (generalized) regular
patterns
  • J: full regular languages (finite automata)

9
Example: discovery of candidates in the
defensin family
Collaboration with GERM (C. Pineau, F.
Bourgeon), a group directed by B. Jégou with a
staff of 40 that specializes in research on male
reproduction in mammals.
  • Defensins are a major family of antimicrobial
    peptides found in mammals: cationic peptides,
    28-42 amino acids long, containing 3
    intramolecular disulfide bonds.
  • Starting point: a set of 30 sequences (all
    organisms included), 4 of them human.
  • Aim: discovery of new candidates

10
Pratt: principle of the algorithm
  1. One starts from a pattern graph containing all
    the most specific allowed patterns covering at
    least k of the n sequences in the training set.
  2. A pattern search tree is explored, starting from
    the most general pattern (the empty pattern) and
    specializing it by adding allowed components
    (belonging to the pattern graph generalization
    operators) as long as the patterns obtain a
    better score. Several scores and search
    strategies are available.
  3. The most significant patterns are filtered, and a
    refinement phase may be applied to specialize
    flexible wildcards into ambiguous letters.
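
The coverage condition in step 1 (a pattern must match at least k of the n training sequences) can be illustrated with a minimal Python sketch. This is not Pratt's implementation: the function names `prosite_to_regex`, `support`, and `is_frequent` are hypothetical, the toy sequences are invented, and only a small subset of the PROSITE-style syntax (single letters, bracketed letter sets, `x`, and `(n)` or `(n,m)` repeats) is assumed.

```python
import re

# Hypothetical helper: handles only letters, [SETS], x, and (n) / (n,m) repeats.
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as C-x(2,4)-[AG]-C to a regex string."""
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        symbol, low, high = m.groups()
        regex = "." if symbol == "x" else symbol  # x is a wildcard position
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)

def support(pattern, sequences):
    """Number of sequences containing at least one occurrence of the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))

def is_frequent(pattern, sequences, k):
    """True if the pattern covers at least k of the given sequences."""
    return support(pattern, sequences) >= k

# Toy training set (invented for illustration, not real defensins).
toy = ["ACAAGCT", "CTTTTAC", "GGGG"]
general = "C-x(2,4)-[AG]-C"   # flexible wildcard, ambiguous letter
specific = "C-x(2)-G-C"       # a specialization of the pattern above
print(support(general, toy), support(specific, toy))
```

Specializing a pattern can only keep or reduce its support, which is why a search that moves from general to specific patterns can stop extending a branch once the support falls below k.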

11
Pratt: three levels of use
  1. Simple: most parameters are fixed or simplified
  2. Expert: all parameters are available
  3. Meta: Pratt is applied to sequences of patterns.

12
Simple Pratt parameters
13
Simple Pratt results
14
Advanced Pratt parameters
15
Advanced Pratt results
16
Visualization of selected results
17
Meta Pratt
18
Search pattern in a databank
19
Results of the search in a databank
20
View of the search in a databank
21
Statistical Analysis of inter-motif regions
22
Results of the refinement of patterns
23
Reverse Search in a Genome
24
Reverse Search in a Genome: principle
  • From the patterns and knowledge of exon/intron
    splicing, a formal grammar may be inferred.
  • Genomes are translated in all six reading frames
    and compiled into a suffix tree data structure.
  • Syntactical analysis is done with the help of
    operations on suffix trees and yields
    potential new candidates.
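
The six-frame translation step can be sketched in Python as follows. This is a simplified illustration, not the platform's code: it assumes the standard genetic code, ignores exon/intron splicing, and replaces the suffix tree and grammar-based analysis with a plain regular-expression scan. The names `six_frames` and `find_in_frames` and the toy DNA string are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Map frame labels (+1..+3, -1..-3) to the translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

def find_in_frames(dna, protein_regex):
    """List (frame, protein position, matched text) for every regex hit."""
    hits = []
    for frame, protein in six_frames(dna).items():
        for m in re.finditer(protein_regex, protein):
            hits.append((frame, m.start(), m.group()))
    return hits

# Toy DNA fragment (invented for illustration).
print(six_frames("ATGTGTAAA"))
print(find_in_frames("ATGTGTAAA", "MC"))
```

A linear scan like this re-reads each translated frame for every pattern; indexing the translated genome once, as the slide describes with suffix trees, avoids that repeated work when many patterns are searched.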

To: jnicolas_at_irisa.fr
Pattern: C-x(2,4)-G-x(1,3)-C-x(3,4)-C-x(7)-[AG]-[HKNRST]-C-x(5,6)-C-C
Result columns: Organism, Chromosome, Phase, Position, LengthOcc, Length Ch, preOcc, Occ, postOcc
No match No match No match No match No match No match
25
Conclusion / Perspectives
  • 10 new potential defensins discovered
  • Importance of a complete environment coupling
    highly expressive patterns with syntactical
    search in databanks
  • Current research: a "meta" level using
    grammatical inference, i.e. inferring any regular
    language from a set of positive AND negative
    instances.
  • Open questions: better filtering of patterns,
    introduction of probabilities, long-distance
    interactions.
