1
Bioinformatics: an ideal subject for connecting
mathematical and scientific subjects
(biology, chemistry, mathematics) with computer
science

Dr. Clemens Gröpl (for Prof. Knut Reinert), FU
Berlin
groepl_at_inf.fu-berlin.de
Talk given on 1 September 2006
at the 5th Berlin MNU Congress, TU Berlin

2
Bioinformatics in the media
3
Bioinformatics in Berlin

4
Topics in bioinformatics
Molecular therapy of diseases: protein-protein
docking

Find drugs that alter or inhibit the function of
the target molecule.
Searching databases helps to find suitable
candidates and reduce side effects.

5
Topics in bioinformatics
(Some pictures courtesy of the MPI für
Informatik, Saarbrücken.)

Gene finding
Improve prediction of coding and regulatory
regions
Comparing multiple genomes is promising

6
Topics in bioinformatics
Identifying SNPs (single nucleotide polymorphisms)
or other polymorphisms
GACGTGCACTAAATCGCGCAACTG
TTCGGGTTGGACGTGCACTAACTCG
TTCGGGTTGGACGTGTACTAAATCGCGCAACTG
GGGTTGCACGTGCACTAAATCGCGCAACTG

Identification of SNPs helps to
associate patterns of genetic diversity with
diseases
associate genetic patterns with drug tolerance
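The four reads on this slide can be compared column by column once they are placed on a common coordinate system. The following is a minimal sketch, not the method used in the talk. The offsets (9, 0, 0, 3) are an assumption, chosen so that the reads overlap consistently; the transcript has lost the original alignment. The function name `variable_columns` is hypothetical.

```python
from collections import Counter

# Reads copied from the slide. The offsets are an ASSUMPTION (inferred from
# the overlaps); the transcript does not preserve the original alignment.
READS = [
    (9, "GACGTGCACTAAATCGCGCAACTG"),
    (0, "TTCGGGTTGGACGTGCACTAACTCG"),
    (0, "TTCGGGTTGGACGTGTACTAAATCGCGCAACTG"),
    (3, "GGGTTGCACGTGCACTAAATCGCGCAACTG"),
]


def variable_columns(reads):
    """Return {column: Counter of bases} for columns where the reads disagree."""
    columns = {}
    for offset, seq in reads:
        for i, base in enumerate(seq):
            columns.setdefault(offset + i, Counter())[base] += 1
    return {col: counts for col, counts in sorted(columns.items())
            if len(counts) > 1}


variants = variable_columns(READS)
for col, counts in variants.items():
    print(col, dict(counts))
```

A column where only one read deviates may just as well be a sequencing error as a polymorphism, so a real analysis would need more reads and quality information.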

7
Topics in bioinformatics
8
Topics in bioinformatics
Problem (project together with Charité): Within
the first year, 8-10 % of the patients lose the
transplant. After 10 years, about 50 % have lost
the transplant. Diagnosis is invasive and
sometimes leads to loss of the graft.
Goal: Analyse urine samples of patients and
detect diagnostic markers as early as possible
to counteract graft loss.
9
Topics in bioinformatics

Automated measurement methods lead to terabytes
of data (Prof. Schlüter)
10
Bioinformatics

Bioinformatics connects scientific
subjects (biology, chemistry, mathematics) with
computer science

11
Selected topics
Two concrete teaching topics (?)

Genome assembly
Mass spectrometry based proteomics

13
Assembly of the human genome

Knut Reinert

Girls' Day, April 2005
14
Assembly of the human genome
Every cell of our body contains a
complete copy of our genome.
Most pictures are from
http://www.genomenewsnetwork.org/resources/whats_a_genome
15
Assembly of the human genome
16
Assembly of the human genome
The chromosomes consist of coiled-up DNA
(deoxyribonucleic acid).
17
Assembly of the human genome
DNA consists of a double strand of base pairs
that are always the same (G-C and A-T):
G = guanine, C = cytosine, A = adenine,
T = thymine
18
Assembly of the human genome
If you know one DNA strand, you also know the
other one (reverse it and replace A by T and C
by G, and vice versa).
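The rule on this slide (reverse the strand, then swap A with T and C with G) is what is usually called the reverse complement. A minimal sketch follows; the function name `reverse_complement` is chosen here for illustration, and the example strand is the one shown on the later "DNA sequencing" slide.

```python
# Translation table for the base pairing rule A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Complement every base, then reverse the result."""
    return strand.translate(COMPLEMENT)[::-1]


strand = "TCACAATCAACTGCGCTATAG"
other = reverse_complement(strand)
print(other)
# Read backwards, the result is the base-by-base partner strand.
print(other[::-1])
```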
19
Basic idea of shotgun sequencing
Make lots of copies
Cut them into snippets
Puzzle them back together
20
Basic idea of shotgun sequencing
ACGTCGCTATGCCGTATCG
ACGTCGCTATGCCGTATCG ACGTCGCTATGCCGTATCG
ACGTCGCTATGCCGTATCG ACGTCGCTATGCCGTATCG
ACGTCGCTATGCCGTATCG ACGTCGCTATGCCGTATCG
And then we do the same with DNA
The only problem: how can one read such small
snippets?
21
DNA sequencing
Shotgun DNA sequencing (technology)
22
DNA sequencing
TCACAATCAACTGCGCTATAG
AGTGTTAGTTGACGCGATATC
(Figure: the two complementary strands above; the remaining single-letter base labels of the figure are omitted.)
23
DNA sequencing
Capillary sequencer

110 capillaries with loading adapter and
detection interface.

Simulated laser beam.

24
DNA sequencing
Converting analog into digital information
25
Human Genome Project
18 countries had research programs on
the human genome: Australia, Brazil,
Canada, China, Denmark, France,
Germany, Israel, Italy, Japan, Korea,
Mexico, the Netherlands, Russia, Sweden,
the United Kingdom, and the USA.
1,100 scientists involved
The 5 largest sequencing centers (USA/UK) are:
DOE Joint Genome Institute
Baylor College of Medicine
Sanger Centre
Washington University Genome Sequencing Center
Whitehead Institute/MIT Center for Genome Research
Cost: approx. 3,000 million US dollars
26
Celera Genomics
Private company
300 scientists involved
Largest sequencing facility in the world
Cost: approx. 500 million US dollars
27
Celera's computing facility

300 ABI 3700 DNA sequencers
50 people to operate them
40 administrative staff
2,000 m² of lab space
2,000 m² for the sequencers
over 1 terabyte of main memory
over 80 terabytes of disk storage

28
ABI 3700 capillary sequencer
29
How does the puzzle get put together?
30
How does the puzzle get put together?
Solution to puzzle 1:
Bioinformatics uses methods from computer science
and mathematics to answer biological
questions
Solution to puzzle 2:
ACGTCGCTATGCCGTATCGATGCGATCGA TGCAGTCGGTATCGATGCGATGC
32
The whole problem is HUGE!!!

To sequence the human genome,
one needs roughly 5-fold coverage
with snippets.
The genome is 3,000,000,000 characters long
That means one has to read 15,000,000,000 characters in
snippets
A double-sided A4 sheet holds about 10,000
characters. A pack like this one contains 500 sheets.
So 5 million characters fit on the sheets of one pack
So we need only 3,000 such packs
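The arithmetic on this slide can be checked in a few lines. All numbers are taken from the slide; only the variable names are introduced here.

```python
GENOME_LENGTH = 3_000_000_000   # characters in the human genome
COVERAGE = 5                    # 5-fold coverage with snippets
CHARS_PER_SHEET = 10_000        # characters on one sheet of paper
SHEETS_PER_PACK = 500           # sheets in one pack of paper

chars_to_read = GENOME_LENGTH * COVERAGE
chars_per_pack = CHARS_PER_SHEET * SHEETS_PER_PACK
packs_needed = chars_to_read // chars_per_pack

print(chars_to_read)    # characters that have to be read in snippets
print(chars_per_pack)   # characters that fit into one pack
print(packs_needed)     # packs of paper needed
```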

33
Assembly of the human genome
34
DNA sequencing
35
BAC-by-BAC approach
Genome
AGTTGAGATCGCCCTAGCGCTAATAGCGCACATCACAACGGCGCGCTCTACGGCACGATATACGGTGTCGCTT
For each BAC (33,500 for humans)
36
BAC-by-BAC approach
Genome
2 separate processes
Clone libraries are unstable, mapping is difficult
Libraries have to be made for each clone
The assembly problem is simple
37
Whole Genome (Double Barreled) Shotgun
Genome
38
Whole Genome (Double Barreled) Shotgun
Genome
39
Genome assembly

Interesting application of algorithms on
strings and graphs
The colored snippets are easy to make
yourself
More information on how Girls' Day 2005 and 2006
were run is available from Eva Lange and Knut Reinert
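For classroom use, the puzzle can be tried on a computer with a simple greedy strategy: repeatedly merge the two snippets with the longest suffix-prefix overlap. This is a minimal sketch of one textbook heuristic, not the assembler used for the human genome; it ignores sequencing errors, the reverse strand, and repeats, all of which make the real problem hard. The three snippets are hypothetical pieces of the example string from the shotgun slide, and the function names are chosen here for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until no overlaps are left."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop snippets that are fully contained in another snippet.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical snippets cut from the slide's example string.
snippets = ["TGCCGTATCG", "ACGTCGCTAT", "GCTATGCCG"]
contigs = greedy_assemble(snippets)
print(contigs)
```

The pairwise overlaps form a graph on the snippets, which is where the string and graph algorithms mentioned on the slide come in.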

40
Selected Topics
Two concrete teaching topics (?)

Genome assembly
Mass spectrometry based proteomics

41
Genomics vs. Proteomics
42
Definition of proteomics

Proteome
The entirety of all proteins in a living organism,
a tissue, a cell, or a
cell compartment, under exactly defined
conditions and at a specific point in time,
is called the proteome (for example the proteome
of humans, of the potato tuber, of the
bacterial cell, of the cell nucleus).
(http://de.wikipedia.org/wiki/Proteom)

43
Genomics vs. proteomics
Genomics
Genome is rather static
30 k genes
Established, fully automated technology (capillary sequencer)
Proteomics
Proteome is dynamic (age, tissue, what you had for lunch)
Up to 2000 k proteins
Emerging technology (MS, HPLC/MS, protein chips)
44
From genes to proteins

Transcription and translation are heavily
regulated
Protein expression levels are not static
mRNA levels and protein levels often not
correlated
Contradictory results from seemingly similar
methods
RNA chips
DNA chips
gene disruption
knock out

Anderson et al., Electrophoresis (1998), 19,
1853-61
45
Proteins end products?
46
Definition of proteomics

Proteomics
Proteomics comprises the
study of the proteome, i.e. the entirety
of all proteins present in a cell or a living organism
under defined conditions and at a defined
point in time.
In contrast to the (rather)
static genome, the proteome is (highly) dynamic, so
its qualitative and quantitative
protein composition can change in response to changed
conditions (environmental factors, temperature,
gene expression, drug administration, etc.).
(http://de.wikipedia.org/wiki/Proteomik)

47
Definition of proteomics

Proteomics
can be defined as the qualitative and
quantitative comparison of proteomes under
different conditions to further unravel
biological processes.
(www.expasy.org)

48
Application fields

Diagnostics: Find relevant patterns in one- or
two-dimensional LC measurements
Time series: Analyze the temporal behaviour in a
time series experiment
Quantitative measurements: Determine absolute
content of peptides using the additive method
(myoglobin, gliadin)

49
Serum myoglobin as a diagnostic marker

Myoglobin
17 kDa protein
stores oxygen in skeletal and heart muscle
released into serum after a myocardial infarction
Important parameter for blood recirculation
after thrombolytic therapy
healthy people: 30-90 ng/ml; diseased: > 100-1000
ng/ml

50
Protein concentrations in serum

Serum albumin
40 mg/ml, 600 nmol/ml
Immunoglobulins
20 mg/ml, 350 nmol/ml
Myoglobin 550 ng/ml, 32 pmol/ml

Dynamic range: 20,000
=> Separation necessary
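The molar figures on this slide follow from dividing the mass concentration by the molar mass. A short sketch, using the 17 kDa stated for myoglobin on the previous slide; the function name is hypothetical, and the interpretation that the dynamic range refers to the molar ratio of albumin to myoglobin is an assumption that happens to match the order of magnitude.

```python
def pmol_per_ml(mass_ng_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """ng/ml divided by g/mol gives nmol/ml; times 1000 gives pmol/ml."""
    return mass_ng_per_ml / molar_mass_g_per_mol * 1000.0


myoglobin_pmol = pmol_per_ml(550, 17_000)   # 550 ng/ml, 17 kDa (slide values)
albumin_pmol = 600 * 1000                   # 600 nmol/ml (slide value) in pmol/ml

ratio = albumin_pmol / myoglobin_pmol       # ASSUMED meaning of "dynamic range"
print(round(myoglobin_pmol, 1), "pmol/ml myoglobin")
print(round(ratio), "times more albumin than myoglobin (molar)")
```

The result is a little below the rounded 20,000 given on the slide, which is consistent with the slide quoting an order of magnitude.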
51
Separation by SAX, HPLC, ESI-MS
SAX
Serum
HPLC
ESI MS
52
Shotgun proteomics
(Figure labels: Proteins, Digestion, Peptide digest, Separation)

Key idea of shotgun proteomics
Separation of whole proteins possible but
difficult, hence digestion preferred
Separate peptides
Identify proteins through peptides

53
Mass spectrometry
Peak intensity in scan corresponds to amount
present, but intensities are not comparable!
54
HPLC-MS analysis
55
Sample preparation
For myoglobin quantification, we used an
experimental setup called additive series.

Target solution(s)
Myoglobin-depleted human serum
Spiked with 0.40-0.50 ng/µl human myoglobin
(target value to be quantitated)
Spiked with 0.50 ng/µl horse myoglobin
(internal standard)
Aliquots spiked with eight known amounts between
0.24 and 3.3 ng/µl of human myoglobin
(additive series)
Four technical replicates for each measurement
(8x4).

56
Interpreting an additive series
intensity
measurements
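An additive series is commonly evaluated by fitting a straight line to signal versus added amount and extrapolating: if the signal is proportional to (unknown + added), the unknown equals intercept divided by slope. The sketch below illustrates only this extrapolation step. The eight added amounts, the response factor, and the noise-free signals are synthetic values made up for illustration (the slide only says "eight known amounts between 0.24 and 3.3 ng/µl"); they are not measurements from the talk. In practice the signal would presumably be normalized to the internal standard first.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def additive_series_estimate(added, signal):
    """Estimate the unknown concentration as intercept / slope."""
    slope, intercept = fit_line(added, signal)
    return intercept / slope


# SYNTHETIC illustration data (not from the talk).
TRUE_CONC = 0.47     # ng/µl, the value we pretend not to know
RESPONSE = 2.0       # arbitrary signal units per ng/µl
added = [0.24, 0.5, 0.9, 1.3, 1.8, 2.3, 2.8, 3.3]
signal = [RESPONSE * (TRUE_CONC + a) for a in added]

estimate = additive_series_estimate(added, signal)
print(round(estimate, 3))
```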
57
Results
Expected value 0.47 ng/µl myoglobin
58
Proteomics data flow
Raw Data
HPLC/MS
Sig. Proc.
Filtered Raw Data
Map
Data Reduction
Diff. Anal.
Annotated Maps
Differentially Expressed Proteins
Identification
59
What's in a map?

Retention time (RT) for each scan
Peptide mass/charge ratio (m/z)
(usually within 20 ppm)
Intensity (I)
-> use m/z and RT to identify peptides
-> use I to quantify peptides
(relative quantitation only!)
Maps become HUGE (10^8 peaks)!

60
Maps
61
Maps
62
Maps
63
Maps
64
Maps
65
Proteomics data flow
Raw Data
HPLC/MS
10 GB
Sig. Proc.
Filtered Raw Data
Map
Data Reduction
1 GB
50 MB
Diff. Anal.
Annotated Maps
Differentially Expressed Proteins
Identification
1 kB
50 MB
66
Proteomics data flow
Raw Data
HPLC/MS
Sig. Proc.
Filtered Raw Data
Map
Data Reduction
Diff. Anal.
Annotated Maps
Differentially Expressed Proteins
Identification
67
Peak picking
sticks
raw data

The raw ion count data acquired by the mass
spectrometer needs to be converted into peak
lists for further processing.
This is called peak picking.

68
Peak picking
sticks
raw data

Issues
Identify peak locations
Integrate the peak signal, assign stick
parameters: centroid, width, height,
signal-to-noise, skewness, ...
Reduces the amount of data by a factor of 10 to 100

69
Wavelet transformation

Using the Continuous Wavelet Transformation (CWT)
we can split the signal into different frequency
ranges (scales).

(Figure panels: raw signal and its wavelet transform at scales a = 3, a = 0.3, and a = 0.06)
70
Peak picking algorithm

Compute the wavelet transform
Search for a peak maximum
Search for peak endpoints
Estimate the centroid
Determine the height
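The five steps above can be sketched in plain Python. This is a simplified illustration under several assumptions, not the algorithm used in OpenMS: it uses a Mexican-hat wavelet at a single scale `a`, picks only the highest peak, defines the endpoints as the region where the transformed signal stays positive, and takes the intensity-weighted mean as centroid. The test signal is a synthetic Gaussian, and all function names are hypothetical.

```python
import math


def ricker(t, a):
    """Mexican-hat wavelet at scale a (not normalized)."""
    x = t / a
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def wavelet_transform(mz, intensity, a):
    """Discrete approximation of the wavelet transform at one scale."""
    return [sum(y * ricker(x - center, a) for x, y in zip(mz, intensity))
            for center in mz]


def pick_highest_peak(mz, intensity, a):
    w = wavelet_transform(mz, intensity, a)          # 1. wavelet transform
    apex = max(range(len(w)), key=lambda i: w[i])    # 2. peak maximum
    left = right = apex                              # 3. peak endpoints
    while left > 0 and w[left - 1] > 0:
        left -= 1
    while right < len(w) - 1 and w[right + 1] > 0:
        right += 1
    window = range(left, right + 1)
    total = sum(intensity[i] for i in window)
    centroid = sum(mz[i] * intensity[i] for i in window) / total  # 4. centroid
    height = max(intensity[i] for i in window)       # 5. height
    return {"centroid": centroid, "height": height,
            "left": mz[left], "right": mz[right]}


# SYNTHETIC raw data: one Gaussian peak at m/z 500.3 with height 100.
mz = [500.0 + 0.01 * i for i in range(61)]
intensity = [100.0 * math.exp(-0.5 * ((x - 500.3) / 0.05) ** 2) for x in mz]
peak = pick_highest_peak(mz, intensity, a=0.05)
print(peak)
```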

71
Data reduction
Raw Map
72
What's in a map?

LC/MS experiments produce gigabytes of raw data
We need to reduce this to the essential features
therein
One can deal with the two dimensions one after the
other or use a two-dimensional approach

73
Data reduction
Raw Map
74
Feature finding
Feature finding from a global perspective. A
small section of LC/MS raw data (left) and the
features extracted by FeatureFinder (right).
75
Feature finding

A two-dimensional model has to be adjusted to the
raw data
Both dimensions can be modeled independently

76
Feature finding

isotope pattern
feature model
elution profile
RT
m/z
77
Isotope patterns

Natural isotopes occur with well-known abundances
Thus the theoretical peak positions and
intensities can be computed (more about this
later!)
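As a rough illustration of how such a pattern can be computed, the sketch below considers carbon only: each of n carbon atoms is 13C with probability of about 1.07 %, so the number of heavy atoms follows a binomial distribution. Ignoring the isotopes of H, N, O, and S is a simplifying assumption made here, so the intensities are only approximate. Peak positions use the 13C-12C mass difference and the proton mass; function names are hypothetical.

```python
from math import comb

P_C13 = 0.0107            # natural abundance of carbon-13 (about 1.07 %)
C13_C12_DIFF = 1.00335    # mass difference between 13C and 12C in Da
PROTON = 1.00728          # proton mass in Da


def carbon_isotope_pattern(n_carbons, max_extra=4):
    """Probability that exactly k of n carbons are 13C, for k = 0..max_extra."""
    return [comb(n_carbons, k) * P_C13 ** k * (1 - P_C13) ** (n_carbons - k)
            for k in range(max_extra + 1)]


def isotope_mz(monoisotopic_mass, charge, k):
    """m/z of the k-th isotopic peak of a (carbon-only model) ion."""
    return (monoisotopic_mass + k * C13_C12_DIFF) / charge + PROTON


pattern = carbon_isotope_pattern(50)
for k, p in enumerate(pattern):
    print(k, round(isotope_mz(1000.0, 2, k), 4), round(p, 4))
```

Note that for charge state 2 neighbouring isotopic peaks are only about 0.5 Th apart, since the mass difference is divided by the charge.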

78
Feature models m/z

Isotope patterns
Masses of isotopic variants are about 1, 2, ... Da
larger than the monoisotopic mass
At 0.2 Th mass resolution, isotopic variants are
not clearly separated for charge states 2
Peak picking will not work in this case
Instead we apply a (Gaussian) mixture model to
the whole isotope pattern

79
Feature models m/z

Isotope patterns

80
Feature models RT

Elution profiles
Can be modeled by a normal distribution
Or an exponentially modified normal distribution
(for fronting and tailing)

81
Proteomics data flow
Raw Data
HPLC/MS
Sig. Proc.
Filtered Raw Data
Map
Data Reduction
Diff. Anal.
Annotated Maps
Differentially Expressed Proteins
Identification
82
Differential analysis

Two common basic approaches
Direct Differential Quantitation (DDQ)
Isotope tagging (e.g. ICAT, MeCAT)

Map 1 (normal)
Map 2 (diseased)
diseased
normal
83
Feature matching

When analyzing e.g. an additive series, we need
to match features across maps

84
Feature matching

A star-like matching of 32 LC/MS feature maps

85
Differential analysis

Two common basic approaches
Direct Differential Quantification (DDQ)
Isotope labeling (e.g. ICAT, MeCAT, SILAC, ...)

86
Isotope labeling (ICAT)
Heavy and light ICAT reagents are 8 Dalton apart
87
Isotope labeling (ICAT)
Heavy and light ICAT reagents are 8 Dalton apart
88
Proteomics data flow
Raw Data
HPLC/MS
Sig. Proc.
Filtered Raw Data
Map
Data Reduction
Diff. Anal.
Annotated Maps
Differentially Expressed Proteins
Identification
89
Peptide identification by MS2
Certain MS/MS instruments can select ions within
a defined m/z range and subject them to another
step of fragmentation.
90
Peptide fragmentation
The peptide backbone breaks to form fragments
with characteristic masses.
Doubly charged peptide
Chen et al. (2001)
y-ion
b-ion

91
Peptide ions in spectrum
b ions   residue   y ions
1166     K         147
1020     L         260
907      E         389
778      D         504
663      E         633
534      E         762
405      L         875
292      F         1022
145      G         1080
88       S         1166
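The ladder on this slide can be recomputed from residue masses. Reading the b-ion values from 88 upward gives the residues S, G, F, L, E, E, D, E, L, K, so the sketch assumes the peptide is SGFLEEDELK (an inference from the slide, not stated on it). It uses monoisotopic residue masses and singly charged ions; the slide values appear to be rounded, so agreement is only expected to within about 1 Da. Function names are hypothetical.

```python
# Monoisotopic residue masses in Da (only the residues needed here).
RESIDUE_MASS = {
    "S": 87.03203, "G": 57.02146, "F": 147.06841, "L": 113.08406,
    "E": 129.04259, "D": 115.02694, "K": 128.09496,
}
PROTON = 1.00728
WATER = 18.01056


def b_ions(peptide):
    """Singly charged b ions b1..b(n-1): N-terminal prefixes plus a proton."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Singly charged y ions y1..y(n-1): C-terminal suffixes plus water and a proton."""
    masses, total = [], WATER
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


PEPTIDE = "SGFLEEDELK"   # ASSUMED sequence, inferred from the slide
b = b_ions(PEPTIDE)
y = y_ions(PEPTIDE)
for i, (bm, ym) in enumerate(zip(b, y), start=1):
    print(f"b{i} = {bm:8.2f}    y{i} = {ym:8.2f}")
```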
92
Peptide ions in spectrum
(Figure: the b/y ion ladder of the previous slide shown above an MS/MS spectrum, intensity (0 to 100) versus m/z (0 to 1000), with labeled peaks b3 to b9, y2 to y9, and the doubly protonated precursor.)
93
What's the problem?
Peptide fragmentation possibilities (ion types)
(Figure: peptide backbone -HN-CH(R_i)-CO-NH-CH(R_{i+1})-CO-NH- with the cleavage products labeled b_i, b_{i+1}, y_{n-i}, and y_{n-i-1}.)
94
What's the problem?!!!
Peptide fragmentation possibilities (ion types)
95
Identification (using SCOPE)
Vineet Bafna, Nathan Edwards, Proc. ISMB 2001
96
Mass decomposition

Mass spectrometers measure the total mass of
ions (peptides, fragments, metabolites, ...)
How can one infer the elemental composition
from the total mass?
Can one perhaps reconstruct the molecular formula?

97
Mass decomposition

Suppose there are elements with masses $a_1,
a_2, \ldots, a_k$ (natural numbers for simplicity,
i.e. nominal masses) and the measured
mass is $M$.
The question is: are there natural numbers $c_1, c_2,
\ldots, c_k$ (zero allowed) such that $\sum_{i=1}^{k} c_i a_i = M$?
The problem can be solved with dynamic
programming.

98
Mass decomposition

This problem is also known as the money
changing problem: Can one pay 3.30 with
coins of 2, 1, 0.50 and 0.20?
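The coin question can be answered with a small dynamic programming table over all amounts up to the target. The sketch below works in cents to stay with natural numbers and also reconstructs one possible choice of coins; the function name `decompose` is chosen here for illustration. The same function works for nominal atomic masses instead of coin values.

```python
def decompose(target, values):
    """Return counts c with sum(c[i] * values[i]) == target, or None."""
    # last[m] = index of a value that can be used last to reach amount m,
    # -1 for the empty amount 0, None if m cannot be reached at all.
    last = [None] * (target + 1)
    last[0] = -1
    for m in range(1, target + 1):
        for i, a in enumerate(values):
            if a <= m and last[m - a] is not None:
                last[m] = i
                break
    if last[target] is None:
        return None
    # Walk back through the table to recover one decomposition.
    counts = [0] * len(values)
    m = target
    while m > 0:
        i = last[m]
        counts[i] += 1
        m -= values[i]
    return counts


COINS = [200, 100, 50, 20]          # coin values from the slide, in cents
print(decompose(330, COINS))        # one way to pay 3.30, as counts per coin
print(decompose(30, COINS))         # 0.30 cannot be paid with these coins
```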

99
Mass decomposition

Closely related is the money making
problem: In how many ways can one pay 3.30
with coins?

100
Mass decomposition

Let $C_{i,m}$ be the number of ways to
represent $m$ using the coin values $a_1, a_2, \ldots, a_i$.
(That is, the number of tuples $(c_1, c_2, \ldots,
c_i)$ such that $\sum_{j=1}^{i} c_j a_j = m$.)

101
Mass decomposition

The table entries are easily found by a
simple recursion.
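The recursion itself is not preserved in the transcript. A standard choice, assumed here, is: either coin i is not used at all, or it is used at least once, so C[i][m] = C[i-1][m] + C[i][m - a_i], with C[0][0] = 1. The sketch below keeps only one row of the table at a time; the function name is hypothetical.

```python
def count_decompositions(target, values):
    """Number of ways to write target as a sum of the given values."""
    # ways[m] holds C[i][m] for the values processed so far.
    ways = [1] + [0] * target
    for a in values:
        # Going upward in m means ways[m - a] already includes value a,
        # which is exactly the term C[i][m - a_i] of the recursion.
        for m in range(a, target + 1):
            ways[m] += ways[m - a]
    return ways[target]


COIN_VALUES = [200, 100, 50, 20]    # coin values from the slides, in cents
n_ways = count_decompositions(330, COIN_VALUES)
print(n_ways)
```

For the four coin values named on the earlier slide this gives 7 ways to pay 3.30.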

103
Mass decomposition

Simple dynamic programming; it can be explored with
coins and atomic masses
The implementation is simple as well

104
Summary

Bioinformatics connects computer science, biology,
medicine, chemistry, mathematics, ...
Interdisciplinary understanding and
collaboration are absolutely necessary
Possible topics for upper secondary school
Genome assembly
Mass decomposition

105
Questions?

Thank you for your attention!!!

107
Collaborators (OpenMS)

Dr. Clemens Gröpl
Eva Lange, Tim Conrad,
Ole Schulz-Trieglaff
(Algorithmische Bioinformatik,
FU Berlin)
Prof. Hartmut Schlüter
(Universitätsmedizin Berlin, Charité)
Prof. Dr. Oliver Kohlbacher
Marc Sturm, Andreas Bertsch
Jens Joachim
(SBS/WSI, Tübingen)
Andreas Hildebrandt,
Rene Husong
(Uni Saarbrücken)

Prof. Dr. Christian Huber, Bettina Mayr et al.
(Instr. Analytik Bioanalytik, Univ. des
Saarlandes, Saarbrücken)
Dr. Albert Sickmann
(Virchow-Zentrum, Würzburg)
Herbert Thiele, Jens Decker
(Bruker Daltonics, Bremen)
Dr. Christoph Klein
(IRMM, Geel; now IHCP Ispra)
