Mining the Genome

Filip Železný, Department of Cybernetics, Czech Technical University in Prague. Mining the Genome ... Dept. of Cybernetics. Gerstner Laboratory ...



1
Mining the Genome
  • Filip Železný
  • CVUT FEL, Prague
  • Dept. of Cybernetics
  • Gerstner Laboratory

2
Intro
  • Research at CVUT FEL Dept. of Cybernetics
  • Nature Inspired Technologies
  • machine learning
  • evolutionary computation
  • Agent Computing
  • Robotics
  • Computer Vision
  • EU Projects (FP6)
  • 14 running in 2005, 9 new starting 2006

3
Machine Learning basics
4
Machine Learning & Data Mining
  • Supervised learning
  • given examples and their class labels
  • find a model for predicting class labels of new
    examples
  • also called concept learning, predictive
    classification, ...
  • Example
  • Given
  • Discover

size = small AND luxury = low → affordable
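The supervised-learning setting above can be sketched in a few lines of Python. The training table is hypothetical toy data (the slide's own examples did not survive in the transcript), and the learner is a deliberately naive exhaustive search for a conjunction of attribute = value tests that covers only the positive examples. It is an illustration, not the method used in the talk.

```python
from itertools import combinations

# Hypothetical toy training data: (attributes, class label).
EXAMPLES = [
    ({"size": "small", "luxury": "low"}, "affordable"),
    ({"size": "small", "luxury": "high"}, "expensive"),
    ({"size": "large", "luxury": "low"}, "expensive"),
    ({"size": "large", "luxury": "high"}, "expensive"),
]

def covers(rule, attrs):
    """A rule is a dict attribute -> required value, read as a conjunction."""
    return all(attrs.get(a) == v for a, v in rule.items())

def learn_rule(examples, target):
    """Return the shortest conjunction covering every `target` example and no other."""
    conditions = sorted({(a, v) for attrs, _ in examples for a, v in attrs.items()})
    for size in range(1, len(conditions) + 1):
        for combo in combinations(conditions, size):
            rule = dict(combo)
            if len(rule) < size:  # two tests on the same attribute: skip
                continue
            pos = [covers(rule, a) for a, c in examples if c == target]
            neg = [covers(rule, a) for a, c in examples if c != target]
            if all(pos) and not any(neg):
                return rule
    return None

rule = learn_rule(EXAMPLES, "affordable")
print(rule)
```

On this toy table no single test separates the classes, so the search returns the two-condition rule matching the one on the slide.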
5
Machine Learning
Plethora of paradigms
  • Decision trees
  • Support Vector Machines
  • Artificial Neural Networks
Symbolic / Subsymbolic / Statistical
Learning = optimization in structure / parameter space
Learning = search: AI techniques employed (gradient descent, heuristic search)
6
Relational Learning
What if examples have a structure?
Not an attribute tuple! Description spread across
multiple tables of a relational database.
7
Relational Learning
  • Relational learning
  • Representing data and rules in relational logic
    (Prolog)
  • Exploits background knowledge (e.g. charge)
  • Inductive Logic Programming

carcinogenic(Compound) IF
    has_atom(Compound, Atom1),
    type(Atom1, carbon),
    charge(Atom1, Charge),
    Charge > 0.0133,
    has_atom(Compound, Atom2),
    double_bond(Atom1, Atom2)
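To make the relational reading concrete, here is a minimal Python sketch that checks the rule body against a tiny hand-made database. The facts (compounds `c1`, `c2`, atoms `a1` to `a4`, their charges and bonds) are hypothetical, and the nested loops only mimic what a Prolog engine does when it searches for variable bindings.

```python
# Hypothetical toy facts, one "table" per relation.
HAS_ATOM = {("c1", "a1"), ("c1", "a2"), ("c2", "a3"), ("c2", "a4")}
TYPE = {"a1": "carbon", "a2": "oxygen", "a3": "carbon", "a4": "carbon"}
CHARGE = {"a1": 0.02, "a2": -0.3, "a3": 0.001, "a4": 0.0}
DOUBLE_BOND = {("a1", "a2")}

def atoms_of(compound):
    """All atoms linked to the compound through the has_atom relation."""
    return sorted(a for c, a in HAS_ATOM if c == compound)

def carcinogenic(compound, threshold=0.0133):
    """True if some carbon atom with charge > threshold is double-bonded
    to another atom of the same compound (the rule body on the slide)."""
    for atom1 in atoms_of(compound):
        if TYPE[atom1] != "carbon" or CHARGE[atom1] <= threshold:
            continue
        for atom2 in atoms_of(compound):
            if (atom1, atom2) in DOUBLE_BOND:
                return True
    return False

print(carcinogenic("c1"), carcinogenic("c2"))
```

The point of the sketch is that the example (a compound) is not a single row: deciding the rule requires joining several tables.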
8
Applications of Interest
  • Intersection of 3 hot fields

BIO technologies (genomics)
INFORMATION technologies (machine learning)
NANO technologies (microarray chips)
9
A quick intro into computational genomics
10
Background GENETICS
How does a cell know what to do?
11
Chromosomes
Chromosomes get copied during mitosis. They carry
the assembly instructions? How?
Chromosomes = proteins + DNA: where is the
information?
12
DNA
1953: Jim Watson and Francis Crick discover the DNA
structure. That is where the information is.
4-symbol alphabet: guanine, adenine, cytosine, thymine
Double-helix pairing: C-G, A-T
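The pairing rule (C with G, A with T) means one strand fully determines the other. A minimal Python sketch, with an arbitrary example sequence chosen only for illustration:

```python
# Base pairing in the double helix: C-G and A-T.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Pair every base with its partner, position by position."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand):
    """The opposite strand read in its own direction (strands are antiparallel)."""
    return complement(strand)[::-1]

print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```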
13
The CENTRAL DOGMA of Molecular Biology
  • Gene = DNA subsequence
  • Genes code for proteins
  • Gene expression
  • A piece of DNA is transcribed into RNA
  • RNA is translated into a protein
  • Proteins do the job
  • - enzymes
  • - building blocks
  • - ...

14
Protein Coding
Codon (3 bases)
DNA strand
Amino acid
Protein
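The two steps of gene expression, transcription and codon-by-codon translation, can be sketched as follows. The codon table is only a small excerpt of the standard genetic code (just the codons needed here), and the input sequence is a made-up example.

```python
# Excerpt of the standard genetic code; only the codons used below.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCA": "Ala",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def transcribe(coding_strand):
    """DNA coding strand -> mRNA: same sequence with T replaced by U."""
    return coding_strand.replace("T", "U")

def translate(mrna):
    """Read codons (3 bases each) from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "STOP":
            break
        protein.append(amino)
    return protein

mrna = transcribe("ATGTTTGGCGCATAA")
print(mrna, translate(mrna))
```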
15
Protein structures
resolution
16
Secondary structure prediction
Two common secondary structures
β-sheet
α-helix
Primary structure determines secondary structure.
Computational problem: given the primary
structure, predict whether β-sheet or α-helix.
NOBODY CAN DO THAT!
17
Secondary structure prediction
  • Secondary structure prediction with ILP
  • Muggleton 1992
  • Using ILP, obtained rules such as

alpha0(A,B) ← ..., position(A,D,O), not_aromatic(O), small_or_polar(O),
    position(A,B,C), very_hydrophobic(C), not_aromatic(C), ... etc. (22 literals)
  • Note the incorporation of background knowledge
  • Accuracy 81%, best at the time
  • Published in the journal Protein Engineering

18
Sequencing the Human Genome
19
The Genome project
  • 1993-2003
  • All human genes sequenced
  • Celera vs. NIH race
  • Challenge NOW
  • annotate the genes
  • discover functions
  • interactions
  • dynamic pathways

20
Genomics research
  • Traditional functional genomics research
  • Hypothesis - driven
  • e.g. a gene is suspected to be responsible for ...
  • then tracing its expression in relevant tissues
  • First hypothesize, then measure

21
Gene Expression Microarrays
  • Microarray chip
  • Measures expression of tens of thousands of genes
    simultaneously (high-throughput)
  • pioneering technology (mid to late 90s)
  • A grid carrying synthesized DNA probes
  • → Breakthrough in genomics research?

22
Genomics Research
  • High-throughput approach to functional genomics?
  • Data-driven, unbiased: first measure, then
    hypothesize
  • Might reveal never-thought-of relationships

Microarray data → Human analysis → Hypotheses: IMPOSSIBLE (TOO MUCH DATA)
Expression of almost the entire genome (tens of
thousands of genes)
23
Genomics Research through Machine Learning
  • AI-based high-throughput functional genomics?

High-throughput screening
High-performance computing
Microarray data
Machine Learning
Hypotheses
Interpretation
24
Genomics Research with AI
  • This concept has recently been proven to work
  • Golub et al., Science 286:531-537, 1999
  • leukemia classification model (AML vs. ALL)
  • voting of informative attributes (genes)
  • Discovery of new classes (clustering)
  • Ramaswamy et al., PNAS 98:15149-54, 2001
  • Tumor classification
  • 14 classes of cancer
  • used Support Vector Machines
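The "voting of informative attributes (genes)" idea can be sketched roughly as below. This is a simplified illustration in the spirit of weighted voting, not a reproduction of the published procedure: gene selection and prediction-strength thresholds are omitted, and the gene names and expression values are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical training data: class -> gene -> expression values per sample.
TRAIN = {
    "AML": {"g1": [9.0, 10.0, 11.0], "g2": [2.0, 3.0, 4.0]},
    "ALL": {"g1": [1.0, 2.0, 3.0], "g2": [8.0, 9.0, 10.0]},
}

def gene_weights(train):
    """Per gene: a signal-to-noise weight and the midpoint between class means."""
    params = {}
    for gene in train["AML"]:
        a, b = train["AML"][gene], train["ALL"][gene]
        weight = (mean(a) - mean(b)) / (stdev(a) + stdev(b))
        params[gene] = (weight, (mean(a) + mean(b)) / 2)
    return params

def classify(sample, params):
    """Each gene votes weight * (x - midpoint); a positive total means AML."""
    total = sum(w * (sample[g] - mid) for g, (w, mid) in params.items())
    return "AML" if total > 0 else "ALL"

params = gene_weights(TRAIN)
print(classify({"g1": 9.5, "g2": 2.5}, params))
print(classify({"g1": 1.5, "g2": 9.0}, params))
```

Genes whose class means are far apart relative to their spread get large weights, so they dominate the vote.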

25
Interpretable classifiers
  • Pursuit of comprehensibility: rule-based models
  • Models interpretable by biologists
  • Our work
  • D. Gamberger, N. Lavrač, F. Železný, J. Tolar: J.
    Biomed. Informatics 37(5):269-284, 2004

IF gene_20056 EXPRESSED AND gene_23984 NOT_EXPRESSED
THEN cancer_class = AML
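A rule of this form is easy to apply and to read, which is the point of the slide. The sketch below assumes that EXPRESSED means "measured value above a cutoff"; the cutoff of 100.0 and the sample values are hypothetical and do not come from the cited paper.

```python
# Assumption: a gene counts as EXPRESSED when its measured value exceeds
# a cutoff. The cutoff and the sample values below are hypothetical.
THRESHOLD = 100.0

def expressed(sample, gene):
    return sample[gene] > THRESHOLD

def classify(sample):
    """IF gene_20056 EXPRESSED AND gene_23984 NOT_EXPRESSED THEN AML."""
    if expressed(sample, "gene_20056") and not expressed(sample, "gene_23984"):
        return "AML"
    return None  # the rule does not fire, so it makes no prediction

print(classify({"gene_20056": 450.0, "gene_23984": 12.0}))
print(classify({"gene_20056": 450.0, "gene_23984": 300.0}))
```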
26
Exploiting Background knowledge
  • Tons of genomic background knowledge available
  • Relational learning would make it possible to exploit it!

27
Relational Genomic Data Mining
  • Our current work
  • Combining expression and gene annotation data

Rule Based Model
28
Relational Genomic Data Mining
  • Example rule algorithmically discovered
  • ... open end, no conclusions

expressed_in_all(Gene) IF
    has_location(Gene, integral_to_membrane),
    has_function(Gene, receptor_activity)
Expression of genes coding for proteins located
in the "integral to membrane" cell component, whose
functions include receptor activity, has a high
correlation with the BCR class of acute
lymphoblastic leukemia (ALL) and a low
correlation with other classes of ALL.