The Challenge of Predicting Gene Function

1
The Challenge of Predicting Gene Function
  • Ross D. King
  • Department of Computer Science
  • University of Wales, Aberystwyth

2
Gene Function Prediction
  • The most important revelation from the sequenced
    genomes is that the functions of typically only
    between 60-70% of the predicted genes are known
    with any confidence.
  • The new science of functional genomics is
    dedicated to determining the function of the
    genes of unassigned function, and to further
    detailing the function of genes with purported
    function.

3
Data Mining Prediction
  • We have developed a method for predicting the
    functional class of gene products based on
    ILP/Relational data mining.
  • The idea is to learn a reliable predictive
    function on the examples of genes with products
    of known function.
  • Then apply this function to genes where the
    functional class is unknown.
  • We call this approach Data Mining Prediction
    (DMP).

4
Predicting Gene Function in Yeast
  • We will demonstrate our approach using ORFs in
    yeast (Saccharomyces cerevisiae).
  • Using the MIPS functional classification scheme
  • For those ORFs whose function is currently
    unknown
  • Using 5 types of data
  1. Sequence statistics
  2. Homology (sequence similarity)
  3. Predicted Secondary Structure
  4. Expression (microarray)
  5. Phenotype

5
We want to map from sequence to function class
6
Classification Schemes 1
  • MIPS/GeneOntology

1,0,0,0   "METABOLISM"
2,0,0,0   "ENERGY"
3,0,0,0   "CELL CYCLE AND DNA PROCESSING"
4,0,0,0   "TRANSCRIPTION"
5,0,0,0   "PROTEIN SYNTHESIS"
6,0,0,0   "PROTEIN FATE (folding, modification, destination)"
8,0,0,0   "CELLULAR TRANSPORT AND TRANSPORT MECHANISMS"
10,0,0,0  "CELLULAR COMMUNICATION/SIGNAL TRANSDUCTION MECHANISM"
11,0,0,0  "CELL RESCUE, DEFENSE AND VIRULENCE"
13,0,0,0  "REGULATION OF / INTERACTION WITH CELLULAR ENVIRONMENT"
14,0,0,0  "CELL FATE"
29,0,0,0  "TRANSPOSABLE ELEMENTS, VIRAL AND PLASMID PROTEINS"
30,0,0,0  "CONTROL OF CELLULAR ORGANIZATION"
40,0,0,0  "SUBCELLULAR LOCALISATION"
62,0,0,0  "PROTEIN ACTIVITY REGULATION"
63,0,0,0  "PROTEIN WITH BINDING FUNCTION OR COFACTOR REQUIREMENT"
67,0,0,0  "TRANSPORT FACILITATION"
98,0,0,0  "CLASSIFICATION NOT YET CLEAR-CUT"
99,0,0,0  "UNCLASSIFIED PROTEINS"
7
Classification Schemes 2
Hierarchy of classes
1,0,0,0   "METABOLISM"
  1,1,0,0   "amino acid metabolism"
  1,2,0,0   "nitrogen and sulfur metabolism"
  1,3,0,0   "nucleotide metabolism"
  1,4,0,0   "phosphate metabolism"
  1,5,0,0   "C-compound and carbohydrate metabolism"
  1,6,0,0   "lipid, fatty-acid and isoprenoid metabolism"
  1,7,0,0   "metabolism of vitamins, cofactors, and prosthetic groups"
  1,20,0,0  "secondary metabolism"
8
Classification schemes 3
Hierarchy of classes
1,0,0,0   "METABOLISM"
  1,1,0,0   "amino acid metabolism"
    1,1,1,0   "amino acid biosynthesis"
    1,1,4,0   "regulation of amino acid metabolism"
    1,1,7,0   "amino acid transport"
    1,1,10,0  "amino acid degradation (catabolism)"
    1,1,99,0  "other amino acid metabolism activities"
  1,2,0,0   "nitrogen and sulfur metabolism"
  1,3,0,0   "nucleotide metabolism"
  1,4,0,0   "phosphate metabolism"
  1,5,0,0   "C-compound and carbohydrate metabolism"
  1,6,0,0   "lipid, fatty-acid and isoprenoid metabolism"
  1,7,0,0   "metabolism of vitamins, cofactors, and prosthetic groups"
  1,20,0,0  "secondary metabolism"
... and ORFs may have multiple functions too!
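The class codes on these slides can be handled as paths in a tree. The following is a minimal Python sketch, assuming that trailing zeros are padding (so `1,1,4,0` is a level-3 class); the function names are illustrative and do not come from the talk.

```python
def parse_class(code):
    """Turn '1,1,4,0' into the tuple (1, 1, 4), dropping trailing zero padding."""
    parts = [int(p) for p in code.split(",")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def level(code):
    """Depth of a class in the hierarchy (1 = top level such as METABOLISM)."""
    return len(parse_class(code))


def ancestors(code):
    """The class itself plus every enclosing class, from level 1 downwards."""
    cls = parse_class(code)
    return [cls[:i] for i in range(1, len(cls) + 1)]


def labels_at_level(orf_classes, n):
    """Project an ORF's (possibly multiple) classes onto hierarchy level n."""
    return {parse_class(c)[:n] for c in orf_classes if level(c) >= n}


if __name__ == "__main__":
    print(ancestors("1,1,4,0"))
    print(labels_at_level(["1,1,4,0", "1,1,7,0", "4,0,0,0"], 2))
```

Because an ORF may carry several classes, `labels_at_level` returns a set rather than a single label.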
9
Sequence Data
field            description                                       type
aa_rat_X         ratio of amino acid X in the protein              real
seq_len          length of the protein sequence                    int
aa_rat_pair_X_Y  ratio of the amino acids X and Y consecutively    real
mol_wt           molecular weight of the protein                   int
theo_pI          theoretical pI (isoelectric point)                real
atomic_comp_X    atomic composition of X (C,H,N,O,S)               real
aliphatic_index  aliphatic index                                   real
hydro            grand average of hydropathy                       real
strand           the DNA strand                                    'w' or 'c'
position         the number of exons (no. of start positions)      int
cai              codon adaptation index                            real
motifs           number of PROSITE motifs                          int
tmSpans          number of transmembrane spans                     int
chromosome       chromosome number                                 1..16, mit
478 attributes in total
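As an illustration of how the composition attributes might be derived, here is a small Python sketch. It assumes that `aa_rat_X` and `aa_rat_pair_X_Y` are percentages of residues and of consecutive residue pairs; the slide does not state the exact normalisation, so treat that as an assumption.

```python
from collections import Counter


def aa_ratios(seq):
    """Percentage of each amino acid in the sequence (assumed meaning of aa_rat_X)."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def aa_pair_ratios(seq):
    """Percentage of each consecutive residue pair (assumed meaning of aa_rat_pair_X_Y)."""
    pairs = Counter(zip(seq, seq[1:]))
    total = len(seq) - 1
    return {f"{x}_{y}": 100.0 * n / total for (x, y), n in pairs.items()}


if __name__ == "__main__":
    fragment = "mvltiypdelvqivsdkiasnkgk"  # start of YAL001C as shown in the talk
    print(len(fragment), aa_ratios(fragment)["k"])
    print(aa_pair_ratios(fragment).get("k_g"))
```

With 20 amino acids this gives 20 single-residue and up to 400 pair attributes, which accounts for most of the 478 attributes.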
10
Homology data
YAL001C mvltiypdelvqivsdkiasnkgkitlnqlwdisgkyfdlsdk....
sfc3 keyword(membrane) length(358) dbref(prosite) dbref(embl)
We look up the associated information from SwissProt
11
Predicted Secondary Structure Data
mvltiypdelvqivsdkiasnkgkitlnqlwdisgkyfdlsdkkvk...
cbbbbccaaaaaaaaaaaacccccbbbbaaaaaacccbbccccccb...
We record length and relative positions of the
secondary structure elements. This is relational
data.
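A per-residue prediction string can be turned into relational facts by run-length encoding. The sketch below is one possible way to do this in Python; the tuple layout is hypothetical and only loosely mirrors the `struc` and `neighbour` predicates shown later in the talk.

```python
from itertools import groupby


def structure_elements(pred):
    """Run-length encode a prediction string into (position, type, length).

    'position' is the index of the element in the chain, starting at 1.
    """
    elems = []
    for pos, (kind, run) in enumerate(groupby(pred), start=1):
        elems.append((pos, kind, len(list(run))))
    return elems


def neighbours(elems):
    """(pos1, pos2, type_of_pos2) for each pair of adjacent elements."""
    return [(a[0], b[0], b[1]) for a, b in zip(elems, elems[1:])]


if __name__ == "__main__":
    elems = structure_elements("cbbbbccaaaaaaaaaaaaccccc")
    print(elems)
    print(neighbours(elems))
```

The number of elements differs from protein to protein, which is why this data does not fit a single fixed-width table.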
12
Expression Data
  • Microarray experiments to measure expression
    changes in yeast under a variety of conditions,
    including cell cycle, heat shock, diauxic shift.
  • Short time series data, numerical-valued

Spellman et al. (1998), Roth et al. (1998), DeRisi
et al. (1997), Eisen et al. (1998), Gasch et al.
(2000, 2001), Chu et al. (1998)
13
Phenotype Data
  • Data from knockout gene growth experiments
  • Many missing data
  • 69 attributes x 1461 ORFs of known function
  • 991 genes of unknown function
  • Data taken from 3 sources (TRIPLES, MIPS, EUROFAN)

                   deleted ORF
growth medium      YAL001C  YAL019W  YAL021C  YAL029C
calcofluor white   w        n        n        n
sorbitol           n        s        n        w
benomyl            n        w        n        w
...
H2O2               w        w        n        r

s = sensitive (less growth)
w = wild-type (no observable effect)
r = resistant (more growth)
n = no data
14
What are the Machine Learning Issues?
  • Large volume of data
  • Missing data
  • Accurate results required
  • Intelligible results required
  • Class hierarchy
  • Multiple labels
  • Relational data

15
Relational vs Propositional
Propositional: single table, fixed number of
columns/attributes
Relational: multiple tables, multiple values
16
Data Mining Prediction (DMP)
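[Flow diagram]
Entire database (PolyFARM) -> 1/3 Test data, 2/3 Data for rule creation
Data for rule creation -> 1/3 Validation data, 2/3 Training data
Training data -> Rule generation (C4.5) -> All rules
All rules + Validation data -> Select best rules -> Best rules
Best rules + Test data -> Measure rule accuracy -> Results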
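The nested 2/3 : 1/3 partition in the DMP diagram can be sketched in a few lines of Python. This is an illustration only: it assumes a simple random split (the talk does not say whether the split was stratified), and the function names and seed are hypothetical.

```python
import random


def split_two_thirds(items, rng):
    """Shuffle and return (two-thirds part, one-third part)."""
    items = list(items)
    rng.shuffle(items)
    cut = (2 * len(items)) // 3
    return items[:cut], items[cut:]


def dmp_split(orfs, seed=0):
    """Partition ORFs into training, validation and test sets.

    Entire database -> 2/3 rule creation + 1/3 test;
    rule creation   -> 2/3 training      + 1/3 validation.
    """
    rng = random.Random(seed)
    rule_creation, test = split_two_thirds(orfs, rng)
    training, validation = split_two_thirds(rule_creation, rng)
    return training, validation, test


if __name__ == "__main__":
    train, valid, test = dmp_split([f"ORF{i}" for i in range(9)])
    print(len(train), len(valid), len(test))
```

Rules are learned on the training part, filtered on the validation part, and only then scored on the test part, so the reported accuracy is measured on data that played no role in rule creation or selection.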
17
Warmr
  • Warmr is an ILP algorithm developed by Dehaspe
    et al.
  • It is an ILP version of the well-known Apriori
    data mining algorithm.
  • Designed to find frequent patterns in a Datalog
    database.
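For readers unfamiliar with Apriori, the following Python sketch shows the levelwise idea on plain item sets. It is the propositional algorithm only, not Warmr itself: Warmr works on first-order queries over a Datalog database, which this example does not attempt to reproduce.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent sets into candidates of size k
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for itemset, n in sorted(apriori(data, 2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), n)
```

The pruning step is what keeps the search tractable: a pattern is only counted if all of its more general sub-patterns were already found to be frequent.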

18
PolyFARM
  • First-order association rule mining
  • Finding all frequent first order patterns in the
    data
  • Distributed on a Beowulf cluster
  • 47,034 homology patterns, f > 5
  • 19,628 structure patterns, f > 2
  • Clare & King, PADL 2003

hom(SPID, close), sq_len(SPID, short),
classification(SPID, ecoli)
A close homology to a short protein in E. coli

struc(Pos1, a), neighbour(Pos1, Pos2, c),
neighbour(Pos2, Pos3, a), coil_dist(high)
Contains alpha-coil-alpha with a high overall
coil distribution
19
Propositionalisation
Transforming relational data into boolean
attributes
         patt1  patt2  patt3  patt4  ...  patt47034
YAL001C  0      1      0      0      ...  1
YAL002W  0      1      1      0      ...  1
YAL003W  1      0      0      1      ...  0
YAL004W  1      1      0      0      ...  1
YAL005C  0      0      0      0      ...  1
...
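The transformation above can be sketched as follows: each frequent pattern becomes a boolean test on an ORF's relational facts, and each ORF becomes one row of 0/1 values. This is a simplified Python illustration; the fact tuples, the protein identifiers and the function names are hypothetical.

```python
def close_short_ecoli(facts):
    """Pattern: hom(SPID, close), sq_len(SPID, short), classification(SPID, ecoli)."""
    close = {f[1] for f in facts if f[0] == "hom" and f[2] == "close"}
    return any(("sq_len", s, "short") in facts
               and ("classification", s, "ecoli") in facts
               for s in close)


def propositionalise(orf_facts, patterns):
    """orf_facts: {orf: set of fact tuples}; patterns: {name: predicate(facts)}.

    Returns the column names and one row of 0/1 values per ORF.
    """
    names = list(patterns)
    table = {orf: [int(patterns[n](facts)) for n in names]
             for orf, facts in orf_facts.items()}
    return names, table


if __name__ == "__main__":
    orf_facts = {
        "YAL001C": {("hom", "P1", "close"), ("sq_len", "P1", "short"),
                    ("classification", "P1", "ecoli")},
        "YAL002W": {("hom", "P2", "close"), ("sq_len", "P2", "long"),
                    ("classification", "P2", "ecoli")},
    }
    print(propositionalise(orf_facts, {"patt1": close_short_ecoli}))
```

Once the data is in this fixed-width form, an ordinary propositional learner can be applied to it.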
20
Dichotomic Search 1
  • As an alternative to the WARMR data-mining
    approach, we developed a frequent pattern finding
    method based on dichotomic search.
  • This approach uses domain-specific logics as
    intermediates between propositional logic and
    predicate logic.

21
Dichotomic Search 2
  • Most existing algorithms traverse the search
    space in either a top-down or a bottom-up
    fashion. We propose a new approach based on
    dichotomic search, which explores the search space
    in both directions, allowing larger steps.
  • Dichotomic search combines completeness (w.r.t.
    concepts), non-redundancy, and flexibility.
  • Ferre, S. & King, R.D. (2005). Fundamenta
    Informaticae

22
Data Mining Prediction (DMP)
[DMP flow diagram, repeated from slide 16]
23
C4.5
  • Open source decision tree algorithm
  • propositional learning
  • commonly used
  • produces interpretable rules
  • reliable
  • fast
  • accurate
  • Made modifications for
    • multiple labels
    • hierarchical labels
  • Clare & King, Bioinformatics 2002

[Example decision tree]
aa_ratio_pair_p_y
  > 0.232: metabolism
  < 0.232: strand
    w: transcription
    c: aa_rat_a
      > 6.4: transport
      < 6.4: cell fate
24
Data Mining Prediction (DMP)
[DMP flow diagram, repeated from slide 16]
25
Results
  • Many rules from each data type
  • Rules at each level of hierarchy
  • Some classes are much easier to predict than
    others (for example "protein synthesis" at
    71-93%, "energy" at 20-47%)
  • Good levels of accuracy on held-out test data
  • Many predictions for ORFs of unknown function
    (some function at some level is predicted for 96%
    of the ORFs of unknown function)
  • Some rules explainable by biology -> scientific
    knowledge discovery
  • Clare & King (2003) Bioinformatics suppl. 2,
    42-49

26
Accuracy Table
27
Expression Data Rule
If in the micro-array experiment (sorbitol
incubation) the ORF expression is > -0.25, and in
the micro-array experiment (nitrogen depletion)
the ORF expression is < -1.29, and in the
micro-array experiment (YPD stationary phase) the
ORF expression is > -1.06, then the function of
this ORF is "pheromone response, mating type
determination, sex-specific proteins".
Accuracy on training data: 11/12 (92%)
Accuracy on the test data: 3/4 (75%)
21 predictions made
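Because the rule is a conjunction of threshold tests, it translates directly into code. The Python sketch below encodes the three thresholds from the slide and shows how a count such as 11/12 would be obtained; the dictionary keys and the example expression values are made up for illustration.

```python
def pheromone_rule(expr):
    """True if the ORF's expression values satisfy all three conditions."""
    return (expr["sorbitol incubation"] > -0.25
            and expr["nitrogen depletion"] < -1.29
            and expr["YPD stationary phase"] > -1.06)


def rule_accuracy(rule, examples):
    """examples: list of (expression dict, is_in_class).

    Returns (correct, fired): how often the rule fired and how often
    the ORF it fired on really belonged to the class.
    """
    fired = [label for expr, label in examples if rule(expr)]
    return sum(fired), len(fired)


if __name__ == "__main__":
    # Hypothetical expression values, not taken from the talk's data.
    examples = [
        ({"sorbitol incubation": 0.10, "nitrogen depletion": -1.50,
          "YPD stationary phase": 0.00}, True),
        ({"sorbitol incubation": -0.50, "nitrogen depletion": -1.50,
          "YPD stationary phase": 0.00}, True),
        ({"sorbitol incubation": 0.00, "nitrogen depletion": -2.00,
          "YPD stationary phase": -1.00}, False),
    ]
    print(rule_accuracy(pheromone_rule, examples))
```

Accuracy here is measured only over the ORFs on which the rule fires, which is how figures such as 3/4 on the test data should be read.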
28
Structure Rule
  • 80% accurate on test data
  • Most matching ORFs belong to the Mitochondrial
    Carrier Family
  • These have 6 long transmembrane alpha-helices of
    about 20-30 amino acids
  • Why do we notice alpha-helices of length 10-14?

29
Alignment
YJL133W  -------NEYNPLIHCLC----GSISGSTCAAITTPLDCIKTVLQIRG------------  251
YKR052C  -------NSYNPLIHCLC----GGISGATCAALTTPLDCIKTVLQVRG------------  241
YIL006W  ----NNTNSINLQRLIMA----SSVSKMIASAVTYPHEILRTRMQLKS------------  310
YBR104W  ----LTRNEIPPWKLCLF----GAFSGTMLWLTVYPLDVVKSIIQNDD------------  271
YGR096W  ----KTTAAHKKWELATLNHSAGTIGGVIAKIITFPLETIRRRMQFMNSKHLEK------  250
YJR095W  -----QMDVLPSWETSCI----GLISGAIGPFSNAPLDTIKTRLQKDK------------  246
YKL120W  -----LMKDGPALHLTAS-----TISGLGVAVVMNPWDVILTRIYNQK------------  261
YLR348C  -----FDASKNYTHLTAS-----LLAGLVATTVCSPADVMKTRIMNGS------------  239
YMR166C  ----DGRDGELSIPNEILT---GACAGGLAGIITTPMDVVKTRVQTQQPPSQSNKSYSVT  300
YDL198C  ------DYSQATWSQNFIS---SIVGACSSLIVSAPLDVIKTRIQNRN------------  242
YGR257C  ----RFASKDANWVHFINSFASGCISGMIAAICTHPFDVGKTRWQISMMN----------  302
YDL119C  FIHYNPEGGFTTYTSTTVNTTSAVLSASLATTVTAPFDTIKTRMQLEP------------  255

YJL133W  -SQTVSLEIMRKADTFSKAASAIYQVYGWKGFWRGWKPRIVANMPATAISWTAYECAKHF  310
YKR052C  -SETVSIEIMKDANTFGRASRAILEVHGWKGFWRGLKPRIVANIPATAISWTAYECAKHF  300
YIL006W  -DIPDSIQRR-----LFPLIKATYAQEGLKGFYSGFTTNLVRTIPASAITLVSFEYFRNR  364
YBR104W  -LRKPKYKNS-----ISYVAKTIYAKEGIRAFFKGFGPTMVRSAPVNGATFLTFELVMRF  325
YGR096W  FSRHSSVYGSYKGYGFARIGLQILKQEGVSSLYRGILVALSKTIPTTFVSFWGYETAIHY  310
YJR095W  ---SISLEKQSGMKKIITIGAQLLKEEGFRALYKGITPRVMRVAPGQAVTFTVYEYVREH  303
YKL120W  ----GDLYKG-----PIDCLVKTVRIEGVTALYKGFAAQVFRIAPHTIMCLTFMEQTMKL  312
YLR348C  ----GDHQP------ALKILADAVRKEGPSFMFRGWLPSFTRLGPFTMLIFFAIEQLKKH  289
YMR166C  HPHVTNGRPAALSNSISLSLRTVYQSEGVLGFFSGVGPRFVWTSVQSSIMLLLYQMTLRG  360
YDL198C  ---FDNPESG------LRIVKNTLKNEGVTAFFKGLTPKLLTTGPKLVFSFALAQSLIPR  293
YGR257C  ---NSDPKGGNRSRNMFKFLETIWRTEGLAALYTGLAARVIKIRPSCAIMISSYEISKKV  359
YDL119C  ----SKFTNS------FNTFTSIVKNENVLKLFSGLSMRLARKAFSAGIAWGIYEELVKR  305
30
Alignment
YJL133W  -------cccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------  251
YKR052C  -------cccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------  241
YIL006W  ----ccccccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------  310
YBR104W  ----ccccccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------  271
YGR096W  ----cccccccccccccbaaaaaaaaaaaaaaacccaaaaaaaaaacccccccc------  250
YJR095W  -----cccccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaccc------------  246
YKL120W  -----ccccccaaaaaaa-----aaaaaaaaaacccaaaaaaaaaacc------------  261
YLR348C  -----ccccccaaaaaaa-----aaaaaaaaaacccaaaaaaaaaacc------------  239
YMR166C  ----cccccccccaaaaaa---aaaaaaaaaaacccaaaaaaaaaacccccccccccccc  300
YDL198C  ------cccccccaaaaaa---aaaaaaaaaaacccaaaaaaaaaacc------------  242
YGR257C  ----ccccccccccccaaaaaaaaaaaaaaaaacccaaaaaaaaaacccc----------  302
YDL119C  ccccccccccccccaaaaaaaaaaaaaaaaaaacccaaaaaaaaaacc------------  255

YJL133W  -ccccccccccccccaaaaaaaaaaaccccaaaaccaaaaaaacaaaaaaaaaaaaaaaa  310
YKR052C  -ccccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa  300
YIL006W  -ccccccccc-----aaaaaaaaaaaccccaaacccaaaaaaaccaaaaaaaaaaaaaaa  364
YBR104W  -ccccccccc-----aaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa  325
YGR096W  cccccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa  310
YJR095W  ---ccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa  303
YKL120W  ----cccccc-----aaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa  312
YLR348C  ----ccccc------aaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa  289
YMR166C  cccccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa  360
YDL198C  ---cccccca------aaaaaaaaaacccaaaaacccaaaaaaaaaaaaaaaaaaaaaaa  293
YGR257C  ---ccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa  359
YDL119C  ----ccccca------aaaaaaaaaacccaaaaacccaaaaaaccaaaaaaaaaaaaaaa  305
31
Homology rule
  • This rule is 100% accurate on test data
  • Almost all matching ORFs are from the 20S
    proteasome subunit for degradation of proteins
  • These subunits exist in archaea and eukaryotes,
    but only in one specific branch of bacteria
    (actinomycetes).

33
Application of DMP to Bacterial Genomes
  • Successful for both M. tuberculosis and E. coli.
  • Of the ORFs with no assigned function, >40% were
    predicted to have a function at one or more
    levels of the class hierarchy.
  • It was found that many of the predictive rules
    were more general than possible using sequence
    homology.
  • References
  • King et al. (2000) KDD 2000
  • King et al. (2000) Yeast (Comparative and
    Functional Genomics)
  • King et al. (2001) Bioinformatics

34
Example Rule (level 2 E. coli)
If the ORF is not predicted to have a b-strand of
length ? 3 and a homologous protein from class
Chytridiomycetes was found, then its functional
class is "Cell processes, Transport/binding
proteins".
12/13 (86%) correct on the test set. The
probability of this result occurring by chance is
estimated at 4x10^-7. 24 ORFs of unknown
function are predicted by the rule.
16 ORFs now with putative or confirmed function:
93.8% accurate predictions
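The talk does not say how the chance probability was estimated. One standard way to reason about such a figure is a binomial tail: if a rule assigned the class at random with prior probability p, how likely is it to be right at least k times out of n? The Python sketch below shows that calculation; the value of p is a hypothetical parameter, not a number from the talk.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # 12 or more correct out of 13, for a few assumed class priors
    for p in (0.05, 0.1, 0.2):
        print(p, binomial_tail(13, 12, p))
```

The smaller the class prior, the less likely a high hit rate is to arise by chance, which is why a rule for a rare class can be significant even with only 13 test examples.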
35
Experimental Confirmation
  • The original bacterial ORF predictions were made
    over three years ago.
  • In the intervening time many more ORFs have been
    sequenced, making traditional homologous
    prediction methods more accurate and sensitive,
    and the function of some ORFs has been
    determined by wet biology.
  • The E. coli genome has been re-annotated by
    Monica Riley's group.

36
Wet Biology Confirmation
  • A number of predictions have been confirmed or
    falsified by new wet experimental data.
  • This new data is biased towards hard classes.
    Despite this, the results are still good.
  • Level 2: 23 predictions - 47.8% accuracy
  • Level 3: 23 predictions - 43.4% accuracy

This is very much better than random, as there are
many classes.
37
Confirmation of Wet Predictions
38
Extension to Arabidopsis Genome
  • Collaborative project with the Institute of
    Grassland and Environmental Research and the
    University of Nottingham.
  • Large increase in data: 6,000 (yeast) -> 25,000
    ORFs.
  • Large amount of micro-array data from the
    Nottingham Arabidopsis Stock Centre.
  • The increase in data is a challenge to our
    machine learning algorithms (100s of MBs).
  • Clare, A., Karwath, A., Ougham, H. and
    King, R.D. (2006) Functional Bioinformatics for
    Arabidopsis thaliana. Bioinformatics 22,
    1130-1136

39
Results
  • Accuracy comparable to yeast and bacteria
  • Large fraction of genes of currently unknown
    function are predicted.
  • Some rules could be interpreted in terms of known
    biology
  • Clare, A., Karwath, A., Ougham, H. and King, R.D.
    (2006) Functional Bioinformatics for Arabidopsis
    thaliana. Bioinformatics 22, 1130-1136

40
Gibberellin Biosynthesis Prediction
  • Gibberellin is an important plant hormone.
  • Chosen because of interesting phenotypes, often
    extreme size.
  • Insertion of a promoter to overproduce gene
    product.
  • Result
    • 2 days earlier flowering
    • Average leaf number and weight increased at 21
      days.
  • This phenotype is consistent with prediction.

42
Leaf number increases more rapidly in the mutant
(yellow bars) than in wildtype Landsberg erecta
(blue bars)
43
Paclobutrazol (P) (inhibitor of gibberellin)
abolishes the difference between mutant (M) and
wildtype (L). C: control
44
Availability
All predictions available at
http://www.genepredictions.org
All rules and data available at
http://www.aber.ac.uk/compsci/Research/bio/dss/
45
ILP 2005 Challenge 1
  • Yeast function prediction data used as a
    community challenge: http://www.protein-logic.com/
  • The intention of the challenge was to provide a
    real-world data set to test how far we have
    progressed in the field of ILP and
    multi-relational data mining. The questions we
    wanted to answer were: Are the tools up to the
    job? Do they scale? Do they handle noisy, sparse
    and complex data?

46
ILP 2005 Challenge 2
  • A. J. Knobbe, E. K. Y. Ho, R. Malik. ILP
    Challenge 2005: The Safarii MRDM environment.
  • C. Perlich. Approaching the ILP 2005 challenge:
    Class-Conditional Bayesian Propositionalization
    for Genetic Classification.
  • J. Struyf, C. Vens, T. Croonenborghs,
    S. Dzeroski, H. Blockeel. Applying Predictive
    Clustering Trees to the Inductive Logic
    Programming 2005 Challenge Data.
  • F. Riguzzi. A Simple Approach to a Multi-Label
    Classification Problem.

47
Propositional Approach
  • Zafer Barutcuoglu, Robert E. Schapire and Olga G.
    Troyanskaya. Hierarchical multi-label prediction
    of gene function. Bioinformatics (in press)
  • Hierarchy of SVMs.
  • Uses a Bayesian net to combine predictions.

48
Conclusions
  • Data mining and machine learning are powerful
    tools for functional genomics.
  • The DMP method can be successfully applied to
    different genomes (bacterial, yeast, Arabidopsis)
    to predict gene functional class.
  • Micro-array data is a useful component in DMP.
  • Biological insight can be extracted from DMP
    rules.
  • The structure of gene prediction problems makes
    them an exciting test bed for machine learning
    methods.

49
Acknowledgements
  • Amanda Clare, Aberystwyth
  • Andreas Karwath, Freiburg (Aberystwyth)
  • Luc Dehaspe, PharmaDM
  • Helen Ougham, IGER
  • BBSRC

50
The Need for Logic to Represent Scientific
Knowledge
  • Logic is the best understood way to represent
    knowledge.
  • Traditional statistics, machine learning, and
    data mining are based on propositional logic.
  • For some problems we require a richer description
    language, i.e. first-order predicate calculus.
  • Using logic programming (predicate calculus) we
    can incorporate deduction, abduction, and
    induction.

51
Inductive Logic Programming
  • Inductive Logic Programming (ILP) uses logic
    programs (first-order predicate calculus) to
    describe examples, theories, and
    background knowledge.
  • For certain types of problem ILP is a powerful
    data analysis technique, giving more accurate and
    more comprehensible results than conventional
    methods.
  • Has been successfully applied to a number of
    biological/chemical problems.

52
ILP for Science
  • The key advantage of ILP for scientific
    applications is that it allows the application of
    compact relational representations that are
    natural for scientists to use. This allows
    rules that are understandable in the domain to be
    formed automatically.
  • This advantage comes at a computational cost.
    However, non-technical reasons are probably the
    greatest barrier to adoption of ILP. For
    example, it is very difficult to explain the
    benefits of ILP to domain experts.

53
Prediction of Lethality
  • Instead of using microarray data to predict
    the functional class of a gene, we have been using
    the same approach to predict whether a gene
    knock-out will be lethal (grown in a rich medium).

If the function of the ORF is not cell cycle, and
the function of the ORF is rRNA transcription, and
in the micro-array experiment (cell cycle) the ORF
expression is > -0.79, then the knockout is lethal.
Example rule: test accuracy 82% (default 21%).
54
Summary Results
  • Using voting (2 or more rules agree on a
    prediction)
    • Level 2: 128 ORFs predicted - 87.5% accuracy
    • Level 3: 23 ORFs predicted - 91.3% accuracy
  • All predictions
    • Level 2: 335 ORFs predicted - 64.5% accuracy
    • Level 3: 204 ORFs predicted - 44.6% accuracy
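The voting scheme can be sketched as follows in Python: a prediction is kept only when at least two rules agree on the same class for an ORF. The data layout, class labels and function name are illustrative assumptions, and ties are broken by whichever class `Counter` returns first.

```python
from collections import Counter


def vote(rule_predictions, min_votes=2):
    """rule_predictions: {orf: [class predicted by each rule that fired]}.

    Returns {orf: class} for ORFs where some class got >= min_votes votes.
    """
    voted = {}
    for orf, preds in rule_predictions.items():
        if not preds:
            continue
        cls, n = Counter(preds).most_common(1)[0]
        if n >= min_votes:
            voted[orf] = cls
    return voted


if __name__ == "__main__":
    # Hypothetical rule outputs for three ORFs
    preds = {
        "YAL001C": ["1,1", "1,1", "4,0"],
        "YAL002W": ["2,0"],
        "YAL003W": [],
    }
    print(vote(preds))
    # 87.5% of 128 voted level-2 predictions corresponds to 112 correct
    print(round(0.875 * 128))
```

Requiring agreement trades coverage for accuracy: fewer ORFs receive a prediction (128 rather than 335 at level 2), but a larger share of those predictions is correct.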