Title: A Systematic approach to the Large-Scale Analysis of Genotype-Phenotype correlations
1A Systematic approach to the Large-Scale Analysis
of Genotype-Phenotype correlations
Supervisee: Paul Fisher
Supervisors: Dr. Robert Stevens, Prof. Andy Brass
Advisor: Dr. Steve Pettifer
2PhD Hypothesis
- Utilising the capabilities of workflows and a
pathway-driven approach, we are able to provide a
methodology that is more
- - systematic
- - efficient
- - scalable
- - un-biased
- - unambiguous
- The benefit will be that new biology results
will be derived, increasing community knowledge
of genotype and phenotype interactions.
3Summary of 1st / (early) 2nd Year
- Identified issues with current methods
- Developed own methodologies
- Constructed methods using workflows
- Tested methods through use cases
- Obtained Biological results
- Published results in Biology and Bioinformatics
journals
4QTL mapping study
Microarray gene expression study
Statistical analysis
Identify genes in QTL regions
Identify differentially expressed genes
Genomic Resource
Annotate genes with biological pathways
Annotate genes with biological pathways
Pathway Resource
Select common biological pathways
Hypothesis generation and verification
Wet Lab
Literature
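The pathway-selection step in this diagram can be sketched in a few lines. This is only an illustration under stated assumptions: the gene names, the gene-to-pathway annotations and the function names are hypothetical, and the real workflows query genomic and pathway resources rather than an in-memory dictionary.

```python
def pathways_for(genes, gene_to_pathways):
    """Collect every pathway annotated to at least one gene in `genes`."""
    found = set()
    for gene in genes:
        found.update(gene_to_pathways.get(gene, ()))
    return found


def common_pathways(qtl_genes, expressed_genes, gene_to_pathways):
    """Pathways reached from both the QTL branch and the microarray branch."""
    from_qtl = pathways_for(qtl_genes, gene_to_pathways)
    from_expression = pathways_for(expressed_genes, gene_to_pathways)
    return from_qtl & from_expression


# Hypothetical annotations, for illustration only.
gene_to_pathways = {
    "GeneA": {"Pathway A"},
    "GeneB": {"Pathway A", "Pathway B"},
    "GeneC": {"Pathway C"},
}
qtl_genes = {"GeneA", "GeneB"}        # genes located in the QTL region
expressed_genes = {"GeneB", "GeneC"}  # differentially expressed genes

selected = common_pathways(qtl_genes, expressed_genes, gene_to_pathways)
print(sorted(selected))
```

Annotating each gene list separately and intersecting at the pathway level keeps a pathway even when different genes support it on each branch.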
5Scientific Literature: A Knowledge resource
Pathway information
Literature database
Implicit interactions
Phenotype information
6What Does the Text Hold?
Protein Info
Related Proteins
Protein-Protein Interactions
Pathways
Biological processes
7What Next ?
Biological processes
Generate a Profile for Pathway / Phenotype
Apoptosis Cell Death Stress response ..
MeSH vocabulary
8Two Profiles: Phenotype and Pathway
Find common terms
- Phenotype Terms
- Apoptosis
- Cholesterol
- Diabetes
- Jak-Stat
- Ribosome
- Cell Adhesion Molecules
- Pathway terms
- Apoptosis
- Cholesterol
- Diabetes
- Cell Death
- JNK pathway
High chance pathway is linked to phenotype
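Finding the common terms between the two profiles amounts to a set intersection. The sketch below uses the example terms from this slide. The lower-casing step and the Jaccard overlap score are assumptions added for illustration; the slides do not say how overlap is scored.

```python
def normalise(terms):
    """Lower-case and strip terms so 'Apoptosis' and 'apoptosis' match."""
    return {term.strip().lower() for term in terms}


def common_terms(phenotype_terms, pathway_terms):
    """Terms present in both profiles."""
    return normalise(phenotype_terms) & normalise(pathway_terms)


def jaccard(phenotype_terms, pathway_terms):
    """Assumed overlap score: shared terms divided by all distinct terms."""
    a, b = normalise(phenotype_terms), normalise(pathway_terms)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


phenotype_terms = ["Apoptosis", "Cholesterol", "Diabetes",
                   "Jak-Stat", "Ribosome", "Cell Adhesion Molecules"]
pathway_terms = ["Apoptosis", "Cholesterol", "Diabetes",
                 "Cell Death", "JNK pathway"]

shared = common_terms(phenotype_terms, pathway_terms)
score = jaccard(phenotype_terms, pathway_terms)
print(sorted(shared), score)
```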
9And So
Which pathways do I investigate in the Wet Lab?
High Priority
Prioritise those that have links in the literature
Phenotype Terms
Pathway Terms
Lower Priority Pathways
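One possible way to turn this into an ordered list for the Wet Lab is to rank pathways by how many terms they share with the phenotype profile. The sketch below is an assumption-laden illustration: the pathway profiles, the `min_shared` threshold and the function name are hypothetical and are not taken from the workflows themselves.

```python
def prioritise(phenotype_profile, pathway_profiles, min_shared=1):
    """Split pathways into (high, lower) priority lists by shared term count."""
    high, lower = [], []
    for name, terms in pathway_profiles.items():
        shared = len(phenotype_profile & terms)
        if shared >= min_shared:
            high.append((name, shared))
        else:
            lower.append((name, shared))
    # Most shared terms first; ties broken alphabetically for stable output.
    high.sort(key=lambda item: (-item[1], item[0]))
    lower.sort(key=lambda item: item[0])
    return [name for name, _ in high], [name for name, _ in lower]


# Hypothetical profiles, for illustration only.
phenotype_profile = {"apoptosis", "cholesterol", "diabetes"}
pathway_profiles = {
    "Pathway A": {"apoptosis", "cell death"},
    "Pathway B": {"ribosome"},
    "Pathway C": {"apoptosis", "diabetes", "jnk pathway"},
}

high_priority, lower_priority = prioritise(phenotype_profile, pathway_profiles)
print(high_priority, lower_priority)
```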
10The Prototype Workflows
Find common terms
Get terms from abstracts
Get abstracts for pathways / phenotype
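The three prototype steps chain together as: get abstracts, extract terms, find common terms. The sketch below is a stand-in, not the prototype itself. The in-memory corpus replaces a literature database query, the small `VOCABULARY` set replaces the MeSH vocabulary, and term extraction is a naive whole-phrase match; all names are hypothetical.

```python
import re

# Hypothetical stand-in for a controlled vocabulary such as MeSH.
VOCABULARY = {"apoptosis", "cell death", "cholesterol", "diabetes"}


def get_abstracts(topic, corpus):
    """Step 1: fetch abstracts for a pathway or phenotype (here, from a dict)."""
    return corpus.get(topic, [])


def get_terms(abstracts, vocabulary=VOCABULARY):
    """Step 2: keep vocabulary terms that occur as whole phrases in any abstract."""
    found = set()
    for text in abstracts:
        lowered = text.lower()
        for term in vocabulary:
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(term)
    return found


def find_common_terms(phenotype_topic, pathway_topic, corpus):
    """Step 3: intersect the term profiles of the phenotype and the pathway."""
    phenotype_profile = get_terms(get_abstracts(phenotype_topic, corpus))
    pathway_profile = get_terms(get_abstracts(pathway_topic, corpus))
    return phenotype_profile & pathway_profile


# Hypothetical abstracts, for illustration only.
corpus = {
    "phenotype": ["Cholesterol levels and apoptosis were altered in infected mice.",
                  "Diabetes incidence was recorded."],
    "pathway": ["The pathway regulates apoptosis and cell death."],
}

common = find_common_terms("phenotype", "pathway", corpus)
print(sorted(common))
```

Simple phrase matching ignores negation and synonyms, which is one reason the quality assurance step matters for the returned results.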
11Future work (3rd Year)
- Provide a corpus of specific workflow fragments
- - Text mining
- - Quality Assurance (adding a level of
significance to returned results)
- Evaluate text mining workflows using test cases
including
- - Analysis of data obtained from Trypanosomiasis
project
- - Analysis of data for studying the infection of
Mice by Trichuris muris
- Publish findings
- - Biology journals for biological results
- - Bioinformatics journals for text mining workflow
methods
- Publish workflows within the myExperiment
project for sharing and re-use
- Compile thesis on findings over 3 years
13Questions
14Genotype to Phenotype
15Current Methods
Genotype
Phenotype
200
?
What processes to investigate?
16Phenotype
Pathway A
CHR
literature
Pathway linked to phenotype: high priority
QTL
Gene A
Pathway B
Gene B
literature
Pathway not linked to phenotype: medium priority
Gene C
Pathway C
literature
Genotype
Pathway not linked to QTL: low priority
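The three priority tiers in this diagram can be written as a small decision rule. The sketch below assumes each pathway carries two flags: whether one of its genes lies in the QTL region, and whether the literature links it to the phenotype. The flag values for Pathways A to C are read off the diagram labels as an interpretation, and the function name is hypothetical.

```python
def pathway_priority(has_qtl_gene, linked_to_phenotype):
    """Map the two evidence flags onto the tiers shown in the diagram."""
    if not has_qtl_gene:
        return "low"     # pathway not linked to the QTL
    if linked_to_phenotype:
        return "high"    # in the QTL and linked to the phenotype in literature
    return "medium"      # in the QTL but no literature link to the phenotype


# (has_qtl_gene, linked_to_phenotype): assumed reading of the diagram.
pathways = {
    "Pathway A": (True, True),
    "Pathway B": (True, False),
    "Pathway C": (False, False),
}

priorities = {name: pathway_priority(*flags) for name, flags in pathways.items()}
print(priorities)
```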
17Issues with current approaches
- Scale of analysis task is huge
- User bias and premature filtering
- Hypothesis-Driven approach to data analysis
- Constant flux of data - problems with
re-analysis of data
- Implicit methodologies (hyper-linking through web
pages)
- Error proliferation from any of the listed issues
- Solution: Automate (through workflows)
18Phenotype
Genotype
200
?
Metabolic pathways
Phenotypic response investigated using microarray
in form of expressed genes or evidence provided
through QTL mapping
Genes captured in microarray experiment and
present in QTL (Quantitative Trait Loci) region
Microarray QTL
19QTL mapping study
Microarray gene expression study
Statistical analysis
Identify genes in QTL regions
Identify differentially expressed genes
Genomic Resource
Annotate genes with biological pathways
Annotate genes with biological pathways
Pathway Resource
Workflow methods
Select common biological pathways
Hypothesis generation and verification
Wet Lab
Literature
Manual methods
20Evaluation - Results
- Identified many issues with current investigative
techniques
- A strong candidate gene was found for
Trypanosomiasis resistance
- - Daxx gene not found using manual investigation
methods
- - The gene was identified from analysis of
biological pathway information
- - Sequencing of the Daxx gene in Wet Lab showed
mutations that are believed to change the structure
of the protein
- - Mutation was published in scientific literature,
noting its effect on the binding of Daxx protein
to another protein; the other protein controls one
of the phenotypes of Trypanosomiasis resistance
- Sex-dependent biological pathways identified in
Trichuris muris infection
- - 2 year study of candidate genes did not identify
these pathways directly
- - Sex dependence identified as a possible factor in
expulsion of parasite
- - Current studies still ongoing in Life Science
department
- FOUND NEW BIOLOGY !!!!
21Taverna Workflow Workbench
http://taverna.sf.net
22Ideally What Next?
Protein Info
Protein-Protein Interactions
Fill in the blanks and verify hypotheses
Biological processes
23Link Phenotype to Pathways
Find common terms
- Phenotype Terms
- apoptosis
- Cholesterol
- Diabetes
- Jak-Stat
- Ribosome
- Cell Adhesion Molecules
apoptosis Cell Death JNK pathway
Cholesterol Cell Death JNK pathway
Simple means of linking pathways
- apoptosis
- Cholesterol
- Diabetes
- Cell Death
- JNK pathway
- Another pathway
24Publication of Results
- Published methods and biological findings
- - Nucleic Acids Research
- Presented Poster at ISMB Vienna (2007)
- - Abstract for Poster to be published in BMC
Bioinformatics Journal
- Assisted in writing book chapter on "Building
workflows that traverse the bioinformatics data
landscape", to appear in Data Mining Techniques
in Grid Computing Environments