Title: Please have a seat. Our program will commence shortly.
1Please have a seat. Our program will commence
shortly.
2Biomarker Automated Retrieval Tool
- Ronny Chan, Kim Ngo
- Earth Science Data Systems Dept.
3Bioinformatics Relationship
- Science produces massive amounts of data
- Data needs to be analyzed, stored, retrieved
- This is data mining
- We want to apply computer science to improve this
process
4Motivation
- Problems with conventional data mining
- Time consuming
- Accuracy not defined (subjective)
- No objective scientific info retrieval tool
Where are the Biomarkers?
5Cancer Biomarkers
An indicator of cancerous growth.
6Proposed Solution
- Create a program that allows people to quickly
scan literature for the most relevant
keywords/biomarkers
[Figure: B.A.R.T. with example biomarkers BAG-1, ERBB2, HER-2, EP-CAM, HPEBP4]
7Significance
- Why is the project needed?
- More efficient research
- Save time
[Figure labels: B.A.R.T., conventional, enhanced]
8Goals
- Make biomarker/keyword searches more efficient
- Learn Java
- Learn SQL
9Approach
- Write a program
- Read in articles
- Use part of the Vector Space Model algorithm to rank terms
- Output relevant terms in statistical rankings
[Figure: BRCA1 vs. they]
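The approach above can be sketched in Java roughly as follows. This is a minimal sketch, not the B.A.R.T. source: the class name `TermRankerSketch`, the tokenizer, and the choice to score a term by summing tf * log10(N / df) over all articles are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and the summed tf-idf score are assumptions,
// not the B.A.R.T. implementation.
final class TermRankerSketch {

    // Split an article into lowercase tokens; hyphens are kept so that
    // names such as "HER-2" stay in one piece.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9-]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Score each term by the sum over articles of tf * log10(N / df),
    // then sort by score (highest first), breaking ties alphabetically.
    static List<Map.Entry<String, Double>> rank(List<String> articles) {
        int n = articles.size();
        Map<String, Integer> df = new HashMap<>();
        List<Map<String, Integer>> tfs = new ArrayList<>();
        for (String article : articles) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : tokenize(article)) tf.merge(term, 1, Integer::sum);
            for (String term : tf.keySet()) df.merge(term, 1, Integer::sum);
            tfs.add(tf);
        }
        Map<String, Double> score = new HashMap<>();
        for (Map<String, Integer> tf : tfs) {
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log10((double) n / df.get(e.getKey()));
                score.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(score.entrySet());
        ranked.sort((a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }
}
```

Under this scoring, a word that occurs in every article (such as "they") gets a score of 0, while a term concentrated in a few articles rises toward the top of the list.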
10Vector Space Model
- Information Retrieval System
- Introduced by Gerard Salton in the 1960s.
- Used widely in different search engines
11Algorithm for B.A.R.T.
Keywords Input
PubMed Query Agent
Keyword Parser
Content Analyzer
Content Ranker
Data Store
Data Retrieval and Output
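One way the stages listed above might be wired together is sketched below. Only the stage names come from the slide; the role given to each stage is a guess from its name, the PubMed Query Agent is replaced by a caller-supplied stub, the Data Store is an in-memory map rather than SQL, and the ranker uses raw counts to keep the wiring visible. All identifiers (`BartPipelineSketch`, `parse`, `analyze`, `rank`, `run`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical wiring of the stages; every name and role here is an assumption.
final class BartPipelineSketch {

    // Stands in for the PubMed Query Agent: maps keywords to article texts.
    private final Function<String, List<String>> queryAgent;

    // Stands in for the Data Store: ranked terms kept in insertion order.
    private final Map<String, Integer> dataStore = new LinkedHashMap<>();

    BartPipelineSketch(Function<String, List<String>> queryAgent) {
        this.queryAgent = queryAgent;
    }

    // Keyword Parser (assumed role): split text into lowercase tokens.
    static List<String> parse(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9-]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Content Analyzer (assumed role): count each token over all articles.
    static Map<String, Integer> analyze(List<String> articles) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String article : articles) {
            for (String token : parse(article)) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Content Ranker (assumed role): order terms by count, highest first.
    // List.sort is stable, so tied terms keep their first-seen order.
    static List<Map.Entry<String, Integer>> rank(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return ranked;
    }

    // Keywords Input through Data Retrieval and Output, in slide order.
    List<String> run(String keywords, int topN) {
        List<String> articles = queryAgent.apply(keywords);
        dataStore.clear();
        for (Map.Entry<String, Integer> e : rank(analyze(articles))) {
            dataStore.put(e.getKey(), e.getValue());
        }
        List<String> output = new ArrayList<>();
        for (String term : dataStore.keySet()) {
            if (output.size() == topN) break;
            output.add(term);
        }
        return output;
    }
}
```

Passing the query agent in as a function keeps the sketch runnable without network access; a real agent would fetch abstracts from PubMed instead.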
12Results
- DCIS
- CU-TP3982
- ERBB2
- HER-2
- HPEBP4
- BAG-1
- EP-CAM
- 99M
13Lessons and Difficulties
- Deciding on algorithm choice
- Ease of implementation and effectiveness
- Limited knowledge and experience
- Java, SQL
- Initial implementation is slow
5 ARTICLES 160 sec
20 ARTICLES 1904 sec
100 ARTICLES 838 years
UPDATE AUGUST 18, 2004: 100 ARTICLES 819 years
14Future work
- Apply different term weight functions to make results more robust
- Optimize the program for speed
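To illustrate what "different term weight functions" could mean, here is a small Java sketch of a few textbook term-frequency weightings that can be multiplied by the same IDF. The slides do not say which functions were planned, so this selection and the class name `TermWeightSketch` are assumptions.

```java
// Hypothetical sketch of common term-frequency weightings; which of these
// (if any) B.A.R.T. would adopt is not stated in the slides.
final class TermWeightSketch {

    // Raw count: the weighting used in the deck's VSM example.
    static double rawTf(int tf) {
        return tf;
    }

    // Log-scaled: dampens the effect of a term repeated many times.
    static double logTf(int tf) {
        return tf > 0 ? 1.0 + Math.log10(tf) : 0.0;
    }

    // Augmented: scales by the most frequent term in the same document.
    // Absent terms are given 0 here, which is a choice made for this sketch.
    static double augmentedTf(int tf, int maxTf) {
        return tf > 0 ? 0.5 + 0.5 * tf / maxTf : 0.0;
    }

    // Binary: only records whether the term occurs at all.
    static double binaryTf(int tf) {
        return tf > 0 ? 1.0 : 0.0;
    }

    // Inverse document frequency over n documents, df of which contain the term.
    static double idf(int n, int df) {
        return Math.log10((double) n / df);
    }
}
```

For a term that occurs twice in one document and appears in 1 of 3 documents, `rawTf(2) * idf(3, 1)` is about 0.954, while `logTf(2) * idf(3, 1)` is about 0.621, so the log-scaled variant gives less credit for repetition.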
15Citations
- http://ir.iit.edu/dagr/cs529/files/handouts/03VectorSpaceImplementation-6per.PDF
- http://classes.engr.oregonstate.edu/eecs/spring2004/cs419/10
- http://www.cs.ust.hk/dlee/Papers/ir/ieee-sw-rank.pdf
- http://hartford.lti.cs.cmu.edu/classes/95-778/Lectures/04-BooleanVectorSpaceB.pdf
- Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. 69(3), 89-95 (2001).
16Acknowledgements
National Science Foundation (NSF)
National Institutes of Health (NIH)
Earth Science Data Systems, JPL: Tina Xiao, Paul Ramirez, Chris Mattmann, Roshanak Roshandel, Sean Hardman
Southern California Bioinformatics Summer
Institute (So Cal BSI)
SoCalBSI Professors Jacqueline Heras
ALL SoCalBSI Colleagues
17VSM Example
IDF = log10(N / DF), with N = 3 documents

ID  TERM       DF  IDF
1   the        3   0
2   stage      2   .176
3   level      1   .477
4   sighting   1   .477
5   cell       1   .477
6   malignant  2   .176
7   in         3   0
8   of         3   0
9   breast     1   .477
10  detection  2   .176
11  cancer     2   .176

Q:  malignant breast cancer
D1: detection of malignant level in the cell
D2: sighting of breast stage in the breast cancer
D3: detection of malignant stage in the cancer

Each entry below is tf(idf); 0 means the term does not occur.

doc  the   stage    level    sighting  cell     malignant  in    of    breast   detection  cancer
D1   1(0)  0        1(.477)  0         1(.477)  1(.176)    1(0)  1(0)  0        1(.176)    0
D2   1(0)  1(.176)  0        1(.477)   0        0          1(0)  1(0)  2(.477)  0          1(.176)
D3   1(0)  1(.176)  0        0         0        1(.176)    1(0)  1(0)  0        1(.176)    1(.176)
Q    0     0        0        0         0        1(.176)    0     0     1(.477)  0          1(.176)
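The example can be checked with a short Java sketch. It recomputes DF and IDF = log10(N / DF) from the three documents, weights each term by tf * idf, and compares the query with each document by cosine similarity. Cosine similarity is the usual Vector Space Model measure, but the slides here do not state which measure B.A.R.T. uses, so that step and all names (`VsmExampleSketch`, `weights`, `cosine`) are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch reproducing the slide's numbers; the cosine step is assumed.
final class VsmExampleSketch {

    // idf = log10(N / df), which gives the 0, .176 and .477 seen in the example.
    static double idf(int n, int df) {
        return Math.log10((double) n / df);
    }

    // Term counts for one whitespace-separated text, lowercased.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : text.toLowerCase().split("\\s+")) tf.merge(term, 1, Integer::sum);
        return tf;
    }

    // Number of documents that contain each term.
    static Map<String, Integer> documentFrequencies(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            for (String term : termCounts(doc).keySet()) df.merge(term, 1, Integer::sum);
        }
        return df;
    }

    // tf * idf weights; terms that occur in no document are skipped.
    static Map<String, Double> weights(String text, Map<String, Integer> df, int n) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : termCounts(text).entrySet()) {
            Integer d = df.get(e.getKey());
            if (d != null) w.put(e.getKey(), e.getValue() * idf(n, d));
        }
        return w;
    }

    // Cosine of the angle between two sparse weight vectors (0 if either is all zeros).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "detection of malignant level in the cell",
            "sighting of breast stage in the breast cancer",
            "detection of malignant stage in the cancer");
        Map<String, Integer> df = documentFrequencies(docs);
        Map<String, Double> q = weights("malignant breast cancer", df, docs.size());
        for (int i = 0; i < docs.size(); i++) {
            double sim = cosine(q, weights(docs.get(i), df, docs.size()));
            System.out.println("D" + (i + 1) + " " + sim);
        }
    }
}
```

With these weights the query is closest to D2 (cosine about 0.82), then D3 (about 0.33), then D1 (about 0.08), mainly because "breast" is rare in the collection and occurs twice in D2.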
18Example Continued
Keyword tf idf