Please have a seat. Our program will commence shortly.

Provided by: ron1167

1
Please have a seat. Our program will commence
shortly.
2
Biomarker Automated Retrieval Tool
  • Ronny Chan, Kim Ngo
  • Earth Science Data Systems Dept.

3
Bioinformatics Relationship
  • Science produces massive amounts of data
  • Data needs to be analyzed, stored, retrieved
  • → This is data mining
  • We want to apply computer science to improve this
    process

4
Motivation
  • Problems with conventional data mining:
  • Time-consuming
  • Accuracy is not defined (subjective)
  • No objective tool for retrieving scientific information

Where are the Biomarkers?
5
Cancer Biomarkers
An indicator of cancerous growth.
6
Proposed Solution
  • Create a program that allows people to quickly
    scan literature for the most relevant
    keywords/biomarkers

[Figure: B.A.R.T. with example biomarkers BAG-1, ERBB2, HER-2, EP-CAM, HPEBP4]
7
Significance
  • Why is the project needed?
  • More efficient research
  • Save time

[Figure labels: B.A.R.T., conventional, enhanced]
8
Goals
  • Make biomarker/keyword searches more efficient
  • Learn Java
  • Learn SQL

9
Approach
  • Write a program
  • Read in articles
  • Use part of the Vector Space Model algorithm to rank
    terms
  • Output relevant terms in statistical rankings
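
As a minimal sketch of the ranking step listed above, the following snippet scores each term by its summed tf × idf across a set of articles. This weighting is an assumption: the slides say only that part of the Vector Space Model is used. Python is used for brevity although the project itself was written in Java. The names `tokenize` and `rank_terms` and the sample sentences are hypothetical, made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into tokens, keeping hyphenated names such as 'her-2'."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def rank_terms(articles):
    """Return (term, score) pairs sorted by summed tf * idf, highest score first."""
    token_lists = [tokenize(article) for article in articles]
    n_docs = len(token_lists)

    # Document frequency: number of articles that contain each term.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))

    # Sum tf * log10(N / df) over all articles.
    scores = Counter()
    for tokens in token_lists:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * math.log10(n_docs / df[term])

    # Ties are broken alphabetically so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sentences standing in for real articles.
articles = [
    "they measured BRCA1 expression in the tumor",
    "they found BRCA1 mutations in the BRCA1 gene",
    "they studied the cell in the lab",
]
ranking = rank_terms(articles)
for term, score in ranking[:3]:
    print(f"{term}\t{score:.3f}")
```

A word such as "they" appears in every article, so its idf and therefore its score is zero, while a term that recurs in only some articles rises toward the top of the ranking.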

[Figure: BRCA1 vs. they]
10
Vector Space Model
  • Information Retrieval System
  • Introduced by Gerard Salton in the 1960s.
  • Used widely in different search engines

11
Algorithm for B.A.R.T.
  • Keywords Input
  • PubMed Query Agent
  • Keyword Parser
  • Content Analyzer
  • Content Ranker
  • Data Store
  • Data Retrieval and Output
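
The stages named above could be wired together roughly as follows. This is a sketch under several assumptions, not the project's implementation: the stages are chained in the order listed, the PubMed Query Agent is replaced by a stub that returns made-up text without any network call, the ranker is a plain document-frequency count rather than a Vector Space Model weight, and the data store is an in-memory SQLite table. All function names and the `term_rank` table are hypothetical.

```python
import sqlite3
from collections import Counter


def query_agent(keywords):
    """Stand-in for the PubMed Query Agent: returns canned, made-up abstracts."""
    corpus = {
        "breast cancer": [
            "ERBB2 amplification was detected in breast tumors",
            "HER-2 and ERBB2 levels were compared across patients",
        ]
    }
    return corpus.get(keywords, [])


def keyword_parser(text):
    """Split an abstract into lowercase tokens."""
    return text.lower().split()


def content_analyzer(abstracts):
    """Count how many abstracts mention each token."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(set(keyword_parser(abstract)))
    return counts


def content_ranker(counts):
    """Order tokens by count, highest first; ties are broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def store(conn, keywords, ranked):
    """Persist the ranked terms for one query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term_rank (query TEXT, term TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO term_rank VALUES (?, ?, ?)",
        [(keywords, term, score) for term, score in ranked],
    )
    conn.commit()


def retrieve(conn, keywords, limit=3):
    """Read back the top-scoring terms for one query."""
    rows = conn.execute(
        "SELECT term, score FROM term_rank WHERE query = ? "
        "ORDER BY score DESC, term LIMIT ?",
        (keywords, limit),
    )
    return rows.fetchall()


def run_pipeline(keywords):
    """Run every stage in sequence against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    abstracts = query_agent(keywords)
    ranked = content_ranker(content_analyzer(abstracts))
    store(conn, keywords, ranked)
    return retrieve(conn, keywords)


top_terms = run_pipeline("breast cancer")
print(top_terms)
```

Keeping each stage as a separate function means the placeholder ranker can be swapped for a weighted one without touching the query or storage code.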
12
Results
  • DCIS
  • CU-TP3982
  • ERBB2
  • HER-2
  • HPEBP4
  • BAG-1
  • EP-CAM
  • 99M

13
Lessons and Difficulties
  • Deciding on an algorithm: ease of implementation and effectiveness
  • Limited knowledge and experience (Java, SQL)
  • Initial implementation is slow

  • 5 articles: 160 sec
  • 20 articles: 1904 sec
  • 100 articles: 838 years
  • Update (August 18, 2004): 100 articles, 819 years
14
Future work
  • Apply different term weight functions to make
    results more robust
  • Optimize the program for speed
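
To illustrate what swapping term weight functions might look like, here is a short sketch of three common term-frequency variants (raw, log-scaled, and augmented) combined with the same log10 idf. The slides do not say which functions were under consideration, so this selection is an assumption, and the names `raw_tf`, `log_tf`, `augmented_tf`, and `weight` are hypothetical.

```python
import math


def raw_tf(tf, max_tf):
    """Plain term count."""
    return float(tf)


def log_tf(tf, max_tf):
    """Log-scaled count: dampens the effect of very frequent terms."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def augmented_tf(tf, max_tf):
    """Count normalized by the most frequent term in the document, scaled into [0.5, 1]."""
    return 0.5 + 0.5 * tf / max_tf if tf > 0 else 0.0


def weight(tf, max_tf, df, n_docs, tf_fn=raw_tf):
    """Term weight = chosen tf variant * log10(N / df)."""
    return tf_fn(tf, max_tf) * math.log10(n_docs / df)


# A term occurring twice in a document (where 2 is the highest count),
# and found in 1 of 3 documents overall.
for fn in (raw_tf, log_tf, augmented_tf):
    print(fn.__name__, round(weight(2, 2, 1, 3, tf_fn=fn), 3))
```

Passing the variant as a parameter keeps the ranking code unchanged while different weightings are compared.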

15
Citations
  • http://ir.iit.edu/dagr/cs529/files/handouts/03VectorSpaceImplementation-6per.PDF
  • http://classes.engr.oregonstate.edu/eecs/spring2004/cs419/10
  • http://www.cs.ust.hk/dlee/Papers/ir/ieee-sw-rank.pdf
  • http://hartford.lti.cs.cmu.edu/classes/95-778/Lectures/04-BooleanVectorSpaceB.pdf
  • Biomarkers Definitions Working Group. Biomarkers and surrogate
    endpoints: preferred definitions and conceptual framework. Clin.
    Pharmacol. Ther. 69(3), 89-95 (2001).

16
Acknowledgements
  • National Science Foundation (NSF)
  • National Institutes of Health (NIH)
  • Earth Science Data Systems, JPL: Tina Xiao, Paul Ramirez, Chris
    Mattmann, Roshanak Roshandel, Sean Hardman
  • Southern California Bioinformatics Summer Institute (SoCalBSI)
  • SoCalBSI Professors, Jacqueline Heras
  • All SoCalBSI colleagues
17
VSM Example
Q:  malignant breast cancer
D1: detection of malignant level in the cell
D2: sighting of breast stage in the breast cancer
D3: detection of malignant stage in the cancer

The IDF values are consistent with log10(N / DF) for N = 3 documents.

ID  TERM       DF  IDF
1   the        3   0
2   stage      2   .176
3   level      1   .477
4   sighting   1   .477
5   cell       1   .477
6   malignant  2   .176
7   in         3   0
8   of         3   0
9   breast     1   .477
10  detection  2   .176
11  cancer     2   .176

Term frequencies per document, written as tf(idf):

doc  the   stage    level    sighting  cell     malignant  in    of    breast   detection  cancer
D1   1(0)  0        1(.477)  0         1(.477)  1(.176)    1(0)  1(0)  0        1(.176)    0
D2   1(0)  1(.176)  0        1(.477)   0        0          1(0)  1(0)  2(.477)  0          1(.176)
D3   1(0)  1(.176)  0        0         0        1(.176)    1(0)  1(0)  0        1(.176)    1(.176)
Q    0     0        0        0         0        1(.176)    0     0     1(.477)  0          1(.176)
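
The example can be checked with a few lines of Python that recompute DF and IDF directly from the three documents and then score each document against the query. The scoring step is an assumption, since the continuation slide is truncated: here the score is the dot product of the tf × idf vectors of the document and the query, without length normalization.

```python
import math
from collections import Counter

docs = {
    "D1": "detection of malignant level in the cell",
    "D2": "sighting of breast stage in the breast cancer",
    "D3": "detection of malignant stage in the cancer",
}
query = "malignant breast cancer"

# Term frequency per document.
tf = {name: Counter(text.split()) for name, text in docs.items()}
n_docs = len(docs)

# Document frequency: number of documents containing each term.
df = Counter(term for counts in tf.values() for term in counts)

# Inverse document frequency with a base-10 logarithm.
idf = {term: math.log10(n_docs / count) for term, count in df.items()}


def score(doc_tf, query_tf):
    """Dot product of the tf * idf vectors of a document and the query (assumed scoring)."""
    return sum(
        doc_tf[term] * idf[term] * q_tf * idf[term]
        for term, q_tf in query_tf.items()
        if term in idf
    )


query_tf = Counter(query.split())
scores = {name: score(counts, query_tf) for name, counts in tf.items()}
ranked_docs = sorted(scores, key=scores.get, reverse=True)

for term in ("the", "stage", "malignant", "breast"):
    print(term, df[term], round(idf[term], 3))
print(ranked_docs)
```

Under this scoring, D2 ranks first because "breast" is both rare across the collection and repeated within D2.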
18
Example Continued
Keyword tf idf