Processing of large document collections

1
Processing of large document collections
  • Part 12 (Closing of the course)
  • Helena Ahonen-Myka
  • Spring 2005

2
QA vs IE
  • open domain, closed domain?
  • questions, task definitions
  • task definition: static in IE; in QA the question
    defines the task dynamically
  • answer: a structured template in IE; a text snippet
    and/or exact answer in QA (one slot value of a
    template?)
  • similar components can be used: language analysis,
    semantics (WordNet, word lists), pattern matching
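
The contrast between an IE template and a QA exact answer can be sketched in a few lines. This is only an illustration under stated assumptions: the appointment template, its slot names, the regular expression, the question cues and the sample sentence are all hypothetical and do not come from the course material.

```python
import re
from typing import Optional

# Hypothetical IE task, fixed in advance: fill an "appointment" template.
APPOINTMENT = re.compile(
    r"(?P<person>[A-Z]\w+(?: [A-Z]\w+)*) was appointed "
    r"(?P<position>[\w ]+?) of "
    r"(?P<organization>[A-Z]\w+(?: [A-Z]\w+)*)"
)

# Hypothetical QA cues: the question decides which slot is the answer.
QUESTION_SLOTS = {
    "who": "person",
    "what position": "position",
    "which organization": "organization",
}

TEXT = "Jane Smith was appointed chief executive of Acme Corp last week."


def extract_template(text: str) -> Optional[dict]:
    """IE: return the whole structured template, or None if no match."""
    m = APPOINTMENT.search(text)
    return m.groupdict() if m else None


def answer_question(question: str, text: str) -> Optional[str]:
    """QA: return one slot value, chosen by the question's opening words."""
    template = extract_template(text)
    if template is None:
        return None
    q = question.lower()
    for cue, slot in QUESTION_SLOTS.items():
        if q.startswith(cue):
            return template[slot]
    return None


print(extract_template(TEXT))
print(answer_question("Who was appointed chief executive?", TEXT))
```

Both functions share the same pattern-matching component; only the shape of the result differs, which is the point of the comparison above.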

3
Mapping to the information retrieval process
  • information need -> query
  • documents -> document representations
  • query and document representations -> matching -> result
  • result -> query reformulation (back to the query)
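
The stages named on this slide can be walked through with a toy retrieval loop. This is a minimal sketch, assuming a bag-of-words representation and a term-overlap score; the function names, the sample documents and the queries are made up for illustration and are not the course's method.

```python
from collections import Counter


def represent(text):
    """Document/query representation: a bag of lowercase word counts."""
    return Counter(text.lower().split())


def match(query_rep, doc_rep):
    """Matching: total occurrences of the query terms in the document."""
    return sum(doc_rep[term] for term in query_rep)


def retrieve(query, documents):
    """Result: indices of matching documents, best score first."""
    q = represent(query)
    scored = [(match(q, represent(d)), i) for i, d in enumerate(documents)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0]


DOCS = [
    "text categorization assigns documents to classes",
    "question answering returns a short answer",
    "information extraction fills a structured template",
]

# Information need expressed as a first query.
first = retrieve("answer extraction", DOCS)
# Query reformulation: the user adds a term after inspecting the result.
second = retrieve("answer extraction template", DOCS)
print(first, second)
```

Real systems would replace each function with something heavier (index structures, term weighting), but the flow from information need to reformulated query stays the same.
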
4
TC vs TS vs IE vs QA?
  • text categorization (TC)
  • text summarization (TS)
  • information extraction (IE)
  • question answering (QA)

5
General issues
  • performance requirements
  • building vs. use
  • offline vs. online processing
  • effectiveness vs. efficiency
  • observed performance in some user task
  • evaluation
  • simplified research settings vs. real-life
    environments

6
General issues
  • portability, scalability
  • amount and type of manual processing
  • domain/language dependency
  • are some components available off-the-shelf?
  • but do not use heavy processing for simple tasks
  • e.g. linguistic analysis vs. pattern matching
  • static vs. dynamic system/component?

7
General issues
  • What is our goal?
  • automatic text understanding?
  • and automatic processing based on that?
  • probably tools for specific tasks for specific
    users are more reasonable than very generic,
    open-domain tools

8
Learning goals
  • learn to recognize components of
    applications/processes
  • learn to recognize which (kind of) methods could
    be used in each component
  • learn to implement some methods
  • (meta)learn to control learning processes (What
    do I know? What should I know to solve this
    problem?)

9
Exam and exercise points
  • exam: Thu 12.5. at 16-20, A111
  • points: exam 50 pts, exercises 10 pts
  • required: 30 pts (1-)
  • exercise points:
  • 1 exercise -> 1 point
  • 2 -> 2
  • 3 -> 3
  • 5 -> 4
  • 7 -> 5
  • 9 -> 6
  • 11 -> 7
  • 13 -> 8
  • 15 -> 9
  • 16 -> 10

10
Thank you!!!
  • and have a good summer!!