Information Extraction I: Kistler/Marais Web Language


Title: Information Extraction I: Kistler/Marais Web Language


1
Information Extraction I: Kistler/Marais Web Language
2
Information extraction applications
  • Find useful information
  • Extract it into a form that can be processed
  • Process it
  • Present it back
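
The four steps above can be sketched as a small pipeline. This is a minimal illustration in Python, not the Web language discussed in these slides; the in-memory `PAGES` table, the `PRICE_RE` pattern, and all function names are hypothetical, and the "find" step is stubbed so the example runs without network access.

```python
import re

# Hypothetical in-memory "Web": URL -> page text (stands in for real fetching).
PAGES = {
    "http://example.com/a": "<li>Widget: $4.50</li><li>Gadget: $12.00</li>",
    "http://example.com/b": "<p>No prices here.</p>",
}

# Assumed page layout: "<name>: $<dollars>.<cents>"
PRICE_RE = re.compile(r"(\w+): \$(\d+\.\d{2})")

def find(pages):
    """Step 1: find pages that look useful (here: pages containing a dollar sign)."""
    return [url for url, text in pages.items() if "$" in text]

def extract(text):
    """Step 2: extract (name, price) records into a form that can be processed."""
    return [(name, float(price)) for name, price in PRICE_RE.findall(text)]

def process(records):
    """Step 3: process the records (here: sort by price, cheapest first)."""
    return sorted(records, key=lambda r: r[1])

def present(records):
    """Step 4: present the result back as plain text."""
    return "\n".join(f"{name}: ${price:.2f}" for name, price in records)

def run(pages):
    records = []
    for url in find(pages):
        records.extend(extract(pages[url]))
    return present(process(records))

if __name__ == "__main__":
    print(run(PAGES))
```

In a real application each step would be far less tidy; the point is only that the stages are separable, so the fragile extraction step can be replaced without touching the rest.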

3
A model of info-extraction applications
  • Robustness is the key criterion
  • Tricky part: theoretically, this will be obviated by the
    Semantic Web and Web Services
  • Not necessarily a Web presentation
  • Figure from Kistler/Marais, WWW7
4
Example applications
  • Shopping robots
  • Personalized news
  • Financial applications
      • Use free data on Web
  • Intra/extranets
      • Manufacturing info
      • Project info
  • Meta-search engines
  • Convert LaTeX2HTML-generated pages into printable
    form

5
Marais/Kistler Web Language
  • Language for writing Web info extraction
    applications
  • Like Perl LWP, but specialized
  • Good for O(10K)-page applications
  • Manual/semi-automatic resource discovery
  • Manual (heuristic-based) extraction
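
To give a feel for what hand-written, heuristic extraction looks like, here is a minimal sketch in Python using only the standard library. It is not written in the Web language itself and does not reproduce its syntax; the `LinkExtractor` class, the `extract_links` helper, the keyword heuristic, and the `SAMPLE` page are all assumptions made for illustration.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from anchor tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_links(html, keyword):
    """Heuristic: keep only links whose visible text mentions the keyword."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [(h, t) for h, t in parser.links if keyword.lower() in t.lower()]

# Hypothetical page fragment used only for this example.
SAMPLE = ('<p><a href="/q/ibm">IBM stock quote</a> '
          '<a href="/about">About us</a> <a>no href</a></p>')

if __name__ == "__main__":
    print(extract_links(SAMPLE, "quote"))
```

The keyword test is exactly the kind of manual heuristic the slide refers to: it works for the pages it was written against and has to be revisited when those pages change.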

6
Challenges of info-extraction applications
  • Web is unreliable
      • Internet failures
      • Site failures
  • Resource-discovery problem
      • Where are pages with interesting data?
  • Pages are unstructured
      • Difficult to reliably extract information
  • Pages change frequently
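
One common way to cope with the unreliability listed above is to retry a failed fetch and fall back to an alternative source. The sketch below shows that generic technique in Python; it is not taken from the slides, and `fetch_with_retry`, `make_flaky_fetch`, and the example URLs are hypothetical. The network is simulated so the example is deterministic.

```python
import time

def fetch_with_retry(fetch, urls, attempts=3, delay=0.0):
    """Try each URL in order, retrying each one up to `attempts` times.

    `fetch` is any callable url -> str that raises OSError on failure.
    The wait between retries doubles each time, starting from `delay` seconds.
    """
    last_error = None
    for url in urls:
        for attempt in range(attempts):
            try:
                return fetch(url)
            except OSError as err:
                last_error = err
                time.sleep(delay * (2 ** attempt))
    raise RuntimeError(f"all sources failed: {last_error}")

def make_flaky_fetch(failures_before_success):
    """Hypothetical stand-in for a network fetch.

    Each URL fails the given number of times before it starts succeeding;
    URLs not listed succeed immediately.
    """
    remaining = dict(failures_before_success)

    def fetch(url):
        if remaining.get(url, 0) > 0:
            remaining[url] -= 1
            raise OSError(f"simulated failure for {url}")
        return f"content of {url}"

    return fetch

if __name__ == "__main__":
    fetch = make_flaky_fetch({"http://primary.example": 99,
                              "http://mirror.example": 1})
    print(fetch_with_retry(fetch, ["http://primary.example",
                                   "http://mirror.example"]))
```

Catching `OSError` is a deliberate choice: in Python's standard library, `urllib.error.URLError` is a subclass of `OSError`, so a real `fetch` built on `urllib.request.urlopen` would fit the same interface.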

7
Rest of today's lecture
  • From Marais SRI talk (slide 12)