The Linguist's Search Engine - PowerPoint PPT Presentation

About This Presentation
Title:

The Linguist's Search Engine

Description:

Title: The Linguist's Search Engine Author: Jan Last modified by: Jan Created Date: 2/5/2004 5:37:45 AM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 9
Provided by: Jan1285
Learn more at: http://web.stanford.edu

Transcript and Presenter's Notes


1
The Linguist's Search Engine
  • 02/04/2004

2
Background
  • Address: http://lse.umiacs.umd.edu/
  • Developed at the University of Maryland by
    Resnik, Elkiss et al. in collaboration with
    Fellbaum (Princeton) and Olsen (Microsoft).
  • Accessible to a general audience since 20 January
    2004 (brand new!)
  • No fees or complicated registration process

3
Some Facts: Built-in Corpus
  • Preprocessed corpus of about three million
    sentences taken from the Internet Archive
    (www.archive.org)
  • Automatically annotated with Penn Treebank-style
    syntactic bracketing
  • Relies on computational linguistics tools (such as
    MXTERMINATOR, MXPOST, Charniak's stochastic
    parser, the Minipar parser, and WordNet)
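
To make "Penn Treebank-style syntactic bracketing" concrete, here is a minimal sketch in Python that reads one bracketed parse and lists its words with their part-of-speech tags. The example sentence and the helper names (`parse_brackets`, `leaves`) are made up for illustration and are not part of the LSE or its toolchain.

```python
def parse_brackets(text):
    """Parse a bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        label = tokens[pos]          # constituent label or POS tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                     # consume ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Return (word, tag) pairs in sentence order."""
    label, children = node
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(leaves(child))
        else:
            pairs.append((child, label))
    return pairs


# Hypothetical annotation of an of-genitive noun phrase.
EXAMPLE = "(NP (NP (DT the) (NN roof)) (PP (IN of) (NP (DT the) (NN house))))"

tree = parse_brackets(EXAMPLE)
tagged = leaves(tree)
print(tagged)
```

The nested tuples mirror the bracket structure directly, which is what makes structural queries (for instance "an NP containing a PP headed by of") possible on such a corpus.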

4
Searching the Built-in Corpus
  • Nice features:
      • Query by example
      • Limited regular expression support (e.g.
        disjunction, negation)
      • WordNet relations are supported
      • Save queries for later reuse
      • Offensive content filter (for less embarrassing
        live demonstrations)
  • Problems:
      • Only English is supported (a fact the
        documentation never mentions!)

5
Demo: Simple Search
  • Simple search of the built-in corpus
  • Query by example:
      • Search for of-genitive constructions
  • Query by hand:
      • Search for 's-genitives where the possessor is
        not a proper name (i.e. not NNP / NNPS)
      • Search for synonyms of fearsome:
        fearsomea1/syns
  • GO TO THE LSE
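
The 's-genitive query from this demo can be approximated offline on part-of-speech tagged text. The sketch below is not LSE query syntax: it is a plain Python filter over a flat list of (word, tag) pairs, assuming Penn Treebank tags in which the possessive ending carries the tag POS. The function name and the sample sentence are hypothetical, and a tree-based query would be more precise about what counts as the possessor.

```python
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def s_genitive_possessors(tagged):
    """Return nouns directly followed by a possessive ending,
    skipping possessors tagged as proper nouns (NNP / NNPS)."""
    hits = []
    for i in range(1, len(tagged)):
        _, tag = tagged[i]
        prev_word, prev_tag = tagged[i - 1]
        is_common_noun = (
            prev_tag.startswith("NN") and prev_tag not in PROPER_NOUN_TAGS
        )
        if tag == "POS" and is_common_noun:
            hits.append(prev_word)
    return hits


# Hypothetical tagged input: "The dog's tail and Mary's hat"
sentence = [
    ("The", "DT"), ("dog", "NN"), ("'s", "POS"), ("tail", "NN"),
    ("and", "CC"), ("Mary", "NNP"), ("'s", "POS"), ("hat", "NN"),
]

print(s_genitive_possessors(sentence))  # ['dog']
```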

6
Some Facts: Customized Corpora
  • You can build your own collection of sentences
    and have them annotated
  • Uses AltaVista (www.altavista.com) as the basis for
    web-wide search (about 1,000,000 pages)
  • Extracts sentences from retrieved pages and
    annotates them
  • Job-based with fair scheduling procedures
  • Query syntax restricted to AltaVista queries plus
    expansion of inflectional forms
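
"Expansion of inflectional forms" can be pictured as turning one lemma into a disjunction over its word forms before the query is sent to the search engine. The following is a rough sketch under stated assumptions: the lookup table, the function name, and the choice of `OR` as the joining operator are all illustrative, and the slides do not say how the LSE actually performs the expansion.

```python
# Hypothetical hand-written table; a real system would use a
# morphological analyzer or lexicon rather than a fixed dictionary.
INFLECTIONS = {
    "give": ["give", "gives", "gave", "given", "giving"],
}


def expand_query(lemma, table=INFLECTIONS):
    """Join all known inflected forms of a lemma into one OR query.

    Lemmas missing from the table are returned unchanged.
    """
    forms = table.get(lemma, [lemma])
    return " OR ".join(forms)


print(expand_query("give"))  # give OR gives OR gave OR given OR giving
```

With an expansion like this, a collection built around the verb give would also retrieve sentences containing gave or given, not only the base form.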

7
Demo: Customized Collection
  • Demo search on a collection of sentences with the
    verb give
  • How to start a new collection
  • GO TO THE LSE

8
Further Information
  • LSE Starter's Guide: lse.umiacs.umd.edu/lse_guide.html
  • LSE User's Guide: lse.umiacs.umd.edu/lseuser/lseuser.pdf
  • LSE Users Forum: lse.umiacs.umd.edu/forum
  • AltaVista Documentation:
    www.altavista.com/help/search/help_adv
  • Penn Tagset: www.computing.dcu.ie/acahill/tagset.html
  • Still ugly but flexible alternative:
    www.stanford.edu/jstrunk/