Logic Programming for Natural Language Processing

Slides: 17
Provided by: tjh5
Learn more at: https://www.tjhsst.edu

Transcript and Presenter's Notes

1
Logic Programming for Natural Language Processing
  • Menyoung Lee
  • TJHSST Computer Systems Lab
  • Mentor: Matt Parker
  • Analytic Services, Inc.

2
Purpose
  • To link together
  • Recent developments in natural language
    processing (NLP): Information Extraction (IE)
  • Classical logic programming: Prolog
  • New paradigm: a bifurcated process
  • An IE application that produces structured
    output from a corpus of free, unstructured text
  • Transformation of the extracted information into a
    Prolog knowledge base (sets of fact triples)
  • Documents: biographies

3
Why NLP?
  • Language is the cornerstone of intelligence
  • The Turing Test: the ability to converse like a human
  • Understanding and generating texts in a natural
    language, e.g. English
  • Many specific NLP tasks
  • Chatterbots, e.g. Eliza
  • Machine Translation
  • Information Retrieval (IR), e.g. Google
  • Information Extraction!!
  • Sci-fi dreams: universal translation, computers
    you can talk to, etc.

4
Information Extraction (IE)
  • Most generally, the transformation of
  • Information contained in free, unstructured text
    in a natural language into
  • A prescribed, structured format.
  • More specifically, the identification of
  • Instances of certain object classes
  • Their attributes
  • Relationships between object instances
  • Always restricted to a particular domain
  • In order to have a reasonably sized and
    sufficiently expressive ontology

5
Why IE?
  • An expert must read many documents
  • Advent of the Internet: the Information Age
  • Explosion of the sheer volume of textual
    information, readily available in electronic form
  • New opportunity: lots and lots of available
    information to exploit
  • Formidable challenge: impossible for an expert to
    read and analyze that much text
  • A pragmatic approach
  • Full text understanding is out of reach
  • Automate just some of the tasks, i.e. the
    identification of objects, attributes, and
    relations

6
IE - Details
  • Five Tasks in IE
  • Named Entity Recognition (NE)
  • Coreference Resolution (CO)
  • Template Element Construction (TE)
  • Template Relation Construction (TR)
  • Scenario Template Production (ST)
  • Metrics for Evaluation
  • Precision
  • Recall
  • F-measure (borrowed from IR)
  • More intuitive reformulation
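
The three metrics can be sketched in Python as follows. The slide's "more intuitive reformulation" is not reproduced in this transcript, so the sketch assumes the standard IR definitions (F as the β-weighted harmonic mean of precision and recall). The function names and the counts are hypothetical; the counts are chosen so that precision and recall come out at 1.0 and 0.9474.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of the annotations produced that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of the expected annotations that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Standard F_beta (assumed form); beta > 1 weights recall more heavily."""
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Hypothetical counts: 18 correct annotations, 0 spurious, 1 missed.
p = precision(tp=18, fp=0)
r = recall(tp=18, fn=1)
print(round(p, 4), round(r, 4), round(f_measure(p, r), 3))
```

With these counts the balanced setting (beta = 1) gives 0.973. Other beta values shift the result toward precision or recall.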

7
Annotations
  • Annotations identify objects in text
  • Annotation graph: a directed acyclic graph (DAG)
  • Nodes
  • Positions in the text
  • Edges
  • The literal text
  • Annotations
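
A minimal sketch of such a graph in Python, assuming nodes are character offsets and each annotation is an edge between two offsets. The class and method names are hypothetical and are not GATE's API. Because every edge points from a smaller offset to a larger one, the graph cannot contain a cycle.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    start: int  # node: character offset where the span begins
    end: int    # node: character offset where the span ends
    label: str  # annotation type carried by this edge


@dataclass
class AnnotationGraph:
    text: str
    edges: list = field(default_factory=list)

    def annotate(self, start: int, end: int, label: str) -> None:
        # Edges must point forward in the text, which keeps the graph acyclic.
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("span must lie inside the text and point forward")
        self.edges.append(Edge(start, end, label))

    def spans(self, label: str) -> list:
        """Literal text covered by every edge with the given label."""
        return [self.text[e.start:e.end] for e in self.edges if e.label == label]


graph = AnnotationGraph("Galois met Cauchy")
graph.annotate(0, 6, "Mathematician")
graph.annotate(11, 17, "Mathematician")
print(graph.spans("Mathematician"))
```

The literal text along any edge is recovered by slicing the document between the edge's two offsets, so the link between an annotation and its source is never lost.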

8
Frames
  • Frame: a representation of an object, consisting of
    slots, which contain values
  • Typical Prolog fact: Frame(Slot, Value).
  • We propose to synthesize it with the idea of
    annotations: Doc(Annot, Text).
  • Main idea: represent the document directly as an
    object, a compromise between text and knowledge
  • Several advantages
  • A corpus of multiple related documents
  • Direct link between information and its source
  • Opens the door for the application of Prolog's
    logic.
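
The Doc(Annot, Text) shape can be illustrated by a small Python helper that writes one such fact as Prolog source text. This is only a sketch: the helper names are hypothetical, and it assumes that quoting every atom and doubling embedded single quotes is sufficient (input containing backslashes is not handled).

```python
def quote_atom(s: str) -> str:
    """Quote a string as a Prolog atom, doubling embedded single quotes."""
    return "'" + s.replace("'", "''") + "'"


def doc_fact(doc: str, annot: str, text: str) -> str:
    """Render one Doc(Annot, Text) fact with the document name as the predicate."""
    return f"{quote_atom(doc)}({quote_atom(annot)}, {quote_atom(text)})."


print(doc_fact("Galois.html.xml", "Mathematician", "Abel"))
```

Using the document name as the predicate keeps every extracted value tied to its source document, which is the "direct link" listed among the advantages.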

9
Design
  • The IE application
  • Input: a corpus of free, unstructured text
  • Output: the annotated documents, represented as
    annotation graphs
  • How: use GATE (language: JAPE)
  • The Prolog application
  • Input: the annotated document
  • Output: a frame, i.e. a set of Prolog facts
  • How: use XSB (language: Prolog)

10
General Architecture for Text Engineering (GATE)
  • A comprehensive architecture for development of
    NLP applications
  • Documents are treated as annotation graphs
  • Java Annotation Patterns Engine (JAPE)
  • Its own language for writing grammars that
    identify instances of object classes to annotate
  • A Nearly New Information Extraction (ANNIE)
    system
  • An already implemented, rudimentary IE system
    that can be extended through the addition of
  • JAPE grammars for annotating
  • Machine-learning models for annotating

11
GATE

12
Procedures
  • Obtain the corpus: Python script
  • Write the JAPE grammars
  • Annotations: 'Mathematician', 'Father'
  • Train a model
  • Annotation: 'Protagonist'
  • Write the Prolog application to:
  • Parse GATE's XML output into a structure
  • Construct the annotation graph from it
  • Process the annotations into a document frame
  • Output the document frame
  • Test by posing queries
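
The parsing step can be sketched in Python (the author's version is written in Prolog and is not shown in the slides). The sample below is hand-written, not real GATE output. The element and attribute names (TextWithNodes, Node, Annotation, Type, StartNode, EndNode) are assumptions about GATE's XML serialization, in which node ids mark positions in the text. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an annotated document (assumed layout).
SAMPLE = """<GateDocument>
<TextWithNodes><Node id="0"/>Galois<Node id="6"/> met <Node id="11"/>Cauchy<Node id="17"/></TextWithNodes>
<AnnotationSet>
<Annotation Id="1" Type="Mathematician" StartNode="0" EndNode="6"/>
<Annotation Id="2" Type="Mathematician" StartNode="11" EndNode="17"/>
</AnnotationSet>
</GateDocument>"""


def parse_annotated(xml_text: str):
    """Return the plain text and a list of (annotation type, covered text) pairs."""
    root = ET.fromstring(xml_text)
    body = root.find("TextWithNodes")
    text = body.text or ""
    offsets = {}
    for node in body:
        # Record where each node sits, then append the text that follows it.
        offsets[node.get("id")] = len(text)
        text += node.tail or ""
    pairs = []
    for ann in root.iter("Annotation"):
        start = offsets[ann.get("StartNode")]
        end = offsets[ann.get("EndNode")]
        pairs.append((ann.get("Type"), text[start:end]))
    return text, pairs


text, pairs = parse_annotated(SAMPLE)
print(text)
print(pairs)
```

Each (type, text) pair is one edge of the annotation graph and maps directly onto one fact of the document frame.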

13
IE Result: Fermat.html
  • Precision: 1.0 (why so high?)
  • Use of a gazetteer list
  • Aggressive pruning by context
  • Recall: 0.9474
  • The price of aggressive pruning: some instances were missed
  • F-measure (β = 2)
  • 0.973

14
Prolog Result
  • Correctly constructs facts.
  • Sample session:
  • ?- 'Galois.html.xml'('Mathematician', X).
  • X = Abel
  • X = Cauchy
  • X = Evariste Galois
  • X = Fourier
  • X = Galois
  • X = Gauss
  • X = Gergonne
  • X = Jacobi
  • X = Lagrange
  • X = Legendre
  • X = Libri
  • X = Liouville
  • X = Poisson
  • X = Vernier

15
Results
  • The Prolog layer is universal, cross-domain
  • The IE application may produce any annotation,
    not restricted to one subject area
  • Bifurcation success
  • Opens door to logic and rules, esp. for
    cross-document relations
  • ?- 'Galois.html.xml'('Mathematician', X),
    'Cauchy.html.xml'('Protagonist', X).
  • X = Cauchy
  • no
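
The conjunctive query above amounts to a join on the shared variable X. A minimal Python sketch of the same idea follows, using only a few of the facts that appear in the sample sessions. The `facts` set and the `values` helper are hypothetical stand-ins for the Prolog knowledge base.

```python
# (document, annotation, text) triples; a subset taken from the sample sessions.
facts = {
    ("Galois.html.xml", "Mathematician", "Abel"),
    ("Galois.html.xml", "Mathematician", "Cauchy"),
    ("Galois.html.xml", "Mathematician", "Evariste Galois"),
    ("Galois.html.xml", "Mathematician", "Gauss"),
    ("Cauchy.html.xml", "Protagonist", "Cauchy"),
}


def values(doc: str, annot: str) -> set:
    """All texts X such that doc(annot, X) holds."""
    return {v for d, a, v in facts if d == doc and a == annot}


# Both goals must hold for the same X, so the answer is the intersection.
shared = values("Galois.html.xml", "Mathematician") & values("Cauchy.html.xml", "Protagonist")
print(sorted(shared))
```

In Prolog the same join falls out of unification and backtracking, with no explicit set operations.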

16
Conclusion
  • With the recent advancements in computing power,
    logic programming is finally feasible for
    practical use
  • I ran my Prolog application on the server
    robustus, giving it 2 GB of memory
  • However, computing power continues to be a
    limitation (GATE crashed every day)
  • Where do we go from here?
  • More expressive document frame
  • Context analysis (through proximity, etc.)
  • Better IE applications through statistical
    processing