Data Frames Version 3 Proposal - PowerPoint PPT Presentation

Slides: 10
Provided by: stephen69
Learn more at: https://www.deg.byu.edu

Transcript and Presenter's Notes

1
Data Frames Version 3 Proposal
2
Data Frames Version 2
  • Year matches 2
  • constant extract "\d{2}"
  • context "(\\d)\d{2},\dkK"
    0.5,
  • extract "\d{2}"
  • context "(\\d)\d{2},\d"
    0.6,
  • extract "\d{2}"
  • context "\b'\d{2}\b" 0.8
  • end
  • Mileage matches 8
  • constant extract "\b[1-9]\d{1,2}k" 0.6,
  • extract "[1-9]\d?,\d{3}" 0.3
  • keyword "\bmiles\b", "\bmi\.", "\bmi\b"
  • end
  • Also: except, substitute, and filter phrases, plus lexicons
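
To make the weighted value phrases concrete, here is a minimal Java sketch of how extract patterns with confidence values might be applied to a piece of text. The class and method names (ValuePhrase, MileageFrame, bestMatch) are hypothetical, the two Mileage patterns are an assumed reading of the expressions on this slide, and the "highest confidence wins" rule is an assumption for illustration, not the Ontos implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of one value phrase: an extract pattern plus a confidence.
class ValuePhrase {
    final Pattern extract;
    final double confidence;

    ValuePhrase(String regex, double confidence) {
        this.extract = Pattern.compile(regex);
        this.confidence = confidence;
    }
}

class MileageFrame {
    // Assumed reading of the Mileage value phrases shown on the slide.
    static final List<ValuePhrase> PHRASES = Arrays.asList(
        new ValuePhrase("\\b[1-9]\\d{1,2}k", 0.6),
        new ValuePhrase("[1-9]\\d?,\\d{3}", 0.3));

    // Returns the first match of the highest-confidence phrase that matches,
    // or null when no phrase matches (assumed selection rule).
    static String bestMatch(String text) {
        String best = null;
        double bestConfidence = -1.0;
        for (ValuePhrase phrase : PHRASES) {
            Matcher m = phrase.extract.matcher(text);
            if (m.find() && phrase.confidence > bestConfidence) {
                best = m.group();
                bestConfidence = phrase.confidence;
            }
        }
        return best;
    }
}
```

For example, MileageFrame.bestMatch("only 42,000 miles") falls through to the lower-confidence comma pattern because the "k" pattern does not match.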

3
Kimball's Ontology Editor
4
Internal Representation
  • Replace SQL field length with arbitrary type
    field
  • This is the internal representation
  • Type is either lexical or nonlexical
  • Type could be the name of an object set in the
    ontology
  • Or it could be the name of a type in whatever
    language will be used to implement methods (more
    on this later), together with a units name (e.g.
    miles, meters, grams, pounds)

5
Methods
  • Add a method phrase to data frames
  • Conceptually they are restricted derived object
    sets and relationship sets
  • We only declare method signatures in data frames
  • Another language (e.g. Java) is used to define
    the method body
  • Our tool will generate a template in which the
    programmer can write method bodies
  • The template will have OO structures that allow
    read-only access to the seamless model/data
    instance
  • Keyword phrases may also apply to methods
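
As an illustration of the split between declared signatures and programmer-written bodies, a generated template might look something like the Java sketch below. The slides do not specify the template's shape, so everything here is hypothetical: ModelInstance, CarMethodsTemplate, CarMethods, MapModelInstance, and the isOlderThan method are invented for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-only view of the model/data instance.
interface ModelInstance {
    String valueOf(String objectSetName);
}

// Hypothetical generated template: the signature comes from the data frame
// declaration, and the body is left for the programmer.
abstract class CarMethodsTemplate {
    protected final ModelInstance model;

    CarMethodsTemplate(ModelInstance model) {
        this.model = model;
    }

    abstract boolean isOlderThan(int years, int currentYear);
}

// Programmer-supplied method body.
class CarMethods extends CarMethodsTemplate {
    CarMethods(ModelInstance model) {
        super(model);
    }

    @Override
    boolean isOlderThan(int years, int currentYear) {
        // Assumes the instance holds a four-digit "Year" value.
        int year = Integer.parseInt(model.valueOf("Year"));
        return currentYear - year > years;
    }
}

// Simple map-backed instance that exposes no mutating operations.
class MapModelInstance implements ModelInstance {
    private final Map<String, String> values;

    MapModelInstance(Map<String, String> values) {
        this.values = Collections.unmodifiableMap(new HashMap<>(values));
    }

    @Override
    public String valueOf(String objectSetName) {
        return values.get(objectSetName);
    }
}
```

Because the template only receives a ModelInstance, method bodies can read extracted values but have no way to modify them, which is one possible way to get the read-only access described above.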

6
Canonicalization Methods
  • Each value phrase may have an associated
    canonicalization method
  • The purpose is to convert the extracted value
    string into a common form
  • The data frame may have a default
    canonicalization method that applies if there is
    no individual method for a value phrase
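
The fallback from a per-phrase method to the frame's default can be sketched as follows. This is a minimal Java illustration under stated assumptions: the class names, the phrase name "thousands", and both canonicalization rules are hypothetical and are not taken from the proposal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical data frame that maps value phrases to canonicalization methods.
class CanonicalizingFrame {
    private final Map<String, Function<String, String>> perPhrase = new HashMap<>();
    private final Function<String, String> defaultMethod;

    CanonicalizingFrame(Function<String, String> defaultMethod) {
        this.defaultMethod = defaultMethod;
    }

    // Associates an individual canonicalization method with one value phrase.
    void register(String phraseName, Function<String, String> method) {
        perPhrase.put(phraseName, method);
    }

    // Uses the phrase's own method if one is registered, else the frame default.
    String canonicalize(String phraseName, String extracted) {
        return perPhrase.getOrDefault(phraseName, defaultMethod).apply(extracted);
    }
}

class MileageCanonicalization {
    static CanonicalizingFrame build() {
        // Assumed default: keep only the digits, so "42,000" becomes "42000".
        CanonicalizingFrame frame =
            new CanonicalizingFrame(s -> s.replaceAll("[^0-9]", ""));
        // Assumed per-phrase method for values such as "85k".
        frame.register("thousands", s ->
            String.valueOf(Integer.parseInt(s.substring(0, s.length() - 1)) * 1000));
        return frame;
    }
}
```

With this setup, both "85k" (through the "thousands" phrase) and "85,000" (through the default) end up in the same common form.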

7
Inheritance
  • Inheritance is defined more cleanly
  • Generalization/specialization will indicate
    inheritance hierarchy
  • The internal representation cannot be overridden
    in specializations
  • Multiple parents must have the same internal
    representation
  • Individual inherited phrases can be deleted or
    overridden
  • New phrases can be added
  • In the case of name conflict, we require fully
    qualified names to be used (no automatic
    disambiguation)
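
The delete, override, and add rules for inherited phrases can be sketched as a simple map merge. This Java sketch is an assumption about how a specialization's effective phrase set could be computed; the names are hypothetical, phrases are reduced to plain strings, and name conflicts between multiple parents are deliberately not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class PhraseInheritance {
    // Computes the phrases visible in a specialization:
    // start from the inherited phrases, drop the deleted ones,
    // then apply overrides and additions (same name = override, new name = add).
    static Map<String, String> effectivePhrases(Map<String, String> inherited,
                                                Set<String> deleted,
                                                Map<String, String> overridesAndAdditions) {
        Map<String, String> result = new LinkedHashMap<>(inherited);
        result.keySet().removeAll(deleted);
        result.putAll(overridesAndAdditions);
        return result;
    }
}
```

The internal representation is intentionally absent from this merge, since the proposal states that it cannot be overridden in specializations.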

8
General Constraints
  • We may decide to implement a limited form of
    general constraint in the ontology
  • E.g. Birth Date < Death Date
  • Or Event Distance.toMiles() < 26
  • If so, we may want to implement operator
    overloading (something like C++)
  • The general constraint issue is not core to the
    current data frame discussion, but it has
    interesting ramifications
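
For a sense of what the distance constraint could look like in a method language without operator overloading, here is a small Java sketch. The Distance and DistanceConstraint classes are hypothetical; only the toMiles() call and the bound of 26 come from the example above.

```java
// Hypothetical value type that stores a distance in meters.
class Distance {
    private static final double METERS_PER_MILE = 1609.344;
    private final double meters;

    private Distance(double meters) {
        this.meters = meters;
    }

    static Distance ofKilometers(double kilometers) {
        return new Distance(kilometers * 1000.0);
    }

    static Distance ofMiles(double miles) {
        return new Distance(miles * METERS_PER_MILE);
    }

    double toMiles() {
        return meters / METERS_PER_MILE;
    }
}

class DistanceConstraint {
    // Encodes the constraint "Event Distance.toMiles() < 26" as a plain method.
    static boolean holds(Distance eventDistance) {
        return eventDistance.toMiles() < 26;
    }
}
```

Here the comparison works only because toMiles() returns a primitive number; comparing two Distance objects directly with < would need either operator overloading or an explicit comparison method.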

9
Other Issues
  • How to integrate methods and confidence values
    into record-assembly heuristics
  • Ontos system will have to be rewritten
  • Extract into model instance, not SQL tables
  • We can always generate database tables later if
    we'd like
  • Ontologies created graphically and stored as XML