1
A Data Model to Support End-User Software
Engineering
Christopher Scaffidi Carnegie Mellon University
2
Questions for the panel
  • Some areas where I would appreciate suggestions
  • What aspects of this work would be of most
    interest to the ICSE community (in future
    research papers)?
  • For any potential problems that you see in the
    work, what solutions can you suggest?

3
Target audience
  • In 2012, we project that there will be 90 million
    computer end users (EUs) in American
    workplaces.
  • Of these, at least half will create spreadsheets,
    databases, and/or web applications. These are
    called end-user programmers (EUPs) [5].
  • Both EUs and EUPs will benefit from the proposed
    research, though the proposed research is
    primarily aimed at EUPs (including EUs who become
    EUPs because of the research).

introduction → prototype → proposed work →
evaluation
4
Contextual inquiry: What are the problems of EUs
and EUPs?
  • Observed 3 administrative assistants, 4 managers,
    and 3 webmasters/graphic designers (1-3 hrs
    each) [3, 9]

5
How can EUPs validate web forms if they do not
know JavaScript?
  • Is the input valid? EDSH 225
  • Is the input nearly valid? EDXH 225
  • Does it just need reformatting? Smith 225
  • Or is it obviously invalid? Robotics Institute
6
Other tasks, other data, other problems
  • When building a staff roster by merging data
    sources into a single spreadsheet, one of the
    EUs:
  • Had to manually transform data to a consistent
    format (e.g., put person names in "Lastname,
    Firstname" format)
  • Had to scrutinize data to identify questionable
    values that deserved double-checking (e.g., a
    first name with 15 characters might be right)
  • Had to manually check for (near-)duplicates
    (e.g., "Scaffidi, Christopher" and "Scaffidi,
    Chris")
  • We and our research collaborators identified many
    additional data validation and data reuse tasks
    that were poorly supported by existing tools
    [3, 7, 9].
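The three manual chores above (reformatting, flagging questionable values, spotting near-duplicates) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the topes implementation: the helper names, the 12-character threshold, and the "one first name is a prefix of the other" nickname rule are all hypothetical.

```python
def to_last_first(name: str) -> str:
    """Reformat 'Firstname Lastname' as 'Lastname, Firstname'.

    Values that already contain a comma are assumed to be in the target
    format; names with more than two words are left for a human to check.
    """
    name = " ".join(name.split())  # collapse stray whitespace
    if "," in name:
        return name
    parts = name.split(" ")
    if len(parts) == 2:
        first, last = parts
        return f"{last}, {first}"
    return name


def is_questionable(name: str, max_first_len: int = 12) -> bool:
    """Flag (rather than reject) values that deserve double-checking.

    The threshold is a hypothetical choice: a 15-character first name
    might be right, but it is worth a second look.
    """
    _, _, first = to_last_first(name).partition(", ")
    return len(first) > max_first_len


def likely_duplicates(a: str, b: str) -> bool:
    """Treat two names as near-duplicates if the last names match
    (ignoring case) and one first name is a prefix of the other,
    e.g. 'Chris' and 'Christopher' (an assumed heuristic)."""
    last_a, _, first_a = to_last_first(a).lower().partition(", ")
    last_b, _, first_b = to_last_first(b).lower().partition(", ")
    return last_a == last_b and (first_a.startswith(first_b)
                                 or first_b.startswith(first_a))
```

The prefix rule is deliberately crude (it would also pair "Chris" with "Christine"), which illustrates why binary string tests are a poor fit for this kind of data.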

7
Underlying problem: abstraction mismatch
  • Tools support strings, integers, floats, and
    sometimes dates.
  • The problem domain involves higher-level
    categories of data:
  • University names: Carnegie Mellon, CMU
  • Person names: "Scaffidi, Christopher", "Chris
    Scaffidi"
  • CMU phone numbers: 8-1234, x8-1234
  • CMU room numbers: WeH 4623, Wean 4623
  • These data categories are:
  • Human-readable
  • Short (at most 1 input field)
  • Multi-format
  • Sometimes ambiguous / fuzzy (non-binary scale of
    validity)
  • Often particular to certain groups of people

8
A new direction: Create a new abstraction for
each category of data
  • Like software libraries, implementations of
    these abstractions could be reused in many
    programs.
  • Abstractions would need to include functionality
    for:
  • Recognizing instances of the category (for
    automating data validation)
  • Transforming instances among various formats
    (for automating data reformatting)
  • Testing instances for equality (for automating
    removal of duplicates)

9
A new direction: Other requirements for
abstractions
  • EUPs across a range of programming expertise
    must be able to create new custom abstractions.
  • Flexibility:
  • Abstractions must capture fuzziness when
    recognizing instances of the category and when
    testing equivalence.
  • EUPs must have the option of configuring
    abstractions to learn exceptional cases.
  • Sharability:
  • EUPs must still be able to share and find useful
    abstractions even as the number of abstractions
    grows.

10
Thesis
  • The proposed data model and development
    environment will enable end-user programmers to
    implement and share custom abstractions for
    flexibly recognizing, transforming and
    equivalence-testing values in categories of
    short, human-readable data.
  • The model and environment will help end-user
    programmers to more quickly and correctly
    validate and reuse data than is possible through
    currently practiced methods.

11
Topes
  • Tope: an abstraction implementation for a data
    category
  • From the Greek word for "place," because each
    tope corresponds to a data category with a
    natural place in the problem domain
  • Topes in practice:
  • EUPs create new topes by using the basic tope
    editor (or by writing topes in another language,
    such as JavaScript).
  • EUPs publish topes on repositories.
  • Other EUs and EUPs download topes to their local
    cache.
  • Tool plug-ins let EUs and EUPs browse their local
    cache and associate topes with variables and
    input fields.
  • Plug-ins get topes from the local cache and use
    them to recognize, transform, and
    equivalence-test data.

12
Related work: Existing approaches do not meet the
requirements.
  • Regexps / grammars / data detectors recognize
    data but do not specify how to transform it.
  • Types:
  • A value is or is not a valid instance of a type
    (non-fuzzy).
  • If invalid at compilation, values cannot become
    valid at runtime.
  • Typed languages are probably difficult for EUPs,
    who are uncomfortable even with untyped
    scripting languages.
  • Research on units (e.g., Slate) and constraint
    systems (e.g., Cues) typically applies only to
    numeric data in certain applications (e.g.,
    spreadsheets).
  • None of these has built-in support for helping
    users decide which abstractions to trust, so
    sharing is impeded.

13
Outline
  • Introduction
  • Related work
  • Prototype
  • Proposed work
  • Evaluation

How could flexible formats be expressed?
14
Sample task: web form validation. The painful old
way
  • Drag widgets and a validator onto the page,
    select a regexp, and customize it if desired.

15
Sample task: web form validation. Results of the
painful old way
  • Invalid inputs cause a hard-coded message to
    appear.
  • Oops, forgot to enter a message at
    design-time.
  • For valid inputs, no error message appears.
  • Hm, didn't realize the area code was optional.
  • What if I want to allow campus phone numbers?
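The area-code and campus-number problems can be made concrete with a small sketch. It is a hypothetical Python illustration, not the validator used in the prototype: it contrasts one hard-coded regexp with a set of separately named formats, and the patterns and format names are assumptions (the campus pattern follows the CMU examples 8-1234 and x8-1234).

```python
from __future__ import annotations

import re

# The "old way": a single hard-coded regexp that, perhaps unintentionally,
# requires an area code.
STRICT_PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

# A hypothetical alternative: one pattern per accepted format, so a new
# format (such as campus extensions) can be added without rewriting the rest.
PHONE_FORMATS = {
    "with area code": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
    "without area code": re.compile(r"\d{3}-\d{4}"),
    "CMU campus": re.compile(r"x?\d-\d{4}"),
}


def matching_formats(value: str) -> list[str]:
    """Return the names of all formats that match the entire value."""
    return [name for name, pattern in PHONE_FORMATS.items()
            if pattern.fullmatch(value)]
```

Reporting which format matched (rather than a bare yes/no) is also what makes a targeted error message or a later reformatting step possible.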

16
Sample task: web form validation. The wonderful
new way
  • Drag widgets and a validator onto the page,
    select a format, and customize it if desired.

17
Sample task: web form validation. Creating this
format took 55 seconds
18
Sample task: web form validation. Results of the
new way
  • Invalid inputs cause a targeted message to
    appear.
  • Inputs that violate an "always" or "never"
    constraint cannot be submitted to the server.
  • Inputs that violate an "often" constraint cause a
    warning, which the application user can
    override.
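The three constraint strengths can be sketched as follows, assuming a constraint is a predicate plus a strength and a targeted message. The `Constraint` class, `check` function, building list, and messages are hypothetical illustrations, not the prototype's actual representation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Constraint:
    strength: str                 # "always", "never", or "often"
    test: Callable[[str], bool]   # does the value have the property?
    message: str                  # targeted message shown to the user


def check(value: str, constraints: list[Constraint]) -> tuple[str, list[str]]:
    """Classify a value as "error" (block submission), "warning"
    (user may override), or "ok", together with the targeted messages."""
    errors, warnings = [], []
    for c in constraints:
        holds = c.test(value)
        if c.strength == "always" and not holds:
            errors.append(c.message)
        elif c.strength == "never" and holds:
            errors.append(c.message)
        elif c.strength == "often" and not holds:
            warnings.append(c.message)
    if errors:
        return "error", errors
    if warnings:
        return "warning", warnings
    return "ok", []


# Hypothetical constraints for a room-number field.
KNOWN_BUILDINGS = {"EDSH", "WeH"}  # assumed list of building abbreviations
ROOM_CONSTRAINTS = [
    Constraint("always", lambda v: re.fullmatch(r"\S+ \d+", v) is not None,
               "Enter a building and a room number, such as EDSH 225."),
    Constraint("never", lambda v: "@" in v,
               "This looks like an email address, not a room."),
    Constraint("often", lambda v: v.split(" ")[0] in KNOWN_BUILDINGS,
               "Unfamiliar building; please double-check it."),
]
```

For example, `check("EDXH 225", ROOM_CONSTRAINTS)` returns a warning rather than an error, because only the "often" constraint fails, so the user could still submit the value.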

19
Prototype implementation: System block diagram
[Diagram components: spreadsheet, format editor,
parser]
20
Expressiveness evaluation
  • Four administrative assistants' use of a web
    browser was logged for three weeks, yielding
    nearly 6000 sample data values that they typed
    into web forms.
  • Values were not logged verbatim: characters were
    generalized
  • e.g., Cscaffid0@gmail.com → Aa70@a5.a3
  • We manually grouped values into 19 semantic
    families (e.g., email address) based on each
    widget's HTML name and the words visually near
    the widget
  • Created and tested formats for 14 families (4250
    values)
  • Omitted usernames/passwords and long blocks of
    text
  • Inference testing features were not used during
    format creation
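The generalization step can be reconstructed from the single example above, assuming that each run of uppercase letters, lowercase letters, or digits is replaced by `A`, `a`, or `0`, followed by the run length when the run is longer than one character, and that punctuation is kept verbatim. This Python sketch is an inferred, hypothetical scheme, not necessarily the one used in the study.

```python
import itertools


def char_class(ch: str) -> str:
    """Map a character to its class symbol; other characters pass through."""
    if ch.isupper():
        return "A"
    if ch.islower():
        return "a"
    if ch.isdigit():
        return "0"
    return ch


def generalize(value: str) -> str:
    """Replace each run of same-class characters with the class symbol,
    followed by the run length when the run is longer than one."""
    out = []
    for cls, run in itertools.groupby(value, key=char_class):
        n = len(list(run))
        out.append(cls if n == 1 else f"{cls}{n}")
    return "".join(out)
```

Note that this encoding is lossy and can be ambiguous (`Aa70` could also be read as a 70-letter lowercase run).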

21
Expressiveness evaluation results
  • 9 families needed 1 format each; 5 needed 2
    formats each
  • The only error attributable to editor
    expressiveness:
  • 1 of the 4250 test values had a trailing period
    on a street type (in an address line)
  • This particular version of the editor had no way
    to say that a part could contain a period, but
    only at the end
  • After support for multiple formats is added, the
    editor as a whole will be evaluated for
    usability.

[6]
22
Outline
  • Introduction
  • Related work
  • Prototype
  • Proposed work
  • Evaluation

Generalizing the prototype:
  • A lightweight data model
  • A development environment to help EUPs create,
    share, and use topes
23
Proposed data model
  • Each tope implementation contains executable
    functions:
  • 1 isa: string → [0, 1] function per format, for
    recognizing instances of the format
  • 0 or 1 eqc: string × string → [0, 1] function per
    format, for testing equivalence of two values in
    a format (the default is a binary test for being
    exactly identical)
  • 0 or more trf: string → string functions linking
    formats, for transforming values from one format
    to another
  • A lightweight data model:
  • Contains only 3 kinds of functions (isa/eqc/trf)
  • These correspond to the operations that people
    had to keep performing manually in our studies.
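The data model can be written down as a few Python types. This is a minimal sketch assuming scores are floats in [0, 1] and that trf functions are keyed by (source format, target format); the class names, `best_format`, and the toy phone-number tope are hypothetical, and a real tope would usually have fuzzier isa functions than the regexp-based ones used here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable, Optional

Isa = Callable[[str], float]       # string -> [0, 1]
Eqc = Callable[[str, str], float]  # string x string -> [0, 1]
Trf = Callable[[str], str]         # string -> string


def exact_eqc(a: str, b: str) -> float:
    """Default eqc: a binary test for being exactly identical."""
    return 1.0 if a == b else 0.0


@dataclass
class Format:
    name: str
    isa: Isa                   # exactly 1 per format
    eqc: Optional[Eqc] = None  # 0 or 1 per format; None means exact_eqc

    def equivalent(self, a: str, b: str) -> float:
        return (self.eqc or exact_eqc)(a, b)


@dataclass
class Tope:
    name: str
    formats: dict[str, Format]
    # 0 or more trf functions, keyed by (source format, target format)
    trfs: dict[tuple[str, str], Trf] = field(default_factory=dict)

    def best_format(self, value: str) -> tuple[str, float]:
        """Return the format whose isa scores the value highest, with the score."""
        return max(((f.name, f.isa(value)) for f in self.formats.values()),
                   key=lambda pair: pair[1])

    def transform(self, value: str, src: str, dst: str) -> str:
        return self.trfs[(src, dst)](value)


def _pattern_isa(pattern: str) -> Isa:
    """Build a binary isa function from a regexp."""
    return lambda v: 1.0 if re.fullmatch(pattern, v) else 0.0


# A toy two-format tope, invented for illustration.
phone = Tope(
    name="US phone number (toy example)",
    formats={
        "dashes": Format("dashes", _pattern_isa(r"\d{3}-\d{3}-\d{4}")),
        "dots": Format("dots", _pattern_isa(r"\d{3}\.\d{3}\.\d{4}")),
    },
    trfs={
        ("dots", "dashes"): lambda v: v.replace(".", "-"),
        ("dashes", "dots"): lambda v: v.replace("-", "."),
    },
)
```

For instance, `phone.best_format("412.555.1212")` identifies the "dots" format, and `phone.transform` with `("dots", "dashes")` rewrites the value as 412-555-1212.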

24
Example tope: Notional representation
  • An example tope for CMU room numbers
  • 3 isa functions, up to 3 eqc functions, 4 trf
    functions
  • A tope's eqc and trf functions can be omitted if
    desired
[Diagram: the three formats of the room-number tope]
  • Formal building name + room number: Elliot Dunlap
    Smith Hall 225
  • Building abbreviation + room number: EDSH 225
  • Colloquial building name + room number: Smith 225
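The room-number tope can be sketched concretely. The following Python is a hypothetical illustration: the building table contains only the building named on the slide, the 0.5 score for a one-character near miss is an assumed fuzziness rule, the three isa functions are expressed as one function parameterized by format, and the choice of which format pairs the four trf functions link is a guess, since the slide does not list them.

```python
import re

# Hypothetical building table; only the building named on the slide is listed.
# Columns: formal name, abbreviation, colloquial name.
BUILDINGS = [("Elliot Dunlap Smith Hall", "EDSH", "Smith")]
FORMAL, ABBREV, COLLOQUIAL = 0, 1, 2


def _one_char_off(a: str, b: str) -> bool:
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def isa(value: str, column: int) -> float:
    """Score a value against one format: 1.0 for a known building,
    0.5 for a near miss (assumed rule), 0.0 otherwise."""
    m = re.fullmatch(r"(.+) (\d+)", value.strip())
    if m is None:
        return 0.0
    building = m.group(1)
    names = [row[column] for row in BUILDINGS]
    if building in names:
        return 1.0
    if any(_one_char_off(building, n) for n in names):
        return 0.5
    return 0.0


def trf(value: str, src: int, dst: int) -> str:
    """Rewrite the building part from the src format to the dst format."""
    building, room = value.strip().rsplit(" ", 1)
    for row in BUILDINGS:
        if row[src] == building:
            return f"{row[dst]} {room}"
    raise ValueError(f"unknown building: {building!r}")


# Four trf functions, assuming they link the formal format
# to each of the other two formats in both directions.
def formal_to_abbrev(v: str) -> str:
    return trf(v, FORMAL, ABBREV)


def abbrev_to_formal(v: str) -> str:
    return trf(v, ABBREV, FORMAL)


def formal_to_colloquial(v: str) -> str:
    return trf(v, FORMAL, COLLOQUIAL)


def colloquial_to_formal(v: str) -> str:
    return trf(v, COLLOQUIAL, FORMAL)
```

Applied to the inputs from the web form example, this scores EDSH 225 as 1.0, EDXH 225 as 0.5 (nearly valid), and Robotics Institute as 0.0 in the abbreviation format, while Smith 225 scores 0.0 as an abbreviation but 1.0 in the colloquial format, so it only needs reformatting.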
25
Proposed development environment: Functional
decomposition diagram
[Diagram components: development environment,
basic topes editor, repository software, plug-ins,
publishing tools, search tools]
EUPs implement topes in the basic topes editor (or
in JavaScript), then publish them in repositories.
Other EUs and EUPs search for topes, download
them, and then use them through plug-ins.
26
Proposed development environment: Enhanced basic
topes editor
[Functional decomposition diagram repeated; focus:
basic topes editor]
27
Proposed work: Enhancing the basic topes editor
  • Extend isa support:
  • Improve error message generation
  • Add trf support:
  • EUPs will specify a series of steps
  • Select a part, select an operator
  • Operators: permutation, lookup, arithmetic,
    capitalization
  • Add (regression) testing features to facilitate
    consistency
  • Add eqc support:
  • For each part, EUPs will specify a comparison
    operator returning a value in [0, 1]; the
    per-part values will be multiplied.
  • Operators: exactly identical, case-insensitive
    comparison, arithmetic distance, edit distance
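The proposed eqc scheme (one comparison operator per part, with the per-part scores multiplied) might look like this in Python. The operator implementations, the `scale` of 10 for arithmetic distance, and the normalization of edit distance by the longer string's length are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Callable, Sequence

Compare = Callable[[str, str], float]  # returns a value in [0, 1]


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Per-part comparison operators named on the slide; scaling choices are assumed.
def exactly_identical(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def case_insensitive(a: str, b: str) -> float:
    return 1.0 if a.casefold() == b.casefold() else 0.0


def arithmetic_distance(a: str, b: str, scale: float = 10.0) -> float:
    return max(0.0, 1.0 - abs(float(a) - float(b)) / scale)


def edit_distance(a: str, b: str) -> float:
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)


def eqc(parts_a: Sequence[str], parts_b: Sequence[str],
        operators: Sequence[Compare]) -> float:
    """Multiply the per-part scores; any part scoring 0 makes the pair 0."""
    if not (len(parts_a) == len(parts_b) == len(operators)):
        raise ValueError("each part needs exactly one operator")
    score = 1.0
    for a, b, op in zip(parts_a, parts_b, operators):
        score *= op(a, b)
    return score
```

For example, comparing the parts of "WeH 4623" and "weh 4623" with `[case_insensitive, exactly_identical]` yields 1.0. String-level operators alone cannot tell that WeH and Wean name the same building; that would take a lookup step like the one proposed for trf.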

28
Proposed development environment: Publishing tools
[Functional decomposition diagram repeated; focus:
publishing tools]
29
Proposed work: Publishing topes in repositories
  • Clients will have a list of known repository
    servers:
  • Generally pre-configured to include a global
    server at CMU
  • Organizations will configure clients to include
    the organizational server
  • EUs and EUPs will be able to add new servers to
    their list
  • To support publishing/searching, the repository
    will house meta-information about topes,
    including:
  • a human-visible, non-unique name and description
  • an internally used globally unique id (guid)
    based on the tope's URL in the repository
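One way to base a guid on a URL, offered here as an assumption rather than the proposed design, is a name-based UUID: Python's standard `uuid.uuid5` with the predefined `uuid.NAMESPACE_URL` namespace maps the same URL to the same id on every client. `TopeMetadata` and the example URLs are hypothetical.

```python
import uuid
from dataclasses import dataclass


@dataclass
class TopeMetadata:
    name: str          # human-visible; need not be unique
    description: str
    url: str           # the tope's location in its repository

    @property
    def guid(self) -> uuid.UUID:
        """Name-based UUID (version 5) derived from the URL, so every
        client computes the same id for the same repository entry."""
        return uuid.uuid5(uuid.NAMESPACE_URL, self.url)
```

Two topes may share a name such as "Room number" yet still receive distinct guids, because their repository URLs differ.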

30
Proposed development environment: Search tools
[Functional decomposition diagram repeated; focus:
search tools; additional label: "Normalization"]
31
Proposed work: Searching for relevant topes
  • Search by keyword:
  • Search tope names and descriptions
  • Also match based on words that are visually near
    to topes
  • Search by groups of people:
  • Within an organization, or by the author's email
    domain
  • Within spaces that are group-private
  • Search by groups of topes:
  • "If you liked this tope, you may also like XYZ"
  • Similar to Amazon.com's product recommendations
  • Search by example:
  • "Find me a tope that recognizes 412-555-1212"
  • For efficiency, filter based on a signature
    (\d{3}-\d{3}-\d{4})
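Search by example with a signature pre-filter could work roughly as below. This Python sketch assumes each catalog entry stores the signatures of its known example values, so that the comparatively expensive isa functions run only on entries whose signature matches the example's; `signature`, `CatalogEntry`, and the two-entry catalog are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from itertools import groupby
from typing import Callable


def signature(value: str) -> str:
    """Summarize a value as a regexp-like signature: each run of digits
    becomes \\d{n}; other characters are kept as-is (an assumed scheme)."""
    parts = []
    for is_digit, run in groupby(value, key=str.isdigit):
        text = "".join(run)
        parts.append(rf"\d{{{len(text)}}}" if is_digit else text)
    return "".join(parts)


@dataclass
class CatalogEntry:
    name: str
    signatures: set[str]          # precomputed from example values (assumed)
    isa: Callable[[str], float]


def search_by_example(example: str, catalog: list[CatalogEntry]) -> list[str]:
    """Cheap signature filter first, then run isa only on the survivors."""
    sig = signature(example)
    scored = [(e.name, e.isa(example)) for e in catalog if sig in e.signatures]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, score in scored if score > 0]


CATALOG = [
    CatalogEntry("US phone number", {r"\d{3}-\d{3}-\d{4}"},
                 lambda v: 1.0 if re.fullmatch(r"\d{3}-\d{3}-\d{4}", v) else 0.0),
    CatalogEntry("ISO date", {r"\d{4}-\d{2}-\d{2}"},
                 lambda v: 1.0 if re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) else 0.0),
]
```

With this catalog, the example 412-555-1212 has the signature \d{3}-\d{3}-\d{4}, so only the phone-number entry's isa function is ever called.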

32
Proposed work: Searching for trustworthy topes
Evidence [8] | EUs and EUPs may trust topes... | Search features
Explicit formal roles | Created by their organization's system administrators | Search by tope author
Prior performance | From people who have previously supplied good topes | Search by tope author
Model of motivation | From vendors that care about brand image | Search by tope author
Group membership | From people who are known to have a similar background | Search by tope author
Reputation | That earned anonymous votes of confidence | Search by tope ratings (either anonymous or not)
References | That present a list of high-profile people who like the topes | Search by tope ratings (either anonymous or not)
Certification | That are inspected and certified by a third party | Search by tope ratings (either anonymous or not)
Social context | That are actively maintained, that is, for which improved versions are regularly available; that are implemented in a familiar language/platform | Search by tope publication date and execution platform
33
Proposed development environment: Enhanced plug-ins
[Functional decomposition diagram repeated; focus:
plug-ins]
34
Proposed work: Enhancing plug-ins
  • Target tools:
  • Microsoft Excel
  • Microsoft Visual Studio.NET
  • Robofox
  • Operations supported:
  • Assertions: run isa on selected cells
  • Transformation: run trf on selected cells
  • De-duplication: run eqc on selected cells, then
    cluster the cells
  • Each will support both basic editor topes and
    JavaScript topes
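De-duplication in a plug-in could run eqc on every pair of selected cells and then cluster the cells. The Python sketch below assumes single-linkage clustering with a union-find structure and a threshold of 0.75; `cluster`, `name_eqc`, and the 0.8 score for nickname-like first names are hypothetical choices, not part of the proposal.

```python
from __future__ import annotations

from typing import Callable


def cluster(values: list[str], eqc: Callable[[str, str], float],
            threshold: float = 0.75) -> list[list[str]]:
    """Group values whose pairwise eqc score reaches the threshold
    (single-linkage clustering via union-find)."""
    parent = list(range(len(values)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if eqc(values[i], values[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())


def name_eqc(a: str, b: str) -> float:
    """Hypothetical eqc for 'Lastname, Firstname' values."""
    last_a, _, first_a = a.casefold().partition(", ")
    last_b, _, first_b = b.casefold().partition(", ")
    if (last_a, first_a) == (last_b, first_b):
        return 1.0
    if last_a == last_b and (first_a.startswith(first_b)
                             or first_b.startswith(first_a)):
        return 0.8
    return 0.0
```

Single linkage is simple but can chain: if A matches B and B matches C, all three land in one cluster even when A and C look dissimilar. It also makes O(n²) eqc calls, which is fine for a selection of cells but not for a large table.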

35
Proposed work: Recognizing exceptions in plug-ins
  • Tope creators might overlook some values.
  • From the standpoint of a tope format, these
    normal values are exceptional cases that need
    to be tolerated.
  • Simple approach: record a whitelist of exceptions
  • More sophisticated approach: for each format,
    record exceptions, infer a format (a new isa
    function), and average this function's score
    with the raw function's score
  • Exceptional values can be incorporated into the
    tope in the local cache and/or, at the EUP's
    discretion, propagated to the repository holding
    the tope's master copy

36
Outline
  • Introduction
  • Related work
  • Prototype
  • Proposed work
  • Evaluation

Examples, experiments, field testing
37
Evaluation
  • Expressiveness: Identify test tasks based on
    previous studies; create topes for the data
    involved in those tasks
  • Creation of topes by EUPs: Controlled experiment
    in which students and staff create topes
  • Usefulness for tasks: Controlled experiment in
    which students and staff use topes to perform the
    test tasks
  • Flexibility of topes: Test the topes created by
    participants on test data drawn from the EUSES
    spreadsheet corpus
  • Sharability of topes: Field testing in which
    several dozen students and staff will install and
    use the environment

38
Referenced papers
  • Conference papers
  • [1] C. Scaffidi. Unsupervised Inference of Data
    Formats in Human-Readable Notation. Proceedings
    of 9th International Conference on Enterprise
    Integration Systems (ICEIS'07), 2007, to appear.
  • [2] C. Scaffidi, K. Bierhoff, E. Chang, M.
    Felker, H. Ng, and C. Jin. Red Opal: Product-
    Feature Scoring from Reviews. Proceedings of 8th
    ACM Conference on Electronic Commerce (ACMEC'07),
    2007, to appear.
  • [3] C. Scaffidi, A. Cypher, S. Elbaum, A.
    Koesnandar, and B. Myers. Scenario-Based
    Requirements for Web Macro Tools. Submitted for
    publication, 2007.
  • [4] C. Scaffidi, A. Ko, B. Myers, and M. Shaw.
    Dimensions Characterizing Programming Feature
    Usage by Information Workers. VL/HCC'06:
    Proceedings of the 2006 IEEE Symposium on Visual
    Languages and Human-Centric Computing, pp. 59-62,
    2006.
  • [5] C. Scaffidi, M. Shaw, and B. Myers.
    Estimating the Numbers of End Users and End User
    Programmers. VL/HCC'05: Proceedings of the 2005
    IEEE Symposium on Visual Languages and
    Human-Centric Computing, pp. 207-214, 2005.
  • Other papers
  • [6] C. Scaffidi, B. Myers, and M. Shaw. The Topes
    Format Editor and Parser. Technical Report
    CMU-ISRI-07-104, School of Computer Science,
    Carnegie Mellon University, Pittsburgh, PA, May
    2007.
  • [7] C. Scaffidi, B. Myers, and M. Shaw. Trial by
    Water: Creating Hurricane Katrina "Person
    Locator" Web Sites. In Leadership at a Distance:
    Research in Technologically-Supported Work (S.
    Weisband, ed.), Lawrence Erlbaum, pp. 209-222,
    2007.
  • [8] C. Scaffidi and M. Shaw. Toward a Calculus of
    Confidence. First International Workshop on the
    Economics of Software and Computation, co-located
    with ICSE'07, 2007, to appear.
  • [9] C. Scaffidi, M. Shaw, and B. Myers. Games
    Programs Play: Obstacles to Data Reuse. 2nd
    Workshop on End User Software Engineering
    (WEUSE), 2006.

39
Thank You
  • to the symposium committee/panel for the
    opportunity to present
  • to many people for helpful suggestions
  • to NSF and EUSES for funding (ITR-0325273 and
    CCF-0438929)

Marwan Abi-Antoun Margaret Burnett Martin Erwig Andy Ko Mary Beth Rosson
Robin Abraham Owen Cheng George Fairbanks Thomas LaToza Mary Shaw
Matt Bass Ciera Christopher Thomas Green Alon Lavie Jeff Stylos
Nels Beckman Michael Coblenz Josh Gross Henry Lieberman Dean Sutherland
Kevin Bierhoff Allen Cypher Greg Hartman Larry Maccherone Steve Tanimoto
Alan Blackwell Uri Dekel Jim Herbsleb Brad Myers Susan Wiedenbeck
Barry Boehm Sebastian Elbaum John Hosking John Pane
40
Questions for the panel
  • Some areas where I would appreciate suggestions
  • What aspects of this work would be of most
    interest to the ICSE community (in future
    research papers)?
  • For any potential problems that you see in the
    work, what solutions can you suggest?

41
  • This slide intentionally left blank.

42
Survey of EUPs: Better data-manipulation features
needed
  • Asked 831 information workers about use of 23
    features in 5 tools (e.g., creating spreadsheet
    macros, database stored procedures, and web
    forms) [4, 9]
  • The most widely used features were related to
    manipulating linked structures of data (e.g.,
    database tables) rather than imperative or macro
    programming
  • Yet respondents complained about these features:
  • "Not always easy to move sturctured [sic] data or
    text"
  • "Not always integrated a lot of data manipulation
    redundant"
  • "Information entered inconsistently into database
    fields by different people leaves a lot of
    database cleaning"

43
Interviews of web site creators: Confirmation of
specific problems
  • Interviewed 6 people involved in creating person
    locator web sites after Hurricane Katrina
    [7, 9]
  • Many omitted data validation on web forms:
  • Hard to detect that "12 Years old" is an invalid
    street address (what would the regexp look like?)
  • Aggregator sites were built to scrape and
    consolidate data from numerous person locator
    sites:
  • Hard to transform data into a single consistent
    format
  • Hard to identify probable duplicates in the
    merged data set

44
Sample task: validating person names. Customizing
constraints in our prototype
  • The user can add/edit constraints

45
Benefits of the format editor
  • Exotic regexp notation is replaced with
    sentence-like screen prompts.
  • Soft constraints ("often") are supported.
  • Negation constraints ("never") are supported.
  • In terms of expressiveness:
  • Augmented context-free grammars > context-free
    grammars > regexps
  • But is the expressiveness adequate for common
    data?
