Making Mashups with Marmite

1
Making Mashups with Marmite
Jeff Wong, Jason I. Hong
Carnegie Mellon University
2
The Big Picture Problem
  • Lots of content out there on the web
  • But not always in a form amenable to your needs
  • Ex. Easy to get a list of hotels in San Jose, not
    so easy to sort them by distance to the convention
    center
  • Two observations:
  • In many cases, all of the data and services
    people need already exist, but are not connected
    together
  • Unlikely that any one web site can predict all
    possible needs

3
A Solution: Mashups
  • Rapidly growing community of users creating
    mashups combining content from multiple web
    sites
  • Ex. Housingmaps.com

4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
8
A Solution: Mashups
  • Rapidly growing community of users creating
    mashups combining content from multiple web
    sites
  • Ex. Housingmaps.com
  • Ex. MySpace child predators
  • Ex. Friendster locations
  • Ex. Most popular videos on YouTube, Yahoo Video, ...
  • ProgrammableWeb.com statistics:
  • 1,500 mashups created since April 2005
  • 356 open web-based APIs available

9
But Creating Mashups is Hard
  • Requires lots of skill to create a mashup
  • Ex. Housingmaps creator has a PhD in computer
    science
  • Ex. MySpace child predator list took months
  • Requires programming expertise in many areas
  • Web crawling
  • Text parsing
  • Pattern matching
  • Databases
  • HTML

10
Marmite: End-User Programming for Mashups
  • Main idea: make it easy to create web mashups
  • Uses a dataflow approach, connecting small
    operators
  • Inspired by Unix pipes and Apple's Automator
  • Example:
  • Get all events from Upcoming.org
  • Filter out events that are too old
  • Put them all onto a map
  • Runs inside a standard web browser
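The Upcoming.org example above can be pictured as a chain of small operators, each taking a list of rows and returning a new list, much like a Unix pipeline. This is a minimal Python sketch under stated assumptions: the sample events, the field names (`title`, `date`, `lat`, `lng`), the fixed cutoff date, and the `pipe` helper are all hypothetical, and the source returns canned rows instead of querying Upcoming.org (Marmite itself ran as JavaScript in the browser).

```python
from datetime import date
from functools import reduce
from typing import Callable

Row = dict
Operator = Callable[[list[Row]], list[Row]]


def pipe(*operators: Operator) -> Operator:
    """Compose operators left to right, like `a | b | c` in a Unix shell."""
    return lambda rows: reduce(lambda acc, op: op(acc), operators, rows)


def get_events(_: list[Row]) -> list[Row]:
    # Hypothetical stand-in for a source operator that queries Upcoming.org.
    return [
        {"title": "Book fair", "date": date(2007, 4, 20), "lat": 37.33, "lng": -121.89},
        {"title": "Jazz night", "date": date(2007, 3, 1), "lat": 37.34, "lng": -121.90},
    ]


def drop_older_than(cutoff: date) -> Operator:
    # Processor: keep only events on or after the cutoff date.
    return lambda rows: [r for r in rows if r["date"] >= cutoff]


def to_map_markers(rows: list[Row]) -> list[Row]:
    # Sink stand-in: reduce each row to what a map service would need.
    return [{"label": r["title"], "position": (r["lat"], r["lng"])} for r in rows]


flow = pipe(get_events, drop_older_than(date(2007, 4, 1)), to_map_markers)
markers = flow([])
print(markers)
```

Because every operator shares the same list-in, list-out shape, operators can be reordered or swapped without changing their neighbors, which is the property the Unix-pipe analogy relies on.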

11
Set of Operators
12
Data Flow View
13
Data View
14
Using Marmite (Envisioned)
  • Extract content from one or more web pages
  • names, addresses, dates, phone numbers, URLs
  • Process it in a data flow manner
  • filtering out values or adding metadata
  • integrating with other data sources (similar to a
    database join operation)
  • Direct the output to a variety of sinks
  • databases, map services, text files,
    visualizations, web pages, or source code that
    can be further edited
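Integrating with another data source, which the slide compares to a database join, can be sketched as an inner join on a shared column. In this hypothetical Python example, event rows carry a `venue` name and a second source maps venue names to street addresses; the column names and data are illustrative assumptions, not Marmite's actual operator.

```python
def join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner join: combine each left row with every right row sharing `key`."""
    index: dict = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})  # right-hand values win on name clashes
    return joined


events = [
    {"title": "Book fair", "venue": "Civic Center"},
    {"title": "Jazz night", "venue": "Blue Room"},
    {"title": "Poetry slam", "venue": "Unknown Hall"},
]
venues = [
    {"venue": "Civic Center", "address": "100 First St"},
    {"venue": "Blue Room", "address": "200 Second St"},
]

for row in join(events, venues, key="venue"):
    print(row["title"], "->", row["address"])
```

Rows with no match (here, the poetry slam) are dropped, as in SQL's `INNER JOIN`; a left join would keep them with an empty address instead.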

15
Marmite
  • Motivation and Examples
  • Features and Design Rationale
  • User Evaluation

16
Features and Design Rationale
  • Conducted a series of quick evaluations to
    understand the design space and potential problems
  • Automator
  • Lo-fi prototypes

17
Automator
18
Informal Automator Evaluation
  • Had three novices try three simple web-based
    tasks:
  • Warm-up task
  • Traverse a set of web pages
  • Download a set of images
  • Some findings:
  • Difficulty knowing how to start and what to do
    next
  • Little feedback about the state of the system
    between operations
  • Difficult to iterate due to slow network speeds

19
Lo-Fi Prototypes
  • 6 paper prototypes with 20 participants

20
Design Solutions
  • Problem: how to start and what to do next
  • Solution: suggest next actions
  • Weak data typing to infer types (addresses,
    numbers, etc.)
  • Filter operators to show only relevant ones
  • Suggest operators that might be applicable
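Weak data typing of this kind can be sketched with simple pattern tests over a column's values, after which only operators whose input type matches are suggested. The regular expressions, type names, 80% threshold, and operator catalog below are hypothetical illustrations, not Marmite's actual heuristics.

```python
import re

# Hypothetical patterns for a few weak types; real heuristics would be richer.
TYPE_PATTERNS = {
    "number": re.compile(r"^-?\d+(\.\d+)?$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "url": re.compile(r"^https?://\S+$"),
    "address": re.compile(r"^\d+\s+\w+.*\s(St|Ave|Blvd|Rd)\.?$", re.IGNORECASE),
}

# Hypothetical operator catalog: operator name -> required input type.
OPERATORS = {
    "Geocode": "address",
    "Filter by date": "date",
    "Sum": "number",
    "Fetch page": "url",
}


def guess_type(values: list[str], threshold: float = 0.8) -> str:
    """Return the first type matched by at least `threshold` of the values."""
    for type_name, pattern in TYPE_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v.strip()))
        if values and hits / len(values) >= threshold:
            return type_name
    return "text"


def suggest_operators(table: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each column to the operators that accept its guessed type."""
    suggestions = {}
    for column, values in table.items():
        col_type = guess_type(values)
        suggestions[column] = [op for op, t in OPERATORS.items() if t == col_type]
    return suggestions


table = {
    "where": ["100 First St", "200 Second Ave"],
    "when": ["2007-04-20", "2007-03-01"],
    "price": ["10", "12.50"],
}
print(suggest_operators(table))
```

The threshold makes the typing "weak": a column counts as addresses if most values look like addresses, so a few messy cells from a scraped page do not hide the suggestion.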

21
(No Transcript)
22
Design Solutions
  • Problem: little feedback about the state of the
    system between operations
  • Solution: link the data flow and data view
    together
  • Many systems take a program-centric view (ex.
    Automator) or a data-centric view (ex.
    spreadsheets)
  • Use a hybrid data flow / data view, showing an
    operation and its effects together
  • Data view is usually a spreadsheet; other views
    are possible too (for example, maps)
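The hybrid idea, showing each operation next to its effect, can be sketched by running the flow one operator at a time and keeping a snapshot of the table after every step, so a data view can display whichever step is selected. The step names, rows, and the `run_with_snapshots` helper are hypothetical.

```python
from typing import Callable

Operator = Callable[[list[dict]], list[dict]]


def run_with_snapshots(
    steps: list[tuple[str, Operator]], rows: list[dict]
) -> list[tuple[str, list[dict]]]:
    """Apply each named operator in order, recording the table it produces."""
    snapshots = []
    for name, op in steps:
        rows = op(rows)
        snapshots.append((name, [dict(r) for r in rows]))  # copy for display
    return snapshots


steps = [
    ("Load events", lambda _: [{"title": "Book fair", "days_away": 3},
                               {"title": "Expo", "days_away": 12}]),
    ("Keep this week", lambda rows: [r for r in rows if r["days_away"] <= 7]),
]

for name, table in run_with_snapshots(steps, []):
    # A UI would render `table` as a spreadsheet (or map) beside operator `name`.
    print(f"{name}: {len(table)} row(s)")
```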

23
(No Transcript)
24
(No Transcript)
25
Design Solutions
  • Problem: difficult to iterate due to slow network
    speeds
  • Solution: cache data and let people replay it
  • Reload, pause, play
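Caching can be sketched as a memoizing wrapper around a source's network fetch: replaying the flow reuses the stored response, while an explicit reload bypasses the cache. `CachedSource` and `slow_fetch` are hypothetical names, `slow_fetch` stands in for a real network request, and pause/play controls are not modeled.

```python
from typing import Callable


class CachedSource:
    """Wrap a fetch function so repeated runs reuse earlier responses."""

    def __init__(self, fetch: Callable[[str], list[dict]]):
        self._fetch = fetch
        self._cache: dict[str, list[dict]] = {}

    def get(self, url: str, reload: bool = False) -> list[dict]:
        # Only go to the network on a cache miss or an explicit reload.
        if reload or url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


calls = []


def slow_fetch(url: str) -> list[dict]:
    # Hypothetical network call; records each request so we can count them.
    calls.append(url)
    return [{"source": url, "run": len(calls)}]


source = CachedSource(slow_fetch)
source.get("http://example.com/events")               # network
source.get("http://example.com/events")               # replayed from cache
source.get("http://example.com/events", reload=True)  # network again
print(len(calls))
```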

26
Other Design Findings
  • Screen real estate issues
  • Collapsible operators, leaving a readable label

27
Extracting Generic Content
  • Can't have pre-defined extractor operators for
    every possible web site
  • Need a more general way of extracting data from
    pages
  • Developed a generic wizard UI for selecting links
  • Content from that set of links can then be
    extracted via other operators
  • Uses Solvent (MIT), an XPath-based algorithm for
    finding patterns in web pages
  • Finds groups of related web content based on
    the page's HTML structure
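The general idea of XPath-based pattern finding can be approximated as: take the path to one element the user picked, drop the position index at one step, and treat every element matching the generalized path as part of the same group. This is a simplified illustration using Python's standard `xml.etree.ElementTree` (which supports only a subset of XPath) on well-formed markup; it is an assumption about the general technique, not Solvent's actual algorithm.

```python
import xml.etree.ElementTree as ET

PAGE = """
<html><body>
  <h1>Listings</h1>
  <ul>
    <li><a href="/a">Apartment A</a></li>
    <li><a href="/b">Apartment B</a></li>
    <li><a href="/c">Apartment C</a></li>
  </ul>
</body></html>
"""


def indexed_path(root: ET.Element, target: ET.Element) -> list[tuple[str, int]]:
    """Return (tag, 1-based position among same-tag siblings) from root to target."""
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append((node.tag, same_tag.index(node) + 1))
        node = parent
    return list(reversed(steps))


def generalize(steps: list[tuple[str, int]], drop_at: int) -> str:
    """Build an ElementTree path, removing the position predicate at one step."""
    parts = [tag if i == drop_at else f"{tag}[{pos}]" for i, (tag, pos) in enumerate(steps)]
    return "./" + "/".join(parts)


root = ET.fromstring(PAGE.strip())
picked = root.find("./body/ul/li[2]/a")   # the link the user clicked
steps = indexed_path(root, picked)        # [('body', 1), ('ul', 1), ('li', 2), ('a', 1)]
pattern = generalize(steps, drop_at=2)    # './body[1]/ul[1]/li/a[1]'
matches = root.findall(pattern)
print([a.text for a in matches])
```

Here the step to generalize is chosen by hand; a fuller approach would try each step and prefer one that yields a sensible group of similar elements. Real web pages are rarely well-formed XML, so an HTML parser would be needed in practice.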

28
Marmite
29
Operators
  • Operators have input types
  • Operators use these to guess which columns they
    want
  • Operators have output types
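One way to read these bullets: an operator declares the weak type it consumes and the columns it produces, binds itself to the first column whose type matches, and appends its outputs. The `GeocodeOperator` below is hypothetical and uses a tiny hard-coded lookup table instead of a real geocoding service; the column types are assumed to have been inferred already (the weak data typing mentioned earlier).

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lookup standing in for a real geocoding web service.
FAKE_GEOCODER = {
    "100 First St": (37.330, -121.890),
    "200 Second Ave": (37.340, -121.880),
}


@dataclass
class GeocodeOperator:
    input_type: str = "address"
    output_columns: tuple = ("latitude", "longitude")
    bound_column: Optional[str] = field(default=None, init=False)

    def bind(self, column_types: dict[str, str]) -> str:
        """Guess the input column: the first one whose weak type matches."""
        for column, col_type in column_types.items():
            if col_type == self.input_type:
                self.bound_column = column
                return column
        raise ValueError(f"no column of type {self.input_type!r}")

    def run(self, rows: list[dict]) -> list[dict]:
        if self.bound_column is None:
            raise RuntimeError("call bind() before run()")
        out = []
        for row in rows:
            coords = FAKE_GEOCODER.get(row[self.bound_column], (None, None))
            # Append the declared output columns to each row.
            out.append({**row, **dict(zip(self.output_columns, coords))})
        return out


column_types = {"name": "text", "where": "address"}
rows = [{"name": "Book fair", "where": "100 First St"}]

geocode = GeocodeOperator()
print(geocode.bind(column_types))   # binds to the "where" column
print(geocode.run(rows))
```

Declaring output types as well lets the next operator in the flow repeat the same guess, for example a map sink looking for `latitude` and `longitude` columns.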

30
Implementation
  • JavaScript for the underlying code and XML
    Binding Language (XBL) for the UI
  • Operators are currently written in JavaScript
  • Ideally, operators could be written in any
    programming language
  • Currently 15 operators

31
Marmite
  • Motivation and Examples
  • Features and Design Rationale
  • User Evaluation

32
Evaluation
  • Informal user study with 6 people
  • 2 novices
  • 2 people with spreadsheet experience (formulas)
  • 2 people with programming experience
  • Tasks (in increasing order of difficulty):
  • Warm-up task showing how to retrieve a set of
    addresses and how to geocode an address
  • Search for and filter out events further than a
    week away
  • Compile a list of events from two event services
    and plot them on a map
  • Re-create the Housingmaps site
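The third task, combining events from two services and plotting them, can be sketched as normalizing each service's records into one schema, concatenating them, and emitting GeoJSON, a format many web map libraries accept. Both input formats and their field names are invented for illustration; real event services each return their own formats.

```python
import json

# Hypothetical records from two event services with different field names.
service_a = [{"name": "Book fair", "lat": 37.33, "lon": -121.89}]
service_b = [{"event_title": "Expo", "location": {"latitude": 37.35, "longitude": -121.91}}]


def normalize_a(rec: dict) -> dict:
    return {"title": rec["name"], "lat": rec["lat"], "lng": rec["lon"]}


def normalize_b(rec: dict) -> dict:
    loc = rec["location"]
    return {"title": rec["event_title"], "lat": loc["latitude"], "lng": loc["longitude"]}


def to_geojson(events: list[dict]) -> dict:
    """Build a GeoJSON FeatureCollection; GeoJSON orders coordinates [lng, lat]."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [e["lng"], e["lat"]]},
                "properties": {"title": e["title"]},
            }
            for e in events
        ],
    }


events = [normalize_a(r) for r in service_a] + [normalize_b(r) for r in service_b]
print(json.dumps(to_geojson(events), indent=2))
```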

33
Results
  • Three people were able to complete all tasks in
    1 hour
  • First two users were confused by suggested
    actions (these popped up automatically, so they
    were made manual for the other 4 users)
  • Novice made some progress, but was not able to
    finish all tasks
  • Able to re-create Housingmaps in 15 minutes

34
Marmite
35
More Results
  • The biggest barrier was understanding the data flow
  • Did not understand the input and output concept
  • Applied operators as one-off commands; did not
    realize that the data flow was a static
    representation of the program
  • Did not understand that the data flow and data
    view were linked

36
Future Directions
  • Short-term
  • Better screen-scraping operators
  • More operators
  • Better connection with web services (WSDL and
    REST)
  • Better help for starting a data flow
  • Long-term
  • Intelligence analysis
  • Better visualizations
  • Location-based services

37
Conclusions
  • Marmite, a tool for creating web-based mashups
  • Extract content from one or more web pages
  • Process it in a data flow manner
  • Direct the output to a variety of sinks
  • Hybrid data flow / data view
  • User evaluation shows some promising results
  • Jeff Wong, Jason Hong. Making Mashups with
    Marmite: Re-purposing Web Content through
    End-User Programming. CHI 2007

38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
Marmite
42
Types of Operators
  • Sources
  • Add data into Marmite by querying databases,
    extracting information from web pages, and so on.
  • Processors
  • Modify, combine, or delete existing rows. Example
    operators include geocoding (converting street
    addresses to latitude and longitude) and
    filtering. Processor operators might add or
    remove columns as well.
  • Sinks
  • Redirect the flow of data out of Marmite.
    Examples include showing data on a map, saving it
    to a file, or writing it to a web page.
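The three operator roles can be sketched as small base classes plus a runner that checks a flow starts with a source and ends with a sink. The class names, the `DropColumns` processor, and the CSV sink writing to an in-memory buffer are illustrative assumptions rather than Marmite's actual classes.

```python
import csv
import io


class Source:
    def produce(self) -> list[dict]:
        raise NotImplementedError


class Processor:
    def process(self, rows: list[dict]) -> list[dict]:
        raise NotImplementedError


class Sink:
    def consume(self, rows: list[dict]) -> None:
        raise NotImplementedError


class ListSource(Source):
    """Adds rows into the flow (stand-in for a web page or database query)."""

    def __init__(self, rows: list[dict]):
        self.rows = rows

    def produce(self) -> list[dict]:
        return [dict(r) for r in self.rows]


class DropColumns(Processor):
    """A processor that removes the named columns from every row."""

    def __init__(self, *names: str):
        self.names = set(names)

    def process(self, rows: list[dict]) -> list[dict]:
        return [{k: v for k, v in r.items() if k not in self.names} for r in rows]


class CsvSink(Sink):
    """Redirects rows out of the flow as CSV text."""

    def __init__(self, out: io.StringIO):
        self.out = out

    def consume(self, rows: list[dict]) -> None:
        if not rows:
            return
        writer = csv.DictWriter(self.out, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def run_flow(source: Source, processors: list[Processor], sink: Sink) -> None:
    # Enforce the source -> processors -> sink shape at runtime.
    if not isinstance(source, Source) or not isinstance(sink, Sink):
        raise TypeError("a flow must start with a Source and end with a Sink")
    rows = source.produce()
    for p in processors:
        rows = p.process(rows)
    sink.consume(rows)


buffer = io.StringIO()
run_flow(
    ListSource([{"title": "Book fair", "address": "100 First St", "internal_id": 7}]),
    [DropColumns("internal_id")],
    CsvSink(buffer),
)
print(buffer.getvalue())
```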