Autocompletion for Mashups - PowerPoint PPT Presentation

About This Presentation
Title:

Autocompletion for Mashups

Description:

Problems with the algorithm. The number of lists the algorithm accesses is very large ... Distributed environment. Incorporating context and user preference ...

Slides: 28
Provided by: ohadgre

Transcript and Presenter's Notes


1
Autocompletion for Mashups
  • Ohad Greenshpan, Tova Milo, Neoklis Polyzotis

Tel-Aviv University, UCSC
2
Talk Roadmap
  • Introduction on Mashups and Autocompletion
  • Problem Definition
  • The Algorithm
  • Implementation and Experiments
  • Conclusions and Related Work

3
Introduction - What is a mashup?
  • A mashup is a technology for integrating data,
    services, and applications available on the web
    into a single application.

4
Application Integration
[Figure labels: GUI, Logic, Data]
5
Mashup Development is difficult ...
6
[Figure labels: knowledge, ?, knowledge]
7
Introduction - Mashup Autocompletion
8
The Mashup Model
9
Inheritance
[Figure labels: A, B]
10
Mashup Autocompletion - Problem Definition
  • Given a database of mashlets and glue patterns
    (GPs) and a set of mashlets selected by the
    user, identify and rank the GPs that link a
    subset of the selected mashlets.
  • Ranking is based on popularity and relevance to
    the user query.
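
The problem statement above can be sketched in Python. This is a minimal illustration and not the authors' implementation: the `GluePattern` class, its `popularity` field, and the `candidate_gps` helper are hypothetical names. Requiring a GP to link at least two selected mashlets, and ranking by overlap first and popularity second, are simplifying assumptions standing in for "popularity and relevance to the user query".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GluePattern:
    name: str
    mashlets: frozenset  # mashlets that this GP links together
    popularity: float    # assumed convention: higher means more popular


def candidate_gps(database, user_mashlets, k):
    """Return up to k GPs that link at least two of the user's mashlets.

    Assumed ranking: larger overlap with the selection first,
    then higher popularity, then name as a deterministic tie-break.
    """
    selected = frozenset(user_mashlets)
    candidates = [gp for gp in database if len(gp.mashlets & selected) >= 2]
    candidates.sort(
        key=lambda gp: (-len(gp.mashlets & selected), -gp.popularity, gp.name)
    )
    return candidates[:k]


if __name__ == "__main__":
    db = [
        GluePattern("g1", frozenset({"map", "hotels"}), 0.9),
        GluePattern("g2", frozenset({"map", "hotels", "weather"}), 0.5),
        GluePattern("g3", frozenset({"news", "stocks"}), 0.99),
    ]
    for gp in candidate_gps(db, {"map", "hotels", "weather"}, k=2):
        print(gp.name, gp.popularity)
```

Note that `g3` is never suggested despite its high popularity, because it links none of the selected mashlets.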

What would be the ideal GP?
  • The most popular one that connects only the user
    mashlets and nothing else
  • Relaxations:
    • Less popular
    • Connects variants of the user mashlets
    • Connects a subset of the user mashlets
    • Connects additional mashlets

11
Inheritance
12
Problem Abstraction
  • Each glue pattern is represented as a point in a
    multidimensional space.
  • One dimension represents the GP popularity.
  • The remaining dimensions represent all mashlets:
    1) user mashlets, 2) other mashlets.
  • The algorithm's goal is to find the top-k GPs that
    link the given user mashlets (the ones close to
    the optimal GP).
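
One way to picture this abstraction is sketched below in Python. The slides do not give the coordinate encoding, so everything here is an assumption made for illustration: each dimension holds a penalty in [0, 1], the optimal GP sits at the origin, popularity is normalized to [0, 1], and the distance is a plain sum of penalties. The names `gp_point` and `distance_to_optimal` are hypothetical.

```python
def gp_point(gp_mashlets, popularity, user_mashlets, all_mashlets):
    """Map a GP to one penalty per dimension; the optimal GP maps to all zeros.

    Assumed encoding (not taken from the slides):
      - popularity dimension: 1 - popularity, with popularity in [0, 1]
      - user mashlet dimension: 1 if the GP does NOT link it, else 0
      - other mashlet dimension: 1 if the GP links it (an extra), else 0
    """
    point = {"popularity": 1.0 - popularity}
    for mashlet in sorted(all_mashlets):
        if mashlet in user_mashlets:
            point[mashlet] = 0.0 if mashlet in gp_mashlets else 1.0
        else:
            point[mashlet] = 1.0 if mashlet in gp_mashlets else 0.0
    return point


def distance_to_optimal(point):
    """Sum of penalties: a monotonic aggregate, smaller is better."""
    return sum(point.values())


if __name__ == "__main__":
    all_mashlets = {"m1", "m2", "m3"}
    user = {"m1", "m2"}
    optimal = gp_point({"m1", "m2"}, 1.0, user, all_mashlets)
    relaxed = gp_point({"m1", "m3"}, 0.5, user, all_mashlets)
    print(distance_to_optimal(optimal), distance_to_optimal(relaxed))
```

Under this encoding each relaxation (less popular, a missing user mashlet, an additional mashlet) moves the point away from the origin along its own dimension.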

[Figure axes: GP Popularity, m1, m2]
13
Data Structure and Basic Top-k Algorithm
[Figure labels: GP Popularity, Mashlets, Glue Patterns]
Sorted lists of <gp,score> entries:
  L0: <g1,0.1> <g2,0.2> <g3,0.4> <g4,0.4> <g5,0.4> <g6,0.4> <g7,0.4>
  L1: <g7,0.1> <g4,0.2> <g6,0.2> <g1,0.3> <g5,0.4> <g2,0.5> <g3,0.7>
  L2: <g4,0.1> <g3,0.2> <g1,0.5> <g2,0.5> <g7,0.5> <g5,0.8> <g6,0.8>
  L3: <g1,0.1> <g2,0.6> <g7,0.6> <g6,0.7> <g4,0.8> <g5,0.8> <g3,0.9>
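
The slide does not spell out the basic top-k algorithm, so the following Python sketch assumes a Fagin-style threshold algorithm: round-robin sorted access over the lists, random access to complete the total of each newly seen GP, and a stopping threshold built from the last scores read. It further assumes that the aggregate is a plain sum, that lower scores are better (the lists are sorted in ascending order), and that every GP appears in every list. The name `threshold_top_k` is illustrative.

```python
LISTS = {
    "L0": [("g1", 0.1), ("g2", 0.2), ("g3", 0.4), ("g4", 0.4),
           ("g5", 0.4), ("g6", 0.4), ("g7", 0.4)],
    "L1": [("g7", 0.1), ("g4", 0.2), ("g6", 0.2), ("g1", 0.3),
           ("g5", 0.4), ("g2", 0.5), ("g3", 0.7)],
    "L2": [("g4", 0.1), ("g3", 0.2), ("g1", 0.5), ("g2", 0.5),
           ("g7", 0.5), ("g5", 0.8), ("g6", 0.8)],
    "L3": [("g1", 0.1), ("g2", 0.6), ("g7", 0.6), ("g6", 0.7),
           ("g4", 0.8), ("g5", 0.8), ("g3", 0.9)],
}


def threshold_top_k(lists, k):
    """Return the k GPs with the smallest summed score as (gp, total) pairs.

    `lists` is a sequence of equally long lists of (gp, score) pairs,
    each sorted by ascending score and each containing every GP.
    """
    lookup = [dict(entries) for entries in lists]  # random-access index
    totals = {}

    def best_k():
        return sorted(totals.items(), key=lambda item: (item[1], item[0]))[:k]

    for row in zip(*lists):  # one round of sorted access, one entry per list
        for gp, _ in row:
            if gp not in totals:
                # Random access: fetch this GP's score from every list.
                totals[gp] = sum(scores[gp] for scores in lookup)
        # No unseen GP can have a total below the sum of the scores just read.
        threshold = sum(score for _, score in row)
        best = best_k()
        if len(best) == k and best[-1][1] <= threshold:
            return best
    return best_k()


if __name__ == "__main__":
    for gp, total in threshold_top_k(list(LISTS.values()), k=2):
        print(gp, round(total, 2))
```

On the lists above with k = 2, the loop stops after the third round of sorted access, before the remaining entries of any list are read.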
14
Problems with the algorithm
  • The number of lists the algorithm accesses is
    very large
  • Most of the mashlet lists are unrelated to the
    user selection (query)

15
Data Structure
[Figure labels: Mashlets, GP Popularity, User mashlets, Glue Patterns]
16
Algorithm
17
Correctness of AC - Lemma
  • Theorem 4.1: Algorithm AC returns a correct
    solution.
  • Proof is based on a lemma showing that any
    candidate that has not been encountered by AC,
    has a total score lower than the threshold.

Optimality of AC
  • Competing Algorithms
  • C: the class of deterministic algorithms that
    operate under the same access model as AC.
  • Algorithms receive as input the lists, the
    monotonic function, and k.
  • Algorithms can use any order (i.e., not
    specifically round-robin) and any thresholding
    scheme, and can rely on accessed elements.
  • Instance Optimality
  • AC is instance optimal within class C if there
    are constants $c$ and $c_0$ such that for every
    input instance $I$,
    $\mathrm{cost}(AC, I) \le c \cdot \mathrm{cost}(A, I) + c_0$
    for any $A \in C$.

18
Calculating Popularity - Glue Pattern and Mashlet Rank
  • PageRank-style algorithm
  • Takes into account the popularity of mashlets and
    GPs, as well as the relationships between them.
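
The slides do not give the ranking formula, so the Python sketch below only illustrates the general idea under stated assumptions: mashlets and GPs are nodes of one undirected graph, an edge joins a GP to each mashlet it links, and scores come from standard PageRank power iteration with a damping factor of 0.85. The function name `rank_nodes` and the edge-list input format are hypothetical.

```python
def rank_nodes(edges, damping=0.85, iterations=50):
    """PageRank by power iteration over an undirected mashlet/GP graph.

    `edges` is an iterable of (gp, mashlet) pairs. Every node has at least
    one neighbor, so no special handling of dangling nodes is needed.
    """
    neighbors = {}
    for gp, mashlet in edges:
        neighbors.setdefault(gp, set()).add(mashlet)
        neighbors.setdefault(mashlet, set()).add(gp)

    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbors:
            # Each neighbor splits its current rank evenly among its links.
            incoming = sum(rank[nb] / len(neighbors[nb]) for nb in neighbors[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


if __name__ == "__main__":
    edges = [
        ("gp1", "map"), ("gp1", "hotels"),
        ("gp2", "map"), ("gp2", "weather"),
        ("gp3", "map"),
    ]
    for node, score in sorted(rank_nodes(edges).items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")
```

In this toy graph a mashlet used by many GPs ranks above one used by a single GP, and a GP that links more mashlets ranks above one that links fewer, which is the mutual reinforcement the bullet describes.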

[Figure labels: GP, M]
19
Implementation
[Figure labels: IBM Mashup Center, WebSphere Application Server, Knowledge base, MatchUp Algorithm]
20
Experiments (synthetic dataset)
  • Synthetic dataset for large-scale experiments
  • Generated a DB of 40k mashlets and GPs
    (ProgrammableWeb has 4k)
  • Based on ProgrammableWeb characteristics.
  • Experiments for synthetic dataset
  • Varying the number of total mashlets and GPs
  • Varying k
  • Varying the number of user mashlets
  • Varying GP complexity

21
Results (synthetic dataset)
GP Complexity 5, varying k
22
Results (synthetic dataset)
GP Complexity 10, varying k
23
Results (synthetic dataset)
Varying the number of user mashlets
24
Experiments (real dataset)
  • Real dataset
  • Used real-life mashlets from ProgrammableWeb and
    IBM Mashup Center
  • Scenario: development of a travel-related mashup
  • Experiments for quality assessment
  • IBM Mashup Center as the mashup platform
  • Users placed mashlets
  • MatchUp offered top-10 GPs for their mashlets
  • Users searched for alternatives
  • Results
  • User satisfaction was high
  • High correlation between suggestions and users'
    lists
  • Browsing for additional results was in general
    unsuccessful
  • Gluing process was significantly expedited

25
Related Work
  • Autocompletion in many other domains
  • Phrase Prediction (Nandi and Jagadish, VLDB 2007)
  • File locations (Myers, CHI 2000)
  • Web service composition
  • Model for WS composition (Berardi et al., VLDB
    2005)
  • Optimized and customized algorithm (McIlraith and
    Son, KR 2002)
  • Mashup assembly tools
  • MashMaker (Ennals and Garofalakis, SIGMOD 2007):
    data -> widgets
  • MashupAdvisor (Elmeleegy et al., ICWS 2008):
    mashup -> output recomm. -> assembly to achieve
    this output
