How Could We All Get Along on the Web 2.0? The Power of Structured Data on the Web - PowerPoint PPT Presentation

About This Presentation
Slides: 53
Provided by: dbirdayS

Transcript and Presenter's Notes



1
How Could We All Get Along on the Web 2.0? The
Power of Structured Data on the Web
  • Sihem Amer Yahia
  • Yahoo! Research

2
Outline
  • Web search and web 2.0 search
  • Why should we all get along?
  • How could we all get along?
  • Related work
  • Conclusion

3
Web search
  • Access to heterogeneous, distributed
    information
  • Heterogeneous in creation
  • Heterogeneous in motives
  • Keyword search is very effective at connecting
    people to information

[Figure: search over web pages]
4
Web search vs. Web 2.0 search?
[Figure: Web search (anonymous users, content creators, content aggregators) vs. Web 2.0 search (subscribers, feeds)]
5
  • Web 2.0: a generation of internet-based
    services that
  • let people form online communities
  • in order to collaborate
  • and share information
  • in previously unavailable ways

6
Online communities
  • Subscribers join communities where they
  • exchange content (emails, comments, tags)
  • rate content from other subscribers
  • exhibit common behavior
  • About 500M unique Y! visitors per month, about
    200M subscribers (logged-in visitors) to more than
    130 Y! services

7
Web 2.0 search
Connecting people to people
[Figure: Web 2.0 services: Flickr, Y! Answers, YouTube, Y! Groups]
8
Web 2.0 search examples
  • Mary is a professional photographer and is
    looking for aerial photos of the Hoggar desert
  • She is also an amateur jazz dancer and wants to
    ask about dance schools with flexible schedules in
    SF
  • She is also looking for the latest video on bird
    migration in Central Park, NY
  • She has heart problems but loves biking and is
    interested in finding email discussions about
    biking trails in northern California

9
Outline
  • Web search and web 2.0 search
  • Why should we all get along?
  • How could we all get along?
  • Related work
  • Conclusion

10
Improving user experience
  • Keyword search should remain simple and
    intuitive
  • Keyword queries are usually short
  • They express only a small fraction of the user's true
    intent
  • Users' interactions within community-based
    systems can be used to infer much more about
    their intent and return better answers

11
Why should we all get along?
  • Contributed content is structured
  • This is what the DB community knows how to do best
  • Relevance to query keywords is key
  • This is what the IR community knows how to do best

12
Searching online communities

Community relationship table
sub   sub   trust
s1    s3    c13
s1    s4    c14
s2    s3    c23
s4    s6    c46

Data table
id    author  date
001   s2      1/1/06
002   s4      1/8/06
003   s4      3/9/06

Tags, ratings, reviews table
id    sub   annotation
001   s2    1/1/06
002   s4    1/8/06
003   s4    3/9/06
13
Searching online communities
  • Search for the most relevant data on some topic
  • Querying data: selection over the data table
  • Querying annotations: selection over the annotation
    table, joined with the data table
  • Personalizing answers: join with the subscribers table
  • Relevance: use data relevance and the annotation table
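The selection and join steps above can be sketched in SQL. This is a minimal illustration, not the system described in the talk: the table and column names (data, annotation, community, src, dst, trust), the sample rows, and the numeric trust values standing in for c13, c14, c23, c46 are all hypothetical, and scoring an item by the summed trust of its annotators is an assumption.

```python
import sqlite3

# Hypothetical schema mirroring the slide's three tables:
# data (content items), annotation (tags, ratings, reviews),
# community (subscriber, subscriber, trust).
SCHEMA = """
CREATE TABLE data       (id TEXT PRIMARY KEY, author TEXT, date TEXT);
CREATE TABLE annotation (id TEXT, sub TEXT, annotation TEXT);
CREATE TABLE community  (src TEXT, dst TEXT, trust REAL);
"""


def build_db():
    """Create an in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
        ("001", "s2", "2006-01-01"), ("002", "s4", "2006-01-08"),
        ("003", "s4", "2006-03-09")])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", [
        ("001", "s3", "jazz"), ("002", "s6", "jazz"), ("003", "s5", "biking")])
    # Illustrative numbers standing in for the symbolic c13, c14, c23, c46.
    conn.executemany("INSERT INTO community VALUES (?, ?, ?)", [
        ("s1", "s3", 0.9), ("s1", "s4", 0.5), ("s2", "s3", 0.7), ("s4", "s6", 0.8)])
    return conn


def search(conn, tag):
    """Querying annotations: select data items annotated with `tag`."""
    return conn.execute(
        "SELECT DISTINCT d.id FROM data AS d "
        "JOIN annotation AS a ON a.id = d.id "
        "WHERE a.annotation = ? ORDER BY d.id", (tag,)).fetchall()


def personalized_search(conn, tag, me):
    """Personalizing answers: keep annotations made by subscribers `me`
    has a relationship with, scoring each item by the summed trust."""
    return conn.execute(
        "SELECT d.id, d.author, SUM(c.trust) AS score "
        "FROM data AS d "
        "JOIN annotation AS a ON a.id = d.id "
        "JOIN community AS c ON c.dst = a.sub "
        "WHERE a.annotation = ? AND c.src = ? "
        "GROUP BY d.id, d.author ORDER BY score DESC", (tag, me)).fetchall()


conn = build_db()
print(search(conn, "jazz"))                     # [('001',), ('002',)]
print(personalized_search(conn, "jazz", "s1"))  # [('001', 's2', 0.9)]
```

The non-personalized query returns both jazz-tagged items; joining with the community table drops item 002 for subscriber s1, because s1 has no relationship with its annotator s6.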

14
Why should we all get along?
  • Query interpretation depends on subscribers'
    interests at the time of querying
  • Data annotations are dynamic
  • Precompute all (sub,sub,trust) for each topic?
  • Need for dynamic query generation

15
DB and IR
  • Shared interactions help focus search
  • User input, community input, extraction
  • Personalizing answers with community information
  • Ranking as a combination of
  • Relevance
  • Relationship strengths between people in the same
    community
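A ranking that combines relevance with relationship strength could be as simple as a weighted sum. The sketch below assumes a linear blend with a hypothetical weight `alpha`, relevance scores already in [0, 1], and a `trust` dictionary keyed by (subscriber, subscriber) pairs; the talk does not specify the combination function.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    author: str
    relevance: float  # content relevance to the query keywords, assumed in [0, 1]


def rank(items, trust, me, alpha=0.5):
    """Order items by alpha * relevance + (1 - alpha) * strength(me, author).

    `trust` maps (subscriber, subscriber) pairs to a strength in [0, 1];
    pairs that are missing count as 0. The linear blend is an assumption.
    """
    def score(item):
        strength = trust.get((me, item.author), 0.0)
        return alpha * item.relevance + (1 - alpha) * strength
    return sorted(items, key=score, reverse=True)


items = [Item("a", "s3", 0.6), Item("b", "s9", 0.8)]
trust = {("s1", "s3"): 0.9}
print([i.item_id for i in rank(items, trust, "s1")])             # ['a', 'b']
print([i.item_id for i in rank(items, trust, "s1", alpha=1.0)])  # ['b', 'a']
```

With `alpha=1.0` the ranking falls back to pure content relevance; lowering `alpha` lets a strong relationship with the author outweigh a small relevance gap.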

16
Outline
  • Web search and web 2.0 search
  • Why should we all get along?
  • How could we all get along?
  • Applications
  • Technical challenges
  • Related work
  • Conclusion

17
Applications
  • Flickr enables sharing and tagging photos
  • Y! Answers enables asking and answering questions
    in natural language
  • YouTube enables sharing videos, rating videos,
    commenting on videos and subscribing to new
    videos from favorite users
  • Y! Groups enables creating groups, joining
    existing groups, and posting to a group

18
Flickr
  • Acquired by Y! in 2005
  • Tag search
  • Photos grouped into categories
  • Privacy levels can be set on each photo

19
(No Transcript)
20
(No Transcript)
21
The new inputs to Flickr search
Users tag and rate photos
  • Combine tag-based search with community knowledge
  • Combine photo rating with relationship strength

Users tagging the same photos with similar tags form
a community of interest
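One way to make "communities of interest" concrete for tagging data is to link two users when the tags they put on the same photos are similar, then take connected components. Jaccard similarity, averaging over shared photos, the `threshold` value, and the function names are assumptions chosen for illustration.

```python
from itertools import combinations


def tag_similarity(tags_a, tags_b):
    """Jaccard similarity of two tag sets."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def communities_of_interest(taggings, threshold=0.5):
    """Group users who tag the same photos with similar tags.

    taggings: dict user -> dict photo -> set of tags.
    Two users are linked when the mean Jaccard similarity of their tags over
    the photos they both tagged reaches `threshold`; communities are the
    connected components of the resulting graph.
    """
    users = sorted(taggings)
    neighbors = {u: set() for u in users}
    for u, v in combinations(users, 2):
        shared = taggings[u].keys() & taggings[v].keys()
        if not shared:
            continue
        sim = sum(tag_similarity(taggings[u][p], taggings[v][p])
                  for p in shared) / len(shared)
        if sim >= threshold:
            neighbors[u].add(v)
            neighbors[v].add(u)
    # Connected components via iterative depth-first search.
    seen, groups = set(), []
    for u in users:
        if u in seen:
            continue
        stack, group = [u], set()
        while stack:
            x = stack.pop()
            if x in group:
                continue
            group.add(x)
            stack.extend(neighbors[x] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


taggings = {
    "ann": {"p1": {"desert", "aerial"}, "p2": {"sahara"}},
    "bob": {"p1": {"desert", "aerial", "hoggar"}},
    "cat": {"p3": {"jazz"}},
    "dan": {"p1": {"beach"}},
}
print(communities_of_interest(taggings))  # [['ann', 'bob'], ['cat'], ['dan']]
```

Dan tagged the same photo as Ann and Bob but with unrelated tags, so he is not linked to them.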
22
Y! Answers
  • Launched in second half of 2005
  • Incentive system based on points and voting for
    best answers
  • Questions grouped by category
  • Some statistics
  • over 60 million users
  • over 120 million answers
  • available in 18 countries and 6 languages

23
(No Transcript)
24
Y! Answers
25
Y! Answers
26
The new inputs to Y!Answers search
Users provide questions and answers
Combine community information with answer rating
Voting information reflects communities of
interest
27
YouTube
  • Founded in February 2005
  • Tag search
  • Videos grouped by category
  • Some statistics
  • 100 million views/day
  • 65,000 new videos/day

28
(No Transcript)
29
(No Transcript)
30
The new inputs to YouTube search
Users provide videos, tags, ratings, comments
Combine community information with video rating
Similar tags on the same videos imply communities of
interest
31
Yahoo! Groups
  • Yahoo! acquired eGroups in 2000
  • Group moderators
  • Groups belong to categories
  • Public and private groups
  • Some statistics
  • over 7M groups
  • over 190M subscribers
  • over 100K new subscribers/day
  • over 12M emails/day

32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
Alternative query interpretations
  • Return all group postings relevant to a query.
  • Return only postings by subscribers sharing the
    same interests, e.g., women with heart disease
    interested in steep slopes
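The two interpretations differ only by an extra filter on the author's profile. In this sketch the keyword matching (every keyword must appear as a whitespace-separated word), the posting dictionaries, and the `profiles` mapping from subscriber to a set of interest labels are hypothetical.

```python
def matches(text, keywords):
    """True if every keyword occurs as a whitespace-separated word in text."""
    words = set(text.lower().split())
    return all(k.lower() in words for k in keywords)


def relevant_postings(postings, keywords):
    """Interpretation 1: all postings relevant to the query."""
    return [p["id"] for p in postings if matches(p["text"], keywords)]


def community_postings(postings, keywords, profiles, interests):
    """Interpretation 2: relevant postings whose author shares all `interests`."""
    return [p["id"] for p in postings
            if matches(p["text"], keywords)
            and interests <= profiles.get(p["author"], set())]


postings = [
    {"id": 1, "author": "s1", "text": "steep biking trails near Tahoe"},
    {"id": 2, "author": "s2", "text": "biking trails for beginners"},
    {"id": 3, "author": "s3", "text": "jazz classes in SF"},
]
profiles = {"s1": {"heart disease", "steep slopes"}, "s2": {"racing"}}
print(relevant_postings(postings, ["biking", "trails"]))   # [1, 2]
print(community_postings(postings, ["biking", "trails"], profiles,
                         {"heart disease", "steep slopes"}))  # [1]
```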

36
The new inputs to Group search
Users participate in many groups
Combine community information with posting
relevance
Group membership and postings imply communities
of interest
37
Outline
  • Web search and web 2.0 search
  • Why should we all get along?
  • How could we all get along?
  • Applications
  • Technical challenges
  • Related work
  • Conclusion

38
So, how can we all get along?
  • Augment keyword query with conditions on
    structure to focus and personalize search (DB)
  • Flickr tags
  • Answers points
  • YouTube reviews and ratings
  • Groups emails
  • Combine it with relevance (IR)

39
Search architecture
[Diagram components: Subscriber; Find relevant community of interest; Query tightening; Query evaluation; Ranking (content relevance, relationship)]
40
Example
[Figure: messages exchanged among subscribers S1 to S7]
Message structure: From, To, Date, Subject, Content
Relationships: (si, sj, cij)
Many such relationships, depending on subscribers' interests
Query tightening
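Relationship triples (si, sj, cij) could be derived from the message fields listed above. This sketch assumes cij is the fraction of si's messages that go to sj, optionally restricted to messages whose subject mentions a topic, which is one simple way to obtain topic-dependent relationships; the weighting is not taken from the talk, and the `Message` class is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Message:
    sender: str   # From
    to: str       # To
    date: str     # Date
    subject: str  # Subject
    content: str  # Content


def relationships(messages, topic=None):
    """Derive sorted (si, sj, cij) triples from exchanged messages.

    Assumption: cij is the fraction of si's messages (restricted to those
    whose subject mentions `topic`, if given) that were sent to sj.
    """
    if topic is not None:
        messages = [m for m in messages if topic.lower() in m.subject.lower()]
    pair_counts = Counter((m.sender, m.to) for m in messages)
    sent = Counter(m.sender for m in messages)
    return sorted((si, sj, n / sent[si]) for (si, sj), n in pair_counts.items())


msgs = [
    Message("S1", "S2", "2006-01-01", "Biking trails", "..."),
    Message("S1", "S2", "2006-01-02", "Re: biking trails", "..."),
    Message("S1", "S3", "2006-01-03", "Jazz class", "..."),
    Message("S2", "S1", "2006-01-04", "Biking gear", "..."),
]
print(relationships(msgs, topic="biking"))  # [('S1', 'S2', 1.0), ('S2', 'S1', 1.0)]
```

Restricting to the "biking" topic removes S1's jazz message, so S1's relationship with S2 becomes exclusive for that topic.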
41
Can we really all get along?
  • IR may think that user weights are enough to
    target communities of interest and personalize
    queries
  • DB thinks the expressiveness of query languages
    cannot be fully captured by ranking functions

42
Query rewriting
Content-Only
Content in context
Loose interpretation of context
43
Query relaxation
  • Primitive operations for dropping query
    predicates
  • Answers to a relaxed query include the answers to
    the exact one
  • A relaxed answer scores no higher than an
    exact answer
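The two relaxation properties can be illustrated with predicates represented as Python callables. Here a query is a hypothetical dictionary of named predicates, relaxation drops one of them, and answers found only through relaxation receive an assumed fixed, non-negative `penalty`, so exact answers always score at least as high.

```python
def evaluate(items, query):
    """Exact evaluation: items that satisfy every predicate in the query."""
    return [it for it in items if all(pred(it) for pred in query.values())]


def relax(query, name):
    """Relaxation primitive: drop the predicate called `name`."""
    return {k: pred for k, pred in query.items() if k != name}


def relaxed_search(items, query, name, penalty=0.5):
    """Evaluate relax(query, name) and score its answers.

    Answers that still satisfy the dropped predicate are exact answers and
    score 1.0; answers found only thanks to relaxation score 1.0 - penalty,
    so (for penalty >= 0) they never outrank an exact answer.
    """
    dropped = query[name]
    scored = [(it["id"], 1.0 if dropped(it) else 1.0 - penalty)
              for it in evaluate(items, relax(query, name))]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


photos = [
    {"id": 1, "tag": "desert", "aerial": True},
    {"id": 2, "tag": "desert", "aerial": False},
    {"id": 3, "tag": "beach", "aerial": True},
]
query = {"tag": lambda p: p["tag"] == "desert", "aerial": lambda p: p["aerial"]}
print([p["id"] for p in evaluate(photos, query)])  # [1]
print(relaxed_search(photos, query, "aerial"))     # [(1, 1.0), (2, 0.5)]
```

The relaxed answers {1, 2} contain the exact answer {1}, and the exact answer keeps the top score.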

44
Query tightening
  • Primitive operations for adding query predicates
  • Tighter answers are found, but looser answers
    should be maintained
  • Tighter answers score no lower than
    other answers
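Tightening is the mirror image: add a predicate (here, "the author belongs to my community of interest"), keep every answer to the original query, and give answers that satisfy the added predicate an assumed non-negative `bonus`. The query representation, the sample postings, and the `community` set are hypothetical.

```python
def tighten(query, name, pred):
    """Tightening primitive: add predicate `pred` under the key `name`."""
    return {**query, name: pred}


def tightened_search(items, query, name, pred, bonus=0.5):
    """Keep every answer to `query`; answers that also satisfy the
    tightened query get `bonus` added, so they never score below the
    looser answers."""
    tight_query = tighten(query, name, pred)
    scored = []
    for it in items:
        if all(p(it) for p in query.values()):                   # answer to original query
            is_tight = all(p(it) for p in tight_query.values())  # also answers tighter query
            scored.append((it["id"], 1.0 + bonus if is_tight else 1.0))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


community = {"s3", "s6"}  # hypothetical community of interest of the querying subscriber
postings = [
    {"id": 1, "author": "s2", "text": "biking trails"},
    {"id": 2, "author": "s3", "text": "biking trails up north"},
    {"id": 3, "author": "s6", "text": "jazz schools"},
]
query = {"topic": lambda p: "biking" in p["text"]}
print(tightened_search(postings, query, "in_community",
                       lambda p: p["author"] in community))  # [(2, 1.5), (1, 1.0)]
```

Posting 1 is a looser answer that is kept but ranked lower; posting 3 is never returned because it does not answer the original query.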

45
More technical challenges
  • Query tightening primitives to focus search
  • Each subscriber has a different profile/community of
    interest
  • Top-k processing needs to enforce user profiles
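Enforcing a user profile inside top-k processing can still allow early termination if the profile can only lower scores. This sketch assumes items arrive sorted by relevance, a hypothetical profile maps categories to weights in [0, 1], and the final score is relevance times weight; it is a simplified threshold-style scan, not the algorithm from the talk.

```python
import heapq


def topk_with_profile(items, profile, k):
    """Top-k items by relevance * profile weight, with early termination.

    items: (item_id, relevance, category) tuples sorted by relevance, descending.
    profile: category -> weight in [0, 1]; categories not in the profile get 0.
    Because weights never exceed 1, an item's score is at most its relevance,
    so once the k-th best score so far is >= the next item's relevance, no
    remaining item can enter the top k and the scan stops.
    Returns ((score, item_id) pairs sorted by score descending, items scanned).
    """
    if k <= 0:
        return [], 0
    heap = []  # min-heap of (score, item_id) holding the current top k
    scanned = 0
    for item_id, relevance, category in items:
        if len(heap) == k and heap[0][0] >= relevance:
            break  # every remaining item scores at most `relevance`
        scanned += 1
        score = relevance * profile.get(category, 0.0)
        if len(heap) < k:
            heapq.heappush(heap, (score, item_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item_id))
    return sorted(heap, reverse=True), scanned


items = [("a", 0.9, "jazz"), ("b", 0.8, "biking"), ("c", 0.7, "biking"),
         ("d", 0.3, "jazz"), ("e", 0.2, "biking")]
profile = {"biking": 1.0, "jazz": 0.2}
top, scanned = topk_with_profile(items, profile, k=2)
print(top, scanned)  # [(0.8, 'b'), (0.7, 'c')] 3
```

The most relevant item "a" is demoted by the profile, and the scan stops after three items because the remaining relevance values cannot beat the current top two.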

46
Outline
  • Web search and web 2.0 search
  • Why should we all get along?
  • How could we all get along?
  • Applications
  • Technical challenges
  • Related work
  • Conclusion

47
Related Work
  • Language models (ask Bruce Croft)
  • Web search personalization
  • Search behavior
  • HARD track at TREC
  • Building relationship graphs
  • Collaborative filtering
  • Clustering
  • Unsupervised learning

48
Tempting conclusion
  • A little information gathered about users could
    greatly improve new-generation search
  • IR and DB views both needed

49
More technical challenges
  • Subscriber belongs to different communities of
    interest
  • Should subscriber turn off personalization?
  • How is efficiency affected? (revisiting top-k
    processing)
  • Back from community search to web search?

50
Beyond search in online communities
  • Are online communities a way to build more
    accurate user profiles, or something more?
  • Display relevant groups when a user is asking a
    question on Y! Answers (mashups)?

51
Danger of online communities
  • Are we discouraging diversity?

52
Thank you.
  • Questions?
  • sihem_at_yahoo-inc.com
  • http://research.yahoo.com/sihem