Title: How Could We All Get Along on the Web 2.0? The Power of Structured Data on the Web
1 How Could We All Get Along on the Web 2.0? The
Power of Structured Data on the Web
- Sihem Amer Yahia
- Yahoo! Research
2 Outline
- Web search and web 2.0 search
- Why should we all get along?
- How could we all get along?
- Related work
- Conclusion
3 Web search
- Access to heterogeneous, distributed information
- Heterogeneous in creation
- Heterogeneous in motives
- Keyword search very effective in connecting people to information
[Figure: search over web pages]
4 Web search vs web 2.0 search?
[Figure labels: web 2.0 search (subscribers, feeds); web search (anonymous); content creators; content aggregators]
5 - Web 2.0: a generation of internet-based services that
- let people form online communities
- in order to collaborate
- and share information
- in previously unavailable ways
6 Online communities
- Subscribers join communities where they
- exchange content: emails, comments, tags
- rate content from other subscribers
- exhibit common behavior
- About 500M unique Y! visitors per month, about
200M subscribers (login visitors) to more than
130 Y! services
7 Web 2.0 search
- Connecting people to people
- Web 2.0 services: Flickr, Y! Answers, YouTube, Y! Groups
8 Web 2.0 search examples
- Mary is a professional photographer and is looking for aerial photos of the Hoggar desert
- She is also an amateur Jazz dancer and wants to ask about dance schools w/flexible schedules in SF
- She is also looking for the latest video on bird migration in Central Park, NY
- She has heart problems but loves biking and is interested in finding email discussions on biking trails in northern California
10 Improving user experience
- Keyword search should remain simple and intuitive
- Keyword queries are usually short
- They express only a small fraction of the user's true intent
- Users' interactions within community-based systems can be used to infer much more about intent and return better answers
11 Why should we all get along?
- Contributed content is structured
- This is what the DB community knows how to do best
- Relevance to query keywords is key
- This is what the IR community knows how to do best
12 Searching online communities
Community relationship table
sub   sub   trust
s1    s3    c13
s1    s4    c14
s2    s3    c23
s4    s6    c46
Data table
id    author  date
001   s2      1/1/06
002   s4      1/8/06
003   s4      3/9/06
Tags, ratings, reviews table
id    sub     annotation
001   s2      1/1/06
002   s4      1/8/06
003   s4      3/9/06
13 Searching online communities
- Search for most relevant data on some topic
- Querying data: selection over data table
- Querying annotations: selection over annotation table, join w/data table
- Personalizing answers: join w/subscribers table
- Relevance: use data relevance and annotation table
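The selection and join steps above can be sketched with SQLite from Python. This is only an illustration under stated assumptions: the table and column names (`data`, `annotation`, `trust`, `content`, `tag`, `strength`), the numeric trust values, and the sample rows are hypothetical and are not the schema used in the talk.

```python
import sqlite3


def search(conn, keyword, tag, subscriber):
    """Select matching data, join annotations, and personalize by trust."""
    sql = """
        SELECT d.id, a.sub, t.strength
        FROM data AS d
        JOIN annotation AS a ON a.id = d.id        -- querying annotations
        JOIN trust AS t ON t.sub_to = a.sub        -- personalizing answers
        WHERE d.content LIKE '%' || ? || '%'       -- querying data
          AND a.tag = ?
          AND t.sub_from = ?
        ORDER BY t.strength DESC, d.id
    """
    return conn.execute(sql, (keyword, tag, subscriber)).fetchall()


def build_demo():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE data (id TEXT, author TEXT, date TEXT, content TEXT);
        CREATE TABLE annotation (id TEXT, sub TEXT, tag TEXT);
        CREATE TABLE trust (sub_from TEXT, sub_to TEXT, strength REAL);
    """)
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", [
        ("001", "s2", "1/1/06", "biking trails near the bay"),
        ("002", "s4", "1/8/06", "steep biking trails"),
        ("003", "s4", "3/9/06", "jazz dance schools"),
    ])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", [
        ("001", "s3", "biking"),
        ("002", "s4", "biking"),
        ("003", "s3", "dance"),
    ])
    conn.executemany("INSERT INTO trust VALUES (?, ?, ?)", [
        ("s1", "s3", 0.9),
        ("s1", "s4", 0.4),
        ("s2", "s3", 0.7),
    ])
    return conn


if __name__ == "__main__":
    for row in search(build_demo(), "biking", "biking", "s1"):
        print(row)
```

Here subscriber `s1` only sees items annotated by subscribers that `s1` trusts, ordered by the strength of that relationship.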
14 Why should we all get along?
- Query interpretation depends on the subscriber's interests at the time of querying
- Data annotations are dynamic
- Precompute all (sub, sub, trust) for each topic?
- Need for dynamic query generation
15 DB and IR
- Shared interactions help focus search
- User-input, community-input, extraction
- Personalizing answers with community information
- Ranking as a combination of
- Relevance
- Relationship strengths between people in the same
community
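One simple way to combine the two ranking signals is a weighted sum. The slide only says "combination", so the linear form, the weight `alpha`, and the assumption that both scores lie in [0, 1] are illustrative choices rather than the method from the talk.

```python
def combined_score(relevance, strength, alpha=0.5):
    """Blend content relevance with relationship strength (both in [0, 1])."""
    return alpha * relevance + (1 - alpha) * strength


def rank(candidates, alpha=0.5):
    """Sort (item_id, relevance, strength) tuples by combined score, best first."""
    return sorted(
        candidates,
        key=lambda c: combined_score(c[1], c[2], alpha),
        reverse=True,
    )
```

With `alpha=1.0` the ranking reduces to pure content relevance (the IR view); lowering `alpha` gives more weight to relationships within the community.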
16 Outline
- Web search and web 2.0 search
- Why should we all get along?
- How could we all get along?
- Applications
- Technical challenges
- Related work
- Conclusion
17 Applications
- Flickr enables sharing and tagging photos
- Y! Answers enables asking and answering questions in natural language
- YouTube enables sharing videos, rating videos, commenting on videos and subscribing to new videos from favorite users
- Y! Groups enables creating groups, joining existing groups, posting in a group
18 Flickr
- Acquired by Y! in 2005
- Tag search
- Photos grouped into categories.
- Set privacy levels on each photo
21 The new inputs to Flickr search
- Users tag and rate photos
- Combine tag-based search with community knowledge
- Combine photo rating with relationship strength
- Users tagging same photos with similar tags form a community of interest
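The last point can be sketched as follows. This is a minimal sketch with assumptions that are not in the slides: "similar tags" is simplified to identical tags, similarity between two users is the Jaccard overlap of their (photo, tag) pairs, and the `threshold` value is arbitrary.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def communities_of_interest(taggings, threshold=0.5):
    """Map user pairs to similarity when their taggings overlap enough.

    taggings: iterable of (user, photo, tag) triples.
    """
    by_user = {}
    for user, photo, tag in taggings:
        by_user.setdefault(user, set()).add((photo, tag))
    pairs = {}
    for u, v in combinations(sorted(by_user), 2):
        sim = jaccard(by_user[u], by_user[v])
        if sim >= threshold:
            pairs[(u, v)] = sim
    return pairs
```

The resulting similarities could serve as relationship strengths when combining photo rating with community knowledge.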
22 Y! Answers
- Launched in second half of 2005
- Incentive system based on points and voting for best answers
- Questions grouped by category
- Some statistics
- over 60 million users
- over 120 million answers, available in 18 countries and in 6 languages
26 The new inputs to Y! Answers search
- Users provide questions/answers
- Combine community information with answer rating
- Voting information reflects communities of interest
27 YouTube
- Founded in February 2005
- Tag search
- Videos grouped by category
- Some statistics
- 100 million views/day
- 65,000 new videos/day
30 The new inputs to YouTube search
- Users provide videos, tags, ratings, comments
- Combine community information with video rating
- Similar tags on same videos imply communities of interest
31 Yahoo! Groups
- Yahoo! acquired eGroups in 2000
- Group moderators
- Groups belong to categories
- Public and private groups
- Some statistics
- over 7M groups
- over 190M subscribers
- over 100K new subscribers/day
- over 12M emails/day
35 Alternative query interpretations
- Return all group postings relevant to a query.
- Return only postings by subscribers sharing the same interests, e.g. women with heart disease interested in steep slopes
36 The new inputs to Group search
- Users participate in many groups
- Combine community information with postings relevance
- Group membership and postings imply communities of interest
38 So, how can we all get along?
- Augment keyword query with conditions on structure to focus and personalize search (DB)
- Flickr: tags
- Answers: points
- YouTube: reviews and ratings
- Groups: emails
- Combine it with relevance (IR)
39 Search architecture
[Figure: search architecture. Labels: subscriber; find relevant community of interest; query tightening; query evaluation; ranking (content relevance, relationship)]
40 Example
[Figure: subscribers S1 to S7; message structure (From, To, Date, Subject, Content); relationships ( si, sj, cij )]
- Many such relationships depending on subscribers' interests
- Query tightening
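One possible reading of the triples ( si, sj, cij ) is that cij measures how strongly subscriber si is connected to sj on a given interest. The sketch below assumes, purely for illustration, that cij is the number of messages si sent to sj whose subject or content mentions a topic; the slides do not say how cij is computed, and the function name and `topic` parameter are hypothetical.

```python
from collections import Counter


def relationship_strengths(messages, topic=None):
    """Return sorted (si, sj, cij) triples from a list of messages.

    messages: dicts with From, To, Date, Subject, Content keys.
    topic: optional keyword; when given, only matching messages are counted.
    """
    counts = Counter()
    for m in messages:
        text = (m["Subject"] + " " + m["Content"]).lower()
        if topic is None or topic.lower() in text:
            counts[(m["From"], m["To"])] += 1
    return [(si, sj, c) for (si, sj), c in sorted(counts.items())]
```

Calling it with different topics yields different relationship sets over the same messages, which matches the observation that there are many such relationships depending on subscribers' interests.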
41 Can we really all get along?
- IR may think that user weights are enough to target communities of interest and personalize queries
- DB thinks the expressiveness of query languages cannot be fully captured by ranking functions
42 Query rewriting
- Content-only
- Content in context
- Loose interpretation of context
43 Query relaxation
- Primitive operations for dropping query predicates
- Answers to the relaxed query contain answers to the exact one
- Score of a relaxed answer is no higher than the score of an exact one
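The three properties above can be illustrated with a small sketch. It assumes a query is a list of boolean predicates over items and scores an item by the fraction of the original predicates it satisfies; this scoring function is an assumption chosen so that relaxed answers never outscore exact ones, not the scoring used in the talk.

```python
def evaluate(items, predicates):
    """Return the items that satisfy every predicate in the query."""
    return [x for x in items if all(p(x) for p in predicates)]


def drop_predicate(predicates, index):
    """Primitive relaxation: a copy of the query without one predicate."""
    return predicates[:index] + predicates[index + 1:]


def score(item, original_predicates):
    """Fraction of the original predicates satisfied (1.0 means exact answer)."""
    satisfied = sum(1 for p in original_predicates if p(item))
    return satisfied / len(original_predicates)
```

Because dropping a predicate can only admit more items, every exact answer remains an answer to the relaxed query, and any newly admitted item fails at least one original predicate, so its score stays below 1.0.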
44 Query tightening
- Primitive operations for adding query predicates
- Tighter answers are found, but looser answers should be maintained
- Scores of tighter answers are no lower than scores of other answers
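Tightening can be sketched in the same spirit. This sketch assumes a query is a list of boolean predicates and that the added predicate (for example, "author belongs to the subscriber's community") only affects ranking: all answers to the original query are kept, and those that also satisfy the extra predicate score higher. The scoring by fraction of satisfied predicates is an illustrative assumption.

```python
def tighten(predicates, extra):
    """Primitive tightening: a copy of the query with one more predicate."""
    return predicates + [extra]


def tightened_search(items, predicates, extra):
    """Return (score, item) pairs for all original answers, tighter ones first."""
    loose = [x for x in items if all(p(x) for p in predicates)]
    tight_preds = tighten(predicates, extra)

    def score(x):
        return sum(1 for p in tight_preds if p(x)) / len(tight_preds)

    scored = [(score(x), x) for x in loose]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

An answer that satisfies the tightened query scores 1.0, while an answer that only satisfies the original query scores lower but is still returned.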
45 More technical challenges
- Query tightening primitives to focus search
- Each subscriber has a different profile/community of interest
- Top-k processing needs to enforce user profiles
47 Related Work
- Language models (ask Bruce Croft)
- Web search personalization
- Search behavior
- HARD track at TREC
- Building relationship graphs
- Collaborative filtering
- Clustering
- Unsupervised learning
48 Tempting conclusion
- A little information gathered on users could greatly improve new-generation search
- IR and DB views both needed
49 More technical challenges
- Subscriber belongs to different communities of interest
- Should subscriber turn off personalization?
- How is efficiency affected? (revisiting top-k processing)
- Back from community search to web search?
50 Beyond search in online communities
- Are online communities a way to build more accurate user profiles, or more?
- Display relevant groups when a user is asking a question on Y! Answers (mashups)?
51 Danger of online communities
- Are we discouraging diversity?
52 Thank you.
- Questions?
- sihem_at_yahoo-inc.com
- http://research.yahoo.com/sihem