Making the Web searchable, or the Future of Web Search - PowerPoint PPT Presentation


1
Making the Web searchable, or the Future of Web
Search

Peter Mika
Yahoo! Research Barcelona

2
About Yahoo!

Yahoo!'s mission is to connect people to their
passions, their communities, and the world's
knowledge
Yahoo! Research Barcelona
Established January, 2006
Led by Ricardo Baeza-Yates
Research areas
Web Mining
content, structure, usage
Distributed Web retrieval
Multimedia retrieval
NLP and Semantics

3
Yahoo! by numbers (April, 2007)

There are approximately 500 million users of
Yahoo! branded services, meaning we reach 50
percent or 1 out of every 2 users online, the
largest audience on the Internet (Yahoo! Internal
Data).
Yahoo! is the most visited site online with
nearly 4 billion visits and an average of 30
visits per user per month in the U.S. and leads
all competitors in audience reach, frequency and
engagement (comScore Media Metrix, US, Feb.
2007).
Yahoo! accounts for the largest share of time
Americans spend on the Internet with 12 percent
(comScore Media Metrix, US, Feb. 2007) and
approximately 8 percent of the world's online
time (comScore WorldMetrix, Feb. 2007).
Yahoo! is the #1 home page with 85 million
average daily visitors on Yahoo! homepages around
the world, an increase of nearly 5 million
visitors in a month (comScore WorldMetrix, Feb.
2007).
Yahoo!'s social media properties (Flickr,
del.icio.us, Answers, 360, Video, MyBlogLog,
Jumpcut and Bix) have 115 million unique visitors
worldwide (comScore WorldMetrix, Feb. 2007).
Yahoo! Answers is the largest collection of human
knowledge on the Web with more than 90 million
unique users and 250 million answers worldwide
(Yahoo! Internal Data).
There are more than 450 million photos in Flickr
in total and 1 million photos are uploaded daily.
80 percent of the photos are public (Yahoo!
Internal Data).
Del.icio.us hit 2 million users in February,
growing to more than six times its size of 300,000
users in December 2005 (Yahoo! Internal Data).
Yahoo! Mail is the #1 Web mail provider in the
world with 243 million users (comScore
WorldMetrix, Feb. 2007) and nearly 80 million
users in the U.S. (comScore Media Metrix, US,
Feb. 2007).
Interoperability between Yahoo! Messenger and
Windows Live Messenger has formed the largest IM
community approaching 350 million user accounts
(Yahoo! Internal Data).
Yahoo! Messenger is the most popular IM client in time
spent, with an average of 50 minutes per user per
day (comScore WorldMetrix, Feb. 2007).
Nearly 1 in 10 Internet users is a member of a
Yahoo! Group (Yahoo! Internal Data).
Yahoo! News is the #1 online news destination and
has reached a new audience high in February with
36.2 million users, 10 million more users than
its nearest competitor MSNBC (comScore Media
Metrix, US, Feb. 2007).
Yahoo! is one of only 26 companies to be on both
the Fortune 500 list and Fortune's Best
Place to Work List (2006).

4
Overview

Why reconsider search?
Context
Semantic Web: metadata infrastructure
Web 2.0: user-generated metadata
Thesis: making the Web searchable
Research challenges (SW + IR)
Conclusion

5
Motivation

State of Web search
Picked the low-hanging fruit
Heavy investments, marginal returns
High-hanging fruit
Hard searches remain
The Web and its technology have changed
Semantic Web
Web 2.0

6
Hard searches

Ambiguous searches
Paris Hilton
Multimedia search
Images of Paris Hilton
Imprecise or overly precise searches
Publications by Jim Hendler
Find images of strong and adventurous people
(Lenat)
Searches for descriptions
Search for yourself without using your name
Product search (ads!)
Searches that require aggregation
Size of the Eiffel Tower (Lenat)
Public opinion on Britney Spears
Queries that require a deeper understanding of
the query, the content and/or the world at large
Note: some of these are so hard that users don't
even try them any more

7
Example
8
The Semantic Web (1996-)

Making the content of the Web machine processable
through metadata
Documents, databases, Web services
Active research, standardization, startups
Ontology languages (RDF, OWL family), query
language for RDF (SPARQL)
Software support (metadata stores, reasoners,
APIs)

9
Problem: difficulties in deployment

Not enough take-up in the Web community at large
Technological challenges
Discovery
Ontology learning
Ontology mapping
Lack of attention to the social side
Over-estimating complexity for users
Need for supporting ontology creation and sharing
Focus shifts from documents to databases: the
Web of Data
Task/domain-specific applications

10
Web 2.0 (2003-)

Simple, nimble, socially transparent interfaces
Simplified KR
e.g. tagging, microformats, Wikipedia infoboxes
In exchange for a better experience,
users are willing to
Provide content, markup and metadata
Provide data on themselves and their networks
Rank, rate, filter, forward
Develop software and improve your site
User-generated content
Content that users actually care about!

11
Example: Microformats

Agreements on the way to encode certain kinds of
metadata in HTML
Reuse of semantic-bearing HTML elements
Based on existing standards
Each microformat defines a vocabulary for
describing a given type of resource
Persons, events, but also syntactic metadata:
licenses, tags
Not ontologies
No formal descriptions of schema, only text
No namespaces, unique identifiers (URIs)
→ no interlinking or reuse among schemas
No datatypes
Widely used in millions of hand-authored
documents
And in hundreds of millions dynamically generated
ones

12
Example: hCard
<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>

<a rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a> wrote a post
(<a href="/2005/12/16/tax-relief/">Tax Relief</a>)
about an unintentionally humorous letter he
received from the <span class="vcard"><a class="fn org url" href="http://irs.gov/">Internal
Revenue Service</a></span>.
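An hCard is plain HTML, so a consumer can recover its properties with any HTML parser. The sketch below uses only Python's standard `html.parser`. The names `HCardSketch` and `parse_hcards` and the small set of supported properties are illustrative assumptions. The sketch also ignores most of the hCard specification, including nested cards, `url`, and the value-class pattern.

```python
from html.parser import HTMLParser

# Text-valued hCard properties this sketch collects; real hCard defines many more.
TEXT_PROPS = {"fn", "org", "title", "tel"}
# Elements without end tags; they must not be pushed on the element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HCardSketch(HTMLParser):
    """Collects a few hCard properties from elements inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []          # one dict per vcard root found
        self.stack = []          # property names carried by each open element
        self.vcard_depth = None  # stack index of the current vcard root, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if "vcard" in classes and self.vcard_depth is None and tag not in VOID_TAGS:
            self.cards.append({})
            self.vcard_depth = len(self.stack)
        inside = self.vcard_depth is not None
        if inside and "email" in classes:
            href = attrs.get("href") or ""
            if href.startswith("mailto:"):
                self.cards[-1]["email"] = href[len("mailto:"):]
        if tag not in VOID_TAGS:
            self.stack.append(classes & TEXT_PROPS if inside else set())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.stack.pop()
        if self.vcard_depth is not None and len(self.stack) == self.vcard_depth:
            self.vcard_depth = None  # the vcard root element just closed

    def handle_data(self, data):
        text = data.strip()
        if self.vcard_depth is None or not text:
            return
        card = self.cards[-1]
        # Text belongs to every property carried by an enclosing element.
        for props in self.stack[self.vcard_depth:]:
            for prop in props:
                card[prop] = (card.get(prop, "") + " " + text).strip()


def parse_hcards(html):
    parser = HCardSketch()
    parser.feed(html)
    parser.close()
    return parser.cards


SAMPLE = """<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>"""

if __name__ == "__main__":
    print(parse_hcards(SAMPLE))
```

No schema or namespace is involved. The meaning of `fn` or `title` lives only in the microformat's prose definition, which is the limitation listed on the microformats slide.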
13
Example: Wikipedia infoboxes

Templates for common types of objects

14
Example: Wikipedia infoboxes
15
Example: Wikipedia infoboxes

cf. microformats
Similar level of representation
Infoboxes are never annotations
Largely uncontrolled growth
Niche templates
Templates in several languages
→ overlapping domains
Infoboxes to RDF
DBpedia
Compare also to Semantic Wikis
Semantic MediaWiki, OntoWiki etc.
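The infobox-to-RDF idea behind DBpedia can be illustrated by mapping each attribute/value pair to one triple about the article. This is a minimal sketch, not DBpedia's actual extractor. It assumes the template has already been parsed into a Python dict, treats every value as a plain literal, and uses hypothetical `example.org` namespaces.

```python
from urllib.parse import quote

# Hypothetical namespaces for illustration only; DBpedia defines its own URIs.
RESOURCE_NS = "http://example.org/resource/"
PROPERTY_NS = "http://example.org/property/"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_uri(namespace, name):
    """Turn a page title or attribute name into a URI (spaces become _)."""
    return "<" + namespace + quote(name.strip().replace(" ", "_")) + ">"


def infobox_to_ntriples(article, infobox):
    """Emit one N-Triples line per infobox attribute, all about the same article."""
    subject = to_uri(RESOURCE_NS, article)
    return [
        f'{subject} {to_uri(PROPERTY_NS, key)} "{escape_literal(value.strip())}" .'
        for key, value in infobox.items()
    ]


if __name__ == "__main__":
    # Hypothetical infobox, already parsed from the template markup.
    for line in infobox_to_ntriples("Example Tower", {"name": "Example Tower", "city": "Paris"}):
        print(line)
```

A real extractor also has to resolve links and units and reconcile the overlapping templates noted above. Most of the difficulty lies there, not in emitting the triples.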

16
Web 2.0 bottleneck: lack of foundations

Tags
No shared syntax (TagCommons? A microformat for
tags?)
Mapping problems due to lack of semantics
flickr:ajax = del.icio.us:ajax ?
flickr:ajax (Peter) = flickr:ajax (John) ?
flickr:ajax (Peter, 1990) = flickr:ajax (Peter, 2006) ?
Microformats
You cannot make a vocabulary for everything in a
centralized way
Serious validation and mapping problems at the
instance level
Wikipedia
Serious validation and mapping problems at both the
instance and the schema level

17
Thesis: making the Web searchable

The Web has changed
Content owners want their content to
be found (Web 2.0)
Cf. findability (Peter Morville), reusability
(mashups), open data movement
Foundations are laid for a Semantic Web
We need to
Combine the best of Web 2.0 and the Semantic Web
Reconsider Web search in this new world

18
Semantic Web and Web 2.0

Focus on user-generated content
Getting the representation right
RDF
Embedded RDF
GRDDL, RDFa
Innovations on the interface side
Capture semantics while authoring
New methods of reasoning
Combining semantics, syntax and statistics
Bottom-up, emergent semantics
Methods of logical reasoning combined with
methods of graph mining, statistics
Scalability
Giving up soundness and/or completeness
Dealing with the mess
Social engineering
Collaborative spaces for creating and sharing
ontologies, data
Connecting islands of semantics
Best practices, documentation, advocacy

19
Example: GRDDL

Bridges the world of microformats and RDF
Associate RDF-producing XSLT transformations with
XML and (X)HTML documents
One page may contain different microformats (e.g.
persons and events described in the same page)
One microformat may be mapped to multiple
ontologies

<head profile="…ta-view">
<title>Joe Friday's Home page</title>
<link rel="transformation"
href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl" />
</head>

Note: it is of course possible to extract
non-RDF data through XSLT,
e.g. a vCard from an HTML fragment,
but that's not called GRDDL
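GRDDL discovery in XHTML comes down to two checks. First, the `head` element must declare the GRDDL profile; the W3C GRDDL Recommendation uses `http://www.w3.org/2003/g/data-view`. Second, `link` elements with `rel="transformation"` point at the XSLT to run. The sketch below implements that discovery step with Python's standard library, under these assumptions:
- `find_transformations` and `GrddlLinkParser` are hypothetical names.
- Only `<link>` elements are inspected.
- Applying the stylesheets, for example with an XSLT processor such as `lxml.etree.XSLT`, is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Profile URI that marks an XHTML document as GRDDL-enabled.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


class GrddlLinkParser(HTMLParser):
    """Records head profiles and rel="transformation" links."""

    def __init__(self):
        super().__init__()
        self.profiles = []
        self.transformations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            # The profile attribute may hold several space-separated URIs.
            self.profiles.extend((attrs.get("profile") or "").split())
        elif tag == "link":
            rels = (attrs.get("rel") or "").split()
            if "transformation" in rels and attrs.get("href"):
                self.transformations.append(attrs["href"])


def find_transformations(html, base_url):
    """Return absolute URLs of the XSLTs a GRDDL-aware agent would apply."""
    parser = GrddlLinkParser()
    parser.feed(html)
    parser.close()
    if GRDDL_PROFILE not in parser.profiles:
        return []  # without the profile, the links carry no GRDDL meaning
    return [urljoin(base_url, href) for href in parser.transformations]


PAGE = ('<html><head profile="http://www.w3.org/2003/g/data-view">'
        "<title>Joe Friday's Home page</title>"
        '<link rel="transformation" '
        'href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl" />'
        "</head><body></body></html>")

if __name__ == "__main__":
    print(find_transformations(PAGE, "http://example.org/joe/"))
```

Requiring the profile first keeps an ordinary `rel="transformation"` link on an unrelated page from being treated as a GRDDL instruction.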

20
Example: RDFa

Embedding RDF into (X)HTML
Increased complexity, e.g. namespaces
Reuse of semantic-bearing HTML elements is not
possible any more
No need for XSLT any more
You can use XSLT to extract RDFa, but don't have
to
Not much track record
Big question: user complexity (→ data quality)

…ons.bib" about="mika06jws"
class="swrc:Article hbib article" … class="vcard" … property="foaf:name"
rev="swrc:author" href="mika06jws"
class="foaf:Person author fn">Peter Mika

21
Example: openacademia.org and RDFa

@INBOOK{ens03ontoknowledge,
AUTHOR = "Victor Iosif and Peter Mika and Rikard
Larsson and Hans Akkermans",
TITLE = "Towards the Semantic Web:
Ontology-Driven Knowledge Management",
CHAPTER = "Field Experimenting With Semantic Web
Tools In A Virtual Organization",
PUBLISHER = "John Wiley \& Sons",
YEAR = "2003"
}
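openacademia.org starts from BibTeX records like this one. Its converter is not described here, so the following is only a hedged sketch of a plausible first step: reading a single entry into fields and splitting the author list on ` and `. The function `parse_bibtex_entry` is hypothetical. It handles only double-quoted values, with no nested braces, `@string` macros, or comments.

```python
import re

# Whole entry: @TYPE{key, ...fields...}
ENTRY_RE = re.compile(r'@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$', re.DOTALL)
# One field: NAME = "value", allowing backslash escapes such as \& inside the value.
FIELD_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"', re.DOTALL)


def parse_bibtex_entry(text):
    """Parse one simple BibTeX entry into its type, key and fields."""
    m = ENTRY_RE.match(text.strip())
    if m is None:
        raise ValueError("not a simple BibTeX entry")
    entry_type, key, body = m.groups()
    fields = {}
    for name, value in FIELD_RE.findall(body):
        value = " ".join(value.split())          # collapse wrapped lines
        fields[name.lower()] = value.replace("\\&", "&")
    if "author" in fields:
        # BibTeX separates authors with the word "and".
        fields["author"] = [a.strip() for a in fields["author"].split(" and ")]
    return {"type": entry_type.lower(), "key": key, "fields": fields}


if __name__ == "__main__":
    sample = '@INBOOK{demo, AUTHOR = "A. One and B. Two", YEAR = "2003"}'
    print(parse_bibtex_entry(sample))
```

Once the authors are a list, each one can be emitted as a separate `foaf:Person` with an `swrc:author` link, as in the RDFa fragment above.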

22
Example: machine tags
23
Example: ZoneTag project (Yahoo! Research
Berkeley)
24
Example: Freebase
25
Web Search 2.0

In an ideal world
Plenty of metadata to harvest
Metadata is unambiguous, described using a single
ontology or a set of carefully designed
ontologies
User intent can be captured directly as a formal
query
Query and the knowledge base use the same
ontology
Query is executed on a single knowledge base,
giving the single correct answer
And all this very fast

26
Web Search 2.0

In reality
Many lightweight ontologies or just tags
Tags are mostly personal, not social
Intent is unclear, matching is a problem
Poor quality of annotations
Everyone's a librarian
99% of the Web is Web 1.0
Input/output interface
Keywords for searching
Very limited interaction
Not everything scales

27
Web Search 2.0

Keep on improving machine technology
NLP
Information Extraction
Exploit the users for the tasks that are hard for
the machine
Encourage and support users
Exploit user-generated metadata in any shape or
form
Support standards of the SW architecture

28
Example: folksonomies

Simplified view: tags are just anchor text
Can be used to generate simple co-occurrence
graphs

[Figure: graph linking tags (hilton, paris, eiffel) to URLs (url1, url2, url3)]
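Treating tags as anchor text leads directly to a co-occurrence graph. Two tags are linked if they annotate the same URL, and the link is weighted by the number of such URLs. Below is a minimal sketch of that computation. The edges of the figure are not recoverable, so the `(tag, url)` pairs are hypothetical, and `tag_cooccurrence` is an illustrative name.

```python
from collections import defaultdict
from itertools import combinations


def tag_cooccurrence(annotations):
    """annotations: iterable of (tag, url) pairs.

    Returns {(tag_a, tag_b): weight} with tag_a < tag_b, where weight is the
    number of distinct URLs that carry both tags.
    """
    tags_by_url = defaultdict(set)
    for tag, url in annotations:
        tags_by_url[url].add(tag)
    weights = defaultdict(int)
    for tags in tags_by_url.values():
        # Every pair of tags on the same URL co-occurs once for that URL.
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    # Hypothetical annotations using the tags and URLs from the figure.
    pairs = [("hilton", "url1"), ("paris", "url1"),
             ("paris", "url2"), ("eiffel", "url2"),
             ("paris", "url3"), ("eiffel", "url3")]
    print(tag_cooccurrence(pairs))
```

Sorting each tag set gives every pair a canonical order, so `(eiffel, paris)` and `(paris, eiffel)` are counted as one edge.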
29
(No Transcript)
30
The more complete picture

Folksonomies as tripartite graphs of users, URLs
and tags

[Figure: tripartite graph linking users (user1, user2, user3), tags (hilton, paris, eiffel) and URLs (url1, url2, url3)]
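Keeping the user dimension makes it possible to relate tags in more than one way: through shared URLs or through shared users. The sketch below projects `(user, tag, url)` triples onto a tag-tag graph along either dimension. It is an illustrative assumption, not a reproduction of any published model. The function `project_tags` and the sample triples are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_tags(triples, via):
    """triples: iterable of (user, tag, url); via: "url" or "user".

    Returns {(tag_a, tag_b): weight} with tag_a < tag_b, where weight counts
    the distinct URLs (or users) that connect both tags.
    """
    if via not in ("url", "user"):
        raise ValueError("via must be 'url' or 'user'")
    tags_by_node = defaultdict(set)
    for user, tag, url in triples:
        tags_by_node[url if via == "url" else user].add(tag)
    weights = defaultdict(int)
    for tags in tags_by_node.values():
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical annotations in the spirit of the figure.
TRIPLES = [("user1", "hilton", "url1"), ("user1", "paris", "url1"),
           ("user2", "paris", "url2"), ("user2", "eiffel", "url2"),
           ("user3", "eiffel", "url3"), ("user3", "paris", "url3"),
           ("user3", "hilton", "url1")]

if __name__ == "__main__":
    print("via URLs: ", project_tags(TRIPLES, "url"))
    print("via users:", project_tags(TRIPLES, "user"))
```

With this sample data, `eiffel` and `hilton` are related only through a shared user. The URL-based projection cannot see that kind of community-specific association.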
31
Mining and modelling folksonomies

Opportunities for mining community-specific
interpretations of the world
Peter Mika. Ontologies are us: A unified model of
social networks and semantics. Journal of Web
Semantics 5(1), pages 5-15, 2007
Related work at
Social and Collaborative Construction of
Structured Knowledge (CKC2007)
Bridging the gap between Semantic Web and Web 2.0
(ESWC2007)
Journal of Web Semantics special issue on
Semantic Web and Web 2.0 (upcoming, Q4 2007)
TAGORA project

32
Vision: ontology-based search

Query at the knowledge level
Partial description of a class/instance
Mapping of queries and resources in the
conceptual space
Computing relevance in semantic terms
Novel user interfaces
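A query at the knowledge level is a partial description, a set of property/value constraints, rather than a bag of keywords. One naive way to compute relevance in semantic terms is the fraction of constraints an entity satisfies. The sketch below assumes a toy in-memory knowledge base. The entities, the function `rank_by_description`, and the scoring rule are all hypothetical.

```python
def rank_by_description(kb, query):
    """kb: {entity: {property: set_of_values}}; query: {property: value}.

    Returns (score, entity) pairs for entities matching at least one
    constraint, where score is the fraction of constraints satisfied,
    best first (ties broken by entity name).
    """
    if not query:
        return []
    scored = []
    for entity, props in kb.items():
        hits = sum(1 for p, v in query.items() if v in props.get(p, set()))
        if hits:
            scored.append((hits / len(query), entity))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored


# Hypothetical knowledge base for the "publications by Jim Hendler" search.
KB = {
    "doc1": {"type": {"Publication"}, "author": {"Jim Hendler"}},
    "doc2": {"type": {"Publication"}, "author": {"Someone Else"}},
    "person1": {"type": {"Person"}, "name": {"Jim Hendler"}},
}

if __name__ == "__main__":
    print(rank_by_description(KB, {"type": "Publication", "author": "Jim Hendler"}))
```

A real system would also need the ontology mapping and entity resolution listed under the technical challenges, because queries and data rarely share one vocabulary.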

33
Technical challenges

Improving NLP and IE
Query interface
Data quality
Cleaning up metadata, tags
Spam
Ontology mapping and entity resolution
Ranking across types
Results display
How do you avoid information overload?
How do you display information you partially
understand?

34
Social challenges

Getting the users on your side
Users are unwilling to submit large amounts of
structured data to a commercial entity (Google
Base)
Provide a clear motivation and/or instant
gratification
Trust them but not too much (Mahalo)

35
Example: Technorati and microformats
http://technorati.com/posts/tag/semanticweb

<a rel="tag" …>Semantic Web</a>
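Technorati's use of `rel="tag"` follows the rel-tag microformat. There, the tag is the last path segment of the link's URL, not the link text. Below is a minimal extraction sketch with Python's standard library. `extract_rel_tags` is a hypothetical name, and ignoring a trailing slash is an assumption of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class RelTagParser(HTMLParser):
    """Collects tag names from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag != "a" or "tag" not in rels or not attrs.get("href"):
            return
        # The tag is the last path segment of the URL (trailing slash ignored here).
        path = urlparse(attrs["href"]).path.rstrip("/")
        segment = path.rsplit("/", 1)[-1]
        if segment:
            self.tags.append(unquote(segment))


def extract_rel_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    # Hypothetical link: the visible text differs from the tag in the URL.
    print(extract_rel_tags('<a href="http://example.org/tag/semanticweb" rel="tag">Semantic Web</a>'))
```

The link text "Semantic Web" and the tag `semanticweb` differ. This is one more place where a tag's identity depends on a syntactic convention rather than on shared semantics.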

36
Conclusion

Why a new vision?
The opportunity: convergence
Semantic Web: metadata infrastructure
Web 2.0: user-generated metadata
Thesis: making the Web searchable
Semantic Web and Web 2.0
Web Search 2.0

37
What is there to gain?

Knowledge-based search
Sorting out hard searches
Creating new information needs
Beyond search
Analysis, design, diagnosis etc. on top of
aggregated data
Personalization
Rich user profiles
Monetization
No more "buy virgins on eBay"

38
Questions?

Peter Mika. Social Networks and the Semantic Web.
Springer, July 2007.
Special Issue on the Semantic Web and Web 2.0,
Journal of Web Semantics, December, 2007.
