Thanks to Jim Hendler, Carl Lagoze, Jayavel Shanmugasundaram, Sara Cohen, Jonathan Mamou, Yaron Kanza - PowerPoint PPT Presentation

Slides: 91
Provided by: raym118


1
Thanks to Jim Hendler, Carl Lagoze, Jayavel
Shanmugasundaram, Sara Cohen, Jonathan Mamou,
Yaron Kanza, Mark Sapossnek, Yehoshua Sagiv,
Frank van Harmelen
  • XML, RDF and Advanced Search (Semantic Web)

2
What we have covered
  • What is IR
  • Evaluation
  • Tokenization and properties of text
  • Web crawling
  • Query models
  • Vector methods
  • Measures of similarity
  • Indexing
  • Inverted files
  • Basics of internet and web
  • Spam and SEO
  • Search engine design
  • Google and Link Analysis
  • Social network analysis
  • This lecture: metadata, XML, RDF, issues in
    advanced search and the Semantic Web

3
The importance of data and their rules
  • Tim Berners-Lee
  • Inventor of the World Wide Web
  • Founder of the W3C
  • Presentation at TED

4
Metadata is data about data
Metadata and Markup languages
Metadata is often written in XML
5
Metadata is semi-structured data conforming to
commonly agreed upon models, providing
operational interoperability in a heterogeneous
environment
6
What is metadata? Some simple definitions
  • Structured data about data.
  • Dublin Core Metadata Initiative FAQ, 2005
  • http://dublincore.org/resources/faq/
  • Machine-understandable information about Web
    resources or other things.
  • Tim Berners-Lee, W3C, 1997
  • http://www.w3.org/DesignIssues/Metadata

7
"Web resources or other things"
  • Metadata might be "about" anything!
  • HTML documents
  • digital images
  • databases
  • books
  • museum objects
  • archival records
  • metadata records
  • Web sites
  • collections
  • services
  • physical places
  • people
  • organizations
  • works
  • formats
  • concepts
  • events

8
What is metadata? Towards a "functional" view
  • Data associated with objects which relieves their
    potential users of having to have full advance
    knowledge of their existence or characteristics.
  • Lorcan Dempsey & Rachel Heery, "Metadata: a
    current view of practice and issues", 1998
  • http://www.ukoln.ac.uk/metadata/publications/jdmetadata/

9
What is metadata? Towards a "functional" view
  • Structured data about resources that can be used
    to help support a wide range of operations.
  • Michael Day, "Metadata in a Nutshell", 2001
  • http://www.ukoln.ac.uk/metadata/publications/nutshell/

11
What might metadata "say"?
What is this called? What is this about? Who made
this? When was this made? Where do I get (a copy
of) this? When does this expire? What format does
this use? Who is this intended for? What does
this cost? Can I copy this? Can I modify
this? What are the component parts of this? What
else refers to this? What did "users" think of
this? (etc!)
12
What operations/functions?
  • resource disclosure & discovery
  • resource retrieval, use
  • resource management, including preservation
  • verification of authenticity
  • intellectual property rights management
  • commerce
  • content-rating
  • authentication and authorization
  • personalization and localization of services
  • (etc!)

13
What operations/functions?
  • Different functions, different metadata
  • Metadata (and metadata standards) sometimes
    classified according to function
  • Descriptive: primarily for discovery, retrieval
  • Administrative: primarily for management
  • Structural: relationships between component parts
    of resources
  • Contextual: relationships between resources
  • No one-size-fits-all solution!

14
Metadata importance
  • data about data is about as good as the
    definition gets...
  • As a data resource grows, metadata becomes more
    important
  • Lack of metadata has different consequences
  • documentation: metadata can be regenerated
    automatically, or by hand
  • datasets, pictures: once lost, metadata can be
    impossible to regenerate

15
Types of Metadata
  • Descriptive
  • Discovery / description of objects
  • Title, author, abstract, etc.
  • Structural
  • Storage & presentation of objects
  • 1 pdf file, 1 ppt file, 1 LaTeX file, etc.
  • Administrative
  • Managing and preservation of objects
  • Access control lists, terms and conditions,
    format descriptions, meta-metadata

See http://www.loc.gov/standards/metadata.html
16
Which View is Correct?
Figure 1 from http://www.dlib.org/dlib/january01/lagoze/01lagoze.html
17
Approaches to Metadata
  • from Ng, Park and Burnett, 1997 (also JASIS,
    50(13)): http://www.scils.rutgers.edu/sypark/asis.html
  • library science: bibliographic control
  • organizing the physical containers of
    information, by means of bibliographical
    description, subject analysis, and classification
    notation construction, so that the container can
    be efficiently described, identified, located and
    retrieved
  • computer and information science: data management
  • not only to store, access and utilize data
    effectively, but also to provide data security,
    data sharing, and data integrity

18
Metadata and Cataloging
  • In library science, metadata issues are closely
    tied with cataloging issues
  • purpose of a catalog (Cutter, 1904)
  • enable a person to find a book
  • show what the library has
  • assist in the choice of a work
  • Does computer science have a cataloging analog
    coupled with metadata?

19
DL Metadata Issues
  • Who provides metadata?
  • author? publisher? professional cataloger?
    extracted from content?
  • Is metadata integrated with data?
  • related question: is metadata a first-class
    object?
  • Formats!
  • which ones?
  • extensible?
  • paradox: the more powerful the format, the less
    likely it is to be used...

20
Metadata Formats and Implementation
  • Use markup languages
  • Interoperable
  • Extensible
  • Robust
  • Permits advanced search features
  • When online, the beginning of a semantic web!

21
Interesting Formats
  • Library science
  • Machine-Readable Catalogue (MARC): huge,
    extensive, all-purpose, one-size-fits-all format
  • pro: does everything
  • con: kids, don't try this at home!
  • Computer science
  • application-specific formats: refer, BibTeX,
    RFC-1807, etc.
  • Dublin Core - common ground?

22
What is a markup language?
  • Textual (i.e. person readable) language where
    significant elements are indicated by markers
  • <TITLE>XML</TITLE>
  • Examples are RTF, HTML, XML, TeX, etc.
  • Easy to process and can be manipulated by a
    variety of application programs

23
What is SGML?
  • Standard Generalized Markup Language
  • ISO 8879
  • Can define any document format of any complexity
  • Enables extensibility, structure and validation
  • Too many optional features for the Web

24
Standard Generalized Markup Language (SGML)
  • Based on GML (generalized markup language),
    developed by IBM in the 1960s
  • An international standard (ISO 8879:1986) defines
    how descriptive markup should be embedded in a
    document
  • Can define any document format of any complexity
  • Enables extensibility, structure and validation
  • Too many optional features for the Web
  • Gave birth to the extensible markup language
    (XML), W3C recommendation in 1998

25
The Purpose of SGML
  • SGML is designed to make your information last
    longer than the systems that created it. Such
    longevity also implies immunity to short-term
    changes -- such as a change from one application
    program to another -- so SGML is also inherently
    designed for re-purposing and portability.

26
What is SGML?
  • SGML (and its derivatives, HTML and XML) are
    character-based (plain-text) representations of
    electronic data; XML documents use Unicode,
    of which ASCII is a subset
  • Remember, it's all bits: meaning is derived from
    how they are organized
  • Think of SGML docs as strings that must be
    parsed: a web browser parses an HTML doc and uses
    the markup codes to display the data it contains
  • Since it's all plain text, these docs can also be
    handled by non-parsing tools (such as vi, emacs,
    perl, etc.)
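A minimal Python sketch (standard library only) of the difference between parsing markup and treating it as plain text. Note that `xml.etree.ElementTree` is an XML parser, not a full SGML parser, so the sample document, which is hypothetical, is kept well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny, well-formed XML document.
doc = "<note><to>Ann</to><body>Meet at <b>noon</b></body></note>"

# A parser turns the string into a tree of elements.
root = ET.fromstring(doc)
print(root.tag)                                # note
print(root.find("to").text)                    # Ann
print("".join(root.find("body").itertext()))   # Meet at noon

# A non-parsing tool only sees characters: a regex can pull out the
# opening tag names, but it has no notion of nesting or structure.
print(re.findall(r"<(\w+)>", doc))             # ['note', 'to', 'body', 'b']
```

The regular expression happens to work on this tiny input, but it cannot tell which element contains which; tools that need the structure use a parser.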

27
What is SGML?
  • SGML is
  • very large, powerful and complex
  • has been in heavy industrial and commercial use for
    two decades (ISO standard 1986)
  • XML is a lightweight, cut-down version of SGML

28
SGML, XML and HTML
  • SGML is the mother tongue but is overkill for
    most common desktop applications.
  • XML is an abbreviated version of SGML
  • easier to define own document types
  • easier for programmers to write programs to
    handle documents (and data)
  • omits all the options (and most of the more complex
    and less-used parts) of SGML
  • HTML is just one of many SGML or XML
    applications, and the one most frequently used on the Web

29
SGML Components
  • SGML documents have three parts
  • Declaration: specifies which characters and
    delimiters may appear in the application
  • DTD (document type definition): defines the
    syntax of markup constructs
  • Document instance: the actual text (with the tags)
    of the document
  • More info can be found at http://www.w3.org/MarkUp/SGML/

30
World Wide Web Consortium (W3C)
31
What is XML?
  • XML: eXtensible Markup Language
  • designed to improve the functionality of the Web
    by providing more flexible and adaptable
    information and identification
  • extensible because it is not a fixed format like HTML
  • a language for describing other languages (a
    meta-language)
  • design your own customised markup language

32
The HTML World
<body>
<h1> XML and Information Retrieval: A SIGIR 2000 Workshop </h1>
<p> The workshop was held on 28 July 2000. The editors of
the workshop were David Carmel, Yoelle Maarek, and Aya Soffer </p>
<h2> XQL and Proximal Nodes </h2>
<p> The paper was authored by Ricardo Baeza-Yates and
Gonzalo Navarro. The abstract of this paper is given below. </p>
<p> We consider the recently proposed language </p>
<p> The paper references the following papers:
<a href="http://www.acm.org/www8/paper/xmlql"> </a>
</p>
33
The XML World
<workshop date="28 July 2000">
<title> XML and Information Retrieval: A SIGIR 2000 Workshop </title>
<editors> David Carmel, Yoelle Maarek, Aya Soffer </editors>
<proceedings>
  <paper id="1">
    <title> XQL and Proximal Nodes </title>
    <author> Ricardo Baeza-Yates </author>
    <author> Gonzalo Navarro </author>
    <abstract> We consider the recently proposed language </abstract>
    <section name="Introduction">
      Searching on structured text is becoming more important with XML
      <subsection name="Related Work">
        The XQL language
      </subsection>
    </section>
    <cite xmlns:xlink="http://www.acm.org/www8/paper/xmlql">
    </cite>
  </paper>
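To see what the tags buy, here is a Python sketch that reads the XML World example with `xml.etree.ElementTree`. It assumes a trimmed copy: the closing `</proceedings>` and `</workshop>` tags are added so the fragment is well-formed, and the elided abstract and section text is left out. The helper `list_papers` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the "XML World" example (closing tags added).
XML_WORLD = """<workshop date="28 July 2000">
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <editors>David Carmel, Yoelle Maarek, Aya Soffer</editors>
  <proceedings>
    <paper id="1">
      <title>XQL and Proximal Nodes</title>
      <author>Ricardo Baeza-Yates</author>
      <author>Gonzalo Navarro</author>
    </paper>
  </proceedings>
</workshop>"""


def list_papers(root):
    """Return (id, title, [authors]) for every <paper> element."""
    return [(p.get("id"), p.findtext("title"),
             [a.text for a in p.findall("author")])
            for p in root.iter("paper")]


root = ET.fromstring(XML_WORLD)
print(root.get("date"))      # 28 July 2000
print(list_papers(root))
```

Recovering the same authors from the HTML version would mean picking apart the sentence "The paper was authored by ...", which breaks as soon as the wording changes.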
34
Why use XML?
  • XML is defined as a subset of SGML, the Standard
    Generalized Markup Language, an international
    standard (ISO 8879)
  • XML is a very simple dialect of SGML
  • goal: enable generic SGML to be served, received
    and processed on the Web in ways not possible
    with HTML

35
Why use XML?
  • XML is not just for Web pages
  • use it to store any kind of structured document
  • to enclose/encapsulate information in order to
    pass it between different computing systems that
    are otherwise unable to communicate

36
Key feature of XML
  • An application is free to use XML tagged data in
    many different ways, e.g.
  • produce an image
  • generate a formatted text listing
  • display the XML document's markup in pretty
    colors
  • restructure the data into a format for storing in
    a database, transmission over a network, input to
    another program.

37
XML is important because...
  • Removes two constraints that held back Web
    development
  • dependence on a single, inflexible document type
    (HTML), which was much abused
  • the complexity of full SGML: many options,
    but hard to program

38
  • XML allows the flexible development of
    user-defined document types.
  • provides a robust, non-proprietary, persistent,
    and verifiable file format for the storage and
    transmission of text and data both on and off the
    Web

39
XML Software?
  • Many programs are already XML-ready today.
  • xml.coverpages.org covers news of new additions
    to XML

40
Is XML a Computer Language?
  • XML is not C or C++ or like any other programming
    language
  • By itself, it cannot specify calculations,
    actions, decisions to be carried out in any order
  • XML is a markup specification language

41
XML - a Markup Language
  • with XML, you can design ways of describing
    information (text or data), usually for storage,
    transmission or processing by a program
  • XML conveys no information about what should be
    done with the data or text; it merely describes
    it.
  • By itself, XML does not do anything; it is a data
    description format

42
How do I run or execute an XML file?
  • You can't and you don't!
  • XML is not a programming language
  • XML is a markup specification language
  • XML files are just data (waiting for a program to
    do something with them)
  • XML files can be viewed with an XML editor or
    XML-compatible browser

43
Things to Remember
  • XML does not replace HTML; it provides an
    alternative which allows you to define your own
    set of markup elements to a published standard
    <?xml version="1.0" standalone="yes"?>
    <conversation>
      <greeting>Hello, world!</greeting>
      <response>Stop the planet, I want to get off!</response>
    </conversation>

44
Things to Remember
  • All parts of an XML document are case sEnSiTiVe
  • Element type names are case sensitive, so <BODY>
    </body> is out.
  • Attribute names are case sensitive
  • <PIC width="7cm"/> and <PIC WIDTH="6cm"/>
    describe different attributes, not just
    different values for the attribute PIC width.
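Both rules can be checked with Python's standard XML parser. This sketch goes one step beyond the slide by putting `width` and `WIDTH` on the same element, which is legal precisely because they are different attribute names.

```python
import xml.etree.ElementTree as ET

# Element names are case sensitive: <BODY> is not closed by </body>.
try:
    ET.fromstring("<BODY></body>")
    mismatched_ok = True
except ET.ParseError as err:
    mismatched_ok = False
    print("ParseError:", err)

# Attribute names are case sensitive too: width and WIDTH are two
# distinct attributes, so both may appear on one element.
pic = ET.fromstring('<PIC width="7cm" WIDTH="6cm"/>')
print(pic.attrib)  # {'width': '7cm', 'WIDTH': '6cm'}
```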

45
What is XQuery?
  • XQuery is the language for querying XML data
  • The best way to explain XQuery is to say that
    XQuery is to XML what SQL is to database tables.
  • XQuery uses XPath expressions to extract XML
    data.
  • XPath is a language for finding information in an
    XML document.
  • XPath is used to navigate through elements and
    attributes in an XML document.
  • XQuery is defined by the W3C.
  • XQuery is supported by all the major database
    engines (IBM, Oracle, Microsoft, etc.)
  • XQuery 1.0 became a W3C Recommendation in
    January 2007.
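A small taste of XPath-style navigation, sketched with Python's `xml.etree.ElementTree`. ElementTree implements only a limited XPath subset and is not an XQuery engine; the bookstore document and the `books.xml` file name in the comment are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, used only to illustrate path expressions.
DOC = """<bookstore>
  <book category="web"><title>XML Basics</title><price>39.95</price></book>
  <book category="db"><title>SQL Basics</title><price>25.00</price></book>
</bookstore>"""

root = ET.fromstring(DOC)

# Path with a descendant step and an attribute predicate.
titles = [t.text for t in root.findall(".//book[@category='web']/title")]
print(titles)  # ['XML Basics']

# In XQuery a value filter is a FLWOR "where" clause, e.g.
#   for $b in doc("books.xml")//book where $b/price < 30 return $b/title
# ElementTree has no numeric predicates, so the filter is done in Python.
cheap = [b.findtext("title") for b in root.iter("book")
         if float(b.findtext("price")) < 30]
print(cheap)  # ['SQL Basics']
```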

46
Motivation for XML Search
  • It is becoming increasingly popular to publish
    data on the Web in the form of XML documents.
  • Current search engines, which are an
    indispensable tool for finding HTML documents,
    have two main drawbacks when it comes to
    searching for XML documents.
  • It is not possible to pose queries that
    explicitly refer to XML tags.
  • Search engines return references (i.e. links) to
    documents and not specific fragments thereof.
    This is problematic, since large XML documents
    may contain thousands of elements storing many
    pieces of information that are not necessarily
    related to each other.

49
Problems with XQuery
  • A query language for XML, such as XQuery, can be
    used to extract data from XML documents.
  • However, such a query language is not an
    alternative to an XML search engine for several
    reasons.
  • The syntax of XQuery is more complicated than the
    syntax of a standard search query. Hence, it is
    not appropriate for a naive user.
  • Extensive knowledge of the document structure is
    required in order to correctly formulate a query.
    Thus, queries must be formulated on a per-document
    basis.
  • XQuery lacks any mechanism for ranking answers.
  • Solution: an XML search engine

50
XML Search Tool Design Features?
  • A simple syntax that can be used by naive users
  • Search results should include XML fragments and
    not necessarily full documents
  • The XML fragments in an answer should be
    semantically related
  • For example, a paper and an author should be in
    an answer only if the paper was written by this
    author
  • Search results should be ranked
  • Search results should be returned in reasonable
    time
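A toy illustration of three of the design features above (simple keyword syntax, fragment-level answers, ranking), under strong simplifying assumptions: each `<paper>` element is the unit of answer, "semantically related" is approximated by requiring all keywords to occur inside the same paper, and the score is just a keyword count. This is not the algorithm of any particular XML search engine; the second paper and its author are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical proceedings; paper 2 and its author are invented for the demo.
DOC = """<proceedings>
  <paper id="1">
    <title>XQL and Proximal Nodes</title>
    <author>Ricardo Baeza-Yates</author>
    <author>Gonzalo Navarro</author>
  </paper>
  <paper id="2">
    <title>Ranking XML Fragments</title>
    <author>A. N. Other</author>
  </paper>
</proceedings>"""


def search(root, keywords, unit="paper"):
    """Return (score, id, title) for every `unit` element that contains all
    keywords, best score first.

    Requiring every keyword inside ONE fragment is a crude stand-in for
    "semantically related": an author and a title only match together if
    they occur in the same paper.
    """
    hits = []
    for frag in root.iter(unit):
        text = " ".join(frag.itertext()).lower()
        counts = [text.count(k.lower()) for k in keywords]
        if all(counts):  # every keyword occurs at least once
            hits.append((sum(counts), frag.get("id"), frag.findtext("title")))
    return sorted(hits, reverse=True)


root = ET.fromstring(DOC)
print(search(root, ["navarro", "xql"]))      # [(2, '1', 'XQL and Proximal Nodes')]
print(search(root, ["navarro", "ranking"]))  # [] (the words occur in different papers)
```

Returning the matching `<paper>` rather than the whole document is what distinguishes this from a plain web search engine; a real system would need a far better relatedness test and ranking function.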

51
XML Search Engines
  • Summary of XML engines
  • Or just use a web search engine with filetype:xml
  • Many for commercial use and some in design
  • Active research area
  • Web XML is a step in the direction of the
    semantic web!

52
What is Web 2.0?
  • Term popularized by Tim O'Reilly and MediaLive
    International, following a 2004 brainstorming
    session about the future of the web
  • Also may be called the Live Web or Living Web
  • Refers to more interactive technologies that
    engage, facilitate and empower users
  • Companies utilizing interactive technologies are
    the hot investments
  • Companies are just starting to embrace these
    technologies for business value
  • Tim's Def (Video), Schmidt's (Video)
  • The Machine (Video)

53
Web 1.0 vs 2.0 (Some Examples)
Source: www.oreilly.com, "What Is Web 2.0: Design
Patterns and Business Models for the Next
Generation of Software", 9/30/2005
54
Web 3.0: This will be the INTELLIGENT Web!
The Semantic Web!
55
How will we get the semantic web?
56
  • The Web and Web 2.0 were designed with humans in
    mind.
  • (Human Understanding)
  • Web 3.0 will anticipate our information needs,
    whether State Department information when
    traveling, foreign embassy contacts, airline
    schedules, hotel reservations, area taxis, or
    famous restaurants. The new Web will be designed
    for computers.
  • (Machine Understanding)
  • Web 3.0 will be designed to anticipate the
    meaning of the search.

57
Web 2.0 vs Web 3.0
  • Web 2.0: On the Web, you can see your e-mails,
    photographs, and restaurant appointments.
  • Web 3.0: On the Web, you can see your
    photographs arranged so that you know which
    restaurants you visited on a particular date,
    based on related e-mails sent that day.
58
  • The next stage for the Web will be making data
    accessible to artificial intelligence agents.
  • Web 3.0 will need new languages beyond HTML
    or XML, such as RDF (Resource Description
    Framework).
  • Web 3.0 will need data delivered in
    computer-readable form (RDF).

59
General idea of Semantic Web
  • Make current web more machine accessible and
    intelligent!
  • (currently all the intelligence is in the user)
  • Motivating use-cases
  • Search engines
  • concepts, not keywords
  • semantic narrowing/widening of queries
  • Shopbots
  • semantic interchange, not screen-scraping
  • E-commerce
  • Negotiation, catalogue mapping, personalisation
  • Web Services
  • Need semantic characterisations to find them
  • Navigation
  • by semantic proximity, not hardwired links
  • .....

60
Example
  • Try these queries with Google
  • Distance between Paris and Madrid: Google returns
  • www.freedom-tour.com/mall/kmeurope.htm (giving
    you distances in miles and kilometers)
  • (The) Largest city of France: Google returns
    "France Largest City: Paris"
  • (The) Largest city of Spain: Google returns
    "Spain Largest City: Madrid"
  • Now, try these with Google
  • Distance between largest city of France and
    largest city of Spain
  • Distance between "largest city of France" and
    "largest city of Spain"
  • And worst, Distance between "the largest city of
    France" and "the largest city of Spain": no
    result returned by Google!

61
Example
  • So, what's wrong with Google?
  • Nothing. The problem is with the World Wide Web
  • The Web contains unstructured information
  • and Google is a keyword- and phrase-based search
    engine
  • Initiative to make the contents of the Web
    structured information/represented knowledge:
  • the Semantic Web
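To make the contrast with keyword search concrete, here is a Python sketch of how the composed query could be answered once the facts are stored as structured data. The predicate names (`largestCity`, `latLon`) are hypothetical, the coordinates are approximate city-centre values, and the distance is a great-circle (haversine) estimate, not a road distance.

```python
import math

# Hypothetical mini knowledge base of (subject, predicate, object) triples.
TRIPLES = {
    ("France", "largestCity", "Paris"),
    ("Spain", "largestCity", "Madrid"),
    ("Paris", "latLon", (48.8566, 2.3522)),     # approximate
    ("Madrid", "latLon", (40.4168, -3.7038)),   # approximate
}


def value(subject, predicate):
    """Return the object of (subject, predicate, ?), or raise KeyError."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    raise KeyError((subject, predicate))


def great_circle_km(a, b, radius_km=6371.0):
    """Haversine distance between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# "Distance between the largest city of France and the largest city of Spain"
# becomes two lookups plus a computation, instead of a phrase match.
city_fr = value("France", "largestCity")
city_es = value("Spain", "largestCity")
km = great_circle_km(value(city_fr, "latLon"), value(city_es, "latLon"))
print(city_fr, city_es, round(km))  # Paris Madrid, then roughly 1,050 km
```

The point is not the arithmetic but the composition: because "largest city of" is an explicit relation rather than a phrase in a page, a program can chain it with other facts.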

62
General idea of Semantic Web (2)
  • Do this by
  • Making data and metadata available on the Web in
    machine-understandable form (formalized)
  • Structure the data and meta-data in ontologies

63
Expressed using the W3C stack
64
What it's like to be a machine on the Web
65
Required are
  • Explicit meta-data
  • Shared domain descriptions
  • Machine-processable content
  • Machine-support for interoperability

66
machine accessible meaning (What it's like
to be a machine)
67
XML ≠ machine accessible meaning
68
So why not just use XML?
  • No agreement on
  • structure
  • is country a
  • object?
  • class?
  • attribute?
  • relation?
  • something else?
  • what does nesting mean?
  • vocabulary
  • is country the same as nation?

<country name="Netherlands">
  <capital name="Amsterdam">
    <areacode>020</areacode>
  </capital>
</country>

<nation>
  <name>Netherlands</name>
  <capital>Amsterdam</capital>
  <capital_areacode> 020 </capital_areacode>
</nation>
  • Are the above XML documents the same?
  • Do they convey the same information?
  • Is that information machine-accessible?
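A Python sketch of why the answer to the last question is "not by itself": generic XML processing sees two different trees, and the same tag name `capital` even holds different things. Only after a human writes a separate mapping for each vocabulary do both yield the same facts. The predicate names `hasCapital` and `areaCode` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two documents from the slide, written compactly.
DOC_A = ('<country name="Netherlands"><capital name="Amsterdam">'
         '<areacode>020</areacode></capital></country>')
DOC_B = ('<nation><name>Netherlands</name><capital>Amsterdam</capital>'
         '<capital_areacode> 020 </capital_areacode></nation>')


def triples_from_a(xml_text):
    # Schema A: names are attributes and the capital is nested in the country.
    root = ET.fromstring(xml_text)
    cap = root.find("capital")
    return {(root.get("name"), "hasCapital", cap.get("name")),
            (cap.get("name"), "areaCode", cap.findtext("areacode").strip())}


def triples_from_b(xml_text):
    # Schema B: everything is a child element with its own tag.
    root = ET.fromstring(xml_text)
    country, capital = root.findtext("name"), root.findtext("capital")
    return {(country, "hasCapital", capital),
            (capital, "areaCode", root.findtext("capital_areacode").strip())}


a, b = ET.fromstring(DOC_A), ET.fromstring(DOC_B)
print(a.tag, b.tag)                                    # country nation
print(a.find("capital").text, b.find("capital").text)  # None Amsterdam
print(triples_from_a(DOC_A) == triples_from_b(DOC_B))  # True
```

The knowledge that makes the two documents equivalent lives in the two hand-written functions, not in the XML; RDF and RDF Schema aim to move that knowledge into shared, machine-readable form.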

69
2nd aim of Semantic Web: Data integration
  • Unstructured and semi-structured sources (document
    collections, message traffic, web pages, ...)
  • Structured data without an explicit data schema
    (non-local databases, data tables, charts and
    reports, ...)
  • Non-text collections (image, video, sound, ...)
  • Streams of data from sensors, programs, services
  • Must specify the structure of data resources...

70
2nd aim of Semantic Web: Data integration
  • ... so a processor can tell how the "attributes"
    and "values" are related
  • What is required vs. optional?
  • How many values for a particular attribute?
  • What attributes are keys for other attributes?
  • Which attributes are necessarily related to other
    attributes and in what way?
  • How do the attributes (and values) in one data
    source map to attributes and values describing
    another source?

71
Stack of languages
  • XML
  • Surface syntax, no semantics
  • XML Schema
  • Describes structure of XML documents
  • RDF
  • Datamodel for relations between things
  • RDF Schema (RDFS)
  • RDF Vocabulary Definition Language
  • OWL
  • A more expressive Vocabulary Definition Language

72
Semantic web languages today
  • Today there are three semantic web languages
  • RDF: Resource Description Framework,
    http://www.w3.org/RDF/
  • DAML+OIL: DARPA Agent Markup Language + Ontology
    Inference Layer, http://www.daml.org/ (deprecated)
  • OWL: Web Ontology Language,
    http://www.w3.org/2001/sw/
  • OWL Lite
  • OWL DL
  • OWL Full

73
RDF is the first Semantic Web language
RDF is a simple language for building graph-based
representations
Figure: three views of the RDF data model
  • Graph: good for human viewing
  • XML encoding, <rdf:RDF ..> <.> <.> </rdf:RDF>: good for machine processing
  • Triples: good for reasoning
      stmt(docInst, rdf_type, Document)
      stmt(personInst, rdf_type, Person)
      stmt(inroomInst, rdf_type, InRoom)
      stmt(personInst, holding, docInst)
      stmt(inroomInst, person, personInst)
74
The RDF Data Model
  • An RDF document is an unordered collection of
    statements, each with a subject, predicate and
    object (aka triples)
  • A triple can be thought of as a labelled arc in a
    graph
  • Statements describe properties of web resources
  • A resource is any object that can be pointed to
    by a URI
  • a document, a picture, a paragraph on the Web,
  • E.g., http://umbc.edu/ypeng/F07671.html
  • a book in the library, a real person (?)
  • isbn://5031-4444-3333
  • Properties themselves are also resources (URIs)
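A minimal sketch of the triple model in plain Python, assuming nothing beyond the bullets above: a graph is a set of (subject, predicate, object) triples, predicates are URIs too, and following an object into the subject position of another triple walks the graph. The URIs under `http://example.org/` and the helper `match` are hypothetical; a real application would use an RDF library or triple store.

```python
# Hypothetical URIs under the reserved example.org domain.
EX = "http://example.org/"

# An RDF graph viewed as an unordered set of (subject, predicate, object).
# Predicates are URIs too; an object may be a URI or a literal string.
GRAPH = {
    (EX + "doc1", EX + "creator", EX + "alice"),
    (EX + "doc1", EX + "title", "Notes on RDF"),
    (EX + "alice", EX + "name", "Alice"),
}


def match(graph, s=None, p=None, o=None):
    """Return matching triples, sorted; None acts as a wildcard."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


# Triples link into a graph: the object of one statement is the
# subject of another.
creator = match(GRAPH, s=EX + "doc1", p=EX + "creator")[0][2]
name = match(GRAPH, s=creator, p=EX + "name")[0][2]
print(creator, name)  # http://example.org/alice Alice
```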

76
RDF without a Schema
  • Object -> Attribute -> Value triples
  • objects are web-resources
  • Value is again an Object
  • triples can be linked
  • data-model graph

77
Bluffer's guide to RDF (2)
  • Every identifier is a URI
  • world-wide unique naming!
  • Has XML syntax
  • Any statement can be an object
  • graphs can be nested

78
What does RDF Schema add?
  • Defines vocabulary for RDF
  • Organizes this vocabulary in a typed hierarchy
  • Class, subClassOf, type
  • Property, subPropertyOf
  • domain, range

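Figure: classes Author and Reader, each a subClassOf Person; the property
communicatesTo, with a domain and a range; and the instances Lynda and Frank,
linked to the classes by type and to each other by communicatesTo.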
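A small sketch of the kind of inference that RDF Schema vocabulary enables, using plain Python tuples rather than an RDF library. It applies two RDFS-style rules until nothing new is derived: subClassOf is transitive, and an instance of a subclass is also an instance of its superclass. Which instance belongs to which class (Lynda as an Author, Frank as a Reader) is an assumption read off the figure labels; domain and range rules are omitted.

```python
# Triples from the figure (plain strings instead of URIs for brevity).
# ASSUMPTION: Lynda is typed as Author and Frank as Reader.
TRIPLES = {
    ("Author", "subClassOf", "Person"),
    ("Reader", "subClassOf", "Person"),
    ("Lynda", "type", "Author"),
    ("Frank", "type", "Reader"),
}


def rdfs_closure(triples):
    """Apply two RDFS-style rules until no new triple is derived:
    (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    (x type B),       (B subClassOf C)  =>  (x type C)
    """
    facts = set(triples)
    while True:
        new = set()
        for a, p1, b in facts:
            for c, p2, d in facts:
                if p2 == "subClassOf" and b == c and p1 in ("subClassOf", "type"):
                    new.add((a, p1, d))
        if new <= facts:      # fixpoint reached
            return facts
        facts |= new


closed = rdfs_closure(TRIPLES)
print(sorted(t for t in closed if t[1] == "type"))
# [('Frank', 'type', 'Person'), ('Frank', 'type', 'Reader'),
#  ('Lynda', 'type', 'Author'), ('Lynda', 'type', 'Person')]
```

Nobody stated that Lynda is a Person; the schema vocabulary lets a program derive it.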
79
Which Semantic Web?
  • Version 1: "Semantic Web as Web of Data" (TBL)
  • recipe: expose databases on the web, use XML,
    RDF, integrate
  • metadata from:
  • expressing DB schema semantics in
    machine-interpretable ways
  • enable integration and unexpected re-use

80
Which Semantic Web?
  • Version 2: Enrichment of the current Web
  • recipe: annotate, classify, index
  • metadata from:
  • automatically producing markup: named-entity
    recognition, concept extraction, tagging, etc.
  • enable personalization, search, browse,..

81
Which Semantic Web?
  • Version 1: Semantic Web as Web of Data
  • Version 2: Enrichment of the current Web
  • Different use-cases
  • Different techniques
  • Different users

82
Four popular fallacies about the Semantic Web
Semantic Web research
83
First clear up some popular misunderstandings
  • False statement:
  • "Semantic Web people try to enforce meaning from
    the top"

They only enforce a language. They don't
enforce what is said in that language. Compare
HTML: enforced from the top, but content is
entirely free.
84
First clear up some popular misunderstandings
  • False statement:
  • "The Semantic Web people will require everybody
    to subscribe to a single predefined 'meaning' for
    the terms we use."

Of course, meaning is fluid, contextual,
etc. Lots of work on (semi-)automatically
bridging between different vocabularies.
85
First clear up some popular misunderstandings
  • False statement:
  • "The Semantic Web will require users to
    understand the complicated details of formalised
    knowledge representation."

All of this is under the hood.
86
First clear up some popular misunderstandings
  • False statement:
  • "The Semantic Web people will require us to
    manually mark up all the existing web pages."

Lots of work on automatically producing semantic
markup: named-entity recognition, concept
extraction, etc.
87
The current state of the Semantic Web
Semantic Web research
88
4 hard questions about the Semantic Web
  • Q1: where does the meta-data come from?
  • NL technology is delivering on
    concept-extraction
  • Socially emerging (learning from tagging).
  • Q2: where do the meta-data schemas come from?
  • many handcrafted schemas
  • hierarchy learning remains hard
  • relation extraction remains hard.
  • Q3: what to do with many meta-data schemas?
  • ontology mapping/aligning remains VERY hard.
  • Q4: where's the Web in the Semantic Web?
  • more attention to social aspects (P2P, FOAF)
  • non-textual media remains hard
  • deal with typical Web requirements.

89
Advanced Search
  • Metadata and semantic web will make advanced
    search much easier
  • Growth of web metadata.
  • Folksonomies!
  • Tools that automatically generate metadata
  • TREC 2008

90
Search for Web 3.0
  • Natural language queries
  • Search agent (avatar) understands and anticipates
    your needs
  • Personal life search with avatar

91
The Evolving Web
DATA/PROGRAMS
DOCUMENTS
92
Web Semantics
Semantic Web Layer Cake (Berners-Lee, '99;
Swartz-Hendler, 2001)
93
Semantic Web 2008 - ?
(Jim Hendler - internal talk, Microsoft Labs,
July 2008)
94
Web 4.0 -?)
95
The next 5000 days of the web
  • Kevin Kelly
  • Co-founder of WIRED magazine
  • Video

96
Web 4.0
  • Machines talk back!

97
Search for Web 4.0
  • We get real help when we search!

Terminator: The Sarah Connor Chronicles (Cameron's
on our side!)
98
What we covered
  • The web of data
  • XML, RDF, others
  • Web 2.0
  • The social web
  • Web 3.0
  • The semantic web
  • Future of the web