Title: Web Intelligence


1
Web Intelligence
  • By Otto Borchert
  • April 28, 2003

2
Background
  • Application Layer / HTTP
  • Agents
  • Present - Google / PageRank
  • Future - Semantic Web / OWL

3
Hypertext Transfer Protocol (HTTP)
  • Application level protocol (World Wide Web)
  • Runs over TCP, normally port 80
  • Information retrieved using a URL (Uniform
    Resource Locator): protocol://host:port
  • Typical HTTP packet format:
  • START_LINE<CRLF>
  • MESSAGE_HEADER<CRLF>
  • <CRLF>
  • MESSAGE_BODY<CRLF>
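The packet layout above can be sketched as a small parser. This is a minimal illustration, not a full HTTP implementation: the function name `parse_http_message` is hypothetical, and the sample message is assumed for the demonstration.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, headers, and body."""
    # The empty <CRLF> line separates the headers from the message body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        # Each header line has the form "Name: value".
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Sample message assumed for illustration.
raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.cs.ndsu.nodak.edu\r\n"
    b"\r\n"
)
start_line, headers, body = parse_http_message(raw)
```

Header names are lowercased here because HTTP header names are case-insensitive; a real parser would also need to handle repeated headers and message framing.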

4
Request Messages
  • Given by client on START_LINE
  • Includes:
  • OPTIONS: request information about available
    options
  • GET (one of the 2 most commonly used): retrieve
    document identified in URL
  • HEAD (the other most commonly used): retrieve
    metainformation about document identified in URL
    (find out how old a page is)
  • POST: give information to server
  • PUT: store document under specified URL
  • DELETE: delete specified URL
  • TRACE: loopback request message
  • CONNECT: for use by proxies

5
Example request
  • GET http://www.cs.ndsu.nodak.edu/index.html
    HTTP/1.1
  • Gives the entire URL in START_LINE
  • GET /index.html HTTP/1.1
  • Host: www.cs.ndsu.nodak.edu
  • Precise page given in START_LINE, host in
    MESSAGE_HEADER
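The two request forms above can be sketched as follows. The helper name `build_get_request` and its `absolute_form` flag are hypothetical; the sketch only assembles the bytes and does not open a connection.

```python
from urllib.parse import urlsplit


def build_get_request(url: str, absolute_form: bool = False) -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Either the whole URL or only the path goes on the start line.
    target = url if absolute_form else path
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",  # HTTP/1.1 requests carry a Host header
        "",                       # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("http://www.cs.ndsu.nodak.edu/index.html")
```

Passing `absolute_form=True` yields the first form on the slide, with the full URL on the start line.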

6
Server reply
  • Server replies with a Response Message
  • Contains the version of HTTP being used, a 3-digit
    code indicating whether or not the request was
    successful, and the reason for giving that code
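A minimal sketch of splitting the first line of a reply into those three parts, assuming a well-formed status line. The function name `parse_status_line` is hypothetical.

```python
def parse_status_line(line: str):
    """Split a status line into (HTTP version, numeric code, reason phrase)."""
    parts = line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The reason phrase may contain spaces, or be missing entirely.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
```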

7
Codes
  • 1xx: Informational (Request received, continuing
    process)
  • 2xx: Success (Action successfully received,
    understood, and accepted)
  • 3xx: Redirection (further action must be taken
    to complete the request)
  • 4xx: Client Error (request contains bad syntax
    or cannot be fulfilled)
  • 5xx: Server Error (server failed to fulfill an
    apparently valid request)
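Because the class is determined by the first digit alone, the table above can be sketched as a lookup on `code // 100`. The names `STATUS_CLASSES` and `status_class` are hypothetical.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Return the class of a 3-digit HTTP status code from its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")
```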

8
Example Replies
  • HTTP/1.1 200 OK
  • Web page request succeeded, page is returned and
    displayed
  • HTTP/1.1 404 Not Found
  • The usual not found error
  • HTTP/1.1 301 Moved Permanently
  • The page has moved; the reply includes a Location
    header in its MESSAGE_HEADER to tell where the
    page has been moved to

9
HTTP extras
  • In version 1.0, one TCP connection was opened for
    each request; version 1.1 allows persistent
    connections
  • HTTP was set up with web caching in mind. One can
    check the date a page was last updated and store
    the newest versions of frequently accessed pages
    on a local machine
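The freshness check described above can be sketched by comparing two Last-Modified header values, for example one stored with the cached copy and one returned by a HEAD request. The function name and the sample dates are assumptions for illustration. In practice a client would usually send an If-Modified-Since header and let the server answer 304 Not Modified.

```python
from email.utils import parsedate_to_datetime


def is_cache_stale(cached_last_modified: str, server_last_modified: str) -> bool:
    """Return True if the server's copy is newer than the cached copy."""
    cached = parsedate_to_datetime(cached_last_modified)
    current = parsedate_to_datetime(server_last_modified)
    return current > cached


# Hypothetical header values in the HTTP date format.
cached_header = "Mon, 28 Apr 2003 12:00:00 GMT"
server_header = "Tue, 29 Apr 2003 08:30:00 GMT"
needs_refresh = is_cache_stale(cached_header, server_header)
```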

10
Is the web intelligent?
  • Intelligence is a poorly defined word anyway. For
    example, would you consider these intelligent?
  • Document analysis systems for cataloging and
    summarizing Web pages
  • Profiling systems for placing selective Web
    advertising
  • Data mining and analysis
  • Tools for searching databases supported by Web
    browsers
  • Translation tools that convert to and from human
    languages
  • Statistical software for network caching,
    routing, and tracking
  • Knowledge-based systems for automated e-mail
    reading
  • Smart agents for Internet-based product and
    service marketing
  • Video object recognition and searching

11
Is the web intelligent? (2)
  • One of the most important advances toward an
    intelligent web is the use of agents.
  • These agents take many forms including many
    listed on the previous slide

12
What is an agent?
  • No standard definition
  • Can be:
  • Web Crawler
  • Travel Agent
  • Secretary
  • Hard to distinguish between agent and program.
    Agent normally performs actions based on data it
    finds, without much human intervention
  • Agents can be defined as intelligent as well
  • Act as the glue for many of the following ideas

13
The Present of Web Intelligence - Google
  • Presently the most-used search engine on the
    Internet.
  • Provides a unique blend of computer hardware and
    software to complete millions of user searches
    each day
  • Based on a system called PageRank

14
PageRank
  • Developed by Larry Page and Sergey Brin at
    Stanford University (Google's founders)
  • Uses a system of link ranking
  • If there is a link from page A to page B, page B
    is correlated to page A
  • If page A is a strong page to begin with, page B
    becomes stronger as well
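The slide gives only the intuition. A minimal sketch of one common formulation, power iteration over the link graph, is shown here. The damping factor of 0.85, the iteration count, and the three-page graph are assumptions for illustration, not details from the presentation or Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: A and C both link to B, and B links back to A.
ranks = pagerank({"A": ["B"], "C": ["B"], "B": ["A"]})
```

In this graph B ends up strongest because two pages link to it, and A outranks C because the strong page B links to A while nothing links to C.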

15
Word Association
  • On top of PageRank, there is also a system of
    word matching.
  • Word counts (Do the words exist on the page?)
  • Proximity checks (Are the words close together?)
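The two checks above can be sketched as follows. This is an illustration of word counting and proximity in general, not Google's actual matching code; the function names and the sample text are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_counts(text, query_words):
    """Count how often each query word occurs on the page."""
    tokens = tokenize(text)
    return {word: tokens.count(word.lower()) for word in query_words}


def min_distance(text, word_a, word_b):
    """Return the smallest gap, in words, between two query words, or None."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


page = "The round ball rolled away. A ball is round."
counts = word_counts(page, ["round", "ball", "wine"])
gap = min_distance(page, "round", "ball")
```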

16
Can't you cheat PageRank?
  • People try every day!
  • Higher search ranking means more exposure
  • Link farms:
  • Places where people merely have millions of links
    to a web page in hopes the target will move
    higher on the list.
  • Google's answer: page importance. Once link farms
    are discovered, they are given a negative rank,
    so if you have a page on a link farm, its rank
    will go down as well

17
Another way to cheat
  • Put lots of words related to your page in your
    page (even if they are not visible)
  • Google's answer: PageRank is primary, and
    cheaters are given lower priority

18
Moral Decisions
  • Wired article
  • Computer screen shows (location, query) pairs for
    random searches on Google's engines.
  • One search during the late hours on the West
    Coast was "How to stop a friend from committing
    suicide"
  • Can't do much about it but make sure they get the
    right information the next time

19
The Future of Web Intelligence
  • The Semantic Web

20
What is the Semantic Web?
  • As the web presently stands, it is complete
    nonsense to most software applications.
  • To software, these are two completely different
    statements:
  • "The ball is round"
  • "The round ball"
  • The semantic web is a series of protocols meant
    to enrich the current web with meaning

21
Series of Protocols
  • RDF: Resource Description Framework
  • OWL: Web Ontology Language (extension of RDF)

22
Resource Description Framework
  • From the World Wide Web Consortium webpage:
  • "RDF defines a mechanism for describing resources
    that makes no assumptions about a particular
    application domain, nor defines (a priori) the
    semantics of any application domain. The
    definition of the mechanism should be domain
    neutral, yet the mechanism should be suitable for
    describing information about any domain."

23
RDF: Some examples
  • Ora Lassila is the creator of the resource
    http://www.w3.org/Home/Lassila.
  • Abstract, conceptual Framework
  • Concrete syntax using XML

24
Abstract example
  • Subject (Resource):
  • http://www.w3.org/Home/Lassila
  • Predicate (Property):
  • Creator
  • Object (literal):
  • "Ora Lassila"
  • Graphic
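The subject, predicate, and object above form one statement (a triple). A minimal sketch of holding such statements in memory and querying them follows; the `Triple` type and the `objects_of` helper are hypothetical names for illustration, not part of RDF itself.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property
    obj: str        # the value (a literal here)


statements = [
    Triple("http://www.w3.org/Home/Lassila", "Creator", "Ora Lassila"),
]


def objects_of(triples, subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


creators = objects_of(statements, "http://www.w3.org/Home/Lassila", "Creator")
```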

25
Concrete syntax
  • Ora Lassila is the creator of the resource
    http://www.w3.org/Home/Lassila.
  • <rdf:RDF>
  • <rdf:Description about="http://www.w3.org/Home/Lassila">
  • <s:Creator>Ora Lassila</s:Creator>
  • </rdf:Description>
  • </rdf:RDF>
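A minimal sketch of reading that statement back out with Python's standard XML parser. The slide omits the namespace declarations, so the `xmlns` attributes here are added for the parser's sake, and the URI chosen for the `s` prefix is an assumption. This reads only the simple pattern shown and is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
S_NS = "http://description.org/schema/"  # assumed URI for the "s" prefix

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:s="{S_NS}">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # Accept the attribute with or without the rdf: prefix.
        subject = desc.get("about") or desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree reports tags as "{namespace}localname".
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples


triples = extract_triples(DOC)
```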

26
Web Ontology Language
  • What is an ontology?
  • defines the terms used to describe and represent
    an area of knowledge
  • OWL defines ontologies for use on the web
  • Actually an extension of RDF

27
Ontologies
  • Date and Time
  • Countries of the World
  • Wines
  • Space Shuttle Information

28
Some example OWL statements
  • <owl:Class rdf:ID="WineGrape">
  • <rdfs:subClassOf rdf:resource="&food;Grape" />
  • </owl:Class>
  • <owl:Class rdf:ID="WhiteWine">
  • <owl:intersectionOf rdf:parseType="Collection">
  • <owl:Class rdf:about="#Wine" />
  • <owl:Restriction>
  • <owl:onProperty rdf:resource="#hasColor" />
  • <owl:hasValue rdf:resource="#White" />
  • </owl:Restriction>
  • </owl:intersectionOf>
  • </owl:Class>
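What these statements assert can be sketched in a few lines: WineGrape is a kind of Grape, and WhiteWine is exactly the set of Wines whose hasColor value is White. The dictionaries and individuals here are hypothetical stand-ins for illustration; a real application would use an OWL reasoner rather than hand-written checks.

```python
# Class hierarchy from the first statement: WineGrape is a subclass of Grape.
SUBCLASS_OF = {"WineGrape": "Grape"}


def is_subclass(cls, ancestor):
    """Follow subClassOf links upward until the ancestor is found or the chain ends."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def is_white_wine(individual):
    """WhiteWine is the intersection of Wine and the restriction hasColor = White."""
    return "Wine" in individual["types"] and individual.get("hasColor") == "White"


# Hypothetical individuals, not taken from the slides.
chardonnay = {"types": {"Wine"}, "hasColor": "White"}
merlot = {"types": {"Wine"}, "hasColor": "Red"}
milk = {"types": {"Beverage"}, "hasColor": "White"}
```

Both conditions must hold: a red wine fails the color restriction, and a white beverage that is not a Wine fails the class test.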

29
Conclusion
  • Web intelligence is a broad new field for
    exploration
  • Present efforts like Google can be improved upon
    with more semantic information
  • Any questions?