Web Intelligence - PowerPoint PPT Presentation



Web Intelligence
  • By Otto Borchert
  • April 28, 2003

  • Application Layer / HTTP
  • Agents
  • Present - Google / PageRank
  • Future - Semantic Web / OWL

Hypertext Transfer Protocol (HTTP)
  • Application level protocol (World Wide Web)
  • Runs over TCP, normally port 80
  • Information retrieved using a URL (Uniform
    Resource Locator): protocol://host:port
  • Typical HTTP packet format
  • <CRLF>
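
As a minimal sketch of the protocol://host:port layout, Python's standard urllib.parse can split a URL into those parts. The host is the one used in this presentation's request example; the explicit ":80" is added here only for illustration.

```python
from urllib.parse import urlsplit

# Example URL; the explicit ":80" is an illustrative assumption.
url = "http://www.cs.ndsu.nodak.edu:80/index.html"
parts = urlsplit(url)

scheme = parts.scheme      # protocol, here "http"
host = parts.hostname      # host name of the server
port = parts.port or 80    # fall back to 80 when the URL names no port
path = parts.path          # document requested from the server

print(scheme, host, port, path)
```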

Request Messages
  • Given by the client on the START_LINE
  • Includes:
  • OPTIONS: request information about available
    options
  • GET (one of the 2 most commonly used): retrieve
    document identified in URL
  • HEAD (the other most commonly used): retrieve
    metainformation about document identified in URL
    (find out how old a page is)
  • POST: give information to server
  • PUT: store document under specified URL
  • DELETE: delete specified URL
  • TRACE: loopback request message
  • CONNECT: for use by proxies

Example request
  • GET http://www.cs.ndsu.nodak.edu/index.html
  • Give entire descriptor in START_LINE
  • GET /index.html HTTP/1.1
  • Host: www.cs.ndsu.nodak.edu
  • Precise page given in START_LINE, host in
    MESSAGE_HEADER
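
The second request form above can be sketched as plain text assembled in Python. This is an illustrative sketch, not a full client: build_get_request is a hypothetical helper name, and only the Host header is included.

```python
CRLF = "\r\n"

def build_get_request(host: str, path: str = "/index.html") -> str:
    """Return a minimal HTTP/1.1 GET request as text (hypothetical helper)."""
    start_line = f"GET {path} HTTP/1.1"   # START_LINE: method, page, version
    headers = [f"Host: {host}"]           # MESSAGE_HEADER lines
    # Every line ends with CRLF; an empty line marks the end of the headers.
    return CRLF.join([start_line, *headers]) + CRLF + CRLF

request = build_get_request("www.cs.ndsu.nodak.edu")
print(request)
```

Sending this text over a TCP connection to port 80 of the host would be the next step; that part is left out to keep the sketch free of network access.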

Server reply
  • Server replies with a Response Message
  • Contains version of HTTP being used, 3 digit code
    indicating whether or not the request was
    successful and the reason for giving that code

  • 1xx: Informational (request received, continuing
    process)
  • 2xx: Success (action successfully received,
    understood, and accepted)
  • 3xx: Redirection (further action must be taken
    to complete the request)
  • 4xx: Client Error (request contains bad syntax
    or cannot be fulfilled)
  • 5xx: Server Error (server failed to fulfill an
    apparently valid request)

Example Replies
  • HTTP/1.1 200 OK
  • Request succeeded; the server returns the page,
    which the browser displays
  • HTTP/1.1 404 Not Found
  • The usual not found error
  • HTTP/1.1 301 Moved Permanently
  • The page has moved; the reply includes a
    MESSAGE_HEADER, as in a request, that tells
    where the page has been moved to
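
A reply's first line can be split into version, code, and reason, and the first digit of the code gives its class. The sketch below assumes well-formed status lines; parse_status_line is a hypothetical helper name.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def parse_status_line(line: str):
    """Split a status line into (version, code, reason, class name)."""
    # Split at most twice so a multi-word reason stays in one piece.
    version, code_text, reason = line.strip().split(" ", 2)
    code = int(code_text)
    return version, code, reason, STATUS_CLASSES.get(code // 100, "Unknown")

for line in ["HTTP/1.1 404 Not Found", "HTTP/1.1 301 Moved Permanently"]:
    print(parse_status_line(line))
```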

HTTP extras
  • In version 1.0, each request used its own TCP
    connection. Version 1.1 allows persistent
    connections, so several requests can share one
  • HTTP was set up with web caching in mind. A
    client can check the date a page was last updated
    and store the newest versions of frequently
    accessed pages on a local machine
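
The caching check can be sketched as a comparison of two last-updated dates in the HTTP date format. The dates below are invented for illustration, and cached_copy_is_current is a hypothetical helper; a real cache would obtain the server's date from a HEAD request or a conditional GET.

```python
from email.utils import parsedate_to_datetime

def cached_copy_is_current(cached_last_modified: str, server_last_modified: str) -> bool:
    """True if the server's copy is no newer than the cached copy."""
    cached = parsedate_to_datetime(cached_last_modified)
    server = parsedate_to_datetime(server_last_modified)
    return server <= cached

# Invented example dates in the HTTP date format.
cached = "Mon, 21 Apr 2003 10:00:00 GMT"
unchanged = cached_copy_is_current(cached, "Mon, 21 Apr 2003 10:00:00 GMT")
updated = cached_copy_is_current(cached, "Mon, 28 Apr 2003 09:30:00 GMT")
print(unchanged, updated)
```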

Is the web intelligent?
  • Intelligence is a poorly defined word anyway. For
    example, would you consider these intelligent?
  • Document analysis systems for cataloging and
    summarizing Web pages
  • Profiling systems for placing selective Web
  • Data mining and analysis
  • Tools for searching databases supported by Web
  • Translation tools that convert to and from human
    languages
  • Statistical software for network caching,
    routing, and tracking
  • Knowledge-based systems for automated e-mail
  • Smart agents for Internet-based product and
    service marketing
  • Video object recognition and searching

Is the web intelligent? (2)
  • One of the most important advances in making the
    web intelligent is through the use of agents.
  • These agents take many forms including many
    listed on the previous slide

What is an agent?
  • No standard definition
  • Can be
  • Web Crawler
  • Travel Agent
  • Secretary
  • Hard to distinguish between agent and program.
    Agent normally performs actions based on data it
    finds, without much human intervention
  • Agents can be defined as intelligent as well
  • Act as the glue for many of the following ideas

The Present of Web Intelligence - Google
  • Presently the most used search engine the
    Internet has to offer.
  • Provides a unique blend of computer hardware and
    software to complete millions of user searches
    each day
  • Based on a system called PageRank

  • Developed by Larry Page and Sergey Brin at
    Stanford University (Google's founders)
  • Uses a system of link ranking
  • If there is a link from page A to page B, page B
    is correlated to page A
  • If page A is a strong page to begin with, page B
    becomes stronger as well
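
The link-ranking idea can be sketched as a small iterative computation. The slides give only the intuition, so the details here are assumptions: the damping factor of 0.85 and the update rule follow the commonly published PageRank formulation, and the four-page link graph is invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal ranks
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # A page passes its rank on in equal shares to its targets.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Invented graph: nothing links to D, several pages link to B and C.
links = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["C"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 4))
```

Page D, which no page links to, ends with the smallest rank, while pages linked from strong pages score higher.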

Word Association
  • On top of PageRank, there is also a system of
    word matching.
  • Word counts (Do the words exist on the page?)
  • Proximity checks (Are the words close together?)
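
Both checks can be sketched with a few lines of Python. This is an illustration of the idea only, not Google's actual matching; the sample page text and the helper names are invented.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def word_counts(text, query_words):
    """Word count check: how often does each query word appear?"""
    words = tokenize(text)
    return {q: words.count(q) for q in query_words}

def min_distance(text, first, second):
    """Proximity check: smallest gap in words, or None if a word is absent."""
    words = tokenize(text)
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    if not pos_first or not pos_second:
        return None
    return min(abs(i - j) for i in pos_first for j in pos_second)

# Invented sample page text.
page = "The semantic web adds meaning to the web so that agents can search the web"
counts = word_counts(page, ["web", "agents"])
gap = min_distance(page, "agents", "web")
print(counts, gap)
```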

Can't you cheat PageRank?
  • People try every day!
  • Higher search ranking = more exposure
  • Link Farms
  • Sites that merely hold millions of links to a
    web page, in the hope that the target will move
    higher on the list.
  • Google's answer: page importance. Once link
    farms are discovered, they are given a negative
    rank, so if you have a page on a link farm, its
    rank will go down as well

Another way to cheat
  • Put lots of words related to your page in your
    page (even if they are not visible)
  • Google's answer: PageRank is primary, and
    cheaters are given lower priority

Moral Decisions
  • Wired article
  • A computer screen shows (location, query) pairs
    for random searches on Google's engines.
  • One search during the late hours on the West
    Coast was How to stop a friend from committing
  • Can't do much about it except make sure they get
    the right information the next time

The Future of Web Intelligence
  • The Semantic Web

What is the Semantic Web?
  • As the web presently stands, it is complete
    nonsense to most software applications.
  • Two completely different statements
  • The ball is round
  • The round ball
  • The semantic web is a series of protocols meant
    to enrich the current web with meaning

Series of Protocols
  • RDF: Resource Description Framework
  • OWL: Web Ontology Language (extension of RDF)

Resource Description Framework
  • From the World Wide Web Consortium webpage:
  • "RDF defines a mechanism for describing resources
    that makes no assumptions about a particular
    application domain, nor defines (a priori) the
    semantics of any application domain. The
    definition of the mechanism should be domain
    neutral, yet the mechanism should be suitable for
    describing information about any domain."

RDF: Some examples
  • Ora Lassila is the creator of the resource
    http://www.w3.org/Home/Lassila
  • Abstract, conceptual Framework
  • Concrete syntax using XML

Abstract example
  • Subject (Resource):
    http://www.w3.org/Home/Lassila
  • Predicate (Property): Creator
  • Object (literal): "Ora Lassila"

Concrete syntax
  • Ora Lassila is the creator of the resource
    http://www.w3.org/Home/Lassila
  • <rdf:RDF>
  • <rdf:Description about="http://www.w3.org/Home/Lassila">
  • <s:Creator>Ora Lassila</s:Creator>
  • </rdf:Description>
  • </rdf:RDF>
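
As a rough sketch, the statement on this slide can be read back into a (subject, predicate, object) triple with Python's standard XML parser. The slide omits the namespace declarations, so the two xmlns attributes below are assumptions added to make the document parseable (the schema namespace in particular is a placeholder), and extract_triples is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
S_NS = "http://description.org/schema/"   # assumed schema namespace

document = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:s="{S_NS}">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>
"""

def extract_triples(xml_text):
    """Return (subject, predicate, object) for each property element."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        # The slide uses a plain "about" attribute; also accept rdf:about.
        subject = description.get("about") or description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            predicate = prop.tag.split("}", 1)[-1]   # drop the namespace part
            triples.append((subject, predicate, prop.text))
    return triples

triples = extract_triples(document)
print(triples)
```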

Web Ontology Language
  • What is an ontology?
  • An ontology defines the terms used to describe
    and represent an area of knowledge
  • OWL defines ontologies for use on the web
  • Actually an extension of RDF

  • Date and Time
  • Countries of the World
  • Wines
  • Space Shuttle Information

Some example OWL statements
  • <owl:Class rdf:ID="WineGrape">
  • <rdfs:subClassOf rdf:resource="&food;Grape" />
  • </owl:Class>
  • <owl:Class rdf:ID="WhiteWine">
  • <owl:intersectionOf rdf:parseType="Collection">
  • <owl:Class rdf:about="#Wine" />
  • <owl:Restriction>
  • <owl:onProperty rdf:resource="#hasColor" />
  • <owl:hasValue rdf:resource="#White" />
  • </owl:Restriction>
  • </owl:intersectionOf>
  • </owl:Class>
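
The WhiteWine definition says that something is a white wine exactly when it is a Wine and its hasColor property has the value White. A minimal sketch of that membership test in Python follows; the individuals and their facts are invented for illustration, and a real OWL reasoner does far more than this.

```python
# Hypothetical individuals; names and facts are invented for illustration.
facts = {
    "Chardonnay1": {"types": {"Wine"}, "hasColor": "White"},
    "Merlot1": {"types": {"Wine"}, "hasColor": "Red"},
    "Milk1": {"types": {"Drink"}, "hasColor": "White"},
}

def is_white_wine(name):
    """WhiteWine = Wine intersected with (hasColor has value White)."""
    entry = facts.get(name, {})
    is_wine = "Wine" in entry.get("types", set())
    has_white_color = entry.get("hasColor") == "White"
    return is_wine and has_white_color   # both conditions must hold

white_wines = sorted(n for n in facts if is_white_wine(n))
print(white_wines)
```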

  • Web intelligence is a broad new field for
  • Present efforts like Google can be improved upon
    with more semantic information
  • Any questions?