Search Engine using Web Mining - PowerPoint PPT Presentation


About This Presentation


Slides: 15
Provided by: anu126
Tags: engine | mining | search | usage | using | web


Transcript and Presenter's Notes

Title: Search Engine using Web Mining

Search Engine using Web Mining
  • COMS E6125.001
  • Web Enhanced Information Mgmt
  • Prof. Gail Kaiser
  • Presented by Rupal Shah (UNI rrs2146)

Web Mining
  • Web Usage Mining is the process of applying data
    mining techniques to the discovery of usage
    patterns from Web data.
  • Data mining efforts associated with the Web are
    known as Web Mining.

Classification of Web Mining
  • Content Mining refers to the discovery of useful
    information from Web content, including text,
    images, audio, and video. Web content mining
    research includes resource discovery from the
    Web, document categorization and clustering, and
    information extraction from Web pages.
  • Usage Mining refers to the discovery of usage
    patterns from Web data.
  • Structure Mining examines Web link structure,
    which has been widely used to infer important
    information about Web pages and to understand
    the structure of the Web as a whole. Citations
    (linkages) among Web pages are usually
    indicators of high relevance or good quality.
    The term in-links indicates the hyperlinks
    pointing to a page, and the term out-links
    indicates the hyperlinks found in a page.
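
The in-link and out-link counts can be illustrated with a minimal sketch. It assumes the link structure is already available as a mapping from each page to the pages it links to; the function name `link_counts` and the toy graph are hypothetical and not part of the presentation.

```python
from collections import defaultdict


def link_counts(out_links):
    """Return (in_degree, out_degree) for a {page: [linked pages]} mapping."""
    in_degree = defaultdict(int)
    out_degree = {}
    for page, targets in out_links.items():
        unique_targets = set(targets)      # count each hyperlink target once
        out_degree[page] = len(unique_targets)
        in_degree.setdefault(page, 0)      # pages nobody links to get 0
        for target in unique_targets:
            in_degree[target] += 1
    # Pages that only appear as targets have no recorded out-links.
    for page in in_degree:
        out_degree.setdefault(page, 0)
    return dict(in_degree), out_degree


# Hypothetical toy site used only for illustration.
toy_graph = {
    "home.html": ["about.html", "products.html"],
    "about.html": ["home.html"],
    "products.html": ["home.html", "about.html"],
}
in_deg, out_deg = link_counts(toy_graph)
```

A page with many in-links, such as `home.html` here, would be a candidate for high relevance under the citation argument above.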

Data Source
  • The usage data collected at the different sources
    represent the navigation patterns of different
    segments of overall Web traffic, ranging from
    single-user, single-site browsing behavior to
    multi-user, multi-site access patterns. The data
    can be collected at three levels:
  • Server Level Collection
  • Client Level Collection
  • Proxy Level Collection

Server Level Collection
  • A Web server log is an important source for
    performing Web Usage Mining because it explicitly
    records the browsing behavior of site visitors.
  • The data recorded in server logs reflect the
    access of a Web site by multiple users. These
    logs can be stored in various formats, such as
    the Common Log Format or the Extended Log Format.
  • Cookies are tokens generated by the Web server
    for individual client browsers in order to
    automatically track site visitors. Tracking
    individual users is not an easy task because of
    the stateless connection model of the HTTP
    protocol.

  • Cached page views are not recorded in a server
    log. In addition, any important information
    passed through the POST method will not be
    available in a server log.
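
As a rough illustration of what a server log records, the sketch below parses one line in the Common Log Format. The regular expression, the function name `parse_clf_line`, and the sample line are assumptions made for this example; real logs often contain malformed request lines that a production parser would need to handle.

```python
import re
from datetime import datetime

# One Common Log Format entry:
# host ident authuser [timestamp] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# Hypothetical sample line using a documentation-only IP address.
sample_line = (
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326'
)
entry = parse_clf_line(sample_line)
```

Note that the entry holds only the request line, status, and size: the body of a POST request and any page served from a cache never appear, which matches the limitations listed above.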

Client Level Collection
  • Client-level collection can be implemented by
    using a remote agent (such as JavaScript or Java
    applets) or by modifying the source code of an
    existing browser (such as Mosaic or Mozilla) to
    enhance its data collection capabilities.
  • Client-side data collection requires user
    cooperation, either by enabling JavaScript and
    Java applets or by voluntarily using the
    modified browser.

Proxy Level Collection
  • A Web proxy acts as an intermediate level of
    caching between client browsers and Web servers.
    Proxy caching can be used to reduce the loading
    time of a Web page experienced by users as well
    as the network traffic load at the server and
    client sides.
  • Proxy traces may reveal the actual HTTP requests
    from multiple clients to multiple Web servers.
    This may serve as a data source for
    characterizing the browsing behavior of a group
    of anonymous users sharing a common proxy server.

Pattern Discovery
  • Sequential pattern discovery finds
    inter-transaction patterns such that the presence
    of a set of items is followed by another item in
    the timestamp-ordered transaction set. In Web
    server transaction logs, a visit by a client is
    recorded over a period of time.
  • The discovery of sequential patterns in Web
    server access logs allows Web-based organizations
    to predict user visit patterns and helps in
    targeting advertising at groups of users based
    on these patterns. By analyzing this information,
    the Web mining system can determine temporal
    relationships.
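
A simplified sketch of sequential pattern discovery follows. It assumes the log has already been split into per-visitor sessions ordered by timestamp, and it only counts ordered pairs (one page followed later by another) rather than full item sets; the function name, the sessions, and the 0.5 support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def sequential_pair_support(sessions):
    """Return {(a, b): fraction of sessions where a is visited before b}."""
    counts = Counter()
    for pages in sessions:
        seen_pairs = set()                 # count each pair once per session
        for i, j in combinations(range(len(pages)), 2):
            if pages[i] != pages[j]:
                seen_pairs.add((pages[i], pages[j]))
        counts.update(seen_pairs)
    total = len(sessions)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical timestamp-ordered sessions, one list per visit.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/cart"],
    ["/products", "/home", "/cart"],
    ["/home", "/about"],
]
support = sequential_pair_support(sessions)

# Keep only pairs seen in at least half of the sessions.
frequent = {pair: s for pair, s in support.items() if s >= 0.5}
```

Dedicated sequential pattern mining algorithms extend this idea to longer sequences and prune the candidate space; the pair count here is only meant to show the "followed by" relationship.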

Pattern Analysis
  • Pattern analysis filters out uninteresting rules
    or patterns from the set found in the pattern
    discovery phase. The exact analysis methodology
    is usually governed by the application for which
    Web mining is done.
  • The most common form of pattern analysis consists
    of a knowledge query mechanism such as SQL.
  • Content and structure information can be used to
    filter out patterns containing pages of a certain
    usage type, content type, or pages that match a
    certain hyperlink structure.
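
The SQL-style filtering described above might look like the following sketch, which uses Python's built-in `sqlite3` module with an in-memory database. The `patterns` table, its columns, the sample rows, and the thresholds are assumptions chosen for illustration.

```python
import sqlite3

# In-memory database holding hypothetical discovered patterns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patterns ("
    "antecedent TEXT, consequent TEXT, support REAL, content_type TEXT)"
)
conn.executemany(
    "INSERT INTO patterns VALUES (?, ?, ?, ?)",
    [
        ("/home", "/cart", 0.75, "commerce"),
        ("/products", "/cart", 0.50, "commerce"),
        ("/home", "/about", 0.25, "info"),
        ("/home", "/help", 0.60, "info"),
    ],
)


def interesting_patterns(connection, min_support, excluded_type):
    """Keep patterns above a support threshold, dropping one content type."""
    rows = connection.execute(
        "SELECT antecedent, consequent, support FROM patterns "
        "WHERE support >= ? AND content_type <> ? "
        "ORDER BY support DESC",
        (min_support, excluded_type),
    )
    return rows.fetchall()


result = interesting_patterns(conn, 0.5, "info")
```

The `WHERE` clause plays both roles named in the slide: the support threshold removes uninteresting patterns, and the content-type condition filters out pages of a certain type.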

Application of Web Mining
  • Counter-Terrorism
  • E-Commerce
  • Security Threat and many more

Future Scope of Web Mining
  • A challenge in Web mining research has been the
    difficulty of creating suitable test collections
    that can be reused by researchers. A test
    collection is important because it allows
    researchers to compare different algorithms using
    a standard test-bed under the same conditions,
    without being affected by such factors as Web
    page changes or network traffic variations.
  • Although textual documents are comparatively easy
    to index, retrieve, and analyze, operations on
    multimedia files are much more difficult to
    perform. With multimedia content on the Web
    growing rapidly, Web mining has become a
    challenging problem. Various machine-learning
    techniques have been employed to address this
    issue. Predictably, research in pattern
    recognition and image analysis has been adapted
    for the study of multimedia documents on the Web.

  • As the Web and its usage continue to grow, so
    does the opportunity to analyze Web data and
    extract all manner of useful knowledge from it.
  • Web Mining is still in its initial stage and
    should continue to develop as the Web evolves.
    One future research direction for Web Mining is
    multimedia data mining. In addition to textual
    documents such as HTML, MS Word, PDF, and plain
    text files, the Web contains a large number of
    multimedia documents, such as images, audio, and
    video.

  • Thank You