Dispatching Java agents to user for data extraction from third party web sites - PowerPoint PPT Presentation

Slides: 17
Provided by: log73
Learn more at: http://users.cis.fiu.edu
Transcript and Presenter's Notes

Title: Dispatching Java agents to user for data extraction from third party web sites


1
Dispatching Java agents to user for data
extraction from third party web sites
  • Alex Roque
  • F.I.U. HPDRC

2
Introduction
  • Since the WWW has grown exponentially, data
    retrieval has become an intensive research topic.
  • However, mechanisms and tools that give users
    more power over the data on the web have not
    kept pace with the growth in data.
  • For example, no tools exist that allow the user
    to extract data from its HTML context and use it
    in an external application.

3
  • The Data Extractor system was created to allow
    more coherent and wider-ranging automatic data
    extraction; it treats any Web site as a data
    source.
  • Data Extractor has two kinds of implementation:
    a standalone server solution, and a set of
    functions that can be embedded in applications
    to provide them with data from the Internet.

4
Data Extractor Inefficiencies
  • Performance in multi client conditions
  • Network performance issues
  • Legal issues
  • Installing an exclusive local server for clients
    is an option; however, it is expensive. Our
    alternative is MDRA: Mobile Data Retrieval Agents.

5
MDRA Composition and Delivery
  • The mobile agents server contains a wrapper
    portal and a knowledgebase.
  • Functionality is as follows:
  • 1) Users connect to the wrapper portal and
    request a wrapper.
  • 2) In response, a package to extract the data is
    constructed and sent to the client.
  • 3) Data extraction takes place on the client.
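
The three steps above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class names (WrapperPortal, AgentPackage), the use of a Function as the wrapper, and the toy "title" wrapper are all hypothetical and are not taken from the MDRA implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class MdraFlowSketch {

    /** Hypothetical package sent to the client: a wrapper plus its parameters. */
    static final class AgentPackage {
        final Function<String, String> wrapper;
        final Map<String, String> parameters;

        AgentPackage(Function<String, String> wrapper, Map<String, String> parameters) {
            this.wrapper = wrapper;
            this.parameters = parameters;
        }
    }

    /** Hypothetical stand-in for the wrapper portal and its knowledgebase. */
    static final class WrapperPortal {
        private final Map<String, Function<String, String>> knowledgebase = new HashMap<>();

        void register(String name, Function<String, String> wrapper) {
            knowledgebase.put(name, wrapper);
        }

        // Steps 1 and 2: the user requests a wrapper and receives a package.
        AgentPackage request(String name, Map<String, String> parameters) {
            Function<String, String> wrapper = knowledgebase.get(name);
            if (wrapper == null) {
                throw new IllegalArgumentException("Unknown wrapper: " + name);
            }
            return new AgentPackage(wrapper, new HashMap<>(parameters));
        }
    }

    // Step 3: extraction runs on the client, not on the server.
    static String extractOnClient(AgentPackage pkg, String html) {
        return pkg.wrapper.apply(html);
    }

    public static void main(String[] args) {
        WrapperPortal portal = new WrapperPortal();
        // A toy wrapper that returns the text of the <title> element.
        portal.register("title", html -> {
            int start = html.indexOf("<title>");
            int end = html.indexOf("</title>");
            return (start < 0 || end < start) ? "" : html.substring(start + 7, end);
        });

        AgentPackage pkg = portal.request("title", new HashMap<>());
        System.out.println(extractOnClient(pkg, "<html><title>Quotes</title></html>"));
    }
}
```

The point of the sketch is the division of labor: the server only looks up and hands out the package, while the page is processed where the package lands.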

6
  • Wrapper portal: Lists and packages wrappers,
    authenticates users, and allows them to change
    and save their queries (references to wrappers).
  • Knowledgebase: Contains information about
    available wrappers, their parameters and status.
  • Wrappers can be thought of as lightweight
    programs which use a predefined OO library to
    strip desired information.
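
To make the idea of a wrapper concrete, here is a minimal sketch of a wrapper written against a small library. The names ExtractionLibrary, textOf and TableCellWrapper are assumptions made for this example and do not describe the actual Data Extractor library; the regular expression is a simplification that does not handle nested elements of the same tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WrapperSketch {

    /** Hypothetical stand-in for the predefined extraction library. */
    static final class ExtractionLibrary {
        /** Returns the trimmed text content of every <tag>...</tag> element. */
        static List<String> textOf(String html, String tag) {
            String quoted = Pattern.quote(tag);
            Pattern element = Pattern.compile(
                    "<" + quoted + "\\b[^>]*>(.*?)</" + quoted + ">",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
            List<String> result = new ArrayList<>();
            Matcher m = element.matcher(html);
            while (m.find()) {
                // Drop any nested markup and surrounding whitespace.
                result.add(m.group(1).replaceAll("<[^>]+>", "").trim());
            }
            return result;
        }
    }

    /** A wrapper is a small program written against the library. */
    interface Wrapper {
        List<String> extract(String html);
    }

    /** Hypothetical wrapper for a site that publishes values in table cells. */
    static final class TableCellWrapper implements Wrapper {
        @Override
        public List<String> extract(String html) {
            return ExtractionLibrary.textOf(html, "td");
        }
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>ACME</td><td> <b>12.50</b> </td></tr></table>";
        System.out.println(new TableCellWrapper().extract(html)); // prints [ACME, 12.50]
    }
}
```

Because the site-specific part is a single short class, a wrapper stays small enough to be shipped to the client on demand while the shared library does the heavy lifting.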

7
MDRA Architecture
  • Mobile wrapper controller: Responsible for
    controlling the behavior of wrappers and the flow
    of data.
  • Wrappers: The same as those used in Data
    Extractor; the processes that strip data from a
    web site.
  • Data Extraction Library: Contains functionality
    essential for extraction and network operations.
    It is compact and can be cached if no update is
    required.
  • Outer packaging: Interface for uniting numerous
    wrappers and controllers.
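
One way to picture how the controller and the outer packaging relate is sketched below. It assumes, purely for illustration, that a wrapper is a function from page text to a list of values, that a failing wrapper yields an empty result, and that the outer packaging merges the results of several controllers; none of these details are confirmed by the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ArchitectureSketch {

    /** Hypothetical controller: runs each wrapper and controls the flow of data. */
    static final class WrapperController {
        private final Map<String, Function<String, List<String>>> wrappers = new LinkedHashMap<>();

        void add(String name, Function<String, List<String>> wrapper) {
            wrappers.put(name, wrapper);
        }

        /** Runs every wrapper on the page; a failing wrapper yields an empty result. */
        Map<String, List<String>> runAll(String html) {
            Map<String, List<String>> results = new LinkedHashMap<>();
            for (Map.Entry<String, Function<String, List<String>>> entry : wrappers.entrySet()) {
                try {
                    results.put(entry.getKey(), entry.getValue().apply(html));
                } catch (RuntimeException ex) {
                    results.put(entry.getKey(), new ArrayList<>());
                }
            }
            return results;
        }
    }

    /** Hypothetical outer packaging: one entry point that unites several controllers. */
    static Map<String, List<String>> runPackage(List<WrapperController> controllers, String html) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (WrapperController controller : controllers) {
            merged.putAll(controller.runAll(html));
        }
        return merged;
    }

    public static void main(String[] args) {
        WrapperController controller = new WrapperController();
        controller.add("words", page -> Arrays.asList(page.split("\\s+")));
        controller.add("broken", page -> { throw new IllegalStateException("site layout changed"); });

        System.out.println(runPackage(Collections.singletonList(controller), "ACME 12.50"));
    }
}
```

Isolating each wrapper behind the controller means one site changing its layout does not stop the remaining wrappers in the same package from delivering data.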

8
How does execution take place?
  1. Query formulation
  2. Agent construction and delivery
  3. Agent Execution
  4. Data Delivery

9
Query Formulation
  • The user connects to the wrapper portal, the
    wrappers are listed, and the user selects the
    desired wrapper(s) and configures execution
    parameters.
  • This configuration can be saved for future
    reference.
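
A saved query could be as simple as a set of key/value pairs. The sketch below uses java.util.Properties for that purpose; the storage format and the keys ("wrapper", "symbol", "refreshSeconds") are assumptions for illustration, since the slides do not say how MDRA stores queries.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

final class QueryConfigSketch {

    /** Serializes a query (wrapper reference plus execution parameters) to text. */
    static String save(Properties query) {
        StringWriter out = new StringWriter();
        try {
            query.store(out, "saved MDRA query (hypothetical format)");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Restores a query from the text produced by save. */
    static Properties load(String saved) {
        Properties query = new Properties();
        try {
            query.load(new StringReader(saved));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return query;
    }

    public static void main(String[] args) {
        Properties query = new Properties();
        query.setProperty("wrapper", "stock-quotes");  // hypothetical wrapper name
        query.setProperty("symbol", "ACME");           // hypothetical parameter
        query.setProperty("refreshSeconds", "60");     // hypothetical parameter

        Properties restored = load(save(query));
        System.out.println(restored.getProperty("wrapper") + " " + restored.getProperty("symbol"));
    }
}
```

Note that the saved query holds only a reference to the wrapper, not the wrapper itself, which matches the description of queries as "references to wrappers".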

10
Agent construction and delivery
  • The wrapper portal begins packaging, which
    includes the outer packaging module, wrapper
    parameter information, wrapper controller,
    wrapper and Data Extraction Library.
  • Components that change frequently are packaged
    separately from the ones that do not (this aids
    caching).
  • Compression or digital signing takes place.
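
The split between stable and frequently changing components can be sketched with the standard java.util.zip classes. The archive layout and entry names below are hypothetical, and placeholder bytes stand in for compiled classes; the slides do not specify the actual package format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class PackagingSketch {

    /** Compresses the named components into one in-memory ZIP archive. */
    static byte[] zip(Map<String, byte[]> components) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> component : components.entrySet()) {
                zip.putNextEntry(new ZipEntry(component.getKey()));
                zip.write(component.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Lists the entry names of an archive, e.g. to check what was packaged. */
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for compiled classes; all names are hypothetical.
        Map<String, byte[]> stable = new LinkedHashMap<>();
        stable.put("library/ExtractionLibrary.class", bytes("library"));
        stable.put("controller/WrapperController.class", bytes("controller"));

        Map<String, byte[]> changing = new LinkedHashMap<>();
        changing.put("wrappers/QuoteWrapper.class", bytes("wrapper"));
        changing.put("wrappers/parameters.properties", bytes("symbol=ACME"));

        byte[] frameworkArchive = zip(stable);   // rarely changes, so it caches well
        byte[] wrapperArchive = zip(changing);   // rebuilt whenever a wrapper changes

        System.out.println(entryNames(frameworkArchive));
        System.out.println(entryNames(wrapperArchive));
    }
}
```

With two archives, a client that already holds the framework archive only needs to download the small wrapper archive when a wrapper or its parameters change.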

11
Agent execution
  • Once delivered to the client, wrappers interact
    with WWW sites, and extract the desired data.
  • Data is passed to the outer packaging controller,
    where it can be used in applications or stored in
    various media.

12
Data Delivery
  • Retrieved data may be transferred to other
    applications programmatically, stored in various
    formats (Excel, XML, text), or stored in
    databases.
  • May be used for statistical data collection.
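
As an illustration of the delivery step, the sketch below renders one extracted record as an XML fragment and as a tab-separated line of text. The record layout and the element names ("record", "field") are assumptions for this example, not an MDRA format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DeliverySketch {

    /** Escapes the characters that are special in XML text and attributes. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders one extracted record as a small XML fragment. */
    static String toXml(Map<String, String> record) {
        StringBuilder xml = new StringBuilder("<record>\n");
        for (Map.Entry<String, String> field : record.entrySet()) {
            xml.append("  <field name=\"").append(escapeXml(field.getKey())).append("\">")
               .append(escapeXml(field.getValue()))
               .append("</field>\n");
        }
        return xml.append("</record>\n").toString();
    }

    /** Renders the same record as one tab-separated line (values are not escaped). */
    static String toText(Map<String, String> record) {
        return String.join("\t", record.values());
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("company", "Smith & Sons");  // hypothetical extracted values
        record.put("price", "12.50");

        System.out.print(toXml(record));
        System.out.println(toText(record));
    }
}
```

Keeping the record as a plain map until the last moment lets the same extracted data feed several targets, which is what the slide describes.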

13
Source Code Implementation
  • Because the system needs a high degree of
    portability, Java was used for the
    implementation.
  • The previous Data Extractor was written in Java,
    so Java was used again in order to reuse modules.
  • Speed performance issues were addressed [7].

14
MDRA Framework
  • MDRA is delivered to clients as a Java applet.
  • Applets provide portability, which allows clients
    on different platforms to participate in this
    data retrieval.
  • Since framework code and libraries do not change
    often, browsers that cache Java applets will keep
    the parts that do not change.

15
Security
  • Applets must be digitally signed in order for
    them to access the system and network resources
    needed for the retrieval.
  • A proxy server may be set up on the host the
    applet was downloaded from in order to give
    applets the ability to download third-party web
    sites. However, this option is prone to becoming
    a bottleneck.
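
The principle behind signing a package can be shown with the standard java.security API. This is a sketch of the underlying idea only: real applets were signed as JAR files with dedicated tooling and certificates, and the key handling here (a throwaway RSA key pair generated in memory) is an assumption made to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class SigningSketch {

    /** Generates a throwaway RSA key pair for the demonstration. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server side: signs the bytes of a package with the private key. */
    static byte[] sign(byte[] packageBytes, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(packageBytes);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Client side: checks that the package matches the signature. */
    static boolean verify(byte[] packageBytes, byte[] signature, PublicKey key) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(packageBytes);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        byte[] agent = "wrapper + controller + library".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(agent, keys.getPrivate());

        byte[] tampered = "wrapper + controller + malware".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(agent, signature, keys.getPublic()));
        System.out.println(verify(tampered, signature, keys.getPublic()));
    }
}
```

A client that trusts the signer's public key can therefore grant the agent wider access, knowing the code was not altered after it left the server.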

16
Conclusion
  • MDRA leases data extraction services to users;
    these services retrieve data that can be exported
    to other applications.
  • This distributed approach takes the load off the
    centralized server architecture.
  • Future research includes different MDRA
    implementations (standalone, embedded on the
    client side) and tuning of agent performance.