Title: Web content mining


1
Web content mining
  • Oral presentation of the project
  • Presented by Zheng xianghan, Huwen and Zhangli

2
1. Short summary
  • Background
  • Purpose of the project
  • What we are doing in this project

3
2. Introduction and planned work
  • The goal of this project is to build a crawler
    that downloads the images on a web page and tries
    to classify the content of each image into
    categories such as mathematical formulas, logos
    and buttons. The focus is on automatically
    detecting image usage that reduces the
    accessibility of a web page, for example a formula
    shown as an image with no text alternative, which
    a screen reader cannot read.
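The crawling step can be sketched with the Python standard library alone. This is a minimal sketch, not Harvestman's code: `ImageSrcParser`, `find_image_urls` and `download_images` are hypothetical names, and only `<img>` tags are considered (CSS background images and images inserted by JavaScript are ignored).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ImageSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a (non-empty) src
                self.image_urls.append(urljoin(self.base_url, src))


def find_image_urls(html, base_url):
    """Return absolute URLs of all images referenced by <img> tags in `html`."""
    parser = ImageSrcParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls


def download_images(page_url, timeout=10):
    """Fetch a page and download every image it references; returns {url: bytes}."""
    with urlopen(page_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    images = {}
    for url in find_image_urls(html, page_url):
        try:
            with urlopen(url, timeout=timeout) as img:
                images[url] = img.read()
        except OSError:
            # Broken links, HTTP errors and timeouts are all OSError subclasses; skip them.
            continue
    return images
```

Separating parsing (`find_image_urls`) from fetching (`download_images`) lets the parser be tested on saved HTML without network access; the downloaded bytes can then be handed to the classification step.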

4
Planned work
  • Download Harvestman and study it (in particular
    the Python code of its image crawler)
  • Study algorithms such as entropy, Linescan and
    Benford's law
  • Implement these algorithms to categorize the
    images
  • Final report
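Entropy measures how spread out an image's grey levels are: H = -Σ p_i log2 p_i, where p_i is the fraction of pixels with grey level i. The following is a minimal sketch, assuming the image has already been reduced to a grayscale histogram; `histogram_entropy` is a hypothetical name.

```python
import math


def histogram_entropy(histogram):
    """Shannon entropy (in bits) of an intensity histogram.

    `histogram` is a list of bin counts, for example the 256 counts of a
    grayscale ("L" mode) image.
    """
    total = sum(histogram)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in histogram:
        if count:  # empty bins contribute nothing (0 * log 0 is taken as 0)
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

With Pillow (the maintained fork of PIL), such a histogram is returned by `Image.open(path).convert("L").histogram()`. A working hypothesis, still to be tested, is that logos, buttons and rendered formulas, which use few distinct grey levels, score well below photographs (at most 8 bits for 256 levels).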

5
3. Literature survey
  • We surveyed several algorithms relevant to the
    project, such as standard deviation, entropy,
    pixel search and Benford's law. We also learned
    some Python in order to implement these
    algorithms. The references are listed on the next
    slide.
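Two of the statistical measures named above can be sketched in plain Python. Benford's law predicts that the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d). This is a hedged sketch: the helper names are hypothetical, and applying Benford's law to the differences between neighbouring pixels is an assumption, since the slides do not say which image values are tested.

```python
import math
import statistics
from collections import Counter

# Benford's law: probability that the leading digit is d, for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def pixel_std(pixels):
    """Population standard deviation of a sequence of grey-level values."""
    return statistics.pstdev(pixels)


def horizontal_differences(rows):
    """Absolute differences between horizontally adjacent pixels.

    `rows` is a list of rows, each a list of integer grey levels (0..255).
    """
    return [abs(row[i + 1] - row[i]) for row in rows for i in range(len(row) - 1)]


def first_digit_distribution(values):
    """Observed frequency of leading digits 1..9 among the positive integers in `values`."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    if not digits:
        return {}
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def benford_mad(values):
    """Mean absolute deviation between observed leading digits and Benford's law.

    Returns None when `values` contains no positive integers.
    """
    observed = first_digit_distribution(values)
    if not observed:
        return None
    return sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10)) / 9
```

A large `benford_mad(horizontal_differences(rows))` means the leading digits depart strongly from Benford's distribution; whether that actually separates, say, logos from photographs would have to be checked experimentally.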

6
References
  • www.python.org
  • www.pythonware.com/products/pil/
      • Very useful websites for learning the Python
        language.
  • Digital Image Processing, Rafael C. Gonzalez and
    Richard E. Woods
      • An advanced book introducing image processing
        techniques.
  • Benford article, www.rexswain.com/benford
      • Introduction to Benford's law and its
        applications.
  • www.striker.ottawa.on.ca/aland/isreal
      • Isreal: a picture analysis tool.

7
4. Problem statement
  • Algorithms
      • Because of our limited knowledge of image
        processing, some of the algorithms were hard to
        understand. We need to find more effective
        algorithms and then determine which one is best
        at classifying the different kinds of images.
  • Programming
      • We had to learn Python, which was also new to
        us, in order to implement the algorithms.
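The slides leave open how the algorithms will be compared. One simple option, sketched here as an assumption rather than the project's method, is to label a small sample of downloaded images by hand, compute one score per image for each algorithm, and rank the algorithms by how well a single threshold on their score separates two classes. The function names and the two-class setup are hypothetical.

```python
def best_threshold_accuracy(scores, labels):
    """Best accuracy of a single-threshold rule for one feature.

    `scores` holds one feature value per image (e.g. its entropy) and `labels`
    the hand-assigned class per image (True/False, e.g. "is a logo or formula").
    Every observed value is tried as threshold t, predicting True either for
    score <= t or for score > t. Returns (accuracy, threshold, direction).
    """
    n = len(scores)
    best = (0.0, None, None)
    for t in sorted(set(scores)):
        # Number of images classified correctly by the rule "True iff score <= t".
        correct_le = sum((s <= t) == label for s, label in zip(scores, labels))
        # The opposite rule "True iff score > t" gets every other image right.
        for accuracy, direction in ((correct_le / n, "<="), ((n - correct_le) / n, ">")):
            if accuracy > best[0]:
                best = (accuracy, t, direction)
    return best


def rank_features(features, labels):
    """Sort feature names by their best single-threshold accuracy, best first.

    `features` maps a feature name to its list of scores (one per image).
    """
    accuracy = {name: best_threshold_accuracy(scores, labels)[0]
                for name, scores in features.items()}
    return sorted(accuracy.items(), key=lambda item: item[1], reverse=True)
```

Because each threshold is chosen on the same images it is scored on, the accuracy is optimistic; a fairer comparison would pick thresholds on one set of images and measure accuracy on another.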

8
5. Requirements
  • We need more references on the algorithms.
  • Python and PIL (Python Imaging Library) for
    coding.
  • Harvestman to crawl images from websites.

9
Member roles in the project
  • Zheng xianghan: leader of the web content
    mining project.
  • Huwen: studies the algorithms from the field of
    digital image processing.
  • Zhangli: implements the algorithms in code.

10
Additional information
  • Web page and contact information
  • http://home.hia.no/xiangz05/
  • Zheng xianghan: xiangz05_at_hia.no
  • Huwen: hwcarey_at_hotmail.com
  • Zhangli: ttbb620_at_hotmail.com