Geographic Web Information Retrieval - PowerPoint PPT Presentation

Provided by: Bobo

1
Geographic Web Information Retrieval
  • Alexander Markowetz, University of Marburg
  • Thomas Brinkhoff, FH Oldenburg
  • Bernhard Seeger, University of Marburg

2
Current Situation In Web-IR
  • Everybody is online
  • But never seen

3
Current Situation In Web-IR
  • Queries are too short
  • Result sets are too large
  • You can effectively block your competitors
  • Good results get buried
  • Smaller Results
  • Ways to drill into the iceberg

4
Solutions
  • Personalized Search
  • Dynamic/Interactive Search

5
Geographic Web-IR
  • Location is the most personal property
  • All business is local
  • People already use the web geographically
      • Yoga Brooklyn
      • Linux usergroup Frankfurt
  • And get poor results
  • We are going to make that a lot better

6
How-Not-To
  • Semantic Web
      • If just everybody included Geographic Markup in
        their web-pages
  • Two problems
      • Chicken-Egg
      • Malicious Webmaster
  • Metatags Anyone?
  • Bottom line
      • Semantic web is for B2B situations only.

7
How-To
  • Modify traditional IR techniques to extract
    geographic markers
  • Multigranular approach
  • Extending basic Web-IR
  • Map pages to geographic positions
  • Footprint
  • Aggregate and Cluster them
  • Build Applications
  • Geographic Search
  • Geographic Web-Mining

8
Geocoding
  • Footprint
  • Geographic Position of a Webpage
  • Set of points and polygons, associated with some
    amplitude
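
A footprint as described on this slide could be represented roughly as follows. This is only a minimal sketch: the class names `Region` and `Footprint`, the (latitude, longitude) convention, and the normalization step are assumptions made for illustration, not part of the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # assumed convention: (latitude, longitude) in degrees


@dataclass
class Region:
    """One component of a footprint: a point or a polygon with an amplitude."""
    vertices: List[Point]  # one vertex = a point, three or more = a polygon
    amplitude: float       # how strongly the page is tied to this region


@dataclass
class Footprint:
    """Geographic position of a web page: a set of weighted points and polygons."""
    regions: List[Region] = field(default_factory=list)

    def add(self, vertices, amplitude):
        self.regions.append(Region(list(vertices), amplitude))

    def total_amplitude(self):
        return sum(r.amplitude for r in self.regions)

    def normalized(self):
        """Return a copy whose amplitudes sum to 1 (hypothetical convenience)."""
        total = self.total_amplitude()
        if total == 0:
            return Footprint()
        return Footprint([Region(r.vertices, r.amplitude / total) for r in self.regions])


fp = Footprint()
fp.add([(50.81, 8.77)], amplitude=3.0)                              # a single point
fp.add([(53.1, 8.1), (53.2, 8.1), (53.2, 8.3)], amplitude=1.0)      # a small polygon
```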

9
Preliminaries
  • Basic IR Assumptions can easily be extended to
    geographic-IR
  • Radius-1 Hypothesis
  • Radius-2 Hypothesis (co-citation)
  • Intra-Site Hypothesis
  • Intra-subdomain
  • Intra-directory
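
One possible reading of the Radius-1 and Radius-2 hypotheses is that a page tends to share its location with the pages it is directly linked with, and with the pages it is co-cited with. The sketch below guesses a page's place from its neighbours under that reading. The function names, the dictionary-based link graph, and the vote weights (2 for direct links, 1 for co-citation) are all assumptions for illustration.

```python
from collections import Counter


def radius1_neighbours(links, page):
    """Pages that `page` links to, or that link to `page`."""
    outgoing = set(links.get(page, ()))
    incoming = {src for src, targets in links.items() if page in targets}
    return (outgoing | incoming) - {page}


def radius2_cocited(links, page):
    """Pages linked from the same source as `page` (co-citation)."""
    siblings = set()
    for targets in links.values():
        if page in targets:
            siblings.update(targets)
    siblings.discard(page)
    return siblings


def guess_location(links, known, page):
    """Vote on a place for `page` using already-geocoded neighbours."""
    votes = Counter()
    for n in radius1_neighbours(links, page):
        if n in known:
            votes[known[n]] += 2  # assumed weight for direct links
    for n in radius2_cocited(links, page):
        if n in known:
            votes[known[n]] += 1  # assumed weight for co-citation
    return votes.most_common(1)[0][0] if votes else None


# Toy link graph: source page -> list of target pages.
links = {"a": ["x", "b"], "c": ["x"], "x": ["d"]}
known = {"a": "Marburg", "b": "Marburg", "c": "Oldenburg", "d": "Marburg"}
guess = guess_location(links, known, "x")
```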

10
Multigranularity
  • Information extraction on different levels
      • Domain
      • Subdomain
      • Directory
      • File
  • Need to aggregate
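
The four levels and the aggregation step could be sketched as follows. This is an illustrative assumption rather than the prototype's method: the helper names `levels` and `aggregate` are hypothetical, and the domain is derived naively from the last two host labels, which only works for one-label suffixes such as .de.

```python
from collections import defaultdict
from urllib.parse import urlparse

LEVEL_NAMES = ("domain", "subdomain", "directory", "file")


def levels(url):
    """Return one key per level of granularity for a URL."""
    parts = urlparse(url)
    host = parts.hostname or ""
    domain = ".".join(host.split(".")[-2:])          # naive: assumes a suffix like .de
    directory = parts.path.rsplit("/", 1)[0] + "/"   # path without the file name
    return {
        "domain": domain,
        "subdomain": host,
        "directory": host + directory,
        "file": host + parts.path,
    }


def aggregate(evidence):
    """Count place mentions at every level.

    `evidence` is an iterable of (url, place) pairs found on individual pages.
    """
    counts = {name: defaultdict(lambda: defaultdict(int)) for name in LEVEL_NAMES}
    for url, place in evidence:
        for level, key in levels(url).items():
            counts[level][key][place] += 1
    return counts


counts = aggregate([
    ("http://www.uni-marburg.de/fb12/staff/index.html", "Marburg"),
    ("http://www.uni-marburg.de/fb12/index.html", "Marburg"),
    ("http://web.uni-marburg.de/x.html", "Kassel"),
])
```

Evidence found in a single file then also contributes to its directory, subdomain, and domain, so sparse pages can inherit a location from the coarser levels.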

11
Sources
  • On all levels
      • Names of places
      • Zip-codes
      • Area-codes
  • On Site Level
      • Whois
      • Business Directories
  • Links
      • Density over a given area
      • Radius-1 and Radius-2
  • Geospatial Mapping and Navigation of the Web,
    Kevin S. McCurley, 10th WWW, 2001
  • Computing Geographical Scopes of Web Resources,
    J. Ding, L. Gravano, and N. Shivakumar, VLDB 2000
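
Extracting the text-level sources listed above (names of places, zip-codes, area-codes) might look like the sketch below. It assumes German conventions (five-digit zip codes, area codes starting with 0), and the lookup tables are tiny illustrative stand-ins for a real gazetteer; none of the names here come from the presentation.

```python
import re

# Illustrative lookup tables; a real system would load a full gazetteer.
ZIP_TO_PLACE = {"35037": "Marburg", "26121": "Oldenburg"}
AREA_TO_PLACE = {"06421": "Marburg", "0441": "Oldenburg"}
PLACE_NAMES = {"Marburg", "Oldenburg", "Frankfurt"}

ZIP_RE = re.compile(r"\b\d{5}\b")      # assumed: German five-digit zip codes
AREA_RE = re.compile(r"\b0\d{2,4}\b")  # assumed: area codes with a leading 0
WORD_RE = re.compile(r"\w+")


def extract_markers(text):
    """Return (source, place) pairs for every geographic marker found in text."""
    markers = []
    for code in ZIP_RE.findall(text):
        if code in ZIP_TO_PLACE:       # the lookup filters out other 5-digit numbers
            markers.append(("zip", ZIP_TO_PLACE[code]))
    for code in AREA_RE.findall(text):
        if code in AREA_TO_PLACE:
            markers.append(("area", AREA_TO_PLACE[code]))
    for word in WORD_RE.findall(text):
        if word in PLACE_NAMES:
            markers.append(("name", word))
    return markers


markers = extract_markers("Yoga Studio, Bahnhofstr. 1, 35037 Marburg, Tel. 06421 123456")
```

Checking every candidate against a lookup table keeps arbitrary numbers, such as prices or phone extensions, from being mistaken for geographic markers.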

12
Geographic Search
  • SEARCH
  • A simple interface
  • Not so exciting, but...

13
Dynamic Geographic-IR
  • Replacing the next button

14
Locality
  • Final ranking is a (linear) combination of
    importance and geographic distance.
  • Chances are
      • Amazon will still rank first no matter where you
        are
      • Amazon is a global bully
  • Idea
      • Eliminate global bullies by computing importance
        differently
      • Give less weight to links that span a longer
        distance
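
Both ideas on this slide can be sketched in a few lines: a linear combination of importance and geographic distance, and an importance measure that gives less weight to long-distance links. The weighting parameter `alpha`, the 50 km `scale_km`, the hyperbolic decay, and the simple weighted in-link count (used here in place of a full link analysis) are all assumptions for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def proximity(distance_km, scale_km=50.0):
    """Map a distance to (0, 1]; 1 means same place. Decay shape is assumed."""
    return 1.0 / (1.0 + distance_km / scale_km)


def combined_score(importance, distance_km, alpha=0.5):
    """Linear combination of importance and geographic closeness."""
    return alpha * importance + (1.0 - alpha) * proximity(distance_km)


def local_importance(links, positions):
    """Distance-damped in-link count: far-away links contribute less."""
    score = {page: 0.0 for page in positions}
    for src, targets in links.items():
        for dst in targets:
            score[dst] += proximity(haversine_km(positions[src], positions[dst]))
    return score


positions = {"shop": (50.81, 8.77), "near": (50.80, 8.76), "far": (40.7, -74.0)}
links = {"near": ["shop"], "far": ["shop"]}
importance = local_importance(links, positions)
```

With a plain linear combination, a page with very high importance can still outrank a nearby page; damping long-distance links changes the importance term itself, which is the point of the second idea.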

15
Evaluation
  • Evaluating Web-IR is hard
  • Evaluating geo-Search is even harder
  • Mistakes are hard to find

16
Impact of geo-IR
  • Next generation Search Engine
  • Location-based service
  • For cellphones under UMTS
  • Move traffic from AE
  • Local companies will get more traffic
  • Increase Profits from Adwords
  • Smallest businesses will advertise online
  • Locally focused
  • The Leaflet-industry will shrink

17
Geographic Web-Mining
  • The web reflects human society.
  • Distorted
  • Delayed/Ahead
  • A lot of interesting social questions can be
    answered by looking at a large webcrawl
  • You can save time and money compared to
    door-to-door surveys
  • This is already widely done
  • But most of these questions are geographic in nature

18
Example Queries
  • Where in Germany are vintage sneakers a trend?
  • Is there a fashion authority that is accepted in
    all regions of Germany?
  • Do Britney and Madonna have the same audience?
  • Draw a map of Germany with all sites about
    vintage sneakers.
  • Find all fashion sites that get a minimum of 1,000
    equally distributed links.
  • Map the areas in Germany where there are
    significantly more sites for Britney than for Madonna.
  • Precise Semantics?
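
As one possible answer to the question of precise semantics, "equally distributed links" could be read as a high normalized entropy of inbound link counts across regions. The sketch below implements that reading only; the 0.9 threshold, the function names, and the choice of Shannon entropy are assumptions, not something the presentation specifies.

```python
import math


def evenness(counts):
    """Normalized Shannon entropy of per-region link counts, in [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(len(counts))


def qualifies(counts, min_links=1000, min_evenness=0.9):
    """True if a site has enough inbound links and they are spread evenly."""
    return sum(counts) >= min_links and evenness(counts) >= min_evenness


# Inbound links per region for three hypothetical sites.
spread_out = [250, 250, 250, 250]
concentrated = [1000, 0, 0, 0]
too_few = [100, 100, 100, 100]
```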

19
Current Work
  • Older Prototype
  • Metasearch on top of lycos.de
  • Screen-scrape and re-order
  • Whois only
  • Did very well

20
Current Work
  • Current Prototype for Geographic Search
  • Limited to German .de domains
  • 50,000,000 pages
  • Expected online by late summer
  • In co-operation with
  • Yen-Yu Chen
  • Xiaohui Long
  • Torsten Suel
  • Polytechnic University, Brooklyn

21
Reinventing Web-IR
  • Nearly no (academic) work in geo-IR
  • Almost every aspect of Web-IR needs to be looked
    at again
  • Interfaces
  • Query processing
  • Index distribution
  • Link analysis
  • User profile analysis
  • Spam detection
  • Even
  • Other aspects of personalized search
  • Changes in the web

22
Thank you
  • Any questions?