Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings

1
Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings
  • Baoning Wu and Brian D. Davison
  • Lehigh University
  • Symposium on Applied Computing 2006

2
Motivation
  • Link-based ranking algorithms are important to
    current popular search engines (e.g., HITS for
    Teoma).
  • Link farms degrade the performance of
    link-based ranking algorithms.

3
HITS algorithm
  • Each page has two measures: the authority score a
    shows how good the page is for a query, and the hub
    score h shows how likely the page is to
    point to good authority pages. E is the
    adjacency matrix.
  • a = E^T h
  • h = E a
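
The two update rules above can be sketched as a simple power iteration. This is a minimal illustration, not the presenters' implementation: the function names `hits` and `normalize`, the edge-list representation of E, the fixed iteration count, and the L2 normalization are all assumptions made for the example.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


def hits(edges, num_pages, iterations=50):
    """Run HITS on a graph given as (source, target) page-index pairs.

    Each pair (i, j) means E[i][j] = 1, i.e. page i links to page j.
    Returns (authority_scores, hub_scores).
    """
    auth = [1.0] * num_pages
    hub = [1.0] * num_pages
    for _ in range(iterations):
        # a = E^T h: a page's authority is the sum of the hub scores
        # of the pages that link to it.
        new_auth = [0.0] * num_pages
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # h = E a: a page's hub score is the sum of the authority scores
        # of the pages it links to.
        new_hub = [0.0] * num_pages
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


# Toy graph (hypothetical): pages 0 and 1 link to page 2, page 0 also links to page 3.
authority, hubs = hits([(0, 2), (1, 2), (0, 3)], num_pages=4)
```

In this toy graph, page 2 ends up with the highest authority score and page 0 with the highest hub score, since page 2 has the most in-links and page 0 points to both linked-to pages.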

4
Example for query "weather"
  • http://www.tripadvisor.com/
  • http://www.virtualtourist.com/
  • http://www.abed.com/memoryfoam.html
  • http://www.abed.com/furniture.html
  • http://www.rental-car.us/
  • http://www.accommodation-specials.com/
  • http://www.lasikeyesurgery.com/
  • http://www.lasikeyesurgery.com/lasik-surgery.asp
  • http://mortgage-rate-refinancing.com/
  • http://mortgage-rate-refinancing.com/mortgage-calculator.html

5
Factors that degrade HITS
  • Mutually reinforcing relationships
  • Duplicate pages
  • Link farms

6
Complete hyperlink
  • Definition: a link together with its anchor text,
    treated as a unit.
  • Duplication of a complete link is a much stronger
    sign of copying behavior on the Web than a
    duplicate link target.
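
As a rough illustration of the definition above, a complete link can be modeled as a (target URL, anchor text) pair, and pages that share the same pair can be grouped together. This is only a sketch under stated assumptions: the function names, the input format, and the anchor-text normalization (trimming and lowercasing) are hypothetical and are not taken from the slides.

```python
from collections import defaultdict


def complete_link_index(pages):
    """Map each complete link to the set of pages that contain it.

    pages: dict of page URL -> list of (target_url, anchor_text) pairs.
    A complete link is the pair (target_url, normalized anchor_text).
    """
    index = defaultdict(set)
    for page, links in pages.items():
        for target, anchor in links:
            # Assumption: anchors are compared after trimming and lowercasing.
            key = (target, anchor.strip().lower())
            index[key].add(page)
    return index


def duplicated_complete_links(pages):
    """Return only the complete links that appear on more than one page."""
    index = complete_link_index(pages)
    return {link: docs for link, docs in index.items() if len(docs) > 1}


# Hypothetical pages: both link to the same two targets, but only the first
# target is reached through the same anchor text on both pages.
example_pages = {
    "p1": [("http://t.example/", "Cheap Cars"), ("http://u.example/", "hotels")],
    "p2": [("http://t.example/", "cheap cars "), ("http://u.example/", "lodging")],
}
shared = duplicated_complete_links(example_pages)
```

Here both pages share the link target `http://u.example/`, but only the `http://t.example/` link counts as a duplicated complete link, because its anchor text matches as well. The index is effectively a sparse form of a document-by-complete-link incidence structure.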

7
Document - Complete link Matrix
8
Bipartite Graph
  • Two disjoint sets X and Y; each edge starts at
    an element of X and ends at an element of Y.

9
Link farms
  • Link farms are usually densely connected via
    multiple overlapping small bipartite cores.
  • Task: detect densely connected bipartite
    components in the document - complete link matrix.
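
One simple way to approach the detection task above is iterative pruning: repeatedly drop complete links that occur in too few documents, and documents that keep too few such links, until nothing changes. The sketch below is a hypothetical heuristic for illustration and is not necessarily the algorithm shown in the presentation. The function name `prune_bipartite`, the input format, and the reading of the thresholds `k` (minimum documents per complete link) and `l` (minimum shared complete links per document) are assumptions.

```python
def prune_bipartite(doc_links, k=2, l=2):
    """Keep only documents and complete links that survive mutual pruning.

    doc_links: dict of document -> set of complete links it contains.
    A complete link survives if at least k surviving documents contain it;
    a document survives if it keeps at least l surviving complete links.
    Returns a dict of surviving document -> set of surviving complete links.
    """
    docs = {doc: set(links) for doc, links in doc_links.items()}
    changed = True
    while changed:
        changed = False
        # Count how many surviving documents contain each complete link.
        counts = {}
        for links in docs.values():
            for link in links:
                counts[link] = counts.get(link, 0) + 1
        for doc in list(docs):
            kept = {link for link in docs[doc] if counts[link] >= k}
            if len(kept) < l:
                del docs[doc]  # too few shared complete links
                changed = True
            elif kept != docs[doc]:
                docs[doc] = kept  # drop rarely shared complete links
                changed = True
    return docs


# Hypothetical input: A and B share two complete links; C and D do not.
survivors = prune_bipartite(
    {"A": {"x", "y"}, "B": {"x", "y"}, "C": {"x", "z"}, "D": {"w"}},
    k=2,
    l=2,
)
```

With these inputs only documents A and B remain, each keeping the complete links x and y. The loop terminates because every pass that changes anything strictly shrinks the data.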

10
Algorithm for finding bipartite components
11
Result: k = 2 and l = 2
12
Adjustment: document-document matrix
13
Final matrix
14
Weighted adjacency matrix
15
Experiment: HITS results for "rental car"
  • http://www.discountcars.net/
  • http://www.motel-discounts.com/
  • http://www.stlouishoteldeals.com/
  • http://www.richmondhoteldeals.com/
  • http://www.jacksonvillehoteldeals.com/
  • http://www.jacksonhoteldeals.com/
  • http://www.keywesthoteldeals.com/
  • http://www.austinhoteldeals.com/
  • http://www.gatlinburghoteldeals.com/
  • http://www.ashevillehoteldeals.com/

16
Experiment: BH-HITS results for "rental car"
  • http://www.rentadeal.com/
  • http://www.allaboutstlouis.com/
  • http://www.allaboutboston.com/
  • https://travel2.securesites.com/about_travelguides/addlisting.html
  • http://www.allaboutsanfranciscoca.com/
  • http://www.allaboutwashingtondc.com/
  • http://www.allaboutalbuquerque.com/
  • http://www.allabout-losangeles.com/
  • http://www.allabout-denver.com/
  • http://www.allabout-chicago.com/

17
Experiment: CL-HITS results for "rental car"
  • http://www.hertz.com/
  • http://www.avis.com/
  • http://www.nationalcar.com/
  • http://www.thrifty.com/
  • http://www.dollar.com/
  • http://www.alamo.com/
  • http://www.budget.com/
  • http://www.enterprise.com/
  • http://www.budgetrentacar.com/
  • http://www.europcar.com/

18
Experiment: BH-HITS results for "translation online"
  • http://www.no-gambling.com/
  • http://www.teleorg.org/
  • http://ong.altervista.org/
  • http://bx.b0x.com/
  • http://video-poker.batcave.net/
  • http://www.websamba.com/marketing-campaigns
  • http://online-casino.o-f.com/
  • http://caribbean-poker.webxis.com/
  • http://roulette.zomi.net/
  • http://teleservices.netfirms.com/

19
Experiment: CL-HITS results for "translation online"
  • http://www.freetranslation.com/
  • http://www.systransoft.com/
  • http://babelfish.altavista.com/
  • http://www.yourdictionary.com/
  • http://dictionaries.travlang.com/
  • http://www.google.com/
  • http://www.foreignword.com/
  • http://www.babylon.com/
  • http://www.worldlingo.com/products_services/worldlingo_translator.html
  • http://www.allwords.com/

20
Duplicate example: BH-HITS results for "maps"
  • http://www.maps.com/
  • http://www.mapsworldwide.com/
  • http://www.cartographic.com/
  • http://www.amaps.com/
  • http://www.cdmaps.com/
  • http://www.ewpnet.com/maps.htm
  • http://mapsguidesandmore.com/
  • http://www.njdiningguide.com/maps.html
  • http://www.stanfords.co.uk/
  • http://www.delorme.com/

21
Duplicate example: CL-HITS results for "maps"
  • http://www.maps.com/
  • http://maps.yahoo.com/
  • http://www.delorme.com/
  • http://tiger.census.gov/
  • http://www.davidrumsey.com/
  • http://memory.loc.gov/ammem/gmdhtml/gmdhome.html
  • http://www.esri.com/
  • http://www.maptech.com/
  • http://www.streetmap.co.uk/
  • http://www.libs.uga.edu/darchive/hargrett/maps/maps.html

22
User evaluation
23
Discussion
  • Using the link alone, the precision at 10 is 66.4,
    much lower than when using the complete link.
  • Random anchor texts.
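
For reference, precision at 10 is the fraction of the top 10 results that are judged relevant. A minimal sketch of the computation follows; the function name `precision_at_k` and the relevance judgments in the example are hypothetical and are not data from the user evaluation.

```python
def precision_at_k(judgments, k=10):
    """Fraction of the top-k ranked results judged relevant.

    judgments: list of booleans in rank order (True means relevant).
    Results missing from a list shorter than k count as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for relevant in judgments[:k] if relevant) / k


# Hypothetical judgments for one query: 7 of the top 10 results are relevant.
example_judgments = [True] * 7 + [False] * 3
score = precision_at_k(example_judgments)  # 0.7
```

Averaging this value over all evaluated queries gives a single figure of the kind quoted above.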

24
  • Questions?
  • baw4_at_cse.lehigh.edu
  • davison_at_cse.lehigh.edu