TypoSquatting: a Nuisance or a Threat to Your Traffic


1
Typo-Squatting: a Nuisance or a Threat to Your Traffic?
  • Mishari Almishari

2
Outline
  • Introduction
  • Background
  • Methodology
  • Parked Domain Classifier
  • Measurements
  • Future Work
  • Related Work
  • Conclusion

3
Introduction - Motivation
  • Traffic is important to web domains!
  • No point in launching without incoming traffic
  • Losing/gaining traffic means losing/gaining
    money
  • One way to price ads is the pay-per-click model
  • Traffic Diversion could be a serious threat to a
    domain

4
Introduction - Motivation
  • Typos may attract traffic
  • Users are prone to making typos
  • Users may forget about visiting the target domain
  • Threat to Target Domain!
  • Intentionally registering such typo domains is
    called Typo-squatting

5
Introduction - Goal
  • To study how much traffic typo-squatters can get
    from target domains
  • Are those domains attracting much traffic?
  • There are many typo-squatting domains registered
    (Banerjee et al., 08)
  • Search engines typo-corrections and browser
    auto-completions!
  • How much traffic are target domains losing?
  • Is it of negligible ratio or a serious threat?
  • Do users go back to target domains or get
    distracted?

6
Introduction - Challenges
  • How to identify typo-squatting domains?
  • Does Typo mean Typo-squatting?
  • Short Domains
  • www.abc.com and www.abd.com
  • Longer Domains
  • www.walmart.com and www.walkmart.com
  • If not, how can we?
  • Hijacking indicator

7
Introduction - Contribution
  • Automatic and accurate identification of
    typo-squatting domains (Measurement Methodology)
  • Bound on how much traffic target domains are
    losing to typo-squatting domains
    (Measurement Results)

8
Outline
  • Introduction
  • Background
  • Methodology
  • Parked Domain Classifier
  • Measurements
  • Related Work
  • Future Work
  • Conclusion

9
Background - Domain Parking
  • Domain Parking is the practice of showing a
    temporary page for an unused domain before
    launching it

10
Background - Domain Parking

11
Background - Domain Parking

12
Background - Domain Parking

13
Background - Domain Parking
  • Domain Parking Service
  • Parks and hosts unused domains
  • Monetizes the traffic by showing ads
  • Many typo-squatting domains are parked domains
    (Wang et al., 06), (Keats, 07)

14
Outline
  • Introduction
  • Background
  • Methodology
  • Parked Domain Classifier
  • Measurements
  • Future Work
  • Related Work
  • Conclusion

15
Methodology
  • Data Collection
  • Identifying Typo-Squatting Domains

16
Methodology - Data Collection
  • DNS traces at UCI resolvers
  • Internal requests to domain names
  • A DNS query precedes the HTTP request
  • Caching limitation
  • Our study represents a lower-bound

17
Methodology - Data Collection
[Diagram: data-collection setup. Labels: Our Machine, UCI Resolver, UCI NET, INTERNET, USER QUERY]
Record fields: DATE, TIME, HASHED-IP, DOMAIN, TYPE, CLASS
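
A minimal sketch of reading one trace record with the six fields named on this slide. The whitespace-separated layout, the sample line, and the names `DnsQuery` and `parse_query` are assumptions made for illustration; the slide does not give the actual log format.

```python
from typing import NamedTuple


class DnsQuery(NamedTuple):
    """One trace record; field order follows the slide."""
    date: str
    time: str
    hashed_ip: str
    domain: str
    qtype: str
    qclass: str


def parse_query(line: str) -> DnsQuery:
    # Assumption: fields are separated by whitespace and none contains a space.
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    date, time, hashed_ip, domain, qtype, qclass = fields
    # Normalize the name: DNS names are case-insensitive and may end in a dot.
    return DnsQuery(date, time, hashed_ip, domain.lower().rstrip("."), qtype, qclass)


# Made-up sample line, not taken from the real traces.
record = parse_query("2009-03-01 12:00:05 a1b2c3 www.Walmart.com. A IN")
```

Normalizing the domain at parse time keeps later comparisons against target domains simple.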
18
Methodology - Identify Typo-squatting Domains
  • Identify Similar Domains
  • Single Error Typo
  • Single errors account for 90-95% of spelling/typo
    errors (Pollock et al., 83)
  • www.walmart.com and www.wamart.com
  • gTLD substitution
  • www.amazon.com and www.amazon.org
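
The two similarity rules above can be sketched as follows. This is an illustration under stated assumptions, not the presenter's implementation: a single error is taken to mean one insertion, deletion, substitution, or adjacent transposition, the `GTLDS` set is an assumed list, and the function names are hypothetical.

```python
GTLDS = {"com", "net", "org", "info", "biz"}  # assumed gTLD list


def is_single_error(a: str, b: str) -> bool:
    """True if b is exactly one edit away from a."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    i = 0  # length of the common prefix
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        if a[i + 1:] == b[i + 1:]:
            return True  # one substitution
        # one transposition of adjacent characters
        return (i + 1 < len(a) and a[i] == b[i + 1]
                and a[i + 1] == b[i] and a[i + 2:] == b[i + 2:])
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer[i + 1:] == shorter[i:]  # one insertion or deletion


def split_domain(domain: str):
    """Drop a leading 'www.' and split into (name, tld)."""
    domain = domain.lower().rstrip(".")
    if domain.startswith("www."):
        domain = domain[4:]
    name, _, tld = domain.rpartition(".")
    return name, tld


def is_similar(target: str, candidate: str) -> bool:
    t_name, t_tld = split_domain(target)
    c_name, c_tld = split_domain(candidate)
    if t_tld == c_tld:
        return is_single_error(t_name, c_name)
    # gTLD substitution: same name, different gTLD
    return t_name == c_name and t_tld in GTLDS and c_tld in GTLDS
```

Under these rules `is_similar("www.walmart.com", "www.wamart.com")` and `is_similar("www.amazon.com", "www.amazon.org")` both hold, but so does `is_similar("www.abc.com", "www.abd.com")`, which is why similarity alone cannot decide typo-squatting.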

19
Methodology - Identify Typo-squatting Domains
  • But a similar domain is not enough!
  • www.abc.com and www.abd.com
  • www.walmart.com and www.walkmart.com
  • www.usps.com and www.usps.org
  • Random Sample
  • More than 54% are not typo-squatting

Need to Identify Hijacking Intention
20
Methodology - Identify Typo-squatting Domains
  • Identify Hijacking Indicator
  • Parked domain (ads listing): 88%
  • Forwarding to other domains: 8%
  • Others: inappropriate content, ...

Parked Domain as the indicator
21
Methodology - Identify Typo-squatting Domains
Similar Domain AND Parked Domain -> Typo-Squatting Domain
22
Methodology - Identify Typo-squatting Domains
  • How to identify Parked Domain?
  • Parked Domain Classifier
  • 96%
  • Presence of Parking signatures
  • Well-known parking signatures (domain names/urls)
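
The signature check can be sketched as a scan of a page's link targets for hosts that belong to known parking services. The entries in `PARKING_SIGNATURES` are placeholders and the names `LinkCollector` and `has_parking_signature` are hypothetical. The real signature list is not given on the slide, and this covers only the signature test, not the feature-based classifier.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Placeholder signatures for illustration; not real parking services.
PARKING_SIGNATURES = {"parking.example", "parkedads.example"}


class LinkCollector(HTMLParser):
    """Collects every href/src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def has_parking_signature(html: str) -> bool:
    collector = LinkCollector()
    collector.feed(html)
    for url in collector.urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the signature itself or any subdomain of it.
        if any(host == sig or host.endswith("." + sig)
               for sig in PARKING_SIGNATURES):
            return True
    return False
```

Relative links have no hostname and are ignored, so a page that only links to itself never matches.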

23
Methodology - Summary
Identify Similar Domains -> Identify Parked Domains -> List of Typo-squatting Domains
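
The pipeline summarized here (similar domain AND parked domain) can be sketched as a filter over the observed domains. The two predicates are passed in as parameters because their real implementations are not part of this slide. The function name and the toy stand-ins in the usage lines are hypothetical.

```python
from typing import Callable, Dict, List


def find_typo_squatters(
    targets: List[str],
    observed: List[str],
    is_similar: Callable[[str, str], bool],
    is_parked: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Map each target domain to the observed domains that squat on it."""
    target_set = set(targets)
    found: Dict[str, List[str]] = {}
    for candidate in observed:
        if candidate in target_set:
            continue  # a target domain is never counted as a squatter
        for target in targets:
            # Both conditions must hold: similarity AND the hijacking indicator.
            if is_similar(target, candidate) and is_parked(candidate):
                found.setdefault(target, []).append(candidate)
    return found


# Toy stand-ins for the two predicates, for illustration only.
squatters = find_typo_squatters(
    targets=["walmart.com"],
    observed=["wamart.com", "walkmart.com", "example.org"],
    is_similar=lambda t, c: c in {"wamart.com", "walkmart.com"},
    is_parked=lambda c: c == "wamart.com",
)
```

In this toy run `walkmart.com` is similar to the target but not parked, so only `wamart.com` is reported.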
24
Outline
  • Introduction
  • Background
  • Methodology
  • Parked Domain Classifier
  • Measurements
  • Future Work
  • Related Work
  • Conclusion

25
Parked Domain Classifier
Build Data Set -> Extract Core Features -> Combine Into Classifier
26
Data Set
  • Data Set consists of 2,800 domains
  • 700 are parked domains
  • Collected from the MS Strider website
  • 2,100 are non-parked domains
  • Collected from the fourteen Yahoo Directory top
    categories

27
Feature Selection
  • Heuristically identify common features in
    parked domains
  • Compute the distribution of those features for
    verification
  • Common Link Ratio Max

28
Feature Selection
29
Combining Features Into Classifier
  • Tried Different Classifier Algorithms
  • Decision Tree
  • SVM
  • K-Nearest Neighbor
  • Random Forest
  • The best performance

30
Outline
  • Introduction
  • Background
  • Methodology
  • Parked Domain Classifier
  • Measurements
  • Future Work
  • Related Work
  • Conclusion

31
Data Sets
  • DNS Traces
  • Four Months
  • 30 million domains (2 billion hits) (30,000
    users)
  • Target Domain Set
  • Alexa's Top 500 popular domains
  • 53,000,000 hits

32
Typo-Squatting Domains Hits
  • 1,332 typo-squatting domains
  • 13,431 hits (about 110 a day)
  • Is it Large or Small?
  • 500 Target Domains
  • 4 Month Period
  • 30,000 users
  • Given a similar ratio, this may translate to a
    non-trivial number
  • 30,000 users: 110 hits per day
  • 300,000 users: 1,100 hits per day
  • 3,000,000 users: 11,000 hits per day (x 365 = about
    4,000,000 a year)
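
The scaling on this slide is a linear extrapolation from the measured population. A short sketch of the arithmetic, assuming four months is about 120 days and that hits grow in proportion to the number of users (the names are illustrative):

```python
TYPO_HITS = 13_431      # typo-squatting hits over the trace period
BASE_USERS = 30_000     # users behind the measured resolvers
DAYS = 120              # assumption: four months taken as 120 days

hits_per_day = TYPO_HITS / DAYS  # about 112; the slide rounds to 110


def scaled_hits_per_day(users: int, base_hits_per_day: float = 110) -> float:
    """Linear extrapolation: hits scale with the user population."""
    return base_hits_per_day * users / BASE_USERS


for users in (30_000, 300_000, 3_000_000):
    print(f"{users:>9,} users -> {scaled_hits_per_day(users):>8,.0f} hits/day")

yearly = scaled_hits_per_day(3_000_000) * 365  # 4,015,000, i.e. about 4 million
```

The proportionality assumption is the slide's own ("given a similar ratio"); whether it holds at other ISPs is listed under future work.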

33
Typo-squatting Ratio
  • 0.025% of total number of queries
  • (89%, 1%) (70%, 0.1%) (57%, 0.01%)

34
User Correction Ratio Alexa-500
  • 54% of typo-squatting queries are corrected
  • 51% of squatted target domains have most squat
    hits corrected

35
Potential Hit Loss
  • Potential Hit Loss Ratio: 0.012%
  • (92%, 1%) (78%, 0.1%) (64%, 0.01%)
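
A consistency check on the headline figures, reading them as percentages. It assumes the typo-squatting ratio is the 13,431 typo-squatting hits divided by the 53,000,000 hits to the target domains, and that the potential loss is the share of those hits that users do not correct. The slides do not spell out either formula, so this is a reconstruction, not the presenter's stated method.

```python
TYPO_HITS = 13_431           # hits to typo-squatting domains
TARGET_HITS = 53_000_000     # hits to the Alexa top-500 target domains
CORRECTED_FRACTION = 0.54    # share of typo-squatting queries that are corrected

# Typo-squatting ratio, in percent of target-domain queries.
typo_ratio_pct = 100 * TYPO_HITS / TARGET_HITS

# Potential hit loss: only uncorrected typo hits are lost to the target.
potential_loss_pct = typo_ratio_pct * (1 - CORRECTED_FRACTION)

print(f"typo-squatting ratio: {typo_ratio_pct:.3f}%")      # 0.025%
print(f"potential hit loss:   {potential_loss_pct:.3f}%")  # 0.012%
```

Both values round to the figures reported on the slides, which supports reading the bare numbers as percentages.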

36
Potential Money Loss
  • 75% do not point to target domains
  • Referring typo-squatting ratio: 0.008%
  • (96%, 1%) (91%, 0.1%) (81%, 0.01%)

37
Non-existing Similar Domains
  • 8,285 potential hits (500 non-existing typo
    domains)
  • 0.015% of total number of queries
  • (96%, 1%) (83%, 0.1%) (66%, 0.01%)

38
Typo-Squatting Distribution
  • 19% of all typo-squatting hits

39
Top Ten Typo-squatting Domains
  • 19% of all typo-squatting hits

40
Top Ten Target Domains
  • Responsible for 55% of all typo-squatting queries
    of Alexa-500
  • 50 Million hits of www.facebook.com

41
Typo Characterization
  • Most typos are single errors (95% vs. 5%)
  • Most gTLD substitutions are .com to .org (50%)
  • Additions: 37% are of non-adjacent keys
  • Substitutions: 77% are of non-adjacent keys
  • Substitutions: 13% are between a and o
  • Spelling error
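
One way to tell keyboard slips from spelling errors is to check whether the wrong character sits next to the intended one on the keyboard. The sketch below is an illustration only: it assumes a QWERTY layout modeled as a simple grid (ignoring row stagger), handles additions, deletions, and substitutions, and, for additions, compares the added key with the intended keys on either side of it. The slides do not define adjacency, and all names are hypothetical.

```python
QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def key_position(ch: str):
    for row, keys in enumerate(QWERTY_ROWS):
        col = keys.find(ch)
        if col != -1:
            return row, col
    return None  # digits, hyphens, etc. are not modeled


def are_adjacent(a: str, b: str) -> bool:
    pa, pb = key_position(a), key_position(b)
    if pa is None or pb is None or a == b:
        return False
    return abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1


def classify_single_error(target: str, typo: str):
    """Return (kind, adjacent) for a single-error typo, else None."""
    if target == typo or abs(len(target) - len(typo)) > 1:
        return None
    i = 0  # length of the common prefix
    while i < min(len(target), len(typo)) and target[i] == typo[i]:
        i += 1
    if len(target) == len(typo):
        if target[i + 1:] == typo[i + 1:]:
            return "substitution", are_adjacent(target[i], typo[i])
        return None  # transpositions and multiple errors are not handled
    if len(typo) > len(target):
        if typo[i + 1:] == target[i:]:
            neighbors = target[max(i - 1, 0):i + 1]
            return "addition", any(are_adjacent(typo[i], n) for n in neighbors)
        return None
    if target[i + 1:] == typo[i:]:
        return "deletion", None  # adjacency is not defined for a missing key
    return None
```

For example, `classify_single_error("amazon", "omazon")` is a non-adjacent substitution (a and o are far apart on the keyboard), consistent with the slide's reading of a/o swaps as spelling errors, while `classify_single_error("walmart", "walkmart")` is an addition of a key adjacent to the intended `l`.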

42
Typo-squatting Domains TP60
  • 15,499 hits
  • 0.045% of total number of queries
  • (76%, 1%) (60%, 0.5%)

43
Outline
  • Introduction
  • Background
  • Methodology
  • Parked Domain Classifier
  • Measurements
  • Future Work
  • Related Work
  • Conclusion

44
Future Work
  • How much of the ads budget goes to squatters?
  • Enhance our identification technique
  • See if the results hold at other ISPs
  • Typo Modeling for getting traffic back

45
Outline
  • Introduction
  • Background
  • Methodology
  • Parked Domain Classifier
  • Measurements
  • Future Work
  • Related Work
  • Conclusion

46
Related Work
  • MS Strider Project (Wang et al., SRUTI 06)
  • McAfee Study (Keats, McAfee White Paper, 07)
  • JAAL project (Banerjee et al., Infocom 08)

47
Outline
  • Introduction
  • Background
  • Methodology
  • Parked Domain Classifier
  • Measurements
  • Future Work
  • Related Work
  • Conclusion

48
Conclusion
  • Accurately and automatically identify
    typo-squatting domains
  • How much traffic goes to typo-squatters
  • Bound on how much traffic the target domain is
    losing to typo-squatting
  • inconsequential