Title: TypoSquatting: a Nuisance or a Threat to Your Traffic
1Typo-Squatting: a Nuisance or a Threat to Your Traffic?
2Outline
- Introduction
- Background
- Methodology
- Parked Domain Classifier
- Measurements
- Future Work
- Related Work
- Conclusion
3Introduction - Motivation
- Traffic is important to web domains!
- no point of launching without incoming traffic
- Losing/gaining traffic means losing/gaining money
- One way to price ads is the pay-per-click model
- Traffic Diversion could be a serious threat to a
domain
4Introduction - Motivation
- Typos may attract traffic
- Users are prone to making typos
- Users may forget about visiting target domain
- Threat to Target Domain!
- Intentionally registering such typo domains is
called Typo-squatting
5Introduction - Goal
- To study how much traffic typo-squatters can get from target domains
- Are those domains attracting much traffic?
- There are many typo-squatting domains registered (Banerjee et al., 08)
- Search engines' typo corrections and browser auto-completions!
- How much traffic are target domains losing?
- Is it a negligible ratio or a serious threat?
- Do users go back to target domains or get distracted?
6Introduction - Challenges
- How to identify typo-squatting domains?
- Does Typo mean Typo-squatting?
- Short Domains
- www.abc.com and www.abd.com
- Longer Domains
- www.walmart.com and www.walkmart.com
- If not, how can we?
- Hijacking indicator
7Introduction - Contribution
- Automatic and accurate identification of typo-squatting domains (Measurement Methodology)
- Bound on how much traffic target domains are losing towards typo-squatting domains (Measurement Results)
9Background Domain Parking
- Domain Parking is the practice of showing a
temporary page for an unused domain before
launching it
13Background Domain Parking
- Domain Parking Service
- Parks and hosts unused domains
- Monetize the traffic by showing ads
- Many Typo-squatting domains are parked domains
(Wang et al, 06), (Keats, 07)
15Methodology
- Data Collection
- Identifying Typo-Squatting Domains
16Methodology - Data Collection
- DNS traces at UCI resolvers
- Internal requests to domain names
- DNS query precedes HTTP request
- Caching limitation
- Our study represents a lower-bound
17Methodology - Data Collection
Our Machine
UCI Resolver
UCI NET
INTERNET
USER QUERY
DATE TIME HASHED-IP DOMAIN TYPE CLASS
18Methodology Identify Typo-squatting Domain
- Identify Similar Domains
- Single Error Typo
- Single errors account for 90-95% of spelling/typo errors (Pollock et al, 83)
- www.walmart.com and www.wamart.com
- gTLD substitution
- www.amazon.com and www.amazon.org
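The two similarity rules above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: it assumes that a "single error" is one insertion, deletion, substitution, or transposition of adjacent characters, and the `GTLDS` list is a hypothetical stand-in for whichever gTLDs were actually considered.

```python
import string

# Characters allowed in a domain label (letters, digits, hyphen).
ALPHABET = string.ascii_lowercase + string.digits + "-"
# Hypothetical gTLD list, for illustration only.
GTLDS = ["com", "net", "org", "info", "biz"]


def single_error_typos(name):
    """Return every label exactly one edit away from `name`."""
    typos = set()
    for i in range(len(name) + 1):
        head, tail = name[:i], name[i:]
        if tail:
            typos.add(head + tail[1:])                      # deletion
            for c in ALPHABET:
                typos.add(head + c + tail[1:])              # substitution
        if len(tail) > 1:
            typos.add(head + tail[1] + tail[0] + tail[2:])  # transposition
        for c in ALPHABET:
            typos.add(head + c + tail)                      # insertion
    typos.discard(name)
    # Drop labels that are empty or start/end with a hyphen.
    return {t for t in typos
            if t and not t.startswith("-") and not t.endswith("-")}


def similar_domains(domain):
    """Candidates for a domain such as 'www.walmart.com'."""
    *prefix, name, tld = domain.lower().split(".")

    def join(label, suffix):
        return ".".join(prefix + [label, suffix])

    candidates = {join(t, tld) for t in single_error_typos(name)}
    candidates |= {join(name, g) for g in GTLDS if g != tld}
    return candidates
```

In a trace-driven setting one would more likely test each observed domain against the target list than enumerate every candidate, but the enumeration makes the two rules explicit.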
19Methodology Identify Typo-squatting Domains
- But Similar domain is not enough!
- www.abc.com and www.abd.com
- www.walmart.com and www.walkmart.com
- www.usps.com and www.usps.org
- Random Sample
- More than 54% are not typo-squatting
Need to Identify Hijacking Intention
20Methodology Identify Typo-squatting Domain
- Identify Hijacking Indicator
- Parked Domain (Ads listing)
- 88%
- Forwarding to other domains
- 8%
- Others Inappropriate Content,
Parked Domain as the indicator
21Methodology Identify Typo-squatting Domain
Similar Domain
Parked Domain
AND
Typo-Squatting Domain
22Methodology Identify Typo-squatting Domain
- How to identify Parked Domain?
- Parked Domain Classifier
- 96%
- Presence of Parking signatures
- Well-known parking signatures (domain names/urls)
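A signature check of this kind could look roughly like the sketch below. The entries in `PARKING_SIGNATURES` are placeholders invented for illustration; they are not the signatures used in the study, and a real list would hold the domain names/URLs of known parking services.

```python
import re

# Hypothetical signatures (placeholders, NOT the study's list).
PARKING_SIGNATURES = [
    "parking.example.net",
    "ads.example-parking.com/landing",
]

# Pull the targets of href/src attributes out of a page.
URL_RE = re.compile(r"""(?:href|src)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def has_parking_signature(html, signatures=PARKING_SIGNATURES):
    """True if any linked URL in the page contains a known signature."""
    urls = [u.lower() for u in URL_RE.findall(html)]
    return any(sig in url for url in urls for sig in signatures)
```

For example, a page that loads a script from `http://parking.example.net/show.js` would be flagged, while a page that only links to unrelated sites would not.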
23Methodology - Summary
Identify Similar Domains
Identify Parked Domains
List of Typo-squatting Domains
25Parked Domain Classifier
Build Data Set
Extract Core Features
Combine Into Classifier
26Data Set
- Data Set consists of 2,800 domains
- 700 are parked domains
- Collected from MS Strider Website
- 2,100 are non-parked domains
- Collected From the fourteen Yahoo Directory Top
Categories
27Feature Selection
- Heuristically, identify common features in parked domains
- Compute the distribution of those features for verification
28Feature Selection
29Combining Features Into Classifier
- Tried Different Classifier Algorithms
- Decision Tree
- SVM
- K-Nearest Neighbor
- Random Forest
- The best performance
31DATA Sets
- DNS Traces
- Four Months
- 30 million domains (2 billion hits) (30,000 users)
- Target Domain Set
- Alexa's Top 500 popular domains
- 53,000,000 hits
32Typo-Squatting Domains Hits
- 1,332 typo-squatting domains
- 13,431 hits (about 110 a day)
- Is it Large or Small?
- 500 Target Domains
- 4 Month Period
- 30,000 users
- Given a similar ratio, this may translate to a non-trivial number
- 30,000 users: 110 per day
- 300,000 users: 1,100 per day
- 3,000,000 users: 11,000 per day (x 365 = about 4,000,000 a year)
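The scaling on this slide is simple proportional arithmetic, which can be checked as follows. The sketch assumes a four-month period of roughly 120 days and that hits grow linearly with the number of users; both are assumptions made here for illustration.

```python
HITS = 13_431          # typo-squatting hits observed in the traces
DAYS = 4 * 30          # assumption: four months is about 120 days
BASE_USERS = 30_000    # users behind the monitored resolvers
BASE_DAILY = 110       # rounded daily rate quoted on the slide


def observed_daily_rate():
    """Hits per day in the trace (about 112, quoted as 110)."""
    return HITS / DAYS


def extrapolate_daily(users):
    """Daily hits for a larger population, assuming linear scaling."""
    return BASE_DAILY * users // BASE_USERS


def extrapolate_yearly(users):
    """Yearly hits under the same linear-scaling assumption."""
    return extrapolate_daily(users) * 365
```

With these assumptions, 3,000,000 users give 11,000 hits a day, or 4,015,000 a year, which is rounded to 4,000,000 on the slide.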
33Typo-squatting Ratio
- 0.025% of total number of queries
- (89 , 1) (70, 0.1) ( 57, 0.01)
34User Correction Ratio Alexa-500
- 54% of typo-squatting queries are corrected
- 51 squatted target domains have most squat
hits corrected
35Potential Hit Loss
- Potential Hit Loss Ratio: 0.012%
- (92 , 1) (78, 0.1) (64, 0.01)
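The headline ratios on the last few slides are consistent with one another if they are read as percentages. The check below is one reading that reproduces the quoted figures, not a formula stated on the slides: it assumes the typo-squatting ratio is hits divided by the 53,000,000 target-domain hits, and that the potential hit loss is the share of those hits that users did not correct.

```python
SQUAT_HITS = 13_431        # hits on typo-squatting domains
TARGET_HITS = 53_000_000   # hits on the Alexa top-500 target set
CORRECTED = 0.54           # share of typo-squatting queries corrected


def squat_ratio_percent():
    """Typo-squatting hits as a percentage of target-domain hits."""
    return 100 * SQUAT_HITS / TARGET_HITS


def potential_loss_percent():
    """Uncorrected typo-squatting hits as a percentage (assumed reading)."""
    return squat_ratio_percent() * (1 - CORRECTED)
```

Rounded to three decimals, these give 0.025 and 0.012 respectively.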
36Potential Money Loss
- 75% do not point to target domains
- Referring Typo-Sqt Ratio: 0.008%
- (96, 1) (91, 0.1) ( 81, 0.01)
37Non-existing Similar Domains
- 8,285 potential hits (500 non-existing typo domains)
- 0.015% of total number of queries
- (96, 1) (83, 0.1) (66, 0.01)
38Typo-Squatting Distribution
- 19% of all typo-squatting hits
39Top Ten Typo-squatting Domains
- 19% of all typo-squatting hits
40Top Ten Target Domains
- Responsible for 55% of all typo-squatting queries of Alexa-500
- 50 million hits of www.facebook.com
41Typo Characterization
- Most typos are single errors (95% vs. 5%)
- Most gTLD substitutions are com to org (50%)
- Add: 37% are of non-adjacent keys
- Sub: 77% are of non-adjacent keys
- Sub: 13% of substitutions are a and o
- Spelling error
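A characterization like this needs two pieces: classifying each typo by edit type, and deciding whether two keys are adjacent. The sketch below assumes a QWERTY layout and a simplified adjacency rule (neighbouring row and column, ignoring row stagger); the labels `add`, `del`, `sub`, and `swap` are names chosen here, not taken from the study.

```python
# Assumption: US QWERTY letter rows; digits and hyphen are ignored.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def key_position(c):
    """(row, column) of a letter key, or None if it is not on the grid."""
    for r, row in enumerate(QWERTY_ROWS):
        if c in row:
            return r, row.index(c)
    return None


def adjacent(a, b):
    """Simplified adjacency: distinct keys at most one row/column apart."""
    pa, pb = key_position(a), key_position(b)
    if pa is None or pb is None or a == b:
        return False
    return abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1


def classify(target, typo):
    """Return the single edit turning `target` into `typo`, or None."""
    if len(typo) == len(target):
        diffs = [i for i in range(len(target)) if target[i] != typo[i]]
        if len(diffs) == 1:
            i = diffs[0]
            return ("sub", target[i], typo[i])
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and target[diffs[0]] == typo[diffs[1]]
                and target[diffs[1]] == typo[diffs[0]]):
            return ("swap", target[diffs[0]], target[diffs[1]])
        return None
    if len(typo) == len(target) + 1:
        for i in range(len(typo)):
            if typo[:i] + typo[i + 1:] == target:
                return ("add", typo[i], i)      # inserted char and position
        return None
    if len(typo) + 1 == len(target):
        for i in range(len(target)):
            if target[:i] + target[i + 1:] == typo:
                return ("del", target[i], i)    # dropped char and position
    return None
```

Under this rule `a` and `o` are not adjacent, which fits the slide's reading of a/o substitutions as spelling errors rather than slips of the finger.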
42Typo-squatting Domains TP60
- 15,499 hits
- 0.045% of total number of queries
- (76, 1) (60, 0.5)
44Future Work
- How much of the ads budget goes to squatters?
- Enhance our identification technique
- See if the results hold at other ISPs
- Typo Modeling for getting traffic back
46Related Work
- MS Strider Project (Wang et al., SRUTI 06)
- McAfee Study (Keats, McAfee White Paper 07)
- JAAL project (Banerjee et al., Infocom 08)
48Conclusion
- Accurately and automatically identify typo-squatting domains
- How much traffic goes to typo-squatters
- Bound on how much traffic the target domain is losing towards typo-squatting
- Inconsequential