Phishing Webpage Detection - PowerPoint PPT Presentation

About This Presentation
Title:

Phishing Webpage Detection

Description:

Title: Phishing Web Pages Detection
Author: CHEN, JAU-YUAN
Last modified by: Jau-Yuan Chen
Created Date – PowerPoint PPT presentation

Transcript and Presenter's Notes


1
Phishing Webpage Detection
  • Jau-Yuan Chen
  • COMS E6125 WHIM
  • March 24, 2009

2
What is Phishing?
  • Source: "Phishing Activity Trends Report," APWG,
    December 2008
  • APWG: Anti-Phishing Working Group
  • (Definition)
    • Phishing is a criminal mechanism employing both
      social engineering and technical subterfuge to
      steal consumers' personal identity data and
      financial account credentials.
    • Social-engineering schemes use spoofed e-mails
      purporting to be from legitimate businesses and
      agencies to lead consumers to counterfeit websites
      designed to trick recipients into divulging
      financial data such as usernames and passwords.
    • Technical-subterfuge schemes plant crimeware onto
      PCs to steal credentials directly, often using
      systems to intercept consumers' online account
      user names and passwords - and to corrupt local
      navigational infrastructures to misdirect
      consumers to counterfeit websites (or authentic
      websites through phisher-controlled proxies used
      to monitor and intercept consumers' keystrokes).

3
Severity of the Phishing Problem
  • The number of crimeware-spreading sites infecting
    PCs with password-stealing crimeware reached an
    all-time high of 31,173 in December 2008.
  • Unique phishing reports submitted to APWG
    recorded a yearly high of 34,758 in December
    2008.
  • In 2007 (a survey by Gartner, Inc.):
    • more than $3.2 billion was lost to phishing
      attacks in the US
    • 3.6 million adults lost money in phishing attacks

4
WHY PHISHING PAGE DETECTION?

5
eBay?
It's difficult to distinguish these pages!

6
Most Targeted Industry

7
Current Anti-phishing Solutions
  • text-based page analysis
    • URL analysis
    • HTML parsing
    • keyword extraction
  • however, phishers can easily avoid detection by
    using non-HTML components, such as
    • images,
    • Flash,
    • ActiveX, etc.
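Non-HTML components defeat text-based analysis, and a toy keyword extractor that reads only a page's visible HTML text shows why. This is a minimal sketch. The class `VisibleTextExtractor`, the function `extract_keywords`, the brand-term list and both sample pages are hypothetical. The point is only that a login page drawn as an image leaves such an analyzer nothing to match.

```python
from html.parser import HTMLParser
import re


class VisibleTextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_keywords(html, brand_terms):
    """Return the brand terms that occur as words in the page's visible text."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    words = set(re.findall(r"[a-z]+", " ".join(parser.chunks).lower()))
    return {term for term in brand_terms if term in words}


BRANDS = {"ebay", "paypal", "password", "signin"}  # hypothetical keyword list

text_page = "<html><body><h1>eBay Sign In</h1><p>Enter your password</p></body></html>"
image_page = '<html><body><img src="login.png"><form><input name="u"></form></body></html>'

print(sorted(extract_keywords(text_page, BRANDS)))   # ['ebay', 'password']
print(sorted(extract_keywords(image_page, BRANDS)))  # []
```

The image-based page looks the same to a user but gives the text analyzer an empty result. This gap is what motivates the image-based scheme.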

8
Image-based Anti-phishing Scheme
  • focus on "what you see", not "how the page is
    composed"!
  • J.-Y. Chen and K.-T. Chen, "A Robust Local
    Feature-based Scheme for Phishing Page Detection
    and Discrimination," Web 2.0 Trust 2008.
  • K.-T. Chen, J.-Y. Chen, C.-R. Huang, and
    C.-S. Chen, "Fighting Phishing with Discriminative
    Keypoint Features of Webpages," IEEE Internet
    Computing, to appear.

9
Page Matching

10
Page Scoring
  [Figure: effective grids]

11
Page Classification
  • naïve Bayesian classifier with 10-fold
    cross-validation
  • training data: a pre-stored phishing page set and
    a legitimate page set
    • phishing page set (positive data set):
      comparisons between phishing pages and their
      target pages
    • legitimate page set (negative data set):
      comparisons between legitimate pages of different
      sites
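The slides name the classifier and the validation protocol but not the feature vector behind each comparison. Below is a generic Gaussian naïve Bayes with 10-fold cross-validation in plain Python. It is a sketch, not the authors' code. Three things are assumptions for illustration: the Gaussian likelihood, the single hypothetical "similarity score" feature, and the synthetic data. The printed accuracy is not a result from this work.

```python
import math
import random
from statistics import fmean, pvariance


def fit_gaussian_nb(X, y):
    """Fit class log-priors and per-feature Gaussian means/variances."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        means = [fmean(c) for c in columns]
        variances = [pvariance(c) + 1e-9 for c in columns]  # floor avoids division by zero
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict(model, x):
    """Pick the class with the largest log prior + sum of Gaussian log-likelihoods."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


def cross_validated_accuracy(X, y, k=10, seed=0):
    """Mean accuracy over k folds of a shuffled index list."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return fmean(accuracies)


# Synthetic stand-in data (NOT the paper's): one hypothetical similarity score per comparison.
rng = random.Random(1)
X = [[rng.gauss(0.8, 0.1)] for _ in range(100)] + [[rng.gauss(0.2, 0.1)] for _ in range(100)]
y = [1] * 100 + [0] * 100  # 1 = phishing-vs-target, 0 = legitimate-vs-legitimate
print(cross_validated_accuracy(X, y, k=10))
```

Each fold's model is trained only on the other nine folds, so every comparison is scored by a classifier that never saw it.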

12
Performance Evaluation

13
Data description
  • phishing pages: 2,058 pages on 74 sites
    • sources: http://www.phishtank.com,
      http://www.antiphishing.org
    • records from the top 5 phishing target sites
      make up more than half of our records
  • potential target pages: 300 vulnerable pages
    • source: http://www.ciphertrust.com/resources/statistics/
  • pre-stored data set
    • positive: 2,058 comparisons
    • negative: 44,000 comparisons

Domain               Number of Records
eBay                               701
PayPal                             632
Marshall & Ilsley                  138
Charter One                        116
Bank of America                     51
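The claim that the top 5 target sites account for more than half of the records can be checked directly against the table. The same arithmetic shows how imbalanced the pre-stored comparison set is. All numbers are taken from this slide. The variable names are only illustrative.

```python
# Record counts copied from the table; totals from the data description.
top5_records = {
    "eBay": 701,
    "PayPal": 632,
    "Marshall & Ilsley": 138,
    "Charter One": 116,
    "Bank of America": 51,
}
phishing_pages = 2058          # also the number of positive comparisons
negative_comparisons = 44000   # legitimate-vs-legitimate comparisons

top5_total = sum(top5_records.values())
top5_share = top5_total / phishing_pages
imbalance = negative_comparisons / phishing_pages

print(top5_total, f"{top5_share:.1%}")            # 1638 79.6%
print(f"{imbalance:.1f} negatives per positive")  # 21.4 negatives per positive
```

So the top 5 sites supply roughly four fifths of the phishing records. There are about 21 negative comparisons for every positive one.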

14
Earth Mover's Distance (EMD) based Scheme
  • Fu et al., IEEE Transactions on Dependable and
    Secure Computing, 2006
  • the first image-based phishing detection approach
  • EMD evaluates the distance between two signatures
    • Signature (S): the frequency and the centroid of
      each color used
    • Weights (p, q): the ground distance is a linear
      combination of the Euclidean distance between
      colors (weight p) and the distance between their
      centroids (weight q)
  • Visual similarity degree (VSD):
    $\mathrm{VSD} = 1 - \mathrm{EMD}^{a}$, where a is the
    amplifier for the EMD value
  • pros: simple and fast
  • cons: only suitable for basic phishing cases
    • it tends to fail if phishing pages and the
      official ones are partially similar
    • however, phishing pages are usually partially
      different from their targets!
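A minimal sketch of the EMD comparison and the VSD formula follows, under stated assumptions. A signature is a list of (weight, (r, g, b) color, (x, y) centroid) entries. The ground distance is p times the Euclidean color distance plus q times the centroid distance, each scaled to [0, 1]. That scaling is our hypothetical normalization, and Fu et al.'s exact scaling may differ. EMD is solved as a min-cost flow (transportation problem) by successive shortest paths. This generic solver was chosen for self-containment and is not necessarily what Fu et al. used. The defaults p = q = 0.5 and a = 0.5 follow the Parameter Settings slide. The function names are hypothetical.

```python
import math


def ground_distance(f1, f2, p=0.5, q=0.5):
    """Hypothetical ground distance: p * color term + q * centroid term, each in [0, 1]."""
    _, color1, xy1 = f1
    _, color2, xy2 = f2
    color_term = math.dist(color1, color2) / math.dist((0, 0, 0), (255, 255, 255))
    centroid_term = math.dist(xy1, xy2) / math.sqrt(2)  # centroids assumed in [0, 1] x [0, 1]
    return p * color_term + q * centroid_term


def emd(sig_a, sig_b, dist=ground_distance, eps=1e-12):
    """Earth Mover's Distance: min transport cost divided by the total flow moved.

    Solved as a min-cost flow by successive shortest paths (Bellman-Ford).
    """
    m, n = len(sig_a), len(sig_b)
    src, sink = m + n, m + n + 1
    graph = [[] for _ in range(m + n + 2)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i, f in enumerate(sig_a):
        add_edge(src, i, f[0], 0.0)          # supply = weight of each color in A
    for j, g in enumerate(sig_b):
        add_edge(m + j, sink, g[0], 0.0)     # demand = weight of each color in B
    for i, f in enumerate(sig_a):
        for j, g in enumerate(sig_b):
            add_edge(i, m + j, math.inf, dist(f, g))

    total_cost = total_flow = 0.0
    while True:
        best = [math.inf] * len(graph)
        prev = [None] * len(graph)
        best[src] = 0.0
        for _ in range(len(graph) - 1):  # Bellman-Ford: residual edges may be negative
            updated = False
            for u, edges in enumerate(graph):
                if best[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(edges):
                    if cap > eps and best[u] + cost < best[v] - eps:
                        best[v], prev[v] = best[u] + cost, (u, k)
                        updated = True
            if not updated:
                break
        if best[sink] == math.inf:
            break  # no augmenting path left
        push, v = math.inf, sink
        while v != src:  # bottleneck capacity along the shortest path
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != src:  # augment and update residual capacities
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total_cost += push * best[sink]
        total_flow += push
    return total_cost / total_flow if total_flow > 0 else 0.0


def vsd(sig_a, sig_b, a=0.5):
    """Visual similarity degree as on the slide: VSD = 1 - EMD**a."""
    return 1.0 - emd(sig_a, sig_b) ** a
```

For example, take a single black region against a single white region at the same centroid. The color term is 1, so the ground distance is 0.5. This gives EMD = 0.5 and VSD = 1 - sqrt(0.5) ≈ 0.293. Identical signatures give EMD = 0 and VSD = 1.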

15
Parameter Settings
  • CCH settings
    • levels used to describe salient points (L): 4
    • Euclidean distance between two salient points
      (Dist): 7 pixels
    • input image size: original webpage resolution
      (mostly 800 × 600)
    • k-means parameter (k): 4
    • naïve Bayesian classifier
  • EMD settings
    • we follow the suggestion in Fu et al.'s previous
      work
    • input image size: 100 × 100 (Lanczos3 resampling
      algorithm)
    • color degrading factor (CDF): 32
    • amplifier for the EMD value (a): 0.5
    • the number of colors used for the signature
      (Ss): 20
    • the weight for the color distance (p): 0.5
    • the weight for the color centroid distance (q):
      0.5
    • a naïve Bayesian classifier is used instead of a
      per-page threshold
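The EMD settings above fix the baseline's signature parameters: a 100 × 100 input, CDF = 32 and Ss = 20. One possible way to build such a color signature is sketched below. The sketch makes three assumptions. The page image is already resized to 100 × 100, and the Lanczos3 resampling step is not shown. "Degrading" means integer division of each 8-bit channel by the CDF. Each retained color is represented by the center of its bin. These are our assumptions, not details confirmed by the slides or by Fu et al. The function name `color_signature` is hypothetical.

```python
from collections import defaultdict


def color_signature(pixels, cdf=32, top_colors=20):
    """Build a color signature from a 2-D grid of (r, g, b) pixels.

    Each channel is degraded by integer division by `cdf`. For each degraded
    color we record its pixel count and the centroid of its pixel positions
    (normalized to [0, 1]). Only the `top_colors` most frequent colors are
    kept, and their weights are renormalized to sum to 1.
    """
    height, width = len(pixels), len(pixels[0])
    count = defaultdict(int)
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            key = (r // cdf, g // cdf, b // cdf)
            count[key] += 1
            sum_x[key] += x / max(width - 1, 1)
            sum_y[key] += y / max(height - 1, 1)
    kept = sorted(count, key=count.get, reverse=True)[:top_colors]
    kept_total = sum(count[k] for k in kept)
    signature = []
    for key in kept:
        # Assumption: represent each degraded color by the center of its bin.
        color = tuple(c * cdf + cdf // 2 for c in key)
        centroid = (sum_x[key] / count[key], sum_y[key] / count[key])
        signature.append((count[key] / kept_total, color, centroid))
    return signature


# Tiny 2 x 2 example: a reddish left column and a bluish right column.
pixels = [[(255, 0, 0), (0, 0, 255)],
          [(250, 10, 5), (0, 0, 250)]]
print(color_signature(pixels))
```

With the defaults, a 100 × 100 image yields at most 20 (weight, color, centroid) entries. Slightly different shades that fall into the same bin are merged, which is the purpose of the degrading factor.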

16
Top 5 Phishing Target Sites
  • AUC (area under the ROC curve)
    • CCH: 0.998
    • EMD: 0.956
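The AUC reported here is the probability that a randomly chosen positive comparison is scored higher than a randomly chosen negative one. This makes it independent of any single decision threshold. A minimal rank-based (Mann-Whitney) computation is sketched below. It assumes the scores are any per-comparison classifier output, such as a posterior probability. It is not the authors' evaluation code, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney rank-sum statistic (ties get average ranks).

    labels: 1 for a positive (phishing) comparison, 0 for a negative one.
    """
    n_pos = sum(1 for t in labels if t == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of a tie group
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    positive_rank_sum = sum(r for r, t in zip(ranks, labels) if t == 1)
    return (positive_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The rank formulation runs in O(n log n). A naive loop over every positive/negative pair would be slow at the scale of 2,058 positive and 44,000 negative comparisons.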

17
Impact of Image Size on Computation Time

18
Conclusions
  • We proposed an image-based phishing detection
    technique with local features.
  • Our experimental results show
    • a phishing recognition rate of over 96%, and
    • less than 0.30 seconds per phishing
      identification on average.
  • Our experiments show that local features are more
    suitable than global information for phishing
    page detection.

19
Thank you!