spamalytics - PowerPoint PPT Presentation

Learn more at: https://cseweb.ucsd.edu
1
Spam: Why?
Chris Kanich, Christian Kreibich, Kirill
Levchenko, Brandon Enright, Vern Paxson, Geoffrey M.
Voelker, Stefan Savage


2
What is Computer security?
3
What is Computer security?
  • Most of computer science is about providing
    functionality
  • User Interface
  • Software Design
  • Algorithms
  • Operating Systems/Networking
  • Compilers/PL
  • Microarchitecture
  • VLSI/CAD
  • Computer security is not about functionality
  • It is about how the embodiment of functionality
    behaves in the presence of an adversary
  • Security mindset: think like a bad guy

4
My Background
  • Collaborative Center for Internet Epidemiology
    and Defenses (CCIED)
  • UCSD/ICSI group created in response to worm
    threat
  • Very well funded, many strong partners
  • Goals
  • Internet epidemiology: measuring/understanding
    attacks
  • Automated defenses: stopping outbreaks/attacks
  • Economic and legal issues: that other stuff

5
Many big successes
  • 50 papers, lots of tech transfer, big systems,
    etc.
  • Network Telescope
  • Passive monitor for > 1% of routable Internet
    address space
  • Potemkin & GQ Honeyfarms
  • Active VM honeypot servers on > 250k IP addresses
  • Earlybird
  • On-line learning of new worm signatures in < 1 ms

6
But depressing truth
  • We didn't stop Internet worms, let alone
    malware, let alone cybercrime, nor did
    anyone else.
  • At best, moved it around a bit.
  • By any meaningful metric the bad guys are
    winning
  • Mistake: looking at this solely as a technical
    problem

7
Key threat transformations of the 21st century
  • Efficient large-scale compromises
  • Internet communications model
  • Software homogeneity
  • User naïveté/fatigue
  • Centralized control
  • Makes compromised host a commodity good
  • Platform economy
  • Profit-driven applications
  • Commodity resources (IP, bandwidth, storage,
    CPU)
  • Unique resources (PII/credentials, CD-Keys,
    address book, etc.)

8
DDoS for sale
  • Emergence of economic engine for Internet crime
  • Spam, phishing, spyware, etc.
  • Fluid third party markets for illicit digital
    goods/services
  • Bots: $0.5/host, special orders, value added
    tiers
  • Cards, malware, exploits, DDoS, cashout, etc.

9
Botnet Spammer Rental Rates
  • 3.6 cents per bot week
  • 6 cents per bot week
  • 2.5 cents per bot week

> 20-30k always online SOCKs4, url is de-duped and updated
> every 10 minutes. $900/weekly, Samples will be sent on
> request. Monthly payments arranged at discount prices.

> $350.00/weekly - $1,000/monthly (USD)
> Type of service: Exclusive (One slot only)
> Always Online: 5,000 - 6,000
> Updated every 10 minutes

> $220.00/weekly - $800.00/monthly (USD)
> Type of service: Shared (4 slots)
> Always Online: 9,000 - 10,000
> Updated every 5 minutes
Bot Payloads
12
Key structural asymmetries
  • Defenders reactive, attackers proactive
  • Defenses public, attacker develops/tests in
    private
  • Arms race where best case for defender is to
    catch up
  • New defenses expensive, new attacks cheap
  • Defenses sunk costs/business model, attacker
    agile and not tied to particular technology
  • Low risk to attacker, high reward to attacker
  • Minimal deterrence
  • Functional anonymity on the Internet very hard
    to fix
  • Defenses hard to measure, attacks easy to measure
  • Few security metrics (no evidence-based
    security), attackers measure monetization which
    drives attack quality

13
Revisiting the problem
  • We tend to think about this in terms of technical
    means for securing computer systems
  • Most of $50-100B IT budget on cyber security is
    spent on securing the end host
  • AV, firewalls, IDS, encryption, etc.
  • Single most expensive front to secure
  • Single hardest front to secure
  • But are individual end hosts valuable to bad
    guys?
  • Maybe $1.50? Even less in bulk: not a pain point
  • What instead? Economically informed strategies
  • Identify and attack economic bottlenecks in value
    chain
  • This means understanding the return-on-investment
    for bad guys

14
Today: the spam problem
  • We tend to focus on the costs of spam
  • > 100 billion spam emails sent every day
    [Ironport]
  • > $1B in direct costs: anti-spam
    products/services [IDC]
  • Estimates of indirect costs (e.g., productivity)
    10-100x more
  • But spam exists only because it is profitable
  • Someone is buying! (though no one has admitted
    it to me)
  • Our goal
  • Understand underlying economic support for spam

15
History of the spam business model
  • Direct mail: origins in 19th century catalog
    business
  • Idea: send unsolicited advertisements to
    potential customers
  • Rough value proposition: Delivery cost <
    (Conversion rate x Marginal revenue)
  • Modern direct mail (> $60B in US)
  • Response rate: 2.5% (mean per DMA)
  • CPM (cost per thousand): $250 - $1000
  • Spam is qualitatively the same

16
but quantitatively different
  • Advantages of e-mail direct marketing
  • No printing cost
  • Legitimate delivery cost low (outsourced price
    $0.001/message [Get Response])
  • Dominated by production lead generation cost
    (i.e. mailing list)
  • But this is for spam as a legal marketing
    vehicle: a minority
  • Spam as marketing/bait for criminal enterprises
    (scams)
  • Mailing lists → ε (purchase/steal/harvest):
    < $10/M retail
  • Delivery cost → ε (botnet-based delivery):
    < $70/M retail

17
Aside: economic impact of anti-spam technology?
  • Suppose new technology filters out 99.9% of spam
    (at sites deploying it)
  • Little impact on delivery cost, mainly lowers
    conversion rate
  • Short term, compensate by sending more different
    e-mails or to more people
  • ... and pity the shmucks with the old 95% filter
  • Long term, incentive for spammer to bypass filter
  • Seems likely the outcome of anti-spam has been:
  • Increased amount of spam sent
  • Change in distribution of recipient pool
  • Unclear what profit impact is (deployment biases)
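
The short-term compensation argument above can be made concrete with a little arithmetic. This is a minimal sketch, assuming the figures on the slide are percentages (99.9% and 95%) and that the response rate among delivered messages does not change; the function name is made up for illustration.

```python
def volume_multiplier(old_block_rate: float, new_block_rate: float) -> float:
    """Factor by which sending volume must grow so that the same number
    of messages still gets past the filter."""
    old_pass = 1.0 - old_block_rate
    new_pass = 1.0 - new_block_rate
    if not (0.0 < old_pass <= 1.0 and 0.0 < new_pass <= 1.0):
        raise ValueError("block rates must be in [0, 1)")
    return old_pass / new_pass


# Sites upgrading from a 95% filter to a 99.9% filter.
upgrade_factor = volume_multiplier(0.95, 0.999)
# Sites going from no filter at all to a 99.9% filter.
fresh_factor = volume_multiplier(0.0, 0.999)

print(f"95% -> 99.9% filter: send about {upgrade_factor:.0f}x more")
print(f"no filter -> 99.9% filter: send about {fresh_factor:.0f}x more")
```

Since delivery cost per message is close to zero, a multiplier of this size is cheap for the sender, which is consistent with the slide's guess that better filtering mostly increased the amount of spam sent.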

18
Brief history of the spam arms race
  • Anti-spam action
  • Real-time IP blacklisting
  • Clean up open relays/proxies
  • Content-based learning
  • Site takedown
  • CAPTCHAs
  • Spammer response
  • Send via open relays/proxies
  • Delivery via compromised botnets
  • Content chaff, polymorphic spam generators, img
    spam
  • Fast-flux redirect and transparent proxies
  • CAPTCHA outsourcing, OCR-based breaking

19
Anatomy of a modern Pharma spam campaign
Courtesy Stuart Brown, modernlifeisrubbish.co.uk
20
Estimating spam profits
  • Recall key basic inequality
  • (Delivery Cost) < (Conversion Rate) x (Marginal
    Revenue)
  • We have some handle on two of these (e.g.,
    [Franklin07])
  • Delivery cost to send spam
  • Outsourced cost: retail purchase price < $70/M
    addrs
  • In-house cost: development/management labor
  • Marginal revenue
  • Average pharma sale of $100, affiliate
    commissions 50%
  • Conversion rate is fundamentally different
  • We don't know: estimates vary by orders of
    magnitude
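
Rearranging the inequality gives the conversion rate a campaign needs just to break even. The sketch below plugs in the slide's numbers, read as $70 per million addresses, a $100 average sale and a 50% affiliate commission; treating $70/M as the actual price (the slide gives it as an upper bound) is an assumption.

```python
def break_even_conversion(cost_per_million: float, revenue_per_sale: float) -> float:
    """Smallest conversion rate at which
    delivery cost per message == conversion rate x marginal revenue."""
    cost_per_message = cost_per_million / 1_000_000
    return cost_per_message / revenue_per_sale


average_sale = 100.0   # dollars, from the slide
commission = 0.50      # affiliate share, from the slide
rate = break_even_conversion(70.0, average_sale * commission)

print(f"break-even conversion rate: {rate:.2e}")
print(f"that is one sale per {1 / rate:,.0f} messages")
```

Any measured conversion rate can then be compared against this threshold to judge whether retail delivery pays for itself.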

21
The measurement conundrum
  • No accident that we lack good conversion measures
  • It's easy to measure spam from a receiver
    viewpoint
  • Which MTA sent it to me?
  • What does the content contain?
  • Where do the links go? etc.
  • But the key economic issue is only known by the
    sender
  • Conversion rate x marginal profit = revenue per
    msg sent
  • What to do?
  • Interview spammers? (0.00036) [Carmack03]
  • Guess? (millions of dollars a day) [Corman08]
  • Send lots of spam and see who clicks on links?
    (gold standard)

22
Botnet infiltration
  • Key idea: distributed C&C is a vulnerability
  • Botnet authors like de-centralized communications
    for scalability and resilience, but ...
  • ... to do so, they trust their bots to be good
    actors
  • If you can modify the right bots you can observe
    and influence actions of the botnet
  • Rest of today: preliminary results from a case
    study
  • Infiltrated Storm P2P botnet, instrumented 500M
    spams
  • Delivery rates (anti-spam impacts on delivery)
  • Click through (visits to spam advertized sites)
  • Conversions (purchases and purchase amounts)

Kanich, Kreibich, Levchenko, Enright, Paxson,
Voelker and Savage, "Spamalytics: an Empirical
Analysis of Spam Marketing Conversion", ACM CCS
2008
23
How this works in detail
  • Botnet Infiltration
  • Overview of the Storm peer-to-peer botnet
  • How does Storm work?
  • Mechanics of botnet spamming
  • How can Storm's C&C be instrumented?
  • Economic issues
  • Using a botnet for measurement
  • How to measure conversion via C&C interposition
  • Measuring spam delivery pipeline
  • What happens to spam from when a bot sends it
  • to when a user clicks purchase at a scam site?

24
Storm
  • Storm is a well-known peer-to-peer botnet
  • Storm has a hierarchical architecture
  • Workers perform tasks (send spam, launch DDoS
    attacks, etc.)
  • Proxies organize workers, connect to HTTP proxies
  • Master servers controlled directly by botmaster
  • Workers and proxies are compromised hosts (bots)
  • Use a Distributed Hash Table protocol (Overnet)
    for rendezvous
  • Roughly 20,000 active bots at any time in April
    [Kanich08]
  • Master servers run in bullet-proof hosting
    centers
  • Communicate with proxies and workers via command
    and control (C&C) protocol over TCP

Kanich, Levchenko, Enright, Voelker and Savage,
"The Heisenbot Uncertainty Problem: Challenges in
Separating Bots from Chaff", LEET 2008.
25
Storm architecture
Dr. Evil
Master servers
Proxy bots
Worker bots
26
Storm setup
  • New bots decide if they are proxies or workers
  • Inbound connectivity? Yes, proxy. No, worker.
  • Proxies advertise their status via encrypted
    variant of Overnet DHT P2P protocol
  • Master sends "Breath of Life" packet to new
    proxies to tell them IP address of master
    servers (RSA signature)
  • Allows master servers to be mobile if necessary
  • Workers use Overnet to find proxies (tricky:
    time-based key identifies request)
  • Workers send to proxy, proxy forwards to one of
    master servers in safe data center
  • Bottom line: imperfect, but remarkably
    sophisticated
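
The setup steps above can be sketched in a few lines. The role decision follows the slide directly; the rendezvous key is only an illustration of the general idea of a time-based key that workers and proxies can both derive without talking to each other. The slides do not give Storm's actual key algorithm, so the hash choice, the date-plus-slot input, and both function names are assumptions.

```python
import hashlib
from datetime import date


def choose_role(accepts_inbound: bool) -> str:
    """Inbound connectivity? Yes, proxy. No, worker."""
    return "proxy" if accepts_inbound else "worker"


def rendezvous_key(day: date, slot: int) -> str:
    """Hypothetical time-based DHT key: both sides hash the current date
    and a small slot number, so they arrive at the same key independently.
    This is NOT Storm's real algorithm, just the shape of the idea."""
    material = f"{day.isoformat()}:{slot}".encode("ascii")
    return hashlib.md5(material).hexdigest()  # 128-bit key as hex


today = date(2008, 4, 1)
print(choose_role(accepts_inbound=False))
print(rendezvous_key(today, slot=3))
```

Because the key depends only on the clock and a small search space, anyone who reverse-engineers the derivation can compute the same keys, which is part of what makes infiltration feasible.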

27
Storm spam campaigns
  • Workers request updates to send spam
    [Kreibich08]
  • Dictionaries: names, domains, URLs, etc.
  • Email templates for producing polymorphic spam
  • Macros instantiate fields: Fdomains from
    domains dict
  • Lists of target email addresses (batches of
    500-1000 at a time)
  • Workers immediately act on these updates
  • Create a unique message for each email address
  • Send the message to the target
  • Report the results (success, failure) back to
    proxies
  • Many campaign types
  • Self-propagation malware, pharmaceutical, stocks,
    phishing, ...

Kreibich, Kanich, Levchenko, Enright, Voelker,
Paxson and Savage, "On the Spam Campaign Trail",
LEET 2008.
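
To illustrate how templates, dictionaries and target lists combine into polymorphic spam, here is a minimal sketch. The `{name}` macro syntax, the dictionary names and the sample data are invented for the example and are not Storm's real template language.

```python
import random
import re

MACRO = re.compile(r"\{(\w+)\}")  # hypothetical macro syntax, not Storm's


def instantiate(template: str, dictionaries: dict, target: str,
                rng: random.Random) -> str:
    """Expand every macro: {to} becomes the target address, any other
    macro is replaced by a random entry from the dictionary of that name."""
    def expand(match: re.Match) -> str:
        name = match.group(1)
        if name == "to":
            return target
        return rng.choice(dictionaries[name])

    return MACRO.sub(expand, template)


dictionaries = {
    "names": ["alice", "bob", "carol"],
    "domains": ["example.com", "example.org"],
    "urls": ["http://shop.example/a", "http://shop.example/b"],
}
template = "From: {names}@{domains}\nTo: {to}\nSubject: hello\n\nVisit {urls}\n"

rng = random.Random(0)
targets = ["user1@example.net", "user2@example.net"]
messages = [instantiate(template, dictionaries, t, rng) for t in targets]
print(messages[0])
```

Each target gets its own message, and the headers and link vary from message to message, which is what defeats naive exact-match filtering.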
28
Storm templates
  • Example Storm spam template and
    instantiation

Macro expansion to insert target email address
29
Misc Storm stuff
  • Templates updated fairly frequently (but mainly
    just header polymorphism changes)
  • A few special campaigns
  • Test campaigns
  • Special mailing list campaigns (e.g. only
    Canadian recipients)
  • Storm nodes also harvest e-mail addresses
  • Grovel hard disk and send back foo@bar.baz
    strings
  • Re-integrated into master mailing list (some
    filtering)
  • Storm nodes also do DDoS, DNS fast flux proxying
    and Web proxying
  • Several different levels of message encoding, but
    nothing really hard to reverse yet

30
Storm in action
31
Interposition on Storm
  • We interpose on Storm command and control network
  • Reverse-engineered Storm protocols, communication
    scrambling, rendezvous mechanisms [Kanich08]
    [Kreibich08]
  • Run unmodified Storm proxy bots in VMs
  • Key issue: Real bot workers connect to our
    proxies
  • Insert rewriting proxies between workers &
    proxies
  • Transparently interpose on messages between Storm
    proxies and their associated Storm workers
  • Generic engine for rewriting traffic based on
    rules
  • Interpose to control site URLs and spam delivery
  • Which sites the spam advertises (replace urls in
    template links)
  • To whom spam gets sent (replace addrs in target
    list)
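
The two rewriting actions above (swap the advertised URLs, adjust the target list) can be sketched as a pure function over a decoded update. Representing an update as a dictionary with `urls` and `targets` keys is an assumption made for the example; the real C&C message format is not shown in the slides.

```python
def rewrite_update(update: dict, url_map: dict, control_addrs: list) -> dict:
    """Return a copy of a decoded spam update in which
    - dictionary URLs are replaced according to url_map, and
    - control addresses are appended to the target list.
    The input update is left untouched."""
    rewritten = dict(update)
    rewritten["urls"] = [url_map.get(u, u) for u in update.get("urls", [])]
    rewritten["targets"] = list(update.get("targets", [])) + list(control_addrs)
    return rewritten


update = {
    "urls": ["http://original-pharma.example/"],
    "targets": ["victim@example.net"],
}
url_map = {"http://original-pharma.example/": "http://our-mirror.example/"}
control = ["sink@measurement.example"]

result = rewrite_update(update, url_map, control)
print(result)
```

Keeping the rewrite a pure function of the message makes it easy to drive from a table of rules and to log exactly what was changed.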

32
Modifying template links
33
Measuring click-through
  • Create two sites that mirror actual sites in spam
  • E-card (self-propagation) and pharmaceutical
  • Replace dictionaries with URLs to our sites
  • E-card (self-prop) site
  • Link to benign executable that POSTs to our
    server
  • Log all POSTs to track downloads and executions
  • Pharma site
  • Log all accesses up through clicks on purchase
  • Track the contents of shopping carts
  • Strive for verisimilitude to remove bias (spam
    filtering)
  • Site content is similar, URLs have same format as
    originals,

34
Aside: having fun
35
Measuring Delivery
  • Create various test email accounts
  • At Web mail providers: Hotmail, Yahoo!, Gmail
  • Behind a commercial spam filtering appliance
  • As SMTP sinks: accept every message delivered
  • Put email addresses in Storm target delivery
    lists
  • Log all emails delivered to these addresses
  • Both labeled as spam (Junk E-mail) and in inbox

36
Ethical context
  • Consequentialism
  • First, do no harm (users no worse off than
    before)
  • We do not send any spam
  • Proxies are relays, worker bots send spam
  • We do not enable additional spam to be sent
  • Workers would have connected to some other proxy
  • We do not enable spam to be sent to additional
    users
  • Users are already on target lists, only add
    control addresses
  • Second, reduce harm where possible
  • Our pharma sites don't take credit card info
  • Our e-card sites don't export malicious code

37
Legal context
  • Warning: IANAL (we had lawyers involved though)
  • CAN-SPAM
  • Subject to strong definition of initiator: we
    don't fit it
  • ECPA
  • Our proxy is directly addressed by worker bots
    (party to communication carve out)
  • CFAA
  • We do not contact worker bots, they contact us
    (unauthorized access?)
  • We do not cause any information to be extracted
    or any fundamentally new activity to take place
  • Hard to find a good theory of damages
    (functionally indistinguishable --
    consequentialism)

38
But
  • In this kind of work there is little precedent
  • No agency to get permission, no way to get
    indemnity
  • Lawyers tend to say "I believe this activity has
    low risk of ..."
  • We communicate our activities to a lot of people
  • Security researchers in industry, academia
  • Affected network operators/registrars
  • Law enforcement
  • FTC

39
Aside: Spam is hard
  • Lots of operational complexities to a study like
    this
  • Net Ops notices huge Storm infestation
  • Address space cleanliness
  • Registrar issues
  • GoDaddy
  • TUCOWS
  • Abuse complaints
  • Spam site support e-mail
  • Anti-virus signatures
  • Law-enforcement

40
Spam conversion experiment
  • Experimented with Storm March 21 to April 15, 2008
  • Instrumented roughly 1.5% of Storm's total output

Campaign       Pharmacy       E-card (Postcard)   E-card (April Fool)
Worker bots    31,348         17,639              3,678
Emails         347,590,389    83,665,479          38,651,124
Duration       19 days        7 days              3 days
41
Spam pipeline

Stage          Pharmacy            Postcard           April Fool
Sent           347.5M              83.6M              40.1M
MTA            82.7M (24%)         21.1M (25%)        10.1M (25%)
Inbox          ---                 ---                ---
Visits         10,522 (0.003%)     3,827 (0.005%)     2,721 (0.005%)
Conversions    28 (0.000008%)      316 (0.00037%)     225 (0.00056%)

Pharma: 12M spam emails for one purchase
E-card: 1 in 10 visitors execute the binary
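
The percentages in the pipeline are each stage's count divided by the number of emails sent. The sketch below recomputes them from the counts on the slide; assigning the three groups of numbers to the pharmacy, postcard and April Fool campaigns (in that order) is an inference from the email totals in the experiment table, and the counts are the rounded values shown on the slide.

```python
# Counts as shown on the slide (rounded where the slide rounds them).
PIPELINE = {
    "pharmacy":   {"sent": 347_500_000, "mta": 82_700_000, "visits": 10_522, "conversions": 28},
    "postcard":   {"sent": 83_600_000,  "mta": 21_100_000, "visits": 3_827,  "conversions": 316},
    "april_fool": {"sent": 40_100_000,  "mta": 10_100_000, "visits": 2_721,  "conversions": 225},
}


def stage_rates(stages: dict) -> dict:
    """Each later stage as a percentage of emails sent."""
    sent = stages["sent"]
    return {name: 100.0 * count / sent
            for name, count in stages.items() if name != "sent"}


def emails_per_conversion(stages: dict) -> float:
    return stages["sent"] / stages["conversions"]


def visit_to_conversion(stages: dict) -> float:
    """Fraction of site visitors who convert (buy, or run the binary)."""
    return stages["conversions"] / stages["visits"]


for campaign, stages in PIPELINE.items():
    rates = stage_rates(stages)
    print(f"{campaign:10s} mta={rates['mta']:.0f}% "
          f"visits={rates['visits']:.3f}% "
          f"conversions={rates['conversions']:.6f}% "
          f"emails/conversion={emails_per_conversion(stages):,.0f}")
```

The pharmacy row works out to roughly 12.4 million emails per purchase, and the e-card rows to somewhat under one conversion per ten visits, in line with the two summary lines on the slide.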
42
The spammer's bottom line
  • Recall that we tracked the contents of shopping
    carts
  • Using the prices on the actual site, we can
    estimate the value of the purchases
  • 28 purchases for $2,731 over 25 days, or $100/day
    ($140 active)
  • We only interposed on a fraction of the workers
  • Connected to approx 1.5% of workers
  • Back-of-the-envelope (be very careful) →
    $7-10k/day for all, or $3M/year
  • With a 50% affiliate commission, $1.5M/year
    revenue
  • For self-propagation
  • Roughly 3-9k new bots/day
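
The back-of-the-envelope extrapolation above can be reproduced step by step. This sketch reads the slide's figures as $2,731 over 25 days, $140/day while campaigns were active, 1.5% of workers observed and a 50% commission; scaling linearly from the observed fraction to the whole botnet is the slide's own (explicitly cautious) assumption.

```python
OBSERVED_REVENUE = 2731.0      # dollars over the whole experiment
EXPERIMENT_DAYS = 25
ACTIVE_DAILY_REVENUE = 140.0   # dollars/day while campaigns were active
FRACTION_OBSERVED = 0.015      # share of workers we were connected to
COMMISSION = 0.50              # affiliate share kept by the spammer


def scale_to_botnet(daily_observed: float,
                    fraction: float = FRACTION_OBSERVED) -> float:
    """Linear extrapolation from the observed slice to all workers."""
    return daily_observed / fraction


low_daily = scale_to_botnet(OBSERVED_REVENUE / EXPERIMENT_DAYS)
high_daily = scale_to_botnet(ACTIVE_DAILY_REVENUE)
low_yearly, high_yearly = low_daily * 365, high_daily * 365

print(f"whole botnet: ${low_daily:,.0f} to ${high_daily:,.0f} per day")
print(f"per year:     ${low_yearly / 1e6:.1f}M to ${high_yearly / 1e6:.1f}M")
print(f"after commission: ${low_yearly * COMMISSION / 1e6:.1f}M "
      f"to ${high_yearly * COMMISSION / 1e6:.1f}M per year")
```

The two daily figures bracket the slide's $7-10k/day range, and the yearly numbers land near its rounded $3M and $1.5M.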

43
Summary
  • First measurement study of spam marketing
    conversion
  • Infiltrated Storm botnet, interposed on spam
    campaigns
  • Rewriting proxies take advantage of Storm
    reverse-engineering
  • Pharmaceutical spam
  • 1 in 12M conversion rate → $1.5M/yr net revenue
  • Profitability possibly tied to infrastructure
    integration
  • Sent via retail market, this campaign would not
    be profitable
  • Ergo: in-house delivery (Storm owners = pharma
    spammers)
  • Self-propagation spam
  • 250k spam emails per infection
  • Social engineering effective: one in ten visitors
    run executable
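
The claim that the pharmacy campaign would lose money at retail delivery prices can be checked with the deck's own numbers. This is a rough sketch, assuming the retail price is the $70 per million quoted earlier (the slides give it as an upper bound) and that the spammer keeps a 50% commission on the $2,731 of observed purchases.

```python
EMAILS_SENT = 347_590_389        # pharmacy campaign, from the experiment table
RETAIL_PRICE_PER_MILLION = 70.0  # dollars, upper bound quoted in the slides
GROSS_PURCHASES = 2731.0         # dollars of observed purchases
COMMISSION = 0.50

millions_sent = EMAILS_SENT / 1_000_000
retail_cost = millions_sent * RETAIL_PRICE_PER_MILLION
spammer_revenue = GROSS_PURCHASES * COMMISSION
# Delivery price per million at which the campaign would just break even.
break_even_price = spammer_revenue / millions_sent

print(f"retail delivery cost: ${retail_cost:,.0f}")
print(f"spammer revenue:      ${spammer_revenue:,.2f}")
print(f"break-even price:     ${break_even_price:.2f} per million")
```

Under these assumptions delivery would have to cost only a few dollars per million messages to break even, which supports the inference that the spammers ran the delivery infrastructure themselves.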

44
What are we doing now?
  • More analysis
  • Extending infiltration to 15 botnets:
    comparative analysis
  • Characteristic fingerprints of different
    spammers/crews
  • Characterizing supply chain relationships
  • Broadly order on-line Viagra, Rolexes, etc.
  • Cluster credit processor/merchant, mailing
    materials, etc
  • Cluster on manufacturing fingerprint (e.g., NIR
    spectroscopy)
  • Measuring monetization by purposely losing credit
    cards
  • Proactive defenses
  • Automated filter generation from templates
  • Automated classification of URLs
  • Automated vision-based detection of phishing pages

45
Security courses at UCSD
  • CSE107 Introduction to modern cryptography
  • CSE127 Computer Security
  • But
  • Security plays a role in virtually all of your
    courses

46
Questions?
Collaborative Center for Internet Epidemiology
and Defenses http://ccied.org
47
What's next: Value-chain characterization
  • Value-chain characterization
  • Empirical map establishing links between criminal
    groups and enablers
  • Affiliate programs, botnets, fast flux networks,
    registrars, payment processors, SEO/traffic
    partners, fulfillment/manufacturing
  • Data mining across huge data feeds we've built or
    established relationships for
  • Social network among criminal groups
  • Semantic Web mining

48
New: Fulfillment measurements
  • About to start purchasing wide range of
    spam-advertized products
  • Watches
  • Pharma
  • Traffic
  • Cluster purchases based on
  • Merchant and processor
  • Packaging (postmark, forensic analysis of paper)
  • Artifacts of manufacturing process (e.g., FT-NIR
    on drugs)

49
New: Bot-based spam filter generation
  • Observations
  • Modest number of bots send most spam
  • Virtually all bots use templates with simple
    rules to describe polymorphism
  • Templates + dictionaries = regex describing spam
    to be generated
  • If we can extract or infer these from the
    botnets, we have a perfect filter for all the
    spam generated by the botnet
  • Very specific filters, extremely low FP risk
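
The observation that a template plus its dictionaries amounts to a regular expression can be sketched directly. The `{name}` macro syntax and the sample template are invented for illustration; a real system would have to parse the botnet's own template language.

```python
import re


def template_to_regex(template: str, dictionaries: dict) -> "re.Pattern":
    """Turn a template with {macro} fields into a regex that matches
    exactly the messages the template can generate."""
    # re.split with a capturing group alternates literal text and macro names.
    parts = re.split(r"\{(\w+)\}", template)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pieces.append(re.escape(part))          # literal text
        else:
            words = sorted(dictionaries[part], key=len, reverse=True)
            pieces.append("(?:" + "|".join(re.escape(w) for w in words) + ")")
    return re.compile("".join(pieces))


dictionaries = {
    "greeting": ["Hello", "Hi"],
    "url": ["http://a.example/", "http://b.example/"],
}
spam_filter = template_to_regex("Subject: {greeting}, visit {url} today",
                                dictionaries)

spam = "Subject: Hi, visit http://b.example/ today"
ham = "Subject: Hi, visit http://c.example/ today"
print(bool(spam_filter.fullmatch(spam)), bool(spam_filter.fullmatch(ham)))
```

Because the pattern accepts only strings the template can actually produce, legitimate mail has to match every literal and dictionary entry to be flagged, which is why the false-positive risk is so low.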

50
Early results (last week)
  • 0 FP with 50 examples
  • 0 FN on Storm with 500 examples
  • Still tuning for other botnets
51
Spare slides
52
Removing crawlers/honeyclients
  • Anyone can send email to our accounts or visit
    our Web sites, potentially muddying the waters
  • Use various heuristics to validate the logs
  • Validate spam in mailboxes was sent by us
  • Spam from other campaigns, bounce messages, etc.
  • Subject line matches our campaign, URL from our
    dictionary
  • Validate Web accesses were by users in response
  • Sites with links in spam are immediately crawled
    by Google, A/V vendors, etc.
  • Special 3rd-level DNS names, special url encoding
  • Ignore hosts that access robots.txt, don't load
    javascript, don't load flash, don't load images,
    many malformed requests
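
The last heuristic above reads naturally as a per-host filter over the Web logs. The sketch below is one way to express it; the `Visitor` record, its field names and the malformed-request threshold are assumptions, since the slides do not say how the logs were structured or what counted as "many".

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    host: str
    fetched_robots_txt: bool
    loaded_javascript: bool
    loaded_flash: bool
    loaded_images: bool
    malformed_requests: int


def looks_like_user(v: Visitor, max_malformed: int = 3) -> bool:
    """Keep a host only if it behaves like a browser driven by a person.
    The threshold of 3 malformed requests is an arbitrary placeholder."""
    if v.fetched_robots_txt:
        return False
    if not (v.loaded_javascript and v.loaded_flash and v.loaded_images):
        return False
    return v.malformed_requests <= max_malformed


log = [
    Visitor("198.51.100.7", False, True, True, True, 0),    # ordinary browser
    Visitor("203.0.113.9", True, False, False, False, 0),   # crawler
    Visitor("192.0.2.44", False, True, True, True, 25),     # scanner
]
users = [v.host for v in log if looks_like_user(v)]
print(users)
```

Filtering at the host level rather than per request means a single crawler-like action is enough to discard all of that host's visits, which errs on the side of undercounting conversions.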

53
Pharma and e-card conversions
54
Who is targeted?
  • Top 20 domains
  • Many Web mail & broadband providers, but very
    long tail
  • Campaigns have nearly identical distributions
  • Same scammers, or target lists sold to multiple
    scammers
