spamalytics - PowerPoint PPT Presentation

About This Presentation

Title:

spamalytics

Description:

Spam: Why? Chris Kanich Christian Kreibich Kirill Levchenko Brandon Enright Vern Paxson Geoffrey M. Voelker Stefan Savage ... – PowerPoint PPT presentation

Slides: 52

Learn more at: https://cseweb.ucsd.edu

Transcript and Presenter's Notes

Title: spamalytics

1
Spam: Why?
Chris Kanich, Christian Kreibich, Kirill Levchenko, Brandon Enright, Vern Paxson, Geoffrey M. Voelker, Stefan Savage

2
What is Computer security?
3
What is Computer security?

Most of computer science is about providing
functionality
User Interface
Software Design
Algorithms
Operating Systems/Networking
Compilers/PL
Microarchitecture
VLSI/CAD
Computer security is not about functionality
It is about how the embodiment of functionality behaves in the presence of an adversary
Security mindset: think like a bad guy

4
My Background

Collaborative Center for Internet Epidemiology
and Defenses (CCIED)
UCSD/ICSI group created in response to worm
threat
Very well funded, many strong partners
Goals
Internet epidemiology: measuring/understanding attacks
Automated defenses: stopping outbreaks/attacks
Economic and legal issues: that other stuff

5
Many big successes

50 papers, lots of tech transfer, big systems, etc
Network Telescope
Passive monitor for > 1% of routable Internet address space
Potemkin & GQ Honeyfarms
Active VM honeypot servers on > 250k IP addresses
Earlybird
On-line learning of new worm signatures in < 1 ms

6
But depressing truth

We didn't stop Internet worms, let alone malware, let alone cybercrime; nor did anyone else.
At best, moved it around a bit.
By any meaningful metric the bad guys are winning
Mistake: looking at this solely as a technical problem

7
Key threat transformations of the 21st century

Efficient large-scale compromises
Internet communications model
Software homogeneity
User naïveté/fatigue
Centralized control
Makes compromised host a commodity good
Platform economy
Profit-driven applications
Commodity resources (IP, bandwidth, storage, CPU)
Unique resources (PII/credentials, CD-Keys, address book, etc)

8
DDoS for sale

Emergence of economic engine for Internet crime
Spam, phishing, spyware, etc
Fluid third-party markets for illicit digital goods/services
Bots: $0.5/host, special orders, value-added tiers
Cards, malware, exploits, DDoS, cashout, etc.

9
Botnet Spammer Rental Rates

3.6 cents per bot week
6 cents per bot week
2.5 cents per bot week

> 20-30k always online SOCKs4, url is de-duped and updated
> every 10 minutes. $900/weekly, Samples will be sent on
> request. Monthly payments arranged at discount prices.

> $350.00/weekly - $1,000/monthly (USD)
> Type of service: Exclusive (One slot only)
> Always Online: 5,000 - 6,000
> Updated every: 10 minutes

> $220.00/weekly - $800.00/monthly (USD)
> Type of service: Shared (4 slots)
> Always Online: 9,000 - 10,000
> Updated every: 5 minutes
Bot Payloads
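
The per-bot rates on this slide can be approximated by dividing each advertised weekly price by the advertised number of online bots. The sketch below does that arithmetic, assuming the prices are in USD and using the low and high ends of each advertised bot count; the function name and the ad labels are ours, not from the ads.

```python
def cents_per_bot_week(weekly_price_usd, bots_low, bots_high):
    """Return (min, max) rental cost in cents per bot per week."""
    # More bots for the same price means a lower per-bot cost.
    return (100.0 * weekly_price_usd / bots_high,
            100.0 * weekly_price_usd / bots_low)

# (weekly price in USD, advertised online bots: low, high), read off the three ads
ads = {
    "SOCKS4 list": (900.0, 20_000, 30_000),
    "exclusive": (350.0, 5_000, 6_000),
    "shared": (220.0, 9_000, 10_000),
}

rates = {name: cents_per_bot_week(*ad) for name, ad in ads.items()}

for name, (low, high) in rates.items():
    print(f"{name}: {low:.1f} to {high:.1f} cents per bot week")
```

The first two ranges bracket the quoted 3.6 and 6 cents; the shared listing comes out slightly under the quoted 2.5 cents, so the slide figure was presumably rounded or computed from a slightly different bot count.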
10
(No Transcript)
11
(No Transcript)
12
Key structural asymmetries

Defenders reactive, attackers proactive
Defenses public, attacker develops/tests in
private
Arms race where best case for defender is to
catch up
New defenses expensive, new attacks cheap
Defenses: sunk costs/business model; attacker agile and not tied to particular technology
Low risk to attacker, high reward to attacker
Minimal deterrence
Functional anonymity on the Internet is very hard to fix
Defenses hard to measure, attacks easy to measure
Few security metrics (no evidence-based security); attackers measure monetization, which drives attack quality

13
Revisiting the problem

We tend to think about this in terms of technical
means for securing computer systems
Most of the $50-100B IT budget for cyber security is spent on securing the end host
AV, firewalls, IDS, encryption, etc
Single most expensive front to secure
Single hardest front to secure
But are individual end hosts valuable to bad guys?
Maybe $1.50? Even less in bulk: not a pain point
What instead? Economically informed strategies
Identify and attack economic bottlenecks in value chain
This means understanding the return-on-investment for bad guys

14
Today: the spam problem

We tend to focus on the costs of spam
> 100 billion spam emails sent every day [Ironport]
> $1B in direct costs for anti-spam products/services [IDC]
Estimates of indirect costs (e.g., productivity) 10-100x more
But spam exists only because it is profitable
Someone is buying! (though no one has admitted it to me)
Our goal
Understand underlying economic support for spam

15
History of the spam business model

Direct mail: origins in 19th century catalog business
Idea: send unsolicited advertisements to potential customers
Rough value proposition: Delivery cost < (Conversion rate × Marginal revenue)
Modern direct mail (> $60B in US)
Response rate: 2.5% (mean per DMA)
CPM (cost per thousand): $250 - $1000
Spam is qualitatively the same
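
As a worked instance of the value proposition, the direct-mail figures above imply a minimum revenue per response for a mailing to pay for itself. This is a minimal sketch, assuming the stripped units are percent and USD; the function name is ours.

```python
def break_even_revenue(cost_per_thousand, conversion_rate):
    """Marginal revenue per conversion at which delivery cost equals
    conversion rate * marginal revenue (anything above this is profit)."""
    cost_per_message = cost_per_thousand / 1000.0
    return cost_per_message / conversion_rate

RESPONSE_RATE = 0.025  # 2.5% mean response rate

low = break_even_revenue(250.0, RESPONSE_RATE)    # cheapest CPM on the slide
high = break_even_revenue(1000.0, RESPONSE_RATE)  # most expensive CPM

print(f"break-even revenue per response: ${low:.2f} to ${high:.2f}")
```

Under these assumptions a legitimate direct mailer needs roughly $10 to $40 of marginal revenue per response just to cover delivery.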

16
... but quantitatively different

Advantages of e-mail direct marketing
No printing cost
Legitimate delivery cost low (outsourced price $0.001/message [Get Response])
Dominated by production & lead generation cost (i.e. mailing list)
But this is for spam as a legal marketing vehicle: a minority
Spam as marketing/bait for criminal enterprises (scams)
Mailing lists → ε (purchase/steal/harvest): < $10/M retail
Delivery cost → ε (botnet-based delivery): < $70/M retail

17
Aside: economic impact of anti-spam technology?

Suppose new technology filters out 99.9% of spam (at sites deploying it)
Little impact on delivery cost, mainly lowers conversion rate
Short term, compensate by sending more different e-mails or to more people
... and pity the shmucks with the old 95% filter
Long term, incentive for spammer to bypass filter
Seems likely the outcome of anti-spam has been:
Increased amount of spam sent
Change in distribution of recipient pool
Unclear what profit impact is (deployment biases)
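
The short-term compensation argument is simple arithmetic. The sketch below assumes conversions scale linearly with the number of messages that get past the filter, which is our simplification rather than a claim from the slides.

```python
def pass_rate(block_rate):
    """Fraction of spam that gets through a filter blocking block_rate."""
    return 1.0 - block_rate

def volume_multiplier(old_block_rate, new_block_rate):
    """Factor by which a sender must raise volume so that the same number
    of messages gets past the new filter as got past the old one."""
    return pass_rate(old_block_rate) / pass_rate(new_block_rate)

multiplier = volume_multiplier(0.95, 0.999)
print(f"send about {multiplier:.0f}x more to offset the new filter")

# A site still on the old 95% filter passes the same fraction of a
# 50x larger stream, so its users see about 50x more spam.
spam_growth_at_old_site = multiplier
```

Because botnet delivery cost is close to zero, a 50x increase in volume is cheap for the sender, which is why the expected outcome is more spam sent rather than less.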

18
Brief history of the spam arms race

Anti-spam action                 Spammer response
Real-time IP blacklisting        Send via open relays/proxies
Clean up open relays/proxies     Delivery via compromised botnets
Content-based learning           Content chaff, polymorphic spam generators, image spam
Site takedown                    Fast-flux redirect and transparent proxies
CAPTCHAs                         CAPTCHA outsourcing, OCR-based breaking

19
Anatomy of a modern Pharma spam campaign
Courtesy of Stuart Brown, modernlifeisrubbish.co.uk
20
Estimating spam profits

Recall key basic inequality:
(Delivery cost) < (Conversion rate) × (Marginal revenue)
We have some handle on two of these (e.g., [Franklin07])
Delivery cost to send spam
Outsourced cost: retail purchase price < $70/M addrs
In-house cost: development/management labor
Marginal revenue
Average pharma sale of $100, affiliate commissions 50%
Conversion rate is fundamentally different
We don't know: estimates vary by orders of magnitude
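
With two of the three terms pinned down, the inequality can be turned around to give the conversion rate a campaign needs in order to break even. A minimal sketch, assuming a $70 per million retail delivery price, a $100 average sale and a 50% affiliate commission (our reading of the stripped symbols on the slide):

```python
DELIVERY_COST_PER_MILLION = 70.0  # USD, retail upper bound
AVERAGE_SALE = 100.0              # USD per pharma order
AFFILIATE_COMMISSION = 0.50       # spammer's share of each sale

def break_even_conversion_rate(cost_per_million, revenue_per_sale):
    """Conversion rate at which delivery cost equals expected revenue."""
    cost_per_message = cost_per_million / 1_000_000
    return cost_per_message / revenue_per_sale

revenue_per_sale = AVERAGE_SALE * AFFILIATE_COMMISSION
rate = break_even_conversion_rate(DELIVERY_COST_PER_MILLION, revenue_per_sale)
messages_per_sale = 1.0 / rate

print(f"break-even conversion rate: {rate:.2e}")
print(f"that is one sale per {messages_per_sale:,.0f} messages")
```

Under these assumptions a retail-priced campaign needs about one sale per 714,000 messages; whether real campaigns reach that is exactly the unknown the rest of the talk measures.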

21
The measurement conundrum

No accident that we lack good conversion measures
It's easy to measure spam from a receiver viewpoint
Which MTA sent it to me?
What does the content contain?
Where do the links go? etc
But the key economic issue is only known by the sender
Conversion rate × marginal profit = revenue per msg sent
What to do?
Interview spammers? (0.00036) [Carmack03]
Guess? (millions of dollars a day) [Corman08]
Send lots of spam and see who clicks on links? (gold standard)

22
Botnet infiltration

Key idea: distributed C&C is a vulnerability
Botnet authors like de-centralized communications for scalability and resilience, but to do so, they trust their bots to be good actors
If you can modify the right bots you can observe and influence actions of the botnet
Rest of today: preliminary results from a case study
Infiltrated Storm P2P botnet, instrumented 500M spams
Delivery rates (anti-spam impacts on delivery)
Click through (visits to spam-advertised sites)
Conversions (purchases and purchase amounts)

Kanich, Kreibich, Levchenko, Enright, Paxson, Voelker and Savage, Spamalytics: an Empirical Analysis of Spam Marketing Conversion, ACM CCS 2008
23
How this works in detail

Botnet Infiltration
Overview of the Storm peer-to-peer botnet
How does Storm work?
Mechanics of botnet spamming
How can Storm's C&C be instrumented?
Economic issues
Using a botnet for measurement
How to measure conversion via C&C interposition
Measuring spam delivery pipeline
What happens to spam from when a bot sends it to when a user clicks purchase at a scam site?

24
Storm

Storm is a well-known peer-to-peer botnet
Storm has a hierarchical architecture
Workers perform tasks (send spam, launch DDoS
attacks, etc.)
Proxies organize workers, connect to HTTP proxies
Master servers controlled directly by botmaster
Workers and proxies are compromised hosts (bots)
Use a Distributed Hash Table protocol (Overnet)
for rendezvous
Roughly 20,000 active bots at any time in April [Kanich08]
Master servers run in bullet-proof hosting centers
Communicate with proxies and workers via command and control (C&C) protocol over TCP

Kanich, Levchenko, Enright, Voelker and Savage, The Heisenbot Uncertainty Problem: Challenges in Separating Bots from Chaff, LEET 2008.
25
Storm architecture
Dr. Evil
Master servers
Proxy bots
Worker bots
26
Storm setup

New bots decide if they are proxies or workers
Inbound connectivity? Yes, proxy. No, worker.
Proxies advertise their status via encrypted
variant of Overnet DHT P2P protocol
Master sends "Breath of Life" packet to new proxies to tell them IP address of master servers (RSA signature)
Allows master servers to be mobile if necessary
Workers use Overnet to find proxies (tricky: time-based key identifies request)
Workers send to proxy, proxy forwards to one of master servers in safe data center
Bottom line: imperfect, but remarkably sophisticated

27
Storm spam campaigns

Workers request updates to send spam [Kreibich08]
Dictionaries: names, domains, URLs, etc.
Email templates for producing polymorphic spam
Macros instantiate fields: Fdomains from domains dict
Lists of target email addresses (batches of 500-1000 at a time)
Workers immediately act on these updates
Create a unique message for each email address
Send the message to the target
Report the results (success, failure) back to proxies
Many campaign types
Self-propagation malware, pharmaceutical, stocks, phishing, ...

Kreibich, Kanich, Levchenko, Enright, Voelker, Paxson and Savage, On the Spam Campaign Trail, LEET 2008.
28
Storm templates

Example Storm spam template and
instantiation

Macro expansion to insert target email address
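
The template figure did not survive in this transcript, so here is a small stand-in for the idea of macro expansion: a template with placeholders is filled per recipient from dictionaries plus the target address. The `{{dict:NAME}}` and `{{target}}` syntax, the dictionary contents and the function name are all hypothetical; Storm's real macro language is not reproduced here.

```python
import random
import re

# Hypothetical macro syntax: {{target}} or {{dict:NAME}}
MACRO = re.compile(r"\{\{(\w+)(?::(\w+))?\}\}")

def instantiate(template, dictionaries, target, rng):
    """Expand every macro in the template for one recipient."""
    def expand(match):
        kind, arg = match.group(1), match.group(2)
        if kind == "target":
            return target
        if kind == "dict":
            # Pick a random entry so each message differs (polymorphism).
            return rng.choice(dictionaries[arg])
        raise ValueError(f"unknown macro: {match.group(0)}")
    return MACRO.sub(expand, template)

dictionaries = {
    "names": ["alice", "bob", "carol"],
    "domains": ["example.com", "example.net"],
}
template = (
    "From: {{dict:names}}@{{dict:domains}}\n"
    "To: {{target}}\n"
    "Subject: hello\n"
)

rng = random.Random(0)  # seeded only to make the demo repeatable
messages = [instantiate(template, dictionaries, addr, rng)
            for addr in ["u1@example.org", "u2@example.org"]]
print(messages[0])
```

Each worker would repeat this for every address in its batch, which is why two recipients of the same campaign rarely see byte-identical messages.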
29
Misc Storm stuff

Templates updated fairly frequently (but mainly
just header polymorphism changes)
A few special campaigns
Test campaigns
Special mailing list campaigns (e.g. only Canadian recipients)
Storm nodes also harvest e-mail addresses
Grovel hard disk and send back foo@bar.baz strings
Re-integrated into master mailing list (some filtering)
Storm nodes also do DDoS, DNS fast flux proxying
and Web proxying
Several different levels of message encoding, but
nothing really hard to reverse yet

30
Storm in action
31
Interposition on Storm

We interpose on Storm command and control network
Reverse-engineered Storm protocols, communication scrambling, rendezvous mechanisms [Kanich08] [Kreibich08]
Run unmodified Storm proxy bots in VMs
Key issue: Real bot workers connect to our proxies
Insert rewriting proxies between workers & proxies
Transparently interpose on messages between Storm proxies and their associated Storm workers
Generic engine for rewriting traffic based on rules
Interpose to control site URLs and spam delivery
Which sites the spam advertises (replace URLs in template links)
To whom spam gets sent (replace addrs in target list)
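
A rule-based rewriting engine of the kind described above can be pictured as a chain of functions applied to each proxy-to-worker message. The sketch below is illustrative only: the message representation (plain dicts with `type`, `name` and `entries`), the rule constructors and the example addresses are all our assumptions, and Storm's actual wire format is not modeled.

```python
from itertools import cycle

def make_url_rule(our_urls):
    """Rule: swap every URL in the link dictionary for one of our URLs."""
    pool = cycle(our_urls)
    def rule(message):
        if message["type"] == "dictionary" and message["name"] == "links":
            return {**message, "entries": [next(pool) for _ in message["entries"]]}
        return message
    return rule

def make_target_rule(control_addresses):
    """Rule: append our control addresses to each target list."""
    def rule(message):
        if message["type"] == "targets":
            return {**message, "entries": message["entries"] + list(control_addresses)}
        return message
    return rule

def interpose(message, rules):
    """Pass one message through every rewriting rule in order."""
    for rule in rules:
        message = rule(message)
    return message

rules = [
    make_url_rule(["http://site-a.example/", "http://site-b.example/"]),
    make_target_rule(["control1@sink.example"]),
]

links = interpose({"type": "dictionary", "name": "links",
                   "entries": ["http://x.example/1", "http://x.example/2",
                               "http://x.example/3"]}, rules)
targets = interpose({"type": "targets",
                     "entries": ["someone@example.org"]}, rules)
other = interpose({"type": "report", "entries": []}, rules)
```

Keeping each rule a small function that returns the message unchanged when it does not apply makes it easy to add or remove rewrites without touching the relay loop.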

32
Modifying template links
33
Measuring click-through

Create two sites that mirror actual sites in spam
E-card (self-propagation) and pharmaceutical
Replace dictionaries with URLs to our sites
E-card (self-prop) site
Link to benign executable that POSTs to our
server
Log all POSTs to track downloads and executions
Pharma site
Log all accesses up through clicks on purchase
Track the contents of shopping carts
Strive for verisimilitude to remove bias (spam
filtering)
Site content is similar, URLs have same format as originals, ...

34
Aside: having fun
35
Measuring Delivery

Create various test email accounts
At Web mail providers: Hotmail, Yahoo!, Gmail
Behind a commercial spam filtering appliance
As SMTP sinks: accept every message delivered
Put email addresses in Storm target delivery lists
Log all emails delivered to these addresses
Both labeled as spam ("Junk E-mail") and in inbox

36
Ethical context

Consequentialism
First, do no harm (users no worse off than
before)
We do not send any spam
Proxies are relays, worker bots send spam
We do not enable additional spam to be sent
Workers would have connected to some other proxy
We do not enable spam to be sent to additional
users
Users are already on target lists, only add
control addresses
Second, reduce harm where possible
Our pharma sites don't take credit card info
Our e-card sites don't export malicious code

37
Legal context

Warning: IANAL (we had lawyers involved though)
CAN-SPAM
Subject to strong definition of initiator: we don't fit it
ECPA
Our proxy is directly addressed by worker bots ("party to communication" carve out)
CFAA
We do not contact worker bots, they contact us (unauthorized access?)
We do not cause any information to be extracted or any fundamentally new activity to take place
Hard to find a good theory of damages (functionally indistinguishable: consequentialism)

38
But ...

In this kind of work there is little precedent
No agency to get permission; no way to get indemnity
Lawyers tend to say "I believe this activity has low risk of ..."
We communicate our activities to a lot of people
Security researchers in industry, academia
Affected network operators/registrars
Law enforcement
FTC

39
Aside: Spam is hard

Lots of operational complexities to a study like
this
Net Ops notices huge Storm infestation
Address space cleanliness
Registrar issues
GoDaddy
TUCOWS
Abuse complaints
Spam site support e-mail
Anti-virus signatures
Law-enforcement

40
Spam conversion experiment

Experimented with Storm March 21 - April 15, 2008
Instrumented roughly 1.5% of Storm's total output

Campaign       Pharmacy       E-card: Postcard   E-card: April Fool
Worker bots    31,348         17,639             3,678
Emails         347,590,389    83,665,479         38,651,124
Duration       19 days        7 days             3 days
41
Spam pipeline
Stages: Sent, MTA, Inbox, Visits, Conversions (Inbox: ---)

Campaign      Sent      MTA           Visits            Conversions
Pharmacy      347.5M    82.7M (24%)   10,522 (0.003%)   28 (0.000008%)
Postcard      83.6M     21.1M (25%)   3,827 (0.005%)    316 (0.00037%)
April Fool    40.1M     10.1M (25%)   2,721 (0.005%)    225 (0.00056%)

Pharma: 12M spam emails for one purchase
E-card: 1 in 10 visitors execute the binary
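
The two summary lines under the pipeline follow from the stage counts. The sketch below recomputes them, assuming the three groups of four numbers on this slide belong to the pharmacy, postcard and April Fool campaigns in that order; the dictionary layout and helper names are ours.

```python
pipeline = {
    "pharmacy": {"sent": 347_500_000, "mta": 82_700_000,
                 "visits": 10_522, "conversions": 28},
    "postcard": {"sent": 83_600_000, "mta": 21_100_000,
                 "visits": 3_827, "conversions": 316},
    "april_fool": {"sent": 40_100_000, "mta": 10_100_000,
                   "visits": 2_721, "conversions": 225},
}

def emails_per_conversion(stages):
    """How many sent messages it took to get one conversion."""
    return stages["sent"] / stages["conversions"]

def visitors_per_conversion(stages):
    """How many site visitors it took to get one conversion."""
    return stages["visits"] / stages["conversions"]

def delivered_fraction(stages):
    """Share of sent messages accepted by the receiving MTA."""
    return stages["mta"] / stages["sent"]

for name, stages in pipeline.items():
    print(f"{name}: {emails_per_conversion(stages):,.0f} emails per conversion, "
          f"{visitors_per_conversion(stages):.1f} visitors per conversion, "
          f"{delivered_fraction(stages):.0%} accepted by MTAs")
```

For the e-card campaigns a "conversion" is an execution of the benign binary, so the visitors-per-conversion figure is the number behind the "1 in 10" line (the computed ratio is closer to 1 in 12).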
42
The spammer's bottom line

Recall that we tracked the contents of shopping carts
Using the prices on the actual site, we can estimate the value of the purchases
28 purchases for $2,731 over 25 days, or $100/day ($140 active)
We only interposed on a fraction of the workers
Connected to approx 1.5% of workers
Back-of-the-envelope (be very careful) → $7-10k/day for all, or $3M/year
With a 50% affiliate commission, $1.5M/year revenue
For self-propagation
Roughly 3-9k new bots/day
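
The back-of-the-envelope extrapolation can be reproduced in a few lines. This sketch assumes USD, a 1.5% share of workers and a 50% commission; the 19 "active" days are our assumption, taken from the pharmacy campaign duration in the experiment table, and the result should be read with the same caution the slide asks for.

```python
PURCHASES_USD = 2731.0   # total value of the 28 observed purchases
ELAPSED_DAYS = 25        # whole experiment window
ACTIVE_DAYS = 19         # assumption: days the pharmacy campaign actually ran
WORKER_FRACTION = 0.015  # share of Storm workers reached by the study
COMMISSION = 0.50        # affiliate's share of each sale

def extrapolate(observed_per_day):
    """Scale observed daily sales up to the whole botnet.

    Returns (sales per day, sales per year, commission per year)."""
    per_day_all = observed_per_day / WORKER_FRACTION
    per_year_all = per_day_all * 365
    return per_day_all, per_year_all, per_year_all * COMMISSION

low = extrapolate(PURCHASES_USD / ELAPSED_DAYS)
high = extrapolate(PURCHASES_USD / ACTIVE_DAYS)

print(f"whole botnet: ${low[0]:,.0f} to ${high[0]:,.0f} per day")
print(f"sales per year: ${low[1]:,.0f} to ${high[1]:,.0f}")
print(f"commission per year: ${low[2]:,.0f} to ${high[2]:,.0f}")
```

Both ends of the range land inside the $7-10k/day bracket, and the yearly figures straddle the rounded $3M and $1.5M quoted on the slide.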

43
Summary

First measurement study of spam marketing
conversion
Infiltrated Storm botnet, interposed on spam
campaigns
Rewriting proxies take advantage of Storm
reverse-engineering
Pharmaceutical spam
1 in 12M conversion rate → $1.5M/yr net revenue
Profitability possibly tied to infrastructure integration
Sent via retail market, this campaign would not be profitable
Ergo: in-house delivery (Storm owners = pharma spammers)
Self-propagation spam
250k spam emails per infection
Social engineering effective: one in ten visitors run executable
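
The claim that the pharmacy campaign would lose money at retail delivery prices can be checked against the observed numbers. A rough sketch, assuming a $70 per million retail price (the upper bound quoted earlier in the talk), USD sale values and a 50% commission; it compares only the instrumented slice of the campaign with the cost of sending that slice.

```python
EMAILS_SENT = 347_590_389         # pharmacy campaign emails that were instrumented
RETAIL_PRICE_PER_MILLION = 70.0   # USD, assumed retail delivery price
OBSERVED_SALES_USD = 2731.0       # value of the 28 observed purchases
COMMISSION = 0.50                 # affiliate's share of each sale

delivery_cost = EMAILS_SENT / 1_000_000 * RETAIL_PRICE_PER_MILLION
spammer_revenue = OBSERVED_SALES_USD * COMMISSION
profit = spammer_revenue - delivery_cost

# Price per million messages at which this campaign would just break even.
break_even_price = spammer_revenue / (EMAILS_SENT / 1_000_000)

print(f"retail delivery cost: ${delivery_cost:,.0f}")
print(f"commission earned:    ${spammer_revenue:,.0f}")
print(f"profit at retail:     ${profit:,.0f}")
print(f"break-even price:     ${break_even_price:.2f} per million")
```

Under these assumptions delivery would have to cost well under a tenth of the retail price for the campaign to pay off, which is the arithmetic behind the in-house delivery conclusion.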

44
What are we doing now?

More analysis
Extending infiltration to 15 botnets: comparative analysis
Characteristic fingerprints of different spammers/crews
Characterizing supply chain relationships
Broadly order on-line Viagra, Rolexes, etc
Cluster credit processor/merchant, mailing materials, etc
Cluster on manufacturing fingerprint (e.g., NIR spectroscopy)
Measuring monetization by purposely losing credit cards
Proactive defenses
Automated filter generation from templates
Automated classification of URLs
Automated vision-based detection of phishing pages

45
Security courses at UCSD

CSE107: Introduction to modern cryptography
CSE127: Computer Security
But ...
Security plays a role in virtually all of your
courses

46
Questions?
Collaborative Center for Internet Epidemiology and Defenses
http://ccied.org
47
What's next: Value-chain characterization

Value-chain characterization
Empirical map establishing links between criminal
groups and enablers
Affiliate programs, botnets, fast flux networks,
registrars, payment processors, SEO/traffic
partners, fulfillment/manufacturing
Data mining across huge data feeds we've built or established relationships for
Social network among criminal groups
Semantic Web mining

48
New: Fulfillment measurements

About to start purchasing wide range of spam-advertised products
Watches
Pharma
Traffic
Cluster purchases based on
Merchant and processor
Packaging (postmark, forensic analysis of paper)
Artifacts of manufacturing process (e.g., FT-NIR
on drugs)

49
New: Bot-based spam filter generation

Observations
Modest number of bots send most spam
Virtually all bots use templates with simple rules to describe polymorphism
Templates + dictionaries = regex describing spam to be generated
If we can extract or infer these from the botnets, we have a perfect filter for all the spam generated by the botnet
Very specific filters, extremely low FP risk
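
To make the "templates + dictionaries = regex" observation concrete, the sketch below compiles a template into a regular expression that matches every message the template can produce and nothing else. The `{name}` placeholder syntax, the toy template and the function name are hypothetical; this shows the idea, not the filter generator described in the talk.

```python
import re

# Hypothetical macro syntax: {dictionary_name}
MACRO = re.compile(r"\{(\w+)\}")

def template_to_regex(template, dictionaries):
    """Compile a template plus its dictionaries into one regex."""
    parts = []
    pos = 0
    for match in MACRO.finditer(template):
        # Literal text between macros must match exactly.
        parts.append(re.escape(template[pos:match.start()]))
        # A macro matches any one entry of its dictionary.
        words = sorted(dictionaries[match.group(1)], key=len, reverse=True)
        parts.append("(?:" + "|".join(re.escape(w) for w in words) + ")")
        pos = match.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))

dictionaries = {
    "greeting": ["Hi", "Hello"],
    "product": ["pills", "watches"],
}
signature = template_to_regex("Subject: {greeting}, try {product} today", dictionaries)

is_spam = signature.fullmatch("Subject: Hello, try pills today") is not None
is_ham = signature.fullmatch("Subject: Hello, try lunch today") is not None
```

Because the pattern accepts only strings the template can generate, false positives require a legitimate message to coincide exactly with a spam instance, which is why such filters carry very low FP risk.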

50
Early results (last week)
0 FP with 50 examples
0 FN on Storm with 500 examples
Still tuning for other botnets
51
Spare slides
52
Removing crawlers/honeyclients

Anyone can send email to our accounts or visit
our Web sites, potentially muddying the waters
Use various heuristics to validate the logs
Validate spam in mailboxes was sent by us
Spam from other campaigns, bounce messages, etc.
Subject line matches our campaign, URL from our
dictionary
Validate Web accesses were by users in response
Sites with links in spam are immediately crawled
by Google, A/V vendors, etc.
Special 3rd-level DNS names, special URL encoding
Ignore hosts that access robots.txt, don't load JavaScript, don't load Flash, don't load images, or make many malformed requests
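
The host-level heuristics on this slide can be sketched as a predicate over per-host access summaries. The record fields, the malformed-request threshold and the example hosts below are our assumptions for illustration; the slide does not give the exact rules used.

```python
from dataclasses import dataclass

@dataclass
class HostSummary:
    host: str
    fetched_robots_txt: bool
    loaded_javascript: bool
    loaded_flash: bool
    loaded_images: bool
    malformed_requests: int
    total_requests: int

def looks_like_user(summary, max_malformed_fraction=0.5):
    """Keep a host only if it behaves like a browser driven by a person."""
    if summary.fetched_robots_txt:
        return False  # crawlers ask for robots.txt, browsers do not
    if not (summary.loaded_javascript and summary.loaded_flash
            and summary.loaded_images):
        return False  # honeyclients and scanners often skip page assets
    malformed_fraction = summary.malformed_requests / max(summary.total_requests, 1)
    return malformed_fraction <= max_malformed_fraction

hosts = [
    HostSummary("198.51.100.7", False, True, True, True, 0, 12),
    HostSummary("203.0.113.9", True, False, False, False, 0, 3),
    HostSummary("192.0.2.44", False, True, True, True, 9, 10),
]
validated = [h.host for h in hosts if looks_like_user(h)]
print(validated)
```

Filtering at the host level rather than per request keeps one stray malformed request from discarding a real visitor while still dropping automated clients wholesale.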