Weaponizing Noam Chomsky: Symbols and Grammars Are Fun - PowerPoint PPT Presentation

About This Presentation
Title:

Weaponizing Noam Chomsky: Symbols and Grammars Are Fun

Description:

Very similar messages can be encapsulated in very different ways ... Symbols appear in very distinct patterns that are more reminiscent of machine code than text. ... – PowerPoint PPT presentation

Number of Views:167
Avg rating:3.0/5.0
Slides: 93
Provided by: vdak
Learn more at: http://www.doxpara.com
Transcript and Presenter's Notes

Title: Weaponizing Noam Chomsky: Symbols and Grammars Are Fun


1
Weaponizing Noam Chomsky: Symbols and Grammars
Are Fun
  • Dan Kaminsky
  • Director Of Penetration Testing

2
Introduction
  • Many physicists would agree that, had it not been
    for congestion control, the evaluation of web
    browsers might never have occurred. In fact, few
    hackers worldwide would disagree with the
    essential unification of voice-over-IP and public
    private key pair. In order to solve this riddle,
    we confirm that SMPs can be made stochastic,
    cacheable, and interposable.
  • Rooter: A Methodology for the Typical Unification
    of Access Points and Redundancy

3
That was BS.
  • That also got accepted into a con.
  • Automatically generated from a context free
    grammar
  • I've been working too hard all these years
  • "Be quiet, or I will replace you with a very
    small shell script"
  • This talk is a bit of a remix
  • Patterns and symbols have been interesting me as of
    late
  • Automatic determination of both is difficult,
    interesting, and unsolved
  • Integration into human symbolic systems promises
    particularly interesting results
  • So we're going to explore a bit.

4
Language Is Cool
  • Language: A protocol for the transmission of
    concepts and intentions between humans
  • Documentation is not available
  • Documentation does not really work
  • Learned through exposure and use
  • Significant amount of internal structure,
    redundancy, and consistency
  • Who makes language?
  • Kids.
  • Adults coin words here and there, but when
    they're forced to invent a common language to get
    things done, it's called a Pidgin, and it's
    terrible
  • The kids hear it, and invent a Creole: a merged
    language of significantly greater accuracy and
    depth
  • Children make languages
  • Adults make working languages
  • Programmers make barely working languages

5
Programmers Talk Funny
  • Fundamentally two languages that programmers must
    use
  • Code to Human: User Interface Design
  • Code to Code: File and Network Protocol
  • UI is a protocol.
  • This is obvious in retrospect.
  • There are two things this talk hopes to do
  • Correct some of the Code-Human protocols that
    are out there
  • Use human strategies to analyze Code to Code
    communications
  • Learning a protocol is learning a language.
    Humans do not learn languages quickly, and thus
    we're resource bound on fuzzer development
  • It's 2007: most parsers remain unfuzzed (and
    thus just waiting to be exploited)

6
Weaponizing Noam?
  • An early inference procedure was described by
    Chomsky and Miller (1957a), as reported in
    Solomonoff (1959). Chomsky proposed a method for
    detecting loops in finite state languages. The
    approach requires a set of valid sentences, and
    an oracle that determines whether a sentence is
    in the language. The algorithm proceeds by
    deleting part of a valid sentence and asking the
    oracle whether the sentence is still valid. If it
    is, the deleted part is reinserted into the
    sequence and repeated, so that it appears twice.
    If the sentence is still in the language, a cycle
    has been detected.
  • Inferring Sequential Structure, Craig
    Nevill-Manning, 1996
  • This couldn't POSSIBLY be useful for building a
    structure for a dumb fuzzer to operate against.
  • Instead of seeing if the parser crashes, just see
    if it considers the input valid
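
The delete-then-double procedure quoted above can be sketched in a few lines. This is a rough illustration, not the original algorithm's code: the regular expression stands in for a real parser acting as the oracle, and the function names are made up for this example. It tries every substring, so it costs O(n^2) oracle calls per sentence.

```python
import re
from typing import Callable, List, Tuple


def find_loops(sentence: str, is_valid: Callable[[str], bool]) -> List[Tuple[int, int]]:
    """Return (start, end) spans of `sentence` that behave like loop bodies."""
    loops = []
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, n + 1):
            part = sentence[start:end]
            deleted = sentence[:start] + sentence[end:]
            doubled = sentence[:start] + part + part + sentence[end:]
            # A cycle is reported only if the sentence stays valid both
            # with the part removed and with the part appearing twice.
            if is_valid(deleted) and is_valid(doubled):
                loops.append((start, end))
    return loops


def toy_oracle(s: str) -> bool:
    """Hypothetical oracle: a toy 'parser' that accepts a(bc)*d."""
    return re.fullmatch(r"a(bc)*d", s) is not None


print(find_loops("abcbcd", toy_oracle))
```

For fuzzing, the oracle would be "did the target parser accept this input" rather than "did it crash", which is the point of the last bullet.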

7
Topics Of Discussion
  • Further Explorations in Cryptomnemonics
  • Using Names and Syllables for password
    representation
  • Sequitur-XML: Merging automated structure
    discovery with the standard architecture for
    structure representation
  • which turned out to be quite nice for controlled
    structure destruction
  • Exploring Dotplots
  • Building a GUI
  • Exploring other domains

8
Intro To Symbol Sets
  • Machine Symbols
  • Data (AA, BB, CC)
  • Code (a(), b(), c())
  • Formats (All, Bad, Code)
  • Human Symbols
  • Letters (A, B, C)
  • Glyphs (A, B, C)
  • Syllables (Ah, Bee, See)
  • Words (Amazing, Bear, Clear)
  • Native Names (Alice, Bob, Charlie)
  • Things (Axe, Bone, Chimpanzee)
  • Actions (Ask, Buy, Compute)
  • Colors (Aquamarine, Blue, Chartreuse)
  • Machines can use formats, but their native format
    is raw bits
  • Humans have no concept of raw bits; everything
    must be contextual
  • Long history in mnemonics of mapping arbitrary
    data to a context

9
Different Domains Have Different Strengths: See
Visual Processing
10
Cryptomnemonics
  • Definition: The study of human memory, as it
    applies to cryptographic systems
  • Developing in response to this:
  • ssh dan@blah
    The authenticity of host 'blah (1.2.3.4)' can't be established.
    RSA key fingerprint is 09:a9:b1:99:84:17:7d:ba:c6:55:46:5a:17:f8:83:01.
    Are you sure you want to continue connecting (yes/no)?
  • The machine is acting like it's integrating with
    another machine. It's not, and that matters.
  • Humans can handle hexadecimal characters, but
    not that many.

11
Hex Confusion
  • After somewhere between 2 and 5 characters, most
    of you will fail to see a difference
  • Positional Bias: Expect to see certain things at
    the beginning or end
  • Value Confusion: Letter vs. Number is remembered
    before the actual value of letter or number
  • Glyph confusion
  • Despair Effect
  • Nobody could possibly detect a change, so it's
    not rational to even try

12
Classes of Memory
  • There are three classes of memory, at least to
    the degree that is useful in cryptography
  • Rejection: "I've never seen that before"
  • Recognition: "It's that one, not that other
    one"
  • Recollection: "Let me describe it to you."
  • SSH just requires rejection
  • Hex is not rejectable
  • Can we try another domain?

13
Exploring The Nymic Domain
  • ssh dan@blah
    Key Data:
      julio and epifania dezzutti
      luther and rolande doornbos
      manual and twyla imbesi
      dirk and cuc kolopajlo
      omar and jeana hymel
    The authenticity of host 'blah (1.2.3.4)' can't be established.
    Are you sure you want to continue connecting (yes/no)?
  • Alternate mapping for
    09:a9:b1:99:84:17:7d:ba:c6:55:46:5a:17:f8:83:01.
  • Proposed last year as a potential solution
  • There is nothing more contextual than a story,
    and there is nothing more stable in a story than
    the names of its participants
  • Stories retold are stories remembered; we need
    to be exposed to the above group time and time
    again to be able to reject any deviation from it

14
How To Derive Names?
  • Original Model
  • Take US Census Data
  • Remove any names that may be easily confused with
    one another
  • Easy: Bob v. Bobby
  • Hard: Bob v. Robert
  • Celebrity Naming
  • Marge Godwin
  • Archaic Naming
  • Use constructs from various ancient languages
  • Mechanistic Constructs
  • Bubble Babble, 64 bits: xegoz-tosys-vusik-masar
  • Koremutake, 64 bits: darujifahe stygrifrejy

15
How Many Names?
  • Unclear where the crossover point is between the
    difficulty added by more names and the benefit of
    more entropy per name
  • Present system is 512 male names, 512 female names,
    1024 last names from US Census
  • 256/256/256 would provide 24 bits per couple
    instead of 28 (9 + 9 + 10), and the names would be
    more recognizable. Better? How much better?
  • The more names, the more of a problem position
    becomes
  • We're sensitive to names, but without a story
    context, there are no roles locking people to being
    the first or the second or the third. So the
    more names, the more bits we lose to reader
    confusion.
  • How many bits are necessary? Depends on what for.
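
A rough sketch of the bit budget and of the fingerprint-to-names mapping. The name tables here are placeholders (m000, f000, l0000), not the census lists, and the digit ordering is an assumption made for illustration. With the stated table sizes, one couple carries log2(512 * 512 * 1024) = 28 bits, so a 128-bit fingerprint needs five couples, which matches the five couples shown earlier.

```python
import math

# Placeholder tables; a real system would load curated name lists.
MALE = [f"m{i:03d}" for i in range(512)]
FEMALE = [f"f{i:03d}" for i in range(512)]
LAST = [f"l{i:04d}" for i in range(1024)]


def bits_per_couple(n_male: int, n_female: int, n_last: int) -> float:
    """Entropy carried by one 'male and female last' triple, in bits."""
    return math.log2(n_male * n_female * n_last)


def encode(value: int, total_bits: int, male, female, last):
    """Split `value` into (male, female, last) triples, most significant first."""
    per_couple = bits_per_couple(len(male), len(female), len(last))
    couples = []
    for _ in range(math.ceil(total_bits / per_couple)):
        value, l = divmod(value, len(last))
        value, f = divmod(value, len(female))
        value, m = divmod(value, len(male))
        couples.append((male[m], female[f], last[l]))
    return couples[::-1]


def decode(couples, male, female, last) -> int:
    """Inverse of encode(): rebuild the integer from the name triples."""
    value = 0
    for m, f, l in couples:
        value = value * len(male) + male.index(m)
        value = value * len(female) + female.index(f)
        value = value * len(last) + last.index(l)
    return value


fingerprint = int("09a9b19984177dbac655465a17f88301", 16)
couples = encode(fingerprint, 128, MALE, FEMALE, LAST)
for m, f, l in couples:
    print(f"{m} and {f} {l}")
```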

16
Flipping The Bits
  • SSH Key Representation is not the only thing we
    can do with this technique
  • In fact, it's not even the most pressing problem
  • Passwords are in crisis right now
  • PKI failed, deal with it
  • There's an entire alternate history where XSS
    enjoys the benefits of your legal credentials
    being available and shared
  • People are being asked to generate, frequently,
    high entropy non repeated passwords
  • They're repeating them
  • They've exhausted personal entropy, and have
    moved to geometric progressions to evade lameness
    checks
  • (uoiJKL798
  • Fixed prefix

17
A Fundamental Shift
  • Generate passwords for your users.
  • "But they're hideous, nobody will remember what
    we automatically generate"
  • You're theoretically forcing them to generate
    those hideous passwords, off the top of their
    head
  • Use alternate symbolic domains to coat the
    password entropy you require in a form users can
    accept
  • Why yes, this is exactly like a tunnel. We're
    tunneling entropy over a baby name book

18
Change Your Ways
  • Modify your validation logic to accept long
    passwords without weird character sets
  • Punctuation and case sensitivity are weak
    symbols
  • It is easier to chain together common symbols in
    a common way, than it is to link together
    arbitrary bytes out of context
  • This is a fundamental difference between human
    symbol manipulation and the operations of
    computers

19
How Many Bits Do We Really Need?
  • Hash Validation: 80-100 bits
  • We don't have a birthday paradox problem with
    hashes, since one of them is fixed.
  • 2^80 to 2^100 work efforts are outside the range of
    feasibility at this time
  • Password Entry: 24 bits for low security, 36-48
    bits for high security
  • Need enough to make brute force enumeration
    across all users infeasible
  • For each username, try one possible password
  • 48 bits is what we're at with
    punctuation/case/number/8 characters.

20
Limits to alternate symbol domains
  • We lose the ability to measure nextness
  • 0x10 is one less than 0x11
  • Bob is... how much less than Charlie?
  • Data may become variable length: Bob is three
    characters, Charlie is seven
  • Harder to see patterns
  • Has trouble scaling to any large number of bits.
  • We can't analyze even mildly large systems using
    this translation layer

21
What We've Been Using (Warning: Sucks.)
22
N'est-ce pas Non Sequitur
  • Sequitur: Linear Time Pattern Finder
  • Creates hierarchical Context Free Grammars from
    arbitrary input
  • Compression Algorithm in which you can look
    under the covers to see what's going on
  • Created by Craig Nevill-Manning as his PhD
    thesis a decade ago
  • He's now Chief Research Scientist at Google

23
What's New: Sequitur-XML
  • echo aabbabc | ./sequitur_simple.exe
  • Why translate? Gives us much easier to
    manipulate output
  • C is very good for generating the tree
  • Other languages are very good for analyzing /
    modifying the tree
  • XML is a (shockingly) good machine format for
    representing structure
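
To make "grammar from arbitrary input" concrete, here is a small stand-in. It is not Sequitur: it is a simplified offline digram-replacement scheme (closer in spirit to Re-Pair), it is not linear time, and the rule names R1, R2, ... are invented for this sketch. It does show the kind of hierarchical rules such tools emit for the aabbabc example.

```python
from collections import Counter


def count_digrams(seq):
    """Count adjacent symbol pairs, skipping self-overlaps such as 'aaa'."""
    counts, last_start = Counter(), {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_start.get(pair) == i - 1:
            continue  # overlaps the previous occurrence of the same pair
        counts[pair] += 1
        last_start[pair] = i
    return counts


def infer_grammar(data: str):
    """Replace the most frequent repeated digram with a rule until none repeat."""
    seq, rules = list(data), {}
    while True:
        counts = count_digrams(seq)
        if not counts:
            break
        pair, n = counts.most_common(1)[0]
        if n < 2:
            break
        name = f"R{len(rules) + 1}"  # two or more chars, so never a raw symbol
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(seq, rules) -> str:
    """Rebuild the original string from the root sequence and the rules."""
    return "".join(expand(rules[s], rules) if s in rules else s for s in seq)


root, rules = infer_grammar("aabbabc")
print(root, rules)
```

Keeping the root sequence and the rule table as plain data is what makes a later export to a tree format such as XML straightforward.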

24
Early Work: Syntax Highlighting Using
Compression Depth
25
What's Actually Going On?
  • (0) - (73),b4,(73),ca,(73),e6,(73),02,(74),18,
    (74),2c,(74),4a,(74),5c,(74),6e,(74),80,(74),98,
    (74),b0,(74),c8,(74),e8,(74),fc,(74),10,(75),20,
    (75),30,(75),40,(75),50,(75),64,(75),82,(75),90,
    (75),9e,(75)(84),d6,(84),ee,(84),0c,(85),28,
    (85),3c,(85),4e,(85),66,(85),7e,(85),8c,(85),9e,
    (85),ac,(85),be,(85),ca,(85),ea,(85),08,(86),26,
    (86),44,(86),56,(86),6a,(86),7c,(86),8a,(86),a6,
    (86),b6,(86),cc,(86),de,(86),02,(87)
  • Repeated sequence, single byte literal. Repeated
    sequence, single byte literal. Rinse, lather,
    repeat.

26
Where Things Get Most Interesting: Live Symbol
Browsing!
27
Browsing HOWTO
  • For each entry in the root node,
  • If it's a literal, color it white
  • If it's part of a reference, color it red
  • If it's clicked, color it and every other
    instance of that reference blue
  • A little buggy
  • Present implementation DOES NOT SCALE
  • But effective!
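
The coloring rule above fits in one pass over the root node. In this sketch the root is assumed to be a list of ("lit", value) and ("ref", rule_id) pairs; that representation is an illustration only, not the tool's actual data model.

```python
def color_entries(root, clicked=None):
    """Return one color per root entry, following the three rules above."""
    colors = []
    for kind, value in root:
        if kind == "ref" and value == clicked:
            colors.append("blue")   # the clicked reference and all its twins
        elif kind == "ref":
            colors.append("red")    # part of some repeated symbol
        else:
            colors.append("white")  # plain literal
    return colors


root = [("lit", "a"), ("ref", 1), ("lit", "b"), ("ref", 1), ("ref", 2)]
print(color_entries(root, clicked=1))
```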

28
Symbol Links: Where To Go From Here
  • Turns code on left into symbolic set on
    right; it's easy then to link the symbols
    together as per the graph.
  • This works for non-textual data
  • Sequitur imputes meaningful symbols from
    arbitrary input data

29
Context Free Grammar Fuzzer: THE CFG9000
  • Reduce input data to a stream of symbols
  • Fuzz data at the symbol level, rather than at
    pure bytes
  • Shuffle
  • Drop
  • Repeat
  • Uniform Corrupt
  • Consistently corrupt all instances of a given
    symbol
  • Partially ported to the new XML framework
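
The four symbol-level operations can be sketched on a list of byte-string symbols. This is a guess at the shape of the idea, not the CFG9000 source: the function names, the seeded random generator, and the example symbol split are all assumptions.

```python
import random


def shuffle_symbols(symbols, rng):
    """Shuffle: reorder whole symbols, leaving each symbol's bytes intact."""
    out = list(symbols)
    rng.shuffle(out)
    return out


def drop_symbol(symbols, rng):
    """Drop: remove one symbol chosen at random."""
    i = rng.randrange(len(symbols))
    return symbols[:i] + symbols[i + 1:]


def repeat_symbol(symbols, rng, times=3):
    """Repeat: emit one randomly chosen symbol `times` times in a row."""
    i = rng.randrange(len(symbols))
    return symbols[:i] + [symbols[i]] * times + symbols[i + 1:]


def uniform_corrupt(symbols, rng):
    """Uniform Corrupt: replace every instance of one symbol with the same junk."""
    target = rng.choice(symbols)
    junk = bytes(rng.randrange(256) for _ in range(len(target)))
    return [junk if s == target else s for s in symbols]


rng = random.Random(0)  # seeded so a crashing case can be reproduced
symbols = [b"GET", b" ", b"/", b" ", b"HTTP"]
print(b"".join(uniform_corrupt(symbols, rng)))
```

Working on symbols rather than bytes means a mutation tends to keep local structure valid while breaking the relationships between parts.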

30
Sample CFG9000 Output
  • calculate_rule_usage(p-rulep-rulep-rulep-rule
    p-rulep-rulep-rulep-rulep-rulep-rulep-rulep
    -rulep-rulep-rule()
  • calculate_rule_usage(calculate_rule_usage(calculat
    e_rule_usage(calculate_rule_usage(calculate_rule_u
    sage(calculate_rule_usage(calculate_rule_usage(cal
    culate_rule_usage(calculate_rule_usage(calculate_r
    ule_usage(calculate_rule_usage(calculate_rule_usag
    e(calculate_rule_usage(calculate_rule_usage(calcul
    ate_rule_usage(calculate_rule_usage(calculate_rule
    _usage(calculate_rule_usage(p-rule())

31
Slashdot Fuzzed
32
Slashdot Fuzzed (2)
33
Why We Moved To XML In The First Place
  • XML is a (potentially) validating format
  • Has the concept of schemas
  • NOT THAT THEY'RE ALWAYS OR EVEN OFTEN CHECKED
  • Schema validation is expensive
  • We should be able to use XML Schemas to guide
    fuzzers
  • WS-Bang
  • Excellent tool for bashing Web Services
    frameworks
  • Given a WSDL file (Web Services Description
    Language), fuzz it
  • Untidy: Mostly just attacks XML parsers, doesn't
    hit the structure

34
Automatically Generating Schemas?
  • We can autogenerate Schemas from XML (to some
    degree)
  • Relaxer
  • Trang
  • Tends to capture structure better than content
  • Doesn't appear to automatically determine what
    values are valid for each field
  • Does provide framework for automatically
    extracting all instances of what can go where

35
Wireshark Demo: From
  • pos="126" value="ff002700ff000000080000004e00540046005300"
  • - size="4" pos="126" value="ff002700"
  •   .... .... .... .... .... .... ...1 = Case
    Sensitive Search: This FS supports CASE SENSITIVE
    SEARCHes" size="4" pos="126" show="1" value="1"
    unmaskedvalue="ff002700" />
  •   .... .... .... .... .... .... ..1. = Case
    Preserving: This FS supports CASE PRESERVED
    NAMES" size="4" pos="126" show="1" value="1"
    unmaskedvalue="ff002700" />
  •   .... .... .... .... .... .... .1.. = Unicode On
    Disk: This FS supports UNICODE NAMES" size="4"
    pos="126" show="1" value="1"
    unmaskedvalue="ff002700" />

36
Wireshark Demo: To
  •   minOccurs="1" name="field" type="field" />
  •   tring" />
  •   type="xsd:normalizedString" />
  •   type="xsd:token" />

37
Could we automatically extract structure from
Sequitur-XML?
  • This sequence of bytes can be reconstructed with
    these other sequences of bytes
  • No tree relationship: anything can link to
    anything
  • Need to have the content awareness Relaxer lacks
    to get anything useful
  • Where might we get this content awareness?

38
What Might We Borrow From Linguistics?
  • Can we use linguistic approaches?
  • Common Elements
  • Humans: Subjects, Verbs, etc.
  • Machines: Delimiters, Length Fields,
    ASCII/Unicode, x86, Padding to Four Byte
    Boundaries
  • Symbol Interrelationships
  • Humans: We take word boundaries for granted
  • Until we're listening to a foreign language, and
    wonder why there aren't spaces between words
  • Machines: File formats rarely make it easy to
    see where one symbol ends and another begins
  • Does one symbol always appear before another?
    Does one symbol always find itself surrounded by
    two others?

39
How To Think Of Sequitur
  • Any time you're manipulating data as bytes, think
    of manipulating it as symbols
  • N-gram histograms on bytes -> N-gram histograms
    on symbols
  • Bayesian probabilities on characters -> Bayesian
    probabilities on symbols
  • Sequitur is not necessarily the best way to
    determine a grammar
  • Suffix Trees may be more accurate
  • Kieffer-Yang (redundant symbol extraction) a very
    good post-processing step to add
  • Ray removes In-Memory Grammar Requirement
  • Not all other solutions are linear time, though
  • Kind of cool to have a grammar that covers a
    750GB hard drive undergoing forensics
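
The bytes-to-symbols swap is mostly a change of input: the same n-gram histogram works on either. A minimal sketch, where the symbol stream is a made-up example of what a grammar pass might produce:

```python
from collections import Counter


def ngram_histogram(seq, n):
    """Count every run of n consecutive items in seq (bytes or symbols)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


# On raw bytes: items are integers 0..255.
print(ngram_histogram(b"abcabc", 2).most_common(2))

# On symbols: items are whatever the grammar pass emitted (hypothetical names).
symbol_stream = ["R1", "c", "R1", "c", "R2"]
print(ngram_histogram(symbol_stream, 2).most_common(1))
```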

40
Fuzzy Wuzzy Wuz A Symbol
  • Symbol analysis systems (language translators,
    etc) have issues w/ TMTOWTDI (There's More Than
    One Way To Do It)
  • Very similar messages can be encapsulated in very
    different ways
  • Very similar messages can be encapsulated in very
    similar, but not identical ways
  • Sequitur only handles exact matches; fuzzy
    grammar imputation doesn't appear to exist yet
  • We must develop this fuzziness to create
    byte-sourced XML schemas
  • It is a pretty wild concept
  • Are there any systems for analyzing complex,
    unequal but somewhat related sets of symbols?

41
Another Approach: Dotplots
42
What Exactly Are We Doing
  • Jonathan Helfman's "Dotplot Patterns: A Literal
    Look at Pattern Languages" offers an
    introduction
  • Instead of "to", "be", "not", etc., we use chunks of
    data from arbitrary files
  • Instead of demanding perfect equality, we measure
    how similar the chunks are
  • If most of the bytes are in most of the same
    places, it's pretty similar; if most are
    different, pretty dissimilar
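
A minimal sketch of that similarity measure and the resulting plot. It assumes fixed-size, non-overlapping chunks and scores a pair of chunks by the fraction of positions holding the same byte; the real tool may chunk and score differently. Pure Python, so only suitable for small inputs.

```python
def chunks(data: bytes, window: int):
    """Split data into full, non-overlapping windows (any short tail is dropped)."""
    return [data[i:i + window] for i in range(0, len(data) - window + 1, window)]


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions where two equal-length chunks hold the same byte."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dotplot(data_a: bytes, data_b: bytes, window: int = 32):
    """Matrix of chunk similarities: rows walk data_a, columns walk data_b."""
    rows, cols = chunks(data_a, window), chunks(data_b, window)
    return [[similarity(r, c) for c in cols] for r in rows]


data = b"ABCD" * 4 + b"WXYZ" * 4
plot = dotplot(data, data, window=4)  # a file against itself
for row in plot:
    print("".join("#" if v > 0.5 else "." for v in row))
```

Passing the same buffer twice gives the self-comparison view; passing two different buffers compares one file against another.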

43
New Video Analysis! (Nine Inch Nails, Closer)
44
More Video Analysis: Cibo Matto / Michel Gondry's
Palindromatic Sugar Water
45
We've figured out what some of these patterns
mean
46
But some code just comes out strange.
47
So How Might This Be Useful?
  • A) Format Identification
  • 1) Do different files appear different, and does
    the appearance reflect the existence of internal
    structure?
  • 2) Do different instances of the same file
    format appear similar?
  • 3) Does one format embedded in another make
    itself apparent?
  • B) Fuzzer Guidance
  • 1) Can we locate the actual byte offsets where
    one section ends and another begins?
  • 2) Can we visualize and compare fuzzer
    operations via Dotplots?

48
Format Identification
  • 1) Do different files appear different, and does
    the appearance reflect the existence of internal
    structure?
  • 2) Do different instances of the same file
    format appear similar?
  • 3) Does one format embedded in another make
    itself apparent?

49
Java Class Files
50
.NET Assemblies
51
CNN's Home Page
52
SMBTorture Traffic (Packets: Note, Stop/Start Is
Visible)
53
Kernel32.dll
54
Chromosome 22 (This is, after all, a genomics
hack)
55
The Legend Of Zelda
56
Format Identification
  • 1) Do different files appear different, and does
    the appearance reflect the existence of internal
    structure?
  • Answer: Yes. They do.
  • 2) Do different instances of the same file
    format appear similar?
  • 3) Does one format embedded in another make
    itself apparent?

57
Books from Project Gutenberg: Consistent
Despite English's low information content, the lack
of even mildly related strings causes little
self-similarity across symbol clusters
58
US Code: Moderately Consistent
Legalese is a massively structured dialect.
Symbols appear in very distinct patterns that are
more reminiscent of machine code than text.
59
HTML: Consistent
HTML repeats smaller symbols (tags) and larger
symbol clusters (via template engines) regularly.
This shows up visually as a tightly repeating
pattern.
60
Java Class Files (Compared): Mildly Consistent
  • Binary code (be it bytecode or x86) tends to be
    very structured. Still, we are dependent on both
    the content and the compiler to generate distinct
    patterns.

61
x86: Consistent (In Sections)
x86 tends not to be handwritten; as such, complex
instructions are emitted in a highly structured
form.
62
Exception?
  • 64 kilobyte graphical demonstration
  • Run through a packer
  • Compression removes patterns

63
NES Games
6502 Assembly Tends To Show Consistent Patterns,
But
64
Mario Games Look Rather Different.
  • Output is highly dependent on the compiler
  • Output is highly dependent upon the actual
    content
  • File formats are merely shells for actual
    content. You are analyzing the content; the
    format is just syntactic sugar.

65
Format Identification
  • 1) Do different files appear different, and does
    the appearance reflect the existence of internal
    structure?
  • Answer: Yes. They do.
  • 2) Do different instances of the same file
    format appear similar?
  • Answer: Somewhat. Similar content looks like
    itself, but you're measuring the fundamental
    entropy of the underlying content, not the format
    of the content itself.
  • 3) Does one format embedded in another make
    itself apparent?

66
File Formats Contain Multiple Subformats: Another
Look At Kernel32.DLL
These are all different parts of Kernel32.
67
Quickly Browsing Large Files: Tilt-Shift View
  • Instead of measuring absolute Y against absolute
    X, make X relative
  • Advance through the file going down, look back a
    number of bytes going right
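
One reading of the tilt-shift view, sketched below: each row is the chunk at the current file position, and each column compares it with the chunk a given number of windows earlier. Measuring lookback in whole windows is an assumption of this sketch, as are the function and parameter names.

```python
def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions where two equal-length chunks hold the same byte."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def tilt_shift(data: bytes, window: int = 32, lookback: int = 16):
    """Rows advance through the file; column x compares with the chunk x windows back."""
    parts = [data[i:i + window] for i in range(0, len(data) - window + 1, window)]
    rows = []
    for y, current in enumerate(parts):
        row = []
        for x in range(lookback):
            earlier = y - x
            # Before the start of the file there is nothing to compare against.
            row.append(similarity(current, parts[earlier]) if earlier >= 0 else 0.0)
        rows.append(row)
    return rows


view = tilt_shift(b"ABCD" * 8, window=4, lookback=3)
print(view[:3])
```

The output has one row per chunk and a fixed number of columns, so its size grows linearly with the file instead of quadratically, which is what makes large files browsable.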

68
Complain All You Want. Hex Still Sucks.
69
Format Identification
  • 1) Do different files appear different, and does
    the appearance reflect the existence of internal
    structure?
  • Answer: Yes. They do.
  • 2) Do different instances of the same file
    format appear similar?
  • Answer: Somewhat. Similar content looks like
    itself, but you're measuring the fundamental
    entropy of the underlying content, not the format
    of the content itself.
  • 3) Does one format embedded in another make
    itself apparent?
  • Answer: Yes. Multiple, distinct sections are
    clearly visible in a way that hex cannot show.

70
Fuzzer Guidance
  • 1) Can we locate the actual byte offsets where
    one section ends and another begins?
  • Why would we want to?
  • Fuzzers break parsers.
  • Many subformats to a format, many subparsers to a
    parser
  • To a rough level of approximation, fuzzing a
    single subformat lets you stress a single
    subparser
  • So once we split a file up, we can selectively
    attack one subparser at a time.
  • 2) Can we visualize and compare fuzzer
    operations via Dotplots?

71
Simple Math
We select an interesting blob from kernel32.dll.
The blob is at pixel offset 507x507, and is a
square around 570 pixels wide.
Window size on viz was 32.
507 × 32 = 16224: the interesting section starts
16224 bytes into the file.
570 × 32 = 18240: the interesting section is 18240
bytes long.
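
The same arithmetic as a helper, assuming one pixel per window of bytes as in the numbers above. The function name and the stand-in buffer are illustrative only.

```python
def pixel_to_bytes(pixel_offset: int, pixel_width: int, window: int = 32):
    """Map a square region of the plot back to (byte offset, byte length)."""
    return pixel_offset * window, pixel_width * window


start, length = pixel_to_bytes(507, 570, window=32)
print(start, length)

# Slicing the region out of a file's bytes (a stand-in buffer is used here).
data = bytes(40000)
blob = data[start:start + length]
```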
72
What's The Actual Data?
dd if=kernel32.dll bs=1 skip=16100 | hexdump - | more
73
Using Hardcorr as a first knife to locate
interesting-to-fuzz regions
74
Fuzzer Guidance
  • 1) Can we locate the actual byte offsets where
    one section ends and another begins?
  • Answer: Yes. We can quickly route from the
    image to the byte offset, through basic
    arithmetic.
  • 2) Can we visualize and compare fuzzer
    operations via Dotplots?

75
Differentials
  • Major use of dotplots in bioinformatics is to
    compare one genome against another
  • Autocorrelation: Compare A to A
  • Cross-Correlation: Compare A to B
  • Most files are sufficiently dissimilar that no
    very interesting structure shows up
  • Notable exception: Different versions of the
    same binary
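
For two versions of the same binary, the main diagonal of the cross-correlation already says a lot. The sketch below reads off only that diagonal, listing the window-aligned offsets where the versions differ. It assumes the versions line up byte for byte; an insertion shifts everything after it, which the full plot shows as a displaced diagonal and this shortcut does not handle.

```python
def changed_offsets(old: bytes, new: bytes, window: int = 32):
    """Byte offsets of the windows that differ between two aligned versions."""
    n = min(len(old), len(new))
    return [i for i in range(0, n, window) if old[i:i + window] != new[i:i + window]]


old = bytes(range(256))
patched = bytearray(old)
patched[100] ^= 0x01  # a single flipped bit, as a bit-level fuzzer might do
print(changed_offsets(old, bytes(patched)))
```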

76
Visual Bindiff!
77
MSVCR70.DLL v. MSVCR71.DLL
78
Fuzzers: Very Broken Patchers
Mangle.C: Single Bit Differences
CFG9000: Large Scale Reordering
79
Fuzzer Guidance
  • 1) Can we locate the actual byte offsets where
    one section ends and another begins?
  • Answer: Yes. We can quickly route from the
    image to the byte offset, through basic
    arithmetic.
  • 2) Can we visualize and compare fuzzer
    operations via Dotplots?
  • Answer: Yes. Visual diffing effectively shows
    differences between files, including differences
    introduced by various flavors of fuzzers.

80
Conclusions
  • Lots of interesting work left to do
  • Unification of local presence of symbols, and
    global view of file format
  • Possible to do dotplots themselves in the
    symbolic domain
  • Use of dotplots to segment formats, which thus
    provides the tree we want for an XML schema
  • More colorful pretty pictures!

81
The Ancient Tongue: TCP/IP
  • Can't all be about pretty pictures
  • A new problem has popped up: Network oligopolies
    are threatening to install firewalls that limit
    or eliminate bandwidth on a per-company basis
  • Their own media services might be fast, others
    will be slow
  • Their own VPN services might be fast, others will
    be slow
  • Question: Is it possible to detect and locate
    devices violating network neutrality?

82
What's The Closest Tool We Have?
  • Firewalk
  • Mike Schiffman's Firewall Analysis Tool
  • Packets elicit an ICMP Time Exceeded error if they
    reach a router with TTL=0
  • TTL decremented by one for each hop, so if you start
    low, you can trace the route to a host
  • A firewalled packet won't live long enough to
    reach TTL=0
  • So you can locate the firewall, and divine things
    about its ruleset, based on when your packets
    stop getting ICMP Time Exceeded
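
The logic can be shown with a toy simulation; no packets are sent. The path model is hypothetical, and it assumes a filtering hop drops a blocked packet silently before decrementing its TTL.

```python
def probe(path, ttl, dst_port):
    """Return the name of the hop that sends ICMP Time Exceeded, or None if silent."""
    for hop in path:
        if dst_port in hop.get("blocked_ports", ()):
            return None  # filtered: dropped with no error sent back
        ttl -= 1
        if ttl == 0:
            return hop["name"]  # TTL ran out here
    return None  # reached the destination, which sends no Time Exceeded


def locate_filter(path, dst_port):
    """Smallest TTL at which probes stop eliciting Time Exceeded, or None."""
    for ttl in range(1, len(path) + 1):
        if probe(path, ttl, dst_port) is None:
            return ttl
    return None


path = [
    {"name": "r1"},
    {"name": "r2"},
    {"name": "fw", "blocked_ports": {23}},
    {"name": "r4"},
]
print(locate_filter(path, 23), locate_filter(path, 80))
```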

83
Limitations of Firewalking
  • But Firewalk tells us what, not who, is
    blocked, and it tells us nothing about who is
    allowed to go fast, and who is made to go slow
  • Suddenly, we devolve to a much older question:
    Is it possible to find out that a target firewall
    is, or is not, blocking against or accepting
    traffic from an arbitrary IP address?

84
TCP Does Speed Measurement
  • TCP speed analysis done blindly
  • Endpoints do not negotiate with one another
  • Everyone sends their packets, routers route what
    they will. Endpoints need to adjust to what the
    routers are willing to pass.
  • Routers communicate with endpoints by dropping
    their packets
  • Can we combine this router backchannel w/
    Firewalk?

85
In From The Side
  • What causes packets to drop?
  • Too many packets
  • What are we going to do?
  • Send too many packets
  • Two channels are set up
  • A primary channel, which drops packets at some
    known rate
  • A secondary channel, whose purpose it is to
    interfere (or not) with the primary channel
  • When the secondary interferes with the primary,
    we get feedback via the primary channel
  • The traffic composing the secondary channel can
    come from anywhere, be composed of anything, and
    can be TTL'd just like in a normal firewalk.

86
The TTL Channel
  • Normally, you don't know which router along a
    path is dropping your packets
  • If you are the source of the drop-inducing
    packets, you can control how far your noise goes
    out; thus, you can discover which router is
    hitting its limit / censoring your net
    connection
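
A toy model of the TTL channel, under loud assumptions: each hop has a fixed capacity, an overloaded hop drops primary and noise traffic proportionally, and noise with TTL t crosses only the first t hops. None of this models real queueing; it only shows how stepping the noise TTL outward singles out the hop where primary loss first rises.

```python
def primary_loss(capacities, primary_rate, noise_rate, noise_ttl):
    """Fraction of primary traffic lost along a path of per-hop capacities."""
    primary, noise = float(primary_rate), float(noise_rate)
    for hop, capacity in enumerate(capacities, start=1):
        if hop > noise_ttl:
            noise = 0.0  # the noise packets expired before this hop
        offered = primary + noise
        if offered > capacity:
            scale = capacity / offered  # overloaded hop drops proportionally
            primary *= scale
            noise *= scale
    return 1.0 - primary / primary_rate


def locate_bottleneck(capacities, primary_rate, noise_rate, eps=1e-9):
    """Smallest noise TTL at which primary loss rises above the quiet baseline."""
    baseline = primary_loss(capacities, primary_rate, noise_rate, noise_ttl=0)
    for ttl in range(1, len(capacities) + 1):
        if primary_loss(capacities, primary_rate, noise_rate, ttl) > baseline + eps:
            return ttl
    return None


capacities = [100, 100, 10, 100]  # hop 3 is the narrow one
print(locate_bottleneck(capacities, primary_rate=8, noise_rate=20))
```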

87
Scorchmarking
  • Why Scorchmarking?
  • Routers are burning packets; those that get
    through might have a scorch mark or two
  • Basic Model
  • Client downloads a file from a site, at some
    given speed negotiated via TCP.
  • At the same time, traffic is injected from
    different IP addresses. This should cause
    drops.
  • If it doesn't, the network is either penalizing
    the primary channel (easy to drop against) or
    rewarding the secondary channel (resilient to
    drops)

88
Advanced Scorchmarking 0
  • Having to depend on a client is lame
  • Wouldn't it be nice if we could scan the Internet
    for these servers?
  • What fundamental service is a receiving client
    providing?
  • It is acknowledging our traffic, letting us know
    how much it received, and how many milliseconds
    it took to receive it
  • Aren't there other ways we could extract the same
    data from hosts?

89
Advanced Scorchmarking 1
  • What else will acknowledge receiving traffic from
    us?
  • TCP Servers
  • Sting, from Stefan Savage, used this to great
    effect
  • DNS Servers ?
  • Routers.
  • Supposedly, routers won't send more than a
    certain number of ICMP Time Exceeded packets per
    second
  • In reality, they seem to ICMP Time Exceeded ACK
    however much you throw at them
  • Even if they didn't, you could use the difference
    in ICMP Time Exceeded rates between Primary and
    Secondary channel to determine whether
    interference was showing up.
  • Everyone's got a NAT, so you can query everyone
    for whether certain sorts of traffic are being
    blocked to them

90
Advanced Scorchmarking 2
  • So, yes.
  • You can scan for violations of Network
    Neutrality
  • You can find networks that are blocking or
    passing particular IP ranges
  • It's not exactly efficient though
  • Neutrality violations are easier to find than the
    standard FW case
  • Firewalls are normally between the WAN and the
    LAN (Slow Net - FW - Fast Net)
  • Neutrality violators are mid-WAN (Slow Net - Fw
    - Slow Net - Fast Net)
  • Easier to overload the slow net after the
    firewall
  • Boxes with max TTL rates override this

91
Speed Limits
  • Fundamental Problem: Have to max out bandwidth
    on the link to trigger the backchannel
  • No packets dropping, no data
  • Means you have to DoS a link: not
    scalable/legal
  • Potential Solution: Find capped acknowledgers
  • The mythical ICMP Time Exceeded rate limit works
    well
  • Primary and Secondary channel both eliciting
    ITEs
  • When secondary channel gets a packet through, it
    takes up one of the primary channel's slots
  • ITE is perfect, since you can TTL limit any
    packet
  • Depends on the firewall passing the primary's
    ITEs
  • Maybe Linux / NATs actually implement rate
    limits?
  • Another option: What if we have code on the
    client?

92
Windows Media Player: More Than Just DRM. Really!
  • Bulk Transfer: RTP
  • Runs over Unicast UDP
  • Yes, the same Unicast UDP that penetrates NAT so
    well!
  • Flow Control / Quality Monitoring: RTCP
  • No technical reason RTCP needs to go back to the
    same address that RTP stream is coming from
  • So: We pretend to provide media streams from all
    sorts of sites, and use WMP to collect traffic
    stats for us
  • It might work