Title: Passive Operating System Identification: What can be inferred from TCP SYN packet headers?
1 Passive Operating System Identification: What can be inferred from TCP SYN packet headers?
- Patricia Carter, Alan Berger
- NSWCDD, B10
- June 12, 2002
2 Outline
- Approach
- Data description
- Feature generation
- Centroid hierarchical clustering
- Superparamagnetic clustering
- Bayesian classifier
3 Approach
- Features for each machine/IP address are derived from passively obtained packet headers
- TCP Synchronize (SYN) packets only
- Based on multiple packets from each machine
- Minimum packets used per IP is 10, maximum is 1000
- Feature, but not signature
- Actual operating system is determined from a database, which is not 100% accurate
- Source machine IPs are divided into training and test sets
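The per-host packet limits and the train/test division can be sketched as follows. This is only an illustration: the packet records (dicts with a `src` key), the function names, the choice to truncate at the first 1000 packets, and the random 50/50 split are all assumptions, since the slides do not say how packets beyond the maximum were handled or how hosts were assigned to each set.

```python
import random
from collections import defaultdict

MIN_PACKETS = 10    # hosts with fewer SYN packets are dropped (from the slide)
MAX_PACKETS = 1000  # at most this many packets are used per host (from the slide)

def group_by_source(packets):
    """Group packet records by source IP and apply the per-host limits.

    Assumption: each packet is a dict with a "src" key, and hosts with more
    than MAX_PACKETS packets are truncated to their first MAX_PACKETS.
    """
    by_ip = defaultdict(list)
    for pkt in packets:
        by_ip[pkt["src"]].append(pkt)
    return {ip: pkts[:MAX_PACKETS]
            for ip, pkts in by_ip.items()
            if len(pkts) >= MIN_PACKETS}

def split_hosts(ips, train_fraction=0.5, seed=0):
    """Randomly divide host IPs into training and test lists (assumed scheme)."""
    ips = sorted(ips)
    random.Random(seed).shuffle(ips)
    cut = int(len(ips) * train_fraction)
    return ips[:cut], ips[cut:]
```

Splitting by host rather than by packet keeps all packets from one machine on the same side of the division, which matches the slide's statement that source machine IPs are what get divided.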
4 Operating Systems: two granularities
- DOS
- Irix: 6.1, 6.2, 6.5
- Linux: 6.0, 6.1, 6.2, 7.0, 7.1, 7.2
- Apple
- Mac OS: 7.5, 8, 8.1, 8.5, 8.6, 9
- Solaris: 2.4, 2.5, 2.6, 2.7, 7, 8
- Windows: 3.1, 95, 98, NT4, 2000, ME, XP
5 Test versus Training sets
OS type    train IPs   test IPs
dos            25          44
irix           17          17
linux          24          26
apple           6           7
mac os         30          32
solaris        26          29
windows       113         200
6 IP Header
- Version
- IHL
- Type of service
- Total length
- Identification number
- Flags
- Fragment offset
- Time to live
- Protocol
- Header checksum
- Source address
- Destination address
- Options
IP flags: D = don't fragment (X and M not used here)
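As an illustration of where these fields sit in a captured packet, the fixed 20-byte part of an IPv4 header can be unpacked with Python's standard `struct` module. The function name and the returned dictionary keys are hypothetical; the field layout follows the standard IPv4 header.

```python
import socket
import struct

def parse_ipv4_header(data):
    """Unpack the fixed 20-byte IPv4 header from raw bytes (options are skipped)."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,               # header length in 32-bit words
        "tos": tos,
        "total_length": total_length,
        "ip_id": ident,
        "df": bool(flags_frag & 0x4000),      # D: don't fragment
        "mf": bool(flags_frag & 0x2000),      # M: more fragments
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,                 # 6 means TCP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```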
7 TCP Header
- Source port
- Destination port
- Sequence number
- Acknowledgement number
- Offset
- Reserved
- Flags
- Window
- Checksum
- Urgent pointer
- Options
TCP options: end of options list, no operation (pad), maximum segment size, window scale, selective ACK ok, timestamp
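The six option types listed above can be recognized by walking the option bytes that follow the fixed TCP header. The sketch below is a minimal parser under the standard kind/length encoding (kinds 0, 1, 2, 3, 4, 8); the function name and result keys are hypothetical and are not taken from the slides.

```python
def parse_tcp_options(raw):
    """Walk the TCP option bytes and record which options are present."""
    found = {"nops": 0, "mss": None, "wscale": None,
             "sackok": False, "timestamp": False, "eol": False}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:                      # end of options list
            found["eol"] = True
            break
        if kind == 1:                      # no operation (pad), single byte
            found["nops"] += 1
            i += 1
            continue
        if i + 1 >= len(raw):              # truncated option, stop parsing
            break
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            break                          # malformed length, stop parsing
        body = raw[i + 2:i + length]
        if kind == 2 and length == 4:      # maximum segment size
            found["mss"] = int.from_bytes(body, "big")
        elif kind == 3 and length == 3:    # window scale
            found["wscale"] = body[0]
        elif kind == 4 and length == 2:    # selective ACK permitted
            found["sackok"] = True
        elif kind == 8 and length == 10:   # timestamp
            found["timestamp"] = True
        i += length
    return found
```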
8 All features
- meanttl, meantos, ntos
- meanlog2win, selog2win, nnwin
- meandf, meanincripid, stdincripid
- meanlog2seq, rangelog2seq
- meanlog2incrseq, stdlog2incrseq
- meanlog2sport, maxlog2sport, minlog2sport, meanincrsport, stdincrsport
- meannops, modeMSS, ndiff_MSS
- ntimepstamps, nwscale, nsackok, nendtcp
- meaniptotlen, stdiptotlen
Some examples follow.
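A few of these per-host features can be sketched as below. The slides give only the feature names, so the definitions here are a reading of those names and should be treated as assumptions: `meanincripid` and `stdincripid` are taken to be the mean and standard deviation of successive IP identification increments (modulo 65536), `meanlog2win` the mean of log2 of the window size (with zero windows clamped to 1), and the packet dict keys are hypothetical.

```python
import math
from statistics import mean, pstdev

def host_features(packets):
    """Compute a handful of per-host features from a list of packet dicts.

    Assumptions: packets are in arrival order, each has "ttl", "df" (0 or 1),
    "window", and "ip_id" keys, and at least two packets are supplied.
    """
    ip_ids = [p["ip_id"] for p in packets]
    # Increment between consecutive IP ids, wrapping at 16 bits.
    increments = [(b - a) % 65536 for a, b in zip(ip_ids, ip_ids[1:])]
    return {
        "meanttl": mean(p["ttl"] for p in packets),
        "meandf": mean(p["df"] for p in packets),
        "meanlog2win": mean(math.log2(max(p["window"], 1)) for p in packets),
        "meanincripid": mean(increments),
        "stdincripid": pstdev(increments),   # population std, an assumption
    }
```

Averaging over many packets is what makes these "features, but not signatures": a single packet need not match a fixed pattern for the host to be characterized.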
9 TTL feature
10 Mean IP total length feature
11 Mean log2 sequence number increment
12 TCP option wscale feature
13 Mean log2 window
14 Parallel coordinates view of mean features
15 18-feature set
16 15-feature set
17 Centroid Hierarchical Clustering of OSFP Training Data
dos, irix, linux, apple, macos, solaris, windows
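For readers unfamiliar with the method named in this slide title, a minimal sketch of centroid-linkage agglomerative clustering follows. It is a generic illustration, not the authors' code: the Euclidean distance, the stopping rule (a target number of clusters), and the function names are assumptions, and the slides do not say which features or scaling were used.

```python
import math

def centroid(cluster, points):
    """Mean position of the points whose indices are in `cluster`."""
    dim = len(points[0])
    return tuple(sum(points[i][d] for i in cluster) / len(cluster)
                 for d in range(dim))

def centroid_clustering(points, n_clusters):
    """Repeatedly merge the two clusters with the closest centroids."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a], points),
                              centroid(clusters[b], points))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]
```

Recording the sequence of merges instead of stopping at a fixed count would give the full hierarchy (dendrogram).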
18 Superparamagnetic Clustering (SPC)
- Algorithm of Domany et al.
- Clusters = components of a graph. Edges in the graph of points to be clustered are determined via a statistical mechanics model.
- A state S of the set of points to be clustered = the list of edges which are present. The energy E of a state also depends on the temperature parameter T and the collective geometry of the points.
- The probability p(S) of a state S decreases exponentially with E (Boltzmann distribution).
- An edge e is present in the graph used to determine the clusters at a given T if it is present more than half the time (thermodynamic average: the sum of p(S) over all of the states S where e is present is > ½).
- Edges which are present are calculated by the importance sampling Monte Carlo algorithm of Swendsen and Wang.
- Varying T → cluster hierarchy.
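The edge rule described above can be illustrated with a small Swendsen-Wang style simulation. This is a simplified sketch under stated assumptions, not the published SPC implementation: the coupling `exp(-d^2 / (2 * scale^2))`, the parameters `k`, `scale`, `q`, and `sweeps`, and all function names are hypothetical choices made for the example. An edge is kept when its bond is frozen in more than half of the Monte Carlo sweeps.

```python
import math
import random

def knn_pairs(points, k):
    """Index pairs (i, j), i < j, where one point is among the other's k nearest."""
    pairs = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            pairs.add((min(i, j), max(i, j)))
    return pairs

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def components(n, edges):
    """Connected components of the graph on n nodes with the given edges."""
    parent = list(range(n))
    for i, j in edges:
        parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())

def spc_edges(points, temperature, k=5, scale=1.0, q=10, sweeps=200, seed=0):
    """Return the neighbor edges frozen in more than half of the sweeps."""
    rng = random.Random(seed)
    pairs = sorted(knn_pairs(points, k))
    coupling = {(i, j): math.exp(-math.dist(points[i], points[j]) ** 2
                                 / (2 * scale ** 2)) for i, j in pairs}
    spins = [0] * len(points)              # start from the fully ordered state
    frozen_count = dict.fromkeys(pairs, 0)
    for _ in range(sweeps):
        # Freeze a bond only between equal spins, with probability 1 - exp(-J/T).
        frozen = [e for e in pairs
                  if spins[e[0]] == spins[e[1]]
                  and rng.random() < 1.0 - math.exp(-coupling[e] / temperature)]
        for e in frozen:
            frozen_count[e] += 1
        # Each connected group of frozen bonds receives one new random spin.
        for group in components(len(points), frozen):
            new_spin = rng.randrange(q)
            for i in group:
                spins[i] = new_spin
    return [e for e in pairs if frozen_count[e] > sweeps / 2]
```

Calling `components(len(points), spc_edges(points, T))` for a range of `T` values gives coarser clusters at low temperature and finer ones at high temperature, which is the hierarchy the last bullet refers to.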
19 Our Extension of SPC Clustering to Include Training Data
- In the SPC algorithm each point has up to K (e.g. 50) neighbors.
- For training data points belonging to the same cluster, there is always an edge between them if they are neighbors.
- Training points which are not in the same cluster can never have an edge between them.
- Points (and possible edges) which are not in the training set are treated essentially as in the original SPC method.
- This allows the natural incorporation of training data within the SPC algorithm, which obtains clusters that reflect the collective geometry of the points being clustered.
- The original SPC method formulated by Domany et al. was motivated by the physical chemistry of magnetic materials.
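The three rules on this slide amount to overriding the edge probability for pairs of training points. A minimal sketch, assuming a hypothetical `labels` mapping from point index to known class (absent for unlabeled points) and a `base_probability` supplied by the ordinary SPC step:

```python
def constrained_edge_probability(i, j, labels, base_probability):
    """Probability that the edge between neighbors i and j is present.

    labels maps a point index to its training class; points missing from
    the mapping are unlabeled (this data layout is an assumption).
    """
    label_i = labels.get(i)
    label_j = labels.get(j)
    if label_i is not None and label_j is not None:
        # Both are training points: the edge is forced on or forbidden.
        return 1.0 if label_i == label_j else 0.0
    # At least one point is unlabeled: behave as in the original method.
    return base_probability
```

Because unlabeled points keep their ordinary edge probabilities, they can still attach to a labeled group through geometry alone, which is how test hosts end up in a training cluster.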
20 SPC confusion matrix
Rows: true operating system. Columns: number predicted in each OS, number of test cases, % correct.
True OS    dos  irx  lin   ap  mac  sol  win    ?   Test cases   % correct
dos          3    0    0    4    0    0   31    6        44           6
irix         3   14    0    0    0    0    0    0        17          82
linux        1    0   25    0    0    0    0    0        26          96
apple        1    0    0    2    0    0    4    0         7          28
macos        0    0    0   28    3    0    0    1        32           9
solaris      0    0    0    0    0   21    0    8        29          72
windows     52    2    1    5    2    3  121   14       200          60
21 SPC confusion matrix: apple+mac, dos+windows
Rows: true operating system. Columns: number predicted in each OS, number of test cases, % correct.
True OS    irx  lin  mac  sol  win    ?   Test cases   % correct
irix        14    0    0    0    3    0        17          82
linux        0   25    0    0    1    0        26          96
macos        0    0   33    0    5    1        39          85
solaris      0    0    0   21    0    8        29          72
windows      2    1   11    3  207   20       244          85
Total % correct: 85
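The merged matrix is obtained from the seven-class SPC matrix by adding the apple rows and columns into macos and the dos rows and columns into windows. The sketch below reproduces that arithmetic from the seven-class counts; the function and variable names are hypothetical.

```python
COLUMNS = ["dos", "irix", "linux", "apple", "macos", "solaris", "windows", "?"]
SPC = {
    "dos":     [3, 0, 0, 4, 0, 0, 31, 6],
    "irix":    [3, 14, 0, 0, 0, 0, 0, 0],
    "linux":   [1, 0, 25, 0, 0, 0, 0, 0],
    "apple":   [1, 0, 0, 2, 0, 0, 4, 0],
    "macos":   [0, 0, 0, 28, 3, 0, 0, 1],
    "solaris": [0, 0, 0, 0, 0, 21, 0, 8],
    "windows": [52, 2, 1, 5, 2, 3, 121, 14],
}
MERGE = {"dos": "windows", "apple": "macos"}

def merge_classes(matrix, columns, merge):
    """Combine rows and columns of a confusion matrix according to `merge`."""
    merged = {}
    for true_label, row in matrix.items():
        target_row = merged.setdefault(merge.get(true_label, true_label), {})
        for column, count in zip(columns, row):
            target_column = merge.get(column, column)
            target_row[target_column] = target_row.get(target_column, 0) + count
    return merged

merged = merge_classes(SPC, COLUMNS, MERGE)
correct = sum(row.get(label, 0) for label, row in merged.items())
total = sum(sum(row.values()) for row in merged.values())
print(merged["windows"]["windows"], correct, total)  # 207 300 355
```

With 300 of 355 test hosts on the diagonal, the overall figure is about 84.5%, which the slide reports as 85.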
22 3-D image 1
23 3-D image 2
24 3-D image 3
25 Simple Classifier Based on the Bayes Formula
- Conditional probabilities are computed from the training data
- For each host and each class i, compute r(i) (independence approximation)
- Estimated class is the i for which r(i) = max over j of r(j)
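The slide does not spell out how r(i) is evaluated, so the following is only a plausible sketch of a classifier of this kind. It assumes the usual naive Bayes form (prior times a product of per-feature conditional probabilities) and, as a further assumption, models each feature within a class as a Gaussian with a small floor on the standard deviation. All names and the floor value are hypothetical.

```python
import math
from statistics import mean, pstdev

MIN_SIGMA = 1e-3  # assumed floor so constant features do not give zero variance

def train(hosts_by_class):
    """Estimate a (mean, std) pair per feature for every class."""
    model = {}
    for label, rows in hosts_by_class.items():
        columns = list(zip(*rows))
        model[label] = [(mean(c), max(pstdev(c), MIN_SIGMA)) for c in columns]
    return model

def log_score(x, params, prior=1.0):
    """log r(i): log prior plus the sum of per-feature log likelihoods."""
    score = math.log(prior)
    for value, (mu, sigma) in zip(x, params):
        score += (-math.log(sigma * math.sqrt(2 * math.pi))
                  - (value - mu) ** 2 / (2 * sigma ** 2))
    return score

def classify(x, model, priors=None):
    """Return the class i with the largest score r(i)."""
    priors = priors or {}
    return max(model, key=lambda c: log_score(x, model[c], priors.get(c, 1.0)))
```

Working with log scores avoids numerical underflow when many feature probabilities are multiplied, and it leaves the arg max unchanged.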
26 Bayesian setup parameters
classes: iri, lin, macapp, sol, doswin
priors: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
features: meanttl, meanlog2win, meandf, stdincripid, meanlog2seq, rangelog2seq, meanlog2incrseq, stdlog2incrseq, meanlog2sport, stdincrsport, meannops, ntimepstamps, nwscale, nsackok, meaniptotlen
27 Bayesian Confusion: training data
CONFUSION MATRIX
TRUE\EST   iri  lin  macapp  sol  doswin   hosts   % correct
iri         17    0       0    0       0      17       100
lin          0   24       0    0       0      24       100
macapp       0    0      32    0       4      36        88
sol          0    0       0   26       0      26       100
doswin       1    0       1    0     136     138        98
overall % correct: 97.5
28 Bayesian Confusion: test data
CONFUSION MATRIX
TRUE\EST   iri  lin  macapp  sol  doswin   hosts   % correct
iri         15    0       0    0       2      17        88
lin          1   24       0    0       1      26        92
macapp       2    0      31    0       6      39        79
sol          0    0       0   29       0      29       100
doswin       0    0       2    0     260     262        99
overall % correct: 96.2
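The per-class and overall percentages follow directly from the counts: each class's percentage is its diagonal entry divided by its row total, and the overall figure is the sum of the diagonal divided by the number of hosts. A short check using the test-data counts above (names are hypothetical):

```python
CLASSES = ["iri", "lin", "macapp", "sol", "doswin"]
TEST_MATRIX = [
    [15, 0, 0, 0, 2],
    [1, 24, 0, 0, 1],
    [2, 0, 31, 0, 6],
    [0, 0, 0, 29, 0],
    [0, 0, 2, 0, 260],
]

def percent_correct(matrix):
    """Return (per-class percentages, overall percentage) for a square matrix."""
    per_class = [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]
    diagonal = sum(row[i] for i, row in enumerate(matrix))
    overall = 100.0 * diagonal / sum(map(sum, matrix))
    return per_class, overall

per_class, overall = percent_correct(TEST_MATRIX)
print([round(p) for p in per_class], round(overall, 1))  # [88, 92, 79, 100, 99] 96.2
```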
29 Conclusions
- Features from packet header information can be used for robust passive determination of operating systems
- SPC and Bayesian methods are useful classifiers for this problem
- Future work:
  - refinement of feature sets
  - finer granularity of OS determination
  - cross-validation