BAYESIAN NETWORK - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: BAYESIAN NETWORK


1
BAYESIAN NETWORK
  • Submitted By
  • Faisal Islam
  • Srinivasan Gopalan
  • Vaibhav Mittal
  • Vipin Makhija
  • Prof. Anita Wasilewska
  • State University of New York at Stony Brook

2
References
  • [1] Jiawei Han, Data Mining: Concepts and Techniques, ISBN 1-53860-489-8, Morgan Kaufmann Publishers.
  • [2] Stuart Russell, Peter Norvig, Artificial Intelligence: A Modern Approach, Pearson Education.
  • [3] Kandasamy, Thilagavati, Gunavati, Probability, Statistics and Queueing Theory, Sultan Chand Publishers.
  • [4] D. Heckerman, A Tutorial on Learning with Bayesian Networks, in Learning in Graphical Models, ed. M.I. Jordan, The MIT Press, 1998.
  • [5] http://en.wikipedia.org/wiki/Bayesian_probability
  • [6] http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf
  • [7] http://www.murrayc.com/learning/AI/bbn.shtml
  • [8] http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
  • [9] http://en.wikipedia.org/wiki/Bayesian_belief_network

3
CONTENTS
  • HISTORY
  • CONDITIONAL PROBABILITY
  • BAYES THEOREM
  • NAÏVE BAYES CLASSIFIER
  • BELIEF NETWORK
  • APPLICATION OF BAYESIAN NETWORK
  • PAPER ON CYBER CRIME DETECTION

4
HISTORY
  • Bayesian Probability was named after
    Reverend Thomas Bayes (1702-1761).
  • He proved a special case of what is currently
    known as the Bayes Theorem.
  • The term Bayesian came into use around the
    1950s.
  • Pierre-Simon, Marquis de Laplace (1749-1827)
    independently proved a generalized version of
    Bayes Theorem.
  • http://en.wikipedia.org/wiki/Bayesian_probability

5
HISTORY (Cont.)
  • 1950s: New knowledge in Artificial Intelligence
  • 1958: Genetic Algorithms by Friedberg (Holland and Goldberg 1985)
  • 1965: Fuzzy Logic by Zadeh at UC Berkeley
  • 1970: Bayesian Belief Networks at Stanford University (Judea Pearl 1988)
  • The ideas proposed above were not fully developed until later. BBNs became popular in the 1990s.
  • http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf

6
HISTORY (Cont.)
  • Current uses of Bayesian Networks:
  • Microsoft's printer troubleshooter
  • Diagnosing diseases (MYCIN)
  • Predicting oil and stock prices
  • Controlling the space shuttle
  • Risk analysis of schedule and cost overruns

7
CONDITIONAL PROBABILITY
  • Probability: How likely is it that an event will happen?
  • Sample space: S
  • Element of S: elementary event
  • An event A is a subset of S
  • P(A): probability of event A
  • P(S) = 1
  • Events A and B
  • P(A|B): the probability that event A occurs given that event B has already occurred.
  • Example:
  • There are 2 baskets. B1 has 2 red balls and 5 blue balls. B2 has 4 red balls and 3 blue balls. What is the probability of picking a red ball from basket 1?

8
CONDITIONAL PROBABILITY
  • The question above asks for P(red ball | basket 1).
  • Intuitively, we want the probability of a red ball using only the sample space of basket 1.
  • So the answer is 2/7.
  • The equations to solve it are:
  • P(A|B) = P(A∩B) / P(B) (product rule)
  • P(A,B) = P(A)P(B) if A and B are independent
  • How do you solve P(basket 2 | red ball)?
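
As a quick numerical check of the product rule on the basket example, here is a minimal Python sketch (the basket contents come from the slide above; the variable names are ours):

```python
from fractions import Fraction

# Basket contents from the example: B1 has 2 red and 5 blue balls,
# and each basket is chosen with probability 1/2.
p_b1 = Fraction(1, 2)
p_red_given_b1 = Fraction(2, 7)     # 2 red out of 7 balls in basket 1

# Product rule: P(red ∩ B1) = P(red | B1) * P(B1)
p_red_and_b1 = p_red_given_b1 * p_b1

# Recovering the conditional from the joint: P(red | B1) = P(red ∩ B1) / P(B1)
print(p_red_and_b1 / p_b1)          # 2/7, matching the answer above
```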

9
BAYESIAN THEOREM
  • A special case of Bayes' Theorem:
  • P(A∩B) = P(B) × P(A|B)
  • P(B∩A) = P(A) × P(B|A)
  • Since P(A∩B) = P(B∩A),
  • P(B) × P(A|B) = P(A) × P(B|A)
  • => P(A|B) = P(A) × P(B|A) / P(B)

10
BAYESIAN THEOREM
  • Solution to P(basket 2 | red ball):
  • P(basket 2 | red ball) = P(b2) × P(r|b2) / P(r)
  • = (1/2) × (4/7) / (6/14)
  • = 0.66

11
BAYESIAN THEOREM
  • Example 2: A medical cancer diagnosis problem.
  • There are 2 possible outcomes of a diagnosis: +ve, -ve. We know 0.8% of the world population has cancer. The test gives a correct +ve result 98% of the time and a correct -ve result 97% of the time.
  • If a patient's test returns +ve, should we diagnose the patient as having cancer?

12
BAYESIAN THEOREM
  • P(cancer) = 0.008, P(¬cancer) = 0.992
  • P(+ve|cancer) = 0.98, P(-ve|cancer) = 0.02
  • P(+ve|¬cancer) = 0.03, P(-ve|¬cancer) = 0.97
  • Using Bayes' formula:
  • P(cancer|+ve) = P(+ve|cancer) × P(cancer) / P(+ve) = 0.98 × 0.008 / P(+ve) = 0.0078 / P(+ve)
  • P(¬cancer|+ve) = P(+ve|¬cancer) × P(¬cancer) / P(+ve) = 0.03 × 0.992 / P(+ve) = 0.0298 / P(+ve)
  • So the patient most likely does not have cancer.
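
The unnormalized numerators above can be turned into proper posteriors by dividing by their sum, which is exactly P(+ve); a short sketch with the slide's numbers:

```python
p_cancer, p_no_cancer = 0.008, 0.992
p_pos_given_cancer, p_pos_given_no_cancer = 0.98, 0.03

# Numerators of Bayes' formula (unnormalized posteriors)
num_cancer = p_pos_given_cancer * p_cancer              # 0.00784
num_no_cancer = p_pos_given_no_cancer * p_no_cancer     # 0.02976

# P(+ve) is the sum of the numerators, so the posteriors are:
p_pos = num_cancer + num_no_cancer
print(num_cancer / p_pos)       # ≈ 0.21 -> P(cancer | +ve)
print(num_no_cancer / p_pos)    # ≈ 0.79 -> P(¬cancer | +ve), so "no cancer" is more likely
```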

13
BAYESIAN THEOREM
  • General Bayes' Theorem:
  • Given mutually disjoint events E1, E2, ..., En with P(Ei) ≠ 0 (i = 1, 2, ..., n),
  • P(Ei|A) = P(Ei) × P(A|Ei) / Σj P(Ej) × P(A|Ej),
  • i = 1, 2, ..., n

14
BAYESIAN THEOREM
  • Example:
  • There are 3 boxes. B1 has 2 white, 3 black and 4 red balls. B2 has 3 white, 2 black and 2 red balls. B3 has 4 white, 1 black and 3 red balls. A box is chosen at random and 2 balls are drawn; one is white and the other is red. What is the probability that they came from the first box?

15
BAYESIAN THEOREM
  • Let E1, E2, E3 denote the events of choosing B1, B2, B3 respectively. Let A be the event that the 2 balls selected are one white and one red.
  • P(E1) = P(E2) = P(E3) = 1/3
  • P(A|E1) = (2C1 × 4C1) / 9C2 = 2/9
  • P(A|E2) = (3C1 × 2C1) / 7C2 = 2/7
  • P(A|E3) = (4C1 × 3C1) / 8C2 = 3/7

16
BAYESIAN THEOREM
  • P(E1|A) = P(E1) × P(A|E1) / Σj P(Ej) × P(A|Ej) = 0.23727
  • P(E2|A) = 0.30509
  • P(E3|A) = 1 − (0.23727 + 0.30509) = 0.45764
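
These posteriors can be reproduced directly from the general theorem; a short Python sketch (box contents as stated on the previous slide):

```python
from math import comb

priors = [1/3, 1/3, 1/3]                       # P(E1), P(E2), P(E3)
boxes = [(2, 3, 4), (3, 2, 2), (4, 1, 3)]      # (white, black, red) balls in B1, B2, B3

# P(A | Ei): probability of drawing one white and one red ball from box i
likelihoods = [w * r / comb(w + b + r, 2) for (w, b, r) in boxes]

# General Bayes' theorem: P(Ei | A) = P(Ei) P(A|Ei) / Σj P(Ej) P(A|Ej)
evidence = sum(p * l for p, l in zip(priors, likelihoods))
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]
print(posteriors)    # ≈ [0.2373, 0.3051, 0.4576]
```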

17
BAYESIAN CLASSIFICATION
  • Why use Bayesian classification?
  • Probabilistic learning: Calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
  • Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.

18
BAYESIAN CLASSIFICATION
  • Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities.
  • Standard: Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.

19
NAÏVE BAYES CLASSIFIER
  • A simplifying assumption: attributes are conditionally independent given the class.
  • This greatly reduces the computation cost; only the class distribution needs to be counted.

20
NAÏVE BAYES CLASSIFIER
  • The probabilistic model of the NBC is to find the probability of a certain class given multiple feature events that are assumed (conditionally) independent.
  • The naïve Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f(x) can take on any value from some finite set V. A set of training examples of the target function is provided, and a new instance is presented, described by the tuple of attribute values <a1, a2, ..., an>. The learner is asked to predict the target value, or classification, for this new instance.

21
NAÏVE BAYES CLASSIFIER
  • Abstractly, the probability model for a classifier is a conditional model
  • P(C|F1, F2, ..., Fn)
  • over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1, ..., Fn.
  • Naïve Bayes formula:
  • classify(F1, ..., Fn) = argmaxc P(C=c) × P(F1|C=c) × P(F2|C=c) × ... × P(Fn|C=c) / P(F1, F2, ..., Fn)
  • Since P(F1, F2, ..., Fn) is common to all classes, we do not need to evaluate the denominator for comparisons.

22
NAÏVE BAYES CLASSIFIER
  • Tennis-Example

23
NAÏVE BAYES CLASSIFIER
  • Problem:
  • Use the training data above to classify the following instances:
  • (a) <Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong>
  • (b) <Outlook=overcast, Temperature=cool, Humidity=high, Wind=strong>

24
NAÏVE BAYES CLASSIFIER
  • Answer to (a):
  • P(PlayTennis=yes) = 9/14 = 0.64
  • P(PlayTennis=no) = 5/14 = 0.36
  • P(Outlook=sunny|PlayTennis=yes) = 2/9 = 0.22
  • P(Outlook=sunny|PlayTennis=no) = 3/5 = 0.60
  • P(Temperature=cool|PlayTennis=yes) = 3/9 = 0.33
  • P(Temperature=cool|PlayTennis=no) = 1/5 = 0.20
  • P(Humidity=high|PlayTennis=yes) = 3/9 = 0.33
  • P(Humidity=high|PlayTennis=no) = 4/5 = 0.80
  • P(Wind=strong|PlayTennis=yes) = 3/9 = 0.33
  • P(Wind=strong|PlayTennis=no) = 3/5 = 0.60

25
NAÏVE BAYES CLASSIFIER
  • P(yes) × P(sunny|yes) × P(cool|yes) × P(high|yes) × P(strong|yes) = 0.0053
  • P(no) × P(sunny|no) × P(cool|no) × P(high|no) × P(strong|no) = 0.0206
  • So the class for this instance is no. We can normalize the probability by
  • 0.0206 / (0.0206 + 0.0053) = 0.795

26
NAÏVE BAYES CLASSIFIER
  • Answer to (b):
  • P(PlayTennis=yes) = 9/14 = 0.64
  • P(PlayTennis=no) = 5/14 = 0.36
  • P(Outlook=overcast|PlayTennis=yes) = 4/9 = 0.44
  • P(Outlook=overcast|PlayTennis=no) = 0/5 = 0
  • P(Temperature=cool|PlayTennis=yes) = 3/9 = 0.33
  • P(Temperature=cool|PlayTennis=no) = 1/5 = 0.20
  • P(Humidity=high|PlayTennis=yes) = 3/9 = 0.33
  • P(Humidity=high|PlayTennis=no) = 4/5 = 0.80
  • P(Wind=strong|PlayTennis=yes) = 3/9 = 0.33
  • P(Wind=strong|PlayTennis=no) = 3/5 = 0.60

27
NAÏVE BAYES CLASSIFIER
  • Estimating Probabilities
  • In the previous example, P(overcast|no) = 0, which causes the formula
  • P(no) × P(overcast|no) × P(cool|no) × P(high|no) × P(strong|no) = 0.0
  • This causes problems in comparing classes because the other probabilities are not considered. We can avoid this difficulty by using the m-estimate.

28
NAÏVE BAYES CLASSIFIER
  • M-Estimate Formula:
  • (c + k) / (n + m), where c/n is the original fraction used before, k = 1, and m is the equivalent sample size (here, the number of possible values of the attribute).
  • Using this method, our new values of probability are given below.

29
NAÏVE BAYES CLASSIFIER
  • New answer to (b):
  • P(PlayTennis=yes) = 10/16 = 0.63
  • P(PlayTennis=no) = 6/16 = 0.37
  • P(Outlook=overcast|PlayTennis=yes) = 5/12 = 0.42
  • P(Outlook=overcast|PlayTennis=no) = 1/8 = 0.13
  • P(Temperature=cool|PlayTennis=yes) = 4/12 = 0.33
  • P(Temperature=cool|PlayTennis=no) = 2/8 = 0.25
  • P(Humidity=high|PlayTennis=yes) = 4/11 = 0.36
  • P(Humidity=high|PlayTennis=no) = 5/7 = 0.71
  • P(Wind=strong|PlayTennis=yes) = 4/11 = 0.36
  • P(Wind=strong|PlayTennis=no) = 4/7 = 0.57
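
A small sketch of how these m-estimated values are obtained, assuming (as the numbers above suggest) k = 1 and m equal to the number of possible values of the attribute, i.e. a uniform prior:

```python
def m_estimate(c, n, m, k=1):
    """(c + k) / (n + m): c = observed count, n = class count, m = equivalent sample size."""
    return (c + k) / (n + m)

# Outlook and Temperature have 3 possible values, Humidity and Wind have 2,
# and there are 2 classes (yes/no).
print(m_estimate(0, 5, 3))    # P(overcast | no)  = 1/8   = 0.125 (0.13 above)
print(m_estimate(4, 9, 3))    # P(overcast | yes) = 5/12  ≈ 0.42
print(m_estimate(3, 9, 2))    # P(high | yes)     = 4/11  ≈ 0.36
print(m_estimate(9, 14, 2))   # P(PlayTennis=yes) = 10/16 = 0.625 (0.63 above)
```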

30
NAÏVE BAYES CLASSIFIER
  • P(yes) × P(overcast|yes) × P(cool|yes) × P(high|yes) × P(strong|yes) = 0.011
  • P(no) × P(overcast|no) × P(cool|no) × P(high|no) × P(strong|no) = 0.00486
  • So the class of this instance is yes.

31
NAÏVE BAYES CLASSIFIER
  • The conditional probability values of all the attributes with respect to the class are pre-computed and stored on disk.
  • This prevents the classifier from computing the conditional probabilities every time it runs.
  • This stored data can be reused to reduce the latency of the classifier.

32
BAYESIAN BELIEF NETWORK
  • In the Naïve Bayes Classifier we make the assumption of class conditional independence, that is, given the class label of a sample, the values of the attributes are conditionally independent of one another.
  • However, there can be dependencies between the values of attributes. To handle this we use a Bayesian Belief Network, which provides a joint conditional probability distribution.
  • A Bayesian network is a form of probabilistic graphical model. Specifically, a Bayesian network is a directed acyclic graph of nodes representing variables and arcs representing dependence relations among the variables.

33
(No Transcript)
34
BAYESIAN BELIEF NETWORK
  • A Bayesian network is a representation of the joint distribution over all the variables represented by nodes in the graph. Let the variables be X(1), ..., X(n).
  • Let Parents(A) be the parents of node A. Then the joint distribution for X(1) through X(n) is represented as the product of the probability distributions P(Xi | Parents(Xi)) for i = 1 to n. If Xi has no parents, its probability distribution is said to be unconditional; otherwise it is conditional.

35
BAYESIAN BELIEF NETWORK
36
BAYESIAN BELIEF NETWORK
  • By the chain rule of probability, the joint probability of all the nodes in the graph above is
  • P(C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S,R)
  • W = Wet Grass, C = Cloudy, R = Rain, S = Sprinkler
  • Example: P(W ∩ ¬R ∩ S ∩ C)
  • = P(W|S,¬R) P(¬R|C) P(S|C) P(C)
  • = 0.9 × 0.2 × 0.1 × 0.5 = 0.009

37
BAYESIAN BELIEF NETWORK
  • What is the probability of wet grass on a given day, P(W)?
  • P(W) = P(W|S,R) P(S) P(R)
  • + P(W|S,¬R) P(S) P(¬R)
  • + P(W|¬S,R) P(¬S) P(R)
  • + P(W|¬S,¬R) P(¬S) P(¬R)
  • (Note that this factorization treats S and R as independent, which is an approximation here since both depend on C.)
  • Here P(S) = P(S|C) P(C) + P(S|¬C) P(¬C)
  • P(R) = P(R|C) P(C) + P(R|¬C) P(¬C)
  • P(W) = 0.5985

38
Advantages of Bayesian Approach
  • Bayesian networks can readily handle incomplete data sets.
  • Bayesian networks allow one to learn about causal relationships.
  • Bayesian networks readily facilitate the use of prior knowledge.

39
APPLICATIONS OF BAYESIAN NETWORKS
40
Sources/References
  • Naive Bayes Spam Filtering Using Word-Position-Based Attributes, http://www.ceas.cc/papers-2005/144.pdf
  • by Johan Hovold, Department of Computer Science, Lund University, Box 118, 221 00 Lund, Sweden. E-mail: johan.hovold.363@student.lu.se
  • Presented at CEAS 2005, Second Conference on Email and Anti-Spam, July 21-22, at Stanford University.
  • Tom Mitchell, Machine Learning, Tata McGraw Hill.
  • A Bayesian Approach to Filtering Junk E-Mail,
  • Mehran Sahami, Susan Dumais, David Heckerman, Eric Horvitz;
  • Computer Science Department, Gates Building, Stanford University, Stanford CA; Microsoft Research, Redmond WA; {sdumais, heckerma, horvitz}@microsoft.com
  • Presented at AAAI Workshop on Learning for Text Categorization, July 1998, Madison, Wisconsin.

41
Problem
  • A real-world Bayesian network application: learning to classify text.
  • Instances are text documents.
  • We might wish to learn the target concept "electronic news articles that I find interesting" or "pages on the World Wide Web that discuss data mining topics".
  • In both cases, if a computer could learn the target concept accurately, it could automatically filter the large volume of online text documents to present only the most relevant documents to the user.

42
TECHNIQUE
  • Learning how to classify text, based on the naive Bayes classifier.
  • It is a probabilistic approach and is among the most effective algorithms currently known for learning to classify text documents.
  • The instance space X consists of all possible text documents.
  • We are given training examples of some unknown target function f(x), which can take on any value from some finite set V.
  • We will consider the target function of classifying documents as interesting or uninteresting to a particular person, using the target values like and dislike to indicate these two classes.

43
Design issues
  • How to represent an arbitrary text document in terms of attribute values.
  • How to estimate the probabilities required by the naive Bayes classifier.

44
Approach
  • Our approach to representing arbitrary text documents is disturbingly simple: Given a text document, such as this paragraph, we define an attribute for each word position in the document and define the value of that attribute to be the English word found in that position. Thus, the current paragraph would be described by 111 attribute values, corresponding to the 111 word positions. The value of the first attribute is the word "our", the value of the second attribute is the word "approach", and so on. Notice that long text documents will require a larger number of attributes than short documents. As we shall see, this will not cause us any trouble.

45
ASSUMPTIONS
  • Assume we are given a set of 700 training documents that a friend has classified as dislike and another 300 she has classified as like.
  • We are now given a new document and asked to classify it.
  • Let us assume the new text document is the preceding paragraph.

46
  • We know P(like) = 0.3 and P(dislike) = 0.7 in the current example.
  • P(ai = wk | vj) (here we introduce wk to indicate the kth word in the English vocabulary).
  • Estimating the class-conditional probabilities (e.g., P(ai = "our" | dislike)) is more problematic because we must estimate one such probability term for each combination of text position, English word, and target value.
  • There are approximately 50,000 distinct words in the English vocabulary, 2 possible target values, and 111 text positions in the current example, so we must estimate 2 × 111 × 50,000 ≈ 10 million such terms from the training data.
  • We make an assumption that reduces the number of probabilities that must be estimated.

47
  • We shall assume the probability of encountering a specific word wk (e.g., "chocolate") is independent of the specific word position being considered (e.g., a23 versus a95).
  • We estimate the entire set of probabilities P(a1 = wk|vj), P(a2 = wk|vj), ... by the single position-independent probability P(wk|vj).
  • The net effect is that we now require only 2 × 50,000 distinct terms of the form P(wk|vj).
  • We adopt the m-estimate, with uniform priors and with m equal to the size of the word vocabulary, so P(wk|vj) = (nk + 1) / (n + |Vocabulary|),
  • where n is the total number of word positions in all training examples whose target value is vj, nk is the number of times word wk is found among these n word positions, and |Vocabulary| is the total number of distinct words (and other tokens) found within the training data.

48
Final Algorithm
  • LEARN_NAIVE_BAYES_TEXT(Examples, V)
    Examples is a set of text documents along with their target values. V is the set of all possible target values. This function learns the probability terms P(wk|vj), describing the probability that a randomly drawn word from a document in class vj will be the English word wk. It also learns the class prior probabilities P(vj).
    1. Collect all words, punctuation, and other tokens that occur in Examples:
       Vocabulary ← the set of all distinct words and tokens occurring in any text document from Examples
    2. Calculate the required P(vj) and P(wk|vj) probability terms:
       For each target value vj in V do
         docsj ← the subset of documents from Examples for which the target value is vj
         P(vj) ← |docsj| / |Examples|
         Textj ← a single document created by concatenating all members of docsj
         n ← total number of distinct word positions in Textj
         For each word wk in Vocabulary do
           nk ← number of times word wk occurs in Textj
           P(wk|vj) ← (nk + 1) / (n + |Vocabulary|)
  • CLASSIFY_NAIVE_BAYES_TEXT(Doc)
    Return the estimated target value for the document Doc. ai denotes the word found in the ith position within Doc.
       positions ← all word positions in Doc that contain tokens found in Vocabulary
       Return vNB, where vNB = argmax over vj in V of P(vj) × Π over i in positions of P(ai|vj)
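
Below is a runnable Python sketch of this algorithm. It is a straightforward transcription under the assumption that documents are plain strings tokenized by whitespace; the function names mirror the pseudocode, and log-probabilities are used in classification to avoid numerical underflow (a standard variant of the product above):

```python
import math
from collections import Counter

def learn_naive_bayes_text(examples, V):
    """examples: list of (document_string, target_value) pairs; V: list of target values."""
    vocabulary = {w for doc, _ in examples for w in doc.split()}
    priors, cond = {}, {}
    for vj in V:
        docs_j = [doc for doc, v in examples if v == vj]
        priors[vj] = len(docs_j) / len(examples)
        text_j = " ".join(docs_j).split()          # concatenate all documents of class vj
        n = len(text_j)                            # total word positions in class vj
        counts = Counter(text_j)
        # m-estimate with uniform priors: P(wk | vj) = (nk + 1) / (n + |Vocabulary|)
        cond[vj] = {w: (counts[w] + 1) / (n + len(vocabulary)) for w in vocabulary}
    return vocabulary, priors, cond

def classify_naive_bayes_text(doc, vocabulary, priors, cond):
    words = [w for w in doc.split() if w in vocabulary]    # unseen words are ignored
    def log_score(vj):
        return math.log(priors[vj]) + sum(math.log(cond[vj][w]) for w in words)
    return max(priors, key=log_score)

# Tiny usage example on made-up data:
train = [("great fun interesting article", "like"),
         ("boring dull article", "dislike")]
model = learn_naive_bayes_text(train, ["like", "dislike"])
print(classify_naive_bayes_text("fun interesting read", *model))   # -> "like"
```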

49
  • During learning, the procedure LEARN_NAIVE_BAYES_TEXT examines all training documents to extract the vocabulary of all words and tokens that appear in the text, then counts their frequencies among the different target classes to obtain the necessary probability estimates. Later, given a new document to be classified, the procedure CLASSIFY_NAIVE_BAYES_TEXT uses these probability estimates to calculate vNB according to the equation above. Note that any words appearing in the new document that were not observed in the training set are simply ignored by CLASSIFY_NAIVE_BAYES_TEXT.

50
Effectiveness of the Algorithm
  • Problem: classifying Usenet news articles.
  • Target classification for an article: the name of the Usenet newsgroup in which the article appeared.
  • In the experiment described by Joachims (1996), 20 electronic newsgroups were considered.
  • 1,000 articles were collected from each newsgroup, forming a data set of 20,000 documents. The naive Bayes algorithm was then applied using two-thirds of these 20,000 documents as training examples, and performance was measured over the remaining third.
  • The 100 most frequent words were removed (these include words such as "the" and "of"), and any word occurring fewer than three times was also removed. The resulting vocabulary contained approximately 38,500 words.
  • The accuracy achieved by the program was 89%.

comp.graphics             misc.forsale          soc.religion.christian   alt.atheism
comp.os.ms-windows.misc   rec.autos             talk.politics.guns       sci.space
comp.sys.ibm.pc.hardware  rec.sport.baseball    talk.politics.mideast    sci.crypt
comp.windows.x            rec.motorcycles       talk.politics.misc       sci.electronics
comp.sys.mac.hardware     rec.sport.hockey      talk.religion.misc       sci.med
51
APPLICATIONS
  • A newsgroup posting service that learns to assign documents to the appropriate newsgroup.
  • The NEWSWEEDER system: a program for reading netnews that allows the user to rate articles as he or she reads them. NEWSWEEDER then uses these rated articles (i.e., its learned profile of user interests) to suggest the most highly rated new articles each day.
  • Naive Bayes spam filtering using word-position-based attributes.

52
Thank you !
53
  • Bayesian Learning Networks
  • Approach to
  • Cybercrime Detection

54
Bayesian Learning Networks Approach to Cybercrime Detection
N. S. ABOUZAKHAR, A. GANI and G. MANSON
The Centre for Mobile Communications Research (C4MCR), University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK
N.Abouzakhar@dcs.shef.ac.uk, A.Gani@dcs.shef.ac.uk, G.Manson@dcs.shef.ac.uk
M. ABUITBEL and D. KING
The Manchester School of Engineering, University of Manchester, IT Building, Room IT 109, Oxford Road, Manchester M13 9PL, UK
mostafa.abuitbel@stud.man.ac.uk, David.king@man.ac.uk
55
  • REFERENCES
  • 1. David J. Marchette, Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint, 2001, Springer-Verlag, New York, Inc., USA.
  • 2. Heckerman, D. (1995), A Tutorial on Learning with Bayesian Networks, Technical Report MSR-TR-95-06, Microsoft Corporation.
  • 3. Michael Berthold and David J. Hand, Intelligent Data Analysis: An Introduction, 1999, Springer, Italy.
  • 4. http://www.ll.mit.edu/IST/ideval/data/data_index.html, accessed on 01/12/2002.
  • 5. http://kdd.ics.uci.edu/, accessed on 01/12/2002.
  • 6. Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2000, Morgan Kaufmann, USA.
  • 7. http://www.bayesia.com, accessed on 20/12/2002.

56
Motivation behind the paper..
  • Growing dependence of modern society on telecommunication and information networks.
  • The increase in the number of networks interconnected to the Internet has led to an increase in security threats and cyber crimes.

57
Structure of the paper
  • In order to detect distributed network attacks as early as possible, a probabilistic approach based on Bayesian networks, currently under research and development, has been proposed.

58
Where can this model be utilized
  • Learning agents which deploy the Bayesian network approach are considered a promising and useful tool in determining suspicious early events of Internet threats.

59
Before we look at the details given in the paper, let's understand what Bayesian networks are and how they are constructed.
60
Bayesian Networks
  • A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
  • Syntax:
  • a set of nodes, one per variable
  • a directed, acyclic graph (a link means "directly influences")
  • a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
  • In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
61
Some conventions.
  • Variables are depicted as nodes.
  • Arcs represent probabilistic dependence between variables.
  • Conditional probabilities encode the strength of the dependencies.
  • Missing arcs imply conditional independence.

62
Semantics
  • The full joint distribution is defined as the product of the local conditional distributions:
  • P(X1, ..., Xn) = Πi=1..n P(Xi | Parents(Xi))
  • e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
  • = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
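
To make this concrete, here is a hedged sketch that evaluates the expression using the burglary/alarm network from Russell and Norvig (reference [2]); the CPT numbers below are that textbook's values and are not given on the slide:

```python
# Burglary/alarm CPTs (assumed textbook values, not stated on the slide)
p_b, p_e = 0.001, 0.002
p_a_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}   # P(a | b, e)
p_j_given_a, p_m_given_a = 0.90, 0.70

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
prob = p_j_given_a * p_m_given_a * p_a_given[(False, False)] * (1 - p_b) * (1 - p_e)
print(prob)   # ≈ 0.000628
```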

63
Example of Construction of a BN
64
Back to the discussion of the paper.
65
Description
  • This paper shows how a Bayesian network probabilistically detects communication network attacks, allowing for generalization of Network Intrusion Detection Systems (NIDSs).

66
Goal
  • How well does our model detect or classify attacks and respond to them later on?
  • The system requires the estimation of two quantities:
  • the probability of detection (PD), and
  • the probability of false alarm (PFA).
  • It is not possible to simultaneously achieve a PD of 1 and a PFA of 0.

67
Input DataSet
  • The 2000 DARPA Intrusion Detection Evaluation Program, prepared and managed by MIT Lincoln Labs, provided the necessary dataset.
  • Sample dataset:

68
Construction of the network
  • The following figure shows the Bayesian
  • network that has been automatically
  • constructed by the learning algorithms of
  • BayesiaLab.
  • The target variable, activity_type, is directly
  • connected to the variables that heavily
  • contribute to its knowledge such as service
  • and protocol_type.

69
(No Transcript)
70
Data Gathering
  • MIT Lincoln Labs set up an environment to acquire several weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The generated raw dataset contains a few million connection records.

71
Mapping the simple Bayesian Network that we saw
to the one used in the paper
72
Observation 1
  • As shown in the next figure, the most probable activity corresponds to a smurf attack (52.90%), an ecr_i (ECHO_REPLY) service (52.96%) and an icmp protocol (53.21%).

73
(No Transcript)
74
Observation 2
  • What would happen if the probability of receiving ICMP protocol packets were increased? Would the probability of having a smurf attack increase?
  • Setting the protocol to its ICMP value increases the probability of having a smurf attack from 52.90% to 99.37%.

75
(No Transcript)
76
Observation 3
  • Let's look at the problem from the opposite direction. If we set the probability of a portsweep attack to 100%, then the values of some associated variables would inevitably vary.
  • We note from Figure 4 that the probabilities of the TCP protocol and private service have increased from 38.10% to 97.49% and from 24.71% to 71.45% respectively. Also, we can notice an increase in the REJ and RSTR flags.

77
(No Transcript)
78
How do the previous examples work? PROPAGATION
79
Benefits of the Bayesian Model
  • The benefit of using Bayesian IDSs is the ability to adjust our IDS's sensitivity.
  • This allows us to trade off between accuracy and sensitivity.
  • Furthermore, the automatic detection of network anomalies by learning allows distinguishing normal activities from abnormal ones.
  • It allows network security analysts to see the amount of information being contributed by each variable in the detection model to the knowledge of the target node.
80
Performance evaluation
81
Thank you !
QUESTIONS OR QUERIES