Artificial Immune System and Its Applications - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Artificial Immune System and Its Applications
  • Prof. Ying TAN
  • National Laboratory on Machine Perception
  • Department of Intelligence Science
  • Peking University, Beijing 100871, P.R.China

2
Contents
  • Biological Immune System
  • Artificial Immune System
  • Basic Algorithms of AIS
  • AIS design procedure
  • Case Studies
  • Malicious Executable Detection
  • Film Recommender
  • New
  • Immunocomputing (IC)
  • Danger Theory
  • Future

3
The Immune System is
Immune system: a system that protects the body
from foreign substances and pathogenic organisms
by producing the immune response
  • Immunity: the state or quality of being resistant
    (immune), either by virtue of previous exposure
    (adaptive immunity) or as an inherited trait
    (innate immunity)

4
Why the Immune System?
  • The immune system has the following appealing features
  • Recognition
  • Anomaly detection
  • Noise tolerance
  • Robustness
  • Feature extraction
  • Diversity
  • Reinforcement learning
  • Memory
  • Dynamically changing coverage
  • Distributed
  • Multi-layered
  • Adaptive

5
Role of Biological Immune System
  • Protect our bodies from pathogens and viruses
  • Primary immune response
  • Launch a response to invading pathogens
  • Secondary immune response
  • Remember past encounters
  • Faster response the second time around

6
Immune cells
  • There are two primary types of lymphocytes
  • B-lymphocytes (B cells)
  • T-lymphocytes (T cells)
  • Other types include macrophages, phagocytic
    cells, cytokines, etc.

7
Where is it?
8
Multiple layers of the immune system
9
Antigen
  • Substances capable of starting a specific immune
    response commonly are referred to as antigens
  • This includes some pathogens such as viruses,
    bacteria, fungi, etc.

10
Biological Immune System
11
How does the IS work? A simplistic view
12
Self/Non-Self Recognition
  • Immune system needs to be able to differentiate
    between self and non-self cells
  • Antigenic encounters may result in cell death,
    therefore
  • Some kind of positive selection
  • Some element of negative selection

13
Immune Pattern Recognition
  • The immune recognition is based on the
    complementarity between the binding region of the
    receptor and a portion of the antigen called
    epitope.
  • Antibodies present a single type of receptor,
    while antigens might present several epitopes.
  • This means that each antibody can recognize a
    single antigen

14
Clonal Selection
15
Main Properties of Clonal Selection (Burnet, 1978)
  • Elimination of self antigens
  • Proliferation and differentiation on contact of
    mature lymphocytes with antigen
  • Restriction of one pattern to one differentiated
    cell and retention of that pattern by clonal
    descendants
  • Generation of new random genetic changes,
    subsequently expressed as diverse antibody
    patterns by a form of accelerated somatic mutation

16
Immune Network Theory
  • Idiotypic network (Jerne, 1974)
  • B cells co-stimulate each other
  • Treat each other a bit like antigens
  • Creates an immunological memory

17
Reinforcement Learning and Immune Memory
  • Repeated exposure to an antigen throughout a
    lifetime
  • Primary, secondary immune responses
  • Remembers encounters
  • No need to start from scratch
  • Memory cells
  • Continuous learning

18
Learning (2)
19
Immune System Summary
Back
  • Distinguish host (body cells) from external entities.
  • When an entity is recognized as foreign (or
    dangerous), several defense mechanisms are
    activated, leading to its destruction (or
    neutralization).
  • Subsequent exposure to similar entity results in
    rapid immune response.
  • Overall behavior of the immune system is an
    emergent property of many local interactions.

20
Immune metaphors
Back
(Diagram: the immune system provides the idea/metaphor that leads to
Artificial Immune Systems and other application areas)
21
What is an Artificial Immune System?
Definition
  • [Dasgupta, 1999] Artificial immune systems (AIS) are
    intelligent and adaptive systems inspired by the
    immune system toward real-world problem solving

[de Castro and Timmis, 2002] Artificial Immune Systems
(AIS) are adaptive systems, inspired by
theoretical immunology and observed immune
functions, principles and models, which are
applied to problem solving
http://www.cs.kent.ac.uk/people/staff/jt6/aisbook/
  • Using the natural immune system as a metaphor for
    solving complex computational problems.
  • Not modelling the immune system

22
AI models and their corresponding natural
prototypes
Natural prototype     | Biological level         | AI model
Natural language      | Left hemisphere of brain | Formal logic / Formal linguistics
Brain nervous net     | Cells                    | Neural computing (NC) / Neural networks (NN)
Biological cells      | Cells                    | Cellular automata (CA)
Molecules of proteins | Molecular                | Artificial immune systems (AIS)
Genetic code          | Molecular                | Genetic Algorithms (GA)
23
Some History
  • Developed from the field of theoretical
    immunology in the mid 1980s, which suggested we
    might look at the IS for computational ideas
  • 1990: Bersini, first use of immune algorithms to
    solve problems
  • Mid 1990s: Forrest et al., computer security
  • Mid 1990s: Hunt et al., machine learning
  • More

24
AIS Scope
  • Pattern recognition
  • Fault and anomaly detection
  • Data analysis
  • Data mining (classification/clustering)
  • Agent-based systems
  • Scheduling
  • Machine-learning
  • Autonomous navigation and control
  • Search and optimization methods
  • Artificial life
  • Security of information systems
  • Optimization
  • Just to name a few.

25
Typical Applications of AIS
Back
  • Computer Security (Forrest 94, 96, 98; Kephart 94;
    Lamont 98, 01, 02; Dasgupta 99, 01;
    Bentley 00, 01, 02)
  • Anomaly Detection (Dasgupta 96, 01, 02)
  • Fault Diagnosis (Ishida 92, 93; Ishiguro 94)
  • Data Mining & Retrieval (Hunt 95, 96;
    Timmis 99, 01, 02)
  • Pattern Recognition (Forrest 93; Gibert 94; de
    Castro 02)
  • Adaptive Control (Bersini 91)
  • Job shop Scheduling (Hart 98, 01, 02)
  • Chemical Pattern Recognition (Dasgupta 99)
  • Robotics (Ishiguro 96, 97; Singh 01)
  • Optimization (de Castro 99; Endo 98; de Castro 02)
  • Web Mining (Nasaroui 02; Secker 05)
  • Fault Tolerance (Tyrrell 01, 02; Timmis 02)
  • Autonomous Systems (Varela 92; Ishiguro 96)
  • Engineering Design Optimization (Hajela 96, 98;
    Nunes 00)

26
Basic Immune Models and Algorithms
  • Bone Marrow Models
  • Negative Selection Algorithms
  • Clonal Selection Algorithm
  • Immune Network Models
  • Somatic Hypermutation

27
Bone Marrow Models
  • Gene libraries are used to create antibodies from
    the bone marrow
  • Antibody production through a random
    concatenation from gene libraries
  • Simple or complex libraries

28
Negative Selection (NS) Algorithms
  • Forrest (1994): idea taken from the negative
    selection of T-cells in the thymus
  • Applied initially to computer security
  • Split into two parts (a sketch follows below)
  • Censoring
  • Monitoring

(Flow charts: the censoring phase and the monitoring phase)
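A minimal Python sketch of these two phases, assuming binary self strings
and the classic r-contiguous-bits matching rule (the string length, r and
the number of detectors are illustrative choices, not taken from the slides):

```python
import random

def r_contiguous_match(a, b, r):
    """True if bit strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def censor(self_set, n_detectors, length, r):
    """Censoring: keep only random candidate detectors that match no self string."""
    detectors = []
    while len(detectors) < n_detectors:
        d = ''.join(random.choice('01') for _ in range(length))
        if not any(r_contiguous_match(d, s, r) for s in self_set):
            detectors.append(d)
    return detectors

def monitor(detectors, sample, r):
    """Monitoring: a sample is flagged as non-self if any detector matches it."""
    return any(r_contiguous_match(d, sample, r) for d in detectors)

self_set = ['0000000000', '0000011111']
dets = censor(self_set, 20, 10, 4)
print(monitor(dets, '1111100000', 4))   # likely True: looks nothing like self
print(monitor(dets, '0000000000', 4))   # False by construction: a self string
```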
29
Clonal Selection Algorithm (de Castro & Von
Zuben, 2001)
  • 1. Initialisation: randomly initialise a
    population (P)
  • 2. Antigenic Presentation: for each pattern in
    Ag, do
  • 2.1 Antigenic binding: determine the affinity of
    each member of P
  • 2.2 Affinity maturation: select the n highest-
    affinity members of P, clone and mutate them in
    proportion to their affinity with Ag, then add
    the new mutants to P
  • 3. Metadynamics
  • 3.1 select the highest-affinity members of P to
    form part of M
  • 3.2 replace n members of P with new random ones
  • 4. Cycle: repeat 2 and 3 until a stopping
    criterion is met (e.g. max generations); a sketch
    follows below
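A compact Python sketch of these steps for a single binary antigen; the
population size, clone counts and the rank-based mutation schedule below
are illustrative assumptions rather than values from the slides:

```python
import random

L, N = 16, 20                      # antibody length, population size

def affinity(ab, ag):
    """Affinity = number of matching bits between antibody and antigen."""
    return sum(a == g for a, g in zip(ab, ag))

def mutate(ab, rate):
    """Somatic hypermutation: flip each bit with probability `rate`."""
    return [b ^ (random.random() < rate) for b in ab]

def clonalg(ag, generations=50, n=5, clones=4, d=3):
    """Minimal clonal-selection loop for one antigen (pattern recognition flavour)."""
    P = [[random.randint(0, 1) for _ in range(L)] for _ in range(N)]   # 1. init
    for _ in range(generations):
        P.sort(key=lambda ab: affinity(ab, ag), reverse=True)
        mutants = []
        for rank, ab in enumerate(P[:n]):          # 2.2 select n best, clone them
            rate = 0.5 * (rank + 1) / n            # better antibodies mutate less
            mutants += [mutate(ab, rate) for _ in range(clones)]
        P = sorted(P + mutants, key=lambda ab: affinity(ab, ag), reverse=True)[:N]
        P[-d:] = [[random.randint(0, 1) for _ in range(L)]
                  for _ in range(d)]               # 3.2 metadynamics: fresh members
    return P[0]

ag = [random.randint(0, 1) for _ in range(L)]
print(affinity(clonalg(ag), ag), 'of', L)          # usually converges to L
```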

30
CLONALG for PR, Learning, Optimization
(Block diagram of CLONALG: an antigen Agj is presented to the antibody
repertoire (memory part Abm plus remaining part Abr); affinities fj are
evaluated, the n highest-affinity antibodies are selected and cloned (Cj),
the clones are mutated and re-selected, and low-affinity antibodies are
replaced by new ones (Abd).)

L.N. de Castro and F.J. Von Zuben, Learning and optimization
using the clonal selection principle, IEEE Trans.
Evolutionary Computation, vol. 6, no. 3, June 2002,
pp. 239-251
31
Discrete Immune Network Models (Timmis & Neal,
2001)
  • 1. Initialisation: create an initial network from a
    sub-section of the antigens
  • 2. Antigenic presentation: for each antigenic
    pattern, do
  • 2.1 Clonal selection and network interactions:
    for each network cell, determine its stimulation
    level (based on antigenic and network interaction)
  • 2.2 Metadynamics: eliminate network cells with a
    low stimulation
  • 2.3 Clonal expansion: select the most stimulated
    network cells and reproduce them proportionally
    to their stimulation
  • 2.4 Somatic hypermutation: mutate each clone
  • 2.5 Network construction: select mutated clones
    and integrate them into the network
  • 3. Cycle: repeat step 2 until the termination
    condition is met

32
Immune Network Models
  • Timmis & Neal, 2000
  • Used immune network theory as a basis, proposed
    the AINE algorithm (a sketch follows below)

Initialize AIN
For each antigen:
    Present antigen to each ARB in the AIN
    Calculate ARB stimulation level
    Allocate B cells to ARBs, based on stimulation level
    Remove weakest ARBs (ones that do not hold any B cells)
    If termination condition met, exit
    else Clone and mutate remaining ARBs
         Integrate new ARBs into AIN
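A rough Python sketch in the spirit of this loop, with two simplifications:
ARB stimulation is just the inverse mean distance to the antigens (network
interactions are omitted), and the B-cell resource limit is approximated by
culling ARBs whose stimulation falls below the population average:

```python
import random, math

def stimulation(arb, antigens):
    """Stimulation of an ARB: inverse of its mean distance to the antigens."""
    d = sum(math.dist(arb, ag) for ag in antigens) / len(antigens)
    return 1.0 / (1.0 + d)

def aine_like(antigens, n_arbs=10, iters=30, mut=0.05):
    """Simplified AINE-flavoured loop: stimulate, cull weak ARBs, clone and mutate."""
    dim = len(antigens[0])
    arbs = [[random.random() for _ in range(dim)] for _ in range(n_arbs)]
    for _ in range(iters):
        stims = [stimulation(a, antigens) for a in arbs]
        avg = sum(stims) / len(stims)
        # resource limitation, simplified: below-average ARBs hold no B cells and die
        arbs = [a for a, s in zip(arbs, stims) if s >= avg]
        # clone and mutate the survivors, then integrate the clones into the network
        arbs += [[x + random.gauss(0, mut) for x in a] for a in arbs]
    return arbs

net = aine_like([[0.2, 0.2], [0.8, 0.8]])
print(len(net), 'ARBs, first:', [round(x, 2) for x in net[0]])
```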
33
Immune Network Models
  • de Castro & Von Zuben (2000c)
  • aiNet, based on similar principles

At each iteration step do
    For each antigen do
        Determine affinity to all network cells
        Select n highest-affinity network cells
        Clone these n selected cells
        Increase the affinity of the cells to the antigen by reducing the
            distance between them (greedy search)
        Calculate the improved affinity of these n cells
        Re-select a number of improved cells and place them into matrix M
        Remove cells from M whose affinity is below a set threshold
        Calculate cell-cell affinity within the network
        Remove cells from the network whose affinity is below a certain threshold
        Concatenate the original network and M to form the new network
    Determine whole-network inter-cell affinities and remove all those below
        the set threshold
    Replace r of the worst individuals by novel randomly generated ones
    Test the stopping criterion
34
Somatic Hypermutation
Back
  • Mutation rate in proportion to affinity
  • Very controlled mutation in the natural immune
    system
  • Trade-off between the normalized antibody
    affinity D* and its mutation rate α
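The garbled symbol above is the mutation rate, often written α; one common
form of the trade-off, used in CLONALG (de Castro & Von Zuben, 2002), is an
exponential decay with the normalized affinity, where ρ is a decay constant:

```latex
% mutation rate decreases as the normalized affinity D* increases
\alpha(D^{*}) = \exp(-\rho \, D^{*}), \qquad D^{*} \in [0, 1]
```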

35
General Framework of AIS
(Layered framework, from bottom to top: Application Domain / Problem →
Representation → Affinity Measures → Immune Algorithms → Solution)
36
Representation Shape Space
  • Describe the general shape of a molecule
  • Describe interactions between molecules
  • Degree of binding between molecules

37
Representation
  • Vectors
  • Ab = ⟨Ab1, Ab2, ..., AbL⟩
  • Ag = ⟨Ag1, Ag2, ..., AgL⟩
  • Real-valued shape-space
  • Integer shape-space
  • Binary shape-space
  • Symbolic shape-space

38
Define their Interaction
  • Define the term Affinity
  • Affinity is related to distance
  • Euclidean
  • Other distance measures such as Hamming,
    Manhattan, etc.
  • Affinity Threshold (a sketch follows below)
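A small Python sketch of these measures; affinity is treated as the inverse
of distance, so an antibody recognizes an antigen when their distance falls
within the affinity threshold (the thresholds below are arbitrary):

```python
import math

def euclidean(ab, ag):
    """Real-valued shape-space distance."""
    return math.dist(ab, ag)

def manhattan(ab, ag):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - g) for a, g in zip(ab, ag))

def hamming(ab, ag):
    """For binary/symbolic shape-spaces: number of differing positions."""
    return sum(a != g for a, g in zip(ab, ag))

def recognizes(ab, ag, distance=hamming, threshold=2):
    """Recognition occurs when the distance is within the affinity threshold."""
    return distance(ab, ag) <= threshold

print(recognizes([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))        # True: Hamming distance 1
print(recognizes([0.1, 0.9], [0.8, 0.2], euclidean, 0.5))  # False: distance ~0.99
```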

39
Shape Space Formalism
  • Repertoire of the immune system is complete
    (Perelson, 1989)
  • Extensive regions of complementarity
  • Some threshold of recognition

(Figure: in a shape space of volume V, each antibody recognizes the antigens
falling within a small region of volume Vε around its complement)
40
AIS Design
Back
  • Problem description
  • Deciding the immune principles used for problem
    solving
  • Engineering the AIS
  • Defining the types of immune components used
  • Defining the representation for the elements of
    the AIS
  • Applying immune principle to problem solving
  • The meta-dynamics of an AIS
  • Reverse mapping from AIS to the real problem

41
Case Studies of AIS
Back
  • Malicious Executables Detection --- from
    Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based
    Malicious Executables Detection Algorithm based
    on Immune Principles, F. Yin, J. Wang, C. Guo
    (Eds.), ISNN 2004, Springer, Lecture Notes in
    Computer Science 3174, pp. 675-680, 2004.
    (http://dblp.uni-trier.de)
  • Film Recommender --- from Dr. Uwe Aickelin
    (http://www.aickelin.com), University of
    Nottingham, U.K., 2004

42
Immunocomputing -- IC
New!
  • By Tarakanov, A., 2001.
  • Aims:
  • A proper mathematical framework
  • A new kind of computing
  • A new kind of hardware
  • New concepts:
  • Formal protein (FP) ------- vs. neuron
  • Formal immune network (FIN) ------- vs. NN
  • A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova,
    Immunocomputing: Principles and Applications,
    Springer, 2003.

Refer to
43
Problems of Traditional Self/Non-self View
  • No reaction to foreign bacteria in gut (friendly
    bacteria).
  • No reaction to food / air / etc.
  • The human body changes over its life.
  • Auto-immune diseases.
  • How do we produce antibodies that react against
    antigens and yet avoid self?
  • Is it necessary to attack all non-self or a
    specific self?

44
The Danger Theory
New!
  • In the danger model, the idea is to recognise
    danger rather than non self.
  • The screening is accomplished post production
    through an external danger signal. Thus the
    production of autoreactive antibodies (which
    react to self) is allowed.
  • If an (e.g. autoreactive) antibody matches a
    stimulus in the absence of danger, it is removed.
    Thus harmless antigens are tolerated, and
    changing self accommodated.

Matzinger (2002), The Danger Model: A Renewed
Sense of Self, Science 296: 301-304.
45
Danger Theory (cont)
  • Danger Theory
  • Not self/non-self but danger/non-danger
  • Immune response is initiated in the tissues: the
    Danger Zone.
  • This makes it context dependent
  • Matzinger (2002), The Danger Model: A Renewed
    Sense of Self, Science 296: 301-304
  • Aickelin & Cayzer (2002), The Danger Theory and
    Its Application to Artificial Immune Systems,
    Proc. International Conference on AIS (ICARIS
    2002)

46
Danger Zone
47
Towards a dangerous IDS
The danger theory suggests that the immune
system reacts to threats based on the correlation
of various (danger) signals, providing a method
of grounding the immune response, i.e. linking
it directly to the attacker.
Aickelin U, Bentley P, Cayzer S, Kim J and McLeod
J (2003), Danger Theory: The Link between AIS
and IDS?, Proceedings ICARIS-2003, 2nd
International Conference on Artificial Immune
Systems, LNCS 2787, pp. 147-155
48
Other ways of using danger
Danger = crime, antigen = suspect, or... danger =
context?
It could also be useful for data mining, where
the danger signal is a proxy measure of interest.
The Danger Zone can be spatial or temporal.
Andrew Secker, Alex Freitas, and Jon Timmis
(2005), Towards a danger theory inspired
artificial immune system for web mining, in A.
Scime, editor, Web Mining: Applications and
Techniques, pages 145-168 (Idea Group)
49
Some Recent Applications of Danger Theory
Back
  • Anjum Iqbal, Mohd Aizaini Maarof, Danger Theory
    and Intelligent Data Processing, International
    Journal of Information Technology, Vol. 1, No. 1,
    2004.
  • Andrew Secker, Alex A. Freitas, and Jon Timmis,
    A Danger Theory Inspired Approach to Web Mining,
    Computing Lab., University of Kent, Canterbury,
    Kent, UK, 2005.
  • And so on.

50
The Future
  • More formal approach required?
  • Wide possible application domains.
  • What makes the immune system unique?
  • More work with immunologists
  • Danger theory.
  • Idiotypic Networks.
  • Self-Assertion.

51
Reference for further reading
  • Books
  • Artificial Immune Systems and Their Applications,
    by Dipankar Dasgupta (Editor), Springer Verlag,
    January 1999.
  • L.N. de Castro and J. Timmis, Artificial Immune
    Systems: A New Computational Intelligence
    Approach, Springer, 2002.
  • A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova,
    Immunocomputing: Principles and Applications,
    Springer, 2003.
  • Related academic papers
  • J. Timmis, P. Bentley, and E. Hart (Eds.),
    Artificial Immune Systems, Proceedings of the Second
    International Conference, ICARIS 2003, Edinburgh,
    UK, September 2003, LNCS 2787, Springer.

52
New Events
  • Special Session on Artificial Immune Systems at
    the Congress on Evolutionary Computation (CEC),
    December 8-12, 2003, Canberra, Australia.
  • Special Session on Immunity-Based Systems at the
    Seventh International Conference
    on Knowledge-Based Intelligent Information and
    Engineering Systems (KES), September 3-5, 2003,
    University of Oxford, UK.
  • Second International Conference on Artificial
    Immune Systems (ICARIS), September 1-3, 2003,
    Napier University, Edinburgh, UK.
  •  Tutorial on Artificial Immune Systems at 1st
    Multidisciplinary International Conference on
    Scheduling Theory and Applications (MISTA), 12
    August 2003, The University of Nottingham, UK.
  •  Tutorial on Immunological Computation at
    International Joint Conference on Artificial
    Intelligence (IJCAI), August 10, 2003, Acapulco,
    Mexico.
  •  Special Track on Artificial Immune Systems at
    Genetic and Evolutionary Computation Conference
    (GECCO), Chicago, USA, July 12-16, 2003

53
AIS Resources
  • Artificial Immune Systems and Their Applications
    by D Dasgupta (Editor), Springer Verlag, 1999.
  • Artificial Immune Systems: A New Computational
    Intelligence Approach, by L. de Castro and J. Timmis,
    Springer Verlag, 2002.
  • Immunocomputing: Principles and Applications, by A.
    Tarakanov et al., Springer Verlag, 2003.
  • Third International Conference on Artificial
    Immune Systems (ICARIS), September 13-16, 2004,
    University of Catania, Italy.
  • 4th International Conference on Artificial Immune
    Systems (ICARIS), 14th-17th August, 2005, Banff,
    Alberta, Canada

54
That's all
First Page
55
Malicious Executables Detection Based on
Artificial Immune Principles
Case Study 1
From Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based
Malicious Executables Detection Algorithm based
on Immune Principles, F. Yin, J. Wang, C. Guo
(Eds.), ISNN 2004, Springer, Lecture Notes in
Computer Science 3174, pp. 675-680, 2004.
(http://dblp.uni-trier.de)
  • This work was supported by the Natural Science
    Foundation of China under Grant No. 60273100.

56
Outline
  • Definition of Terms
  • Goal and Motivation
  • Previous Research works
  • Immune Principle for Malicious Executable
    Detection
  • Malicious Executable Detection Algorithm
  • Experiments and Discussion
  • Concluding Remarks

57
Definition of Terms
Back
  • Malicious Executable
  • is generally defined as a program that has
    some malicious function, such as compromising a
    system's security, damaging a system or obtaining
    sensitive information without the permission of
    users. It includes viruses, trojan horses, worms, etc.
  • Benign Executable
  • is a normal program without any malicious
    function.

58
Tens of thousands of new malicious executables
(DOS/Win32 viruses, trojan horses, worms,
e-mail-attached viruses) appear every year and
target computers and information systems. But
current antivirus systems attempt to detect these
new malicious programs with heuristics crafted by
hand (costly and ineffective).
Current task: devise new methods for detecting
new ME
59
Definition of Symbols and Structures
Back
B: binary code alphabet, B = {0, 1}.
Seq(s, k, l): short-sequence cutting operation. Supposing s is a
binary sequence, s = b(0)b(1)...b(n-1), b(i) ∈ B, then
Seq(s, k, l) = b(k)b(k+1)...b(k+l-1).
E(k): executable set, k ∈ {m, b}, where m denotes malicious
executables and b benign executables. E: whole set of
executables, i.e., E = E(m) ∪ E(b).
e(fj, n): executable as a binary sequence of length n, where
fj is the executable identifier.
ld: detector code length. lstep: step size of detector generation.
dl: detector, dl = Seq(s, k, l). Dl: set of detectors with code
length l, i.e., Dl = {dl(0), dl(1), ..., dl(nd-1)}, |Dl| = nd.
60
Goal and Motivation
Back
  • Aiming to develop an automatic detection
    approach for new malicious executables.
  • Aiming to use artificial immune systems
    (AIS) and artificial neural networks (ANN) to
    detect malicious executables with a higher
    Detection Rate (DR) and lower False Positive Rate
    (FPR) than other methods.

61
Previous Related Works
Back
  • Signature-based Methods
  • Expert Knowledge-based Methods
  • Machine Learning Methods

62
Signature-based Methods
Back
  • It creates a unique tag for each malicious
    program so that future examples of it can be
    correctly classified with a small error rate, and
    relies on signatures of known malicious
    executables to generate detection models.
  • Drawbacks
  • Cannot detect unknown and mutated viruses.
  • As the number and types of viruses increase,
    detection speed slows down dramatically. At the
    same time, the analysis of virus signatures
    becomes very difficult, in particular for
    encrypted signatures.
  • (refer to the IBM Anti-virus Group's report: R.W. Lo,
    K.N. Levitt, and R.A. Olsson, MCF: a Malicious
    Code Filter, Computers & Security,
    14(6): 541-566, 1995.)

63
Expert Knowledge-based Methods
Back
  • Using the knowledge of a group of virus experts
    to construct heuristic classifiers for detection
    of unknown viruses.
  • Drawbacks
  • Time-consuming analysis method.
  • Can only discover some unknown viruses, and its
    false detection rate is very high.
  • For detecting unknown viruses based on ANN, the
    IBM Anti-virus Group also proposed a method to
    detect boot-sector viruses only.
  • (refer to W. Arnold and G. Tesauro, Automatically
    Generated Win32 Heuristic Virus Detection,
    Proceedings of the 2000 International Virus
    Bulletin Conference, 2000.)

64
Machine Learning Methods
Back
  • M.G. Schultz developed a framework that used data
    mining algorithms, i.e., the Multi-Naïve Bayes
    method, to train multiple classifiers on a set of
    malicious and benign executables to detect new
    examples (unknown ME).
  • (refer to M.G. Schultz, E. Eskin, and E. Zadok,
    Data Mining Methods for Detection of New
    Malicious Executables, IEEE Symposium on Security
    and Privacy, May 2001.)

65
Biologically-motivated Information Processing
Systems
  • Brain/nervous systems: Neural Networks (NN)
  • Genetic systems: Genetic Algorithms (GA)
  • Immune systems: Artificial Immune Systems (AIS),
    or immunological computation
  • NN and GA have been extensively studied with wide
    applications, but AIS has relatively few applications

66
Natural prototypes vs. their models
Natural prototype     | Biological level         | Computing model
Natural language      | Left hemisphere of brain | Formal logic / Formal linguistics
Brain nervous net     | Cells                    | Artificial Neural Networks (ANN)
Biological cells      | Cells                    | Cellular automata (CA)
Molecules of proteins | Molecular                | Artificial immune systems (AIS)
Genetic code          | Molecular                | Genetic Algorithms (GA)
67
Comparison of Three Algorithms
                               | GA (Optimisation)                       | NN (Classification)                   | AIS
Components                     | Chromosome strings                      | Artificial neurons                    | Attribute strings
Location of components         | Dynamic                                 | Pre-defined                           | Dynamic
Structure                      | Discrete components                     | Networked components                  | Discrete components / networked components
Knowledge storage              | Chromosome strings                      | Connection strengths                  | Component concentration / network connections
Dynamics                       | Evolution                               | Learning                              | Evolution / learning
Meta-dynamics                  | Recruitment / elimination of components | Construction / pruning of connections | Recruitment / elimination of components
Interaction between components | Crossover                               | Network connections                   | Recognition / network connections
Interaction with environment   | Fitness function                        | External stimuli                      | Recognition / objective function
68
Immune Principles for Malicious Executable
Detection
Back
  • Non-self Detection Principle
  • Anomaly Detection Based on Thickness
  • The Diversity of Detector Representation vs.
    Anomaly Detection Hole

69
Non-self Detection Principle
  • In the natural immune system, all cells of the
    body are categorized into two types: self and
    non-self. The immune process detects non-self
    among these cells.
  • To realize non-self detection, the maturation
    process of T lymphocytes undergoes two selection
    stages, positive selection and negative selection,
    since antigenic encounters may result in cell
    death. Inspired by these two stages, some computer
    scientists have proposed algorithms to detect
    anomalous information. Here, we use the Positive
    Selection Algorithm (PSA) to perform non-self
    detection for recognizing malicious executables.
70
Non-self Detection by PSA
Back
Process of anomaly detection with PSA
71
Anomaly Detection Based on Thickness
Back
  • Anomaly recognition is the process in which
    immune cells detect antigens and are activated.
  • The activation threshold of immune cells is
    determined by the thickness of immune cells
    matching antigens.

72
The Diversity of Detector Representation vs.
Anomaly Detection Hole
  • The main difficulty of anomaly detection is
    reducing the anomaly detection hole as much as
    possible. The natural immune system resolves this
    problem well by use of the diversity of MHC (Major
    Histocompatibility Complex) representations,
    which decides the diversity of antibodies
    presented on the surface of T cells. This property
    is very useful for increasing the power of
    detecting mutated antigens and decreasing the
    anomaly detection hole.
  • According to this principle, we can use the
    diversity of detector representations to decrease
    the anomaly detection hole, as illustrated by the
    following schematic drawings.

73
Schematic diagram of abnormal detection holes
(cont)
74
Reduction of abnormal detection holes by use of
the diversity of detector representations
back
Combination of detectors
75
Malicious Executable Detection Algorithm (MEDA)
  • MEDA, based on AIS, includes three parts:
  • Detector generation,
  • Anomaly information extraction,
  • and Classification.

76
Flow Chart of Malicious Executable Detection
Algorithm (MEDA)
Back
MEDA
77
Generation of Detector Set
  • Detector generation algorithm
  • Begin: initialize lstep, ld; k = 0
  •   Do: cut e(fk, n) from Eg(b)
  •     i = 0
  •     While i < n - ld - 1 do
  •     Begin
  •       d = Seq(e(fk, n), i, ld)
  •       if d ∉ Dld then Dld ← Dld ∪ {d}
  •       i = i + lstep
  •     End
  •     k = k + 1
  •   Until Eg(b) is empty
  •   Return Dld
  •   End
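A Python sketch of this generation loop, working on byte-sized windows for
readability (the paper works at the bit level, e.g. ld = 24 bits with
lstep = 8 bits); the toy file contents below are made up:

```python
def seq(s, k, l):
    """Seq(s, k, l): the window of sequence s of length l starting at position k."""
    return s[k:k + l]

def generate_detectors(benign_files, ld=3, lstep=1):
    """Build detector set D_ld from the benign 'gene' files Eg(b)."""
    D = set()
    for data in benign_files:            # data: raw bytes of one benign executable
        i = 0
        while i < len(data) - ld + 1:
            D.add(seq(data, i, ld))      # store the window if not already present
            i += lstep
    return D

# toy usage: 3-byte windows, step of 1 byte
benign = [b'\x56\x32\x12\x0a\x34\xed\xff\x00',
          b'\x00\x0a\x34\xed\xff\xfa\x11\x00']
D = generate_detectors(benign)
print(len(D), 'detectors;', b'\x0a\x34\xed' in D)   # True: window occurs in both files
```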

78
Illustration of Detector Generating Process
Back
File hex sequence: 56 32 12 0A 34 ED FF 00 2D ...
... 00 0A 34 ED FF FA 11 00
Extracted detectors: 56 32 12, 32 12 0A, 12 0A 34, ..., FF FA 11, FA 11 00
Generating process of 24-bit detectors with 8-bit
step size (ld = 24, lstep = 8)
79
Extraction of Anomaly Characteristics -- Non-self
Thickness (NST)
  • Non-self Detection
  • NST, as an anomaly property, is defined as the
    ratio of the number of non-self units to the total
    number of units in the file binary sequence:
    pl = nn / (nn + ns).
  • If there are m kinds of detectors, the file has an
    NST vector P = (pl1, pl2, ..., plm)^T.

80
NST Extraction Diagram
81
NST Extraction Algorithm
Back
  • Begin: open e(fk, n)
  •   Select lstep, ld
  •   Set ns = 0, nn = 0, i = 0
  •   While i < n - ld - 1 do
  •   Begin
  •     s = Seq(e(fk, n), i, ld)
  •     if s ∉ Dld then nn = nn + 1
  •     else ns = ns + 1
  •     i = i + lstep
  •   End
  •   pld = nn / (ns + nn)
  •   Return pld
  •   End
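A matching Python sketch of NST extraction; the small detector set D below
is a made-up stand-in for a D_ld built from benign 'gene' files by the
previous algorithm:

```python
# toy detector set: 3-byte windows taken from benign 'gene' files (made up)
D = {b'\x56\x32\x12', b'\x32\x12\x0a', b'\x12\x0a\x34',
     b'\x0a\x34\xed', b'\x34\xed\xff'}

def nst(data, D, ld=3, lstep=1):
    """Non-Self Thickness: fraction of windows of `data` not found in detector set D."""
    ns = nn = 0
    i = 0
    while i < len(data) - ld + 1:
        if data[i:i + ld] in D:
            ns += 1            # self unit: the window occurs in the benign gene set
        else:
            nn += 1            # non-self unit
        i += lstep
    return nn / (ns + nn) if ns + nn else 0.0

def nst_vector(data, detector_sets):
    """NST vector over several (detector set, ld) pairs (input to the BP classifier)."""
    return [nst(data, D_l, ld) for D_l, ld in detector_sets]

suspicious = b'\x90\x90\xeb\xfe\x0a\x34\xed\xff'
print(round(nst(suspicious, D), 2))    # 0.67: most windows never occur in benign code
```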

82
BP Network Classifier
  • We use the Anomaly Property Vector (APV), i.e.,
    the NST vector P, as the input of a two-layer BP
    network classifier. The number of input-layer
    nodes equals the APV's dimension.
  • The sigmoid transfer function is chosen for the
    hidden layer and a linear function for the output
    layer (a sketch follows below).
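A minimal sketch of such a classifier, assuming scikit-learn is available;
MLPRegressor provides the sigmoid ('logistic') hidden layer and identity
(linear) output described above, and the NST vectors and labels below are
invented toy numbers, not data from the paper:

```python
# pip install scikit-learn
from sklearn.neural_network import MLPRegressor

# toy NST vectors (p24, p32, p64, p96) and labels: 1 = malicious, 0 = benign
X = [[0.95, 0.90, 0.99, 1.00],   # malicious-looking: almost all windows are non-self
     [0.92, 0.88, 0.97, 0.99],
     [0.15, 0.20, 0.35, 0.40],   # benign-looking: most windows seen in the gene set
     [0.10, 0.18, 0.30, 0.38]]
y = [1, 1, 0, 0]

# sigmoid hidden layer, linear output layer, as on the slide
net = MLPRegressor(hidden_layer_sizes=(8,), activation='logistic',
                   solver='lbfgs', max_iter=2000, random_state=0).fit(X, y)

out = net.predict([[0.90, 0.93, 0.98, 0.99]])[0]
print('ME' if out > 0.5 else 'BE', round(out, 2))   # thresholded network output
```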

83
BP Network Classifier Structure
Back
(Structure: the Non-Self Thickness (NST) vector P = (pl1, pl2, ..., plm)
feeds the input layer; the output is 1 for ME, 0 for BE)
84
Experiments and Discussion
Back
  • Experimental Data Set
  • Generation of Detector Set
  • Experimental Result Using Single Detector Set
  • Experimental Result Using Multi-Detector Set

85
Experimental Data Set
Back
Type  | Files | Remarks
BE    | 915   | Win 2K OS and some application programs.
ME    | 3566  | DOS virus, Win32 virus, Trojan, Worm, etc. from the Internet.
Total | 4481  | All verified by antivirus cleaner tools.
  • BEBenign Executable
  • MEMalicious Executable

86
Generation of Detector Set
Back
  • Eg(b) is the gene set for generating detectors,
    ld ∈ {16, 24, 32, 64, 96}, and lstep = 8 bits. By
    using the detector generation algorithm, we get
    D16, D24, D32, D64, and D96, separately.
  • Table 1: Detector generation results

Code length ld  | 16           | 24           | 32        | 64         | 96
|Dld|           | 65,536       | 10,931,627   | 8,938,352 | 12,768,361 | 21,294,857
Store structure | Bitmap Index | Bitmap Index | Tree      | Tree       | Tree
87
Detection Result of Malicious Executables by D24
NST p24
File No.
  • (a) NST of files, where symbol 'x' represents
    benign programs (red) and malicious programs are
    shown in blue

(b) ROC Curve
88
Detection Result of Malicious Executables by D32
NST p32
  • (a) NST of files, where symbol 'x' represents
    benign programs and malicious programs are shown
    in blue

(b) ROC Curve
89
Detection Result of Malicious Executables by D64
NST p64
  • (a) NST of files, where symbol 'x' represents
    benign programs (red) and malicious programs are
    shown in blue

(b) ROC Curve
90
Experimental Result Using Single Detector Set
91
When FPR is fixed, relationship curves of DR
versus Code Length ld
Back
Note: from bottom to top, the FPR is 0, 0.5,
1, 2, 4, 8, and 16 (%), in sequence.
92
Experimental Result Using Multi-Detector Set
  • This experiment selects multiple detector sets to
    detect benign and malicious executables.
  • We don't use D16 because of its zero DR, and we
    set D96 as the upper limit because the DR values
    are almost the same beyond ld = 96.
  • Here we select the four detector sets D24, D32,
    D64 and D96 as the anomaly detection data set, use
    them to extract the Non-Self Thickness (NST)
    vector, and finally exploit a BP network as the
    classifier.
  • For the classification process, we randomly
    select 30 files of E(b) as Eg(b) to train the BP
    network, and use the remaining data to evaluate
    the anomaly detection performance.

93
NST Distribution and ROC Curve of Multi-Detector
Set Method
(a) NST of files for the mixture of D24, D32 and
D64: 'x' = benign program (in red); malicious
programs in blue.
(b) ROC curve of the mixed detector set of D24,
D32, D64 and D96
94
Comparisons With Bayes Methods and
Signature-based Method
95
Algorithm Complexities
Back
Algorithm | Operation type 1       | Operation type 2                      | Operation type 3                                 | Store space
MEDA      | detectors: ltrain      | detector matching: 80 ltest           | computing NST: 4 lf additions                    | 0.4 Gb
Bayes     | prob. info.: >> ltrain | searching P(Fi/C): depends on P(Fi/C) | computing joint probs.: lf float multiplications | 1 Gb
96
Remarks
Back
  • For short binary sequences and a single detector
    set for the detection of malicious executables,
    the performance of D24 is the best, giving a DR of
    80.6% with an FPR of 3%.
  • For long detector code lengths and the
    multi-detector set, our method obtains its best
    performance of DR = 97.46% with FPR = 2%, better
    than current methods.
  • This result verifies
  • that the diversity of detector representation can
    decrease anomaly detection holes, and
  • the effectiveness of non-self thickness detection.

Back
97
Film Recommender
Case Study 2
From Dr. Uwe Aickelin (http://www.aickelin.com),
University of Nottingham, U.K.
  • Prediction
  • What rating would I give a specific film?
  • Recommendation
  • Give me a top 10 list of films I might like.

98
Film Recommender (cont 1)
  • EachMovie database (70k users).
  • User Profile: set of tuples ⟨movie, rating⟩.
  • Me: my user profile.
  • Neighbour: user profile of others.
  • Similarity metric: correlation score.
  • Neighbourhood: group of similar users.
  • Recommendations: from the neighbourhood.

99
Film Recommender (cont 2)
  • User Profile: set of tuples ⟨movie, rating⟩
  • Me: my user profile → Antigen
  • Neighbour: user profile of others → Antibody
  • Affinity metric: correlation score → Antibody-Antigen
    binding (Stimulation); Antibody-Antibody binding
    (Suppression)
  • Neighbourhood: group of similar users → group of
    antibodies similar to the antigen and dissimilar to
    other antibodies
  • Recommendations: from the neighbourhood → weighted
    score based on similarities
100
Film Recommender (cont 3)
  • Start with empty AIS.
  • Encode target user as an antigen Ag.
  • WHILE (AIS not full) AND (More Users)
  • Add next user as antibody Ab.
  • IF (AIS at full size) Iterate AIS.
  • Generate recommendations from AIS.

101
Film Recommender (cont 4)
  • Suppose we have 5 users and 4 movies
  • u1 = (m1,v11),(m2,v12),(m3,v13).
  • u2 = (m1,v21),(m2,v22),(m3,v23),(m4,v24).
  • u3 = (m1,v31),(m2,v32),(m4,v34).
  • u4 = (m1,v41),(m4,v44).
  • u5 = (m1,v51),(m2,v52),(m3,v53),(m4,v54).
  • We do not have users' votes for every film.
  • We want to predict the vote of user u4 on movie
    m3 (a worked sketch follows below).
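A worked Python sketch of this prediction. It ignores the antibody
concentration/suppression iteration of the full recommender and uses only
correlation-weighted neighbour votes; the numeric ratings stand in for the
v_ij placeholders above and are made up:

```python
# toy ratings on a 0-5 scale standing in for the v_ij placeholders (made up)
profiles = {
    'u1': {'m1': 4, 'm2': 3, 'm3': 5},
    'u2': {'m1': 5, 'm2': 2, 'm3': 4, 'm4': 1},
    'u3': {'m1': 1, 'm2': 5, 'm4': 4},
    'u4': {'m1': 5, 'm4': 1},            # the antigen: we want u4's vote on m3
    'u5': {'m1': 4, 'm2': 1, 'm3': 5, 'm4': 2},
}

def pearson(a, b):
    """Correlation over co-rated movies; 0 if fewer than two are shared."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    xs, ys = [a[m] for m in shared], [b[m] for m in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def predict(target, movie):
    """Correlation-weighted average of the neighbours' votes on the movie."""
    ag = profiles[target]
    num = den = 0.0
    for user, ab in profiles.items():
        if user == target or movie not in ab:
            continue
        w = pearson(ag, ab)              # antibody-antigen binding strength
        num += w * ab[movie]
        den += abs(w)
    return num / den if den else None

print(round(predict('u4', 'm3'), 2))     # 4.5 with these toy ratings
```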

102
Algorithm walkthrough (1)
  • Start with empty AIS

DATABASE u1, u2, u3, u4, u5
103
Algorithm walkthrough (2)
  • Add antibodies until AIS is full

104
Algorithm walkthrough (3)
  • Table of correlation between Ab and Ag:
  • MS14, MS24, MS34.
  • Table of correlation between antibodies:
  • MS12 = CorrelCoef(Ab1, Ab2)
  • MS13 = CorrelCoef(Ab1, Ab3)
  • MS23 = CorrelCoef(Ab2, Ab3)

105
Algorithm walkthrough (4)
  • Calculate Concentration of each Ab
  • Interaction with Ag (Stimulation).
  • Interaction with other Ab (Suppression).

106
Algorithm walkthrough (5)
  • Generate Recommendation based on Antibody
    Concentration.

107
Film Recommender Results
  • Tested against standard method (Pearson k-nearest
    neighbours).
  • Prediction
  • Results of same quality.
  • Recommendation
  • 4 out of 5 films correct (AIS).
  • 3 out of 5 films correct (Pearson).

Back