Artificial Immune System and Its Applications - PowerPoint PPT Presentation

Artificial Immune System and Its Applications
  • Prof. Ying TAN
  • National Laboratory on Machine Perception
  • Department of Intelligence Science
  • Peking University, Beijing 100871, P.R.China

  • Biological Immune System
  • Artificial Immune System
  • Basic Algorithms of AIS
  • AIS design procedure
  • Case Studies
  • Malicious Executable Detection
  • Film Recommender
  • New
  • Immunocomputing (IC)
  • Danger Theory
  • Future

The Immune System is
Immune system: a system that protects the body
from foreign substances and pathogenic organisms
by producing the immune response
  • Immunity: the state or quality of being resistant
    (immune), either by virtue of previous exposure
    (adaptive immunity) or as an inherited trait
    (innate immunity)

Why the Immune System?
  • The immune system has the following appealing features
  • Recognition
  • Anomaly detection
  • Noise tolerance
  • Robustness
  • Feature extraction
  • Diversity
  • Reinforcement learning
  • Memory
  • Dynamically changing coverage
  • Distributed
  • Multi-layered
  • Adaptive

Role of Biological Immune System
  • Protect our bodies from pathogens and viruses
  • Primary immune response
  • Launch a response to invading pathogens
  • Secondary immune response
  • Remember past encounters
  • Faster response the second time around

Immune cells
  • There are two primary types of lymphocytes
  • B-lymphocytes (B cells)
  • T-lymphocytes (T cells)
  • Other components include macrophages, phagocytic
    cells, cytokines, etc.

Where is it?
Multiple layers of the immune system
  • Substances capable of starting a specific immune
    response commonly are referred to as antigens
  • This includes some pathogens such as viruses,
    bacteria, fungi, etc.

Biological Immune System
How does the IS work? A simplistic view
Self/Non-Self Recognition
  • Immune system needs to be able to differentiate
    between self and non-self cells
  • Antigenic encounters may result in cell death, hence:
  • Some kind of positive selection
  • Some element of negative selection

Immune Pattern Recognition
  • The immune recognition is based on the
    complementarity between the binding region of the
    receptor and a portion of the antigen called an
    epitope
  • Antibodies present a single type of receptor,
    whereas antigens might present several epitopes.
  • This means that different antibodies can recognize
    a single antigen

Clonal Selection
Main Properties of Clonal Selection (Burnet, 1978)
  • Elimination of self antigens
  • Proliferation and differentiation on contact of
    mature lymphocytes with antigen
  • Restriction of one pattern to one differentiated
    cell and retention of that pattern by clonal
    descendants
  • Generation of new random genetic changes,
    subsequently expressed as diverse antibody
    patterns by a form of accelerated somatic mutation

Immune Network Theory
  • Idiotypic network (Jerne, 1974)
  • B cells co-stimulate each other
  • Treat each other a bit like antigens
  • Creates an immunological memory

Reinforcement Learning and Immune Memory
  • Repeated exposure to an antigen throughout a
    lifetime
  • Primary, secondary immune responses
  • Remembers encounters
  • No need to start from scratch
  • Memory cells
  • Continuous learning

Learning (2)
Immune System Summary
  • Distinguish host (body cells) from external entities.
  • When an entity is recognized as foreign (or
    dangerous), activate several defense mechanisms
    leading to its destruction (or neutralization).
  • Subsequent exposure to similar entity results in
    rapid immune response.
  • Overall behavior of the immune system is an
    emergent property of many local interactions.

Immune metaphors
(Diagram: Immune System, Other areas, Artificial Immune Systems)

What is an Artificial Immune System?
  • Dasgupta (1999): Artificial immune systems (AIS) are
    intelligent and adaptive systems inspired by the
    immune system toward real-world problem solving
  • de Castro and Timmis: Artificial Immune Systems
    (AIS) are adaptive systems, inspired by
    theoretical immunology and observed immune
    functions, principles and models, which are
    applied to problem solving
  • Using natural immune system as a metaphor for
    solving complex computational problems.
  • Not modelling the immune system

AI models and their corresponding natural prototypes
Natural prototype | Biological level | AI model
Natural language | Left hemisphere of brain | Formal logic, formal linguistics
Brain nervous net | Cells | Neural computing (NC), neural networks (NN)
Biological cells | Cells | Cellular automata (CA)
Molecules of proteins | Molecular | Artificial immune systems (AIS)
Genetic code | Molecular | Genetic algorithms (GA)

Some History
  • Developed from the field of theoretical
    immunology in the mid 1980s.
  • Suggested we might look at the IS
  • 1990: Bersini, first use of immune algorithms to
    solve problems
  • Mid 1990s: Forrest et al., computer security
  • Mid 1990s: Hunt et al., machine learning
  • More

AIS Scope
  • Pattern recognition
  • Fault and anomaly detection
  • Data analysis
  • Data mining (classification/clustering)
  • Agent-based systems
  • Scheduling
  • Machine-learning
  • Autonomous navigation and control
  • Search and optimization methods
  • Artificial life
  • Security of information systems
  • Optimization
  • Just to name a few.

Typical Applications of AIS
  • Computer Security (Forrest 94, 96, 98; Kephart 94;
    Lamont 98, 01, 02; Dasgupta 99, 01)
  • Anomaly Detection (Dasgupta 96, 01, 02)
  • Fault Diagnosis (Ishida 92, 93; Ishiguro 94)
  • Data Mining and Retrieval (Hunt 95, 96;
    Timmis 99, 01, 02)
  • Pattern Recognition (Forrest 93; Gibert 94; de
    Castro 02)
  • Adaptive Control (Bersini 91)
  • Job Shop Scheduling (Hart 98, 01, 02)
  • Chemical Pattern Recognition (Dasgupta 99)
  • Robotics (Ishiguro 96, 97; Singh 01)
  • Optimization (de Castro 99; Endo 98; de Castro 02)
  • Web Mining (Nasraoui 02; Secker 05)
  • Fault Tolerance (Tyrrell 01, 02; Timmis 02)
  • Autonomous Systems (Varela 92; Ishiguro 96)
  • Engineering Design Optimization (Hajela 96, 98)

Basic Immune Models and Algorithms
  • Bone Marrow Models
  • Negative Selection Algorithms
  • Clonal Selection Algorithm
  • Immune Network Models
  • Somatic Hypermutation

Bone Marrow Models
  • Gene libraries are used to create antibodies from
    the bone marrow
  • Antibody production through a random
    concatenation from gene libraries
  • Simple or complex libraries
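
The random concatenation described above can be sketched in a few lines. This is a minimal illustration, not a reproduction of any specific bone marrow model: the three libraries, their 4-bit segments, and the names `make_antibody` and `LIBRARIES` are all hypothetical.

```python
import random

def make_antibody(libraries, rng):
    """Build one antibody by concatenating one randomly chosen segment per library."""
    return "".join(rng.choice(lib) for lib in libraries)

# Hypothetical gene libraries: each holds a few 4-bit gene segments.
LIBRARIES = [
    ["0000", "0101", "1111"],
    ["0011", "1100"],
    ["1010", "0110", "1001", "0001"],
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
repertoire = [make_antibody(LIBRARIES, rng) for _ in range(5)]
```

With these libraries at most 3 x 2 x 4 = 24 distinct antibodies can be produced; larger or more numerous libraries widen the repertoire.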

Negative Selection (NS) Algorithms
  • Forrest (1994): idea taken from the negative
    selection of T-cells in the thymus
  • Applied initially to computer security
  • Split into two parts
  • Censoring
  • Monitoring
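
The two parts can be sketched as follows. This is a minimal illustration under stated assumptions: strings are binary, matching uses the r-contiguous-bits rule (one common choice, assumed here), and the names `matches`, `censor`, `monitor` and the tiny `SELF` set are hypothetical.

```python
import random

def matches(detector, sample, r):
    """r-contiguous-bits rule: True if the strings agree on r adjacent positions."""
    run = 0
    for a, b in zip(detector, sample):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False

def censor(self_set, n_detectors, length, r, rng, max_tries=100_000):
    """Censoring: keep random candidates that match no self string."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        cand = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def monitor(sample, detectors, r):
    """Monitoring: a sample is flagged as non-self if any detector matches it."""
    return any(matches(d, sample, r) for d in detectors)

rng = random.Random(1)
SELF = ["00000000", "00001111", "11110000"]  # hypothetical self strings
detectors = censor(SELF, n_detectors=10, length=8, r=4, rng=rng)
```

By construction no self string is ever flagged by `monitor`; how much of non-self space is covered depends on the number of detectors and on r.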

Clonal Selection Algorithm (de Castro and von
Zuben, 2001)
  • 1. Initialisation: randomly initialise a
    population (P)
  • 2. Antigenic presentation: for each pattern in
    Ag, do
  • 2.1 Antigenic binding: determine affinity to
    each member of P
  • 2.2 Affinity maturation: select the n highest
    affinity members of P, clone and mutate them
    according to their affinity with Ag, then add
    the new mutants to P
  • 3. Metadynamics
  • 3.1 Select the highest affinity member of P to
    form part of the memory set M
  • 3.2 Replace n members of P with new random ones
  • 4. Cycle: repeat 2 and 3 until a stopping criterion
    is met (e.g. maximum number of generations)
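
The steps above can be sketched for binary patterns as follows. This is a simplified illustration, not the published algorithm: affinity is taken to be the number of matching bits, the clone count is rank-based, the mutation rate is `1 - affinity/length`, and all parameter names and defaults are assumptions.

```python
import random

def affinity(ab, ag):
    """Number of matching bits between antibody and antigen."""
    return sum(a == b for a, b in zip(ab, ag))

def mutate(ab, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in ab]

def clonalg(antigens, pop_size=20, n_select=5, n_replace=3, generations=30, seed=0):
    rng = random.Random(seed)
    length = len(antigens[0])

    def random_cell():
        return [rng.randint(0, 1) for _ in range(length)]

    population = [random_cell() for _ in range(pop_size)]      # 1. initialisation
    memory = [None] * len(antigens)                            # one memory cell per antigen
    for _ in range(generations):                               # 4. cycle
        for i, ag in enumerate(antigens):                      # 2. antigenic presentation
            ranked = sorted(population, key=lambda ab: affinity(ab, ag), reverse=True)  # 2.1
            mutants = []
            for rank, ab in enumerate(ranked[:n_select]):      # 2.2 affinity maturation
                n_clones = n_select - rank                     # better rank, more clones
                rate = 1.0 - affinity(ab, ag) / length         # higher affinity, less mutation
                mutants.extend(mutate(ab, rate, rng) for _ in range(n_clones))
            ranked = sorted(ranked + mutants, key=lambda ab: affinity(ab, ag), reverse=True)
            if memory[i] is None or affinity(ranked[0], ag) > affinity(memory[i], ag):
                memory[i] = list(ranked[0])                    # 3.1 update memory
            population = ranked[:pop_size - n_replace]         # 3.2 drop the weakest cells
            population += [random_cell() for _ in range(n_replace)]  # and add newcomers
    return memory

antigens = [[1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1], [0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]]
memory = clonalg(antigens)
```

The memory cell for each antigen is only ever replaced by a cell of higher affinity, so its quality cannot decrease from one generation to the next.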

CLONALG for PR, Learning, Optimization
(Figure labels: Ab r, Ab m, Ab n)
L.N. de Castro and F.J. Von Zuben, Learning and
optimization using the clonal selection principle,
IEEE Trans. Evolutionary Computation, vol. 6, no. 3,
June 2002.

Discrete Immune Network Models (Timmis and Neal,
2000)
  • 1. Initialisation: create an initial network from a
    sub-section of the antigens
  • 2. Antigenic presentation: for each antigenic
    pattern, do
  • 2.1 Clonal selection and network interactions:
    for each network cell, determine its stimulation
    level (based on antigenic and network interaction)
  • 2.2 Metadynamics: eliminate network cells with a
    low stimulation
  • 2.3 Clonal expansion: select the most stimulated
    network cells and reproduce them proportionally
    to their stimulation
  • 2.4 Somatic hypermutation: mutate each clone
  • 2.5 Network construction: select mutated clones
    and integrate them into the network
  • 3. Cycle: repeat step 2 until the termination
    condition is met

Immune Network Models
  • Timmis and Neal, 2000
  • Used immune network theory as a basis, proposed
    the AINE algorithm

Initialize AIN
For each antigen
    Present antigen to each ARB in the AIN
    Calculate ARB stimulation level
    Allocate B cells to ARBs, based on stimulation level
    Remove weakest ARBs (ones that do not hold any B cells)
If termination condition met
    exit
else
    Clone and mutate remaining ARBs
    Integrate new ARBs into AIN

Immune Network Models
  • de Castro and Von Zuben (2000c)
  • aiNET, based on similar principles

At each iteration step do
    For each antigen do
        Determine affinity to all network cells
        Select n highest affinity network cells
        Clone these n selected cells
        Increase the affinity of the cells to antigen by
            reducing the distance between them (greedy search)
        Calculate improved affinity of these n cells
        Re-select a number of improved cells and place into matrix M
        Remove cells from M whose affinity is below a set threshold
        Calculate cell-cell affinity within the network
        Remove cells from network whose affinity is below a
            certain threshold
        Concatenate original network and M to form new network
    Determine whole network inter-cell affinities and remove
        all those below the set threshold
    Replace r of worst individuals by novel randomly generated ones
    Test stopping criterion

Somatic Hypermutation
  • Mutation rate is set according to affinity (higher
    affinity, lower rate)
  • Very controlled mutation in the natural immune
    system
  • Trade-off between the normalized antibody
    affinity D and its mutation rate α
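
One way to express this trade-off is an exponentially decaying rate, α = exp(-ρ D). The sketch below assumes that form and a hypothetical decay parameter `rho`; the slide itself does not state the formula.

```python
import math

def mutation_rate(norm_affinity, rho=5.0):
    """alpha = exp(-rho * D): the rate falls as the normalized affinity D in [0, 1] rises."""
    return math.exp(-rho * norm_affinity)

# Rates for a poor, an average, and a perfect match.
rates = [round(mutation_rate(d), 3) for d in (0.0, 0.5, 1.0)]
# rates == [1.0, 0.082, 0.007]
```

A larger `rho` makes the rate drop faster, so that only low-affinity cells are mutated heavily.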

General Framework of AIS
(Layered diagram: Application Domain, Affinity
Measures, Immune Algorithms)

Representation: Shape Space
  • Describe the general shape of a molecule
  • Describe interactions between molecules
  • Degree of binding between molecules

  • Vectors
  • Ab = ⟨Ab1, Ab2, ..., AbL⟩
  • Ag = ⟨Ag1, Ag2, ..., AgL⟩
  • Real-valued shape-space
  • Integer shape-space
  • Binary shape-space
  • Symbolic shape-space

Define their Interaction
  • Define the term Affinity
  • Affinity is related to distance
  • Euclidean
  • Other distance measures such as Hamming,
    Manhattan, etc.
  • Affinity Threshold
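
The distance measures named above can be written directly. This sketch assumes the similarity convention, where a smaller distance means a higher affinity (some models instead reward complementarity), and the helper `recognizes` is a hypothetical name for the threshold test.

```python
import math

def euclidean(ab, ag):
    """Euclidean distance, for real-valued shape-space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ab, ag)))

def manhattan(ab, ag):
    """Manhattan distance, for real-valued or integer shape-space."""
    return sum(abs(a - b) for a, b in zip(ab, ag))

def hamming(ab, ag):
    """Hamming distance, for binary or symbolic shape-space."""
    return sum(a != b for a, b in zip(ab, ag))

def recognizes(ab, ag, threshold, distance=euclidean):
    """An antibody recognizes an antigen when their distance is within the threshold."""
    return distance(ab, ag) <= threshold
```

For example, `recognizes((0, 0), (3, 4), threshold=5)` holds under the Euclidean measure but not under the Manhattan one, which shows how the choice of measure changes the recognition region.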

Shape Space Formalism
  • Repertoire of the immune system is complete
    (Perelson, 1989)
  • Extensive regions of complementarity
  • Some threshold of recognition




AIS Design
  • Problem description
  • Deciding the immune principles used for problem
    solving
  • Engineering the AIS
  • Defining the types of immune components used
  • Defining the representation for the elements of
    the AIS
  • Applying immune principle to problem solving
  • The meta-dynamics of an AIS
  • Reverse mapping from AIS to the real problem

Case Studies of AIS
  • Malicious Executables Detection --- From
    Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based
    Malicious Executables Detection Algorithm based
    on Immune Principles, F. Yin, J. Wang, C. Guo
    (Eds.) ISNN 2004, Springer, Lecture Notes in
    Computer Science 3174, pp. 675-680, 2004.
  • Film Recommender --- From Dr. Uwe Aickelin,
    University of Nottingham, U.K., 2004

Immunocomputing -- IC
  • By Tarakanov, A. 2001.
  • Aims
  • A proper mathematical framework
  • A new kind of computing
  • A new kind of hardware.
  • New concepts
  • formal protein (FP) vs. neuron
  • formal immune networks (FIN) vs. NN
  • Refer to A.O. Tarakanov, V.A. Skormin, and S.P.
    Sokolova, Immunocomputing: Principles and
    Applications, Springer, 2003.

Problems of Traditional Self/Non-self View
  • No reaction to foreign bacteria in gut (friendly
    bacteria)
  • No reaction to food / air / etc.
  • The human body changes over its life.
  • Auto-immune diseases.
  • How do we produce antibodies that react against
    antigens and yet avoid self?
  • Is it necessary to attack all non-self or a
    specific self?

The Danger Theory
  • In the danger model, the idea is to recognise
    danger rather than non self.
  • The screening is accomplished post production
    through an external danger signal. Thus the
    production of autoreactive antibodies (which
    react to self) is allowed.
  • If an (e.g. autoreactive) antibody matches a
    stimulus in the absence of danger, it is removed.
    Thus harmless antigens are tolerated, and
    changing self accommodated.

Matzinger (2002). The Danger Model: A Renewed
Sense of Self, Science 296: 301-304.

Danger Theory (cont)
  • Danger Theory
  • Not self/non-self but Danger/Non-Danger
  • Immune response is initiated in the tissues.
    Danger Zone.
  • This makes it context dependent
  • Matzinger (2002) The Danger Model: A Renewed
    Sense of Self, Science 296: 301-304
  • Aickelin and Cayzer (2002) The Danger Theory and
    Its Application to Artificial Immune Systems,
    Proc. International Conference on AIS (ICARIS
    2002)

Danger Zone

Towards a dangerous IDS
The danger theory suggests that the immune
system reacts to threats based on the correlation
of various (danger) signals, providing a method
of grounding the immune response, i.e. linking
it directly to the attacker.
Aickelin U, Bentley P, Cayzer S, Kim J and McLeod
J (2003) 'Danger Theory: The Link between AIS
and IDS?', Proceedings ICARIS-2003, 2nd
International Conference on Artificial Immune
Systems, LNCS 2787, pp 147-155

Other ways of using danger
  • Danger = Crime, Antigen = Suspect, or ... Danger =
    Context?
  • It could also be useful for data mining, where
    the danger signal is a proxy measure of interest
  • Danger Zone can be spatial or temporal
Andrew Secker, Alex Freitas, and Jon Timmis
(2005) Towards a danger theory inspired
artificial immune system for web mining, in A.
Scime, editor, Web Mining: applications and
techniques, pages 145-168 (Idea Group)

Some Recent Applications of Danger Theory
  • Anjum Iqbal, Mohd Aizaini Maarof, Danger Theory
    and Intelligent Data Processing, International
    Journal of Information Technology, Vol.1, No.1,
  • Andrew Secker, Alex A. Freitas, and Jon Timmis,
    A Danger Theory Inspired Approach to Web Mining,
    Computing Lab., University of Kent, Canterbury,
    Kent, UK, 2005
  • And so on.

The Future
  • More formal approach required?
  • Wide possible application domains.
  • What makes the immune system unique?
  • More work with immunologists
  • Danger theory.
  • Idiotypic Networks.
  • Self-Assertion.

Reference for further reading
  • Books
  • Artificial Immune Systems and Their Applications
    by Dipankar Dasgupta (Editor), Springer Verlag,
    January 1999.
  • L.N. de Castro and J. Timmis, Artificial Immune
    Systems: A New Computational Intelligence
    Approach, Springer, 2002.
  • A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova,
    Immunocomputing: Principles and Applications,
    Springer, 2003.
  • Related academic papers
  • J. Timmis, P.Bentley, and Emma Hart (Eds.)
    Artificial Immune Systems, Proceedings of Second
    International Conference, ICARIS 2003, Edinburgh,
    UK, September 2003. LNCS 2787, Springer.

New Events
  • Special Session on Artificial Immune Systems at
    the Congress on Evolutionary Computation (CEC),
    December 8-12, 2003, Canberra, Australia.
  • Special Session on Immunity-Based Systems at
    Seventh International Conference
    on Knowledge-Based Intelligent Information 
    Engineering Systems (KES), September 3-5, 2003,
    University of Oxford, UK.  
  • Second International Conference on Artificial
    Immune Systems (ICARIS), September 1-3, 2003,
    Napier University, Edinburgh, UK.
  • Tutorial on Artificial Immune Systems at 1st
    Multidisciplinary International Conference on
    Scheduling: Theory and Applications (MISTA), 12
    August 2003, The University of Nottingham, UK.
  • Tutorial on Immunological Computation at
    International Joint Conference on Artificial
    Intelligence (IJCAI), August 10, 2003, Acapulco,
    Mexico.
  • Special Track on Artificial Immune Systems at
    Genetic and Evolutionary Computation Conference
    (GECCO), Chicago, USA, July 12-16, 2003

AIS Resources
  • Artificial Immune Systems and Their Applications
    by D Dasgupta (Editor), Springer Verlag, 1999.
  • Artificial Immune Systems: A New Computational
    Intelligence Approach by L de Castro, J Timmis,
    Springer Verlag, 2002.
  • Immunocomputing: Principles and Applications by A
    Tarakanov et al, Springer Verlag, 2003.
  • Third International Conference on Artificial
    Immune Systems (ICARIS), September 13-16, 2004,
    University of Catania, Italy.
  • 4th International Conference on Artificial Immune
    Systems(ICARIS), 14th-17th August, 2005 in Banff,
    Alberta, Canada

That's all

Malicious Executables Detection Based on
Artificial Immune Principles
Case Study 1
From Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based
Malicious Executables Detection Algorithm based
on Immune Principles, F. Yin, J. Wang, C. Guo
(Eds.) ISNN 2004, Springer, Lecture Notes in
Computer Science 3174, pp. 675-680, 2004.
  • This work was supported by Natural Science
    Foundation of China with Grant No. 60273100.

  • Definition of Terms
  • Goal and Motivation
  • Previous Research Work
  • Immune Principles for Malicious Executable
    Detection
  • Malicious Executable Detection Algorithm
  • Experiments and Discussion
  • Concluding Remarks

Definition of Terms
  • Malicious Executable
  • is generally defined as a program that has
    some malicious functions, such as compromising a
    system's security, damaging a system, or obtaining
    sensitive information without the permission of
    users. It includes viruses, Trojan horses, worms, etc.
  • Benign Executable
  • is a normal program without any malicious
    functions.
  • Malicious executables: DOS/Win32 viruses, Trojan
    horses, e-mail attached viruses.
  • Targets: computers / information systems.
  • Tens of thousands of new viruses appear each year.
  • But current antivirus systems attempt to detect
    these new malicious programs with heuristics
    built by hand (costly and ineffective).
  • Current task: devise new methods for detecting
    new ME.

Definition of Symbols and Structures
  • B: binary code alphabet, B = {0, 1}.
  • Seq(s, k, l): short sequence cutting operation.
    Supposing s is a binary sequence, s = b(0) b(1) ...
    b(n-1), b(i) ∈ B, then
    Seq(s, k, l) = b(k) b(k+1) ... b(k+l-1).
  • E(k): executable set, k ∈ {m, b}; m denotes
    malicious executable, b benign executable.
  • E: whole set of executables, i.e.,
    E = E(m) ∪ E(b).
  • e(fj, n): executable as a binary sequence of
    length n, where fj is the executable identifier.
  • ld: detector code length.
  • lstep: step size of detector generation.
  • dl: detector, dl = Seq(s, k, l).
  • Dl: set of detectors with code length l, i.e.,
    Dl = {dl(0), dl(1), ..., dl(nd-1)}, |Dl| = nd.

Goal and Motivation
  • Develop an automatic approach for detecting new
    malicious executables.
  • Use artificial immune systems (AIS) and artificial
    neural networks (ANN) to detect malicious
    executables with a high Detection Rate (DR) and
    a low False Positive Rate (FPR).
Previous Related Works
  • Signature-based Methods
  • Expert Knowledge-based Methods
  • Machine Learning Methods

Signature-based Methods
  • Creates a unique tag for each malicious
    program so that future examples of it can be
    correctly classified with a small error rate, and
    relies on signatures of known malicious
    executables to generate detection models.
  • Drawbacks
  • Cannot detect unknown and mutated viruses.
  • As the number and variety of viruses grow,
    detection slows down dramatically. At
    the same time, analysing virus signatures
    becomes very difficult, in particular for
    encrypted signatures.
  • (refer to IBM Anti-virus Group's report; R.W. Lo,
    K.N. Levitt, and R.A. Olsson. MCF: a Malicious
    Code Filter. Computers & Security,
    14(6):541-566, 1995.)

Expert Knowledge-based Methods
  • Uses the knowledge of a group of virus experts
    to construct heuristic classifiers for detection
    of unknown viruses.
  • Drawbacks
  • Time-consuming analysis method.
  • Discovers only some unknown viruses, and its false
    detection rate is very high.
  • For detecting unknown viruses based on ANN, the
    IBM Anti-virus Group also proposed a method that
    detects Boot Sector viruses only.
  • (refer to W. Arnold and G. Tesauro. Automatically
    Generated Win32 Heuristic Virus Detection.
    Proceedings of the 2000 International Virus
    Bulletin Conference, 2000.)

Machine Learning Methods
  • M.G. Schultz developed a framework that used data
    mining algorithms, i.e., the Multi-Naïve Bayes
    method, to train multiple classifiers on a set of
    malicious and benign executables to detect new
    examples (unknown ME).
  • (refer to M.G. Schultz, E. Eskin, and E. Zadok.
    Data Mining Methods for Detection of New
    Malicious Executables. IEEE Symposium on Security
    and Privacy, May 2001.)

Biologically-motivated Information Processing
  • Brain-nervous systems: Neural Networks (NN)
  • Genetic systems: Genetic Algorithms (GA)
  • Immune systems: Artificial Immune Systems (AIS),
    or immunological computation.
  • NN and GA have been extensively studied and widely
    applied, but AIS has relatively few applications

Natural prototypes vs. their models
Natural prototype | Biological level | Computing model
Natural language | Left hemisphere of brain | Formal logic, formal linguistics
Brain nervous net | Cells | Artificial neural networks (ANN)
Biological cells | Cells | Cellular automata (CA)
Molecules of proteins | Molecular | Artificial immune systems (AIS)
Genetic code | Molecular | Genetic algorithms (GA)

Comparison of Three Algorithms
Property | GA (Optimisation) | NN (Classification) | AIS
Components | Chromosome Strings | Artificial Neurons | Attribute Strings
Location of Components | Dynamic | Pre-Defined | Dynamic
Structure | Discrete Components | Networked Components | Discrete Components / Networked Components
Knowledge Storage | Chromosome Strings | Connection Strengths | Component Concentration / Network Connections
Dynamics | Evolution | Learning | Evolution / Learning
Meta-Dynamics | Recruitment / Elimination of Components | Construction / Pruning of Connections | Recruitment / Elimination of Components
Interaction between Components | Crossover | Network Connections | Recognition / Network Connections
Interaction with Environment | Fitness Function | External Stimuli | Recognition / Objective Function

Immune Principles for Malicious Executable Detection
  • Non-self Detection Principle
  • Anomaly Detection Based on Thickness
  • The Diversity of Detector Representation vs.
    Anomaly Detection Hole

Non-self Detection Principle
  • In the natural immune system, all cells of the
    body are categorized as one of two types: self
    and non-self. The immune process is to detect
    non-self from self.
  • To realize non-self detection, the maturation
    process of T lymphocytes (T cells) undergoes two
    selection stages, Positive Selection and
    Negative Selection, since antigenic encounters may
    result in cell death. Inspired by these two
    stages, some computer scientists have proposed
    algorithms for detecting anomalous information.
    Here, we use the Positive Selection
    Algorithm (PSA) to perform non-self detection
    for recognizing malicious executables.

Non-self Detection by PSA
(Figure: process of anomaly detection with PSA)

Anomaly Detection Based on Thickness
  • Anomaly recognition is the process in which
    immune cells detect antigens and are activated.
  • The activation threshold of immune cells is
    decided by the thickness of immune cells matching
    antigens.
The Diversity of Detector Representation vs.
Anomaly Detection Hole
  • The main difficulty of anomaly detection is
    shrinking the anomaly detection holes as far as
    possible. The natural immune system resolves this
    problem well through the diversity of MHC (Major
    Histocompatibility Complex) cell representations,
    which determines the diversity of the antibodies
    attached to the surface of T cells. This property
    is very useful in increasing the power of
    detecting mutated antigens and decreasing the
    anomaly detection holes.
  • Following this principle, we can use the
    diversity of detector representation to decrease
    the anomaly detection holes, as illustrated by
    the schematic drawings.

(Figure: schematic diagram of abnormal detection holes)
(Figure: reduction of abnormal detection holes by use of
the diversity of detector representations; combination
of detectors)

Malicious Executable Detection Algorithm (MEDA)
  • MEDA based on AIS includes three parts:
  • Detector generation,
  • Anomaly information extraction,
  • and Classification.

Flow Chart of Malicious Executable Detection
Algorithm (MEDA)

Generation of Detector Set
  • Detector generation algorithm

    Begin
      initialize lstep, ld; k = 0
      Do
        cut e(fk, n) from Eg(b)
        i = 0
        While i < n - ld - 1 do
        Begin
          d = Seq(e(fk, n), i, ld)
          if d ∉ Dld then add d to Dld
          i = i + lstep
        End
        k = k + 1
      Until Eg(b) is empty
      Return Dld
    End
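
The detector generation loop can be sketched at byte granularity as follows. This is an illustration under assumptions, not the authors' implementation: `ld` and `lstep` are taken to be multiples of 8 bits, files are held in memory as `bytes`, a Python `set` stands in for the bitmap or tree storage, and the loop bound is chosen to include the last full window.

```python
def generate_detectors(benign_files, ld=24, lstep=8):
    """Collect every ld-bit window (stepping lstep bits) found in the benign gene files."""
    width, step = ld // 8, lstep // 8      # assumes ld and lstep are multiples of 8
    detectors = set()                      # a set ignores windows that were already added
    for data in benign_files:
        for i in range(0, len(data) - width + 1, step):
            detectors.add(data[i:i + width])
    return detectors

# Hypothetical 8-byte "file" used only for illustration.
sample = bytes.fromhex("5632120A34EDFF00")
detectors = generate_detectors([sample])
# 6 windows: 56 32 12, 32 12 0A, 12 0A 34, 0A 34 ED, 34 ED FF, ED FF 00
```

For the code lengths reported in the experiments the detector sets hold millions of entries, so a production version would need the more compact storage structures.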

Illustration of Detector Generating Process
  • File hex sequence: 56 32 12 0A 34 ED FF 00 2D ...
    00 0A 34 ED FF FA 11 00
  • Extracted detectors: 56 32 12; 32 12 0A; 12 0A 34;
    ...; FF FA 11; FA 11 00
  • Generating process of 24-bit detectors with 8-bit
    step size (ld = 24, lstep = 8)

Extraction of Anomaly Characteristics -- Non-self
Thickness (NST)
  • Non-self Detection
  • NST, as the Anomaly Property, is defined as the
    ratio of the number of non-self units nn to the
    total number of units in the file's binary
    sequence, pl = nn / (nn + ns), where ns is the
    number of self units.
  • If there are m kinds of detectors, the file has an
    NST vector P = (pl1, pl2, ..., plm)^T.

NST Extraction Diagram

NST Extraction Algorithm

    Begin
      open e(fk, n)
      select lstep, ld
      set ns = 0, nn = 0, i = 0
      While i < n - ld - 1 do
      Begin
        s = Seq(e(fk, n), i, ld)
        if s ∉ Dld then nn = nn + 1
        else ns = ns + 1
        i = i + lstep
      End
      pld = nn / (ns + nn)
      Return pld
    End
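
The NST computation can be sketched at byte granularity as follows. As assumptions, `ld` and `lstep` are multiples of 8 bits, the detector set is a Python `set` of byte strings built from benign code, the loop bound includes the last full window, and the three-entry `D24` set is purely hypothetical.

```python
def nst(data, detectors, ld=24, lstep=8):
    """Non-self thickness: share of ld-bit windows that match no detector."""
    width, step = ld // 8, lstep // 8      # assumes ld and lstep are multiples of 8
    nn = ns = 0
    for i in range(0, len(data) - width + 1, step):
        if data[i:i + width] in detectors:
            ns += 1                        # self unit: seen in benign code
        else:
            nn += 1                        # non-self unit
    return nn / (nn + ns) if nn + ns else 0.0

D24 = {bytes.fromhex(h) for h in ("563212", "32120A", "120A34")}
p24 = nst(bytes.fromhex("5632120AFFFF"), D24)
# Windows 56 32 12 and 32 12 0A are self; 12 0A FF and 0A FF FF are non-self, so p24 == 0.5
```

Calling `nst` once per detector set (D24, D32, D64, ...) yields the components of the NST vector P.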

BP Network Classifier
  • We use the Anomaly Property Vector (APV), i.e., the
    NST vector P, as the input of a two-layer BP
    network classifier. The number of nodes in the
    input layer equals the APV's dimension.
  • The sigmoid transfer function is chosen for the
    hidden layer and a linear function for the output
    layer.
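
The forward pass of such a network can be sketched as follows. Only the layer structure (sigmoid hidden layer, linear output) comes from the text; the weights are hand-picked hypothetical values rather than trained ones, backpropagation training is not shown, and the 0.5 cutoff is an assumption.

```python
import math

def forward(p, w_hidden, b_hidden, w_out, b_out):
    """Sigmoid hidden layer followed by a single linear output unit."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, p)) + b)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical weights for a 4-input, 2-hidden-unit network (not trained values).
W_HIDDEN = [[4.0, 4.0, 4.0, 4.0], [-4.0, -4.0, -4.0, -4.0]]
B_HIDDEN = [-8.0, 8.0]
W_OUT, B_OUT = [0.5, -0.5], 0.5

def classify(p, cutoff=0.5):
    """Return 1 (ME) if the network output exceeds the cutoff, else 0 (BE)."""
    return 1 if forward(p, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT) > cutoff else 0
```

With these weights an NST vector of high values such as `[0.9, 0.9, 0.9, 0.9]` is labelled 1 and a vector of low values such as `[0.1, 0.1, 0.1, 0.1]` is labelled 0; moving the cutoff trades detection rate against false positives.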

BP Network Classifier Structure
(Figure: input is the Non-Self Thickness (NST) vector;
output is 1 for ME, 0 for BE)

Experiments and Discussion
  • Experimental Data Set
  • Generation of Detector Set
  • Experimental Result Using Single Detector Set
  • Experimental Result Using Multi-Detector Set

Experimental Data Set
Type | Files | Remarks
BE | 915 | Win 2K OS and some application programs.
ME | 3566 | DOS virus, Win32 virus, Trojan, Worm, etc. from Internet.
Total | 4481 | All verified by antivirus cleaner tools.
  • BE: Benign Executable
  • ME: Malicious Executable

Generation of Detector Set
  • Eg(b) is the gene set used for generating
    detectors, ld ∈ {16, 24, 32, 64, 96}, and
    lstep = 8 bits. By using the detector generating
    algorithm, we get D16, D24, D32, D64, and D96,
    respectively.
  • Table 1: Detector generation results

Code length ld | 16 | 24 | 32 | 64 | 96
Size of Dld | 65536 | 10,931,627 | 8,938,352 | 12,768,361 | 21,294,857
Store structure | Bitmap Index | Bitmap Index | Tree | Tree | Tree

Detection Result of Malicious Executables by D24
  • (a) NST of files (NST p24 vs. File No.); x marks
    benign programs (red), malicious programs are
    shown in blue
  • (b) ROC curve

Detection Result of Malicious Executables by D32
  • (a) NST of files (NST p32); x marks benign
    programs, a second symbol marks malicious programs
  • (b) ROC curve

Detection Result of Malicious Executables by D64
  • (a) NST of files (NST p64); x marks benign
    programs (red), malicious programs are shown in
    blue
  • (b) ROC curve

Experimental Result Using Single Detector Set
  • Curves of DR versus code length ld at fixed FPR
  • Note: from bottom to top, the FPR is 0, 0.5,
    1, 2, 4, 8, and 16, in sequence.

Experimental Result Using Multi-Detector Set
  • This experiment uses multiple detector sets to
    detect benign and malicious executables.
  • We do not use D16 because its DR is zero, and we
    set D96 as the upper limit because the DR values
    are almost the same once ld reaches 96.
  • Here we select the four detector sets D24, D32,
    D64 and D96 as the anomaly detection data set and
    use them to extract the Non-self thickness (NST)
    vector; finally, a BP network is used as the
    classifier.
  • For classification, we randomly
    select 30 files of E(b) as Eg(b) to train a BP
    network, and use the remaining data to illustrate
    the anomaly detection performance.

NST Distribution and ROC Curve of Multi-Detector
Set Method
  • (a) NST of files for mixture of D24, D32 and
    D64; x marks benign programs (red), malicious
    programs are shown in blue.
  • (b) ROC curve of mixed detector set of D24,
    D32, D64 and D96

Comparisons With Bayes Methods and
Signature-based Method

Algorithm Complexities
Algorithm | Op. type 1 name | Op. type 1 amount | Op. type 2 name | Op. type 2 amount | Op. type 3 name | Op. type 3 amount | Store space
MEDA | Detectors | ltrain | Detector matching | 80 ltest | Computing NST | 4 lf additions | 0.4 Gb
Bayes | Prob. info. | >> ltrain | Searching P(Fi/C) | Depends on P(Fi/C) | Computing joint probs. | lf float multiplications | 1 Gb

  • For short binary sequences and a single detector
    set, D24 performs best for the detection of
    malicious executables, giving a DR of 80.6% with
    an FPR of 3%.
  • For long detector code lengths and the
    multi-detector set, our method obtains the best
    performance, a DR of 97.46% with an FPR of 2%,
    over current methods.
  • This result verifies
  • that diversity of detector representation can
    decrease anomaly detection holes.
  • non-self thickness detection.

Film Recommender
Case Study 2
From Dr. Uwe Aickelin, University of Nottingham, U.K.
  • Prediction
  • What rating would I give a specific film?
  • Recommendation
  • Give me a top 10 list of films I might like.

Film Recommender (cont 1)
  • EachMovie database (70k users).
  • User Profile: set of tuples (movie, rating).
  • Me: my user profile.
  • Neighbour: user profile of others.
  • Similarity metric: correlation score.
  • Neighbourhood: group of similar users.
  • Recommendations: from neighbourhood.

Film Recommender (cont 2)
  • User Profile: set of tuples (movie, rating)
  • Me: my user profile.
  • Neighbour: user profile of others.
  • Affinity metric: correlation score.
  • Neighbourhood: group of similar users.
  • Recommendations: from neighbourhood
  • Antibody-Antigen Binding
  • Antibody-Antibody Binding
  • Group of antibodies similar to antigen and
    dissimilar to other antibodies
  • Weighted Score based on Similarities.

Film Recommender (cont 3)
  • Start with empty AIS.
  • Encode target user as an antigen Ag.
  • WHILE (AIS not full) AND (more users):
    add next user as antibody Ab;
    IF (AIS at full size), iterate AIS.
  • Generate recommendations from AIS.

Film Recommender (cont 4)
  • Suppose we have 5 users and 4 movies
  • u1 = {(m1, v11), (m2, v12), (m3, v13)}.
  • u2 = {(m1, v21), (m2, v22), (m3, v23), (m4, v24)}.
  • u3 = {(m1, v31), (m2, v32), (m4, v34)}.
  • u4 = {(m1, v41), (m4, v44)}.
  • u5 = {(m1, v51), (m2, v52), (m3, v53), (m4, v54)}.
  • We do not have users' votes for every film.
  • We want to predict the vote of user u4 on a movie
    u4 has not yet voted on.
Algorithm walkthrough (1)
  • Start with empty AIS

DATABASE u1, u2, u3, u4, u5
Algorithm walkthrough (2)
  • Add antibodies until AIS is full

Algorithm walkthrough (3)
  • Table of Correlation between Ab and Ag
  • MS14, MS24, MS34.
  • Table of Correlation between Antibodies
  • MS12 = CorrelCoef(Ab1, Ab2)
  • MS13 = CorrelCoef(Ab1, Ab3)
  • MS23 = CorrelCoef(Ab2, Ab3)
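
A correlation score between two user profiles can be sketched as a Pearson coefficient over the films both users have voted on. The function name `correl_coef`, the dictionary representation of profiles, the example votes, and the fallback value of 0.0 for insufficient overlap are assumptions for illustration.

```python
import math

def correl_coef(profile_a, profile_b):
    """Pearson correlation over the films both users have rated."""
    common = sorted(set(profile_a) & set(profile_b))
    if len(common) < 2:
        return 0.0                         # assumption: too little overlap counts as no correlation
    xs = [profile_a[m] for m in common]
    ys = [profile_b[m] for m in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0                         # assumption: constant votes carry no signal
    return cov / math.sqrt(var_x * var_y)

# Hypothetical votes for three of the users.
u1 = {"m1": 5, "m2": 3, "m3": 1}
u2 = {"m1": 4, "m2": 3, "m3": 2, "m4": 5}
u4 = {"m1": 1, "m4": 5}
ms12 = correl_coef(u1, u2)   # three films in common
ms14 = correl_coef(u1, u4)   # only one film in common
```

Because users rarely vote on the same films, the overlap handling matters as much as the formula itself.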

Algorithm walkthrough (4)
  • Calculate Concentration of each Ab
  • Interaction with Ag (Stimulation).
  • Interaction with other Ab (Suppression).

Algorithm walkthrough (5)
  • Generate Recommendation based on Antibody
    Concentration
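
A predicted vote can be sketched as a weighted average over the antibodies that voted on the film. How each weight is formed is an assumption here (for instance correlation with the antigen multiplied by the antibody's concentration); the slides do not give the exact formula, and `predict_vote` is a hypothetical name.

```python
def predict_vote(neighbours, film):
    """Weighted average of neighbours' votes; `neighbours` holds (weight, profile) pairs."""
    rated = [(w, prof[film]) for w, prof in neighbours if film in prof and w > 0]
    total = sum(w for w, _ in rated)
    if total == 0:
        return None                        # nobody in the neighbourhood voted on this film
    return sum(w * v for w, v in rated) / total

# Hypothetical weights and votes.
neighbours = [(0.8, {"m2": 4}), (0.2, {"m2": 2}), (0.5, {"m3": 5})]
vote = predict_vote(neighbours, "m2")      # (0.8 * 4 + 0.2 * 2) / (0.8 + 0.2)
```

A top-10 recommendation list follows by predicting a vote for every unseen film and sorting the films by predicted vote.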

Film Recommender Results
  • Tested against standard method (Pearson k-nearest
    neighbours).
  • Prediction
  • Results of same quality.
  • Recommendation
  • 4 out of 5 films correct (AIS).
  • 3 out of 5 films correct (Pearson).