Algorithms for Port of Entry Inspection: Finding Optimal Binary Decision Trees
Transcript and Presenter's Notes

1
Algorithms for Port of Entry Inspection: Finding Optimal Binary
Decision Trees
Fred S. Roberts, Rutgers University
2
Port of Entry Inspection Algorithms
  • Goal: Find ways to intercept illicit nuclear materials and weapons
    destined for the U.S. via the maritime transportation system
  • Goal: Inspect all containers arriving at ports
  • Even carefully inspecting 8% of containers in the Port of NY/NJ
    might bring international trade to a halt (Larrabbee 2002)

3
Port of Entry Inspection Algorithms
  • Aim: Develop decision support algorithms that will help us
    optimally intercept illicit materials and weapons, subject to
    limits on delays, manpower, and equipment
  • Find inspection schemes that minimize total cost, including the
    cost of false alarms (false positives) and failed alarms (false
    negatives)

Mobile VACIS: truck-mounted gamma-ray imaging system
4
Port of Entry Inspection Algorithms
  • My work on port of entry inspection has gotten me
    and my students to some remarkable places.

Me on a Coast Guard boat on a tour of the harbor in Philadelphia.
Thanks to Capt. David Scott, Captain of the Port, for taking us on the
tour.
5
The work on port inspection, plus other work, has led to a new DHS
center based at Rutgers.
  • Founded 2009 as a DHS University Center of
    Excellence

6
CCICADA has a wide variety of workshops,
tutorials, and programs for students and faculty
that emphasize the mathematical sciences and
homeland security.
  • For more information: http://ccicada.org

7
Sequential Decision Making Problem
  • Stream of containers arrives at a port
  • The Decision Maker's Problem:
  • Which containers to inspect?
  • Which inspections next, based on previous results?
  • Approach:
  • decision logic: Boolean methods
  • combinatorial optimization methods
  • Builds on ideas of Stroud and Saeger at Los Alamos National
    Laboratory
  • Need for new models and methods

8
Sequential Diagnosis Problem
  • Such sequential diagnosis problems arise in many
    areas
  • Communication networks (testing connectivity, paging cellular
    customers, sequencing tasks, ...)
  • Manufacturing (testing machines, fault diagnosis, routing customer
    service calls, ...)
  • Medicine (diagnosing patients, sequencing treatments, ...)

9
Sequential Decision Making Problem
  • Containers arrive to be classified into categories.
  • Simple case: 0 = ok, 1 = suspicious.
  • Inspection scheme specifies which inspections
    are to be made based on previous observations

10
Sequential Decision Making Problem for Container Inspection
  • Containers have attributes, each in a number of states.
  • Sample attributes:
  • Levels of certain kinds of chemicals or
    biological materials
  • Whether or not there are items of a certain kind
    in the cargo list
  • Whether cargo was picked up in a certain port

11
Sequential Decision Making Problem
  • Currently used attributes:
  • Does the ship's manifest set off an alarm?
  • What is the neutron or gamma emission count? Is it above threshold?
  • Does a radiograph image come up positive?
  • Does an induced fission test come up positive?

Gamma ray detector
12
Sequential Decision Making Problem
  • We can imagine many other attributes
  • The project I have worked on is concerned with
    general algorithmic approaches.
  • We seek a methodology not tied to today's technology.
  • Detectors are evolving quickly.

13
Sequential Decision Making Problem
  • Simplest Case: Attributes are in state 0 or 1 (absent or present).
  • Then a container is a bit string like 011001.
  • So classification is a decision function F that assigns each bit
    string to a category.

011001
F(011001)
If attributes 2, 3, and 6 are present, assign
container to category F(011001).
14
Sequential Decision Making Problem
  • If there are two categories, 0 and 1 (safe or
    suspicious), the decision function F is a
    Boolean function.
  • Example:
  • F(000) = F(111) = 1, F(abc) = 0 otherwise
  • This classifies a container as positive iff it has none of the
    attributes or all of them. (See the sketch below.)
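As a small illustration, this decision function can be written down
directly; the sketch below (in Python, with names of our choosing)
encodes containers as bit strings, following the slides.

```python
# A sketch of the decision function from this slide: containers are bit
# strings; F assigns each string to category 0 (safe) or 1 (suspicious).

def F(bits: str) -> int:
    """Positive iff the container has none of the attributes or all of them."""
    return 1 if bits in ("000", "111") else 0

print(F("000"), F("111"), F("011"))  # prints: 1 1 0
```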

15
Sequential Decision Making Problem
  • What if there are three categories: 0, ½, and 1?
  • Example:
  • F(000) = 0, F(111) = 1, F(abc) = 1/2 otherwise
  • This classifies a container as positive if it has all of the
    attributes, negative if it has none of the attributes, and
    uncertain if it has some but not all of the attributes.
  • I won't discuss this case.

16
Sequential Decision Making Problem
  • Given a container, test its attributes until we know enough to
    calculate the value of F.
  • An inspection scheme tells us in which order to
    test the attributes to minimize cost.
  • Even this simplified problem is hard
    computationally.

17
Sequential Decision Making Problem
  • This assumes F is known.
  • Simplifying assumption: Attributes are independent.
  • At any point we stop inspecting and output the
    value of F based on outcomes of inspections so
    far.
  • Complications: There may be precedence relations among the
    attributes (e.g., can't test attribute a4 before testing a6).
  • Or the cost may depend on attributes tested before.
  • F may depend on variables that cannot be
    directly tested or for which tests are too
    costly.

18
Sequential Decision Making Problem
  • Such problems are hard computationally.
  • There are many possible Boolean functions F.
  • Even if F is fixed, problem of finding a good
    classification scheme (to be defined precisely
    below) is NP-complete.
  • Several classes of Boolean functions F allow for efficient
    inspection schemes:
  • - k-out-of-n systems
  • - Certain series-parallel systems
  • - Read-once systems
  • - Regular systems
  • - Horn systems

19
Sensors and Inspection Lanes
  • n types of sensors measure presence or absence of the n attributes.
  • Many copies of each sensor.
  • Complication: different characteristics of sensors.
  • Entities come for inspection.
  • Which sensor of a given type to use?
  • Think of inspection lanes and waiting on line for inspection.
  • Besides efficient inspection schemes, we could decrease costs by:
  • buying more sensors
  • changing the allocation of containers to sensor lanes

20
Binary Decision Tree Approach
  • Sensors measure presence/absence of attributes: so 0 or 1
  • Use two categories: 0, 1 (safe or suspicious)
  • Binary Decision Tree:
  • Nodes are sensors or categories
  • Two arcs exit from each sensor node, labeled left and right
  • Take the right arc when the sensor says the attribute is present,
    the left arc otherwise (a sketch of this representation follows)
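A minimal Python representation of such trees is handy throughout; the
Leaf/Node classes below are our illustrative sketch (not code from the
presentation) and are reused by the later sketches.

```python
# A sketch of a binary decision tree for container inspection. Internal
# nodes hold a sensor (attribute index); leaves hold a category (0 or 1).

from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    category: int            # 0 = safe, 1 = suspicious

@dataclass
class Node:
    attribute: int           # which attribute this sensor tests
    left: "Tree"             # follow when the attribute is absent (0)
    right: "Tree"            # follow when the attribute is present (1)

Tree = Union[Leaf, Node]

def classify(tree: Tree, bits: str) -> int:
    """Walk the tree: right arc if the attribute is present, left otherwise."""
    while isinstance(tree, Node):
        tree = tree.right if bits[tree.attribute] == "1" else tree.left
    return tree.category

# The tree of Figure 1 below: category 1 iff both a0 and a1 are present.
fig1 = Node(0, Leaf(0), Node(1, Leaf(0), Leaf(1)))
assert classify(fig1, "11") == 1 and classify(fig1, "10") == 0
```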

21
Binary Decision Tree Approach
  • Reach category 1 from the root only through the
    path a0 to a1 to 1.
  • Container is classified in category 1 iff it has both attributes a0
    and a1.
  • Corresponding Boolean function:
  • F(11) = 1, F(10) = F(01) = F(00) = 0.

Figure 1
22
Binary Decision Tree Approach
  • Reach category 1 from the root only through the path a1 to a0 to 1.
  • Container is classified in category 1 iff it has both attributes a0
    and a1.
  • Corresponding Boolean function:
  • F(11) = 1, F(10) = F(01) = F(00) = 0.
  • Note: Different tree, same function.

Figure 1
23
Binary Decision Tree Approach
  • Reach category 1 from the root only through the path a0 to 1 or a0
    to a1 to 1.
  • Container is classified in category 1 iff it has attribute a0 or
    attribute a1.
  • Corresponding Boolean function:
  • F(11) = 1, F(10) = F(01) = 1, F(00) = 0.

Figure 1
24
Binary Decision Tree Approach
  • Reach category 1 from the root by a0 (left) to a1 (right) to a2
    (right) to 1, or by a0 (right) to a2 (right) to 1.
  • Container classified in category 1 iff it has a1 and a2 and not a0,
    or a0 and a2 (and possibly a1).
  • Corresponding Boolean function:
  • F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.

Figure 2
25
Binary Decision Tree Approach
  • This binary decision tree corresponds to the same Boolean function:
  • F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.
  • However, it has one less observation node ai. So, it is more
    efficient if all observations are equally costly and equally
    likely.

Figure 3
26
Binary Decision Tree Approach
  • So we have seen that a given Boolean function may
    correspond to different binary decision trees.
  • How do we find a low-cost or least-cost binary
    decision tree corresponding to a Boolean
    function?

27
Binary Decision Tree Approach
  • Even if the Boolean function F is fixed, the
    problem of finding the least cost binary
    decision tree for it is very hard (NP-complete).
  • For a small number of attributes n, one can try to solve it by
    trying all possible binary decision trees corresponding to the
    Boolean function F.
  • Even for n = 4, this is not practical. (n = 4 at the Port of Long
    Beach-Los Angeles)

Port of Long Beach
28
Binary Decision Tree Approach
  • Promising Approaches:
  • Heuristic algorithms; approximations to optimal.
  • Special assumptions about the Boolean function F.
  • For monotone Boolean functions, integer programming formulations
    give promising heuristics.
  • Stroud and Saeger (Los Alamos National Lab) enumerate all complete,
    monotone Boolean functions and calculate the least expensive
    corresponding binary decision trees.
  • Their method is practical for n up to 4, but not n = 5.

29
Binary Decision Tree Approach
  • Monotone Boolean Functions
  • Given two bit strings x1x2...xn, y1y2...yn:
  • Suppose that xi ≤ yi for all i implies that F(x1x2...xn) ≤
    F(y1y2...yn).
  • Then we say that F is monotone.
  • Then the all-ones string 11...1 has the highest probability of
    being in category 1.

30
Binary Decision Tree Approach
  • Monotone Boolean Functions
  • Given two bit strings x1x2...xn, y1y2...yn:
  • Suppose that xi ≤ yi for all i implies that F(x1x2...xn) ≤
    F(y1y2...yn).
  • Then we say that F is monotone.
  • Example:
  • n = 4, F(x) = 1 iff x has at least two 1s.
  • F(1100) = F(0101) = F(1011) = 1, F(1000) = 0, etc. (A monotonicity
    check is sketched below.)
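A brute-force monotonicity check is easy to sketch; truth tables here
are written as bit strings in the order F(00...0), ..., F(11...1),
matching labels such as 00000111 used later in the talk.

```python
# A sketch: test whether a Boolean function, given as a truth table string,
# is monotone by checking every comparable pair of input strings.

from itertools import product

def is_monotone(table: str, n: int) -> bool:
    """table[i] is F of the length-n binary expansion of i."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    value = {s: int(table[i]) for i, s in enumerate(strings)}
    return not any(
        all(a <= b for a, b in zip(x, y)) and value[x] > value[y]
        for x in strings for y in strings
    )

print(is_monotone("00000111", 3))  # True  (the function behind trees 7, 9, 10 later)
print(is_monotone("10000001", 3))  # False (F(000) = F(111) = 1 from slide 14)
```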

31
Binary Decision Tree Approach
  • Incomplete Boolean Functions
  • A Boolean function F is incomplete if F can be calculated by
    knowing the values of at most n-1 attributes.
  • Example: F(111) = F(110) = F(101) = F(100) = 1, F(000) = F(001) =
    F(010) = F(011) = 0.
  • F(abc) is determined without knowing b (or c).
  • F is incomplete.

32
Binary Decision Tree Approach
  • Complete, Monotone Boolean Functions
  • Stroud and Saeger algorithm for enumerating
    binary decision trees implementing complete,
    monotone Boolean functions.
  • Feasible to implement up to n = 4.
  • Then you can find the least cost tree by enumerating all binary
    decision trees corresponding to a given complete, monotone Boolean
    function and repeating this for all complete, monotone Boolean
    functions.

33
Binary Decision Tree Approach
  • Complete, Monotone Boolean Functions
  • Stroud and Saeger algorithm for enumerating
    binary decision trees implementing complete,
    monotone Boolean functions.
  • n = 2:
  • There are 6 monotone Boolean functions.
  • Only 2 of them are complete and monotone.
  • There are 4 binary decision trees for calculating
    these 2 complete, monotone Boolean functions.

34
Binary Decision Tree Approach
  • Complete, Monotone Boolean Functions
  • n = 3:
  • 9 complete, monotone Boolean functions.
  • 60 distinct binary trees for calculating them.

35
Binary Decision Tree Approach
  • Complete, Monotone Boolean Functions
  • n = 4:
  • 114 complete, monotone Boolean functions.
  • 11,808 distinct binary decision trees for calculating them.
  • (Compare: 1,079,779,602 BDTs for all Boolean functions)

36
Binary Decision Tree Approach
  • Complete, Monotone Boolean Functions
  • n = 5:
  • 6894 complete, monotone Boolean functions.
  • 263,515,920 corresponding binary decision trees.
  • Combinatorial explosion!
  • Need alternative approaches; enumeration is not feasible!
  • (Even worse: compare 5 × 10^18 BDTs corresponding to all Boolean
    functions)

37
Cost Functions
  • So far, we have assumed one binary decision tree is cheaper than
    another if it has fewer nodes.
  • This is oversimplified.
  • There are more complex costs involved than the number of sensors in
    a tree.

38
Cost Functions
  • The Stroud-Saeger method applies to more sophisticated cost models,
    not just cost = number of sensors in the BDT.
  • Using a sensor has a cost:
  • Unit cost of inspecting one item with it
  • Fixed cost of purchasing and deploying it
  • Delay cost from queuing up at the sensor station
  • Preliminary problem: disregard fixed and delay costs; minimize unit
    costs.

39
Cost Functions: Delay Costs
  • Tradeoff between fixed costs and delay costs: adding more sensors
    cuts down on delays.
  • More sophisticated models describe the process of
    containers arriving
  • There are differing delay times for inspections
  • Use queuing theory to find average delay times
    under different models

40
Cost Functions
  • Unit Cost Complication: How many nodes of the decision tree are
    actually visited during an average container's inspection? Depends
    on the distribution of containers.
  • The answer can also depend on the probability of sensor errors and
    the probability of a bomb in a container.

41
Cost Functions: Unit Costs and Tree Utilization
  • In our early models, we assume we are given the probability of
    sensor errors and the probability of a bomb in a container.
  • This allows us to calculate the expected cost of utilization of the
    tree, Cutil.

42
Cost Functions
  • OTHER COSTS
  • Cost of false positive: the cost of additional tests.
  • If it means opening the container, it's expensive.
  • Cost of false negative:
  • A complex issue.
  • What is the cost of a bomb going off in Manhattan?

43
Cost Functions: Sensor Errors
  • One Approach to False Positives/Negatives:
  • Assume there can be sensor errors.
  • Simplest model: assume that all sensors checking for attribute ai
    have the same fixed probability of saying ai is 0 if in fact it is
    1, and similarly of saying it is 1 if in fact it is 0.
  • More sophisticated analysis later describes a model for determining
    probabilities of sensor errors.
  • Notation: X = state of nature (bomb or no bomb); Y = outcome (of
    sensor or entire inspection process).

44
Probability of Error for the Entire Tree
  • State of nature zero (X = 0): absence of a bomb.
  • State of nature one (X = 1): presence of a bomb.
  • The probability of a false positive for this tree is P(Y=1|X=0);
    the probability of a false negative is P(Y=0|X=1).

For the example tree shown (sensors A, B, C):

P(Y=1|X=0) = P(YA=1|X=0) P(YB=1|X=0)
             + P(YA=1|X=0) P(YB=0|X=0) P(YC=1|X=0) = PFalsePositive

P(Y=0|X=1) = P(YA=0|X=1)
             + P(YA=1|X=1) P(YB=0|X=1) P(YC=0|X=1) = PFalseNegative
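These whole-tree error probabilities can be computed by a simple
recursion over the BDT. Below is a sketch reusing the Leaf/Node classes
from earlier; the sensor figures are the Stroud-Saeger values quoted on
a later slide, and the example tree is ours.

```python
# A sketch: P(tree outputs 1 | state of nature X = x), by recursion.
# p_fire[a][x] = P(sensor for attribute a reads 1 | X = x).

def p_classified_one(tree, x: int, p_fire: dict) -> float:
    if isinstance(tree, Leaf):
        return float(tree.category == 1)
    p1 = p_fire[tree.attribute][x]
    return (p1 * p_classified_one(tree.right, x, p_fire)
            + (1 - p1) * p_classified_one(tree.left, x, p_fire))

# Sensors A = 0, B = 1, C = 2, with the Stroud-Saeger error values:
p_fire = {0: {0: 0.10, 1: 0.90},
          1: {0: 0.01, 1: 0.99},
          2: {0: 0.001, 1: 0.999}}
tree = Node(0, Leaf(0), Node(1, Leaf(0), Leaf(1)))        # uses only A and B
p_false_positive = p_classified_one(tree, 0, p_fire)      # P(Y=1 | X=0)
p_false_negative = 1 - p_classified_one(tree, 1, p_fire)  # P(Y=0 | X=1)
print(p_false_positive, p_false_negative)  # approx. 0.001 and 0.109
```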
45
Cost Function used for Evaluating the Decision
Trees.
  • CTot = CFalsePositive · PFalsePositive + CFalseNegative ·
    PFalseNegative + Cutil

CFalsePositive is the cost of a false positive (Type I error).
CFalseNegative is the cost of a false negative (Type II error).
PFalsePositive is the probability of a false positive occurring.
PFalseNegative is the probability of a false negative occurring.
Cutil is the expected cost of utilization of the tree.
46
Cost Function used for Evaluating the Decision
Trees.
As before: CFalsePositive and CFalseNegative are the costs of a false
positive (Type I error) and a false negative (Type II error);
PFalsePositive and PFalseNegative are the corresponding probabilities;
Cutil is the expected cost of utilization of the tree.
PFalsePositive and PFalseNegative are calculated from the tree. Cutil
is calculated from the tree, the probability of a bomb in a container,
and the probability of sensor errors. CFalsePositive and
CFalseNegative are given input information. (A sketch of the full
computation follows.)
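Putting the pieces together, here is a sketch of the total-cost
evaluation. Reading Cutil as the expected sum of unit costs of the
sensors actually visited is our interpretation of the slides, not a
formula stated in them.

```python
# A sketch of CTot = CFP * PFP + CFN * PFN + Cutil, with Cutil taken to be
# the expected unit cost of the sensors visited (an assumption).

def expected_util_cost(tree, x: int, p_fire: dict, unit_cost: dict) -> float:
    """Expected sum of unit costs of sensors visited, given X = x."""
    if isinstance(tree, Leaf):
        return 0.0
    p1 = p_fire[tree.attribute][x]
    return (unit_cost[tree.attribute]
            + p1 * expected_util_cost(tree.right, x, p_fire, unit_cost)
            + (1 - p1) * expected_util_cost(tree.left, x, p_fire, unit_cost))

def total_cost(tree, p_fire, unit_cost, c_fp, c_fn, p_bomb):
    p_fp = p_classified_one(tree, 0, p_fire)
    p_fn = 1 - p_classified_one(tree, 1, p_fire)
    c_util = ((1 - p_bomb) * expected_util_cost(tree, 0, p_fire, unit_cost)
              + p_bomb * expected_util_cost(tree, 1, p_fire, unit_cost))
    return c_fp * p_fp + c_fn * p_fn + c_util
```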
47
Stroud Saeger Experiments
  • Stroud-Saeger ranked all trees formed from 3 or 4 sensors A, B, C,
    and D according to increasing tree costs.
  • Used the cost function defined above.
  • Values used in their experiments:
  • CA = 0.25   P(YA=1|X=1) = 0.90    P(YA=1|X=0) = 0.10
  • CB = 10     P(YB=1|X=1) = 0.99    P(YB=1|X=0) = 0.01
  • CC = 30     P(YC=1|X=1) = 0.999   P(YC=1|X=0) = 0.001
  • CD = 1      P(YD=1|X=1) = 0.95    P(YD=1|X=0) = 0.05
  • Here, Ci = unit cost of utilization of sensor i.
  • Also fixed were CFalseNegative, CFalsePositive, P(X=1).

48
Sensitivity Analysis
  • When parameters in a model are not known exactly,
    the results of a mathematical analysis can change
    depending on the values of the parameters.
  • It is important to do a sensitivity analysis: let the parameter
    values vary and see if the results change.
  • So, do the least cost trees change if we change values like the
    probability of a bomb, the cost of a false positive, etc.?

49
Stroud Saeger Experiments: Our Sensitivity Analysis
  • We have explored sensitivity of the Stroud-Saeger conclusions to
    variations in the values of three parameters:
  • CFalseNegative, CFalsePositive, P(X=1)
  • Extensive computer experimentation.
  • Fascinating results.
  • To start, we estimated high and low values for the parameters.

50
Stroud Saeger Experiments: Our Sensitivity Analysis
  • CFalseNegative was varied between $25 million and $10 billion:
  • low and high estimates of direct and indirect costs incurred due to
    a false negative.
  • CFalsePositive was varied between $180 and $720:
  • cost incurred due to a false positive
  • (4 men × (3-6 hrs) × ($15-$30/hr)).
  • P(X=1) was varied between 1/10,000,000 and 1/100,000.

51
Stroud Saeger Experiments: Our Sensitivity Analysis
  • n = 3 (use sensors A, B, C)
  • Varied the parameters CFalseNegative, CFalsePositive, P(X=1):
  • We chose the value of one of these parameters from its interval of
    values.
  • Then we explored the highest ranked tree as the other two
    parameters were chosen at random in their intervals of values.
  • 10,000 experiments for each fixed value.
  • We looked for the variation in the top-ranked tree and how the top
    rank related to the choice of parameter values. (A sketch of one
    such batch follows.)
  • Very surprising results.
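A schematic sketch of one batch of these randomized experiments, with
P(X=1) fixed at its median and the two costs drawn at random; the
function and parameter names here are illustrative stand-ins.

```python
# A sketch: count how often each tree ranks first when CFalseNegative and
# CFalsePositive are drawn at random from their estimated ranges.

import random
from collections import Counter

def sensitivity_batch(trees, cost, n_trials=10_000):
    """cost(tree, c_fp, c_fn) -> CTot; e.g. a closure over total_cost
    above, with p_fire, unit_cost and the median P(X=1) held fixed."""
    counts = Counter()
    for _ in range(n_trials):
        c_fn = random.uniform(25e6, 10e9)   # CFalseNegative range (slide 50)
        c_fp = random.uniform(180, 720)     # CFalsePositive range (slide 50)
        best = min(range(len(trees)),
                   key=lambda i: cost(trees[i], c_fp, c_fn))
        counts[best] += 1
    return counts  # how often each tree (by index) ranked first
```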

52
Frequency of Top-ranked Trees when CFalseNegative
and CFalsePositive are Varied
  • 10,000 randomized experiments (randomly selected values of
    CFalseNegative and CFalsePositive from the specified ranges) for
    the median value of P(X=1).
  • The graph gave frequency counts of the number of experiments in
    which a particular tree was ranked first, second, third, and so on.
  • Only three trees (7, 55 and 1) ever came first; 6 trees came
    second, 10 came third, 13 came fourth.

53
Frequency of Top-ranked Trees when CFalseNegative
and P(X1) are Varied
  • 10,000 randomized experiments for the median value of
    CFalsePositive.
  • Only 2 trees (7 and 55) ever came first; 4 trees came second; 7
    trees came third; 10 and 13 trees came fourth and fifth,
    respectively.

54
Frequency of Top-ranked Trees when P(X1) and
CFalsePositive are Varied
  • 10,000 randomized experiments for the median value of
    CFalseNegative.
  • Only 3 trees (7, 55 and 1) ever came first; 6 trees came second; 10
    trees came third; 13 and 16 trees came fourth and fifth,
    respectively.

55
Most Frequent Tree Groups Attaining the Top Three
Ranks.
  • Trees 7, 9 and 10

All three decision trees have been generated from the same Boolean
function, 00000111 (the values F(000), F(001), ..., F(111)). Both Tree
9 and Tree 10 are ranked second and third more than 99% of the time
when Tree 7 is ranked first.
56
Most Frequent Tree Groups Attaining the Top Three
Ranks
  • Trees 55, 57 and 58

All three trees correspond to the same Boolean function, 01111111.
Tree 57 is ranked second 96% of the time and tree 58 is third 79% of
the time when tree 55 is ranked first.
57
Most Frequent Tree Groups Attaining the Top Three
Ranks
  • Trees 1, 3, and 2

All three trees correspond to the same Boolean function, 00000001.
Tree 3 is ranked second 98% of the time and tree 2 is ranked third 80%
of the time when tree 1 is ranked first.
58
Most Frequent Tree Groups Attaining the Top Three
Ranks
  • Challenge: Why so few trees?
  • Why these trees?
  • Why so few Boolean functions?
  • Why these Boolean functions?

59
Stroud Saeger Experiments: Sensitivity Analysis, 4 Sensors
  • Second set of computer experiments: n = 4 (use sensors A, B, C, D).
  • Same values as before.
  • Experiment 1: Fix the values of two of CFalseNegative,
    CFalsePositive, P(X=1) and vary the third through its interval of
    possible values.
  • Experiment 2: Fix the value of one of CFalseNegative,
    CFalsePositive, P(X=1) and vary the other two.
  • Do 10,000 experiments each time.
  • Look for the variation in the highest ranked
    tree.

60
Stroud Saeger Experiments: Our Sensitivity Analysis, 4 Sensors
  • Experiment 1: Fix the values of two of CFalseNegative,
    CFalsePositive, P(X=1) and vary the third.

61
CTot vs. CFalseNegative for Rank-1 Trees (Trees 11485 (9651 runs) and
10129 (349 runs))
Only two trees were ever ranked first, and one, tree 11485, was ranked
first in 9651 out of 10,000 runs.
62
CTot vs. CFalsePositive for Rank-1 Trees (Tree 11485 (10,000 runs))
One tree, number 11485, was ranked first every time.
63
CTot vs. P(X=1) for Rank-1 Trees (Trees 11485 (8372 runs), 10129 (488
runs), 11521 (1056 runs))
Three trees dominated first place. Trees 10201 (60 runs), 10225 (17
runs) and 10153 (7 runs) also achieved first rank, but with relatively
low frequency.
64
Tree Structure For Top Trees
Tree number 11485: Boolean expression 0101011101111111
Tree number 10129: Boolean expression 0001011101111111
Note how close the Boolean expressions are.
65
Most Frequent Tree Groups Attaining the Top Three
Ranks
  • Same challenge as before: Why so few trees?
  • Why these trees?
  • Why so few Boolean functions?
  • Why these Boolean functions?

66
Stroud Saeger Experiments: Our Sensitivity Analysis, 4 Sensors
  • Experiment 2: Fix the value of one of CFalseNegative,
    CFalsePositive, P(X=1) and vary the others.

67
Stroud Saeger Experiments: Our Sensitivity Analysis, 4 Sensors
  • Experiment 2: Fix the value of one of CFalseNegative,
    CFalsePositive, P(X=1) and vary the others.
  • Similar results.

68
Conclusions from Sensitivity Analysis
  • Considerable lack of sensitivity to modification
    in parameters for trees using 3 or 4 sensors.
  • Very few optimal trees.
  • Very few Boolean functions arise among optimal
    and near-optimal trees.
  • Surprising results.

69
New Idea: Searching through a Generalized Tree Space
  • Sometimes adding more possibilities results in
    being able to do more efficient searches.
  • We expand the space of trees from those corresponding to Stroud and
    Saeger's complete and monotonic Boolean functions to complete and
    monotonic BDTs.
  • Advantages:
  • Unlike Boolean functions, BDTs may not have to consider all sensor
    inputs to give a final decision.
  • Allows more potentially useful trees to participate in the
    analysis.
  • Helps define an irreducible tree space for search operations.

70
Revisiting Monotonicity
  • Monotonic Decision Trees:
  • A binary decision tree will be called monotonic if all the left
    leaves are class 0 and all the right leaves are class 1.
  • Example:

All these trees correspond to the same monotonic Boolean function;
only one is a monotonic BDT.
71
Revisiting Completeness
  • Complete Decision Trees:
  • A binary decision tree will be called complete if every sensor
    occurs at least once in the tree and, at any non-leaf node in the
    tree, its left and right sub-trees are not identical.
  • Example (both tests are sketched below):
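Both structural tests are easy to sketch on the Leaf/Node
representation from earlier (function names are ours).

```python
# Sketches of the monotonicity and completeness tests for BDTs.

def is_monotonic_bdt(tree) -> bool:
    """Every left leaf is class 0 and every right leaf is class 1."""
    if isinstance(tree, Leaf):
        return True   # a bare leaf has no left/right leaves to check
    left_ok = (tree.left.category == 0 if isinstance(tree.left, Leaf)
               else is_monotonic_bdt(tree.left))
    right_ok = (tree.right.category == 1 if isinstance(tree.right, Leaf)
                else is_monotonic_bdt(tree.right))
    return left_ok and right_ok

def sensors_in(tree) -> set:
    if isinstance(tree, Leaf):
        return set()
    return {tree.attribute} | sensors_in(tree.left) | sensors_in(tree.right)

def is_complete_bdt(tree, n: int) -> bool:
    """Every one of the n sensors occurs, and no node has identical subtrees."""
    def distinct_subtrees(t):
        if isinstance(t, Leaf):
            return True
        return (t.left != t.right          # dataclass equality is structural
                and distinct_subtrees(t.left) and distinct_subtrees(t.right))
    return sensors_in(tree) == set(range(n)) and distinct_subtrees(tree)

print(is_monotonic_bdt(fig1), is_complete_bdt(fig1, 2))  # True True
```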

72
The CM Tree Space
(complete, monotonic BDTs)

No. of attributes | Distinct BDTs | Trees from CM Boolean functions | Complete, monotonic BDTs
        2         |            74 |                               4 |                        4
        3         |        16,430 |                              60 |                      114
        4         | 1,079,779,602 |                          11,808 |                   66,936
73
Tree Neighborhood and Tree Space
  • Define tree neighborhood by giving operations for
    moving from one tree in CM Tree Space to another.
  • We have developed an algorithm for finding
    low-cost BDTs by searching through CM Tree Space
    from a tree to one of its neighbors.

74
Search Operations in Tree Space
  • Split:
  • Pick a leaf node and replace it with a sensor that is not already
    present in that branch, and then insert arcs from that sensor to 0
    and to 1. (A sketch follows.)
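A sketch of Split on the Leaf/Node representation; addressing the leaf
by a string of L/R moves is our illustrative device, and enforcing the
"sensor not already on the branch" rule is left to the caller.

```python
# A sketch of the Split move: replace the leaf at the given L/R path with
# a new sensor whose left arc goes to a 0-leaf and right arc to a 1-leaf,
# so the result is still monotonic.

def split(tree, path: str, new_attribute: int):
    if not path:                       # arrived at the leaf to split
        assert isinstance(tree, Leaf)
        return Node(new_attribute, Leaf(0), Leaf(1))
    if path[0] == "L":
        return Node(tree.attribute,
                    split(tree.left, path[1:], new_attribute), tree.right)
    return Node(tree.attribute, tree.left,
                split(tree.right, path[1:], new_attribute))

# Split the left 0-leaf of fig1 with a new sensor 2:
print(split(fig1, "L", 2))
```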

75
Search Operations
  • Swap:
  • Pick a non-leaf node in the tree and swap it with its parent node,
    such that the new tree is still monotonic and complete and no
    sensor occurs more than once in any branch.

76
Search Operations
  • Merge:
  • Pick a parent node of two leaf nodes and make it a leaf node by
    collapsing the two leaf nodes below it; or pick a parent node with
    one leaf node, collapse both the parent node and its one leaf node,
    and shift the sub-tree up in the tree by one level.

[Figure: example trees over sensors a, b, c, d illustrating the MERGE
operation]
77
Search Operations
  • Replace:
  • Pick a node with a sensor occurring more than once in the tree and
    replace it with any other sensor, such that no sensor occurs more
    than once in any branch.

78
(No Transcript)
79
Tree Neighborhood and Tree Space
  • Define tree neighborhood by using these four
    operations for moving from one tree in CM Tree
    Space to another.
  • Irreducibility:
  • Theorem: Any tree in the CM tree space can be reached from any
    other tree by using these neighborhood operations repeatedly.
  • An irreducible CM tree space helps us search for the cheapest trees
    using neighborhood operations.

80
Tree Neighborhood and Tree Space
  • Sketch of Proof of the Theorem:
  • Simple Tree:
  • A simple tree is defined as a CM tree in which every sensor occurs
    exactly once, in such a way that there is exactly one path in the
    tree containing all the sensors.

81
Tree Neighborhood and Tree Space
  • Sketch of Proof of the Theorem:
  • To prove: Given any two trees t1, t2 in CM tree space, t2 can be
    reached from t1 by a sequence of neighborhood operations.
  • We prove this in three steps:
  • 1. Any tree t1 can be converted to a simple tree ts1.
  • 2. Any simple tree ts1 can be converted to any other simple tree
    ts2.
  • 3. Any simple tree ts2 can be converted to any tree t2.

82
Tree Space Traversal
  • Naïve Idea: Greedy Search
  • Randomly start at any tree in the CM tree space.
  • Find its neighboring trees using the above operations.
  • Move to the neighbor with the lowest cost.
  • Iterate until we find a minimum.
  • Problem: The CM tree space is highly multi-modal (more than one
    local minimum)!
  • Therefore, we implement a stochastic search algorithm with
    simulated annealing to find the best tree.

83
Tree Space Traversal
  • Stochastic Search:
  • Randomly start at any tree in CM space.
  • Find its neighboring trees, and evaluate each one for its total
    cost.
  • Select the next move according to a probability distribution over
    the neighboring trees.
  • To deal with the multimodality of the tree space, we introduce
    simulated annealing:
  • Make more random jumps initially, gradually decrease the
    randomness, and finally converge at the overall minimum. (A sketch
    follows.)
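A generic simulated-annealing sketch over CM tree space; neighbors()
would generate the split/swap/merge/replace moves and cost() is the
CTot evaluation (all names are illustrative).

```python
# A sketch of stochastic search with simulated annealing over tree space.

import math, random

def anneal(start, neighbors, cost, t0=1.0, cooling=0.995, steps=10_000):
    current, c_current = start, cost(start)
    best, c_best = current, c_current
    t = t0
    for _ in range(steps):
        candidate = random.choice(neighbors(current))
        c_cand = cost(candidate)
        # Always accept an improvement; accept a worse tree with
        # probability exp(-delta / t), which shrinks as t cools.
        if (c_cand <= c_current
                or random.random() < math.exp(-(c_cand - c_current) / t)):
            current, c_current = candidate, c_cand
        if c_current < c_best:
            best, c_best = current, c_current
        t *= cooling              # gradually decrease the randomness
    return best
```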

84
Results Searching CM Tree Space
  • We were able to perform experiments for 3, 4 and 5 sensors
    successfully by searching CM tree space.
  • Results show improvement compared to the extensive search method
    over BDTs corresponding to complete, monotone Boolean functions.
    E.g., n = 4 (66,936 trees):
  • 100 different experiments were performed.
  • Each experiment was started 10 times randomly at some tree in CM
    tree space, and chains were formed by making stochastic moves in
    the neighborhood until we found a local minimum.
  • Only 4890 trees were examined on average in each experiment.
  • The global minimum was found 82 out of 100 times, while the second
    best tree was found 10 times.
  • The method found trees that were less costly than those found by
    earlier searches of BDTs corresponding to complete, monotonic
    Boolean functions.

85
Genetic Algorithms-based Approach
  • Structure-based neighborhood moves allow only very short moves.
    Therefore:
  • Techniques like genetic algorithms and evolutionary techniques may
    suggest ways of getting more efficiently to better trees, given a
    population of good trees.

86
Genetic Algorithms-based Approach
  • Started implementing genetic-algorithm-based techniques for tree
    space traversal.
  • Basically, we try to get better trees from the current population
    of "good" trees using the basic genetic operations on them:
  • Selection
  • Crossover
  • Mutation
  • Here, "better" decision trees are decision trees of lower cost than
    the ones in the current ("good") population.

87
Genetic Algorithms-based Approach
  • Selection:
  • Select a random initial population of N trees from CM tree space.
  • Crossover:
  • Performed k times between every pair of trees in the current best
    population, bestPop.

88
Genetic Algorithms-based Approach
  • For each crossover operation between two trees, we randomly select
    a node in each tree and exchange their subtrees.
  • However, we impose certain restrictions on the selection of nodes,
    so that the resultant trees still lie in CM tree space. (A sketch
    follows.)
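A schematic sketch of this subtree-exchange crossover; the path helpers
and retry policy are our illustrative devices, and is_valid would
combine the monotonicity/completeness tests above with the
no-repeated-sensor-per-branch rule.

```python
# A sketch of crossover: swap random subtrees of two parents, retrying
# until both offspring still lie in CM tree space.

import random

def all_paths(tree, prefix=""):
    """All L/R paths to nodes (internal and leaf) of the tree."""
    if isinstance(tree, Leaf):
        return [prefix]
    return ([prefix] + all_paths(tree.left, prefix + "L")
            + all_paths(tree.right, prefix + "R"))

def get_subtree(tree, path):
    for step in path:
        tree = tree.left if step == "L" else tree.right
    return tree

def set_subtree(tree, path, sub):
    """Return a copy of tree with the subtree at path replaced by sub."""
    if not path:
        return sub
    if path[0] == "L":
        return Node(tree.attribute, set_subtree(tree.left, path[1:], sub), tree.right)
    return Node(tree.attribute, tree.left, set_subtree(tree.right, path[1:], sub))

def crossover(t1, t2, is_valid, tries=50):
    for _ in range(tries):
        p1, p2 = random.choice(all_paths(t1)), random.choice(all_paths(t2))
        s1, s2 = get_subtree(t1, p1), get_subtree(t2, p2)
        a, b = set_subtree(t1, p1, s2), set_subtree(t2, p2, s1)
        if is_valid(a) and is_valid(b):
            return a, b
    return t1, t2   # fall back to the parents if no valid exchange appears
```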

89
Genetic Algorithms-based Approach
  • Mutation:
  • Performed after every m generations of the algorithm.
  • We do two types of mutations:
  • 1. Generate all neighbors of the current best population and put
    them into the gene pool.
  • 2. Replace a fraction of the trees of bestPop with random trees
    from the CM tree space.

90
Genetic Algorithms-based Approach
  • Only 1600 trees had to be examined to obtain the
    10 best trees for 4 sensors!

91
Modeling Sensor Errors
  • One Approach to Sensor Errors: Modeling Sensor Operation
  • Threshold Model:
  • Sensors have different discriminating power.
  • Many use counts (e.g., gamma radiation counts).
  • See if the count exceeds a threshold.
  • If so, say the attribute is present.

92
Modeling Sensor Errors
  • Threshold Model:
  • Sensor i has discriminating power Ki and threshold Ti.
  • The attribute is declared present if the count exceeds Ti.
  • Seek threshold values that minimize the overall cost function,
    including costs of inspection and of false positives/negatives.
  • Assume readings of category 0 containers follow a Gaussian
    distribution, and similarly for category 1 containers.
  • Simulation approach.

93
Probability of Error for Individual Sensors
  • For the ith sensor, the Type 1 error (P(Yi=1|X=0)) and Type 2 error
    (P(Yi=0|X=1)) are modeled using Gaussian distributions.
  • State of nature X=0 represents absence of a bomb.
  • State of nature X=1 represents presence of a bomb.
  • The outcome of sensor i is its count reading.
  • Ki, the discriminating power of sensor i, is the separation between
    the means of the two distributions; Si is the standard deviation of
    the distributions.
  • PD = probability of detection; PF = probability of false positive.

[Figure: overlapping Gaussian count distributions for X = 0 and X = 1,
with means separated by Ki]
94
Modeling Sensor Errors
The probability of false positive for the ith sensor is computed as

  P(Yi=1|X=0) = 0.5 · erfc[Ti / √2]

The probability of detection for the ith sensor is computed as

  P(Yi=1|X=1) = 0.5 · erfc[(Ti - Ki) / (S·√2)]

where erfc is the complementary error function,
erfc(x) = Γ(1/2, x²)/√π.

The following experiments were done using sensors A, B, C with
KA = 4.37, SA = 1; KB = 2.9, SB = 1; KC = 4.6, SC = 1. We then varied
the individual sensor thresholds TA, TB and TC from -4.0 to 4.0 in
steps of 0.4. These values were chosen since they gave us an ROC curve
for the individual sensors over a complete range of P(Yi=1|X=0) and
P(Yi=1|X=1).
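These formulas are direct to evaluate; a sketch using SciPy's
complementary error function:

```python
# A sketch of the threshold-model error probabilities, per the formulas above.

from math import sqrt
from scipy.special import erfc

def p_false_positive(T: float) -> float:
    """P(Yi = 1 | X = 0) = 0.5 * erfc(T / sqrt(2))."""
    return 0.5 * erfc(T / sqrt(2))

def p_detection(T: float, K: float, S: float = 1.0) -> float:
    """P(Yi = 1 | X = 1) = 0.5 * erfc((T - K) / (S * sqrt(2)))."""
    return 0.5 * erfc((T - K) / (S * sqrt(2)))

# Sensor A (K_A = 4.37, S_A = 1), sweeping its threshold T_A:
for T in (-4.0, 0.0, 4.0):
    print(T, p_false_positive(T), p_detection(T, 4.37))
```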
95
Frequency of First Ranked Trees for Variations in
Sensor Thresholds
  • Extensive search: 68,921 experiments were conducted, as each Ti was
    varied through its entire range (n = 3).
  • The graph gave frequency counts of the number of experiments in
    which a particular tree was ranked first. There are 15 such trees.
    Tree 37 had the highest frequency of attaining rank one.

96
Modeling Sensor Errors
  • A number of trees ranking first in other experiments also ranked
    first here.
  • Similar results in the case of n = 4:
  • 4,194,481 experiments.
  • 244 different trees were ranked first in at least one experiment.
  • Trees ranked first in other experiments also frequently appeared
    first here.
  • Conclusion: considerable insensitivity to change of threshold.

97
New Approaches to Optimum Threshold Computation
  • Extensive search over a range of thresholds (e.g., -4.0 to 4.0 in
    steps of 0.4) has some practical drawbacks:
  • Large number of threshold values for every sensor.
  • Large step size.
  • Grows exponentially with the number of sensors (computationally
    infeasible for n > 4).
  • A non-linear optimization approach proves more satisfactory:
  • a combination of gradient descent and modified Newton's methods.

98
Problems with Standard Approaches
  • Gradient Descent Method:
  • Too small a step size results in a large number of iterations to
    reach the minimum.
  • Too big a step size results in skipping the minimum.
  • Newton's Method:
  • Convergence depends largely on the starting point; the method
    occasionally drifts in the wrong direction and hence fails to
    converge.
  • Solution: a combination of gradient descent and Newton's methods
    (sketched below).
  • This works well.
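A one-dimensional sketch of the combined method (schematic; the real
objective is multivariate in the thresholds Ti): take a Newton step
when the curvature is positive, and otherwise fall back to a gradient
step.

```python
# A sketch of the gradient-descent / Newton's-method combination.

def minimize(f_prime, f_double_prime, x0, lr=0.1, tol=1e-8, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        g = f_prime(x)
        if abs(g) < tol:
            break
        h = f_double_prime(x)
        if h > 0:
            x -= g / h        # Newton step: fast near a well-behaved minimum
        else:
            x -= lr * g       # gradient step: safe when curvature misleads
    return x

# Example: minimize (x - 2)^4, whose minimum is at x = 2.
print(minimize(lambda x: 4 * (x - 2) ** 3,
               lambda x: 12 * (x - 2) ** 2, x0=10.0))
```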

99
Results Threshold Optimization
  • The costs of false positives (CFalsePositive) and false negatives
    (CFalseNegative) and the prior probability of occurrence of a bad
    container, P(X=1), were fixed as the medians of the min and max
    values given by Stroud and Saeger (the same values we used in
    earlier experiments).
  • We were able to converge to a (hopefully close to minimum) cost
    every time, with a modest number of threshold-changing iterations.

100
Results Threshold Optimization
  • We were able to converge to a (hopefully close to minimum) cost
    every time, with a modest number of threshold-changing iterations.
    For example:
  • For 3 sensors, it took an average of 0.081 seconds (as opposed to
    0.387 seconds using extensive search) to converge to a cost for all
    114 trees studied.
  • For 4 sensors, it took an average of 0.196 seconds (as opposed to
    more than 2 seconds using extensive search) to converge to a cost
    for all 66,936 trees studied.
  • In each case, the minimum cost attained with the new algorithm was
    lower, and often much lower, than that attained with extensive
    search.

101
Results Threshold Optimization
  • Many times the minimum obtained using the
    optimization method was considerably less than
    the one from the extensive search technique.

102
Closing Comments
  • Very few optimal trees; optimality is insensitive to changes in
    parameters.
  • Extensive search techniques become practically infeasible beyond a
    very small number of sensors.
  • Studying an irreducible tree space helps us to search for the best
    trees rather than evaluating all the trees for their cost.
  • A new stochastic search algorithm allows us to search successfully
    for optimum inspection schemes beyond 4 sensors.
  • Our new threshold optimization algorithms provide faster ways to
    arrive at a low tree cost; the cost is lower, and often much lower,
    than in extensive search.

103
Discussion and Future Work
  • Future work: Explain why conclusions are so insensitive to
    variation in parameter values.
  • Future work: Explore the structure of the optimal trees and compare
    the different optimal trees.
  • Future work: Develop methods for approximating the optimal tree.

Pallet VACIS
104
Discussion and Future Work
  • Future work: More than two values of an attribute
  • (present, absent, present with probability > 75%, absent with
    probability at least 75%)
  • (ok, not ok, ok with probability > 99%, ok with probability between
    95% and 99%)
  • Future work: In the Boolean function model, inferring the Boolean
    function from observations (partially defined Boolean functions)

105
Discussion and Future Work
  • Future work: Need for more complicated cost models bringing in
    costs of delays.

106
Discussion and Future Work
  • Future work: Because of the rapid growth in the number of trees in
    CM tree space as the number of sensors grows, it is necessary to
    try to reduce the number of trees we need to search through.
  • A notion of tree equivalence could be incorporated when the number
    of sensors goes beyond 5 or 6.
  • We hope that incorporating this into our model will enable us to
    extend it to a large number of sensors.

107
  • Collaborators on this Work
  • Saket Anand
  • David Madigan
  • Richard Mammone
  • Sushil Mittal
  • Saumitr Pathak
  • Research Support
  • Dept. of Homeland Security University Programs
  • Domestic Nuclear Detection Office
  • Office of Naval Research
  • National Science Foundation
  • Los Alamos National Laboratory
  • Rick Picard
  • Kevin Saeger
  • Phil Stroud

108
This work has gotten me places I never thought I'd go.
More information: http://ccicada.org, http://dimacs.rutgers.edu,
froberts@dimacs.rutgers.edu