Algorithms for Port of Entry Inspection for WMDs - PowerPoint PPT Presentation

1
Algorithms for Port of Entry Inspection for WMDs
Fred S. Roberts, DIMACS Center, Rutgers University
2
Port of Entry Inspection Algorithms
  • Goal: Find ways to intercept illicit nuclear
    materials and weapons destined for the U.S. via
    the maritime transportation system.
  • Currently inspecting only a small % of
    containers arriving at ports.
  • Even inspecting 8% of containers in the Port of
    NY/NJ might bring international trade to a halt
    (Larrabbee 2002).

3
Port of Entry Inspection Algorithms
  • Aim: Develop decision support algorithms that
    will help us to optimally intercept illicit
    materials and weapons subject to limits on
    delays, manpower, and equipment.
  • Find inspection schemes that minimize total
    cost, including the cost of false positives and
    false negatives.

Mobile Vacis truck-mounted gamma ray imaging system
4
Sequential Decision Making Problem
  • Stream of containers arrives at a port.
  • The Decision Maker's Problem:
  • Which containers to inspect?
  • Which inspections to perform next, based on
    previous results?
  • Approach:
  • decision logics
  • combinatorial optimization methods
  • Builds on ideas of Stroud and Saeger at Los
    Alamos National Laboratory.
  • Need for new models and methods.

5
Sequential Diagnosis Problem
  • Such sequential diagnosis problems arise in many
    areas:
  • Communication networks (testing connectivity,
    paging cellular customers, sequencing tasks, ...)
  • Manufacturing (testing machines, fault diagnosis,
    routing customer service calls, ...)
  • Artificial intelligence/CS (optimal derivation
    strategies in knowledge bases, best-value
    satisficing search, coding decision trees, ...)
  • Medicine (diagnosing patients, sequencing
    treatments, ...)

6
Sequential Decision Making Problem
  • Containers arriving at a port are to be
    classified into categories.
  • Simple case: 0 = ok, 1 = suspicious.
  • An inspection scheme specifies which inspections
    are to be made based on previous observations.

7
Sequential Decision Making Problem
  • Containers have attributes, each in a number of
    states.
  • Sample attributes:
  • Levels of certain kinds of chemicals or
    biological materials
  • Whether or not there are items of a certain kind
    in the cargo list
  • Whether cargo was picked up in a certain port

8
Sequential Decision Making Problem
  • Currently used attributes:
  • Does the ship's manifest set off an alarm?
  • What is the neutron or gamma emission count? Is
    it above threshold?
  • Does a radiograph image come up positive?
  • Does an induced fission test come up positive?

Gamma ray detector
9
Sequential Decision Making Problem
  • We can imagine many other attributes.
  • This project is concerned with general
    algorithmic approaches.
  • We seek a methodology not tied to today's
    technology.
  • Detectors are evolving quickly.

10
Sequential Decision Making Problem
  • Simplest case: Attributes are in state 0 or 1.
  • Then: A container is a binary string like 011001.
  • So: Classification is a decision function F that
    assigns each binary string to a category.

011001 → F(011001)
If attributes 2, 3, and 6 are present, assign the
container to category F(011001).
11
Sequential Decision Making Problem
  • If there are two categories, 0 and 1, the
    decision function F is a boolean function.
  • Example:
  • F(000) = F(111) = 1, F(abc) = 0 otherwise.
  • This classifies a container as positive iff it
    has none of the attributes or all of them.
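
A minimal Python sketch of this example, representing a container as a string of 0s and 1s where the ith character is the state of attribute i. The helper names `F` and `present_attributes` are illustrative, not from the presentation.

```python
from itertools import product


def F(container: str) -> int:
    """Example decision function: category 1 iff none or all of the attributes are present."""
    return 1 if set(container) in ({"0"}, {"1"}) else 0


def present_attributes(container: str) -> list[int]:
    """1-indexed positions of the attributes that are in state 1."""
    return [i + 1 for i, bit in enumerate(container) if bit == "1"]


print(present_attributes("011001"))   # [2, 3, 6]

# Truth table of F over all containers with 3 attributes.
for bits in product("01", repeat=3):
    container = "".join(bits)
    print(container, F(container))
```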

12
Sequential Decision Making Problem
  • Given a container, test its attributes until we
    know enough to calculate the value of F.
  • An inspection scheme tells us in which order to
    test the attributes to minimize cost.
  • Even this simplified problem is computationally
    hard.
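
A minimal sketch of "test until F is known" for a fixed test order, assuming F is available as a Python function on 0/1 tuples. The names `inspect_container` and `is_determined` are hypothetical, and determination is checked by brute force over all completions of the observations, so this only illustrates the stopping rule; it does not choose a good order.

```python
from itertools import product
from typing import Callable

BoolFn = Callable[[tuple[int, ...]], int]


def is_determined(F: BoolFn, n: int, known: dict[int, int]) -> bool:
    """True if F takes one value on every container consistent with `known`
    (attribute index -> observed state). Brute force over all completions."""
    free = [i for i in range(n) if i not in known]
    seen = set()
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, v in known.items():
            x[i] = v
        for i, v in zip(free, bits):
            x[i] = v
        seen.add(F(tuple(x)))
        if len(seen) > 1:
            return False
    return True


def inspect_container(F: BoolFn, container: tuple[int, ...], order: list[int]) -> tuple[int, int]:
    """Test attributes in `order`, stopping as soon as F is determined.
    Returns (category, number of attributes tested)."""
    n = len(container)
    known: dict[int, int] = {}
    for i in order:
        if is_determined(F, n, known):
            break
        known[i] = container[i]
    return F(container), len(known)


def example_F(x: tuple[int, ...]) -> int:
    """The none-or-all example: F(000) = F(111) = 1, F = 0 otherwise."""
    return int(x in ((0, 0, 0), (1, 1, 1)))


print(inspect_container(example_F, (0, 1, 0), [0, 1, 2]))   # (0, 2): a0=0, a1=1 forces F=0
print(inspect_container(example_F, (0, 0, 0), [0, 1, 2]))   # (1, 3): all three tests needed
```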

13
Sequential Decision Making Problem
  • This assumes F is known.
  • Simplifying assumption: Attributes are
    independent.
  • At any point we may stop inspecting and output
    the value of F based on the outcomes of
    inspections so far.
  • Complications: There may be precedence relations
    among the components (e.g., we can't test
    attribute a4 before testing a6).
  • Or the cost may depend on the attributes tested
    before.
  • F may depend on variables that cannot be
    directly tested or for which tests are too
    costly.

14
Sequential Decision Making Problem
  • Such problems are computationally hard.
  • There are many possible boolean functions F
    (2^(2^n) of them for n attributes).
  • Even if F is fixed, the problem of finding a good
    classification scheme (to be defined precisely
    below) is NP-complete.
  • Several classes of functions F allow for
    efficient inspection schemes:
  • k-out-of-n systems
  • Certain series-parallel systems
  • Read-once systems
  • Regular systems
  • Horn systems
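
For a k-out-of-n system (F = 1 iff at least k of the n attributes are present), the stopping rule is simple: stop after finding k present attributes or n - k + 1 absent ones. A sketch with a hypothetical function name; the test order is supplied by the caller, and choosing a cost-minimizing order is not attempted here.

```python
def inspect_k_out_of_n(container: tuple[int, ...], k: int, order: list[int]) -> tuple[int, int]:
    """k-out-of-n system: F = 1 iff at least k of the n attributes are present.
    Tests attributes in `order` and stops as soon as F is forced.
    Returns (category, number of tests). Assumes 1 <= k <= n and that
    `order` lists every attribute exactly once."""
    n = len(container)
    present = absent = 0
    for tests, i in enumerate(order, start=1):
        if container[i]:
            present += 1
        else:
            absent += 1
        if present >= k:
            return 1, tests     # k attributes found: F = 1 whatever the rest are
        if absent >= n - k + 1:
            return 0, tests     # at most k - 1 attributes can still be present: F = 0
    raise ValueError("order must list every attribute")


print(inspect_k_out_of_n((1, 1, 0), k=2, order=[0, 1, 2]))   # (1, 2)
print(inspect_k_out_of_n((0, 0, 1), k=2, order=[0, 1, 2]))   # (0, 2)
```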

15
Sensors and Inspection Lanes
  • n types of sensors measure the presence or
    absence of the n attributes.
  • Many copies of each sensor.
  • Complication: different sensors have different
    characteristics.
  • Entities come for inspection.
  • Which sensor of a given type should be used?
  • Think of inspection lanes and queues.
  • Besides using efficient inspection schemes, we
    could decrease costs by:
  • Buying more sensors
  • Changing the allocation of containers to sensor
    lanes

16
Binary Decision Tree Approach
  • Sensors measure presence/absence of attributes.
  • Binary Decision Tree:
  • Nodes are sensors or categories (0 or 1).
  • Two arcs exit from each sensor node, labeled left
    and right.
  • Take the right arc when the sensor says the
    attribute is present, the left arc otherwise.

17
Binary Decision Tree Approach
  • Category 1 is reached from the root only through
    the path a0 → a1 → 1.
  • A container is classified in category 1 iff it
    has both attributes a0 and a1.
  • Corresponding boolean function: F(11) = 1,
    F(10) = F(01) = F(00) = 0.

Figure 1
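
One way to represent such a tree in code, as a sketch: a sensor node stores the index of the attribute it tests and its two arcs, and a leaf is simply the category 0 or 1. The class and function names are hypothetical. The tree below encodes Figure 1 under the assumption that both left arcs lead to category 0.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass
class Sensor:
    attribute: int    # index i of the attribute a_i tested at this node
    left: "Node"      # arc taken when the sensor says a_i is absent
    right: "Node"     # arc taken when the sensor says a_i is present


Node = Union[Sensor, int]   # a leaf is a category, 0 or 1


def classify(node: Node, container: tuple[int, ...]) -> int:
    """Follow arcs from the root until a category leaf is reached."""
    while isinstance(node, Sensor):
        node = node.right if container[node.attribute] else node.left
    return node


# Figure 1: a0 -(right)-> a1 -(right)-> 1; every other arc leads to 0.
figure1 = Sensor(0, left=0, right=Sensor(1, left=0, right=1))

table = {bits: classify(figure1, bits) for bits in product((0, 1), repeat=2)}
print(table)   # {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```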
18
Binary Decision Tree Approach
  • Category 1 is reached from the root by
  • a0 L → a1 R → a2 R → 1, or
  • a0 R → a2 R → 1.
  • A container is classified in category 1 iff it has:
  • a1 and a2 and not a0, or
  • a0 and a2 and possibly a1.
  • Corresponding boolean function: F(111) = F(101) =
    F(011) = 1, F(abc) = 0 otherwise.

Figure 2
19
Binary Decision Tree Approach
  • This binary decision tree corresponds to the same
    boolean function:
  • F(111) = F(101) = F(011) = 1, F(abc) = 0
    otherwise.
  • However, it has one fewer observation node ai.
    So it is more efficient if all observations are
    equally costly and equally likely.

Figure 3
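
The efficiency claim can be checked with a small sketch. It rebuilds the Figure 2 tree from the paths listed for it (assuming every unlisted arc leads to 0) and one hypothetical three-node tree for the same function, which tests a2 first; the latter need not be the tree drawn in Figure 3. It confirms both compute the same F and compares the expected number of sensors visited when each attribute is present independently with probability 1/2 and all tests cost the same. All names are illustrative.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass
class Sensor:
    attribute: int   # tests attribute a_i
    left: "Node"     # attribute absent
    right: "Node"    # attribute present


Node = Union[Sensor, int]   # leaves are categories 0 or 1


def classify(node: Node, x: tuple[int, ...]) -> int:
    while isinstance(node, Sensor):
        node = node.right if x[node.attribute] else node.left
    return node


def observation_nodes(node: Node) -> int:
    """Number of sensor (observation) nodes in the tree."""
    if isinstance(node, Sensor):
        return 1 + observation_nodes(node.left) + observation_nodes(node.right)
    return 0


def expected_tests(node: Node, p: float = 0.5) -> float:
    """Expected number of sensors visited when each attribute is present
    independently with probability p (no attribute is tested twice on a path)."""
    if not isinstance(node, Sensor):
        return 0.0
    return 1 + (1 - p) * expected_tests(node.left, p) + p * expected_tests(node.right, p)


# Figure 2 rebuilt from its paths: a0 L -> a1 R -> a2 R -> 1 and a0 R -> a2 R -> 1.
figure2 = Sensor(0,
                 left=Sensor(1, left=0, right=Sensor(2, left=0, right=1)),
                 right=Sensor(2, left=0, right=1))
# Hypothetical three-node tree for the same function: a2 first, then a0, then a1.
alt = Sensor(2, left=0, right=Sensor(0, left=Sensor(1, left=0, right=1), right=1))

assert all(classify(figure2, x) == classify(alt, x) for x in product((0, 1), repeat=3))
print(observation_nodes(figure2), observation_nodes(alt))   # 4 3
print(expected_tests(figure2), expected_tests(alt))         # 2.25 1.75
```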
20
Binary Decision Tree Approach
  • Even if the boolean function F is fixed, the
    problem of finding the optimal binary decision
    tree for it is very hard (NP-complete).
  • For a small number n of attributes, we can try to
    solve it by brute-force enumeration.
  • Even for n = 5, this is not practical (n = 4 at
    the Port of Long Beach-Los Angeles).

Port of Long Beach
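
For small n, the optimum can be found exhaustively. The sketch below does not list trees one by one; instead it tries every untested attribute at every node and memoizes the best expected cost for each partial observation, which gives the same optimum as checking every tree. It assumes independent attributes with known presence probabilities `p[i]` and per-test costs `costs[i]`; the function name is hypothetical, and the running time is still exponential in n.

```python
from functools import lru_cache
from itertools import product
from typing import Callable, Optional, Sequence

State = tuple[Optional[int], ...]   # observed state of each attribute, None = not yet tested


def optimal_expected_cost(F: Callable[[tuple[int, ...]], int], n: int,
                          costs: Sequence[float], p: Sequence[float]) -> float:
    """Minimum expected testing cost over all decision trees that compute F.
    Tries every untested attribute at every node, memoized per partial
    observation (3**n states)."""
    def completions(known: State):
        free = [i for i in range(n) if known[i] is None]
        for bits in product((0, 1), repeat=len(free)):
            x = list(known)
            for i, b in zip(free, bits):
                x[i] = b
            yield tuple(x)

    @lru_cache(maxsize=None)
    def best(known: State) -> float:
        if len({F(x) for x in completions(known)}) == 1:
            return 0.0                      # F is already determined: this node is a leaf
        return min(
            costs[i]
            + (1 - p[i]) * best(known[:i] + (0,) + known[i + 1:])
            + p[i] * best(known[:i] + (1,) + known[i + 1:])
            for i in range(n) if known[i] is None
        )

    return best((None,) * n)


def f(x: tuple[int, ...]) -> int:
    """F(111) = F(101) = F(011) = 1, i.e. a2 and (a0 or a1)."""
    return x[2] & (x[0] | x[1])


print(optimal_expected_cost(f, 3, [1, 1, 1], [0.5, 0.5, 0.5]))   # 1.75
```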
21
Binary Decision Tree Approach
  • Promising approaches:
  • Heuristic algorithms, approximations to the
    optimum.
  • Special assumptions about the boolean function F.
  • Example: For monotone boolean functions,
    integer programming formulations give promising
    heuristics.
  • Stroud and Saeger enumerate all complete,
    monotone boolean functions and calculate the
    least expensive corresponding binary decision
    trees.

22
Binary Decision Tree Approach
  • Monotone Boolean Functions
  • Given two strings x1x2...xn, y1y2...yn.
  • Suppose that xi ≤ yi for all i implies that
    F(x1x2...xn) ≤ F(y1y2...yn).
  • Then we say that F is monotone.
  • Then 11...1 has the highest probability of being
    in category 1.
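
A sketch of a monotonicity test on a truth table, assuming the list index x encodes the input with bit i giving the state of attribute i (an encoding chosen here for illustration). The none-or-all example F(000) = F(111) = 1 from earlier is not monotone, since F(000) = 1 but F(001) = 0.

```python
def is_monotone(table: list[int], n: int) -> bool:
    """table[x] = F(x) for x = 0 .. 2**n - 1, bit i of x = state of attribute i.
    Checking single flips of one attribute from 0 to 1 suffices, because any
    x <= y is reached from x by a chain of such flips."""
    for x in range(1 << n):
        for i in range(n):
            if ((x >> i) & 1) == 0 and table[x] > table[x | (1 << i)]:
                return False
    return True


none_or_all = [1, 0, 0, 0, 0, 0, 0, 1]   # F(000) = F(111) = 1
all_three = [0, 0, 0, 0, 0, 0, 0, 1]     # F = 1 iff all three attributes are present
print(is_monotone(none_or_all, 3), is_monotone(all_three, 3))   # False True

# Number of monotone boolean functions of 2 attributes (all 16 truth tables tested).
print(sum(is_monotone([(m >> x) & 1 for x in range(4)], 2) for m in range(16)))   # 6
```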

23
Binary Decision Tree Approach
  • Incomplete Boolean Functions
  • Boolean function F is incomplete if F can be
    calculated by finding at most n-1 attributes
    and knowing the value of the input string on
    those attributes.
  • Example: F(111) = F(110) = F(101) = F(100) = 1,
    F(000) = F(001) = F(010) = F(011) = 0.
  • F(abc) is determined by a alone, without knowing
    b or c.
  • F is incomplete.
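
A sketch of the completeness check, reading "incomplete" as "some attribute never changes F" (if attribute i is inessential, the other n-1 attributes suffice). F is passed as a Python function on 0/1 tuples; the helper names are illustrative.

```python
from itertools import product


def essential_attributes(F, n: int) -> list[int]:
    """Indices i such that flipping attribute i alone changes F for at least one input."""
    result = []
    for i in range(n):
        for x in product((0, 1), repeat=n):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if F(x) != F(flipped):
                result.append(i)
                break
    return result


def is_complete(F, n: int) -> bool:
    return len(essential_attributes(F, n)) == n


def slide_example(x):
    """F(1bc) = 1 and F(0bc) = 0: only the first attribute matters."""
    return x[0]


print(essential_attributes(slide_example, 3), is_complete(slide_example, 3))   # [0] False
```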

24
Binary Decision Tree Approach
  • Complete, Monotone Boolean Functions
  • Stroud and Saeger algorithm for enumerating
    binary decision trees implementing complete,
    monotone boolean functions.
  • Feasible to implement up to n = 4.
  • n = 2:
  • There are 6 monotone boolean functions.
  • Only 2 of them are complete, monotone.
  • There are 4 binary decision trees for calculating
    these 2 complete, monotone boolean functions.

25
Binary Decision Tree Approach
  • Complete, Monotone Boolean Functions
  • n = 3:
  • 9 complete, monotone boolean functions.
  • 60 distinct binary decision trees for
    calculating them.
  • n = 4:
  • 114 complete, monotone boolean functions.
  • 11,808 distinct binary decision trees for
    calculating them.

26
Binary Decision Tree Approach
  • Complete, Monotone Boolean Functions
  • n = 5:
  • 6894 complete, monotone boolean functions.
  • 263,515,920 corresponding binary decision trees.
  • Combinatorial explosion!
  • Need alternative approaches: enumeration is not
    feasible!
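
The function counts for n = 2, 3 and 4 can be reproduced by brute force over all 2^(2^n) truth tables, as a sketch (bit i of the input index is attribute i; helper names are illustrative). This counts functions only, not decision trees; for n = 5 there are 2^32 truth tables, far too many for this naive loop.

```python
def is_monotone(table: list[int], n: int) -> bool:
    """No single flip of an attribute from 0 to 1 may lower F."""
    return all(table[x] <= table[x | (1 << i)]
               for x in range(1 << n) for i in range(n) if ((x >> i) & 1) == 0)


def depends_on_all(table: list[int], n: int) -> bool:
    """Complete: flipping each attribute changes F for at least one input."""
    return all(any(table[x] != table[x ^ (1 << i)] for x in range(1 << n)) for i in range(n))


def count_monotone(n: int) -> tuple[int, int]:
    """(number of monotone functions, number of complete monotone functions) of n attributes."""
    monotone = complete = 0
    for m in range(1 << (1 << n)):                     # every truth table, as a bit mask
        table = [(m >> x) & 1 for x in range(1 << n)]
        if is_monotone(table, n):
            monotone += 1
            complete += depends_on_all(table, n)
    return monotone, complete


for n in (2, 3, 4):
    print(n, count_monotone(n))
# 2 (6, 2)
# 3 (20, 9)
# 4 (168, 114)
```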

27
Cost Functions
  • Above analysis: Only uses the number of sensors.
  • Using a sensor has a cost:
  • Unit cost of inspecting one item with it
  • Fixed cost of purchasing and deploying it
  • Delay cost from queuing up at the sensor station
  • Preliminary problem: Disregard fixed and delay
    costs. Minimize unit costs.

28
Cost Functions
  • Simplification so far: Disregard characteristics
    of the population of entities being inspected.
  • Only count the number of observation (attribute)
    nodes in the tree.
  • Unit cost complication: How many nodes of the
    decision tree are actually visited during an
    average container's inspection? This depends on
    the distribution of containers. In our early
    models, it will depend on the probability of
    sensor errors and the probability of a bomb in a
    container.

29
Cost Functions: Delay Costs
  • Tradeoff between fixed costs and delay costs:
    Adding more sensors cuts down on delays.
  • Stochastic process of containers arriving
  • Distribution of delay times for inspections
  • Use queuing theory to find average delay times
    under different models.
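
As one illustration of the queueing side, the sketch below assumes the simplest textbook model, an M/M/c queue (Poisson arrivals, exponential inspection times, c identical sensors in parallel), which may differ from the models the project uses. It computes the mean time a container spends at a station with the Erlang C formula; the arrival and service rates are hypothetical.

```python
from math import factorial


def mmc_time_in_system(lam: float, mu: float, c: int) -> float:
    """Mean time a container spends at an inspection station (queueing + inspection)
    in an M/M/c queue: arrival rate lam, inspection rate mu per sensor, c sensors.
    Uses the Erlang C probability that an arriving container has to wait."""
    a = lam / mu                                    # offered load
    if a >= c:
        raise ValueError("unstable queue: need lam < c * mu")
    tail = a ** c / factorial(c) * c / (c - a)
    p_wait = tail / (sum(a ** k / factorial(k) for k in range(c)) + tail)   # Erlang C
    mean_wait = p_wait / (c * mu - lam)
    return mean_wait + 1 / mu


# Hypothetical load: 50 containers/hour; one sensor inspects 60 containers/hour.
for c in (1, 2, 3):
    print(c, "sensors:", round(60 * mmc_time_in_system(50, 60, c), 2), "minutes")
```

Adding sensors shortens the delay, which is the fixed-cost versus delay-cost tradeoff above in miniature.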

30
Cost Functions
  • Cost of false positive: Cost of additional tests.
  • If it means opening the container, it's very
    expensive.
  • Cost of false negative:
  • Complex issue.
  • What is the cost of a bomb going off in Manhattan?

31
The Brute Force Approach
  • The cost of each binary decision tree
    corresponding to a complete, monotone boolean
    function is calculated.
  • The optimum tree is selected.
  • Optimum depends on assumptions about sensor
    errors, costs of false positive and false
    negative outcomes, and unit, fixed, and delay
    costs for each sensor.

32
Cost Functions: Sensor Errors
  • One approach to false positives/negatives:
  • Assume there can be sensor errors.
  • Simplest model: Assume that all sensors checking
    for attribute ai have the same fixed probability
    of saying ai is 0 if in fact it is 1, and
    similarly of saying it is 1 if in fact it is 0.
  • A more sophisticated analysis later describes a
    model for determining probabilities of sensor
    errors.
  • Notation: X = state of nature (bomb or no bomb).
  • Y = outcome (of a sensor or of the entire
    inspection process).

33
Probability of Error for the Entire Tree
  • State of nature is zero (X = 0): absence of a bomb.
  • State of nature is one (X = 1): presence of a bomb.
Probability of false positive, P(Y=1|X=0), for
this tree is given by
$$P(Y=1 \mid X=0) = P(Y_A=1 \mid X=0)\,P(Y_B=1 \mid X=0) + P(Y_A=1 \mid X=0)\,P(Y_B=0 \mid X=0)\,P(Y_C=1 \mid X=0)$$
Probability of false negative, P(Y=0|X=1), for
this tree is given by
$$P(Y=0 \mid X=1) = P(Y_A=0 \mid X=1) + P(Y_A=1 \mid X=1)\,P(Y_B=0 \mid X=1)\,P(Y_C=0 \mid X=1)$$
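
The two formulas can be checked by evaluating the tree directly. This sketch assumes sensor outputs are independent given X (which the product form implies) and reads the tree off the formulas: A is tested first, a negative A gives 0, a positive A leads to B, a positive B gives 1, and a negative B leads to C. The sensor probabilities are the Stroud-Saeger values for A, B and C listed later; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Sensor:
    name: str
    left: "Node"     # followed when the sensor reports 0
    right: "Node"    # followed when the sensor reports 1


Node = Union[Sensor, int]   # a leaf is the tree's output, 0 or 1


def prob_output_one(node: Node, p1: dict[str, float]) -> float:
    """P(tree outputs 1 | X) when sensor s reports 1 with probability p1[s] given X,
    independently of the other sensors."""
    if not isinstance(node, Sensor):
        return float(node)
    q = p1[node.name]
    return (1 - q) * prob_output_one(node.left, p1) + q * prob_output_one(node.right, p1)


# Tree implied by the formulas: A first; if A reports 1, then B; if B reports 0, then C.
tree = Sensor("A", left=0, right=Sensor("B", left=Sensor("C", left=0, right=1), right=1))

p1_x0 = {"A": 0.10, "B": 0.01, "C": 0.001}   # P(Y_s = 1 | X = 0)
p1_x1 = {"A": 0.90, "B": 0.99, "C": 0.999}   # P(Y_s = 1 | X = 1)

p_false_pos = prob_output_one(tree, p1_x0)        # P(Y = 1 | X = 0)
p_false_neg = 1 - prob_output_one(tree, p1_x1)    # P(Y = 0 | X = 1)
print(round(p_false_pos, 6), round(p_false_neg, 6))   # 0.001099 0.100009
```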
34
Cost Function used for Evaluating the Decision
Trees.
$$C_{Tot} = C_{FalsePositive}\,P_{FalsePositive} + C_{FalseNegative}\,P_{FalseNegative} + C_{util}$$

CFalsePositive is the cost of a false positive
(Type I error). CFalseNegative is the cost of a
false negative (Type II error). PFalsePositive is
the probability of a false positive occurring.
PFalseNegative is the probability of a false
negative occurring. Cutil is the cost of
utilization of the tree.
The error probability of the entire tree is
computed from the error probabilities of the
individual sensors.
35
Cost Function used for Evaluating the Decision
Trees.
Cutil is the cost of utilization of the tree.
Simplest assumption: Cutil is the expected sum of
unit costs associated with the tree. Count the
unit cost of each sensor each time it is used. Use
P(X=1) and the probability of errors at each type
of sensor to calculate the expected value. Later:
models for the distribution of attributes of
containers, and a more sophisticated analysis of
the expected cost of utilizing the tree, bringing
in delay costs.
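
Putting the pieces together for the three-sensor tree A, then B, then C (the tree behind the false positive and false negative formulas for the entire tree), here is a sketch under the simplest assumption above: Cutil is the expected sum of unit costs, averaged over X. Following the CTot formula used in the four-sensor results, the error terms are weighted by P(X=0) and P(X=1). Unit costs and sensor probabilities are the Stroud-Saeger values; the inputs in the example are hypothetical points inside the ranges of the sensitivity analysis.

```python
UNIT_COST = {"A": 0.25, "B": 10.0, "C": 30.0}
P1_X0 = {"A": 0.10, "B": 0.01, "C": 0.001}   # P(Y_s = 1 | X = 0)
P1_X1 = {"A": 0.90, "B": 0.99, "C": 0.999}   # P(Y_s = 1 | X = 1)


def expected_unit_cost(p1: dict[str, float]) -> float:
    """Expected unit cost of the tree given X, with p1[s] = P(Y_s = 1 | X).
    A is always used, B only if A reports 1, C only if A reports 1 and B reports 0."""
    return UNIT_COST["A"] + p1["A"] * (UNIT_COST["B"] + (1 - p1["B"]) * UNIT_COST["C"])


def total_cost(c_fp: float, c_fn: float, p_bomb: float) -> float:
    p_y1_x0 = P1_X0["A"] * (P1_X0["B"] + (1 - P1_X0["B"]) * P1_X0["C"])             # P(Y=1|X=0)
    p_y0_x1 = (1 - P1_X1["A"]) + P1_X1["A"] * (1 - P1_X1["B"]) * (1 - P1_X1["C"])   # P(Y=0|X=1)
    c_util = (1 - p_bomb) * expected_unit_cost(P1_X0) + p_bomb * expected_unit_cost(P1_X1)
    return c_fp * (1 - p_bomb) * p_y1_x0 + c_fn * p_bomb * p_y0_x1 + c_util


# Hypothetical point: C_FP = $450, C_FN = $1 billion, P(X=1) = 1e-6.
print(round(total_cost(c_fp=450.0, c_fn=1e9, p_bomb=1e-6), 4))   # 104.7236
```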
36
Stroud-Saeger Experiments
  • Stroud and Saeger ranked all trees formed from
    3 or 4 sensors A, B, C and D according to
    increasing tree costs.
  • They used the cost function defined above.
  • Values used in their experiments:
  • CA = 0.25, P(YA=1|X=1) = 0.90, P(YA=1|X=0) = 0.10
  • CB = 10, P(YB=1|X=1) = 0.99, P(YB=1|X=0) = 0.01
  • CC = 30, P(YC=1|X=1) = 0.999, P(YC=1|X=0) = 0.001
  • CD = 1, P(YD=1|X=1) = 0.95, P(YD=1|X=0) = 0.05
  • Here, Ci = cost of utilization of sensor i.
  • Also fixed were CFalseNegative, CFalsePositive,
    and P(X=1).

37
Stroud-Saeger Experiments: Our Sensitivity
Analysis
  • We have explored the sensitivity of the
    Stroud-Saeger conclusions to variations in the
    values of these three parameters.
  • We estimated high and low values for these
    parameters.
  • We fixed one parameter at a value from its
    interval and then explored the highest-ranked
    tree as the other two were chosen at random from
    their intervals: 10,000 experiments for each
    fixed value.
  • We looked for the variation in the top-ranked
    tree and how the top rank related to the choice
    of parameter values.
  • Very surprising results.

38
Stroud-Saeger Experiments: Our Sensitivity
Analysis
  • CFalseNegative was varied between 25 million and
    10 billion dollars.
  • Low and high estimates of direct and indirect
    costs incurred due to a false negative.
  • CFalsePositive was varied between $180 and $720.
  • Cost incurred due to a false positive:
  • (4 men) × (3 to 6 hrs) × ($15 to $30/hr)
  • P(X=1) was varied between 1/10,000,000 and
    1/100,000.
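
The randomized experiment can be sketched as follows, with simplifications stated up front: only four hand-picked trees over sensors A, B and C are compared (the study ranked all of them), P(X=1) is fixed at 5 × 10^-6 (an arbitrary value inside its range), and CFalseNegative and CFalsePositive are drawn uniformly from the ranges above, an assumption since the sampling distribution is not stated. The output of this sketch says nothing about which of the study's numbered trees win.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Sensor:
    name: str
    left: "Node"     # sensor reports 0
    right: "Node"    # sensor reports 1


Node = Union[Sensor, int]

UNIT_COST = {"A": 0.25, "B": 10.0, "C": 30.0}
P1 = {0: {"A": 0.10, "B": 0.01, "C": 0.001},    # P(Y_s = 1 | X = 0)
      1: {"A": 0.90, "B": 0.99, "C": 0.999}}    # P(Y_s = 1 | X = 1)


def summarize(node: Node, x: int) -> tuple[float, float]:
    """(P(tree outputs 1 | X = x), expected unit cost | X = x); sensors independent given X."""
    if not isinstance(node, Sensor):
        return float(node), 0.0
    q = P1[x][node.name]
    out0, cost0 = summarize(node.left, x)
    out1, cost1 = summarize(node.right, x)
    return (1 - q) * out0 + q * out1, UNIT_COST[node.name] + (1 - q) * cost0 + q * cost1


def total_cost(tree: Node, c_fp: float, c_fn: float, p_bomb: float) -> float:
    out_x0, cost_x0 = summarize(tree, 0)
    out_x1, cost_x1 = summarize(tree, 1)
    c_util = (1 - p_bomb) * cost_x0 + p_bomb * cost_x1
    return c_fp * (1 - p_bomb) * out_x0 + c_fn * p_bomb * (1 - out_x1) + c_util


# A handful of hypothetical candidate trees.
TREES = {
    "A": Sensor("A", 0, 1),
    "A and B": Sensor("A", 0, Sensor("B", 0, 1)),
    "A and (B or C)": Sensor("A", 0, Sensor("B", Sensor("C", 0, 1), 1)),
    "B or C": Sensor("B", Sensor("C", 0, 1), 1),
}


def top_ranked_counts(runs: int = 10_000, p_bomb: float = 5e-6, seed: int = 0) -> Counter:
    """Hold P(X=1) fixed, draw C_FalseNegative and C_FalsePositive uniformly,
    and count how often each candidate has the lowest C_Tot."""
    rng = random.Random(seed)
    wins: Counter = Counter()
    for _ in range(runs):
        c_fn = rng.uniform(25e6, 10e9)
        c_fp = rng.uniform(180.0, 720.0)
        winner = min(TREES, key=lambda name: total_cost(TREES[name], c_fp, c_fn, p_bomb))
        wins[winner] += 1
    return wins


print(top_ranked_counts())
```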

39
Stroud-Saeger Experiments: Sensitivity Analysis
  • First set of experiments: 3 attributes or types
    of sensors, A, B, C.
  • Extensive computer experimentation.

40
Frequency of Top-ranked Trees when CFalseNegative
and CFalsePositive are Varied
  • 10,000 randomized experiments (randomly selected
    values of CFalseNegative and CFalsePositive from
    the specified range of values) for the median
    value of P(X=1).
  • The above graph has frequency counts of the
    number of experiments in which a particular tree
    was ranked first, second, third, and so on.
  • Only three trees (7, 55 and 1) ever came first. 6
    trees came second, 10 came third, 13 came fourth.

41
Frequency of Top-ranked Trees when CFalseNegative
and P(X=1) are Varied
  • 10,000 randomized experiments for the median
    value of CFalsePositive.
  • Only 2 trees (7 and 55) ever came first. 4 trees
    came second. 7 trees came third. 10 and 13 trees
    came 4th and 5th respectively.

42
Frequency of Top-ranked Trees when P(X=1) and
CFalsePositive are Varied
  • 10,000 randomized experiments for the median
    value of CFalseNegative.
  • Only 3 trees (7, 55 and 1) ever came first. 6
    trees came second. 10 trees came third. 13 and 16
    trees came 4th and 5th respectively.

43
Most Frequent Tree Groups Attaining the Top Three
Ranks.
  • Trees 7, 9 and 10

All three decision trees have been generated from
the same boolean expression 00000111, which lists
the values F(000), F(001), ..., F(111). Trees 9
and 10 are ranked second and third more than 99%
of the time when Tree 7 is ranked first.
44
Most Frequent Tree Groups Attaining the Top Three
Ranks
  • Trees 55, 57 and 58

The boolean expression for these three decision
trees is 01111111. Tree 57 is ranked second 96% of
the time and tree 58 is ranked third 79% of the
time when tree 55 is ranked first.
45
Most Frequent Tree Groups Attaining the Top Three
Ranks
  • Trees 1, 3, and 2

The boolean expression for these three decision
trees is 00000001. Tree 3 is ranked second 98% of
the time and tree 2 is ranked third 80% of the
time when tree 1 is ranked first.
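
A sketch of how such an expression can be decoded, assuming (as the listing F(000), F(001), ..., F(111) suggests) that the string gives F in binary counting order with the first attribute as the leftmost bit.

```python
from itertools import product


def decode(expr: str) -> dict[str, int]:
    """Map each input string to F, assuming expr lists F(00...0), F(00...1), ..., F(11...1)
    in binary counting order (first attribute = leftmost bit)."""
    n = len(expr).bit_length() - 1                   # 8 characters -> 3 attributes
    inputs = ["".join(bits) for bits in product("01", repeat=n)]
    return dict(zip(inputs, map(int, expr)))


for expr in ("00000111", "01111111", "00000001"):
    print(expr, [x for x, v in decode(expr).items() if v == 1])
# 00000111 ['101', '110', '111']
# 01111111 ['001', '010', '011', '100', '101', '110', '111']
# 00000001 ['111']
```

Read this way, 01111111 is "at least one attribute present", 00000001 is "all three present", and 00000111 is "a and (b or c)"; all three are complete and monotone.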
46
Values of CFalseNegative and CFalsePositive when
Tree 7 was Ranked First
  • This is a graph of CFalsePositive against
    CFalseNegative values obtained from the
    randomized experiments. The black dots represent
points at which tree 7 was ranked first.

47
Values of CFalseNegative and CFalsePositive when
Tree 55 was Ranked First
  • Tree 55 fills up the lower area in the range of
    CFalseNegative and CFalsePositive values.

48
Values of CFalseNegative and CFalsePositive when
Tree 1 was Ranked First
  • Tree 1 fills up the major area in the range of
    CFalseNegative and CFalsePositive.

49
Values of CFalseNegative and CFalsePositive for
the Three First Ranked Trees
  • Trees 7, 55 and 1 fill up the entire area in the
    range of CFalseNegative and CFalsePositive among
    themselves.

50
Values of CTot, CFalseNegative and CFalsePositive
for First Ranked Trees
  • This graph shows total costs for trees 7, 55 and
    1 in the respective regions in which they were
    ranked first.
  • Each tree's total cost is a hyperplane which cuts
    other hyperplanes as it gains and then loses
    first rank.

51
Values of CTot, CFalseNegative and CFalsePositive
for Trees 1, 7 and 55 (Even When They Were not
Ranked First).
  • This graph shows the extended CTot hyperplanes
    for trees 7, 55 and 1 for all regions.

52
Values of CFalseNegative and P(X=1) when Tree 7
was Ranked First
  • Tree 7 again fills up the major area in the range
    of CFalseNegative and P(X=1).

53
Values of CFalseNegative and P(X=1) when Tree 55
was Ranked First
  • Tree 55 fills up the rest of the area in the
    range of CFalseNegative and P(X=1).

54
Values of CFalseNegative and P(X=1) for First
Ranked Trees
  • Together trees 7 and 55 fill up the entire region
    of CFalseNegative and P(X=1).

55
Variations of CTot, CFalseNegative and P(X=1) for
First Ranked Trees
  • This graph has CTot on the 3rd axis for trees 7
    and 55 in the respective regions in which they
    were optimal.
  • Each tree's total cost is a conic surface.

56
Values of CFalsePositive and P(X=1) When Tree 7
was Ranked First
  • Tree 7 fills up the major area in the range of
    CFalsePositive and P(X=1).

57
Values of CFalsePositive and P(X=1) when Tree 55
was Ranked First
  • Tree 55 fills up the lower area in the range of
    CFalsePositive and P(X=1).

58
Values of CFalsePositive and P(X=1) when Tree 1
was Ranked First
  • Tree 1 fills up the major area in the range of
    CFalsePositive and P(X=1).

59
Values of CFalsePositive and P(X=1) for First
Ranked Trees
  • Trees 7, 55 and 1 fill up the entire area in the
    range of CFalsePositive and P(X=1) among
    themselves.

60
Values of CTot, CFalsePositive and P(X=1) for
First Ranked Trees
  • This graph shows total costs for trees 7, 55 and
    1 in the respective regions in which they were
    optimal.
  • Each tree's total cost is a hyperplane which cuts
    other hyperplanes as it gains and then loses
    first rank.

61
Modeling Sensor Errors
  • One approach to sensor errors: Modeling sensor
    operation
  • Threshold Model:
  • Sensors have different discriminating power.
  • Many use counts (e.g., gamma radiation counts).
  • See if the count exceeds a threshold.
  • If so, say the attribute is present.

62
Modeling Sensor Errors
  • Threshold Model
  • Sensor i has discriminating power Ki and
    threshold Ti.
  • The attribute is present if counts exceed Ti.
  • Calculate the fraction of objects in each category
    whose readings exceed Ti.
  • Seek threshold values that minimize all costs:
    inspection, false positive/negative.
  • Assume readings of category 0 containers follow a
    Gaussian distribution, and similarly for category
    1 containers.
  • Simulation approach
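
Choosing a threshold can be sketched for a single sensor, assuming (as in the error formulas for individual sensors) that category 0 readings are N(0, 1) and category 1 readings are N(Ki, Si²), and ignoring the unit cost, which for one sensor does not depend on the threshold. The slides describe a simulation approach; this sketch instead grid-searches the exact Gaussian tail probabilities, and the cost figures are hypothetical. With Si = 1 the optimum also has a closed form, T* = K/2 + ln(r)/K with r = CFalsePositive·P(X=0) / (CFalseNegative·P(X=1)), obtained by setting the derivative of the cost to zero; it serves as a check.

```python
from math import log
from statistics import NormalDist


def error_cost(t: float, k: float, s: float, c_fp: float, c_fn: float, p_bomb: float) -> float:
    """Expected false-positive plus false-negative cost of one sensor with threshold t."""
    p_false_pos = 1 - NormalDist(0.0, 1.0).cdf(t)    # clean container reads above t
    p_miss = NormalDist(k, s).cdf(t)                 # bomb container reads below t
    return c_fp * (1 - p_bomb) * p_false_pos + c_fn * p_bomb * p_miss


def best_threshold(k, s, c_fp, c_fn, p_bomb, lo=-4.0, hi=8.0, steps=12_001):
    """Grid search over thresholds from lo to hi (step 0.001 with the defaults)."""
    grid = [lo + (hi - lo) * j / (steps - 1) for j in range(steps)]
    return min(grid, key=lambda t: error_cost(t, k, s, c_fp, c_fn, p_bomb))


# Hypothetical costs and prior; K = 4.37 and S = 1 are sensor A's values in the experiments.
k, s, c_fp, c_fn, p_bomb = 4.37, 1.0, 500.0, 1e9, 1e-6
t_grid = best_threshold(k, s, c_fp, c_fn, p_bomb)

# Closed form for equal spreads: phi(t - k) / phi(t) = r, i.e. t = k/2 + ln(r)/k.
r = c_fp * (1 - p_bomb) / (c_fn * p_bomb)
t_exact = k / 2 + log(r) / k
print(round(t_grid, 3), round(t_exact, 3))
```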

63
Probability of Error for Individual Sensors
  • For the ith sensor, the type 1 (P(Yi=1|X=0)) and
    type 2 (P(Yi=0|X=1)) errors are modeled using
    Gaussian distributions.
  • State of nature X = 0 represents absence of a bomb.
  • State of nature X = 1 represents presence of a
    bomb.
  • The outcome (count) of sensor i is compared with
    the threshold Ti.
  • Si is the spread (standard deviation) of the
    distributions.
64
Modeling Sensor Errors
The probability of false positive for the ith
sensor is computed as
$$P(Y_i=1 \mid X=0) = 0.5\,\mathrm{erfc}\left(\frac{T_i}{\sqrt{2}}\right)$$
The probability of detection for the ith sensor
is computed as
$$P(Y_i=1 \mid X=1) = 0.5\,\mathrm{erfc}\left(\frac{T_i - K_i}{S_i\sqrt{2}}\right)$$
where erfc is the complementary error function,
$$\mathrm{erfc}(x) = \frac{\Gamma(1/2,\,x^2)}{\sqrt{\pi}} \quad (x \ge 0).$$
The following experiments have been done using
sensors A, B, C with KA = 4.37, SA = 1; KB = 2.9,
SB = 1; KC = 4.6, SC = 1. We then varied the
individual sensor thresholds TA, TB and TC from
-4.0 to 4.0 in steps of 0.4. These values were
chosen because they gave us an ROC curve (see
later) for the individual sensors over a complete
range of P(Yi=1|X=0) and P(Yi=1|X=1).
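
A sketch of the two sensor formulas using Python's `math.erfc`, cross-checked against Gaussian upper tails from `statistics.NormalDist` (category 0 readings N(0, 1), category 1 readings N(Ki, Si²), the model the formulas imply). The threshold value and function names are illustrative.

```python
from math import erfc, sqrt
from statistics import NormalDist

# Discriminating power K_i and spread S_i used in the threshold experiments.
SENSORS = {"A": (4.37, 1.0), "B": (2.9, 1.0), "C": (4.6, 1.0)}


def false_positive(t: float) -> float:
    """P(Y_i = 1 | X = 0) = 0.5 * erfc(T_i / sqrt(2))."""
    return 0.5 * erfc(t / sqrt(2))


def detection(t: float, k: float, s: float) -> float:
    """P(Y_i = 1 | X = 1) = 0.5 * erfc((T_i - K_i) / (S_i * sqrt(2)))."""
    return 0.5 * erfc((t - k) / (s * sqrt(2)))


t = 2.0   # one threshold inside the -4.0 .. 4.0 sweep, chosen arbitrarily
for name, (k, s) in SENSORS.items():
    pf, pd = false_positive(t), detection(t, k, s)
    # Same numbers as Gaussian upper tails.
    assert abs(pf - (1 - NormalDist(0, 1).cdf(t))) < 1e-12
    assert abs(pd - (1 - NormalDist(k, s).cdf(t))) < 1e-12
    print(name, round(pf, 4), round(pd, 4))
```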
65
Frequency of First Ranked Trees for Variations in
Sensor Thresholds
  • 68,921 experiments were conducted, as each Ti was
    varied through its entire range.
  • The above graph has frequency counts of the
    number of experiments when a particular tree was
    ranked first. There are 15 such trees. Tree 37
    had the highest frequency of attaining rank one.

66
Stroud-Saeger Experiments: Our Sensitivity
Analysis, 4 Sensors
  • Second set of computer experiments: 4
    attributes or types of sensors, A, B, C, D.
  • Same values as before.
  • Experiment 1: Fix the values of two of
    CFalseNegative, CFalsePositive, P(X=1) and vary
    the third.
  • Experiment 2: Fix the value of one of
    CFalseNegative, CFalsePositive, P(X=1) and vary
    the other two through their intervals of
    possible values. Do 10,000 experiments each time.
  • Look for the variation in the highest-ranked
    tree.

67
Stroud-Saeger Experiments: Our Sensitivity
Analysis, 4 Sensors
  • Experiment 1: Fix the values of two of
    CFalseNegative, CFalsePositive, P(X=1) and vary
    the third.

68
CTot vs CFalseNegative for Ranked 1 Trees (Trees
11485 (9651) and 10129 (349))
Only two trees were ever ranked first, and one,
tree 11485, was ranked first in 9651 out of
10,000 runs.
69
CTot vs CFalsePositive for Ranked 1 Trees (Tree
no. 11485 (10000))
One tree, number 11485, was ranked first every
time.
70
CTot vs P(X=1) for Ranked 1 Trees (Tree nos.
11485 (8372), 10129 (488), 11521 (1056))
Three trees dominated first place. Trees
10201 (60), 10225 (17) and 10153 (7) also achieved
first rank, but with relatively low frequency.
71
Tree Structure and corresponding Boolean
Expressions
Tree number 11485: Boolean Expr 0101011101111111
Tree number 10129: Boolean Expr 0001011101111111
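
The same decoding extends to the 16-character expressions for four sensors. A sketch, again assuming binary counting order with the first attribute leftmost: for a monotone F, the inputs with F = 1 that have no smaller input with F = 1 are the minimal combinations of positive sensor readings that send a container to category 1. Helper names are illustrative.

```python
from itertools import product


def decode(expr: str) -> dict[tuple[int, ...], int]:
    """Assumes expr lists F(0000), F(0001), ..., F(1111) in binary counting order."""
    n = len(expr).bit_length() - 1
    return {bits: int(v) for bits, v in zip(product((0, 1), repeat=n), expr)}


def minimal_true_inputs(table: dict[tuple[int, ...], int]) -> list[tuple[int, ...]]:
    """Inputs with F = 1 that have no strictly smaller input (componentwise) with F = 1."""
    ones = [x for x, v in table.items() if v == 1]

    def strictly_below(x, y):
        return x != y and all(a <= b for a, b in zip(x, y))

    return [y for y in ones if not any(strictly_below(x, y) for x in ones)]


for tree, expr in (("11485", "0101011101111111"), ("10129", "0001011101111111")):
    mins = minimal_true_inputs(decode(expr))
    print(tree, ["".join(map(str, x)) for x in mins])
# 11485 ['0001', '0110', '1010', '1100']
# 10129 ['0011', '0101', '0110', '1001', '1010', '1100']
```

Under this reading, tree 10129 computes "at least two of the four attributes present" (a 2-out-of-4 system), and tree 11485 computes "the fourth attribute present, or at least two of the first three".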
72
Stroud-Saeger Experiments: Our Sensitivity
Analysis, 4 Sensors
  • Experiment 2: Fix the value of one of
    CFalseNegative, CFalsePositive, P(X=1) and vary
    the other two.

73
Frequency of First Ranked Trees when Two
Parameters (CFalseNegative and CFalsePositive)
were Varied Keeping P(X=1) Constant at Randomly
Selected Values.
  • 10,000 randomized experiments with randomly
    selected values of P(X=1)
  • The experiments were repeated for 20 different
    randomly selected values of P(X=1)

74
Frequency of First Ranked Trees when Two
Parameters (CFalseNegative and P(X=1)) were
Varied Keeping CFalsePositive Constant at
Randomly Selected Values.
  • 10,000 randomized experiments with randomly
    selected values of CFalsePositive
  • The experiments were repeated for 20 different
    randomly selected values of CFalsePositive

75
Frequency of First Ranked Trees when Two
Parameters (P(X=1) and CFalsePositive) were
Varied Keeping CFalseNegative Constant at
Randomly Selected Values.
  • 10,000 randomized experiments with randomly
    selected values of CFalseNegative
  • The experiments were repeated for 20 different
    randomly selected values of CFalseNegative

76
Variation of CTot wrt CFalseNegative and
CFalsePositive, for Tree Ranked First (Tree nos.
11485 and 10129)
$$C_{Tot} = C_{FalsePositive}\,P(X=0)\,P(Y=1 \mid X=0) + C_{FalseNegative}\,P(X=1)\,P(Y=0 \mid X=1) + C_{util}$$
77
Variation of CTot wrt CFalseNegative and P(X=1)
for Tree Ranked First (Tree nos. 11485 (8121),
10129 (728) and 11521 (984))
Trees 505, 5105, 5129, 9541, 10153, 10201 and
10225 also attained rank 1, but with very low
frequency (< 100).
$$C_{Tot} = C_{FalsePositive}\,P(X=0)\,P(Y=1 \mid X=0) + C_{FalseNegative}\,P(X=1)\,P(Y=0 \mid X=1) + C_{util}$$
78
Variation of CTot wrt CFalsePositive and P(X=1)
for Tree Ranked First (Tree nos. 11485 (7162),
10129 (1690) and 11521 (851))
Trees 10153, 10201 and 10225 also attained first
rank, 80, 195 and 22 times respectively.
$$C_{Tot} = C_{FalsePositive}\,P(X=0)\,P(Y=1 \mid X=0) + C_{FalseNegative}\,P(X=1)\,P(Y=0 \mid X=1) + C_{util}$$
79
Receiver Operating Characteristic (ROC) Curve
ROC Curve
  • The ROC curve is the plot of the probability of
    correct detection (PD) vs. the probability of
    false positive (PF).
  • The ROC curve is used to select an operating
    point, which determines the tradeoff between PD
    and PF.
  • Each sensor has an ROC curve, and the combination
    of the sensors into a decision tree has a
    composite ROC curve.
  • The parameter varied to get different operating
    points on the ROC curve is the sensor threshold
    (for a decision tree, a combination of
    thresholds).
  • Equal Error Rate (EER) is the operating point on
    the ROC curve where PF = 1 - PD.
  • We can use ROC curves to identify optimal
    thresholds for sensors.

80
Receiver Operating Characteristic (ROC) Curve
ROC Curve
  • We seek operating characteristics of sensors that
    place us in the upper left hand corner of the ROC
    curve.
  • Here, PF is small and PD is large.

81
Performance of Sensors Against that of Tree 37
(Most Frequent Tree Attaining Rank 1)
  • The black, blue and red dotted lines represent
    performance characteristics (ROC curve) of
    sensors A, B and C.
  • The green dots represent the performance
    characteristics (P(Y=1|X=0), P(Y=1|X=1)) of the
    tree over all combinations of sensor thresholds
    (Ti).

82
Performance of Sensors Against that of Tree 37
  • This zoomed-in figure of the ROC curve displays
    the region of high detection probabilities and
    low false positive probabilities.
  • Points lying on the diagonal line are the Equal
    Error Rates for this tree and the sensors. The
    tree achieves an equal error rate of 0.0027,
    while sensors A, B and C have EERs of 0.0145,
    0.0738 and 0.0107.
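
Under the threshold model, a single sensor's equal error rate can be located by bisection on its threshold. A sketch, assuming Si = 1 as in the experiments; for equal spreads the crossing sits at T = K/2, which the bisection should recover. The resulting rates should be close to the sensor EERs reported above, though this sketch does not claim to reproduce the exact procedure behind those figures.

```python
from math import erfc, sqrt

K = {"A": 4.37, "B": 2.9, "C": 4.6}   # discriminating powers from the threshold model; S_i = 1


def pf(t: float) -> float:
    return 0.5 * erfc(t / sqrt(2))                  # P(Y_i = 1 | X = 0)


def miss(t: float, k: float, s: float = 1.0) -> float:
    return 1 - 0.5 * erfc((t - k) / (s * sqrt(2)))  # 1 - P(Y_i = 1 | X = 1)


def equal_error_rate(k: float, s: float = 1.0, lo: float = -10.0, hi: float = 10.0):
    """Bisect on the threshold for PF = 1 - PD. PF decreases and the miss rate
    increases with the threshold, so the crossing is unique."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if pf(mid) > miss(mid, k, s):
            lo = mid     # too many false positives: the crossing lies at a higher threshold
        else:
            hi = mid
    t = (lo + hi) / 2
    return pf(t), t


for name, k in K.items():
    eer, t = equal_error_rate(k)
    print(name, round(eer, 4), "at threshold", round(t, 3))
```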

83
Best Possible ROC Curve for Tree 37
  • Assuming the performance probabilities
    P(Y=1|X=1) and P(Y=1|X=0) to be monotonically
    related (in the sense that P(Y=1|X=1) can be
    called a monotonic function of P(Y=1|X=0)), we
    can find an ROC curve for the tree consisting of
    the maximum P(Y=1|X=1) value corresponding to
    each given P(Y=1|X=0) value.
  • The blue dots represent such an ROC curve, the
    best ROC curve for tree 37.
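
A sketch of extracting such a best curve from a cloud of (P(Y=1|X=0), P(Y=1|X=1)) points, reading it as the points for which no other operating point has a lower or equal false positive rate and a higher detection rate. Tree 37's structure is not given here, so the stand-in tree "A and (B or C)" is an assumption; thresholds follow the -4.0 to 4.0 sweep in steps of 0.4, and the Ki values come from the threshold model.

```python
from itertools import product
from math import erfc, sqrt

K = {"A": 4.37, "B": 2.9, "C": 4.6}   # discriminating powers; S_i = 1 for all three


def pf(t: float) -> float:
    return 0.5 * erfc(t / sqrt(2))               # P(Y_i = 1 | X = 0)


def pd(t: float, k: float) -> float:
    return 0.5 * erfc((t - k) / sqrt(2))         # P(Y_i = 1 | X = 1)


def tree_rates(ta: float, tb: float, tc: float) -> tuple[float, float]:
    """(PF, PD) of the stand-in tree 'A and (B or C)', sensors independent given X."""
    def output_one(a: float, b: float, c: float) -> float:
        return a * (b + (1 - b) * c)
    return (output_one(pf(ta), pf(tb), pf(tc)),
            output_one(pd(ta, K["A"]), pd(tb, K["B"]), pd(tc, K["C"])))


def best_roc(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Scan by increasing PF (ties: highest PD first) and keep a point only if its PD
    beats every PD seen so far, i.e. the largest PD achievable at that PF or less."""
    frontier: list[tuple[float, float]] = []
    best_pd = -1.0
    for f, d in sorted(points, key=lambda pt: (pt[0], -pt[1])):
        if d > best_pd:
            frontier.append((f, d))
            best_pd = d
    return frontier


grid = [round(-4.0 + 0.4 * j, 1) for j in range(21)]    # -4.0 .. 4.0 in steps of 0.4
points = [tree_rates(*ts) for ts in product(grid, repeat=3)]
curve = best_roc(points)
print(len(points), len(curve))
```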

84
Conclusions from Sensitivity Analysis
  • Considerable lack of sensitivity to changes in
    parameters for trees using 3 or 4 sensors.
  • Very few optimal trees.
  • Very few boolean functions arise among optimal
    trees.

85
Some Complications
  • More complicated cost models bringing in costs
    of delays
  • More than two values of an attribute
  • (present, absent, present with probability > 75%,
    absent with probability at least 75%)
  • (ok, not ok, ok with probability > 99%, ok with
    probability between 95% and 99%)
  • Inferring the boolean function from observations
    (partially defined boolean functions)
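
One standard fact about partially defined boolean functions, not stated in the slides, is that observed positive and negative containers are consistent with some monotone F exactly when no positive example lies componentwise below a negative one. A sketch with hypothetical observations and helper names:

```python
def leq(x: tuple[int, ...], y: tuple[int, ...]) -> bool:
    """Componentwise x <= y."""
    return all(a <= b for a, b in zip(x, y))


def has_monotone_extension(positives, negatives) -> bool:
    """A partially defined boolean function (observed positive and negative containers)
    extends to a monotone F iff no positive lies componentwise below a negative."""
    return not any(leq(t, f) for t in positives for f in negatives)


def smallest_monotone_extension(positives):
    """F(x) = 1 iff x is at or above some observed positive. When any monotone
    extension exists, this one agrees with all observations and has the fewest 1s."""
    return lambda x: int(any(leq(t, x) for t in positives))


pos = [(1, 1, 0), (0, 1, 1)]     # hypothetical observations of (a0, a1, a2)
neg = [(1, 0, 0), (0, 0, 1)]
print(has_monotone_extension(pos, neg))                  # True
F = smallest_monotone_extension(pos)
print(F((1, 1, 1)), F((0, 1, 0)))                        # 1 0
print(has_monotone_extension(pos, neg + [(1, 1, 1)]))    # False
```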

86
Some Research Challenges
  • Explain why conclusions are so insensitive to
    variation in parameter values.
  • Explore the structure of the optimal trees and
    compare the different optimal trees.
  • Develop less brute-force methods for finding
    optimal trees that might work if there are more
    than 4 attributes.
  • Develop methods for approximating the optimal
    tree.

Pallet vacis
87
Closing Remark
  • Recall that the cost of inspection includes the
    cost of failure, including failure to foil a
    terrorist plot.
  • There are many ways to lower the total cost of
    inspection:
  • Use more efficient orders of inspection.
  • Find ways to inspect more containers.
  • Find ways to cut down on delays at inspection
    lanes.

88
Research Team
  • Saket Anand, Rutgers, ECE graduate student
  • Endre Boros, Rutgers, Operations Research
  • Elsayed Elsayed, Rutgers, Ind. Systems
    Engineering
  • Liliya Fedzhora, Rutgers, Operations Res. grad.
    student
  • Paul Kantor, Rutgers, Schl. of Infor. Library
    Studies
  • Abdullah Karaman, Rutgers Ind. Syst. Eng. grad.
    student
  • Alex Kogan, Rutgers, Business School
  • Paul Lioy, Rutgers/UMDNJ, Environmental and
    Occupational Health and Sciences Institute
  • David Madigan, Rutgers, Statistics
  • Richard Mammone, Rutgers, Center for Advanced
    Information Processing
  • S. Muthukrishnan, Rutgers, Computer Science
  • Saumitr Pathek, Rutgers ECE graduate student
  • Richard Picard, Los Alamos, Statistical Sciences
    Group
  • Fred Roberts, Rutgers, DIMACS Center
  • Kevin Saeger, Los Alamos, Homeland Security
  • Phillip Stroud, Los Alamos, Systems Engineering
    and Integration Group
  • Hao Zhang, Rutgers Ind. Systems Eng., graduate
    student

89
  • Collaborators on Sensitivity Analysis
  • Saket Anand
  • David Madigan
  • Richard Mammone
  • Saumitr Pathak
  • Research Support
  • Office of Naval Research
  • National Science Foundation
  • Los Alamos National Laboratory
  • Rick Picard
  • Kevin Saeger
  • Phil Stroud