Creating Data Representations
1
Creating Data Representations
  • Another way of representing n-ary data in a
    neural network is using one neuron per feature,
    but scaling the (analog) value to indicate the
    degree to which a feature is present.
  • Good examples
  • the brightness of a pixel in an input image
  • the distance between a robot and an obstacle
  • Poor examples
  • the letter (1–26) of a word
  • the type (1–6) of a chess piece
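As an illustration, here is a minimal Python sketch contrasting the two cases; the function names and constants are assumptions for illustration only, not from the slides:

    # Hypothetical sketch: scaled analog encoding suits graded features,
    # while categorical features are better encoded one neuron per value.

    def encode_brightness(pixel_value):
        # Brightness is a graded feature: 0.0 = black, 1.0 = white.
        # Scaling to [0, 1] preserves the "degree of presence" idea.
        return pixel_value / 255.0

    def encode_letter(letter):
        # A letter is categorical: 'a' is not "closer" to 'b' than to 'z',
        # so we use one neuron per letter instead of one scaled value.
        vec = [0.0] * 26
        vec[ord(letter) - ord('a')] = 1.0
        return vec

    print(encode_brightness(128))   # about 0.502
    print(encode_letter('c'))       # 1.0 in position 2, 0.0 elsewhere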

2
Creating Data Representations
  • This can be explained as follows:
  • The way NNs work (both biological and artificial
    ones) is that each neuron represents the
    presence/absence of a particular feature.
  • Activations 0 and 1 indicate absence or presence
    of that feature, respectively, and in analog
    networks, intermediate values indicate the extent
    to which a feature is present.
  • Consequently, a small change in one input value
    leads to only a small change in the network's
    activation pattern.

3
Creating Data Representations
  • Therefore, it is appropriate to represent a
    non-binary feature by a single analog input value
    only if this value is scaled, i.e., it represents
    the degree to which a feature is present.
  • This is the case for the brightness of a pixel or
    the output of a distance sensor (feature
    "obstacle proximity").
  • It is not the case for letters or chess pieces.
  • For example, assigning values to individual
    letters (a = 0, b = 0.04, c = 0.08, ..., z = 1)
    implies that a and b are in some way more similar
    to each other than are a and z.
  • Obviously, in most contexts, this is not a
    reasonable assumption.

4
Creating Data Representations
  • It is also important to notice that, in
    artificial (not natural!) fully connected
    networks, the order of features that you specify
    for your input vectors does not influence the
    outcome.
  • For the network performance, it is not necessary
    to represent, for example, similar features in
    neighboring input units.
  • All units are treated equally: the neighborhood
    of two neurons does not imply to the network that
    these neurons represent similar features.
  • Of course, once you have specified a particular
    order, you cannot change it anymore during
    training or testing.

5
Creating Data Representations
  • If you wanted to represent the state of each
    square on the tic-tac-toe board by one analog
    value, which would be the better way to do this?
  • Option 1: <empty> = 0, X = 0.5, O = 1
    Not a good scale! It goes from neutral to
    friendly and then hostile.
  • Option 2: X = 0, <empty> = 0.5, O = 1
    More natural scale! It goes from friendly to
    neutral and then hostile.
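A minimal sketch of the preferred encoding, from the perspective of player X; the dictionary name and board layout are illustrative assumptions:

    # The "friendly -> neutral -> hostile" scale from this slide.
    SQUARE_VALUE = {'X': 0.0, ' ': 0.5, 'O': 1.0}

    board = ['X', 'O', ' ', ' ', 'X', ' ', 'O', ' ', 'X']
    input_vector = [SQUARE_VALUE[s] for s in board]
    print(input_vector)   # nine analog inputs, one per square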
6
Representing Time
  • So far we have only considered static data, that
    is, data that do not change over time.
  • How can we format temporal data to feed them into
    an ANN in order to detect spatiotemporal patterns
    or even predict future states of a system?
  • The basic idea is to treat time as another input
    dimension.
  • Instead of just feeding the current data (time
    t0) into our network, we expand the input vectors
    to contain n data vectors measured at t0, t0 − Δt,
    t0 − 2Δt, t0 − 3Δt, ..., t0 − (n − 1)Δt.
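A minimal Python sketch of this windowing idea, assuming a one-dimensional measurement series and illustrative names:

    # Build input vectors containing the n most recent measurements:
    # x(t0 - (n-1) dt), ..., x(t0 - dt), x(t0).
    def time_window(series, n):
        # series[i] is the measurement at time step i; one window is
        # produced for every t0 with n past values available.
        return [series[t - n + 1 : t + 1] for t in range(n - 1, len(series))]

    prices = [10, 12, 11, 13, 15, 14, 16]
    for window in time_window(prices, 3):
        print(window)   # [10, 12, 11], [12, 11, 13], ...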

7
Representing Time
  • For example, if we want to predict stock prices
    based on their past values (although other
    factors also play a role)

[Figure: past stock prices plotted over time t; the seven most recent values up to t0 form the network's input]
8
Representing Time
  • In this case, our input vector would include
    seven components, each of them indicating the
    stock values at a particular point in time.
  • These stock values have to be normalized, e.g.,
    divided by 1,000, if that is the estimated
    maximum value that could occur.
  • Then there would be a hidden layer whose size
    depends on the complexity of the task.
  • And there would be exactly one output neuron,
    indicating the stock price after the following
    time interval (to be multiplied by 1,000).
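The following sketch shows the described shape (7 inputs, one hidden layer, 1 output) with the normalization step; the hidden size of 5, the random weights, and the sigmoid activation are assumptions for illustration, not prescribed by the slides:

    import math, random

    N_INPUT, N_HIDDEN = 7, 5   # 7 past prices in; hidden size is task-dependent
    MAX_PRICE = 1000.0         # assumed maximum stock value

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Random initial weights; backpropagation training (next slide) adjusts them.
    w_ih = [[random.uniform(-1, 1) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
    w_ho = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]

    def predict(prices):
        x = [p / MAX_PRICE for p in prices]                       # normalize inputs
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_ih]
        y = sigmoid(sum(w * hi for w, hi in zip(w_ho, h)))        # single output
        return y * MAX_PRICE                                      # de-normalize

    print(predict([500, 510, 505, 520, 530, 525, 540]))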

9
Representing Time
  • For example, a backpropagation network could do
    this task.
  • It would be trained with many stock price samples
    that were recorded in the past, so that the price
    for time t0 + Δt is already known.
  • This price at time t0 + Δt would be the desired
    output value of the network and would be used to
    apply the BPN learning rule.
  • Afterwards, if past stock prices indeed allow the
    prediction of future ones, the network will be
    able to give some reasonable stock price
    predictions.
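A sketch of how such training exemplars could be assembled from a historical price series; the 1,000 normalizer follows the earlier slide, while the helper itself is an illustrative assumption:

    # Each exemplar pairs seven past prices (input) with the next price
    # (desired output), both normalized by the assumed maximum of 1,000.
    def make_exemplars(series, n=7):
        exemplars = []
        for t in range(n, len(series)):
            inputs = [p / 1000.0 for p in series[t - n : t]]   # window up to t0
            target = series[t] / 1000.0                        # price at t0 + dt
            exemplars.append((inputs, target))
        return exemplars

    history = [500, 510, 505, 520, 530, 525, 540, 535, 550]
    for x, y in make_exemplars(history):
        print(x, '->', y)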

10
Representing Time
  • Another example:
  • Let us assume that we want to build a very simple
    surveillance system.
  • We receive bitmap images in constant time
    intervals and want to determine for each quadrant
    of the image if there is any motion visible in
    it, and what the direction of this motion is.
  • Let us assume that each image consists of 10 by
    10 grayscale pixels with values from 0 to 255.
  • Let us further assume that we only want to
    determine one of the four directions N, E, S, and
    W.

11
Representing Time
  • As said before, it makes sense to represent the
    brightness of each pixel by an individual analog
    value.
  • We normalize these values by dividing them by
    255.
  • Consequently, if we were only interested in
    individual images, we would feed the network with
    input vectors of size 100.
  • Let us assume that two successive images are
    sufficient to detect motion.
  • Then at each point in time, we would like to feed
    the network with the current image and the
    previous image that we received from the camera.

12
Representing Time
  • We can simply concatenate the vectors
    representing these two images, resulting in a
    200-dimensional input vector.
  • Therefore, our network would have 200 input
    neurons, and a certain number of hidden units.
  • With regard to the output, would it be a good
    idea to represent the direction (N, E, S, or W)
    by a single analog value?
  • No, these values do not represent a scale, so
    this would make the network computations
    unnecessarily complicated.
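A minimal sketch of assembling that 200-dimensional input vector; the constant stand-in frames replace real camera images:

    # Two successive 10x10 grayscale frames (values 0..255), flattened
    # and normalized, concatenated into one 200-dimensional input vector.
    def frame_to_vector(frame):
        return [pixel / 255.0 for row in frame for pixel in row]

    previous = [[0] * 10 for _ in range(10)]    # stand-in frames; a real
    current  = [[255] * 10 for _ in range(10)]  # system would use camera images

    input_vector = frame_to_vector(previous) + frame_to_vector(current)
    print(len(input_vector))   # 200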

13
Representing Time
  • Better solution:
  • 16 output neurons, one for each combination of
    quadrant (Q1, Q2, Q3, Q4) and direction of
    motion (N, E, S, W).

This way, the network can, in a straightforward
way, indicate the direction of motion in each
quadrant (Q1, Q2, Q3, and Q4). Each output value
could specify the amount (or speed?) of the
corresponding type of motion.
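One way to lay out these 16 outputs in code; the quadrant-major ordering is an arbitrary assumption:

    QUADRANTS = ['Q1', 'Q2', 'Q3', 'Q4']
    DIRECTIONS = ['N', 'E', 'S', 'W']

    def output_index(quadrant, direction):
        # One output neuron per (quadrant, direction) pair: 4 x 4 = 16.
        return QUADRANTS.index(quadrant) * 4 + DIRECTIONS.index(direction)

    print(output_index('Q1', 'N'))   # 0
    print(output_index('Q3', 'W'))   # 11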
14
Exemplar Analysis
  • When building a neural network application, we
    must make sure that we choose an appropriate set
    of exemplars (training data):
  • The entire problem space must be covered.
  • There must be no inconsistencies
    (contradictions) in the data.
  • We must be able to correct such problems
    without compromising the effectiveness of the
    network.

15
Ensuring Coverage
  • For many applications, we do not just want our
    network to classify any kind of possible input.
  • Instead, we want our network to recognize whether
    an input belongs to any of the given classes or
    whether it is garbage that cannot be classified.
  • To achieve this, we train our network with both
    classifiable data and garbage data ("null
    patterns").
  • For the null patterns, the network is supposed
    to produce a zero output, or a designated "null"
    neuron is activated.

16
Ensuring Coverage
  • In many cases, we use a 1:1 ratio for this
    training, that is, we use as many null patterns
    as there are actual data samples.
  • We have to make sure that all of these exemplars
    taken together cover the entire input space.
  • If it is certain that the network will never be
    presented with garbage data, then we do not
    need to use null patterns for training.
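A sketch of padding a training set this way; generating null patterns as uniform random noise is an illustrative assumption, since real garbage patterns should cover the actual input space:

    import random

    def add_null_patterns(exemplars, input_size):
        # For each real exemplar, add one random "garbage" input whose
        # desired output is the all-zero vector (the null response).
        output_size = len(exemplars[0][1])
        nulls = [([random.random() for _ in range(input_size)],
                  [0.0] * output_size)
                 for _ in exemplars]
        return exemplars + nulls

    data = [([0.1, 0.9], [1.0, 0.0]), ([0.8, 0.2], [0.0, 1.0])]
    print(len(add_null_patterns(data, 2)))   # 4 exemplars: two real, two null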

17
Ensuring Consistency
  • Sometimes there may be conflicting exemplars in
    our training set.
  • A conflict occurs when two or more identical
    input patterns are associated with different
    outputs.
  • Why is this problematic?

18
Ensuring Consistency
  • Assume a BPN with a training set including the
    exemplars (a, b) and (a, c).
  • Whenever the exemplar (a, b) is chosen, the
    network adjusts its weights to produce an output
    for a that is closer to b.
  • Whenever (a, c) is chosen, the network changes
    its weights for an output closer to c, thereby
    unlearning the adaptation for (a, b).
  • In the end, the network will associate input a
    with an output that is between b and c but is
    neither exactly b nor c, so the network error
    caused by these exemplars will not decrease.
  • For many applications, this is undesirable.

19
Ensuring Consistency
  • To identify such conflicts, we can apply a
    (binary) search algorithm to our set of
    exemplars (see the sketch after this list).
  • How can we resolve an identified conflict?
  • Of course, the easiest way is to eliminate the
    conflicting exemplars from the training set.
  • However, this reduces the amount of training data
    that is given to the network.
  • Eliminating exemplars is the best way to go if it
    is found that these exemplars represent invalid
    data, for example, inaccurate measurements.
  • In general, however, other methods of conflict
    resolution are preferable.
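Here is the sketch of such a conflict check; it uses a hash map over input patterns rather than a literal binary search, which serves the same purpose:

    def find_conflicts(exemplars):
        # Group exemplars by input pattern; a conflict is an input pattern
        # that appears with two or more different desired outputs.
        outputs_by_input = {}
        for inputs, output in exemplars:
            outputs_by_input.setdefault(tuple(inputs), set()).add(tuple(output))
        return {i: o for i, o in outputs_by_input.items() if len(o) > 1}

    data = [((0, 0, 1, 1), (0, 1, 0, 1)),
            ((0, 0, 1, 1), (0, 0, 1, 0)),
            ((1, 1, 0, 0), (1, 0, 0, 0))]
    print(find_conflicts(data))   # input (0, 0, 1, 1) maps to two outputs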

20
Ensuring Consistency
  • Another method combines the conflicting patterns.
  • For example, if we have the exemplars
  • (0011, 0101), (0011, 0010),
  • we can replace them with the following single
    exemplar:
  • (0011, 0111).
  • The way we compute the output vector of the new
    exemplar based on the two original output vectors
    depends on the current task.
  • It should be the value that is most similar (in
    terms of the external interpretation) to the
    original two values.
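For binary outputs, a bitwise OR reproduces the example above; whether OR is the right combination rule depends on the task's interpretation:

    def combine_outputs(out_a, out_b):
        # Bitwise OR of two binary output vectors: 0101 OR 0010 = 0111.
        return [a | b for a, b in zip(out_a, out_b)]

    print(combine_outputs([0, 1, 0, 1], [0, 0, 1, 0]))   # [0, 1, 1, 1]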

21
Ensuring Consistency
  • Alternatively, we can alter the representation
    scheme.
  • Let us assume that the conflicting measurements
    were taken at different times or places.
  • In that case, we can just expand all the input
    vectors with additional values that specify the
    time or place of measurement.
  • For example, the exemplars
  • (0011, 0101), (0011, 0010)
  • could be replaced by the following ones:
  • (100011, 0101), (010011, 0010).
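A sketch of this expansion, assuming a one-hot context code is prepended to each input vector; deriving the context from the exemplar index is a simplification for illustration:

    def expand_inputs(exemplars, n_contexts):
        # Prepend a one-hot context code to every input vector, so formerly
        # identical inputs from different times/places become distinct.
        expanded = []
        for context, (inputs, output) in enumerate(exemplars):
            code = [1 if i == context % n_contexts else 0 for i in range(n_contexts)]
            expanded.append((code + list(inputs), output))
        return expanded

    data = [((0, 0, 1, 1), (0, 1, 0, 1)), ((0, 0, 1, 1), (0, 0, 1, 0))]
    for x, y in expand_inputs(data, 2):
        print(x, '->', y)   # [1, 0, 0, 0, 1, 1] and [0, 1, 0, 0, 1, 1]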

22
Ensuring Consistency
  • One advantage of altering the representation
    scheme is that this method cannot create any new
    conflicts.
  • Expanding the input vectors cannot make two or
    more of them identical if they were not identical
    before.

23
Training and Performance Evaluation
  • How many samples should be used for training?
  • Heuristic: At least 5–10 times as many samples
    as there are weights in the network.
  • Formula (Baum & Haussler, 1989):
  • P ≥ W / (1 − a)
  • P is the number of samples, W is the number of
    weights to be trained, and a is the desired
    accuracy (e.g., proportion of correctly
    classified samples).
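As a quick check of the reconstructed rule: a network with W = 1,000 trainable weights and a desired accuracy of a = 0.9 would need P ≥ 1,000 / (1 − 0.9) = 10,000 samples, matching the upper end of the 5–10× heuristic above.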

24
Training and Performance Evaluation
  • What learning rate η should we choose?
  • The problems that arise when η is too small or
    too big are similar to those for the Adaline.
  • Unfortunately, the optimal value of η entirely
    depends on the application.
  • Values between 0.1 and 0.9 are typical for most
    applications.
  • Often, η is initially set to a large value and
    is decreased during the learning process.
  • This leads to better convergence of learning and
    also decreases the likelihood of getting stuck
    in a local error minimum at an early learning
    stage.
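A sketch of such a schedule; the exponential decay form and all constants are illustrative assumptions, since the slides do not prescribe a particular schedule:

    # Start with a large learning rate and shrink it as training proceeds.
    def learning_rate(epoch, eta_start=0.9, eta_end=0.1, decay=0.99):
        return max(eta_end, eta_start * decay ** epoch)

    for epoch in (0, 100, 200, 400):
        print(epoch, round(learning_rate(epoch), 3))   # 0.9, 0.329, 0.121, 0.1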