Transcript and Presenter's Notes

Title: Neural Network Training


1
Lecture 4
  • Neural Network Training Applications
  • Chapters 6-9
  • Wu & McLarty

2
Input/Output Design
  • Choices of methods for encoding inputs and
    decoding outputs drive the design of a neural
    network application.
  • Training against amino acid sequence data is
    perhaps the most complex challenge, as
    applications may involve coding both sequence and
    physicochemical properties.

3
Amino Acid Properties
  • Variation in amino acid properties is found in
    the twenty different sidechains.
  • The sidechains vary in:
  • Size, ranging from 1 atom (Glycine) to 16 atoms
    (Tryptophan, Arginine), and volume, from 60.1 Å³
    (Glycine) to 227.8 Å³ (Tryptophan).
  • Polarity, ranging from hydrophobic
    (Phenylalanine, Valine, etc.), to neutral polar
    (Serine, Threonine, Tyrosine, etc.), to ionized
    (Glutamate, Lysine, etc.)

4
Protein Folding
  • The varying properties of the sidechains
    support the phenomenon called protein folding:
    some sequences can adopt stable three-dimensional
    conformations called folds. The fold lends the
    sequence a distinct 3D shape, which may include
    functionally important features such as crevices
    (which may be binding sites for smaller ligand
    molecules).

5
Describing Protein Conformation
http://swissmodel.expasy.org/course/text/chapter1.htm
6
Some Principles of Protein Folding
  • The fold will tend to sequester hydrophobic
    sidechains from solvent, while keeping polar
    sidechains well-solvated.
  • Hydrophobic sidechains cluster to form the
    hydrophobic core.
  • Backbone hydrogen bond donors and acceptors often
    become partly or fully desolvated upon folding;
    this potential free-energy penalty is avoided by
    the preference to form regular secondary
    structures (local folds) with regular patterns
    of hydrogen bonding. These structures compensate
    for the loss of hydrogen bonds to solvent.

7
Categories of secondary (local) structure
http://swissmodel.expasy.org/course/text/chapter1.htm
8
Alpha Helix
9
Beta Sheet
10
Beta Sheet Topologies
11
Antiparallel Example
12
Turns
13
Secondary Structure Propensity
  • A given amino acid residue is more or less
    compatible with a particular secondary
    structure class.
  • AAs in turns tend to be small and possibly polar.
  • Proline is excluded from the interior of helical
    segments, but may be found at the ends.
  • Leucine is often found as an interior residue in
    helices.
  • Propensity can be represented using numerical
    indices calculated from statistics determined for
    experimental protein structures (e.g. Chou-Fasman
    rules).

14
What Properties to Encode?
  • One can work directly with the 20 amino acids:
    use a 20-character alphabet directly for input.
  • One can use physicochemical properties to
    represent an amino acid; for example, a separate
    index representing the hydrophobicity of each
    residue in a window, using a particular scale
    (e.g. Kyte-Doolittle, Engelman, Eisenberg). These
    are all coded as real numbers (see the sketch
    below).
  • You can break up the hydrophobicity scale into
    ranges, and assign amino acids on this basis.
  • Or, use a reduced alphabet, based on similar
    physical properties, or on statistical propensity
    to cross-substitute under selective pressure (as
    expressed in the PAM matrices).
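  • As a rough Java sketch of the real-valued option (the class and
    method names are illustrative, and only a few Kyte-Doolittle values
    are filled in; the rest would be taken from the published scale), a
    window of residues can be encoded as a vector of hydrophobicity
    indices:

    import java.util.HashMap;
    import java.util.Map;

    public class HydrophobicityEncoder {
        // Partial Kyte-Doolittle scale; the remaining residues would be
        // filled in from the published table before real use.
        private static final Map<Character, Double> KD = new HashMap<>();
        static {
            KD.put('I', 4.5);  KD.put('V', 4.2);  KD.put('L', 3.8);
            KD.put('G', -0.4); KD.put('K', -3.9); KD.put('R', -4.5);
            // ... the other 14 residues are omitted here for brevity
        }

        // Encode a fixed-length window as one real number per residue.
        public static double[] encodeWindow(String window) {
            double[] x = new double[window.length()];
            for (int i = 0; i < window.length(); i++) {
                x[i] = KD.getOrDefault(window.charAt(i), 0.0); // unknown residue -> 0.0
            }
            return x;
        }
    }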

15
Reduced Alphabets
16
Sequence Encoding
  • Direct Encoding - code each sequence character as
    a vector
  • Maintains relative positions of sequence
    characters, no loss of information
  • Drawback: one is forced to scan the sequence with
    a window of fixed length.
  • Usually use an indicator vector, a string of 0s
    and a single 1, which specifies the identity of
    the residue (see the sketch below).
  • Also possible to use binary numbers (e.g. A = 00,
    T = 01, G = 10, C = 11), although this denser
    representation appears not to work as well as the
    indicator approach.
  • Also possible to represent each residue by a real
    number (e.g. hydrophobicity index).
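  • A minimal Java sketch of the indicator-vector scheme (names are
    illustrative; a 20-letter amino acid alphabet is assumed). Each
    residue in a fixed-length window becomes a block of 20 inputs
    containing a single 1:

    public class IndicatorEncoder {
        private static final String ALPHABET = "ACDEFGHIKLMNPQRSTVWY"; // 20 amino acids

        // Encode a window as concatenated indicator vectors: one block of
        // 20 values per residue, all 0s except a single 1 marking identity.
        public static double[] encode(String window) {
            double[] x = new double[window.length() * ALPHABET.length()];
            for (int i = 0; i < window.length(); i++) {
                int k = ALPHABET.indexOf(window.charAt(i));
                if (k >= 0) {
                    x[i * ALPHABET.length() + k] = 1.0;
                }  // an unknown character leaves its block all zeros
            }
            return x;
        }
    }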

17
  • Indirect Encoding - Use the entire sequence to
    generate the input
  • To include maximal information while restricting
    the size of the input, use n-gram hashing.
  • Assign each residue an identity using a selected
    alphabet of length M.
  • Slide a window of length n across the sequence
    and count the number of occurrences of each
    n-tuple (see the sketch below).
  • The input is of size M^n, where each input
    corresponds to a possible n-tuple. The magnitude
    of each input is the count accumulated for the
    corresponding n-tuple.
  • Different kinds of measures can be combined; for
    example, an n-gram hash vector along with an
    additional input that measures average
    hydrophobic index over the entire sequence.
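  • A Java sketch of the n-gram hashing idea (names are illustrative;
    a reduced alphabet of size M is assumed). Each n-tuple is mapped to
    a vector index by treating its letters as base-M digits:

    public class NGramHasher {
        // Count occurrences of every n-tuple over the given alphabet (size M).
        // The returned vector has length M^n; entry j is the count for the
        // n-tuple whose letters, read as base-M digits, give j.
        public static int[] countNGrams(String seq, String alphabet, int n) {
            int m = alphabet.length();
            int[] counts = new int[(int) Math.pow(m, n)];
            for (int start = 0; start + n <= seq.length(); start++) {
                int index = 0;
                boolean valid = true;
                for (int i = 0; i < n; i++) {
                    int k = alphabet.indexOf(seq.charAt(start + i));
                    if (k < 0) { valid = false; break; } // skip windows with unknown letters
                    index = index * m + k;
                }
                if (valid) counts[index]++;
            }
            return counts;
        }
    }

    For example, countNGrams("ACCGT", "ACGT", 2) returns a vector of
    length 16 in which the entries for AC, CC, CG, and GT are each 1.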

18
Input Trimming
  • In general, it is desirable to limit the size of
    the input.
  • Smaller networks tend to generalize better
  • Smaller networks are faster to train
  • If inputs are correlated, they can be combined
    into an aggregate descriptor. There are a number
    of statistical methods to approach this (a PCA
    sketch follows this list), including:
  • Principal Component Analysis (PCA)
  • Singular Value Decomposition (SVD)
  • Partial Least Squares regression (PLS)
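  • One way to see how correlated inputs collapse into a single
    aggregate descriptor is to project them onto the first principal
    component. The Java sketch below is illustrative only (plain
    arrays, a single component found by power iteration); a real
    application would use a linear algebra library:

    public class FirstPrincipalComponent {
        // Return the first principal component of data (rows = examples,
        // columns = correlated inputs), via power iteration on the
        // covariance matrix.
        public static double[] firstComponent(double[][] data, int iterations) {
            int n = data.length, d = data[0].length;

            // Center each column.
            double[] mean = new double[d];
            for (double[] row : data)
                for (int j = 0; j < d; j++) mean[j] += row[j] / n;
            double[][] x = new double[n][d];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < d; j++) x[i][j] = data[i][j] - mean[j];

            // Covariance matrix C = X^T X / (n - 1).
            double[][] c = new double[d][d];
            for (int i = 0; i < n; i++)
                for (int a = 0; a < d; a++)
                    for (int b = 0; b < d; b++) c[a][b] += x[i][a] * x[i][b] / (n - 1);

            // Power iteration: v <- normalize(C v).
            double[] v = new double[d];
            java.util.Arrays.fill(v, 1.0 / Math.sqrt(d));
            for (int it = 0; it < iterations; it++) {
                double[] w = new double[d];
                for (int a = 0; a < d; a++)
                    for (int b = 0; b < d; b++) w[a] += c[a][b] * v[b];
                double norm = 0.0;
                for (double wa : w) norm += wa * wa;
                norm = Math.sqrt(norm);
                for (int a = 0; a < d; a++) v[a] = w[a] / norm;
            }
            return v; // project each example onto v to obtain one aggregate input
        }
    }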

19
Output
  • The simplest element to design
  • If N categories are being simultaneously
    predicted, there will need to be N outputs.
  • A yes/no classification can be made for each
    category using a threshold function, or the
    output strengths can be used directly as measures
    of confidence (see the sketch below).
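  • A minimal Java sketch (names are illustrative) of turning N raw
    output strengths into yes/no calls with a threshold, while keeping
    the raw values available as confidence measures:

    public class OutputDecoder {
        // Apply a threshold to each of the N category outputs; the raw
        // output value itself can be reported as a measure of confidence.
        public static boolean[] decode(double[] outputs, double threshold) {
            boolean[] calls = new boolean[outputs.length];
            for (int i = 0; i < outputs.length; i++) {
                calls[i] = outputs[i] >= threshold;
            }
            return calls;
        }
    }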

20
Network Design
  • As a general principle, networks should be kept
    as small as possible, in terms of both numbers of
    units and connections. Smaller networks are less
    prone to overfitting, and thus generalize
    better.
  • In network growing, the number of units in a
    hidden layer is steadily increased, retraining at
    each step, until the optimum performance of the
    network is realized.
  • In network pruning, one begins with a large
    network with good performance, and then applies
    an automated method to identify connections with
    small weights, and neurons with low activation.
    Connections and neurons can then be culled,
    reducing the size of the network and presumably
    improving its ability to generalize (see the
    sketch below).
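  • As an illustration of the culling step only (a hedged sketch, not
    the book's procedure): connections whose weights fall below a
    magnitude threshold are zeroed out, after which the smaller network
    would be retrained and re-evaluated.

    public class WeightPruner {
        // Zero out connections with small weights; returns how many survive.
        // In practice the pruned network is retrained and its validation
        // performance checked before the smaller architecture is accepted.
        public static int prune(double[][] weights, double threshold) {
            int kept = 0;
            for (double[] row : weights) {
                for (int j = 0; j < row.length; j++) {
                    if (Math.abs(row[j]) < threshold) {
                        row[j] = 0.0;   // cull this connection
                    } else {
                        kept++;
                    }
                }
            }
            return kept;
        }
    }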

21
Network Training
  • Learning rate can have a big impact on how
    quickly an optimum set of parameters is found. A
    poor choice of rate may make it impossible to
    achieve convergence.
  • The backpropagation algorithm is sensitive to the
    initial weights; in some cases, convergence may
    depend on the initial conditions.
  • In general, it is not necessary or desirable to
    train a network until the error function reaches
    a minimum. The network may generalize better if
    training is stopped short of the minimum, by
    application of a user-specified tolerance (see
    the sketch below).
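  • A Java sketch of stopping short of the error minimum using a
    user-specified tolerance (the Network interface and its
    trainOneEpoch/validationError methods are hypothetical stand-ins
    for the actual training code):

    public class EarlyStopping {
        // Minimal stand-in interface so the sketch is self-contained.
        interface Network {
            void trainOneEpoch();       // one backpropagation pass over the training set
            double validationError();   // error measured on held-out examples
        }

        // Train until the error stops improving by more than `tolerance`,
        // rather than driving the error all the way to a minimum.
        public static void train(Network net, double tolerance, int maxEpochs) {
            double best = Double.MAX_VALUE;
            for (int epoch = 0; epoch < maxEpochs; epoch++) {
                net.trainOneEpoch();
                double err = net.validationError();
                if (best - err < tolerance) {
                    break;              // improvement too small: stop early
                }
                best = err;
            }
        }
    }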

22
Training/Validation Sets
  • The most critical component of constructing a
    neural network application
  • Training is hampered by uneven representation of
    categories to be recognized, and by incorrect
    annotation (e.g. bad examples).
  • Often negative examples outnumber positives by
    orders of magnitude; one often proceeds by
    generating negatives that are randomized or noisy
    versions of positive examples, thus maintaining a
    balance between these two sets (see the sketch
    below).
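  • One common way to generate such negatives is to shuffle the
    residues of each positive example, preserving composition while
    destroying order. A minimal Java sketch (the method name is
    illustrative):

    import java.util.Random;

    public class NegativeExamples {
        // Create a randomized negative by shuffling the characters of a
        // positive example: amino acid composition is preserved, but the
        // original sequence order is destroyed.
        public static String shuffledNegative(String positive, Random rng) {
            char[] s = positive.toCharArray();
            for (int i = s.length - 1; i > 0; i--) {   // Fisher-Yates shuffle
                int j = rng.nextInt(i + 1);
                char tmp = s[i]; s[i] = s[j]; s[j] = tmp;
            }
            return new String(s);
        }
    }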

23
Evaluation
  • In cases where we assign outputs to discrete
    categories, we can apply a number of standardized
    measures of performance. These require that we
    count up, for a given validation set,
  • True Positives (TP): examples that belong in the
    positive category, and are correctly assigned.
  • False Positives (FP): negative examples that are
    incorrectly assigned as positive.
  • True Negatives (TN): examples that belong in the
    negative category, and are correctly assigned.
  • False Negatives (FN): positive examples that are
    incorrectly assigned as negative.
  • Total Examples = TP + FP + TN + FN

24
Evaluation Measures
  • Sensitivity (correctly assigning the positives):
    Sensitivity = TP / (TP + FN)
  • Specificity (correctly assigning the negatives):
    Specificity = TN / (TN + FP)

25
  • Positive Predictive Value (probability that a
    positive assignment is correct):
    PPV = TP / (TP + FP)
  • Negative Predictive Value (probability that a
    negative assignment is correct):
    NPV = TN / (TN + FN)

26
  • Accuracy (probability of correct assignment,
    positive or negative):
    Accuracy = (TP + TN) / (TP + FP + TN + FN)
    (a computational sketch follows below)
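  • Putting the counts and measures together, a small Java sketch (the
    class name is illustrative) that computes the standard quantities
    from TP, FP, TN, and FN:

    public class EvaluationMeasures {
        public static double sensitivity(int tp, int fn) { return (double) tp / (tp + fn); }
        public static double specificity(int tn, int fp) { return (double) tn / (tn + fp); }
        public static double positivePredictiveValue(int tp, int fp) { return (double) tp / (tp + fp); }
        public static double negativePredictiveValue(int tn, int fn) { return (double) tn / (tn + fn); }
        public static double accuracy(int tp, int fp, int tn, int fn) {
            return (double) (tp + tn) / (tp + fp + tn + fn);
        }
    }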

27
Chapters 9-11
  • These chapters provide a literature survey of
    applications of neural network methods to
    sequence analysis and protein structure
    prediction.
  • It is a very valuable resource.

28
Loops, etc. in Java
  • Lewis & Loftus,
  • Chapter 5

29
Loops, etc.
  • It's just like in C. We will not cover this in
    any detail. The only interesting extras:
  • Be aware that Java often uses boolean variables
    to represent truth values. Recognize the special
    keywords true and false; unlike C, these are not
    the integers 1 and 0.
  • Be aware that the String class provides the
    equals() method for comparing two strings,
    character-by-character (see the sketch below).
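  • A small illustration of both points:

    public class JavaExtras {
        public static void main(String[] args) {
            boolean done = false;        // truth values are true/false, not 1/0
            String a = "helix";
            String b = "hel" + "ix";
            // Compare string contents character-by-character with equals();
            // == compares object references, not contents.
            if (!done && a.equals(b)) {
                System.out.println("The strings match.");
            }
        }
    }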

30
But, an interesting graphics example!
  • Chapter 5 also provides some more examples of GUI
    design. The most important topic presented has to
    do with how an ActionListener distinguishes
    events generated by different objects (sketched
    below).
  • We will look at the program in Listing 5.24,
    QuoteOptions.
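  • The usual mechanism (a generic sketch, not the QuoteOptions listing
    itself) is to compare the event's source against the components the
    listener is registered on:

    import java.awt.event.ActionEvent;
    import java.awt.event.ActionListener;
    import javax.swing.JButton;

    public class SourceAwareListener implements ActionListener {
        private final JButton comedyButton = new JButton("Comedy");
        private final JButton philosophyButton = new JButton("Philosophy");

        public SourceAwareListener() {
            comedyButton.addActionListener(this);
            philosophyButton.addActionListener(this);
        }

        // One listener handles both buttons; getSource() identifies which
        // component generated the event.
        public void actionPerformed(ActionEvent event) {
            if (event.getSource() == comedyButton) {
                System.out.println("Comedy quote selected");
            } else if (event.getSource() == philosophyButton) {
                System.out.println("Philosophy quote selected");
            }
        }
    }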