EEL 4930-6 / 5930-5, Spring '06: Physical Limits of Computing

1
EEL 4930-6 / 5930-5, Spring '06
Physical Limits of Computing
http://www.eng.fsu.edu/mpf
  • Slides for a course taught by Michael P. Frank in
    the Department of Electrical & Computer
    Engineering

2
Physical Limits of Computing: Course Outline
Currently I am working on writing up a set of
course notes based on this outline, intended to
someday evolve into a textbook
  • I. Course Introduction
  • Moore's Law vs. Modern Physics
  • II. Foundations
  • Required Background Material in Computing &
    Physics
  • III. Fundamentals
  • The Deep Relationships between Physics and
    Computation
  • IV. Core Principles
  • The Two Revolutionary Paradigms of Physical
    Computation
  • V. Technologies
  • Present and Future Physical Mechanisms for the
    Practical Realization of Information Processing
  • VI. Conclusion

3
Part II. Foundations
  • This first part of the course quickly reviews
    some key background knowledge that you will need
    to be familiar with in order to follow the later
    material.
  • You may have seen some of this material before.
  • Part II is divided into two chapters:
  • Chapter II.A. The Theory of Information and
    Computation
  • Chapter II.B. Required Physics Background

4
Chapter II.A. The Theory of Information and
Computation
  • In this chapter of the course, we review a few
    important things that you need to know about:
  • II.A.1. Combinatorics, Probability, Statistics
  • II.A.2. Information & Communication Theory
  • II.A.3. The Theory of Computation

5
Section II.A.2. The Theory of Information and
Communication
  • This section is a gentle introduction to some of
    the basic concepts of information theory
  • also known as communication theory.
  • Sections:
  • (a) Basic Concepts of Information
  • (b) Quantifying Information
  • (c) Information vs. Entropy
  • (d) Communication Channels
  • Later in the course, we will describe Shannon's
    famous theorems concerning the fundamental limits
    of channel capacity.
  • As well as some newer, more general quantum
    limits on classical and quantum communication.

6
Subsection II.A.2.a Basic Concepts of
Information
  • Etymology of Information
  • Various Senses of Information
  • Information-Related Concepts

7
Etymology of Information
  • Earliest historical usage in English (from Oxford
    English Dictionary):
  • The act of informing,
  • As in education, instruction, training.
  • "Five books come down from Heaven for information
    of mankind." (1387)
  • Or a particular item of training, i.e., a
    particular instruction.
  • "Melibee had heard the great skills and reasons
    of Dame Prudence, and her wise informations and
    techniques." (1386)
  • Derived by adding the action-noun ending -ation
    (descended from Latin's -tio) to the
    pre-existing verb "to inform,"
  • Meaning to give form (shape) to the mind;
  • to discipline, instruct, teach.
  • "Men so wise should go and inform their kings."
    (1330)
  • And "inform" comes from Latin informare, derived
    from the noun forma (form),
  • Informare means "to give form to," or "to form an
    idea of."
  • Latin also even already contained the derived
    word informatio,
  • meaning concept or idea.
  • Note: The Greek words εἶδος (eídos) and μορφή
    (morphé),
  • Meaning "form" or "shape,"
  • were famously used by Plato (& later Aristotle)
    in a technical philosophical sense, to denote the
    true identity or ideal essence of something.
  • We'll see that our modern concept of physical
    information is not too dissimilar!

8
Information: Our Definition
  • Information is that which distinguishes one thing
    (entity) from another.
  • It is (all or part of) an identification or
    description of the thing.
  • It is a specification of (some or all of) the
    thing's properties or characteristics.
  • We can consider that every thing carries or
    embodies a complete description of itself.
  • It does this simply in virtue of its own being,
    its own existence.
  • In philosophy, this inherent description is
    called the entity's form or constitutive essence.

9
Specific Senses of Information
  • But, let us also take care to distinguish between
    the following uses of information:
  • A form or pattern of information
  • An abstract configuration of information, as
    opposed to a specific instantiation.
  • Many separate instances of information contained
    in separate objects may have identical patterns,
    or content.
  • We may say that those instances are copies of
    each other.
  • An instance or copy of information
  • A specific instantiation (i.e., as found in a
    specific entity) of some general form.
  • A holder or slot or location for storing
    information
  • An indefinite or variable (mutable) instance of
    information (or place where instances may be)
    that may take on different forms at different
    times or in different situations.
  • A wraith (pulse? cloud?) of information
  • A physical state or set of states, dynamically
    changing over time.
  • A moving, constantly-mutating instance of
    information, where the container of that
    information may even transition from one physical
    system to another.
  • A stream of information
  • An indefinitely large instance of information,
    extended over time
  • A piece of information
  • All or a portion of a pattern, instance, wraith,
    or stream of information.
  • A nugget of information
  • A piece of information, together with an
    associated semantic interpretation of that
    information.
  • A nugget is often implicitly a valuable,
    important fact (the meaning of a fact is a true
    statement).

10
Information-related concepts
  • It will also be convenient to discuss the
    following:
  • A container or embodiment of information
  • A physical system that contains some particular
    instance of, placeholder for, or pulse of
    information. (An embodiment contains nothing but
    that.)
  • A symbol or message
  • A form or instance of information or its
    embodiment produced with the intent that it
    should convey some specific meaning, or semantic
    content.
  • A message is typically a compound object
    containing a number of symbols.
  • An interpretation or meaning of information
  • A particular semantic interpretation of a form
    (pattern of information), tying it to potentially
    useful facts of interest.
  • May or may not be the originally intended
    meaning!
  • A representation of information
  • An encoding of one pattern of information within
    some other (frequently larger) pattern.
  • The representation goes according to some
    particular language or code.
  • A subject of information
  • An entity that is identified or described by a
    given pattern of information.
  • May be abstract or concrete, mathematical or
    physical

11
Information Concept Map
[Concept-map diagram: a Thing (subject or embodiment) has a
Form (pattern of information); a Form is instantiated by/in
an Instance (copy of a form), is represented by other forms,
and is interpreted to get a Meaning (interpretation of
information), which in turn describes or identifies the
Thing; a piece of information together with its meaning is a
Nugget (valuable piece of information); a Physical entity or
system contains, carries, or embodies Instances and has a
changing Wraith (dynamic body of information, a changing
cloud of states) that can move from one physical system to
another; a Quantity of information measures the size of a
Form, an Instance, or a Wraith.]
12
Example: A Byte in a Register
  • The bit-sequence 01000001 is a particular form.
  • Suppose there is an instance of this particular
    pattern of bits in a certain machine register in
    the computer on my desk.
  • The register hardware is a physical system that
    is a container of this particular instance.
  • The physical subsystem delimited by the high and
    low ranges of voltage levels on the register's
    storage nodes embodies this sequence of bits.
  • The register could hold other forms as well; it
    provides a holder that can contain an instance of
    any form that is a sequence of 8 bits.
  • When the register is erased later, the specific
    wraith of information that it contained will not
    be destroyed, but will only be released into the
    environment.
  • Although in an altered form which is scrambled
    beyond all hope of recognition.
  • The instance of 01000001 contained in the
    register at this moment happens to be intended to
    represent the letter "A" (which is another form,
    and is a symbol).
  • The meaning of this particular instance of the
    letter "A" is that a particular student's grade in
    this class (say, Joe Smith's) is an A.
  • The valuable nugget of information which is the
    fact that Joe has an A is also represented in
    this register.
  • The subject of this particular piece of
    information is Joe's grade in the class.
  • The quantity of information contained in the
    machine register is Log(256) = 1 byte = 8 bits,
    because the slot could hold any of 256 different
    forms (bit patterns).
  • But in the context of my grading application, the
    quantity of information contained in the message
    "A" is only Log(5) ≈ 2.32 bits, since only 5
    grades (A, B, C, D, F) are allowed.
  • The size of the form "A" is 8 bits in the context
    of the encoding being used.
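
As a quick numerical check of the two quantities above (a
sketch of ours, not from the slides; Python's math.log2
corresponds to choosing the bit as the logarithmic unit):

    import math

    # Capacity of the 8-bit register: Log(256), expressed in bits.
    print(math.log2(256))   # 8.0 bits

    # Information in the message "A" when only 5 grades (A,B,C,D,F) are possible:
    print(math.log2(5))     # ~2.32 bits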

13
Subsection II.A.2.b Quantifying Information
  • Capacity of Compound Holders
  • Logarithmic Information Measures
  • Indefinite Logarithms
  • Logarithmic Units

14
Quantifying Information
  • One way to quantify forms of information is to
    try to count how many distinct ones there are.
  • Unfortunately, the number of all conceivable
    forms is infinite.
  • However, we can count the forms in particular
    finite subsets
  • Consider a situation defined in such a way that a
    given information holder (in the context of that
    situation) can only take on forms that are chosen
    from among some definite, finite number N of
    possible distinct forms.
  • One way to try to characterize the informational
    size or capacity of the holder (the amount of
    information in it) would then be to simply
    specify the value of N, the number of forms it
    could have.
  • This would describe the amount of variability of
    its form.
  • However, the raw number N by itself does not seem
    to have the right mathematical properties to
    characterize the size of the holder in an
    intuitive way
  • Intuition tells us that the capacity of holders
    should be additive
  • E.g., it is intuitively clear that two pages of a
    book should be able to hold twice as much
    information as one page.

15
Compound Holders
  • Consider a holder of information C that is
    composed by taking two separate and independent
    holders of information A, B, and considering them
    together as constituting a single compound holder
    of information.
  • Suppose now also that A has N possible forms, and
    that B has M possible forms.
  • Clearly then, due to the product rule of
    combinatorics, C as a whole has N·M possible
    distinct forms.
  • Each is obtained by assigning a form to A and a
    form to B independently.
  • But should the size of the holder C be the
    product of the sizes of A and B?
  • It would seem more natural to say sum, so that
    the whole is the sum of the parts.
  • How can we arrange for this to be true?

[Diagram: Holder C, composed of Holder A (N possible forms)
and Holder B (M possible forms), has N·M possible forms.]
16
Measuring Information with Logarithms
  • Fortunately, we can convert the product to a sum
    by using logarithmic units for measuring
    information.
  • Due to the rule about logarithms that log(N·M) =
    log(N) + log(M).
  • So, if we declare that the size or capacity or
    amount of information in a holder of information
    is defined to be the logarithm of the number of
    different forms it can have,
  • Then we can say, the size of the compound holder
    C is the sum of the sizes of the sub-holders A
    and B that it comprises.
  • Only problem: What base do we use for the
    logarithm here?
  • Different bases would give different numeric
    answers.
  • Any base could be chosen by convention, but would
    be arbitrary.
  • A choice of a particular base amounts to choosing
    an information unit.
  • Arguably, the most elegant answer is
  • Leave the base unspecified, and declare that an
    amount of information is not a number, but rather
    is a dimensioned indefinite logarithm quantity.

17
Indefinite Logarithms
  • Definition. Indefinite logarithm. For any real
    number x > 0, let the indefinite logarithm of x,
    written Log x, be defined as
  • Log x ≡ λb.log_b x  (using lambda-calculus notation)
  • In other words, the value of Log x is a function
    object with one argument (b), where this function
    takes any value of the base b (> 1) and returns
    the resulting value of log_b x.
  • E.g., Log 256 ≡ λb.log_b 256 (a function of 1
    argument),
  • So for example, (Log 256)(2) = 8 and (Log 256)(16)
    = 2.
  • Sums, negations, and scalar multiples of
    indefinite logarithm objects can be defined quite
    naturally,
  • by simply operating on the value of the
    lambda-function in the corresponding way.
  • See the paper "The Indefinite Logarithm,
    Logarithmic Units, and the Nature of Entropy" in
    the readings for details.

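The lambda-calculus definition above translates almost
literally into code. The sketch below is our own illustration
(the class name IndefiniteLog and its operations are not from
the readings); it supports only evaluation at a base,
addition, and scalar multiplication:

    import math

    class IndefiniteLog:
        """Log x, represented as the function b -> log_b(x)."""
        def __init__(self, x):
            self.x = x
        def __call__(self, b):
            # Evaluate at a particular base b > 1.
            return math.log(self.x, b)
        def __add__(self, other):
            # Log x + Log y = Log(x*y), by the product rule for logarithms.
            return IndefiniteLog(self.x * other.x)
        def __rmul__(self, c):
            # c * Log x = Log(x**c).
            return IndefiniteLog(self.x ** c)

    L256 = IndefiniteLog(256)
    print(L256(2), L256(16))                          # 8.0 2.0, as on the slide
    print((IndefiniteLog(2) + IndefiniteLog(5))(10))  # Log 10 evaluated at base 10 -> 1.0
    print((4 * IndefiniteLog(2))(2))                  # 4*Log 2 = Log 16 -> 4.0 bits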
18
Indefinite Logarithms as Curves
  • The object Log N can also be identified with the
    entire curve or graph (point set) {(b, log_b N) :
    b > 1}.

Note: The Log 1 curve is 0 everywhere, the
Log 4 curve is twice as high as the Log 2 curve,
Log 10 = Log 2 + Log 5 and is
also equal to (log_2 10)·Log 2 ≈ 3.322·Log 2. In
general, each curve is just some constant
multiple of each other curve!
[Plot: the curves Log 10, Log 4, Log 3, Log 2, and Log 1 as
functions of the base b; larger values of the argument N
give higher curves.]
19
Indefinite Exponential
  • The inverse of the indefinite logarithm function
    could be called the indefinite exponential.
  • Definition. Indefinite exponential. Given any
    indefinite logarithm object L, the indefinite
    exponential of L, written Exp L, is defined
    by: Exp L ≡ b^L(b),
  • where b > 0 may be any real number.
  • This definition is meaningful because all values
    of b will give the same result x, since for any
    b, we have that b^L(b) = b^(log_b x) = x,
  • where x is the unique real number such that
    L = Log x.
  • Thus, Exp Log x = x and Log Exp L = L always.

20
Logarithmic Quantities Units
  • Theorem. Any given indefinite-logarithm quantity
    Log x is equal to a scalar multiple of any fixed
    indefinite-logarithm quantity, called a
    logarithmic unit L_u ≡ Log u, where u can be any
    standard chosen base > 0, and where the scalar
    coefficient is log_u x. Or, in symbols,
  • Log x = (log_u x)·L_u.
  • Example: Let the logarithmic unit b ≡ L_2 ≡
    Log 2 be called the "bit" (binary digit).
  • Then Log 16 = 4·Log 2 = 4 b (4 bits).

21
Some Common Logarithmic Units
  • Decibel: dB ≡ L_1.2589 ≡ Log 10^0.1
  • Binary digit or bit: b ≡ L_2 ≡ Log 2
  • A.k.a. the octave when used to measure tonal
    intervals in music.
  • Neper or nat: n ≡ L_e ≡ Log e
  • A.k.a. Boltzmann's constant k in physics, as we
    will see!
  • Octal digit: o ≡ L_8 = 3L_2 ≡ Log 8
  • Bel or decimal digit: d ≡ L_10 = 10 dB ≡ Log 10
  • A.k.a. order of magnitude, power of ten, decade,
    Richter-scale point.
  • Nibble or hex digit: h ≡ L_16 = 4L_2 ≡ Log 16
  • Byte or octet: B ≡ L_256 = 8L_2 ≡ Log 256
  • Kilobit, or really kibibit: kb ≡ Log 2^1,024
  • Joule per Kelvin: J/K ≈ Log 10^(3.14558e22)
    (roughly)
  • Units of physical entropy are equivalent to
    indefinite-logarithm units!

22
Conversions Between Different Logarithmic Units
  • Suppose we are given a logarithmic quantity Q
    expressed as a multiple of some logarithmic unit
    L_a (that is, Q = c_a·L_a, where c_a is a number),
  • and suppose that we wish to re-express Q in terms
    of some other logarithmic unit L_b, i.e., as Q =
    c_b·L_b.
  • The ratio between the two logarithmic units
    L_a ≡ Log a and L_b ≡ Log b is L_a/L_b = log_b a.
  • So, c_b = Q/L_b = c_a·L_a/L_b = c_a·log_b a.
  • And so Q = (c_a·log_b a)·L_b.
  • Example. How many nats are there in a byte?
  • The quantity to convert is 8 bits, Q = 1 B = 8 b
    = 8·L_2.
  • The answer should be in units of nats, n ≡ L_e.
  • Thus, Q = (8·log_e 2)·L_e = (8·0.693) n ≈ 5.55 nats.
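
The conversion rule Q = (c_a·log_b a)·L_b is easy to
mechanize; the helper below is a sketch (the function name
is ours):

    import math

    def convert(c_a, base_a, base_b):
        """Re-express c_a units of L_a = Log(base_a) as a multiple of L_b = Log(base_b)."""
        return c_a * math.log(base_a, base_b)

    # How many nats in a byte?  1 B = 8 b = 8*Log 2, target unit n = Log e.
    print(convert(8, 2, math.e))    # ~5.545 nats (the slide rounds to 5.55)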

23
Capacity of a Holder of Information
  • After all of that, we can now make the following
    definition
  • Definition. The (information) capacity C of (or
    the amount of information I contained in) a
    holder of information H that has N possible forms
    is defined to be the indefinite logarithm of N,
    that is, C_H ≡ Log N.
  • Like any indefinite logarithm quantity, C_H can be
    expressed in any of the units previously
    discussed: bits, nats, etc.
  • Example. A memory chip manufacturer develops a
    memory technology in which each memory cell
    (capacitor) can reliably hold any of 16
    distinguishable logic voltage levels. (For
    example, 0 V, 0.1 V, 0.2 V, ..., 1.5 V.) Therefore,
    what is the information capacity of each cell
    (that is, of its logical macrostate, as defined
    by this set of levels), when expressed both in
    bits, and in k_B units (nats)?
  • Answer. N = 16, so C_cell = Log 16
  • Log 16 = (log_2 16)·Log 2 = 4·Log 2 = 4 bits
  • Log 16 = (log_e 16)·Log e ≈ 2.77·Log e ≈ 2.8 k_B

24
Subsection II.A.2.c Information and Entropy
  • Complexity of a Form
  • Optimal Encoding of a Form
  • Entropy of a Probability Distribution
  • Marginal & Conditional Entropy
  • Mutual Information

25
Quantifying The Size of a Form
  • In the previous subsection, we managed to
    quantify the size or capacity of a holder of
    information, as the indefinite logarithm of the
    number of different forms that it could have.
  • But, that doesn't tell us how we might quantify
    the size or complexity of a specific form, by
    itself.
  • What about saying "The size of a given form is
    the capacity of the smallest holder that could
    have that form"?
  • The problem with that definition is:
  • We can always imagine a holder constrained to
    only have that form and no other.
  • Its capacity would be Log 1 = 0.
  • So, with this definition the size of all forms
    would be 0.
  • Another idea: Let's measure the size of a form
    in terms of the number of small, unit-size pieces
    of information that it takes to describe it.
  • To do this, we need some language that we can use
    to represent forms, e.g., an encoding of forms in
    terms of sequences of symbols.
  • Given a language, we can then say that the size
    or informational complexity K of a given form is
    the length of the shortest symbol string that
    represents it in our chosen language.
  • Its maximally compressed description.
  • At first, this definition seems pretty ambiguous,
    but

26
Why Complexity is Meaningful
  • In their algorithmic information theory,
    Kolmogorov and Chaitin showed that informational
    complexity is almost language-independent, up to
    a fixed (language-dependent) additive constant.
  • In the case of universal (Turing-complete)
    languages.
  • Also, whenever we have a probability distribution
    over a set of possible forms, Shannon showed us
    how to choose an encoding of the forms that
    minimizes the expected size of the codeword
    instance that is needed.
  • This choice of encoding then minimizes the
    expected complexity of the forms under the given
    distribution.
  • If such a probability distribution is available,
    we can then assume that the language has been
    chosen appropriately so as to minimize the
    expected length of the form's shortest
    description.
  • We define this minimized expected complexity to
    be the entropy of the system under the given
    distribution.

27
Definition of Entropy
  • The definition of entropy which we just stated is
    very important, and well worth repeating.
  • Definition. Entropy. Given a probability
    distribution over a set of forms, the entropy of
    that distribution is the expected form
    complexity, according to whatever encoding of
    forms yields the smallest expected complexity.
  • The definition can be applied to also define the
    entropy of information holders, wraiths, etc.
    that have given probability distributions over
    their possible forms.

28
The Optimal Encoding
  • Suppose a specific form F has probability p.
  • Thus, it has improbability i ≡ 1/p.
  • Note that this is the same probability that F
    would have if it were one of i equally-likely
    forms.
  • We saw earlier that a holder of information
    having i possible forms is characterized as
    containing a quantity of information Log i.
  • So, it seems reasonable to declare that the
    complexity K of the form F itself is, in fact,
    K(F) = Log i = Log(1/p) = −Log p.
  • This suggests that in the optimal encoding
    language, the description of the form F could be
    held in a holder of that capacity.
  • In his "Mathematical Theory of Communication"
    (1949), Claude Shannon showed that in fact this is
    exactly correct,
  • In an asymptotic limit where we permit ourselves
    to consider encodings in which many similar
    systems (whose forms are chosen from the same
    distribution) are described together.
  • Modern block-coding schemes (turbo codes, etc.)
    in fact closely approach Shannon's ideal encoding
    efficiency.

29
Optimal Encoding Example
  • Suppose a system has four forms A, B, C, D with
    the following probabilities:
  • p(A) = ½, p(B) = ¼, p(C) = p(D) = 1/8.
  • Note that the probabilities sum to 1, as they
    must.
  • Then the corresponding improbabilities are:
  • i(A) = 2, i(B) = 4, i(C) = i(D) = 8.
  • And the form sizes (log-improbabilities) are:
  • K(A) = Log 2 = 1 bit, K(B) = Log 4 = 2·Log 2 =
    2 bits, K(C) = K(D) = Log 8 = 3·Log 2 = 3
    bits.
  • Indeed, in this example, we can encode the forms
    using bit-strings of exactly these lengths, as
    follows:
  • A = 0, B = 10, C = 110, D = 111.
  • Note that this code is self-delimiting:
  • the symbols can be concatenated together without
    ambiguity (see the decoding sketch below).

[Diagram: binary code tree; branch 0 → A; 1,0 → B;
1,1,0 → C; 1,1,1 → D.]
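The self-delimiting property is easy to demonstrate
mechanically; the sketch below (our own, using the codeword
table from the slide) decodes a concatenated bit stream
greedily, which works precisely because no codeword is a
prefix of another:

    CODE = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
    DECODE = {v: k for k, v in CODE.items()}

    def encode(symbols):
        return ''.join(CODE[s] for s in symbols)

    def decode(bits):
        out, buf = [], ''
        for bit in bits:
            buf += bit
            if buf in DECODE:          # a complete codeword has been read
                out.append(DECODE[buf])
                buf = ''
        return ''.join(out)

    msg = 'ABACDDBA'
    print(encode(msg))                  # 0100110111111100
    assert decode(encode(msg)) == msg   # round-trips without any separators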
30
The Entropy Formula
  • Naturally, if we have a probability distribution
    over the possible forms F of some system (holder
    of information),
  • We can easily calculate the expected complexity K
    of the system's form, which is the entropy S of
    the system.
  • This is possible since K itself is a random
    variable,
  • a function of the event that the system has a
    specific form F.
  • The entropy S of the system is then
    S = ⟨K⟩ = Σ_F p(F)·Log(1/p(F)).
  • We can also view this formula as a simple
    additive sum of the entropy contributions s = pK =
    p·Log p⁻¹ arising from the individual forms.
  • The largest single contribution to entropy comes
    from individual forms that have probability p =
    1/e, in which case s = (Log e)/e ≈ 0.531 bits.
  • The entropy formula is often credited to Shannon,
    but it was already known & was being used by
    Boltzmann in the 1800s.
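
A direct transcription of the formula into code, measuring
in bits (a sketch; the function name is ours):

    import math

    def entropy_bits(probs):
        """S = sum over forms of p * log2(1/p), i.e. the expected complexity in bits."""
        return sum(p * math.log2(1/p) for p in probs if p > 0)

    # The four-form distribution from the optimal-encoding example:
    print(entropy_bits([1/2, 1/4, 1/8, 1/8]))   # 1.75 bits

    # A single form's contribution s = p*log2(1/p) peaks at p = 1/e:
    p = 1/math.e
    print(p * math.log2(1/p))                   # ~0.531 bits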

31
Visualizing the Contributions toEntropy in a
Probability Distribution
[Plot: the contribution s = p·Log(1/p) to the entropy, as a
function of the form's probability p.]
32
Known vs. Unknown Information
  • We can consider the informational capacity I ≡
    Log N of a holder that is defined as having N
    possible forms as telling us the total amount of
    information that the holder contains.
  • Meanwhile, we can consider its entropy S =
    ⟨Log i(f)⟩ as telling us how much of the total
    information that it contains is unknown to us.
  • How much unknown information the holder contains,
    in the perspective specified by the distribution
    p().
  • Since S ≤ I, we can also define the amount of
    known information (hereby dubbed "extropy")
    contained in the holder as X ≡ I − S.
  • Note that our probability distribution p() over
    the holder's form could change (if we gain or
    lose knowledge about it),
  • Thus, the holder's entropy S and extropy X may
    also change.
  • However, note that the total informational size
    of a given holder, I = Log N = X + S, always
    still remains a constant.
  • Entropy and extropy can be viewed as two forms of
    information, which can be converted to each
    other, but whose total amount is conserved.

33
Information/Entropy Example
  • Consider a tetrahedral die which may lie on any
    of its 4 faces labeled 1, 2, 3, 4.
  • We say that the answer to the question "Which
    side is down?" is a holder of information having
    4 possible forms.
  • Thus, the total amount of information contained
    in this holder, and in the orientation of the
    physical die itself, is Log 4 = 2 bits.
  • Now, suppose the die is weighted so that p(1) = ½,
    p(2) = ¼, and p(3) = p(4) = 1/8 for its post-throw
    state.
  • Then K(1) = 1 b, K(2) = 2 b, and K(3) = K(4) = 3 b.
  • The entropy of the holder is then S = 1.75 bits.
  • This much information remains unknown to us
    before we have taken a look at the thrown
    die.
  • The extropy (known information) is already X =
    0.25 bits.
  • Exactly one-fourth of a bit's worth of knowledge
    about the outcome is already expressed by this
    specific probability distribution p().
  • This much information about the die's state is
    already known to us even before we have looked at
    it.
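
The die's numbers can be reproduced in a few lines (a sketch
of ours, reusing the same entropy formula as above):

    import math

    probs = [1/2, 1/4, 1/8, 1/8]                # weighted tetrahedral die
    I = math.log2(len(probs))                   # total information: Log 4 = 2 bits
    S = sum(p * math.log2(1/p) for p in probs)  # entropy: 1.75 bits (unknown info)
    X = I - S                                   # extropy: 0.25 bits (known info)
    print(I, S, X)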

34
Holder = Variable, Form = Value, and Types of Events
  • A holder corresponds to a variable V.
  • Also associated with a set of possible values
    v1, v2, ....
  • Meanwhile, a form corresponds to a value v of
    that variable.
  • A primitive event is a proposition that assigns a
    specific form v to a specific holder, V = v.
  • I.e., a specific value to a specific variable.
  • A compound event is a conjunctive proposition
    that assigns forms to multiple holders,
  • E.g., V = v, U = u, W = w.
  • A general event is a disjunctive set of primitive
    and/or compound events.
  • Essentially equivalent to a Boolean combination
    of assignment propositions.

35
Four Concepts to Distinguish
  • A set corresponds to (is a)
  • System, state space, sample space, situation
    space, outcome space.
  • A partitioning of the set is a
  • Subsystem, state variable, mutex/ex set of
    events
  • A section of the partitioning, or a subset of the
    set, is a
  • Subsystem state, macrostate, value of variable,
    event, abstract proposition
  • An individual element is
  • System configuration, microstate, primitive
    event, complete outcome.

36
Entropy of a Binary Variable
Below, little s of an individual form or
probability denotes the contribution to the total
entropy of a form with that probability.
[Plot: s(p) = p·Log(1/p) for a binary variable.
Maximum s(p) = (1/e) nat = (lg e)/e bits ≈ 0.531 bits,
at p = 1/e ≈ 0.368.]
37
Proof that a form with improbability e contributes
the most to the entropy
  • Let's find the slope of the s curve:
  • Take the derivative, using standard calculus:
    ds/dp = d/dp (p·Log p⁻¹) = Log p⁻¹ + p·(d/dp Log p⁻¹).
  • But now, what's the derivative of an indefinite
    logarithm quantity like Log p⁻¹?
  • Let's rewrite Log p⁻¹ as k·ln p⁻¹ (where the
    constant k ≡ L_e ≡ Log e is the indefinite log of
    e), so then d/dp (Log p⁻¹) = −k/p.
  • Plugging this in to the earlier equation, we get
    ds/dp = Log p⁻¹ − k = Log p⁻¹ − Log e.
  • Now just set this to 0 and solve for p:
    Log p⁻¹ = Log e, so p⁻¹ = e, i.e. p = 1/e.

38
Joint Distributions over Two Holders
  • Let X, Y be two holders, each with many forms:
    x1, x2, ... and y1, y2, ....
  • Let xy represent the compound event X = x, Y = y.
  • Note the set of all xy's is a mutually exclusive
    and exhaustive set.
  • Suppose we have available a joint probability
    distribution p(xy) over the compound holder XY.
  • This then implies the reduced or marginal
    distributions p(x) = Σ_y p(xy) and p(y) = Σ_x p(xy).
  • We also thus have conditional probabilities
    p(x|y) and p(y|x), according to the usual
    definitions.
  • And we have mutual probability ratios r(xy) ≡
    p(xy)/(p(x)·p(y)).

39
Joint, Marginal, & Conditional Entropy, and Mutual
Information
  • The joint entropy S(XY) ≡ ⟨Log i(xy)⟩.
  • The (prior, marginal, or reduced) entropy S(X) ≡
    S(p(x)) = ⟨Log i(x)⟩. Likewise for S(Y).
  • The entropy of each subsystem, taken by itself.
  • Entropy is subadditive: S(XY) ≤ S(X) + S(Y).
  • The conditional entropy S(X|Y) ≡ E_y[S(p(x|y))].
  • The expected entropy after Y is observed.
  • Theorem: S(X|Y) = S(XY) − S(Y). Joint entropy
    minus that of Y.
  • The mutual information I(X;Y) ≡ ⟨Log r(xy)⟩.
  • We will also prove: Theorem: I(X;Y) = S(X) −
    S(X|Y).
  • Thus the mutual information is the expected
    reduction of entropy in either subsystem as a
    result of observing the other.

40
Conditional Entropy Theorem
The conditional entropy of X given Y is the joint
entropy of XY minus the entropy of Y.
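A compact derivation of this theorem, in the notation used
above (this particular write-up is ours), is:

\begin{aligned}
S(X|Y) &= \mathrm{E}_y\!\big[S(p(x|y))\big]
        = \sum_{x,y} p(xy)\,\mathrm{Log}\,\frac{1}{p(x|y)} \\
       &= \sum_{x,y} p(xy)\left[\mathrm{Log}\,\frac{1}{p(xy)}
          - \mathrm{Log}\,\frac{1}{p(y)}\right]
        = S(XY) - S(Y).
\end{aligned}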
41
Mutual Information is Mutual Reduction in Entropy
And likewise, we also have I(X;Y) = S(Y) −
S(Y|X), since the definition of I is symmetric.
Also, I(X;Y) = S(X) + S(Y) − S(XY).
42
Visualization of Mutual Information
  • Let the total length of the bar below represent
    the total amount of entropy in the system XY.

[Bar diagram: the full bar represents S(XY), the joint
entropy of X and Y. It splits as S(X) followed by S(Y|X)
(the conditional entropy of Y given X), and equally as
S(X|Y) (the conditional entropy of X given Y) followed by
S(Y); the overlap of S(X) and S(Y) is the mutual
information.]
43
Example 1
  • Suppose the sample space of primitive events
    consists of 5-bit strings B = b1b2b3b4b5.
  • Chosen at random with equal probability (1/32).
  • Let variable X = b1b2b3b4, and Y = b3b4b5.
  • Then S(X) = 4 bits, and S(Y) = 3 b.
  • Meanwhile S(XY) = 5 b.
  • Thus S(X|Y) = 2 b, and S(Y|X) = 1 b.
  • And so I(X;Y) = 2 b.
44
Example 2
  • Let the sample space A consist of the 8 letters
    {a, b, c, d, e, f, g, h}. (All equally likely.)
  • Let X partition A into x1 = {a,b,c,d} and
    x2 = {e,f,g,h}.
  • Y partitions A into y1 = {a,b,e}, y2 = {c,f},
    y3 = {d,g,h}.
  • Then we have:
  • S(X) = 1 bit.
  • S(Y) = 2·(3/8 · log 8/3) + (1/4 · log 4) ≈ 1.561278
    bits
  • S(Y|X) = (1/2 · log 2) + 2·(1/4 · log 4) = 1.5 bits.
  • I(X;Y) = 1.561278 b − 1.5 b ≈ 0.061278 b.
  • S(XY) = 1 b + 1.5 b = 2.5 b.
  • S(X|Y) = 1 b − 0.061278 b ≈ 0.938722 b.

[Diagram: the 8 letters in a 2×4 grid; the top row a,b,c,d
is x1 and the bottom row e,f,g,h is x2, with the Y partition
{a,b,e}, {c,f}, {d,g,h} overlaid.]
(Meanwhile, the total information content of the
sample space is Log 8 = 3 bits.)
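
The figures in Example 2 can be verified mechanically; the
sketch below (our own helper code, not part of the course
materials) rebuilds the two partitions over the eight
equiprobable letters and evaluates the entropies:

    import math
    from collections import Counter

    letters = list('abcdefgh')                                  # 8 equally likely outcomes
    X = {c: 'x1' if c in 'abcd' else 'x2' for c in letters}
    Y = {c: 'y1' if c in 'abe' else ('y2' if c in 'cf' else 'y3') for c in letters}

    def entropy(labels):
        """Entropy, in bits, of a labeling of the equiprobable sample space."""
        n = len(labels)
        return sum((k/n) * math.log2(n/k) for k in Counter(labels).values())

    S_X  = entropy([X[c] for c in letters])              # 1.0 bit
    S_Y  = entropy([Y[c] for c in letters])              # ~1.561278 bits
    S_XY = entropy([(X[c], Y[c]) for c in letters])      # 2.5 bits
    I_XY = S_X + S_Y - S_XY                              # ~0.061278 bits
    print(S_X, S_Y, S_XY, I_XY, S_XY - S_Y, S_XY - S_X)  # last two: S(X|Y), S(Y|X)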
45
Effective Entropy?
  • In many situations, using the ideal Shannon
    compression may not be feasible in practice.
  • E.g., too few instances, block coding not
    available, no source model
  • Or, a short algorithmic description of a given
    form might exist, but it might be infeasible to
    compute it
  • However, given the following:
  • A holder with an associated set of forms
  • A probability distribution over the forms
  • A particular encoding strategy
  • E.g., an effective (short run-time) compression
    algorithm
  • we can define the effective entropy of the holder
    in this situation to be the expected compressed
    size of its encoded form, as compressed by the
    available algorithm.
  • This then is the definition of what the entropy
    can be considered to be for practical purposes
    given the capabilities in that situation.

46
Subsection II.A.2.d Communication Channels
  • Shannon's Paradigm
  • Channel Capacity
  • Shannon's Theorems

47
Communication Theory
  • Shannon's paper "A Mathematical Theory of
    Communication" (1948) is the seminal work that
    established the field of Communication Theory,
  • a.k.a. Information Theory.
  • It deals with the theory of noiseless and noisy
    communication channels for transmitting messages
    consisting of sequences of symbols chosen from a
    probability distribution,
  • Where the channel can be any medium or process
    for communicating information through space
    and/or time.
  • Shannon proves (among other things) that every
    channel has a certain capacity for transmitting
    information, and that this capacity is related to
    the entropy of the source and channel probability
    distributions.
  • At rates less than the channel's capacity, coding
    schemes exist that can transmit information with
    an arbitrarily small probability of error.

48
Shannons Paradigm
  • A communication system is any system intended for
    communicating messages (nuggets of information)
  • Selected from among some set of possible
    messages.
  • Often, the set of possible messages must be
    astronomically large.
  • In general, such a system will include the
    following six basic components:
  • (1) Information Source   (4) Noise Source
  • (2) Transmitter          (5) Receiver
  • (3) Channel              (6) Destination

[Block diagram: Information Source → (Message) →
Transmitter → (Signal) → Channel → (Received Signal) →
Receiver → (Message) → Destination, with the Noise Source
injecting Noise into the Channel.]
49
Discrete Noiseless Channels
  • A channel is simply any medium for the
    communication of signals, which carry messages,
  • Meaningful instances of information.
  • A discrete channel supports the communication of
    discrete signals consisting of sequences (or
    other kinds of patterns) made up of discrete
    (distinguishable) symbols.
  • There may be constraints on what sequences are
    allowed.
  • If the channel is noiseless, we can assume that
    the signals are communicated exactly from
    transmitter to receiver.
  • Noisy channels will be addressed in a later part
    of the theory.
  • The information transmission capacity C of a
    discrete noiseless channel can be defined as
    C ≡ lim_{t→∞} Log N(t) / t,
  • where t is the duration of the signal (in time)
    and N(t) is the number of mutually
    distinguishable signals of duration t.
  • This is just the asymptotic information capacity
    of the channel (considered as a container of
    information) per unit time.

50
Ergodic Information Sources
  • In general, we can consider the information
    source to be producing a stream of information of
    unbounded length.
  • Even if the individual messages are short, we can
    always consider situations where there are
    unbounded sequences of such messages.
  • For the theory to apply, we must consider the
    source to be produced by an ergodic process.
  • This is a process for which all streams look
    statistically similar in the long run
  • In the limit of sufficiently long streams
  • A discrete ergodic process can be modeled by a
    Hidden Markov Model (HMM) with a unique
    stationary distribution.
  • An HMM is essentially just a Finite State Machine
    with nondeterministic transitions between states,
    and no input
  • But with output, which may be nondeterministic
    also
  • A stationary distribution is just a probability
    distribution over states that is an eigenvector
    (with eigenvalue 1) of the HMM's transition
    probability matrix.

51
Example Ergodic Source
  • This Markov model qualifies as an ergodic source:
  • The equilibrium distribution for this particular
    Markov process is pA = 1/6, pB = 5/6.
  • A typical long output string will be 1/6 A's, 5/6
    B's.
  • Also, a B is more likely immediately following
    another B.
  • Since the machine has memory (more than 1 state),
    note that successive symbols are in general not
    independent of each other!

[State diagram and transition matrix for the two-symbol
source; reading off the labels: P(A|A) = 0.5, P(B|A) = 0.5,
P(A|B) = 0.1, P(B|B) = 0.9.]
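
With the transition probabilities read off the diagram (as
reconstructed above), a short simulation confirms the stated
equilibrium; this is a sketch of ours, not part of the
slides:

    import random

    TRANS = {'A': {'A': 0.5, 'B': 0.5},      # P(next | current), per the diagram above
             'B': {'A': 0.1, 'B': 0.9}}

    def sample_stream(n, start='B', seed=0):
        random.seed(seed)
        state, out = start, []
        for _ in range(n):
            state = 'A' if random.random() < TRANS[state]['A'] else 'B'
            out.append(state)
        return out

    stream = sample_stream(100_000)
    print(stream.count('A') / len(stream))   # ~0.167, i.e. about pA = 1/6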
52
Noiseless Coding Theorem
  • For a given discrete ergodic process, let pi be
    the distribution of symbol probabilities, in the
    stationary (equilibrium) distribution. [this is
    not quite right]
  • And let H be the entropy of this distribution.
  • With a probability that approaches 1 as n → ∞, a
    typical sequence of n symbols produced by the
    process will have improbability i ≈ Exp nH,
  • And therefore, we can represent this block of
    symbols by a codeword with size S = Log i ≈
    nH.
  • As n gets large, the relative inefficiency that
    results from using codewords that are integer
    multiples of some fixed unit (e.g. a bit) in size
    approaches 0.

53
Formal Version of Theorem
  • Theorem. Shannon's Noiseless Coding Theorem. For
    any ε, δ > 0, there is an n₀ such that for any
    length n > n₀, the sequences of length n all fall
    into two sets:
  • Atypical sequences, whose total probability is
    less than ε.
  • Typical sequences, each of whose probability
    satisfies |K/n − H| < δ,
  • where K = log i = log p⁻¹ is the particular
    sequence's complexity.
  • Due to the theorem, codewords of size S ≈ nH will
    be within nδ of the optimal code length (the
    sequence complexity) K, for all of the typical
    sequences.
  • The atypical sequences will require longer
    codewords (or cause errors), but they can be made
    as rare as desired.
  • The efficiency of such a code approaches optimal.

54
Existence of Near-Optimal Code
  • Let N_n(ε) be the number of sequences of length n,
    except for as many of the least-probable
    sequences as we can gather but still have total
    probability ≤ ε.
  • Theorem: For all values of ε in the range 0 < ε < 1,
    lim_{n→∞} Log N_n(ε) / n = H.
  • Implication: For large values of n, the number of
    "likely" sequences N_n(ε) roughly matches the
    number Exp nH of different codewords of size nH.
  • Thus, there are not too many such sequences for
    us to be able to code for each one with a unique
    codeword.

55
Noiseless Coding Example
  • In the previous example, the entropy turns out to
    be H = 0.650022422 bits.
  • Thus, for example, an average sequence of, say, n =
    20 symbols will be encoded with 20H ≈
    13.00044843 bits (almost exactly 13 bits) in an
    ideal code.
  • Here is a fairly typical sequence of 20 symbols
    (here there are 3/20 ≈ 1/6 A's):
  • BBBBBABBBABBBBBBABBB
  • Probability ≈ 2.6478×10⁻⁵; Improbability (1 in)
    37,767; Complexity ≈ 15.205 bits
  • Here is an atypical sequence of 20 symbols
    (17/20 ≫ 1/6 A's):
  • AAAABAAAAAAABAAAAAAB
  • Probability ≈ 1.2716×10⁻⁸; Improbability (1 in)
    78,643,200; Complexity ≈ 26.229 bits
  • So, e.g., one fairly efficient encoding for this
    particular ergodic source would be:
  • Chop the data stream into 20-symbol blocks.
  • Sort all 2²⁰ of the possible 20-symbol sequences
    by order of their probability.
  • Assign the 2¹³ = 8,192 most-likely sequences (the
    typical sequences) unique binary codewords
    consisting of a 1, followed by a 13-bit sequence
    number.
  • Assign the other 2²⁰ − 2¹³ = 1,040,384 sequences
    (the atypical sequences) longer codewords
    consisting of, say, a 0 followed by all 20 bits
    of the full symbol string.
  • Shannon's theorem tells us that all of the
    atypical sequences put together have some
    relatively small total probability ε.
  • And if ε is not small enough to be ignored, we
    can simply choose a larger block size n.
  • Assuming ε ≪ 1, the expected encoding size of each
    block will be ≈ 14 bits.
  • Note this is only one bit larger than the ideal
    size of ≈ 13 bits.
  • The extra bit needed to mark the sequence as
    typical becomes negligible for large n.
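
The per-sequence numbers quoted above can be reproduced
directly. The sketch below is ours; it uses the
reconstructed transition probabilities, and (matching the
slide's H, with the caveat noted on the earlier slide) the
entropy of the stationary symbol distribution rather than
the chain's true entropy rate:

    import math

    P0    = {'A': 1/6, 'B': 5/6}                        # stationary symbol distribution
    TRANS = {'A': {'A': 0.5, 'B': 0.5},                 # P(next | current)
             'B': {'A': 0.1, 'B': 0.9}}

    def complexity_bits(seq):
        """K = log2(1/p) of a sequence under the Markov source model."""
        logp = math.log2(P0[seq[0]])
        for prev, cur in zip(seq, seq[1:]):
            logp += math.log2(TRANS[prev][cur])
        return -logp

    H = sum(p * math.log2(1/p) for p in P0.values())
    print(H, 20 * H)                                    # ~0.650022 bits, ~13.0004 bits/block
    print(complexity_bits('BBBBBABBBABBBBBBABBB'))      # ~15.205 bits (typical sequence)
    print(complexity_bits('AAAABAAAAAAABAAAAAAB'))      # ~26.229 bits (atypical sequence)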

56
Actual Data Relating to Example
57
Roulette Wheel Coding
  • An algorithm for coding continuous data streams
    from known ergodic information sources.
  • Doesn't require data blocking!
  • We imagine that a single spin of a very large
    roulette wheel generates the entire infinite
    input data sequence:
  • As the wheel slowly comes to a halt, infinitely
    many symbols are generated.
  • Each possible initial subsequence maps to a
    region of the wheel orientations having an
    angular size that is proportional to the
    subsequence's probability.
  • Simultaneously, the sequence's encoding is
    generated in the same way,
  • But with a different symbol distribution that
    lends itself toward efficient transmission (e.g.,
    equiprobable 0s and 1s).

58
Roulette Wheel Example
  • Here is how the roulette wheel is marked for the
    particular ergodic process described earlier:
  • This is the mapping from input code sequences to
    wheel positions.
  • The wheel region for each sequence is
    proportional to its probability.

[Diagram: the wheel's 0° to 360° circumference is subdivided
recursively. The 1st symbol splits it into A (1/6) and B
(5/6); within each region the 2nd symbol splits again, e.g.
the B region into A (5/6 × 1/10) and B (5/6 × 9/10); and so
on for the 3rd symbol, etc.]
59
Roulette Wheel Output Code
  • The output codespace divides up the wheel
    similarly to (but more regularly than) the input
    one:
  • Note: Output sequence lengths are exactly
    their Log i values!
  • This is why the code is so efficient.

[Diagram: the output code divides the 0° to 360° wheel into
successive binary halves, quarters, eighths, etc., each
labeled 0 or 1 at every level.]
60
Roulette Wheel Algorithm
  • Main variable in the algorithm:
  • A current range R of possible roulette wheel
    positions, initially [0, 1].
  • Representing wheel pointer angles in the range
    from 0° to 360°.
  • Places on the wheel where the pointer might end
    up pointing (or ball bouncing).
  • Encoding algorithm (see the sketch below):
  • 1. When the first input symbol is received,
    initialize R to a sub-region of [0, 1]
    corresponding to the symbol, of size equal to its
    probability.
  • E.g., for our example, initially A → [0, 1/6], B
    → [1/6, 1].
  • 2. Whenever R becomes a sub-range of [0, ½] or
    [½, 1],
  • output 0 or 1 respectively,
  • And re-map R from its respective size-1/2
    sub-range back up into the whole [0, 1] region.
    (I.e., translate it appropriately and double it
    in size.)
  • 3. When the next symbol is received, narrow R
    down to a corresponding subrange of size
    proportional to the probability of that
    transition.
  • E.g., if B was just received, then an A takes the
    present range down to its first 1/10, and a B
    takes it to its last 9/10.
  • 4. Repeat steps 2 and 3 indefinitely (or include
    a special stop symbol).
  • The decoding algorithm is very similar.
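
A minimal sketch of the encoding loop for this particular
two-symbol source, in floating point for brevity (names and
details are ours; as the Cons on the next slide note, a real
implementation needs unbounded-precision arithmetic, and
this toy version does not flush bits left pending in the
final range):

    # Arithmetic-coding-style encoder for the example A/B Markov source.
    TRANS = {None: {'A': 1/6, 'B': 5/6},     # first-symbol probabilities
             'A':  {'A': 0.5, 'B': 0.5},     # P(next | current)
             'B':  {'A': 0.1, 'B': 0.9}}

    def encode(symbols):
        lo, hi, prev, out = 0.0, 1.0, None, []
        for s in symbols:
            # Steps 1/3: narrow the range R = [lo, hi) in proportion to the symbol's probability.
            split = lo + (hi - lo) * TRANS[prev]['A']
            lo, hi = (lo, split) if s == 'A' else (split, hi)
            prev = s
            # Step 2: while R fits entirely in [0, 1/2) or [1/2, 1), emit a bit and rescale.
            while hi <= 0.5 or lo >= 0.5:
                if hi <= 0.5:
                    out.append('0'); lo, hi = 2 * lo, 2 * hi
                else:
                    out.append('1'); lo, hi = 2 * lo - 1, 2 * hi - 1
        return ''.join(out)

    bits = encode('BBBBBABBBABBBBBBABBB')
    print(len(bits), bits)   # on the order of this block's ~15-bit complexity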

61
Pros and Cons of the Algorithm
  • Pros:
  • Handles continuous data streams
  • Doesn't require knowing length of stream in
    advance, or blocking
  • It's an on-line algorithm
  • It doesn't require a preprocessing phase to
    construct the code
  • It only requires tracking the range & the last
    symbol seen
  • Extremely simple to implement; can do it in HW,
    low latency
  • Yields an asymptotically optimal code (prove
    this!)
  • In the case of sources modeled by (non-hidden)
    Markov chains
  • Cons:
  • Not adaptive: requires knowing the source model
  • Although this could be fixed in a more
    sophisticated version
  • It's only optimal for non-hidden Markov models
  • To do more optimally for cases where a full HMM
    is needed, the algorithm would need to remember
    more past symbols
  • Requires unbounded precision in arithmetic!
  • Otherwise, rounding errors accumulate and are
    rapidly magnified

62
Additional Communication Theory Topics to Include
Eventually
  • I don't yet have slides dealing with:
  • Noisy channel coding theorems
  • Error correcting codes
  • Continuous channels
  • Quantum channels
  • To save time, we won't cover these topics right
    now,
  • Though we will hopefully get to some basic
    elements of quantum communication theory later on
    in the course.