Probability and Statistics - PowerPoint PPT Presentation

Provided by: ramazan3

Transcript and Presenter's Notes

1
  • Probability and Statistics
  • Lecture notes 03

2
Lesson Overview
  • Types of Data
  • Qualitative (Categorical)
  • Quantitative (Numerical)
  • Discrete vs. Continuous
  • Levels of Measurement
  • Nominal, Ordinal, Interval, Ratio
  • Data Summary and Presentation
  • The Stem-and-leaf Diagram
  • The Frequency Distribution Tables
  • Histogram
  • The Box Plot
  • Time Sequence Plots

3
Types of Data
  • Data can be classified as either numeric or
    nonnumeric.
  • Specific terms are used as follows:
  • Qualitative data are nonnumeric.
  • Poor, Fair, Good, Better, Best; colors
    (ignoring any physical causes); and types of
    material (straw, sticks, bricks) are examples of
    qualitative data.
  • Qualitative data are often termed categorical
    data.
  • Some books use the terms individual and variable
    to reference the objects and characteristics
    described by a set of data.
  • They also stress the importance of exact
    definitions of these variables, including what
    units they are recorded in.
  • The reason the data were collected is also
    important.

4
Types of Data
  • Quantitative data are numeric.
  • Quantitative data are further classified as
    either discrete or continuous.
  • Discrete data are numeric data that have a finite
    number of possible values. A classic example of
    discrete data is a finite subset of the counting
    numbers, 1, 2, 3, 4, 5, perhaps corresponding to
    Strongly Disagree, ..., Strongly Agree.
  • When data represent counts, they are discrete. An
    example might be how many students were absent on
    a given day. Counts are usually considered exact
    and integer. Consider, however, if three tardies
    count as one absence, then aren't two tardies
    equal to 0.67 absences?

5
Quantitative data / Types of Data
  • Continuous data have infinitely many possible
    values: 1.4, 1.41, 1.414, 1.4142, 1.41421, ...
    The real numbers are continuous with no gaps or
    interruptions.
  • Physically measurable quantities such as length,
    volume, time, mass, etc. are generally considered
    continuous. At the physical level
    (microscopically), especially for mass, this may
    not be true, but for normal life situations it is
    a valid assumption.
  • The structure and nature of data will greatly
    affect our choice of analysis method. By
    structure we are referring to the fact that, for
    example, the data might be pairs of measurements.

6
Levels of Measurement
  • The experimental (scientific) method depends on
    physically measuring things.
  • The concept of measurement has been developed in
    conjunction with the concepts of numbers and
    units of measurement.
  • Statisticians categorize measurements according
    to levels.
  • Each level corresponds to how this measurement
    can be treated mathematically.

7
Levels of Measurement (Measurement Scales): Four
common types
  • Nominal: Nominal data have no order and thus only
    give names or labels to various categories.
  • Ordinal: Ordinal data have order, but the
    interval between measurements is not meaningful.
  • Interval: Interval data have meaningful intervals
    between measurements, but there is no true
    starting point (zero).
  • Ratio: Ratio data have the highest level of
    measurement. Ratios between measurements as well
    as intervals are meaningful because there is a
    starting point (zero).
  • (Gender is something you are born with, whereas
    sex is something you should get a license for.)

8
Levels of Measurement (Measurement Scales): Four
common types
  • Nominal scales are for things that are mutually
    exclusive/non-overlapping, but there is no order
    or ranking. For example, professors are divided
    into departments by subject, but no subject is
    ranked as better than another.
  • Ordinal level: categories that can be ordered,
    but not precisely. For example, letter grades
    and movie quality (excellent, good, adequate,
    bad, terrible).
  • Interval level: ranks the data on precise scales,
    but there is no meaningful zero. For example, IQ
    scores and temperature (in Celsius or
    Fahrenheit). Neither has a meaningful zero.
  • Ratio level: data can be ranked and there are
    precise differences between the ranks, as well as
    a meaningful zero. For example, height, weight,
    salary, and age.

9
Types of Data / Levels of Measurement
  • Example 1: Colors. To most people, the colors
    black, brown, red, orange, yellow, green, blue,
    violet, gray, and white are just names of colors.
  • To an electronics student familiar with
    color-coded resistors, this data is in ascending
    order and thus represents at least ordinal data.
  • To a physicist, the colors red, orange, yellow,
    green, blue, and violet correspond to specific
    wavelengths of light and would be an example of
    ratio data.

10
Types of Data / Levels of Measurement
  • Example 2: Temperatures. What level of measurement
    a temperature is depends on which temperature
    scale is used. Specific values:
        Celsius     Fahrenheit    Kelvin      Rankine
           0 °C        32 °F      273.15 K    491.67 °R
         100 °C       212 °F      373.15 K    671.67 °R
        -17.8 °C        0 °F      255.4 K     459.67 °R
  • Only Kelvin and Rankine have true zeros (starting
    points), so only on these scales are ratios
    meaningful. Celsius and Fahrenheit are interval
    data: order is certainly important and intervals
    are meaningful. However, a 180° dashboard is not
    twice as hot as the 90° outside temperature
    (Fahrenheit assumed)!
  • Although ordinal data should not be used for
    calculations, it is not uncommon to find averages
    formed from data collected which represented
    Strongly Disagree, ..., Strongly Agree! Also,
    averages of nominal data (zip codes, social
    security numbers) are rather meaningless!
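The dashboard remark can be checked with a little arithmetic. The sketch below is only an illustration (the function and variable names are made up here, not taken from the lecture): it converts both Fahrenheit readings to absolute scales before forming the ratio.

```python
def fahrenheit_to_rankine(temp_f):
    """Convert Fahrenheit to Rankine (absolute scale, same degree size)."""
    return temp_f + 459.67


def fahrenheit_to_kelvin(temp_f):
    """Convert Fahrenheit to Kelvin."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


dashboard_f = 180.0
outside_f = 90.0

# Ratio of the raw Fahrenheit numbers: misleading, because 0 °F is arbitrary.
naive_ratio = dashboard_f / outside_f

# Ratios on absolute scales agree with each other and are meaningful.
rankine_ratio = fahrenheit_to_rankine(dashboard_f) / fahrenheit_to_rankine(outside_f)
kelvin_ratio = fahrenheit_to_kelvin(dashboard_f) / fahrenheit_to_kelvin(outside_f)

print(f"naive ratio:   {naive_ratio:.3f}")
print(f"Rankine ratio: {rankine_ratio:.3f}")
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")
```

On either absolute scale the dashboard comes out roughly 1.16 times as hot as the outside air, not twice as hot.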

11
Data Sources
  • Published source
  • Designed experiment
  • Survey
  • Observational study

12
DATA SUMMARY
13
DATA SUMMARY AND PRESENTATION
  • The Stem-and-leaf Diagram
  • The Frequency Tables
  • Standard, Relative, and Cumulative
  • Histograms
  • The Box Plot
  • Time Sequence Plots

14
Graphical Displays
  • The distribution of a variable describes what
    values the variable takes and how often each
    value occurs.
  • The frequency of any value of a variable is the
    number of times that value occurs in the data.
  • The relative frequency of any value is the
    proportion (fraction or percent) of all
    observations that have that value.
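As a small illustration of these definitions, the following sketch counts frequencies and relative frequencies with Python's standard `collections.Counter`. The sample of letter grades is hypothetical and not part of the lecture.

```python
from collections import Counter

# Hypothetical sample of letter grades (not from the lecture).
grades = ["B", "A", "C", "B", "B", "D", "A", "C", "B", "A"]

# Frequency: how many times each value occurs.
frequency = Counter(grades)

# Relative frequency: the proportion of all observations with that value.
n = len(grades)
relative_frequency = {value: count / n for value, count in frequency.items()}

for value in sorted(frequency):
    print(f"{value}: frequency={frequency[value]}, "
          f"relative frequency={relative_frequency[value]:.0%}")
```

The relative frequencies always sum to 1 (or 100%), which is what makes them suitable for pie charts.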

15
DATA SUMMARY AND PRESENTATION
  • Frequency Tables Standard, Relative, and
    Cumulative
  • Histograms, Ogive, Pareto Diagrams, Pie Charts
  • Exploratory Data Analysis
  • Stem-and-Leaf Diagram
  • Boxplots

17
Types of Variables
  • Categorical variable: Places an individual into
    one of several categories.
  • Examples: Gender, race, political party, zip code
  • Quantitative variable: Takes numerical values for
    which arithmetic operations make sense.
  • Examples: OYS score, number of votes, cost of
    textbooks

18
Graphs for categorical variables
  • Pie charts require relative frequencies since
    they display percentages and not raw data. The
    relative frequency of each category corresponds
    to the percent of the pie that is occupied by
    that category.
  • Bar graphs display data where the categories are
    on the horizontal axis and the frequencies (or
    relative frequencies) are on the vertical axis.

19
Graphs for quantitative variables
  • Histograms
  • The data are divided into classes of equal width
    and the number (or percentage) of observations in
    each class is counted.
  • Data scale is on the horizontal axis.
  • Frequency (or relative frequency) scale is on the
    vertical axis.
  • Bars are drawn so that the base of each bar
    covers the class and the height of each bar
    represents the frequency (or relative frequency).
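The counting step behind a histogram can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper `histogram_counts`, the exam scores, and the choice of five classes of width 10 are all hypothetical, and the bars are printed sideways as text rather than drawn.

```python
def histogram_counts(data, lower, width, num_classes):
    """Count observations in num_classes equal-width classes starting at lower.

    Class k covers [lower + k*width, lower + (k+1)*width); the last class
    also includes its right endpoint.
    """
    counts = [0] * num_classes
    upper = lower + width * num_classes
    for x in data:
        if x < lower or x > upper:
            raise ValueError(f"{x} lies outside the classes")
        k = min(int((x - lower) // width), num_classes - 1)
        counts[k] += 1
    return counts


# Hypothetical exam scores (not from the lecture).
scores = [52, 67, 71, 58, 88, 93, 75, 64, 79, 81, 70, 100]
counts = histogram_counts(scores, lower=50, width=10, num_classes=5)

# One text "bar" per class; its length is the class frequency.
for k, count in enumerate(counts):
    start = 50 + 10 * k
    print(f"{start:3d}-{start + 10:3d} | {'#' * count}")
```

Dividing each count by the number of observations would give the relative-frequency version of the same histogram.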

20
  • Stemplots or Stem-and-Leaf Displays
  • Separate each observation into a stem (all but
    the final, rightmost digit of the rounded data)
    and a leaf (the final digit of the rounded data).
  • Write the stems in a vertical column, smallest to
    largest from top to bottom.
  • Write each leaf in the row to the right of its
    stem, in increasing order.
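The three steps above can be sketched as follows. The function name `stem_and_leaf` and the data are hypothetical; the sketch assumes non-negative values and uses the last digit of each rounded value as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative numbers.

    The stem is everything except the last digit of the rounded value;
    the leaf is the last digit.
    """
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(int(round(v)), 10)
        leaves[stem].append(leaf)
    # Include empty stems so that gaps in the data stay visible.
    return {s: sorted(leaves.get(s, []))
            for s in range(min(leaves), max(leaves) + 1)}


# Hypothetical data (not from the lecture).
data = [76, 83, 91, 105, 87, 110, 79, 118, 94, 101]
plot = stem_and_leaf(data)

# Stems in a vertical column, smallest at the top; leaves to the right.
for stem, row in plot.items():
    print(f"{stem:3d} | {' '.join(str(leaf) for leaf in row)}")
```

Keeping empty stems is a design choice: it preserves the shape of the distribution, so the display can be read like a histogram turned on its side.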

21
Histograms vs. Stem plots
  • Both are used to describe the distribution of
    data.
  • Stemplots display actual data values.
  • Stemplots are used for small data sets (fewer
    than 100 values).
  • Histograms can be constructed for larger data
    sets.

22
Common Distributional Shapes
  • A symmetric distribution is one where both sides
    about the center line are approximately mirror
    images of each other.
  • A skewed distribution is one where one side of
    the center line contains more data than the
    other.
  • Skewed to the right: The right side of the
    histogram extends much farther than the left
    side.
  • Skewed to the left: The left side of the
    histogram extends much farther than the right
    side.

23
Common Distributional Shapes
  • A bimodal distribution has two humps where much
    of the data lies.  
  • All classes occur with approximately the same
    frequency in a uniform distribution.
  • An outlier in any graph of data is an individual
    observation that falls outside the overall
    pattern of the graph.

24
DATA SUMMARY AND PRESENTATION
  • THE STEM-AND-LEAF DIAGRAM
  • A stem-and-leaf diagram is a good way to obtain
    an informative visual display of a data set
  • x1, x2, ..., xn,
  • where each number xi consists of at least two
    digits.
  • To construct a stem-and-leaf diagram, we divide
    each number xi into two parts
  • a stem, consisting of one or more of the leading
    digits, and
  • a leaf, consisting of the remaining digits.

25
  • Write the stems in a vertical column, smallest to
    largest from top to bottom.
  • Write each leaf in the row to the right of its
    stem, in increasing order.

26
THE STEM-AND-LEAF DIAGRAM
  • EXAMPLE
  • Construct a stem-and-leaf display for the
    following data

27
THE STEM-AND-LEAF DIAGRAM
  • SOLUTION
  • We will select as stem values the numbers 7, 8,
    9, 10, 11, ..., 24.
  • The resulting stem-and-leaf diagram is presented
    in the following figure.

28
THE STEM-AND-LEAF DIAGRAM
29
THE STEM-AND-LEAF DIAGRAM
  • Stems are sorted in decreasing order, leaves in
    increasing order.
30
THE STEM-AND-LEAF DIAGRAM
  • Inspection of this display immediately reveals
    that most of the data lie between 110 and 200 and
    that a central value is somewhere between 150 and
    160. Furthermore, the data are distributed
    approximately symmetrically about the central
    value.
  • The stem-and-leaf diagram enables us to determine
    quickly some important features of the data that
    were not immediately obvious in the original
    table.

31
THE FREQUENCY DISTRIBUTION TABLES
  • Frequency Tables
  • Frequency refers to the number of times each
    category occurs in the original data
  • A frequency table lists in one column the data
    categories or classes and in another column the
    corresponding frequencies.
  • A common way to summarize or present data is
    with a standard frequency table.

32
Frequency Tables
  • Often, the category column will have continuous
    data and hence be presented via a range of
    values. In such a case, terms used to identify
    the class limits, class boundaries, class widths,
    and class marks must be well understood.
  • Class limits are the largest or smallest numbers
    which can actually belong to each class. Each
    class has a lower class limit and an upper class
    limit.
  • Class boundaries are the numbers which separate
    classes. They are equally spaced halfway between
    neighboring class limits.

33
Frequency Tables
  • Class marks are the midpoints of the classes. It
    may be necessary to utilize class marks to find
    the mean and standard deviation, etc. of data
    summarized in a frequency table.
  • Class width is the difference between the two
    boundaries of a class (or between corresponding
    class limits of consecutive classes, e.g. two
    successive lower limits).
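To make the terms concrete, the sketch below derives boundaries, marks, and widths from class limits, and then uses the class marks to approximate the mean. The classes and frequencies are hypothetical (integer scores), and the variable names are illustrative only.

```python
# Hypothetical classes for integer scores: (lower limit, upper limit, frequency).
classes = [(50, 59, 2), (60, 69, 5), (70, 79, 8), (80, 89, 5)]

# Gap between one class's upper limit and the next class's lower limit.
gap = classes[1][0] - classes[0][1]

summary = []
for lower, upper, freq in classes:
    # Boundaries sit halfway between neighboring class limits.
    low_boundary = lower - gap / 2
    high_boundary = upper + gap / 2
    summary.append({
        "limits": (lower, upper),
        "boundaries": (low_boundary, high_boundary),
        "mark": (lower + upper) / 2,              # midpoint of the class
        "width": high_boundary - low_boundary,
        "frequency": freq,
    })

# Approximate mean: treat every observation as if it sat at its class mark.
n = sum(row["frequency"] for row in summary)
grouped_mean = sum(row["mark"] * row["frequency"] for row in summary) / n

for row in summary:
    print(row)
print("approximate mean:", grouped_mean)
```

The grouped mean is only an approximation, since the individual values inside each class are no longer known once the data are summarized.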

34
Frequency Tables
  • Following are guidelines for constructing
    frequency tables.
  • The classes must be "mutually exclusive": no
    element can belong to more than one class.
  • Even if the frequency is zero, include each and
    every class.
  • Make all classes the same width. (However, open
    ended classes may be inevitable.)
  • Target between 5 and 20 classes, depending on the
    range and number of data points.
  • Keep the limits as simple and as convenient as
    possible.

35
Frequency Tables
  • Relative frequency tables contain the relative
    frequency instead of the absolute frequency.
    Relative frequencies can be expressed either as
    percentages or as their decimal fraction
    equivalents.
  • Cumulative frequency tables contain frequencies
    which are cumulative for subsequent classes. In a
    cumulative frequency table, the words "less than"
    usually also appear in the left column.
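A short sketch of how the relative and cumulative columns are derived from a standard frequency table is given below. The classes and frequencies are hypothetical; `itertools.accumulate` from the standard library produces the running totals.

```python
from itertools import accumulate

# Hypothetical class frequencies (not from the lecture).
classes = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [2, 5, 8, 4, 1]

total = sum(frequencies)

# Relative frequency: each class's share of all observations.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total up to and including each class.
cumulative = list(accumulate(frequencies))

print(f"{'Class':<10}{'Freq':>6}{'Rel.':>8}{'Cum.':>6}")
for label, f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{label:<10}{f:>6}{r:>8.2f}{c:>6}")
```

Each cumulative entry answers a "less than" question: for example, the entry for the class 70-79 is the number of observations below the upper boundary of that class. The last cumulative entry always equals the total number of observations.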

36
Frequency Tables
  • The frequency distribution
  • A frequency distribution is a more compact
    summary of data than a stem-and-leaf diagram.
  • To construct a frequency distribution, we must
    divide the range of the data into intervals,
    which are usually called class intervals, cells,
    or bins.

37
Frequency Distribution Tables
  • EXAMPLE
  • Construct the frequency distribution table for
    the following data

38
THE FREQUENCY DISTRIBUTION TABLES
  • SOLUTION
  • Table columns: Class, relative frequency,
    cumulative frequency

39
Frequency Distribution Tables
Another example containing student distributions
as follows