ENMA 420/520 Statistical Processes, Spring 2007

- Michael F. Cochrane, Ph.D.
- Dept. of Engineering Management
- Old Dominion University

ENMA 420/520 Syllabus

- Instructor
- Michael F. Cochrane, Ph.D.
- Chief Analyst, US Joint Forces Command, Joint Experimentation Directorate (J9)
- Phones
- 757-203-5131 (Office)
- mcochran@odu.edu
- http://www.odu.edu/engr/mcochrane

ENMA 420/520 Syllabus - Continued

- Office hours (by appointment)
- Textbooks
- Statistics for Engineering and the Sciences
- Mendenhall, William and Sincich, Terry
- Data Analysis with Microsoft Excel
- Berk and Carey

ENMA 420/520 Syllabus - Continued

- Course Objectives
- Target audience
- Never taken statistics or need refresher
- Purpose
- Foundation linkages to engineering applications

ENMA 420/520 Syllabus - Continued

- Spreadsheet software
- Excel
- Theory → application
- Data Analysis add-in to Excel (/Tools/Data Analysis)
- Software as an enabling tool
- Availability of more powerful statistical software
- JMP, SAS, SPSS, @Risk
- Not as prevalent as spreadsheet software
- Targeted to specialized users
- Will demo

ENMA 420/520 Syllabus - Continued

End of course project for 520 students

- Course format
- Class schedule in syllabus
- Two examinations
- Mid-term → take home
- Final → in-class
- Two quizzes → take home
- Homework
- Assigned but not graded

ENMA 420/520 Syllabus - Continued

- ENMA 520
- Mid-term exam 30 points
- Final exam 30 points
- Two quizzes 30 points
- Course project 10 points
- ENMA 420
- Mid-term exam 30 points
- Final exam 30 points
- Two quizzes 40 points

ENMA 420/520 Syllabus - Continued

- Do not copy everything on board!!!
- Use the web (www.odu.edu/engr/mcochrane)

ENMA 420/520 Syllabus - Continued

- The ODU Honor Code

Probability & Statistics: Getting Started

- Flow of course
- Statistics as means of information processing
- Introduction to probability
- Standard probability models
- Statistics as means of inferring population characteristics
- Alternative approach to inferential statistics

Getting Started: Class 1

- Reading assignments
- M & S
- Sections 1.1 - 1.4 (Introduction)
- Sections 2.1 - 2.8 (Descriptive Statistics)
- Recommended Problems
- M & S Chapter 2: 45-48, 53-56

The textbook chapters will always be from the syllabus!

Statistics: Principal Branches

- Descriptive Statistics
- Organizing
- Summarizing
- Describing
- Contrasting
- Develop insights into data sets

- Inferential Statistics
- Inferring
- Estimating
- Modeling
- Develop insights into populations

Data Sets: Populations & Samples

- Population (universe)
- Enumerative data set
- Examples of population data sets?
- Sample
- Subset of data from population
- Examples of sample data sets?

Important to distinguish between a population & a sample

Parameters - characteristics of population
Statistic - characteristic of sample

Are All Data the Same? Measurable Data

- Quantitative data
- Measurable quantity (numerically valued scales)
- Example of quantitative data?
- Interval data
- Distinct units of distance but no zero
- Example?
- Ratio data
- Distinct units of distance and a zero
- Example?

Types of Data: Non-Measurable Data

- Qualitative (categorical) data
- Non-measurable quantity
- Caution: sometimes quantified for convenience
- Example of qualitative data?
- Nominal data
- No meaningful order
- Example?
- Ordinal
- Distinct ranking possible
- Example?

Recurring Key Concept: A Data Detective

- A data detective
- Essential for modeling
- Understand data & underlying processes
- Prerequisite first step
- Population or sample?
- Type of data?
- Data patterns
- Data as clue to process
- Next step - describing data!

Descriptive Statistics

- Digest data
- Arrange or present data
- Develop summary characteristics of data
- Communicating with data

Topics and Concepts

- Describing qualitative data
- Various graphical methods
- Describing quantitative data
- Graphical methods
- Numerical descriptions of data sets
- Important concepts
- Understand strengths & weaknesses of methods
- Suitability of method to specific applications
- Present data to highlight insights

Describing Qualitative Data

- Basic steps
- Define categories
- Assign observations → categories
- Category frequency
- Category relative frequency
- Present graphically
- Key concept
- Minimize category ambiguity
- Observation → 1 & only 1 category
- Examples?
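
The basic steps above can be sketched in a few lines of Python. This is only an illustration: the category names and observations are hypothetical, and `frequency_table` is a helper name introduced here, not something from the course materials.

```python
from collections import Counter

# Hypothetical observations; each one belongs to exactly one category.
observations = ["car", "bus", "car", "bike", "bus", "car", "walk", "car"]

def frequency_table(values):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(values)
    n = len(values)
    return {category: (k, k / n) for category, k in counts.items()}

table = frequency_table(observations)

# Print the categories from most to least frequent.
for category, (freq, rel) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    print(f"{category:6s} {freq:3d} {rel:6.3f}")
```

Because every observation falls in one and only one category, the frequencies sum to n and the relative frequencies sum to 1.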

Categorical Data: Principal Graphing Techniques

- Visualizing categorical data
- Histograms
- Frequency
- Relative frequency
- Cumulative frequency
- Pie charts
- Pictographs
- Use common sense
- What is message you are communicating?

Visualizing Categorical Data: Example Problem

- Problem: Cause of accidents in Florida (1988)

Categorical Data: Example Problem

- Many different ways to present data
- Use good judgment
- Which approach highlights your message?
- What if you had data for 1988 & 1989?
- What if you had data for many years?

Visualizing Data: Graphing Quantitative Data

- Same fundamental purpose
- Communicate information
- Gain insights into data sets, relationships
- Methods to be discussed
- Dot plots
- Stem & leaf diagrams
- Histograms (relative & cumulative frequency)

Visualizing Quantitative Data: Example Data Set

- Took sample of home mortgage rates in a neighborhood

Dot Plot: A Quick & Dirty Method

- Simplest graph
- Suitable for small data sets
- Single axis → scale that spans data range
- Range → minimum to maximum values
- Each observation is dot on axis
- Most statistical software includes
- Excel does not provide

Example Dot Plot

- Constructed using Minitab
- What are strengths & weaknesses?

Dot Plot Summary

- Advantages
- Easy to construct (back of napkin)
- Identify range, possible outliers, distribution of data
- Outlier → highly unusual observation
- Disadvantages
- Limited to small data sets
- May be difficult to reconstruct original data set
- Have to be careful with scale

Dot Plot: Compressed Scaling

- Note contrast with previous dot plot
- Lose observation measurability
- However can now observe data groupings
- Before → observations evenly distributed
- Now → see mound shaped distribution

Stem & Leaf Diagrams: Alternative Graphing Technique

- Steps in constructing
- Divide observation into 2 parts
- Stem & leaf (choose convenient scales)
- 8.2 → stem 8, leaf 2
- List stems in order
- Proceed through all observations
- Arrange leaves in order

Stem & Leaf Diagram: Example Data Set

- Stem-and-leaf of Rate
- Leaf Unit 0.10
- 2 6 08
- 8 7 024579
- (6) 8 001235
- 6 9 025
- 3 10 058

- Stem-and-leaf of Rate
- Leaf Unit 0.10
- 1 6 0
- 2 6 8
- 5 7 024
- 8 7 579
- (5) 8 00123
- 7 8 5
- 6 9 02
- 4 9 5
- 3 10 0
- 2 10 58
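
The first display above (one row per stem) can be reproduced with a short Python sketch. The 20 rates below are read back from the display itself, so they are only as precise as the leaf unit of 0.10; `stem_and_leaf` is a helper name introduced here for illustration, and the depth column that Minitab prints is not computed.

```python
from collections import defaultdict

# Rates recovered from the stem-and-leaf display (leaf unit 0.10).
rates = [6.0, 6.8, 7.0, 7.2, 7.4, 7.5, 7.7, 7.9, 8.0, 8.0,
         8.1, 8.2, 8.3, 8.5, 9.0, 9.2, 9.5, 10.0, 10.5, 10.8]

def stem_and_leaf(values, leaf_unit=0.1):
    """Map each stem to its leaves, written as an ordered string of digits."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / leaf_unit)     # 8.2 -> 82
        stem, leaf = divmod(scaled, 10)   # 82 -> stem 8, leaf 2
        stems[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in stems.items()}

display = stem_and_leaf(rates)
for stem in sorted(display):
    print(f"{stem:3d} | {display[stem]}")
```

Each printed row lists a stem followed by its leaves, so the original observations can be read back directly, which is the advantage the summary slide points out.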

Stem & Leaf Diagrams: Summary

- Advantages
- Visualize data groupings
- Can recreate original data set
- Simple to construct
- Disadvantages
- Limited to small data sets

Histograms: Dealing With Large Data Sets

- Visualizing large data sets
- Aggregate observations into classes
- Observations lose individual identity
- Example data
- Possible classes
- 6 - 7
- 7 - 8
- 8 - 9
- and so forth

Histograms: Four Easy Steps

- Determine range of data
- Divide range into convenient class intervals
- Key step
- Consider open intervals for extremes
- Text mentions rules of thumb
- Excel will do it for you (do not let it!)
- Count observations in each interval
- Graph
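
The four steps can be sketched in Python as follows. The rates are the mortgage rates read off the stem-and-leaf display earlier in the deck, and the classes are the 6 - 7, 7 - 8, ... intervals suggested above. One assumption is made loudly here: each class is treated as closed on the left and open on the right, so 7.0 is counted in 7 - 8. Excel's histogram tool may place boundary values differently, which is one reason to choose the intervals yourself.

```python
from itertools import accumulate

# Rates recovered from the stem-and-leaf display (leaf unit 0.10).
rates = [6.0, 6.8, 7.0, 7.2, 7.4, 7.5, 7.7, 7.9, 8.0, 8.0,
         8.1, 8.2, 8.3, 8.5, 9.0, 9.2, 9.5, 10.0, 10.5, 10.8]

def class_counts(values, lower, width, n_classes):
    """Count observations per class; class i is [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - lower) // width)
        if not 0 <= idx < n_classes:
            raise ValueError(f"{v} falls outside the chosen classes")
        counts[idx] += 1
    return counts

counts = class_counts(rates, lower=6, width=1, n_classes=5)
relative = [c / len(rates) for c in counts]   # relative frequency histogram
cumulative = list(accumulate(counts))         # cumulative frequency (ogive)

for i, (c, r, cum) in enumerate(zip(counts, relative, cumulative)):
    lo = 6 + i
    print(f"{lo:2d} - {lo + 1:2d}  freq={c:2d}  rel={r:.2f}  cum={cum:2d}")
```

Once observations are aggregated into classes, the individual values can no longer be recovered from the counts.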

Histogram: Example Problem Using Excel

- Consider home mortgage problem
- Build
- Histogram
- Relative frequency histogram
- Cumulative frequency histogram
- Also called ogive
- Pareto diagram

Insert Excel Demo Here

Quantitative Data: Numerical Descriptors

- Develop numeric characterizations of data
- Central tendency
- Locate general location (center) of data
- Variation
- Describe dispersion (spread) of observations in set
- Relative standing
- Describe observations relative to others in set
- Note: each provides different perspective of data.

Measures of Central Tendency

- Three most common measures
- Mean
- Median
- Mode
- Others exist
- Trimmed mean
- Truncated mean
- Conditional mean

Measures of Central Tendency: The Average

Measures of Central Tendency: Median

- Median - the middle observation
- 50% of observations below, 50% above
- m = median of sample
- ? ? median of population
- A resistant measure
- Relatively insensitive to extreme values
- Contrast to the mean!

Determining the Median

Determining the Median: Even-Numbered Example

Determining the Median: Odd-Numbered Example
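
The worked examples from these slides did not survive in the transcript. As a stand-in, here is a minimal Python sketch of the usual rule: sort the observations, then take the middle one when n is odd, or the average of the two middle ones when n is even. The data values are hypothetical.

```python
def median(values):
    """Middle observation of the sorted data (mean of the two middle ones if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_example = [3, 1, 7, 5, 9]      # sorted: 1 3 5 7 9
even_example = [3, 1, 7, 5]        # sorted: 1 3 5 7
with_extreme = [3, 1, 7, 5, 900]   # one extreme value replaces the 9

print(median(odd_example))    # 5
print(median(even_example))   # 4.0
print(median(with_extreme))   # 5
```

The last call shows the resistance of the median: replacing 9 with 900 leaves it at 5, while the mean of the same data jumps from 5 to 183.2.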

Measures of Central Tendency: The Mode

- Mode of Y
- Value yi that occurs most frequently
- Note
- Mode may or may not exist
- Y = {1, 2, 4, 6}
- There may be one or more modes
- One mode: unimodal
- Two modes: bimodal
- More than two modes: multimodal
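
A small Python sketch of these notes follows. It adopts the convention suggested by the slide that a data set with no repeated value has no mode; that convention is an assumption (some tools instead report every value as a mode), and `modes` is a helper name introduced here. All data sets other than {1, 2, 4, 6} are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes, or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # every value occurs once: treat as "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 4, 6]))        # []      no mode
print(modes([1, 2, 2, 3]))        # [2]     unimodal
print(modes([1, 1, 2, 2, 3]))     # [1, 2]  bimodal
```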

Measures of Central Tendency: Excel Special Functions

- Mean
- Average ( )
- Median
- Median ( )
- Mode
- Mode ( )

Types of Frequency Curves: A Symmetric Frequency Distribution

Where are the mean, median and mode?

Types of Frequency Curves: An Asymmetric Frequency Distribution

Skewed to the Right, Right Tailed, or Positive Skewness

Where are the mean, median and mode?

Types of Frequency Curves: An Asymmetric Frequency Distribution

Skewed to the Left, Left Tailed or Negative Skewness

Where are the mean, median and mode?

Measures of Dispersion

- Central tendency alone gives incomplete/misleading picture
- Consider 3 sets of student grades
- Y1 = {60, 60, 100, 100}: What is the mean?
- Y2 = {30, 100, 95, 95}: What is the mean?
- Y3 = {75, 80, 80, 85}: What is the mean?
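
The three grade sets can be checked quickly in Python. The sketch below uses the standard-library `statistics.mean` and also prints the range of each set to hint at what the mean leaves out.

```python
from statistics import mean

grades = {
    "Y1": [60, 60, 100, 100],
    "Y2": [30, 100, 95, 95],
    "Y3": [75, 80, 80, 85],
}

means = {name: mean(ys) for name, ys in grades.items()}
ranges = {name: max(ys) - min(ys) for name, ys in grades.items()}

for name in grades:
    print(name, "mean =", means[name], "range =", ranges[name])
```

All three sets have a mean of 80, yet their ranges are 40, 70 and 10, so the mean alone cannot tell them apart.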

Measures of Dispersion

- Will discuss
- Range
- Variance
- Standard deviation

Measure of Dispersion: The Range

- Range = Max(yi) - Min(yi)
- Note: Excel special functions
- Is the range resistant to extreme values?

Measures of Dispersion: Variance and Standard Deviation

General Comments: Variances

- Why not use average deviation from the mean as a measure of dispersion?
- Why is denominator (n-1) for a sample?
- What happens to variances of samples and populations as n increases?
- Make sure you can use relevant Excel functions
- Easier expression
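
Several of these questions can be explored numerically. The sketch below uses the grade set {75, 80, 80, 85} from the dispersion slides and the standard-library functions `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by n). The "easier expression" is assumed here to mean the usual computational shortcut s² = (Σy² - (Σy)²/n) / (n - 1); the slide's own formula did not survive in the transcript.

```python
from statistics import mean, pvariance, variance

ys = [75, 80, 80, 85]
n = len(ys)
y_bar = mean(ys)

deviations = [y - y_bar for y in ys]
sum_of_deviations = sum(deviations)               # always 0, so its average is useless
sum_of_squares = sum(d * d for d in deviations)   # 50

sample_var = sum_of_squares / (n - 1)   # divide by n - 1 for a sample
population_var = sum_of_squares / n     # divide by n for a population

# Assumed shortcut form: no deviations needed, only sums of y and y squared.
shortcut_var = (sum(y * y for y in ys) - sum(ys) ** 2 / n) / (n - 1)

print(sum_of_deviations, sample_var, population_var, shortcut_var)
print(variance(ys), pvariance(ys))      # library equivalents
```

The deviations from the mean always sum to zero, which is why they are squared before averaging. As n grows, the gap between dividing by n and by n - 1 shrinks.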

Distribution Heuristics

- Empirical rule
- Assume a bell shaped frequency curve
- Distribution of observations about mean
- 68% of observations within mean ± s
- 95% of observations within mean ± 2s
- > 99% of observations within mean ± 3s
- What is basis?

Distribution Heuristics

- Camp-Meidel Inequality
- Assume unimodal distribution

Distribution Heuristics

- Chebyshev's Rule
- For any distribution (no assumptions)
- For k > 1

Distribution Heuristics: Example

- Resistors have following characteristics
- μ = 200 Ω and σ = 2 Ω
- Chebyshev's Rule
- k = 2: at least 75% of resistors within 200 ± 4 Ω
- No assumptions needed
- Camp-Meidel
- k = 2: at least 88.9% of resistors within 200 ± 4 Ω
- Assumption: distribution is unimodal
- Empirical Rule
- 95% of resistors within 200 ± 4 Ω
- Assumption: distribution is bell shaped
- Note impact of the assumptions.
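
The three figures in this example can be reproduced with the commonly stated forms of the rules: Chebyshev gives at least 1 - 1/k², Camp-Meidel gives at least 1 - 1/(2.25 k²), and the empirical rule comes from the normal distribution. The formulas on the original slides did not survive in the transcript, so the ones below are supplied as assumptions that match the 75% and 88.9% values; the function names are introduced here for illustration.

```python
import math

def chebyshev_bound(k):
    """Minimum fraction within k standard deviations, for any distribution (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def camp_meidel_bound(k):
    """Minimum fraction within k standard deviations, assuming a unimodal distribution."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / (2.25 * k ** 2)

def normal_fraction(k):
    """Fraction of a normal (bell shaped) distribution within k standard deviations."""
    return math.erf(k / math.sqrt(2))

mu, sigma, k = 200, 2, 2   # resistor example
print(f"interval: {mu - k * sigma} to {mu + k * sigma} ohms")
print(f"Chebyshev:   at least {chebyshev_bound(k):.1%}")
print(f"Camp-Meidel: at least {camp_meidel_bound(k):.1%}")
print(f"Normal:      about    {normal_fraction(k):.1%}")
```

Each added assumption about the shape of the distribution tightens the statement that can be made about the same 196 to 204 ohm interval.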

Measure of Dispersion: Coefficient of Variation

- Data set one (a sample)
- Y1 = {5, 10, 12, 15}
- mean 10.5, std. deviation 4.20
- Data set two (a sample)
- Y2 = {50, 100, 120, 150}
- mean 105, std. deviation 42.03
- Is Y2 more widely dispersed than Y1?

Coefficient of Variation

- Notes
- CV compensates for different scaling
- CV is unitless
- Can compare distributions with different units
- Fails to be useful when mean is ≈ 0
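
The two samples {5, 10, 12, 15} and {50, 100, 120, 150} from the previous slide make a convenient check. The sketch below assumes the common definition CV = 100 · s / mean (sample standard deviation over sample mean, as a percentage); the slide's own formula did not survive in the transcript.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample CV in percent: 100 * s / mean (assumed definition)."""
    y_bar = mean(values)
    if y_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(values) / y_bar

y1 = [5, 10, 12, 15]
y2 = [50, 100, 120, 150]

cv1 = coefficient_of_variation(y1)
cv2 = coefficient_of_variation(y2)
print(f"Y1: s = {stdev(y1):.2f}, CV = {cv1:.1f}%")
print(f"Y2: s = {stdev(y2):.2f}, CV = {cv2:.1f}%")
```

Y2 is Y1 multiplied by 10, so its standard deviation is ten times larger, but the two CVs agree at about 40%: relative to their means, the two sets are equally dispersed.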

Measures of Relative Standing: Percentiles

Special Percentiles

- QL - lower quartile or 25th percentile
- Qm - mid-quartile or 50th percentile (median)
- QU - upper quartile or 75th percentile
- Suppose that 650 is QU for GRE scores, what can you say about people's scores?

Data Detective

- Exploratory data analysis
- Understanding data
- Six characteristics of a data set
- Shape - unimodal, bimodal?
- Location - measures of central tendency
- Spread - variance
- Outliers - unusual observations in data set
- Clustering - similar to modality
- Granularity - what discrete values are allowed?

Reasons for outliers

- Measurement errors
- Observation comes from different population
- Rare occurrence

Identifying Outliers: Empirical Rule

- Objective
- Determine how many standard deviations observation is from mean
- Approach
- Standardized variables → z scores

How many std. deviations is yi away from the mean?

Identifying Outliers: Using z Scores

- Suppose you had observation with z = 3.5
- What would you naturally conclude?
- Note
- May be used on populations or samples
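
A z score is the observation minus the mean, divided by the standard deviation. The sketch below reuses the resistor figures from the heuristics example (mean 200, standard deviation 2); the 207 ohm observation is hypothetical, chosen so that z comes out at 3.5. The cutoff of 3 standard deviations is an assumption based on the empirical rule.

```python
def z_score(y, mean, std_dev):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / std_dev

z = z_score(207, 200, 2)            # hypothetical 207 ohm resistor
flag_as_outlier = abs(z) > 3        # assumed cutoff from the empirical rule
print(z, flag_as_outlier)           # 3.5 True
```

For a bell shaped distribution, more than 99% of observations fall within 3 standard deviations, so an observation with z = 3.5 is highly unusual and worth investigating.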

Identifying Outliers: Boxplots

- Another method for visualizing data set
- Data variation
- Identification of outliers
- Need to understand components of boxplot
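
The fences of a boxplot can be computed from the quartiles. The sketch below assumes the common convention that inner fences sit 1.5 interquartile ranges beyond the quartiles and outer fences 3 interquartile ranges beyond them; the deck does not state its own definition. The quartile values 80 and 120 are hypothetical, and the function names are introduced here for illustration.

```python
def fences(q_lower, q_upper):
    """Return ((inner_low, inner_high), (outer_low, outer_high)) under the assumed 1.5/3 IQR rule."""
    iqr = q_upper - q_lower
    inner = (q_lower - 1.5 * iqr, q_upper + 1.5 * iqr)
    outer = (q_lower - 3 * iqr, q_upper + 3 * iqr)
    return inner, outer

def classify(y, q_lower, q_upper):
    """Describe where an observation sits relative to the fences."""
    inner, outer = fences(q_lower, q_upper)
    if y < outer[0] or y > outer[1]:
        return "beyond outer fence"
    if y < inner[0] or y > inner[1]:
        return "between inner and outer fences"
    return "inside inner fences"

inner, outer = fences(80, 120)      # hypothetical quartiles
print(inner, outer)                 # (20.0, 180.0) (-40, 240)
for y in (100, 200, 250):
    print(y, classify(y, 80, 120))
```

Observations between the fences are usually treated as suspect outliers, and those beyond the outer fences as highly suspect.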

Boxplot Example: Car Horsepower Ratings

(Figure: boxplot labeled with outer fence, upper inner fence, 75th percentile, mean (104.5), median (94), 25th percentile, and lower inner fence)

Descriptive Statistics

- The art of communications
- Learn the tools available
- Understand purpose of analysis
- Use common sense
- Look for insights into process/experiment

Homework Assignment: Class 1

- Reading assignments
- M & S
- Sections 1.1 - 1.4 (Introduction)
- Sections 2.1 - 2.8 (Descriptive Statistics)
- B & C
- Chapters 1 through 4
- Recommended Problems
- M & S Chapter 2: 45-48, 53-56