Statistical Techniques for Analyzing Quantitative Data - PowerPoint PPT Presentation

Provided by: maryamr

Transcript and Presenter's Notes

Title: Statistical Techniques for Analyzing Quantitative Data

1
Statistical Techniques for Analyzing Quantitative
Data

Maryam Ramezani
Values in Computer Technology
CSC 426

2
Outline
3
Role of Statistics in Research

With statistics, we can summarize large bodies
of data, make predictions about future trends,
and determine when different experimental
treatments have led to significantly different
outcomes.
Statistics are among the most powerful tools in
the researcher's toolbox.

4
How do statistics come into research?

In quantitative research we use numbers to
represent physical or nonphysical phenomena.
We use statistics to summarize and interpret
those numbers.

5
Exploring and Organizing a Data Set

Look at your data and find ways of organizing
them.
Example: test scores for 11 children.
What do you see?

Ruth 96, Robert 60, Chuck 68, Margaret 88, Tom
56, Mary 92, Ralph 64, Bill 72, Alice 80, Adam
76, Kathy 84
6
Exploring and Organizing a Data Set
Alphabetical Order
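
A minimal sketch of organizing the 11 scores from the example, first alphabetically and then by score. The variable names are illustrative and not from the slides.

```python
# Test scores for the 11 children in the example.
scores = {
    "Ruth": 96, "Robert": 60, "Chuck": 68, "Margaret": 88,
    "Tom": 56, "Mary": 92, "Ralph": 64, "Bill": 72,
    "Alice": 80, "Adam": 76, "Kathy": 84,
}

# Alphabetical order by name.
by_name = sorted(scores.items())

# Descending order by score.
by_score = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in by_score:
    print(f"{name:<10}{score}")
```

Sorting by score makes a pattern visible that the alphabetical list hides: the scores are evenly spaced, 4 points apart, from 56 to 96.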
7
Using Computer Spreadsheets to Organize and
Analyze Data

Sorting
Graphing
Formulas
What Ifs
Save, Store, recall, update information

8
Functions of Statistics

Descriptive Statistics
Describe what the data look like.
Inferential Statistics
Draw inferences about a large population from
small samples.

9
Considering the Nature of the Data

Continuous or discrete
Nominal, ordinal, interval or ratio scale
Normal or non-normal distribution

10
Continuous versus Discrete Variables

Continuous data can take on any value within a
finite or infinite interval. You can count, order
and measure continuous data.
Examples: height, weight, temperature, the amount
of sugar in an orange, the time required to run a
mile.
Discrete data: the values or observations are
distinct and separate, i.e. they can be counted
(1, 2, 3, ...).
Examples: the number of kittens in a litter, the
number of patients in a doctor's surgery, the
number of flaws in one metre of cloth, gender
(male, female), blood group (O, A, B, AB).

11
Nominal Data

The numbers are simply labels. You can count but
not order or measure nominal data.
Example: males could be coded as 0 and females as 1;
the marital status of an individual could be coded as
Y if married, N if single.
Classification data, e.g. m/f
No ordering, e.g. it makes no sense to state that
M > F
Arbitrary labels, e.g. m/f, 0/1, etc.

12
Ordinal Data

Ordered, but differences between values are not
important
e.g., Likert scales: rank on a scale of 1 to 5 your
degree of satisfaction
The difference in enjoyment expressed by giving a
rating of 2 rather than 1 might be much less than
the difference in enjoyment expressed by giving a
rating of 4 rather than 3.
You can count and order, but not measure, ordinal
data.

13
Interval Data

Ordered, constant scale, but no natural zero
Differences make sense, but ratios do not
e.g. temperature: 30° - 20° = 20° - 10°, but 20° is
not twice as hot as 10°!
e.g. dates: the time interval between the starts
of years 1981 and 1982 is the same as that
between 1983 and 1984, namely 365 days. The zero
point, year 1 AD, is arbitrary; time did not
begin then.

14
Ratio Data

Like interval data but has true zero
Ordered, Constant scale, natural zero
e.g., height, weight, age, length
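
A small sketch of why ratios need a true zero. It assumes the interval-scale temperature example is in degrees Celsius (the slides do not name the unit) and compares it with the Kelvin scale, which is a ratio scale.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale reading (Celsius) to a ratio-scale one (Kelvin)."""
    return celsius + 273.15

# Differences are meaningful on an interval scale.
equal_differences = (30 - 20) == (20 - 10)

# The ratio of the raw Celsius numbers suggests "twice as hot"...
naive_ratio = 20 / 10

# ...but on a scale with a true zero the ratio is close to 1.
true_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(equal_differences, naive_ratio, round(true_ratio, 3))
```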

15
Normal and Non-Normal Distributions
16
Normal Distribution
17
Non-Normal Distributions
Skewed to the Left (Negatively Skewed)
Skewed to the Right (Positively Skewed)
18
Leptokurtic and Platykurtic Distributions
19
Descriptive Statistics

Descriptive Statistics describes data
Points of Central Tendency
Amount of Variability
Relation of different variables to each other

20
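Points of Central Tendency: Mean

Measuring center: if the n observations are
$x_1, x_2, \dots, x_n$, the arithmetic mean is
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Geometric mean
$\left( x_1 x_2 \cdots x_n \right)^{1/n}$
e.g. biological growth, population growth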
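
A short sketch of both means using Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The scores are the 11 test scores from the earlier example; the growth factors are made-up values for illustration only.

```python
from statistics import geometric_mean, mean

# Arithmetic mean of the 11 test scores from the earlier example.
scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]
arithmetic = mean(scores)

# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]

# The geometric mean gives the constant yearly factor that would
# produce the same overall growth over the three years.
average_growth = geometric_mean(growth_factors)

print(arithmetic, round(average_growth, 4))
```

The geometric mean suits growth data because growth factors multiply rather than add.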
21
Measure of Central Tendency
22
Measures of Variability
How great is the spread?
Range = highest score - lowest score
The quartiles: the pth percentile of a
distribution is the value such that p percent of
the observations fall at or below it.
The 50th percentile = median, M
The 25th percentile = first quartile, Q1
The 75th percentile = third quartile, Q3
Interquartile range = Q3 - Q1

Example
13 13 16 19 21 21 23 23 24 26 26 27 27
27 28 28 30 30
M = ?, Q1 = ?, Q3 = ?
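
One way to work the example, as a sketch. It assumes the "median of each half" convention for quartiles; other conventions (including the default of `statistics.quantiles`) interpolate differently and can give slightly different values. The function name is illustrative.

```python
from statistics import median

data = [13, 13, 16, 19, 21, 21, 23, 23, 24,
        26, 26, 27, 27, 27, 28, 28, 30, 30]

def quartiles(values):
    """Return (Q1, M, Q3): Q1 and Q3 are the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]  # leaves out the middle value when n is odd
    return median(lower), median(ordered), median(upper)

q1, m, q3 = quartiles(data)
data_range = max(data) - min(data)
iqr = q3 - q1

print(f"M={m}, Q1={q1}, Q3={q3}, range={data_range}, IQR={iqr}")
```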

23
Measures of Variability
Standard Deviation
standardized score
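
The formulas for this slide did not survive in the transcript. The sketch below assumes the usual definitions: the sample standard deviation $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ and the standardized score $z = (x - \bar{x})/s$. It reuses the 11 test scores from the earlier example.

```python
from statistics import mean, stdev

scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]

x_bar = mean(scores)
s = stdev(scores)  # sample standard deviation (divides by n - 1)

def z_score(x):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - x_bar) / s

print(round(s, 2), round(z_score(96), 2), round(z_score(56), 2))
```

A z-score of 0 means the score equals the mean. If the slide intended the population form (dividing by n), `statistics.pstdev` would be the counterpart.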
24
Measure of Relationship: Correlation

Correlation indicates the strength and direction
of a linear relationship between two variables.
See page 266 for other examples of correlation
statistics
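
A minimal sketch of one common correlation statistic, the Pearson product-moment coefficient r, written out by hand so each step is visible. The slides do not say which coefficient they use, and the paired data are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)  # undefined if either variable is constant

# Hypothetical data: hours studied and test marks for five students.
hours = [1, 2, 3, 4, 5]
marks = [52, 60, 61, 70, 75]

r = pearson_r(hours, marks)
print(round(r, 3))
```

Values of r near +1 or -1 indicate a strong linear relationship, and the sign gives its direction.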

25
Notes about Correlation

Finding a substantial correlation between two
characteristics requires reasonably valid and
reliable measurement of both.
Correlation does not indicate causation.

26
Examples of using Statistics in Computer Science

Conceptual Representation of User Transactions or
Sessions

[Figure labels: pageviews/objects, session/user data]
27
Inferential Statistics

We use sample statistics as estimates of population
parameters.
The quality of all statistical analysis depends
on the quality of the sample data.

Random sampling: every unit in the population
has an equal chance of being chosen. A random sample
should represent the population well, so sample
statistics from a random sample should provide
reasonable estimates of population parameters.
28
Some definitions

A parameter describes a population.
A statistic describes a sample.

A parameter is a characteristic or quality of a
population that is constant in concept; however,
its value is variable. Example: the radius is a
parameter of a circle.
29
Inferential Statistics

Estimate a population parameter from a random
sample
Test statistical hypotheses

30
Inferential Statistics: Estimating a Population
Parameter from a Sample

All sample statistics have some error in
estimating population parameters.
Example: estimate the mean height of 10-year-old boys
in Chicago. Sample = 200 boys.
How close is the sample mean to the population
mean?
We don't know, but we do know that:
The means from an infinite number of samples form
a normal distribution.
The population mean equals the average (mean) of
all the sample means.
The standard deviation of the sampling distribution
(the standard error) is directly related to the
standard deviation of the characteristic in question
for the overall population.
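
These three properties can be checked with a small simulation. This is a sketch under stated assumptions: the population of heights is randomly generated (mean 140 cm, standard deviation 7 cm are made-up numbers, not real data), and 1,000 samples stand in for "an infinite number".

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 heights in cm.
population = [rng.gauss(140, 7) for _ in range(10_000)]
pop_mean = fmean(population)
pop_sd = pstdev(population)

# Draw many random samples of 200 and record each sample mean.
n = 200
sample_means = [fmean(rng.sample(population, n)) for _ in range(1_000)]

mean_of_means = fmean(sample_means)  # should be close to pop_mean
sd_of_means = pstdev(sample_means)   # the observed standard error
predicted_se = pop_sd / sqrt(n)      # population sd divided by sqrt(n)

print(round(pop_mean, 2), round(mean_of_means, 2))
print(round(sd_of_means, 3), round(predicted_se, 3))
```

The mean of the sample means lands close to the population mean, and their spread is close to the population standard deviation divided by the square root of the sample size.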

31
Standard Error

The standard error tells us how much a particular
mean varies from one sample to another when all
samples are the same size and drawn randomly from
the same population.
Standard error: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
n is the size of each sample and $\sigma$ is the
population standard deviation, which we don't have!
We use the standard deviation of the sample, s, instead.

32
Accuracy of the Estimator
As in many problems, there is a trade-off between
accuracy and dollars.
What will we get for our money if we
invest dollars in obtaining a larger sample size?
n = 100? n = 200?
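
A sketch of what a larger sample buys, using the standard error of the mean, sd / sqrt(n). The standard deviation of 7.0 is a hypothetical value chosen for illustration.

```python
from math import sqrt

def standard_error(sd, n):
    """Estimated standard error of the mean for a sample of size n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / sqrt(n)

sd = 7.0  # hypothetical sample standard deviation

for n in (100, 200, 400):
    print(n, round(standard_error(sd, n), 3))
```

Because the standard error shrinks with the square root of n, doubling the sample from 100 to 200 cuts it by only about 29%; halving it takes four times the sample.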
33
Point versus Interval Estimate

A point estimate is a single value--a
point--taken from a sample and used to estimate
the corresponding parameter of a population.
$\bar{x}$, $s$, $s^2$ and $r$ estimate $\mu$, $\sigma$, $\sigma^2$ and $\rho$,
respectively.
An interval estimate is a range of values--an
interval--within whose limits a population
parameter probably lies.
We say that we are 95% confident that the unknown
population mean lies in the interval.

95% confidence interval for $\mu$:
$(\bar{x} - 2\sigma/\sqrt{n},\ \bar{x} + 2\sigma/\sqrt{n})$

In only 5% of all samples does this interval
fail to contain $\mu$;
that is, 5% of all samples give inaccurate results.
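
A minimal sketch of the interval computation. The multiplier 2 on the slide is a rounding of the normal quantile, about 1.96 for 95%; the code takes the exact quantile from `statistics.NormalDist`. The sample mean, standard deviation and sample size are hypothetical values.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(x_bar, sigma, n, level=0.95):
    """Normal-based confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical values: sample mean 140 cm, standard deviation 7 cm, n = 200.
low, high = confidence_interval(x_bar=140.0, sigma=7.0, n=200)
print(round(low, 2), round(high, 2))
```

Raising the confidence level widens the interval; increasing n narrows it.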

34
Testing Hypothesis

Confidence intervals are used when the goal of
our analysis is to estimate an unknown parameter
in the population.
A second goal of a statistical analysis is to
verify some claim about the population on the
basis of the data.
Research hypothesis / statistical hypothesis
A test of significance is a procedure to assess
the truth about a hypothesis using the observed
data. The results of the test are expressed in
terms of a probability that measures how well the
data support the hypothesis.

35
Example: To determine whether the mean nicotine
content of a brand of cigarettes is greater than
the advertised value of 1.4 milligrams, a health
advocacy group takes a sample of 500 cigarettes
and measures the amount of nicotine in the
sample.
Sample values: the sample average of nicotine is
1.51 mg; the standard deviation is 1.016.
The estimated amount of nicotine is 1.51 mg,
based on the sample values. The standard error
of the sample average is S.E. = s.d./sqrt(n-1) = 0.045.
Is there an actual difference between the
sample value (1.51 mg) and the advertised value
(1.4 mg)? Or is it just due to sampling
error? To answer this question we need a test of
significance.
36
Stating the hypotheses
The null hypothesis H0 expresses the idea that
the observed difference is due to chance. It is a
statement of no effect or no difference,
and is expressed in terms of the population
parameter.
Let $\mu$ denote the true average amount of
nicotine. H0: $\mu = 1.4$ mg
The alternative hypothesis Ha represents the idea
that the difference is real. It is expressed as
the statement we hope or suspect is true instead
of the null hypothesis.
The alternative hypothesis states that the
cigarettes contain a higher amount of nicotine,
that is, Ha: $\mu > 1.4$ mg
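
A sketch of how the nicotine example could be tested, assuming a one-sided z-test with a normal approximation (reasonable for a sample of 500). The slides do not show which test they use. The standard error follows the slide's s.d./sqrt(n-1) form.

```python
from math import sqrt
from statistics import NormalDist

n = 500
sample_mean = 1.51  # mg
sample_sd = 1.016
advertised = 1.4    # mg, the value under H0

# Standard error as given in the example: s.d. / sqrt(n - 1).
se = sample_sd / sqrt(n - 1)

# How many standard errors the sample mean lies above the advertised value.
z = (sample_mean - advertised) / se

# One-sided p-value: the chance of a z this large or larger if H0 is true.
p_value = 1 - NormalDist().cdf(z)

print(round(se, 4), round(z, 2), round(p_value, 4))
```

A p-value below the chosen significance level (0.05 is a common choice) counts as evidence against H0 in favour of Ha.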
37
General comments on stating hypotheses

It is not easy to state the null and the
alternative hypotheses!
The hypotheses are statements about the population
values.
The alternative hypothesis Ha is often called the
research hypothesis, because it is the
hypothesis we are interested in.
A significance test is a test against the null
hypothesis.
Often we set Ha first, and then H0 is defined as
the opposite statement!

38
Errors in Hypothesis testing

Type I error: the null hypothesis is rejected
when it is in fact true; that is, H0 is wrongly
rejected.
Type II error: the null hypothesis H0 is not
rejected when it is in fact false.
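
Both error types can be illustrated by simulation. This is a sketch under stated assumptions: a one-sided z-test with known standard deviation, a significance level of 0.05, and made-up values for the true means, standard deviation and sample size.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(0)  # fixed seed so the run is repeatable
alpha = 0.05
critical_z = NormalDist().inv_cdf(1 - alpha)  # one-sided cutoff, about 1.645

def rejects_h0(true_mean, mu0=1.4, sigma=1.0, n=100):
    """Draw one sample and report whether the test rejects H0: mu = mu0."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    return z > critical_z

trials = 2_000

# H0 is true (mean really is 1.4): every rejection is a Type I error.
type_1_rate = sum(rejects_h0(true_mean=1.4) for _ in range(trials)) / trials

# H0 is false (mean is really 1.6): every non-rejection is a Type II error.
type_2_rate = sum(not rejects_h0(true_mean=1.6) for _ in range(trials)) / trials

print(type_1_rate, type_2_rate)
```

The Type I rate stays near the chosen significance level, while the Type II rate depends on how far the true mean is from the H0 value and on the sample size.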

39
Meta-Analysis

Meta-analysis refers to the analysis of
analyses...the statistical analysis of a large
collection of analysis results from individual
studies for the purpose of integrating the
findings. (Glass, 1976, p. 3)
Conduct a fairly extensive search for relevant
studies
Identify appropriate studies to include in
meta-analysis
Convert each study's results to a common
statistical index

40
Using Statistical Software Packages

SPSS
SAS
MATLAB Statistics Toolbox
SYSTAT, Minitab, StatView, Statistica

41
Interpreting the Data

Relating the findings to the original research
problem and to the specific research questions
and hypotheses
Relating the findings to preexisting literature,
concepts, theories and research results
Determining whether the findings have practical
significance as well as statistical significance
Identifying limitations of the study
