# Basic Statistics Introduction and Overview - PowerPoint PPT Presentation

Title:

## Basic Statistics Introduction and Overview

Description:

### Title: Basic Statistics Update Author: student Last modified by: mperri Created Date: 12/20/2007 3:58:19 PM Document presentation format: On-screen Show (4:3) – PowerPoint PPT presentation

Slides: 62
Provided by: Student
Transcript and Presenter's Notes

Title: Basic Statistics Introduction and Overview

1
Basic Statistics Introduction and Overview
• Matthew Perri, Bs. Pharm., Ph.D., R.Ph.
• Professor of Pharmacy
• January 2014

2
There are lies, damn lies and statistics.
• British Prime Minister Benjamin Disraeli
• Popularized by Mark Twain

3
Statistical thinking will one day be as necessary
for efficient citizenship as the ability to read
and write.
• H.G. Wells

4
If you know twelve concepts about a given topic
you will look like an expert to people who know
two or three.
• H.G. Wells

5
Why should pharmacists understand statistics?
• Understanding statistics will enable you to draw
your own conclusions and make decisions
• Will you recommend this drug to patients or
physicians?
• Is the drug likely to work for your patients?
• Is it better or safer than existing therapies?
• Should the drug be listed on the formulary or
PDL?
• Should there be dispensing limits, refill limits,
prior authorization, limiting prescribing
authority?

6
Case Study
• You are a recent Pharm. D. graduate from the
University of GA. Two of your professors (Drs.
May and Perri) have served over the last two
decades on the GA Department of Community Health
Drug Utilization Review Board (DURB). The DURB
is the governing body of physicians, pharmacists
and others that study and select appropriate drug
therapy for the lives covered by all GA state
funded health plans (e.g., Medicaid, State
Merritt, Board of Regents, Peach Care). Upon
departure from the Board, the Commissioner sought
input from Drs. May and Perri about who might be
good to replace them on this body of decision
makers. Dr. Perri made the recommendation to
include you on the list of possible candidates
and you were eventually selected by the
commissioner. The DURB meets quarterly and prior
to each meeting a binder is sent to all members
reviewing the disease states and recent
literature about the drugs to be reviewed at the
next meeting.

7
Essential Concepts and Thoughts
• Statistics let you make general conclusions from
limited data.
• Statistics is not intuitive. (Not easy to
understand or use.)
• Statistical conclusions are always presented in
terms of probability.
• All statistical tests are based on assumptions.
• Decisions about how to analyze data should be
• A confidence interval quantifies precision and is
easy to interpret.
• A P Value tests a null hypothesis and is hard to
understand at first.
• Statistically significant does not mean the
effect or phenomenon is large or
scientifically or clinically important.
• Not statistically different does not mean the
effect or phenomenon is absent, small, or
scientifically or clinically irrelevant.
• Multiple comparisons make it hard to interpret
statistical results, which is why we have
statistical methods to help fix that (ANOVA, range tests).
• Correlation does not mean causation.
• Published statistics tend to be optimistic.

8
Statistics is just another language.
• What we hope to do here is to teach you the
basics needed to navigate evaluation of research.
• Things you might need to know in a Spanish
speaking country
• Dónde está el cuarto de baño por favor?
• Déme una cerveza por favor.
• Dónde está la biblioteca?
• In statistics
• Were the data normally distributed?
• What was the mean? Standard deviation?

9
Case Study
• General QUESTION
• HOW DO PEOPLE LET YOU KNOW THEY ARE AT YOUR DOOR
AND WANT TO COME IN?
• They ring the doorbell.
• They knock.
• They stand outside, studying kinetics, until you
open the door for your own reasons.

10
A Possible Investigation
• Possible research questions
• How do people knock on someone's door?
• How many times do they knock?
• Do people speak when they knock?
• Data sources?
• Search literature and review/compile the results
of previous studies on this subject
• Survey people and ask them how they knock
• Observe people as they knock and record data

11
Study 1 American Knocking Practices
• Questions/Propositions
• People generally approach a residence and knock
when they wish to enter.
• Describe how people knock when at someone's door.
• Method
• Review available data
• Design survey, experiment, interviews or some
combination.
• Database

12
Results
• Descriptive Statistics
• Number of events observed (also known as n or
sample size) was 35.
• Sheldon knocked between 0 and 30,000
(self-reported) times when approaching Penny's
door.
• He used 1, 2, 6 and 30,000 knocks once each.
(The 1 was the robot)
• He knocked for Leonard, then Penny, 5 times, with
one instance where he knocked for Penny first.
• Penny knocked one time on Sheldon's door; in this
case she knocked three times.
• In one instance, he knocked, then approached an
interior door where he knocked a second time.
• Parametric Statistics
• The average number of knocks was 860.06 (mean)
• The most common number of knocks was 3 (mode)
• The median number of knocks was 3 (1, 2, 3, 6,
30000)
• The standard deviation of the number of
knocks was 4997.46
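
The summary figures on this slide can be reproduced with a short sketch. The slide does not list the raw observations, so the data below are an assumption: 31 events of 3 knocks plus one event each of 1, 2, 6 and 30,000 knocks, which is consistent with n = 35 and the reported mean. Under that assumption, the reported standard deviation matches the population formula (dividing by n) rather than the sample formula (dividing by n - 1).

```python
import statistics

# Assumed reconstruction of the 35 observations (not listed on the slide):
# 31 events of 3 knocks, plus one event each of 1, 2, 6 and 30,000 knocks.
knocks = [3] * 31 + [1, 2, 6, 30000]

n = len(knocks)
mean = statistics.mean(knocks)
median = statistics.median(knocks)
mode = statistics.mode(knocks)
sd_population = statistics.pstdev(knocks)  # divides by n
sd_sample = statistics.stdev(knocks)       # divides by n - 1

print(f"n = {n}")
print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"SD (population formula) = {sd_population:.2f}")
print(f"SD (sample formula) = {sd_sample:.2f}")
```

Note how a single extreme observation pulls the mean far away from the median and mode.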

13
Results
• Without any other information, which of the
following can we infer?
• In this sample, three knocks were used to alert
the resident that someone was at the door.
• People in general knock three times.
• Knocking three times is always effective in
getting someone to answer the door.
• Tony Orlando and Dawn (http://www.youtube.com/watch?v=k7Jvsbcxunc)
were wrong in the 70s when
they concluded that
• You should knock three times on the ceiling
• You should knock twice on the pipe if the answer
is no
• In our data, knocks were always associated with
the calling out of a name and this process was
repeated.
• If someone is at your door and they knock three
times, followed by your name three times, and
this is repeated three times, it is likely to be
Sheldon.
• Sheldon has issues.

14
Let's take one of these conclusions and explore
it more thoroughly from a statistical
perspective.
• People in general knock 3 times.
• How would our results have changed if we had seen
only a subset of the data? (Smaller sample
size) For example, if we had missed the Flash,
how would the results have changed?
• The average number of knocks was 3 (mean)
• The most common number of knocks was 3 (mode)
• The median number of knocks was 3 (1, 2, 3, 6)
• The standard deviation of the number of
knocks was 0.641689
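
A minimal sketch of the comparison, assuming the full sample was 31 events of 3 knocks plus one each of 1, 2, 6 and 30,000 knocks (the raw data are not on the slides; this breakdown is inferred from the reported means). Dropping the single 30,000-knock event leaves the median and mode unchanged but collapses the mean and the standard deviation.

```python
import statistics

# Assumed data: the breakdown is a reconstruction, not taken from the slides.
full = [3] * 31 + [1, 2, 6, 30000]
subset = [k for k in full if k != 30000]  # sample without the extreme value

for label, data in (("full", full), ("subset", subset)):
    print(
        label,
        "n =", len(data),
        "mean =", round(statistics.mean(data), 2),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
        "SD =", round(statistics.pstdev(data), 6),  # population formula
    )
```

The mean and standard deviation are sensitive to outliers, while the median and mode are not.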

15
Direction for future research
• Good research always poses new questions.
• Additional research questions for this example
• Is there a time when two knocks are sufficient?
• Are mechanical/technological means of knocking
just as effective as in person knocking?
• How hard would it be to find a new apartment?

16
Statistics and Biostatistics
• Statistics
• Techniques and procedures regarding the
collection, organization, analysis,
interpretation and presentation of information
that can be stated numerically (Kuzma and
Bohnenblust, 2004)
• Biostatistics
• Application of statistics to the biomedical
sciences

17
Types of Statistics
• Descriptive Statistics
• Sometimes, formal statistical analyses are not
needed or desired, depending on the research
questions. Descriptive stats tell us something
• Number of drug overdose fatalities in 2013
• Pharmacy student acceptance rate at UGA College
of Pharmacy
• Demographics of a study population (2)
• Numbers of patients experiencing an adverse
reaction to a medication.

18
Types of Statistics
• Inferential Statistics
• Observed information is incomplete and uncertain,
so we can't know for sure; instead, we infer.
• Drawing conclusions based on observed
information.
• Generalizing from the specifics (as is done in
most clinical research).
• Example
• Once-daily aminoglycoside (ODA) regimens have
been studied.
• When done in one location, e.g., Athens
Regional, what, if anything can or should we
infer, or generalize to other patient groups?
• What about a different dose? Would these results
still apply?

19
Terms
• Variables vs. Data
• Survey vs. Experiment
• Population vs. Samples
• Response Rate
• Sampling Techniques

20
Variables vs. Data
• When making a gentamicin dosing recommendation,
you need to understand the patient's
characteristics, such as age, weight and height.
• In statistics, patient characteristics are
referred to as variables (e.g., Systolic Blood
Pressure) because the observed values change.
• The actual values of the characteristics
(variables) recorded are referred to as data
(e.g., 115 mmHg)

21
Survey vs. Experiment
• Surveys
• Observations of events or phenomena over which
few, if any, controls are imposed
• Teaching evaluations, political opinion polls,
satisfaction studies are all examples of survey
research.
• Experiments
• Design a research plan that manipulates, for
example, dosage, e.g., 50 mg of drug A vs. 100 mg or
placebo
• Studying the effects on health outcomes before
antipsychotic agents in GA Medicaid.
• Studying two doses of a new drug for toxicity.

22
Survey vs. Experiment
• Both survey and experiments are important
research designs
• FDA requires all drugs submitted for approval to
be evaluated by experimental research to
substantiate their safety and efficacy
• However, survey design is often used in
post-marketing surveillance for monitoring safety

23
Population vs. Samples
• A population is a set of persons (or objects)
having a common observable characteristic
• A sample is a subset of a population
• The goal is for this subset to be as
representative of the population as possible.
• Example
• The US population was 317,330,434 as of 8:30 AM
January 8, 2014. (1)
• The CBS News Poll surveyed a sample of 808 adults
to assess preferences for presidential
candidates.

(1) http://www.census.gov/main/www/popclock.html
24
Consider
• If you wanted to study all insulin-dependent
diabetics, is there any way you could create a
list of all insulin dependent diabetics from
which to draw a sample?
• You can create / collect a random sample of
patients who generally represent the population
in question, then draw inferences from this
group and generalize your results to all
insulin-dependent diabetics based on how well
your sample mimics the entire population. (Note:
what assumption does this require you to make?)

25
Are 2nd Year Pharmacy Students at UGA COP an
example of a population or a sample?
26
It depends
• 2nd Year Rx Students are a sample (but probably
not a random one, which we will talk about in a
minute) of many populations, such as all pharmacy
students at UGA, all pharmacy students in the US,
students at UGA, etc., or even a sample of the
US population. However, they are also the total
population of 2nd year pharmacy students at UGA.
You need to know the perspective you are taking.

27
Sampling
• Sampling nomenclature is important to
understanding research design and to evaluating
studies. The goal in evaluation of sampling
methods is to make sure the right population was
sampled for the study and the sample was
created properly.
• We don't want to accidentally observe the
Sheldons of the world.

28
Sampling terms
• Sampling frame
• a complete, non-overlapping list of the persons
or objects in the population.
• e.g., to draw a sample of GA pharmacists, we
could use the database of all registered GA
pharmacists as a sampling frame
• Hard to develop a sampling frame for studying
patients with asthma, or any condition for that
matter. This makes finding a representative
sample very important.
• Random sampling is the primary method of
obtaining a sample that is representative of a
larger population and an issue which can have a
huge impact on study results.

29
Random Samples
• Random Sample
• Sample units are chosen in an unpredictable way
• i.e., using a random number table, putting all
the names in a hat
• Types
• Simple random sample: all members have an equal
chance of selection.
• Cluster: units are selected in groups such as
geographic area (Northeast, Southeast, Central,
West), then a random sample is created in each
area.
• Stratified: choosing sub-groups or strata
(e.g., race, gender, age group, education) within
a population and sampling from within these
groups.

30
Random Sample
• Same as putting ALL the names of a population in
a hat, mixing them up, and selecting however many names
you want.
• Note, it must be all the names and each has the
same chance of being selected.
• Avoids known and unknown biases on average
• Helps convince others that the study was
conducted properly
• It is the basis for statistical theory that
underlies hypothesis testing and confidence
intervals
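
The "names in a hat" idea can be sketched in a few lines. The population list, the helper name `simple_random_sample` and the seed are all hypothetical and only there for illustration; `random.sample` draws without replacement, so every name has the same chance of being selected.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k units without replacement; each unit is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, k)

# Hypothetical sampling frame: every member of the population is listed once.
population = [f"person_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(population, 10, seed=2014)
print(sample)
```

The key requirement is the complete list: if some names never make it into the hat, the draw is no longer a random sample of the population.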

31
Other Sampling Techniques
• You may see other techniques used in bio-medical
research
• Convenience Sample
• e.g., intercepting patients after having a
prescription filled at a local community pharmacy
or shopping mall.
• Systematic sampling
• e.g., take a phone book and pick a random place
to start, then take every 9th name in the book.
• Stratified sampling
• Cluster sampling
• Others, e.g., snowball sampling (which is kind of
cool)
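
As a rough sketch of the phone-book example of systematic sampling: pick a random starting position, then take every 9th name after it. The frame of names, the function name `systematic_sample` and the seed are hypothetical; this version simply stops at the end of the list instead of wrapping around.

```python
import random

def systematic_sample(frame, step, seed=None):
    """Random start, then every `step`-th entry until the list ends."""
    rng = random.Random(seed)
    start = rng.randrange(len(frame))  # random place to start
    return frame[start::step]

# Hypothetical "phone book" of 100 names.
phone_book = [f"name_{i:03d}" for i in range(100)]
chosen = systematic_sample(phone_book, step=9, seed=9)
print(chosen)
```

Because the list is not wrapped, the sample size depends on where the random start falls.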

32
Convenience Samples
• Often used when it is virtually impossible to
select a random sample
• Underlying assumption is that the sample will
accurately represent the population
• Example: To estimate the average PCAT score for
pharmacy students in the US, would you
• Use UGA Class of 2014 pharmacy students as a
study sample and survey some number of students?
While we might do this, we have to ask: how
representative would this actually be?
• Use multiple pharmacy schools?
• In a clinical trial, we might recruit patients
from multiple doctors' offices to get a better
picture.

33
Stratified Sample
• Grouping members of the population into
homogeneous groups.
• Strata should be mutually exclusive: subjects can
be in only one stratum, and no group should be
excluded.
• Then, use random or systematic sampling to identify
subjects in each stratum.
• Can be proportional or not.
• Proportional: If the population consists of 60%
in the male stratum and 40% in the female
stratum, then the relative size of the two
samples (three males, two females) should reflect
this proportion.
• Sometimes this is used in medical research, e.g.,
where you want to study patients with certain
characteristics: obesity, gender, pregnancy,
past history of disease, etc.
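
A minimal sketch of proportional stratified sampling, using a hypothetical population of 60 males and 40 females and a total sample of five. The function name, the member labels and the seed are assumptions for illustration. Each stratum contributes a share of the sample equal to its share of the population, and members are drawn at random within each stratum.

```python
import random

def proportional_stratified_sample(strata, total_n, seed=None):
    """Sample from each stratum in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Stratum share of the sample mirrors its share of the population.
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 60 males and 40 females.
strata = {
    "male": [f"M{i}" for i in range(60)],
    "female": [f"F{i}" for i in range(40)],
}
sample = proportional_stratified_sample(strata, total_n=5, seed=1)
print({name: len(members) for name, members in sample.items()})
```

With other proportions the rounded stratum sizes may not add up exactly to the requested total, so a real study would need a rule for allocating the remainder.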

34
Questions
• Why is random sampling less prone to bias than
convenience sampling?
• Think about how we selected our convenience sample.
• Does using a random sample guarantee a
representative sample?

35
Response Rate / Bias
• Similar meanings clinically and statistically.
Clinically it is how many patients responded in a
certain manner.
• Consider a random sample of college students in
the US. You sent out a questionnaire to these
students to assess how frequently college
students skip classes.
• The response rate is how many (usually expressed
as a percentage) students completed and returned the
questionnaire.
• Is a 50% response rate good enough?
• Generally, the higher the response rate, the more
representative the sample, but extremely high
response rates may not always be required.
• Is there any potential for bias in a study like
this?

36
Sampling Bias
• Sampling bias exists if the sample of data you
received is not representative of the
population, e.g., you studied only a certain age
group when all age groups were of concern.
• In our previous example, bias may occur if students
who returned the questionnaire are somehow
inherently different from those who did not.
• e.g., one could infer that more diligent students
are more likely to respond than less studious
ones.
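
The effect of non-response can be illustrated with made-up numbers. Everything in the sketch below is hypothetical: two groups of students, how many classes each group skips, and how likely each group is to return the questionnaire. The overall response rate is 50%, yet the respondents understate how often students skip class because diligent students are over-represented among them.

```python
# Hypothetical groups: (label, students, classes skipped per term, response rate in %)
groups = [
    ("diligent", 600, 1, 70),
    ("less studious", 400, 6, 20),
]

population = sum(size for _, size, _, _ in groups)
returned = {label: size * pct // 100 for label, size, _, pct in groups}
respondents = sum(returned.values())

# Mean over everyone who was sent the questionnaire.
true_mean = sum(size * skipped for _, size, skipped, _ in groups) / population
# Mean over only those who returned it.
observed_mean = sum(returned[label] * skipped for label, _, skipped, _ in groups) / respondents
response_rate = respondents / population

print(f"response rate = {response_rate:.0%}")
print(f"true mean skipped = {true_mean}, observed mean skipped = {observed_mean}")
```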

37
Clinical Trials
• Clinical trials often employ a non-random sample;
they do, however, use random assignment of
patients to groups (arms) within the study.
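
Random assignment can be sketched as a shuffle followed by dealing patients out to the arms in turn. The patient identifiers, arm names, function name and seed are hypothetical; real trials typically use more elaborate schemes (for example, blocked or stratified randomization), which this sketch does not attempt.

```python
import random

def randomize_to_arms(patients, arms=("treatment", "placebo"), seed=None):
    """Shuffle the recruited patients, then deal them out to the arms."""
    rng = random.Random(seed)
    shuffled = list(patients)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    assignment = {arm: [] for arm in arms}
    for i, patient in enumerate(shuffled):
        assignment[arms[i % len(arms)]].append(patient)
    return assignment

# Hypothetical convenience sample of 20 recruited patients.
patients = [f"patient_{i:02d}" for i in range(1, 21)]
assignment = randomize_to_arms(patients, seed=37)
print({arm: len(members) for arm, members in assignment.items()})
```

Even though the patients themselves were not randomly sampled, chance alone decides which arm each one lands in.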

38
Sampling Bottom Line
• Assess how subjects were identified and used in
research.
• Researchers often have to make hard choices in
their investigations regarding how to find
subjects for research. Sampling procedures must
be appropriate for the study population.
• Studies are rarely perfect and most have their
own biases; random sampling/assignment can help.
• We seldom get definitive answers, so we make
inferences from the data and analyses we do have.
• Learning statistics will allow you to understand
the assumptions researchers make so that you can
• Thought question: Is a sample of healthy
volunteers ever a good sample to study a drug?

39
Descriptive Statistics
40
Descriptive Statistics
• Descriptive statistics are used to describe the
main features of a collection of data in
quantitative terms.
• Descriptive statistics are distinguished from
inferential stats (we talked about these last
time) in that descriptive statistics
quantitatively summarize a data set, rather than
being used to support inferential statements
• Even when a data analysis draws its main
conclusions using statistical analysis,
descriptive statistics are generally presented
along with more formal analyses, to give the
audience an overall sense of the data being
analyzed.

41
Samples of Descriptive Statistics
• Pharmacy Manpower Trends: http://www.pharmacymanpower.com/trends.jsp
• Research Article: Gabapentin for RLS
• HCV Treatment Study

42
Background Classifying and Organizing Data
• Recall that data are observations, which are the
values of the variables you record.
• 4 Basic Levels of Measurement Scales: Nominal,
Ordinal, Interval and Ratio
• Qualitative scales (Nominal and Ordinal)
• Nominal scale
• Eye color: blue, green, or brown
• No rank or order to the categories
• Presence or absence of a disease
• Gender

43
Background Classifying and Organizing Data
• Ordinal scale
• All the characteristics of a nominal scale, plus
there is a ranking among the categories
• e.g., Mild, Moderate, Severe
• First place, Second place, Third place
• Strongly Agree - - - - Strongly Disagree
• Wong-Baker Faces Scale

44
Measurement Scales
• Quantitative scales
• Interval scale
• Designates an equal-interval ordering
• No true zero point
• The distance between 1 and 2 is the same as the
distance between 49 and 50
• Fahrenheit temperature scale: 0 degrees F does
not mean no temperature
• 60 degrees F is not twice as warm as 30 degrees F
• Ratio scale
• All the above plus a true zero point
• Wealth: $0 means no money
• $100 is twice as much as $50
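
One way to see why ratios are not meaningful on an interval scale is to convert the Fahrenheit readings to Kelvin, which does have a true zero. The sketch below uses the standard conversion formula; the function name is only illustrative.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15

ratio_f = 60 / 30  # looks like "twice as warm" on the Fahrenheit scale
ratio_k = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

print(f"ratio on the Fahrenheit scale = {ratio_f}")
print(f"ratio on the Kelvin scale = {ratio_k:.3f}")
```

The same two temperatures give a ratio of 2 in Fahrenheit but only slightly more than 1 in Kelvin, so "twice as warm" is an artifact of where the Fahrenheit zero happens to sit.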

45
Levels of Measurement
• Defining levels of measurement facilitates the
choice of appropriate statistical techniques for
data analysis
• Nominal → Ordinal → Interval → Ratio
• Increasing ability to use higher level
statistical analyses
• Non-parametric testing is generally performed
with nominal and ordinal level data
• Parametric testing with interval and ratio level data

www.statsoft.com/textbook/stnonpar.htm
46
Quantitative Data
• Interval and Ratio data can further be classified
as
• Discrete data
• Data are in whole numbers and measured by nominal
or ordinal scales
• Number of children, number of times you have been
married, date of birth, etc.
• Continuous data
• Data may (but are not required to) take on
fractional values
• Temperature (37.5 degrees), age, Body Mass Index
(BMI)
• The type of data you have dictates the statistics
you will use.
• Generally, nominal and ordinal levels use non-parametric
stats, and interval and ratio levels use parametric
stats.

47
Some Examples of Descriptive Statistics
48
Cumulative Frequency Polygon
49
Scatter Plot
50
Pie Chart
51
Incorporating the Web into your communication mix
yields strategic benefits
• n = 482
• DTC encourages consumers to look for more
information by going to the Web.
• encourage consumers to talk to their MDs about
• From recent research on DTC ads by Menon,
Desphande and Perri
52
Distributions
• Normal (symmetrical) Distribution (bell shaped)

53
Distributions
• Nonsymmetrical Distribution

54
Distributions
• Bimodal Distribution

55
Summarizing Data
• Descriptive Statistics
• For normally distributed data, measured on
interval and ratio level scales, the appropriate
measure of central tendency is the mean.
• The median is most appropriate for data measured
on ordinal scales (but can still be used for
continuous data)
• Mode is the appropriate measure of central
tendency for nominal data.

56
Measures of Central Tendency
• Mean is calculated by summing all the
observations and dividing the sum by the number
of observations
• Median is the observation that divides the
distribution of data into equal parts
• Mode is the observation that occurs most
frequently

57
Example
• Data: Monthly income of 10 college students
• 300, 375, 485, 500, 600, 625, 1000, 2000,
3000, 3500
• Mean
• (300 + 375 + 485 + 500 + 600 + 625 + 1000 + 2000
+ 3000 + 3500) / 10 = 1238.5
• Median
• average of 600 and 625 = 612.5 (half the data
above, half below.)
• Mode: there is no mode
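
The same calculations can be checked with Python's standard `statistics` module. This is only a sketch of the arithmetic on the slide; the variable names are illustrative. `statistics.multimode` returns every value that ties for most frequent, so when it returns all ten distinct values there is no mode.

```python
import statistics

# Monthly income of 10 college students (from the slide).
incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]

mean_income = statistics.mean(incomes)      # sum of observations / number of observations
median_income = statistics.median(incomes)  # average of the two middle values (600 and 625)
most_common = statistics.multimode(incomes)
has_mode = len(most_common) < len(set(incomes))  # False when every value occurs equally often

print(f"mean = {mean_income}, median = {median_income}, has a mode: {has_mode}")
```

The mean is roughly double the median here because a few high incomes pull it upward.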

58
Measures of Variation
• Range
• Largest value − smallest value
• Sometimes see quartiles (25th and 75th percentiles,
with the median at the 50th percentile)
• Mean Deviation
• Sum of the absolute deviations of each observation
from the mean, divided by sample size; it's
the average deviation of all observations from
the mean
• Variance
• Is computed by squaring each deviation from the
mean, adding them up and dividing their sum by
one less than n
• Standard Deviation
• The square root of the variance
• Note The closer the data are around the mean,
the smaller the standard deviation.
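
A short sketch of these measures, computed step by step on the monthly income data from the earlier example. The variable names are illustrative. The signed deviations from the mean always sum to zero, which is why the mean deviation uses absolute values and the variance uses squares; the standard deviation is the square root of the variance.

```python
import math
import statistics

# Monthly income of 10 college students (from the earlier example slide).
incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]

n = len(incomes)
mean = sum(incomes) / n
data_range = max(incomes) - min(incomes)

deviations = [x - mean for x in incomes]                  # signed deviations sum to zero
mean_abs_deviation = sum(abs(d) for d in deviations) / n  # average distance from the mean
variance = sum(d ** 2 for d in deviations) / (n - 1)      # divide by one less than n
std_dev = math.sqrt(variance)

print(f"range = {data_range}")
print(f"mean absolute deviation = {mean_abs_deviation:.2f}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")

# Cross-check against the standard library's sample formulas.
print(math.isclose(variance, statistics.variance(incomes)))
```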

59
Measures of Variation
• Coefficient of variation
• Not as common as mean, s.d., variance, or range.
• Expressed as a percentage, with higher
percentages indicating greater variation
• Calculated by taking the s.d. and dividing by the
mean, × 100.
• Useful in comparing the amount of variability
between data.
• e.g., not much point in comparing the standard
deviation of HbA1c values with the standard
deviation of blood glucose values because they
are measured on different scales. You could
compare coefficient of variation (percentage) to
see which has the greater variability.
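
A minimal sketch of the coefficient of variation. The HbA1c and blood glucose values below are hypothetical numbers made up for illustration, not data from any study, and the function name is an assumption. Because the standard deviation is divided by the mean, the result has no units and can be compared across measurements on different scales.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements on two different scales.
hba1c_percent = [6.5, 7.0, 7.5, 8.0, 8.5]
glucose_mg_dl = [90, 120, 150, 180, 210]

cv_hba1c = coefficient_of_variation(hba1c_percent)
cv_glucose = coefficient_of_variation(glucose_mg_dl)

print(f"CV of HbA1c = {cv_hba1c:.1f}%")
print(f"CV of glucose = {cv_glucose:.1f}%")
```

In this made-up example the glucose values are relatively more variable, something the raw standard deviations alone could not show because the units differ.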

60
Example of Range: LIPITOR
• Benefit 1: Lower Cholesterol
• Along with diet and exercise, LIPITOR [...]
cholesterol) by 39% to 60%. (The average effect
depends on dose)
• Lower your triglycerides (a
type of fat found in your blood) by 19% to 37%.
(The average effect depends on dose)
• Raise your HDL ("good" cholesterol) by up to 9%. (The
average effect depends on dose)
• Accessed 1/8/08
61
Summary
• The type of data dictates the measure of central
tendency that most accurately represents the
data.
• Sometimes data are best described by summarizing
in a descriptive fashion.
• Otherwise, data are described by a measure of
central tendency and a measure of variation, e.g., mean
and standard deviation.
• Sometimes a combination of both is used.