Basic Statistics Introduction and Overview - PowerPoint PPT Presentation



Basic Statistics Introduction and Overview


Title: Basic Statistics Update Author: student Last modified by: mperri Created Date: 12/20/2007 3:58:19 PM Document presentation format: On-screen Show (4:3) – PowerPoint PPT presentation

Date added: 18 January 2020
Slides: 62
Provided by: Student


Transcript and Presenter's Notes


Basic Statistics Introduction and Overview
  • Matthew Perri, Bs. Pharm., Ph.D., R.Ph.
  • Professor of Pharmacy
  • Director, Pharmacy Care Administration
  • Graduate Program
  • January 2014

There are lies, damn lies and statistics.
  • British Prime Minister Benjamin Disraeli
  • Popularized by Mark Twain

Statistical thinking will one day be as necessary
for efficient citizenship as the ability to read
and write.
  • H.G. Wells

If you know twelve concepts about a given topic
you will look like an expert to people who know
two or three.
  • H.G. Wells

Why should pharmacists understand statistics?
  • Understanding statistics will enable you to draw
    your own conclusions and make decisions
  • Will you recommend this drug to patients or
  • Is the drug likely to work for your patients?
  • Is it better or safer than existing therapies?
  • Should the drug be listed on the formulary or
  • Should there be dispensing limits, refill limits,
    prior authorization, limiting prescribing

Case Study
  • You are a recent Pharm. D. graduate from the
    University of GA. Two of your professors (Drs.
    May and Perri) have served over the last two
    decades on the GA Department of Community Health
    Drug Utilization Review Board (DURB). The DURB
    is the governing body of physicians, pharmacists
    and others that study and select appropriate drug
    therapy for the lives covered by all GA state
    funded health plans (e.g., Medicaid, State
    Merit, Board of Regents, PeachCare). Upon their
    departure from the Board, the Commissioner sought
    input from Drs. May and Perri about who might be
    good to replace them on this body of decision
    makers. Dr. Perri made the recommendation to
    include you on the list of possible candidates
    and you were eventually selected by the
    commissioner. The DURB meets quarterly and prior
    to each meeting a binder is sent to all members
    reviewing the disease states and recent
    literature about the drugs to be reviewed at the
    next meeting.

Essential Concepts and Thoughts
  • Statistics let you make general conclusions from
    limited data.
  • Statistics is not intuitive. (Not easy to
    understand or use.)
  • Statistical conclusions are always presented in
    terms of probability.
  • All statistical tests are based on assumptions.
  • Decisions about how to analyze data should be
    made in advance.
  • A confidence interval quantifies precision and is
    easy to interpret.
  • A P value tests a null hypothesis and is hard to
    understand at first.
  • Statistically significant does not mean the
    effect or phenomenon is large or
    scientifically or clinically important.
  • Not statistically different does not mean the
    effect or phenomenon is absent, small or
    scientifically or clinically irrelevant.
  • Multiple comparisons make it hard to interpret
    statistical results, which is why we have
    statistics to help fix that (ANOVA, range tests).
  • Correlation does not mean causation.
  • Published statistics tend to be optimistic.

Statistics is just another language.
  • What we hope to do here is to teach you the
    basics needed to navigate evaluation of research.
  • Things you might need to know in a
    Spanish-speaking country:
  • ¿Dónde está el cuarto de baño, por favor? (Where
    is the bathroom, please?)
  • Déme una cerveza, por favor. (Give me a beer,
    please.)
  • ¿Dónde está la biblioteca? (Where is the library?)
  • In statistics:
  • Were the data normally distributed?
  • What was the mean? Standard deviation?

Case Study
  • General QUESTION
  • They ring the doorbell.
  • They knock.
  • They stand outside, studying kinetics, until you
    open the door for your own reasons.

A Possible Investigation
  • Possible research questions
  • Data Sources?
  • How do people knock on someone's door?
  • How many times do they knock?
  • Do people speak when they knock?
  • Search literature and review/compile the results
    of previous studies on this subject
  • Survey people and ask them how they knock
  • Observe people as they knock and record data

Study 1 American Knocking Practices
  • Questions/Propositions
  • People generally approach a residence and knock
    when they wish to enter.
  • Describe how people knock when at someone's door.
  • Method
  • Review available data
  • Design survey, experiment, interviews or some
  • Database
  • Sample http//

  • Descriptive Statistics
  • Number of events observed (also known as n or
    sample size) was 35.
  • Sheldon knocked between 0 and 30,000
    (self-reported) times when approaching Penny's
  • He used 1, 2, 6 and 30,000 knocks one time each.
    (The 1 was the robot)
  • He knocked for Leonard, then Penny, 5 times, with
    one instance where he knocked for Penny first.
  • Penny knocked one time on Sheldon's door; in this
    case she knocked three times.
  • In one instance, he knocked, then approached an
    interior door where he knocked a second time.
  • Parametric Statistics
  • The average number of knocks was 860.06 (mean)
  • The most common number of knocks was 3 (mode)
  • The median number of knocks was 3 (1, 2, 3, 6,
  • The standard deviation of the number of
    knocks was 4997.46
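
The summary figures above can be checked with a short sketch. The slide does not list the raw observations, so the data below are an assumption: 31 events of three knocks plus the four one-off counts (1, 2, 6 and 30,000) mentioned above, giving n = 35. Under that assumption the reported 4997.46 matches the population standard deviation (dividing by n), not the sample standard deviation (dividing by n - 1).

```python
import statistics

# Assumed reconstruction of the 35 observations (not listed on the slide):
# four one-off counts plus 31 events of three knocks.
knocks = [1, 2, 6, 30_000] + [3] * 31

n = len(knocks)
mean = statistics.mean(knocks)
median = statistics.median(knocks)
mode = statistics.mode(knocks)
pop_sd = statistics.pstdev(knocks)    # divides by n
sample_sd = statistics.stdev(knocks)  # divides by n - 1

print(f"n={n} mean={mean:.2f} median={median} mode={mode}")
print(f"population SD={pop_sd:.2f} sample SD={sample_sd:.2f}")
```

A single extreme value dominates both the mean and the standard deviation, while the median and mode stay at 3.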

  • Without any other information, which of the
    following can we infer
  • In this sample, three knocks were used to alert
    the resident that someone was at the door.
  • People in general knock three times.
  • Knocking three times is always effective in
    getting someone to answer the door.
  • Tony Orlando and Dawn ( http//
    ch?vk7Jvsbcxunc ) were wrong in the 70s when
    they concluded that
  • You should knock three times on the ceiling
  • You should knock twice on the pipe if the answer
    is no
  • In our data, knocks were always associated with
    the calling out of a name and this process was
  • If someone is at your door and they knock three
    times, followed by your name three times, and
    this is repeated three times, it is likely to be
  • Sheldon has issues.

Let's take one of these conclusions and explore
it more thoroughly from a statistical
  • People in general knock 3 times.
  • How would our results have changed if we had seen
    only a subset of the data (a smaller sample
    size)? For example, if we had missed the Flash,
    how would the results have changed?
  • The average number of knocks was 3 (mean)
  • The most common number of knocks was 3 (mode)
  • The median number of knocks was 3 (1, 2, 3, 6,
  • The standard deviation of the number of
    knocks was 0.641689
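
A minimal sketch of this comparison follows. The raw data are not on the slide, so the lists are an assumption: 31 three-knock events plus the one-off counts 1, 2, 6 and 30,000, with the 30,000-knock event dropped for the subset. The helper name `summarize` is hypothetical.

```python
import statistics

def summarize(values):
    """Return (n, mean, median, mode, population SD) for a list of counts."""
    return (
        len(values),
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
        statistics.pstdev(values),
    )

full = [1, 2, 6, 30_000] + [3] * 31                 # assumed full sample
without_outlier = [x for x in full if x != 30_000]  # the subset seen

for label, data in (("full", full), ("without 30,000", without_outlier)):
    n, mean, median, mode, sd = summarize(data)
    print(f"{label:>15}: n={n} mean={mean:.2f} median={median} "
          f"mode={mode} SD={sd:.6f}")
```

Removing one observation out of 35 leaves the median and mode unchanged but changes the mean and standard deviation by orders of magnitude.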

Direction for future research
  • Good research always poses new questions.
  • Additional research questions for this example
  • Is there a time when two knocks are sufficient?
  • Are mechanical/technological means of knocking
    just as effective as in person knocking?
  • How hard would it be to find a new apartment?

Statistics and Biostatistics
  • Statistics
  • Techniques and procedures regarding the
    collection, organization, analysis,
    interpretation and presentation of information
    that can be stated numerically (Kuzma and
    Bohnenblust, 2004)
  • Biostatistics
  • Application of statistics to the biomedical

Types of Statistics
  • Descriptive Statistics
  • Sometimes, formal statistical analyses are not
    needed or desired, depending on the research
    questions. Descriptive stats tell us something
    about a phenomenon or population
  • Number of drug overdose fatalities in 2013
  • Pharmacy student acceptance rate at UGA College
    of Pharmacy
  • Demographics of a study population (2)
  • Numbers of patients experiencing an adverse
    reaction to a medication.
  • Consumer awareness of advertising.

Types of Statistics
  • Inferential Statistics
  • Observed information is incomplete and uncertain,
    so we can't know for sure; instead, we infer.
  • Drawing conclusions based on observed
  • Generalizing from the specifics (as is done in
    most clinical research).
  • Example
  • Once-daily aminoglycoside (ODA) regimens have
    been studied.
  • When done in one location, e.g., Athens
    Regional, what, if anything can or should we
    infer, or generalize to other patient groups?
  • What about a different dose? Would these results
    still apply?

  • Variables vs. Data
  • Survey vs. Experiment
  • Population vs. Samples
  • Response Rate
  • Sampling Techniques

Variables vs. Data
  • When making a gentamicin dosing recommendation,
    you need to understand the patient's
    characteristics, such as age, weight and height.
  • In statistics, patient characteristics are
    referred to as variables (e.g., Systolic Blood
    Pressure) because the observed values change.
  • The actual values of the characteristics
    (variables) recorded are referred to as data
    (e.g., 115 mmHg)

Survey vs. Experiment
  • Surveys
  • Observations of events or phenomena over which
    few, if any, controls are imposed i.e., teaching
  • Teaching evaluations, political opinion polls,
    satisfaction studies are all examples of survey
  • Experiments
  • Design a research plan that manipulates, for
    example, dosage, e.g., 50mg drug A v. 100mg or
  • Studying the effects on health outcomes before
    and after limiting formulary access to
    antipsychotic agents in GA Medicaid.
  • Studying two doses of a new drug for toxicity.

Survey vs. Experiment
  • Both surveys and experiments are important
    research designs
  • FDA requires all drugs submitted for approval to
    be evaluated by experimental research to
    substantiate their safety and efficacy
  • However, survey design is often used in
    post-marketing surveillance for monitoring safety

Population vs. Samples
  • A population is a set of persons (or objects)
    having a common observable characteristic
  • A sample is a subset of a population
  • The goal is for this subset to be as
    representative of the population as possible.
  • Example
  • The US population was 317,330,434 as of 8:30 AM
    January 8, 2014. (1)
  • The CBS News Poll surveyed a sample of 808 adults
    to assess preferences for presidential

(1) http//
  • If you wanted to study all insulin-dependent
    diabetics, is there any way you could create a
    list of all insulin dependent diabetics from
    which to draw a sample?
  • You can collect a random sample of patients who
    generally represent the population in question,
    then draw inferences from this group and
    generalize your results to all insulin-dependent
    diabetics based on how well your sample mimics
    the entire population. (Note: what assumption
    does this require you to make?)

Are 2nd Year Pharmacy Students at UGA COP an
example of a population or a sample?
It depends
  • 2nd Year Rx Students are a sample (but probably
    not random, which we will talk about in a
    minute) of many populations, such as all pharmacy
    students at UGA, all pharmacy students in the US,
    students at UGA, etc., or even a sample of the
    US population. However, they are also the total
    population of 2nd year pharmacy students at UGA
    COP. Answering questions about a sample requires
    you to know the perspective you are taking.

  • Sampling nomenclature is important to
    understanding research design and to evaluating
    studies. The goal in evaluation of sampling
    methods is to make sure the right population was
    sampled for the study and the sample was
    created properly.
  • We don't want to accidentally observe the
    Sheldons of the world.

Sampling terms
  • Sampling frame
  • a complete, non-overlapping list of the persons
    or objects in the population.
  • e.g., to draw a sample of GA pharmacists, we
    could use the database of all registered GA
    pharmacists as a sampling frame
  • Hard to develop a sampling frame for studying
    patients with asthma, or any condition for that
    matter. This makes finding a representative
    sample very important.
  • Random sampling is the primary method of
    obtaining a sample that is representative of a
    larger population, and it is an issue that can
    have a huge impact on study results.

Random Samples
  • Random Sample
  • Sample units are chosen in an unpredictable way
  • e.g., using a random number table, putting all
    the names in a hat
  • Types
  • Simple random sample: all members have an equal
    chance of selection.
  • Cluster: units are selected in groups such as
    geographic area (Northeast, Southeast, Central,
    West) then a random sample is created in each
  • Stratified: choosing sub-groups or strata
    (e.g., race, gender, age group, education) within
    a population and sampling from within these

Random Sample
  • Same as putting ALL the names of a population in
    a hat, mix them up, and select however many names
    you want.
  • Note, it must be all the names and each has the
    same chance of being selected.
  • Advantages
  • Avoids known and unknown biases on average
  • Helps convince others that the study was
    conducted properly
  • It is the basis for statistical theory that
    underlies hypothesis testing and confidence

Other Sampling Techniques
  • You may see other techniques used in bio-medical
  • Convenience Sample
  • e.g., intercepting patients after having a
    prescription filled at a local community pharmacy
    or shopping mall.
  • Systematic sampling
  • e.g., take a phone book and pick a random place
    to start, then take every 9th name in the book.
  • Stratified sampling
  • Cluster sampling
  • Others, e.g., snowball sampling (which is kind of
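
As a rough illustration of the systematic sampling example above (a random starting place, then every 9th name), here is a minimal sketch. The phone book is a made-up list of 90 names, and `systematic_sample` is a hypothetical helper, not a library function.

```python
import random

def systematic_sample(frame, step, rng):
    """Pick a random start within the first `step` entries, then take
    every `step`-th entry from there to the end of the list."""
    start = rng.randrange(step)
    return frame[start::step]

phone_book = [f"name_{i:03d}" for i in range(90)]  # hypothetical list
picked = systematic_sample(phone_book, step=9, rng=random.Random(1))

print(picked)
```

Unlike a simple random sample, only the starting point is random here, so any pattern in the ordering of the list can carry over into the sample.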

Convenience Samples
  • Often used when it is virtually impossible to
    select a random sample
  • Underlying assumption is that the sample will
    accurately represent the population
  • Example: To estimate the average PCAT score for
    pharmacy students in the US, would you
  • Use UGA Class of 2014 pharmacy students as a
    study sample and survey some number of students?
    While we might do this we have to ask, how
    representative would this actually be?
  • Use multiple pharmacy schools?
  • In a clinical trial, we might recruit patients
    from multiple doctors' offices to get a better

Stratified Sample
  • Grouping members of the population into
    homogeneous groups.
  • Strata should be mutually exclusive, subjects can
    be in only one stratum, no group should be
  • Then, use random or systematic sampling to identify
    subjects in each stratum.
  • Can be proportional or not.
  • Proportional: If the population consists of 60%
    in the male stratum and 40% in the female
    stratum, then the relative size of the two
    samples (three males, two females) should reflect
    this proportion.
  • Sometimes this is used in medical research, e.g.,
    where you want to study patients with certain
    characteristics: obesity, gender, pregnancy,
    past history of disease, etc.
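
The proportional case can be sketched as follows, assuming a made-up population of 60 males and 40 females and a total sample of five. The function name `proportional_stratified_sample` is hypothetical.

```python
import random

def proportional_stratified_sample(strata, total, rng):
    """Draw a simple random sample from each stratum, sized in
    proportion to that stratum's share of the population."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 60 males and 40 females.
strata = {
    "male": [f"M{i}" for i in range(60)],
    "female": [f"F{i}" for i in range(40)],
}
chosen = proportional_stratified_sample(strata, total=5, rng=random.Random(7))

print({name: len(picked) for name, picked in chosen.items()})
```

With other stratum sizes the rounded counts may not add up exactly to the requested total, so a real design would need a rule for handling the remainder.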

  • Why is random sampling less prone to bias than
    convenience sampling?
  • Think about how we selected our convenience
    sample of events from YouTube.
  • Does using a random sample guarantee a
    representative sample?

Response Rate / Bias
  • Similar meanings clinically and statistically.
    Clinically it is how many patients responded in a
    certain manner.
  • Consider a random sample of college students in
    the US. You sent out a questionnaire to these
    students to assess how frequently college
    students skip classes.
  • The response rate is how many (usually as a %)
    students completed and returned the
  • Is a 50% response rate good enough?
  • Generally, the higher the response rate, the more
    representative the sample, but extremely high
    response rates may not always be required.
  • Is there any potential for bias in a study like

Sampling Bias
  • Sampling bias exists if the sample of data you
    received is not representative of the
    population, e.g., studied only a certain age
    group when all age groups were of concern.
  • In our previous example, bias may occur if students
    who returned the questionnaire are somehow
    inherently different from those who did not.
  • e.g., one could infer that more diligent students
    are more likely to respond than less studious

Clinical Trials
  • Clinical trials often employ a non-random sample;
    they do, however, use random assignment of
    patients to groups (arms) within the study.
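
Random assignment is a different step from random sampling, and it can be sketched as below. The patient identifiers and arm names are made up, and `randomize_to_arms` is a hypothetical helper showing simple (unblocked) randomization only.

```python
import random

def randomize_to_arms(patients, arms, rng):
    """Shuffle a copy of the patient list, then deal patients out to the
    arms in turn so that arm sizes differ by at most one."""
    shuffled = patients[:]
    rng.shuffle(shuffled)
    return {arm: shuffled[i::len(arms)] for i, arm in enumerate(arms)}

# Hypothetical, non-randomly recruited patients (e.g., from a few clinics).
patients = [f"patient_{i:02d}" for i in range(1, 21)]
assignment = randomize_to_arms(patients, ["drug A", "placebo"], random.Random(42))

for arm, members in assignment.items():
    print(arm, len(members), members[:3])
```

Even though the recruited patients are not a random sample, chance alone decides who ends up in which arm, which helps balance known and unknown characteristics between the groups.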

Sampling Bottom Line
  • Assess how subjects were identified and used in
  • Researchers often have to make hard choices in
    their investigations regarding how to find
    subjects for research. Sampling procedures must
    be appropriate for the study population.
  • Studies are rarely perfect and most have their
    own biases; random sampling/assignment can help.
  • We seldom get definitive answers, so we make
    inferences from the data and analyses we do have.
  • Learning statistics will allow you to understand
    the assumptions researchers make so that you can
    make your best professional judgment.
  • Thought question: Is a sample of healthy
    volunteers ever a good sample to study a drug?

Descriptive Statistics
  • Descriptive statistics are used to describe the
    main features of a collection of data in
    quantitative terms.
  • Descriptive statistics are distinguished from
    inferential stats (we talked about these last
    time) in that descriptive statistics
    quantitatively summarize a data set, rather than
    being used to support inferential statements
    about the population in question.
  • Even when a data analysis draws its main
    conclusions using statistical analysis,
    descriptive statistics are generally presented
    along with more formal analyses, to give the
    audience an overall sense of the data being

Samples of Descriptive Statistics
  • Pharmacy Manpower Trends http//www.pharmacymanp
  • Research Article Gabapentin for RLS

HCV Treatment Study
Background: Classifying and Organizing Data
  • Recall that data are observations: the
    values of the variables you record.
  • 4 Basic Levels of Measurement Scales: Nominal,
    Ordinal, Interval and Ratio
  • Qualitative scales (Nominal and Ordinal)
  • Nominal scale
  • Eye color: blue, green, or brown
  • No rank or order to the categories
  • Presence or absence of a disease
  • Gender

Background: Classifying and Organizing Data
  • Ordinal scale
  • All the characteristics of a nominal scale, plus
    there is a ranking among the categories
  • e.g., Mild, Moderate, Severe
  • First place, Second place, Third place
  • Strongly Agree - - - - Strongly Disagree
  • Wong-Baker Faces Scale

Measurement Scales
  • Quantitative scales
  • Interval scale
  • Designates an equal-interval ordering
  • No true zero point
  • The distance between 1 and 2 is the same as the
    distance between 49 and 50
  • Fahrenheit temperature scale: 0 degrees F does
    not mean no temperature
  • 60 degrees F is not twice as warm as 30 degrees
  • Ratio scale
  • All the above plus a true zero point
  • Wealth: $0 means no money
  • $100 is twice as much as $50
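
One way to see why ratios are not meaningful on an interval scale is to convert the Fahrenheit readings to Kelvin, which has a true zero. This is only an illustrative sketch; the function name is an assumption, and the conversion formula is the standard one.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15

# On the Fahrenheit (interval) scale the raw numbers suggest "twice as warm".
naive_ratio = 60 / 30

# On the Kelvin (ratio) scale the same two temperatures are only about
# 6% apart.
true_ratio = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

# Money is a ratio scale: zero means none, so ratios are meaningful.
wealth_ratio = 100 / 50

print(f"naive={naive_ratio:.2f} kelvin={true_ratio:.3f} wealth={wealth_ratio:.2f}")
```

Differences (60 minus 30 degrees) are meaningful on both scales; only the ratio depends on where zero sits.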

Levels of Measurement
  • Defining levels of measurement facilitates the
    choice of appropriate statistical techniques for
    data analysis
  • Nominal → Ordinal → Interval → Ratio
  • Increasing ability to use higher level
    statistical analyses
  • Non-parametric testing is generally performed
    with nominal and ordinal level data
  • Parametric testing with interval and ratio

Quantitative Data
  • Interval and Ratio data can further be classified
  • Discrete data
  • Data take only whole-number values (counts)
  • Number of children, number of times you have been
    married, date of birth, etc.
  • Continuous data
  • Data may (but are not required to) take on
    fractional values
  • Temperature (37.5 degrees), age, Body Mass Index
  • The type of data you have dictates the statistics
    you will use.
  • Generally, nominal and ordinal levels use
    non-parametric and interval and ratio levels use
    parametric

Some Examples of Descriptive Statistics
Cumulative Frequency Polygon
Scatter Plot
Pie Chart
Incorporating the Web into your communication mix
yields strategic benefits
DTC encourages consumers to look for more
information by going to the Web.
Searching the web for more information will
encourage consumers to talk to their MDs about
advertised Rxs
From recent research on DTC ads by Menon,
Deshpande and Perri
  • Normal (symmetrical) Distribution (bell shaped)

  • Nonsymmetrical Distribution

  • Bimodal Distribution

Summarizing Data
  • Descriptive Statistics
  • For normally distributed data, measured on
    interval and ratio level scales, the appropriate
    measure of central tendency is the mean.
  • The median is most appropriate for data measured
    on ordinal scales (but can still be used for
    continuous data)
  • Mode is the appropriate measure of central
    tendency for nominal data.

Measures of Central Tendency
  • Mean is calculated by summing all the
    observations and dividing the sum by the number
    of observations
  • Median is the observation that divides the
    distribution of data into two equal halves
  • Mode is the observation that occurs most
    frequently

  • Data: monthly income of 10 college students
  • 300, 375, 485, 500, 600, 625, 1000, 2000,
    3000, 3500
  • Mean
  • (300 + 375 + 485 + 500 + 600 + 625 + 1000 + 2000
    + 3000 + 3500) / 10 = 1238.5
  • Median
  • average of 600 and 625 = 612.5 (half the data
    above, half below.)
  • Mode: there is no mode
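
The worked example above can be checked with Python's standard library. The "no mode" rule used here (no value occurs more than once) is the convention of the slide, made explicit in the code.

```python
import statistics
from collections import Counter

incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)  # average of the two middle values

# Treat the data as having no mode when no value is repeated.
counts = Counter(incomes)
highest = max(counts.values())
modes = [v for v, c in counts.items() if c == highest] if highest > 1 else []

print(f"mean={mean_income} median={median_income} modes={modes}")
```

The mean (1238.5) is about twice the median (612.5) because the three largest incomes pull it upward, which is why the median is often preferred for skewed data.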

Measures of Variation
  • Range
  • Largest value minus smallest value
  • Sometimes see quartiles (the 25th and 75th
    percentiles, with the median at the 50th percentile)
  • Mean Deviation
  • Sum of the absolute deviations of each observation
    from the mean, divided by sample size; it is
    the average deviation of all observations from
    the mean. (The standard deviation, by contrast,
    is the square root of the variance.)
  • Variance
  • Is computed by squaring each deviation from the
    mean, adding them up and dividing their sum by
    one less than n
  • Note: The closer the data are around the mean,
    the smaller the standard deviation.
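
These measures can be computed step by step, reusing the monthly income data from the earlier example. This is a sketch of the textbook definitions, with the variance dividing by one less than n as described above; the variable names are assumptions.

```python
import math
import statistics

incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]
n = len(incomes)
mean = sum(incomes) / n

value_range = max(incomes) - min(incomes)
deviations = [x - mean for x in incomes]

# Signed deviations cancel out, which is why they are made absolute
# (mean deviation) or squared (variance) before averaging.
sum_of_deviations = sum(deviations)

mean_abs_deviation = sum(abs(d) for d in deviations) / n
variance = sum(d ** 2 for d in deviations) / (n - 1)  # divide by n - 1
std_dev = math.sqrt(variance)                         # square root of variance

print(f"range={value_range} mean deviation={mean_abs_deviation:.1f}")
print(f"variance={variance:.1f} SD={std_dev:.1f}")
```

The hand-computed variance agrees with `statistics.variance`, which uses the same n - 1 divisor.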

Measures of Variation
  • Coefficient of variation
  • Not as common as mean, s.d., variance, or range.
  • Expressed as a percentage, with higher
    percentages indicating greater variation
  • Calculated by dividing the s.d. by the mean and
    multiplying by 100.
  • Useful in comparing the amount of variability
    between data.
  • e.g., not much point in comparing the standard
    deviation of HbA1c values with the standard
    deviation of blood glucose values because they
    are measured on different scales. You could
    compare coefficient of variation (percentage) to
    see which has the greater variability.
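
A minimal sketch of that comparison follows. The HbA1c and blood glucose values are made up purely for illustration (they are not from the source), and `coefficient_of_variation` is a hypothetical helper built on the sample standard deviation.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Illustrative, made-up measurements on two different scales.
hba1c_percent = [6.8, 7.2, 7.5, 8.1, 9.0]
glucose_mg_dl = [110, 145, 160, 190, 240]

cv_hba1c = coefficient_of_variation(hba1c_percent)
cv_glucose = coefficient_of_variation(glucose_mg_dl)

print(f"SD: HbA1c={statistics.stdev(hba1c_percent):.2f} "
      f"glucose={statistics.stdev(glucose_mg_dl):.2f}")
print(f"CV: HbA1c={cv_hba1c:.1f}% glucose={cv_glucose:.1f}%")
```

Because the coefficient of variation is unitless, changing the units of measurement (for example, multiplying every value by 10) leaves it unchanged, whereas the standard deviation scales with the units.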

Example of Range: LIPITOR Benefit 1: Lower Cholesterol
Along with diet and exercise, LIPITOR is proven to help you:
  • Lower your LDL ("bad" cholesterol) by 39% to 60%.
    (The average effect depends on dose)
  • Lower your triglycerides (a type of fat found in
    your blood) by 19% to 37%. (The average effect
    depends on dose)
  • Raise your HDL ("good" cholesterol) by up to 9%.
    (The average effect depends on dose)
HBX_OU50o231273701663762220 accessed 1/8/08
  • The type of data dictates the measure of central
    tendency that most accurately represents the
  • Sometimes data are best described by summarizing
    in a descriptive fashion.
  • Otherwise, data are described by a measure of
    central tendency and a measure of variation, e.g.,
    mean and standard deviation.
  • Sometimes a combination of both is used.
  • More information about your sample is better when
    it comes to informing those who may want to draw
    conclusions from your work.