Basic Statistics: Introduction and Overview

- Matthew Perri, B.S. Pharm., Ph.D., R.Ph.
- Professor of Pharmacy
- Director, Pharmacy Care Administration
- Graduate Program
- January 2014

There are lies, damn lies and statistics.

- British Prime Minister Benjamin Disraeli
- Popularized by Mark Twain

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.

- H.G. Wells

If you know twelve concepts about a given topic you will look like an expert to people who know two or three.

- H.G. Wells

Why should pharmacists understand statistics?

- Understanding statistics will enable you to draw your own conclusions and make decisions:
- Will you recommend this drug to patients or physicians?
- Is the drug likely to work for your patients?
- Is it better or safer than existing therapies?
- Should the drug be listed on the formulary or PDL?
- Should there be dispensing limits, refill limits, prior authorization, or limits on prescribing authority?

Case Study

- You are a recent Pharm.D. graduate from the University of GA. Two of your professors (Drs. May and Perri) have served over the last two decades on the GA Department of Community Health Drug Utilization Review Board (DURB). The DURB is the governing body of physicians, pharmacists and others that studies and selects appropriate drug therapy for the lives covered by all GA state-funded health plans (e.g., Medicaid, State Merritt, Board of Regents, Peach Care). Upon their departure from the Board, the Commissioner sought input from Drs. May and Perri about who might be good to replace them on this body of decision makers. Dr. Perri recommended including you on the list of possible candidates, and you were eventually selected by the Commissioner. The DURB meets quarterly, and prior to each meeting a binder is sent to all members reviewing the disease states and recent literature about the drugs to be reviewed at the next meeting.

Essential Concepts and Thoughts

- Statistics let you make general conclusions from limited data.
- Statistics is not intuitive. (Not easy to understand or use.)
- Statistical conclusions are always presented in terms of probability.
- All statistical tests are based on assumptions.
- Decisions about how to analyze data should be made in advance.
- A confidence interval quantifies precision and is easy to interpret.
- A P value tests a null hypothesis and is hard to understand at first.
- Statistically significant does not mean the effect or phenomenon is large or scientifically or clinically important.
- Not statistically different does not mean the effect or phenomenon is absent, small, or scientifically or clinically irrelevant.
- Multiple comparisons make it hard to interpret statistical results, which is why we have statistics to help fix that. (ANOVA, range tests)
- Correlation does not mean causation.
- Published statistics tend to be optimistic.

Statistics is just another language.

- What we hope to do here is to teach you the basics needed to navigate evaluation of research.
- Things you might need to know in a Spanish-speaking country:
- ¿Dónde está el cuarto de baño, por favor? (Where is the bathroom, please?)
- Déme una cerveza, por favor. (Give me a beer, please.)
- ¿Dónde está la biblioteca? (Where is the library?)
- In statistics:
- Were the data normally distributed?
- What was the mean? Standard deviation?

Case Study

- General QUESTION
- How do people let you know they are at your door and want to come in?
- ANSWERS
- They ring the doorbell.
- They knock.
- They stand outside, studying kinetics, until you open the door for your own reasons.

A Possible Investigation

- Possible research questions
- How do people knock on someone's door?
- How many times do they knock?
- Do people speak when they knock?
- Data sources?
- Search the literature and review/compile the results of previous studies on this subject
- Survey people and ask them how they knock
- Observe people as they knock and record data

Study 1: American Knocking Practices

- Questions/Propositions
- People generally approach a residence and knock when they wish to enter.
- Describe how people knock when at someone's door.
- Method
- Review available data
- Design survey, experiment, interviews or some combination.
- Database
- Sample: http://www.youtube.com/watch?v=tKV4XYD3xK4

Results

- Descriptive Statistics
- Number of events observed (also known as n or sample size) was 35.
- Sheldon knocked between 0 and 30,000 (self-reported) times when approaching Penny's door.
- He used 1, 2, 6 and 30,000 knocks each one time. (The 1 was the robot)
- He knocked for Leonard, then Penny, 5 times, with one instance where he knocked for Penny first.
- Penny knocked one time on Sheldon's door; in this case she knocked three times.
- In one instance, he knocked, then approached an interior door where he knocked a second time.
- Parametric Statistics
- The average number of knocks was 860.06 (mean)
- The most common number of knocks was 3 (mode)
- The median number of knocks was 3 (1, 2, 3, 6, 30000)
- The standard deviation of the number of knocks was 4997.46
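
The figures above can be reproduced with a short sketch. The raw observations are not listed, so this assumes the 35 events were 31 three-knock events plus the single events of 1, 2, 6 and 30,000 knocks, a reconstruction that is consistent with the reported mean and mode. The reported standard deviation matches the population formula (dividing by n), so `statistics.pstdev` is used here rather than the sample version.

```python
import statistics

# Assumed reconstruction of the 35 observed knocking events:
# 31 events of 3 knocks plus one event each of 1, 2, 6 and 30,000 knocks.
knocks = [3] * 31 + [1, 2, 6, 30000]

n = len(knocks)
mean = statistics.mean(knocks)
median = statistics.median(knocks)
mode = statistics.mode(knocks)
# Population standard deviation (divides by n rather than n - 1).
sd = statistics.pstdev(knocks)

print(f"n = {n}")
print(f"mean = {mean:.2f}")
print(f"median = {median}")
print(f"mode = {mode}")
print(f"standard deviation = {sd:.2f}")
```

A single extreme observation pulls the mean far from the typical value, while the median and mode stay at 3.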

Results

- Without any other information, which of the following can we infer?
- In this sample, three knocks were used to alert the resident that someone was at the door.
- People in general knock three times.
- Knocking three times is always effective in getting someone to answer the door.
- Tony Orlando and Dawn (http://www.youtube.com/watch?v=k7Jvsbcxunc) were wrong in the '70s when they concluded that:
- You should knock three times on the ceiling
- You should knock twice on the pipe if the answer is no
- In our data, knocks were always associated with the calling out of a name and this process was repeated.
- If someone is at your door and they knock three times, followed by your name three times, and this is repeated three times, it is likely to be Sheldon.
- Sheldon has issues.

Let's take one of these conclusions and explore it more thoroughly from a statistical perspective.

- People in general knock 3 times.
- How would our results have changed if we had seen only a subset of the data (a smaller sample size)? For example, what if we missed the flash? How would the results have changed?
- The average number of knocks was 3 (mean)
- The most common number of knocks was 3 (mode)
- The median number of knocks was 3 (1, 2, 3, 6)
- The standard deviation of the number of knocks was 0.641689
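
How much the single 30,000-knock event drives the results can be sketched by computing the same summaries with and without it. The raw data are an assumption (31 events of 3 knocks plus single events of 1, 2 and 6 knocks, with or without the 30,000), chosen because they are consistent with the reported figures; `summarize` is a helper name used only for this illustration.

```python
import statistics

def summarize(values):
    """Return n, mean, median, mode and population SD for a list of counts."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.pstdev(values),  # divides by n, as the reported SD appears to
    }

# Assumed raw data, not listed in the source.
without_outlier = [3] * 31 + [1, 2, 6]
with_outlier = without_outlier + [30000]

full = summarize(with_outlier)
subset = summarize(without_outlier)

for label, result in (("all 35 events", full), ("without the 30,000", subset)):
    print(label, {key: round(value, 6) for key, value in result.items()})
```

The median and mode are identical in both runs; only the mean and standard deviation move, which is why they are the measures most sensitive to outliers.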

Direction for future research

- Good research always poses new questions.
- Additional research questions for this example:
- Is there a time when two knocks are sufficient?
- Are mechanical/technological means of knocking just as effective as in-person knocking?
- How hard would it be to find a new apartment?

Statistics and Biostatistics

- Statistics
- Techniques and procedures regarding the collection, organization, analysis, interpretation and presentation of information that can be stated numerically (Kuzma and Bohnenblust, 2004)
- Biostatistics
- Application of statistics to the biomedical sciences

Types of Statistics

- Descriptive Statistics
- Sometimes, formal statistical analyses are not needed or desired, depending on the research questions. Descriptive stats tell us something about a phenomenon or population:
- Number of drug overdose fatalities in 2013
- Pharmacy student acceptance rate at UGA College of Pharmacy
- Demographics of a study population (2)
- Numbers of patients experiencing an adverse reaction to a medication.
- Consumer awareness of advertising.

Types of Statistics

- Inferential Statistics
- Observed information is incomplete and uncertain, so we can't know for sure; instead we infer.
- Drawing conclusions based on observed information.
- Generalizing from the specifics (as is done in most clinical research).
- Example
- Once-daily aminoglycoside (ODA) regimens have been studied.
- When done in one location, e.g., Athens Regional, what, if anything, can or should we infer, or generalize to other patient groups?
- What about a different dose? Would these results still apply?

Terms

- Variables vs. Data
- Survey vs. Experiment
- Population vs. Samples
- Response Rate
- Sampling Techniques

Variables vs. Data

- When making a gentamicin dosing recommendation, you need to understand the patient's characteristics, such as age, weight and height.
- In statistics, patient characteristics are referred to as variables (e.g., Systolic Blood Pressure) because the observed values change.
- The actual values of the characteristics (variables) recorded are referred to as data (e.g., 115 mmHg)

Survey vs. Experiment

- Surveys
- Observations of events or phenomena over which few, if any, controls are imposed
- Teaching evaluations, political opinion polls, and satisfaction studies are all examples of survey research.
- Experiments
- Design a research plan that manipulates, for example, dosage, e.g., 50 mg drug A vs. 100 mg or placebo
- Studying the effects on health outcomes before and after limiting formulary access to antipsychotic agents in GA Medicaid.
- Studying two doses of a new drug for toxicity.

Survey vs. Experiment

- Both surveys and experiments are important research designs
- FDA requires all drugs submitted for approval to be evaluated by experimental research to substantiate their safety and efficacy
- However, survey design is often used in post-marketing surveillance for monitoring safety

Population vs. Samples

- A population is a set of persons (or objects) having a common observable characteristic
- A sample is a subset of a population
- The goal is for this subset to be as representative of the population as possible.
- Example
- The US population was 317,330,434 as of 8:30 AM January 8, 2014. (1)
- The CBS News Poll surveyed a sample of 808 adults to assess preferences for presidential candidates.

(1) http://www.census.gov/main/www/popclock.html

Consider

- If you wanted to study all insulin-dependent diabetics, is there any way you could create a list of all insulin-dependent diabetics from which to draw a sample?
- You can create / collect a random sample of patients who generally represent the population in question, then draw inferences from this group and generalize your results to all insulin-dependent diabetics based on how well your sample mimics the entire population. (Note: what assumption does this require you to make?)

Are 2nd Year Pharmacy Students at UGA COP an example of a population or a sample?

It depends

- 2nd Year Rx Students are a sample (but probably not random, which we will talk about in a minute) of many populations, such as all pharmacy students at UGA, all pharmacy students in the US, students at UGA, etc., or even a sample of the US population. However, they are also the total population of 2nd year pharmacy students at UGA COP. Answering questions about a sample requires you to know the perspective you are taking.

Sampling

- Sampling nomenclature is important to understanding research design and to evaluating studies. The goal in evaluation of sampling methods is to make sure the right population was sampled for the study and the sample was created properly.
- We don't want to accidentally observe the Sheldons of the world.

Sampling terms

- Sampling frame
- A complete, non-overlapping list of the persons or objects in the population.
- e.g., if we want to draw a sample of GA pharmacists, we could use the database of all registered GA pharmacists as a sampling frame
- It is hard to develop a sampling frame for studying patients with asthma, or any condition for that matter. This makes finding a representative sample very important.
- Random sampling is the primary method of obtaining a sample that is representative of a larger population, and an issue which can have a huge impact on study results.

Random Samples

- Random Sample
- Sample units are chosen in an unpredictable way
- e.g., using a random number table, putting all the names in a hat
- Types
- Simple random sample: all members have an equal chance of selection.
- Cluster: units are selected in groups such as geographic area (Northeast, Southeast, Central, West), then a random sample is created in each area.
- Stratified: choosing sub-groups or strata (e.g., race, gender, age group, education) within a population and sampling from within these groups.

Random Sample

- Same as putting ALL the names of a population in a hat, mixing them up, and selecting however many names you want.
- Note, it must be all the names and each has the same chance of being selected.
- Advantages
- Avoids known and unknown biases on average
- Helps convince others that the study was conducted properly
- It is the basis for statistical theory that underlies hypothesis testing and confidence intervals
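
The names-in-a-hat procedure can be sketched with Python's standard `random` module. The sampling frame below is a hypothetical list of ID numbers standing in for a complete list of the population; `random.Random.sample` draws without replacement, so every member has the same chance of selection. The seed is fixed only to make the illustration repeatable.

```python
import random

def simple_random_sample(frame, k, seed=None):
    """Draw k units from the sampling frame without replacement."""
    rng = random.Random(seed)
    return rng.sample(frame, k)

# Hypothetical sampling frame: ID numbers 1..500 for every member of a population.
frame = list(range(1, 501))

sample = simple_random_sample(frame, k=25, seed=2014)
print(sorted(sample))
```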

Other Sampling Techniques

- You may see other techniques used in bio-medical research
- Convenience Sample
- e.g., intercepting patients after having a prescription filled at a local community pharmacy or shopping mall.
- Systematic sampling
- e.g., take a phone book and pick a random place to start, then take every 9th name in the book.
- Stratified sampling
- Cluster sampling
- Others, e.g., snowball sampling (which is kind of cool)
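
Systematic sampling as described above (a random starting place, then every 9th name) might look like the following sketch. The list of names is hypothetical, and as a simplifying assumption the start is drawn from the first nine positions so that every entry has a chance of selection.

```python
import random

def systematic_sample(frame, step, seed=None):
    """Pick a random start among the first `step` entries, then take every `step`-th entry."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return frame[start::step]

# Hypothetical phone-book listing of 100 names.
phone_book = [f"Name {i:03d}" for i in range(1, 101)]

sample = systematic_sample(phone_book, step=9, seed=1)
print(sample)
```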

Convenience Samples

- Often used when it is virtually impossible to select a random sample
- Underlying assumption is that the sample will accurately represent the population
- Example: To estimate the average PCAT scores for pharmacy students in the US, would you:
- Use UGA Class of 2014 pharmacy students as a study sample and survey some number of students? While we might do this, we have to ask: how representative would this actually be?
- Use multiple pharmacy schools?
- In a clinical trial, we might recruit patients from multiple doctors' offices to get a better picture.

Stratified Sample

- Grouping members of the population into homogeneous groups.
- Strata should be mutually exclusive: subjects can be in only one stratum, and no group should be excluded.
- Then, use random or systematic sampling to identify subjects in each stratum.
- Can be proportional or not.
- Proportional: If the population consists of 60% in the male stratum and 40% in the female stratum, then the relative size of the two samples (three males, two females) should reflect this proportion.
- Sometimes this is used in medical research, e.g., where you want to study patients with certain characteristics: obesity, gender, pregnancy, past history of disease, etc.
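
Proportional allocation can be sketched as follows. The population is hypothetical (60 members in a male stratum and 40 in a female stratum, mirroring the example above), and each stratum's share of the sample is rounded to the nearest whole subject, an assumption that can leave the total slightly off target when the proportions are less tidy.

```python
import random

def proportional_stratified_sample(strata, total, seed=None):
    """Randomly sample from each stratum in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Stratum share of the sample, rounded to a whole number of subjects.
        k = round(total * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 60 males and 40 females, identified by labels.
strata = {
    "male": [f"M{i}" for i in range(1, 61)],
    "female": [f"F{i}" for i in range(1, 41)],
}

sample = proportional_stratified_sample(strata, total=5, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```

With a total sample of five, this yields three males and two females, the same 60/40 split as the population.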

Questions

- Why is random sampling less prone to bias than convenience sampling?
- Think about how we selected our convenience sample of events from YouTube.
- Does using a random sample guarantee a representative sample?

Response Rate / Bias

- Similar meanings clinically and statistically. Clinically it is how many patients responded in a certain manner.
- Consider a random sample of college students in the US. You sent out a questionnaire to these students to assess how frequently college students skip classes.
- The response rate is how many (usually expressed as a %) students completed and returned the questionnaire.
- Is a 50% response rate good enough?
- Generally, the higher the response rate, the more representative the sample, but extremely high response rates may not always be required.
- Is there any potential for bias in a study like this?

Sampling Bias

- Sampling bias exists if the sample of data you received is not representative of the population, e.g., you studied only a certain age group when all age groups were of concern.
- In our previous example, bias may occur if students who returned the questionnaire are somehow inherently different from those who did not.
- e.g., one could infer that more diligent students are more likely to respond than less studious ones.

Clinical Trials

- Clinical trials often employ a non-random sample; they do, however, use random assignment of patients to groups (arms) within the study.
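
Random assignment differs from random sampling: the subjects are whoever was recruited, but chance decides which arm each one enters. A minimal sketch, assuming a hypothetical list of enrolled patient IDs and two arms of equal size (the arm names are placeholders):

```python
import random

def randomize_to_arms(patients, arms=("treatment", "placebo"), seed=None):
    """Shuffle the enrolled patients and deal them out to the arms in turn."""
    rng = random.Random(seed)
    shuffled = list(patients)
    rng.shuffle(shuffled)
    assignment = {arm: [] for arm in arms}
    for i, patient in enumerate(shuffled):
        assignment[arms[i % len(arms)]].append(patient)
    return assignment

# Hypothetical non-random (convenience) sample of 20 enrolled patients.
enrolled = [f"P{i:02d}" for i in range(1, 21)]

groups = randomize_to_arms(enrolled, seed=42)
for arm, members in groups.items():
    print(arm, sorted(members))
```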

Sampling Bottom Line

- Assess how subjects were identified and used in research.
- Researchers often have to make hard choices in their investigations regarding how to find subjects for research. Sampling procedures must be appropriate for the study population.
- Studies are rarely perfect and most have their own biases; random sampling/assignment can help.
- We seldom get definitive answers, so we make inferences from the data and analyses we do have.
- Learning statistics will allow you to understand the assumptions researchers make so that you can make your best professional judgment.
- Thought question: Is a sample of healthy volunteers ever a good sample to study a drug?

Descriptive Statistics

- Descriptive statistics are used to describe the main features of a collection of data in quantitative terms.
- Descriptive statistics are distinguished from inferential stats (we talked about these last time) in that descriptive statistics quantitatively summarize a data set, rather than being used to support inferential statements about the population in question.
- Even when a data analysis draws its main conclusions using statistical analysis, descriptive statistics are generally presented along with more formal analyses, to give the audience an overall sense of the data being analyzed.

Samples of Descriptive Statistics

- Pharmacy Manpower Trends: http://www.pharmacymanpower.com/trends.jsp
- Research Article: Gabapentin for RLS

HCV Treatment Study

Background: Classifying and Organizing Data

- Recall that data are observations: the values of the variables you record.
- 4 Basic Levels of Measurement Scales: Nominal, Ordinal, Interval and Ratio
- Qualitative scales (Nominal and Ordinal)
- Nominal scale
- Eye color: blue, green, or brown
- No rank or order to the categories
- Presence or absence of a disease
- Gender

Background: Classifying and Organizing Data

- Ordinal scale
- All the characteristics of a nominal scale, plus there is a ranking among the categories
- e.g., Mild, Moderate, Severe
- First place, Second place, Third place
- Strongly Agree - - - - Strongly Disagree
- Wong-Baker Faces Scale

Measurement Scales

- Quantitative scales
- Interval scale
- Designates an equal-interval ordering
- No true zero point
- The distance between 1 and 2 is the same as the distance between 49 and 50
- Fahrenheit temperature scale: 0 degrees F does not mean no temperature
- 60 degrees F is not twice as warm as 30 degrees F
- Ratio scale
- All the above, plus a true zero point
- Wealth: $0 means no money
- $100 is twice as much as $50
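
The Fahrenheit point can be checked numerically. Converting to kelvin, a ratio scale with a true zero, shows what the ratio of the two temperatures actually is. The conversion formula is the standard one; the function name is only for illustration.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit temperature to kelvin (a ratio scale)."""
    return (deg_f - 32) * 5 / 9 + 273.15

naive_ratio = 60 / 30  # treats Fahrenheit as if it had a true zero
true_ratio = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

print(f"naive ratio on the Fahrenheit scale: {naive_ratio:.2f}")
print(f"ratio of absolute temperatures:      {true_ratio:.3f}")
```

On the kelvin scale, 60 degrees F is only about 6% warmer than 30 degrees F, not twice as warm.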

Levels of Measurement

- Defining levels of measurement facilitates the choice of appropriate statistical techniques for data analysis
- Nominal → Ordinal → Interval → Ratio
- Increasing ability to use higher level statistical analyses
- Non-parametric testing is generally performed with nominal and ordinal level data
- Parametric testing with interval and ratio level data

www.statsoft.com/textbook/stnonpar.htm

Quantitative Data

- Interval and Ratio data can further be classified as:
- Discrete data
- Data are in whole numbers and measured by nominal or ordinal scales
- Number of children, number of times you have been married, date of birth, etc.
- Continuous data
- Data may (but are not required to) take on fractional values
- Temperature (37.5 degrees), age, Body Mass Index (BMI)
- The type of data you have dictates the statistics you will use.
- Generally, nominal and ordinal levels use non-parametric stats, and interval and ratio levels use parametric stats.

Some Examples of Descriptive Statistics

Cumulative Frequency Polygon

Scatter Plot

Pie Chart

Incorporating the Web into your communication mix yields strategic benefits

n = 482

DTC encourages consumers to look for more information by going to the Web.

Searching the web for more information will encourage consumers to talk to their MDs about advertised Rxs

From recent research on DTC ads by Menon, Deshpande and Perri

Distributions

- Normal (symmetrical) Distribution (bell shaped)

Distributions

- Nonsymmetrical Distribution

Distributions

- Bimodal Distribution

Summarizing Data

- Descriptive Statistics
- For normally distributed data, measured on interval and ratio level scales, the appropriate measure of central tendency is the mean.
- The median is most appropriate for data measured on ordinal scales (but can still be used for continuous data)
- Mode is the appropriate measure of central tendency for nominal data.
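
The rule of thumb above can be sketched as a small lookup from the level of measurement to the summary statistic. This is a simplification: it ignores the normality condition attached to the mean, and it assumes ordinal categories have been coded as ranks. The function name and the sample values are hypothetical.

```python
import statistics

def central_tendency(values, scale):
    """Pick a measure of central tendency from the level of measurement."""
    if scale == "nominal":
        return "mode", statistics.mode(values)
    if scale == "ordinal":
        # median_low always returns an observed category rather than an average.
        return "median", statistics.median_low(values)
    if scale in ("interval", "ratio"):
        return "mean", statistics.mean(values)
    raise ValueError(f"unknown scale: {scale}")

# Hypothetical data for each level of measurement.
eye_colors = ["brown", "blue", "brown", "green", "brown"]
severity = [1, 2, 2, 3, 1, 2]  # 1 = mild, 2 = moderate, 3 = severe
weights_kg = [61.0, 72.5, 80.0, 68.5]

print(central_tendency(eye_colors, "nominal"))
print(central_tendency(severity, "ordinal"))
print(central_tendency(weights_kg, "ratio"))
```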

Measures of Central Tendency

- Mean is calculated by summing all the observations and dividing the sum by the number of observations
- Median is the observation that divides the distribution of data into two equal halves
- Mode is the observation that occurs most frequently

Example

- Data: Monthly income of 10 college students
- 300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500
- Mean
- (300 + 375 + 485 + 500 + 600 + 625 + 1000 + 2000 + 3000 + 3500) / 10 = 1238.5
- Median
- Average of 600 and 625 = 612.5 (half the data above, half below.)
- Mode: there is no mode
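
The same calculation can be sketched with Python's standard library. The income values are the ones from the example; treating "no mode" as "no value occurs more than once" is the interpretation assumed here.

```python
import statistics
from collections import Counter

incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]

mean = statistics.mean(incomes)
median = statistics.median(incomes)

# A mode exists only if some value occurs more often than the others.
counts = Counter(incomes)
top_count = max(counts.values())
modes = [value for value, c in counts.items() if c == top_count] if top_count > 1 else []

print(f"mean = {mean}")
print(f"median = {median}")
print(f"modes = {modes}")
```

The mean is roughly double the median because the three highest incomes pull it upward.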

Measures of Variation

- Range
- Largest value minus smallest value
- Sometimes see quartiles (75th vs. 25th percentiles, with the median at the 50th percentile)
- Mean Deviation
- Sum of the absolute deviations of each observation from the mean, divided by the sample size; it's the average deviation of all observations from the mean
- Variance
- Is computed by squaring each deviation from the mean, adding them up and dividing their sum by one less than n
- Standard Deviation
- The square root of the variance
- Note: The closer the data are around the mean, the smaller the standard deviation.
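
As a sketch of these measures, the monthly incomes from the earlier example can be reused. `statistics.variance` and `statistics.stdev` divide by one less than n, matching the variance definition above; the mean deviation is computed by hand using absolute deviations, since signed deviations from the mean always sum to zero.

```python
import statistics

incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]

data_range = max(incomes) - min(incomes)
mean = statistics.mean(incomes)

# Mean (absolute) deviation: average distance of each observation from the mean.
mean_deviation = sum(abs(x - mean) for x in incomes) / len(incomes)

# Sample variance: squared deviations summed and divided by n - 1.
variance = statistics.variance(incomes)
# Standard deviation: square root of the variance.
std_dev = statistics.stdev(incomes)

print(f"range = {data_range}")
print(f"mean deviation = {mean_deviation:.2f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```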

Measures of Variation

- Coefficient of variation
- Not as common as mean, s.d., variance, or range.
- Expressed as a percentage, with higher percentages indicating greater variation
- Calculated by taking the s.d. and dividing by the mean, × 100.
- Useful in comparing the amount of variability between data.
- e.g., not much point in comparing the standard deviation of HbA1c values with the standard deviation of blood glucose values because they are measured on different scales. You could compare coefficient of variation (percentage) to see which has the greater variability.
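
A minimal sketch of that comparison follows. The HbA1c and blood glucose readings are invented purely for illustration and do not come from any study; the sample standard deviation (n - 1) is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical readings, invented purely for illustration.
hba1c_percent = [6.8, 7.2, 7.5, 8.1, 6.9]
glucose_mg_dl = [110, 145, 160, 190, 125]

cv_hba1c = coefficient_of_variation(hba1c_percent)
cv_glucose = coefficient_of_variation(glucose_mg_dl)

print(f"HbA1c CV:   {cv_hba1c:.1f}%")
print(f"glucose CV: {cv_glucose:.1f}%")
```

Because both results are unit-free percentages, they can be compared directly even though the underlying measurements use different scales.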

Example of Range: LIPITOR

Benefit 1: Lower Cholesterol. Along with diet and exercise, LIPITOR is proven to help you:

- Lower your LDL ("bad" cholesterol) by 39% to 60%. (The average effect depends on dose)
- Lower your triglycerides (a type of fat found in your blood) by 19% to 37%. (The average effect depends on dose)
- Raise your HDL ("good" cholesterol) by up to 9%. (The average effect depends on dose)

Source: http://www.lipitor.com/learn-about-lipitor/lipitor-benefits.jsp (accessed 1/8/08)

Summary

- The type of data dictates the measure of central tendency that most accurately represents the data.
- Sometimes data are best described by summarizing in a descriptive fashion.
- Otherwise, data are described by a measure of central tendency and a measure of variation, e.g., mean and standard deviation.
- Sometimes a combination of both is used.
- More information about your sample is better when it comes to informing those who may want to draw conclusions from your work.