Introduction to Biostatistics - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Biostatistics

Description:

Title: Review of key biostatistical concepts relevant to EBM Author: Haroon Saloojee Last modified by: Wits-Admin Created Date: 7/11/2004 8:23:31 PM – PowerPoint PPT presentation

Slides: 69
Provided by: HaroonS

Transcript and Presenter's Notes

Title: Introduction to Biostatistics


1
Introduction to Biostatistics
Prof Haroon Saloojee Division of Community
Paediatrics
2
Introduction to Biostatistics: Lecture 1
Summarising your data 1
3
The evidence-based clinician's motto
  • In God we trust.
  • All others must bring data.

4
Challenges
  • Statistical ideas can be difficult and
    intimidating
  • Thus
  • Statistical results are often skipped-over when
    reading scientific literature
  • Data is often misinterpreted

5
Misinterpretation of Data
  • Celebrating birthdays is healthy

Statistics show that those who celebrate the
most birthdays live the longest.
6
You may think that
  • A Bar Chart is a map of the locations of the
    nearest taverns
  • A p-value is the result of a urinalysis
  • A t-test is a taste test between rooibos tea and
    Five Roses tea

7
Course Structure
  • BIO-SADISTICS
  • Four 45-minute lectures
  • PowerPoint presentations on student web site
  • Some text (content) also on web page
  • Plus, additional internet links

8
Syllabus for the Course
  • SESSION 1 Summarizing your data 1
  • Types of data (quantitative and categorical
    variables)
  • Describing data: averages (mean, median, and mode)
  • Displaying data graphically (box plots,
    histograms, bar charts, pie diagrams)
  • Frequency distributions
  • SESSION 2 Summarizing your data 2
  • The normal distribution
  • Describing data spread (range, variance,
    standard deviation, z score)
  • Quartiles, percentiles
  • Standard error of the mean
  • Confidence intervals
  • SESSION 3 Sampling principles
  • Study Population
  • The sample
  • Random sampling
  • Non-random sampling
  • Sampling bias
  • Sample size and power
  • SESSION 4 Statistical tests and the concept of
    significance
  • Hypothesis testing
  • p value
  • Statistical versus clinical significance
  • Parametric versus non-parametric methods

9
Free textbook on-line
Statistics at Square One
http://bmj.bmjjournals.com/collections/statsbk/index.shtml
10
http://www.medstatsaag.com/mcqs.asp
Relevant topics:
  • Handling data: 1, 4, 5, 6, 7
  • Sampling: 10, 11
  • Hypothesis testing: 17, 18
11
Today's Lecture
  • What types of data are there?
  • (numerical vs. categorical variables)
  • Describing data - measures of central tendency
    (mean, median and mode)
  • Summarising data graphically (histograms, box
    plots, bar charts, pie diagrams)

12
Types of data
13
Types of Data
  • Numerical data
  • Discrete
  • Examples
  • No. of children
  • No. of asthma attacks in a week
  • No. of rooms in home

14
Types of Data
  • Numerical data
  • Continuous
  • Any value on the continuum is possible (even
    fractions or decimals)
  • Examples
  • Weight
  • Age
  • Temperature
  • Heart rate

15
Types of Data
  • Categorical data
  • Nominal
  • Mutually exclusive unordered categories
  • Examples
  • Sex (male, female)
  • Eye colour (brown, grey, green, blue)
  • Are you happy? (Yes, No)
  • Diarrhoea (Present, absent)
  • Can summarize in
  • Tables using counts and percentages
  • Bar Chart

16
Types of Data
  • Categorical data
  • Ordinal (ordered categories)
  • Examples
  • Degree of agreement
  • (Strongly Agree, Agree, Disagree, Strongly
    disagree)
  • Severity of injury
  • Severe, Moderate, Mild
  • Income level
  • High, medium, low

17
PRACTICE
Discrete or Continuous?
Nominal or Ordinal?
  • mg of tar in cigarettes: Continuous
  • number of people in a car: Discrete
  • high to low temperature in any day: Continuous
  • weight: Continuous
  • time: Continuous
  • number of children in the average family: Discrete
  • Average / above avg / below average: Ordinal
  • Colours of Smarties: Nominal
  • Grades (A, B, C, D, F): Ordinal

18
(No Transcript)
19
Data Summaries
  • It is ALWAYS a good idea to summarise your data
  • You become familiar with the data and the
    characteristics of the people that you are
    studying
  • You can also identify problems or errors with the
    data (data management issues).

20
Summarising and Describing Continuous Data
  • Measures of the centre of data (central tendency)
  • Mean
  • Median
  • Mode

21
Definitions
  • The arithmetic mean is what is commonly called
    the average. The mean is the sum of all the
    scores divided by the number of scores.
  • The median is the middle of a distribution: half
    the scores are above the median and half are
    below the median.
  • The mode is the most frequently occurring score
    in a distribution

22
  • It has been said that a fellow with one leg
    frozen in ice and the other leg in boiling water
    is comfortable on average.
  • J.M. Yancy

23
Sample Mean $\bar{X}$
  • The Average or Arithmetic Mean
  • Add up data, then divide by sample size (n)
  • The sample size n is the number of observations
    (pieces of data)
  • Example
  • Systolic blood pressures (mmHg)
  • X1 = 120
  • X2 = 80
  • X3 = 90
  • X4 = 110
  • X5 = 95
  • n = 5
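
The recipe above (add up the data, then divide by the sample size) can be sketched in a few lines of Python. This is only an illustration using the five blood pressures from the slide; the variable names are not from the lecture.

```python
# Systolic blood pressures (mmHg) from the slide
pressures = [120, 80, 90, 110, 95]

n = len(pressures)        # sample size: number of observations
total = sum(pressures)    # add up the data
sample_mean = total / n   # divide by the sample size

print(f"n = {n}, sum = {total}, sample mean = {sample_mean} mmHg")
```

For these five values the sum is 495, so the sample mean is 495 / 5 = 99 mmHg.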

24
Notation
  • Σ (sigma) denotes the summation of a set of
    values
  • x is the variable usually used to represent the
    individual data values
  • n represents the number of data values in a
    sample
  • N represents the number of data values in a
    population
  • $\bar{x}$ is pronounced x-bar and denotes the mean of
    a set of sample values
  • µ is pronounced mu and denotes the mean of all
    values in a population

25
Definitions
  • Mean
  • the value obtained by adding the scores and
    dividing the total by the number of scores

Sample:
$$\bar{x} = \frac{\sum x}{n}$$
Population:
$$\mu = \frac{\sum x}{N}$$
26
Notes on Sample Mean
  • Also called sample average or arithmetic mean
  • Sensitive to extreme values
  • - One data point could make a great change in
    sample mean
  • Why is it called the sample mean?
  • To distinguish it from population mean
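
A small sketch of how sensitive the mean is to one extreme value. It reuses the five blood pressures from the earlier example and assumes a hypothetical data-entry error (950 typed instead of 95). That error is invented here purely for illustration.

```python
from statistics import mean, median

original = [120, 80, 90, 110, 95]        # values from the blood pressure example
with_outlier = [120, 80, 90, 110, 950]   # hypothetical typo: 950 instead of 95

# One extreme data point changes the mean a great deal...
print("mean:  ", mean(original), "->", mean(with_outlier))

# ...while the median (the middle value) moves much less.
print("median:", median(original), "->", median(with_outlier))
```

Here the mean jumps from 99 to 270 mmHg, while the median only moves from 95 to 110 mmHg.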

27
Population Versus Sample
  • Population - The entire group you want
    information about
  • For example: the blood pressure of all
    20-year-old male university students in South
    Africa
  • Sample - A part of the population from which we
    actually collect information and draw conclusions
    about the whole population
  • For example: a sample of blood pressures (n = 50)
    of 20-year-old male university students in South
    Africa
  • The sample mean $\bar{X}$ is not the population mean µ

28
Population Versus Sample
  • We don't know the population mean µ but would
    like to know it
  • We draw a sample from the population
  • We calculate the sample mean $\bar{X}$
  • How close is $\bar{X}$ to µ?
  • Statistical theory will tell us how close $\bar{X}$ is to
    µ
  • Statistical inference is the process of trying to
    draw conclusions about the population from the
    sample
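
The idea of drawing a sample and comparing its mean with the population mean can be tried out in a simulation. The sketch below assumes a made-up population of 10,000 blood pressures (normally distributed, mean 120 mmHg, standard deviation 10 mmHg). These numbers are illustrative assumptions, not data from the lecture.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: 10,000 simulated systolic pressures (mmHg)
population = [random.gauss(120, 10) for _ in range(10_000)]
mu = mean(population)                    # population mean (normally unknown)

sample = random.sample(population, 50)   # simple random sample, n = 50
x_bar = mean(sample)                     # sample mean

print(f"population mean = {mu:.1f} mmHg")
print(f"sample mean     = {x_bar:.1f} mmHg")
print(f"difference      = {x_bar - mu:+.1f} mmHg")
```

In real studies only the sample is available, which is why statistical inference is needed to say how far the sample mean is likely to be from µ.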

29
Weighted Mean
$$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$$
Your grades in many courses are weighted means
(averages). In other words, some things count
(are weighted) more than others.
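
A minimal sketch of the weighted mean, using a hypothetical course in which an assignment, a test and an exam count 20%, 30% and 50%. The marks, the weights and the function name `weighted_mean` are assumptions made for this example.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical marks (%) for an assignment, a test and an exam
marks = [60, 70, 80]
weights = [20, 30, 50]   # the exam counts most

course_grade = weighted_mean(marks, weights)
simple_average = sum(marks) / len(marks)

print(f"weighted mean = {course_grade}, unweighted mean = {simple_average}")
```

Because the best mark carries the largest weight, the weighted mean (73) is higher than the simple average (70).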
30
Geometric Means
These are histograms rotated 90º, and box
plots. Note how the log transformation gives a
symmetric distribution.
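
The link between the log transformation and the geometric mean can be sketched as follows: take logs, average them, then transform back. The four values are hypothetical right-skewed data chosen so that the logs are easy to follow.

```python
import math
from statistics import geometric_mean, mean

values = [1, 10, 100, 1000]   # hypothetical right-skewed data

logs = [math.log10(v) for v in values]   # evenly spaced after the log transform
gm_from_logs = 10 ** mean(logs)          # back-transform the mean of the logs
gm_direct = geometric_mean(values)       # standard-library equivalent

print("logs:           ", logs)
print("arithmetic mean:", mean(values))
print(f"geometric mean:  {gm_direct:.2f}")
```

The arithmetic mean (277.75) is pulled towards the largest value, while the geometric mean (about 31.6) sits in the middle of the log-transformed data.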
31
(No Transcript)
32
  • 5 5 5 3 1 5 1 4 3 5 2
  • 1 1 2 3 3 4 5 5 5 5 5 (in order)
  • exact middle: MEDIAN is 4
  • 1 1 3 3 4 5 5 5 5 5
  • no exact middle -- shared by two numbers (4 and 5)
  • MEDIAN is (4 + 5) / 2 = 4.5

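
The two cases on this slide (an odd and an even number of values) can be sketched in Python. The helper `median_by_hand` is a hypothetical name used here to show the steps; the standard library's `statistics.median` gives the same answers.

```python
from statistics import median

odd_count = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]   # 11 values
even_count = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]     # 10 values


def median_by_hand(data):
    """Sort the data, then take the middle value (or average the middle two)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # shared by two numbers


print(median_by_hand(odd_count), median(odd_count))
print(median_by_hand(even_count), median(even_count))
```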
33
Mode
  • The score that occurs most frequently
  • Bimodal
  • Multimodal
  • No Mode
  • The only measure of central tendency that can be
    used with nominal data

34
Examples
  • a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
  • b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal (2 and 6)
  • c. 2 3 6 7 8 9 10: No Mode
  • d. 2 2 3 3 3 4: Mode is 3
  • e. 2 2 3 3 4 4 5 5: No Mode
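
A sketch of how the five examples could be checked in Python. The function `modes` is a hypothetical helper. It assumes the convention used on the slide, that a data set has no mode when every distinct value occurs equally often.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s), or [] when there is no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    most_common = sorted(v for v, c in counts.items() if c == highest)
    # Assumed convention: if all distinct values tie, report "no mode".
    if len(counts) > 1 and len(most_common) == len(counts):
        return []
    return most_common


examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    print(label, modes(data) or "no mode")
```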
35
Shapes of the Distribution
36
Shapes of the Distribution
37
Distribution Characteristics
38
Shapes of the Distribution
Example: Height of students in the class
39
Shapes of the Distribution
Example: Serum cholesterol level
40
Shapes of the Distribution
Example: Birth weight of newborn babies
41
Shapes of the Distribution
42
(No Transcript)
43
Some visual ways to summarize data
  • Tables
  • Frequency table
  • Graphs
  • Histograms
  • Bar graphs
  • Box plots
  • Line plots
  • Scatter graphs
  • Charts
  • Bar chart
  • Pie diagram

44
Frequency Tables
  • Summarizes a variable with counts and percentages
  • The variable is categorical
  • Note that you can take a continuous variable and
    create categories with it
  • How do you create categories for a continuous
    variable?
  • Choose cutoffs that are biologically meaningful
  • Natural breaks in the data

45
Example of frequency table
When raw data are arranged with frequencies, they
are said to form a frequency table for ungrouped
data. When the data are divided into groups/
classes, they are called grouped data. The
classes have to be decided according to the range
of data and size of class. The number of
observations lying in a particular class is
called its frequency and the table showing
classes with frequencies is called a frequency
table. The total of frequencies of a particular
class and of all classes prior to that class is
called the cumulative frequency of that class.
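
A minimal sketch of a grouped frequency table with cumulative frequencies. The ages and the class width of 5 years are hypothetical choices made for this example.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ages (years) of 12 children
ages = [3, 7, 12, 5, 9, 14, 2, 8, 11, 6, 4, 10]
class_width = 5   # classes 0-4, 5-9, 10-14


def class_of(value):
    """Return the (lower, upper) limits of the class containing value."""
    lower = (value // class_width) * class_width
    return (lower, lower + class_width - 1)


frequency = Counter(class_of(age) for age in ages)
classes = sorted(frequency)
counts = [frequency[c] for c in classes]
cumulative = list(accumulate(counts))   # running total of the frequencies

for (low, high), f, cf in zip(classes, counts, cumulative):
    print(f"{low:>2}-{high:<2}  frequency = {f}  cumulative frequency = {cf}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value was lost.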
46
Graphical Summaries
  • Histograms
  • Continuous or ordinal data on horizontal axis
  • Bar Graphs
  • Nominal data
  • No order to horizontal axis
  • Box Plots
  • Continuous data

47
Histogram
A histogram is a graphic representation of the
frequency distribution of a variable. Vertical
rectangles (bars) are drawn in such a way that
their bases lie on a linear scale representing
different intervals, and their heights are
proportional to the frequencies of the values
within each of the intervals.
48
Bar Chart
A bar chart is a method of presenting discrete
data organized in such a way that each
observation can fall into only one of several
mutually exclusive categories. The frequencies (or
percentages) are listed along the Y axis and the
categories of the variable along the X axis. The
heights of the bars correspond to the
frequencies. The bars should be of equal width
and they should not touch the other bars.
49
Difference between bar chart and histogram
  • Bar charts are for categories that are separate
  • Histograms are for categories obtained by
    dividing up continuous data.
  • Bars do not touch, histogram rectangles do touch.

50
Line graph
If the mid-points of the top of the bars of a
histogram are connected together by a line and if
the bars were omitted from the display, the
resultant graph will be a line graph (also called
a frequency polygon). Line graphs are good at
showing trends over a period of time. When trends
of rates (e.g. death rate, Infant Mortality Rate,
etc.) are to be displayed it is better done with
line graphs rather than histograms.
51
Scatter plot
Also called a scattergram. This is a method of
displaying the distribution of two variables in
relation to each other. The values of one
variable are measured on the X axis and the
values of the other on the Y axis. The variables
have to be on a continuous scale. Each point thus
has two values (coordinates), one from the X and
one from the Y axis scale. A wide scatter of the
points denotes poor correlation between the two
variables. If the two variables are perfectly
correlated, then all the points will fall on a
straight line (the regression line).
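
How tightly the points cluster around a straight line is commonly summarised by Pearson's correlation coefficient r. This measure is not named on the slide and is brought in here as an illustration. The sketch below computes it by hand for hypothetical data: one perfectly linear pair of variables and one with scatter.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


heights = [150, 160, 170, 180, 190]        # hypothetical heights (cm)
perfect = [2 * h - 200 for h in heights]   # points exactly on a straight line
scattered = [52, 49, 70, 66, 90]           # hypothetical weights (kg)

print(f"perfectly correlated: r = {pearson_r(heights, perfect):.3f}")
print(f"with scatter:         r = {pearson_r(heights, scattered):.3f}")
```

The wider the scatter of the points around the line, the further r falls below 1.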
52
Survival curve
53
Pie chart
This is a circular diagram (can be shown as 2-D
or 3-D) divided into segments, each representing
a category or subset of data (part of the whole).
The amount for each category is proportional to
the area of the sector (slice of the pie). The
total area of the circle is 100% and it
represents the total population that is being
shown.
54
Pictures of Data: Continuous Variables
  • Histograms
  • Means and medians do not tell whole story
  • Differences in spread (variability)
  • Differences in shape of the distribution

55
How to Make a Histogram
  • Divide range of data into intervals (bins) of
    equal width
  • Count the number of observations in each class
  • Draw the histogram
  • Label scales
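
The steps above can be sketched without any plotting library: choose equal-width bins, count the observations in each, then draw a text histogram. The birth weights (in grams), the bin width of 500 g and the helper name `histogram_counts` are hypothetical choices for this example.

```python
def histogram_counts(data, lower, width, n_bins):
    """Count observations in n_bins equal-width bins starting at lower."""
    counts = [0] * n_bins
    for value in data:
        index = (value - lower) // width   # which bin the value falls into
        if 0 <= index < n_bins:            # values outside the range are ignored
            counts[index] += 1
    return counts


# Hypothetical birth weights in grams
birth_weights = [2100, 2800, 3000, 3200, 3300, 3400, 3500, 3600, 3900, 4200]
lower, width, n_bins = 2000, 500, 5

counts = histogram_counts(birth_weights, lower, width, n_bins)

# Draw the histogram with one '#' per observation and labelled intervals
for i, count in enumerate(counts):
    start = lower + i * width
    print(f"{start}-{start + width - 1} g | {'#' * count}")
```

Each bin includes its lower limit and excludes its upper limit, so every observation is counted exactly once.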

56
Pictures of Data: Histograms
57
Pictures of Data: Histograms
58
Pictures of Data: Histograms
59
Box plot
  • Another common visual display tool is the box
    plot
  • Gives good insight into distribution shape in
    terms of skewness and outlying values
  • Very nice tool for easily comparing the
    distribution of continuous data in multiple
    groups, because box plots can be plotted side by side

60
Box plot
A box plot provides an excellent visual summary
of many important aspects of a distribution. The
box stretches from the lower hinge (defined as
the 25th percentile) to the upper hinge (the 75th
percentile) and therefore contains the middle
half of the scores in the distribution. The
median is shown as a line across the box.
Therefore 1/4 of the distribution is between this
line and the top of the box and 1/4 of the
distribution is between this line and the bottom
of the box.
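
The numbers behind a box plot can be sketched with the standard library. The lengths of stay below are hypothetical. Note also that software packages use slightly different rules for computing quartiles, so the `method="inclusive"` choice here is an assumption and other tools may give slightly different hinges.

```python
from statistics import mean, quantiles

# Hypothetical hospital lengths of stay (days), skewed by one long stay
stays = [1, 2, 2, 3, 3, 4, 5, 7, 21]

# Cut points for four equal groups: 25th, 50th and 75th percentiles
q1, q2, q3 = quantiles(stays, n=4, method="inclusive")
iqr = q3 - q1   # height of the box: spread of the middle half of the data

print(f"lower hinge (25th percentile) = {q1}")
print(f"median (line across the box)  = {q2}")
print(f"upper hinge (75th percentile) = {q3}")
print(f"box height (q3 - q1)          = {iqr}")
print(f"mean, for comparison          = {mean(stays):.1f}")
```

The single 21-day stay pulls the mean above the upper hinge, while the box itself (2 to 5 days) is unaffected.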
61
Hospital Length of Stay
62
Box plot: Length of Stay
63
Box plot: Length of Stay
64
Misuse of graphics
  • "It pays to be wide awake in studying any graph.
    The thing looks so simple, so frank, and so
    appealing that the careless are easily fooled."
    - M J Moroney.
  • Graphs and charts are often misused. The honest
    researcher must have a good handle on how graphs
    can be used to deliberately mislead people so
    that such misadventures can be avoided.
  • Common tricks used to mislead
  • The problem of scaling
  • The Advertiser's Graph
  • The transformed graph
  • The chart with too much data

65
Which graph to use?
Statistical methods depend on the form of a set
of data, which can be assessed with some common
useful graphics:

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
66
Example of MCQ 1
  • The arithmetic mean of a set of values
  • a) Is a particular type of average.
  • b) Is a useful summary measure of location if the
    data are skewed to the right.
  • c) Coincides with the median if the distribution
    of the data is symmetrical.
  • d) Is always greater than the median.
  • e) Cannot be calculated if the data set contains
    both positive and negative values.

67
Example of MCQ 2
  • A histogram
  • a) Can be used instead of a pie chart to display
    categorical data.
  • b) Is similar to a bar chart but there are no
    gaps between the bars.
  • c) Contains contiguous bars, with the height of
    each bar being proportional to the frequency of
    the observations in the range specified by the
    bar.
  • d) Can be used to display either a frequency or a
    relative frequency distribution.
  • e) Is used to show the relationship between two
    variables.

68
Any questions?