YY Teo Associate Professor Saw Swee Hock School of Public Health, NUS Department of Statistics - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
ST1232 Statistics in the Life Sciences
  • YY Teo
    Associate Professor
    Saw Swee Hock School of Public Health, NUS
    Department of Statistics & Applied Probability, NUS
    Life Sciences Institute, NUS
    Genome Institute of Singapore, A*STAR

2
Lesson Structure
  • 13 weeks of 2 lectures (of 2 hours) per week
  • Practically, 17-18 lectures planned, newspaper
    statistics, conferences, etc.
  • Tutorials in computer labs from week 3 onwards
    (11 weeks of tutorials)
  • Consultation (Fridays 2pm-3.30pm)
  • 3 assessments
    • tutorial participation (10%)
    • mid-term quiz (30%)
    • end-of-term exam (60%)

3
Resources
  • Lectures, slides, tutorials
  • Fred Ramsey and Dan Schafer (2001). The
    Statistical Sleuth. 2nd edition, Duxbury Press
  • Julie Pallant. SPSS Survival Manual: A
    Step-by-Step Guide to Data Analysis Using SPSS
    for Windows. 3rd edition, Open University Press
  • http://www.statistics.nus.edu.sg/statyy/ST1232

4
Tutorials
  • Note the available time slots and sign up at the
    CORS system, http://www.nus.edu.sg/cors/. The
    tutorial will be at S16-05-102 (Com lab 2)
  • T1: Mondays (8am-9am)
  • T2: Mondays (9am-10am)
  • T3: Tuesdays (8am-9am)
  • T4: Tuesdays (9am-10am)
  • T5: Wednesdays (9am-10am)
  • T6: Wednesdays (10am-11am)
  • T7: Wednesdays (11am-12pm)
  • T8: Thursdays (9am-10am)
  • T9: Thursdays (10am-11am)
  • T10: Thursdays (11am-12pm)
  • T11: Fridays (8am-9am)
  • T12: Fridays (9am-10am)
  • T13: Fridays (10am-11am)

5
Medical Statistics
  • Quantitative basis to human diseases and traits
  • Progression from observational science!
  • Statistics and mathematics required for this
    advancement, from observational to quantitative

6
Statistics in medical research
Medical statistics
7
Pregnancy Test Kit
A woman buys a pregnancy test kit, and is
interested to find out whether she is pregnant.
One hypothesis in this case (status quo) is
that she is not pregnant. The other hypothesis
(hypothesis of interest) is that she is
pregnant. The test kit may show +ve, indicating
there is evidence to suggest pregnancy, or -ve,
indicating lack of evidence to suggest pregnancy.
8
Pregnancy Test Kit
The test kit may either be accurate, or
inaccurate.

                      Actually pregnant          Actually not pregnant
Test kit shows +ve    Correct +ve diagnosis      Incorrect +ve diagnosis
                      (Sensitivity, or Power)
Test kit shows -ve    Incorrect -ve diagnosis    Correct -ve diagnosis
                                                 (Specificity)
9
Sensitivity or Specificity?
  • Objective of the experiment

10
Sensitivity or Specificity?
  • Objective of the experiment
  • HIV diagnostic kit, 99.9% sensitive and 99.5%
    specific

11
Sensitivity or Specificity?
  • Objective of the experiment
  • HIV diagnostic kit, 99.9% sensitive and 99.5%
    specific

Correct identification of HIV +ves
12
Sensitivity or Specificity?
  • Objective of the experiment
  • HIV diagnostic kit, 99.9% sensitive and 99.5%
    specific

Correct identification of HIV +ves
Correct identification of HIV -ves
13
Sensitivity or Specificity?
  • Objective of the experiment
  • HIV diagnostic kit, 99.9% sensitive and 99.5%
    specific
  • Tests on immigrants, assume 1,001,000
    applications each month, of which 1000 are truly
    HIV-positive

14
Sensitivity or Specificity?
  • Objective of the experiment
  • HIV diagnostic kit, 99.9% sensitive and 99.5%
    specific
  • Tests on immigrants, assume 1,001,000
    applications each month, of which 1000 are truly
    HIV-positive: HIV +ve × 1000, HIV -ve ×
    1,000,000

15
Sensitivity or Specificity?
  • Objective of the experiment
  • HIV diagnostic kit, 99.9% sensitive and 99.5%
    specific
  • Tests on immigrants, assume 1,001,000
    applications each month, of which 1000 are truly
    HIV-positive: HIV +ve × 1000, HIV -ve ×
    1,000,000

On average, 999 correctly identified, 1
incorrectly diagnosed as HIV -ve
On average, 995,000 correctly identified as HIV
-ve, 5000 incorrectly diagnosed as HIV +ve
16
Sensitivity or Specificity?
On average, 999 correctly identified, 1
incorrectly diagnosed as HIV -ve
On average, 995,000 correctly identified as HIV
-ve, 5000 incorrectly diagnosed as HIV +ve
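
The arithmetic behind these averages can be checked with a short Python sketch. It uses only the figures quoted on the slides (99.9% sensitivity, 99.5% specificity, 1000 truly HIV-positive and 1,000,000 truly HIV-negative applicants). The last step, the positive predictive value, is not named on the slides and is added here only as an illustration of why specificity matters when the condition is rare.

```python
# Expected outcomes of screening, using the figures quoted on the slides.
sensitivity = 0.999        # P(test +ve | truly HIV +ve)
specificity = 0.995        # P(test -ve | truly HIV -ve)
n_positive = 1_000         # truly HIV +ve applicants per month
n_negative = 1_000_000     # truly HIV -ve applicants per month

true_pos = round(sensitivity * n_positive)    # correctly identified as +ve
false_neg = n_positive - true_pos             # incorrectly diagnosed as -ve
true_neg = round(specificity * n_negative)    # correctly identified as -ve
false_pos = n_negative - true_neg             # incorrectly diagnosed as +ve

# Positive predictive value: the share of +ve results that are truly +ve.
# Assumption: this quantity is added for illustration, it is not on the slides.
ppv = true_pos / (true_pos + false_pos)

print(true_pos, false_neg, true_neg, false_pos)   # 999 1 995000 5000
print(f"PPV = {ppv:.3f}")                         # PPV = 0.167
```

Under these assumptions only about one in six positive results belongs to a truly HIV-positive applicant, even though the kit is highly sensitive and specific.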
17
Height and Weight
18
Height and Weight
19
Height and Weight
20
Height and Weight
21
Scientific Process
22
Research hypothesis
  • What is your scientific question?
  • What are you trying to achieve?
23
Scientific Process
24
Human Diversity
25
Human Diversity
  • Even within the human race, variation exists between
    people of different ethnicities, cultures and
    populations
  • Genetic basis to a substantial fraction of such
    variation

26
Human Diversity
  • Even within the human race, variation exists between
    people of different ethnicities, cultures and
    populations
  • Genetic basis to a substantial fraction of such
    variation
  • Observable differences: physical appearance,
    build, weight

27
Human Diversity
  • Even within the human race, variation exists between
    people of different ethnicities, cultures and
    populations
  • Genetic basis to a substantial fraction of such
    variation
  • Observable differences: physical appearance,
    build, weight
  • Variation in susceptibility to diseases
  • Influenced by evolutionary processes, over many
    generations
  • Cross-sectional observation of adaptation and
    natural selection

28
Target population
  • Depends entirely on your research hypothesis!

29
Target population
  • Everyone in Singapore?
  • Every female individual in Singapore?
  • Every female individual of a certain age in
    Singapore?
  • Every female individual of a certain age in
    Singapore, and who could be pregnant?
30
Target population
  • Everyone in Singapore?
  • Everyone of a certain age in Singapore?
  • Everyone of a certain age in NUS?
  • Everyone of a certain age from a specific
    population group in Singapore?
31
Target populations
  • Depends entirely on your research hypothesis!
  • Example: investigating the genetic
    factors that increase the risk of type 2 diabetes
    in Chinese adults in Singapore.
  • Target population(s)
    • Every Chinese adult in Singapore who is affected
      by type 2 diabetes
    • Normal Chinese adults (unaffected by type 2
      diabetes) of the same age band
  • Classic case-control design in medical
    epidemiology.

But, is this sufficient???
32
Samples versus Population
  • Obviously not possible to perform an experiment
    on every diabetic Chinese adult in Singapore
  • Select a representative set of individuals from
    the appropriate population to perform the
    experiment on
  • This set of individuals is known as your sample.

33
Scientific Process
34
What is your intuition?
  • A pharmaceutical firm is developing a medical
    drug that purportedly treats severe headache.
  • During the clinical trials (testing the efficacy
    and safety of the drug), it was tested on 10
    people, of which 7 reported that it worked to
    reduce headaches, while 3 claimed it had no
    effect.
  • Another pharmaceutical firm also developed a competing
    treatment, but tested it on 1000 people, of which
    704 reported it helped to reduce headaches, while
    294 claimed it had no effect, and 2 people
    claimed their headaches worsened.

Which setting do you think gives you more
information about the developed drug? And why?
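
One way to make the intuition concrete is to compare how precisely each trial pins down the proportion of people helped. The sketch below computes a 95% Wilson score interval for 7 out of 10 and for 704 out of 1000. The choice of the Wilson interval and the function name `wilson_interval` are assumptions made for illustration; the slides do not prescribe a method.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Proportion reporting that the drug reduced headaches in each trial.
small = wilson_interval(7, 10)       # first firm: 7 of 10
large = wilson_interval(704, 1000)   # second firm: 704 of 1000

for label, (low, high) in (("n = 10", small), ("n = 1000", large)):
    print(f"{label:9s} 95% interval: {low:.3f} to {high:.3f}")
```

Both trials estimate roughly 70% efficacy, but the interval from the 10-person trial is several times wider, so it is compatible with a much broader range of true efficacies.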
35
Sample Size Determination
  • The types of effects that can be detected depend
    entirely on sample size.

36
Pregnancy Test Kit
The test kit may either be accurate, or
inaccurate.

                      Actually pregnant          Actually not pregnant
Test kit shows +ve    Correct +ve diagnosis      Incorrect +ve diagnosis
                      (Sensitivity, or Power)
Test kit shows -ve    Incorrect -ve diagnosis    Correct -ve diagnosis
                                                 (Specificity)
37
Sample Size Determination
  • An issue commonly discussed in medical research!
  • Power calculations, sample size, effect sizes,
    statistical significance?

Power calculations
Sample size
Effect sizes
Statistical Significance
Require ↑ evidence, means Power ↓
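
How power, sample size, effect size and significance level trade off against each other can be sketched numerically. The example below uses a two-sided one-sample z-test under a normal approximation, with the effect size expressed in standard-deviation units. The test, the function name `z_test_power` and the chosen values (effect size 0.3, the listed sample sizes and significance levels) are illustrative assumptions, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the true mean shift in units of the standard deviation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # rejection threshold
    shift = effect_size * sqrt(n)                # mean of the test statistic
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (10, 50, 200):
    for alpha in (0.05, 0.001):
        power = z_test_power(0.3, n, alpha)
        print(f"n={n:4d}  alpha={alpha:<6}  power={power:.2f}")
```

In this sketch, power rises with sample size for a fixed effect size, and falls when a stricter significance level (stronger evidence) is required.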
38
Scientific Process
39
Sample Selection
  • Simple Random Sample
    • Every individual in the population has an equal
      chance of being selected (e.g. phonebook
      sampling)
  • Stratified Sample
    • Every individual in the population belongs uniquely
      to a specific category (e.g. gender)
  • Cluster Sampling
    • Each cluster has the characteristics of the
      population, and sampling is performed within the
      cluster rather than in the population (e.g.
      diabetic patients in one hospital in Singapore,
      compared to all diabetic patients in Singapore)
  • Multistage Sampling
    • A combination of different sampling schemes
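
The difference between the first two schemes can be sketched in Python. The population here is hypothetical (1000 individuals, 60% female and 40% male), and the helper names `simple_random_sample` and `stratified_sample` are illustrative. Proportional allocation across strata is one common choice and is assumed here; the slides do not specify it.

```python
import random

random.seed(1232)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1000 individuals, 60% female and 40% male.
population = [{"id": i, "gender": "F" if i < 600 else "M"} for i in range(1000)]


def simple_random_sample(units, k):
    """Every individual has the same chance of being selected."""
    return random.sample(units, k)


def stratified_sample(units, k, key):
    """Sample within each stratum in proportion to the stratum's size."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for members in strata.values():
        share = round(k * len(members) / len(units))
        sample.extend(random.sample(members, share))
    return sample


srs = simple_random_sample(population, 50)
strat = stratified_sample(population, 50, "gender")

print(sum(u["gender"] == "F" for u in srs), "females in the simple random sample")
print(sum(u["gender"] == "F" for u in strat), "females in the stratified sample")
```

The simple random sample reproduces the 60/40 split only on average, while the stratified sample fixes it at 30 females and 20 males by construction.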

40
Scientific Process
41
Data exploration and Statistical analysis
  1. Exploratory data analysis
  2. Probability and Bayes Theorem
  3. Theoretical distributions (Uniform, Bernoulli,
    Binomial, Poisson, Normal)
  4. Confidence Interval
  5. Hypothesis testing (t-test, ANOVA, test of
    proportions, Chi-square tests)
  6. Non-parametric tests
  7. Linear regression and correlation
  8. Logistic regression

42
Data exploration and Statistical analysis
  1. Data checking, identifying problems and
    characteristics
  2. Understanding chance and uncertainty
  3. How will the data for one attribute behave, in a
    theoretical framework?
  4. Theoretical framework assumes complete
    information, need to address uncertainties in
    real data
  5. Testing your beliefs, do the data support what
    you think is true?
  6. What happens when the assumptions of the
    theoretical framework are not valid
  7. Modeling relationships between multiple outcomes
    and a numerical response
  8. Ditto, but with a two-state outcome.

43
Data
44
Scientific Process
45
Statistics Truths or Lies
  • 21st century age of information
  • Responsible for driving scientific progress in
    multiple disciplines
  • Core skills for data analysis
  • Ability and knowledge to ingest and digest
    information is at a premium

46
Statistics Truths or Lies
47
Computers and Statistics
  • Excel, SPSS, Minitab, Stata, MATLAB, R, etc.
  • RExcel for this course
    • http://www.stat.nus.edu.sg/statyy/ST1232/bin/RExcel_installation.docx
  • Advantages
    • Speed, accuracy, ease of data manipulation
    • Easy to produce plots, cross-tabulation tables,
      summary statistics
  • Disadvantages
    • Inappropriate analysis / use of wrong tests
    • Data dredging

48
Brief introduction to RExcel and SPSS
49
Features
  • RExcel and SPSS extremely similar in terms of
    data entry and usage
  • Spreadsheet-based data entry system

50
(No Transcript)
51
(No Transcript)
52
Link data in Excel to R
53
(No Transcript)
54
(No Transcript)
55
Features
  • RExcel and SPSS extremely similar in terms of
    data entry and usage
  • Spreadsheet-based data entry system
  • Remember a unique individual/entry per row!
  • Drop-down menu option for data analysis

56
(No Transcript)
57
(No Transcript)
58
Features
  • RExcel and SPSS extremely similar in terms of
    data entry and usage
  • Spreadsheet-based data entry system
  • Remember a unique individual/entry per row!
  • Drop-down menu option for data analysis
  • While both are extremely intuitive, SPSS is
    slightly more user-friendly, in terms of defining
    variables and format of output

59
(No Transcript)
60
(No Transcript)
61
In RExcel
62
(No Transcript)
63
Output is in the R Commander tab
64
Features
  • RExcel and SPSS extremely similar in terms of
    data entry and usage
  • Spreadsheet-based data entry system
  • Remember a unique individual/entry per row!
  • Drop-down menu option for data analysis
  • While both are extremely intuitive, SPSS is
    slightly more user-friendly, in terms of defining
    variables and format of output
  • Details will be given in the subsequent lectures
  • Important to know the usage and interpretation
    of both SPSS and RExcel well, examinable and
    practically important!

65
Reminders
  • Book your tutorial slots!
  • Work on your tutorials before going to the
    classes!