Week 5 Measurement, Sampling and Data Analysis - PowerPoint PPT Presentation

Slides: 56
Provided by: Sandy317
Learn more at: http://www.unm.edu


Week 5 Measurement, Sampling and Data Analysis
Measurement Chapter 4
  • If it exists, it is measurable!
  • Measurement is used to gain mathematical insight
    into our data
  • Measurement is a comparison: we compare our data
    to a standard such as the norm, average or
    expected outcome
  • Measurement is a standard used for evaluation

  • Measurement is an essential component of
    quantitative research
  • Through measurement we can inspect, analyze and
    interpret our information

The Language of Variables
  • A variable is any observation that can take
    different values
  • Gender, Age, Religion, Ethnicity are variables
  • Attributes are specific values on a variable
  • Attributes of Gender: 1. Male 2. Female
  • These are discrete values
  • Age (may be continuous, 0-100)

  • An indicator is the response to a single
    question. The main concept in the question is
    the variable being measured
  • Concept: a mental image that summarizes a set of
    similar observations, feelings or ideas
  • We may not all agree on the same definition
    of a concept. Concepts are abstract
  • Conceptualization: specifying the dimensions of
    and defining the meaning of the concept

  • We often use concepts in our theories
  • e.g. crime, abuse, deterrence, eating disorder
  • What do we mean by crime? How is it measured?

How can we measure our concepts for research?
  • Devise operations that actually measure the
    concepts we intend to measure
  • Operationalization of concepts: connect concepts
    to observations by identifying the specific
    observations that we will use to indicate that
    concept in empirical reality.
  • The process of choosing the variable to
    represent the concept

  • We use variables which are derived from concepts
    in our hypotheses
  • The variables move the concepts into the realm of
    testability, e.g. the indicator of crime is
    spousal abuse
  • Through the indicators of a variable we are able
    to ascertain the characteristics, behaviors,
    attitudes of our subjects

How it Works
  • Define the concept
  • We have a theory that Punishment deters criminal
  • Choose an indicator of the concept to represent
    it. The concept, punishment, is operationalized
    by using the variable, arrest, to represent it.
    The indicators of the variable, arrest, are 1.
    arrest on first offense 2. don't arrest on first
    offense

  • The concept crime is represented in our research
    as physical spousal abuse.
  • The resulting hypothesis is
  • If subjects are arrested on the first reported
    offense of physical spousal abuse then they will
    be less likely to offend again (recidivism)
  • The indicators of the variable, recidivism, are 1.
    re-offend 2. doesn't re-offend
  • Our concepts are now defined in measurable,
    testable terms

  • Through conceptualization and operationalization,
    measurement becomes the process of linking
    abstract concepts to empirical indicants
  • Measurement validity: the operations we devise
    to measure our data must ensure that we measure
    the variables we intended to measure

  • For the concept social class, we can use the
    variables income, education and occupation.
  • Now we get a much clearer picture of what
    indicators are necessary to measure the abstract
    concept, social class
  • Income can be measured by actual income
  • Education can be measured by years of education
  • Occupation can be measured by levels from not at
    all professional to being very professional

  • Measuring social variables is often done through
    questions posed to people
  • A single question may not be adequate for
    measuring a concept. Multiple questions may be
    needed
  • The concepts age, gender, ethnicity, religion,
    income, education and occupation are called
    demographic variables. Each can be measured with
    one question, e.g. "What is your age?"
  • But more than one question is necessary to
    measure Social Class, ADD, Prejudice, Nurture
  • May need to construct an index of questions
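
As a minimal sketch of what an index of questions can look like, the example below sums several item scores into one value per respondent. The respondent names, the four items, and the 1-5 scoring are made-up assumptions for illustration; the slides do not specify how the index is built.

```python
# Hypothetical data: four questions per respondent, each scored
# 1 (strongly disagree) to 5 (strongly agree).
responses = {
    "respondent_a": [4, 5, 3, 4],
    "respondent_b": [2, 1, 2, 3],
}


def index_score(item_scores):
    """Combine several single-question indicators into one index value."""
    return sum(item_scores)


scores = {name: index_score(items) for name, items in responses.items()}
print(scores)
```

Summing is only one possible scoring rule; averaging or weighting the items are common alternatives.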

Levels of Measurement
  • Levels of measurement have important implications
    for the type of statistics that can be used in
    analyzing the data for a variable
  • 4 levels of measurement: Nominal, Ordinal,
    Interval and Ratio
  • These levels are determined by the indicators
    (response/answer categories) for a variable

  • Nominal: qualitative; has no mathematical
    interpretation, even if numbers are attached to
    the value label
  • These are called categorical variables
  • For example, we may ask "What is your gender?"
    and the answer categories are 1. male 2. female.
    However, the numbers 1 and 2 do not indicate
    anything mathematical about the differences in
    the answers. Female is not more or higher of
    gender than Male.

Quantitative Levels of Measurement
  • Ordinal: the numbers assigned to the response
    categories indicate order. 1 is lower in order
    than 2 and 2 is lower in order than 3.
  • 1. Very Unimportant is lower in order than 2.
    Unimportant and 2. is lower in order than 3.

  • Interval: the numbers indicating values in the
    response categories have mathematical meaning.
  • They represent fixed measurement units, but have
    no absolute or fixed zero point.
  • This is important mathematically because having a
    fixed zero point allows us to use the highest
    level of statistics
  • Often researchers try to use ordinal level
    variables as interval (e.g. Likert scales)

  • In interval level variables the numbers can be
    added and subtracted, but ratios are not
    meaningful
  • Fahrenheit temperature is an interval level
    variable. 60 degrees is 30 degrees hotter than
    30 degrees. But 60 degrees cannot be said to be
    twice as hot as 30 degrees, because zero on the
    Fahrenheit scale is not an absolute zero.
  • There are very few true interval-level measures
    in social science. This is why researchers use
    ordinal level data as interval level data and
    score it in ways that allow them to do so

  • Ratio: the numbers attached to these response
    categories represent fixed measuring units and an
    absolute zero point. Age is a ratio level
    variable. Test scores can be ratio, e.g. 0-100
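
The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below converts the two Fahrenheit readings from the temperature example to Kelvin, a scale that does have an absolute zero, and compares the ratios. The function name is an illustrative choice.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale)."""
    return (deg_f - 32) * 5 / 9 + 273.15


# Differences are meaningful on an interval scale.
difference_f = 60 - 30

# The naive ratio of the Fahrenheit numbers suggests "twice as hot" ...
ratio_f = 60 / 30

# ... but on a scale with a true zero the ratio is only about 1.06.
ratio_k = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

print(difference_f, ratio_f, round(ratio_k, 3))
```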

Sampling: How to choose survey subjects?
  • Sample: a subset of people selected from a
    population for study, e.g. 100 students selected
    from Webster
  • Population: the larger group from which the
    sample comes (we will infer back to this group),
    e.g. all Webster students

Why Sample?
  • If we can't access the entire population (too
    costly, too huge)
  • Sampling Goal
  • Representativeness: the smaller group (sample) is
    representative of the larger group (population)
  • The larger the sample, the more confidence in it
    being representative
  • The more homogeneous the population, the more
    confidence in sample representativeness

  • If the sample is representative, findings can be
    generalized to the population. You can infer that
    your sample will respond in the same way as the
    whole population
  • But generalizing from sample to population
    involves risk
  • Ecological Fallacy: can't draw conclusions about
    individuals from group level data
  • Reductionist Fallacy: can't draw conclusions
    about groups from individual level data

  • Not easy to achieve in experiment
  • Can't really apply findings to larger population
  • Experiments occur in artificial setting
  • Subjects recruited or selected, not chosen
    through random sampling

Types of Sampling Procedures
  • Probability (Random) Sampling: selects subjects
    out of a large population on the basis of chance
  • ( a technique used most effectively with survey

Probability Sampling
  • Participants drawn by chance (random)
  • Every subject has an equal chance of being chosen
    (known probability, e.g. 1/10 or 1/100)
  • How to do a Simple Random Sample:
  • 1. Arbitrarily select a number from a random
    number table
  • 2. Match it to a number in the numbered subject
    list for a starting point
  • 3. Continue selecting numbers and subjects until
    the desired number of subjects is obtained
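
The random number table can be replaced by a pseudo-random generator. The following is a minimal sketch using Python's standard `random` module; the subject list of 45 names, the sample size of 5, and the seed are made-up values for illustration.

```python
import random


def simple_random_sample(subjects, sample_size, seed=None):
    """Draw subjects without replacement, each with an equal chance."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(subjects, sample_size)


# Hypothetical numbered subject list.
subject_list = [f"subject_{i:02d}" for i in range(1, 46)]
chosen = simple_random_sample(subject_list, 5, seed=1)
print(chosen)
```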

Systematic Random Sample
  • Arrange population elements sequentially
  • Determine size of sample wanted
  • Divide the number of subjects in the population
    by the sample size to get the interval n
  • Randomly select a starting point in list
  • Select every nth subject
  • If we need 5 subjects and have 45 in the
    population, select every 9th person (45 / 5 = 9)
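
The steps above can be sketched as follows. This version picks the random starting point within the first interval, which is one common variant and an assumption here, since the slides only say to start at a random point in the list. The population of 45 numbered subjects mirrors the example.

```python
import random


def systematic_sample(subjects, sample_size, seed=None):
    """Take every nth subject after a random start in the first interval."""
    interval = len(subjects) // sample_size  # 45 // 5 = 9
    rng = random.Random(seed)
    start = rng.randrange(interval)  # index 0 .. interval - 1
    return [subjects[start + i * interval] for i in range(sample_size)]


population = list(range(1, 46))  # subjects numbered 1..45
picked = systematic_sample(population, 5, seed=7)
print(picked)
```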

Stratified Random Sampling
  • Characteristics of population are known to the
    researcher before taking the sample
  • Sample is selected to mirror the population's
    proportions on characteristics such as ethnicity,
    age, gender, religion, education level, income
    level, etc.
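
A rough sketch of proportional stratified sampling is shown below. The population of 60 women and 40 men, the sample size of 10, and the function names are illustrative assumptions. Quotas are rounded, so with awkward proportions the total may differ slightly from the requested size.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, sample_size, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample


# Hypothetical population: (gender, id) pairs.
people = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
sample = stratified_sample(people, lambda p: p[0], 10, seed=3)
print(sample)
```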

Cluster Sampling
  • Unit chosen is not an individual, but is a
    cluster of individuals naturally grouped together
    such as Churches, Schools, Blocks, Counties,
    Businesses etc.
  • They are alike with respect to characteristics
    relevant to the study
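
In code, the key difference from the earlier procedures is that the random draw is over groups, not individuals. The sketch below assumes a one-stage design in which everyone in a chosen cluster is included; the school names and student lists are invented.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}


# Hypothetical clusters: schools and their students.
schools = {
    "school_a": ["a1", "a2", "a3"],
    "school_b": ["b1", "b2"],
    "school_c": ["c1", "c2", "c3", "c4"],
    "school_d": ["d1", "d2"],
}
selected = cluster_sample(schools, 2, seed=5)
print(selected)
```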

Non-Probability Sampling
  • Participants are not chosen by chance
  • They are chosen for economic and convenience
    reasons
  • Example: a study on student attitudes. Stop
    students at the gym only and ask them to take the
    survey. They are not necessarily representative
    of the total student population

Types of Non-Probability Sampling
  • Accidental: just encounter a number of people and
    ask them to be in your study
  • It is an extremely weak, but popular, method
  • Psychological research is often accidental
  • Convenience: similar to accidental. Researchers
    seek out individuals who are available
  • Likely to be biased
  • Not representative of any population
  • Should be avoided

  • Snowball - used for hard to reach but
    interconnected populations
  • One person identifies and recommends other
    people, and those people recommend other people,
    and on and on.
  • Typical subjects: drug dealers, prostitutes,
    practicing criminals, gang leaders, AA members

Data Analysis Chapter 12
Why are Statistics Important?
  • Statistics give numeric meaning to our data
  • Helpful tool for understanding the social world;
    they are used to:
  • 1. describe social phenomena
  • 2. identify relationships among them
  • 3. explore reasons for relationships
  • 4. test hypotheses
  • 5. interpret cause and effect

  • Can use statistics to distort reality
  • Lying with statistics is unethical
  • Easy to be careless when using statistics
  • Must use appropriate level of measurement for
    variables in our data

Preparing for Statistics
  • After data is collected it must be cleaned,
    checked and coded before statistics are run
  • There is software available to do this

Displaying Statistics
  • Graphics: bar charts, histograms, pie charts,
    frequency tables and curve graphs describe the
    shape of the data visually

Statistics for One Variable
  • Univariate: describes statistical
    characteristics of one variable: frequency
    distributions, summary statistics, measures of
    central tendency (mean, median, mode), skewness,
    measures of dispersion (range, variance, standard
    deviation), reliability tests
  • Display the distribution of cases across the
    categories of one variable
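
A frequency distribution for one variable takes only a few lines. The sketch below counts cases per category and converts the counts to percentages; the party data are invented for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, percent)} for one variable."""
    counts = Counter(values)
    total = len(values)
    return {
        value: (count, 100 * count / total)
        for value, count in sorted(counts.items())
    }


# Hypothetical responses to a single party-preference question.
party = ["Rep"] * 11 + ["Dem"] * 9
table = frequency_table(party)
for value, (count, percent) in table.items():
    print(f"{value}: {count} cases ({percent:.1f}%)")
```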

Univariate Stats
  • Frequency distribution (1xtables): displays the
    number and percentage of cases corresponding to
    each of a variable's values or group of values
  • Measures of Central Tendency
  • 1. Mean (arithmetic average of the values in a
    distribution): sum the values of the cases and
    divide by the number of cases

  • 2. Median (the point that divides the
    distribution in half): the one in the middle
  • 3. Mode (most frequent value in a
    distribution)
  • The Mean is the most frequently used because it
    is the foundation for more advanced statistics
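
The three measures can be compared on a small example using Python's standard `statistics` module. The ages below are made up; the single large value shows how an extreme case pulls the mean above the median.

```python
import statistics

# Hypothetical ages, including one much older respondent.
ages = [19, 20, 20, 21, 23, 25, 47]

mean_age = statistics.mean(ages)      # sum of values / number of cases
median_age = statistics.median(ages)  # middle value of the sorted cases
mode_age = statistics.mode(ages)      # most frequent value

print(mean_age, median_age, mode_age)
```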

  • Skewness: a lack of symmetry in the data (a
    symmetric distribution would be a bell curve)
  • If data are clustered to the left of center, with
    a long tail to the right: positive skew
  • If data are clustered to the right of center,
    with a long tail to the left: negative or inverse
    skew
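
One way to put a number on skewness is the moment-based coefficient, sketched below. Under that standard definition a long tail to the right gives a positive value and a long tail to the left gives a negative value. The choice of this particular coefficient and the data are assumptions; the slides do not name a formula.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 10]      # long tail to the right
left_tailed = [-x for x in right_tailed]   # mirror image
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed), skewness(left_tailed), skewness(symmetric))
```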

  • Measures of Variation or Dispersion
  • Are the data spread out or clustered?
  • 1. Range: highest value minus the lowest value
    plus one
  • 3. Variance: the average squared deviation of
    each case from the mean (takes into account the
    amount by which each case differs from the mean)

  • 4. Standard Deviation: preferred measure of
    variability because of its mathematical
    properties (square root of the variance)
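
The measures of dispersion can be worked through on a small made-up data set. The sketch uses the population forms `pvariance` and `pstdev` from the standard `statistics` module, since the variance is described here as the average squared deviation; sample formulas that divide by n - 1 would give slightly larger values.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

# Range as defined in these slides: highest minus lowest, plus one.
value_range = max(values) - min(values) + 1

variance = statistics.pvariance(values)  # average squared deviation
std_dev = statistics.pstdev(values)      # square root of the variance

print(value_range, variance, std_dev)
```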

Bivariate/multivariate Analysis
  • Describes the association between two or more
    variables
  • Some types: Cross-tabulation, Regression,
  • Measures of Association: descriptive statistics
    that summarize the strength of an association
    (variation in one variable is related to
    variation in another).
  • For example, Chi Sq. and Gamma are used to
    summarize the relationship between two or more
    variables in Cross-tabulation

  • The tables display the distribution of one
    variable for each category of another variable
    (see text pgs. 392-398)
  • Sex of voter determines party.
  • If Man then Republican

          Rep   Dem
Men        80    20
Women      30    70
What to Look for
  • The IV is Gender
  • Do the percentage distributions vary at all
    between categories of the independent variable?
    (existence)
  • How much? (strength)
  • (This example is nominal level data)
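
The existence and strength questions can be answered by percentaging within each category of the independent variable and comparing. The sketch below uses the Men/Women by Rep/Dem figures from the example table and treats them as counts, which is an assumption (the slide may already show percentages).

```python
# Rows are categories of the IV (gender); columns are the DV (party).
counts = {
    "Men": {"Rep": 80, "Dem": 20},
    "Women": {"Rep": 30, "Dem": 70},
}


def row_percentages(table):
    """Percentage each row so categories of the IV can be compared."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {party: 100 * n / total for party, n in row.items()}
    return result


percents = row_percentages(counts)
# Percentage-point gap in Republican support between men and women.
gap = percents["Men"]["Rep"] - percents["Women"]["Rep"]
print(percents, gap)
```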

Interval Level Data
  • Hypothesis - As education level (IV) increases,
    income level (DV) increases
  • Total N = 300
  • 100 with BAs
  • 100 with MAs
  • 100 with PhDs
  • Do values of the DV increase with increase in IV?
  • Are changes in the DV fairly regular, i.e.
    increasing fairly regularly? (pattern)

Income      BA    MA   PhD
LT 50       60    20     5
50-100      30    50    30
GT 100      10    30    65
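
Gamma, mentioned earlier as a measure of association for cross-tabulations, can be computed for this table by counting concordant and discordant pairs. The sketch assumes the rows are income groups from low to high and the columns are BA, MA and PhD, which is inferred from the 100 cases per degree. The function name is illustrative.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (concordant - discordant) / (concordant + discordant)."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):       # cells in lower rows
                for m in range(cols):
                    if m > j:                  # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                # higher on one, lower on other
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Rows: income LT 50, 50-100, GT 100. Columns: BA, MA, PhD.
income_by_education = [
    [60, 20, 5],
    [30, 50, 30],
    [10, 30, 65],
]
gamma = goodman_kruskal_gamma(income_by_education)
print(round(gamma, 3))
```

A value near +1 indicates that income rises consistently with education; a value near 0 indicates no ordinal pattern.
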
Inferential Statistics
  • They estimate the degree of confidence that can
    be placed in generalizations from a sample to the
    whole population from which the sample was drawn
  • Chi-Square: used in bivariate analysis to
    estimate the probability that an association
    between DV and IV is not due to chance alone.

  • A probability level of .05 (p < .05) from Chi Sq.
    means the probability that the association was
    due to chance is less than 5 out of 100 (5%)
  • The lower the probability score the higher the
    significance level.
  • A relationship between variables is said to be
    statistically significant when the analyst feels
    reasonably confident (often 95%) that an
    association was not due to chance.
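
A hand-rolled version of the chi-square test for a cross-tabulation is sketched below. It uses the Men/Women by Rep/Dem figures from the earlier example, treated as counts (an assumption), and compares the statistic with 3.841, the standard critical value for 1 degree of freedom at the .05 level.

```python
def chi_square(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


CRITICAL_VALUE_DF1_P05 = 3.841  # chi-square critical value, df = 1

observed = [[80, 20], [30, 70]]  # rows: Men, Women; columns: Rep, Dem
statistic = chi_square(observed)
significant = statistic > CRITICAL_VALUE_DF1_P05
print(round(statistic, 3), significant)
```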

  • Inferential statistics with Crosstabulation can
    tell us if there is an association more than
    would be expected by chance (coin toss 50/50)
  • But! Does not tell us how strong that
    relationship is (See pgs. 405-407)

Elaboration Analysis
  • Controlling for the effect of a third variable
  • Sometimes a 3rd variable could be affecting the
    association or strength of the association
    without us realizing it.
  • Example in text: the strength of the
    relationship between Arrest and Abuse is actually
    dependent on how much the perpetrator is vested
    in society, i.e. employed or not and married or
    not
  • In fact, if the apparent relationship disappears
    when an extraneous (3rd) variable is controlled,
    it is probably a spurious relationship. The IV we
    think is affecting the DV isn't; it's an
    extraneous variable we haven't considered.
  • We hypothesize that Income level (IV) affects how
    we vote (DV)
  • In reality, income is a reflection of education
    (IV) and it's education that really affects how
    we vote (DV).
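
Controlling for a third variable amounts to rerunning the comparison separately within each category of that variable. The counts below are invented so that income appears related to vote overall, yet the gap vanishes inside each education group; the party labels and category names are hypothetical.

```python
# (education, income) -> (votes for party A, votes for party B)
cells = {
    ("high_edu", "high_income"): (40, 10),
    ("high_edu", "low_income"): (8, 2),
    ("low_edu", "high_income"): (2, 8),
    ("low_edu", "low_income"): (8, 32),
}


def percent_a(income, education=None):
    """Percent voting A in an income group, optionally within one education level."""
    a = b = 0
    for (edu, inc), (votes_a, votes_b) in cells.items():
        if inc == income and (education is None or edu == education):
            a += votes_a
            b += votes_b
    return 100 * a / (a + b)


overall_gap = percent_a("high_income") - percent_a("low_income")
gap_high_edu = percent_a("high_income", "high_edu") - percent_a("low_income", "high_edu")
gap_low_edu = percent_a("high_income", "low_edu") - percent_a("low_income", "low_edu")
print(overall_gap, gap_high_edu, gap_low_edu)
```

The overall gap of about 38 percentage points drops to zero in both partial tables, which is the pattern expected of a spurious relationship.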

Regression Analysis
  • Regression analysis and correlation analysis:
    advantages over simple crosstabs; they give the
    strength of association between two or more
    variables
  • We often collapse values of variables into
    categories for crosstabs
  • Better to leave values as continuous for upper
    level summary stats
  • Example: Age 10-20, 21-30, 31-40 (grouped or
    categorical age)
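
With continuous values left ungrouped, the strength and direction of a linear association can be summarized by Pearson's r and a least-squares line. The sketch below computes both by hand; the education and income figures are made up for illustration.

```python
import math


def pearson_and_line(x, y):
    """Return (r, slope, intercept) for a simple linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical data: years of education and income (in thousands).
education_years = [12, 13, 14, 15, 16]
income = [20, 40, 50, 40, 50]
r, slope, intercept = pearson_and_line(education_years, income)
print(round(r, 3), slope, intercept)
```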

Ethics in Data Analysis
  • 1. When just letting the computer search around
    in the data for relationships without a testable
    hypothesis, relationships may appear just on the
    basis of chance but mean nothing.
  • A reasonable balance is needed between doing
    deductive data analysis (theory > hypothesis >
    significant association)
  • and inductive data analysis (exploration of
    patterns in a dataset)
  • If findings are serendipitous (based on inductive
    analysis), they must be reported as such

  • 2. Report findings honestly (do not lie with
    statistics even though it is possible to do so)
  • 3. Do not mislead people by choosing summary
    statistics that accentuate a particular feature
    of a distribution. Use statistical techniques

Tools for Data Analysis and Statistics
  • Computer software ranges from easy, but not very
    comprehensive to difficult, very robust and very
  • Excel, Access, Lotus: limited, elementary
    statistics, moderately expensive
  • Easy NCSS: user friendly, cheap (< 100), only
    numeric data entry, output so-so
  • SPSS: very comprehensive, user friendly,
    excellent graphics, small learning curve
  • Not very expensive for students (200-500)
  • SAS: most robust, sort of user friendly, big
    learning curve
  • CRISP, STATBasic, SYSstat: expensive, not user
    friendly, more for programmers than average user