Statistics for Research - PowerPoint PPT Presentation

Provided by: dano6

1
Statistics for Research
  • Spring 2004

2
Lies and Statistics
  • Statistical Lies
  • 14 April 1999
  • Dan Knight
  • Mark Twain is purported to have said, "There are
    lies, damned lies, and statistics."
  • Whether he actually said it or not, the fact
    remains that statistics can and do lie. Of all
    computer users, Mac users should know it best.
  • Remember the 3% and 4% market share figures we
    saw when Apple seemed on its deathbed? Well, they
    were only partly true -- they only measured part
    of the computer market.
  • Worse, market share numbers only apply to current
    sales. They tell us nothing about the installed
    base. So even when Apple had a 5% market share,
    anywhere from 10-15% of all computer users were
    using Macs. (See Mac installed base, MacInTouch)
  • After all, we don't sell our existing computers
    every month, quarter, or year!
  • More Lies
  • A recent study by WebSide Story showed that only
    3% of web users are on Macs. This is suspiciously
    low for a platform with at least 10% of the
    installed base!
  • The problem isn't the numbers, but how they were
    derived. WebSide Story used HitBOX Tracker. As
    Ben Wilson of MacCentral notes, "It should be
    emphasized that the HitBOX service only provides
    a viewer for Windows, meaning not many
    Mac-specific sites are likely to use the
    service."
  • Simply stated, if you use a platform-related
    measuring tool, results will be skewed toward
    that platform.

3
  • Real Statistics
  • A survey of visitors to Low End Mac shows very
    different results from those created by WebSide
    Story:
  • 58.5% use the Mac OS (not 2.7%), and only 36.6%
    use Windows (not 94.5%).
  • 63.8% browse with Netscape, 34.3% with Internet
    Explorer (not 68.7%), and 0.7% use iCab.
  • Admittedly, these numbers show a bias toward the
    Mac side, just as the WebSide statistics show a
    pro-Microsoft bias in both OS and browser.
  • The Truth
  • The truth is, we'll never have precise figures
    comparing Mac OS, Linux, Windows, and other
    market share or installed base. And any survey of
    the internet will be biased by the sites and
    software used -- sometimes just a little, but
    sometimes a great deal.
  • In reality, Mac users comprise somewhere over 10%
    of the installed base, but probably less than
    15%. Based on that and the ease of connecting all
    but the oldest Macs to the internet, we can
    estimate that Mac users are at least as likely to
    be on the internet as Windows users, so at least
    10% (and perhaps as high as 20%) of internet
    users are Mac users.
  • Anyone purporting to offer statistics on
    something as vague as the installed base or use
    of the internet with three or four digits of
    precision is being unrealistic. While the numbers
    may mathematically provide a figure of great
    precision, the nature of the study leaves a
    margin of error of several percentage points --
    just like those public opinion polls in the press
    and on TV.
  • We can safely say that Windows users outnumber
    Mac users in the home, in the office, and on the
    internet. We can estimate there are about seven
    times as many Windows users as Mac users.
  • Beyond that, we have to remember that there is a
    great margin of error in these surveys. I would
    be a fool to extrapolate from visitors to my site
    and claim that almost 60% of all computers on the
    internet are Macs, despite the numbers which
    could be construed to "prove" it.
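The "margin of error" that the slide compares to opinion polls can be computed when the sample is a simple random sample. Below is a minimal Python sketch, assuming simple random sampling and the usual normal (Wald) approximation for a proportion; the function names are illustrative. For a poll of 1,000 people with an observed share of 50%, it gives about 0.031, roughly 3 percentage points. It measures sampling error only: a skewed sample, such as the visitors to one Mac site or to sites using one tracking tool, can miss the true figure by far more than this interval suggests.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate confidence interval for a proportion.

    Normal (Wald) approximation: z * sqrt(p_hat * (1 - p_hat) / n).
    z = 1.96 corresponds to roughly 95% confidence. Assumes a simple
    random sample; it does not account for selection bias.
    """
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def confidence_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Interval p_hat +/- margin of error, clipped to the range [0, 1]."""
    moe = margin_of_error(p_hat, n, z)
    return max(0.0, p_hat - moe), min(1.0, p_hat + moe)


if __name__ == "__main__":
    # A poll of 1,000 respondents with 50% answering "yes".
    print(round(margin_of_error(0.5, 1000), 3))  # 0.031
```

Quadrupling the sample size only halves the margin of error, which is one reason polls rarely go far beyond a few thousand respondents.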

4
Another Example
  • TECHNOLOGY BREAKDOWN: Numbers Don't Lie - Or Do
    They?
  • Statistics are a funny thing. You can use them to
    prove almost any argument. For example, many
    people regard Philadelphia 76ers player Allen
    Iverson as a great player. To be sure, while I
    despise AI's off-court persona, there is no
    denying his prowess on the basketball court. One
    would expect that such a talented player would be
    able to carry his team like Michael Jordan did
    the Chicago Bulls. However, in the case of the
    76ers, the converse is true. Statistics show that
    in games where Iverson scores 40 or more points,
    the 76ers actually have a losing record. The same
    was true of another 76er great, Wilt Chamberlain.
    What makes the use of statistics persuasive then
    is something called correlation. Simply put,
    correlation measures how closely two observable
    phenomena move together - in this case, the
    number of times AI scores 40 or more and how
    many times Philly wins when he does. In our
    example, the level of correlation is pretty high.
    However, when the correlation between events is
    low, you can make pretty much any argument you
    want - statistics then becomes the smoke and
    mirrors used by spin doctors to shape the views
    and opinions of those who choose to, as the Wizard
    of Oz said, "ignore that man behind the curtain!"
    Right now, Republicans are pretty happy with
    themselves. On Sunday (December 14, 2003) US Army
    forces captured Saddam Hussein. But more
    importantly, the Bush Administration has been
    pushing the notion that the economy is
    recovering. Well, let's take a look at some
    numbers. The administration says unemployment is
    down. But what numbers do they use to support
    that claim? The most quoted statistic with regard
    to unemployment is the number of new claims for
    unemployment aid. If that number goes down,
    surely the unemployment problem is getting better,
    right? Not necessarily so. In November, the
    government stated that the unemployment rate fell
    from 6 percent to 5.9 percent, the lowest since
    the 5.8 percent measurement in March 2003. What could be
    wrong with that? It turns out plenty.
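Correlation can be computed directly from paired observations. The sketch below implements Pearson's correlation coefficient r (one common measure; rank-based measures such as Spearman's also exist). It ranges from -1 to 1 and describes how closely two variables follow a straight-line relationship; a strong correlation does not by itself show cause and effect. The scoring data in the usage lines are made up for illustration and are not Iverson's actual record.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables, and the spread of each one.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical games: points scored, and 1 for a team win, 0 for a loss.
points = [22, 28, 31, 35, 41, 44]
won = [1, 1, 1, 0, 0, 0]
print(round(pearson_r(points, won), 2))  # -0.87
```

The negative sign says that, in this made-up sample, higher-scoring games tend to be losses; it says nothing about why.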

5
  • First, the statistics are based on those people
    who are unemployed and are collecting
    unemployment benefits. One of the biggest
    weaknesses in the government's numbers is that
    they do not track what happens to people who
    exhaust their unemployment benefits and fall out
    of the system. Second, fluctuations in the
    unemployment rate give no indication how long
    people remain unemployed. Of the 8.7 million
    people classified as unemployed, nearly a quarter
    of them (23.7 percent) have been unemployed for
    over six months, which, it turns out, is the highest
    level of long-term unemployment in over 20
    years. Finally, let's not forget a statistic that
    is one of my favorites. Over the last 10 years,
    the unemployment rate in the African American and
    Latino communities has tracked at roughly double
    the national average. Apparently, Mr. Clinton's
    economic policies were about as beneficial for
    American minorities as Mr. Bush's - not very. In
    the name of fairness, Democrats like to take
    credit for the prosperity of the late 90's. The
    only thing that Mr. Clinton and his bunch had to
    do with the boom of the 90's was that they were
    sitting in the chairs at the time. The boom of
    the late 90's was due primarily to the emergence
    of the World Wide Web and a "gold rush" of
    speculative investment that followed and
    eventually collapsed. In fact, Clinton-era
    legislation such as the 1996 Telecommunications
    Reform Act probably did more to stifle
    competition by shifting the balance of power in
    favor of the Bells (local phone companies) as
    opposed to long-distance providers and
    competitive local exchange carriers (CLECs or
    alternative phone companies). That, and we won't
    even get into what the Telecom Act did to the
    media business. So where is the economic
    recovery that we keep hearing about? It's in the
    numbers and how you look at them. What we've
    shown here, or at least opened the door to, is the
    possibility that economic recovery is not very
    tightly correlated to job creation and low
    unemployment. We can say this because
    productivity is up while long-term unemployment
    is high. This is due to a number of factors,
    most notably automation. One of
    the primary factors that allowed the
    manufacturing industry to move overseas was that
    automation became affordable. Automation means equal
    or increased productivity with fewer employees.
    This translates to lower costs, which in turn
    translates to a stronger cash position which
    helps corporate management keep company stock
    prices (and their bonuses) up. To bring this
    point home, Lehman Brothers economist Drew Matus
    commented that the reported 2% increase in the number
    of aggregate hours worked in the economy was "the
    equivalent of adding 350,000 additional jobs."

6
  • On Tuesday, the Department of Commerce issued a
    report of the "Digital Economy" that concludes
    that signs of recovery are present in the
    technology industry. Again, a look behind the
    numbers tells a different story. For instance,
    the report indicates that IT employment
    decreased by 9.2 percent from 2001 to 2002,
    nearly six times the rate for all other private
    industries combined, which lost jobs at a rate
    of 1.6 percent over the same period. The report
    also notes that the average wage for IT workers
    decreased by 1.3 percent from 2002 to 2003, while
    that of the average worker increased 1 percent
    over the same period. Call me crazy, but that
    doesn't sound like recovery to me. A few months
    ago, I wrote a column that talked about the dim
    prospects for the future of American IT
    employment, mostly due to offshoring and foreign
    outsourcing. The Digital Economy report states
    that of the 516,000 IT jobs lost since 2000,
    nearly half have been highly skilled positions,
    you know, engineers, programmers, and the like. To
    avoid ticking off potential campaign
    contributors, the government report draws no
    conclusions of its own, but reiterates those of
    other researchers. One of the conclusions drawn
    by Goldman Sachs and cited in the report was that
    "job losses due to offshoring could reach 6
    million over the next decade." More frightening
    was Goldman Sachs' conclusion that "In the end,
    increased offshoring would reduce labor demand,
    raise imports, and place downward pressure on the
    value of the dollar." The long and short of it
    is that those on both sides of the political
    aisle can use statistics to prove whatever story
    is convenient at the time. Next year, we will
    once again be faced with choosing a new
    President. When some campaign flunky starts
    telling you about how great their person is, and
    offers a few statistics, take a look behind the
    numbers because as we now know, numbers can lie.
    Russell de Pina is a Principal with n2active, a
    technology consulting firm located in Long Beach,
    CA and Houston, TX. Russell can be reached by
    email at rdepina_at_n2active.com and
    eurfeedback_at_eurweb.com

7
One final Example
  • Lies, Damn Lies, and Statistics (Publishing:
    Free or Fee?)
  • By Vin Crosbie, August 13, 2002
  • In the U.S., Mark Twain popularized Benjamin
    Disraeli's statement, "There are three kinds of
    lies: lies, damn lies, and statistics." That
    sardonic journalist, author, and speaker
    recognized the persuasive power of
    authoritatively made numeric presentations to a
    largely innumerate public. Ninety-two years after
    Twain's death, the Online Publishers Association
    issued a statistical report that buoys advocates
    of charging for online content.
  • Entitled "Online Paid Content U.S. Market
    Spending Report" and prepared by survey firm
    comScore Networks, the report states U.S.
    consumers spent $675 million for online content
    in 2001, a 92 percent increase over the previous
    calendar year.
  • Advocates of charging for online content point to
    this report's statistics as a signal that consumers
    are more willing to pay. The statistics have led
    journalists to affirm that a trend is underway.
    "A Shift Registers in Willingness to Pay for
    Internet Content," headlines The New York Times.
    "Americans are warming up to paying for content
    on the Web," adds CNET's News.Com.
  • Wow! Consumers are paying for online content.
    What is it that made them suddenly more willing
    to do this?

8
  • To answer that question, I must quote another
    famous American, William Jefferson Clinton: "It
    depends upon what the meaning of the word 'is'
    is." That presidential statement to a grand jury
    is now enshrined in Bartlett's Familiar
    Quotations, which is edited by Justin Kaplan,
    Mark Twain's biographer.
  • Clinton's quote is an exercise in semantics. So,
    unfortunately, is the OPA's report.
  • OPA is "an industry trade organization dedicated
    to representing high-quality online publishers
    before the advertising community, the press, the
    government, and the public," whose membership
    includes Bankrate.com, CBS MarketWatch, CNET
    Networks, CondéNet, Cox Enterprises, ESPN.com,
    Forbes.com, Knight Ridder Digital, Le Monde
    Interactif, MSNBC.com, New York Times Digital,
    Salon Media Group, Scripps Network, Slate,
    SPACE.com, Tribune Interactive, USATODAY.com,
    Wall Street Journal Online, Washingtonpost.com,
    Newsweek Interactive, and weather.com.
  • Those companies are publishers and broadcasters
    of what most people traditionally term content:
    news and features, or TV programming. OPA
    members' parent companies would likely define
    content that way.
  • How does the OPA report define paid content? "We
    restrict our definition of 'paid content' to
    digital intellectual property purchased through a
    Web browser by an individual."
  • Paid "digital intellectual property." What does
    that mean exactly?
  • To OPA, it means not only what its membership's
    parent companies would traditionally define as
    content but also business-to-business (B2B)
    online research, such as Internet industry
    reports purchased from eMarketer.com; wiring
    reports from the Institute of Electrical and
    Electronics Engineers' site; day trader
    investment advice (e.g., ChangeWave.com); and
    downloadable clip art (e.g., ArtToday). All this
    B2B material counts toward consumer paid content
    totals.
  • Also included is pure research, such as
    Merriam-Webster OnLine, eLibrary.com,
    Britannica.com, and USSEARCH, a category
    including not only dictionaries and encyclopedias
    but also public information and personal
    background searches. To OPA, content also means
    online subscriptions to services. OPA counts
    "Community Directories" such as Ancestry.com and
    Classmates.com. The services OPA counts as
    content include online consumer credit help such
    as ConsumerInfo.com and CreditExpert, plus
    "Personal Growth" services, such as eDiets and
    WeightWatchers.com.

9
  • OPA includes online greeting cards services
    (AmericanGreeting.com, Blue Mountain Arts, and
    Hallmark.com) as content. Oh, gift certificates
    purchased from those sites are content, too.
  • OPA says "Entertainment/Lifestyle" streaming
    media from Real.com and pressplay and even pinups
    from Playboy.com are content. The study further
    counts online game site subscriptions, such as
    the Alien Adoption Agency, Case's Ladder, and The
    Well Dressed SIM.
  • How far does OPA's definition of content go? Toss
    in personal ads and dating site subscriptions,
    such as Match.com, Singles.com, and kiss.com.
  • "Digital intellectual property"? I can understand
    Britannica.com's content being defined that way.
    But a personal ad labeled "Sexy Guy Seeks Comfy
    Girl" on kiss.com? A post of "Uma's
    Unmentionables" on The Well Dressed SIM? Gift
    certificates on Hallmark.com?
  • Do I detect OPA inflating the definition of paid
    online content? Why not add online revenue from
    Ticketmaster and other traditional services?
  • At least OPA decided to "exclude software
    downloads, pornographic sites, gambling sites,
    and certain other classes of sites which, in our
    view, skirted the bounds of decency or the law."
    Interesting. You might think software would fit
    the description of digital intellectual property.
    I wonder what bounds of decency or law thousands
    of companies that offer downloadable software are
    skirting? I can see excluding gambling
    subscriptions and winnings as a form of content.
    It is incongruous that OPA would exclude pornographic
    downloads, perhaps the most widespread form of
    online paid content.
  • Does OPA's definition of online paid content make
    sense when you look at its membership? No, but it
    sure fabricates a great amount of money OPA can
    claim is being paid for online content.
  • Let's look again at the $675 million the report
    says was spent on online content in 2001. Here's
    the breakdown:

10
  • Business Content -- $214.3 million
  • Entertainment/Lifestyle -- $112.0 million
  • Personals/Dating -- $72.0 million
  • Research -- $57.9 million
  • General News -- $51.8 million
  • Games -- $46.5 million
  • Community Directories -- $46.1 million
  • Credit Help -- $32.4 million
  • Personal Growth -- $24.7 million
  • Sports -- $10.0 million
  • Greeting Cards -- $6.8 million
  • Cut categories that actually provide paid
    services (Personals/Dating, Community
    Directories, Personal Growth, and Greeting Cards)
    and $675 million drops to about $525 million.
    Then, eliminate downloadable songs and pinups
    (Entertainment/Lifestyle), and the figure arguably
    falls to about $413 million. Remove B2B content
    from the Business Content and Research portions
    of this consumer spending report, and the total
    spent online for content, as traditionally
    defined, was probably only about $141 million
    last year.
  • Indeed, the only OPA members listed in the
    report's ranking of the top 25 paid online
    content sites were The Wall Street Journal Online
    and ESPN.com.
  • Did American consumers spend $675 million for
    online content last year? Depends how you define
    content. Most people wouldn't define it the way
    OPA does. But OPA's definition sure makes for a
    spectacle of a press release.
  • "If a spectacle is going to be particularly
    imposing, I prefer to see it through somebody
    else's eyes, because that man will always
    exaggerate. Then, I can exaggerate his
    exaggeration, and my account of the thing will be
    the most impressive," Mark Twain also said.
  • It seems the wishful revenue hype of the Internet
    bubble era isn't entirely over. It's doubly
    ironic when you consider traditional media
    outlets belonging to OPA's membership are
    headlining the business stories about inflated
    revenues.
  • OPA states, "This first report and its ongoing
    installments will enliven the debate about the
    role of paid content in online publishing." It
    did for me. There is a role for reporting the
    amount of paid content in online publishing, but
    there's no role for hype.
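The subtraction in this argument can be rechecked from the listed figures. A short Python sketch follows (the names are illustrative); it treats Business Content and Research as entirely B2B and Entertainment/Lifestyle as entirely songs and pinups, which is how the argument above handles them. With the figures exactly as listed, the eleven categories sum to $674.5 million (the report's $675 million, rounded), and the four that remain, General News, Games, Credit Help, and Sports, total $140.7 million.

```python
# Category totals as listed on the slide, in units of $0.1 million
# (integers keep the sums exact; divide by 10 to get millions).
BREAKDOWN = {
    "Business Content": 2143,
    "Entertainment/Lifestyle": 1120,
    "Personals/Dating": 720,
    "Research": 579,
    "General News": 518,
    "Games": 465,
    "Community Directories": 461,
    "Credit Help": 324,
    "Personal Growth": 247,
    "Sports": 100,
    "Greeting Cards": 68,
}


def total_millions(breakdown: dict[str, int], excluded: set[str]) -> float:
    """Sum of all categories not in `excluded`, in millions of dollars."""
    unknown = excluded - set(breakdown)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    return sum(v for k, v in breakdown.items() if k not in excluded) / 10


# Groups of categories set aside in the argument.
SERVICES = {"Personals/Dating", "Community Directories",
            "Personal Growth", "Greeting Cards"}
SONGS_AND_PINUPS = {"Entertainment/Lifestyle"}
B2B = {"Business Content", "Research"}

print(total_millions(BREAKDOWN, set()))                              # 674.5
print(total_millions(BREAKDOWN, SERVICES))                           # 524.9
print(total_millions(BREAKDOWN, SERVICES | SONGS_AND_PINUPS))        # 412.9
print(total_millions(BREAKDOWN, SERVICES | SONGS_AND_PINUPS | B2B))  # 140.7
```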

11
  • Which of the two types of pasture (danthonia
    sub/clover, phalaris sub/clover) has a faster
    growth rate between July and October (in
    kg/ha/day)?
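A standard way to approach a question like this is a two-sample t-test on the growth rates (kg/ha/day) observed under each pasture type. The sketch below computes Welch's t statistic, which does not assume the two pastures have equal variances, along with the Welch-Satterthwaite degrees of freedom. It assumes independent observations and roughly normal sampling distributions of the means; the function name is illustrative, and turning t into a p-value needs the t distribution (for example, scipy.stats.ttest_ind with equal_var=False, if SciPy is available). Because the question asks which pasture grows faster, a one-sided test is the natural choice.

```python
import math
from statistics import mean, variance
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    A positive t means sample `a` has the higher mean. The two samples
    may have different sizes and different variances.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each sample mean: s^2 / n.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("t is undefined when both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```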

12
(No Transcript)
13
(No Transcript)
14
Class Objectives
  • Understand the scientific rationale behind
    different statistical methods.
  • Understand the socio-political implications of
    the use of different statistical methods.
  • Understand the different approaches to statistical
    research methodologies (descriptive and
    inferential methods) as they apply in the natural
    and social sciences.
  • Critically analyze those statistical methods.
  • Discuss the ethical dimensions of statistical
    research.
  • Understand the planning and preparation stages
    of statistical research through its application.
  • Apply the skills and theories discussed within
    the context of the Prescott and Las Vegas areas.
  • Explore how well-structured and rigorously
    acquired knowledge on social and environmental
    problems can be a powerful tool in finding their
    solutions.

15
Course Outline
  • I. Introductory Concepts
  • A. Why are we here? If statistics are tricky,
    why should we learn about them?
  • B. Types of data
  • C. The Basics of SPSS
  • D. Populations and sampling
  • II. Basics of Descriptive Statistics
  • A. The Standard Normal Distribution
  • B. Descriptive statistics of location and
    dispersion
  • C. The Binomial and other Distributions
  • III. Inferential Statistics: The use of
    statistics to test hypotheses and establish
    connections and causalities.
  • A. Comparing descriptive statistics
  • B. Analysis of Variance (ANOVA)
  • E. Correlation
  • F. Chi-Square and non-parametric tests
  • G. Simple and multiple regression
  • H. Evaluating inference and manipulating data to
    meet the assumptions of statistical models.
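The course works in SPSS, but the measures named in items II.A and II.B are simple enough to sketch in a few lines of Python with the standard-library statistics module (function names are illustrative). describe reports common measures of location (mean, median) and dispersion (range, sample standard deviation), and z_scores expresses each value in standard-deviation units, the same scale used by the standard normal distribution.

```python
from statistics import mean, median, stdev
from typing import Sequence


def describe(values: Sequence[float]) -> dict[str, float]:
    """Measures of location and dispersion for one sample."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        # Sample standard deviation (divides by n - 1, not n).
        "sd": stdev(values),
    }


def z_scores(values: Sequence[float]) -> list[float]:
    """Express each value as a number of sample SDs above or below the mean."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - m) / s for v in values]
```

For example, describe([2, 4, 4, 4, 5, 5, 7, 9]) gives a mean of 5, a median of 4.5, a range of 7, and a sample standard deviation of about 2.14.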

16
Evaluation
  • Contract
  • 4 Computer Assignments
  • Take-home exam in Vegas and Prescott

17
Text
  • Kirkpatrick, L. and Feeney, B. (2001) A Simple
    Guide to SPSS for Windows. Belmont, CA,
    Wadsworth. 118 pp.

18
Course Schedule