Title:

## DATA AND DATA COLLECTION


• Lecture 3

What is STATISTICS?
• Statistics is a discipline which is concerned
with
• designing experiments and other data collection,
• summarising information to aid understanding,
• drawing conclusions from data, and
• estimating the present or predicting the future.

What is STATISTICS?
• A branch of applied mathematics concerned with
• the collection and interpretation of quantitative
data, and
• the use of probability theory to estimate
population parameters

What is STATISTICS?
• "I like to think of statistics as the science of
learning from data."
• Jon Kettenring, 1997, President, American
Statistical Association

| Jun-07 | GSE All Share Index |
|---|---|
| 1 | 5,225.80 |
| 4 | 5,226.04 |
| 5 | 5,226.40 |
| 6 | 5,226.72 |
| 7 | 5,226.92 |
| 8 | 5,237.34 |
| 9 | 5,237.74 |
| 10 | 5,238.90 |
| 13 | 5,238.99 |
| 14 | 5,250.31 |
| 15 | 5,258.52 |
| 18 | 5,263.37 |
| 19 | 5,263.58 |

What is Data?
• It is a collection of facts from which meaningful
conclusions can be drawn.
• Examples
• names,
• numbers,
• text,
• graphics,
• decimals.
• The singular form is datum and the plural form is
data.

Types of Data
• Qualitative
• Quantitative

Qualitative
• Qualitative data are not expressed numerically;
they are non-numeric data.
• E.g. colour, race, geographical region, industry,
sex, type of car, place of birth, etc.
• Qualitative data may also be referred to as
categorical.

Quantitative
• Quantitative data are given numerically; they are
numeric data.
• They can be further categorised into
• discrete
• continuous

Quantitative
• Discrete data are numeric data that can take only
a countable number of distinct values; they
typically represent counts.
• Examples: a finite subset of the counting numbers
(1, 2, 3, 4 and 5), or
• how many students were present on a given day.
• Discrete data are represented by integers, e.g.
the number of firms listed on the Ghana Stock
Exchange.

Quantitative
• Continuous quantitative data can take infinitely
many possible values.
• They can be represented by real numbers.
• They are continuous, with no gaps or
interruptions.
• Physically measurable quantities such as length,
volume, time and mass are generally considered
continuous.
• At the physical level, especially for mass, this
may not be strictly true.
• E.g. company profit, height, mass and length.

Quantitative
• The structure and nature of data will greatly
affect the choice of analysis method.
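
As a rough illustration of how the type of data drives the choice of summary, the sketch below reports the mode for qualitative data and the mean for quantitative data. The function name `summarise` and all sample values are hypothetical, and treating every integer column as discrete is a simplifying assumption.

```python
from collections import Counter
from statistics import mean


def summarise(values):
    """Pick a summary that suits the kind of data supplied."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        # Qualitative (categorical): an average is meaningless, so report the mode.
        category, count = Counter(values).most_common(1)[0]
        return {"type": "qualitative", "mode": category, "count": count}
    # Quantitative: integers are treated as discrete, anything else as continuous.
    kind = "discrete" if all(isinstance(v, int) for v in values) else "continuous"
    return {"type": "quantitative, " + kind, "mean": mean(values)}


# Hypothetical observations, invented for illustration.
car_types = ["saloon", "pickup", "saloon", "bus"]
students_present = [42, 38, 45, 40]
heights_m = [1.62, 1.75, 1.80]

for data in (car_types, students_present, heights_m):
    print(summarise(data))
```

The storage type is only a proxy: a continuous quantity recorded to the nearest whole unit would be labelled discrete here, so in practice the classification should come from knowledge of what was measured.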

Data-Cross Sectional
• Data sets may also be described as
• cross-sectional
• time series
• Cross-sectional data refers to data collected by
observing many subjects (such as individuals,
firms or countries/regions) at the same point of
time, or without regard to differences in time.
• Put another way, a cross-sectional data set contains
observations on multiple phenomena observed at a
single point in time.

Data-Cross Sectional
• The values of the data points have meaning, but
the ordering of the data points does not.
• Analysis of cross-sectional data usually consists
of comparing the differences among the subjects.
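
A small sketch of the point that ordering carries no meaning in cross-sectional data: reshuffling the subjects leaves the summaries unchanged. The firm names and profit figures are hypothetical, and `spread` is an illustrative helper, not a standard function.

```python
import random
from statistics import mean

# Hypothetical profits (in millions) for five firms observed in the same year.
profits = {"Firm A": 12.0, "Firm B": 7.5, "Firm C": 20.0, "Firm D": 3.5, "Firm E": 9.0}


def spread(observations):
    """Largest minus smallest value: one simple way to compare subjects."""
    return max(observations) - min(observations)


original = list(profits.values())

# Reorder the subjects at random; a fixed seed keeps the run repeatable.
shuffled = original[:]
random.Random(0).shuffle(shuffled)

print(mean(original), mean(shuffled))      # same mean either way
print(spread(original), spread(shuffled))  # same spread either way
```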

Data-Time Series
• Time series data is a sequence of numerical data
points in successive order, usually occurring in
uniform intervals.
• Equivalently, it is a sequence of numbers collected
at regular intervals over a period of time.
• Stated in yet another way, time series data is a
data set containing observations on a single
phenomenon observed over multiple time periods.

Data-Time Series
• In time series data, both the values and the
ordering of the data points have meaning.
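
To see why ordering matters in a time series, the sketch below takes the GSE All Share Index figures listed earlier in this lecture and computes the change from each observation to the next. The first column is used only as a label, exactly as given; the helper name `changes` is hypothetical.

```python
# GSE All Share Index, Jun-07: (label, index value) pairs copied from the lecture.
gse_index = [
    (1, 5225.80), (4, 5226.04), (5, 5226.40), (6, 5226.72), (7, 5226.92),
    (8, 5237.34), (9, 5237.74), (10, 5238.90), (13, 5238.99), (14, 5250.31),
    (15, 5258.52), (18, 5263.37), (19, 5263.58),
]


def changes(series):
    """Change from each observation to the next, rounded to 2 decimal places."""
    return [
        (later_label, round(later - earlier, 2))
        for (_, earlier), (later_label, later) in zip(series, series[1:])
    ]


in_order = changes(gse_index)
reversed_order = changes(gse_index[::-1])

# In the recorded order every change is a rise; reversing the order turns
# every rise into a fall, so the ordering itself carries information.
print(all(step > 0 for _, step in in_order))
print(all(step < 0 for _, step in reversed_order))

# The largest single rise and the label of the observation where it occurred.
print(max(in_order, key=lambda pair: pair[1]))
```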

Data-Panel
• A data set containing observations on multiple
phenomena observed over multiple time periods is
called panel data.
• The second dimension of the data may be something
other than time.
• For example, when there is a sample of groups, such
as company subsidiaries, and several observations
from every group, the data are panel data.

Data-Panel
• Whereas time series and cross-sectional data are
both one-dimensional, panel data sets are
two-dimensional.
• Some data sets could possess more than two
dimensions.
• Such data sets are called multi-dimensional panel
data.
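
One way to picture the two dimensions of panel data is a table keyed by (subject, period): fixing the period gives a cross-section, and fixing the subject gives a time series. The sketch below assumes hypothetical firms, years and profit figures, and the helper names are illustrative only.

```python
# Hypothetical panel: profit (in millions) for three firms over three years.
panel = {
    ("Firm A", 2005): 10.0, ("Firm A", 2006): 11.5, ("Firm A", 2007): 12.0,
    ("Firm B", 2005): 6.0,  ("Firm B", 2006): 7.0,  ("Firm B", 2007): 7.5,
    ("Firm C", 2005): 18.0, ("Firm C", 2006): 19.0, ("Firm C", 2007): 20.0,
}


def cross_section(data, period):
    """All subjects at one point in time."""
    return {subject: value for (subject, p), value in data.items() if p == period}


def time_series(data, subject):
    """One subject over all periods, in time order."""
    return [(p, value) for (s, p), value in sorted(data.items()) if s == subject]


print(cross_section(panel, 2007))
print(time_series(panel, "Firm B"))
```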

Source of Data
• Primary
• Secondary.

Source-Primary
• Primary data is gathered specifically for a
research project.
• Data collected from the original source.
• Examples include data from
• focus groups,
• telephone surveys,
• interviews,
• questionnaires.

Source-Primary
• Advantages of primary data
• Collection based on researcher's need
• Control over measurement selection and execution
• Timeliness of the data can be controlled
• Representativeness of the data can be ensured

Source-Primary
• Advantages of primary data
• Type of information desired can be directly
determined by the design of the questions
• Data are collected to fit the specific purpose
• Data are current
• Secrecy can be maintained

Source-Primary
• Disadvantages of primary data
• Expensive
• Time-consuming
• Quality declines if interviews are lengthy
• Reluctance to participate in lengthy interviews

Source-Secondary
• Secondary data is information that has already
been collected and is available to the public.
• Examples
• population statistics from the Ghana Statistical
Service (GSS) Census Office,
• economic indicators from the GSS,
• trading data from the Ghana Stock Exchange (GSE),
• information in government documents,
• industry and trade journals.

Source-Secondary
• Data contained in the published accounts of
organisations.
• Many businesses and organisations also collect
information about their customers or clients
(such as where they live), and this is also
considered secondary data.

Types of Secondary data

Source-Secondary
• Advantages of secondary data
• Little cost or time required to access data
(inexpensive)
• Not confined to immediate level or unit of
analysis
• available more quickly
• Several sources are available.

Source-Secondary
• Advantages of secondary data
• Saves time and money if on target
• Aids in determining direction for primary data
collection
• Pinpoints the kinds of people to approach
• Serves as a basis of comparison for other data

Source-Secondary
• Disadvantages of secondary data
• Information may be outdated
• May not be suitable
• Methodology for collection may be inappropriate.
• May not be on target with the research problem
• Quality and accuracy of data may pose a problem

Evaluating Secondary Data
• Overall suitability
• Precise suitability
• Costs and benefits

Evaluating Secondary Data
• Overall suitability
• Does the data set contain the information you
require to answer your research question(s)?
• Do the measures used match those you require?
• Is the data set a proxy for the data you really
need?
• Does the data set cover the population which is
the subject of your research?

Evaluating Secondary Data
• Overall suitability
• Can data about the population which is the
subject of your research be separated from
unwanted data?
• Are the data sufficiently up to date?
• Are data available for all the variables you
require to answer your research question(s)?

Evaluating Secondary Data
• Precise suitability
• How reliable is the data set you are thinking of
using?
• How credible is the data source?
• Is the methodology clearly described?
• If sampling was used what was the procedure and
what were the associated sampling errors and
response rates?

Evaluating Secondary Data
• Precise suitability
• Who was responsible for collecting or recording
the data?
• (For surveys) is a copy of the questionnaire or
interview checklist included?
• (For compiled data) are you clear how the data
were analysed and compiled?

Evaluating Secondary Data
• Precise suitability
• Are the data likely to contain measurement bias?
• What was the original purpose for which the data
were collected?
• Who were the target audience and what was their
relationship to the data collector or compiler?

Evaluating Secondary Data
• Precise suitability
• Have there been any documented changes in the way
the data are measured or recorded, including
definition changes?
• How consistent are the data obtained from this
source when compared with data from other
sources?

Evaluating Secondary Data
• Costs and benefits
• Are you happy that the data have been recorded
accurately?
• What are the financial and time costs of
obtaining these data?
• Have the data already been entered into a
computer?
• Do the overall benefits of using this secondary
data source outweigh the associated costs?

Methods of Data Collection
• Census
• Survey

Sample Selection
• Population
• Sample frame
• Sample size
• Sampling error

Principles of Sampling
• Probability
• Non-Probability

Methods of Sampling
• Random sampling
• Purposive sampling
• Stratified sampling
• Systematic sampling
• Multi-stage and multi-phase sampling
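
Three of the methods listed above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a survey-grade implementation: the population of 100 numbered firms and the two strata are hypothetical, the systematic sampler assumes the sample size does not exceed the population size, and the stratified sampler uses proportional allocation (the same fraction from every stratum).

```python
import random


def simple_random_sample(population, n, seed=None):
    """Every unit has the same chance of selection; no unit is drawn twice."""
    return random.Random(seed).sample(population, n)


def systematic_sample(population, n, seed=None):
    """Random start, then every k-th unit, where k = len(population) // n."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample


# Hypothetical sample frame: firms numbered 1 to 100.
population = list(range(1, 101))
# Hypothetical strata: 60 small firms and 40 large firms.
strata = {"small": population[:60], "large": population[60:]}

print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
print(stratified_sample(strata, 0.1, seed=1))
```

With these inputs the stratified sample always contains 6 small and 4 large firms, mirroring their shares of the population, whereas a simple random sample of 10 may over- or under-represent either group.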