Title: African Handbook Experiences on Census Data Processing, Analysis and Dissemination
- UNITED NATIONS ECONOMIC COMMISSION FOR AFRICA
- African Centre for Statistics
AFRICA SYMPOSIUM ON STATISTICAL DEVELOPMENT, 19-21 November 2009, Dakar, Senegal
Outline
- Background
- Overall Population and Housing Census (PHC) Planning
- Planning and Preparation of Census Data Processing
- Decision on data processing technology
- Data processing flow
- Evaluation of software
- Acquiring software
- Acquiring hardware
- Methods of data capture
- Way forward
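Background
- Recommendation of the Africa Symposium on Statistical Development (ASSD) held in Luanda
- ECA mission to South Africa (SA) to discuss the outline and contents of the handbook with Statistics South Africa (SSA)
- ECA organised an expert group meeting (EGM) on the draft handbook from 5 to 7 October in Pretoria, SA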
Planning and Preparation of PHC Data Processing
- Testing
- Programmes
- Questionnaire design
- Tracking documents
- Procurement process
- Choice of materials
- Data editing
- Data coding
- Data capture
- Tabulation
- Dissemination
- Analysis
- Archiving
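Tracking documents usually comes down to reconciling what should have arrived at the processing centre with what actually arrived. The sketch below is a minimal illustration under stated assumptions: each enumeration area (EA) has an expected questionnaire count from a control list, and received batches are logged as (EA code, count) pairs. The EA codes and the function name `reconcile_receipts` are hypothetical.

```python
from collections import Counter


def reconcile_receipts(expected, received_batches):
    """Compare expected questionnaire counts per enumeration area (EA)
    with the counts actually logged at the processing centre.

    expected: dict mapping EA code -> number of questionnaires expected
    received_batches: iterable of (ea_code, count) tuples, one per batch
    Returns a dict EA code -> (expected, received) for every EA whose counts
    differ, including EAs that arrived but were not on the control list.
    """
    received = Counter()
    for ea_code, count in received_batches:
        received[ea_code] += count  # several batches may come from one EA

    discrepancies = {}
    for ea_code in sorted(set(expected) | set(received)):
        exp = expected.get(ea_code, 0)
        got = received.get(ea_code, 0)
        if exp != got:
            discrepancies[ea_code] = (exp, got)
    return discrepancies


if __name__ == "__main__":
    expected = {"EA001": 120, "EA002": 95, "EA003": 110}  # hypothetical control list
    batches = [("EA001", 60), ("EA001", 60), ("EA002", 90), ("EA004", 5)]
    print(reconcile_receipts(expected, batches))
    # {'EA002': (95, 90), 'EA003': (110, 0), 'EA004': (0, 5)}
```

Any EA that appears in the result would be followed up before its questionnaires move on to storage and capture.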
Planning and Preparation of Census Data Processing
- Good census data processing (DP) planning entails identifying what needs to be done, when and by whom.
- DP will be required for the results of census tests, compilation of preliminary results, preparation of tabulations, evaluation of census results, analysis of census data, storage in and retrieval from a database, identification and correction of errors, and so on.
- The existing DP staff will almost certainly need to be expanded and will probably need to upgrade their skills.
- Decisions will need to be made on where in the country the various DP activities will be located.
- When considering the processing equipment to be used in the census, decisions will also have to be made on the software to be used for capturing, editing and tabulating the census data.
- If outsourcing some IT-related operations is considered, it should be implemented so as to bring immediate economic and quality advantages to census operations.
- Given the long duration of the census cycle, planning should not remain static; it should be flexible enough to take into account any changes in scope that may occur.
Decision on data conversion technology
- The choice has a great influence on the design, layout and production of the collection tool. This in turn determines the technology required to support the data conversion process.
- These choices are made during the planning phase.
- As a general rule, if hard-copy forms are used, there should be a mechanism to protect them from adverse weather conditions and high humidity.
- If the data conversion technology selected is key from paper (KFP), there should be sufficient space on the form for writing in codes for open-ended questions.
- If scanning technology and optical mark recognition (image processing) are used, the collection tool has to be printed on durable paper that can withstand being put through a scanner more than once.
- The data collector has to handle the tool carefully to avoid introducing unintended marks on it, which would degrade the quality of the information on the questionnaire.
- The census has to run to timelines. This requires project plans and a statement of financial requirements.
- The stakeholders in this process are to be involved and kept informed of the plans. The purpose of this interaction is to secure the commitment of the relevant stakeholders, especially those that provide funding.
- The intercensal period should be used to identify the appropriate technology and to carry out feasibility studies on it.
- Given the proliferation of computing technology, one of the factors to consider is whether support for a given technology is available within the country.
- Cost-effectiveness and affordability are other factors to take into consideration when deciding on the appropriateness of the technology.
Data processing flow
- Receiving and audit of questionnaires
- Storage and document management
- Data capturing (scanning, recognition, coding, key correction)
- Quality assurance
- Output process (tabulation, products, Internet, prints, CDs)
- Statistical process (data examination, derivation, comparison, adjustment)
- Validation (editing, correction, imputation, data cleaning)
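The validation stage (editing, correction, imputation) can be illustrated with sequential hot-deck imputation, a technique widely used in census editing: a record that fails an edit takes its value from the most recent valid record in the same group. This is a minimal sketch, not the handbook's method. It assumes a hypothetical `Person` record with only sex and age, a hypothetical valid age range of 0 to 120, and groups defined by sex alone; real edit specifications are far richer.

```python
from dataclasses import dataclass
from typing import Optional

MIN_AGE, MAX_AGE = 0, 120  # hypothetical valid range for the age edit


@dataclass
class Person:
    person_id: int
    sex: str               # "M" or "F"
    age: Optional[int]     # None means the item was left blank
    imputed: bool = False  # set when the age was filled in by imputation


def age_is_valid(age):
    """Edit rule: age must be present and inside the valid range."""
    return age is not None and MIN_AGE <= age <= MAX_AGE


def edit_and_impute_age(records, cold_deck):
    """Sequential hot-deck imputation of age, grouped by sex.

    cold_deck gives a starting donor age per sex. Each valid age replaces
    the donor for its sex, so a failed record receives the age of the most
    recent valid record of the same sex.
    """
    hot_deck = dict(cold_deck)
    for rec in records:
        if age_is_valid(rec.age):
            hot_deck[rec.sex] = rec.age  # update the donor value
        else:
            rec.age = hot_deck[rec.sex]  # impute from the donor
            rec.imputed = True           # flag for imputation reporting
    return records


if __name__ == "__main__":
    people = [Person(1, "F", 34), Person(2, "M", None), Person(3, "F", 250),
              Person(4, "M", 41), Person(5, "M", -1)]
    for p in edit_and_impute_age(people, cold_deck={"M": 30, "F": 28}):
        print(p.person_id, p.sex, p.age, p.imputed)
```

Keeping the `imputed` flag lets the statistical process stage report how much of each variable was imputed.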
EVALUATION OF SOFTWARE
- Evaluation criteria include whether:
  - The software is easy to learn and use
  - It is an integrated tool that provides a common approach
  - There is an easy-to-use development environment for user interfaces
  - The software has strategic value to the organization responsible for the census, or to other elements of the national information technology infrastructure
  - The software is compatible with current industry trends
- Other possible criteria include the following questions:
  - Is there current expertise in the product, within the organization or externally?
  - Are internal or external staff experienced with the product readily available?
  - What level of training and support is required?
  - What support is provided by the supplier?
  - Is there evidence of the supplier's current strength and longer-term viability?
  - Will the software be sourced locally or internationally?
  - Is the supplier a well-recognized and widely used business with well-known products?
  - Is the product compatible with current industry trends?
  - Is the supplier financially secure?
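One common way to apply criteria like these is a weighted scoring matrix: each criterion gets a weight, each candidate package gets a rating per criterion, and the weighted average ranks the candidates. The sketch below assumes that approach; the criterion names, weights, ratings and the package names "Package A" and "Package B" are all hypothetical and do not describe any real product.

```python
def weighted_scores(weights, ratings):
    """Rank candidate packages by a weighted average of criterion ratings.

    weights: dict criterion -> relative importance (positive numbers)
    ratings: dict package -> dict criterion -> rating on a common scale (e.g. 1-5)
    Returns a list of (package, score) sorted best first. Scores are divided
    by the total weight so they stay on the same scale as the ratings.
    """
    total_weight = sum(weights.values())
    results = []
    for package, scores in ratings.items():
        score = sum(weights[c] * scores[c] for c in weights) / total_weight
        results.append((package, round(score, 2)))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    weights = {"ease_of_use": 3, "local_expertise": 2,
               "supplier_support": 2, "cost": 3}  # hypothetical weights
    ratings = {  # hypothetical ratings on a 1-5 scale
        "Package A": {"ease_of_use": 4, "local_expertise": 5, "supplier_support": 3, "cost": 4},
        "Package B": {"ease_of_use": 5, "local_expertise": 2, "supplier_support": 4, "cost": 3},
    }
    print(weighted_scores(weights, ratings))  # [('Package A', 4.0), ('Package B', 3.6)]
```

The weights are where the organization's priorities show up; agreeing on them before ratings are collected keeps the evaluation transparent.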
ACQUIRING SOFTWARE
- Software for census use with the selected hardware can be acquired in a number of ways, such as:
  - Purchasing complete off-the-shelf packages that require no further development
  - Purchasing packages that can be further developed for census-specific activities
  - Contracting out the provision of specific functionality for parts of systems
  - Contracting for externally developed software for complete systems
  - Obtaining free software such as IMPS or CSPro
ACQUIRING HARDWARE
- The requirements for evaluating hardware will depend on the nature of the hardware, its complexity and any links with existing hardware or software.
- There will normally be a tender process to ensure that the hardware is the best solution for the organization, both technically and financially.
- Basic rules that should be followed for acquisitions:
  - Use requests for proposals or requests for tender to control the process
  - Try to keep proposals simple
  - Purchase only what is required, but encourage as much competition as possible in the evaluation process
  - Shortlist ruthlessly, focusing on the best technical solution and overall value for money
  - Negotiate the warranty period
  - Negotiate free training to be provided by the vendor
  - Consider the level of local maintenance support available
  - Consider the advantages and disadvantages of purchasing locally compared to internationally
  - Avoid being under any obligation to a vendor
  - Consider the alternative of renting
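Whether renting is the better alternative can be checked with simple total-cost arithmetic over the processing period. This is a minimal sketch under hypothetical assumptions: rent already covers maintenance, the resale value of purchased equipment is a fixed amount, and costs are not discounted over time. All figures in the example are invented.

```python
def rent_or_buy(purchase_price, monthly_maintenance, resale_value,
                monthly_rent, months):
    """Compare buying versus renting one item of equipment (e.g. a scanner)
    for a processing period of `months` months.

    All amounts are in the same currency. Buying costs the purchase price
    plus maintenance, less what the item can be resold for afterwards.
    Renting is assumed (hypothetically) to include maintenance in the rent.
    Returns (buy_cost, rent_cost, cheaper_option); ties go to renting.
    """
    buy_cost = purchase_price + monthly_maintenance * months - resale_value
    rent_cost = monthly_rent * months
    cheaper = "buy" if buy_cost < rent_cost else "rent"
    return buy_cost, rent_cost, cheaper


if __name__ == "__main__":
    # Hypothetical figures: a short processing period favours renting...
    print(rent_or_buy(20000, 300, 5000, 1500, months=8))   # (17400, 12000, 'rent')
    # ...while a longer one favours buying.
    print(rent_or_buy(20000, 300, 5000, 1500, months=14))  # (19200, 21000, 'buy')
```

With these invented figures the break-even point is (20000 - 5000) / (1500 - 300) = 12.5 months, so the length of the processing period largely decides the outcome.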
Methods of data capture
- Key from paper (KFP)
  - This process involves manual coding and data entry, each stage followed by a verification process
  - Take into account:
    - Accessible space for the delivery and discharging of questionnaires
    - A manual operations area
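The verification step after data entry is often done by independent double keying: two operators key the same questionnaire and the two versions are compared field by field, with mismatches sent for adjudication. The handbook text does not say which verification method is intended, so this is an assumed approach; the field names and functions below are hypothetical.

```python
def compare_keyed_entries(first_pass, second_pass):
    """Independent verification for key-from-paper (KFP) entry.

    Each pass is a dict mapping field name -> keyed value for the same
    questionnaire. Returns a sorted list of (field, first_value, second_value)
    for every field where the passes disagree; a field missing from one
    pass is reported with the value None.
    """
    mismatches = []
    for field in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(field)
        b = second_pass.get(field)
        if a != b:
            mismatches.append((field, a, b))
    return mismatches


def keying_error_rate(mismatches, fields_keyed):
    """Share of keyed fields that failed verification."""
    return len(mismatches) / fields_keyed if fields_keyed else 0.0


if __name__ == "__main__":
    first = {"age": "34", "sex": "2", "occupation_code": "2310"}   # hypothetical fields
    second = {"age": "43", "sex": "2", "occupation_code": "2310"}
    diffs = compare_keyed_entries(first, second)
    print(diffs)  # [('age', '34', '43')]
    print(keying_error_rate(diffs, len(first)))
```

Tracking the error rate per operator also helps decide when a clerk can move from full verification to sample verification.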
- Scanning model (OCR/OMR/ICR) and KFI
- Key from image (KFI)
  - Advantages
    - Preparatory time: planning and implementation time is minimal because the scanning and capturing process is basic
    - Online verification: instruments are verified at the time of data entry, so errors and discrepancies can be picked up easily
  - Disadvantages
    - Production time: no computer-aided recognition occurs, so every item must be keyed
    - Keying errors: keying errors are bound to occur, as every character of information is captured manually
    - Entry clerk changes data due to tight validation: if tight validation is in place, allowing the clerk only a set number of values for entry, any inconsistent information will be changed to the easiest value the clerk can select
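The tight-validation problem comes from checks that refuse any value outside the allowed set. One way to mitigate it, assumed here rather than prescribed by the handbook, is to distinguish hard checks (the value cannot be saved) from soft checks (the value is saved as it appears on the image but flagged for later editing). `CheckMode`, `validate_entry` and the relationship codes are hypothetical names for illustration.

```python
from enum import Enum


class CheckMode(Enum):
    HARD = "hard"  # keyed value rejected until an allowed code is entered
    SOFT = "soft"  # keyed value kept as on the image, but flagged for review


def validate_entry(value, allowed, mode):
    """Check one keyed field against its allowed value set.

    Returns a tuple (stored, flagged):
      stored:  True if the keyed value is saved as keyed
      flagged: True if the record should go to a review queue
    """
    if value in allowed:
        return True, False
    if mode is CheckMode.HARD:
        # The clerk cannot save the value and must choose an allowed code,
        # which is where "changing data to the easiest value" can creep in.
        return False, False
    # Soft check: preserve what the enumerator wrote and let editors decide.
    return True, True


if __name__ == "__main__":
    relationship_codes = {"1", "2", "3", "4"}  # hypothetical value set
    print(validate_entry("7", relationship_codes, CheckMode.HARD))  # (False, False)
    print(validate_entry("7", relationship_codes, CheckMode.SOFT))  # (True, True)
```

Soft checks shift the decision from the entry clerk to the editing stage, at the cost of a larger review queue.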
- Optical character recognition (OCR) / intelligent character recognition (ICR)
  - Advantages
    - Recognition engines used with imaging can capture highly specialized data sets
    - Recognition of machine-printed or hand-printed characters
    - Scanning and recognition allow efficient management and planning of the rest of the processing workload
    - Quick retrieval for editing and reprocessing
  - Disadvantages
    - The technology is costly
    - May require significant manual intervention
    - Additional workload for enumerators
    - Ineffective when dealing with cursive characters
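The manual intervention that OCR/ICR may require is typically organised around recognition confidence: fields the engine recognises with high confidence are accepted, and the rest go to key correction, the step listed under data capturing in the processing flow. This sketch assumes each recognised field comes with a confidence score between 0 and 1; the 0.90 threshold, field names and values are hypothetical, and real engines expose confidence in their own formats.

```python
def route_recognised_fields(fields, threshold=0.90):
    """Split OCR/ICR results into accepted fields and fields sent to
    manual key correction.

    fields: list of (field_name, recognised_text, confidence), where
            confidence is the engine's score in the range 0-1.
    threshold: hypothetical cut-off; fields scoring below it are routed to
               an operator who keys the value from the form image.
    """
    accepted, to_key_correction = [], []
    for name, text, confidence in fields:
        if confidence >= threshold:
            accepted.append((name, text))
        else:
            to_key_correction.append((name, text))
    return accepted, to_key_correction


def manual_intervention_rate(fields, threshold=0.90):
    """Share of fields that need manual key correction at this threshold."""
    _, to_key = route_recognised_fields(fields, threshold)
    return len(to_key) / len(fields) if fields else 0.0


if __name__ == "__main__":
    fields = [("age", "34", 0.98), ("place_of_birth", "DAKAR", 0.72),
              ("sex", "2", 0.95), ("occupation", "TEACHER", 0.60)]
    accepted, to_key = route_recognised_fields(fields)
    print(accepted)  # [('age', '34'), ('sex', '2')]
    print(to_key)    # [('place_of_birth', 'DAKAR'), ('occupation', 'TEACHER')]
    print(manual_intervention_rate(fields))  # 0.5
```

Raising the threshold improves data quality but increases the manual workload, so the cut-off is usually tuned during census tests.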
- Optical mark recognition (OMR)
  - Advantages
    - Form-based OMR is a data collection technology that does not require a recognition engine. It is therefore fast, uses minimal processing power to process forms, and its costs are predictable and defined
    - OMR capture speeds are around 4000 forms per hour
  - Disadvantages
    - OMR cannot recognize hand-printed or machine-printed characters
    - Images of forms are not captured by the scanners, so electronic retrieval is not possible
    - Tick boxes may not be suitable for all types of questions
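The quoted speed of around 4000 forms per hour can be turned into a rough estimate of how many scanners and shifts a census needs. This is a minimal sketch under hypothetical assumptions: the rate applies per scanner, and an efficiency factor (here 0.75) discounts it for paper jams, batch handling and re-scans. The form count in the example is invented.

```python
import math


def scanning_days(total_forms, forms_per_hour=4000, scanners=1,
                  hours_per_shift=8, shifts_per_day=1, efficiency=0.75):
    """Estimate the working days needed to scan all census forms by OMR.

    forms_per_hour defaults to the ~4000 forms/hour quoted above, assumed
    here to be the rated speed of one scanner. efficiency (hypothetical)
    discounts the rated speed for jams, batch handling and re-scans.
    """
    effective_rate = forms_per_hour * efficiency  # forms per scanner-hour
    daily_capacity = effective_rate * hours_per_shift * shifts_per_day * scanners
    return math.ceil(total_forms / daily_capacity)  # round up to whole days


if __name__ == "__main__":
    # Hypothetical census of 3 million forms.
    print(scanning_days(3_000_000))                                # 125
    print(scanning_days(3_000_000, scanners=5, shifts_per_day=2))  # 13
```

Running a few such scenarios during planning helps weigh buying or renting more scanners against adding shifts.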
Way forward
- Development of the handbook will continue
- Mid-December 2009: workshop for validation of the handbook
- By 10 January 2010: finalization of the handbook
- Translation by March 2010
THANK YOU FOR YOUR KIND ATTENTION