Data Management (1) - PowerPoint PPT Presentation

About This Presentation
Title:

Data Management (1)

Description:

Data Management (1). Application of Information and Communication Technology to Production and Dissemination of Official Statistics, 10 May – 11 July 2006. PowerPoint presentation.

Slides: 40
Provided by: nesdbGoTh

Transcript and Presenter's Notes

1
Data Management (1)
Application of Information and Communication Technology to Production and
Dissemination of Official Statistics
10 May – 11 July 2006
  • M Q Hasan
  • Lecturer/Statistician
  • UN Statistical Institute for Asia and the Pacific
  • Chiba, Japan
  • Email hasan_at_unsiap.or.jp

2
Overview
  • Data management
  • Data management planning
  • Data management procedures
  • Data management software
  • Hands-on experience
  • References

3
Data management and the NSO
  • Data management during production
    ◦ Individual case
  • Data management after production
    ◦ Individual case
  • Data management
    ◦ All cases, long term

4
Data management
  • Management of data files
  • Management of files during analysis
  • Management of files afterwards

5
Data management
  • Management of data files
    ◦ Labeling data files
    ◦ Documentation

6
Data management
  • Management of files during analysis
    ◦ Version management
    ◦ Subsetting data
    ◦ Arranging files in separate folders
    ◦ Creating index files
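Version management can be made mechanical by encoding the version number in the file name. A minimal Python sketch, assuming a hypothetical `<name>_v<NN><extension>` naming convention (the function `next_version` and the file names are illustrative, not from the source):

```python
import re
from pathlib import Path

# Hypothetical naming convention: <name>_v<NN><extension>, e.g. lfs2006_v03.sav
VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")

def next_version(filename: str) -> str:
    """Return the file name to use for the next version of `filename`."""
    p = Path(filename)
    match = VERSION_RE.match(p.stem)
    if match:
        stem, num = match.group("stem"), int(match.group("num")) + 1
    else:
        stem, num = p.stem, 1  # first versioned copy of an unversioned file
    return str(p.with_name(f"{stem}_v{num:02d}{p.suffix}"))

print(next_version("lfs2006.sav"))      # lfs2006_v01.sav
print(next_version("lfs2006_v03.sav"))  # lfs2006_v04.sav
```

Keeping the original extension means each versioned copy still opens in the same package.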

7
Data management
  • Management of files afterwards
    ◦ Pass them to the system administrator for future
      reference

8
DATA MANAGEMENT

9
These will lead to:
  • Production of credible data
  • Design of a robust, efficient, flexible and accessible
    storage system
  • Efficient procedures for sharing data with others

10
Data management before and during data processing

11
During data processing (DP) planning
  • Define the relevant aspects of a dataset.
  • Formulate a data preservation strategy.
  • Design an access procedure.

12
Defining the relevant aspects of a dataset
  • File format and file structure
  • Naming files
  • Creation and naming of variables
  • Variable labels

13
Defining the relevant aspects of a dataset
  • Choose a file structure according to the available
    computing resources and the experience of the
    data processors.

14
Defining the relevant aspects of a dataset
  • Documentation
    ◦ Assign responsibility for logging all processing
      activities: problems encountered, how the problems
      are to be solved and major decisions taken

15
DP Documentation
  • Can be time-consuming.
  • Should contain all information about the data, such
    as survey method, sample information, time of
    collection, information about variables, missing
    values, etc.
  • Should start well before actual data processing.
  • Should follow standards.
  • Preferably one file with references to other
    files.

16
DP Documentation
  • Title. Child labour in Portugal: Social
    characterization of school-age children and their
    families, 1998.
  • Subtitle. Child labour in Portugal, 1998.
  • Alternative title. SIMPOC Portugal survey,
    1998.
  • Parallel title. Trabalho Infantil em Portugal:
    Caracterização social dos menores em idade escolar
    e suas famílias, 1998.
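The title fields above can be recorded as a structured entry instead of free text, so that empty fields are easy to spot. A minimal Python sketch, assuming a hypothetical `StudyDescription` class that covers only the title fields and keywords; a full documentation record would hold every field listed in these slides:

```python
from dataclasses import dataclass, field, fields
from typing import List

@dataclass
class StudyDescription:
    """Hypothetical subset of the study-level documentation fields."""
    title: str
    subtitle: str = ""
    alternative_title: str = ""
    parallel_title: str = ""
    keywords: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of fields that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

study = StudyDescription(
    title=("Child labour in Portugal: Social characterization of school-age "
           "children and their families, 1998"),
    subtitle="Child labour in Portugal, 1998",
    alternative_title="SIMPOC Portugal survey, 1998",
    parallel_title=("Trabalho Infantil em Portugal: Caracterização social dos "
                    "menores em idade escolar e suas famílias, 1998"),
)
print(study.missing_fields())  # ['keywords']
```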

17
DP Documentation
  • Keywords. National survey, child, economic
    activity, child labour, household, household
    chores etc.
  • Abstract. Purpose, nature, and scope of the child
    labour data collection. Special characteristics
    of the contents etc.
  • Time period covered. The period the data refer to,
    which can differ from the date of collection. If the
    data were collected in 1999 and one question was
    "Did you work last year?", the time period covered
    should be 1998-99.
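The rule in the time-period example can be written as a small calculation: the period starts at the collection year minus the longest recall period used in the questionnaire and ends at the collection year. A sketch, assuming whole-year recall periods (the function name and the `recall_years` parameter are hypothetical):

```python
def time_period_covered(collection_year: int, recall_years: int = 0) -> str:
    """Reference period of the data, given the collection year and the longest
    recall period (in whole years) used in the questionnaire."""
    start = collection_year - recall_years
    if start == collection_year:
        return str(collection_year)
    return f"{start}-{collection_year % 100:02d}"

print(time_period_covered(1999, recall_years=1))  # 1998-99
```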

18
DP Documentation
  • Date of collection. Date(s) when the data were
    collected.
  • Country. Name of the country where the survey was
    conducted.
  • Geographic coverage. Total geographic scope of
    the data.
  • Geographic unit. Lowest level of geographic
    aggregation covered by the data, for example
    province, state or district.
  • Unit of analysis. For most child labour surveys,
    the basic unit of analysis or observation is the
    individual person.

19
DP Documentation
  • Time method. Panel, cross-sectional, trend, and
    time-series etc.
  • Data collector. Organization responsible for
    administering the questionnaire or interview, or for
    compiling the data, e.g. the NSO.
  • Frequency of data collection. For example,
    first-time collection.
  • Sampling procedure. Reference to sampling
    documents.

20
DP Documentation
  • Mode of data collection. E.g. computer-assisted
    personal interviewing (CAPI), computer-assisted
    telephone interviewing (CATI), etc.
  • Type of research instrument. Structured,
    semi-structured, open-ended questions, etc.
  • Actions to minimize losses. E.g. follow-up
    visits, supervisory checks, historical matching,
    etc.
  • Control operations. Methods used to facilitate
    data control.

21
DP Documentation
  • Weighting. Reference to appropriate document.
  • Cleaning operations. E.g. consistency checking,
    wild-code checking, etc.
  • Response rate. Percentage of sample members who
    provided information.
  • Estimates of sampling error. Indication of how
    precisely one can estimate a population value
    from a given sample.
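Response rate and sampling error are simple to compute in the basic case. The Python sketch below assumes simple random sampling; for stratified or clustered designs the standard error has to come from a design-based variance estimator instead, so `se_proportion` only illustrates the idea. Both function names are hypothetical:

```python
import math
from typing import Optional

def response_rate(responded: int, sampled: int) -> float:
    """Percentage of sample members who provided information."""
    if sampled <= 0:
        raise ValueError("sample size must be positive")
    return 100.0 * responded / sampled

def se_proportion(p: float, n: int, population: Optional[int] = None) -> float:
    """Approximate standard error of an estimated proportion p from a simple
    random sample of n units, with an optional finite population correction."""
    se = math.sqrt(p * (1.0 - p) / n)
    if population is not None:
        se *= math.sqrt(1.0 - n / population)
    return se

print(response_rate(responded=4600, sampled=5000))  # 92.0
print(round(se_proportion(p=0.5, n=100), 4))         # 0.05
```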

22
DP Documentation
  • Location. State where the data are currently stored
    (e.g. a national statistics office).
  • Availability status. Provide a statement of data
    availability.
  • Extent of data. Number of physical files that
    exist in a dataset.
  • Completeness of dataset. Describe if items of
    collected information were not included in the
    data file.

23
DP Documentation
  • Access authority. Contact person or organization
    that controls access to the data collection.
  • Data use statement. Reference to the terms of use
    for the data collection, if any.
  • Citation requirement. Specify any text that
    should be cited in publications based on analysis
    of the data.

24
DP Documentation
  • File contents. Short description of the file(s).
  • File structure. E.g. hierarchical, rectangular
    or relational.
  • Record or record group. Describe the record
    groupings for hierarchical or relational files.
  • Label (of record). Detailed information for each
    record group.
  • Dimensions (of record). Physical characteristics
    of the record, such as the number of variables
    per record, number of cases, etc.

25
DP Documentation
  • Overall case count. Number of cases or
    observations.
  • Overall variable count. Number of variables.
  • Data format. Delimited format, free format,
    software dependent, etc.
  • Missing data. Provide information such as whether
    missing-data codes are standardized across the
    collection, whether missing data result from
    merging, etc.
  • Software. Identify the software used to create
    the file, including the software version number.
  • Version statement. Version statement for the data
    file.
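The overall case count, overall variable count and missing-data information can be produced from the data file itself rather than typed by hand. A minimal Python sketch, assuming a CSV data file and hypothetical per-variable missing-value codes in `MISSING_CODES` (blank cells are also counted as missing):

```python
import csv
import io

# Hypothetical per-variable missing-value codes; blanks always count as missing
MISSING_CODES = {"age": {"99"}, "sex": {"9"}, "works": {"9"}}

def describe_file(csv_text: str) -> dict:
    """Overall case count, variable count and missing values per variable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    variables = reader.fieldnames or []
    missing = {
        var: sum(
            1 for row in rows
            if row[var].strip() == "" or row[var].strip() in MISSING_CODES.get(var, set())
        )
        for var in variables
    }
    return {"cases": len(rows), "variables": len(variables), "missing": missing}

sample = "id,age,sex,works\n1,12,1,1\n2,99,2,2\n3,14,,9\n"
print(describe_file(sample))
# {'cases': 3, 'variables': 4, 'missing': {'id': 0, 'age': 1, 'sex': 1, 'works': 1}}
```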

26
DP Documentation
  • List of variables, giving for each:
    ◦ whether the variable is a weight and, if not, the
      reference weight variable for this variable
    ◦ the question ID for the variable
    ◦ which format has been used (e.g. SAS, SPSS)
    ◦ the number of decimal places in the variable
    ◦ whether the values are discrete or continuous
    ◦ which record type the variable belongs to
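The variable list can be kept as one structured entry per variable, so that incomplete entries are caught before the documentation is finalized. A minimal Python sketch; the class `VariableEntry`, its field names and both checks in `problems` are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VariableEntry:
    """One data-dictionary entry; field names are hypothetical."""
    name: str
    question_id: str        # questionnaire item the variable comes from
    source_format: str      # e.g. "SAS" or "SPSS"
    decimals: int           # number of decimal places
    continuous: bool        # False for discrete (coded) variables
    record_type: str        # record type the variable belongs to
    is_weight: bool = False
    weight_variable: Optional[str] = None  # reference weight if not a weight itself

    def problems(self) -> List[str]:
        """Simple completeness checks on the entry."""
        issues = []
        if not self.is_weight and not self.weight_variable:
            issues.append(f"{self.name}: no reference weight variable")
        if not self.continuous and self.decimals > 0:
            issues.append(f"{self.name}: discrete variable with decimal places")
        return issues

dictionary = [
    VariableEntry("age", "Q3", "SPSS", 0, False, "person", weight_variable="pweight"),
    VariableEntry("pweight", "", "SPSS", 4, True, "person", is_weight=True),
    VariableEntry("hours", "Q12", "SPSS", 1, True, "person"),
]
print([issue for entry in dictionary for issue in entry.problems()])
# ['hours: no reference weight variable']
```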

27
DP
Conversion of data files to other formats as
required
  • Data are usually generated in a package-specific
    format
  • Convert data into other formats, if possible
  • Convert data into ASCII and generate a codebook
  • Reload the ASCII data using the same codebook
  • Recheck the data
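The convert, reload and recheck steps above can be sketched as a round trip through fixed-width ASCII. This Python sketch assumes a hypothetical codebook of (variable, start column, width) entries and treats all values as text; a real conversion would also carry labels, missing-value codes and decimal places from the codebook:

```python
from typing import Dict, List, Tuple

# Hypothetical codebook: (variable, start column counted from 1, width)
Codebook = List[Tuple[str, int, int]]
CODEBOOK: Codebook = [("id", 1, 4), ("age", 5, 2), ("sex", 7, 1)]

def to_fixed_width(rows: List[Dict[str, str]], codebook: Codebook) -> str:
    """Write records as fixed-width ASCII, right-aligning each value in its columns."""
    record_length = max(start + width - 1 for _, start, width in codebook)
    lines = []
    for row in rows:
        line = [" "] * record_length
        for name, start, width in codebook:
            value = str(row[name])
            if len(value) > width:
                raise ValueError(f"{name}={value!r} does not fit in {width} columns")
            line[start - 1:start - 1 + width] = value.rjust(width)
        lines.append("".join(line))
    return "\n".join(lines) + "\n"

def from_fixed_width(text: str, codebook: Codebook) -> List[Dict[str, str]]:
    """Reload fixed-width ASCII using the same codebook."""
    return [
        {name: line[start - 1:start - 1 + width].strip() for name, start, width in codebook}
        for line in text.splitlines()
    ]

original = [{"id": "1", "age": "12", "sex": "1"}, {"id": "2", "age": "9", "sex": "2"}]
ascii_text = to_fixed_width(original, CODEBOOK)
# Recheck: the reloaded records must match the originals exactly
assert from_fixed_width(ascii_text, CODEBOOK) == original
```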

28
DATA MANAGEMENT
Storage of all files.
  • Possible list/type of files
  • Data in a package-specific format
  • Data in ASCII with necessary data dictionary
  • Public use data
  • Public use data in ASCII with necessary data
    dictionary
  • Final documentation
  • Questionnaire

29
DATA MANAGEMENT
Storage of all files.
  • Possible list/type of files (continued)
  • Logical rules for consistency checks.
  • Computer program files.
  • Interviewer and/or supervisor instruction
    manuals.
  • Coding file(s).
  • Sampling and weight files.
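Logical rules for consistency checks are easier to store, review and rerun when they are kept as data rather than buried in program code. A minimal Python sketch, assuming hypothetical valid-code lists for a wild-code check and one hypothetical skip rule (people who did not work, `works == "2"`, must report zero hours):

```python
from typing import Callable, Dict, List, Set

# Hypothetical valid codes per variable (wild-code check)
VALID_CODES: Dict[str, Set[str]] = {"sex": {"1", "2"}, "works": {"1", "2"}}

# Hypothetical logical rules (consistency check): each must hold for a clean record
RULES: Dict[str, Callable[[Dict[str, str]], bool]] = {
    "non-workers report zero hours": lambda r: r["works"] != "2" or r["hours"] == "0",
}

def check_record(record: Dict[str, str]) -> List[str]:
    """Return the wild-code and consistency errors found in one record."""
    errors = [f"wild code {var}={record[var]!r}"
              for var, codes in VALID_CODES.items() if record[var] not in codes]
    errors += [f"inconsistent: {name}" for name, rule in RULES.items() if not rule(record)]
    return errors

print(check_record({"sex": "3", "works": "2", "hours": "10"}))
# ["wild code sex='3'", 'inconsistent: non-workers report zero hours']
```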

30
DATA MANAGEMENT
Storage of all files
  • Group them by version, type, etc.
  • Create an index file in each
    sub-directory.
  • In the index file, add a short description of
    each file's contents.
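Index files can be generated automatically so that every sub-directory gets one and missing descriptions stand out. A Python sketch, assuming a hypothetical index file name `INDEX.txt` and descriptions supplied as a dictionary keyed by file name (so file names are assumed unique across folders):

```python
import tempfile
from pathlib import Path
from typing import Dict

INDEX_NAME = "INDEX.txt"  # hypothetical index file name

def write_index(directory: Path, descriptions: Dict[str, str]) -> Path:
    """Write INDEX_NAME in `directory`: one line per file with size and description."""
    lines = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.name != INDEX_NAME:
            description = descriptions.get(path.name, "DESCRIPTION MISSING")
            lines.append(f"{path.name}\t{path.stat().st_size} bytes\t{description}")
    index_path = directory / INDEX_NAME
    index_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index_path

def write_all_indexes(root: Path, descriptions: Dict[str, str]) -> None:
    """Create an index file in `root` and in every sub-directory below it."""
    directories = [root] + [p for p in root.rglob("*") if p.is_dir()]
    for directory in directories:
        write_index(directory, descriptions)

# Usage on a throwaway folder tree
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "raw" / "hh.dat").write_bytes(b"0001\n")
    write_all_indexes(root, {"hh.dat": "household records, fixed-width ASCII"})
    print((root / "raw" / INDEX_NAME).read_text(encoding="utf-8"))
```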

31
DATA MANAGEMENT
Formulating a data preservation strategy
  • Hardware
  • Automation software
  • Directory structure

32
DATA MANAGEMENT

33
DATA MANAGEMENT

34
DATA MANAGEMENT

35
DATA MANAGEMENT
Designing an access procedure
  • Access policy
  • Safekeeping person: system administrator
  • Contact person: supervisor
  • Content-modifying authority: supervisor
  • Finalize access conditions for each file
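The access policy can be written down as an explicit mapping that both staff and scripts consult. A minimal Python sketch: the roles follow the slides (system administrator for safekeeping, supervisor as contact person and content-modifying authority, data processors from the backup policy), while the permission names and the per-file conditions in `FILE_ACCESS` are hypothetical:

```python
from typing import Dict, Set

# Roles from the access policy; the permission names are hypothetical
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "system_administrator": {"read", "backup", "restore"},  # safekeeping
    "supervisor": {"read", "modify", "grant"},              # contact, content modification
    "data_processor": {"read"},
}

# Hypothetical access conditions: which roles may use each file
FILE_ACCESS: Dict[str, Set[str]] = {
    "survey_final.sav": {"system_administrator", "supervisor", "data_processor"},
    "sampling_frame.csv": {"system_administrator", "supervisor"},
}

def is_allowed(role: str, action: str, filename: str) -> bool:
    """True only if the file admits the role and the role has the permission."""
    return (role in FILE_ACCESS.get(filename, set())
            and action in ROLE_PERMISSIONS.get(role, set()))

print(is_allowed("supervisor", "modify", "survey_final.sav"))      # True
print(is_allowed("data_processor", "read", "sampling_frame.csv"))  # False
```

Files without a finalized access condition are denied by default, which matches the idea of finalizing access conditions for each file before release.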

36
DATA DISSEMINATION
Data type
  • Microdata
  • Aggregate tables
  • Executive summary
  • Reports

37
DATA DISSEMINATION
Methods
  • Online: direct access through the internet in real
    time
  • Offline: available on request

38
DATA MANAGEMENT
Designing an access procedure
  • Backup policy
    ◦ During data processing: data processors'
      responsibility
    ◦ After finalization of data and documentation:
      system administrator's responsibility
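Whoever is responsible, a backup run can verify each copy with a checksum instead of trusting the copy silently. A minimal Python sketch, assuming a plain folder-to-folder copy (the function names and folder layout are hypothetical):

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup(source: Path, destination: Path) -> int:
    """Copy every file under `source` to `destination`, keeping the folder
    layout, and verify each copy by checksum. `destination` must lie outside
    `source`. Returns the number of files copied."""
    copied = 0
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        dst = destination / src.relative_to(source)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also keeps timestamps
        if sha256_of(src) != sha256_of(dst):
            raise OSError(f"checksum mismatch after copying {src}")
        copied += 1
    return copied

# Usage on a throwaway folder tree
with tempfile.TemporaryDirectory() as tmp:
    work, store = Path(tmp) / "work", Path(tmp) / "backup"
    (work / "clean").mkdir(parents=True)
    (work / "clean" / "survey.dat").write_bytes(b"0001 12 1\n")
    print(backup(work, store))  # 1
```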

39
  • END