Advisory Expert Group Big Data - PowerPoint PPT Presentation

About This Presentation
Title:

Advisory Expert Group Big Data

Description:

Advisory Expert Group: Big Data. Statistics Canada. The advent of the Internet, mobile devices and other technologies has caused a fundamental change to the nature of data.

Slides: 24
Provided by: uno120
Learn more at: http://unstats.un.org
Transcript and Presenter's Notes

Title: Advisory Expert Group Big Data


1
Advisory Expert Group: Big Data
  • Statistics Canada

2
Outline
  • Big data and the National Accounts
  • Establishing the right infrastructure
  • Lessons learned: case studies from Statistics
    Canada
  • Traditional big data
  • Scanner data
  • Electricity consumption
  • Credit card and Interac
  • Remote sensing

3
Big data and the National Accounts
  • From a business perspective: "Big data is high
    volume, high velocity, and/or high variety
    information assets that require new forms of
    processing to enable enhanced decision making,
    insight discovery and process optimization."
    (Gartner 2012, as quoted on Wikipedia)
  • From an NSO perspective: "Big data is high volume,
    high velocity, and/or high variety information
    assets that require new forms of processing,
    which could reduce respondent burden, increase
    quality, develop new statistical products or
    enhance the detail of existing statistical
    products"?

4
Big data and the National Accounts
  • Mick Couper from the University of Michigan's
    Survey Research Center cites the following
    limitations NSOs will face when confronting big
    data:
  • lack of covariates in the datasets
  • self-selection and self-reporting biases
  • lack of stability
  • privacy issues
  • access issues
  • opportunity for mischief
  • size issues and
  • selective reporting of results (file drawer
    problem).
  • You could add to that:
  • Sustainability: data sources disappear, systems
    change, perceptions change.
  • Couper, Mick P., "Is the Sky Falling? New
    Technology, Changing Media, and the Future of
    Surveys." (Presentation, European Survey Research
    Association, 5th Conference, Ljubljana, Slovenia,
    July 2013)

5
Big data and the National Accounts
  • There needs to be up-front acknowledgement that
    we are trying to fit a square peg in a round
    hole.
  • The needs of business (big data to increase
    business intelligence) and national accountants
    (big data to produce comprehensive macroeconomic
    statistics) are quite different.

Dimensions of the data | Needs of National Accountants | Needs of business
Scope of the dataset | Comprehensive | Limited to the needs of the business
Use of the dataset | Produce meaningful aggregate statistics | Find patterns, explore the detail
Structure of the dataset | On-going, stable, regular | Structure can change as required by the business

6
Putting in place the appropriate infrastructure
  • In order to determine how best to leverage big
    data, NSOs need to put in place the proper
    infrastructure to:
  • Obtain the data
  • Process the data
  • Evaluate the data
  • Integrate the data

7
Putting in place the appropriate infrastructure
Obtaining the data
  • Use of legislation: e.g., Section 13 of Canada's
    Statistics Act states that "A person having the
    custody or charge of any documents or records
    that are maintained in any department or in any
    municipal office, corporation, business or
    organization, from which information sought in
    respect of the objects of this Act can be
    obtained or that would aid in the completion or
    correction of that information, shall grant
    access thereto for those purposes to a person
    authorized by the Chief Statistician to obtain
    that information or aid in the completion or
    correction of that information." 1970-71-72, c.
    15, s. 12.
  • Memoranda of understanding (MOUs), which outline:
  • Roles and responsibilities
  • Delivery mechanism
  • Uses of data
  • Termination of the agreement
  • Purchasing big data
  • Many firms sell big data that can be used for
    business intelligence; it could also be
    purchased for statistical purposes. Under what
    conditions and terms should NSOs purchase big
    data?

8
Putting in place the appropriate infrastructure
Processing the data
  • File transfer system - NSOs need a secure, high
    capacity file transfer system to transfer data
    from the data provider to the NSO.
  • Storage and processing capacity - In most NSOs
    (especially NA divisions) the processing capacity
    for big data does not exist.
  • Software - Statistics Canada is leveraging the
    SAS distributed computing solution called SAS
    Grid to shorten the time needed to process and
    analyze its larger data holdings. Also, the Data
    Analysis Resource Center at Statistics Canada
    maintains a research computer with analytical
    software installed, offering a wide range of
    add-ons that provide advanced analytical and
    visualization tools particular to big data
    analytics.
  • Information management policies - Access,
    privacy, confidentiality, retention

9
Putting in place the appropriate infrastructure
Evaluating the data
  • Big data community of practice
  • There needs to be a structure in place that
    allows analysts and programs to gain knowledge
    and share experiences with respect to big data,
    to engage with colleagues internally or
    externally when needed and to report findings to
    senior managers when appropriate.
  • Big data needs to be evaluated with respect to
    its:
  • Quality
  • Coverage
  • Timeliness
  • Detail
  • Regularity
  • In order to leverage big data we need to develop
    a research and development orientation.

10
Examples of big data research at Statistics Canada:
International merchandise trade statistics
  • Collection/access agreement: Access to detailed
    customs data is governed by two memoranda of
    understanding, one with the Canada Revenue
    Agency and one with the U.S. Census Bureau
  • Cost: Nil
  • Dimensions: 1.5 terabytes, 60 attributes
  • Uses: Balance of Payments, International
    Merchandise Trade Statistics
  • Timeliness: 35 days following the reference
    period
  • Frequency: Daily, if required
  • Potential uses: Creating an importer and exporter
    characteristics file which can be used to analyze
    the entry and exit of Canadian traders within the
    Canadian economy, and used in studies of
    globalization, global production, goods for
    processing and foreign affiliate statistics.

11
Examples of big data research at Statistics Canada:
Taxation statistics
  • Collection/access agreement: Access to detailed
    taxation statistics is governed by a memorandum
    of understanding with the Canada Revenue Agency.
  • Cost: Approximately 1.6 million
  • Dimensions: 6 terabytes and growing
  • Uses: Benchmark estimates of wages and salaries,
    output, property incomes, taxes, etc.
  • Timeliness: Earliest use 45 days following the
    reference period
  • Frequency: Mainly annual, some monthly (goods and
    services taxation statistics)
  • Potential uses: Creation of a National Accounts
    longitudinal file, a business-level micro-data
    file that can be used to undertake studies such
    as GDP by city, GDP by firm size, productivity by
    firm size.

12
Examples of big data research at Statistics Canada:
Government finance statistics
  • Collection/access agreement: No formal agreement
    in place; institutional understanding between
    Statistics Canada and the government
    jurisdictions.
  • Cost: Nil
  • Dimensions: 40 million financial transactions,
    200 GB
  • Uses: Government Finance Statistics, government
    sector National Accounts
  • Timeliness: Earliest is 15 days following the
    reference period.
  • Frequency: Monthly, quarterly, annual
  • Potential uses: Local government remains a
    survey of municipalities; access to electronic
    files will increase our ability to provide
    CMA-level data as well as increased revenue and
    expenditure details. Potential data uses for the
    health, education and justice programs.

13
Examples of big data research at Statistics Canada:
Electronic household transactions (credit and debit)
  • Collection/access agreement: Memorandum of
    understanding outlining the roles and
    responsibilities of both Statistics Canada and
    the data provider.
  • Cost: Nil
  • Dimensions: Aggregated big data - number of
    transactions and value of transactions, aggregated
    by merchant group, by place of transaction
    (domestic, international) and by class of
    transactor (personal or commercial).
  • Uses: Indicator for household final consumption
    expenditure and international travel abroad
  • Timeliness: Earliest is 15 days following the
    reference period.
  • Frequency: Monthly
  • Potential uses: International travel services,
    monthly household final consumption expenditure.
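
The aggregation described under "Dimensions" can be sketched in a few lines. This is only an illustration, not the data provider's actual process: the record layout, the field names and the sample values below are assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Transaction(NamedTuple):
    # Hypothetical record layout; the real provider files are not described in the slides.
    merchant_group: str
    place: str        # "domestic" or "international"
    transactor: str   # "personal" or "commercial"
    value: float


def aggregate(transactions):
    """Return {(merchant_group, place, transactor): (count, total_value)}."""
    cells = defaultdict(lambda: [0, 0.0])
    for t in transactions:
        cell = cells[(t.merchant_group, t.place, t.transactor)]
        cell[0] += 1          # number of transactions
        cell[1] += t.value    # value of transactions
    return {key: (count, total) for key, (count, total) in cells.items()}


# Made-up sample records, for illustration only.
sample = [
    Transaction("grocery", "domestic", "personal", 20.0),
    Transaction("grocery", "domestic", "personal", 30.0),
    Transaction("travel", "international", "personal", 500.0),
]
cells = aggregate(sample)
```

Only the aggregated cells, not the individual transactions, would need to leave the provider under this kind of arrangement.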

14
Examples of big data research at Statistics Canada:
Electronic household transactions (credit and debit)

15
Examples of big data research at Statistics Canada:
Electronic household transactions (credit and debit)

16
Examples of big data research at Statistics Canada:
Scanner data (vendor specific)
  • Collection/access agreement: MOU in negotiation
  • Cost: Current costs are nil, though the long-term
    approach being proposed would involve a quid pro
    quo agreement where CPD would provide the company
    their data back with value added (i.e., an
    implicit cost would be borne by the division).
  • Dimensions: Sales, quantities, and item
    descriptions of all goods sold for a given store
    over a given period
  • Uses: Consumer prices and household expenditure
    weights to feed the CPI
  • Timeliness: TBD, though potentially as little as
    a one-day lag (e.g., weekly data for a given week
    could be delivered on the first day of the
    following week).
  • Frequency: Initial data has been provided on a
    weekly aggregated basis. Future work will look at
    daily and/or transactional level data.
  • Dataset size: For one week of sales data
    (aggregated on the week) for one store,
  • roughly 4,000 KB
  • roughly 30,000 rows (i.e., unique items sold)
  • implies roughly 200 MB for one year of weekly
    aggregated data for one store.
  • Potential uses moving forward: Direct input into
    the calculation of the CPI (potential replacement
    for collected prices), studies on consumer
    behaviour, CPI weights, household final
    consumption expenditures, retail sales.
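
The dataset-size arithmetic above can be checked with a short calculation. This is a rough sketch using the slide's figures (about 4,000 KB per store per week); the 52-week year, the decimal conversion (1 MB = 1,000 KB) and the 1,000-store scenario are assumptions added for illustration.

```python
KB_PER_STORE_WEEK = 4_000   # from the slide: roughly 4,000 KB per store-week
WEEKS_PER_YEAR = 52         # assumption: 52 weekly files per year


def annual_size_mb(stores: int = 1, kb_per_week: int = KB_PER_STORE_WEEK) -> float:
    """Estimated yearly volume in MB, taking 1 MB = 1,000 KB."""
    return stores * kb_per_week * WEEKS_PER_YEAR / 1_000


one_store = annual_size_mb()                  # 208.0 MB, i.e. "roughly 200 MB"
many_stores = annual_size_mb(stores=1_000)    # hypothetical 1,000-store chain
```

At weekly aggregation a single store is small; the volume grows linearly with the number of stores and would grow further with daily or transaction-level data.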

17
Examples of big data research at Statistics Canada:
Smart meter household electricity consumption
  • Collection/access agreement: Two memoranda of
    understanding with two regional electricity
    distributors
  • Cost: Nil
  • Dimensions: Roughly 200 GB of raw hourly
    electricity consumption data have been obtained,
    providing detailed information on approximately
    120,000 customers between 2008 and 2013
  • Uses: Household electricity consumption
  • Timeliness: Earliest is 15 days following the
    reference period.
  • Frequency: Hourly
  • Potential uses: Household final consumption
    expenditure, monthly Gross Domestic Product
    (utilities).
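
Using hourly meter readings for monthly indicators means rolling them up by customer and month. The following is a minimal sketch of that step, assuming readings arrive as (customer id, timestamp, kWh) tuples; the layout and the sample values are hypothetical and not taken from the distributors' files.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(readings):
    """Sum hourly kWh readings into {(customer_id, 'YYYY-MM'): total_kwh}."""
    totals = defaultdict(float)
    for customer_id, timestamp, kwh in readings:
        totals[(customer_id, timestamp.strftime("%Y-%m"))] += kwh
    return dict(totals)


# Made-up readings, for illustration only.
sample_readings = [
    ("A", datetime(2013, 1, 31, 23), 1.5),
    ("A", datetime(2013, 2, 1, 0), 0.5),
    ("A", datetime(2013, 2, 1, 1), 0.25),
    ("B", datetime(2013, 1, 15, 12), 2.0),
]
totals = monthly_totals(sample_readings)
```

At the scale quoted on the slide (roughly 200 GB) the same roll-up would normally run in a database or a distributed tool rather than in memory.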

18
Examples of big data research at Statistics Canada:
Smart meter household electricity consumption
[Chart: Total residential consumption]

19
Examples of big data research at Statistics Canada:
Satellite imaging (land account)
  • Collection/access agreement: Public data
  • Cost: Nil
  • Dimensions: 20 GB. Although not apparent here,
    the dimensions of this type of big data (which is
    not really big data, strictly speaking) may well
    explode in the coming years. LIDAR datasets
    (high-resolution laser ranging), as well as
    higher-resolution (space and time) satellite data,
    will require terabytes of storage and terahertz
    of processing capacity.
  • Uses: Land accounts (land cover / land use
    change, 2000 and 2010-2013)
  • Timeliness: 3-year lag
  • Frequency: Annual
  • Potential uses moving forward: Landscape and
    freshwater ecosystem accounts

20
Examples of big data research at Statistics Canada:
Remote sensing (land use)

21
Examples of big data research at Statistics Canada:
Water measurement instruments (water account)
  • Collection/access agreement: Informal agreement
    with Water Survey of Canada (WSC)
  • Cost: Nil
  • Dimensions: Original WSC data is 5 GB; derived
    water yield data is 90 GB
  • Uses: Water accounts (water yield)
  • Timeliness: From real-time to a lag of several
    years
  • Frequency: Daily
  • Potential uses moving forward: Freshwater
    ecosystem accounts

22
Some lessons learned so far
  • Quid pro quo is important when trying to obtain
    big data. Firms are more willing to part with
    their big data if you show them how they will
    receive a business intelligence benefit on
    their side.
  • Cost: big data is not always the cheapest
    option. It is sometimes easier to have the firm
    complete the survey than to create an
    infrastructure to receive and process their data.
    For example, the data received from local
    electricity providers is equivalent to the
    completion of two questions on our current
    survey.
  • Classification systems: big data does not
    follow any standard classification system. For
    example, electronic retail transactions are
    classified according to merchant groups rather
    than industries.
  • Big data aggregates: asking firms to aggregate
    their big data is an option.
  • Data formats: we need to work with new data
    formats that we are often not familiar with.
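
The classification point above is commonly handled with a concordance table that maps the provider's categories to the statistical classification. Below is a minimal sketch of the idea; the merchant groups, the industry labels and the values are illustrative assumptions, not Statistics Canada's actual concordance.

```python
from collections import defaultdict

# Hypothetical concordance: provider merchant group -> industry label.
CONCORDANCE = {
    "grocery": "Food and beverage stores",
    "fuel": "Gasoline stations",
    "restaurants": "Food services and drinking places",
}


def reclassify(values_by_merchant_group, concordance=CONCORDANCE):
    """Re-total values by industry; return (by_industry, unmapped_groups)."""
    by_industry = defaultdict(float)
    unmapped = {}
    for group, value in values_by_merchant_group.items():
        industry = concordance.get(group)
        if industry is None:
            unmapped[group] = value   # keep for manual review instead of guessing
        else:
            by_industry[industry] += value
    return dict(by_industry), unmapped


by_industry, unmapped = reclassify(
    {"grocery": 100.0, "fuel": 40.0, "online marketplace": 25.0}
)
```

Keeping the unmapped groups visible, rather than forcing them into an industry, shows how much of the source data falls outside the standard classification.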

23
Discussion point for the AEG
  • In order to exploit the potential of big data,
    NSOs need to make significant investments. How
    can we leverage the work taking place across
    various NSOs to minimize the investment and
    maximize the return?
  • How do we promote the development of new data
    products using big data over using big data to
    re-construct existing data products? Do we adjust
    our frameworks to accommodate big data or do we
    adjust big data to accommodate our frameworks?