Creating Data Repositories..

1
Creating Data Repositories..
  • Sanjay Rao
  • ECE Dept, Purdue University

2
Group Members
  • Dave Maltz
  • Rebecca Isaacs
  • Ratul Mahajan
  • Yin Zhang
  • Aditya Akella
  • David Kotz
  • Charles DiFatta
  • ..

3
Motivation
  • Network Management Research
  • Barrier to entry is high
  • Data/insights from operators/industry critical
  • Examples
  • Failure characterization of enterprise network
  • VLAN characterization and use
  • Configuration Management

4
What happens today..?
  • End-user centric measurement studies
  • Network treated as a black box: no operator involvement
  • Real need: white-box
  • Campus Networks
  • Difficulties in bootstrapping relationships with
    operators
  • Enterprise/Operator Network
  • Sprint or AT&T (Microsoft with end-user)
  • Limited pool of researchers
  • Data across multiple enterprises??
  • Trends over many years ??

5
Bottom line
  • Need a data repository
  • Contributors from operators, researchers,
    industry
  • Accessible to all researchers
  • Facilitate research much like PlanetLab
  • Vital to have critical mass of researchers on
    Network Management
  • Research along high-impact real problems

6
Data Sharing: what inhibits it?
  • Sensitivity of data
  • Security Issues (firewall policies, network
    structure)
  • Privacy Issues (records of individual activity)
  • Proprietary nature of data
  • E.g. number of calls received, mobility models
  • Possible to have others use it?
  • Secret weapon for research
  • Competition Vs. collaboration
  • Inertia / too much effort

7
Solutions
  • Carrots/sticks to promote data sharing
  • Must release data to publish
  • IMC best paper award only to work releasing
    data.
  • Technical ways of addressing concerns with
    sharing

8
Positive Example
Example: HSARPA PREDICT makes research on
network security possible. Firewall and IDS
network security data.

9
Research: Anonymization
  • Hiding provider, hiding individual information
  • Need framework to reason about it
  • What trade-offs do you make?
  • What risks are posed?
  • How to expose trade-offs in a way we can
    appreciate?
  • Anonymization very domain specific
  • E.g. configuration file Vs. packet trace
  • Are there common themes?
  • Other Models
  • NDA-based
  • Give me a question -> return answer
  • Exploratory nature of research
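
One way to make the "hiding individual information" bullet concrete is keyed pseudonymization of IP addresses. The sketch below is only an illustration, not a method proposed in the slides: it assumes HMAC-SHA256 with a secret key held by the data provider, and the function names (`pseudonymize_ip`, `anonymize_config_line`, `anonymize_flow`) are hypothetical. It applies the same mapping to a configuration line and to a flow record, the two kinds of data the slide contrasts.

```python
import hashlib
import hmac
import ipaddress
import re

# Loose IPv4 pattern; it also matches netmasks and invalid octets,
# which is one reason config anonymization needs domain-specific care.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize_ip(addr: str, key: bytes) -> str:
    """Map an IPv4 address to a stable pseudonym (same key, same output)."""
    packed = ipaddress.IPv4Address(addr).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Use the first 4 digest bytes as the pseudonymous address.
    # Assumption: prefix structure is NOT preserved by this mapping.
    return str(ipaddress.IPv4Address(digest[:4]))


def _replace_match(match: re.Match, key: bytes) -> str:
    text = match.group(0)
    try:
        return pseudonymize_ip(text, key)
    except ValueError:
        # Not a valid address (e.g. "999.1.1.1"); leave it untouched.
        return text


def anonymize_config_line(line: str, key: bytes) -> str:
    """Rewrite every IPv4-looking token in a configuration line."""
    return IPV4_RE.sub(lambda m: _replace_match(m, key), line)


def anonymize_flow(flow: dict, key: bytes) -> dict:
    """Rewrite the endpoints of a flow record, keeping other fields."""
    out = dict(flow)
    out["src"] = pseudonymize_ip(flow["src"], key)
    out["dst"] = pseudonymize_ip(flow["dst"], key)
    return out
```

Because the mapping is keyed and deterministic, an address that appears in both a configuration file and a trace maps to the same pseudonym, so the two can still be correlated. The trade-off is visible too: subnet structure is lost, and anyone holding the key can check guesses against the pseudonyms.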

10
Community effort: Cooperate on IRB
  • Social Sciences
  • Lots of experience with IRB
  • Networking
  • Lack of clear guidelines on IRB process
  • Admins feel happier if IRB can sanction things
  • As community
  • Must appreciate need/process for IRB
  • Develop guidelines for IRB process
  • Share IRB documents

11
Creating shareable data
  • 75% of time spent figuring out how to use data
  • Researcher needs vary
  • Different forms of data
  • Historical Vs. Streaming
  • Dated? Trending?
  • Assumptions made/gaps in data
  • timing info crucial at sub-RTT level?
  • Sharing hard, many idiosyncrasies
  • Data collection infrastructure, annotate

12
User Diagnostics
  • One-on-one exact data provided
  • Create shared repository(ies)
  • What data do most users want?
  • Is that the 20% of stuff most critical to provide?
  • Data Collection Tools
  • Meta-data part of problem
  • Create data in standard formats
  • Observatory
  • How to discover, describe, explain data
  • Access policy, use policy

13
Other
  • Streaming Data: Online vs. Offline
  • Scalable collection
  • What to collect? Over how long?
  • Compression techniques
  • Fine-grained: overhead; coarse-grained:
    information loss
  • What does it take to build this infrastructure?
  • Get all types of data as painlessly as possible
  • Massage, orchestrate data to fit researcher needs
  • Simple APIs to get data out; fast analysis tools
  • Federated Access
  • Data Management - Lifecycle of data
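
The fine-grained versus coarse-grained trade-off on this slide can be illustrated with a small aggregation sketch. It is only an example under assumed inputs: the `(timestamp, size)` packet tuples and the `aggregate` helper are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def aggregate(packets, bin_seconds):
    """Sum bytes per fixed-width time bin.

    packets: iterable of (timestamp_seconds, size_bytes) tuples.
    Returns {bin_index: total_bytes}, ordered by bin index.
    """
    bins = defaultdict(int)
    for ts, size in packets:
        bins[int(ts // bin_seconds)] += size
    return dict(sorted(bins.items()))


# Hypothetical per-packet records: fine-grained, one entry per packet.
packets = [(0.001, 100), (0.002, 200), (0.950, 50), (1.500, 400), (3.100, 60)]

# Coarse-grained view: one entry per non-empty one-second bin.
coarse = aggregate(packets, 1.0)
print(len(packets), "packet records ->", len(coarse), "bins:", coarse)
```

The coarse view stores fewer records, but the spacing between the first two packets (one millisecond apart) can no longer be recovered, which is the kind of loss that matters if timing at the sub-RTT level is needed.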

14
Action Items
  • Community-Wide Efforts
  • Initiate efforts to create data repository
  • How to manage? Who contributes? Who arbitrates?
  • How much storage? Lifecycle - How long to store
    data?
  • Create IRB guidelines for networking data
  • Research
  • Anonymization
  • Usage diagnostics -> what to collect, release;
    widely applicable
  • Data Collection Tools, metadata information
  • Industry, operators must be as actively involved
    as possible