ANU data plans and National Data Services at the APAC National Facility

1
ANU data plans and National Data Services at
the APAC National Facility
  • Dr Ben Evans
  • APAC National Facility
  • ANU Supercomputer Facility
  • (Ben.Evans_at_anusf.anu.edu.au)

2
ANU Data Investments
  • Background
  • 1995
  • Mass Data Storage System installed in 1995 (and
    many upgrades beyond)
  • Solution to large and long-term scientific data
    collections using infrastructure beyond the
    capability of one area
  • 2006
  • Expanded infrastructure role. Now includes:
  • complex data management software environments
  • data analysis
  • expanded expertise and research consultants,
  • better Campus/National and International
    integration
  • support research activity and good management
    practices

3
ANU data plans for Institution
  • ANU established an eResearch Task Force to scope
    university requirements and structures to address
    future needs.
  • Outcome (for data related activities)
  • Continuing need for high-end computational
    services
  • Increased growth in data-enabled research
    activities
  • Structures for encouraging data access
  • Integration with centralised infrastructure and
    support for managing digital assets
  • Value of researchers providing e-Research enabled
    services
  • Support for ANU's ongoing commitment to continuing
    the APAC National Facility and expanding its role
    in data services.

4
APAC-NF Data Services
  • Expanding Merit processes to support nationally
    merited data projects. Assessed on a yearly basis,
    but the expectation is support for the medium
    term, perhaps longer.
  • Project plan for the year, including value of the
    data project, requirements for the project and
    (in some cases) funding.
  • Provide infrastructure/framework, environment
  • - at internationally competent levels
  • - well linked to National and International
    research activities
  • Goal to broaden access, cohesion and support for
    National priority research activities and
    networks

5
Roles and Engagement
  • Principal Investigator
  • overall responsibility of project research
    engagement, plans and outcomes
  • includes appropriately nominated archivists and
    data curators
  • PI is data custodian, and interface to APAC-NF to
    assist in implementing policy
  • APAC
  • Competent well-managed infrastructure growth with
    appropriate connectivity
  • advanced software environments to support
    specialised projects.
  • Software development may happen in research teams
    (central/distributed)
  • Deploying grid-enabled data expertise via APAC
  • Access to consultants for best practice in data
    management
  • National/International exposure and support for
    linkage
  • Centralised vs distributed data depends on
    project technical and policy issues and on what
    best supports good overall management

6
APAC-NF Data System infrastructure
  • Data transfer, web access, virtual hosting, video
    streaming
  • Access by grid software, specialised software,
    command line, API
  • Dedicated Real-time Relational Database engines
  • Data Analysis Cluster
  • Big and little endian (Future attachment)
  • Disk and tape pools
  • Fast, on-line global filesystem

7
Software environments
  • Large toolbox of packages, compilers, libraries
    (including file formats) and other software is
    available on
  • http://nf.apac.edu.au/facilities/software/
  • Data projects: specialised toolkits and
    integration with infrastructure
  • Eg
  • Astronomy - data ingest, data search, VO enabled
  • Earth Systems - OpenDAP, experimental and modeled
    datasets
  • High energy physics - tiered storage SRB -> SRM
  • Humanities - Babble grid-ingest repository
    management, annotation software
  • Materials Sciences - Plexus project microCT
    experimental data with abstract structures; GRANI
  • Social Sciences - NESSTAR, Leximancer, VOSON
  • Terrestrial - categorisation, analysis and
    visualisation (eg GIS)

8
Data Discovery/Publishing
  • Providing a repository of references to datasets
  • http://nf.apac.edu.au/facilities/software/dataset.php
  • Some fields have VO registries and will be
    harvested and registered. (eg NVO in astronomy,
    Geographic, Humanities, Social Sciences)
  • Work with APSR for general discovery service,
    starting with APAC.

9
Data Lifecycle management
  • Storage media has a useful lifetime, in capacity,
    speed and maintenance.
  • 6 generations of tape drives, 5 generations of
    disks
  • Processes for assisting with Data project life
    cycle
  • multiple generations of data standards in
    metadata, data and software
  • Access method changes (and change from protected
    to public)
  • Software development needs production and
    development instances (may use virtualisation and
    data replication)
  • Data may change from large (archival) to complex,
    data-intensive

10
Data Trends
  • Large Data projects continue to get larger.
  • Eg telescopes/instruments
  • Requirement has changed from near-line to on-line.
  • Reduce complexity of software
  • Increase speed of access
  • Enable analysis next to data
  • Typical large scientific areas are being joined
    with humanities
  • Complex data management tools and opportunities
    being realised in nearly all disciplines.
    Inclusive of large and small datasets (eg
    SkyMapper)
  • Managing long-term software infrastructure
    becoming more complex, especially in an ongoing
    way.
  • Data management issues beyond capabilities of
    individual research group and trends beyond
    individual institution.

11
Data Protection
  • Frequent scheduling of archival copies.
  • Standard practice of multiple archival copies of
    all data, number and frequency handled by policy.
  • HA being established for some RDBMS and web
    services
  • Monitoring and Instrumentation of performance
  • Audit logs and backups of data
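
The idea of keeping multiple archival copies, with the number set by policy, can be sketched as a small compliance check. This is only an illustrative sketch: the policy table, the data class names, the location labels and the function names are all hypothetical and do not describe the facility's actual system.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical policy: minimum number of valid archival copies per data class.
COPY_POLICY = {"archival": 2, "critical": 3}


@dataclass
class ArchiveCopy:
    location: str  # illustrative label for a tape or disk pool
    checksum: str  # SHA-256 hex digest recorded when the copy was written


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_copies(data: bytes, copies: list[ArchiveCopy], data_class: str) -> dict:
    """Compare each copy's recorded checksum with the reference data and
    report whether enough valid copies exist to satisfy the policy."""
    expected = sha256_of(data)
    valid = [c.location for c in copies if c.checksum == expected]
    corrupt = [c.location for c in copies if c.checksum != expected]
    required = COPY_POLICY[data_class]
    return {
        "required": required,
        "valid": valid,
        "corrupt": corrupt,
        "compliant": len(valid) >= required,
    }
```

A report with `compliant` set to False would be the kind of event an audit log could record before a further archival copy is scheduled.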

12
Assistance with Co-scheduled Computational and
Data workflow
  • Workflow diagram labels: Computation,
    Search/query and presentation, Dataset Access

13
APAC National Grid
  • Sites shown: QPSF (JCU), QPSF, APAC National
    Facility, IVEC, ac3, ANU, SAPAC, CSIRO, VPAC, TPAC
  • Computing Systems: Peak, Mid-range, Special

14
National Data transfer backbone for data workflow
  • Transfer of data from repository using managed
    data transfer backbone
  • Tuned transfer systems
  • Connection via high-bandwidth scalable pipes (in
    progress)
  • GridFTP, SRB, dCache more generic tools
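
One common ingredient of tuning long-distance transfers is sizing buffers to the bandwidth-delay product (link bandwidth multiplied by round-trip time). The slides do not say which tuning was applied, so the sketch below is an assumption about what "tuned" could involve, with hypothetical function names and illustrative numbers.

```python
import math


def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a link of the given bandwidth
    full over the given round-trip time."""
    return bandwidth_bits_per_s * rtt_seconds / 8


def streams_needed(bdp_bytes: float, per_stream_buffer_bytes: int) -> int:
    """Number of parallel streams needed to cover the bandwidth-delay
    product when each stream is limited to a fixed buffer size."""
    return max(1, math.ceil(bdp_bytes / per_stream_buffer_bytes))


# Illustrative values only: a 1 Gbit/s path with a 50 ms round-trip time
# and a 1 MiB buffer per stream.
bdp = bandwidth_delay_product_bytes(1e9, 0.050)
streams = streams_needed(bdp, 1024 * 1024)
```

With these illustrative values the product is about 6.25 MB, so a single stream with a 1 MiB buffer could not fill the link, which is one reason transfer tools offer larger buffers or parallel streams.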

15
Grid Infrastructure for data/compute workflow
  • Establish data transfer metrics for National Grid
    and International transfer performance across the
    interconnecting fabric.
  • Lead to improvements in services over the network
    in conjunction with network providers and local
    institutions.
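
A transfer metric of the kind described above could be as simple as per-route throughput summarised over repeated measurements. The following is a minimal sketch under that assumption; the record layout, function names and sample values are hypothetical and are not measurements from the APAC grid.

```python
import statistics
from dataclasses import dataclass


@dataclass
class TransferSample:
    source: str        # sending site
    destination: str   # receiving site
    bytes_moved: int   # payload size in bytes
    seconds: float     # wall-clock duration of the transfer


def throughput_mb_per_s(sample: TransferSample) -> float:
    """Throughput of one transfer in megabytes (10**6 bytes) per second."""
    return sample.bytes_moved / sample.seconds / 1e6


def summarise(samples: list[TransferSample]) -> dict:
    """Group samples by (source, destination) route and report the sample
    count, median throughput and worst-case throughput for each route."""
    by_route: dict = {}
    for s in samples:
        by_route.setdefault((s.source, s.destination), []).append(throughput_mb_per_s(s))
    return {
        route: {
            "samples": len(rates),
            "median_mb_s": statistics.median(rates),
            "min_mb_s": min(rates),
        }
        for route, rates in by_route.items()
    }
```

Tracking the minimum alongside the median is a design choice for this sketch: a route with a good median but a poor worst case is the sort of result that would be raised with network providers.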