Data Sharing eInfrastructure

Provided by: davidrodri

1
Data Sharing e-Infrastructure
  • David Rodriguez (1), Trevor Carpenter (2), Jano van
    Hemert (1), Joanna Wardlaw (2).
  • On behalf of the SINAPSE Collaboration.
  • (1) National e-Science Centre, School of Informatics,
    University of Edinburgh.
  • (2) SFC Brain Imaging Research Centre, Department of
    Clinical Neurosciences, University of Edinburgh.

2
Contents
  • The SINAPSE project
  • Data Protection & pseudonymisation
  • Data sharing
  • Components
  • Status

4
The SINAPSE Project
  • Stands for Scottish Imaging Network: A Platform
    for Scientific Excellence.
  • Pooling initiative of six Scottish universities:
    Aberdeen, Dundee, Edinburgh, Glasgow, St. Andrews
    and Stirling.
  • Main objectives:
    • develop imaging expertise,
    • support multi-centre clinical research in
      conjunction with the Clinical Research Networks,
    • improve the ability of neuroscientists to
      collaborate on clinical trials,
    • have a direct impact on patient health.

5
Data Sharing e-Infrastructure
  • Enables multi-centre clinical research
    through data sharing.
  • The main objectives of the SINAPSE
    e-infrastructure project are:
    • Anonymisation: automatic compliance with data
      protection policies
    • Security: advanced authentication and
      authorisation within projects
    • Usability: a user-friendly environment
      to access data and applications
    • Modularity: conformance to relevant standards and
      use of existing components
    • Centralisation: leveraging existing compute
      clusters and storage.

6
Benefits
  • Easier Data Protection compliance for users
  • Enables secure data sharing
  • Coherent view of available data (single point of
    access)
  • Roadmap for end-of-project data publication &
    data curation

7
Key Features
  • Single sign-on: identify once per session for all
    the services.
  • Delegated authentication to home universities
  • Permission management using groups and roles
  • Data Catalogue
    • Files Catalogue
    • Metadata Catalogue: stores relevant information
      to help users find the desired data
  • Modularity
    • Reuse existing components
    • Allows future updates/changes

8
Access Levels
  • Different access levels for different users/use
    cases:
    • Site operators only need file access, and only
      to encrypted files.
    • Some researchers just need access to
      decrypted images and the associated basic image
      metadata.
    • Others will have access to more clinical
      information and metadata.

10
Data Protection
  • Data Protection Act (1998). Other legislation
    applies.
  • Personal data must be processed in a fair and
    lawful manner.
  • Projects to be run in SINAPSE shall have a proper
    consent form for the processing to be done.
  • All ethical approval.
  • A pseudonymous identifier substitutes the CHI
    (Community Health Index).
    • Linked using a database.
  • Anonymisation of other fields.
    • Full destruction of the information for some data,
      such as name or address.
    • Depending on the project, some fields might be
      transformed into less informative
      representations:
      • Postal Code -> Deprivation Index or partial
        Postal Code
      • Date of birth -> Age (with different precisions).
  • Any later access to personal data will be
    granted by the corresponding Data Controller.
  • All personal data processing will be logged for
    auditing.
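
The field transformations listed on this slide can be illustrated with a minimal Python sketch. It is not the SINAPSE tool: the field names (`name`, `address`, `date_of_birth`, `postal_code`), the function names and the default precision are assumptions made for the example. The deprivation index lookup and the CHI substitution are left out because both need external data.

```python
from datetime import date

# Hypothetical field names; fields listed here are dropped entirely.
DESTROYED_FIELDS = {"name", "address"}


def age_at(date_of_birth: date, reference: date, precision_years: int = 1) -> int:
    """Replace a date of birth by an age, rounded down to a multiple of precision_years."""
    years = reference.year - date_of_birth.year
    # One year less if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return (years // precision_years) * precision_years


def partial_postcode(postcode: str) -> str:
    """Keep the outward part of a UK postcode (the inward part is always 3 characters)."""
    compact = "".join(postcode.split()).upper()
    return compact[:-3]


def pseudonymise_record(record: dict, reference: date, precision_years: int = 5) -> dict:
    """Apply destruction and coarsening rules; other fields pass through unchanged."""
    out = {}
    for field, value in record.items():
        if field in DESTROYED_FIELDS:
            continue  # full destruction of the information
        if field == "date_of_birth":
            out["age"] = age_at(value, reference, precision_years)
        elif field == "postal_code":
            out["postal_area"] = partial_postcode(value)
        else:
            out[field] = value
    return out
```

The `precision_years` parameter corresponds to the "different precisions" mentioned for ages: a project could keep exact ages, while another could keep only five-year bands.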

11
Data Pseudonymisation
  [Diagram: NHS, Research Centre, Local Storage]

12
Pseudonymisation Tool
  • Implemented in Java.
  • To be deployed as near as possible to the data
    acquisition. Can be configured for each site.
  • Configurable using XML documents.
  • Different projects can apply different policies.
  • The policy specifies the classes that will
    execute the transformation of the data.
  • Graphical tool for editing the policies.
  • These classes will be distributed in signed jars,
    and their authenticity will be checked using
    their hash.
  • For data provenance checks and auditing purposes,
    the version of each class will be tracked.
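
As a rough illustration of a per-project policy and of the hash check, here is a short Python sketch. The tool itself is written in Java; Python is used only to keep the example short. The XML element and attribute names, the transformer names and the use of SHA-256 are all assumptions, since the slides do not show the real policy schema or the hash algorithm.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Hypothetical policy document: one transformer class (with version) per field.
POLICY_XML = """
<policy project="example-study">
  <field name="PatientName" transformer="Destroy" version="1.0"/>
  <field name="PatientBirthDate" transformer="DateToAge" version="1.2"/>
</policy>
"""


def load_policy(xml_text: str) -> dict:
    """Map each field name to the (transformer, version) pair that handles it."""
    root = ET.fromstring(xml_text.strip())
    return {
        field.get("name"): (field.get("transformer"), field.get("version"))
        for field in root.findall("field")
    }


def sha256_hex(data: bytes) -> str:
    """Hex digest of an archive's bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def is_authentic(archive_bytes: bytes, expected_hash: str) -> bool:
    """Check an archive against the hash recorded for it."""
    return hmac.compare_digest(sha256_hex(archive_bytes), expected_hash)
```

Recording the transformer version next to each field is one simple way to support the provenance and auditing requirement: the log can state exactly which code transformed which field.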

13
CHI Transformation Service
  • CHI (Community Health Index) is the National
    unique identifier for NHS (Scotland) patients
  • Used in any health related communication
  • As it identifies the patient it is sensitive
    information
  • It is composed of 10 digits that include:
    • Date of birth
    • Gender
    • Control digit
  • Possibilities:
    • Reversible / Irreversible transformation
    • Unique for all SINAPSE / Unique for each Data
      Controller
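
One of the listed possibilities, an irreversible transformation, can be sketched with a keyed hash. This is only an illustration, not the design chosen by SINAPSE: the use of HMAC-SHA256, the function name `pseudonym`, the key values and the 16-character truncation are assumptions.

```python
import hashlib
import hmac


def pseudonym(chi: str, key: bytes, length: int = 16) -> str:
    """Derive an irreversible pseudonym from a 10-digit CHI using a secret key."""
    if len(chi) != 10 or not (chi.isascii() and chi.isdigit()):
        raise ValueError("a CHI number is expected to be 10 digits")
    digest = hmac.new(key, chi.encode("ascii"), hashlib.sha256).hexdigest()
    # Truncation keeps identifiers short; the length is an arbitrary choice here.
    return digest[:length]
```

With a single key shared by the whole network, the same patient gets the same pseudonym everywhere ("unique for all SINAPSE"). With one key per Data Controller, pseudonyms from different controllers cannot be linked without the keys. A reversible scheme would instead keep a protected lookup table or use encryption, which this sketch does not cover.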

15
Data Sharing
  • Centralised model adopted: cheaper, easier, and
    reduces the IT burden on research staff.
  • Adopted even though several grid projects
    provide DICOM functionality.
  • Research data will be encrypted before it is
    stored.
  • Data organised per project.
  • Access control using groups & roles.
  • Authentication using Shibboleth, due to usability
    concerns regarding X.509 certificates.

16
Uploading Data
  [Diagram: Local Storage, Metadata extraction, Data Files, Portal,
  SINAPSE Storage, University Authentication Service, VOMS]

17
Centralised Architecture
  • Simpler deployment
  • Easier middleware release control
  • Less impact on participant centres
  • Easier to manage and use
  • No default resilience
    • A second centre would be needed
    • But this is only necessary for critical services
    • With good support, a reasonable service can be
      provided from a single centre

18
Deployment Plan
  • ECDF (http://www.is.ed.ac.uk/ecdf/)?
    • A facility unique in Scotland
    • Disk space and CPU time will be rented as
      needed.
    • 1456 CPU cores
    • 275 TB of disk
  • Also a SINAPSE-owned server, to be hosted by ECDF
    • ECDF will provide basic hardware & software
      support
    • SINAPSE services to be hosted on it:
      • Portal
      • Data Catalogue
      • Research Data encryption service
      • OGSA-DAI
      • Projects' customised databases
      • RAPID

19
Advantages
  • Cheaper start up and running costs without loss
    of performance compared to the alternatives
    presented.
  • Small initial deployment
  • Cost fully transparent, no need to factor in
    costs for cooling, power, insurance, backups,
    off-site backups and staff training
  • Massively reduced depreciation on investment
  • An easy way to scale up to meet increases in
    demand
  • Flexibility for future development
  • 24 hours, 7 days a week service availability with
    9am to 5pm systems support by experts.
  • Operating system, Hardware and Storage maintained
    and upgraded by ECDF staff
  • No increase in system administration workload for
    the participant centres
  • No need to open firewalls to deliver new services
    in participating centres

21
Components

22
Portal
  • A GridSphere-based portal will give access to the
    resources.
  • Basic functionality to be provided by SINAPSE:
    • Data uploading
    • Catalogue querying
  • Projects will customise the portal to their
    needs by providing their own portlets.

23
Authentication
  • Shibboleth federated authentication
    • Single sign-on.
    • Delegated to home universities.
    • Users will continue using a method they are
      already familiar with.
  • X.509 certificates are common in grids
    • But they can be a handicap for some users.

24
Authorization
  • Dynamic Virtual Organisations (VOs)
    • Members should be easy to add and remove
    • New VOs created for new projects/studies
    • VO role management
  • Role-based access
    • Allows different access levels to information for
      different users
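
The VO and role model can be sketched as a small data structure. This is a toy illustration in Python, not the VOMS interface: the class name, the role names and the action names are hypothetical, chosen to mirror the access levels described earlier in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    """One VO per project/study, holding members, their roles and role permissions."""
    name: str
    permissions: dict = field(default_factory=dict)  # role -> set of allowed actions
    members: dict = field(default_factory=dict)      # user -> set of roles

    def add_member(self, user: str, *roles: str) -> None:
        self.members.setdefault(user, set()).update(roles)

    def remove_member(self, user: str) -> None:
        self.members.pop(user, None)

    def is_allowed(self, user: str, action: str) -> bool:
        """A user may perform an action if any of their roles grants it."""
        roles = self.members.get(user, set())
        return any(action in self.permissions.get(role, set()) for role in roles)


# Hypothetical roles mirroring the access levels in the slides.
EXAMPLE_PERMISSIONS = {
    "site-operator": {"read-encrypted-files"},
    "researcher": {"read-images", "read-image-metadata"},
    "clinical-researcher": {"read-images", "read-image-metadata", "read-clinical-data"},
}
```

Creating a VO for a new study is then a matter of instantiating a new object with its own permission table, and changing membership does not require touching the data itself.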

25
Communications
  • Encrypted communications for all the services:
    • GridFTP
    • SSH
    • HTTPS for web services

26
Images Encryption
  • These keys are to protect research data, not
    personal data
  • Not so sensitive.
  • Keys accessible from all the SINAPSE sites
  • Access to the keys based on groups and roles
  • Project/study dependent

27
Catalogues
  • Data Catalogue for keeping track of the files in
    the system
  • Metadata Catalogue storing key attributes
    extracted from the DICOM headers.
  • Clinical Information databases and additional
    metadata databases can be deployed by the
    different projects.
  • OGSA-DAI will be used to provide access to these
    resources.
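
To make the split between the two catalogues concrete, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in. The table layout, function names and example paths are assumptions; in the planned system access would go through OGSA-DAI rather than direct SQL. `Modality` and `StudyDate` are standard DICOM attribute names, used here as sample metadata.

```python
import sqlite3


def open_catalogue() -> sqlite3.Connection:
    """Create an in-memory stand-in for the data and metadata catalogues."""
    conn = sqlite3.connect(":memory:")
    # Data Catalogue: which files exist and which project they belong to.
    conn.execute(
        "CREATE TABLE files ("
        "file_id INTEGER PRIMARY KEY, project TEXT NOT NULL, path TEXT NOT NULL)"
    )
    # Metadata Catalogue: key attributes extracted from the DICOM headers.
    conn.execute(
        "CREATE TABLE metadata ("
        "file_id INTEGER REFERENCES files(file_id), attribute TEXT, value TEXT)"
    )
    return conn


def register(conn, project: str, path: str, attributes: dict) -> int:
    """Record a file and its extracted attributes; returns the new file id."""
    cur = conn.execute(
        "INSERT INTO files (project, path) VALUES (?, ?)", (project, path)
    )
    file_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        [(file_id, name, value) for name, value in attributes.items()],
    )
    return file_id


def find(conn, project: str, attribute: str, value: str) -> list:
    """Return the paths of a project's files whose attribute has the given value."""
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN metadata m ON m.file_id = f.file_id "
        "WHERE f.project = ? AND m.attribute = ? AND m.value = ? ORDER BY f.path",
        (project, attribute, value),
    )
    return [row[0] for row in rows]
```

Keeping attributes as name/value rows lets each project store whichever header fields it needs without changing the schema, at the cost of less efficient queries than a fixed-column table.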

29
Status
  • Proposal endorsed by the SINAPSE IT & Image
    Analysis committee last July.
  • Grant application for machines & storage
    resources to be sent soon.
  • Pseudonymisation tool being tested.

30
Questions