cfgPres - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: cfgPres


1
Security Oriented Data Grids for Microarray Expression Profiles
Prof. Richard O. Sinnott
Technical Director, National e-Science Centre
Deputy Director (Technical), Bioinformatics Research Centre
University of Glasgow
25th April 2007
2
BRIDGES Project
3
Lessons learned
  • Scientists wary of new paradigms
  • myData culture
  • Need simple but secure access to data /
    computational resources
  • Hide the Grid as much as possible
  • Take certificates away from end users
  • Present the Grid like the Internet, i.e. browser
    point and click!
  • Need to work outside of silos
  • Inter-disciplinary, translational research
  • Seamlessly move between associated resources
  • Need fine grained single sign-on across these
    resources

4
Security Usability
  • Grid Security
  • AAAA
  • Users like usernames/passwords
  • Provide them (once!)
  • Users don't like/understand X.509 based PKI
  • Forget training, education for most users!
  • > openssl pkcs12 -in cert.p12 -clcerts -nokeys
    -out usercert.pem !
  • The vast majority most certainly won't jump
    through hoops to get on the Grid
  • me-Science culture

5
AAAA
  • Identity management issues
  • Certificate Revocation Lists
  • When revoked? By whom? How timely?
  • Strong passwords for private keys
  • Users write them down, share them, forget them
  • Privilege Management
  • Numerous domains where users never get access to
    a local account to do stuff
  • "I need to access your NHS DB to run queries,
    change tables, run arbitrary code"
  • At NeSC Glasgow we have focused on
  • improving two of the As in AAAA: authentication
    and authorisation

6
Improving AAAA
  • Best to exploit local authentication
  • Sites know best if users still at institution and
    are best placed to state what their privileges
    are/should be
  • Introducing Shibboleth
  • will replace Athens as access management system
    across UK academia
  • i.e. this is mainstream and not (weird) Grid
    solutions!
  • Federations based on trust
  • or, more accurately, "trust but verify"
  • numerous international federations exist: MAMS,
    SWITCH, HAKA, SDSS

7
Typical Shibboleth Scenario
(Diagram labels: User; W.A.Y.F.; Federation; Identity Provider; AuthN; Home Institution; Service provider; Grid resource / portal. Step 5: User accesses resource.)
8
It's a start, but
  • Benefit from local authentication but really want
    finer grained control
  • I know you have authenticated, but I need to know
    that you have sufficient/correct privileges to
    access my VO resources
  • Shibboleth can also return various other
    information needed to support authorisation
    decisions
  • At NeSC we have been working extensively with
    PERMIS and portal content configuration
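
The step from "you have authenticated" to "you have sufficient privileges" can be sketched as a check of released attributes against a resource policy. This is only an illustration, not the PERMIS API: the function `is_authorised`, the policy layout and all attribute values are assumptions made for the example. The attribute names are borrowed from eduPerson.

```python
def is_authorised(attributes, policy):
    """Return True only if every attribute required by the policy has at
    least one acceptable value among those released for the user.

    attributes: dict of attribute name -> value or list of values
                (hypothetical stand-in for what the home institution releases)
    policy:     dict of attribute name -> set of acceptable values
                (hypothetical stand-in for a resource owner's policy)
    """
    for name, accepted in policy.items():
        released = attributes.get(name, [])
        if isinstance(released, str):
            released = [released]
        if not any(value in accepted for value in released):
            return False
    return True


# Invented policy and users, for illustration only.
vo_policy = {
    "eduPersonScopedAffiliation": {"staff@example.ac.uk"},
    "eduPersonEntitlement": {"urn:example:vo:reader"},
}

alice = {
    "eduPersonScopedAffiliation": "staff@example.ac.uk",
    "eduPersonEntitlement": ["urn:example:vo:reader", "urn:example:other"],
}
bob = {"eduPersonScopedAffiliation": "staff@example.ac.uk"}

print(is_authorised(alice, vo_policy))
print(is_authorised(bob, vo_policy))
```

Both users authenticated at the same institution, but only the one holding the required entitlement passes the check. That is the finer grained control the bullets ask for.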

9
Finer Grained Shibboleth Scenario
(Diagram labels: User; W.A.Y.F.; Federation; Identity Provider; AuthN; Home Institution; Service provider; Shib Frontend; Grid Portal; Grid Application. Step 5: Pass authentication info and attributes to authZ function. Step 6: Make final AuthZ decision. Browser based single sign-on.)
10
Inter-disciplinary e-Life Science Research
(Diagram labels: Tissues; Cell; Protein functions; Organs; Protein Structures; Organisms; Gene expressions; Physiology; Populations; Nucleotide structures; Cell signalling; Grid Security; Nucleotide sequences; Protein-protein interaction (pathways).)
11
Grid Enabled Microarray Expression Profile Search
(GEMEPS) Project
  • 1 year BBSRC funded project, just completed
  • Involves Glasgow; Cornell University, US; and
    Riken Institute, Japan
  • Aim to provide tools for discovery, comparison
    and analysis of microarray data sets
  • BRIDGES focused on genes of interest
  • GEMEPS focuses on microarrays consisting of many
    thousands of genes of interest
  • Gene expression profiles

12
Messy
3.5MB per experiment, thousands of experiments,
...
13
Key Scenarios
  • Key questions to be answered by GEMEPS
    infrastructure
  • who has run a microarray experiment and generated
    similar results to mine?
  • How similar were these results?
  • who has undertaken experiments and produced data
    relevant to my own interests,
  • for a particular phenotype,
  • for a particular cell type,
  • for a particular pathogen,
  • on a particular platform
  • show me the conditions and analysis associated
    with experimental results similar to mine

14
GEMEPS Context
  • Levels of gene expression or differential
    expression important
  • Requires security-focused data access,
    integration and data mining
  • Microarrays expensive to run and contain
    potentially important (academically/commercially)
    data sets
  • Key aspect is that scientists keep their own data
    and define their own policies on access and usage

15
Microarray Repositories and Data Formats
  • MIAME goal is to define the
  • minimum information required to interpret
    unambiguously and potentially reproduce and
    verify an array based gene expression monitoring
    experiment
  • Several data formats/controlled vocabularies and
    ontologies defined and applied across different
    sites/communities, including
  • MAGE-ML - Microarray Gene Expression Markup
    Language
  • SOFTtext - Simple Omnibus Format in Text
  • MINiML - MIAME Notation in Markup Language
  • SOFTmatrix - Simple Omnibus Format in Matrix
  • Numerous major repositories now exist including
  • Gene Expression Omnibus (GEO),
  • ArrayExpress,
  • CIBEX

16
GEMEPS Discovery and Analysis of Matching Profiles
  • Step 1
  • Query over appropriate meta-data and return
    matching experiments
  • (appropriate to level of privilege)
  • Step 2
  • For matching results
  • extract gene ordering (based on level of
    expression values)
  • Can include filtering/cut-off, e.g. compare
    only the 10, 100 or 1000 most expressed
    genes from experiments
  • run similarity algorithm to determine best match
    of own data results against experiment gene
    expression ordering
  • currently support Spearman Rank/Kendall Tau
  • Step 3
  • Merge the results and display
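
The three steps above can be sketched in a few lines. This is a minimal illustration and not the GEMEPS implementation: the record layout, the integer privilege level, and the names `matching_experiments`, `gene_ordering`, `rank_matches` and `top_overlap` are all assumptions. The similarity measure is passed in as a function; `top_overlap` is a deliberately simple stand-in, not the Spearman Rank or Kendall Tau measure named in the slide.

```python
def matching_experiments(experiments, query, privilege):
    """Step 1: keep experiments whose meta-data match the query and that the
    caller's (hypothetical, integer) privilege level allows them to see."""
    return [
        e for e in experiments
        if e["min_privilege"] <= privilege
        and all(e["meta"].get(key) == value for key, value in query.items())
    ]


def gene_ordering(expression, top_n=None):
    """Step 2: gene names ordered from most to least expressed,
    optionally cut off after the top_n most expressed genes."""
    ordered = sorted(expression, key=lambda gene: (-expression[gene], gene))
    return ordered if top_n is None else ordered[:top_n]


def rank_matches(own_expression, experiments, query, privilege, similarity, top_n=None):
    """Steps 1-3: filter, order, score with the supplied similarity function,
    then merge the results best match first."""
    own = gene_ordering(own_expression, top_n)
    scored = [
        (similarity(own, gene_ordering(e["expression"], top_n)), e["id"])
        for e in matching_experiments(experiments, query, privilege)
    ]
    return sorted(scored, reverse=True)


def top_overlap(a, b):
    """Stand-in similarity: fraction of genes shared by the two cut-off lists."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


# Invented data, for illustration only.
experiments = [
    {"id": "exp-A", "meta": {"platform": "P1"}, "min_privilege": 0,
     "expression": {"g1": 9.0, "g2": 7.5, "g3": 1.2, "g4": 0.3}},
    {"id": "exp-B", "meta": {"platform": "P1"}, "min_privilege": 2,
     "expression": {"g1": 0.2, "g2": 0.4, "g3": 8.8, "g4": 9.1}},
    {"id": "exp-C", "meta": {"platform": "P2"}, "min_privilege": 0,
     "expression": {"g1": 9.0, "g2": 7.5, "g3": 1.2, "g4": 0.3}},
]
mine = {"g1": 8.0, "g2": 6.0, "g3": 2.0, "g4": 1.0}

print(rank_matches(mine, experiments, {"platform": "P1"}, 1, top_overlap, top_n=2))
print(rank_matches(mine, experiments, {"platform": "P1"}, 2, top_overlap, top_n=2))
```

The same query returns a different result set depending on the caller's privilege, which is the "appropriate to level of privilege" behaviour of Step 1.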

17
Experiment Similarity
  • Based on correlation coefficient
  • Measuring correspondence between two rankings,
    and assessing the significance of this
    correspondence
  • rank correlation coefficient given in interval
    [-1, 1] where
  • If the agreement between the two rankings is
    perfect, i.e., the two rankings are the same, the
    coefficient has value 1
  • for GEMEPS this implies that the same set of
    genes exists in the same ordering
  • If the disagreement between the two rankings is
    perfect, i.e., one ranking is the reverse of the
    other then the coefficient has value -1
  • For all other arrangements the value lies between
    -1 and 1, and increasing values imply increasing
    agreement between the rankings.
  • If the rankings are completely independent, the
    coefficient has value 0.
  • Spearman Rank correlation coefficient given by
  • $\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$
  • where
  • $d_i$ is the difference between the ranks of
    corresponding values of x and y, and
  • n is the number of pairs of values
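
As a worked illustration, the Spearman rank coefficient can be computed directly from the rank differences using the textbook formula rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)). The sketch below assumes there are no tied values, since this simple formula only holds for distinct ranks. The function names `ranks` and `spearman_rho` are chosen for the example.

```python
def ranks(values):
    """Rank 1 goes to the largest value. Assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for two equally long lists without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    # d_i is the difference between the ranks of corresponding values.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))   # identical ordering
print(spearman_rho([1, 2, 3, 4], [40, 30, 20, 10]))   # reversed ordering
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])) # partial agreement
```

Identical orderings give 1, reversed orderings give -1, and the third call falls in between, matching the interpretation given in the bullets.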

18
Kendall Tau
  • Kendall Tau correlation coefficient given by
  • $\tau = \frac{4P}{n(n-1)} - 1$
  • where
  • n is the number of items
  • P is the sum, over all the items, of the number
    of items ranked after the given item by both
    rankings
  • Example of ranking between height/weight
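
A small sketch of the Kendall Tau calculation, using the form tau = 4P / (n(n-1)) - 1, where P counts, for each item, the items ranked after it by both rankings. It assumes no tied ranks. The height and weight ranks below are illustrative values chosen for the example, not taken from the slide.

```python
def kendall_tau(rank_a, rank_b):
    """Kendall Tau for two rankings of the same items, assuming no ties.

    rank_a[i] and rank_b[i] are the ranks given to item i by each ranking.
    """
    if len(rank_a) != len(rank_b) or len(rank_a) < 2:
        raise ValueError("need two rankings of equal length with at least 2 items")
    n = len(rank_a)
    # P: for every item i, count the items j ranked after i by both rankings.
    p = 0
    for i in range(n):
        for j in range(n):
            if rank_a[j] > rank_a[i] and rank_b[j] > rank_b[i]:
                p += 1
    return 4 * p / (n * (n - 1)) - 1


# Eight people ranked by height and by weight (illustrative values).
height_rank = [1, 2, 3, 4, 5, 6, 7, 8]
weight_rank = [3, 4, 1, 2, 5, 7, 8, 6]

print(kendall_tau(height_rank, weight_rank))
print(kendall_tau(height_rank, height_rank))        # identical rankings
print(kendall_tau(height_rank, height_rank[::-1]))  # reversed rankings
```

For these ranks P is 22, so tau = 88/56 - 1, roughly 0.57: the two rankings agree more often than not, but far from perfectly.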

19
  • Demo?
  • (or death by snapshot?)

20-23
(No Transcript)
24
  • A different user logs in to Bioinformatics/GEMEPS
    portal

25-31
(No Transcript)
32
Single sign-on!!!
33-34
(No Transcript)
35
Conclusions
  • Shibboleth model is aligned with wider UK
    academic community (eduPerson attributes)
  • No distinction for the end user between
    accessing an e-journal and a Grid resource more
    generally
  • Readily supports inter-disciplinary hopping!
  • Many other projects in this space at NeSC
  • Drug Discovery Portal
  • Paediatric Endocrinology Registry for Congenital
    Anomalies
  • Generation Scotland Scottish Family Health Study
  • Brain Trauma

36
LSGrid 2007 www.lsgrid.org/2007
  • 4th International Life Science Grid Conference
  • Will take place at University of Glasgow on 6-7th
    September 2007
  • Paper submission deadline 29th June 2007
  • 10-13th September 2007 UK e-Science All Hands
    Meeting Nottingham UK
  • Braemar Highland Games 1st September 2007
  • Usually attended by the Queen