NGS induction --- case study: the BRIDGES project. Micha Bayer, Grid Services Developer, BRIDGES project, National e-Science Centre, Glasgow Hub

1
NGS induction --- case study: the BRIDGES project
Micha Bayer
Grid Services Developer, BRIDGES project
National e-Science Centre, Glasgow Hub
2
The BRIDGES project
  • Biomedical Research Informatics Delivered by
    Grid-Enabled Services
  • 2-year e-Science project, started 1st October
    2003
  • aim: provide data integration and grid-based
    compute power for the Cardiovascular Functional
    Genomics (CFG) project
  • CFG project investigates genetic predisposition
    for hypertensive heart disease
  • my role on the project: develop grid applications
    for end users

3
BRIDGES requirements and the NGS
  • functional:
    • high-throughput compute tasks, e.g. large BLAST
      jobs
  • non-functional:
    • interfaces to applications should be targeted at
      the less computer literate --- users range in
      computer literacy from fairly advanced to mildly
      technophobic
    • security requirements should not cause any extra
      work or inconvenience for users as this may put
      them off altogether
    • resources provided by BRIDGES compete with
      familiar, similar resources already on offer at
      established bioinformatics institutions (EBI,
      NCBI, EMBL) -> need to make things palatable so
      people do use it

4
How to get your job onto the NGS: standard solutions
[Diagram labels: NGS portal; GSI-SSH; NGS clusters (Leeds, Oxford, RAL, Manchester)]
5
Custom grid applications
  • if possible/appropriate, get a developer to write
    bespoke interface to a grid app running on NGS
  • only worthwhile if application is used frequently
    and/or by many users and is relatively
    unchanging/simple
  • best to hide complexity of grid from users
    altogether
  • users should not even have to choose between
    resources
  • automatic scheduling of jobs to resources that
    currently have spare capacity is desirable
  • best option for delivery is a portlet in a
    project-specific web portal: users then just need
    a web browser for access
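
The automatic scheduling idea on this slide can be illustrated with a minimal sketch. It is not the BRIDGES scheduler: the Cluster class, the pick_cluster function and the load figures are all hypothetical, and a real scheduler would query each cluster's batch system for its current load.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str             # e.g. one of the NGS nodes
    free_processors: int  # hypothetical snapshot of idle processors


def pick_cluster(clusters, processors_needed=1):
    """Return the cluster with the most free processors, or None if none fits."""
    candidates = [c for c in clusters if c.free_processors >= processors_needed]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.free_processors)


# Made-up load figures, for illustration only.
snapshot = [
    Cluster("Leeds", 12),
    Cluster("Oxford", 0),
    Cluster("RAL", 40),
    Cluster("Manchester", 7),
]

chosen = pick_cluster(snapshot, processors_needed=10)
print(chosen.name if chosen else "no spare capacity, keep the job queued")
```

The user never sees this choice: the portlet would submit the job and the selection happens behind it.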

6
Project web portals
  • portals are configurable, personalized
    collections of web applications delivered to a
    web browser as a single page
  • NGS encourage projects to maintain their own web
    portals to deliver apps to their users
  • applications can then be provided through
    user-friendly, specific portlet interfaces
  • allows the hiding of grid complexity from users
  • requires developer time
  • BRIDGES portal currently uses IBM Websphere (free
    to academia)

7
More on portals
  • increasingly important technology not just for
    grid computing (cf. Yahoo)
  • gives end users a customized view of software and
    hardware resources specific to their particular
    application domain
  • also provides a single point of access to
    Grid-based resources following user
    authentication (single-sign-on)
  • content is provided by portlets (Java servlet
    extension); the JSR 168 standard provides for
    exchangeability
  • some portal packages currently available: IBM
    Websphere, Gridsphere, JetSpeed, uPortal,
    Jportlet, Apache Pluto

8
Authentication and User Management (1)
  • model adopted in BRIDGES
  • requirement was for users not to have to obtain
    and manage certificates
  • we applied for a single project account at NGS:
    users do not need individual NGS accounts
  • this account maps to a single user (BRIDGES) on
    the NGS with home directories on all nodes (like
    normal users)
  • authentication for this user on NGS is by means
    of the host certificate of the machine where the
    jobs are submitted from (under control of BRIDGES
    project)
  • users authenticate via the BRIDGES web portal
    using standard username and password pairs

9
Authentication and User Management (2)
  • Users can create accounts for themselves in
    BRIDGES Websphere portal (self-care)
  • alternatively one could of course give the users
    usernames and passwords
  • information gathered is kept in Websphere's
    secure user database
  • current info is very basic but will be extended
    to include more detail (e.g. URL of user's
    project or departmental website where the user is
    listed)
  • provides at least a basic means of accounting for
    user activity
  • no need for physically visiting the Registration
    Authority/presenting ID
  • may need to resort to stricter security if system
    is abused e.g. if impersonation takes place etc.

10
Authorisation with PERMIS
  • PERMIS: grid authorisation software developed at
    Salford University
    (http://sec.isi.salford.ac.uk/permis/)
  • BRIDGES uses PERMIS to differentially allow users
    access to resources
  • typical use is with a GT3.3 service, but
    lookup-type use is also possible with other
    services (in our case GT3.0.2)
  • code in our service calls a PERMIS authorisation
    service running on a machine at NeSC
  • user's roles are queried and access to the
    resource is permitted or denied accordingly
  • gives BRIDGES staff full control over who is
    allowed to use NGS resources through our
    applications

[Diagram labels: end user; ScotGRID; NeSC Condor Pool; NGS (Leeds, Oxford, RAL, Manchester)]
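
The lookup-type use described on this slide (query the user's roles, then permit or deny access to a resource) can be sketched as below. This does not use the PERMIS API: the role names, the two tables and both functions are hypothetical stand-ins for what the PERMIS authorisation service would answer.

```python
# Hypothetical role and policy tables. In BRIDGES this information is held
# by the PERMIS authorisation service, not by the calling code.
USER_ROLES = {
    "alice": {"cfg-researcher"},
    "bob": {"guest"},
}
RESOURCE_POLICY = {
    "ScotGRID": {"cfg-researcher", "guest"},
    "NeSC Condor Pool": {"cfg-researcher", "guest"},
    "NGS": {"cfg-researcher"},
}


def is_authorised(username, resource):
    """Permit access if the user holds at least one role the resource accepts."""
    roles = USER_ROLES.get(username, set())
    accepted = RESOURCE_POLICY.get(resource, set())
    return bool(roles & accepted)


def allowed_resources(username):
    """List the resources a job from this user may be scheduled onto."""
    return sorted(r for r in RESOURCE_POLICY if is_authorised(username, r))


print(allowed_resources("alice"))
print(allowed_resources("bob"))
```

Unknown users hold no roles in this sketch, so they are denied everywhere by default.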
11
Security in BRIDGES: summary
[Diagram, reconstructed from its labels]
  • end user: authenticate at BRIDGES web portal with
    username and password only
  • BRIDGES web portal: job request is passed on
    securely with username to the NeSC grid server
    with host credentials
  • NeSC grid server: get user authorisations from
    the NeSC machine with PERMIS authorisation
    service (GT3.3)
  • NeSC grid server: make host proxy, authenticate
    with NGS and submit job to the NGS clusters
    (Leeds, Oxford, RAL, Manchester)
12
Host authentication for job submission
  • allows us to submit jobs to NGS as user BRIDGES
  • apply for host certificate for the grid server
    machine as normal (UK e-Science Certification
    Authority)
  • results in a passwordless private key and host
    certificate for the machine
  • Java CoG Kit code can then be used to generate a
    host proxy locally
  • this is used for job submission

13
Use case: microarray reporter sequence BLAST jobs
Job processing: please wait.... (and wait.... and
wait....)
  • microarray chips contain up to 400,000 reporter
    sequences
  • these need to be compared to existing annotated
    sequence databases
  • takes approx. 3 weeks to compute against human
    genome on average desktop machine

14
BLAST
  • Basic Local Alignment Search Tool
  • used for comparing biological sequences (DNA,
    protein) against a set of target sequences
  • returns a sorted list of matches
  • most widely used algorithm for this sort of thing
  • compute intensive

15
How do I get my application to run efficiently on
a grid?
  • applications to be deployed on a compute grid
    need to be parallelised to really benefit (can of
    course just run them as single jobs too)
  • for this one must be able to partition a job into
    several subjobs
  • these then get processed separately at the same
    time on multiple processors
  • need to combine results of individual subjobs at
    the end
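
The three steps above (partition, process the subjobs at the same time, combine) can be sketched as follows. This is a toy illustration only: local threads stand in for grid processors, and run_subjob is a hypothetical placeholder that measures sequence lengths instead of running a real application.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_subjobs):
    """Split items into at most n_subjobs roughly equal, contiguous chunks."""
    if not items:
        return []
    size = -(-len(items) // n_subjobs)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_subjob(chunk):
    """Placeholder for the real work on one subjob (here: sequence lengths)."""
    return [(seq, len(seq)) for seq in chunk]


def run_parallel(items, n_subjobs):
    """Partition the job, run the subjobs concurrently, combine the results."""
    chunks = partition(items, n_subjobs)
    with ThreadPoolExecutor(max_workers=n_subjobs) as pool:
        # map() returns results in the same order as the input chunks
        partial_results = list(pool.map(run_subjob, chunks))
    combined = []
    for part in partial_results:
        combined.extend(part)
    return combined


print(run_parallel(["ACGT", "AC", "A"], n_subjobs=2))
```

Because the subjobs are independent of each other, combining is a plain concatenation here; an application whose subjobs interact would need a more careful merge step.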

16
Parallel BLAST grid style
  • partition your job by putting one or several
    query sequences into a separate input file (= 1
    subjob)
  • distribute all input files, the executable and
    target data onto your grid clusters (stage-in)
  • results are returned to the server and combined
    there
  • if 100 free processors are available, and 100
    subjobs are to be run, the time taken is 1/100th
    of the time it would have taken to run the whole
    job on a single machine (plus overheads for
    scheduling, data transfer and result combining)
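
As a rough sketch of the partitioning step and of the timing claim above, assuming the query sequences arrive as FASTA text and that all subjobs take equally long. The function names and the overhead parameter are hypothetical; this is not the GridBLAST code.

```python
def split_fasta(text, seqs_per_subjob):
    """Split FASTA text into input-file contents, each holding whole records."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))  # a new header closes a record
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return [
        "\n".join(records[i:i + seqs_per_subjob]) + "\n"
        for i in range(0, len(records), seqs_per_subjob)
    ]


def estimated_hours(serial_hours, n_subjobs, free_processors, overhead_hours=0.0):
    """Ideal wall-clock time; subjobs run in waves if processors are scarce."""
    waves = -(-n_subjobs // free_processors)  # ceiling division
    return serial_hours / n_subjobs * waves + overhead_hours


queries = ">s1\nACGT\n>s2\nGG\nTT\n>s3\nA\n"
for number, content in enumerate(split_fasta(queries, seqs_per_subjob=2)):
    print(f"subjob {number}:\n{content}")

# A 3-week job (504 hours) as 100 subjobs on 100 free processors:
# roughly 5 hours before overheads.
print(estimated_hours(504, n_subjobs=100, free_processors=100))
```

With only 50 free processors the same 100 subjobs would need two waves, so the estimate doubles; the overheads for scheduling, data transfer and result combining come on top in either case.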

17
To stage or not to stage?
  • file staging is the copying at runtime of
    files onto the remote resource
  • example: BLAST jobs
  • we need:
    • input file
    • target data file (database; really a flat text
      file)
    • executable (BLAST)
  • target files and executable are unchanging
    components for this kind of job
  • it is best to store these locally on the remote
    resources to avoid staging overhead (target data
    are in the region of several GB in size and
    growing exponentially)
  • rather than individual users keeping multiple
    copies of publicly available data in their home
    directories, get sys admins to put up copies
    visible to all
  • must stage in input files since these vary from
    job to job

18
BRIDGES GridBLAST Job Submission
[Diagram labels: end user machine; GridBLAST client; NESC Grid Server (Titania); Apache Tomcat; GT 3 core grid service; BRIDGES Meta-Scheduler; PBS wrapper; GT2.4 wrapper; ScotGRID masternode; PBS server side BLAST; ScotGRID worker nodes; NGS. Arrow labels: send job request; jobs farmed out to compute nodes; return result]
19
Current status of our system
  • software is still at prototype stage: we haven't
    benchmarked any really big jobs yet
  • Java Web Start client (launched from portal)
    connects to the service; needs to be changed to a
    portlet
  • user registration needs to be revised and users
    re-registered
  • happy to share portlet code etc with others once
    finished

20
How we worked with the NGS
  • BRIDGES was one of the first projects doing bio
    stuff on NGS
  • we established a basic infrastructure needed for
    BLAST on the NGS clusters in collaboration with
    NGS user support
  • good collaboration on our security requirements:
    NGS were very helpful and accommodating
  • our project account is the first of its kind and
    we jointly tailored a solution that would fit
    BRIDGES
  • ask for what you need! things are not cast in
    stone and it is supposed to be a public service

21
Public bioinformatics infrastructure on NGS:
current status
  • we are in the process of establishing an
    infrastructure for BLAST jobs that can be used by
    all
  • this includes:
    • making BLAST and mpiBLAST executables publicly
      available
    • mirroring the entire NCBI BLAST databases
      repository
  • currently trialling this on the Leeds node; will
    be replicated at other nodes eventually
  • data replication on all nodes necessary to avoid
    severe performance hits
  • input from others needed and welcome!

22
Contact details
  • BRIDGES website:
    http://www.brc.dcs.gla.ac.uk/projects/bridges/
  • Code repository (available soon):
    http://www.brc.dcs.gla.ac.uk/projects/bridges/public/code.htm
  • BRIDGES web portal:
    http://europa.nesc.gla.ac.uk:9081/wps/portal
  • Contacts:
    • Micha Bayer at NeSC in Glasgow --
      michab_at_dcs.gla.ac.uk
    • Richard Sinnott at NeSC in Glasgow --
      ros_at_dcs.gla.ac.uk