The new EGRID infrastructure - PowerPoint PPT Presentation

Provided by: ezioc

Transcript and Presenter's Notes

1
The new EGRID infrastructure
  • An update on the status of the EGRID project

2
The new EGRID infrastructure
  • The EGRID project
  • To implement Italian national grid facility for
    processing Economic and Financial data.
  • Underlying fabric on top of which partner
    projects develop Economic and Financial
    applications.

3
The new EGRID infrastructure
  • Summary
  • Original user requirements
  • The first EGRID release
  • Operating problems
  • Redesigning EGRID
  • A web portal to access EGRID

4
  • I. Original user requirements

5
Original user requirements
  • HW infrastructure to store/manage 2 TB of stock
    exchange data: NYSE, LSE, Borsa di Milano, etc.
  • Privacy: legally binding disclosure policies.
  • Users do not have the same read rights: one
    research group has a contract with NYSE for a
    specific company, another group has a contract
    with LSE for all companies, etc.
  • Two classes of users: those that upload stock
    exchange raw data and may remove it, and those
    that work on the data.
  • Facility organised for raw data pre-processing
    and end-user applications.

6
  • II. The first EGRID release

7
The first EGRID release
  • Meeting the HW infrastructure requirement
  • Bulk computing power and bulk storage rented
    from INFN Padova, part of the Physics grid!
  • Employed the same EDG middleware INFN uses.
  • Two-tiered topology dictated by network
    connectivity:
  • Partner projects have limited connectivity:
    installed peripheral sites supply local services.
  • Cache area for large data transfers.
  • Job execution points for non-CPU-intensive data
    processing.
  • INFN Padova has good connectivity: supplies
    services to whole community.

8
The first EGRID release
  • Meeting data privacy: EDG's data access mechanism
    implied critical and fragile fine-tuning.
  • Classic SE: local files exposed through GridFTP.
  • GridFTP allows file manipulation compatible with
    underlying Unix filesystem permissions.
  • The underlying filesystem must be carefully
    managed:
  • Users mapped to specific local accounts, not pool
    accounts.
  • Users partitioned into specially created groups
    that reflect data access patterns.
  • Carefully crafted directory tree guides data
    access.
  • Users have same UID across all SEs.
  • Replication/Synchronisation of directory
    structure across all SEs.
  • Users supplied with tools to manage permissions
    coherently across all SEs.
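
One way to picture the "same tree, same groups, same permissions on every SE" discipline is a single permission plan that is computed once and then applied identically everywhere. The sketch below only builds such a plan; the dataset paths, group names and the `permission_plan` helper are assumptions made for illustration, not the tools EGRID actually shipped.

```python
import stat

# Hypothetical datasets, each mapped to the Unix group allowed to read it.
DATASET_GROUPS = {
    "nyse/acme": "nyse_acme",
    "lse": "lse_all",
}


def permission_plan(root, dataset_groups):
    """Return sorted (path, group, mode) entries to apply identically on every SE."""
    plan = []
    for rel_path, group in sorted(dataset_groups.items()):
        # rwx for the owner (the uploader), r-x for the group, nothing for others.
        mode = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
        plan.append((f"{root}/{rel_path}", group, mode))
    return plan
```

Applying the plan (for example with `os.chown` and `os.chmod`) needs administrative rights on each SE, which is part of why this approach is labour-intensive.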

9
The first EGRID infrastructure
  • Meeting pre-processing requirement: supported
    with tailor-made wrapper component.
  • Developers can more easily grid-enable
    pre-processing operations.
  • Users can more easily run grid pre-processing on
    given datasets.
  • Common Unix commands such as cat, cut and grep
    were adapted to operate on grid-stored files.
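
The slides do not say how the adapted commands worked internally. A minimal sketch of the general idea, assuming the wrapper first obtains the file content from the grid and then applies the ordinary text operation, could look like this; `grid_grep` and the `fetch` callable are hypothetical names.

```python
import re


def grid_grep(pattern, grid_path, fetch):
    """Return the lines of a grid-stored file that match `pattern`.

    `fetch` is a hypothetical callable that takes a grid path and
    returns the file content as text (e.g. after a grid file transfer).
    """
    regex = re.compile(pattern)
    return [line for line in fetch(grid_path).splitlines() if regex.search(line)]


# Stand-in for grid storage, used only to exercise the sketch.
FAKE_STORE = {"/grid/egrid/trades.csv": "ACME,10\nOTHER,5\nACME,7\n"}
matches = grid_grep(r"^ACME,", "/grid/egrid/trades.csv", FAKE_STORE.__getitem__)
```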

10
The first EGRID infrastructure
  • Meeting user needs
  • User applications are specific to research
    interests: programmes and function libraries
    developed to aid porting of applications.
  • To facilitate installation of grid client SW,
    LiveCD technology was employed.

11
  • III. Operating problems

12
Operating problems
  • HW infrastructure
  • Only one large computing site: insufficient to
    demonstrate grid potential for distributed
    resource allocation.
  • Two-tiered topology problematic: maintenance
    task fell on a designated local user; EGRID could
    not dedicate enough manpower to the job.

13
Operating problems
  • Privacy
  • EDG and successor middleware LCG still lacked
    data access mechanism strong enough for EGRID.
  • Implemented solution is complex and does not
    scale: real account for each user in each SE,
    permissions on filesystem make tree replication
    tricky, etc.
  • The middleware did not allow a solution in line
    with a pervasive grid view.

14
Operating problems
  • User needs
  • Only a small part of the community used the
    tailor-made command line tools.
  • UI distributed on LiveCD spared users a
    workstation reinstallation, but:
  • users complained of awkward usage;
  • it interfered with their usual way of working.

15
  • IV. Redesigning EGRID

16
Redesigning EGRID
  • Driving factors
  • Leaner and more general infrastructure
  • Robust privacy
  • Thoroughly re-examined grid usability

17
Redesigning EGRID
  • HW infrastructure
  • Added second large computing centre: INFN
    Catania.
  • Dropped two tiered topology.

18
Redesigning EGRID
  • Privacy
  • Classic SE replaced with specific implementation
    of Storage Resource Manager (SRM) protocol,
    currently being completed.
  • Implementation is result of StoRM collaboration
    with INFN-CNAF.
  • Not a proprietary solution: SRM is becoming the
    standard for grid disk access; security solution
    compatible with mainstream grid trends.

19
Redesigning EGRID
  • How StoRM solves privacy
  • All file requests are brokered with SRM protocol.
  • When StoRM receives an SRM request for a file:
  • StoRM asks policy source for access rights to
    given file for given grid credentials.
  • Check is made at the grid credential level, not
    local user as before!
  • Physical enforcement through Just-In-Time ACL
    setup:
  • All files have no ACLs set up: no user can access
    files.
  • Local Unix account corresponding to grid
    credentials is determined.
  • ACL granting requested access set up for local
    user.
  • ACL removed when file no longer needed.
  • StoRM leverages the grid's Logical File Catalogue
    (LFC) as policy source: compatible with mainstream
    grid trends.
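
The just-in-time ACL steps listed above can be sketched as an in-memory simulation. This is not StoRM code: the policy table, the credential-to-account mapping, the `ACLS` dictionary standing in for filesystem ACLs, and the `just_in_time_access` helper are all assumptions made to illustrate the order of the steps.

```python
from contextlib import contextmanager

# Hypothetical policy source: (grid credential, file) -> allowed access modes.
POLICY = {("/C=IT/O=EGRID/CN=Alice", "/data/nyse/acme.csv"): {"r"}}
# Hypothetical mapping from grid credentials to local Unix accounts.
ACCOUNT_MAP = {"/C=IT/O=EGRID/CN=Alice": "egrid001"}
# Stand-in for filesystem ACLs: file -> {local user: mode}. Empty means no access.
ACLS = {}


@contextmanager
def just_in_time_access(credential, path, mode):
    """Grant an ACL only for the duration of one brokered request."""
    # 1. The check is made on the grid credential, not on a local user.
    if mode not in POLICY.get((credential, path), set()):
        raise PermissionError(f"{credential} may not '{mode}' {path}")
    # 2. Determine the local account that corresponds to the credential.
    local_user = ACCOUNT_MAP[credential]
    # 3. Set up the ACL granting the requested access.
    ACLS.setdefault(path, {})[local_user] = mode
    try:
        yield local_user
    finally:
        # 4. Remove the ACL once the file is no longer needed.
        del ACLS[path][local_user]
        if not ACLS[path]:
            del ACLS[path]
```

Outside the `with` block the file carries no ACL at all, which is what makes the default state "no user can access files".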

20
Redesigning EGRID
  • Completing data privacy
  • ELFI tool developed to allow classic POSIX I/O
    software interface access to grid files.
  • ELFI is a FUSE filesystem implementation: grid
    resources are seen through local mount points.
  • ELFI speaks the SRM protocol: there is a lack of
    SRM clients.

21
Redesigning EGRID
  • ELFI allows more
  • All existing file management tools work
    automatically with grid files:
  • Text tools: cat, grep, etc.
  • Graphical tools: Konqueror, etc.
  • Helps RAD/prototyping: developers do not have to
    learn new APIs when porting applications.
  • Sites supporting ELFI on WNs: applications spared
    the need to explicitly run grid file transfer
    commands.
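
Because a FUSE mount exposes grid files through ordinary POSIX I/O, application code needs nothing grid-specific. The sketch below uses only standard file operations; the mount point is a parameter, and the demo substitutes a temporary directory for a real ELFI mount, so every path and file name here is a placeholder.

```python
import tempfile
from pathlib import Path


def count_matching_lines(mount_point, relative_path, needle):
    """Count lines containing `needle` in a file below a (grid) mount point."""
    path = Path(mount_point) / relative_path
    with path.open() as handle:  # plain open(): no grid API involved
        return sum(1 for line in handle if needle in line)


def demo():
    """Exercise the function with a temporary directory standing in for the mount."""
    with tempfile.TemporaryDirectory() as fake_mount:
        (Path(fake_mount) / "trades.csv").write_text("ACME,10\nOTHER,5\nACME,7\n")
        return count_matching_lines(fake_mount, "trades.csv", "ACME")
```

On a site where ELFI is mounted, the same function would simply be called with the real mount point instead of the temporary directory.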

22
Redesigning EGRID
  • Grid usability
  • Web portal key solution: portals long proved to
    be effective ways to allow user interaction with
    an organisation's information system.
  • Old command line tools will remain:
  • For backwards compatibility.
  • For the few users that eagerly adopted them.
  • New development will concentrate on web portal.

23
  • V. A web portal to access EGRID

24
A web portal to access EGRID
  • Main entrance to new EGRID infrastructure.
  • All tools in one place: graphical UI.
  • Closer to users' way of working.
  • Lowers resistance to new technology.
  • No need to install grid SW on users' workstations:
  • Interaction through portal as displayed in web
    browser.
  • P-grade chosen as portal technology
  • Sufficiently sophisticated as starting point to
    meet EGRID requirements.
  • Does not fully meet EGRID requirements: extra
    development needed.

25
A web portal to access EGRID
  • P-grade's GUI simplifies many routine tasks and
    masks complexity:
  • No need to manually handle job identification
    strings.
  • Display keeps track of launched jobs, status,
    allows output retrieval, job cancelling, etc.
  • Easily choose Broker for automatic job
    submission, or specific CEs.
  • Enough flexibility to allow direct JDL attribute
    specification.
  • Graphical browsing of grid resources and file
    management: no need for distinct tools.

26
A web portal to access EGRID
  • P-grade portal adds new functionality
  • Although MPI jobs can also be run from the CLI,
    P-grade supplies a special API that allows a
    graphical report on such jobs to be displayed.
  • Workflow manager
  • Graphically specify several jobs.
  • Define connections among them showing data flow.
  • Portal takes care of retrieving job output and
    feeding it to linked jobs.
  • Monitoring of workflow done graphically showing
    data flow.
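
The workflow idea above (jobs, connections showing data flow, outputs fed to linked jobs) amounts to running a directed acyclic graph in dependency order. The following is a generic sketch of that technique using Python's standard `graphlib` (Python 3.9+), not P-grade's implementation; the job names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(jobs, links):
    """Run jobs in dependency order, feeding each job its upstream outputs.

    jobs:  job name -> callable taking a list of upstream outputs
    links: job name -> list of upstream job names (the data-flow connections)
    """
    graph = {name: links.get(name, []) for name in jobs}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = [outputs[upstream] for upstream in links.get(name, [])]
        outputs[name] = jobs[name](inputs)
    return outputs


# Hypothetical three-job workflow: one producer feeding two consumers.
example_jobs = {
    "extract": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs[0]),
    "total": lambda inputs: sum(inputs[0]),
}
example_links = {"sort": ["extract"], "total": ["extract"]}
example_outputs = run_workflow(example_jobs, example_links)
```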

27
A web portal to access EGRID
  • Extra development needed
  • Improved proxy management
  • SRM data management
  • SRM support in Workflow
  • Support for special workflow jobs: swarm jobs

28
A web portal to access EGRID
  • Improved proxy management
  • P-grade first uploads the user's private key onto
    the host where the Portal resides, then transfers
    it to the MyProxy server.
  • To lower security risks, EGRID needs the key to be
    transferred directly from user workstation to
    MyProxy server.
  • Java WebStart application developed by EGRID and
    seamlessly integrated into P-grade credentials
    portlet.

29
A web portal to access EGRID
  • SRM data management
  • P-grade allows browsing of files in classic SE
    and of files local to user workstation.
  • P-grade does not support SRM, nor does it support
    browsing of files on the portal hosting machine.
  • ELFI allows access to StoRM through local mount
    point.
  • It is easier to write a portlet that allows
    browsing of portal local resources rather than
    one that deals with the new SRM protocol.
  • EGRID developed a new portlet to allow such
    browsing.

30
A web portal to access EGRID
  • SRM support in Workflow
  • Workflow definition requires input and output
    files to be defined for each job.
  • For each file, the respective location must be
    specified.
  • P-grade supports classic SEs and user workstation.
  • SRM is not supported.
  • New file location support in P-grade: the host
    containing the portal itself; StoRM will be
    accessed through ELFI local mount point!
  • Ongoing collaboration with P-grade developers to
    better define requirement and study feasibility.

31
A web portal to access EGRID
  • Swarm Workflow jobs
  • Swarm jobs: application run repeatedly on
    different datasets; final job collects results
    and carries out final aggregate computation.
  • Currently P-grade workflows allow only manual job
    parameter specification: automatic mechanism
    needed.
  • This feature is already present in P-grade's
    release schedule.
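
The swarm pattern described above is a map step over datasets followed by one aggregating job. A minimal sketch, with the jobs run sequentially in-process rather than submitted to a grid, and with `run_swarm` as a hypothetical helper name:

```python
def run_swarm(application, datasets, aggregate):
    """Run `application` once per dataset, then aggregate all results.

    The per-dataset parameters are generated automatically from the
    dataset list instead of being specified by hand for each job.
    """
    results = [application(dataset) for dataset in datasets]
    return aggregate(results)


# Hypothetical example: total traded volume per dataset, then the largest total.
volumes = [[1, 2], [3, 4], [5]]
largest_total = run_swarm(sum, volumes, max)
```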

32
A web portal to EGRID
  • Possible drawback
  • Java technology is used extensively, also on the
    client side: Applets and Java WebStart are used
    for certain operations; users must have a Java
    Virtual Machine installed.
  • Given the ubiquitous nature of Java, this should
    not be a big problem.

33
Acknowledgements
  • StoRM collaboration with INFN-CNAF of grid.IT
    project: Dr. Mirco Mazzuccato, Dr. Antonia
    Ghiselli.
  • P-grade team headed by Prof. Peter Kacsuk of MTA
    SZTAKI, Hungarian Academy of Sciences.
  • EGRID project leaders: Dr. Alvise Nobile of ICTP,
    Dr. Stefano Cozzini of INFM Democritos.
  • EGRID team: Alessio Terpin, Angelo Leto, Antonio
    Messina, Ezio Corso, Riccardo di Meo, Riccardo
    Murri.