1
Evaluating Condor for Enterprise Use: A UBS
Case Study
GENERALLY ACCESSIBLE
Gregg Cooke, IT Technical Council
April 26, 2006
2
Overview
  • Context: Why UBS Uses Grids
  • Tests: What Did We Look At?
  • Results: Strengths & Limitations

3
SECTION 1
  • The Context: Grids in an Investment Bank

4
Grids at UBS
What do we mean by "grid"?
  • Specifically, when we say "grid" we mean a
    computational cluster
  • Condor fits the definition closely
  • Other terminology

5
Grids at UBS
Why do we use grids?
  • Complex, long-running calculations include:
      • Monte Carlo simulations of risk exposure
      • Black-Scholes option valuations on portfolios of
        stock options
      • Valuation of complicated exotic financial
        instruments
  • Speed of computation directly correlates to
    volume of sales
  • Accuracy of risk exposure calculation directly
    correlates to reserve cash
  • Calculations constructed by quantitative analysts
    (quants)
      • Write code that's easy to change, not code that's
        particularly efficient or parallelized
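To make the workload concrete, here is a minimal sketch of a Black-Scholes valuation over a portfolio of stock options, split into independent chunks the way a grid would split it into tasks. The portfolio figures, the chunk size, and the function names are assumptions made up for illustration; nothing here is taken from UBS quant code.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(spot, strike, rate, vol, years):
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (
        vol * math.sqrt(years)
    )
    d2 = d1 - vol * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)

def value_positions(positions):
    """Sum quantity * price over (qty, spot, strike, rate, vol, years) tuples."""
    return sum(qty * black_scholes_call(*terms) for qty, *terms in positions)

# Hypothetical portfolio; every number is invented for this example.
portfolio = [
    (100, 100.0, 100.0, 0.05, 0.20, 1.0),
    (250, 42.0, 40.0, 0.03, 0.35, 0.5),
    (75, 310.0, 330.0, 0.04, 0.25, 2.0),
    (500, 18.0, 20.0, 0.05, 0.40, 0.25),
]

# Each chunk could be handed to a separate engine as one task;
# the partial sums do not depend on each other.
chunks = [portfolio[i:i + 2] for i in range(0, len(portfolio), 2)]
total = sum(value_positions(chunk) for chunk in chunks)
print(f"portfolio value: {total:.2f}")
```

Because no chunk needs another chunk's result, this kind of job is embarrassingly parallel, which is why such code can stay easy to change rather than hand-parallelized.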

6
Current Grid Environment at UBS
How do we build & run our grids?
  • 10 separate production grids totaling 3000
    engines
      • All separate grids: some 60-engine, some
        2000-engine
      • 1 million tasks per day
  • Wide variety of platforms, languages,
    architectures
      • C/C++, C#, Java on Windows or Linux
      • Service-oriented vs. batch-oriented,
        embarrassingly parallel vs. workflow
      • Rarely any greenfield development
  • Dedicated deployment operations teams (GSD)
      • Straddle the development / operations worlds
      • Focused on meeting businesses' SLAs
      • Strong drivers of what grid platform we use

7
Typical UBS Grid Environment
  • Quants
      • write the calculations
      • part of the business
  • GSD
      • makes app meet SLAs
      • faces off with business
  • Dev
      • builds & tests the application
      • uses quant code, partners with GSD

8
SECTION 2
  • The Tests: Function, not Performance

9
How to Test Condor?
Feasibility Study: is Condor suitable for use
within our enterprise?
  • No performance tests; instead:
      • Determine the functional limits of Condor
      • Determine how Condor integrates with existing
        enterprise systems
      • Port one or more projects to use Condor and
        measure:
          • Porting effort
          • Opportunities for new functionality (and cost
            of lost functionality)
          • Operational impact

10
The Tests
We tested the following aspects of Condor:
  • Scheduling capabilities
      • Various combinations of Requirements, Rank,
        Start, Suspend, etc. rules
  • Administrative capabilities
      • Features of command line tools, common admin
        practices, ...
  • Interaction model
      • Integrating Condor with an app: APIs, SOAP
        interface, command line interface
  • Robustness and resilience
      • Failover options, long-term stability, task
        retry, realtime reconfiguration, etc.
  • Usability
      • Impact to the user when a Condor engine is
        installed on their desktop
  • And... scheduling latency
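As a rough illustration of what Requirements and Rank rules do, the toy sketch below filters machines with a requirements test and then picks the highest-ranked survivor. It is not Condor's matchmaker: ClassAd expressions are stood in for by plain Python callables, and the machine attributes and names are invented for the example.

```python
from typing import Callable, Optional

Machine = dict  # attribute name -> value, a stand-in for a machine ad

def pick_machine(
    machines: list,
    requirements: Callable[[Machine], bool],
    rank: Callable[[Machine], float],
) -> Optional[Machine]:
    """Keep machines that satisfy `requirements`, then prefer the highest `rank`."""
    candidates = [m for m in machines if requirements(m)]
    if not candidates:
        return None  # nothing matches, so the job would stay idle
    return max(candidates, key=rank)

# Hypothetical pool; names and numbers are made up.
machines = [
    {"name": "desk-01", "os": "WINDOWS", "memory_mb": 2048, "mips": 900},
    {"name": "rack-07", "os": "LINUX", "memory_mb": 8192, "mips": 1500},
    {"name": "rack-09", "os": "LINUX", "memory_mb": 4096, "mips": 2100},
]

chosen = pick_machine(
    machines,
    requirements=lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 4096,
    rank=lambda m: m["mips"],  # prefer the fastest eligible machine
)
print(chosen["name"] if chosen else "no match")
```

The point of the tests was to try many combinations of such rules, including the machine-side Start and Suspend policies that this sketch leaves out.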

11
Scheduling Latency
Definition: the interval between the initial
request and when the first engine starts working
on your task
  • Applications may be designed with a given
    scheduling latency in mind
      • We can control how long our code takes; we
        cannot control the scheduling latency
      • Redevelopment is often a major undertaking
  • We were expecting a very short (100msec)
    deterministic scheduling latency
      • Condor's is much longer (1min or more) and
        nondeterministic
      • Condor does have an alternative (COD) but it
        changes the expected behavior of the grid
  • Impact on testing: new set of questions!
      • Does Condor's scheduling latency present a
        problem for our applications?
      • Do we have applications that were not developed
        with assumptions about the scheduling latency?
      • Are there other aspects of Condor's performance
        that offset the scheduling latency concerns?
      • Can we measure the performance of our
        applications on Condor without regard to
        scheduling latency?
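The definition above can be turned into a small measurement helper. This is a sketch under stated assumptions: timestamps are plain seconds, the sample values are invented, and the population standard deviation is used as a crude indicator of how nondeterministic the latency is.

```python
from statistics import mean, pstdev

def scheduling_latency(submit_time: float, first_start_time: float) -> float:
    """Seconds between the initial request and the first engine starting work."""
    if first_start_time < submit_time:
        raise ValueError("first engine cannot start before the request is made")
    return first_start_time - submit_time

def summarize(samples):
    """Summarize (submit_time, first_start_time) pairs, all in seconds."""
    latencies = [scheduling_latency(s, f) for s, f in samples]
    return {
        "mean_s": mean(latencies),
        "spread_s": pstdev(latencies),  # larger spread = less predictable
        "worst_s": max(latencies),
    }

# Hypothetical observations, not measurements from the study.
samples = [(0.0, 60.0), (10.0, 100.0), (20.0, 140.0)]
summary = summarize(samples)
print(summary)
```

Reporting the spread alongside the mean matters here because the concern is not only that the latency is long but that it cannot be predicted.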

12
SECTION 3
  • The Result: Condor as a Functional Benchmark

13
What We Love About Condor
Too many to list; here are the top four:
  • Incredibly powerful expression-based scheduling
    policy
  • No-impact desktop cycle scavenging
  • Easy reconfiguration
  • Anything that can be run from a command line can
    be a task

But, Condor has limits too
14
What Condor Needs to Better Support UBS
We found issues in four key areas
  • Administrative interface
  • Code deployment
  • Scheduling latency
  • Job submission APIs
  • Important: remember that these conclusions are
    only relevant to UBS!
      • This is only what we found, based on our
        context; your mileage may vary

15
Administration Interface
Our conclusions
  • What we expected
      • A nice GUI admin console similar to others our
        operations personnel are familiar with
  • What we found
      • A rich command-line administration interface,
        but no GUI
  • Our conclusion
      • At UBS, Condor will not be used by operations
        teams that cannot accept a command-line admin
        interface
      • These are usually Windows teams; Unix teams
        don't seem to have as much bias
  • What this means for the Condor community
      • A GUI admin console will make Condor more
        acceptable to enterprise users
      • Web-based is best
      • Doesn't have to be fancy; just needs to be
        point & click (and stable, of course)
      • Work being done at Indiana University on a
        Condor portal is a start

16
Code Deployment
Our conclusions
  • What we expected
      • Automatic task code deployment done once and
        refreshed automatically when the grid system
        senses a change in a central repository
  • What we found
      • Automatic task code deployment every time a job
        is submitted
  • Our conclusion
      • At UBS, Condor causes problems for applications
        with huge (15Mb) task codes and short tasks
        because the network transmission time impacts
        job completion time
  • What this means for the Condor community
      • To make Condor more acceptable to enterprise
        users, task code should be cached at the engines
        and refreshed only when it changes
      • Fortunately, this is being worked on by the
        Condor Project!
      • We've watched commercial grid vendors implement
        this; it is not an easy feature!
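The "cache at the engine, refresh only on change" behavior we expected can be sketched as a digest-keyed cache. This is a toy model, not how Condor or any vendor implements it: the class name, the `fetch` callable standing in for the network transfer, and the in-memory repository are all assumptions for illustration.

```python
import hashlib

def digest_of(payload: bytes) -> str:
    """Content digest used to detect that task code has changed."""
    return hashlib.sha256(payload).hexdigest()

class TaskCodeCache:
    """Engine-side cache that re-fetches task code only when its digest changes."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: name -> bytes (the expensive transfer)
        self._store = {}      # name -> (digest, payload)
        self.transfers = 0    # how many times the network was actually used

    def get(self, name: str, current_digest: str) -> bytes:
        cached = self._store.get(name)
        if cached is not None and cached[0] == current_digest:
            return cached[1]  # cache hit: no transfer needed
        payload = self._fetch(name)
        self.transfers += 1
        self._store[name] = (digest_of(payload), payload)
        return payload

# Hypothetical central repository holding one task binary.
repository = {"pricer": b"task code v1"}
cache = TaskCodeCache(fetch=lambda name: repository[name])

cache.get("pricer", digest_of(repository["pricer"]))  # first job: transfer
cache.get("pricer", digest_of(repository["pricer"]))  # second job: cache hit
repository["pricer"] = b"task code v2"                # code changes centrally
cache.get("pricer", digest_of(repository["pricer"]))  # refresh: transfer
print(cache.transfers)
```

Even this simple version hints at why the feature is hard in practice: something has to publish the current digest to every engine, and stale or partially transferred copies have to be handled.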

17
Scheduling Latency
Our conclusions
  • What we expected
      • Negligibly small latency that's deterministic
        enough for us to predict job completion times
  • What we found
      • Latencies that depend on configuration settings
        and complexity of classads
  • Our conclusion
      • At UBS, Condor cannot be used for tasks that
        require less than 3.5 min to complete or where
        the total job completion time must be easily
        predictable
  • However,
      • Even though our highest-value applications
        require short deterministic scheduling
        latencies, there are many more lower-value
        applications that aren't sensitive to
        scheduling latency
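A back-of-the-envelope way to see why short tasks suffer is to compute the share of completion time spent waiting to be scheduled. The single-task model below is an assumption for illustration, as are the runtimes; the 60-second latency simply echoes the "1min or more" figure from the earlier slide and is not how the 3.5 min threshold was derived.

```python
def waiting_share(latency_s: float, runtime_s: float) -> float:
    """Fraction of a single task's completion time spent on scheduling latency."""
    return latency_s / (latency_s + runtime_s)

# Illustrative runtimes in seconds, with an assumed 60 s scheduling latency.
for runtime_s in (1.0, 30.0, 210.0, 3600.0):
    share = waiting_share(60.0, runtime_s)
    print(f"runtime {runtime_s:>6.0f} s -> {share:.0%} of completion time waiting")
```

For long-running batch work the waiting share becomes negligible, which matches the observation that many lower-value applications are insensitive to scheduling latency.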

18
Application Programmer's Interface
Our conclusions
  • What we expected
      • Nice, well-designed APIs for all our favorite
        languages
  • What we found
      • A command line interface and a maturing SOAP
        interface
  • Our conclusion
      • Once the SOAP interface matures, UBS programmers
        will be more amenable to using Condor
  • What this means for the Condor community
      • Full speed ahead on the SOAP interface!
      • Make sure all of the functionality available in
        the command-line interface is available in the
        SOAP interface
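Integrating through the command line interface typically means generating a submit description file and shelling out to `condor_submit`. The sketch below shows one way an application might wrap that; the helper names, file names, and the choice of the vanilla universe are assumptions for illustration, and the `submit` function only works on a machine where Condor is installed.

```python
import subprocess
from pathlib import Path

def build_submit_description(executable: str, arguments: str, count: int = 1) -> str:
    """Return the text of a minimal submit description file for `count` tasks."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = task.$(Process).out",  # one stdout file per queued task
        "error = task.$(Process).err",
        "log = job.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"

def submit(description: str, path: str = "job.sub") -> str:
    """Write the description to `path` and hand it to condor_submit."""
    Path(path).write_text(description)
    result = subprocess.run(
        ["condor_submit", path], capture_output=True, text=True, check=True
    )
    return result.stdout

# Building the description needs no Condor installation.
description = build_submit_description("/bin/echo", "hello grid", count=3)
print(description)
```

Wrapping a command line tool this way works, but the application has to parse text output and manage files itself, which is the gap a complete programmatic interface would close.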

19
Condor at UBS
We will continue to use Condor for:
  • Teaching new teams how to grid their applications
      • Condor is an excellent exploration and learning
        environment
      • Has already accelerated at least one team
  • A functional benchmark for all things grid
      • Condor is a crucible where new and innovative
        grid ideas get tried and refined
      • Many of these features will prove valuable for
        commercial vendors to embrace
          • Check-pointing & task migration
          • Expression-based scheduling policy
          • User-centric cycle scavenging
  • Non-critical batch-oriented applications with
    standalone or SOAP-enabled service code, with
    operations teams that don't mind a command line
    administration interface
      • There are lots and lots of non-critical
        batch-oriented apps with standalone services
      • There are not a lot of operations teams that
        will tolerate a command line interface

20
Contact Information
Gregg Cooke
UBS Investment Bank, IT Technical Architecture,
Chicago Applied Architecture Group
312-525-5134
gregg.cooke_at_ubs.com
UBS AG Chicago Branch
One North Wacker Drive, 31st Floor
Chicago IL 60606
Tel. 1-312-525 5000
www.ubs.com