Title: Evaluating Condor for Enterprise Use: A UBS Case Study
1. Evaluating Condor for Enterprise Use: A UBS Case Study
GENERALLY ACCESSIBLE
Gregg Cooke, IT Technical Council
April 26, 2006
2. Overview
- Context: Why UBS Uses Grids
- Tests: What Did We Look At?
- Results: Strengths and Limitations
3. SECTION 1
- The Context: Grids in an Investment Bank
4. Grids at UBS
What do we mean by "grid"?
- Specifically, when we say "grid" we mean a computational cluster
  - Condor fits the definition closely
- Other terminology:
  Condor term        UBS term
  Pool               Grid
  Job cluster        Job
  Job                Task
  Virtual machine    Engine or Node
  Central Manager    Broker or Manager
5. Grids at UBS
Why do we use grids?
- Complex, long-running calculations include:
  - Monte Carlo simulations of risk exposure
  - Black-Scholes option valuations on portfolios of stock options
  - Valuation of complicated exotic financial instruments
- Speed of computation directly correlates to volume of sales
- Accuracy of risk exposure calculation directly correlates to reserve cash
- Calculations constructed by quantitative analysts (quants)
  - Write code that's easy to change, not code that's particularly efficient or parallelized
6. Current Grid Environment at UBS
How do we build and run our grids?
- 10 separate production grids totaling 3000 engines
  - All separate grids: some 60-engine, some 2000-engine
  - 1 million tasks per day
- Wide variety of platforms, languages, architectures
  - C/C++, C#, Java on Windows or Linux
  - Service-oriented vs. batch-oriented, embarrassingly parallel vs. workflow
  - Rarely any greenfield development
- Dedicated deployment and operations teams (GSD)
  - Straddle the development / operations worlds
  - Focused on meeting businesses' SLAs
  - Strong drivers of what grid platform we use
7. Typical UBS Grid Environment
- Quants
  - write the calculations
  - part of the business
- GSD
  - makes app meet SLAs
  - faces off with business
- Dev
  - builds and tests the application
  - uses quant code, partners with GSD
8. SECTION 2
- The Tests: Function, not Performance
9. How to Test Condor?
Feasibility study: is Condor suitable for use within our enterprise?
- No performance tests; instead:
  - Determine the functional limits of Condor
  - Determine how Condor integrates with existing enterprise systems
  - Port one or more projects to use Condor and measure:
    - Porting effort
    - Opportunities for new functionality (and cost of lost functionality)
    - Operational impact
10. The Tests
We tested the following aspects of Condor:
- Scheduling capabilities
  - Various combinations of Requirements, Rank, Start, Suspend, etc. rules
- Administrative capabilities
  - Features of command line tools, common admin practices
- Interaction model
  - Integrating Condor with an app: APIs, SOAP interface, command line interface
- Robustness and resilience
  - Failover options, long-term stability, task retry, realtime reconfiguration, etc.
- Usability
  - Impact to the user when a Condor engine is installed on their desktop
- And... scheduling latency
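The scheduling tests above combined job-side expressions such as Requirements and Rank. As a rough illustration only (not the configuration UBS tested), the sketch below renders a Condor submit description file from Python. The executable name, argument string, and the specific expressions are hypothetical; Start and Suspend are machine-side policy expressions set in the pool configuration, so they are not shown here.

```python
def render_submit_description(executable, arguments, requirements, rank, count=1):
    """Return the text of a Condor submit description file.

    All values passed in are illustrative assumptions, not settings
    taken from the case study.
    """
    lines = [
        "universe     = vanilla",
        f"executable   = {executable}",
        f"arguments    = {arguments}",
        # Requirements: which machines are allowed to run the task.
        f"requirements = {requirements}",
        # Rank: which of the allowed machines the job prefers.
        f"rank         = {rank}",
        "output       = task.$(Process).out",
        "error        = task.$(Process).err",
        "log          = job.log",
        # One job cluster ("job" in UBS terms) of `count` jobs ("tasks").
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


text = render_submit_description(
    executable="price_portfolio",          # hypothetical task binary
    arguments="--slice $(Process)",        # hypothetical argument
    requirements='(OpSys == "LINUX") && (Memory >= 512)',
    rank="KFlops",
    count=100,
)
print(text)
```

The resulting text would be saved to a file and handed to condor_submit; the point is only that the scheduling policy is expressed as plain expressions over machine attributes.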
11. Scheduling Latency
Definition: the interval between the initial request and when the first engine starts working on your task
- Applications may be designed with a given scheduling latency in mind
  - We can control how long our code takes; we cannot control the scheduling latency
  - Redevelopment is often a major undertaking
- We were expecting a very short (100msec) deterministic scheduling latency
  - Condor's is much longer (1min or more) and nondeterministic
  - Condor does have an alternative (COD) but it changes the expected behavior of the grid
- Impact on testing: new set of questions!
  - Does Condor's scheduling latency present a problem for our applications?
  - Do we have applications that were not developed with assumptions about the scheduling latency?
  - Are there other aspects of Condor's performance that offset the scheduling latency concerns?
  - Can we measure the performance of our applications on Condor without regard to scheduling latency?
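The definition above can be turned into a measurement directly: record when a job was submitted and when its first task started, then look at both the average wait and how much it varies. The sketch below assumes those two timestamps are available from somewhere (for example, a job log); the sample values are made up purely for illustration and are not measurements from this study.

```python
from datetime import datetime
from statistics import mean, pstdev


def scheduling_latency_seconds(submitted_at, first_start_at):
    """Seconds between the initial request and the first engine starting work."""
    return (first_start_at - submitted_at).total_seconds()


def summarize(latencies):
    """Mean shows how long the wait is; spread shows how predictable it is."""
    return {
        "mean_s": mean(latencies),
        "spread_s": pstdev(latencies),
        "worst_s": max(latencies),
    }


# Made-up (submit time, first task start time) pairs, for illustration only.
samples = [
    (datetime(2006, 4, 26, 9, 0, 0), datetime(2006, 4, 26, 9, 1, 10)),
    (datetime(2006, 4, 26, 9, 5, 0), datetime(2006, 4, 26, 9, 5, 45)),
    (datetime(2006, 4, 26, 9, 10, 0), datetime(2006, 4, 26, 9, 12, 5)),
]
latencies = [scheduling_latency_seconds(s, t) for s, t in samples]
print(summarize(latencies))
```

A large spread relative to the mean is what "nondeterministic" means in practice: the job completion time cannot be predicted from the task run time alone.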
12. SECTION 3
- The Result: Condor as a Functional Benchmark
13. What We Love About Condor
Too many to list; here are the top four:
- Incredibly powerful expression-based scheduling policy
- No-impact desktop cycle scavenging
- Easy reconfiguration
- Anything that can be run from a command line can be a task
But Condor has limits too
14. What Condor Needs to Better Support UBS
We found issues in four key areas:
- Administrative interface
- Code deployment
- Scheduling latency
- Job submission APIs
- Important: remember that these conclusions are only relevant to UBS!
  - This is only what we found, based on our context; your mileage may vary
15. Administration Interface
Our conclusions
- What we expected
  - A nice GUI admin console similar to others our operations personnel are familiar with
- What we found
  - A rich command-line administration interface, but no GUI
- Our conclusion
  - At UBS, Condor will not be used by operations teams that cannot accept a command-line admin interface
  - These are usually Windows teams; Unix teams don't seem to have as much bias
- What this means for the Condor community
  - A GUI admin console will make Condor more acceptable to enterprise users
  - Web-based is best
  - Doesn't have to be fancy; just needs to be point-and-click (and stable, of course)
  - Work being done at Indiana University on a Condor portal is a start
16. Code Deployment
Our conclusions
- What we expected
  - Automatic task code deployment done once and refreshed automatically when the grid system senses a change in a central repository
- What we found
  - Automatic task code deployment every time a job is submitted
- Our conclusion
  - At UBS, Condor causes problems for applications with huge (15Mb) task codes and short tasks, because the network transmission time impacts job completion time
- What this means for the Condor community
  - To make Condor more acceptable to enterprise users, task code should be cached at the engines and only refreshed when it changes
  - Fortunately, this is being worked on by the Condor Project!
  - We've watched commercial grid vendors implement this; it is not an easy feature!
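The caching behavior asked for above can be sketched as a small engine-side cache that transfers task code only when its content digest changes. This is a toy illustration under stated assumptions, not how Condor or any vendor implements it: the `TaskCodeCache` class, the `fetch` callable, and the in-memory `repository` are all hypothetical, and real systems also have to handle eviction, partial transfers, and concurrent tasks.

```python
import hashlib


class TaskCodeCache:
    """Hypothetical engine-side cache: re-fetch task code only when it changes."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable(name) -> bytes, stands in for the network copy
        self._store = {}      # name -> (digest, payload)
        self.transfers = 0    # how many times the payload crossed the "network"

    def get(self, name, digest):
        """Return task code for `name`, transferring it only if `digest` is new."""
        cached = self._store.get(name)
        if cached is not None and cached[0] == digest:
            return cached[1]  # unchanged since last job: no transfer needed
        payload = self._fetch(name)
        if hashlib.sha256(payload).hexdigest() != digest:
            raise ValueError(f"digest mismatch for task code {name!r}")
        self._store[name] = (digest, payload)
        self.transfers += 1
        return payload


# Illustrative stand-in for a central code repository.
repository = {"pricer": b"version 1 of the task code"}
cache = TaskCodeCache(fetch=lambda name: repository[name])

digest_v1 = hashlib.sha256(repository["pricer"]).hexdigest()
cache.get("pricer", digest_v1)  # first job: code is transferred
cache.get("pricer", digest_v1)  # second job: served from the cache
print(cache.transfers)
```

With a scheme like this, only the short digest travels with each job submission, so the cost of moving a large task code is paid once per change rather than once per job.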
17. Scheduling Latency
Our conclusions
- What we expected
  - Negligibly small latency that's deterministic enough for us to predict job completion times
- What we found
  - Latencies that depend on configuration settings and complexity of ClassAds
- Our conclusion
  - At UBS, Condor cannot be used for tasks that require less than 3.5 min to complete or where the total job completion time must be easily predictable
- However,
  - Even though our highest-value applications require short deterministic scheduling latencies, there are many more lower-value applications that aren't sensitive to scheduling latency
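One way to see why short tasks suffer and long-running batch work does not is to compute what share of elapsed time is spent waiting to be scheduled. The sketch below is a simplified model, assuming a single task and a fixed latency of 60 seconds (an assumed value in line with the "1min or more" observed earlier); it is not the calculation behind the 3.5 min threshold, which the slides do not spell out.

```python
def latency_overhead(task_seconds, latency_seconds):
    """Fraction of elapsed time spent waiting to be scheduled (single task)."""
    if task_seconds < 0 or latency_seconds < 0:
        raise ValueError("durations must be non-negative")
    total = task_seconds + latency_seconds
    return latency_seconds / total if total else 0.0


ASSUMED_LATENCY_S = 60.0  # assumption for illustration, not a measured value

for task_s in (10, 210, 3600):  # a short task, a 3.5 min task, an hour-long task
    share = latency_overhead(task_s, ASSUMED_LATENCY_S)
    print(f"{task_s:>5} s task: {share:.0%} of elapsed time is scheduling wait")
```

The share shrinks quickly as tasks get longer, which matches the observation that many lower-value batch applications are not sensitive to scheduling latency.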
18. Application Programmer's Interface
Our conclusions
- What we expected
  - Nice, well-designed APIs for all our favorite languages
- What we found
  - A command line interface and a maturing SOAP interface
- Our conclusion
  - Once the SOAP interface matures, UBS programmers will be more amenable to using Condor
- What this means for the Condor community
  - Full speed ahead on the SOAP interface!
  - Make sure all of the functionality available in the command-line interface is available in the SOAP interface
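Until a richer API is available, an application can integrate through the command line interface by wrapping it. The sketch below is one hedged way to do that from Python: it runs condor_submit on a submit description file and extracts the cluster number from the text the tool prints. The `submit` and `parse_cluster_id` helpers are hypothetical names, and the parsing assumes the usual "N job(s) submitted to cluster M." message, which may differ between Condor versions.

```python
import re
import subprocess

# Assumed format of condor_submit's confirmation line.
_CLUSTER_RE = re.compile(r"submitted to cluster (\d+)")


def parse_cluster_id(condor_submit_output):
    """Extract the job cluster number from condor_submit's printed output."""
    match = _CLUSTER_RE.search(condor_submit_output)
    if match is None:
        raise ValueError("no cluster id found in condor_submit output")
    return int(match.group(1))


def submit(submit_file):
    """Run condor_submit on a submit description file; return the cluster id.

    Requires condor_submit on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        ["condor_submit", submit_file],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cluster_id(result.stdout)


# Parsing can be exercised without a Condor installation:
sample_output = "Submitting job(s).\n1 job(s) submitted to cluster 42.\n"
print(parse_cluster_id(sample_output))
```

Scraping tool output like this is brittle, which is part of why a programmatic interface with the same coverage as the command line matters to application teams.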
19. Condor at UBS
We will continue to use Condor for:
- Teaching new teams how to grid their applications
  - Condor is an excellent exploration and learning environment
  - Has already accelerated at least one team
- A functional benchmark for all things grid
  - Condor is a crucible where new and innovative grid ideas get tried and refined
  - Many of these features will prove valuable for commercial vendors to embrace
    - Check-pointing and task migration
    - Expression-based scheduling policy
    - User-centric cycle scavenging
- Non-critical batch-oriented applications with standalone or SOAP-enabled service code, with operations teams that don't mind a command line administration interface
  - There are lots and lots of non-critical batch-oriented apps with standalone services
  - There are not a lot of operations teams that will tolerate a command line interface
20. Contact Information
Gregg Cooke
UBS Investment Bank, IT Technical Architecture, Chicago Applied Architecture Group
312-525-5134
gregg.cooke_at_ubs.com
UBS AG Chicago Branch
One North Wacker Drive, 31st Floor
Chicago IL 60606
Tel. 1-312-525 5000
www.ubs.com