Title: Evaluating Condor for Enterprise Use: A UBS Case Study
1. Evaluating Condor for Enterprise Use: A UBS Case Study
GENERALLY ACCESSIBLE
Gregg Cooke, IT Technical Council
April 26, 2006
2. Overview
- Context: Why UBS Uses Grids
- Tests: What Did We Look At?
- Results: Strengths and Limitations
3. SECTION 1
- The Context: Grids in an Investment Bank
4. Grids at UBS
What do we mean by "grid"?
- Specifically, when we say "grid" we mean a computational cluster
- Condor fits the definition closely
- Other terminology
5. Grids at UBS
Why do we use grids?
- Complex, long-running calculations include
  - Monte Carlo simulations of risk exposure
  - Black-Scholes option valuations on portfolios of stock options
  - Valuation of complicated exotic financial instruments
- Speed of computation directly correlates to volume of sales
- Accuracy of risk exposure calculation directly correlates to reserve cash
- Calculations constructed by quantitative analysts (quants)
  - Write code that's easy to change, not code that's particularly efficient or parallelized
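As an illustration of the kind of calculation listed above, here is a minimal sketch of a Black-Scholes valuation of a European call, applied position by position to a small portfolio. The function names, the tuple layout of a position, and the sample figures are assumptions made for this example; they are not taken from any UBS code. The point is only that each position is valued independently, which is what makes such work easy to spread across grid engines.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, years):
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (
        vol * math.sqrt(years)
    )
    d2 = d1 - vol * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def value_portfolio(positions):
    """Value each (spot, strike, rate, vol, years) position on its own.

    No position depends on another, so on a grid every call below could
    be submitted as a separate task.
    """
    return [black_scholes_call(*p) for p in positions]


# Hypothetical sample portfolio: two calls on the same stock.
portfolio = [
    (100.0, 100.0, 0.05, 0.2, 1.0),
    (100.0, 110.0, 0.05, 0.2, 1.0),
]
prices = value_portfolio(portfolio)
```

A Monte Carlo risk simulation has the same shape: many independent evaluations whose results are combined at the end.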
6. Current Grid Environment at UBS
How do we build and run our grids?
- 10 separate production grids totaling 3000 engines
  - All separate grids: some 60-engine, some 2000-engine
- 1 million tasks per day
- Wide variety of platforms, languages, architectures
  - C/C++, C#, Java on Windows or Linux
  - Service-oriented vs. batch-oriented, embarrassingly parallel vs. workflow
- Rarely any greenfield development
- Dedicated deployment and operations teams (GSD)
  - Straddle the development / operations worlds
  - Focused on meeting businesses' SLAs
  - Strong drivers of what grid platform we use
7. Typical UBS Grid Environment
- Quants
  - write the calculations
  - part of the business
- GSD
  - makes app meet SLAs
  - faces off with business
- Dev
  - builds and tests the application
  - uses quant code, partners with GSD
8. SECTION 2
- The Tests: Function, not Performance
9. How to Test Condor?
Feasibility study: is Condor suitable for use within our enterprise?
- No performance tests; instead
  - Determine the functional limits of Condor
  - Determine how Condor integrates with existing enterprise systems
  - Port one or more projects to use Condor and measure
    - Porting effort
    - Opportunities for new functionality (and cost of lost functionality)
    - Operational impact
10. The Tests
We tested the following aspects of Condor
- Scheduling capabilities
  - Various combinations of Requirements, Rank, Start, Suspend, etc. rules
- Administrative capabilities
  - Features of command line tools, common admin practices
- Interaction model
  - Integrating Condor with an app: APIs, SOAP interface, command line interface
- Robustness and resilience
  - Failover options, long-term stability, task retry, realtime reconfiguration, etc.
- Usability
  - Impact to the user when a Condor engine is installed on their desktop
- And... scheduling latency
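To make the scheduling rules mentioned above more concrete, here is a toy model of two-sided matching: the job states a requirement and a rank over machines, and each machine states a start rule over jobs. This is not Condor's ClassAd language or any Condor API; the dictionary keys, machine names, and thresholds are all hypothetical and exist only to illustrate the idea of expression-based policy.

```python
def pick_machine(job, machines):
    """Return the eligible machine the job ranks highest, or None.

    A machine is eligible when the job's requirement accepts the machine
    AND the machine's own start rule accepts the job.
    """
    eligible = [
        m for m in machines
        if job["requirements"](m) and m["start"](m, job)
    ]
    return max(eligible, key=job["rank"], default=None)


def desktop_start(machine, job):
    """Hypothetical desktop policy: only start work after 15 idle minutes."""
    return machine["keyboard_idle_s"] > 900


def dedicated_start(machine, job):
    """Hypothetical dedicated-engine policy: always willing to start."""
    return True


machines = [
    {"name": "desk-01", "opsys": "LINUX", "memory_mb": 2048,
     "keyboard_idle_s": 30, "start": desktop_start},
    {"name": "desk-02", "opsys": "LINUX", "memory_mb": 1024,
     "keyboard_idle_s": 3600, "start": desktop_start},
    {"name": "farm-01", "opsys": "LINUX", "memory_mb": 4096,
     "keyboard_idle_s": 0, "start": dedicated_start},
    {"name": "farm-02", "opsys": "WINDOWS", "memory_mb": 8192,
     "keyboard_idle_s": 0, "start": dedicated_start},
]

job = {
    # Requirement: a Linux machine with at least 1 GB of memory.
    "requirements": lambda m: m["opsys"] == "LINUX" and m["memory_mb"] >= 1024,
    # Rank: among eligible machines, prefer the one with the most memory.
    "rank": lambda m: m["memory_mb"],
}

chosen = pick_machine(job, machines)
```

In this sketch desk-01 is excluded by its own start rule (the user is active) and farm-02 by the job's requirement, so the rank decides between the remaining two machines.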
11. Scheduling Latency
Definition: the interval between the initial request and when the first engine starts working on your task
- Applications may be designed with a given scheduling latency in mind
  - We can control how long our code takes; we cannot control the scheduling latency
  - Redevelopment is often a major undertaking
- We were expecting a very short (100 msec) deterministic scheduling latency
  - Condor's is much longer (1 min or more) and nondeterministic
  - Condor does have an alternative (COD) but it changes the expected behavior of the grid
- Impact on testing: new set of questions!
  - Does Condor's scheduling latency present a problem for our applications?
  - Do we have applications that were not developed with assumptions about the scheduling latency?
  - Are there other aspects of Condor's performance that offset the scheduling latency concerns?
  - Can we measure the performance of our applications on Condor without regard to scheduling latency?
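One way to approach the last question is to record three timestamps per task and report scheduling latency separately from run time. The sketch below follows the definition on this slide; the class name, field names, and sample numbers are assumptions for illustration, and how the timestamps would be collected from a real grid is left open.

```python
from dataclasses import dataclass


@dataclass
class TaskTiming:
    """Timestamps for one task, in seconds from an arbitrary origin."""
    submitted: float  # when the request was made
    started: float    # when the first engine began working on it
    finished: float   # when the result was complete

    @property
    def scheduling_latency(self):
        """Interval between the initial request and the start of work."""
        return self.started - self.submitted

    @property
    def run_time(self):
        """Time spent actually computing, independent of scheduling."""
        return self.finished - self.started

    @property
    def useful_fraction(self):
        """Share of the total turnaround that was spent computing."""
        return self.run_time / (self.finished - self.submitted)


# Hypothetical 10-second task under the two latencies discussed above.
fast = TaskTiming(submitted=0.0, started=0.1, finished=10.1)
slow = TaskTiming(submitted=0.0, started=60.0, finished=70.0)
```

Comparing `run_time` across platforms measures the application itself, while `useful_fraction` shows how quickly short tasks become dominated by scheduling latency.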
12. SECTION 3
- The Result: Condor as a Functional Benchmark
13. What We Love About Condor
Too many to list; here are the top four
- Incredibly powerful expression-based scheduling policy
- No-impact desktop cycle scavenging
- Easy reconfiguration
- Anything that can be run from a command line can be a task
But Condor has limits too
14. What Condor Needs to Better Support UBS
We found issues in four key areas
- Administrative interface
- Code deployment
- Scheduling latency
- Job submission APIs
- Important: remember that these conclusions are only relevant to UBS!
  - This is only what we found, based on our context; your mileage may vary
15. Administration Interface
Our conclusions
- What we expected
  - A nice GUI admin console similar to others our operations personnel are familiar with
- What we found
  - A rich command-line administration interface, but no GUI
- Our conclusion
  - At UBS, Condor will not be used by operations teams that cannot accept a command-line admin interface
  - These are usually Windows teams; Unix teams don't seem to have as much bias
- What this means for the Condor community
  - A GUI admin console will make Condor more acceptable to enterprise users
  - Web-based is best
  - Doesn't have to be fancy; just needs to be point-and-click (and stable, of course)
  - Work being done at Indiana University on a Condor portal is a start
16. Code Deployment
Our conclusions
- What we expected
  - Automatic task code deployment done once and refreshed automatically when the grid system senses a change in a central repository
- What we found
  - Automatic task code deployment every time a job is submitted
- Our conclusion
  - At UBS, Condor causes problems with applications with huge (15Mb) task codes and short tasks because the network transmission time impacts job completion time
- What this means for the Condor community
  - To make Condor more acceptable to enterprise users, task code should be cached at the engines and only refreshed when it changes
  - Fortunately, this is being worked on by the Condor Project!
  - We've watched commercial grid vendors implement this; it is not an easy feature!
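The caching behavior asked for above can be sketched as an engine-side cache keyed by a content digest, so that task code crosses the network only when it has actually changed. This is a simplified illustration, not how Condor or any commercial grid product implements it; the class name, the `fetch` callable, and the in-memory store are all hypothetical.

```python
import hashlib


class TaskCodeCache:
    """Engine-side cache of task code, keyed by SHA-256 digest."""

    def __init__(self, fetch):
        # fetch(digest) -> bytes is a stand-in for the network transfer
        # from a central repository (hypothetical interface).
        self._fetch = fetch
        self._store = {}
        self.transfers = 0

    def get(self, digest):
        """Return the task code for a digest, transferring it only if new."""
        if digest not in self._store:
            payload = self._fetch(digest)
            # Verify the payload so a stale or corrupt copy is never cached.
            if hashlib.sha256(payload).hexdigest() != digest:
                raise ValueError("task code does not match advertised digest")
            self._store[digest] = payload
            self.transfers += 1
        return self._store[digest]


# Usage: a dict stands in for the central repository.
repository = {}


def publish(code):
    """Store a version of the task code and return its digest."""
    digest = hashlib.sha256(code).hexdigest()
    repository[digest] = code
    return digest


cache = TaskCodeCache(fetch=repository.__getitem__)
v1 = publish(b"task code, version 1")
cache.get(v1)  # first job: one transfer
cache.get(v1)  # later jobs with unchanged code: served from the cache
v2 = publish(b"task code, version 2")
cache.get(v2)  # code changed: one more transfer
```

A job submission then only needs to carry the digest of its task code. The hard parts that the sketch leaves out, such as eviction, disk persistence, and concurrent tasks on one engine, are presumably part of why this is not an easy feature.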
17. Scheduling Latency
Our conclusions
- What we expected
  - Negligibly small latency that's deterministic enough for us to predict job completion times
- What we found
  - Latencies that depend on configuration settings and complexity of ClassAds
- Our conclusion
  - At UBS, Condor cannot be used for tasks that require less than 3.5 min to complete or where the total job completion time must be easily predictable
- However,
  - Even though our highest-value applications require short deterministic scheduling latencies, there are many more lower-value applications that aren't sensitive to scheduling latency
18. Application Programmer's Interface
Our conclusions
- What we expected
  - Nice, well-designed APIs for all our favorite languages
- What we found
  - A command line interface and a maturing SOAP interface
- Our conclusion
  - Once the SOAP interface matures, UBS programmers will be more amenable to using Condor
- What this means for the Condor community
  - Full speed ahead on the SOAP interface!
  - Make sure all of the functionality available in the command-line interface is available in the SOAP interface
19. Condor at UBS
We will continue to use Condor for
- Teaching new teams how to grid their applications
  - Condor is an excellent exploration and learning environment
  - Has already accelerated at least one team
- A functional benchmark for all things grid
  - Condor is a crucible where new and innovative grid ideas get tried and refined
  - Many of these features will prove valuable for commercial vendors to embrace
    - Check-pointing and task migration
    - Expression-based scheduling policy
    - User-centric cycle scavenging
- Non-critical batch-oriented applications with standalone or SOAP-enabled service code, with operations teams that don't mind a command line administration interface
  - There are lots and lots of non-critical batch-oriented apps with standalone services
  - There are not a lot of operations teams that will tolerate a command line interface
20. Contact Information
Gregg Cooke
UBS Investment Bank, IT Technical Architecture, Chicago Applied Architecture Group
312-525-5134
gregg.cooke_at_ubs.com
UBS AG Chicago Branch
One North Wacker Drive, 31st Floor
Chicago IL 60606
Tel. 1-312-525 5000
www.ubs.com