Building a secure Condor pool in an open academic environment

1
Building a secure Condor pool in an open
academic environment
  • Bruce Beckles
  • University of Cambridge Computing Service

2
Condor pool characteristics
  • Large number (1000) of similar/identical
    workstations
  • Workstations centrally managed
  • Primary purpose of workstations not for running
    Condor jobs
  • Workstations are public access machines, i.e.
    available to all members of institution

3
Fundamental requirements
  • Condor service in this environment must be:
      • Stable
          • Must not make machines any less stable
      • Low impact
          • Must be unnoticeable to ordinary users
      • Secure
          • Must not significantly increase the attack
            surface

4
Stability
  • Only use the current Condor stable series, not
    the development series
  • Extensive testing (months, 1000s of test jobs) on
    small pool of workstations
  • Disable any features of Condor not required by
    users
  • Support only limited subset of Condor
    functionality (only Vanilla and Java universes)

5
Low impact
  • Gather usage statistics of target workstations
    and only allow Condor to run at periods when they
    would normally be idle
  • Will not run jobs if a user is logged in
      • Custom ClassAd attribute with number of users
        logged in
  • Any user activity aggressively preempts Condor
    job
      • Issue: under standard Linux 2.6 kernels, USB
        mouse and keyboard activity is not detected
  • Control Condor job's environment and sterilise
    environment after job completion
      • Handles jobs using up all available disk space
        and not cleaning up after themselves, etc.
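
The "number of users logged in" attribute could be produced by a small script along the following lines. This is a minimal sketch, not the presenter's implementation: it assumes `who`-style text (one login session per line, user name in the first column), and the attribute name `LoggedInUsers` and all function names are hypothetical.

```python
def count_logged_in_users(who_output):
    """Count distinct user names in `who`-style output (one session per line)."""
    users = set()
    for line in who_output.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            users.add(fields[0])
    return len(users)


def classad_line(n_users, attr="LoggedInUsers"):
    """Format the count as an `Attribute = value` line.

    The attribute name is an assumption made for this sketch.
    """
    return f"{attr} = {n_users}"


def may_start_job(n_users):
    """Policy from the slide: never start a job while anyone is logged in."""
    return n_users == 0


if __name__ == "__main__":
    sample = "alice pts/0 2024-01-01 09:00\nbob tty1 2024-01-01 08:00\n"
    n = count_logged_in_users(sample)
    print(classad_line(n), "| may start job:", may_start_job(n))
```

How such a value is published into the machine's ClassAd, and how the start policy refers to it, depends on the Condor configuration and is not shown here.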

6
Security
  • What is our threat landscape?
  • What are we worried about?
  • How does this specifically relate to Condor?
  • Specific security concerns, and how we addressed
    them

7
Threat landscape
  • Threats internal to the environment are at least
    as significant as external threats
  • Largest body of users (students) are untrusted
  • No clear separation of use of machines by trusted
    and untrusted users
  • Access (often wholly or largely unrestricted) to
    the public Internet is a core requirement
      • Both for normal use of the machines and for
        Condor jobs
  • Firewalls are of little help

8
Specific security concerns (1)
  • Reliable identification of machines
      • IP addresses useless as identifiers (IP
        spoofing)
      • So strong authentication required
  • Do not significantly increase the attack surface
    of machines
      • No daemons running as root that listen to the
        network
      • Privilege separation (see following talk)
  • Control access to the Condor pool
      • Easiest at point of job submission
      • Restricted number of centralised submit nodes

9
Specific security concerns (2)
  • Controlling the job execution environment
      • Inspect job prior to running on machine
      • Start job in a sterile environment
      • Sterilise environment after job has run
      • Job run under dedicated unprivileged user
        account
  • Restrict access to the Condor commands
      • Ideally develop separate front-end to Condor
        system
      • Currently just wrapper scripts for Condor
        commands
      • Can be circumvented (in some cases), so
        piloting service with relatively trusted users
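
The "start sterile, sterilise afterwards" idea can be sketched as follows. This is an illustration under stated assumptions, not the presenter's wrapper: the function name, the minimal environment chosen, and the use of a throwaway scratch directory are all assumptions, and switching to a dedicated unprivileged account (which needs root privileges) is deliberately left out.

```python
import shutil
import subprocess
import tempfile


def run_in_sterile_env(argv, timeout=None):
    """Run argv in a fresh scratch directory with a minimal environment.

    The scratch directory is removed afterwards, whether or not the job
    succeeded, so files the job leaves behind do not accumulate.
    Returns (return code, captured stdout, path of the removed scratch dir).
    """
    scratch = tempfile.mkdtemp(prefix="job-")
    # Nothing is inherited from the parent environment; these three
    # variables are an assumed minimum for a typical Linux job.
    env = {"PATH": "/usr/bin:/bin", "HOME": scratch, "TMPDIR": scratch}
    try:
        result = subprocess.run(
            argv,
            cwd=scratch,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.returncode, result.stdout, scratch
    finally:
        shutil.rmtree(scratch, ignore_errors=True)


if __name__ == "__main__":
    import sys

    code = "import os; print(sorted(os.environ))"
    print(run_in_sterile_env([sys.executable, "-c", code])[1])
```

A real deployment would also have to deal with files written outside the scratch directory and with processes the job leaves running, which this sketch does not attempt.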

10
Strong authentication
  • Currently only available under UNIX/Linux
      • Kerberos or GSI
  • GSI
      • Flawed security paradigm (mandates daemons run
        as root, etc.)
      • Serious usability and scalability issues
  • Kerberos
      • KDCs provide separate audit trail
      • Plan to use Kerberos elsewhere in the University
      • Support for Kerberos under Windows and MacOS X
        is being added to Condor; support for GSI is not
        (functional GSI libraries not available)
      • Bug in Kerberos support in the stable series of
        Condor
          • Backported patch from development series to
            fix
      • Kerberos has proved surprisingly easy to deploy
        and administer in our setup

11
Scalability / Performance
  • condor_schedd (job queue management) doesn't
    scale well
      • Monolithic process performs too many different
        tasks
      • Uses blocking connections in stable series
  • In our experience:
      • Performs very badly above 4,000 jobs
      • Falls over above 10,000 jobs
      • Cannot handle significant numbers of
        short-running (less than 5 minute) jobs
      • Job overhead is such that jobs need to be about
        10 minutes long to be worth running under Condor
  • Not much we can do about this:
      • Add more submit nodes as demand on our service
        rises
      • Educate our users to use service sensibly (e.g.
        batch up short-running jobs)
      • Wrap / replace Condor commands to encourage
        sensible behaviour / mitigate some of these
        problems
      • Lobby Condor Team to re-design the
        condor_schedd daemon
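
"Batching up short-running jobs" can be as simple as grouping tasks until each group reaches the roughly 10-minute mark mentioned above. The sketch below is one possible greedy approach, not something described in the talk; the function name and the assumption that per-task runtime estimates are available are both hypothetical.

```python
def batch_tasks(durations_s, target_s=600):
    """Greedily group tasks into batches of at least target_s seconds.

    durations_s: estimated runtime of each task, in seconds.
    Returns a list of batches, each a list of task indices. Every batch
    except possibly the last has an estimated total runtime >= target_s.
    """
    batches = []
    current = []
    total = 0
    for index, duration in enumerate(durations_s):
        current.append(index)
        total += duration
        if total >= target_s:
            batches.append(current)
            current = []
            total = 0
    if current:  # leftover tasks form a final, shorter batch
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Twelve 2-minute tasks become a few jobs instead of twelve.
    for batch in batch_tasks([120] * 12):
        print(batch)
```

Each batch would then be submitted as a single job that runs its tasks one after another, which cuts both the per-job overhead and the number of entries in the queue.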

12
Partitioning the pool
  • Require ability to only allow jobs from certain
    users to run on certain machines
  • No sensible way provided to do this
      • Restriction via lists of users or machines in
        configuration files / ClassAd attributes is
        unwieldy and doesn't scale
  • Our method:
      • Machines configured to only accept jobs with
        particular ClassAd attribute
      • Set automatically by our wrapper scripts based
        on user's identity
      • On execute nodes, cross-check user against
        independently maintained and distributed (via
        LDAP) ACL; this prevents users falsifying the
        ClassAd attributes
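
The two-step check in "our method" can be modelled in a few lines. This is a sketch of the logic only, under assumptions: the names are hypothetical, the partition label stands in for the ClassAd attribute, and the ACL is an in-memory dictionary rather than data distributed via LDAP.

```python
def attribute_for_user(user, partition_of_user):
    """Submit side: the wrapper derives the job's partition from identity.

    partition_of_user is a hypothetical mapping of user name -> partition.
    """
    return partition_of_user[user]


def machine_accepts(job_owner, job_partition, machine_partition, acl):
    """Execute side: accept only if both checks pass.

    1. The job's attribute matches the partition this machine serves.
    2. The owner appears in the independently maintained ACL for that
       partition, so a falsified attribute alone is not enough.
    """
    if job_partition != machine_partition:
        return False
    return job_owner in acl.get(machine_partition, frozenset())


if __name__ == "__main__":
    acl = {"chem": {"alice"}, "phys": {"bob"}}
    honest = attribute_for_user("alice", {"alice": "chem", "bob": "phys"})
    print(machine_accepts("alice", honest, "chem", acl))
    # bob bypasses the wrapper and claims the "chem" attribute himself:
    print(machine_accepts("bob", "chem", "chem", acl))
```

The second check is what makes the scheme robust: the attribute is only a routing hint, and the ACL on the execute node is the actual authority.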

13
Architectural overview
  • Large number of centrally managed public access
    workstations running Linux
  • Jobs only run when no users are logged in
  • Centralised submit node(s)
  • Wrappers around Condor commands
  • Restricted (but still useful) subset of Condor's
    functionality
  • Machine identity strongly authenticated
  • Improved Condor security model
  • Privilege separation on execute nodes
  • Strict control of job environment

14
Conclusion
  • Although Condor is not designed for a hostile
    environment, it can be used relatively securely
    in such environments (some caveats, naturally)
      • ...under Linux
      • ...but a lot of development work is required to
        achieve this
      • ...and it requires the supporting infrastructure
        of a stable, centrally managed workstation
        service
  • Improvements to Condor would make this
    significantly easier
      • Design for a hostile environment. These days,
        most environments are.