Unclipped Condor in Windows

1
Unclipped Condor in Windows via coLinux
  • Henry Neeman, Horst Severini,
  • Chris Franklin, Josh Alexander
  • University of Oklahoma
  • Sumanth J.V.
  • University of Nebraska-Lincoln
  • Condor Week, University of Wisconsin, Tuesday,
    May 1, 2007

2
Condor: Linux vs. Windows
  • Condor inside Linux: full featured
  • Condor inside Windows: clipped
      • No autocheckpointing
      • No job automigration
      • No remote system calls
      • No Standard Universe

http://www.our-picks.com/archives/2006/10/page/2/
3
Lots of PCs in IT Labs
  • At many institutions, there are lots of PC labs
    managed by a central IT organization.
  • If the head of IT (e.g., CIO) is on board,
    then all of these PCs can be Condorized.
  • But these labs tend to be Windows labs, not
    Linux. So you can't take the Windows desktop
    experience away from the desktop users just to
    get Condor.
  • So, how can we have Linux Condor AND Windows
    desktop on the same PC at the same time?

4
Solution Attempt 1: VMware
  • Attempted solution: VMware
      • Linux as native host OS
      • Condor inside Linux
      • VMware inside Linux
      • Windows inside VMware
  • Tested on 200 PCs in IT PC labs (Union, library,
    dorms, Physics Dept)
  • In production for over a year

5
VMware Disadvantages
  • Attempted solution: VMware
      • Linux as native host OS
      • Condor inside Linux
      • VMware inside Linux
      • Windows inside VMware
  • Disadvantages:
      • VMware costs money! (Less so now than then.)
      • Crashy
      • VMware performance tuning (straight to disk) was
        unstable
      • Sensitive to hardware heterogeneity
      • Painful to manage
      • CD/DVD burners and USB drives didn't work in some
        PCs.

6
A Better Solution: coLinux
  • Cooperative Linux (coLinux)
  • http://www.colinux.org/
  • FREE!
  • Runs inside native Windows
  • No sensitivity to hardware type
  • Better performance
  • Easier to customize
  • Smaller disk footprint and lower CPU usage in
    idle
  • Minimal management required (10 hours/month)

7
Compatibility Issue
  • About 30 of the 200 lab PCs we installed coLinux
    on had problems with it, so those PCs now run a
    prerelease version of coLinux.
  • We have no idea why the production version of
    coLinux was a miserable failure on these 30 PCs,
    nor why the prerelease version succeeded.

8
Preventing BSOD
  • The Data Execution Prevention feature inside
    Windows, when running on some newer processors,
    can conflict with coLinux and cause system
    failure. The solution to this problem is to add
    the /NOEXECUTE switch to the Windows boot.ini.

9
Network Issue
  • Networking options:
      • Bridged: Each PC has to have a second IP address,
        so the institution has to have plenty of spare IP
        addresses available. (Oklahoma solution)
      • NAT: The Condor pool requires a Generic
        Connection Broker (GCB) on a separate, dedicated
        PC (hardware), and has some instability.
        Switched to OpenVPN. (Nebraska solution)
  • Nebraska experimented with port forwarding in
    Windows, but abandoned it for OpenVPN because of
    security and usability.

10
Traversing NATs and Firewalls
  • What is GCB (Generic Connection Broker)?
      • Socket-level approach.
      • A broker arranges connections between machines
        inside the firewall and machines outside the
        firewall.
  • What is OpenVPN (Open Virtual Private Network)?
      • A network within a network.
      • Virtual network adapter.
      • Virtual IP (static/dynamic).
      • TCP within UDP.
      • Client/server architecture.
      • All-to-all communication.
      • All traffic is encrypted by default.

11
OpenVPN
  • When using GCB, each machine is represented by a
    unique port on the broker.
      • Central Manager sees all the machines as
        <GCB_IP:port 1>, <GCB_IP:port 2>, etc.
      • Only applications linked against GCB work.
        (Condor is already linked)
  • When using OpenVPN, each machine has a unique
    virtual IP address in the VPN.
      • Simplifies troubleshooting.
      • Central Manager is also part of the OpenVPN and
        runs in server mode.
  • ClientConnect.py
      • Determines virtual IP of a new client based on
        its real IP.
      • E.g., node-25-55 has real IP 129.93.25.55 and gets
        virtual IP 10.1.25.55.
      • Pushes this configuration to the clients.
      • Updates /etc/hosts.
  • OpenVPN lockups can be fixed with mssfix 1200 and
    fragment 1200 options?
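The real-to-virtual address mapping that ClientConnect.py performs might look roughly like the sketch below. The rule used here (keep the last two octets and put them under a 10.1 prefix) is inferred from the single example on the slide, and the function names `virtual_ip` and `hosts_line` are hypothetical; the actual script is not shown in the slides.

```python
import ipaddress


def virtual_ip(real_ip: str, vpn_prefix: str = "10.1") -> str:
    """Map a real IPv4 address to a VPN address by keeping its last two octets.

    The mapping rule is an assumption based on the slide's one example.
    """
    addr = ipaddress.IPv4Address(real_ip)  # raises ValueError on malformed input
    octets = str(addr).split(".")
    return f"{vpn_prefix}.{octets[2]}.{octets[3]}"


def hosts_line(real_ip: str) -> str:
    """Build an /etc/hosts entry for the client, e.g. for node-25-55."""
    octets = real_ip.split(".")
    return f"{virtual_ip(real_ip)}\tnode-{octets[2]}-{octets[3]}"


print(virtual_ip("129.93.25.55"))
print(hosts_line("129.93.25.55"))
```

Because the virtual address is derived from the real one, a node keeps the same VPN address across reconnects, which fits the slide's point that OpenVPN simplifies troubleshooting.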

12
OpenVPN
  • No modification of application required to use
    OpenVPN.
  • We have successfully mounted NFS shares (CMS
    stack).
  • Inbound SSH access
  • Since all-to-all communication is present, even
    MPICH works.
      • Remember: all-to-all communication still has to go
        through the OpenVPN server.
  • Secure
      • No firewall required in coLinux.

13
Monitoring Issue
  • Condor inside Linux monitors keyboard and mouse
    usage to decide when to suspend a job.
  • In coLinux, this is tricky.
  • We had to set up a Visual Basic script on the
    Windows side to send the keyboard and mouse
    information to coLinux.
  • UNL implements a similar idea in C, and OU is
    now doing likewise.
  • UNL collects all the keyboard and mouse data on a
    server, while OU does it on each local machine.
    But the result is the same.
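A minimal sketch of the idea of forwarding keyboard and mouse activity from Windows to coLinux follows. Everything specific here is an assumption: the slides do not describe the message format, transport, or threshold, so the UDP transport, the 4-byte payload, the 15-minute threshold, and all function names are hypothetical. On the Windows side the idle time would come from the operating system; here it is simply passed in as a number.

```python
import socket
import struct

IDLE_THRESHOLD_SECONDS = 15 * 60  # hypothetical policy value
REPORT_FORMAT = "!I"  # one unsigned 32-bit idle-seconds value, network byte order


def encode_report(idle_seconds: int) -> bytes:
    """Pack the seconds since the last keyboard/mouse event."""
    return struct.pack(REPORT_FORMAT, idle_seconds)


def decode_report(payload: bytes) -> int:
    """Unpack a report produced by encode_report."""
    (idle_seconds,) = struct.unpack(REPORT_FORMAT, payload)
    return idle_seconds


def should_suspend(idle_seconds: int, threshold: int = IDLE_THRESHOLD_SECONDS) -> bool:
    """Suspend the job if the user was active more recently than the threshold."""
    return idle_seconds < threshold


def send_report(idle_seconds: int, host: str, port: int) -> None:
    """Windows side: send one idle report to the coLinux guest over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_report(idle_seconds), (host, port))


print(should_suspend(decode_report(encode_report(30))))
```

The same encode/decode pair would serve either layout mentioned above: reports sent to a central server (UNL) or consumed on each local machine (OU).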

14
Monitoring coLinux Labs
  • How to determine whether all the machines in each
    lab are up and running?
  • condor_status only displays working machines.
    What about missing machines?
  • We need a list of expected but missing nodes per
    lab.
  • We need a physical layout of the nodes in each
    lab.
  • MySQL database to store lab info.
  • Need to separately handle static and dynamic IP
    labs.
  • Static IPs are easy to handle.
  • Store IP and relative coordinates of the node.
  • Dynamic IPs: store a lambda function expressing how
    to determine if a machine belongs to a lab.
  • E.g., lambda x: '18-' in x matches node-18-2,
    node-18-3
  • Store expected number of machines per lab, and known
    hardware/software issues as notes per machine.
  • Compare output of condor_status and MySQL
    database.
  • Demo: http://mindspawn.unl.edu/condor/stats
  • Web front-end developed for mod_python.
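The comparison between condor_status output and the lab database can be sketched as follows. This is an illustration only: the lab names, node lists, and expected counts are made up, plain dictionaries stand in for the MySQL tables, and the function names are hypothetical. The lambda is the membership rule quoted on the slide.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical lab definitions; the slides store these in a MySQL database.
STATIC_LABS: Dict[str, List[str]] = {
    "library": ["node-25-55", "node-25-56", "node-25-57"],
}
DYNAMIC_LABS: Dict[str, Callable[[str], bool]] = {
    "lab-18": lambda x: "18-" in x,  # membership rule from the slide
}
EXPECTED_COUNT: Dict[str, int] = {"lab-18": 3}


def missing_static(lab: str, reported: Iterable[str]) -> List[str]:
    """Nodes listed for a static-IP lab that condor_status did not report."""
    return sorted(set(STATIC_LABS[lab]) - set(reported))


def dynamic_shortfall(lab: str, reported: Iterable[str]) -> int:
    """How many machines a dynamic-IP lab is short of its expected count."""
    present = [name for name in reported if DYNAMIC_LABS[lab](name)]
    return max(0, EXPECTED_COUNT[lab] - len(present))


# Machine names as they might be parsed from condor_status output.
reported = ["node-25-55", "node-25-57", "node-18-2", "node-18-3"]
print(missing_static("library", reported))
print(dynamic_shortfall("lab-18", reported))
```

For static labs the result names the missing machines; for dynamic labs only a count is possible, since the expected names are not known in advance. Note that a substring rule such as '18-' in x would also match a name like node-118-2, so a stricter pattern may be needed in practice.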

15
How to Build a Multistate Grid
  • To make a prairie
  • It takes a clover and one bee.
  • One clover, and a bee, and reverie.
  • The reverie alone will do,
  • If bees are few.
  • Emily Dickinson, 1858

http://magickcanoe.com/blog/2006/08/24/on-our-walk/
16
OU's NSF CI-TEAM Project
  • OU recently received a grant from the National
    Science Foundation's Cyberinfrastructure
    Training, Education, Advancement, and Mentoring
    for Our 21st Century Workforce (CI-TEAM) program.
  • Objectives:
      • Teach general HPC concepts to a broad audience
      • Provide Condor resources to the national
        community
      • Teach users to use Condor and sysadmins to deploy
        and administer it
      • Teach bioinformatics students to use BLAST over
        Condor

17
OU NSF CI-TEAM Project
Cyberinfrastructure Education for Bioinformatics
and Beyond
Objectives:
  • teach students and faculty to use FREE Condor
    middleware, stealing computing time on idle PCs
  • teach system administrators to deploy and
    maintain Condor on PCs
  • teach bioinformatics students to use BLAST on
    Condor
  • provide Condor Cyberinfrastructure to the
    national community (FREE).
OU will provide:
  • Condor pool of 750 desktop PCs (already part of
    the Open Science Grid)
  • Supercomputing in Plain English workshops via
    videoconferencing
  • Cyberinfrastructure rounds (consulting) via
    videoconferencing
  • drop-in CDs for installing full-featured Condor
    on a Windows PC (Cyberinfrastructure for FREE)
  • sysadmin consulting for installing and
    maintaining Condor on desktop PCs.
  • OU's team includes High School, Minority
    Serving, 2-year, 4-year, and master's-granting
    institutions; 11 of the 15 institutions are in 4
    EPSCoR states (AR, KS, NE, OK).

18
OU NSF CI-TEAM Project
  • Participants at OU (29 faculty/staff in 16 depts)
      • Information Technology
          • OSCER: Neeman (PI)
      • College of Arts & Sciences
          • Botany & Microbiology: Conway, Wren
          • Chemistry & Biochemistry: Roe (Co-PI), Wheeler
          • Mathematics: White
          • Physics & Astronomy: Kao, Severini (Co-PI),
            Skubic, Strauss
          • Zoology: Ray
      • College of Earth & Energy
          • Sarkeys Energy Center: Chesnokov
      • College of Engineering
          • Aerospace & Mechanical Engr: Striz
          • Chemical, Biological & Materials Engr:
            Papavassiliou
          • Civil Engr & Environmental Science: Vieux
          • Computer Science: Dhall, Fagg, Hougen,
            Lakshmivarahan, McGovern, Radhakrishnan
          • Electrical & Computer Engr: Cruz, Todd, Yeary, Yu
          • Industrial Engr: Trafalis
  • Participants at other institutions (19 faculty/staff
    at 14 institutions)
      • California State U Pomona (master's-granting,
        minority serving): Lee
      • Contra Costa College (2-year, minority serving):
        Murphy
      • Earlham College (4-year): Peck
      • Emporia State U (master's-granting, EPSCoR):
        Pheatt, Ballester
      • Kansas State U (EPSCoR): Andresen, Monaco
      • Langston U (master's-granting, minority serving,
        EPSCoR): Snow
      • Oklahoma Baptist U (4-year, EPSCoR): Chen, Jett,
        Jordan
      • Oklahoma School of Science & Mathematics (high
        school, EPSCoR): Samadzadeh
      • St. Gregory's U (4-year, EPSCoR): Meyer
      • U Arkansas (EPSCoR): Apon
      • U Central Oklahoma (master's-granting, EPSCoR):
        Lemley, Wilson
      • U Kansas (EPSCoR): Bishop
      • U Nebraska-Lincoln (EPSCoR): Swanson
      • U Northern Iowa (master's-granting): Gray

19
How to Create a Multistate Grid?
  • Grids aren't primarily about technology!
  • You need to recruit people, by offering them more
    than you ask them to provide.
  • Go to their institution.
  • Give a really fun and interesting talk
    about your stuff.
  • Tell them that they can use your stuff
    for free.
  • Make them commit to using your stuff.
  • Help them use your stuff.
  • If possible, get them to visit you and see your
    stuff.

21
Thanks for your attention! Questions?