OpenMOSIX approach to build scalable HPC farms with an easy management infrastructure

Transcript and Presenter's Notes

1
OpenMOSIX approach to build scalable HPC farms
with an easy management infrastructure
Rosario Esposito (1), Paolo Mastroserio (1),
Francesco Maria Taurino (1,2), Gennaro Tortone (1)
  • (1) INFN - Napoli   (2) INFM - UDR Napoli
  • CHEP 2003, La Jolla (San Diego)

2
Index
  • Introduction
  • OpenMosix overview
  • Farm setup
  • Use cases
  • Conclusions

3
What makes clusters hard?
  • Setup (administrator)
  • setting up a 16-node farm by hand is prone to
    errors
  • Maintenance (administrator)
  • ever tried to update a package on every node in
    the farm?
  • Running jobs (users)
  • running a parallel program or set of sequential
    programs requires the users to figure out which
    hosts are available and manually assign tasks to
    the nodes, or use software tools based on static
    process allocation (queue managers)

4
What is OpenMosix ?
  • Description
  • OpenMosix is an OpenSource enhancement to the
    Linux kernel providing adaptive (on-line)
    load-balancing between x86 Linux machines. It
    uses preemptive process migration to assign and
    reassign the processes among the nodes to take
    the best advantage of the available resources
  • OpenMosix moves processes around the Linux farm
    to balance the load, using less loaded machines
    first
  • URL
  • http://www.openmosix.org
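As a usage illustration of this model (the script below is a hypothetical example, not part of openMosix): a user just forks many ordinary CPU-bound Linux processes, and on an openMosix kernel they become candidates for transparent migration to less loaded nodes, with no job scheduler involved.

    # toy_farm_jobs.py - hypothetical example: start many ordinary
    # CPU-bound processes; with an openMosix kernel they may be migrated
    # transparently to less loaded nodes
    import os
    import sys

    def burn_cpu(n):
        # purely CPU-bound work, a good migration candidate
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        jobs = int(sys.argv[1]) if len(sys.argv) > 1 else 16
        for _ in range(jobs):
            if os.fork() == 0:          # child: do the work and exit
                burn_cpu(50_000_000)
                os._exit(0)
        for _ in range(jobs):           # parent: wait for all children
            os.wait()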

5
OpenMosix introduction
  • Execution environment
  • a farm of diskless x86-based nodes, both UP and
    SMP, connected by a standard or high-speed LAN
  • Implementation level
  • Linux kernel (no library to link with sources)
  • System image model
  • virtual machine with a lot of memory and CPU
  • Granularity
  • Process
  • Goal
  • improve the overall (cluster-wide) performance
    and create a convenient multi-user, time-sharing
    environment for the execution of both sequential
    and parallel applications

6
OpenMosix architecture (1/5)
  • Network transparency
  • interactive users and application-level programs
    are provided with a virtual machine that looks
    like a single MP machine
  • Preemptive process migration
  • any user's process can, transparently and at any
    time, migrate to any available node
  • the migrating process is divided into two
    contexts:
  • a system context (deputy) that may not be migrated
    away from the unique home node (UHN)
  • a user context (remote) that can be migrated to a
    diskless node (a toy model of this split follows
    this slide)
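A purely conceptual toy model of this split (illustrative Python, nothing like the real kernel code): user-level computation runs wherever the remote context happens to be, while location-dependent system calls are shipped back to the deputy that stays on the home node.

    class Deputy:
        # stays on the unique home node (UHN) and performs
        # location-dependent system calls on the process's behalf
        def syscall(self, name, *args):
            return f"{name}{args} handled on the home node"

    class Remote:
        # migrated user context: runs user-level code on the remote node
        def __init__(self, deputy):
            self.deputy = deputy

        def run(self):
            local_work = sum(i * i for i in range(100_000))  # local user-space work
            return local_work, self.deputy.syscall("open", "/etc/hostname")

    print(Remote(Deputy()).run()[1])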

7
OpenMosix architecture (2/5)
  • Preemptive process migration

[diagram: a process migrating between the master node and a diskless node]
8
OpenMosix architecture (3/5)
  • Dynamic load balancing
  • initiates process migrations in order to balance
    the load of the farm
  • responds to variations in the load of the nodes,
    the runtime characteristics of the processes, and
    the number of nodes and their speeds
  • makes continuous attempts to reduce the load
    differences between pairs of nodes by dynamically
    migrating processes from nodes with a higher load
    to nodes with a lower load (see the sketch below)
  • the policy is symmetrical and decentralized: all
    of the nodes execute the same algorithm, and the
    reduction of the load differences is performed
    independently by every pair of nodes
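A sketch of this pairwise load-reduction rule (a toy model in Python, not the kernel algorithm; node names and the threshold are invented): each node compares its load with one randomly chosen peer and "migrates" a process toward the less loaded side when the difference is large enough.

    import random

    def balance_step(loads, threshold=2):
        # loads: dict node -> number of runnable processes (toy model)
        a, b = random.sample(list(loads), 2)       # a random pair of nodes
        hi, lo = (a, b) if loads[a] >= loads[b] else (b, a)
        if loads[hi] - loads[lo] > threshold:      # reduce the load difference
            loads[hi] -= 1                         # "migrate" one process
            loads[lo] += 1

    loads = {f"node{i:02d}": random.randint(0, 20) for i in range(16)}
    for _ in range(1000):     # every node keeps applying the same symmetric rule
        balance_step(loads)
    print(sorted(loads.values()))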

9
OpenMosix architecture (4/5)
  • Memory sharing
  • places the maximal number of processes in the
    farm's main memory, even if this implies an uneven
    load distribution among the nodes
  • delays the swapping out of pages as much as
    possible
  • the decision of which process to migrate, and
    where to migrate it, is based on knowledge of the
    amount of free memory on other nodes (a toy
    illustration follows this slide)
  • Efficient kernel communication
  • is specifically developed to reduce the overhead
    of the internal kernel communications (e.g.
    between the process and its home site, when it is
    executing in a remote site)
  • fast and reliable protocol with low startup
    latency and high throughput
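A toy illustration of the memory-sharing policy described above (data structures and numbers are invented): when a node runs out of memory, the preferred migration target is the node with the most free memory, even if its CPU load is higher.

    def pick_target(nodes, needed_mb):
        # nodes: dict name -> {"free_mb": ..., "load": ...} (toy model)
        fits = {n: v for n, v in nodes.items() if v["free_mb"] >= needed_mb}
        if not fits:
            return None                 # no node has enough free memory
        return max(fits, key=lambda n: fits[n]["free_mb"])

    nodes = {"node01": {"free_mb": 64, "load": 1},
             "node02": {"free_mb": 512, "load": 7}}
    print(pick_target(nodes, needed_mb=256))   # node02, despite its higher load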

10
OpenMosix architecture (5/5)
  • Probabilistic information dissemination
    algorithms
  • provide each node with sufficient knowledge about
    available resources in other nodes, without
    polling
  • measure the amount of available resources on
    each node
  • receive the resource indices that each node sends
    at regular intervals to a randomly chosen subset
    of nodes (sketched after this slide)
  • randomly chosen subsets are used to support
    dynamic configuration and to overcome partial
    node failures
  • Decentralized control and autonomy
  • each node makes its own control decisions
    independently and there is no master-slave
    relationship between nodes
  • each node is capable of operating as an
    independent system; this property allows a
    dynamic configuration, where nodes may join or
    leave the farm with minimal disruption
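A sketch of the dissemination scheme above (node names, interval handling and the resource indices are invented): each node periodically sends its resource indices to a small, randomly chosen subset of peers, so every node keeps an approximate view of the farm without polling and without a master.

    import random

    NODES = [f"node{i:02d}" for i in range(16)]
    views = {n: {} for n in NODES}          # what each node currently knows

    def gossip_round(subset_size=3):
        for sender in NODES:
            info = {"load": random.randint(0, 20),      # stand-in resource indices
                    "free_mb": random.randint(64, 1024)}
            peers = random.sample([n for n in NODES if n != sender], subset_size)
            for receiver in peers:
                views[receiver][sender] = info          # receiver updates its view

    for _ in range(10):
        gossip_round()
    print(len(views["node00"]))   # node00 knows about most of its peers by now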

11
Farm setup: PXE + ClusterNFS
  • diskless nodes
  • low cost
  • eliminates install/upgrade of hardware, software
    on diskless client side
  • backups are centralized in one single main server
  • zero administration at diskless client side

12
Diskless farm setup: traditional method (1/2)
  • Traditional method
  • Server
  • BOOTP server
  • NFS server
  • separate root directory for each client
  • Client
  • BOOTP to obtain IP
  • TFTP to load tagged kernel image
  • NFS root to mount the root filesystem

13
Diskless farm setup: traditional method (2/2)
  • Traditional method: problems
  • separate root directory structure for each node
  • hard to set up
  • lots of directories with slightly different
    contents
  • difficult to maintain
  • changes must be propagated to each directory
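A small sketch of that maintenance burden (the /export/roots layout and the file name are hypothetical): with one root directory per node, every change has to be copied into each per-client copy.

    import pathlib
    import shutil

    CLIENT_ROOTS = pathlib.Path("/export/roots")   # e.g. /export/roots/node01, ...

    def propagate(changed_file="etc/hosts"):
        # copy the master copy of a changed file into every per-client root
        for root in sorted(CLIENT_ROOTS.iterdir()):
            shutil.copy2(pathlib.Path("/") / changed_file, root / changed_file)
            print("updated", root / changed_file)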

14
ClusterNFS
  • Description
  • cNFS is a patch to the standard Universal-NFS
    server code that parses file requests to
    determine an appropriate match on the server
  • Example
  • when client machine foo2 asks for the file
    /etc/hostname, it gets the contents of
    /etc/hostname$$HOST=foo2$$
  • URL
  • https://sourceforge.net/projects/clusternfs

15
ClusterNFS features
  • ClusterNFS allows all machines (including
    server) to share the root filesystem
  • all files are shared by default
  • files common to all clients are named
    filename$$CLIENT$$
  • files for a specific client are named
    filename$$IP=xxx.xxx.xxx.xxx$$ or
    filename$$HOST=host.domain.com$$ (resolution
    sketched below)
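A simplified sketch of the lookup rule (illustrative only; the exact matching order is defined by the cNFS code): when client foo2 with address 192.168.0.12 asks for /etc/hostname, the server answers with the first per-client or shared variant of the file that exists.

    import os

    def resolve(path, host, ip):
        candidates = [f"{path}$$HOST={host}$$",   # file for this specific host
                      f"{path}$$IP={ip}$$",       # file for this specific address
                      f"{path}$$CLIENT$$",        # file shared by all clients
                      path]                       # plain file, served unchanged
        for candidate in candidates:
            if os.path.exists(candidate):
                return candidate
        return None

    print(resolve("/etc/hostname", host="foo2", ip="192.168.0.12"))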

16
Diskless farm setup with ClusterNFS (1/2)
  • ClusterNFS method
  • Server
  • DHCP and TFTP server
  • ClusterNFS server
  • single root directory for server and clients
  • Clients
  • DHCP to obtain IP
  • TFTP to load PXE boot loader and then kernel
    image
  • NFS root to mount the root filesystem

17
Diskless farm setup with ClusterNFS (2/2)
  • ClusterNFS method: advantages
  • easy to set up
  • just copy (or create) the files that need to be
    different
  • easy to maintain
  • changes to shared files are global
  • easy to add nodes
  • A node can be added to a running farm in 1 minute
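An illustration of how little is needed for a new node (file names and host name below are hypothetical examples, not taken from the original setup): create the few host-specific variants in the shared root, then add the node to the DHCP configuration and let it PXE-boot.

    import pathlib
    import shutil

    NEW_HOST = "node17"                                         # hypothetical new node
    PER_HOST_FILES = ["etc/hostname", "etc/sysconfig/network"]  # assumed examples

    root = pathlib.Path("/")             # the single root shared via ClusterNFS
    for f in PER_HOST_FILES:
        src = root / f
        dst = root / f"{f}$$HOST={NEW_HOST}$$"
        if src.exists():
            shutil.copy2(src, dst)       # then edit dst with the node's own values
            print("created", dst)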

18
VIRGO experiment (Jun 2001) (1/4)
VIRGO is a collaboration between Italian and
French research teams for the realization of an
interferometric gravitational wave detector. The
main goal of the VIRGO project is the first
direct detection of gravitational waves emitted
by astrophysical sources. Interferometric
gravitational wave detectors produce a large
amount of raw data that require significant
computing power to be analysed. To satisfy such
a strong requirement of computing power we
decided to build a Linux cluster running MOSIX
(and now OpenMosix).
19
VIRGO experiment (Jun 2001) (2/4)
Hardware
Farm nodes: SuperMicro 6010H
  • Dual Pentium III 1 GHz
  • RAM: 512 Mbyte
  • HD: 18 Gbyte
  • 2 Fast Ethernet interfaces
  • 1 Gbit Ethernet interface (only on the master node)
Storage: Alpha Server 4100
  • HD: 144 GB
20
VIRGO experiment (Jun 2001) (3/4)
  • The Linux farm has been extensively tested by
    executing intensive data analysis procedures
    based on the Matched Filter algorithm, one of the
    best ways to search for known waveforms within a
    signal affected by background noise.
  • Matched Filter analysis has a high computational
    cost, as the method consists of an exhaustive
    comparison between the source signal and a set of
    known waveforms, called templates, to find
    possible matches. Using a larger number of
    templates improves the identification of known
    signals, but a greater amount of floating-point
    operations has to be performed (a toy sketch
    follows this slide).
  • Running Matched Filter test procedures on the
    OpenMosix cluster has shown a progressive
    reduction of execution times, due to the high
    scalability of the computing nodes and the
    efficient dynamic load distribution
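A toy, time-domain sketch of the matched-filter idea described above (template frequencies and sizes are arbitrary; real analyses use far larger template banks, which is what makes the farm's aggregate CPU power necessary): correlate the measured signal with every template and keep the best match.

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 4096)
    templates = [np.sin(2 * np.pi * f * t) for f in (50, 60, 70, 80)]  # known waveforms
    signal = templates[2] + 0.5 * rng.standard_normal(t.size)          # signal + noise

    def matched_filter(signal, templates):
        # exhaustive comparison: correlate the signal with every template
        scores = [np.max(np.correlate(signal, tpl, mode="full")) / np.linalg.norm(tpl)
                  for tpl in templates]
        return int(np.argmax(scores)), scores

    best, scores = matched_filter(signal, templates)
    print("best matching template:", best)   # expected: 2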

21
VIRGO experiment (Jun 2001) (4/4)
[plot: speed-up of repeated Matched Filter executions]
The increase in computing speed with respect to the
number of processors does not follow an exactly
linear curve; this is mainly due to the growth of
communication time, spent by the computing nodes to
transmit data over the local area network.
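A purely illustrative model of this behaviour (the constants are made up, not measured data): if the communication time grows with the number of nodes, the speed-up curve flattens instead of staying linear.

    def speedup(n, t_comp=100.0, t_comm=0.2):
        # perfectly divided compute time plus communication that grows with n
        return t_comp / (t_comp / n + t_comm * (n - 1))

    for n in (1, 2, 4, 8, 16, 32):
        print(n, round(speedup(n), 1))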
22
ARGO experiment (Jan 2002) (1/3)
The aim of the ARGO-YBJ experiment is to study
cosmic rays, mainly cosmic gamma-radiation, at an
energy threshold of 100 GeV, by means of the
detection of small-size air showers. This goal
will be achieved by operating a full-coverage
array in the Yangbajing Laboratory (Tibet, P.R.
China) at 4300 m a.s.l. As we have seen for the
Virgo experiment, the analysis of data produced
by Argo requires a significant amount of
computing power. To satisfy this requirement we
decided to implement an OpenMOSIX cluster.
23
ARGO experiment (Jan 2002) (2/3)
  • currently Argo researchers are using a small
    Linux farm, located in Naples, consisting of
  • 5 machines (dual 1 GHz Pentium III with 1 Gbyte
    RAM) running RedHat 7.2 with openMosix 2.4.13
  • 1 file server with 1 Tbyte of disk space

24
ARGO experiment (Jan 2002) (3/3)
  • At this time the Argo OpenMOSIX farm is mainly
    used to run Monte Carlo simulations using
    Corsika, a Fortran application developed to
    simulate and analyse extensive air showers.
  • The farm is also used to run other applications
    such as GEANT to simulate the behaviour of the
    Argo detector.
  • The OpenMOSIX farm is responding very well to
    the researchers' computing requirements, and we
    have already decided to upgrade the cluster in
    the near future, adding more computing nodes and
    starting the analysis of real data produced by
    Argo.
  • Currently ARGO researchers in Naples have
    produced 400 Gbytes of simulated data with
    this OpenMOSIX cluster

25
Conclusions (1/2)
  • the most noticeable features of OpenMOSIX are its
    load-balancing and process migration algorithms,
    which imply that users need not know the current
    state of the nodes
  • this is most useful in time-sharing, multi-user
    environments, where users have no means of
    knowing (and are usually not interested in) the
    status of the nodes (e.g. their load)
  • parallel applications can be executed by forking
    many processes, just as on an SMP machine, while
    OpenMOSIX continuously attempts to optimize the
    resource allocation

26
Conclusions (2/2)
  • Building up farms with the OpenMosix + ClusterNFS
    approach requires no more than 2 hours
  • With this approach, management of a farm =
    management of a single server
  • This solution has proven to be scalable in farms
    of up to 32 nodes