LinuxHA Release 2 - PowerPoint PPT Presentation

About This Presentation
Title:

LinuxHA Release 2

Slides: 33
Provided by: linux3

Transcript and Presenter's Notes

1
Linux-HA Release 2
  • Alan Robertson
  • IBM Linux Technology Center
  • alanr_at_unix.sh

2
Linux-HA Release 2
  • What is High-Availability (HA) Clustering?
  • What can HA do for me?
  • What is the Linux-HA project?
  • Linux-HA applications
  • Linux-HA customers
  • Linux-HA release 1 capabilities
  • Linux-HA release 2 capabilities
  • Comparative Architectures
  • Release 2 Details
  • Futures

3
What Is HA Clustering?
  • Putting together a group of computers which trust
    each other to provide a service even when system
    components fail
  • When one machine goes down, others take over its
    work
  • This involves IP address takeover, service
    takeover, etc.
  • New work comes to the takeover machine
  • Not primarily designed for high-performance

4
What Can HA Clustering Do For You?
  • It cannot achieve 100% availability; nothing
    can.
  • HA clustering is designed to recover from single
    faults
  • It can make your outages very short
  • From about a second to a few minutes
  • It is like a Magician's (Illusionist's) trick
  • When it goes well, the hand is faster than the
    eye
  • When it goes not-so-well, it can be reasonably
    visible
  • A good HA clustering system adds a 9 to your
    base availability
  • 99% -> 99.9%, 99.9% -> 99.99%, 99.99% -> 99.999%,
    etc.
  • Complexity is the enemy of reliability!
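
To make "adding a 9" concrete, the availability figures above can be converted into downtime per year. This is illustrative arithmetic only; the function name is hypothetical and the year is assumed to be 365 days.

```python
# Illustrative arithmetic: yearly downtime implied by an availability figure.
# Assumption: a 365-day year (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime, in minutes per year, implied by an availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% available: about {minutes:.1f} minutes of downtime per year")
```

Each added 9 divides the yearly downtime by ten: roughly 5,256 minutes at 99%, and roughly 5 minutes at 99.999%.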

5
Single Points of Failure (SPOFs)
  • A single point of failure is a component whose
    failure will cause near-immediate failure of an
    entire system or service
  • Good HA design eliminates single points of
    failure

6
How Does HA work?
  • Manage redundancy to improve service availability
  • Like a cluster-wide-super-init on steroids
  • Even complex services are now "respawn":
  • on node (computer) death
  • on impairment of nodes
  • on loss of connectivity
  • for services that aren't working (not necessarily
    stopped)
  • managing very complex dependency relationships

7
Redundant Communications
  • Intra-cluster communication is critical to HA
    system operation
  • Most HA clustering systems provide mechanisms for
    redundant internal communication for heartbeats,
    etc.
  • External communication is usually essential to
    the provision of service
  • External communication redundancy is usually
    accomplished through routing tricks
  • Having an expert in BGP or OSPF is a help
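
Why redundant heartbeat links matter can be sketched as follows: a peer is presumed dead only when every link has been silent for longer than a timeout, so the loss of one link alone does not trigger a takeover. This is a minimal sketch, not Linux-HA's implementation; the class name, method names, link names, and the deadtime parameter are all illustrative assumptions.

```python
from typing import Dict, Iterable, List


class PeerMonitor:
    """Tracks the most recent heartbeat from one peer on each redundant link.

    Hypothetical helper for illustration; not part of Linux-HA.
    """

    def __init__(self, links: Iterable[str], deadtime: float, start: float) -> None:
        self.deadtime = deadtime  # seconds of silence tolerated per link
        self.last_seen: Dict[str, float] = {link: start for link in links}

    def record_heartbeat(self, link: str, now: float) -> None:
        """Note that a heartbeat arrived on the given link at time `now`."""
        self.last_seen[link] = now

    def silent_links(self, now: float) -> List[str]:
        """Return the links that have been silent for longer than deadtime."""
        return sorted(
            link for link, seen in self.last_seen.items()
            if now - seen > self.deadtime
        )

    def is_dead(self, now: float) -> bool:
        """Presume the peer dead only if ALL links have gone silent."""
        return bool(self.last_seen) and len(self.silent_links(now)) == len(self.last_seen)


if __name__ == "__main__":
    monitor = PeerMonitor(links=["serial", "eth1"], deadtime=10.0, start=0.0)
    monitor.record_heartbeat("eth1", now=8.0)
    print(monitor.silent_links(now=12.0))  # the serial link has gone quiet
    print(monitor.is_dead(now=12.0))       # eth1 still carries heartbeats
```

With a single link, a cable fault and a dead node look identical; with two links, a silent serial line plus a live Ethernet link is reported as lost redundancy rather than a node failure.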

8
Redundant Data Access
  • Replicated
  • Copies of data are kept updated on more than one
    computer in the cluster
  • Shared
  • Typically Fibre Channel Disk (SAN)
  • Sometimes shared SCSI
  • Back-end Storage (Somebody Else's Problem)
  • NFS, SMB
  • Back-end database

9
The Desire for HA systems
  • Who wants low-availability systems?
  • Why are so few systems High-Availability?

10
Why isn't everything HA?
  • Cost
  • Complexity

12
The Linux-HA Project
  • Linux-HA is the oldest high-availability project
    for Linux, with the largest associated community
  • The core piece of Linux-HA is called
    "heartbeat" (though it does much more than
    heartbeat)
  • Linux-HA has been in production since 1999, and
    is currently in use on about ten thousand sites
  • Linux-HA also runs on FreeBSD and Solaris, and is
    being ported to OpenBSD and others
  • Linux-HA is shipped with every major Linux
    distribution except one.

13
Linux-HA Release 1 Applications
  • Load Balancers
  • Web Servers
  • Database Servers
  • Custom Applications
  • Firewalls
  • Retail Point of Sale Solutions
  • Authentication
  • File Servers
  • Proxy Servers
  • Medical Imaging
  • Almost any type of server application you can think
    of, except SAP

14
Linux-HA customers
  • Emageon: medical imaging services
  • Contraloria General de la Republica (Colombian
    government)
  • Incredimail bases their mail service on Linux-HA
    on IBM hardware
  • Karstadt uses Linux-HA in each of several
    hundred stores
  • Bavarian Radio Station (Munich): coverage of 2002
    Olympics in Salt Lake City
  • Circuit City, Autozone, and others use Linux-HA in
    each of several hundred stores
  • Citysavings Bank in Munich (infrastructure)
  • University of Toledo (US): 20k-student Computer
    Aided Instruction system
  • Autostrada: 230 clusters across the country
  • The Weather Channel (weather.com)
  • Sony (manufacturing)
  • ISO New England manages power grid using 25
    Linux-HA clusters

15
Linux-HA Release 1 capabilities
  • Supports 2-node clusters
  • Can use serial, UDP bcast, mcast, ucast comm.
  • Fails over on node failure
  • Fails over on loss of IP connectivity
  • Capability for failing over on loss of SAN
    connectivity
  • Limited command line administrative tools to fail
    over, query current status, etc.
  • Active/Active or Active/Passive
  • Simple resource group dependency model
  • Requires external tool for resource monitoring
  • SNMP monitoring

16
Linux-HA Release 2 capabilities
  • Built-in resource monitoring
  • Support for the OCF resource standard
  • Much Larger clusters supported ( 8 nodes)
  • Sophisticated dependency model with rich
    constraint support (resources, groups,
    incarnations, master/slave) (needed for SAP)
  • XML-based resource configuration
  • Configuration and monitoring GUI
  • Support for GFS cluster filesystem
  • Multi-state (master/slave) resource support
  • Initially - no IP, SAN monitoring

17
Release 2 Credits
  • Andrew Beekhof: CRM, CIB
  • Gouchun Shi: significant infrastructure
    improvements
  • Sun, Jiang Dong and Huang, Zhen: LRM, Stonithd
    and testing
  • Lars Marowsky-Bree: architecture, PHB ;-)
  • Alan Robertson: architecture, project
    leadership, original heartbeat code and testing

18
Linux-HA Release 1 Architecture
19
Linux-HA Release 2 Architecture (add TE and PE)

20
Resource Objects in Release 2
  • Release 2 supports resource objects, which can
    be any of the following:
  • Primitive Resources
  • OCF, heartbeat-style, or LSB resource agent
    scripts
  • Resource Incarnations: need n resource objects,
    somewhere
  • Resource groups: a group of resources with
    implied co-location and linear ordering
    constraints
  • Multi-state resources (master/slave)
  • Designed to model master/slave (replication)
    resources (DRBD, et al.)

21
Basic Dependencies in Release 2
  • Ordering Dependencies
  • start before (implies stop after)
  • start after (implies stop before)
  • Mandatory Co-location Dependencies
  • must be co-located with
  • cannot be co-located with
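
The ordering rules above ("start after" implies "stop before") can be sketched as a topological sort, with the stop order being the reverse of the start order. This is a minimal sketch, not the Release 2 implementation; the function names and resource names are hypothetical, and it relies on Python's standard graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def start_order(start_after: Dict[str, Set[str]]) -> List[str]:
    """Return a start order honoring every 'X starts after Y' dependency.

    `start_after` maps each resource to the resources it must start after.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(start_after).static_order())


def stop_order(start_after: Dict[str, Set[str]]) -> List[str]:
    """Stopping reverses starting: whatever started last is stopped first."""
    return list(reversed(start_order(start_after)))


if __name__ == "__main__":
    # Hypothetical resources: a database on a filesystem on a shared disk.
    deps = {"filesystem": {"disk"}, "database": {"filesystem"}}
    print("start:", start_order(deps))
    print("stop: ", stop_order(deps))
```

Only ordering is modeled here; co-location dependencies are a separate question of where resources run, not when they start.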

22
Resource Location Constraints
  • Mandatory Constraints
  • Resource Objects can be constrained to run on any
    selected subset of nodes. Default is none.
  • Preferential Constraints
  • Resource Objects can also be preferentially
    constrained to run on specified nodes by
    providing weightings for arbitrary logical
    conditions
  • The resource object is run on the node which has
    the highest weight (score)
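
The two kinds of location constraint can be combined in a small placement sketch: the mandatory constraint filters the candidate nodes, and the preferential weights pick among the survivors. This is an illustration under stated assumptions, not the Release 2 policy engine; the function name, the default score of 0 for unscored nodes, and the tie-breaking rule (first node listed wins) are all assumptions.

```python
from typing import Dict, List, Optional, Set


def choose_node(
    nodes: List[str],
    allowed: Set[str],
    scores: Dict[str, int],
) -> Optional[str]:
    """Pick the allowed node with the highest score, or None if none qualifies.

    nodes:   all live cluster nodes, in a stable order
    allowed: mandatory constraint, the subset the resource may run on
    scores:  preferential weights per node (assumed 0 when not listed)
    """
    candidates = [node for node in nodes if node in allowed]
    if not candidates:
        return None
    # max() returns the first maximal element, so ties go to the earlier node.
    return max(candidates, key=lambda node: scores.get(node, 0))


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    # node3 has the best score but is excluded by the mandatory constraint.
    print(choose_node(nodes, allowed={"node1", "node2"},
                      scores={"node1": 10, "node2": 50, "node3": 100}))
```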

23
Resource Incarnations
  • Resource Incarnations allow one to have a
    resource which runs multiple (n) times on the
    cluster
  • This is useful for managing
  • load balancing clusters where you want n of
    them to be slave servers
  • Cluster filesystems
  • Cluster Alias IP addresses

24
Resource Groups
  • Resource Groups provide a shorthand for
    creating ordering and co-location dependencies
  • Each resource object in the group is declared to
    have linear start-after ordering relationships
  • Each resource object in the group is declared to
    have co-location dependencies on each other
  • This is an easy way of converting release 1
    resource groups to release 2
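
The shorthand can be made explicit by expanding a group into the individual constraints it implies. This is a minimal sketch with a hypothetical function name and tuple representation, not the Release 2 configuration format.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[str, str]


def expand_group(members: List[str]) -> Tuple[List[Pair], List[Pair]]:
    """Expand an ordered resource group into explicit constraints.

    Returns (ordering, colocation), where:
      ordering   holds (later, earlier) pairs: 'later' starts after 'earlier'
      colocation holds every unordered pair of members that must share a node
    """
    ordering = [(later, earlier) for earlier, later in zip(members, members[1:])]
    colocation = list(combinations(members, 2))
    return ordering, colocation


if __name__ == "__main__":
    # Hypothetical group: an IP address, then a filesystem, then a database.
    ordering, colocation = expand_group(["ip", "filesystem", "database"])
    for later, earlier in ordering:
        print(f"{later} starts after {earlier}")
    for a, b in colocation:
        print(f"{a} must be co-located with {b}")
```

A group of n resources stands in for n - 1 ordering constraints and n(n - 1)/2 pairwise co-location constraints, which is why the shorthand is worth having.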

25
Multi-State (master/slave) Resources
  • Normal resources can be in one of two stable
    states
  • running
  • stopped
  • Multi-state resources can have more than two
    stable states. For example:
  • running-as-master
  • running-as-slave
  • stopped
  • This is ideal for modeling replication resources
    like DRBD

26
Advanced Constraints
  • Nodes can have arbitrary attributes associated
    with them in name=value form
  • Attributes have types: int, string, version
  • Constraint expressions can use these attributes
    as well as node names, etc., in largely arbitrary
    ways
  • Operators
  • =, !=, <, >,
  • defined(attrname), undefined(attrname),
  • colocated(resource id), not colocated(resource id)

27
Advanced Constraints (cont'd)
  • Each constraint is associated with a particular
    resource, and is evaluated in the context of a
    particular node.
  • A given constraint has a boolean predicate
    built from the expressions described above, and
    is associated with a weight and a condition.
  • If the predicate is true, then the condition is
    used to compute the weight associated with
    locating the given resource on the given node.
  • Supported conditions are (these distinctions
    may be unneeded?):
  • can: same as prefer with MAXINT weight
  • cannot: same as prefer with -MAXINT weight
  • prefer: positive weight
  • prefer not: same as prefer with negative weight
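
The evaluation described above can be sketched as follows: each constraint's predicate is tested against one node's attributes, and the condition turns a true predicate into a weight. The mapping of the four conditions to weights follows the list above. Everything else is an assumption for illustration: the class and function names, the use of sys.maxsize as MAXINT, summing the weights of several constraints, and treating "cannot" as an absolute veto.

```python
import sys
from dataclasses import dataclass
from typing import Callable, Dict, List

MAXINT = sys.maxsize  # assumed stand-in for the MAXINT named above

Attrs = Dict[str, object]  # one node's attributes in name=value form


@dataclass
class Constraint:
    predicate: Callable[[Attrs], bool]  # evaluated in the context of one node
    condition: str                      # "can", "cannot", "prefer", "prefer not"
    weight: int = 0                     # used only by "prefer" / "prefer not"


def effective_weight(constraint: Constraint) -> int:
    """Translate a condition into a weight, per the four conditions listed."""
    if constraint.condition == "can":
        return MAXINT
    if constraint.condition == "cannot":
        return -MAXINT
    if constraint.condition == "prefer":
        return abs(constraint.weight)
    if constraint.condition == "prefer not":
        return -abs(constraint.weight)
    raise ValueError(f"unknown condition: {constraint.condition!r}")


def score(constraints: List[Constraint], node_attrs: Attrs) -> int:
    """Score one resource on one node (assumed rule: sum, 'cannot' vetoes)."""
    total = 0
    for constraint in constraints:
        if not constraint.predicate(node_attrs):
            continue  # a false predicate contributes nothing
        weight = effective_weight(constraint)
        if weight == -MAXINT:
            return -MAXINT  # assumed: a matching "cannot" overrides everything
        total += weight
    return max(-MAXINT, min(MAXINT, total))  # clamp to the MAXINT range


if __name__ == "__main__":
    node = {"memory_mb": 4096, "site": "east"}  # hypothetical attributes
    rules = [
        Constraint(lambda a: a.get("memory_mb", 0) >= 2048, "prefer", 100),
        Constraint(lambda a: a.get("site") == "west", "prefer not", 50),
    ]
    print(score(rules, node))
```

Scoring every node this way and running the resource on the highest-scoring one matches the "highest weight (score)" rule given for preferential location constraints.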

28
Security Considerations
  • Cluster: a computer whose backplane is the
    Internet
  • If this isn't frightening, you don't
    understand...
  • You may think you have a secure cluster network
  • You're probably mistaken now
  • You will be in the future

29
Secure Networks are Difficult Because...
  • Security is not often well-understood by admins
  • Security is well-understood by black hats
  • Network security is easy to breach accidentally
  • Users bypass it
  • Hardware installers don't fully understand it
  • Most security breaches come from trusted staff
  • Staff turnover is often a big issue
  • Virus/Worm/P2P technologies will create new holes
    especially for Windows machines

30
Security Advice
  • Good HA software should be designed to assume
    insecure networks
  • Not all HA software assumes insecure networks
  • Good HA installation architects use dedicated
    (secure?) networks for intra-cluster HA
    communication
  • Crossover cables are reasonably secure; all else
    is suspect ;-)

31
References
  • http://linux-ha.org/
  • http://linux-ha.org/download/
  • http://wiki.linux-ha.org/NewHeartbeatDesign
  • New Web site content (in progress)
  • http://linux-ha.trick.ca/ (pretty - offline!)
  • http://wiki.linux-ha.org/ (editable)
  • www.linux-mag.com/2003-11/availability_01.html

32
Legal Statements
  • IBM is a trademark of International Business
    Machines Corporation.
  • Linux is a registered trademark of Linus
    Torvalds.
  • Other company, product, and service names may be
    trademarks or service marks of others.
  • This work represents the views of the author and
    does not necessarily reflect the views of the IBM
    Corporation.