IBM BigInsights 2.1: Understanding the role of IBM Platform Computing and GPFS FPO in the STG BigInsights 2.1 Reference Architecture

Slides: 43
Provided by: amck3
Transcript and Presenter's Notes

1
IBM BigInsights 2.1 Understanding the role of
IBM Platform Computing and GPFS FPO in the STG
BigInsights 2.1 Reference Architecture
Gord Sissons, Steve Hurley, Chris Porter, Blane
Rockafellow
2
Agenda
  • About the BigInsights 2.1 HW reference
    architecture
  • Solution components
  • Key BigInsights Advantages
  • Platform Computing Products
  • IBM Platform Symphony
  • IBM GPFS FPO
  • IBM Platform Cluster Manager

3
The IBM System x BigInsights Reference
Architecture
  • One of a family of big data reference
    architectures from IBM
  • Enables fast, risk-free deployment with validated
    configurations
  • Flexibility to accommodate different client needs
  • Value-added software components can be
    implemented with Lab Services
  • Pre-Assembled racks
  • Customized to your needs
  • Integrated and tested
  • Supported as a solution
  • Tailored to your needs
  • Start small and grow
  • Easy to order
  • Easy to manage

4
IBM BigInsights Reference Architecture: Hardware
Incorporating a balance of value, enterprise and
performance options
Management Node
  • x3550 M4 with two E5-2650 2GHz 8-core CPUs
  • 128GB RAM (16x 8GB 1600MHz RDIMM)
  • Four 600GB 2.5" HDD (OS)
  • Two dual-port 10GbE (data)
  • Dual-port 1GbE (mgmt)
Data Node
  • x3630 M4 with two E5-2450 2.1GHz 8-core CPUs
  • 48GB RAM (6x 8GB 1600MHz RDIMM)
  • Two 3TB 3.5" HDD (OS/app)
  • Twelve 3TB 3.5" HDD (data), optional 4TB HDD upgrade
  • Dual-port 10GbE (data)
  • Dual-port 1GbE (mgmt)
Configuration | Starter | Half Rack w/ Mgmt Nodes | Full Rack w/ Mgmt Nodes | Full Data Node Rack
Available storage (3TB/4TB drives) | 108TB / 144TB | 324TB / 432TB | 648TB / 864TB | 720TB / 960TB
Raw data space (3TB/4TB drives) | 27TB / 36TB | 81TB / 108TB | 162TB / 216TB | 180TB / 240TB
Mgmt Nodes / Data Nodes | 1 Mgmt / 3 Data | 3 Mgmt / 9 Data | 3 Mgmt / 18 Data | 0 Mgmt / 20 Data
Switches | 1 x 10GbE / 1 x 1GbE | 1 x 10GbE / 1 x 1GbE | 1 x 10GbE / 1 x 1GbE | 1 x 10GbE / 1 x 1GbE
The number of management nodes required varies
with cluster size and workload. For multi-rack
configurations, select a combination of these
racks as needed.
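The capacity figures for each rack configuration follow from simple arithmetic. The sketch below is a minimal illustration, assuming twelve data drives per data node (as in the Data Node specification), 3TB drives with the optional 4TB upgrade, and a raw-data figure equal to one quarter of the available storage. That one-quarter ratio is an assumption inferred from most of the listed values rather than something stated on the slide, and the function name is hypothetical.

```python
# Hypothetical helper for illustration; not part of any IBM tool.
DATA_DRIVES_PER_NODE = 12  # twelve data HDDs per data node (from the spec)
RAW_DATA_DIVISOR = 4       # assumption inferred from the listed ratios


def rack_capacity_tb(data_nodes: int, drive_tb: int) -> tuple[int, int]:
    """Return (available_storage_tb, raw_data_space_tb) for a configuration."""
    available = data_nodes * DATA_DRIVES_PER_NODE * drive_tb
    return available, available // RAW_DATA_DIVISOR


if __name__ == "__main__":
    configs = [
        ("Starter", 3),
        ("Half rack", 9),
        ("Full rack", 18),
        ("Full data node rack", 20),
    ]
    for name, nodes in configs:
        for drive_tb in (3, 4):
            available, raw = rack_capacity_tb(nodes, drive_tb)
            print(f"{name}, {drive_tb}TB drives: {available}TB available, {raw}TB raw data")
```

For multi-rack configurations, the same function can be applied to the total number of data nodes across all racks.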
5
IBM BigInsights Reference Architecture: Software
Your choice of best-of-breed and open-source
components
Optional items should be sold with Lab
Services to ensure proper installation and
configuration
6
The IBM Big Data Platform
A Comprehensive solution for big data analytics
  • Comprehensive platform
  • Data at rest, data in motion
  • Structured, un-structured, semi-structured
  • Extensive library of data connectors
  • Rich development tools
  • Application accelerators
  • Web-based management console

7
The IBM Big Data Platform
8
Complexity - A Key Customer Challenge
Multiple distributed software components, often
deployed on separate infrastructure: expensive
to deploy, expensive to manage, expensive to
evolve
9
Cluster sprawl drives cost and inefficiency
  • Operational challenges are looming
  • Fast evolving ecosystem
  • Multiple versions and distributions
  • Many inter-dependencies
  • Data management challenges (HDFS)
  • Application lifecycle management concerns

From Mike Gualtieri, Forrester Research
10
A smarter, consolidated infrastructure
Workload Manager(s)
Multi-tenant shared service environment
Resource Orchestration
Provisioning Management
Enterprise Storage
11
IBM Platform Symphony
Understanding the advantage
12
IBM Platform Symphony
  • A heterogeneous grid management platform
  • A high-performance SOA middleware environment
  • Supports diverse compute- and data-intensive
    applications
  • Compute- and data-intensive ISV analytic
    applications
  • In-house analytic applications (C/C++, C#/.NET,
    Java, Excel, R, etc.)
  • Optimized low-latency Hadoop compatible run-time
  • Can be used to launch, persist and manage
    non-grid aware application services
  • React instantly to time-critical requirements
  • Production proven multi-tenancy with resource
    sharing capabilities
  • Embedded single-tenant license in InfoSphere
    BigInsights 2.1

13
Symphony brings unique capabilities to Big Data
  • Performance
  • Performance advantages for a variety of
    MapReduce workloads; boost productivity and
    reduce or avoid cost
  • Resource sharing
  • Share infrastructure among departments and across
    multiple Hadoop and non-Hadoop applications to
    maximize efficiency and reduce cost
  • Scheduling agility
  • Proportional, priority-based resource allocation,
    SLA guarantees, and fast configurable pre-emption
    ensure that Symphony can respond instantly to
    time-critical workloads
  • SLA management
  • Removes a major barrier to resource sharing
    helping organizations evolve to a shared service
    model to maximize flexibility and reduce
    infrastructure costs
  • Reporting and Analytics
  • Optional Platform Analytics add-on enables
    organizations to monitor granular resource usage
    for charge-back accounting and improved capacity
    planning
  • Reliability
  • Ensure reliability of core system services, and
    make individual Hadoop jobs recoverable to avoid
    down-time, and ensure that critical reporting
    windows and SLAs are met

IBM Platform Symphony Advanced Edition license
required
14
IBM Platform Symphony
Performance
  • Low-latency SOA workload manager
  • Performance gains vary between 40% and 10x
    depending on workload
  • Audited results (1) show an average 7x advantage
    on social media workloads with a 50x advantage in
    raw scheduling performance
  • Single-tenant (2) Symphony license included in
    BigInsights 2.1 Enterprise Edition
  • Many performance enhancements
  • Push-model for low-latency scheduling
  • Shuffle-stage optimizations
  • Use of native APIs for JAR file movement
  • Generic slots to fully utilize cluster

Comparative sleep test based on methodology to
measure scheduling performance discussed at
Hadoop World 2011. Compares Hadoop 0.20.2, Hadoop
1.0.1 (with 0.3 second heartbeat) and Hadoop
1.0.1 accelerated by IBM Platform Symphony.
http://www.slideshare.net/cloudera/hadoop-world-2011-hadoop-and-performance-todd-lipcon-yanpei-chen-cloudera
1 - Audited STAC Report available for download:
http://www-03.ibm.com/systems/technicalcomputing/platformcomputing/products/symphony/highperfhadoop.html
2 - The embedded Symphony license entitles a user
to run only a single instance of BigInsights. No
limits are placed on concurrently executing
BigInsights workloads. Customers can purchase
Platform Symphony Advanced Edition to support
multiple grid consumers (tenants).
15
IBM Platform Symphony
Resource sharing
  • Share resources among heterogeneous workloads
    (Hadoop and non-Hadoop)
  • Up to 300 concurrent job trackers
  • Flexible application profiles
  • Support multiple IBM and third party analytic
    applications on a shared infrastructure
  • InfoSphere Streams, IBM DataStage, SPSS, SAS,
    MathWorks MATLAB, R, etc.

16
IBM Platform Symphony
Scheduling agility
  • Agile scheduling ensures that time critical
    workloads start and finish fast
  • Optionally give priority to interactive jobs
    (e.g. BigSheets, Big SQL)
  • Resource allocations shift instantly based on
    priority adjustments and proportional allocations
    at run-time
  • The generic slot model ensures that the cluster
    can be kept 100% busy
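Proportional allocation can be illustrated with a short sketch. This is not Symphony's scheduler or API; it is a generic largest-remainder split, and the function name, consumer names and share weights are all hypothetical. It shows how a fixed pool of slots might be divided by share weight so that every slot is assigned, and how changing the weights at run time changes the split.

```python
def proportional_allocation(total_slots: int, shares: dict[str, int]) -> dict[str, int]:
    """Split total_slots among consumers in proportion to their share weights."""
    total_share = sum(shares.values())
    if total_share <= 0:
        raise ValueError("at least one consumer needs a positive share")

    # Integer floor of each consumer's exact entitlement.
    alloc = {name: total_slots * weight // total_share for name, weight in shares.items()}

    # Give leftover slots to the consumers with the largest fractional
    # remainders, so no slot in the pool is left unassigned.
    leftover = total_slots - sum(alloc.values())
    by_remainder = sorted(
        shares,
        key=lambda name: (total_slots * shares[name]) % total_share,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical consumers: interactive work weighted 3:1 over batch.
    print(proportional_allocation(100, {"interactive": 3, "batch": 1}))
    # After a run-time adjustment to equal weights, the split changes.
    print(proportional_allocation(100, {"interactive": 1, "batch": 1}))
```

A real workload manager would also handle pre-emption of running tasks and minimum guarantees, which this sketch leaves out.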

17
IBM Platform Symphony
SLA management
  • Guarantee minimum quality of service
  • Time-variant sharing policies
  • Multiple resource sharing models
  • Granular, directed sharing
  • Configurable pre-emption policies
  • Maintain multiple versions of application
    services to simplify life-cycle management
  • Share resources between Dev, Test, Production
    and QA application instances

18
IBM Platform Symphony
Reporting and Analytics
  • Comprehensive reporting built-in
  • Monitor resource allocations to tune sharing
  • Ensure business SLAs are being met
  • Optional Platform Analytics add-on for OLAP
    analysis supporting chargeback accounting and
    improved capacity planning

19
IBM Platform Symphony
Reliability
  • No single point of failure
  • All services highly available
  • Hadoop jobs recoverable in the event of failure
  • Ensure deadlines and batch-windows are met
  • Service replay debugger helps rapidly diagnose
    problems that occur in production at scale
  • Production proven at scale

20
IBM GPFS
Bringing new capabilities to IBM BigInsights
21
GPFS bringing new capabilities to BigInsights
  • POSIX compliance
  • While HDFS is a single-purpose file system, GPFS
    implements the POSIX specification natively,
    meaning that multiple applications can share the
    same file system, improving flexibility and
    avoiding data redundancy
  • File system reliability
  • GPFS FPO eliminates the name node as a single
    point of failure improving file system
    reliability and recoverability
  • Flexible storage configuration
  • Employ the right storage architecture depending
    on the application need, using shared nothing
    storage with n-way block replication for Hadoop
    workloads, and traditional GPFS storage for
    non-Hadoop workloads to improve flexibility and
    minimize cost
  • Enterprise features
  • GPFS FPO and GPFS can co-exist on the same
    cluster, bringing advanced features to Hadoop
    environments including active file management,
    information lifecycle management and file system
    snapshots to simplify the management of large
    storage infrastructure
  • Support from the source
  • Avoid the risk of storing critical data on an
    open-source file system with limited support. IBM
    owns the codebase for GPFS and can provide
    mission critical support

22
GPFS bringing new capabilities to BigInsights
POSIX file system
Hadoop MapReduce applications
Native OS applications
A single filesystem for both MapReduce and
non-MapReduce applications
  • Native POSIX file system
  • Avoid workarounds like FUSE
  • Avoid needless data movement and replication
  • Variable block-sizes provide good performance
    across diverse types of workloads
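In practice, a POSIX file system lets an ordinary program read data with standard file calls rather than a Hadoop-specific client. The sketch below is a minimal illustration, assuming a hypothetical mount point such as /gpfs/bigdata. The demo substitutes a temporary directory so it runs on any machine, and the file name and contents are made up.

```python
import os
import tempfile


def count_lines(path: str) -> int:
    """Count lines using plain file I/O, with no Hadoop client involved."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def demo(mount_point: str) -> int:
    # On a real cluster, mount_point would be the directory where the file
    # system is mounted (hypothetical example: /gpfs/bigdata).
    path = os.path.join(mount_point, "events.log")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("login\nclick\nlogout\n")
    return count_lines(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```

The same file could then be used as MapReduce input without copying it into a separate file system, which is the data-movement saving the slide refers to.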

23
GPFS bringing new capabilities to BigInsights
File system reliability
  • GPFS FPO avoids the need for a central namenode,
    a common failure point in HDFS
  • Avoid long recovery times in the event of name
    node failure
  • Pipelined replication for efficient storage of
    block replicas in GPFS FPO environment
  • Boost performance for meta-data intensive
    applications where the name-node can emerge as a
    bottleneck.

Metadata is striped across GPFS FPO nodes,
providing better reliability and avoiding the
need for primary and secondary name nodes
24
GPFS bringing new capabilities to BigInsights
Flexible storage configuration
Shared nothing storage - GPFS FPO
Switched Fabric
  • With distributed metadata, GPFS FPO avoids the
    need for a central namenode, a common failure
    point in HDFS environments
  • Avoids long recovery times in the event that the
    namenode fails and metadata needs to be recovered
    from the secondary name node
  • Pipelined replication for efficient storage of
    block replicas in GPFS FPO environment

GPFS Server
GPFS Server
Shared storage - GPFS
25
GPFS bringing new capabilities to BigInsights
Enterprise features
26
IBM Platform Cluster Manager
Understanding the advantage
27
IBM Platform Cluster Manager Advanced Edition
Cluster and Grid Provisioning and Management
Provisioning and management of distributed
clusters, including self-service cluster creation
and management by multiple user groups
Platform Cluster Manager
28
IBM Platform Cluster Manager Advanced Edition
  • Overview
  • Multitenant self-service creation, flexing and
    management of multiple analytics and high
    performance computing (HPC) clusters
  • Key Capabilities
  • Rapid deployment of heterogeneous analytics and
    HPC clusters
  • Secure multi-tenant environment
  • Dynamically grow and shrink clusters
  • Provision physical and/or virtual machines
  • Automates self-service cluster delivery and
    administration
  • Consolidates infrastructure from multiple
    clusters enabling analytics and HPC cloud
    environments
  • Benefits
  • Faster time to full system readiness
  • Single interface for integrated management and
    monitoring
  • Reduces time to full user productivity
  • Reduces IT costs with dramatic gains in
    infrastructure utilization

29
IBM Platform Cluster Manager Advanced Edition
IBM Platform Cluster Manager Advanced Edition
Grid Instance 1
Grid Instance 2
Grid Instance 3
Grid Instance 4
Life Sciences / EDA / CFD / CAE
Life Sciences / EDA / CFD / CAE
Open-source Apache Hadoop
IBM InfoSphere BigInsights
IBM Platform LSF
3rd Party Schedulers
IBM Platform Symphony
IBM Platform Symphony
30
IBM Platform Cluster Manager Main Capabilities
  • Multiple analytics and HPC clusters
  • Rapid provisioning: get the clusters you need in
    minutes instead of hours or days
  • Heterogeneous: deploy LSF, Symphony, Grid Engine,
    PBS, Hadoop, most 3rd party workload managers
  • Dynamically grow and shrink clusters
  • Support expansion and shrinking of clusters as
    needed over time.
  • Based on policy, calendar and user intervention
  • Share resources between clusters
  • Multitenant
  • Account separation, different service catalogs,
    resource limits, per account reporting
  • Dynamic VLAN creation
  • Authenticated access to portal, service catalog,
    provisioned machines and storage
  • Physical, virtual and hybrid clusters
  • Choose the right resource to match the workload
  • Bare metal provisioning
  • Switch management
  • GUI for multiple xCAT instances
  • Self-service delivery and administration
  • Clusters are available on demand when they are
    needed
  • Reduce/eliminate the need to wait for someone to
    act
  • Consolidate
  • Breaks down silos and provides a larger resource
    pool

31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
IBM InfoSphere BigInsights, Platform Computing
Extending the capabilities of IBM BigInsights
  • Platform Symphony and GPFS provide significant
    advantages
  • Improved performance
  • More efficient use of infrastructure
  • Diverse, concurrent workloads
  • Dynamic resource allocation
  • Fast workload pre-emption
  • Sophisticated multi-tenancy
  • Ease of management
  • Guaranteed service levels

Analytic Applications
BI / Reporting
Exploration / Visualization
Functional App
Industry App
Predictive Analytics
Content Analytics
Big Data Platform
Systems Management
Application Development
Visualization Discovery
Accelerators
Data Warehouse
Hadoop System
Stream Computing
Information Integration Governance
Agile, multi-tenant shared infrastructure
35
BigInsights, Platform Symphony and GPFS
Providing competitive advantage for Big Data
infrastructure
Capability | Cloudera CDH | EMC / GP UAP | MapR | Hortonworks | Open Source | BigInsights Platform, GPFS FPO
Low-latency scheduling | Impala only | No | Some features | No | No |
Heterogeneous workloads | No | No | No | No | No |
Fast pre-emptive scheduling | No | No | No | No | No |
Time-variant SLA guarantees | No | No | Some features | No | No |
Usage Accounting Analytics add-on | No | No | No | No | No |
Recoverable Hadoop jobs | No | No | No | No | No |
POSIX file system No NFS only No No
Enterprise file system features No No No
36
BigInsights, Platform Symphony and GPFS
Providing competitive advantage for Big Data
infrastructure
Capability | Cloudera CDH | EMC / GP UAP | MapR | Hortonworks | Open Source | BigInsights Platform, GPFS FPO
SQL Support | Impala | Pivotal | Drill | Via open source only | Impala, Drill |
BigSheets | No | No | No | No | No |
External Data Connectors GP DB built-in No No No
Accelerators | No | No | No | No | No |
Complete HW Software solution Through HW partners No No No
Single vendor support Through HW partners No No No
Full-featured private cloud management | No | No | No | No | No |
37
Summing up
IBM BigInsights, Platform Computing, GPFS FPO
  • Single-tenant license for Platform Symphony
    included in BI 2.1
  • Upgrade to Symphony Advanced Edition for resource
    sharing features
  • Enterprise-class POSIX file system
  • Advanced cluster provisioning, private cloud
    management
  • The most complete infrastructure solution for Big
    Data analytics

38
(No Transcript)
39
Additional Slides
40
Latency matters in Big Data Analytics
Being more efficient means getting more work done
with fewer resources
Each engine polls broker 5 times per second
(configurable)
Other Grid Server
Broker
Client
Engines
Send work when engine ready
Network transport (client to broker)
Network transport (broker to engine)
Compute Result
Post result back to broker
Wait for engine to poll broker

Time
Serialize input data
De-serialize Input data
Serialize result
Broker Compute time
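The cost of a polling model like the one described on this slide can be estimated with a little arithmetic. The sketch below is a minimal illustration, assuming work arrives at a uniformly random moment within a poll interval, so the average wait is half the interval. The function names and the 10 ms task used in the demo are illustrative assumptions, not measurements from the presentation.

```python
def mean_poll_wait_s(polls_per_second: float) -> float:
    """Average time a task waits for the next poll, assuming uniform arrival."""
    if polls_per_second <= 0:
        raise ValueError("polls_per_second must be positive")
    interval_s = 1.0 / polls_per_second
    return interval_s / 2.0


def compute_efficiency(task_compute_s: float, wait_s: float) -> float:
    """Fraction of elapsed time an engine spends on useful computation."""
    return task_compute_s / (task_compute_s + wait_s)


if __name__ == "__main__":
    wait = mean_poll_wait_s(5)  # engines poll the broker 5 times per second
    print(f"average wait per task: {wait * 1000:.0f} ms")
    # Hypothetical short task of 10 ms compute time.
    print(f"efficiency for a 10 ms task: {compute_efficiency(0.010, wait):.1%}")
```

The shorter the task, the larger the share of time lost to waiting, which is why scheduling latency matters most for workloads made of many small tasks.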
41
Benchmark: SWIM Facebook 2010 Workload
7.5x Faster
42
Understanding the advantage
Hadoop 1.1.1
  • Symphony 6.1 can schedule 50x more tasks per
    second
  • Hadoop results taken from the Hadoop World 2011
    performance presentation by Lipcon and Chen