Transcript and Presenter's Notes

Title: FORTIS DATA SHARING Implementation


1
FORTIS DATA SHARING Implementation
  • IMS-DB2 GUIDE - 28/03/2002
  • Frans De Brabanter/Marc Piron

2
CONTENT
  • FORTIS history: WHY //-sysplex
  • Data sharing theory
  • Implementation phases and experiences
  • Near-term future
  • Contacts
  • marc.piron@fortisbank.com
  • franciscus.debrabanter@fortisbank.com
  • and many others ...

3
FORTIS history
  • 1998: merger ASLK-CGER + GENERALE BANK
  • IS-target: by mid-September 2001, merger of the 2 data centers
  • choice of the target platform: ex-A becomes the Fortis platform
  • choice of the target applications: target is a mix of ex-A and ex-G
    applications (target ex-G applications to be moved to the FORTIS platform)
  • move split up into 10 phases, taking into account functional dependencies
  • during the move: technical changes (to adapt to Fortis naming conventions,
    security, ...), but NO logical changes
  • target: by the beginning of 2001, all target applications running on the
    Fortis platform, with the ex-A data
  • mid-September: merge ex-A and ex-G data + kill the ex-G platform

4
FORTIS migration plan
  • Ex-A Production Environment
  • SYSA: 1 IMS/DB2/MQS
  • SYSC - infocenter: 1 DB2 (pure non-IMS)
  • Ex-G Production Environment
  • SI40 normal production: 1 IMS/DB2/MQS
  • SI70 non-brick production (frontend): 1 IMS/DB2/MQS
  • SI90 infocenter production (IMS + non-IMS): 1 IMS/DB2
  • Target: move SIxx to SYSA
  • SI40 by mid-September 2001
  • SI70/SI90 by mid-2002

5
FORTIS migration plan
  • Limited sysplex experience in ex-G
  • Decision for 2-way //-sysplex for reasons of
  • CAPACITY
  • IMS WADS? Cfr. next foil
  • Logging?
  • Limit on number of MIPS?
  • Limit on memory?
  • BIG MOVE is a one-shot, not a gradual process: the risk of a
    WAIT AND SEE approach is not acceptable
  • after integrating SI40, still integration to do of SI70 and SI90
  • trend of workload: in general always UP

6
FORTIS migration plan
7
FORTIS migration plan
  • Decision for 2-way //-sysplex for reasons of
  • AVAILABILITY
  • in case of PROBLEMS/DISASTER on one system, the other system
    should continue to work; presuming there is good session
    balancing, only half of the connections are impacted
  • rolling implementation of maintenance and IPLs
  • still a potential problem if an outage takes too long and all
    the workload has to be taken over by one system

8
FORTIS migration plan
  • Which kind of //-sysplex: Partial or Full?
  • Partial (Vertical splitting)
  • effort to isolate clusters of applications
  • and make each cluster run on one system in the sharing group
  • less access to the coupling facility, less overhead --> attractive
  • BUT
  • is clustering stable over time?
  • how to choose and document?
  • what if machine power/cluster load changes: rebalance?
  • users need access to multiple clusters (and IMS systems)
  • if one system is down, a whole application is down

9
FORTIS migration plan
  • Which kind of //-sysplex: Partial or Full?
  • Full (Horizontal splitting)
  • technically possible: IMS V6 offers VTAM Generic Resources
  • transparent: an application runs where the power is (WLM)
  • simple approach
  • only ONE question: is it feasible? (not that many references)
  • Decision: FULL (although with a limited set of affinities)
  • preferred: half of the users out instead of half the
    applications when a system fails
  • links at inter-application level everywhere: extremely difficult
    to isolate clean and well-defined clusters of applications

10
DB2 - sharing overview (cfr IBM)
11
DB2 Locking overview
  • Local Locks: business as usual
  • Global Locks: data sharing locks, saved in the CF
  • ALWAYS put in the CF: PARENT LOCKS (at tablespace/partition level)
  • SOMETIMES put in the CF: CHILD LOCKS
  • table - for segmented tablespaces
  • page
  • record

12
DB2 Locking overview
  • Logical Locks
  • control concurrency
  • can be local or global
  • PARENT locks: always global
  • CHILD locks: global or local
  • are associated with programs
  • Physical Locks
  • control inter-DB2 coherency and consistency of pages
  • always global
  • are associated with DB2 subsystems
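
To make the L-lock/P-lock split concrete, here is a minimal Python sketch
(illustrative only; the class and field names are our own, not DB2 structures):
L-locks belong to a unit of work, P-locks to a DB2 member.

# Illustrative sketch only: the essential difference between L- and P-locks.
from dataclasses import dataclass

@dataclass(frozen=True)
class LLock:                  # logical lock: controls concurrency
    owner_program: str        # held on behalf of a program/transaction
    resource: str             # tablespace, table, page or record
    mode: str                 # e.g. IS, IX, S, X
    global_: bool             # parent L-locks always global, child locks not always

@dataclass(frozen=True)
class PLock:                  # physical lock: inter-DB2 page coherency
    owner_member: str         # held by a DB2 subsystem, e.g. "DB2A"
    pageset: str              # open tablespace/indexspace
    mode: str                 # negotiated between members, e.g. IS or IX

print(LLock("PGM1", "TS1.PART3", "IX", True))
print(PLock("DB2A", "TS1.PART3", "IX"))   # P-locks are always global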

13
DB2 Locking logical/physical
14
DB2 L-Locking overview
  • The Parent Locks held in the CF for a particular tablespace or
    partition determine which Child Locks are propagated to the CF:

  Parent Locks (DB2A / DB2B)    Child Locks propagated (DB2A / DB2B)
  S / S                         None / None
  S / X                         All / X
  X / X                         All / All
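
The table can be read as a simple function; a Python sketch of it (purely
illustrative, mirroring only the three combinations shown above):

def child_propagation(parent_db2a: str, parent_db2b: str):
    # Parent L-lock modes held in the CF decide which child L-locks
    # must also be propagated to the CF (table above).
    table = {
        ("S", "S"): ("None", "None"),   # compatible readers: no child locks in CF
        ("S", "X"): ("All", "X"),
        ("X", "X"): ("All", "All"),     # both updating: full propagation
    }
    return table[(parent_db2a, parent_db2b)]

print(child_propagation("S", "X"))      # ('All', 'X')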

15
DB2 L-Locking overview (cfr IBM)
16
DB2 L-Locking overview
  • Detail: Lock Table entry - each entry holds an exclusive-owner
    field plus a shared-lock-status bit map (one bit per member):

  Exclusive owner    Shared lock status
  00                 1 / 0 / 0 / 0 / 0 / 0 / 0 / 0
  05                 ...
  01                 ...
  10                 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0
  ...
17
DB2 L-Locking overview
  • Types of L-Lock Contention
  • False Contention
  • 2 DB2s request incompatible locks on two different database
    objects belonging to the same hash class (see the sketch after
    this list)
  • XES Contention
  • 2 DB2s request, f.i., an IS- and an IX-lock, but these are
    changed to an S- and an X-lock respectively by XES (which only
    understands S and X)
  • Real (IRLM) Contention
  • 2 DB2s request incompatible locks on the same database object
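
False contention is easiest to see with a toy hash function; the sketch below
is illustrative only (the real hashing into the CF lock table is internal to
XES/IRLM):

# Toy model of FALSE contention: two different database objects fall
# into the same hash class (lock table entry), so a conflict is
# suspected although the locks are on different objects.
LOCK_TABLE_ENTRIES = 16        # deliberately tiny; real lock tables are far larger

def hash_class(resource: str) -> int:
    return sum(map(ord, resource)) % LOCK_TABLE_ENTRIES

seen = {}
for i in range(100):
    resource = f"TS1.PAGE.{i:06d}"
    h = hash_class(resource)
    if h in seen:
        print(f"false contention: {seen[h]} and {resource} share hash class {h}")
        break
    seen[h] = resource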

18
DB2 L-Locking overview
  • Resolution of L-Lock Contention
  • False Contention
  • solved by XES informing the IRLMs to grant the
    lock
  • XES Contention
  • solved by XES, by giving control to the IRLM
    contention exit of the Global Lock Manager
  • Real Contention
  • Wait ... until the other process terminates

19
DB2 P-Locking overview
  • Depends on the OPEN/CLOSE status of the tablespace/indexspace
  • At Open page set
  • from None --> Read Only ( _ --> IS)
  • At First update
  • from Read Only --> Read/Write (IS --> IX)
  • At Pseudo Close (no update for PCLOSEN/PCLOSET)
  • from Read/Write --> Read Only (IX --> IS)
  • At Physical Close (no activity for PCLOSEN/PCLOSET + CLOSE YES +
    pseudo closed)
  • from Any --> None (IS/IX --> _)
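
These transitions form a small state machine; a sketch (illustrative only,
using None/IS/IX as above):

# Sketch of the P-lock mode transitions listed above.
TRANSITIONS = {
    ("open",           None): "IS",   # open page set: None -> Read Only
    ("first_update",   "IS"): "IX",   # first update: Read Only -> Read/Write
    ("pseudo_close",   "IX"): "IS",   # no update for PCLOSEN/PCLOSET
    ("physical_close", "IS"): None,   # no activity: any mode -> None
    ("physical_close", "IX"): None,
}

state = None
for event in ("open", "first_update", "pseudo_close", "physical_close"):
    state = TRANSITIONS[(event, state)]
    print(f"{event:15s} -> P-lock mode {state}")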

20
DB2 P-Locking overview
  • P-locks are kept by each individual IRLM
  • Are promoted to the CF
  • when promoted to the CF, conflicting P-locks can be detected
  • solved by a process of negotiation
  • P-lock mode changes
  • Result of negotiation: triggering/stopping the Inter-DB2
    Read/Write (IDRW) status (page set is becoming Group Buffer Pool
    Dependent)
  • GBP behaviour: FORCE AT COMMIT

21
DB2 P-Locking and GBPools
22
Example flow in GBP (cfr IBM)
23
DB2 Group Buffer Pools (cfr IBM)
24
DB2 Group Buffer Pools (cfr IBM)
25
DB2 Group Buffer Pools (cfr IBM)
26
DB2 Group Buffer Pools (cfr IBM)
27
DB2 Group Buffer Pools
  • Are divided into 2 parts: directory + data
  • Cross-invalidation (XI) reasons
  • the CF receives an updated version of a page from DB2A; that same
    page must be marked invalid in the local buffer pool of DB2B
    (if DB2B wants a fresh copy, it must be read from the CF)
  • the number of directory entries is too small
  • when all directory entries are exhausted and a new page must be
    registered in the CF
  • one of the existing directory entries (for a clean page) is chosen
  • the DB2s having the page referenced by that directory entry
    receive an XI signal for that page
  • the new page is registered in the freed directory entry
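
A sketch of the directory-reclaim case (illustrative model only; it ignores
castout and treats every cached page as clean):

# When the directory is full and a new page must be registered, a
# directory entry for a clean page is reclaimed and every member that
# has that page cached locally gets a cross-invalidation (XI) signal.
class GroupBufferPool:
    def __init__(self, directory_entries: int):
        self.capacity = directory_entries
        self.directory = {}              # page -> members with a local copy

    def register(self, page: str, member: str):
        if page not in self.directory and len(self.directory) >= self.capacity:
            victim = next(iter(self.directory))
            for m in self.directory.pop(victim):
                print(f"XI signal to {m}: local copy of {victim} is now invalid")
        self.directory.setdefault(page, set()).add(member)

gbp = GroupBufferPool(directory_entries=2)
gbp.register("PAGE1", "DB2A")
gbp.register("PAGE2", "DB2B")
gbp.register("PAGE3", "DB2A")   # directory full: reclaim + XI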

28
P-locks and Row Locking (cfr IBM)
29
DB2 sharing managing
  • Managing roles, accomplished by each DB2
  • the Global Lock Manager is the DB2 subsystem generating the first
    Update-lock on the resource; it resolves conflicting lock
    situations
  • the GBP Structure owner is the DB2 subsystem which first connects
    to the GBP; it monitors the GBP-level threshold and constructs a
    list of page names to be cast out
  • triggers the GBP checkpoint
  • the pageset castout owner is the DB2 subsystem which first
    updates the pageset; ownership is reassigned to another updating
    DB2 at pseudo/physical close
  • castout goes via private buffers of the pageset castout owner

30
IMS sharing overview (cfr IBM)
31
IMS/DB2 use of Cache in CF
  • STORE-IN CACHE
  • DEDB/VSO
  • a read of a VSO CI ALWAYS brings the CI into local storage from
    DASD and caches it in the CF
  • PRELOAD
  • DB2
  • pages read SOMETIMES promoted to the CF: update on a
    GBP-dependent TS
  • GBPCACHE ALL
  • STORE-THROUGH CACHE
  • OSAM - not activated
  • DIRECTORY-ONLY CACHE
  • OSAM + VSAM

32
IMS sharing behaviour
  • IMS
  • IMS V6 installed --> possibility of DEDB/VSO
  • impact on the cache structures
  • a couple of heavily used DBs: pseudo-SPA / branch protocol
  • use of IRLM will trigger BLOCK LOCKs
  • used to serialize updates to a block (CI) by different IMS
    systems, for Full Function (will not replace existing database
    record locks)
  • --> additional overhead + impact on the lock structures
  • IBM message: do not panic, but watch out for BAD applications -
    they can only get WORSE; pay special attention to deadlocks
  • ONE advantage: by putting SHRLVL 3 on the datasets, IMS begins
    notifying ALL the locks to the Lock Structure in the CF --> the
    effect on the Lock Structure can easily be measured in the
    starting phase

33
IMS sharing behaviour
  • IMS - potential reasons for WORSE behaviour of the Full Function
    applications
  • For Full Function DB updates, a new type of lock: Block Locks
    (OSAM, VSAM ESDS + VSAM KSDS), also for pointer updates
  • Block Locks: always kept until sync point, always private
    attribute
  • a Block Lock always prohibits concurrent updates from 2 IMSs to
    the same CI (<--> DB record lock in a non-sharing environment)
  • Block Locks: even in 1 IMS system, concurrent erase and
    insert/update on a KSDS CI are not possible anymore
  • CI split: Block Lock on the CI
  • TWIN ROOT pointers in HIDAM: maintenance of the bi-directional
    pointers implies extra locking on the neighbour
  • Dataset Busy Locks in combination with Block Locks: deadlocks!

34
Ex-A and Ex-G environments
  • A-side
  • (system testing)
  • Development
  • Test
  • Quality Assurance (test cases to be maintained by
    Development Teams)
  • Production
  • G-side
  • (system testing)
  • Development
  • Acceptance (regular refreshes from Production)
  • Production

35
FORTIS environments
  • (system testing)
  • sharing OK at T-16 months - phase 1 > PLXT
  • Development
  • Test
  • Acceptance (regular refreshes from Production)
  • sharing OK at T-10 months - phase 2 > PLXB
  • Production
  • sharing OK at T-7 months - phase 3 > PLXA
  • Quick Fix (for testing emergency fixes in Prod)

36
Starting situation
  • IMS V6
  • Block-level data sharing possible for DEDB-VSO
  • Block-level data sharing possible for DEDBs using SDEP segments
  • makes VTAM Generic Resources possible
  • NO use of shared message queues and Fast Path EMH queues in the
    first phase
  • DB2 V5
  • very stable
  • is V6 stable enough?
  • V5 functionality is OK

37
Starting: what to expect?
  • DB2
  • use of the Group Buffer Pool is a general option
  • FULL sharing --> every table at every moment potentially
    GBP-dependent
  • difficult to foresee what will happen in real life: wait and see?
  • locking: DB2 tries to be more intelligent than IMS
  • only lock registration in the CF if really necessary
  • is nice, BUT what can we expect: wait and see?
  • IMS and DB2 figures as of March 2001
  • transaction load/day, pure IMS / IMS+DB2: 3.750.000 / 1.250.000
  • space IMS + DB2 > 1 Tbyte
  • mid-September 2001: trx load expected to double / 70% space
    increase

38
Starting: what to expect?
  • Important mix of ex-A + ex-G applications
  • Effort on the merger - not on the tuning of the applications
  • IMS gets higher priority than DB2 (75% in IMS)
  • Decision to take for IMS DEDB-VSO: which DBs, and to which extent
    stored in the CF
  • private pool for each AREA in the CF: better follow-up and tuning
  • 4 DEDB-VSOs in the CF
  • 2 PRELOAD - small DBs (< 200K)
  • 1 LOOKASIDE - protocol DB for communication with the branches
  • 1 pseudo-SPA

39
Starting - first experiences
  • IMS - March 2001
  • on PLXB: set up the batch workload (Scheduling Environments OK)
  • start with BMP balancing
  • scripts for TPNS stress testing for a selected set of
    transactions, concurrent with selected BMPs
  • pushing the system to see what happens
  • not realistic: too high a transaction load simulated, with too
    few different transactions
  • Nevertheless: IMS SYSTEMS BLOCKED
  • > PTFs to be applied

40
Starting - first experiences
  • IMS - April 2001
  • putting SHRLVL 3 on the IMS datasets on PLXA: CF IOs peaking at
    17K per sec during online
  • CF capacity: 45K-50K IOs per sec?
  • --> action needed to limit the usage of the CF structures
  • meanwhile SHRLVL 3 deactivated
  • action started to understand the reason for the high lock numbers
  • IBM contacts: pay attention to the number of daily deadlocks
  • FORTIS: 200-300 deadlocks/day, considered to be (too) high
  • in a data sharing environment a certain multiplication factor may
    be expected
  • SHRLVL 3 reactivated for a limited set of DBs

41
Starting - first experiences
  • IMS - May 2001
  • working in 2 directions in parallel
  • find out the highest lock generators
  • understand the deadlock reports
  • identify PSBs where PROCOPT=GOT would make sense
  • investigate the potential gains of removing TWIN ROOT pointers on
    HIDAM root segments
  • started developing a tool to extract info from IMS lock traces
  • some people are becoming specialists in the locking behaviour of
    IMS

42
Starting - actions follow up
  • IMS - MAY 2001 - actions + results
  • lock trace showed 25 DBs responsible for 90% of locks
  • 2 DBs responsible for 40% of deadlocks (total of 300-400
    deadlocks per day)
  • related to high overflow usage and contention on IOVF
  • compression solved the problem for the 2 DBs
  • highly referenced DBs with a low update level: lots of PCBs found
    with PROCOPT=A
  • automated change by DBA to PROCOPT=GO, without application
    changes
  • savings: 600K locks per day
  • started removing TWIN ROOT pointers for a selected list of HIDAMs

43
Starting - actions follow up
  • IMS - JUNE 2001
  • detailed analysis of lock traces
  • lock peaks for particular DBs identified; processes active at
    peak time identified and investigated; badly-coded calls adapted
  • Development teams contacted to change PCB PROCOPT to GOT where a
    mass change by DBA was not feasible (matching IMS LOGS <--> PCB
    definitions)
  • by the end of June: online lock registration at 8K IOs per sec,
    with peaks of 10K-11K IOs per sec (50% gain compared to the start)
  • overnight rate up to 27K IOs per sec (mid-June: BMPs in sysplex)
  • deadlock rate down to 100 deadlocks per day (from 400)
  • further automation of the deadlock detection: automated daily
    list of involved DBs / transactions / jobs + mailing to DBA (see
    the sketch below)
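
Such a daily summary could look roughly like the following sketch; the input
layout (database, transaction, job per deadlock) is a hypothetical stand-in,
not the real IMS lock-trace record format:

# Hypothetical sketch of the automated daily deadlock summary for DBA.
from collections import Counter

def summarize(deadlocks):
    dbs, trxs, jobs = Counter(), Counter(), Counter()
    for db, trx, job in deadlocks:
        dbs[db] += 1; trxs[trx] += 1; jobs[job] += 1
    for title, counter in (("DBs", dbs), ("Transactions", trxs), ("Jobs", jobs)):
        print(f"-- {title} involved in deadlocks --")
        for name, count in counter.most_common():
            print(f"{name:10s} {count}")

summarize([("DBX01", "TRXA", "JOBA"),
           ("DBX01", "TRXB", "JOBB"),
           ("DBY02", "TRXA", "JOBA")])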

44
Starting - actions follow up
  • IMS - JULY 2001
  • detailed analysis of overnight lock traces started
  • a number of BMPs identified as "to examine"
  • majority: high GU/GN activity (> 1.000.000)
  • PROCOPT changed to GOT where possible
  • PCB added --> 2 PCBs (1 for read, 1 for update)
  • Deadlocks continued to require a lot of attention; possible
    actions
  • compression of Fast Path data (less in IOVF)
  • higher checkpoint frequency (at most 1 second during online)
  • use of GOT wherever possible
  • adjustment of CI sizes/freespace on certain highly-accessed
    indexes
  • (release lock by using an extra call with a high key) - rarely
    applied
  • End of July: 70-150 deadlocks/day across both systems in the
    sysplex

45
Starting - actions follow up
  • IMS - August 2001
  • during the first half of August the online network was gradually
    opened
  • --> steep increase in deadlocks across a number of related
    transactions
  • every case involved was already known, but the rate was now
    sometimes 100-fold
  • mostly due to hotspot indexes
  • Fortis: relatively high number of secondary indexes (dixit IBM),
    thus greater potential for hotspots
  • Development Teams responsive, but unfortunately, for a couple of
    deadlocks the most obvious (and only) solution was creating
    affinities via MSC, for MPPs and BMPs, in order to force them to
    run on the same system

46
Starting - actions follow up
  • IMS - September 2001
  • beginning of September, before the BIG MOVE: between 3K (daytime)
    and 6K (overnight) IOs/sec on the lock structure, on average
  • after the BIG MOVE: stabilisation at 250-300 deadlocks per day
  • higher than we liked to see, but less than feared

47
Starting - actions follow up
  • DB2 - JUNE 2001
  • Open question regarding locking: do we need to make the same
    effort as for IMS, by pushing ISOLATION(UR)?
  • Default: all Plans/Packages bound with
  • ISOLATION(CS) - sometimes UR on individual SQL statements
  • RELEASE(COMMIT)
  • CURRENTDATA(NO)
  • aware of potentially less Lock Avoidance (GCLSN instead of CLSN)
  • a limited number of heavily executed SQLs: WITH UR added
  • Decision: do not change the BIND parameters and count on DB2 to
    optimize lock registration in the CF
  • > a REBIND of a BMP with RELEASE(DEALLOCATE) can be done very
    quickly in case of too many Global Locks

48
Starting - actions follow up
  • DB2 - August 2001
  • Estimating the size of the Group Buffer Pools
  • based on the number of Buffers Written (versus Buffers Updated)
    and the elapsed time we want the page to be available in the
    Group Buffer Pool for reuse by another DB2
  • f.e. at peak: 200000 buffers written in 10 minutes; to be able to
    keep a page available for 5 minutes, a Group Buffer Pool of
    100000 x 4K is needed (worked out in the sketch after this list)
  • Avoid XI caused by too few directory entries
  • made the sum of the pages in all local virtual bufferpools +
    hiperpools + Group Buffer Pool = total pages
  • ratio directory entries/cache: directory entries > total pages
  • > never XI due to insufficient directory entries
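
The sizing arithmetic of the example, written out (a sketch of the rule of
thumb above, nothing more):

# 200000 buffers written during a 10-minute peak; to keep a written
# page available for reuse for 5 minutes, the GBP data area must hold
# 5 minutes' worth of writes.
buffers_written   = 200_000     # observed in the 10-minute peak
interval_minutes  = 10
retention_minutes = 5
page_size_kb      = 4

pages_needed = buffers_written * retention_minutes // interval_minutes
print(pages_needed)                         # 100000 pages
print(f"{pages_needed * page_size_kb} KB")  # 400000 KB of data area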

49
Starting - actions follow up
  • DB2 - August 2001
  • Avoid a flood of committed updated pages to the Group Buffer Pool
    when a transaction/BMP commits
  • trying to have a smooth process of emptying the Virtual Buffer
    Pool
  • VDWQT (vertical deferred write threshold) set to 64 pages - makes
    the transfer of updated pages to the Group Buffer Pool
    asynchronous (see the sketch below)
  • Avoid filling up the Group Buffer Pool: avoidance of an
    insufficient number of CASTOUT engines
  • class castout threshold 1% (= minimum)
  • GBP threshold 10%
  • GBP checkpoint 30 minutes
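
A sketch of the deferred-write idea (a toy model only; the real threshold is a
DB2 buffer pool parameter and the writes are done by DB2 itself):

# Toy model: once more than 64 updated pages are queued for a page set,
# a chunk is written to the Group Buffer Pool asynchronously, ahead of
# commit, instead of flooding the GBP with all pages at commit time.
THRESHOLD = 64
queued = []

def page_updated(page):
    queued.append(page)
    if len(queued) > THRESHOLD:
        batch = queued[:32]          # write a chunk asynchronously
        del queued[:32]
        print(f"async write of {len(batch)} updated pages to the GBP")

for i in range(200):
    page_updated(f"PAGE{i}")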

50
Miscellaneous actions
  • Avoid U3307 abends (CF Lock table full)
  • IMS LOCKMAX parameter adapted downwards new
    default is 15K
  • DB2 via combination of NUMLKUS
    NUMLKTS/LOCKSIZE/LOCKMAX
  • Ultimate solution for bad application behaviour
  • Affinity on one system (via MSC), eventually
    serialisation
  • Affinities to be avoided as much as possible is
    not in line with the sysplex philosophy
  • Review randomizing module to avoid synonyms
    pointing to the same RAP

51
Miscellaneous actions
  • Passing commands: always scope GLOBAL
  • checks the correct execution of the command on both systems
  • if OK, continue with the next step
  • if NOK on at least one system, rollback on all systems
  • IMS AOI (local) + APPC (remote) implementation
  • if one IMS/DB2 does not respond: job abends
  • possibility to bypass this abend (in case of maintenance)
  • Resynchronization of IMS DB status and TRX classes on IMS startup
  • based on an extract of the last system checkpoint of the last
    stopping IMS system
  • based on the active IMS if one is still active

52
Miscellaneous actions
  • Affinities defined for reasons of
  • external connections: BCNL, MERVA, CSFI, ...
  • deadlock avoidance
  • serial definition of a transaction
  • protocol reasons (input device, no human being, f.e.)
  • performance problems (only 3 instances)
  • all trxs and BMPs processing (GET) MQSeries messages: queues
    defined on one system (PUT: no problem due to the MQ clustering
    implementation)

53
(No Transcript)
54
Used Structures (1/2)
  Structure                            size     max IO/sec
  DB2 - Group Buffer Pools (duplex)    1.2 GB   4.250
      - Lock structure                 128 MB   1.200
      - Shared Communication Area      49 MB    10
  IMS - VSAM structure                 20 MB    1.700
      - OSAM structure                 6 MB     3.200
      - Lock structure                 64 MB    25.000
      - Fast Path VSO DBs              144 MB   1.250
      - Shared Queue                   (test)
  XCF signalling paths                 28 MB    2.400
  GRS star                             8 MB     100
  RACF                                 85 MB
  DFSMS/HSM Record Level Sharing       (test)

55
Used Structures (2/2)
  Structure                    size    max IO/sec
  JES2 primary checkpoint      20 MB   100
  LOGREC                       (test)
  OPERLOG                      16 MB   1.750
  MIM                          22 MB   3.000
  Shared Queue MQ              (test)
  Resource Recovery System     (test)
  Enhanced Catalog Sharing     1 MB
  XBM                          35 MB
  VTAM Generic Resources       9 MB    300

56
Workload balancing
  • BATCH: WLM Scheduling Environments + WLM Managed Initiators
  • ONLINE: VTAM Generic Resources

57
WLM Scheduling Environments
//JOBNAME JOB ...,SCHENV=IMP1
//STEP1   EXEC IMSBATCH
58
WLM Scheduling Environments
//JOBNAME JOB ...,SCHENV=IMP2
//STEP1   EXEC IMSBATCH
59
WLM Scheduling Environments
//JOBNAME JOB ...,SCHENV=IMP
//STEP1   EXEC IMSBATCH
60
IMSBATCH procedure
61
IMSBATCH procedure
IMASP1
...
// SET IMULOAD='I10.IM.CA.IMSP1.USERLIB',
//     IMSID='IMSP1',
...
IMASP
...
// SET IMULOAD='I10.IM.CA.IMSP.USERLIB',
//     IMSID='IMSP',
...
IMASP2
...
// SET IMULOAD='I10.IM.CA.IMSP2.USERLIB',
//     IMSID='IMSP2',
...
...
//   INCLUDE MEMBER=IMASENV
//G  EXEC PGM=DFSRRC00,REGION=&RGN,
//   PARM=(BMP,&MBR,&PSB,&IN,&OUT,
//   &OPT&SPIE&TEST&DIRCA,&PRLD, ...
//   DD DSN=&IMULOAD,DISP=SHR
...
62
(No Transcript)
63
BMP restart - JES2 exit4
  • PROBLEM!
  • A BMP that abends has to be restarted on the same IMS subsystem
    where it was running.

64
BMP restart - JES2 exit4
//JOB1   JOB ...,SCHENV=IMP
//STEP1  EXEC IMSBATCH
65
BMP restart - JES2 exit4
//JOB1   JOB ...,SCHENV=IMP
//STEP1  EXEC IMSBATCH
66
BMP restart - JES2 exit4
When the job starts and the flag exists => RESTART
//JOB1   JOB ...,SCHENV=IMP
//STEP1  EXEC IMSBATCH
67
BMP restart - JES2 exit4
When the job starts and the flag exists => RESTART
//JOB1   JOB ...,SCHENV=IMP1
//STEP1  EXEC IMSBATCH
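
Conceptually the exit does something like the following sketch (a Python model
of the logic only; the real implementation is a JES2 exit 4 in assembler, and
the flag mechanism shown here is an assumption):

# Model of the restart trick: remember where an abending BMP ran; when
# the job comes back and a restart flag exists, pin its scheduling
# environment to that IMS system instead of the plex-wide one.
restart_flags = {}               # jobname -> IMS system of the abend

def bmp_abended(jobname, ims_system):
    restart_flags[jobname] = ims_system

def scheduling_environment(jobname, requested="IMP"):
    # "IMP" = run on any system; "IMP1"/"IMP2" = pinned to one system
    if jobname in restart_flags:            # flag exists => RESTART
        return restart_flags.pop(jobname)
    return requested

bmp_abended("JOB1", "IMP1")
print(scheduling_environment("JOB1"))   # IMP1: restart on the same IMS
print(scheduling_environment("JOB1"))   # IMP: normal scheduling afterwards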
68
(No Transcript)
69
Status today
  • Daily > 8,000,000 transactions
  • 6,000,000 IMS-only
  • 2,000,000 IMS + DB2 (+/- 250,000,000 SQLs/day)
  • peak up to 400 trxs/sec
  • branches equally split amongst the 2 IMS systems
  • BUT WLM still favors SYA1 (especially for BMPs)
  • f.e. total SQL statements during online hours: 85% on SYA1 / 15%
    on SYA2 (85,000,000 on SYA1 / 15,000,000 on SYA2)
  • Locking
  • IMS: 0.3% True versus 0.1% False Contention
  • DB2: 2-3% True versus 0.5% False Contention (no real effort done
    > still optimization to do)

70
Second half of 2002: 4-Way
(diagram: planned 4-way sysplex with IMS systems IMP1-IMP4 and DB2
members DBP1-DBP4 + DBPI)
71
In the Pipeline
  • Installing MQSeries 5.2
  • non-persistent messages in the Coupling Facility
  • IMS Shared Queues
  • testing on PLXT
  • MQSeries 5.3
  • persistent messages in the Coupling Facility - nothing planned
    yet