Log Manager - PowerPoint PPT Presentation



1
Log Manager
  • Jim Gray
  • Microsoft, Gray _at_ Microsoft.com
  • Andreas Reuter
  • International University, Andreas.Reuter_at_i-u.de

Course schedule:
           Mon          Tue           Wed         Thur            Fri
    9:00   Overview     TP mons       Log         Files Buffers   B-tree
    11:00  Faults       Lock Theory   ResMgr      COM             Access Paths
    1:30   Tolerance    Lock Techniq  CICS Inet   Corba           Groupware
    3:30   T Models     Queues        Adv TM      Replication     Benchmark
    7:00   Party, Workflow, Cyberbrick, Party
2
Log Concept
  • The log is a history of all changes to the state.
  • Log + old state gives new state.
  • Log + new state gives old state (not in this
    picture).
  • The log is a sequential file.
  • The complete log is the complete history.
  • The current state is just a "cache" of the log
    records.
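A minimal sketch of the two identities above, assuming a toy state of NSLOTS integers and hypothetical log records that carry both the old and the new value of one slot (this record layout is an illustration, not the authors' format). Replaying the log forward over the old state yields the new state; replaying it backward over the new state yields the old state.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4

/* Hypothetical log record: slot `slot` changed from old_val to new_val. */
typedef struct {
    size_t slot;
    int old_val;
    int new_val;
} log_rec;

/* REDO: log + old state gives new state (apply oldest record first). */
void redo_log(int state[NSLOTS], const log_rec *recs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(recs[i].slot < NSLOTS);
        state[recs[i].slot] = recs[i].new_val;
    }
}

/* UNDO: log + new state gives old state (apply newest record first). */
void undo_log(int state[NSLOTS], const log_rec *recs, size_t n)
{
    for (size_t i = n; i > 0; i--) {
        assert(recs[i - 1].slot < NSLOTS);
        state[recs[i - 1].slot] = recs[i - 1].old_val;
    }
}
```

The order matters: redo runs oldest to newest, undo newest to oldest, and a log that must serve both directions needs both values in each record.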

3
How Log is Used
  • Recovery from faults: a redundant copy of the
    state and transitions.
  • Security audits: who did what to whom. Often
    too low-level for this.
  • Performance monitoring and accounting: but the log
    only records changes (not reads).
  • ISSUES: who should be allowed to read the
    log? It is a security hole. Access must be
    authorized on a per-record basis.

4
The Log Manager in the Scheme of Things
[Figure: the log manager among the other components: Archive Manager, Transaction Manager, resource managers (SQL, other), Lock Manager, Log Manager, Buffer Manager, File Manager, Media Manager, and the operating system's file system.]
  • The interesting thing is the cycle: you need the log
    to recover the archive, and the archive to recover
    the log. Break the cycle with a bootstrap file.

5
Log Is a Sequential File.
  • Encapsulation of the log: it is a shared
    resource.
  • Startup: the log manager holds startup info for all
    the others.
  • Careful writes: the log manager provides a
    high-performance, very reliable, semi-infinite,
    archived sequential file.
  • Some RMs keep private logs anyway
    (notably PORTABLE DB systems).
  • Then the user or system has to manage multiple logs.

6
The Log Table
  • The log table is a sequential set (relation). Log
    records have a standard part and then a log
    body. We often want to query the table via one
    attribute or another: RMID, TRID, timestamp, ...
    create domain LSN  unsigned integer(64);  -- log sequence number (file #, rba)
    create domain RMID unsigned integer;      -- resource manager identifier
    create domain TRID char(12);              -- transaction identifier
    create table log_table (
        lsn              LSN,        -- the record's log sequence number
        prev_lsn         LSN,        -- the lsn of the previous record in log
        timestamp        TIMESTAMP,  -- time log record was created
        resource_manager RMID,       -- resource mgr that wrote this record
        trid             TRID,       -- id of transaction that wrote this record
        tran_prev_lsn    LSN,        -- prev log record of this transaction (or 0)
        body             varchar,    -- log data: rm understands it
        primary key (lsn),           -- lsn is primary key
        foreign key (prev_lsn)       -- previous log record in this table
            references log_table(lsn),
        foreign key (tran_prev_lsn)  -- transaction's prev log rec also in table
            references log_table(lsn)
    );

7
Log is complete history
[Figure: the log anchor and the log table (lsn, prev_lsn, resource_mgr, trid, tran_prev_lsn, body), stored in the A and B files, with older files in the archive.]
  • The log anchor points at the chain of each transaction.
  • It may maintain other chains.
  • Log records map to a sequence of N-plexed files.
  • Old files are archived.
  • Eventually, archive files are discarded (after weeks,
    months, or never).

8
The Log LSN
  • Each log record has a logical sequence number.
  • This number (LSN, for Log Sequence Number) plays a
    key role in many algorithms.
  • Key property: MONOTONICITY.
  • If action A happened after action B, then
    LSN(A) > LSN(B).
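One way to get monotonicity almost for free, assuming (as the (file, rba) comment on the log table suggests) that an LSN is a log file number plus a relative byte address: pack the file number into the high 32 bits and the rba into the low 32 bits, so that plain integer comparison orders records first by file, then by position in the file. The 32/32 split and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical LSN layout: high 32 bits = log file number, low 32 = rba. */
typedef uint64_t LSN;

LSN lsn_make(uint32_t file, uint32_t rba)
{
    return ((uint64_t)file << 32) | rba;
}

uint32_t lsn_file(LSN lsn) { return (uint32_t)(lsn >> 32); }
uint32_t lsn_rba(LSN lsn)  { return (uint32_t)(lsn & 0xFFFFFFFFu); }

/* Assign an LSN to a new record of rec_len bytes at the current end of
   the log, starting a new file if the record would not fit in this one.
   *end is advanced past the record, so every later LSN is larger. */
LSN lsn_alloc(LSN *end, uint32_t rec_len, uint32_t file_size)
{
    LSN lsn = *end;
    assert(rec_len <= file_size);
    if ((uint64_t)lsn_rba(lsn) + rec_len > file_size)
        lsn = lsn_make(lsn_file(lsn) + 1, 0);
    *end = lsn_make(lsn_file(lsn), lsn_rba(lsn) + rec_len);
    return lsn;
}
```

Because the only way to change the end is to move forward within a file or to jump to offset 0 of a higher-numbered file, LSN(A) > LSN(B) whenever A was allocated after B.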

9
Reading The Log
    long log_read_lsn( LSN lsn,                   /* lsn of record to be read           */
                       log_record_header *header, /* header fields of record to be read */
                       long offset,               /* offset into body to start read     */
                       pointer buffer,            /* buffer to receive log data         */
                       long n);                   /* length of buffer                   */
    LSN log_max_lsn(void);      /* returns the current maximum lsn of the log table */
  • Read with C (see next slide) or SQL:
    long sql_count( RMID rmid)          /* count log records written by this rmid */
    {
        long rec_count;                 /* count of records                       */
        exec sql SELECT count(*)        /* ask sql to scan log counting records   */
                 INTO :rec_count        /* written by the calling resource mgr and */
                 FROM log_table         /* place count in the rec_count           */
                 WHERE resource_manager = :rmid;
        return rec_count;               /* return the answer                      */
    }

10
Reading the Log: SQL Is Easier than C
    long c_count( RMID rmid)              /* count log records written by this rmid */
    {
        log_record_header header;         /* structure to receive log record header */
        LSN  lsn;                         /* log sequence number of next log rec    */
        char buffer[1];                   /* null buffer to receive log record body */
        long rec_count = 0;               /* count of records                       */
        int  n = 1;                       /* size of log body returned              */

        if (!log_open(READ)) panic();     /* open the log (authorization check)     */
        lsn = log_max_lsn();              /* get most recent lsn                    */
        while (lsn != NullLSN)            /* scan backward through the log          */
        {
            n = log_read_lsn( lsn,        /* lsn of record to be read               */
                              &header,    /* log record header fields               */
                              0L, buffer, 1L);  /* log rec body ignored             */
            if (header.rmid == rmid)      /* if record written by this RMID then    */
                rec_count = rec_count + 1;    /* increment count                    */
            lsn = header.prev_lsn;        /* go to previous LSN                     */
        }                                 /* loop over LSNs                         */
        logtable_close();                 /* close log table                        */
        return rec_count;                 /* return the answer                      */
    }
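To try the backward scan without a real log manager, the sketch below stands in a hypothetical in-memory log (an array of headers, with LSN = array index + 1 and NullLSN = 0) for log_max_lsn and log_read_lsn. Only the loop structure mirrors c_count above; the types and helper names are assumptions. Like c_count, it visits every record in the log, whereas the SQL version leaves the scan strategy to the query processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t LSN;
typedef unsigned RMID;
#define NullLSN ((LSN)0)

/* Hypothetical header: only the fields the scan needs. */
typedef struct {
    LSN lsn;
    LSN prev_lsn;   /* previous record in the log, NullLSN for the first */
    RMID rmid;      /* resource manager that wrote the record */
} log_record_header;

/* Stand-in for the log: record i has LSN i + 1. */
typedef struct {
    const log_record_header *recs;
    size_t n;
} mem_log;

LSN mem_log_max_lsn(const mem_log *ml)
{
    return ml->n == 0 ? NullLSN : ml->recs[ml->n - 1].lsn;
}

/* Copy the header of record lsn; returns 0 on success, -1 if absent. */
int mem_log_read_header(const mem_log *ml, LSN lsn, log_record_header *header)
{
    if (lsn == NullLSN || lsn > ml->n)
        return -1;
    *header = ml->recs[lsn - 1];
    return 0;
}

/* Same loop as c_count: follow prev_lsn back from the end of the log. */
long mem_count(const mem_log *ml, RMID rmid)
{
    log_record_header header;
    long rec_count = 0;
    LSN lsn = mem_log_max_lsn(ml);
    while (lsn != NullLSN) {
        if (mem_log_read_header(ml, lsn, &header) != 0)
            break;                      /* broken chain: stop scanning */
        if (header.rmid == rmid)
            rec_count++;
        lsn = header.prev_lsn;
    }
    return rec_count;
}
```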

11
Writing The Log
  • Add a log record; the log manager fills in the
    header:
    LSN log_insert( char *buffer, long n);   /* log body is buffer[0..n-1] */
  • Force the log up to a certain LSN to persistent
    storage:
    LSN log_flush( LSN lsn, Boolean lazy);   /* lazy waits for a batch write or timeout (boxcar) */
  • Note: many real interfaces allow some of:
  • empty buffer: lets the RM fill it in (avoids data
    copies);
  • incremental copy: build the "buffer" in steps;
  • gather: take log data from many buffers.
  • Few offer SQL access to the log.
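The gather variant can be sketched as below, assuming a hypothetical in-memory log tail and a fragment type modelled on POSIX struct iovec (neither is the authors' interface). The record body is assembled directly from several caller buffers, so the RM never has to build one contiguous copy first. Here the LSN is simply the byte offset of the record in the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t LSN;

/* One piece of a log body, in the style of POSIX struct iovec. */
typedef struct {
    const void *base;
    size_t len;
} log_frag;

/* Hypothetical in-memory log tail. */
typedef struct {
    unsigned char data[4096];
    size_t end;                 /* next free byte == LSN of next record */
} log_tail;

/* Gather-insert: copy the fragments back to back into the log and
   return the record's LSN, or (LSN)-1 if they do not fit. */
LSN log_insert_gather(log_tail *lt, const log_frag *frags, int nfrags)
{
    size_t total = 0;
    for (int i = 0; i < nfrags; i++)
        total += frags[i].len;
    if (total > sizeof lt->data - lt->end)
        return (LSN)-1;

    LSN lsn = lt->end;
    for (int i = 0; i < nfrags; i++) {
        memcpy(lt->data + lt->end, frags[i].base, frags[i].len);
        lt->end += frags[i].len;
    }
    return lsn;
}
```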

12
Summary Of Log Structure And Verbs
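[Figure: the log table as a sequence of log pages (page header, then record headers and bodies). Older pages are in durable storage in the A and B files; recent pages are in the buffer pool, some to be written in the next write, plus empty pages; markers show the end of the durable log and the current end of log.]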
  • Operations: Open/Close,
  • Read(LSN),
  • Insert(body),
  • Flush(LSN)
  • SQL read operations.

13
Log Anchor, Logging, and Locking
    typedef struct {
        filename    tablename;        /* name of log table                        */
        struct log_files;             /* A & B file prefix names & active file #  */
        xsemaphore  lock;             /* semaphore regulates log write            */
        LSN         prev_lsn;         /* LSN of most recent write                 */
        LSN         lsn;              /* LSN of next record                       */
        LSN         durable_lsn;      /* max lsn in durable storage               */
        LSN         TM_anchor_lsn;    /* lsn of trans mgr's last ckpt             */
        struct {                      /* array of open log parts                  */
            long    partno;           /* partition number                         */
            int     os_fnum;          /* operating system file #                  */
        } part[MAXOPENS];
    } log_anchor;
  • Log records are never updated, only inserted and
    read.
  • So no locks are needed on the log.
  • A semaphore (or something similar) is needed on the
    "end" of the log to manage space, growth, and LSN
    assignment for inserts.

14
Making Optimistic Log Reads Work
  • Log is duplexed.
  • Log manager reads only one copy of the page.
  • What if the "other" copy has more data?
  • Trick:
  • read BOTH copies of the FIRST and LAST pages in the log.
  • Other pages have a "full" flag and a timestamp.
  • IF not full or timestamp < prev_timestamp THEN
  • read the other page and take the highest timestamp.
  • Torn log pages:
  • A log page consists of disk sectors (512 B).
  • A write may write only some of the sectors.
  • How to detect missing fragments?
  • 1. Checksum?
  • 2. Byte stuffing: stuff a parity byte on each
    page.

15
Log Insert
  • The log semaphore covers:
  • incrementing the LSN,
  • finding the log end,
  • filling in the page(s),
  • allocating space on a page, perhaps allocating
    new pages.
    LSN log_insert( char *buffer, long n)   /* insert a log record with body buffer[0..n-1] */
    {
        LSN lsn;
        /* Acquire the log lock (an exclusive semaphore on the log) */
        Xsem_get(log_anchor.lock);          /* lock the log end in exclusive mode */
        lsn = log_anchor.lsn;               /* make a copy of the record's lsn */
        /* find page and allocate rec_len bytes in it */
        /* fill in log record header & body */
        /* update the anchors */
        log_anchor.prev_lsn = lsn;          /* this is now the most recent record */
        log_anchor.lsn.rba = log_anchor.lsn.rba + rec_len;  /* log anchor lsn points past this record */
        Xsem_give(log_anchor.lock);         /* unlock the log end */
        return lsn;                         /* return lsn of record just inserted */
    }
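A runnable version of the same critical section, substituting a POSIX mutex for the exclusive semaphore (Xsem_get/Xsem_give) and a flat in-memory byte array for the log pages; the log_end structure, its fields, and the record header are hypothetical. Everything that must be serialized (LSN assignment, finding the end, space allocation, copying the record, updating the anchor) happens under the lock, and nothing under the lock does I/O.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t LSN;
#define LOG_BYTES 65536

/* Hypothetical record header written in front of each body. */
typedef struct {
    uint32_t len;       /* body length */
    LSN prev_lsn;       /* previous record in the log */
} rec_header;

/* Hypothetical end-of-log anchor protected by one mutex. */
struct log_end {
    pthread_mutex_t lock;
    LSN prev_lsn;       /* LSN of most recent record */
    LSN lsn;            /* LSN (byte offset) of next record */
    unsigned char data[LOG_BYTES];
};

/* Returns the new record's LSN, or (LSN)-1 if the log is full. */
LSN log_insert(struct log_end *le, const void *buffer, uint32_t n)
{
    size_t rec_len = sizeof(rec_header) + n;

    pthread_mutex_lock(&le->lock);              /* lock the log end */
    LSN lsn = le->lsn;                          /* this record's LSN */
    if (lsn + rec_len > LOG_BYTES) {            /* no space: give up */
        pthread_mutex_unlock(&le->lock);
        return (LSN)-1;
    }
    rec_header h = { n, le->prev_lsn };
    memcpy(le->data + lsn, &h, sizeof h);       /* fill in header ... */
    memcpy(le->data + lsn + sizeof h, buffer, n);   /* ... and body */
    le->prev_lsn = lsn;                         /* most recent record */
    le->lsn = lsn + rec_len;                    /* anchor points past it */
    pthread_mutex_unlock(&le->lock);            /* unlock the log end */
    return lsn;
}
```

Any number of threads may call log_insert concurrently; the mutex alone orders them, so LSNs increase in the order the lock is granted.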

16
Log Write Demon
  • The log semaphore can be a hotspot, so do no I/O
    under the semaphore.
  • Allocation (OS requests) and archiving are done
    in advance.
  • Flush to persistent storage (disc) is done
    asynchronously.
  • Daemons are driven by timers and by events (requests).
  • Daemons need not touch the end-of-log semaphore.

[Figure: one log daemon flushes (carefully writes) log pages as needed; another allocates new log files as needed; log data lives in shared memory and on disc.]
17
Careful Writes
  • If partial pages may be written, then a subsequent
    write may invalidate a previous write.
  • Standard technique: serial writes. Write one
    page, then write the second page. Problem: 1/2
    the disc bandwidth, 2x the delay.
  • Ping-pong technique: never overwrite a good page.
    Ping-pong between pages I and I+1. When complete,
    ensure that page I has the final data. Never worse
    than a serial write, generally 2x better.
  • Also note the careful techniques for optimistic
    reads and torn pages.
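The ping-pong rule can be sketched on a hypothetical array of disc blocks. Successive versions of the partially filled last page i alternate between blocks i and i+1, each stamped with a version number, so the previous good copy is never overwritten; when page i fills, its final version is written to block i. If the newest copy already sits in block i at that moment, this sketch first writes block i+1 to protect it, which is its reading of "never overwrite a good page" and the case where it degrades to a serial write. A reader takes whichever of blocks i and i+1 holds the newer version of page i. Block layout and names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NBLOCKS 8                 /* hypothetical tiny log disc */

typedef struct {
    int page_no;                  /* log page stored here, -1 if none */
    uint32_t version;             /* 1, 2, 3, ... per write of that page */
    char data[32];                /* page contents */
} block;

typedef struct {
    block disc[NBLOCKS];
    int page;                     /* page i now being filled */
    uint32_t version;             /* writes of page i so far */
    int last_block;               /* block with newest copy of page i, -1 if none */
} pingpong;

void pingpong_init(pingpong *pp)
{
    memset(pp, 0, sizeof *pp);
    for (int b = 0; b < NBLOCKS; b++)
        pp->disc[b].page_no = -1;
    pp->last_block = -1;
}

/* Stand-in for one disc write of page i into block b. */
static void write_block(pingpong *pp, int b, const char *data)
{
    assert(b >= 0 && b < NBLOCKS);
    pp->disc[b].page_no = pp->page;
    pp->disc[b].version = ++pp->version;
    snprintf(pp->disc[b].data, sizeof pp->disc[b].data, "%s", data);
}

/* Force a new, still partial version of page i, alternating between
   blocks i and i+1 so the newest good copy is never overwritten. */
void force_partial(pingpong *pp, const char *data)
{
    int i = pp->page;
    int b = (pp->last_block == i) ? i + 1 : i;
    write_block(pp, b, data);
    pp->last_block = b;
}

/* Page i is full: its final data must end up in block i. */
void force_full(pingpong *pp, const char *data)
{
    int i = pp->page;
    if (pp->last_block == i)          /* good copy is in block i: */
        write_block(pp, i + 1, data); /* protect it first (serial case) */
    write_block(pp, i, data);
    pp->page = i + 1;                 /* start the next page */
    pp->version = 0;
    pp->last_block = -1;
}

/* Reader: newest copy of page p, found in block p or p+1. */
const block *read_page(const pingpong *pp, int p)
{
    const block *best = NULL;
    for (int b = p; b <= p + 1 && b < NBLOCKS; b++) {
        const block *blk = &pp->disc[b];
        if (blk->page_no == p && (best == NULL || blk->version > best->version))
            best = blk;
    }
    return best;
}
```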

18
Group Commit (Boxcarring)
  • Batch processing of log writes.
  • If you receive 1,000 log force requests/second,
  • why not just do 50 log writes per second, each
    carrying about 20 requests?
  • Response time will be the same (20 ms).
  • I/Os will be 20x fewer.
  • CPU cost will be 10x smaller (10x fewer dispatches,
    20x fewer OS I/Os).
  • Without it, systems are limited to about:
  • 50 tps with no ping-pong,
  • 100 tps with ping-pong.
  • With it, systems are limited by disc bandwidth:
    >> 10k tps.
  • The group commit threshold can be set automatically.
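A single-threaded sketch of the boxcar logic, assuming hypothetical names and a fixed batch threshold rather than an automatically tuned one. A force request only records the LSN it needs durable; one log write per batch then makes every waiting request durable at once, and a timer ships a partly filled boxcar (the "timeout" case of a lazy log_flush). Fed the slide's 1,000 requests with a batch of 20, it issues 50 writes.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t LSN;

/* Hypothetical group-commit state for one log. */
typedef struct {
    LSN durable_lsn;     /* everything <= this is on disc */
    LSN max_requested;   /* highest LSN some waiter asked to force */
    int waiting;         /* force requests in the current boxcar */
    int threshold;       /* write when this many are waiting */
    long writes;         /* log I/Os issued */
} group_commit;

/* Issue one log write covering every waiting request. */
static void flush_boxcar(group_commit *gc)
{
    if (gc->waiting == 0)
        return;
    gc->durable_lsn = gc->max_requested;  /* one I/O makes them all durable */
    gc->writes++;
    gc->waiting = 0;
}

/* A commit asks for lsn to be durable. Already-durable LSNs return at
   once; otherwise the request joins the boxcar, which leaves when full. */
void log_force(group_commit *gc, LSN lsn)
{
    if (lsn <= gc->durable_lsn)
        return;
    if (lsn > gc->max_requested)
        gc->max_requested = lsn;
    if (++gc->waiting >= gc->threshold)
        flush_boxcar(gc);
}

/* Timer expiry: ship a partly filled boxcar so no request is stranded. */
void log_timer(group_commit *gc)
{
    flush_boxcar(gc);
}
```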

19
WADS: Giving the Log Disc Zero Latency
  • The log disc is dedicated, so it only has rotational
    latency.
  • Reserve some cylinders on the disc as scratch space.
  • For each write:
  • write at the current position on the next track (zero
    latency).
  • When you have a full track (or two) of log data:
  • consolidate the write in RAM;
  • do a single LARGE write (100 KB = 1 rotation) to
    the log;
  • the cost of this is seek + rotation = 20 ms.
  • This reserved area is called the Write Ahead Data
    Set (WADS).
  • At restart:
  • read the cylinders,
  • gather the recent log data,
  • rewrite the end of the log.
  • A RAID write cache makes this obsolete (if it
    works).

20
Log Normal Use
  • Transaction UNDO during normal operation:
  • a transaction log anchor is needed during normal
    operation;
  • it points to the most recent log record of that
    transaction.
  • Follow the transaction's prev_lsn chain (tran_prev_lsn
    in the log table).
  • EASY!
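A sketch of transaction rollback along the tran_prev_lsn chain, assuming a hypothetical in-memory log indexed by LSN (0 means "no previous record", as in the log table definition) and a toy array state where each record carries the value to restore. Records of other transactions interleaved in the log are skipped simply by never being on the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t LSN;
#define NullLSN ((LSN)0)

/* Hypothetical record: enough to undo one update. */
typedef struct {
    uint32_t trid;        /* transaction that wrote it */
    LSN tran_prev_lsn;    /* this transaction's previous record, or 0 */
    int key;              /* slot updated */
    int old_val;          /* value before the update */
} undo_rec;

/* recs[lsn] is the record with that LSN (recs[0] unused). Starting from
   the transaction's most recent record, restore old values newest
   first. Returns the number of records undone. */
size_t rollback(const undo_rec *recs, size_t nrecs, uint32_t trid,
                LSN last_lsn, int *state)
{
    size_t undone = 0;
    for (LSN lsn = last_lsn; lsn != NullLSN; lsn = recs[lsn].tran_prev_lsn) {
        assert(lsn < nrecs && recs[lsn].trid == trid);
        state[recs[lsn].key] = recs[lsn].old_val;   /* undo one update */
        undone++;
    }
    return undone;
}
```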

21
The Log Anchor: Where It All Starts
  • REDO/UNDO at system / RM restart.
  • Need to bootstrap the most recent log state.
  • The log manager is the first to restart.
  • It helps the transaction manager recover.
  • The transaction manager helps the resource managers
    recover.
  • Alternate design: each RM has its own log.
  • All this depends on rebuilding the log anchor.

[Figure: the log anchor, the log, the transaction manager checkpoint record, the previous transaction manager checkpoint record, and the resource manager checkpoint records.]
22
Preparing For Restart: Careful Write of Log
Anchor
  • Use the "standard" careful write techniques:
  • put the anchor in a special well-known place(s);
  • ping-pong to 2 or more copies;
  • timestamp each copy;
  • N-plex the copies on devices with independent
    failures;
  • align copies so that writes are "atomic";
  • accept the most recent copy on pessimistic reads.
  • Now the TM and RMs can bootstrap: their anchors are
    in the log.

23
Finding the End of the Log
  • Find the anchor.
  • If using WADS, go to the WADS area and write the log
    end.
  • Else scan forward from the LSN in the log anchor:
  • read all full pages optimistically;
  • at a half-full page or a bad page, read pessimistically.
  • Now you have the end of the log.
  • Finish the half-finished record at the end of the log
    and give it to the TM.
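The forward scan can be sketched as follows, assuming hypothetical per-page flags that a real reader would derive from the page header, checksum, and duplexed copies (as in the optimistic-read trick). It starts at the page named by the anchor, accepts full valid pages, and stops at the first half-full or invalid page; finishing the half-finished record is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what reading one page (both copies if
   needed) told us. */
typedef struct {
    int valid;     /* some copy passed its checksum */
    int full;      /* page holds its final data */
    size_t used;   /* bytes of log data on the page */
} page_info;

typedef struct {
    size_t page;   /* page holding the end of log */
    size_t offset; /* byte offset of the end within that page */
} log_end_pos;

/* Scan forward from anchor_page over the npages pages. */
log_end_pos find_log_end(const page_info *pages, size_t npages,
                         size_t anchor_page)
{
    size_t p = anchor_page;
    assert(anchor_page < npages);
    while (p < npages && pages[p].valid && pages[p].full)
        p++;                                   /* full, intact page: keep going */
    if (p < npages && pages[p].valid)          /* half-full page: end is inside */
        return (log_end_pos){ p, pages[p].used };
    return (log_end_pos){ p, 0 };              /* invalid page or no more pages */
}
```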

[Figure: two cases, a log ending in a half-finished record and a log ending at an invalid page, each marking the end of log.]
24
Archiving the Log and "Old" Transactions
  • What if the transaction/RM low-water mark is a month
    old?
  • Abort?
  • Copy aside: copy the undo/redo log records to a
    side file.
  • Copy forward: copy the undo/redo log records
    forward in the file.
  • Dynamic log: copy undo records aside (so the
    transaction can still be undone online if needed).
  • All of these advance the low-water mark.

25
Archiving the Log Online
[Figure: staggered allocation of log tables (parts 1, 2, 3) on secondary storage, with the online log and the archive.]
26
The Safety Spectrum
  • Just UNDO: transactional storage (no durable log).
  • Just online restart: keep a simplexed durable log.
  • Online plus off-line archive (no single point of
    failure): periodic copies of data, duplexed log.
  • Electronic vaulting: archive copies and duplexing
    are done to a remote site via fast communications
    links (or Federal Express).

27
Multiple Logs?
  • The transaction manager has a log (DECdtm, MS-DTC, ...).
  • The transaction monitor has a log (CICS, Tuxedo,
    ACMS, ...).
  • Each DB instance (3 Oracle, 2 Informix, 4 Rdb)
    has a log.
  • Some have 3 logs: UNDO, REDO, SNAPSHOT.
  • Cons: lots of tapes/files; lots of I/Os at
    commit; lots of things to break.
  • Pros: portable; performance (in the 1-RM case).
  • You decide.

28
Client/Server Logging
  • One-server design (can be a process pair): a well-
    known log server in the net. The client sends a BATCH
    of log records to the server and gets back an
    LSN. It uses "local" LSNs for its own objects. Log
    servers can be N-plexed processes.
  • Multi-server design: the client forms a quorum
    (a majority of the servers). The client sends the
    log batch to all of them and gets back N LSNs.
  • With fewer than a majority, the client must poll ALL N
    servers. Servers synchronize their "logical" logs
    as the "sum" of the physical logs (this needs a majority).
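The multi-server rule can be sketched as a majority check, assuming hypothetical names and that each server either acknowledges the batch with an LSN or does not answer. The batch counts as durable once a strict majority of the N servers hold it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t LSN;
#define NO_ACK ((LSN)0)   /* hypothetical: server did not acknowledge */

/* acks[i] is the LSN returned by server i, or NO_ACK. The batch is
   durable when a strict majority of the n servers hold it. */
int batch_durable(const LSN *acks, int n)
{
    int got = 0;
    for (int i = 0; i < n; i++)
        if (acks[i] != NO_ACK)
            got++;
    return got > n / 2;
}
```

Requiring a strict majority works because any two majorities of the same N servers share at least one server, so a later quorum always contains some server that holds every durable batch.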

29
Summary
  • Log is a sequential file
  • Contains entire history of DB
  • Many tricks to write it efficiently and carefully
  • Many tricks to archive and recover it