LKCD Linux Kernel Crash Dump

1
LKCD Linux Kernel Crash Dump
  • Harish K
  • Motorola Inc.

2
  • What is LKCD?
  • Why LKCD?

3
The Journey
  • Introduction
  • LKCD Process
  • Design Considerations
  • Kernel Implementation
  • User Level Analysis (Lcrash)

4
Introduction
  • LKCD is a set of kernel and application code to
    configure, implement, and analyze system crash
    dumps
  • Objectives:
    • Post-failure kernel analysis
    • Kernel problems are resolved more quickly
  • As the Linux kernel becomes more complex, the
    need for LKCD increases

5
LKCD - Process

6
LKCD Kernel Design Considerations
  • The biggest design considerations were:
    • Dump Save Mechanism
    • Raw I/O vs. Buffer Cache I/O
    • Kernel Code Location
    • Dump Storage

7
LKCD Kernel Design Considerations
  • 1. Dump Save Mechanism
    • PROM Save Method: crash, reset the system, and have
      the hardware's PROM save the memory image to disk
    • Kernel Save Method: crash, save the memory image to
      disk, and then reset the system

8
LKCD Kernel Design Considerations
  • 1. Dump Save Mechanism
  • Kernel save method chosen because:
    • PROM/BIOS is too architecture-specific
    • Reset/power-off may clear memory
    • Kernel disk driver restrictions
    • Kernel code can be modified, whereas PROM code is
      difficult to change

9
LKCD Kernel Design Considerations
  • 2. Raw I/O vs. Buffer Cache I/O
  • Buffer cache locking cannot be worked around for
    dump handling without a major performance hit on
    basic I/O
  • Raw I/O was not fully supported in the Linux
    kernel
  • IDE, RAID, and other drivers need raw I/O hooks
    (the current plan is to create a driver layer above
    them to avoid the locking that would otherwise be
    necessary)

10
LKCD Kernel Design Considerations
  • 3. Kernel Code Location
  • Code changes are separated into generic and
    architecture-specific files
  • kernel/vmdump.c
  • arch//kernel/vmdump.c
  • Additional modifications were made to
    include/linux/sysctl.h, kernel/sysctl.c, and the
    kernel crash hook functions

11
LKCD Kernel Design Considerations
  • 4. Dump Storage
  • Memory dumps are saved to swap space
  • Swapping during boot-up is an issue
  • Disk partition tables in memory -- could this
    cause a data corruption problem?
  • Cannot assume filesystem layer will be available
    during crash

12
LKCD - Kernel Implementation
  • Dump Process Activation
  • Kernel hooks for executing the dump process:
    • The kernel directly calls panic()
    • A kernel exception occurs due to a system fault
      and calls die_if_kernel()
  • In both instances dump_execute() is called, which
    in turn calls the architecture-specific
    __dump_execute() to save the dump to disk
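
The activation path above can be sketched as a toy call graph. This is only a model of the control flow described on the slide: the function names mirror the slide (with arch_dump_execute standing in for __dump_execute()), while the bodies and the messages are assumptions made for illustration and are not taken from the LKCD source.

```python
# Toy model of the two dump entry points. Only the call order reflects the
# slide; everything inside the functions is an illustrative assumption.

calls = []  # records which routine ran, and why


def arch_dump_execute(reason):
    """Stand-in for the architecture-specific __dump_execute()."""
    calls.append(("__dump_execute", reason))


def dump_execute(reason):
    """Generic entry point that delegates to the architecture-specific code."""
    calls.append(("dump_execute", reason))
    arch_dump_execute(reason)


def panic(message):
    """Path 1: the kernel calls panic() directly."""
    dump_execute("panic: " + message)


def die_if_kernel(message):
    """Path 2: a kernel exception caused by a system fault."""
    dump_execute("exception: " + message)


panic("out of memory")
die_if_kernel("Oops")
for name, reason in calls:
    print(name, "<-", reason)
```

Both paths end in the same generic routine, which is what lets the dump code stay split into one generic file and one file per architecture.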

13
LKCD - Kernel Implementation
  • Storing Crash Dumps
  • Dump layout: Dump Header, Dump Page Headers, Dump
    Pages

14
LKCD - Kernel Implementation
  • Storing Crash Dumps
  • The first 64K of the crash dump contains the dump
    header, which shows the system state at the time
    of the kernel failure
  • Memory pages are written next, each with a page
    header containing:
    • virtual address of the page in memory
    • size of the page (important if compressed)
    • page flags (compressed, raw, dump end)
  • Finally, a page header with a special end marker
    is written and the dump process completes
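
A minimal sketch of this layout, written in Python for illustration. The 64K header size and the three page-header fields come from the slide; the field widths, the byte order, and the flag values below are assumptions and do not reflect the real LKCD on-disk format.

```python
import io
import struct

HEADER_SIZE = 64 * 1024              # from the slide: first 64K is the dump header
PAGE_HEADER = struct.Struct("<QII")  # assumed widths: address, size, flags
FLAG_COMPRESSED, FLAG_RAW, FLAG_END = 1, 2, 4  # assumed flag values


def write_dump(out, header_text, pages):
    """Write the header, then (page header, page data) pairs, then an end marker."""
    out.write(header_text.encode("ascii").ljust(HEADER_SIZE, b"\0"))
    for address, data in pages:
        out.write(PAGE_HEADER.pack(address, len(data), FLAG_RAW))
        out.write(data)
    # A page header carrying only the end flag closes the dump.
    out.write(PAGE_HEADER.pack(0, 0, FLAG_END))


def read_pages(src):
    """Skip the dump header and yield (address, data) until the end marker."""
    src.seek(HEADER_SIZE)
    while True:
        address, size, flags = PAGE_HEADER.unpack(src.read(PAGE_HEADER.size))
        if flags & FLAG_END:
            return
        yield address, src.read(size)


buf = io.BytesIO()
pages = [(0xC0100000, b"A" * 4096), (0xC0101000, b"B" * 4096)]
write_dump(buf, "example panic header", pages)
restored = list(read_pages(buf))
print(len(restored), hex(restored[0][0]))
```

Storing the size in every page header is what lets a reader step over compressed pages, whose length on disk differs from the page size in memory.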

15
Kernel Dump Tunables
  • The kernel dump tunables are listed in
    /etc/sysconfig/vmdump, which configures the
    behavior of the LKCD system
  • The tunables are:
    • DUMP_ACTIVE
    • DUMPDEV
    • DUMPDIR
    • DUMP_LEVEL
    • DUMP_COMPRESS_PAGES
    • PANIC_TIMEOUT
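
As an illustration, the sketch below parses a sysconfig-style file containing these tunables and checks that none is missing. The tunable names come from the slide; every value in SAMPLE is hypothetical, and the KEY=value syntax is assumed from the usual /etc/sysconfig convention rather than confirmed by the presentation.

```python
# All values below are hypothetical and chosen only to make the example run.
SAMPLE = """\
# example contents for /etc/sysconfig/vmdump (illustrative values)
DUMP_ACTIVE=1
DUMPDEV=/dev/vmdump
DUMPDIR=/var/log/dump
DUMP_LEVEL=8
DUMP_COMPRESS_PAGES=1
PANIC_TIMEOUT=5
"""

EXPECTED_KEYS = {
    "DUMP_ACTIVE", "DUMPDEV", "DUMPDIR",
    "DUMP_LEVEL", "DUMP_COMPRESS_PAGES", "PANIC_TIMEOUT",
}


def parse_tunables(text):
    """Parse KEY=value lines, ignoring blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


tunables = parse_tunables(SAMPLE)
missing = EXPECTED_KEYS - tunables.keys()
print(tunables)
print("missing:", sorted(missing))
```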

16
User Level Analysis - LCrash
  • lcrash is a utility that generates detailed
    kernel information about crash dumps. It contains
    many features for displaying information about
    the events leading up to a system crash in a
    clear, easy-to-read manner
  • It operates in two modes:
    • Crash Dump Report Generation
    • Interactive Crash Dump Analysis

17
User Level Analysis - LCrash
  • Crash Dump Report Generation
  • This report contains selected pieces of
    information from the kernel considered most
    useful when trying to identify the cause of a
    crash. The LCRASH report includes the following
    information:
    • General system information
    • Type of crash
    • Dump of system log_buf
    • CPU summary
    • Kernel stack trace leading up to the system PANIC
    • Disassembly of instructions before and after the
      instructions that caused the crash

18
User Level Analysis - LCrash
  • LCRASH Interactive Commands
  • For a more detailed examination of the elements
    of a crash
  • Kernel data displayed in a clear, easy-to-read
    manner
  • Invoked via an ASCII command line user interface
    featuring command line editing and command
    history
  • Command output can be piped to utilities such as
    more and grep

19
User Level Analysis - LCrash
  • LCRASH Interactive Commands examples
    • stat: Displays pertinent system information and
      the contents of the log_buf array
    • vtop: Displays virtual-to-physical address
      mappings for both kernel and application virtual
      addresses
    • symbol: Maps kernel symbols to virtual addresses
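
For kernel addresses in the direct-mapped region, the virtual-to-physical mapping is a constant offset. The sketch below assumes an i386 kernel with the default 3G/1G split, where that offset (PAGE_OFFSET) is 0xC0000000, and uses the log_buf address that appears in the example output later in the deck. It is not how lcrash implements its command: application addresses, and kernel addresses outside the direct map, need a page-table walk that is not shown here.

```python
PAGE_OFFSET = 0xC0000000  # assumption: i386 kernel with the default 3G/1G split


def kernel_virt_to_phys(vaddr):
    """Translate a direct-mapped kernel virtual address to a physical address."""
    if vaddr < PAGE_OFFSET:
        # User-space addresses are per-process and need a page-table walk.
        raise ValueError("not a direct-mapped kernel address")
    return vaddr - PAGE_OFFSET


print(hex(kernel_virt_to_phys(0xC0332C60)))
```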

20
User Level Analysis - LCrash
  • LCRASH Interactive Commands examples
    • dump: Dumps the contents of system memory in a
      variety of bases (hexadecimal, decimal, or octal)
      and data sizes (byte, short, int, or long)
    • task: Displays relevant information for selected
      tasks or all tasks running at the time of the
      crash
    • trace: Displays a kernel stack backtrace for
      selected tasks, or for all tasks running on the
      system
    • dis: Disassembles one or more machine instructions

21
lcrash Example Output
  • stat head
  • sysname Linux
  • nodename crashme.atmyhouse.com
  • release 2.4.8
  • version 9 SMP Mon Dec 10 00:05:19 PST 2001
  • machine i686
  • domainname (none)
  • LOG_BUF
  • dump log_buf 10
  • 0xc0332c60 4c3e343c 78756e69 72657620 6e6f6973  Linux version
  • 0xc0332c70 342e3220 2820382e 746f6f72 74617740  2.4.8 (root@cra
  • 0xc0332c80 79657265 70612e65  shme.atm
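
The first row of the log_buf dump can be checked by hand. On a little-endian machine such as the i686 shown above, each 32-bit word is stored lowest byte first, so packing the four words back into bytes should reproduce the text in the right-hand column. The sketch below does this for the first row only. It also recovers a leading "<4>", which looks like a printk log-level prefix that the transcript presumably lost.

```python
import struct

# The four 32-bit words from the first row of the dump (address 0xc0332c60).
words = [0x4C3E343C, 0x78756E69, 0x72657620, 0x6E6F6973]

# "<4I" packs four unsigned 32-bit integers in little-endian byte order.
raw = struct.pack("<4I", *words)
text = raw.decode("ascii")
print(repr(text))
```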

22
lcrash Example Output
  • task
    ADDR        UID   PID  PPID  STATE  FLAGS  CPU  NAME
    0xc02e4000    0     0     0      0      0    -  swapper
    0xdfffc000    0     1     0      0  0x100    -  init
    0xdfff2000    0     2     1      1   0x40    -  keventd
    0xdffee000    0     3     0      0   0x40    -  ksoftirqd_CPU0
    . . .
    0xde47a000    0   867     1      1  0x100    -  mingetty
    0xda0fe000    0  1017   660      0  0x140    -  sshd
    0xd9c06000    0  1018  1017      1  0x100    -  bash
    0xde4b4000    0  1101  1018      0  0x100    0  insmod
  • 31 active task structs found
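
The PID and PPID columns are enough to reconstruct who started what. In the listing, insmod is the only task with a CPU number instead of "-", which suggests (but does not prove) that it was running when the dump was taken. The sketch below walks its parent chain using only the rows visible on the slide; the helper name ancestry is made up for this example.

```python
# (pid, ppid, name) for the rows visible in the task listing.
TASKS = [
    (0, 0, "swapper"),
    (1, 0, "init"),
    (2, 1, "keventd"),
    (3, 0, "ksoftirqd_CPU0"),
    (867, 1, "mingetty"),
    (1017, 660, "sshd"),
    (1018, 1017, "bash"),
    (1101, 1018, "insmod"),
]


def ancestry(pid, tasks):
    """Return task names from pid up through its parents, as far as they are listed."""
    by_pid = {p: (pp, name) for p, pp, name in tasks}
    chain = []
    while pid in by_pid:
        ppid, name = by_pid[pid]
        chain.append(name)
        if ppid == pid:  # swapper (PID 0) is listed as its own parent
            break
        pid = ppid
    return chain


print(" <- ".join(ancestry(1101, TASKS)))
```

The chain stops at sshd because its parent, PID 660, falls in the part of the listing elided by ". . .".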

23
lcrash Example Output
  • t 0xda0fe000
  • STACK TRACE FOR TASK 0xda0fe000(sshd)
  • 0 schedule+1040 0xc0111250
  • 1 schedule_timeout+121 0xc0110d89
  • 2 do_select+506 0xc014251a
  • 3 sys_select+820 0xc01428c4
  • 4 system_call+44 0xc0106ed4
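
Each frame in this trace can be read as a function name, a byte offset into that function, and the resulting address. Subtracting the offset from the address gives the address where the function starts. The sketch below does that for the five frames shown, treating the offsets as decimal. That is an assumption, though the results land on 8- or 16-byte boundaries, which is at least consistent with typical function alignment.

```python
# (function, offset, address) taken from the trace; offsets are assumed to be decimal.
FRAMES = [
    ("schedule", 1040, 0xC0111250),
    ("schedule_timeout", 121, 0xC0110D89),
    ("do_select", 506, 0xC014251A),
    ("sys_select", 820, 0xC01428C4),
    ("system_call", 44, 0xC0106ED4),
]


def function_starts(frames):
    """Map each function name to its inferred start address (address - offset)."""
    return {name: address - offset for name, offset, address in frames}


starts = function_starts(FRAMES)
for name, start in starts.items():
    print(f"{name:18s} {start:#010x}")
```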



24
  • Reference
  • http://lkcd.sourceforge.net
  • Contact
  • harish@motorola.com

25
Questions/Comments?