Crash Data Collection and Analysis - PowerPoint PPT Presentation

1
Crash Data Collection and Analysis
  • Archana Ganapathi
  • Department of EECS, UC Berkeley
  • (archanag@cs.berkeley.edu)

2
ROC Philosophy
  • "If a problem has no solution, it may not be a
    problem, but a fact, not to be solved, but to be
    coped with over time." (Shimon Peres)

3
Motivation (1)
  • Collect real failure/attack information to:
    • drive benchmarks
    • evaluate our ideas via prototypes tested using
      benchmarks
    • help us select problems to tackle

4
Motivation (2): For the good of the industry
  • Determine the dominant causes of Windows crashes
  • Document the crash likelihood of SW/HW components
  • Evaluate product dependability
  • Build an oracle for system behavior

5
Motivation (3): My machine crashes
  • Since 2/25/04:
    • 3 system crashes
    • 30 application errors
    • 177 application hangs
  • Who cares?
    • I do!
    • People who share similar experiences
    • In general, customer uproar

6
Data Source (so far)
  • 300 research machines in the EECS department
  • Windows XP SP1
  • Relatively low variability in user profiles
  • Caveat: data might not be representative of many
    users

7
Data Collection
  • Collect minidumps that contain:
    • The Stop message, its parameters, and other data
    • Loaded drivers
    • Processor context for the processor that stopped
    • Process information and kernel context for the
      process/thread that stopped
    • The kernel-mode call stack for the thread that
      stopped

8
How we collect minidumps
  • Corporate Error Reporting (CER)
    • http://www.microsoft.com/resources/satech/cer/
    • Manages error reports/messages generated by
      Windows Error Reporting (WER) and other programs
  • Configure clients to redirect reports to the CER
    shared directory

9
CER setup
  • Modify group policy to send error reports to a
    directory on a local server
  • Easy-to-install software for the server
  • Use the software to configure/view:
    • CER shared directory and policies
    • Per-machine/per-user crash statistics

10
Crash reporting
  • Transparent to the user
    • No prompt asking the user to send a crash report
    • Crash reports automatically directed to the local
      server instead of Microsoft
  • Frequency of collection: synchronized with
    application and system crashes on the computers

11
Parsing the crash dumps
  • Use Microsoft's publicly available debugging
    tools
  • Load each crash dump and analyze it using:
    • the symbol server
    • executable images
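
A minimal sketch of this step in Python, assuming the console debugger `cdb` from Debugging Tools for Windows is on the PATH: run `!analyze -v` on every dump in a directory and keep the text output. The local symbol cache `C:\symbols`, the function names, and the directory layout are illustrative assumptions. Kernel (system crash) dumps are more commonly opened with `kd`, which accepts the same `-z`, `-y`, `-i`, and `-c` options, so the debugger name is a parameter.

```python
import subprocess
from pathlib import Path

# Microsoft's public symbol server, cached locally in C:\symbols
# (the cache location is an assumption; any writable directory works).
SYMBOL_PATH = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"


def analyze_command(dump: Path, image_path: str, debugger: str = "cdb") -> list[str]:
    """Build a debugger command line that opens `dump`, runs !analyze -v, then quits."""
    return [
        debugger,
        "-z", str(dump),         # open a crash dump file
        "-y", SYMBOL_PATH,       # symbol search path (symbol server + local cache)
        "-i", image_path,        # search path for executable images
        "-c", "!analyze -v; q",  # commands to run once the dump is loaded
    ]


def analyze_directory(dump_dir: str, image_path: str, debugger: str = "cdb") -> dict[str, str]:
    """Run the debugger on every .dmp file in dump_dir; return raw output per file name."""
    results = {}
    for dump in sorted(Path(dump_dir).glob("*.dmp")):
        proc = subprocess.run(
            analyze_command(dump, image_path, debugger),
            capture_output=True, text=True, check=False,
        )
        results[dump.name] = proc.stdout
    return results
```

Keeping the raw text per dump means the analysis can be re-parsed later without re-running the debugger.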

12
Analysis results
  • What happened that is immediately responsible for
    the crash
  • exact error code
  • brief description, primarily for debugging
  • Bucketing info, e.g. "driver fault"
  • Details for debugging, e.g. stack contents
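
The debugger reports these results as text, so bucketing many crashes needs a small parser. This sketch assumes each field of interest appears on its own line as `NAME:  value`; the field names in `WANTED` (for example `DEFAULT_BUCKET_ID`, which would carry a bucket such as "driver fault") are assumptions about the `!analyze -v` output, not a documented schema, and should be checked against real output.

```python
import re

# Matches lines of the form "UPPER_CASE_NAME:  value".
FIELD_RE = re.compile(r"^([A-Z][A-Z0-9_]*):\s+(.+?)\s*$")

# Fields kept for bucketing; these names are assumptions about the debugger output.
WANTED = ("BUGCHECK_STR", "DEFAULT_BUCKET_ID", "PROCESS_NAME", "IMAGE_NAME", "BUCKET_ID")


def parse_analysis(text: str) -> dict[str, str]:
    """Extract the first occurrence of each wanted field from !analyze -v text."""
    record = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line.strip())
        if m and m.group(1) in WANTED and m.group(1) not in record:
            record[m.group(1)] = m.group(2)
    return record
```

Records from many dumps can then be grouped by their bucket or image name.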

13
Sample Results
14
Raw Crash Data is Skewed
  • User retry
  • System instability period
  • Inter- and intra-app related crashes
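
The first two sources of skew can be reduced with a time-based filter: a crash is dropped when the same application on the same machine already crashed less than a fixed window earlier, so a burst of retries or an unstable period counts once. This is a minimal sketch; the `Crash` record and the 10-minute default window are hypothetical choices, not the filter parameters used in this study.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Crash:
    machine: str
    app: str
    time: datetime


def filter_crashes(crashes: list[Crash], window: timedelta = timedelta(minutes=10)) -> list[Crash]:
    """Collapse bursts: drop a crash if the same app on the same machine crashed
    less than `window` before it. Every report, kept or dropped, restarts the window."""
    kept = []
    last_seen: dict[tuple[str, str], datetime] = {}
    for c in sorted(crashes, key=lambda c: c.time):
        key = (c.machine, c.app)
        prev = last_seen.get(key)
        if prev is None or c.time - prev >= window:
            kept.append(c)
        last_seen[key] = c.time  # extend the current burst
    return kept
```

Restarting the window on every report collapses an arbitrarily long burst into one event; measuring only from the last kept report would instead keep one report per window.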

15
Sample Results
16
Sample Results
17
Time-based Crash Filter
18
What do we want to know from this data?
  • Do some apps appear to crash more than others?
  • Are some dlls more highly correlated with crashes
    than others?
  • Usage, design, etc.
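
Both questions start from counts of (filtered) crash reports per application and per faulting module. A minimal sketch, assuming each report has already been reduced to a hypothetical `(application, module)` pair:

```python
from collections import Counter


def crash_counts(reports: list[tuple[str, str]]) -> tuple[Counter, Counter]:
    """Count crashes per application and per faulting module (dll/exe)."""
    by_app = Counter(app for app, _ in reports)
    by_module = Counter(module for _, module in reports)
    return by_app, by_module


def share(counter: Counter, top: int = 5) -> list[tuple[str, float]]:
    """Return the `top` most frequent keys with their fraction of all crashes."""
    total = sum(counter.values())
    return [(key, n / total) for key, n in counter.most_common(top)]
```

Raw counts favor applications and dlls that are used heavily, which is one of the limitations noted later in this talk.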

19
Common Hypotheses
  • Microsoft OS is unreliable and causes most
    crashes
  • Third-party dlls aren't as well written as
    Microsoft dlls
  • Some applications crash more than others in the
    same category

20
Application Categories
21
User behavior study
  • How regularly do people use different
    applications?
  • Do people behave differently if different things
    crash?
  • How frequently do people proactively reboot their
    computer for system stability?

22
Preliminary User Survey
23
Crashes by app category
24
Digging deeper
  • Microsoft tools categorize the cause of each crash
    based on the stack and symbols
  • Our analysis is only as accurate as their
    analysis tools allow it to be

25
Crashes by application
26
Crashes by component
27
Who is responsible for these faulty components?!
  • Hungapp: no one and everyone
  • Dlls: whoever writes them
  • Exes: whoever writes them
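
Applied mechanically, these rules need only the faulting module named in each report and a lookup from module to author. In this sketch `vendor_of` is a hypothetical table with lowercase module names as keys (on Windows such a table could be built from each file's version resource); reports naming `hungapp` stay unattributed, mirroring "no one and everyone". Tallying the result gives a first look at the Microsoft versus third-party hypothesis.

```python
from collections import Counter


def responsible_party(module: str, vendor_of: dict[str, str]) -> str:
    """Attribute a crash to whoever wrote the faulting module.

    'hungapp' marks an application hang rather than one faulting component,
    so it is not attributed to any single author. `vendor_of` keys are lowercase.
    """
    name = module.lower()
    if name == "hungapp":
        return "unattributed (hang)"
    return vendor_of.get(name, "unknown vendor")


def tally_by_party(modules: list[str], vendor_of: dict[str, str]) -> Counter:
    """Count crashes per responsible party."""
    return Counter(responsible_party(m, vendor_of) for m in modules)
```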

28
Applications that hang
29
Bad components
30
Who wrote the crashing components?
31
Application Dependencies
[Figure: crash timeline for App a, App b, and App c; x-axis: Time, 0 to 40]
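
A simple way to look for the inter-application dependencies suggested by such a timeline is to count how often, on the same machine, a crash of one application is followed within a short window by a crash of a different application. The sketch below does that; the five-minute window and the `(machine, app, time)` tuple layout are assumptions, and a high count only flags a pair for closer inspection, since co-occurrence does not establish that one crash caused the other.

```python
from collections import Counter
from datetime import datetime, timedelta


def follow_on_pairs(crashes: list[tuple[str, str, datetime]],
                    window: timedelta = timedelta(minutes=5)) -> Counter:
    """Count (earlier_app, later_app) pairs where, on the same machine,
    later_app crashed within `window` after earlier_app (different apps only)."""
    pairs = Counter()
    by_machine: dict[str, list[tuple[datetime, str]]] = {}
    for machine, app, t in crashes:
        by_machine.setdefault(machine, []).append((t, app))
    for events in by_machine.values():
        events.sort()  # chronological order per machine
        for i, (t_i, app_i) in enumerate(events):
            for t_j, app_j in events[i + 1:]:
                if t_j - t_i > window:
                    break  # sorted, so every later event is farther away
                if app_j != app_i:
                    pairs[(app_i, app_j)] += 1
    return pairs
```
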
32
Inter-app dependencies
33
System Crashes
  • XP operating system crashes very rarely!
  • More impact: most require a reboot
  • 4/490 isn't enough data to study system
    crashes!!
  • Copied system crash dump files from peers'
    C:/windows/minidump directories
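
Gathering dumps from many peers by hand is tedious; a short script can copy them into one directory. This sketch assumes each peer's minidump directory is reachable as an ordinary path (for example through a network share) and uses a hypothetical `<machine>_<file name>` scheme so identically named dumps from different machines do not overwrite each other.

```python
import shutil
from pathlib import Path


def collect_minidumps(sources: dict[str, str], dest: str) -> list[Path]:
    """Copy every *.dmp file from each machine's minidump directory into dest.

    `sources` maps a machine label to the path of its minidump directory.
    Copied files are renamed '<machine>_<original name>'.
    """
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for machine, directory in sources.items():
        src_dir = Path(directory)
        if not src_dir.is_dir():
            continue  # machine unreachable or no dumps written yet
        for dump in sorted(src_dir.glob("*.dmp")):
            target = out_dir / f"{machine}_{dump.name}"
            shutil.copy2(dump, target)  # copy2 also preserves file timestamps
            copied.append(target)
    return copied
```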

34
50 System Crashes
  • watchdog.sys: 7
  • ar5211.sys: 6
  • ibmpmdrv.sys: 6
  • ati3duag.dll: 5
  • SYMEVENT.SYS: 3
  • ipsecw2k.sys: 3
  • memory_corruption: 3
  • ialmdev5.DLL: 2
  • PSCRIPT4.DLL: 2
  • ntoskrnl.exe: 2
  • CLASSPNP.SYS: 2
  • win32k.sys: 2
  • SynTP.sys: 1
  • TDI.SYS: 1
  • ino_fltr.sys: 1
  • ks.sys: 1
  • drvnddm.sys: 1
  • ntkrnlmp.exe: 1
  • Pool_Corruption: 1

35
50 System Crashes
36
Limitations of this analysis
  • No information on how long each application was
    used before it crashed
  • Need metrics on system and application usage
  • Skewed by the user community's usage patterns
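
Once usage data is available, crash counts can be normalized into rates such as crashes per hour of use, so that a heavily used application is not penalized simply for being used more. A minimal sketch, where the per-application usage hours are hypothetical inputs from whatever monitoring is deployed:

```python
def crash_rates(crash_counts: dict[str, int], usage_hours: dict[str, float]) -> dict[str, float]:
    """Crashes per hour of use for each application with recorded usage.

    Applications with no (or zero) recorded usage are skipped, since their
    rate is undefined; applications used but never crashed get a rate of 0.
    """
    return {
        app: crash_counts.get(app, 0) / hours
        for app, hours in usage_hours.items()
        if hours > 0
    }
```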

37
Useful Metrics
  • Availability
    • system uptime
  • CPU(s)
    • processes, processor queue length, non-idle
  • Memory
    • available physical memory, free swap space
  • Disk(s)
    • free space
  • Network(s)
    • IP address, packets/bytes sent/received per second
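
Of these, availability is the easiest to compute from event logs: it is the fraction of the observation period during which the system was up. A sketch, assuming downtime is recorded as hypothetical `(start, end)` timestamp pairs; overlapping or out-of-range outages are clipped and merged so downtime is not double-counted:

```python
from datetime import datetime, timedelta


def availability(period_start: datetime, period_end: datetime,
                 outages: list[tuple[datetime, datetime]]) -> float:
    """Fraction of [period_start, period_end] not covered by any outage."""
    total = period_end - period_start
    # Keep only outages that overlap the period, clipped to its bounds.
    clipped = sorted((max(s, period_start), min(e, period_end))
                     for s, e in outages if e > period_start and s < period_end)
    down = timedelta(0)
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                down += cur_end - cur_start  # close the previous merged outage
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping outage: extend it
    if cur_end is not None:
        down += cur_end - cur_start
    return 1 - down / total
```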

38
User survey limitations
  • Data only as accurate as user reports
  • Difficult to estimate time spent using each
    application
  • Hard to distinguish active usage from background
    processes

39
Future Work
  • Expand data set
    • Non-academic environment
    • Other operating systems
  • Perform fine-grained inter-application dependency
    analysis
  • Compare failure paths of related crashes

40
Concluding Thoughts
  • Lots of interesting patterns to be learned from
    crashes
  • Getting data is difficult
  • That's why I'm here?
  • I would love to share analyzed results with data
    contributors!

41
Questions/comments?
  • UC Berkeley ROC website: http://roc.cs.berkeley.edu