Discovering Models of Software Processes from Event-Based Data

1
Discovering Models of Software Processes from
Event-based Data
  • Jonathan E. Cook and Alexander L. Wolf
  • TOSEM 1998
  • June 11, 2002
  • Yoon, Kyung-A

2
Contents
  • Introduction
  • Approach
  • Method for process discovery
  • Rnet
  • Ktail
  • Markov
  • DaGama discovery tool
  • Case study
  • Conclusion

3
Introduction(1/3) - background and motivation
  • Many software process technologies assume that a
    formal model of the process exists, for the sake of
  • Unambiguity
  • Communication
  • Automation

[Figure labels: Process model, Process discovery]
4
Introduction(2/3) - process discovery
  • Methods for automatically deriving a formal model
    of a process from basic event data collected on
    the process
  • Foundation for process discovery
  • Grammar inference
  • Sentences in the language ↔ data describing the
    process behavior
  • Grammar of the language ↔ formal model of the process
  • Data mining
  • The task of discovering behavioral information in
    data
  • Reverse engineering

5
Introduction(3/3)- event-based framework and FSM
  • Event-based framework
  • Event
  • Is typed and can have attributes (e.g., time)
  • Used to characterize the dynamic behavior of a
    process in terms of identifiable, instantaneous
    actions
  • A single event stream represents one execution of
    one process
  • FSM (finite-state machine)
  • Convenient and sufficiently powerful for
    describing historical patterns of actual behavior
  • Reduces the complexity of the discovery problem
  • No inherent ability to model concurrency

6
Approach
  • Goal of work
  • To use event data collected from a software
    process execution to infer a formal model of the
    behavior of the process

7
Method for process discovery- overview
  • Three grammar inference methods
  • RNet
  • Statistical (neural network) approach that looks
    at the past behavior to characterize a state
  • Ktail
  • Algorithmic approach that looks at the future
    behavior to compute a possible current state
  • Markov
  • Hybrid statistical and algorithmic approach that
    looks at the neighboring past and future behavior
    to define a state
  • Simple event stream example
  • Edit, Review, Checkin

Edit-Review-Checkin (ERC) Edit-Checkin-Review
(ECR)
8
Method for process discovery- RNet(1/2)
  • Statistical approach
  • Extended by Das and Mozer (1994)
  • Supports an arbitrary number of token types
  • Standard feed-forward neural network is trained
  • Propagating the difference between actual and
    desired outputs backward through the network

9
Method for process discovery- RNet(2/2)
  • Result
  • RNet successfully produces a deterministic FSM
  • Edit-Review-Checkin and Edit-Checkin-Review
  • RNet models behavior that is not present in the
    stream
  • Edit-Review-Review
  • Advantage
  • Robust w.r.t. input stream noise
  • Disadvantage
  • Training is very slow
  • Size of the net grows rapidly with the number of
    token types

10
Method for process discovery- Ktail(1/3)
  • Algorithmic approach
  • Based on work by Biermann and Feldman (1972)
  • Takes a sample string as input and gives an FSM as
    output
  • The basic concept of Ktail
  • A state is defined by what future behaviors can
    occur from it
  • The current state is reached by a given history
    (a string prefix)
  • Future behavior is defined as the next k tokens
  • This work examines a k-length future from all
    points in an input string and reduces the number
    of states in the FSM

11
Method for process discovery- Ktail(2/3)
  • Definition of Ktail
  • Equivalence class $E$ is a set of prefixes such
    that
  • $\forall p, p' \in E,\ \forall t \in T_k:\ p \cdot t \in P \iff p' \cdot t \in P$
  • $S$: the set of sample strings
  • $A$: the alphabet of tokens that make up the
    strings in $S$
  • $P$: the set of all prefixes in $S$
  • $p \in P$: a valid prefix for some subset of the
    strings in $S$
  • $t$: a token string (tail)
  • $T_k$: the set of all strings composed from $A$ of
    length $k$ or less
  • Transitions among states are given by the set $D$ of
    equivalence classes
  • $D = \bigcup_{p \in E_i} E[p \cdot a]$
  • $D$: the destination states of the transitions
  • $E_i$: a given state (equivalence class)
  • $a$: a token, $a \in A$
  • $E[p \cdot a]$: the equivalence class containing the
    prefix $p \cdot a$
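The definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ktail_fsm`, the tuple encoding of strings, and the integer state ids are all hypothetical, and the state-merging step described on the next slide is not included.

```python
from collections import defaultdict


def ktail_fsm(samples, k):
    """Group prefixes into equivalence classes by their tails of length <= k.

    `samples` is a list of token tuples (the set S). Returns a mapping from
    each prefix to a state id, plus the transition relation between states.
    """
    # P: every prefix of every sample string, including the empty prefix
    # and the full strings themselves.
    prefixes = {tuple(s[:i]) for s in samples for i in range(len(s) + 1)}

    # For each prefix p, collect every tail t with len(t) <= k and p.t in P.
    tails = defaultdict(set)
    for q in prefixes:
        for i in range(max(0, len(q) - k), len(q) + 1):
            tails[q[:i]].add(q[i:])

    # Prefixes with identical tail sets fall into the same equivalence class.
    class_of = {}
    ids = {}
    for p in sorted(prefixes, key=lambda p: (len(p), p)):
        key = frozenset(tails[p])
        class_of[p] = ids.setdefault(key, len(ids))

    # D = union of E[p.a] over all p in E_i, for each token a.
    transitions = defaultdict(set)
    for p in prefixes:
        if p:
            transitions[(class_of[p[:-1]], p[-1])].add(class_of[p])
    return class_of, dict(transitions)


# The two example streams from the overview slide.
samples = [("Edit", "Review", "Checkin"), ("Edit", "Checkin", "Review")]
class_of, transitions = ktail_fsm(samples, k=1)
```

In this example the two complete strings end up in the same class, because neither has any continuation in the sample; every other prefix has a distinct set of tails and keeps its own state.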

12
Method for process discovery- Ktail(3/3)
  • FSM inferred by Ktail (k = 2)
  • Merging state
  • If S1 has transitions to states S2, .., Sn for a
    token t, and if the sets of output transition
    tokens for the states S2, .., Sn are equivalent
    or strict subsets, then we merge states S2, ..,
    Sn.

13
Method for process discovery- Markov(1/6)
  • Hybrid of statistical and algorithmic approach
  • Uses the concept of Markov models to find the
    most probable event sequence production
  • Algorithmically converts those probabilities into
    states and state transitions
  • Assumptions of Markov model
  • There are a finite number of states defined for
    the process
  • At any point in time, the probability of the
    process being in some state is only dependent on
    the previous state that the process was in
  • The state transition probabilities do not change
    over time
  • The initial state of the process is defined
    probabilistically
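The assumptions above correspond to the standard definition of a first-order, time-homogeneous Markov chain. The notation below ($X_t$ for the state at step $t$, $s_1, \ldots, s_n$ for the finite set of states, $a_{ij}$ for the transition probabilities, and $\pi_i$ for the initial distribution) is introduced here for illustration and does not appear in the slides.

```latex
\begin{align*}
P(X_{t+1} = s_j \mid X_t = s_i, X_{t-1}, \ldots, X_0)
  &= P(X_{t+1} = s_j \mid X_t = s_i) = a_{ij}
  && \text{(independent of } t\text{)} \\
\pi_i &= P(X_0 = s_i), \qquad \sum_{i=1}^{n} \pi_i = 1
\end{align*}
```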

14
Method for process discovery- Markov(2/6)
  • Four steps
  • St1) Construction of the event-sequence
    probability tables by traversing the event stream
  • St2) Construction of the event graph from the
    probability tables
  • St3) Find overconnected vertices and correct
    them by splitting
  • St4) Convert the event graph to proper form

15
Method for process discovery- Markov(3/6)
  • Construction of the event-sequence probability
    tables by traversing the event stream

[Figure: first- and second-order event-sequence
probability tables]
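One way to build such tables is to count how often each event follows each preceding sequence of events. The sketch below assumes that a table entry is the relative frequency of an event given the preceding `order` events; the function name `sequence_probabilities` and the dictionary layout are hypothetical and are not taken from the paper or from DaGama.

```python
from collections import Counter


def sequence_probabilities(stream, order):
    """Estimate P(next event | previous `order` events) from one event stream."""
    context_counts = Counter()
    sequence_counts = Counter()
    for i in range(len(stream) - order):
        context = tuple(stream[i:i + order])
        nxt = stream[i + order]
        context_counts[context] += 1
        sequence_counts[(context, nxt)] += 1
    # Normalize each count by the number of times its context was seen.
    return {
        (context, nxt): n / context_counts[context]
        for (context, nxt), n in sequence_counts.items()
    }


# A short made-up stream over the event types Edit (E), Review (R), Checkin (C).
stream = ["E", "R", "C", "E", "C", "R", "E", "R", "C"]
first_order = sequence_probabilities(stream, order=1)
second_order = sequence_probabilities(stream, order=2)
```

For this stream, `first_order[(("E",), "R")]` is 2/3 because Review follows Edit in two of the three places where Edit has a successor, and `second_order[(("E", "R"), "C")]` is 1.0 because Edit-Review is always followed by Checkin.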
16
Method for process discovery- Markov(4/6)
  • Construction of the event graph from the
    probability tables

[Figure: event graph with vertices E, R, and C, built from the
first- and second-order event-sequence probability tables]
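A rough sketch of this step follows. It assumes that each event type becomes a vertex and that an edge is added for every first-order sequence whose probability reaches a cutoff; the cutoff parameter `threshold`, its default value, the function name `event_graph`, and the sample table are all illustrative assumptions, and the use of the second-order table is left out.

```python
def event_graph(first_order, threshold=0.1):
    """Build a directed event graph from a first-order probability table.

    `first_order` maps ((previous_event,), next_event) to a probability.
    Returns the vertex set and the set of (source, target) edges.
    """
    vertices = set()
    edges = set()
    for ((prev,), nxt), prob in first_order.items():
        # Every event type that appears in the table gets a vertex.
        vertices.update((prev, nxt))
        # Only sufficiently probable sequences become edges.
        if prob >= threshold:
            edges.add((prev, nxt))
    return vertices, edges


# Hypothetical table for the event types Edit (E), Review (R), Checkin (C).
table = {
    (("E",), "R"): 0.67,
    (("E",), "C"): 0.33,
    (("R",), "C"): 1.00,
    (("C",), "E"): 0.05,  # below the cutoff, so no edge is created
}
vertices, edges = event_graph(table, threshold=0.1)
```

Dropping low-probability sequences is what would make such a construction tolerant of noise in the stream, at the cost of having to choose the cutoff.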
17
Method for process discovery- Markov(5/6)
  • Find overconnected vertices and correct them by
    splitting

[Figure: first- and second-order event-sequence
probability tables]
18
Method for process discovery- Markov(6/6)
  • Convert the event graph to proper form

[Figure: event graph G and its converted form]
19
Method for process discovery- evaluation
  • Comparison of discovery methods
  • Event stream length vs. time and space
    requirements
  • Number of event types vs. time and space
    requirements
20
Method for process discovery- DaGama discovery
tool
  • DaGama is integrated into the Balboa data analysis
    framework
  • Usage
  • Select event stream
  • Choose a discovery method
  • Specify the method's parameters
  • Run the method
  • The discovered model is displayed in a Balboa
    process model viewer
  • The model can be edited by a process engineer

21
Case study- overview
  • Conducted at AT&T Bell Laboratories with DaGama
  • Change request process for a large
    telecommunications software system
  • The prescribed process documented by the
    organization was not strictly enforced
  • 159 executions of the process
  • 141 accepted fixes and 18 rejected fixes
  • 32 event types

22
Case study- discovering a process model
  • DaGama found the general patterns of behavior
    entrenched within the data
  • These give a process engineer a sound starting
    point for constructing an accurate and useful model
  • The discovered model reflected a much greater amount
    of the process behavior than the prescribed
    process model documented by the organization
    (about 65%)

23
Conclusion
  • The Ktail and Markov methods show the most promise,
    and RNet is not sufficiently mature
  • Methods for process discovery support the process
    engineer in constructing initial process models
  • May give the process engineer clues as to when
    and in what direction the process model should
    evolve, based on data from the currently
    executing process