Dependency Tracking in software systems

Provided by: MySt9
Learn more at: https://www.sosy-lab.org

1
Dependency Tracking in software systems
  • Presented by Ashgan Fararooy

2
Related Papers
  • Supporting Software Evolution Analysis with
    Historical Dependencies and Defect Information
    (ICSM 2008)
  • A Flexible Framework to Support Collaborative
    Software Evolution Analysis (CSMR 2008)
  • Mining Software Repositories for Traceability
    Links (ICPC 2007)
  • Tracking Objects to Detect Feature Dependencies
    (ICPC 2007)
  • Software Repositories: A Source for Traceability
    Links (TEFSE-GTC 2007)
  • Mining Version Archives for Co-changed Lines
    (ICSE 2006)
  • Understanding Semantic Impact of Source Code
    Changes: An Empirical Study

3
Mining Version Archives for Co-changed Lines
  • Thomas Zimmermann, Sunghun Kim, Andreas Zeller,
    E. James Whitehead Jr.
  • (ICSE 2006)

4
Abstract
  • Research on co-change has mostly investigated
    files, classes, or methods
  • Presents a first study at the level of lines
  • Introduces the annotation graph, which captures
    how lines evolve over time
  • Provides more fine-grained software evolution
    information (based on lines)

5
Overview
  • Co-change: items that are changed together are
    related to each other
  • Any granularity: modules, files, classes, methods
  • What about more fine-grained items (blocks,
    lines)?

6
Co-Change in More Fine-Grained Items
  • Seemed infeasible
  • Hard to identify across different versions
  • Line numbers are not suitable identifiers
  • The annotation feature of SCM systems is not
    enough
  • Line content is not a good identifier either

7
Annotation Graph
  • Definition
  • A multipartite graph where each part corresponds
    to one version of a file
  • Within each part/version every line is
    represented by a single node
  • Edges between nodes indicate that a line
    originates from another by modification or
    movement
  • Node labels (e.g. bold node) indicate a changed
    line
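
The definition above can be read as a small data structure. The following is a minimal sketch under that reading, not the authors' implementation; the names `Node`, `AnnotationGraph`, `add_edge`, and `predecessors` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical representation: a node is (revision index, line index).
Node = tuple[int, int]


@dataclass
class AnnotationGraph:
    # One list of line texts per revision; each revision is one "part".
    revisions: list[list[str]] = field(default_factory=list)
    # Directed edges from a line in an earlier revision to a line it became.
    edges: set[tuple[Node, Node]] = field(default_factory=set)
    # Nodes labelled as changed (the bold nodes mentioned on the slide).
    changed: set[Node] = field(default_factory=set)

    def add_edge(self, src: Node, dst: Node) -> None:
        # Multipartite: edges never connect two lines of the same revision.
        if src[0] == dst[0]:
            raise ValueError("edges must connect different revisions")
        self.edges.add((src, dst))

    def predecessors(self, node: Node) -> set[Node]:
        # Lines in earlier revisions from which `node` originates.
        return {s for s, d in self.edges if d == node}


graph = AnnotationGraph(revisions=[["a", "b"], ["a", "B"]])
graph.add_edge((0, 0), (1, 0))  # unchanged line
graph.add_edge((0, 1), (1, 1))  # modified line
graph.changed.add((1, 1))
```

Storing edges as a plain set keeps the sketch short; a real tool would more likely index edges by target node.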

8
Annotation Graph

9
Annotation Graph
  • Construction
  • Compare every pair of subsequent revisions of
    a file
  • Use the GNU diff tool to compute textual
    differences
  • The diff tool returns a list of regions (hunks)
    that differ between the two files
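
To make the notion of a hunk concrete, here is a small sketch. It uses Python's standard `difflib` as a stand-in for GNU diff, so the regions it reports may differ from what GNU diff would produce; the helper name `hunks` is an assumption made for this example.

```python
import difflib


def hunks(old_lines, new_lines):
    """Return the differing regions between two versions of a file.

    Each hunk is (tag, old_start, old_end, new_start, new_end) with
    0-based, end-exclusive indices; tag is 'replace', 'delete' or 'insert'.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    # Opcodes cover the whole file; hunks are the non-equal regions.
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


old = ["int a;", "int b;", "return a;"]
new = ["int a;", "long b;", "return a;", "// done"]
result = hunks(old, new)
for hunk in result:
    print(hunk)
```

For this input the sketch reports one modification (line 2 of the old file against line 2 of the new file) and one addition at the end of the new file.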

10
Annotation Graph
  • Three different kinds of changes
  • Modifications: result in a complete bipartite
    subgraph
  • Additions: do not result in any edges; the
    positions of the following lines are updated
  • Deletions: same effect as additions

11
Annotation Graph
  • Computation
  • Creates a node for each line of each revision
  • Two approaches
  • 1. Forward-directed
  • 2. Backward-directed

12
Annotation Graph
  • Computation (forward-directed algorithm)
  • Iterate over all pairs of subsequent revisions
  • For each pair, compute the differences (hunks)
  • Process the hunks to create edges
  • Unchanged lines: exactly one edge between the
    corresponding nodes
  • Modified lines: all possible edges
  • Inserted and deleted lines: no edges
  • Label the nodes of the later revision as changed
    in modifications and additions
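
The steps of the forward-directed algorithm can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `difflib` replaces GNU diff, nodes are `(revision index, line index)` pairs, and the function name `build_annotation_graph` is hypothetical.

```python
import difflib


def build_annotation_graph(revisions):
    """Build edges and change labels for a list of revisions (oldest first).

    Each revision is a list of lines. Returns (edges, changed), where nodes
    are (revision_index, line_index) pairs.
    """
    edges, changed = set(), set()
    # Iterate over all pairs of subsequent revisions.
    for r in range(len(revisions) - 1):
        old, new = revisions[r], revisions[r + 1]
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Exactly one edge between each pair of unchanged lines.
                for k in range(i2 - i1):
                    edges.add(((r, i1 + k), (r + 1, j1 + k)))
            elif tag == "replace":
                # Modification: all possible edges (complete bipartite).
                for i in range(i1, i2):
                    for j in range(j1, j2):
                        edges.add(((r, i), (r + 1, j)))
            # Insertions and deletions create no edges.
            if tag in ("replace", "insert"):
                # Label the nodes of the later revision as changed.
                changed.update((r + 1, j) for j in range(j1, j2))
    return edges, changed


history = [
    ["a", "b", "c"],
    ["a", "B1", "B2", "c"],
    ["a", "B1", "B2", "c", "d"],
]
edges, changed = build_annotation_graph(history)
```

In this small history, line `b` is modified into two lines in the second revision (two edges from one node), and line `d` is added in the third revision (a labelled node with no incoming edge).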

13
Annotation Graph
  • Problem
  • Changes that modify large parts of a file
    result in a large number of edges
  • Not reasonable for evolution analysis

14
Annotation Graph
  • Treat large modifications as combined deletions
    and additions
  • No edges are created for them in the annotation
    graph
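
A rough sketch of this treatment is given below. The slides do not state how a modification is recognized as large, so the size cutoff `max_hunk_lines` and the function name `modification_edges` are assumptions made purely for illustration.

```python
def modification_edges(rev, i1, i2, j1, j2, max_hunk_lines=4):
    """Edges for a modified hunk old[i1:i2] -> new[j1:j2] between rev and rev+1.

    max_hunk_lines is a hypothetical cutoff, not a value from the slides.
    """
    old_size, new_size = i2 - i1, j2 - j1
    if max(old_size, new_size) > max_hunk_lines:
        # Large modification: handled as deletion plus addition, so no edges.
        return set()
    # Small modification: complete bipartite subgraph between the two sides.
    return {((rev, i), (rev + 1, j))
            for i in range(i1, i2)
            for j in range(j1, j2)}


small = modification_edges(0, 0, 2, 0, 3)    # 2 x 3 lines
large = modification_edges(0, 0, 50, 0, 40)  # 50 x 40 lines
```

Without the cutoff, the second hunk alone would contribute 50 × 40 = 2000 edges, which illustrates why large modifications are problematic for the graph.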

15
Annotation Graph
  • Recognizing Large Modifications

16
Annotating Lines
  • Comparison
  • Most SCM systems have an annotate feature that
    reports, for each line, its latest change
  • Annotation graphs can be used to get such
    information
  • Furthermore, they provide information on all past
    changes
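
One way to read this comparison: walking the graph backwards from a line collects every revision in which that line or one of its ancestors was labelled as changed, and the most recent of these corresponds to what an SCM annotate command reports. The sketch below assumes the `(revision index, line index)` node representation; `annotate` is a hypothetical helper name.

```python
def annotate(node, edges, changed):
    """Return all revision indices in which `node` or an ancestor changed."""
    # Index edges by target so predecessors can be looked up quickly.
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)

    revisions, stack, seen = set(), [node], {node}
    while stack:
        current = stack.pop()
        if current in changed:
            revisions.add(current[0])
        for p in preds.get(current, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return revisions


# Tiny hand-written graph: line 1 is modified in revisions 1 and 2.
edges = {
    ((0, 0), (1, 0)), ((0, 1), (1, 1)),
    ((1, 0), (2, 0)), ((1, 1), (2, 1)),
}
changed = {(1, 1), (2, 1)}

all_changes = annotate((2, 1), edges, changed)  # full change history
latest_change = max(all_changes)                # what SCM annotate would show
```

In this sketch a line that was never labelled as changed yields an empty set; how the initial revision is labelled is left open here.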

17
Life Cycle of Lines
  • Investigated the life cycle of lines for the
    Eclipse project
  • How frequently are lines changed?
  • Computed the change count for each line: the
    number of distinct revisions in its annotation
  • How many developers change a line?
  • What are the most frequently changed lines?
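
Given per-line annotations, the three questions reduce to simple counting. The sketch below uses made-up toy data (line identifiers, revision ids, and author names are all hypothetical, not results from the Eclipse study) to show the computation only.

```python
from collections import Counter

# Hypothetical inputs: the annotation of each line (set of revision ids)
# and the author of each revision.
annotations = {
    "Foo.java:10": {"1.2", "1.5", "1.7"},
    "Foo.java:11": {"1.2"},
    "Foo.java:42": {"1.3", "1.5"},
}
authors = {"1.2": "alice", "1.3": "bob", "1.5": "alice", "1.7": "carol"}

# Change count: number of distinct revisions in a line's annotation.
change_count = Counter({line: len(revs) for line, revs in annotations.items()})

# Number of distinct developers who changed each line.
developer_count = {
    line: len({authors[rev] for rev in revs})
    for line, revs in annotations.items()
}

# The most frequently changed lines, highest change count first.
most_changed = change_count.most_common(2)
```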

18
Finding Related Lines
  • Computed related lines using frequent pattern
    mining
  • Used transaction ids instead of revision ids
  • Used the Apriori algorithm
  • Inferred useful association rules
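
The mining step can be sketched with a compact, textbook-style Apriori. This is a generic illustration, not the configuration used in the study: each transaction is assumed to be the set of line identifiers changed together, and the identifiers, `min_support`, and `min_confidence` values below are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent, k = {}, 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent sets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules X -> Y."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence


# Hypothetical transactions: lines changed together in one commit.
transactions = [
    {"A:3", "B:7"},
    {"A:3", "B:7", "C:1"},
    {"A:3", "C:1"},
    {"B:7"},
]
frequent = apriori(transactions, min_support=2)
found = list(rules(frequent, min_confidence=0.9))
```

With these toy transactions, the only rule above the confidence cutoff says that whenever line `C:1` changes, line `A:3` changes as well.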

19
Thank you