Title: Visualization and Analysis of Open Source Software Evolution using An Evolution Curve Method
1. Visualization and Analysis of Open Source Software Evolution using An Evolution Curve Method
- Dr. Robertas Damaševicius
- Software Engineering Department,
- Kaunas University of Technology
- Studentu 50-415, Kaunas, Lithuania
- Email: robertas.damasevicius_at_ktu.lt
- http://soften.ktu.lt/damarobe
2. Context and Problem
- Software systems are
- designed, constructed and used by people
- components in larger socio-technical systems
- Software design is
- a social process embedded within organizational and cultural structures
- influenced by social processes such as programmer collaboration in teams
- Open source software systems
- Free to use
- Free availability of source code
- Developed by many programmers
- Continuously evolve
- Aim: analysis of open source software evolution using metrics
3. What is software evolution?
- Definition
- a continuing process in time during which some essential software properties are changed
- Activities
- modification, adaptation, maintenance, and other activities which occur after the delivery of the first operational release to the users
- Importance
- costs devoted to system maintenance and evolution account for more than 90% of total software costs (Erlikh, 1990)
4. Forces and factors of open source software evolution
- Evolution of open source systems
- less strict control and management model
- usually started by a single developer (seed)
- attracted users become co-developers
- governed by the needs of users and spontaneous collaboration of co-developers
- Evolution mechanisms
- natural selection, competition
- variation-increasing and variation-decreasing
- influenced by psychological, intellectual, social and cultural, economic and business factors
5. Software metrics
- Common
- Source lines of code
- Cyclomatic complexity
- Halstead metrics
- Number of classes and interfaces
- R.C. Martin's software package metrics
- Cohesion, Coupling
- Specific software evolution metrics
- SDI metric
- Lmetric
- AICC metric
- G-metric
- Software development models
- Statistical models
- Rayleigh model
- Halstead's Software Science model
- COCOMO model
6. Lehman's Laws of Software Evolution
- Formulated by M.M. Lehman in the 1980s
- Law of Continuing Change
- Law of Increasing Complexity
- Law of Statistically Smooth Growth
- Law of Organisational Stability
- Law of Conservation of Familiarity
- Law of Continuing Growth
- Law of Declining Quality
- Law of Feedback System
- Evolution forces
- Growth
- Maintenance
7. Transition-based model of evolution
- Stages: many, often overlapping
- Transitions: breakpoints between stages, which represent significant changes. Transitions occur because, as a system evolves, its structure must be regularly adapted to the changing requirements and environment
- Gradual change: a slow process of incremental change caused by accumulating maintenance steps or gradual decay
- Sudden change: significant changes in the evolving system or in the process by which it is evolved
8. Information-theoretic methods
- Shannon entropy
- A measure of the uncertainty associated with a random variable.
- If the information source generates a series of symbols x_i belonging to an alphabet of size N according to a known probability distribution p(x_i), the entropy function H of a sequence X can be defined as $H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)$
- High entropy: higher complexity of the system's code
- Low entropy: there are some repeated patterns of source code; code maintenance is required
- Kolmogorov Complexity
- Measures the complexity (i.e., information content) of an object by the length of the smallest program that generates it.
- Kolmogorov Complexity K_f(x) of an object x in the description system f is the length of the shortest program capable of producing x
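The two measures above can be illustrated with a minimal Python sketch. It is not the author's tooling. It assumes that each byte of a source file is one symbol, and, because Kolmogorov complexity is not computable, it swaps in the zlib-compressed length as a rough upper-bound estimate. The function names `shannon_entropy` and `compressed_size` are hypothetical.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per symbol, treating each byte as one symbol."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input.

    ASSUMPTION: used here only as a computable stand-in for Kolmogorov
    complexity, which cannot be computed exactly.
    """
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    repetitive = b"int x = 0;\n" * 200      # many repeated patterns
    varied = bytes(range(256)) * 8          # every byte value equally frequent
    for name, sample in (("repetitive", repetitive), ("varied", varied)):
        print(name, round(shannon_entropy(sample), 3), compressed_size(sample))
```

Applied to the concatenated source files of each release, the two functions would give one entropy value and one size estimate per version.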
9. Evolution curve method (1)
- Motivation: the addition of new features to a software system leads to the change of basic software characteristics (complexity/entropy) in the system.
- Idea: use the change of software size and complexity as a means to determine different stages of evolution of a software system
- Inspiration: Z-curve [1] and DNA walk [2] methods used in analyzing complex genetic sequences
[1] R. Zhang, C.T. Zhang. Z Curves, an Intuitive Tool for Visualizing and Analyzing DNA sequences. J. Biomol. Struc. Dynamics 11, 767-782, 1994.
[2] S. Paxia, A. Rudra, Y. Zhou, B. Mishra. A Random Walk down the Genomes: DNA Evolution in VALIS. IEEE Computer 35(7):73-79, 2002.
10. Evolution curve method (2)
- The E-curve is composed of a series of nodes P_i, whose coordinates are x_i and y_i (i = 1, 2, ..., N), where N is the number of versions of the analyzed software system.
- The nodes are connected sequentially with straight segments.
- The coordinates x_i and y_i are calculated iteratively [equation missing in source], where
- K_i is the Kolmogorov Complexity of the i-th version of a software system
- H_i is the Shannon entropy of the i-th version of a system
11. Evolution curve method (3)
- The two dimensions of the Evolution curve, x (relative information content) and y (relative complexity), represent two independent (orthogonal) characteristics of a software system
- x-dimension: amount of information contained in a software system; an estimation of software size
- y-dimension: information entropy of a software system; an estimation of software complexity.
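To make the construction concrete, here is a hedged Python sketch of how such a curve could be built from per-version measurements. The exact update rule is not given in this transcript, so the sketch assumes that each coordinate accumulates the relative change of its measure between consecutive versions, in the spirit of a DNA walk. The function name `evolution_curve` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def evolution_curve(
    info_content: Sequence[float], entropy: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return one (x, y) node per version, starting at (0, 0).

    info_content: a per-version size estimate (e.g. a Kolmogorov
                  complexity approximation), all values non-zero.
    entropy:      a per-version Shannon entropy, all values non-zero.

    ASSUMPTION: x and y are cumulative sums of relative changes. This is
    an illustrative rule, not the formula from the original slides.
    """
    if len(info_content) != len(entropy):
        raise ValueError("need one size value and one entropy value per version")
    nodes: List[Tuple[float, float]] = []
    x = y = 0.0
    for i in range(len(info_content)):
        if i > 0:
            x += (info_content[i] - info_content[i - 1]) / info_content[i - 1]
            y += (entropy[i] - entropy[i - 1]) / entropy[i - 1]
        nodes.append((x, y))
    return nodes


if __name__ == "__main__":
    # Three hypothetical versions: the system grows, then is simplified.
    for node in evolution_curve([100.0, 150.0, 150.0], [4.0, 5.0, 4.0]):
        print(node)
```

Connecting consecutive nodes with straight segments then gives the curve. Under this assumed rule, the direction of each segment shows whether size and complexity rose or fell between two releases.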
12. Software evolution stages
- Software Growth: system is actively developed
- Software Maintenance: system becomes simpler, often at a cost of its size
- Software Improvement: system becomes more complex and generic
- Software Shrink: functionality of a system is reduced
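One way to read the four stages is as the four quadrants of a curve segment's direction. The Python sketch below is an interpretation, not the author's stated rule. It assumes growth means size and complexity both rise, maintenance means size rises while complexity falls, improvement means size falls while complexity rises, and shrink means both fall. Treating a zero change as non-negative is a further assumption, and the names `classify_stage` and `stages_of` are hypothetical.

```python
from typing import List, Sequence, Tuple


def classify_stage(dx: float, dy: float) -> str:
    """Map one curve segment to a stage name.

    dx: change in size (x-dimension); dy: change in complexity (y-dimension).
    ASSUMPTION: quadrant mapping as described in the lead-in; zero counts
    as non-negative.
    """
    if dx >= 0 and dy >= 0:
        return "growth"
    if dx >= 0 and dy < 0:
        return "maintenance"
    if dx < 0 and dy >= 0:
        return "improvement"
    return "shrink"


def stages_of(nodes: Sequence[Tuple[float, float]]) -> List[str]:
    """Classify every segment between consecutive (x, y) nodes."""
    return [
        classify_stage(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(nodes, nodes[1:])
    ]


if __name__ == "__main__":
    demo = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (1.5, 1.0), (1.0, 0.5)]
    print(stages_of(demo))
```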
13. Trends of Evolution curve
- Actively developed systems: long upward trends of growth
- Mature, stable systems: long downward trends of maintenance
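A trend can be treated as a run of consecutive versions that stay in the same stage. The Python sketch below collapses a sequence of stage labels into runs and picks the longest one. The labels and the function names `trends` and `dominant_trend` are hypothetical, and equating a "long trend" with the longest run is a simplifying assumption.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def trends(stages: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive identical stage labels into (stage, run_length) pairs."""
    return [(stage, sum(1 for _ in run)) for stage, run in groupby(stages)]


def dominant_trend(stages: Iterable[str]) -> Tuple[str, int]:
    """Return the longest run of one stage; ties go to the earliest run."""
    runs = trends(stages)
    if not runs:
        raise ValueError("no stages given")
    return max(runs, key=lambda run: run[1])


if __name__ == "__main__":
    history = ["growth"] * 3 + ["maintenance"] * 5 + ["growth"]
    print(trends(history))
    print(dominant_trend(history))
```

Under this reading, a system whose dominant trend is growth would count as actively developed, and one dominated by maintenance as mature.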
14. Case studies
- Source: SourceForge
- 7-zip
- Archiver
- 82 versions, 5 years, 160K LOC
- Grip
- CD player/ripper
- 36 versions, 14K LOC
- eMule
- P2P file sharing client
15. Case study: eMule
- eMule
- one of the biggest P2P file sharing clients
- coded in Microsoft Visual C++ using MFC
- Free software, released under the GNU GPL
- Source code first released at version 0.02 on July 6, 2002
- Latest release contains 222,680 lines of code
- Actively developed by 5 developers
- Current development status is Production/Stable
- For analysis, 68 versions of eMule source code were used
16. eMule Entropy
[Figure: entropy of eMule across versions; labeled versions: 015a, 030a, 018a]
17. eMule Size
Fitted model: $y = A + Bx + Cx^2$, with $A = 7676.17$, $B = 4324.67$, $C = 177.488$, $r = 0.9935$
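For readers who want to reproduce this kind of size model, the sketch below fits a quadratic to (version index, size) pairs by ordinary least squares in plain Python. It is an illustration only. The slide does not say how its coefficients were obtained, the function name `fit_quadratic` is hypothetical, and the data in the usage example are synthetic, not eMule measurements.

```python
from typing import List, Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = A + B*x + C*x**2; returns (A, B, C)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs")
    # Power sums that make up the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system; row k is the derivative w.r.t. the k-th coefficient.
    m: List[List[float]] = [[s[k], s[k + 1], s[k + 2], t[k]] for k in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("x values do not determine a unique quadratic")
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


if __name__ == "__main__":
    # Synthetic sizes that follow an exact quadratic in the version index.
    versions = [float(i) for i in range(10)]
    sizes = [2.0 + 3.0 * v + 0.5 * v * v for v in versions]
    print(fit_quadratic(versions, sizes))
```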
18. eMule's Evolution curve
[Figure: eMule evolution curve; labeled versions: 30e, 47c, 23b, 44b, 25b]
19. What does the changelog say?
20. Conclusions
- Software evolution process can be divided into 4 stages
- software growth: the size and complexity of developed software are increasing
- software maintenance: the aim is to contain complexity and fix software bugs
- software improvement: the aim is to contain software system size at a cost of increasing complexity
- software shrink: both software size and its complexity are trimmed
- Evolution curve method can
- identify software evolution stages
- identify the initial development status of the analyzed software system
- actively developed systems show long growth trends
- mature systems show maintenance and improvement trends
- Evolution curve method is independent of the software implementation language
21. Ongoing Research and Further Work
- Analysis of other entropy measures such as block entropy and Rényi entropies
- paper submitted to Journal of Software Maintenance and Evolution
- Dynamic models of software evolution
- Differential equations, etc.
- More case studies
- paper submitted to Computing and Information Systems Journal
22. Thank You. Any Questions?
23. 7-zip Evolution curve
24. Grip Evolution curve