Extracting Code Clones for Refactoring Using Combinations of Clone Metrics

1
Extracting Code Clones for Refactoring Using Combinations of Clone Metrics
Eunjong Choi, Norihiro Yoshida, Takashi Ishio, Katsuro Inoue, and Tateki Sano
Osaka University, Japan; Nara Institute of Science and Technology, Japan; NEC Corporation, Japan
2
Background: Clone Set
  • A set of code clones that are similar or identical
    to each other

[Figure: code clones linked by "similar" and "identical" relations and grouped into clone sets]
Clone sets:
S1 = {Code Clone 1, Code Clone 3}
S2 = {Code Clone 2, Code Clone 4, Code Clone 5}
3
Background: Refactoring Code Clones
  • Merge code clones into a single program unit

4
Background: Language-dependent Code Clone
  • A code clone that unavoidably exists in source code
    because of features of the programming language used

Example of a language-dependent code clone: consecutive setter invocations
5
Background: Clone Metrics [Higo2007]
  • Quantitative information on clone sets
  • E.g., LEN(S), RNR(S), POP(S)
  • Purposes:
  • To check features of code clones in software
  • To extract code clones for several purposes,
    e.g., refactoring or finding defect-prone code clones

[Higo2007] Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue,
"Method and Implementation for Investigating Code Clones in a Software System",
Information and Software Technology, pp. 985-998 (2007-9)
6
Clone Metrics: LEN(S)
  • The average length of the token sequences of the code
    clones in a clone set S

Example: the token sequence "c c*" is detected as a code clone from the
token sequence <c c* c* a b>.
LEN(S) = 2
The superscript * indicates that the token is in a repeated token sequence.
[Figure: clone set S]
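The LEN(S) definition can be sketched in a few lines of Python. This is only an illustration, not the authors' or CCFinder's implementation: representing a code clone as a list of token strings, the function name `len_metric`, and the example clone set are all assumptions made for this sketch.

```python
from statistics import mean

def len_metric(clone_set):
    """Average token-sequence length of the code clones in a clone set.

    clone_set: list of code clones, each a list of token strings
    (a hypothetical representation chosen for this sketch).
    """
    if not clone_set:
        raise ValueError("a clone set must contain at least one code clone")
    return mean(len(clone) for clone in clone_set)

# Hypothetical clone set of three two-token clones: the average length is 2.
example = [["c", "c"], ["c", "c"], ["c", "c"]]
print(len_metric(example))
```

Because LEN(S) is an average, a clone set that mixes short and long clones can have the same value as one made of uniformly medium-sized clones.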
7
Clone Metrics: RNR(S)
  • The ratio of non-repeated token sequences in the code
    clones of a clone set S

Example: the token sequence "c c*" is detected as a code clone from the
token sequence <c c* c* a b>.
[Figure: clone set S]
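As a rough sketch of the RNR(S) definition, the function below counts the tokens that are not part of a repeated token sequence and divides by the total number of tokens in the clone set, expressed as a percentage. The (token, is_repeated) pair representation, the name `rnr_metric`, and the two-clone example are assumptions for illustration; how the repeated flags are obtained is left to the clone detector.

```python
def rnr_metric(clone_set):
    """Percentage of tokens in a clone set that are NOT in a repeated sequence.

    clone_set: list of code clones; each clone is a list of
    (token, is_repeated) pairs (hypothetical representation).
    """
    total = sum(len(clone) for clone in clone_set)
    if total == 0:
        raise ValueError("clone set has no tokens")
    non_repeated = sum(
        1 for clone in clone_set for _, is_repeated in clone if not is_repeated
    )
    return 100.0 * non_repeated / total

# A two-token clone whose second token is marked as repeated.
clone = [("c", False), ("c", True)]
# Hypothetical clone set with two such clones: 2 of 4 tokens are non-repeated.
print(rnr_metric([clone, clone]))  # 50.0
```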
8
Clone Metrics: POP(S)
  • The number of code clones in a clone set S

[Figure: clone set S containing six code clones]
POP(S) = 6
9
Single Clone Metric (1/2)
  • Clone sets whose RNR(S) is high
  • They do not form a single semantic unit
  • Semantic unit: a group of instructions that together implement a
    single functionality

/* Code clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */
} else {
    // is the zip file in the cache
    ZipFile zipFile = (ZipFile) zipFiles.get(file);
    if (zipFile == null) {
        zipFile = new ZipFile(file);
        zipFiles.put(file, zipFile);
    }
    ZipEntry entry = zipFile.getEntry(resourceName);
    if (entry != null) {
10
Single Clone Metric (2/2)
  • Clone sets whose POP(S) is high
  • They include many language-dependent code clones

/* Code clone in the clone set whose POP(S) is the highest in Ant 1.7.0 */
out.println("\">");
out.println("");
out.print("<!ELEMENT project (target | ");
out.print(TASKS);
out.print(" | ");
out.print(TYPES);
11
Key Idea
  • In our experience, it is not appropriate to extract refactorable
    code clones using just a single clone metric
  • We propose a method based on combined clone metrics
    to address the weaknesses of single-metric-based extraction

12
Combined Clone Metrics
  • Clone sets whose RNR(S) and POP(S) are both high
  • Each code clone forms a single semantic unit

/* Code clone in a clone set whose RNR(S) and POP(S) are higher than others */
if (ifProperty != null && p.getProperty(ifProperty) == null) {
    return false;
} else if (unlessProperty != null
        && p.getProperty(unlessProperty) != null) {
    return false;
}
return true;

13
Case Study (1/2)
  • Goal: validate our key idea
  • Using combined clone metrics is a feasible method
    for extracting code clones for refactoring
  • Target system
  • Industrial Java software developed by NEC
  • 110 KLOC, 736 clone sets

14
Case Study (2/2)
  • Experimental steps
  • Selected 62 clone sets from CCFinder's output
    using clone metrics.
  • Conducted a survey about these clone sets and got
    feedback from a developer.

[Figure: workflow from source files through CCFinder to clone sets selected using clone metrics, followed by the survey and developer feedback]
15
Subject Code Clones (1/2)
  • Clone sets for which any single clone metric value is
    high
  • Clone sets whose LEN(S) value ranks in the top 10
  • Clone sets whose RNR(S) value ranks in the top 10
  • Clone sets whose POP(S) value ranks in the top 10

16
Subject Code Clones (2/2)
  • Clone sets whose combined clone metric values
    are high
  • 15 clone sets whose LEN(S) and RNR(S) values
    rank in the top 15
  • 7 clone sets whose LEN(S) and POP(S) values
    rank in the top 15
  • 18 clone sets whose RNR(S) and POP(S) values
    rank in the top 15
  • 1 clone set whose LEN(S), RNR(S) and POP(S)
    values rank in the top 15
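One way to picture this kind of selection is the sketch below: a clone set is kept when it ranks in the top k for every metric in a combination. The slides do not say how ties are handled or exactly how ranks are computed, so keeping ties with the k-th largest value is an assumption of this sketch, as are the function names, the dictionary layout, and the metric values, which are made up and not taken from the case study.

```python
def top_k_ids(clone_sets, metric, k):
    """IDs of clone sets whose `metric` value is among the k largest values.

    Ties with the k-th largest value are kept, so more than k IDs may be
    returned (a tie-handling assumption of this sketch).
    """
    values = sorted((cs[metric] for cs in clone_sets.values()), reverse=True)
    if k <= 0 or not values:
        return set()
    threshold = values[min(k, len(values)) - 1]
    return {cid for cid, cs in clone_sets.items() if cs[metric] >= threshold}

def select_combined(clone_sets, metrics, k):
    """IDs of clone sets that rank in the top k for every metric in `metrics`."""
    selected = set(clone_sets)
    for metric in metrics:
        selected &= top_k_ids(clone_sets, metric, k)
    return selected

# Hypothetical metric values for four clone sets (not data from the case study).
clone_sets = {
    "A": {"LEN": 120, "RNR": 95.0, "POP": 2},
    "B": {"LEN": 40, "RNR": 90.0, "POP": 9},
    "C": {"LEN": 35, "RNR": 20.0, "POP": 12},
    "D": {"LEN": 30, "RNR": 60.0, "POP": 3},
}
print(sorted(select_combined(clone_sets, ["RNR", "POP"], k=2)))  # ['B']
```

A single-metric filter is the special case with one metric in the list, for example `select_combined(clone_sets, ["LEN"], k=1)`.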

17
Results of Case Study (1/2)
Filtering                  Selected Clone Sets  Refactoring  Precision
Each single clone metric   30                   14           0.47
Combined clone metrics     41                   34           0.87
  • Selected Clone Sets: the number of selected
    clone sets
  • Refactoring: the number of clone sets marked as
    "Perform refactoring" in the survey

18
Results of Case Study (2/2)
Filtering                  Selected Clone Sets  Refactoring  Precision
Each single clone metric   30                   14           0.47
Combined clone metrics     41                   34           0.87
  • Precision: how many of the refactoring candidates were
    accepted by the developer?

Precision = Refactoring / Selected Clone Sets
Clone sets selected by combined clone metrics were accepted as
refactoring candidates by the developer more often
19
Summary and Future Work
  • Summary
  • Our industrial case study shows that our key idea
    is appropriate.
  • Future Work
  • Investigate recall
  • Conduct case studies on open source software
  • Suggest a new metric

20
Thank You
21
Clone sets whose RNR(S) is higher than others
  • Each code clone in such a clone set S consists of more
    non-repeated token sequences

/* Code clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */
} else {
    // is the zip file in the cache
    ZipFile zipFile = (ZipFile) zipFiles.get(file);
    if (zipFile == null) {
        zipFile = new ZipFile(file);
        zipFiles.put(file, zipFile);
    }
    ZipEntry entry = zipFile.getEntry(resourceName);
    if (entry != null) {
22
Clone sets whose RNR(S) is lower than others
  • They consist of more repeated token sequences
  • They tend to be language-dependent code clones

/* Code clone in the clone set whose RNR(S) is the lowest in Ant 1.7.0 */
String sosCmdDir = null;
// (code skipped)
private String filename = null;
private boolean noCompress = false;
private boolean noCache = false;
private boolean recursive = false;
private boolean verbose = false;
23
Survey Format: About Clone Set XXX
  • (1) Do you think that this clone set needs some
    action?
  • Yes / No (→ go to the next clone set)
  • (2) If you answered Yes to (1),
    what action is appropriate for this clone set?
  • Refactoring
  • Write comments about the code clones, but don't
    perform refactoring.
  • Change nothing.
  • Others. (        )
  • (3) Write the reason for your
    answer to (2).
  • Reason:

24
Results and Precision for Each Group of Clone Sets in the Survey
Filtering                                                   Selected Clone Sets  Refactoring  Precision
LEN(S) value ranks in the top 10                            10                   7            0.70
RNR(S) value ranks in the top 10                            10                   4            0.40
POP(S) value ranks in the top 10                            10                   3            0.30
LEN(S) and RNR(S) values rank in the top 15                 15                   13           0.87
LEN(S) and POP(S) values rank in the top 15                 7                    6            0.86
RNR(S) and POP(S) values rank in the top 15                 18                   14           0.78
LEN(S), RNR(S), and POP(S) values rank in the top 15        1                    1            1.00
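The per-filter precision values can be rechecked with a short calculation: precision is the number of clone sets marked for refactoring divided by the number of selected clone sets. The sketch below uses the counts reported for each filter in the survey results; the function name `precision` and the row labels are only illustrative.

```python
def precision(selected, refactoring):
    """Fraction of selected clone sets marked for refactoring in the survey."""
    if selected <= 0:
        raise ValueError("selected must be positive")
    return refactoring / selected

# (selected clone sets, clone sets marked "Perform refactoring") per filter,
# using the counts reported in the survey results.
rows = {
    "LEN top 10": (10, 7),
    "RNR top 10": (10, 4),
    "POP top 10": (10, 3),
    "LEN and RNR top 15": (15, 13),
    "LEN and POP top 15": (7, 6),
    "RNR and POP top 15": (18, 14),
    "LEN, RNR and POP top 15": (1, 1),
}
for name, (selected, refactoring) in rows.items():
    print(f"{name}: {precision(selected, refactoring):.2f}")
```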
25
Clone metric RNR(S) (1/2)
  • Files
  • F1: a b c a b
  • F2: c c* c* a b
  • F3: d a b e f
  • F4: c c* d e f
  • The superscript * indicates that the token is in a
    repeated token sequence
  • RNR(S1) of clone set S1:

Clone set S1 = {a b, a b, a b, a b}
RNR(S1) = (2 + 2 + 2 + 2) / (2 + 2 + 2 + 2) × 100 = 100
26
Clone metric RNR(S) (2/2)
  • Files
  • F1: a b c a b
  • F2: c c* c* a b
  • F3: d a b e f
  • F4: c c* d e f
  • The superscript * indicates that the token is in a
    repeated token sequence
  • RNR(S2) of clone set S2:

Clone set S2 = {c c*, c* c*, c c*}
RNR(S2) = (1 + 0 + 1) / (2 + 2 + 2) × 100 = 33.3
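The worked example on files F1 to F4 can be reproduced with the sketch below. It marks a token as repeated when it equals the token immediately before it; this single-token rule is a simplifying assumption that fits this example and is not claimed to be how CCFinder detects repeated token sequences. The clone positions are 0-based index ranges derived here from the four token sequences, and the function names are hypothetical.

```python
def mark_repeated(tokens):
    """Flag each token that equals the token immediately before it.

    Simplifying assumption of this sketch: only single-token repetition
    is considered.
    """
    return [i > 0 and tokens[i] == tokens[i - 1] for i in range(len(tokens))]

def rnr(clones, files):
    """RNR of a clone set given as (file name, start, end) triples.

    start is inclusive and end is exclusive, both 0-based.
    """
    non_repeated = total = 0
    for name, start, end in clones:
        flags = mark_repeated(files[name])[start:end]
        non_repeated += flags.count(False)
        total += len(flags)
    return 100.0 * non_repeated / total

files = {
    "F1": "a b c a b".split(),
    "F2": "c c c a b".split(),
    "F3": "d a b e f".split(),
    "F4": "c c d e f".split(),
}
s1 = [("F1", 0, 2), ("F1", 3, 5), ("F2", 3, 5), ("F3", 1, 3)]  # the "a b" clones
s2 = [("F2", 0, 2), ("F2", 1, 3), ("F4", 0, 2)]                # the "c c" clones
print(round(rnr(s1, files), 1))  # 100.0
print(round(rnr(s2, files), 1))  # 33.3
```

In S2 only the first token of the F2 run and the first token of the F4 run count as non-repeated, which gives 2 of 6 tokens.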
27
Subject Code Clones
  • 62 clone sets
  • Clone sets whose individual clone metric value is
    high
  • S_LEN: clone sets whose LEN(S) value ranks in the
    top 10.
  • S_RNR: clone sets whose RNR(S) value ranks in the
    top 10.
  • S_POP: clone sets whose POP(S) value ranks in the
    top 10.
  • Clone sets whose combined clone metric values
    are high
  • S_LENRNR: 15 clone sets whose LEN(S) and RNR(S)
    values rank in the top 15.
  • S_LENPOP: 7 clone sets whose LEN(S) and POP(S)
    values rank in the top 15.
  • S_RNRPOP: 18 clone sets whose RNR(S) and POP(S)
    values rank in the top 15.
  • S_LENRNRPOP: 1 clone set whose LEN(S), RNR(S) and
    POP(S) values rank in the top 15.

28
The Number of Duplicate Clone Sets
  • S_RNR ∩ S_POP ∩ S_RNRPOP: 1
  • S_RNR ∩ S_RNRPOP: 2
  • S_POP ∩ S_RNRPOP: 2
  • S_LENRNR ∩ S_LENPOP ∩ S_RNRPOP ∩ S_LENRNRPOP: 1

29
Example of a clone set that was not selected
  • It is too short to form a semantic unit.
  • The RNR metric sometimes extracts unintended code
    clones
  • E.g., language-dependent code clones

boolean isEqual(final DeweyDecimal other) {
    final int max = Math.max(other.components.length, components.length);
    for (int i = 0; i < max; i++) {
        final int component1 = (i < components.length) ? components[i] : 0;
        final int component2 = (i < other.components.length) ? other.components[i] : 0;
        if (