Reverse Architecting - PowerPoint PPT Presentation

Title: Reverse Architecting


1
Reverse Architecting
  • Arie van Deursen

2
Outline
  • Legacy systems
  • Reverse architecting
  • Architecture exploration
  • Extraction
  • Abstraction
  • Presentation
  • Evaluation

3
Motivation
  • Multi-channel distribution
  • Web enable existing applications
  • Due diligence / QA
  • Company merger
  • Helping software immigrants
  • Estimating new functionality

Documentation at best out of date
4
Legacy Systems
  • Definition
  • Any information system that significantly resists
    evolution
  • to meet new and changing business requirements
  • Characteristics
  • Large
  • Geriatric
  • Outdated languages
  • Outdated databases
  • Isolated

5
Software Volume
  • Capers Jones software size estimate
  • 700,000,000,000 lines of code
  • (7 × 10^9 function points)
  • (1 fp ≈ 110 lines of code)
  • Total nr of programmers
  • 10,000,000
  • 40% new dev., 45% enhancements, 15% repair
  • (2020: 30%, 55%, 15%)

6
Legacy By Example
7
Reverse Architecting Motivation
  • Architecture description lost or outdated
  • Obtain advantages of an explicit architecture
  • Stakeholder communication
  • Explicit design decisions
  • Transferable abstraction
  • Architecture conformance checking
  • Quality attribute analysis

8
Software Architecture
  • Structure(s) of a system which
  • comprise the software components
  • the externally visible properties of those
    components
  • and the relationships among them

9
Architectural Structures
  • Module structure
  • Data model structure
  • Process structure
  • Call structure
  • Type structure
  • GUI flow
  • ...

10
The 4+1 View Model
Development view
Logical view
Use case view
Physical view
Process view
Extract & compare!
11
Reverse Engineering
  • The process of analyzing a subject system with
    two goals in mind
  • to identify the system's components and their
    interrelationships and,
  • to create representations of the system in
    another form or at a higher level of abstraction.

Decompilation Reverse Architecting
12
Reengineering
  • The examination and alteration of a subject
    system
  • to reconstitute it in a new form
  • and the subsequent implementation of that new
    form
  • Beyond analysis -- actually improve.

13
Reengineering
14
Program Understanding
  • the task of building mental models of an
    underlying software system
  • at various abstraction levels, ranging from
  • models of the code itself to
  • ones of the underlying application domain,
  • for software maintenance, evolution, and
    reengineering purposes

50% of maintenance effort!!
15
Cognitive Processes
  • Building a mental model
  • Top down / bottom up / opportunistic
  • Generate and validate hypotheses
  • Chunking: create higher structures from chunks of
    low-level information
  • Cross referencing: understand relationships

16
Supporting Program Understanding
  • Architects build up mental models
  • various abstractions of software system
  • hierarchies for varying levels of detail
  • graph-like structures for dependencies
  • How can we support this process?
  • infer a number of predefined abstractions
  • enrich the system's source code with abstractions
  • let the architect explore the result

17
Architecture Exploration
  • Goal: translate source code into a form that can
    easily be processed by humans
  • Lesson from compiler construction
  • split processing in separate stages

Similarity with compilers: translate source code
into a form that can be processed by machines
  • parsing turns source code into intermediate form
  • optimisation improves intermediate form
  • code generation emits the machine code

18
Architecture Exploration
[Diagram: artifacts, repository, results; steps: extract, query, view]
  • Extract source models from system artifacts
  • Query/manipulate to infer new knowledge
  • Present different views on results
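
The three stages can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions: the function names `extract`, `query`, and `view`, the tuple-based repository, and the sample sources are all made up here and do not come from any tool named in the slides.

```python
import re

# Hypothetical in-memory "system artifacts": program name -> source text.
ARTIFACTS = {
    "PAY001": "CALL 'DBIO01'. CALL 'PRINT1'.",
    "DBIO01": "CALL 'LOG001'.",
    "PRINT1": "DISPLAY 'DONE'.",
}

def extract(artifacts):
    """Extract stage: turn raw sources into facts (caller, 'calls', callee)."""
    repository = set()
    for program, text in artifacts.items():
        for callee in re.findall(r"CALL\s+'([A-Z0-9]+)'", text):
            repository.add((program, "calls", callee))
    return repository

def query(repository, caller):
    """Query stage: find the programs that a given caller calls directly."""
    return sorted(c for (p, rel, c) in repository if p == caller and rel == "calls")

def view(caller, callees):
    """View stage: render a query result as text for a human reader."""
    return f"{caller} calls: {', '.join(callees) if callees else '(nothing)'}"

repo = extract(ARTIFACTS)
print(view("PAY001", query(repo, "PAY001")))
```

Keeping the repository between the stages means new queries and new views can be added without re-running the extraction.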

19
Source Model Extraction
[Diagram: artifacts, repository, results; steps: extract, query, view]
20
Source Model Extraction
  • Derive information from system artifacts
  • variable usage, call graphs, file dependencies,
    database access, ...
  • Challenges
  • Accurate: complete results
  • Flexible: easy to write and adapt
  • Robust: deal with irregularities in input

21
Grammar Challenges
  • Syntax Errors
  • Language Dialects
  • Local Idioms
  • Missing Parts
  • Embedded Languages
  • Preprocessing
  • Additional problem: grammar availability
  • process languages without grammar
  • (e.g. undisclosed proprietary languages)
  • development of full grammar is expensive
    (Cobol: 1500 productions, 4-5 months)

22
Processing Artifacts
  • Syntactical analysis
  • generate / hand-code / reuse parser
  • Lexical analysis
  • tools like perl, grep, Awk or LSME, MultiLex
  • generally easier to develop
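
A lexical extractor in the spirit of grep or Awk can be sketched in a few lines. This is a minimal illustration, not the behaviour of LSME or MultiLex: the patterns, the fact format, and the sample source are assumptions, and the comment check assumes fixed-format Cobol with the indicator in column 7.

```python
import re

# Patterns for two constructs of interest; everything else is ignored.
PATTERNS = {
    "call": re.compile(r"\bCALL\s+['\"]([A-Z0-9-]+)['\"]"),
    "copy": re.compile(r"\bCOPY\s+([A-Z0-9-]+)"),
}

def scan(source):
    """Return (line number, kind, name) facts found by a line-by-line scan."""
    facts = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # In fixed-format Cobol, '*' in column 7 marks a comment line.
        if len(line) > 6 and line[6] == "*":
            continue
        for kind, pattern in PATTERNS.items():
            for name in pattern.findall(line):
                facts.append((lineno, kind, name))
    return facts

# Hypothetical sample source, laid out in fixed format.
SAMPLE = "\n".join([
    "       COPY CUSTREC.",
    "      * CALL 'OLDPROG' is no longer used.",
    "           CALL 'DBIO01' USING CUST-REC.",
])

print(scan(SAMPLE))
```

Such a scanner is easy to write and adapt, but it has no notion of syntax: a CALL inside a string literal or a statement split over two lines would be misread or missed, which is the accuracy trade-off against a full parser.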

23
Island Grammars
  • Grammar containing
  • detailed productions for constructs of interest
  • liberal productions that catch remainder

24
Island Grammars

[Figure: an input fragment with its parse tree under the standard grammar and its parse tree under the island grammar]

25
Island Grammars

  • Accept larger language
  • catch dialects, syntax errors, embedded
    languages, ...

[Figure: the languages L and L_island]
26
Island Grammars

  • Often smaller grammar
  • can share productions
  • can have different structure

[Figure: the grammars G_L and G_i]
27
Example (Water)
  • lexical syntax
  • ... -> Water {avoid}
  • context-free syntax
  • Water -> Part
  • Part* -> Input

Water is fall-back
28
Example (Program Calls)
  • lexical syntax
  • ... -> Water {avoid}
  • [A-Z][A-Z0-9]* -> Id
  • context-free syntax
  • Water -> Part
  • Part* -> Input
  • "CALL" Id -> Call
  • Call -> Part

Water is fall-back
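
The effect of such a grammar can be imitated with a hand-written sketch. The code below is not a generated parser and not the grammar formalism used on the slide. It assumes whitespace-separated tokens and identifiers made of an uppercase letter followed by uppercase letters or digits, and the names `parse` and `calls` are hypothetical.

```python
import re

# Assumed identifier shape: uppercase letter, then uppercase letters or digits.
ID = re.compile(r"[A-Z][A-Z0-9]*")

def parse(text):
    """Split input into parts: ('Call', id) islands and ('Water', token)."""
    tokens = text.split()
    parts, i = [], 0
    while i < len(tokens):
        # Island: the keyword CALL followed by an identifier.
        if tokens[i] == "CALL" and i + 1 < len(tokens) and ID.fullmatch(tokens[i + 1]):
            parts.append(("Call", tokens[i + 1]))
            i += 2
        else:
            # Fall-back: anything else is water and is kept unparsed.
            parts.append(("Water", tokens[i]))
            i += 1
    return parts

def calls(text):
    """Names of all called programs found in the input."""
    return [name for kind, name in parse(text) if kind == "Call"]

print(calls("MOVE A TO B CALL PAY01 ??garbage## CALL X9 CALL"))
```

Because water is the fall-back, unknown or malformed input such as `??garbage##` or a dangling `CALL` never makes the parse fail; it simply yields no island.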
29
Query and Manipulate
[Diagram: artifacts, repository, results; steps: extract, query, view]
30
Query and Manipulate
  • Goals
  • infer new knowledge abstractions
  • filter information
  • Example structures
  • Perform graph
  • Call graph (OI, PVL)
  • Screen flow
  • Batch job
  • Subsystem dbs

In search for more abstraction
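
Two typical queries over extracted facts are sketched below: following calls transitively, and lifting program-level calls to subsystem-level dependencies. The program names and the subsystem assignment are invented for illustration.

```python
# Hypothetical extracted facts: program-level call edges (caller, callee).
CALLS = {("PAY001", "DBIO01"), ("PAY001", "PRINT1"), ("DBIO01", "LOG001")}

# Hypothetical assignment of programs to subsystems.
SUBSYSTEM = {"PAY001": "payments", "PRINT1": "payments",
             "DBIO01": "storage", "LOG001": "storage"}

def reachable(calls, start):
    """All programs called from start, directly or indirectly."""
    seen, todo = set(), [start]
    while todo:
        current = todo.pop()
        for caller, callee in calls:
            if caller == current and callee not in seen:
                seen.add(callee)
                todo.append(callee)
    return seen

def lift(calls, subsystem):
    """Abstract program-level calls to subsystem-level dependencies."""
    return {(subsystem[a], subsystem[b]) for a, b in calls
            if subsystem[a] != subsystem[b]}

print(sorted(reachable(CALLS, "PAY001")))
print(lift(CALLS, SUBSYSTEM))
```

The first query infers knowledge that is not stated in any single program, and the second filters out detail by dropping calls that stay inside one subsystem.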
31
Combining Data & Functionality
  • Cluster analysis
  • technique for finding groups in data
  • Relies on metrics to compare distance between
    data items
  • Concept analysis
  • for finding groups too
  • Relies on maximal subsets of data items sharing a
    set of features

32
Cluster Analysis
  • Calculate distance (similarity) number between
    all data items (record fields)
  • Use clustering to find hierarchy
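
As a rough sketch of these two steps, the code below computes a distance between record fields and then merges the closest clusters step by step. The choices made here are assumptions, not taken from the slides: fields are compared by the set of programs that use them, the distance is the Jaccard distance, and clusters are merged by single linkage. The usage table is invented.

```python
from itertools import combinations

# Hypothetical usage table: record field -> programs that use it.
USAGE = {
    "Street":  {"P1", "P2"},
    "City":    {"P1", "P2"},
    "ZipCd":   {"P1", "P2", "P3"},
    "Amount":  {"P4"},
    "Account": {"P4", "P5"},
}

def distance(a, b):
    """Jaccard distance between two fields: 0 = same usage, 1 = disjoint."""
    return 1 - len(USAGE[a] & USAGE[b]) / len(USAGE[a] | USAGE[b])

def linkage(c1, c2):
    """Single linkage: distance between the closest members of two clusters."""
    return min(distance(a, b) for a in c1 for b in c2)

def cluster(fields):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = [frozenset([f]) for f in sorted(fields)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1 | c2), round(linkage(c1, c2), 2)))
    return merges

for members, dist in cluster(USAGE):
    print(dist, members)
```

The merge log read from first to last is the hierarchy that a dendrogram draws: fields used by exactly the same programs join first, and unrelated groups only join at distance 1.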

33
Dendrogram
34
Dendrogram
35
Dendrogram
Distance is 1
36
Dendrogram
37
Dendrogram
38
Dendrogram
39
Dendrogram
40
Dendrogram from Real Data
[Figure: dendrogram over record fields. Leaf labels: Amount; Account; OfficeName, BankCity, IntAccount, OfficeType, PaymentKind, RelationNr, ChangeDate; MortSeqNr, MortNr; TitleCd, Prefix, Initial; Name; ZipCd, CountyCd, StreetNr; City; Street]
41
Concept Analysis
  • Relies on maximal subsets of data items sharing a
    set of features
  • Concept analysis finds a lattice
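
A small sketch of the idea, assuming an invented feature table in which the data items are record fields and the features are the programs that use them. A concept pairs a maximal set of items with the set of features they all share. The straightforward closure computation below is one simple way to enumerate concepts for a tiny table, not an efficient algorithm.

```python
# Hypothetical feature table: data item -> set of programs using it.
CONTEXT = {
    "Street": {"P1", "P2"},
    "City":   {"P1", "P2"},
    "ZipCd":  {"P1", "P2", "P3"},
    "Amount": {"P4"},
}
ALL_FEATURES = frozenset().union(*CONTEXT.values())

def extent(features):
    """Items that have every feature in the given set."""
    return frozenset(i for i, fs in CONTEXT.items() if features <= fs)

def concepts():
    """All (extent, intent) pairs: maximal item sets sharing a feature set."""
    intents = {ALL_FEATURES} | {frozenset(fs) for fs in CONTEXT.values()}
    changed = True
    while changed:  # close the set of intents under intersection
        changed = False
        for a in list(intents):
            for b in list(intents):
                if a & b not in intents:
                    intents.add(a & b)
                    changed = True
    return {(extent(i), i) for i in intents}

for items, features in sorted(concepts(), key=lambda c: len(c[1])):
    print(sorted(items), sorted(features))
```

Ordering the concepts by inclusion of their item sets gives the lattice: the concept with all items and no shared features sits at one end, the concept with all features and no items at the other.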

42
Concept Lattice
[Figure: concept lattice; labels: top, All Variables, bottom]
43
Concept Lattice
[Figure: concept lattice; labels: top, All Variables, bottom]
44
Concept Lattice
[Figure: concept lattice; labels: top, All Variables, P4, Number Nb-Ext Zipcode Street City, bottom]
45
Concept Lattice
[Figure: concept lattice; labels: top, All Variables, P4, Number Nb-Ext Zipcode Street City, bottom]
46

[Figure: annotated concept; labels: Many fields, Progr. nrs, Concept, Fields, One field]
47
System Views
  • Grouping method based on feature table
  • Metrics or subset based
  • Find alternative system views
  • Kruchten's logical view
  • Object-based view on procedural code
  • Starting point for objectification
  • Keep human in the loop

48
Types
  • A type describes a set of possible values
  • A type groups variables
  • A type encapsulates representation
  • Parameter types provide interfaces
  • Types provide component connectors

Types are architectural structures
49
But types are already available...
  • Not in a legacy language like Cobol
  • Data division declares variables structure
  • No separation between type/variable.
  • Repeated structure per variable.
  • No enumeration types, no ranges.
  • No parameters for sections
  • Similar problems with other legacy languages

50
Automatic Type Inference
  • Group variables based on usage
  • Initially
  • Each variable: unique primitive type
  • From statements infer equivalencies
  • Assignment: v := e
  • Comparison: e1 > e2
  • Computation: e1 + e2

51
Example
DATA DIVISION.
01 PERSON.
   03 INITIALS   PIC X(05).
   03 NAME       PIC X(27).
   03 STREET     PIC X(18).
01 TAB000.
   03 A00-NAME-PART.
      05 A00-POS PIC X(01) OCCURS 40.
   03 A00-MAX    PIC S9(03) COMP-3 VALUE 40.
   03 A00-FILLED PIC S9(03) COMP-3 VALUE 0.
01 N000.
   03 N100       PIC S9(03) COMP-3 VALUE 0.
...
PROCEDURE DIVISION.
R210-INITIAL SECTION.
    MOVE INITIALS TO A00-NAME-PART.
    PERFORM R300-COMPOSE-NAME.
R300-COMPOSE-NAME SECTION.
    ...
    PERFORM UNTIL N100 > A00-MAX
       ...
       IF A00-FILLED = N100
          ...
52
Example
N100, A00-MAX and A00-FILLED are equivalent
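
How such equivalences can be collected is sketched below with a union-find structure. This is an assumption about one possible implementation, not the tool behind the slides: each variable starts in its own class, and every comparison between two variables merges their classes.

```python
# Each variable starts in its own type; usage in statements merges types.
parent = {}

def find(v):
    """Return the representative of the type (equivalence class) of v."""
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]  # path halving keeps chains short
        v = parent[v]
    return v

def unify(a, b):
    """Record that a and b are used together, so they share a type."""
    parent[find(a)] = find(b)

def same_type(a, b):
    """True if a and b have ended up in the same equivalence class."""
    return find(a) == find(b)

# Usage facts taken from the example program: two comparisons.
unify("N100", "A00-MAX")      # PERFORM UNTIL compares N100 with A00-MAX
unify("A00-FILLED", "N100")   # IF compares A00-FILLED with N100

print(same_type("A00-MAX", "A00-FILLED"))
print(same_type("NAME", "N100"))
```

A00-MAX and A00-FILLED are never compared directly. They end up in one class only because both are compared with N100, which is why the equivalence has to be computed transitively.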
53
Example
54
Example
INITIALS subtype of A00-NAME-PART
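
Unlike a comparison, an assignment is directional, so it can be recorded as a subtype edge rather than an equivalence. The sketch below assumes that reading (the type of the source of a MOVE is a subtype of the type of its target). The helper names are hypothetical.

```python
# Subtype edges inferred from assignments: MOVE source TO target
# suggests that the type of source is a subtype of the type of target.
subtype_of = {}

def record_move(source, target):
    """Store one subtype edge for a MOVE statement."""
    subtype_of.setdefault(source, set()).add(target)

def is_subtype(a, b):
    """True if a equals b or reaches b through recorded assignments."""
    seen, todo = set(), [a]
    while todo:
        current = todo.pop()
        if current == b:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(subtype_of.get(current, ()))
    return False

record_move("INITIALS", "A00-NAME-PART")  # MOVE INITIALS TO A00-NAME-PART.

print(is_subtype("INITIALS", "A00-NAME-PART"))
print(is_subtype("A00-NAME-PART", "INITIALS"))
```

Keeping the direction avoids merging every variable that is ever copied into a general-purpose buffer such as A00-NAME-PART into one large type.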
55
Example
56
System Level Types
  • Propagate types across modules
  • Calls
  • Database operations
  • File I/O
  • Include files / copybooks
  • Lift type dependencies to package level
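
Lifting can be sketched as follows, under the assumption that each program is treated as one package and that types have already been propagated across calls. The type names, programs, and variables are invented for illustration.

```python
from itertools import combinations

# Hypothetical inferred types: type name -> (program, variable) members,
# e.g. after unifying actual and formal parameters across CALL statements.
TYPES = {
    "T-customer-id": {("PAY001", "CUST-NR"), ("DBIO01", "LS-KEY")},
    "T-counter":     {("PAY001", "N100")},
    "T-amount":      {("PAY001", "AMT"), ("PRINT1", "LS-AMT"),
                      ("DBIO01", "LS-AMT")},
}

def lift_to_programs(types):
    """Program pairs that share at least one type, with the shared types."""
    links = {}
    for name, members in types.items():
        programs = sorted({program for program, _ in members})
        for pair in combinations(programs, 2):
            links.setdefault(pair, set()).add(name)
    return links

for pair, shared in sorted(lift_to_programs(TYPES).items()):
    print(pair, sorted(shared))
```

Types that stay local to one program, such as the counter here, produce no link, so the lifted view only shows dependencies that cross program boundaries.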

57
Type Inference Case Study (I)
  • 100,000 lines Cobol / CICS system
  • First param of all batch progs
  • program-fields
  • info required for restart and error recovery
  • literals in subroutine field all progs
  • First param of all online progs
  • dfhcommarea
  • mapped to appropriate record --> type

58
Type Inference Case Study (II)
  • Programs with integer parameter
  • Used as enumeration type
  • Value represents function to be performed
  • Program as package
  • Parameter links
  • Formal parameters of same type
  • RA31.6 RA36.4
  • Relations between copybooks

59
Presentation of Results
[Diagram: artifacts, repository, results; steps: extract, query, view]
60
Presentation Desiderata
  • Show multiple structures
  • Show relationships between structures
  • Multiple levels of abstraction
  • Zoom in, zoom out
  • Visual as well as textual information
  • Graph visualization
  • Browsing and searching

61
Presenting Architectures Using Hypertext
  • Hyperlinked pages for system elements
  • Multiple structures, multiple views
  • Backbone: system hierarchy, sources
  • Abstractions become additional navigation
    structures
  • Text + clickable graphs
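
Generating such pages from extracted facts can be sketched as below. This is a toy illustration, not how DocGen works: the call facts, the file naming scheme, and the page layout are all assumptions.

```python
from html import escape

# Hypothetical extracted call facts: program -> called programs.
CALLS = {"PAY001": ["DBIO01", "PRINT1"], "DBIO01": [], "PRINT1": []}

def page(program):
    """Render one HTML page for a program, linking to the programs it calls."""
    items = "".join(
        f'<li><a href="{escape(callee)}.html">{escape(callee)}</a></li>'
        for callee in CALLS[program]
    )
    return (f"<html><body><h1>{escape(program)}</h1>"
            '<p><a href="index.html">system</a></p>'
            f"<ul>{items}</ul></body></html>")

def site():
    """Map file names to page contents, plus an index as a starting point."""
    pages = {f"{p}.html": page(p) for p in CALLS}
    links = "".join(f'<li><a href="{p}.html">{p}</a></li>' for p in sorted(CALLS))
    pages["index.html"] = f"<html><body><ul>{links}</ul></body></html>"
    return pages

for name in sorted(site()):
    print(name)
```

Every inferred relation becomes one more kind of link on the page of an element, so adding an abstraction adds a navigation structure without changing the backbone.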

62
Types of navigation
  • Vertical browsing
  • supported by hierarchical structures
  • zoom into more detailed level
  • system -> subsystem -> program -> ... -> source
  • Horizontal browsing
  • supported by graph-like structures
  • find related on same abstraction level
  • called programs, variables of same type, etc
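
The two kinds of navigation map onto two kinds of data structure, sketched here with invented element names: a containment hierarchy for vertical browsing and a relation graph for horizontal browsing.

```python
# Hypothetical hierarchy (vertical): element -> contained elements.
CONTAINS = {
    "system": ["payments", "storage"],
    "payments": ["PAY001", "PRINT1"],
    "storage": ["DBIO01"],
}

# Hypothetical relations (horizontal) between elements on the same level.
CALLS = {"PAY001": ["DBIO01", "PRINT1"]}

def zoom_in(element):
    """Vertical browsing: move one level down in the hierarchy."""
    return CONTAINS.get(element, [])

def zoom_out(element):
    """Vertical browsing: find the enclosing element, if any."""
    for parent, children in CONTAINS.items():
        if element in children:
            return parent
    return None

def related(element):
    """Horizontal browsing: elements related on the same abstraction level."""
    return CALLS.get(element, [])

print(zoom_in("system"), zoom_out("DBIO01"), related("PAY001"))
```

A browsing session then alternates between the two: zoom in from the system to a program, follow a call sideways, and zoom out again to see which subsystem the callee belongs to.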

63
Presentation Challenges
  • Handling abstractions not visible in code
  • Giving abstractions a meaningful name
  • e.g., name for inferred type
  • Defining starting points for browsing
  • lists of types, programs, copybooks, words, lits
  • add cross-cutting hyperlinks on all levels

64
Advanced Documentation Generation
  • DocGen
  • Provide technical documentation
  • Used for all ABN AMRO Cobol sources
  • Customizable product line
  • TypeExplorer
  • Include inferred types as navigation structure
  • Advance level of abstraction

65
Tool Sets
  • Rigi (Victoria)
  • Bauhaus (Stuttgart)
  • Dali (SEI)
  • Portable Bookshelf (Toronto)
  • DocGen (Amsterdam)
  • Extract
  • Query
  • Abstract
  • Present
  • Visualize
  • Browse
  • Search

66
SWARM / WCRE 2001
  • The UML
  • Rationale recovery
  • Pattern-oriented software architecture
  • Architecture description languages
  • Dynamic analysis
  • Software product lines
  • Software architecture users guide

67
Summary
  • Extract, abstract, present
  • Multiple structures
  • Zoom in/out, switch abstraction levels
  • Browse / hypertext
  • Compiler construction technology
  • Active area of research
  • Experiment in your projects

68
Further Reading (I)
  • A. van Deursen and T. Kuipers. Identifying
    Objects using Cluster and Concept Analysis. ICSE'99.
  • A. van Deursen and T. Kuipers. Building
    Documentation Generators. ICSM'99.
  • A. van Deursen and L. Moonen. Exploring Legacy
    Systems Using Types. WCRE'00.
  • A. van Deursen. Software Architecture Recovery
    and Modeling. WCRE 2001 workshop report. Applied
    Computing Review, ACM, 2002.

69
Further Reading (II)
  • www.cwi.nl/arie/papers/
  • www.cwi.nl/arie/swarm2001/
  • www.program-transformation.org