GETTING unstuck: working with legacy code and data - PowerPoint PPT Presentation

Slides: 50
Provided by: cornetdes

GETTING unstuck: working with legacy code and data
  • Cory Foy http//

  • What is Legacy Code?
  • How do we change Legacy Code?
  • Common patterns for code bases
  • Does Legacy Code have to be code, or can it be
    something else like a really long bullet on a
    PowerPoint slide, or perhaps a database?
  • Next Steps

Legacy Code
  • How do you define Legacy Code?
  • Several definitions possible
  • Code we've gotten from somewhere else
  • Code you have to change, but don't understand
  • Demoralizing code (Big ball of mud)
  • Code without unit tests

Legacy Code
  • Code that needs to have behavior preserved
  • What is behavior?
  • The way in which someone behaves
  • The way in which a person, organism, or group
    responds to a specific set of conditions
  • The way that a machine operates or a substance
    reacts under a specific set of conditions

Legacy Code
  • What's the behavior of the following code?

Legacy Code
  • Does the following code add behavior?

Legacy Code
  • Now have we changed the behavior?
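The code samples on these three slides did not survive in the transcript. As a hypothetical stand-in (not the presenter's example), this Python sketch shows the distinction the slides draw: adding a method adds code and structure, but callers see the same behavior until something actually starts calling it.

```python
class Invoice:
    """Hypothetical legacy class, used only to illustrate 'behavior'."""

    def __init__(self, amounts):
        self.amounts = list(amounts)

    def total(self):
        # Existing behavior that callers depend on.
        return sum(self.amounts)

    def discounted_total(self, rate):
        # Newly added method. Nothing calls it yet, so no caller can
        # observe a difference: structure changed, behavior did not.
        return self.total() * (1 - rate)


def describe_total(invoice):
    # If this caller were switched to discounted_total(0.1), the same
    # input would produce different output: that is a behavior change.
    return f"Total: {invoice.total()}"


print(describe_total(Invoice([10, 20, 30])))  # Total: 60
```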

How do we change Legacy Code?
  • Why would we want to change the code?
  • Four reasons to change software
  • Adding a feature
  • Fixing a bug
  • Improving the design
  • Optimizing resource usage
  • Each has unique attributes

Adding a feature / Fixing a bug
  • Causes the following changes
  • Structure
  • Functionality (adding or replacing)
  • Need to be able to verify the new functionality
  • Need to be able to verify that the system as a
    whole is still functioning appropriately

Improving the Design
  • Causes the following changes
  • Structure
  • Note that functionality is not listed
  • Important to be able to know that all
    functionality works before and after the change

Optimizing Resource Usage
  • Changes
  • Resource usage
  • May cause structure change
  • Again note that functionality is ideally not in
    the above list
  • Need to have a way to make sure functionality was
    not changed
  • Need to have a way to verify the optimization
    goals have been met (and stay met)
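One way to cover both needs on this slide is to keep the original implementation around as an oracle while optimizing. The sketch below is hypothetical (the functions and the 10 ms budget are illustrative assumptions): it first checks that the optimized version returns the same results as the original, then checks a time budget so the optimization goal stays met on later runs.

```python
import time


def sum_of_squares_original(n):
    # Original implementation (hypothetical): visits every value below n.
    return sum(i * i for i in range(n))


def sum_of_squares_optimized(n):
    # Optimized implementation: closed form for 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6


def check_optimization(sizes=(0, 1, 2, 10, 1_000, 200_000), budget_s=0.01):
    # 1. Functionality must not change: compare against the original.
    for n in sizes:
        assert sum_of_squares_optimized(n) == sum_of_squares_original(n), n

    # 2. The optimization goal must be met: a generous time budget
    #    for the largest input (the budget is an illustrative assumption).
    start = time.perf_counter()
    sum_of_squares_optimized(max(sizes))
    elapsed = time.perf_counter() - start
    assert elapsed < budget_s, f"took {elapsed:.4f}s, budget {budget_s}s"
    return elapsed


check_optimization()
```

Wall-clock assertions can be noisy on shared build machines, so a real suite would typically keep such budgets generous or track timings separately from the pass/fail regression checks.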

Edit and Pray
  • Carefully plan the changes you are going to make
  • Make sure you understand the code to be modified
  • Make the changes
  • Run the system to make sure the change was made
  • Do some additional testing to smoke test that
    everything seems to be functioning
  • Pray you don't get a call at 2am saying the system
    doesn't work anymore

Cover and Modify
  • Verify that the system is working by running the
    tests
  • Write tests to expose the behavior you want to
    add or change
  • Write code to make the tests pass
  • Refactor to remove duplication
  • Wash, rinse, repeat
  • Verify the system is still working by running the
    tests
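A minimal sketch of one Cover and Modify pass using Python's built-in unittest module; `shipping_cost` and its `express` option are hypothetical. The first two tests cover existing behavior before anything is touched; the third is written first for the new behavior and fails until the `express` branch is added.

```python
import unittest


def shipping_cost(weight_kg, express=False):
    """Hypothetical legacy function being changed under test cover."""
    cost = 5.0 if weight_kg <= 1 else 5.0 + 2.0 * (weight_kg - 1)
    # New behavior, added only after its test existed and was failing.
    if express:
        cost *= 2
    return cost


class ShippingCostTests(unittest.TestCase):
    # Cover: pin down what the system does today before changing it.
    def test_light_parcel_has_flat_rate(self):
        self.assertEqual(shipping_cost(1), 5.0)

    def test_heavy_parcel_adds_per_kg(self):
        self.assertEqual(shipping_cost(3), 9.0)

    # Modify: the test for the new behavior, written before the code.
    def test_express_doubles_cost(self):
        self.assertEqual(shipping_cost(3, express=True), 18.0)


if __name__ == "__main__":
    unittest.main(exit=False)
```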

Feathers' Legacy Code Change Algorithm
  • Michael Feathers discusses a Legacy Code Change
    Algorithm in Working Effectively with Legacy Code
  • Five steps
  • Identify change points
  • Find test points
  • Break dependencies
  • Write tests
  • Make changes and refactor
  • Each step has its own common patterns and scenarios

Patterns for the Change Algorithm
  • Identify Change Points
  • One of the key areas where architects and
    architecture come into play
  • If you aren't sure where a change belongs, put it
    in; you can refactor later (with unit test support)

Patterns for the Change Algorithm
  • Identify Change Points
  • Scenarios
  • I don't understand the code well enough to change it
  • Notes / Sketching
  • Listing Markup
  • Separate Responsibilities
  • Understand method structure
  • Extract Methods
  • Effect Sketch
  • Scratch Refactoring
  • Delete Unused Code

Patterns for the Change Algorithm
  • Identify Change Points
  • Scenarios
  • My application has no structure
  • Tell the story of the system
  • Naked CRC (Class, Responsibility, and Collaboration)
  • Conversation Scrutiny

Patterns for the Change Algorithm
  • Find Test Points
  • Where can you write tests to exercise the
    behavior you want to add/change?
  • Important to have team standards for where unit
    tests should go

Patterns for the Change Algorithm
  • Find Test Points
  • Scenarios
  • I need to make a change; what methods should I test?
  • Reason about effects (Effect Sketch)
  • Reasoning Forward (TDD)
  • Effect propagation
  • Effect reasoning
  • Effect analysis

Patterns for the Change Algorithm
  • Find Test Points
  • Scenarios
  • I need to make many changes in one area; do I
    have to break all the dependencies?
  • Interception Points
  • Higher-Level interception points
  • Pinch Points (encapsulation boundary)
  • Pinch Point Traps

Patterns for the Change Algorithm
  • Break Dependencies
  • Generally the most difficult part of the process
  • Usually don't have tests to tell whether breaking
    dependencies will cause problems

Patterns for the Change Algorithm
  • Break Dependencies
  • Scenarios
  • How do I know I'm not breaking anything?
  • Hyperaware editing
  • Single-goal editing
  • Preserve Signatures
  • Lean on the compiler
  • Pair Programming (aka Real-Time Code Reviews)

Patterns for the Change Algorithm
  • Break Dependencies
  • Scenarios
  • I can't get this class into a test harness
  • Irritating Parameters
  • Hidden Dependencies
  • Construction Blob
  • Irritating Global Dependency
  • Horrible Include Dependencies
  • Onion Parameter
  • Aliased Parameter
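Several of these cases (Hidden Dependencies, Construction Blob) come down to a constructor that creates something you cannot build in a test. The sketch below applies Parameterize Constructor, one of Feathers' dependency-breaking techniques, in Python; `ProductionDatabase`, `AccountReport` and `FakeDatabase` are hypothetical names. Existing callers keep working because the new parameter has a default, which also keeps the original signature usable.

```python
class ProductionDatabase:
    """Hypothetical expensive dependency: cannot be built in a test."""

    def __init__(self):
        raise RuntimeError("requires a live database server")

    def balance_for(self, account_id):
        raise NotImplementedError


class AccountReport:
    # Before: __init__ always ran `self.db = ProductionDatabase()`, a hidden
    # dependency that kept the class out of any test harness.
    # After: the dependency can be passed in; the default keeps every
    # existing caller's behavior the same.
    def __init__(self, db=None):
        self.db = db if db is not None else ProductionDatabase()

    def summary(self, account_id):
        balance = self.db.balance_for(account_id)
        status = "overdrawn" if balance < 0 else "ok"
        return f"{account_id}: {balance:.2f} ({status})"


class FakeDatabase:
    """Test double returning canned balances."""

    def __init__(self, balances):
        self.balances = balances

    def balance_for(self, account_id):
        return self.balances[account_id]


report = AccountReport(FakeDatabase({"A1": -12.5}))
print(report.summary("A1"))  # A1: -12.50 (overdrawn)
```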

Patterns for the Change Algorithm
  • Break Dependencies
  • Scenarios
  • I can't run this method in a test harness
  • Hidden Methods
  • Helpful language features
  • Undetectable Side Effect
  • Sensing variables
  • Command/Query Separation
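For the Undetectable Side Effect case, Command/Query Separation suggests splitting a method that both computes and acts into a query (returns a value, changes nothing) and a command (performs the side effect). A hypothetical Python sketch: `OrderProcessor` and the `mailer` object are illustrative, and `RecordingMailer` plays the role of a sensing object that records what was sent.

```python
class OrderProcessor:
    def __init__(self, mailer):
        self.mailer = mailer

    def process(self, items):
        # Command: performs the side effect. Before the split (hypothetical),
        # the total was computed inline here and never returned, so a test
        # had no way to observe it.
        total = self.order_total(items)
        self.mailer.send(f"Order total: {total:.2f}")

    def order_total(self, items):
        # Query: returns a value and changes no state, so it can be tested
        # directly without any mailer.
        return sum(price * qty for price, qty in items)


class RecordingMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


mailer = RecordingMailer()
OrderProcessor(mailer).process([(2.5, 2), (1.0, 3)])
print(mailer.sent)  # ['Order total: 8.00']
```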

Patterns for the Change Algorithm
  • Break Dependencies
  • Scenarios
  • I need to change a monster method and can't write
    tests for it
  • Introduce sensing variables
  • Extract what you know
  • Break out a method object
  • Skeletonize Methods
  • Find Sequences
  • Extract to the current class first
  • Extract small pieces
  • Be prepared to redo extractions
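Introducing a sensing variable means temporarily adding a field that records what happened inside a long method, so tests can observe it while you extract pieces. A hypothetical sketch: `LineParser` and `last_branch` are illustrative names, and the field would be removed once the method is broken up and properly covered.

```python
class LineParser:
    """Hypothetical class with a long method that is being refactored."""

    def __init__(self):
        # Sensing variable: exists only so tests can see which path ran.
        # Delete it once the method has been split up and covered.
        self.last_branch = None

    def parse(self, line):
        # ...imagine hundreds of lines of legacy logic around here...
        line = line.strip()
        if line.startswith("#"):
            self.last_branch = "comment"
            return None
        if "=" in line:
            self.last_branch = "assignment"
            key, _, value = line.partition("=")
            return key.strip(), value.strip()
        self.last_branch = "other"
        return line


parser = LineParser()
parser.parse("  # a comment")
print(parser.last_branch)  # comment
```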

Patterns for the Change Algorithm
  • Break Dependencies
  • Scenarios
  • It takes forever to make a change
  • Understanding
  • Lag Time
  • Breaking Dependencies
  • Build Dependencies

Patterns for the Change Algorithm
  • Write Tests
  • Tests may be more difficult to write than normal
    unit tests
  • May have less-than-ideal scenarios

Patterns for the Change Algorithm
  • Write Tests
  • Scenarios
  • I need to make a change, but don't know what
    tests to write
  • Characterization Tests
  • Characterizing Classes
  • Targeted Testing
  • Writing Characterization Tests
  • Write tests for the area where you'll be making the
    change. Write as many as you need to understand
    the code.
  • Then write tests for the specific things you need
    to change
  • If converting or moving functionality, write
    tests to verify the behavior on a case-by-case
    basis
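Feathers' recipe for a characterization test is to call the code in a harness, write an assertion you expect to fail, let the failure report the actual behavior, and then change the assertion to expect that behavior. A hypothetical sketch using plain pytest-style test functions (they also run as ordinary functions); `format_customer` stands in for legacy code with a quirk.

```python
def format_customer(first, last):
    # Hypothetical legacy code. The missing-first-name branch is odd,
    # but other code may depend on it, so we record it rather than fix it.
    if not first:
        return last.upper()
    return f"{last}, {first[0]}."


# Characterization tests document what the code DOES today.
# Each started with a guessed expected value (e.g. "???"); the
# failure message showed the real output, which replaced the guess.
def test_full_name_is_last_comma_initial():
    assert format_customer("Ada", "Lovelace") == "Lovelace, A."


def test_missing_first_name_is_uppercased_last_name():
    assert format_customer("", "Lovelace") == "LOVELACE"


def test_long_first_name_is_reduced_to_initial():
    assert format_customer("Grace Brewster", "Hopper") == "Hopper, G."


for test in (test_full_name_is_last_comma_initial,
             test_missing_first_name_is_uppercased_last_name,
             test_long_first_name_is_reduced_to_initial):
    test()
```

The second test is the interesting one: it pins down surprising behavior so that an unrelated change cannot silently "fix" it.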

DEMO Change Algorithm at Work
  • Step through a common scenario, implementing the
    tests as we go

Legacy Code isn't just Code
  • Most applications aren't just simple console apps
  • They deal with many dependencies
  • File Systems
  • Registries
  • Databases
  • Hardware

Legacy Code isn't just Code
  • These dependencies can cause legacy problems of
    their own
  • Database schemas
  • Existing data in the tables
  • Business logic in the database
  • No access to development data that mirrors
    production
  • In other words, Legacy Data

Legacy Data
  • So where does this Legacy Data come from?
  • Flat Files
  • XML Documents
  • RDBs
  • Object DBs
  • Other DBs
  • Application Wrappers
  • Your DB
  • Many, many sources

Legacy Data
  • Legacy data produces its own unique set of
    problems
  • Data quality
  • Data architecture problems
  • Database design problems
  • Process-related challenges

Data Quality
  • Common Data Quality problems

  • A single column is used for several purposes
  • Determining the purpose of a column by the value
    of one or more other columns
  • Inconsistent data values / formatting
  • Missing data / columns
  • Additional columns
  • Important attributes and relationships are
    hidden in text fields
  • Data values that stray from their field
    descriptions and business rules
  • Various key strategies for the same type of entity
  • Unrealized relationships between data records
  • One attribute is stored in several fields
  • Inconsistent use of special characters
  • Different data types for similar columns
  • Different levels of detail
  • Different modes of operation
  • Varying timeliness of data
  • Varying default values
  • Various representations

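Several of these problems (inconsistent formatting, varying default values, various representations) often show up together in a single column. A hypothetical Python sketch of normalizing a legacy date column; the formats and sentinel values listed are assumptions for illustration, and `%m/%d/%Y` assumes US-style dates. Unrecognized values raise an error instead of being silently guessed.

```python
from datetime import date, datetime
from typing import Optional

# Assumed formats found in one legacy column (illustrative only).
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")
# Assumed "no value" defaults used by different source systems.
NULL_SENTINELS = {"", "N/A", "00/00/0000", "1900-01-01"}


def normalize_date(raw: str) -> Optional[date]:
    """Return a date, or None for missing/sentinel values."""
    value = raw.strip()
    if value in NULL_SENTINELS:
        return None
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next known representation
    raise ValueError(f"unrecognized date format: {raw!r}")


print(normalize_date("07-Mar-2005"))  # 2005-03-07
```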
Data Architecture Problems
  • Common Architectural Problems may include
  • Applications responsible for data cleansing
    (instead of DB)
  • Different database paradigms
  • Different hardware platforms / storage
  • Fragmented / Redundant / Inaccessible data
  • Inconsistent semantics
  • Inflexible architecture
  • Lack of event notification
  • No or inefficient security
  • Varying timeliness of data sources

Design Problems
  • There may be key design issues with the database
  • Database encapsulation scheme exists, but it's
    difficult to use
  • Ineffective (or no) naming conventions
  • Inadequate documentation
  • Original design goals at odds with current
    project needs
  • Inconsistent key strategy
  • Design goals at odds with data storage (treating
    relational DBs as object DBs, etc)

Design Problems
  • Example
  • An application that presented custom forms to users
  • Implementers could create custom forms with
    custom questions and validations
  • Beautiful OO architecture: Forms had Groups,
    which had Items
  • Everything was rendered dynamically and could be
    updated on the fly

Design Problems
  • Example
  • The Form, Group, Item and other objects were
    all stored as individual records in one database table
  • A user in the system had on average 74 forms with
    an average of 30 questions. With a target of
    20,000 users in the database, this would lead to
    over 50 million rows in the one table.
  • We identified one stored proc as one of the main
    culprits. It had something like the following:

Design Problems
  • Example
  • INSERT INTO @tmpTable
    SELECT ot.myCol FROM OtherTable ot
    WHERE (ot.bitMask & (144567 | 99435)) <> 0
  • This led to a full table scan in one of their
    most heavily used procs, degrading performance
    significantly (average page load time of over 7
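To put that full table scan in perspective, here is the row estimate from the previous slide worked out. Assuming each question is stored as one Item row, the question rows alone come to 44.4 million, and the users' Form records add another 1.48 million; Group records and the other object types stored in the same table add to that total.

```python
forms_per_user = 74
questions_per_form = 30
target_users = 20_000

# Assumption: one row per question (Item) and one row per Form,
# all in the same table as the Group and other records.
item_rows = target_users * forms_per_user * questions_per_form
form_rows = target_users * forms_per_user

print(f"Item rows: {item_rows:,}")  # Item rows: 44,400,000
print(f"Form rows: {form_rows:,}")  # Form rows: 1,480,000
```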

Working with Legacy Data
  • So how do you deal with legacy data?
  • Strategies
  • Avoid it
  • Develop Error Handling Strategy
  • Work Iteratively and Incrementally
  • Prefer Read-Only Legacy Access
  • Encapsulate Legacy Data Access
  • Introduce Data Adapters for Simple Data Access
  • Introduce a staging database for complex access
  • Adopt Existing Tools
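As an illustration of Encapsulate Legacy Data Access combined with Prefer Read-Only Legacy Access, the hypothetical Python sketch below wraps a legacy row layout (cryptic column names, a text status code, padded values) behind a small read-only adapter, so the rest of the application only sees clean `Customer` objects. The column names and the row-fetching callable are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    is_active: bool


class LegacyCustomerAdapter:
    """Read-only adapter over a hypothetical legacy row layout.

    Quirks of the legacy schema are handled in this one place; callers
    never see CUST_NO, CUST_NM or STAT_CD.
    """

    def __init__(self, fetch_rows):
        # fetch_rows: any callable returning legacy rows as dicts, e.g. a
        # thin wrapper over a database cursor (assumed, for illustration).
        self._fetch_rows = fetch_rows

    def active_customers(self):
        customers = map(self._to_customer, self._fetch_rows())
        return [c for c in customers if c.is_active]

    @staticmethod
    def _to_customer(row):
        return Customer(
            customer_id=int(row["CUST_NO"]),
            name=row["CUST_NM"].strip().title(),
            is_active=row["STAT_CD"].strip().upper() == "A",
        )


rows = [
    {"CUST_NO": "0042", "CUST_NM": "  ACME WIDGETS ", "STAT_CD": "a"},
    {"CUST_NO": "7", "CUST_NM": "Old Co", "STAT_CD": "X"},
]
print(LegacyCustomerAdapter(lambda: rows).active_customers())
```

Because the adapter has no write methods, the legacy store stays untouched while the application is refactored around it.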

Working with Legacy Data
  • We couldn't avoid the data; the proc had to be
  • So we developed an incremental five-step plan
  • Add an IsValidRecord column to the table
  • Update the column based on the bitmask for each
    record
  • Change the proc to use the column instead of the
    bitmask
  • Make sure all tests are still passing
  • Introduce Update and Insert Triggers to
    automatically populate the column
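The five steps can be tried end to end on a toy table. This sketch uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the original SQL Server database. Table and column names follow the slides; the sample rows, the index, and the trigger names are hypothetical, and the mask assumes the proc's predicate was `(bitMask & (144567 | 99435)) <> 0`. In T-SQL the triggers would read from the `inserted` pseudo-table rather than `NEW`.

```python
import sqlite3

# Assumed reading of the proc's predicate: (bitMask & (144567 | 99435)) <> 0
VALID_MASK = 144567 | 99435

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE OtherTable (id INTEGER PRIMARY KEY, myCol TEXT, bitMask INTEGER);
    INSERT INTO OtherTable (myCol, bitMask) VALUES ('a', 1), ('b', 0), ('c', 262144);

    -- Step 1: add the IsValidRecord column.
    ALTER TABLE OtherTable ADD COLUMN IsValidRecord INTEGER NOT NULL DEFAULT 0;

    -- Step 2: populate it from the bitmask for every existing record.
    UPDATE OtherTable SET IsValidRecord = ((bitMask & {VALID_MASK}) <> 0);

    -- Assumed extra: an index so the new filter can seek instead of scan.
    CREATE INDEX IX_OtherTable_IsValidRecord ON OtherTable (IsValidRecord);

    -- Step 5 (created up front so the demo below exercises it):
    -- triggers keep the column current on inserts and bitmask updates.
    CREATE TRIGGER trg_OtherTable_ins AFTER INSERT ON OtherTable
    BEGIN
        UPDATE OtherTable SET IsValidRecord = ((NEW.bitMask & {VALID_MASK}) <> 0)
        WHERE id = NEW.id;
    END;
    CREATE TRIGGER trg_OtherTable_upd AFTER UPDATE OF bitMask ON OtherTable
    BEGIN
        UPDATE OtherTable SET IsValidRecord = ((NEW.bitMask & {VALID_MASK}) <> 0)
        WHERE id = NEW.id;
    END;
""")


def old_query(conn):
    # The original filter: evaluates the bitmask on every row.
    sql = "SELECT myCol FROM OtherTable WHERE (bitMask & ?) <> 0 ORDER BY id"
    return [row[0] for row in conn.execute(sql, (VALID_MASK,))]


def new_query(conn):
    # Step 3: the proc filters on the precomputed column instead.
    sql = "SELECT myCol FROM OtherTable WHERE IsValidRecord = 1 ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# Step 4: both queries must agree, before and after new writes.
assert new_query(conn) == old_query(conn) == ["a"]
conn.execute("INSERT INTO OtherTable (myCol, bitMask) VALUES ('d', 4)")
conn.execute("UPDATE OtherTable SET bitMask = 1 WHERE myCol = 'c'")
assert new_query(conn) == old_query(conn)
print(new_query(conn))  # ['a', 'c', 'd']
```

Keeping the old query around as a check, as in step 4, is what makes the change safe to roll out incrementally: any disagreement between the two filters points to a gap in the triggers or the backfill.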

Working with Legacy Data
  • Advantages
  • Required no change to application code
  • We could rapidly test the application
  • We could make incremental changes to see
  • What made it work
  • Testing/QA Database with production-like data
  • Regression tests to ensure functionality
  • Timing tests to show performance improvement

Process Problems
  • Not all the issues are technical
  • Working with legacy data when you don't have to
  • Data design drives your object model
  • Legacy data issues overshadow everything else
  • App developers ignore legacy issues
  • You choose not to refactor the legacy data
  • Politics
  • You are too focused on the data to see the

Refactoring Databases
  • Databases should not be left out of the
    refactoring process
  • An interesting observation is that when you take
    a big design up front (BDUF) approach to
    development, where your database schema is created
    early in the life of your project, you are
    effectively inflicting a legacy schema on
    yourself. Don't do this.
  • Scott Ambler maintains a catalog of DB
    refactorings
  • How do you refactor a database?

Refactoring Databases
  • Implementing Database Refactoring in your
    organization
  • Start simple
  • Accept that iterative and incremental development
    is the norm
  • Accept that there is no magic solution to get you
    out of your existing mess
  • Adopt a 100% regression testing policy
  • Try it

Next Steps
  • Dealing with legacy code is hard
  • Integration issues
  • Code Issues
  • Political Issues
  • There are ways out
  • Important to address pain points first

Next Steps
  • So where can you go from here?
  • Working Effectively With Legacy Code by Michael
    Feathers
  • Agile Database Techniques by Scott Ambler
  • Refactoring Databases by Scott Ambler and Pramod
    Sadalage
  • http//
  • NUnit, JUnit, CppUnit, CppUnitLite, dbFit,
  • http//