Data Abstractions

- EECE 310 Software Engineering

Learning Objectives

- Define data abstractions and list their elements
- Write the abstraction function (AF) and representation invariant (RI) of a data abstraction
- Prove that the RI is maintained and that the implementation matches the abstraction (i.e., AF)
- Enumerate common mistakes in data abstractions and learn how to avoid them
- Design equality methods for mutable and immutable data types

Data Abstraction

- Introduction of a new type in the language
- Type can be abstract or concrete
- Has one or more constructors and operations
- Type can be used like a language type
- Both the code and the data associated with the type are encapsulated in the type definition
- No need to expose the representation to clients
- Prevents clients from depending on implementation

Isn't this OOP?

- No, though OOP is a way to implement ADTs
- OOP is a way of organizing programs into classes and objects. Data abstraction is a way of introducing new types (ADTs) with meanings.
- Encapsulation is a goal shared by both. But data abstraction is more than just creating classes.
- In Java, every data abstraction can be implemented by a class declaration. But not every class declaration is a data abstraction.

Elements of a Data Abstraction

- The abstraction specification should
  - Name the data type
  - List its operations
  - Describe the data abstraction in English
  - Specify a procedural abstraction for each operation
- Public vs. Private
  - The abstraction only lists the public operations
  - There may be other private procedures inside

Example: IntSet

- Consider an IntSet data type that we wish to introduce in the language. It needs to have
  - Constructors to create the data type from scratch or from other data types (e.g., lists, IntSets)
  - Operations, including insert, remove, size and isIn
  - A specification of what the data type represents
  - An internal representation of the data type

IntSet Abstraction

    public class IntSet {
        // OVERVIEW: IntSets are mutable, unbounded sets of integers.
        // A typical IntSet is {x1, ..., xn}, where the xi are all integers.

        // Constructors
        public IntSet()
        // EFFECTS: Initializes this to be the empty set

        // Mutators
        public void insert(int x)
        // MODIFIES: this
        // EFFECTS: Adds x to the set this, i.e., this_post = this ∪ {x}

        public void remove(int x)
        // MODIFIES: this
        // EFFECTS: Removes x from this, i.e., this_post = this - {x}

        // Observers
        public boolean isIn(int x)
        // EFFECTS: Returns true if x ∈ this, false otherwise

        public int size()
        // EFFECTS: Returns the cardinality of this
    }

Group Activity

- Consider the Polynomial data type below. Write the specifications for its methods.

    public class Poly {
        public Poly(int c, int n) throws NegException
        public Poly add(Poly p) throws NPException
        public Poly mul(Poly p) throws NPException
        public Poly minus()
        public int degree()
    }

Abstraction Versus Representation

- Abstraction: External view of a data type
- Representation: Internal variables to represent the data within a type (e.g., arrays, vectors, lists)

Example Representation

Vector<Integer> elems of size N to represent an IntSet

- Vector directly holds the set elements
  - If integer e is in the set, there exists 0 <= i < N such that elems[i] = e
- Vector is a bitmap for denoting set elements
  - If integer i is in the set, then elems[i] = true, else elems[i] = false

Can you tell how the representation maps to the abstraction?

Abstraction Function

- Mathematical function to map the representation to the abstraction
- Captures the designer's intent in choosing the rep
  - How do the instance variables relate to the abstract object that they represent?
- Makes this mapping explicit in the code
- Advantages: code maintenance, debugging

IntSet Abstraction Function

- Unsorted Array
  - AF(c) = { c.elems[i].intValue | 0 <= i < c.elems.size }
- Boolean Vector
  - AF(c) = { j | 0 <= j < 100 && c.elems[j] }

- The abstraction function is defined for concrete instances of the class c, and only includes the instance variables of the class. Further, it maps the elements of the representation to the abstraction.
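
One way to make the two abstraction functions concrete is to write each as a method that turns a rep into the abstract set it stands for. The sketch below is illustrative only: the class name AfSketch and the methods afUnsorted and afBitmap are assumptions introduced here, and the bitmap version uses the vector's own size instead of the fixed bound 100.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.Vector;

// Hypothetical helper: AfSketch, afUnsorted and afBitmap are illustrative
// names, not part of the IntSet specification.
class AfSketch {
    // Unsorted-vector rep: AF(c) = { c.elems[i].intValue | 0 <= i < c.elems.size }
    static Set<Integer> afUnsorted(Vector<Integer> elems) {
        Set<Integer> abstractSet = new TreeSet<>();
        for (int i = 0; i < elems.size(); i++) {
            abstractSet.add(elems.get(i).intValue());
        }
        return abstractSet;
    }

    // Boolean-vector rep: AF(c) = { j | 0 <= j < c.elems.size && c.elems[j] }
    static Set<Integer> afBitmap(Vector<Boolean> elems) {
        Set<Integer> abstractSet = new TreeSet<>();
        for (int j = 0; j < elems.size(); j++) {
            if (elems.get(j)) {
                abstractSet.add(j);
            }
        }
        return abstractSet;
    }

    public static void main(String[] args) {
        Vector<Integer> unsorted = new Vector<>(Arrays.asList(7, 2, 5));
        Vector<Boolean> bitmap = new Vector<>();
        for (int j = 0; j < 8; j++) {
            bitmap.add(j == 2 || j == 5 || j == 7);
        }
        // Two different representations, one abstract value.
        System.out.println(afUnsorted(unsorted)); // [2, 5, 7]
        System.out.println(afBitmap(bitmap));     // [2, 5, 7]
    }
}
```

Both calls print the same set, which is the point of the AF: clients see the abstract value, not the rep that happens to encode it.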

Abstraction Function: Valid Rep

- The abstraction function implicitly assumes that the representation is valid for the class
  - What happens if the vector contains duplicate entries in the first scenario?
  - What happens in the second scenario if the bitmap contains values other than 0 or 1?
- The AF holds only for valid representations. How do we know whether a representation is valid?

Representation Invariant

- Captures formally the assumptions on which the abstraction function is based
- Representation must satisfy this at all times (except when executing the ADT's methods)
- Defines whether a particular representation is valid: the invariant is satisfied only by valid reps.

IntSet Representation Invariant

- Unsorted Array
  1. c.elems != null
  2. c.elems has no null elements
  3. There are no duplicates in c.elems, i.e., for 0 <= i, j < c.elems.size, c.elems[i].intValue = c.elems[j].intValue => i = j
- Boolean Vector
  1. c.elems != null
  2. c.elems.size = maxValue

NOTE: The types of the instance variables are NOT a part of the Rep Invariant. So there is no need to repeat what is already in the type signature.

Rep Invariant: Important Points

- Rep invariant always holds before and after the execution of the ADT's operations
  - Can be violated while executing the ADT's operations
  - Can be violated by private methods of the ADT
- How much should the rep invariant constrain?
  - Just enough for different developers to implement different operations AND not talk to each other
  - Enough so that the AF makes sense for the representation

AF and RI: How to Implement?

- RI: repOK
  - Public method to check if the rep invariant holds
  - Useful for testing/debugging

    public boolean repOK()
    // EFFECTS: Returns true if the rep invariant holds,
    //          returns false otherwise

- AF: toString
  - Public method to convert a valid rep to a String form
  - Useful for debugging/printing

    public String toString()
    // EFFECTS: Returns a string containing the abstraction
    //          represented by the rep.
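
As an illustration, here is a minimal sketch of repOK and toString for the unsorted-vector IntSet. It assumes the three-clause rep invariant stated for that representation; the "IntSet: {...}" output format and the simplified insert are choices made for this example, not something the specification prescribes.

```java
import java.util.Vector;

// A minimal sketch of the unsorted-vector IntSet; only the parts needed to
// illustrate repOK and toString are shown.
class IntSet {
    private Vector<Integer> elems = new Vector<>();

    // Simplified insert, included only so the example has data to show.
    public void insert(int x) {
        if (!elems.contains(x)) elems.add(x);
    }

    // EFFECTS: Returns true if the rep invariant holds, false otherwise.
    public boolean repOK() {
        if (elems == null) return false;                   // clause 1
        for (int i = 0; i < elems.size(); i++) {
            Integer e = elems.get(i);
            if (e == null) return false;                   // clause 2
            for (int j = i + 1; j < elems.size(); j++) {
                if (e.equals(elems.get(j))) return false;  // clause 3
            }
        }
        return true;
    }

    // EFFECTS: Returns a string containing the abstraction represented by
    //          the rep, in the (assumed) format "IntSet: {x1, ..., xn}".
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("IntSet: {");
        for (int i = 0; i < elems.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(elems.get(i));
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        IntSet s = new IntSet();
        s.insert(3);
        s.insert(1);
        s.insert(3); // duplicate, ignored
        System.out.println(s + " repOK=" + s.repOK()); // IntSet: {3, 1} repOK=true
    }
}
```

Note that toString prints the abstract value (a set), so a client never sees that the rep is a vector.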

Uses of RI and AF

- Documentation of the programmer's thinking
- repOK method can be called before and after every public method invocation in the ADT
  - Typically during debugging only
- toString method can be used both during debugging and in production
- Both the RI and AF can be used to formally prove the correctness of the ADT

Group Activity

- Assume that the Polynomial data type is represented as an array trms and a variable deg. The coefficient of the term x^i is stored in the ith element of the trms array, and the variable deg represents the degree of the polynomial (i.e., its highest exponent).
  - Write its abstraction function
  - Write its rep invariant

Reasoning about ADTs - 1

- ADTs have state in the form of representation
  - Need to consider what happens over a sequence of operations on the abstraction
  - Correctness of one operation depends on correctness of previous operations
- We need to reason inductively over the operations of the ADT
  - Show that the constructor is correct
  - Show that each operation is correct

Reasoning about ADTs - 2

- First, need to show that the rep invariant is maintained by the constructor and the operations
- Then, show that the implementation of the abstraction matches the specification
  - Assume that the rep invariant is maintained
  - Use the abstraction function to map the representation to the abstraction

Why show that the Rep Invariant is maintained?

- Consider the implementation of the IntSet using the unsorted vector representation. We wish to compute the size of the set (i.e., its cardinality).

    public int size() {
        return elems.size();
    }

- Is the above implementation correct?

Why show that the Rep Invariant is maintained? (continued)

- Yes, but only if the Rep Invariant holds!
  - c.elems != null && c.elems has no null elements && c.elems has no duplicates
- Otherwise, size can return a value > cardinality
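
To see the failure concretely, the sketch below builds a rep that breaks the no-duplicates clause and compares what size() would return with the true cardinality. The class and method names (SizeNeedsRI, repSize, cardinality) are hypothetical, and a HashSet stands in for the abstract set.

```java
import java.util.HashSet;
import java.util.Vector;

// Hypothetical demo class: shows why size() is only correct when the RI holds.
class SizeNeedsRI {
    // What the size() implementation above returns: the length of the rep.
    static int repSize(Vector<Integer> elems) {
        return elems.size();
    }

    // The cardinality of the abstract set that the rep is meant to represent.
    static int cardinality(Vector<Integer> elems) {
        return new HashSet<>(elems).size();
    }

    public static void main(String[] args) {
        // A rep that violates the no-duplicates clause: 5 appears twice.
        Vector<Integer> elems = new Vector<>();
        elems.add(5);
        elems.add(7);
        elems.add(5);

        System.out.println(repSize(elems));     // 3
        System.out.println(cardinality(elems)); // 2
    }
}
```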

Showing the Rep Invariant is maintained: Data Type Induction

- Show that the constructor establishes the Rep Invariant
- For all other operations,
  - Assume that at the time of the call the invariant holds for
    - this and
    - all argument objects of the type
  - Demonstrate that the invariant holds on return for
    - this
    - all argument objects of the type
    - returned objects of the type

[Figure: a valid rep -> function body -> another valid rep]

IntSet: getIndex

Assume that IntSet has the following private function. Note that private methods do not need to preserve the RI.

    private int getIndex(int x) {
        // EFFECTS: If x is in this, returns the index where x appears
        //          in the Vector elems, else returns -1
        //          (does NOT throw an exception)
        for (int i = 0; i < elems.size(); i++)
            if (x == elems.get(i).intValue())
                return i;
        return -1;
    }

IntSet: Constructor

Show that the RI is true at the end of the constructor.

    public IntSet() {
        // EFFECTS: Initializes this to be empty
        elems = new Vector<Integer>();
    }

RI: c.elems != null && c.elems has no null elements && c.elems has no duplicates

Proof: When the constructor terminates,
- Clause 1 is satisfied because the elems vector is initialized by the constructor
- Clause 2 is satisfied because elems has no elements (and hence no null elements)
- Clause 3 is satisfied because elems has no elements (and hence no duplicates)

IntSet: Insert

Show that if the RI holds at the beginning, it holds at the end.

    public void insert(int x) {
        // MODIFIES: this
        // EFFECTS: Adds x to the set such that this_post = this ∪ {x}
        if (getIndex(x) < 0)
            elems.add(Integer.valueOf(x));
    }

RI: c.elems != null && c.elems has no null elements && c.elems has no duplicates

Proof:
- If clause 1 holds at the beginning, it holds at the end of the procedure, because c.elems is not reassigned by the procedure.
- If clause 2 holds at the beginning, it holds at the end of the procedure, because the procedure only adds a non-null reference to c.elems.
- If clause 3 holds at the beginning, it holds at the end of the procedure, because the getIndex() check prevents duplicate elements from being added to the vector.

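IntSet: Remove

Show that if the RI holds at the beginning, it holds at the end.

    public void remove(int x) {
        // MODIFIES: this
        // EFFECTS: Removes x from this, i.e., this_post = this - {x}
        int i = getIndex(x);
        if (i < 0) return; // Not found
        elems.set(i, elems.lastElement());
        elems.remove(elems.size() - 1);
    }

RI: c.elems != null && c.elems has no null elements && c.elems has no duplicates

Proof:
- Clause 1 is preserved because c.elems is not reassigned by the procedure.
- Clause 2 is preserved because the only value written into the vector is elems.lastElement(), which is non-null if clause 2 holds at the beginning.
- Clause 3 is preserved because the last element is copied over index i and the last slot is then removed, so every remaining value still appears exactly once.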

IntSet: Observers

Show that if the RI holds at the beginning, it holds at the end.

    public int size() {
        return elems.size();
    }

    public boolean isIn(int x) {
        return getIndex(x) >= 0;
    }

RI: c.elems != null && c.elems has no null elements && c.elems has no duplicates

Proof: Neither observer modifies the rep, so every clause that holds at the beginning still holds at the end.

This completes the proof that the RI holds in the ADT. In other words, given any sequence of operations in the ADT, the RI always holds at the beginning and end of this sequence.
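
The inductive argument can be complemented by a runtime check, in the spirit of calling repOK after every public method during debugging. The following is a sketch only: CheckedIntSet is a hypothetical name, the operations follow the code shown on the preceding slides, and a random test run is evidence, not a substitute for the proof.

```java
import java.util.Random;
import java.util.Vector;

// Hypothetical class: the unsorted-vector IntSet plus a repOK check that is
// exercised after every operation of a random sequence.
class CheckedIntSet {
    private Vector<Integer> elems = new Vector<>();

    private int getIndex(int x) {
        for (int i = 0; i < elems.size(); i++)
            if (x == elems.get(i).intValue()) return i;
        return -1;
    }

    public void insert(int x) {
        if (getIndex(x) < 0) elems.add(Integer.valueOf(x));
    }

    public void remove(int x) {
        int i = getIndex(x);
        if (i < 0) return;                  // not found
        elems.set(i, elems.lastElement());  // move last element to index i
        elems.remove(elems.size() - 1);     // remove(int index): drop last slot
    }

    public int size() { return elems.size(); }

    public boolean isIn(int x) { return getIndex(x) >= 0; }

    // Checks the three RI clauses: non-null vector, no null elements, no duplicates.
    public boolean repOK() {
        if (elems == null) return false;
        for (int i = 0; i < elems.size(); i++) {
            if (elems.get(i) == null) return false;
            for (int j = i + 1; j < elems.size(); j++)
                if (elems.get(i).equals(elems.get(j))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        CheckedIntSet s = new CheckedIntSet();
        Random rng = new Random(42); // fixed seed so the run is repeatable
        for (int step = 0; step < 1000; step++) {
            int x = rng.nextInt(10);
            if (rng.nextBoolean()) s.insert(x); else s.remove(x);
            if (!s.repOK())
                throw new IllegalStateException("RI violated at step " + step);
        }
        System.out.println("RI held after every operation");
    }
}
```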

Group Activity

- Consider the implementation of the Polynomial data type described earlier (also on the code handout sheet)
  - Show using data type induction that the Rep Invariant is preserved

Are we done ?

- Thus, we have shown that the RI is established by the constructor and holds for each operation (i.e., if the RI is true at the beginning, it is true at the end). Can we stop here?

No. To see why not, consider an implementation of the operations that does nothing. Such an implementation will satisfy the rep invariant, but is clearly wrong!

To complete the proof, we need to show that the abstraction provided by the ADT is correct. For this, we use the (now proven) fact that the RI holds and use the AF to show that, after each operation, the rep maps to the abstraction required by the specification.

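Abstraction Function: IntSet

- Show that the implementation matches the ADT's specification (i.e., its abstraction)

[Figure: Given that the abstraction function maps the pre-rep to the pre-abstraction, the function implementation takes the pre-rep to the post-rep and the function spec takes the pre-abstraction to the post-abstraction. Prove that the abstraction function maps the post-rep to the post-abstraction.]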

Abstraction Function: Constructor

- AF(c) = { c.elems[i].intValue | 0 <= i < c.elems.size }

    public IntSet() {
        // EFFECTS: Initializes this to be empty
        elems = new Vector<Integer>();
    }

[Figure: the AF maps the empty vector to the empty set]

Proof: The constructor creates an empty vector, which the AF maps to the empty set, so it is correct.

Abstraction Function: Size

- AF(c) = { c.elems[i].intValue | 0 <= i < c.elems.size }

    public int size() {
        // EFFECTS: Returns the cardinality of this
        return elems.size();
    }

[Figure: the AF maps the number of elements in the vector to the cardinality of the set (Why?)]

Proof: Because the rep invariant guarantees that there are no duplicates in the vector, the number of elements in the vector denotes the cardinality of the set.

Abstraction Function: Insert

- AF(c) = { c.elems[i].intValue | 0 <= i < c.elems.size }

    public void insert(int x) {
        // MODIFIES: this
        // EFFECTS: Adds x to the set such that this_post = this ∪ {x}
        if (getIndex(x) < 0)
            elems.add(Integer.valueOf(x));
    }

[Figure: the AF maps the vector to this. The implementation yields a vector with the element added if and only if it did not already exist, which the AF maps to this_post = this ∪ {x}.]
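
The argument in the diagram can be written out as a short case analysis. This is a sketch, where c_pre and c_post are names introduced here for the rep before and after the call, and the align* environment assumes the amsmath package.

```latex
\begin{align*}
AF(c) &= \{\, c.\mathit{elems}[i].\mathit{intValue} \mid 0 \le i < c.\mathit{elems.size} \,\} \\[4pt]
\textbf{Case 1: } & \mathit{getIndex}(x) \ge 0
  \;\Rightarrow\; x \in AF(c_{pre}) \text{ and the rep is unchanged} \\
AF(c_{post}) &= AF(c_{pre}) = AF(c_{pre}) \cup \{x\} \\[4pt]
\textbf{Case 2: } & \mathit{getIndex}(x) < 0
  \;\Rightarrow\; x \notin AF(c_{pre}) \text{ and } x \text{ is appended to } \mathit{elems} \\
AF(c_{post}) &= AF(c_{pre}) \cup \{x\}
\end{align*}
```

In both cases the post-abstraction equals this ∪ {x}, which is what the EFFECTS clause of insert requires.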

Abstraction Function: Remove

- AF(c) = { c.elems[i].intValue | 0 <= i < c.elems.size }

    public void remove(int x) {
        // MODIFIES: this
        // EFFECTS: Removes x from this, i.e., this_post = this - {x}
        int i = getIndex(x);
        if (i < 0) return; // Not found
        // Move last element to the index i
        elems.set(i, elems.lastElement());
        elems.remove(elems.size() - 1);
    }

[Figure: the AF maps the vector to this. The implementation yields a vector with the first instance of the element removed if it exists, which the AF maps to this_post = this - {x}.]

Abstraction Function: isIn

- AF(c) = { c.elems[i].intValue | 0 <= i < c.elems.size }

    public boolean isIn(int x) {
        // EFFECTS: Returns true if x belongs to this, false otherwise
        return getIndex(x) >= 0;
    }

[Figure: the AF maps the vector to this. The implementation returns true if and only if x is present in the vector, which corresponds to true if x belongs to this, false otherwise.]

Proof Summary

- This completes the proof. Thus, we've shown that the ADT implements its spec correctly. This method is called data type induction, because it proceeds using induction.
  - Step 0: Write the implementation of the ADT
  - Step 1: Show that the RI is maintained by the ADT
  - Step 2: Assuming that the RI is maintained, show using the AF that the translation from the rep to the abstraction matches each method's spec.

Group Activity

- Consider the implementation of the Polynomial data type described earlier (also on the code handout sheet)
  - Show that the ADT's implementation matches its specification, assuming that the RI holds.

Exposing the Rep

- Note that the proof we just wrote assumes that the only way you can modify the representation is through its operations
  - Otherwise the rep invariant can be violated
- Is this always true?
  - What if you expose the representation outside the class, so that any outside entity can change it?

Mistakes that lead to exposing the rep - 1

- Making rep components public

    public class IntSet {
        public Vector<Integer> elements;
    }

- Your rep must always be private. Otherwise, all bets are off.
- Hopefully, your code will not have this bug.

Mistakes that lead to exposing the rep - 2

    public class IntSet {
        // OVERVIEW: IntSets are mutable, unbounded sets of integers.
        // A typical IntSet is {x1, ..., xn}
        private Vector<Integer> elems; // no duplicates in vector

        public Vector<Integer> allElements() {
            // EFFECTS: Returns a vector containing the elements of this,
            //          each exactly once, in arbitrary order
            return elems; // returns a reference to the rep itself
        }
    }

    IntSet intSet = new IntSet();
    intSet.allElements().add(Integer.valueOf(5));
    intSet.allElements().add(Integer.valueOf(5)); // RI violated: duplicates!

Mistakes that lead to exposing the rep - 3

    public class IntSet {
        // OVERVIEW: IntSets are mutable, unbounded sets of integers.
        // A typical IntSet is {x1, ..., xn}
        private Vector<Integer> elems;

        // Constructors
        public IntSet(Vector<Integer> els) throws NullPointerException {
            // EFFECTS: If els is null, throws NullPointerException, else
            //          initializes this to contain as elements all the ints in els.
            if (els == null) throw new NullPointerException();
            elems = els; // stores the caller's reference as the rep
        }
    }

    Vector<Integer> someVector = new Vector<Integer>();
    IntSet intSet = new IntSet(someVector);
    someVector.add(Integer.valueOf(5));
    someVector.add(Integer.valueOf(5)); // RI violated: duplicates!

Summary of mistakes that expose the Rep

- NOT making rep components private
- Returning a reference to the rep's mutable components
- Initializing rep components with a reference to an outside mutable object
- NOT performing deep copy of rep elements
  - Use clone method instead
  - Perform manual copies
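
A possible fix for the second and third mistakes is to copy on the way in and on the way out. The sketch below is one way to do this, not the only one: SafeIntSet is a hypothetical name, and skipping null entries and duplicates in the constructor is an assumption made here so that the RI is established. Because Integer objects are immutable, copying the vector is enough; the elements themselves need no deep copy.

```java
import java.util.Vector;

// Hypothetical variant of IntSet that never shares its rep with clients.
class SafeIntSet {
    private final Vector<Integer> elems;

    // Copies the caller's vector instead of storing a reference to it.
    // Assumption: null entries and duplicates in els are skipped.
    public SafeIntSet(Vector<Integer> els) {
        if (els == null) throw new NullPointerException();
        elems = new Vector<>();
        for (Integer e : els) {
            if (e != null && !elems.contains(e)) elems.add(e);
        }
    }

    // Returns a fresh copy, so callers cannot reach the rep through it.
    public Vector<Integer> allElements() {
        return new Vector<>(elems);
    }

    public int size() {
        return elems.size();
    }

    public static void main(String[] args) {
        Vector<Integer> outside = new Vector<>();
        outside.add(5);
        SafeIntSet s = new SafeIntSet(outside);

        outside.add(5);         // changes only the caller's vector
        s.allElements().add(5); // changes only a copy

        System.out.println(s.size()); // 1
    }
}
```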

Group Activity

- For the polynomial example, how many mistakes of exposing the rep can you find? How will you fix them? (Refer to the code handout sheet.)

Mutable objects

- Objects whose abstract state can be modified
  - Applies to the abstraction, not the representation
- Mutable objects: Can be modified once they are created, e.g., IntSet, IntList etc.
- Immutable objects: Cannot be modified
  - Examples: Polynomials, Strings

Equality: Equals Method

- All classes inherit from Object, which has a method boolean equals(Object o)
  - Returns true if object o is the same object as the current one
  - Returns false otherwise
- Note that the default equals tests whether two references denote the same object, not whether two objects have the same state
  - If a and b are different objects, a.equals(b) will return false even if they are functionally identical

Equality: IntSet Example

    IntSet a = new IntSet();
    a.insert(1); a.insert(2); a.insert(3);
    IntSet b = new IntSet();
    b.insert(1); b.insert(2); b.insert(3);
    if (a.equals(b))
        System.out.println("Equal");

- What is printed by the above code?

Equality: IntSet Example (continued)

- It prints nothing. Why?
  - Because the IntSets are different objects and the Object.equals method only compares their references
  - Therefore, a.equals(b) returns false
- But this is in fact the correct behavior!
  - To see this, assume that you added an element to a but not b after the equals comparison
  - If equals compared state, a.equals(b) would no longer be true, even though you have not changed the references to a or b
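
The behavior described above can be checked with a small self-contained program. MutableIntSet is a stand-in name for an IntSet that, like the one in these slides, does not override equals; only insert is included to keep the sketch short.

```java
import java.util.Vector;

// Hypothetical stand-in for the mutable IntSet. equals is deliberately NOT
// overridden, so Object.equals (reference comparison) is used.
class MutableIntSet {
    private final Vector<Integer> elems = new Vector<>();

    public void insert(int x) {
        if (!elems.contains(x)) elems.add(x);
    }

    public static void main(String[] args) {
        MutableIntSet a = new MutableIntSet();
        a.insert(1); a.insert(2); a.insert(3);
        MutableIntSet b = new MutableIntSet();
        b.insert(1); b.insert(2); b.insert(3);

        System.out.println(a.equals(b)); // false: different objects
        System.out.println(a.equals(a)); // true: same object
    }
}
```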

Rule of Object Equality

- Two objects should be equal if it is impossible to distinguish between them using any sequence of calls to the objects' methods
- Corollary: Once two objects are equal, they should always be equal. Otherwise it is possible to distinguish between them using some combination of the objects' methods.

Mutability and the Equals Method

- For mutable objects, you can distinguish between two objects by mutating them after the comparison. Therefore, they are NOT equal. The default equals method does the right thing, i.e., returns false.
- If the objects are immutable AND have the same state, then the equals method should return true. So we need to override equals for immutable objects to do the right thing.

Immutable Abstractions

- ADT does not change once created
  - No mutator methods
  - Producer methods to create new objects
- Appropriate for modeling objects that do not change during their existence
  - Mathematical entities such as Rational numbers
- Certain objects may be implemented more efficiently, e.g., Strings

Why use immutable ADTs?

- Safety
  - Don't need to worry about accidental changes
  - Can be assured that the rep doesn't change
- Efficiency
  - May hurt efficiency if you need to copy the object
  - In some cases, it may be more efficient by sharing representations across objects, e.g., Strings
- Ease of Implementation
  - May be easier for concurrency control

Equality Immutable objects

- Immutable objects should define their own equals method
  - Return true if the abstract state matches, even if the internal state (i.e., rep) is different
- Therefore, methods of an immutable object can modify its rep, but not the abstraction
  - Such methods are said to have benevolent side effects
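
As an illustration of both points, here is a sketch of an immutable rational number whose equals compares abstract values and whose private reduce method is a benevolent side effect. The Rational class is an example chosen here, not code from the handout; it ignores integer overflow and, because reduce writes to the rep, is not thread-safe as written.

```java
// Hypothetical immutable ADT: the abstract value num/den never changes after
// construction, although the rep may be rewritten into lowest terms.
final class Rational {
    private int num; // rep may be unreduced, e.g. 2/4
    private int den; // RI: den > 0

    public Rational(int num, int den) {
        if (den == 0) throw new IllegalArgumentException("zero denominator");
        if (den < 0) { num = -num; den = -den; } // keep the sign in num
        this.num = num;
        this.den = den;
    }

    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    // Benevolent side effect: changes the rep, never the abstract value.
    private void reduce() {
        int g = gcd(Math.abs(num), den);
        num /= g;
        den /= g;
    }

    // Equal if and only if the abstract values match, whatever the reps were.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Rational)) return false;
        Rational r = (Rational) o;
        reduce();
        r.reduce();
        return num == r.num && den == r.den;
    }

    // Must agree with equals, so it is computed from the reduced rep.
    @Override
    public int hashCode() {
        reduce();
        return 31 * num + den;
    }

    public static void main(String[] args) {
        Rational half = new Rational(1, 2);
        Rational twoQuarters = new Rational(2, 4);
        System.out.println(half.equals(twoQuarters));                  // true
        System.out.println(half.hashCode() == twoQuarters.hashCode()); // true
    }
}
```

Overriding hashCode together with equals keeps the two consistent, which hash-based collections such as HashSet rely on.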

Group Activity

- Design an equals method for two polynomials. What will you do if the polynomials are not in their canonical forms?

To do before next class

- Submit assignment 2 in the lab
- Start working on assignment 3
- Prepare for the midterm exam
- Portions include everything covered so far
- In class on Feb 28th