CSI 3125, Data Types, page 1 - PowerPoint PPT Presentation

About This Presentation

Slides: 46

Provided by: alanwi8

1
Data types

Outline
Primitive data types
Structured data types
Strings
Enumerated types
Arrays
Records
Pointers

2
Primitive data types

Points
Numeric types
Booleans
Characters

3
Data types introduction

A data type is not just a set of objects. We must
consider all operations on these objects. A
complete definition of a type must include a list
of operations, and their definitions.
Primitive data objects are close to hardware, and
are represented directly (or almost directly) at
the machine level: usually word, byte, bit.

4
Integer types

An integer type is a finite approximation of the
infinite set of integer numbers: {0, 1, -1, 2,
-2, ...}.
Various kinds of integers: signed/unsigned,
long/short/small.
Hardware implementations of integers: one's
complement, two's complement, ...
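
As a rough illustration of the two hardware encodings named above, here is a minimal Python sketch. The function names and the default width of 8 bits are assumptions made for this example; they do not come from the slides.

```python
def twos_complement(value: int, bits: int = 8) -> str:
    """Bit pattern of a signed integer in two's complement."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise OverflowError(f"{value} does not fit in {bits} signed bits")
    # Masking keeps the low `bits` bits, which is the two's complement pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def ones_complement(value: int, bits: int = 8) -> str:
    """Bit pattern of a signed integer in one's complement."""
    limit = (1 << (bits - 1)) - 1
    if not -limit <= value <= limit:
        raise OverflowError(f"{value} does not fit in {bits} signed bits")
    if value >= 0:
        return format(value, f"0{bits}b")
    # A negative number is the bitwise inversion of its magnitude.
    return format(~(-value) & ((1 << bits) - 1), f"0{bits}b")
```

The overflow checks show why an integer type is only a finite approximation: with 8 bits, two's complement covers -128..127, while one's complement covers -127..127 and has two patterns for zero.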

5
Floating-point types
All "real" numbers in computers are finite
approximations of the non-denumerable set of real
numbers. Precision and range of values are
defined by the language or by the
programmer. Hardware implementations (used by
floating-point processors): exponent and mantissa.
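
A small Python sketch of the exponent-and-mantissa idea, assuming IEEE 754 double precision (which CPython uses on mainstream platforms). The helper name `decompose` is made up for this example.

```python
import math


def decompose(x: float):
    """Return (mantissa, exponent) such that x == mantissa * 2**exponent."""
    return math.frexp(x)


mantissa, exponent = decompose(6.0)      # 6.0 == 0.75 * 2**3

# Finite approximation in action: 0.1 and 0.2 have no exact binary form.
approx_sum = 0.1 + 0.2
exactly_equal = (approx_sum == 0.3)      # False on IEEE 754 doubles
close_enough = math.isclose(approx_sum, 0.3)
```

Comparing with a tolerance, as `math.isclose` does, is the usual way to live with the limited precision.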
6
Boolean type

This type is not supported by all languages (for
example, it is not available in PL/I, C, Perl).
The values are true and false. Operations are as
in classical two-valued propositional logic.
Hardware implementation: a single bit, or a byte
(a byte allows more efficient operations).

7
Character types

This is usually ASCII, but extended character
sets (Unicode, ISO) are often used.
Accented characters (é, à, ü, etc.) do not fit in
7-bit ASCII. They fit within 8-bit extensions of
ASCII, though there is no single standard.
Chinese or Japanese are examples of writing
systems that require character sets of many more
than 256 elements.
Hardware implementation: a byte (ASCII, EBCDIC),
two bytes (Unicode) or several bytes.
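
To make the storage sizes concrete, here is a minimal Python sketch that counts the bytes one character occupies under a few encodings. The helper `encoded_size` is hypothetical; the encoding names are standard Python codec names.

```python
def encoded_size(ch: str, encoding: str) -> int:
    """Number of bytes needed to store one character in the given encoding."""
    return len(ch.encode(encoding))


E_ACUTE = "\u00e9"    # é, an accented Latin letter
ZHONG = "\u4e2d"      # a common Chinese character

sizes = {
    ("A", "ascii"): encoded_size("A", "ascii"),
    ("e-acute", "latin-1"): encoded_size(E_ACUTE, "latin-1"),
    ("e-acute", "utf-8"): encoded_size(E_ACUTE, "utf-8"),
    ("zhong", "utf-16-le"): encoded_size(ZHONG, "utf-16-le"),
    ("zhong", "utf-8"): encoded_size(ZHONG, "utf-8"),
}
```

Asking for `encoded_size(E_ACUTE, "ascii")` raises `UnicodeEncodeError`, because the character lies outside the 7-bit ASCII range.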

8
Other primitive types

word (for example, in Modula-2)
byte, bit (for example, in PL/I)
pointer (for example, in C)

9
Structured data types

Points
Strings
Enumerated types
Arrays
Records

10
Strings

A string is a sequence of characters. It may be:
a special data type (its objects can be
decomposed into characters): Fortran, Basic
an array of characters: Pascal, Ada
a list of characters: Prolog
consecutively stored characters: C.
The syntax is the same: characters in quotes.
Pascal has one kind of quotes, Ada has two:
'A' is a character, "A" is a string.

11
String operations

Typical operations on strings:
string × string → string
concatenation
string × int × int → string
substring
string → characters
decompose into an array or list
characters → string
convert an array or list into a string

12
String operations (2)

string → integer
length
string → boolean
is it empty?
string × string → boolean
equality, ordering
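
The signatures on these two slides can be sketched in Python as follows. The function names are assumptions chosen to mirror the slide wording, and `substring` is taken here to mean "start position and length", which is only one of several common conventions.

```python
from typing import List


def concat(a: str, b: str) -> str:             # string x string -> string
    return a + b


def substring(s: str, start: int, length: int) -> str:   # string x int x int -> string
    return s[start:start + length]             # 0-based start, assumed convention


def explode(s: str) -> List[str]:              # string -> characters
    return list(s)


def implode(chars: List[str]) -> str:          # characters -> string
    return "".join(chars)


def length(s: str) -> int:                     # string -> integer
    return len(s)


def is_empty(s: str) -> bool:                  # string -> boolean
    return len(s) == 0


def precedes(a: str, b: str) -> bool:          # string x string -> boolean
    return a < b                               # lexicographic ordering
```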

13
More string operations

Specialized string manipulation languages
(Snobol, Icon) include built-in pattern matching,
sometimes very complicated, with extremely
elaborate backtracking.
Another language that works very well with
strings is, of course, Perl.

14
Fixed- and variable-length strings

The allowed length of strings is a design issue:
fixed-length strings: Pascal, Ada, Fortran
variable-length strings: C, Java, Perl.
A character may be treated as a string of length
1, or as a separate data structure. Many
languages (Pascal, Ada, C, Prolog) treat strings
as special cases of arrays or lists.
Operations on strings are the same as on arrays
or lists of other types. For example, every
character in a character array is available at
once, whereas a list of characters must be
searched linearly.

15
Enumerated types

Also called user-defined ordinal types (read
Section 6.4).
We can declare a list of symbolic constants that
are to be treated literally, just like in Prolog
or Scheme.
We also specify the implicit ordering of those
newly introduced symbolic constants. In Pascal:
type day = (mo, tu, we, th, fr, sa, su);
Here, we have mo < tu < we < th < fr < sa < su.

16
Operators for enumerated types

Pascal supplies the programmer with three generic
operations for every new enumerated type T:
succ: successor, for example, succ(tu) = we
pred: predecessor, for example, pred(su) = sa
(each is undefined at one end)
ord: (ordinal) position in the type, starting at
0, so for example ord(th) = 3
For characters, Pascal also has chr, producing
the character at a given position, so for example
chr(65) returns 'A'.
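
A minimal Python sketch of these operations, assuming an `IntEnum` as a stand-in for the Pascal type `day`. The helpers `succ` and `pred` are written for this example; Python's `enum` module does not provide them.

```python
from enum import IntEnum


class Day(IntEnum):
    # Constants are mapped to 0..6, which also fixes their ordering.
    MO = 0
    TU = 1
    WE = 2
    TH = 3
    FR = 4
    SA = 5
    SU = 6


def succ(d: Day) -> Day:
    """Successor; undefined (an error) for the last constant."""
    if d == max(Day):
        raise ValueError("succ is undefined for the last value")
    return Day(d + 1)


def pred(d: Day) -> Day:
    """Predecessor; undefined (an error) for the first constant."""
    if d == min(Day):
        raise ValueError("pred is undefined for the first value")
    return Day(d - 1)


position = int(Day.TH)   # plays the role of Pascal's ord(th)
letter = chr(65)         # Python's chr behaves like Pascal's chr here
```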

17
Enumerated types in Ada

Ada makes these generic operations complete:
succ: successor,
pred: predecessor,
pos: position,
val: constant at position.
Ada also supplies type attributes, among them
FIRST and LAST:
day'FIRST = mo, day'LAST = su

18
Reuse of symbolic constants

A design issue: is a symbolic constant allowed
in more than one type? In Pascal, no. In Ada,
yes:
type stoplight is (red, amber, green);
type rainbow is (violet, indigo, blue, green,
yellow, orange, red);
Qualified descriptions (similar to type
casts) prevent any confusion: we can write
stoplight'(red) or rainbow'(red).

19
Implementation of enumerated types

Map the constants c1, ..., ck into small integers
0, ..., k-1.
Enumerated types help increase clarity and
readability of programs by separating concepts
from their numeric codes.

20
Arrays

An array represents a mapping:
index_type → component_type
The index type must be a discrete type (integer,
character, enumeration etc). In some languages
this type is specified implicitly:
an array of size N is indexed 0..N-1 in C / Java
/ Perl, but in Fortran it is 1..N. In Algol,
Pascal, Ada the lower and upper bound must
both be given.
There are normally few restrictions on the
component type (in some languages we can even
have arrays of procedures or files).

21
Multidimensional arrays

Multidimensional arrays can be defined in two
ways (for simplicity, we show only dimension 2):
index_type1 × index_type2 → component_type
This corresponds to references such as A[I,J].
Algol, Pascal, Ada work like this.
index_type1 → (index_type2 → component_type)
This corresponds to references such as A[I][J].
Java works like this.
Perl sticks to one dimension.
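
The two views can be imitated in Python, purely as an illustration: a dictionary keyed by index pairs models the product mapping, and a list of lists models the nested mapping. The names and the 2 × 3 shape are assumptions for this sketch.

```python
ROWS, COLS = 2, 3

# Product view: one mapping from index pairs to components, as in A[I,J].
a_pair = {(i, j): 10 * i + j for i in range(ROWS) for j in range(COLS)}

# Nested view: a mapping from the first index to a whole row, as in A[I][J].
a_nested = [[10 * i + j for j in range(COLS)] for i in range(ROWS)]

# In the nested view a row is a value in its own right.
row = a_nested[1]
```

In the nested view, a partially indexed array (here `a_nested[1]`) is itself an array, which the product view does not offer directly.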

22
Operations on arrays (1)

select an element (get or change its value): A[J]
select a slice of an array
(read the textbook, Section 6.5.7)
assign a complete array to a complete array:
A := B
There is an implicit loop here.

23
Operations on arrays (2)

Compute an expression with complete arrays (this
is possible in extensible or specialized
languages, for example in Ada):
V := W + U
If V, W, U are arrays, this may denote array
addition. All three arrays must be compatible
(the same index and component type), and addition
is probably carried out element by element.
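
Element-by-element addition with a compatibility check might look like this in Python. This is a sketch only; `add_arrays` is a hypothetical helper, and plain lists of integers stand in for the arrays W and U.

```python
from typing import List


def add_arrays(w: List[int], u: List[int]) -> List[int]:
    """Return the element-wise sum of two compatible arrays."""
    if len(w) != len(u):
        raise ValueError("arrays must have the same index range")
    # The loop that the notation V := W + U leaves implicit.
    return [x + y for x, y in zip(w, u)]
```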

24
Subscript binding

static: fixed size, static allocation
(this is done in older Fortran).
semistatic: fixed size, dynamic allocation
(Pascal).
semidynamic: size determined at run
time, dynamic allocation
(Ada).
dynamic: size fluctuates during
execution, flexible allocation required
(Algol 68, APL; both little used...).

25
Array-type constants and initialization

Many languages allow initialization of arrays to
be specified together with declarations:
C: int vector[] = {10, 20, 30};
Ada: vector: array(0..2) of integer :=
(10, 20, 30);
Array constants in Ada:
type temp is array(mo..su) of -40..40;
T: temp;
T := (15, 12, 18, 22, 22, 30, 22);
T := (mo => 15, we => 18, tu => 12,
sa => 30, others => 22);
T := (15, 12, 18, sa => 30, others => 22);

26
Implementing arrays (1)

The only issue is how to store arrays and access
their elements; operations on the component type
decide how the elements are manipulated.
An array is represented during execution by an
array descriptor. It tells us about
the index type,
the component type,
the address of the array, that is, the data.

27
Implementing arrays (2)

Specifically, we need
the lower and upper bound (for subscript
checking),
the base address of the array,
the size of an element.
We also need the subscript: it gives us the offset
(from the base) in the memory area allocated to
the array.
A multi-dimensional array will be represented by
a descriptor with more lower-upper bound pairs.

28
Implementing multidimensional arrays

Row major order (second subscript increases
faster)

Column major order (first subscript increases
faster)
29
Implementing multidimensional arrays (2)

Suppose that we have this array:
A: array [LOW1..HIGH1,
LOW2..HIGH2] of ELT
where the size of each entity of type ELT is
SIZE.
This calculation is done for row-major order
(calculations for column-major are quite
similar). We need the base: for example, the
address LOC of A[LOW1, LOW2].

30
Implementing multidimensional arrays (3)

We can calculate the address of A[I,J] in the
row-major order, given the base.
Let the length of each row in the array be:
ROWLENGTH = HIGH2 - LOW2 + 1
The address of A[I,J] is:
(I - LOW1) * ROWLENGTH * SIZE + (J - LOW2) * SIZE
+ LOC

31
Implementing multidimensional arrays (4)

Here is an example.
VEC: array [1..10, 5..24] of integer
The length of each row in the array is:
ROWLENGTH = 24 - 5 + 1 = 20
Let the base address be 1000, and let the size of
an integer be 4.
The address of VEC[i,j] is:
(i - 1) * 20 * 4 + (j - 5) * 4 + 1000
For example, VEC[7,16] is located in 4 bytes at
1524 = (7 - 1) * 20 * 4 + (16 - 5) * 4
+ 1000
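
The address calculation can be checked with a short Python sketch. The parameter names follow the slides (LOW1, HIGH1, LOW2, HIGH2, SIZE, LOC); the function names are assumptions. The column-major version is the analogous formula with the roles of the two subscripts swapped, which the slides mention but do not spell out.

```python
def row_major_address(i, j, low1, high1, low2, high2, size, base):
    """Address of A[i, j] when the second subscript varies fastest."""
    if not (low1 <= i <= high1 and low2 <= j <= high2):
        raise IndexError("subscript out of bounds")
    row_length = high2 - low2 + 1
    return (i - low1) * row_length * size + (j - low2) * size + base


def column_major_address(i, j, low1, high1, low2, high2, size, base):
    """Address of A[i, j] when the first subscript varies fastest."""
    if not (low1 <= i <= high1 and low2 <= j <= high2):
        raise IndexError("subscript out of bounds")
    column_length = high1 - low1 + 1
    return (j - low2) * column_length * size + (i - low1) * size + base


# VEC: array [1..10, 5..24] of integer, base address 1000, element size 4.
vec_7_16 = row_major_address(7, 16, 1, 10, 5, 24, 4, 1000)
```

The bounds check uses the same lower-upper pairs that the array descriptor stores for subscript checking.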

32
Languages without arrays

A final word on arrays: they are not supported by
standard Prolog and pure Scheme. An array can be
simulated by a list, which is the basic data
structure in Scheme and a very important data
structure in Prolog.
Assume that the index type is always 1..N.
Treat a list of N elements
[x1, x2, ..., xN] (Prolog)
(x1 x2 ... xN) (Scheme)
as the (structured) value of an array.
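
A sketch of the simulation, written in Python rather than Prolog or Scheme. Pairs `(head, tail)` with `None` as the empty list stand in for cons cells, and all three function names are assumptions for this example.

```python
def from_items(items):
    """Build a linked list of (head, tail) cells; None is the empty list."""
    lst = None
    for x in reversed(items):
        lst = (x, lst)
    return lst


def get(lst, index):
    """Element at a 1-based index, found by walking the list linearly."""
    if index < 1:
        raise IndexError("index type is 1..N")
    cell = lst
    for _ in range(index - 1):
        if cell is None:
            raise IndexError("index beyond the end of the list")
        cell = cell[1]
    if cell is None:
        raise IndexError("index beyond the end of the list")
    return cell[0]


def put(lst, index, value):
    """Return a new list with the element at a 1-based index replaced."""
    if lst is None or index < 1:
        raise IndexError("index outside 1..N")
    head, tail = lst
    if index == 1:
        return (value, tail)
    return (head, put(tail, index - 1, value))
```

Access costs time proportional to the index, and an update builds a new list instead of changing a cell in place, which is the price of simulating an array this way.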

33
Records

A record is a heterogeneous collection of fields
(components); this differs from homogeneous arrays.
Records are supported by a majority of important
languages, beginning with Cobol, through Pascal,
PL/I, Ada, C (where they are called structures),
Prolog (!) to C++.
There are no records in Java, but classes replace
them. There are no records in Perl.

34
Ada records: syntax

type date is record
   day: 1..31;
   month: 1..12;
   year: 1000..9999;
end record;
type person is record
   name: record
      fname: string(1..20);
      lname: string(1..20);
   end record;
   born: date;
   gender: (F, M);
end record;

35
Fields

A field is distinguished by a name rather than an
index. Iteration on elements of an array is
natural and very useful, but iteration on fields
of a record is not possible (why?).
A field is indicated by a qualified name. In Ada:
X, Y: person;
X.born.day := 15;
X.born.month := 11;
X.born.year := 1964;
Y.born := (23, 9, 1949);
Y.name.fname(1..8) := "Smithson";

36
Operations on records (1)

Selection of a component is done by field name.
Construction of a record from components: either
from separate fields, or as a complete record in
a structured constant.
D := (month => 10, day => 15, year => 1994);
D := (day => 15, month => 10, year => 1994);
D := (15, 10, 1994);
D := (15, 10, year => 1994);
Note that an array can also be assigned such a
constant. Interpretation depends on context.
A: array(1..3) of integer;
A := (15, 10, 1994);
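
Named, positional and mixed construction have a close analogue in Python dataclasses. This is only an analogy to the Ada aggregates above, and the class name `Date` is an assumption for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Date:
    day: int
    month: int
    year: int


d1 = Date(month=10, day=15, year=1994)   # named association, any order
d2 = Date(day=15, month=10, year=1994)
d3 = Date(15, 10, 1994)                  # positional association
d4 = Date(15, 10, year=1994)             # positional first, then named

# The same three numbers as an array: the context decides the interpretation.
a = [15, 10, 1994]
```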

37
Operations on records (2)

Assignment of complete records is allowed in Ada,
and is done field by field.
Records can be compared for equality or
inequality, regardless of their structure or type
of components. No generic standard ordering of
records exists, but specific ordering can be
defined by the programmer.
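
Equality without a built-in ordering can be illustrated with a Python dataclass, which compares field by field for equality but defines no `<` unless asked to. The key function `chronological` is a programmer-defined ordering invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Date:
    day: int
    month: int
    year: int


def chronological(d: Date):
    """Programmer-defined ordering: compare year, then month, then day."""
    return (d.year, d.month, d.day)


dates = [Date(15, 11, 1964), Date(23, 9, 1949), Date(15, 10, 1994)]
ordered = sorted(dates, key=chronological)
```

Writing `Date(1, 1, 2000) < Date(2, 1, 2000)` raises `TypeError` here, since no generic ordering exists for the record type.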

38
More on records

Ada allows default values for fields:
type date is record
   day: 1..31; month: 1..12;
   year: 1000..9999 := 2002;
end record;
D: date; -- D.year is now 2002
There are almost no restrictions on field types.
Any combination of records and arrays (any depth)
is usually possible. A field could also be a file
or even a procedure!

39
The Prolog equivalent of records

Records (or rather terms) in Prolog can carry their
type and their components around:
date(day(15), month(10), year(1994))
person(
name(fname("Jim"), lname("Berry")),
born(date(day(15), month(10), year(1994))),
gender(male)
)
If we can assure correct use, this can be
simplified by dropping one-argument "type"
functors:
date(15, 10, 1994)
person(name("Jim", "Berry"),
born(date(15, 10, 1994)),
male)

40
Back to pointers

Note: we're skipping 6.9.9.
A pointer variable has addresses as values (and a
special address nil or null for "no value"). Pointers
are used primarily to build structures with
unpredictable shapes and sizes (lists, trees,
graphs) from small fragments allocated dynamically
at run time.
A pointer to a procedure is possible, but
normally we have pointers to data (simple and
composite). An address, a value and usually a
type of a data item together make up a variable.
We call it an anonymous variable: no name is
bound to it. Its value is accessed by
dereferencing the pointer.

41
Back to pointers (2)
Pointers in Pascal are quite well designed.
value(p) = α (some address)
value(p^) = 17

Note that, as with normal named variables, in
this:
p^ := 23
we mean the address of p^ (the value of p).
In this:
m := p^
we mean the value of p^.

42
Pointer variable creation

A pointer variable is declared explicitly and has
scope and lifetime as usual.
An anonymous variable has no scope (because it
has no name) and its lifetime is determined by
the programmer. It is created (in a special
memory area called the heap) by the programmer, for
example:
new(p) in Pascal
p = malloc(4) in C
and destroyed by the programmer:
dispose(p) in Pascal
free(p) in C

43
Pointer variable creation (2)

If an anonymous variable exists outside the scope
of the explicit pointer variable, we have
"garbage" (a lost object). If an anonymous
variable has been destroyed inside the scope of
the explicit pointer variable, we have a dangling
reference.
new(p);
p^ := 23;
dispose(p);
......
if p^ > 0 ???

44
Pointer variable creation (3)

Producing garbage, an example in Pascal:
new(p); p^ := 23; new(p);
The anonymous variable with the value 23 becomes
inaccessible.
Garbage collection is the process of reclaiming
inaccessible storage. It is usually complex and
costly. It is essential in languages whose
implementation relies on pointers: Lisp, Prolog.
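
Garbage and dangling references can both be mimicked with a toy heap in Python. Everything here is an assumption made for illustration: the class `Heap`, its method names, and the starting address 1000. It models explicit allocation as in Pascal or C, not Python's own memory management, and `garbage` only considers pointers held directly in variables (it does not trace pointers stored inside cells).

```python
class Heap:
    """A toy heap: integer addresses mapped to anonymous variables."""

    def __init__(self):
        self._cells = {}
        self._next_address = 1000

    def new(self):
        """Allocate an anonymous variable and return its address."""
        address = self._next_address
        self._next_address += 1
        self._cells[address] = None
        return address

    def dispose(self, address):
        """Destroy the anonymous variable at the given address."""
        del self._cells[address]

    def store(self, address, value):
        if address not in self._cells:
            raise RuntimeError("dangling reference")
        self._cells[address] = value

    def load(self, address):
        if address not in self._cells:
            raise RuntimeError("dangling reference")
        return self._cells[address]

    def garbage(self, live_pointers):
        """Addresses still allocated but no longer held by any pointer."""
        return sorted(set(self._cells) - set(live_pointers))


heap = Heap()

# Garbage: new(p); p^ := 23; new(p);
p = heap.new()
heap.store(p, 23)
p = heap.new()                 # the cell holding 23 is now unreachable
lost = heap.garbage([p])

# Dangling reference: new(q); q^ := 23; dispose(q); then use q^.
q = heap.new()
heap.store(q, 23)
heap.dispose(q)                # q still holds the old address
```

After the last line, `heap.load(q)` raises `RuntimeError`. A real Pascal or C program typically gets no such check, which is what makes dangling references dangerous.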

45
Pointer types and operators

Pointers in PL/I are typeless. In Pascal, Ada, C
they are declared as pointers to types, so that a
dereferenced pointer (p^, *p) has a fixed type.
Operations on pointers in C are quite rich:
char b, c;
c = '\007';
b = *((&c - 1) + 1);
putchar(b);
