Title: CSI 3125, Data Types, page 1
1Data types
- Outline
- Primitive data types
- Structured data types
- Strings
- Enumerated types
- Arrays
- Records
- Pointers
2Primitive data types
- Points
- Numeric types
- Booleans
- Characters
3Data types introduction
- A data type is not just a set of objects. We must
consider all operations on these objects. A
complete definition of a type must include a list
of operations and their definitions.
- Primitive data objects are close to hardware, and
are represented directly (or almost directly) at
the machine level: usually as a word, byte, or bit.
4Integer types
- An integer type is a finite approximation of the
infinite set of integer numbers: 0, 1, -1, 2,
-2, ...
- Various kinds of integers: signed/unsigned,
long/short/small.
- Hardware implementations of integers: one's
complement, two's complement, ...
5Floating-point types
- All "real" numbers in computers are finite
approximations of the non-denumerable set of real
numbers.
- Precision and range of values are defined by the
language or by the programmer.
- Hardware implementations (used by floating-point
processors): exponent and mantissa.
6Boolean type
- This type is not supported by all languages (for
example, it is not available in PL/I, C, Perl).
- The values are true and false. Operations are as
in classical two-valued propositional logic.
- Hardware implementation: a single bit, or a byte
(a byte allows more efficient operations).
7Character types
- This is usually ASCII, but extended character
sets (Unicode, ISO) are often used.
- Accented characters (é, à, ü etc.) fit within
8-bit extensions of ASCII, though there is no
single standard.
- Chinese or Japanese are examples of writing
systems that require character sets of many more
than 256 elements.
- Hardware implementation: a byte (ASCII, EBCDIC),
two bytes (16-bit Unicode) or several bytes.
8Other primitive types
- word (for example, in Modula-2)
- byte, bit (for example, in PL/I)
- pointer (for example, in C)
9Structured data types
- Points
- Strings
- Enumerated types
- Arrays
- Records
10Strings
- A string is a sequence of characters. It may be:
- a special data type (its objects can be
decomposed into characters): Fortran, Basic
- an array of characters: Pascal, Ada
- a list of characters: Prolog
- consecutively stored characters: C.
- The syntax is the same: characters in quotes.
- Pascal has one kind of quotes, Ada has two:
'A' is a character, "A" is a string.
11String operations
- Typical operations on strings:
- string × string → string
- concatenation
- string × int × int → string
- substring
- string → characters
- decompose into an array or list
- characters → string
- convert an array or list into a string
12String operations (2)
- string → integer
- length
- string → boolean
- is it empty?
- string × string → boolean
- equality, ordering
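The operation signatures listed on these two slides can be sketched in Python, whose built-in strings happen to support all of them. The function names (concat, substring, explode, implode and so on) are hypothetical, chosen only to mirror the slide entries, and the substring convention (0-based start plus a length) is an assumption, since the slides do not fix one.

```python
def concat(s: str, t: str) -> str:
    """string x string -> string"""
    return s + t

def substring(s: str, start: int, length: int) -> str:
    """string x int x int -> string (assumed: 0-based start, then a length)"""
    return s[start:start + length]

def explode(s: str) -> list:
    """string -> characters: decompose into a list of one-character strings"""
    return list(s)

def implode(chars: list) -> str:
    """characters -> string: rebuild a string from a list of characters"""
    return "".join(chars)

def length(s: str) -> int:
    """string -> integer"""
    return len(s)

def is_empty(s: str) -> bool:
    """string -> boolean"""
    return len(s) == 0

def equal(s: str, t: str) -> bool:
    """string x string -> boolean: equality"""
    return s == t

def precedes(s: str, t: str) -> bool:
    """string x string -> boolean: ordering by character codes"""
    return s < t
```

For instance, implode(explode(s)) gives back s for any string s, which is the round trip implied by the last two entries of the first slide.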
13More string operations
- Specialized string manipulation languages
(Snobol, Icon) include built-in pattern matching,
sometimes very complicated, with extremely
elaborate backtracking.
- Another language that works very well with
strings is, of course, Perl.
14Fixed- and variable-length strings
- The allowed length of strings is a design issue:
- fixed-length strings: Pascal, Ada, Fortran
- variable-length strings: C, Java, Perl.
- A character may be treated as a string of length
1, or as a separate data structure. Many
languages (Pascal, Ada, C, Prolog) treat strings
as special cases of arrays or lists.
- Operations on strings are then the same as on
arrays or lists of other types. For example, every
character in a character array is available at
once, whereas a list of characters must be
searched linearly.
15Enumerated types
- Also called user-defined ordinal types (read
Section 6.4).
- We can declare a list of symbolic constants that
are to be treated literally, just like in Prolog
or Scheme.
- We also specify the implicit ordering of those
newly introduced symbolic constants. In Pascal:
- type day = (mo, tu, we, th, fr, sa, su);
- Here, we have mo < tu < we < th < fr < sa < su.
16Operators for enumerated types
- Pascal supplies the programmer with three generic
operations for every new enumerated type T:
- succ: successor, for example, succ(tu) = we
- pred: predecessor, for example, pred(su) = sa
- (each is undefined at one end)
- ord: (ordinal) position in the type, starting at
0, so for example ord(th) = 3
- For characters, Pascal also has chr, producing
the character at a given position, so for example
chr(65) returns 'A'.
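As a rough illustration of succ, pred and ord, here is a minimal Python sketch built on the standard enum module. The class Day and the helpers succ, pred and ord_ are hypothetical names introduced for this example; Python does not supply them for enumerations. The explicit values 0..6 mirror the usual mapping of constants to small integers.

```python
from enum import IntEnum

class Day(IntEnum):
    # Constants are mapped to 0, ..., k-1 in declaration order.
    MO = 0
    TU = 1
    WE = 2
    TH = 3
    FR = 4
    SA = 5
    SU = 6

def succ(d):
    """Successor; undefined (ValueError) at the last constant."""
    return type(d)(d.value + 1)

def pred(d):
    """Predecessor; undefined (ValueError) at the first constant."""
    return type(d)(d.value - 1)

def ord_(d):
    """Position in the type, starting at 0."""
    return d.value
```

With these definitions succ(Day.TU) is Day.WE, pred(Day.SU) is Day.SA, ord_(Day.TH) is 3, and the implicit ordering Day.MO < Day.TU holds because IntEnum members compare as integers. Python's built-in chr(65) returns 'A', as in Pascal.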
17Enumerated types in Ada
- Ada makes these generic operations complete:
- succ: successor,
- pred: predecessor,
- pos: position,
- val: constant at position.
- Ada also supplies type attributes, among them
FIRST and LAST:
- day'FIRST = mo, day'LAST = su
18Reuse of symbolic constants
- A design issue: is a symbolic constant allowed
in more than one type? In Pascal, no. In Ada,
yes:
- type stoplight is (red, amber, green);
- type rainbow is (violet, indigo, blue, green,
yellow, orange, red);
- Qualified descriptions (similar to type
casts) prevent any confusion: we can write
stoplight'(red) or rainbow'(red).
19Implementation of enumerated types
- Map the constants c1, ..., ck into small integers
0, ..., k-1.
- Enumerated types help increase clarity and
readability of programs by separating concepts
from their numeric codes.
20Arrays
- An array represents a mapping:
- index_type → component_type
- The index type must be a discrete type (integer,
character, enumeration etc.). In some languages
this type is specified implicitly:
- an array of size N is indexed 0..N-1 in C / Java
/ Perl, but in Fortran it is 1..N. In Algol,
Pascal, Ada the lower and upper bound must
both be given.
- There are normally few restrictions on the
component type (in some languages we can even
have arrays of procedures or files).
21Multidimensional arrays
- Multidimensional arrays can be defined in two
ways (for simplicity, we show only dimension 2):
- index_type1 × index_type2 → component_type
- This corresponds to references such as A[I,J].
Algol, Pascal, Ada work like this.
- index_type1 → (index_type2 → component_type)
- This corresponds to references such as A[I][J].
Java works like this.
- Perl sticks to one dimension.
22Operations on arrays (1)
- select an element (get or change its value): A[J]
- select a slice of an array
- (read the textbook, Section 6.5.7)
- assign a complete array to a complete array:
- A := B
- There is an implicit loop here.
23Operations on arrays (2)
- Compute an expression with complete arrays (this
is possible in extensible or specialized
languages, for example in Ada):
- V := W + U
- If V, W, U are arrays, this may denote array
addition. All three arrays must be compatible
(the same index and component type), and addition
is probably carried out element by element.
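A minimal Python sketch of element-by-element array addition follows. The function name add_arrays is hypothetical, and compatibility is reduced here to "same length", which is only a stand-in for the same-index-type requirement stated above.

```python
def add_arrays(w, u):
    """Return a new array v with v[k] = w[k] + u[k] for every index k."""
    if len(w) != len(u):
        # Incompatible index ranges: the whole-array expression is rejected.
        raise ValueError("arrays must have the same index range")
    return [a + b for a, b in zip(w, u)]

v = add_arrays([1, 2, 3], [10, 20, 30])
```

The list comprehension is the implicit loop that a whole-array expression hides from the programmer.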
24Subscript binding
- static: fixed size, static allocation
- this is done in older Fortran.
- semistatic: fixed size, dynamic allocation
- Pascal.
- semidynamic: size determined at run
time, dynamic allocation
- Ada
- dynamic: size fluctuates during
execution, flexible allocation required
- Algol 68, APL (both little used...)
25Array-type constants and initialization
- Many languages allow initialization of arrays to
be specified together with declarations:
- C: int vector[] = {10, 20, 30};
- Ada: vector: array(0..2) of integer :=
(10, 20, 30);
- Array constants in Ada:
- type temp is array(mo..su) of -40..40;
- T: temp;
- T := (15, 12, 18, 22, 22, 30, 22);
- T := (mo => 15, we => 18, tu => 12,
sa => 30, others => 22);
- T := (15, 12, 18, sa => 30, others => 22);
26Implementing arrays (1)
- The only issue is how to store arrays and access
their elements; operations on the component type
decide how the elements are manipulated.
- An array is represented during execution by an
array descriptor. It tells us about:
- the index type,
- the component type,
- the address of the array, that is, the data.
27Implementing arrays (2)
- Specifically, we need:
- the lower and upper bound (for subscript
checking),
- the base address of the array,
- the size of an element.
- We also need the subscript: it gives us the offset
(from the base) in the memory area allocated to
the array.
- A multi-dimensional array will be represented by
a descriptor with more lower-upper bound pairs.
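The descriptor for a one-dimensional array can be sketched as a small Python class. The class name ArrayDescriptor and its field names are assumptions made for this illustration; a real run-time system would keep this information in its own internal layout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArrayDescriptor:
    low: int        # lower bound of the index range
    high: int       # upper bound of the index range
    base: int       # address of the element with subscript `low`
    elem_size: int  # size of one element, in bytes

    def address(self, subscript: int) -> int:
        """Check the subscript, then turn it into an address."""
        if not (self.low <= subscript <= self.high):
            raise IndexError(f"subscript {subscript} outside {self.low}..{self.high}")
        offset = (subscript - self.low) * self.elem_size
        return self.base + offset

d = ArrayDescriptor(low=1, high=10, base=1000, elem_size=4)
```

Here d.address(3) is 1008: two elements of 4 bytes lie between the base and the third element.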
28Implementing multidimensional arrays
- Row major order (second subscript increases
faster)
- Column major order (first subscript increases
faster)
29Implementing multidimensional arrays (2)
- Suppose that we have this array:
- A: array [LOW1..HIGH1,
LOW2..HIGH2] of ELT
- where the size of each entity of type ELT is
SIZE.
- This calculation is done for row-major order
(calculations for column-major are quite
similar). We need the base: for example, the
address LOC of A[LOW1, LOW2].
30Implementing multidimensional arrays (3)
- We can calculate the address of A[I,J] in the
row-major order, given the base.
- Let the length of each row in the array be
- ROWLENGTH = HIGH2 - LOW2 + 1
- The address of A[I,J] is
- (I - LOW1) * ROWLENGTH * SIZE + (J - LOW2) * SIZE
+ LOC
31Implementing multidimensional arrays (4)
- Here is an example.
- VEC: array [1..10, 5..24] of integer
- The length of each row in the array is
- ROWLENGTH = 24 - 5 + 1 = 20
- Let the base address be 1000, and let the size of
an integer be 4.
- The address of VEC[i,j] is
- (i - 1) * 20 * 4 + (j - 5) * 4 + 1000
- For example, VEC[7,16] is located in 4 bytes at
- 1524 = (7 - 1) * 20 * 4 + (16 - 5) * 4
+ 1000
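The row-major calculation can be checked with a short Python sketch. The function and parameter names are hypothetical and simply follow the slide's symbols in lower case. The column-major variant is not spelled out on the slides; it is included here as an assumption about what "quite similar" means, with the roles of the two subscripts swapped.

```python
def row_major_address(i, j, low1, low2, high2, size, loc):
    """Address of A[i, j] when rows are stored one after another."""
    row_length = high2 - low2 + 1
    return (i - low1) * row_length * size + (j - low2) * size + loc

def column_major_address(i, j, low1, high1, low2, size, loc):
    """Assumed column-major counterpart: columns are stored one after another."""
    column_length = high1 - low1 + 1
    return (j - low2) * column_length * size + (i - low1) * size + loc

# VEC: array [1..10, 5..24] of integer, base address 1000, 4-byte integers.
vec_7_16 = row_major_address(7, 16, low1=1, low2=5, high2=24, size=4, loc=1000)
print(vec_7_16)  # 1524
```

Both functions return the base address for the element at the lower bounds, since every offset term is then zero.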
32Languages without arrays
- A final word on arrays: they are not supported by
standard Prolog and pure Scheme. An array can be
simulated by a list, which is the basic data
structure in Scheme and a very important data
structure in Prolog.
- Assume that the index type is always 1..N.
- Treat a list of N elements
- [x1, x2, ..., xN] (Prolog)
- (x1 x2 ... xN) (Scheme)
- as the (structured) value of an array.
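To make the cost of this simulation visible, here is a Python sketch that models a Scheme- or Prolog-style list as nested (head, tail) pairs with None for the empty list. The helper names from_items, get and put are hypothetical. Index access walks the list from the front, and an update builds a new list instead of changing the old one, in the spirit of pure Scheme and Prolog.

```python
def from_items(items):
    """Build the list (x1 x2 ... xN) as nested (head, tail) pairs."""
    lst = None
    for x in reversed(list(items)):
        lst = (x, lst)
    return lst

def get(lst, i):
    """Element at index i in 1..N; takes i steps, unlike a true array."""
    if i < 1:
        raise IndexError("index below 1")
    while lst is not None:
        head, tail = lst
        if i == 1:
            return head
        lst, i = tail, i - 1
    raise IndexError("index above N")

def put(lst, i, value):
    """Return a new list equal to lst except that element i is value."""
    if lst is None or i < 1:
        raise IndexError("index outside 1..N")
    head, tail = lst
    if i == 1:
        return (value, tail)
    return (head, put(tail, i - 1, value))
```

For example, with a = from_items([10, 20, 30]), the call put(a, 2, 99) yields a list whose second element is 99 while a itself still holds 20 there.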
33Records
- A record is a heterogeneous collection of fields
(components); this differs from homogeneous arrays.
- Records are supported by a majority of important
languages, beginning with Cobol, through Pascal,
PL/I, Ada, C (where they are called structures),
Prolog (!) to C++.
- There are no records in Java, but classes replace
them. There are no records in Perl.
34Ada records: syntax
- type date is record
- day: 1..31;
- month: 1..12;
- year: 1000..9999;
- end record;
- type person is record
- name: record
- fname: string(1..20);
- lname: string(1..20);
- end record;
- born: date;
- gender: (F, M);
- end record;
35Fields
- A field is distinguished by a name rather than an
index. Iteration on elements of an array is
natural and very useful, but iteration on fields
of a record is not possible (why?).
- A field is indicated by a qualified name. In Ada:
- X, Y: person;
- X.born.day := 15;
- X.born.month := 11;
- X.born.year := 1964;
- Y.born := (23, 9, 1949);
- Y.name.fname(1..8) := "Smithson";
36Operations on records (1)
- Selection of a component is done by field name.
- Construction of a record from components: either
from separate fields, or as a complete record in
a structured constant.
- D := (month => 10, day => 15, year => 1994);
- D := (day => 15, month => 10, year => 1994);
- D := (15, 10, 1994);
- D := (15, 10, year => 1994);
- Note that an array can also be assigned such a
constant. Interpretation depends on context.
- A: array(1..3) of integer;
- A := (15, 10, 1994);
37Operations on records (2)
- Assignment of complete records is allowed in Ada,
and is done field by field.
- Records can be compared for equality or
inequality, regardless of their structure or type
of components. No generic standard ordering of
records exists, but specific ordering can be
defined by the programmer.
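The record operations above (selection by qualified name, construction by position or by field name, and comparison for equality) can be imitated with Python dataclasses. This is a sketch under assumptions: the class names Date, Name and Person mirror the Ada types from the earlier slides, gender is reduced to a plain string, and Python does not enforce the subranges such as 1..31.

```python
from dataclasses import dataclass

@dataclass
class Date:
    day: int
    month: int
    year: int

@dataclass
class Name:
    fname: str
    lname: str

@dataclass
class Person:
    name: Name
    born: Date
    gender: str

# Construction: by position, by field name, or mixed.
d1 = Date(15, 10, 1994)
d2 = Date(month=10, day=15, year=1994)
d3 = Date(15, 10, year=1994)

# Selection and update through qualified names.
x = Person(Name("", ""), Date(1, 1, 1000), "F")
x.born.day = 15
x.born.month = 11
x.born.year = 1964
x.name.fname = "Smithson"

# Equality is defined field by field; no ordering is generated by default.
same = (d1 == d2 == d3)
```

Asking for d1 < d2 raises a TypeError here, which matches the remark that no generic ordering of records exists unless the programmer defines one.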
38More on records
- Ada allows default values for fields:
- type date is record
- day: 1..31; month: 1..12;
- year: 1000..9999 := 2002;
- end record;
- D: date; -- D.year is now 2002
- There are almost no restrictions on field types.
Any combination of records and arrays (any depth)
is usually possible. A field could also be a file
or even a procedure!
39The Prolog equivalent of records
- Records, or rather terms, in Prolog can carry their
type and their components around:
- date(day(15), month(10), year(1994))
- person(
- name(fname("Jim"), lname("Berry")),
- born(date(day(15), month(10), year(1994))),
- gender(male)
- )
- If we can assure correct use, this can be
simplified by dropping one-argument "type"
functors:
- date(15, 10, 1994)
- person(name("Jim", "Berry"),
- born(date(15, 10, 1994)),
- male)
40Back to pointers
- Note: we're skipping 6.9.9.
- A pointer variable has addresses as values (and a
special address nil or null for "no value"). Pointers
are used primarily to build structures with
unpredictable shapes and sizes (lists, trees,
graphs) from small fragments allocated dynamically
at run time.
- A pointer to a procedure is possible, but
normally we have pointers to data (simple and
composite). An address, a value and usually a
type of a data item together make up a variable.
We call it an anonymous variable: no name is
bound to it. Its value is accessed by
dereferencing the pointer.
41Back to pointers (2)
- Pointers in Pascal are quite well designed.
- value(p) = an address
- value(p^) = 17
- Note that, as with normal named variables, in
this
- p^ := 23
- we mean the address of p^ (the value of p).
- In this
- m := p^
- we mean the value of p^.
42Pointer variable creation
- A pointer variable is declared explicitly and has
the usual scope and lifetime.
- An anonymous variable has no scope (because it
has no name) and its lifetime is determined by
the programmer. It is created (in a special
memory area called the heap) by the programmer, for
example:
- new(p) in Pascal
- p = malloc(4) in C
- and destroyed by the programmer:
- dispose(p) in Pascal
- free(p) in C
43Pointer variable creation (2)
- If an anonymous variable exists outside the scope
of the explicit pointer variable, we have
"garbage" (a lost object). If an anonymous
variable has been destroyed inside the scope of
the explicit pointer variable, we have a dangling
reference.
- new(p);
- p^ := 23;
- dispose(p);
- ......
- if p^ > 0 ???
44Pointer variable creation (3)
- Producing garbage, an example in Pascal:
- new(p); p^ := 23; new(p);
- the anonymous variable with the value 23 becomes
inaccessible.
- Garbage collection is the process of reclaiming
inaccessible storage. It is usually complex and
costly. It is essential in languages whose
implementation relies on pointers: Lisp, Prolog.
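Garbage and dangling references can be made concrete with a toy model of the heap in Python. Everything here is an assumption made for illustration: the class ToyHeap, its method names, the starting address 1000, and above all the garbage test, which only looks at the pointer variables handed to it and does not follow pointers stored inside heap cells as a real collector must.

```python
class ToyHeap:
    def __init__(self):
        self.cells = {}            # address -> value of an anonymous variable
        self.next_address = 1000   # arbitrary starting address

    def new(self):
        """Like new(p): create an anonymous variable, return its address."""
        address = self.next_address
        self.next_address += 1
        self.cells[address] = None
        return address

    def dispose(self, address):
        """Like dispose(p): destroy the anonymous variable."""
        del self.cells[address]

    def store(self, address, value):
        """Like p^ := value."""
        if address not in self.cells:
            raise RuntimeError("dangling reference")
        self.cells[address] = value

    def load(self, address):
        """Like reading p^."""
        if address not in self.cells:
            raise RuntimeError("dangling reference")
        return self.cells[address]

    def garbage(self, pointer_values):
        """Addresses still allocated but held by none of the given pointers."""
        return set(self.cells) - set(pointer_values)

heap = ToyHeap()
p = heap.new()
heap.store(p, 23)
lost = p
p = heap.new()                  # the cell holding 23 is now unreachable through p
unreachable = heap.garbage([p])
```

After the second new, unreachable contains exactly the address of the cell holding 23. Calling heap.dispose(p) and then heap.load(p) raises the error that stands in for a dangling reference; in Pascal or C the same access would go undetected.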
45Pointer types and operators
- Pointers in PL/I are typeless. In Pascal, Ada, C
they are declared as pointers to types, so that a
dereferenced pointer (p^, *p) has a fixed type.
- Operations on pointers in C are quite rich:
- char b, c;
- c = '\007';
- b = *((&c - 1) + 1);
- putchar(b);