CSI 3125, Data Types

Provided by: alanwi8

1
Data types
  • Outline
  • Primitive data types
  • Structured data types
  • Strings
  • Enumerated types
  • Arrays
  • Records
  • Pointers

2
Primitive data types
  • Points
  • Numeric types
  • Booleans
  • Characters

3
Data types introduction
  • A data type is not just a set of objects. We must
    consider all operations on these objects. A
    complete definition of a type must include a list
    of operations, and their definitions. 
  • Primitive data objects are close to hardware, and
    are represented directly (or almost directly) at
    the machine level: usually as a word, byte, or bit.

4
Integer types
  • An integer type is a finite approximation of the
    infinite set of integer numbers {0, 1, -1, 2,
    -2, ...}.
  • Various kinds of integers: signed/unsigned,
    long/short/small.
  • Hardware implementations of integers: one's
    complement, two's complement, ...
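A minimal sketch of the two's complement encoding named above, assuming an 8-bit word by default. The helper names `to_twos_complement` and `from_twos_complement` are illustrative and do not come from the slides. The range check shows why an integer type is only a finite approximation.

```python
def to_twos_complement(value: int, bits: int = 8) -> int:
    """Return the unsigned bit pattern that encodes `value` in `bits` bits."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)   # masking maps negatives to 2**bits + value


def from_twos_complement(pattern: int, bits: int = 8) -> int:
    """Decode a `bits`-wide bit pattern back into a signed integer."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


print(format(to_twos_complement(-2), "08b"))   # 11111110
print(from_twos_complement(0b11111110))        # -2
```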

5
Floating-point types
  • All "real" numbers in computers are finite
    approximations of the non-denumerable set of real
    numbers.
  • Precision and range of values are defined by the
    language or by the programmer.
  • Hardware implementations (used by floating-point
    processors): exponent and mantissa.
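The exponent and mantissa split can be observed from software. The sketch below uses Python's standard `math.frexp`, which normalizes the mantissa into [0.5, 1); this is a convenient view of the idea and not the exact bit layout of any particular floating-point processor.

```python
import math

x = 6.5
mantissa, exponent = math.frexp(x)      # x == mantissa * 2**exponent
print(mantissa, exponent)               # 0.8125 3
print(math.ldexp(mantissa, exponent))   # 6.5

# A finite approximation: 0.1, 0.2 and 0.3 have no exact binary representation.
print(0.1 + 0.2 == 0.3)                 # False
```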
6
Boolean type
  • This type is not supported by all languages (for
    example, it is not available in PL/I, C, Perl).
  • The values are true and false. Operations are as
    in classical two-valued propositional logic.
  • Hardware implementation: a single bit, or a byte
    (a byte allows more efficient operations).

7
Character types
  • This is usually ASCII, but extended character
    sets (Unicode, ISO) are often used.
  • Accented characters (é, à, ü, etc.) do not fit in
    7-bit ASCII; they fit in 8-bit extensions of it,
    though there is no single standard.
  • Chinese or Japanese are examples of writing
    systems that require character sets of many more
    than 256 elements.
  • Hardware implementation: a byte (ASCII, EBCDIC),
    two bytes (Unicode) or several bytes.

8
Other primitive types
  • word (for example, in Modula-2)
  • byte, bit (for example, in PL/I)
  • pointer (for example, in C)

9
Structured data types
  • Points
  • Strings
  • Enumerated types
  • Arrays
  • Records

10
Strings
  • A string is a sequence of characters. It may be:
      • a special data type (its objects can be
        decomposed into characters): Fortran, Basic
      • an array of characters: Pascal, Ada
      • a list of characters: Prolog
      • consecutively stored characters: C.
  • The syntax is the same: characters in quotes.
  • Pascal has one kind of quotes, Ada has two:
    'A' is a character, "A" is a string.

11
String operations
  • Typical operations on strings:
      • string × string → string: concatenation
      • string × int × int → string: substring
      • string → characters: decompose into an array
        or list
      • characters → string: convert an array or list
        into a string

12
String operations (2)
  • string → integer: length
  • string → boolean: is it empty?
  • string × string → boolean: equality, ordering
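The operation signatures on the two string-operation slides can be sketched as small Python functions. The function names are illustrative, and reading the two integers of `substring` as (start, length) is an assumption; the slides do not say which convention is meant.

```python
def concat(a: str, b: str) -> str:                       # string × string → string
    return a + b


def substring(s: str, start: int, length: int) -> str:   # string × int × int → string
    return s[start:start + length]


def decompose(s: str) -> list:                           # string → characters
    return list(s)


def compose(chars: list) -> str:                         # characters → string
    return "".join(chars)


def is_empty(s: str) -> bool:                            # string → boolean
    return len(s) == 0


print(concat("data", "type"))        # datatype
print(substring("pointer", 2, 3))    # int
print(decompose("abc"))              # ['a', 'b', 'c']
print(compose(["a", "b", "c"]))      # abc
print(len("abc"), is_empty(""))      # 3 True
print("abc" == "abc", "abc" < "abd") # True True
```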

13
More string operations
  • Specialized string manipulation languages
    (Snobol, Icon) include built-in pattern matching,
    sometimes very complicated, with extremely
    elaborate backtracking.
  • Another language that works very well with
    strings is, of course, Perl.

14
Fixed- and variable-length strings
  • The allowed length of strings is a design issue:
      • fixed-length strings: Pascal, Ada, Fortran
      • variable-length strings: C, Java, Perl.
  • A character may be treated as a string of length
    1, or as a separate data structure. Many
    languages (Pascal, Ada, C, Prolog) treat strings
    as special cases of arrays or lists.
  • Operations on strings are the same as on arrays
    or lists of other types. For example, every
    character in a character array is available at
    once, whereas a list of characters must be
    searched linearly.

15
Enumerated types
  • Also called user-defined ordinal types (read
    Section 6.4).
  • We can declare a list of symbolic constants that
    are to be treated literally, just like in Prolog
    or Scheme.
  • We also specify the implicit ordering of those
    newly introduced symbolic constants. In Pascal:
      • type day = (mo, tu, we, th, fr, sa, su);
  • Here, we have mo < tu < we < th < fr < sa < su.

16
Operators for enumerated types
  • Pascal supplies the programmer with three generic
    operations for every new enumerated type T:
      • succ: successor, for example, succ(tu) = we
      • pred: predecessor, for example, pred(su) = sa
        (each is undefined at one end)
      • ord: (ordinal) position in the type, starting
        at 0, so for example ord(th) = 3
  • For characters, Pascal also has chr, producing
    the character at a given position, so for example
    chr(65) returns 'A'.
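As a rough analogue of the Pascal operations, the sketch below models the `day` type with Python's standard `IntEnum`. The helpers `succ` and `pred` are hypothetical names written for this illustration; they raise `ValueError` at the ends where the Pascal operations are undefined.

```python
from enum import IntEnum

# Hypothetical counterpart of: type day = (mo, tu, we, th, fr, sa, su)
Day = IntEnum("Day", "mo tu we th fr sa su", start=0)


def succ(constant):
    """Successor; raises ValueError at the last constant."""
    return type(constant)(constant.value + 1)


def pred(constant):
    """Predecessor; raises ValueError at the first constant."""
    return type(constant)(constant.value - 1)


print(succ(Day.tu).name)    # we
print(pred(Day.su).name)    # sa
print(Day.th.value)         # 3, the counterpart of ord(th)
print(Day.mo < Day.tu)      # True
print(chr(65))              # A
```

Because the constants are mapped to the small integers 0..6, comparison and position come for free, which mirrors the usual implementation of enumerated types.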

17
Enumerated types in Ada
  • Ada makes these generic operations complete:
      • succ: successor,
      • pred: predecessor,
      • pos: position,
      • val: constant at position.
  • Ada also supplies type attributes, among them
    FIRST and LAST:
      • day'FIRST = mo, day'LAST = su

18
Reuse of symbolic constants
  • A design issue: is a symbolic constant allowed
    in more than one type? In Pascal, no. In Ada,
    yes:
      • type stoplight is (red, amber, green);
      • type rainbow is (violet, indigo, blue, green,
        yellow, orange, red);
  • Qualified expressions (similar to type casts)
    prevent any confusion: we can write
    stoplight'(red) or rainbow'(red).

19
Implementation of enumerated types
  • Map the constants c1, ..., ck into small integers
    0, ..., k-1.
  • Enumerated types help increase clarity and
    readability of programs by separating concepts
    from their numeric codes.

20
Arrays
  • An array represents a mapping:
      • index_type → component_type
  • The index type must be a discrete type (integer,
    character, enumeration, etc.). In some languages
    this type is specified implicitly:
      • an array of size N is indexed 0..N-1 in C /
        Java / Perl, but in Fortran it is 1..N. In
        Algol, Pascal, Ada both the lower and the
        upper bound must be given.
  • There are normally few restrictions on the
    component type (in some languages we can even
    have arrays of procedures or files).

21
Multidimensional arrays
  • Multidimensional arrays can be defined in two
    ways (for simplicity, we show only dimension 2):
      • index_type1 × index_type2 → component_type
        This corresponds to references such as A[I,J].
        Algol, Pascal, Ada work like this.
      • index_type1 → (index_type2 → component_type)
        This corresponds to references such as
        A[I][J]. Java works like this.
  • Perl sticks to one dimension.
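The two views of a two-dimensional array can be contrasted in a short sketch. Here a dictionary keyed by index pairs stands in for the first mapping and a list of lists for the second; both containers are illustrative choices, not how any of the languages named above store arrays.

```python
rows, cols = 2, 3

# index_type1 × index_type2 → component_type: one mapping keyed by a pair
a_pair = {(i, j): 10 * i + j for i in range(rows) for j in range(cols)}

# index_type1 → (index_type2 → component_type): an array of arrays
a_nested = [[10 * i + j for j in range(cols)] for i in range(rows)]

print(a_pair[1, 2])     # 12
print(a_nested[1][2])   # 12
print(a_nested[1])      # [10, 11, 12]
```

Only the second form lets a whole row, such as `a_nested[1]`, be used as a value in its own right.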

22
Operations on arrays (1)
  • select an element (get or change its value): A[J]
  • select a slice of an array
    (read the textbook, Section 6.5.7)
  • assign a complete array to a complete array:
      • A := B
  • There is an implicit loop here.

23
Operations on arrays (2)
  • Compute an expression with complete arrays (this
    is possible in extensible or specialized
    languages, for example in Ada):
      • V := W + U
  • If V, W, U are arrays, this may denote array
    addition. All three arrays must be compatible
    (the same index and component type), and addition
    is probably carried out element by element.
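A minimal sketch of element-by-element array addition, with the implicit loop made explicit. The function name `add_arrays` is illustrative, and the compatibility check is reduced to comparing lengths.

```python
def add_arrays(w, u):
    """Return the element-by-element sum of two compatible arrays."""
    if len(w) != len(u):
        raise ValueError("arrays must have the same index range")
    return [x + y for x, y in zip(w, u)]   # the loop hidden behind V := W + U


w = [1, 2, 3]
u = [10, 20, 30]
v = add_arrays(w, u)
print(v)   # [11, 22, 33]
```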

24
Subscript binding
  • static: fixed size, static allocation
      • this is done in older Fortran.
  • semistatic: fixed size, dynamic allocation
      • Pascal.
  • semidynamic: size determined at run time,
    dynamic allocation
      • Ada.
  • dynamic: size fluctuates during execution,
    flexible allocation required
      • Algol 68, APL (both little used...).

25
Array-type constants and initialization
  • Many languages allow initialization of arrays to
    be specified together with declarations:
      • C: int vector[] = {10, 20, 30};
      • Ada: vector: array(0..2) of integer :=
        (10, 20, 30);
  • Array constants in Ada:
      • type temp is array(mo..su) of -40..40;
      • T: temp;
      • T := (15, 12, 18, 22, 22, 30, 22);
      • T := (mo => 15, we => 18, tu => 12,
        sa => 30, others => 22);
      • T := (15, 12, 18, sa => 30, others => 22);

26
Implementing arrays (1)
  • The only issue is how to store arrays and access
    their elements: operations on the component type
    decide how the elements are manipulated.
  • An array is represented during execution by an
    array descriptor. It tells us about:
      • the index type,
      • the component type,
      • the address of the array, that is, the data.

27
Implementing arrays (2)
  • Specifically, we need:
      • the lower and upper bound (for subscript
        checking),
      • the base address of the array,
      • the size of an element.
  • We also need the subscript: it gives us the offset
    (from the base) in the memory area allocated to
    the array.
  • A multi-dimensional array will be represented by
    a descriptor with more lower-upper bound pairs.

28
Implementing multidimensional arrays
  • Row major order (second subscript increases
    faster)
  • Column major order (first subscript increases
    faster)
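The two storage orders can be seen by flattening a small 2 x 3 matrix both ways. The matrix values are made up for illustration; each entry encodes its own row and column.

```python
matrix = [[11, 12, 13],
          [21, 22, 23]]

# Row major: the second subscript (column) varies fastest.
row_major = [matrix[i][j] for i in range(2) for j in range(3)]

# Column major: the first subscript (row) varies fastest.
column_major = [matrix[i][j] for j in range(3) for i in range(2)]

print(row_major)      # [11, 12, 13, 21, 22, 23]
print(column_major)   # [11, 21, 12, 22, 13, 23]
```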
29
Implementing multidimensional arrays (2)
  • Suppose that we have this array:
      • A: array [LOW1..HIGH1,
        LOW2..HIGH2] of ELT
    where the size of each entity of type ELT is
    SIZE.
  • This calculation is done for row-major
    (calculations for column-major are quite
    similar). We need the base: for example, the
    address LOC of A[LOW1, LOW2].

30
Implementing multidimensional arrays (3)
  • We can calculate the address of A[I,J] in the
    row-major order, given the base.
  • Let the length of each row in the array be:
      • ROWLENGTH = HIGH2 - LOW2 + 1
  • The address of A[I,J] is:
      • (I - LOW1) * ROWLENGTH * SIZE +
        (J - LOW2) * SIZE + LOC

31
Implementing multidimensional arrays (4)
  • Here is an example.
      • VEC: array [1..10, 5..24] of integer
  • The length of each row in the array is:
      • ROWLENGTH = 24 - 5 + 1 = 20
  • Let the base address be 1000, and let the size of
    an integer be 4.
  • The address of VEC[i,j] is:
      • (i - 1) * 20 * 4 + (j - 5) * 4 + 1000
  • For example, VEC[7,16] is located in 4 bytes at:
      • 1524 = (7 - 1) * 20 * 4 + (16 - 5) * 4 + 1000
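The row-major address calculation can be written as a small function and checked against the VEC example. The function name `element_address` and its keyword parameters are illustrative; they mirror the descriptor fields LOW1, HIGH1, LOW2, HIGH2, SIZE and LOC used on the slides.

```python
def element_address(i, j, *, low1, high1, low2, high2, size, loc):
    """Row-major address of A[i, j], where loc is the address of A[low1, low2]."""
    if not (low1 <= i <= high1 and low2 <= j <= high2):
        raise IndexError("subscript out of bounds")   # subscript checking
    row_length = high2 - low2 + 1
    return (i - low1) * row_length * size + (j - low2) * size + loc


# VEC: array [1..10, 5..24] of integer, base address 1000, 4-byte integers
vec = dict(low1=1, high1=10, low2=5, high2=24, size=4, loc=1000)
print(element_address(7, 16, **vec))   # 1524
print(element_address(1, 5, **vec))    # 1000
```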

32
Languages without arrays
  • A final word on arrays: they are not supported by
    standard Prolog and pure Scheme. An array can be
    simulated by a list, which is the basic data
    structure in Scheme and a very important data
    structure in Prolog.
  • Assume that the index type is always 1..N.
  • Treat a list of N elements
      • [x1, x2, ..., xN] (Prolog)
      • (x1 x2 ... xN) (Scheme)
    as the (structured) value of an array.
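A sketch of simulating an array indexed 1..N with a linked list. Python's built-in lists are already arrays, so the example builds Prolog/Scheme-style cells as `(head, tail)` pairs; the names `from_items` and `array_ref` are illustrative.

```python
def from_items(items):
    """Build a linked list of (head, tail) pairs; None is the empty list."""
    lst = None
    for item in reversed(items):
        lst = (item, lst)
    return lst


def array_ref(lst, i):
    """Return element i (1..N) by walking i - 1 links, so access is linear."""
    if i < 1:
        raise IndexError("index below 1")
    while i > 1:
        if lst is None:
            raise IndexError("index above N")
        lst = lst[1]
        i -= 1
    if lst is None:
        raise IndexError("index above N")
    return lst[0]


xs = from_items(["x1", "x2", "x3"])
print(array_ref(xs, 1))   # x1
print(array_ref(xs, 3))   # x3
```

Unlike a true array, reaching element i costs i - 1 steps.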

33
Records
  • A record is a heterogeneous collection of fields
    (components): this differs from homogeneous arrays.
  • Records are supported by a majority of important
    languages, beginning with Cobol, through Pascal,
    PL/I, Ada, C (where they are called structures),
    Prolog (!) to C++.
  • There are no records in Java, but classes replace
    them. There are no records in Perl.

34
Ada records: syntax
  • type date is record
        day: 1..31;
        month: 1..12;
        year: 1000..9999;
    end record;
  • type person is record
        name: record
            fname: string(1..20);
            lname: string(1..20);
        end record;
        born: date;
        gender: (F, M);
    end record;

35
Fields
  • A field is distinguished by a name rather than an
    index. Iteration on elements of an array is
    natural and very useful, but iteration on fields
    of a record is not possible (why?).
  • A field is indicated by a qualified name. In Ada:
      • X, Y: person;
      • X.born.day := 15;
      • X.born.month := 11;
      • X.born.year := 1964;
      • Y.born := (23, 9, 1949);
      • Y.name.fname(1..8) := "Smithson";

36
Operations on records (1)
  • Selection of a component is done by field name.
  • Construction of a record from components: either
    from separate fields, or as a complete record in
    a structured constant.
      • D := (month => 10, day => 15, year => 1994);
      • D := (day => 15, month => 10, year => 1994);
      • D := (15, 10, 1994);
      • D := (15, 10, year => 1994);
  • Note that an array can also be assigned such a
    constant. Interpretation depends on context.
      • A: array(1..3) of integer;
      • A := (15, 10, 1994);

37
Operations on records (2)
  • Assignment of complete records is allowed in Ada,
    and is done field by field.
  • Records can be compared for equality or
    inequality, regardless of their structure or type
    of components. No generic standard ordering of
    records exists, but specific ordering can be
    defined by the programmer.
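The record operations above (selection by field name, construction, whole-record assignment, equality, programmer-defined ordering) can be sketched with a Python `dataclass`. This is an analogue for illustration, not Ada semantics; the `earlier` function is a hypothetical example of an ordering chosen by the programmer.

```python
from dataclasses import dataclass, replace


@dataclass
class Date:
    day: int
    month: int
    year: int


d = Date(day=15, month=10, year=1994)   # construction by field name
e = replace(d)                          # a new record, copied field by field
print(e == d, e is d)                   # True False

e.year = 1995                           # selection of a component by name
print(e == d)                           # False


def earlier(a: Date, b: Date) -> bool:
    """One possible ordering: compare year, then month, then day."""
    return (a.year, a.month, a.day) < (b.year, b.month, b.day)


print(earlier(d, e))                    # True
```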

38
More on records
  • Ada allows default values for fields:
      • type date is record
            day: 1..31; month: 1..12;
            year: 1000..9999 := 2002;
        end record;
      • D: date; -- D.year is now 2002
  • There are almost no restrictions on field types.
    Any combination of records and arrays (any depth)
    is usually possible. A field could also be a file
    or even a procedure!

39
The Prolog equivalent of records
  • Records (or rather terms) in Prolog can carry their
    type and their components around:
      • date(day(15), month(10), year(1994))
      • person(
            name(fname("Jim"), lname("Berry")),
            born(date(day(15), month(10), year(1994))),
            gender(male)
        )
  • If we can assure correct use, this can be
    simplified by dropping one-argument "type"
    functors:
      • date(15, 10, 1994)
      • person(name("Jim", "Berry"),
               born(date(15, 10, 1994)),
               male)

40
Back to pointers
  • Note: we're skipping 6.9.9.
  • A pointer variable has addresses as values (and a
    special address nil or null for "no value").
    Pointers are used primarily to build structures
    with unpredictable shapes and sizes (lists, trees,
    graphs) from small fragments allocated dynamically
    at run time.
  • A pointer to a procedure is possible, but
    normally we have pointers to data (simple and
    composite). An address, a value and usually a
    type of a data item together make up a variable.
    We call it an anonymous variable: no name is
    bound to it. Its value is accessed by
    dereferencing the pointer.

41
Back to pointers (2)
  • Pointers in Pascal are quite well designed. If p
    points to an anonymous variable that holds 17:
      • value(p) = the address of that variable
      • value(p^) = 17
  • Note that, as with normal named variables, in
    this
      • p^ := 23
    we mean the address of p^ (the value of p).
  • In this
      • m := p^
    we mean the value of p^.

42
Pointer variable creation
  • A pointer variable is declared explicitly and has
    scope and lifetime as usual.
  • An anonymous variable has no scope (because it
    has no name) and its lifetime is determined by
    the programmer. It is created (in a special
    memory area called the heap) by the programmer,
    for example:
      • new(p) in Pascal
      • p = malloc(4) in C
    and destroyed by the programmer:
      • dispose(p) in Pascal
      • free(p) in C

43
Pointer variable creation (2)
  • If an anonymous variable exists outside the scope
    of the explicit pointer variable, we have
    "garbage" (a lost object). If an anonymous
    variable has been destroyed inside the scope of
    the explicit pointer variable, we have a dangling
    reference.
      • new(p);
      • p^ := 23;
      • dispose(p);
      • ......
      • if p^ > 0 ???

44
Pointer variable creation (3)
  • Producing garbage, an example in Pascal:
      • new(p); p^ := 23; new(p);
  • The anonymous variable with the value 23 becomes
    inaccessible.
  • Garbage collection is the process of reclaiming
    inaccessible storage. It is usually complex and
    costly. It is essential in languages whose
    implementation relies on pointers: Lisp, Prolog.
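The garbage scenario can be imitated in a language with automatic reclamation. In this sketch the `Cell` class is a hypothetical stand-in for the anonymous variable, and a weak reference from the standard `weakref` module observes the first cell without keeping it alive. Unlike Pascal, where the lost cell simply stays allocated, the runtime here reclaims it; the exact moment of reclamation is implementation dependent, which is why `gc.collect()` is called explicitly.

```python
import gc
import weakref


class Cell:
    """Stand-in for an anonymous variable allocated on the heap."""

    def __init__(self, value):
        self.value = value


p = Cell(23)                 # like new(p); p^ := 23
watcher = weakref.ref(p)     # observes the cell without keeping it reachable
p = Cell(0)                  # like a second new(p): the first cell is now lost
gc.collect()                 # ask the collector to reclaim inaccessible storage

print(watcher() is None)     # True on CPython: the first cell has been reclaimed
```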

45
Pointers: types and operators
  • Pointers in PL/I are typeless. In Pascal, Ada, C
    they are declared as pointers to types, so that a
    dereferenced pointer (p^, *p) has a fixed type.
  • Operations on pointers in C are quite rich:
      • char b, c;
      • c = '\007';
      • b = *((&c - 1) + 1);
      • putchar(b);