Title: Comp201 Computer Systems Data Representation
1. Comp201 Computer Systems: Data Representation
2. Data Representation
- Describes the methods by which data can be represented and transmitted in a computer.
- Reading: Englander, Chapter Three
3. Data Representation
- Alphanumeric data
- Big Endian vs Little Endian
- Images
  - Bit map
  - Vector
- Audio
4. Example Data Representations
5. Alphanumeric Data
- Many applications process text (e.g. compilers and word processors)
- Coding schemes include ASCII, EBCDIC and Unicode
- ASCII table (in hex)
  Row = first hex digit, column = second hex digit (e.g. 41 = A)

       0    1    2    3    4    5    6    7    8    9    a    b    c    d    e    f
  0_   nul  soh  stx  etx  eot  enq  ack  bel  bs   ht   nl   vt   np   cr   so   si
  1_   dle  dc1  dc2  dc3  dc4  nak  syn  etb  can  em   sub  esc  fs   gs   rs   us
  2_   sp   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
  3_   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
  4_   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
  5_   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
  6_   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
  7_   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    del
6. Characters
- Example: what does the following ASCII string represent?
  48 65 6C 6C 6F 20 32 30 31
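One way to check an answer to this exercise is to let a short script do the table lookup. The sketch below is only an illustration in Python (not part of the course material); the variable names are made up for the example.

```python
# Hex codes copied from the slide.
hex_codes = "48 65 6C 6C 6F 20 32 30 31"

# bytes.fromhex skips the spaces; each byte is then read as one ASCII character.
decoded = bytes.fromhex(hex_codes).decode("ascii")
print(decoded)

# Going the other way: each character back to its two-digit hex code.
encoded = " ".join(f"{ord(ch):02X}" for ch in decoded)
print(encoded)
```

Running it prints the decoded text, followed by the original hex codes recovered from that text.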
7. Sorting ASCII Characters
- ASCII and EBCDIC codes are designed so that the computer can do alphabetic comparisons.
- In Windows, comparisons are case insensitive (in most instances)
- In Unix, comparisons are generally case sensitive
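The difference between the two kinds of comparison can be seen by sorting a few words. This is a minimal Python sketch with a made-up word list; it shows the general idea, not how Windows or Unix implement their comparisons.

```python
words = ["apple", "Banana", "cherry", "Apple"]

# Case-sensitive: strings are compared by character code, so the capitals
# 'A' (0x41) and 'B' (0x42) come before every lowercase letter (0x61 and up).
case_sensitive = sorted(words)

# Case-insensitive: compare lowercased copies of the words instead.
case_insensitive = sorted(words, key=str.lower)

print(case_sensitive)
print(case_insensitive)
```

In the case-sensitive result "Banana" sorts ahead of "apple", which is the behaviour the ASCII table predicts.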
8. EBCDIC codes
Reference: Englander, chapter 3
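For a quick feel for how EBCDIC differs from ASCII, Python's built-in "cp037" codec can be used. Note the assumption: cp037 is just one EBCDIC code page, and it may not be the exact variant tabulated in Englander.

```python
text = "Aa0"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")   # assumption: cp037 stands in for "EBCDIC"

for ch, a, e in zip(text, ascii_bytes, ebcdic_bytes):
    print(f"{ch!r}: ASCII {a:02X}  EBCDIC {e:02X}")

# The collating order differs between the two codes.
ascii_order = sorted(text)
ebcdic_order = sorted(text, key=lambda ch: ch.encode("cp037"))
print(ascii_order, ebcdic_order)
```

In ASCII the digits sort first and lowercase letters last; in this EBCDIC code page lowercase letters sort first and digits last.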
9. UNICODE
- Originally a 16 bit code (able to encode 65,536 characters); later versions of the standard extend beyond 16 bits
- Modelled on the ASCII character set
- Encodes most characters currently in use
- Uses scripts to define the characters of a particular language
- Example scripts: Greek, Tibetan, Dingbats, Katakana
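The sketch below looks up one character from each of the four example scripts. The particular characters are arbitrary picks for illustration (they are not taken from the slide), and each happens to fit in a single 16 bit unit.

```python
import unicodedata

# One hypothetical sample character per script named on the slide.
samples = {
    "Greek": "\u03b1",
    "Tibetan": "\u0f40",
    "Dingbats": "\u2702",
    "Katakana": "\u30a2",
}

for script, ch in samples.items():
    utf16 = ch.encode("utf-16-be")          # 16 bit code, high byte first
    print(f"{script:9} U+{ord(ch):04X}  {utf16.hex()}  {unicodedata.name(ch)}")

# The first 128 code points coincide with ASCII: 'A' is still 0x41.
ascii_in_utf16 = "A".encode("utf-16-be")
print(ascii_in_utf16.hex())
```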
10. Big Endian vs Little Endian
- On most computers the storage unit is a byte
- Multiple bytes are required to store most data types (e.g. an integer typically takes 4 bytes)
[Figure: four consecutive bytes laid out by address]
11. Big Endian vs Little Endian
- How do we pack words into a byte-addressable memory?
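The two answers to this question can be made concrete by packing the same 32 bit word both ways. This is a small Python sketch using the standard struct module; the value 0x12345678 is an arbitrary example.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)      # most significant byte at the lowest address
little = struct.pack("<I", value)   # least significant byte at the lowest address

for name, data in (("big endian", big), ("little endian", little)):
    layout = "  ".join(f"addr+{i}: {b:02X}" for i, b in enumerate(data))
    print(f"{name:13} {layout}")
```

Both layouts hold the same four bytes; only the order in which they appear at increasing addresses changes.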
12. Question: Does it matter?
- Answer: yes, of course, but not much! The differences in performance are minor, but (of course) you must choose one. Yes, it is possible to swap between them in software.
- Intel processors use Little Endian
- Motorola processors use Big Endian
- Some programs (e.g. Windows) likewise insist on a particular format: the Windows .bmp format is Little Endian, for instance
- Internet protocols are Big Endian. Conversion is required on Little Endian processors.
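As a rough illustration of "swapping in software", the sketch below reverses the bytes of a 32 bit value and packs a 16 bit number in network (Big Endian) order. The helper name swap32 and the port number are hypothetical, chosen only for the example.

```python
import struct
import sys

def swap32(value):
    """Reverse the byte order of a 32 bit unsigned integer."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")

print("this machine is", sys.byteorder, "endian")
print(f"{swap32(0x12345678):08X}")

# "!" in a struct format means network byte order, which is Big Endian,
# so the result is the same on every processor.
port = 8080                                  # hypothetical port number
wire = struct.pack("!H", port)
(received,) = struct.unpack("!H", wire)
print(wire.hex(), received)
```

On a Little Endian machine the "!" format performs the conversion the slide mentions; on a Big Endian machine it is a no-op.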
13. Some common file formats (reference: www.cs.umass.edu, Dr. William Verts)
- Big Endian
  - Adobe Photoshop
  - JPEG
  - MacPaint
  - SGI (Silicon Graphics)
  - Sun Raster
  - WPG (WordPerfect graphics metafile)
- Little Endian
  - BMP
  - GIF
  - PCX (Paintbrush)
  - QTM (QuickTime)
  - Microsoft RTF
- Some can be either, selected by codes in the file
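A format that "can be either" has to announce its byte order before any multi-byte value is read. The sketch below uses the TIFF header convention as an example (the slide does not name a format here, so this choice is an assumption): the first two bytes are "II" for Little Endian or "MM" for Big Endian, followed by the number 42 stored in that order. The function name read_magic is made up.

```python
import struct

def read_magic(header):
    """Return (byte order, magic number) from a TIFF-style 4-byte header."""
    marker = header[:2]
    if marker == b"II":
        fmt, order = "<H", "little"
    elif marker == b"MM":
        fmt, order = ">H", "big"
    else:
        raise ValueError("unknown byte-order marker")
    (magic,) = struct.unpack(fmt, header[2:4])
    return order, magic

little_header = b"II\x2a\x00"
big_header = b"MM\x00\x2a"
print(read_magic(little_header), read_magic(big_header))
```

The same number is recovered from both headers because the reader picks its byte order from the marker.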
14. Pictures
- Many different formats are used to store images in a computer
- Two main categories
  - Bit map images
    - E.g. photographs, paintings
    - Characterised by continuous variations in shading, colour, shape and texture
    - Necessary to store information about each point
  - Vector images
    - Made up of geometrical shapes (e.g. lines, circles etc.)
    - Sufficient to store the geometrical detail plus its position
15. Bit map Images
- Many different formats
- Bit map, e.g. GIF, TIFF, ...
[Figure: an image and its binary representation]
16. Bit map storage
- Consider an image with 600 rows of 800 pixels, with one byte used to store each of the three colours of each pixel
- Total memory: 600 × 800 × 3 = 1,440,000 bytes, roughly 1.5 MB
- An alternative representation is to use a palette: a lookup table which defines the colours in the image
- An index into this table is then stored for each pixel
- Can also reduce the size by reducing the resolution (i.e. increasing the size of each pixel) or by employing various compression algorithms (pp. 78, 79) to lower storage requirements.
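The storage figures on this slide are easy to reproduce. The sketch below works through the arithmetic in Python; the 256-entry palette and the halved resolution are assumptions chosen for illustration.

```python
rows, cols = 600, 800
bytes_per_pixel = 3                      # one byte each for the three colours

true_colour = rows * cols * bytes_per_pixel

# Palette version: one index byte per pixel plus the lookup table itself.
palette_entries = 256                    # assumed palette size
palette_table = palette_entries * 3      # each entry holds three colour bytes
indexed = rows * cols * 1 + palette_table

# Halving the resolution in both directions quarters the pixel count.
half_resolution = (rows // 2) * (cols // 2) * bytes_per_pixel

print(true_colour, indexed, half_resolution)
```

Under these assumptions the palette version needs about a third of the space of the full three-byte-per-pixel image.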
17. Example: GIF
- Graphics Interchange Format
- A proprietary format developed in 1987
- GIF89a definition: http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/GIF89a.txt
- Assumes a rectangular screen containing a number of images
- Areas not filled with images are filled with a background colour
- Uses a palette to store 256 colours
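The screen, background and palette ideas can be modelled in a few lines. This is only a toy model in Python, not a GIF parser; the palette contents, screen size and image position are all invented for the example.

```python
# Hypothetical three-entry palette (a real GIF palette can hold up to 256).
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
background = 1                                # palette index of the background

# Start with a screen filled with the background colour index.
width, height = 4, 3
screen = [[background] * width for _ in range(height)]

# Place a 2 x 2 image of palette indices at column 1, row 0.
image = [[2, 0],
         [0, 2]]
left, top = 1, 0
for dy, row in enumerate(image):
    for dx, index in enumerate(row):
        screen[top + dy][left + dx] = index

# Only at display time are indices turned into colours via the palette.
rgb = [[palette[index] for index in row] for row in screen]
for row in screen:
    print(row)
```

Each pixel is stored as a small index; the three colour bytes appear only once, in the palette.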
18. GIF Screen
19. GIF File Format
20. Vector Graphics
- Series of objects such as lines and circles, e.g. PICT, TIFF, ...
- line 0,50,100,50
- line 50,0,50,100
- char A, 75, 25
[Figure: the resulting drawing, showing the two lines and the character A]
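To compare this with a bit map, the three commands can be held as a short list and the lines rendered into a pixel grid. The sketch below is a rough Python illustration: the tuple layout, the 101 x 101 grid and the function name are assumptions, and drawing the character is left out.

```python
# Display list mirroring the slide: two lines and one character.
commands = [
    ("line", 0, 50, 100, 50),
    ("line", 50, 0, 50, 100),
    ("char", "A", 75, 25),
]

def rasterize_lines(commands, width=101, height=101):
    """Render only the 'line' commands into a grid of 0/1 pixels."""
    bitmap = [[0] * width for _ in range(height)]
    for command in commands:
        if command[0] != "line":
            continue                      # text rendering is outside this sketch
        _, x0, y0, x1, y1 = command
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            bitmap[y][x] = 1
    return bitmap

bitmap = rasterize_lines(commands)
pixels_set = sum(map(sum, bitmap))
print(len(commands), "commands versus", 101 * 101, "pixels,", pixels_set, "of them set")
```

The vector form needs three short commands, while the bit map form has to store every one of the pixels whether or not it is set.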
21. Example: PostScript
- A page description language
- An image consists of a program written in the PostScript language
- Encoded in ASCII or Unicode
- Contains functions to
  - Draw lines
  - Draw Bezier curves
  - Join simple objects into more complex ones
  - Translate or scale an object
  - Fill an object
22. Figure 3.13: A PostScript program
23. Figure 3.14: Another PostScript program
24. Audio Data
- Sound is normally digitised from an audio source
- The analog waveform is sampled at regular time intervals
- The amplitude at each interval is recorded using an A-to-D converter
- The most positive peak is set to the maximum binary number
- The most negative peak is set to zero
25. Figure 3.15: Digitizing an audio waveform
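The sampling steps on the Audio Data slide can be imitated in software. The sketch below is a simplified Python model, not a description of real converter hardware: the 1 kHz tone, the 8 kHz sample rate, the 8 bit resolution and the function names are all assumptions.

```python
import math

def digitise(signal, sample_rate, num_samples, bits=8):
    """Sample signal(t), with values in [-1, 1], and quantise to unsigned codes."""
    max_code = (1 << bits) - 1            # most positive peak maps to this value
    codes = []
    for i in range(num_samples):
        t = i / sample_rate               # regular time intervals
        amplitude = signal(t)
        # Shift [-1, 1] to [0, 1], then scale: most negative peak maps to zero.
        codes.append(round((amplitude + 1.0) / 2.0 * max_code))
    return codes

def tone(t):
    """Hypothetical 1 kHz sine wave standing in for the analog source."""
    return math.sin(2 * math.pi * 1000 * t)

codes = digitise(tone, sample_rate=8000, num_samples=8)
print(codes)
```

With eight samples per cycle, the list covers exactly one period of the tone, from the mid-scale code up to the maximum, down to zero and back.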
26. Wave (.WAV) Sound format
- Designed by Microsoft
- Supports 8 or 16 bit sound samples
- Sample rates: 11.025 kHz, 22.05 kHz or 44.1 kHz
- Supports stereo or mono
- Very simple format
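Python's standard wave module can write such a file, which gives a feel for how little is involved. The sketch below builds a short mono, 16 bit, 44.1 kHz file in memory; the 440 Hz tone and the 10 ms length are arbitrary choices for the example.

```python
import io
import math
import struct
import wave

sample_rate = 44_100          # one of the rates listed on the slide
num_frames = 441              # 10 ms of sound (arbitrary choice)
frequency = 440.0             # hypothetical test tone

# 16 bit WAV samples are signed and stored Little Endian ("<h").
frames = b"".join(
    struct.pack("<h", round(32767 * math.sin(2 * math.pi * frequency * i / sample_rate)))
    for i in range(num_frames)
)

buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)       # mono
    wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
    wav.setframerate(sample_rate)
    wav.writeframes(frames)

data = buffer.getvalue()
print(data[:4], data[8:12], len(data), "bytes")

# Read the header back to confirm what was stored.
with wave.open(io.BytesIO(data), "rb") as wav:
    params = (wav.getnchannels(), wav.getsampwidth(),
              wav.getframerate(), wav.getnframes())
print(params)
```

The file starts with the tags RIFF and WAVE, followed by a small header and then the raw samples.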
27. Wave Format
28. Wave Format
29. Some statistics
- If we encode sound at 44 kHz, with each sample at 16 bits, in stereo (2 channels), this amounts to 1.4 Mbits/sec, and three minutes will take about 32 Mbytes of space!
- If we only encode the most important features, it is termed data compression, and it can reduce file size by about 10:1
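The data rate and storage figures can be worked out directly. The sketch below does the arithmetic in Python, assuming that "44 kHz" means the 44.1 kHz rate listed on the WAV slide and that 1 Mbyte is 1,000,000 bytes.

```python
sample_rate = 44_100        # samples per second (assumed: 44.1 kHz)
bits_per_sample = 16
channels = 2                # stereo

bit_rate = sample_rate * bits_per_sample * channels      # bits per second
bytes_per_second = bit_rate // 8

seconds = 3 * 60
total_bytes = bytes_per_second * seconds

# A 10:1 compression ratio leaves one tenth of the original size.
compressed_bytes = total_bytes // 10

print(f"{bit_rate / 1e6:.2f} Mbit/s")
print(f"{total_bytes / 1e6:.2f} Mbytes uncompressed")
print(f"{compressed_bytes / 1e6:.2f} Mbytes at 10:1")
```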
30. Two popular methods
- Real Audio is one method used for data compression.
- MP3 is another.
- Comparative file sizes
  - WAV file at 44 kHz, 16 bit: 5 MB
  - Real Audio will take 304 KB
  - MP3 will take 409 KB
- Source: www.howstuffworks.com