Introduction to Databases - PowerPoint PPT Presentation

1 / 55
About This Presentation
Title:

Introduction to Databases

Description:

Introduction to Databases Data Organisation Definition Data modelling SQL DBMS functions – PowerPoint PPT presentation

Number of Views:252
Avg rating:3.0/5.0
Slides: 56
Provided by: FAdam
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Databases


1
Introduction to Databases
  • Data Organisation
  • Definition
  • Data modelling
  • SQL
  • DBMS functions

2
Basics of data Organisation
  • DATA HIERARCHY (four categories)
  • Fields represent a single data item
  • Records made up of a related set of fields
    describing one instance of an entity
  • File / Table a set of related records - as many
    as instances (occurrence) in the set
  • Database a collection of related files

3
Example of data structure
Fields
Name First name Telephone Sampras Pete 45 25
65 65 Healy Margaret 25 58 96 63 Clinton Bill 12
25 28 89 Henry Thierry 25 78 85 85
Records
Other files gtcomplete data Structure DB
File / Table
4
Database Definition.
  • "A collection of interrelated data stored
    together with controlled redundancy, to serve one
    or more applications in an optimal fashion the
    data is stored so that it is independent of the
    application programs which use it a common and
    controlled approach is used in adding new data
    and in modifying existing data within the
    database."

5
Definition - closer look
  • A collection of interrelated data stored together
  • with controlled redundancy
  • to serve one or more applications in an optimal
    fashion
  • the data is stored so that it is independent of
    the application programs which use it
  • a common and controlled approach is used in
    adding new data and in modifying existing data
    within the database.

6
Advantages of Databases
  • data are independent from applications - stored
    centrally
  • data repository accessible to any new program
  • data are not duplicated in different locations
  • programmers do not have to write extensive
    descriptions of the files
  • These save enough money and time to offset the
    extra costs of setting and maintaining DBs

7
Disadvantages of DBs
  • Data are more accessible so more easily abused
  • Large DBs require expensive hardware and software
  • specialised / scarce personnel is required to
    develop and maintain large DBs
  • People / business units may object to their
    data being widely available in a DB

8
Characteristics of DBs
  • High concurrency (high performance under load)
  • Multi-user (read does not interfere with write)
  • Data consistency changes to data dont affect
    running queries no phantom data changes
  • High degree of recoverability (pull the plug
    test)

9
ACID test
  • Atomicity
  • Consistency
  • Isolation
  • Durability

All or nothing
Preserve consistency of database
Transactions are independent
Once committed data is preserved
10
DataBase Management System (DBMS)
  • program that makes it possible to
  • create
  • use
  • maintain a database
  • It provides an interface / translation mechanism
    between the logical organisation of the data
    stored in the DB and the physical organisation of
    the data

11
Using a database
  • Two main functions of the DBMS
  • Query language - for people who are not
    programmer (greatest advantage of DB)
  • Data manipulation language - for programmers who
    want to modify the links between data elements
    within the DB
  • Also, Host Language - the language used by
    programmers to develop the rest of the
    application - eg Visual Basic for Applications
    (VBA) / Oracle developer 2000

12
Different types of DBs
  • creating the DB specifying the links between
    data items
  • different types of relationships can be specified
    - ie different logical views
  • they correspond to three main types of DBMSs
  • Hierarchical DBs
  • Network DBs
  • Relational DBs
  • Object Oriented DBs

13
Hierarchical DBs
  • data item are related as Parent and Child in
    a tree-like structure
  • parent means data item is higher in the tree
    than child and connected to it
  • one parent can have more than one child, but
    one child can only have one parent
  • most common platform IBMs Information
    Management System (IMS)

14
Example
Customers
Orders
Payments
Items
Currency
Unit of packaging
Substitution Product
Very fast retrieval
15
Undesirable side effects
  • Insertion of record
  • dependent record cannot be added without a parent
  • eg units of packaging cannot be added without
    linkage to an existing item
  • Deletion of record
  • deletion of a parent deletes all children
  • deleting an existing item will delete its
    replacement items
  • Impossible to have two parents trouble

16
Network DBs
  • same as parent and children in Hierarchical DB,
    but children can have more than one parent
  • It is also possible to link items upwards to
    other items parents
  • practically, it means that the DBMS is more
    flexible for data retrieval

17
Example
Suppliers
Customers
Orders
Payments
Items
Currency
Unit of packaging
Substitution Product
18
Relational DBs
  • Data items stored in tables
  • Specific fields in tables related to other field
    in other tables (joint)
  • infinite number of possible viewpoints on the
    data (queries)
  • Highly flexible DB but overly slow for complex
    searches
  • Oracle, SyBase, Ingres, Access, Paradox for
    Windows...

19
Describing relationships
  • Attempt at modelling the business elements
    (entities) and their relationships (links)
  • Can be based on users descriptions of the
    business processes
  • Specifies dependencies between the data items
  • Coded in an Entity-Relationship Diagram (ERD)

20
Types of Relationships
  • one-to-one one instance of one data item
    corresponds to one instance of another
  • one-to-many one instance to many instances
  • many-to-many many instance correspond to many
    instances
  • Also some relationships may be
  • compulsory
  • optional

21
Example
  • Student registering system
  • What are the entities?
  • What type of relationship do they have?
  • Draw the diagram

22
Entity Relationship Diagram
23
Next step - creating the data structure
  • Few rules - a lot of experience
  • Can get quite complex (paramount for the speed of
    the DB)
  • Tables must be normalised - ie redundancy is
    limited to the strict minimum by an algorithm
  • In practice, normalisation is not always the best

24
Data Structure Diagrams
  • Describe the underlying structure of the DB the
    complete logical structure
  • Data items are stored in tables linked by
    pointers
  • attribute pointers data fields in one table that
    will link it to another (common information)
  • logical pointers specific links that exist
    between tables
  • Tables have a key
  • If an attribute seems to belong to a relationship
    rather than an attribute, it may mean an
    associative entity must be added

25
ORDER order number Item description Item
Price Quantity ordered Customer number Item number
Customer Customer number Customer name Customer
address Customer balance Customer special rate
1
2
3
4
Item Item number Item description Item
cost Quantity on hand
compulsory attributes 0 optional attributes
26
Definitions
  • Entity
  • Attributes
  • Instance(s)
  • Domain
  • Key (candidate primary and foreign)

27
Definitions
  • Relationship
  • Ordinality
  • Cardinality
  • Associative Entity

28
Some test questions
  • Is it a bird is it a plane?
  • Is it an entity or an attribute?

29
Normalisation
  • Process of simplifying the relationships amongst
    data items as much as possible (see example
    provided - handout)
  • Through an iterative process, structure of data
    is refined to 1NF, 2NF, 3NF etc.
  • Reasons for normalisation
  • to simplify retrieval (speed of response)
  • to simplify maintenance (updates, deletion,
    insertions)
  • to reduce the need to restructure the data for
    each new application

30
First Normal Form
  • design record structure so that each record looks
    the same (same length, no repeating groups)
  • repetition within a record means one relation was
    missed create new relation
  • elements of repeating groups are stored as a
    separate entity, in a separate table
  • normalised records have a fixed length and
    expanded primary key

31
Second Normal Form
  • Record must be in first normal form first
  • each item in the record must be fully dependent
    on the key for identification
  • Functional dependency means a data items value
    is uniquely associated with anothers
  • only on-to-one relationship between elements in
    the same file
  • otherwise split into more tables

32
Third normal form
  • to remove transitive dependencies
  • when one item is dependent on an item which is
    dependent from the key in the file
  • relationship is split to avoid data being lost
    inadvertently
  • this will give greater flexibility for the design
    of the application eliminate deletion problems
  • in practice, 3 NF not used all the time - speed
    of retrieval can be affected

33
Beyond data modeling
  • Model must be normalised purpose ?
  • Outcome is a set of tables logical design
  • Then, design can be warped until it meets the
    realistic constraints of the system
  • Eg what business problem are we trying to solve?
    see handout riccardi p. 113, 127

34
Realistic constraints
  • Users cannot cope with too many tables
  • Too much development required in hiding complex
    data structure
  • Too much administration
  • Optimisation is impossible with too many tables
  • Actually RDBs can be quite slow!

35
Key practical questions
  • What are the most important tasks that the DB
    MUST accomplish efficiently?
  • How must the DB be rigged physically to address
    these?
  • What coding practices will keep the coding clean
    and simple?
  • What additional demands arise from the need for
    resilience and security?

36
Analysis - Three Levels of Schema
External Schema 2
External Schema
External Schema 1
Tables
Logical Schema
Disk Array
Internal Schema
37
4 way trade-off
Security
Ease of use
Performance
Clarity of code
38
Key decisions
  • Oracle offers many different ways to do things
  • Indexes
  • Backups
  • Good analysis is not only about knowing these gt
    understanding whether they are appropriate
  • Failure to think it through gt unworkable model
  • Particularly, predicting performance must be done
    properly
  • Ok on the technical side, tricky on the business
    side

39
Design optimisation
  • Sources of problems
  • Network traffic
  • Excess CPU usage
  • But physical I/O is greatest threat (different
    from physical I/O)
  • Disks still the slowest in the loop
  • Solution minimise or re-schedule access
  • Also try to minimise the impact of Q4 (e.g.
    mirroring, internal consistency checks)

40
Creating links between the tables
  • use common fields to join tables / queries
  • very easy when data is properly normalised
  • Gives total flexibility in terms of data
    retrieval
  • Main strength of RDBs (SQL)

41
Structured Query Language
  • used for defining and manipulating data in
    Relational DBs
  • aimed at
  • reducing training costs
  • increasing productivity
  • improve application portability
  • increase application longevity
  • reduce dependency on single vendors
  • enable cross systems communication
  • In practice, SQLs can be a bit different

42
Querying RDBs with SQL
  • use a form of pseudo english to retrieve data in
    a view (which looks like a table)
  • syntax is based on a number of clauses
  • Select specifies what data elements will be
    included in the view
  • From lists the tables involved
  • Where specifies conditions to filter the data
  • specific values sought
  • links between tables

43
Example with one table
  • find the name and address of customer number 1217

44
Example with a range
  • find the items which are priced between 50 and
    15000

45
Example with two tables
  • find the rep name of all customers

46
Example with two tables
  • same for customer Robson only

47
Use of a Search Condition - nested queries
  • find the name and address of the customer who
    ordered order 110

48
Additional syntax
  • Add computation in the select statement
  • select SUM(price)
  • select AVG(price), MAX, MIN, COUNT
  • Simplify comparisons with a BETWEEN clause and
    LIKE clause (with , ?)
  • Add sorting instruction after the where clause
  • ORDER BY name (alphabetical)
  • ORDER BY price (ascending)
  • Provide aggregate information by grouping data
  • GROUP BY customer

49
  • find contents (item and description) of order
    110

50
  • find the average price of the cars for sale
  • find the average price of all orders taken so far
    by customer Jones

51
  • find how much cash customer Barry has generated
    in total

52
find the average price of all orders taken so far
53
(No Transcript)
54
Oracle Demo Set - Sales Order Processing
CUSTOMER TABLE
SALES_ORDER TABLE
PRODUCT TABLE
ITEM TABLE
PRICE TABLE
55
DEPARTMENT TABLE
EMPLOYEE TABLE
LOCATION TABLE
Oracle Demo Set - Employee Data
JOB TABLE
Write a Comment
User Comments (0)
About PowerShow.com