Prof. Ray Larson - PowerPoint PPT Presentation

About This Presentation
Title:

Prof. Ray Larson

Description:

Identify the characteristics of each data element. Numeric. Text. Date/time. Etc. ... Fifth Normal Form (5NF) 2003.10.02 - SLIDE 34. IS 202 FALL 2003. Normalization ... – PowerPoint PPT presentation

Number of Views:33
Avg rating:3.0/5.0
Slides: 69
Provided by: ValuedGate1
Category:
Tags: element | fifth | larson | prof | ray

less

Transcript and Presenter's Notes

Title: Prof. Ray Larson


1
Lecture 12 Database Design
SIMS 202 Information Organization and Retrieval
  • Prof. Ray Larson Prof. Marc Davis
  • UC Berkeley SIMS
  • Tuesday and Thursday 1030 am - 1200 pm
  • Fall 2003
  • http//www.sims.berkeley.edu/academics/courses/is2
    02/f03/

2
Lecture Overview
  • Review
  • Databases and Database Design
  • Database Life Cycle
  • ER Diagrams
  • Database Design
  • Normalization
  • Discussion Questions

3
Lecture Overview
  • Review
  • Databases and Database Design
  • Database Life Cycle
  • ER Diagrams
  • Database Design
  • Normalization
  • Discussion Questions

4
Models (1)
5
Database System Life Cycle
6
Another View of the Life Cycle
Integration 4
Operations 5
Design 1
Physical Creation 2
Conversion 3
Growth, Change 6
7
Database Design Process
8
Entity
  • An Entity is an object in the real world (or even
    imaginary worlds) about which we want or need to
    maintain information
  • Persons (e.g. customers in a business,
    employees, authors)
  • Things (e.g. purchase orders, meetings, parts,
    companies)

Employee
9
Attributes
  • Attributes are the significant properties or
    characteristics of an entity that help identify
    it and provide the information needed to interact
    with it or use it (This is the Metadata for the
    entities)

10
Relationships
  • Relationships are the associations between
    entities
  • They can involve one or more entities and belong
    to particular relationship types
  • One to One
  • One to Many
  • Many to Many

11
Relationships
12
Types of Relationships
  • Concerned only with cardinality of relationship

1
1
n
1
n
m
Chen ER notation
13
More Complex Relationships
1/1/1
1/n/n
n/n/1
SSN
Project
Date
1
4(2-10)
Manages
1
Is Managed By
n
14
Weak Entities
  • Owe existence entirely to another entity

15
Supertype and Subtype Entities
16
Many to Many Relationships
Employee
17
Lecture Overview
  • Review
  • Databases and Database Design
  • Database Life Cycle
  • ER Diagrams
  • Database Design
  • Normalization
  • Discussion Questions

18
Database Design Process
19
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
20
Requirements Analysis
  • Conceptual Requirements
  • Systems Analysis Process
  • Examine all of the information sources used in
    existing applications
  • Identify the characteristics of each data element
  • Numeric
  • Text
  • Date/time
  • Etc.
  • Examine the tasks carried out using the
    information
  • Examine results or reports created using the
    information

21
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
22
Conceptual Design
  • Conceptual Model
  • Merge the collective needs of all applications
  • Determine what Entities are being used
  • Some object about which information is to
    maintained
  • What are the Attributes of those entities?
  • Properties or characteristics of the entity
  • What attributes uniquely identify the entity
  • What are the Relationships between entities
  • How the entities interact with each other?

23
Developing a Conceptual Model
  • Overall view of the database that integrates all
    the needed information discovered during the
    requirements analysis
  • Elements of the Conceptual Model are represented
    by diagrams, Entity-Relationship or ER Diagrams,
    that show the meanings and relationships of those
    elements independent of any particular database
    systems or implementation details
  • Can also be represented using other modeling
    tools (such as UML)

24
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
25
Logical Design
  • Logical Model
  • How is each entity and relationship represented
    in the Data Model of the DBMS
  • Hierarchic?
  • Network?
  • Relational?
  • Object-Oriented?

26
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
27
Physical Design
  • Internal Model
  • Choices of index file structure
  • Choices of data storage formats
  • Choices of disk layout

28
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
29
Database Application Design
  • External Model
  • User views of the integrated database
  • Making the old (or updated) applications work
    with the new database design

30
Terms and Concepts
  • Key
  • An attribute or set of attributes used to
    identify or locate records in a file
  • Primary Key
  • An attribute or set of attributes that uniquely
    identifies each record in a file
  • Candidate Key
  • An attribute or set of attributes that might be
    used as a primary key

31
Lecture Overview
  • Review
  • Databases and Database Design
  • Database Life Cycle
  • ER Diagrams
  • Database Design
  • Normalization
  • Discussion Questions

32
Normalization
  • Normalization theory is based on the observation
    that relations with certain properties are more
    effective in inserting, updating and deleting
    data than other sets of relations containing the
    same data
  • Normalization is a multi-step process beginning
    with an unnormalized relation

33
Normal Forms
  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)

34
Normalization
35
Unnormalized Relations
  • First step in normalization is to convert the
    data into a two-dimensional table
  • In unnormalized relations data can repeat within
    a column
  • (The following is a highly contrived example that
    actually bears only a slight resemblance to the
    current implementation of the Phone/Photo project
    database)

36
Unnormalized Relations
37
First Normal Form
  • To move to First Normal Form a relation must
    contain only atomic values at each row and column
  • No repeating groups
  • A column or set of columns is called a Candidate
    Key when its values can uniquely identify the row
    in the relation

38
First Normal Form
39
1NF Storage Anomalies
  • Insertion A new person has not yet taken a
    picture -- hence no Picture -- Since Picture
    is part of the key we cant insert
  • Insertion If People is are known and likely to
    be photographed, but havent been yet -- there is
    be no way to include that person in the database
  • Update If a Person changes status (e.g. Mary
    Jones becomes a Student) we have to change
    multiple rows in the database
  • Deletion (type 1) Deleting a Person record may
    also delete all info about People in the pictures
  • Deletion (type 2) When there are functional
    dependencies (like Object and Object_features)
    changing one item eliminates other information

40
Second Normal Form
  • A relation is said to be in Second Normal Form
    when every nonkey attribute is fully functionally
    dependent on the primary key
  • That is, every nonkey attribute needs the full
    primary key for unique identification

41
Second Normal Form
Person Table
42
Second Normal Form
People Table
43
Second Normal Form
Picture Table
44
1NF Storage Anomalies Removed
  • Insertion Can now enter new Persons who havent
    yet taken pictures
  • Insertion Can now enter People who havent been
    photographed
  • Deletion (type 1) If Charles Brown withdraws his
    photos the corresponding tuples from Person and
    Picture tables can be deleted without losing
    information on David Rosen
  • Update If John White takes a third picture, and
    has changed status (e.g., graduate), we only need
    to change the Person table in one place

45
2NF Storage Anomalies
  • Insertion Cannot enter the fact that a
    particular object has a particular feature unless
    it is associated with a particular picture
  • Deletion If John White describes some other
    object that Beth Little has while shopping, we
    lose the fact that the bookbag is blue
  • Update If the features of an object change
    change we have to update multiple occurrences of
    object features

46
Third Normal Form
  • A relation is said to be in Third Normal Form if
    there are no transitive functional dependencies
    between nonkey attributes
  • When one nonkey attribute can be determined with
    one or more nonkey attributes there is said to be
    a transitive functional dependency
  • The Obect_Feature column in the Picture table is
    determined by the Object
  • Object_Feature is transitively functionally
    dependent on Object so Picture is not 3NF

47
Third Normal Form
Picture Table
48
Third Normal Form
Object Table
49
2NF Storage Anomalies Removed
  • Insertion We can now enter the fact that an
    object has a particular feature
  • Deletion If John White describes some other
    object that Beth Little has while shopping, we
    dont lose the fact that the bookbag is blue
  • Update The features for each object appear only
    once

50
Boyce-Codd Normal Form
  • Most 3NF relations are also BCNF relations
  • A 3NF relation is NOT in BCNF if
  • Candidate keys in the relation are composite keys
    (they are not single attributes)
  • There is more than one candidate key in the
    relation, and
  • The keys are not disjoint, that is, some
    attributes in the keys are common

51
Most 3NF Relations Are Also BCNF Is This One?
52
BCNF Relations
53
Additional Issues
  • Why separate Person and People?
  • They are really all People/Persons in different
    roles
  • Shouldnt a picture have a unique ID regardless
    of Who is in it?
  • Cant we have multiple people in the same
    picture, multiple objects, etc.?
  • Cant objects have multiple characteristics?

54
BCNF Relations
55
BCNF Added Capabilities
  • Can now have a picture with no (identified)
    people in it
  • Can have multiple objects, activities, and people
    associated with each picture

56
Fourth Normal Form
  • Any relation is in Fourth Normal Form if it is
    BCNF and any multivalued dependencies are trivial
  • Eliminate non-trivial multivalued dependencies by
    projecting into simpler tables

57
Fifth Normal Form
  • A relation is in 5NF if every join dependency in
    the relation is implied by the keys of the
    relation
  • Implies that relations that have been decomposed
    in previous NF can be recombined via natural
    joins to recreate the original relation

58
Fifth Normal Form Relations
People Table
59
Normalizing to Death
  • Normalization splits database information across
    multiple tables
  • To retrieve complete information from a
    normalized database, the JOIN operation must be
    used
  • JOIN tends to be expensive in terms of processing
    time, and very large joins are very expensive

60
Lecture Overview
  • Review
  • Databases and Database Design
  • Database Life Cycle
  • ER Diagrams
  • Database Design
  • Normalization
  • Discussion Questions

61
Questions Brooke Maury
  • Discussion Questions on Hoffer McFadden
  • If the goal of the relational database model is
    to encode a conceptual design into a logical
    design, is it possible that improved technology
    and the development of new modeling techniques
    will supplant the RDBMS? Specifically, what
    impact will XML and the development of document
    engineering have on organizing information in
    multiple normalized tables?
  • Conversely, what does the relational model have
    that would be lost if a conceptual design was
    encoded in another model?

62
Questions Brooke Maury
  • The drive to develop the RDBM was in part
    motivated by a need to minimize the space
    required and improve the performance of database
    systems by removing redundancies. What impact
    will very inexpensive data storage and computing
    power have on the relational database model and
    the third normal form especially?

63
Questions Shane Ahern
  • Discussion Questions for "Logical Database Design
    and the Relational Model"
  • Is the normalization process described really
    necessary? When I design a database schema, I
    find that by thinking of tables in terms of they
    entities they represent (employees, sales,
    events), I avoid most of the problems of
    normalization that the process seeks to address
    (i.e. salesperson and region in Sales table,
    salesperson is clearly a distinct entity from
    sales). If the formal process described in the
    article is not followed, are there potential
    pitfalls that might lead to problems with your
    database schema?

64
Questions Shane Ahern
  • The article points out that "the relational model
    does not yet directly support supertype/subtype
    relationships." Once the tables in a relational
    database have been decomposed to third normal
    form, the database is efficient from systems
    point-of-view, but the tables no longer represent
    a representation of the data that is intuitive to
    humans. The object-oriented model more accurately
    mirrors the way we think about the concepts that
    we wish to store in databases. So perhaps
    object-oriented database systems are worth
    considering. What about XML databases?

65
Questions Arthur Law
  • The three models that we have been presented
    with, Entity Relationship Model, NIAM Model, and
    Object Oriented Model all enforce a specific
    thought process in the organization and
    relationship between items in a database. With
    all of our recent discussion of computers
    understanding natural language are these methods
    now out of date with how we should be organizing
    information? Should we use artificial
    intelligence or learning algorithms to
    statistically determine the relationship between
    entities or is there still value in using these
    models?

66
Questions Arthur Law
  • Each model is approximately one decade apart in
    development and a quick Google search shows that
    companies are using databases with one of the
    three models. However, as new models arise there
    doesn't seem too much interest in migrating from
    one data model to another. Which makes sense
    given that an organization using a given model
    probably finds that it works. Now with the
    proliferation of XML, we see more information
    being shared between organizations, so are we
    fated for an expensive and lengthy translation
    process between databases? Or should all DB
    administrators be responsible for upgrading to
    the latest model?

67
Lecture Overview
  • Review
  • Databases and Database Design
  • Database Life Cycle
  • ER Diagrams
  • Database Design
  • Normalization
  • Discussion Questions
  • Next Time/Readings

68
Next Time
  • Guest Lecture Bob Glushko on XML and Document
    Engineering
  • Readings on Class website
  • No assigned discussion questions (but bring your
    questions on the readings)
Write a Comment
User Comments (0)
About PowerShow.com