Database Systems

1 / 168
About This Presentation
Title:

Database Systems

Description:

Home loan system. Checking Acct. System. Saving Acct. System. Securities Acct. System ... Applications. Securities Acct. Apps. Checking File. Savings File ... – PowerPoint PPT presentation

Number of Views:86
Avg rating:3.0/5.0
Slides: 169
Provided by: hleven

less

Transcript and Presenter's Notes

Title: Database Systems


1
Database Systems
2
Management Challenges
  • Organizational obstacles to a database
    environment
  • Integrating data and ensuring quality

3
OS Support for Data Management
  • Data can be stored in RAM
  • this is what every programming language offers!
  • RAM is fast, and random access
  • Every OS includes a File System
  • manages files on a magnetic disk
  • allows open, read, seek, close on a file
  • allows protections to be set on a file
  • drawbacks relative to RAM?

4
COMMUNICATIONS
PROCESSING
Control
A/L
Register
OUTPUT
INPUT
Memory
Secondary STORAGE
5
Memory Hierarchy
Register
Cache
MEMORY
RAM
SECONDARY STORAGE
Magnetic Tapes Magnetic Disks Optical Disks Zip,
PCMCIA
6
File Organization
  • Bit Smallest unit of data binary digit (0,1)
  • Byte Group of bits that represents a single
    character
  • Field Group of words or complete number
  • Record Group of related fields
  • File Group of records of the same type

7
File Organization
  • FILE
  • A Collection of Similar RECORDS
  • Contains data about similar objects
  • DATABASE
  • An Organizations Electronic Library of FILES
  • A repository containing organizational data

8
File Organization
  • Entity person, place, thing, event about which
    data must be kept
  • Attribute description of a particular entity
  • Key field field used to retrieve, update, sort
    record

9
Entities and Attributes
10
Key field
  • Field in each record
  • Uniquely identifies this record for
  • Retrieval
  • Updating
  • Sorting

11
Sequential vs. Direct File Organization
  • Sequential
  • Tape oriented
  • One record follows another
  • Follows physical sequence
  • Direct
  • Disk oriented
  • Can be accessed without regard to physical
    sequence

12
Direct File Access-Indexing
  • Indexed sequential access method (ISAM)
  • Each record identified by key
  • Grouped in blocks and cylinders
  • Keys in index
  • Virtual storage access method (VSAM)
  • Memory divided into areas and intervals
  • Dynamic file space
  • Btree widely used for relational databases
  • Direct file access method

13
The Indexed Sequential Access Method
14
Direct or Random File Access Method
  • Each record has key field
  • Key field fed into transform algorithm
  • Algorithm generates physical storage location of
    record (record address)

15
The Direct File Access Method
16
Traditional File Environment (Flat File)
  • Data redundancy The presence of duplicate data
    in multiple data files so that the same data are
    stored in more than one place or location
  • Data inconsistency The same attribute may have
    different values.

17
Traditional File Environment (Flat File)
  • Program-data dependence
  • The coupling of data stored in files and the
    specific programs required to update and maintain
    those files
  • changes in programs require changes to the data
  • Lack of flexibility
  • can deliver routine scheduled reports after
    extensive programming efforts, but
  • it cannot deliver ad-hoc reports or
  • respond to unanticipated information requirements
    in a timely fashion.

18
Traditional File Environment (Flat File)
  • Poor security
  • little control or management of data,
  • management will have no knowledge of
  • who is accessing or
  • even making changes to the organizations data.
  • Lack of data sharing and availability
  • Information cannot flow freely across different
    functional areas or different parts of the
    organization.
  • Users find different values of the same piece of
    information in two different systems, and
  • hence they may not use these systems because they
    cannot trust the accuracy of the data.

19
Traditional File Processing
20
Example A Bank
Checking File
Reports
Savings File
Reports
Investment File
Reports
Loan File
Reports
Home loan system
21
Example A Bank
Checking Acct. Applications.
D B M S
Saving Acct. Applications
Securities Acct. Apps
Ad hoc Applications
By using a DBMS, the Bank can quickly and easily
create customer profiles
22
Database
  • Collection of centralized data
  • Controls redundant data
  • Data stored so as to appear to users in one
    location
  • Services multiple applications

23
The Contemporary Database Environment
24
Database Management System (DBMS)
  • Software for creating and maintaining databases
  • Permits firms to rationally manage data for the
    entire firm
  • Acts as interface between application programs
    and physical data files
  • Separates logical and design views of data
  • Solves many problems of the traditional data file
    approach
  • Examples
  • Oracle
  • DB2
  • Microsoft SQL Server
  • Sybase

25
DBMS Features
  • Optimize Queries
  • Manage memory
  • Control concurrent data access

26
Elements of Database Environment
27
DB Stakeholders
  • DBMS vendors, programmers
  • Oracle, IBM, Microsoft, Sybase, Informix,
  • End users in many fields
  • Business, education, science,
  • DB application programmers
  • Build data entry and analysis tools on top of
    DBMSs
  • Build web services that run off DBMSs
  • Database administrators (DBAs)
  • Design logical/physical schemas
  • Handle security and authorization
  • Data availability, crash recovery
  • Database tuning as needs evolve

28
Database Administration
  • Defines and organizes database structure and
    content
  • Develops security procedures
  • Develops database documentation
  • Maintains DBMS

29
Three Components to a DBMS
  • Data definition language Specifies content and
    structure of database and defines each data
    element
  • Data manipulation language Used to process data
    in a database, e.g. SQL
  • Data dictionary Stores definitions of data
    elements and data characteristics

30
Advantages of DBMS
  • Reduces development / maintenance costs
  • Enhances system flexibility
  • Increases access / availability of information
  • Allows central management of data, data use, and
    security
  • Improved data security
  • Improved data integrity
  • Uncouples programs from data
  • Improved programmer productivity
  • Improved data sharing
  • Reduces data redundancy
  • Eliminates data inconsistency

31
Do not use a DBMS when
  • The initial investment in hardware, software, and
    training is too high
  • The generality a DBMS provides is not needed
  • The overhead for security, concurrency control,
    and recovery is too high
  • Data and applications are simple and stable
  • Real-time requirements cannot be met by it
  • Multiple user access is not needed

32
Databases Model the Real World
  • Data Model allows us to translate real world
    objects into structures computers can store
  • Many models
  • Relational,
  • E-R,
  • O-O,
  • Network,
  • Hierarchical, etc.
  • Relational
  • Rows and Columns
  • Keys and Foreign Keys to link Relations

33
Hierarchical DBMS
  • Organizes data in a tree-like structure
  • Supports one-to-many parent-child relationships
  • Prevalent in large legacy systems

34
Hierarchical Data Model
35
Pointer
  • Field in one record is address of next record in
    sequence

36
Types of Relations
37
Network Data Model
  • Depicts data logically as many-to-many
    relationships
  • A coordination variation of hierarchical model
  • Useful for many-to-many relationships

38
The Relational Data Model
39
Object-Oriented Model(s)
  • Based on the object-oriented paradigm,
  • e.g., C, Java, smalltalk
  • Object-oriented DBMS Stores data and procedures
    as objects that can be retrieved and shared
    automatically
  • Object-relational DBMS Provides capabilities of
    both object-oriented and relational DBMS

40
Comparison of Database Alternatives
  • Hierarchical
  • Processing Efficiency High
  • Flexibility Low
  • User Friendliness Low
  • Program Complexity High

41
Comparison of Database Alternatives
  • Network
  • Processing Efficiency Medium / High
  • Flexibility Low / Medium
  • User Friendliness Low / Moderate
  • Program Complexity High

42
Comparison of Database Alternatives
  • Relational
  • Processing Efficiency Low But Improving
  • Flexibility High
  • User Friendliness High
  • Program Complexity Low

43
Structure of a Relational DBMS
  • A typical DBMS has a layered architecture.
  • Concurrency control and recovery components not
    shown.
  • Each system has its own variations.

44
Describing Data Data Models
  • A data model is a collection of concepts for
    describing data.
  • A schema is a description of a particular
    collection of data, using the given data model.
  • The relational model of data is the most widely
    used model today.
  • Main concept relation, basically a table with
    rows and columns.
  • Every relation has a schema, which describes the
    columns, or fields.

45
Data Modeling
DATABASE SYSTEM
  • REALITY
  • structures
  • processes

MODEL
data modeling
  • The model represents a perception of structures
    of reality
  • The data modeling process is to fix a perception
    of structures of reality and represent this
    perception
  • In the data modeling process we select aspects
    and we abstract

46
Process Modeling
DATABASE SYSTEM
  • REALITY
  • structures
  • processes

MODEL
process modeling
  • The use of the model reflects processes of
    reality
  • Processes may be represented by programs with
    embedded database queries and updates
  • Processes may be represented by ad-hoc database
    queries and updates at run-time

DML
PROG
DML
47
Levels of Abstraction
  • Views describe how users see the data.
  • Conceptual schema defines logical structure
  • Physical schema describes the files and indexes
    used.
  • Sometimes called the ANSI/SPARC model

48
Example University Database
  • Conceptual schema
  • Students(sid string, name string, login
    string, age integer, gpareal)
  • Courses(cid string, cnamestring,
    creditsinteger)
  • Enrolled(sidstring, cidstring, gradestring)
  • Physical schema
  • Relations stored as unordered files.
  • Index on first column of Students.
  • External Schema (View)
  • Course_info(cidstring,enrollmentinteger)

49
Data Independence
  • Applications are insulated from how data is
    structured and stored.
  • Logical data independence Protection from
    changes in logical structure of data.
  • Physical data independence Protection from
    changes in physical structure of data.
  • Q Why is this particularly important for DBMS?
  • A Because databases and their associated
    applications persist.

50
Concurrency Control
  • Concurrent execution of user programs a key to
    good DBMS performance.
  • Disk accesses frequent, pretty slow
  • Keep the CPU working on several programs
    concurrently.
  • Interleaving actions of different programs
    trouble!
  • e.g., deposit and withdrawal on same account
  • DBMS ensures such problems dont arise
  • users can pretend they are using a single-user
    system. (called Isolation)

51
Transaction An Execution of a DB Program
  • Key concept is a transaction an atomic sequence
    of database actions (reads/writes).
  • Each transaction, executed completely, must take
    the DB between consistent states.
  • Users can specify simple integrity constraints on
    the data. The DBMS enforces these.
  • Beyond this, the DBMS does not understand the
    semantics of the data.
  • Ensuring that a single transaction (run alone)
    preserves consistency is ultimately the users
    responsibility!

52
Advanced DBMS Features
  • Integrity maintenance
  • Entity Integrity (not null, uniqueness)
  • Referential Integrity (foreign key references)

53
Advanced DBMS Features
  • Access Control
  • Grant and Revoke
  • Access to tables
  • Capability to perform operations (Insert, update,
    delete, etc.)

54
Desirable ACID Properties
  • Atomicity - Transactions are atomic
  • Consistency - Transformations preserve database
    consistency
  • Isolation - concurrent users updates dont
    interfere
  • Durability - Once a transaction commits, its
    update survives (a system crash, not a hard drive
    failure)

55
Ensuring Transaction Properties
  • DBMS ensures atomicity (all-or-nothing property)
    even if system crashes in the middle of a
    transaction.
  • DBMS ensures durability of committed transactions
    even if system crashes.
  • Idea Keep a log (history) of all actions carried
    out by the DBMS while executing a set of
    transactions
  • Before a change is made to the database, the
    corresponding log entry is forced to a safe
    location.
  • After a crash, using the log, the effects of
  • partially executed transactions are undone
  • committed transactions are redone.

56
Scheduling Concurrent Transactions
  • DBMS ensures that execution of T1, .. , Tn is
    equivalent to some serial execution T1 ... Tn.
  • Before reading/writing an object, a transaction
    requests a lock on the object, and waits until
    the DBMS gives it the lock. All locks are held
    until the end of the transaction.
  • Idea If an action of Ti (say, writing X) affects
    Tj (which perhaps reads X), one of them, say Ti,
    will obtain the lock on X first and Tj is forced
    to wait until Ti completes this effectively
    orders the transactions.
  • What if Tj already has a lock on Y and Ti later
    requests a lock on Y? (Deadlock!) Ti or Tj is
    aborted and restarted!

57
Deadlocks
  • When concurrent transactions hold locks on common
    objects
  • A and B want to update both x and y
  • A locks x, B locks y, neither can complete the
    transaction
  • Solution
  • Select one transaction as a victim and roll it
    back
  • Deadlock detection and resolution is built into
    DBMS

58
Database Design
  • Requirements Analysis
  • user needs what must the database do?
  • Conceptual Design
  • Abstract model of database from a business
    perspective (often done with ER model)
  • Logical Design
  • translate ER into DBMS data model
  • Schema Refinement
  • consistency, normalization
  • Physical Design
  • Detailed description of business information
    needs
  • indexes, disk layout
  • Security Design
  • who accesses what

59
Conceptual Design
  • What are the entities and relationships in the
    enterprise?
  • What information about these entities and
    relationships should we store in the database?
  • What are the integrity constraints or business
    rules that hold?
  • A database schema in the ER Model can be
    represented pictorially (ER diagrams).
  • Can map an ER diagram into a relational schema.

60
ER Model Basics
  • Entity-relationship diagram Methodology for
    documenting databases illustrating relationships
    between database entities
  • Entity Real-world object distinguishable from
    other objects.
  • An entity is described (in DB) using a set of
    attributes.
  • Entity Set A collection of similar entities.
    e.g., all employees.
  • All entities in an entity set have the same set
    of attributes.
  • Each entity set has a key (underlined).
  • Each attribute has a domain.

61
ER Model Basics
  • Relationship Association among two or more
    entities.
  • e.g., Ahmet works in CRM department.
  • relationships can have their own attributes.
  • Relationship Set Collection of similar
    relationships.
  • An n-ary relationship set R relates n entity sets
    E1 ... En
  • Each relationship in R involves entities e1? E1,
    ..., en ? En
  • Same entity set can participate in different
    relationship sets, or in different roles in the
    same set.

62
Key Constraints
  • Example
  • An employee can work in many departments
  • A department can have many employees.
  • In contrast, each department has at most one
    manager, according to the key constraint on
    Manages.

63
Conceptual Design Using the ER Model
  • Design choices
  • Should a concept be modeled as an entity or an
    attribute?
  • Should a concept be modeled as an entity or a
    relationship?
  • Identifying relationships Binary or ternary?
    Aggregation?
  • Constraints in the ER Model
  • A lot of data semantics can (and should) be
    captured.
  • But some constraints cannot be captured in ER
    diagrams.

64
Entity- Relationship Diagram
65
Entity vs. Attribute
  • Should address be an attribute of Employees or an
    entity (related to Employees)?
  • Depends upon how we want to use address
    information, and the semantics of the data
  • If we have several addresses per employee,
    address must be an entity (since attributes
    cannot be set-valued).
  • If the structure (city, street, etc.) is
    important, address must be modeled as an entity
    (since attribute values are atomic).

66
Relational Database Definitions
  • Relational database a set of relations.
  • Relation made up of 2 parts
  • Schema specifies name of relation, plus name
    and type of each column.
  • e.g. Students(sid string, name string, login
    string, age integer, gpa real)
  • Instance a table, with rows and columns.
  • rows cardinality
  • fields degree / arity
  • Can think of a relation as a set of rows or
    tuples.
  • i.e., all rows are distinct

67
Terminology related to relations
68
Example Instance of Students Relation
  • Cardinality 3, arity 5 , all rows distinct
  • Do all values in each column of a relation
    instance have to be distinct?

69
Data
  • Example
  • Products
  • ProductID, name, price, quantity
  • Orders
  • OrderID, ProductID, userID
  • Users
  • userID, name, credit card

70
The Table Perspective
  • Viewed by the user as
  • Rows records
  • Columns fields
  • Stored value cell
  • Relations are abstract tables
  • Hence the name relational databases
  • Describes users perspective, not the internal
    storage format

71
Mathematical Foundations
  • A relation is a collection of tuples
  • Usually a subset of the data
  • A tuple is a collection of attributes
  • ltA1vi1gt ltAn vingt
  • An attribute is a scalar
  • Each scalar has a domain

72
Scalars
  • Smallest syntactic unit of data
  • Atomic
  • No internal structure
  • Typical attribute types
  • Integer
  • Float
  • String
  • Time/Date

73
Properties of relations
  • There are no duplicate tuples
  • Tuples are unordered (Top to Bottom)
  • Attributes are unordered (Left to Right)
  • Implies that they are referenced by name, not
    position
  • All attribute values are atomic

74
Named Relations
  • Base relation references a non-derived relation
  • View refers to a named (derived) relation

75
Relational Operators
  • Operations that (conceptually) return a table!
  • Select (restrict)
  • Returns a subset of rows
  • Project
  • Returns a subset of fields
  • Join
  • Returns a combined set of tables
  • Division
  • Returns rows of A that contain all elements of B

76
Relational Query Languages
  • A major strength of the relational model
    supports simple, powerful querying of data.
  • Queries can be written intuitively.
  • The DBMS is responsible for efficient evaluation.
  • The key precise semantics for relational
    queries.
  • Allows the optimizer to extensively re-order
    operations, and still ensure that the answer does
    not change.

77
SQL
  • SQL (Sequel), standard language for Relational
    DBs
  • Standard Query Language
  • Developed in early 1970s by IBM
  • Now an international standard
  • SQL can be used (invoked) in a number of ways
  • Interactive
  • Embedded in various applications both static and
    dynamic
  • SQL is not 100 in accordance with what is
    theoretically correct
  • Current std is SQL99, SQL92 is a basic subset

78
Components of SQL
  • Data Definition Language (DDL)
  • create, modify, delete relations
  • specify constraints
  • administer users, security, etc.
  • Data Manipulation Language (DML)
  • Specify queries to find tuples that satisfy
    criteria
  • add, modify, remove tuples

79
Three Basic Operations in a Relational Database
  • Select Creates subset of rows that meet specific
    criteria
  • Join Combines relational tables to provide users
    with information
  • Project Enables users to create new tables
    containing only relevant information

80
Project Operation
81
SQL Overview
  • CREATE TABLE ltnamegt ( ltfieldgt ltdomaingt, )
  • INSERT INTO ltnamegt (ltfield namesgt)
  • VALUES (ltfield valuesgt)
  • DELETE FROM ltnamegt
  • WHERE ltconditiongt
  • UPDATE ltnamegt
  • SET ltfield namegt ltvaluegt
  • WHERE ltconditiongt
  • SELECT ltfieldsgt
  • FROM ltnamegt
  • WHERE ltconditiongt

82
Creating Relations in SQL
  • Creates the Students relation.
  • Note the type (domain) of each field is
    specified, and enforced by the DBMS whenever
    tuples are added or modified.
  • CREATE TABLE Students
  • (sid CHAR(20),
  • name CHAR(20),
  • login CHAR(10),
  • age INTEGER,
  • gpa FLOAT)

83
Table Creation (continued)
  • Another example the Enrolled table holds
    information about courses students take.
  • CREATE TABLE Enrolled
  • (sid CHAR(20),
  • cid CHAR(20),
  • grade CHAR(2))

84
Adding and Deleting Tuples
  • Can insert a single tuple using
  • INSERT INTO Students (sid, name, login, age, gpa)
  • VALUES (2001100, Ali, alik, 18, 1.89)
  • Can delete all tuples satisfying some condition
    (e.g., name Ali)
  • DELETE
  • FROM Students S
  • WHERE S.name Ali

85
Keys
  • Keys are a way to associate tuples in different
    relations
  • Keys are one form of integrity constraint (IC)
  • PRIMARY Key
  • FOREIGN Key

86
Primary Keys
  • A set of fields is a superkey if
  • No two distinct tuples can have same values in
    all key fields
  • A set of fields is a key for a relation if
  • It is a superkey
  • No subset of the fields is a superkey
  • what if there are many keys for a relation?
  • one of the keys is chosen (by DBA) to be the
    primary key. Other keys are called candidate
    keys.
  • e.g.
  • sid is a key for Students.
  • What about name?
  • The set sid, gpa is a superkey.

87
Primary and Candidate Keys in SQL
  • Possibly many candidate keys (specified using
    UNIQUE), one of which is chosen as the primary
    key.
  • Keys must be used carefully!
  • For a given student and course, there is a
    single grade.
  • CREATE TABLE Enrolled
  • (sid CHAR(20)
  • cid CHAR(20),
  • grade CHAR(2),
  • PRIMARY KEY (sid,cid))

88
Foreign Keys, Referential Integrity
  • Foreign key Set of fields in one relation that
    is used to refer to a tuple in another
    relation.
  • Must correspond to the primary key of the other
    relation.
  • Like a logical pointer.
  • If all foreign key constraints are enforced,
    referential integrity is achieved (i.e., no
    dangling references.)

89
Foreign Keys in SQL
  • Only students listed in the Students relation
    should be allowed to enroll for courses.
  • sid is a foreign key referring to Students
  • CREATE TABLE Enrolled
  • (sid CHAR(20),cid CHAR(20),grade CHAR(2),
  • PRIMARY KEY (sid,cid),
  • FOREIGN KEY (sid) REFERENCES Students )

90
Enforcing Referential Integrity
  • Consider Students and Enrolled
  • sid in Enrolled is a foreign key that references
    Students.
  • What should be done if an Enrolled tuple with a
    non-existent student id is inserted? (Reject it!)
  • What should be done if a Students tuple is
    deleted?
  • Also delete all Enrolled tuples that refer to it?
  • Disallow deletion of a Students tuple that is
    referred to?
  • Set sid in Enrolled tuples that refer to it to a
    default sid?
  • (In SQL, also Set sid in Enrolled tuples that
    refer to it to a special value null, denoting
    unknown or inapplicable.)
  • Similar issues arise if primary key of Students
    tuple is updated.

91
Integrity Constraints
  • IC condition that must be true for any instance
    of the database e.g., domain constraints.
  • ICs are specified when schema is defined.
  • ICs are checked when relations are modified.
  • A legal instance of a relation is one that
    satisfies all specified ICs.
  • DBMS should not allow illegal instances.
  • If the DBMS checks ICs, stored data are more
    faithful to real-world meaning.
  • Avoids data entry errors, too!

92
Where do ICs Come From?
  • ICs are based upon the semantics of the
    real-world that is being described in the
    database relations.
  • We can check a database instance to see if an IC
    is violated, but we can NEVER infer that an IC is
    true by looking at an instance.
  • An IC is a statement about all possible
    instances!
  • From example, we know name is not a key, but the
    assertion that sid is a key is given to us.
  • Key and foreign key ICs are the most common more
    general ICs supported, too.

93
Querying Multiple Relations
  • What is the meaning of the following query?
  • SELECT S.name, E.cid
  • FROM Students S, Enrolled E
  • WHERE S.sidE.sid AND E.gradeB'

94
Basic SQL Query
  • SELECT DISTINCT target-list
  • FROM relation-list
  • WHERE qualification
  • relation-list A list of relation names
  • possibly with a range-variable after each name
  • target-list A list of attributes of tables in
    relation-list
  • qualification Comparisons combined using AND,
    OR and NOT.
  • Comparisons are Attr op const or Attr1 op Attr2,
  • DISTINCT optional keyword indicating that the
    answer should not contain duplicates.
  • In SQL SELECT, the default is that duplicates are
    not eliminated! (Result is called a multiset)

95
Query Semantics
  • Semantics of an SQL query are defined in terms of
    the following conceptual evaluation strategy
  • 1. do FROM clause compute cross-product of
    tables (e.g., Students and Enrolled).
  • 2. do WHERE clause Check conditions, discard
    tuples that fail. (called selection).
  • 3. do SELECT clause Delete unwanted fields.
    (called projection).
  • 4. If DISTINCT specified, eliminate duplicate
    rows.
  • Probably the least efficient way to compute a
    query!
  • An optimizer will find more efficient strategies
    to get the same answer.

96
SQL Select and Project
  • Syntax
  • SELECT column_nameFROM table_nameWHERE
    where_clause
  • Example
  • SELECT FROM orders WHERE user_ID 1
  • SELECT name FROM products WHERE price lt 1.00

97
SQL More about WHERE
  • Comparisons
  • gt, gt, lt, lt,
  • Logical Operators (AND, OR, NOT)
  • SELECT name FROM products WHERE product_id gt
    1234 ANDproduct_id gt 1237
  • Between
  • WHERE product_ID BETWEEN 1234 AND 1237

98
SQL Join
  • Merges tables that have common column(s)
  • A new row for every pair of rows with the common
    value in the field
  • Typically operates on Foreign/candidate keys
  • Example
  • Users JOIN Orders

99
SQL Select with (Implicit) Join
  • Example Suppose we wanted the names of everyone
    who ordered carrots
  • SELECT name FROM orders, users WHERE
    orders.user_ID users.user_ID AND product_ID
    1234

100
SQL 3-way Join
  • But this assumes we know carrots are product
    1234!,
  • SELECT users.name FROM orders, users, products
    WHERE orders.user_ID users.user_IDAND
    products.product_ID orders.product_IDAND
    product.name carrot

101
SQL In
  • Another way to achieve the previous request
  • SELECT name FROM usersWHERE user_ID in(SELECT
    user_ID FROM ordersWHERE product_ID (SELECT
    product_IDFROM products WHERE name carrot))

102
Functions
  • Min, Max
  • SELECT MAX (price) FROM products
  • Count
  • SELECT COUNT (product_ID) FROM products WHERE
    price gt 1.00
  • Average
  • SELECT AVG(price) FROM products

103
Other Elements of Select
  • Distinct
  • SELECT DISTINCT user_ID FROM orders
  • Order by
  • SELECT name FROM products ORDER BY name
  • Group by, having

104
Inserting New Data
  • Syntax
  • INSERT INTO table_name (column_names) VALUES
    (values)
  • Example
  • INSERT INTO users (user_ID, name, credit_card)
    VALUES (4, David, 234-55-6776)

105
SQL Select All
  • To select all columns
  • SELECT FROM orders

106
SQL In
  • Set membership operator
  • Another way to achieve the previous request
  • SELECT name FROM usersWHERE user_ID IN(SELECT
    user_ID FROM ordersWHERE product_ID IN (SELECT
    product_IDFROM products WHERE name carrot))

107
SQL In
  • Notice the nesting of SELECT statements
  • In general, the IN subquery works only if the
    relation returned is the same attribute
  • i.e. belongs to the same table
  • Is a foreign key to the table
  • Orders.user_ID is a foreign key referring to
    users.user_ID

108
SQL Like
  • To perform a pattern match search in character
    fields
  • SELECT FROM users WHERE name LIKE A
  • Two pattern matching symbols
  • represents any sequence of zero or more
    characters
  • _ (underscore) represents a single character

109
Inserting New Data
  • Syntax
  • INSERT INTO table_name (column_names) VALUES
    (values)
  • Example
  • INSERT INTO users (user_ID, name, credit_card)
    VALUES (4, David, 234-55-6776)
  • Can drop the explicit column references if they
    will be supplied in order defined

110
SQL Update
  • Modifies existing data
  • UPDATE table_name SET column_name some_value
    WHERE where_clause
  • The most dangerous command in SQL!
  • Example
  • UPDATE users SET credit_card 122-12-1212
    WHERE user_ID 1

111
Delete
  • Syntax
  • DELETE FROM table_name WHERE where_clause
  • Example
  • DELETE FROM products WHERE product_ID 1234
  • Would this command succeed?
  • Depends upon the configuration of the database

112
Normalization
  • Process of creating small stable data structures
    from complex groups of data
  • Relations that reduce data redundancy are created
    so the insertion, deletion and update operations
    can be safely done.
  • EXAMPLE
  • SP (S, SNAME, STATUSCITY, QTY, P, PNAME, COLOR,
    WEIGHT)
  • Can be decomposed to
  • S (S, SNAME, STATUSCITY)
  • P (P, PNAME, COLOR, WEIGHT)
  • SP (S, P, QTY)

113
An Unnormalized Relation for ORDER
114
Normalized Tables Created from ORDER
115
Database Design
  • Think about how data is related
  • Entity-Relation diagrams
  • Think about how queries will be executed
  • Avoid repeated data
  • Changing the structure of tables
  • ALTER Adds/Deletes columns, constraints, defaults
  • Cannot change field sizes

116
Triggers
  • Trigger procedure that starts automatically if
    specified changes occur to the DBMS
  • Three parts
  • Event (activates the trigger)
  • Condition (tests whether the triggers should run)
  • Action (what happens if the trigger runs)
  • Triggers (in some form) are supported by most
    DBMSs Assertions are not.
  • Support for triggers is defined in the SQL1999
    standard.

117
Triggers Example
  • CREATE TRIGGER member_delete
  • ON member FOR DELETE
  • AS
  • IF (Select COUNT () FROM loan INNER JOIN deleted
  • ON loan.member_no deleted.member_no) gt 0
  • BEGIN
  • PRINT ERROR - member has books on loan.
  • ROLLBACK TRANSACTION
  • END
  • ELSE
  • DELETE reservation WHERE reservation.member_no
    deleted.member_no

118
Using SQL
  • Interactively
  • Via a command line program
  • Embedded SQL
  • Embed complete SQL statements in code
  • Precompiler modifies the source code
  • Dynamic SQL
  • Embedded in scripts or other programs
  • Dynamically interpreted
  • Variables bound at run time

119
What is Embedded SQL?
  • The standard SQL embedded in general purpose
    programming languages such as Pascal, Fortran,
    Cobol, C and C.
  • A language in which SQL queries are embedded is
    referred to as a host language, and the SQL
    structures permitted in the host language
    constitute embedded SQL.

120
Why Embedded SQL?
  • There exist queries that can be expressed in a
    language such as C, Cobol, or Fortran that can
    not be expressed in SQL.
  • Non-declarative actions-such as printing a
    report, interacting with a user, or sending the
    results of a query to a graphical user
    interface-cannot be done from within SQL.

121
How embedded SQL works?
  • An embedded SQL program must be processed by a
    special preprocessor prior to compilation.
  • Embedded SQL requests are replaced with
    host-language declarations and procedure calls
    that allow run-time execution of the database
    accesses.
  • Then, the resulting program is compiled by the
    host-language compiler.

122
Embedding SQL in C An Example
  • char SQLSTATE6
  • EXEC SQL BEGIN DECLARE SECTION
  • char c_sname20 short c_minrating float c_age
  • EXEC SQL END DECLARE SECTION
  • c_minrating random()
  • EXEC SQL DECLARE sinfo CURSOR FOR
  • SELECT S.sname, S.age FROM Sailors S
  • WHERE S.rating gt c_minrating
  • ORDER BY S.sname
  • EXEC SQL OPEN sinfo
  • do
  • EXEC SQL FETCH sinfo INTO c_sname, c_age
  • printf(s is d years old\n, c_sname, c_age)
  • while (SQLSTATE ! 02000)
  • EXEC SQL CLOSE sinfo

123
Database APIs alternative to embedding
  • Rather than modify compiler, add a library with
    database calls (API)
  • special procedures/objects
  • passes SQL strings from language, presents result
    sets in a language-friendly way
  • Microsofts ODBC becoming C/C standard on
    Windows
  • Suns JDBC a Java equivalent
  • For Perl there is DBI or oraPerl
  • Mostly DBMS-neutral (or at least they try to hide
    the complexities of dealing with different
    database systems).

124
Evolution of Database Technology
  • 1960s
  • Data collection, database creation, IMS and
    network DBMS
  • 1970s
  • Relational data model, relational DBMS
    implementation
  • 1980s
  • RDBMS, advanced data models (extended-relational,
    OO, deductive, etc.) and application-oriented
    DBMS (spatial, scientific, engineering, etc.)
  • 1990s2000s
  • Data mining and data warehousing, multimedia
    databases, and Web databases

125
Centralized database
  • Used by single central processor or multiple
    processors in client/server network
  • There are advantages and disadvantages to having
    all corporate data in one location.
  • Security is higher in central environments, risks
    lower.
  • If data demands are highly decentralized, then a
    decentralized design is less costly, and more
    flexible.

126
Distributed database
  • Databases can be decentralized either by
    partitioning or by replicating
  • Partitioned database Database is divided into
    segments or regions. For example, a customer
    database can be divided into Eastern customers
    and Western customers, and two separate databases
    maintained in the two regions.
  • Duplicated database The database is completely
    duplicated at two or more locations. The
    separate databases are synchronized in off hours
    on a batch basis.

127
Distributed Databases
128
Ensuring Data Quality
  • Corporate and government databases have
    unexpectedly poor levels of data quality.
  • National consumer credit reporting databases have
    error rates of 20-35.
  • 32 of the records in the FBIs Computerized
    Criminal History file are inaccurate, incomplete,
    or ambiguous.
  • Gartner Group estimates that consumer data in
    corporate databases degrades at the rate of 2 a
    month.

129
Ensuring Data Quality (Continued)
  • The quality of decision making in a firm is
    directly related to the quality of data in its
    databases.
  • Data Quality Audit Structured survey of the
    accuracy and level of completeness of the data in
    an information system
  • Data Cleansing Consists of activities for
    detecting and correcting data in a database or
    file that are incorrect, incomplete, improperly
    formatted, or redundant

130
Online Analytical Processing (OLAP)
  • Multidimensional data analysis
  • Supports manipulation and analysis of large
    volumes of data from multiple dimensions/perspecti
    ves

131
Multidimensional Data Model
132
Data warehouse
  • Supports reporting and query tools
  • Stores current and historical data
  • Consolidates data for management analysis and
    decision making

133
Components of a Data Warehouse
134
Data mart
  • Subset of data warehouse
  • Contains summarized or highly focused portion of
    data for a specified function or group of users

135
Benefits of Data Warehouses
  • Improved and easy accessibility to information
  • Ability to model and remodel the data

136
What Is Data Mining?
  • Data mining (knowledge discovery in databases)
  • Extraction of interesting (non-trivial, implicit,
    previously unknown and potentially useful)
    information or patterns from data in large
    databases
  • Alternative names and their inside stories
  • Data mining a misnomer?
  • Knowledge discovery(mining) in databases (KDD),
    knowledge extraction, data/pattern analysis, data
    archeology, data dredging, information
    harvesting, business intelligence, etc.
  • What is not data mining?
  • (Deductive) query processing.
  • Expert systems or small ML/statistical programs

137
Why Data Mining? Potential Applications
  • Database analysis and decision support
  • Market analysis and management
  • target marketing, customer relation management,
    market basket analysis, cross selling, market
    segmentation
  • Risk analysis and management
  • Forecasting, customer retention, improved
    underwriting, quality control, competitive
    analysis
  • Fraud detection and management
  • Other Applications
  • Text mining (news group, email, documents) and
    Web analysis.
  • Intelligent query answering

138
Market Analysis and Management
  • Where are the data sources for analysis?
  • Credit card transactions, loyalty cards, discount
    coupons, customer complaint calls, plus (public)
    lifestyle studies
  • Target marketing
  • Find clusters of model customers who share the
    same characteristics interest, income level,
    spending habits, etc.
  • Determine customer purchasing patterns over time
  • Conversion of single to a joint bank account
    marriage, etc.
  • Cross-market analysis
  • Associations/co-relations between product sales
  • Prediction based on the association information

139
Market Analysis and Management
  • Customer profiling
  • data mining can tell you what types of customers
    buy what products (clustering or classification)
  • Identifying customer requirements
  • identifying the best products for different
    customers
  • use prediction to find what factors will attract
    new customers
  • Provides summary information
  • various multidimensional summary reports
  • statistical summary information (data central
    tendency and variation)

140
Corporate Analysis and Risk Management
  • Finance planning and asset evaluation
  • cash flow analysis and prediction
  • contingent claim analysis to evaluate assets
  • cross-sectional and time series analysis
    (financial-ratio, trend analysis, etc.)
  • Resource planning
  • summarize and compare the resources and spending
  • Competition
  • monitor competitors and market directions
  • group customers into classes and a class-based
    pricing procedure
  • set pricing strategy in a highly competitive
    market

141
Fraud Detection and Management
  • Applications
  • widely used in health care, retail, credit card
    services, telecommunications (phone card fraud),
    etc.
  • Approach
  • use historical data to build models of fraudulent
    behavior and use data mining to help identify
    similar instances
  • Examples
  • auto insurance detect a group of people who
    stage accidents to collect on insurance
  • money laundering detect suspicious money
    transactions (US Treasury's Financial Crimes
    Enforcement Network)
  • medical insurance detect professional patients
    and ring of doctors and ring of references

142
Fraud Detection and Management
  • Detecting inappropriate medical treatment
  • Australian Health Insurance Commission identifies
    that in many cases blanket screening tests were
    requested (save Australian 1m/yr).
  • Detecting telephone fraud
  • Telephone call model destination of the call,
    duration, time of day or week. Analyze patterns
    that deviate from an expected norm.
  • British Telecom identified discrete groups of
    callers with frequent intra-group calls,
    especially mobile phones, and broke a
    multimillion dollar fraud.
  • Retail
  • Analysts estimate that 38 of retail shrink is
    due to dishonest employees.

143
Other Applications
  • Sports
  • IBM Advanced Scout analyzed NBA game statistics
    (shots blocked, assists, and fouls) to gain
    competitive advantage for New York Knicks and
    Miami Heat
  • Astronomy
  • JPL and the Palomar Observatory discovered 22
    quasars with the help of data mining
  • Internet Web Surf-Aid
  • IBM Surf-Aid applies data mining algorithms to
    Web access logs for market-related pages to
    discover customer preference and behavior pages,
    analyzing effectiveness of Web marketing,
    improving Web site organization, etc.

144
Data Mining A KDD Process
Knowledge
  • Data mining the core of knowledge discovery
    process.

Pattern Evaluation
Data Mining
Task-relevant Data
Data Warehouse
Selection
Data Cleaning
Data Integration
Databases
145
Steps of a KDD Process
  • Learning the application domain
  • relevant prior knowledge and goals of application
  • Creating a target data set data selection
  • Data cleaning and preprocessing (may take 60 of
    effort!)
  • Data reduction and transformation
  • Find useful features, dimensionality/variable
    reduction, invariant representation.
  • Choosing functions of data mining
  • summarization, classification, regression,
    association, clustering.
  • Choosing the mining algorithm(s)
  • Data mining search for patterns of interest
  • Pattern evaluation and knowledge presentation
  • visualization, transformation, removing redundant
    patterns, etc.
  • Use of discovered knowledge

146
Data Mining and Business Intelligence
Increasing potential to support business decisions
End User
Making Decisions
Business Analyst
Data Presentation
Visualization Techniques
Data Mining
Data Analyst
Information Discovery
Data Exploration
Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts
OLAP, MDA
DBA
Data Sources
Paper, Files, Information Providers, Database
Systems, OLTP
147
Architecture of a Typical Data Mining System
Graphical user interface
Pattern evaluation
Data mining engine
Knowledge-base
Database or data warehouse server
Filtering
Data cleaning and data integration
Data Warehouse
Databases
148
Databases and the Web
  • The Web and Hypermedia database
  • Organizes data as network of nodes
  • Links nodes in pattern specified by user
  • Supports text, graphic, sound, video, and
    executable programs

149
  • Database server
  • Computer in a client/server environment runs a
    DBMS to process SQL statements and perform
    database management tasks.
  • Application server
  • Software handling all application operations

150
Linking Internal Databases to the Web
151
Databases and the Web
  • The Web extends the database interface
  • Consider making a plane reservation
  • Typically involves multiple forms (choose flight,
    choose seat, choose payment method)
  • Information from the beginning may be invalid by
    the end

152
Keeping Track of Users
  • HTTP is stateless
  • Request-response paradigm
  • If we need to keep track of users between
    requests, we can
  • Embedding hidden fields with values in sequential
    forms
  • Cookies

153
Hidden Fields
  • What are some of the problems?
  • Only work in forms
  • Bookmarkable? If form uses GET, but not if POST
  • Can be manipulated

154
Cookies
  • Two HTTP headers
  • COOKIE
  • Used by browser sending cookie info back to
    server
  • SET-COOKIE2
  • Used by server to request that a cookie be stored
  • Associated with hosts
  • Typically one line/cookie in a text file
  • Body contains keyvalue pairs

155
Cookie Format
  • The header includes a line
  • Set-Cookie2 NAMEVALUE ( attributevalues)
  • Attributes include
  • Comments
  • Discard the cookie when client quits
  • Maximum age
  • Domain, path, and port of relevance

156
Cookie Example
  • Netscape stores in a MagicCookie file
  • .ncsu.edu TRUE / FALSE 2051222363 SITESERVER
    IDbd2f6b1d2e2eed30af221fe4a64eb077

157
What Can You Do With a Cookie?
  • Can store data on client,
  • Store a user ID
  • Store a transaction ID
  • Store preferences on client machine
  • A server can store more than one cookie (using
    different names)

158
When is the Data Returned?
  • Whenever the client is sending a request to a
    server and
  • The host, URL, and port match an existing cookie
  • The cookie has not expired
  • It should include the cookie data in the request
    header

159
ACID Properties For Web Transactions?
  • Awkward, at best
  • Must be carefully managed by the developers
  • unlike databases, in which it is a part of the
    system

160
Dynamic SQL Basics
  • Typically
  • Prepare phase
  • Parses, validates, optimizes the query
  • Creates an execution plan
  • Execute Phase
  • Executes the query
  • Can be called repeatedly once prepared
  • Syntax varies

161
Dynamic SQL Example
  • Using VBScript and ASP, ActiveX Data Object
  • ltSQLCAR"SELECT FROM tableset conn
    server.createobject("ADODB.Connection")conn.open
    database_nameset carsconn.execute(SQLCAR)gt

162
Cursors
  • Allow application to access an arbitrary number
    of rows
  • FETCH to retrieve the next row
  • Placed in a loop to retrieve next row
  • Until an SQLCODE of Not Found is returned

163
TPC
164
TPC-C
  • TPC-C simulates a complete computing environment
    where a population of users executes transactions
    against a database.
  • The benchmark is centered around the principal
    activities (transactions) of an order-entry
    environment.
  • These transactions include entering and
    delivering orders, recording payments, checking
    the status of orders, and monitoring the level of
    stock at the warehouses.
  • TPC-C performance is measured in new-order
    transactions per minute.  The primary metrics are
    the transaction rate (tpmC), the associated price
    per transaction (/tpmC), and the availability
    date of the priced configuration.

165
Management Opportunities
  • Business firms have exceptional opportunities to
    exploit modern relational database technologies
    to improve decision making, and to increase the
    efficiency of their business processes.

166
Management Challenges
  • Organizational obstacles to a database
    environment
  • Need for cooperation in developing corporate-wide
    data administration
  • Cost/benefit considerations
  • Bringing about significant change in the
    database environment of a firm can be very
    expensive and time consuming.

167
Solution Guidelines
  • The critical elements for creating a database
    environment are
  • Data administration
  • Data-planning and modeling methodology
  • Database technology and management
  • Users

168
Key Organizational Elements in the Database
Environment
Write a Comment
User Comments (0)