Title: Database
 1Database
- An introduction to database 
- concepts and vocabulary
2Ubiquity of Databases
- Mauchly and Eckert designed the ENIAC to perform 
 calculations (shell trajectories).
- After the ENIAC was built, it was used to do 
 thermonuclear chain reaction calculations.
- But when Mauchly and Eckert went into business, 
 their first customer was the census bureau.
- And ever since computers have played an important 
 role in filing and record keeping.
- Suffice it to say 
- Databases are very important. 
- Databases are all around us. 
3Mauchly  Eckert and early programmers of the 
ENIAC 
 4What do we want?
- Desired Features of our database 
- Storage 
- We want to store data efficiently, have it 
 centralized (or at least seemingly centralized).
- Centralization (integration) are subject to 
 bottlenecks and single-point-of-failure issues.
- Retrieval 
- We want to have the data at our fingertips when 
 we want it.
- Querying 
- We want to ask various questions about the data 
 (and get answers in a timely manner).
- (These desires are to some extent in conflict.)
5Automating the procedure
- We would like to have the computer perform the 
 tedious aspects of such tasks.
- An outdated approach would be to use a file-based 
 system, that is, to have the data stored in
 various (flat, simple text) files and write a
 program that reads the files, parses the
 information, does the required searching,
 sorting, correlating, etc.
6Entities
- Even in the file-based approach, one must 
 identify units of information that will be
 contained in a single file. These are known as
 the entities.
- An entity is somewhat similar to an object in 
 programming, it collects data that belongs
 together in some immediate way.
- Entities also separate the data into distinct 
 units.
- Database entities often reflect real 
 objects/entities (persons, buildings, courses,
 etc.)
7Fields
- The lower-level pieces of data gathered together 
 to form an entity are known as fields or
 attributes or properties.
- The Person entity might consist of fields like 
 FirstName, LastName, JobType, SocSecNum, etc.
- Fields are analogous to properties of an object. 
- Fields have a type (Text, Number, Yes/No, Memo, 
 Date/Time, etc.) which indicate how the
 information is to be stored and interpreted.
8Relationship
- The various entities may be distinct, but they 
 are not completely disconnected.
- E.g. a Customer places an order 
- An association between two entities is known as a 
 relationship.
- The Customer-places-Order relationship was 
 realized in Access by using the Lookup Wizard to
 ensure that the two tables had a common field
 (CustomerID).
9ER Diagram
- One can visualize the entities and their 
 relationship using an Entity-Relationship (ER)
 diagram.
- The entities are represented by rectangles. 
- The relationships are represented by arrows 
 between the rectangles.
- The arrow may include a verb to capture the 
 nature of the relationship (as well as other
 notations).
10ER Diagram Example
Customer
CustomerID CustomerFirstName 
Places
Order
OrderID ShippingCost 
Is part of
Item
ItemID ItemDescription  
 11File-based Systems
- In a file-based approach, there would be a file 
 corresponding to each entity.
- (There may be more files than entities since some 
 relationships are realized through their own
 tables/files.)
- These files must be located, read, parsed. The 
 data is then used to initialize some variables
 and/or objects which are then analyzed (searched,
 sorted, etc.) by the remainder of the program.
12Details, details
- The programmer must have information about the 
 data files. For example
- where they are to be found 
- the order in which the fields occur 
- the length of the fields and/or the delimiter 
 used
- Changing the length of a field or adding a field 
 may require that all of the corresponding
 programs be rewritten.
- Such features of the file-based approach are 
 called program-data dependence.
13Automating the Automation
- Since 
- Reading is reading 
- Parsing is parsing 
- Searching is searching 
- Sorting is sorting 
- Why have programmers continually repeating these 
 tedious tasks?
- Automate and/or generalize the process. 
- These are some of the aspects of a database 
 management system (DBMS).
14Specific Info in Database 
- The generalized routines for reading, parsing, 
 searching, sorting etc. are in the DBMS.
- But information specific to a particular case 
 (number of fields, their type, size and so on) is
 still required. This data is placed together with
 the actual data in the database.
15Meta-data
- This data about the data is known as meta-data. 
- Meta a prefix meaning after, along with or 
 beyond
- The meta-data describes the actual data, and so 
 databases are sometimes called self-describing.
- Related terms include data dictionary, system 
 catalog and schema.
16Meta-data Open a database file in Notepad 
Some actual data
One can see theres more to this file than just 
customer data. 
 17Layers
- The inclusion of the meta-data (the 
 self-describing aspect of a database) allows a
 separation of the data from the processing,
 providing program-data independence.
- Another way to say this is that there is a 
 separation between the database (specific) and
 the database management system (generic).
18Database/DBMS Distinction 
Database Raw-data and meta-data
DBMS
User
Application
Users and applications interact with a database 
only through the DBMS. 
 19Pros of Database Approach
- Control of data redundancy 
- Data consistency 
- More info from same data 
- Sharing of data 
- Improved data integrity 
- Improved security 
- Enforcement of standards 
- Economy of scale 
- Balancing of conflicting requirements 
- Improved accessibility and responsiveness 
- Improved maintenance through data independence 
- Increased concurrency 
- Improved backup and recovery services 
20Pros 
- Control of data redundancy and consistency 
- If the same data is entered more than once, it is 
 said to be redundant.
- An obvious point is that this wastes space. 
- If the data is updated, it must be updated in 
 several places or the data will be inconsistent.
- Relationships are realized through repeated data, 
 but one tries to use something like an ID (a
 name might change but an ID does not have to).
- (Redundancy reduction and query simplicity can be 
 at odds, sometimes one sacrifices redundancy in
 order to make querying easier, e.g. in data
 mining. )
21Pros (Cont.)
- More information, sharing of data and 
 standardization
- Because databases facilitate querying, they can 
 yield more information.
- A database approach often centralizes 
 (integrates) the records of different
 departments, making more (raw) data and
 information available to the users
- Integration often leads to standardization, 
 consistent naming schemes, consistent report
 formats, etc.
22Pros (Cont.)
- Improved data integrity 
- An old computing axiom says garbage in, garbage 
 out (GIGO). If the raw data is bad, so too is the
 resulting information.
- In the database approach, one can apply 
 constraints to help ensure that the data is
 reliable.
- Accesss lookup table for the foreign keys is an 
 example. A foreign key is supposed to match an
 entry from another table, the lookup table helps
 ensure that.
- We also saw that we could Enforce Referential 
 Integrity.
- We also mentioned masks, which are another 
 integrity check.
23Pros (Cont.)
- Improved security 
- Part of the meta-data can be used to authenticate 
 users who are allowed to access the data.
- Different users may have different access 
- Data is often not entered directly into a Table 
 using the DataSheet but by using Views and/or
 Forms, which can hide sensitive data from certain
 users.
24Pros (Cont.)
- Economy of scale 
- A benefit of an organization centralizing 
 (integrating) its record-keeping efforts is the
 money applied to individual departments is
 pooled.
- Not only is duplication of effort reduced or 
 eliminated, but so too is duplication of hardware
 and software.
- Balancing of conflicting requirements 
- Integration can lead to a resolution or at least 
 a balancing of different departments, which may
 have conflicting goals.
25Pros (Cont.)
- Improved data accessibility and responsiveness 
 and increased productivity
- Because nitty-gritty details (reading, parsing, 
 sorting, searching, etc.) are built into the
 DBMS, the database staff work at a higher level
 closer to the users, responding to their
 particular needs.
- Again with fewer details to attend to, more work 
 can be accomplished.
26Pros (Cont.) 
- Improved maintenance through data independence 
- Change of a fields type or size or introduction 
 of a new field changes only the database and not
 the DBMS.
- This layering yields independence which 
 simplifies maintenance. Changing the database
 does not require changing the DBMS, which was not
 the case in the file-based approach.
27Pros (Cont.)
- Increased concurrency 
- The DBMS can handle multi-users using and even 
 updating the database.
- There are built-in mechanisms to prevent two 
 users from changing the data in conflicting ways.
 
- Improved backup and recovery services 
- Backing up and recovering the database may be 
 handled by the DBMS (that is, they are
 integrated services) rather than externally.
28Cons of Database Approach
- Complexity 
- Size 
- Cost of DBMS 
- Additional hardware costs 
- Cost of conversion 
- Performance 
- Higher impact of failure
29Cons
- Complexity and Size 
- Because so many features have been integrated 
 into the DBMSs, they have become complicated
 software packages. One must understand these
 features to utilize them properly.
- Integrating information from various departments 
 makes the database more complicated. Good design
 is crucial.
- Integration of features into the DBMS and data 
 into the database means that both may become
 quite large.
30Cons (Cont.)
- Cost OF DBMSs and the hardware 
- Again the size and complexity of the software 
 means that such packages are expensive.
- The larger, more complex software requires more 
 powerful hardware to run on.
- It also requires a knowledgeable, well-trained 
 (hopefully high paid) staff.
31Cons (Cont.)
- Conversion cost 
- Legacy system 
- Performance 
- More complexity may slow down some tasks. 
- Higher impact of failure. 
- Integrating (centralizing) the information can 
 mean that everything is lost at once.
32Things in the database environment
- In addition to the data, theres 
- Hardware that stores and manipulates the data 
- Software to 
- Interface with the hardware 
- (actually the operating system which 
 interfaces with the BIOS which interfaces
 with the hardware)
- Provide the data with structure 
- Interface with the user and/or applications 
 
- People
33Hardware
- Could be 
- A single PC 
- A mainframe and terminals 
- A network of computers
34A scenario
Database
Database Server
Network Server
Network Server
Network Server
Client
Client
Client
Client
Client
Client 
 35Client-Server
- The client-server model is a way for transactions 
 to take place.
- The transaction is viewed as a service. 
- The client requests the service. 
- The server provides the service. 
- For example, to query a networked database 
- A client would request the network server(s) to 
 connect it to the database server
- The database server queries the database 
- The result is passed from database server to 
 network server to client.
- The client-server terminology can be applied to 
 both software and hardware.
36Front-end and Back-end
- In large-scale client-server interaction, there 
 may be many intermediate client-server
 interactions (e.g. the network servers become
 clients of the database server).
- The software and hardware near the beginning of 
 the transaction (initial request) is called
 front-end while that near the ultimate providing
 of the service is known as back-end.
-  In the analogy of getting a meal at a 
 restaurant, the waiter is front-end and the cook
 is back-end.
37Software
- The bulk of the software is contained in the 
 database management system (DBMS). It handles
 everything from storage and structure to security
 and integrity.
- There may also be application software that 
 interfaces with the DBMS.
- The DBMS allows one to interface with the 
 database on a higher level.
38Prescriptive vs. Descriptive
- In a file-based approach, ones program is a 
 step-by-step procedure explicitly determining how
 a question will be answered
- Read this file, parse it this way, create these 
 objects, search them this way
- This approach is sometimes called prescriptive 
- Prescription originally meant a set of 
 instructions for preparing and/or taking a drug,
 only later did the word become synonymous with
 the drug itself.
39Prescriptive vs. Descriptive (Cont.)
- In the database approach, most of the 
 nitty-gritty, step-by-step instructions are
 hidden in the DBMS and the user need only
 describe the data (the meta-data, the
 self-describing database) and describe what he or
 she wants from the data.
- This approach is sometimes called descriptive.
40Language Generations
- People talk about generations of programming 
 languages or the level of a language.
- A first generation language (1GL) is machine 
 code, that is, a binary representation of
 instructions (e.g. 11001101)
- A second generation language (2GL) is assembly 
 language, that is, mnemonics for machine code
 (e.g. STA 13)
- A third generation language (3GL) is a high-level 
 language, which includes most compiled languages,
 such as Fortran, C, BASIC, Java, etc. (e.g. int
 a  13)
- A fourth generation language (4GL) is used to 
 develop database applications. They are designed
 to be closer to natural language.
41SQL
- SQL (Structured Query Language), pronounced S-Q-L 
 or See-Quel, has become the standard language for
 relational databases.
- SQL is part third generation and part fourth 
 generation.
SQL
SQL
SQL
SQL
SQL 
 42SQL is the sequel to SEQUEL
- The original version was called SEQUEL and was 
 developed at IBM in the mid-70s.
- However, Oracle Corporation was the first company 
 to use SQL in a commercial product in 1979.
43Whats it made of?
- SQL has 3 components 
- Data Definition Language (DDL) 
- The part that allows you to establish the 
 structure of the database
- Data Manipulation Language (DML) 
- The part that allows you to enter data, update 
 data and ask questions of the data (queries)
- Data Control Language (DCL) 
- The part that allows you to add security features 
 (e.g. user authentication), concurrency
 (multi-user) features, recovery features, etc.
44The People Whos involved with this database 
anyway?
- Data Administrator (DA) 
- Oversees data resources. 
- More of a hands-off role. 
- Deals with other managers. 
- Sets policies. 
- Handles budgets. 
- Plans for future. 
45Whos involved with this database anyway? (Cont.)
- Database administrator (DBA) 
- More hands-on and more technical than the Data 
 Administrator (DA)
- Oversees hardware and software design, 
 implementation and maintenance
-  Responsible for security and integrity 
- Ensures users have appropriate accessibility 
- Etc. 
46Whos involved with this database anyway? (Cont.)
- Database Designer 
- Logical database designer 
- Identifies entities, fields, relationships 
- Applies high-level constraints including the 
 business rules
- E.g. A Simpsons database might have a business 
 rule that there must be between 10 and 30
 episodes in a complete season
- Physical database designer 
- Actually creates tables 
- Implements constraints 
- Introduces security measures 
- Etc. 
47Whos involved with this database anyway? (Cont.)
- Application Developer 
- After the overall structure of the database is 
 laid out and implemented, the application
 developer considers the more individual needs,
 such as what software does the payroll department
 need
- Application may involve third-generation or 
 fourth generation languages or a combination
- E.g. a Visual Basic program could use an SQL 
 statement to query a database
48Whos involved with this database anyway? (Cont.)
- End-Users 
- Naïve 
- Has little to no database knowledge 
- Uses applications that simplify interaction with 
 the database
- Cashier scanning an items barcode 
- Sophisticated 
- Knows something to a lot about databases 
- May use SQL to update or query database
49References
- Database Systems Rob and Coronel 
- Database Systems, Connolly and Begg 
- SQL for Dummies, Taylor 
- http//www.metacard.com/wp1a.html 
- http//www.oracle.com/glossary/index.html?axx.html
 
- Concepts of Database Management, Pratt and Adamski