The Software Infrastructure for Electronic Commerce - PowerPoint PPT Presentation

About This Presentation
Title:

The Software Infrastructure for Electronic Commerce

Description:

... Store information (called a cookie) at the browser of the ... Cookies have expiration dates. The secrets are in the (name=value) pairs (usually encrypted) ... – PowerPoint PPT presentation

Number of Views:72
Avg rating:3.0/5.0
Slides: 82
Provided by: johanne46
Category:

less

Transcript and Presenter's Notes

Title: The Software Infrastructure for Electronic Commerce


1
The Software Infrastructurefor Electronic
Commerce
  • Databases and Data Mining
  • Lecture 1 A Managers View of Database
    Technology
  • Johannes Gehrke
  • johannes_at_cs.cornell.edu
  • http//www.cs.cornell.edu/johannes

2
Goal Of Lectures One and Two
  • Understand the basics of modern data management
  • DBMS technology
  • Data models
  • Transaction processing versus decision support
  • An data management architecture of ane-business

3
Goals of the DBMS Lectures (Contd.)
  • Gain technical understanding to feel confident to
    ask questions.
  • Learn to ask the right questions.

4
The Big Picture
WWW SiteVisitor
Internal User
INTRANET,VPN
THE WEB
InternalWeb Server
MainMemory Cache
Public Web Server
DataWarehouseApplicationServer
BusinessTransactionServer
DBMS
5
Web Pages with Database Contents
  • Web pages contain the results of database
    queries. How do we generate such pages?
  • Web server creates a new process for a program
    interacts with the database.
  • Web server communicates with this program via
    some data exchange protocol
  • Program generates result page with content from
    the database

6
Business Transaction Server
  • Application server Piece of software between the
    web server and the applications
  • Functionality
  • Hold a set of pre-forked threads or processes for
    performance
  • Database connection pooling (reuse a set of
    existing connections)
  • Integration of heterogeneous data sources
  • Transaction management involving several data
    sources
  • Session management

7
Other Server-Side Processing
  • Java Servlets Java programs that run on the
    server and interact with the server through a
    well-defined API
  • JavaBeans Reusable software components written
    in Java
  • Java Server Pages and Active Server Pages Code
    inside a web page that is interpreted by the web
    server

8
What Happens If You Click On A Link?
www.company.com
1
Company web serverwww.company.com
Johannes Gehrke
2
Banner web serverwww.banners.com
3
Additional log serverwww.mycookies.com
Hidden link
4
SERVER SIDE
CLIENT SIDE
9
Web Server Log Data
  • Common Log Formathost ident authuser date
    request status bytes
  • host domain name or IP address of the client
  • date date and time of the request
  • request the request line from the client
  • status three digit status code returned to the
    client
  • bytes number of bytes in the object returned to
    the client

10
Web Server Log Data (Contd.)
  • Usually, even more data available
  • URL of the referring server
  • Name and version of the browser
  • Name of the file requested
  • Time to serve the request
  • IP address of the requesting browser
  • Cookie

11
Cookies
  • The communication protocol (HTTP) between your
    browser and the web server is stateless. (Compare
    to a vending machine.)
  • Remedy Store information (called a cookie) at
    the browser of the user
  • Example cookie (from www.google.com)
  • PREFID3415aaaf73b7bfe3,TM956724506google.com/
    0261887833632111634266272280029339450(namev
    aluedomainsecureexpiration dateexpiration
    timelast used datelast used time)

12
Cookies (Contd.)
  • A cookie is always associated with a specific
    domain (amazon.com, bn.com, preferences.com,
    doubleclick.net, cornell.edu)
  • Cookies have expiration dates
  • The secrets are in the (namevalue) pairs
    (usually encrypted)PREFID3415aaaf73b7bfe3,TM95
    6724506
  • Cookies have their own life on your computer
  • \Documents and Settings\UserName\Cookies,\Windows
    \Cookies\Windows\Profiles\UserName\Cookies\Progr
    amFiles\Netscape\Users\Default\cookies.txt

13
Cookies (Contd.)
  • Applications of cookies
  • Shopping carts
  • Log in once (example New York Times)
  • Personalization (amazon.com Welcome back,
    Johannes)
  • General tracking of user behavior
  • User privacy
  • Other personalization/tracking techniques Hidden
    fields in html pages, unique page names
  • Online demonstration

14
And Then You Click Purchase
  • (Simplified process)
  • Insert customer data into the database/check
    customer data
  • Check order availability
  • Insert order data into the database
  • Return order confirmation number to the customer
  • All this data is stored in a database system
    (DBMS).

15
Why Store Data in a DBMS?
  • Benefits
  • Transactions (concurrent data access, recovery
    from system crashes)
  • High-level abstractions for data access,
    manipulation, and administration
  • Data integrity and security
  • Performance and scalability

16
Transactions
  • A transaction is an atomic sequence of database
    actions (reading, writing or updating a database
    object).
  • Each transaction must leave the DB in a
    consistent state (if DB is consistent when the
    transaction starts).
  • The ACID Properties
  • Atomicity
  • Consistency
  • Isolation
  • Durability

17
Example Transaction Online Store
  • Your purchase transaction
  • Atomicity Either the complete purchase happens,
    or nothing
  • Consistency The inventory and internal accounts
    are updated correctly
  • Isolation It does not matter whether other
    customers are also currently making a purchase
  • Durability Once you have received the order
    confirmation number, your order information is
    permanent, even if the site crashes

18
Transactions (Contd.)
  • A transaction will commit after completing all
    its actions, or it could abort (or be aborted by
    the DBMS) after executing some actions.

19
Example Transaction ATM
  • You withdraw money from the ATM machine
  • Atomicity
  • Consistency
  • Isolation
  • Durability
  • Commit versus Abort?
  • What are reasons for commit or abort?

20
Concurrency Control
  • (Start A100 B100)
  • Consider two transactions
  • T1 START, AA100, BB-100, COMMIT
  • T2 START, A1.06A, B1.06B, COMMIT
  • The first transaction is transferring 100 from
    Bs account to As account. The second
    transaction is crediting both accounts with a 6
    interest payment.

21
Example (Contd.)
  • (Start A100 B100)
  • Consider a possible interleaving (schedule)
  • T1 AA100, BB-100 COMMIT
  • T2 A1.06A, B1.06B COMMIT
  • End result A106 B0
  • Another possible interleaving
  • T1 AA100, BB-100 COMMIT
  • T2 A1.06A, B1.06B COMMIT
  • End result A106 B6
  • The second interleaving is incorrect! Concurrency
    control of a database system makes sure that the
    second schedule does not happen.

22
Ensuring Atomicity
  • DBMS ensures atomicity (all-or-nothing property)
    even if the system crashes in the middle of a
    transaction.
  • Idea Keep a log (history) of all actions carried
    out by the DBMS while executing
  • Before a change is made to the database, the
    corresponding log entry is forced to a safe
    location.
  • After a crash, the effects of partially executed
    transactions are undone using the log.

23
Recovery
  • A DBMS logs all elementary events on stable
    storage. This data is called the log.
  • The log contains everything that changes data
    Inserts, updates, and deletes.
  • Reasons for logging
  • Need to UNDO transactions
  • Recover from a systems crash

24
Recovery Example
  • (Simplified process)
  • Insert customer data into the database
  • Check order availability
  • Insert order data into the database
  • Write recovery data (the log) to stable storage
  • Return order confirmation number to the customer

25
Tips
  • Transactions, concurrency control, and recovery
    are important aspects of the functionality of a
    database system
  • Tips for capacity planning
  • Load influences level of concurrency ?Determines
    hardware requirements
  • Insufficient resources for concurrency and
    recovery can force serialization of transactions
    ? Bad performance
  • Need ample space for the log, often mirrored onto
    two disks at the same time

26
Why Store Data in a DBMS?
  • Benefits
  • Transactions (concurrent data access, recovery
    from system crashes)
  • High-level abstractions for data access,
    manipulation, and administration
  • Data integrity and security
  • Performance and scalability

27
Data Independence
  • Applications should be insulated from how data is
    structured and stored. Thus the DBMS needs to
    provide high-level abstractions to applications!

View 1
View 2
View 3
Conceptual Schema
Physical Schema
28
Data Model
  • A data model is a collection of concepts for
    describing data.
  • Examples
  • ER model (used for conceptual modeling)
  • Relational model, object-oriented model,
    object-relational model (actually implemented in
    current DBMS)

29
The Relational Data Model
  • A relational database is a set of relations.
    Turing Award (Nobel Price in CS) for Codd in 1980
    for his work on the relational model
  • Example relationCustomers(cid integer, name
    string, byear integer, state string)

30
The Relational Model Terminology
  • Relation instance and schema
  • Field (column)
  • Record or tuple (row)
  • Cardinality

31
Customer Relation (Contd.)
  • In your enterprise, you are more likely to have a
    schema similar to the followingCustomers(cid,
    identifier, nameType, salutation, firstName,
    middleNames, lastName, culturalGreetingStyle,
    gender, customerType, degrees, ethnicity,
    companyName, departmentName, jobTitle,
    primaryPhone, primaryFax, email, website,
    building, floor, mailstop, addressType,
    streetNumber, streetName, streetDirection, POBox,
    city, state, zipCode, region, country,
    assembledAddressBlock, currency, maritalStatus,
    bYear, profession)

32
Product Relation
  • Relation schemaProducts(pid integer, pname
    string, price float, category string)
  • Relation instance

33
Transaction Relation
  • Relation schemaTransactions( tid
    integer, tdate date, cid integer, pid
    integer)
  • Relation instance

34
TIPS
  • Any enterprise has an abundance of different
    relations in its DBMS. Good management of this
    meta-data is crucial
  • Documentation
  • Evolution
  • Assignment of responsibilities
  • Security
  • (ERP packages usually create several thousand
    relations)

35
The Relational DBMS Market
36
The Relational DBMS Market (Contd.)
37
The Relational DBMS Market (Contd.)
38
The Object-Oriented Data Model
  • Richer data model. Goal Bridge impedance
    mismatch between programming languages and the
    database system.
  • Example components of the data model
    Relationships between objects directly as
    pointers.
  • Result Can store abstract data types directly in
    the DBMS
  • Pictures
  • Geographic coordinates
  • Movies
  • CAD objects

39
Object-Oriented DBMS
  • Advantages Engineering applications (CAD and CAM
    and CASE computer aided software engineering),
    multimedia applications.
  • Disadvantages
  • Technology not as mature as relational DMBS
  • Not suitable for decision support, weak security
  • Vendors are much smaller companies and their
    financial stability is questionable.

40
Object-Oriented DBMS (Contd.)
  • Vendors
  • Gemstone (www.gemstone.com)
  • Objectivity (www.objy.com)
  • Object Design (www.odi.com)
  • POET (www.poet.com)
  • Versant (www.versant.com)
  • Organizations
  • OMDG Object Database Management
    Group(www.omdg.org)
  • OMG Object Management Group (www.omg.org)

41
The OO DBMS Market
42
Object-Relational DBMS
  • Mixture between the object-oriented and the
    object-relational data model
  • Combines ease of querying with ability to store
    abstract data types
  • Conceptually, the relational model, but every
    field
  • All major relational vendors are currently
    extending their relational DBMS to the
    object-relational model

43
Query Languages
  • We need a high-level language to describe and
    manipulate the data
  • Requirements
  • Precise semantics
  • Easy integration into applications written in
    C/Java/Visual Basic/etc.
  • Easy to learn
  • DBMS needs to be able to efficiently evaluate
    queries written in the language

44
Relational Query Languages
  • The relational model supports simple, powerful
    querying of data.
  • Precise semantics for relational queries
  • Efficient execution of queries by the DBMS
  • Independent of physical storage

45
SQL Structured Query Language
  • Developed by IBM (System R) in the 1970s
  • ANSI standard since 1986
  • SQL-86
  • SQL-89 (minor revision)
  • SQL-92 (major revision, current standard)
  • SQL-99 (major extensions)

46
Why Store Data in a DBMS?
  • Benefits
  • Transactions (concurrent data access, recovery
    from system crashes)
  • High-level abstractions for data access,
    manipulation, and administration
  • Data integrity and security
  • Performance and scalability

47
Integrity Constraints
  • Integrity Constraints (ICs) Condition that must
    be true for any instance of the database.
  • ICs are specified when schema is defined.
  • ICs are checked when relations are modified.
  • A legal instance of a relation is one that
    satisfies all specified ICs.
  • DBMS should only allow legal instances.
  • Example Domain constraints.

48
Primary Key Constraints
  • A set of fields is a superkey for a relation if
    no two distinct tuples can have same values in
    all key fields.
  • A set of fields is a key if the set is a
    superkey, and none of its subsets is a superkey.
  • Example
  • cid, name is a superkey for Customers
  • cid is a key for Customers
  • Where do primary key constraints come from?

49
Primary Key Constraints (Contd.)
  • Can there be more than one key for a relation?
  • What is the maximum number of superkeys for a
    relation?
  • What is the primary key of the Products relation?
    How about the Transactions relation?

50
Foreign Keys, Referential Integrity
  • Foreign key Set of fields in one relation that
    is refers to a unique tuple in another relation.
    (The foreign key must be a superkey of the second
    relation.)
  • Example The field cid in the Transactions
    relation is a foreign key referring to Customers.
  • If all foreign key constraints are enforced, we
    say that referential integrity is achieved.
  • No dangling references.
  • Compare to links in HTML.

51
Foreign Keys Example
  • The pid field of the Transactions relation refers
    to the cid field of the Customer relation.

52
Enforcing Referential Integrity
  • What should be done if a Transaction tuple with a
    non-existent Customer id is inserted? (Reject
    it!)
  • What should be done if a Customer tuple is
    deleted?
  • Also delete all Transaction tuples that refer to
    it.
  • Disallow deletion of a Customer tuple that has
    associated Transactions.
  • Set cid in Transactions tuples to a default or
    special cid.
  • SQL supports all three choices

53
Where do ICs Come From?
  • ICs are based upon the semantics of the
    real-world enterprise that is being described in
    the database relations.
  • We can check a database instance to see if an IC
    is violated, but we can NEVER infer that an IC is
    true by looking at an instance.
  • An IC is a statement about all possible
    instances!
  • From example, we know state cannot be a key, but
    the assertion that cid is a key is given to us.
  • Key and foreign key ICs are very common a DBMS
    supports more general ICs.

54
Security
  • Secrecy Users should not be able to see things
    they are not supposed to.
  • E.g., A student cant see other students grades.
  • Integrity Users should not be able to modify
    things they are not supposed to.
  • E.g., Only instructors can assign grades.
  • Availability Users should be able to see and
    modify things they are allowed to.

55
Discretionary Access Control
  • Based on the concept of access rights or
    privileges for objects (tables and views), and
    mechanisms for giving users privileges (and
    revoking privileges).
  • Creator of a table or a view automatically gets
    all privileges on it.
  • DMBS keeps track of who subsequently gains and
    loses privileges, and ensures that only requests
    from users who have the necessary privileges (at
    the time the request is issued) are allowed.
  • Users can grant and revoke privileges

56
Role-Based Authorization
  • In SQL-92, privileges are actually assigned to
    authorization ids, which can denote a single user
    or a group of users.
  • In SQL1999 (and in many current systems),
    privileges are assigned to roles.
  • Roles can then be granted to users and to other
    roles.
  • Reflects how real organizations work.
  • Illustrates how standards often catch up with de
    facto standards embodied in popular systems.

57
Security Is Important
  • Financial estimate of database losses, by
    activity in 1998
  • (ENT/CSI/FBI Computer Crime Survey, 5/1999)
  • Activity Loss in million
  • Theft of Proprietary Information 28.51
  • System Penetration by Outsiders 13.39
  • Financial Fraud 11.24
  • Unauthorized Insider Access 50.57
  • Laptop Theft 0.16
  • Total 103.87

58
Why Store Data in a DBMS?
  • Benefits
  • Transactions (concurrent data access, recovery
    from system crashes)
  • High-level abstractions for data access,
    manipulation, and administration
  • Data integrity and security
  • Performance and scalability

59
Why Parallel Access To Data?
At 10 MB/s 1.2 days to scan
1,000 x parallel 1.5 minute to scan.
1 Terabyte
Bandwidth
1 Terabyte
10 MB/s
Parallelism Divide a big problem into many
smaller ones to be solved in parallel.
60
Parallel DBMS Intro
  • Parallelism is natural to DBMS processing
  • Pipeline parallelism many machines each doing
    one step in a multi-step process.
  • Partition parallelism many machines doing the
    same thing to different pieces of data.
  • Both are natural in DBMS!

Any
Any
Sequential
Sequential
Pipeline
Program
Program
Sequential
Any
Any
Partition
Sequential
Sequential
Sequential
Sequential
Sequential
Program
Program
outputs split N ways, inputs merge M ways
61
DBMS The Success Story
  • DBMSs are the most (only?) successful application
    of parallelism.
  • Every major DBMS vendor has some server
  • Workstation manufacturers now depend on DB
    server sales.
  • Reasons for success
  • Bulk-processing ( partition -ism).
  • Natural pipelining.
  • Inexpensive hardware can do the trick!
  • Users/app-programmers dont need to think in

62
Some Terminology
Ideal
  • Speed-Up
  • More resources means proportionally less time for
    given amount of data.
  • Scale-Up
  • If resources increased in proportion to increase
    in data size, time is constant.

Xact/sec. (throughput)
degree of -ism
Ideal
sec./Xact (response time)
degree of -ism
63
Architecture Issue Shared What?
Hard to program Cheap to build Easy to scaleup
Easy to program Expensive to build Difficult to
scaleup
Oracle
Compaq, Teradata, SP2
Sybase
64
Distributed Database Systems
  • Data is stored at several sites, each managed by
    a DBMS that can run independently.
  • Distributed data independence Users should not
    have to know where data is located (extends
    physical and logical data independence
    principles).
  • Distributed transaction atomicity Users should
    be able to write transactions accessing multiple
    sites just like local transactions.

65
Types of Distributed Databases
  • Homogeneous Every site runs same type of DBMS.
  • Heterogeneous Different sites run different
    DBMSs (different RDBMSs or even non-relational
    DBMSs).

Gateway
DBMS1
DBMS2
DBMS3
66
Distributed DBMS Architectures
QUERY
  • Client-Server

CLIENT
CLIENT
Client ships query to single site. All
query processing at server. - Thin vs. fat
clients. - Set-oriented communication,
client side caching.
SERVER
SERVER
SERVER
SERVER
  • Collaborating-Server

SERVER
Query can span multiple sites.
SERVER
QUERY
67
DBMS and Performance
  • Efficient implementation of all database
    operations
  • Indexes Auxiliary structures that allow fast
    access to the portion of data that a query is
    about
  • Smart buffer management
  • Query optimization Finds the best way to execute
    a query
  • Automatic query parallelization

68
Performance Tips
  • Bad connection management is the number one
    scalability problem (can make two order of
    magnitude difference)
  • Reuse once-executed queries as much as possible
  • Physical performance database tuning is crucial
    for good performance
  • Performance tuning is an art and a science
  • Recent trend DBMS include tuning wizards for
    automatic performance tuning
  • Include space for indexes during space planning

69
Summary Of DBMS Benefits
  • Transactions
  • ACID properties, concurrency control, recovery
  • High-level abstractions for data access
  • Data models
  • Data integrity and security
  • Key constraints, foreign key constraints, access
    control
  • Performance and scalability
  • Parallel DBMS, distributed DBMS, performance
    tuning

70
The Future of Data Exchange Markup
  • From HTML to XML
  • Markup Languages HTML
  • Simple markup language
  • Text is annotated with language commands called
    tags, usually consisting of a start tag and an
    end tag

71
HTML Example Book Listing
  • ltHTMLgtltBODYgt
  • Fiction
  • ltULgtltLIgtAuthor Milan Kunderalt/LI?
  • ltLIgtTitle Identitylt/LIgt
  • ltLIgtPublished 1998lt/LIgt
  • lt/ULgt
  • Science
  • ltULgtltLIgtAuthor Richard Feynmanlt/LIgt
  • ltLIgtTitle The Character of Physical
    Lawlt/LIgt
  • ltLIgtHardcoverlt/LIgt
  • lt/ULgtlt/BODYgtlt/HTMLgt

72
Beyond HTML XML
  • Extensible Markup Language (XML) Extensible
    HTML
  • Confluence of SGML and HTML The power of SGML
    with the simplicity of HTML
  • Allows definition of new markup languages, called
    document type declarations (DTDs)

73
XML Language Constructs
  • Elements
  • Main structural building blocks of XML
  • Start and end tag
  • Must be properly nested
  • Element can have attributes that provide
    additional information about the element
  • Entities like macros, represent common text.
  • Comments
  • Document type declarations (DTDs)

74
Booklist Example in XML
  • lt?XML version1.0 standaloneyes?gt
  • lt!DOCTYPE BOOKLIST SYSTEM booklist.dtdgt
  • ltBOOKLISTgt
  • ltBOOK genreFictiongt
  • ltAUTHORgt
  • ltFIRSTgtMilanlt/FIRSTgtltLASTgtKunderalt/LASTgt
  • lt/AUTHORgt
  • ltTITLEgtIdentitylt/TITLEgt
  • ltPUBLISHEDgt1998lt/PUBLISHEDgt
  • ltBOOK genreScience formatHardcovergt
  • ltAUTHORgt
  • ltFIRSTgtRichardlt/FIRSTgtltLASTgtFeynmanlt/LASTgt
  • lt/AUTHORgt
  • ltTITLEgtThe Character of Physical Lawlt/TITLEgt
  • lt/BOOKgtlt/BOOKLISTgt

75
XML DTDs
  • A DTD is a set of rules that defines the
    elements, attributes, and entities that are
    allowed in the document.
  • An XML document is well-formed if it does not
    have an associated DTD but it is properly nested.
  • An XML document is valid if it has a DTD and the
    document follows the rules in the DTD.

76
An Example DTD
  • lt!DOCTYPE BOOKLIST
  • lt!ELEMENT BOOKLIST (BOOK)gt
  • lt!ELEMENT BOOK (AUTHOR, TITLE, PUBLISHED?)gt
  • lt!ELEMENT AUTHOR (FIRST, LAST)gt
  • lt!ELEMENT FIRST (PCDATA)gt
  • lt!ELEMENT LAST (PCDATA)gt
  • lt!ELEMENT TITLE (PCDATA)gt
  • lt!ELEMENT PUBLISHED (PCDATA)gt
  • lt!ATTLIST BOOK genre (ScienceFiction)
    REQUIREDgt
  • lt!ATTLIST BOOK format (PaperbackHardcover)
    Paperbackgt
  • gt

77
Domain-Specific DTDs
  • Development of standardized DTDs for specialized
    domains enables data exchange between
    heterogeneous sources
  • Example Mathematical Markup Language (MathML)
  • Encodes mathematical material on the web
  • In HTML ltIMG SRCxysquare.gif ALT(xy)2gt
  • In MathML ltapplygt ltpower/gt ltapplygt ltplus/gt
    ltcigtxlt/cigt ltcigtylt/cigt lt/applygt ltcngt2lt/cngt
    lt/applygt

78
XML The Future
  • Seamless integration of heterogeneous B2B data
    sources and business processes
  • XML-specified services, discovered by autonomous
    software agents
  • Industry-specific DTDs that simplify data
    exchange
  • Only XML-industry revenue forecasts1999 31
    millions, 2001 93 millions(Interactive Week,
    4/1999)

79
Summary Of Lecture One
  • Database systems are powerful, but also complex
    pieces of software
  • Transactions (concurrent data access, recovery
    from system crashes)
  • High-level abstractions for data access,
    manipulation, and administration
  • Data integrity and security
  • Performance and scalability
  • XML is the Future of Data Exchange

80
In The Second Lecture
  • Database design
  • A short introduction to SQL
  • Transaction processing versus decision support
    On to the data warehouse!

81
Questions?
Write a Comment
User Comments (0)
About PowerShow.com