Wrapup - PowerPoint PPT Presentation

About This Presentation

Slides: 11
Provided by: marily220
Learn more at: http://www.cs.umd.edu

Transcript and Presenter's Notes

Title: Wrapup


1
Wrapup
Amol Deshpande, CMSC424
2
Inventing the Future
  • Wednesday at 3:30pm
  • 1115 CSIC
  • http://www.cs.umd.edu/projects/ITF/
  • Exam

3
DBMS at a glance
  • Data Models
      • Conceptual representation of the data
  • Data Retrieval
      • How to ask questions of the database
      • How to answer those questions
  • Data Storage
      • How/where to store data, how to access it
  • Data Integrity
      • Manage crashes, concurrency
      • Manage semantic inconsistencies
  • Not a fully disjoint categorization!

4
DBMS at a glance
  • Data Models
      • E/R Model, Relational model
      • Very simple and hence effective
      • Easy to make things complicated, very hard to
        keep them simple
      • No other data model has survived for so long
      • What is the future of XML?

5
DBMS at a glance
  • Data Retrieval
      • How to ask questions of the database
          • Declarative languages are great
              • Hide complexity from users, can optimize
                things, can evolve easily
          • SQL
              • More or less declarative
      • How to answer those questions
          • Parsing → Optimization → Processing
          • Operators: hashing, sorting, joins,
            aggregation
          • Data structures
              • Hash indexes: good for equality queries
              • Tree indexes: for everything else (e.g.,
                range queries)
          • Optimization: complex, but a key piece of
            a database system
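The split between hash and tree indexes can be illustrated with a small in-memory sketch. Here a Python dict stands in for a hash index, and a sorted key list searched with `bisect` stands in for a tree (e.g., B+-tree) index. Both are simplifications assumed for illustration, not how a DBMS lays out index pages on disk; the class names and the table contents are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Toy table of (id, name, salary) rows; the data is made up for illustration.
ROWS = [
    (1, "alice", 70_000),
    (2, "bob", 55_000),
    (3, "carol", 91_000),
    (4, "dave", 55_000),
    (5, "erin", 62_000),
]

class HashIndex:
    """Hypothetical hash index: maps a key to the row positions holding it."""
    def __init__(self, rows, col):
        self.buckets = defaultdict(list)
        for pos, row in enumerate(rows):
            self.buckets[row[col]].append(pos)

    def lookup(self, key):
        # Equality lookup: a single hash probe.
        return list(self.buckets.get(key, []))

class SortedIndex:
    """Hypothetical stand-in for a tree index: (key, pos) pairs kept in key order."""
    def __init__(self, rows, col):
        self.entries = sorted((row[col], pos) for pos, row in enumerate(rows))
        self.keys = [k for k, _ in self.entries]

    def lookup(self, key):
        return self.range(key, key)

    def range(self, lo, hi):
        # Range lookup: binary-search to lo, then read entries in key order up to hi.
        start = bisect_left(self.keys, lo)
        end = bisect_right(self.keys, hi)
        return [pos for _, pos in self.entries[start:end]]

salary_hash = HashIndex(ROWS, col=2)
salary_tree = SortedIndex(ROWS, col=2)

print([ROWS[p][1] for p in salary_hash.lookup(55_000)])         # ['bob', 'dave']
print([ROWS[p][1] for p in salary_tree.range(60_000, 80_000)])  # ['erin', 'alice']
```

The hash index has no way to answer the range query short of scanning every bucket, because hashing discards key order; keeping keys ordered is what lets the tree-style index serve "everything else".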

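Of the operators listed, joins are typically the most expensive. The sketch below is a minimal in-memory hash join: build a hash table on one input, then probe it with the other. It assumes the build input fits in memory (a real system would partition inputs that do not, e.g., with a Grace hash join); the function name and tables are illustrative.

```python
from collections import defaultdict

def hash_join(build, probe, build_key, probe_key):
    """Equi-join two lists of tuples on build[build_key] == probe[probe_key].

    Assumes `build` (ideally the smaller input) fits in memory.
    """
    # Build phase: hash every build tuple on its join key.
    table = defaultdict(list)
    for b in build:
        table[b[build_key]].append(b)

    # Probe phase: each probe tuple looks up only its matching bucket.
    out = []
    for p in probe:
        for b in table.get(p[probe_key], []):
            out.append(b + p)
    return out

# Hypothetical tables: departments(dept_id, dept_name), employees(name, dept_id).
departments = [(10, "Sales"), (20, "Research")]
employees = [("alice", 10), ("bob", 20), ("carol", 10), ("dave", 30)]

for row in hash_join(departments, employees, build_key=0, probe_key=1):
    print(row)
```

Unlike a nested-loop join, which compares every pair of tuples, this reads each input once, trading memory for the hash table against time.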
6
DBMS at a glance
  • Data Storage
      • How/where to store data, how to access it
      • Need to be cognizant of the memory hierarchy
          • Memory is cheap to access; disk is very
            expensive to access
          • Further, disk is cheap to access sequentially,
            but much more expensive to access randomly
          • Many of our decisions are influenced by this
      • RAID: surviving failures
      • Accessing data: indexes
      • What happens if a new form of storage comes
        along with different properties (say,
        holographic storage)?
          • We will need to rethink the tradeoffs, but we
            now know the approach
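A back-of-the-envelope calculation shows why the sequential/random gap shapes so many design decisions. The disk parameters below (10 ms average seek plus rotational delay, 100 MB/s transfer rate, 4 KB pages) are assumed round numbers for a magnetic disk, not measurements.

```python
# Assumed, illustrative magnetic-disk parameters (not measured values).
SEEK_AND_ROTATE_S = 0.010      # average seek + rotational latency per random access
TRANSFER_BYTES_PER_S = 100e6   # sustained transfer rate
PAGE_BYTES = 4096              # page size

def transfer_time(num_pages):
    return num_pages * PAGE_BYTES / TRANSFER_BYTES_PER_S

def sequential_read_time(num_pages):
    # One positioning cost, then the pages stream off the platter.
    return SEEK_AND_ROTATE_S + transfer_time(num_pages)

def random_read_time(num_pages):
    # Every page pays its own positioning cost.
    return num_pages * SEEK_AND_ROTATE_S + transfer_time(num_pages)

n = 10_000  # about 40 MB of 4 KB pages
seq = sequential_read_time(n)
rnd = random_read_time(n)
print(f"sequential: {seq:.2f} s, random: {rnd:.2f} s, ratio: {rnd / seq:.0f}x")
# sequential: 0.42 s, random: 100.41 s, ratio: 239x
```

Under these assumptions, reading the same 40 MB randomly is over 200 times slower, which is why, for example, a full sequential scan can beat an unclustered index once a query touches a sizable fraction of a table.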

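For the parity-based RAID levels (RAID 4 and RAID 5), surviving a disk failure rests on XOR: the parity block of a stripe is the XOR of its data blocks, so any single lost block equals the XOR of the surviving blocks and the parity. A sketch with short byte strings standing in for disk blocks; the block contents and the helper name are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of data blocks spread across three hypothetical disks.
data = [b"DBMS", b"RAID", b"disk"]
parity = xor_blocks(data)  # stored on a fourth disk

# Simulate losing disk 1, then rebuild its block from the survivors plus parity.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt)  # b'RAID'
```

RAID 5 uses the same parity computation as RAID 4 but rotates the parity block across disks, so parity updates do not all land on one drive.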
7
DBMS at a glance
  • Data Integrity
      • Manage crashes, concurrency
          • Transactions, 2-phase locking
          • Write-ahead logging
          • DBMSs are pretty much the last word on
            concurrency/recovery
              • OSs don't come close to supporting
                anything like that
      • Manage semantic inconsistencies
          • Normalization, FDs
          • Not easy to identify tools, but we have
            learned how to think about them
          • Try to capture them in the E/R diagram as
            much as possible
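Two-phase locking requires each transaction to acquire all of its locks (the growing phase) before releasing any of them (the shrinking phase). Below is a minimal single-threaded sketch of that rule with shared (S) and exclusive (X) locks. The `LockManager` class and the exception names are hypothetical; unlike a real lock manager, it reports conflicts instead of making the requester wait, and it has no deadlock detection.

```python
class TwoPhaseViolation(Exception):
    pass

class LockConflict(Exception):
    pass

class LockManager:
    """Hypothetical 2PL lock manager: S/X locks, no waiting, no deadlock detection."""

    def __init__(self):
        self.holders = {}        # item -> {txn: "S" or "X"}
        self.shrinking = set()   # transactions that have released at least one lock

    def acquire(self, txn, item, mode):
        if txn in self.shrinking:
            raise TwoPhaseViolation(f"{txn} already released a lock")
        held = self.holders.setdefault(item, {})
        others = {t: m for t, m in held.items() if t != txn}
        # S is compatible only with S; X is compatible with nothing.
        if others and (mode == "X" or "X" in others.values()):
            raise LockConflict(f"{txn} cannot get {mode} on {item}")
        if held.get(txn) != "X":     # never downgrade an X lock to S
            held[txn] = mode

    def release(self, txn, item):
        self.shrinking.add(txn)      # the first release starts the shrinking phase
        self.holders.get(item, {}).pop(txn, None)

lm = LockManager()
lm.acquire("T1", "A", "S")
lm.acquire("T2", "A", "S")           # shared locks are compatible
try:
    lm.acquire("T2", "A", "X")       # upgrade refused: T1 still holds S on A
except LockConflict as e:
    print("conflict:", e)
lm.release("T1", "A")                # T1 enters its shrinking phase
try:
    lm.acquire("T1", "B", "X")       # forbidden under 2PL
except TwoPhaseViolation as e:
    print("2PL violation:", e)
```

Strict 2PL, which most systems use so that no transaction reads uncommitted writes, would additionally hold every lock until commit instead of allowing `release` mid-transaction.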

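Much of the reasoning about FDs during normalization comes down to computing the closure of a set of attributes: repeatedly apply any FD whose left-hand side is already contained in the set. A sketch using a made-up relation R(A, B, C, D, E) and made-up FDs; the helper names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X -> Y holds and X is already in the closure, Y is implied too.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

# Hypothetical relation R(A, B, C, D, E) with FDs A -> B, B -> C, CD -> E.
R = set("ABCDE")
FDS = [fd("A", "B"), fd("B", "C"), fd("CD", "E")]

print(sorted(attribute_closure({"A"}, FDS)))      # ['A', 'B', 'C']
print(attribute_closure({"A", "D"}, FDS) == R)    # True: AD is a superkey
```

A set of attributes X is a superkey exactly when its closure contains every attribute of the relation; BCNF asks this question of the left-hand side of every nontrivial FD.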
8
Motivation: Data Overload
  • We began the first lecture by discussing data
    overload
      • Huge amounts of data generated every day
      • Much faster than our ability to process it
  • Increasing ability to capture more enterprise
    data
      • Web, blogs, RSS feeds, etc.
  • Multimedia
      • Flickr and cellphone cameras have led a
        revolution in how people take pictures
      • Videos will be next
      • Not hard to imagine capturing every moment of
        your life
  • Sensor/RFID data
      • Tiny sensors/RFID tags are just beginning to
        become ubiquitous
      • Billions of these, each generating a tiny
        amount of data every second, still add up to
        too much
  • Biological/Scientific data

9
Motivation: Data Overload
  • Relational databases help for structured data
  • But they are increasingly not sufficient
      • The things we want to do with data can't be
        expressed in SQL
          • E.g., with biological data, the web
      • Too much unstructured data
      • Distributed data generation creates additional
        headaches
          • Almost impossible to collect the data in
            one location
  • Making sense of this requires advances not only
    in data processing, but also in data
    understanding/mining
      • Interdisciplinary efforts

10
Some Lessons from RDBMS
  • But we can use the lessons learned from
    developing RDBMSs
  • Data independence / abstraction is good
      • Hide details, even if initially it leads to
        inefficiency
  • Look for structure
      • Even seemingly highly unstructured data might
        have structure
  • Look for patterns in usage
      • Relational databases are fast because query
        processing is predictable
          • Unlike, say, OS workloads, which are very
            hard to optimize for
      • If you can identify patterns, you can
        probably optimize them
  • Declarative languages are great
      • Say what you want, not how to get it
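"Say what you want, not how to get it" in a small example: the same question answered by a declarative SQL query (run through Python's built-in `sqlite3` module against an in-memory database) and by a hand-written loop that hard-codes one evaluation strategy. The table and its data are made up.

```python
import sqlite3

rows = [("alice", "Sales", 70000), ("bob", "Research", 55000),
        ("carol", "Sales", 91000), ("dave", "Research", 62000)]

# Declarative: state the result; the DBMS chooses access paths and algorithms.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
declarative = conn.execute(
    "SELECT dept, AVG(salary) FROM emp WHERE salary > 60000 "
    "GROUP BY dept ORDER BY dept"
).fetchall()

# Imperative: we fix the algorithm (a full scan plus a dict for grouping),
# so any better strategy would have to be coded by hand.
totals = {}
for _name, dept, salary in rows:
    if salary > 60000:
        s, c = totals.get(dept, (0, 0))
        totals[dept] = (s + salary, c + 1)
imperative = [(d, s / c) for d, (s, c) in sorted(totals.items())]

print(declarative)
print(imperative)
```

If an index on `salary` were later added with `CREATE INDEX`, the SQL would stay unchanged and the optimizer could choose to use it; the loop would need to be rewritten to benefit.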