Database Structure - PowerPoint PPT Presentation

1 / 39
About This Presentation
Title:

Database Structure

Description:

Writers write, and their works get published in ... Let's start over ... probably never see any indexes in their raw state - what we've been going over ... – PowerPoint PPT presentation

Number of Views:49
Avg rating:3.0/5.0
Slides: 40
Provided by: Christoph662
Category:

less

Transcript and Presenter's Notes

Title: Database Structure


1
Database Structure
  • Or, database lite - a very brief and simple
    introduction to the system behind the tool

2
In the beginning
  • There was hardcopy. Writers write, and their
    works get published in magazines, journals,
    newspapers, conference proceedings
  • Months or years afterwards, other writers and
    researchers would like to know what was written
    on a topic. Wouldnt it be great if there was a
    way to find everything that had been published on
    a topic without having to page through every
    likely journal?

3
Voila -
  • The hardcopy Index is born
  • An alphabetical list of words, representing
    subjects
  • Under each word a list of articles deemed to be
    about that subject.
  • This is all done, laboriously, by humans.
  • Given that, how many subjects do you think an
    article will be listed under?

4
Aside
  • If they thought their customers would pay the
    higher price, some Index vendors added Abstracts.
  • Thus, Abstracting and Indexing services or
    AI. You see this terminology in the literature
    sometimes.
  • The abstracts were all laboriously written by
    humans. Skilled humans.
  • Skilled humans are very expensive. And slow.
  • Paper and publishing are expensive too.
  • How many times do you think an entry for an
    article would be duplicated (appear under
    multiple subjects) in this situation?

5
Ok, ok, you get the picture
  • Just so you know that the electronic situation we
    have today is all grounded in a physical reality.
    Once it was all people and paper.
  • Now its very electronic, but there are still a
    lot of people involved.

6
Enter the computer
  • Even before word processors, when articles were
    still being produced on typewriters -
  • What if you could put information about those
    articles into a machine readable file - a
    database - something you could search?
  • Think of the advantages!

7
Database advantages
  • No longer bound by the confines of paper, space,
    expensive skilled personnel
  • Articles could be found under any of the terms
    associated with them, not just one or two
  • Even if material has to be re-keyed - typed into
    the database - this doesnt require subject
    specialists, just typists (cheap labor)
  • Turn around time is faster - your labor force
    isnt thinking and composing, just typing and if
    you choose to just offer your index online, you
    dont wait for physical publishing to take place.

8
As time goes on
  • The process of from article to index gets even
    faster
  • when articles are created electronically (e.g.
    word processing), no re-keying is needed, just
    software to convert/rearrange the material to fit
    your database fields
  • If you want to enhance the records at all,
    though, youre back to dealing with humans

9
Note
  • The electronic database is not necessarily
    Cheaper to create. The costs are shifted. Maybe
    it actually costs more! But customers will buy it
    because
  • Its so much more powerful and efficient.

10
End of historical background.
  • Now we really will get on to database structure.

11
Databases - fields and records
  • Think about drivers licenses. They all have the
    owners name, address, date of birth, bad photo
    and you can bet the DMVs database has fields
    just like that Name, Addr, DOB, BadPic, etc.
  • The database designer decides what fields are
    needed.

12
Fields and records cont.
  • For an article database, youd probably have a
    field for the article title, the name of the
    journal it appeared in what else?
  • One full set of fields will make up a record.
    Every record in the database will have the same
    set of fields (even if, in some records, some
    fields are blank).

13
Define your fields
which make up records
Sleep deprived rats are bigger biters
001
Science News
and form the basis of your database.
14
Beyond fields and records
  • Fields and records are the basis. What makes the
    database fast, powerful, and efficient are the
    indexes of the fields.
  • An index is a list of all the words in a field,
    with some kind of identifier pointing back to the
    record each word came from.
  • An index list can be alphabetized, and in various
    ways optimized for searching. This list is
    sometimes referred to as an inverted file.

15
A very, very simplistic example
  • Say we had 3 articles
  • Milky Ways Last Major Merger. Science News. v.
    162 24 p. 376
  • Its a Dogs Life.The Economist. December 21,
    2002. p. 61
  • Manhattan Mayhem.Smithsonian. v. 33 9 p. 44

16
Lets enhance these just a little
  • Well add a one-line description to each record
    (so we have a few more words to search on)
  • Record 1
  • Milky Ways Last Major Merger. Science News. v.
    162 24 p. 376 New clues about galaxy formation
    indicate early collision affected Milky Ways
    shape.

17
Record 2
  • Its a Dogs Life.The Economist. December 21,
    2002. p. 61 From hard labour to a beauty contest,
    a history of the work and whims of dog breeding.

18
Record 3
  • Manhattan Mayhem.Smithsonian. v. 33 9 p.
    44Martin Scorseses realistic portrayal of
    pre-Civil War strife - Gangs of New York -
    re-creates the brutal street warfare waged
    between immigrant groups.

19
My database has 4 fields
  • Record number
  • Article title
  • Journal name
  • Abstract


50 characters
50 characters
200 characters
20
Milky Ways Last Major Merger
New clues about galaxy formation indicate early
collision affected Milky Ways shape.
0001
Science News
From hard labour to a beauty contest, a history
of the work and whims of dog breeding.
The Economist
Its a Dogs Life
0002
Martin Scorseses realistic portrayal of
pre-Civil War strife - Gangs of New York -
re-creates the brutal street warfare waged
between immigrant groups.
Manhattan Mayhem
0003
Smithsonian
21
Indexing the fields
  • The initial list of words from the Article Title
    field looks like
  • Milky
  • Ways
  • Last
  • Major
  • Merger
  • Its
  • a
  • Dogs
  • Life
  • Manhattan
  • Mayhem
  • Hmm what is funky here? Will our indexing
    program handle those possessives and contractions
    ( s )? Do we want to clog it up with little
    words like a ?

22
After some decisions
  • Now we can sort!
  • Dogs
  • Its
  • Last
  • Life
  • Major
  • Manhattan
  • Mayhem
  • Merger
  • Milky
  • Ways
  • Uh oh. How will we know which record a word came
    from??

23
Lets start over
  • Lets make sure our index list includes the
    record number, and which field the word came
    from
  • 0001 Milky TI
  • 0001 Ways TI
  • 0001 Last TI
  • 0001 Major TI
  • 0001 Merger TI
  • 0002 Its TI
  • 0002 Dogs TI
  • 0002 Life TI
  • 0003 Manhattan TI
  • 0003 Mayhem TI

24
And one more thing
  • We can include a number representing the order of
    the word within the field (why?)
  • So we get something like this

25
Now well sort again
26
Our index of the Journal Name field
27
After indexing each field, combine the indexes so
you only have one list to search
28
And the abstracts
  • Decisions and clean-up on this list of words
  • Stop words
  • Hyphenated words
  • Proper names - double post to include the
    phrase too?
  • Alternate spellings - do we do anything about
    them or no?

29
(No Transcript)
30
Luckily
  • Software does almost all of this work for us.
  • And youll probably never see any indexes in
    their raw state - what weve been going over here
    is very under the hood, often even proprietary
    material for the database vendors.
  • But we can see a sort of users view

31
Examples
  • EBSCO MasterFiles Subjects list
  • FirstSearch indexes - single word and phrase
  • Reminder indexes come from record structure
  • example of elaborate record structure ERIC on
    FirstSearch

32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
Assignment
  • What would your ideal database record look like?
Write a Comment
User Comments (0)
About PowerShow.com