Lecture II Database Structure, Organization and Selection - PowerPoint PPT Presentation


Slides: 37
Provided by: rickb9

Transcript and Presenter's Notes



1
Lecture II - Database Structure, Organization and
Selection
  • Review - Database, Producer, Vendor???
  • Online Industry
  • Effective Searcher Characteristics
  • Online Timeline

2
Database Selection
  • Single most important factor in performing a good
    search
  • Requires at least a minimal understanding of
    database structure
  • Practical v. Technical level approach

3
Basic Database Groups
  • Bibliographic - Consists of a citation to a
    published article, book, or report, with a
    summary (or abstract) and indexing terms
    describing the publication.
  • Non-Bibliographic - Usually serves as the final
    source: provides actual answers, such as full
    text or financial reports, when searching for an
    actual piece of information.

4
Types of Databases
  • Bibliographic
  • Full-Text - Complete text plus basic citation
  • Numeric - Contains statistical data, such as
    company financials, demographics, etc.
  • Directory - Provides succinct information about a
    company or product. Ex. - company name, address,
    phone number, sales figures, names of officers,
    etc.

5
Basic Database Terminology
  • Database - Collection of items or entities
  • Unit Record - Discrete representation of an
    individual item in the database. A card in a
    library catalog is a UR representing a book. Also
    referred to as an index record; a surrogate of an
    entity.

6
Unit Record
  • Subdivided into different fields/paragraphs
  • Each has a label (field) and may be searchable or
    only displayable
  • Vendor takes the tapes, then formats the UR to fit
    the system's protocol (BIOSIS: a separate YR
    field; vendors may create unique tags)
  • Sub-fields such as journal name, city of
    publication in PU field, or subheadings.
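
A unit record can be sketched as a set of labeled fields, some of which the vendor makes searchable and some of which are display-only. This is a minimal illustration, not any vendor's format: the AU, TI, PY, and PU tags appear in these slides, but the DE tag, the sample values, and the choice of which tags are searchable are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical vendor choice: only these field labels are searchable;
# every other field can be displayed but not searched.
SEARCHABLE_TAGS = {"AU", "TI", "DE", "PY"}


@dataclass
class UnitRecord:
    """One unit record (UR): a surrogate for one item, split into labeled fields."""
    accession_number: int
    fields: dict[str, str] = field(default_factory=dict)

    def searchable(self) -> dict[str, str]:
        """Fields the system would match search terms against."""
        return {tag: v for tag, v in self.fields.items() if tag in SEARCHABLE_TAGS}

    def display(self) -> str:
        """All fields, searchable or not, can be printed."""
        lines = [f"AN {self.accession_number}"]
        lines += [f"{tag} {v}" for tag, v in self.fields.items()]
        return "\n".join(lines)


# Hypothetical record; PU is present but display-only under SEARCHABLE_TAGS.
record = UnitRecord(
    accession_number=1042,
    fields={
        "AU": "Smith, J.",
        "TI": "Computer-assisted instruction in libraries",
        "PY": "1985",
        "PU": "New York: Example Press",
    },
)
print(sorted(record.searchable()))  # ['AU', 'PY', 'TI']
```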

7
Unit Record (contd)
  • Vendor may take 6 mo. to 1 year to mount a new
    database
  • How the UR is formatted is explained in
    documentation produced by the vendor
  • MUST UNDERSTAND THE UR from both the vendor's and
    the producer's view to search a database
    effectively: which fields are present and which
    are searchable. More than a search engine!

8
Linear File
  • Set of index records, each describing one item,
    arranged in an order based on the values of one
    or more attributes
  • Entered by accession number; the highest number
    is the record input most recently
  • Linear file of a card catalog: the library's
    accession list
  • Linear file of a book: pages in numerical order
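
A linear file can be sketched as a list of records kept in accession-number order, where each new record receives the next (highest) number. This minimal sketch, with hypothetical record texts, also shows the cost the next slides address: with only a linear file, finding a term means reading every record.

```python
def add_record(linear_file: list[tuple[int, str]], text: str) -> int:
    """Append a record; it gets the next (highest) accession number."""
    an = linear_file[-1][0] + 1 if linear_file else 1
    linear_file.append((an, text))
    return an


def sequential_search(linear_file: list[tuple[int, str]], term: str) -> list[int]:
    """With only a linear file, every record has to be examined for the term."""
    term = term.lower()
    return [an for an, text in linear_file if term in text.lower().split()]


# Hypothetical records, entered oldest first.
linear_file: list[tuple[int, str]] = []
for title in ["Computer assisted instruction",
              "Library catalog design",
              "Instruction in online searching"]:
    add_record(linear_file, title)

print(sequential_search(linear_file, "instruction"))  # [1, 3]
```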

9
Inverted File
  • Created from the linear file
  • Already familiar: a card catalog has two of them
    (the author/title index and the subject index),
    and the index in the back of a book is another
  • Makes information accessible and usable
  • Creation enables searching by subject, author,
    page number, or any entity deemed searchable

10
Inverted File (contd)
  • When performing a search, system does not have to
    search each individual record in database for
    your terms
  • Quickly searches the inverted file and finds all
    documents that contain a particular entity
  • Usually referred to as the dictionary file or the
    basic index
  • Varies from system to system

11
Dictionary/Inverted File or Basic Index
  • Set of records created from a linear file.
    Typically, lists of accession numbers (ANs), with
    each list associated with a different entity.
    Ex. - the index at the back of a book.
  • Makes the information accessible/usable
  • Variable by system
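
Building a dictionary file from a linear file can be sketched as a word-to-accession-numbers map. This is a minimal illustration assuming the simplest parsing rule (lowercase, split on spaces); as the slides note, real systems vary, and the record texts here are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(linear_file: list[tuple[int, str]]) -> dict[str, list[int]]:
    """Invert a linear file: map each word to the accession numbers (ANs) containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for an, text in linear_file:
        for word in text.lower().split():
            postings[word].add(an)
    # Alphabetical dictionary file, each entry with its sorted list of ANs.
    return {word: sorted(ans) for word, ans in sorted(postings.items())}


# Hypothetical linear file of (AN, title) records.
linear_file = [
    (1, "Computer assisted instruction"),
    (2, "Library catalog design"),
    (3, "Instruction in online searching"),
]
inverted = build_inverted_file(linear_file)

# A lookup is now one dictionary access instead of a scan of every record.
print(inverted["instruction"])  # [1, 3]
```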

12
Standardization
  • Reputable vendors attempt to standardize unit
    records (though they didn't start that way) for
    the fields that can be standardized
  • PY or AU fields are susceptible to variations in
    the way they have been loaded
  • Not always the vendor's fault; it can be the
    producer's as well
  • Impossible to become an expert searcher in all
    systems or all databases

13
Why is there no standardization?
  • Complex problem
  • Vendors that are essentially competitors have no
    reason to cooperate
  • Producers are often vying to be the source of
    information in a subject area and aren't about to
    get together with a rival
  • Will continue to be a problem

14
Parsing
  • Separating and sorting operations performed by a
    search service (vendor) on a given data field
    when the inverted index is prepared from the
    linear file
  • Vendor determines which items are searched as one
    unit
  • A multi-word descriptor or an entire journal name
    may be considered one word in the dictionary file

15
Parsing (contd)
  • To provide flexibility and comprehensiveness, the
    vendor may elect to enter field elements into the
    dictionary file in more than one way
  • Each searchable field is indexed (or parsed) by
    extracting words or phrases and entering them
    into an alphabetical list so they can be searched
    separately
  • Varies from database to database and from vendor
    to vendor. Vendor determines items to be searched
    as a single unit

16
Field may be parsed into
  • Single words only
  • Multiple-word phrases only
  • Both single words and multiple-word phrases
  • Word fragments (such as segments of chemical
    compounds)
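
The four parsing outcomes listed above can be sketched as a function with a mode switch. This is a minimal illustration, assuming lowercase and whitespace splitting; the mode names are hypothetical, and the fixed-length `fragments` helper is a generic stand-in, not how any particular system actually segments chemical names.

```python
def parse_field(value: str, mode: str) -> list[str]:
    """Turn one field value into dictionary-file entries.

    mode "words":  single words only
    mode "phrase": the whole multi-word phrase as one entry
    mode "both":   single words plus the phrase
    """
    words = value.lower().split()
    phrase = " ".join(words)
    if mode == "words":
        return words
    if mode == "phrase":
        return [phrase]
    if mode == "both":
        return words + [phrase] if len(words) > 1 else words
    raise ValueError(f"unknown parsing mode: {mode!r}")


def fragments(word: str, size: int) -> list[str]:
    """Every overlapping substring of `size` characters.

    Hypothetical stand-in for word-fragment parsing; real chemical-segment
    parsing follows domain-specific rules, not fixed-length slices.
    """
    return [word[i:i + size] for i in range(len(word) - size + 1)]


print(parse_field("Journal of Documentation", "both"))
# ['journal', 'of', 'documentation', 'journal of documentation']
```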

17
Parsing Rules
  • Double-Posting - Process of entering
    multiple-word search terms both as one unit and
    as individual words
  • Binding - Hyphenating or otherwise parsing
    multi-unit search phrases so that they are
    searched as one word

18
Parsing (contd)
  • Double-Posting - Consider the descriptor
    computer-assisted instruction: it could be entered
    as one searchable unit, with hyphens between the
    words, and as three separate terms. Ex. -
    computer-assisted-instruction as one unit;
    computer, assisted, instruction as three
    individual words.

19
Binding or Bound
  • Author names may be bound in a database and this
    means that the last name plus first name or
    initials can be searched quickly as one word
  • A bound descriptor such as computer-assisted
    instruction can be searched as one word
  • If it is also double-posted, it can be searched
    using any of the three terms computer, assisted,
    or instruction
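
Binding and double-posting of computer-assisted instruction can be sketched as follows. This is a minimal illustration, assuming hyphens are used as the binding character (as in the slide's example); the dictionary file and accession number 7 are hypothetical.

```python
def bind(descriptor: str) -> str:
    """Binding: hyphenate a multi-word descriptor so it is searched as one word."""
    return "-".join(descriptor.lower().replace("-", " ").split())


def double_post(descriptor: str) -> list[str]:
    """Double-posting: enter the bound unit and each individual word."""
    words = descriptor.lower().replace("-", " ").split()
    return [bind(descriptor)] + words


# Hypothetical dictionary file: entry -> accession numbers (AN 7 is made up).
dictionary_file: dict[str, set[int]] = {}
for entry in double_post("computer-assisted instruction"):
    dictionary_file.setdefault(entry, set()).add(7)

print(sorted(dictionary_file))
# ['assisted', 'computer', 'computer-assisted-instruction', 'instruction']
```

Because every entry points to the same record, a search on the bound unit or on any single word retrieves AN 7.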

20
Stopwords
  • Common words that are used so often and are so
    heavily posted in the database that the search
    system quite literally ignores them
  • Otherwise, an inordinate amount of processing time
    would be expended on a dictionary file cluttered
    with millions of occurrences of these words
  • Such terms in the dictionary file would also
    require huge amounts of storage space
  • Each system determines its own list
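
Stopword filtering at indexing time can be sketched as a set-membership test. The stopword list below is hypothetical; as the slide says, each system determines its own.

```python
# Hypothetical stopword list; every system defines its own.
STOPWORDS = {"a", "an", "and", "by", "for", "from", "in", "of", "the", "to", "with"}


def index_terms(text: str) -> list[str]:
    """Words that would be posted to the dictionary file; stopwords are skipped."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


print(index_terms("The history of the book in America"))  # ['history', 'book', 'america']
```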

21
Stopwords as Concepts
  • Type A Behavior or Cyclosporin A used to be
    extremely difficult to search before bound
    descriptors evolved
  • Stopwords are OK when they occur within a bound
    descriptor phrase
  • If Type A Behavior is double-posted, however,
    only the words type and behavior are entered
    into the dictionary file
  • Bypass surgery is another case in point
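
How a stopword survives inside a bound descriptor but disappears from the double-posted single words can be sketched as follows. The stopword list (which here includes "a") is a hypothetical assumption, chosen to match the Type A Behavior example on the slide.

```python
# Hypothetical stopword list; "a" is included to match the slide's example.
STOPWORDS = {"a", "an", "and", "by", "for", "from", "of", "the", "to", "with"}


def post_descriptor(descriptor: str) -> list[str]:
    """Enter a descriptor as one bound unit plus its non-stopword single words."""
    words = descriptor.lower().split()
    bound = "-".join(words)  # stopwords survive inside the bound unit
    singles = [w for w in words if w not in STOPWORDS]
    return [bound] + singles


print(post_descriptor("Type A Behavior"))  # ['type-a-behavior', 'type', 'behavior']
```

Only the bound entry lets a searcher ask for the concept precisely; the single words alone cannot distinguish Type A from any other type.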

22
Punctuation
  • Many systems elect to drop punctuation where it
    occurs within the text or descriptors
  • Apostrophes, hyphens, asterisks, parentheses,
    etc. will be disregarded and not appear in the
    dictionary file
  • But they do still print out in the retrieved text
  • Check system documentation for rules
  • Children's literature = 3 words (children, s,
    literature)!
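
The Children's literature case can be sketched by treating every punctuation mark as a word break. This is a minimal illustration under that assumption; the `punctuation_as_space` flag is hypothetical and shows the alternative rule (deleting punctuation and running letters together), which gives a different dictionary entry. Which rule applies is exactly what the system documentation has to be checked for.

```python
import re


def dictionary_words(text: str, punctuation_as_space: bool = True) -> list[str]:
    """Words entered into the dictionary file under two possible punctuation rules."""
    text = text.lower()
    if punctuation_as_space:
        # Every non-alphanumeric character acts as a word break.
        return re.findall(r"[a-z0-9]+", text)
    # Alternative rule: punctuation is deleted and adjoining letters run together.
    return re.sub(r"[^a-z0-9\s]", "", text).split()


print(dictionary_words("Children's literature"))         # ['children', 's', 'literature']
print(dictionary_words("Children's literature", False))  # ['childrens', 'literature']
```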

23
System Limits
  • Each system will have limits on how many
    characters can make up a search statement
  • Also, a limit on the number of search queries or
    statements the system will hold

24
Remember.......
  • Rules for parsing, text-editing, and query memory
    may seem purely mechanical and arbitrary, but
    because they vary widely and can affect your
    searching, you must be aware of them for the
    system you're using before you begin a search.

25
Search Logic
  • Basic and common to most online systems
  • Order of Operators
  • George Boole - Mathematical Logic
  • John Venn - Pictorial Representations

26
Logical Operators
  • Also called Boolean Operators
  • Logically represent relationships between
    concepts
  • AND, OR, and NOT

27
AND
  • Simply means two or more concepts are contained
    within one set or within one document; it is
    represented by two intersecting circles, where
    the ANDed (overlapping) area is shaded in
  • AND = Intersection
  • Decreases retrieval

28
OR
  • Creates the larger set by looking for any
    occurrence of the terms you have listed in your
    search
  • Merely an addition process or Union
  • OR is represented by the entire area of all the
    circles
  • Increases retrieval
  • Eliminates duplication within the set
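
If each search term is treated as the set of accession numbers posted to it, AND and OR become set intersection and union. A minimal sketch with hypothetical result sets:

```python
# Hypothetical postings: the accession numbers of records containing each term.
library = {1, 2, 5, 8}
automation = {2, 3, 8}
catalog = {5, 8, 9}

both = library & automation             # AND: intersection, fewer records
either = library | automation           # OR: union, more records
all_three = library & automation & catalog

print(sorted(both))       # [2, 8]
print(sorted(either))     # [1, 2, 3, 5, 8]  records 2 and 8 appear only once
print(sorted(all_three))  # [8]
```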

29
NOT
  • Excludes a term or terms
  • NOT = Exclusivity or Difference
  • Must be used judiciously!
  • An effective use is to employ NOT in strategies to
    remove sets already viewed or printed
  • Cost Savings Effect
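
NOT corresponds to set difference. A minimal sketch with hypothetical sets, showing both the cost-saving use (dropping records already printed) and the risk that makes judicious use necessary:

```python
# Hypothetical result sets from one session (accession numbers).
s1 = {101, 102, 103, 104}        # first search, already printed
s2 = {102, 104, 105, 106, 107}   # broader follow-up search

new_records = s2 - s1            # S2 NOT S1: only records not yet seen
print(sorted(new_records))       # [105, 106, 107]

# Used carelessly, NOT also discards relevant records: documents about both
# library and automation are lost from "library NOT automation".
library = {1, 2, 5, 8}
automation = {2, 3, 8}
print(sorted(library - automation))  # [1, 5]
```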

30
Order of Processing
  • Processing Order varies by system for
    logical/boolean operators
  • DIA processes NOT first, followed by AND and then
    OR
  • Most systems process the ANDs first
  • Check system documentation
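
Why processing order matters can be sketched with a small evaluator that takes the precedence table as a parameter. This is a minimal shunting-yard style sketch, not any vendor's parser: `NOT_AND_OR` follows the NOT, then AND, then OR order the slide attributes to DIA, while `AND_FIRST` is a hypothetical system that does ANDs first and everything else left to right. The sets `a`, `b`, `c` are made up.

```python
import re

# Precedence tables: a higher number is processed first; ties go left to right.
NOT_AND_OR = {"NOT": 3, "AND": 2, "OR": 1}  # order attributed to DIA on the slide
AND_FIRST = {"AND": 2, "NOT": 1, "OR": 1}   # hypothetical AND-first system


def evaluate(query: str, sets: dict[str, set[int]], precedence: dict[str, int]) -> set[int]:
    """Evaluate a Boolean query over named result sets, honoring parentheses."""
    tokens = re.findall(r"\(|\)|\w+", query)
    operands: list[set[int]] = []
    operators: list[str] = []

    def apply(op: str) -> None:
        right = operands.pop()
        left = operands.pop()
        if op == "AND":
            operands.append(left & right)  # intersection
        elif op == "OR":
            operands.append(left | right)  # union
        else:
            operands.append(left - right)  # NOT: difference

    for tok in tokens:
        if tok == "(":
            operators.append(tok)
        elif tok == ")":
            while operators[-1] != "(":
                apply(operators.pop())
            operators.pop()  # discard the matching "("
        elif tok in precedence:
            # First process any waiting operator that binds at least as tightly.
            while (operators and operators[-1] != "("
                   and precedence[operators[-1]] >= precedence[tok]):
                apply(operators.pop())
            operators.append(tok)
        else:
            operands.append(sets[tok])

    while operators:
        apply(operators.pop())
    return operands[0]


sets = {"a": {1, 2, 3, 4}, "b": {2, 3}, "c": {3, 4, 5}}
print(sorted(evaluate("a NOT b AND c", sets, NOT_AND_OR)))  # [4]        (a NOT b) AND c
print(sorted(evaluate("a NOT b AND c", sets, AND_FIRST)))   # [1, 2, 4]  a NOT (b AND c)
```

The same query string retrieves different records on the two systems, which is why the documentation has to be checked.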

31
NESTING
  • Allows us to group sets and effectively use
    boolean operators
  • Use of Parentheses directs the system to process
    certain operations first
  • Again, important to know the operator order
  • Works because most systems follow the rule that
    what is enclosed in ( ) will be processed first,
    so it is a good idea to nest concepts
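
Nesting can be sketched with Python's own set operators, which happen to process `&` (AND) before `|` (OR), like an AND-first system. The sets of accession numbers are hypothetical; the pattern is the common one of ORing synonyms inside parentheses before ANDing them with a second concept.

```python
# Hypothetical result sets (accession numbers).
television = {1, 2, 3}
tv = {4, 5}
violence = {3, 5, 6}

# Unnested: & is processed before |, so this means television OR (tv AND violence).
unnested = television | tv & violence
# Nested: the synonyms are ORed first, then ANDed with the second concept.
nested = (television | tv) & violence

print(sorted(unnested))  # [1, 2, 3, 5]  records 1 and 2 need not mention violence
print(sorted(nested))    # [3, 5]
```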

32
Positional Operators
  • Also called Proximity Operators
  • Express relationships between terms contained in
    the same document
  • Ordinarily, a proximity operator stipulates where
    one term must occur relative to another
  • Specifies word order of search terms
  • ADJ, WITH (or NEAR), SAME (or FIELD), LINK, etc.

33
Positional Operators (contd)
  • ADJ (adjacency) - very specific word order: a term
    must be directly next to another term in the
    specified order; the most specific of the
    proximity operators
  • WITH - terms can occur in any order
    (bi-directional), leaving room for unanticipated
    adjectives between the two terms; WITH is often
    preferred over ADJ
  • Both are usually limited to a sentence, which
    systems may define differently
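
ADJ and WITH can be sketched over a document split into sentences of words. This is a minimal illustration, assuming a sentence ends at `.`, `!`, or `?`; real systems define the sentence unit in their own ways. The function names and the sample text are hypothetical (`with_` carries an underscore because `with` is a Python keyword).

```python
import re


def sentences(text: str) -> list[list[str]]:
    """Split text into sentences, each a list of lowercase words (simplified rule)."""
    parts = re.split(r"[.!?]", text.lower())
    return [re.findall(r"[a-z0-9]+", p) for p in parts if p.strip()]


def adj(doc: list[list[str]], first: str, second: str) -> bool:
    """ADJ: `second` immediately follows `first`, in that order, within a sentence."""
    return any(
        s[i] == first and s[i + 1] == second
        for s in doc
        for i in range(len(s) - 1)
    )


def with_(doc: list[list[str]], a: str, b: str) -> bool:
    """WITH: both terms in the same sentence, in either order, any distance apart."""
    return any(a in s and b in s for s in doc)


doc = sentences("Online searching of medical databases. Searching skills improve with practice.")
print(adj(doc, "online", "searching"), adj(doc, "searching", "online"))  # True False
print(with_(doc, "medical", "searching"), with_(doc, "online", "practice"))  # True False
```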

34
Positional Operators (contd)
  • SAME (or FIELD) - words in the same field, but no
    word order specified
  • LINK - words in the same descriptor unit; used to
    link headings and subheadings
  • n - up to n intervening words
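
The "up to n intervening words" operator and SAME can be sketched as follows. This is a minimal illustration: whether the n operator is order-specific differs between systems, so the `ordered` flag is a hypothetical parameter, and the field tags and values in the sample record are made up.

```python
import re


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def within_n(text: str, first: str, second: str, n: int, ordered: bool = True) -> bool:
    """True if the terms occur with at most n intervening words.

    If ordered, `second` must come after `first`; otherwise either order counts.
    """
    w = words(text)
    pos1 = [i for i, x in enumerate(w) if x == first]
    pos2 = [j for j, x in enumerate(w) if x == second]
    for i in pos1:
        for j in pos2:
            gap = j - i if ordered else abs(j - i)
            if 1 <= gap <= n + 1:  # gap of n + 1 positions = n words in between
                return True
    return False


def same_field(record: dict[str, str], a: str, b: str) -> bool:
    """SAME: both terms in one field; order and distance do not matter."""
    return any(a in words(v) and b in words(v) for v in record.values())


text = "information retrieval in online library systems"
print(within_n(text, "retrieval", "library", 2))  # True: "in online" intervenes
print(within_n(text, "retrieval", "library", 1))  # False

record = {"TI": "Online retrieval systems", "DE": "Information retrieval; Libraries"}
print(same_field(record, "online", "systems"))      # True
print(same_field(record, "online", "information"))  # False: different fields
```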

35
Hierarchy of Positional Operators
  • ADJ is first and most specific in the hierarchy
  • WITH is broader in the hierarchy
  • SAME is broadest in the hierarchy
  • All combine concepts within the same document as
    opposed to logical operators
  • The order of processing is generally ADJ,
    followed by WITH, followed by SAME

36
Other System Commands
  • Logon and Logoff
  • Cost
  • Time
  • Passwords
  • Save Search
  • Execute Search
  • Type/Print
  • Delete Sets
  • Limits
  • Sort
  • Change Databases
  • Display Sets