Principles of Computing and Information Technology Lecture 6 Databases and database retrieval

1
Principles of Computing and Information
Technology Lecture 6 - Databases and database
retrieval
  • Andy Dawson
  • Department of Information Studies, UCL

2
What we're going to look at today
  • What is a database?
  • Database structure
  • (R)DBMS vs IR
  • Indexing of databases
  • Mechanisms for database retrieval

3
What is a database?
  • Not just electronically mounted!
  • A collection of data
  • With inherent relationships
  • Put together in a structured way
  • With the intention of being used for a purpose

4
What is a database?
  • DATA ELEMENTS are combined into
  • DATA STRUCTURES which are built into a
  • DATABASE: An organised collection of related sets
    of data managed in such a way as to allow the
    user to view the complete collection, or a
    logical subset of that collection, as a single
    unit.

5
Views in a database
  • A view is like a search request: find all books
    by Dickens; list all authors in catalogue
  • i.e. retrieve those items which match our search
    terms
  • Databases can vary from the relatively simple to
    the highly complex
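The idea of a view as a stored search request can be sketched with Python's built-in sqlite3 module. The table name, column names and sample rows below are assumptions made for illustration; they do not come from the lecture.

```python
import sqlite3

# Hypothetical catalogue table; schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("Bleak House", "Dickens"),
        ("Hard Times", "Dickens"),
        ("Emma", "Austen"),
    ],
)

# A view behaves like a saved search request: "find all books by Dickens".
conn.execute(
    "CREATE VIEW dickens_books AS "
    "SELECT title FROM books WHERE author = 'Dickens'"
)
dickens_titles = [
    row[0]
    for row in conn.execute("SELECT title FROM dickens_books ORDER BY title")
]

# "List all authors in catalogue" is another logical subset of the same data.
authors = [
    row[0]
    for row in conn.execute("SELECT DISTINCT author FROM books ORDER BY author")
]

print(dickens_titles)
print(authors)
```

Both results are drawn from the same underlying collection; only the selection criteria differ.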

6
The simplest possible model of a Computer System
revisited!
7
The importance of database structure
  • Organisation of data in a database affects the
    ways in which information can be retrieved

8
Database structure
  • Records
  • Fields
  • File structure
  • Indexing
  • Classification

9
Structural organisation within databases
  • Flat file databases
  • Complex databases
  • DBMS and Relational databases

10
A complex flat file database system
11
A DBMS system
12
Database Management Systems (DBMS)
  • Special purpose software to facilitate shared
    access to data in a database, maintaining
    reliability, security and integrity of the
    database by controlling access and supervising
    updates.
  • A set of programs which enable us to set up a
    database of our own data.

13
Database Management Systems
  • DBMS are intended to handle structured data
  • Architecture varies
  • Flexible report generation based on field
    structure
  • BUT limitations of DBMS in searching
  • Need to specify field
  • Exact match on whole field
  • Slow on unindexed fields
  • Report formats may be limited
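A minimal sketch of the first two search limitations, again using sqlite3 with an assumed books table: the search must name a field, and an equality test succeeds only when the whole field matches. The helper name exact_match is hypothetical.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Hard Times", "Dickens"), ("Modern Times", "Johnson")],
)

ALLOWED_FIELDS = ("title", "author")


def exact_match(field, value):
    """Return titles of records whose named field equals value exactly."""
    # The field has to be specified, and it is checked against a whitelist
    # because column names cannot be passed as SQL parameters.
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field}")
    sql = f"SELECT title FROM books WHERE {field} = ? ORDER BY title"
    return [row[0] for row in conn.execute(sql, (value,))]


whole_field = exact_match("title", "Hard Times")   # matches one record
part_of_field = exact_match("title", "Times")      # matches nothing

# An index on a searched field lets the DBMS avoid scanning every record.
conn.execute("CREATE INDEX idx_books_author ON books (author)")

print(whole_field, part_of_field)
```

Searching for the single word "Times" finds nothing here, even though it occurs in both titles, which is the gap that text retrieval systems are designed to fill.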

14
RDBMS
  • Development of DBMS concept
  • Holds different database files in separate
    tables
  • Tables can be linked together as needed, e.g.
    borrower and bibliographic records can be linked
    to give a full listing of which readers have
    borrowed which books
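The borrower/bibliographic example might look like the following in sqlite3. The three tables, their column names and the sample rows are assumptions; in particular the separate loans table is just one common way of recording which reader has which book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: separate tables joined through shared identifiers.
conn.executescript(
    """
    CREATE TABLE borrowers (borrower_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE loans (borrower_id INTEGER, book_id INTEGER);
    INSERT INTO borrowers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (10, 'Bleak House'), (11, 'Emma');
    INSERT INTO loans VALUES (1, 10), (1, 11), (2, 11);
    """
)

# Link the tables only when needed, to list which readers have which books.
loan_listing = conn.execute(
    """
    SELECT borrowers.name, books.title
    FROM loans
    JOIN borrowers ON borrowers.borrower_id = loans.borrower_id
    JOIN books ON books.book_id = loans.book_id
    ORDER BY borrowers.name, books.title
    """
).fetchall()

for name, title in loan_listing:
    print(name, "has borrowed", title)
```

Each fact is stored once (a reader's name, a book's title) and the join assembles the combined listing on demand.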

15
RDBMS Table structure
16
DBMS vs Text/Information Retrieval Systems
  • Database (data-based) systems: data is typically
    encoded, highly structured, with fixed access
    points
  • Text retrieval/information retrieval systems: data
    is typically textual information, less
    structured, less organised, more natural

17
Text/Information Retrieval Systems
  • Data is still held in a database but is accessed
    by different software
  • Distinction is in this and the different
    organisation (and nature) of the data

18
Text/Information Retrieval Systems
  • Can search for terms anywhere in document
  • MAY limit search to specific fields, if required
    and present
  • Terms may be word, phrase, etc.
  • Index is an inverted file
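As a rough illustration of an inverted file, the sketch below maps each word to the set of documents that contain it, so a term can be looked up without scanning the documents themselves. The sample documents and the function name build_inverted_file are assumptions, and real systems usually also store word positions and field labels.

```python
from collections import defaultdict

# Hypothetical mini-collection: document number -> text.
documents = {
    1: "marine biology of the north sea",
    2: "sea birds of new york",
    3: "the biology of birds",
}


def build_inverted_file(docs):
    """Map every word to the set of document numbers containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


inverted = build_inverted_file(documents)

print(sorted(inverted["sea"]))      # [1, 2]
print(sorted(inverted["biology"]))  # [1, 3]
```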

19
Inverted file structure
20
Indexing
  • Hybridised systems common
  • Indexing effort relates to indexing method
  • Automated systems - minimal indexer effort,
    maximum user effort
  • Controlled systems - maximum indexer effort,
    minimum user effort

21
Indexing spectrum
  • free text unindexed
  • free text indexed
  • free text indexed and controlled
  • controlled
  • INITIAL EFFORT AT INPUT
  • DETERMINES
  • USER EFFORT AT OUTPUT

22
NB do not confuse
  • Free-text systems: refers to the method of
    indexing
  • Full-text systems: describes the content
  • Full-text systems contain complete documents

23
What is online searching?
  • Looking for information
  • Using a computer
  • In real time (interactively)
  • Includes
  • Traditional (remote host) online searching
  • Internet searching
  • CD-ROM/DVD searching

24
Benefits of using online searching
  • Speed/volume of material
  • Ability to combine topics
  • Currency
  • Immediate feedback

25
Very abridged history of Online
  • 1951 Bagley experiments at MIT
  • 1964 MEDLARS on tape
  • 1972 Dialog commercial service
  • 1974 Telenet/Tymnet nodes in Europe
  • 1991 ESA-IRS new price structure
  • 1992 Dialog provides Internet access
  • 1996 Surge in Internet search engines
  • 2000 Portals, agents, precision problems

26
Basic components of an online search service
  • Information providers (database producers)
  • Search services (hosts)
  • Communications links (telephone/data lines)
  • Terminal equipment (workstation/software)

27
Basic tools for online searching
  • Boolean operators
  • Proximity searching
  • Truncation/substitution
  • Index viewing
  • Qualification
  • Controlled terms
  • Best match searching

28
Boolean operators
  • And: narrows a search; reduces number of hits
  • Or: broadens a search; increases number of hits
  • Not: narrows a search; excludes some hits
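The three operators correspond closely to set operations on the lists of records that contain each term. The sketch below uses made-up record numbers to show the narrowing and broadening effects.

```python
# Hypothetical posting sets: record numbers that contain each term.
marine = {1, 2, 3}
biology = {2, 3, 4}
sea = {3, 5}

and_hits = marine & biology  # records containing both terms
or_hits = marine | sea       # records containing either term
not_hits = marine - sea      # records with the first term but not the second

print(sorted(and_hits))  # [2, 3]
print(sorted(or_hits))   # [1, 2, 3, 5]
print(sorted(not_hits))  # [1, 2]
```

AND and NOT can never return more hits than the first set alone, while OR can never return fewer.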

29
Boolean diagrams
  • marine and biology
  • marine or sea
  • york not new
  • 1 and 2 not 3 or 4
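A statement such as "1 and 2 not 3 or 4" can give different answers depending on the order in which the operators are processed. The sketch below compares two possible groupings on invented sets; which order a given search service actually applies is an assumption to check against that service's documentation, and explicit brackets avoid the question where they are supported.

```python
# Hypothetical result sets for search statements 1 to 4.
s1 = {1, 2, 3, 4}
s2 = {2, 3, 4, 5}
s3 = {3}
s4 = {9}

# Strict left-to-right processing: ((1 and 2) not 3) or 4
left_to_right = ((s1 & s2) - s3) | s4

# One alternative grouping: (1 and 2) not (3 or 4)
grouped = (s1 & s2) - (s3 | s4)

print(sorted(left_to_right))  # [2, 4, 9]
print(sorted(grouped))        # [2, 4]
```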

30
Boolean AND
31
Boolean OR
32
Boolean NOT
33
Boolean order of processing
34
Proximity searching
  • The importance of word position
  • Proximity operators
  • Adjacency/phrase
  • Distance
  • Sentence/paragraph
  • Element
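Adjacency and distance operators depend on knowing where each word occurs. A minimal sketch, assuming simple whitespace tokenisation and a hypothetical within function (real hosts use operators with their own syntax):

```python
def positions(text):
    """Map each word to the list of positions at which it occurs."""
    pos = {}
    for i, word in enumerate(text.lower().split()):
        pos.setdefault(word, []).append(i)
    return pos


def within(text, first, second, max_distance, ordered=False):
    """True if the two words occur within max_distance words of each other.

    With ordered=True, `second` must come after `first`; a distance of 1
    with ordered=True is therefore an adjacency (phrase) test.
    """
    pos = positions(text)
    for i in pos.get(first, []):
        for j in pos.get(second, []):
            gap = j - i
            if ordered and gap <= 0:
                continue
            if 0 < abs(gap) <= max_distance:
                return True
    return False


text = "information retrieval from online information services"

print(within(text, "information", "retrieval", 1, ordered=True))  # True
print(within(text, "retrieval", "information", 1, ordered=True))  # False
print(within(text, "retrieval", "online", 2))                     # True
print(within(text, "retrieval", "services", 2))                   # False
```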

35
Truncation, Index viewing, Qualification
  • Stemming
  • Left/right truncation
  • Limited/unlimited substitution
  • Index viewing
  • Field/Code qualifiers
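Truncation and substitution amount to pattern matching against the index terms. The sketch below borrows the shell-style symbols understood by Python's fnmatch module ("*" for any run of characters, "?" for exactly one character); these symbols and the sample term list are assumptions, since each search service defines its own.

```python
from fnmatch import fnmatchcase

# Hypothetical slice of an index, as might be shown by index viewing.
index_terms = [
    "compute", "computer", "computing",
    "organisation", "organization",
    "woman", "women",
]


def match_terms(pattern, terms):
    """Return the index terms that match a truncation/substitution pattern."""
    return [term for term in terms if fnmatchcase(term, pattern)]


right_truncation = match_terms("comput*", index_terms)
left_truncation = match_terms("*ation", index_terms)
substitution = match_terms("wom?n", index_terms)

print(right_truncation)  # ['compute', 'computer', 'computing']
print(left_truncation)   # ['organisation', 'organization']
print(substitution)      # ['woman', 'women']
```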

36
Controlled terms
  • Index terms
  • Major/minor descriptors
  • Codes
  • Thesaural control
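Thesaural control can be pictured as a lookup that replaces whatever term the searcher enters with the preferred descriptor used by the indexers. The tiny thesaurus and the function name below are invented for illustration.

```python
# Hypothetical thesaurus: non-preferred entry term -> preferred descriptor.
USE = {
    "cars": "automobiles",
    "motor cars": "automobiles",
    "autos": "automobiles",
}


def preferred_term(term):
    """Return the controlled descriptor for a term, or the term itself."""
    normalised = term.strip().lower()
    return USE.get(normalised, normalised)


print(preferred_term("Motor cars"))   # automobiles
print(preferred_term("automobiles"))  # automobiles
print(preferred_term("bicycles"))     # bicycles (not in this thesaurus)
```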

37
Best match searching
  • Searching for idiots...
  • Automatic fuzzy searching
  • No operators
  • Ranked output
  • Weighting of terms
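A minimal sketch of best match searching: the query is just a list of words with no operators, each word is weighted so that rarer terms count for more, and the output is ranked by total score. The weighting formula used here (term count multiplied by log(1 + N/df)) is one simple assumption among many possible schemes, and the documents are invented.

```python
import math
from collections import Counter

# Hypothetical mini-collection: document number -> text.
documents = {
    1: "marine biology of the north sea",
    2: "sea birds of new york",
    3: "the biology of birds",
}


def rank(query, docs):
    """Return document numbers ordered from best to worst match."""
    n = len(docs)
    counts = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    scores = {}
    for doc_id, words in counts.items():
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for c in counts.values() if term in c)
            if df == 0:
                continue
            # Rarer terms receive a higher weight.
            score += words[term] * math.log(1 + n / df)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document number.
    return sorted(scores, key=lambda d: (-scores[d], d))


print(rank("marine biology", documents))  # [1, 3]
print(rank("sea birds", documents))       # [2, 1, 3]
```

Documents that match only some of the query words still appear, lower down the list, which is what separates best match from a strict Boolean AND.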

38
Recall and Precision
  • Recall: number of relevant items found out of the
    total number of relevant items in the database
  • Precision: number of suitable (relevant) items out
    of the total number retrieved
  • Typically trade off against each other
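The two measures can be worked through on a small example. The function name and the record numbers below are assumptions; in practice the full set of relevant items in a database is rarely known, so recall usually has to be estimated.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved record numbers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually found
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: 4 records retrieved, 5 relevant records in the database.
recall, precision = recall_precision(
    retrieved={1, 2, 3, 4},
    relevant={2, 4, 6, 8, 10},
)

print(recall)     # 0.4  (2 of the 5 relevant records were found)
print(precision)  # 0.5  (2 of the 4 retrieved records were relevant)
```

Broadening the search to pick up records 6, 8 and 10 would raise recall, but would usually bring in more unwanted records and so lower precision.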

39
Net searching
  • Types of engine
  • Machine indexed
  • Human indexed
  • Agent/Metasearch tools

40
Net searching
  • Particular problems with overload
  • Default high recall, low precision
  • Strength of categorised services
  • Google and citation ranking
  • benefits
  • drawbacks
  • Fundamental difficulties of the global database

41
Strategies for searching
  • Quick 'n' dirty
  • Building block approach
  • Successive fraction approach
  • Pearl growing approach

42
Keys for successful searching
  • Need to be aware of true information need!
  • Need to take account of database type
  • Need to be aware of particular structures
  • Need to be aware of special tools
  • Need to understand limitations of interface

43
That's it for today...
  • Any Questions?

44
This week's practicals
  • More XHTML!
  • Only two weeks to Online Information at Olympia!
  • Register free online right away via
  • http://www.online-information.co.uk/
  • (link for registration in the top right hand
    corner) or you'll have to pay 25 to get in!