Title: Principles of Computing and Information Technology Lecture 6 Databases and database retrieval
1Principles of Computing and Information
Technology Lecture 6 - Databases and database
retrieval
- Andy Dawson
- Department of Information Studies, UCL
2What we're going to look at today
- What is a database?
- Database structure
- (R)DBMS vs IR
- Indexing of databases
- Mechanisms for database retrieval
3What is a database?
- Not just electronically mounted!
- A collection of data
- With inherent relationships
- Put together in a structured way
- With the intention of being used for a purpose
4What is a database?
- DATA ELEMENTS are combined into
- DATA STRUCTURES which are built into a
- DATABASE: An organised collection of related sets
of data managed in such a way as to allow the
user to view the complete collection, or a
logical subset of that collection, as a single
unit.
5Views in a database
- A view is like a search request: "find all books
by Dickens", "list all authors in catalogue"
- i.e. retrieve those items which match our search
terms
- Databases can vary from the relatively simple to
the highly complex
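The idea of a view as a search request can be sketched in a few lines of Python. This is only an illustration: the catalogue records, the field names and the view() helper are all made up for the example and do not come from any particular database package.

```python
# Hypothetical catalogue records; titles and field names are illustrative only.
catalogue = [
    {"author": "Dickens", "title": "Bleak House"},
    {"author": "Austen", "title": "Emma"},
    {"author": "Dickens", "title": "Hard Times"},
]

def view(records, **criteria):
    """Return the logical subset of records whose fields match all criteria."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

# "Find all books by Dickens"
dickens_books = view(catalogue, author="Dickens")

# "List all authors in catalogue" (each author once, in alphabetical order)
authors = sorted({r["author"] for r in catalogue})

print([r["title"] for r in dickens_books])  # ['Bleak House', 'Hard Times']
print(authors)                              # ['Austen', 'Dickens']
```

Both results are subsets of the same collection, each presented as a single unit, which is the sense of "view" used here.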
6The simplest possible model of a Computer System
revisited!
7The importance of database structure
- Organisation of data in a database affects the
ways in which information can be retrieved
8Database structure
- Records
- Fields
- File structure
- Indexing
- Classification
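As a rough illustration of records and fields, the sketch below reads a tiny flat file with Python's standard csv module. The file contents and the field names (author, title, copies) are invented for the example.

```python
import csv
import io

# A hypothetical flat file: one line per record, one column per field.
flat_file = io.StringIO(
    "author,title,copies\n"
    "Dickens,Bleak House,2\n"
    "Austen,Emma,1\n"
)

# Each record becomes a dictionary keyed by field name.
records = list(csv.DictReader(flat_file))

print(len(records))              # 2
print(list(records[0].keys()))   # ['author', 'title', 'copies']
print(records[1]["title"])       # Emma
```

Every record has the same fields in the same order, which is what makes field-based retrieval possible.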
9Structural organisation within databases
- Flat file databases
- Complex databases
- DBMS and Relational databases
10A complex flat file database system
11A DBMS system
12Database Management Systems (DBMS)
- Special purpose software to facilitate shared
access to data in a database, maintaining
reliability, security and integrity of the
database by controlling access and supervising
updates.
- A set of programs which enable us to set up a
database of our own data.
13Database Management Systems
- DBMS are intended to handle structured data
- Architecture varies
- Flexible report generation based on field
structure
- BUT limitations of DBMS in searching
- Need to specify field
- Exact match on whole field
- Slow on unindexed fields
- Report formats may be limited
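The searching limitations listed above can be seen in a minimal sketch. It assumes a toy in-memory file of records; the scan() and build_index() helpers are hypothetical and stand in for what a real DBMS does internally.

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "author": "Dickens, Charles", "title": "Bleak House"},
    {"id": 2, "author": "Austen, Jane", "title": "Emma"},
    {"id": 3, "author": "Dickens, Charles", "title": "Hard Times"},
]

def scan(records, field, value):
    """Unindexed search: every record is examined, so cost grows with file size."""
    return [r["id"] for r in records if r[field] == value]

def build_index(records, field):
    """Index one field: map each whole field value to the ids of matching records."""
    index = {}
    for r in records:
        index.setdefault(r[field], []).append(r["id"])
    return index

author_index = build_index(records, "author")

print(scan(records, "author", "Dickens, Charles"))  # [1, 3]
print(author_index.get("Dickens, Charles", []))     # [1, 3]
print(author_index.get("Dickens", []))              # []
```

The last lookup fails because the index holds whole field values: the searcher must name the field and supply the exact contents of that field.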
14RDBMS
- Development of DBMS concept
- Holds different database files in separate
tables
- Tables can be linked together as needed, e.g.
borrower and bibliographic records can be linked
to give a full listing of which readers have
borrowed which books
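The borrower/bibliographic example can be sketched with SQLite, which ships with Python as the sqlite3 module. The table names, column names and sample rows below are assumptions made for the illustration; a real library system would have a much richer schema.

```python
import sqlite3

# An in-memory database with three hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE borrowers (borrower_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books     (book_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE loans     (borrower_id INTEGER REFERENCES borrowers,
                            book_id INTEGER REFERENCES books);
    INSERT INTO borrowers VALUES (1, 'Reader A'), (2, 'Reader B');
    INSERT INTO books VALUES (10, 'Bleak House'), (11, 'Emma');
    INSERT INTO loans VALUES (1, 10), (1, 11), (2, 11);
""")

# Link the tables on their shared keys to list which readers borrowed which books.
rows = conn.execute("""
    SELECT borrowers.name, books.title
    FROM loans
    JOIN borrowers ON borrowers.borrower_id = loans.borrower_id
    JOIN books     ON books.book_id = loans.book_id
    ORDER BY borrowers.name, books.title
""").fetchall()

for name, title in rows:
    print(name, "has borrowed", title)

conn.close()
```

Borrower details and book details are each stored once, and the loans table holds only the keys that link them.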
15RDBMS Table structure
16DBMS vs Text/Information Retrieval Systems
- Database (data-based) systems: data is typically
encoded, highly structured, with fixed access
points
- Text retrieval/information retrieval systems: data
is typically textual information, less
structured, less organised, more natural
17Text/Information Retrieval Systems
- Data is still held in a database but is accessed
by different software
- Distinction is in this and the different
organisation (and nature) of the data
18Text/Information Retrieval Systems
- Can search for terms anywhere in document
- MAY limit search to specific fields, if required
and present
- Terms may be word, phrase, etc.
- Index is an inverted file
19Inverted file structure
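A minimal sketch of an inverted file, assuming three made-up documents: instead of storing "document -> words", the index stores "word -> documents containing it". The tokenize() and build_inverted_file() names are hypothetical helpers for this example.

```python
import re
from collections import defaultdict

# Hypothetical document collection, keyed by document number.
documents = {
    1: "Marine biology of the North Sea",
    2: "Sea birds of York",
    3: "New York marine life",
}

def tokenize(text):
    """Split text into lower-case word terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_file(docs):
    """Map every term to the set of documents in which it occurs."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            inverted[term].add(doc_id)
    return inverted

inverted = build_inverted_file(documents)

print(sorted(inverted["marine"]))  # [1, 3]
print(sorted(inverted["sea"]))     # [1, 2]
print(sorted(inverted["york"]))    # [2, 3]
```

Because every word is an entry point, a term can be found anywhere in a document without scanning the documents themselves.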
20Indexing
- Hybridised systems common
- Indexing effort relates to indexing method
- Automated systems - minimal indexer effort,
maximum user effort
- Controlled systems - maximum indexer effort,
minimum user effort
21Indexing spectrum
- free text unindexed
- free text indexed
- free text indexed and controlled
- controlled
- INITIAL EFFORT AT INPUT DETERMINES USER EFFORT AT OUTPUT
22NB do not confuse
- Free-text systems: referring to the method of
indexing
- Full-text systems: describing the content
- Full-text systems contain complete documents
23What is online searching?
- Looking for information
- Using a computer
- In real time (interactively)
- Includes
- Traditional (remote host) online searching
- Internet searching
- CD-ROM/DVD searching
24Benefits of using online searching
- Speed/volume of material
- Ability to combine topics
- Currency
- Immediate feedback
25Very abridged history of Online
- 1951 Bagley experiments at MIT
- 1964 MEDLARS on tape
- 1972 Dialog commercial service
- 1974 Telenet/Tymnet nodes in Europe
- 1991 ESA-IRS new price structure
- 1992 Dialog provides Internet access
- 1996 Surge in Internet search engines
- 2000 Portals, agents, precision problems
26Basic components of an online search service
- Information providers (database producers)
- Search services (hosts)
- Communications links (telephone/data lines)
- Terminal equipment (workstation/software)
27Basic tools for online searching
- Boolean operators
- Proximity searching
- Truncation/substitution
- Index viewing
- Qualification
- Controlled terms
- Best match searching
28Boolean operators
- And
- Narrows a search
- Reduces number of hits
- Or
- Broadens a search
- Increases number of hits
- Not
- Narrows a search
- Excludes some hits
29Boolean diagrams
- marine and biology
- marine or sea
- york not new
- 1 and 2 not 3 or 4
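The first three searches above map directly onto set operations. The sketch below assumes a small, made-up set of postings (the document numbers in which each term occurs).

```python
# Hypothetical postings: term -> set of document numbers containing it.
postings = {
    "marine": {1, 3},
    "biology": {1},
    "sea": {1, 2},
    "york": {2, 3},
    "new": {3},
}

# AND is set intersection: only documents containing both terms.
marine_and_biology = postings["marine"] & postings["biology"]

# OR is set union: documents containing either term.
marine_or_sea = postings["marine"] | postings["sea"]

# NOT is set difference: documents with the first term but not the second.
york_not_new = postings["york"] - postings["new"]

print(sorted(marine_and_biology))  # [1]
print(sorted(marine_or_sea))       # [1, 2, 3]
print(sorted(york_not_new))        # [2]
```

AND and NOT can only shrink the first set, while OR can only grow it, which is why they narrow and broaden a search respectively.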
30Boolean AND
31Boolean OR
32Boolean NOT
33Boolean order of processing
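Why the order of processing matters can be shown with a search such as "1 and 2 not 3 or 4". The sketch below uses four made-up result sets and evaluates the same statement under two possible groupings. Which grouping a given search service applies is not specified here, so both are assumptions for illustration.

```python
# Hypothetical result sets for four earlier search statements.
s1 = {1, 2, 3, 4}
s2 = {2, 3, 4, 5}
s3 = {3}
s4 = {4, 6}

# Reading 1: strictly left to right, ((1 AND 2) NOT 3) OR 4.
left_to_right = ((s1 & s2) - s3) | s4

# Reading 2: OR grouped first, (1 AND 2) NOT (3 OR 4).
or_grouped_first = (s1 & s2) - (s3 | s4)

print(sorted(left_to_right))     # [2, 4, 6]
print(sorted(or_grouped_first))  # [2]
```

The two readings retrieve different documents from the same statement, so parentheses are the safest way to say what is meant.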
34Proximity searching
- The importance of word position
- Proximity operators
- Adjacency/phrase
- Distance
- Sentence/paragraph
- Element
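Adjacency and distance operators depend on the index recording where each word occurs, not just whether it occurs. The following is a minimal sketch under that assumption; positions() and within() are hypothetical helpers, and real hosts each have their own operator syntax.

```python
import re

def positions(text):
    """Map each term to the list of word positions at which it occurs."""
    index = {}
    for pos, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        index.setdefault(term, []).append(pos)
    return index

def within(text, first, second, max_gap=0, ordered=True):
    """True if `second` occurs with at most max_gap words between it and `first`.

    max_gap=0 with ordered=True is adjacency, i.e. a phrase search.
    """
    index = positions(text)
    for p in index.get(first, []):
        for q in index.get(second, []):
            distance = q - p if ordered else abs(q - p)
            if 1 <= distance <= max_gap + 1:
                return True
    return False

doc = "Information retrieval differs from the retrieval of structured information"

print(within(doc, "information", "retrieval"))                 # True
print(within(doc, "retrieval", "information"))                 # False
print(within(doc, "retrieval", "information", max_gap=2))      # True
print(within(doc, "retrieval", "information", ordered=False))  # True
```

Sentence, paragraph and element (field) operators follow the same idea, with the position recorded at a coarser level than the individual word.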
35Truncation, Index viewing, Qualification
- Stemming
- Left/right truncation
- Limited/unlimited substitution
- Index viewing
- Field/Code qualifiers
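Truncation and substitution can be imitated with the wildcard matching in Python's standard fnmatch module. The symbols used here (* for any number of characters, ? for exactly one) are Python's conventions. Online hosts use their own symbols, and the index terms are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical alphabetical index of terms, as shown by an index-viewing command.
index_terms = ["biology", "color", "colour", "librarian", "libraries",
               "library", "theology", "woman", "women"]

def expand(pattern, terms):
    """Return the index terms matched by a truncated or masked search term."""
    return [t for t in terms if fnmatchcase(t, pattern)]

print(expand("librar*", index_terms))  # right truncation
# ['librarian', 'libraries', 'library']
print(expand("*ology", index_terms))   # left truncation
# ['biology', 'theology']
print(expand("wom?n", index_terms))    # limited substitution (exactly one character)
# ['woman', 'women']
print(expand("colo*r", index_terms))   # unlimited substitution (any number of characters)
# ['color', 'colour']
```

Viewing the expanded list before searching is a useful check that a truncated stem has not pulled in unwanted terms.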
36Controlled terms
- Index terms
- Major/minor descriptors
- Codes
- Thesaural control
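Thesaural control can be pictured as a lookup that redirects whatever word the searcher types to the one preferred term used by the indexers. The thesaurus entries and the controlled index below are invented for the sketch.

```python
# Hypothetical "USE" references: non-preferred term -> preferred index term.
use_references = {"films": "motion pictures", "movies": "motion pictures"}

# Hypothetical controlled index: preferred term -> document numbers.
controlled_index = {"motion pictures": {1, 4}, "photography": {2}}

def search_controlled(term):
    """Translate a term to its preferred form, then look it up in the index."""
    preferred = use_references.get(term.lower(), term.lower())
    return preferred, sorted(controlled_index.get(preferred, set()))

print(search_controlled("Movies"))       # ('motion pictures', [1, 4])
print(search_controlled("photography"))  # ('photography', [2])
```

The indexer's effort in assigning one agreed term saves the user from having to think of every synonym.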
37Best match searching
- Searching for idiots...
- Automatic fuzzy searching
- No operators
- Ranked output
- Weighting of terms
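A rough sketch of best match searching: the query is just a list of words with no operators, each term is weighted (rarer terms count for more), and the output is ranked by score. The weighting formula used here is one simple choice made for the illustration, not the method of any particular system, and the documents are made up.

```python
import math
import re
from collections import Counter

# Hypothetical document collection.
documents = {
    1: "marine biology of the sea",
    2: "sea birds and sea fishing",
    3: "history of york",
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def best_match(query, docs):
    """Rank documents by the summed weights of the query terms they contain."""
    doc_terms = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in doc_terms.items():
        score = 0.0
        for term in set(tokenize(query)):
            # Document frequency: in how many documents does the term occur?
            df = sum(1 for c in doc_terms.values() if term in c)
            if df:
                # Weight = occurrences in this document x rarity of the term.
                score += counts[term] * math.log(1 + n_docs / df)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document number.
    return sorted(scores, key=lambda d: (-scores[d], d))

ranking = best_match("marine sea life", documents)
print(ranking)  # [1, 2]
```

Document 1 ranks first because it matches two query terms, one of them rare; document 3 matches nothing and is left out; the unmatched word "life" is simply ignored rather than causing the search to fail.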
38Recall and Precision
- Recall: number of relevant items found, out of the total
number of relevant items in the database
- Precision: number of suitable items, out of the total
number retrieved
- Typically trade off against each other
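The two measures, and the trade-off between them, can be worked through with a small example. The document numbers are made up: four items in the database are assumed to be relevant, and a narrow and a broad search are compared.

```python
def recall(retrieved, relevant):
    """Relevant items found, as a fraction of all relevant items in the database."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def precision(retrieved, relevant):
    """Relevant items found, as a fraction of everything retrieved."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

# Hypothetical judgements: items 1-4 are the relevant ones.
relevant = {1, 2, 3, 4}
narrow_search = {1, 2}              # few hits, all useful
broad_search = {1, 2, 3, 5, 6, 7}   # more useful hits, but more noise

print(recall(narrow_search, relevant), precision(narrow_search, relevant))  # 0.5 1.0
print(recall(broad_search, relevant), precision(broad_search, relevant))    # 0.75 0.5
```

Broadening the search raises recall from 0.5 to 0.75 but drops precision from 1.0 to 0.5.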
39Net searching
- Types of engine
- Machine indexed
- Human indexed
- Agent/Metasearch tools
40Net searching
- Particular problems with overload
- Default high recall, low precision
- Strength of categorised services
- Google and citation ranking
- benefits
- drawbacks
- Fundamental difficulties of the global database
41Strategies for searching
- Quick 'n' dirty
- Building block approach
- Successive fraction approach
- Pearl growing approach
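As an illustration of one of these strategies, the building block approach is commonly described as: split the topic into concepts, OR together the alternative terms for each concept, then AND the concept blocks. The sketch below follows that description with made-up postings; the terms and document numbers are assumptions.

```python
# Hypothetical postings: term -> set of document numbers containing it.
postings = {
    "marine": {1, 3, 5},
    "sea": {1, 2},
    "ocean": {4, 5},
    "pollution": {2, 5, 6},
    "contamination": {1, 7},
}

def block(*terms):
    """One building block: the OR (union) of alternative terms for a concept."""
    result = set()
    for term in terms:
        result |= postings.get(term, set())
    return result

water_block = block("marine", "sea", "ocean")
pollution_block = block("pollution", "contamination")

# Combine the blocks with AND (intersection).
final_set = water_block & pollution_block

print(sorted(water_block))      # [1, 2, 3, 4, 5]
print(sorted(pollution_block))  # [1, 2, 5, 6, 7]
print(sorted(final_set))        # [1, 2, 5]
```

Keeping each block as a separate set makes it easy to see which concept is responsible when the final result is too large or too small.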
42Keys for successful searching
- Need to be aware of true information need!
- Need to take account of database type
- Need to be aware of particular structures
- Need to be aware of special tools
- Need to understand limitations of interface
43That's it for today...
44This week's practicals
- More XHTML!
- Only two weeks to Online Information at Olympia!
- Register free online right away via
- http://www.online-information.co.uk/
- (link for registration in the top right hand
corner) or you'll have to pay £25 to get in!