LIS618 lecture 0

Provided by: kric

Learn more at: http://openlib.org

1
LIS618 lecture 0

Thomas Krichel
2003-09-14

2
today's lecture

I will not talk about the strike.
A look at the course home page
http://wotan.liu.edu/home/krichel/lis618n03a
administrative stuff
historical matters about the course
about me
business of database searching
indexes
the Boolean information retrieval model
practice example on Dialog

3
Organization

homepage: http://wotan.liu.edu/home/krichel/lis618n03a
Contents to be discussed today.
Send mail to krichel_at_openlib.org
Your name
Your secret word for grades delivery
Interrupt me with as many questions as possible!
Ask for breaks!

4
Proposed Organization

Normal lecture
Quiz at the beginning of every lecture
Factually oriented, around 15 minutes
Remove worst performance
Average to form 50%
Search exercise 50%
Formal syllabus to be made early next week!

5
Search exercise

find victim of an information need
best to take someone you know in a professional capacity
conduct interview about an information need experienced by the victim, write down expectations
search in formal database and on web
discuss results with the victim
write essay, no longer than 7 pages.

6
about the course

This course is new wine in an old bottle
Officially a merger of
lis566 information resources on the Internet
mailing lists
usenet news
web searching
lis618 database searching
access and use of commercial databases

7
mix of theory and practice

I am not a database search practitioner.
Each database is different; practical skills are not easily transferable.
Thus my emphasis in the course is more on theory.
In the past, I did theory first, then practice.
This year I will try to mix: some theory and some practice in every session.

8
What databases?

Dialog has been the traditional database covered.
They were the market leaders in online databases in the past.
Nowadays the field is much more open.
In addition I have done Nexis and FirstSearch (OCLC) in the past.
But I am open to suggestions.

9
About me

Born 1965, in Völklingen (Germany)
Studied economics and social sciences at the Universities of Toulouse, Paris, Exeter and Leicester.
PhD in theoretical macroeconomics
Lecturer in Economics at the University of Surrey between 1993 and 2001
Since 2001 assistant professor at the Palmer School

10
Why?

During my research assistantship period (1990 to 1993), I was constantly frustrated with difficult access to scientific literature.
At the same time, I discovered easy access to freely downloadable software over the Internet.
I decided to work towards downloadable scientific documents. This led to my library career (eventually).

11
Steps taken I

1993: founded the NetEc project at http://netec.mcc.ac.uk, later available at http://netec.ier.hit-u.ac.jp as well as at http://netec.wustl.edu.
These are networking projects targeted to the economics community. The bulk is
Information about working papers
Downloadable working papers
Journal articles were added later

12
Steps taken II

Set up RePEc, a digital library for economics research. It catalogs
Research documents
Collections of research documents
Researchers themselves
Organizations that are important to the research process
Decentralized collection, model for the Open Archives Initiative

13
Steps taken III

Co-founder of Open Archives Initiative
Work on the Academic Metadata Format
Co-founded rclis (Research in Computing, Library and Information Science), a RePEc clone

14
Interest in databases

From my point of view I have two interests in database searching
As a provider, I must understand how people search in order to provide some data that they can use and will use.
As an economist, I have a strong interest in information as a commodity. The database market is an important market place.
Main emphasis of course is still on databases.

15
Database searching (DS)

subset of the subject of information retrieval (IR)
DS is mainly thought of as applying to large structured databases, as opposed to web searching
for those, a general knowledge of what databases are seems useful
Concentrate on textual databases

16
traditional social model

user goes to a library
describes problem to the librarian
librarian does the search
without the user present
with the user present
hands over the result to the user
user fetches full text or asks a librarian to fetch the full text.

17
economic rationale for traditional model

In olden days the cost of telecommunication was high.
database use costs
cost of communication
cost of access time to the database
the traditional model puts an upper bound on costs

18
disintermediation

with the cost of access time gone, the traditional model is under threat
there is disintermediation where the librarian loses her role
but that may not be good news for information retrieval results
user knows subject matter best
librarian knows searching best

19
Web searching

IR has received a lot of impetus through the web, which poses unprecedented search challenges.
with more and more data appearing on the web, DS may be a subject in decline
it is primarily concerned with non-web databases
There are more and more web-based methods of searching

20
Public access vs quality

Now the public at large is able to do online searching.
At the same time the need for quality answers has grown.
Quality-filtered services will become more important.
In the current databases, there is a lot that would already be available for free mixed with quality-controlled stuff.
Publishers have direct offerings and intermediated vending is in decline.

21
Main theory part

Literature: "Modern Information Retrieval" by Ricardo Baeza-Yates and Berthier Ribeiro-Neto
Don't buy it. It is not a good book.

22
before the IR process

provider
define data that is available
documents that can be used
document operations
document structure
index
user
user need
IR system familiarity

23
the IR process

query expresses user need in a query language
processing of query yields retrieved documents
calculation of relevance ranking
examination of retrieved documents
possible relevance cycle

24
main problem

user is not an expert at the formulation of a query
garbage in, garbage out: the retrieval yields poor results
ways out
design very intuitive interface for the query
give expert guidance

25
taxonomy of classic IR models

Boolean, or set-theoretic
fuzzy set models
extended Boolean
vector, or algebraic
generalized vector model
latent semantic indexing
neural network model
probabilistic
inference network
belief network

26
summary

There are three basic types of models in classic information retrieval.
Extensions of these types are a matter of research concern and require good mathematical skills.
All classic models treat documents as individual pieces.

27
key aid: index

an index is a list of terms, with a list of locations where the term is to be found.
The way to express locations usually depends on the form that the indexed data takes.
for a book, it is usually the page number, e.g. "shmoo 34, 75"
for computer files it is usually the name of the file plus the number of the byte where the indexed term starts, e.g. "krichel index.html 34, cv.html 890 1209"
there is usually more than one location of the term.
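
The index described on this slide can be sketched in a few lines of Python. This is only an illustration, not how any particular system builds its index: the function name build_index and the sample file contents are made up for the example, and the positions are character offsets, which match byte offsets only for plain ASCII text.

```python
import re
from collections import defaultdict

def build_index(files):
    """Map each term to {file name: [offsets where the term starts]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in files.items():
        # Treat every run of letters as a term (a simplifying assumption).
        for match in re.finditer(r"[a-z]+", text.lower()):
            index[match.group()][name].append(match.start())
    return index

# Hypothetical file contents, invented for this example.
files = {
    "index.html": "home page of krichel",
    "cv.html": "krichel cv: krichel was born in 1965",
}
index = build_index(files)

# All locations of one term, as plain dicts and lists.
locations = {name: offsets for name, offsets in index["krichel"].items()}
print(locations)
```

Looking up a term returns every file and offset where it starts, which is the "more than one location" case from the last bullet.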

28
key aid: index terms

index term is a part of the document that has a meaning on its own.
it is usually a noun.
retrieval based on index term raises questions
semantics in query or document is lost
matching done in imprecise space of index terms
predicting relevance is a central problem
the IR model determines the process of relevance ranking

29
basic concept: weight of index term

given all nouns, not all appear to have the same relevance to the text
sometimes, we can have a simple measure of the importance of a term, example?
more generally, for each indexing term and each document we can associate a weight with the term and the document.
usually, if the document does not contain the term, its weight is zero
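
One candidate for a simple measure of importance is how often a term occurs in a document. The sketch below uses that raw count as the weight. This choice is an assumption made for illustration, not a measure the lecture commits to, and the function name term_weight and the sample documents are invented.

```python
from collections import Counter

def term_weight(term, document):
    """Weight of a term in a document: its number of occurrences.

    A term that does not occur in the document gets weight 0.
    """
    counts = Counter(document.lower().split())
    return counts[term.lower()]

# Hypothetical documents, invented for this example.
docs = {
    "d1": "index terms and index weights",
    "d2": "boolean retrieval model",
}
weights = {name: term_weight("index", text) for name, text in docs.items()}
print(weights)  # {'d1': 2, 'd2': 0}
```

Each (term, document) pair gets one number, and the pair gets zero when the term is absent, as the last bullet states.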

30
Boolean model

in the Boolean model, the weight of any index term for any document is 1 if the term appears in the document. It is 0 otherwise.
This allows query terms to be combined with the Boolean operators AND, OR, and NOT
thus powerful queries can be written
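
A minimal sketch of the Boolean model in Python, assuming whitespace-separated words serve as index terms. Weights of 1 and 0 are represented by membership in a set of documents, so AND, OR and NOT become set intersection, union and difference. The names binary_weights and docs_with and the three sample documents are made up for the example; this is not Dialog's query syntax.

```python
def binary_weights(docs):
    """Map each term to the set of documents in which its weight is 1."""
    postings = {}
    for name, text in docs.items():
        for term in set(text.lower().split()):
            postings.setdefault(term, set()).add(name)
    return postings

# Hypothetical documents, invented for this example.
docs = {
    "d1": "database searching on dialog",
    "d2": "web searching",
    "d3": "database design",
}
postings = binary_weights(docs)
everything = set(docs)

def docs_with(term):
    """Documents whose weight for the term is 1 (empty set if none)."""
    return postings.get(term, set())

both = docs_with("database") & docs_with("searching")   # database AND searching
either = docs_with("database") | docs_with("web")       # database OR web
without = docs_with("searching") - docs_with("web")     # searching AND NOT web
negated = everything - docs_with("database")            # NOT database
print(sorted(both), sorted(either), sorted(without), sorted(negated))
```

Because every weight is either 1 or 0, a document either matches a query or it does not; there is no ranking among the matches.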

31
Classic implementation: Dialog

http://training.dialog.com/sem_info/courses/pdf_sem/dlg1.pdf
http://training.dialog.com/sem_info/courses/pdf_sem/dlg2.pdf
http://training.dialog.com/sem_info/courses/pdf_sem/dlg3.pdf
http://training.dialog.com/sem_info/courses/pdf_sem/dlg4.pdf

32
Dialog is a databank

over 500 databases
these are also known as files and cover
references and abstracts for published literature
business information and financial data
complete text of articles and news stories
statistical tables
Directories
DIALOG uses the Boolean model

33
DIALOG interface

is still rooted in "traditional" database systems
dismissed as "dial-a-dog"
it uses a command-driven interface
it is very complicated to learn fully
it is not suitable for the end-user
it therefore offers a valuable skill to the information professional
it is a challenge for a professor to teach

34
Accessing DIALOG

On the web, go to http://www.dialogweb.com/
Enter username and password
Forget about subaccount
then click on logon
On the next screen go to command search
"continue" at the next screen

35
two steps in DIALOG

step one: select databases (aka files) to look at
step two: perform searches on the selected databases
You may wonder why one does not have one single step like in a search engine. Discuss.

36
sample search