Title: MetaLib Technical Notes, Sue Dentinger, UW Madison Library Technology Group, 3/29/06 WAAL presentation
1. MetaLib Technical Notes
Sue Dentinger, UW Madison Library Technology Group
3/29/06 WAAL presentation
2. Outline
- MetaLib: Both a Content Management System and a Metasearch Engine
- What Does a Metasearch Look Like in MetaLib?
- What Is a Metasearch?
- Types of Metasearch Connections to Various Sources
- Metasearch Pros and Cons
- Metasearch/Federated Searching Differences
- Example Federated Search
- Federated Search Pros and Cons
- Conclusion
- Articles Cited
3. MetaLib: Both a Content Management System (CMS) and a Metasearch Engine
- CMS: Oracle-based, menu-driven, with a web interface provided for resource administration.
- Web-based public interface that you can tailor to some degree with logos, graphics, and fonts.
- Or develop your own interface using the XML gateway API.
4. MetaLib Admin
5. Already Saw the MetaLib Interface. What Does MetaLib Look Like as a Metasearch?
6. During a Metasearch
7. Metasearch Results Page
8. One View of Results from a Metasearch
9. What Is a Metasearch?
- A metasearch searches external resources in real time.
- Metasearching means dynamically searching 2 or more resources.
- The search occurs in a unified environment. You don't control the incoming data sources.
- The interface controls how results are presented.
- Each source searched goes through a program (often ExLibris supplied, or you can write your own) that pre-establishes the search mechanism.
- The search method is predefined, using, for example, Z39.50, an XML gateway, Search and Link, or screen scraping.
- The patron doesn't really know or care which method is used.
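The real-time fan-out described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not MetaLib's implementation: the two connector functions, their canned results, and the per_source_limit parameter are all hypothetical stand-ins for the per-source programs mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Connector = Callable[[str, int], List[Record]]


def search_catalog(query: str, limit: int) -> List[Record]:
    """Hypothetical stand-in for a Z39.50 connector; returns canned data."""
    hits = [{"title": f"{query} (catalog hit {i})", "source": "catalog"} for i in range(1, 6)]
    return hits[:limit]


def search_articles(query: str, limit: int) -> List[Record]:
    """Hypothetical stand-in for an XML gateway connector; returns canned data."""
    hits = [{"title": f"{query} (article hit {i})", "source": "articles"} for i in range(1, 4)]
    return hits[:limit]


def metasearch(query: str, connectors: Dict[str, Connector], per_source_limit: int = 3) -> List[Record]:
    """Send one query to every source concurrently and merge what comes back."""
    with ThreadPoolExecutor(max_workers=len(connectors)) as pool:
        # Each source is searched through its own predefined connector.
        futures = [pool.submit(fn, query, per_source_limit) for fn in connectors.values()]
        merged: List[Record] = []
        for future in futures:
            merged.extend(future.result())
    return merged


CONNECTORS: Dict[str, Connector] = {"catalog": search_catalog, "articles": search_articles}
results = metasearch("semantic web", CONNECTORS, per_source_limit=2)
for record in results:
    print(record["source"], "-", record["title"])
```

The caller only sees one merged list, which mirrors the point that the patron does not need to know which connection method sits behind each source.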
10. Types of Metasearch Connections to Various Sources
- Z39.50: the vendor must run a Z39.50 server. MetaLib is the client (like EndNote).
  - Uses MARC, SUTRS (Simple Unstructured Text Record Syntax), or other underlying data formats.
- XML gateway (SRW/SRU: Search/Retrieve Web Service / Search/Retrieve URL Service). (Best choice)
  - 3 basic operations: Explain (structure), Scan, Search-Retrieve.
  - May use Common Query Language (CQL).
  - More efficient than Z39.50, returning XML results.
- HTTP requests, Search and Link resources.
  - Base URL and request are sent to the source; only hits are returned, and you go to the native interface to see results.
- Format conversion tables are often used to read results back into MetaLib.
- External program (often screen scraping).
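As a rough illustration of the XML gateway option, the sketch below builds an SRU searchRetrieve URL carrying a CQL query and reads the hit count out of a response. The parameter names (operation, version, query, maximumRecords) and the response namespace follow the SRU 1.1 specification; the base URL and the sample response are made up for the example, and none of this is taken from MetaLib's own connector code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlsplit

# Namespace used by SRW/SRU 1.1 response elements.
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}


def build_sru_url(base_url: str, cql_query: str, max_records: int = 10) -> str:
    """Build a searchRetrieve request URL for an SRU server."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


def count_hits(response_xml: str) -> int:
    """Read the total hit count from a searchRetrieveResponse document."""
    root = ET.fromstring(response_xml)
    return int(root.findtext("srw:numberOfRecords", namespaces=SRW_NS))


# Hypothetical server address and a hand-written, trimmed-down response.
url = build_sru_url("http://sru.example.org/catalog", 'dc.title = "semantic web"', 5)
SAMPLE_RESPONSE = """<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>42</srw:numberOfRecords>
</srw:searchRetrieveResponse>"""

print(url)
print(count_hits(SAMPLE_RESPONSE))
```

Because the request is a plain HTTP GET and the reply is XML, a connector of this kind needs no special client library, which is one reason the slide calls it more efficient than Z39.50.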
11. Z39.50 Example
Format Conversion pgm: specifies the program used to convert incoming records to the MetaLib internal format. ExLibris supplied, or you can write your own.
Record type: the incoming record format.
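To give a feel for what a format conversion step does, here is a minimal sketch that maps a simplified MARC-style record to a flat internal record. The internal field names, the conversion table, and the sample record are all hypothetical; MetaLib's actual internal format and conversion programs are not shown in these notes.

```python
from typing import Dict

# Hypothetical conversion table: MARC tag -> internal field name.
# 100 = main entry (personal name), 245 = title statement, 260 = imprint.
CONVERSION_TABLE: Dict[str, str] = {"100": "author", "245": "title", "260": "imprint"}


def convert_record(record: Dict[str, str], table: Dict[str, str]) -> Dict[str, str]:
    """Keep only the tags listed in the table and rename them to internal fields."""
    return {table[tag]: value.strip() for tag, value in record.items() if tag in table}


# Made-up incoming record with subfields already flattened to plain text.
incoming = {
    "100": "Author, Example ",
    "245": "An example title",
    "260": "Example Press, 2006",
    "999": "local data that the table does not map",
}

converted = convert_record(incoming, CONVERSION_TABLE)
print(converted)
```

A real conversion program would also have to handle subfields, repeated tags, and character encodings, which this sketch leaves out.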
12. Z39.50: Specify What/How to Search
Z39.50 attributes: u = use attribute (specifies the field to search), s = structure (phrase or word search), t = truncation (left, right, both).
EEEK!
Term transformations: convert the user's MetaLib query into the target database's search format by applying pre-programmed term transformations.
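The use, structure, and truncation attributes can be made concrete with a small sketch that turns a search term into a Z39.50 query string. It uses Bib-1 attribute values (type 1 = use, type 4 = structure, type 5 = truncation) and writes the result in Prefix Query Format (PQF), the notation used by the YAZ toolkit. PQF is chosen here only because it is compact and readable; the notes do not say what notation MetaLib uses internally, and the to_pqf function name is an invention for this example.

```python
# Bib-1 attribute values, keyed by friendly names.
USE = {"title": 4, "author": 1003, "any": 1016}                 # attribute type 1
STRUCTURE = {"phrase": 1, "word": 2}                            # attribute type 4
TRUNCATION = {"right": 1, "left": 2, "both": 3, "none": 100}    # attribute type 5


def to_pqf(term: str, use: str = "any", structure: str = "word", truncation: str = "none") -> str:
    """Transform a user's term into a PQF query with explicit Bib-1 attributes.

    Terms containing double quotes are not handled in this sketch.
    """
    return (
        f"@attr 1={USE[use]} "
        f"@attr 4={STRUCTURE[structure]} "
        f"@attr 5={TRUNCATION[truncation]} "
        f'"{term}"'
    )


pqf = to_pqf("semantic web", use="title", structure="phrase", truncation="right")
print(pqf)  # @attr 1=4 @attr 4=1 @attr 5=1 "semantic web"
```

Each target database may expect different attribute combinations, which is why the per-source configuration has to record them.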
13. Metasearch Pros
- Data comes from original sources.
- Data is as fresh as can be.
- Sources can have varying underlying formats.
- Different types of sources all relating to a topic can be hand-selected for specific purposes.
- Deduping, sorting, and presenting unified results all happen in one interface across disparate sources.
- User or library can control the quality/content of incoming sources, limiting them to the most pertinent.
- Roy Tennant: "It's not just what you search but what you don't search that counts."
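The deduping and sorting step can be sketched as follows. The matching rule (normalized title plus year) and the sort order (newest first, then title) are assumptions picked for the example, and the sample records are made up; the notes do not describe MetaLib's actual dedup or sort rules.

```python
import re
from typing import Dict, List, Tuple

Record = Dict[str, str]


def dedupe_key(record: Record) -> Tuple[str, str]:
    """Normalize the title so that case and punctuation differences do not matter."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record["year"])


def merge_results(result_sets: List[List[Record]]) -> List[Record]:
    """Merge result sets from several sources, keep the first copy of each
    duplicate, and sort newest first, then alphabetically by title."""
    seen: Dict[Tuple[str, str], Record] = {}
    for records in result_sets:
        for record in records:
            seen.setdefault(dedupe_key(record), record)
    return sorted(seen.values(), key=lambda r: (-int(r["year"]), r["title"].lower()))


# Made-up results from two hypothetical sources.
catalog = [
    {"title": "Metasearch Basics", "year": "2005", "source": "catalog"},
    {"title": "The Semantic Web", "year": "2001", "source": "catalog"},
]
articles = [
    {"title": "metasearch basics.", "year": "2005", "source": "articles"},
    {"title": "Federated Search Today", "year": "2006", "source": "articles"},
]

merged = merge_results([catalog, articles])
for record in merged:
    print(record["year"], record["title"], f"({record['source']})")
```

All of this work happens after the results arrive, which is part of why the number of records taken from each source has to be capped.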
14. Metasearch Cons
- Real time is s-l-o-w-e-r.
- Results coming back to the web must have limits on the number to process from each source.
- Sorting/deduping 10,000 records from 5 sources would take too long.
15. Metasearch/Federated Searching Differences
- In the past: metasearching = federated searching = cross-database searching = parallel searching = broadcast searching = integrated searching!
- More recently, some library leaders (Roy Tennant, CDL; Tamar Sadeh, ExLibris; and others) are making a finer distinction between a metasearch, a federated search, and cross-database searching.
16. What's the Difference Between a Metasearch and a Federated Search?
- Metasearch uses just-in-time (real-time) processing for all kinds of data sources.
- Metasearch systems are not databases of the data, but hold structural info on retrieving from many sources.
- Federated search uses just-in-case processing of a pre-populated, underlying repository of data, pre-harvested and ranked or sorted in a pre-specified order.
17. Metasearch 1
- The ideal solution is for metasearch systems to receive resource-specific info at the time of the actual interaction and figure out the flow of the interaction based on a picture or design of the information itself.
- Tamar Sadeh, Google Scholar versus Metasearch Systems. Jan 11, 2006. http://library.cern.ch/HEPLW/12/papers/1
- The concept of a metasearch is based on Tim Berners-Lee's concept of the semantic web, a W3C project.
18. Metasearch 2
- Semantics = meaning. If a computer understands the semantics of a document, it understands the meaning, rather than just interpreting a series of characters.
- Semantic web: a project of the W3C in which automated methods based on quality metadata are envisaged to replace much human searching of the web. Relies on ontologies, XML, and RDF.
- http://www.webindexing.biz/Webbook2Ed/glossary.htm
19. Example of a Federated Search?
Google Scholar, Elsevier's SCIRUS, OVID, ...
20. Federated Search
- Searches a repository or index of objects populated earlier from multiple data sources.
- Presents a unified interface.
- Just-in-case processing: a pre-processing ranking algorithm, often based on number of times cited or other criteria. It can be applied to data elements unrelated to any future query.
- Endeca, used at NCSU, is called a web navigation system, but could be considered closer to a federated search of a pre-populated database.
- FAST, used at Elsevier, uses similar pre-processing.
- A predetermined rank can be used to better evaluate the relevance of an item retrieved in a query.
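The just-in-case approach can be sketched as an index and a rank that are both computed before any query arrives. The harvested records, the times_cited field, and the function names are invented for the example; this is a toy model of the idea, not how Google Scholar, Endeca, or FAST actually work.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Made-up records, harvested ahead of time from several sources.
harvested = [
    {"id": "a1", "title": "Metasearch and the semantic web", "times_cited": 12},
    {"id": "a2", "title": "Federated search engines", "times_cited": 40},
    {"id": "a3", "title": "Search interfaces for the web", "times_cited": 3},
]


def build_index(records: List[dict]) -> Tuple[Dict[str, Set[str]], Dict[str, int]]:
    """Pre-process the repository: rank every record once, then index title words."""
    ranked = sorted(records, key=lambda r: r["times_cited"], reverse=True)
    rank_of = {r["id"]: position for position, r in enumerate(ranked)}
    postings: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for word in record["title"].lower().split():
            postings[word].add(record["id"])
    return postings, rank_of


def search(term: str, postings: Dict[str, Set[str]], rank_of: Dict[str, int]) -> List[str]:
    """Query time: a lookup plus ordering by the rank computed earlier."""
    ids = postings.get(term.lower(), set())
    return sorted(ids, key=lambda record_id: rank_of[record_id])


postings, rank_of = build_index(harvested)
print(search("search", postings, rank_of))
print(search("web", postings, rank_of))
```

Note that the rank depends only on the harvested data, not on the query, which matches the point that it can be applied to data elements unrelated to any future query.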
21. Federated Search Pros
- In general, just-in-case pre-processing will have much better performance. VERY FAST!
- Can provide an initial sort and often relevance ranking.
22. Federated Search Cons
- Searches are not done in real time.
- Often content in the repository is not as fresh as it should or could be.
- Can have months of delay.
- Content provider must maintain the underlying database and constantly feed it.
- Pre-sorted, so there is almost never an ability to re-sort on any other criteria.
23. Conclusion
- Behind the scenes, MetaLib is doing a lot of real-time work!
- ExLibris continually updates connection programs for many vendors so we don't have to. Regular updates.
- We do need to tweak/maintain our local access to each vendor.
- What to answer when people ask, "Why is metasearching so slow compared to Google?"
- Comparing apples and oranges!
- Google is not just articles, but blogs, comments, unfiltered web.
- Google is not real-time.
- Google cannot be sorted differently.
- Google is too humongous! Harder to do nuanced searching. BUT....
24. Conclusion, cont.
- Google Scholar or federated searching, on the other hand... gives metasearching a run for the money despite current drawbacks in content and freshness.
- Real advantage in speed and in the number of records processed ahead.
- Real problems in the data harvested and in which articles are most relevant for whom and when.
- In the long run, the concept of metasearching better suits the concept of the semantic web.
25. Articles Cited
- California Digital Library. Glossary Definition. http://www.cdlib.org/inside/diglib/glossary/
- Sadeh, Tamar. Google Scholar versus Metasearch Systems. In High Energy Physics Libraries Webzine, Issue 12, Feb. 2006. http://library.cern.ch/HEPLW/12/papers/1
- Lease Morgan, Eric. SRW/U in Five Hundred Words. http://www.loc.gov/z3950/agency/zing/srw/brief.html
- SRW: Search/Retrieve Web Service. Z39.50 International: Next Generation. http://www.loc.gov/z3950/agency/zing/srw/z3950.html
- Jermey, Jon and Browne, Glenda. Website Indexing: Enhancing Access to Information within Websites. Glossary. http://www.webindexing.biz/Webbook2Ed/glossary.htm
26. Questions?
- Continued Usability Studies
- Custom Search Launch
- My Space Login
- Campus Marketing and Instruction
Contact Information
Todd Bruns: tbruns_at_library.wisc.edu
Sue Dentinger: sdentinger_at_library.wisc.edu
Amy Kindschi: kindschi_at_engr.wisc.edu