Title: From Web Indexing To Hybrid Libraries, With Thanks to eLib
1. From Web Indexing To Hybrid Libraries, With Thanks to eLib
- Brian Kelly
- UK Web Focus
- UKOLN
- University of Bath
- Bath, BA2 7AY
- Email B.Kelly_at_ukoln.ac.uk
- URL http://www.ukoln.ac.uk/
- Aims of Talk
- Review approaches taken by UK HE community to indexing web sites
- Discussion of findings
- Describe future developments
UKOLN is funded by the Library and Information
Commission, the Joint Information Systems
Committee (JISC) of the Higher Education Funding
Councils, as well as by project funding from the
JISC and the European Union. UKOLN also
receives support from the University of Bath
where it is based.
2. Which To Choose?
You can choose by reading reviews, web sites, etc., or by looking at usage in the community
- Glimpse
- Harvest
- ht://Dig
- ICE
- iHound (ICATT)
- Index Search (Xavatoria)
- Index Server (Microsoft)
- IndexMySite (remote)
- Infoseek - Ultraseek
- Intermediate Search
- intraSearch (remote)
- I-Search
- Isearch
- ITMS
- Isysweb
- Java Applets
- JHLSearch
- JObjects QuestAgent
- Lycos / InMagic
- Alkaline (Vestris)
- AltaVista - Search Intranet
- ASTAWare SearchKey
- atomz Search (remote)
- BooleanSearch
- BBDBot
- BRS/Search (Dataware)
- Compass Server (Netscape)
- Cybotics
- DataWare BRS/Search
- DocFather (formerly SiteSearch)
- dtSearch Web
- Excalibur RetrievalWare
- EWS (Excite)
- Excerpt (Obsolete)
- Extense
- FAST Search Server
- Findex (code library)
- Folio siteDirector
- Magnifi Enterprise Server
- Matt's SimpleSearch
- Microsoft Index Server
- Microsoft Site Server
- MiniSearch (remote)
- MondoSearch
- Muscat
- NetResults (now SearchKey Plus)
- Netscape - Compass Server
- OpenText - LiveLink
- Perl Scripts
- Perlfect Search
- Phantom (Maxum)
- PicoSearch (remote)
- Etc.
Indexing software from <http://searchtools.com/tools/tools.html>
Which to choose? What software may be obsolete? What does "remote" mean?
3. Findings: UK HE Web Sites
- Main findings of 2 surveys
Software     Nos. (Jul)   Nos. (Mar)
ht://Dig         25           32
eXcite           19           17
Microsoft        12           15
Harvest           8            6
Ultraseek         7            9
Other            29           34
None             60           50
Totals          160          163
- Article published in Ariadne issue 21: <http://www.ariadne.ac.uk/issue21/webwatch/>
- Results (including update on survey) available from <http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/>
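The survey figures can be checked and compared with a few lines of arithmetic. The sketch below is illustrative only: it assumes each product's two counts are read as (July, March) pairs, an assignment inferred from the later slides ("32 (up from 25)" for ht://Dig), and the function names are hypothetical.

```python
# Survey counts as read from the slide: software -> (July count, March count).
# The column assignment is an assumption inferred from the later slides.
SURVEY = {
    "ht://Dig": (25, 32),
    "eXcite": (19, 17),
    "Microsoft": (12, 15),
    "Harvest": (8, 6),
    "Ultraseek": (7, 9),
    "Other": (29, 34),
    "None": (60, 50),
}


def column_totals(survey):
    """Sum each survey column so it can be compared with the Totals row."""
    july = sum(jul for jul, _ in survey.values())
    march = sum(mar for _, mar in survey.values())
    return july, march


def changes(survey):
    """Return the change per product between surveys (March minus July)."""
    return {name: mar - jul for name, (jul, mar) in survey.items()}


if __name__ == "__main__":
    print(column_totals(SURVEY))
    for name, delta in changes(SURVEY).items():
        print(f"{name:10} {delta:+d}")
```

The column sums agree with the Totals row (160 and 163), which supports reading the counts this way.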
4. Popular Products: ht://Dig
- ht://Dig
- Now used at 32 (up from 25) UK HEIs
- Freely available
- New version released in December 1999
- Own domain with well-designed web site
- Robot to index multiple servers
See <http://www.htdig.org/>
Oxford Case Study
- 131 servers
- 438,500 resources
- Indexes MS Office, PDF, etc. files (external parser)
Case Studies produced by Helen Sargan (Cambridge)
5. Popular Products: Ultraseek
- Ultraseek
- Used at 9 (up from 7) UK HEIs
- Powerful but expensive
- See <http://software.infoseek.com/>
Cambridge Case Study
- 232 servers
- 188,000 resources
- Weightings given to meta tags
- Useful logs and reports
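The idea of giving weightings to meta tags can be sketched as follows. This is not Ultraseek's algorithm, which the slides do not describe. It is a minimal illustration in which a match in a keywords or description meta tag counts for more than a match in the page text. The weights and all names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical weights: a hit in a meta tag counts three times a hit in the text.
META_WEIGHT = 3
TEXT_WEIGHT = 1


class PageWords(HTMLParser):
    """Collect words from keywords/description meta tags and from page text."""

    def __init__(self):
        super().__init__()
        self.meta_words = []
        self.text_words = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("keywords", "description"):
            content = (attrs.get("content") or "").lower()
            self.meta_words += content.replace(",", " ").split()

    def handle_data(self, data):
        self.text_words += data.lower().split()


def score(html, term):
    """Weighted count of exact word matches for term in meta tags and text."""
    parser = PageWords()
    parser.feed(html)
    term = term.lower()
    return (META_WEIGHT * parser.meta_words.count(term)
            + TEXT_WEIGHT * parser.text_words.count(term))
```

With these weights, a page that lists "library" in its keywords meta tag ranks above a page that only mentions the word once in its text.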
6. Popular Products: Harvest
- Harvest
- Now used at 6 UK HEIs (down from 8)
- For IR research use?
- See <http://www.tardis.ed.ac.uk/harvest/>
- Issue: pay for software, or pay for programming support to implement free software?
7. Use of Third Party Services
- Small usage of third parties to provide indexes
- FreeFind (used at 2 HEIs) and AltaVista (used at 1 HEI)
- Why not more use by the 50 institutions with no search facility?
- Benefits from services provided by popular large-scale search engine
- Low cost (free?)
- Incomplete coverage?
- Document fluctuation
- Loss of control, advertising
8. Try Them For Yourself
- Interfaces to UK University search engines are available, providing a single location for evaluation
- The page also provides a link to organisational search pages
- The resources are grouped in alphabetical order and by search engine
What does Aberdeen's search facility provide? What functionality do libraries using Domino provide?
See <http://www.ukoln.ac.uk/web-focus/surveys/>
9. Other Developments
- What else is happening to indexing of these communities?
- National search engines
- Local initiatives
- eLib Hybrid Libraries
10. National Search Engines
- ACDC (Academic Directory)
- (Unfunded) pilot of index of ac.uk domain based on distributed approach using Harvest
- Set up in March 1996
- Lack of development effort resulted in degraded service (e.g. indexer not aware of JavaScript code)
- No longer being developed
http://acdc.hensa.ac.uk/
11. Institutional Developments
- Maestro robot (Dundee)
- Indexes Scottish resources
- Volunteer effort
- North East Universities (UNIS4NE)
- Appearance of cross-searching
- Actually interface to HotBot / AltaVista
12. eLib Hybrid Libraries
- eLib Phase 3 includes "Hybrid Library" projects
- Help users find electronic (web, OPAC, etc.) and "real world" resources
- Includes regional and subject-specific approaches
MusicOnline: search of Music Catalogues
BUILDER: search of eLib Phase 3 web sites
13. Other Possibilities
- What other developments may we expect?
- Increased indexing in institutions of other web sites (opposition / friends)
- Leave it to commercial sector
- Development of a HE (or public sector?) national search engine
- New developments (XML / RDF / etc.)
14. Indexing Remote Sites
- May see increased indexing of remote sites within institutions
- Examples provided by Dundee and BUILDER (eLib)
- Feeling of ownership
- Easily done
- Can develop enhancements locally
- Increased server load locally
- Increased server load remotely
- Increased network load
- Not scalable
- Unnecessary duplication
15. Commercial Solutions
- Could leave searching to commercial world
- No costs to institution / HE community
- Not integrated with non-Web services
- Results too broad
- Distracting interface
- Little scope for tailoring
16. What About Metadata?
- Metadata can
- Improve search results
- Provide structured information (for automated processing) which can provide richer services
- Fielded searches
- Limit searches (e.g. only Library pages on Council web site)
- Web site administration
- Alternative browsing interfaces
- Tools, standards, etc. becoming available
- Expected growth area
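A fielded search over page metadata can be sketched as follows. This is a minimal illustration, not a description of any product covered in the survey. It assumes metadata is embedded as HTML meta elements using Dublin Core style names such as DC.Creator. The example.org URLs, the sample pages, and the function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if name and content:
            # Field names are stored in lower case; a field may repeat.
            self.fields.setdefault(name.lower(), []).append(content)


def extract_fields(html):
    """Return a dict mapping lower-cased field names to lists of values."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.fields


def fielded_search(pages, field, value):
    """Return URLs of pages whose metadata field contains value (case-insensitive)."""
    field, value = field.lower(), value.lower()
    return [
        url
        for url, html in pages.items()
        if any(value in v.lower() for v in extract_fields(html).get(field, []))
    ]


# Hypothetical pages on a council web site.
PAGES = {
    "http://www.example.org/library/hours.html":
        '<head><meta name="DC.Creator" content="Library Service">'
        '<meta name="DC.Title" content="Opening hours"></head>',
    "http://www.example.org/planning/index.html":
        '<head><meta name="DC.Creator" content="Planning Department"></head>',
}
```

Calling `fielded_search(PAGES, "DC.Creator", "library")` limits results to the Library pages, which is the kind of restriction the "only Library pages on Council web site" example describes.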
17. Example
- Exploit Interactive web magazine (www.exploit-lib.org) is using metadata to provide enhanced searching
- Search for foo in
- Issue 2 or in issues 2 and 4 (this is possible using directory structure)
- Feature Articles (needs metadata)
- Articles about EU-funded projects
- Etc.
- Combinations of above
- Also provides alternative browsing structures
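The difference between the two kinds of restriction can be sketched as follows: limiting by issue needs only the URL's directory structure, while limiting by article type needs a metadata field. This is an illustration only, not Exploit Interactive's implementation. The example.org URLs, the "type" field, and the sample records are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical records: URLs, the "type" field and the text are illustrative.
ARTICLES = [
    {"url": "http://www.example.org/issue2/alpha/", "type": "feature", "text": "foo and bar"},
    {"url": "http://www.example.org/issue2/beta/", "type": "news", "text": "foo again"},
    {"url": "http://www.example.org/issue4/gamma/", "type": "feature", "text": "nothing here"},
    {"url": "http://www.example.org/issue3/delta/", "type": "feature", "text": "foo in issue 3"},
]


def in_issues(article, issues):
    """True if the first path segment of the URL names one of the given issues."""
    segments = urlparse(article["url"]).path.strip("/").split("/")
    return segments[0] in {f"issue{n}" for n in issues}


def search(articles, term, issues=None, article_type=None):
    """Return URLs matching term, optionally limited by issue and/or type."""
    hits = []
    for article in articles:
        if term.lower() not in article["text"].lower():
            continue
        # Directory-based restriction: needs no metadata at all.
        if issues is not None and not in_issues(article, issues):
            continue
        # Metadata-based restriction: only possible if the field was recorded.
        if article_type is not None and article["type"] != article_type:
            continue
        hits.append(article["url"])
    return hits
```

Passing both `issues` and `article_type` gives the "combinations of above" case, for example `search(ARTICLES, "foo", issues=[2], article_type="feature")`.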
18. JISC Developments
- DNER (Distributed National Electronic Resource)
- Seamless access to national resources
- What about local resources?
- Need for "institutional portals"
- RDN
- Resource Discovery Network
- Builds on work of eLib subject gateways
- Based on standards (Dublin Core, Z39.50, whois, LDAP, RSS, etc.)
19. Conclusions
Questions welcome
- To conclude
- No clear "best buy" for indexing software
- Probably some to avoid
- In 2 years' time, are you likely to still be using the same software? Or to have changed software / architecture?
- If changes are likely, need to think about migration strategies, interoperability issues, etc.
- Library community has much to offer
- Need for user studies (not covered)
Useful Resources
- http://SearchTools.com/
- http://www.searchenginewatch.com/
- http://www.builder.com/Servers/AddSearch/