Title: Search and the Net at 2004: Trends, Challenges and Cutting-Edge Developments in Internet Search Services
1Search and the Net at 2004: Trends, Challenges
and Cutting-Edge Developments in Internet Search
Services
- Michael Hunter
- Reference Librarian
- Hobart and William Smith Colleges
- for Rochester Regional Library Council
- Member Libraries Staff
- Sponsored by the Rochester Regional Library Council
- Supported by Library Services and Technology Act
(LSTA) and/or Regional Bibliographic Databases and
Resources Sharing (RBDB) funds granted by the
New York State Library, 2003
2For Today
- State of the Net and its Users
- Search Industry Overview
- Recent Developments in Established Services
- New Services
- The Deep Web at 2004
- Tracking the Living Web: Weblogs and RSS
- Cutting-edge Developments
- Trends and Challenges to Today's Search Services
3The Internet and its Users at 2004
4How large is the Web?
- What do you mean by the Web?
- The totality of all Web sites
- Sounds simple...
- BUT IS IT?
5UC Berkeley's How Much Information Project
http://www.sims.berkeley.edu/research/projects/how-much-info-2003/internet.htm
NOTE: 10 terabytes equals the total print collections
of the Library of Congress
6Internet Use Worldwide
7Internet Use in the US
http://www.pewinternet.org
8Internet Use in the US
http://www.pewinternet.org
9Top Ten things our users do online
http://www.pewinternet.org
10Top Ten things our users do online
http://www.pewinternet.org
11Undergraduates and Search Engines
Colaric, S. Instruction for Web Searching: An
Empirical Study. College and Research Libraries
64 (2), March 2003, p. 111-116
12The Internet Search Industry: Consolidation,
Performance Measures, Popularity
13The Shrinking Search Industry
Editorial control of search is shared among few
- Yahoo owns
- AlltheWeb, Altavista, Inktomi, Overture (paid
listings)
- Google
- MSN
- AskJeeves owns Teoma
- LookSmart owns Wisenut
- Gigablast
- NOTE: Ownership is different from database
affiliation
14Google: Database Affiliates
16Database Freshness
http://www.searchengineshowdown.com/stats/freshness.shtml
- Based on a series of 6 current topic searches
- Pages that are updated daily
- AND report that date on the page
- Queries submitted May 17, 2003
18Database Freshness
http://www.searchengineshowdown.com/stats/freshness.shtml
- Most have some results indexed in the last few
days
- The bulk of most of the databases is about 1
month old
- Some pages may not have been re-indexed for much
longer
19Popularity: Searches per day, self-reported
data, as of 2/28/03
http://searchenginewatch.com/reports/article.php/2156461
20Recent Developments among Established Services
21Google
- Froogle
- Phonebook
- Wildcard Words
- Info
- Synonym feature
- Supplemental Index
- Search by location
- News Advanced Search and News Alerts
- ???
22Froogle
- Locates information about products for sale
online
- Gives URLs of sites offering the item
- Provides links to exact page in the site where
you can make the purchase
23Froogle
- Ranking follows normal Google ranking processes
- Paid placements always clearly marked
- Price range limits available
- Access at http://froogle.google.com or via Google
Advanced Search
24Phonebook Command Search
- Searches US residential (rphonebook:) and
business (bphonebook:) listings of Yahoo,
MapQuest and other services
- rphonebook:
- MUST INCLUDE
- Last name, City and/or State
- MAY INCLUDE
- First name
- bphonebook:
- MUST INCLUDE
- Business name (min. 1 word), City and/or State
- MAY INCLUDE
- Full Business name
25Wildcard Words
- Google offers a word-sized asterisk to function
as a wildcard
- Stands for a whole word
- Cannot be used for part of a word
- three * mice 22,000
- three bl* mice 0
26Wildcard Words
- Several can be used together
- milosevic International Hague
-
- Retrieves military tribunal OR
- military court OR war tribunal OR military
tribunal
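The whole-word behavior described on these two slides can be imitated locally. The following is a minimal sketch, assuming a lone asterisk stands for exactly one word and an asterisk attached to letters is treated as literal text; the function name is hypothetical and this is not how Google implements the feature.

```python
import re

def wildcard_phrase_to_regex(phrase: str) -> re.Pattern:
    """Turn a phrase query with whole-word '*' wildcards into a regex."""
    parts = []
    for token in phrase.split():
        if token == "*":
            parts.append(r"\w+")            # a lone asterisk matches one whole word
        else:
            parts.append(re.escape(token))  # anything else must match literally
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

text = "Three blind mice, see how they run"
print(bool(wildcard_phrase_to_regex("three * mice").search(text)))    # True
print(bool(wildcard_phrase_to_regex("three bl* mice").search(text)))  # False
```

The second query finds nothing because the asterisk is not allowed to complete part of a word, which mirrors the zero-result example on the slide.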
27info:
- Not exactly hidden, but not well-known
- Searches for any information Google has about a
site
- Convenient way to monitor linkage
- Typing a URL in the search box will give the same
results
29Synonym Feature
- Place a tilde immediately before a term to
retrieve synonyms or related terms from the
Google Index
- Eliminate the original term by placing a minus
sign before it.
- ~hiking -hiking
31Google's Supplemental Index
- For obscure or unusual searches
- Queried when Google fails to find good matches
within its main web index.
- Live 9/9/03
- Sample queries
- St. Andrews United Methodist Church Homewood
IL
- nalanda residential junior college alumni
- illegal access error jdk 1.2b4
- supercilious supernovas
33Search by Location (beta)
- http://labs.google.com/location
- U.S. only
- Keyword(s) combined with address, city, state or
zip
- Search results appear on a map
34News Advanced Search and News Alerts
- Advanced News Search added this Fall
- News Alerts
- Requires a (free) account
- One query per alert; limit of 50 alerts per
e-mail address
- Alerts contain links to news containing your
alert keywords
- Cannot edit a query; delete it and create a new
one instead
- Alerts sent once a day or as the news happens
35More about Google
- Google World: http://indicateur.com
- Maintained by a French search engine site and
listed under Guides. Use the Google translator
(see Language Tools) to translate the site.
- Google Lab: http://labs.google.com
- Place for cutting-edge developments, many in beta
awaiting user feedback and testing.
36Beyond Google: AskJeeves
- Simpler, cleaner interface
- Teoma crawler-based results blended with AJ
answers
- Improved image database
- Smart Answers
- Popular queries mapped to news, image and other
sources appropriate to the query
37ATW (FAST)
http://alltheweb.com
- Continued commitment to a large database (2nd to
Google)
- Powerful, new advanced search capabilities
- Extensive page customization options
- Results clustered by topic (Folders)
- Both HTML and Multimedia given, when available
- NOTE: Folders located at the BOTTOM of each
results screen
38Altavista
- Simpler interface
- More language options
- Expanded image and multimedia collections
- Results labeled Refreshed in last 48 hours
- Includes PDF files
- US and Local search options
- Prisma query refinement
39Altavista: Prisma Query Refinement
- Offers a maximum of 12 terms having the strongest
associations with the original query term(s)
- Selected from the top 50 results of the original
query
- NOTE: Clicking on a Prisma term adds it to your
original query, creating a new set of Prisma
terms.
- Similar to Refine (1997) but less graphic
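The slide does not say how Prisma measures the strength of an association, so the following is only a rough sketch of the general idea: it assumes association means "appears in many of the top results", and the stopword list, function names and sample texts are all invented for illustration.

```python
from collections import Counter

# Hypothetical stopword list; a real service would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "to", "for", "is"}

def suggest_terms(query, top_results, max_terms=12, max_results=50):
    """Suggest refinement terms by counting how many top results contain each term."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in top_results[:max_results]:
        # Count each term once per result so one long page cannot dominate.
        terms = set(text.lower().split()) - STOPWORDS - query_terms
        counts.update(terms)
    return [term for term, _ in counts.most_common(max_terms)]

def refine(query, term):
    """Clicking a suggested term adds it to the original query."""
    return f"{query} {term}"

results = ["jaguar car dealer", "jaguar car review", "jaguar cat habitat"]
best = suggest_terms("jaguar", results, max_terms=1)[0]
print(refine("jaguar", best))  # jaguar car
```

Running the refined query would produce a new set of top results and therefore a new set of suggested terms, as the NOTE on the slide describes.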
40Teoma
- Ranking: Includes a site's relationship to other
sites with similar content
- Results
- Ranked database results, with Related Pages
- Refine
- Clustering of your results and other related
sites based on term relationships and web
community linkages derived from your original
results
- Resources
- Link Collections from experts and enthusiasts
- (Subject metasites)
41Hotbot
- Searches Hotbot (Inktomi) OR Google OR Lycos OR
AskJeeves
- Not a true metaengine
- Advanced features operable only if supported by
source engines
43Metacrawler
- Along with Dogpile and Webcrawler, owned by
Infospace
- Simpler interface
- Offers the following customizations
- Selection of sources searched
- Total number of results retrieved
- Length of search (time-out period)
- Offers a wide range of vertical searches: Images,
MP3, Shopping, Subject Directory, Multimedia,
News, Message Boards
45New Services Attracting Attention
46Gigablast
- Launched April, 2002
- Smaller database than others
- Over 200 million pages on 10/4/03
- pope canterbury: Google 83,200;
Gigablast 24,919
- Created and maintained by Matt Wells (alone)
- Only search engine continuously updated with
index refreshed in real time (Site submissions
are immediately searchable)
- Ranking depends less on linkage than Google's
ranking, to avoid penalizing newer pages.
- No advertising (to date)
47Gigablast Search Features
- Basic search: Full Boolean
- Advanced Search: Full Boolean and 2 (!) phrase
boxes
- Limit by site
- Limit by domain (URL)
- Links to a page available
- Most generic html metatags indexed, searched
and made available for display
- Unique to Gigablast!!!
48Gigablast Search Features
- Field searches include title, IP address and
non-html filetypes
- PDF, Word, Excel, PPT, PostScript, Ascii Text
- Results from one site clustered
- Cached version available
- Results include date indexed and last modified
(!!)
- Linking to Gigablast improves ranking there
49KillerInfo
http://www.killerinfo.com
- Metaengine searching Google, AOL, Lycos,
Gigablast, MSN, Altavista, LookSmart and Open
Directory
- 9 topical Deep Web channels offered
- Boolean and phrase search
- No other Advanced Search features
- Results clustering (a la Vivisimo)
- Number of results not given
- Adult content filter
50Surfwax
http://surfwax.com
- Demo site for federated search software
- Simultaneous search of Deep Web, Intranets, Web
and more
- Metaengine searches Wisenut, AOL, MSN, Yahoo,
Encarta, CNN, LookSmart
- FOCUS search refinement feature
- Online thesaurus of related terms and
definitions
51Surfwax
http://surfwax.com
- Site SNAP of a result offers
- Author summary (from metatags)
- Related sites
- Site's FOCUS words
- Key Points (query-related sections)
- Results ranking options: Relevance, Alpha and
Source
- Preferences and Advanced Features require a
(free) account; more options available to
fee-based accounts
52Nutch
http://nutch.org
- Project to implement an open source web search
engine
- Why open source?
- With open source, search results processing is
transparent, not hidden. Bias (if any) can be
examined by anyone.
- Open source applications are free and available
for use, modification or for-profit use. Users
are asked to contribute their innovations back to
the code base
- Nutch is seeking volunteer developers and
donations
53The Deep Web at 2004
54The Topography of the Internet, or The Layers of
the Web
- Mapping the web is challenging
- Unregulated in nature
- Influences from all over the globe
- Fulfills many purposes, from personal to
commercial
- Changes rapidly and unexpectedly
- Divisions and terminology are inherently
ambiguous, e.g. Deep vs Invisible Web
55May I suggest a biological, nautical metaphor,
perhaps the ocean?
- SURFACE WEB
- SHALLOW WEB
- OPAQUE WEB
- DEEP WEB
- DARK WEB
56Surface Web
- Static html documents
- Crawler-accessible
57Shallow Web
- Static html documents loaded on servers that use
ColdFusion or Lotus Domino or other similar
software
- A different URL for the same page is created each
time it is served.
- Crawlers skip these to avoid multiple copies of
the same page in their database
- Technically human accessible via directories,
Deep Web gateways or links from other sites
58Opaque Web
- Static html documents
- Technically crawler accessible
- 2 types
- Downloaded and indexed by crawler
- Not downloaded or indexed by crawler
59Opaque Web
- Downloaded and indexed by crawler
- Buried in search results you never look at
- A casualty of relevance ranking
- Not downloaded or indexed by crawler due to
programmed download limits
- Document buried deep in the site
- Part of a large document that did not get
downloaded (Typical crawl per page is 110 K or
less)
- Document added since last crawler visit (Even the
best revisit on an average of every 2 weeks,
depending on amount of change at a site)
60Opaque Web
- Access to the Opaque Web
- Specialized search engines
- General and specialized directories
- Subject metasites
- These services typically index more thoroughly
and more often than large, general search
engines
61Deep Web
- Technically inaccessible to crawlers
- Dynamically created pages
- Databases
- Non-textual files
- Password protected sites
- Sites prohibiting crawlers
- Technically accessible to crawlers
- Textual files in non-html formats
62Dark Web
http://research.arbornetworks.com
- Up to 5% of the web is completely unreachable due
to
- Misconfigured routers
- Contractual disputes between ISPs
- Broadband users with personal or corporate
firewalls
- US Military sites
63UC Berkeley's How Much Information Project
http://www.sims.berkeley.edu/research/projects/how-much-info-2003/internet.htm
NOTE: 10 terabytes equals the total print collections
of the Library of Congress
64http://www.sims.berkeley.edu/research/projects/how-much-info-2003/internet.htm
65Reducing the Deep Web: mod_rewrite
Making dynamic pages available to crawlers
- Mod_rewrite software loaded onto a web server
containing dynamic pages (databases, etc.)
- Crawler follows a link to a stable URL on the
server: www.mydomain.com/dvdplayers.html
- Mod_rewrite searches all the server's dynamic
pages containing dvdplayers and creates temporary
pages with stable URLs.
- These pages are linked to each other, creating a
stream of virtual pages that can be crawled by
any of the search engines
- Search engines often check the stream for spam or
duplicate pages
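mod_rewrite itself is an Apache module that maps one URL pattern onto another, so a stable-looking address can be answered by a dynamic script. The sketch below imitates that mapping in Python rather than in Apache configuration; the script name catalog.php and the category parameter are invented for illustration, and a real server would express the rule as a RewriteRule directive.

```python
import re

# Hypothetical rule: a crawler-friendly path maps to a dynamic catalog query.
RULE = re.compile(r"^/(?P<category>[a-z0-9-]+)\.html$")

def rewrite(path: str) -> str:
    """Return the internal dynamic URL for a stable path, or the path unchanged."""
    match = RULE.match(path)
    if match is None:
        return path  # no rule applies, serve the request as-is
    return f"/catalog.php?category={match.group('category')}"

print(rewrite("/dvdplayers.html"))  # /catalog.php?category=dvdplayers
print(rewrite("/images/logo.png"))  # /images/logo.png
```

The crawler only ever sees the stable address, which is why it treats the page like an ordinary static document.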
66Mining the Deep Web: Directed Query Engines or
Intelligent Agents
- Designed to access distributed Deep Web
resources
- Some can be configured to search specific URLs
- Databases
- Subject metasites
- report collections
- dynamic pages
- online newsletters
67Directed Query Engines for purchase
- Simultaneous search of Deep Web and other
resources with many additional features
- Lexibot: http://www.lexibot.com
- If you complete survey: $189, upgrades $15
- If you don't: $289, upgrades $50
- BullsEye: http://info.intelliseek.com
- BullsEye Pro: $199 with free upgrades for 6
months
68Hunter's Maxim for the Deep Web
- Plan to first locate the category of information
you want, then browse.
- Don't be too specific in your searches.
- Cast a wide net.
69TRACKING THE LIVING WEB: WEBLOGS AND RSS FEEDS
70Blogs: What are they?
- Online diaries or journals, usually by one
person, though many invite comments
- First developed in 1997
- Within the same blog, tone can range from
personal musings to discussion of recent issues in
technology and research
- High link-to-word ratio
- Often link to other weblogs of similar content
71Blogs: What are they?
- Can contain rumor, inside information,
speculation, blatant errors as well as
- Breaking news: political and technical/research
- Commentary on new software or websites
- Consumer reaction to products or services
- Blog authoring tools are basic content management
software, useful in ways other than online
diaries
- Typify the spirit of information sharing that has
fueled the Internet since its beginnings
72How large is the blogosphere?
2.4 to 2.9 million active blogs (est.)
73Who's blogging?
Jupiter Research
- 2% of Internet users have created a blog
- About 50% women, 50% men
- Over 50% are in English; remaining languages, in
order of prevalence:
- Portuguese, Polish, Farsi, French, Spanish,
German, Italian, Dutch and Icelandic
74More
- About 4% of Internet users read blogs, 60% men,
40% women
- On average, blogs are updated every 3 days
- About 4% of online Americans have gone to blogs
for information about the Iraq War
- LiveJournal (large blog host) was the 650th most
popular site on the Internet (May, 2003)
- 184,000 readers every 10 days
- Spend average of 22 minutes at the site
75Creating a Blog: Blogger
http://new.blogger.com
- Free, automated Web publishing tool
- Requires no new software
- Send posts to an existing website or create a
free blog at Blogger
- Provide a site template and where you want the
postings to appear
- To update, create a posting and submit the
permission form; Blogger will send it by FTP
- Advanced options available
76Locating Blogs
- Blog Hosting Sites
- www.livejournal.com
- diaryland.com
- radio.userland.com ($39.95 with added
features)
- Blog metasites
- www.lights.com (library-related, world-wide)
- www.blogrunner.com
- www.llrx.com/columns/notes46.htm
- portal.eatonweb.com/
77Locating Blogs
- Subject Directories
- dmoz.org/Computers/Internet/On_the_Web
- General Search Engines
- Blog keyword(s) or URL(bloghost)
keyword(s)
- Professional Association homepages
- Subject Metasites
- Use Teoma.com Resources
78Searching Blog Content
- Blog hosting sites
- www.livejournal.com
- Blog Search Engines
- Feedster.com (includes RSS feeds also)
- Daypop.com (current events)
- Blogdex.media.mit
- www.technorati.com
- blogging-news.info
- Topical Blog Search Engines
- Detod (blawgs.detod.com) Exclusively legal
weblogs
79Blogs and General Search Engines
- Blog-rich sites are increasingly visited by major
crawler-based search services
- HOWEVER
- ANY rapidly-changing content can easily be missed
by crawlers
80Obstacles to Crawling and Indexing Blog Content
- Only the most recent postings appear on the blog
homepage (older are archived, and inaccessible to
crawlers)
- Many bloggers post dozens of times a day
- Frequent postings may contain critical
information to time-sensitive topics
- Even a daily crawl would miss these postings
(typical crawl is about once every 3 weeks)
81Obstacles to Crawling and Indexing Blog Content
Page Design
- Several postings usually appear on the blog
homepage
- Postings are NOT indexed separately, as crawler
indexes the page as a whole
- Retrieval of an individual posting on a topic is
unreliable
82Blogs and Libraries
- Blogs can offer an opportunity to post content on
the Web quickly: no delay of FTP uploading or
submission to a webmaster
- What's New
- Favorite Books
- Recent Acquisitions
- Program Changes due to the Weather
83Blogs and Libraries
- Get more people involved in posting content on
the Library (or library-sponsored) website
- No knowledge of html, RSS or XML needed
- Log onto the blog hosting website, create
content, and update the page
- Current awareness without the annoyance of
un-wanted e-mails
- Choose when YOU want that information by visiting
your blogs of choice
84Blogs and LibrariesMetasites
- Blogs and Libraries: A Bibliography (online)
- http://www.etches-johnson.com/nolibrary/bib.html
- Library Weblog Directory
- http://www.libdex.com/weblogs.html
- Blogs at the University of Minnesota Libraries
- http://www.lib.umn.edu/san/mt/
- Fichter, D. (2003). Why and how to use blogs to
promote your library's services. Marketing
Library Services 17(6).
- http://www.infotoday.com/mls/nov03/fichter.shtml
85RSS
- Rich Site Summaries
- Really Simple Syndication
- Really Stops Spam
86Before RSS: Tracking latest news and site updates
- Software packages that monitored and reported
changes at sites of your choosing
- News alert services, free and fee
- Manual checking of your bookmarks
- Hit or miss Listserv and Usenet postings
87RSS: What is it?
- XML filetype with content that is
- Structured (tags, standard and/or
author-defined)
- Re-useable (can be integrated into web,
e-mail, multimedia and many other formats)
- Originally developed by Netscape as a content
management tool for personalizing home pages
- My News, My Sports, My Weather
- RSS in detail: http://blogs.law.harvard.edu/tech/rss
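To make "structured" concrete, here is a minimal sketch of reading an RSS 2.0 document with Python's standard library. The feed content and the example.org addresses are invented; only the channel/item/title/link/description element names come from the RSS format itself.

```python
import xml.etree.ElementTree as ET

# A small, made-up RSS 2.0 feed for a library "What's New" page.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Library News</title>
    <link>http://example.org/</link>
    <description>What's new at the library</description>
    <item>
      <title>Recent Acquisitions</title>
      <link>http://example.org/acquisitions</link>
      <description>New titles added this week.</description>
    </item>
    <item>
      <title>Closed for Weather</title>
      <link>http://example.org/closing</link>
      <description>The library closes at noon today.</description>
    </item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """Return the title and link of every item in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]

for entry in read_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

Because every item carries the same tagged fields, the same few lines can feed a web page, an e-mail digest or an aggregator without re-keying the content.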
88RSS: What can it do?
- Creates a broadcast version of frequently updated
content from a website, blog, news page or other
source
- Authors can
- Summarize new content
- Broadcast new content, e.g. online newsletters
- Can be used as a way to distribute content to
subscribers (syndication) independent of e-mail.
Subscribers log on or access via aggregators.
89How do I access them?
- As RSS is in XML, may require downloading reader
software (older versions of browsers cannot read
XML). Sources for reader software include
- www.lights.com
- blogspace.com
- Sites with RSS feeds display a small icon
(usually orange) labeled RSS or XML
- General search engines (limited, but worth a
try)
- filetype:xml keyword(s)
90RSS Directories and Search Engines
- Syndic8 syndic8.com
- Directory of available syndicated news feeds
- Provides no reading area
- Uses Open Directory classification
- Feedster www.feedster.com
- The best search engine for blogs and RSS feeds
- Yahoo news.yahoo.com/rss
- Canadian Government tinyurl.com/vrh7
- Often found in Blog Directories and Engines
91RSS aggregators
- Receive general or topical RSS feeds and blog
postings
- Many are focused on news only
- Present content in compact form
- Combine multiple sources in one interface
- Provide links to full content
- In personal desktop versions or online
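The core of an aggregator, combining several sources into one compact, newest-first list, can be sketched in a few lines. This assumes each feed has already been parsed into items with a title, link and publication date; the feed names, items and field names below are made up.

```python
from datetime import datetime

def aggregate(feeds, limit=10):
    """Merge items from several feeds into one list, newest first."""
    merged = [
        {"source": source, **item}
        for source, items in feeds.items()
        for item in items
    ]
    merged.sort(key=lambda entry: entry["published"], reverse=True)
    return merged[:limit]

# Hypothetical, already-parsed feeds.
feeds = {
    "Library Blog": [
        {"title": "New hours", "link": "http://example.org/hours",
         "published": datetime(2003, 12, 1)},
    ],
    "Tech News": [
        {"title": "Search update", "link": "http://example.org/search",
         "published": datetime(2003, 12, 3)},
    ],
}

for entry in aggregate(feeds):
    print(entry["published"].date(), entry["source"], entry["title"], entry["link"])
```

Each line of output is the compact form the slide mentions: source, headline and a link back to the full content.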
92Personal desktop aggregators
- Lets you specify any feeds you want access to
- Ampheta Desk www.disobey.com/amphetadesk/
- Radio UserLand radio.userland.com ($)
- Feedreader feedreader.com
93Feedreader.com
94Online aggregators
- Selection of feeds may be limited
- NewsIsFree NewsIsFree.com
- 7379 sources grouped into 16 channels
- Create custom pages
- offers more Premium options
- Many RSS sites include links to other aggregators
95Authoring and Producing RSS
- Lockergnome
- rss.lockergnome.com
- Documents, tools, developers, aggregators, free
feed generator for your site
- RSS Primer for Publishers
- www.eevl.ac.uk/rss_primer/
- Producing RSS feeds
- Technical information
- Feed promotion
- Feedster www.feedster.com
96Blogs and RSS
- Blogs may offer some or all of their content as
RSS feeds, or not
- Blogs can exist as pure html documents, updated
frequently
- Making content available in RSS increases a
blog's access and exposure via aggregators and
other RSS-based search services
97The Living Web
What can blogs and RSS feeds tell us about an
author's point of view?
- Which ones does an author list on their
blog/homepage?
- Which ones does an author visit/subscribe to?
- Sometimes I want to know what the world thinks
- GOOGLE
- Sometimes I want to know what I think
- MY WEBLOG
- Sometimes I want to know what those I respect
think
- BLOGS AND FEEDS I READ
98Beyond today's (free) search engines: Cutting
edge developments
99Including Context in System Design
- Context matters (!!??!)
- Textual context
- Query context: Who is asking and why?
- Traditional approaches to retrieval have been
deductive
- Data organized and mapped to anticipated query
terms (controlled vocabularies, taxonomies)
- Human created and maintained
- Too slow for rapid data streams
100Bayesian approaches
- Uses statistical inference based on Bayes'
Theorem of Probability (Thomas Bayes,
1702-1761)
- Inductive approach (adaptive processing)
- Take the user's information environment
- Infer structures, relationships, likely queries
- Inferred structures and relationships can then be
mapped to a human-created classification scheme
- Currently used in corporate intranet and
fee-based content management software
- Will be used more in general information systems
of the future
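As a small worked example of the inference step, the sketch below applies Bayes' theorem in its simplest "naive" form, which assumes terms are independent given a topic. The topics, prior probabilities and term likelihoods are invented numbers, not figures from any real system.

```python
def posterior(prior, likelihoods, evidence_terms):
    """P(topic | terms) is proportional to P(topic) times the product of P(term | topic)."""
    scores = {}
    for topic, p_topic in prior.items():
        score = p_topic
        for term in evidence_terms:
            # Small default probability for terms never seen with this topic.
            score *= likelihoods[topic].get(term, 0.01)
        scores[topic] = score
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}

# Hypothetical numbers learned from a user's past documents.
prior = {"cooking": 0.5, "dance": 0.5}
likelihoods = {
    "cooking": {"salsa": 0.2, "recipe": 0.3},
    "dance": {"salsa": 0.2, "lessons": 0.3},
}

print(posterior(prior, likelihoods, ["salsa", "recipe"]))
```

The word salsa alone leaves both topics equally likely; adding recipe shifts almost all of the probability to cooking, which is the kind of relationship the system infers rather than being told.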
101Adaptive Processing: Learning the searcher's
interests
- What term(s) did you search?
- What did you select?
- How long did you look at it?
- What is its source?
- How old was it?
- Direct input from searcher
- Rank the sources
- Rate individual results
- Eliminate certain sources, sites
102Inquirus
http://inquirus.nj.nec.com
- Query interface research project
- Attempts to improve precision of results
- Monitors user's search behavior to infer intent
of queries
- Re-formulates queries to increase likelihood of
desired answers
103Inquirus
http://inquirus.nj.nec.com
- USER: How do you make salsa?
- SYSTEM: salsa and (recipe or ingredients or
food)
- Eliminates pages on salsa dancing
- Ranking relies heavily on proximity of query
terms and system-provided cognates to each other
in the document
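The salsa example can be reproduced with a toy rule table. This is only an illustration of query reformulation in general: the cue phrase, the context terms and the filler-word list are assumptions, and Inquirus's actual rules are not given on the slide.

```python
# Hypothetical intent cues and the context terms each one adds.
INTENT_TERMS = {
    "how do you make": ["recipe", "ingredients", "food"],
}
# Words dropped from the question before building the query.
FILLER = {"how", "do", "you", "make", "a", "the"}

def reformulate(question):
    """Rewrite a natural-language question as a Boolean query with context terms."""
    text = question.lower().rstrip("?").strip()
    keywords = [word for word in text.split() if word not in FILLER]
    for cue, context in INTENT_TERMS.items():
        if cue in text:
            return f"{' '.join(keywords)} and ({' or '.join(context)})"
    return " ".join(keywords)  # no cue recognized, keep the keywords only

print(reformulate("How do you make salsa?"))
# salsa and (recipe or ingredients or food)
```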
104Vector-Space Model: 3-dimensional retrieval
- A way of ordering documents by word
frequency/context in a term space and matching
them to queries
- Documents are assigned coordinates
- One document may be in many term spaces or
vectors
- Queries that fall within a given vector are
likely to be answered by documents located in
that vector
105A Multi-dimensional Boolean
- Boolean limited to term matches
- Vector-space model
- More complex relationships can be mapped
- Degrees of relatedness of document to query
- Query and document weights based on length and
direction of their vectors
[Figure: term axes labeled terrier, female and puppy]
106Documents in Vector Space
What do you have on movie stars' diets?
[Figure: documents about movie stars, astronomy and
mammal behavior plotted against STAR and DIET axes]
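The STAR and DIET picture can be turned into numbers. A minimal sketch, assuming each document is a vector of term weights and that relatedness is measured by the cosine of the angle between the query vector and the document vector; the weights below are invented.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two term-weight vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is related to nothing
    return dot / (norm_a * norm_b)

# Hypothetical term weights on the STAR and DIET axes.
documents = {
    "movie stars": {"star": 3, "diet": 2},
    "astronomy": {"star": 4},
    "mammal behavior": {"diet": 3},
}
query = {"star": 1, "diet": 1}

ranked = sorted(documents, key=lambda name: cosine(query, documents[name]), reverse=True)
print(ranked[0])  # movie stars
```

A Boolean AND would simply reject the astronomy and mammal documents; here they still receive a partial score, which is the "degrees of relatedness" the previous slide refers to.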
107Phibot
http://phibot.org
- Project of the Univ. of Mainz and German
Institute of Artificial Intelligence
- Crawls science, medicine and news web sites
- 200 million general science sites
- 70 million medical sites
- Traditional Google-like processing
- Vector-Space
- Optimization greater vector-space processing
108Digital Video Search
- Searches actual visual content
- Project of Dublin City University
- http://www.cdvp.dcu.ie
- Determine structure of the video by identifying
shots with the greatest degree of change
(keyframes)
- Use these to create a structure, and allow user
to refine query based on these
- Needed by journalists, governments and airport
security
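The idea of finding the points of greatest change can be sketched with toy data. This is a simplified illustration, not the Dublin City University method: frames are represented as short lists of pixel brightness values, and the threshold of 50 is an arbitrary assumption.

```python
def frame_difference(a, b):
    """Mean absolute difference between two frames given as equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def find_shot_starts(frames, threshold=50):
    """Return indexes of frames that differ sharply from the frame before them."""
    starts = [0] if frames else []  # the first frame always opens a shot
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            starts.append(i)
    return starts

# Four tiny made-up frames: two dark ones followed by two bright ones.
frames = [[10, 10, 10], [12, 11, 10], [200, 190, 210], [201, 191, 209]]
print(find_shot_starts(frames))  # [0, 2]
```

The frame at each detected boundary can then serve as a keyframe that summarizes the shot for browsing or query refinement.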
109Current Trends in and Challenges to Today's
Search Industry
110User Interface Trends
- Toolbars, Toolbars, everywhere
- Review site: searchenginewatch.com/links/article.php/2156381
- Search by Location: Major engines with local
search options and local specialized ones
- Makes the haystack smaller; important in
e-commerce
- P2P networks (Peer-to-peer)
- File-sharing networks, a la Napster
- KaZaA - most popular download EVER!
- Shares any filetype
- 90% of files shared are audio-visual in nature
111User Interface Trends
- Application Program Interface (API)
- Published set of programming hooks that lets you
interact directly with a company's open servers
- You can mine the company's databases for free
- WHY? To attract more traffic to the site
- Example: http://www.googlerace.com
- Enter 1 or 2 terms/phrases and see how Bush and
Democratic candidates stack up!
- Created by Tara Calishain
112Search in Corporate Settings Drives Search Engine
R&D
- Uniform, seamless access to all information
- Internal and external, data and content
- XML
- More natural language processing
- Hybrid systems to search structured AND
unstructured data
- Adaptive processing (Bayesian)
- Use of intelligent agent software
- Easier user interfaces
- Personalization
113Industry-wide Trends
- Distributed Crawling
- Volunteer your PC when not in use
- Grub.com, Looksmart
- Search continues to be driven by advertising and
revenue
- Fewer services maintain their own crawler-created
database
- Increased crawling of non-html filetypes
114Challenges to the Industry
- Revenue
- E-content providers have cut into search software
sales with their proprietary engines
- Fighting fraud
- Cloaking, ranking manipulation
- Scalability
- Size of surface Web increases
- Over 300 million queries a day to all Web S.E.s
115Challenges to the Industry
- Freshness
- Competitive edge demands recent crawls
- Deep Web
- Embedded databases
- Non-html filetypes
- Real-time information
- Growing importance of the Living Web
116Challenges to the Industry
- Ambiguous query refinement
- Not very hopeful among general search engines
- User group too large
- User profiling difficult
- Indexing the smaller, newer sites
- Google's link-based PageRank penalizes these sites
117The Biggest Challenge: Just what are you looking
for?
- A known needle in a known haystack
- A known needle in an unknown haystack
- An unknown needle in an unknown haystack
- Any needle in a haystack
- The sharpest needle in a haystack
- Most of the sharpest needles in a haystack
- All the needles in a haystack
118The Biggest Challenge: Just what are you looking
for?
- Affirmation of no needles in the haystack
- Things like needles in any haystack
- Let me know if any new needles show up
- Where are the haystacks?
- Needles, haystacks, ...whatever
119Thank You and Happy Holidays!
Michael Hunter Reference Librarian Hobart and
William Smith Colleges Geneva, NY 14456 (315)
781-3552 hunter_at_hws.edu