Title: Galaxy of News: An Approach to Visualizing and Understanding Expansive News Landscapes. Earl Rennison. In UIST '94, ACM Symposium on User Interface Software and Technology. New York: ACM Press, 1994.
1. Galaxy of News: An Approach to Visualizing and Understanding Expansive News Landscapes. Earl Rennison. In UIST '94, ACM Symposium on User Interface Software and Technology. New York: ACM Press, 1994.
- Paper presentation by Mark Sharp
- 17610554 Information Visualization, Prof. Spoerri
- 11/11/2002
2. Paper Summary
- PROBLEM: Accessing and understanding news information is not well-supported by the information infrastructure.
- VISION: An intelligent infrastructure that automatically builds the correlations and relationships between news articles and constructs an environment that allows readers to dynamically explore and gain understanding.
3. How does it work?
- Articles have features (metadata) extracted by parsing algorithms; they are then clustered by ARN (a neural network algorithm) and mapped to a 3D space layout.
- Nodes: keyword hierarchy / headlines / full text.
- Zoom in with left mouse button, out with right (direct manipulation).
- Animation (4D) helps the user understand what the system is doing (motion: an early/pre-attentive visual cue).
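The pipeline on this slide (extract features, relate articles, group them) can be sketched roughly as follows. This is only an illustration and not the paper's ARN: it assumes each article already has a keyword set, uses plain Jaccard overlap as a stand-in for relationship strength, and groups articles with a hypothetical threshold. All names (`jaccard`, `cluster_articles`, `threshold`) are invented for this sketch.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two keyword sets, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_articles(keywords_by_article, threshold=0.3):
    """Group articles whose keyword overlap meets the threshold (single-link).

    Assumption: this replaces the ARN step with a simple union-find grouping.
    """
    ids = list(keywords_by_article)
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if jaccard(keywords_by_article[a], keywords_by_article[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical sample data.
articles = {
    "a1": {"election", "senate", "vote"},
    "a2": {"election", "vote", "poll"},
    "a3": {"galaxy", "telescope", "nasa"},
}
print(cluster_articles(articles))
```

A real system would then assign each cluster a position in the 3D layout; that step is omitted here.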
10. Model components
- Temporal and behavior: interaction controls, level-of-detail, user orientation cues, transition to new views.
- Spatial construction: can be 2-, 3-, or n-dimensional; uses relationships; dynamic (appropriate for news).
- Relationships: designer-specified, e.g. temporal ordering.
- News base: not raw data; objects and annotations (keywords, slugwords, location, time, subject, etc.) manually or automatically derived from raw data.
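As a rough illustration of the news base and relationship components, the sketch below models an annotated news object and one designer-specified relationship (temporal ordering). The class name, field names, and function name are assumptions made for this example, not structures taken from the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class NewsObject:
    """Hypothetical news-base entry: annotations derived from a raw article."""
    headline: str
    time: datetime
    location: str = ""
    subject: str = ""
    keywords: list = field(default_factory=list)

def temporal_order(objects):
    """Designer-specified relationship: link each object to the next one in time."""
    ordered = sorted(objects, key=lambda o: o.time)
    return [(a.headline, b.headline) for a, b in zip(ordered, ordered[1:])]

# Hypothetical sample data.
base = [
    NewsObject("Vote counted", datetime(1994, 11, 9), keywords=["election"]),
    NewsObject("Polls open", datetime(1994, 11, 8), keywords=["election"]),
]
print(temporal_order(base))
```

Other relationships (shared location, shared subject) could be written as further functions over the same objects.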
11. reading / writing
14. Which early / pre-attentive visual processes are leveraged?
- Position
- Proximity
- Motion
- Brightness
- Size
- Color
15. What is working?
- Principled (algorithmic) feature extraction and clustering.
- Direct manipulation.
- True zooming (seamless exploration of categories, document labels, and full texts).
- Dynamic updating of content (new articles).
16. What is not working or clear?
- Clustering based on skinny metadata rather than full text vectors.
- Keywords are single words, not terms.
- Relationships?
17. What surprised you?
- Naivete about understanding and media studies.
18. Key Insights: what I learned
- Detailed look into the architecture of a true
large text corpus info viz system with many
desirable features.
19. What is the key contribution?
- True zooming (seamless integration of all levels)
is feasible in large text corpora.
20. Take-away messages? What can be generalized?
- Computational feasibility forces some compromises.
- What is not working
- Human heuristics (relationships?)
- BUT help is on the way (bigger iron)
21. 3 questions for group and class discussion.
- Is volume and lack of organization really our biggest problem with modern news information?
- Would you use Galaxy of News? Why or why not?
- What other kinds of text data would you like to see this approach applied to? How might a different domain affect the specification of metadata object representations and/or relationships?
22. TileBars: Visualization of Term Distribution Information in Full Text Information Access. Marti Hearst. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 59-66, Denver, CO, May 1995.
- Paper presentation by Mark Sharp
- 17610554 Information Visualization, Prof. Spoerri
- 11/11/2002
23. Paper Summary
- PROBLEM: Traditional IR is focused on text databases consisting of titles and abstracts; its assumptions are not necessarily appropriate for full text.
- VISION: Utilize term distribution within the text as well as overall frequency to model document relevance. Replace opaque ranking with a transparent means for swift appraisal of the query-document relationship.
24. How does it work?
- TextTiling algorithm partitions full text into adjacent, non-overlapping, multi-paragraph segments reflecting subtopic structure based on term co-occurrence and repetition.
- Segments are scored for similarity to query terms.
- Display shows document length, term frequency, and term distribution across segments.
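The segment-scoring step can be sketched as below. This is a simplified illustration under stated assumptions: fixed-size word windows stand in for TextTiling's subtopic segments, the score is a raw hit count per term set, and query terms are assumed to be lowercase. The function names and the `words_per_segment` parameter are invented for this example.

```python
import re

def segment(text, words_per_segment=50):
    """Split text into fixed-size word windows (a stand-in for TextTiling)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [words[i:i + words_per_segment]
            for i in range(0, len(words), words_per_segment)]

def term_set_counts(segments, term_sets):
    """One row per query term set, one column per segment; cell = hit count."""
    return [[sum(seg.count(t) for t in terms) for seg in segments]
            for terms in term_sets]

# Hypothetical sample data.
text = "Cats chase mice. Dogs chase cats. Dogs sleep."
segs = segment(text, words_per_segment=3)
print(term_set_counts(segs, [["cats"], ["dogs"]]))
```

Each row of the result corresponds to one row of tiles, and each column to one segment of the document.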
25. (TileBars display legend)
- Length of rectangle = length of document
- Each gray square = 1 tile (segment)
- Tile darkness = term frequency
- Query term sets = tile rows
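To make the legend concrete, here is a small text-only rendering sketch. It assumes a matrix of per-segment hit counts (one row per query term set) and maps each count to one of four characters standing in for gray levels. The `SHADES` scale and the function name are hypothetical choices for this example, not the paper's rendering.

```python
SHADES = " .:#"  # lightest to darkest; a hypothetical 4-level gray scale

def render_tilebar(rows, max_count=None):
    """Draw one bracketed line per term-set row; darker character = more hits."""
    peak = max_count or max((c for row in rows for c in row), default=0)
    top = len(SHADES) - 1
    lines = []
    for row in rows:
        cells = []
        for c in row:
            if c == 0 or peak == 0:
                level = 0
            else:
                # Ceiling of c * top / peak, capped at the darkest shade.
                level = min(top, -(-c * top // peak))
            cells.append(SHADES[level])
        lines.append("[" + "".join(cells) + "]")
    return "\n".join(lines)

# Hypothetical counts: 2 term sets, 3 segments.
print(render_tilebar([[1, 1, 0], [0, 3, 1]]))
```

The bar length equals the number of segments, so longer documents produce longer bars, and columns that are dark in both rows mark segments where the term sets overlap.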
28. Which early / pre-attentive visual processes are leveraged?
- Length
- Position
- Darkness (gray scale)
29. What is working?
- Elegant representation of document length.
- Adjacency of tiles between term rows shows overlap.
- Gray scale leverages relative (vs. absolute) judgment.
- Meaningful labels (start of text).
- Direct click link from tiles to text segments.
- Starting TREC/TIPSTER evaluation.
30. What is not working or clear?
- Depends on skillful Boolean query formulation (e.g. no stopwords).
- Doesn't appear to be scalable to large queries (>3 conjunctive terms).
31. What surprised you?
- Because they do have a natural visual hierarchy,
varying shades of gray show varying quantities
better than color.
32. Key Insights: what I learned
- Relevance ranking is not the only game in town
for putting cognitive cues on multi-document
retrievals.
33. What is the key contribution?
- Text segmentation can enhance traditional (whole-document) IR as well as fact retrieval.
- Novel paradigms for text retrieval can be both principled and computationally efficient.
34. Take-away messages? What can be generalized?
- Marti Hearst is a major player in text mining /
text visualization.
35. 3 questions for group and class discussion.
- Instead of integer term frequency, what else could be used to color the tiles for relevance?
- How might documents be ranked?