Title: New Applications of Tagging
1New Applications of Tagging
- November 17, 2006
- Jaesun Han (jshan0000_at_gmail.com)
- Research Fellow / Ph.D
- ANLAB, Dept. of EECS, KAIST
- Contact: http://www.web2hub.com
2Contents
- GeoTagging
- Auto-Tagging
- Auto-Tagging for Text
- Auto-Tagging for Image
- Tagging the physical world
3GeoTagging
4GeoTagging
- GeoTagging (also referred to as GeoCoding)
- Adding geographical identification metadata to various media such as websites, RSS feeds, or images
- latitude and longitude coordinates (plus altitude and place names)
- non-coordinate based geographical identifiers (a postal address, etc.)
- Applications
- GeoTagging-enabled image search engine
- GeoTagging-enabled information services
- Examples: TripTracker, Zooomr, Mappr, Platial, etc.
5GeoTagging techniques
- Manually inputting
- Initial GeoTagging convention by GeoBloggers
- geotagged
- geo:lat=latitude, e.g. geo:lat=51.4989
- geo:lon=longitude, e.g. geo:lon=-0.1786
- Manually positioning on a map
- automatically augment geotags
- Examples: Flickr, Picasa with Google Earth
- Using a location-aware device such as a GPS receiver
- JPEG and TIFF image file formats can store the geographical coordinates in the EXIF header
- Examples: cameraphone with ZoneTag and Bluetooth GPS, digital camera with Sony GPS-CS1
- Using a scene recognition program
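The manual tag convention and the EXIF route can both be reduced to a decimal latitude/longitude pair. The following is a minimal sketch, assuming the tags are spelled `geotagged`, `geo:lat=<value>`, and `geo:lon=<value>`, and that EXIF coordinates arrive as degrees, minutes, seconds plus a hemisphere letter. The function names `parse_geotags` and `dms_to_decimal` are hypothetical and not part of any tool named in the slides.

```python
from typing import Iterable, Optional, Tuple


def parse_geotags(tags: Iterable[str]) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) if the tag list carries a complete geotag, else None."""
    lat = lon = None
    flagged = False
    for tag in tags:
        tag = tag.strip().lower()
        if tag == "geotagged":
            flagged = True
        elif tag.startswith("geo:lat="):
            lat = float(tag[len("geo:lat="):])
        elif tag.startswith("geo:lon="):
            lon = float(tag[len("geo:lon="):])
    # Require the marker tag and both coordinates, within valid ranges.
    if not flagged or lat is None or lon is None:
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value
```

For example, `parse_geotags(["london", "geotagged", "geo:lat=51.4989", "geo:lon=-0.1786"])` yields the coordinate pair from the slide, while a tag list without the `geotagged` marker yields `None`.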
6Representation of GeoTag in HTML
GeoTags
GeoURL
GeoRSS
geo microformats
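As a rough illustration of how one coordinate pair looks in each of the four representations, the sketch below builds the markup as strings. The element and attribute names follow the commonly documented forms (`geo.position` meta tag, `ICBM` meta tag, GeoRSS-Simple `georss:point`, and the `geo` microformat classes); the helper name `geo_markup` is hypothetical.

```python
def geo_markup(lat: float, lon: float) -> dict:
    """Render one (lat, lon) pair in four HTML/feed representations."""
    return {
        # GeoTags-style meta tag: "lat;lon"
        "geotags_meta": f'<meta name="geo.position" content="{lat};{lon}">',
        # GeoURL-style meta tag: "lat, lon"
        "geourl_meta": f'<meta name="ICBM" content="{lat}, {lon}">',
        # GeoRSS-Simple point: "lat lon" separated by a space
        "georss_simple": f"<georss:point>{lat} {lon}</georss:point>",
        # geo microformat: nested spans with class names
        "geo_microformat": (
            f'<span class="geo"><span class="latitude">{lat}</span>; '
            f'<span class="longitude">{lon}</span></span>'
        ),
    }


for name, markup in geo_markup(51.4989, -0.1786).items():
    print(f"{name}: {markup}")
```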
7James Reserve Observing Systems at CENS
http://dms.jamesreserve.edu/
http://dms.jamesreserve.edu/jrcensweb/GE_GUI_v8.php
8SenseWeb at MSR
- Goal: Publishing and querying real-time data over such geo-centric web interfaces
- Common platform and set of tools
- For data owners to easily publish their data
- For users to make useful queries over the live data sources
- Example data
- temperature, humidity
- weather data
- parking space
- restaurant's wait time
- traffic camera images
- all types of image, audio
- video
http://atom.research.microsoft.com/sensormap/
http://research.microsoft.com/nec/senseweb/
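The publish/query split described on this slide can be pictured with a toy in-memory registry. This is only an illustrative sketch under stated assumptions: the `Reading` and `SensorRegistry` names, their fields, and the bounding-box query are all hypothetical and do not reflect the actual SenseWeb interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    kind: str      # e.g. "temperature", "parking space" (hypothetical labels)
    lat: float
    lon: float
    value: float


class SensorRegistry:
    """Toy stand-in for a shared platform: owners publish, users query."""

    def __init__(self) -> None:
        self._readings: List[Reading] = []

    def publish(self, reading: Reading) -> None:
        # A data owner adds a live reading.
        self._readings.append(reading)

    def query(self, kind: str, south: float, west: float,
              north: float, east: float) -> List[Reading]:
        # A user asks for readings of one kind inside a map rectangle.
        return [
            r for r in self._readings
            if r.kind == kind
            and south <= r.lat <= north
            and west <= r.lon <= east
        ]
```

A map front end would call `query` with the bounds of the visible map area, which is one simple way to tie sensor data to a geo-centric view.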
9Auto-Tagging
10Auto-Tagging for Text
- Paper: Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering (WWW 2005)
- Two Questions
- Do tags provide users with the necessary descriptive power to successfully group articles into sets? (for discovering)
- Can tags help with the search task? (for searching)
- Experiments
- Retrieved the top 350 tags from Technorati, and then the 250 most recent articles for each tag
- All articles that share a tag are assigned to a tag cluster
- Articles are converted into weighted vectors, using TFIDF to assign weights to each word
- Similarity is measured by the average pairwise cosine similarity of all articles in each cluster
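The cluster-quality measure in the experiment can be sketched as follows. This is a minimal illustration, assuming articles are already tokenized into word lists, term frequency is normalized by article length, and inverse document frequency is `log(N / df)` over the whole corpus; the paper's exact weighting is not given on the slide, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Turn each tokenized article into a sparse {word: weight} vector."""
    n = len(docs)
    # Document frequency: in how many articles does each word occur?
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_pairwise_cosine(cluster: List[Dict[str, float]]) -> float:
    """Mean cosine similarity over every pair of articles in one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

The vectors are computed once over the whole corpus, and each tag cluster is then scored by passing the subset of vectors that share the tag to `average_pairwise_cosine`.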
11Auto-Tagging for Text
- cluster by random selection: similarity 0.1 to 0.2
- cluster by a user tag: similarity 0.2 to 0.3
- cluster by Google News: similarity 0.4
- cluster by auto-tagging: similarity 0.5 to 0.7
12Auto-Tagging for Text
- Automated Tagging
- Automatically assigning tags based on the content of an article
- Auto-Tagging technique of this paper
- Assign TFIDF scores to all words and extract the top three highest-scoring words as tags
- Experimental results
- Significantly better similarity scores than user tagging
- Automated tagging produces more focused, topical clusters
- Tags extracted from user text are more helpful in creating specific categories than user-selected tags are
- Discussion
- Can tags help with the search task for text?
- How different is auto-tagging from indexing for search?
- Does auto-tagging cooperate with user-tagging?
- Any other auto-tagging technique for text?
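The technique summarized on this slide (score every word by TFIDF, keep the three highest-scoring words) can be sketched as below. The sketch assumes pre-tokenized articles, raw counts as term frequency, `log(N / df)` as inverse document frequency, and a corpus that includes the article being tagged; these choices and the name `auto_tags` are assumptions rather than details from the paper.

```python
import math
from collections import Counter
from typing import List


def auto_tags(article: List[str], corpus: List[List[str]], k: int = 3) -> List[str]:
    """Return the k words of `article` with the highest TFIDF score.

    `corpus` must contain `article`, so every word has df >= 1.
    """
    n = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    tf = Counter(article)
    scores = {w: c * math.log(n / df[w]) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]
```

Words that appear in every article get a score of zero, so common words fall to the bottom of the ranking without a separate stop-word list.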
13Auto-Tagging for Image
- Existing image retrieval
- Google Image Search
- Google uses surrounding text and ignores contents of the image
- Tag Search
- Automatic image annotation
- Automatically annotating by analyzing contents of images
- Advantages
- No manual annotation
- Easy to handle textual queries
- Does not ignore contents of the images
[Figure: two example photos with their user tags, "dahlia, golden, gate, park, flower, fog" and "cameraphone, animal, dog, tyson", shown alongside tag searches for "cameraphone" and "gate"]
14Automatic image annotation
15Case Study: ALIPR (Automatic Linguistic Indexing of Pictures - Real Time)
- Goal: Can a computer do this?
[Figure: example picture annotated with "Building, sky, lake, landscape, Europe, tree"]
- References
- Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003)
- Real-time Computerized Annotation of Pictures (ACM Multimedia Conference, 2006)
- Project homepage: http://wang.ist.psu.edu/IMAGE/
- Online demo: http://alipr.com/
16ALIPR
- Image DB Constructing Process
- Corel image database: 100 images x 600 CD-ROMs (by topics)
- Each concept is manually annotated with a few keywords (total of 332 distinct words)
- Training Process
- Feature extraction: color and texture
- Region segmentation: k-means clustering of feature vectors
- Statistical modeling: discrete distribution (D2-) clustering
- Annotation Process
- Image signature extraction
- Computing concept likelihood score
- Computing the probability for each word
- Selecting top ranked words
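The last two annotation steps (turning concept likelihoods into per-word scores and keeping the top-ranked words) can be illustrated with a deliberately simplified sketch. It assumes each word's score is the sum of the normalized likelihoods of the concepts annotated with that word. This is not ALIPR's published formula, and the function name, concept names, and numbers in the usage example are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_words(concept_likelihood: Dict[str, float],
               concept_words: Dict[str, List[str]],
               k: int = 3) -> List[Tuple[str, float]]:
    """Score words by the normalized likelihood of the concepts that carry them."""
    total = sum(concept_likelihood.values())
    if total == 0:
        return []
    word_score: Dict[str, float] = defaultdict(float)
    for concept, likelihood in concept_likelihood.items():
        for word in concept_words[concept]:
            word_score[word] += likelihood / total
    # Highest score first; ties broken alphabetically.
    ranked = sorted(word_score.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical likelihoods for one query image over three trained concepts.
likelihood = {"paris": 0.6, "lake": 0.3, "desert": 0.1}
words = {
    "paris": ["Paris", "European", "landscape"],
    "lake": ["lake", "water", "landscape"],
    "desert": ["sand", "landscape"],
}
print(rank_words(likelihood, words))
```

A word shared by many likely concepts, such as "landscape" here, rises to the top, which matches the intuition that annotation words are pooled across similar categories.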
17ALIPR 600 Categories of Images
- Image Database: Corel Stock Photo Dataset
- 100 images x 600 CD-ROMs
- Images on each CD-ROM are categorized under the same topic
- Category descriptions are assigned manually (total of 332 distinct words)
18ALIPR A Category of Images
- Concept: Paris/France
- Annotation: Paris, European, historical building, beach, landscape, water
19ALIPR Training Process
20ALIPR Automatic Annotation Process
21Tagging the physical world
22Olalog at Olaworks
- Manage digital data from everyday life
- Not a plain tag, we use a SPOT!
- Space (Where)
- Person (Who)
- Object (What)
- Time (When)
[Diagram: olalog at the center, linked to Everyday Life, Community, Web / P2P, and PC. Daily Digital Data: SMS/MMS, MP3, E-mail, Web log, ...]
23Olalog Auto-Tagging
- S: Cell or GPS-based LBS
- Latitude, Longitude
- P: Face Recognition
- ID or E-mail
- O: Barcode, Character, Trademark, ID3Tag, RFID
- ISBN, UPC/EAN
- T: Time Stamp (e.g. EXIF)
- YYYYMMDDHHMMSS
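One way to picture a SPOT tag is as a four-field record, with the time field derived from a photo's EXIF timestamp. This is a sketch under assumptions: the `SpotTag` class and its field names are hypothetical and not Olaworks' data model. The EXIF date format `YYYY:MM:DD HH:MM:SS` is the standard one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class SpotTag:
    space: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    person: Optional[str] = None                 # ID or e-mail
    obj: Optional[str] = None                    # e.g. ISBN or UPC/EAN
    time: Optional[str] = None                   # YYYYMMDDHHMMSS


def exif_to_timestamp(exif_datetime: str) -> str:
    """Convert an EXIF 'YYYY:MM:DD HH:MM:SS' string to YYYYMMDDHHMMSS."""
    parsed = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    return parsed.strftime("%Y%m%d%H%M%S")


tag = SpotTag(
    space=(51.4989, -0.1786),
    time=exif_to_timestamp("2006:11:17 09:30:00"),
)
print(tag)
```

Fields that no recognizer could fill stay `None`, so a partially tagged item is still a valid record.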
24MyLifeBits at MSR
A lifetime store of everything: articles, books, cards, CDs, letters, memos, papers, photos, pictures, presentations, home movies, videotaped lectures, and voice recordings. It is beginning to capture phone calls, IM transcripts, television, and radio.
25The 1 TB Life
- 1 TB gives you 65 years of
- 100 email messages a day (5 KB each)
- 100 web pages a day (50 KB each)
- 5 scanned pages a day (100 KB each)
- 1 book every 10 days (1 MB each)
- 10 photos per day (400 KB JPEG each)
- 8 hours per day of sound, e.g. telephone, voice annotations, and meeting recordings (8 Kb/s)
- 1 new music CD every 10 days (45 min each at 128 Kb/s)
- It will take you 5 years to fill up your 80 GB drive
- Want video? Buy more cheap drives (1 TB/year lets you record 4 hours/day of 1.5 Mb/s video)
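The figures on this slide can be checked with a short calculation. The sketch assumes decimal units (1 KB = 1000 bytes, 1 Kb/s = 1000 bits per second) and 365-day years; the variable and function names are illustrative.

```python
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # decimal units (assumption)


def stream_bytes(seconds: float, kilobits_per_second: float) -> float:
    """Bytes produced by a constant-bitrate stream."""
    return seconds * kilobits_per_second * 1000 / 8


daily_bytes = {
    "email": 100 * 5 * KB,
    "web pages": 100 * 50 * KB,
    "scanned pages": 5 * 100 * KB,
    "books": 1 * MB / 10,                        # one book every 10 days
    "photos": 10 * 400 * KB,
    "sound": stream_bytes(8 * 3600, 8),          # 8 hours at 8 Kb/s
    "music": stream_bytes(45 * 60, 128) / 10,    # one CD every 10 days
}

per_day = sum(daily_bytes.values())
lifetime_tb = per_day * 365 * 65 / TB
years_to_fill_80gb = 80 * GB / per_day / 365
video_tb_per_year = stream_bytes(4 * 3600, 1500) * 365 / TB

print(f"per day: {per_day / MB:.2f} MB")
print(f"65 years: {lifetime_tb:.2f} TB")
print(f"80 GB lasts: {years_to_fill_80gb:.1f} years")
print(f"video per year: {video_tb_per_year:.2f} TB")
```

Under these assumptions the daily total is about 43 MB, which adds up to roughly 1.03 TB over 65 years and fills an 80 GB drive in about 5.1 years. Four hours of 1.5 Mb/s video per day comes to about 0.99 TB per year, consistent with the slide.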
26MyLifeBits System
[Diagram: Capture, then Organization & Annotation, then Storage & Retrieval. Labels: no organization, automatic annotation, SQL]
27MyLifeBits Software
[Diagram: MyLifeBits software components built around the MyLifeBits store (database)]
28MyLifeBits Organization
[Diagram: "NO Organization" with powerful search, next to a folder tree of Personal and Professional, each split into Archive and Current]
- Classification & Sharing
- full-text search is not enough
- flat tagging does not scale
- Example: document type
- several hundred unique entries: article, bill, will, business card, report card, greeting card, birth certificate, ...
- different dimensions: size, form, content, supplier, etc.
29MyLifeBits Annotation
- Manual annotation by user
- still recommended
- Automatic annotation by speech analysis
- increasing audio contents
- voice annotation, notes, telephone calls, meetings, conversation, etc.
- using speech-to-text for annotating other types of contents
- example: photos taken on greeting a new person
- Photo annotation
- who, what, when, where
- who & what are difficult: they require image analysis technology
- Video annotation
- same problems as audio and photo
30Discussion