Title: Prof. Marc Davis
1 Mobile Media Metadata: The Future of Mobile Imaging
- Prof. Marc Davis
- University of California at Berkeley
- School of Information Management and Systems
- Garage Cinema Research
- http://garage.sims.berkeley.edu
2 Four Questions
- How many of you read text every day?
- How many of you write text every day?
- How many of you watch video (movies, TV, DVD, Internet video, etc.), listen to recorded audio (music, speech, etc.), or look at photographs every day?
- How many of you make video (movies, TV, DVD, Internet video, etc.), record audio (music, speech, etc.), or take photographs every day?
3 Garage Cinema Research
- Research and develop technology and applications that will enable daily media consumers to become daily media producers
- Theory, design, and development of digital media systems that
  - Create descriptions of media content and structure (metadata)
  - Use metadata to automate media production and reuse
4 What is the Problem?
- Today people cannot easily find, edit, share, and reuse media
- Computers don't understand media content
  - Media is opaque and data rich
  - We lack structured representations
- Without metadata (descriptions of media content and structure), manipulating digital media will remain like word-processing with bitmaps
5 Signal-to-Symbol Problems
- Semantic Gap
  - Gap between low-level signal analysis and high-level semantic descriptions
  - "Vertical off-white rectangular blob on blue background" does not equal "Campanile at UC Berkeley"
6 Signal-to-Symbol Problems
- Sensory Gap
  - Gap between how an object appears and what it is
  - Different images of same object can appear dissimilar
  - Images of different objects can appear similar
7 Computer Vision and Context
- You go out drinking with your friends
- You get drunk
- Really drunk
- You get hit over the head and pass out
- You are flown to a city in a country you've never been to with a language you don't understand and an alphabet you can't read
- You wake up face down in a gutter with a terrible hangover
- You have no idea where you are or how you got there
- This is what it's like to be most computer vision systems: they have no context
- Context is what enables us to understand what we see
8 Traditional Media Production Chain
[Figure: Traditional Media Production Chain and Metadata-Centric Production Chain. Labels: PRE-PRODUCTION, PRODUCTION, POST-PRODUCTION, DISTRIBUTION, METADATA]
9 Research Projects
- Media Streams
  - A framework for creating metadata throughout the media production cycle to enable media reuse
- Active Capture
  - Automates direction and cinematography using real-time audio-video analysis in an interactive control loop to create reusable media assets
- Adaptive Media
  - Uses adaptive media templates and automatic editing functions to mass customize and personalize media
- Mobile Media Metadata
  - Leverages the spatio-temporal context and social community of media capture to automate metadata creation for mobile media
- Social Uses of Personal Media
  - Analysis of social uses of media to predict future uses and shape the design of next-generation personal media devices and applications
10 Automated Media Production Process
11 The Coming Media Revolution
- Media capture devices become programmable and networked
- Metadata creation and use become integrated throughout media production and reuse
- Media production changes from being a mechanical process to a computational process
- Media becomes programmable and networked
- Daily media consumers become daily media producers
12 Research Projects
- Media Streams
  - A framework for creating metadata throughout the media production cycle to enable media reuse
- Active Capture
  - Automates direction and cinematography using real-time audio-video analysis in an interactive control loop to create reusable media assets
- Adaptive Media
  - Uses adaptive media templates and automatic editing functions to mass customize and personalize media
- Mobile Media Metadata
  - Leverages the spatio-temporal context and social community of media capture to automate metadata creation for mobile media
- Social Uses of Personal Media
  - Analysis of social uses of media to predict future uses and shape the design of next-generation personal media devices and applications
13 Moore's Law for Cameras
[Figure: camera comparison chart. Recoverable labels: 2000, 2002; 400, 40; Kodak DC40, Kodak DX4900, Nintendo GameBoy Camera, SiPix StyleCam Blink]
14 Capture, Processing, Interaction, Network
15 Camera Phones as Platform
- Media capture (images, video, audio)
- Programmable processing using open standard operating systems, programming languages, and APIs
- Wireless networking
- Personal information management functions
- Rich user interaction modalities
- Time, location, and user contextual metadata
16 Camera Phones as Platform
- In the first half of 2003, more camera phones were sold worldwide than digital cameras
- By 2008, the average camera phone is predicted to have 5 megapixel resolution
- Last month Casio and Samsung introduced 3.2 megapixel camera phones with optical zoom and photo flash
- There are more cell phone users in China than people in the United States (300 million)
- For 90% of the world their computer is their cell phone
17 Campanile Inspiration
18 Mobile Media Metadata Idea
- Leverage the spatio-temporal context and social community of media capture in mobile devices
- Gather all automatically available information at the point of capture (time, spatial location, phone user, etc.)
- Use metadata similarity and media analysis algorithms to find similar media that has been annotated before
- Take advantage of this previously annotated media to make educated guesses about the content of the newly captured media
- Interact in a simple and intuitive way with the phone user to confirm and augment system-supplied metadata for captured media
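The steps on this slide (gather context, find similar annotated media, guess, confirm) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MMM implementation: the `Photo` record, the `similarity` weights, and the `suggest` and `confirm` helpers are all hypothetical names introduced here, and only capture context is compared (no media analysis).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Photo:
    user: str                      # phone user who took the photo
    cell_id: str                   # CellID at the point of capture
    hour: int                      # hour of day (0-23) at capture
    annotations: dict = field(default_factory=dict)  # facet -> confirmed value


def similarity(a: Photo, b: Photo) -> int:
    """Hypothetical context similarity: higher means more alike."""
    score = 0
    if a.cell_id == b.cell_id:
        score += 5   # same place (illustrative weight)
    if a.user == b.user:
        score += 3   # same photographer (illustrative weight)
    if abs(a.hour - b.hour) <= 1:
        score += 2   # similar time of day (ignores midnight wraparound)
    return score


def suggest(new_photo, annotated, facet, k=3):
    """Guess values for one facet from the k most similar annotated photos."""
    ranked = sorted(annotated, key=lambda p: similarity(new_photo, p), reverse=True)
    votes = Counter()
    for photo in ranked[:k]:
        if facet in photo.annotations:
            votes[photo.annotations[facet]] += similarity(new_photo, photo)
    return [value for value, _ in votes.most_common()]


def confirm(new_photo, facet, suggestions, choose):
    """Store the value the user picks; `choose` stands in for the handset UI."""
    new_photo.annotations[facet] = choose(suggestions)
    return new_photo
```

In use, `suggest(new_photo, annotated, "location")` returns a ranked list of guesses, and `confirm` records whichever one the user accepts, so the confirmed photo can itself serve as annotated media for later guesses.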
19 MMM Demo Video
20 From Context to Content
- Context
  - When
    - Date and time
  - Where
    - CellID refined to semantic place
  - Who
    - Cellphone user
  - What
    - Activity as product of when, where, and who
- Content
  - When was the photo taken?
  - Where is the subject of the photo?
  - Who is in the photo?
  - What are the people doing?
  - What objects are in the photo?
21 Space Time Social Space
22 What is Location?
23 Camera Location vs. Subject Location
- Camera Location: Golden Gate Bridge
- Subject Location: Golden Gate Bridge
- Camera Location: Albany Marina
- Subject Location: Golden Gate Bridge
24 Kodak Picture Spot
25 Themed Kodak Picture Spot
26 Location Guesser Concept
- Calculate weighted sum of features
  - Most recently visited location
  - Most visited location by me in this CellID around this time
  - Most visited location by me in this CellID
  - Most visited location by others in this CellID around this time
  - Most visited location by others in this CellID
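The weighted sum above might look roughly like the following Python sketch. It is an illustration under assumptions, not the project's actual guesser: the `Visit` record, the `WEIGHTS` values, the two-hour meaning of "around this time", and the reading of "most recently visited" as the current user's latest visit are all hypothetical choices made here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    user: str
    cell_id: str
    hour: int        # hour of day, 0-23
    location: str    # semantic place name confirmed earlier


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {
    "most_recent": 3.0,
    "mine_cell_time": 2.5,
    "mine_cell": 2.0,
    "others_cell_time": 1.5,
    "others_cell": 1.0,
}


def most_visited(visits):
    """Return the most frequent location in visits, or None if there are none."""
    counts = Counter(v.location for v in visits)
    return counts.most_common(1)[0][0] if counts else None


def guess_location(history, user, cell_id, hour, window=2):
    """Rank candidate locations by a weighted sum of the five features.

    `history` is assumed to be ordered from oldest to newest visit.
    """
    def near(visit):
        diff = abs(visit.hour - hour)
        return min(diff, 24 - diff) <= window   # wrap around midnight

    in_cell = [v for v in history if v.cell_id == cell_id]
    mine = [v for v in in_cell if v.user == user]
    others = [v for v in in_cell if v.user != user]
    my_all = [v for v in history if v.user == user]

    # Each feature nominates at most one location.
    votes = {
        "most_recent": my_all[-1].location if my_all else None,
        "mine_cell_time": most_visited([v for v in mine if near(v)]),
        "mine_cell": most_visited(mine),
        "others_cell_time": most_visited([v for v in others if near(v)]),
        "others_cell": most_visited(others),
    }
    scores = defaultdict(float)
    for feature, location in votes.items():
        if location is not None:
            scores[location] += WEIGHTS[feature]
    return sorted(scores.items(), key=lambda item: -item[1])
```

The function returns every nominated location with its score, highest first, so a handset UI could offer the top few as a short pick list rather than committing to a single guess.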
27 Mobile Media Metadata Project (MMM)
- Develop and test our technology for mobile media metadata creation, sharing, and reuse
- Develop application scenarios that use this technology
- Study the usage patterns of applications for mobile media metadata creation, sharing, and reuse
28 MMM Fall 2003 Project
- Numerous application ideas for mobile media capture and sharing (at least seven documented in detail with persona and scenario descriptions and annotated storyboards)
- A metadata framework for describing and sharing mobile media
- A database of annotated captured media
- A prototype of one of the student application scenarios
- A write-up of the project including student feedback, assessment of application ideas, and recommendations for further research and development
29 MMM Research Issues
- Metadata and media creation, sharing, and reuse
- Ontology design
- Vocabulary control
- Leveraging spatial, temporal, and social context to infer media content
- Determining similarity
- Application ideas and use scenarios
- User interfaces
30 MMM Research Issues
- Ontology of photography
- Who owns/accesses/changes pictures of me?
- Who owns/accesses/changes my pictures?
- Who owns/accesses/changes my metadata?
- Who owns/accesses/changes metadata about me and/or my media?
- Mobile media application design methods
- HCI methodology for mobile media design
- Social science methodology for design of future applications
31 MMM Initial Application Ideas
- Metadata creation, sharing, and reuse of media and metadata from various users connected in space, time, and affinity
- Content-based access to mobile media assets enabling
  - Networked photo albums
  - Personalized media messages
  - Personalized media CallerID
  - Matchmaking services
  - Ecommerce
- Location-based and travel applications for using networked media assets and metadata
  - Networked travel albums
  - Networked travel guidebooks
32 Over 150 MMM Application Ideas
- Real time party ratings with pictures
- Forgotten historical site cataloguing
- Mapping wireless access on campus for public
- Picture/video comparison to map location
- Tagging people
- "Audio graffiti" audio tour of the world
- Game: Spot the terrorist
- Amateur photo contest
- Mobile American Idol
- Affair detection
- Do I sue or not? Online car estimates
- Family tree builder/viewer
- Human Clock: Pictures of people showing the time
- Human Weather: What are people wearing in San Francisco?
- Reality show that follows someone around for a night of going out and partying, which people could watch online
- Translation. Take a picture of something, and get the word for it in the local language. Or enter a word in your own language to get a picture which you can show to a native
- Directions/locations. Illustrated travel guides, an interactive visual Lonely Planet
- Indie reporting. News coverage via amateur reporters. The blackout provided some interesting examples
- Some "leveling game" like taking photos of people wearing Gap jeans on the street; if you get enough, you win a pair, and go on to the next level
33 MonkeyBotster
- MonkeyBotster aims to allow you to access interesting events that have occurred in someone's past as well as contribute things from the present through two degrees of social separation (your friend's friends can see your events but not your friend's friend's friends, etc.)
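The "two degrees of social separation" rule can be read as a depth-limited walk over a friendship graph. The sketch below is one way to express that in Python; the adjacency-dict representation and the function names `within_degrees` and `can_view` are assumptions made for illustration, not part of the MonkeyBotster design.

```python
from collections import deque


def within_degrees(friends, owner, max_degrees=2):
    """Return everyone reachable from owner in at most max_degrees friendship hops.

    `friends` maps each person to an iterable of their direct friends.
    """
    distance = {owner: 0}
    queue = deque([owner])
    while queue:
        person = queue.popleft()
        if distance[person] == max_degrees:
            continue   # do not expand beyond the allowed number of hops
        for friend in friends.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return {person for person, hops in distance.items() if hops > 0}


def can_view(friends, owner, viewer, max_degrees=2):
    """True if viewer is the owner or within max_degrees hops of the owner."""
    return viewer == owner or viewer in within_degrees(friends, owner, max_degrees)
```

With a chain of friendships you, a, b, c, the events posted by "you" would be visible to a (a friend) and b (a friend's friend) but not to c.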
34 LunchMeister
- LunchMeister enables users to share information, opinions, and photos of local restaurants in real time, allowing them to maximize the use of an often limited lunch hour
35 DARE
- DARE will leverage camera phones as tools that allow users to play games with and against each other
- The main objective of DARE is to bring different social networks together informally, or as ice breakers and team builders for corporations, school groups, and other such groups
36 Recipe Box
- Recipe Box will facilitate cooking for people who wish to spend less money and time making presentable meals
- Users will be able to select from various recipes in an annotated media database on their mobile phones
- Users can add recipes illustrated with annotated media to the database from their mobile phones
37 HouseBuddies
- HouseBuddies transforms the mobile phone into a collaborative apartment-hunting device that enables real-time information sharing
- It facilitates a group of friends in their search for a place to live by allowing them to assemble a list of house ads from various sources and collaborate on investigating their housing options
38 Recreation Evaluation Interface
- Recreation Evaluation Interface enables camera phone users to see what's happening where they are not, and let others in their community know what's happening where they are
- Using real-time data and a reference database, this community can connect with other people with similar interests to maximize their local knowledge, enabling better decision-making about how and where to spend one's time
39 Wishter
- Wishter would manage
  - Lists of potential gifts for friends and family
  - A wishlist of items for oneself
- It would allow the user to annotate a potential gift with a physical description, the event for which the gift is being considered, and the location where the item was seen
40 Wishter Rapid Development
41 MMM User Studies
- Deployed MMM prototype with 40 graduate students and 15 researchers for 4 months
- Weekly surveys administered to MMM users
  - Current usage patterns gathered
- 5 subjects for usability testing with MMM users
  - User interaction and user motivation addressed
- 8 subjects for first focus group of MMM users
  - Current annotation habits discussed
  - User motivation addressed
- 7 subjects for second focus group
  - Current use of photos (capture, annotation, sharing)
42 User Interaction Findings
- Network unpredictability
  - Slow and intermittent connectivity
  - Frequently dropped service with little feedback
  - User work to that point was often lost
- Proposed Solution: Limit continual network interaction by creating a full-client application that can use network in background
43 User Interaction Findings
- Mobile UI usability
  - Hardware buttons were not remapped to support MMM UI functions
  - Desktop-based prototyping tools and methods don't predict usability issues with mobile devices
- Proposed Solution: Use a prototyping methodology that better simulates the specifics of mobile interaction to design user experiences and test usability
44 User Interaction Findings
- Presentation of faceted hierarchical structure
  - Lists of 12-15 items should not be exceeded as selection devices
- Proposed Solution: Create smaller application-specific hierarchies to present to the user (which map into a broader general structure on the backend)
45 Use Patterns and Motivation Findings
- Power-of-Now
  - Camera phones are always available
  - Take pictures of ad-hoc subjects
  - Phototaking happens in daily lived context
- Currently Not Search-Centric
  - For our users, sharing and browsing are more important than search and retrieval
- Role of the Desktop
  - A desktop component adds great value to the mobile application by easing browsing
- Challenges in Offloading Media
  - Non-MMM users find it difficult and frustrating to try to get their photos off their cameraphone
46 Use Patterns and Motivation Findings
- Funnel Effect: Selective metadata annotation
- Two types of annotations
  - Content: 1-2 facets describing the photograph (e.g., Person, Location)
  - Comment: a personal remark about the photograph, why they took it, a witty comment, something personal to share
47 Use Patterns and Motivation Findings
- Integrate or motivate annotation
  - Tie annotation to some other activity the user already does in a seamless way
    - Sharing
    - Storytelling
  - Motivate users to annotate
    - Demonstrate clear benefit of later reuse
    - Make annotation intrinsically fun (e.g., image annotation games)
48 Application Design Findings
- Input-output application design
  - Combination of simple small applications into more powerful applications
- Network effects
  - Power of many users sharing media and metadata
- Media and coordination/collaboration
  - Using media to coordinate and collaborate among groups
- Mobile media and games
  - Using media to document games
  - Using games to make media and metadata
49 Nokia Platform Issues
- Hardware
  - Keypad layout
  - Camera
- Development environment
  - Access to files
  - Access to camera
  - Access to contextual metadata (time, CellID, username)
  - Access to contacts
- Browser
  - Nokia web browser (caching)
- Prototyping tools
  - Emulator
  - Wizard of Oz
50 Research Projects
- Media Streams
  - A framework for creating metadata throughout the media production cycle to enable media reuse
- Active Capture
  - Automates direction and cinematography using real-time audio-video analysis in an interactive control loop to create reusable media assets
- Adaptive Media
  - Uses adaptive media templates and automatic editing functions to mass customize and personalize media
- Mobile Media Metadata
  - Leverages the spatio-temporal context and social community of media capture to automate metadata creation for mobile media
- Social Uses of Personal Media
  - Analysis of social uses of media to predict future uses and shape the design of next-generation personal media devices and applications
51 Social Uses of Personal Photos
- Looking not just at what people do with digital imaging technology, but why they do it
- Goals
  - Identify social uses of photography to predict resistances and affordances of next generation mobile media devices and applications
- Methods
  - Situated video interviews
  - Review of online photo sites
  - Sociotechnological prototyping (magic thing, technology probes)
52 Preliminary Findings
- Social uses of personal photos
  - Memory
  - Creating and maintaining relationships
  - Self-expression
- Media and resistance
  - Materiality
  - Orality
  - Storytelling
53 From MMM-1 To MMM-2
- MMM-1 asked
  - What did I just take a picture of?
- MMM-2 adds
  - Who do I want to share this picture with?
[Figure: diagram with labels Context, Content, Community]
54 MMM-2 Goals
- Integrated photo sharing and mobile media metadata application
  - Metadata as by-product of sharing
  - Metadata enables new types of sharing
- Advanced spatio-temporal-social inferencing
- Integrated media analysis functions
- RDF backend for ontology and metadata sharing and reuse
- Connected handset and web applications
55 Sharing ↔ Metadata
- From sharing to metadata
  - A birdwatcher takes a photo in a bird sanctuary and sends it to her birdwatching group
  - What is the photo of?
- From metadata to sharing
  - A parent takes a photo of his child on the child's birthday
  - Who does he share it with?
56 MMM-2 Use Scenarios
- Share photo
  - Rank share lists
  - Add/delete members of share list
  - Create new share list
- Annotate photo
  - Annotate photo on handset
  - Annotate photo on web
  - Annotate photo while sharing on handset
  - Annotate photo while sharing on web
57 Scaling Up Photo Sharing
[Figure labels: 100K, 100M]
58 MMM-2 Prototype
- Phase 1: MMM-2 Prototype Design
  - In process, Spring 2004
- Phase 2: MMM-2 Prototype Development
  - Summer 2004
- Phase 3: MMM-2 Prototype Deployment
  - Fall 2004
- Phase 4: MMM-2 Prototype Evaluation
  - Fall 2004 to Spring 2005
59 MMM Publications
- CHI 2004
  - Anita Wilhelm, Yuri Takhteyev, Risto Sarvas, Nancy Van House, and Marc Davis. "Photo Annotation on a Camera Phone." In Proceedings of the Conference on Human Factors in Computing Systems (CHI 2004) in Vienna, Austria. ACM Press, 2004.
- MobiSys 2004
  - Risto Sarvas, Erick Herrarte, Anita Wilhelm, and Marc Davis. "Metadata Creation System for Mobile Images." In Proceedings of the Second International Conference on Mobile Systems, Applications, and Services (MobiSys 2004) in Boston, Massachusetts. ACM Press, 2004.
  - Marc Davis, Nathan Good, and Risto Sarvas. "From Context to Content: Leveraging Context for Mobile Media Metadata (Workshop Paper)." Presented at MobiSys 2004 Workshop on Context Awareness at the Second International Conference on Mobile Systems, Applications, and Services in Boston, Massachusetts, 2004.
  - Marc Davis. "Mobile Media Metadata (Video)." Presented at Second International Conference on Mobile Systems, Applications, and Services (MobiSys 2004) in Boston, Massachusetts, 2004.
60 MMM Publications
- ICME 2004
  - Marc Davis and Risto Sarvas. "Mobile Media Metadata for Mobile Imaging." In Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2004) Special Session on Mobile Imaging in Taipei, Taiwan. IEEE Computer Society Press, 2004.
- ACM Multimedia 2004
  - Marc Davis, Simon King, Nathan Good, and Risto Sarvas. "From Context to Content: Leveraging Context to Infer Media Metadata." In Proceedings of 12th Annual ACM International Conference on Multimedia (MM 2004) Brave New Topics Session on "From Context to Content: Leveraging Contextual Metadata to Infer Multimedia Content" in New York, New York. ACM Press, Forthcoming 2004.
  - Marc Davis. "Mobile Media Metadata: Metadata Creation System for Mobile Images (Video)." In Video Proceedings of 12th Annual ACM International Conference on Multimedia in New York, New York. ACM Press, Forthcoming 2004.
  - Marc Davis. "Mobile Media Metadata: Metadata Creation System for Mobile Images (Video Description)." In Video Proceedings of 12th Annual ACM International Conference on Multimedia in New York, New York. ACM Press, Forthcoming 2004.
61 http://garage.sims.berkeley.edu