Bits about Bits: Bitzi and the Business of Metadata

2
Bits about Bits: Bitzi and the Business of
Metadata
  • Gordon Mohr
  • Bitzi Corporation
  • Founder & Chief Technology Officer
  • September 17, 2001

3
Bits about Bits: Bitzi and Open, Cooperative
Metadata
  • Gordon Mohr
  • Bitzi Corporation
  • Founder & Chief Technology Officer
  • November 7, 2001

4
Overview
  • P2P File Sharing: a cornucopia without
    confidence
  • Four Missing Ingredients
  • The Bitzi Approach
  • Demos
  • Future Directions
  • Could metadata be a big business?

5
Everything is now Bits
  • Anything can be encoded, stored, shifted, shared
  • The cloud is coming to include everything
  • Tech and social trends are against strict control

... and Bits Move Freely

6
No Confidence or Context
  • You can get anything imaginable, BUT
  • Is it complete? Where did it originate?
  • Has it been damaged or altered?
  • Is this the best or current instance?
  • What's related? Is it legitimate?
  • What should I seek next?
  • Current ad hoc P2P sharing/distribution nets
    inherently blur these issues
  • Filename-centric
  • Mr. Short-Term Memory

7
What's Missing?
  • We're craving four things:
  • Reliable Names
  • Nothing can masquerade as something else
  • Easy to ask for exactly the right thing
  • Rich Metadata
  • Beyond just filename and length
  • Easy Access
  • Everywhere the files are, and then some
  • A Consensus View
  • Eliminate frivolous skew of understanding

8
We Want Reliable Names
  • Does a file have a True Name?
  • Yes, via Cryptographic Hashes
  • Essentially, these are digital fingerprints
  • Any-sized input (any digital file) to
    fixed-sized output (hash value)
  • Deterministic but unpredictable
  • Infeasible to create specific desired hash value
  • Infeasible to find two inputs with same hash
    value
  • Examples
  • MD5 (but maybe not as reliably as once thought)
  • SHA1 (and now SHA256, SHA512)
  • Tiger
  • RIPEMD160
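
The fingerprint properties listed on this slide can be seen with a few lines of Python. This is only an illustrative sketch using the standard hashlib module; the helper name fingerprint and the sample inputs are made up for the example and are not part of any Bitzi tool.

```python
import hashlib


def fingerprint(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of `data` under the named hash algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


a = fingerprint(b"hello world")
b = fingerprint(b"hello world")    # same input again
c = fingerprint(b"hello world!")   # one extra byte

print(a == b)   # True: deterministic, same bits give the same name
print(a == c)   # False: a tiny change gives an unrelated value

# Any-sized input maps to a fixed-sized output (40 hex digits for SHA1).
print(len(a), len(fingerprint(b"x" * 1_000_000)))   # 40 40
```

Passing "md5" or "sha256" as the second argument selects the other hashes named above that hashlib guarantees; Tiger is not in that guaranteed set.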

9
We Want Rich Metadata
  • Metadata is Data about other Data
  • Filename and Length are a trivial start
  • Intrinsic or extrinsic to file itself
  • Examples
  • Generic: Origin, Free-form description, Comments,
    Community Ratings
  • Format-specific: Encoding parameters, Resolution,
    Playback length
  • Growing body of useful standards and conventions
  • XML, RDF, Dublin Core, domain-specific proposals
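
As a rough illustration of intrinsic versus extrinsic metadata, the sketch below builds a small XML record whose fields use Dublin Core element names. The wrapper elements (record, intrinsic, extrinsic), the helper build_record, and all sample values are hypothetical and chosen for the example; they do not describe Bitzi's actual schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(intrinsic: dict, extrinsic: dict) -> ET.Element:
    """Group Dublin Core fields by whether they come from the file itself."""
    record = ET.Element("record")  # hypothetical wrapper element
    for group_name, fields in (("intrinsic", intrinsic), ("extrinsic", extrinsic)):
        group = ET.SubElement(record, group_name)
        for name, value in fields.items():
            child = ET.SubElement(group, f"{{{DC_NS}}}{name}")
            child.text = str(value)
    return record


record = build_record(
    # Readable from the bits themselves (sample value).
    intrinsic={"format": "audio/mpeg"},
    # Supplied by people or other systems (sample values).
    extrinsic={"title": "Example Track", "description": "Clean rip, no skips"},
)
print(ET.tostring(record, encoding="unicode"))
```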

10
We Want Easy Access
  • Ubiquity
  • Anywhere the files are, and where they're not
  • Simplicity
  • Familiar interfaces
  • Reliability
  • Canonical location
  • Redundant Mirrors
  • Multiple paths: same paths as files

11
We Want A Consensus View
  • Avoid redundant efforts
  • Achieve convergence on simple issues
  • Trivial disagreements and mistakes should be
    quickly and permanently resolved
  • Robustness against casual mischief
  • Capture and highlight enduring disagreements
  • Even arbitrary commonality is valuable
  • Naming systems
  • A central reference point is the easy solution

12
The File Trust Utility
13
The Bitzi Approach
  • A metadata aggregator, consisting of
  • Website
  • Community of contributors
  • Editorial/rating policies
  • Canonical datastore
  • Web service
  • Free access and reuse
  • Just give us attribution
  • Other restrictions only get in the way
  • Our long-term role: stewardship
  • We live or die by the usefulness of the dataset

14
Sources of Inspiration
  • Open Directory Project
  • AKA NewHoo, GnuHoo, DMoz(illa)
  • Volunteer-built Yahoo-like categorical web index
  • CD/Music projects
  • CDDB (before dataset lockdown)
  • FreeDB & MusicBrainz (since)
  • Oxford English Dictionary
  • The Professor and the Madman
  • Napster et al.
  • De facto quality filtering
  • Usenet (esp. FAQs), Epinions, Amazon reviews,
    eBay, Zagat's

15
How Bitzi Works: Bitprints & Tickets
Every discrete file out there can be boiled down
to a Bitprint.
Over time, the Tickets in our database collect
all the best metadata about the corresponding
original file.
At no point does Bitzi receive, store, transmit,
or link to actual files; we deal strictly in
Bitprints and metadata.
16
How Bitzi Works: Tickets Out
Our database grows to describe a useful
proportion of all files in circulation.
A wide variety of people and applications use
ticket info for their own ends.
  • Website visitors/searchers
  • Desktop file lookups
  • Media player apps
  • Filesharing apps
  • Derivative services

17
How Bitzi Works: Tech Details
  • Our Bitprint
  • Master key into our catalog
  • Concatenation of two nonproprietary hashes
  • SHA1: safe, standard
  • TigerTree: different basis, range benefits
  • Robustness against research breakthroughs
  • Our data model terminology
  • Bitprints may be tagged
  • Tags are arbitrary XML blobs
  • Growing set of types
  • Usually coercible into a database row or RDF
  • Tags compete with each other as necessary
  • Tickets are created from the best tags

18
How Bitzi Works: Current tools
  • Data collection
  • Downloadable Bitcollider utility
  • Windows & Linux
  • Free source code
  • Calculates bitprint, extracts some intrinsic tags
  • Web forms
  • Viewing/rating/searching
  • All at our website

19
How Bitzi Works: Open Code & Data
  • Bitcollider bitprinting code available
  • Public Domain
  • C & Java
  • Free dataset access: OpenBits
  • Draft OpenBits License based on Open Directory
    Project license
  • Preliminary RDF dump available
  • http://preview.openbits.org
  • Eventually, at the Ticket granularity

20
Using Bitzi
  • On your desktop
  • Identify anything you've got, including possible
    problems, newer versions, etc.
  • At our website
  • Find interesting potential new things to get in
    context, presented alongside other options
  • In other applications, devices, websites
  • Identify what's playing
  • Choose between offered options
  • Organize/correct your collection
  • Much more?

21
Demos
  • Bitzi Bitcollider
  • Desktop utility
  • LimeWire
  • Evaluate search results before downloading
  • WinAmp
  • See more about what's playing
  • Bitzi Website
  • Search for new items of interest

22
Future: Greater Integration
  • Standard, generic "get" facility
  • We expect a single click from a Ticket to ask multiple
    applications to locate the matching file
  • Ticket info inside applications
  • Get Ticket direct from Bitzi, or elsewhere
  • Verify Ticket validity (cryptographically signed)
  • Display as locally appropriate

23
Future: Website and Community
  • Enhanced search
  • Improved rating and peer-review processes
  • Browsing/Categorization
  • Automatic and manual
  • Dataset mining
  • Variety of rankings

24
Is this a Business?
  • Not all Tickets are (or should be) equal
  • Fuzzy vs. guaranteed trust
  • Community vs. promotional info
  • Attention is always scarce
  • Some special inserts will cost
  • Someone always needs to be found & trusted
  • Users benefit
  • Fees subsidize verification procedures
  • Prices self-select for appropriateness
  • Has anyone succeeded with free lookups, paid
    inserts? (Yes; examples should be obvious)

25
The End
  • Gordon Mohr
  • Founder & Chief Technology Officer
  • Bitzi Corporation
  • Email: gojomo_at_bitzi.com
  • Bitizen Page: http://bitzi.com/bitizen/gojomo
  • O'Reilly Weblog: http://www.oreillynet.com/weblogs/gojomo
