Title: Semantics, Syndication and Social Networks: Mechanisms for Future Structured Information Spaces
1. Semantics, Syndication and Social Networks: Mechanisms for Future Structured Information Spaces
Hamish Cunningham (University of Sheffield)
Werner Haas (Joanneum Research)
Ant Miller (BBC)
Libby Miller (University of Bristol)
Ralph Traphoener (Empolis / Bertelsmann)
Paul Warren (British Telecom)
2. What's the difference between Mother Teresa and Tony Bliar?
http://gate.ac.uk/
http://nlp.shef.ac.uk/
Hamish Cunningham, Dept. Computer Science, University of Sheffield
3. Why semantic metadata?
- Different types of metadata allow different types of search (but also incur different costs and have different limits)
  - full text: "find me Nevsky in Bulgaria"
  - taxonomy / thesaurus / semantic annotation / ontology: "find me churches in Eastern Europe"
  - E.g. BBC's INFAX taxonomic system: 66% of searches would fail if only full text were available
- The web promotes diversity but also fragmentation: there's too much of it, and curated data has less and less impact
- In the face of this, cultural memory institutions need:
  - Syndication and mediation (to pool outlets and multiply impact): this means presentation-independent, multipurpose content
  - Users as assistants (to cut the cost of metadata): this can mean shared conceptualisations of content
- How do we get there?
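The contrast between full-text search and taxonomy-backed search can be sketched in a few lines. This is a minimal illustration, not the INFAX system: the two catalogue records, the tiny taxonomy and the function names are all made up for the example.

```python
# Hypothetical mini-catalogue: record id -> (free text, taxonomy terms assigned by a curator).
RECORDS = {
    "r1": ("Alexander Nevsky Cathedral, Sofia, Bulgaria", {"cathedral", "Bulgaria"}),
    "r2": ("Street market in Sheffield", {"market", "United Kingdom"}),
}

# Hypothetical taxonomy: narrower term -> broader term.
BROADER = {
    "cathedral": "church",
    "Bulgaria": "Eastern Europe",
    "United Kingdom": "Western Europe",
}


def full_text_search(words):
    """Return ids of records whose text contains every query word (case-insensitive)."""
    return sorted(
        rid for rid, (text, _) in RECORDS.items()
        if all(w.lower() in text.lower() for w in words)
    )


def expand(term):
    """Return the term plus all of its broader terms."""
    terms = {term}
    while term in BROADER:
        term = BROADER[term]
        terms.add(term)
    return terms


def taxonomy_search(concepts):
    """Return ids of records whose expanded terms cover every query concept."""
    hits = []
    for rid, (_, terms) in RECORDS.items():
        covered = set().union(*(expand(t) for t in terms))
        if set(concepts) <= covered:
            hits.append(rid)
    return sorted(hits)
```

With this toy data, `full_text_search(["Nevsky", "Bulgaria"])` finds the cathedral record, while `full_text_search(["church", "Eastern Europe"])` finds nothing because neither phrase occurs in the text; `taxonomy_search(["church", "Eastern Europe"])` recovers the record through the broader-term links.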
4. The semantic web and why you can't have it (yet)
- The semantic web is about a semantic layer for interoperability, machine-readability, inference: ideal for semantic libraries?
- Problems
  - Construction and maintenance of shared taxonomies, terminologies and ontologies is expensive
  - Annotation of content relative to them is very expensive
  - How does a machine tell the difference between "Mother Teresa is a Saint" and "Tony Blair is a Saint"? (Beyond the shallow and the general we get into typical AI problems, the contextual and shifting nature of meaning, etc.)
5. Four promising directions
- Use recommender systems to make the users into curators' assistants (who tells Google which page is important? other web users do, by linking; also Amazon)
- Allow curators and users to DIY simple, specific ontologies and KBs (targeted adjuncts to general models like CIDOC)
- Use Information Extraction (IE) to populate semantic models
- Ride the next wave of social software and on-line communities (Wikis, Blogs, OSN, file sharing / P2P, RSS/Atom)
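The recommender idea rests on treating user actions as implicit recommendations. A minimal sketch, assuming a hypothetical log of (user, item) links or bookmarks: rank items by how many distinct users point at them, and suggest items that co-occur with one a user already chose. This is plain counting for illustration, not the algorithm Google or Amazon actually use.

```python
from collections import Counter, defaultdict

# Hypothetical usage log: (user, item) pairs, e.g. bookmarks or links to archive items.
LINKS = [
    ("ann", "nevsky-cathedral"), ("ann", "rila-monastery"),
    ("bob", "nevsky-cathedral"), ("bob", "rila-monastery"),
    ("cat", "nevsky-cathedral"), ("cat", "sheffield-market"),
]


def popularity(links):
    """Count how many distinct users link to each item."""
    return Counter(item for _, item in set(links))


def also_liked(links, item):
    """Rank other items by how many users linked to both them and `item`."""
    by_user = defaultdict(set)
    for user, it in links:
        by_user[user].add(it)
    counts = Counter()
    for items in by_user.values():
        if item in items:
            counts.update(items - {item})
    return counts.most_common()
```

Here every link is one user's vote: no curator has to label anything, yet `popularity(LINKS)` and `also_liked(LINKS, "nevsky-cathedral")` already yield a ranking and a "see also" list.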
6. IT context: the Knowledge Economy and Human Language
- Gartner, December 2002
  - taxonomic and hierarchical knowledge mapping and indexing will be prevalent in almost all information-rich applications
  - through 2012 more than 95% of human-to-computer information input will involve textual language
- A contradiction
  - to deal with the information deluge we need formal knowledge in semantics-based systems
  - our archived history is in informal and ambiguous natural language
- The challenge: to reconcile these two phenomena
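7. HLT: Closing the Loop
KEY: MNLG = Multilingual Natural Language Generation; OIE = Ontology-aware Information Extraction; AIE = Adaptive IE; CLIE = Controlled Language IE
[Diagram: a loop between Human Language and Formal Knowledge (ontologies and instance bases). OIE and (A)IE lead from Human Language to Formal Knowledge, CLIE leads from Controlled Language, and (M)NLG leads back to Human Language. Formal Knowledge is labelled with Semantic Web, Semantic Grid and Semantic Web Services.]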
8. Information Extraction
- Information Extraction (IE) pulls facts and structured information from the content of large text collections.
- Contrast IE and Information Retrieval
- NLP history: from NLU to IE
- Progress driven by quantitative measures
  - MUC: Message Understanding Conferences
  - ACE: Automatic Content Extraction
- General Architecture for Text Engineering (GATE): http://gate.ac.uk/
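The quantitative measures used in MUC-style evaluations are precision, recall and their harmonic mean (F-measure). A minimal sketch, assuming exact matching of system answers against a gold standard; the function name and the sample data are illustrative only, and real scorers also handle partial matches.

```python
def precision_recall_f1(gold, predicted):
    """Score a set of predicted items against a gold-standard set (exact match only)."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Illustrative data: entity mentions as (text, type) pairs.
gold = {("Dr. Head", "Person"), ("We Build Rockets Inc.", "Organization"), ("Tuesday", "Date")}
predicted = {("Dr. Head", "Person"), ("Tuesday", "Date"), ("rocket", "Organization")}

p, r, f = precision_recall_f1(gold, predicted)
```

In this toy case two of the three predictions are correct and two of the three gold items are found, so precision and recall are both two thirds.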
9. IE Example
- The shiny red rocket was fired on Tuesday. It is the brainchild of Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc.
- NE (named entities): "rocket", "Tuesday", "Dr. Head", "We Build Rockets"
- CO (coreference): "it" = rocket; "Dr. Head" = "Dr. Big Head"
- TE (template elements): the rocket is "shiny red" and Head's "brainchild".
- TR (template relations): Dr. Head works for We Build Rockets Inc.
- ST (scenario template): rocket launch event with various participants
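One way to picture the five levels of output is as a nested data structure. The sketch below hand-codes the results for the rocket example; the class names, field names, entity types and role names are assumptions made for illustration, not GATE's data model, and nothing here performs the extraction itself.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A template element (TE): an entity, its mentions (NE + CO) and descriptors."""
    name: str
    kind: str
    mentions: list = field(default_factory=list)
    descriptors: list = field(default_factory=list)


@dataclass
class Relation:
    """A template relation (TR) between two entities."""
    subject: Entity
    predicate: str
    obj: Entity


@dataclass
class Event:
    """A scenario template (ST): an event and its participants by role."""
    kind: str
    participants: dict


# NE finds the mentions; CO groups "It" with the rocket and "Dr. Head" with "Dr. Big Head".
rocket = Entity("rocket", "Artifact", mentions=["rocket", "It"],
                descriptors=["shiny red", "brainchild of Dr. Head"])
head = Entity("Dr. Big Head", "Person", mentions=["Dr. Big Head", "Dr. Head"])
company = Entity("We Build Rockets Inc.", "Organization",
                 mentions=["We Build Rockets Inc."])

# TR links two entities; ST ties entities and values into one event.
works_for = Relation(head, "works_for", company)
launch = Event("rocket_launch", {"vehicle": rocket, "date": "Tuesday", "designer": head})
```

Each level builds on the one before: relations and events refer to whole entities rather than to individual text mentions, which is why coreference has to be resolved first.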
10. Ontology-based IE
XYZ was established on 03 November 1978 in
London. It opened a plant in Bulgaria in
[Diagram: Ontology and KB. Classes: Location, Company, City, Country. Relations: HQ, partOf, type, establOn. Value: 03/11/1978.]
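The result of ontology-based IE can be pictured as triples that link text mentions to classes and instances. A minimal sketch using plain Python tuples: the subclass links and the HQ reading of "in London" are assumptions read off the diagram labels, the helper name is made up, and the plant-opening event from the truncated second sentence is left out.

```python
# Ontology (schema) triples. The subclass links are an assumption based on the diagram labels.
ONTOLOGY = {
    ("City", "subClassOf", "Location"),
    ("Country", "subClassOf", "Location"),
}

# KB (instance) triples for the example text. Treating London as the HQ is an assumption.
KB = {
    ("XYZ", "type", "Company"),
    ("XYZ", "establOn", "1978-11-03"),
    ("XYZ", "HQ", "London"),
    ("London", "type", "City"),
    ("Bulgaria", "type", "Country"),
}


def instances_of(cls):
    """Return KB instances whose type is `cls` or any (transitive) subclass of it."""
    classes = {cls}
    changed = True
    while changed:
        changed = False
        for sub, pred, sup in ONTOLOGY:
            if pred == "subClassOf" and sup in classes and sub not in classes:
                classes.add(sub)
                changed = True
    return sorted(s for s, p, o in KB if p == "type" and o in classes)
```

Because the instances are typed against the ontology, a query for `instances_of("Location")` returns both London and Bulgaria even though neither is tagged with "Location" directly.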
11. A Necessary Trade-Off: Domain specificity vs. task complexity
[Figure: acceptable accuracy as a trade-off between two axes. Specificity axis: general to domain-specific. Complexity axis: simple to complex. Task labels: bag-of-words, entities, relations, events.]
12. Open information, defended communities
- Trend 1: seconds out, round 5: file sharing is about to go social
- Trend 2: the living room is about to be computerised
  - What will happen when all your living room devices fold into a single PC?
  - Bill Gates hopes you'll be running Windoze, but Consumer Electronics firms bet on Linux: stable hardware (no viruses, no crashes, cheap, ...)
- What if these two trends combine? Ubiquitous on-line communities centred on shared content, with a model of trust
- What if memory institutions provide means of organising, explaining, interlinking the cross-over between modern popular culture and the curated memory?
- Important because DRM is the beginning of the end of civilisation as we know it (it controls how you consume media you buy, and has the potential to be linked with censorship and with invasive behaviour logging)
  - you can't make digital objects behave like physical objects
  - unless you totally control the hardware and the operating system
  - if someone has control, then we may end up finding that someone has given the contract for preserving our culture to Halliburton
13. Memory is not a luxury
- C21st: all the C20th mistakes but bigger and better?
- If you don't know where you've been, how can you know where you're going?
- Libraries, museums, archives: ammunition in the war on ignorance (more dangerous than terror?)
- Ammunition is useless if you can't find it: new technology must make our history accessible to all, for all our futures
14. Summary
- Cultural memory can benefit from semantic metadata, presentation-independence and repurposing
- Semantic web technology
  - no, it won't make machines intelligent
  - perhaps simple, specific models can work
- Four ways to cross the AI bridge: DIY models, recommenders, IE, OSN / P2P
- This talk: http://gate.ac.uk/talks/ecdl-sept-2004.ppt
- More: http://gate.ac.uk/ (see Related projects)