Title: Enhancing social tagging with a knowledge organization system
1 Enhancing social tagging with a knowledge organization system
2 Outline
- Who are STFC?
- Controlled Vocabulary
- Social Tagging
- EnTag
  - Aims
  - Glamorgan/UKOLN/Intute Experiment
  - STFC Experiment
- SKOS
3 Science and Technology Facilities Council
- Provides large-scale scientific facilities for UK science, particularly in physics and astronomy
- E-Science Centre at RAL and DL
  - Provides advanced IT development and services to the STFC Science Programme
  - Also includes library and institutional repository
- Strong interest in Digital Curation of our science data
  - Keep the results alive and available
  - R&D Programme
    - DCC, CASPAR
    - EnTag
4 Controlled Vocabulary
- Traditional way of providing subject classification
  - For shelf-marking
  - For searching
  - For association of resources
- Several different types used, such as
  - Subject Classification
  - Keyword lists
  - Thesaurus
- Each has different characteristics
5 HASSET (I)
- UK Data Archive, University of Essex
- Humanities and Social Science Electronic Thesaurus
- Some 1000s of terms
- Structure based on British Standard 5723:1987 / ISO 2788-1986 (Establishment and development of monolingual thesauri)
  - preferred terms, broader-narrower relations, associated terms
- http://www.data-archive.ac.uk/search/hassetSearch.asp
6 HASSET (II)
7 HASSET (III)
8 Observations on using controlled vocabularies
- Precise classification of resources
  - Good for precision and recall
- Can exploit the hierarchy to modify query
  - Using the broader/narrower/related terms
- Highly expensive
  - Requires investment in specialist expertise to devise the vocabulary
  - Requires investment in specialist expertise to classify resources
- Hard to maintain currency
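The point about exploiting the hierarchy to modify a query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the miniature thesaurus below is invented for the example (it is not taken from HASSET), and `expand_query` is a hypothetical helper name, not part of any system described in these slides.

```python
# Hypothetical miniature thesaurus: terms and relations are illustrative only,
# they are NOT taken from HASSET or any other real vocabulary.
THESAURUS = {
    "education": {
        "broader": [],
        "narrower": ["higher education", "schools"],
        "related": ["training"],
    },
    "higher education": {
        "broader": ["education"],
        "narrower": ["universities"],
        "related": [],
    },
    "universities": {"broader": ["higher education"], "narrower": [], "related": []},
    "schools": {"broader": ["education"], "narrower": [], "related": []},
    "training": {"broader": [], "narrower": [], "related": ["education"]},
}


def expand_query(term, relations=("narrower",), thesaurus=THESAURUS):
    """Return the query term plus the terms one step away along the chosen relations."""
    entry = thesaurus.get(term)
    if entry is None:
        # Unknown terms are passed through unchanged.
        return [term]
    expanded = [term]
    for relation in relations:
        for neighbour in entry[relation]:
            if neighbour not in expanded:
                expanded.append(neighbour)
    return expanded


# Widen a search to more specific terms (tends to raise recall).
print(expand_query("education"))
# Step both up and down the hierarchy from a mid-level term.
print(expand_query("higher education", relations=("broader", "narrower")))
```

Expanding along narrower terms broadens the result set, while restricting to the original term keeps precision high; which relations to follow is a design choice left to the search interface.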
9 Social Tagging
- The Web 2.0 way of providing search terms
- People tag resources with free-text terms of their own choosing
- Tags used to associate resources together
  - del.icio.us, flickr
- Folksonomy
  - the terms a community chooses to use to tag its resources
10 Connotea
11 Connotea: sharing tags
12 Connotea: Tag Cloud
13 Observations on Social Tagging
- People often use the same tags or keywords (e.g. Preservation, Digital Library)
  - this makes things which mean the same thing to people easier to find
- Cheap way of getting a very large number of resources marked up and classified
- Represents the community consensus in some sense
  - The Wisdom Of Crowds
- Has currency as people update
  - Tag clouds of popular tags
- However, people often use similar but not the same tags
  - e.g. Semantic Web, SemanticWeb, SemWeb, SWeb
- People make mistakes in tags
  - misspellings, using spaces incorrectly
- Some tags are more specific than others
  - e.g. controlled vocabulary, thesaurus, HASSET
- People often associate the same words together with particular ideas in images
  - these are captured in clusters
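The problem of similar-but-not-identical tags can be partly illustrated with a simple normalisation step. The sketch below is an assumption for illustration, not something the slides describe: `tag_key` and `group_variants` are hypothetical names, and the rule (lowercase, then drop everything except ASCII letters and digits) only catches spacing, case and punctuation variants.

```python
import re
from collections import defaultdict


def tag_key(tag):
    """Lowercase a tag and drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def group_variants(tags):
    """Group raw tags that collapse to the same normalised key."""
    groups = defaultdict(list)
    for tag in tags:
        groups[tag_key(tag)].append(tag)
    return dict(groups)


raw_tags = ["Semantic Web", "SemanticWeb", "semantic-web", "SemWeb"]
for key, variants in group_variants(raw_tags).items():
    print(key, variants)
```

Note that "SemWeb" still ends up in its own group: abbreviations, misspellings and differences in specificity cannot be resolved by string normalisation alone, which is where a mapping to a controlled vocabulary could help.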
14 EnTag Project
- Enhanced tagging for discovery
- JISC funded project
- Partners
  - UKOLN
  - University of Glamorgan
  - STFC
  - Intute
- Non-funded
  - OCLC Office of Research, USA
  - Danish Royal School of Library and Information Science
- Period: 1 Sep 2007 -- 30 Sep 2008
- http://www.ukoln.ac.uk/projects/enhanced-tagging/
15 EnTag Background
- Controlled vocabularies
  - Improve information retrieval and discovery
  - But costly to index with, especially given the amount of digital documents
  - Require subject and classification experts
- Social tagging
  - Holds the promise of reducing indexing costs
  - Uses terms describing how people see the resource
  - Serendipity
  - But tags are uncontrolled
    - Missed associations
    - Relating different views
    - Highly personal (me, important)
    - Quality and ranking
    - Depth of term
16 EnTag Purpose
- Investigate the combination of controlled and social tagging approaches to support resource discovery in repositories and digital collections
- Aim to investigate
  - whether use of an established controlled vocabulary can help move social tagging beyond personal bookmarking to aid resource discovery
17 EnTag Objectives
- Investigate indexing aspects when using only social tagging versus when using social tagging in combination with a controlled vocabulary
- In particular, does this lead to
  - Improved tagging
    - Relevance of tags (perspective, aspects, specificity, exhaustivity, terminology (linguistic level, semantic level, contextual level))
    - Consistency
    - Efficiency (time used, user satisfaction)
    - Use (tags selected, clouds consulted, order of consultation)
  - Improved retrieval
    - Effectiveness (degree of match between user and system terminology)
- In two different contexts
  - Tagging by readers
  - Tagging by authors
18 Testing Approach
- Main focus
  - free tagging with no instructions
  - versus
  - tagging using a combined system and guidance for users
- Two demonstrators
  - Intute digital collection http://www.intute.ac.uk
    - Major development
    - Tagging by reader
    - DDC
  - STFC repository http://epubs.cclrc.ac.uk/
    - Complementary development
    - Tagging by author
    - A more qualitative approach
19 Intute
20 Intute demonstrator: searching
21 Intute demonstrator: basic tagging
22 Intute demonstrator: enhanced tagging
23 EnTag Intute user study (II)
- Test setting
  - 50 graduate students in political science
  - 60 documents, covering up to four topics of relevance for the students
- Data collection
  - Logging time spent, selection patterns
  - Pre- and post-questionnaires
24 EnTag Intute user study (I)
- Test comparison of basic and advanced system
- Indexing
  - Perspective, specificity, exhaustivity
  - Linguistics (word class, single word/compound, spelling, language)
  - Consistency
  - Efficiency (time used, user satisfaction)
  - Use (tags selected, clouds consulted, order of consultation)
- Retrieval efficiency
  - Degree of match between user and system terminology
    - user tags, DDC tags, controlled Intute keywords, title terms, text terms
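The slides do not say how the "degree of match between user and system terminology" is computed. One simple way to make the idea concrete, offered here purely as an assumption, is Jaccard overlap between the set of user tags and a set of system terms (for example DDC captions or controlled keywords). The function names and the sample terms below are hypothetical.

```python
def normalise(term):
    """Lowercase a term and collapse runs of whitespace to single spaces."""
    return " ".join(term.lower().split())


def jaccard(user_terms, system_terms):
    """Jaccard overlap of two term lists: |A intersect B| / |A union B|."""
    a = {normalise(t) for t in user_terms}
    b = {normalise(t) for t in system_terms}
    if not a and not b:
        # Two empty sets share nothing to compare; report no overlap.
        return 0.0
    return len(a & b) / len(a | b)


# Illustrative data only, not taken from the EnTag study.
user_tags = ["Elections", "voting  behaviour", "UK"]
controlled_terms = ["elections", "Voting behaviour", "political parties"]
print(jaccard(user_tags, controlled_terms))
```

The same function can be applied to each term source in turn (user tags against DDC tags, against title terms, and so on) to compare how closely each matches the taggers' own vocabulary. Exact string matching after normalisation is a deliberately crude choice; stemming or synonym lookup would change the scores.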
25 STFC Case Study: EPubs
26 STFC demonstrator
27 STFC Author study
- A study of authors of papers
  - Smaller number, c. 10-12
  - Regular depositors (> 10 papers each)
  - Subject experts
- Expect that they would want their papers accurately tagged so that they are precisely found
- A more qualitative study
28 Expected Feedback
- Relative value of tagging vs. controlled terms
  - Does it give more satisfactory (accurate, consistent) tags?
  - Does it lead to the consideration of tags they would not have thought of?
  - Do they select deeply in the hierarchy?
- Is this something they would like to see supported more, and would use?
  - Is it worth the overhead?
- How should we use a combination of tagging and controlled vocab in our system?
- To be continued...
29 Building a Web of Knowledge
- Social tagging and controlled vocabulary complement each other
  - Tagging: entry level, quick, does the job, but error prone, fuzzy
  - Controlled vocabulary: accurate, but slow and expensive
- Use one to leverage the other
- Use both to build a Web of knowledge
  - The things in the world and their links via their subjects
- Get the users to build the means of organising the knowledge
30 http://purl.org/net/aliman
31 SKOS: Simple conceptual relationships
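As a rough illustration of the kind of relationships SKOS expresses, the sketch below writes one concept out in Turtle syntax using the standard `skos:prefLabel`, `skos:altLabel` and `skos:broader` properties. Everything else is an assumption made for the example: the `example.org` URIs, the labels, the helper name `concept_to_turtle`, and the idea of recording a free-text tag as an alternative label (one possible modelling choice, not one stated in these slides). Labels are not escaped, so this is a sketch rather than a general serialiser.

```python
# The SKOS namespace is the real one; the concept URIs used below are hypothetical.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def concept_to_turtle(uri, pref_label, alt_labels=(), broader=()):
    """Serialise one SKOS concept as a Turtle statement (labels are not escaped)."""
    lines = [
        f"<{uri}> a skos:Concept ;",
        f'    skos:prefLabel "{pref_label}"@en',
    ]
    for label in alt_labels:
        lines[-1] += " ;"
        lines.append(f'    skos:altLabel "{label}"@en')
    for parent in broader:
        lines[-1] += " ;"
        lines.append(f"    skos:broader <{parent}>")
    lines[-1] += " ."
    return "\n".join(lines)


print(SKOS_PREFIX)
print(
    concept_to_turtle(
        "http://example.org/vocab/thesaurus",
        "thesaurus",
        alt_labels=["thesauri"],  # e.g. a user tag treated as an alternative label
        broader=["http://example.org/vocab/controlled-vocabulary"],
    )
)
```

In practice an RDF library would normally be used to build and serialise such data; plain string formatting is used here only to keep the example dependency-free.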
32 Conclusions
- Controlled vocabulary and tags complement each other
- Hope to get some interesting evidence over the next month as the studies are completed
- Web 2.0 world offers the possibility of combining these results
- SKOS: a format to use both tags and controlled vocabulary as part of the Web of Linked Data
- Also use Web 2.0 to build the vocabularies themselves
33 Questions?
- b.m.matthews_at_rl.ac.uk
34 EnTag: Enhanced tagging for discovery
- Research collaboration between Glamorgan University, UKOLN, INTUTE, CCLRC, OCLC, and DB
- Financed by JISC Capital Programme
- Research goal
  - Investigation of the combination and comparison of controlled and folksonomy approaches to semantic interoperability supporting resource discovery in repositories and digital collections
- Evaluation in two communities of use: at Intute (Social science), focussing on tagging by readers (postgraduate users), and at CCLRC, focussing on tagging by authors
- The two studies are carried out as separate projects
- Intute project uses DDC as controlled vocabulary
- Evaluation by quantitative and qualitative measures
35 Evaluation: Intute focus and objective
- Context: tagging as part of information searching and relevance assessment, tagging for recommendation and sharing
- Hybrid system: investigate whether tagging can be improved by a combination of traditional tag clouds and clouds of controlled descriptors, including interactive tools such as tag suggestions, access to browsing of DDC, etc.
- Improve tagging
  - Relevance of tags (perspective, aspects, specificity, exhaustivity, terminology (linguistic level, semantic level, contextual level))
  - Consistency
  - Efficiency (time used, user satisfaction)
  - Use (tags selected, clouds consulted, order of consultation)
- Improve retrieval
  - Effectiveness (degree of match between user and system terminology)