Text Classification of USENET messages for a Conversation Visualisation System: Final Year Project Final Presentation

1
Text Classification of USENET messages for a
Conversation Visualisation System Final Year
Project Final Presentation
  • Jolyon Hunter
  • cs91jh_at_surrey.ac.uk
  • www.jrth.co.uk
  • Tuesday 6th May 2003

2
Introduction
  • Aim
  • To investigate how messages and conversations on
    USENET newsgroups can be classified automatically
    as part of a system to visually represent online
    discussions.
  • Objectives
  • To review systems which visualise online
    discussions, enabling the identification of
    phenomena to be visualised
  • To analyse a 250,000-word corpus of text and try
    to identify potential cues for classification
  • To specify and design a system for automatic
    classification of messages/conversations
  • To implement, test and evaluate this system

3
Conversation Visualisation Systems?
  • For example: PeopleGarden
  • Xiong, Rebecca & Donath, Judith (1999)
    PeopleGarden: Creating Data Portraits for Users.
    MIT Media Laboratory. http://smg.media.mit.edu/becca/
  • Others include Loom (Donath et al), Netscan
    (Smith), Conversation Map (Sack) and
    CodeZebra (Diamond et al)

4
Phenomena to Visualise and how to do it!
  • Emotion (Happy, Sad)
  • Agreement/Disagreement (Argument)
  • Involvement & Sense of Community
  • Character traits of users... and many more
  • How to Classify?
  • Automated Text Analysis
  • Smokey (Spertus)
  • WebSOM (Kohonen)
  • CLUTO (Karypis)

5
Analysis Overview
  • HOW? Initial observations: phenomena,
    features; in-depth corpus analysis
  • WHAT? 6000 messages from various newsgroups (4
    million words)
  • UniS/CodeZebra Workshop: features (words)
  • Using System Quirk to extract words & frequency
    counting (Kontext) >> Relative Frequencies
  • Using gCLUTO to visualise data for interpretation
  • WHY? To formulate programmable rules to code into a
    system
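
A relative frequency here means a word's count divided by the total number of words in the message. The following is a minimal Python sketch of that calculation, not the System Quirk/Kontext tooling used in the project; the tokenisation rule and the function name `relative_frequencies` are assumptions made for illustration.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return {word: count / total_words} for a single message."""
    # Assumed tokenisation: lowercase, runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


# Hypothetical message, not taken from the project corpus.
message = "I agree with you. I really do agree."
freqs = relative_frequencies(message)
print(round(freqs["agree"], 3))  # 2 of 8 words -> 0.25
```

Dividing by message length lets cue words be compared across short and long messages, which is presumably why the rules later in the deck test a relative frequency rather than a raw count.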

6
gCLUTO Visualisations
  • Visualise clusters and the relationships between
    clusters
  • Possible to see patterns or heuristics to help
    derive rules
  • CLUTO has potential for future use within a
    system to automatically classify text - e.g.
    real-time clustering
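
Clustering tools group messages whose word-frequency vectors look alike. As a rough illustration only, and not CLUTO's own code or API, the Python sketch below computes cosine similarity, a common similarity measure for document clustering, between sparse relative-frequency vectors. The three vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical relative-frequency vectors for three messages.
msg_a = {"agree": 0.02, "yes": 0.01}
msg_b = {"agree": 0.04, "yes": 0.02}
msg_c = {"wrong": 0.03, "nonsense": 0.01}

sim_ab = cosine_similarity(msg_a, msg_b)  # same word proportions
sim_ac = cosine_similarity(msg_a, msg_c)  # no shared words
print(sim_ab, sim_ac)
```

Messages with the same mix of cue words score close to 1 and messages with no words in common score 0, so similar messages would end up in the same cluster.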

7
Analysis: Creating Rules
  • Possible to derive example rules from analysis
  • More analysis: random sample using 6 classes
  • Similar patterns emerge
  • Example rules also >>> SYSTEM!

8
System Development
  • Process Model of Software Engineering:
    Requirements, Design, Implementation, Testing
    and Evaluation
  • System: System Quirk > Rules > Program >
    CLASSIFICATION
  • Rule-Based Processor: IF..THEN.. rules coded
    into Perl program to produce classifications

9
Generic Conversation Visualisation System

10
Message Text Analysis Module

11
Perl Code: Key points
  • IF..THEN RULES (as seen earlier) > CLASS COUNTER
  • if (($word eq "agree") && ($relativeword > 0.003))
  •   { $AGREEMENT++; }
  • CLASSIFICATIONS
  • if ($AGREEMENT >= 2)
  •   { $classification = 'AGREEMENT'; }
  • if ($ARGUMENT >= 2)
  •   { $classification = 'ARGUMENT'; }
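
The two stages on this slide, cue-word rules that increment a class counter followed by a threshold on that counter, can be sketched end to end. This is a Python illustration under stated assumptions, not the project's Perl program: only the cue word "agree" and the 0.003 threshold come from the slide, while the other cue words, the `INCONCLUSIVE` label and the tie handling are hypothetical.

```python
import re
from collections import Counter

# Hypothetical cue words per class; the slide only shows "agree".
CUE_WORDS = {
    "AGREEMENT": {"agree", "yes", "exactly"},
    "ARGUMENT": {"disagree", "wrong", "nonsense"},
}
FREQ_THRESHOLD = 0.003  # relative-frequency threshold from the slide
COUNT_THRESHOLD = 2     # class-counter threshold (assumed inclusive)


def classify(text):
    """Apply IF..THEN cue-word rules, then threshold the class counters."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return "INCONCLUSIVE"
    counts = Counter(words)
    total = len(words)

    # Stage 1: each rule that fires adds one to its class counter.
    class_counters = Counter()
    for label, cues in CUE_WORDS.items():
        for word in cues:
            if counts[word] / total > FREQ_THRESHOLD:
                class_counters[label] += 1

    # Stage 2: a class wins only if it alone reaches the threshold
    # (assumption: ties are reported as inconclusive).
    winners = [label for label in CUE_WORDS
               if class_counters[label] >= COUNT_THRESHOLD]
    return winners[0] if len(winners) == 1 else "INCONCLUSIVE"


print(classify("Yes, exactly. I agree with that."))  # AGREEMENT
print(classify("You are wrong, that is nonsense."))  # ARGUMENT
print(classify("Hello there"))                       # INCONCLUSIVE
```

Keeping the cue words in a table rather than in separate `if` statements makes it easier to add further classes, such as the six used in the random-sample analysis.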

12
Testing & Evaluation
  • Ten sample messages: either Agreement or
    Disagreement
  • Small sample
  • Key excerpts given to human testers (ten people)
    asked to rate
  • System vs. Humans!
  • System correct 3 times, most inconclusive
  • Human responses correlate with system, but
    ambiguities also exist
  • Conclusions? Results not conclusive but show
    promise > larger sample & more research
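
One way a system-versus-humans comparison like this could be scored is to take the majority human rating for each message and check it against the system's label. The Python sketch below is an assumption about how such a tally might work, not the procedure used in the project, and the ratings in it are made up rather than project data.

```python
from collections import Counter


def majority_label(ratings):
    """Return the most common human rating, or None if there is a tie."""
    ranked = Counter(ratings).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def compare(system_label, human_ratings):
    """Score one message as 'match', 'mismatch' or 'inconclusive'."""
    human = majority_label(human_ratings)
    if system_label == "INCONCLUSIVE" or human is None:
        return "inconclusive"
    return "match" if system_label == human else "mismatch"


# Made-up ratings for illustration only; not the project's results.
print(compare("AGREEMENT", ["AGREEMENT"] * 7 + ["DISAGREEMENT"] * 3))     # match
print(compare("AGREEMENT", ["AGREEMENT"] * 5 + ["DISAGREEMENT"] * 5))     # inconclusive
print(compare("INCONCLUSIVE", ["DISAGREEMENT"] * 10))                     # inconclusive
```

Treating human ties as inconclusive keeps ambiguous messages from being counted as system errors.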

13
Recap: Mission Accomplished?
  • Aim
  • To investigate how messages and conversations on
    USENET newsgroups can be classified automatically
    as part of a system to visually represent online
    discussions.
  • Objectives
  • To review systems which visualise online
    discussions, enabling the identification of
    phenomena to be visualised
  • To analyse a 250,000-word corpus of text and try
    to identify potential cues for classification
  • To specify and design a system for automatic
    classification of messages/conversations
  • To implement, test and evaluate this system

14
Text Classification of USENET messages for a
Conversation Visualisation System
  • Thanks for listening
  • Any Questions?

15
Final Report
  • The Final Report for this project is also
    available online at
  • www.jrth.co.uk

16
REFERENCES
  • Loom (Judith Donath): Donath, Judith (2002) A
    Semantic Approach to Visualising Online
    Conversation. Communications of the ACM 45(4),
    45-49. http://web.media.mit.edu/kkarahal/loom/index.html
  • Conversation Map (Warren Sack): Sack, Warren
    (2000) Design for Very Large-Scale Conversations.
    Ph.D. Thesis, February 2000, MIT Media Laboratory.
    http://www.sims.berkeley.edu/sack/cm/
  • Netscan (Marc Smith): Smith, Marc (2001) Netscan:
    A tool for measuring and mapping social
    cyberspaces. http://netscan.research.microsoft.com
  • PeopleGarden (Rebecca Xiong & Judith Donath):
    Xiong, Rebecca & Donath, Judith (1999)
    PeopleGarden: Creating Data Portraits for Users.
    MIT Media Laboratory. http://smg.media.mit.edu/becca/
  • CodeZebra (Sara Diamond): Diamond, Sara (Project
    Leader), Banff New Media Institute, Canada, plus
    many others (inc. Dr. A. Salway, University of
    Surrey). http://www.codezebra.net
  • Smokey (Ellen Spertus): Spertus, Ellen (1997)
    "Smokey: Automatic Recognition of Hostile
    Messages", Innovative Applications of Artificial
    Intelligence 97. http://www.spertus.com/ellen/
  • WebSOM (Teuvo Kohonen): Kohonen, T. (1996 onwards)
    More details at http://websom.hut.fi/websom/
  • CLUTO (George Karypis): Karypis, George (2002)
    CLUTO, gCLUTO and wCLUTO. University of
    Minnesota, MN, USA. Software available from
    http://www-users.cs.umn.edu/karypis/cluto/