Kirrkirr: A Java-based visualisation tool for XML dictionaries of Australian Languages

Provided by: kevin73
Learn more at: https://nlp.stanford.edu

1
Kirrkirr: A Java-based visualisation tool for XML
dictionaries of Australian Languages
  • Kevin Jansz
  • Department of Computer Science, University of
    Sydney, Australia
  • Dr. Christopher Manning
  • Computer Science and Linguistics, Stanford
    University, USA
  • Dr. Nitin Indurkhya
  • School of Applied Science, Nanyang Technological
    University, Singapore

2
Project Objectives
  • Aims of the project
  • examining the richness of lexical structure, in
    particular the connotational and figurative use
    of words
  • providing innovative ways for representing a
    dictionary, through creative use of the medium of
    computers
  • augmenting dictionaries from corpora
  • providing practical, educationally useful
    programs as a result (at low labor cost)
  • Main initial target: an interactive front end for
    exploring or using the Warlpiri dictionary.

3
Talk Outline
  • The research agendas
  • Kirrkirr: A Warlpiri dictionary browser
  • The Lexical Database
  • exploiting the strengths of XML
  • indexing XML data
  • User interface and visualization
  • User studies

4
Research Program: Lexicon
  • A lexicon is not just words but a vast network of
    associations between words and within and across
    the concepts represented by words
  • The aim of this work is to provide people with a
    better understanding of this conceptual map.
  • Traditional paper dictionaries offer very limited
    ways for making such networks visible
  • On a computer, one can imagine all sorts of ways
    of bringing out such relationships

5
MRD Structure
  • The internal structures of current Machine
    Readable Dictionaries usually merely mimic the
    structure of the printed form (Boguraev 1990)
  • Some work, notably WordNet (Miller 1995) has
    involved a fundamental rethinking of dictionary
    content and organization (here, organization via
    synsets which are related via links of part,
    subkind, opposite)
  • There has been little in the way of software to
    make them truly usable by different communities
    of users.

6
Initial focus: Kirrkirr, a Warlpiri browser
  • Warlpiri is an Australian Aboriginal language
    spoken in the Tanami desert (NW of Alice Springs)
  • A computer interface for browsing the Warlpiri
    dictionary.
  • Rich lexical materials have been collected by
    linguists over decades (Hale's fieldwork from
    1959 on, MIT Lexicon Project in the 1980s)
  • Before Kirrkirr, the results had still not been
    produced in a format usable by the community
    (only printouts)

7
Our educational goals
  • There are a number of reasons for focusing on
    Warlpiri for this electronic bilingual dictionary:
  • There has been a large amount of research on
    Warlpiri creating one of the most comprehensive
    lexical databases for any Australian Language
  • There is a relatively large community of people
    interested in learning their traditional language
  • The low level of literacy in the region makes an
    e-dictionary potentially more useful than a paper
    edition as it is less dependent on good knowledge
    of spelling and alphabetical order. Making it fun
    and easy to use is a considerable help as well.
  • Features such as being able to point and click,
    and hear the words take the emphasis away from
    knowing the written form of the word before the
    system is used

8
Target user community
9
(No Transcript)
10
Kirrkirr: A Warlpiri dictionary browser
  • (Jansz 1998; Jansz, Manning and Indurkhya 1999)
  • An environment for the interactive exploration of
    dictionaries.
  • Although our current work has just been with
    Warlpiri, the design is general (Arrernte coming
    soon!)
  • Attempts to more fully utilize graphical
    interfaces, hypertext, multimedia, and different
    ways of indexing and accessing information
  • Written in Java, it can either be run over the
    web (high bandwidth) or run locally (here Java's
    main advantage is cross-platform support).

11
Specific goals
  • An interactive environment that encourages
    exploration: easy and fun to use
  • Reduction of the dependence on alphabetical order
  • Catering to the needs of different user groups
    (kids, teachers, professionals)
  • Flexible enough to display appropriate
    information in appropriate ways depending on user
    level

12
Overview
  • Kirrkirr provides various modules:
  • Graph layout of word relationships
  • Formatted dictionary entries
  • Semantic domain browsing
  • A notes facility for jotting in the margin
  • Multimedia: audio, pictures
  • Advanced searching interfaces
  • Others in planning: formatting (XSL) editing,
    figuration patterns
  • These attempt to cater to users with different
    competence levels

13
(Kirrkirr screen shot)
14
(No Transcript)
15
The lexical database
  • Existing materials are stored in an ad hoc format
    of markup using backslash codes with some (rather
    odd) nesting of structural tags
  • These were converted to XML using an
    error-correcting stack-based parser (written in
    Perl).
  • The inconsistency and flexibility of dictionary
    entries actually made this a surprisingly
    difficult task.
  • But the parser tries to impose data integrity
  • Use of XML gives a clear structure to the data,
    and makes available many (free) tools
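
The conversion step above can be sketched roughly as follows. This is a minimal illustration in Java (the original parser was written in Perl), and the input convention is an assumption made for the example, not the actual Warlpiri markup: a line `\code text` is a leaf field, a bare `\entry` or `\sense` opens a structure, and `\eentry` or `\esense` closes one. The class name, the code names, and the set of structural codes are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Set;

class BackslashToXml {
    // Hypothetical structural codes; every other code is treated as a leaf field.
    static final Set<String> STRUCTURAL = Set.of("ENTRY", "SENSE");

    /** Converts backslash-coded lines to XML, repairing unbalanced structures. */
    static String convert(String input) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();           // stack of open structural tags
        for (String raw : input.split("\n")) {
            String line = raw.trim();
            if (!line.startsWith("\\")) continue;          // ignore lines without a code
            int space = line.indexOf(' ');
            String code = (space < 0 ? line.substring(1) : line.substring(1, space))
                    .toUpperCase(Locale.ROOT);
            String text = space < 0 ? "" : line.substring(space + 1).trim();
            if (code.isEmpty()) continue;
            if (STRUCTURAL.contains(code)) {
                open.push(code);                           // opening code
                out.append('<').append(code).append('>');
            } else if (code.startsWith("E") && STRUCTURAL.contains(code.substring(1))) {
                String target = code.substring(1);
                if (!open.contains(target)) continue;      // stray close: drop it
                String top;
                do {                                       // auto-close anything left open inside
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(target));
            } else if (!text.isEmpty()) {
                out.append('<').append(code).append('>').append(escape(text))
                   .append("</").append(code).append('>'); // leaf field
            }
        }
        while (!open.isEmpty()) {                          // close whatever is still open at the end
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The stack is what lets the converter repair input: a missing close is supplied when an enclosing structure closes, and a close with no matching open is dropped, so the output is always well-formed.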

16
XML
  • XML separates the structure of the data from its
    presentation
  • Much of the recent enthusiasm for XML has
    centered around representing simple and rigid
    structures such as database records
  • The rich hierarchical and variable structure of
    dictionary entries is really more what something
    like XML excels at!
  • The result remains a portable, tangible text file

17
Alternative: a database
  • The obvious thing for storing a lot of data
  • Has clear advantages: structure, indexing, query
    language, relationships, integrity.
  • Many people have suggested using a database for
    lexical data and some have actually done it
    (IITLEX, Austin and Nathan)
  • But in general lexicographers oppose the
    rigidity, and, in practice, standard relational
    databases are quite ill-suited to dictionaries

18
Problems with Relational Database
  • Dictionary entries vary enormously
  • Data is fragmented
  • Dictionaries are only loosely structured
  • Same element can appear at many levels (dialect,
    cross-reference, ...)
  • Database model is inflexible to extending the
    dictionary structure
  • Lessens portability

19
XML indexing - challenges
  • Despite the various XML parsers available, it is
    surprising that there has been little
    consideration of making single entries retrievable
    from the file
  • Present XML parsers tend to put the entire XML
    document (or some form of it) in memory before
    the data extraction process begins
  • This is not practical when parsing significant
    XML databases (e.g. the Warlpiri dictionary is
    approx. 10 MB).

20
XML Indexing - solutions
  • The hierarchical structure of XML lends itself to
    indexing, as each separate entry in the XML file
    can be considered as a separate entity
  • To make the Warlpiri dictionary usable for
    Kirrkirr an ad hoc indexing system was developed
  • Uses a slightly modified Ælfred parser
  • Entries are indexed by headword
  • The system returns an XML document object
    containing the single dictionary entry,
    facilitating:
  • processing for related words (Graph layout)
  • XSL processing to HTML
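
One way such a headword index could work is sketched below: scan the file once, record the byte offset and length of each entry, then read and parse only the requested slice. This is an assumption-laden illustration, not Kirrkirr's implementation. It uses the JDK's built-in DOM parser rather than the modified Ælfred parser, and the element names `ENTRY` and `HW`, the class `HeadwordIndex`, and the helper `tempFile` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class HeadwordIndex {
    private final Path file;
    // headword -> {byte offset, byte length} of its <ENTRY> element
    private final Map<String, long[]> spans = new HashMap<>();

    HeadwordIndex(Path file) {
        this.file = file;
        try {
            // One-off scan. ISO-8859-1 maps one byte to one char, so string
            // indices are byte offsets. A real indexer would stream the file.
            String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            int from = 0;
            while (true) {
                int start = text.indexOf("<ENTRY>", from);
                int end = start < 0 ? -1 : text.indexOf("</ENTRY>", start);
                if (end < 0) break;
                end += "</ENTRY>".length();
                int hwStart = text.indexOf("<HW>", start);
                int hwEnd = text.indexOf("</HW>", start);
                if (hwStart >= 0 && hwStart < hwEnd && hwEnd < end) {
                    byte[] hw = text.substring(hwStart + 4, hwEnd)
                            .getBytes(StandardCharsets.ISO_8859_1);
                    spans.put(new String(hw, StandardCharsets.UTF_8),
                              new long[] {start, end - start});
                }
                from = end;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses only the requested entry; returns null if the headword is unknown. */
    Document lookup(String headword) {
        long[] span = spans.get(headword);
        if (span == null) return null;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) span[1]];
            raf.seek(span[0]);                 // jump straight to the entry
            raf.readFully(buf);
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(buf));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Demo helper: writes XML to a temporary file. */
    static Path tempFile(String xml) {
        try {
            Path p = Files.createTempFile("dict", ".xml");
            Files.write(p, xml.getBytes(StandardCharsets.UTF_8));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned `Document` holds a single entry, so it can be handed directly to graph layout or to an XSL transformation without the rest of the dictionary being parsed.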

21
XML Indexing - solutions (2)
  • The use of the XML indexing process considerably
    improves efficiency as only requested entries are
    parsed, hence conserving time and bandwidth
  • Once whole entries are parsed, they are kept
    temporarily in a cache
  • Thus the system uses XML as a middle ground,
    combining the structure and indexing of a
    relational database with the freedom and
    functionality of XML.
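
A temporary cache of parsed entries might look like the sketch below. The slides do not say how entries are evicted, so the least-recently-used policy here is an assumption, as are the class name `EntryCache` and the `loader` function that stands in for the index lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class EntryCache<V> {
    private final Function<String, V> loader;  // parses an entry on a cache miss
    private final Map<String, V> cache;
    int loads = 0;                             // number of times the loader actually ran

    EntryCache(int capacity, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first,
        // and removeEldestEntry drops it once capacity is exceeded.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > capacity;
            }
        };
    }

    V get(String headword) {
        V entry = cache.get(headword);
        if (entry == null) {                   // miss: parse once, then remember
            entry = loader.apply(headword);
            loads++;
            cache.put(headword, entry);
        }
        return entry;
    }
}
```

Repeated visits to the same word, which are common when a user clicks back and forth between related entries, then cost a map lookup instead of a parse.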

22
Kirrkirr's XML Index Process
(Diagram: Index in Memory, Kirrkirr, XML document object)
23
XQL - Potential
24
(No Transcript)
25
Visualization of dictionary information
  • For applications with simple textual content
    behind them, there is little that can be done but
    an on-line reflection of a printed page
  • But we want more than just definitions of words:
    we want to know their relationships to other
    words, and the patterning in these relationships
  • A computational approach can mediate between
    the lexical data and the user
  • The interface can select from and choose how to
    present information (according to the user's
    preferences) in many different ways

26
Previous work
  • Current systems present the search-dominated
    interface of classic Information Retrieval
    systems: you type a word in a search box
  • Results try to mimic, but are generally inferior
    to, the printed version of the dictionary
  • Good feature: rapid searching
  • These systems do little to utilize the
    captivating qualities of computers:
    interactivity, user control and adaptability
    (Brown 1985).

27
Previous work (2)
  • Only effective when user has a clearly specified
    information need; even here, we are ignoring the
    distinction between information gained and
    knowledge sought (Sharpe 1995)
  • Lack browsing, and chances for incidental or
    curiosity-driven learning
  • Lack tangibility and situatedness of paper:
    ineffective for getting an idea of a collection
  • We wish to exploit the essence of hypertext,
    which is click-to-explore browsing

28
Previous work (3)
  • Little research work (in corpus linguistics,
    visualization etc.) on dictionary visualization
  • WordNet built a rich network of relationships,
    which fundamentally departed from the paper
    dictionary tradition, and has been used in many
    computational projects
  • However, very little has been done in the way of
    interfaces that make these relationships visible
    and intelligible to users.
  • Graphical representations seem particularly
    important given our target users.

29
Graph-based visualization
  • There is a little previous work on graphical
    representations of dictionaries
  • For instance, the Visual Thesaurus by Plumb Design,
    derived from WordNet
  • But it is also a good demonstration of how
    chaotic and confusing graphical interfaces can
    become.

30
Graph-based visualization
  • (Jansz 1998; Jansz, Manning and Indurkhya 1999)
  • Classic graph layout problem
  • Adapts work by Eades et al. (1998) and Huang et
    al. (1998) on visualization and navigation of WWW
    document linkages
  • Uses the spring algorithm. Big advantage is that
    it is an iterative updating algorithm, and so
    gives an easy interactivity
  • It wiggles and people can play with it.
  • Clarity and simplicity of graph: software
    maintains a set of focus nodes to prevent
    overcrowding
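
The iterative nature of a spring layout can be illustrated with the generic sketch below. It is not the Kirrkirr code or the exact formulation of Eades et al.: the force laws (linear springs along edges, inverse-square repulsion between all node pairs) and the constants `REST`, `K` and `C` are assumptions chosen for the example.

```java
class SpringLayout {
    // Illustrative constants: spring rest length, spring stiffness, repulsion strength.
    static final double REST = 100, K = 0.05, C = 1000;

    final double[] x, y;   // node positions, updated in place
    final int[][] edges;   // each edge is a pair of node indices

    SpringLayout(double[] x, double[] y, int[][] edges) {
        this.x = x;
        this.y = y;
        this.edges = edges;
    }

    /** One update: accumulate forces on every node, then move each node a little. */
    void step() {
        int n = x.length;
        double[] fx = new double[n], fy = new double[n];
        // Every pair of nodes repels, which keeps the graph from collapsing.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = x[j] - x[i], dy = y[j] - y[i];
                double d = Math.max(Math.hypot(dx, dy), 1e-3);
                double f = C / (d * d);
                fx[i] -= f * dx / d; fy[i] -= f * dy / d;
                fx[j] += f * dx / d; fy[j] += f * dy / d;
            }
        }
        // Each edge acts as a spring pulling (or pushing) toward the rest length.
        for (int[] e : edges) {
            int i = e[0], j = e[1];
            double dx = x[j] - x[i], dy = y[j] - y[i];
            double d = Math.max(Math.hypot(dx, dy), 1e-3);
            double f = K * (d - REST);         // positive when stretched
            fx[i] += f * dx / d; fy[i] += f * dy / d;
            fx[j] -= f * dx / d; fy[j] -= f * dy / d;
        }
        for (int i = 0; i < n; i++) {
            x[i] += fx[i];
            y[i] += fy[i];
        }
    }

    double distance(int i, int j) {
        return Math.hypot(x[j] - x[i], y[j] - y[i]);
    }
}
```

Because each call to `step()` only nudges the nodes, the display can be redrawn between steps, and a user dragging a node simply changes the positions that the next step starts from. With these force laws two linked nodes settle slightly beyond the rest length, since the repulsion term never vanishes.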

31
Educational advantages
  • Alphabetical order is important, but
  • A web of words offers other effective
    opportunities for learning
  • A student can opportunistically explore words
    that are related in various ways
  • Important semantic relationships can be
    understood

32
Kirrkirr network display
33
Kirrkirr network display
34
(No Transcript)
35
Formatted dictionary entries
  • Are produced automatically from the XML by using
    XSL (James Clark's XT)
  • XSL allows easy modeling of some user
    preferences.
  • Most trivially, one can leave out information
    such as part of speech, or detailed definitions
  • This is useful as many users find information
    overload quite confusing and demotivating
  • Can produce bilingual or monolingual dictionary
  • Opportunities for various output styles, and
    formats such as RTF or TeX for printing.
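
Leaving out part of speech for some users could be modelled with a stylesheet parameter, as in the sketch below. It uses the XSLT processor bundled with the JDK (`javax.xml.transform`) rather than XT, and the element names `ENTRY`, `HW`, `POS`, `DEF`, the parameter `showPos`, and the class `EntryFormatter` are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EntryFormatter {
    // A tiny stylesheet: headword in bold, optional part of speech in italics, then the definition.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:param name='showPos' select=\"'yes'\"/>"
      + "<xsl:template match='ENTRY'>"
      + "<p><b><xsl:value-of select='HW'/></b>"
      + "<xsl:if test=\"$showPos = 'yes'\">"
      + "<xsl:text> </xsl:text><i><xsl:value-of select='POS'/></i>"
      + "</xsl:if>"
      + "<xsl:text>: </xsl:text><xsl:value-of select='DEF'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Formats one entry as HTML; the flag stands in for a user preference. */
    static String format(String entryXml, boolean showPartOfSpeech) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("showPos", showPartOfSpeech ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(entryXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same entry XML yields a fuller or a simpler rendering depending only on the parameter, which is the sense in which the stylesheet models a user preference without touching the data.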

36
Formatted dictionary entries
37
Rich typology of link types
  • The semantically rich types of linkages present
    in a dictionary (synonym, antonym, hyponym,
    subheadword, variant, coverbs, ...) solve one of
    the major problems of the web: we have many link
    types with a clear semantic interpretation
  • Use consistent color-coded text and edges to show
    these link types
  • Gives a richer browsing experience
  • Can tell where you are going before clicking
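
Consistent color coding is easiest to keep when the link type itself carries its color, so that hypertext and graph edges read from one place. The sketch below assumes this design. The specific colors, the enum name `LinkType`, and the `anchor` method are illustrative choices, not taken from Kirrkirr.

```java
import java.util.Locale;

enum LinkType {
    // Colors are arbitrary placeholders for the example.
    SYNONYM("#1f77b4"),
    ANTONYM("#d62728"),
    HYPONYM("#2ca02c"),
    SUBHEADWORD("#9467bd"),
    VARIANT("#ff7f0e"),
    COVERB("#8c564b");

    final String color;

    LinkType(String color) {
        this.color = color;
    }

    /** Renders a hyperlink whose color and tooltip reveal the kind of link. */
    String anchor(String target) {
        return "<a href=\"#" + target + "\" style=\"color:" + color + "\" title=\""
                + name().toLowerCase(Locale.ROOT) + "\">" + target + "</a>";
    }
}
```

A graph view would read the same `color` field when drawing an edge, so a given color means the same relationship in both views.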

38
Browsing
  • Work (at PARC and elsewhere; Pirolli et al. 1996)
    has stressed the role of browsing as well as
    searching in information access
  • It provides a context for learning
  • We provide browsing in several ways:
  • conventional hypertext
  • but with rich semantically-interpreted links
  • their color-coding matches network edges
  • network-based display of words
  • browsing through semantic domains
  • Such cultural information is hard to learn, and
    not normally in dictionaries or thesauri
  • Question: can terminology sets be derived
    automatically from appropriate corpora?

39
(No Transcript)
40
User study problems
  • Since at present there is no dictionary available
    except the printed out database complete with
    markup codes, it was hard for many people to
    judge the use of the interface, since there was
    no point of comparison.
  • First impressions only: it would have been good
    to let people try it out at their leisure, but
    unfortunately this was not possible (NT Ed is all
    Macintosh; MRJ 2.1 shipping deadlines slipped
    past our study date)

41
User study
  • Mim Corris (Yuendumu, Willowra)
  • User testing with primary and (lower) secondary
    students
  • Comments from teachers, other adults etc.
  • Purely qualitative observational study of
    dictionary use
  • (Doing anything much else would be difficult.)
  • Initial reactions are very enthusiastic
  • Could use as a basis for classroom activities
    (better with some further development: games and
    puzzles)

42
(No Transcript)
43
Conclusions
  • Kirrkirr is just a prototype of what one can do
    to visualize dictionaries
  • We have addressed the challenge of making
    dictionary information usable by creating
    an application which mediates between
    well-structured data and users' needs for
    searching/browsing and presentation
  • While we have focused our research on Warlpiri,
    the system can be easily applied to other
    languages

44
Conclusions (cont.)
  • ... The best future applications of MRDs in
    education will be those most able to respond to
    the insights and needs of their users (Kegl
    1995)
  • Kirrkirr can be seen as a step towards the future
    of e-dictionaries

45
(No Transcript)