HTAC 2003 Spring Colloquium Humanities Computing in the 21st Century 2 May 2003 Poster Session: Rese

Marjorie K.M. Chan, The Ohio State University, chan.9_at_osu.edu

1
HTAC 2003 Spring Colloquium
Humanities Computing in the 21st Century
2 May 2003
Poster Session: Research Tools
Chinese Computing Concordancing
Marjorie K.M. Chan
The Ohio State University
2
  • CONCORDANCES

Concordances are, by dictionary definition, "an
alphabetical index of the principal words in a
book or the works of an author with their
immediate contexts." In the West, traditional
examples are concordances of the Bible. In
pre-computer days, these concordances were
compiled manually, one word at a time. There
exist today many concordances covering the Bible
and other important historical writings and
literary works. A search for "concordance" under
Subject heading in the Ohio State University
Libraries online catalogue, for example, yields
1,054 entries. Besides 75 titles under the entry
"Bible Concordances, English," one also finds
"Apocryphal books (Old Testament)
Concordances," "Koran Concordances, Arabic,"
"Aristophanes Concordances," "Austen, Jane,
1775-1817 Concordances," "Baudelaire, Charles,
1821-1867. Fleurs du mal Concordances," etc.
For Chinese works, these include a three-volume
concordance of the poems of Bai Juyi (白居易), an
eleven-volume concordance of Tang poetry, "Laozi
(老子). Dao de jing (道德經) Concordances," "Li ji
(禮記) Concordances," and so forth.
3
  • Exhaustive Concordance of the Bible
  • 1,920 pages, two columns per page, three
    segments per column, including 270 pages for
    two dictionaries: 8,674 words in Hebrew and
    5,624 words in Greek
  • (from pages 168 and 169)

4
  • CONCORDANCERS

Concordancers are tools developed for mainframes,
PCs, Macs, and other computers and operating
systems. They search for words, or strings
within a word, and then, in a matter of seconds,
exhaustively list the occurrences of that word
(or string) in the electronic corpus, together
with the contexts in which it occurs in the
source text. Hence, in place of the
time-consuming, manual task of concordancing,
concordancing software provides an easy, and yet
powerful, means to generate concordances. Such
concordancing results enable one to study the
multiple meanings and functions of a given word,
compare the usage and distribution of two or more
near-synonyms, analyze vocabulary choices and
grammatical patterns in different portions of a
literary work to determine single or multiple
authorship, generate collocations for studying
words commonly associated with the searched word,
and so forth. A full concordance of a corpus also
yields frequency statistics that can be useful
for multiple purposes, including language
teaching and textbook-writing.
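What a concordancer does for a single search word can be sketched in a few lines of Python. This is only an illustration, not the method of any program mentioned in these slides: the function name kwic, the width parameter, and the sample sentence are all invented for the example (the sentence is not a quotation from any novel).

```python
import re

def kwic(text, keyword, width=30):
    """Return one (left, match, right) tuple per whole-word occurrence."""
    rows = []
    # \b restricts matches to whole words; re.escape treats the keyword literally.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Invented sample text, used only to exercise the function.
sample = ("She was handsome and clever. He thought her handsome, "
          "and a handsome income helped.")

for left, match, right in kwic(sample, "handsome", width=20):
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>20} [{match}] {right}")
```

Right-aligning the left context is what gives a KWIC display its characteristic centred keyword column.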
5
  • CHINESE CONCORDANCING

Concordancers have been used in foreign language
classrooms over the past decade for English and
other Western languages, but the same is not true
for Chinese. There are ample e-texts encoded in
GB and Big5 on the web that can be used as
corpora. Nonetheless, concordancing programs are
still little used in Chinese language teaching,
for a number of reasons. Double-byte Chinese
characters and non-spaced Chinese text make
searching, wordlist generation, creation of
concordances in keyword-in-context (KWIC)
format, sorting, and so forth a major challenge
for concordancing programs that were designed
for English orthography and then extended to
other left-to-right orthographies that use
single-byte encoding. Two software programs that
are currently available for making concordances
will be introduced. We will explore together how
concordances can be used for language teaching,
both in materials preparation and in teacher- or
student-initiated interactive searches using
concordancing programs and appropriate e-texts.
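The two obstacles named above, double-byte encoding and non-spaced text, can be illustrated with a short Python sketch. It assumes the text is decoded from GB bytes to characters before searching, and that context is counted in characters rather than in space-delimited words. The function name kwic_chars and the sample sentence are invented for the example; "gb2312" is the codec name in Python's standard library.

```python
def kwic_chars(text, key, width=4):
    """Character-level KWIC for non-spaced text: (position, left, key, right)."""
    rows = []
    start = text.find(key)
    while start != -1:
        end = start + len(key)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        rows.append((start, left, key, right))
        start = text.find(key, start + 1)
    return rows

# Invented sample sentence with no spaces between words.
sample = "我喜欢看书，他也喜欢看电影。"

# In GB2312 every character here, punctuation included, takes two bytes.
raw = sample.encode("gb2312")
print(len(sample), "characters,", len(raw), "bytes")

# Decode to characters first, then search; positions are character offsets.
decoded = raw.decode("gb2312")
for pos, left, key, right in kwic_chars(decoded, "喜欢"):
    print(pos, left, "[" + key + "]", right)
```

A program written for single-byte text would see 28 units here instead of 14 characters, which is one reason searching and sorting go wrong without proper decoding.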
6
  • What Does Concordancing Look Like?

Let's begin by taking a look at
how concordancing results are displayed in
a concordancer.
7
  • Some Examples from English (Figures 1 to 4)

8
Figure 1. A KWIC (Keyword-in-Context) display of
a search for the word "handsome" in the 1815
British novel Emma, by Jane Austen.
9
Figure 2. Collocates of "handsome" in Jane
Austen's Emma (1815).
10
Figure 3. A fast concordance of "quite,"
"rather," and "very" in Jane Austen's Emma
(1815).
11
Figure 4. A full concordance of Jane Austen's
Emma (1815).
12
Concordancers and Concordancing in Chinese
  • Concordancing Chinese text raises two problems:
  • Displaying Chinese characters
  • Texts with no spacing between characters and
    no grouping of polysyllabic words flanked by
    spaces

13
Examples using Wenlin 3.1 and Concordance 3.0,
running under English Windows 2000 (Figures 5
to 15)
English Windows 2000 has much better
multilingual support than English Windows 95/98.
(English Windows XP (professional edition), which
also has excellent multilingual support, is not
used in this workshop.)
14
Figure 5. Use of Wenlin 3.1 for searching for ?
in chapter 2 of the Hongloumeng, one of the
non-spaced, GB-encoded e-texts included with the
program. As shown, there are 15 hits.
15
Figure 6. Two windows are displayed. The top
window shows two occurrences of ?, with the
second occurrence highlighted in the lower
window, which shows the full context.
16
Figure 7. With spaced text in MonoConc Pro, one
can also search for strings containing a
wildcard. In the display here, the search was for
? @ ?, where @ is a wildcard that I, as a user,
set to 9 words or fewer for this particular
search. The pairs of Chinese punctuation marks
are counted as words here. The last row contains
9 words between ? and ?. The upper window displays
the context from the e-text containing the
searched string highlighted in the KWIC display
window. (Win98 with decoder used for displaying
Chinese.)
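A gapped search of this kind, two terms separated by at most a set number of words, can be sketched on a word-spaced text as follows. This is an illustration only and does not reproduce how MonoConc Pro works internally. The function name gap_search, the sample sentence, and the search pair are all hypothetical (the pair is not the one shown in the figure), and pairing each first term with only its nearest following second term is a design choice of this sketch.

```python
def gap_search(tokens, first, second, max_gap=9):
    """Find (i, j) index pairs where `second` follows `first`
    with at most `max_gap` tokens in between."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # j - i - 1 is the number of intervening tokens, so j may reach
        # i + max_gap + 1 at most.
        last = min(i + max_gap + 2, len(tokens))
        for j in range(i + 1, last):
            if tokens[j] == second:
                hits.append((i, j))
                break  # keep only the nearest match for this start
    return hits

# Invented word-spaced sentence; punctuation counts as a token,
# as it does in the figure.
tokens = "他 把 书 放 在 桌子 上 了 。".split()

print(gap_search(tokens, "把", "了", max_gap=9))  # gap of 5 tokens is allowed
print(gap_search(tokens, "把", "了", max_gap=3))  # gap of 5 tokens is too wide
```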
17
Figure 8. Concordance of ?? in Mary Erbaugh's
Pear Stories, with KWIC display, using
Concordance 3.0. The corpus is a set of
non-spaced e-texts.
18
Figure 9. Concordance of ? ... ? using
wildcards in the Pear Stories (non-spaced
e-texts).
19
Figure 10. Concordance of ??, ??, and ?? in Bai
et al.'s Across the Straits (1998), a set of
character-spaced e-texts.
20
Figure 11. Concordance of ??, ??, and ?? in
Dialogue A16-17 of Duanmu San et al.'s Taiwanese
Putonghua (1997), a word-spaced e-corpus.
21
Figure 12. Concordance 3.0 can also perform a
full concordance, that is, a concordance of every
single word in the e-corpus. An example is given
here. The screenshot above displays a source
e-text that is the word-segmented spoken corpus,
Taiwanese Putonghua.
22
Figure 13. A full concordance of the words in
the corpus (see Figure 12). The Headword column
of the list is sorted by frequency, with the
number of hits and percentages shown. The
highlighted word is ??, with 55 hits. In the KWIC
display in the Context window, lines are sorted
by the string to the right of the keyword, ??.
One line in the Context window is highlighted,
and its full context is displayed in the View
window, where the keyword is highlighted.
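The two operations described in this caption, a headword list with hits and percentages and a context list sorted by what follows the keyword, can be approximated on a word-segmented text as below. This is a sketch under stated assumptions, not Concordance 3.0's own procedure: the function names and the sample tokens are invented, and sorting is by Unicode code point, which is only one of several possible orders for Chinese.

```python
from collections import Counter

def frequency_table(tokens):
    """Headwords with hit counts and percentages, most frequent first."""
    counts = Counter(tokens)
    total = len(tokens)
    return [(w, n, 100.0 * n / total) for w, n in counts.most_common()]

def contexts_sorted_right(tokens, headword, width=3):
    """KWIC rows for one headword, sorted by the tokens to its right."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == headword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((left, tok, right))
    # Lists of strings compare element by element, by code point.
    return sorted(rows, key=lambda row: row[2])

# Invented word-segmented sample.
tokens = "我 去 学校 你 去 哪里 他 去 北京".split()

for word, hits, pct in frequency_table(tokens)[:3]:
    print(f"{word}\t{hits}\t{pct:.1f}%")

for left, key, right in contexts_sorted_right(tokens, "去"):
    print(" ".join(left), "[" + key + "]", " ".join(right))
```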
23
Figure 14. The headwords in the concordance in
Figure 13 are sorted here by word endings
(string). All instances of ? in the corpus occur
together, providing a basis for analyzing ? not
only as a main verb and as an element in
compounds, but also as a directional complement
co-occurring with specific types of verbs.
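Sorting headwords by their endings can be done by comparing each word read from its last character backwards. The sketch below shows that idea only; it is an assumption about how such a sort may work, not a description of Concordance 3.0. The headword list is invented, and the characters 来 and 去 are stand-ins chosen for illustration, not necessarily the character shown in the figure.

```python
def sort_by_ending(headwords):
    """Sort words by their reversed form so shared endings fall together."""
    return sorted(headwords, key=lambda w: w[::-1])

# Invented headword list mixing two endings.
headwords = ["回来", "来", "出来", "去", "回去", "进来"]

for word in sort_by_ending(headwords):
    print(word)
```

With this ordering every word ending in the same character is adjacent in the list, which is what makes it easy to review a character's uses as a final element.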
24
Figure 15. Collocations can also be displayed.
In this figure, the highlighted headword is ?,
with 114 hits, which occurs in a number of
polysyllabic compounds and phrases. In this
display, the Context Window is sorted by the word
after the headword.
25
Figure 16. The highlighted headword, ?, in Figure
15 is displayed here with collocates to its right
(first, second, third, and fourth character to
the right of ?) in the top half of the figure, and
with collocates to its left (fourth, third,
second, and first character to the left of ?) in
the bottom half of the figure. In each column,
the characters are displayed in descending order
of frequency.
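A positional collocate table like this one can be approximated by counting, for each occurrence of a single-character headword, the character found at each offset from one to four places to its left and right. The sketch below assumes a non-spaced text and a one-character headword; the function name positional_collocates and the sample string are invented for the example.

```python
from collections import Counter

def positional_collocates(text, node, span=4):
    """For each offset -span..-1 and 1..span, list the characters found
    at that distance from `node`, in descending order of frequency."""
    table = {off: Counter() for off in range(-span, span + 1) if off != 0}
    for i, ch in enumerate(text):
        if ch != node:
            continue
        for off in table:
            j = i + off
            if 0 <= j < len(text):
                table[off][text[j]] += 1
    return {off: counts.most_common() for off, counts in table.items()}

# Invented non-spaced sample.
text = "他回来了她出来了你回来吧"

columns = positional_collocates(text, "来")
print("first character to the right:", columns[1])
print("first character to the left: ", columns[-1])
```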
26
  • The figures provide a sample of concordancing
    results from different searches, using spaced
    and non-spaced e-texts.

27
  • Concluding Remarks
  • Course-material preparation by teachers
  • Teacher-guided, student-initiated concordancing
  • Authentic and pedagogically prepared e-texts as
    corpus, depending on students' language
    competence
  • Searches of specific words/phrases vs. full
    concordances for studying an entire corpus
  • Concordancing for language teaching,
    corpus-based synchronic and historical
    linguistic research, literary studies, etc.

28
  • The End