Scripts, Layout and Segmental Awareness

1
Scripts, Layout and Segmental Awareness
  • Richard Sproat
  • SALA 25
  • September 15-18, 2005
  • University of Illinois at Urbana-Champaign

2
Overview
  • Computational model of script layout
  • Application to Brahmi-derived scripts
  • Implications for phonemic awareness
  • Are readers of Indian scripts aware of phonemes?
  • A computational model of scriptal influence on
    phonemic awareness
  • Further issues: phonology or writing?

3
Computational Theory of Writing Systems
  • Relation between writing and language is a
    regular relation (in the sense of formal language
    theory)
  • Writing tends to represent a consistent level of
    linguistic representation
  • Glyphs are combined via a small set of
    two-dimensional catenators

4
Two-Dimensional Catenators
95% of Chinese characters ever invented consist
of a semantic and a phonetic component
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
Brahmi-Derived Indic Scripts
11
Properties of Indic Scripts
  • Glyphs are arranged into orthographic syllables,
    called aksara: -CV-CV-CV-
  • Within each aksara:
      • consonant sequences are first composed together
        using various script/glyph-dependent catenators
      • then vowels are arranged around the consonant
        glyphs using various catenators
  • Word-initial vowels are written in a full form
  • A consonant (sequence) written with no vowel
    symbol is understood to have an inherent vowel

12
Devanagari Vowels
13
Kannada Diacritic Vowels
14
(No Transcript)
15
(No Transcript)
16
g(x): graphical expression of phone x
gr(x): reduced form
17
Script Index Feature Vectors
expressed, catenator, reduced, fused, complex, transparent
18
Summary of Formal Treatment
  • The theory explicitly treats Indic writing
    systems as segmental
  • At an abstract level, symbols are just catenated
    together; the particular mode of catenation is
    only an issue of rendering.
  • Cf. text transmission standards such as Unicode.
  • But do Indic writing systems behave segmentally?

19
Alphabets and Segmental Awareness
  • A claim: readers of non-alphabetic writing
    systems have no conscious awareness of segments
  • "investigations of language use suggest that many
    speakers do not divide words into phonological
    segments unless they have received explicit
    instruction in such segmentation comparable to
    that involved in teaching an alphabetic writing
    system" (Faber, 1992)
  • According to Faber, only Western alphabets, which
    represent both vowels and consonants inline,
    count as alphabetic
  • Indic scripts are not alphabetic, so readers
    should not have segmental awareness

20
Faber's Criteria
  • Faber classifies scripts according to two main
    criteria:
      • Are all segments represented?
      • Are all segments represented linearly, with vowels
        and consonants on a par (versus with some being
        diacritics)?
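
Read together with the previous slide, the two criteria act as a conjunction: a script counts as alphabetic on Faber's account only if both questions are answered yes. A minimal Python sketch of that decision rule follows. The `Script` class, its field names, and the two example codings are illustrative assumptions, not Faber's own coding of any script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Script:
    """Hypothetical record of how a script answers the two criteria."""
    name: str
    all_segments_represented: bool  # criterion 1
    linear_on_par: bool             # criterion 2: vowels and consonants inline


def is_alphabetic_for_faber(script: Script) -> bool:
    # Both criteria must hold for a script to count as alphabetic.
    return script.all_segments_represented and script.linear_on_par


# Illustrative codings only (assumptions made for this sketch).
western = Script("Western alphabet", True, True)
# Vowels written as diacritics fail criterion 2, so the value chosen
# for criterion 1 does not change the outcome here.
indic = Script("Indic script with diacritic vowels", True, False)

for s in (western, indic):
    print(s.name, "->", is_alphabetic_for_faber(s))
```

Keeping the two criteria as separate fields makes it easy to see which one a given script fails.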

21
Faber's Classification of Scripts
Korean
22
Ethiopic (Geez)
23
Is Segmental Awareness a Byproduct of Literacy in
an Alphabetic Script?
  • Recently literate Portuguese speakers outperform
    illiterates on phonemic segmentation
  • Japanese school children are less able to perform
    segmental manipulation tasks than their American
    counterparts
  • Chinese readers who have been exposed to the
    pinyin transliteration system outperform Chinese
    readers who have not had this exposure.
  • Conclusion: literacy per se is not sufficient for
    phonemic awareness to develop. One needs an
    alphabet.

24
Segmental Awareness in Korean (Sohn, 1987)
Vowel switching
o
a
This is not expected on Faber's account
25
Segmental Awareness in Indian Languages
  • Padakannaya (2000) tested awareness of syllables
    and phonemes
  • Syllable manipulation (rhyme recognition,
    syllable deletion, syllable reversal): even
    illiterate speakers can handle these.
  • Phoneme manipulation (phoneme oddity, phoneme
    deletion, phoneme reversal): these cause problems
    for readers of non-alphabetic writing systems.
  • Compared sighted children, who learned the
    Kannada script, with blind children, who learned a
    purely alphabetic Kannada Braille.
  • Blind children consistently outperformed sighted
    children on segmental manipulation tasks.

26
Phoneme Reversal
Kids start learning English
27
Phoneme Awareness and Graphic Prominence
  • Phonemic awareness in Kannada and other Indic
    writing systems is affected by how noticeable
    the components are (Padakannaya et al., 1993);
    this varies cross-scriptally.
  • Thus, Hindi speakers find it hard to treat
    anusvara and repha as separate segments.
  • But this is easy for Kannada speakers

28
Diacritics Cross-Scriptally
  • In Devanagari, anusvara is a diacritic
  • Also find it easier to delete /y/ in
    than /r/ in
  • Diacritics are less salient than non-diacritics
    in other scripts. E.g. work of van Heuven (2002)
    for Dutch
  • Errors in placement of diaeresis (e.g. Bedouïen,
    Bedouin) have no effect on word recognition,
    unlike errors in letters, which have a
    significant effect.
  • But diaeresis is required according to the Dutch
    spelling conventions: without the diaeresis,
    Bedouien should be pronounced b?duj? rather
    than (correct) beduin

29
Phonemic Awareness and
  • Hindi speakers find it easier to delete /d/ in
    doshii than they do /n/ in nadii
  • Vaid and Gupta (2002) show that (inline) /i/ in
    Devanagari seems to be treated as a separate
    segment in reading.

30
Vaid & Gupta (2002): Evidence for Devanagari as
an Alphabet
  • Studied naming latencies in Hindi-speaking adults
    and naming errors in Hindi-speaking children for
    words containing short /i/.
  • Single C: /tilak/
  • Heterosyllabic C: /masjid/
  • If Devanagari is a syllabary, then misordering
    should only cause problems if the C sequence
    contains a phonological syllable boundary
    (syllable-delimited view).
  • If Devanagari is an alphabet, then both /tilak/
    and /masjid/ should cause problems
    (phoneme-delimited view).
  • Both /tilak/ and /masjid/ show slower naming and
    higher error rates than forms not including short
    /i/.
  • This is consistent with Devanagari being an
    alphabet.
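
The argument on this slide can be written out as a small consistency check: each view makes a prediction per word, and the view whose predictions match the reported outcome survives. The sketch below is only an illustration of that logic. The function and dictionary names are hypothetical, and the boolean codings are taken from the bullets above (/tilak/ has a single C, /masjid/ a heterosyllabic C sequence, and both were slower and more error-prone).

```python
def predicts_difficulty(view: str, crosses_syllable_boundary: bool) -> bool:
    """Does `view` predict a naming problem for a word containing short /i/?"""
    if view == "syllable-delimited":
        # Problems only when the C sequence contains a syllable boundary.
        return crosses_syllable_boundary
    if view == "phoneme-delimited":
        # Problems whenever written order departs from spoken order.
        return True
    raise ValueError(f"unknown view: {view!r}")


# True if the consonant sequence contains a phonological syllable boundary.
WORDS = {"tilak": False, "masjid": True}

# Outcome reported on the slide: both words were harder than controls.
OBSERVED_DIFFICULT = {"tilak": True, "masjid": True}


def consistent_views() -> list[str]:
    views = ["syllable-delimited", "phoneme-delimited"]
    return [
        v for v in views
        if all(predicts_difficulty(v, WORDS[w]) == OBSERVED_DIFFICULT[w]
               for w in WORDS)
    ]


print(consistent_views())
```

The syllable-delimited view is ruled out by /tilak/ alone, since it predicts no difficulty for a single consonant.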

31
Vaid & Gupta's Results: Naming
32
Vaid & Gupta's Results: Errors
33
Kannada Reduced Consonants
  • Padakannaya suggests an explanation for why
    deleting in should be
    harder than deleting the .
  • He notes that in cases where there is an explicit
    vowel, this is generally ligatured with the
    .
  • So the is more opaque than the
  • This is not wholly satisfactory

34
Proposed Model
  • The ease/difficulty with which a segment is
    available for conscious manipulation is directly
    related to two factors:
      • The visual prominence of the graphemic
        representation of the segment
      • The complexity of the editing operations involved
        in transforming the graphic form of the stimulus
        into the graphic form of the response
  • How to compute edit distance?

35
An Alternative Explanation: Edit Operations
36
Edit Operations: rakta → rata
  • Delete
  • Move up to inline position
  • Change into full form glyph

37
Edit Operations: rakta → raka
  • Delete

38
Edit Operations: rakti → rati
  • Delete
  • Move up to inline position
  • Change into full form glyph, linking with
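
The three edit-operation slides above can be compared by simply counting operations. The sketch below records only the operation types listed on each slide; which glyph each operation applies to is not recoverable from the transcript. Labelling "change into full form glyph" as a substitution and treating every operation as equally costly are assumptions made here for illustration.

```python
# Operation types per (stimulus, response) pair, as listed on the slides.
EDIT_SCRIPTS = {
    ("rakta", "rata"): ["delete", "move", "substitute"],
    ("rakta", "raka"): ["delete"],
    ("rakti", "rati"): ["delete", "move", "substitute"],
}


def n_operations(stimulus: str, response: str) -> int:
    """Number of graphic edit operations needed for this transformation."""
    return len(EDIT_SCRIPTS[(stimulus, response)])


def harder(pair_a: tuple[str, str], pair_b: tuple[str, str]) -> tuple[str, str]:
    """Return the pair predicted to be harder under an unweighted count."""
    return max(pair_a, pair_b, key=lambda p: n_operations(*p))


for (stimulus, response), ops in EDIT_SCRIPTS.items():
    print(f"{stimulus} -> {response}: {len(ops)} operation(s): {', '.join(ops)}")
```

Under this unweighted count, a transformation that needs only a deletion comes out as easier than one that also needs a movement and a change of glyph form.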

39
Korean Vowel Switching
hobak (pumpkin)
habok
40
Formal Model
  • Cost of an edit operation is given by a
    combination of three terms:
      • Movement cost
      • Deletion cost
      • Substitution cost
  • We could hope to quantify the coefficients by
    regression against real psycholinguistic data
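
The slide's cost formula did not survive in the transcript, so the following is only a sketch of one way such a model could be set up, not the author's formula. It assumes a linear form with one weight per operation type, and it assumes that a substitution is discounted by how similar the two glyph forms are. The names `EditOp`, `PLACEHOLDER_WEIGHTS`, `op_cost` and `total_cost`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOp:
    kind: str                # "move", "delete" or "substitute"
    similarity: float = 0.0  # substitutions only: 0 = unlike, 1 = identical


# Placeholder weights; in the proposal these would be estimated by
# regression against psycholinguistic data, not fixed by hand.
PLACEHOLDER_WEIGHTS = {"move": 1.0, "delete": 1.0, "substitute": 1.0}


def op_cost(op: EditOp, weights: dict[str, float] = PLACEHOLDER_WEIGHTS) -> float:
    if op.kind not in weights:
        raise ValueError(f"unknown operation type: {op.kind!r}")
    if op.kind == "substitute":
        # Assumption: replacing a glyph with a similar one is cheaper.
        return weights[op.kind] * (1.0 - op.similarity)
    return weights[op.kind]


def total_cost(ops: list[EditOp], weights: dict[str, float] = PLACEHOLDER_WEIGHTS) -> float:
    return sum(op_cost(op, weights) for op in ops)


three_step = [EditOp("delete"), EditOp("move"), EditOp("substitute", similarity=0.5)]
one_step = [EditOp("delete")]
print(total_cost(three_step), total_cost(one_step))
```

Passing the weights as a parameter keeps the cost function separate from however the weights end up being estimated.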
41
Prominence and Similarity
  • Need some measure of what it means to be a
    diacritic
  • Also need a measure of similarity to quantify the
    cost of substituting one glyph form for another

42
Similarity Metric for Glyphs
  • 26 subjects took part in a web-based survey
  • Task was to rate pairs of glyphs on a 5 point
    scale of similarity
  • Least similar 1
  • Most similar 5
  • 153 pairs of glyphs were judged from 3 scripts
    Devanagari, Kannada and Malayalam
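
Ratings of this kind could be turned into a substitution cost by averaging each pair's ratings and rescaling the 1 to 5 scale to the range 0 to 1. The sketch below shows that rescaling; the function names, the linear mapping, and the example ratings are assumptions made for illustration and are not data from the survey.

```python
from statistics import mean


def normalized_similarity(ratings: list[int]) -> float:
    """Map ratings on the 1 (least similar) to 5 (most similar) scale to [0, 1]."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return (mean(ratings) - 1) / 4


def substitution_cost(ratings: list[int]) -> float:
    """Assumed cost of swapping one glyph for another: high when they look unlike."""
    return 1.0 - normalized_similarity(ratings)


# Hypothetical ratings from four subjects for a single glyph pair.
example_ratings = [4, 5, 4, 3]
print(normalized_similarity(example_ratings), substitution_cost(example_ratings))
```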

43
Some Dissimilar Glyphs
44
Some Similar Glyphs
45
Are we really talking about phonology?
  • Are people's judgments of the number of sounds in
    a word influenced by:
      • Number of phonemes?
      • Number of letters?
  • Answer seems to be that both are relevant
    (Scholes, 1993)

46
How many sounds in a word?
  • Scholes gave explicit instructions:
      • "at" has 2 sounds
      • "cat" has 3 sounds
  • Used a verification test to make sure people had
    mastered the task

47
Results
48
So
  • No question that judgments about segments are
    influenced by spellings of words
  • But speakers still have some sense of the
    underlying phonological structure
  • In Indian languages, we might assume that
    speakers' knowledge of phonemes is influenced by
    the layout of symbols, but tests of phonemic
    awareness are at least in part targeting
    phonological knowledge.
  • Explanation of phonemic awareness behavior seems
    to lie in understanding the graphical properties
    of the scripts involved.