Building the Federal Multilingual Infrastructure in Unicode: Foreign Language Dictionary Tools

Last modified by: John Kovarik. Created Date: 1/1/1601 12:00:00 AM. Document presentation format: On-screen Show (4:3).

Transcript and Presenter's Notes

1
Building the Federal Multilingual Infrastructure in Unicode:
Foreign Language Dictionary Tools

2
Project Goals
  • Unite federal foreign language analysts in
    communities of interest by language to increase
    the speed and accuracy of multilingual work
  • Outgrowth of NSA legacy individual foreign
    language dictionary tools
  • Share Next Generation tool suite across the
    federal government in 90 languages

3
Foreign Language Work: 1970s
  • Manual tools
      • Hardcopy dictionaries (2-10 per person)
      • 3x5 card files for specialized vocabulary
      • Pen and paper only
  • Work environment
      • Career analysts, revered as subject matter
        experts, rule the workplace.
      • College graduates hired right out of school,
        some with military experience, enter the job.

4
Foreign Language Challenge I: The classic sparse
data problem
  • Never enough vocabulary
  • Never enough grammar training
  • Never enough cultural knowledge

5
Foreign Language Challenge II: Why it's a sparse
data problem
  • Communication is usually spontaneous between 2 or
    more people who share a great deal of special
    knowledge in common
  • Ultimate goals often not explicit
  • Ambiguity reigns for outsiders
  • No simple rules for filling in the blanks

6
An example: ?? ? ?? ??? ?? ? ?? ?? ?? ?
  • All glossed (4 min/chr, 17 chrs): meaning
    obscure. "Female people go hit knock bamboo
    curtains secret doctor come untie decide her ask
    issue."
  • All phrases verified (longest string match: 9):
    clearer. "A woman goes and knocks on the bamboo
    curtain's secret doctor to come resolve her
    problem." But still uncertain.
  • Check for neologism: go to FBIS recent
    translations, look to clarify meaning of new term
    "knock bamboo curtain."
  • "Knock on the bamboo curtain for a secret doctor"
    = seek out an illegal quack
  • "A woman (must) go seek out an illegal quack to
    resolve her problem."
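
The "longest string match" step above can be sketched as a greedy dictionary segmenter. This is only an illustration under stated assumptions: the function name, the `max_len` parameter, and the toy lexicon are hypothetical, and the slides do not say how the actual tools matched phrases. The sample text is a stand-in, because the slide's original characters did not survive extraction.

```python
def longest_match_segments(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation using the longest lexicon hit."""
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in lexicon:
                segments.append(piece)
                i += size
                break
    return segments


# Hypothetical toy lexicon (not the slide's sentence, whose characters were lost).
toy_lexicon = {"中国": "China", "人民": "the people", "中国人": "Chinese person"}
segments = longest_match_segments("中国人民", toy_lexicon)
glosses = [toy_lexicon.get(s, "?") for s in segments]
```

With this toy lexicon the greedy pass picks 中国人 and leaves 民 unglossed, rather than splitting into 中国 and 人民. That is a small illustration of why a longest-match result can remain uncertain until an analyst checks it.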

7
People say, "What's the big deal with just an
on-line dictionary?"
  • "I never/seldom use a dictionary!"
      • Native speaker syndrome
      • Vast majority of people must use a dictionary
        in a second/third language
  • "Hardcopy dictionaries are better."
      • Can't do wild-card searches by hand
      • Not engineered for 10 sec. avg. response
      • Humans tire; machines do not.

8
1991: First Generation Dictionary DB Tool
  • 200,000 entries from 3x5 cards collected over 20
    years
  • Wild card searchable
  • Cross referenced 4 ways in accordance with user
    requirements
  • Displayed in native script
  • Can cut and paste queries/responses

9
Reactions to 1st Generation Tool
  • Younger analysts used it, liked it, and made
    great suggestions to improve it
  • Senior analysts usually would not use it

10
1995: 2nd Generation Dictionary DB Tool
  • Responses faster on queries with leading wild
    card
  • GUI customized per user input
  • Candidate entry system established
  • Usership doubled!
  • Senior analysts start to use it
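
The slides do not say how queries with a leading wild card were made faster. One common technique, shown here only as an assumption, is to keep a second index of reversed headwords so that a suffix query becomes a prefix lookup on sorted keys. The class name `SuffixIndex` and the sample words are hypothetical.

```python
import bisect


class SuffixIndex:
    """Answer '*suffix' queries by binary search over reversed headwords."""

    def __init__(self, headwords):
        # Reversing each word turns a shared ending into a shared beginning.
        self._reversed = sorted(w[::-1] for w in headwords)

    def ending_with(self, suffix):
        key = suffix[::-1]
        # Jump straight to the first reversed word that could start with key.
        start = bisect.bisect_left(self._reversed, key)
        matches = []
        for rev in self._reversed[start:]:
            if not rev.startswith(key):
                break  # sorted order means no later entry can match
            matches.append(rev[::-1])
        return sorted(matches)


index = SuffixIndex(["translate", "translation", "transliterate", "interpret"])
ate_words = index.ending_with("ate")
```

The lookup touches only the matching run of entries instead of the whole list. Reversing by code point is a simplification that would need care for scripts with combining marks.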

11
1998: 3rd Generation Dictionary DB Tool
  • Database re-encoded in UTF-8
  • Simultaneous simplified and traditional Chinese
    display enabled
  • Average of 1,000-3,000 candidate entries approved
    annually, 1998-2002
  • Usership again doubled!
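
Re-encoding a database in UTF-8 is what allows simplified and traditional Chinese to sit side by side in one field. The sketch below assumes, for illustration only, that the legacy data was stored in GB2312 and Big5; the slides do not name the earlier encodings, and `reencode_to_utf8` is a hypothetical helper.

```python
def reencode_to_utf8(raw: bytes, legacy_encoding: str) -> bytes:
    """Decode bytes from a legacy code page and re-encode them as UTF-8."""
    text = raw.decode(legacy_encoding)
    return text.encode("utf-8")


# Assumed legacy inputs: one simplified (GB2312) and one traditional (Big5) string.
simplified = reencode_to_utf8("汉语".encode("gb2312"), "gb2312")
traditional = reencode_to_utf8("漢語".encode("big5"), "big5")

# Once both are UTF-8, they can be stored and displayed together.
combined = (simplified + b" / " + traditional).decode("utf-8")
```

Under the legacy encodings the two byte strings could not be mixed safely in one record, because each needs a different code page to be read correctly.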

12
Today: Wordscape, the Next Generation Dictionary
Tool
  • Retains all Chinese capabilities
  • Expands to all language collections
  • Neologism newswire research tools
  • Over 90 languages represented in one Unicode DB
    unified under one XML schema and one suite of
    tools
  • Under LASER ACTD funding, extending all across
    the federal government!
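
To make "one Unicode DB unified under one XML schema" concrete, here is a minimal sketch of entries from several languages sharing one element layout. The element and attribute names (`lexicon`, `entry`, `headword`, `gloss`, `lang`) are invented for illustration and are not the Wordscape schema, which the slides do not show.

```python
import xml.etree.ElementTree as ET


def make_entry(lang, headword, gloss):
    """Build one entry in a hypothetical shared layout."""
    entry = ET.Element("entry", {"lang": lang})
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "gloss", {"lang": "en"}).text = gloss
    return entry


lexicon = ET.Element("lexicon")
lexicon.append(make_entry("zh", "词典", "dictionary"))
lexicon.append(make_entry("hi", "शब्दकोश", "dictionary"))
lexicon.append(make_entry("ru", "словарь", "dictionary"))

# Serialize as UTF-8 so every script travels in one file.
xml_bytes = ET.tostring(lexicon, encoding="utf-8")

# The same query works for any language because the layout is shared.
parsed = ET.fromstring(xml_bytes)
russian = parsed.find("entry[@lang='ru']/headword").text
```

A shared layout is what lets one suite of tools search, display, and exchange entries without per-language special cases.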

13
Technology and Standards
  • New technology being used
      • Benefits of scale from use of UTF-8, XML
  • Standards adopted: leading change
      • Participating in ISO standards group Technical
        Committee 37 on terminology and language
        resources (developing standardized formats for
        foreign language lexical resources and data
        exchange)

14
When do Unicode standards fail? When Unicode
standards are not standard!
  • 3rd World languages less commonly taught in the
    United States
      • Hindi (many different script rendering
        implementations)
      • Mongolian (no standardized spelling, many
        newswire web sites employ non-standard fonts)
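
One narrow piece of this problem can be handled in software: the same Devanagari letter may arrive as different code point sequences, which breaks exact-match dictionary lookups. The sketch below normalizes text before comparison using Python's standard `unicodedata` module. It is an illustration, not something the slides describe, and it does not address rendering differences or non-standard fonts; `lookup_key` is a hypothetical helper.

```python
import unicodedata


def lookup_key(text):
    """Normalize to NFC so canonically equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


precomposed = "\u0958"       # DEVANAGARI LETTER QA as a single code point
decomposed = "\u0915\u093c"  # DEVANAGARI LETTER KA followed by SIGN NUKTA

raw_equal = precomposed == decomposed
normalized_equal = lookup_key(precomposed) == lookup_key(decomposed)
```

Text produced with a non-standard font encoding is a different case: it would need a font-specific conversion table before normalization could help.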

15
Language Knowledge Services Team/Resources
  • John L. George, Program Manager, (301) 688-9133
  • Over 20 computer scientists/techs
  • Currently deploying Beta version
  • Learning from testing with earlier version
    instantiations at FBI and NSA
  • On JWICS now; SIPRNet/NIPRNet next

16
Contact Information
  • John J. Kovarik
  • Senior Language Technology Authority
  • NSA Representative to LASER ACTD 
  • National Security Agency
  • 9800 Savage Road
  • Suite 6486 S2
  • Phone (301) 688-7198
  • Kovarik_at_afterlife.ncsc.mil