Automatic Generation of Speech Interface for GUI Tools/Applications using Accessibility Framework

Provided by: Naveen53

1
Automatic Generation of Speech Interface for GUI
Tools/Applications using Accessibility Framework
  • Naveen Kumar, M Sasikumar
  • CDAC Mumbai

2
Overview
  • Importance of interaction through speech.
  • Problems in making interaction through speech
    possible.
  • Accessibility APIs (ATK, AT-SPI, GAIL, etc.).
  • Speech recognition engines (Sphinx-4).
  • Architecture and approach for speech-enabling
    the Linux desktop.

3
Jargon I may use...
  • Widget: Any GUI component (e.g. button, menu item,
    etc.)
  • AT: Assistive Technology
  • ATK: Accessibility Toolkit
  • Acoustic model: Mathematical description of the
    trained spoken phrases, which is used for speech
    recognition.
  • Speech corpus: Acoustically recorded data that
    can be used to create an acoustic model.

4
First Principle
  • Do not wait for anybody...
  • Learn and do it yourself...

5
How to achieve Accessibility in software
applications
  • Segregate model from modality.
  • Data semantics must be separated from the way
    they are perceived.

6
Importance of interaction through speech
  • Natural way of interaction.
  • This form of interaction is more familiar to most
    people.
  • People suffering from various forms of motor
    disability would find it convenient to control
    their desktop applications through speech
    interactions.

7
Speech recognition in general...
  • Automatic speech recognition is still an unsolved
    problem.
  • More research is required to utilize its full
    potential.
  • But it is mature enough to be used under
    constrained conditions.
  • Good recognition rates for small vocabulary.

8
How to achieve Accessibility in software
applications
  • Segregate model from modality.
  • Data semantics must be separated from the way
    they are perceived.

9
Speech Interaction problems...
  • Tremendous range of variability.
  • Not very deterministic.
  • Natural interaction still not possible.
  • We do not know what to speak in order to make
    the application do something.
  • Though the recognition accuracy of speech engines
    is improving, there is still a long way to go.

10
...Speech Interaction problems
  • Use of speech recognition in an open domain is
    still not practical.
  • Ethnic groups speak differently.
  • Non-availability of a speaker independent
    acoustic model for all languages.
  • Localisation becomes a difficult task.

11
Speech as Input Method?...
  • Can we use human speech to control software
    applications?
  • What are the difficulties in achieving such a
    goal?
  • How is it different from conventional input
    method interaction?
  • What software frameworks are required to achieve
    this?

12
...Speech as Input Method?
  • How much source code modification is required on
    the application/environment side?

13
Factors affecting speech as input method...
  • Speech engine capabilities.
  • Quality of trained acoustic model used.
  • Processing and memory constraints of the existing
    system.
  • Sound card quality, etc.
  • Ease of integration with existing application or
    software environment.

14
...Factors affecting speech as input method...
  • Presence/absence of a large accent-neutral
    acoustic model.
  • Phrases to speak may be acoustically too close.
  • Too many phrases increase the chance of
    acoustic closeness.
  • Save, Save As..., Save All
  • Open File..., Open Location...
  • New Tab, New Window

15
...Factors affecting speech as input method
  • Criticality of application.
  • How to enter free-form text in editable widgets
    such as a textbox or textarea?

16
Some words with phonetic similarity
  • dine mine kind mind ...
  • tea t pea ...
  • device advice nice ...

17
Strategy...
  • Speak what is written on the menus and standard
    tool bar.
  • Speak according to visible cues.
  • Take cues from visible user interface (widget) to
    speak.
  • Ask the application to provide, at all times, a
    list of the vocabulary that can be spoken.

18
...Strategy
  • Provide this list as different modalities.
  • Load what is relevant.
  • Resolve acoustic closeness of context phrases
    with some preprocessing.
  • Provide alternative mechanism to enter continuous
    text. Such mechanism itself should be controlled
    through speech.
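
The preprocessing step for acoustic closeness might look something like the sketch below. It is only an illustration: it compares spellings with Python's difflib as a crude stand-in for comparing phoneme sequences, and the function name confusable_pairs and the 0.75 threshold are assumptions, not something taken from the talk.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(phrases, threshold=0.75):
    """Return pairs of phrases whose spellings are suspiciously similar.

    Spelling similarity is only a rough proxy for acoustic closeness;
    a real system would compare phoneme sequences taken from a
    pronunciation dictionary instead.
    """
    pairs = []
    for a, b in combinations(phrases, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


menu = ["Save", "Save As", "Save All", "New Tab", "Print"]
for a, b in confusable_pairs(menu):
    print(f"possibly confusable: {a!r} / {b!r}")
```

Pairs flagged this way could then be renamed, numbered, or split across contexts before the grammar is loaded.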

19
How to achieve this...
  • Integrate speech APIs with application source
    code.
  • Use generic accessibility framework to create
    ATs integrated with speech code.

20
Components...
  • Speech recognition engine (e.g. Sphinx-4)
  • Assistive Technology (AT)
  • Generic Accessibility Framework (e.g. ATK, GAIL,
    AT-SPI, etc.)
  • Application

21
Speech recognition engine...
  • Means and APIs to perform various forms of speech
    recognition tasks.
  • Mechanisms to train acoustic vocabularies.
  • Generally difficult for a naive user to train.
  • Generally available in two modes:
  • Batch mode
  • Live mode

22
Sphinx-4
  • ASR engine written entirely in Java.
  • Developed and maintained at CMU
  • APIs which allow developers to integrate speech
    recognition with other applications.
  • Specialised acoustic training framework known as
    SphinxTrain.

23
...Sphinx-4
  • Allows us to use JSGF.
  • JSGF (JSpeech Grammar Format) is a textual
    representation of grammars for use in speech
    recognition.
  • Grammars determine what the recognizer should
    listen for.
  • A rule grammar specifies the types of utterances
    a user might say and associates a context with
    them.
  • Example: the alternatives no | nein | nao | non |
    nem can all be tagged as Negative.
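
As a rough illustration of how a vocabulary list could be turned into a JSGF grammar, the sketch below builds the grammar text from a list of menu labels. The helper name build_jsgf, the rule name <command>, and the label clean-up rules are assumptions made for this example; only the JSGF header, grammar declaration and rule syntax follow the format itself.

```python
def build_jsgf(grammar_name, phrases):
    """Build a JSGF grammar whose single public rule accepts any one phrase."""
    # Lower-case and de-duplicate while preserving order, and strip the
    # trailing "..." that menu labels often carry.
    spoken_forms = []
    for phrase in phrases:
        spoken = phrase.replace("...", "").strip().lower()
        if spoken and spoken not in spoken_forms:
            spoken_forms.append(spoken)
    if not spoken_forms:
        raise ValueError("a grammar needs at least one phrase")
    alternatives = " | ".join(spoken_forms)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )


print(build_jsgf("menu", ["Open File...", "Save", "Save As..."]))
```

A grammar generated this way stays small, which matches the point that recognition rates are good for small vocabularies.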

24
Speech APIs with application source code...
  • Very crude method...
  • Not killer...greedy...
  • But most efficient...
  • Blocking I/O could be a problem...

25
Generic accessibility framework to create ATs
integrated with speech code.
26
What are ATs
  • Most commonly known as Assistive Technology.
  • Sometimes referred to as accessibility aids.
  • They provide alternate I/O modalities for
    software applications to people with different
    abilities.

27
...What are ATs
  • Some common examples of ATs:
  • Screen magnifiers
  • Screen readers
  • On-screen keyboards
  • Predictive Text Entry Systems
  • All information required by the ATs is provided
    by the running applications or their widgets.

28
...What are ATs
  • These ATs act as clients to these widgets.
  • ATs can request state information.
  • ATs can generate events on these widgets.
  • Running applications notify ATs of any state
    changes in them through registry mechanisms.

29
...What are ATs
  • Two programming strategies.
  • Firstly, an application can provide its own
    mechanism to make itself accessible.
  • Not a standard way. Application specific.
  • Secondly, a standard mechanism to make
    applications accessible.
  • Generic accessibility framework.
  • Can be easily standardised for applications
    implemented using coherent frameworks.

30
Generic Accessibility framework...
  • Toolkit independent.
  • Mechanisms to expose widget states and events.
  • Mechanisms to register callback listeners for
    events on widgets.
  • Mechanism to modify widget states and generate
    events on them.
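
The mechanisms listed on this slide (exposing state, registering callback listeners, modifying state and generating events) can be pictured with a toy model such as the one below. ToyRegistry, ToyWidget and the "state-changed" event name are hypothetical names invented for this sketch; they do not reproduce the ATK or AT-SPI API.

```python
class ToyRegistry:
    """Hypothetical stand-in for an accessibility event registry."""

    def __init__(self):
        self._listeners = {}

    def register(self, event_type, callback):
        # Remember a callback to run whenever event_type is emitted.
        self._listeners.setdefault(event_type, []).append(callback)

    def emit(self, event_type, source):
        for callback in self._listeners.get(event_type, []):
            callback(event_type, source)


class ToyWidget:
    """Hypothetical widget that exposes its states and reports changes."""

    def __init__(self, name, registry):
        self.name = name
        self.states = set()
        self._registry = registry

    def set_state(self, state, enabled):
        if (state in self.states) == enabled:
            return  # nothing changed, so no event is generated
        if enabled:
            self.states.add(state)
        else:
            self.states.discard(state)
        self._registry.emit("state-changed", self)


log = []
registry = ToyRegistry()
registry.register(
    "state-changed",
    lambda event, widget: log.append((widget.name, sorted(widget.states))),
)
button = ToyWidget("Save", registry)
button.set_state("focused", True)
button.set_state("focused", True)   # no change, so no event
button.set_state("focused", False)
print(log)
```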

31
(No Transcript)
32
Accessibility Toolkit (ATK)...
  • Describes a set of interfaces.
  • A toolkit-independent implementation can be
    written for any widget set (e.g. GTK, Motif and
    Qt).
  • Widgets implement these interfaces to make
    themselves accessible in ways defined by these
    interfaces.
  • ATK allows us to describe roles and states for
    individual widgets.

33
...Accessibility Toolkit (ATK)
  • ATK allows us to describe roles and states for
    individual widgets.
  • Roles are a kind of string enumeration which
    describe what role a particular widget plays in
    an application.
  • States are a kind of string enumeration which
    describe the in-process current state of the
    widget.

34
GNOME accessibility...
  • GNOME applications/widgets achieve accessibility
    by implementing ATK.
  • The GTK implementation of these interfaces is
    available in a module called GAIL.
  • ATs access and modify this accessibility
    information using a toolkit-independent SPI
    called AT-SPI.
  • Information is made available to AT-SPI through
    the relevant bridge.

35
AT-SPI...
  • Allows an AT to explore the entire widget
    hierarchy of the applications running on the
    GNOME desktop, and of the desktop itself.
  • Provides primitives to perform actions on the
    widgets.
  • Allows an AT to register for accessibility-related
    events and act on these events through event
    listeners.

36
Application...
  • Not all widgets are useful for invoking commands
    and controlling applications through speech.
  • Some GUI components are only for beautification
    or placement purposes.
  • Some GUI components just act as containers for
    other GUI components.
  • These could be ignored when widget hierarchy is
    accessed for information.
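
The filtering idea above can be sketched as a depth-first walk that keeps only named widgets with an actionable role. The Node class and the ACTIONABLE_ROLES set are assumptions for this example (the role strings are modelled on ATK-style role names); a real AT would walk the live hierarchy through AT-SPI instead of an in-memory tree.

```python
from dataclasses import dataclass, field

# Roles assumed, for this sketch only, to be worth speaking to.
ACTIONABLE_ROLES = {"push button", "menu", "menu item", "check box"}


@dataclass
class Node:
    """Hypothetical in-memory stand-in for an accessible widget."""
    role: str
    name: str = ""
    children: list = field(default_factory=list)


def collect_speakable(node):
    """Depth-first walk returning (name, role) for actionable, named widgets."""
    found = []
    if node.role in ACTIONABLE_ROLES and node.name:
        found.append((node.name, node.role))
    # Containers and decorations are skipped, but their children are visited.
    for child in node.children:
        found.extend(collect_speakable(child))
    return found


app = Node("frame", "Editor", [
    Node("menu bar", "", [
        Node("menu", "File", [Node("menu item", "Save"), Node("separator")]),
    ]),
    Node("panel", "", [Node("push button", "Find")]),
])
print(collect_speakable(app))
```

The names collected this way are the natural input for the context grammar.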

37
...Application
  • Some widgets require us to enter free-form text.
  • How to enter free-form text in editable widgets
    such as a textbox or textarea?

38
(No Transcript)
39
Major Components
  • AT Process
  • AT Manager
  • Speech Recognition Thread
  • Feedback.

40
AT Manager...
  • Initialize AT-SPI library to access GNOME
    desktop.
  • Initialize speech thread which recognizes speech
    invocations.
  • Register callbacks for events on particular
    widgets.
  • Listen to speech events.

41
...AT Manager
  • Generate events on GNOME desktop or GNOME desktop
    application based on speech events.
  • Create and modify context grammar, which
    specifies speech vocabularies, using Accessible
    interfaces, based on new context.
  • Update the Feedback module with the current
    context and grammar.
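
One way to picture the AT Manager's duties is a table that maps each in-context phrase to an action and is rebuilt whenever the context changes. The sketch below is a simplified assumption of that design: ToyATManager, set_context, on_speech and the feedback callable are hypothetical names, and the actions are plain Python callables rather than events generated through AT-SPI.

```python
class ToyATManager:
    """Hypothetical manager keeping a phrase -> action table for one context."""

    def __init__(self, feedback):
        self._actions = {}
        self._feedback = feedback  # callable that receives the vocabulary

    def set_context(self, widgets):
        """Rebuild the vocabulary from (label, action) pairs in the new context."""
        self._actions = {label.lower(): action for label, action in widgets}
        # Tell the feedback module what can be said right now.
        self._feedback(sorted(self._actions))

    def on_speech(self, phrase):
        """Run the action bound to a recognised phrase, if there is one."""
        action = self._actions.get(phrase.lower())
        if action is None:
            return False  # phrase is not in the current context grammar
        action()
        return True


shown = []
manager = ToyATManager(feedback=shown.append)


def open_file_menu():
    # Opening a menu changes the context, so the vocabulary is replaced.
    manager.set_context([("Open", lambda: None), ("Save", lambda: None)])


manager.set_context([("File", open_file_menu), ("Edit", lambda: None)])
manager.on_speech("file")
print(shown)
```

After "file" is spoken, only "open" and "save" remain in the vocabulary, which is the "load what is relevant" strategy in miniature.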

42
Speech Recognition Thread...
  • Listen and recognise speech invocations based on
    a context grammar.
  • Load new speech grammar based on changed context.
  • Notify Listeners of the speech event.
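
The three responsibilities of the speech thread can be sketched with a worker thread that accepts only in-grammar phrases and notifies its listeners. To keep the example self-contained it consumes already-decoded text from a queue rather than audio, so no real recogniser is involved; ToySpeechThread and its method names are assumptions for illustration.

```python
import queue
import threading


class ToySpeechThread(threading.Thread):
    """Consumes already-decoded text from a queue instead of real audio."""

    def __init__(self, utterances):
        super().__init__(daemon=True)
        self._utterances = utterances  # queue.Queue of str; None stops the thread
        self._grammar = set()
        self._lock = threading.Lock()
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def load_grammar(self, phrases):
        # Called when the context changes; swaps the whole vocabulary.
        with self._lock:
            self._grammar = {p.lower() for p in phrases}

    def run(self):
        while True:
            text = self._utterances.get()
            if text is None:
                break
            with self._lock:
                accepted = text.lower() in self._grammar
            if accepted:
                for callback in self._listeners:
                    callback(text.lower())


heard = []
source = queue.Queue()
recognizer = ToySpeechThread(source)
recognizer.add_listener(heard.append)
recognizer.load_grammar(["Save", "Open File"])
recognizer.start()
for text in ["save", "hello", "Open File", None]:
    source.put(text)
recognizer.join()
print(heard)
```

Running recognition on its own thread keeps a blocking recogniser from stalling the rest of the AT.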

43
Feedback
  • Information about the in-context vocabulary for
    the user.
  • Feedback can be provided as plain text or through
    other modalities like TTS.
  • It makes users aware of what can be said at a
    given instant to trigger a certain action.

44
Dialog Management...
  • Dialogs break the hierarchy and become children
    of the application itself.
  • Dialogs can be handled through callbacks
    registered for the invoking widget.
  • This callback will reset the context of speech
    vocabulary to application context.
  • The context is not transferred to the dialog
    context directly owing to the possible presence
    of multiple child dialogs.

45
Widgets with state EDITABLE
  • Use a predictive text entry system to enter
    free-form text.
  • Control such a predictive interface through speech.

46
(No Transcript)
47
(No Transcript)
48
"Talk is cheap. Show me the code." Linus Torvalds
(2000-08-25)
49
  • Thank You
  • nav007_at_gmail.com, naveenk_at_cdacmumbai.in