Title: Automatic Generation of Speech Interface for GUI Tools/Applications using Accessibility Framework
1Automatic Generation of Speech Interface for GUI
Tools/Applications using Accessibility Framework
- Naveen Kumar, M Sasikumar
- CDAC Mumbai
2Overview
- Importance of interaction through speech.
- Problems in making interaction through speech
possible.
- Accessibility APIs (ATK, AT-SPI, GAIL etc.).
- Speech recognition engines (Sphinx-4).
- Architecture and approach for speech-enabling
the Linux desktop.
3Jargon I may use...
- Widget: Any GUI component (e.g. button, menu item
etc.).
- AT: Assistive Technology.
- ATK: Accessibility Toolkit.
- Acoustic model: Mathematical description of the
trained spoken phrases, which is used for speech
recognition.
- Speech corpus: Acoustically recorded data that
can be used to create an acoustic model.
4First Principle
- Do not wait for anybody...
- Learn and do it yourself...
5How to achieve Accessibility in software
applications
- Segregate model from modality.
- Data semantics must be separated from the way
they are perceived.
6Importance of interaction through speech
- Natural way of interaction.
- This form of interaction is more familiar to most
people.
- People suffering from various forms of motor
disability would find it convenient to control
their desktop applications through speech
interaction.
7Speech recognition in general...
- Automatic speech recognition is still an unsolved
problem.
- More research is required to utilize its full
potential.
- But it is mature enough to be used under
constrained conditions.
- Good recognition rates for small vocabularies.
8How to achieve Accessibility in software
applications
- Segregate model from modality.
- Data semantics must be separated from the way
they are perceived.
9Speech Interaction problems...
- Tremendous range of variability.
- Not very deterministic.
- Natural interaction still not possible.
- We do not know what to speak in order to make
the application do something.
- Though the recognition accuracy of speech engines
is improving, there is still a long way to go.
10...Speech Interaction problems
- Use of speech recognition in an open domain is
still not practical.
- Ethnic groups speak differently.
- Non-availability of a speaker-independent
acoustic model for all languages.
- Localisation becomes a difficult task.
11Speech as Input Method?...
- Can we use human speech to control software
applications?
- What are the difficulties in achieving such a
goal?
- How is it different from conventional input
method interaction?
- What software frameworks are required to achieve
this?
12...Speech as Input Method?
- How much source code modification is required on
the application/environment side?
13Factors affecting speech as input method...
- Speech engine capabilities.
- Quality of trained acoustic model used.
- Processing and memory constraints of the existing
system.
- Sound card quality etc.
- Ease of integration with existing application or
software environment.
14...Factors affecting speech as input method...
- Presence/absence of a large accent-neutral
acoustic model.
- Phrases to speak may be acoustically too close.
- Too many phrases increase the chance of
acoustic closeness.
- Save, Save As..., Save All
- Open File..., Open Location...
- New Tab, New Window
15...Factors affecting speech as input method
- Criticality of application.
- How to enter free form text in editable widgets
such as textbox or textarea.
16Some words with phonetic similarity
- dine mine kind mind ...
- tea t pea ...
- device advice nice ...
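A rough way to spot such clashes in a command vocabulary is sketched below. It is only an illustration: it compares spellings with Python's `difflib`, which is a crude stand-in for acoustic closeness (a real check would compare phoneme sequences from a pronunciation dictionary). The function name `close_pairs` and the 0.7 threshold are assumptions made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def close_pairs(phrases, threshold=0.7):
    """Return (a, b, score) for phrase pairs whose spelling similarity
    is at least `threshold`. Spelling is only a proxy for sound."""
    flagged = []
    for a, b in combinations(phrases, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged


if __name__ == "__main__":
    vocabulary = ["Save", "Save As", "Save All", "New Tab", "New Window"]
    for a, b, score in close_pairs(vocabulary):
        print(f"{a!r} is close to {b!r} (similarity {score})")
```

Pairs flagged this way are candidates for renaming or for an extra disambiguation step before the grammar is loaded.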
17Strategy...
- Speak what is written on the menus and standard
tool bar.
- Speak according to visible cues.
- Take cues from the visible user interface
(widgets) to speak.
- Ask the application to give a vocabulary list to
speak at all times.
18...Strategy
- Provide this list as different modalities.
- Load what is relevant.
- Resolve acoustic closeness of context phrases
with some preprocessing.
- Provide an alternative mechanism to enter
continuous text. Such a mechanism should itself be
controlled through speech.
19How to achieve this...
- Integrate speech APIs with application source
code.
- Use a generic accessibility framework to create
ATs integrated with speech code.
20Components...
- Speech recognition engine (e.g. Sphinx-4)
- Assistive Technology (AT)
- Generic Accessibility Framework (e.g. ATK, GAIL,
AT-SPI etc.)
- Application
21Speech recognition engine...
- Means and API to perform various forms of speech
recognition tasks.
- Mechanisms to train acoustic vocabularies.
- Generally difficult for a naive user to train.
- Generally available in two modes:
- Batch mode
- Live mode
22Sphinx-4
- ASR engine written entirely in Java.
- Developed and maintained at CMU
- APIs which allow developers to integrate speech
recognition with other applications.
- Specialised acoustic training framework known as
SphinxTrain.
23...Sphinx-4
- Allows us to use JSGF.
- JSGF is a textual representation of grammars for
use in speech recognition.
- Grammars determine what the recognizer should
listen for.
- A rule grammar specifies the types of utterances
a user might say and associates a context with
them.
- e.g. no, nein, nao, non, nem: Negative
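As an illustration of what such a grammar looks like, the sketch below builds JSGF text from a mapping of tags to phrases. The helper `build_jsgf`, the grammar name `answers` and the top-level rule name `<command>` are assumptions made for this example; only the JSGF syntax itself (header line, `grammar` declaration, `|` alternatives, `{tag}` annotations, `public` rules) comes from the format.

```python
def build_jsgf(grammar_name, rules):
    """Build JSGF grammar text.

    `rules` maps a tag (the context to associate) to the list of
    phrases that should all be recognised as that tag.
    """
    lines = ["#JSGF V1.0;", f"grammar {grammar_name};"]
    rule_refs = []
    for tag, phrases in rules.items():
        rule_name = tag.lower()
        alternatives = " | ".join(phrases)
        # Every alternative in the group carries the same tag.
        lines.append(f"<{rule_name}> = ({alternatives}) {{{tag}}};")
        rule_refs.append(f"<{rule_name}>")
    # The public rule is what the recognizer listens for.
    lines.append("public <command> = " + " | ".join(rule_refs) + ";")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_jsgf("answers", {"Negative": ["no", "nein", "nao", "non", "nem"]}))
```

Generating the grammar text from data rather than writing it by hand is what lets the vocabulary follow the current application context.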
24Speech APIs with application source code...
- Very crude method...
- Not killer...greedy...
- But most efficient...
- Blocking I/O could be a problem...
25Generic accessibility framework to create ATs
integrated with speech code.
26What are ATs
- Most commonly known as Assistive Technology.
- Sometimes referred to as accessibility aids.
- They provide alternate I/O modalities for
software applications to people with different
abilities.
27...What are ATs
- Some common examples of ATs:
- Screen magnifiers
- Screen readers
- On-screen keyboards
- Predictive Text Entry Systems
- All information required by the ATs is provided
by the running applications or their widgets.
28...What are ATs
- These ATs act as clients to these widgets.
- ATs can request state information.
- ATs can generate events on these widgets.
- Running applications notify ATs of any state
changes in them through registry mechanisms.
29...What are ATs
- Two programming strategies.
- Firstly, an application can provide its own
mechanism to make itself accessible.
- Not a standard way. Application specific.
- Secondly, a standard mechanism to make
applications accessible.
- Generic accessibility framework.
- Can be easily standardised for applications
implemented using coherent frameworks.
30Generic Accessibility framework...
- Toolkit independent.
- Mechanisms to expose widget states and events.
- Mechanisms to register callback listeners for
events on widgets.
- Mechanism to modify widget states and generate
events on them.
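The three mechanisms above can be pictured with a toy model. This is not the ATK or AT-SPI API: the class `AccessibleWidget`, its method names and the event names are all hypothetical, chosen only to show exposed state, listener registration and event generation side by side.

```python
class AccessibleWidget:
    """Toy stand-in for a widget exposed through an accessibility framework."""

    def __init__(self, name, role):
        self.name = name
        self.role = role
        self.states = set()      # exposed widget states
        self._listeners = []     # registered callbacks

    def add_listener(self, callback):
        """Register a callback invoked as callback(widget, event, detail)."""
        self._listeners.append(callback)

    def _emit(self, event, detail=None):
        for callback in self._listeners:
            callback(self, event, detail)

    def set_state(self, state, enabled=True):
        """Modify a state and notify listeners only if it actually changed."""
        changed = (state in self.states) != enabled
        if enabled:
            self.states.add(state)
        else:
            self.states.discard(state)
        if changed:
            self._emit("state-changed", (state, enabled))

    def activate(self):
        """Generate an event on the widget, as an AT would for the user."""
        self._emit("activated")


if __name__ == "__main__":
    log = []
    button = AccessibleWidget("Save", "push button")
    button.add_listener(lambda w, event, detail: log.append((w.name, event, detail)))
    button.set_state("sensitive")
    button.activate()
    print(log)
```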
32Accessibility Toolkit (ATK)...
- Describes a set of interfaces.
- A toolkit-independent implementation can be
written for any widget set (e.g. GTK, Motif and Qt).
- Widgets implement these interfaces to make
themselves accessible in ways defined by these
interfaces.
- ATK allows us to describe roles and states for
individual widgets.
33...Accessibility Toolkit (ATK)
- ATK allows us to describe roles and states for
individual widgets.
- Roles are a kind of string enumeration which
describe what role a particular widget plays in an
application.
- States are a kind of string enumeration which
describe the in-process current state of the
widget.
34GNOME accessibility...
- GNOME applications/widgets achieve accessibility
by implementing ATK.
- The GTK implementation of these interfaces is
available in a module called GAIL.
- ATs access and modify this accessibility
information using a toolkit-independent SPI
called AT-SPI.
- Information is made available to AT-SPI through
the relevant bridge.
35AT-SPI...
- Allows us to poke/dig through the entire widget
hierarchy of the applications running on the GNOME
desktop and the desktop itself.
- Provides primitives to perform actions on the
widgets.
- Allows us to register for accessibility-related
events and perform actions on these events
through event listeners.
36Application...
- Not all widgets are useful for invoking commands
and controlling applications through speech.
- Some GUI components are only for beautification
or placement purposes.
- Some GUI components just act as containers for
other GUI components.
- These could be ignored when the widget hierarchy
is accessed for information.
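A minimal sketch of that filtering step follows. It walks a mock widget tree rather than a live AT-SPI hierarchy; the `Widget` class, the `SPEAKABLE_ROLES` set and the function `speakable_names` are assumptions made for illustration, and the role strings are only examples of the kind of role names an accessibility framework reports.

```python
from dataclasses import dataclass, field

# Roles treated as speech targets in this sketch; anything else
# (containers, fillers, separators) is walked through but not spoken.
SPEAKABLE_ROLES = {"menu", "menu item", "push button"}


@dataclass
class Widget:
    name: str
    role: str
    children: list = field(default_factory=list)


def speakable_names(widget):
    """Collect names of widgets worth adding to the speech vocabulary."""
    names = []
    if widget.role in SPEAKABLE_ROLES and widget.name:
        names.append(widget.name)
    for child in widget.children:
        names.extend(speakable_names(child))
    return names


if __name__ == "__main__":
    window = Widget("Editor", "frame", [
        Widget("", "panel", [
            Widget("File", "menu", [Widget("Save", "menu item")]),
            Widget("", "separator"),
            Widget("New Tab", "push button"),
        ]),
    ])
    print(speakable_names(window))
```

The resulting list is the raw material for the context vocabulary; containers still matter for traversal even though they contribute no phrases.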
37...Application
- Some widgets require us to enter free form text.
- How to enter free form text in editable widgets
such as textbox or textarea.
39Major Components
- AT Process
- AT Manager
- Speech Recognition Thread
- Feedback.
40AT Manager...
- Initialize AT-SPI library to access GNOME
desktop.
- Initialize speech thread which recognizes speech
invocations.
- Register callbacks for events on particular
widgets.
- Listen to speech events.
41...AT Manager
- Generate events on the GNOME desktop or a GNOME
desktop application based on speech events.
- Create and modify the context grammar, which
specifies speech vocabularies, using Accessible
interfaces, based on the new context.
- Update the Feedback module with the current
context and grammar.
42Speech Recognition Thread...
- Listen and recognise speech invocations based on
a context grammar.
- Load new speech grammar based on changed context.
- Notify listeners of the speech event.
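The loop described above can be sketched as follows. No speech engine is involved: utterances arrive as plain strings through a queue, and "recognition" is just a membership test against the current grammar. The class `SpeechRecognizer` and its methods `hear`, `load_grammar` and `add_listener` are hypothetical names for this example, not Sphinx-4 API.

```python
import queue
import threading


class SpeechRecognizer:
    """Worker thread that accepts only phrases in the current context grammar."""

    def __init__(self, grammar):
        self._grammar = set(grammar)
        self._lock = threading.Lock()
        self._utterances = queue.Queue()
        self._listeners = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def add_listener(self, callback):
        self._listeners.append(callback)

    def load_grammar(self, phrases):
        """Swap in a new grammar when the context changes."""
        with self._lock:
            self._grammar = set(phrases)

    def hear(self, utterance):
        """Stand-in for microphone input."""
        self._utterances.put(utterance)

    def start(self):
        self._thread.start()

    def stop(self):
        self._utterances.put(None)   # sentinel ends the loop
        self._thread.join(timeout=5)

    def _run(self):
        while True:
            utterance = self._utterances.get()
            if utterance is None:
                break
            with self._lock:
                accepted = utterance in self._grammar
            if accepted:
                # Listeners run outside the lock so they may load a new grammar.
                for callback in self._listeners:
                    callback(utterance)


if __name__ == "__main__":
    recognizer = SpeechRecognizer(["file", "edit"])

    def on_speech(phrase):
        print("recognised:", phrase)
        if phrase == "file":
            recognizer.load_grammar(["open", "save"])  # context changed

    recognizer.add_listener(on_speech)
    recognizer.start()
    for phrase in ["save", "file", "save"]:
        recognizer.hear(phrase)
    recognizer.stop()
```

In this run the first "save" is ignored because it is outside the initial grammar, while the second is accepted after "file" switches the context.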
43Feedback
- Information about the in-context vocabulary for
the user.
- Feedback can be provided as plain text or through
other modalities like TTS.
- It makes users aware of what is to be said at an
instant to trigger a certain action.
44Dialog Management...
- Dialogs break the hierarchy and become children
of the Application itself.
- Dialogs can be handled through callbacks
registered for the invoking widget.
- This callback will reset the context of the
speech vocabulary to the application context.
- The context is not transferred to the dialog
context directly owing to the possible presence of
multiple child dialogs.
45Widgets with state EDITABLE
- Use a predictive text entry system to enter free
form text.
- Control such a predictive interface through speech.
48Talk is cheap. Show me the code. Torvalds, Linus
(2000-08-25)
49- Thank You
- nav007_at_gmail.com, naveenk_at_cdacmumbai.in