Title: Computer Speech/Voice Recognition
Computer Speech/Voice Recognition: IBM ViaVoice
April 2005 IBM PC Club
Bernhard Krevet, IBC, Napa
Overview
- Definitions
- Categories of speech recognition software
- Products: Dragon NaturallySpeaking, ViaVoice
- ViaVoice by IBM for Windows and Mac
- System Requirements 2003
- System Requirements 1994, 1997
- Installation experience
- Resources (Web), Comments
- Using Speech Recognition
- Demo
Speech Recognition (1/3)
- ... refers to the process by which a person
dictates a phrase that the computer translates
into typed text. The dictated words can be
interpreted as a command or stored as text
in a document.
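The command-or-text split in this definition can be sketched in a few lines. This is a minimal illustration only, assuming a tiny hypothetical command table ("new paragraph", "scratch that") rather than the actual command grammar of ViaVoice or any other product: each recognized phrase is either executed as a command or stored as dictated text.

```python
from typing import List

# Hypothetical command phrases, for illustration only; a real product
# ships its own, much larger command grammar.
COMMANDS = {"new paragraph", "scratch that"}


def handle_phrase(phrase: str, document: List[str]) -> str:
    """Route one recognized phrase: run it as a command if it matches the
    command table, otherwise store it as dictated text. Returns the path taken."""
    key = phrase.strip().lower()
    if key not in COMMANDS:
        document.append(phrase.strip())
        return "text"
    if key == "new paragraph":
        document.append("\n\n")
    elif key == "scratch that" and document:
        document.pop()  # undo the most recently stored piece
    return "command"


def render(document: List[str]) -> str:
    """Join stored phrases with single spaces, without spaces around paragraph breaks."""
    out = ""
    for piece in document:
        if not out or piece == "\n\n" or out.endswith("\n\n"):
            out += piece
        else:
            out += " " + piece
    return out
```

For example, dictating "Dear Sir", "oops", "scratch that", "new paragraph", "Thank you" leaves a document that renders as "Dear Sir", a blank line, and "Thank you".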
Speech Recognition (2/3)
- What It Does
- Transform spoken words into written text or commands
- Recognize context (e.g. differentiate homonyms)
- Learn from you
- Use personal voice model
- Extensible vocabularies
- Support many languages
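"Recognize context" usually means a language model scores each candidate word against its neighbors, because words that sound alike cannot be separated by the acoustics alone. The toy sketch below assumes a hypothetical, hand-made table of bigram counts (real products derive such statistics from large text corpora). It picks the member of a homophone group that most often follows the previous word.

```python
from typing import Dict, List, Tuple

# Toy bigram counts (previous word, candidate) -> frequency.
# These numbers are invented for illustration, not taken from any product.
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("over", "there"): 50,
    ("over", "their"): 1,
    ("lost", "their"): 40,
    ("lost", "there"): 2,
    ("going", "to"): 80,
    ("going", "too"): 3,
    ("going", "two"): 1,
}

# Groups of words that sound alike.
HOMOPHONES: List[List[str]] = [["there", "their", "they're"], ["to", "too", "two"]]


def pick_homophone(previous: str, heard: str) -> str:
    """Return the member of heard's homophone group that most often follows `previous`."""
    for group in HOMOPHONES:
        if heard in group:
            # Ties (e.g. unseen contexts) resolve to the first word in the group.
            return max(group, key=lambda w: BIGRAM_COUNTS.get((previous, w), 0))
    return heard  # not a known homophone: keep what was heard
```

With these counts, "over" + "their" is corrected to "over there", while "lost" + "there" becomes "lost their".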
Speech Recognition (3/3)
- What It Does Not
- Accept more than one person talking at the same time
- "Understand"
- Think / Create ideas
- Organize
- Replace a secretary
Categories of speech recognition software (1/2)
- Continuous speech, which means speaking words
without pauses in between. It's not quite
"natural," but it's close.
- Natural user interface, with features such as
natural-language commands. Instead of using a set
of specified commands, you would say what you
want, and the computer would take the appropriate
action. Today's programs aren't fully "natural"
yet, especially since they usually let you be
"natural" only in certain applications.
- Short training period: software makers are
pursuing what is known as "speaker independence."
The hope is that someday you'll be able to sit
down at an unfamiliar computer and tell it what to
do, or record somebody and have the computer do
the transcribing.
- Discrete speech dictation (pause between words)
Categories of speech recognition software (2/2)
- Programs geared toward specific tasks
- Speech-enabled PC apps recognize commands
- VoicePilot
- EasyVoice
- ASR (Automatic Speech Recognition)
- Platforms
- Windows
- Macintosh
- Unix
- OS/2 (comes with IBM's discrete speech engine)
Popular Speech Recognition Products
Talk To Me!
Dragon NaturallySpeaking 7 is the most accurate
and full-featured Dragon NaturallySpeaking ever
released! Accuracy up to 99%! 15% accuracy
improvement. Breakthroughs in speech engine
technology deliver the largest single accuracy
improvement ever for a Dragon NaturallySpeaking
release.
PC Magazine, May 2003:
"ScanSoft's Dragon NaturallySpeaking Preferred
7 makes dictation, correction, and voice control
of your PC faster and easier than any voice
recognition software yet." "...the new
auto-punctuation option worked admirably at
adding commas and periods to our dictations; it
should be ideal for casual dictation such as
e-mail or online chat."
ViaVoice Characteristics
IBM ViaVoice technology, available on the
Windows, Macintosh and handheld computer
platforms, can provide a 'multi-modal'
environment, freeing users from dependence on the
mouse, keyboard and stylus for many
applications. ViaVoice personal computer
software leverages generations of IBM voice
recognition research and accomplishment. ViaVoice
for Windows Release 10 product family offers a
complete portfolio appealing to every level of
user expertise, and our ViaVoice for Mac
offerings were the first continuous speech
products on the Apple Macintosh platforms in the
consumer marketplace.
- Windows products
- Pro USB Edition: flagship edition, featuring a
digitally enhanced stereo headset microphone
- Advanced Edition: productivity tool with new
command and control features
- Standard Edition: great dictation accuracy for
the home/home office
- Personal Edition: introduction to natural,
continuous speech recognition on the PC
- Macintosh products
- ViaVoice for Mac OS X Edition: with the sleek
"Aqua" look and feel
- Simply Dictation for Mac OS X: introduction to
dictation on the Mac
ViaVoice Pro USB (Windows)
- System Requirements (2003)
- Processor > 300 MHz, RAM > 128 MB
- 500 MB available hard drive space
- Sound card with microphone jack, or USB port
- CD-ROM drive (for installation)
- Windows 98 SE, Me, or XP
- MS Office for direct input, or any word processor
with access to the clipboard (copy/paste)
- MSRP $200 incl. headset ($100 upgrade)
Components / Prerequisites (1996)
- Hardware
- Pentium/100 MHz processor, 24 MB RAM
- Any sound card
- Software
- IBM's OS/2 WARP 4.0 ($189), which included
- OS/2 Speech Recognition SW
- Headset microphone with ANC
- IBM's "Simply Speaking" for Windows 95 ($600)
- Any word processor or editor with access to the
clipboard (copy/paste)
Components / Prerequisites (1994)
- Hardware
- 486/33 MHz processor, 65 MB disk space
- IBM VoiceType Dictation adapter (ISA, Micro
Channel, or PCMCIA)
- Unidirectional microphone
- Powered speakers or headphones
- Software
- IBM VoiceType Dictation Program Product
- Any word processor or editor with access to the
clipboard (copy/paste)
- Price: $1,000.00 for VTD hardware and software
ViaVoice Setup
- Installation
- SW installation
- Registration of each user / language
- Training
- Human: about 90 min reading predefined texts
- Computer: about 30 min processing of the
personal language voice model
ViaVoice Installation Experience (January 2005)
- Installation of two languages; both must be the
same version (USB Pro)
- 560 MB hard drive space
- Headset on phone jacks or USB
- Check audio levels and record sample texts to
build voice models
- First dictation: many errors; the voice model
needs improvement
- Check your voice, drink water
- Correction process is tedious at first
- Web advice: use the special SpeakPad (not Word)
with the correction window open
- Learn how to use the Correction Window
- File (save) sessions to give the program a chance
to improve the model
- Some idiosyncrasies, e.g. OPEN-QUOTE and
CLOSE-QUOTE
- Analyze existing documents to add specific words
to the vocabulary (only .doc and .txt are
supported, not even IBM Lotus's own WordPro .lwp)
- Manage vocabulary: OK
Voice Recognition Sites: Most Popular (Yahoo)
- Lernout & Hauspie: provider of speech and language
products, technologies, and services, including
speech recognition, text to speech, compression,
and translation.
- Dragon NaturallySpeaking: family of software
products that turn speech into text.
- Nuance Communications: provides enterprise-level
speech recognition and speaker verification
software to automate v-commerce and
communications transactions.
- General Magic: voice infrastructure software
company that provides enterprise-class software
and supporting voice dialog design and hosting
services.
- SpeechWorks International: provider of speech
recognition, text-to-speech (TTS), and speaker
verification for network and embedded
environments.
- Philips Speech Processing: large-vocabulary
continuous speech recognition products for PCs;
also digital dictation devices and solutions for
the medical and legal areas.
- Sensory, Inc.: low-cost integrated circuit
providing speech recognition, speech synthesis,
music synthesis, and an 8-bit microcontroller.
- IBM Voice Systems: offering the ViaVoice line of
speech recognition software.
- Fonix Corporation: voice recognition technology
featuring automatic speech recognition (ASR)
using neural network (artificial intelligence)
techniques.
- Conversá: develops speech-enabled software and
hardware that allows users a conversational way
of interacting with their computers.
http://www.out-loud.com/
This site is intended to help people using speech
recognition software, whatever the variety, and
to do so without the filters of vendors. We have
our own filters, of course, so please read
critically. By Susan Fulton, longtime user of
speech recognition and assorted gadgets for
easier, less painful computing.
http://www.voicerecognition.net/
- List established in January 1996 for discussing
all aspects of using voice recognition input
systems. The focus is on effective use of voice
recognition.
- Sample topics
- Using such systems safely, without muscle or
voice strain
- Techniques for improving recognition accuracy
- How to set up the physical voice workstation
optimally
- General tips for effective use of voice
interfaces
- Configuration of specific systems,
troubleshooting, etc.
Voice Recognition Dead End?
"You've heard of killer apps? How about an app
killer? This is what voice recognition has become
over the years, because for the most part, it
doesn't work. I see no evidence that it ever
will, at least not in the sense that we can
achieve true voice dictation capability. ..."
John C. Dvorak, January 25, 2002
Using Speech Recognition
- The Verdict Is In: Speech Recognition Software
Can Increase Productivity in the Legal
Environment
- The Use of Speech Technologies in Mobile
Computing
- Creating Access for People with Disabilities
through Speech and Language Technologies
- Voice Recognition Software Versus a Traditional
Transcription Service for Physician Charting in
the ED
- From RSI to ROI: Speech Recognition Can Increase
Productivity and Reduce Repetitive Stress
Injuries in Your Workplace
- Automating the Electronic Forms Process
- Leading Electronic Medical Records Program
Incorporates Dragon NaturallySpeaking Speech
Recognition
- Dragon NaturallySpeaking Speech Recognition
Benefits Not Only Doctors Who Use It, but
Patients Who Rely on Them
CONVERSATION
Q. What is one divided by zero.
A. Error in line above: malformed?
Q. What is one divided by zero?
A. Error in line above: attempting division by zero.
Q. What is one divided by one.
A. Error in line above: malformed?
Q. What is one divided by one?
A. One. Problem trivial for my capacity. Please
consult supervisor before typing in further
instructions.
Q. Why?
A. Question not quantified or answerable by "yes"
or "no." This is fourth malformed or erroneous
question typed in during your time interval.
Consult supervisor about formation of
instructions. Cost effectiveness ratio for this
machine prohibits further use by this operator
until recycling code is typed in. Sorry.
Q. Why?
A. KEYBOARD NOW BLOCKED.
ViaVoice Demonstration
Talk To Me!