[opensuse] Re: Speech-to-Text; Q's and VV backstory...



Donald D Henson wrote:
Does anyone know of an Open Source application to accept continuous speech and convert it to text? I've found a couple of proprietary apps but you have to use Voice mail as an input. Any suggestions appreciated.

Don Henson

A couple of weeks ago it was suggested that I try a product from Nuance called Dragon NaturallySpeaking. As this was a non-open-source product, I had to pay for it. Bummer. However, my problem was serious enough that I decided to go with a product that I had to pay for. I also promised the list that I would post a review after using the product for a couple of weeks. Here's the review.
------
How is it on resources, i.e. memory and CPU? The last time I used
drag-in-dict was ages ago -- before it allowed continuous speech
recognition -- but even then, after training, it was still pretty
slow.
I eventually migrated to IBM's ViaVoice Pro-USB 10.0.
Unfortunately, IBM stopped offering the product and sold or gave a
resale license to Nuance, but as far as I know, the source didn't migrate
with the resale license, and no new work has been done on it since it first
came out ~5 years ago. It was the first to offer continuous dictation --
Dragon was in financial woes at the time, and it took some 2-3 years
before they recovered and had a continuous speech product.

The thing with VVPro is that it is resource intensive. I'd
say 1GB under XP is the minimum, and a 1GHz Pentium III was
too slow to be usable. 3GB and a 2GHz Core Duo was "ok", but it
grabs onto the system input mechanism and slows down all input/output --
even when it is "asleep" or in the microphone 'off' state. I'm on
a 3GB 3.2GHz machine now, and if I'm dictating into Word, it's pretty
good for recognition and speed.

For application integration, though, IBM only added full
integration for MS Office and IE. It can blindly type text into a
non-integrated application, but that can be painful. A nice feature,
which I consider 'essential', is that when you dictate into Word or its
SpeakPad, it stores the voice sessions with the document. This allows
later re-editing in the case of Word (clicking on a word, you hear your
voice) -- and when you correct words, it 're-learns' what the word
should have been based on what you said. So the fully integrated
applications allow the speech recognizer to be trained at the same time
you are dictating -- it will learn new vocabulary and learn your nuances
of changing pronunciation.

IBM released a development pack for Linux, but nothing ever
happened with it, and it was too primitive to make use of in the
general case -- it would have required specific apps to include and
call its API -- whereas a benefit of the MS platform is that most programs go
through common APIs (though not Firefox or T-bird). About 2-3 years
ago, IBM announced their latest voice technology -- requiring no training
-- but did not announce any products with it. The "product" they
were demoing for their announcement was a foreign speech translation
program -- and specifically, the plans were to sell the product
to the US armed forces for use in the field in Iraq, where it had already
been field tested with some success to allow soldiers to communicate
and understand basic phrases in the local language.

I tried to find out more info -- and when something might
be released for consumers (at the time it was projected that something
might be available for consumers that summer, 2006). I never
heard anything after that -- but have heard occasional stories that
the tech is still being used. Purely a guess, but maybe the
military thought it worked "too well", and bought up the entire
product for military/government use only. Maybe they didn't want
such easy-to-use translation technology in the hands of possible
enemies...or maybe they just wanted to keep civilians from being able
to easily access such translation technologies.

Obviously IBM continued their voice recognition and
synthesis development, but it seems they dropped consumer level offerings
off their map -- probably selling expensive custom business and government
systems was far more profitable than trying to sell and support end
users.

Anyway -- as computers have gotten faster, their original tech
is still pretty good. It required minimal training, ~10-30 minutes.
Occasionally I still see the product for sale, but the price has not
gone down -- it was best in class and retail was $200. They sold medical-
and legal-specific vocabularies for an additional ~$200 each. No
competition or 3rd party sellers ever came into the market to reduce
the prices. Très sad.

Linda




