[opensuse] Re: Speech-to-Text; Q's and VV backstory...
- From: Linda Walsh <suse@xxxxxxxxx>
- Date: Wed, 21 May 2008 16:43:04 -0700
Donald D Henson wrote:
Does anyone know of an Open Source application to accept continuous speech and convert it to text? I've found a couple of proprietary apps but you have to use Voice mail as an input. Any suggestions appreciated.
Don Henson
A couple of weeks ago it was suggested that I try a product from Nuance called Dragon NaturallySpeaking. As this was a non-open source product, I had to pay for it. Bummer. However, my problem was serious enough that I decided to go with a product that I had to pay for. I also promised the list that I would post a review after using the product for a couple of weeks. Here's the review.
How is it on resources, i.e., memory and CPU? Last I used
drag-in-dict, it was ages ago -- before it allowed continuous speech
recognition -- but even then, after training, it was still pretty
slow.
I eventually migrated to IBM's ViaVoice Pro-USB 10.0.
Unfortunately, IBM stopped offering the product and sold or gave a
resale license to Nuance, but as far as I know, the source didn't migrate
with the resale license and no new work has been done on it since it first
came out ~ 5 years ago. It was the first to offer continuous dictation --
Dragon was in financial trouble at the time, and it took some 2-3 years
before they recovered and had a continuous speech product.
The thing with VVPro is that it is resource intensive. I'd
say 1GB under XP is the minimum to start with, and a 1GHz Pentium III was
too slow to be usable. 3GB and a 2GHz Core Duo was "ok", but it
grabs onto the system input mechanism and slows down all input/output --
even when it is "asleep" or it's in the microphone 'off' state. I'm on
a 3GB 3.2GHz machine now, and if I'm dictating into Word, it's pretty
good for recognition and speed.
For application integration, though, IBM only added full
integration for MS Office and IE. It can blindly type text into a
non-integrated application, but that can be painful. A nice feature,
which I consider 'essential', is that when you dictate into Word or its
SpeakPad, it stores the voice sessions with the document. This allows
later re-editing in the case of Word (clicking on a word, you hear your
voice) -- and when you correct words, it 're-learns' what the word
should have been based on what you said. So the fully integrated
applications allow the speech recognizer to be trained at the same time
you are dictating -- so it will learn new vocabulary and learn the nuances
of your changing pronunciation.
IBM released a development pack for Linux, but nothing ever
happened with it, and it was too primitive to make use of in the
general case -- it would have required specific apps to include and
call their API -- a benefit of the MS platform where most programs go
through common APIs (though not Firefox nor T-bird). About 2-3 years
ago, IBM announced their latest voice technology -- requiring no training
-- but did not announce any products with it. The "product" they
were demoing for their announcement was a foreign speech translation
program -- and specifically, the plans were to sell the product
to the US armed forces for use in the field in Iraq, where it had already
been field tested with some success to allow soldiers to communicate
and understand basic phrases in the local language.
I tried to find out more info -- and when something might
be released for consumers (at the time it was projected that something
might be available for consumers that summer, 2006). I never
heard anything after that -- but have heard occasional stories that
the tech is still being used. Purely a guess, but maybe the
military thought it worked "too well", and bought up the entire
product for military/government use only. Maybe they didn't want
such easy-to-use translation technology in the hands of possible
enemies...or maybe they just wanted to keep civilians from being able
to easily access such translation technologies.
Obviously IBM continued their voice recognition and
synthesis development, but it seems they dropped consumer level offerings
off their map -- probably selling expensive custom business and government
systems was far more profitable than trying to sell and support end
users.
Anyway -- as computers have gotten faster, their original tech
is still pretty good. It required minimal training, ~10-30 minutes.
Occasionally I still see the product for sale, but the price has not
gone down -- it was best in class and retail was $200. They sold medical
and legal specific vocabularies for an additional ~$200 each. No
competition or 3rd party sellers ever came into the market to reduce
the prices. Très sad.
Linda