Text to speech bleg:

Do readers have recommendations for good software for text to speech conversion? I would like to take some of the articles I have written, for which I still of course have the electronic files, and make them available in MP3 format. The project only seems worth doing if the resulting MP3 file is high quality. So what software should I investigate?

Bobo Linq (mail):
Perhaps you should investigate outsourcing this to a human in India. Machine reading is not so great.
3.2.2009 5:09pm
Colin (mail):
Soronel Haetir commented on the Kindle thread below that TTS can be better than human narration, for some reasons that hadn't occurred to me. As a regular user of such services, he'd be the person to answer this question.

For myself, I had fun with Bell Labs' TTS system. They used to have a free demo online, although it was only good for short snippets. Assuming you mean your academic articles, and that your intended use is noncommercial, perhaps they'd be willing to run some of your articles through their system for you?
3.2.2009 5:48pm
Jon Roland (mail) (www):
I recommend you read them yourself. You have a good voice and might as well use it. You might even find some changes you want to make as you read aloud and find that the sentences don't flow well when spoken.

As for software, I have tried some that work reasonably well for ordinary writings, but might not be satisfactory for the kinds of legal writing you do. The various packages mostly offer free trials, so download some and try them for yourself:

1. Natural Reader

2. ReadPlease
3. 2nd Speech Center
4. CoolSpeach and TextSound
5. Sagirus Reader
6. TextAloud
7. Wizzard
3.2.2009 5:56pm
FWB (mail):

A few years ago I purchased a really nice digital recorder (MBR-32) from The Spy Store in Seattle.

It came with a pretty reasonable software package that would read files aloud. You could then use Audacity (downloadable software) along with LAME to record and save the files to MP3 format.

Point the software at any text file and it reads the file, even pausing for periods, something many packages do not do.

3.2.2009 5:56pm
Daryl Herbert (www):
Get a microphone.

If anyone wants to listen to your articles you (or someone else) should narrate them.

There are already text-to-speech programs available with Windows and Macs (as a means of helping the handicapped). That technology will only improve over time, beating out the MP3s you want to record.

Just make sure that your articles are available in a text format (.txt, or .pdf with rendered text rather than scans of a printed page)

Or is there some reason that you want to distribute the sound but not the text?
3.2.2009 5:59pm
Soronel Haetir (mail):
I'm not sure about software to do the actual conversion; the software I use doesn't save the sound files. The voices, however, are the important part.

I have been very happy with RealSpeak Emily and Daniel. Voice packages (at least for Windows) are also inexpensive. The TTS software that I use is not inexpensive (over $1k) but I would expect what you are looking for to be much less expensive.

There are also free packages, Festival being one that comes immediately to mind. The voice quality, however, is much lower, and afaik it is unable to use the voice packages available with Windows.

One thing to note about the RealSpeak voices: there are certain miscues (for example, words with -che get pronounced as in "panache" rather than "Apache"). Other natural-sounding voices have their own sets of miscues; that's just one that sticks out with my choice. More mechanical voices tend not to have that problem. Also, all TTS that I'm aware of has problems distinguishing different-sounding words with the same spelling: "minute quantity" vs. "wait a minute". Those things just drop out of notice after a while, but would likely confuse people who just wanted to access the paper.

Also note that if your paper is distributed as PDF, it may already work with Acrobat Reader's Read-Out-Loud feature which can already take advantage of any voices the user happens to have installed. Read-out-loud can however be disabled by the publisher, much as with the Kindle discussion. It also doesn't work with PDFs consisting of scanned images of pages (a format I also detest).
3.2.2009 6:04pm
Sean O'Hara (mail) (www):
I definitely recommend recording it yourself -- text-to-speech is okay if it's the only way to generate audio, but human-read is superior. All you need is a $20 USB mic (you don't want one with a standard audio jack), and the open source program Audacity. Generally 10,000 words of text will come out to one hour of audio, with a 4:1 ratio of how long it takes to record and edit to the length of the final product.
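
As a rough illustration of that rule of thumb, the arithmetic can be sketched in a few lines of Python. The function name and the 25,000-word example are made up for illustration; the two default figures are simply the ones quoted above, and real speaking rates will vary.

```python
def estimate_effort(word_count, words_per_audio_hour=10_000, work_ratio=4.0):
    """Return (audio_hours, production_hours) for a text of word_count words.

    Defaults follow the rule of thumb quoted above: about 10,000 words per
    hour of finished audio, and about 4 hours of recording and editing per
    hour of finished audio.
    """
    audio_hours = word_count / words_per_audio_hour
    production_hours = audio_hours * work_ratio
    return audio_hours, production_hours


if __name__ == "__main__":
    # Hypothetical 25,000-word article.
    audio, work = estimate_effort(25_000)
    print(f"{audio:.1f} h of audio, roughly {work:.1f} h to record and edit")
```

So a 25,000-word article would come to about two and a half hours of audio and something like ten hours of work, if the estimate holds.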

Librivox, which is a group that creates public domain audiobooks, has detailed information on how to record and edit effectively.
3.2.2009 6:39pm
Jon Roland (mail) (www):
One thing that you might want to consider is how a TTS tool is going to handle footnotes. Most of them will see the exponent and pronounce it, but that may or may not be desirable for your purposes. If you read it yourself you can skip over them for a smooth reading.
3.2.2009 8:45pm
Soronel Haetir (mail):
Worse than the in-line foot note markers are the foot notes themselves. It is extremely difficult keeping the various audio streams sorted out in-head. Especially foot notes that continue from one page to the next. It would almost be preferable in such cases if the foot-note were actually read in-line with the marker rather than following page layout. Either that or save all foot notes until the end of document. It is extremely disconcerting having unrelated text break apart a page-spanning sentence.
3.2.2009 9:40pm
Either hire someone to read them, or read them yourself. Even the best text-to-speech that I've heard still is not particularly pleasant to listen to.

Just get a microphone (many low cost options out there) and use Audacity (open source, Mac or PC) for recording. It's very nice, and pretty easy to use. Makes great MP3s and many podcasts are produced with it.
3.2.2009 10:12pm
Jon Roland (mail) (www):
Soronel Haetir:

save all foot notes until the end of document.

That is one of several reasons why we convert footnotes to endnotes in our online Liberty Library. I recommend that to all authors of scholarly writings.
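
For anyone who wants to try this on plain text before feeding it to a TTS program, here is a minimal sketch in Python. It is not the process used for the Liberty Library; it assumes a hypothetical layout in which each page is a separate string and any footnotes on that page follow a line of five hyphens. It does not try to stitch together a footnote that continues across pages.

```python
def footnotes_to_endnotes(pages, separator="-----"):
    """Join page bodies into one running text and move footnotes to the end.

    Assumes (hypothetically) that each page is a string whose footnotes,
    if any, follow a line consisting only of `separator`.
    """
    bodies, notes = [], []
    for page in pages:
        # Split the page into body text and the footnote block, if present.
        body, found, footer = page.partition("\n" + separator + "\n")
        bodies.append(body.strip())
        if found:
            notes.extend(line.strip() for line in footer.splitlines() if line.strip())
    # Joining the bodies keeps a page-spanning sentence in one piece.
    text = " ".join(bodies)
    if notes:
        text += "\n\nNotes\n" + "\n".join(notes)
    return text


if __name__ == "__main__":
    sample = [
        "The court held [1] that the\n-----\n[1] See Smith v. Jones.",
        "statute applied [2].\n-----\n[2] Ibid.",
    ]
    print(footnotes_to_endnotes(sample))
```

The point of the design is only that the sentence split across the page break is read straight through, with the notes collected after the main text.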
3.2.2009 10:47pm
Dave, read and record the articles yourself. The inflection and (I daresay) passion will be vastly better, and thus the verbal product more interesting to your intended audience.

I know many people who enjoy narrated "books on tape" (or nowadays, iPod) during long drives or flights. Don't know any who enjoy machine-generated text-to-speech, which pretty much anybody can have on their PC or netbook computer with very little trouble.
3.2.2009 11:16pm
Grover Gardner (mail):
Could you use your computer's built-in speech recognition capability along with an audio-capture utility?
3.2.2009 11:16pm
Soronel Haetir (mail):

I am one of those people who prefer machine-generated speech. The downside is that, because many publishers wish to sell audio books as well, the computer-formatted ebooks are locked against TTS. (This is true for nearly every PDF and MS Reader e-book I've come across.) Like I said in one of the Kindle threads, however, I feel no ethical qualms about breaking such encryption, or downloading text files of books I already own in order to be able to access them.


Speech recognition is even more problematic than optical character recognition. Also, I'm not sure what you would hope to achieve via the method you outline, especially when the document is already in an electronic format.
3.2.2009 11:24pm
Chimaxx (mail):
I'd suggest just using the speech synthesizer built into your Mac. It can produce speech out of almost any text-based file, and if you slow it down a tiny bit from its default speed, the new "Alex" voice produces results that are quite good.

Audio Hijack Pro can capture the resulting speech directly to an MP3 file like this one.

[Hear this exchange.]
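
A different route to the same result, for those without Audio Hijack Pro, is the `say` command-line tool that ships with the Mac, followed by an MP3 encoder. The sketch below is Python that only builds and prints the two command lines. It assumes the `lame` encoder is installed separately, and the file names and the rate of 160 words per minute are made-up example values, not recommendations.

```python
import shlex


def build_commands(text_path, stem, voice="Alex", words_per_minute=160):
    """Return two command lines: synthesize text to AIFF, then encode to MP3.

    Assumes the macOS `say` tool and a separately installed `lame` encoder.
    The default rate is an illustrative guess at "a tiny bit slower".
    """
    aiff, mp3 = f"{stem}.aiff", f"{stem}.mp3"
    say_cmd = ["say", "-v", voice, "-r", str(words_per_minute),
               "-f", text_path, "-o", aiff]
    lame_cmd = ["lame", aiff, mp3]
    return say_cmd, lame_cmd


if __name__ == "__main__":
    # Hypothetical input file; print the commands rather than running them.
    for cmd in build_commands("article.txt", "article"):
        print(shlex.join(cmd))
```

To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)` on a Mac.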
3.3.2009 3:30am
Soronel Haetir (mail):

As I've said on a couple of these threads, the speed problem is on the too-slow end. As I grew used to using TTS, after my vision degraded to where TTS is required for my access to computers, I continually bumped up the speed. I now have my software set to 100% on speed and wouldn't mind if it could go still faster.

This is not to say that it's impossible to set it too fast: Acrobat's Read Out Loud uses its own speed setting, and it can be set so high that entire words drop out of the generated speech.
3.3.2009 9:23am
_Jon (mail) (www):
Chimaxx - great job converting and hosting that. But I think it provides motivation in the other direction. If that is 'good', I wouldn't want to listen to that for a few hours and try to distinguish the legal terms.
3.3.2009 10:03am
Chimaxx (mail):

I guess the question is: Are you making the recording for those with sight problems who are used to TTS, or for the occasional listener, who is listening while doing something else--commuting, driving, cooking?
3.3.2009 11:50am
Clayton E. Cramer (mail) (www):
Machine voice translation almost makes an East Indian accent seem preferable!

You should definitely pay someone to record these, with a good voice, who is familiar with your work, and the issues in question. :-)
3.3.2009 1:30pm
Eric Brown (mail):
I used to work on a project that did text-to-speech conversion, and, unless the technology has significantly improved in the last couple of years, machine generated speech is only tolerable in small doses (e.g., for voice prompts).

Most text-to-speech engines do not have a very good sense of vocal rhythm (aka prosody). This means that for any significant amount of text (more than a paragraph or so), the computer voice is very flat and monotone, and when I listen to it, I quickly lose track of what the computer is saying.

I emphatically recommend that, for any sort of document, you hire a grad student or the like to read the document out loud in front of a microphone.

You'll be glad you did.
3.4.2009 1:01pm
