Technology for converting text books to MP3 books

It is obviously tedious to read books into a microphone in order to generate MP3 books ready for people to download.

Fortunately, modern technology can come to our aid. A computer can hold a feature-set vocabulary of spoken words associated with a dictionary of words, and can thus convert a text file into speech by joining the word sounds end to end with appropriate gaps and pauses. Each word need only be recorded once, even though it may be used many hundreds of times in a complete book.
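The joining of word sounds described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recorded words are stood in for by short lists of sample values, and the names `SAMPLE_RATE`, `silence` and `text_to_samples`, along with the gap and pause lengths, are made up for the example rather than taken from any real text-to-speech product.

```python
import re

SAMPLE_RATE = 8000  # samples per second (assumed value, for illustration only)

def silence(milliseconds):
    """Return a run of zero-valued samples lasting the given time."""
    return [0] * (SAMPLE_RATE * milliseconds // 1000)

def text_to_samples(text, vocabulary, word_gap_ms=50, sentence_pause_ms=400):
    """Join pre-recorded word sounds end to end with gaps and pauses.

    `vocabulary` maps each lower-case word to its recording, stored once
    however many times the word appears in the text.
    """
    samples = []
    # Pick out words and sentence-ending punctuation marks, in order.
    for token in re.findall(r"[A-Za-z']+|[.!?]", text):
        if token in ".!?":
            samples.extend(silence(sentence_pause_ms))  # longer pause between sentences
        else:
            # A KeyError here means the word was never recorded.
            samples.extend(vocabulary[token.lower()])
            samples.extend(silence(word_gap_ms))  # short gap between words
    return samples
```

Calling `text_to_samples("The cat.", {"the": [1, 2], "cat": [3]})` returns the two recordings separated by short runs of silence and followed by one longer run. The result is exactly the stilted, word-by-word speech that a real system would then need to improve on.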

The process sounds simple, and it is possible to produce stilted speech without too much difficulty. This is not good enough, however, because a human reader shapes their speech into natural-sounding sentences, so that the start and end of each phrase and sentence come as expected. Reproducing this requires some quite clever analysis of the original text. It helps if the original text is well written, with good spelling, grammar and punctuation!
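As a rough idea of the simplest form such analysis could take, the sketch below uses nothing but punctuation to decide where each phrase ends, how long to pause, and whether the voice should rise or fall. The `PROSODY` table, its pause lengths and the `mark_phrases` function are all assumptions made for this example; a real system would analyse grammar and meaning far more deeply.

```python
import re

# Hypothetical rules: closing pause (ms) and pitch movement for each punctuation mark.
PROSODY = {
    ",": (200, "level"),
    ";": (300, "level"),
    ".": (500, "fall"),
    "!": (500, "fall"),
    "?": (500, "rise"),
}

def mark_phrases(text):
    """Split text into phrases, each tagged with a closing pause and pitch contour.

    Text after the last punctuation mark is ignored, which is one reason
    well-punctuated source text helps.
    """
    phrases = []
    # Each match is a run of ordinary text followed by one punctuation mark.
    for words, mark in re.findall(r"([^,;.!?]+)([,;.!?])", text):
        pause_ms, contour = PROSODY[mark]
        phrases.append(
            {"words": words.split(), "pause_ms": pause_ms, "contour": contour}
        )
    return phrases
```

For `"Well, is it raining? Yes."` this yields three phrases: a short level pause after "Well", a rising question, and a falling final statement.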

The more sophisticated text-to-speech systems allow for alternative feature-set voices, so if you prefer a female speaker with a Scottish accent you can choose that; alternatively, you can have an American male speaker with a strong southern accent. Add a bit of computer language translation, and a simple low bit rate text feed in the Queen's most perfect and correctly spelled English can be heard by everyone on the planet in their own lingo.

Since the same script-coding technique can be applied to video image transmission, this will make it possible to distribute whole audio-visual TV programmes using very low bit rates, like 64 kbit/s composite. Prior to the programme transmission, global feature-set keys will be sent defining the male/female/Scottish accent for the audio, plus the body dimensions, clothing, environment description etc. for the video.

In case you don't get the message, this means a new script-coding TV system of the future, with an ultra-low bit rate and with image and sound quality defined by the viewer's equipment and budget. Very low data rates are achieved by sending only the script coding, which is then expanded at the customer end to generate sound and vision suited to the consumer.

Script coding is the equivalent of sending out the script of a play to many theatres. One recipient may have a cheap hand-held TV and will see a small image and hear the sound, while a wealthy recipient with expensive hardware will see large-screen high-definition video with beautiful people and hi-fi sound, all to suit their personal desires. Talk about a complete shock for Logie Baird. Imagine millions of people worldwide watching a single transmission of some rubbish soap programme at 64 kbit/s, but with every viewer seeing different images and hearing different sound, in qualities and with feature sets to suit their local fantasies. Logie really should have used a photocopier and just sent the text script to everyone instead.
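The script-coding idea can be illustrated with a toy Python sketch in which one script and one set of feature-set keys go out to everyone, and each receiver expands them in its own way. Everything here is hypothetical: the `expand_script` function, the dictionary field names and the text "rendering" merely stand in for the real sound and vision generation, which no such short example could attempt.

```python
def expand_script(script, feature_keys, receiver):
    """Expand a broadcast text script locally into a description of what is shown.

    `feature_keys` are the global keys sent once before the programme.
    `receiver` describes the viewer's own equipment and preferences.
    """
    # Local preferences override the broadcast defaults.
    keys = {**feature_keys, **receiver["preferences"]}
    rendered = []
    for line in script:
        rendered.append(
            f"{receiver['screen']} | {keys['voice']} voice, "
            f"{keys['accent']} accent | {line}"
        )
    return rendered

# One transmission: the keys first, then the script itself.
feature_keys = {"voice": "female", "accent": "Scottish"}
script = ["Hello."]

# Two viewers with different hardware and tastes (made-up examples).
handheld = {"screen": "small screen", "preferences": {}}
cinema = {"screen": "large HD screen", "preferences": {"accent": "American"}}

for viewer in (handheld, cinema):
    print(expand_script(script, feature_keys, viewer))
```

Both viewers receive the same few bytes of text, yet the first gets a Scottish voice on a small screen and the second an American voice on a large one. The quality and style are decided entirely at the receiving end.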

Anyway, to get back to some immediate reality, here is a useful link:

TextAloud™ reads text from email, web pages etc. aloud through your PC loudspeakers. The text-to-speech software may be downloaded from https://www.textaloud.com/ Try the free demo before you buy.

► Page created 24 Aug 2005, amended 10 Oct 2020