An Algorithm That Converts Text To Speech "With Emotion And Feeling"

15.ai describes itself as a “high-fidelity, natural and emotive text-to-speech synthesis system with as little data as possible.” You can try it by typing any short text (or clicking the Random icon) and playing the three different versions it generates, each with a certain degree of confidence. AI technologies are developing by leaps and bounds. Below we tell you a little about 15.ai and about Woord, an API that uses the same technology in a simpler way, so that you can generate the dialogs you want.

15.ai AKA DeepThroat: An Algorithm That Converts Text To Speech “With Emotion And Feeling”

The most interesting thing about 15.ai (also known as DeepThroat) is how it works. Its “learning” process starts by listening to a few minutes of samples of various characters' voices and then imitating them in its output. This way, it can easily pass itself off as different characters: HAL 9000, the Portal robot, or Doctor Who himself, with his characteristic British accent.

Among the techniques that 15.ai uses are splitting words into phonemes and “feeling detection” with DeepMoji. In fact, I found it especially nice that from a few sentences it extracts the most appropriate emoticons: sad, angry, or funny faces, a thumbs up, or a flexed biceps for “strength!” and things like that.
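To make those two steps a little more concrete, here is a minimal Python sketch. It is not 15.ai's code: the tiny pronunciation dictionary and the keyword lists are hypothetical stand-ins, and the keyword matching only imitates the kind of label a neural model such as DeepMoji would predict.

```python
import re

# Hypothetical mini pronunciation dictionary (ARPAbet-style symbols).
# A real system would use a full lexicon plus a learned fallback model.
PHONEMES = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
    "i": ["AY"],
    "am": ["AE", "M"],
    "sad": ["S", "AE", "D"],
}

# Hypothetical keyword cues standing in for a neural emotion model.
EMOTION_CUES = {
    "sad": {"sad", "sorry", "miss", "cry"},
    "angry": {"angry", "hate", "furious"},
    "happy": {"happy", "great", "love", "fun"},
}


def tokenize(text):
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def to_phonemes(text):
    """Map each word to phonemes; unknown words fall back to their letters."""
    return [PHONEMES.get(word, list(word.upper())) for word in tokenize(text)]


def guess_emotion(text):
    """Pick the label with the most keyword hits, or 'neutral' if none match."""
    words = set(tokenize(text))
    scores = {label: len(words & cues) for label, cues in EMOTION_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


print(to_phonemes("Hello world"))
print(guess_emotion("I am sad"))
```

In a real pipeline the phoneme sequence and the predicted emotion would both be fed to the speech model, which is what lets the same sentence come out sounding different depending on its content.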

This is way above Voicemaster-type gizmos and is said by its creators to be far superior to other text-to-speech systems when it comes to emotion and sentiment. Personally, I found the pauses and rhythm interesting; it is certainly noticeable that, depending on the content of the text, the result is “interpreted” according to what is being conveyed. It does this better than some actors and actresses.

What Are Text-To-Speech Generators?

Text-to-speech (TTS) works on almost all personal digital devices, including computers, smartphones, and tablets. It is already part of our daily lives in technologies such as Google Translate. All types of text files can be converted and read aloud with this type of technology, including Word documents, PDF files, and web pages. Some TTS software tools also include optical character recognition (OCR), which allows the AI to read the text present in images.

The voice is computer-generated, and the reading speed can usually be increased or decreased. You can also usually choose the gender of the AI’s voice. Voice quality varies, but some TTS tools have very realistic-sounding voices. Many text-to-speech tools can highlight words as the AI reads them aloud.
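As a rough illustration of how reading speed and word highlighting relate, the Python sketch below estimates when each word would start being spoken. It is a deliberately naive assumption that every word takes the same amount of time; real engines report actual timings from the synthesized audio. The function name and its default of 150 words per minute are hypothetical.

```python
def word_timings(text, words_per_minute=150.0, speed=1.0):
    """Return (word, start_seconds) pairs, assuming each word takes equal time.

    `speed` is a multiplier: 2.0 reads twice as fast, 0.5 half as fast.
    """
    if words_per_minute <= 0 or speed <= 0:
        raise ValueError("words_per_minute and speed must be positive")
    seconds_per_word = 60.0 / (words_per_minute * speed)
    return [(word, i * seconds_per_word) for i, word in enumerate(text.split())]


# A highlighter would mark each word when playback reaches its start time.
for word, start in word_timings("The voice is computer generated", speed=1.5):
    print(f"{start:5.2f}s  {word}")
```

Doubling the speed halves every start time, which is why the highlighting stays in step with the audio when the listener changes the reading rate.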

Woord: A Free AI Voice Generator

Woord is our top pick because it’s a feature-rich program with a simple, easy interface that reads text in virtually any format (pdf, txt, doc(x), pages, odt, ppt(x), ods, DRM-free epub, jpg, and png) and converts it into an MP3 file. It has a completely free version and can be downloaded on both desktop computers and cell phones, as well as used online. Something interesting about this program is that it optimizes the MP3 audio file for the device you want to play it on; you just have to “tell” it what type of device that will be.

Woord’s AI allows you to generate realistic voices in more than 20 languages. In addition, it offers different dialects within those languages and a variety of genders. The free version offers high-quality voices, an extension for Google Chrome, and free download of audio in MP3 format. Likewise, all versions of this SaaS have an SSML editor. Woord’s Premium versions, in turn, allow you to modify the text to add pauses, sounds, and tones more precisely. We recommend exploring the paid versions of Woord if you are going to use it for business purposes, as they are low-cost and grant you 100% ownership of intellectual property for all files.
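For readers who have not seen SSML before, here is a small Python sketch of the kind of markup an SSML editor produces. The <speak>, <break>, and <prosody> elements come from the W3C SSML standard; whether Woord's editor accepts exactly these tags and values is an assumption, and build_ssml is a hypothetical helper, not part of any Woord API.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences, pause_ms=500, rate="medium"):
    """Join sentences with fixed pauses and wrap them in a speaking-rate tag."""
    if pause_ms < 0:
        raise ValueError("pause_ms must not be negative")
    pause = f'<break time="{int(pause_ms)}ms"/>'
    # escape() protects characters such as & and < that would break the XML.
    body = pause.join(escape(sentence) for sentence in sentences)
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


print(build_ssml(["Welcome back.", "Here is today's summary."], pause_ms=300, rate="slow"))
```

The resulting string can be pasted into any tool that accepts standard SSML; the pause length and the rate are the two knobs that most directly change how natural the reading sounds.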

If you want to know more about Woord…

How To Use Woord’s SSML Editor

How To Adjust Your Audios With SSML Editor

Also published on Medium.
