Text-to-speech (TTS) technology is used for converting text information into audio format for voice-enabled services. First it transcribes text input (e.g., data from word processors, standard ASCII from emails, mobile text messages, scanned texts from newspapers, etc.) into phonetic representation (high-level synthesis), and then it generates speech waveforms from the phonetic and prosodic information (low-level synthesis).
Speech-to-speech voice conversion is a newer generation technology, related to those used for the latest text-to-speech systems. It can generate more dynamic and emotion-laden output - and so it provides a better approximation of the complexities inherent in human speech.
In fact, this is, in a nutshell, Respeecher explained. We use proprietary deep learning (artificial intelligence) techniques together with classical digital signal processing algorithms for the production of top quality synthetic voices. By doing this, we allow the creation and scaling of content for film and TV, video games, podcast and audiobook production, and more.
Our mission at Respeecher is to give film & TV creators the opportunity to innovate when it comes to using voices in creative ways. We aim to help filmmakers and TV producers gain more control over the audio part of their projects.
How, you ask? By resurrecting voices from the past, using voices of famous actors who can’t be present at the recording site, recording kids’ voices more easily even after they’ve grown up.
We do this by cloning human speech, which gives us the opportunity to swap voices without the robotic touch currently shared by many video games or auto navigation systems underpinned by text-to-speech technologies.
We are fully aware that voice cloning can be dangerous if used deceptively, i.e., fooling people into believing someone said something they didn't. Indeed, there are strong ethical concerns regarding the use of speech synthesis software for content creators. We are committed to a set of ethical principles, such as not using voices of private persons without permission, in order to protect subjects’ privacy.
This principle is substantiated by our requirement of written consent from voice owners. What we do allow are non-deceptive uses of the voices of historical figures - for example, the use of president Nixon’s voice for the ‘In Event of Moon Disaster’ project we created in collaboration with MIT. Additionally, in order to easily distinguish Respeecher-generated content from other content, we are developing a unique audio watermark.
Respeecher technology transcends the limits of text-to-speech solutions, and is thus better suited for more complex projects. It not only minimises the lack of emotions in voices, but it also creates voices that capture subtle nuances such as humming and giggling, and better deals with the pronunciation of unusual or foreign words.
We use state of the art technology to revolutionize movies and other creative projects, that is, we develop speech synthesis software for film & TV creators. However, in the wrong hands, voice conversion technology can be used maliciously. We do not allow any deceptive uses of our technology, nor do we use voices without permission when this could impact the privacy of the subject or their ability to make a living. This means we will never use the voice of a private person or an actor without permission.
If you’d like to find out more about our technology or if you have any questions, please reach out through our website. We’d love to hear from you. (Let me tell you a little secret, too. We’re going to launch the first ever voice marketplace soon. Subscribe to our newsletter to be among the first to test it.)
30.09.2020 | Bruno's blog