Provide additional control over how Overdub generates speech from the text (e.g., using SSML tags). 🤖
Frameworks
Please extend Overdub to support the easy-to-use Speech Synthesis Markup Language (SSML) tags for cases where users need additional control over how Overdub generates speech from text. So as not to detract from readability, the visibility of SSML tags should be toggled by a keyboard shortcut and/or a menu option in Descript (as Pascal commented here).
TL;DR VERSION:
While I understand that Overdub Styles are intended to give Descript users some control over how their text is rendered as Overdub speech, its implementation is opaque, imprecise, and cumbersome to apply and manage. Rather than reinventing the wheel, supporting standardized SSML tags in Overdub would let Descript users easily see and precisely apply their desired prosody/intonation, tonality/pitch, pause duration, emphasis, and phonetic pronunciation, as well as have Overdub read out the individual letters of a word or the digits of a number, as exemplified below.
SSML TAG EXAMPLES
- To insert a 3-second pause in the speech, simply insert the markup <break time="3s"/> at the corresponding point in the text.
- To read the individual digits of a telephone number such as 555-123-4567, wrap the number with <say-as interpret-as="digits">555-123-4567</say-as>.
- To emphasize a word or phrase such as "truth to power", simply wrap the text in <prosody volume="x-loud">truth to power</prosody> (SSML also provides a dedicated <emphasis level="strong">...</emphasis> element).
- To lower or raise its pitch, wrap the text in <prosody pitch="low">truth to power</prosody> or <prosody pitch="high">truth to power</prosody>. Similarly, the prosody attribute rate changes the speaking rate, and the prosody attribute volume adjusts the speaking volume.
- To read out the individual letters of the word "consensus", wrap the word with <say-as interpret-as='spell-out'>consensus</say-as>.
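As an illustration of how the tags above fit together, here is a minimal Python sketch (not part of Descript or any vendor SDK) that XML-escapes plain text and assembles the examples into one document under a <speak> root element, which engines such as Alexa, Polly and Google Assistant expect. The helper names (pause, say_as, prosody, speak) are hypothetical, and the accepted interpret-as values (such as digits and spell-out) vary from engine to engine.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(seconds: float) -> str:
    """Return an SSML <break> element for a pause of the given length."""
    return f'<break time="{seconds:g}s"/>'


def say_as(text: str, interpret_as: str) -> str:
    """Wrap text in <say-as>; valid interpret-as values differ per engine."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def prosody(text: str, **attrs: str) -> str:
    """Wrap text in <prosody> with attributes such as pitch, rate or volume."""
    attr_str = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<prosody {attr_str}>{escape(text)}</prosody>"


def speak(*parts: str) -> str:
    """Join already-built fragments under the <speak> root element."""
    return "<speak>" + " ".join(parts) + "</speak>"


# Combine the tag examples from the list above into one SSML document.
ssml = speak(
    escape("Call us at"),
    say_as("555-123-4567", "digits"),
    pause(3),
    prosody("truth to power", volume="x-loud", pitch="low"),
    escape("Spell it:"),
    say_as("consensus", "spell-out"),
)
print(ssml)
```

Escaping the plain-text parts matters: an unescaped & or < in a script would otherwise make the whole SSML document invalid.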
For more details on using SSML, please see Amazon's SSML references for Alexa here and for Polly here; for Google Assistant, Apple's Siri, et al., see here.
NOTES
- To help other readers understand how SSML tags could be employed in Descript's Overdub, Google has provided lots of excellent examples to test drive here: https://developers.google.com/assistant/conversational/df-asdk/ssml
- Once SSML is implemented, I would recommend that you license Descript's services for general, global use in Alexa, Polly, Siri, Cortana, etc., so that we can all send speech, in our own voices, by text to friends, family, ... the World (even after we're gone ;-).
- This request would address issues indicated by these 29 users: "Marquis Miller", "Alessandro", "Nick Ritter", "Stephen Massey", "daytona", "Adam Knee", "Mercedes Rothwell", "Roxana Stratila", "Nosson Weissman", "Steve Steve", "Mark Sobrepena", "Laura Baiardi", "Hargitai Henrik", "Umesh Kumar", "Mark Bramhill", "Chad Pennycuff", "Aemyn Connolly", "Matt Neputin [Mateusz Peplinski]", "Marek Basler", "Akshay Raj", "Jim McKeeth", "Harry Hawk", "Tau Lukos", "Kweli Kush", "Lee Schneider", "Kat Lind", "Podcast Advocate", "David Swaddle", "Pascal".
Please see the following 12 related feature requests:
- Expressions and Tonality (Overdub)
- Pitch Correction
- Emotional Prosody
- Support SSML export format
- Ability to use foreign languages
- Edit word gap duration from the script
- Control of Overdub pauses and emphasis
- Overdub: Add emphasis
- Add phonetic pronunciations support for Overdub
- '/' gets read as 'Divided by' in overdub?
- Provide a flexible way to increase the pauses between words and sentences
- SSML Input
Carl Bartlett
As others have mentioned, this is critical for large projects. I need to be able to adjust audio, in text format, for large sections: change the speaker, add pauses (with length), adjust the tempo and volume, and add audio effects to sections of text. This would allow editing in other applications, where it can be done quickly. If I want to add a pause after every instance of a specific word, it is simple to search and replace in a text editor, but in Descript it is cumbersome and tedious!
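The search-and-replace workflow described above maps directly onto SSML: a pause after a given word is just a <break> element inserted after each match. A minimal Python sketch, assuming the script is plain text that will be sent to an SSML-capable engine (pause_after_word is a hypothetical helper, not a Descript feature):

```python
import re
from xml.sax.saxutils import escape


def pause_after_word(text: str, word: str, seconds: float = 1.0) -> str:
    """Insert an SSML <break> after every whole-word match of `word`.

    Matching is case-insensitive and keeps the original casing of each match.
    The text is XML-escaped first so characters such as '&' stay valid SSML.
    """
    pattern = re.compile(rf"\b({re.escape(escape(word))})\b", re.IGNORECASE)
    # \1 re-inserts the matched word; the <break> follows it immediately.
    return pattern.sub(rf'\1<break time="{seconds:g}s"/>', escape(text))


print(pause_after_word("Breathe in. Breathe out.", "breathe", 2))
```

The \b word boundaries mean "breathed" is left alone when the target is "breathe"; a target that starts or ends with punctuation would need a different pattern.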
Miss Mez
Full Disclosure: I've been using Descript Pro for a week now, so I'm still learning. But, this needs to be bumped, please. We need easier, global control for pauses and Convert to Audio functions.
Right now I am using overdub to voice my entire affirmation / meditation scripts. I trained my voice profile to speak slowly, not as slowly as I'd like, but (I think) that will just take time and more training.
The problem I'm facing is that the pauses between sentences are FAR too short for my genre.
It is a hassle to first go through and change EVERY sentence individually with Convert to Audio (a global Convert would be AWESOME). Then I have to go through and adjust the gap between each pair of sentences.
If we can't assign symbols, punctuations, or code to be certain pause lengths, then could we at least use the Shorten Word Gaps to actually INCREASE pauses globally? That would greatly decrease the amount of time we have to spend adjusting the pacing of our files.
Right now, it's frustrating to have to do this on my 5-minute, daily affirmation files... I can't even imagine how long it will take me to do a 1+ hour guided meditation!
Or am I just missing something? If I am, please let me know, because I'm nearly done with my 30 days of 5-min affirmations project... and the long meditation projects are next!
Samuel Eisenberg
Yes, this would make a huge impact. Right now Overdub is pretty monotonous.
I'm using Overdub for audiobooks and the output has plenty of room for improvement.
Mathnasium Online
Enabling tuning and adjustment of Overdub output (for example, reading individual letters and numeric digits, inserting pauses, and adjusting tone and emphasis) is essential for our use case: creating training materials for staff and students teaching and learning mathematics.
The requester’s proposal of using SSML to implement such easy, user-configurable tweaking of Overdub’s output would be truly awesome. Because Descript presently lacks this capability, it is forcing its customers to seek workarounds and alternative solutions, and it leaves a big opening for competitors to walk through.
Based on its continuing upvotes and the enthusiastic comments below, after roughly two years sitting here as a feature request, it’s time to get this one at least “Under Review”!
Nicole Berryhill Phd
YES! This would be INCREDIBLY useful. We're currently in the process of using Overdub to make written coursework available to blind students in my voice. Manually adding gaps between sentences for a long script is extremely time consuming.
My suggestion (from a strictly UI standpoint) would be to expand the functionality of the "Shorten Word Gaps" tool, making it a "Manage Word Gaps" tool. Ideally, to include the option to "Insert [input area to define length of desired gap] Between all Sentences".
While Descript is indeed a "game changing" tool for all of its current offerings, I (for one, with many equally interested colleagues) would most certainly continue my Pro Subscription for all eternity with this specific, automated flexibility. This feature request should really be moved to the top of the list. It would have an incredible impact on the workflow of those looking to automate this necessary part of using Overdub on lengthy scripts.
If this function already exists in some other format that I'm/we're overlooking, please advise. Otherwise, please provide it in some automated way, perhaps as described above.
Thank you.
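The "Insert [gap] Between all Sentences" option suggested above (and the global pause control Miss Mez asks for) corresponds to putting an SSML <break> after every sentence-ending punctuation mark. The following Python sketch assumes a naive sentence splitter, so abbreviations such as "Dr." will also receive a pause; gap_between_sentences is a hypothetical helper, not an existing Descript option.

```python
import re
from xml.sax.saxutils import escape

# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
# Abbreviations such as "Dr." will also trigger a pause.
SENTENCE_END = re.compile(r"([.!?])\s+")


def gap_between_sentences(script: str, seconds: float) -> str:
    """Return SSML with a fixed-length <break> between consecutive sentences."""
    body = SENTENCE_END.sub(rf'\1 <break time="{seconds:g}s"/> ', escape(script.strip()))
    return f"<speak>{body}</speak>"


print(gap_between_sentences("I am calm. I am safe!  I breathe.", 4))
```

Because the gap length is a single parameter, lengthening every pause in a one-hour meditation script becomes one change rather than hundreds of manual edits.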
veritas et caritas
Please, something like this is needed as soon as possible.
Pro DJs
Nowadays these features are a must. Robotic sounds are out and natural voices are in.
Waverly Edwards
I create content primarily with my overdub voice and I would love SSML capabilities added to Overdub.
I have tried multiple times, without success, to create a pronunciation for a simple word or homograph by spelling it the way it sounds. An IPA pronunciation would truly help, but SSML would be far more beneficial.
Please note, it is very time consuming to try a soundalike alternative, wait for the generation, only to find that it didn't work as planned and to try again.
Additionally, the pausing, emphasis and flexibility between words and sentences would make this service so much better.
Finally, because I create content primarily with my Overdub voice and it contains a lot of technical jargon, I am finding this extremely difficult. Using SSML would fill a much-needed gap :-).
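The IPA request above is what SSML's <phoneme> element is for: engines such as Amazon Polly accept <phoneme alphabet="ipa" ph="...">word</phoneme> and pronounce the word from the transcription instead of guessing from its spelling. Below is a minimal Python sketch that applies a user-maintained pronunciation lexicon; the LEXICON contents and the apply_lexicon helper are hypothetical, and a fixed lexicon cannot disambiguate homographs such as "read", which would still need per-occurrence tagging.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical user lexicon: word -> IPA transcription.
LEXICON = {"pecan": "pɪˈkɑːn"}


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Wrap every whole-word lexicon match in an SSML <phoneme> element."""
    escaped = escape(text)
    if not lexicon:
        return escaped
    ipa_for = {word.lower(): ipa for word, ipa in lexicon.items()}
    # One alternation (longest words first) so each word is tagged only once.
    alternatives = "|".join(re.escape(w) for w in sorted(lexicon, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

    def wrap(match: re.Match) -> str:
        ipa = ipa_for[match.group(0).lower()]
        return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{match.group(0)}</phoneme>'

    return pattern.sub(wrap, escaped)


print(apply_lexicon("I love pecan pie.", LEXICON))
```

Keeping the transcriptions in one lexicon means each jargon term is fixed once instead of being re-spelled by ear and regenerated repeatedly.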
Shane Ormsby
I am adding my strong support for this request. When using the stock voices, it is very tedious to add pauses at each full stop etc. Def need SSML!
Sharif I.
Exactly, but make it even easier than this.