Control of Overdub pauses and emphasis
Jim McKeeth
Often Overdub just pauses too long in weird places in the sentence or puts the emphasis on the wrong word. There needs to be a better way to manage this. It is easy to add pauses with commas, but not to shorten them. I realize the pauses can be edited by converting the Overdub to audio (#5 on here: https://help.descript.com/hc/en-us/articles/360046925171-Overdub-Guide-to-Voice-Quality), but that is rather tedious, and I've not found a good way to indicate that a word should receive emphasis.
Jared Young
Other competitors like WellSaid Labs let you edit pauses with hyphens and periods. I don’t understand why Descript doesn’t, or why this has been a feature request for over 3 years with no answer. I also need to make longer pauses, sometimes 5-7 seconds long.
Mathnasium Online
As described in a very similar feature request linked here (one that also proposes a convenient SSML tag-based solution), enabling tuning and adjusting of Overdub’s output (for example, to read individual letters and numeric digits, insert pauses, adjust tone and emphasis, etc) is essential for our use-case creating training materials for staff/students teaching/learning mathematics.
That requester’s proposal of using SSML tags to implement such easy, user-configurable tweaking of Overdub’s output would be truly awesome and, presently lacking this capability, Descript is forcing its customers to seek workarounds and alternate solutions, and leaves a big opening for competitors to walk through.
Based on both requests' continuing upvotes (90+ combined) and enthusiastic comments, after ~two years~ as a common feature request, it’s time to get these at least “Under Review”!
Sjoerd de Vries
Pauses are very important! The current pausing actually makes the result sound more robotic, especially if you replace all audio with stock audio or a cloned voice.
Though we have some excellent cloned voices, the computer sound is still audible because of the unnatural pauses that are always the same. If the pauses were a bit more unpredictable and words had different emphasis, it would be harder to tell that it is a computer voice, and it would be more intelligent from an AI perspective.
Anyway, this application is already the best I have seen.
Nicole Berryhill Phd
YES! This would be INCREDIBLY useful. We're currently in the process of using Overdub to make written coursework available to blind students in my voice. Manually adding gaps between sentences for a long script is extremely time consuming. Also, correcting inflection and intonation via SSML markup would be an amazing addition.
My suggestion for the gap issue (from a strictly UI standpoint) would be to expand the functionality of the "Shorten Word Gaps" tool, making it a "Manage Word Gaps" tool. Ideally, to include the option to "Insert [input area to define length of desired gap] Between all Sentences".
While Descript is indeed a "game changing" tool for all of its current offerings, I (for one, with many equally interested colleagues) would most certainly continue my Pro Subscription for all eternity with this specific, automated flexibility. This feature request should really be moved to the top of the list. It would have an incredible impact on workflow for those looking to automate this necessary part of using Overdub on lengthy scripts.
If this function already exists in some other format that I'm/we're overlooking, please advise. Otherwise, please provide it in some automated way, perhaps as described above. We also NEED SSML, badly.
Thank you.
Hoby Van Hoose
Wellsaid uses quotes for "emphasizing phrases" and markdown uses asterisks for *italics* and **bolding**. Either or both seem good options to me, for telling Descript to add emphasis within.
I currently have the reverse problem, where pauses from punctuation aren't long enough. Specifying editable defaults for each kind of punctuation ( , . : ; - — ) would solve everyone's issues.
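To make the "editable defaults per punctuation mark" idea concrete, here is a rough Python sketch. It assumes a speech engine that accepts SSML `<break>` tags, which Descript does not currently offer according to this thread. The names `DEFAULT_PAUSES_MS` and `add_pauses` and all of the durations are made up for illustration.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical per-punctuation pause defaults, in milliseconds.
# Illustrative values only; they do not describe Descript's behavior.
DEFAULT_PAUSES_MS = {
    ",": 200,
    ".": 500,
    ":": 350,
    ";": 350,
    "-": 150,
    "\u2014": 400,  # em dash
}

def add_pauses(text, pauses_ms=DEFAULT_PAUSES_MS):
    """Return an SSML string with a <break> after each configured mark."""
    # Build a character class from whatever marks the user configured.
    marks = "".join(re.escape(mark) for mark in pauses_ms)
    pattern = re.compile(f"([{marks}])")
    parts = []
    # Splitting on a capturing group keeps the punctuation in the result.
    for piece in pattern.split(text):
        if piece in pauses_ms:
            parts.append(f'{escape(piece)}<break time="{pauses_ms[piece]}ms"/>')
        else:
            parts.append(escape(piece))
    return "<speak>" + "".join(parts) + "</speak>"

print(add_pauses("Hi, there."))
```

A user who wants longer sentence-ending pauses would pass their own mapping, for example `add_pauses(script, {".": 1200})`. The sketch is deliberately naive: it also inserts breaks after hyphens inside words and periods inside numbers, which a real implementation would need to skip.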
Matty Dalrymple
Absolutely! The most recent Overdub update has improved the situation but pauses still require tedious manual adjustment to sound natural. As a stopgap measure, it would also be helpful to have some guidelines about the effect of using punctuation to control pause length (e.g., how long a pause does a comma create? It doesn't seem consistent).
Joe Miller
Agree! Overdub is now doing a good job sounding like me, but the phrasing, pacing and inflection still make my Overdub tests sound robotic. It's better but still not ready for me to use.
Having more control would be very helpful, but with that said, I don't want to get trapped into adding a lot of controls to typed text; otherwise I may end up spending more time adding controls than just recording directly. Perhaps a better answer would be at the source generation. Could the AI code use the transcribed text punctuation and other contextual attributes to render more natural overdubs? I would rather spend more time on the source transcript if that's what is needed to fill in the gaps in what the AI code is capable of doing.
I hope that makes sense.
Bebo Habebo
This is a critical function many await, especially the emphasis.
Frameworks
Hear, hear! Concerning this and your other request entitled "Add phonetic pronunciations support for Overdub", I've up-voted them both and, as described hereafter, I am recommending the use of Speech Synthesis Markup Language (SSML) tags to provide the control you're seeking over how Overdub generates speech from the text.
For example, to specify a pause of fixed duration (e.g., 2 seconds), you would simply insert the markup
<break time="2s"/>
at the desired pause point within the corresponding text.
Similarly, as described here and here, you can specify International Phonetic Alphabet (IPA) pronunciations using the SSML
<phoneme alphabet=...>
tag, as in:
You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.
I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
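For anyone who wants to experiment with this markup before Overdub supports it, the sketch below assembles the same two tags with Python's standard `xml.etree.ElementTree` and checks that the result is well-formed XML. The helper names (`pause`, `phoneme`, `build_ssml`, `is_well_formed`) are mine, and nothing here calls Descript: it only produces a string you could paste into an SSML-capable engine such as the ones linked in this comment.

```python
import xml.etree.ElementTree as ET

def pause(seconds):
    """Return an SSML <break> element for a fixed-length pause."""
    return ET.Element("break", {"time": f"{seconds}s"})

def phoneme(word, ipa):
    """Return an SSML <phoneme> element carrying an IPA pronunciation."""
    element = ET.Element("phoneme", {"alphabet": "ipa", "ph": ipa})
    element.text = word
    return element

def build_ssml():
    """Assemble the pecan example with a 2-second pause between sentences."""
    speak = ET.Element("speak")
    speak.text = "You say, "
    first = phoneme("pecan", "pɪˈkɑːn")
    first.tail = "."
    speak.append(first)
    gap = pause(2)
    gap.tail = " I say, "
    speak.append(gap)
    second = phoneme("pecan", "ˈpi.kæn")
    second.tail = "."
    speak.append(second)
    return ET.tostring(speak, encoding="unicode")

def is_well_formed(ssml):
    """Check XML well-formedness only; this does not validate SSML semantics."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False

print(build_ssml())
```

Building the elements programmatically rather than by string concatenation means attribute quoting and escaping are handled by the library, which avoids the small markup slips that are easy to make when typing tags by hand.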
For more details on possible SSML tag use within Overdub, please see (and up vote :) my request entitled "Provide additional control over how Overdub generates speech from the text (e.g., using SSML tags)" here: https://feedback.descript.com/feature-requests/p/provide-additional-control-over-how-overdub-generates-speech-from-the-text-eg-us
SSML USE-CASES
- Amazon's Polly: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
- Google's Assistant: https://cloud.google.com/text-to-speech/docs/ssml
- Siri, et al: https://www.smashingmagazine.com/2019/03/sanity-portabletext-speech-synthesis/
BTW: Are you the "Jim McKeeth" of Embarcadero|Delphi fame?
Jim McKeeth
Frameworks: That is me. I voted up your SSML request too.
Sjoerd de Vries
Frameworks: I would love it if SSML is implemented. This will give a lot of flexibility.
Harry Hawk
It would be great:
* to have markup code for time, pitch or emotion
It already should infer from a "?" or "!" so why not infer "more" from "??" or "!!!"?
Sometimes the underlying ground truth is faulty, so it's generating faulty text. Solution:
Highlight the text passage and have an option to "retrain" that allows you to record / attach audio of the original speaker reading that exact passage.
Also, if there is a missing word, how can we add "and" without having to edit the video? Esp. w/ slides, it could just hold the prior image.
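The idea of inferring "more" from repeated punctuation could look roughly like the Python sketch below. It assumes an engine that understands the SSML `<emphasis>` tag. The rule itself (one mark means moderate, two or more means strong) and the function names are invented here for illustration and do not describe how Overdub works.

```python
import re
from xml.sax.saxutils import escape

def emphasis_level(sentence):
    """Map the count of trailing '!' or '?' marks to an SSML emphasis level."""
    match = re.search(r"[!?]+$", sentence.strip())
    count = len(match.group()) if match else 0
    if count >= 2:
        return "strong"    # "??" or "!!!" asks for more emphasis
    if count == 1:
        return "moderate"  # a single "?" or "!"
    return None            # no emphasis markup needed

def mark_emphasis(sentence):
    """Wrap the sentence in an <emphasis> tag when the rule above applies."""
    text = escape(sentence.strip())
    level = emphasis_level(sentence)
    if level is None:
        return text
    return f'<emphasis level="{level}">{text}</emphasis>'

print(mark_emphasis("Stop!!!"))
```

This keeps the writer's workflow close to plain typing, since the extra control comes from punctuation people already use rather than from hand-written tags.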