Audio track replacement with alignment.

Don MacFarlane

Currently, Descript only allows sentence-by-sentence replacement of text by dub. However, in instructional content there are many cases where an entire audio track needs to be replaced. For example, an SME records a demo with audio (and background noise), and a professional voice is desired. There is also a potential application in localization. For reference, Cubase ($$$) offers something similar, based on the waveform and splitting to avoid distortion. Descript's text-anchored tech seems perfect for this approach. It would be a very attractive feature for enterprise organizations.

June 12, 2021

Colleen Robinson

It would also be very useful for privacy concerns. We work with people who don't mind if their opinions are on video, but would prefer not to be recognized. If we could convert the text of their voice into an AI voice that would be VERY helpful.

Gabe Michalski

marked this post as open

We did initially ship this feature, but after many updates to AI Speakers and Overdub, we are seeing problems with Unlimited Overdub replacement.

We have since gone back to a 250-character limit to ensure stability. Text to Speech is still unlimited, however.

Sjoerd de Vries

I also wondered why this feature is not possible, as it seems obvious to me and even easier than blending a cloned voice into the original recording. That feature is also great, but sometimes you want to replace it with another voice.

My idea is that an SME/trainer creates an explanatory video (product demo, screen recording, etc.) in which he/she does not appear on camera. I add the video to Descript for transcription, which also aligns the transcript timing with the original video.

Then I make changes to the script and want to export the updated transcript as an SRT file and the updated audio as an MP4. I want to import the original recording and the updated audio into Camtasia, so I can check the timing of the updated voice-over and remove the original recording. In this process the voice of the SME/trainer is not kept; it is replaced with another voice.

The same would apply if you have cloned the voice of the SME who records the video, but the audio quality of the original recording is bad.

What we do now is:

Transcribe the recording in Word.
Update the script and run a spelling/grammar check in Word.
Copy and paste the script into Descript and add some basic timings, such as a 2-second pause after each sentence.
Select a cloned voice and produce the audio and the SRT file.
Import the SRT file, the dubbed audio file, and the original recording into Camtasia.
Remove or hide the original audio on the timeline.
Cut the dubbed audio on the timeline to align it with the video, and usually also shorten the video itself.
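For illustration, the "2-second pause after each sentence" timing in the steps above can be sketched in Python. This is only a sketch of the arithmetic, not how Descript builds its SRT export; the function names and the `(text, duration)` input shape are assumptions made for the example.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(sentences, pause=2.0):
    """Build SRT text from (text, spoken_duration_seconds) pairs.

    Hypothetical helper: each cue starts `pause` seconds after the
    previous cue ends, mirroring the fixed pause in the manual workflow.
    """
    blocks = []
    cursor = 0.0
    for index, (text, duration) in enumerate(sentences, start=1):
        start, end = cursor, cursor + duration
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
        cursor = end + pause  # fixed gap before the next sentence
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_srt([("Open the File menu.", 1.5), ("Click Save.", 2.0)]))
```

Because the pause is fixed rather than taken from the original recording, the cues drift away from the video, which is why the manual cutting in Camtasia is still needed afterwards.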

It is just more work to cut the audio and captions and move them around in the video, as the Descript audio is usually shorter. It does work, but why can't this be simplified as follows:

Import the original video in Descript
Descript transcribes it
Update the script and replace the original sound with a cloned voice
Export the new audio with the same timing as the original and the SRT file.
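To make "the same timing as the original" concrete, here is a minimal Python sketch of one way it could work: each regenerated sentence is placed at the start time of the original sentence, padded with silence if it is shorter, and flagged if it is longer. This is an assumption about a possible approach, not a description of any Descript feature; `Placement` and `place_on_original_timing` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    text: str
    start: float          # timeline position, taken from the original
    clip_duration: float  # length of the regenerated clip
    silence_after: float  # padding needed to fill the original slot
    overflow: float       # how far the clip runs past its slot


def place_on_original_timing(original_slots, new_durations):
    """Place regenerated clips at the original sentence start times.

    original_slots: list of (text, start, end) from the original transcript.
    new_durations:  duration in seconds of each regenerated clip, same order.
    A slot runs until the next sentence starts (or the original end time
    for the last sentence).
    """
    if len(original_slots) != len(new_durations):
        raise ValueError("need exactly one new clip per original sentence")
    placements = []
    for i, ((text, start, end), duration) in enumerate(
        zip(original_slots, new_durations)
    ):
        has_next = i + 1 < len(original_slots)
        slot_end = original_slots[i + 1][1] if has_next else end
        available = slot_end - start
        placements.append(
            Placement(
                text=text,
                start=start,
                clip_duration=duration,
                silence_after=max(0.0, available - duration),
                overflow=max(0.0, duration - available),
            )
        )
    return placements


if __name__ == "__main__":
    slots = [("Open the menu.", 0.0, 2.0), ("Click Save.", 3.0, 5.0)]
    for p in place_on_original_timing(slots, [1.5, 1.0]):
        print(p)
```

Clips with a non-zero `overflow` are the ones that would still need manual attention, for example by shortening the text or extending the video.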

Dan Reyes-Cairo

marked this post as shipped

Dan Reyes-Cairo

Hi all - it seems like there are possibly a couple of requests in this thread, one of which we've implemented a solution for, and one of which we have not:
For users interested in replacing audio for non-talking-head video (maybe a voiceover on top of a slide deck, or b-roll):
We've implemented the Replace Script Track feature, which allows you to pre-produce content using a scratch audio track or an Overdub voice / stock voice. You can then build your edit around this scratch audio, adding music, b-roll, intros/outros, transitions, visual effects, or any other elements to the pinned track. Once your edit is complete, you can record a new, polished version of your finalized speech track and use Replace Script Track to replace the scratch version. More details here: https://help.descript.com/hc/en-us/articles/6567410329357-Replace-Script-Track-
For users interested in a mechanism to essentially "overlay" an Overdub custom or stock voice on top of pre-recorded audio, replacing the original (typically users are looking to match the precise timing, intonation, and expression of the source) - this is still an unplanned feature request that we do not have a solution for.
For any who are interested in the former solution, feel free to read through and take advantage of the new feature.
I may end up closing this thread as Complete assuming we've hit most of the high points on the original request, then creating a separate request for the Overdub replacement idea that's a little more specific. Before then, feel free to provide any feedback in the coming days.

Emily Richardson-Lorente

Dan Reyes-Cairo: I wanted to pop in here because I'm not sure if we're talking about the same thing. Using the paste special --> replace script track feature has been useful for retaining the timing of SFX/nat/music when I replace a single section of voiceover. But it's far less useful when I need to replace the entire AI-generated script track (which we use for our roughcuts) with my host's voice track. In that case - unless I've misunderstood how to do it - I need to copy each individual sentence/paragraph from my host's track/composition, select the relevant sentence/paragraph in the roughcut composition, and then select paste special --> replace script track. Over and over and over again. It's really clunky. Yesterday, it literally took me 6 hours to replace the voice track in a 34-minute episode that was less than 40% host. It would be incredible if I could use the "replace script track" feature to replace ALL of the AI-generated voice track with my host's voice with a single copy and paste.
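As a rough illustration of what a "single copy and paste" replacement would have to do behind the scenes, the Python sketch below pairs each rough-cut sentence with the closest-matching sentence in the host's recording by comparing transcript text. It is a hypothetical approach using Python's standard `difflib`, not Descript's implementation; `match_segments` and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so small formatting differences are ignored."""
    return " ".join(text.lower().split())


def match_segments(roughcut, host_takes, threshold=0.8):
    """Pair rough-cut sentences with host sentences by text similarity.

    Returns {roughcut_index: host_index}. Sentences with no host take
    scoring at least `threshold` (for example, guest lines) are left out.
    Each host take is used at most once.
    """
    matches = {}
    used = set()
    for i, rough in enumerate(roughcut):
        best_j, best_score = None, threshold
        for j, take in enumerate(host_takes):
            if j in used:
                continue
            score = SequenceMatcher(None, normalize(rough), normalize(take)).ratio()
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches


if __name__ == "__main__":
    roughcut = [
        "Welcome to the show.",
        "Today we talk about audio.",
        "Thanks for listening.",
    ]
    host_takes = ["Thanks for listening!", "Welcome to the show."]
    print(match_segments(roughcut, host_takes))
```

Each matched pair would then be one "replace script track" operation, so the repeated manual selection becomes a loop over the matches.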

Ido

This feature would be immensely useful. I have multiple SME screen recordings that need to be edited, cleaned, and replaced with a professional voice.

Ritesh

I have an instructor who speaks with a heavy accent. I would like to replace the entire voice with an overdubbed version that is synchronized with the original video and audio.

Rod Bergen

It would also be useful in podcasts where one of the people being interviewed has a strong accent.

Don MacFarlane

After a deep dive into ADR (Automatic Dialogue Replacement), I can say that Descript's text-centric core technology would be perfect for replacing the spoken audio track on instructional content. The alternatives, such as Adobe Audition and more professional tools from Cubase, are centered around waveform alignment. They create significant distortion on longer pieces, and even on shorter pieces the alignment is off, because any two speakers will vary their cadence at different points. Descript snaps to words rather than to the audio waveform.
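A small worked example of the cadence point, as a Python sketch: with word timestamps for both takes, a single uniform stretch of the replacement leaves individual words out of place, while anchoring each word to the original word's start time does not. The functions and sample timings are made up for illustration and do not describe how Descript, Audition, or Cubase actually align audio.

```python
def global_stretch_drift(original, replacement):
    """Worst word-start error (seconds) after ONE uniform stretch.

    Both arguments are lists of (word, start, end) with the same words in
    the same order; both takes are assumed to start at time 0.
    """
    ratio = original[-1][2] / replacement[-1][2]  # total-length ratio
    return max(abs(o[1] - r[1] * ratio) for o, r in zip(original, replacement))


def word_anchor_plan(original, replacement):
    """Per-word plan: (word, target_start, local_stretch_ratio).

    Each replacement word is moved to the original word's start time and
    stretched to the original word's length, so word starts line up by
    construction. Assumes every replacement word has a non-zero duration.
    """
    if [w for w, _, _ in original] != [w for w, _, _ in replacement]:
        raise ValueError("both takes must contain the same words in order")
    plan = []
    for (word, o_start, o_end), (_, r_start, r_end) in zip(original, replacement):
        local_ratio = (o_end - o_start) / (r_end - r_start)
        plan.append((word, o_start, local_ratio))
    return plan


if __name__ == "__main__":
    # The original speaker pauses before "menu"; the replacement does not.
    original = [("open", 0.0, 0.5), ("the", 0.5, 0.75), ("menu", 1.5, 2.0)]
    replacement = [("open", 0.0, 0.25), ("the", 0.25, 0.5), ("menu", 0.5, 1.0)]
    print(global_stretch_drift(original, replacement))
    print(word_anchor_plan(original, replacement))
```

In this sample the uniform stretch doubles the replacement's length, yet "menu" still lands half a second early because the pause sits in a different place, whereas the per-word plan puts it at 1.5 seconds without stretching the word at all.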

The potential market for this type of solution is immense. It is not just about replacing a speaker's voice, but also about redoing the audio with better recording tools or in a different accent, and it could pave the way for easier localization.