Karaoke-style export options, i.e. surround each word with timestamps rather than each line
I'd like to export transcription data that I can use in an interactive transcript, i.e. a transcript where each word is highlighted as it is spoken in the audio (in the same way that words are highlighted as they are spoken in the Descript app itself). The VTT spec allows for karaoke-style cues:

1
00:16.500 --> 00:18.500
When the moon <00:17.500>hits your eye

1
00:00:18.500 --> 00:00:20.500
Like a <b><00:19.000>big-a <00:19.500></b>pizza <00:20.000>pie

1
00:00:20.500 --> 00:00:21.500
That's <00:00:21.000>amore

Another option would be to export a JSON file that contains an array of words with a start_time and end_time value for each. Ideally you'd be able to export to both of these formats and have some control over how the exported data is formatted. For example, should the current word in a line inside a VTT file be bolded?
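To make the two requested formats concrete, here is a rough sketch of how they could be produced from word-level timing data. It is not how Descript exports anything; the `text` field and the function names `vtt_timestamp`, `karaoke_cue` and `to_vtt` are made up for illustration, and only `start_time` and `end_time` come from the request above.

```python
import json


def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def karaoke_cue(words: list[dict]) -> str:
    """Build one cue; every word after the first gets an inline timestamp."""
    start = vtt_timestamp(words[0]["start_time"])
    end = vtt_timestamp(words[-1]["end_time"])
    parts = [words[0]["text"]]
    for word in words[1:]:
        parts.append(f"<{vtt_timestamp(word['start_time'])}>{word['text']}")
    return f"{start} --> {end}\n" + " ".join(parts)


def to_vtt(lines: list[list[dict]]) -> str:
    """Join one cue per transcript line into a WebVTT document."""
    return "WEBVTT\n\n" + "\n\n".join(karaoke_cue(line) for line in lines) + "\n"


# Hypothetical word-level data for a single transcript line.
line = [
    {"text": "That's", "start_time": 20.5, "end_time": 21.0},
    {"text": "amore", "start_time": 21.0, "end_time": 21.5},
]

vtt_text = to_vtt([line])             # karaoke-style VTT
json_text = json.dumps(line, indent=2)  # the JSON alternative: an array of words
```

With the sample line above, the cue body comes out as `That's <00:00:21.000>amore`, matching the last cue in the example. A formatting option such as bolding the current word would be an extra parameter on `karaoke_cue`.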
Change location of offline files
It would be great to be able to specify the location where offline files are stored, so that they could live on an external USB drive with other project files and would not take up limited start-up disk space. Thanks!
Integration with foot pedal
Are there any plans to integrate the software with transcription pedals (to stop and start audio)?
Frequently used word list
A list of words (often proper nouns) on the right side of the transcript that you can add to and then quick-select from while going through and editing text, rather than re-typing the whole word. This would be beneficial for longer proper nouns that are frequently misspelled by the AI.
Expressions and Tonality
I'm surprised this doesn't exist. Exclamations, capitalization, question marks, etc. should have their own recorded inputs/outputs, since they can change how an entire sentence sounds. For example, a separate sampler could ask the voice originator to ask a question, say something with a specified emotion, or add volume. For things with various levels of emotion, I recommend a highlight-enabled tool with a color representing each emotion. That way, when you highlight a specific word or series of words, the AI can present a particular emotion. The transition between emotions, or lack thereof, could then be the only remaining issue, but given how smooth everything sounds so far, it might not be. If achieved, this could tap into the screenplay market/customers.
Variety of Display options
Right now the desktop version is very limited in how it can be displayed: the window isn't resizable, and the transcript can't be left- or right-justified. These extra display features would be a vast improvement on the current version.
Filler Words - Remove from Transcript Only
I am using the "filler words" feature that was recently introduced: https://descript.canny.io/feature-requests/p/include-filler-words Is it possible to remove the filler words from a transcript but _not_ remove them from the audio/video? Ideally, I don't need the "um"s and "you know"s in my transcript or SRT file, but I don't necessarily want to remove them from the video. However, the SRT file and the transcript timestamps are all off if I "ignore" or "delete" them. It would be great if the "ignore" option (or a third option) could remove them from the transcript without changing the video, so that I don't have to replace an existing video and still have a usable transcript with timestamps and a usable SRT file. Hopefully there's a way to do this that I'm missing, or it's an upcoming feature. Either way, thanks, it's still a great feature! It would also be great if it could recognize stuttering, where a word or short phrase is repeated 2-3 times in a row.
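As a rough illustration of the behaviour being asked for, the sketch below drops filler words from the text of an SRT file while leaving every remaining word on its original timestamp, so the subtitles still line up with the unedited video. It assumes word-level data with hypothetical `text`, `start_time` and `end_time` fields, uses a made-up `FILLER_WORDS` list, and only handles single-word fillers (a phrase like "you know" would need extra matching). It is not a description of how Descript works.

```python
# Hypothetical filler list; a real tool would let the user configure this.
FILLER_WORDS = {"um", "uh", "er"}


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (hh:mm:ss,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def strip_fillers(words: list[dict]) -> list[dict]:
    """Drop filler words from the text only; timestamps are left untouched."""
    return [
        w for w in words
        if w["text"].lower().strip(".,!?") not in FILLER_WORDS
    ]


def to_srt(lines: list[list[dict]]) -> str:
    """Write one SRT cue per transcript line, skipping lines that were all filler."""
    cues = []
    for words in lines:
        kept = strip_fillers(words)
        if not kept:
            continue
        start = srt_timestamp(kept[0]["start_time"])
        end = srt_timestamp(kept[-1]["end_time"])
        text = " ".join(w["text"] for w in kept)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n"


# Hypothetical word-level data; the audio itself is never edited.
sample = [[
    {"text": "Um,", "start_time": 0.0, "end_time": 0.4},
    {"text": "hello", "start_time": 0.5, "end_time": 0.9},
    {"text": "there", "start_time": 1.0, "end_time": 1.3},
]]
srt_text = to_srt(sample)
```

Because the cue times are taken from the words that remain, the first cue here starts at 0.5 seconds rather than 0.0, and nothing after it shifts.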