Ability to deselect "slang" transcription
Support Team
Merged in a post:
Make it transcribe the ACTUAL spoken words.
Karen Schouest
Descript folks have a theory or defense that this issue is part of the AI's learning process and beyond its control. But that is baloney. I've done a ton of research on how machine learning tools are created, and while they do "learn as they go" for many things, there is a final step where the developer places deliberate filters to control how the AI tool interprets and writes/translates certain words and phrases - e.g., in terms of punctuation.
If that were not the case, we would see wild inconsistencies in how things are punctuated as AI continues to gather input in the AI-verse and "learns" from it. But the fact that we DO see consistencies is proof that there is some deliberate programming involved as well.
I also believe that to be the case when it comes to Descript's interpretation of contractions and dates and numbers. It is crazy-making that it (1) mostly (but not 100%) ignores spoken zeros, (2) removes "of" when someone says "the 3rd of January," and (3) does about a 50/50 split on whether it wants to use contractions or not. Sometimes we'll get "would've" when a toddler would have clearly heard "would HAVE," and sometimes they'll say, e.g., "didn't," and we'll get "did not."
The "mostly" descriptor in the above paragraph is very important, because since it is not a 100% all-or-nothing behavior, I can't just add an instruction in my clean-up macro to always tweak date ordinal phrases to add an "of" or always change contractions or would-be contractions to X. I have to make these corrections one at a time as I'm proofing to audio. And with each correction, my frustration at this unnecessary behavior (and Descript's defense of it) is renewed.
Descript also wants to blame some of this on poor audio or listening errors, but the results are consistent enough and the audio is crystal-clear enough that I would bet dollars to donuts it's ALL due to deliberate (and short-sighted) programming/coding choices governing how certain phrases are transcribed.
The fact that it continues to (consistently) transcribe slang words that are rarely a good thing (wanna, gonna, etc.) but makes these arbitrary decisions about other words is crazy-making.
I have scoured the internet and tried many other AI tools to compare the results. Yes, I do believe Descript is a clear winner in terms of overall accuracy, but interestingly, it is the ONLY AI tool that makes these crazy-stupid choices when it comes to the issues I mentioned above.
All the AI tools are doing the same thing the Descript (Rev.AI) AI engine is doing - learning from new content. And yet NONE of the other ones are "learning" to drop zeros, arbitrarily change contractions (in both directions), or drop the "of" in date ordinal phrases.
I wish the Descript folks would pass this on to the actual people doing the programming and not the non-tech folks who probably don't fully understand how it all works.
Rant off.
Support Team
Merged in a post:
Stop defaulting to transcribing gonna, wanna, cuz, and gotta
Shannon Wedge
I can hear my presenters saying got to, want to, because and going to, but Descript chooses gotta, wanna, cuz and gonna almost 100% of the time, which means searching for and replacing them in every transcript. Please make it stop that - does anyone really want their captions to say these slang terms anyway?
Support Team
Merged in a post:
Option to avoid colloquialisms
Sally Le Page
Right now, Descript is transcribing 'gotta', 'oughta', 'cuz', when I would prefer it to transcribe 'got to', 'ought to', and 'because'.
Would be nice to select that in a menu somewhere rather than doing it manually.
Canny AI
Merged in a post:
Include an option for a "literal" verbatim transcript.
Karen Schouest
We've had this discussion before - several times. I keep being told by Descript (and other ASR software developers as well) that the software is always in "learning mode" and interpreting the best transcription choices. That results in many annoyances in a transcript, such as (spoken: January of 2020, transcribed: January 2020; spoken: January 25th, 2020, transcribed: January 25, 2020).
Today was the most egregious. Not only is it adding short phrases that weren't even spoken, kind of "finishing the sentence" of the speaker as people sometimes do in real life, but it's changing words altogether, which are clearly NOT a "mishear," but rather the ASR engine deciding which word is a "better" fit.
I'm working on a transcript that has a lot of dollar amount references. Sometimes they will say "dollars" (one hundred forty-eight thousand dollars), sometimes nothing (one hundred forty-eight thousand), and sometimes bucks (one hundred forty-eight thousand bucks).
When it was just a random mention by itself on a page, I would get the literal translation. But just now, I had a scenario where that amount was mentioned in every sentence for five sentences. The speakers said "dollars" four times in a row, and then the fifth time, they used "bucks." Descript changed all five instances to "bucks," even though you can hear "dollars" clear as day.
Descript's defense of this has been that maybe the audio is not clear. Nope, it's crystal clear in ALL of the above instances (including the date examples). A toddler or a dog could make it out and repeat it correctly. Their next defense has been that maybe Descript is not the best choice for a legal verbatim transcript, which is unique in the ASR industry and has its own special needs.
Surely, legal transcripts are not the only scenarios where people care that the transcript reflects the words that were actually spoken and not ASR putting words in the speakers' mouths. The end-user should have the choice when to change that, not the ASR engine. Maybe some users don't care. But since it clearly is capable of a literal interpretation (e.g., gonna, wanna, etc.), why can't we have the option of a "smart translation" (I use the term "smart" loosely LOL), where we allow it to change words, and a "literal translation," where it doesn't change anything?
Every transcript seems to be a robust mix of amazement at the stuff the ASR engine did get right that I would have expected otherwise, and the hair-raising, crazy-making annoyances of things that were changed that should have been left alone. So years after adopting and embracing ASR technology for transcription, I still have a love/hate relationship with it. :( ::Rant off::
P.S. And to add another layer of annoyance to this rant, after it posted, I received a notification that my post has been merged into "filler words." Ugh! This has NOTHING to do with filler words! This is a separate and unrelated issue. ASR at work again, trying to decide what's "best" for us and missing the mark. :(
Brian Teeman
Glad I stumbled across this. I thought that this was just a buggy transcription, and it was one of the reasons I cancelled my subscription. WTF, why doesn't the transcription use what I say, not what it thinks I might say?
Ruth Antrich
My transcriptions always come back with the slang versions. There should be a dropdown option to formalize them all, similar to the filler words list.
Ross
Completely agree. The following words are always transcribed as slang. Here are better replacements that should be there by default:
cuz -> because
'em -> them
gimme -> give me
gonna -> going to
gotta -> got to
kinda -> kind of
outta -> out of
wanna -> want to
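Until there is a built-in option, a list like this could be applied to an exported transcript with a small script. What follows is only a minimal sketch, assuming the transcript is available as plain text; `REPLACEMENTS` and `formalize` are hypothetical names for illustration and not part of Descript. Note that a blanket replace like this will also rewrite slang that really was spoken, so it is a workaround, not the literal-transcript option requested in this thread.

```python
import re

# Mapping copied from the list above; extend it as needed.
REPLACEMENTS = {
    "cuz": "because",
    "'em": "them",
    "gimme": "give me",
    "gonna": "going to",
    "gotta": "got to",
    "kinda": "kind of",
    "outta": "out of",
    "wanna": "want to",
}

# Lookarounds are used instead of \b because "'em" starts with an
# apostrophe, which \b does not treat as part of a word.
PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(k) for k in REPLACEMENTS) + r")(?!\w)",
    re.IGNORECASE,
)


def formalize(text: str) -> str:
    """Replace each slang form with its spelled-out equivalent."""

    def swap(match: re.Match) -> str:
        word = match.group(1)
        replacement = REPLACEMENTS[word.lower()]
        # Keep a leading capital, e.g. "Gonna" becomes "Going to".
        if word[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return PATTERN.sub(swap, text)


print(formalize("I'm gonna tell 'em, cuz I gotta."))
```

Matching whole words only means that longer words which merely contain one of these strings are left alone.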
Karen Schouest
Yes! This!
James Lai
"gonna" instead of "going to"
"wanna" instead of "want to"
"gotta" instead of "got to"
"kinda" instead of "kind of"
"cuz" instead of "because"
Dear devs at Descript, I'm pretty sure you don't want your children to be learning and writing with this informal English. So why should you allow this to happen in your app?