Auto-transcription performance for songs and vocal tracks (lyrics) is poor and needs to be improved to be practically useful, at least based on our testing and in comparison with other online auto-transcription services.
After multiple tests (running auto-transcription directly, separating music from voice and transcribing the voice-only track, and importing the song lyrics as a transcript), we found that the timing errors are so significant that the feature cannot be used efficiently and would require too much manual transcription.
In contrast, the other online auto-transcription services we have been using handle this without issue and almost always get the timing right, even on a track with instruments and voice combined. Typically the only corrections needed are minor ones to words, spelling, or punctuation, and almost never to timing. In Descript, the timing is always significantly off in various places, sometimes affecting a large chunk of the lyric text. We consulted Descript support and arrived at this conclusion.
We understand, as we were advised, that this is not really a primary use case for Descript. However, we are submitting this feature request because it could help other potential users as well, and it would be good if Descript could improve transcription performance for this use case. We were interested in exploring Descript, especially its innovative script/storyboard/scenes interface, which seems nice. Unfortunately, the lack of effective auto-transcription is a significant problem.
Thank you.