Transcription and speaker detection quality are generally good, but the transcripts often mark a speaker change slightly before or slightly after the actual change, so brief snippets at these transition points are attributed to the wrong speaker. This leads to a lot of manual correction.