I saw the team over at Riverside announce this feature recently. Here's how they describe it on their Magic Editor page (https://riverside.fm/magic-editor): "AI Speaker View analyzes the recording and automatically switches whoever is speaking into full screen 1 second before they speak, making for a seamless transition. By analyzing the recording, AI Speaker View switches the video if someone actually speaks, and not for unwanted interruption like a sneeze. We are the first in the world to automate this."
The Riverside feature only works on unedited video recorded straight into their system, but it would be far more useful AFTER the multitrack recording has been edited. That's where Descript could shine: offer a feature like this that can be applied once the original recordings have been cleaned up, just before export.
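
To make the request a bit more concrete, here's a rough sketch of the switching logic as I read it from the Riverside description, run against an already-edited timeline. Everything in it is my own assumption: the `Segment` type, the `plan_cuts` function, the 1-second `lead`, and the `min_duration` threshold for skipping short noises are hypothetical names and values, not anything from Riverside or Descript. Riverside doesn't say how they tell speech from a sneeze, so I've swapped in a simple "ignore anything very short" rule.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds on the edited timeline
    end: float    # seconds on the edited timeline


def plan_cuts(segments, lead=1.0, min_duration=0.75):
    """Return a list of (cut_time, speaker) camera switches.

    Assumptions (hypothetical, for illustration only):
    - `segments` are per-speaker speech regions measured after editing.
    - Anything shorter than `min_duration` is treated as an interruption
      (cough, sneeze, "mm-hm") and does not trigger a switch.
    - The view switches `lead` seconds before the speaker starts.
    """
    cuts = []
    for seg in sorted(segments, key=lambda s: s.start):
        # Skip very short sounds so they don't steal the screen.
        if seg.end - seg.start < min_duration:
            continue
        # No cut needed if this speaker is already on screen.
        if cuts and cuts[-1][1] == seg.speaker:
            continue
        # Switch early, but never before the timeline starts
        # or before the previous cut.
        cut_time = max(seg.start - lead, 0.0)
        if cuts:
            cut_time = max(cut_time, cuts[-1][0])
        cuts.append((cut_time, seg.speaker))
    return cuts


if __name__ == "__main__":
    timeline = [
        Segment("Ana", 0.0, 10.0),
        Segment("Ben", 10.5, 10.8),  # short noise, should be ignored
        Segment("Ben", 12.0, 20.0),
        Segment("Ana", 21.0, 25.0),
    ]
    for cut_time, speaker in plan_cuts(timeline):
        print(f"{cut_time:6.2f}s -> {speaker}")
```

The point of running this on the edited timeline is that the cut times already account for removed filler words, trimmed pauses, and deleted sections, so the switches wouldn't need to be redone after every edit.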