When people record themselves locally and then send the recordings to an editor, aligning the individual tracks by hand can be onerous. I imagine Descript could do it automatically, either by identifying signals that are present on all tracks or by using a reference track, for example a Zoom recording of the same conversation.
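To make the reference-track idea concrete, here is a minimal sketch of one standard way to do it: cross-correlation. This is my assumption about a possible approach, not how Descript actually works. The function names `estimate_offset` and `align_to_reference` are hypothetical. The sketch also assumes both tracks are mono lists of samples at the same sample rate. It slides the local track against the reference and keeps the lag where the two line up best.

```python
def estimate_offset(reference, track, max_lag):
    """Hypothetical helper: find the lag (in samples) where `track` best matches `reference`.

    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    A positive lag means the shared content occurs `lag` samples later
    in `track` than in `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        # Sum the products of overlapping samples at this lag.
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(track):
                score += ref_sample * track[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_to_reference(reference, track, max_lag):
    """Hypothetical helper: shift `track` so it lines up with `reference`."""
    lag = estimate_offset(reference, track, max_lag)
    if lag >= 0:
        # Track started late: drop its first `lag` samples.
        aligned = track[lag:]
    else:
        # Track started early: pad the front with silence.
        aligned = [0.0] * (-lag) + track
    return aligned, lag


if __name__ == "__main__":
    reference = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]  # e.g. a short transient
    late_track = [0.0] * 3 + reference                      # same audio, 3 samples late
    aligned, lag = align_to_reference(reference, late_track, max_lag=5)
    print(lag)  # 3
```

The brute-force loop is only for readability. Real recordings have millions of samples, so a practical version would use FFT-based correlation and would probably also have to handle clock drift between devices, not just a fixed offset.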