After discussion, aborting the snapshot job in VDSM doesn't seem to make much sense. The abort mainly covers hitting a timeout while calling libvirt (or while preparing for that call), an operation that should be very fast, whereas the timing issues we actually hit are mostly in the freeze/thaw operations. In those cases we can't really abort, and once we are past them we should probably let the snapshot operation finish. Therefore, the current decision is to remove the abort mechanism.
In further discussion we identified two main flows: 1. Snapshot with memory - Here it makes sense to have a timeout that fails the operation and releases the VM. Even so, we need to think about the timeout (currently 30 minutes by default and configurable in engine-config), and also consider a per-snapshot timeout. 2. Snapshot without memory - Here we usually want the snapshot to complete, so we may consider dropping the timeout.
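To make the two flows concrete, here is a minimal sketch of how the timeout could be chosen per snapshot. It is only an illustration of the idea discussed above, not the actual VDSM or engine code: the function name, the parameter names and the constant are all hypothetical, and dropping the timeout for snapshots without memory is still only a proposal.

```python
from typing import Optional

# Hypothetical constant mirroring the current 30-minute default
# that is configurable in engine-config.
DEFAULT_MEMORY_SNAPSHOT_TIMEOUT_MIN = 30


def snapshot_timeout_seconds(
    with_memory: bool,
    per_snapshot_timeout_min: Optional[int] = None,
) -> Optional[int]:
    """Return the timeout in seconds, or None when no timeout applies.

    Assumption: snapshots without memory get no timeout, so they are
    always allowed to run to completion.
    """
    if not with_memory:
        # Flow 2: we usually want the snapshot to complete.
        return None
    # Flow 1: use the per-snapshot value if given, else the default.
    if per_snapshot_timeout_min is not None:
        minutes = per_snapshot_timeout_min
    else:
        minutes = DEFAULT_MEMORY_SNAPSHOT_TIMEOUT_MIN
    return minutes * 60
```

With this shape, a caller would treat None as "wait until the snapshot finishes" and any integer as the point at which the operation fails and the VM is released.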
This bugzilla is included in oVirt 4.4.9 release, published on October 20th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.9 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.