Previously, sending multiple requests in rapid succession to update an affinity group simultaneously could cause conflicts that resulted in a failure. The conflict occurred because the affinity group was removed and recreated during the update process. In the current release, each update on an affinity group is initiated with a specific operation, so the affinity group is no longer removed and recreated during the update.
Created attachment 1773045
engine log
Description of problem:
If you run multiple API calls in parallel to update the VMs of an affinity group, some of them fail with an error 400 from the API, and SQL errors appear in engine.log.
Part of the log is attached to the bug; here is the error:
2021-04-07 06:47:39,614-05 ERROR [org.ovirt.engine.core.bll.scheduling.commands.EditAffinityGroupCommand] (default task-8259) [9a49cc7e-2d51-4321-895a-fb6f2b11f853] Command 'org.ovirt.engine.core.bll.scheduling.commands.EditAffinityGroupCommand' failed: CallableStatementCallback; SQL [{call updateaffinitygroupwithmembers(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: duplicate key value violates unique constraint "affinity_group_pk"
Detail: Key (id)=(87b1f509-af8e-4e14-9ea5-5a99f6080686) already exists.
Where: SQL statement "INSERT INTO affinity_groups (
2021-04-07 06:47:39,627-05 ERROR [org.ovirt.engine.core.bll.scheduling.commands.EditAffinityGroupCommand] (default task-8259) [9a49cc7e-2d51-4321-895a-fb6f2b11f853] Transaction rolled-back for command 'org.ovirt.engine.core.bll.scheduling.commands.EditAffinityGroupCommand'.
2021-04-07 06:47:39,635-05 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-8259) [] Operation Failed: [Internal Engine Error]
The likely cause is that the update just replaces the affinity group rather than modifying it in place (CREATE OR REPLACE FUNCTION); see https://github.com/oVirt/ovirt-engine/blob/master/packaging/dbscripts/affinity_groups_sp.sql#L208
There should be a lock on that operation.
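To illustrate the suspected failure mode, here is a minimal sketch in Python using an in-memory SQLite table. It is not the engine's stored procedure: the table layout, the 'ag-1' id, and the function names are assumptions made up for the illustration, and the two "updaters" are simulated by interleaving statements on one connection instead of running real concurrent PostgreSQL transactions. It shows why a delete-then-insert replacement can end in a duplicate key error when two updates overlap, while an in-place UPDATE cannot.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified stand-in for the affinity_groups table.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE affinity_groups (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO affinity_groups VALUES ('ag-1', 'original')")
    return conn


def replace_interleaved(conn):
    """Interleave two delete+insert updates; return True on a duplicate key."""
    conn.execute("DELETE FROM affinity_groups WHERE id = 'ag-1'")  # updater A
    conn.execute("DELETE FROM affinity_groups WHERE id = 'ag-1'")  # updater B, nothing left to delete
    conn.execute("INSERT INTO affinity_groups VALUES ('ag-1', 'from A')")  # updater A
    try:
        conn.execute("INSERT INTO affinity_groups VALUES ('ag-1', 'from B')")  # updater B
    except sqlite3.IntegrityError:
        return True  # same class of error as the affinity_group_pk violation
    return False


def update_in_place(conn):
    """Apply the same two updates as in-place UPDATEs; return the final name."""
    conn.execute("UPDATE affinity_groups SET name = 'from A' WHERE id = 'ag-1'")
    conn.execute("UPDATE affinity_groups SET name = 'from B' WHERE id = 'ag-1'")
    row = conn.execute("SELECT name FROM affinity_groups WHERE id = 'ag-1'").fetchone()
    return row[0]


conflict = replace_interleaved(make_db())
final_name = update_in_place(make_db())
```

In this simplified model the second insert fails because the first updater has already recreated the row, whereas the in-place variant simply lets the last writer win.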
The many to many relationship between VMs and affinity rules is not modeled correctly, that's true.
However, clients that make arguably incorrect use of affinity rules can serialize the calls to add VMs on their end to avoid this.
Setting severity accordingly and let's aim to model it correctly at the database layer rather than introducing more locks in oVirt.
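As a rough sketch of what serializing the calls on the client side could look like, the Python snippet below funnels all updates for one affinity group through a per-group lock. The helper serialized_update, the fake_update stand-in, and the 'ag-1' id are hypothetical names for illustration; in a real client, the callable passed in would be whatever code issues the actual API request. This only serializes calls made from a single process; separate client processes would need some other form of coordination.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

_locks_guard = threading.Lock()             # protects the lock table itself
_group_locks = defaultdict(threading.Lock)  # one lock per affinity group id


def serialized_update(group_id, update_fn, *args, **kwargs):
    """Run update_fn(group_id, ...) while holding that group's lock."""
    with _locks_guard:
        lock = _group_locks[group_id]
    with lock:
        return update_fn(group_id, *args, **kwargs)


# Demonstration with a stand-in for the real API call (assumption).
active = set()
overlaps = []


def fake_update(group_id, vm_name):
    if group_id in active:       # another update on this group is in flight
        overlaps.append(vm_name)
    active.add(group_id)
    time.sleep(0.001)            # pretend to wait for the server
    active.discard(group_id)
    return vm_name


vm_names = [f"vm-{i}" for i in range(20)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(
        lambda vm: serialized_update("ag-1", fake_update, vm), vm_names))
```

The worker threads still submit their requests in parallel, but updates that target the same group run one at a time, so no two of them overlap.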
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:3460