Description of problem:
Unable to remove a host from the RHEV-M GUI; the event log shows "Failed to remove host xxxxxxxx".

Version-Release number of selected component (if applicable):
rhevm 3.4
rhev-hypervisor6-6.5-20140603.2.el6

How reproducible:
Sometimes

Steps to Reproduce:
1. Put the host into maintenance mode so that its VMs are migrated to another host in the cluster.
2. Once the host is in maintenance mode, try to remove it from the RHEV-M GUI.
3. In 7-8 out of 10 attempts, the removal fails with "Failed to remove host xxxxxxxx" in the event log.

The following messages appear in the engine logs:
======
2014-06-23 18:25:51,683 INFO [org.ovirt.engine.core.bll.RemoveVdsCommand] (ajp-/127.0.0.1:8702-11) [48550cd7] Lock Acquired to object EngineLock [exclusiveLocks= key: 3bb08ce7-bd75-4897-b371-97493bf4490b value: VDS , sharedLocks= ]
2014-06-23 18:25:51,844 INFO [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Running command: RemoveVdsCommand internal: false. Entities affected : ID: 3bb08ce7-bd75-4897-b371-97493bf4490b Type: VDS
2014-06-23 18:25:52,614 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-4-thread-6) [48550cd7] transaction rolled back
2014-06-23 18:25:52,614 ERROR [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Command org.ovirt.engine.core.bll.RemoveVdsCommand throw exception: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call deletevdsstatic(?)}]; ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
  Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
  Where: SQL statement "DELETE FROM vds_static WHERE vds_id = $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
  Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
  Where: SQL statement "DELETE FROM vds_static WHERE vds_id = $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement
=========

Actual results:
Unable to remove the host.

Expected results:
The host should be removable, since it is in maintenance mode and no VMs are running on it.
Present workaround:
1. Check the engine.log files for a message like the one quoted in the description above: a RemoveVdsCommand failure with a DataIntegrityViolationException stating that foreign key constraint "vds_static_vm_dynamic_m" is violated because the host's ID is still referenced from table "vm_dynamic".
2. Run the following query to find the VMs that are trying to migrate to this host (or are still recorded as running on it):

   select vm_guid, status, run_on_vds, migrating_to_vds
   from vm_dynamic
   where migrating_to_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b'
      or run_on_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b';

3. Then run the following statement to update the vm_dynamic table:

   UPDATE vm_dynamic SET migrating_to_vds = NULL
   where migrating_to_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b'
      or run_on_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b';

After these steps, the host could be removed successfully.
I'm not sure how this should be fixed, but here are a few options:
- If any VMs are still trying to migrate to the host, the engine should not allow the host to be put into maintenance mode, and should report the names/IDs of the VMs that are trying to migrate.
- Alternatively, set migrating_to_vds to NULL in vm_dynamic directly when the VMs are running properly on other hosts, or are in any status other than "Migrating To".
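The second option could look roughly like the Java sketch below. It is not ovirt-engine code: CleanupVmStatus, CleanupVmRow and StaleMigrationCleanup are hypothetical names, and the list of rows is an in-memory stand-in for vm_dynamic. It clears migrating_to_vds only on rows that point at the host while the VM is not in a migration state.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical VM states for this sketch only; not the engine's own status enum.
enum CleanupVmStatus { UP, PAUSED, DOWN, MIGRATING_FROM, MIGRATING_TO }

// Hypothetical mutable stand-in for the vm_dynamic columns used here.
final class CleanupVmRow {
    final UUID vmGuid;
    final CleanupVmStatus status;
    UUID migratingToVds; // null when no migration target is recorded

    CleanupVmRow(UUID vmGuid, CleanupVmStatus status, UUID migratingToVds) {
        this.vmGuid = vmGuid;
        this.status = status;
        this.migratingToVds = migratingToVds;
    }
}

final class StaleMigrationCleanup {
    // States in which a migration may really be in progress, so the target is kept.
    private static final Set<CleanupVmStatus> IN_MIGRATION =
            EnumSet.of(CleanupVmStatus.MIGRATING_FROM, CleanupVmStatus.MIGRATING_TO);

    // Sets migratingToVds to null on every row that points at hostId while
    // the VM is not migrating; returns how many rows were changed.
    static int clearStaleTargets(List<CleanupVmRow> rows, UUID hostId) {
        int cleared = 0;
        for (CleanupVmRow row : rows) {
            if (hostId.equals(row.migratingToVds) && !IN_MIGRATION.contains(row.status)) {
                row.migratingToVds = null;
                cleared++;
            }
        }
        return cleared;
    }
}
```

Leaving rows in a migration state untouched is the conservative choice: clearing the target of a migration that is still running would lose track of where the VM is going.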
It seems to me that the canDoAction validation should check whether any VMs are trying to migrate to this host, and block the operation if any such VM is found. The user/admin can then safely put the host into maintenance once the migration completes, and remove it.
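A minimal sketch of such a check, in Java since the engine is written in Java. RemoveHostCheck, IncomingVm and their methods are hypothetical names for illustration, not the engine's actual canDoAction API; the in-memory list stands in for the vm_dynamic data. The check collects the names of VMs whose migrating_to_vds still points at the host, and allows removal only when that list is empty.

```java
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical stand-in for the VM data the check needs.
final class IncomingVm {
    final String vmName;
    final UUID migratingToVds; // null when the VM is not migrating anywhere

    IncomingVm(String vmName, UUID migratingToVds) {
        this.vmName = vmName;
        this.migratingToVds = migratingToVds;
    }
}

final class RemoveHostCheck {
    // Names of the VMs still recorded as migrating to hostId.
    static List<String> blockingVms(List<IncomingVm> vms, UUID hostId) {
        return vms.stream()
                .filter(vm -> hostId.equals(vm.migratingToVds))
                .map(vm -> vm.vmName)
                .collect(Collectors.toList());
    }

    // canDoAction-style precondition: removal is allowed only when no VM is
    // migrating to the host; otherwise the names can go into the error message.
    static boolean canRemoveHost(List<IncomingVm> vms, UUID hostId) {
        return blockingVms(vms, hostId).isEmpty();
    }
}
```

Returning the names rather than only a boolean lets the error message tell the admin which migrations to wait for, in line with the first option suggested earlier.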
(In reply to Eli Mesika from comment #3)
> For me it seems that the canDoAction validation should check if there are
> some VMs trying to migrate to this host and block the operation if any such
> VM is found
>
> The user/admin can safely put the host on Maintenance when migration
> completes and remove it

Makes sense to me. Roy - does that make sense?
Moving it to virt after discussing it with Arik, as it handles VM migration flows. Make sure to have an infra reviewer on the resulting patch.
What's the status of the VMs whose migrating_to_vds field points to the host that is in maintenance? Do we have the engine.log?
I managed to reproduce this bug. The solution for [1] fixes this one as well. Note that the fix was backported to 3.3 and 3.4.
The patch attached to this bug will solve another problem related to this flow, where the migrating_to_vds field was not cleared after a successful migration.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1097256
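The idea behind that patch, clearing migrating_to_vds once a migration succeeds, can be sketched as below. This is a hypothetical model (CompletionVm and MigrationCompletion are made-up names), not the code of the actual patch; it only shows the state transition on the two vm_dynamic columns involved.

```java
import java.util.UUID;

// Hypothetical mutable stand-in for the two vm_dynamic columns involved.
final class CompletionVm {
    UUID runOnVds;
    UUID migratingToVds; // null when no migration target is recorded

    CompletionVm(UUID runOnVds, UUID migratingToVds) {
        this.runOnVds = runOnVds;
        this.migratingToVds = migratingToVds;
    }
}

final class MigrationCompletion {
    // Assumed to be called once the destination host reports the VM as running.
    static void onMigrationSucceeded(CompletionVm vm) {
        if (vm.migratingToVds == null) {
            throw new IllegalStateException("no migration target recorded");
        }
        vm.runOnVds = vm.migratingToVds; // the VM now runs on the destination
        vm.migratingToVds = null;        // drop the reference so the host can be removed later
    }
}
```

If the target is cleared at this point, no stale reference is left behind to trip the foreign key constraint when the host is later removed.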
The issue is fixed; keeping this bug open for the nice-to-have patch.
http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=1117758ff8069311b2083485a6d6c98fc92edcd7
Reopening, as the fix mentioned in comment 11 doesn't cover all the flows that can cause this bug.
Hi Arik,

Thanks for the update. Let me know if you need more info for the analysis; I will be happy to collect it from the customer's end.

Thanks,
Uday
http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=575df518a963a66cb43599a31f1c981156f6f34f
Tested on vt4. Couldn't reproduce the problem, because on the 2nd attempt to put the host into maintenance, the migration failed due to bug 1145636.
Verified on vt8. Ran the "Steps to Reproduce" from the bug description 7 times.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0158.html
*** Bug 1188854 has been marked as a duplicate of this bug. ***