Bug 1112359 - Failed to remove host xxxxxxxx
Summary: Failed to remove host xxxxxxxx
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Arik
QA Contact: Ilanit Stein
URL:
Whiteboard: virt
Duplicates: 1188854
Depends On: 1145636
Blocks: 1131569 1131856 rhev3.5beta 1156165
Reported: 2014-06-23 17:15 UTC by Udayendu Sekhar Kar
Modified: 2019-04-28 10:45 UTC (History)

Fixed In Version: vt2.2
Doc Type: Bug Fix
Doc Text:
Previously, virtual machines would be reported as running on the wrong host after failing to migrate due to a maintenance operation on the host. This would prevent hosts where such virtual machines were reported as running from being removed from the Manager. Now, virtual machines are reported as running on the correct host, and it is possible to remove hosts correctly when there are no running virtual machines on those hosts.
Clone Of:
Clones: 1131569 1131856
Environment:
Last Closed: 2015-02-11 18:04:07 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 83073 0 None None None Never
Red Hat Product Errata RHSA-2015:0158 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Virtualization Manager 3.5.0 2015-02-11 22:38:50 UTC
oVirt gerrit 29620 0 master MERGED core: clear migrating_to_vds after successful migration Never
oVirt gerrit 31610 0 master MERGED core: fix incorrect value of migrating_to_vds Never
oVirt gerrit 31721 0 ovirt-engine-3.5 MERGED core: fix incorrect value of migrating_to_vds Never

Description Udayendu Sekhar Kar 2014-06-23 17:15:54 UTC
Description of problem:
Unable to remove a host from the rhevm GUI. The event log shows the message "Failed to remove host xxxxxxxx".

Version-Release number of selected component (if applicable):
rhevm 3.4
rhev-hypervisor6-6.5-20140603.2.el6


How reproducible:
Sometimes (about 7-8 out of 10 attempts)

Steps to Reproduce:
1. Put the host into maintenance mode so that its VMs migrate to another host in the cluster.
2. Once the host is in maintenance mode, try to remove it from the rhevm GUI.
3. In 7-8 out of 10 attempts, the removal fails with the "Failed to remove host xxxxxxxx" message in the event log.

The engine log contains the following messages:

======
2014-06-23 18:25:51,683 INFO  [org.ovirt.engine.core.bll.RemoveVdsCommand] (ajp-/127.0.0.1:8702-11) [48550cd7] Lock Acquired to object EngineLock [exclusiveLocks= key: 3bb08ce7-bd75-4897-b371-97493bf4490b value: VDS
, sharedLocks= ]
2014-06-23 18:25:51,844 INFO  [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Running command: RemoveVdsCommand internal: false. Entities affected :  ID: 3bb08ce7-bd75-4897-b371-97493bf4490b Type: VDS
2014-06-23 18:25:52,614 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-4-thread-6) [48550cd7] transaction rolled back
2014-06-23 18:25:52,614 ERROR [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Command org.ovirt.engine.core.bll.RemoveVdsCommand throw exception: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call deletevdsstatic(?)}]; ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
  Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
  Where: SQL statement "DELETE FROM vds_static WHERE vds_id =  $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
  Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
  Where: SQL statement "DELETE FROM vds_static WHERE vds_id =  $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement
=========

Actual results:
Unable to remove the host.

Expected results:
The host should be removable, as it is in maintenance mode and no VMs are running on it.
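
The foreign key error in the log can be illustrated with a small model. The sketch below is not the engine's schema: it uses an in-memory SQLite database with only the columns named in this report, and it assumes that the constraint "vds_static_vm_dynamic_m" guards the migrating_to_vds column. The function names (build_model, try_remove_host) and the 'other-host' and 'vm-1' values are made up for illustration.

```python
import sqlite3

HOST_ID = "3bb08ce7-bd75-4897-b371-97493bf4490b"  # host ID taken from the log above


def build_model() -> sqlite3.Connection:
    """Create a two-table stand-in for vds_static and vm_dynamic (simplified, hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
    conn.execute("CREATE TABLE vds_static (vds_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE vm_dynamic ("
        " vm_guid TEXT PRIMARY KEY,"
        " run_on_vds TEXT REFERENCES vds_static (vds_id),"
        " migrating_to_vds TEXT REFERENCES vds_static (vds_id))"
    )
    conn.execute("INSERT INTO vds_static VALUES (?)", (HOST_ID,))
    conn.execute("INSERT INTO vds_static VALUES ('other-host')")
    # The VM runs on another host but still carries a stale migration target.
    conn.execute("INSERT INTO vm_dynamic VALUES ('vm-1', 'other-host', ?)", (HOST_ID,))
    conn.commit()
    return conn


def try_remove_host(conn: sqlite3.Connection, host_id: str) -> bool:
    """Attempt the delete; return False if a foreign key still references the host."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM vds_static WHERE vds_id = ?", (host_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

In this model, try_remove_host(build_model(), HOST_ID) returns False until the stale migrating_to_vds value is cleared, which matches the "is still referenced from table" detail in the log.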

Comment 1 Udayendu Sekhar Kar 2014-06-23 17:21:47 UTC
Present workaround:

1. Check the engine.log files for any relevant message like:

------
2014-06-23 18:25:51,683 INFO  [org.ovirt.engine.core.bll.RemoveVdsCommand] (ajp-/127.0.0.1:8702-11) [48550cd7] Lock Acquired to object EngineLock [exclusiveLocks= key: 3bb08ce7-bd75-4897-b371-97493bf4490b value: VDS
, sharedLocks= ]
2014-06-23 18:25:51,844 INFO  [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Running command: RemoveVdsCommand internal: false. Entities affected :  ID: 3bb08ce7-bd75-4897-b371-97493bf4490b Type: VDS
2014-06-23 18:25:52,614 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-4-thread-6) [48550cd7] transaction rolled back
2014-06-23 18:25:52,614 ERROR [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Command org.ovirt.engine.core.bll.RemoveVdsCommand throw exception: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call deletevdsstatic(?)}]; ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
  Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
  Where: SQL statement "DELETE FROM vds_static WHERE vds_id =  $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
  Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
  Where: SQL statement "DELETE FROM vds_static WHERE vds_id =  $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement
-------


2. Run the command below to find the VMs that are recorded as migrating to, or running on, this host:

 select vm_guid,status,run_on_vds,migrating_to_vds from vm_dynamic where migrating_to_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b' or run_on_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b';

3. Then run the command below to update the vm_dynamic table:

UPDATE vm_dynamic SET migrating_to_vds = NULL where migrating_to_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b' or run_on_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b';

After the above two steps, I was able to remove the host successfully.
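
The workaround can be sketched as one "inspect, then clear" step. This is a minimal model and not something to run against the engine database: it uses an in-memory SQLite table with only the columns named above, and the helper names (build_model, clear_stale_targets) and sample rows are hypothetical. Unlike the UPDATE in step 3, it matches on migrating_to_vds only. That narrowing is an assumption on my side: a row matched only through run_on_vds would still reference the host through run_on_vds after the update.

```python
import sqlite3

HOST_ID = "3bb08ce7-bd75-4897-b371-97493bf4490b"  # host ID from the log above


def build_model() -> sqlite3.Connection:
    """Simplified stand-in for the vm_dynamic columns used here; not the real schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vm_dynamic ("
        " vm_guid TEXT PRIMARY KEY, run_on_vds TEXT, migrating_to_vds TEXT)"
    )
    conn.executemany(
        "INSERT INTO vm_dynamic VALUES (?, ?, ?)",
        [
            ("vm-1", "other-host", HOST_ID),  # stale migration target
            ("vm-2", "other-host", None),     # unaffected row
        ],
    )
    conn.commit()
    return conn


def clear_stale_targets(conn: sqlite3.Connection, host_id: str) -> list:
    """List the VMs pointing at host_id, then clear the field in one transaction."""
    affected = [
        row[0]
        for row in conn.execute(
            "SELECT vm_guid FROM vm_dynamic WHERE migrating_to_vds = ? ORDER BY vm_guid",
            (host_id,),
        )
    ]
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE vm_dynamic SET migrating_to_vds = NULL WHERE migrating_to_vds = ?",
            (host_id,),
        )
    return affected
```

Returning the affected VM IDs keeps a record of what was changed, which is useful if the update later needs to be explained or reverted.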

Comment 2 Udayendu Sekhar Kar 2014-06-23 17:25:26 UTC
Not sure which way we should fix it, but the following are a few options:

- If any such migrations are pending, the host should not be allowed to go into maintenance mode, and the engine should report the names/IDs of the VMs that are trying to migrate.

- Or directly set migrating_to_vds to NULL in vm_dynamic if the VMs are running properly on other hosts or have a status other than "Migration to".

Comment 3 Eli Mesika 2014-06-25 07:58:23 UTC
For me it seems that the canDoAction validation should check if there are some VMs trying to migrate to this host and block the operation if any such VM is found 

The user/admin can safely put the host on Maintenance when migration completes and remove it
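
The validation proposed here could look roughly like the sketch below. It is written in Python for brevity and is not the engine's canDoAction code; the Vm class and the can_remove_host function are hypothetical names, and the check only covers the "VMs trying to migrate to this host" condition described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Vm:
    """Hypothetical, minimal view of a vm_dynamic row."""
    name: str
    run_on_vds: Optional[str]
    migrating_to_vds: Optional[str]


def can_remove_host(host_id: str, vms: List[Vm]) -> Tuple[bool, List[str]]:
    """Allow removal only if no VM still names host_id as its migration target.

    Returns (allowed, names of the blocking VMs) so the caller can report
    which VMs prevent the operation.
    """
    blocking = sorted(vm.name for vm in vms if vm.migrating_to_vds == host_id)
    return (not blocking, blocking)
```

Returning the blocking VM names along with the verdict lets the caller tell the admin which migrations to wait for, in line with the suggestion in comment 2.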

Comment 4 Oved Ourfali 2014-06-25 08:43:54 UTC
(In reply to Eli Mesika from comment #3)
> For me it seems that the canDoAction validation should check if there are
> some VMs trying to migrate to this host and block the operation if any such
> VM is found 
> 
> The user/admin can safely put the host on Maintenance when migration
> completes and remove it

Makes sense to me.
Roy - does that make sense?

Comment 5 Oved Ourfali 2014-06-25 11:27:02 UTC
Moving it to virt after discussing it with Arik, as it handles VM migration flows. Make sure to have an infra reviewer on the resulting patch.

Comment 10 Arik 2014-07-06 11:01:47 UTC
What is the status of the VMs whose migrating_to_vds field points to the host that is in maintenance?

Do we have engine.log?

Comment 11 Arik 2014-07-06 20:13:59 UTC
I managed to reproduce this bug. The solution for [1] fixes this one as well. Note that the fix was backported to 3.3 and 3.4.

The patch which is attached to this bug will solve another problem which is related to this flow, where the migrating_to_vds field was not cleared after successful migration.
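
As a rough illustration of the behavior the patch aims for, the sketch below clears the migration target once a migration succeeds. It is a Python model, not the engine code: the VmDynamic class and the on_migration_succeeded function are hypothetical, and taking the new host from migrating_to_vds is an assumption made to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmDynamic:
    """Hypothetical, minimal view of a vm_dynamic row."""
    vm_guid: str
    run_on_vds: Optional[str]
    migrating_to_vds: Optional[str]


def on_migration_succeeded(vm: VmDynamic) -> None:
    """Record the destination as the running host and clear the migration target."""
    if vm.migrating_to_vds is None:
        return  # no migration in progress, nothing to update
    vm.run_on_vds = vm.migrating_to_vds
    vm.migrating_to_vds = None  # without this, the source or destination host stays referenced
```

With the field cleared, no vm_dynamic row keeps a leftover migration reference to a host, so a later host removal is not blocked by it in this model.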

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1097256

Comment 12 Michal Skrivanek 2014-07-07 08:31:45 UTC
issue is fixed, keeping it open for the nice-to-have patch

Comment 14 Arik 2014-08-17 14:29:37 UTC
Reopening, as the fix mentioned in comment 11 doesn't cover all the flows that can cause this bug.

Comment 16 Udayendu Sekhar Kar 2014-08-18 05:49:21 UTC
Hi Arik,

Thanks for the update.

Let me know if you need more info for the analysis. I will be happy to collect that from the customer end.

Thanks,
Uday

Comment 17 Udayendu Sekhar Kar 2014-08-18 05:49:35 UTC
Hi Arik,

Thanks for the update.

Let me know if you need more info for the analysis. I will be happy to collect that from the customer end.

Thanks,
Uday

Comment 23 Ilanit Stein 2014-10-02 14:00:55 UTC
Tested on vt4.
Couldn't reproduce the problem: on the 2nd attempt to put the host into maintenance, migration failed because of bug 1145636.

Comment 24 Ilanit Stein 2014-11-02 10:06:38 UTC
Verified on vt8.
Ran the "Steps to Reproduce" from the bug description 7 times.

Comment 26 errata-xmlrpc 2015-02-11 18:04:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html

Comment 27 Omer Frenkel 2015-02-16 09:33:17 UTC
*** Bug 1188854 has been marked as a duplicate of this bug. ***

