Bug 1112359
| Summary: | Failed to remove host xxxxxxxx | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Udayendu Sekhar Kar <ukar> | |
| Component: | ovirt-engine | Assignee: | Arik <ahadas> | |
| Status: | CLOSED ERRATA | QA Contact: | Ilanit Stein <istein> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 3.4.0 | CC: | adahms, asegundo, gwatson, iheim, ipinto, lpeer, mavital, michal.skrivanek, mkalinin, ofrenkel, oourfali, rbalakri, rgolan, Rhev-m-bugs, rpai, sherold, ukar, yeylon | |
| Target Milestone: | --- | Keywords: | Triaged, ZStream | |
| Target Release: | 3.5.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | virt | |||
| Fixed In Version: | vt2.2 | Doc Type: | Bug Fix | |
| Doc Text: | Previously, virtual machines would be reported as running on the wrong host after failing to migrate due to a maintenance operation on the host. This would prevent hosts where such virtual machines were reported as running from being removed from the Manager. Now, virtual machines are reported as running on the correct host, and it is possible to remove hosts correctly when there are no running virtual machines on those hosts. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1131569 1131856 | Environment: | |
| Last Closed: | 2015-02-11 18:04:07 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1145636 | |||
| Bug Blocks: | 1131569, 1131856, 1142923, 1156165 | |||
Present workaround:
1. Check the engine.log files for any relevant message like:
------
2014-06-23 18:25:51,683 INFO [org.ovirt.engine.core.bll.RemoveVdsCommand] (ajp-/127.0.0.1:8702-11) [48550cd7] Lock Acquired to object EngineLock [exclusiveLocks= key: 3bb08ce7-bd75-4897-b371-97493bf4490b value: VDS
, sharedLocks= ]
2014-06-23 18:25:51,844 INFO [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Running command: RemoveVdsCommand internal: false. Entities affected : ID: 3bb08ce7-bd75-4897-b371-97493bf4490b Type: VDS
2014-06-23 18:25:52,614 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-4-thread-6) [48550cd7] transaction rolled back
2014-06-23 18:25:52,614 ERROR [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Command org.ovirt.engine.core.bll.RemoveVdsCommand throw exception: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call deletevdsstatic(?)}]; ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
Where: SQL statement "DELETE FROM vds_static WHERE vds_id = $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
Where: SQL statement "DELETE FROM vds_static WHERE vds_id = $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement
-------
2. Run the following query to find the VMs that are running on, or trying to migrate to, this host:
select vm_guid,status,run_on_vds,migrating_to_vds from vm_dynamic where migrating_to_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b' or run_on_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b';
3. Then run the below command to alter the vm_dynamic table:
UPDATE vm_dynamic SET migrating_to_vds = NULL where migrating_to_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b' or run_on_vds = '3bb08ce7-bd75-4897-b371-97493bf4490b';
After the above two steps, the host could be removed successfully.
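The failure and the workaround above can be reproduced in miniature. The following is only a sketch under stated assumptions: it uses an in-memory SQLite database rather than the engine's PostgreSQL database, the two tables are reduced to the columns named in the queries above, and the foreign keys on both `run_on_vds` and `migrating_to_vds` are assumed (the log only shows that `vm_dynamic` references `vds_static`). The second host ID, the VM ID, the status value, and the helper function names are made up for illustration.

```python
import sqlite3

HOST_ID = "3bb08ce7-bd75-4897-b371-97493bf4490b"  # host ID taken from the log above
OTHER_HOST_ID = "hypothetical-other-host"          # made-up ID, illustration only
VM_ID = "hypothetical-vm"                          # made-up ID, illustration only


def make_db():
    """Build a simplified, assumed stand-in for vds_static and vm_dynamic."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE vds_static (vds_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE vm_dynamic ("
        " vm_guid TEXT PRIMARY KEY,"
        " status INTEGER,"
        " run_on_vds TEXT REFERENCES vds_static(vds_id),"
        " migrating_to_vds TEXT REFERENCES vds_static(vds_id))"
    )
    conn.executemany(
        "INSERT INTO vds_static VALUES (?)", [(HOST_ID,), (OTHER_HOST_ID,)]
    )
    # The VM runs on the other host but still carries a stale migration target.
    # The status value 1 is an arbitrary placeholder.
    conn.execute(
        "INSERT INTO vm_dynamic VALUES (?, ?, ?, ?)",
        (VM_ID, 1, OTHER_HOST_ID, HOST_ID),
    )
    conn.commit()
    return conn


def remove_host(conn, host_id):
    """Try to delete the host row; return False on a foreign key violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM vds_static WHERE vds_id = ?", (host_id,))
        return True
    except sqlite3.IntegrityError:
        return False


def find_referencing_vms(conn, host_id):
    """Step 2 of the workaround: list VMs that still reference the host."""
    return conn.execute(
        "SELECT vm_guid, status, run_on_vds, migrating_to_vds FROM vm_dynamic"
        " WHERE migrating_to_vds = ? OR run_on_vds = ?",
        (host_id, host_id),
    ).fetchall()


def clear_stale_migration_target(conn, host_id):
    """Step 3 of the workaround: clear migrating_to_vds; return rows changed."""
    with conn:
        cur = conn.execute(
            "UPDATE vm_dynamic SET migrating_to_vds = NULL"
            " WHERE migrating_to_vds = ? OR run_on_vds = ?",
            (host_id, host_id),
        )
    return cur.rowcount


if __name__ == "__main__":
    db = make_db()
    print("remove before workaround:", remove_host(db, HOST_ID))
    print("referencing VMs:", find_referencing_vms(db, HOST_ID))
    print("rows updated:", clear_stale_migration_target(db, HOST_ID))
    print("remove after workaround:", remove_host(db, HOST_ID))
```

On a real engine database it would be prudent to run the step 2 query first and review the rows it returns before applying the update.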
Not sure how this should be fixed, but here are a few options:
- If any such migration tasks exist, the host should not be allowed to go into maintenance mode, and the name/ID of the VMs that are trying to migrate should be reported.
- Or directly set migrating_to_vds to NULL in vm_dynamic if the VMs are running properly on other hosts or have a status other than "Migration to".

For me it seems that the canDoAction validation should check if there are some VMs trying to migrate to this host and block the operation if any such VM is found.
The user/admin can safely put the host on Maintenance when migration completes and remove it.

(In reply to Eli Mesika from comment #3)
> For me it seems that the canDoAction validation should check if there are
> some VMs trying to migrate to this host and block the operation if any such
> VM is found
>
> The user/admin can safely put the host on Maintenance when migration
> completes and remove it
Makes sense to me. Roy - does that make sense?

Moving it to virt after discussing it with Arik, as it handles VM migration flows. Make sure to have an infra reviewer on the resulting patch.

What's the status of the VMs which point to the host that is in maintenance in their migrating_to_vds field? Do we have engine.log?

I managed to reproduce this bug. The solution for [1] fixes this one as well. Note that the fix was backported to 3.3 and 3.4. The patch which is attached to this bug will solve another problem which is related to this flow, where the migrating_to_vds field was not cleared after successful migration.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1097256

Issue is fixed, keeping it open for the nice-to-have patch.

Reopening, as the fix which was mentioned in comment 11 doesn't cover all the flows that can cause this bug.

Hi Arik,
Thanks for the update. Let me know if you need more info for the analysis. I will be happy to collect that from the customer end.
Thanks,
Uday

Tested on vt4. Couldn't reproduce the problem, since on the 2nd attempt to put the host in maintenance, migration failed on bug 1145636.

Verified on vt8. Tried the "Steps to Reproduce" from the bug description 7 times.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2015-0158.html

*** Bug 1188854 has been marked as a duplicate of this bug. ***
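The validation idea raised in the comments (block host removal while VMs still point at the host, and report which VMs they are) can be sketched as follows. This is not the engine's canDoAction code, which is Java. It is a hypothetical Python illustration in which the `VmDynamic` class and `can_remove_host` function are invented names, and checking `run_on_vds` in addition to `migrating_to_vds` is an assumption made because either column can hold the reference.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VmDynamic:
    """Hypothetical in-memory view of the vm_dynamic columns used here."""
    vm_guid: str
    run_on_vds: Optional[str]
    migrating_to_vds: Optional[str]


def can_remove_host(vms: List[VmDynamic], host_id: str) -> Tuple[bool, List[str]]:
    """Return (allowed, blocking_vm_ids) for a host removal request.

    Removal is refused while any VM still runs on the host or lists it
    as a migration target, and the offending VM IDs are returned so
    they can be shown to the administrator.
    """
    blocking = [
        vm.vm_guid
        for vm in vms
        if host_id in (vm.run_on_vds, vm.migrating_to_vds)
    ]
    return (not blocking, blocking)


if __name__ == "__main__":
    host = "3bb08ce7-bd75-4897-b371-97493bf4490b"  # host ID from the log
    vms = [
        VmDynamic("vm-a", run_on_vds="other-host", migrating_to_vds=host),
        VmDynamic("vm-b", run_on_vds="other-host", migrating_to_vds=None),
    ]
    print(can_remove_host(vms, host))
```

Returning the blocking VM IDs rather than a bare boolean follows the suggestion above that the name/ID of the VMs should be surfaced to the user.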
Description of problem:
Unable to remove host from rhevm GUI and got "Failed to remove host xxxxxxxx" message in the event log.

Version-Release number of selected component (if applicable):
rhevm 3.4
rhev-hypervisor6-6.5-20140603.2.el6

How reproducible:
Sometimes

Steps to Reproduce:
1. Put the host in maintenance mode to migrate the VMs to another host in that cluster.
2. Then try to remove the host from the rhevm GUI once it is in maintenance mode.
3. Out of 10 times, 7-8 times it will fail with the "Failed to remove host xxxxxxxx" message in the event log.

The following messages will be available in the engine logs:
======
2014-06-23 18:25:51,683 INFO [org.ovirt.engine.core.bll.RemoveVdsCommand] (ajp-/127.0.0.1:8702-11) [48550cd7] Lock Acquired to object EngineLock [exclusiveLocks= key: 3bb08ce7-bd75-4897-b371-97493bf4490b value: VDS
, sharedLocks= ]
2014-06-23 18:25:51,844 INFO [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Running command: RemoveVdsCommand internal: false. Entities affected : ID: 3bb08ce7-bd75-4897-b371-97493bf4490b Type: VDS
2014-06-23 18:25:52,614 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-4-thread-6) [48550cd7] transaction rolled back
2014-06-23 18:25:52,614 ERROR [org.ovirt.engine.core.bll.RemoveVdsCommand] (pool-4-thread-6) [48550cd7] Command org.ovirt.engine.core.bll.RemoveVdsCommand throw exception: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call deletevdsstatic(?)}]; ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
Where: SQL statement "DELETE FROM vds_static WHERE vds_id = $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: update or delete on table "vds_static" violates foreign key constraint "vds_static_vm_dynamic_m" on table "vm_dynamic"
Detail: Key (vds_id)=(3bb08ce7-bd75-4897-b371-97493bf4490b) is still referenced from table "vm_dynamic".
Where: SQL statement "DELETE FROM vds_static WHERE vds_id = $1 "
PL/pgSQL function "deletevdsstatic" line 11 at SQL statement
=========

Actual results:
Unable to remove the host.

Expected results:
Should be able to remove the host, as it's in maintenance mode and no VMs are running on it.