Bug 1303994
Summary: | Host maintenance failed since VM is already running on destination host | |
---|---|---|---
Product: | [oVirt] ovirt-engine | Reporter: | Israel Pinto <ipinto>
Component: | BLL.Virt | Assignee: | Martin Betak <mbetak>
Status: | CLOSED WONTFIX | QA Contact: | Israel Pinto <ipinto>
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 3.6.0.3 | CC: | bugs, ipinto, mbetak, michal.skrivanek, mpoledni, ncredi, tjelinek
Target Milestone: | --- | Keywords: | Automation
Target Release: | --- | Flags: | sbonazzo: ovirt-4.1-
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-12-07 16:42:56 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description (Israel Pinto, 2016-02-02 16:09:00 UTC)
Created attachment 1120489 [details]: engine_log
Created attachment 1120491 [details]: vdsm_log_1
Created attachment 1120492 [details]: vdsm_log_2
Created attachment 1120493 [details]: test_log
Hi, can you please also provide the server.log? It usually contains more detailed error messages.

Created attachment 1120890 [details]: server_log
Israel, in the server.log I see a lot of occurrences of BZ#1261877, which is supposed to be fixed in 3.6.0. Are you sure your version is 3.6.3? Thank you.

Martin, yes:
Version-Release number of selected component (if applicable):
RHEVM Version: 3.6.3-0.1.el6
Host: RHEV Hypervisor - 7.2 - 20160126.0.el7ev
vdsm-4.17.18-0.el7ev
libvirt-1.2.17-13.el7_2.2

Please provide the output of the following commands on both hosts (see the collection sketch below):
virsh -r nodedev-list
vdsClient -s 0 hostdevListByCaps
lspci

Created attachment 1122784 [details]: host_output
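As a convenience for gathering the requested data, here is a minimal sketch that runs the three commands above and saves the combined output, assuming it is run locally on each hypervisor; the output file name and the script itself are illustrative and not part of the original report.

```python
#!/usr/bin/env python3
"""Collect the host device info requested above (run locally on each host)."""
import socket
import subprocess

# The three commands asked for in the comment above; vdsClient is the
# 3.6-era CLI and assumes vdsm is installed on the local host.
COMMANDS = [
    ["virsh", "-r", "nodedev-list"],
    ["vdsClient", "-s", "0", "hostdevListByCaps"],
    ["lspci"],
]

def main():
    # Hypothetical output path, one file per host.
    out_path = "/tmp/host_output_%s.txt" % socket.gethostname()
    with open(out_path, "w") as out:
        for cmd in COMMANDS:
            out.write("$ %s\n" % " ".join(cmd))
            # Capture stdout and stderr so failures are visible in the attachment.
            result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT,
                                    universal_newlines=True)
            out.write(result.stdout)
            out.write("\n")
    print("wrote %s" % out_path)

if __name__ == "__main__":
    main()
```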
Although the engine/server logs point to something similar to bz#1306333, the logs provided from the host are fine. Either something changed between the logs or the problematic devices disappeared.

Indeed, the host device issue (tracked in a different bug) seems unrelated to this case. This new "behavior" of the migration/maintenance interaction seems to be a consequence of the 3.6 changes to VM/Host monitoring. Investigating further...

Israel: I tried to reproduce this migration/maintenance scenario by mass migrating 10 VMs from the source host to the destination and putting the destination into maintenance mode immediately after. This resulted in the following:

1) Some VMs managed to migrate successfully.
2) Some migrations were cancelled.
3) A message was written to the logs: "Migration failed while Host is in 'preparing for maintenance' state. Consider manual intervention: stopping/migrating Vms as Host's state will not turn to maintenance while VMs are still running on it."
4) I performed the "manual intervention" as suggested, by migrating the VMs that had managed to migrate back to the source.
5) Upon successful completion of 4), the *destination* went into maintenance mode automatically.

Maybe the only shortcoming here is that the engine did not try to re-issue the migrations from the preparing-for-maintenance destination, but from the observed log messages and the overall system behaviour I wouldn't say this is necessarily a bug.

Martin: The case here is that a VM is on Host_1 and we migrate it to Host_2. After the migration is done, we put Host_2 into maintenance, so Host_1 is now the destination, BUT the destroy of the VM has probably not finished yet on Host_1; this is why we get the message that the VM is already running on the destination. This case is rare, since it was hit by automation testing, but it can also occur whenever one or more VMs are migrated in a loop between two hosts without any delay. We started seeing this failure after the fix for the destroyed VM that stayed as external after it was deleted.

This is a very rare case and does not cause any harm (e.g. it does not affect the VM). It is a consequence of the fact that after the migration finishes the engine needs to send the destroy VM command to VDSM, and this can race with a new migration command.

This request has been proposed for two releases. This is invalid flag usage. The ovirt-future release flag has been cleared. If you wish to change the release flag, you must clear one release flag and then set the other release flag to ?.

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Best would be to work around that in the automation and try again; it is very unlikely to happen in the real world.
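For context, below is a minimal, hypothetical reproduction sketch of the race described above: migrate a VM to Host_2, then immediately put Host_2 into maintenance so the engine migrates the VM back while Host_1 may still be destroying the old QEMU process. It assumes the ovirt-engine-sdk-python 4 API and placeholder engine, host, and VM names; the original report used an internal automation framework, not this script.

```python
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details; adjust to the environment under test.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
system = connection.system_service()
vms_service = system.vms_service()
hosts_service = system.hosts_service()

# Hypothetical names: a VM currently running on Host_1, and Host_2 as target.
vm = vms_service.list(search='name=migration_vm')[0]
host_2 = hosts_service.list(search='name=host_2')[0]
vm_service = vms_service.vm_service(vm.id)

# Step 1: migrate the VM from Host_1 to Host_2 and wait until it lands there.
vm_service.migrate(host=types.Host(id=host_2.id))
while True:
    current = vm_service.get()
    if current.host is not None and current.host.id == host_2.id:
        break
    time.sleep(1)

# Step 2: immediately put Host_2 into maintenance. The engine now migrates the
# VM back to Host_1, which may still be tearing down (destroying) the QEMU
# process left over from step 1, producing "VM already running on destination".
hosts_service.host_service(host_2.id).deactivate()

connection.close()
```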