Bug 1274315 - HE-VM can't be migrated between the hosts and host is stuck in "Preparing For Maintenance".
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 3.6.0.1
Hardware: x86_64 Linux
Priority: high Severity: high
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Assigned To: Roy Golan
QA Contact: Nikolai Sednev
Keywords: Triaged
Depends On: 1269768
Blocks:
Reported: 2015-10-22 09:20 EDT by Nikolai Sednev
Modified: 2016-02-18 06:00 EST
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The import of the engine VM was constantly failing, and that collided with the attempt to update the migration progress of that VM. Consequence: The engine VM would appear to still be running on the host, and the host couldn't be moved into maintenance. Fix: The import of the engine VM is now stable and shouldn't fail, which lets the migration proceed. Result: The engine VM is handed over to the destination host, and the source host is freed and can then move to maintenance.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-18 06:00:54 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
rule-engine: blocker+
mgoldboi: planning_ack+
dfediuck: devel_ack+
rule-engine: testing_ack+


Attachments
logs from the engine and the host from which migration was started (2.73 MB, application/x-gzip)
2015-10-22 09:32 EDT, Nikolai Sednev
Screenshot from 2015-10-26 14:40:44.png (150.96 KB, image/png)
2015-10-26 08:41 EDT, Nikolai Sednev
engine's logs (146.81 KB, application/x-gzip)
2015-10-26 08:57 EDT, Nikolai Sednev
alma03 logs (source) (4.48 MB, application/x-gzip)
2015-10-26 08:57 EDT, Nikolai Sednev
alma04 logs (destination) (3.81 MB, application/x-gzip)
2015-10-26 08:58 EDT, Nikolai Sednev
engine's log (19.32 KB, application/x-gzip)
2015-11-05 06:01 EST, Nikolai Sednev
alma03 (876.62 KB, application/x-gzip)
2015-11-05 06:08 EST, Nikolai Sednev
alma04 (1.45 MB, application/x-gzip)
2015-11-05 06:17 EST, Nikolai Sednev

Description Nikolai Sednev 2015-10-22 09:20:56 EDT
Description of problem:
I have two hosts in my HE environment, and I tried to migrate all VMs, including the HE-VM, from one host to the other by setting the host into maintenance. All guest VMs were migrated successfully except the HE-VM, which was shown for some time as being migrated, but then it failed to migrate and the host remained in the "Preparing For Maintenance" state.
I saw also error message received by the engine:
"VDSM <host's FQDN> command failed: Virtual machine does not exist"

Version-Release number of selected component (if applicable):
Host:
ovirt-hosted-engine-setup-1.3.1-0.0.master.20151020145724.git565c3f9.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.1-1.20151016090950.git5ea5093.el7.noarch                    
mom-0.5.1-2.el7.noarch
vdsm-4.17.9-15.git1a7d1d3.el7.noarch
qemu-kvm-rhev-2.3.0-31.el7.x86_64
libvirt-client-1.2.17-12.el7.x86_64
sanlock-3.2.4-1.el7.x86_64

Engine:
ovirt-vmconsole-1.0.0-0.0.6.master.el6ev.noarch
rhevm-3.6.0.1-0.1.el6.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
qemu-guest-agent-0.12.1.2-2.479.el6_7.1.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Deploy RHEVM 3.6 (16) on 2 RHEL 7.2 hosts over iSCSI.
2. Create 2 VMs and start them all on the same host.
3. Set the host that hosts the 3 VMs to maintenance via WEBUI.

Actual results:
All VMs except the HE-VM migrated to the second host, and the host itself is stuck in "Preparing For Maintenance".

Expected results:
HE-VM should be migrated to second host.

Additional info:
Comment 1 Nikolai Sednev 2015-10-22 09:32 EDT
Created attachment 1085520 [details]
logs from the engine and the host from which migration was started
Comment 2 Oved Ourfali 2015-10-22 10:43:45 EDT
Host will remain in preparing for maintenance until all VMs are migrated. 
Putting virt (as you haven't put any whiteboard), but sounds like notabug, unless the migration is problematic for some reason.
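
The rule stated above can be sketched as a small function. This is only an illustrative model, not the engine's code: the function name and its parameters are hypothetical, and the real engine tracks more states than the three shown here.

```python
def host_status(maintenance_requested, vms_on_host):
    """Return the status a host would show under this simplified model.

    Hypothetical helper: the host only reaches "Maintenance" once no VM
    is still reported as running on it.
    """
    if not maintenance_requested:
        return "Up"
    if vms_on_host:
        # At least one VM (for example the HE-VM) has not left the host yet.
        return "Preparing For Maintenance"
    return "Maintenance"


print(host_status(True, ["HostedEngine"]))
print(host_status(True, []))
```

Under this model, a single VM that the engine still believes is on the host is enough to keep it in "Preparing For Maintenance".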
Comment 3 Michal Skrivanek 2015-10-23 13:56:40 EDT
The vdsm log doesn't cover the right period. Please add correct logs for both source and destination (the time seems to be ~2015-10-22 14:18:00).
Comment 4 Michal Skrivanek 2015-10-23 13:57:30 EDT
I don't see what's "urgent" here
Comment 5 Nikolai Sednev 2015-10-26 08:41 EDT
Created attachment 1086500 [details]
Screenshot from 2015-10-26 14:40:44.png
Comment 6 Nikolai Sednev 2015-10-26 08:56:24 EDT
(In reply to Michal Skrivanek from comment #4)
> I don't see what's "urgent" here

If you can't migrate HE, then the whole idea of backing up the engine is irrelevant, hence it's urgent IMHO.

Please see the attachments.
Comment 7 Nikolai Sednev 2015-10-26 08:57 EDT
Created attachment 1086514 [details]
engine's logs
Comment 8 Nikolai Sednev 2015-10-26 08:57 EDT
Created attachment 1086515 [details]
alma03 logs (source)
Comment 9 Nikolai Sednev 2015-10-26 08:58 EDT
Created attachment 1086516 [details]
alma04 logs (destination)
Comment 10 Michal Skrivanek 2015-11-04 03:25:07 EST
Nikolai,
- your new logs do not cover the original interval, they are 4 days later
- logs include one migration of HE at 2015-10-26 15:34:43 which took about 2 minutes and ended successfully at 15:36:53
- the attached picture in comment #5 shows a successful move to maintenance
- engine.log from comment #7 ends at 2015-10-26 14:51:27 showing perhaps some previous attempts/states?


please describe what is the problem, what did you observe, time frame of the issue and entities involved and relevant logs
Comment 11 Nikolai Sednev 2015-11-04 07:33:25 EST
(In reply to Michal Skrivanek from comment #10)
> Nikolai,
> - your new logs do not cover the original interval, they are 4 days later
> - logs include one migration of HE at 2015-10-26 15:34:43 which took about 2
> minutes and ended successfully at 15:36:53
> - the attached picture in comment #5 shows a successful move to maintenance
> - engine.log from comment #7 ends at 2015-10-26 14:51:27 showing perhaps
> some previous attempts/states?
> 
> 
> please describe what is the problem, what did you observe, time frame of the
> issue and entities involved and relevant logs

The problem is that the engine itself cannot be migrated between the two hosts; all the VMs are migrated without a problem, except the HE-VM.
Comment 12 Michal Skrivanek 2015-11-04 08:18:52 EST
Sorry, but I can't do anything with the bug without getting answers/logs.
Comment 13 Sandro Bonazzola 2015-11-05 03:16:18 EST
oVirt 3.6.0 has been released on November 4th, re-targeting to 3.6.1 since this bug has been marked as high severity
Comment 14 Nikolai Sednev 2015-11-05 04:27:05 EST
(In reply to Michal Skrivanek from comment #12)
> Sorry, but I can't do anything with the bug without getting answers/logs.

Hi Michal,
Providing the full data as I could gather it.
1. Time frame is: "Nov 5, 2015 11:07:06 AM Host alma03.qa.lab.tlv.redhat.com was switched to Maintenance mode by admin@internal (Reason: Not Specified)."

2. I'm migrating 3 guest VMs from alma03 to alma04. Result: PASS.
3. I'm trying to set alma03 into maintenance via WEBUI while the HE-VM is running on alma03, so during the maintenance the HE-VM should be migrated to alma04. alma03 is stuck in "Preparing For Maintenance", as the HE-VM is not migrated to alma04. Result: FAIL.
4. Sosreports from the engine, alma03 (source), and alma04 (target) are attached.
 
Engine components:
rhevm-3.6.0.2-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.0.2-0.1.el6.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.0.2-0.1.el6.noarch
ovirt-host-deploy-1.4.0-1.el6ev.noarch
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.0-1.el6ev.noarch

Host's components on alma03:
ovirt-vmconsole-1.0.0-0.0.6.master.el7ev.noarch
ovirt-release36-001-0.5.beta.noarch
mom-0.5.1-2.el7.noarch
ovirt-hosted-engine-setup-1.3.1-0.0.master.20151020145724.git565c3f9.el7.centos.noarch
ovirt-setup-lib-1.0.0-1.20150922141000.git147e275.el7.centos.noarch
ovirt-host-deploy-1.5.0-0.0.master.20151015221110.gitc2abfed.el7.noarch
ovirt-release36-snapshot-001-0.5.beta.noarch
vdsm-4.17.10-13.gite438b03.el7.noarch
qemu-kvm-rhev-2.3.0-31.el7.x86_64
libvirt-client-1.2.17-12.el7.x86_64
ovirt-hosted-engine-ha-1.3.1-1.20151016090950.git5ea5093.el7.noarch
sanlock-3.2.4-1.el7.x86_64
ovirt-engine-sdk-python-3.6.0.4-0.1.20151014.git117764a.el7.centos.noarch
ovirt-vmconsole-host-1.0.0-0.0.6.master.el7ev.noarch

Host's components on alma04:
mom-0.5.1-2.el7.noarch
ovirt-hosted-engine-setup-1.3.1-0.0.master.20151020145724.git565c3f9.el7.centos.noarch
ovirt-setup-lib-1.0.0-1.20150922141000.git147e275.el7.centos.noarch
ovirt-vmconsole-1.0.0-0.0.6.master.el7ev.noarch
ovirt-release36-snapshot-001-0.5.beta.noarch
ovirt-host-deploy-1.5.0-0.0.master.20151015221110.gitc2abfed.el7.noarch
ovirt-release36-001-0.5.beta.noarch
ovirt-engine-sdk-python-3.6.0.4-0.1.20151014.git117764a.el7.centos.noarch
vdsm-4.17.10.1-0.el7ev.noarch
ovirt-hosted-engine-ha-1.3.1-1.20151016090950.git5ea5093.el7.noarch
sanlock-3.2.4-1.el7.x86_64
ovirt-vmconsole-host-1.0.0-0.0.6.master.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7.x86_64
libvirt-client-1.2.17-12.el7.x86_64
Comment 15 Nikolai Sednev 2015-11-05 06:01 EST
Created attachment 1090019 [details]
engine's log
Comment 16 Nikolai Sednev 2015-11-05 06:08 EST
Created attachment 1090021 [details]
alma03
Comment 17 Nikolai Sednev 2015-11-05 06:17 EST
Created attachment 1090023 [details]
alma04
Comment 18 Tomas Jelinek 2015-11-05 08:09:38 EST
Looking into logs I see that:
- source VDSM: on line 1311 the migration of the HE VM is initiated
- source VDSM: on line 11643 the migration finished successfully
- engine: on line 605 of the engine log, the "migration succeeded" event is received
- engine: on line 630, ERROR [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-7-thread-22) [18437bc6] Transaction rolled-back for command 'org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand'.

Moving to SLA for more investigation.

May be related to: https://bugzilla.redhat.com/show_bug.cgi?id=1269768
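
For anyone repeating this log check, a minimal sketch of scanning an engine log for the rollback message quoted above. The regular expression is built only from the fragment quoted in this comment; the rest of the log layout, the function name, and the first sample line are assumptions for illustration.

```python
import re

# Matches the rollback message quoted in this comment. Everything outside
# that quoted fragment (timestamps, other fields) is an assumption.
ROLLBACK_RE = re.compile(
    r"ERROR \[(?P<cls>[\w.]+)\].*"
    r"Transaction rolled-back for command '(?P<cmd>[\w.]+)'"
)


def find_rollbacks(lines):
    """Return (line_number, command) pairs for rolled-back commands."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ROLLBACK_RE.search(line)
        if match:
            hits.append((number, match.group("cmd")))
    return hits


sample = [
    # Hypothetical filler line, not taken from the real log.
    "INFO  [hypothetical.Logger] migration succeeded",
    "ERROR [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] "
    "(org.ovirt.thread.pool-7-thread-22) [18437bc6] Transaction rolled-back "
    "for command "
    "'org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand'.",
]
print(find_rollbacks(sample))
```

To run it against a real file, pass an open file object instead of `sample`, since iterating a file yields its lines.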
Comment 19 Roy Golan 2015-12-03 15:58:54 EST
This should be solved by Bug 1269768  (auto import of HE VM and Storage domain)

The VMs monitoring intercepts the engine VM and tries to import it, but fails. It then probably skips the part that does the migration hand-over (where we set the destination host of the VM as the one this VM is running on).
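
A toy model of the suspected failure mode, assuming the import and the hand-over run in the same monitoring pass. None of the names below come from the engine code; `Vm`, `import_hosted_engine`, and `monitor_cycle` are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    run_on_host: str            # host the engine believes the VM runs on
    is_hosted_engine: bool = False


def import_hosted_engine(vm, import_ok):
    # Stand-in for the failing import; raises like a rolled-back command.
    if not import_ok:
        raise RuntimeError("import of hosted engine VM rolled back")


def monitor_cycle(vm, reported_host, import_ok):
    """One hypothetical monitoring pass for a VM reported by reported_host."""
    try:
        if vm.is_hosted_engine:
            import_hosted_engine(vm, import_ok)
        # Hand-over: record the destination as the host the VM runs on.
        vm.run_on_host = reported_host
    except RuntimeError:
        # The failure ends the pass early, so the hand-over never happens.
        pass
    return vm.run_on_host


he_vm = Vm("HostedEngine", "alma03", is_hosted_engine=True)
print(monitor_cycle(he_vm, "alma04", import_ok=False))
print(monitor_cycle(he_vm, "alma04", import_ok=True))
```

In this sketch the failed import leaves the VM recorded on alma03, which would keep the source host in "Preparing For Maintenance"; once the import succeeds, the hand-over to alma04 goes through.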
Comment 20 Nikolai Sednev 2016-01-21 07:18:11 EST
Successfully got the HE-SD and HE-VM auto-imported on a cleanly installed NFS deployment after the NFS data SD was added. The engine was installed using PXE. Then, via WEBUI, I set the host that had the VMs running on it into maintenance mode, and all VMs were migrated to the second host.
Works for me on these components:
Host:
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7ev.noarch
mom-0.5.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.6.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.2.x86_64
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.18-0.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.2.3-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
Linux version 3.10.0-327.8.1.el7.x86_64 (mockbuild@x86-034.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Jan 11 05:03:18 EST 2016

Engine:
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch
rhevm-3.6.2.6-0.1.el6.noarch
rhevm-dwh-setup-3.6.2-1.el6ev.noarch
rhevm-dwh-3.6.2-1.el6ev.noarch
rhevm-reports-setup-3.6.2.4-1.el6ev.noarch
rhevm-reports-3.6.2.4-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
Linux version 2.6.32-573.8.1.el6.x86_64 (mockbuild@x86-033.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Fri Sep 25 19:24:22 EDT 2015
