Bug 1274315
Summary: | HE-VM can't be migrated between the hosts and host is stuck in "Preparing For Maintenance". | |
---|---|---|---
Product: | [oVirt] ovirt-engine | Reporter: | Nikolai Sednev <nsednev>
Component: | BLL.Virt | Assignee: | Roy Golan <rgolan>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 3.6.0.1 | CC: | bugs, cshao, dfediuck, gklein, huiwa, huzhao, mavital, mgoldboi, nsednev, sbonazzo, tjelinek, ycui
Target Milestone: | ovirt-3.6.2 | Keywords: | Triaged
Target Release: | 3.6.2 | Flags: | rule-engine: ovirt-3.6.z+, rule-engine: blocker+, mgoldboi: planning_ack+, dfediuck: devel_ack+, rule-engine: testing_ack+
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | Cause: The import of the engine VM was constantly failing, which collided with the attempt to update the migration progress of that VM. Consequence: The engine VM would appear to still be running on the source host, and the host couldn't be moved into maintenance. Fix: The import of the engine VM is now stable and shouldn't fail, which lets the migration proceed. Result: The engine VM is handed over to the destination host, and the source host is freed and can then move to maintenance. | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2016-02-18 11:00:54 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1269768 | |
Bug Blocks: | | |
Attachments: | | |
Description
Nikolai Sednev
2015-10-22 13:20:56 UTC
Created attachment 1085520 [details]
logs from the engine and the host from which migration was started
Host will remain in preparing for maintenance until all VMs are migrated. Putting virt (as you haven't put any whiteboard), but sounds like notabug, unless the migration is problematic for some reason.

vdsm log doesn't cover the right period. Please add correct logs for both source and destination (time seems to be ~2015-10-22 14:18:00)

I don't see what's "urgent" here

Created attachment 1086500 [details]
Screenshot from 2015-10-26 14:40:44.png
(In reply to Michal Skrivanek from comment #4)
> I don't see what's "urgent" here

If you can't migrate HE, then the whole idea of backing up the engine is irrelevant, hence it's urgent IMHO. Please see the attachments.

Created attachment 1086514 [details]
engine's logs
Created attachment 1086515 [details]
alma03 logs (source)
Created attachment 1086516 [details]
alma04 logs (destination)
Nikolai,
- your new logs do not cover the original interval, they are 4 days later
- logs include one migration of HE at 2015-10-26 15:34:43 which took about 2 minutes and ended successfully at 15:36:53
- the attached picture in comment #5 shows a successful move to maintenance
- engine.log from comment #7 ends at 2015-10-26 14:51:27 showing perhaps some previous attempts/states?

please describe what is the problem, what did you observe, time frame of the issue and entities involved and relevant logs

(In reply to Michal Skrivanek from comment #10)
> Nikolai,
> - your new logs do not cover the original interval, they are 4 days later
> - logs include one migration of HE at 2015-10-26 15:34:43 which took about 2
> minutes and ended successfully at 15:36:53
> - the attached picture in comment #5 shows a successful move to maintenance
> - engine.log from comment #7 ends at 2015-10-26 14:51:27 showing perhaps
> some previous attempts/states?
>
>
> please describe what is the problem, what did you observe, time frame of the
> issue and entities involved and relevant logs

The problem is that the engine itself cannot be migrated between the two hosts; all the other VMs are migrated without problems, only the HE-VM is not.

Sorry, but I can't do anything with the bug without getting answers/logs.

oVirt 3.6.0 has been released on November 4th, re-targeting to 3.6.1 since this bug has been marked as high severity

(In reply to Michal Skrivanek from comment #12)
> Sorry, but I can't do anything with the bug without getting answers/logs.

Hi Michal,
Providing the full data as I could gather it.
1. Time frame is: "Nov 5, 2015 11:07:06 AM Host alma03.qa.lab.tlv.redhat.com was switched to Maintenance mode by admin@internal (Reason: Not Specified)."
2. I'm migrating 3 Guest-VMs from alma03->alma04, result - PASS.
3. I'm trying to set alma03 into maintenance via WEBUI while the HE-VM is running on top of alma03, so during the maintenance the HE-VM should be migrated to alma04. Result - FAIL: alma03 is stuck in preparing for maintenance, as the HE-VM is not migrated to alma04.
4. Sosreports from the engine, alma03 (source), alma04 (target) attached.

Engine components:
rhevm-3.6.0.2-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.0.2-0.1.el6.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.0.2-0.1.el6.noarch
ovirt-host-deploy-1.4.0-1.el6ev.noarch
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.0-1.el6ev.noarch

Host's components on alma03:
ovirt-vmconsole-1.0.0-0.0.6.master.el7ev.noarch
ovirt-release36-001-0.5.beta.noarch
mom-0.5.1-2.el7.noarch
ovirt-hosted-engine-setup-1.3.1-0.0.master.20151020145724.git565c3f9.el7.centos.noarch
ovirt-setup-lib-1.0.0-1.20150922141000.git147e275.el7.centos.noarch
ovirt-host-deploy-1.5.0-0.0.master.20151015221110.gitc2abfed.el7.noarch
ovirt-release36-snapshot-001-0.5.beta.noarch
vdsm-4.17.10-13.gite438b03.el7.noarch
qemu-kvm-rhev-2.3.0-31.el7.x86_64
libvirt-client-1.2.17-12.el7.x86_64
ovirt-hosted-engine-ha-1.3.1-1.20151016090950.git5ea5093.el7.noarch
sanlock-3.2.4-1.el7.x86_64
ovirt-engine-sdk-python-3.6.0.4-0.1.20151014.git117764a.el7.centos.noarch
ovirt-vmconsole-host-1.0.0-0.0.6.master.el7ev.noarch

Host's components on alma04:
mom-0.5.1-2.el7.noarch
ovirt-hosted-engine-setup-1.3.1-0.0.master.20151020145724.git565c3f9.el7.centos.noarch
ovirt-setup-lib-1.0.0-1.20150922141000.git147e275.el7.centos.noarch
ovirt-vmconsole-1.0.0-0.0.6.master.el7ev.noarch
ovirt-release36-snapshot-001-0.5.beta.noarch
ovirt-host-deploy-1.5.0-0.0.master.20151015221110.gitc2abfed.el7.noarch
ovirt-release36-001-0.5.beta.noarch
ovirt-engine-sdk-python-3.6.0.4-0.1.20151014.git117764a.el7.centos.noarch
vdsm-4.17.10.1-0.el7ev.noarch
ovirt-hosted-engine-ha-1.3.1-1.20151016090950.git5ea5093.el7.noarch
sanlock-3.2.4-1.el7.x86_64
ovirt-vmconsole-host-1.0.0-0.0.6.master.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7.x86_64
libvirt-client-1.2.17-12.el7.x86_64

Created attachment 1090019 [details]
engine's log
Created attachment 1090021 [details]
alma03
Created attachment 1090023 [details]
alma04
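Several of the comments above ask for logs that actually cover the time of the failed migration. A minimal sketch of such a check follows, assuming each log line carries a `YYYY-MM-DD HH:MM:SS` timestamp somewhere in it; that format and the helper names `log_interval` and `covers` are illustrative assumptions, not part of oVirt or VDSM.

```python
import re
from datetime import datetime

# Assumption: every relevant log line contains a "YYYY-MM-DD HH:MM:SS" timestamp.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
FORMAT = "%Y-%m-%d %H:%M:%S"


def log_interval(lines):
    """Return (first, last) timestamps found in the lines, or None if there are none."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(0), FORMAT))
    if not stamps:
        return None
    return min(stamps), max(stamps)


def covers(lines, incident):
    """True if the incident time falls inside the interval spanned by the log."""
    interval = log_interval(lines)
    if interval is None:
        return False
    first, last = interval
    return first <= incident <= last


# Made-up sample lines; only the two dates are taken from the comments above.
sample = [
    "2015-10-26 14:40:00,001 INFO  hypothetical engine line",
    "Thread-1::DEBUG::2015-10-26 14:51:27,500::hypothetical vdsm line",
]
print(covers(sample, datetime(2015, 10, 22, 14, 18, 0)))
```

Running the same check against each attached file (for example with `open(path).readlines()`) before uploading would show whether the original interval is included.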
Looking into logs I see that:
- source VDSM: on line 1311 the migration of the HE VM is initiated
- source VDSM: on line 11643 the migration finished successfully
- engine: in engine log line 605 the migration success is received
- engine: line 630 the ERROR [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-7-thread-22) [18437bc6] Transaction rolled-back for command 'org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand'.

Moving to SLA for more investigation. May be related to: https://bugzilla.redhat.com/show_bug.cgi?id=1269768

This should be solved by Bug 1269768 (auto import of HE VM and Storage domain)

The VMs Monitoring is intercepting the engine VM and tries to import it but fails. It will probably skip the part that does the migration hand-over (where we set the destination host of the VM as the one this VM is running on)

Successfully got HE-SD & HE-VM auto-imported on a cleanly installed NFS deployment after the NFS data SD was added. Engine was installed using PXE. Then I set the host with the running VMs into maintenance mode via WEBUI, and all VMs were migrated to the second host.
Works for me on these components:

Host:
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7ev.noarch
mom-0.5.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.6.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.2.x86_64
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.18-0.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.2.3-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
Linux version 3.10.0-327.8.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Jan 11 05:03:18 EST 2016

Engine:
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch
rhevm-3.6.2.6-0.1.el6.noarch
rhevm-dwh-setup-3.6.2-1.el6ev.noarch
rhevm-dwh-3.6.2-1.el6ev.noarch
rhevm-reports-setup-3.6.2.4-1.el6ev.noarch
rhevm-reports-3.6.2.4-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
Linux version 2.6.32-573.8.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Fri Sep 25 19:24:22 EDT 2015
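
The failure mode analysed earlier in this thread (a failing import of the engine VM causing the migration hand-over to be skipped, which leaves the source host stuck in "Preparing For Maintenance") can be illustrated with a toy model. This is only a sketch under stated assumptions: the classes and functions below (`Vm`, `monitoring_cycle`, `can_enter_maintenance`) are hypothetical and do not correspond to actual ovirt-engine code, and the VM name is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    run_on: str  # host the engine currently believes the VM runs on


class ImportFailed(Exception):
    """Raised by the toy import step when the import is rolled back."""


def try_import(vm, import_succeeds):
    # Stand-in for the engine VM import; here the outcome is just a flag.
    if not import_succeeds:
        raise ImportFailed(f"import of {vm.name} rolled back")


def monitoring_cycle(vm, reported_host, import_succeeds):
    """One simplified monitoring pass; returns True if the hand-over was applied."""
    try:
        try_import(vm, import_succeeds)
    except ImportFailed:
        # The rest of the pass, including the hand-over, is skipped.
        return False
    # Hand-over: record the host the VM is actually running on.
    vm.run_on = reported_host
    return True


def can_enter_maintenance(host, vms):
    """A host may enter maintenance only when no VM is recorded as running on it."""
    return all(vm.run_on != host for vm in vms)


engine_vm = Vm("HostedEngine", "alma03")
monitoring_cycle(engine_vm, "alma04", import_succeeds=False)
print(can_enter_maintenance("alma03", [engine_vm]))
monitoring_cycle(engine_vm, "alma04", import_succeeds=True)
print(can_enter_maintenance("alma03", [engine_vm]))
```

In this model the VM has already moved to alma04, but while the import keeps failing the recorded host stays alma03, so the maintenance check never passes. Once the import stops failing, the hand-over runs and the source host is released, which matches the Doc Text description of the fix.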