Bug 1330548
| Summary: | VMs failed to migrate when one of the nodes in the cluster is put into maintenance. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | RamaKasturi <knarra> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.2 | CC: | bugs, dyuan, jsuchane, msivak, pzhang, rbalakri, sabose, xuzhang, ykaul |
| Target Milestone: | pre-dev-freeze | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-12 13:49:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1258386 | | |
From the logs on the migration destination:
periodic/47::WARNING::2016-04-26 15:00:56,440::periodic::285::virt.vm::(__call__) vmId=`2050bdfa-caea-49a3-bad4-f49964b29657`::could not run on 2050bdfa-caea-49a3-bad4-f49964b29657: domain not connected
libvirtEventLoop::WARNING::2016-04-26 15:00:58,783::utils::140::root::(rmFile) File: /var/lib/libvirt/qemu/channels/2050bdfa-caea-49a3-bad4-f49964b29657.com.redhat.rhevm.vdsm already removed
jsonrpc.Executor/4::DEBUG::2016-04-26 15:00:58,773::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.destroy' in bridge with [u'2050bdfa-caea-49a3-bad4-f49964b29657']
jsonrpc.Executor/4::INFO::2016-04-26 15:00:58,788::API::341::vds::(destroy) vmContainerLock acquired by vm 2050bdfa-caea-49a3-bad4-f49964b29657
jsonrpc.Executor/4::DEBUG::2016-04-26 15:00:58,789::vm::3885::virt.vm::(destroy) vmId=`2050bdfa-caea-49a3-bad4-f49964b29657`::destroy Called
Thread-3732940::ERROR::2016-04-26 15:00:58,774::vm::753::virt.vm::(_startUnderlyingVm) vmId=`2050bdfa-caea-49a3-bad4-f49964b29657`::Failed to start a migration destination vm
Traceback (most recent call last):
File "/usr/share/vdsm/virt/vm.py", line 722, in _startUnderlyingVm
self._completeIncomingMigration()
File "/usr/share/vdsm/virt/vm.py", line 2852, in _completeIncomingMigration
self._incomingMigrationFinished.isSet(), usedTimeout)
File "/usr/share/vdsm/virt/vm.py", line 2911, in _attachLibvirtDomainAfterMigration
raise MigrationError(e.get_error_message())
MigrationError: Domain not found: no domain with matching uuid '2050bdfa-caea-49a3-bad4-f49964b29657'
Thread-3732940::INFO::2016-04-26 15:00:58,793::vm::1330::virt.vm::(setDownStatus) vmId=`2050bdfa-caea-49a3-bad4-f49964b29657`::Changed state to Down: VM failed to migrate (code=8)
Thread-3732940::DEBUG::2016-04-26 15:00:58,795::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"2050bdfa-caea-49a3-bad4-f49964b29657": {"status": "Down", "timeOffset": "0", "exitReason": 8, "exitMessage": "VM failed to migrate", "exitCode": 1}, "notify_time": 5589338680}, "jsonrpc": "2.0", "method": "|virt|VM_status|2050bdfa-caea-49a3-bad4-f49964b29657"}
The sosreport tree output does have the image path - /rhev/data-center/mnt/glusterSD/sulphur.lab.eng.blr.redhat.com:_vmstore/297a9b9c-4396-4b30-8bfe-976a67d49a74/images/2c6601e9-456f-4638-9ce4-d98efd97c053/86999bdc-a7bd-4d1e-9faa-a8ba7cf531f4
Martin, any ideas about this error? I'm afraid I don't have enough virt expertise to debug further. It seems the engine actually tried to migrate the VM, so it is not a scheduling issue.

I do not know enough about the underlying libvirt logic; you need someone from the virt team for that (I do scheduling and QoS).

This looks like a libvirt or qemu bug; versions look the same on both sides (see e.g. qemu/linux_vm.log). Moving to libvirt, feel free to push down the stack.

The "internal error: info migration reply was missing return status" is a result of bug 1374613. Because of this bug and the missing debug logs from libvirt, it is impossible to diagnose the real cause of the migration failure. I'm closing this bug as a duplicate of 1374613. If the issue can be reproduced with a package that has bug 1374613 fixed, please file a new bug so that we can properly investigate the root cause.

*** This bug has been marked as a duplicate of bug 1374613 ***
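As the closing comment notes, the real cause could not be diagnosed because libvirt debug logs were missing. For anyone retrying the reproduction, libvirt debug logging can be enabled on both hosts beforehand via libvirtd.conf; the fragment below is a typical suggestion (the exact filter list can be adjusted):

```
# /etc/libvirt/libvirtd.conf
log_filters="1:qemu 1:libvirt 1:conf 1:security 3:event 3:json 3:object 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```

Restart libvirtd (`systemctl restart libvirtd`) on both the source and destination hosts before reproducing, and attach the resulting /var/log/libvirt/libvirtd.log from both sides to the new bug.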
Description of problem:
Had three nodes in my cluster: zod, sulphur and tettnang. There were some VMs running on zod; I put the node zod into maintenance. It tried to migrate the VMs to another host but failed to migrate some of them.

Version-Release number of selected component (if applicable):
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64

How reproducible:

Steps to Reproduce:
1. Configure HC machines.
2. Bootstorm 30 VMs.
3. Put the machine zod into maintenance.

Actual results:
There were 11 VMs running on the machine that was put into maintenance; only 5 of them migrated to another hypervisor and the rest failed to migrate. I tried migrating them manually and that did not work either. The following tracebacks appear in the vdsm logs.
Thread-3817985::ERROR::2016-04-26 17:41:26,086::migration::309::virt.vm::(run) vmId=`2050bdfa-caea-49a3-bad4-f49964b29657`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 297, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 363, in _startUnderlyingMigration
    self._perform_migration(duri, muri)
  File "/usr/share/vdsm/virt/migration.py", line 402, in _perform_migration
    self._vm._dom.migrateToURI3(duri, params, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1836, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: internal error: info migration reply was missing return status

Expected results:
All the VMs should be migrated successfully.

Additional info:
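For context, the failure mode in this traceback is vdsm catching libvirtError from the domain's migrateToURI3 call and marking the VM Down with exit reason 8, matching the event payload earlier in the log. A minimal, self-contained sketch of that control flow follows; the class and function names (FailingDomain, perform_migration) are illustrative stand-ins, not the real vdsm or libvirt API:

```python
class LibvirtError(Exception):
    """Stand-in for libvirt.libvirtError."""


class FailingDomain:
    """Simulates a domain whose migration call fails as in the log above."""

    def migrateToURI3(self, duri, params, flags):
        raise LibvirtError(
            "internal error: info migration reply was missing return status")


MIGRATION_FAILED = 8  # matches the exitReason seen in the VM_status event


def perform_migration(dom, duri):
    """Attempt the migration; map a libvirt failure to (Down, exit reason 8)."""
    try:
        dom.migrateToURI3(duri, params={}, flags=0)
        return "Up", None
    except LibvirtError:
        return "Down", MIGRATION_FAILED


status, reason = perform_migration(FailingDomain(), "qemu+tls://dest.example/system")
print(status, reason)  # Down 8
```

The point of the sketch is that vdsm itself only relays the error string; the "missing return status" text originates inside libvirt's QEMU driver, which is why the closing comment redirects the investigation there.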