Red Hat Bugzilla – Bug 988400
[vdsm] VM migration is failing in case dst host has connectivity problems to an irrelevant storage domain
Last modified: 2016-02-10 15:26:00 EST
Created attachment 778273 [details]
logs

Description of problem:
VM migration fails when one of the storage domains in the data center is blocked to the destination host. The VM's disk is on an active storage domain to which the host does have connectivity.

Version-Release number of selected component (if applicable):
vdsm-4.12.0-rc1.12.git8ee6885.el6.x86_64
libvirt-0.10.2-18.el6_4.9.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
rhevm-3.3.0-0.9.master.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Start with a block pool with 2 hosts and 2 storage domains from different storage servers.
2. Run a VM on the SPM host.
3. Block connectivity from the HSM host to the non-master storage domain (see the sketch after this comment).
4. Put the SPM host into maintenance.

Actual results:
The host is stuck in 'preparing for maintenance' because migration of its running VM fails. The destination host cannot run the migrated VM:

Thread-8811::DEBUG::2013-07-25 15:21:42,393::libvirtconnection::101::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 42 edom: 10 level: 2 message: Domain not found: no domain with matching uuid 'cae206a3-d1ab-4c36-9247-70bd0ca852db'
Thread-8811::ERROR::2013-07-25 15:21:42,394::vm::2054::vm.Vm::(_startUnderlyingVm) vmId=`cae206a3-d1ab-4c36-9247-70bd0ca852db`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2032, in _startUnderlyingVm
    self._waitForIncomingMigrationFinish()
  File "/usr/share/vdsm/vm.py", line 3341, in _waitForIncomingMigrationFinish
    self._connection.lookupByUUIDString(self.id),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2838, in lookupByUUIDString
    if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)
libvirtError: Domain not found: no domain with matching uuid 'cae206a3-d1ab-4c36-9247-70bd0ca852db'

Expected results:
The host should succeed in running the migrated VM, because its disk is on the active storage domain.

Additional info:
logs
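For reference, one minimal way to perform step 3 is to drop traffic from the HSM host towards the storage server that backs the non-master domain. The sketch below is not part of vdsm and only illustrates the setup; the portal address 10.35.0.99 is hypothetical and must be replaced with the real address of that storage server:

import subprocess

# Hypothetical address of the storage server backing the non-master domain.
STORAGE_SERVER = "10.35.0.99"

def set_block(enable):
    # Insert (or delete) an iptables rule dropping all traffic to the storage server.
    action = "-I" if enable else "-D"
    subprocess.check_call(["iptables", action, "OUTPUT", "-d", STORAGE_SERVER, "-j", "DROP"])

if __name__ == "__main__":
    set_block(True)  # run again with set_block(False) to restore connectivity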
This is unrelated; what matters is (on dst):

Thread-8810::DEBUG::2013-07-25 15:19:41,620::vm::4771::vm.Vm::(waitForMigrationDestinationPrepare) vmId=`cae206a3-d1ab-4c36-9247-70bd0ca852db`::migration destination: waiting for VM creation
Thread-8810::DEBUG::2013-07-25 15:19:41,620::vm::4776::vm.Vm::(waitForMigrationDestinationPrepare) vmId=`cae206a3-d1ab-4c36-9247-70bd0ca852db`::migration destination: waiting 42s for path preparation
Thread-8810::DEBUG::2013-07-25 15:20:23,621::vm::4779::vm.Vm::(waitForMigrationDestinationPrepare) vmId=`cae206a3-d1ab-4c36-9247-70bd0ca852db`::Timeout while waiting for path preparation
Thread-8810::DEBUG::2013-07-25 15:20:23,621::BindingXMLRPC::986::vds::(wrapper) return vmMigrationCreate with {'status': {'message': 'Error creating the requested VM', 'code': 9}}

with the lvm-related error at:

Thread-8811::DEBUG::2013-07-25 15:21:42,383::lvm::311::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '
  /dev/mapper/1elad1213738004: read failed after 0 of 4096 at 107374116864: Input/output error
  /dev/mapper/1elad1213738004: read failed after 0 of 4096 at 107374174208: Input/output error
  /dev/mapper/1elad1213738004: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad1213738004 was disabled
  /dev/mapper/1elad1313738004: read failed after 0 of 4096 at 107374116864: Input/output error
  /dev/mapper/1elad1313738004: read failed after 0 of 4096 at 107374174208: Input/output error
  /dev/mapper/1elad1313738004: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad1313738004 was disabled
  /dev/mapper/1elad2213739782: read failed after 0 of 4096 at 53687025664: Input/output error
  /dev/mapper/1elad2213739782: read failed after 0 of 4096 at 53687083008: Input/output error
  /dev/mapper/1elad2213739782: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad2213739782 was disabled
  /dev/mapper/1elad1513738004: read failed after 0 of 4096 at 107374116864: Input/output error
  /dev/mapper/1elad1513738004: read failed after 0 of 4096 at 107374174208: Input/output error
  /dev/mapper/1elad1513738004: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad1513738004 was disabled
  /dev/mapper/1elad1413738004: read failed after 0 of 4096 at 107374116864: Input/output error
  /dev/mapper/1elad1413738004: read failed after 0 of 4096 at 107374174208: Input/output error
  /dev/mapper/1elad1413738004: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad1413738004 was disabled
  /dev/mapper/1elad31373904: read failed after 0 of 4096 at 21474770944: Input/output error
  /dev/mapper/1elad31373904: read failed after 0 of 4096 at 21474828288: Input/output error
  /dev/mapper/1elad31373904: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad31373904 was disabled
  /dev/mapper/1elad2513739782: read failed after 0 of 4096 at 53687025664: Input/output error
  /dev/mapper/1elad2513739782: read failed after 0 of 4096 at 53687083008: Input/output error
  /dev/mapper/1elad2513739782: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad2513739782 was disabled
  /dev/mapper/1elad2413739782: read failed after 0 of 4096 at 53687025664: Input/output error
  /dev/mapper/1elad2413739782: read failed after 0 of 4096 at 53687083008: Input/output error
  /dev/mapper/1elad2413739782: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad2413739782 was disabled
  /dev/mapper/1elad2113739782: read failed after 0 of 4096 at 53687025664: Input/output error
  /dev/mapper/1elad2113739782: read failed after 0 of 4096 at 53687083008: Input/output error
  /dev/mapper/1elad2113739782: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad2113739782 was disabled
  /dev/mapper/1elad2313739782: read failed after 0 of 4096 at 53687025664: Input/output error
  /dev/mapper/1elad2313739782: read failed after 0 of 4096 at 53687083008: Input/output error
  /dev/mapper/1elad2313739782: read failed after 0 of 4096 at 0: Input/output error
  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad2313739782 was disabled
'; <rc> = 0
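To make the timing relationship explicit: the destination gives path preparation a fixed 42-second budget, while the blocked devices make the storage scan take roughly two minutes. The following is a minimal sketch of that interaction, not vdsm's actual waitForMigrationDestinationPrepare code; the 120-second sleep merely stands in for the slow lvm scan:

import threading
import time

PREPARE_TIMEOUT = 42   # seconds the destination waits for path preparation (per the log)
SCAN_DURATION = 120    # stand-in for the ~2 minute lvm scan seen on the blocked host

def prepare_paths(done):
    # Stand-in for the storage scan; on the blocked host the lvm command only
    # returned after repeated read failures against the unreachable devices.
    time.sleep(SCAN_DURATION)
    done.set()

def wait_for_migration_destination_prepare():
    done = threading.Event()
    worker = threading.Thread(target=prepare_paths, args=(done,))
    worker.daemon = True
    worker.start()
    done.wait(PREPARE_TIMEOUT)
    if not done.is_set():
        # The branch hit in the log: "Timeout while waiting for path preparation"
        return {'status': {'message': 'Error creating the requested VM', 'code': 9}}
    return {'status': {'message': 'Done', 'code': 0}}

print(wait_for_migration_destination_prepare())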
(In reply to Michal Skrivanek from comment #4)

Although the LVM command sees problems with some devices, the command itself succeeded. The issue is that it took about two minutes to complete, and the migration timed out in the meantime:

Thread-8811::DEBUG::2013-07-25 15:19:41,603::lvm::311::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvs
Thread-8811::DEBUG::2013-07-25 15:21:42,383::lvm::311::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '
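A quick way to confirm this distinction (successful but slow) outside of vdsm is to time the scan by hand on the affected host. The command below is illustrative only; the full lvm invocation vdsm runs via sudo is truncated in the log above:

import subprocess
import time

# Illustrative command only; not the exact invocation vdsm uses.
CMD = ["/sbin/lvm", "lvs"]
PREPARE_TIMEOUT = 42

start = time.time()
proc = subprocess.Popen(CMD, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
elapsed = time.time() - start

# rc == 0 means LVM considered the scan successful despite the read errors it
# logged for the blocked devices; what breaks the migration is the elapsed time.
print("rc=%d, elapsed=%.1fs (destination timeout is %ds)"
      % (proc.returncode, elapsed, PREPARE_TIMEOUT))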
A host that has connectivity problems to an irrelevant storage domain is now able to run VMs successfully. Verified with is18.
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore.')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:
https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0040.html