Bug 988400 - [vdsm] VM migration is failing in case dst host has connectivity problems to an irrelevant storage domain [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Eduardo Warszawski
QA Contact: Elad
Whiteboard: storage
Depends On:
Blocks:
Reported: 2013-07-25 10:12 EDT by Elad
Modified: 2016-02-10 15:26 EST
CC List: 10 users

See Also:
Fixed In Version: v4.13.0
Doc Type: Bug Fix
Doc Text:
Virtual machine migration failed when the destination host had connectivity problems to storage domains in the data center, even irrelevant ones. This caused the host to be stuck in a 'preparing for maintenance' state. Now, a destination host with connection problems to irrelevant storage domains can successfully run migrated virtual machines.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-21 11:29:50 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
cboyle: needinfo? (ewarszaw)
abaron: Triaged+


Attachments
logs (6.32 MB, application/x-gzip)
2013-07-25 10:12 EDT, Elad


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 17968 None None None Never
Red Hat Product Errata RHBA-2014:0040 normal SHIPPED_LIVE vdsm bug fix and enhancement update 2014-01-21 15:26:21 EST

Description Elad 2013-07-25 10:12:25 EDT
Created attachment 778273: logs

Description of problem:
VM migration fails when the destination host's connectivity to one of the storage domains in the data center is blocked. The VM's disk is on an active storage domain that has connectivity to the host.

Version-Release number of selected component (if applicable):
vdsm-4.12.0-rc1.12.git8ee6885.el6.x86_64
libvirt-0.10.2-18.el6_4.9.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
rhevm-3.3.0-0.9.master.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. On a block pool with 2 hosts and 2 storage domains from different storage servers
2. Run a VM on the SPM
3. Block connectivity from the HSM to the non-master storage domain (see the sketch after this list)
4. Put the SPM into maintenance
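
A minimal sketch of one way to perform step 3 on the HSM host, assuming the non-master (irrelevant) domain is an iSCSI domain whose portal address is known. The address and port below are hypothetical placeholders; the actual reproduction may have blocked connectivity by a different means.

# Hypothetical helper for step 3: drop traffic from this (HSM) host to the
# storage server backing the non-master storage domain. The address below is
# a placeholder, not taken from this bug's environment.
import subprocess

STORAGE_SERVER = "192.0.2.10"  # hypothetical portal address of the irrelevant domain
ISCSI_PORT = "3260"            # default iSCSI portal port

def block_storage(server=STORAGE_SERVER, port=ISCSI_PORT):
    # Append an OUTPUT rule that drops packets to the storage server.
    subprocess.check_call(["iptables", "-A", "OUTPUT",
                           "-d", server, "-p", "tcp", "--dport", port,
                           "-j", "DROP"])

def unblock_storage(server=STORAGE_SERVER, port=ISCSI_PORT):
    # Delete the same rule to restore connectivity after the test.
    subprocess.check_call(["iptables", "-D", "OUTPUT",
                           "-d", server, "-p", "tcp", "--dport", port,
                           "-j", "DROP"])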

Actual results:
The host is stuck in 'preparing for maintenance' because migration of its running VM fails. The destination host cannot run the migrated VM:

Thread-8811::DEBUG::2013-07-25 15:21:42,393::libvirtconnection::101::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 42 edom: 10 level: 2 message: Domain not found: no domain with matching uuid 'cae206a3
-d1ab-4c36-9247-70bd0ca852db'
Thread-8811::ERROR::2013-07-25 15:21:42,394::vm::2054::vm.Vm::(_startUnderlyingVm) vmId=`cae206a3-d1ab-4c36-9247-70bd0ca852db`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2032, in _startUnderlyingVm
    self._waitForIncomingMigrationFinish()
  File "/usr/share/vdsm/vm.py", line 3341, in _waitForIncomingMigrationFinish
    self._connection.lookupByUUIDString(self.id),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2838, in lookupByUUIDString
    if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)
libvirtError: Domain not found: no domain with matching uuid 'cae206a3-d1ab-4c36-9247-70bd0ca852db'


Expected results:
The host should succeed in running the migrated VM, because its disk is on the active storage domain.

Additional info:
logs
Comment 4 Michal Skrivanek 2013-08-14 03:27:28 EDT
This is unrelated; what matters is (on dst):
Thread-8810::DEBUG::2013-07-25 15:19:41,620::vm::4771::vm.Vm::(waitForMigrationDestinationPrepare) vmId=`cae206a3-d1ab-4c36-9247-70bd0ca852db`::migration destination: waiting for VM creation
Thread-8810::DEBUG::2013-07-25 15:19:41,620::vm::4776::vm.Vm::(waitForMigrationDestinationPrepare) vmId=`cae206a3-d1ab-4c36-9247-70bd0ca852db`::migration destination: waiting 42s for path preparation
Thread-8810::DEBUG::2013-07-25 15:20:23,621::vm::4779::vm.Vm::(waitForMigrationDestinationPrepare) vmId=`cae206a3-d1ab-4c36-9247-70bd0ca852db`::Timeout while waiting for path preparation
Thread-8810::DEBUG::2013-07-25 15:20:23,621::BindingXMLRPC::986::vds::(wrapper) return vmMigrationCreate with {'status': {'message': 'Error creating the requested VM', 'code': 9}}

with the LVM-related error at:
Thread-8811::DEBUG::2013-07-25 15:21:42,383::lvm::311::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  /dev/mapper/1elad1213738004: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1elad1213738004: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1elad1213738004: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad1213738004 was disabled\n  /dev/mapper/1elad1313738004: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1elad1313738004: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1elad1313738004: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad1313738004 was disabled\n  /dev/mapper/1elad2213739782: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/1elad2213739782: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/mapper/1elad2213739782: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad2213739782 was disabled\n  /dev/mapper/1elad1513738004: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1elad1513738004: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1elad1513738004: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad1513738004 was disabled\n  /dev/mapper/1elad1413738004: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1elad1413738004: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1elad1413738004: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad1413738004 was disabled\n  /dev/mapper/1elad31373904: read failed after 0 of 4096 at 21474770944: Input/output error\n  /dev/mapper/1elad31373904: read failed after 0 of 4096 at 21474828288: Input/output error\n  /dev/mapper/1elad31373904: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad31373904 was disabled\n  /dev/mapper/1elad2513739782: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/1elad2513739782: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/mapper/1elad2513739782: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad2513739782 was disabled\n  /dev/mapper/1elad2413739782: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/1elad2413739782: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/mapper/1elad2413739782: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad2413739782 was disabled\n  /dev/mapper/1elad2113739782: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/1elad2113739782: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/mapper/1elad2113739782: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. 
Device /dev/mapper/1elad2113739782 was disabled\n  /dev/mapper/1elad2313739782: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/1elad2313739782: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/mapper/1elad2313739782: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1elad2313739782 was disabled\n'; <rc> = 0
Comment 5 Ayal Baron 2013-08-21 06:22:24 EDT
(In reply to Michal Skrivanek from comment #4)

Although the LVM command sees problems with some devices, the command itself succeeded. The issue, however, is that it took two minutes to complete and the migration timed out.

Thread-8811::DEBUG::2013-07-25 15:19:41,603::lvm::311::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvs

Thread-8811::DEBUG::2013-07-25
15:21:42,383::lvm::311::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '
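
To illustrate the timing described in comments 4 and 5: the destination waits a fixed 42 seconds for path preparation, while the LVM scan stalls for roughly two minutes on the unreachable devices, so the wait expires and vmMigrationCreate returns 'Error creating the requested VM'. The sketch below is hypothetical and is not vdsm's actual code.

# Hypothetical sketch of the race, not vdsm's implementation. prepare_vm_paths()
# stands in for the storage work (the 'lvm lvs' scan) that stalled on the
# blocked devices for about two minutes.
import concurrent.futures
import subprocess

PREPARE_TIMEOUT = 42  # seconds the destination waits for path preparation (per the log)

def prepare_vm_paths():
    # Stand-in for the slow path preparation; in this bug it was an LVM scan
    # that hit repeated I/O errors against the blocked storage domain.
    subprocess.check_output(["/sbin/lvm", "lvs"])

def migration_create():
    # Returns True only if path preparation finishes within the timeout.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(prepare_vm_paths)
    try:
        future.result(timeout=PREPARE_TIMEOUT)
        return True
    except concurrent.futures.TimeoutError:
        # Corresponds to 'Timeout while waiting for path preparation' followed
        # by 'Error creating the requested VM' on the destination.
        return False
    finally:
        pool.shutdown(wait=False)  # do not block on the stalled scan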
Comment 6 Elad 2013-10-14 05:10:06 EDT
A host that has connection problems to an irrelevant SD is able to run VMs successfully.

Verified with is18
Comment 7 Charlie 2013-11-27 19:32:21 EST
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to 
minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.
Comment 9 errata-xmlrpc 2014-01-21 11:29:50 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html
