Bug 1059482

Summary: Migrating between older and newer RHEV-H images leads to flood "path /rhev/data-center/mnt/blockSD/<UUID>/images/<UUID>/<UUID> not assigned to domain"
Product: Red Hat Enterprise Virtualization Manager Reporter: David Gibson <dgibson>
Component: vdsm Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED ERRATA QA Contact: Leonid Natapov <lnatapov>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.3.0 CC: abisogia, bazulay, cpelland, danken, eedri, fsimonce, gchakkar, gwatson, iheim, jcoscia, jraju, lbopf, lpeer, mbourvin, meverett, mhuth, michal.skrivanek, michele, mkalinin, mtessun, pablo.iranzo, rmcswain, sbonazzo, scohen, tcarlin, tvvcox, usurse, yeylon
Target Milestone: --- Keywords: ZStream
Target Release: 3.4.0 Flags: scohen: needinfo+
scohen: needinfo+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: ovirt-3.4.0-beta3 Doc Type: Bug Fix
Doc Text:
An update to VDSM changed a path used by libvirt. Consequently, logs were flooded with messages about invalid paths. The virtual machines associated with these logs worked, but migrations from older RHEV-H to newer RHEV-H caused the logs to flood. An update to VDSM adjusts the libvirt XML to match the form the destination is using after a migration from an older version of RHEV-H to a newer version of RHEV-H. Now, logs are no longer flooded with messages about invalid paths.
Story Points: ---
Clone Of:
: 1061621 Environment:
Last Closed: 2014-06-09 13:28:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1061621, 1078909, 1094944, 1142926    

Description David Gibson 2014-01-29 23:07:17 UTC
Description of problem:

Customer found that after updating RHEV-H and migrating VMs to the updated host, logs were flooded with messages like:

Jan 29 05:17:39 rhevh04 vdsm vm.Vm ERROR vmId=`ed1061e9-a91d-45f2-a62e-e90a57bcd32a`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x25fa650>#012Traceback (most recent call last):#012  File "/usr/share/vdsm/sampling.py", line 351, in collect#012  File "/usr/share/vdsm/sampling.py", line 226, in __call__#012  File "/usr/share/vdsm/vm.py", line 529, in _highWrite#012  File "/usr/share/vdsm/vm.py", line 2316, in extendDrivesIfNeeded#012  File "/usr/share/vdsm/vm.py", line 842, in f#012  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper#012  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1814, in blockInfo#012libvirtError: invalid argument: invalid path /rhev/data-center/mnt/blockSD/589e5d96-84f0-412f-a6f5-3524e12e7606/images/aec90018-7195-4306-b3bb-b4a334f315a2/5f5e48e5-b4fe-4030-923d-cec014cc21b5 not assigned to domain

The VMs otherwise work.

The problem occurs because the newer vdsm version configures libvirt to use paths of the form:

/rhev/data-center/mnt/<mountpoint>/<sd uuid>/images/<disk uuid>/<img uuid>
                  ^^^^^^^^^^^^^^^^

whereas the older vdsm used the form

/rhev/data-center/<sp uuid>/<sd uuid>/images/<disk uuid>/<img uuid>
                  ^^^^^^^^^

For VMs started under the new vdsm this is fine, but for VMs migrated from an older hypervisor, the libvirt XML, including the old-style paths, is carried along.  That means that the vdsm and libvirt paths are out of sync, causing the error above when vdsm attempts to query libvirt for information about the disk.
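
To make the two path forms concrete, here is a minimal Python sketch that classifies a disk path as old-style or new-style. It is only an illustration and not vdsm code: the names UUID, OLD_FORM, NEW_FORM and classify are made up for this example, and the patterns assume lowercase UUIDs in exactly the layouts shown above.

```python
import re

# A lowercase UUID such as 589e5d96-84f0-412f-a6f5-3524e12e7606.
UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"

# Common tail of both forms: /<sd uuid>/images/<disk uuid>/<img uuid>
TAIL = (r"/(?P<sd>" + UUID + r")/images/(?P<disk>" + UUID + r")"
        r"/(?P<img>" + UUID + r")$")

# Old form: /rhev/data-center/<sp uuid>/<sd uuid>/images/...
OLD_FORM = re.compile(r"^/rhev/data-center/(?P<sp>" + UUID + r")" + TAIL)

# New form: /rhev/data-center/mnt/<mountpoint>/<sd uuid>/images/...
NEW_FORM = re.compile(r"^/rhev/data-center/mnt/(?P<mountpoint>.+)" + TAIL)


def classify(path):
    """Return 'old', 'new' or 'unknown' for a disk path (illustrative only)."""
    if OLD_FORM.match(path):
        return "old"
    if NEW_FORM.match(path):
        return "new"
    return "unknown"
```

With these patterns, the path in the error above (under mnt/blockSD) is classified as new-style, which is the form that the migrated VM's libvirt XML does not contain.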

Version-Release number of selected component (if applicable):

vdsm 4.13.2-0.6 uses the new path format

How reproducible:

100%

Steps to Reproduce:
1. Have a RHEV setup with pre-vdsm-4.13.2 hypervisors
2. Update one hypervisor to the RHEL 6.5 based image with vdsm-4.13.2
3. Migrate a VM from the old to the new hypervisor

Actual results:

VM works, but logs are spammed with libvirt and vdsm errors.

Expected results:

VM works, without extraneous errors.

Additional info:

This can be worked around by restarting the VMs under the new hypervisor, but obviously customers prefer not to do that.

Comment 3 David Gibson 2014-02-03 04:06:44 UTC
In fact, fixing only for 3.4.0 could actually be *worse* than not fixing it at all.  Because the problem is a mismatch in behaviour between hypervisor versions, just reverting the behaviour in 3.4 would cause this problem to occur again for all those people who've switched to the current behaviour in the interim, and restarted their VMs to work around the errors.

A real fix would need to check for this situation during migrate, and adjust the libvirt XML to match whatever form the destination vdsm is using.
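
As a rough sketch of what such an adjustment could look like (this is not the actual vdsm fix, and every name in it is hypothetical), the following Python rewrites old-style disk source paths in a libvirt domain XML to the new form. It assumes the caller supplies a mapping from storage domain UUID to mountpoint, and it only handles the old-to-new direction; going the other way would additionally need the storage pool UUID.

```python
import re
import xml.etree.ElementTree as ET

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"

# Old form: /rhev/data-center/<sp uuid>/<sd uuid>/images/<disk uuid>/<img uuid>
OLD_FORM = re.compile(
    r"^/rhev/data-center/" + UUID + r"/(?P<sd>" + UUID + r")(?P<rest>/images/.+)$"
)


def to_new_form(path, mountpoints):
    """Convert an old-style path to the new form; return it unchanged otherwise.

    mountpoints is a hypothetical {sd uuid: mountpoint} mapping.
    """
    m = OLD_FORM.match(path)
    if m is None or m.group("sd") not in mountpoints:
        return path
    sd = m.group("sd")
    return "/rhev/data-center/mnt/%s/%s%s" % (mountpoints[sd], sd, m.group("rest"))


def rewrite_disk_sources(domain_xml, mountpoints):
    """Return domain XML with every <disk><source> path in the new form."""
    root = ET.fromstring(domain_xml)
    for source in root.findall("./devices/disk/source"):
        # Block disks carry the path in 'dev', file-backed disks in 'file'.
        for attr in ("dev", "file"):
            if attr in source.attrib:
                source.set(attr, to_new_form(source.get(attr), mountpoints))
    return ET.tostring(root, encoding="unicode")
```

A caller would pass the XML carried over from the source host together with the mapping for the destination, for example {"589e5d96-84f0-412f-a6f5-3524e12e7606": "blockSD"}. Paths that do not match the old form are left untouched, so VMs that already use the new form are unaffected.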

Comment 10 David Gibson 2014-02-11 23:08:31 UTC
@Ulhas,

Which workaround from the KCS?  Restarting the VM, or migrating back to the old vdsm?

Comment 11 Ulhas Surse 2014-02-12 04:26:59 UTC
David, 
It's restarting the VM.

2) Otherwise, if the VM can be shut down, then shut it down on the vdsm 4.13.2-0.6 hypervisor and start it again. Note that it must be shut down completely and not simply rebooted.

Comment 18 Nir Soffer 2014-02-26 19:50:59 UTC
*** Bug 1055437 has been marked as a duplicate of this bug. ***

Comment 19 Leonid Natapov 2014-02-27 06:51:34 UTC
Tested with two RHEVH nodes and 3.4 management.
First RHEVH - RHEL 6.4 with vdsm-4.10.2-25.1.el6ev
Second RHEVH - RHEL 6.5 with vdsm-4.13.2-0.6.el6ev.

VM migrated from the first RHEVH to the second one and vice versa.
No errors appear in vdsm.log.

Comment 20 errata-xmlrpc 2014-06-09 13:28:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html