Bug 1066445

Summary: VMs are paused after migration from 3.2 to 3.3 hypervisor
Product: Red Hat Enterprise Virtualization Manager Reporter: Tomas Dosek <tdosek>
Component: vdsm Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED ERRATA QA Contact: Elad <ebenahar>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.3.0 CC: acanan, amureini, avyadav, bazulay, danken, flo_bugzilla, gwatson, iheim, jraju, lbopf, lpeer, lyarwood, mkalinin, pablo.iranzo, pbandark, scohen, sputhenp, tcarlin, tdosek, ukar, yeylon, zat-pro
Target Milestone: --- Keywords: ZStream
Target Release: 3.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Virtual machines are no longer paused after migrations from 3.2 hypervisors to 3.3 hypervisors.
Story Points: ---
Clone Of:
: 1074298 Environment:
Last Closed: 2014-06-09 13:29:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1061621    
Bug Blocks: 1074298, 1078909, 1142926    

Description Tomas Dosek 2014-02-18 12:33:20 UTC
Description of problem:

This seems to be partially a result of the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1061621.

After the XML file is inspected and altered during migration, the VM gets stuck on the destination host (the newer RHEV-H or, for a 3.3 -> 3.2 migration, the 3.2 hypervisor) with an attribute error:

94:Feb 18 11:17:11 bl460-282 vdsm vm.Vm WARNING vmId=`44a60669-2931-4e5a-810d-43952d705644`::updating drive virtio-disk2 config path from /rhev/data-center/5a94dbe6-3fdc-4584-b434-318354a66c6f/7e43a8e9-0f61-4906-b2b8-cc4ef8a98429/images/1dc43ff9-4ff2-420b-9735-3fad4ecf6b20/49dd740e-6cfa-4c19-8a1b-e0acbc43e4d1 to /rhev/data-center/mnt/blockSD/7e43a8e9-0f61-4906-b2b8-cc4ef8a98429/images/1dc43ff9-4ff2-420b-9735-3fad4ecf6b20/49dd740e-6cfa-4c19-8a1b-e0acbc43e4d1
<-------- Path is getting updated during the migration
115:Feb 18 11:17:52 bl460-282 vdsm vm.Vm ERROR vmId=`44a60669-2931-4e5a-810d-43952d705644`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x24017d8>#012Traceback (most recent call last):#012  File "/usr/share/vdsm/sampling.py", line 351, in collect#012  File "/usr/share/vdsm/sampling.py", line 226, in __call__#012  File "/usr/share/vdsm/vm.py", line 529, in _highWrite#012  File "/usr/share/vdsm/vm.py", line 2313, in extendDrivesIfNeeded#012AttributeError: 'Drive' object has no attribute 'format'
116:Feb 18 11:17:54 bl460-282 vdsm vm.Vm ERROR vmId=`44a60669-2931-4e5a-810d-43952d705644`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x24017d8>#012Traceback (most recent call last):#012  File "/usr/share/vdsm/sampling.py", line 351, in collect#012  File "/usr/share/vdsm/sampling.py", line 226, in __call__#012  File "/usr/share/vdsm/vm.py", line 529, in _highWrite#012  File "/usr/share/vdsm/vm.py", line 2313, in extendDrivesIfNeeded#012AttributeError: 'Drive' object has no attribute 'format'
117:Feb 18 11:17:56 bl460-282 vdsm vm.Vm ERROR vmId=`44a60669-2931-4e5a-810d-43952d705644`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x24017d8>#012Traceback (most recent call last):#012  File "/usr/share/vdsm/sampling.py", line 351, in collect#012  File "/usr/share/vdsm/sampling.py", line 226, in __call__#012  File "/usr/share/vdsm/vm.py", line 529, in _highWrite#012  File "/usr/share/vdsm/vm.py", line 2313, in extendDrivesIfNeeded#012AttributeError: 'Drive' object has no attribute 'format'
118:Feb 18 11:17:58 bl460-282 vdsm vm.Vm ERROR vmId=`44a60669-2931-4e5a-810d-43952d705644`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x24017d8>#012Traceback (most recent call last):#012  File "/usr/share/vdsm/sampling.py", line 351, in collect#012  File "/usr/share/vdsm/sampling.py", line 226, in __call__#012  File "/usr/share/vdsm/vm.py", line 529, in _highWrite#012  File "/usr/share/vdsm/vm.py", line 2313, in extendDrivesIfNeeded#012AttributeError: 'Drive' object has no attribute 'format'
119:Feb 18 11:18:00 bl460-282 vdsm vm.Vm ERROR vmId=`44a60669-2931-4e5a-810d-43952d705644`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x24017d8>#012Traceback (most recent call last):#012  File "/usr/share/vdsm/sampling.py", line 351, in collect#012  File "/usr/share/vdsm/sampling.py", line 226, in __call__#012  File "/usr/share/vdsm/vm.py", line 529, in _highWrite#012  File "/usr/share/vdsm/vm.py", line 2313, in extendDrivesIfNeeded#012AttributeError: 'Drive' object has no attribute 'format'
120:Feb 18 11:18:02 bl460-282 vdsm vm.Vm ERROR vmId=`44a60669-2931-4e5a-810d-43952d705644`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x24017d8>#012Traceback (most recent call last):#012  File "/usr/share/vdsm/sampling.py", line 351, in collect#012  File "/usr/share/vdsm/sampling.py", line 226, in __call__#012  File "/usr/share/vdsm/vm.py", line 529, in _highWrite#012  File "/usr/share/vdsm/vm.py", line 2313, in extendDrivesIfNeeded#012AttributeError: 'Drive' object has no attribute 'format'
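
The tracebacks above are hard to read because syslog has escaped each newline as `#012` (the octal code for LF). A minimal sketch for decoding such a line, assuming the `#ooo` octal escaping seen in these logs; the function name `decode_syslog_escapes` is made up for illustration and is not part of vdsm:

```python
import re

# Matches '#' followed by exactly three octal digits, e.g. '#012'.
_ESCAPE = re.compile(r"#([0-7]{3})")


def decode_syslog_escapes(line):
    """Turn '#ooo' escapes for control characters back into the real characters.

    Only control characters (code < 0x20, or DEL) are decoded, so that an
    unrelated '#123' in a message is left untouched.
    """
    def _replace(match):
        code = int(match.group(1), 8)
        if code < 0x20 or code == 0x7F:
            return chr(code)
        return match.group(0)

    return _ESCAPE.sub(_replace, line)


# Shortened copy of one of the log messages above.
sample = (
    "Stats function failed: <AdvancedStatsFunction _highWrite at 0x24017d8>"
    "#012Traceback (most recent call last):"
    '#012  File "/usr/share/vdsm/vm.py", line 2313, in extendDrivesIfNeeded'
    "#012AttributeError: 'Drive' object has no attribute 'format'"
)

decoded = decode_syslog_escapes(sample)
print(decoded)
```

Once decoded, the last line of each message is the exception itself, which makes it easy to confirm that every repeated entry is the same `AttributeError`.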

Version-Release number of selected component (if applicable):
3.3.0-3

How reproducible:
vdsm-4.13.2-0.10

Steps to Reproduce:
1. Have a 3.2 setup upgraded to 3.3.
2. Have a 3.2 hypervisor running a VM and a 3.3.0-3 hypervisor.
3. Migrate the VM between the hypervisors.

Actual results:
An AttributeError is logged (see the log excerpt above) and the VM gets paused.

Expected results:
The migration should complete and the VM should keep running on the destination host.

Comment 1 Tomas Dosek 2014-02-18 12:53:31 UTC
Actually, it is the recovery files that are being rewritten. The result is the same, though: the destination host constructs an XML file with drive attributes missing, and hence the VM pauses.

Comment 2 Dan Kenigsberg 2014-02-18 13:20:22 UTC
Could you share vdsm.log, too (from both source and destination hosts)?

Comment 9 Federico Simoncelli 2014-02-18 15:23:43 UTC
I have the feeling that what happened was:

1. The VM was created on vdsm 3.3.0 (< 3.3.0-3) with the new /rhev/data-center/mnt/... path.

2. The VM was migrated to a 3.2 vdsm, where it lost several attributes (e.g. format) because of the different path (the old /rhev/data-center/<spuuid>). This can be observed in bl460-282_vdsm.log (libvirtError: invalid argument: invalid path /rhev/data-center/mnt...).

3. The VM was migrated (or vdsm was upgraded in-place) to a vdsm 3.3.0-3 which, thanks to the fix for bug 1061621, was able to recover the path (logs attached in the description) but not the rest (format).
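
The end state of this hypothesis (path recovered, format still missing) can be sketched roughly as follows. This is only an illustration: the `Drive` class is a hypothetical stand-in rather than vdsm's real class, the helper names are invented, the path rewrite covers only the block-domain layout seen in the log from the description, and the `"cow"` format value is an assumption.

```python
import re


class Drive(object):
    """Hypothetical stand-in for a VM drive; NOT vdsm's actual Drive class."""

    def __init__(self, **conf):
        # Only keys present in the recovered config become attributes, so a
        # config that lost 'format' yields an object without that attribute.
        for key, value in conf.items():
            setattr(self, key, value)


# Old layout: /rhev/data-center/<spuuid>/<sduuid>/images/<imguuid>/<voluuid>
_OLD_PATH = re.compile(
    r"^/rhev/data-center/(?P<sp>[0-9a-f-]{36})/(?P<sd>[0-9a-f-]{36})"
    r"/images/(?P<rest>.+)$"
)


def to_new_path(path):
    """Rewrite an old pool-based path to the mnt/blockSD layout, if it matches."""
    match = _OLD_PATH.match(path)
    if match is None:
        return path  # already in the new layout, or not recognised
    return "/rhev/data-center/mnt/blockSD/%s/images/%s" % (
        match.group("sd"), match.group("rest"))


def is_cow(drive):
    """Tolerant format check: a missing 'format' is treated as unknown.

    Accessing drive.format directly would raise AttributeError here.
    The "cow" value is an assumption made for this sketch.
    """
    return getattr(drive, "format", None) == "cow"


old = ("/rhev/data-center/5a94dbe6-3fdc-4584-b434-318354a66c6f/"
       "7e43a8e9-0f61-4906-b2b8-cc4ef8a98429/images/"
       "1dc43ff9-4ff2-420b-9735-3fad4ecf6b20/"
       "49dd740e-6cfa-4c19-8a1b-e0acbc43e4d1")

recovered = Drive(path=to_new_path(old))  # path fixed, 'format' never restored
print(recovered.path)
print(hasattr(recovered, "format"), is_cow(recovered))
```

The point of the sketch is only that fixing the path does not bring back the other lost attributes; any code that reads `drive.format` unconditionally will still fail.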

Tomas, can you confirm that this hypothesis is correct?

Comment 11 Tomas Dosek 2014-02-21 10:04:19 UTC
QE - note:

Please be sure all backward compatibility is tested next time. These particular scenarios were broken:

vm started on 3.2 => 3.3.0-3 OK
vm started on 3.2 => 3.3.0 NO!
vm started on 3.3.0 => 3.2 NO!
vm started on 3.3.0 => 3.3.0-3 OK (compatible mode)
vm started on 3.3.0-3 => 3.2 OK
vm started on 3.3.0-3 => 3.3.0 NO!
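
For automation, the list above could be kept as data so that broken and missing combinations are easy to query. A minimal sketch; the dictionary layout and the helper names are invented for illustration, and the outcomes are copied from the list above:

```python
# (version the VM was started on, destination version) -> did migration work?
MATRIX = {
    ("3.2", "3.3.0-3"): True,
    ("3.2", "3.3.0"): False,
    ("3.3.0", "3.2"): False,
    ("3.3.0", "3.3.0-3"): True,   # compatible mode
    ("3.3.0-3", "3.2"): True,
    ("3.3.0-3", "3.3.0"): False,
}


def broken_pairs(matrix):
    """Return the (source, destination) pairs recorded as failing."""
    return sorted(pair for pair, ok in matrix.items() if not ok)


def untested_pairs(matrix):
    """Return ordered pairs of known versions that have no recorded outcome."""
    versions = sorted({version for pair in matrix for version in pair})
    return [
        (src, dst)
        for src in versions
        for dst in versions
        if src != dst and (src, dst) not in matrix
    ]


for src, dst in broken_pairs(MATRIX):
    print("broken: %s => %s" % (src, dst))
print("untested:", untested_pairs(MATRIX))
```

Every failing pair involves 3.3.0 (without the -3 update) on one side, and no pair involving only 3.2 and 3.3.0-3 fails.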

Comment 12 Tomas Dosek 2014-02-24 08:47:57 UTC
No action items (AIs) for Engineering on this bug. The scenario was already fixed in 3.3.0-3.

Pushing to ON_QA: we need to make sure that all the testing scenarios mentioned in comment 11 are covered, ideally by automation.

After this, the bug can be closed as NOTABUG or WONTFIX. We just need to cover this in the QA process.

Resetting flags accordingly as well.

Comment 15 Elad 2014-03-12 15:13:09 UTC
The following combinations of vdsm versions were checked for VM migration. All passed.

4.13.2-0.5.el6ev --> vdsm-4.10.2-11.0.el6ev.x86_64
vdsm-4.10.2-11.0.el6ev.x86_64 --> 4.13.2-0.5.el6ev
vdsm-4.10.2-11.0.el6ev.x86_64 --> vdsm-4.12.0-105.git0da1561.el6ev.x86_64
vdsm-4.12.0-105.git0da1561.el6ev.x86_64 --> vdsm-4.10.2-11.0.el6ev.x86_64
vdsm-4.10.2-11.0.el6ev.x86_64 --> vdsm-4.11.0-35.git34c1ef1.el6.x86_64

Moving to CLOSED WONTFIX

Comment 16 errata-xmlrpc 2014-06-09 13:29:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html