Bug 889579 - V1->V3 storage domain upgrade failure - results in round-robining of SPM role and invalid DC status.
Keywords:
Status: CLOSED DUPLICATE of bug 884314
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ayal Baron
QA Contact: Haim
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2012-12-21 19:18 UTC by Stephen Gordon
Modified: 2014-07-01 12:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-12-24 07:09:39 UTC
Target Upstream Version:



Description Stephen Gordon 2012-12-21 19:18:12 UTC
Description of problem:

I upgraded my RHEV-M to 3.1 successfully. I have since attempted to move my (default) cluster and data center to 3.1 mode. To do this I:

- Suspended all VMs.
- Put the hosts into maintenance mode.
- Ran yum update on the hosts and rebooted them to ensure I had the latest kernel, vdsm etc. (hosts pull updates from RHN).
- Changed the cluster compatibility mode to 3.1 in RHEV-M.

In the UI the change is reflected immediately, but it seems in the background VDSM has some work to do. When I re-activated the hosts:

- The Data Center status appears as invalid.
- The hosts contend for SPM but never succeed.

Looking at the logs it appears that VDSM has upgraded the data storage domain to V3 (as expected) but is unable to open some of the images.

The exceptions that jumped out to me at first were of the following form:

81ae8093-43c3-4e6e-8aaf-02203bf19a74::ERROR::2012-12-21 11:35:14,828::task::853::TaskManager.Task::(_setError) Task=`81ae8093-43c3-4e6e-8aaf-02203bf19a74`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 320, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 274, in startSpm
    self._upgradePool(expectedDomVersion, __securityOverride=True)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1032, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 281, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 202, in v3DomainConverter
    v3ResetMetaVolSize(vol)  # BZ#811880
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 115, in v3ResetMetaVolSize
    qemuImg.FORMAT.QCOW2)
  File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 67, in info
    raise QImgError(rc, out, err)
QImgError: ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/6254b9d8-bfa0-11e1-bfa9-0019b917ca79/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51'"], message=None

I got some assistance from Lee Yarwood via #sbr-virt, and his investigation suggested that the images VDSM can't open are those storing the suspend state, which are in raw format. The error is returned because VDSM attempts to check them with qemu-img info -f qcow2.
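
To illustrate the mismatch, here is a minimal sketch (not VDSM code) of telling a qcow2 file from a raw one by reading its header instead of assuming qcow2. The function name probe_format is hypothetical. It relies only on the qcow2 on-disk layout: a 4-byte magic "QFI\xfb" followed by a big-endian 32-bit version.

```python
import struct

# First four bytes of a qcow/qcow2 image header.
QCOW_MAGIC = b"QFI\xfb"


def probe_format(path):
    """Hypothetical helper: return 'qcow2' if the header looks like
    qcow2 (version 2 or 3), otherwise fall back to 'raw'."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) == 8 and header[:4] == QCOW_MAGIC:
        # The version field is a big-endian unsigned 32-bit integer.
        (version,) = struct.unpack(">I", header[4:8])
        if version in (2, 3):
            return "qcow2"
    return "raw"
```

Probing file content is only a diagnostic aid here. A guest can write arbitrary bytes into a raw disk, so a real fix would more safely take the format from the volume's own metadata than from the image header.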

After manually using qemu-img convert to turn the RAW images into QCOW2 images, one of the hosts was able to win the SPM election, so I could bring the environment back up.
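
For reference, a hedged sketch of what such a manual conversion could look like when scripted. The exact command I ran is not recorded here, so the invocation below is an assumption built from standard qemu-img convert options (-f for the source format, -O for the output format). The function names are hypothetical.

```python
import subprocess


def build_convert_cmd(src, dst):
    """Hypothetical helper: argv for converting a raw image to qcow2.
    The result is written to a new file (dst); src is left untouched."""
    return ["qemu-img", "convert", "-f", "raw", "-O", "qcow2", src, dst]


def convert_raw_to_qcow2(src, dst):
    # Raises subprocess.CalledProcessError if qemu-img exits non-zero.
    subprocess.check_call(build_convert_cmd(src, dst))
```

Swapping the converted file into place and restoring its ownership and permissions would be separate manual steps. This only works around the failed check and does not explain why the formats disagreed.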

So my questions are:

- What is the expected format of the images that are created when suspending?
- Why couldn't VDSM open them after the storage domain upgrade (well, more accurately, why did the format saved not match the format passed to qemu-img info - we know given the parameters why it failed)?
- Are users expected to fully shut down (not just suspend) VMs before changing the compatibility mode, and if so, why do we allow it to proceed otherwise?

Version-Release number of selected component (if applicable):

Both hosts show the following vdsm packages installed.

# rpm -qa | grep vdsm
vdsm-cli-4.9.6-44.1.el6_3.noarch
vdsm-python-4.9.6-44.1.el6_3.x86_64
vdsm-4.9.6-44.1.el6_3.x86_64

Additional Info:

I am fairly concerned about the idea of a customer running into this. The

Comment 2 Stephen Gordon 2012-12-21 19:35:41 UTC
Some further information from Lee's notes:

e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,176::formatConverter::144::Storage.v3DomainConverter::(v3DomainConverter) Converting volume: e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,177::fileVolume::561::Storage.Volume::(validateVolumePath) validate path for e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,179::formatConverter::128::Storage.v3DomainConverter::(v3UpgradeVolumePermissions) Changing permissions (read-write) for the volume e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,210::formatConverter::153::Storage.v3DomainConverter::(v3DomainConverter) Creating the volume lease for e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,210::fileVolume::401::Storage.Volume::(newVolumeLease) Initializing volume lease volUUID=e0191317-b28f-41f4-b8c1-fdc6659ffe51 sdUUID=b08fbecf-6b72-4576-9ee4-923546c8a29d, metaId=('/rhev/data-center/6254b9d8-bfa0-11e1-bfa9-0019b917ca79/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51',)
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::ERROR::2012-12-21 11:57:22,956::sp::316::Storage.StoragePool::(startSpm) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 274, in startSpm
    self._upgradePool(expectedDomVersion, __securityOverride=True)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1032, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 281, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 202, in v3DomainConverter
    v3ResetMetaVolSize(vol)  # BZ#811880
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 115, in v3ResetMetaVolSize
    qemuImg.FORMAT.QCOW2)
  File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 67, in info
    raise QImgError(rc, out, err)
QImgError: ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/6254b9d8-bfa0-11e1-bfa9-0019b917ca79/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51'"], message=None
 
# ls -lah /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
-rw-rw----. 1 vdsm kvm 11K Dec 21 2012 /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
# qemu-img info !$
qemu-img info /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
image: /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
file format: raw
virtual size: 10K (10240 bytes)
disk size: 12K
 
# ls -lahZ /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
-rw-rw----. vdsm kvm system_u:object_r:nfs_t:s0 /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51

