Bug 889579

Summary: V1->V3 storage domain upgrade failure - results in round-robining of SPM role and invalid DC status.
Product: Red Hat Enterprise Linux 6
Reporter: Stephen Gordon <sgordon>
Component: vdsm
Assignee: Ayal Baron <abaron>
Status: CLOSED DUPLICATE
QA Contact: Haim <hateya>
Severity: high
Docs Contact:
Priority: urgent
Version: 6.3
CC: abaron, bazulay, iheim, lpeer, yeylon, ykaul
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-24 07:09:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Stephen Gordon 2012-12-21 19:18:12 UTC
Description of problem:

I upgraded my RHEV-M to 3.1 successfully. Since then I have attempted to move my (default) cluster and data center to 3.1 mode. To do this I:

- Suspended all VMs.
- Put the hosts into maintenance mode.
- Ran yum update on the hosts and rebooted them to ensure I had the latest kernel, vdsm etc. (hosts pull updates from RHN).
- Changed the cluster compatibility mode to 3.1 in RHEV-M.

In the UI the change is reflected immediately, but it seems in the background VDSM has some work to do. When I re-activated the hosts:

- The Data Center status appears as invalid.
- The hosts contend for SPM but never succeed.

Looking at the logs, it appears that VDSM has upgraded the data storage domain to V3 (as expected) but is unable to open some of the images.

The exceptions that jumped out to me at first were of the following form:

81ae8093-43c3-4e6e-8aaf-02203bf19a74::ERROR::2012-12-21 11:35:14,828::task::853::TaskManager.Task::(_setError) Task=`81ae8093-43c3-4e6e-8aaf-02203bf19a74`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 320, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 274, in startSpm
    self._upgradePool(expectedDomVersion, __securityOverride=True)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1032, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 281, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 202, in v3DomainConverter
    v3ResetMetaVolSize(vol)  # BZ#811880
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 115, in v3ResetMetaVolSize
    qemuImg.FORMAT.QCOW2)
  File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 67, in info
    raise QImgError(rc, out, err)
QImgError: ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/6254b9d8-bfa0-11e1-bfa9-0019b917ca79/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51'"], message=None

I got some assistance from Lee Yarwood via #sbr-virt and his investigation suggested that the images it can't open are those storing the suspend state and are in raw format. The error being returned is because VDSM attempts to check them with qemu-img info -f qcow2.
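
To illustrate the mismatch described above, here is a minimal Python sketch that decides whether a volume is raw or QCOW2 by reading its header before any format-specific probe is attempted. It relies only on the documented QCOW magic bytes ("QFI" followed by 0xfb) and the big-endian version field that follows them. The function name detect_image_format is hypothetical; this is not VDSM code and not necessarily how the issue was resolved.

```python
import struct

# The first four bytes of every QCOW/QCOW2 image, per the QCOW2 format spec.
QCOW_MAGIC = b"QFI\xfb"


def detect_image_format(path):
    """Hypothetical helper: classify an image file by its header.

    Returns "qcow2" for QCOW version 2 or later, "qcow" for version 1,
    and "raw" for anything without the QCOW magic.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) == 8 and header[:4] == QCOW_MAGIC:
        # Bytes 4..7 hold the format version as a big-endian unsigned int.
        (version,) = struct.unpack(">I", header[4:8])
        return "qcow2" if version >= 2 else "qcow"
    return "raw"
```

Under this assumption, a caller could pass the detected value to qemu-img info -f instead of hard-coding qcow2, so a 10K raw suspend-state volume would not be probed as QCOW2.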

After manually using qemu-img convert to turn the raw images into QCOW2 images, one of the hosts was able to win the SPM election, so I could bring the environment back up.
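
The exact command used for the manual conversion is not recorded in this report. As a rough sketch only, a raw-to-QCOW2 conversion with qemu-img generally takes the form below; the helper names and the idea of writing to a separate destination path are assumptions for illustration, and the -f and -O options are the standard qemu-img convert input and output format flags.

```python
import subprocess


def build_convert_cmd(src, dst):
    """Hypothetical helper: argv for converting a raw image to QCOW2."""
    return ["qemu-img", "convert", "-f", "raw", "-O", "qcow2", src, dst]


def convert_raw_to_qcow2(src, dst):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.check_call(build_convert_cmd(src, dst))
```

Converting to a new file and only then replacing the original keeps the raw volume intact if the conversion fails partway.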

So my questions are:

- What is the expected format of the images that are created when suspending?
- Why couldn't VDSM open them after the storage domain upgrade? More precisely, why did the saved format not match the format passed to qemu-img info? (Given the parameters, we know why the command itself failed.)
- Are users expected to fully shut down (not just suspend) VMs before changing the compatibility mode, and if so, why do we allow the change to proceed otherwise?

Version-Release number of selected component (if applicable):

Both hosts show the following vdsm packages installed.

# rpm -qa | grep vdsm
vdsm-cli-4.9.6-44.1.el6_3.noarch
vdsm-python-4.9.6-44.1.el6_3.x86_64
vdsm-4.9.6-44.1.el6_3.x86_64

Additional Info:

I am fairly concerned about the idea of a customer running into this. The

Comment 2 Stephen Gordon 2012-12-21 19:35:41 UTC
Some further information from Lee's notes:

e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,176::formatConverter::144::Storage.v3DomainConverter::(v3DomainConverter) Converting volume: e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,177::fileVolume::561::Storage.Volume::(validateVolumePath) validate path for e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,179::formatConverter::128::Storage.v3DomainConverter::(v3UpgradeVolumePermissions) Changing permissions (read-write) for the volume e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,210::formatConverter::153::Storage.v3DomainConverter::(v3DomainConverter) Creating the volume lease for e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::DEBUG::2012-12-21 11:57:17,210::fileVolume::401::Storage.Volume::(newVolumeLease) Initializing volume lease volUUID=e0191317-b28f-41f4-b8c1-fdc6659ffe51 sdUUID=b08fbecf-6b72-4576-9ee4-923546c8a29d, metaId=('/rhev/data-center/6254b9d8-bfa0-11e1-bfa9-0019b917ca79/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51',)
 
e50dd0b3-8d6a-4076-a66b-292658b671d0::ERROR::2012-12-21 11:57:22,956::sp::316::Storage.StoragePool::(startSpm) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 274, in startSpm
    self._upgradePool(expectedDomVersion, __securityOverride=True)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1032, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 281, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 202, in v3DomainConverter
    v3ResetMetaVolSize(vol)  # BZ#811880
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 115, in v3ResetMetaVolSize
    qemuImg.FORMAT.QCOW2)
  File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 67, in info
    raise QImgError(rc, out, err)
QImgError: ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/6254b9d8-bfa0-11e1-bfa9-0019b917ca79/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51'"], message=None
 
# ls -lah /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
-rw-rw----. 1 vdsm kvm 11K Dec 21 2012 /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
 
# qemu-img info !$
qemu-img info /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
image: /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
file format: raw
virtual size: 10K (10240 bytes)
disk size: 12K
 
# ls -lahZ /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51
-rw-rw----. vdsm kvm system_u:object_r:nfs_t:s0 /rhev/data-center/mnt/tokyo.usersys.redhat.com:_opt_rhev_data/b08fbecf-6b72-4576-9ee4-923546c8a29d/images/f13de9c8-3855-4579-bc2b-859a67572f69/e0191317-b28f-41f4-b8c1-fdc6659ffe51