Bug 882276
| Summary: | 3.2 - [vdsm] Failure upgrading a storage domain to V3 - No space left on device | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Federico Simoncelli <fsimonce> |
| Component: | vdsm | Assignee: | Federico Simoncelli <fsimonce> |
| Status: | CLOSED ERRATA | QA Contact: | Haim <hateya> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.2.0 | CC: | abaron, amureini, bazulay, cpelland, hateya, iheim, jgalipea, lnatapov, lpeer, scohen, yeylon, ykaul |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | 3.2.0 | Flags: | scohen: Triaged+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | vdsm-4.10.2-2.0 | Doc Type: | Bug Fix |
| Doc Text: | During an early beta of RHEV 3.0, vdsm generated problematic metadata tags (see BZ#732980) that are incompatible with the V3 upgrade. A preliminary step has been added to the upgrade process to fix the relevant tags (when needed) and then proceed with the regular upgrade. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-06-10 20:36:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 884314, 896516, 915537 | | |
An initial patch has been proposed upstream to skip this error for volumes that we know are affected by this issue (MD offset > 2048 - 100 - 1). This will lead to V3 domains that have no leases for certain volumes; these could be handled as a special case in the future (e.g. no leases for MD offsets > 1947), or, even better, eventually be fixed somehow.
It's not the best solution (reallocating the metadata slots during the upgrade would be better, for example), but it's probably one of the best compromises we could have at this time. A sketch of the heuristic follows the commit reference below.
commit d0809c40befb370f49444d1c3b3de34a05e3fa89
Author: Federico Simoncelli <fsimonce>
Date: Fri Nov 30 11:32:46 2012 -0500
upgrade: skip volume lease for faulty metadata offsets
http://gerrit.ovirt.org/#/c/9600/
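As a rough illustration of the approach (this is not the actual vdsm patch), the skip heuristic boils down to the check below. The constants follow from the numbers in this report: a 2 GiB leases LV holds 2048 one-MiB lease slots, 100 of which are reserved, so any MD offset above 2048 - 100 - 1 = 1947 cannot receive a lease. Function and variable names here are illustrative assumptions.

```python
LEASE_SLOTS = 2048       # 2 GiB leases LV / 1 MiB per sanlock lease
RESERVED_SLOTS = 100     # slots reserved ahead of the volume leases
MAX_VALID_MD_OFFSET = LEASE_SLOTS - RESERVED_SLOTS - 1  # 1947

def create_volume_lease(vol_uuid, md_offset, new_volume_lease):
    """Create the volume lease during the V3 upgrade, skipping the
    faulty metadata offsets produced by the BZ#732980 bug."""
    if md_offset > MAX_VALID_MD_OFFSET:
        # The lease would fall past the end of the leases LV: leave this
        # volume without a lease rather than failing the whole upgrade.
        return
    new_volume_lease(vol_uuid, md_offset)
```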
The final patch is:
commit 2ba76e3b45a9866b913ece63a51c540410fdd561
Author: Federico Simoncelli <fsimonce>
Date: Mon Dec 3 05:28:39 2012 -0500
upgrade: reallocate the metadata slots when needed
http://gerrit.ovirt.org/#/c/9660/
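The final approach fixes the faulty tags up front instead of leaving volumes without leases. A minimal sketch of the idea, under stated assumptions: `md_by_volume` and `retag_volume` are hypothetical stand-ins for vdsm's LVM tag handling, and every offset from 0 to 1947 is treated as allocatable for simplicity (the real patch also has to relocate the metadata block itself).

```python
MAX_VALID_MD_OFFSET = 2048 - 100 - 1  # 1947, see the analysis above

def reallocate_md_slots(md_by_volume, retag_volume):
    """md_by_volume: dict mapping volume UUID -> current MD_<n> offset.
    retag_volume(uuid, old, new): hypothetical helper that replaces the
    MD_<old> LVM tag with MD_<new> (and moves the metadata accordingly)."""
    used = set(md_by_volume.values())
    # Valid slots not yet claimed by any MD_<n> tag, lowest first.
    free = (n for n in range(MAX_VALID_MD_OFFSET + 1) if n not in used)
    for uuid, offset in sorted(md_by_volume.items(), key=lambda kv: kv[1]):
        if offset > MAX_VALID_MD_OFFSET:
            new = next(free)  # StopIteration here would mean no room left
            retag_volume(uuid, offset, new)
            used.add(new)
```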
Verified the upgrade of an iSCSI storage domain from V2 to V3 (upgrade from a 3.0 to a 3.2 environment).

RHEVM 3.0 - IC159 environment:
RHEVM: rhevm-3.0.7_0001-2.el6_3.x86_64
VDSM: vdsm-4.9-113.4.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.6.x86_64
QEMU-KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64

RHEVM 3.2 - SF02 environment:
RHEVM: rhevm-3.2.0-2.el6ev.noarch
VDSM: vdsm-4.10.2-2.0.el6.x86_64
LIBVIRT: libvirt-0.10.2-13.el6.x86_64
QEMU-KVM: qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html
The upgrade process to V3 was interrupted on the vdsm side with the following traceback:

3a05edaa-d983-4f30-ab79-4762466f73cb::ERROR::2012-11-28 16:58:26,041::task::853::TaskManager.Task::(_setError) Task=`3a05edaa-d983-4f30-ab79-4762466f73cb`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 320, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 274, in startSpm
    self._upgradePool(expectedDomVersion, __securityOverride=True)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1032, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 281, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 155, in v3DomainConverter
    type(vol).newVolumeLease(metaId, domain.sdUUID, volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 617, in newVolumeLease
    sanlock.init_resource(sdUUID, volUUID, [(leasePath, leaseOffset)])
SanlockException: (28, 'Sanlock resource init failure', 'No space left on device')

After investigating the issue, it was discovered that the storage domain had previously been managed by old vdsm releases, one of which generated the metadata offsets in steps of 512 (instead of 1). For reference, the patch that fixed that issue (in vdsm-4.9-107.el6) is:

commit 496c0c37df387a25afee67a0ff90030e7eab36e2
Author: Federico Simoncelli <fsimonce>
Date: Fri Sep 2 16:36:33 2011 +0000

BZ#732980 MD tag must be in blocks unit

This patch fixes the unit used for the MD tag (blocks instead of bytes) and completes the support for the new MS tag which holds the size (in blocks) of the metadata.

Change-Id: Id268735f60a289f9e8f12a7d7bb3180bf3398c54

That bug led to very high offset numbers in the volume metadata tags:

# lvs -o +tags | sed -n 's/.*MD_\([0-9]\+\).*/\1/p' | sort -n | tail
45568
46080
47104
47616
48128
48640
49664
50176
51200
51712

Since sanlock uses the same offsets to initialize the volume leases, these values would require a leases LV of about 52GB. The current leases LV size is 2GB, which allows for 2048 - 100 = 1948 volumes; because the older vdsm version left "holes" of 511 unused slots between offsets, there is now not enough space to satisfy the lease requests during the upgrade.
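To make the arithmetic concrete, here is a small back-of-the-envelope calculation. It assumes each sanlock lease occupies one 1 MiB-aligned slot in the leases LV and that a volume's lease slot is derived directly from its MD_<n> tag plus the reserved area; both assumptions are inferred from the numbers quoted above (2048 - 100 = 1948 usable slots in a 2 GiB LV), not taken from the vdsm source.

```python
MiB = 1024 * 1024
LEASE_SLOT = MiB                 # assumed: sanlock aligns each lease to 1 MiB
LEASES_LV = 2 * 1024 * MiB       # current leases LV: 2 GiB = 2048 slots
RESERVED = 100                   # assumed: slots reserved before volume leases

def lease_end(md_offset):
    # Byte offset just past the lease slot for a given MD_<n> tag value,
    # under the assumed layout: reserved slots first, one slot per offset.
    return (RESERVED + md_offset + 1) * LEASE_SLOT

print(LEASES_LV // LEASE_SLOT - RESERVED)  # 1948 usable volume leases
worst = 51712                              # highest MD_<n> in the lvs output
print(lease_end(worst) / (1024.0 * MiB))   # ~50.6 GiB, i.e. the ~52GB quoted
print(lease_end(worst) > LEASES_LV)        # True -> sanlock returns ENOSPC
```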