Bug 882276 - 3.2 - [vdsm] Failure upgrading a storage domain to V3 - No space left on device
Summary: 3.2 - [vdsm] Failure upgrading a storage domain to V3 - No space left on device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Federico Simoncelli
QA Contact: Haim
URL:
Whiteboard: storage
Depends On:
Blocks: 884314 896516 915537
 
Reported: 2012-11-30 14:58 UTC by Federico Simoncelli
Modified: 2022-07-09 05:52 UTC (History)
12 users

Fixed In Version: vdsm-4.10.2-2.0
Doc Type: Bug Fix
Doc Text:
During an early beta of RHEV 3.0, vdsm generated problematic metadata tags (see BZ#732980) that are incompatible with the V3 upgrade. A preliminary step has been added to the upgrade process to fix the affected tags (when needed) before proceeding with the regular upgrade.
Clone Of:
Environment:
Last Closed: 2013-06-10 20:36:50 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
scohen: Triaged+




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-47040 0 None None None 2022-07-09 05:52:26 UTC
Red Hat Product Errata RHSA-2013:0886 0 normal SHIPPED_LIVE Moderate: rhev 3.2 - vdsm security and bug fix update 2013-06-11 00:25:02 UTC

Description Federico Simoncelli 2012-11-30 14:58:31 UTC
The upgrade process to V3 was interrupted on the vdsm side with the following traceback:

3a05edaa-d983-4f30-ab79-4762466f73cb::ERROR::2012-11-28 16:58:26,041::task::853::TaskManager.Task::(_setError) Task=`3a05edaa-d983-4f30-ab79-4762466f73cb`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 320, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 274, in startSpm
    self._upgradePool(expectedDomVersion, __securityOverride=True)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1032, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 281, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 155, in v3DomainConverter
    type(vol).newVolumeLease(metaId, domain.sdUUID, volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 617, in newVolumeLease
    sanlock.init_resource(sdUUID, volUUID, [(leasePath, leaseOffset)])
SanlockException: (28, 'Sanlock resource init failure', 'No space left on device')

After investigating the issue, it was discovered that the storage domain had previously been managed by older vdsm releases, one of which generated metadata offsets in steps of 512 (instead of 1). For reference, the patch that fixed the issue (in vdsm-4.9-107.el6) is:

commit 496c0c37df387a25afee67a0ff90030e7eab36e2
Author: Federico Simoncelli <fsimonce>
Date:   Fri Sep 2 16:36:33 2011 +0000

    BZ#732980 MD tag must be in blocks unit
    
    This patch fixes the unit used for the MD tag (blocks instead of
    bytes) and completes the support for the new MS tag which holds
    the size (in blocks) of the metadata.
    
    Change-Id: Id268735f60a289f9e8f12a7d7bb3180bf3398c54
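
The unit mix-up described above can be sketched as follows. This is a hypothetical illustration, not vdsm code: the faulty releases effectively wrote the MD tag value in 512-byte block multiples, so a volume in slot n got MD_<n*512> instead of MD_<n>, and repairing such a tag amounts to dividing by the block size:

```python
# Illustrative sketch only (not vdsm's actual repair logic): a faulty
# release wrote MD tags in steps of 512, so slot n became tag n*512.
BLOCK_SIZE = 512

def fix_md_tag(offset):
    # Heuristic for this sketch: treat large multiples of 512 as faulty
    # tags; the real preliminary upgrade step uses vdsm's own checks.
    if offset % BLOCK_SIZE == 0 and offset >= BLOCK_SIZE:
        return offset // BLOCK_SIZE
    return offset

print(fix_md_tag(51712))  # 101 -- the highest tag seen in this report
print(fix_md_tag(4))      # 4  -- already a sane slot number, untouched
```

Note that a heuristic like this cannot distinguish a faulty tag from a legitimate offset that happens to be a multiple of 512, which is why the real fix is tied to the known-affected releases.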


This bug led to very high offset values in the volume metadata:

# lvs -o +tags | sed -n 's/.*MD_\([0-9]\+\).*/\1/p' | sort -n | tail
45568
46080
47104
47616
48128
48640
49664
50176
51200
51712

Since sanlock uses the same offsets to initialize the volume leases, these values would require a leases LV of about 52 GB.
The current leases LV size is 2 GB, which would normally allow 2048 - 100 = 1948 volumes; but because the older vdsm version left "holes" of 511 unused slots between volumes, there is not enough space to satisfy the requests during the upgrade.
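
The space arithmetic can be sketched as follows. This is a rough illustration based only on the numbers quoted in this report; the 1 MiB-per-lease slot size and the constant names are assumptions, not vdsm constants:

```python
# Rough sketch of the space math (slot size and constants are
# assumptions taken from the numbers in this report).
LEASE_SLOT_MB = 1            # assume each sanlock lease slot is 1 MiB
LEASES_LV_MB = 2 * 1024      # the leases LV is 2 GiB
RESERVED_SLOTS = 100         # slots reserved at the start of the LV

usable_slots = LEASES_LV_MB // LEASE_SLOT_MB - RESERVED_SLOTS
print(usable_slots)          # 1948 volumes fit in the 2 GiB LV

# The faulty releases advanced offsets in steps of 512, so the highest
# observed tag (MD_51712) implies a lease offset needing roughly:
required_gb = 51712 * LEASE_SLOT_MB / 1024
print(required_gb)           # 50.5 -- tens of GiB instead of 2
```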

Comment 2 Federico Simoncelli 2012-11-30 16:42:42 UTC
An initial patch has been proposed upstream to skip this error for volumes that are known to be affected by this issue (MD offset > 2048 - 100 - 1). This will lead to V3 domains that have no leases for certain volumes; these could be handled as a special case in the future (e.g. no leases for MD offsets > 1947, or better, they could eventually be fixed somehow).

It is not the ideal solution (that would be, for example, reallocating the metadata slots during the upgrade), but it is probably the best compromise we can have at this time.

commit d0809c40befb370f49444d1c3b3de34a05e3fa89
Author: Federico Simoncelli <fsimonce>
Date:   Fri Nov 30 11:32:46 2012 -0500

    upgrade: skip volume lease for faulty metadata offsets

http://gerrit.ovirt.org/#/c/9600/
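
The skip condition described in this comment can be sketched as follows (a minimal illustration with hypothetical names; the real change is the gerrit patch linked above):

```python
# Minimal sketch of the "skip faulty offsets" logic from comment 2
# (names are hypothetical, not vdsm's).
LEASES_LV_SLOTS = 2048
RESERVED_SLOTS = 100
MAX_USABLE_OFFSET = LEASES_LV_SLOTS - RESERVED_SLOTS - 1   # 1947

def should_init_lease(md_offset):
    # Volumes whose metadata offset exceeds the usable range are left
    # without a lease instead of failing the whole V3 upgrade.
    return md_offset <= MAX_USABLE_OFFSET

print(should_init_lease(1947))   # True
print(should_init_lease(45568))  # False -- lease initialization skipped
```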

Comment 4 Federico Simoncelli 2012-12-06 10:52:09 UTC
The final patch is:

commit 2ba76e3b45a9866b913ece63a51c540410fdd561
Author: Federico Simoncelli <fsimonce>
Date:   Mon Dec 3 05:28:39 2012 -0500

    upgrade: reallocate the metadata slots when needed

http://gerrit.ovirt.org/#/c/9660/
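
The reallocation approach of the final patch can be sketched as follows. This is a hypothetical illustration, not the actual implementation (which is the gerrit change linked above): each out-of-range metadata offset is remapped to the first free slot inside the usable range:

```python
# Hypothetical sketch of "reallocate the metadata slots when needed";
# slot numbering here is illustrative, not vdsm's real layout.
MAX_USABLE_OFFSET = 2048 - 100 - 1   # 1947

def reallocate(offsets):
    # Slots already inside the usable range stay where they are.
    used = {o for o in offsets if o <= MAX_USABLE_OFFSET}
    free = (s for s in range(MAX_USABLE_OFFSET + 1) if s not in used)
    mapping = {}
    for o in offsets:
        if o > MAX_USABLE_OFFSET:
            mapping[o] = next(free)   # move metadata to a free slot
    return mapping

print(reallocate([4, 45568, 51712]))  # {45568: 0, 51712: 1}
```

Unlike the initial skip-lease patch, this keeps a lease for every volume, at the cost of rewriting the metadata tags during the upgrade.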

Comment 6 vvyazmin@redhat.com 2013-01-02 12:52:42 UTC
Verified by upgrading an iSCSI storage domain from V2 to V3 (upgrade from a 3.0 environment to a 3.2 environment).


RHEVM 3.0 - IC159 environment:
RHEVM: rhevm-3.0.7_0001-2.el6_3.x86_64
VDSM: vdsm-4.9-113.4.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.6.x86_64
QEMU-KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64

RHEVM 3.2 - SF02 environment:
RHEVM: rhevm-3.2.0-2.el6ev.noarch
VDSM: vdsm-4.10.2-2.0.el6.x86_64
LIBVIRT: libvirt-0.10.2-13.el6.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64

Comment 11 errata-xmlrpc 2013-06-10 20:36:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html
