Bug 882276 - 3.2 - [vdsm] Failure upgrading a storage domain to V3 - No space left on device
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: Unspecified  OS: Unspecified
Priority: high  Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assigned To: Federico Simoncelli
QA Contact: Haim
Whiteboard: storage
Keywords: ZStream
Depends On:
Blocks: 884314 896516 915537
 
Reported: 2012-11-30 09:58 EST by Federico Simoncelli
Modified: 2016-02-10 13:32 EST
CC List: 12 users

See Also:
Fixed In Version: vdsm-4.10.2-2.0
Doc Type: Bug Fix
Doc Text:
During an early beta of RHEV 3.0, vdsm generated problematic metadata tags (see BZ#732980) that are incompatible with the V3 upgrade. A preliminary step has been added to the upgrade process to fix the relevant tags (when needed) before proceeding with the regular upgrade.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-10 16:36:50 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
scohen: Triaged+


Attachments: None
Description Federico Simoncelli 2012-11-30 09:58:31 EST
The upgrade process to V3 was interrupted on the vdsm side with the following traceback:

3a05edaa-d983-4f30-ab79-4762466f73cb::ERROR::2012-11-28 16:58:26,041::task::853::TaskManager.Task::(_setError) Task=`3a05edaa-d983-4f30-ab79-4762466f73cb`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 320, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 274, in startSpm
    self._upgradePool(expectedDomVersion, __securityOverride=True)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1032, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 281, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 155, in v3DomainConverter
    type(vol).newVolumeLease(metaId, domain.sdUUID, volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 617, in newVolumeLease
    sanlock.init_resource(sdUUID, volUUID, [(leasePath, leaseOffset)])
SanlockException: (28, 'Sanlock resource init failure', 'No space left on device')

After investigating the issue, it was discovered that the storage domain had previously been managed by older vdsm releases, one of which generated the metadata offsets in steps of 512 (instead of 1). For reference, the patch that fixed that issue (in vdsm-4.9-107.el6) is:

commit 496c0c37df387a25afee67a0ff90030e7eab36e2
Author: Federico Simoncelli <fsimonce@redhat.com>
Date:   Fri Sep 2 16:36:33 2011 +0000

    BZ#732980 MD tag must be in blocks unit
    
    This patch fixes the unit used for the MD tag (blocks instead of
    bytes) and completes the support for the new MS tag which holds
    the size (in blocks) of the metadata.
    
    Change-Id: Id268735f60a289f9e8f12a7d7bb3180bf3398c54
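
For illustration only, here is a minimal Python sketch of the unit mix-up described in the commit above; the function names are hypothetical and this is not vdsm code. The pre-fix behaviour effectively recorded the offset in 512-byte steps, while the fixed behaviour records it in blocks:

BLOCK_SIZE = 512  # bytes per block on the metadata LV

def md_tag_fixed(slot):
    # Correct behaviour (vdsm-4.9-107.el6 and later): the MD tag is the
    # slot number expressed in blocks, so consecutive volumes get
    # consecutive values (MD_0, MD_1, MD_2, ...).
    return "MD_%d" % slot

def md_tag_buggy(slot):
    # Pre-fix behaviour: the value was effectively expressed in bytes,
    # producing tags 512 apart (MD_0, MD_512, MD_1024, ...).
    return "MD_%d" % (slot * BLOCK_SIZE)

for slot in range(3):
    print(md_tag_fixed(slot), md_tag_buggy(slot))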


This bug led to very high offset numbers for the volume metadata:

# lvs -o +tags | sed -n 's/.*MD_\([0-9]\+\).*/\1/p' | sort -n | tail
45568
46080
47104
47616
48128
48640
49664
50176
51200
51712

Since sanlock uses the same offsets to initialize the volume leases, these values would require a leases LV of about 52 GB.
The current leases LV size is 2 GB, which would normally allow 2048 - 100 = 1948 volumes; but since the older vdsm version generated "holes" of 511 free slots, there is now not enough space to satisfy the requests during the upgrade.
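
As a back-of-the-envelope check of the figures above (a sketch only; the 1 MiB lease slot size and the 100 reserved slots are inferred from the 2 GB / 2048 / 100 numbers in this comment, not taken from the code):

MIB = 1024 * 1024
LEASE_SLOT_SIZE = 1 * MIB            # assumed: one 1 MiB sanlock lease per metadata slot
LEASES_LV_SIZE = 2 * 1024 * MIB      # current leases LV: 2 GiB
RESERVED_SLOTS = 100                 # assumed reserved area at the start of the LV

usable_volumes = LEASES_LV_SIZE // LEASE_SLOT_SIZE - RESERVED_SLOTS
print(usable_volumes)                # 2048 - 100 = 1948

highest_md_offset = 51712            # largest MD_ tag seen in the lvs output above
needed_bytes = (highest_md_offset + 1) * LEASE_SLOT_SIZE
print(needed_bytes / (1024 ** 3))    # ~50.5 GiB, the same order of magnitude as the ~52 GB above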
Comment 2 Federico Simoncelli 2012-11-30 11:42:42 EST
An initial patch has been proposed upstream to skip this error for the volumes that we know are affected by this issue (MD offset > 2048 - 100 - 1). This will lead to V3 domains that have no leases for certain volumes; these could be handled as a special case in the future (e.g. no leases for MD offsets > 1947), or even better they could eventually be fixed somehow.

It's not the best solution (which would be, for example, reallocating the metadata slots during the upgrade), but it's probably one of the best compromises we can reach at this time.

commit d0809c40befb370f49444d1c3b3de34a05e3fa89
Author: Federico Simoncelli <fsimonce@redhat.com>
Date:   Fri Nov 30 11:32:46 2012 -0500

    upgrade: skip volume lease for faulty metadata offsets

http://gerrit.ovirt.org/#/c/9600/
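
A rough sketch of the check this first patch introduces, under the same assumptions as above (2048 slots in the leases LV, the first 100 reserved); the function name is illustrative, not the actual vdsm API:

TOTAL_SLOTS = 2048
RESERVED_SLOTS = 100
MAX_USABLE_OFFSET = TOTAL_SLOTS - RESERVED_SLOTS - 1   # 1947

def lease_fits(md_offset):
    # Offsets produced by the old blocks/bytes bug exceed the limit,
    # so the V3 upgrade would skip creating a lease for those volumes.
    return md_offset <= MAX_USABLE_OFFSET

print(lease_fits(42))      # True  - normal offset, lease is created
print(lease_fits(51712))   # False - faulty offset, lease is skipped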
Comment 4 Federico Simoncelli 2012-12-06 05:52:09 EST
The final patch is:

commit 2ba76e3b45a9866b913ece63a51c540410fdd561
Author: Federico Simoncelli <fsimonce@redhat.com>
Date:   Mon Dec 3 05:28:39 2012 -0500

    upgrade: reallocate the metadata slots when needed

http://gerrit.ovirt.org/#/c/9660/
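
The idea behind this final approach, as a hedged sketch (illustrative pseudologic only, not the actual formatConverter code): instead of skipping the lease, a faulty offset is moved to the first free slot in the valid range before the lease is initialized.

TOTAL_SLOTS = 2048
RESERVED_SLOTS = 100
MAX_USABLE_OFFSET = TOTAL_SLOTS - RESERVED_SLOTS - 1   # 1947

def reallocate_if_needed(md_offset, used_offsets):
    # Keep a valid offset as-is; otherwise pick the first free slot below
    # the limit (the MD_ tag on the LV would then be rewritten accordingly).
    if md_offset <= MAX_USABLE_OFFSET:
        return md_offset
    for candidate in range(MAX_USABLE_OFFSET + 1):
        if candidate not in used_offsets:
            used_offsets.add(candidate)
            return candidate
    raise RuntimeError("no free metadata slot left in the valid range")

used = {0, 1, 2}                           # offsets already taken by healthy volumes
print(reallocate_if_needed(51712, used))   # -> 3, the first free valid slot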
Comment 6 vvyazmin@redhat.com 2013-01-02 07:52:42 EST
Verified by upgrading an iSCSI storage domain from V2 to V3 (upgrade from a 3.0 environment to a 3.2 environment).


RHEVM 3.0 - IC159 environment:
RHEVM: rhevm-3.0.7_0001-2.el6_3.x86_64
VDSM: vdsm-4.9-113.4.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.6.x86_64
QEMU-KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64

RHEVM 3.2 - SF02 environment:
RHEVM: rhevm-3.2.0-2.el6ev.noarch
VDSM: vdsm-4.10.2-2.0.el6.x86_64
LIBVIRT: libvirt-0.10.2-13.el6.x86_64
QEMU-KVM: qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64
Comment 11 errata-xmlrpc 2013-06-10 16:36:50 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html
